Posts

Showing posts with the label Data Science

Python 2 Vs Python 3

There has been a huge debate on the question "Should you learn Python 2 or Python 3?" If you are facing this dilemma, this post is for you. The first thing to understand is that an upgrade to anything is usually made to improve it in terms of experience, speed, efficiency, and so on. The upgrade of the Python language was likewise made to improve its features and functionality. But then why is there so much debate about Python 2 and Python 3? Why can't people just accept the new version? To understand this, consider that Python has been around for nearly 29 years now (it was created in 1991), a large number of legacy systems are built on Python 2, and some features of Python 3 are not backward compatible with Python 2. For example, take a simple print statement: in Python 2 print was considered a statement, and in Python 3 it is ...
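As a rough illustration of the print incompatibility mentioned above, here is a minimal sketch meant to be run under Python 3. The names py2_source and is_valid_python3 are made up for this example; the Python 2 form is kept in a string because it is not valid Python 3 syntax.

```python
import io

# Python 2 style: print used as a statement (hypothetical sample source).
py2_source = 'print "Hello, World!"'


def is_valid_python3(source):
    """Return True if `source` compiles under the running Python 3 interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Python 3 style: print is a function, so it accepts keyword arguments
# such as sep, end and file.
buffer = io.StringIO()
print("Hello", "World", sep=", ", end="!\n", file=buffer)
output = buffer.getvalue()
```

Because print is an ordinary function in Python 3, its output can be redirected or reformatted through arguments, which the statement form could not do as cleanly.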

Why should you learn PYTHON in 2020

What is Coding/Programming?

This section is for all those who are very new to coding/programming and are about to start their journey by reading this blog post. We give instructions to our system using the keyboard or mouse, telling the computer to perform tasks like printing a doc, writing an email, playing music, dimming the monitor backlight, and many more. The computer is a hardware device that needs instructions to run. When we perform a task on the computer, some code is running in the background and telling the machine what it has to do. This happens with the help of a programming language, which acts as an interface between the computer user and the hardware. So coding/programming is the act of writing these instructions in a programming language so that, when they run on a computer, they perform some task. Phew...

Programming Languages

There is a large variety of programming languages available in the market right now and every...

Extracting text from PDF for NLP tasks

Introduction

Natural Language Processing involves collecting data from various sources, and one is not always lucky enough to get ready-made data. Many times you have to extract the data yourself, and one common source is files. In this post, I will be talking specifically about PDF files.

Getting the Guns ready

After some exploration on the internet, I came across a Python package, PyPDF2, which sounded like a good contender for what we want (text extraction), although it can do more than we need. This package can also be used to generate, decrypt, and merge PDF files. Its usage details are not that clear, which is why I thought of writing a post to explain it.

Installation

    pip install PyPDF2

Reading the File and extracting Text

    import PyPDF2
    filename = 'complete path of your pdf file'
    # opening the file
    pdfFileObj = open(filename, 'rb')
    # creating a pdf reader object
    pdfReader = PyPDF2.PdfFileReader(pdf...
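The excerpt above is cut off, so the following is only a minimal sketch of the same idea, not the post's own code. It assumes PyPDF2 2.0 or later, where the reader class is named PdfReader (the PdfFileReader name used above belongs to the older API). The function names extract_pdf_text and normalize_whitespace are hypothetical helpers introduced for this example.

```python
import re


def extract_pdf_text(path):
    """Return the extracted text of each page in the PDF at `path`."""
    # Imported here so the cleanup helper below can be used on its own.
    # Assumption: PyPDF2 >= 2.0, which provides PdfReader.
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        # Scanned or image-only pages may yield no text, hence the fallback.
        return [page.extract_text() or "" for page in reader.pages]


def normalize_whitespace(text):
    """Collapse runs of whitespace, which PDF extraction often scatters."""
    return re.sub(r"\s+", " ", text).strip()
```

A typical use would be to call extract_pdf_text on a file path and pass each page through normalize_whitespace before tokenizing it for an NLP task.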

Sentiment Analysis-Are we there???

This one took a long time because of the analysis work I was doing for this post. There is a lot of work going on in the field of sentiment analysis, so I decided to compare the accuracy of the products. Let's start with some basics...

NLP: Natural Language Processing

Natural Language Processing is a very interesting topic, and a subject of debate when it comes to its accuracy. Natural language is very ambiguous, as the same sentence can have different meanings. Take "I saw a man on a hill with a telescope." It seems like a simple statement until you begin to unpack the many alternate meanings: There’s a man on a hill, and I’m watching him with my telescope. There’s a man on a hill, who I’m seeing, and he has a telescope. There’s a man, and he’s on a hill that also has a telescope on it. I’m on a hill, and I saw a man using a telescope. There’s a man on a hill, and I’m seeing him with a telescope. Sarcasm is that component of the language that is diffi...