
Getting Started with GloVe

What is GloVe?

GloVe stands for Global Vectors for Word Representation. It is an unsupervised learning algorithm developed at Stanford that generates word embeddings by aggregating a global word-word co-occurrence matrix from a corpus. The resulting embeddings show interesting linear substructures of the words in vector space.

Examples of linear substructures are:

These results are pretty impressive. This type of representation can be very useful in many machine learning algorithms. To read more about word vectorization, you can read my other article.

In this blog post, we will learn how to use a GloVe implementation in Python. So, let's get started.

Let's create the Embeddings

Installing Glove-Python

A Python implementation of GloVe is available in the glove-python library.

pip install glove_python

Text Preprocessing

In this step, we will pre-process the text, for example by removing stop words and lemmatizing the words. You can perform different steps based on your requirements.
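As a rough illustration of the preprocessing step, here is a minimal sketch that uses only the Python standard library. The stop-word list and the preprocess helper are hypothetical names made up for this example, and lemmatization is left out because it needs an external library such as NLTK or spaCy, so treat this as a starting point rather than the exact pipeline.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real project would use a fuller list (for example from NLTK or spaCy).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "it"}


def preprocess(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


# Each sentence becomes a list of cleaned tokens.
sentences = [
    "GloVe is an unsupervised learning algorithm.",
    "The embeddings are learned from a co-occurrence matrix.",
]
lines = [preprocess(s) for s in sentences]
print(lines)
```

The result is a list of token lists, one per sentence, which is a common input shape for building a co-occurrence matrix. If you need lemmatization, it would slot in as one more step inside preprocess.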