

Information Extraction using GROKS in Python

In my previous blog post, I wrote about information extraction using GROKs and REGEX. If you have not read it, I encourage you to go through that post first. An important aspect of any tool is the ability to use it in different environments and to automate tasks with it. In this post, we will look at how to use GROKs in Python with the pygrok library. By now we know that GROKs are a form of regular expressions that are more readable.

Installation

Pygrok is an implementation of GROK patterns in Python and is available through pip:

pip install pygrok

Usage

The library is useful both for applying the pre-built GROK patterns and for our own custom-built ones. Let's start with a very basic example.

Parsing Text

# import the package
from pygrok import Grok
# text to be processed
text = 'gary is male, 25 years old and weighs 68.5 kilograms'
# pattern which you want to match
pattern = '%{WORD:...
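To make "GROKs are readable regular expressions" concrete, here is a minimal sketch that uses only Python's standard `re` module instead of pygrok. It expands `%{NAME:field}` tokens into named regex groups and then runs an ordinary regex search. The helper names `grok_to_regex` and `grok_match` are hypothetical, the three base patterns in `GROK_PATTERNS` are our own simplified stand-ins (not the definitions shipped with grok or pygrok), and the full pattern in the usage lines is an assumed one, because the post's own pattern is cut off in this excerpt. In pygrok itself, the corresponding call should be `Grok(pattern).match(text)`, which likewise returns a dictionary of captured fields, or `None` when nothing matches.

```python
import re

# Simplified, hypothetical stand-ins for a few grok base patterns.
# The real grok pattern files define these more thoroughly.
GROK_PATTERNS = {
    "WORD": r"\b\w+\b",
    "INT": r"[+-]?\d+",
    "NUMBER": r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)",
}

# Matches %{NAME} or %{NAME:field}
GROK_TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern):
    """Expand %{NAME:field} tokens into named regex groups."""
    def expand(m):
        name, field = m.group(1), m.group(2)
        body = GROK_PATTERNS[name]
        # Named token -> named group; unnamed token -> non-capturing group
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    # A callable replacement is inserted literally, so the backslashes
    # inside the base patterns survive unchanged.
    return GROK_TOKEN.sub(expand, pattern)


def grok_match(pattern, text):
    """Return a dict of captured fields, or None if the text does not match."""
    m = re.search(grok_to_regex(pattern), text)
    return m.groupdict() if m else None


if __name__ == "__main__":
    text = "gary is male, 25 years old and weighs 68.5 kilograms"
    # Assumed pattern, written for this illustration
    pattern = ("%{WORD:name} is %{WORD:gender}, %{NUMBER:age} years old "
               "and weighs %{NUMBER:weight} kilograms")
    print(grok_match(pattern, text))
    # {'name': 'gary', 'gender': 'male', 'age': '25', 'weight': '68.5'}
```

Every captured value comes back as a string, so '25' and '68.5' still need converting before they can be used as numbers.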

Using GROK for Information Extraction from Text

What Is Information Extraction from Text?

One of the key parts of working with text data is extracting information from the raw text. Let's take an example sentence from some dataset that has the following form:

Details are: Name Japneet Singh Age 27 Profession Software Engineer

The information extracted from this text would look like this:

Name: Japneet Singh
Age: 27
Profession: Software Engineer

This information can then be used further in any machine learning model. We generally perform this step at a very early stage of data preprocessing. There are many advanced ways to handle it, but the old approach of using regex remains the undefeated champion; REGEX plays an important role whenever we work with text data. Here, we will discuss two ways to extract the information:

1. REGEX
2. GROK

The REGEX Approach

Regex is defined by regular-expression.info as: A regular expressi...
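As a sketch of the REGEX approach (the post's own regex is not shown in this excerpt), a single pattern with named groups can pull the three fields out of the example sentence. It assumes the labels always appear in the order Name, Age, Profession, that Age is an integer, and that each value runs until the next label; the helper name `extract_details` is hypothetical.

```python
import re

# Assumed layout: the labels appear in the order Name, Age, Profession,
# and each value runs until the next label (or the end of the text).
DETAILS = re.compile(
    r"Name\s+(?P<Name>.+?)\s+Age\s+(?P<Age>\d+)\s+"
    r"Profession\s+(?P<Profession>.+?)\s*$"
)


def extract_details(text):
    """Return {'Name': ..., 'Age': ..., 'Profession': ...} or None."""
    match = DETAILS.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sentence = "Details are: Name Japneet Singh Age 27 Profession Software Engineer"
    for field, value in (extract_details(sentence) or {}).items():
        print(f"{field}: {value}")
    # Name: Japneet Singh
    # Age: 27
    # Profession: Software Engineer
```

The lazy `.+?` groups stop at the first following label, which is what lets a multi-word value such as "Japneet Singh" be captured whole; the trade-off is that the pattern breaks as soon as the field order changes.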