Posts

Celery with Heavy Workloads: A Deep Dive into the Solution

Introduction This is about my experience with Celery, a distributed task-queue framework in Python, in a heavy-workload environment. I am dividing the blog into the following sections: Product Brief, Current Architecture, Problem With Data Loads and Sleepless Nights, Solutions Tried, and The Final Destination.

Product Brief We are working on a product that fetches data from multiple sources and aggregates it to generate insights. Some of the data sources we support as of now are:
- Stock news from multiple sources
- RSS feeds
- Twitter
- Reddit
- Yahoo News
- Yahoo Finance
- Earnings calendars
- Report filings
- Company financials
- Stock prices

Current Architecture Broad View We knew that the problem we are solving has to deal with the cruel, decentralized Internet, and that we need to divide the large task of fetching data from the web and analyzing it into small tasks (Fig 1). On exploring different projects and technologies and analyzing the community support, we cam…

Celery Optimization For Workloads

Introduction Firstly, a brief background about myself. I am working as a Software Engineer at an alternate asset management organization (handling 1.4 trillion with our product suite), responsible for maintaining and developing a product, ALT Data Analyzer. My work is focused on making the engine run and feeding the machines with their food. This article explains the problems we faced with scaling up our architecture and the solutions we followed. I am dividing the blog into the following sections: Product Brief, Current Architecture, Problem With Data Loads and Sleepless Nights, Solutions Tried, and The Final Destination.

Product Brief The idea behind building this product was to give users an aggregated view of the WEB around a company. By WEB I mean the data that is flowing freely over all the decentralized nodes of the internet. We try to capture all the financial, technical and fundamental data for the companies, pass that data through our massaging and analyz…

Getting Started with GloVe

What is GloVe? GloVe stands for Global Vectors for Word Representation. It is an unsupervised learning algorithm developed at Stanford for generating word embeddings by aggregating a global word-word co-occurrence matrix from a corpus. The resulting embeddings show interesting linear substructures of words in the vector space, and the results are pretty impressive. This type of representation can be very useful in many machine learning algorithms. To read more about word vectorization, you can read my other article. In this blog post, we will be learning about a GloVe implementation in Python. So, let's get started.

Let's Create the Embeddings
Installing glove-python A GloVe implementation in Python is available in the glove-python library:

pip install glove_python

Text Preprocessing In this step, we will pre-process the text, e.g. removing the stop words, lemmatizing the words, etc. You can perform different steps based on your requirements.
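The preprocessing step described above can be sketched without any heavy dependencies. This is a minimal, illustrative version: the stop-word set is a tiny stand-in for a real stop-word corpus, and a proper pipeline would use a lemmatizer (e.g. NLTK's WordNet lemmatizer) instead of nothing at all.

```python
# Minimal preprocessing sketch: lowercase, tokenize, drop stop words.
# STOP_WORDS is an illustrative stand-in for a real stop-word list.
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}

def preprocess(text):
    # Keep only alphabetic tokens, lowercased, with stop words removed.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# glove-python expects a corpus as a list of token lists like this one.
sentences = [preprocess(s) for s in [
    "GloVe is an unsupervised learning algorithm.",
    "The embeddings capture word co-occurrence statistics.",
]]
```

Tokenized sentences in this shape can then be fed to glove-python's `Corpus().fit(...)` to build the co-occurrence matrix and `Glove(...).fit(...)` to train the embeddings.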

Word Vectorization

Introduction Machine Learning has become the hottest topic in the data industry, with increasing demand for professionals who can work in this domain. There is a large amount of textual data present on the internet and in giant servers around the world. Just for some facts:
- 1,209,600 new data-producing social media users each day
- 656 million tweets per day
- More than 4 million hours of content uploaded to YouTube every day, with users watching 5.97 billion hours of YouTube videos each day
- 67,305,600 Instagram posts uploaded each day
- Over 2 billion monthly active Facebook users, compared to 1.44 billion at the start of 2015 and 1.65 billion at the start of 2016
- 1.32 billion daily active Facebook users on average as of June 2017
- 4.3 billion Facebook messages posted daily
- 5.75 billion Facebook likes every day
- 22 billion texts sent every day
- 5.2 billion daily Google searches in 2017

Need for Vectorization The amount of textual data is massive, and the problem with textual dat…