Columnar Database-MonetDB



Hey readers, recently I was exploring one of the databases offering MonetDB which is a columnar database available open source. As per the documentation, the database has very high read performance and works very well with the data rollups. Apart from this it also supports transactions and other features of transactional databases.
In this blog, I will be covering a few things that I learned while exploring the Database.
  1. Traditional Transactional DB
  2. NO SQL Databases
  3. Columnar Databases
  4. MonetDB
  5. Querying MonetDB using Python


Transactional Databases:

A decade ago the main purpose of databases was to store info and provide the information as and when required. The operations were mainly write heavy and information was stored in normalized form to avoid redundancy and maintain the integrity of information.
The most popular OLTP databases that we have in the market and widely used are :

  • Oracle
  • SQL Server
  • My SQL
There are many other opensource as well as community editions available in the market nowadays.
The core properties of these systems is ACID properties, if you have worked on databases you must be aware f these properties:
  1. Atomicity
  2. Consistency
  3. Isolation
  4. Durability
With time these systems have also developed and support a variety of features like horizontal and vertical scaling, JSON support, in memory operations, caching, fault tolerance, replication etc.

NO SQL Database:

With time the data has grown so much just for the fact over 80% of data that we have as of now is generated the in last 5 years for, obvious reasons like smartphones popularity, internet availability. Also, the data has gained to much importance that most of the businesses nowadays are built on top of data. You must have heard about the quote
Data is the New Oil
And to mine this oil we need some advanced systems which are built specifically for analytical querying.
Some of the examples for OLTP and OLAP querying are:

OLTP:

  • What is the current balance of a customer?
  • What was the last recharge of the customer?
  • Add a new customer
  • Update the phone number for a customer.

OLAP:

  • How many transactions of the amount greater than 5000 are done by the customer in last year?
  • How many new customers are added and have deposited over 10K in the last 3 years?
Can you see the visible difference in the queries?
In OLTP we are majorly working in 1 dimension but in the OLAP systems we are slicing and dicing the dimensions and working with multiple records. Also in OLAP queries, we are mostly reading the data and not updating it.

To cater to these kinds of requirements advanced systems like Hadoop, InfluxDB, Postgres, MongoDB, MonetDB, Redis etc were built.
These systems come under the category of NoSQL databases. All these systems solve some specific data problems.

NOSQL Databases are categorized in 4 classes:
  1. Document Store
  2. Graph DB
  3. Key Value Store
  4. Columnar Stores

In this post, we will be discussing the 4th category i.e the columnar databases.

Columnar Stores:

The traditional systems store the information in row formats i.e the complete information for a record is stored together.
This is good when we fire OLTP queries where data is in normalized form and table have less number of columns and most of the columns are required in a query.
The information for a record is stored at the same memory block which makes it fat to access.

But, it was realized that most of the OLAP queries revolve around very fewer columns and calculations like rollups, aggregations, sum etc are made in these columns.
Keeping these in mind and leveraging the concept of ROW major and COLUMN major that we have in matrices the Column stores were built.

Instead of keeping a complete row together ColumnStores store same column data together in one memory block.
This helps in faster access to large no. of rows which in turn helps in speeding up the calculations.

MonetDB:

As per the documentation:
MonetDB pioneered column-store solutions for high-performance data warehouses for business intelligence and eScience since 1993. It achieves its goal by innovations at all layers of a DBMS, e.g. a storage model based on vertical fragmentation, a modern CPU-tuned query execution architecture, automatic and adaptive indices, run-time query optimization, and a modular software architecture. It is based on the SQL 2003 standard with full support for foreign keys, joins, views, triggers, and stored procedures. It is fully ACID compliant and supports a rich spectrum of programming interfaces (JDBC, ODBC, PHP, Python, RoR, C/C++, Perl).

MonetDB is the focus of database research pushing the technology envelop in many areas. Its three-level software stack, comprised of SQL front-end, tactical-optimizers, and columnar abstract-machine kernel, provide a flexible environment to customize it many different ways. A rich collection of linked-in libraries provide functionality for temporal data types, math routine, strings, and URLs. In-depth information on the technical innovations in the design and implementation of MonetDB can be found in our science library

Other features can be explored on its site.

My Experience with MonetDB:

I had a huge MongoDB database of tweets and news and it is becoming very difficult to get the performance evening after many optimization techniques that we have used.
Exploring out options we decided to explore MonetDB for our problems.

MonetDB is supported in all major operating systems and it was very easy to install it on my Ubuntu 14.04 . Steps to install the Database are mentioned here.

After installing the database it was a bit difficult to understand the environment and get used to it due to very fewer resources available for it.
There is a concept of DB Farm which is a collection of different databases which can be started and stopped independently.
An Example to build a db farm and a database in it is:
shell> monetdbd create /path/to/mydbfarm
shell> monetdbd start /path/to/mydbfarm
shell> monetdb create voc
shell> monetdb release voc
shell> mclient -u monetdb -d voc
password:<monetdb>
Welcome to mclient, the MonetDB/SQL interactive terminal (Oct2014-SP1)
Database: MonetDB v11.19.7 (Oct2014-SP1), 'mapi:monetdb://localhost:50000/voc'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>SELECT 'hello world';
+---------------+
| single_value  |
+===============+
| hello world   |
+---------------+
1 tuple (0.530ms)
sql>\q
Getting started tutorial can be found here.

Once the database is setup you need some client to interact with the database. MCLIENT is the default client which comes installed with the database.
It is a command line tool to interact with the database. MonetDB has a domain specific language which is very similar to SQL.

An example to connect with the database and fire some commands
shell> mclient -u monetdb -d voc
password:<monetdb>
sql>CREATE USER "voc" WITH PASSWORD 'voc' NAME 'VOC Explorer' SCHEMA "sys";
sql>CREATE SCHEMA "voc" AUTHORIZATION "voc";
sql>ALTER USER "voc" SET SCHEMA "voc";
sql>\q
I used python client for MonetDB to migrate the data from MongoDB to MonetDB as there is no direct tool available to do so.

PymonetDB is the python client for MonetDB and it is very easy to interact using it. It is available through PYPI and can be installed using the command
$ pip install pymonetdb
Following code shows how to connect and interact with the database:
> # import the SQL module
> import pymonetdb
>
> # set up a connection. arguments below are the defaults
> connection = pymonetdb.connect(username="monetdb", password="monetdb",
>                                    hostname="localhost", database="demo")
>
> # create a cursor
> cursor = connection.cursor()
>
> # increase the rows fetched to increase performance (optional)
> cursor.arraysize = 100
>
> # execute a query (return the number of rows to fetch)
> cursor.execute('SELECT * FROM tables')
26
>
> # fetch only one row
> cursor.fetchone()
[1062, 'schemas', 1061, None, 0, True, 0, 0]
>
> # fetch the remaining rows
> cursor.fetchall()
[[1067, 'types', 1061, None, 0, True, 0, 0],
 [1076, 'functions', 1061, None, 0, True, 0, 0],
 [1085, 'args', 1061, None, 0, True, 0, 0],
 [1093, 'sequences', 1061, None, 0, True, 0, 0],
 [1103, 'dependencies', 1061, None, 0, True, 0, 0],
 [1107, 'connections', 1061, None, 0, True, 0, 0],
 [1116, '_tables', 1061, None, 0, True, 0, 0],
 ...
 [4141, 'user_role', 1061, None, 0, True, 0, 0],
 [4144, 'auths', 1061, None, 0, True, 0, 0],
 [4148, 'privileges', 1061, None, 0, True, 0, 0]]
>
> # Show the table meta data
> cursor.description
[('id', 'int', 4, 4, None, None, None),
 ('name', 'varchar', 12, 12, None, None, None),
 ('schema_id', 'int', 4, 4, None, None, None),
 ('query', 'varchar', 168, 168, None, None, None),
 ('type', 'smallint', 1, 1, None, None, None),
 ('system', 'boolean', 5, 5, None, None, None),
 ('commit_action', 'smallint', 1, 1, None, None, None),
 ('temporary', 'tinyint', 1, 1, None, None, None)]

I had to insert a large number of records 2 crore records to be precise and I have to admit it that monetDB is slow in insertions. On average it took 0.057 seconds to insert one record when simple insertion is used and in batch mode, it was taking around 0.001 seconds for inserting one record.
It took me around 9 hours to preprocess and insert documents in Database.

But the read and aggregation times were impresive. I tested it with different aggregation queries and compared the execution time with MongoDB counterpart and MonetDB performed way better in queries where MongoDB even failed.
Also, MongoDB was not able to filter the documents without the secondary indices.

If you can manage with slow writes and need functionality like transactions, joins, scalability along with high read and aggregation performance you can explore MonetDB as an option.

With this I conclude this post, I hope it helps you to get started with MonetDB.

Comments

  1. This comment has been removed by the author.

    ReplyDelete
  2. Hard Rock Hotel & Casino Las Vegas - MapyRO
    Find Hard Rock Hotel & Casino Las Vegas, Nevada, 보령 출장마사지 United 전라북도 출장안마 States, 안산 출장안마 United States, ratings, 세종특별자치 출장안마 photos, prices, expert advice, traveler reviews and 경산 출장샵 tips, and more information.

    ReplyDelete

Post a Comment

Popular posts from this blog

Spidering the web with Python

Word Vectorization

Machine Learning -Solution or Problem