Pandas - Series

Introduction

Python is a powerful general-purpose language: web applications, desktop applications,
machine learning models and more can all be built with it.

Before pandas, Python was used mainly for data munging and preparation and contributed
little to data analysis itself. Pandas filled that gap. With pandas, we can carry out
the five typical steps of data processing and analysis, regardless of where the data
comes from: load, prepare, manipulate, model, and analyze.
Python with pandas is used in a wide range of academic and commercial fields,
including finance, economics, statistics, and analytics.
This post provides code for basic usage of the Series data type in pandas.


Data types in Pandas

  • Series: 1D, labeled, homogeneously typed array; size-immutable.
  • DataFrame: general 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
  • Panel: general 3D labeled, size-mutable array. Panel was deprecated in pandas 0.20 and removed in 0.25, so it exists only in older versions.
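To make the difference between the first two structures concrete, here is a minimal sketch. The names prices and table and the sample values are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# A Series is one-dimensional: one labeled column of values with a single dtype.
prices = pd.Series([10.5, 20.0, 30.25], index=["a", "b", "c"])

# A DataFrame is two-dimensional: several labeled columns, each with its own dtype.
table = pd.DataFrame({"name": ["x", "y", "z"], "price": [10.5, 20.0, 30.25]})

print(prices.ndim, prices.dtype)
print(table.ndim)
print(table.dtypes)
```

Selecting a single column of the DataFrame, for example table["price"], gives back a Series.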

Series

In [20]:
#importing pandas and Series
import pandas as pd
from pandas import DataFrame, Series
The pandas constructor for a Series takes the following arguments:
s=Series( data, index, dtype, copy)
A Series can be created from a list, a NumPy array, a dict, or a scalar value.
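As a small sketch of the constructor arguments, the following builds a Series from a NumPy array with an explicit index and dtype. The variable names values and s_arr and the labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3])

# data: the values, index: one label per value, dtype: the element type to store
s_arr = pd.Series(values, index=["x", "y", "z"], dtype="float64")

print(s_arr)
print(s_arr["y"])   # look up a value by its label
```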
In [21]:
#series from a list
s=Series([10,20,30,40,50])
s1=Series([100,200,300,400,500])
print(s)

0    10
1    20
2    30
3    40
4    50
dtype: int64
In [4]:
#series from dict
d={'a':10,'b':20,'c':30}
s=Series(d)
#dict keys are used as the index
print(s)

#creating a series from a dict with a custom index
#labels missing from the dict (here 'd') get NaN, and the dtype becomes float64
s=Series(d,index=['a','c','d'])
print(s)
a    10
b    20
c    30
dtype: int64
a    10.0
c    30.0
d     NaN
dtype: float64
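The NaN for the label 'd' can be detected and replaced afterwards. This is a minimal sketch; the names s_custom, missing and filled are illustrative, and filling with 0 is only one possible choice.

```python
import pandas as pd

d = {"a": 10, "b": 20, "c": 30}
s_custom = pd.Series(d, index=["a", "c", "d"])

# isna() marks the labels that had no matching key in the dict
missing = s_custom.isna()

# fillna() replaces the NaN; casting back to int64 is possible once no NaN remains
filled = s_custom.fillna(0).astype("int64")

print(missing)
print(filled)
```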
In [5]:
#series from a scalar value (the index sets how many times the value is repeated)
s=Series(4,index=[1,2,3,4],dtype='float64')
s
Out[5]:
1    4.0
2    4.0
3    4.0
4    4.0
dtype: float64
In [6]:
#accessing elements from a series 
s=Series([1,2,3,4,5,6,7,8,9,10])

#all elements after the first (position 1 onward)
print(s[1:])

#elements at positions 1 to 6 (the end position 7 is excluded)
print(s[1:7])

#last 3 elements
print(s[-3:])

#third element from the end
s.iloc[-3]
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64
1    2
2    3
3    4
4    5
5    6
6    7
dtype: int64
7     8
8     9
9    10
dtype: int64
Out[6]:
8
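With the default integer index, positions and labels coincide, which can hide the difference between the two. The sketch below uses a string index (an assumption made for illustration, as is the name s_lab) to separate position-based access with iloc from label-based access with loc.

```python
import pandas as pd

s_lab = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

first = s_lab.iloc[0]              # first element, by position
last = s_lab.iloc[-1]              # last element, by position
by_label = s_lab.loc["c"]          # element with label 'c'

pos_slice = s_lab.iloc[1:3]        # positions 1 and 2; the end position is excluded
label_slice = s_lab.loc["b":"d"]   # labels 'b' to 'd'; the end label is included

print(first, last, by_label)
print(pos_slice)
print(label_slice)
```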

Functions in Series

In [11]:
#functions on series
s=Series([10,20,30,40,50])

#adding, subtracting, multiplying and dividing a series by a scalar

print("Addition ",s.add(1))
print("Subtraction ",s.subtract(2))
print("Multiplication ",s.multiply(2))
print("Divide ",s.divide(2))
Addition  0    11
1    21
2    31
3    41
4    51
dtype: int64
Subtraction  0     8
1    18
2    28
3    38
4    48
dtype: int64
Multiplication  0     20
1     40
2     60
3     80
4    100
dtype: int64
Divide  0     5.0
1    10.0
2    15.0
3    20.0
4    25.0
dtype: float64
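The same methods also accept another Series, in which case values are matched by index label rather than by position. The sketch below uses two made-up Series, left and right, to show this, along with the fill_value argument of add.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# The arithmetic operators are shorthand for the methods used above
same = (s + 1).equals(s.add(1))

left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Labels present in only one Series ('a' and 'd') give NaN
total = left + right

# fill_value substitutes 0 for the missing side before adding
total_filled = left.add(right, fill_value=0)

print(same)
print(total)
print(total_filled)
```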

Cumulative and Difference Functions in Series

In [13]:
#cumulative and difference functions on Series

#difference between each element and the one 2 positions before it
print("Difference with lag 2",s.diff(2),sep="\n")

#to calculate cumulative product
print("Cumulative Product",s.cumprod(),sep="\n")

#to calculate cumulative sum
print("Cumulative Sum ",s.cumsum(),sep="\n")

#gives the summary statistics of the series
print("Basic info about the series ",s.describe(),sep="\n")
Difference with lag 2
0     NaN
1     NaN
2    20.0
3    20.0
4    20.0
dtype: float64
Cumulative Product
0          10
1         200
2        6000
3      240000
4    12000000
dtype: int64
Cumulative Sum 
0     10
1     30
2     60
3    100
4    150
dtype: int64
Basic info about the series 
count     5.000000
mean     30.000000
std      15.811388
min      10.000000
25%      20.000000
50%      30.000000
75%      40.000000
max      50.000000
dtype: float64
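The functions above look back either a fixed number of positions (diff) or over everything seen so far (cumsum, cumprod). A windowed calculation over the last n values is available through rolling(). The sketch below is illustrative only; the window size of 2 and the names lagged and rolling_sum are assumptions.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# diff(2) is the same as subtracting the series shifted down by 2 positions
lagged = s - s.shift(2)
matches_diff = lagged.equals(s.diff(2))

# sum over a sliding window of the current and the previous element;
# the first position has no complete window, so it is NaN
rolling_sum = s.rolling(window=2).sum()

print(matches_diff)
print(rolling_sum)
```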

Custom functions on Series

In [15]:
#applying a custom function 
def custom_func(x):
    return ((x*x)-1)

print("Custom Function ",s.apply(custom_func))
Custom Function  0      99
1     399
2     899
3    1599
4    2499
dtype: int64
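For simple arithmetic like this, the same result can be obtained by operating on the whole Series at once instead of calling the function per element. A minimal sketch, with applied and vectorized as illustrative names:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

def custom_func(x):
    return (x * x) - 1

# element by element, as in the cell above
applied = s.apply(custom_func)

# the same arithmetic expressed on the whole Series
vectorized = s * s - 1

print(applied.tolist())
print(vectorized.tolist())
```

The whole-Series form is usually the faster of the two on large data, while apply remains useful when the function cannot be written as Series arithmetic.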

Statistical Functions on Series

In [24]:
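#to get the values as a 1D NumPy array (as_matrix() was removed in pandas 1.0)
print("Array from series",s.to_numpy(),sep="\n")

#autocorrelation at lag 4; only one pair of values overlaps here, so the result is NaN
print("Autocorrelation is :",s.autocorr(4))

#to calculate the correlation between 2 series
print("Correlation is: ",s.corr(s1,method='pearson'))

#to find the covariance between two series (at least 2 valid pairs required)
print("Covariance is : ",s.cov(s1,min_periods=2))
Array from series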
[10 20 30 40 50]
Autocorrelation is : nan
Correlation is:  1.0
Covariance is :  2500.0
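To see where the covariance of 2500.0 and the correlation of 1.0 come from, the sketch below recomputes both by hand from the sample formulas (dividing by n - 1). The names manual_cov and manual_corr are illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])
s1 = pd.Series([100, 200, 300, 400, 500])
n = len(s)

# sample covariance: sum of products of deviations from the means, divided by n - 1
manual_cov = ((s - s.mean()) * (s1 - s1.mean())).sum() / (n - 1)

# Pearson correlation: covariance divided by the product of the standard deviations
manual_corr = manual_cov / (s.std() * s1.std())

print(manual_cov, s.cov(s1))
print(manual_corr, s.corr(s1))
```

Because s1 is exactly 10 times s, the two series are perfectly linearly related, which is why the correlation comes out as 1.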

Some other useful functions in Series

In [28]:
#largest element
print("Largest element ",s.max())

#smallest element
print("Smallest element ",s.min())

#position of the largest element (idxmax() returns its index label)
print("Position of largest element ",s.argmax())

#position of the smallest element (idxmin() returns its index label)
print("Position of smallest element ",s.argmin())

#returns the positions that would sort the series
print("Positions of elements of the Series when sorted ",s.argsort())
Largest element  50
Smallest element  10
Position of largest element  4
Position of smallest element  0
Positions of elements of the Series when sorted  0    0
1    1
2    2
3    3
4    4
dtype: int64
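With the default index, positions and labels are the same numbers, so the output above does not show which one is returned. The sketch below uses a made-up Series, scores, with string labels to look up the labels of the extremes with idxmax and idxmin and to sort by value.

```python
import pandas as pd

scores = pd.Series([30, 50, 10], index=["x", "y", "z"])

top_label = scores.idxmax()      # label of the largest value
low_label = scores.idxmin()      # label of the smallest value

# sort_values() returns the values in ascending order with their labels
ordered = scores.sort_values()

# argsort() returns the positions that would sort the values
order = scores.argsort()

print(top_label, low_label)
print(ordered)
print(order.tolist())
```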
