Chandan Rajpurohit

An Artist With Technical Skills

Once the data is in a DataFrame, regardless of its source (CSV, XLS, database, etc.), we can work with it like a table in a database and select the elements we are interested in.

import pandas as pd
df1 = pd.read_csv('example.csv')
df1.head()
df1.shape

We have loaded data from a CSV file and created a DataFrame. Now we will look at the different select operations on this DataFrame.

Selecting a single column in DataFrame

column1 = df1['column_name']
column1.head()

Selecting multiple columns in DataFrame

cols = df1[['column1','column2']] # pass a list of column names
cols.head()
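
The two selections above return different types, which is easy to miss. Here is a small sketch of that, assuming a made-up three-column DataFrame in place of example.csv (the data and the names `single` and `multiple` are only for illustration):

```python
import pandas as pd

# Hypothetical data standing in for example.csv
df = pd.DataFrame({"column1": [1, 2, 3], "column2": [4, 5, 6], "column3": [7, 8, 9]})

single = df["column1"]                 # one column name -> Series
multiple = df[["column1", "column2"]]  # list of column names -> DataFrame

print(type(single).__name__)    # Series
print(type(multiple).__name__)  # DataFrame
print(multiple.shape)           # (3, 2)
```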

Selecting rows using indexing [] in DataFrame

rows = df1[10:20]
print (rows)

rows = df1[10:20][['column1','column2']] # slice the rows, then pick a list of columns
print (rows)

Selecting rows by label (.loc[]) in DataFrame

df2 = df1.loc[10:20]
df3 = df1.loc[10:20,['column1','column2']]
print(df2)
print(df3)

Selecting rows by position (.iloc[]) in DataFrame

dfpos = df1.iloc[10:20,[3,4]] 
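
One difference between these three ways of selecting rows is how the end of the slice is treated. A minimal sketch, assuming a made-up DataFrame with the default integer index (the data is hypothetical, not from example.csv):

```python
import pandas as pd

# Hypothetical data with the default index 0..29
df = pd.DataFrame({"column1": range(100, 130), "column2": range(200, 230)})

by_slice = df[10:20]          # positional slice, end excluded
by_label = df.loc[10:20]      # label slice, end included
by_position = df.iloc[10:20]  # positional slice, end excluded

print(len(by_slice), len(by_label), len(by_position))  # 10 11 10
```

So with .loc[] the row labelled 20 is part of the result, while [] and .iloc[] stop at position 19.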

After looking at how to fetch the required rows and columns, we proceed to the manipulation of a DataFrame.

How to manipulate a DataFrame

1. Transpose

import pandas as pd
df1 = pd.read_csv('example.csv')
df2 = df1[10:20][['cols1','cols2']] # a list of column names is required here
print("transpose : {}".format(df2.T))

2. sort_values

df2 = df1.sort_values(by='column_name') # sorting by single column
df3 = df1.sort_values(by=['col1','col2']) # sorting by multiple columns
print(df2)
print(df3)

3. sort_index

df2 = df1.sort_index()
print(df2)

4. Re-indexing

import pandas as pd
import numpy as np
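df1 = pd.DataFrame(np.random.randn(3),index=['a','b','c'])
print(df1)
df2 = df1.reindex(['c','b','a']) # rows are reordered to match the new labels
print(df2) # labels that are not in the original index (e.g. 1,2,3) would give NaN rows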
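
Re-indexing also handles labels that are not in the original index. A short sketch of that case, assuming a small made-up DataFrame (the column name `value` and the label `'d'` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

# Existing labels keep their data; the unknown label "d" gets NaN
reordered = df.reindex(["c", "a", "d"])
print(reordered)

# fill_value is used instead of NaN for labels missing from the original index
filled = df.reindex(["c", "a", "d"], fill_value=0.0)
print(filled)
```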

5. Adding a new column

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(10))
print(df1)
df1['col_new'] = 'a'
print(df1)

6. Remove existing column

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(10))
print(df1)
df1['col_new'] = 'a'
print(df1)

del df1['col_new']  # same result as df1 = df1.drop(columns='col_new')

7. Data at particular location by label

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,5),index=[1,2,3],columns=['a','b','c','d','e'])
val1 = df1.at[1,'a'] # .at uses square brackets: [row label, column label]
print(val1)
df1.at[1,'a'] = 0 # assign value at particular location

8. Data at particular location by position

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,5),index=[1,2,3],columns=['a','b','c','d','e'])
val1 = df1.iat[1,1]
print(val1)
df1.iat[1,1] = 0 # assign value at particular position (use .iat, not .at, for positions)

df1[df1>0] = 2 # assign a value at every location that satisfies a condition

9. Applying a method or function in DataFrame

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,5),index=[1,2,3],columns=['a','b','c','d','e'])

def add_five(number):
    return number+5

df2 = df1.apply(add_five,axis=1) # axis is 0 (per column) or 1 (per row); apply returns a new DataFrame
print(df2)
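
The axis argument matters when the function works on a whole row or column rather than on each number. A minimal sketch, assuming a small made-up DataFrame and a hypothetical helper `value_range`:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def value_range(series):
    # Difference between the largest and smallest value in the Series
    return series.max() - series.min()

col_ranges = df.apply(value_range, axis=0)  # function receives each column
row_ranges = df.apply(value_range, axis=1)  # function receives each row

print(col_ranges)
print(row_ranges)
```

With axis=0 the result has one value per column, and with axis=1 one value per row.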

There are a few more functions, such as dropna() and fillna(), which are used in the manipulation of data. DataFrame also has many summary and statistical methods like info(), describe(), value_counts(), mean(), std(), etc.
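
As a quick taste of these, here is a small sketch of dropna(), fillna() and a couple of the statistical methods, assuming a made-up DataFrame with some missing values (the data is hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

dropped = df.dropna()  # keeps only the rows that have no missing values
filled = df.fillna(0)  # replaces every NaN with 0

print(dropped)
print(filled)
print(df["a"].mean())  # mean() skips NaN by default
print(df.describe())   # count, mean, std, min, quartiles and max per column
```

Both dropna() and fillna() return a new DataFrame here and leave df unchanged.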

We will learn filtering and iterating over a DataFrame in an upcoming post.

Keep learning!

Hope it helps 🙂

