Once the data is in a DataFrame, regardless of where it came from (CSV, XLS, database, etc.), we can work with it as if it were a database table and select just the elements we are interested in.

import pandas as pd
df1 = pd.read_csv('example.csv')
df1.head()
df1.shape

We have loaded data from a CSV file and created a DataFrame. Now we will look at the different select operations available on this DataFrame.

Selecting a single column in a DataFrame

column1 = df1['column_name']
column1.head()

Selecting multiple columns in a DataFrame

cols = df1[['column1','column2']] # pass a list of column names (note the double brackets)
cols.head()
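
To make the difference between single and double brackets concrete, here is a small sketch. The DataFrame and its column names (name, age, city) are made up for illustration and simply stand in for example.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for example.csv
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [31, 25, 40],
    "city": ["Oslo", "Pune", "Lima"],
})

single = df["age"]            # one label -> Series
subset = df[["name", "age"]]  # list of labels -> DataFrame

print(type(single).__name__)  # Series
print(subset.shape)           # (3, 2)
```

A single label gives back a Series, while a list of labels always gives back a DataFrame, even if the list has only one name in it.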

Selecting rows using indexing [] in a DataFrame

rows = df1[10:20]
print(rows)

rows = df1[10:20][['column1','column2']] # slice the rows first, then pick columns with a list
print(rows)

Selecting rows by label (.loc[]) in a DataFrame

df2 = df1.loc[10:20]
df3 = df1.loc[10:20,['column1','column2']]
print(df2)
print(df3)

Selecting rows by position (.iloc[]) in a DataFrame

dfpos = df1.iloc[10:20,[3,4]] 
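
One detail that is easy to miss: a .loc[] slice includes both end labels, while .iloc[] (and plain [] row slicing) stops before the end position. A minimal sketch, using a made-up single-column DataFrame with the default integer index:

```python
import pandas as pd

# Hypothetical data: 30 rows with the default integer index 0..29
df = pd.DataFrame({"value": range(100, 130)})

by_label = df.loc[10:20]      # label slice: includes both 10 and 20
by_position = df.iloc[10:20]  # position slice: excludes position 20
by_brackets = df[10:20]       # plain slicing behaves like .iloc here

print(len(by_label))     # 11
print(len(by_position))  # 10
print(len(by_brackets))  # 10
```

So with a default index, df1.loc[10:20] returns eleven rows and df1.iloc[10:20] returns ten.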

Now that we have seen how to fetch the rows and columns we need, we move on to manipulating a DataFrame.

How to manipulate a DataFrame

1. Transpose

import pandas as pd
df1 = pd.read_csv('example.csv')
df2 = df1[10:20][['cols1','cols2']] # a list of column names is required here
print("transpose : {}".format(df2.T))

2. sort_values

df2 = df1.sort_values(by='column_name') # sort by a single column
df3 = df1.sort_values(by=['col1','col2']) # sort by multiple columns
print(df2)
print(df3)

3. sort_index

df2 = df1.sort_index()
print(df2)
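
Both sort methods return a new DataFrame and sort in ascending order by default. The sketch below, on made-up data (team, score), shows the ascending parameter for a single column and for several columns at once.

```python
import pandas as pd

# Hypothetical data with a deliberately shuffled index
df = pd.DataFrame(
    {"team": ["b", "a", "b", "a"], "score": [3, 9, 7, 1]},
    index=[3, 0, 2, 1],
)

by_score_desc = df.sort_values(by="score", ascending=False)
# One flag per column: team ascending, score descending within each team
by_team_then_score = df.sort_values(by=["team", "score"], ascending=[True, False])
by_index = df.sort_index()

print(by_score_desc["score"].tolist())       # [9, 7, 3, 1]
print(by_team_then_score["score"].tolist())  # [9, 1, 7, 3]
print(by_index.index.tolist())               # [0, 1, 2, 3]
```

The original df keeps its order unless you assign the result back to it.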

4. Re-indexing

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3),index=['a','b','c'])
print(df1)
# labels 1, 2, 3 do not exist in the original index, so every row of df2 is NaN
df2 = df1.reindex([1,2,3])
print(df2)
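
Reindexing is usually more useful when some of the new labels already exist. Here is a small sketch with made-up data: existing labels keep their rows (in the new order) and unknown labels get NaN, or whatever you pass as fill_value.

```python
import pandas as pd

# Hypothetical data with string labels
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

reordered = df.reindex(["c", "a", "d"])             # "d" is new -> NaN
filled = df.reindex(["c", "a", "d"], fill_value=0)  # "d" is new -> 0

print(reordered["x"].tolist())  # [3.0, 1.0, nan]
print(filled["x"].tolist())     # [3.0, 1.0, 0.0]
```

Note that label "b" is dropped because it is not in the new index.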

5. Adding a new column

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(10))
print(df1)
df1['col_new'] = 'a'
print(df1)

6. Remove existing column

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(10))
print(df1)
df1['col_new'] = 'a'
print(df1)

del df1['col_new']  # removes the column in place; del df1[0] would remove the first column, whose label is 0
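
Besides del, pandas also has drop() and pop() for removing columns. A quick sketch on a made-up three-column DataFrame, showing how they differ:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

trimmed = df.drop(columns=["b"])  # returns a new DataFrame, df is unchanged
popped = df.pop("c")              # removes "c" from df in place and returns it

print(list(trimmed.columns))  # ['a', 'c']
print(list(df.columns))       # ['a', 'b']
print(popped.tolist())        # [5, 6]
```

drop() is handy when you want to keep the original DataFrame untouched, while pop() is useful when you still need the removed column's values.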

7. Data at particular location by label

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,5),index=[1,2,3],columns=['a','b','c','d','e'])
val1 = df1.at[1,'a'] # .at uses square brackets: row label, then column label
print(val1)
df1.at[1,'a'] = 0 # assign a value at a particular location

8. Data at particular location by position

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,5),index=[1,2,3],columns=['a','b','c','d','e'])
val1 = df1.iat[1,1]
print(val1)
df1.iat[1,1] = 0 # assign a value at a particular position (second row, second column)

df1[df1>0] = 2 # assign a value at every location that meets a condition
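
Because the examples above use random numbers, the result is different on every run. Here is the same idea as a sketch on small, fixed, made-up data, so the effect of each assignment can be followed by hand.

```python
import pandas as pd

# Hypothetical fixed data instead of random numbers
df = pd.DataFrame({"a": [-1, 2, -3], "b": [4, -5, 6]})

df.at[0, "a"] = 10  # by label: row label 0, column "a"
df.iat[2, 1] = -6   # by position: third row, second column
df[df > 0] = 2      # every positive value becomes 2

print(df.values.tolist())  # [[2, 2], [2, -5], [-3, -6]]
```

The boolean expression df > 0 produces a True/False DataFrame of the same shape, and only the True positions are overwritten.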

9. Applying a method or function in DataFrame

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,5),index=[1,2,3],columns=['a','b','c','d','e'])

def add_five(number):
    return number+5

df2 = df1.apply(add_five,axis=1) # axis=0 passes each column, axis=1 passes each row
print(df2)
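
The axis argument decides what the function receives. A minimal sketch with made-up data: with axis=0 (the default) the function gets one column at a time, and with axis=1 it gets one row at a time.

```python
import pandas as pd

# Hypothetical fixed data
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def add_five(values):
    # values is a whole column (a Series) when axis=0
    return values + 5

shifted = df.apply(add_five)                               # column by column
col_ranges = df.apply(lambda col: col.max() - col.min())   # one value per column
row_totals = df.apply(lambda row: row["a"] + row["b"], axis=1)  # one value per row

print(shifted.values.tolist())  # [[6, 15], [7, 25], [8, 35]]
print(col_ranges.tolist())      # [2, 20]
print(row_totals.tolist())      # [11, 22, 33]
```

For a simple operation like adding five, df + 5 gives the same result and is usually faster; apply() is more useful when the function needs a whole row or column.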

There are a few more methods, such as dropna() and fillna(), that are used when manipulating data. A DataFrame also has many summary and statistical methods, such as info(), describe(), value_counts(), mean() and std().
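
As a quick taste of these methods, here is a small sketch on made-up data with a couple of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

dropped = df.dropna()  # keep only rows with no missing values
filled = df.fillna(0)  # replace missing values with 0
summary = df.describe()  # count, mean, std, min, quartiles, max per column

print(len(dropped))           # 1
print(filled["a"].tolist())   # [1.0, 0.0, 3.0]
print(df["a"].mean())         # 2.0 (missing values are skipped)
```

Both dropna() and fillna() return a new DataFrame and leave the original as it is.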

We will cover filtering and iterating over a DataFrame in an upcoming post.

Keep learning!

Hope it helps 🙂
