Once the data is in a DataFrame, regardless of where it came from (CSV, XLS, a database, etc.), we can work with it like a table in a database and select the elements we are interested in.
import pandas as pd
df1 = pd.read_csv('example.csv')
df1.head()
df1.shape
We have loaded data from a CSV file and created a DataFrame. Now let's look at the different select operations on this DataFrame.
Selecting a single column in DataFrame
column1 = df1['column_name']
column1.head()
Selecting multiple columns in DataFrame
cols = df1[['column1','column2']] # pass a list of column names (note the double brackets)
cols.head()
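To make the difference between the two selections concrete, here is a minimal sketch. The small DataFrame and its column names are made up for illustration and stand in for example.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for example.csv
df = pd.DataFrame({"column1": [1, 2, 3], "column2": [4, 5, 6], "column3": [7, 8, 9]})

single = df["column1"]                  # one label -> Series
multiple = df[["column1", "column2"]]   # list of labels -> DataFrame

print(type(single).__name__)    # Series
print(type(multiple).__name__)  # DataFrame
```

A single label gives back a Series, while a list of labels always gives back a DataFrame, even if the list has only one name in it.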
Selecting rows using indexing [] in DataFrame
rows = df1[10:20]
print(rows)
rows = df1[10:20][['column1','column2']] # slice the rows first, then pick columns with a list
print(rows)
Selecting rows by label (.loc[]) in DataFrame
df2 = df1.loc[10:20]
df3 = df1.loc[10:20,['column1','column2']]
print(df2)
print(df3)
Selecting rows by position (.iloc[]) in DataFrame
dfpos = df1.iloc[10:20,[3,4]]
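One detail that is easy to miss: a .loc[] slice includes both end labels, while an .iloc[] slice (like plain []) excludes the end position. A small sketch, assuming a made-up DataFrame with the default 0, 1, 2, ... index:

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0..29
df = pd.DataFrame({"column1": range(30), "column2": range(100, 130)})

by_label = df.loc[10:20]      # label slice: labels 10 and 20 are both included
by_position = df.iloc[10:20]  # position slice: position 20 is excluded

print(len(by_label))     # 11
print(len(by_position))  # 10
```

With the default index the labels and positions happen to be the same numbers, so this off-by-one difference is the only thing that tells the two apart.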
Now that we have seen how to fetch the rows and columns we need, we move on to manipulating a DataFrame.
How to manipulate a DataFrame
1. Transpose
import pandas as pd
df1 = pd.read_csv('example.csv')
df2 = df1[10:20][['cols1','cols2']]
print("transpose : {}".format(df2.T))
2. sort_values
df2 = df1.sort_values(by='column_name') # sorting by single column
df3 = df1.sort_values(by=['col1','col2']) # sorting by multiple column
print(df2)
print(df3)
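sort_values() sorts in ascending order by default and returns a new DataFrame. As a sketch with made-up data, the ascending parameter can also take one flag per column:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": [10, 40, 30, 20]})

# col1 ascending, then col2 descending within equal col1 values
result = df.sort_values(by=["col1", "col2"], ascending=[True, False])

print(result.index.tolist())  # [1, 3, 2, 0]
print(df.index.tolist())      # [0, 1, 2, 3], the original is left untouched
```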
3. sort_index
df2 = df1.sort_index()
print(df2)
4. Re-indexing
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3),index=['a','b','c'])
print(df1)
df2 = df1.reindex(['c','a','b']) # reorder rows by label; a label missing from the original index gives a NaN row
print(df2)
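What reindex() does depends on whether the new labels already exist in the index. A minimal sketch with made-up values (the column name "value" is just for illustration):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

reordered = df.reindex(["c", "a", "b"])              # existing labels: rows are reordered
extended = df.reindex(["a", "b", "d"])               # "d" is new: its row is filled with NaN
filled = df.reindex(["a", "b", "d"], fill_value=0.0)  # same, but with 0.0 instead of NaN

print(reordered)
print(extended)
print(filled)
```

So reindexing with labels that are not in the original index at all gives back a DataFrame full of NaN, which is usually not what we want.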
5. Adding a new column
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(10))
print(df1)
df1['col_new'] = 'a'
print(df1)
6. Remove existing column
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(10))
print(df1)
df1['col_new'] = 'a'
print(df1)
del df1['col_new'] # removes the column in place; df1.drop(columns='col_new') returns a new DataFrame instead
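Besides del, pandas offers drop() and pop() for removing columns. A short sketch with made-up data showing how they differ:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "col_new": ["x", "y"]})

trimmed = df.drop(columns=["col_new"])  # returns a new DataFrame, df is unchanged
popped = df.pop("b")                    # removes "b" from df in place and returns it as a Series

print(list(trimmed.columns))  # ['a', 'b']
print(list(df.columns))       # ['a', 'col_new']
```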
7. Data at particular location by label
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,5),index=[1,2,3],columns=['a','b','c','d','e'])
val1 = df1.at[1,'a'] # .at uses square brackets: [row label, column label]
print(val1)
df1.at[1,'a'] = 0 # assign value at particular location
8. Data at particular location by position
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,5),index=[1,2,3],columns=['a','b','c','d','e'])
val1 = df1.iat[1,1]
print(val1)
df1.iat[1,1] = 0 # assign value at particular location (row position 1, column position 1)
df1[df1>0] = 2 # assign data at all location based on a condition
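The last line above works because df1>0 produces a DataFrame of True/False values with the same shape. As a sketch with made-up numbers, the same idea can be used without changing the original, or limited to one column:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"a": [-1.0, 0.5, 2.0], "b": [3.0, -4.0, 0.0]})

positive = df > 0              # boolean DataFrame, same shape as df
capped = df.mask(positive, 2)  # new DataFrame with 2 wherever the condition is True

df.loc[df["a"] < 0, "b"] = 0   # in place: set "b" to 0 only in rows where "a" is negative

print(capped)
print(df)
```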
9. Applying a method or function in DataFrame
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(3,5),index=[1,2,3],columns=['a','b','c','d','e'])
def add_five(number):
    return number + 5
df2 = df1.apply(add_five) # called once per column (axis=0, the default); use axis=1 to go row by row
print(df2)
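A DataFrame has only two axes, so apply() accepts axis=0 (one call per column) or axis=1 (one call per row). A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def add_five(values):
    # Receives a whole column (axis=0) or a whole row (axis=1) as a Series
    return values + 5

shifted = df.apply(add_five)                          # column by column
col_totals = df.apply(lambda col: col.sum(), axis=0)  # one result per column
row_totals = df.apply(lambda row: row.sum(), axis=1)  # one result per row

print(shifted)
print(col_totals.tolist())  # [6, 60]
print(row_totals.tolist())  # [11, 22, 33]
```

For something as simple as adding a constant, df + 5 gives the same result and is usually faster; apply() is more useful when the function needs the whole row or column.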
There are a few more methods, such as dropna() and fillna(), that are used to manipulate data. A DataFrame also has many summary and statistical methods, such as info(), describe(), value_counts(), mean() and std().
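As a quick taste of these methods, here is a small sketch on made-up data with one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in column "a"
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "x"]})

dropped = df.dropna()            # drops every row that contains a NaN
filled = df.fillna({"a": 0.0})   # replaces NaN in column "a" with 0.0
counts = df["b"].value_counts()  # how often each value occurs in "b"
mean_a = df["a"].mean()          # NaN values are skipped by default

print(len(dropped))  # 2
print(mean_a)        # 2.0
print(df.describe())
```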
We will learn filtering and iterating over a DataFrame in an upcoming post.
Keep learning!
Hope it helps!