Note on Pandas
Dec 10, 2019
Motivation
pandas is a great tool for representing structured data in Python, similar to table in MATLAB and data.frame in R.
It is also great for reading table data (like experimental records) from the hard drive and processing it in Python!
Create DataFrame
Source:
- List of tuples
- np.ndarray
- dictionary
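A minimal sketch of the three sources listed above. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list of tuples: each tuple is one row.
rows = [(1, "Alfa"), (2, "Beto")]
df_tuples = pd.DataFrame(rows, columns=["Expi", "animal"])

# From an np.ndarray: each array row becomes one table row.
arr = np.arange(6).reshape(3, 2)
df_array = pd.DataFrame(arr, columns=["x", "y"])

# From a dictionary: each key is a column name, each value is a column.
df_dict = pd.DataFrame({"Expi": [1, 2], "animal": ["Alfa", "Beto"]})
```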
Read / Write
Data Structure
pandas array
- Can be transformed into a numpy array by .to_numpy(), or into a normal list by .to_list()
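As a quick sketch of these conversions (the table here is a made-up example): a single column supports both .to_numpy() and .to_list(), while the whole table supports .to_numpy().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Expi": [1, 2, 3], "score": [0.5, 1.5, 2.5]})

col_array = df["score"].to_numpy()  # one column -> 1-D numpy array
col_list = df["Expi"].to_list()     # one column -> plain Python list
table_array = df.to_numpy()         # whole table -> 2-D numpy array
```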
Interaction
Slicing using a boolean array
- df.index is by default the normal row number starting from 0.
  ExpTable[ExpTable.index == 100]
- Selecting rows using a condition
  color_and_shape = df.loc[(df.Color == 'Green') & (df.Shape == 'Rectangle')]
  ExpTable.index[ExpTable.comments.str.contains("Evolution")]
- Caveat: you cannot use an np integer array to directly slice a DataFrame with df[...], because it is read as column labels (df.iloc[...] takes row positions instead). But you can use it to slice the individual columns.
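The selection patterns above can be sketched on a toy table. The Color and Shape columns follow the example in the text; the area column and all values are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Color": ["Green", "Red", "Green"],
    "Shape": ["Rectangle", "Rectangle", "Circle"],
    "area": [10, 20, 30],
})

# Combine conditions with & or |, wrapping each one in parentheses.
mask = (df.Color == "Green") & (df.Shape == "Rectangle")
color_and_shape = df.loc[mask]

# Row labels where a condition holds.
green_idx = df.index[df.Color == "Green"]

# Integer row positions held in a numpy array: go through .iloc.
positions = np.array([0, 2])
subset = df.iloc[positions]
```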
Filtering using sub-string pattern
- ExpTable.Expi[ExpTable.comments.str.contains("Evolution")]
- Or: df.ephysFN[df.ephysFN.str.contains("Alfa")==True]
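A small sketch of sub-string filtering when some entries are missing; the table contents are made up. Passing na=False to str.contains treats missing values as "no match", which seems to play a similar role to the ==True comparison above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Expi": [1, 2, 3],
    "comments": ["Evolution run", np.nan, "Manifold run"],
})

# na=False makes rows with a missing comment count as non-matching.
mask = df.comments.str.contains("Evolution", na=False)
evol_expi = df.Expi[mask]
```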
Remove lines containing any nan or all nan
- Rows where every value is nan: df = df.dropna(axis=0, how='all')
- Rows where any value is nan: df = df.dropna(axis=0, how='any')
- Note: dropna will not renumber the index! The remaining rows keep their original index labels, so there will be gaps in the index space. reset_index will close the gaps! (reindex will not: it looks rows up by their old labels and fills the missing ones with nan.)
  df = df.reset_index(drop=True)
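A small sketch of the two dropna modes and of renumbering the rows afterwards, on a made-up table. It uses reset_index(drop=True) for the renumbering, since that discards the old labels instead of looking rows up by them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Expi": [1, np.nan, 3, np.nan],
    "comments": ["a", np.nan, np.nan, "d"],
})

no_empty_rows = df.dropna(axis=0, how="all")  # drops only the all-nan row 1
complete_rows = df.dropna(axis=0, how="any")  # keeps only the complete row 0

# The surviving rows keep their old labels, which leaves a gap.
gapped_labels = list(no_empty_rows.index)

# reset_index(drop=True) renumbers the rows 0..n-1 without adding a column.
renumbered = no_empty_rows.reset_index(drop=True)
```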
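Fill nan with a value
df["comments"] = df["comments"].fillna(value="")
- Note: just like numpy, a df function usually returns a new object instead of writing in place, unless you specify that (e.g. with inplace=True)! Calling an inplace method on a selected column, as in df.comments.fillna(value="", inplace=True), may not update the table in recent pandas versions, so assigning the result back is safer.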
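A minimal sketch of the point in the note, using a made-up one-column table: fillna returns a new object, and the table only changes once the result is assigned back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"comments": ["Evolution", np.nan]})

# fillna returns a new Series; df itself is untouched at this point.
filled = df["comments"].fillna("")
still_missing = int(df["comments"].isna().sum())

# Assigning the result back is what updates the table.
df["comments"] = filled
```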
Concatenate 2 tables together
pd.concat([df_old,df_new], axis=0, ignore_index=True)
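As a quick sketch with two made-up tables: axis=0 stacks the rows, and ignore_index=True renumbers them so the labels of the second table do not repeat those of the first.

```python
import pandas as pd

df_old = pd.DataFrame({"Expi": [1, 2], "comments": ["a", "b"]})
df_new = pd.DataFrame({"Expi": [3], "comments": ["c"]})

# Stack df_new below df_old and renumber the rows 0..n-1.
df_all = pd.concat([df_old, df_new], axis=0, ignore_index=True)
```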