Pandas notes
From Simson Garfinkel
Memory Ideas
Print the data type (dtype) of each data frame column:
df.dtypes
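Print whether the data frame columns are dense or sparse (df.ftypes was deprecated in pandas 0.25 and removed in 1.0, so this only works on older versions):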
df.ftypes
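A minimal sketch of these type checks on a recent pandas. The frame and its column names are made up for illustration, and the per-column `pd.SparseDtype` test is one assumed way to answer the dense-or-sparse question, not something from the original notes.

```python
import pandas as pd

# Hypothetical frame; the column names are made up for illustration.
df = pd.DataFrame({
    "record_id": [1, 2, 3],
    "weight": [40.0, 48.5, 29.0],
    "species": ["DM", "PF", "NL"],
})

# One dtype per column.
print(df.dtypes)

# Names of the numeric columns only.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)

# Add a sparse column, then report which columns are sparse.
df["flag"] = pd.arrays.SparseArray([0, 0, 1])
is_sparse = {col: isinstance(t, pd.SparseDtype) for col, t in df.dtypes.items()}
print(is_sparse)
```

`select_dtypes` also accepts `exclude=`, which helps when picking out the non-numeric columns that usually dominate memory use.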
Other ideas:
df.info()
df.info(memory_usage='deep')
df.memory_usage(deep=True)
sys.getsizeof(df)
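The calls above can be compared side by side. This is only a sketch on a made-up frame: the column names and sizes are assumptions, and the string column is forced to `object` dtype so the shallow/deep difference is visible.

```python
import sys

import pandas as pd

# Hypothetical frame; names and sizes are made up for illustration.
df = pd.DataFrame({
    "record_id": pd.Series(range(1000), dtype="int64"),
    "species": pd.Series(["DM", "PF", "NL", "DS"] * 250, dtype="object"),
})

# Shallow: for object columns this counts only the array of pointers.
shallow = df.memory_usage(deep=False)
# Deep: also counts the Python objects those pointers refer to.
deep = df.memory_usage(deep=True)

print(shallow)
print(deep)
print("total bytes (deep):", deep.sum())
print("sys.getsizeof(df):", sys.getsizeof(df))

# Summary report including the deep memory total.
df.info(memory_usage="deep")
```

For the integer column the shallow and deep figures match. For the object column the deep figure is larger, which is why the plain `df.info()` total can understate real usage.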
Convert the record_id field from an integer to a float:
surveys_df['record_id'] = surveys_df['record_id'].astype('float64')
surveys_df['record_id'].dtype
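A small runnable version of the conversion, assuming a stand-in `surveys_df` with three made-up rows (the real survey data is not part of these notes).

```python
import pandas as pd

# Hypothetical stand-in for surveys_df.
surveys_df = pd.DataFrame({"record_id": pd.Series([1, 2, 3], dtype="int64")})

before = surveys_df["record_id"].dtype
surveys_df["record_id"] = surveys_df["record_id"].astype("float64")
after = surveys_df["record_id"].dtype

print(before, "->", after)
```

`astype` returns a new Series, so the result has to be assigned back to the column for the change to stick.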
Missing values:
any missing values = df.isnull().values.any()
missing values per column = df.isnull().sum()
total missing values = df.isnull().sum().sum()
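A sketch of the missing-value checks on a tiny made-up frame. Note that a single `.sum()` gives a count per column, and a second `.sum()` is needed for one overall total.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with three missing entries in total.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", None, None],
})

has_missing = bool(df.isnull().values.any())  # any missing value at all?
per_column = df.isnull().sum()                # missing count per column
total_missing = int(per_column.sum())         # single overall count

print(has_missing)
print(per_column)
print(total_missing)
```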
References:
- https://stackoverflow.com/questions/22470690/get-list-of-pandas-dataframe-columns-based-on-data-type
- http://chris.friedline.net/2015-12-15-rutgers/lessons/python2/03-data-types-and-format.html
- https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.memory_usage.html
- https://www.dataquest.io/blog/pandas-big-data/
- https://medium.com/@jeru92/reducing-data-capacity-for-quicker-predictions-8d1210ed9536