Difference between revisions of "Pandas notes"

From Simson Garfinkel

Revision as of 07:19, 1 July 2018

Simple Manipulation

Get the data:

   import pandas as pd
   df = pd.read_csv(INFILE)    # read_csv accepts a file path directly

Rows:

   df_rows_0_to_99 = df[0:100]
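
A minimal sketch of the row slice above, using a made-up frame in place of the CSV file (the column values are hypothetical). It suggests that `df[0:100]`, `df.iloc[0:100]` and `df.head(100)` pick the same first 100 rows when slicing by integer position:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV file.
df = pd.DataFrame({"Year": range(1900, 2100), "Count": range(200)})

first_100 = df[0:100]        # slice by position; the end is exclusive
same_rows = df.iloc[0:100]   # explicit positional indexing
head_rows = df.head(100)     # convenience method for the first n rows

print(len(first_100))
```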


Columns:

   df = pd.read_csv(INFILE)
   df_just_year = df['Year']                    # one label returns a Series
   df_year_and_count = df[['Year', 'Count']]    # a list of labels returns a DataFrame
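
As a small illustration of column selection, the sketch below builds a hypothetical three-column frame (the `Name` column is invented for the example) and shows the difference between indexing with one label and with a list of labels:

```python
import pandas as pd

# Hypothetical sample data; only Year and Count appear in the notes above.
df = pd.DataFrame({"Year": [1998, 2001], "Count": [3, 5], "Name": ["a", "b"]})

just_year = df["Year"]                   # single label: a Series
year_and_count = df[["Year", "Count"]]   # list of labels: a DataFrame

print(type(just_year).__name__, type(year_and_count).__name__)
```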


Filtering:

   df_years_this_century = df.loc[df['Year'] > 1999]
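
A short sketch of boolean filtering on hypothetical data. The comparison produces a True/False Series, and `.loc` keeps the rows where it is True. The second filter is an assumption about a likely next step: combining two conditions with `&`, each wrapped in parentheses.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Year": [1995, 1999, 2000, 2018], "Count": [4, 0, 7, 2]})

this_century = df.loc[df["Year"] > 1999]

# Combine conditions with & (and) or | (or); parentheses are required.
busy_this_century = df.loc[(df["Year"] > 1999) & (df["Count"] > 5)]

print(this_century)
```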


Printing

   pd.set_option('display.width',174)
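
If the wider display is only wanted for one print, `pd.option_context` may be a gentler alternative to `pd.set_option`. The sketch below, on hypothetical data, applies the width inside a `with` block and checks that the earlier value comes back afterwards:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Year": [2000, 2018], "Count": [7, 2]})

original_width = pd.get_option("display.width")

# The setting applies only inside the with block and is restored afterwards.
with pd.option_context("display.width", 174):
    width_inside = pd.get_option("display.width")
    print(df)

width_after = pd.get_option("display.width")
```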

Options:

Memory Ideas

Print the data frame column types:

   df.dtypes

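Print whether the data frame columns are dense or sparse (ftypes was deprecated in pandas 0.25 and removed in 1.0; on newer versions df.dtypes shows a Sparse[...] dtype for sparse columns):

   df.ftypes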

Other ideas:

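   df.info()                       # dtypes, non-null counts, shallow memory estimate
   df.info(memory_usage='deep')    # same, but measures object columns fully
   df.memory_usage(deep=True)      # bytes per column, including the index
   sys.getsizeof(df)               # size in bytes as seen by Python; needs import sys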
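
To see why `deep=True` matters, here is a minimal sketch on a hypothetical frame with one integer column and one object-dtype string column. The shallow figure for the object column counts only the references, while the deep figure also counts the Python strings themselves:

```python
import pandas as pd

# Hypothetical sample data; Name is forced to object dtype for the comparison.
df = pd.DataFrame({
    "Year": [2000, 2018, 2019],
    "Name": pd.Series(["alpha", "beta", "gamma"], dtype=object),
})

shallow = df.memory_usage()         # object columns: references only
deep = df.memory_usage(deep=True)   # object columns: references plus string objects
total_deep_bytes = deep.sum()

print(deep)
```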
   

Convert the record_id field from an integer to a float:

   surveys_df['record_id'] = surveys_df['record_id'].astype('float64')
   surveys_df['record_id'].dtype
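
The conversion can be checked on a small stand-in frame. In this sketch the contents of `surveys_df` are hypothetical; only the column name comes from the note above.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df.
surveys_df = pd.DataFrame({"record_id": [1, 2, 3]})

before = surveys_df["record_id"].dtype    # an integer dtype
surveys_df["record_id"] = surveys_df["record_id"].astype("float64")
after = surveys_df["record_id"].dtype     # float64

print(before, after)
```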

Missing values:

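   any_missing_values = df.isnull().values.any()      # True if any cell is missing
   missing_values_per_column = df.isnull().sum()      # count per column
   total_missing_values = df.isnull().sum().sum()     # count over the whole frame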
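
A small worked example of the missing-value checks, on hypothetical data with three NaN cells. Note that a single `.sum()` gives one count per column, and a second `.sum()` is one way to get the overall total:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with three missing cells.
df = pd.DataFrame({"Year": [2000, np.nan, 2018], "Count": [np.nan, np.nan, 2]})

any_missing = df.isnull().values.any()         # is anything missing at all?
missing_per_column = df.isnull().sum()         # Series indexed by column name
total_missing = int(df.isnull().sum().sum())   # single number for the frame

print(missing_per_column)
```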

References: