Saturday, September 29, 2018

Pandas Dataframe Tips

When dropping rows with NaNs in a Pandas DataFrame, this

df = df.dropna()

may be faster than this:

df.dropna(inplace=True)
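A small sketch of the two forms on a toy DataFrame (the data is made up for illustration). It doesn't measure speed. It shows that both forms give the same rows, and that `inplace=True` returns None, so writing `df = df.dropna(inplace=True)` would replace your DataFrame with None. The `subset` argument limits which columns are checked for NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has a NaN in "a", row 2 has a missing value in "b".
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# Form 1: dropna returns a new DataFrame, which you re-bind to a name.
cleaned = df.dropna()

# Form 2: inplace=True mutates the DataFrame and returns None.
df2 = df.copy()
result = df2.dropna(inplace=True)

# Only drop rows where column "a" is missing; NaNs in "b" are ignored.
partly = df.dropna(subset=["a"])

print(cleaned)
print(result)  # None
```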




Casting (converting) a column of a dataframe to float, for example, can be very slow if done like this:

df[col1] = pd.to_numeric(df[col1])

or

df[col1] = df[col1].apply(pd.to_numeric)

or

df[col1] = df[col1].astype(float)

If you know how to do this efficiently, please tell me. For now, my only workaround is to avoid this operation whenever possible.
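Because the speed of each form depends on the data and the pandas version, it may help to time the three variants above side by side on your own column. The sketch below does that on hypothetical data (numbers stored as strings). It makes no claim about which form wins; it only prints the timings. If some values are not numeric, `pd.to_numeric(..., errors="coerce")` turns them into NaN instead of raising.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical test data: 50,000 numbers stored as strings.
n = 50_000
df = pd.DataFrame({"col1": np.arange(n).astype(str)})

# The three conversions from the post, applied to the same column.
candidates = {
    "pd.to_numeric(column)": lambda s: pd.to_numeric(s),
    "column.apply(pd.to_numeric)": lambda s: s.apply(pd.to_numeric),
    "column.astype(float)": lambda s: s.astype(float),
}

timings = {}
results = {}
for name, convert in candidates.items():
    start = time.perf_counter()
    results[name] = convert(df["col1"])
    timings[name] = time.perf_counter() - start
    print(f"{name:30s} {timings[name]:.3f} s")

# All three hold the same numbers; keep the float64 one.
df["col1"] = results["column.astype(float)"]
```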

Wednesday, September 26, 2018

Pandas DataFrame Jupyter notebook display: increasing column width

To increase the column width when displaying a Pandas dataframe in Jupyter notebook, do this:

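pd.set_option('display.max_colwidth', None)

(Older pandas versions used -1 to mean "no limit"; since pandas 1.0 that value is deprecated in favor of None.)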
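If you only want full cell contents for one display, `pd.option_context` changes the option temporarily and restores the previous value afterwards. A minimal sketch, assuming pandas 1.0 or later (where None means "no limit"); the DataFrame is made up:

```python
import pandas as pd

# Hypothetical DataFrame with one long text cell.
df = pd.DataFrame({"text": ["a fairly long string " * 10]})

# Full column width only inside this block.
with pd.option_context("display.max_colwidth", None):
    print(df)

# Or change it for the rest of the session...
pd.set_option("display.max_colwidth", None)

# ...and go back to the default (50 characters) later.
pd.reset_option("display.max_colwidth")
```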

[Courtesy: this]

Friday, September 21, 2018

Specify data types when creating Pandas dataframe from csv

To specify data types for various columns when creating a Pandas dataframe from a CSV file, do this:

df = pd.read_csv(filename, dtype={'col1':np.int64, 'col2':str, ...})

If converters (the converters argument) are also given for a column, they are used instead of the dtype conversion.
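A minimal sketch using an in-memory CSV (hypothetical data) in place of a file. Reading "col2" as `str` keeps its leading zeros, which a numeric dtype would drop. The `converters` entry for "col3" is an illustrative function applied to each raw string. Giving both a dtype and a converter for the same column makes pandas warn and use only the converter, so the sketch keeps them on separate columns.

```python
import io

import numpy as np
import pandas as pd

# Small in-memory CSV standing in for a real file.
csv_text = "col1,col2,col3\n1,007,2.5\n2,042,3.0\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"col1": np.int64, "col2": str},  # col2 stays text, so "007" is kept
    converters={"col3": lambda s: float(s) * 10},  # hypothetical converter for col3
)

print(df.dtypes)
print(df)
```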

Convert Python / numpy datetime to Unix timestamp

To convert Python datetime to Unix / POSIX timestamp (float), do this:

mytime.timestamp()

[Courtesy: this]
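One detail worth knowing: `datetime.timestamp()` treats a naive datetime (one without `tzinfo`) as local time, so the result depends on the machine's time zone. A small sketch with an illustrative date, using a timezone-aware datetime for an unambiguous result:

```python
from datetime import datetime, timezone

# Timezone-aware datetime: gives seconds since the Unix epoch, independent of the machine.
aware = datetime(2018, 9, 21, 0, 0, tzinfo=timezone.utc)
ts = aware.timestamp()

# Naive datetime: interpreted in the machine's local time zone.
naive = datetime(2018, 9, 21, 0, 0)
local_ts = naive.timestamp()

# Converting back from a POSIX timestamp to an aware datetime.
back = datetime.fromtimestamp(ts, tz=timezone.utc)

print(ts, local_ts, back)
```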

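To convert a numpy.datetime64 to a Unix / POSIX timestamp, do this:

mytime.astype('int64')

Note that this returns an integer count in the datetime64's own unit (e.g. nanoseconds for datetime64[ns]), not float seconds. Using int64 rather than uint64 also keeps dates before 1970 correct, since they are negative.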

[Courtesy: this]
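To get float seconds regardless of the datetime64 unit, a common approach is to subtract the epoch and divide by a one-second timedelta. A sketch with an illustrative value; numpy's datetime64 carries no time zone, so the result counts seconds from 1970-01-01T00:00:00 in whatever zone the data was recorded.

```python
import numpy as np

# Second precision, because the literal includes seconds.
mytime = np.datetime64("2018-09-21T12:30:00")

# Integer count in the value's own unit (seconds here).
as_int = mytime.astype("int64")

# Float seconds since the epoch, independent of the unit.
as_seconds = (mytime - np.datetime64("1970-01-01T00:00:00")) / np.timedelta64(1, "s")

# With nanosecond precision, astype gives nanoseconds instead.
arr = np.array(["2018-09-21T12:30:00"], dtype="datetime64[ns]")
as_ns = arr.astype("int64")

print(as_int, as_seconds, as_ns)
```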

Changing the dimensions of a plot in matplotlib

To specify dimensions of a plot in matplotlib (e.g. to get longer x or y axis), do this:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(20, 3))  # figure width = 20 inches, height = 3 inches
ax = fig.add_subplot(111)
ax.plot(x, y)

[Courtesy: this]
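The same thing with `plt.subplots`, which creates the figure and axes in one call, plus `set_size_inches` for resizing a figure that already exists. A sketch with made-up sine data; the `matplotlib.use("Agg")` line is only there so it runs as a plain script without a display and should be left out in Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in Jupyter

import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data to plot.
x = np.linspace(0, 10, 200)
y = np.sin(x)

# Figure 20 inches wide and 3 inches tall.
fig, ax = plt.subplots(figsize=(20, 3))
ax.plot(x, y)
print(fig.get_size_inches())  # width, height in inches

# Resize an existing figure.
fig.set_size_inches(10, 4)
```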

Show summary statistics for a Pandas dataframe

To see the summary statistics for a Pandas dataframe, do this:

df.describe()

For numeric columns this shows the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%) and maximum.

To see column details (column names, non-null counts and data types), do this:

df.info()
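A quick sketch on a toy DataFrame (made-up data). By default `describe()` covers only numeric columns; `include="all"` adds count, unique, top and freq for non-numeric ones. `info()` prints its summary rather than returning it.

```python
import pandas as pd

# Hypothetical data: two numeric columns (one with a missing value) and one text column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, None, 30.0], "c": ["x", "y", "x"]})

stats = df.describe()                   # numeric columns only
all_stats = df.describe(include="all")  # also summarizes column "c"
print(stats)
print(all_stats)

df.info()  # column names, non-null counts, dtypes, memory usage
```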


Cast a Pandas dataframe column to timestamp

To cast / convert a column in a Pandas dataframe to the timestamp datatype, do this:

df['timestamp'] = pd.to_datetime(df['timestamp'])
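When the strings all share a known layout, passing `format` tells pandas exactly how to parse them. `errors="coerce"` turns unparseable values into NaT instead of raising. A sketch with hypothetical values; the format string is an assumption about how your timestamps look.

```python
import pandas as pd

# Hypothetical column with one bad value.
df = pd.DataFrame({"timestamp": ["2018-09-21 10:00:00", "2018-09-21 10:05:30", "not a date"]})

# Explicit format; unparseable entries become NaT.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

print(df.dtypes)
print(df)
```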


Jupyter-notebook: plot matplotlib graphs inline

To plot matplotlib graphs inline in Jupyter-notebook, do this:

import matplotlib.pyplot as plt
%matplotlib inline

Now, plt.plot(...) will show the graph inline.

Jupyter-notebook module reload

To reload modules being used in a jupyter-notebook (when they have been changed outside), do this:

%load_ext autoreload
%autoreload 2

This automatically reloads modules before each cell is executed, so changes made to them outside the notebook are picked up.

[Courtesy: this]
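The `%` commands only work inside IPython/Jupyter. To see what autoreload automates, here is a sketch of the manual equivalent using the standard library's `importlib.reload`. It writes a throwaway module (the name `autoreload_demo_mod` is hypothetical) to a temporary directory, edits it, and reloads it.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc files in this quick demo

# Hypothetical module written to a temporary directory.
tmp = pathlib.Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp))
(tmp / "autoreload_demo_mod.py").write_text("VALUE = 1\n")

import autoreload_demo_mod
print(autoreload_demo_mod.VALUE)  # 1

# Simulate editing the module outside the notebook, then reload it by hand.
(tmp / "autoreload_demo_mod.py").write_text("VALUE = 20\n")
importlib.invalidate_caches()
importlib.reload(autoreload_demo_mod)
print(autoreload_demo_mod.VALUE)  # 20
```

Note that `importlib.reload` only refreshes the module object itself; names imported earlier with `from module import name` keep pointing at the old objects.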