pandas

Topics related to pandas:

Getting started with pandas

Reshaping and pivoting

Save pandas dataframe to a csv file

Creating DataFrames

Indexing and selecting data

Grouping Data

Missing Data

Should we include the non-documented ffill and bfill?

Series

Pandas Datareader

Merge, join, and concatenate

Reading files into pandas DataFrame

Duplicated data

Resampling

Read SQL Server to Dataframe

String manipulation

Pandas IO tools (reading and saving data sets)

The pandas official documentation includes a page on IO Tools with a list of relevant functions to read and write to files, as well as some examples and common parameters.
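As a rough illustration of the kind of functions that page covers, the sketch below writes a small DataFrame to CSV and reads it back. The column names and the in-memory buffer are made up for the example; with a real file you would pass a path instead.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False omits the row labels

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

Most of the other readers and writers (for example read_json and to_json) follow the same pattern of a top-level pd.read_* function paired with a DataFrame.to_* method.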

Data Types

dtypes are not native to pandas. They are a result of pandas' close architectural coupling to NumPy.

The dtype of a column does not have to correspond to the Python type of the objects contained in the column.

Here we have a pd.Series with floats. The dtype will be float.

Then we use astype to "cast" it to object.

pd.Series([1.,2.,3.,4.,5.]).astype(object)
0    1
1    2
2    3
3    4
4    5
dtype: object

The dtype is now object, but the values in the Series are still floats. This makes sense once you know that in Python everything is an object, so any value can be upcast to object.

type(pd.Series([1.,2.,3.,4.,5.]).astype(object)[0])
float

Here we try "casting" the floats to strings.

pd.Series([1.,2.,3.,4.,5.]).astype(str)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: object

The dtype is now object, but the entries in the Series are strings. This is because pandas has historically stored Python strings in NumPy object arrays, so NumPy treats them as opaque objects and does not track their actual type.

type(pd.Series([1.,2.,3.,4.,5.]).astype(str)[0])
str

Do not trust dtypes; they are an artifact of an architectural flaw in pandas. Specify them when you must, but do not rely on the dtype of a column to tell you what type its values actually are.
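If you need to know what a column really holds, one option is to inspect the values themselves rather than the dtype. The sketch below is only an illustration: element_type_names is a hypothetical helper written for this example, not part of pandas.

```python
import pandas as pd


def element_type_names(series):
    """Return the sorted names of the Python types found in a Series."""
    # Hypothetical helper: looks at each value instead of trusting the dtype.
    return sorted({type(value).__name__ for value in series})


# Floats cast to object, as in the example above.
floats_as_object = pd.Series([1.0, 2.0, 3.0]).astype(object)

# A column that really does mix several Python types.
mixed = pd.Series([1.0, "two", 3], dtype=object)

# Both report the same dtype, but their contents differ.
print(floats_as_object.dtype, element_type_names(floats_as_object))
print(mixed.dtype, element_type_names(mixed))
```

Both Series report dtype object, yet only the helper reveals that the first contains nothing but floats while the second mixes floats, ints, and strings.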

Meta: Documentation Guidelines

This meta post is similar to the Python version: http://stackoverflow.com/documentation/python/394/meta-documentation-guidelines#t=201607240058406359521

Please make edit suggestions, and comment on those (in lieu of proper comments), so we can flesh out/iterate on these suggestions :)

Graphs and Visualizations

MultiIndex

Categorical data

Map Values

Grouping Time Series Data

JSON

Analysis: Bringing it all together and making decisions

IO for Google BigQuery

Computational Tools

Dealing with categorical variables

Gotchas of pandas

In general, a gotcha is a construct that is documented but not intuitive. Gotchas produce output that most users would not expect, precisely because the behavior is counter-intuitive.

The pandas package has several gotchas that can confuse anyone who is not aware of them. Some of them are presented on this documentation page.
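As one illustrative example (chosen for this sketch, and not necessarily one of the gotchas listed on that page), an integer Series silently becomes a float Series as soon as a missing value is introduced, because the default integer dtype cannot represent NaN.

```python
import pandas as pd

# A plain integer Series.
ints = pd.Series([1, 2, 3])
print(ints.dtype)  # an integer dtype

# Reindexing with a label that does not exist introduces a missing value.
with_gap = ints.reindex([0, 1, 2, 3])

# The new entry is NaN, and the whole Series has been upcast to float.
print(with_gap.dtype)
print(with_gap)
```

The values 1, 2, and 3 are still there, but they are now stored as floats, which can be surprising if later code assumes integers.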

Appending to DataFrame

Simple manipulation of DataFrames

Getting information about DataFrames

pd.DataFrame.apply

Working with Time Series

Using .ix, .iloc, .loc, .at and .iat to access a DataFrame

Shifting and Lagging Data

Holiday Calendars

Making Pandas Play Nice With Native Python Datatypes

Cross sections of different axes with MultiIndex

Read MySQL to DataFrame

Boolean indexing of dataframes