Yes, it can be googled or GPT'ed, but I should know this by heart.

Both Series and Dataframes support arithmetic operations much like NumPy arrays (element-wise operations, broadcasting, etc.), provided their dtypes allow it. Unlike NumPy, pandas first aligns the operands on their index labels.
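A minimal sketch of scalar broadcasting and label alignment, using two made-up Series whose labels only partly overlap. Labels present in only one operand get NaN in the result.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
t = pd.Series([10, 20, 30], index=["b", "c", "d"])

doubled = s * 2   # scalar broadcast to every element: a=2, b=4, c=6
summed = s + t    # aligned on labels first: a=NaN, b=12, c=23, d=NaN
print(summed)
```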

Series

A one-dimensional labelled array that can hold any data type (mixed values are stored with the object dtype). Corresponds to one column in a dataframe.

Access

Please access positions 1 to 4 of a pandas series {python}s
?
{python}result = s.iloc[1:5] # iloc is position-based; the slice end (5) is excluded
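A small sketch of why {python}iloc is the safer choice, assuming a Series with letter labels (the data is made up): positional slicing with {python}iloc excludes the end, while label slicing with {python}loc includes it.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=list("abcdef"))

by_position = s.iloc[1:5]   # positions 1..4, end excluded -> b, c, d, e
by_label = s.loc["b":"e"]   # labels b..e, end label included -> same four values
print(by_position.tolist())
```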

Please convert a pandas series {python}s to:

  1. A list
  2. A numpy array
  3. A dict
  4. A string
  5. A dataframe
?
s.to_list()
s.to_numpy()
s.to_dict() # keys are the index labels
s.to_string()
s.to_frame(name="column_name") # name sets the column label (optional)
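The five conversions applied to one small made-up Series, so the result types can be compared side by side ("values" is just an illustrative column name):

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

as_list = s.to_list()                 # plain Python list
as_array = s.to_numpy()               # NumPy array
as_dict = s.to_dict()                 # {label: value}
as_string = s.to_string()             # index and values as one formatted string
as_frame = s.to_frame(name="values")  # one-column DataFrame
```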

Dataframes

Creation

Please create a dataframe from the following dictionary
{python}data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
?

import pandas as pd

data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)

Please create a dataframe from a list of lists. Name the columns accordingly.
{python}data = [['Alex',10],['Bob',12],['Clarke',13]]
?

import pandas as pd

data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data, columns=["Name", "Age"])

Please create a dataframe from a list of dictionaries. Notice how not all features are present in the first dictionary. What if you want to specify the features/columns?
{python}data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
?
With all columns:

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data) # the first row gets NaN in column 'c'

Selecting only columns a and b, disregarding c:

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, columns=['a', 'b']) # column 'c' is not created
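Both variants side by side, to check where the NaN ends up and which columns survive:

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

df_all = pd.DataFrame(data)                     # columns a, b, c; row 0 has NaN in c
df_ab = pd.DataFrame(data, columns=['a', 'b'])  # only a and b, key 'c' is ignored
print(df_all)
print(df_ab)
```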

Access

Each row of a dataframe usually represents a data point and each column a feature. Rows are accessed through the index.

By default, pandas creates a RangeIndex that goes from 0 to n - 1, where n is the number of rows.

Please access the 50th row of a dataframe {python}df
?
{python}df.iloc[49] # positions start at 0

Please access rows 1 to 3 of dataframe {python}df
?
{python}print(df.iloc[1:4])

Please access the (row) index of a dataframe.
?
{python}df.index

Please access the columns of a dataframe.
?
{python}df.columns # an Index object holding the column labels; list(df.columns) gives a plain list

Please rename the dataframe column {python}column1 to {python}age
?
{python}df = df.rename(columns={"column1":"age"})

Please select rows 1 to 5 and the columns "column1", "column3" from the dataframe {python}df
?

print(df.loc[1:5, ["column1", "column3"]]) # .loc slices by label and includes the end label, so row 5 is fetched as well.
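A sketch contrasting the label-based {python}loc selection with a position-based {python}iloc selection on a made-up eight-row dataframe. With the default RangeIndex the same numbers give a different row count.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": range(10, 18),
    "column2": range(20, 28),
    "column3": range(30, 38),
})

by_label = df.loc[1:5, ["column1", "column3"]]  # labels 1..5, end included -> 5 rows
by_position = df.iloc[1:5, [0, 2]]              # positions 1..4, end excluded -> 4 rows
print(len(by_label), len(by_position))
```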

Filtering

Please filter a dataframe so that only rows with an age strictly between 30 and 40 are selected.
?
{python}filtered_df = df[(df['Age'] > 30) & (df['Age'] < 40)]
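The same filter on the example data from the creation section, plus the equivalent {python}Series.between call. The {python}inclusive="neither" argument keeps the bounds strict, matching the answer above.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]})

# the parentheses are required: & binds tighter than > and <
strict = df[(df["Age"] > 30) & (df["Age"] < 40)]
same = df[df["Age"].between(30, 40, inclusive="neither")]  # bounds excluded
print(strict)
```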

Deletion

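Please drop the row labelled 43 from dataframe {python}df. How do you drop multiple rows, given as a list or as a slice?
?
{python}df.drop(43) # returns a new dataframe; reassign it or pass inplace=True
{python}df.drop([1,2,3,4,43]) # several labels from a list
{python}df.drop(df.index[1:32]) # the labels at positions 1 to 31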
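A quick check on a hypothetical 50-row dataframe that each variant returns a new, shorter frame and leaves {python}df itself untouched:

```python
import pandas as pd

df = pd.DataFrame({"x": range(50)})   # hypothetical frame with labels 0..49

one = df.drop(43)                     # drop a single label
several = df.drop([1, 2, 3, 4, 43])   # drop a list of labels
sliced = df.drop(df.index[1:32])      # drop the labels at positions 1..31

print(len(df), len(one), len(several), len(sliced))
```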

Please delete columns {python}"column1", "column3" from {python}df
?

df = df.drop(columns=["column1", "column3"])

# or use inplace
df.drop(columns=["column1", "column3"], inplace=True) # returns None, so do not reassign the result

Insertion

Please add a row to a dataframe. Assume that the dataframe index is numerical from 0 to n-1.

import pandas as pd

X = {"column1": [1,2,3,4,5], "column2": [6,7,8,9,10]}
df = pd.DataFrame(X)

# add a row to the dataframe here

?

new_row = {'column1':4, 'column2':98}
df.loc[len(df)] = new_row
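An alternative sketch that also works when the index is not 0 to n-1: wrap the new row(s) in a one-row dataframe and concatenate. ({python}DataFrame.append was removed in pandas 2.0, so {python}pd.concat is the usual replacement.)

```python
import pandas as pd

df = pd.DataFrame({"column1": [1, 2, 3, 4, 5], "column2": [6, 7, 8, 9, 10]})

new_rows = pd.DataFrame([{"column1": 4, "column2": 98}])  # list of dicts -> one row
df = pd.concat([df, new_rows], ignore_index=True)          # renumber the index 0..n-1
print(df.tail(2))
```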

Please insert column "C" into an existing dataframe {python}df
?
{python}df['C'] = [7, 9, 19] # the list needs exactly one value per row (here df has 3 rows)
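A sketch of the common ways to add a column on a made-up three-row dataframe: a list (one value per row), a scalar (broadcast to every row), and {python}DataFrame.insert when the column should go to a specific position instead of the end.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

df["C"] = [7, 9, 19]             # list: exactly one value per row
df["D"] = 0                      # scalar: broadcast to every row
df.insert(1, "E", df["A"] * 10)  # insert at position 1 instead of appending
print(df.columns.tolist())
```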

Modification

What does the {python}df.apply(...) function do and why use it?
?
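{python}df.apply(func, axis=0) calls func once per column (axis=0, the default) or once per row (axis=1) and combines the results into a new Series or DataFrame. It does not change df in place, so reassign the result: {python}df = df.apply(func). Because func is called in a Python loop, prefer built-in vectorised operations where they exist; row-wise apply (axis=1) in particular is slow.
Example: subtract the mean of each column: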

def f(col):
    mean = col.mean()
    return col - mean

df = df.apply(f) # axis = 0, so the func f is applied to each column
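For this particular example a vectorised version exists: {python}df.mean() returns one mean per column, and subtracting that Series broadcasts it across the columns. A small check on made-up data that both give the same result:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})

def f(col):
    return col - col.mean()

centered_apply = df.apply(f)          # calls f once per column
centered_vectorised = df - df.mean()  # column means, aligned on the column labels
print(centered_vectorised)
```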

Please change the value in row 7, column "column3" from 4 to 23.
?
{python}df.loc[7, "column3"] = 23

Other

Please concatenate two dataframes {python}df1 and {python}df2. Assume that their indexes overlap and can be ignored.
?
{python}combined_df = pd.concat([df1, df2], ignore_index=True)
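What {python}ignore_index=True changes, on two tiny made-up frames with the same default index: without it the old labels are kept and repeat.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})

stacked = pd.concat([df1, df2])                         # index 0, 1, 0, 1 (duplicates)
combined_df = pd.concat([df1, df2], ignore_index=True)  # index 0, 1, 2, 3
print(combined_df)
```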

I/O tools with pandas

You have a csv file. Please read it and convert it to a dataframe
?
{python}df = pd.read_csv(filepath)

You have json data. Please convert it to a dataframe.
?
{python}df = pd.read_json(filepath)

Please save a dataframe to a csv file
?
{python}df.to_csv(filepath) # also writes the index as a column; pass index=False to leave it out
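A round-trip sketch of writing and reading a CSV. To keep it self-contained, an in-memory {python}io.StringIO buffer stands in for the file path (both {python}to_csv and {python}read_csv accept file-like objects).

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Jack"], "Age": [28, 34]})

buffer = io.StringIO()          # stands in for a real file path
df.to_csv(buffer, index=False)  # without index=False the index becomes an extra column
buffer.seek(0)                  # rewind before reading

df_back = pd.read_csv(buffer)
print(df_back)
```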

You have an excel file, please read it and convert it to a dataframe
?
{python}df = pd.read_excel(filepath) # .xlsx files need the openpyxl package installed

You have an excel file, please read {python}"sheet2" from it.
?
{python}pd.read_excel(filepath, sheet_name="sheet2")

Please use IPython to display a dataframe in a nice way.
?

from IPython.display import display

display(df)

Practical tips

Let's say you have data spread over multiple files, and each file has an "id" column with unique values. The data can then be combined into one dataframe by matching rows on that id.

It is then a good idea to use that "id" column as an index.

Please set an "id" column as the index of a dataframe {python}df.
?

df.index = df["id"]
df = df.drop(columns=["id"])
display(df)

Or shorter:

df.set_index(keys=["id"], inplace=True) # here we could set more than just one column.

Note that the parameter is called keys and not columns (which would seem more consistent with other methods). The reason is that it also accepts external keys, such as a list, an array or a pandas Series of the same length as the dataframe, even though I don't see a reason to ever do this.
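A minimal sketch of the "combine files by id" idea, with two hypothetical dataframes standing in for the files. Once "id" is the index on both, {python}DataFrame.join matches rows by index label, so the row order of the two sources does not matter.

```python
import pandas as pd

# two hypothetical "files" that share a unique id column, in different row orders
people = pd.DataFrame({"id": [3, 1, 2], "name": ["Cleo", "Ann", "Bob"]})
scores = pd.DataFrame({"id": [1, 2, 3], "score": [90, 75, 88]})

people = people.set_index("id")
scores = scores.set_index("id")

combined = people.join(scores)  # rows matched on the id index, not on position
print(combined)
```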

Styling

Methods that control how a dataframe is rendered when shown with {python}display(...) in a notebook. Most of them go through the Styler object returned by {python}df.style.

Please make sure that {python}display(df) shows all rows of a dataframe in an interactive Python environment. Use a context manager.
?

# best practice is to use a context manager
with pd.option_context("display.max_rows", None):
	display(df) # will show all rows
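Why the context manager is best practice: the option only changes inside the {python}with block and the previous value is restored on exit. This can be checked with {python}pd.get_option:

```python
import pandas as pd

before = pd.get_option("display.max_rows")  # 60 by default

with pd.option_context("display.max_rows", None):
    inside = pd.get_option("display.max_rows")  # None: no row limit inside the block

after = pd.get_option("display.max_rows")  # the previous value is restored
print(before, inside, after)
```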

Please give a dataframe {python}df a title
?
{python}display(df.style.set_caption("dataframe title"))
Careful: {python}df.style.set_caption(...) returns a Styler object, not a dataframe. You can display it, but dataframe methods are no longer available on it.

Please highlight the maximum value in each column of a dataframe {python}df
?
{python}display(df.style.highlight_max(axis = 0))

Please modify the style of one column of dataframe {python}df. Use the following CSS properties:

{
"background-color": "#00BFA5",
"color": "#000000",
}

?

style_params = {"background-color": "#00BFA5", "color": "#000000"}
display(df.style.set_properties(subset=["Col1"], **style_params))

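Why unpack a dict with {python}**? {python}set_properties accepts arbitrary keyword arguments and passes them on as CSS property-value pairs. CSS names such as "background-color" contain hyphens and are not valid Python identifiers, so they cannot be written as keyword arguments directly; unpacking a dict works around that.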

Please apply the CSS style {python}"background-color: #00BFA5; color: #000000" to all cells of one dataframe column whose values are above 2.
?

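def highlight_values(val):
	# a CSS string for values above 2; an empty string leaves the cell unstyled
	return "background-color: #00BFA5; color: #000000" if val > 2 else ""

# Styler.map applies the function cell by cell; subset restricts it to one column
# (in pandas < 2.1 this method is called Styler.applymap)
display(df.style.map(highlight_values, subset=["Col1"]))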

Other

Flashcards, python dictionaries