Commit 8d142e5d authored by LOUIS TYRRELL OLIPHANT

started lec 38 on plotting

parent 45395818
%% Cell type:markdown id: tags:
## Warmup 0: Imports
%% Cell type:code id: tags:
``` python
import sqlite3
import pandas as pd
from pandas import DataFrame, Series
import matplotlib
from matplotlib import pyplot as plt
matplotlib.rcParams["font.size"] = 16
```
%% Cell type:markdown id: tags:
## Warmup 1a: Save all the data from the `"piazza"` table to `piazza_df`
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
## Warmup 1b: Set the index of `piazza_df` to be `student_id`
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
## Warmup 1c: Add a column `"total"` to `piazza_df`.
This should be the sum of the number of `posts`, `answers`, `edits`, `followups`, and `replies_to_followups`
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
## Warmup 2a: Create a new dataframe `contributors_df` which contains those that had more than 0 total contributions.
Sort by this value from **highest** to **lowest**. Break **ties** by `name` in **alphabetical** order.
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
## Warmup 2b: Answer the same question using SQL
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
## Warmup 3a: Of those that contributed, what was their average number of contributions?
Do your analysis by `role` (e.g. by `"ta"`, `"instructor"`, and `"student"`)
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
## Warmup 3b: Answer the same question using SQL
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
## Warmup 4: What is the correlation between all of the numeric columns?
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
## Warmup 5: Close the connection
%% Cell type:code id: tags:
``` python
# write your code here
```
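%% Cell type:markdown id: tags:
#### A possible sketch of the pandas warmups on stand-in data
This is one way the pandas warmups above could fit together, not the official solution. The course database is not part of this notebook, so the sketch builds an in-memory SQLite table named `piazza` with four made-up rows. The column names follow the warmup text; the rows, the names, and the variables `count_cols` and `avg_by_role` are assumptions for illustration.
%% Cell type:code id: tags:
```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for the course database: an in-memory SQLite table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE piazza (student_id INTEGER, name TEXT, role TEXT, "
    "posts INTEGER, answers INTEGER, edits INTEGER, "
    "followups INTEGER, replies_to_followups INTEGER)"
)
rows = [  # made-up rows, not real course data
    (1, "Ana", "student", 2, 0, 1, 0, 0),
    (2, "Ben", "student", 0, 0, 0, 0, 0),
    (3, "Cy", "ta", 1, 5, 2, 1, 1),
    (4, "Dee", "instructor", 0, 2, 1, 0, 0),
]
conn.executemany("INSERT INTO piazza VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Warmup 1: load the table, index by student_id, add a "total" column
piazza_df = pd.read_sql("SELECT * FROM piazza", conn).set_index("student_id")
count_cols = ["posts", "answers", "edits", "followups", "replies_to_followups"]
piazza_df["total"] = piazza_df[count_cols].sum(axis=1)

# Warmup 2a: keep contributors; sort by total (descending), then name (ascending)
contributors_df = piazza_df[piazza_df["total"] > 0].sort_values(
    by=["total", "name"], ascending=[False, True]
)

# Warmup 3a: average contributions per role, among contributors only
avg_by_role = contributors_df.groupby("role")["total"].mean()

# Warmup 5: close the connection once the data is in pandas
conn.close()

print(contributors_df[["name", "role", "total"]])
print(avg_by_role)
```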
%% Cell type:markdown id: tags:
# Plotting Applications
**Learning Objectives**
- Make a line plot on a series or on a DataFrame
- Apply features of line plots and bar plots to visualize results of data investigations
- Clean Series data by dropping NaN values and by converting to int
- Make a stacked bar plot
%% Cell type:markdown id: tags:
## Line plots
- `SERIES.plot.line()`
- `DATAFRAME.plot.line()`: each column in the DataFrame becomes a line in the plot

Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.line.html
%% Cell type:markdown id: tags:
#### When you make a Series from a list, the default indices are: 0, 1, 2, ...
%% Cell type:code id: tags:
``` python
s = Series([1758, 2002, 2408, 2898, 3814, 4803, 5713, 6661, 7618, 8391, 8764]) # y values
s.plot.line()
s
```
%% Cell type:markdown id: tags:
#### You can make a series from a list and add indices
%% Cell type:code id: tags:
``` python
s = Series([1758, 2002, 2408, 2898, 3814, 4803, 5713, 6661, 7618, 8391, 8764],
           index=[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020])
s.plot.line()
s
```
%% Cell type:markdown id: tags:
#### We can save the returned Axes object (shown as `AxesSubplot` in older matplotlib versions) and "beautify" it like the other plots...
%% Cell type:code id: tags:
``` python
ax = s.plot.line()
ax.set_title("Number of Craft Breweries in the USA")
ax.set_xlabel("Year")
ax.set_ylabel("# Craft Breweries")
```
%% Cell type:markdown id: tags:
#### Be careful! If the indices are out of order you get a MESS!
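Pandas plots each `(index, value)` pair in the order given and connects consecutive points, so an out-of-order index makes the line double back on itself.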
%% Cell type:code id: tags:
``` python
s = Series([1758, 2408, 2898, 3814, 4803, 5713, 6661, 7618, 8391, 8764, 2002],
           index=[2010, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2011])
s.plot.line()
s
```
%% Cell type:markdown id: tags:
#### You can fix this by calling `sort_index`
%% Cell type:code id: tags:
``` python
s.sort_index().plot.line()
s.sort_index()
```
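%% Cell type:markdown id: tags:
#### Checking the index order without plotting
A small sketch, using the first four brewery values from above with the 2011 entry moved to the end. It shows that `sort_index` returns a new Series and leaves the original untouched, which is why the cell above plots `s.sort_index()` rather than `s`. The name `sorted_s` is introduced here only for illustration.
%% Cell type:code id: tags:
```python
from pandas import Series

# Same brewery counts as above, with the 2011 entry placed last
s = Series([1758, 2408, 2898, 2002], index=[2010, 2012, 2013, 2011])

sorted_s = s.sort_index()  # returns a new Series; s itself is unchanged

print(s.index.is_monotonic_increasing)         # False
print(sorted_s.index.is_monotonic_increasing)  # True
```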
%% Cell type:markdown id: tags:
### Plotting lines from a DataFrame
%% Cell type:markdown id: tags:
#### City of Madison normal high and low (degrees F) by month
%% Cell type:code id: tags:
``` python
temp_df = DataFrame(
    {
        "high": [26, 31, 43, 57, 68, 78, 82, 79, 72, 59, 44, 30],
        "low": [11, 15, 25, 36, 46, 56, 61, 59, 50, 39, 28, 16],
    }
)
temp_df
```
%% Cell type:markdown id: tags:
### A Line Plot made from a DataFrame automatically plots all columns
The same is true for bar plots; we'll see this later.
%% Cell type:code id: tags:
``` python
ax = temp_df.plot.line(figsize=(12, 4))
ax.set_title("Average Temperatures in Madison, WI")
ax.set_xlabel("Month")
ax.set_ylabel("Temp (Fahrenheit)")
ax.set_xticks(range(12)) # makes a range from 0 to 11
ax.set_xticklabels(["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])
None  # ending the cell with None keeps the notebook from echoing the return value of the last call
```
%% Cell type:markdown id: tags:
#### We can also explicitly pass the `x` and `y` parameters...
%% Cell type:code id: tags:
``` python
temp_df_with_month = DataFrame(
    {
        "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        "high": [26, 31, 43, 57, 68, 78, 82, 79, 72, 59, 44, 30],
        "low": [11, 15, 25, 36, 46, 56, 61, 59, 50, 39, 28, 16],
    }
)
ax = temp_df_with_month.plot.line(x="month", y=["high", "low"], figsize=(12, 4))
ax.set_title("Average Temperatures in Madison, WI")
ax.set_xlabel("Month")
ax.set_ylabel("Temp (Fahrenheit)")
```
%% Cell type:markdown id: tags:
### We can perform a calculation on an entire DataFrame
Let's convert the entire DataFrame to Celsius.
%% Cell type:code id: tags:
``` python
def f_to_c(f):
    return (f - 32) * 5 / 9
celcius_df = f_to_c(temp_df)
celcius_df
```
%% Cell type:markdown id: tags:
#### We can also do that using a lambda function
%% Cell type:code id: tags:
``` python
celcius_df = (lambda f: (f - 32)*5/9)(temp_df)
celcius_df
```
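%% Cell type:markdown id: tags:
#### Why the same formula works on a whole DataFrame
A minimal sketch of what happens in the two cells above: arithmetic between a DataFrame and a scalar is applied to every cell. For comparison it also runs the formula column by column with `DataFrame.apply`, which should give the same result. Only the January and July rows are used, and the names `celsius` and `celsius_apply` are introduced here for illustration.
%% Cell type:code id: tags:
```python
from pandas import DataFrame

# January and July rows from temp_df above
temp_df = DataFrame({"high": [26, 82], "low": [11, 61]})

# Arithmetic with a scalar is applied to every cell at once
celsius = (temp_df - 32) * 5 / 9

# Same formula applied one column (a Series) at a time
celsius_apply = temp_df.apply(lambda col: (col - 32) * 5 / 9)

print(celsius.round(1))
print(celsius.equals(celsius_apply))
```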
%% Cell type:markdown id: tags:
#### Here is one way to add a horizontal line to our line plots
%% Cell type:code id: tags:
``` python
celcius_df["freezing"] = 0
celcius_df
```
%% Cell type:markdown id: tags:
#### This plots each column as lines with rotation for the tick labels
%% Cell type:code id: tags:
``` python
ax = celcius_df.plot.line(y=["high", "low", "freezing"], figsize=(12, 4))
ax.set_xlabel("Month")
ax.set_ylabel("Temp (Celsius)")
ax.set_xticks(range(12))
ax.set_xticklabels(["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], rotation=45)
None
```
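%% Cell type:markdown id: tags:
#### Another way to add a horizontal line
Adding a constant `"freezing"` column is one option. As an alternative sketch, matplotlib's `Axes.axhline` draws a horizontal line directly on the plot, so the DataFrame keeps only real data. The two rows below are the January and July values rounded to one decimal; the color, line style, and label are arbitrary choices. The `matplotlib.use("Agg")` line is there only so the sketch runs as a standalone script and can be dropped inside a notebook.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
from pandas import DataFrame

# January and July temperatures in Celsius, rounded to one decimal
celsius_df = DataFrame({"high": [-3.3, 27.8], "low": [-11.7, 16.1]})

ax = celsius_df.plot.line(figsize=(12, 4))
n_lines_before = len(ax.get_lines())  # one line per column

# Draw the freezing point without adding a column to the DataFrame
ax.axhline(y=0, color="gray", linestyle="--", label="freezing")
ax.legend()
ax.set_ylabel("Temp (Celsius)")
```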
%% Cell type:markdown id: tags:
### Bar Plot Example w/ Fire Hydrants
- General review of Pandas
- Some new Bar Plot options
%% Cell type:code id: tags:
``` python
hdf = pd.read_csv("Fire_Hydrants.csv")
hdf.tail()
```
%% Cell type:markdown id: tags:
## Exercise 1: Find just the column names of `hdf`
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
### Let's create a *bar plot* to visualize *colors* of fire hydrants.
%% Cell type:markdown id: tags:
## Exercise 2: Make a series called `counts_series` which stores the value counts of the `"nozzle_color"`
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
#### We can see that the data needs to be cleaned, since colors seem to be repeating with different cases...
%% Cell type:markdown id: tags:
## Exercise 3: Clean the `"nozzle_color"` data by changing the case to lower case before finding the value counts.
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
## Exercise 4: Make a horizontal bar plot of counts of colors and have the colors match
%% Cell type:code id: tags:
``` python
# write your code here
```
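%% Cell type:markdown id: tags:
#### A possible sketch for Exercises 2 to 4 on stand-in data
The sketch below is one way to approach the cleaning and plotting steps, not the official solution. Since `Fire_Hydrants.csv` is not shown here, it uses six made-up `"nozzle_color"` values. Passing the index labels as bar colors works only when every label is a valid matplotlib color name, which is an assumption about the real data. The `matplotlib.use("Agg")` line is only needed outside a notebook.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
from pandas import DataFrame

# Hypothetical rows; the real values come from Fire_Hydrants.csv
hdf = DataFrame({"nozzle_color": ["Blue", "blue", "BLUE", "Green", "green", "Orange"]})

# Lower-case first so "Blue", "blue", and "BLUE" are counted together
counts_series = hdf["nozzle_color"].str.lower().value_counts()

# One color per bar, taken from the index labels (assumes valid color names)
ax = counts_series.plot.barh(color=list(counts_series.index))
ax.set_xlabel("Hydrant count")
ax.set_ylabel("Nozzle color")
```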
%% Cell type:markdown id: tags:
### Let's create a *bar plot* to visualize *style* of fire hydrants.
%% Cell type:markdown id: tags:
## Exercise 5: Repeat the same as above with `"Style"` instead
Make your own decisions on how to present the data, and "beautify" it.
%% Cell type:code id: tags:
``` python
# write your code here
```
%% Cell type:markdown id: tags:
## Exercise 6: Plot the decade manufactured for the Pacer Style as opposed to other styles
1. Get the year of manufacture of all `"Pacer"` style hydrants.
2. Get the year of manufacture of all other hydrants.
3. Get the decade of manufacture of all `"Pacer"` style hydrants as well as other hydrants.
4. Drop missing data, and find the value counts of the decade of manufacture for both `"Pacer"` style hydrants as well as other hydrants.
5. Create a DataFrame with all the data, and make a stacked bar plot. Beautify it.
%% Cell type:code id: tags:
``` python
# write your code here
```
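%% Cell type:markdown id: tags:
#### A possible sketch of the Exercise 6 steps on stand-in data
The five steps above can be sketched as follows. This is an illustration under assumptions, not the official solution: the rows are made up, the style names other than `"Pacer"` are placeholders, and `"year_manufactured"` is an assumed column name that may differ in `Fire_Hydrants.csv`. The helper `decade_counts` and the names `plot_df`, `pacer_years`, and `other_years` are introduced here for illustration.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
from pandas import DataFrame

# Hypothetical rows; "year_manufactured" is an assumed column name
hdf = DataFrame({
    "Style": ["Pacer", "Pacer", "Pacer", "M-3", "M-3", "K-81", "Pacer"],
    "year_manufactured": [1996, 2004, None, 1987, 1999, 2001, 2012],
})

# Steps 1 and 2: years for "Pacer" hydrants and for all others
is_pacer = hdf["Style"] == "Pacer"
pacer_years = hdf[is_pacer]["year_manufactured"]
other_years = hdf[~is_pacer]["year_manufactured"]

def decade_counts(years):
    """Steps 3 and 4: drop NaN, convert to int, floor to the decade, count."""
    decades = years.dropna().astype(int) // 10 * 10
    return decades.value_counts()

# Step 5: align both counts on the decade index; missing decades become 0
plot_df = DataFrame({
    "pacer": decade_counts(pacer_years),
    "other": decade_counts(other_years),
}).fillna(0).sort_index()

ax = plot_df.plot.bar(stacked=True)
ax.set_title("Hydrants by decade of manufacture")
ax.set_xlabel("Decade")
ax.set_ylabel("Hydrant count")
```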