Commit e4262b25 authored by LOUIS TYRRELL OLIPHANT
Lec 39 Plotting 4

parent 5c8b395d
%% Cell type:code id: tags:
``` python
# Run this cell to make the emphasized text red and use the full width of the screen
from IPython.display import HTML
HTML('<style>em { color: red; }</style> <style>.container {width:100% !important; }</style>')
```
%% Cell type:code id: tags:
``` python
# import statements
import sqlite3
import os
import pandas as pd
from pandas import DataFrame, Series
import matplotlib
from matplotlib import pyplot as plt
matplotlib.rcParams["font.size"] = 16
import math
import requests
```
%% Cell type:code id: tags:
``` python
# TODO: read "fire_hydrants.csv" into a DataFrame
hdf = pd.read_csv("fire_hydrants.csv")
hdf.tail()
```
%% Cell type:markdown id: tags:
## Warmup
%% Cell type:code id: tags:
``` python
## Warmup 1: Create a line plot with 'year_manufactured' on the x axis and a count of the number of hydrants on the y axis. Label the axes.
```
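%% Cell type:markdown id: tags:
One possible approach to Warmup 1, sketched on a tiny made-up table so it runs without `fire_hydrants.csv`. The name `demo_hdf` and its values are assumptions for illustration only; with the real data you would use `hdf` instead.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend; only needed when running outside a notebook
import pandas as pd

# Made-up stand-in for hdf (hypothetical values, not taken from fire_hydrants.csv)
demo_hdf = pd.DataFrame({"year_manufactured": [1987, 1987, 1992, 1992, 1992, 2005, None]})

# Count hydrants per year (missing years are dropped), then sort by year
# so the line runs from the oldest year to the newest
counts = demo_hdf["year_manufactured"].value_counts().sort_index()

ax = counts.plot.line()
ax.set_xlabel("Year manufactured")
ax.set_ylabel("Number of hydrants")
```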
%% Cell type:code id: tags:
``` python
## Warmup 2: Create a stacked bar plot. The x axis should be the decade of the 'year_manufactured'. The y axis should be the counts of the different `HydrantType`s for each decade.
data=pd.DataFrame(hdf['year_manufactured']//10*10)
data.columns=['Manufactured Decade']
data['Hydrant Type']=hdf['HydrantType']
data = data.dropna()
data['Manufactured Decade']=data['Manufactured Decade'].apply(lambda d: int(d))
data=data.value_counts().unstack()
## Your Code goes Here
```
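%% Cell type:markdown id: tags:
A minimal sketch of the stacked bar step, assuming a table shaped like `data` above (one row per decade, one column per hydrant type). The frame `demo` and the type names "Type A" and "Type B" are hypothetical stand-ins, not values from the real dataset.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend; only needed when running outside a notebook
import pandas as pd

# Made-up counts: rows are decades, columns are (hypothetical) hydrant types
demo = pd.DataFrame(
    {"Type A": [3, 5, 2], "Type B": [1, 0, 4]},
    index=pd.Index([1980, 1990, 2000], name="Manufactured Decade"),
)

# stacked=True piles each column's bar on top of the previous one
ax = demo.plot.bar(stacked=True)
ax.set_xlabel("Manufactured Decade")
ax.set_ylabel("Number of hydrants")
```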
%% Cell type:markdown id: tags:
# Plotting 4
* Late days may not be used on P13
* Everything is due on Wednesday - send me an email if you have special circumstances
### Exam Conflict Form
* [Final - December 14, 7:25 pm - 9:25 pm](https://docs.google.com/forms/d/e/1FAIpQLSfJmpjKaM3t8iOwBTGAWI6jKZUqGI1Matz3bidhSbFu_c4_2g/viewform)
### Reading
* [Reading 1](https://cs220.cs.wisc.edu/s23/materials/readings/matplotlib-intro.html)
* [Reading 2](https://matplotlib.org/stable/tutorials/introductory/quick_start.html)
## Learning objectives
- how to use logarithmic axes
- how to create multiple plots within same figure
%% Cell type:markdown id: tags:
### Logarithmic scale
- `math.log(y, base)`: find an x such that `base**x == y`
- `math.log10(y)`: find an x such that `10**x == y`
%% Cell type:code id: tags:
``` python
print(math.log10(1000))
print(math.log10(1000000))
```
%% Cell type:code id: tags:
``` python
print(math.log(32, 2))
print(math.log(256, 4))
```
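%% Cell type:markdown id: tags:
To connect the definition to the calls above, here is a small sketch that computes a logarithm and then checks that raising the base to that power recovers the original number. The helper name `check_log` is made up for this example. Because floating-point logarithms can be off by a tiny amount, the check uses a tolerance rather than `==`.
%% Cell type:code id: tags:
```python
import math

def check_log(y, base):
    """Return x = log of y in the given base and confirm base**x recovers y."""
    x = math.log(y, base)
    # Compare with a tolerance: floating-point results may not be exact
    assert math.isclose(base ** x, y)
    return x

x2 = check_log(32, 2)      # 2**5 == 32
x4 = check_log(256, 4)     # 4**4 == 256
x10 = check_log(1000, 10)  # 10**3 == 1000
print(x2, x4, x10)
```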
%% Cell type:code id: tags:
``` python
def log_approx(y):
    assert type(y) == int
    assert y >= 1
    return len(str(y))
```
%% Cell type:code id: tags:
``` python
print(log_approx(123456789)) # What will this output?
print(math.log10(123456789))
```
%% Cell type:code id: tags:
``` python
print(log_approx(989898))
print(math.log10(989898))
```
%% Cell type:code id: tags:
``` python
errors = []
for y in range(1, 1000001):
    err = abs(log_approx(y) - math.log10(y))
    errors.append(err)
max(errors)
```
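%% Cell type:markdown id: tags:
Why the error never exceeds 1: a positive integer with d digits lies between 10**(d-1) and 10**d, so its base-10 logarithm lies between d-1 and d. The sketch below checks that bound with integer arithmetic over a smaller range than the cell above. The name `digit_count` is a stand-in that repeats the `log_approx` idea so the block runs on its own.
%% Cell type:code id: tags:
```python
import math

def digit_count(y):
    # Same idea as log_approx: count the decimal digits of a positive int
    return len(str(y))

worst = 0.0
for y in range(1, 100001):
    d = digit_count(y)
    # A d-digit number always sits in the range [10**(d-1), 10**d)
    assert 10 ** (d - 1) <= y < 10 ** d
    # Track the largest gap between the digit count and the true logarithm
    worst = max(worst, d - math.log10(y))
print(worst)
```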
%% Cell type:markdown id: tags:
### Why does this matter?
- Comparing two numbers:
- 134234255623423423423432423432432432
- 2342343252523
- Eventually I don't care what the exact number is; counting its digits is enough to know how big it is!
- log base 2: counting how many bits we need
- log base 10: counting how many decimal digits (0 through 9) we need
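%% Cell type:markdown id: tags:
The two numbers above can be compared by size alone, as a rough sketch: `len(str(n))` counts decimal digits and the built-in `int.bit_length()` counts bits. The variable names are chosen for this example only.
%% Cell type:code id: tags:
```python
a = 134234255623423423423432423432432432
b = 2342343252523

# Decimal digits: roughly log base 10
digits_a, digits_b = len(str(a)), len(str(b))
# Bits: roughly log base 2
bits_a, bits_b = a.bit_length(), b.bit_length()

print(digits_a, digits_b)
print(bits_a, bits_b)
```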
%% Cell type:code id: tags:
``` python
s = Series([1, 10, 100, 1000, 10000, 100000, 1000000])
s.plot.line()
```
%% Cell type:code id: tags:
``` python
s.plot.line(???)
```
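%% Cell type:markdown id: tags:
One way to fill in the blank above is the `logy=True` argument that pandas plotting accepts. On a logarithmic y axis, a series that grows by a constant factor shows up as a straight line. This is a sketch of one option, not the only way to do it.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend; only needed when running outside a notebook
from pandas import Series

s = Series([1, 10, 100, 1000, 10000, 100000, 1000000])

# logy=True switches the y axis to a logarithmic scale
ax = s.plot.line(logy=True)
ax.set_ylabel("value (log scale)")
```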
%% Cell type:markdown id: tags:
### Population example
https://ourworldindata.org/grapher/population
%% Cell type:code id: tags:
``` python
populations = pd.Series({
    "China": 1439323776,
    "India": 1380004385,
    "Mexico": 128932753,
    "Senegal": 16743927,
    "Bahrain": 1701575,
    "Grenada": 112523,
    "Tuvalu": 11792
})
```
%% Cell type:markdown id: tags:
Plot populations as a bar chart.
%% Cell type:code id: tags:
``` python
# not that readable
???
```
%% Cell type:markdown id: tags:
Now plot on a logarithmic scale.
%% Cell type:code id: tags:
``` python
???
```
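%% Cell type:markdown id: tags:
A possible version of the log-scale bar chart, again using `logy=True`. On a linear axis the smaller countries are nearly invisible next to China and India; on a log axis every bar is readable. The series is repeated here so the block runs on its own.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend; only needed when running outside a notebook
import pandas as pd

populations = pd.Series({
    "China": 1439323776,
    "India": 1380004385,
    "Mexico": 128932753,
    "Senegal": 16743927,
    "Bahrain": 1701575,
    "Grenada": 112523,
    "Tuvalu": 11792,
})

# One bar per country, with the y axis on a logarithmic scale
ax = populations.plot.bar(logy=True)
ax.set_ylabel("Population (log scale)")
```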
%% Cell type:markdown id: tags:
### Multiple *AxesSubplots* in the same plot with plt.subplots
```
fig,axes = plt.subplots()
```
* `nrows` and `ncols` -- specify the number of subplots along the rows and columns
* `sharex` and `sharey` -- boolean parameters to define if subplots use the same values along their axes
* Use the returned `axes` value (a single object for one subplot, otherwise an array indexed by position) to put a plot into each of the subplots
%% Cell type:code id: tags:
``` python
plt.subplots() # default is to create one
```
%% Cell type:code id: tags:
``` python
plt.subplots(ncols=2)
```
%% Cell type:code id: tags:
``` python
plt.subplots(nrows=2)
```
%% Cell type:code id: tags:
``` python
plt.subplots(nrows=2,figsize=(10,4))
```
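%% Cell type:markdown id: tags:
The shape of the returned `axes` depends on the grid you ask for, which is easy to trip over. A short sketch of the three cases (the variable names are chosen for this example):
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend; only needed when running outside a notebook
from matplotlib import pyplot as plt

# Default: a single Axes object, not an array
fig1, ax_single = plt.subplots()

# One row, two columns: a 1-D array, indexed as ax_row[col]
fig2, ax_row = plt.subplots(ncols=2)

# Two rows, three columns: a 2-D array, indexed as ax_grid[row, col]
fig3, ax_grid = plt.subplots(nrows=2, ncols=3, figsize=(10, 4))

print(ax_row.shape)
print(ax_grid.shape)
ax_grid[1, 2].set_title("bottom right")
```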
%% Cell type:code id: tags:
``` python
s1 = Series([1, 2, 3, 3, 4])
s2 = Series([5, 7, 7, 8])
```
%% Cell type:markdown id: tags:
Let's create a single plot with two sub figures (line plots) and plot s1 on the left and s2 on the right.
%% Cell type:code id: tags:
``` python
fig, axes = plt.subplots(ncols = 2)
# axes[0] # the area on the left
# axes[1] # the area on the right
s1.plot.line(ax=axes[0])
s2.plot.line(ax=axes[1])
```
%% Cell type:markdown id: tags:
What is wrong with the plots above?
The y-axes are misleading: each subplot picks its own scale, so the two lines look comparable when they are not. Use the `sharex` and `sharey` parameters.
%% Cell type:code id: tags:
``` python
# fix the misleading y axes
???
```
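%% Cell type:markdown id: tags:
One way to fix the misleading axes is to pass `sharey=True`, so both subplots use the same y range. A sketch with `s1` and `s2` repeated so the block runs on its own:
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend; only needed when running outside a notebook
from matplotlib import pyplot as plt
from pandas import Series

s1 = Series([1, 2, 3, 3, 4])
s2 = Series([5, 7, 7, 8])

# sharey=True forces both subplots onto one common y scale
fig, axes = plt.subplots(ncols=2, sharey=True)
s1.plot.line(ax=axes[0])
s2.plot.line(ax=axes[1])
```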
%% Cell type:markdown id: tags:
### Iris dataset
%% Cell type:code id: tags:
``` python
# Gather the data.
resp = requests.get("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
resp.raise_for_status()
# Save the download locally; the with-block closes the file automatically
with open("iris.csv", "w") as iris_f:
    iris_f.write(resp.text)
iris_df = pd.read_csv("iris.csv",
                      names=["sep-len", "sep-wid", "pet-len", "pet-wid", "class"])
iris_df.head()
```
%% Cell type:code id: tags:
``` python
# for the `Iris-setosa` class, plot the sepal length vs sepal width
```
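%% Cell type:markdown id: tags:
A minimal sketch of the single-class scatter plot. To keep it runnable without the download, it uses `iris_demo`, a hand-typed four-row sample with the same column names as `iris_df`; that name and the choice of rows are assumptions for illustration. With the full data you would filter `iris_df` the same way.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend; only needed when running outside a notebook
import pandas as pd

# Small hand-typed sample standing in for the downloaded iris_df
iris_demo = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
        [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
        [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
        [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
    ],
    columns=["sep-len", "sep-wid", "pet-len", "pet-wid", "class"],
)

# Keep only the rows for one class, then scatter one column against another
setosa = iris_demo[iris_demo["class"] == "Iris-setosa"]
ax = setosa.plot.scatter(x="sep-len", y="sep-wid")
ax.set_title("Iris-setosa")
```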
%% Cell type:code id: tags:
``` python
colors = ["r", "g", "b"]
markers = [".", "^", "v"]
varieties = sorted(set(iris_df["class"]))  # sorted so the order is the same on every run
varieties
# create a 3 column plot
# Plot the sepal length vs the sepal width for each of the 3 classes of flowers.
```
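%% Cell type:markdown id: tags:
One way the three-column plot could be put together: loop over the subplots, classes, colors, and markers in parallel with `zip`. The helper `plot_by_class` and the sample frame `iris_demo` are hypothetical names introduced for this sketch; the sample is a few hand-typed rows standing in for the full `iris_df`.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend; only needed when running outside a notebook
from matplotlib import pyplot as plt
import pandas as pd

# Small hand-typed sample standing in for the downloaded iris_df
iris_demo = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
        [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
        [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
        [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
        [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
        [5.8, 2.7, 5.1, 1.9, "Iris-virginica"],
    ],
    columns=["sep-len", "sep-wid", "pet-len", "pet-wid", "class"],
)

def plot_by_class(df, x, y):
    """Draw one scatter subplot per class, all sharing the same axes ranges."""
    colors = ["r", "g", "b"]
    markers = [".", "^", "v"]
    varieties = sorted(set(df["class"]))  # sorted for a stable left-to-right order
    fig, axes = plt.subplots(ncols=len(varieties), sharex=True, sharey=True,
                             figsize=(12, 4))
    for ax, variety, color, marker in zip(axes, varieties, colors, markers):
        subset = df[df["class"] == variety]
        subset.plot.scatter(x=x, y=y, ax=ax, color=color, marker=marker)
        ax.set_title(variety)
    return fig, axes

fig, axes = plot_by_class(iris_demo, "sep-len", "sep-wid")
```
%% Cell type:markdown id: tags:
Passing different column names, for example `plot_by_class(iris_demo, "pet-len", "pet-wid")`, reuses the same layout for another pair of measurements.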
%% Cell type:markdown id: tags:
## Do this again, but for petal length vs petal width
%% Cell type:code id: tags:
``` python
colors = ["r", "g", "b"]
markers = [".", "^", "v"]
varieties = sorted(set(iris_df["class"]))  # sorted so the order is the same on every run
```