%% Cell type:code id: tags:
``` python
# Warmup 0
import pandas as pd
import math
import copy
import requests
import matplotlib
import numpy as np
from matplotlib import pyplot as plt
matplotlib.rcParams["font.size"] = 16
```
%% Cell type:code id: tags:
``` python
# Let's compare populations from around the world!
populations = pd.Series({
    "China": 1439323776,
    "India": 1380004385,
    "Mexico": 128932753,
    "Senegal": 16743927,
    "Bahrain": 1701575,
    "Grenada": 112523,
    "Tuvalu": 11792
})
```
%% Cell type:code id: tags:
``` python
# Plot this as a bar chart... What's the issue? The smaller countries' bars are too short to see, so it's hard to compare!
populations.plot.bar()
```
%% Cell type:code id: tags:
``` python
# Now plot it using a logarithmic scale!
populations.plot.bar(logy = True)
```
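%% Cell type:markdown id: tags:
Why does the log scale help? On a linear axis, bar heights are proportional to the raw numbers. On a log axis, every factor of 10 takes up the same vertical distance. The minimal sketch below uses two of the populations above. It compares their raw ratio with the base-10 logarithms that a log axis effectively plots.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Two of the populations from the cell above
populations = pd.Series({"China": 1439323776, "Tuvalu": 11792})

# On a linear axis, bar heights are proportional to the raw values...
ratio = populations["China"] / populations["Tuvalu"]

# ...but on a log axis, positions are proportional to log10 of the values
magnitudes = np.log10(populations)

print(round(ratio))
print(magnitudes.round(2))
```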
%% Cell type:code id: tags:
``` python
# Plot the populations again but using a horizontal bar chart...
populations.plot.barh(logx = True)
```
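%% Cell type:markdown id: tags:
Horizontal bars are often easier to read when they are sorted. The small sketch below uses a subset of the data above. It sorts the Series with `sort_values()` before plotting and checks that `logx=True` switched the x-axis to a log scale. The `matplotlib.use("Agg")` line is only there so the sketch also runs outside Jupyter without a display. Leave it out in a notebook.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import pandas as pd

populations = pd.Series({"India": 1380004385, "Tuvalu": 11792, "Senegal": 16743927})

# Sort ascending so the largest bar ends up at the top of a barh plot
sorted_pops = populations.sort_values()
ax = sorted_pops.plot.barh(logx=True)
ax.set_xlabel("Population (log scale)")
```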
%% Cell type:code id: tags:
``` python
# Save the hydrants data into a variable called "hydrants"
hydrants = pd.read_csv("Fire_Hydrants.csv")
hydrants.head()
```
%% Cell type:code id: tags:
``` python
# What are the styles for each hydrant? Drop any NA values and
# save to a variable called hydrant_styles
hydrant_styles = hydrants["Style"].dropna()
hydrant_styles
```
%% Cell type:code id: tags:
``` python
# How many of each type are there?
hydrant_styles.value_counts()
```
%% Cell type:code id: tags:
``` python
# Clean the data first by making it uppercase...
hydrant_styles = hydrant_styles.str.upper()
hydrant_styles.value_counts()
```
%% Cell type:code id: tags:
``` python
# ... and just the first word
hydrant_styles = hydrant_styles.str.split(" ").apply(lambda hyd : hyd[0])
hydrant_styles.value_counts()
```
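%% Cell type:markdown id: tags:
The `apply(lambda hyd : hyd[0])` step picks the first word from each split list. pandas also lets `.str[0]` index into list-valued entries directly. The minimal sketch below uses hypothetical style strings, not rows from `Fire_Hydrants.csv`, to check that the two forms agree.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical raw style strings with inconsistent case and extra words
styles = pd.Series(["pacer", "Pacer WB-67", "MUELLER Centurion", "m-3"])

# Approach from the cell above: uppercase, split on spaces, keep the first word
with_apply = styles.str.upper().str.split(" ").apply(lambda words: words[0])

# Equivalent vectorized form: .str[0] picks element 0 of each list
with_str = styles.str.upper().str.split(" ").str[0]

print(with_str.value_counts())
```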
%% Cell type:code id: tags:
``` python
# Save this to a new column called "Clean_Style"
hydrants["Clean_Style"] = hydrant_styles
hydrants
```
%% Cell type:code id: tags:
``` python
# Save the year_manufactured for "PACER" hydrants to a variable called pacer_years
pacer_years = hydrants[hydrants["Clean_Style"] == "PACER"]["year_manufactured"]
pacer_years
```
%% Cell type:code id: tags:
``` python
# Save the year_manufactured for "MUELLER" hydrants to a variable called mueller_years
mueller_years = hydrants[hydrants["Clean_Style"] == "MUELLER"]["year_manufactured"]
mueller_years
```
%% Cell type:code id: tags:
``` python
# Save the year_manufactured for "M-3" hydrants to a variable called m3_years
m3_years = hydrants[hydrants["Clean_Style"] == "M-3"]["year_manufactured"]
m3_years
```
%% Cell type:code id: tags:
``` python
# Save the year_manufactured for all other hydrants to a variable called other_years
other_years = hydrants["year_manufactured"]
other_years = other_years[hydrants["Clean_Style"] != "PACER"]
other_years = other_years[hydrants["Clean_Style"] != "MUELLER"]
other_years = other_years[hydrants["Clean_Style"] != "M-3"]
other_years
```
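%% Cell type:markdown id: tags:
The chained filtering above applies one `!=` mask per named style. A single mask built with `isin` expresses "not one of these styles" in one step. The minimal sketch below runs on a hypothetical miniature table and checks that both versions keep the same rows. In both versions, a hydrant with a missing style counts as "other".
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical miniature version of the hydrants table
df = pd.DataFrame({
    "Clean_Style": ["PACER", "MUELLER", "M-3", "W-59", None, "MEDALLION"],
    "year_manufactured": [1990, 1975, 1960, 2001, 1985, None],
})
named = ["PACER", "MUELLER", "M-3"]

# Chained filtering, as in the cell above
chained = df["year_manufactured"]
for style in named:
    chained = chained[df["Clean_Style"] != style]

# One boolean mask: keep rows whose style is NOT in the named list
other_years = df.loc[~df["Clean_Style"].isin(named), "year_manufactured"]
print(other_years)
```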
%% Cell type:code id: tags:
``` python
# Drop the NA values and cast as int
pacer_years = pacer_years.dropna().astype(int)
mueller_years = mueller_years.dropna().astype(int)
m3_years = m3_years.dropna().astype(int)
other_years = other_years.dropna().astype(int)
other_years
```
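%% Cell type:markdown id: tags:
Why drop NA values before casting? A column containing `NaN` is stored as floats, and `NaN` has no integer representation, so `astype(int)` fails until the missing values are gone. A minimal sketch with made-up years:
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

years = pd.Series([1987, np.nan, 2004])
print(years.dtype)  # float64: the NaN forces a float column

try:
    years.astype(int)  # fails: NaN cannot be converted to an integer
    cast_failed = False
except ValueError:
    cast_failed = True

# Dropping the missing values first makes the cast succeed
clean_years = years.dropna().astype(int)
print(clean_years)
```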
%% Cell type:code id: tags:
``` python
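# Group by decade: round each year down to its decade (e.g. 1987 -> 1980),
# then count how many hydrants fall in each decade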
pacer_decades = (pacer_years // 10 * 10).value_counts()
mueller_decades = (mueller_years // 10 * 10).value_counts()
m3_decades = (m3_years // 10 * 10).value_counts()
other_decades = (other_years // 10 * 10).value_counts()
other_decades
```
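%% Cell type:markdown id: tags:
The `// 10 * 10` trick works because floor division by 10 drops the last digit and multiplying by 10 puts a zero back. A minimal sketch on a handful of made-up years:
%% Cell type:code id: tags:
```python
import pandas as pd

years = pd.Series([1987, 1980, 1999, 2004, 1923])

# // 10 drops the last digit (1987 -> 198); * 10 restores the place value (198 -> 1980)
decades = years // 10 * 10

# Count how many years fall in each decade, ordered by decade
decade_counts = decades.value_counts().sort_index()
print(decade_counts)
```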
%% Cell type:code id: tags:
``` python
# Make a dataframe out of this data.
style_df = pd.DataFrame({
    "PACER": pacer_decades.sort_index(),
    "MUELLER": mueller_decades.sort_index(),
    "M-3": m3_decades.sort_index(),
    "OTHER": other_decades.sort_index()
})
style_df
```
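%% Cell type:markdown id: tags:
When a DataFrame is built from a dict of Series, pandas lines the Series up by their index labels. Any decade missing from one style becomes `NaN` in that style's column, and that is where the gaps in the next plots come from. A minimal sketch with hypothetical counts for two styles:
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical decade counts for two styles that cover different decades
a = pd.Series({1960: 3, 1970: 5})
b = pd.Series({1970: 2, 1980: 4})

# The DataFrame's index is the union of both indexes;
# decades missing from one Series become NaN in that column
df = pd.DataFrame({"A": a, "B": b})
print(df)

# fillna(0) turns "no hydrants recorded" into an explicit count of 0
filled = df.fillna(0)
print(filled)
```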
%% Cell type:code id: tags:
``` python
# Visualize with a line plot.
ax = style_df.plot.line()
ax.set_xlabel("Decade")
ax.set_ylabel("Hydrant Count")
```
%% Cell type:code id: tags:
``` python
# Replace the NA values with 0
style_df = style_df.fillna(0)
style_df.plot.line()
```
%% Cell type:code id: tags:
``` python
# Visualize only the decades since 1950 as a bar plot
ax = style_df[style_df.index >= 1950].plot.bar()
ax.set_xlabel("Decade")
ax.set_ylabel("Hydrant Count")
```
%% Cell type:code id: tags:
``` python
# Stack the bar plots rather than making them appear side-by-side.
ax = style_df[style_df.index >= 1950].plot.bar(stacked = True)
ax.set_xlabel("Decade")
ax.set_ylabel("Hydrant Count")
None
```
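%% Cell type:markdown id: tags:
In a stacked bar plot, each segment starts where the segment below it ends, so the top of each full bar equals that row's total. The minimal sketch below uses a hypothetical two-decade table and checks this by reading the drawn rectangles back from `ax.patches`. It assumes pandas draws all bars of the first column before those of the second. The `Agg` backend line is only for running outside Jupyter.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import pandas as pd

counts = pd.DataFrame({"A": [1, 4], "B": [2, 3]}, index=[1990, 2000])
ax = counts.plot.bar(stacked=True)

# Bars are drawn column by column: first all "A" bars, then all "B" bars.
# Each "B" bar starts on top of its "A" bar, so its top is the row total.
n_rows = len(counts)
tops = [bar.get_y() + bar.get_height() for bar in ax.patches[n_rows:]]
totals = counts.sum(axis=1).tolist()
print(tops, totals)
```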
%% Cell type:code id: tags:
``` python
# Let's put all of the above together!
hydrants = pd.read_csv("Fire_Hydrants.csv")
hydrants["Clean_Style"] = hydrants["Style"].dropna() \
.str.upper() \
.str.split(" ") \
.apply(lambda hyd : hyd[0])
typs = ["PACER", "MUELLER", "M-3"]
typ_to_decades_dict = {}
for typ in typs:
    years = hydrants[hydrants["Clean_Style"] == typ]["year_manufactured"]
    clean_years = years.dropna().astype(int)
    decades = (clean_years // 10 * 10).value_counts()
    typ_to_decades_dict[typ] = decades.sort_index()
other_years = hydrants["year_manufactured"]
for typ in typs:
    other_years = other_years[hydrants["Clean_Style"] != typ]
typ_to_decades_dict["OTHER"] = (other_years.dropna().astype(int) // 10 * 10).value_counts().sort_index()
style_df = pd.DataFrame(typ_to_decades_dict).fillna(0)
style_df.plot.line()
```
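%% Cell type:markdown id: tags:
The combined cell above still builds one Series per style in a loop. As an alternative (not the method used in this lecture), `pd.crosstab` can count every (decade, style group) pair in one call. The minimal sketch below runs on a hypothetical miniature table. One difference to note: a named style with no rows at all (M-3 here) gets no column, whereas the loop version always creates one.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical miniature version of the hydrants table
df = pd.DataFrame({
    "Clean_Style": ["PACER", "PACER", "MUELLER", "W-59", None],
    "year_manufactured": [1987, 1992, 1981, 1975, 1960],
})
named = ["PACER", "MUELLER", "M-3"]

# Relabel every style outside the named list (including missing ones) as "OTHER"
group = df["Clean_Style"].where(df["Clean_Style"].isin(named), "OTHER")

# Round each year down to its decade
decade = df["year_manufactured"] // 10 * 10

# Count rows for every (decade, group) pair; absent pairs are filled with 0
table = pd.crosstab(decade, group)
print(table)
```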
%% Cell type:code id: tags:
``` python
# Let's remind ourselves of what the "features" of our data are.
hydrants.columns
```
%% Cell type:code id: tags:
``` python
# Plot all of the hydrants' X and Y coordinates on a scatter plot!
ax = hydrants.plot.scatter(x="X", y="Y", xlabel="Longitude", ylabel="Latitude")
ax.grid()
```
%% Cell type:code id: tags:
``` python
# What about just "PACER" fire hydrants?
pacer_hyds = hydrants[hydrants["Clean_Style"] == "PACER"]
pacer_hyds.plot.scatter(x="X", y="Y")
```
%% Cell type:code id: tags:
``` python
# What about just "MEDALLION" fire hydrants?
medallion_hyds = hydrants[hydrants["Clean_Style"] == "MEDALLION"]
ax = medallion_hyds.plot.scatter(x="X", y="Y")
# What's deceiving about this? The axis limits were fit to the MEDALLION hydrants alone,
# so this plot is not on the same scale as the plot of all hydrants.
```
%% Cell type:code id: tags:
``` python
# What are the minimum and maximum coordinates?
min_x = hydrants["X"].min()
max_x = hydrants["X"].max()
min_y = hydrants["Y"].min()
max_y = hydrants["Y"].max()
print(min_x, max_x)
print(min_y, max_y)
```
%% Cell type:code id: tags:
``` python
# Does the plot show all these coordinates?
# Each assertion checks one edge; an AssertionError means part of the full range is cut off.
assert ax.get_xlim()[0] <= min_x
assert ax.get_xlim()[1] >= max_x
assert ax.get_ylim()[0] <= min_y
assert ax.get_ylim()[1] >= max_y
```
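%% Cell type:markdown id: tags:
The four assertions above can be wrapped in a reusable check. The minimal sketch below defines a hypothetical helper, `covers`, which is not part of pandas or matplotlib. It plots a subset of some made-up points and shows that the automatic limits miss the full range. It then reuses the full data's extent, as the `xlim`/`ylim` arguments do in the MEDALLION plot, and checks again. The `Agg` backend line is only for running outside Jupyter.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import pandas as pd

def covers(ax, xs, ys):
    """Hypothetical helper: True if ax's current x and y limits include every point."""
    x_lo, x_hi = ax.get_xlim()
    y_lo, y_hi = ax.get_ylim()
    return x_lo <= min(xs) and x_hi >= max(xs) and y_lo <= min(ys) and y_hi >= max(ys)

# Made-up coordinates: the subset occupies only one corner of the full range
points = pd.DataFrame({"X": [0, 1, 10], "Y": [0, 1, 10]})
subset = points.iloc[:2]

ax = subset.plot.scatter(x="X", y="Y")
auto_covers = covers(ax, points["X"], points["Y"])  # limits were fit to the subset only

# Reuse the full data's extent for both axes
ax.set_xlim(points["X"].min(), points["X"].max())
ax.set_ylim(points["Y"].min(), points["Y"].max())
fixed_covers = covers(ax, points["X"], points["Y"])
print(auto_covers, fixed_covers)
```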
%% Cell type:code id: tags:
``` python
# Now plot the "MEDALLION" fire hydrants using these limits...
ax = medallion_hyds.plot.scatter(x="X", y="Y", xlim=[min_x, max_x], ylim=[min_y, max_y])
assert ax.get_xlim()[0] <= min_x
assert ax.get_xlim()[1] >= max_x
assert ax.get_ylim()[0] <= min_y
assert ax.get_ylim()[1] >= max_y
```
%% Cell type:code id: tags:
``` python
# What if we wanted to plot 4 different types of hydrants
# in four different "quadrants" of the graph?
```
%% Cell type:code id: tags:
``` python
# Let's dive into subplots!
plt.subplots()
```
%% Cell type:code id: tags:
``` python
# We can ask for several columns...
plt.subplots(ncols = 2)
```
%% Cell type:code id: tags:
``` python
# ... or rows
plt.subplots(nrows = 2)
```
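%% Cell type:markdown id: tags:
The shape of what `plt.subplots` returns depends on the layout requested. A single plot gives one Axes object, one row or column gives a 1-D array, and a grid gives a 2-D array indexed as `axes[row][col]`. A minimal sketch that checks this (the `Agg` backend line is only for running outside Jupyter):
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig1, ax = plt.subplots()                    # a single Axes object
fig2, row = plt.subplots(ncols=2)            # 1-D array holding 2 Axes
fig3, grid = plt.subplots(nrows=2, ncols=3)  # 2-D array: grid[row][col]
print(row.shape, grid.shape)
```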
%% Cell type:code id: tags:
``` python
# Let's create some data to plot...
s1 = pd.Series([1, 2, 3, 3, 4])
s2 = pd.Series([5, 7, 7, 8])
```
%% Cell type:code id: tags:
``` python
# Creating the plots...
fig, axes = plt.subplots(ncols = 2)
# Plotting the data...
s1.plot.line(ax = axes[0])
s2.plot.line(ax = axes[1])
# axes[0] is the area on the left, axes[1] is the area on the right
# What's misleading about this? Each plot has its own y-axis scale, so s2's
# larger values (5 to 8) look no higher than s1's (1 to 4).
```
%% Cell type:code id: tags:
``` python
# Fix this by adding sharey=True so both plots use the same y-axis
fig, axes = plt.subplots(ncols = 2, sharey = True)
s1.plot.line(ax = axes[0])
s2.plot.line(ax = axes[1])
```
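%% Cell type:markdown id: tags:
`sharey=True` makes matplotlib compute one set of y limits from the data in every shared plot. The minimal sketch below compares the limits with and without sharing, using the same `s1` and `s2` as above. The `Agg` backend line is only for running outside Jupyter.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

s1 = pd.Series([1, 2, 3, 3, 4])
s2 = pd.Series([5, 7, 7, 8])

# Without sharing: each Axes autoscales to its own data
fig, axes = plt.subplots(ncols=2)
s1.plot.line(ax=axes[0])
s2.plot.line(ax=axes[1])
separate = (axes[0].get_ylim(), axes[1].get_ylim())

# With sharing: both Axes use limits that fit the data of both
fig, shared_axes = plt.subplots(ncols=2, sharey=True)
s1.plot.line(ax=shared_axes[0])
s2.plot.line(ax=shared_axes[1])
shared = (shared_axes[0].get_ylim(), shared_axes[1].get_ylim())
print(separate, shared)
```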
%% Cell type:code id: tags:
``` python
# An example... Evolution of fire hydrants through the ages
# (each panel is cumulative: it shows every hydrant manufactured before the end of that decade)
fig, axes = plt.subplots(ncols = 2, nrows = 3, sharex=True, sharey=True, figsize=(8.5, 11))
hydrants[hydrants["year_manufactured"] < 1930].plot.scatter(x="X", y="Y", ax=axes[0][0], title="1920s")
hydrants[hydrants["year_manufactured"] < 1950].plot.scatter(x="X", y="Y", ax=axes[0][1], title="1940s")
hydrants[hydrants["year_manufactured"] < 1970].plot.scatter(x="X", y="Y", ax=axes[1][0], title="1960s")
hydrants[hydrants["year_manufactured"] < 1990].plot.scatter(x="X", y="Y", ax=axes[1][1], title="1980s")
hydrants[hydrants["year_manufactured"] < 2010].plot.scatter(x="X", y="Y", ax=axes[2][0], title="2000s")
hydrants.plot.scatter(x="X", y="Y", ax=axes[2][1], title="Today")
fig.suptitle("Evolution of Hydrants from 1920s-Today")
```
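%% Cell type:markdown id: tags:
The six near-identical lines above can also be driven by a loop. `axes.flat` walks the 3x2 grid row by row, so it pairs up with a list of cutoff years. The minimal sketch below runs on hypothetical toy hydrants, one per period so that every panel has something to draw. Here `None` stands for "no cutoff" (the "Today" panel). The `Agg` backend line is only for running outside Jupyter.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical hydrants: one manufactured in each period
df = pd.DataFrame({
    "X": [0, 1, 2, 3, 4, 5],
    "Y": [0, 1, 0, 1, 0, 1],
    "year_manufactured": [1925, 1945, 1965, 1985, 2005, 2015],
})
cutoffs = [1930, 1950, 1970, 1990, 2010, None]  # None means "no cutoff"

fig, axes = plt.subplots(ncols=2, nrows=3, sharex=True, sharey=True)
# axes.flat visits axes[0][0], axes[0][1], axes[1][0], ... in order
for ax, cutoff in zip(axes.flat, cutoffs):
    if cutoff is None:
        subset, title = df, "Today"
    else:
        subset, title = df[df["year_manufactured"] < cutoff], f"{cutoff - 10}s"
    subset.plot.scatter(x="X", y="Y", ax=ax, title=title)
fig.suptitle("Evolution of Hydrants (toy data)")
```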
%% Cell type:code id: tags:
``` python
# Compare the locations of PACER, MUELLER, and W-59 hydrants
# to those of all hydrants in the City of Madison
fig, axes = plt.subplots(ncols = 2, nrows = 2, sharex=True, sharey=True)
hydrants.plot.scatter(x="X", y="Y", ax=axes[0][0], title="All Hydrants")
hydrants[hydrants["Clean_Style"] == "PACER"].plot.scatter(x="X", y="Y", ax=axes[1][0], title="Pacer", color="salmon")
hydrants[hydrants["Clean_Style"] == "MUELLER"].plot.scatter(x="X", y="Y", ax=axes[1][1], title="Mueller", color="goldenrod")
hydrants[hydrants["Clean_Style"] == "W-59"].plot.scatter(x="X", y="Y", ax=axes[0][1], title="W-59", color="green")
```
%% Cell type:code id: tags:
``` python
# Let's remind ourselves of the hydrant colors...
hydrants["nozzle_color"].str.upper().value_counts()
```
%% Cell type:code id: tags:
``` python
# What are the locations of the hydrants with
# BLUE, GREEN, and ORANGE "nozzle_color"?
fig, axes = plt.subplots(ncols = 3, figsize=(10, 2.5), sharex=True, sharey=True)
hydrants[hydrants["nozzle_color"].str.upper() == "BLUE"].plot.scatter(x="X", y="Y", color="blue", ax=axes[0])
hydrants[hydrants["nozzle_color"].str.upper() == "GREEN"].plot.scatter(x="X", y="Y", color="green", ax=axes[1])
hydrants[hydrants["nozzle_color"].str.upper() == "ORANGE"].plot.scatter(x="X", y="Y", color="orange", ax=axes[2])
```
%% Cell type:code id: tags:
``` python
# Can you do it with less copying and pasting? Use a loop!
colors = ["BLUE", "GREEN", "ORANGE"]
fig, axes = plt.subplots(ncols = 3, figsize=(10, 2.5), sharex=True, sharey=True)
for i in range(len(colors)):
    color = colors[i]
    hyds_of_color = hydrants[hydrants["nozzle_color"].str.upper() == color]
    hyds_of_color.plot.scatter(x="X", y="Y", color=color, ax=axes[i], title=color.lower())
```
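%% Cell type:markdown id: tags:
Another option (a swapped-in technique, not the one used above) is to split the rows once with `groupby` on the uppercased color, rather than re-filtering the whole table for each panel. The minimal sketch below runs on hypothetical toy rows. A dict comprehension collects the groups because `dict(df.groupby(...))` does not work: the GroupBy object has a non-callable `keys` attribute. The `Agg` backend line is only for running outside Jupyter.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical hydrants with inconsistently capitalized nozzle colors
df = pd.DataFrame({
    "X": [0, 1, 2, 3, 4],
    "Y": [0, 1, 1, 0, 2],
    "nozzle_color": ["blue", "Blue", "green", "ORANGE", "red"],
})
colors = ["BLUE", "GREEN", "ORANGE"]

# Split the rows once by normalized color: {"BLUE": rows, "GREEN": rows, ...}
groups = {color: rows for color, rows in df.groupby(df["nozzle_color"].str.upper())}

fig, axes = plt.subplots(ncols=len(colors), figsize=(10, 2.5), sharex=True, sharey=True)
for ax, color in zip(axes, colors):
    groups[color].plot.scatter(x="X", y="Y", color=color.lower(), ax=ax, title=color.lower())
```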
%% Cell type:code id: tags:
``` python
```