Lec 39 Plotting 4

Commit e4262b25 authored 4 months ago by LOUIS TYRRELL OLIPHANT
--- a/f24/Louis_Lecture_Notes/39_Plotting4/Lec39_Plotting4.ipynb
+++ b/f24/Louis_Lecture_Notes/39_Plotting4/Lec39_Plotting4.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this cell to make the emphasized text red and use the full width of the screen\n",
+    "from IPython.core.display import HTML\n",
+    "HTML('<style>em { color: red; }</style> <style>.container {width:100% !important; }</style>')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# import statements\n",
+    "import sqlite3\n",
+    "import os\n",
+    "\n",
+    "import pandas as pd\n",
+    "from pandas import DataFrame, Series\n",
+    "\n",
+    "import matplotlib\n",
+    "from matplotlib import pyplot as plt\n",
+    "matplotlib.rcParams[\"font.size\"] = 16\n",
+    "\n",
+    "import math\n",
+    "\n",
+    "import requests"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# TODO: read \"fire_hydrants.csv\" into a DataFrame\n",
+    "hdf = pd.read_csv(\"fire_hydrants.csv\")\n",
+    "hdf.tail()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Warmup"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "## Warmup 1: Create a line plot with 'year_manufactured' on the x axis and a count of the number of hydrants on the y axis.  Label the axes.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "## Warmup 2: Create a stacked bar plot.  The x axis should be the decade of the 'year_manufactured'.  The y axis should be the counts of the different `HydrantType`s for each decade.\n",
+    "\n",
+    "data=pd.DataFrame(hdf['year_manufactured']//10*10)\n",
+    "data.columns=['Manufactured Decade']\n",
+    "data['Hydrant Type']=hdf['HydrantType']\n",
+    "data = data.dropna()\n",
+    "data['Manufactured Decade']=data['Manufactured Decade'].apply(lambda d: int(d))\n",
+    "data=data.value_counts().unstack()\n",
+    "\n",
+    "\n",
+    "## Your Code goes Here\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Plotting 4\n",
+    "\n",
+    "* Late days may not be used on P13\n",
+    "* Everything is due on Wednesday - send me an email if you have special circumstances\n",
+    "\n",
+    "\n",
+    "### Exam Conflict Form\n",
+    "* [Final - December 14, 7:25 pm - 9:25 pm](https://docs.google.com/forms/d/e/1FAIpQLSfJmpjKaM3t8iOwBTGAWI6jKZUqGI1Matz3bidhSbFu_c4_2g/viewform)\n",
+    "\n",
+    "### Reading\n",
+    "* [Reading 1](https://cs220.cs.wisc.edu/s23/materials/readings/matplotlib-intro.html)\n",
+    "* [Reading 2](https://matplotlib.org/stable/tutorials/introductory/quick_start.html)\n",
+    "\n",
+    "## Learning objectives\n",
+    "- how to use logarithmic axes\n",
+    "- how to create multiple plots within same figure"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Logarithmic scale\n",
+    "- math.log(y, base)\n",
+    "- find an x, such that 10**x == y\n",
+    "    - math.log10(y)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(math.log10(1000))\n",
+    "print(math.log10(1000000))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(math.log(32, 2))\n",
+    "print(math.log(256, 4))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def log_approx(y):\n",
+    "    assert type(y) == int\n",
+    "    assert y >= 1\n",
+    "    return len(str(y))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(log_approx(123456789)) # What will this output?\n",
+    "print(math.log10(123456789))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(log_approx(989898))\n",
+    "print(math.log10(989898))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "errors = []\n",
+    "for y in range(1, 1000001):\n",
+    "    err = abs(log_approx(y) - math.log10(y))\n",
+    "    errors.append(err)\n",
+    "max(errors)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Why does this matter?\n",
+    "- Comparing two numbers:\n",
+    "     - 134234255623423423423432423432432432\n",
+    "     - 2342343252523\n",
+    "\n",
+    "- Past a certain size, the exact value matters less than its number of digits, which tells us how big the number is!\n",
+    "- log base 2: counting how many bits (binary digits) we need\n",
+    "- log base 10: counting how many decimal digits (0 through 9) we need"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "s = Series([1, 10, 100, 1000, 10000, 100000, 1000000])\n",
+    "s.plot.line()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "s.plot.line(???)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Population example\n",
+    "https://ourworldindata.org/grapher/population"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "populations = pd.Series({\n",
+    "        \"China\":1439323776,\n",
+    "        \"India\": 1380004385,\n",
+    "        \"Mexico\": 128932753,\n",
+    "        \"Senegal\":16743927,\n",
+    "        \"Bahrain\":1701575,\n",
+    "        \"Grenada\":112523,\n",
+    "        \"Tuvalu\": 11792\n",
+    "})"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Plot populations as a bar chart."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# not that readable\n",
+    "???"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now plot on a logarithmic scale."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "???"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Multiple *axessubplots* in the same plot with plt.subplots\n",
+    "\n",
+    "```\n",
+    "fig,axes = plt.subplots()\n",
+    "```\n",
+    "\n",
+    "* `nrows` and `ncols` -- specify the number of subplots along the rows and columns\n",
+    "* `sharex` and `sharey` -- boolean parameters to define if subplots use the same values along their axes\n",
+    "* Use the multi-indexed `axes` returned value to put a plot into each of the subplots"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.subplots()  # default is to create one subplot"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.subplots(ncols=2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.subplots(nrows=2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.subplots(nrows=2,figsize=(10,4))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "s1 = Series([1, 2, 3, 3, 4])\n",
+    "s2 = Series([5, 7, 7, 8])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's create a single figure with two subplots (line plots), with s1 on the left and s2 on the right."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fig, axes = plt.subplots(ncols = 2)\n",
+    "# axes[0] # the area on the left\n",
+    "# axes[1] # the area on the right\n",
+    "s1.plot.line(ax=axes[0])\n",
+    "s2.plot.line(ax=axes[1])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "What is wrong with the plots above?\n",
+    "\n",
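+    "The y-axes are misleading: each subplot chooses its own y range, so the two lines look similar in height even though the values in s2 are larger.  Use the `sharex` and `sharey` parameters."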
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# fix the misleading y axes\n",
+    "\n",
+    "???\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Iris dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Gather the data.\n",
+    "resp = requests.get(\"https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data\")\n",
+    "resp.raise_for_status()\n",
+    "\n",
+    "with open(\"iris.csv\", \"w\") as iris_f:\n",
+    "    iris_f.write(resp.text)\n",
+    "\n",
+    "iris_df = pd.read_csv(\"iris.csv\",\n",
+    "                 names = [\"sep-len\", \"sep-wid\", \"pet-len\", \"pet-wid\", \"class\"])\n",
+    "iris_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# for the `Iris-setosa` class, plot the sepal length vs sepal width\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "colors = [\"r\", \"g\", \"b\"]\n",
+    "markers = [\".\", \"^\", \"v\"]\n",
+    "\n",
+    "varieties = list(set(iris_df[\"class\"]))\n",
+    "\n",
+    "varieties\n",
+    "\n",
+    "# create a 3 column plot\n",
+    "# Plot the sepal length vs the sepal width for each of the 3 classes of flowers.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Do this again, but for petal length vs petal width"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "colors = [\"r\", \"g\", \"b\"]\n",
+    "markers = [\".\", \"^\", \"v\"]\n",
+    "\n",
+    "varieties = list(set(iris_df[\"class\"]))\n",
+    "\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
+%% Cell type:code id: tags:
+
+``` python
+# Run this cell to make the emphasized text red and use the full width of the screen
+from IPython.core.display import HTML
+HTML('<style>em { color: red; }</style> <style>.container {width:100% !important; }</style>')
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# import statements
+import sqlite3
+import os
+
+import pandas as pd
+from pandas import DataFrame, Series
+
+import matplotlib
+from matplotlib import pyplot as plt
+matplotlib.rcParams["font.size"] = 16
+
+import math
+
+import requests
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# TODO: read "fire_hydrants.csv" into a DataFrame
+hdf = pd.read_csv("fire_hydrants.csv")
+hdf.tail()
+```
+
+%% Cell type:markdown id: tags:
+
+## Warmup
+
+%% Cell type:code id: tags:
+
+``` python
+## Warmup 1: Create a line plot with 'year_manufactured' on the x axis and a count of the number of hydrants on the y axis.  Label the axes.
+```
+
+%% Cell type:code id: tags:
+
+``` python
+## Warmup 2: Create a stacked bar plot.  The x axis should be the decade of the 'year_manufactured'.  The y axis should be the counts of the different `HydrantType`s for each decade.
+
+data=pd.DataFrame(hdf['year_manufactured']//10*10)
+data.columns=['Manufactured Decade']
+data['Hydrant Type']=hdf['HydrantType']
+data = data.dropna()
+data['Manufactured Decade']=data['Manufactured Decade'].apply(lambda d: int(d))
+data=data.value_counts().unstack()
+
+
+## Your Code goes Here
+```
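
For reference, a table shaped like the one `value_counts().unstack()` produces (decades as rows, one column per type) can be drawn as a stacked bar plot with `stacked=True`. This is only a sketch: the counts and the type names `Type A` and `Type B` are made up for illustration and are not taken from `fire_hydrants.csv`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; not needed inside Jupyter
import pandas as pd

# Hypothetical counts: rows are decades, columns are hydrant types (made-up names).
counts = pd.DataFrame(
    {"Type A": [3, 5, 2], "Type B": [1, 0, 4]},
    index=pd.Index([1980, 1990, 2000], name="Manufactured Decade"),
)

# stacked=True draws one bar per decade, with the type counts piled on top of each other.
ax = counts.plot.bar(stacked=True)
ax.set_ylabel("Hydrant count")
```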
+
+%% Cell type:markdown id: tags:
+
+# Plotting 4
+
+* Late days may not be used on P13
+* Everything is due on Wednesday - send me an email if you have special circumstances
+
+
+### Exam Conflict Form
+* [Final - December 14, 7:25 pm - 9:25 pm](https://docs.google.com/forms/d/e/1FAIpQLSfJmpjKaM3t8iOwBTGAWI6jKZUqGI1Matz3bidhSbFu_c4_2g/viewform)
+
+### Reading
+* [Reading 1](https://cs220.cs.wisc.edu/s23/materials/readings/matplotlib-intro.html)
+* [Reading 2](https://matplotlib.org/stable/tutorials/introductory/quick_start.html)
+
+## Learning objectives
+- how to use logarithmic axes
+- how to create multiple plots within same figure
+
+%% Cell type:markdown id: tags:
+
+### Logarithmic scale
+- math.log(y, base)
+- find an x, such that 10**x == y
+    - math.log10(y)
+
+%% Cell type:code id: tags:
+
+``` python
+print(math.log10(1000))
+print(math.log10(1000000))
+```
+
+%% Cell type:code id: tags:
+
+``` python
+print(math.log(32, 2))
+print(math.log(256, 4))
+```
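
One way to sanity-check these calls is to feed the result back into an exponent. The sketch below rounds the result first, on the assumption that floating-point logs can be off by a tiny amount (for example, `math.log(1000, 10)` may not print as exactly `3.0`).

```python
import math

# math.log(y, base) answers: "which exponent x gives base**x == y?"
for y, base in [(1000, 10), (32, 2), (256, 4)]:
    x = math.log(y, base)
    # Round before comparing, since the float result may be slightly off.
    print(f"log base {base} of {y} is about {x:.4f}; {base}**{round(x)} = {base ** round(x)}")
```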
+
+%% Cell type:code id: tags:
+
+``` python
+def log_approx(y):
+    assert type(y) == int
+    assert y >= 1
+    return len(str(y))
+```
+
+%% Cell type:code id: tags:
+
+``` python
+print(log_approx(123456789)) # What will this output?
+print(math.log10(123456789))
+```
+
+%% Cell type:code id: tags:
+
+``` python
+print(log_approx(989898))
+print(math.log10(989898))
+```
+
+%% Cell type:code id: tags:
+
+``` python
+errors = []
+for y in range(1, 1000001):
+    err = abs(log_approx(y) - math.log10(y))
+    errors.append(err)
+max(errors)
+```
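
The reason the error stays small is that the digit count of a positive integer is its base-10 log rounded down, plus one. The sketch below checks that on a handful of sample values; `digit_count` is a hypothetical helper that mirrors `log_approx`, and the sample values are chosen only for illustration.

```python
import math

def digit_count(y):
    # Same idea as log_approx: the number of decimal digits of a positive int.
    return len(str(y))

samples = [1, 7, 42, 99, 989898, 123456789]
for y in samples:
    exact = math.log10(y)
    # For these values, the digit count is the log rounded down, plus one.
    assert digit_count(y) == math.floor(exact) + 1
    print(y, digit_count(y), round(exact, 3))

# Largest gap in this sample: y = 1 has one digit, but log10(1) is 0.
worst = max(digit_count(y) - math.log10(y) for y in samples)
print(worst)
```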
+
+%% Cell type:markdown id: tags:
+
+### Why does this matter?
+- Comparing two numbers:
+     - 134234255623423423423432423432432432
+     - 2342343252523
+
+- Past a certain size, the exact value matters less than its number of digits, which tells us how big the number is!
+- log base 2: counting how many bits (binary digits) we need
+- log base 10: counting how many decimal digits (0 through 9) we need
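
The two bullet points about bits and digits can be tried out with plain integers. This is a small sketch using Python's built-in `int.bit_length()`; the sample numbers are arbitrary.

```python
sizes = {}
for n in [1, 2, 255, 256, 1_000_000]:
    bits = n.bit_length()   # binary digits needed to write n
    digits = len(str(n))    # decimal digits needed to write n
    sizes[n] = (bits, digits)
    print(f"{n}: {bits} bits, {digits} decimal digits")
```

Going from 255 to 256 adds a bit but not a decimal digit, because 256 is a power of 2 and not a power of 10.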
+
+%% Cell type:code id: tags:
+
+``` python
+s = Series([1, 10, 100, 1000, 10000, 100000, 1000000])
+s.plot.line()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+s.plot.line(???)
+```
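
One possible way to fill in the blank above is the `logy` parameter that pandas plotting methods accept. This is a sketch, not necessarily the solution used in lecture; the axis labels are added here only for readability.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; not needed inside Jupyter
from pandas import Series

s = Series([1, 10, 100, 1000, 10000, 100000, 1000000])

# logy=True asks pandas for a logarithmic y axis, so each step of x10 takes equal height.
ax = s.plot.line(logy=True)
ax.set_xlabel("index")
ax.set_ylabel("value (log scale)")
```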
+
+%% Cell type:markdown id: tags:
+
+### Population example
+https://ourworldindata.org/grapher/population
+
+%% Cell type:code id: tags:
+
+``` python
+populations = pd.Series({
+        "China":1439323776,
+        "India": 1380004385,
+        "Mexico": 128932753,
+        "Senegal":16743927,
+        "Bahrain":1701575,
+        "Grenada":112523,
+        "Tuvalu": 11792
+})
+```
+
+%% Cell type:markdown id: tags:
+
+Plot populations as a bar chart.
+
+%% Cell type:code id: tags:
+
+``` python
+# not that readable
+???
+```
+
+%% Cell type:markdown id: tags:
+
+Now plot on a logarithmic scale.
+
+%% Cell type:code id: tags:
+
+``` python
+???
+```
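
As a sketch of one approach (other routes, such as passing `logy=True`, should behave similarly), the bar chart can be drawn first and its y axis switched to a log scale afterwards with matplotlib's `set_yscale`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; not needed inside Jupyter
import pandas as pd

populations = pd.Series({
    "China": 1439323776,
    "India": 1380004385,
    "Mexico": 128932753,
    "Senegal": 16743927,
    "Bahrain": 1701575,
    "Grenada": 112523,
    "Tuvalu": 11792,
})

ax = populations.plot.bar()   # on a linear axis the small countries are nearly invisible
ax.set_yscale("log")          # switch the same Axes to a logarithmic y axis
ax.set_ylabel("population (log scale)")
```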
+
+%% Cell type:markdown id: tags:
+
+### Multiple *axessubplots* in the same plot with plt.subplots
+
+```
+fig,axes = plt.subplots()
+```
+
+* `nrows` and `ncols` -- specify the number of subplots along the rows and columns
+* `sharex` and `sharey` -- boolean parameters to define if subplots use the same values along their axes
+* Use the multi-indexed `axes` returned value to put a plot into each of the subplots
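
The shape of the returned `axes` value depends on `nrows` and `ncols`, which determines how it is indexed. A minimal sketch (variable names are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; not needed inside Jupyter
from matplotlib import pyplot as plt

fig1, ax_single = plt.subplots()                   # a single Axes object, not an array
fig2, axes_row = plt.subplots(ncols=2)             # 1-D array: axes_row[0], axes_row[1]
fig3, axes_grid = plt.subplots(nrows=2, ncols=3)   # 2-D array: axes_grid[row, col]

print(axes_row.shape, axes_grid.shape)
axes_grid[1, 2].set_title("bottom right")
```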
+
+%% Cell type:code id: tags:
+
+``` python
+plt.subplots()  # default is to create one subplot
+```
+
+%% Cell type:code id: tags:
+
+``` python
+plt.subplots(ncols=2)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+plt.subplots(nrows=2)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+plt.subplots(nrows=2,figsize=(10,4))
+```
+
+%% Cell type:code id: tags:
+
+``` python
+s1 = Series([1, 2, 3, 3, 4])
+s2 = Series([5, 7, 7, 8])
+```
+
+%% Cell type:markdown id: tags:
+
+Let's create a single figure with two subplots (line plots), with s1 on the left and s2 on the right.
+
+%% Cell type:code id: tags:
+
+``` python
+fig, axes = plt.subplots(ncols = 2)
+# axes[0] # the area on the left
+# axes[1] # the area on the right
+s1.plot.line(ax=axes[0])
+s2.plot.line(ax=axes[1])
+```
+
+%% Cell type:markdown id: tags:
+
+What is wrong with the plots above?
+
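+The y-axes are misleading: each subplot chooses its own y range, so the two lines look similar in height even though the values in s2 are larger.  Use the `sharex` and `sharey` parameters.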
+
+%% Cell type:code id: tags:
+
+``` python
+# fix the misleading y axes
+
+???
+```
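
A possible fix, sketched under the assumption that sharing only the y axis is enough here, is to pass `sharey=True` so both subplots use the same y range.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; not needed inside Jupyter
from matplotlib import pyplot as plt
from pandas import Series

s1 = Series([1, 2, 3, 3, 4])
s2 = Series([5, 7, 7, 8])

# sharey=True links the y axes, so the same height means the same value in both subplots.
fig, axes = plt.subplots(ncols=2, sharey=True)
s1.plot.line(ax=axes[0])
s2.plot.line(ax=axes[1])
```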
+
+%% Cell type:markdown id: tags:
+
+### Iris dataset
+
+%% Cell type:code id: tags:
+
+``` python
+# Gather the data.
+resp = requests.get("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
+resp.raise_for_status()
+
+with open("iris.csv", "w") as iris_f:
+    iris_f.write(resp.text)
+
+iris_df = pd.read_csv("iris.csv",
+                 names = ["sep-len", "sep-wid", "pet-len", "pet-wid", "class"])
+iris_df.head()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# for the `Iris-setosa` class, plot the sepal length vs sepal width
+
+```
+
+%% Cell type:code id: tags:
+
+``` python
+
+colors = ["r", "g", "b"]
+markers = [".", "^", "v"]
+
+varieties = list(set(iris_df["class"]))
+
+varieties
+
+# create a 3 column plot
+# Plot the sepal length vs the sepal width for each of the 3 classes of flowers.
+```
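
One way to approach the three-column plot is to loop over the subplots and the classes together with `zip`. In the sketch below, the small `iris_df` is a hand-typed stand-in with two rows per class so the block runs without a download; it is not the full dataset. `sorted` is used instead of `list` on the assumption that a fixed class order is wanted, since set order is not guaranteed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; not needed inside Jupyter
from matplotlib import pyplot as plt
import pandas as pd

# Stand-in for iris_df: a few hand-typed rows, NOT the full dataset.
iris_df = pd.DataFrame({
    "sep-len": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sep-wid": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "class": ["Iris-setosa", "Iris-setosa",
              "Iris-versicolor", "Iris-versicolor",
              "Iris-virginica", "Iris-virginica"],
})

colors = ["r", "g", "b"]
markers = [".", "^", "v"]
varieties = sorted(set(iris_df["class"]))  # sorted so the column order is repeatable

# One column per class; shared axes keep the three scatter plots comparable.
fig, axes = plt.subplots(ncols=3, sharex=True, sharey=True, figsize=(12, 4))
for ax, variety, color, marker in zip(axes, varieties, colors, markers):
    subset = iris_df[iris_df["class"] == variety]
    subset.plot.scatter(x="sep-len", y="sep-wid", ax=ax, color=color, marker=marker)
    ax.set_title(variety)
```

Swapping the `x` and `y` column names for `"pet-len"` and `"pet-wid"` would give the petal version, provided the frame contains those columns.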
+
+%% Cell type:markdown id: tags:
+
+## Do this again, but for petal length vs petal width
+
+%% Cell type:code id: tags:
+
+``` python
+colors = ["r", "g", "b"]
+markers = [".", "^", "v"]
+
+varieties = list(set(iris_df["class"]))
+
+```
--- a/f24/Louis_Lecture_Notes/39_Plotting4/Lec39_Plotting4_Solution.ipynb
+++ b/f24/Louis_Lecture_Notes/39_Plotting4/Lec39_Plotting4_Solution.ipynb
--- a/f24/Louis_Lecture_Notes/39_Plotting4/fire_hydrants.csv
+++ b/f24/Louis_Lecture_Notes/39_Plotting4/fire_hydrants.csv