add lecture 35

Commit 6671cbaf authored 11 months ago by Ashwin Maran
--- a/s24/AmFam_Ashwin/35_Plotting2/Lecture Code/Lec35_Plotting2_Solution.ipynb
+++ b/s24/AmFam_Ashwin/35_Plotting2/Lecture Code/Lec35_Plotting2_Solution.ipynb
--- a/s24/AmFam_Ashwin/35_Plotting2/Lecture Code/Lec35_Plotting2_Template.ipynb
+++ b/s24/AmFam_Ashwin/35_Plotting2/Lecture Code/Lec35_Plotting2_Template.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "from pandas import DataFrame, Series\n",
+    "\n",
+    "import sqlite3\n",
+    "import os\n",
+    "\n",
+    "import matplotlib\n",
+    "from matplotlib import pyplot as plt\n",
+    "\n",
+    "import requests\n",
+    "matplotlib.rcParams[\"font.size\"] = 12"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Titanic dataset: https://www.kaggle.com/datasets/yasserh/titanic-dataset\n",
+    "\n",
+    "A **copy** can be found at: `https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-lecture-material/-/raw/main/s24/AmFam_Ashwin/35_Plotting2/Lecture%20Code/titanic.csv`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Warmup 1:  Requests and file writing\n",
+    "\n",
+    "Download this file and save it locally in the file `titanic.csv`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# write your code here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Warmup 2:  Making a DataFrame\n",
+    "\n",
+    "Read the `\"titanic.csv\"` file into a Pandas DataFrame"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# write your code here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Warmup 3: Some of our column names are not very clear; let's change them.\n",
+    "These should be our headers: `\"ID\", \"Survived\", \"Passenger Class\", \"Name\", \"Sex\", \"Age\", \"No. of Siblings/Spouses aboard\", \"No. of Parents/Children aboard\", \"Ticket Number\", \"Fare\", \"Cabin\", \"Location Embarked\"`\n",
+    "\n",
+    "Refer to the documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# write your code here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Warmup 4: Connect to our database version of this data!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### The following code creates a `titanic.db` file and writes the contents of `titanic_df` into that database"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "titanic_conn = sqlite3.connect(\"titanic.db\")\n",
+    "titanic_df.to_sql(\"titanic\", titanic_conn, if_exists=\"replace\", index=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pd.read_sql(\"SELECT * FROM sqlite_master WHERE type='table'\", titanic_conn)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pd.read_sql(\"SELECT * FROM titanic LIMIT 5\", titanic_conn)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Warmup 5: Using SQL, get the 10 oldest male Titanic passengers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# write your code here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Warmup 6: Using SQL, get the average Fare for each Passenger Class."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# write your code here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Lecture 35:  Scatter Plots\n",
+    "**Learning Objectives**\n",
+    "- Set the marker, color, and size of scatter plot data\n",
+    "- Calculate correlation between DataFrame columns\n",
+    "- Use subplots to group scatterplot data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Set the marker, color, and size of scatter plot data\n",
+    "\n",
+    "To start, let's look at some made-up data about Trees.\n",
+    "The city of Madison maintains a database of all the trees they care for."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "trees = [\n",
+    "    {\"age\": 1, \"height\": 1.5, \"diameter\": 0.8},\n",
+    "    {\"age\": 1, \"height\": 1.9, \"diameter\": 1.2},\n",
+    "    {\"age\": 1, \"height\": 1.8, \"diameter\": 1.4},\n",
+    "    {\"age\": 2, \"height\": 1.8, \"diameter\": 0.9},\n",
+    "    {\"age\": 2, \"height\": 2.5, \"diameter\": 1.5},\n",
+    "    {\"age\": 2, \"height\": 3, \"diameter\": 1.8},\n",
+    "    {\"age\": 2, \"height\": 2.9, \"diameter\": 1.7},\n",
+    "    {\"age\": 3, \"height\": 3.2, \"diameter\": 2.1},\n",
+    "    {\"age\": 3, \"height\": 3, \"diameter\": 2},\n",
+    "    {\"age\": 3, \"height\": 2.4, \"diameter\": 2.2},\n",
+    "    {\"age\": 2, \"height\": 3.1, \"diameter\": 2.9},\n",
+    "    {\"age\": 4, \"height\": 2.5, \"diameter\": 3.1},\n",
+    "    {\"age\": 4, \"height\": 3.9, \"diameter\": 3.1},\n",
+    "    {\"age\": 4, \"height\": 4.9, \"diameter\": 2.8},\n",
+    "    {\"age\": 4, \"height\": 5.2, \"diameter\": 3.5},\n",
+    "    {\"age\": 4, \"height\": 4.8, \"diameter\": 4},\n",
+    "]\n",
+    "trees_df = DataFrame(trees)\n",
+    "trees_df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Scatter Plots\n",
+    "We can make a scatter plot of a DataFrame using the following function...\n",
+    "\n",
+    "`df_name.plot.scatter(x=\"x_col_name\", y=\"y_col_name\", color=\"peachpuff\")`\n",
+    "\n",
+    "## Example 1: Plot the trees data comparing a tree's age to its height\n",
+    "<pre>\n",
+    " - What is `df_name`?\n",
+    " - What is `x_col_name`?\n",
+    " - What is `y_col_name`?\n",
+    "</pre>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "trees_df.plot.scatter(x=\"age\", y=\"height\", color=\"g\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Now plot with a little more beautification...\n",
+    " - Use a new [color](https://matplotlib.org/3.5.0/_images/sphx_glr_named_colors_003.png)\n",
+    " - Use a type of [marker](https://matplotlib.org/stable/api/markers_api.html)\n",
+    " - Change the size (any int)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "trees_df.plot.scatter(x=\"age\", y=\"height\", color=\"r\", marker=\"D\", s=50) # D for diamond"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### And we can add a Title to our plot..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ax = trees_df.plot.scatter(x=\"age\", y=\"height\", color=\"r\", marker=\"D\", s=50)\n",
+    "ax.set_title(\"Tree Age vs Height\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Correlation\n",
+    "\n",
+    "## Example 2: What is the correlation between our DataFrame columns?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "corr_df = trees_df.corr()\n",
+    "corr_df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercise 1:  What is the correlation between age and height?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# write your code here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Varying Stylistic Parameters"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "trees_df.plot.scatter(x=\"age\", y=\"height\", marker=\"H\", s=\"diameter\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### We should scale up the sizes to make them more easily visible"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "trees_df.plot.scatter(x=\"age\", y=\"height\", marker=\"H\", s=trees_df[\"diameter\"] * 20) # passing a Series lets us scale every marker size"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Use subplots to group scatterplot data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Re-visit the Titanic Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "titanic_df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### How do we create a *scatter plot* for various *class types*?\n",
+    "First, gather all the class types."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### In Pandas..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "classes = sorted(set(titanic_df[\"Passenger Class\"]))  # sorted: a set has no guaranteed order\n",
+    "classes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### In SQL..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "classes = sorted(list(pd.read_sql(\"\"\"\n",
+    "    SELECT DISTINCT `Passenger Class`\n",
+    "    FROM titanic\n",
+    "\"\"\", titanic_conn)[\"Passenger Class\"]))\n",
+    "classes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### In reality, you can choose to write Pandas or SQL queries (or a mix of both!). For the rest of this lecture, we'll use Pandas."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# If you want to continue using SQL instead, don't close the connection!\n",
+    "titanic_conn.close()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercise 2: Change this scatter plot so that it only shows the data for `Passenger Class` 3"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "titanic_df.plot.scatter(x=\"Age\", y=\"Fare\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercise 3: Write a for loop that iterates through each Passenger Class and makes a plot for only that class"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# write your code here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Make the same series of plots, but this time make each plot a different color"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "colors = [\"blue\", \"green\", \"red\"]\n",
+    "\n",
+    "# write your code here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Make the same series of plots, but this time make each plot a different color AND marker"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "colors = [\"blue\", \"green\", \"red\"]\n",
+    "markers = [\"o\", \"^\", \"v\"]\n",
+    "\n",
+    "# write your code here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Food for thought:** Did you notice that it made 3 plots? What's deceptive about this?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "colors = [\"blue\", \"green\", \"red\"]\n",
+    "markers = [\"o\", \"^\", \"v\"]\n",
+    "min_x = titanic_df[\"Age\"].min()\n",
+    "max_x = titanic_df[\"Age\"].max()\n",
+    "min_y = titanic_df[\"Fare\"].min()\n",
+    "max_y = titanic_df[\"Fare\"].max()\n",
+    "\n",
+    "for i in range(len(classes)):\n",
+    "    pass_class = classes[i]\n",
+    "    \n",
+    "    # make a df of just the data for this passenger class\n",
+    "    pass_class_df = titanic_df[titanic_df[\"Passenger Class\"] == pass_class] \n",
+    "    \n",
+    "    # make a scatter plot for this passenger class\n",
+    "    pass_class_df.plot.scatter(x=\"Age\", y=\"Fare\", label=pass_class, color=colors[i], marker=markers[i], xlim=(min_x, max_x), ylim=(min_y, max_y))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### We have to be VERY careful to not crop out data. We'll talk about this next lecture..."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### We can also draw several plots in the same subplot (an AxesSubplot), using the keyword `ax`\n",
+    "\n",
+    "<pre>\n",
+    "1. if an AxesSubplot ax is passed, then plot in that subplot\n",
+    "2. if ax is None, create a new AxesSubplot\n",
+    "3. return AxesSubplot that was used\n",
+    "</pre>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plot_area = None   # don't change this...look at this variable in the last line\n",
+    "colors = [\"blue\", \"green\", \"red\"]\n",
+    "markers = [\"o\", \"^\", \"v\"]\n",
+    "for i in range(len(classes)):\n",
+    "    pass_class = classes[i]\n",
+    "    \n",
+    "    # make a df of just the data for this passenger class\n",
+    "    pass_class_df = titanic_df[titanic_df[\"Passenger Class\"] == pass_class] \n",
+    "    \n",
+    "    # make a scatter plot for this passenger class\n",
+    "    plot_area = pass_class_df.plot.scatter(x=\"Age\", y=\"Fare\", label=pass_class, color=colors[i], marker=markers[i], ax=plot_area)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
+%% Cell type:code id: tags:
+
+``` python
+import pandas as pd
+from pandas import DataFrame, Series
+
+import sqlite3
+import os
+
+import matplotlib
+from matplotlib import pyplot as plt
+
+import requests
+matplotlib.rcParams["font.size"] = 12
+```
+
+%% Cell type:markdown id: tags:
+
+### Titanic dataset: https://www.kaggle.com/datasets/yasserh/titanic-dataset
+
+A **copy** can be found at: `https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-lecture-material/-/raw/main/s24/AmFam_Ashwin/35_Plotting2/Lecture%20Code/titanic.csv`
+
+%% Cell type:markdown id: tags:
+
+## Warmup 1:  Requests and file writing
+
+Download this file and save it locally in the file `titanic.csv`
+
+%% Cell type:code id: tags:
+
+``` python
+# write your code here
+```
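One possible approach to Warmup 1, as a minimal sketch: fetch the URL with `requests`, stop on an HTTP error, and write the response text to a local file. The helper name `download_text` and the 30-second timeout are assumptions made for illustration; they are not part of the lecture code.

```python
import requests


def download_text(url, filename):
    """Fetch `url` and save the response body as text in `filename`."""
    response = requests.get(url, timeout=30)  # timeout value is an arbitrary choice
    response.raise_for_status()  # raise an HTTPError on a 4xx/5xx response
    with open(filename, "w", encoding="utf-8") as f:
        f.write(response.text)
    return filename


# In the notebook, `url` would be the titanic.csv link from the cell above:
# download_text(url, "titanic.csv")
```

Calling `raise_for_status()` before writing avoids saving an error page under the name `titanic.csv`.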
+
+%% Cell type:markdown id: tags:
+
+## Warmup 2:  Making a DataFrame
+
+Read the `"titanic.csv"` file into a Pandas DataFrame
+
+%% Cell type:code id: tags:
+
+``` python
+# write your code here
+```
+
+%% Cell type:markdown id: tags:
+
+## Warmup 3: Some of our column names are not very clear; let's change them.
+These should be our headers: `"ID", "Survived", "Passenger Class", "Name", "Sex", "Age", "No. of Siblings/Spouses aboard", "No. of Parents/Children aboard", "Ticket Number", "Fare", "Cabin", "Location Embarked"`
+
+Refer to the documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
+
+%% Cell type:code id: tags:
+
+``` python
+# write your code here
+```
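A hedged sketch of the renaming idea in Warmup 3: `pd.read_csv` accepts `names=` for new column names, and `header=0` tells it that the first line of the file holds the old names and should be replaced. The three-column CSV text below is made up for illustration and only stands in for `titanic.csv`.

```python
import io

import pandas as pd

# Two made-up rows standing in for the real file (hypothetical data)
csv_text = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n"

# header=0: the first line contains the old names, so it is not read as data
# names=...: the replacement column names, in file order
small_df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=["ID", "Survived", "Passenger Class"],
)
print(small_df)
```

Without `header=0`, the old header line would be read as an ordinary data row.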
+
+%% Cell type:markdown id: tags:
+
+## Warmup 4: Connect to our database version of this data!
+
+%% Cell type:markdown id: tags:
+
+#### The following code creates a `titanic.db` file and writes the contents of `titanic_df` into that database
+
+%% Cell type:code id: tags:
+
+``` python
+titanic_conn = sqlite3.connect("titanic.db")
+titanic_df.to_sql("titanic", titanic_conn, if_exists="replace", index=False)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+pd.read_sql("SELECT * FROM sqlite_master WHERE type='table'", titanic_conn)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+pd.read_sql("SELECT * FROM titanic LIMIT 5", titanic_conn)
+```
+
+%% Cell type:markdown id: tags:
+
+## Warmup 5: Using SQL, get the 10 oldest male Titanic passengers
+
+%% Cell type:code id: tags:
+
+``` python
+# write your code here
+```
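The query shape for Warmup 5 can be sketched on a tiny in-memory table: filter with `WHERE`, sort with `ORDER BY ... DESC`, and cut off with `LIMIT`. The table name `people` and its four rows are invented for this example; the real exercise would query the `titanic` table through `titanic_conn` and use `LIMIT 10`.

```python
import sqlite3

import pandas as pd

# Made-up passengers (hypothetical data, not from titanic.csv)
people = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "male", "male"],
    "Age": [30, 62, 71, 45],
})

conn = sqlite3.connect(":memory:")  # throwaway database, nothing written to disk
people.to_sql("people", conn, index=False)

oldest_men = pd.read_sql("""
    SELECT *
    FROM people
    WHERE Sex = 'male'
    ORDER BY Age DESC
    LIMIT 2
""", conn)
conn.close()
print(oldest_men)
```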
+
+%% Cell type:markdown id: tags:
+
+## Warmup 6: Using SQL, get the average Fare for each Passenger Class.
+
+%% Cell type:code id: tags:
+
+``` python
+# write your code here
+```
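For Warmup 6, a minimal sketch of grouping in SQL: `GROUP BY` forms one group per class and `AVG` summarizes each group. The `fares` table and its values are made up; note the backticks around `Passenger Class`, which are needed because the column name contains a space.

```python
import sqlite3

import pandas as pd

# Made-up fares (hypothetical data, not from titanic.csv)
fares = pd.DataFrame({
    "Passenger Class": [1, 1, 2, 3, 3],
    "Fare": [80.0, 60.0, 20.0, 8.0, 10.0],
})

conn = sqlite3.connect(":memory:")
fares.to_sql("fares", conn, index=False)

avg_fares = pd.read_sql("""
    SELECT `Passenger Class`, AVG(Fare) AS avg_fare
    FROM fares
    GROUP BY `Passenger Class`
    ORDER BY `Passenger Class`
""", conn)
conn.close()
print(avg_fares)
```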
+
+%% Cell type:markdown id: tags:
+
+# Lecture 35:  Scatter Plots
+**Learning Objectives**
+- Set the marker, color, and size of scatter plot data
+- Calculate correlation between DataFrame columns
+- Use subplots to group scatterplot data
+
+%% Cell type:markdown id: tags:
+
+## Set the marker, color, and size of scatter plot data
+
+To start, let's look at some made-up data about Trees.
+The city of Madison maintains a database of all the trees they care for.
+
+%% Cell type:code id: tags:
+
+``` python
+trees = [
+    {"age": 1, "height": 1.5, "diameter": 0.8},
+    {"age": 1, "height": 1.9, "diameter": 1.2},
+    {"age": 1, "height": 1.8, "diameter": 1.4},
+    {"age": 2, "height": 1.8, "diameter": 0.9},
+    {"age": 2, "height": 2.5, "diameter": 1.5},
+    {"age": 2, "height": 3, "diameter": 1.8},
+    {"age": 2, "height": 2.9, "diameter": 1.7},
+    {"age": 3, "height": 3.2, "diameter": 2.1},
+    {"age": 3, "height": 3, "diameter": 2},
+    {"age": 3, "height": 2.4, "diameter": 2.2},
+    {"age": 2, "height": 3.1, "diameter": 2.9},
+    {"age": 4, "height": 2.5, "diameter": 3.1},
+    {"age": 4, "height": 3.9, "diameter": 3.1},
+    {"age": 4, "height": 4.9, "diameter": 2.8},
+    {"age": 4, "height": 5.2, "diameter": 3.5},
+    {"age": 4, "height": 4.8, "diameter": 4},
+]
+trees_df = DataFrame(trees)
+trees_df.head()
+```
+
+%% Cell type:markdown id: tags:
+
+### Scatter Plots
+We can make a scatter plot of a DataFrame using the following function...
+
+`df_name.plot.scatter(x="x_col_name", y="y_col_name", color="peachpuff")`
+
+## Example 1: Plot the trees data comparing a tree's age to its height
+<pre>
+ - What is `df_name`?
+ - What is `x_col_name`?
+ - What is `y_col_name`?
+</pre>
+
+%% Cell type:code id: tags:
+
+``` python
+trees_df.plot.scatter(x="age", y="height", color="g")
+```
+
+%% Cell type:markdown id: tags:
+
+#### Now plot with a little more beautification...
+ - Use a new [color](https://matplotlib.org/3.5.0/_images/sphx_glr_named_colors_003.png)
+ - Use a type of [marker](https://matplotlib.org/stable/api/markers_api.html)
+ - Change the size (any int)
+
+%% Cell type:code id: tags:
+
+``` python
+trees_df.plot.scatter(x="age", y="height", color="r", marker="D", s=50) # D for diamond
+```
+
+%% Cell type:markdown id: tags:
+
+#### And we can add a Title to our plot...
+
+%% Cell type:code id: tags:
+
+``` python
+ax = trees_df.plot.scatter(x="age", y="height", color="r", marker="D", s=50)
+ax.set_title("Tree Age vs Height")
+```
+
+%% Cell type:markdown id: tags:
+
+# Correlation
+
+## Example 2: What is the correlation between our DataFrame columns?
+
+%% Cell type:code id: tags:
+
+``` python
+corr_df = trees_df.corr()
+corr_df
+```
+
+%% Cell type:markdown id: tags:
+
+## Exercise 1:  What is the correlation between age and height?
+
+%% Cell type:code id: tags:
+
+``` python
+# write your code here
+```
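One way to pull a single value out of a correlation table, sketched on made-up columns: `DataFrame.corr()` returns a DataFrame indexed by column name on both axes, so `.loc[row, col]` selects one pair. `Series.corr` computes the same number directly. The column names `x`, `up`, and `down` are invented for this example.

```python
import pandas as pd

# "up" rises exactly with "x"; "down" falls exactly as "x" rises
toy_df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "up": [2, 4, 6, 8],
    "down": [8, 6, 4, 2],
})

corr = toy_df.corr()

x_up = corr.loc["x", "up"]                # look up one cell of the table
x_down = corr["x"]["down"]                # same idea: column first, then row
direct = toy_df["x"].corr(toy_df["up"])   # skip the table entirely

print(x_up, x_down, direct)
```

A value near 1 means the two columns rise together, and a value near -1 means one falls as the other rises.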
+
+%% Cell type:markdown id: tags:
+
+### Varying Stylistic Parameters
+
+%% Cell type:code id: tags:
+
+``` python
+trees_df.plot.scatter(x="age", y="height", marker="H", s="diameter")
+```
+
+%% Cell type:markdown id: tags:
+
+#### We should scale up the sizes to make them more easily visible
+
+%% Cell type:code id: tags:
+
+``` python
+trees_df.plot.scatter(x="age", y="height", marker="H", s=trees_df["diameter"] * 20) # passing a Series lets us scale every marker size
+```
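A small sketch of what the scaled `s` argument does, using three made-up trees: each value in the Series becomes the size of the matching marker (matplotlib measures `s` as an area in points squared). The `Agg` backend line is an assumption so the sketch also runs as a plain script; inside a notebook it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook

import pandas as pd

# Made-up trees (hypothetical data)
toy_trees = pd.DataFrame({
    "age": [1, 2, 3],
    "height": [1.5, 2.5, 3.2],
    "diameter": [1.0, 2.0, 3.0],
})

sizes = toy_trees["diameter"] * 20  # one marker size per row
ax = toy_trees.plot.scatter(x="age", y="height", marker="H", s=sizes)

# The scatter points live in the first collection of the subplot
drawn_sizes = ax.collections[0].get_sizes()
print(drawn_sizes)
```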
+
+%% Cell type:markdown id: tags:
+
+## Use subplots to group scatterplot data
+
+%% Cell type:markdown id: tags:
+
+### Re-visit the Titanic Data
+
+%% Cell type:code id: tags:
+
+``` python
+titanic_df.head()
+```
+
+%% Cell type:markdown id: tags:
+
+### How do we create a *scatter plot* for various *class types*?
+First, gather all the class types.
+
+%% Cell type:markdown id: tags:
+
+#### In Pandas...
+
+%% Cell type:code id: tags:
+
+``` python
+classes = sorted(set(titanic_df["Passenger Class"]))  # sorted: a set has no guaranteed order
+classes
+```
+
+%% Cell type:markdown id: tags:
+
+#### In SQL...
+
+%% Cell type:code id: tags:
+
+``` python
+classes = sorted(list(pd.read_sql("""
+    SELECT DISTINCT `Passenger Class`
+    FROM titanic
+""", titanic_conn)["Passenger Class"]))
+classes
+```
+
+%% Cell type:markdown id: tags:
+
+#### In reality, you can choose to write Pandas or SQL queries (or a mix of both!). For the rest of this lecture, we'll use Pandas.
+
+%% Cell type:code id: tags:
+
+``` python
+# If you want to continue using SQL instead, don't close the connection!
+titanic_conn.close()
+```
+
+%% Cell type:markdown id: tags:
+
+## Exercise 2: Change this scatter plot so that it only shows the data for `Passenger Class` 3
+
+%% Cell type:code id: tags:
+
+``` python
+titanic_df.plot.scatter(x="Age", y="Fare")
+```
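The filtering step behind Exercise 2 can be sketched with boolean indexing on a made-up frame: comparing a column to a value gives a True/False Series, and indexing the DataFrame with it keeps only the True rows. The names `toy_df`, `is_third`, and `third_df` are invented for this example.

```python
import pandas as pd

# Made-up passengers (hypothetical data, not from titanic.csv)
toy_df = pd.DataFrame({
    "Passenger Class": [1, 3, 2, 3],
    "Age": [38.0, 22.0, 27.0, 4.0],
    "Fare": [71.28, 7.25, 13.0, 16.7],
})

is_third = toy_df["Passenger Class"] == 3  # boolean Series, one entry per row
third_df = toy_df[is_third]                # keep only the rows marked True
print(third_df)

# The filtered frame can then be plotted like any other DataFrame:
# third_df.plot.scatter(x="Age", y="Fare")
```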
+
+%% Cell type:markdown id: tags:
+
+## Exercise 3: Write a for loop that iterates through each Passenger Class and makes a plot for only that class
+
+%% Cell type:code id: tags:
+
+``` python
+# write your code here
+```
+
+%% Cell type:markdown id: tags:
+
+#### Make the same series of plots, but this time make each plot a different color
+
+%% Cell type:code id: tags:
+
+``` python
+colors = ["blue", "green", "red"]
+
+# write your code here
+```
+
+%% Cell type:markdown id: tags:
+
+#### Make the same series of plots, but this time make each plot a different color AND marker
+
+%% Cell type:code id: tags:
+
+``` python
+colors = ["blue", "green", "red"]
+markers = ["o", "^", "v"]
+
+# write your code here
+```
+
+%% Cell type:markdown id: tags:
+
+**Food for thought:** Did you notice that it made 3 plots? What's deceptive about this?
+
+%% Cell type:code id: tags:
+
+``` python
+colors = ["blue", "green", "red"]
+markers = ["o", "^", "v"]
+min_x = titanic_df["Age"].min()
+max_x = titanic_df["Age"].max()
+min_y = titanic_df["Fare"].min()
+max_y = titanic_df["Fare"].max()
+
+for i in range(len(classes)):
+    pass_class = classes[i]
+
+    # make a df of just the data for this passenger class
+    pass_class_df = titanic_df[titanic_df["Passenger Class"] == pass_class]
+
+    # make a scatter plot for this passenger class
+    pass_class_df.plot.scatter(x="Age", y="Fare", label=pass_class, color=colors[i], marker=markers[i], xlim=(min_x, max_x), ylim=(min_y, max_y))
+```
+
+%% Cell type:markdown id: tags:
+
+#### We have to be VERY careful to not crop out data. We'll talk about this next lecture...
+
+%% Cell type:markdown id: tags:
+
+### We can also draw several plots in the same subplot (an AxesSubplot), using the keyword `ax`
+
+<pre>
+1. if an AxesSubplot ax is passed, then plot in that subplot
+2. if ax is None, create a new AxesSubplot
+3. return AxesSubplot that was used
+</pre>
+
+%% Cell type:code id: tags:
+
+``` python
+plot_area = None   # don't change this...look at this variable in the last line
+colors = ["blue", "green", "red"]
+markers = ["o", "^", "v"]
+for i in range(len(classes)):
+    pass_class = classes[i]
+
+    # make a df of just the data for this passenger class
+    pass_class_df = titanic_df[titanic_df["Passenger Class"] == pass_class]
+
+    # make a scatter plot for this passenger class
+    plot_area = pass_class_df.plot.scatter(x="Age", y="Fare", label=pass_class, color=colors[i], marker=markers[i], ax=plot_area)
+```
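The three rules for `ax` can be checked on two tiny made-up frames: the first call gets no `ax` and so creates a subplot, and the second call passes that subplot back in and receives the very same object. The frame names and values are invented for this sketch, and the `Agg` backend line is only there so it runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook

import pandas as pd

# Made-up points (hypothetical data)
first_df = pd.DataFrame({"Age": [22.0, 35.0], "Fare": [7.25, 53.1]})
second_df = pd.DataFrame({"Age": [40.0, 8.0], "Fare": [13.0, 21.0]})

# No ax given: a new subplot is created and returned
plot_area = first_df.plot.scatter(x="Age", y="Fare", color="blue", label="first")

# ax given: the points are drawn into that subplot, which is returned again
same_area = second_df.plot.scatter(
    x="Age", y="Fare", color="red", marker="^", label="second", ax=plot_area
)

print(same_area is plot_area)
print(len(plot_area.collections))  # one collection of points per scatter call
```

This is why the loop above starts with `plot_area = None`: the first iteration creates the subplot and every later iteration reuses it.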