Commit 5a3ed56c authored by LOUIS TYRRELL OLIPHANT

lec 34 Plotting 1

parent c05e2799
%% Cell type:code id: tags:
``` python
import pandas as pd
from pandas import DataFrame, Series
import sqlite3
import os
import copy
```
%% Cell type:markdown id: tags:
## Warmup
%% Cell type:code id: tags:
``` python
# Warmup 1: Open a connection, define the qry function, and save all of the movies
# data to a dataframe called "movies_df"
movies_path = "movies.db"
assert os.path.exists(movies_path)
c = sqlite3.connect(movies_path)
def qry(sql, conn = c):
    return pd.read_sql(sql, conn)
movies_df = qry("""
SELECT *
FROM movies
""")
# movies_df
```
%% Cell type:code id: tags:
``` python
# Warmup 2: What are the names and revenues of the top 2 movies by James Gunn?
qry("""
""")
```
%% Cell type:code id: tags:
``` python
# Warmup 3: What is the average revenue of movies by Brad Bird?
qry("""
""")
```
%% Cell type:code id: tags:
``` python
# Warmup 4: What is the average revenue of movies for each director? Sort your answer from highest to lowest.
qry("""
""")
```
%% Cell type:code id: tags:
``` python
# Warmup 5: Of those directors who had more than 3 movies, what is the average revenue
# of movies for each director? Sort your answer from highest to lowest.
qry("""
""")
```
%% Cell type:code id: tags:
``` python
# Warmup 6: Of those directors who had more than 3 movies with ratings above 7.0,
# what is the average revenue of those movies for each director?
# Sort your answer from highest to lowest.
qry("""
""")
```
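%% Cell type:markdown id: tags:
The GROUP BY, HAVING, and ORDER BY pattern behind warmups 4 through 6 can be sketched on a toy in-memory table. This is only an illustration: the rows are made up, the `Director` column name is an assumption (the schema of `movies.db` is not shown here), and the count threshold is lowered to 2 so the tiny table still returns rows. `WHERE` filters individual movies before grouping, while `HAVING` filters whole groups afterwards.
%% Cell type:code id: tags:
```python
import sqlite3
import pandas as pd

# Toy in-memory database. The rows and the "Director" column name are
# assumptions for illustration; they are not taken from movies.db.
toy_conn = sqlite3.connect(":memory:")
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E", "F"],
    "Director": ["Ann", "Ann", "Ann", "Bob", "Bob", "Cy"],
    "Rating": [8.0, 7.5, 6.0, 9.0, 7.2, 9.5],
    "Revenue": [100.0, 200.0, 300.0, 50.0, 150.0, 900.0],
})
toy.to_sql("movies", toy_conn, index=False)

def toy_qry(sql, conn=toy_conn):
    return pd.read_sql(sql, conn)

avg_by_director = toy_qry("""
    SELECT Director, AVG(Revenue) AS avg_revenue, COUNT(*) AS n_movies
    FROM movies
    WHERE Rating > 7.0          -- keep only highly rated movies
    GROUP BY Director           -- one output row per director
    HAVING COUNT(*) >= 2        -- keep directors with enough such movies
    ORDER BY avg_revenue DESC   -- highest average first
""")
print(avg_by_director)
```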
%% Cell type:markdown id: tags:
![image.png](attachment:image.png)
%% Cell type:code id: tags:
``` python
# Warmup 7: Make a scatter plot where the rating is on the x-axis
# and the revenue is on the y-axis
movies_df.plot.scatter(x="Rating",y="Revenue")
```
%% Cell type:code id: tags:
``` python
# Warmup 8: What is the correlation between the rating and the revenue?
movies_df[["Rating","Revenue"]].corr()
```
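%% Cell type:markdown id: tags:
`DataFrame.corr()` returns a whole correlation matrix, while `Series.corr()` returns the single number for one pair of columns. A minimal sketch on made-up data (the values below are not from `movies.db`):
%% Cell type:code id: tags:
```python
import pandas as pd

# Made-up, perfectly linear data, so the correlation is 1 up to rounding.
toy = pd.DataFrame({
    "Rating": [1.0, 2.0, 3.0, 4.0],
    "Revenue": [10.0, 20.0, 30.0, 40.0],
})

corr_matrix = toy[["Rating", "Revenue"]].corr()  # 2 x 2 DataFrame
r = toy["Rating"].corr(toy["Revenue"])           # a single float

print(corr_matrix)
print(r)
```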
%% Cell type:markdown id: tags:
# Bar Plots
Learning Objectives:
- Make a bar plot from a Pandas Series
- Add features: x-label, y-label, title, gridlines, color to plot
- Set the index of a DataFrame to a certain column
- Create an 'other' column in a DataFrame
%% Cell type:code id: tags:
``` python
# Without this, Jupyter cannot display the "first" plot in older versions of Python / matplotlib / Jupyter
%matplotlib inline
```
%% Cell type:markdown id: tags:
### Helpful documentation and an overview of how Matplotlib works
https://matplotlib.org/stable/tutorials/introductory/usage.html
%% Cell type:code id: tags:
``` python
# matplotlib is a plotting module with an interface similar to MATLAB's plotting
import matplotlib
from matplotlib import pyplot as plt
# matplotlib is highly configurable; rcParams acts like a style sheet for pandas plots
# rc stands for runtime config; rcParams is read and written like a dictionary
matplotlib.rcParams # show all parameters
#matplotlib.rcParams["font.size"] # show current font size setting
#matplotlib.rcParams["font.size"] = 18 # change current font size setting
```
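%% Cell type:markdown id: tags:
A small sketch of reading, changing, and restoring one `rcParams` entry. `matplotlib.rc_context` is used here as one way to apply a setting only temporarily; whether the lecture intends that approach is an assumption.
%% Cell type:code id: tags:
```python
import matplotlib

# Remember the current setting so it can be restored at the end.
original_size = matplotlib.rcParams["font.size"]

# Dictionary-style assignment changes the setting for all later plots.
matplotlib.rcParams["font.size"] = 18
changed_size = matplotlib.rcParams["font.size"]

# rc_context applies settings only inside the with-block.
with matplotlib.rc_context({"font.size": 12}):
    inside_size = matplotlib.rcParams["font.size"]
after_size = matplotlib.rcParams["font.size"]

# Put the original value back.
matplotlib.rcParams["font.size"] = original_size
print(original_size, changed_size, inside_size, after_size)
```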
%% Cell type:markdown id: tags:
## Bar plots: From a Series
- The index supplies the x-axis tick labels
- The values are the heights of the bars
https://pandas.pydata.org/docs/reference/api/pandas.Series.plot.bar.html
%% Cell type:code id: tags:
``` python
s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})
# make a bar plot...notice the type of the value the cell displays
s.plot.bar()
```
%% Cell type:code id: tags:
``` python
# if we store the returned object in a variable, we can configure the Axes
# typically the variable name used is 'ax'
ax = s.plot.bar()
type(ax)
```
%% Cell type:markdown id: tags:
### How can we set the x-axis, y-axis labels, and title?
- use the Axes object
%% Cell type:code id: tags:
``` python
# better plot:
# instead of letting the y-axis show a 1e6 multiplier, divide all values in s by 1 million
ax = (s / 1000000).plot.bar()
# give the x ticklabels a rotation of 45 degrees
ax.set_xticklabels(list(s.index), rotation = 45)
# set the y_label to "Dollars (Millions)"
ax.set_ylabel("Dollars (Millions)")
# set the title to "Annual City Spending"
ax.set_title("Annual City Spending")
```
%% Cell type:markdown id: tags:
### How can we change the figure size?
- figsize keyword argument
- should be a tuple with two values: width and height (in inches)
%% Cell type:code id: tags:
``` python
ax = (s / 1000000).plot.bar(figsize = (1.5, 4))
ax.set_ylabel("Dollars (Millions)")
ax.set_title("Annual City Spending")
```
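%% Cell type:markdown id: tags:
To confirm that `figsize` and the label setters took effect, the returned Axes object can be inspected directly. This is a sketch meant to run as a standalone script; the `matplotlib.use("Agg")` line is only there so it runs without a display and should be left out inside Jupyter.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in Jupyter
import pandas as pd

s = pd.Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})

ax = (s / 1000000).plot.bar(figsize=(1.5, 4))
ax.set_ylabel("Dollars (Millions)")
ax.set_title("Annual City Spending")

# The Axes knows which Figure it belongs to; the Figure knows its size in inches.
width, height = ax.figure.get_size_inches()
print(type(ax).__name__, width, height)
print(ax.get_ylabel(), "|", ax.get_title())
```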
%% Cell type:markdown id: tags:
### How can we make the bars horizontal?
https://pandas.pydata.org/docs/reference/api/pandas.Series.plot.barh.html
- switch figsize arguments
- change y-label to x-label
%% Cell type:code id: tags:
``` python
# paste the previous code cell here and modify
```
%% Cell type:markdown id: tags:
### Change bar color by using the argument color= ' '
<pre>
- plot.bar(figsize = (width, height), color = ???)
- 8 standard colors: r, g, b, c, m, y, k, w          color = 'c'   (cyan)
- can use a value of grey between 0 and 1 (string)   color = '0.6'
- can use a tuple (r, g, b) of values in 0 to 1      color = (0, .3, .4)
</pre>
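%% Cell type:markdown id: tags:
The three color formats listed above all resolve to the same kind of value internally. As a rough illustration, `matplotlib.colors.to_rgba` converts each one to an `(r, g, b, alpha)` tuple; using it here is just a way to peek at the conversion and is not required for plotting.
%% Cell type:code id: tags:
```python
from matplotlib import colors

# Each spec below is one of the formats accepted by the color= argument.
single_char = colors.to_rgba("c")          # one of the 8 single-letter colors
grey = colors.to_rgba("0.6")               # grey level, passed as a string
rgb_tuple = colors.to_rgba((0, .3, .4))    # (r, g, b) with values in 0 to 1

for name, rgba in [("'c'", single_char), ("'0.6'", grey), ("(0, .3, .4)", rgb_tuple)]:
    print(name, "->", rgba)
```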
%% Cell type:code id: tags:
``` python
# color as a single char
ax = (s / 1000000).plot.barh(figsize = (4, 1.5), color='c')
ax.set_xlabel("Dollars (Millions)")
ax.set_title("Annual City Spending")
```
%% Cell type:code id: tags:
``` python
# color as a value of grey
ax = (s / 1000000).plot.barh(figsize = (4, 1.5), color='0.5')
ax.set_xlabel("Dollars (Millions)")
ax.set_title("Annual City Spending")
```
%% Cell type:code id: tags:
``` python
# color as tuple of (r,g,b)
ax = (s / 1000000).plot.barh(figsize = (4, 1.5), color=(.2, .5, 0))
ax.set_xlabel("Dollars (Millions)")
ax.set_title("Annual City Spending")
```
%% Cell type:markdown id: tags:
### How can we mark gridlines?
- use ax.grid()
%% Cell type:code id: tags:
``` python
# copy the previous code and add grid lines
ax = (s / 1000000).plot.barh(figsize = (4, 1.5), color='y')
ax.set_xlabel("Dollars (Millions)")
ax.set_title("Annual City Spending")
ax.grid()
```
%% Cell type:markdown id: tags:
## Examples with the Movies Database
%% Cell type:code id: tags:
``` python
# What happens if we just plot the entire data frame?
movies_df.plot.bar() # Answer: every numeric column is drawn as a bar for every row, so the plot is unreadable
```
%% Cell type:code id: tags:
``` python
# Let's see that more clearly, plot the first 3 movies
movies_df.iloc[:3].plot.bar()
```
%% Cell type:code id: tags:
``` python
# How about just the revenue of the first three movies?
movies_df["Revenue"].iloc[:3].plot.bar()
```
%% Cell type:code id: tags:
``` python
# What is movie 0??? Let's make the title the index of the dataframe
movies_df = movies_df.set_index("Title")
movies_df
```
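%% Cell type:markdown id: tags:
A minimal sketch of what `set_index` does, on a made-up two-row DataFrame: the chosen column stops being a regular column and becomes the row labels, which is what the bar plot then uses for its tick labels. One consequence is that running the `set_index("Title")` cell a second time raises a `KeyError`, because `Title` is no longer a column.
%% Cell type:code id: tags:
```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({"Title": ["A", "B"], "Revenue": [1.0, 2.0]})

indexed = toy.set_index("Title")  # returns a new DataFrame; toy is unchanged

print(list(toy.columns))          # Title is still a column in the original
print(list(indexed.columns))      # only Revenue remains as a column
print(indexed.loc["B", "Revenue"])  # rows can now be looked up by title
```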
%% Cell type:code id: tags:
``` python
# What are revenues of the top 10 revenue movies?
top_rev_movies = movies_df["Revenue"].sort_values(ascending=False).iloc[:10]
top_rev_movies
```
%% Cell type:code id: tags:
``` python
# Now plot them!
top_rev_movies.plot.bar(xlabel="Movie", ylabel="Revenue (in Millions)", title="Top Grossing Movies", color="salmon")
```
%% Cell type:code id: tags:
``` python
# Wouldn't it be nice to have an "other" bar to represent other revenue?
other_rev = movies_df["Revenue"].sort_values(ascending=False).iloc[10:].sum()
top_rev_movies_with_other = copy.deepcopy(top_rev_movies)
top_rev_movies_with_other["Other Revenue"] = other_rev
top_rev_movies_with_other.plot.bar(xlabel="Movie", ylabel="Revenue (in Millions)", title="Top Grossing Movies", color="salmon")
```
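%% Cell type:markdown id: tags:
The "top N plus other" idea can be checked on a tiny made-up Series, where the arithmetic is easy to verify by hand. This sketch uses `Series.copy()` in place of `copy.deepcopy`; that swap is a choice made here, and either should leave the original Series untouched.
%% Cell type:code id: tags:
```python
import pandas as pd

# Made-up revenues indexed by made-up titles (not rows from movies.db).
revenue = pd.Series({"M1": 50.0, "M2": 400.0, "M3": 10.0, "M4": 300.0, "M5": 40.0})
top_n = 2

ranked = revenue.sort_values(ascending=False)
top = ranked.iloc[:top_n].copy()          # copy so the new entry stays out of ranked
top["Other"] = ranked.iloc[top_n:].sum()  # everything below the top N, as one entry

print(top)
```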
%% Cell type:code id: tags:
``` python
# Add the argument logy=True to show on a logarithmic scale
top_rev_movies_with_other.plot.bar(xlabel="Movie", ylabel="Revenue (in Millions)", title="Top Grossing Movies", color="salmon", logy=True)
```
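%% Cell type:markdown id: tags:
A quick way to see what `logy=True` changes is to ask the Axes for its y-axis scale. The sketch below uses made-up values in which one bar dwarfs the others; the `matplotlib.use("Agg")` line is only for running outside Jupyter and should be omitted in a notebook.
%% Cell type:code id: tags:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in Jupyter
import pandas as pd

# Made-up values: "Other" is far larger than the individual bars.
revenue = pd.Series({"M2": 400.0, "M4": 300.0, "Other": 100000.0})

log_ax = revenue.plot.bar(logy=True)
print(log_ax.get_yscale())  # reports the scale type used for the y-axis
```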