import flask
import matplotlib.pyplot as plt
import io # input / output
import pandas as pd
# Notes for matplotlib.pyplot module
"""
fig, ax = plt.subplots(figsize=(3, 2)) enables us to create a new plot
fig.savefig(<file object>, format=<fig format>) enables us to save the figure into a file - default format is png
plt.close() closes most recent fig
plt.tight_layout() enables us to avoid cropping of the image
"""
# Notes for io module
"""
io.BytesIO for fake binary file
io.StringIO for fake text file
<fileobject>.getvalue() returns the content of the fake file
"""
# Notes for flask
"""
flask.request.args enables us to process the query string
@app.route("<URL>", methods=["POST"]) enables us to process HTTP POST requests
flask.request.get_data() enables us to access data (byte format) sent via HTTP POST request
"""
temps = [80, 85, 83, 90]
app = flask.Flask("my dashboard")
# DYNAMIC
@app.route("/")
def home():
    return """
    <html>
    <body bgcolor="Salmon">
    <h3>Example 1: PNG</h3>
    <img src="plot1.png">
    <h3>Example 2: SVG</h3>
    <img src="plot2.svg">
    </body>
    </html>
    """
# TODO: add route to plot1.png
"""
IMPORTANT: file name and extension should match as in html content
Steps:
1. generate a plot
2. return the image contents:
2a. v1: write and read from a temporary file
2b. v2: use a fake file (io module)
3. fix the content type: default content type is text/html: Content-Type: text/html
3a. How can we find various content types? Google for "MIME types".
4. IMPORTANT: close the figure. If this is not done, after 20 refreshes, you will start getting the below warning:
More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
5. try to set the y-axis label for the plot. It gets cropped out of the image unless plt.tight_layout() is called.
6. create a line plot with temps Series
"""
@app.route("/plot1.png")
def plot1():
    fig, ax = plt.subplots(figsize=(3, 2))
    #pd.Series(temps).plot.line(ax=ax)
    #CDF code
    s = pd.Series(sorted(temps))
    rev = pd.Series((s.index+1)/len(s)*100, index=s.values)
    rev.plot.line(ax=ax, ylim=0, drawstyle="steps-post")
    ax.set_xlabel("Temperatures")
    ax.set_ylabel("Distribution")
    plt.tight_layout()
    # v1 - write and read a temporary plot file (cumbersome)
    # with open("temporary.png", "wb") as f:
    #     fig.savefig(f)
    # with open("temporary.png", "rb") as f:
    #     return f.read()
    # v2 - write and read from a fake file
    f = io.BytesIO()
    fig.savefig(f)
    plt.close()
    return flask.Response(f.getvalue(), headers={"Content-type": "image/png"})
# TODO: add route to plot2.svg
"""
IMPORTANT: file name and extension should match as in html content
Things to change from plot1 function:
1. Change content type
2. Change format for savefig
3. SVG files have text type (unlike PNG) - so we should use io.StringIO
"""
@app.route("/plot2.svg")
def plot2():
    fig, ax = plt.subplots(figsize=(3, 2))
    #pd.Series(temps).plot.line(ax=ax)
    pd.Series(temps).plot.hist(ax=ax, bins=100)
    # in a histogram the temperatures are on the x-axis (the y-axis is the frequency)
    ax.set_xlabel("Temperatures")
    plt.tight_layout()
    f = io.StringIO()
    fig.savefig(f, format="svg")
    plt.close()
    return flask.Response(f.getvalue(), headers={"Content-type": "image/svg+xml"})
# TODO: add route for "/upload"
"""
Steps:
1. support query string:
- with key/parameter as temps and value as "," separated temperature values
- add the values into temps list
2. return len(temps)
Disadvantages of query string approach:
1. If we have a lot of data, it is difficult to type. What if we are trying to upload a video?
2. Caching:
- memory of what we have already seen before; instead of slow web request, show what was already sent previously for the same request
- browser cache
- cache devices that sit in front of the server
- server caching
Use POST request instead:
1. Update route to add "methods=["POST"]"
2. A browser's address bar only sends GET requests, so to send a POST request we can use "curl -X POST <URL> -d <data>" --- curl is a simple command line tool that enables us to send HTTP requests
3. Use flask.request.get_data() - make sure to convert type to str
"""
@app.route("/upload", methods=["POST"])
def upload():
    # v1 - query string
    # new_temps = flask.request.args["temps"]
    # new_temps = new_temps.split(",")
    # for val in new_temps:
    #     temps.append(float(val))
    # v2 - POST request
    new_temps = str(flask.request.get_data(), encoding="utf-8")
    new_temps = new_temps.split(",")
    for val in new_temps:
        temps.append(float(val))
    return f"thanks, you now have {len(temps)} records"
# TODO: change SVG to histogram - very sensitive to number of "bins"
# TODO: change PNG to CDF (Cumulative Distribution Function):
"""
Idea: sort the data to observe some distribution, then switch x and y axes
Steps:
1. sort the data
2. switch x and y axes
3. normalize the y axis from 0 to 100 - make sure to set ylim to 0
4. change line plot "drawstyle" to "steps-post" - avoid extrapolating information between points
s = pd.Series(sorted(temps))
rev = pd.Series((s.index+1)/len(s)*100, index=s.values)
rev.plot.line(ax=ax, ylim=0, drawstyle="steps-post")
"""
if __name__ == "__main__":
    # threaded must be False whenever we use matplotlib
    app.run(host="0.0.0.0", debug=True, threaded=False)
    # app.run never returns, so don't define functions
    # after this (the def lines will never be reached)
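The CDF steps in the notes can be checked without pandas or matplotlib. The following is a minimal plain-Python sketch, not part of the lecture code; the helper name `cdf_points` is made up for this illustration. It sorts the values and pairs each one with the percentage of records at or below its position, which is the same information the `rev` Series holds in `plot1`.

```python
def cdf_points(values):
    """Pair each sorted value with the percentage of records at or below its rank."""
    ordered = sorted(values)                      # step 1: sort the data
    n = len(ordered)
    # steps 2 and 3: the value becomes x, the normalized rank (0 to 100) becomes y
    return [(v, (i + 1) / n * 100) for i, v in enumerate(ordered)]


temps = [80, 85, 83, 90]
points = cdf_points(temps)
for value, percent in points:
    print(value, percent)
```

Plotting these pairs with a step-style line (step 4) avoids suggesting values between the recorded points.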
import flask
# Notes for matplotlib.pyplot module
"""
fig, ax = plt.subplots(figsize=(3, 2)) enables us to create a new plot
fig.savefig(<file object>, format=<fig format>) enables us to save the figure into a file - default format is png
plt.close() closes most recent fig
plt.tight_layout() enables us to avoid cropping of the image
"""
# Notes for io module
"""
io.BytesIO for fake binary file
io.StringIO for fake text file
<fileobject>.getvalue() returns the content of the fake file
"""
# Notes for flask
"""
flask.request.args enables us to process the query string
@app.route("<URL>", methods=["POST"]) enables us to process HTTP POST requests
flask.request.get_data() enables us to access data (byte format) sent via HTTP POST request
"""
temps = [80, 85, 83, 90]
app = flask.Flask("my dashboard")
# DYNAMIC
@app.route("/")
def home():
    return """
    <html>
    <body bgcolor="Salmon">
    <h3>Example 1: PNG</h3>
    <img src="plot1.png">
    <h3>Example 2: SVG</h3>
    <img src="plot2.svg">
    </body>
    </html>
    """
# TODO: add route to plot1.png
"""
IMPORTANT: file name and extension should match as in html content
Steps:
1. generate a plot
2. return the image contents:
2a. v1: write and read from a temporary file
2b. v2: use a fake file (io module)
3. fix the content type: default content type is text/html: Content-Type: text/html
3a. How can we find various content types? Google for "MIME types".
4. IMPORTANT: close the figure. If this is not done, after 20 refreshes, you will start getting the below warning:
More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
5. try to set the y-axis label for the plot. It gets cropped out of the image unless plt.tight_layout() is called.
6. create a line plot with temps Series
"""
# TODO: add route to plot2.svg
"""
IMPORTANT: file name and extension should match as in html content
Things to change from plot1 function:
1. Change content type
2. Change format for savefig
3. SVG files have text type (unlike PNG) - so we should use io.StringIO
"""
# TODO: add route for "/upload"
"""
Steps:
1. support query string:
- with key/parameter as temps and value as "," separated temperature values
- add the values into temps list
2. return len(temps)
Disadvantages of query string approach:
1. If we have a lot of data, it is difficult to type. What if we are trying to upload a video?
2. Caching:
- memory of what we have already seen before; instead of slow web request, show what was already sent previously for the same request
- browser cache
- cache devices that sit in front of the server
- server caching
Use POST request instead:
1. Update route to add "methods=["POST"]"
2. A browser's address bar only sends GET requests, so to send a POST request we can use "curl -X POST <URL> -d <data>" --- curl is a simple command line tool that enables us to send HTTP requests
3. Use flask.request.get_data() - make sure to convert type to str
"""
# TODO: change SVG to histogram - very sensitive to number of "bins"
# TODO: change PNG to CDF (Cumulative Distribution Function):
"""
Idea: sort the data to observe some distribution, then switch x and y axes
Steps:
1. sort the data
2. switch x and y axes
3. normalize the y axis from 0 to 100 - make sure to set ylim to 0
4. change line plot "drawstyle" to "steps-post" - avoid extrapolating information between points
s = pd.Series(sorted(temps))
rev = pd.Series((s.index+1)/len(s)*100, index=s.values)
rev.plot.line(ax=ax, ylim=0, drawstyle="steps-post")
"""
if __name__ == "__main__":
    # threaded must be False whenever we use matplotlib
    app.run(host="0.0.0.0", debug=True, threaded=False)
    # app.run never returns, so don't define functions
    # after this (the def lines will never be reached)
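The POST steps listed in the notes can also be exercised without curl. The sketch below is one possible way to do that, assuming Flask's built-in test client; the app name `upload-sketch` and the sample body `70,75.5` are made up for illustration, and the route body follows the POST approach described in the notes rather than any official solution.

```python
import flask

app = flask.Flask("upload-sketch")   # hypothetical app name for this sketch
temps = [80, 85, 83, 90]


@app.route("/upload", methods=["POST"])
def upload():
    # get_data() returns bytes, so decode to str before splitting
    body = flask.request.get_data().decode("utf-8")
    for val in body.split(","):
        temps.append(float(val))
    return f"thanks, you now have {len(temps)} records"


client = app.test_client()

# Equivalent in spirit to: curl -X POST <URL>/upload -d "70,75.5"
resp = client.post("/upload", data="70,75.5")
reply = resp.get_data(as_text=True)
print(resp.status_code, reply)

# The route only lists POST, so a GET request is rejected
get_resp = client.get("/upload")
print(get_resp.status_code)
```

Because the test client calls the app in-process, `app.run` is not needed here.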
%% Cell type:markdown id:7a9f4231-13d3-4e0c-8d06-d1bed74592ef tags:
#### <b>Note:</b> If you read this file directly from GitLab, you may see redundant backslashes. You should open this file in JupyterLab to view the expressions properly.
%% Cell type:code id:17d7ad83-024e-4763-a36e-43cf74f36a8b tags:
``` python
import re
```
%% Cell type:markdown id:29085f57 tags:
### Emails example guidance
- `name` part matches the part before the `@` or `AT`
- `at` part matches `@` or `(at)` or `(AT)` or `(aT)` or `(At)`, etc.
- `domain` part matches `cs.wisc.edu` or `wisc.edu` or `something.com` or `something1.something2.org`, etc.
- `full_regex` puts the entire regex together using the above three parts
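As a rough illustration of the guidance above, the sketch below builds the three parts and combines them. The specific patterns, the list of top-level domains, and the sample text are assumptions made for this example; they are not the lecture's solution, and real email matching has many more edge cases.

```python
import re

# Hypothetical patterns for illustration only
name = r"\w+"                                   # part before the @ / (at)
at = r"@|\([aA][tT]\)"                          # "@" or "(at)" in any letter case
domain = r"\w+(?:\.\w+)*\.(?:edu|com|org)"      # e.g. cs.wisc.edu, something.com

# Each part goes in its own group so findall returns (name, at, domain) tuples
full_regex = rf"({name})\s*({at})\s*({domain})"

text = "Contact: alice@cs.wisc.edu, bob (AT) wisc.edu, or carol(at)something1.something2.org"
matches = re.findall(full_regex, text)
for m in matches:
    print(m)
```

Keeping the parts in separate variables makes each one easy to test on its own before assembling `full_regex`.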
%% Cell type:markdown id:5aa7f04e tags:
### Self-practice
### Q1: Which regex will NOT match "123"
1. r"\d\d\d"
2. r"\d{3}"
3. r"\D\D\D"
4. r"..."
A: 3. `\D\D\D`
Explanation: `\D` matches any single character that is not a digit, so `\D\D\D` cannot match three digits.
%% Cell type:code id:1c434f48-6995-40cb-9907-d8fd334b267e tags:
``` python
print(re.findall(r"\d\d\d", "123"))
print(re.findall(r"\d{3}", "123"))
print(re.findall(r"\D\D\D", "123"))
print(re.findall(r"...", "123"))
```
%% Output
['123']
['123']
[]
['123']
%% Cell type:markdown id:a495ecbf-56e6-4037-a6f2-e0adba9ca603 tags:
### Q2: What will r"^A" match?
1. "A"
2. "^A"
3. "BA"
4. "B"
5. "BB"
A: 1. `"A"`
Explanation: `^` is an anchor that matches only at the beginning of the string, so `^A` requires the string to begin with `A`. Option 1 is the only option that begins with `A`.
Option 2 begins with a literal `^`. To match it, the regex must escape the special meaning of `^` with a backslash, that is, the regex should be `r"\^A"`.
%% Cell type:code id:bd4c3c85-3983-4447-959c-7cd12210acfe tags:
``` python
print(re.findall(r"^A", "A"))
print(re.findall(r"^A", "^A"))
print(re.findall(r"^A", "BA"))
print(re.findall(r"^A", "B"))
print(re.findall(r"^A", "BB"))
# To match option 2 you need to escape regular meaning of ^
print(re.findall(r"\^A", "^A"))
```
%% Output
['A']
[]
[]
[]
[]
['^A']
%% Cell type:markdown id:836a0eaa-5bf7-4730-a923-6e0c098e09fa tags:
### Q3: Which one can match "HH"?
1. r"HA+H"
2. r"HA+?H"
3. r"H(A+)?H"
A: 3. `r"H(A+)?H"`
Explanation:
Option 1 specifies `A+`, which requires one or more `A`'s between the two `H`'s, so it cannot match `HH`. This option matches strings like "HAH", "HAAH", "HAAAH", etc.
Option 2 specifies `A+?`, which is a non-greedy match for one or more `A`'s between the two `H`'s, so it cannot match `HH` either.
Option 3 is the correct way to optionally match one or more `A`'s: in `(A+)?` the `?` follows a group, so it acts as the 0-or-1 operator instead of switching `+` from greedy to non-greedy.
%% Cell type:code id:80db0ebc-f346-49b4-b17c-1e711bb37d33 tags:
``` python
print(re.findall(r"HA+H", "HH"))
print(re.findall(r"HA+?H", "HH"))
print(re.findall(r"H(A+)?H", "HH"))
# To see actual match with option 3, you must surround the whole regex
# using a group capture, that is:
print(re.findall(r"(H(A+)?H)", "HH"))
```
%% Output
[]
[]
['']
[('HH', '')]
%% Cell type:code id:d9a9288f-7c89-440b-84d5-685f42212133 tags:
``` python
# Option 1: strings that can be matched
print(re.findall(r"HA+H", "HAH"))
print(re.findall(r"HA+H", "HAAH"))
print(re.findall(r"HA+H", "HAAAH"))
print(re.findall(r"HA+H", "HAHAAH"))
# Option 2: same set of matches but non-greedy
# (which doesn't make a difference for these examples)
```
%% Output
['HAH']
['HAAH']
['HAAAH']
['HAH']
%% Cell type:markdown id:d1bd4782-e78e-4f84-a5b2-76aa5458c035 tags:
### Q4: Which string(s) will match r"^(ha)*$"
1. ""
2. "hahah"
3. "that"
4. "HAHA"
A: 1. `""`
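Explanation: `^` and `$` anchor the match to the beginning and end of the string, so the entire string must consist of zero or more repetitions of `ha` and nothing else. The empty string matches with zero repetitions. `"hahah"` has a trailing `h`, `"that"` has extra characters around `ha`, and `"HAHA"` is uppercase (regexes are case sensitive by default).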
%% Cell type:code id:c2d34df4-3aba-4ffa-8f91-8fdcc4470b0b tags:
``` python
print(re.findall(r"^(ha)*$", ""))
print(re.findall(r"^(ha)*$", "hahah"))
print(re.findall(r"^(ha)*$", "that"))
print(re.findall(r"^(ha)*$", "HAHA"))
```
%% Output
['']
[]
[]
[]
%% Cell type:code id:a8c366f4-94a6-4695-a45c-5933051a7c07 tags:
``` python
# Strings that can have a match with `r"^(ha)*$"`
print(re.findall(r"^(ha)*$", "ha"))
print(re.findall(r"^(ha)*$", "haha"))
print(re.findall(r"^(ha)*$", "hahaha"))
# and so on
```
%% Output
['ha']
['ha']
['ha']
%% Cell type:markdown id:d63a95fa-bd75-4e9b-9ac5-57ab3fa8f11d tags:
### Q5: What is the type of the following?
`re.findall(r"(\d) (\w+)", some_str)[0]`
1. list
2. tuple
3. string
A: 2. `tuple`
Explanation: the regex contains two groups, indicated by the two pairs of `()`. With more than one group, `re.findall` returns a `list` of `tuples` (one tuple per match), so the element at index 0 is a `tuple`.
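A quick check of this behavior, using a made-up value for `some_str`, might look like the following sketch. It also shows the contrast with a single group, where `re.findall` returns plain strings.

```python
import re

some_str = "3 apples and 5 bananas"   # sample input invented for this sketch

# Two groups: each match becomes a tuple with one entry per group
two_groups = re.findall(r"(\d) (\w+)", some_str)
print(two_groups)
print(type(two_groups[0]))

# One group: each match is just the captured string
one_group = re.findall(r"(\d) \w+", some_str)
print(one_group)
print(type(one_group[0]))
```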
%% Cell type:markdown id:3ff199e4 tags:
### Q6: What will it do?
```python
re.sub(r"(\d{3})-(\d{3}-\d{4})",
r"(\g<1>) \g<2>",
"608-123-4567")
```
A: converts 608-123-4567 phone number format to (608) 123-4567 format.
Explanation:
Regex parts:
1. group capture of three digits `(\d{3})`
2. hyphen match `-`
3. group capture of three digits, then a `-`, and then four digits `(\d{3}-\d{4})`
Replacement string parts (the replacement is also a raw string):
1. group 1 inside literal parentheses `(\g<1>)`
2. space
3. group 2 unchanged `\g<2>`
%% Cell type:code id:06a51bcf-628d-45da-9e55-76fa43fc5b16 tags:
``` python
re.sub(r"(\d{3})-(\d{3}-\d{4})",
r"(\g<1>) \g<2>",
"608-123-4567")
```
%% Output
'(608) 123-4567'
%% Cell type:code id:5735e93a-d938-488e-b17a-6ffc36b35153 tags:
``` python
```
%% Cell type:markdown id:084b5333 tags:
# Visualization 1
- Advanced visualization, example: https://trailsofwind.figures.cc/
- Custom visualization steps:
    - draw "patches" (shapes) on the screen (what):
        - lines
        - polygons
        - circle
        - text
    - location of the "patches" on the screen (where):
        - X & Y coordinates
        - "Coordinate Reference System (CRS)":
            - takes some X & Y and maps them onto actual space on screen
            - several CRS
%% Cell type:code id:5df39a4b-d55b-4ba0-ab78-bd06fac8047e tags:
``` python
# import statements
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import math
```
%% Cell type:markdown id:af357b87 tags:
### Review: drawing a figure
- `fig, ax = plt.subplots(figsize=(<width>, <height>))`
### Drawing a circle
- Type `plt.` and then tab to see a list of `patches`.
- `plt.Circle((<X>, <Y>), <RADIUS>)`
- To see the circle, we need to invoke either:
    - `ax.add_patch(<circle object>)`
    - `ax.add_artist(<circle object>)`
- this invocation needs to be in the same cell as the one that draws the figure
- Is there a difference between `ax.add_patch` and `ax.add_artist`?
    - `ax.autoscale_view()`: automatically chooses limits for the axes; typically works better with `ax.add_patch(...)`
%% Cell type:code id:152cd4b0-7334-491f-841d-c0bfe3fce3c9 tags:
``` python
fig, ax = plt.subplots(figsize=(6, 4))
# Let's draw a circle at (0.5, 0.5) of radius 0.3
# Add the circle to the AxesSubplot
```
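One possible way to fill in the cell above is sketched below. It is an illustration under assumptions, not the official lecture solution: the `Agg` backend line is only there so the snippet runs outside Jupyter (leave it out in a notebook), and the MRO printout is one way to inspect the circle's class hierarchy.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))

# Circle at (0.5, 0.5) with radius 0.3
c = plt.Circle((0.5, 0.5), 0.3)

# Add the circle to the axes, then let matplotlib pick the axes limits
ax.add_patch(c)
ax.autoscale_view()

print(type(c))
mro_names = [cls.__name__ for cls in type(c).__mro__]
print(mro_names)
print(ax.get_xlim(), ax.get_ylim())

plt.close(fig)
```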
%% Cell type:code id:5064c802 tags:
``` python
```
%% Cell type:markdown id:5b760ce1 tags:
Type and MRO of circle object.
%% Cell type:code id:8a714acd-e33d-4fa0-9fdd-de8438e94418 tags:
``` python
type(c)
```
%% Cell type:code id:2617ab22 tags:
``` python
type(c).__mro__
```
%% Cell type:markdown id:085918f5 tags:
### Making the circle circular
1. Have same figure width and height
2. Aspect ratio
3. Transformers: let us pick a Coordinate Reference System (CRS)
%% Cell type:code id:1bf67506 tags:
``` python
# Option 1: Have same figure width and height
fig, ax = plt.subplots(figsize=(4, 4))
c = plt.Circle((0.5, 0.5), 0.3)
ax.add_patch(c)
ax.autoscale_view()
```
%% Cell type:markdown id:4bd1648d-55ce-4156-b94e-6b9a10db4da7 tags:
### Aspect Ratio
- `ax.set_aspect(<Y DIM>)`: how much screen space one unit on the y-axis takes relative to one unit on the x-axis
%% Cell type:code id:06b32774-26a2-4627-b363-f79eb2838d07 tags:
``` python
fig, ax = plt.subplots(figsize=(6, 4))
c = plt.Circle((0.5, 0.5), 0.3)
ax.add_artist(c)
# Set aspect for y-axis to 1
```
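A sketch of how the aspect setting could be applied and checked follows. It assumes the headless `Agg` backend (omit that line in Jupyter) and calls `fig.canvas.draw()` explicitly so the aspect is applied before measuring; the pixel comparison at the end is an illustrative check, not part of the lecture.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
c = plt.Circle((0.5, 0.5), 0.3)
ax.add_artist(c)

# One y unit now takes as much screen space as one x unit
ax.set_aspect(1)

# The aspect is applied when the figure is drawn
fig.canvas.draw()

# Measure how many pixels one data unit spans in each direction
p0, p1 = ax.transData.transform([(0, 0), (1, 1)])
x_pixels_per_unit = p1[0] - p0[0]
y_pixels_per_unit = p1[1] - p0[1]
print(x_pixels_per_unit, y_pixels_per_unit)

plt.close(fig)
```

With equal pixels per unit in both directions, the circle appears circular even though the figure itself is wider than it is tall.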
%% Cell type:markdown id:65f0928e tags:
What if we want the x and y axes to take up the same amount of screen space? That is, we care more about the plot area being square than about the circle being circular.
%% Cell type:code id:7fa11875-34ba-4550-8e8b-9a9f92e58a9a tags:
``` python
fig, ax = plt.subplots(figsize=(6,4))
# Set x axis limit to (0, 3)
c = plt.Circle((0.5, 0.5), 0.3)
ax.add_artist(c)
# Set aspect for y-axis to 3
```
%% Cell type:markdown id:c2429f83-1603-4aaf-b767-a60969fc20d7 tags:
### Transformers: let us pick a Coordinate Reference System (CRS)
- Documentation: https://matplotlib.org/stable/tutorials/advanced/transforms_tutorial.html
- `ax.transData`: default
- `ax.transAxes` and `fig.transFigure`:
    - (0, 0) is bottom left
    - (1, 1) is top right
    - these hold regardless of the axes limits
- `None` or `IdentityTransform()`: disabling CRS
%% Cell type:markdown id:84c0e7b7 tags:
### Review:
- `fig, ax = plt.subplots(figsize=(<width>, <height>), ncols=<N>, nrows=<N>)`:
    - ncols: number of columns (subplots placed side by side)
    - nrows: number of rows (subplots stacked top to bottom)
- `ax.set_xlim(<lower limit>, <upper limit>)`: set x-axis limits
- `ax.set_ylim(<lower limit>, <upper limit>)`: set y-axis limits
### `ax.transData`
- `transform` parameter in "patch" creation function lets us specify the CRS
- `color` parameter controls the color of the "patch"
- `edgecolor` parameter controls outer border color of the "patch"
- `linewidth` parameter controls the size of the border of the "patch"
- `facecolor` parameter controls the filled in color of the "patch"
%% Cell type:code id:5adb1223-0fc4-422c-95ef-1446ad811940 tags:
``` python
# Create a plot with two vertical subplots
# Set right subplot x-axis limit from 0 to 3
# Left subplot: plot Circle at (0.5, 0.5) with radius 0.2
# Specify CRS as ax1.transData (transform parameter)
# Right subplot: plot Circle at (0.5, 0.5) with radius 0.2
# default: transform=ax2.transData
# Observe that we get a different circle
# Transform based on ax1, but crop based on ax2
# Left subplot: plot Circle at (1, 1) with radius 0.3 and crop using ax2
# where to position the shape
# how to crop the shape
# Right subplot: plot Circle at (1, 1) with radius 0.3 and crop using ax2
# where to position the shape
# how to crop the shape
```
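The comments in the cell above could be turned into code along these lines. This is a hedged sketch rather than the lecture's solution: the variable names `c1`, `c2`, `c3` and the pixel measurements are additions for illustration, and the `Agg` line is only needed outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

# Two side-by-side subplots; the right one has a wider x range
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)

# Left subplot: CRS given explicitly
c1 = plt.Circle((0.5, 0.5), 0.2, transform=ax1.transData)
ax1.add_artist(c1)

# Right subplot: no transform given, so add_artist falls back to ax2.transData
c2 = plt.Circle((0.5, 0.5), 0.2)
ax2.add_artist(c2)

# Same data coordinates, different shapes: one x unit covers fewer pixels on the right
px_per_x_left = ax1.transData.transform((1, 0))[0] - ax1.transData.transform((0, 0))[0]
px_per_x_right = ax2.transData.transform((1, 0))[0] - ax2.transData.transform((0, 0))[0]
print(px_per_x_left, px_per_x_right)

# Position with ax1's CRS, but add to ax2 so the shape is cropped by ax2
c3 = plt.Circle((1, 1), 0.3, transform=ax1.transData)   # where to position the shape
ax2.add_artist(c3)                                      # how to crop the shape

plt.close(fig)
```

The `transform` argument decides where the shape lands, while the axes it is added to decides how it is cropped.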
%% Cell type:markdown id:0167a871 tags:
### `ax.transAxes` and `fig.transFigure`
- (0, 0) is bottom left
- (1, 1) is top right
- these hold regardless of the axes limits
%% Cell type:code id:38aa99c6-039a-468e-9cb1-b1a3d9b36a80 tags:
``` python
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)
# Left subplot
c = plt.Circle((0.5, 0.5), 0.2, transform=???)
???.add_artist(c)
# Right subplot
c = plt.Circle((0.5, 0.5), 0.2, transform=???)
???.add_artist(c)
# whole figure
# edgecolor="red", facecolor="none", linewidth=3
```
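One way the `???` placeholders above might be filled in is sketched below, assuming each subplot's own `transAxes` for the two small circles and `fig.transFigure` for the outline circle. The names `c1`, `c2`, `c3` and the centre checks are illustrative additions, and the `Agg` line is only needed outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)

# Left subplot: (0.5, 0.5) means "middle of ax1", whatever its limits are
c1 = plt.Circle((0.5, 0.5), 0.2, transform=ax1.transAxes)
ax1.add_artist(c1)

# Right subplot: the (0, 3) x limit does not move the circle
c2 = plt.Circle((0.5, 0.5), 0.2, transform=ax2.transAxes)
ax2.add_artist(c2)

# Whole figure: outline circle centred on the figure
c3 = plt.Circle((0.5, 0.5), 0.2, transform=fig.transFigure,
                edgecolor="red", facecolor="none", linewidth=3)
fig.add_artist(c3)

# Where do the centres land in pixels?
ax2_center = ax2.transAxes.transform((0.5, 0.5))
fig_center = fig.transFigure.transform((0.5, 0.5))
print(ax2_center, fig_center)

plt.close(fig)
```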
%% Cell type:markdown id:bc52379c tags:
### No CRS (raw pixel coordinates)
- `fig.dpi`: dots (aka pixels) per inch
    - increasing dpi makes the figure have higher resolution (helpful when you want to print a large size)
- Review:
    - `plt.tight_layout()`: avoid unnecessary cropping of the figure --- always needed for No CRS cases
    - `fig.savefig(<relative path.png>)`: to save a local copy of the image
- Jupyter command to avoid cropping:
    - `%config InlineBackend.print_figure_kwargs={'bbox_inches': None}`
    - `bbox` in `bbox_inches` stands for bounding box
%% Cell type:code id:222eb737 tags:
``` python
# Jupyter commands begin with %
```
%% Cell type:code id:e09ef243-ba52-4b70-a980-6ff4735f0fc2 tags:
``` python
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)
# What is the dpi?
# dots (aka pixel) per inch
# Calculate total width and height of the figure using dpi and dimensions
width = ???
height = ???
# Calculate (x, y) in the middle of the plot
x = ???
y = ???
print(x, y)
# Make sure to invoke plt.tight_layout()
# matplotlib does the cropping better than Jupyter
# Draw a circle at (x, y) with radius 30
# Make sure to set transform=None
# Save the figure to temp.png
```
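The placeholders in the cell above could be completed roughly as follows. This is a sketch under assumptions: the circle is added to the figure rather than to one subplot, and the image is written to an in-memory `io.BytesIO` buffer instead of `temp.png` so the example leaves no file behind. The `Agg` line is only needed outside Jupyter.

```python
import io

import matplotlib
matplotlib.use("Agg")   # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)

# Total size in pixels = size in inches * dots per inch
width = 6 * fig.dpi
height = 4 * fig.dpi

# Middle of the figure in raw pixel coordinates
x = width / 2
y = height / 2
print(fig.dpi, width, height, x, y)

plt.tight_layout()

# transform=None disables the CRS, so (x, y) and the radius are in pixels
c = plt.Circle((x, y), 30, transform=None)
fig.add_artist(c)

# Save to an in-memory buffer (fig.savefig("temp.png") would write a file instead)
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()

plt.close(fig)
```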
%% Cell type:markdown id:744d76e5 tags:
### Mix and match
- `ax.transData.transform((x, y))`: converts axes / data coords into raw coordinates
- How to draw an arrow:
    - `matplotlib.patches.FancyArrowPatch((<x1>, <y1>), (<x2>, <y2>), transform=None, arrowstyle=<STYLE>)`
    - arrowstyle="simple,head_width=10,head_length=10"
%% Cell type:code id:db8a296e-6eae-4e57-a94e-b7901c476ae2 tags:
``` python
# GOAL: draw a visual circle at axes / data coords 0.5, 0.5
# with raw co-ordinate radius 30 on right subplot
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)
# crop now (after .transform, we don't want to move anything!)
# plt.tight_layout()
# Transform (0.5, 0.5) to transData CRS
print(x, y)
# Draw a circle at (x, y) with radius 30 and set transform to None
# GOAL: arrow from 0.2, 0.2 (left) to 2, 0.5 (right)
# Use axes / data coords from one subplot to another subplot
# arrowstyle="simple,head_width=10,head_length=10"
arrow = matplotlib.patches.FancyArrowPatch((x1, y1), (x2, y2), transform=None)
fig.add_artist(arrow)
```
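A possible completion of the cell above is sketched here. It is an illustration rather than the lecture's solution; the round-trip check through `transData.inverted()` is an added sanity test, and the `Agg` line is only needed outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)

# Crop first: the pixel positions computed below must not move afterwards
plt.tight_layout()

# Data coords (0.5, 0.5) on the right subplot, converted to raw pixels
x, y = ax2.transData.transform((0.5, 0.5))
print(x, y)

# Raw-coordinate circle: radius 30 pixels, so it looks circular
c = plt.Circle((x, y), 30, transform=None)
ax2.add_artist(c)

# Arrow from data coords (0.2, 0.2) on the left to (2, 0.5) on the right
x1, y1 = ax1.transData.transform((0.2, 0.2))
x2, y2 = ax2.transData.transform((2, 0.5))
arrow = matplotlib.patches.FancyArrowPatch(
    (x1, y1), (x2, y2), transform=None,
    arrowstyle="simple,head_width=10,head_length=10")
fig.add_artist(arrow)

# Sanity check: pixels convert back to the original data coords
back_x, back_y = ax2.transData.inverted().transform((x, y))

plt.close(fig)
```

Converting both endpoints to raw coordinates is what lets a single arrow span two subplots with different data ranges.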
%% Cell type:markdown id:d2272a0b-f8f1-4bdd-87d5-971aee1ce160 tags:
### Custom Scatter Plots with Angles
%% Cell type:code id:468492f8-0e1e-4a37-b16c-d08e818a1b63 tags:
``` python
df = pd.DataFrame([
{"x":2, "y":5, "a": 90},
{"x":3, "y":1, "a": 0},
{"x":6, "y":6, "a": 45},
{"x":8, "y":1, "a": 180}
])
df
```
%% Cell type:code id:4963466b-acab-468e-bdde-97f9908933a8 tags:
``` python
fig, ax = plt.subplots(figsize=(3, 2))
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
for row in df.itertuples():
    print(row.x, row.y, row.a)
    # v1: draw a circle for each scatter point
    # x, y = ax.transData.transform((row.x, row.y))
    # c = plt.Circle((x,y), radius=10, transform=None)
    # ax.add_artist(c)
    # v2: draw an arrow for each scatter point (correct angle)
    #x, y = ax.transData.transform((row.x, row.y))
    # Calculate angle: math.radians(row.a)
    #a = ???
    # Calculate the raw-coordinate offsets of the arrow tip:
    # using math.cos(a) * 25 and math.sin(a) * 25
    #x_diff = ???
    #y_diff = ???
    c = matplotlib.patches.FancyArrowPatch((x,y), (x+x_diff, y+y_diff),transform=None, color="k",
                                           arrowstyle="simple,head_width=10,head_length=10")
    ax.add_artist(c)
```
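The angle arithmetic hinted at in the comments above can be tried on its own, without drawing anything. In the sketch below the helper name `arrow_offset` is hypothetical; it converts an angle in degrees into the pixel offsets of an arrow tip of length 25, using the angles from the DataFrame.

```python
import math


def arrow_offset(angle_degrees, length=25):
    """Return (x_diff, y_diff) in raw pixels for an arrow of the given length."""
    a = math.radians(angle_degrees)      # math.cos / math.sin expect radians
    return math.cos(a) * length, math.sin(a) * length


for angle in [90, 0, 45, 180]:
    x_diff, y_diff = arrow_offset(angle)
    print(angle, round(x_diff, 2), round(y_diff, 2))
```

An angle of 0 points right, 90 points up, and 180 points left, matching the usual counter-clockwise convention.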
%% Cell type:markdown id:1ddfa239-804f-4c86-b8e9-6f94e7709ab6 tags:
### Plot annotations
- Target plot:
<img src = "Target_plot.png">
%% Cell type:markdown id:dff3b3a7-171f-4dab-9bb4-9c5fc76ca0db tags:
- `ax.text(<x>, <y>, <text>, ha=<someval>, va=<someval>)`
    - `ha`: horizontalalignment
    - `va`: verticalalignment
    - enables us to modify "anchor" of the text
### More patches
- `plt.Line2D((<x1>, <x2>), (<y1>, <y2>))`
- `plt.Rectangle((<x>,<y>), <width>, <height>)`
%% Cell type:code id:648f8e12-d357-43ec-b7de-1f8a14b7264b tags:
``` python
plt.rcParams["font.size"] = 16
df = pd.DataFrame({"A": [1,2,8,9], "B": [5,7,12,15]}, index=[10,20,30,40])
ax = df.plot.line(figsize=(4,3))
ax.set_xlabel("Day")
ax.set_ylabel("Amount")
plt.tight_layout()
# Enables us to control borders (aka spines)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
# 1. Replace legend with line labels
# 2. Draw a vertical line at x=20
# color="r", linestyle="--"
# 3. Highlight a region from x=25 to x=35
# color="k", zorder=50, alpha=0.2, linewidth=0
df
```
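The three numbered tasks in the cell above might be approached as in the sketch below. The label placement (at the last point of each line) and the use of the current y limits for the line and rectangle are choices made for this illustration, not taken from the lecture; the `Agg` line is only needed outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 8, 9], "B": [5, 7, 12, 15]}, index=[10, 20, 30, 40])
ax = df.plot.line(figsize=(4, 3), legend=False)   # legend off: labels go on the lines
ax.set_xlabel("Day")
ax.set_ylabel("Amount")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# 1. Label each line at its last point; ha/va set the anchor of the text
for col in df.columns:
    ax.text(df.index[-1], df[col].iloc[-1], col, ha="left", va="center")

y_lo, y_hi = ax.get_ylim()

# 2. Vertical dashed red line at x=20
line = plt.Line2D((20, 20), (y_lo, y_hi), color="r", linestyle="--")
ax.add_artist(line)

# 3. Shaded region from x=25 to x=35 (width 10)
rect = plt.Rectangle((25, y_lo), 10, y_hi - y_lo,
                     color="k", zorder=50, alpha=0.2, linewidth=0)
ax.add_artist(rect)

label_texts = [t.get_text() for t in ax.texts]
plt.tight_layout()
plt.close(ax.figure)
```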
%% Cell type:markdown id:084b5333 tags:
# Visualization 1
- Advanced visualization, example: https://trailsofwind.figures.cc/
- Custom visualization steps:
- draw "patches" (shapes) on the screen (what):
- lines
- polygons
- circle
- text
- location of the "patches" on the screen (where):
- X & Y co-ordinate
- "Coordinate Reference System (CRS)":
- takes some X & Y and maps it on to actual space on screen
- several CRS
%% Cell type:code id:5df39a4b-d55b-4ba0-ab78-bd06fac8047e tags:
``` python
# import statements
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import math
```
%% Cell type:markdown id:af357b87 tags:
### Review: drawing a figure
- `fig, ax = plt.subplots(figsize=(<width>, <height>))`
### Drawing a circle
- Type `plt.` and then tab to see a list of `patches`.
- `plt.Circle((<X>, <Y>), <RADIUS>)`
- To see the cicle, we need to invoke either:
- `ax.add_patch(<circle object>)`
- `ax.add_artist(<circle object>)`
- this invocation needs to be in the same cell as the one that draws the figure
- Is there a difference between `ax.add_patch` and `ax.add_artist`?
- `ax.autoscale_view()`: automatically chose limits for the axes; typically works better with `ax.add_patch(...)`
%% Cell type:code id:152cd4b0-7334-491f-841d-c0bfe3fce3c9 tags:
``` python
fig, ax = plt.subplots(figsize=(6, 4))
# Let's draw a circle at (0.5, 0.5) of radius 0.3
# Add the circle to the AxesSubplot
```
%% Cell type:code id:5064c802 tags:
``` python
```
%% Cell type:markdown id:5b760ce1 tags:
Type and MRO of circle object.
%% Cell type:code id:8a714acd-e33d-4fa0-9fdd-de8438e94418 tags:
``` python
type(c)
```
%% Cell type:code id:2617ab22 tags:
``` python
type(c)
```
%% Cell type:markdown id:085918f5 tags:
### Making the circle circular
1. Have same figure width and height
2. Aspect ratio
3. Transformers: let's us pick a Coordinate Reference System (CRS)
%% Cell type:code id:1bf67506 tags:
``` python
# Option 1: Have same figure width and height
fig, ax = plt.subplots(figsize=(6, 4))
c = plt.Circle((0.5, 0.5), 0.3)
ax.add_patch(c)
ax.autoscale_view()
```
%% Cell type:markdown id:4bd1648d-55ce-4156-b94e-6b9a10db4da7 tags:
### Aspect Ratio
- `ax.set_aspect(<Y DIM>)`: how much space y axes takes with respect to x axes space
%% Cell type:code id:06b32774-26a2-4627-b363-f79eb2838d07 tags:
``` python
fig, ax = plt.subplots(figsize=(6, 4))
c = plt.Circle((0.5, 0.5), 0.3)
ax.add_artist(c)
# Set aspect for y-axis to 1
```
%% Cell type:markdown id:65f0928e tags:
What if we want x and y axes to have the same aspect ratio? That is we care more about the figure being square than about the circle being circular.
%% Cell type:code id:7fa11875-34ba-4550-8e8b-9a9f92e58a9a tags:
``` python
fig, ax = plt.subplots(figsize=(6,4))
# Set x axis limit to (0, 3)
c = plt.Circle((0.5, 0.5), 0.3)
ax.add_artist(c)
# Set aspect for y-axis to 3
```
%% Cell type:markdown id:c2429f83-1603-4aaf-b767-a60969fc20d7 tags:
### Transformers: let us pick a Coordinate Reference System (CRS)
- Documentation: https://matplotlib.org/stable/tutorials/advanced/transforms_tutorial.html
- `ax.transData`: default
- `ax.transAxes` and `fig.transFigure`:
- (0, 0) is bottom left
- (1, 1) is top right
- these hold regardless of the axes limits
- `None` or `IdentityTransform()`: disables the CRS (raw pixel coordinates)
%% Cell type:markdown id:84c0e7b7 tags:
### Review:
- `fig, ax = plt.subplots(figsize=(<width>, <height>), ncols=<N>, nrows=<N>)`:
- ncols: number of columns (subplots placed side by side)
- nrows: number of rows (subplots stacked top to bottom)
- `ax.set_xlim(<lower limit>, <upper limit>)`: set x-axis limits
- `ax.set_ylim(<lower limit>, <upper limit>)`: set y-axis limits
### `ax.transData`
- `transform` parameter in "patch" creation function lets us specify the CRS
- `color` parameter controls the color of the "patch"
- `edgecolor` parameter controls the outer border color of the "patch"
- `linewidth` parameter controls the width of the border of the "patch"
- `facecolor` parameter controls the fill color of the "patch"
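As a rough illustration (Agg backend assumed, names illustrative), the same data coordinates land in different places when two subplots have different axis limits, because each `transData` follows its own axes:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)

# Same data coordinates, but each circle is positioned by its own axes' data CRS
c1 = plt.Circle((0.5, 0.5), 0.2, transform=ax1.transData,
                edgecolor="red", facecolor="none", linewidth=3)
ax1.add_artist(c1)
c2 = plt.Circle((0.5, 0.5), 0.2, transform=ax2.transData)
ax2.add_artist(c2)

# Where does data point (0.5, 0.5) land inside each axes, as a fraction of its size?
frac1 = ax1.transAxes.inverted().transform(ax1.transData.transform((0.5, 0.5)))
frac2 = ax2.transAxes.inverted().transform(ax2.transData.transform((0.5, 0.5)))
print(frac1, frac2)
```
On the left the point sits in the middle of the axes; on the right it sits one sixth of the way across, since the x-axis runs from 0 to 3.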
%% Cell type:code id:5adb1223-0fc4-422c-95ef-1446ad811940 tags:
``` python
# Create a plot with two side-by-side subplots
# Set right subplot x-axis limit from 0 to 3
# Left subplot: plot Circle at (0.5, 0.5) with radius 0.2
# Specify CRS as ax1.transData (transform parameter)
# Right subplot: plot Circle at (0.5, 0.5) with radius 0.2
# default: transform=ax2.transData
# Observe that we get a different circle
# Transform based on ax1, but crop based on ax2
# Left subplot: plot Circle at (1, 1) with radius 0.3 and crop using ax2
# where to position the shape
# how to crop the shape
# Right subplot: plot Circle at (1, 1) with radius 0.3 and crop using ax2
# where to position the shape
# how to crop the shape
```
%% Cell type:markdown id:0167a871 tags:
### `ax.transAxes` and `fig.transFigure`
- (0, 0) is bottom left
- (1, 1) is top right
- these hold regardless of the axes limits
%% Cell type:code id:38aa99c6-039a-468e-9cb1-b1a3d9b36a80 tags:
``` python
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)
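# Left subplot
c = plt.Circle((0.5, 0.5), 0.2, transform=ax1.transAxes)
ax1.add_artist(c)
# Right subplot
c = plt.Circle((0.5, 0.5), 0.2, transform=ax2.transAxes)
ax2.add_artist(c)
# whole figure
# edgecolor="red", facecolor="none", linewidth=3
c = plt.Circle((0.5, 0.5), 0.2, transform=fig.transFigure,
               edgecolor="red", facecolor="none", linewidth=3)
fig.add_artist(c)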
```
%% Cell type:markdown id:bc52379c tags:
### No CRS (raw pixel coordinates)
- `fig.dpi`: dots (aka pixels) per inch
- increasing dpi makes the figure have higher resolution (helpful when you want to print a large size)
- Review:
- `plt.tight_layout()`: avoid unnecessary cropping of the figure; always needed for No CRS cases
- `fig.savefig(<relative path.png>)`: to save a local copy of the image
- Jupyter command to avoid cropping:
- `%config InlineBackend.print_figure_kwargs={'bbox_inches': None}`
- bbox_inches stands for bounding box
%% Cell type:code id:222eb737 tags:
``` python
# Jupyter commands begin with %
%config InlineBackend.print_figure_kwargs={'bbox_inches': None}
```
%% Cell type:code id:e09ef243-ba52-4b70-a980-6ff4735f0fc2 tags:
``` python
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)
# What is the dpi?
# dots (aka pixel) per inch
print(fig.dpi)
# Calculate total width and height of the figure using dpi and dimensions
width = 6 * fig.dpi
height = 4 * fig.dpi
# Calculate (x, y) in the middle of the plot
x = width / 2
y = height / 2
print(x, y)
# Make sure to invoke plt.tight_layout()
# matplotlib does the cropping better than Jupyter
plt.tight_layout()
# Draw a circle at (x, y) with radius 30
# Make sure to set transform=None
c = plt.Circle((x, y), 30, transform=None)
fig.add_artist(c)
# Save the figure to temp.png
fig.savefig("temp.png")
```
%% Cell type:markdown id:744d76e5 tags:
### Mix and match
- `ax.transData.transform((x, y))`: converts axes / data coords into raw (pixel) coordinates
- How to draw an arrow:
`matplotlib.patches.FancyArrowPatch((<x1>, <y1>), (<x2>, <y2>), transform=None, arrowstyle=<STYLE>)`
- arrowstyle="simple,head_width=10,head_length=10"
%% Cell type:code id:db8a296e-6eae-4e57-a94e-b7901c476ae2 tags:
``` python
# GOAL: draw a visual circle at axes / data coords 0.5, 0.5
# with raw co-ordinate radius 30 on right subplot
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 4))
ax2.set_xlim(0, 3)
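# crop now (after .transform, we don't want to move anything!)
plt.tight_layout()
# Transform (0.5, 0.5) from ax2's data coords to raw coordinates
x, y = ax2.transData.transform((0.5, 0.5))
print(x, y)
# Draw a circle at (x, y) with radius 30 and set transform to None
c = plt.Circle((x, y), 30, transform=None)
ax2.add_artist(c)
# GOAL: arrow from 0.2, 0.2 (left) to 2, 0.5 (right)
# Use axes / data coords from one subplot to another subplot
x1, y1 = ax1.transData.transform((0.2, 0.2))
x2, y2 = ax2.transData.transform((2, 0.5))
# arrowstyle="simple,head_width=10,head_length=10"
arrow = matplotlib.patches.FancyArrowPatch((x1, y1), (x2, y2), transform=None,
                                           arrowstyle="simple,head_width=10,head_length=10")
fig.add_artist(arrow)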
```
%% Cell type:markdown id:d2272a0b-f8f1-4bdd-87d5-971aee1ce160 tags:
### Custom Scatter Plots with Angles
%% Cell type:code id:468492f8-0e1e-4a37-b16c-d08e818a1b63 tags:
``` python
df = pd.DataFrame([
{"x":2, "y":5, "a": 90},
{"x":3, "y":1, "a": 0},
{"x":6, "y":6, "a": 45},
{"x":8, "y":1, "a": 180}
])
df
```
%% Cell type:code id:4963466b-acab-468e-bdde-97f9908933a8 tags:
``` python
fig, ax = plt.subplots(figsize=(3, 2))
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
for row in df.itertuples():
    print(row.x, row.y, row.a)
    # v1: draw a circle for each scatter point
    # x, y = ax.transData.transform((row.x, row.y))
    # c = plt.Circle((x,y), radius=10, transform=None)
    # ax.add_artist(c)
    # v2: draw an arrow for each scatter point (correct angle)
    x, y = ax.transData.transform((row.x, row.y))
    # Calculate angle in radians
    a = math.radians(row.a)
    # Calculate the arrow's end point offsets in raw coordinates (arrow length 25)
    x_diff = math.cos(a) * 25
    y_diff = math.sin(a) * 25
    c = matplotlib.patches.FancyArrowPatch((x, y), (x + x_diff, y + y_diff), transform=None, color="k",
                                           arrowstyle="simple,head_width=10,head_length=10")
    ax.add_artist(c)
```
%% Cell type:markdown id:1ddfa239-804f-4c86-b8e9-6f94e7709ab6 tags:
### Plot annotations
- Target plot:
<img src = "Target_plot.png">
%% Cell type:markdown id:dff3b3a7-171f-4dab-9bb4-9c5fc76ca0db tags:
- `ax.text(<x>, <y>, <text>, ha=<someval>, va=<someval>)`
- `ha`: horizontalalignment
- `va`: verticalalignment
- enables us to modify "anchor" of the text
### More patches
- `plt.Line2D((<x1>, <x2>), (<y1>, <y2>))`
- `plt.Rectangle((<x>,<y>), <width>, <height>)`
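One possible way to combine these pieces, sketched under the assumption of the Agg backend and the same small DataFrame used in this lecture; the label positions and the choice to span the current y range are illustrative, not the only option.
```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 8, 9], "B": [5, 7, 12, 15]}, index=[10, 20, 30, 40])
ax = df.plot.line(figsize=(4, 3), legend=False)

# Label each line at its last point instead of using a legend
for col in df.columns:
    ax.text(df.index[-1], df[col].iat[-1], col, ha="left", va="center")

# Vertical dashed line at x=20, spanning the current y range
y_lo, y_hi = ax.get_ylim()
line = plt.Line2D((20, 20), (y_lo, y_hi), color="r", linestyle="--")
ax.add_artist(line)

# Shaded region from x=25 to x=35
rect = plt.Rectangle((25, y_lo), 10, y_hi - y_lo,
                     color="k", zorder=50, alpha=0.2, linewidth=0)
ax.add_artist(rect)
```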
%% Cell type:code id:648f8e12-d357-43ec-b7de-1f8a14b7264b tags:
``` python
plt.rcParams["font.size"] = 16
df = pd.DataFrame({"A": [1,2,8,9], "B": [5,7,12,15]}, index=[10,20,30,40])
ax = df.plot.line(figsize=(4,3))
ax.set_xlabel("Day")
ax.set_ylabel("Amount")
plt.tight_layout()
# Enables us to control borders (aka spines)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
# 1. Replace legend with line labels
# 2. Draw a vertical line at x=20
# color="r", linestyle="--"
# 3. Highlight a region from x=25 to x=35
# color="k", zorder=50, alpha=0.2, linewidth=0
df
```
%% Cell type:markdown id:471a762b tags:
# Visualization 2
%% Cell type:markdown id:2478daaa-cb6c-4d73-92af-01ae91e773fe tags:
### Geographic Data / Maps
#### Installation
```bash
pip3 install --upgrade pip
pip3 install geopandas shapely descartes geopy netaddr
sudo apt install -y python3-rtree
```
- `import geopandas as gpd`
- `.shp` => Shapefile
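- `gpd.datasets.get_path(<dataset name>)`: returns the path of a dataset bundled with geopandas (the `datasets` module was removed in geopandas 1.0)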
- example: `gpd.datasets.get_path("naturalearth_lowres")`
- `gpd.read_file(<path>)`
%% Cell type:code id:e6f50cc3 tags:
``` python
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import math
import requests
import re
import os
# new import statements
import geopandas as gpd
from shapely.geometry import Point, Polygon, box
```
%% Cell type:code id:257671bc-1e79-47c2-aa7a-1725562d23ef tags:
``` python
!ls /home/gurmail.singh/.local/lib/python3.8/site-packages/geopandas/datasets/naturalearth_lowres
```
%% Cell type:code id:e7269706-188a-48e3-882a-ac068170b1af tags:
``` python
!ls /home/gurmail.singh/.local/lib/python3.8/site-packages/geopandas/datasets
```
%% Cell type:code id:273ed288 tags:
``` python
# Find the path for "naturalearth_lowres"
path = gpd.datasets.get_path("naturalearth_lowres")
# Read the shapefile for "naturalearth_lowres" and
# set index using "name" column
gdf = gpd.read_file(path).set_index("name")
```
%% Cell type:code id:cf4d871c-d6d6-4d99-be96-75faf804d4a7 tags:
``` python
gdf.head()
```
%% Cell type:code id:5c983208-7851-4d25-92e7-3f110039411a tags:
``` python
type(gdf).__mro__
```
%% Cell type:code id:653edec9-8b82-4684-ae82-a9fa399459ce tags:
``` python
# All shapefiles have a column called "geometry"
gdf["geometry"]
```
%% Cell type:code id:e2eb071a-b29a-4edd-8747-50b02fd43108 tags:
``` python
type(gdf["geometry"]).__mro__
```
%% Cell type:code id:da5b5783-d78e-460f-bd22-b41e7f24f871 tags:
``` python
# First country's name and geometry
```
%% Cell type:code id:8f4da997-567d-47f6-8594-f0a37b92b9e6 tags:
``` python
# Second country's name and geometry
print(gdf.index[1])
gdf["geometry"].iat[1]
```
%% Cell type:code id:d380b32f-c401-4601-b6d3-02ac344b3f08 tags:
``` python
# Geometry for "United States of America"
# gdf.at[<row_index>, <column_name>]
gdf.at["United States of America", "geometry"]
```
%% Cell type:code id:3528f252-36b7-4ca5-9108-0951367837cc tags:
``` python
# Type of Tanzania's geometry
print(gdf.index[1], type(gdf["geometry"].iat[1]))
# Type of United States of America's geometry
print("United States of America", type(gdf.at["United States of America", "geometry"]))
```
%% Cell type:markdown id:47711b72-a24c-4da8-a804-3faf88a5fe8b tags:
- `gdf.plot(figsize=(<width>, <height>), column=<column name>)`
- `ax.set_axis_off()`
%% Cell type:code id:9874c7de-226e-4296-af60-45e091836a19 tags:
``` python
ax = gdf.plot(figsize=(8,4))
```
%% Cell type:code id:f5547cc5-3404-421e-aeed-4b4cf8ee8382 tags:
``` python
# Set facecolor="0.7", edgecolor="black"
ax = gdf.plot(figsize=(8,4), facecolor="0.7", edgecolor="black")
# Turn off the axes
ax.set_axis_off()
```
%% Cell type:code id:58da716e tags:
``` python
# Color the map based on population column, column="pop_est" and set cmap="cool" and legend=True
ax = gdf.plot(figsize=(8,4), column="pop_est", cmap="cool", legend=True)
# Turn off the axes
ax.set_axis_off()
```
%% Cell type:markdown id:c7a85431-bdab-46ab-8e29-5f91d8a2e0bc tags:
#### Create a map where countries with >100M people are red, others are gray.
%% Cell type:code id:962a552b-94a6-4690-8018-f82173dd6096 tags:
``` python
# Create a map where countries with >100M people are red, others are gray
# Add a new column called color to gdf and set default value to "lightgray"
gdf["color"] = "lightgray"
# Boolean indexing to set color to red for countries with "pop_est" > 1e8
gdf.loc[gdf["pop_est"] > 1e8, "color"] = "red"
# Create the plot
ax = gdf.plot(figsize=(8,4), color=gdf["color"])
ax.set_axis_off()
```
%% Cell type:markdown id:5a32c52c-509f-4f7f-8dd6-bc7fe53c64fd tags:
### All shapefile geometries are shapely shapes.
%% Cell type:code id:33e92def-a7db-4d90-adc8-a3d01d472ac1 tags:
``` python
type(gdf["geometry"].iat[2])
```
%% Cell type:markdown id:39c95c23 tags:
### Shapely shapes
- `from shapely.geometry import Point, Polygon, box`
- `Polygon([(<x1>, <y1>), (<x2>, <y2>), (<x3>, <y3>), ...])`
- `box(minx, miny, maxx, maxy)`
- `Point(<x>, <y>)`
- `<shapely object>.buffer(<size>)`
- example: `Point(5, 5).buffer(3)` creates a circle
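A short sketch of these constructors; the coordinates are arbitrary example values, not the ones used in lecture.
```python
from shapely.geometry import Point, Polygon, box

# Coordinates below are arbitrary example values
triangle = Polygon([(0, 0), (1.2, 1), (2, 0)])
box1 = box(0, 0, 1, 1)        # box() is a helper function; it returns a Polygon
point = Point(5, 5)
circle = point.buffer(1)      # buffering a point gives a polygon approximating a circle

print(type(triangle).__name__, type(box1).__name__,
      type(point).__name__, type(circle).__name__)
print(round(circle.area, 2))
```
Because the buffered point is a polygon with many short edges, its area is slightly smaller than that of a true circle of the same radius.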
%% Cell type:code id:61716db9 tags:
``` python
triangle = # triangle
triangle
```
%% Cell type:code id:6a36021a-8653-4698-a0ba-818c14091d79 tags:
``` python
type(triangle)
```
%% Cell type:code id:bddd958d-6fde-42df-8ae3-459e25fa3f04 tags:
``` python
box1 = # not a type; just a function that creates box
box1
```
%% Cell type:code id:eb7ade8d-96d2-4770-bce9-b3e35996b0b8 tags:
``` python
type(box1)
```
%% Cell type:code id:e5308c61-b1fa-433b-bb05-485ca7bd23da tags:
``` python
point =
point
```
%% Cell type:code id:5c669619-af78-477d-b807-3e6d99278f2f tags:
``` python
type(point)
```
%% Cell type:code id:f015d27e-8fd8-446f-992f-4f7a88cc582d tags:
``` python
# use buffer to create a circle from a point
circle =
circle
```
%% Cell type:code id:39fe5752-8038-46cc-b4a6-320e6a78bbd1 tags:
``` python
type(circle)
```
%% Cell type:code id:800cc48d-c241-4439-9161-ff309e50373f tags:
``` python
triangle_buffer = triangle.buffer(3)
triangle_buffer
```
%% Cell type:code id:6b9478d3-7cc9-4e80-ae84-e56d7bfd0b71 tags:
``` python
type(triangle_buffer)
```
%% Cell type:markdown id:1a340443-d750-43d5-99cd-28e7034ce898 tags:
#### Shapely methods
- `union`: any point that is in either shape (OR)
- `intersection`: any point that is in both shapes (AND)
- `difference`: any point that is in the first shape but not in the second (subtraction)
- `intersects`: do they overlap? returns True / False
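To make the set operations concrete, here is a sketch with two overlapping squares (hypothetical shapes chosen so the areas are easy to check by hand):
```python
from shapely.geometry import box

a = box(0, 0, 2, 2)   # 2x2 square, area 4
b = box(1, 1, 3, 3)   # overlaps a in a 1x1 square

union_area = a.union(b).area                # area covered by either shape
intersection_area = a.intersection(b).area  # area covered by both shapes
difference_area = a.difference(b).area      # part of a not covered by b
overlap = a.intersects(b)                   # True when the shapes share at least one point

print(union_area, intersection_area, difference_area, overlap)
```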
%% Cell type:code id:d1c5f9f7 tags:
``` python
# union triangle and box1
# it will give any point that is in either shape (OR)
triangle.union(box1)
```
%% Cell type:code id:8a2d3357 tags:
``` python
# intersection triangle and box1
# any point that is in both shapes (AND)
triangle.intersection(box1)
```
%% Cell type:code id:f327bb99-be34-4a00-a216-a6f40df48268 tags:
``` python
# difference of triangle and box1
triangle.difference(box1)
```
%% Cell type:code id:9082b54a tags:
``` python
# difference of box1 and triangle
box1.difference(triangle) # subtraction
```
%% Cell type:code id:0def4359-78f6-4f14-80ae-05c25ff64bc5 tags:
``` python
# check whether triangle intersects box1
# that is, do they overlap?
triangle.intersects(box1)
```
%% Cell type:markdown id:51a82cf3-fe28-46d2-a019-d87f6a0f41b6 tags:
Is the point "near" (<6 units) the triangle?
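One way to think about the question, sketched with example coordinates that are assumptions: buffering the point by 6 covers everything within 6 units of it, so an `intersects` check against that buffer should agree with comparing the shortest distance to 6.
```python
from shapely.geometry import Point, Polygon

triangle = Polygon([(0, 0), (1.2, 1), (2, 0)])  # example coordinates
point = Point(5, 5)

# Buffering the point by 6 gives (approximately) every location within 6 units of it
near_by_buffer = triangle.intersects(point.buffer(6))
# Equivalent check on the shortest distance between the two shapes
near_by_distance = triangle.distance(point) < 6
print(near_by_buffer, near_by_distance)
```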
%% Cell type:code id:7a87b70f tags:
``` python
triangle.union(point.buffer(6))
```
%% Cell type:code id:e04daa60 tags:
``` python
triangle.intersects(point.buffer(6))
```
%% Cell type:markdown id:bf500303 tags:
#### Extracting "Europe" data from "naturalearth_lowres"
%% Cell type:code id:410b08cf tags:
``` python
# Europe bounding box
eur_window = box(-10.67, 34.5, 31.55, 71.05)
```
%% Cell type:markdown id:d45fdfcb-b244-4eff-a9d6-5cc4fefdfbd8 tags:
Can we use `intersects` method?
%% Cell type:code id:24f4a32b-9ed4-468b-a194-ebe0395fcdb6 tags:
``` python
gdf.intersects(eur_window)
```
%% Cell type:code id:ff455178-84da-45bf-b5b9-a71a7f9160f7 tags:
``` python
# Incorrect v1
gdf[gdf.intersects(eur_window)].plot()
```
%% Cell type:code id:03aa8da8-72f4-45fd-8931-e410d1204226 tags:
``` python
# Incorrect v2
gdf[~gdf.intersects(eur_window)].plot()
```
%% Cell type:markdown id:4173f3c5-017f-44ff-b713-158219fcf3e5 tags:
Can we use `intersection` method?
%% Cell type:code id:d7226fd9-3bb2-48c3-a5e9-5ce1ca0dd88f tags:
``` python
gdf.intersection(eur_window)
```
%% Cell type:code id:108b8b8a tags:
``` python
gdf.intersection(eur_window).plot()
```
%% Cell type:markdown id:f13e4e16-286a-4681-8bc5-9bd3fcf599a7 tags:
How can we get rid of empty polygons (and remove the warning)?
%% Cell type:code id:8c0dd051-a808-4fd4-b5ff-fee6ed580753 tags:
``` python
eur = gdf.intersection(eur_window)
eur.is_empty
```
%% Cell type:markdown id:acb1b886-f7c8-41d4-979a-2309c2b973bd tags:
Remove all the empty polygons using `is_empty`.
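The same filtering pattern on a tiny, made-up `GeoSeries` (three hypothetical boxes standing in for countries), as a sketch of what `is_empty` and the `~` mask do:
```python
import geopandas as gpd
from shapely.geometry import box

# Three hypothetical "countries"; only the first two overlap the window
shapes = gpd.GeoSeries([box(0, 0, 2, 2), box(1, 1, 3, 3), box(10, 10, 11, 11)],
                       index=["A", "B", "C"])
window = box(0, 0, 2.5, 2.5)

clipped = shapes.intersection(window)   # "C" becomes an empty polygon
print(clipped.is_empty)

clipped = clipped[~clipped.is_empty]    # keep only the non-empty results
print(list(clipped.index))
```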
%% Cell type:code id:55a76a00 tags:
``` python
eur = eur[~eur.is_empty]
eur
```
%% Cell type:code id:08c59df7 tags:
``` python
eur.plot()
```
%% Cell type:markdown id:97a12d23 tags:
#### Centroids of European countries
%% Cell type:code id:eb5c83b7 tags:
``` python
# plot the centroids
ax = eur.plot(facecolor="lightgray", edgecolor="k")
eur.centroid.plot(ax=ax)
```
%% Cell type:markdown id:dbfab5e7-5941-438a-baf2-e05369106004 tags:
### Lat / Lon CRS
- Lon is x-coord
- Lat is y-coord
- tells you where the point on Earth is
- **IMPORTANT**: degrees are not a unit of distance. 1 degree of longitude near the equator covers a much greater distance than 1 degree of longitude near the north pole
Using `.crs` to access CRS of a gdf.
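A back-of-the-envelope sketch of why degrees are not a distance, assuming a spherical Earth with a mean radius of 6371 km (a simplification; the function name is illustrative):
```python
import math

EARTH_RADIUS_KM = 6371  # mean radius; treats the Earth as a sphere (a simplification)

def km_per_degree_of_longitude(lat_degrees):
    """Approximate east-west distance covered by 1 degree of longitude."""
    return math.radians(1) * EARTH_RADIUS_KM * math.cos(math.radians(lat_degrees))

for lat in [0, 43, 80]:   # equator, roughly Madison's latitude, near the north pole
    print(lat, round(km_per_degree_of_longitude(lat), 1))
```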
%% Cell type:code id:89251896 tags:
``` python
eur.crs
```
%% Cell type:markdown id:2a6ba447-6f1a-4d31-8c33-ec1eeb9e0deb tags:
#### Single CRS doesn't work for the whole earth
- Setting a different CRS for Europe that is based on meters.
- https://spatialreference.org/ref/?search=europe
%% Cell type:code id:586038b1 tags:
``` python
# Setting CRS to "EPSG:3035"
eur2 = eur.to_crs("EPSG:3035")
eur2.crs
```
%% Cell type:code id:d46c124c-9aff-4c2f-81fe-08885b606800 tags:
``` python
ax = eur2.plot(facecolor="lightgray", edgecolor="k")
eur2.centroid.plot()
```
%% Cell type:code id:045b9c33 tags:
``` python
ax = eur2.plot(facecolor="lightgray", edgecolor="k")
eur2.centroid.plot(ax=ax)
```
%% Cell type:markdown id:0634941f tags:
#### How much error does lat/long computation introduce?
%% Cell type:code id:c0b72aff tags:
``` python
ax = eur2.plot(facecolor="lightgray", edgecolor="k")
eur2.centroid.plot(ax=ax, color="k") # black => correct
eur.centroid.to_crs("EPSG:3035").plot(ax=ax, color="r") # red => miscalculated
```
%% Cell type:code id:ca9e306e tags:
``` python
type(eur2.iloc[0])
```
%% Cell type:code id:f489c88d-5964-4358-b8c4-fc80d28b5491 tags:
``` python
type(eur2).__mro__
```
%% Cell type:markdown id:85880c73 tags:
#### Area of European countries
%% Cell type:code id:3e4874d9 tags:
``` python
eur2.area # area in sq meters
```
%% Cell type:markdown id:95f55824 tags:
What is the area in **sq miles**?
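The conversion can be spelled out step by step; this sketch uses the rounded factor of 2.59 sq km per sq mile, and the helper name is made up for illustration.
```python
SQ_M_PER_SQ_KM = 1000 * 1000
SQ_KM_PER_SQ_MILE = 2.59   # 1 mile is about 1.609 km, and 1.609 ** 2 is about 2.59

def sq_meters_to_sq_miles(area_m2):
    """Convert an area in square meters to (approximate) square miles."""
    return area_m2 / SQ_M_PER_SQ_KM / SQ_KM_PER_SQ_MILE

# 2.59 million sq meters is 2.59 sq km, which is about 1 sq mile
print(sq_meters_to_sq_miles(2.59e6))
```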
%% Cell type:code id:85ee20c2 tags:
``` python
# Conversion: / 1000 / 1000 / 2.59
(eur2.area / 1000 / 1000 / 2.59).sort_values(ascending=False)
# careful! some countries (e.g., Russia) were cropped when we did intersection
```
%% Cell type:code id:cd600837 tags:
``` python
# area in square degrees (lat/lon CRS), not real area
eur.area
```
%% Cell type:markdown id:daf1245f-9939-468d-9a5f-34225c2dbc51 tags:
### CRS
- `<GeoDataFrame object>.crs`: gives you information about current CRS.
- `<GeoDataFrame object>.to_crs(<TARGET CRS>)`: returns a copy converted to `<TARGET CRS>`.
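A small sketch of both calls on a one-point `GeoSeries`; the point is an approximate location for Madison, WI and the target CRS (UTM zone 16N) is an assumption chosen for illustration.
```python
import geopandas as gpd
from shapely.geometry import Point

# One point in lon/lat degrees (EPSG:4326); the coordinates are roughly Madison, WI
pts = gpd.GeoSeries([Point(-89.39, 43.08)], crs="EPSG:4326")
print(pts.crs.to_epsg())

# to_crs returns a new object; the original keeps its CRS
pts_m = pts.to_crs("EPSG:32616")   # UTM zone 16N, coordinates in meters
print(pts_m.crs.to_epsg(), pts_m.iat[0])
```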
%% Cell type:markdown id:9da3ee4c tags:
### Madison area emergency services
- Data source: https://data-cityofmadison.opendata.arcgis.com/
- Search for:
- "City limit"
- "Lakes and rivers"
- "Fire stations"
- "Police stations"
- CRS for Madison area: https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system#/media/File:Universal_Transverse_Mercator_zones.svg
%% Cell type:code id:a6f80847 tags:
``` python
city = gpd.read_file("City_Limit.zip").to_crs("epsg:32616")
```
%% Cell type:code id:7f8595be tags:
``` python
city.crs
```
%% Cell type:code id:9ebd361f tags:
``` python
water = gpd.read_file("Lakes_and_Rivers.zip").to_crs(city.crs)
fire = gpd.read_file("Fire_Stations.zip").to_crs(city.crs)
police = gpd.read_file("Police_Stations.zip").to_crs(city.crs)
```
%% Cell type:markdown id:723e2bff-545f-4a8c-9f72-8a9906a99b1b tags:
#### Run this on your virtual machine
`sudo sh -c "echo 'Options = UnsafeLegacyRenegotiation' >> /etc/ssl/openssl.cnf"`
then restart notebook!
%% Cell type:markdown id:b6069860-7dd9-43f2-970e-fd15980135ef tags:
#### GeoJSON
How to find the below URL?
- Go to info page of a dataset, for example: https://data-cityofmadison.opendata.arcgis.com/datasets/police-stations/explore?location=43.081769%2C-89.391550%2C12.81
- Then click on "I want to use this" > "View API Resources" > "GeoJSON"
%% Cell type:code id:3a095d5e tags:
``` python
url = "https://maps.cityofmadison.com/arcgis/rest/services/Public/OPEN_DATA/MapServer/2/query?outFields=*&where=1%3D1&f=geojson"
police2 = gpd.read_file(url).to_crs(city.crs)
```
%% Cell type:code id:248be81e tags:
``` python
ax = city.plot(color="lightgray")
water.plot(color="lightblue", ax=ax)
fire.plot(color="red", ax=ax, marker="+", label="Fire")
police2.plot(color="blue", ax=ax, label="Police")
ax.legend(loc="upper left", frameon=False)
ax.set_axis_off()
```
%% Cell type:code id:3a609d81 tags:
``` python
fire.to_file("fire.geojson")
```
%% Cell type:markdown id:77600884 tags:
### Geocoding: street address => lat / lon
- `gpd.tools.geocode(<street address>, provider=<geocoding service name>, user_agent=<user agent name>)`: converts street address into lat/long
#### Daily incident reports: https://www.cityofmadison.com/fire/daily-reports
%% Cell type:code id:6b0b2aa0 tags:
``` python
url = "https://www.cityofmadison.com/fire/daily-reports"
r = requests.get(url)
r
```
%% Cell type:code id:bee28b41 tags:
``` python
r.raise_for_status() # give me an exception if not 200 (e.g., 404)
```
%% Cell type:code id:0bae00e7 tags:
``` python
# doesn't work
# pd.read_html(url)
```
%% Cell type:code id:47173ec2 tags:
``` python
# print(r.text)
```
%% Cell type:markdown id:39f166c5 tags:
Find all street addresses using regex. In the page's HTML, each address sits between `<p>` and `<br>`, as in the sample line below.
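A sketch of the extraction on a made-up HTML snippet, assuming every address is the text between `<p>` and `<br>`; the real page may need a stricter pattern.
```python
import re

# Made-up snippet shaped like a line from the page; not real report data
html = """
<p>1700 block Thierer Road<br>Some details</p>
<p>800 block W. Johnson Street<br>More details</p>
"""

# Capture the text between <p> and <br>; [^<]+ stops at the next tag
addrs = re.findall(r"<p>([^<]+)<br>", html)
print(addrs)
```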
%% Cell type:code id:ac7b9482-5512-49e4-b0b8-7f9ed34b5844 tags:
``` python
# <p>1700 block Thierer Road<br>
# addrs = re.findall(r'<p>1700 block Thierer Road<br>', r.text)
```
%% Cell type:code id:8e9b49d2-3e0d-4a39-9dcf-13b288a165e7 tags:
``` python
addrs = re.findall(r'', r.text)
addrs = pd.Series(addrs)
addrs
```
%% Cell type:markdown id:4fc8ba0c-9d45-4251-8055-1310cabf15a9 tags:
#### Without a city name and state name, geocoding returns the best-known location that has such a street name.
%% Cell type:code id:095b9ebe-b583-4098-a3a2-47dd46610d59 tags:
``` python
geo_info = gpd.tools.geocode("1300 East Washington Ave")
geo_info
```
%% Cell type:code id:e52ee0fc-d73c-4942-9bec-d1586a702f68 tags:
``` python
geo_info["address"].loc[0]
```
%% Cell type:markdown id:e80b714c-a962-4a47-b7cc-3a19d0da8ac0 tags:
#### To get the correct address we want, we should concatenate "Madison, Wisconsin" to the end of the address.
%% Cell type:code id:cf3f590b-8d00-46c4-ac57-6f2f61509f68 tags:
``` python
geo_info = gpd.tools.geocode("1300 East Washington Ave, Madison, Wisconsin")
geo_info
```
%% Cell type:markdown id:3caf8c12-67a3-4e19-a758-d556d010eead tags:
#### Addresses with "block" often won't work or won't give you the correct lat/long. We need to remove the word "block" before geocoding.
%% Cell type:code id:54103e4a-5ac2-4f19-811b-385c02623ed2 tags:
``` python
gpd.tools.geocode("800 block W. Johnson Street, Madison, Wisconsin")
```
%% Cell type:code id:66072f4b-2286-4d66-ba2c-18ccd2e37ed6 tags:
``` python
gpd.tools.geocode("800 W. Johnson Street, Madison, Wisconsin")
```
%% Cell type:code id:cf982302 tags:
``` python
fixed_addrs = addrs.str.replace(" block ", " ") + ", Madison, WI"
fixed_addrs
```
%% Cell type:markdown id:d6a267ed-6876-4866-a66f-74390a3d4ee1 tags:
#### Using a different provider than the default one
- `gpd.tools.geocode(<street address>, provider=<geocoding service name>, user_agent=<user agent name>)`: converts street address into lat/long
- We will be using "OpenStreetMap", for which the argument is "nominatim"
- We also need to pass an argument to the `user_agent` parameter, indicating where the request is coming from; for example: "cs320bot"
%% Cell type:code id:ab0e699f tags:
``` python
incidents = gpd.tools.geocode(fixed_addrs, provider="nominatim", user_agent="cs320bot")
incidents
```
%% Cell type:markdown id:4ad73b2c-5171-45c2-a3f0-eef30f93c492 tags:
It is often a good idea to drop NA values, although in this version of the example there are no failed geocodings.
%% Cell type:code id:41a7d12e-73be-4442-b43c-285229e6cfdb tags:
``` python
incidents = incidents.dropna()
incidents
```
%% Cell type:markdown id:b30733e6-9491-4be0-bf20-bba64903334d tags:
#### Self-practice
If you want practice with regex, try to write a regular expression and use the match result to make sure that "Madison" and "Wisconsin" are part of each address.
%% Cell type:code id:843bbba2-3de5-4cc0-b76b-92e7bebdd379 tags:
``` python
# self-practice
for addr in incidents["address"]:
    print(addr)
```
%% Cell type:code id:1a04c2b0 tags:
``` python
ax = city.plot(color="lightgray")
water.plot(color="lightblue", ax=ax)
fire.plot(color="red", ax=ax, marker="+", label="Fire")
police2.plot(color="blue", ax=ax, label="Police")
incidents.to_crs(city.crs).plot(ax=ax, color="k", label="Incidents")
ax.legend(loc="upper left", frameon=False)
ax.set_axis_off()
```
%% Cell type:markdown id:471a762b tags:
# Visualization 2
%% Cell type:markdown id:2478daaa-cb6c-4d73-92af-01ae91e773fe tags:
### Geographic Data / Maps
#### Installation
```python
pip3 install --upgrade pip
pip3 install geopandas shapely descartes geopy netaddr
sudo apt install -y python3-rtree
```
- `import geopandas as gpd`
- `.shp` => Shapefile
- `gpd.datasets.get_path(<shp file path>)`:
- example: `gpd.datasets.get_path("naturalearth_lowres")`
- `gpd.read_file(<path>)`
%% Cell type:code id:e6f50cc3 tags:
``` python
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import math
import requests
import re
import os
# new import statements
import geopandas as gpd
from shapely.geometry import Point, Polygon, box
```
%% Cell type:code id:257671bc-1e79-47c2-aa7a-1725562d23ef tags:
``` python
!ls /home/gurmail.singh/.local/lib/python3.8/site-packages/geopandas/datasets/naturalearth_lowres
```
%% Cell type:code id:e7269706-188a-48e3-882a-ac068170b1af tags:
``` python
!ls /home/gurmail.singh/.local/lib/python3.8/site-packages/geopandas/datasets
```
%% Cell type:code id:273ed288 tags:
``` python
# Find the path for "naturalearth_lowres"
path =
# Read the shapefile for "naturalearth_lowres" and
# set index using "name" column
gdf =
```
%% Cell type:code id:cf4d871c-d6d6-4d99-be96-75faf804d4a7 tags:
``` python
gdf.head()
```
%% Cell type:code id:5c983208-7851-4d25-92e7-3f110039411a tags:
``` python
type(gdf).__mro__
```
%% Cell type:code id:653edec9-8b82-4684-ae82-a9fa399459ce tags:
``` python
# All shapefiles have a column called "geometry"
gdf["geometry"]
```
%% Cell type:code id:e2eb071a-b29a-4edd-8747-50b02fd43108 tags:
``` python
type(gdf["geometry"]).__mro__
```
%% Cell type:code id:da5b5783-d78e-460f-bd22-b41e7f24f871 tags:
``` python
# First country's name and geometry
```
%% Cell type:code id:8f4da997-567d-47f6-8594-f0a37b92b9e6 tags:
``` python
# Second country's name geometry
print(gdf.index[1])
gdf["geometry"].iat[1]
```
%% Cell type:code id:d380b32f-c401-4601-b6d3-02ac344b3f08 tags:
``` python
# Geometry for "United States of America"
# gdf.at[<row_index>, <column_name>]
```
%% Cell type:code id:3528f252-36b7-4ca5-9108-0951367837cc tags:
``` python
# Type of Tanzania's geometry
print(gdf.index[1], type(gdf["geometry"].iat[1]))
# Type of United States of America's geometry
print("United States of America", type(gdf.at["United States of America", "geometry"]))
```
%% Cell type:markdown id:47711b72-a24c-4da8-a804-3faf88a5fe8b tags:
- `gdf.plot(figsize=(<width>, <height>), column=<column name>)`
- `ax.set_axis_off()`
%% Cell type:code id:9874c7de-226e-4296-af60-45e091836a19 tags:
``` python
ax = gdf.plot(figsize=(8,4))
```
%% Cell type:code id:f5547cc5-3404-421e-aeed-4b4cf8ee8382 tags:
``` python
# Set facecolor="0.7", edgecolor="black"
ax = gdf.plot(figsize=(8,4))
# Turn off the axes
# ax.set_axis_off()
```
%% Cell type:code id:58da716e tags:
``` python
# Color the map based on population column, column="pop_est" and set cmap="cool" and legend=True
ax = gdf.plot(figsize=(8,4))
# Turn off the axes
ax.set_axis_off()
```
%% Cell type:markdown id:c7a85431-bdab-46ab-8e29-5f91d8a2e0bc tags:
#### Create a map where countries with >100M people are red, others are gray.
%% Cell type:code id:962a552b-94a6-4690-8018-f82173dd6096 tags:
``` python
# Create a map where countries with >100M people are red, others are gray
# Add a new column called color to gdf and set default value to "lightgray"
# Boolean indexing to set color to red for countries with "pop_est" > 1e8
# Create the plot
# ax = gdf.plot(figsize=(8,4), color=gdf["color"])
# ax.set_axis_off()
```
%% Cell type:markdown id:5a32c52c-509f-4f7f-8dd6-bc7fe53c64fd tags:
### All shapefile geometries are shapely shapes.
%% Cell type:code id:33e92def-a7db-4d90-adc8-a3d01d472ac1 tags:
``` python
type(gdf["geometry"].iat[2])
```
%% Cell type:markdown id:39c95c23 tags:
### Shapely shapes
- `from shapely.geometry import Point, Polygon, box`
- `Polygon([(<x1>, <y1>), (<x2>, <y2>), (<x3>, <y3>), ...])`
- `box(minx, miny, maxx, maxy)`
- `Point(<x>, <y>)`
- `<shapely object>.buffer(<size>)`
- example: `Point(5, 5).buffer(3)` creates a circle
%% Cell type:code id:61716db9 tags:
``` python
triangle = # triangle
triangle
```
%% Cell type:code id:6a36021a-8653-4698-a0ba-818c14091d79 tags:
``` python
type(triangle)
```
%% Cell type:code id:bddd958d-6fde-42df-8ae3-459e25fa3f04 tags:
``` python
box1 = # not a type; just a function that creates box
box1
```
%% Cell type:code id:eb7ade8d-96d2-4770-bce9-b3e35996b0b8 tags:
``` python
type(box1)
```
%% Cell type:code id:e5308c61-b1fa-433b-bb05-485ca7bd23da tags:
``` python
point =
point
```
%% Cell type:code id:5c669619-af78-477d-b807-3e6d99278f2f tags:
``` python
type(point)
```
%% Cell type:code id:f015d27e-8fd8-446f-992f-4f7a88cc582d tags:
``` python
# use buffer to create a circle from a point
circle =
circle
```
%% Cell type:code id:39fe5752-8038-46cc-b4a6-320e6a78bbd1 tags:
``` python
type(circle)
```
%% Cell type:code id:800cc48d-c241-4439-9161-ff309e50373f tags:
``` python
triangle_buffer = triangle.buffer(3)
triangle_buffer
```
%% Cell type:code id:6b9478d3-7cc9-4e80-ae84-e56d7bfd0b71 tags:
``` python
type(triangle_buffer)
```
%% Cell type:markdown id:1a340443-d750-43d5-99cd-28e7034ce898 tags:
#### Shapely methods
- Shapely methods:
- `union`: any point that is in either shape (OR)
- `intersection`: any point that is in both shapes (AND)
- `difference`: subtraction
- `intersects`: do they overlap? returns True / False
%% Cell type:code id:d1c5f9f7 tags:
``` python
# union triangle and box1
# it will give any point that is in either shape (OR)
```
%% Cell type:code id:8a2d3357 tags:
``` python
# intersection triangle and box1
# any point that is in both shapes (AND)
```
%% Cell type:code id:f327bb99-be34-4a00-a216-a6f40df48268 tags:
``` python
# difference of triangle and box1
```
%% Cell type:code id:9082b54a tags:
``` python
# difference of box1 and triangle
box1.difference(triangle) # subtraction
```
%% Cell type:code id:0def4359-78f6-4f14-80ae-05c25ff64bc5 tags:
``` python
# check whether triangle intersects box1
# the is, check do they overlap?
```
%% Cell type:markdown id:51a82cf3-fe28-46d2-a019-d87f6a0f41b6 tags:
Is the point "near" (<6 units) the triangle?
%% Cell type:code id:7a87b70f tags:
``` python
triangle.union(point.buffer(6))
```
%% Cell type:code id:e04daa60 tags:
``` python
triangle.intersects(point.buffer(6))
```
%% Cell type:markdown id:bf500303 tags:
#### Extracting "Europe" data from "naturalearth_lowres"
%% Cell type:code id:410b08cf tags:
``` python
# Europe bounding box
eur_window = box(-10.67, 34.5, 31.55, 71.05)
```
%% Cell type:markdown id:d45fdfcb-b244-4eff-a9d6-5cc4fefdfbd8 tags:
Can we use `intersects` method?
%% Cell type:code id:24f4a32b-9ed4-468b-a194-ebe0395fcdb6 tags:
``` python
gdf.intersects(eur_window)
```
%% Cell type:code id:ff455178-84da-45bf-b5b9-a71a7f9160f7 tags:
``` python
# Incorrect v1
gdf[gdf.intersects(eur_window)].plot()
```
%% Cell type:code id:03aa8da8-72f4-45fd-8931-e410d1204226 tags:
``` python
# Incorrect v2
gdf[~gdf.intersects(eur_window)].plot()
```
%% Cell type:markdown id:4173f3c5-017f-44ff-b713-158219fcf3e5 tags:
Can we use `intersection` method?
%% Cell type:code id:d7226fd9-3bb2-48c3-a5e9-5ce1ca0dd88f tags:
``` python
gdf.intersection(eur_window)
```
%% Cell type:code id:108b8b8a tags:
``` python
gdf.intersection(eur_window).plot()
```
%% Cell type:markdown id:f13e4e16-286a-4681-8bc5-9bd3fcf599a7 tags:
How can we get rid of empty polygons (and remove the warning)?
%% Cell type:code id:8c0dd051-a808-4fd4-b5ff-fee6ed580753 tags:
``` python
eur = gdf.intersection(eur_window)
eur.is_empty
```
%% Cell type:markdown id:acb1b886-f7c8-41d4-979a-2309c2b973bd tags:
Remove all the empty polygons using `is_empty`.
%% Cell type:code id:55a76a00 tags:
``` python
eur = eur[~eur.is_empty]
eur
```
%% Cell type:code id:08c59df7 tags:
``` python
eur.plot()
```
%% Cell type:markdown id:97a12d23 tags:
#### Centroids of European countries
%% Cell type:code id:eb5c83b7 tags:
``` python
# plot the centroids
ax = eur.plot(facecolor="lightgray", edgecolor="k")
eur.centroid.plot(ax=ax)
```
%% Cell type:markdown id:dbfab5e7-5941-438a-baf2-e05369106004 tags:
### Lat / Lon CRS
- Lon is x-coord
- Lat is y-coord
- tells you where the point on Earth is
- **IMPORTANT**: degrees are not a unit of distance. 1 degree of longitute near the equator is a lot farther than moving 1 degree of longitute near the north pole
Use `.crs` to access the CRS of a `GeoDataFrame` or `GeoSeries`.
%% Cell type:code id:89251896 tags:
``` python
eur.crs
```
%% Cell type:markdown id:2a6ba447-6f1a-4d31-8c33-ec1eeb9e0deb tags:
#### A single CRS doesn't work well for the whole Earth
- We switch to a CRS for Europe whose units are meters.
- https://spatialreference.org/ref/?search=europe
%% Cell type:code id:586038b1 tags:
``` python
# Converting (reprojecting) to "EPSG:3035"
eur2 = eur.to_crs("EPSG:3035")
eur2.crs
```
%% Cell type:code id:d46c124c-9aff-4c2f-81fe-08885b606800 tags:
``` python
# Without ax=ax, the centroids are drawn on a separate figure
ax = eur2.plot(facecolor="lightgray", edgecolor="k")
eur2.centroid.plot()
```
%% Cell type:code id:045b9c33 tags:
``` python
# Passing ax=ax draws the centroids on top of the country shapes
ax = eur2.plot(facecolor="lightgray", edgecolor="k")
eur2.centroid.plot(ax=ax)
```
%% Cell type:markdown id:0634941f tags:
#### How much error does lat/long computation introduce?
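To get a feel for the size of the error without geopandas, here is a small pure-Python sketch. It makes several assumptions: a made-up region bounded by 0 to 20 degrees east and 50 to 70 degrees north, a spherical Earth, and a sinusoidal equal-area projection standing in for EPSG:3035 (which is a different projection). It compares a centroid computed in degrees and then projected against a centroid computed from the projected coordinates.

```python
import math

R_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def project(lon_deg, lat_deg):
    """Sinusoidal equal-area projection: degrees -> kilometers (stand-in for EPSG:3035)."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    return R_KM * lam * math.cos(phi), R_KM * phi

def centroid(pts):
    """Area-weighted centroid of a simple polygon, using the shoelace formula."""
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * twice_area), cy / (3 * twice_area)

# Made-up region given as (lon, lat) corners, counterclockwise
region_deg = [(0, 50), (20, 50), (20, 70), (0, 70)]

# Project first, then take the centroid (like eur2.centroid)
correct = centroid([project(lon, lat) for lon, lat in region_deg])
# Take the centroid in degrees, then project (like eur.centroid.to_crs(...))
naive = project(*centroid(region_deg))

error_km = math.hypot(naive[0] - correct[0], naive[1] - correct[1])
print(f"centroid placement error: {error_km:.0f} km")
```

In degrees the region looks like a rectangle, but on the ground its northern edge is much shorter than its southern edge, so the degree-based centroid lands too far north.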
%% Cell type:code id:c0b72aff tags:
``` python
ax = eur2.plot(facecolor="lightgray", edgecolor="k")
eur2.centroid.plot(ax=ax, color="k") # black => correct
eur.centroid.to_crs("EPSG:3035").plot(ax=ax, color="r") # red => miscalculated
```
%% Cell type:code id:ca9e306e tags:
``` python
type(eur2.iloc[0])
```
%% Cell type:code id:f489c88d-5964-4358-b8c4-fc80d28b5491 tags:
``` python
type(eur2).__mro__
```
%% Cell type:markdown id:85880c73 tags:
#### Area of European countries
%% Cell type:code id:3e4874d9 tags:
``` python
eur2.area # area in sq meters
```
%% Cell type:markdown id:95f55824 tags:
What is the area in **sq miles**?
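The conversion factor can be checked from the definition of the mile. A minimal sketch, assuming the international mile of exactly 1609.344 meters; the constant and function names are made up for this example.

```python
SQ_M_PER_SQ_KM = 1000 * 1000
METERS_PER_MILE = 1609.344                          # international mile
SQ_KM_PER_SQ_MILE = (METERS_PER_MILE / 1000) ** 2   # about 2.59

def sq_meters_to_sq_miles(area_m2):
    """Convert an area in square meters to square miles."""
    return area_m2 / SQ_M_PER_SQ_KM / SQ_KM_PER_SQ_MILE

print(round(SQ_KM_PER_SQ_MILE, 4))
print(sq_meters_to_sq_miles(1e9))  # 1000 sq km expressed in sq miles
```

Dividing by 1000 twice turns square meters into square kilometers, and dividing by roughly 2.59 turns square kilometers into square miles.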
%% Cell type:code id:85ee20c2 tags:
``` python
# Conversion: sq m -> sq km (/ 1000 / 1000), then sq km -> sq miles (/ 2.59)
(eur2.area / 1000 / 1000 / 2.59).sort_values(ascending=False)
# careful! some countries (e.g., Russia) were cropped when we did the intersection
```
%% Cell type:code id:cd600837 tags:
``` python
# area in squared degrees (lat/lon CRS): area on screen, not real area
eur.area
```
%% Cell type:markdown id:daf1245f-9939-468d-9a5f-34225c2dbc51 tags:
### CRS
- `<GeoDataFrame object>.crs`: gives you information about the current CRS.
- `<GeoDataFrame object>.to_crs(<TARGET CRS>)`: returns a copy of the data converted to `<TARGET CRS>`.
%% Cell type:markdown id:9da3ee4c tags:
### Madison area emergency services
- Data source: https://data-cityofmadison.opendata.arcgis.com/
- Search for:
- "City limit"
- "Lakes and rivers"
- "Fire stations"
- "Police stations"
- CRS for Madison area (UTM zone 16N, which is EPSG:32616): https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system#/media/File:Universal_Transverse_Mercator_zones.svg
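Instead of reading the zone off the map, the UTM zone can also be computed from a longitude. The sketch below uses the standard 6-degree zone layout and ignores the special-case zones around Norway and Svalbard. The function names are made up for this example, and the Madison coordinates are approximate.

```python
import math

def utm_zone(lon_deg):
    """UTM zone number (1-60) for a longitude in degrees (standard 6-degree zones only)."""
    return int(math.floor((lon_deg + 180) / 6)) % 60 + 1

def utm_epsg(lon_deg, lat_deg):
    """EPSG code for WGS 84 / UTM: 326xx for the northern hemisphere, 327xx for the southern."""
    base = 32600 if lat_deg >= 0 else 32700
    return base + utm_zone(lon_deg)

madison_lat, madison_lon = 43.081769, -89.391550  # approximate location of Madison, WI

print(utm_zone(madison_lon))
print(utm_epsg(madison_lon, madison_lat))
```

For Madison this gives zone 16 in the northern hemisphere, which corresponds to the `"epsg:32616"` code used for the city data.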
%% Cell type:code id:a6f80847 tags:
``` python
city = gpd.read_file("City_Limit.zip").to_crs("epsg:32616")
```
%% Cell type:code id:7f8595be tags:
``` python
city.crs
```
%% Cell type:code id:9ebd361f tags:
``` python
water = gpd.read_file("Lakes_and_Rivers.zip").to_crs(city.crs)
fire = gpd.read_file("Fire_Stations.zip").to_crs(city.crs)
police = gpd.read_file("Police_Stations.zip").to_crs(city.crs)
```
%% Cell type:markdown id:723e2bff-545f-4a8c-9f72-8a9906a99b1b tags:
#### Run this on your virtual machine
`sudo sh -c "echo 'Options = UnsafeLegacyRenegotiation' >> /etc/ssl/openssl.cnf"`
then restart notebook!
%% Cell type:markdown id:b6069860-7dd9-43f2-970e-fd15980135ef tags:
#### GeoJSON
How do we find the URL used below?
- Go to info page of a dataset, for example: https://data-cityofmadison.opendata.arcgis.com/datasets/police-stations/explore?location=43.081769%2C-89.391550%2C12.81
- Then click on "I want to use this" > "View API Resources" > "GeoJSON"
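The GeoJSON link obtained this way is an ordinary query URL, so it can also be assembled and inspected with the standard library. The sketch below rebuilds the police stations URL from its parts. The comments on each parameter reflect how ArcGIS REST query parameters are generally documented and should be treated as an assumption here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = "https://maps.cityofmadison.com/arcgis/rest/services/Public/OPEN_DATA/MapServer/2/query"
params = {
    "outFields": "*",   # return every attribute column
    "where": "1=1",     # a filter that is always true, so every row is returned
    "f": "geojson",     # response format
}

# safe="*" keeps the asterisk unescaped; "=" inside a value becomes %3D
url = base + "?" + urlencode(params, safe="*")
print(url)

# Going the other way: decode the query string back into a dict
print(parse_qs(urlsplit(url).query))
```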
%% Cell type:code id:3a095d5e tags:
``` python
url = "https://maps.cityofmadison.com/arcgis/rest/services/Public/OPEN_DATA/MapServer/2/query?outFields=*&where=1%3D1&f=geojson"
police2 = gpd.read_file(url).to_crs(city.crs)
```
%% Cell type:code id:248be81e tags:
``` python
ax = city.plot(color="lightgray")
water.plot(color="lightblue", ax=ax)
fire.plot(color="red", ax=ax, marker="+", label="Fire")
police2.plot(color="blue", ax=ax, label="Police")
ax.legend(loc="upper left", frameon=False)
ax.set_axis_off()
```
%% Cell type:code id:3a609d81 tags:
``` python
fire.to_file("fire.geojson")
```
%% Cell type:markdown id:77600884 tags:
### Geocoding: street address => lat / lon
- `gpd.tools.geocode(<street address>, provider=<geocoding service name>, user_agent=<user agent name>)`: converts street address into lat/long
#### Daily incident reports: https://www.cityofmadison.com/fire/daily-reports
%% Cell type:code id:6b0b2aa0 tags:
``` python
url = "https://www.cityofmadison.com/fire/daily-reports"
r = requests.get(url)
r
```
%% Cell type:code id:bee28b41 tags:
``` python
r.raise_for_status() # give me an exception if not 200 (e.g., 404)
```
%% Cell type:code id:0bae00e7 tags:
``` python
# doesn't work
# pd.read_html(url)
```
%% Cell type:code id:47173ec2 tags:
``` python
# print(r.text)
```
%% Cell type:markdown id:39f166c5 tags:
Find all street addresses in the page text using regex. In the sample line below, the address sits between `<p>` and `<br>`.
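As a warm-up, here is a minimal sketch of address extraction with `re.findall` on a made-up HTML fragment. The fragment is modeled on the sample line `<p>1700 block Thierer Road<br>` from the daily reports page; the real markup may differ, so the pattern is an assumption that may need adjusting.

```python
import re

# Made-up HTML fragment; only the shape of the first line comes from the real page
html = """
<p>1700 block Thierer Road<br>
Some incident description</p>
<p>800 block W. Johnson Street<br>
Another incident description</p>
<p>No address in this paragraph</p>
"""

# <p>, then a street number, a space, and everything up to the next tag, then <br>
pattern = r"<p>(\d+ [^<]+)<br>"

# With one capture group, findall returns just the captured text
sample_addrs = re.findall(pattern, html)
print(sample_addrs)
```

The parentheses form a capture group, so `re.findall` returns only the address text rather than the surrounding tags.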
%% Cell type:code id:ac7b9482-5512-49e4-b0b8-7f9ed34b5844 tags:
``` python
# <p>1700 block Thierer Road<br>
# addrs = re.findall(r'<p>1700 block Thierer Road<br>', r.text)
```
%% Cell type:code id:8e9b49d2-3e0d-4a39-9dcf-13b288a165e7 tags:
``` python
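# assumed pattern, based on the sample line above; adjust if the page markup differs
addrs = re.findall(r'<p>(\d+ [^<]+)<br>', r.text)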
addrs = pd.Series(addrs)
addrs
```
%% Cell type:markdown id:4fc8ba0c-9d45-4251-8055-1310cabf15a9 tags:
#### Without a city and state name, geocoding tends to return the best-known location that has such a street name.
%% Cell type:code id:095b9ebe-b583-4098-a3a2-47dd46610d59 tags:
``` python
geo_info = gpd.tools.geocode("1300 East Washington Ave")
geo_info
```
%% Cell type:code id:e52ee0fc-d73c-4942-9bec-d1586a702f68 tags:
``` python
geo_info["address"].loc[0]
```
%% Cell type:markdown id:e80b714c-a962-4a47-b7cc-3a19d0da8ac0 tags:
#### To get the address we want, we should append "Madison, Wisconsin" to the end of the address.
%% Cell type:code id:cf3f590b-8d00-46c4-ac57-6f2f61509f68 tags:
``` python
geo_info = gpd.tools.geocode("1300 East Washington Ave, Madison, Wisconsin")
geo_info
```
%% Cell type:markdown id:3caf8c12-67a3-4e19-a758-d556d010eead tags:
#### Addresses with "block" often won't work or won't give you the correct lat/long. We need to remove the word "block" before geocoding.
%% Cell type:code id:54103e4a-5ac2-4f19-811b-385c02623ed2 tags:
``` python
gpd.tools.geocode("800 block W. Johnson Street, Madison, Wisconsin")
```
%% Cell type:code id:66072f4b-2286-4d66-ba2c-18ccd2e37ed6 tags:
``` python
gpd.tools.geocode("800 W. Johnson Street, Madison, Wisconsin")
```
%% Cell type:code id:cf982302 tags:
``` python
fixed_addrs = addrs.str.replace(" block ", " ") + ", Madison, WI"
fixed_addrs
```
%% Cell type:markdown id:d6a267ed-6876-4866-a66f-74390a3d4ee1 tags:
#### Using a different provider than the default one
- `gpd.tools.geocode(<street address>, provider=<geocoding service name>, user_agent=<user agent name>)`: converts street address into lat/long
- We will use OpenStreetMap, for which the `provider` argument is "nominatim".
- We also need to pass a `user_agent` argument that identifies where the request is coming from, for example "cs320bot".
%% Cell type:code id:ab0e699f tags:
``` python
incidents = gpd.tools.geocode(fixed_addrs, provider="nominatim", user_agent="cs320bot")
incidents
```
%% Cell type:markdown id:4ad73b2c-5171-45c2-a3f0-eef30f93c492 tags:
It is often a good idea to drop NA values, although in this version of the example there are no failed geocodings.
%% Cell type:code id:41a7d12e-73be-4442-b43c-285229e6cfdb tags:
``` python
incidents = incidents.dropna()
incidents
```
%% Cell type:markdown id:b30733e6-9491-4be0-bf20-bba64903334d tags:
#### Self-practice
If you want practice with regex, try to write a regular expression and use the match result to make sure that "Madison" and "Wisconsin" are part of each address.
%% Cell type:code id:843bbba2-3de5-4cc0-b76b-92e7bebdd379 tags:
``` python
# self-practice
for addr in incidents["address"]:
    print(addr)
```
%% Cell type:code id:1a04c2b0 tags:
``` python
ax = city.plot(color="lightgray")
water.plot(color="lightblue", ax=ax)
fire.plot(color="red", ax=ax, marker="+", label="Fire")
police2.plot(color="blue", ax=ax, label="Police")
incidents.to_crs(city.crs).plot(ax=ax, color="k", label="Incidents")
ax.legend(loc="upper left", frameon=False)
ax.set_axis_off()
```