Commit df587723 authored by gsingh58's avatar gsingh58

lec12 & 13 updated

parent 26470de5
Pipeline #795906 passed
Showing with 118 additions and 40 deletions
%% Cell type:markdown id:cf313adf tags:
# Web 2: Flask
%% Cell type:code id:d55e4bb4-9f29-4f4f-bba6-05054718259b tags:
``` python
import requests
import time
import urllib.robotparser
```
%% Cell type:markdown id:527600aa tags:
### Rate-limited webpage parsing
- `requests` module:
- `resp = requests.get(<URL>)`: sends an HTTP GET request and returns the response object
- `resp.status_code`: status code of the response
- `resp.text`: `str` text content of the response
- `resp.headers`: `dict` content of response headers
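- The `@` syntax applies a "decorator": a function that takes the function defined below it and wraps or registers it (for example, `@app.route("/")` registers a URL handler)
- `flask.Response`: enables us to create a response object instance
- Arguments: `str` representing the response body, `headers` dict representing metadata, `status` representing the status code.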
- ex:
```python
flask.Response("<b>go away</b>",
status=429,
headers={"Retry-After": "3"})
```
```python
flask.Response("""User-Agent: *
Disallow: /never
""", headers={"Content-Type": "text/plain"})
```
- `flask.request.remote_addr`: enables us to take action based on the IP address from which we receive the request
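
The pieces above can be combined into a small rate-limited server. This is only a sketch under stated assumptions: the app name, the `MIN_GAP` constant, and the handler names are hypothetical, and the per-IP `last_visit` dict follows the TODO in the server file rather than any confirmed implementation.

```python
import time
import flask

app = flask.Flask("rate limit sketch")  # hypothetical app name
last_visit = {}  # IP address -> time of the last accepted request
MIN_GAP = 3      # assumed minimum number of seconds between requests per IP

@app.route("/slow")
def slow():
    ip = flask.request.remote_addr
    now = time.time()
    if now - last_visit.get(ip, 0) < MIN_GAP:
        # Too soon: reply with 429 and tell the client how long to wait
        return flask.Response("<b>go away</b>",
                              status=429,
                              headers={"Retry-After": str(MIN_GAP)})
    last_visit[ip] = now
    return "welcome!"

@app.route("/robots.txt")
def robots():
    # Plain-text response so crawlers can parse the rules
    return flask.Response("User-Agent: *\nDisallow: /never\n",
                          headers={"Content-Type": "text/plain"})
```

A client that honors the `Retry-After` header would sleep for that many seconds before retrying a 429 response.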
%% Cell type:code id:8241e51c tags:
``` python
base_url = "http://34.123.132.20:5000/"
```
%% Cell type:code id:6cc81b85 tags:
``` python
def friendly_get(url):
while True:
resp = requests.get(url)
if resp.status_code == 429:
seconds = int(resp.headers.get("Retry-After", 1))
print(f"sleep {seconds}")
time.sleep(seconds)
continue
resp.raise_for_status() # raise exception if not 200
return resp
friendly_get(base_url + "/slow").text
friendly_get(base_url + "slow").text
```
%% Output
'welcome!'
......
import flask # requires installation if not already installed - pip3 install flask
import time
import json
# import json
app = flask.Flask("my application") # name of the web application can be anything
......
%% Cell type:markdown id:cf313adf tags:
# Web 2: Flask
%% Cell type:code id:d55e4bb4-9f29-4f4f-bba6-05054718259b tags:
``` python
import requests
import time
```
%% Cell type:markdown id:527600aa tags:
### Rate-limited webpage parsing
- `requests` module:
- `resp = requests.get(<URL>)`: sends an HTTP GET request and returns the response object
- `resp.status_code`: status code of the response
- `resp.text`: `str` text content of the response
- `resp.headers`: `dict` content of response headers
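- The `@` syntax applies a "decorator": a function that takes the function defined below it and wraps or registers it (for example, `@app.route("/")` registers a URL handler)
- `flask.Response`: enables us to create a response object instance
- Arguments: `str` representing the response body, `headers` dict representing metadata, `status` representing the status code.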
- ex:
```python
flask.Response("<b>go away</b>",
status=429,
headers={"Retry-After": "3"})
```
```python
flask.Response("""User-Agent: *
Disallow: /never
""", headers={"Content-Type": "text/plain"})
```
- `flask.request.remote_addr`: enables us to take action based on the IP address from which we receive the request
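
To make the decorator bullet concrete, here is a minimal sketch in plain Python. The `routes` dict and the `route` function are hypothetical stand-ins that mimic the idea behind `@app.route`; this is not Flask's actual implementation.

```python
routes = {}  # hypothetical registry: URL path -> handler function

def route(path):
    # route(path) returns the real decorator, which receives the function
    def decorator(fn):
        routes[path] = fn   # register the handler under the given path
        return fn           # hand the function back unchanged
    return decorator

@route("/")
def home():
    return "welcome!"

# The @ line above is shorthand for: home = route("/")(home)
print(routes["/"]())
```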
%% Cell type:code id:8241e51c tags:
``` python
base_url = "http://34.123.132.20:5000/"
```
%% Cell type:code id:6cc81b85 tags:
``` python
def friendly_get(url):
while True:
resp = requests.get(url)
resp.raise_for_status() # raise exception if not 200
return resp
friendly_get(base_url + "/slow").text
```
......
import flask # requires installation if not already installed - pip3 install flask
import time
import json
app = flask.Flask("my application") # name of the web application can be anything
......@@ -18,8 +18,7 @@ last_visit = 0 # TODO: dict of visit times, for each IP
# TODO: create a dynamic page ha.html
# DYNAMIC
# STATIC
# @ operator is called a "decorator"
# STATIC
# @ operator is called a "decorator"
@app.route("/")
......
%% Cell type:markdown id:cf313adf tags:
# Web 2: Flask
%% Cell type:code id:d55e4bb4-9f29-4f4f-bba6-05054718259b tags:
``` python
import requests
import time
```
%% Cell type:markdown id:527600aa tags:
### Rate-limited webpage parsing
- `requests` module:
- `resp = requests.get(<URL>)`: sends an HTTP GET request and returns the response object
- `resp.status_code`: status code of the response
- `resp.text`: `str` text content of the response
- `resp.headers`: `dict` content of response headers
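- The `@` syntax applies a "decorator": a function that takes the function defined below it and wraps or registers it (for example, `@app.route("/")` registers a URL handler)
- `flask.Response`: enables us to create a response object instance
- Arguments: `str` representing the response body, `headers` dict representing metadata, `status` representing the status code.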
- ex:
```python
flask.Response("<b>go away</b>",
status=429,
headers={"Retry-After": "3"})
```
```python
flask.Response("""User-Agent: *
Disallow: /never
""", headers={"Content-Type": "text/plain"})
```
- `flask.request.remote_addr`: enables us to take action based on the IP address from which we receive the request
%% Cell type:code id:8241e51c tags:
``` python
base_url = "http://34.123.132.20:5000/"
```
%% Cell type:code id:6cc81b85 tags:
``` python
def friendly_get(url):
while True:
resp = requests.get(url)
resp.raise_for_status() # raise exception if not 200
return resp
friendly_get(base_url + "/slow").text
friendly_get(base_url + "slow").text
```
......
import flask # requires installation if not already installed - pip3 install flask
import time
import json
app = flask.Flask("my application") # name of the web application can be anything
......@@ -18,8 +17,7 @@ last_visit = 0 # TODO: dict of visit times, for each IP
# TODO: create a dynamic page ha.html
# DYNAMIC
# STATIC
# @ operator is called a "decorator"
# STATIC
# @ operator is called a "decorator"
@app.route("/")
......
%% Cell type:markdown id:cf313adf tags:
# Web 3: More Flask
%% Cell type:code id:d55e4bb4-9f29-4f4f-bba6-05054718259b tags:
``` python
import requests
import time
import urllib.robotparser
```
%% Cell type:markdown id:527600aa tags:
%% Cell type:markdown id:725234ef-a02b-47da-bfec-aaa16e852a58 tags:
### Rate-limited webpage parsing
- `requests` module:
- `resp = requests.get(<URL>)`: sends an HTTP GET request and returns the response object
- `resp.status_code`: status code of the response
- `resp.text`: `str` text content of the response
- `resp.headers`: `dict` content of response headers
- `flask.request.args`: enables us to get the arguments passed as part of the URL
- How do we pass arguments?
- at the end of the URL, add a "?"
- then write each argument-value pair as `argument=value`
- use "&" as delimiter between two argument-value pairs
- examples:
- http://34.123.132.20:5000/add?x=10&y=20
- http://34.123.132.20:5000/survey?major=CS
- http://34.123.132.20:5000/survey?major=Mechanical_Engineering
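
On the server side, the example URLs above could be handled roughly as follows. This is a sketch, not the course's actual server code: the app name and both handler bodies are assumptions, since the source does not show how `/add` and `/survey` are implemented.

```python
import flask

app = flask.Flask("args sketch")  # hypothetical app name

@app.route("/add")
def add():
    args = flask.request.args      # dict-like; every value arrives as a str
    x = int(args.get("x", 0))      # convert explicitly before doing math
    y = int(args.get("y", 0))
    return str(x + y)

@app.route("/survey")
def survey():
    # .get() with a default avoids an error when the argument is missing
    major = flask.request.args.get("major", "unknown")
    return f"recorded major: {major}"
```

Because the values are strings, forgetting the `int(...)` conversion would concatenate `"10"` and `"20"` instead of adding them.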
%% Cell type:code id:8241e51c tags:
``` python
base_url = "http://34.123.132.20:5000/"
```
%% Cell type:markdown id:23ba100b tags:
### `urllib.robotparser`
- Documentation: https://docs.python.org/3/library/urllib.robotparser.html
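
The parser can also be tried without any network access by feeding it the rules directly with `parse()`. The sketch below assumes a `robots.txt` with a single `Disallow: /never` rule and uses a placeholder host (`example.com`), neither of which is taken from the live server.

```python
import urllib.robotparser

# Assumed robots.txt content, supplied as text instead of fetched
robots_txt = """User-Agent: *
Disallow: /never
"""

offline_rp = urllib.robotparser.RobotFileParser()
offline_rp.parse(robots_txt.splitlines())  # parse() takes an iterable of lines

host = "http://example.com"  # placeholder host for illustration
allowed = offline_rp.can_fetch("cs320bot", host + "/slow")
blocked = offline_rp.can_fetch("cs320bot", host + "/never")
print(allowed, blocked)
```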
%% Cell type:code id:379c3ae5-7344-45b1-88c3-b35f0bd8eb5b tags:
``` python
rp = urllib.robotparser.RobotFileParser()
rp.set_url(base_url + "/robots.txt")
rp.read()
rp.can_fetch("cs320bot", base_url + "/slow")
```
%% Output
True
%% Cell type:code id:2e3fb01c-4281-4cbf-8828-98e04d27d09a tags:
``` python
rp.can_fetch("cs320bot", base_url + "/never")
rp.can_fetch("cs320bot", base_url + "never")
```
%% Output
True
False
%% Cell type:code id:6cc81b85 tags:
``` python
def friendly_get(url):
if not rp.can_fetch("cs320bot", url):
raise Exception("you're not supposed to visit that page")
while True:
resp = requests.get(url)
if resp.status_code == 429:
seconds = int(resp.headers.get("Retry-After", 1))
print(f"sleep {seconds}")
time.sleep(seconds)
continue
resp.raise_for_status() # raise exception if not 200
return resp
friendly_get(base_url + "/slow").text
friendly_get(base_url + "slow").text
```
%% Output
'welcome!'
......
import flask # requires installation if not already installed - pip3 install flask
import time
import json
# import json
app = flask.Flask("my application") # name of the web application can be anything
......
%% Cell type:markdown id:cf313adf tags:
# Web 3: More Flask
%% Cell type:code id:d55e4bb4-9f29-4f4f-bba6-05054718259b tags:
``` python
import requests
import time
```
%% Cell type:markdown id:527600aa tags:
%% Cell type:markdown id:a5f26f67-000a-4d68-8033-3e46023b8196 tags:
### Rate-limited webpage parsing
- `requests` module:
- `resp = requests.get(<URL>)`: sends an HTTP GET request and returns the response object
- `resp.status_code`: status code of the response
- `resp.text`: `str` text content of the response
- `resp.headers`: `dict` content of response headers
- `flask.request.args`: enables us to get the arguments passed as part of the URL
- How do we pass arguments?
- at the end of the URL, add a "?"
- then write each argument-value pair as `argument=value`
- use "&" as delimiter between two argument-value pairs
- examples:
- http://34.123.132.20:5000/add?x=10&y=20
- http://34.123.132.20:5000/survey?major=CS
- http://34.123.132.20:5000/survey?major=Mechanical_Engineering
%% Cell type:code id:8241e51c tags:
``` python
base_url = "http://34.123.132.20:5000/"
```
%% Cell type:markdown id:23ba100b tags:
### `urllib.robotparser`
- Documentation: https://docs.python.org/3/library/urllib.robotparser.html
%% Cell type:code id:379c3ae5-7344-45b1-88c3-b35f0bd8eb5b tags:
``` python
```
%% Cell type:code id:2e3fb01c-4281-4cbf-8828-98e04d27d09a tags:
``` python
```
%% Cell type:code id:6cc81b85 tags:
``` python
def friendly_get(url):
while True:
resp = requests.get(url)
if resp.status_code == 429:
seconds = int(resp.headers.get("Retry-After", 1))
print(f"sleep {seconds}")
time.sleep(seconds)
continue
resp.raise_for_status() # raise exception if not 200
return resp
friendly_get(base_url + "/slow").text
friendly_get(base_url + "slow").text
```
......
import flask # requires installation if not already installed - pip3 install flask
import time
import json
app = flask.Flask("my application") # name of the web application can be anything
......@@ -29,8 +28,6 @@ last_visit = 0 # TODO: dict of visit times, for each IP
# TODO: create a dynamic page ha.html
# DYNAMIC
# STATIC
# @ operator is called a "decorator"
# STATIC
# @ operator is called a "decorator"
@app.route("/")
......
%% Cell type:markdown id:cf313adf tags:
# Web 3: More Flask
%% Cell type:code id:d55e4bb4-9f29-4f4f-bba6-05054718259b tags:
``` python
import requests
import time
```
%% Cell type:markdown id:14525019-f30b-40a7-be53-db49868abce6 tags:
- `flask.request.args`: enables us to get the arguments passed as part of the URL
- How do we pass arguments?
- at the end of the URL, add a "?"
- then write each argument-value pair as `argument=value`
- use "&" as delimiter between two argument-value pairs
- examples:
- http://34.123.132.20:5000/add?x=10&y=20
- http://34.123.132.20:5000/survey?major=CS
- http://34.123.132.20:5000/survey?major=Mechanical_Engineering
%% Cell type:markdown id:527600aa tags:
### Rate-limited webpage parsing
- `requests` module:
- `resp = requests.get(<URL>)`: sends an HTTP GET request and returns the response object
- `resp.status_code`: status code of the response
- `resp.text`: `str` text content of the response
- `resp.headers`: `dict` content of response headers
%% Cell type:code id:8241e51c tags:
``` python
base_url = "http://34.123.132.20:5000/"
```
%% Cell type:markdown id:23ba100b tags:
### `urllib.robotparser`
- Documentation: https://docs.python.org/3/library/urllib.robotparser.html
%% Cell type:code id:379c3ae5-7344-45b1-88c3-b35f0bd8eb5b tags:
``` python
```
%% Cell type:code id:2e3fb01c-4281-4cbf-8828-98e04d27d09a tags:
``` python
```
%% Cell type:code id:6cc81b85 tags:
``` python
def friendly_get(url):
while True:
resp = requests.get(url)
if resp.status_code == 429:
seconds = int(resp.headers.get("Retry-After", 1))
print(f"sleep {seconds}")
time.sleep(seconds)
continue
resp.raise_for_status() # raise exception if not 200
return resp
friendly_get(base_url + "/slow").text
friendly_get(base_url + "slow").text
```
......
import flask # requires installation if not already installed - pip3 install flask
import time
import json
app = flask.Flask("my application") # name of the web application can be anything
......@@ -29,8 +28,6 @@ last_visit = 0 # TODO: dict of visit times, for each IP
# TODO: create a dynamic page ha.html
# DYNAMIC
# STATIC
# @ operator is called a "decorator"
# STATIC
# @ operator is called a "decorator"
@app.route("/")
......