
Compare revisions

Changes are shown as if the source revision was being merged into the target revision.

Commits on Source (110)
Showing
with 1916 additions and 1 deletion
# Read-only Access
We've opened up the `autobadger` tool in an attempt to make things more visible to you and less of a "black box". We've done this by making the repository *read-only*, meaning you should be able to `git clone` the repo but not `git push` to it.
To start, navigate to a directory outside of any class project. I'd recommend cloning to the same directory as your projects.
```bash
git clone https://oauth2:glpat-CSTX_tgpf38eJHyUW213@git.doit.wisc.edu/cdis/cs/courses/cs544/s25/tools/autobadger.git
```
> **NOTE**: if you want to use this method throughout the semester, you'll need to `git pull` to get up-to-date code for each project.
Your folder structure should look something like
```
some-directory/
autobadger/
p1/
p2/
... # other projects
```
# Making Changes
You can change the code inside of `autobadger`. The only files that will be of interest to you are inside the `projects/` directory (i.e. `projects/*.py`). Your changes will be for debugging, e.g. `print()` or `breakpoint()` statements.
#### Using `pip`
For whatever project you're working on, you will need to *apply* any changes you make using `pip`.
For example, assuming
- I'm working on `p2`
- in my `p2` directory
- and have my `venv` activated
I would do something like:
```bash
pip3 install ../autobadger/.
```
This would install and replace my local version of `autobadger`. Now when I run
```
autobadger --project=p2
```
I will see my changes in effect.
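To confirm that the reinstall was picked up, it can help to check where Python imports the package from. The following is a minimal sketch; it assumes the importable module is also named `autobadger` (the text above only shows the command name), and `module_origin` is a hypothetical helper, not part of the tool.

```python
import importlib.util


def module_origin(module_name):
    """Return the file a top-level module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# Assumption: the package installs a module called "autobadger".
# Run this with the project's venv activated.
print(module_origin("autobadger"))
```

If this prints `None`, the package is not installed in the active environment; otherwise the path should point into that environment's `site-packages`.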
# Breakpoints
Since `breakpoint()` is less well known and less straightforward than `print()`, I will cover it here.
> **NOTE**: It is not required to use `breakpoint()`. You are also welcome to use `print()` instead. `breakpoint()` has a **steeper learning curve**, but may **help you iterate more quickly and save you time** once the basic concepts are well-understood.
### What is a breakpoint?
`breakpoint()` is a built-in function in Python and starts the **debugger** at the point where it is called. It allows developers to inspect variables, step through code, and debug interactively.
#### Simple Example:
```python
# Inside of /path/to/file.py
def calculate_sum(a, b):
    breakpoint()  # Debugger starts here
    return a + b

calculate_sum(3, 5)  # execute function
```
Adding a `breakpoint()` will pause execution, allowing you to inspect `a` and `b` before proceeding. I would see something like:
```
> /path/to/file.py(3)calculate_sum()
-> return a + b
```
in the terminal, which displays
1. the next line to be executed `return a + b`
2. `(3)calculate_sum()` tells me the line number and the function name (if applicable)
3. `/path/to/file.py` tells me the current file
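Under the hood, `breakpoint()` calls `sys.breakpointhook()`, which by default starts `pdb` and respects the `PYTHONBREAKPOINT` environment variable (setting it to `0` turns every `breakpoint()` call into a no-op). The sketch below illustrates that dispatch by temporarily swapping in a hypothetical `recording_hook`, so it runs without opening a prompt:

```python
import sys

hits = []


def recording_hook(*args, **kwargs):
    """Stand-in for the debugger: record the call instead of opening a prompt."""
    hits.append("breakpoint reached")


def calculate_sum(a, b):
    breakpoint()  # dispatches to sys.breakpointhook
    return a + b


original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    result = calculate_sum(3, 5)
finally:
    sys.breakpointhook = original_hook  # restore the normal debugger behavior

print(result, hits)
```

This is only meant to show the mechanism; for everyday debugging there is no need to touch the hook.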
### Navigating the debugger
While the Python debugger is active, you can use several commands to navigate through your program and investigate.
- `Variable name`: I can type the name of any variable that is in scope and get its value.
  - Ex: Typing `address` in the `p2` example later in this document returns the *value* of `address`
  - **NOTE**: if a variable name coincides with a debugger command, use `print(<variable_name>)` instead. Both names in the previous example collide with commands: `a` is short for `args` and `b` is short for `break`, so to print their values I would use `print(a)` and `print(b)`.
- `Evaluation`: I can also evaluate expressions (e.g. add two numbers):
```
In [3]: calculate_sum(3, 5)
> <ipython-input-2-443b6e8e0b0a>(3)calculate_sum()
-> return a + b
(Pdb) print(a)
3
(Pdb) print(b)
5
(Pdb) print(a + b)
8
```
- `n`: Steps to the next line of my program
- `c`: Continues execution of the program until the next breakpoint, or until the program ends.
- `s`: Steps *into* a function or method call
- `exit`: kills the debugger and ends the program
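The commands above can also be tried without typing anything. The following sketch feeds a scripted list of commands to `pdb.Pdb` through its `stdin`/`stdout` parameters and uses `runcall` to debug a single function call. The extra `total` variable is added here only so there is something to inspect after stepping with `n`.

```python
import io
import pdb


def calculate_sum(a, b):
    total = a + b
    return total


# Commands that would normally be typed at the (Pdb) prompt, one per line:
# print a, print b, step to the next line, print total, then continue.
commands = io.StringIO("p a\np b\nn\np total\nc\n")
transcript = io.StringIO()

debugger = pdb.Pdb(stdin=commands, stdout=transcript)
result = debugger.runcall(calculate_sum, 3, 5)

print(transcript.getvalue())
```

`p <expression>` is the debugger's own print command and sidesteps the name collision with `a` and `b` in the same way `print(...)` does.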
# An example
### Using breakpoints
Suppose I want to investigate `Q4` for `p2`. I can add `breakpoint()` statements to the Q4 test method for the `ProjectTwoTest` class.
Navigating to `projects/p2.py` inside of `autobadger`, I find:
```python
@graded(Q=4, points=10)
def test_simple_http(self) -> int | TestError:
    address = self._test_cache_server("-cache-1")
    if isinstance(address, TestError):
        return address
    r = requests.get(f"{address}/lookup/53706")
    r.raise_for_status()
    result = r.json()
    if "addrs" not in result or "source" not in result:
        return TestError(
            message=f"Result body should be JSON with 'addrs' and 'source' fields, but got {result}.",
            earned=5,
        )
    return 10
```
> Note: This is Q4 since I have `Q=4` in the decorator.
**I can edit this method by adding *breakpoints*!**
```python
@graded(Q=4, points=10)
def test_simple_http(self) -> int | TestError:
    breakpoint()
    address = self._test_cache_server("-cache-1")
    if isinstance(address, TestError):
        return address
    r = requests.get(f"{address}/lookup/53706")
    breakpoint()
    r.raise_for_status()
    result = r.json()
    if "addrs" not in result or "source" not in result:
        return TestError(
            message=f"Result body should be JSON with 'addrs' and 'source' fields, but got {result}.",
            earned=5,
        )
    return 10
```
Now, after I update with `pip` as mentioned above, I can run `autobadger --project=p2` and get:
```
> /Users/.../p2.py(103)test_simple_http()
-> address = self._test_cache_server("-cache-1")
```
Note that in this situation, typing `address` would give me an error because it is **not yet defined**:
```
(Pdb) address
*** NameError: name 'address' is not defined
```
###### Using `n` (next line)
`address` is defined on the *next line* to be executed. So, I use the `n` command to step!
```
(Pdb) n
> /Users/.../p2.py(104)test_simple_http()
-> if isinstance(address, TestError):
(Pdb) address
'http://localhost:64879'
```
###### Using `s` (step into)
I could have also used `s` to *step into* `self._test_cache_server(...)` if I had wanted to investigate further:
```
> /Users/.../p2.py(103)test_simple_http()
-> address = self._test_cache_server("-cache-1")
(Pdb) s
--Call--
> /Users/.../p2.py(118)_test_cache_server()
-> def _test_cache_server(self, server_suffix: str) -> str | TestError:
# Now in a new method — _test_cache_server
(Pdb) n
> /Users/.../p2.py(119)_test_cache_server()
-> cache_server = [c for c in self.containers if c["Name"].endswith(server_suffix)]
```
###### Using `c` (continue)
I can also *continue* till the next breakpoint, which is quite convenient if you don't need to step over every line of code:
```
> /Users/.../p2.py(103)test_simple_http()
-> address = self._test_cache_server("-cache-1")
(Pdb) c
> /Users/.../p2.py(108)test_simple_http()
-> r.raise_for_status()
(Pdb) print(r.json())
{'addrs': [...], 'error': None, 'source': '...'}
```
Using `c` jumped from line `103` to line `108`, where I had my two breakpoints defined.
> **NOTE**: using `c` again would continue the Python program till the end of its execution since I have no other `breakpoint()` statements
File added
@@ -2,4 +2,4 @@ FROM ubuntu:24.04
 RUN apt-get update && apt-get install -y python3 python3-pip curl iproute2
 COPY requirements.txt /tmp/requirements.txt
 RUN pip3 install -r /tmp/requirements.txt --break-system-packages
-CMD ["python3", "-m", "jupyterlab", "--no-browser", "--ip=0.0.0.0", "--port=????", "--allow-root", "--NotebookApp.token=''"]
+CMD ["python3", "-m", "jupyterlab", "--no-browser", "--ip=0.0.0.0", "--port=600", "--allow-root", "--NotebookApp.token=''"]
FROM ubuntu:24.04
RUN apt-get update && apt-get install -y python3 python3-pip curl iproute2 wget unzip
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt --break-system-packages
RUN wget https://pages.cs.wisc.edu/~harter/cs544/data/hdma-wi-2021.zip && unzip hdma-wi-2021.zip
CMD ["python3", "-m", "jupyterlab", "--no-browser", "--ip=0.0.0.0", "--port=300", "--allow-root", "--NotebookApp.token=''"]
%% Cell type:code id:7371704c-78be-4548-af6a-252ed8b9f4eb tags:
``` python
!getconf -a | grep LEVEL
```
%% Output
LEVEL1_ICACHE_SIZE 32768
LEVEL1_ICACHE_ASSOC
LEVEL1_ICACHE_LINESIZE 64
LEVEL1_DCACHE_SIZE 32768
LEVEL1_DCACHE_ASSOC 8
LEVEL1_DCACHE_LINESIZE 64
LEVEL2_CACHE_SIZE 2097152
LEVEL2_CACHE_ASSOC 8
LEVEL2_CACHE_LINESIZE 64
LEVEL3_CACHE_SIZE 16777216
LEVEL3_CACHE_ASSOC 16
LEVEL3_CACHE_LINESIZE 64
LEVEL4_CACHE_SIZE 0
LEVEL4_CACHE_ASSOC
LEVEL4_CACHE_LINESIZE
%% Cell type:code id:17dddf21-b239-4910-b335-9e9b098315c6 tags:
``` python
import numpy as np
import time
```
%% Cell type:code id:4403ac1c-1a82-40fb-9fec-9edc5cd7c249 tags:
``` python
A = np.random.randint(0, 10, size=(1_000_000, 8))
```
%% Cell type:code id:d36a6bd7-0cdc-4a23-8083-adc8eb89e5b3 tags:
``` python
print(A.shape, A.dtype)
```
%% Output
(1000000, 8) int64
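The shape and dtype can be related to the cache figures printed by `getconf` with a little arithmetic. This is a rough sketch that assumes the default row-major layout NumPy uses for new arrays and that rows start on cache-line boundaries; the constants are copied from the output above.

```python
ROWS, COLS = 1_000_000, 8
ITEM_BYTES = 8             # one int64 value
CACHE_LINE_BYTES = 64      # LEVEL1_DCACHE_LINESIZE
L3_BYTES = 16_777_216      # LEVEL3_CACHE_SIZE

row_bytes = COLS * ITEM_BYTES    # one row fills exactly one cache line
array_bytes = ROWS * row_bytes   # total size of the array

# Cache lines touched when reading a single column:
lines_row_major = ROWS                                   # one line per row
lines_col_major = ROWS * ITEM_BYTES // CACHE_LINE_BYTES  # values packed together

print(row_bytes, array_bytes, array_bytes / L3_BYTES)
print(lines_row_major, lines_col_major)
```

Under these assumptions the array is several times larger than L3, and summing one column of a row-major array pulls in eight times as many cache lines as the same sum over a column-major copy.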
%% Cell type:code id:92e15a21-b589-4f2d-ac46-d7750928f84f tags:
``` python
start = time.time()
result = A[:, 0].sum()
end = time.time()
print(result)
print((end-start)*1000, "ms")
```
%% Output
4500793
3.9000511169433594 ms
%% Cell type:code id:23ce32bd-013b-418d-bce3-9fd12c6c1539 tags:
``` python
B = A.T.copy().T
```
%% Cell type:code id:17edde6c-91ea-4ae2-be5b-3abf032ffd04 tags:
``` python
B.shape
```
%% Output
(1000000, 8)
%% Cell type:code id:551dc87f-c2b5-4011-baec-61e237bc5b8d tags:
``` python
start = time.time()
result = B[:, 0].sum()
end = time.time()
print(result)
print((end-start)*1000, "ms")
```
%% Output
4500793
0.9744167327880859 ms
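`A.T.copy().T` keeps the shape and values but changes how the data is laid out in memory, which is one plausible reason for the timing gap. The sketch below makes that visible through `flags` and `strides`; it uses a smaller array and an explicit `int64` dtype so the stride values are predictable (both are assumptions for illustration).

```python
import numpy as np

A = np.random.randint(0, 10, size=(1000, 8), dtype=np.int64)
B = A.T.copy().T  # same shape and values, column-major memory layout

# Strides are the number of bytes to move to reach the next row / next column.
print(A.flags["C_CONTIGUOUS"], A.strides)  # rows are contiguous
print(B.flags["F_CONTIGUOUS"], B.strides)  # columns are contiguous
print(np.array_equal(A, B))
```

In `A`, consecutive values of one column are a full row apart in memory; in `B`, they sit next to each other.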
%% Cell type:markdown id:2577f10b-34ae-4901-8971-f54c879a6f28 tags:
# PyArrow with CPU cache
%% Cell type:code id:1f72a39a-daac-45f7-a356-54f5de10c301 tags:
``` python
import pyarrow as pa
import pyarrow.csv
import pandas as pd
```
%% Cell type:code id:9a66d46b-23b0-4d5a-bcab-891b84cddbb5 tags:
``` python
t0 = time.time()
t = pa.csv.read_csv("hdma-wi-2021.csv")
t1 = time.time()
t1-t0
```
%% Output
0.4076399803161621
%% Cell type:code id:963ae4c3-e87e-445c-9710-7bdbf3528f6b tags:
``` python
t0 = time.time()
df = t.to_pandas()
t1 = time.time()
t1 - t0
```
%% Output
0.26976633071899414
%% Cell type:code id:5c0ebb07-1c07-4388-a49a-3f40bd765efe tags:
``` python
df.head()
```
%% Output
activity_year lei derived_msa-md state_code \
0 2021 54930034MNPILHP25H80 99999 WI
1 2021 54930034MNPILHP25H80 99999 WI
2 2021 54930034MNPILHP25H80 99999 WI
3 2021 54930034MNPILHP25H80 29404 WI
4 2021 54930034MNPILHP25H80 11540 WI
county_code census_tract conforming_loan_limit derived_loan_product_type \
0 55027.0 5.502796e+10 C Conventional:First Lien
1 55001.0 5.500195e+10 C Conventional:First Lien
2 55013.0 5.501397e+10 C Conventional:First Lien
3 55059.0 5.505900e+10 C Conventional:First Lien
4 55087.0 5.508701e+10 C Conventional:First Lien
derived_dwelling_category derived_ethnicity ... \
0 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
1 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
2 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
3 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
4 Single Family (1-4 Units):Site-Built Joint ...
denial_reason-2 denial_reason-3 denial_reason-4 tract_population \
0 NaN NaN NaN 4196
1 NaN NaN NaN 1511
2 NaN NaN NaN 3895
3 NaN NaN NaN 5561
4 NaN NaN NaN 7248
tract_minority_population_percent ffiec_msa_md_median_family_income \
0 3.67 69600
1 5.43 69600
2 9.63 69600
3 9.15 102500
4 5.22 85600
tract_to_msa_income_percentage tract_owner_occupied_units \
0 108 1422
1 65 541
2 80 1685
3 106 1851
4 111 1939
tract_one_to_four_family_homes tract_median_age_of_housing_units
0 1839 57
1 1966 33
2 5859 35
3 2208 30
4 2351 14
[5 rows x 99 columns]
%% Cell type:code id:c2bffa2b-2cff-40c2-b21b-9bd04052d769 tags:
``` python
t0 = time.time()
df = pd.read_csv("hdma-wi-2021.csv")
t1 = time.time()
t1 - t0
```
%% Output
/tmp/ipykernel_21/1409717381.py:2: DtypeWarning: Columns (22,23,24,26,27,28,29,30,31,32,33,38,43,44) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv("hdma-wi-2021.csv")
2.950857162475586
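The `DtypeWarning` above suggests specifying `dtype` on import. A minimal sketch of that option follows, using a tiny in-memory CSV; the column names are borrowed from the dataset, but the rows (including the non-numeric `Exempt` value) are made up for illustration.

```python
import io

import pandas as pd

# Made-up sample: one column mixes numeric-looking and text values.
csv_text = "loan_amount,interest_rate\n105000.0,3.25\n65000.0,Exempt\n"

# Declaring the column as str up front avoids per-chunk type guessing.
df_demo = pd.read_csv(io.StringIO(csv_text), dtype={"interest_rate": str})

print(df_demo["interest_rate"].tolist())
print(df_demo["loan_amount"].tolist())
```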
%% Cell type:code id:ce44d805-44e6-44a0-8bcc-2098ba0cb649 tags:
``` python
import pyarrow.compute as pc
```
%% Cell type:code id:ae7ddc2b-569c-484a-8286-e1ed171a0d5a tags:
``` python
pc.utf8_lower(t["lei"])
```
%% Output
<pyarrow.lib.ChunkedArray object at 0x71bf417f6d90>
[
[
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
...
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731"
],
[
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
...
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731"
],
...,
[
"549300lyrwpsypk6s325",
"549300lyrwpsypk6s325",
"549300lyrwpsypk6s325",
"549300lyrwpsypk6s325",
"549300lyrwpsypk6s325",
...
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34"
],
[
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
...
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80"
]
]
%% Cell type:code id:1a937571-6f85-4778-90b5-8daf5545f358 tags:
``` python
t[:3].to_pandas()
```
%% Output
activity_year lei derived_msa-md state_code \
0 2021 54930034MNPILHP25H80 99999 WI
1 2021 54930034MNPILHP25H80 99999 WI
2 2021 54930034MNPILHP25H80 99999 WI
county_code census_tract conforming_loan_limit derived_loan_product_type \
0 55027 55027961800 C Conventional:First Lien
1 55001 55001950501 C Conventional:First Lien
2 55013 55013970400 C Conventional:First Lien
derived_dwelling_category derived_ethnicity ... \
0 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
1 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
2 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
denial_reason-2 denial_reason-3 denial_reason-4 tract_population \
0 NaN NaN NaN 4196
1 NaN NaN NaN 1511
2 NaN NaN NaN 3895
tract_minority_population_percent ffiec_msa_md_median_family_income \
0 3.67 69600
1 5.43 69600
2 9.63 69600
tract_to_msa_income_percentage tract_owner_occupied_units \
0 108 1422
1 65 541
2 80 1685
tract_one_to_four_family_homes tract_median_age_of_housing_units
0 1839 57
1 1966 33
2 5859 35
[3 rows x 99 columns]
%% Cell type:code id:2089300f-83dc-4813-b473-6d279e784297 tags:
``` python
pc.mean(t["income"].drop_null()).as_py()
```
%% Output
377.5220353645974
%% Cell type:markdown id:478496cb-ead9-40ac-a75f-0607ab9fe1e3 tags:
# PyArrow with Page Cache
%% Cell type:code id:59e063d1-9f8e-4b45-bffa-23a1350731ad tags:
``` python
import pyarrow as pa
import pyarrow.compute as pc
batch = pa.RecordBatch.from_arrays([range(1,1_000_000),
                                    range(1,1_000_000),
                                    range(1,1_000_000)],
                                   names=["x", "y", "z"])
print(batch.nbytes / 1024**2)
with pa.ipc.new_file("test.arrow", schema=batch.schema) as f:
    for i in range(50):
        f.write_batch(batch)
```
%% Output
22.888160705566406
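The printed size can be reproduced by hand. The sketch assumes each column is stored as 8-byte integers with no validity bitmap, which matches the `Int64Scalar` results shown elsewhere in this notebook.

```python
ROWS = len(range(1, 1_000_000))  # 999_999: range() excludes the stop value
COLUMNS = 3
BYTES_PER_VALUE = 8              # assumption: int64 values, no null bitmap
BATCHES = 50

batch_bytes = ROWS * COLUMNS * BYTES_PER_VALUE
batch_mib = batch_bytes / 1024**2
file_gib = BATCHES * batch_bytes / 1024**3  # data only, ignoring file metadata

print(batch_mib)
print(file_gib)
```

Writing the batch 50 times therefore gives a data payload of a bit over 1.1 GiB.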
%% Cell type:code id:d3fcc839-8401-401a-b4fc-6cd82efc747b tags:
``` python
with pa.ipc.open_file("test.arrow") as f:
    t = f.read_all()
```
%% Cell type:code id:385d2d13-2b61-4bb2-82f9-ce84df2a4ce1 tags:
``` python
import mmap
with open("test.arrow", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```
%% Cell type:code id:a6b2c39f-b0bc-4111-87eb-e880f60f0fd6 tags:
``` python
!ls -lh *.arrow
```
%% Output
-rw-r--r-- 1 root root 1.2G Feb 12 15:36 test.arrow
%% Cell type:code id:2aa04790-1bf0-4726-97f0-9e4917620d83 tags:
``` python
import pyarrow as pa
with pa.ipc.open_file(mm) as f:
    tbl = f.read_all()
```
%% Cell type:code id:9fa9e9e2-9389-4ea9-be35-80df3f9b1393 tags:
``` python
import pyarrow.compute as pc
pc.sum(tbl["x"])
```
%% Output
<pyarrow.Int64Scalar: 24999975000000>
%% Cell type:code id:e4f934ff-8b86-42aa-b18e-09511352aa9e tags:
``` python
pc.sum(tbl["y"])
```
%% Output
<pyarrow.Int64Scalar: 24999975000000>
%% Cell type:code id:8fbfa010-3a2c-4179-9f1e-4513f26d4443 tags:
``` python
pc.sum(tbl["z"])
```
%% Output
<pyarrow.Int64Scalar: 24999975000000>
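All three columns hold the same values, so the identical sums are expected. As a quick sanity check in plain Python, the total is 50 copies of the sum 1 + 2 + ... + 999,999:

```python
per_batch = sum(range(1, 1_000_000))            # sum of one column in one batch
expected_total = 50 * per_batch                 # the batch was written 50 times
closed_form = 50 * (999_999 * 1_000_000 // 2)   # n * (n + 1) / 2 with n = 999_999

print(expected_total, closed_form)
```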
%% Cell type:code id:48a741c9-4839-49b1-ace4-2a8e5bce8d10 tags:
``` python
! getconf -a | grep LEVEL
```
%% Output
LEVEL1_ICACHE_SIZE 32768
LEVEL1_ICACHE_ASSOC
LEVEL1_ICACHE_LINESIZE 64
LEVEL1_DCACHE_SIZE 32768
LEVEL1_DCACHE_ASSOC 8
LEVEL1_DCACHE_LINESIZE 64
LEVEL2_CACHE_SIZE 2097152
LEVEL2_CACHE_ASSOC 8
LEVEL2_CACHE_LINESIZE 64
LEVEL3_CACHE_SIZE 16777216
LEVEL3_CACHE_ASSOC 16
LEVEL3_CACHE_LINESIZE 64
LEVEL4_CACHE_SIZE 0
LEVEL4_CACHE_ASSOC
LEVEL4_CACHE_LINESIZE
%% Cell type:code id:07b57514-2b12-4414-88ec-3cfea8a5ffc2 tags:
``` python
import numpy as np
```
%% Cell type:code id:b04553d9-1811-4bcf-b125-a31b2f8ee7ba tags:
``` python
A = np.random.randint(0, 10, size=(1_000_000, 8))
A.shape, A.dtype
```
%% Output
((1000000, 8), dtype('int64'))
%% Cell type:code id:ef61a170-1d1b-4e81-a346-ab2e1f3683ba tags:
``` python
import time
```
%% Cell type:code id:cf729c68-086f-46ef-98a2-4a30913aff44 tags:
``` python
t0 = time.time()
result = A[:, 0].sum()
t1 = time.time()
print(result)
print((t1-t0)*1000, "ms")
```
%% Output
4493899
4.729032516479492 ms
%% Cell type:code id:e254c702-e065-4d96-8525-175b5b260a44 tags:
``` python
B = A.T.copy().T
B.shape, B.dtype
```
%% Output
((1000000, 8), dtype('int64'))
%% Cell type:code id:631e2393-5902-44d2-b309-0470ede1781a tags:
``` python
t0 = time.time()
result = B[:, 0].sum()
t1 = time.time()
print(result)
print((t1-t0)*1000, "ms")
```
%% Output
4493899
0.9555816650390625 ms
%% Cell type:code id:e4a8f1cc-8dcf-43f7-97f9-9777335e4fbd tags:
``` python
import pyarrow as pa
import pyarrow.csv
import pandas as pd
```
%% Cell type:code id:71083b80-fd9c-4d18-8810-d424a44a730b tags:
``` python
start = time.time()
t = pa.csv.read_csv("hdma-wi-2021.csv")
end = time.time()
print(end-start)
```
%% Output
0.4116203784942627
%% Cell type:code id:a71c42da-4422-4bd6-b1d9-4b7ca20eb01f tags:
``` python
start = time.time()
df = t.to_pandas()
end = time.time()
print(end-start)
```
%% Output
0.29718923568725586
%% Cell type:code id:0f5e9ade-b837-451b-a063-fdde979eae2b tags:
``` python
t
```
%% Output
pyarrow.Table
activity_year: int64
lei: string
derived_msa-md: int64
state_code: string
county_code: int64
census_tract: int64
conforming_loan_limit: string
derived_loan_product_type: string
derived_dwelling_category: string
derived_ethnicity: string
derived_race: string
derived_sex: string
action_taken: int64
purchaser_type: int64
preapproval: int64
loan_type: int64
loan_purpose: int64
lien_status: int64
reverse_mortgage: int64
open-end_line_of_credit: int64
business_or_commercial_purpose: int64
loan_amount: double
loan_to_value_ratio: string
interest_rate: string
rate_spread: string
hoepa_status: int64
total_loan_costs: string
total_points_and_fees: string
origination_charges: string
discount_points: string
lender_credits: string
loan_term: string
prepayment_penalty_term: string
intro_rate_period: string
negative_amortization: int64
interest_only_payment: int64
balloon_payment: int64
other_nonamortizing_features: int64
property_value: string
construction_method: int64
occupancy_type: int64
manufactured_home_secured_property_type: int64
manufactured_home_land_property_interest: int64
total_units: string
multifamily_affordable_units: string
income: int64
debt_to_income_ratio: string
applicant_credit_score_type: int64
co-applicant_credit_score_type: int64
applicant_ethnicity-1: int64
applicant_ethnicity-2: int64
applicant_ethnicity-3: int64
applicant_ethnicity-4: int64
applicant_ethnicity-5: int64
co-applicant_ethnicity-1: int64
co-applicant_ethnicity-2: int64
co-applicant_ethnicity-3: int64
co-applicant_ethnicity-4: int64
co-applicant_ethnicity-5: null
applicant_ethnicity_observed: int64
co-applicant_ethnicity_observed: int64
applicant_race-1: int64
applicant_race-2: int64
applicant_race-3: int64
applicant_race-4: int64
applicant_race-5: int64
co-applicant_race-1: int64
co-applicant_race-2: int64
co-applicant_race-3: int64
co-applicant_race-4: int64
co-applicant_race-5: int64
applicant_race_observed: int64
co-applicant_race_observed: int64
applicant_sex: int64
co-applicant_sex: int64
applicant_sex_observed: int64
co-applicant_sex_observed: int64
applicant_age: string
co-applicant_age: string
applicant_age_above_62: string
co-applicant_age_above_62: string
submission_of_application: int64
initially_payable_to_institution: int64
aus-1: int64
aus-2: int64
aus-3: int64
aus-4: int64
aus-5: int64
denial_reason-1: int64
denial_reason-2: int64
denial_reason-3: int64
denial_reason-4: int64
tract_population: int64
tract_minority_population_percent: double
ffiec_msa_md_median_family_income: int64
tract_to_msa_income_percentage: int64
tract_owner_occupied_units: int64
tract_one_to_four_family_homes: int64
tract_median_age_of_housing_units: int64
----
activity_year: [[2021,2021,2021,2021,2021,...,2021,2021,2021,2021,2021],[2021,2021,2021,2021,2021,...,2021,2021,2021,2021,2021],...,[2021,2021,2021,2021,2021,...,2021,2021,2021,2021,2021],[2021,2021,2021,2021,2021,...,2021,2021,2021,2021,2021]]
lei: [["54930034MNPILHP25H80","54930034MNPILHP25H80","54930034MNPILHP25H80","54930034MNPILHP25H80","54930034MNPILHP25H80",...,"RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731"],["RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731",...,"RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731","RVDPPPGHCGZ40J4VQ731"],...,["549300LYRWPSYPK6S325","549300LYRWPSYPK6S325","549300LYRWPSYPK6S325","549300LYRWPSYPK6S325","549300LYRWPSYPK6S325",...,"5493002JV15GSVFKZW34","5493002JV15GSVFKZW34","5493002JV15GSVFKZW34","5493002JV15GSVFKZW34","5493002JV15GSVFKZW34"],["5493002JV15GSVFKZW34","5493002JV15GSVFKZW34","5493002JV15GSVFKZW34","5493002JV15GSVFKZW34","5493002JV15GSVFKZW34",...,"54930034MNPILHP25H80","54930034MNPILHP25H80","54930034MNPILHP25H80","54930034MNPILHP25H80","54930034MNPILHP25H80"]]
derived_msa-md: [[99999,99999,99999,29404,11540,...,31540,99999,99999,33460,22540],[99999,99999,29404,99999,20740,...,29404,29404,99999,20260,33340],...,[33340,48140,27500,33340,33340,...,48140,48140,48140,48140,48140],[99999,48140,48140,20740,48140,...,31540,99999,31540,99999,31540]]
state_code: [["WI","WI","WI","WI","WI",...,"WI","WI","WI","WI","WI"],["WI","WI","WI","WI","WI",...,"WI","WI","WI","WI","WI"],...,["WI","WI","WI","WI","WI",...,"WI","WI","WI","WI","WI"],["WI","WI","WI","WI","WI",...,"WI","WI","WI","WI","WI"]]
county_code: [[55027,55001,55013,55059,55087,...,55025,55125,55027,55109,55039],[55027,55067,55059,55115,55035,...,55059,55059,55137,55031,55133],...,[55133,55073,55105,55079,55133,...,55073,55073,55073,55073,55073],[55097,55073,55069,55035,55073,...,55025,55029,55025,55051,55021]]
census_tract: [[55027961800,55001950501,55013970400,55059002000,55087013300,...,55025013301,55125950700,55027960100,55109120501,55039041400],[55027960400,55067960700,55059002100,55115100300,55035000502,...,55059001800,55059002500,55137960700,55031020400,55133201504],...,[55133203000,55073001102,55105003100,55079070300,55133204002,...,55073001900,55073001102,55073001102,55073001104,55073001700],[55097960800,55073001600,55069960900,55035001300,55073001102,...,55025011301,55029100800,55025012300,55051180300,55021970300]]
conforming_loan_limit: [["C","C","C","C","C",...,"C","C","C","C","C"],["C","C","C","C","C",...,"C","C","C","C","C"],...,["C","C","C","C","C",...,"C","C","C","C","C"],["C","C","C","C","C",...,"C","C","C","C","C"]]
derived_loan_product_type: [["Conventional:First Lien","Conventional:First Lien","Conventional:First Lien","Conventional:First Lien","Conventional:First Lien",...,"VA:First Lien","VA:First Lien","FHA:First Lien","FHA:First Lien","FSA/RHS:First Lien"],["FSA/RHS:First Lien","FSA/RHS:First Lien","FHA:First Lien","VA:First Lien","FHA:First Lien",...,"VA:First Lien","VA:First Lien","FSA/RHS:First Lien","FHA:First Lien","VA:First Lien"],...,["FHA:First Lien","FHA:First Lien","FHA:First Lien","Conventional:First Lien","VA:First Lien",...,"Conventional:First Lien","Conventional:First Lien","Conventional:First Lien","Conventional:First Lien","Conventional:First Lien"],["Conventional:First Lien","Conventional:First Lien","Conventional:First Lien","Conventional:First Lien","Conventional:First Lien",...,"Conventional:First Lien","Conventional:First Lien","Conventional:First Lien","Conventional:First Lien","Conventional:First Lien"]]
derived_dwelling_category: [["Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built",...,"Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built"],["Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built",...,"Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built"],...,["Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built",...,"Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built"],["Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built",...,"Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built","Single Family (1-4 Units):Site-Built"]]
derived_ethnicity: [["Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino","Joint",...,"Ethnicity Not Available","Ethnicity Not Available","Ethnicity Not Available","Ethnicity Not Available","Ethnicity Not Available"],["Ethnicity Not Available","Ethnicity Not Available","Ethnicity Not Available","Ethnicity Not Available","Ethnicity Not Available",...,"Ethnicity Not Available","Ethnicity Not Available","Ethnicity Not Available","Ethnicity Not Available","Ethnicity Not Available"],...,["Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino",...,"Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino"],["Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino","Ethnicity Not Available",...,"Ethnicity Not Available","Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino","Not Hispanic or Latino"]]
...
%% Cell type:code id:96408d76-505b-463c-9ab1-b3006f017113 tags:
``` python
df.head()
```
%% Output
activity_year lei derived_msa-md state_code \
0 2021 54930034MNPILHP25H80 99999 WI
1 2021 54930034MNPILHP25H80 99999 WI
2 2021 54930034MNPILHP25H80 99999 WI
3 2021 54930034MNPILHP25H80 29404 WI
4 2021 54930034MNPILHP25H80 11540 WI
county_code census_tract conforming_loan_limit derived_loan_product_type \
0 55027.0 5.502796e+10 C Conventional:First Lien
1 55001.0 5.500195e+10 C Conventional:First Lien
2 55013.0 5.501397e+10 C Conventional:First Lien
3 55059.0 5.505900e+10 C Conventional:First Lien
4 55087.0 5.508701e+10 C Conventional:First Lien
derived_dwelling_category derived_ethnicity ... \
0 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
1 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
2 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
3 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
4 Single Family (1-4 Units):Site-Built Joint ...
denial_reason-2 denial_reason-3 denial_reason-4 tract_population \
0 NaN NaN NaN 4196
1 NaN NaN NaN 1511
2 NaN NaN NaN 3895
3 NaN NaN NaN 5561
4 NaN NaN NaN 7248
tract_minority_population_percent ffiec_msa_md_median_family_income \
0 3.67 69600
1 5.43 69600
2 9.63 69600
3 9.15 102500
4 5.22 85600
tract_to_msa_income_percentage tract_owner_occupied_units \
0 108 1422
1 65 541
2 80 1685
3 106 1851
4 111 1939
tract_one_to_four_family_homes tract_median_age_of_housing_units
0 1839 57
1 1966 33
2 5859 35
3 2208 30
4 2351 14
[5 rows x 99 columns]
%% Cell type:code id:464c643b-be78-44d5-ac38-ef867d1c437d tags:
``` python
start = time.time()
df = pd.read_csv("hdma-wi-2021.csv")
end = time.time()
print(end-start)
```
%% Output
/tmp/ipykernel_21/1215760448.py:2: DtypeWarning: Columns (22,23,24,26,27,28,29,30,31,32,33,38,43,44) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv("hdma-wi-2021.csv")
3.0895423889160156
%% Cell type:code id:cf4b2f0d-0df3-43ea-a255-e341fe4e25a3 tags:
``` python
import pyarrow.compute as pc
pc.utf8_lower(t["lei"])
```
%% Output
<pyarrow.lib.ChunkedArray object at 0x78c0506bdfd0>
[
[
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
...
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731"
],
[
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
...
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731",
"rvdpppghcgz40j4vq731"
],
...,
[
"549300lyrwpsypk6s325",
"549300lyrwpsypk6s325",
"549300lyrwpsypk6s325",
"549300lyrwpsypk6s325",
"549300lyrwpsypk6s325",
...
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34"
],
[
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
"5493002jv15gsvfkzw34",
...
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80",
"54930034mnpilhp25h80"
]
]
%% Cell type:code id:53f35b40-52c2-4389-aee5-482a2a9a1601 tags:
``` python
pc.mean(t["income"].drop_null()).as_py()
```
%% Output
377.5220353645974
%% Cell type:code id:9501a5f3-0168-4e20-a706-9a814a8c6544 tags:
``` python
t[:3].to_pandas()
```
%% Output
activity_year lei derived_msa-md state_code \
0 2021 54930034MNPILHP25H80 99999 WI
1 2021 54930034MNPILHP25H80 99999 WI
2 2021 54930034MNPILHP25H80 99999 WI
county_code census_tract conforming_loan_limit derived_loan_product_type \
0 55027 55027961800 C Conventional:First Lien
1 55001 55001950501 C Conventional:First Lien
2 55013 55013970400 C Conventional:First Lien
derived_dwelling_category derived_ethnicity ... \
0 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
1 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
2 Single Family (1-4 Units):Site-Built Not Hispanic or Latino ...
denial_reason-2 denial_reason-3 denial_reason-4 tract_population \
0 NaN NaN NaN 4196
1 NaN NaN NaN 1511
2 NaN NaN NaN 3895
tract_minority_population_percent ffiec_msa_md_median_family_income \
0 3.67 69600
1 5.43 69600
2 9.63 69600
tract_to_msa_income_percentage tract_owner_occupied_units \
0 108 1422
1 65 541
2 80 1685
tract_one_to_four_family_homes tract_median_age_of_housing_units
0 1839 57
1 1966 33
2 5859 35
[3 rows x 99 columns]
%% Cell type:markdown id:99a3a96d-65bb-4112-a0ec-889c25bc0c5b tags:
# mmap Demo
%% Cell type:code id:ff1e931a-7a8e-454a-af2f-8487191e7b98 tags:
``` python
import pyarrow as pa
import pyarrow.compute as pc
```
%% Cell type:code id:d073d1ba-0724-4026-aaf1-bb06591555ba tags:
``` python
batch = pa.RecordBatch.from_arrays([range(1,1_000_000),
range(1,1_000_000),
range(1,1_000_000)],
names=["x", "y", "z"])
print(batch.nbytes / 1024**2)
with pa.ipc.new_file("test.arrow", schema=batch.schema) as f:
    for i in range(50):
        f.write_batch(batch)
```
%% Output
22.888160705566406
%% Cell type:code id:a20b3986-c00d-4f40-bb02-43b221cea40b tags:
``` python
!ls -lah *.arrow
```
%% Output
-rw-r--r-- 1 root root 1.2G Feb 12 17:48 test.arrow
%% Cell type:code id:92eafdf5-1a7e-4e28-86fb-eb26c74c3dbb tags:
``` python
# with pa.ipc.open_file("test.arrow") as f:
#     tbl = f.read_all()
```
%% Cell type:code id:88908cab-faa2-4425-b3e2-d77826bf30d5 tags:
``` python
import mmap
```
%% Cell type:code id:b988bec9-c7a1-4383-a36e-b2741affc0f7 tags:
``` python
with open("test.arrow", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```
%% Cell type:code id:84745dd0-e3ed-4e59-ac30-2edeb8f383b0 tags:
``` python
mm[:20]
```
%% Output
b'ARROW1\x00\x00\xff\xff\xff\xff\xd8\x00\x00\x00\x10\x00\x00\x00'
%% Cell type:code id:418c3958-307d-4e59-a2e0-49b81b666efb tags:
``` python
with pa.ipc.open_file(mm) as f:
    tbl = f.read_all()
```
%% Cell type:code id:eb09d993-6697-4ad5-b5db-a1716a031ff2 tags:
``` python
pc.sum(tbl["x"])
```
%% Output
<pyarrow.Int64Scalar: 24999975000000>
%% Cell type:code id:feff1985-e98d-40bf-8783-9f920d63423c tags:
``` python
pc.sum(tbl["y"])
```
%% Output
<pyarrow.Int64Scalar: 24999975000000>
%% Cell type:code id:9f504ca1-8aa3-4aab-b356-70011638f245 tags:
``` python
pc.sum(tbl["z"])
```
%% Output
<pyarrow.Int64Scalar: 24999975000000>
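The cells above hand an `mmap` object to `pa.ipc.open_file` instead of reading the file into memory first. The mapping step itself can be tried with only the standard library. The following is a minimal sketch, assuming a hypothetical scratch file `demo.bin` in a temporary directory (neither name comes from the notebook); it shows that the mapping stays usable after the file object is closed and that slicing returns ordinary `bytes`.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"HEADER" + bytes(1024 * 1024))  # 6-byte tag followed by 1 MiB of zeros

with open(path, "rb") as f:
    # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# The mapping remains valid after the file object is closed.
header = mm[:6]
size = len(mm)
print(header, size)
mm.close()
```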
anyio==4.4.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==24.2.0
babel==2.16.0
beautifulsoup4==4.12.3
bleach==6.1.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.3.2
comm==0.2.2
debugpy==1.8.5
decorator==5.1.1
defusedxml==0.7.1
executing==2.1.0
fastjsonschema==2.20.0
fqdn==1.5.1
h11==0.14.0
httpcore==1.0.5
httpx==0.27.2
idna==3.10
ipykernel==6.29.5
ipython==8.27.0
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.4
json5==0.9.25
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.5
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
MarkupSafe==2.1.5
matplotlib-inline==0.1.7
mistune==3.0.2
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
notebook_shim==0.2.4
numpy==2.1.1
overrides==7.7.0
packaging==24.1
pandas==2.2.3
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
platformdirs==4.3.3
prometheus_client==0.20.0
prompt_toolkit==3.0.47
psutil==6.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==17.0.0
pycparser==2.22
Pygments==2.18.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
referencing==0.35.1
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.20.0
Send2Trash==1.8.3
setuptools==68.1.2
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
stack-data==0.6.3
terminado==0.18.1
tinycss2==1.3.0
tornado==6.4.1
traitlets==5.14.3
types-python-dateutil==2.9.0.20240906
tzdata==2024.2
uri-template==1.3.0
urllib3==2.2.3
wcwidth==0.2.13
webcolors==24.8.0
webencodings==0.5.1
websocket-client==1.8.0
wheel==0.42.0
FROM ubuntu:24.04
RUN apt-get update && apt-get install -y python3 python3-pip curl iproute2 wget unzip software-properties-common
RUN add-apt-repository -y ppa:deadsnakes/ppa && apt-get update && apt-get install -y python3.13-nogil python3.13-dev libffi-dev
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.13-nogil get-pip.py
COPY requirements.txt /tmp/requirements.txt
RUN python3.13 -m pip install -r /tmp/requirements.txt --break-system-packages
RUN python3.13-nogil -m pip install ipykernel
RUN python3.13-nogil -m ipykernel install --user --name python3.13-nogil --display-name "Python 3.13-nogil"
# JupyterLab needs GIL, but kernel does not
CMD ["python3.13", "-m", "jupyterlab", "--no-browser", "--ip=0.0.0.0", "--port=300", "--allow-root", "--NotebookApp.token=''"]
%% Cell type:code id:3b8d1ed9-9568-473e-bfa2-3b08f9e4587f tags:
``` python
import threading
import time
def task():
    print("hi from thread ID:", threading.get_native_id())
t = threading.Thread(target=task)
t.start()
print("hi from main thread, ID:", threading.get_native_id())
```
%% Output
hi from thread ID:hi from main thread, ID: 65
125
%% Cell type:code id:651dc205-4fdc-4054-be28-b19e2e798017 tags:
``` python
total = 0
def task(count):
    global total
    for i in range(count):
        total += 1
t = threading.Thread(target=task, args=[1_000_000])
t.start()
t.join() # wait until it exits
print(total)
```
%% Output
1000000
%% Cell type:code id:5eb4a3e1-decf-4352-b61c-b10e53aec763 tags:
``` python
total
```
%% Output
1000
%% Cell type:code id:ceeb7904-ff2c-424c-bf7c-724d1b285e3b tags:
``` python
total = 0
def task(count):
    global total
    for i in range(count):
        total += 1
t1 = threading.Thread(target=task, args=[1_000_000])
t1.start()
t2 = threading.Thread(target=task, args=[1_000_000])
t2.start()
t1.join()
t2.join()
total
```
%% Output
1084635
%% Cell type:code id:48a5226c-6d63-4062-a2be-f7dd890c4dc1 tags:
``` python
import dis
dis.dis("total += 1")
```
%% Output
0 RESUME 0
1 LOAD_NAME 0 (total)
LOAD_CONST 0 (1)
BINARY_OP 13 (+=)
STORE_NAME 0 (total)
RETURN_CONST 1 (None)
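The disassembly shows that `total += 1` is a separate load, add, and store rather than one indivisible step, which is consistent with the two-thread run above ending below 2000000. The following is a deterministic, single-threaded sketch of one bad interleaving; the names `a_loaded`, `b_loaded`, `a_result`, and `b_result` are hypothetical stand-ins for each thread's private values.

```python
total = 0

# Thread A and thread B each run LOAD_NAME before either runs STORE_NAME.
a_loaded = total          # A: LOAD_NAME total -> 0
b_loaded = total          # B: LOAD_NAME total -> 0

a_result = a_loaded + 1   # A: BINARY_OP += -> 1
b_result = b_loaded + 1   # B: BINARY_OP += -> 1

total = a_result          # A: STORE_NAME total
total = b_result          # B: STORE_NAME total (stores 1 again, so A's update is lost)

print(total)  # 1, although two increments ran
```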
%% Cell type:code id:3b8d1ed9-9568-473e-bfa2-3b08f9e4587f tags:
``` python
import threading
import time
```
%% Cell type:code id:da885575-cedf-4bb4-8bbd-44400974d3f8 tags:
``` python
import dis
dis.dis("total += 1")
```
%% Output
0 RESUME 0
1 LOAD_NAME 0 (total)
LOAD_CONST 0 (1)
BINARY_OP 13 (+=)
STORE_NAME 0 (total)
RETURN_CONST 1 (None)
%% Cell type:code id:ceeb7904-ff2c-424c-bf7c-724d1b285e3b tags:
``` python
%%time
# 133 ms with no locks
# 348 ms with locks (fine grained)
# 124 ms with locks (coarse grained)
lock = threading.Lock() # this protects the "total" variable
total = 0
def task(count):
    global total
    lock.acquire()
    for i in range(count):
        total += 1
    lock.release()
t1 = threading.Thread(target=task, args=[1_000_000])
t1.start()
t2 = threading.Thread(target=task, args=[1_000_000])
t2.start()
t1.join()
t2.join()
total
```
%% Output
CPU times: user 133 ms, sys: 144 μs, total: 133 ms
Wall time: 129 ms
2000000
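The timing comments in the cell above mention fine-grained locking, but only the coarse-grained version is shown. The following is a sketch of what the two variants might look like side by side. The fine-grained code is an assumption, since the original is not included here, and `task_fine`, `task_coarse`, and `run` are hypothetical names. The count is reduced so the sketch runs quickly.

```python
import threading

lock = threading.Lock()  # protects total
total = 0

def task_fine(count):
    """Acquire and release the lock once per increment."""
    global total
    for i in range(count):
        with lock:
            total += 1

def task_coarse(count):
    """Acquire the lock once around the whole loop."""
    global total
    with lock:
        for i in range(count):
            total += 1

def run(task, count=100_000):
    """Reset total, run two threads of the given task, and return the final total."""
    global total
    total = 0
    threads = [threading.Thread(target=task, args=[count]) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

fine_total = run(task_fine)
coarse_total = run(task_coarse)
print(fine_total, coarse_total)
```

Both variants should produce the correct total. The fine-grained one pays the locking overhead on every iteration, while the coarse-grained one prevents the two loops from overlapping at all.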
%% Cell type:code id:042356c8-7d34-4fa5-bef1-0c43e829fc13 tags:
``` python
import threading
bank_accounts = {"x": 25, "y": 100, "z": 200} # in dollars
lock = threading.Lock() # protects bank_accounts
def transfer(src, dst, amount):
    with lock: # automatically acquire now, and release after the with statement
        success = False
        if bank_accounts[src] >= amount:
            bank_accounts[src] -= amount
            bank_accounts[dst] += amount
            success = True
        print("transferred" if success else "denied")
        print("locked inside with?", lock.locked())
    print("locked after with?", lock.locked())
```
%% Cell type:code id:1878620a-553b-4f25-9606-1e2605b318ac tags:
``` python
transfer("x", "y", 20)
```
%% Output
transferred
locked inside with? True
locked after with? False
%% Cell type:code id:8901ec24-7471-40e1-8182-585d3cc760c7 tags:
``` python
bank_accounts
```
%% Output
{'x': 5, 'y': 120, 'z': 200}
%% Cell type:code id:9cee2370-0f3d-4ac0-930a-c59f9f0d7f1b tags:
``` python
transfer("x", "z", 10)
```
%% Output
denied
locked inside with? True
locked after with? False
%% Cell type:code id:3a583c7c-57a3-4f42-8301-8ce601a2249e tags:
``` python
bank_accounts
```
%% Output
{'x': 5, 'y': 120, 'z': 200}
%% Cell type:code id:3346b86d-5f80-4007-8879-b452c4826c50 tags:
``` python
transfer("w", "z", 10)
```
%% Output
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[8], line 1
----> 1 transfer("w", "z", 10)
Cell In[3], line 9, in transfer(src, dst, amount)
7 with lock:
8 success = False
----> 9 if bank_accounts[src] >= amount:
10 bank_accounts[src] -= amount
11 bank_accounts[dst] += amount
KeyError: 'w'
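The `KeyError` above is raised while the lock is held. Because `with lock:` releases the lock even when its body raises, later transfers are not blocked. The following is a small sketch of that behavior, using a hypothetical `accounts` dict and `withdraw` function rather than the notebook's own state.

```python
import threading

accounts = {"x": 5}      # hypothetical data for this sketch
lock = threading.Lock()  # protects accounts

def withdraw(name, amount):
    with lock:
        accounts[name] -= amount  # raises KeyError for an unknown name

try:
    withdraw("w", 10)
    raised = False
except KeyError:
    raised = True

still_locked = lock.locked()
print("raised:", raised, "still locked:", still_locked)

withdraw("x", 2)  # would block forever if the lock were still held
print(accounts)
```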
%% Cell type:code id:02262baa-6f4f-455e-b913-accfa894cc18 tags:
``` python
transfer("z", "x", 3)
```
%% Output
transferred
locked inside with? True
locked after with? False
%% Cell type:code id:1b338a2c-1274-41e8-9179-89b797e4ec2f tags:
``` python
bank_accounts
```
%% Output
{'x': 8, 'y': 120, 'z': 197}
%% Cell type:code id:de014d74-4685-4f16-85fd-f571fd5a91db tags:
``` python
import threading
```
%% Cell type:code id:af3da760-5f6e-40c7-94cd-b505abe59012 tags:
``` python
def task():
    print("hello from thread ID:", threading.get_native_id())
#task()
t = threading.Thread(target=task)
t.start()
print("hello from main thread, with ID:", threading.get_native_id())
```
%% Output
hello from thread ID:hello from main thread, with ID: 589
602
%% Cell type:code id:3dd19fcb-2c4a-4b26-b72f-3b87bd8e5bb5 tags:
``` python
total = 0
def task(count):
    global total
    for i in range(count):
        total += 1
t = threading.Thread(target=task, args=[1_000_000])
t.start()
t.join() # wait until the thread is done before we continue
total
```
%% Output
1000000
%% Cell type:code id:de6ce736-5bdf-42b9-b98c-afee738b8f94 tags:
``` python
total
```
%% Output
1000000
%% Cell type:code id:b735264e-f27a-4d27-ba5f-4d6a1f53543e tags:
``` python
total = 0
def task(count):
    global total
    for i in range(count):
        total += 1
t1 = threading.Thread(target=task, args=[1_000_000])
t1.start()
t2 = threading.Thread(target=task, args=[1_000_000])
t2.start()
t1.join()
t2.join()
total
```
%% Output
1100428
%% Cell type:code id:c3d4a396-1793-4504-b2ff-fb8b81ae48d5 tags:
``` python
import dis
dis.dis("total += 1")
```
%% Output
0 RESUME 0
1 LOAD_NAME 0 (total)
LOAD_CONST 0 (1)
BINARY_OP 13 (+=)
STORE_NAME 0 (total)
RETURN_CONST 1 (None)
%% Cell type:code id:de014d74-4685-4f16-85fd-f571fd5a91db tags:
``` python
import threading
```
%% Cell type:code id:82b2f08c-1a13-4a0f-9546-e5d624dbee5b tags:
``` python
import dis
dis.dis("total += 1")
```
%% Output
0 RESUME 0
1 LOAD_NAME 0 (total)
LOAD_CONST 0 (1)
BINARY_OP 13 (+=)
STORE_NAME 0 (total)
RETURN_CONST 1 (None)
%% Cell type:code id:ae617ed8-dd51-4154-ad83-308df527d1f1 tags:
``` python
import threading
```
%% Cell type:code id:b735264e-f27a-4d27-ba5f-4d6a1f53543e tags:
``` python
%%time
# 141 ms (no locks)
# 340 ms (fine-grained locking)
# 122 ms (coarse-grained locking)
lock = threading.Lock() # this protects total
total = 0
def task(count):
    global total
    lock.acquire()
    for i in range(count):
        total += 1
    lock.release()
t1 = threading.Thread(target=task, args=[1_000_000])
t1.start()
t2 = threading.Thread(target=task, args=[1_000_000])
t2.start()
t1.join()
t2.join()
total
```
%% Output
CPU times: user 151 ms, sys: 0 ns, total: 151 ms
Wall time: 148 ms
2000000
%% Cell type:code id:7cf83d46-b7db-4ab6-b0bb-4b35fecba5b2 tags:
``` python
bank_accounts = {"x": 25, "y": 100, "z": 200} # in dollars
lock = threading.Lock() # protects bank_accounts
def transfer(src, dst, amount):
    with lock: # automatically acquire now, automatically release after the with
        success = False
        if bank_accounts[src] >= amount:
            bank_accounts[src] -= amount
            bank_accounts[dst] += amount
            success = True
        print("transferred" if success else "denied")
        print("is it locked inside the with?", lock.locked())
    print("is it locked after the with?", lock.locked())
```
%% Cell type:code id:c9f19a3d-172f-4103-b631-11b42a8dc94c tags:
``` python
transfer("x", "y", 20)
bank_accounts
```
%% Output
transferred
is it locked inside the with? True
is it locked after the with? False
{'x': 5, 'y': 120, 'z': 200}
%% Cell type:code id:bdd94c73-dd01-4b9e-aecc-70faff00685d tags:
``` python
transfer("x", "z", 10)
bank_accounts
```
%% Output
denied
is it locked inside the with? True
is it locked after the with? False
{'x': 5, 'y': 120, 'z': 200}
%% Cell type:code id:41f9d7d2-86f8-433c-bf56-724a7f445753 tags:
``` python
transfer("w", "z", 10) # there is no "w" bank account
bank_accounts
```
%% Output
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[8], line 1
----> 1 transfer("w", "z", 10) # there is no "w" bank account
2 bank_accounts
Cell In[5], line 7, in transfer(src, dst, amount)
5 with lock: # automatically acquire now, automatically release after the with
6 success = False
----> 7 if bank_accounts[src] >= amount:
8 bank_accounts[src] -= amount
9 bank_accounts[dst] += amount
KeyError: 'w'
%% Cell type:code id:f193975b-6f75-4c3c-b789-c494ad1130a6 tags:
``` python
transfer("z", "y", 50)
bank_accounts
```
%% Output
transferred
is it locked inside the with? True
is it locked after the with? False
{'x': 5, 'y': 170, 'z': 150}
%% Cell type:code id:cd9dd92a-3d2c-47c7-85f1-4bec952d1a67 tags:
``` python
```
anyio==4.8.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==3.0.0
async-lru==2.0.4
attrs==25.1.0
babel==2.17.0
beautifulsoup4==4.13.3
bleach==6.2.0
blinker==1.7.0
certifi==2025.1.31
cffi==1.17.1
charset-normalizer==3.4.1
comm==0.2.2
cryptography==41.0.7
dbus-python==1.3.2
debugpy==1.8.12
decorator==5.1.1
defusedxml==0.7.1
distro==1.9.0
distro-info==1.7+build1
executing==2.2.0
fastjsonschema==2.21.1
fqdn==1.5.1
h11==0.14.0
httpcore==1.0.7
httplib2==0.20.4
httpx==0.28.1
idna==3.10
ipykernel==6.29.5
ipython==8.32.0
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.5
json5==0.10.0
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter-events==0.12.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.15.0
jupyter_server_terminals==0.5.3
jupyterlab==4.3.5
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
launchpadlib==1.11.0
lazr.restfulclient==0.14.6
lazr.uri==1.0.6
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mistune==3.1.1
nbclient==0.10.2
nbconvert==7.16.6
nbformat==5.10.4
nest-asyncio==1.6.0
notebook_shim==0.2.4
oauthlib==3.2.2
overrides==7.7.0
packaging==24.2
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
platformdirs==4.3.6
prometheus_client==0.21.1
prompt_toolkit==3.0.50
psutil==6.1.1
ptyprocess==0.7.0
pure_eval==0.2.3
pycparser==2.22
Pygments==2.19.1
PyGObject==3.48.2
PyJWT==2.7.0
pyparsing==3.1.1
python-apt==2.7.7+ubuntu4
python-dateutil==2.9.0.post0
python-json-logger==3.2.1
PyYAML==6.0.2
pyzmq==26.2.1
referencing==0.36.2
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.22.3
Send2Trash==1.8.3
setuptools==68.1.2
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
stack-data==0.6.3
terminado==0.18.1
tinycss2==1.4.0
tornado==6.4.2
traitlets==5.14.3
types-python-dateutil==2.9.0.20241206
typing_extensions==4.12.2
unattended-upgrades==0.1
uri-template==1.3.0
urllib3==2.3.0
wadllib==1.3.6
wcwidth==0.2.13
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.8.0
wheel==0.42.0