Commit 4eb6f5c3 authored by JINLANG WANG's avatar JINLANG WANG

add lab4

parent ffce38e0
Pipeline #744259 passed
# BSTs (Binary Search Trees)
In this lab, you'll create a BST that can be used to look up values by
a key (it will behave a bit like a Python dict where all the dict
values are lists of values). You'll use the BST for P2.
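
To make the dict comparison concrete, here is a minimal sketch of the behaviour the finished tree should mimic, using `collections.defaultdict` as a stand-in. The variable name `model` is an assumption for illustration only; it is not part of the lab code.

```python
from collections import defaultdict

# Reference model only: a dict whose values are lists, one list per key.
model = defaultdict(list)
for key, val in [("B", 3), ("A", 2), ("C", 1), ("C", 4)]:
    model[key].append(val)  # adding to an existing key extends its list

# .get with a default avoids creating an entry for a missing key
print(model.get("A", []))  # [2]
print(model.get("C", []))  # [1, 4]
print(model.get("Z", []))  # []
```

The BST you build should give the same answers for the same sequence of adds, while keeping its keys in sorted order.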
## Basics: Node and BST classes
Start by pasting+completing the following:
```python
class Node():
    def __init__(self, key):
        self.key = ????
        self.values = []
        self.left = None
        ????
```
Let's create a `BST` class with an `add` method that automatically
places a node in a position that preserves the search property (i.e., all
keys in the left subtree are less than the parent's key, which is less
than all keys in the right subtree).
Add+complete with the following. Note that this is a non-recursive
version of `add`:
```python
class BST():
    def __init__(self):
        self.root = None

    def add(self, key, val):
        if self.root == None:
            self.root = ????

        curr = self.root
        while True:
            if key < curr.key:
                # go left
                if curr.left == None:
                    curr.left = Node(key)
                curr = curr.left
            elif key > curr.key:
                # go right
                ????
                ????
                ????
            else:
                # found it!
                assert curr.key == key
                break

        curr.values.append(val)
```
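
The search property can also be checked mechanically. The sketch below is an illustration, not part of the lab: it assumes a tree written as nested `(key, left, right)` tuples instead of `Node` objects, and the helper name `is_search_tree` is hypothetical.

```python
def is_search_tree(node, lo=None, hi=None):
    """Return True if every key lies strictly between lo and hi.

    node is either None or a (key, left, right) tuple; this tuple
    layout is an assumption made for the example.
    """
    if node is None:
        return True
    key, left, right = node
    if lo is not None and key <= lo:
        return False
    if hi is not None and key >= hi:
        return False
    # keys on the left must stay below key; keys on the right above it
    return is_search_tree(left, lo, key) and is_search_tree(right, key, hi)

good = ("B", ("A", None, None), ("C", None, None))
bad = ("B", ("C", None, None), ("A", None, None))
print(is_search_tree(good))  # True
print(is_search_tree(bad))   # False
```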
## Dump
Let's add some methods to `BST` to dump out all the keys and values (note
that "__" before a method name is a hint that it is for internal use
-- methods inside the class might call `__dump`, but code outside the
class probably shouldn't):
```python
    def __dump(self, node):
        if node == None:
            return
        self.__dump(node.right)            # 1
        print(node.key, ":", node.values)  # 2
        self.__dump(node.left)             # 3

    def dump(self):
        self.__dump(self.root)
```
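
Python backs up the "__" convention with name mangling: inside a class body, a name such as `__secret` is rewritten to `_ClassName__secret`, so a plain outside call fails. A small sketch, where the class `Box` and its methods are made up for the illustration:

```python
class Box:
    def __secret(self):
        # stored on the class as _Box__secret because of name mangling
        return "internal"

    def reveal(self):
        # inside the class, self.__secret resolves to the mangled name
        return self.__secret()

b = Box()
inside = b.reveal()

try:
    b.__secret()  # no mangling outside the class, so the lookup fails
    blocked = False
except AttributeError:
    blocked = True

print(inside)   # internal
print(blocked)  # True
```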
Try it:
```python
tree = BST()
tree.add("A", 9)
tree.add("A", 5)
tree.add("B", 22)
tree.add("C", 33)
tree.dump()
```
You should see this:
```
C : [33]
B : [22]
A : [9, 5]
```
Play around with the order of lines 1, 2, and 3 in `__dump()` above. Can you
arrange those three so that the output is in ascending alphabetical
order, by key?
## Length
Add a special method `__len__` to `Node` so that we can find the size
of a tree. Count every entry in the `.values` list of each `Node`.
```python
    def __len__(self):
        size = len(self.values)
        if self.left != None:
            size += ????
        ????
        ????
        return size
```
```python
t = BST()
t.add("B", 3)
assert len(t.root) == 1
t.add("A", 2)
assert len(t.root) == 2
t.add("C", 1)
assert len(t.root) == 3
t.add("C", 4)
assert len(t.root) == 4
```
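
As a reminder of how the special method is wired up, `len(obj)` calls `obj.__len__()`. The sketch below uses a hypothetical `Bag` class that is unrelated to the lab's `Node`:

```python
class Bag:
    def __init__(self):
        self.items = []

    def __len__(self):
        # len(bag) is routed here by Python
        return len(self.items)

bag = Bag()
bag.items.append("x")
bag.items.append("y")
print(len(bag))  # 2
```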
Discuss with your neighbour: why not have a `Node.__dump(self)` method
instead of the `BST.__dump(self, node)` method?
<details>
<summary>Answer</summary>
Right now, it is convenient to check at the beginning if `node` is
None. A receiver (the `self` parameter) can't be None if the
`object.method(...)` syntax is used (you would get the
"AttributeError: 'NoneType' object has no attribute 'method'" error).
We could have a `Node.__dump(self)` method, but then we would need to do the None checks on both `.left` and `.right`, which is slightly longer.
</details>
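
The error mentioned in the answer is easy to reproduce. In this sketch the class `Demo` and the variable `maybe` are placeholders chosen for the example:

```python
class Demo:
    def describe(self):
        return "demo object"

maybe = None  # stands in for a missing child such as node.left

try:
    maybe.describe()  # None has no describe method
    message = ""
except AttributeError as e:
    message = str(e)

print(message)
```

This is why a method that receives the node as an ordinary parameter can check for `None` first, while a method called on the node itself cannot.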
## Lookups
Write a `lookup` method in `Node` that returns all the values that match a given key. Some examples:
* `t.root.lookup("A")` should return `[2]`
* `t.root.lookup("C")` should return `[1, 4]`
* `t.root.lookup("Z")` should return `[]`
Some pseudocode for you to translate to Python:
```
lookup method (takes key)
    if key matches my key, return my values
    if key is less than my key and I have a left child
        call lookup on my left child and return what it returns
    if key is greater than my key and I have a right child
        call lookup on my right child and return what it returns
    otherwise return an empty list
```
## `search.py` module
If you've been developing your `BST` and `Node` classes in a notebook,
you should now move them to a module called `search.py` in your `p2`
directory.
%% Cell type:markdown id: tags:
# This just generates random data -- look at main.ipynb to debug
%% Cell type:code id: tags:
``` python
import names
import numpy as np
import pandas as pd
```
%% Cell type:code id: tags:
``` python
df = pd.DataFrame({"name": [names.get_first_name() for i in range(10)]})
for i in range(5):
    df[f"P{i+1}"] = np.random.random(size=len(df)) * 0.15 + 0.85
df["Final"] = np.random.random(size=len(df)) * 0.3 + 0.7
df["Participation"] = np.random.random(size=len(df)) * 0.1 + 0.9
df
```
%% Output
name P1 P2 P3 P4 P5 Final \
0 Elsie 0.954955 0.913779 0.921565 0.936532 0.901380 0.928387
1 Brian 0.947351 0.952920 0.925073 0.875950 0.857365 0.938826
2 Loretta 0.958606 0.891525 0.950882 0.946470 0.989340 0.933632
3 Esther 0.985102 0.872918 0.977284 0.988530 0.930378 0.724164
4 Dawn 0.966695 0.927002 0.991770 0.959826 0.895863 0.859567
5 Crystal 0.859427 0.952088 0.965462 0.899423 0.995269 0.989677
6 Clarence 0.851693 0.926668 0.906261 0.880833 0.932816 0.834454
7 Virginia 0.928037 0.934979 0.874236 0.955648 0.997138 0.715540
8 Ernest 0.893501 0.959971 0.938698 0.887911 0.881159 0.978723
9 Jane 0.922780 0.964439 0.926576 0.937013 0.853827 0.700346
Participation
0 0.914671
1 0.958818
2 0.969530
3 0.995753
4 0.908425
5 0.987587
6 0.943060
7 0.959519
8 0.928907
9 0.998765
%% Cell type:code id: tags:
``` python
df.to_csv("scores.csv", index=False)
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
import pandas as pd
```
%% Cell type:code id: tags:
``` python
# Point vs Grade Distribution:
# Projects: 10% each
# Final: 30%
# Participation: 20%
df = pd.read_csv("scores.csv")
df
```
%% Output
name P1 P2 P3 P4 P5 Final \
0 Elsie 0.954955 0.913779 0.921565 0.936532 0.901380 0.928387
1 Brian 0.947351 0.952920 0.925073 0.875950 0.857365 0.938826
2 Loretta 0.958606 0.891525 0.950882 0.946470 0.989340 0.933632
3 Esther 0.985102 0.872918 0.977284 0.988530 0.930378 0.724164
4 Dawn 0.966695 0.927002 0.991770 0.959826 0.895863 0.859567
5 Crystal 0.859427 0.952088 0.965462 0.899423 0.995269 0.989677
6 Clarence 0.851693 0.926668 0.906261 0.880833 0.932816 0.834454
7 Virginia 0.928037 0.934979 0.874236 0.955648 0.997138 0.715540
8 Ernest 0.893501 0.959971 0.938698 0.887911 0.881159 0.978723
9 Jane 0.922780 0.964439 0.926576 0.937013 0.853827 0.700346
Participation
0 0.914671
1 0.958818
2 0.969530
3 0.995753
4 0.908425
5 0.987587
6 0.943060
7 0.959519
8 0.928907
9 0.998765
%% Cell type:code id: tags:
``` python
class Student:
    def __init__(self, name):
        self.name = name
        self.grade = 0

    def compute_grade(self, category, points):
        grade = 0
        if category == "Participation":
            grade += points*20
        if "P" in category:
            grade += points*10
        if category == "Final":
            grade += points*30
        self.grade += self.grade

    def get_grade(self):
        return self.grade
```
%% Cell type:code id: tags:
``` python
cs320 = {}
for i in range(len(df)):
    student = Student(df["name"][i])
    for col in df.columns:
        student.compute_grade(col, df.at[i, col])
    cs320[student.name] = student.get_grade()
```
%% Cell type:code id: tags:
``` python
# max score should be 100; Crystal should have highest score, about 96.16
cs320
```
%% Output
{'Elsie': 0,
'Brian': 0,
'Loretta': 0,
'Esther': 0,
'Dawn': 0,
'Crystal': 0,
'Clarence': 0,
'Virginia': 0,
'Ernest': 0,
'Jane': 0}
%% Cell type:markdown id: tags:
# Hints
Debugging is about asking good questions related to the issue, then finding answers to those questions (often with print statements). Some good questions to ask here:
* what line is supposed to update `self.grade` (which appears to remain zero, incorrectly)? Does this line run? You could add a `print("DEBUG")` before to find out.
* what is getting added to `self.grade` each time `compute_grade` is called? A print can help with this question too.
* is any category getting counted more than once? You could print the category inside `compute_grade` and print "ADD" inside each `if` statement to look for double counting.
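
It also helps to know the number you are aiming for. The sketch below applies the stated weights (projects 10% each, final 30%, participation 20%) directly to Crystal's row as displayed in the table, without using the `Student` class. The names `crystal` and `weights` are assumptions for this check only.

```python
# Crystal's scores, copied from the displayed table (rounded to 6 places)
crystal = {
    "P1": 0.859427, "P2": 0.952088, "P3": 0.965462,
    "P4": 0.899423, "P5": 0.995269,
    "Final": 0.989677, "Participation": 0.987587,
}

# Weights from the notebook comment: 5 projects x 10 + 30 + 20 = 100
weights = {"P1": 10, "P2": 10, "P3": 10, "P4": 10, "P5": 10,
           "Final": 30, "Participation": 20}

expected = sum(crystal[cat] * weights[cat] for cat in weights)
print(round(expected, 2))  # 96.16
```

If the fixed notebook reports a noticeably different value for Crystal, some category is probably still being missed or counted twice.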
%% Cell type:code id: tags:
``` python
```
name,P1,P2,P3,P4,P5,Final,Participation
Elsie,0.9549550658129085,0.913778536271882,0.9215645170508429,0.9365319073397634,0.9013803540759843,0.9283869569882017,0.914671094034935
Brian,0.9473514567792112,0.9529196735053641,0.9250734196408118,0.8759504151368779,0.8573646255287061,0.9388263403512875,0.958818414131339
Loretta,0.9586059945221289,0.8915254116636335,0.9508819366200232,0.9464696745374475,0.9893403649836676,0.933631602588951,0.9695295462457724
Esther,0.9851022156424294,0.8729183336955997,0.977283721063346,0.9885295844870531,0.9303782853727577,0.7241638599763238,0.995752825762694
Dawn,0.9666952120841795,0.9270022970580513,0.9917699643450418,0.9598264328786889,0.8958627959043414,0.8595672515103249,0.9084248987279755
Crystal,0.8594269068766791,0.9520883722829931,0.965461916205357,0.8994228809152749,0.9952691008971792,0.9896774016995538,0.9875868086271352
Clarence,0.8516934263744823,0.926667638882182,0.9062605836013653,0.8808329786074112,0.9328156387216657,0.8344540588006799,0.9430598375038679
Virginia,0.9280369533547376,0.9349792379562916,0.8742357364765729,0.9556482581209289,0.9971375119681434,0.7155395489950207,0.9595187884592306
Ernest,0.893501309690879,0.9599711435403204,0.9386984759172361,0.8879113516975059,0.8811594481923767,0.9787231284971425,0.9289073086234929
Jane,0.9227803346581137,0.9644389789471546,0.9265759728084851,0.9370125768424054,0.8538274635091643,0.7003457399712552,0.9987653286592659
# Lab 4: BST
1. We have the detailed mortgage dataset we're using for P2 thanks to the 1975 Home Mortgage Disclosure Act. Discuss with your group: *If you could pass a law requiring the collection and release of a new dataset, what data would you choose?* Feel free to answer based on what you think would be fun or interesting, or you can think about how your dataset might bring more transparency to a societal issue (like how the HMDA data makes it easier to monitor for discriminatory lending practices).
2. Inside this folder, there is a notebook `debug/self/main.ipynb`. Open it and fix the bugs.
3. Create a [binary search tree](./bst-groups) for use in P2.
\ No newline at end of file