Commit 0e84a6d8 authored by gsingh58

Lec17 update

parent 27622921
# this returns the text of the book, in file pg420.txt.
# just use it!
# (you don't need to understand how it works at this point in the semester)
def read_book():
    with open('pg420.txt') as f:
        return f.read()
%% Cell type:markdown id: tags:
 
# Dictionaries
 
* Download ALL files for today's lecture
* Read
* Downey Ch 11 ("A Dictionary is a Mapping" through "Looping and Dictionaries")
* [Python for Everybody, 10.1 - 10.7](https://runestone.academy/ns/books/published/py4e-int/dictionaries/toctree.html)
* [Exam 2 Conflict Form](https://docs.google.com/forms/d/e/1FAIpQLSegJSzTsDHEnygijU3-HQvZDUbTCkHFPKccDkqMt1dGzC67_w/viewform)
* [Regrade Request](https://piazza.com/class/ld8bqui1lgeas/post/105)
* Exam 1: Post questions on Piazza or visit office hour
%% Cell type:markdown id: tags:
# Data Structure Methods
* [String Methods](https://www.w3schools.com/python/python_strings_methods.asp)
* [List Methods](https://www.w3schools.com/python/python_ref_list.asp)
* [Set Methods](https://www.w3schools.com/python/python_ref_set.asp)
* [Dict Methods](https://www.w3schools.com/python/python_ref_dictionary.asp)
%% Cell type:code id: tags:
 
``` python
import csv
```
 
%% Cell type:markdown id: tags:
 
### Warmup 1: Read the file 'cs220_survey_data.csv' into a list of lists `csv_data`
 
%% Cell type:code id: tags:
 
``` python
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    example_file = open(filename, encoding="utf-8")
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()

    return example_data
```
 
%% Cell type:code id: tags:
 
``` python
csv_data = process_csv("cs220_survey_data.csv") # TODO: change this
 
# TODO: compute the length of this list of lists
len(csv_data)
```
 
%% Output
 
993
 
%% Cell type:markdown id: tags:
 
### Warmup 2: store the first row in a variable called `cs220_header`
 
%% Cell type:code id: tags:
 
``` python
cs220_header = csv_data[0] # TODO: change this
cs220_header
```
 
%% Output
 
['Lecture',
'Age',
'Major',
'Zip Code',
'Latitude',
'Longitude',
'Pizza topping',
'Pet preference',
'Runner',
'Sleep habit',
'Procrastinator']
 
%% Cell type:markdown id: tags:
 
### Warmup 3: store the rest of the data in a variable called `cs220_data`
 
%% Cell type:code id: tags:
 
``` python
cs220_data = csv_data[1:] # TODO: change this
cs220_data[0]
```
 
%% Output
 
['LEC001',
'22',
'Engineering: Biomedical',
'53703',
'43.073051',
'-89.40123',
'none (just cheese)',
'neither',
'No',
'no preference',
'Maybe']
 
%% Cell type:markdown id: tags:
 
### Warmup 4: show the last 3 rows of data
 
%% Cell type:code id: tags:
 
``` python
cs220_data[-3:]
```
 
%% Output
 
[['LEC001',
'18',
'Undecided',
'53706',
'44.8341',
'87.377',
'basil/spinach',
'dog',
'No',
'no preference',
'Yes'],
['LEC003',
'19',
'Engineering: Mechanical',
'53705',
'46.589146',
'-112.039108',
'none (just cheese)',
'cat',
'No',
'night owl',
'Yes'],
['LEC001',
'20',
'Economics',
'53703',
'39.631506',
'118.143239',
'pineapple',
'dog',
'No',
'night owl',
'Maybe']]
 
%% Cell type:markdown id: tags:
 
### Warmup 5: what is the output of `cs220_data[-1:]`
 
- Be careful with slicing.
- Slicing a string gives a new string
- Slicing a list gives a new list
- Slicing a list of lists will always give a new list of lists (even if your slice only contains one of the inner lists)
 
%% Cell type:code id: tags:
 
``` python
cs220_data[-1:]
```
 
%% Output
 
[['LEC001',
'20',
'Economics',
'53703',
'39.631506',
'118.143239',
'pineapple',
'dog',
'No',
'night owl',
'Maybe']]
 
%% Cell type:markdown id: tags:
 
### Warmup 6: Write a function that counts the frequency of a value in a column
 
%% Cell type:code id: tags:
 
``` python
def column_frequency(value, col_name):
    ''' Returns the frequency of value in col_name. '''
    count = 0
    for row in cs220_data:
        if row[cs220_header.index(col_name)].lower() == value.lower():
            count += 1
    return count
```
 
%% Cell type:code id: tags:
 
``` python
# Test your function
column_frequency("pineapple", "Pizza topping")
```
 
%% Output
 
112
 
%% Cell type:code id: tags:
 
``` python
# Try other test cases
column_frequency("macaroni/pasta", "Pizza topping")
```
 
%% Output
 
34
 
%% Cell type:markdown id: tags:
 
#### TODO: Discuss: Is there an easy way to count *every* topping's frequency?
 
%% Cell type:markdown id: tags:
 
## Learning Objectives:
 
- Use correct dictionary syntax
- to create a dictionary using either {} or dict()
- to lookup, insert, update, and pop key-value pairs
- Use a for loop, the in operator, and common methods when working with dictionaries.
- Write code that uses a dictionary
- to store frequencies
- to iterate through all key-value pairs
 
%% Cell type:markdown id: tags:
 
## Data Structure
A data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data (source: Wikipedia).
 
<div>
<img src="attachment:Lists.png" width="500"/>
</div>
 
%% Cell type:markdown id: tags:
 
Python contains built-in Data Structures called Collections
 
<div>
<img src="attachment:Collections.png" width="500"/>
</div>
 
Today we'll learn how to store data and perform various operations in Dictionaries.
 
%% Cell type:markdown id: tags:
 
## Mappings
 
Common data structure approach:
- store many values
- give each value a label
- use labels to lookup values
 
`list` is an example of a mapping-based data structure
 
%% Cell type:code id: tags:
 
``` python
# index 0 1 2 3
nums_list = [300, 200, 400, 100]
nums_list[2] # lookup using index label
```
 
%% Output
 
400
 
%% Cell type:markdown id: tags:
 
Labels in a list are inflexible. They can only be consecutive `int`s starting at label 0.
 
%% Cell type:markdown id: tags:
 
### Dictionary
 
A dictionary (`dict`) is like a `list`, but more general. In a list, the indices have to be integers; but in a dictionary they can be any **immutable** type. Just like lists, values can be anything.
 
You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values. Each key maps to a value. The association of a key and a value is called a key-value pair or sometimes an item.
 
(from Think Python, Chapter 11)
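
To make the "immutable key" rule concrete, here is a minimal sketch. The `grades` dictionary and its contents are made up for illustration and are not part of the lecture files. Strictly speaking, Python requires keys to be *hashable*; the built-in immutable types (`str`, `int`, `float`, and tuples of those) qualify, while a `list` does not.

```python
# Hypothetical example data, not from the lecture files.
grades = {}
grades["alice"] = 90                      # a str key works
grades[220] = "course number"             # an int key works
grades[("lat", "lon")] = (43.07, -89.40)  # a tuple of strings works too

try:
    grades[["a", "list"]] = 1             # a list is mutable, so it cannot be a key
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(len(grades), list_key_allowed)
```

Values have no such restriction: the tuple key above maps to another tuple, and a value could just as well be a list or another dictionary.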
 
%% Cell type:code id: tags:
 
``` python
# empty dictionary
some_dict = {} # we use curly braces to create a dictionary
# We'll discuss dict versus set shortly
 
# empty dictionary
some_other_dict = dict()
```
 
%% Cell type:markdown id: tags:
 
Just like a `list`, `dict` key-value pairs are separated by a `,`.
 
The `key` and the `value` are separated by a `:`. That is `key:value`.
 
%% Cell type:code id: tags:
 
``` python
# TODO: let's define nums_dict
 
nums_list = [300, 200, 400, 100]
nums_dict = {
    "first": 300,
    "second": 200,
    "third": 400,
    "fourth": 100,
}
 
nums_dict
```
 
%% Output
 
{'first': 300, 'second': 200, 'third': 400, 'fourth': 100}
 
%% Cell type:code id: tags:
 
``` python
# a dictionary that stores prices of bakery items
# Notice that a dict can span over more than one line, indentation doesn't matter
 
price_dict = { 'broccoli': 3.95,
'spinach': 1.50,
'donut': 1.25, 'muffin': 2.25, "ice cream": 3.99,
'brownie': 3.15,
'cookie': 0.79, 'milk': 1.65, 'loaf': 5.99,
'cauliflower': 3.99} # feel free to add some of your own here
price_dict
```
 
%% Output
 
{'broccoli': 3.95,
'spinach': 1.5,
'donut': 1.25,
'muffin': 2.25,
'ice cream': 3.99,
'brownie': 3.15,
'cookie': 0.79,
'milk': 1.65,
'loaf': 5.99,
'cauliflower': 3.99}
 
%% Cell type:markdown id: tags:
 
#### Dictionaries maintain insertion order in Python 3.7 and later.
 
- Go back to the previous cell and add 'ice cream': 3.99 key-value pair before 'brownie': 3.15. Re-run the cell to see how the `dict` definition changes with respect to insertion order.
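
A small sketch of what insertion order means in practice. The `menu` dictionary here is a made-up example (assuming Python 3.7 or later): a new key is placed at the end, while updating an existing key leaves its position unchanged.

```python
# Hypothetical example data, assuming Python 3.7+.
menu = {"donut": 1.25, "muffin": 2.25}

menu["ice cream"] = 3.99   # a new key is added at the end
menu["donut"] = 1.50       # updating a value does not move the key

order = list(menu)         # list() of a dict gives its keys, in insertion order
print(order)
```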
 
%% Cell type:markdown id: tags:
 
### Dictionary lookups
 
- same syntax as `list` indexing
- `some_dict[key]`
 
%% Cell type:markdown id: tags:
 
### Lookup price of 'brownie', 'cookie', and 'ice cream'.
 
%% Cell type:code id: tags:
 
``` python
print(price_dict["brownie"])
print(price_dict["cookie"])
print(price_dict["ice cream"])
```
 
%% Output
 
3.15
0.79
3.99
 
%% Cell type:markdown id: tags:
 
### Lookup price of 'sugar'.
 
%% Cell type:code id: tags:
 
``` python
# print(price_dict['sugar']) # KeyError
# doesn't work
```
 
%% Cell type:markdown id: tags:
 
### Can you perform a lookup using values? No: the mapping is one-way, from key to value and not vice versa.
 
%% Cell type:code id: tags:
 
``` python
# print(price_dict[0.79]) # KeyError
# doesn't work
```
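
%% Cell type:markdown id: tags:

If you do need to go from a value back to its keys, one option is to search every key-value pair yourself. The sketch below uses a made-up `prices` dictionary and a hypothetical helper `keys_with_value`; it returns a list because several keys can map to the same value.

```python
# Hypothetical example data and helper, not part of the lecture files.
prices = {"cookie": 0.79, "milk": 1.65, "gum": 0.79}

def keys_with_value(d, target):
    """Return a list of every key in d whose value equals target."""
    matches = []
    for key in d:
        if d[key] == target:
            matches.append(key)
    return matches

print(keys_with_value(prices, 0.79))
print(keys_with_value(prices, 9.99))
```

Unlike `prices["cookie"]`, this search has to visit every pair, so it is slower on large dictionaries.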
 
%% Cell type:markdown id: tags:
 
<div>
<img src="attachment:Parenthetical_characters_1.png" width="700"/>
</div>
 
%% Cell type:markdown id: tags:
 
<div>
<img src="attachment:Parenthetical_characters_2.png" width="500"/>
</div>
 
%% Cell type:markdown id: tags:
 
### Dictionaries are Mutable
 
- update existing key's value
- insert a new key-value pair
- pop method to delete a key-value pair
 
%% Cell type:code id: tags:
 
``` python
# TODO: change price of 'cauliflower' to 2.99
price_dict['cauliflower'] = 2.99
price_dict
```
 
%% Output
 
{'broccoli': 3.95,
'spinach': 1.5,
'donut': 1.25,
'muffin': 2.25,
'ice cream': 3.99,
'brownie': 3.15,
'cookie': 0.79,
'milk': 1.65,
'loaf': 5.99,
'cauliflower': 2.99}
 
%% Cell type:code id: tags:
 
``` python
# TODO: insert new key-value pair 'carrot' mapping to 1.99
price_dict["carrot"] = 1.99
price_dict
```
 
%% Output
 
{'broccoli': 3.95,
'spinach': 1.5,
'donut': 1.25,
'muffin': 2.25,
'ice cream': 3.99,
'brownie': 3.15,
'cookie': 0.79,
'milk': 1.65,
'loaf': 5.99,
'cauliflower': 2.99,
'carrot': 1.99}
 
%% Cell type:code id: tags:
 
``` python
nums_list = [10, 20, 30]
# nums_list[3] = 40 # Recall that this doesn't work on a list due to IndexError
 
# TODO: comment out line 2 and use proper syntax to add item 40 to nums_list
nums_list.append(40)
nums_list
```
 
%% Output
 
[10, 20, 30, 40]
 
%% Cell type:code id: tags:
 
``` python
# use pop to delete the 'spinach' key-value pair
price_dict.pop("spinach")
 
# Alternate
del(price_dict['donut'])
 
# try deleting something that is not there
price_dict.pop('pizza') # KeyError
```
 
%% Output
 
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [21], in <cell line: 8>()
5 del(price_dict['donut'])
7 # try deleting something that is not there
----> 8 price_dict.pop('pizza')
KeyError: 'pizza'
 
%% Cell type:markdown id: tags:
 
### `in` operator enables us to check whether a key exists in the dictionary
 
%% Cell type:code id: tags:
 
``` python
# TODO: fix the above example with a conditional
if "pizza" in price_dict:
    price_dict.pop('pizza')
else:
    print("Oops couldn't find it!")
```
 
%% Output
 
Oops couldn't find it!
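
%% Cell type:markdown id: tags:

Besides checking with `in`, two built-in `dict` methods accept a fallback value for missing keys: `pop(key, default)` and `get(key, default)`. The sketch below uses a made-up `prices` dictionary to show both; neither call raises a `KeyError`.

```python
# Hypothetical example data, not from the lecture files.
prices = {"milk": 1.65, "loaf": 5.99}

removed = prices.pop("pizza", None)    # key is missing, so the default (None) is returned
milk_price = prices.pop("milk", None)  # key exists: the pair is removed and 1.65 is returned

tofu_price = prices.get("tofu", 0.0)   # get never modifies the dictionary

print(removed, milk_price, tofu_price, prices)
```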
 
%% Cell type:markdown id: tags:
 
### `len` built-in function returns the number of key-value pairs in a dictionary
 
%% Cell type:code id: tags:
 
``` python
# TODO: print length of price_dict
len(price_dict)
```
 
%% Output
 
9
 
%% Cell type:markdown id: tags:
 
### `for` loop enables us to iterate over keys in a dictionary
 
%% Cell type:code id: tags:
 
``` python
# TODO: iterate over price_dict and print each key-value pair in its own line
 
for key in price_dict:
    print(key, price_dict[key])
```
 
%% Output
 
broccoli 3.95
muffin 2.25
ice cream 3.99
brownie 3.15
cookie 0.79
milk 1.65
loaf 5.99
cauliflower 2.99
carrot 1.99
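
%% Cell type:markdown id: tags:

The loop above looks up each value with `price_dict[key]`. As an alternative sketch, the built-in `items` method gives both the key and the value on every iteration. The `prices` dictionary below is a made-up example.

```python
# Hypothetical example data, not from the lecture files.
prices = {"milk": 1.65, "loaf": 5.99}

lines = []
for item, price in prices.items():   # each iteration gives one (key, value) pair
    lines.append("{} costs ${:.2f}".format(item, price))

for line in lines:
    print(line)
```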
 
%% Cell type:markdown id: tags:
 
### `keys` method
 
- retrieves keys of a dictionary
- can be converted into a list
 
%% Cell type:code id: tags:
 
``` python
# get all keys (a dict_keys view; wrap it in list() to convert to a list)
print(price_dict.keys())
```
 
%% Output
 
dict_keys(['broccoli', 'muffin', 'ice cream', 'brownie', 'cookie', 'milk', 'loaf', 'cauliflower', 'carrot'])
 
%% Cell type:markdown id: tags:
 
### `values` method
 
- retrieves values of a dictionary
- can be converted into a list
 
%% Cell type:code id: tags:
 
``` python
# get all values (a dict_values view; wrap it in list() to convert to a list)
print(price_dict.values())
```
 
%% Output
 
dict_values([3.95, 2.25, 3.99, 3.15, 0.79, 1.65, 5.99, 2.99, 1.99])
 
%% Cell type:code id: tags:
 
``` python
# use 'in' price_dict, price_dict.keys(), price_dict.values()
 
print('donut' in price_dict) # default is to check the keys
print(9.95 in price_dict) # default is NOT values
print('apple' in price_dict.keys()) # can call out the keys
print(3.95 in price_dict.values()) # can check the values
```
 
%% Output
 
False
False
False
True
 
%% Cell type:markdown id: tags:
 
### Example 1: find total cost of shopping order
 
%% Cell type:code id: tags:
 
``` python
order = ['pie', 'donut', 'milk', 'cookie', 'tofu'] # add more items to the order
print(order)
 
total_cost = 0
for item in order:
    # TODO: check if item is a key in price_dict
    # if yes, retrieve the value and add it to total_cost
    # if not, display "Couldn't find <item> in price list!"
    if item in price_dict:
        total_cost += price_dict[item]
    else:
        print("Couldn't find {} in price list!".format(item))

# find the total of the items in the order
print("Your total cost is ${:.2f}".format(total_cost))
```
 
%% Output
 
['pie', 'donut', 'milk', 'cookie', 'tofu']
Couldn't find pie in price list!
Couldn't find donut in price list!
Couldn't find tofu in price list!
Your total cost is $2.44
 
%% Cell type:markdown id: tags:
 
### Example 2a: find the letter that occurred the most in a sentence
 
%% Cell type:code id: tags:
 
``` python
# start with an empty dictionary
letter_freq = {} # KEY: unique letter; VALUE: count of unique letter
 
sentence = "Meet me at the bike racks after school at 3:30 today."
 
for letter in sentence:
    # TODO: check if letter is a key in letter_freq
    # if yes, increment letter frequency by 1
    # if no, insert a new key-value pair
    if letter in letter_freq:
        letter_freq[letter] += 1
    else:
        letter_freq[letter] = 1
 
print(letter_freq)
```
 
%% Output
 
{'M': 1, 'e': 6, 't': 6, ' ': 10, 'm': 1, 'a': 5, 'h': 2, 'b': 1, 'i': 1, 'k': 2, 'r': 2, 'c': 2, 's': 2, 'f': 1, 'o': 3, 'l': 1, '3': 2, ':': 1, '0': 1, 'd': 1, 'y': 1, '.': 1}
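
%% Cell type:markdown id: tags:

The if/else counting pattern above can be written more compactly. The sketch below shows two alternatives that should produce the same counts: `dict.get` with a default of 0, and `collections.Counter` from the standard library. Neither is required for this lecture; they are shown only for comparison.

```python
from collections import Counter

sentence = "Meet me at the bike racks after school at 3:30 today."

# Alternative 1: get returns 0 for a letter we have not seen yet
letter_freq = {}
for letter in sentence:
    letter_freq[letter] = letter_freq.get(letter, 0) + 1

# Alternative 2: Counter builds the frequency mapping in one call
counter_freq = Counter(sentence)

print(letter_freq == dict(counter_freq))
```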
 
%% Cell type:markdown id: tags:
 
### Example 2b: find the letter that occurred the most
 
%% Cell type:code id: tags:
 
``` python
most_used_key = None
max_value = None
 
for letter in letter_freq:
    # TODO: you already know how to use a for loop to compute max
    if max_value is None or letter_freq[letter] > max_value:
        max_value = letter_freq[letter]
        most_used_key = letter
 
print("The character \"{}\" appeared {} times.".format(str(most_used_key), max_value))
```
 
%% Output
 
The character " " appeared 10 times.
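
%% Cell type:markdown id: tags:

For comparison, the same search can be sketched with the built-in `max` function, passing the dictionary's `get` method as the `key` argument so that keys are compared by their values. The small `letter_freq` below is a made-up subset of the counts above. If several keys tie for the maximum, `max` returns the first one it encounters.

```python
# Hypothetical subset of the letter counts, for illustration only.
letter_freq = {"a": 5, "e": 6, " ": 10, "t": 6}

most_used_key = max(letter_freq, key=letter_freq.get)  # compare keys by their counts
max_value = letter_freq[most_used_key]

print("The character \"{}\" appeared {} times.".format(most_used_key, max_value))
```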
 
%% Cell type:code id: tags:
 
``` python
# TODO: discuss: why not use range-based for loop?
 
for i in range(len(letter_freq)):
    print(i) # can you do anything with i in this letter_freq dictionary?
```
 
%% Output
 
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
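
%% Cell type:markdown id: tags:

One way to see why the range-based loop is a poor fit: the numbers 0 through 21 are not keys of `letter_freq`, so `letter_freq[i]` raises a `KeyError`. The sketch below uses a made-up three-key dictionary to demonstrate this, and shows that an index only becomes useful after converting the keys to a list, which is more work than looping over the dictionary directly.

```python
# Hypothetical subset of the letter counts, for illustration only.
letter_freq = {"M": 1, "e": 6, "t": 6}

try:
    letter_freq[0]              # 0 is not a key in this dictionary
    int_lookup_worked = True
except KeyError:
    int_lookup_worked = False

keys = list(letter_freq)        # indices only make sense on a list of the keys
pairs = []
for i in range(len(keys)):
    pairs.append((keys[i], letter_freq[keys[i]]))

print(int_lookup_worked, pairs)
```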
 
%% Cell type:markdown id: tags:
 
### Example 3a: Survey dataset: count every primary major's frequency
 
%% Cell type:code id: tags:
 
``` python
major_freq = {} # KEY: unique major; VALUE: count of unique major
 
# TODO: iterate over each student's data from cs220_data
# TODO: extract "Major" column's value
# TODO: check if current student's major already a key in major_freq
# - if yes, increase the corresponding value by 1
# - if no, insert a new key-value pair
 
for row in cs220_data:
    major = row[cs220_header.index("Major")]
    if major in major_freq:
        major_freq[major] += 1
    else:
        major_freq[major] = 1
 
major_freq
```
 
%% Output
 
{'Engineering: Biomedical': 45,
'Undecided': 23,
'Engineering: Industrial': 58,
'Engineering: Other|Engineering: Computer': 1,
'Data Science': 164,
'Mathematics/AMEP': 34,
'Engineering: Other': 13,
'Economics': 53,
'Psychology': 7,
'Science: Biology/Life': 37,
'Engineering: Mechanical': 198,
'Economics (Mathematical Emphasis)': 7,
'Computer Science': 115,
'Science: Other|Political Science': 1,
'Business: Other': 11,
'Business: Other|Real Estate': 2,
'Engineering: Other|Engineering Physics: Scientific Computing': 1,
'Business: Finance': 30,
'Business: Information Systems': 24,
'Statistics': 26,
'Business: Actuarial': 22,
'Science: Physics': 8,
'Science: Other': 9,
'Business: Other|Accounting': 2,
'Business: Other|business analytics': 1,
'Science: Other|animal sciences': 1,
'Mathematics': 2,
'Health Promotion and Health Equity': 2,
'Art': 1,
'Mathematics, Data Science': 1,
'Science: Other|Science: Genetics and Genomics': 1,
'Statistics (actuarial route)': 1,
'Business: Other|Business: Accounting': 1,
'Engineering: Other|Computer Engineering': 1,
'Engineering: Other|Computer engineering': 1,
'Engineering: Other|Material Science Engineering': 1,
'Civil engineering - hydropower engineering': 1,
'Science: Chemistry': 6,
'Communication arts': 1,
'Business andministration': 1,
'Education': 2,
'Pre-business': 1,
'Science: Other|Environmental Science': 4,
'History': 2,
'Information science': 2,
'consumer behavior and marketplace studies': 1,
'Conservation Biology': 1,
'Engineering: Other|Chemical Engineering': 1,
'Science: Other|Biophysics PhD': 1,
'Business: Other|Technology Strategy/ Product Management': 1,
'Political Science': 6,
'Graphic Design': 1,
'Business: Other|Marketing': 3,
'Cartography and GIS': 1,
'Sociology': 2,
'Business: Other|Consumer Behavior and Marketplace Studies': 1,
'Atmospheric Sciences': 1,
'Languages': 4,
'Engineering Mechanics (Aerospace Engineering)': 1,
'Science: Other|Psychology': 2,
'Engineering: Other|Civil and Environmental Engineering': 1,
'International Studies': 2,
'Agricultural and Applied Economics': 1,
'Business: Other|MHR': 1,
'Medicine': 1,
'Science: Other|Personal Finance': 1,
'Environmental science': 1,
'Geoscience': 1,
'Business: Other|accounting': 1,
'Design Studies': 1,
'Science: Other|Environmetal Science': 1,
'Science: Other|Atmospheric and Oceanic Sciences (AOS)': 1,
'Business: Other|Business Analytics': 1,
'Journalism': 2,
'Science: Other|Politcal Science': 1,
'Communication Sciences and Disorder': 1,
'Science: Other|Geoscience': 1,
'Science: Other|Atmospheric and oceanic science': 1,
'Engineering: Other|Engineering Mechanics': 1,
'Pre-Business': 1,
'Industrial Engineering': 1,
'Mechanical Engineering': 1,
'Science: Other|Environmental science': 1,
'Life Sciences Communication': 1,
'Science: Other|Atmospheric and Oceanic Sciences': 1,
'Rehabilitation Psychology': 1,
'Accounting': 1,
'Engineering: Other|Civil- Intelligent Transportation System': 1,
'Science: Other|Animal and Dairy Science': 1,
'Interior Architecture': 1,
'Science: Other|Atmospheric & Oceanic Sciences': 1,
'Computer Science and Statistics': 1,
'Business analytics': 1,
'Legal Studies': 1,
'Journalism: Strategic Comm./Advertising': 1,
'Master of Public Affairs': 1,
'Environment & Resources': 1,
'Environmental Studies': 1}
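
%% Cell type:markdown id: tags:

Notice in the output above that keys differing only in capitalization, such as 'Pre-business' and 'Pre-Business', are counted separately, because dictionary keys are compared exactly. One possible fix, sketched below on a made-up list `raw_majors` rather than the real survey data, is to normalize each string with `strip` and `lower` before using it as a key.

```python
# Hypothetical sample values; the real data comes from cs220_data.
raw_majors = ["Pre-business", "Pre-Business", "Environmental science", "Environmental Studies"]

normalized_freq = {}   # KEY: lowercased major; VALUE: count
for major in raw_majors:
    key = major.strip().lower()   # ignore case and surrounding whitespace
    if key in normalized_freq:
        normalized_freq[key] += 1
    else:
        normalized_freq[key] = 1

print(normalized_freq)
```

Normalizing only merges keys that are spelled the same way; misspellings and differently worded majors would still be counted separately.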
 
%% Cell type:markdown id: tags:
 
### Example 3b: find primary major with highest frequency
 
%% Cell type:code id: tags:
 
``` python
# Example 3b: use the algorithm from 2b to find the major with the highest frequency
 
most_used_key = None
max_value = None
 
for major in major_freq:
    # TODO: you already know how to use a for loop to compute max
    if max_value is None or major_freq[major] > max_value:
        max_value = major_freq[major]
        most_used_key = major
 
print("The major \"{}\" appeared {} times.".format(str(most_used_key), max_value))
```
 
%% Output
 
The major "Engineering: Mechanical" appeared 198 times.
 
%% Cell type:markdown id: tags:
 
### After Lecture Practice
 
%% Cell type:markdown id: tags:
 
Organize your data structure notes ...
 
<div>
<img src="attachment:DataStructure_notes.png" width="700"/>
</div>
 
%% Cell type:markdown id: tags:
 
#### Review slide deck
 
%% Cell type:markdown id: tags:
 
#### Review this summary of common dictionary methods:
https://www.w3schools.com/python/python_ref_dictionary.asp
%% Cell type:markdown id: tags:
 
# Dictionaries
 
* Download ALL files for today's lecture
* Read
* Downey Ch 11 ("A Dictionary is a Mapping" through "Looping and Dictionaries")
* [Python for Everybody, 10.1 - 10.7](https://runestone.academy/ns/books/published/py4e-int/dictionaries/toctree.html)
* [Exam 2 Conflict Form](https://docs.google.com/forms/d/e/1FAIpQLSegJSzTsDHEnygijU3-HQvZDUbTCkHFPKccDkqMt1dGzC67_w/viewform)
* [Regrade Request](https://piazza.com/class/ld8bqui1lgeas/post/105)
* Exam 1: Post questions on Piazza or visit office hour
%% Cell type:markdown id: tags:
# Data Structure Methods
* [String Methods](https://www.w3schools.com/python/python_strings_methods.asp)
* [List Methods](https://www.w3schools.com/python/python_ref_list.asp)
* [Set Methods](https://www.w3schools.com/python/python_ref_set.asp)
* [Dict Methods](https://www.w3schools.com/python/python_ref_dictionary.asp)
%% Cell type:code id: tags:
 
``` python
import csv
```
 
%% Cell type:markdown id: tags:
 
### Warmup 1: Read the file 'cs220_survey_data.csv' into a list of lists `csv_data`
 
%% Cell type:code id: tags:
 
``` python
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    example_file = open(filename, encoding="utf-8")
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()

    return example_data
```
 
%% Cell type:code id: tags:
 
``` python
csv_data = None # TODO: change this
 
# TODO: compute the length of this list of lists
```
 
%% Cell type:markdown id: tags:
 
### Warmup 2: store the first row in a variable called `cs220_header`
 
%% Cell type:code id: tags:
 
``` python
cs220_header = None # TODO: change this
cs220_header
```
 
%% Cell type:markdown id: tags:
 
### Warmup 3: store the rest of the data in a variable called `cs220_data`
 
%% Cell type:code id: tags:
 
``` python
cs220_data = None # TODO: change this
cs220_data[0]
```
 
%% Cell type:markdown id: tags:
 
### Warmup 4: show the last 3 rows of data
 
%% Cell type:code id: tags:
 
``` python
```
 
%% Cell type:markdown id: tags:
 
### Warmup 5: what is the output of `cs220_data[-1:]`
 
- Be careful with slicing.
- Slicing a string gives a new string
- Slicing a list gives a new list
- Slicing a list of lists will always give a new list of lists (even if your slice only contains one of the inner lists)
 
%% Cell type:code id: tags:
 
``` python
```
 
%% Cell type:markdown id: tags:
 
### Warmup 6: Write a function that counts the frequency of a value in a column
 
%% Cell type:code id: tags:
 
``` python
def column_frequency(value, col_name):
    ''' Returns the frequency of value in col_name. '''
    count = 0
    for row in cs220_data:
        if row[cs220_header.index(col_name)].lower() == value.lower():
            count += 1
    return count
```
 
%% Cell type:code id: tags:
 
``` python
# Test your function
column_frequency("pineapple", "Pizza topping")
```
 
%% Cell type:code id: tags:
 
``` python
# Try other test cases
```
 
%% Cell type:markdown id: tags:
 
#### TODO: Discuss: Is there an easy way to count *every* topping's frequency?
 
%% Cell type:markdown id: tags:
 
## Learning Objectives:
 
- Use correct dictionary syntax
- to create a dictionary using either {} or dict()
- to lookup, insert, update, and pop key-value pairs
- Use a for loop, the in operator, and common methods when working with dictionaries.
- Write code that uses a dictionary
- to store frequencies
- to iterate through all key-value pairs
 
%% Cell type:markdown id: tags:
 
## Data Structure
A data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data (source: Wikipedia).
 
<div>
<img src="attachment:Lists.png" width="500"/>
</div>
 
%% Cell type:markdown id: tags:
 
Python contains built-in Data Structures called Collections
 
<div>
<img src="attachment:Collections.png" width="500"/>
</div>
 
Today we'll learn how to store data and perform various operations in Dictionaries.
 
%% Cell type:markdown id: tags:
 
## Mappings
 
Common data structure approach:
- store many values
- give each value a label
- use labels to lookup values
 
`list` is an example of a mapping-based data structure
 
%% Cell type:code id: tags:
 
``` python
# index 0 1 2 3
nums_list = [300, 200, 400, 100]
nums_list[2] # lookup using index label
```
 
%% Cell type:markdown id: tags:
 
Labels in a list are inflexible. They can only be consecutive `int`s starting at label 0.
 
%% Cell type:markdown id: tags:
 
### Dictionary
 
A dictionary (`dict`) is like a `list`, but more general. In a list, the indices have to be integers; but in a dictionary they can be any **immutable** type. Just like lists, values can be anything.
 
You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values. Each key maps to a value. The association of a key and a value is called a key-value pair or sometimes an item.
 
(from Think Python, Chapter 11)
 
%% Cell type:code id: tags:
 
``` python
# empty dictionary
some_dict = {} # we use curly braces to create a dictionary
# We'll discuss dict versus set shortly
 
# empty dictionary
some_other_dict = dict()
```
 
%% Cell type:markdown id: tags:
 
Just like a `list`, `dict` key-value pairs are separated by a `,`.
 
The `key` and the `value` are separated by a `:`. That is `key:value`.
 
%% Cell type:code id: tags:
 
``` python
# TODO: let's define nums_dict
 
nums_list = [300, 200, 400, 100]
 
 
nums_dict
```
 
%% Cell type:code id: tags:
 
``` python
# a dictionary that stores prices of bakery items
# Notice that a dict can span over more than one line, indentation doesn't matter
 
price_dict = { 'broccoli': 3.95,
'spinach': 1.50,
'donut': 1.25, 'muffin': 2.25,
'brownie': 3.15,
'cookie': 0.79, 'milk': 1.65, 'loaf': 5.99,
'cauliflower': 3.99} # feel free to add some of your own here
price_dict
```
 
%% Cell type:markdown id: tags:
 
#### Dictionaries maintain insertion order in Python 3.7 and later.
 
- Go back to the previous cell and add 'ice cream': 3.99 key-value pair before 'brownie': 3.15. Re-run the cell to see how the `dict` definition changes with respect to insertion order.
 
%% Cell type:markdown id: tags:
 
### Dictionary lookups
 
- same syntax as `list` indexing
- `some_dict[key]`
 
%% Cell type:markdown id: tags:
 
### Lookup price of 'brownie', 'cookie', and 'ice cream'.
 
%% Cell type:code id: tags:
 
``` python
print()
print()
print()
```
 
%% Cell type:markdown id: tags:
 
### Lookup price of 'sugar'.
 
%% Cell type:code id: tags:
 
``` python
print()
```
 
%% Cell type:markdown id: tags:
 
### Can you perform a lookup using values? No: the mapping is one-way, from key to value and not vice versa.
 
%% Cell type:code id: tags:
 
``` python
print(price_dict[0.79]) # KeyError
```
 
%% Cell type:markdown id: tags:
 
<div>
<img src="attachment:Parenthetical_characters_1.png" width="700"/>
</div>
 
%% Cell type:markdown id: tags:
 
<div>
<img src="attachment:Parenthetical_characters_2.png" width="500"/>
</div>
 
%% Cell type:markdown id: tags:
 
### Dictionaries are Mutable
 
- update existing key's value
- insert a new key-value pair
- pop method to delete a key-value pair
 
%% Cell type:code id: tags:
 
``` python
# TODO: change price of 'cauliflower' to 2.99
 
price_dict
```
 
%% Cell type:code id: tags:
 
``` python
# TODO: insert new key-value pair 'carrot' mapping to 1.99
 
price_dict
```
 
%% Cell type:code id: tags:
 
``` python
nums_list = [10, 20, 30]
nums_list[3] = 40 # Recall that this doesn't work on a list due to IndexError
 
# TODO: comment out line 2 and use proper syntax to add item 40 to nums_list
 
 
nums_list
```
 
%% Cell type:code id: tags:
 
``` python
# use pop to delete the 'spinach' key-value pair
 
 
# Alternate
del(price_dict['donut'])
 
# try deleting something that is not there
price_dict.pop('pizza') # KeyError
```
 
%% Cell type:markdown id: tags:
 
### `in` operator enables us to check whether a key exists in the dictionary
 
%% Cell type:code id: tags:
 
``` python
# TODO: fix the above example with a conditional
if ???:
    price_dict.pop('pizza')
else:
    print("Oops couldn't find it!")
```
 
%% Cell type:markdown id: tags:
 
### `len` built-in function returns the number of key-value pairs in a dictionary
 
%% Cell type:code id: tags:
 
``` python
# TODO: print length of price_dict
```
 
%% Cell type:markdown id: tags:
 
### `for` loop enables us to iterate over keys in a dictionary
 
%% Cell type:code id: tags:
 
``` python
# TODO: iterate over price_dict and print each key-value pair in its own line
```
 
%% Cell type:markdown id: tags:
 
### `keys` method
 
- retrieves keys of a dictionary
- can be converted into a list
 
%% Cell type:code id: tags:
 
``` python
# get all keys (a dict_keys view; wrap it in list() to convert to a list)
print(price_dict.keys())
```
 
%% Cell type:markdown id: tags:
 
### `values` method
 
- retrieves values of a dictionary
- can be converted into a list
 
%% Cell type:code id: tags:
 
``` python
# get all values (a dict_values view; wrap it in list() to convert to a list)
print(price_dict.values())
```
 
%% Cell type:code id: tags:
 
``` python
# use 'in' price_dict, price_dict.keys(), price_dict.values()
 
print('donut' in price_dict) # default is to check the keys
print(9.95 in price_dict) # default is NOT values
print('apple' in price_dict.keys()) # can call out the keys
print(3.95 in price_dict.values()) # can check the values
```
 
%% Cell type:markdown id: tags:
 
### Example 1: find total cost of shopping order
 
%% Cell type:code id: tags:
 
``` python
order = ['pie', 'donut', 'milk', 'cookie', 'tofu'] # add more items to the order
print(order)
 
total_cost = 0
for item in order:
    # TODO: check if item is a key in price_dict
    # if yes, retrieve the value and add it to total_cost
    # if not, display "Couldn't find <item> in price list!"
    pass

# find the total of the items in the order
print("Your total cost is ${:.2f}".format(total_cost))
```
 
%% Cell type:markdown id: tags:
 
### Example 2a: find the letter that occurred the most in a sentence
 
%% Cell type:code id: tags:
 
``` python
# start with an empty dictionary
letter_freq = {} # KEY: unique letter; VALUE: count of unique letter
 
sentence = "Meet me at the bike racks after school at 3:30 today."
 
for letter in sentence:
    # TODO: check if letter is a key in letter_freq
    # if yes, increment letter frequency by 1
    # if no, insert a new key-value pair
    pass
 
print(letter_freq)
```
 
%% Cell type:markdown id: tags:
 
### Example 2b: find the letter that occurred the most
 
%% Cell type:code id: tags:
 
``` python
most_used_key = None
max_value = None
 
for letter in letter_freq:
    # TODO: you already know how to use a for loop to compute max
    pass
 
print("The character \"{}\" appeared {} times.".format(str(most_used_key), max_value))
```
 
%% Cell type:code id: tags:
 
``` python
# TODO: discuss: why not use range-based for loop?
 
for i in range(len(letter_freq)):
    print(i) # can you do anything with i in this letter_freq dictionary?
```
 
%% Cell type:markdown id: tags:
 
### Example 3a: Survey dataset: count every primary major's frequency
 
%% Cell type:code id: tags:
 
``` python
major_freq = {} # KEY: ???; VALUE: ???
 
# TODO: iterate over each student's data from cs220_data
# TODO: extract "Major" column's value
# TODO: check if current student's major already a key in major_freq
# - if yes, increase the corresponding value by 1
# - if no, insert a new key-value pair
 
major_freq
```
 
%% Cell type:markdown id: tags:
 
### Example 3b: find primary major with highest frequency
 
%% Cell type:code id: tags:
 
``` python
# Example 3b: use the algorithm from 2b to find the major with the highest frequency
 
 
print("The major \"{}\" appeared {} times.".format(str(most_used_key), max_value))
```
 
%% Cell type:markdown id: tags:
 
### After Lecture Practice
 
%% Cell type:markdown id: tags:
 
Organize your data structure notes ...
 
<div>
<img src="attachment:DataStructure_notes.png" width="700"/>
</div>
 
%% Cell type:markdown id: tags:
 
#### Review slide deck
 
%% Cell type:markdown id: tags:
 
#### Review this summary of common dictionary methods:
https://www.w3schools.com/python/python_ref_dictionary.asp