Commit 27622921 authored by gsingh58's avatar gsingh58

Lec16 update

parent 174c932e
%% Cell type:markdown id:5fef60fd tags:
# Announcements - List Practice
* Download ALL files from Canvas for today's lecture
* Worksheet is online only - great practice for the exam!
* Exam I
    * We estimate results will be available this afternoon
        * This will tie last semester's turnaround time
* [Exam 2 Conflict Form](https://docs.google.com/forms/d/e/1FAIpQLSegJSzTsDHEnygijU3-HQvZDUbTCkHFPKccDkqMt1dGzC67_w/viewform)
* [Regrade Request](https://piazza.com/class/ld8bqui1lgeas/post/105)
* [P6 FAQ](https://piazza.com/class/ld8bqui1lgeas/post/294)
* Quiz 4 due today at 11:59:00 pm
    * We release the answers after the quiz closes, so we do not reopen quizzes for students even if something goes wrong.
    * We do drop 2 quiz scores (in case something goes wrong)
%% Cell type:markdown id:72348536 tags:
# List Practice
%% Cell type:code id:2bf3c996 tags:
``` python
import csv
```
%% Cell type:markdown id:b34b84ae tags:
### Warmup 1: min / max
%% Cell type:code id:b89c41e1 tags:
``` python
some_list = [45, -4, 66, 220, 10]

# track the smallest value seen so far; None means "nothing seen yet"
min_val = None
for val in some_list:
    if min_val is None or val < min_val:
        min_val = val
print(min_val)

# same pattern for the largest value
max_val = None
for val in some_list:
    if max_val is None or val > max_val:
        max_val = val
print(max_val)
```
%% Output
-4
220
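As a cross-check, the loop pattern above can be compared against Python's built-in `min` and `max`. The sketch below wraps the same logic in two helper functions; `find_min` and `find_max` are hypothetical names that are not part of the lecture, and returning `None` for an empty list is an assumption about the desired behavior.

``` python
def find_min(values):
    """Return the smallest item in values, or None if values is empty."""
    min_val = None
    for val in values:
        if min_val is None or val < min_val:
            min_val = val
    return min_val


def find_max(values):
    """Return the largest item in values, or None if values is empty."""
    max_val = None
    for val in values:
        if max_val is None or val > max_val:
            max_val = val
    return max_val


some_list = [45, -4, 66, 220, 10]
# the hand-written loops should agree with the built-ins
print(find_min(some_list), min(some_list))
print(find_max(some_list), max(some_list))
# unlike min([]), which raises ValueError, the helper returns None
print(find_min([]))
```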
%% Cell type:markdown id:59a689b1 tags:
### Warmup 2: median
%% Cell type:code id:2fd5e101 tags:
``` python
def median(some_items):
    """
    Returns median of a list passed as argument
    """
    # sort() reorders the caller's list in place
    some_items.sort()
    n = len(some_items)
    if n % 2 == 1:
        # odd length: the single middle item
        return some_items[n // 2]
    else:
        # even length: average of the two middle items
        first_middle = some_items[n // 2 - 1]
        second_middle = some_items[n // 2]
        median = (first_middle + second_middle) / 2
        return median
nums = [5, 4, 3, 2, 1]
print("Median of", nums, "is" , median(nums))
nums = [6, 5, 4, 3, 2, 1]
print("Median of", nums, "is" , median(nums))
```
%% Output
Median of [1, 2, 3, 4, 5] is 3
Median of [1, 2, 3, 4, 5, 6] is 3.5
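Note that `median` sorts its argument in place, which is why the lists printed above appear in sorted order. Here is a minimal sketch of a variant that leaves the caller's list untouched, assuming `sorted` is acceptable at this point in the course. `median_copy` is a hypothetical name, and `statistics.median` from the standard library is used only as a cross-check.

``` python
import statistics


def median_copy(some_items):
    """Return the median without modifying some_items."""
    ordered = sorted(some_items)  # new list; some_items keeps its order
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


nums = [6, 5, 4, 3, 2, 1]
print(median_copy(nums), statistics.median(nums))
print(nums)  # still in the original order
```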
%% Cell type:code id:cf14bf7f tags:
``` python
vals = ["A", "C", "B"]
print("Median of", vals, "is" , median(vals))
vals = ["A", "C", "B", "D"]
# print("Median of", vals, "is" , median(vals)) # raises TypeError: the two middle strings cannot be averaged
```
%% Output
Median of ['A', 'B', 'C'] is B
%% Cell type:markdown id:bdbf6f75 tags:
### set data structure
- **not a sequence**
- no ordering of values, so values cannot be accessed by position
- stores only unique values; duplicates are discarded
    - very helpful to find unique values stored in a `list`
    - easy to convert a `list` to `set` and vice-versa
- ordering is not guaranteed once we use `set`
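The points above can be sketched as follows. Because a `set` does not promise any order, the last step shows one way to drop duplicates while keeping first-seen order. It assumes Python 3.7 or later, where `dict` preserves insertion order; `dict.fromkeys` is not part of the lecture.

``` python
some_list = [10, 20, 30, 30, 40, 50, 10]

# converting to a set discards the duplicates
unique_values = set(some_list)
print(len(some_list), len(unique_values))

# membership tests work on a set even though indexing does not
print(30 in unique_values)

# alternative when the original order matters: dict keys are unique
# and keep insertion order
ordered_unique = list(dict.fromkeys(some_list))
print(ordered_unique)
```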
%% Cell type:code id:52e80a6b tags:
``` python
some_set = {10, 20, 30, 30, 40, 50, 10} # use a pair of curly braces to define it
some_set
```
%% Output
{10, 20, 30, 40, 50}
%% Cell type:code id:2587184f tags:
``` python
some_list = [10, 20, 30, 30, 40, 50, 10] # Initialize a list containing duplicate numbers
# TODO: to find unique values, convert it into a set
print(set(some_list))
# TODO: convert the set back into a list
print(list(set(some_list)))
```
%% Output
{40, 10, 50, 20, 30}
[40, 10, 50, 20, 30]
%% Cell type:markdown id:8a143e1c tags:
Can you index / slice into a `set`?
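A small sketch that answers the question without stopping the notebook, assuming a `try`/`except` block is acceptable here: indexing a `set` raises `TypeError`. Converting to a sorted list first is one possible workaround when a position is needed.

``` python
some_set = {10, 20, 30, 40, 50}

try:
    some_set[1]
    indexing_worked = True
except TypeError as error:
    # sets are not subscriptable, so this branch runs
    indexing_worked = False
    print("TypeError:", error)

# a list built from the set can be indexed and sliced
as_list = sorted(some_set)
print(as_list[1], as_list[1:])
```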
%% Cell type:code id:ce43cb95 tags:
``` python
# some_set[1] # doesn't work - remember set has no order
```
%% Cell type:code id:cd6473f8 tags:
``` python
# some_set[1:] # doesn't work - remember set has no order
```
%% Cell type:code id:9d936c1c tags:
``` python
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    # open the file; it's a text file encoded as utf-8
    example_file = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    example_reader = csv.reader(example_file)
    # use the built-in list function to convert this into a list of lists
    example_data = list(example_reader)
    # close the file to tidy up our workspace
    example_file.close()
    # return the list of lists
    return example_data
```
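A minimal sketch of the same function written with a `with` block, so the file is closed even if reading fails. `process_csv_with` is a hypothetical name; the `newline=""` argument follows the recommendation in the `csv` module documentation, and the temporary file with made-up contents exists only to make the example self-contained.

``` python
import csv
import os
import tempfile


def process_csv_with(filename):
    """Read a CSV file and return its rows as a list of lists."""
    with open(filename, encoding="utf-8", newline="") as csv_file:
        return list(csv.reader(csv_file))


# write a tiny made-up CSV file to try the function on
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf-8", newline="") as tmp:
    tmp.write("Lecture,Age\nLEC001,19\nLEC002,\n")
    tmp_path = tmp.name

rows = process_csv_with(tmp_path)
os.remove(tmp_path)  # clean up the temporary file
print(rows)
```

Every value comes back as a string, including the empty string for the missing age, which is why a later conversion step is still needed.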
%% Cell type:markdown id:89621c98 tags:
### Student Information Survey data
%% Cell type:code id:d3c252b4 tags:
``` python
# TODO: call the process_csv function and store the list of lists in cs220_csv
cs220_csv = process_csv("cs220_survey_data.csv")
```
%% Cell type:code id:5838ae5f tags:
``` python
# Store the header row into cs220_header, using indexing
cs220_header = cs220_csv[0]
cs220_header
```
%% Output
['section',
'Lecture',
'Age',
'Primary major',
'Other Primary Major',
'Other majors',
'Zip Code',
'Latitude',
'Longitude',
'Pet owner',
'Pizza topping',
'Pet owner',
'Runner',
'Sleep habit',
'Procrastinator',
'Song']
%% Cell type:code id:66fda88d tags:
``` python
# TODO: Store all of the data rows into cs220_data, using slicing
cs220_data = cs220_csv[1:]
# TODO: use slicing to display top 3 rows data
cs220_data[:3]
```
%% Output
[['COMP SCI 220:LAB345, COMP SCI 220:LEC004',
'LEC004',
'',
'Other (please provide details below).',
'',
'',
'53,706',
'22.5726',
'88.3639',
'No',
'pepperoni',
'dog',
'No',
'night owl',
'Yes',
'Island in the Sun - Harry Belafonte'],
['COMP SCI 220:LEC003, COMP SCI 220:LAB332',
'LEC001',
'19',
'Engineering: Mechanical',
'',
'',
'53,703',
'44.5876',
'-71.9466',
'No',
'pepperoni',
'dog',
'No',
'night owl',
'Yes',
'No role modelz by J. Cole'],
['COMP SCI 220:LAB325, COMP SCI 220:LEC002',
'LEC002',
'18',
'Engineering: Mechanical',
'.',
'.',
'53,706',
'40.7128',
'-74.006',
'Maybe',
'none (just cheese)',
'dog',
'No',
'night owl',
'Yes',
'\xa0biggest bird']]
%% Cell type:markdown id:4267fe3e tags:
### What `Pizza topping` does the 13th student prefer?
%% Cell type:code id:4b8dbe8b tags:
``` python
cs220_data[12][6] # bad example: hard-coded column index (6 is 'Zip Code', not 'Pizza topping')
```
%% Output
'53,703'
%% Cell type:markdown id:4f125240 tags:
What if we decided to add a new column before sleeping habit? Your code would no longer work.
Instead of hard-coding the column index, use the `index` method to look up the column index from the header variable. This also makes your code much more readable.
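To see why looking a column up by name is more robust, here is a small sketch with a made-up header and row (not the survey data): inserting a column shifts the hard-coded position, while `index` still finds the right column.

``` python
header = ["Lecture", "Age", "Pizza topping"]
row = ["LEC001", "19", "pepperoni"]

hard_coded = row[2]                           # relies on the column position
by_name = row[header.index("Pizza topping")]  # relies on the column name
print(hard_coded, by_name)

# simulate a new column being added to the file
header.insert(1, "Zip Code")
row.insert(1, "53,703")

print(row[2])                                 # position 2 is now the Age column
print(row[header.index("Pizza topping")])     # the lookup by name still works
```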
%% Cell type:code id:f2e52e06 tags:
``` python
cs220_data[12][cs220_header.index("Pizza topping")]
```
%% Output
'pepperoni'
%% Cell type:markdown id:5d298a4c tags:
### What is the Lecture of the 4th student?
%% Cell type:code id:3617b3de tags:
``` python
cs220_data[3][cs220_header.index("Lecture")]
```
%% Output
'LEC001'
%% Cell type:markdown id:059de363 tags:
### What **unique** `age` values are included in the dataset?
%% Cell type:code id:45909f22 tags:
``` python
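ages = []
for row in cs220_data:
    age = row[cs220_header.index("Age")]
    # skip missing ages
    if age == '':
        continue
    age = int(age)
    # skip implausible ages
    if age < 0 or age > 118:
        continue
    ages.append(age)
# converting to a set and back keeps only the unique values
ages = list(set(ages))
ages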
```
%% Output
[33, 69, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30]
%% Cell type:markdown id:8e18663d tags:
### cell function
- It would be very helpful to define a cell function, which can handle missing data and type conversions
%% Cell type:code id:bba90038 tags:
``` python
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name
    col_idx = cs220_header.index(col_name)
    # TODO: get the value of cs220_data at the specified cell
    val = cs220_data[row_idx][col_idx]
    # TODO: handle missing values, by returning None
    if val == '':
        return None
    # TODO: handle type conversions
    if col_name in ["Age",]:
        return int(val)
    elif col_name in ['Latitude', 'Longitude']:
        return float(val)
    return val
```
%% Cell type:markdown id:1624fafd tags:
### Find average age per lecture.
%% Cell type:code id:5755511a tags:
``` python
# TODO: initialize 4 lists for the 4 lectures
lec1_ages = []
lec2_ages = []
lec3_ages = []
lec4_ages = []
# Iterate over the data and populate the lists
for row_idx in range(len(cs220_data)):
    age = cell(row_idx, "Age")
    if age is not None and age > 0 and age < 125:
        lecture = cell(row_idx, "Lecture")
        if lecture == "LEC001":
            lec1_ages.append(age)
        elif lecture == "LEC002":
            lec2_ages.append(age)
        elif lecture == "LEC003":
            lec3_ages.append(age)
        elif lecture == "LEC004":
            lec4_ages.append(age)
# TODO: compute average age of each lecture
print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
```
%% Output
LEC001 average student age: 19.71
LEC002 average student age: 20.24
LEC003 average student age: 19.41
LEC004 average student age: 19.43
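The four separate lists work for a fixed set of lectures. Below is a sketch of a more general approach, assuming a dictionary keyed by lecture name is acceptable (dictionaries are not used elsewhere in this notebook). The `rows` are made-up sample data in `(lecture, age)` form, not the survey file.

``` python
rows = [("LEC001", 19), ("LEC002", 21), ("LEC001", 20),
        ("LEC003", None), ("LEC002", 18)]

# map each lecture name to the list of valid ages seen for it
ages_by_lecture = {}
for lecture, age in rows:
    # skip missing or implausible ages
    if age is None or not (0 < age < 125):
        continue
    if lecture not in ages_by_lecture:
        ages_by_lecture[lecture] = []
    ages_by_lecture[lecture].append(age)

# compute one average per lecture that has at least one valid age
averages = {}
for lecture in sorted(ages_by_lecture):
    ages = ages_by_lecture[lecture]
    averages[lecture] = round(sum(ages) / len(ages), 2)
    print(lecture, "average student age:", averages[lecture])
```

A lecture with no valid ages never gets an entry, so there is no division by zero to guard against.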
%% Cell type:markdown id:b7c8e726 tags:
### Function `avg_age_per_lecture(lecture)`
%% Cell type:code id:4894d0c7 tags:
``` python
def avg_age_per_lecture(lecture):
    '''
    avg_age_per_lecture(lecture) returns the average age of
    the students in the given `lecture`; if there are no
    students in the given `lecture`, it returns `None`
    '''
    # To compute average you don't need to actually populate a list.
    # But here a list will come in handy. It will help you with the None return requirement.
    ages = []
    for row_idx in range(len(cs220_data)):
        curr_lecture = cell(row_idx, "Lecture")
        if lecture == curr_lecture:
            age = cell(row_idx, "Age")
            if age is not None and age > 0 and age <= 125:
                ages.append(age)
    if len(ages) > 0:
        return sum(ages) / len(ages)
    else:
        return None
```
%% Cell type:code id:f0a05e42 tags:
``` python
avg_age_per_lecture("LEC003")
```
%% Output
19.405228758169933
%% Cell type:code id:ec9af3da tags:
``` python
print(avg_age_per_lecture("LEC007"))
```
%% Output
None
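As the comment in the function notes, a list is not strictly required to compute an average. Here is a minimal sketch that keeps a running total and count instead. `average_or_none` is a hypothetical helper that works on a plain list of ages (with `None` for missing values), not on the survey data.

``` python
def average_or_none(ages):
    """Return the average of the valid ages, or None if there are none."""
    total = 0
    count = 0
    for age in ages:
        # same validity check as avg_age_per_lecture
        if age is not None and 0 < age <= 125:
            total += age
            count += 1
    if count == 0:
        return None
    return total / count


print(average_or_none([19, None, 21, 500]))
print(average_or_none([]))
```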
%% Cell type:markdown id:0a4d313c tags:
### Find all unique zip codes.
%% Cell type:code id:bf8f0ecc tags:
``` python
# TODO: initialize list to keep track of zip codes
zip_codes = []
for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")
    if zip_code is not None:
        zip_codes.append(zip_code)
list(set(zip_codes))
```
%% Output
['53,713',
'53,589',
'53,558',
'53,719',
'53,563',
'53,711',
'53,572',
'53,051',
'98,607',
'53,121',
'-53,703',
'93,703',
'533,706',
'53,521',
'53,575',
'1,520',
'53.715',
'8,820',
'53,562',
'53,716',
'53,705',
'53,590',
'94,596',
'53,527',
'53,726',
'53,150',
'53,701',
'57,303',
'56,511',
'53,703',
'53,715',
'53,704',
'60,517',
'53.706',
'54,703',
'53,706',
'95,030',
'52,703',
'50,376',
'51,735']
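Several of the values above are not plausible 5-digit zip codes (for example `'53.715'`, `'-53,703'`, and `'533,706'`). Below is a hedged sketch of one possible cleanup, assuming the comma is only a thousands separator introduced when the column was stored as a number. `clean_zip` is a hypothetical helper, and the validity rule (exactly five digits) is an assumption, not something the lecture specifies.

``` python
def clean_zip(raw):
    """Return a 5-digit zip code string, or None if raw does not look valid."""
    candidate = raw.replace(",", "")
    if len(candidate) == 5 and candidate.isdigit():
        return candidate
    return None


sample = ['53,706', '53.715', '-53,703', '533,706', '1,520', '53,706']

cleaned = []
for raw in sample:
    zip_code = clean_zip(raw)
    if zip_code is not None:
        cleaned.append(zip_code)

# a set removes the duplicates; sorted gives a predictable order
unique_zips = sorted(set(cleaned))
print(unique_zips)
```

Under this rule a value such as `'1,520'` is rejected; whether it should instead be treated as a zip code with a dropped leading zero is a decision the sketch does not make.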
%% Cell type:markdown id:94548bf4 tags:
### `sort` method versus `sorted` function
- `sort` method modifies the original list in place and returns `None`
- `sorted` function returns a new list with the expected ordering and leaves the original unchanged
- default sorting order is ascending / alphanumeric
- `reverse` parameter is applicable for both `sort` method and `sorted` function:
    - enables you to specify descending order by passing argument as `True`
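The difference can be sketched with a short example that uses the same values as the cells that follow:

``` python
some_list = [10, 4, 25, 2, -10]

# sorted builds a new list; some_list keeps its original order
new_list = sorted(some_list)
print(some_list)
print(new_list)

# sort reorders some_list itself and returns None
rv = some_list.sort(reverse=True)
print(some_list)
print(rv)
```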
%% Cell type:code id:c1e555f9 tags:
``` python
some_list = [10, 4, 25, 2, -10]
```
%% Cell type:code id:152297bb tags:
``` python
# TODO: Invoke sort method
rv = some_list.sort()
print(some_list)
# What does the sort method return?
# TODO: Capture return value into a variable rv and print the return value.
print(rv)
```
%% Output
[-10, 2, 4, 10, 25]
None
%% Cell type:markdown id:3c0d5e7d tags:
`sort` method returns `None` because it sorts the values in the original list
%% Cell type:code id:c06d8976 tags:
``` python
# TODO: invoke sorted function and pass some_list as argument
# TODO: capture return value into sorted_some_list
sorted_some_list = sorted(some_list)
# What does the sorted function return?
# It returns a brand new list with the values in sorted order
print(sorted_some_list)
```
%% Output
[-10, 2, 4, 10, 25]
%% Cell type:markdown id:ded0304c tags:
TODO: go back to `sort` method call and `sorted` function call and pass keyword argument `reverse = True`.
%% Cell type:markdown id:3579e061 tags:
Can you call `sort` method on a set?
%% Cell type:code id:14d8a670 tags:
``` python
# some_set.sort()
# doesn't work: no method named sort associated with type set
# you cannot sort a set because of the lack of ordering
```
%% Cell type:markdown id:1fb64b44 tags:
Can you pass a `set` as argument to `sorted` function? Python is intelligent :)
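`sorted` accepts any iterable, not only lists, and always returns a new list. A brief sketch with a set, a tuple, and a string (the values are made up for illustration):

``` python
some_set = {30, 10, 50, 20, 40}

from_set = sorted(some_set)                   # list built from a set
from_tuple = sorted((3, 1, 2), reverse=True)  # list built from a tuple
from_string = sorted("bca")                   # list of single characters

print(from_set)
print(from_tuple)
print(from_string)
print(type(from_set))
```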
%% Cell type:code id:03b1183f tags:
``` python
# works because sorted accepts any iterable (including a set) and returns a new sorted list
sorted(some_set)
```
%% Output
[10, 20, 30, 40, 50]
%% Cell type:markdown id:efa2869e tags:
### Function: `find_majors(phrase)`
%% Cell type:code id:655f876d tags:
``` python
def find_majors(phrase):
    """
    find_majors(phrase) returns a list of all the majors that contain the
    substring (case insensitive match) `phrase`.
    """
    # TODO: initialize the target list here
    majors = []
    # TODO: iterate over row indices
    for row_idx in range(len(cs220_data)):
        major = cell(row_idx, "Primary major")
        # cell returns None for a missing value, so skip those rows
        if major is None:
            continue
        if phrase.lower() in major.lower():
            majors.append(major)
    return majors
```
%% Cell type:markdown id:ed19265f tags:
### Find all `major` values that contain **either** `"Computer"` **or** `"Science"`.
Your output **must** be a *list*. The order **does not** matter, but if a `major` contains **both** `"Computer"` and `"Science"`, then that major must be included **only once** in your list.
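One way to meet the "only once" requirement is a set union. Here is a sketch with made-up lists standing in for the two `find_majors` results; the `|` operator on sets is not introduced in the lecture.

``` python
# made-up stand-ins for find_majors("Computer") and find_majors("Science")
computer_majors = ["Computer Science", "Engineering: Computer", "Computer Science"]
science_majors = ["Computer Science", "Data Science", "Science: Physics"]

# the union keeps each major once, even if it appears in both lists
combined = list(set(computer_majors) | set(science_majors))

# sorted only to make the printed order predictable
print(sorted(combined))
```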
%% Cell type:code id:ab656189 tags:
``` python
computer_majors = find_majors("Computer")
science_majors = find_majors("Science")
computer_and_science_majors = computer_majors + science_majors
# TODO: Now find just the unique values
computer_and_science_majors = list(set(computer_and_science_majors))
computer_and_science_majors
```
%% Output
['Data Science',
'Science: Other',
'Science: Chemistry',
'Science: Biology/Life',
'Computer Science',
'Science: Physics']
%% Cell type:markdown id:64fd0945 tags:
### Order the `major` that contain **either** `"Computer"` **or** `"Science"` using ascending order.
%% Cell type:code id:d4e2e6fc tags:
``` python
# VERSION 1
# Be very careful: if you use sorted, make sure your return value
# variable matches with the variable for that project question
sorted_computer_and_science_majors = sorted(computer_and_science_majors)
sorted_computer_and_science_majors
```
%% Output
['Computer Science',
'Data Science',
'Science: Biology/Life',
'Science: Chemistry',
'Science: Other',
'Science: Physics']
%% Cell type:code id:c28e77ce tags:
``` python
# VERSION 2
computer_and_science_majors.sort()
computer_and_science_majors
```
%% Output
['Computer Science',
'Data Science',
'Science: Biology/Life',
'Science: Chemistry',
'Science: Other',
'Science: Physics']
%% Cell type:markdown id:e354b781 tags:
### Order the `major` that contain **either** `"Computer"` **or** `"Science"` using descending order.
%% Cell type:code id:ca887135 tags:
``` python
# VERSION 1
# Be very careful: if you use sorted, make sure your return value
# variable matches with the variable for that project question
reverse_sorted_computer_and_science_majors = sorted(computer_and_science_majors, reverse = True)
reverse_sorted_computer_and_science_majors
```
%% Output
['Science: Physics',
'Science: Other',
'Science: Chemistry',
'Science: Biology/Life',
'Data Science',
'Computer Science']
%% Cell type:code id:1606075f tags:
``` python
# VERSION 2
computer_and_science_majors.sort(reverse = True)
computer_and_science_majors
```
%% Output
['Science: Physics',
'Science: Other',
'Science: Chemistry',
'Science: Biology/Life',
'Data Science',
'Computer Science']
%% Cell type:markdown id:31a381fe tags:
## Self-practice
%% Cell type:markdown id:8ac26620 tags:
### How many students are both a procrastinator and a pet owner?
%% Cell type:markdown id:172141ea tags:
### What percentage of 18-year-olds have their major declared as "Other"?
%% Cell type:markdown id:d9a7a2b1 tags:
### How old is the oldest basil/spinach-loving Business major?
......
%% Cell type:markdown id:2cceab19 tags:
# Announcements - List Practice
* Download ALL files from Canvas for today's lecture
* Worksheet is online only - great practice for the exam!
* Exam I
    * We estimate results will be available this afternoon
        * This will tie last semester's turnaround time
* [Exam 2 Conflict Form](https://docs.google.com/forms/d/e/1FAIpQLSegJSzTsDHEnygijU3-HQvZDUbTCkHFPKccDkqMt1dGzC67_w/viewform)
* [Regrade Request](https://piazza.com/class/ld8bqui1lgeas/post/105)
* [P6 FAQ](https://piazza.com/class/ld8bqui1lgeas/post/294)
* Quiz 4 due today at 11:59:00 pm
    * We release the answers after the quiz closes, so we do not reopen quizzes for students even if something goes wrong.
    * We do drop 2 quiz scores (in case something goes wrong)
%% Cell type:markdown id:72348536 tags:
# List Practice
%% Cell type:code id:d21a94b5 tags:
``` python
import csv
```
%% Cell type:markdown id:cd8a434c tags:
### Warmup 1: min / max
%% Cell type:code id:baa730ba tags:
``` python
some_list = [45, -4, 66, 220, 10]

min_val = None
for val in some_list:
    if min_val is None or val < min_val:
        min_val = val
print(min_val)

max_val = None
for val in some_list:
    if max_val is None or val > max_val:
        max_val = val
print(max_val)
```
%% Cell type:markdown id:3502c700 tags:
### Warmup 2: median
%% Cell type:code id:414ae09e tags:
``` python
def median(some_items):
    """
    Returns median of a list passed as argument
    """
    pass
nums = [5, 4, 3, 2, 1]
print("Median of", nums, "is" , median(nums))
nums = [6, 5, 4, 3, 2, 1]
print("Median of", nums, "is" , median(nums))
```
%% Cell type:code id:73fa337e tags:
``` python
vals = ["A", "C", "B"]
print("Median of", vals, "is" , median(vals))
vals = ["A", "C", "B", "D"]
# print("Median of", vals, "is" , median(vals)) # raises TypeError: the two middle strings cannot be averaged
```
%% Cell type:markdown id:050fd57c tags:
### set data structure
- **not a sequence**
- no ordering of values, so values cannot be accessed by position
- stores only unique values; duplicates are discarded
    - very helpful to find unique values stored in a `list`
    - easy to convert a `list` to `set` and vice-versa
- ordering is not guaranteed once we use `set`
%% Cell type:code id:7d4a693f tags:
``` python
some_set = {10, 20, 30, 30, 40, 50, 10} # use a pair of curly braces to define it
some_set
```
%% Cell type:code id:baef596c tags:
``` python
some_list = [10, 20, 30, 30, 40, 50, 10] # Initialize a list containing duplicate numbers
# TODO: to find unique values, convert it into a set
print(set(some_list))
# TODO: convert the set back into a list
print(list(set(some_list)))
```
%% Cell type:markdown id:2be52d13 tags:
Can you index / slice into a `set`?
%% Cell type:code id:f622a5eb tags:
``` python
some_set[1] # doesn't work - remember set has no order
```
%% Cell type:code id:e679d3a7 tags:
``` python
some_set[1:] # doesn't work - remember set has no order
```
%% Cell type:code id:9d936c1c tags:
``` python
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    # open the file; it's a text file encoded as utf-8
    example_file = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    example_reader = csv.reader(example_file)
    # use the built-in list function to convert this into a list of lists
    example_data = list(example_reader)
    # close the file to tidy up our workspace
    example_file.close()
    # return the list of lists
    return example_data
```
%% Cell type:markdown id:89621c98 tags:
### Student Information Survey data
%% Cell type:code id:d3c252b4 tags:
``` python
# TODO: call the process_csv function and store the list of lists in cs220_csv
cs220_csv = process_csv(???)
```
%% Cell type:code id:5838ae5f tags:
``` python
# Store the header row into cs220_header, using indexing
cs220_header = ???
cs220_header
```
%% Cell type:code id:66fda88d tags:
``` python
# TODO: Store all of the data rows into cs220_data, using slicing
cs220_data = ???
# TODO: use slicing to display top 3 rows data
cs220_data???
```
%% Cell type:markdown id:4267fe3e tags:
### What `Pizza topping` does the 13th student prefer?
%% Cell type:code id:4b8dbe8b tags:
``` python
# bad example: we hard-coded the column index
```
%% Cell type:markdown id:4f125240 tags:
What if we decided to add a new column before sleeping habit? Your code would no longer work.
Instead of hard-coding the column index, use the `index` method to look up the column index from the header variable. This also makes your code much more readable.
%% Cell type:code id:f2e52e06 tags:
``` python
```
%% Cell type:markdown id:5d298a4c tags:
### What is the Lecture of the 4th student?
%% Cell type:code id:3617b3de tags:
``` python
```
%% Cell type:markdown id:059de363 tags:
### What **unique** `age` values are included in the dataset?
%% Cell type:code id:45909f22 tags:
``` python
```
%% Cell type:markdown id:8e18663d tags:
### cell function
- It would be very helpful to define a cell function, which can handle missing data and type conversions
%% Cell type:code id:bba90038 tags:
``` python
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name
    # TODO: get the value of cs220_data at the specified cell
    # TODO: handle missing values, by returning None
    # TODO: handle type conversions
    return val
```
%% Cell type:markdown id:e2278e4c tags:
### Find average age per lecture.
%% Cell type:code id:09e90e94 tags:
``` python
# TODO: initialize 4 lists for the 4 lectures
# Iterate over the data and populate the lists
# TODO: compute average age of each lecture
print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
```
%% Cell type:markdown id:b7c8e726 tags:
### Function `avg_age_per_lecture(lecture)`
%% Cell type:code id:fa5598e0 tags:
``` python
def avg_age_per_lecture(lecture):
    '''
    avg_age_per_lecture(lecture) returns the average age of
    the students in the given `lecture`; if there are no
    students in the given `lecture`, it returns `None`
    '''
    # To compute average you don't need to actually populate a list.
    # But here a list will come in handy. It will help you with the None return requirement.
    pass
```
%% Cell type:code id:f0a05e42 tags:
``` python
avg_age_per_lecture("LEC003")
```
%% Cell type:code id:9f2c7e6e tags:
``` python
print(avg_age_per_lecture("LEC007"))
```
%% Cell type:markdown id:c07f9cfc tags:
### Find all unique zip codes.
%% Cell type:code id:a6df8ef5 tags:
``` python
# TODO: initialize list to keep track of zip codes
zip_codes = []
for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")
    if zip_code is not None:
        zip_codes.append(zip_code)
zip_codes # How do we get the unique values?
```
%% Cell type:markdown id:94548bf4 tags:
### `sort` method versus `sorted` function
- `sort` method modifies the original list in place and returns `None`
- `sorted` function returns a new list with the expected ordering and leaves the original unchanged
- default sorting order is ascending / alphanumeric
- `reverse` parameter is applicable for both `sort` method and `sorted` function:
    - enables you to specify descending order by passing argument as `True`
%% Cell type:code id:c1e555f9 tags:
``` python
some_list = [10, 4, 25, 2, -10]
```
%% Cell type:code id:152297bb tags:
``` python
# TODO: Invoke sort method
rv = ???
print(some_list)
# What does the sort method return?
# TODO: Capture return value into a variable rv and print the return value.
print(rv)
```
%% Cell type:markdown id:3c0d5e7d tags:
`sort` method returns `None` because it sorts the values in the original list
%% Cell type:code id:c06d8976 tags:
``` python
# TODO: invoke sorted function and pass some_list as argument
# TODO: capture return value into sorted_some_list
???
# What does the sorted function return?
# It returns a brand new list with the values in sorted order
print(sorted_some_list)
```
%% Cell type:markdown id:ded0304c tags:
TODO: go back to `sort` method call and `sorted` function call and pass keyword argument `reverse = True`.
%% Cell type:markdown id:35894ef5 tags:
Can you call `sort` method on a set?
%% Cell type:code id:fc08879e tags:
``` python
some_set.sort()
# doesn't work: no method named sort associated with type set
# you cannot sort a set because of the lack of ordering
```
%% Cell type:markdown id:99161c42 tags:
Can you pass a `set` as argument to `sorted` function? Python is intelligent :)
%% Cell type:code id:2549df29 tags:
``` python
# works because sorted accepts any iterable (including a set) and returns a new sorted list
sorted(some_set)
```
%% Cell type:markdown id:5c7f3489 tags:
### Function: `find_majors(phrase)`
%% Cell type:code id:b6adbfe0 tags:
``` python
def find_majors(phrase):
    """
    find_majors(phrase) returns a list of all the majors that contain the
    substring (case insensitive match) `phrase`.
    """
    # TODO: initialize the target list here
    # TODO: iterate over row indices
    for row_idx in range(len(cs220_data)):
        major = cell(row_idx, "Primary major")
        # TODO: write the actual logic here
    return majors
```
%% Cell type:markdown id:1b7f671f tags:
### Find all `major` values that contain **either** `"Computer"` **or** `"Science"`.
Your output **must** be a *list*. The order **does not** matter, but if a `major` contains **both** `"Computer"` and `"Science"`, then that major must be included **only once** in your list.
%% Cell type:code id:ed895a3b tags:
``` python
computer_majors = ???
science_majors = ???
computer_and_science_majors = ???
# TODO: Now find just the unique values
computer_and_science_majors = ???
computer_and_science_majors
```
%% Cell type:markdown id:64fd0945 tags:
### Order the `major` that contain **either** `"Computer"` **or** `"Science"` using ascending order.
%% Cell type:code id:efcdf514 tags:
``` python
# VERSION 1
# Be very careful: if you use sorted, make sure your return value
# variable matches with the variable for that project question
sorted_computer_and_science_majors = sorted(computer_and_science_majors)
sorted_computer_and_science_majors
```
%% Cell type:code id:c28e77ce tags:
``` python
# VERSION 2
computer_and_science_majors.sort()
computer_and_science_majors
```
%% Cell type:markdown id:e354b781 tags:
### Order the `major` that contain **either** `"Computer"` **or** `"Science"` using descending order.
%% Cell type:code id:ca887135 tags:
``` python
# VERSION 1
# Be very careful: if you use sorted, make sure your return value
# variable matches with the variable for that project question
reverse_sorted_computer_and_science_majors = sorted(computer_and_science_majors, reverse = ???)
reverse_sorted_computer_and_science_majors
```
%% Cell type:code id:b6c61532 tags:
``` python
# VERSION 2
computer_and_science_majors.sort(reverse = ???)
computer_and_science_majors
```
%% Cell type:markdown id:31a381fe tags:
## Self-practice
%% Cell type:markdown id:8ac26620 tags:
### How many students are both a procrastinator and a pet owner?
%% Cell type:markdown id:172141ea tags:
### What percentage of 18-year-olds have their major declared as "Other"?
%% Cell type:markdown id:d9a7a2b1 tags:
### How old is the oldest basil/spinach-loving Business major?
......
%% Cell type:markdown id:6850fba1 tags:
# Announcements - List Practice
* Download ALL files from Canvas for today's lecture
* Worksheet is online only - great practice for the exam!
* Exam I
* We estimate results will be available this afternoon
* This will tie last semesters turn around time
* [Exam 2 Conflict Form](https://docs.google.com/forms/d/e/1FAIpQLSegJSzTsDHEnygijU3-HQvZDUbTCkHFPKccDkqMt1dGzC67_w/viewform)
* [Regrade Request](https://piazza.com/class/ld8bqui1lgeas/post/105)
* [P6 FAQ](https://piazza.com/class/ld8bqui1lgeas/post/294)
* Quiz 4 due Today at 11:59:00 pm
* We release the answers after the quiz closes so we do not reopen quizzes for students even if something goes wrong.
* We do drop 2 quiz scores (in case something goes wrong)
%% Cell type:markdown id:72348536 tags:
# List Practice
%% Cell type:code id:d21a94b5 tags:
``` python
import csv
```
%% Cell type:markdown id:cd8a434c tags:
### Warmup 1: min / max
%% Cell type:code id:baa730ba tags:
``` python
some_list = [45, -4, 66, 220, 10]
min_val = None
for val in some_list:
if min_val == None or val < min_val:
min_val = val
print(min_val)
max_val = None
for val in some_list:
if max_val == None or val > max_val:
max_val = val
print(max_val)
```
%% Cell type:markdown id:3502c700 tags:
### Warmup 2: median
%% Cell type:code id:414ae09e tags:
``` python
def median(some_items):
"""
Returns median of a list passed as argument
"""
pass
nums = [5, 4, 3, 2, 1]
print("Median of", nums, "is" , median(nums))
nums = [6, 5, 4, 3, 2, 1]
print("Median of", nums, "is" , median(nums))
```
%% Cell type:code id:73fa337e tags:
``` python
vals = ["A", "C", "B"]
print("Median of", nums, "is" , median(vals))
vals = ["A", "C", "B", "D"]
# print("Median of", nums, "is" , median(vals)) # does not work due to TypeError
```
%% Cell type:markdown id:050fd57c tags:
### set data structure
- **not a sequence**
- no ordering of values:
- this implies that you can only store unique values within a `set`
- very helpful to find unique values stored in a `list`
- easy to convert a `list` to `set` and vice-versa.
- ordering is not guaranteed once we use `set`
%% Cell type:code id:7d4a693f tags:
``` python
some_set = {10, 20, 30, 30, 40, 50, 10} # use a pair of curly braces to define it
some_set
```
%% Cell type:code id:baef596c tags:
``` python
some_list = [10, 20, 30, 30, 40, 50, 10] # Initialize a list containing duplicate numbers
# TODO: to find unique values, convert it into a set
print(set(some_list))
# TODO: convert the set back into a list
print(list(set(some_list)))
```
%% Cell type:markdown id:2be52d13 tags:
Can you index / slice into a `set`?
%% Cell type:code id:f622a5eb tags:
``` python
some_set[1] # doesn't work - remember set has no order
```
%% Cell type:code id:e679d3a7 tags:
``` python
some_set[1:] # doesn't work - remember set has no order
```
%% Cell type:code id:9d936c1c tags:
``` python
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
# open the file, its a text file utf-8
example_file = open(filename, encoding="utf-8")
# prepare it for reading as a CSV object
example_reader = csv.reader(example_file)
# use the built-in list function to convert this into a list of lists
example_data = list(example_reader)
# close the file to tidy up our workspace
example_file.close()
# return the list of lists
return example_data
```
%% Cell type:markdown id:89621c98 tags:
### Student Information Survey data
%% Cell type:code id:d3c252b4 tags:
``` python
# TODO: call the process_csv function and store the list of lists in cs220_csv
cs220_csv = process_csv(???)
```
%% Cell type:code id:5838ae5f tags:
``` python
# Store the header row into cs220_header, using indexing
cs220_header = ???
cs220_header
```
%% Cell type:code id:66fda88d tags:
``` python
# TODO: Store all of the data rows into cs220_data, using slicing
cs220_data = ???
# TODO: use slicing to display top 3 rows data
cs220_data???
```
%% Cell type:markdown id:4267fe3e tags:
### What `Pizza topping` does the 13th student prefer?
%% Cell type:code id:4b8dbe8b tags:
``` python
# bad example: we hard-coded the column index
```
%% Cell type:markdown id:4f125240 tags:
What if we decided to add a new column before sleeping habit? Your code will no longer work.
Instead of hard-coding column index, you should use `index` method, to lookup column index from the header variable. This will also make your code so much readable.
%% Cell type:code id:f2e52e06 tags:
``` python
```
%% Cell type:markdown id:5d298a4c tags:
### What is the Lecture of the 4th student?
%% Cell type:code id:3617b3de tags:
``` python
```
%% Cell type:markdown id:059de363 tags:
### What **unique** `age` values are included in the dataset?
%% Cell type:code id:45909f22 tags:
``` python
```
%% Cell type:markdown id:8e18663d tags:
### cell function
- It would be very helpful to define a cell function, which can handle missing data and type conversions
%% Cell type:code id:bba90038 tags:
``` python
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name
    # TODO: get the value of cs220_data at the specified cell
    # TODO: handle missing values, by returning None
    # TODO: handle type conversions
    return val
```
%% Cell type:markdown id:e2278e4c tags:
### Find average age per lecture.
%% Cell type:code id:09e90e94 tags:
``` python
# TODO: initialize 4 lists for the 4 lectures
# Iterate over the data and populate the lists
# TODO: compute average age of each lecture
print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
```
%% Cell type:markdown id:b7c8e726 tags:
### Function `avg_age_per_lecture(lecture)`
%% Cell type:code id:fa5598e0 tags:
``` python
def avg_age_per_lecture(lecture):
    '''
    avg_age_per_lecture(lecture) returns the average age of
    the students in the given `lecture`; if there are no
    students in the given `lecture`, it returns `None`
    '''
    # To compute average you don't need to actually populate a list.
    # But here a list will come in handy. It will help you with the None return requirement.
    pass
```
%% Cell type:code id:f0a05e42 tags:
``` python
avg_age_per_lecture("LEC003")
```
%% Cell type:code id:9f2c7e6e tags:
``` python
print(avg_age_per_lecture("LEC007"))
```
%% Cell type:markdown id:c07f9cfc tags:
### Find all unique zip codes.
%% Cell type:code id:a6df8ef5 tags:
``` python
# TODO: initialize list to keep track of zip codes
zip_codes = []
for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")
    if zip_code != None:
        zip_codes.append(zip_code)
zip_codes # How do we get the unique values?
```
%% Cell type:markdown id:94548bf4 tags:
### `sort` method versus `sorted` function
- `sort` (like other mutating list methods, such as `append`) modifies the original list in place
- `sorted` function returns a new list with the values in sorted order and leaves the original unchanged
- default sorting order is ascending / alphanumeric
- `reverse` parameter is applicable for both `sort` method and `sorted` function:
    - enables you to specify descending order by passing argument as `True`
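
A short sketch of the difference, using a made-up list (the variable names here are illustrative only):

```python
values = [10, 4, 25, 2, -10]

# sorted returns a new list and leaves the original untouched
ascending = sorted(values)
descending = sorted(values, reverse=True)
print(values)
print(ascending)
print(descending)

# sort reorders the list in place and returns None
result = values.sort(reverse=True)
print(result)
print(values)
```

Because `sort` returns `None`, writing `values = values.sort()` would lose the list; call the method on its own line instead.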
%% Cell type:code id:c1e555f9 tags:
``` python
some_list = [10, 4, 25, 2, -10]
```
%% Cell type:code id:152297bb tags:
``` python
# TODO: Invoke sort method
rv = ???
print(some_list)
# What does the sort method return?
# TODO: Capture return value into a variable rv and print the return value.
print(rv)
```
%% Cell type:markdown id:3c0d5e7d tags:
The `sort` method returns `None` because it sorts the values of the original list in place instead of building a new list.
%% Cell type:code id:c06d8976 tags:
``` python
# TODO: invoke sorted function and pass some_list as argument
# TODO: capture return value into sorted_some_list
???
# What does the sorted function return?
# It returns a brand new list with the values in sorted order
print(sorted_some_list)
```
%% Cell type:markdown id:ded0304c tags:
TODO: go back to `sort` method call and `sorted` function call and pass keyword argument `reverse = True`.
%% Cell type:markdown id:35894ef5 tags:
Can you call `sort` method on a set?
%% Cell type:code id:fc08879e tags:
``` python
some_set.sort()
# doesn't work: no method named sort associated with type set
# you cannot sort a set because of the lack of ordering
```
%% Cell type:markdown id:99161c42 tags:
Can you pass a `set` as argument to `sorted` function? Python is intelligent :)
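
A small sketch of both behaviors, assuming a made-up set of strings (`letters` is a hypothetical name):

```python
letters = {"b", "c", "a"}  # hypothetical set, for illustration

# sorted accepts any iterable and always returns a list
ordered = sorted(letters)
print(ordered)
print(type(ordered))

# Sets have no sort method, so this raises AttributeError
try:
    letters.sort()
except AttributeError as error:
    print("AttributeError:", error)
```

The set itself stays unordered; only the list returned by `sorted` has a defined order.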
%% Cell type:code id:2549df29 tags:
``` python
# works because sorted accepts any iterable (including a set) and returns a new sorted list
sorted(some_set)
```
%% Cell type:markdown id:5c7f3489 tags:
### Function: `find_majors(phrase)`
%% Cell type:code id:b6adbfe0 tags:
``` python
def find_majors(phrase):
    """
    find_majors(phrase) returns a list of all the majors that contain the
    substring (case insensitive match) `phrase`.
    """
    # TODO: initialize the target list here
    # TODO: iterate over row indices
    for row_idx in range(len(cs220_data)):
        major = cell(row_idx, "Major")
        # TODO: write the actual logic here
    return majors
```
%% Cell type:markdown id:1b7f671f tags:
### Find all `major` that contain **either** `"Computer"` **or** `"Science"`.
Your output **must** be a *list*. The order **does not** matter, but if a `major` contains **both** `"Computer"` and `"Science"`, then that major must be included **only once** in your list.
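
One way to meet the "only once" requirement is to combine the two lists and pass the result through a set. The sketch below uses made-up lists standing in for the results of two searches; the names and values are hypothetical.

```python
# Hypothetical results, standing in for two separate searches
first_matches = ["Computer Science", "Computer Engineering"]
second_matches = ["Computer Science", "Data Science"]

# Concatenate, use a set to drop duplicates, then convert back to a list
combined = first_matches + second_matches
unique_matches = list(set(combined))

# The order of a list built from a set is not guaranteed, so sort before printing
print(sorted(unique_matches))
```

Converting back with `list(...)` matters here because the question asks for a list, not a set.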
%% Cell type:code id:ed895a3b tags:
``` python
computer_majors = ???
science_majors = ???
computer_and_science_majors = ???
# TODO: Now find just the unique values
computer_and_science_majors = ???
computer_and_science_majors
```
%% Cell type:markdown id:64fd0945 tags:
### Order the `major` that contain **either** `"Computer"` **or** `"Science"` using ascending order.
%% Cell type:code id:efcdf514 tags:
``` python
# VERSION 1
# Be very careful: if you use sorted, make sure your return value
# variable matches with the variable for that project question
sorted_computer_and_science_majors = sorted(computer_and_science_majors)
sorted_computer_and_science_majors
```
%% Cell type:code id:c28e77ce tags:
``` python
# VERSION 2
computer_and_science_majors.sort()
computer_and_science_majors
```
%% Cell type:markdown id:e354b781 tags:
### Order the `major` that contain **either** `"Computer"` **or** `"Science"` using descending order.
%% Cell type:code id:ca887135 tags:
``` python
# VERSION 1
# Be very careful: if you use sorted, make sure your return value
# variable matches with the variable for that project question
reverse_sorted_computer_and_science_majors = sorted(computer_and_science_majors, reverse = ???)
reverse_sorted_computer_and_science_majors
```
%% Cell type:code id:b6c61532 tags:
``` python
# VERSION 2
computer_and_science_majors.sort(reverse = ???)
computer_and_science_majors
```
%% Cell type:markdown id:31a381fe tags:
## Self-practice
%% Cell type:markdown id:8ac26620 tags:
### How many students are both a procrastinator and a pet owner?
%% Cell type:markdown id:172141ea tags:
### What percentage of 18-year-olds have their major declared as "Other"?
%% Cell type:markdown id:d9a7a2b1 tags:
### How old is the oldest basil/spinach-loving Business major?