Commit 3775be37 authored by gsingh58
Lec15 updated

parent a8f64c53
%% Cell type:markdown id:9330bd96 tags:
# Announcements - Tabular Data, CSV Files
* Download ALL files for today's lecture
* P3 Last day to request regrades
* P4 Last day to turn in late
* P5 Due today
* Q4 Released tonight at 5 pm
* Exam
* We estimate results will be available Thursday
* Quiz 4 released tonight : due Friday
* Office Hours - long lines?
* Check the calendar
* Find a partner!
* Be patient on Tuesdays and Wednesdays
%% Cell type:markdown id:72348536 tags:
# Comma Separated Values (CSV)
%% Cell type:code id:ba562f5e tags:
``` python
import csv
```
%% Cell type:markdown id:ddcf7595 tags:
### Warmup 1
- Use the `list` type's `index` method to look up the index of "ice cream"
- Take a look at other list methods: https://www.w3schools.com/python/python_ref_list.asp
%% Cell type:code id:a0fecc18 tags:
``` python
dairy = ["milk", "ice cream", "cheese", "yogurt"]
print(dairy.index("ice cream"))
# print(dairy.index("paneer")) # doesn't work due to ValueError (runtime error)
```
%% Output
1
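The commented-out `paneer` lookup fails because `index` raises a `ValueError` when the item is absent. One way to avoid that, sketched here under the assumption that returning `None` for a missing item is acceptable, is to test membership with `in` first. The helper name `find_index` is hypothetical and not part of the lecture code.

```python
def find_index(items, target):
    """Return the index of target in items, or None if it is absent."""
    # Checking membership first avoids the ValueError that list.index raises
    if target in items:
        return items.index(target)
    return None

dairy = ["milk", "ice cream", "cheese", "yogurt"]
print(find_index(dairy, "ice cream"))
print(find_index(dairy, "paneer"))
```

The membership test scans the list once and `index` scans it again, which is fine for short lists like this one.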
%% Cell type:markdown id:a1a4e2e7 tags:
### Warmup 2
Use `in` operator to complete the condition to check if food_shelf contains any dairy products.
%% Cell type:code id:eae06501 tags:
``` python
food_shelf = ["peanut butter", "milk", "bread", "cheese", "YOGURT"]
for item in food_shelf:
    if item.lower() in dairy:
        print(item, "is dairy")
    else:
        print(item, "is not dairy")
```
%% Output
peanut butter is not dairy
milk is dairy
bread is not dairy
cheese is dairy
YOGURT is dairy
%% Cell type:markdown id:75811c5d tags:
## Warmup 3
Determine median of a list.
Examples:
- Median of [1, 2, 3, 4, 5] is: 3
- Median of [1, 2, 3, 4, 5, 6] is: 3.5
%% Cell type:code id:97b3f76f tags:
``` python
def median(some_items):
    """
    Returns median of a list passed as argument
    """
    some_items.sort()  # note: this sorts the caller's list in place
    n = len(some_items)
    if n % 2 == 1:
        return some_items[n // 2]
    else:
        first_middle = some_items[n // 2 - 1]
        second_middle = some_items[n // 2]
        median = (first_middle + second_middle) / 2
        return median
```
%% Cell type:code id:a4d704c0 tags:
``` python
nums = [5, 4, 3, 2, 1]
print("Median of", nums, "is", median(nums))
nums = [6, 5, 4, 3, 2, 1]
print("Median of", nums, "is", median(nums))
vals = ["A", "C", "B"]
print("Median of", vals, "is", median(vals))
vals = ["A", "C", "B", "D"]
# print("Median of", vals, "is", median(vals)) # does not work due to TypeError
```
%% Output
Median of [1, 2, 3, 4, 5] is 3
Median of [1, 2, 3, 4, 5, 6] is 3.5
Median of ['A', 'B', 'C'] is B
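The printed lists above appear sorted because `median` calls `sort()`, which reorders the list that was passed in. If that side effect is unwanted, a variant can work on a copy instead. The following is only a sketch: `median_sorted_copy` is a hypothetical name, and the comparison with the standard library's `statistics.median` is added for illustration.

```python
import statistics

def median_sorted_copy(some_items):
    """Return the median without modifying the list passed in."""
    ordered = sorted(some_items)  # sorted() builds a new list
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

nums = [6, 5, 4, 3, 2, 1]
print(median_sorted_copy(nums))
print(nums)                     # the original order is preserved
print(statistics.median(nums))  # standard-library equivalent for numbers
```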
%% Cell type:markdown id:a9d5085c tags:
## Learning Objectives:
- Open an Excel file and export it to a Comma Separated Value file.
- Open a CSV file in TextEditor/Jupyter and connect the elements of the CSV file to the rows and columns in the spreadsheet.
- Use pre-written Python code to read a CSV file into a list of lists.
- Write Python statements with double list indexing to access any element of a CSV file via a list of lists.
- Write code that answers questions about CSV data by writing for loops on lists of lists.
%% Cell type:code id:9d936c1c tags:
``` python
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    # open the file; it's a UTF-8 text file
    example_file = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    example_reader = csv.reader(example_file)
    # use the built-in list function to convert this into a list of lists
    example_data = list(example_reader)
    # close the file to tidy up our workspace
    example_file.close()
    # return the list of lists
    return example_data
```
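An alternative way to write the same reader is with a `with` statement, which closes the file even if reading fails partway through. This is a sketch rather than the course's version: `process_csv_with` and the temporary `demo.csv` file are made up for the demonstration, and `newline=""` follows the recommendation in the `csv` module documentation.

```python
import csv
import os
import tempfile

def process_csv_with(filename):
    """Read a CSV file into a list of lists, closing the file automatically."""
    # newline="" lets the csv module handle line endings inside quoted fields
    with open(filename, encoding="utf-8", newline="") as csv_file:
        return list(csv.reader(csv_file))

# Demo on a small temporary file (made-up rows, not the survey data)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    with open(demo_path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows([["Lecture", "Age"], ["LEC001", "19"]])
    demo_rows = process_csv_with(demo_path)

print(demo_rows)
```

Every value comes back as a string, including the age, which is why the later cells convert types explicitly.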
%% Cell type:markdown id:89621c98 tags:
### Student Information Survey data
%% Cell type:code id:d3c252b4 tags:
``` python
# TODO: call the process_csv function and store the list of lists in cs220_csv
cs220_csv = process_csv("cs220_survey_data.csv")
```
%% Cell type:code id:5838ae5f tags:
``` python
# Store the header row into cs220_header, using indexing
cs220_header = cs220_csv[0]
cs220_header
```
%% Output
['section',
 'Lecture',
 'Age',
 'Primary major',
 'Other Primary Major',
 'Other majors',
 'Zip Code',
 'Latitude',
 'Longitude',
 'Pet owner',
 'Pizza topping',
 'Pet owner',
 'Runner',
 'Sleep habit',
 'Procrastinator',
 'Song']
%% Cell type:code id:66fda88d tags:
``` python
# TODO: Store all of the data rows into cs220_data, using slicing
cs220_data = cs220_csv[1:]
# TODO: use slicing to display top 3 rows data
cs220_data[:3]
```
%% Output
[['COMP SCI 220:LAB345, COMP SCI 220:LEC004',
  'LEC004',
  '',
  'Other (please provide details below).',
  '',
  '',
  '53,706',
  '22.5726',
  '88.3639',
  'No',
  'pepperoni',
  'dog',
  'No',
  'night owl',
  'Yes',
  'Island in the Sun - Harry Belafonte'],
 ['COMP SCI 220:LEC003, COMP SCI 220:LAB332',
  'LEC001',
  '19',
  'Engineering: Mechanical',
  '',
  '',
  '53,703',
  '44.5876',
  '-71.9466',
  'No',
  'pepperoni',
  'dog',
  'No',
  'night owl',
  'Yes',
  'No role modelz by J. Cole'],
 ['COMP SCI 220:LAB325, COMP SCI 220:LEC002',
  'LEC002',
  '18',
  'Engineering: Mechanical',
  '.',
  '.',
  '53,706',
  '40.7128',
  '-74.006',
  'Maybe',
  'none (just cheese)',
  'dog',
  'No',
  'night owl',
  'Yes',
  '\xa0biggest bird']]
%% Cell type:markdown id:4267fe3e tags:
### What is the Sleep habit for the 2nd student?
%% Cell type:code id:4b8dbe8b tags:
``` python
cs220_data[1][9] # bad example: we hard-coded the column index
```
%% Output
'No'
%% Cell type:markdown id:4f125240 tags:
What if we decided to add a new column before "Sleep habit"? A hard-coded index would no longer point to the right column.
Instead of hard-coding the column index, use the `index` method to look up the column index from the header variable. This also makes your code much more readable.
%% Cell type:code id:f2e52e06 tags:
``` python
cs220_data[1][cs220_header.index("Sleep habit")]
```
%% Output
'night owl'
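To see why the header lookup is more robust, here is a small sketch with a made-up three-column header and row (not the survey data). It simulates a column being inserted before "Sleep habit" and compares a hard-coded index with the `index` lookup.

```python
# Hypothetical miniature header and row, invented for this sketch
header = ["Lecture", "Age", "Sleep habit"]
row = ["LEC001", "19", "night owl"]

hard_coded = 2
print(row[hard_coded])                    # night owl
print(row[header.index("Sleep habit")])   # night owl

# Simulate a new column being added before "Sleep habit"
header.insert(2, "Runner")
row.insert(2, "No")

print(row[hard_coded])                    # now reads the "Runner" column
print(row[header.index("Sleep habit")])   # still reads "Sleep habit"
```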
%% Cell type:markdown id:5d298a4c tags:
### What is the Lecture of the 4th student?
%% Cell type:code id:3617b3de tags:
``` python
cs220_data[3][cs220_header.index("Lecture")]
```
%% Output
'LEC001'
%% Cell type:markdown id:059de363 tags:
### Create a list containing Age of all students 10 years from now
%% Cell type:code id:45909f22 tags:
``` python
ages_in_ten_years = []
for row in cs220_data:
    age = row[cs220_header.index("Age")]
    if age == '':
        continue
    age = int(age)
    ages_in_ten_years.append(age + 10)
ages_in_ten_years[:3]
```
%% Output
[29, 28, 32]
%% Cell type:markdown id:8e18663d tags:
### cell function
- It would be very helpful to define a cell function, which can handle missing data and type conversions
%% Cell type:code id:bba90038 tags:
``` python
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name
    col_idx = cs220_header.index(col_name)
    # TODO: get the value of cs220_data at the specified cell
    val = cs220_data[row_idx][col_idx]
    # TODO: handle missing values, by returning None
    if val == '':
        return None
    # TODO: handle type conversions
    if col_name in ["Age"]:
        return int(val)
    elif col_name in ['Latitude', 'Longitude']:
        return float(val)
    return val
```
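The `cell` function above reads the global `cs220_header` and `cs220_data` variables. To try the same logic without loading the survey file, one option is a variant that takes the header and rows as arguments. The name `cell_from` and the demo rows below are assumptions made for this sketch.

```python
def cell_from(header, data, row_idx, col_name):
    """Like cell, but takes the header and data rows as arguments."""
    val = data[row_idx][header.index(col_name)]
    if val == "":
        return None  # missing value
    if col_name == "Age":
        return int(val)
    if col_name in ("Latitude", "Longitude"):
        return float(val)
    return val

# Made-up header and rows for illustration
demo_header = ["Lecture", "Age", "Latitude"]
demo_data = [["LEC004", "", "22.5726"], ["LEC001", "19", "44.5876"]]

print(cell_from(demo_header, demo_data, 0, "Age"))       # missing age
print(cell_from(demo_header, demo_data, 1, "Age"))       # converted to int
print(cell_from(demo_header, demo_data, 0, "Latitude"))  # converted to float
```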
%% Cell type:markdown id:b7c8e726 tags:
### Find average age per lecture.
%% Cell type:code id:f0a05e42 tags:
``` python
# TODO: initialize one list per lecture (4 lectures)
lec1_ages = []
lec2_ages = []
lec3_ages = []
lec4_ages = []
# Iterate over the data and populate the lists
for row_idx in range(len(cs220_data)):
    age = cell(row_idx, "Age")
    # skip missing ages and implausible values
    if age is not None and 0 < age < 125:
        lecture = cell(row_idx, "Lecture")
        if lecture == "LEC001":
            lec1_ages.append(age)
        elif lecture == "LEC002":
            lec2_ages.append(age)
        elif lecture == "LEC003":
            lec3_ages.append(age)
        elif lecture == "LEC004":
            lec4_ages.append(age)
# TODO: compute average age of each lecture
print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
```
%% Output
LEC001 average student age: 19.71
LEC002 average student age: 20.24
LEC003 average student age: 19.41
LEC004 average student age: 19.43
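Keeping one list per lecture means adding a new variable and a new `elif` branch whenever a lecture is added. A possible alternative, sketched here with a dictionary (a structure this lecture does not use) and made-up `(lecture, age)` pairs instead of the survey rows, groups the ages by lecture name.

```python
# Made-up (lecture, age) pairs standing in for the survey rows
samples = [("LEC001", 19), ("LEC002", 21), ("LEC001", 20), ("LEC002", None)]

ages_by_lecture = {}
for lecture, age in samples:
    if age is None:
        continue  # skip missing ages
    # create the list for this lecture on first use, then append
    ages_by_lecture.setdefault(lecture, []).append(age)

averages = {}
for lecture, ages in ages_by_lecture.items():
    averages[lecture] = round(sum(ages) / len(ages), 2)

print(averages)
```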
%% Cell type:markdown id:64fd0945 tags:
### Find all unique zip codes.
%% Cell type:code id:c28e77ce tags:
``` python
# TODO: initialize list to keep track of zip codes
zip_codes = []
for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")
    if zip_code is not None:
        zip_codes.append(zip_code)
# converting to a set drops duplicates; converting back gives a list
list(set(zip_codes))
```
%% Output
['94,596',
 '53,590',
 '53,121',
 '53,589',
 '53,527',
 '95,030',
 '53,572',
 '53,706',
 '53,705',
 '53,726',
 '60,517',
 '53,703',
 '53,704',
 '53,715',
 '53,562',
 '53,150',
 '52,703',
 '53,716',
 '53,521',
 '53,713',
 '53,558',
 '98,607',
 '533,706',
 '-53,703',
 '1,520',
 '93,703',
 '8,820',
 '53.706',
 '53,711',
 '51,735',
 '50,376',
 '53,051',
 '54,703',
 '53,563',
 '57,303',
 '53,719',
 '53,575',
 '53.715',
 '56,511',
 '53,701']
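A set does not keep the order in which values were added, which is why the output above is in no particular order. If the order matters, two simple options are to keep the first occurrence of each value or to sort the unique values. The sketch below uses made-up zip code strings in the same `"53,706"` style rather than the survey data.

```python
# Made-up zip code strings in the same "53,706" style as the survey output
raw_zip_codes = ["53,706", "53,703", "53,706", "53,715", "53,703"]

# Option 1: keep the first occurrence of each value, in original order
unique_zip_codes = []
for zip_code in raw_zip_codes:
    if zip_code not in unique_zip_codes:
        unique_zip_codes.append(zip_code)

# Option 2: sort the unique values (string order, not numeric order)
sorted_unique = sorted(set(raw_zip_codes))

print(unique_zip_codes)
print(sorted_unique)
```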
%% Cell type:markdown id:31a381fe tags:
## Self-practice
%% Cell type:markdown id:8ac26620 tags:
### How many students are both a procrastinator and a pet owner?
%% Cell type:markdown id:172141ea tags:
### What percentage of 18-year-olds have their major declared as "Other"?
%% Cell type:markdown id:d9a7a2b1 tags:
### How old is the oldest basil/spinach-loving Business major?
......
%% Cell type:markdown id:47524973 tags:
# Announcements - Tabular Data, CSV Files
* Download ALL files for today's lecture
* P3 Last day to request regrades
* P4 Last day to turn in late
* P5 Due today
* Q4 Released tonight at 5 pm
* Exam
* We estimate results will be available Thursday
* Quiz 4 released tonight : due Friday
* Office Hours - long lines?
* Check the calendar
* Find a partner!
* Be patient on Tuesdays and Wednesdays
%% Cell type:markdown id:72348536 tags:
# Comma Separated Values (CSV)
%% Cell type:code id:ba562f5e tags:
``` python
import csv
```
%% Cell type:markdown id:ddcf7595 tags:
### Warmup 1
- Use the `list` type's `index` method to look up the index of "ice cream"
- Take a look at other list methods: https://www.w3schools.com/python/python_ref_list.asp
%% Cell type:code id:a0fecc18 tags:
``` python
dairy = ["milk", "ice cream", "cheese", "yogurt"]
print()
```
%% Cell type:markdown id:a1a4e2e7 tags:
### Warmup 2
Use `in` operator to complete the condition to check if food_shelf contains any dairy products.
%% Cell type:code id:eae06501 tags:
``` python
food_shelf = ["peanut butter", "milk", "bread", "cheese", "YOGURT"]
for item in food_shelf:
    if ???:
        print(item, "is dairy")
    else:
        print(item, "is not dairy")
```
%% Cell type:markdown id:8a5f548e tags:
## Warmup 3
Determine median of a list.
Examples:
- Median of [1, 2, 3, 4, 5] is: 3
- Median of [1, 2, 3, 4, 5, 6] is: 3.5
%% Cell type:code id:2f610ffe tags:
``` python
def median(some_items):
    """
    Returns median of a list passed as argument
    """
    pass
```
%% Cell type:code id:e9340eaa tags:
``` python
nums = [5, 4, 3, 2, 1]
print(nums, median(nums))
nums = [6, 5, 4, 3, 2, 1]
print(nums, median(nums))
vals = ["A", "C", "B"]
print(vals, median(vals))
vals = ["A", "C", "B", "D"]
# print(vals, median(vals)) # does not work due to TypeError
```
%% Cell type:markdown id:a9d5085c tags:
## Learning Objectives:
- Open an Excel file and export it to a Comma Separated Value file.
- Open a CSV file in TextEditor/Jupyter and connect the elements of the CSV file to the rows and columns in the spreadsheet.
- Use pre-written Python code to read a CSV file into a list of lists.
- Write Python statements with double list indexing to access any element of a CSV file via a list of lists.
- Write code that answers questions about CSV data by writing for loops on lists of lists.
%% Cell type:code id:9d936c1c tags:
``` python
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    # open the file; it's a UTF-8 text file
    example_file = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    example_reader = csv.reader(example_file)
    # use the built-in list function to convert this into a list of lists
    example_data = list(example_reader)
    # close the file to tidy up our workspace
    example_file.close()
    # return the list of lists
    return example_data
```
%% Cell type:markdown id:89621c98 tags:
### Student Information Survey data
%% Cell type:code id:d3c252b4 tags:
``` python
# TODO: call the process_csv function and store the list of lists in cs220_csv
```
%% Cell type:code id:5838ae5f tags:
``` python
# Store the header row into cs220_header, using indexing
cs220_header = ???
cs220_header
```
%% Cell type:code id:66fda88d tags:
``` python
# TODO: Store all of the data rows into cs220_data, using slicing
cs220_data = ???
# TODO: use slicing to display top 3 rows data
cs220_data[:3]
```
%% Cell type:markdown id:4267fe3e tags:
### What is the Sleep habit for the 2nd student?
%% Cell type:code id:4b8dbe8b tags:
``` python
# bad example: we hard-coded the column index
```
%% Cell type:markdown id:4f125240 tags:
What if we decided to add a new column before "Sleep habit"? A hard-coded index would no longer point to the right column.
Instead of hard-coding the column index, use the `index` method to look up the column index from the header variable. This also makes your code much more readable.
%% Cell type:code id:f2e52e06 tags:
``` python
```
%% Cell type:markdown id:5d298a4c tags:
### What is the Lecture of the 4th student?
%% Cell type:code id:3617b3de tags:
``` python
```
%% Cell type:markdown id:059de363 tags:
### Create a list containing Age of all students 10 years from now
%% Cell type:code id:45909f22 tags:
``` python
```
%% Cell type:markdown id:8e18663d tags:
### cell function
- It would be very helpful to define a cell function, which can handle missing data and type conversions
%% Cell type:code id:bba90038 tags:
``` python
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name
    # TODO: get the value of cs220_data at the specified cell
    # TODO: handle missing values, by returning None
    # TODO: handle type conversions
```
%% Cell type:markdown id:b7c8e726 tags:
### Find average age per lecture.
%% Cell type:code id:f0a05e42 tags:
``` python
# TODO: initialize one list per lecture (4 lectures)
# Iterate over the data and populate the lists
# TODO: compute average age of each lecture
print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
```
%% Cell type:markdown id:64fd0945 tags:
### Find all unique zip codes.
%% Cell type:code id:c28e77ce tags:
``` python
# TODO: initialize list to keep track of zip codes
zip_codes = []
for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")
    if zip_code is not None:
        zip_codes.append(zip_code)
zip_codes # How do we get the unique values?
```
%% Cell type:markdown id:31a381fe tags:
## Self-practice
%% Cell type:markdown id:8ac26620 tags:
### How many students are both a procrastinator and a pet owner?
%% Cell type:markdown id:172141ea tags:
### What percentage of 18-year-olds have their major declared as "Other"?
%% Cell type:markdown id:d9a7a2b1 tags:
### How old is the oldest basil/spinach-loving Business major?
......
%% Cell type:markdown id:7778f7e0 tags:
# Announcements - Tabular Data, CSV Files
* Download ALL files for today's lecture
* P3 Last day to request regrades
* P4 Last day to turn in late
* P5 Due today
* Q4 Released tonight at 5 pm
* Exam
* We estimate results will be available Thursday
* Quiz 4 released tonight : due Friday
* Office Hours - long lines?
* Check the calendar
* Find a partner!
* Be patient on Tuesdays and Wednesdays
%% Cell type:markdown id:72348536 tags: %% Cell type:markdown id:72348536 tags:
# Comma Separated Values (CSV) # Comma Separated Values (CSV)
%% Cell type:code id:ba562f5e tags: %% Cell type:code id:ba562f5e tags:
``` python ``` python
import csv import csv
``` ```
%% Cell type:markdown id:ddcf7595 tags: %% Cell type:markdown id:ddcf7595 tags:
### Warmup 1 ### Warmup 1
- Use `list` type's `index` method to lookup index of "ice cream" - Use `list` type's `index` method to lookup index of "ice cream"
- Take a look at other list methods: https://www.w3schools.com/python/python_ref_list.asp - Take a look at other list methods: https://www.w3schools.com/python/python_ref_list.asp
%% Cell type:code id:a0fecc18 tags: %% Cell type:code id:a0fecc18 tags:
``` python ``` python
dairy = ["milk", "ice cream", "cheese", "yogurt" ] dairy = ["milk", "ice cream", "cheese", "yogurt" ]
print() print()
``` ```
%% Cell type:markdown id:a1a4e2e7 tags: %% Cell type:markdown id:a1a4e2e7 tags:
### Warmup 2 ### Warmup 2
Use `in` operator to complete the condition to check if food_shelf contains any dairy products. Use `in` operator to complete the condition to check if food_shelf contains any dairy products.
%% Cell type:code id:eae06501 tags: %% Cell type:code id:eae06501 tags:
``` python ``` python
food_shelf = ["peanut butter", "milk", "bread", "cheese", "YOGURT"] food_shelf = ["peanut butter", "milk", "bread", "cheese", "YOGURT"]
for item in food_shelf: for item in food_shelf:
if ???: if ???:
print(item, "is dairy") print(item, "is dairy")
else: else:
print(item, "is not dairy") print(item, "is not dairy")
``` ```
%% Cell type:markdown id:8a5f548e tags:
## Warmup 3
Determine the median of a list.
Examples:
- Median of [1, 2, 3, 4, 5] is: 3
- Median of [1, 2, 3, 4, 5, 6] is: 3.5
%% Cell type:code id:2f610ffe tags:
``` python
def median(some_items):
    """
    Returns median of a list passed as argument
    """
    pass
```
%% Cell type:code id:e9340eaa tags:
``` python
nums = [5, 4, 3, 2, 1]
print(nums, median(nums))
nums = [6, 5, 4, 3, 2, 1]
print(nums, median(nums))
vals = ["A", "C", "B"]
print(vals, median(vals))
vals = ["A", "C", "B", "D"]
# print(vals, median(vals)) # does not work due to TypeError
```
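A minimal sketch of how `median` could be completed, under the assumption that the input list is non-empty (an empty list is not handled here). It sorts a copy of the list, returns the middle element for odd lengths, and averages the two middle elements for even lengths. The names `sorted_items`, `n`, and `middle` are illustrative.

```python
def median(some_items):
    """
    Returns median of a list passed as argument
    """
    # sort a copy so the caller's list is left unchanged
    sorted_items = sorted(some_items)
    n = len(sorted_items)
    middle = n // 2
    if n % 2 == 1:
        # odd length: the single middle element
        return sorted_items[middle]
    # even length: average of the two middle elements
    return (sorted_items[middle - 1] + sorted_items[middle]) / 2


print(median([5, 4, 3, 2, 1]))
print(median([6, 5, 4, 3, 2, 1]))
print(median(["A", "C", "B"]))
```

With an even number of strings the averaging step divides a string by 2, which is why the last call in the cell above is commented out with a `TypeError` note.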
%% Cell type:markdown id:a9d5085c tags:
## Learning Objectives:
- Open an Excel file and export it to a Comma Separated Values (CSV) file.
- Open a CSV file in a text editor or Jupyter and connect the elements of the CSV file to the rows and columns in the spreadsheet.
- Use pre-written Python code to read a CSV file into a list of lists.
- Write Python statements with double list indexing to access any element of a CSV file via a list of lists.
- Write code that answers questions about CSV data by writing for loops on lists of lists.
%% Cell type:code id:9d936c1c tags:
``` python
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    # open the file; it is a UTF-8 encoded text file
    example_file = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    example_reader = csv.reader(example_file)
    # use the built-in list function to convert this into a list of lists
    example_data = list(example_reader)
    # close the file to tidy up our workspace
    example_file.close()
    # return the list of lists
    return example_data
```
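For comparison, here is a hedged sketch of the same function written with a `with` block, which closes the file even if reading fails. This is a variant, not the lecture's version. The file name `sample.csv` and its contents are made up so the example runs on its own, and `newline=""` follows the recommendation in the `csv` module documentation.

```python
import csv
import os
import tempfile


def process_csv(filename):
    # the with block closes the file automatically, even on an error
    with open(filename, encoding="utf-8", newline="") as example_file:
        return list(csv.reader(example_file))


# write a tiny made-up CSV file so the sketch is self-contained
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.csv")
with open(sample_path, "w", encoding="utf-8", newline="") as f:
    f.write("Lecture,Age\nLEC001,19\nLEC002,\n")

rows = process_csv(sample_path)
print(rows)
```

Every value comes back as a string, and an empty field comes back as the empty string `""`, which matters later for type conversion and missing data.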
%% Cell type:markdown id:89621c98 tags:
### Student Information Survey data
%% Cell type:code id:d3c252b4 tags:
``` python
# TODO: call the process_csv function and store the list of lists in cs220_csv
```
%% Cell type:code id:5838ae5f tags:
``` python
# Store the header row into cs220_header, using indexing
cs220_header = ???
cs220_header
```
%% Cell type:code id:66fda88d tags:
``` python
# TODO: Store all of the data rows into cs220_data, using slicing
cs220_data = ???
# TODO: use slicing to display top 3 rows data
cs220_data[:3]
```
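The header/data split can be sketched as follows. The `cs220_csv` value here is a small hypothetical stand-in for the list of lists that `process_csv` would return; the column names and rows are invented for illustration and are not the real survey data.

```python
# hypothetical stand-in for the list of lists returned by process_csv
cs220_csv = [
    ["Lecture", "Age", "Sleep habit"],
    ["LEC001", "19", "night owl"],
    ["LEC002", "20", "early bird"],
    ["LEC001", "", "no preference"],
]

# indexing: the first inner list is the header row
cs220_header = cs220_csv[0]

# slicing: everything after the header row is data
cs220_data = cs220_csv[1:]

print(cs220_header)
print(cs220_data[:3])
```

Slicing creates a new outer list, so `cs220_data` does not include the header even though `cs220_csv` still does.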
%% Cell type:markdown id:4267fe3e tags:
### What is the Sleep habit for the 2nd student?
%% Cell type:code id:4b8dbe8b tags:
``` python
# bad example: we hard-coded the column index
```
%% Cell type:markdown id:4f125240 tags:
What if we decided to add a new column before the sleep habit column? Your code would no longer work.
Instead of hard-coding the column index, use the `index` method to look up the column index from the header variable. This also makes your code much more readable.
%% Cell type:code id:f2e52e06 tags:
``` python
```
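A minimal sketch contrasting the two approaches. The header and rows below are hypothetical; in particular, the exact column name "Sleep habit" is assumed from the question heading and may differ in the real file.

```python
# hypothetical header and data rows, invented for illustration
cs220_header = ["Lecture", "Age", "Sleep habit"]
cs220_data = [
    ["LEC001", "19", "night owl"],
    ["LEC002", "20", "early bird"],
]

# hard-coded: breaks silently if a column is inserted before index 2
hard_coded = cs220_data[1][2]

# looked up: keeps working as long as the column name is unchanged
sleep_idx = cs220_header.index("Sleep habit")
second_student_sleep = cs220_data[1][sleep_idx]

print(hard_coded, second_student_sleep)
```

The 2nd student is at row index 1 because `cs220_data` no longer contains the header row.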
%% Cell type:markdown id:5d298a4c tags:
### What is the Lecture of the 4th student?
%% Cell type:code id:3617b3de tags:
``` python
```
%% Cell type:markdown id:059de363 tags:
### Create a list containing Age of all students 10 years from now
%% Cell type:code id:45909f22 tags:
``` python
```
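One way the age list could be built, as a sketch under two assumptions: ages are stored as strings (as `csv.reader` produces them), and a missing age appears as an empty string and is skipped. The sample rows and the name `ages_in_ten_years` are hypothetical.

```python
# hypothetical header and data rows, invented for illustration
cs220_header = ["Lecture", "Age"]
cs220_data = [
    ["LEC001", "19"],
    ["LEC002", "20"],
    ["LEC001", ""],  # missing age
]

age_idx = cs220_header.index("Age")
ages_in_ten_years = []
for row in cs220_data:
    age = row[age_idx]
    if age == "":
        # skip students who did not report an age
        continue
    # convert from str to int before doing arithmetic
    ages_in_ten_years.append(int(age) + 10)

print(ages_in_ten_years)
```

Skipping the `int` conversion would raise a `TypeError`, because a string cannot be added to an integer.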
%% Cell type:markdown id:8e18663d tags:
### cell function
- It would be very helpful to define a cell function, which can handle missing data and type conversions
%% Cell type:code id:bba90038 tags:
``` python
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name
    # TODO: get the value of cs220_data at the specified cell
    # TODO: handle missing values, by returning None
    # TODO: handle type conversions
```
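A hedged sketch of how the four TODO steps might be filled in. It assumes missing values appear as empty strings and that "Age" is the only column needing conversion to `int`. Both are assumptions about the data, and the sample rows are invented.

```python
# hypothetical header and data rows, invented for illustration
cs220_header = ["Lecture", "Age", "Zip Code"]
cs220_data = [
    ["LEC001", "19", "53706"],
    ["LEC002", "20", ""],
    ["LEC001", "", "53703"],
]


def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # get the index of col_name
    col_idx = cs220_header.index(col_name)
    # get the value of cs220_data at the specified cell
    val = cs220_data[row_idx][col_idx]
    # handle missing values by returning None
    if val == "":
        return None
    # handle type conversions (assumed: only Age is numeric)
    if col_name == "Age":
        return int(val)
    return val


print(cell(0, "Age"), cell(2, "Age"), cell(1, "Lecture"))
```

Zip codes are deliberately left as strings here, since converting them to `int` would drop any leading zeros.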
%% Cell type:markdown id:b7c8e726 tags:
### Find average age per lecture.
%% Cell type:code id:f0a05e42 tags:
``` python
# TODO: initialize 6 lists for the 6 lectures
# Iterate over the data and populate the lists
# TODO: compute average age of each lecture
print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
```
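The populate-then-average pattern can be sketched on a reduced example with two lectures. The rows are hypothetical, missing ages are assumed to be empty strings, and the names `lec1_average` and `lec2_average` are illustrative. The same loop extends to more lectures by adding more lists and `elif` branches.

```python
# hypothetical header and data rows, invented for illustration
cs220_header = ["Lecture", "Age"]
cs220_data = [
    ["LEC001", "19"],
    ["LEC002", "20"],
    ["LEC001", "22"],
    ["LEC002", ""],  # missing age
]

lecture_idx = cs220_header.index("Lecture")
age_idx = cs220_header.index("Age")

# one list of ages per lecture
lec1_ages = []
lec2_ages = []

for row in cs220_data:
    if row[age_idx] == "":
        # a missing age must not count toward the average
        continue
    age = int(row[age_idx])
    if row[lecture_idx] == "LEC001":
        lec1_ages.append(age)
    elif row[lecture_idx] == "LEC002":
        lec2_ages.append(age)

lec1_average = round(sum(lec1_ages) / len(lec1_ages), 2)
lec2_average = round(sum(lec2_ages) / len(lec2_ages), 2)
print("LEC001 average student age:", lec1_average)
print("LEC002 average student age:", lec2_average)
```

If a lecture has no students with a recorded age, `len` is 0 and the division raises `ZeroDivisionError`, so real data may need a guard.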
%% Cell type:markdown id:64fd0945 tags:
### Find all unique zip codes.
%% Cell type:code id:c28e77ce tags:
``` python
# TODO: initialize a list to keep track of zip codes
zip_codes = []
for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")
    if zip_code is not None:
        zip_codes.append(zip_code)
zip_codes # How do we get the unique values?
```
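Two possible answers to the closing question, as a sketch on made-up sample values: keep a second list and append only values not seen before, or convert the list to a `set`. The names `unique_zip_codes` and `unique_via_set` are illustrative.

```python
# made-up sample values standing in for the collected zip codes
zip_codes = ["53706", "53703", "53706", "53715", "53703"]

# option 1: append only values that are not already in the result
unique_zip_codes = []
for zip_code in zip_codes:
    if zip_code not in unique_zip_codes:
        unique_zip_codes.append(zip_code)
print(unique_zip_codes)

# option 2: a set drops duplicates; sorted() gives a predictable order
unique_via_set = sorted(set(zip_codes))
print(unique_via_set)
```

The first option preserves the order of first appearance, while a set has no defined order, which is why the second option sorts the result.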
%% Cell type:markdown id:31a381fe tags:
## Self-practice
%% Cell type:markdown id:8ac26620 tags:
### How many students are both a procrastinator and a pet owner?
%% Cell type:markdown id:172141ea tags:
### What percentage of 18-year-olds have their major declared as "Other"?
%% Cell type:markdown id:d9a7a2b1 tags:
### How old is the oldest basil/spinach-loving Business major?