Commit 3775be37 authored by gsingh58's avatar gsingh58
Lec15 updated

parent a8f64c53
%% Cell type:markdown id:9330bd96 tags:
# Announcements - Tabular Data, CSV Files
* Download ALL files for today's lecture
* P3 Last day to request regrades
* P4 Last day to turn in late
* P5 Due today
* Q4 Released tonight at 5 pm
* Exam
    * We estimate results will be available Thursday
* Quiz 4 released tonight; due Friday
* Office Hours - long lines?
    * Check the calendar
    * Find a partner!
    * Be patient on Tuesdays and Wednesdays
%% Cell type:markdown id:72348536 tags:
# Comma Separated Values (CSV)
%% Cell type:code id:ba562f5e tags:
``` python
import csv
```
%% Cell type:markdown id:ddcf7595 tags:
### Warmup 1
- Use the `list` type's `index` method to look up the index of "ice cream"
- Take a look at other list methods: https://www.w3schools.com/python/python_ref_list.asp
%% Cell type:code id:a0fecc18 tags:
``` python
dairy = ["milk", "ice cream", "cheese", "yogurt" ]
print(dairy.index("ice cream"))
# print(dairy.index("paneer")) # doesn't work due to ValueError (runtime error)
```
%% Output
1
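%% Cell type:markdown id: tags:
The commented-out lookup of "paneer" fails because `index` raises a `ValueError` when the item is not in the list. A minimal sketch of one way to avoid the error, assuming a hypothetical helper named `find_index` (not part of the lecture code) that checks membership with `in` first:
%% Cell type:code id: tags:
```python
# Hypothetical helper (not from the lecture): look up an item's index
# without raising ValueError when the item is missing.
def find_index(items, target):
    """Return the index of target in items, or None if it is absent."""
    if target in items:
        return items.index(target)
    return None

dairy = ["milk", "ice cream", "cheese", "yogurt"]
print(find_index(dairy, "ice cream"))  # 1
print(find_index(dairy, "paneer"))     # None
```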
%% Cell type:markdown id:a1a4e2e7 tags:
### Warmup 2
Use the `in` operator to complete the condition that checks whether each item in `food_shelf` is a dairy product.
%% Cell type:code id:eae06501 tags:
``` python
food_shelf = ["peanut butter", "milk", "bread", "cheese", "YOGURT"]
for item in food_shelf:
    if item.lower() in dairy:
        print(item, "is dairy")
    else:
        print(item, "is not dairy")
```
%% Output
peanut butter is not dairy
milk is dairy
bread is not dairy
cheese is dairy
YOGURT is dairy
%% Cell type:markdown id:75811c5d tags:
### Warmup 3
Determine median of a list.
Examples:
- Median of [1, 2, 3, 4, 5] is: 3
- Median of [1, 2, 3, 4, 5, 6] is: 3.5
%% Cell type:code id:97b3f76f tags:
``` python
def median(some_items):
    """
    Returns median of a list passed as argument
    """
    some_items.sort()  # note: this sorts the caller's list in place
    n = len(some_items)
    if n % 2 == 1:
        # odd length: the single middle item
        return some_items[n // 2]
    else:
        # even length: average of the two middle items
        first_middle = some_items[n // 2 - 1]
        second_middle = some_items[n // 2]
        return (first_middle + second_middle) / 2
```
%% Cell type:code id:a4d704c0 tags:
``` python
nums = [5, 4, 3, 2, 1]
print("Median of", nums, "is" , median(nums))
nums = [6, 5, 4, 3, 2, 1]
print("Median of", nums, "is" , median(nums))
vals = ["A", "C", "B"]
print("Median of", vals, "is" , median(vals))
vals = ["A", "C", "B", "D"]
# print("Median of", vals, "is" , median(vals)) # does not work due to TypeError (cannot divide str by int)
```
%% Output
Median of [1, 2, 3, 4, 5] is 3
Median of [1, 2, 3, 4, 5, 6] is 3.5
Median of ['A', 'B', 'C'] is B
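%% Cell type:markdown id: tags:
The output above shows each list already sorted because `some_items.sort()` reorders the caller's list in place, and `median(nums)` is evaluated before `print` displays `nums`. If that side effect is unwanted, one option is to sort a copy with the built-in `sorted`. This is a sketch only; the name `median_copy` is made up for illustration:
%% Cell type:code id: tags:
```python
# Hypothetical variant of median (not from the lecture) that leaves
# the argument untouched by sorting a copy instead of the original.
def median_copy(some_items):
    """Return the median without modifying the caller's list."""
    ordered = sorted(some_items)  # sorted() builds a new list
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

nums = [5, 4, 3, 2, 1]
print(median_copy(nums))  # 3
print(nums)               # [5, 4, 3, 2, 1]
```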
%% Cell type:markdown id:a9d5085c tags:
## Learning Objectives:
- Open an Excel file and export it to a Comma Separated Value file.
- Open a CSV file in TextEditor/Jupyter and connect the elements of the CSV file to the rows and columns in the spreadsheet.
- Use pre-written Python code to read a CSV file into a list of lists.
- Write Python statements with double list indexing to access any element of a CSV file via a list of lists.
- Write code that answers questions about CSV data by writing for loops on lists of lists.
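%% Cell type:markdown id: tags:
To connect the text of a CSV file to rows and columns, the sketch below parses a small CSV string with `csv.reader`. The text is a shortened excerpt modeled on the survey data (only three columns), and `io.StringIO` stands in for an open file so the example runs without any file on disk. Note how the quoted first field keeps its comma instead of being split into two columns.
%% Cell type:code id: tags:
```python
import csv
import io

# Shortened, illustrative CSV text: one header line and two data lines.
csv_lines = [
    'section,Lecture,Age',
    '"COMP SCI 220:LAB345, COMP SCI 220:LEC004",LEC004,',
    '"COMP SCI 220:LEC003, COMP SCI 220:LAB332",LEC001,19',
]
csv_text = "\n".join(csv_lines) + "\n"

# Each line becomes one inner list; each comma-separated field one string.
rows = list(csv.reader(io.StringIO(csv_text)))
header = rows[0]
data = rows[1:]

print(header)      # ['section', 'Lecture', 'Age']
print(data[0][1])  # LEC004
print(data[1][2])  # 19
```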
%% Cell type:code id:9d936c1c tags:
``` python
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    # open the file; it's a UTF-8 text file
    example_file = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    example_reader = csv.reader(example_file)
    # use the built-in list function to convert this into a list of lists
    example_data = list(example_reader)
    # close the file to tidy up our workspace
    example_file.close()
    # return the list of lists
    return example_data
```
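%% Cell type:markdown id: tags:
An alternative way to write the same reader is with a `with` statement, which closes the file automatically even if reading fails. This is only a sketch: the name `process_csv_with` and the temporary demo file are made up for illustration, and `newline=""` follows the recommendation in the `csv` module documentation.
%% Cell type:code id: tags:
```python
import csv
import os
import tempfile

# Hypothetical variant of process_csv (not the lecture's version).
def process_csv_with(filename):
    """Read a CSV file into a list of lists, closing the file automatically."""
    with open(filename, encoding="utf-8", newline="") as csv_file:
        return list(csv.reader(csv_file))

# Demo on a small temporary file with made-up data.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("Lecture,Age\nLEC001,19\n")
    demo_rows = process_csv_with(path)

print(demo_rows)  # [['Lecture', 'Age'], ['LEC001', '19']]
```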
%% Cell type:markdown id:89621c98 tags:
### Student Information Survey data
%% Cell type:code id:d3c252b4 tags:
``` python
# TODO: call the process_csv function and store the list of lists in cs220_csv
cs220_csv = process_csv("cs220_survey_data.csv")
```
%% Cell type:code id:5838ae5f tags:
``` python
# Store the header row into cs220_header, using indexing
cs220_header = cs220_csv[0]
cs220_header
```
%% Output
['section',
'Lecture',
'Age',
'Primary major',
'Other Primary Major',
'Other majors',
'Zip Code',
'Latitude',
'Longitude',
'Pet owner',
'Pizza topping',
'Pet owner',
'Runner',
'Sleep habit',
'Procrastinator',
'Song']
%% Cell type:code id:66fda88d tags:
``` python
# TODO: Store all of the data rows into cs220_data, using slicing
cs220_data = cs220_csv[1:]
# TODO: use slicing to display top 3 rows data
cs220_data[:3]
```
%% Output
[['COMP SCI 220:LAB345, COMP SCI 220:LEC004',
'LEC004',
'',
'Other (please provide details below).',
'',
'',
'53,706',
'22.5726',
'88.3639',
'No',
'pepperoni',
'dog',
'No',
'night owl',
'Yes',
'Island in the Sun - Harry Belafonte'],
['COMP SCI 220:LEC003, COMP SCI 220:LAB332',
'LEC001',
'19',
'Engineering: Mechanical',
'',
'',
'53,703',
'44.5876',
'-71.9466',
'No',
'pepperoni',
'dog',
'No',
'night owl',
'Yes',
'No role modelz by J. Cole'],
['COMP SCI 220:LAB325, COMP SCI 220:LEC002',
'LEC002',
'18',
'Engineering: Mechanical',
'.',
'.',
'53,706',
'40.7128',
'-74.006',
'Maybe',
'none (just cheese)',
'dog',
'No',
'night owl',
'Yes',
'\xa0biggest bird']]
%% Cell type:markdown id:4267fe3e tags:
### What is the Sleep habit for the 2nd student?
%% Cell type:code id:4b8dbe8b tags:
``` python
cs220_data[1][9] # bad example: we hard-coded the column index (9 is 'Pet owner', not 'Sleep habit')
```
%% Output
'No'
%% Cell type:markdown id:4f125240 tags:
What if we decided to add a new column before "Sleep habit"? A hard-coded column index would then point at the wrong column, and your code would no longer work.
Instead of hard-coding the column index, use the `index` method to look up the column index from the header variable. This also makes your code much more readable.
%% Cell type:code id:f2e52e06 tags:
``` python
cs220_data[1][cs220_header.index("Sleep habit")]
```
%% Output
'night owl'
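%% Cell type:markdown id: tags:
The sketch below illustrates why the header lookup is more robust than a hard-coded index. It uses a made-up three-column table (the column names are borrowed from the survey header) and simulates a new column being inserted before "Sleep habit":
%% Cell type:code id: tags:
```python
# Made-up miniature table for illustration only.
header = ["Lecture", "Age", "Sleep habit"]
row = ["LEC001", "19", "night owl"]

hard_coded = row[2]                         # correct only for this layout
by_name = row[header.index("Sleep habit")]  # looks the position up

# Simulate a new column being added before "Sleep habit".
header.insert(2, "Runner")
row.insert(2, "No")

print(row[2])                            # No  (index 2 is now "Runner")
print(row[header.index("Sleep habit")])  # night owl
```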
%% Cell type:markdown id:5d298a4c tags:
### What is the Lecture of the 4th student?
%% Cell type:code id:3617b3de tags:
``` python
cs220_data[3][cs220_header.index("Lecture")]
```
%% Output
'LEC001'
%% Cell type:markdown id:059de363 tags:
### Create a list containing Age of all students 10 years from now
%% Cell type:code id:45909f22 tags:
``` python
ages_in_ten_years = []
for row in cs220_data:
    age = row[cs220_header.index("Age")]
    if age == '':
        continue  # skip students who left Age blank
    age = int(age)
    ages_in_ten_years.append(age + 10)
ages_in_ten_years[:3]
```
%% Output
[29, 28, 32]
%% Cell type:markdown id:8e18663d tags:
### cell function
- It would be very helpful to define a cell function, which can handle missing data and type conversions
%% Cell type:code id:bba90038 tags:
``` python
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name
    col_idx = cs220_header.index(col_name)
    # TODO: get the value of cs220_data at the specified cell
    val = cs220_data[row_idx][col_idx]
    # TODO: handle missing values, by returning None
    if val == '':
        return None
    # TODO: handle type conversions
    if col_name in ["Age"]:
        return int(val)
    elif col_name in ['Latitude', 'Longitude']:
        return float(val)
    return val
```
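%% Cell type:markdown id: tags:
The `cell` function above relies on the global variables `cs220_header` and `cs220_data`. To try the same logic on a tiny table, the sketch below passes the header and data in as parameters. The name `cell_from` and the sample rows are assumptions made for illustration, not part of the lecture code:
%% Cell type:code id: tags:
```python
# Hypothetical variant of cell() that takes the table as arguments.
def cell_from(header, data, row_idx, col_name):
    """Return the typed value at (row_idx, col_name), or None if blank."""
    col_idx = header.index(col_name)
    val = data[row_idx][col_idx]
    if val == "":
        return None  # missing value
    if col_name == "Age":
        return int(val)
    if col_name in ("Latitude", "Longitude"):
        return float(val)
    return val

# Made-up rows shaped like the survey data.
header = ["Lecture", "Age", "Latitude"]
data = [["LEC004", "", "22.5726"], ["LEC001", "19", "44.5876"]]

print(cell_from(header, data, 0, "Age"))       # None
print(cell_from(header, data, 1, "Age"))       # 19
print(cell_from(header, data, 0, "Latitude"))  # 22.5726
print(cell_from(header, data, 1, "Lecture"))   # LEC001
```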
%% Cell type:markdown id:b7c8e726 tags:
### Find average age per lecture.
%% Cell type:code id:f0a05e42 tags:
``` python
# TODO: initialize 4 lists for the 4 lectures
lec1_ages = []
lec2_ages = []
lec3_ages = []
lec4_ages = []
# Iterate over the data and populate the lists
for row_idx in range(len(cs220_data)):
    age = cell(row_idx, "Age")
    # skip missing ages and implausible values
    if age is not None and 0 < age < 125:
        lecture = cell(row_idx, "Lecture")
        if lecture == "LEC001":
            lec1_ages.append(age)
        elif lecture == "LEC002":
            lec2_ages.append(age)
        elif lecture == "LEC003":
            lec3_ages.append(age)
        elif lecture == "LEC004":
            lec4_ages.append(age)
# TODO: compute average age of each lecture
print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
```
%% Output
LEC001 average student age: 19.71
LEC002 average student age: 20.24
LEC003 average student age: 19.41
LEC004 average student age: 19.43
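%% Cell type:markdown id: tags:
Keeping one list per lecture works for four lectures but does not scale well. As a possible alternative, the sketch below groups ages in a dictionary keyed by lecture name. The `(lecture, age)` pairs are made-up sample values, not survey results, and the variable names are chosen only for this illustration:
%% Cell type:code id: tags:
```python
# Made-up (lecture, age) pairs standing in for cleaned survey rows.
sample_rows = [("LEC001", 19), ("LEC002", 18), ("LEC001", 21), ("LEC002", 23)]

# Group ages by lecture: each key maps to a list of ages.
ages_by_lecture = {}
for lecture, age in sample_rows:
    if lecture not in ages_by_lecture:
        ages_by_lecture[lecture] = []
    ages_by_lecture[lecture].append(age)

# Compute and print the average for each lecture in name order.
averages = {}
for lecture in sorted(ages_by_lecture):
    ages = ages_by_lecture[lecture]
    averages[lecture] = round(sum(ages) / len(ages), 2)
    print(lecture, "average student age:", averages[lecture])
```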
%% Cell type:markdown id:64fd0945 tags:
### Find all unique zip codes.
%% Cell type:code id:c28e77ce tags:
``` python
# TODO: initialize list to keep track of zip codes
zip_codes = []
for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")
    if zip_code is not None:
        zip_codes.append(zip_code)
# converting to a set drops duplicates; converting back gives a list
list(set(zip_codes))
```
%% Output
['94,596',
'53,590',
'53,121',
'53,589',
'53,527',
'95,030',
'53,572',
'53,706',
'53,705',
'53,726',
'60,517',
'53,703',
'53,704',
'53,715',
'53,562',
'53,150',
'52,703',
'53,716',
'53,521',
'53,713',
'53,558',
'98,607',
'533,706',
'-53,703',
'1,520',
'93,703',
'8,820',
'53.706',
'53,711',
'51,735',
'50,376',
'53,051',
'54,703',
'53,563',
'57,303',
'53,719',
'53,575',
'53.715',
'56,511',
'53,701']
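%% Cell type:markdown id: tags:
A `set` has no guaranteed order, so the order of `list(set(zip_codes))` can differ from the order in the file. If first-seen order matters, one simple option is to build the unique list by hand with `not in`. The sketch below uses a few made-up zip code strings in the same format as the survey data:
%% Cell type:code id: tags:
```python
# Made-up sample in the same string format as the survey column.
sample_zip_codes = ["53,706", "53,703", "53,706", "53,715", "53,703"]

# Keep only the first occurrence of each value, preserving order.
unique_zip_codes = []
for zip_code in sample_zip_codes:
    if zip_code not in unique_zip_codes:
        unique_zip_codes.append(zip_code)

print(unique_zip_codes)  # ['53,706', '53,703', '53,715']
```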
%% Cell type:markdown id:31a381fe tags:
## Self-practice
%% Cell type:markdown id:8ac26620 tags:
### How many students are both a procrastinator and a pet owner?
%% Cell type:markdown id:172141ea tags:
### What percentage of 18-year-olds have their major declared as "Other"?
%% Cell type:markdown id:d9a7a2b1 tags:
### How old is the oldest basil/spinach-loving Business major?
%% Cell type:markdown id:47524973 tags:
# Announcements - Tabular Data, CSV Files
* Download ALL files for today's lecture
* P3 Last day to request regrades
* P4 Last day to turn in late
* P5 Due today
* Q4 Released tonight at 5 pm
* Exam
    * We estimate results will be available Thursday
* Quiz 4 released tonight; due Friday
* Office Hours - long lines?
    * Check the calendar
    * Find a partner!
    * Be patient on Tuesdays and Wednesdays
%% Cell type:markdown id:72348536 tags:
# Comma Separated Values (CSV)
%% Cell type:code id:ba562f5e tags:
``` python
import csv
```
%% Cell type:markdown id:ddcf7595 tags:
### Warmup 1
- Use the `list` type's `index` method to look up the index of "ice cream"
- Take a look at other list methods: https://www.w3schools.com/python/python_ref_list.asp
%% Cell type:code id:a0fecc18 tags:
``` python
dairy = ["milk", "ice cream", "cheese", "yogurt" ]
print()
```
%% Cell type:markdown id:a1a4e2e7 tags:
### Warmup 2
Use the `in` operator to complete the condition that checks whether each item in `food_shelf` is a dairy product.
%% Cell type:code id:eae06501 tags:
``` python
food_shelf = ["peanut butter", "milk", "bread", "cheese", "YOGURT"]
for item in food_shelf:
    if ???:
        print(item, "is dairy")
    else:
        print(item, "is not dairy")
```
%% Cell type:markdown id:8a5f548e tags:
### Warmup 3
Determine median of a list.
Examples:
- Median of [1, 2, 3, 4, 5] is: 3
- Median of [1, 2, 3, 4, 5, 6] is: 3.5
%% Cell type:code id:2f610ffe tags:
``` python
def median(some_items):
    """
    Returns median of a list passed as argument
    """
    pass
```
%% Cell type:code id:e9340eaa tags:
``` python
nums = [5, 4, 3, 2, 1]
print(nums, median(nums))
nums = [6, 5, 4, 3, 2, 1]
print(nums, median(nums))
vals = ["A", "C", "B"]
print(vals, median(vals))
vals = ["A", "C", "B", "D"]
# print(vals, median(vals)) # does not work due to TypeError
```
%% Cell type:markdown id:a9d5085c tags:
## Learning Objectives:
- Open an Excel file and export it to a Comma Separated Value file.
- Open a CSV file in TextEditor/Jupyter and connect the elements of the CSV file to the rows and columns in the spreadsheet.
- Use pre-written Python code to read a CSV file into a list of lists.
- Write Python statements with double list indexing to access any element of a CSV file via a list of lists.
- Write code that answers questions about CSV data by writing for loops on lists of lists.
%% Cell type:code id:9d936c1c tags:
``` python
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    # open the file; it's a UTF-8 text file
    example_file = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    example_reader = csv.reader(example_file)
    # use the built-in list function to convert this into a list of lists
    example_data = list(example_reader)
    # close the file to tidy up our workspace
    example_file.close()
    # return the list of lists
    return example_data
```
%% Cell type:markdown id:89621c98 tags:
### Student Information Survey data
%% Cell type:code id:d3c252b4 tags:
``` python
# TODO: call the process_csv function and store the list of lists in cs220_csv
```
%% Cell type:code id:5838ae5f tags:
``` python
# Store the header row into cs220_header, using indexing
cs220_header = ???
cs220_header
```
%% Cell type:code id:66fda88d tags:
``` python
# TODO: Store all of the data rows into cs220_data, using slicing
cs220_data = ???
# TODO: use slicing to display top 3 rows data
cs220_data[:3]
```
%% Cell type:markdown id:4267fe3e tags:
### What is the Sleep habit for the 2nd student?
%% Cell type:code id:4b8dbe8b tags:
``` python
# bad example: we hard-coded the column index
```
%% Cell type:markdown id:4f125240 tags:
What if we decided to add a new column before "Sleep habit"? A hard-coded column index would then point at the wrong column, and your code would no longer work.
Instead of hard-coding the column index, use the `index` method to look up the column index from the header variable. This also makes your code much more readable.
%% Cell type:code id:f2e52e06 tags:
``` python
```
%% Cell type:markdown id:5d298a4c tags:
### What is the Lecture of the 4th student?
%% Cell type:code id:3617b3de tags:
``` python
```
%% Cell type:markdown id:059de363 tags:
### Create a list containing Age of all students 10 years from now
%% Cell type:code id:45909f22 tags:
``` python
```
%% Cell type:markdown id:8e18663d tags:
### cell function
- It would be very helpful to define a cell function, which can handle missing data and type conversions
%% Cell type:code id:bba90038 tags:
``` python
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name
    # TODO: get the value of cs220_data at the specified cell
    # TODO: handle missing values, by returning None
    # TODO: handle type conversions
```
%% Cell type:markdown id:b7c8e726 tags:
### Find average age per lecture.
%% Cell type:code id:f0a05e42 tags:
``` python
# TODO: initialize 4 lists for the 4 lectures
# Iterate over the data and populate the lists
# TODO: compute average age of each lecture
print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
```
%% Cell type:markdown id:64fd0945 tags:
### Find all unique zip codes.
%% Cell type:code id:c28e77ce tags:
``` python
# TODO: initialize list to keep track of zip codes
zip_codes = []
for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")
    if zip_code is not None:
        zip_codes.append(zip_code)
zip_codes # How do we get the unique values?
```
%% Cell type:markdown id:31a381fe tags:
## Self-practice
%% Cell type:markdown id:8ac26620 tags:
### How many students are both a procrastinator and a pet owner?
%% Cell type:markdown id:172141ea tags:
### What percentage of 18-year-olds have their major declared as "Other"?
%% Cell type:markdown id:d9a7a2b1 tags:
### How old is the oldest basil/spinach-loving Business major?