Lec15 updated

Commit 3775be37 authored 2 years ago by gsingh58
--- a/s23/Gurmail_lecture_notes/15_CSV_Files/lec_15_CSV.ipynb
+++ b/s23/Gurmail_lecture_notes/15_CSV_Files/lec_15_CSV.ipynb
 {
 "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "9330bd96",
+   "metadata": {},
+   "source": [
+    "# Announcements - Tabular Data, CSV Files\n",
+    "\n",
+    "* Download ALL files for today's lecture\n",
+    "* P3 Last day to request regrades\n",
+    "* P4 Last day to turn in late\n",
+    "* P5 Due today\n",
+    "* Q4 Released tonight at 5 pm\n",
+    "* Exam\n",
+    "  * We estimate results will be available Thursday\n",
+    "* Quiz 4 released tonight : due Friday\n",
+    "* Office Hours - long lines?\n",
+    "  * Check the calendar\n",
+    "  * Find a partner!\n",
+    "  * Be patient on Tuesdays and Wednesdays"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "72348536",
@@ -655,7 +676,7 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
@@ -669,7 +690,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.8.8"
+   "version": "3.9.13"
  }
 },
 "nbformat": 4,

+%% Cell type:markdown id:9330bd96 tags:
+
+# Announcements - Tabular Data, CSV Files
+
+* Download ALL files for today's lecture
+* P3 Last day to request regrades
+* P4 Last day to turn in late
+* P5 Due today
+* Q4 Released tonight at 5 pm
+* Exam
+  * We estimate results will be available Thursday
+* Quiz 4 released tonight : due Friday
+* Office Hours - long lines?
+  * Check the calendar
+  * Find a partner!
+  * Be patient on Tuesdays and Wednesdays
+
 %% Cell type:markdown id:72348536 tags:

 # Comma Separated Values (CSV)

 %% Cell type:code id:ba562f5e tags:

 ``` python
 import csv
 ```

 %% Cell type:markdown id:ddcf7595 tags:

 ### Warmup 1

 - Use the `list` type's `index` method to look up the index of "ice cream"
 - Take a look at other list methods: https://www.w3schools.com/python/python_ref_list.asp

 %% Cell type:code id:a0fecc18 tags:

 ``` python
 dairy = ["milk", "ice cream", "cheese", "yogurt" ]
 print(dairy.index("ice cream"))
 # print(dairy.index("paneer")) # doesn't work due to ValueError (runtime error)
 ```

 %% Output

    1
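Since `index` raises a `ValueError` when the item is missing, one way to avoid the crash is to check membership with `in` first. The sketch below assumes a helper named `safe_index`, which is hypothetical and not part of the lecture code.

```python
dairy = ["milk", "ice cream", "cheese", "yogurt"]

def safe_index(items, target):
    """Return the index of target in items, or None if it is absent.

    Hypothetical helper for illustration; not part of the lecture code.
    """
    if target in items:
        return items.index(target)
    return None

print(safe_index(dairy, "ice cream"))  # 1
print(safe_index(dairy, "paneer"))     # None
```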

 %% Cell type:markdown id:a1a4e2e7 tags:

 ### Warmup 2
 Use `in` operator to complete the condition to check if food_shelf contains any dairy products.

 %% Cell type:code id:eae06501 tags:

 ``` python
 food_shelf = ["peanut butter", "milk", "bread", "cheese", "YOGURT"]
 for item in food_shelf:
    if item.lower() in dairy:
        print(item, "is dairy")
    else:
        print(item, "is not dairy")
 ```

 %% Output

    peanut butter is not dairy
    milk is dairy
    bread is not dairy
    cheese is dairy
    YOGURT is dairy

 %% Cell type:markdown id:75811c5d tags:

 ## Warmup 3
 Determine median of a list.

 Examples:
 - Median of [1, 2, 3, 4, 5] is: 3
 - Median of [1, 2, 3, 4, 5, 6] is: 3.5

 %% Cell type:code id:97b3f76f tags:

 ``` python
 def median(some_items):
    """
    Returns median of a list passed as argument
    """
    some_items.sort()
    n = len(some_items)

    if n % 2 == 1:
        return some_items[n // 2]
    else:
        first_middle = some_items[n//2 - 1]
        second_middle = some_items[n // 2]
        median = (first_middle + second_middle) / 2
        return median
 ```

 %% Cell type:code id:a4d704c0 tags:

 ``` python
 nums = [5, 4, 3, 2, 1]
 print("Median of", nums, "is" , median(nums))

 nums = [6, 5, 4, 3, 2, 1]
 print("Median of", nums, "is" , median(nums))

 vals = ["A", "C", "B"]
 print("Median of", vals, "is" , median(vals))

 vals = ["A", "C", "B", "D"]
 # print("Median of", vals, "is" , median(vals)) # does not work due to TypeError: ("B" + "C") / 2 divides a str by an int
 ```

 %% Output

    Median of [1, 2, 3, 4, 5] is 3
    Median of [1, 2, 3, 4, 5, 6] is 3.5
    Median of ['A', 'B', 'C'] is B
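Note that the printed lists above appear sorted because `some_items.sort()` reorders the caller's list in place. A minimal sketch of a variant that leaves its argument untouched by working on a copy from `sorted()` follows; the name `median_sorted_copy` is an assumption made for illustration. The standard library's `statistics.median` is shown alongside for comparison on numeric data.

```python
import statistics

def median_sorted_copy(some_items):
    """Return the median without modifying the list passed in.

    Hypothetical variant of the lecture's median function.
    """
    ordered = sorted(some_items)  # sorted() returns a new list
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

nums = [6, 5, 4, 3, 2, 1]
print(median_sorted_copy(nums))   # 3.5
print(nums)                       # [6, 5, 4, 3, 2, 1], order unchanged
print(statistics.median(nums))    # 3.5
```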

 %% Cell type:markdown id:a9d5085c tags:

 ## Learning Objectives:

 - Open an Excel file and export it to a Comma Separated Value file.
 - Open a CSV file in TextEditor/Jupyter and connect the elements of the CSV file to the rows and columns in the spreadsheet.
 - Use pre-written Python code to read a CSV file into a list of lists.
 - Write Python statements with double list indexing to access any element of a CSV file via a list of lists.
 - Write code that answers questions about CSV data by writing for loops on lists of lists.

 %% Cell type:code id:9d936c1c tags:

 ``` python
 # inspired by https://automatetheboringstuff.com/2e/chapter16/
 def process_csv(filename):
    # open the file; it's a UTF-8 encoded text file
    example_file = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    example_reader = csv.reader(example_file)
    # use the built-in list function to convert this into a list of lists
    example_data = list(example_reader)
    # close the file to tidy up our workspace
    example_file.close()
    # return the list of lists

    return example_data
 ```
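A common alternative to calling `close()` by hand is a `with` block, which closes the file even if reading fails partway. The sketch below is one possible rewrite, not the course's official version: the name `process_csv_with` and the temporary demo file are assumptions made so the example runs on its own. Passing `newline=""` follows the recommendation in the `csv` module documentation.

```python
import csv
import os
import tempfile

def process_csv_with(filename):
    """Read a CSV file into a list of lists; the file closes automatically."""
    with open(filename, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))

# Build a tiny demo file (hypothetical data) so the sketch is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows([["Lecture", "Age"], ["LEC001", "19"]])

demo_rows = process_csv_with(demo_path)
print(demo_rows)  # [['Lecture', 'Age'], ['LEC001', '19']]
```

Every value comes back as a string, which is why later cells convert `Age` with `int`.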

 %% Cell type:markdown id:89621c98 tags:

 ### Student Information Survey data

 %% Cell type:code id:d3c252b4 tags:

 ``` python
 # TODO: call the process_csv function and store the list of lists in cs220_csv
 cs220_csv = process_csv("cs220_survey_data.csv")
 ```

 %% Cell type:code id:5838ae5f tags:

 ``` python
 # Store the header row into cs220_header, using indexing
 cs220_header = cs220_csv[0]
 cs220_header
 ```

 %% Output

    ['section',
     'Lecture',
     'Age',
     'Primary major',
     'Other Primary Major',
     'Other majors',
     'Zip Code',
     'Latitude',
     'Longitude',
     'Pet owner',
     'Pizza topping',
     'Pet owner',
     'Runner',
     'Sleep habit',
     'Procrastinator',
     'Song']

 %% Cell type:code id:66fda88d tags:

 ``` python
 # TODO: Store all of the data rows into cs220_data, using slicing
 cs220_data = cs220_csv[1:]

 # TODO: use slicing to display the first 3 rows of data
 cs220_data[:3]
 ```

 %% Output

    [['COMP SCI 220:LAB345, COMP SCI 220:LEC004',
      'LEC004',
      '',
      'Other (please provide details below).',
      '',
      '',
      '53,706',
      '22.5726',
      '88.3639',
      'No',
      'pepperoni',
      'dog',
      'No',
      'night owl',
      'Yes',
      'Island in the Sun - Harry Belafonte'],
     ['COMP SCI 220:LEC003, COMP SCI 220:LAB332',
      'LEC001',
      '19',
      'Engineering: Mechanical',
      '',
      '',
      '53,703',
      '44.5876',
      '-71.9466',
      'No',
      'pepperoni',
      'dog',
      'No',
      'night owl',
      'Yes',
      'No role modelz by J. Cole'],
     ['COMP SCI 220:LAB325, COMP SCI 220:LEC002',
      'LEC002',
      '18',
      'Engineering: Mechanical',
      '.',
      '.',
      '53,706',
      '40.7128',
      '-74.006',
      'Maybe',
      'none (just cheese)',
      'dog',
      'No',
      'night owl',
      'Yes',
      '\xa0biggest bird']]

 %% Cell type:markdown id:4267fe3e tags:

 ### What is the Sleep habit for the 2nd student?

 %% Cell type:code id:4b8dbe8b tags:

 ``` python
 cs220_data[1][9] # bad example: we hard-coded the column index (index 9 is actually "Pet owner", not "Sleep habit")
 ```

 %% Output

    'No'

 %% Cell type:markdown id:4f125240 tags:

 What if we decided to add a new column before "Sleep habit"? A hard-coded column index would then point at the wrong column, and your code would no longer work.

 Instead of hard-coding the column index, use the `index` method to look up the column index from the header variable. This also makes your code much more readable.

 %% Cell type:code id:f2e52e06 tags:

 ``` python
 cs220_data[1][cs220_header.index("Sleep habit")]
 ```

 %% Output

    'night owl'
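One caveat with `index`: it returns the position of the first match only, and the header printed earlier lists 'Pet owner' twice. The sketch below uses a shortened, hypothetical header to show how both positions could be found with `enumerate`.

```python
# Shortened header for illustration; the real header has 16 columns.
header = ["Lecture", "Pet owner", "Pizza topping", "Pet owner"]

# index() stops at the first match
print(header.index("Pet owner"))  # 1

# enumerate() lets us collect every matching position
positions = [i for i, name in enumerate(header) if name == "Pet owner"]
print(positions)  # [1, 3]
```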

 %% Cell type:markdown id:5d298a4c tags:

 ### What is the Lecture of the 4th student?

 %% Cell type:code id:3617b3de tags:

 ``` python
 cs220_data[3][cs220_header.index("Lecture")]
 ```

 %% Output

    'LEC001'

 %% Cell type:markdown id:059de363 tags:

 ### Create a list containing Age of all students 10 years from now

 %% Cell type:code id:45909f22 tags:

 ``` python
 ages_in_ten_years = []

 for row in cs220_data:
    age = row[cs220_header.index("Age")]

    if age == '':
        continue

    age = int(age)
    ages_in_ten_years.append(age + 10)

 ages_in_ten_years[:3]
 ```

 %% Output

    [29, 28, 32]

 %% Cell type:markdown id:8e18663d tags:

 ### cell function

 - It would be very helpful to define a cell function, which can handle missing data and type conversions

 %% Cell type:code id:bba90038 tags:

 ``` python
 def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name
    col_idx = cs220_header.index(col_name)

    # TODO: get the value of cs220_data at the specified cell
    val = cs220_data[row_idx][col_idx]

    # TODO: handle missing values, by returning None
    if val == '':
        return None

    # TODO: handle type conversions
    if col_name in ["Age",]:
        return int(val)
    elif col_name in ['Latitude', 'Longitude']:
        return float(val)

    return val
 ```
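The `cell` function above reads the global variables `cs220_header` and `cs220_data`. As a sketch of the same logic without globals, the variant below takes the header and data as parameters; the name `cell_from` and the sample rows are assumptions made so the example can run without the survey file.

```python
def cell_from(header, data, row_idx, col_name):
    """Same steps as cell(), but header and data are passed in explicitly."""
    col_idx = header.index(col_name)
    val = data[row_idx][col_idx]

    # missing values become None
    if val == "":
        return None

    # type conversions by column name
    if col_name == "Age":
        return int(val)
    if col_name in ("Latitude", "Longitude"):
        return float(val)
    return val

# Hypothetical sample modeled on the first two survey rows shown earlier.
sample_header = ["Lecture", "Age", "Latitude"]
sample_data = [["LEC004", "", "22.5726"], ["LEC001", "19", "44.5876"]]

print(cell_from(sample_header, sample_data, 0, "Age"))       # None
print(cell_from(sample_header, sample_data, 1, "Age"))       # 19
print(cell_from(sample_header, sample_data, 0, "Latitude"))  # 22.5726
print(cell_from(sample_header, sample_data, 1, "Lecture"))   # LEC001
```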

 %% Cell type:markdown id:b7c8e726 tags:

 ### Find average age per lecture.

 %% Cell type:code id:f0a05e42 tags:

 ``` python
 # TODO: initialize 4 lists for the 4 lectures
 lec1_ages = []
 lec2_ages = []
 lec3_ages = []
 lec4_ages = []

 # Iterate over the data and populate the lists

 for row_idx in range(len(cs220_data)):
    age = cell(row_idx, "Age")

    if age is not None and 0 < age < 125:
        lecture = cell(row_idx, "Lecture")
        if lecture == "LEC001":
            lec1_ages.append(age)
        elif lecture == "LEC002":
            lec2_ages.append(age)
        elif lecture == "LEC003":
            lec3_ages.append(age)
        elif lecture == "LEC004":
            lec4_ages.append(age)

 # TODO: compute average age of each lecture
 print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
 print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
 print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
 print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
 ```

 %% Output

    LEC001 average student age: 19.71
    LEC002 average student age: 20.24
    LEC003 average student age: 19.41
    LEC004 average student age: 19.43
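Keeping one list per lecture works for four lectures but needs a new variable and a new `elif` for each additional one. A possible alternative, sketched here with a dictionary keyed by lecture name, grows automatically. The `(lecture, age)` pairs below are made-up sample values, not survey results.

```python
# Made-up (lecture, age) pairs standing in for cell(row_idx, ...) lookups.
rows = [("LEC001", 19), ("LEC002", 18), ("LEC001", 21), ("LEC002", 20)]

ages_by_lecture = {}
for lecture, age in rows:
    if lecture not in ages_by_lecture:
        ages_by_lecture[lecture] = []   # first time we see this lecture
    ages_by_lecture[lecture].append(age)

averages = {}
for lecture in sorted(ages_by_lecture):
    ages = ages_by_lecture[lecture]
    averages[lecture] = round(sum(ages) / len(ages), 2)
    print(lecture, "average student age:", averages[lecture])
```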

 %% Cell type:markdown id:64fd0945 tags:

 ### Find all unique zip codes.

 %% Cell type:code id:c28e77ce tags:

 ``` python
 # TODO: initialize a list to keep track of zip codes
 zip_codes = []

 for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")

    if zip_code is not None:
        zip_codes.append(zip_code)

 list(set(zip_codes))
 ```

 %% Output

    ['94,596',
     '53,590',
     '53,121',
     '53,589',
     '53,527',
     '95,030',
     '53,572',
     '53,706',
     '53,705',
     '53,726',
     '60,517',
     '53,703',
     '53,704',
     '53,715',
     '53,562',
     '53,150',
     '52,703',
     '53,716',
     '53,521',
     '53,713',
     '53,558',
     '98,607',
     '533,706',
     '-53,703',
     '1,520',
     '93,703',
     '8,820',
     '53.706',
     '53,711',
     '51,735',
     '50,376',
     '53,051',
     '54,703',
     '53,563',
     '57,303',
     '53,719',
     '53,575',
     '53.715',
     '56,511',
     '53,701']
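The order of the list above is arbitrary because a `set` does not preserve insertion order. Two ways to get a predictable result are sketched below on a small made-up sample: sorting the set, or building the unique list with a loop and `not in` so that first-seen order is kept.

```python
# Small made-up sample in the same string format as the survey data.
zip_codes = ["53,706", "53,703", "53,706", "53.715"]

# Option 1: sort the unique values (string comparison, not numeric)
unique_sorted = sorted(set(zip_codes))
print(unique_sorted)    # ['53,703', '53,706', '53.715']

# Option 2: keep the order in which each value first appears
unique_in_order = []
for zip_code in zip_codes:
    if zip_code not in unique_in_order:
        unique_in_order.append(zip_code)
print(unique_in_order)  # ['53,706', '53,703', '53.715']
```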

 %% Cell type:markdown id:31a381fe tags:

 ## Self-practice

 %% Cell type:markdown id:8ac26620 tags:

 ### How many students are both a procrastinator and a pet owner?

 %% Cell type:markdown id:172141ea tags:

 ### What percentage of 18-year-olds have their major declared as "Other"?

 %% Cell type:markdown id:d9a7a2b1 tags:

 ### How old is the oldest basil/spinach-loving Business major?

--- a/s23/Gurmail_lecture_notes/15_CSV_Files/lec_15_CSV_template_Gurmail_lec1.ipynb
+++ b/s23/Gurmail_lecture_notes/15_CSV_Files/lec_15_CSV_template_Gurmail_lec1.ipynb
 {
 "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "47524973",
+   "metadata": {},
+   "source": [
+    "# Announcements - Tabular Data, CSV Files\n",
+    "\n",
+    "* Download ALL files for today's lecture\n",
+    "* P3 Last day to request regrades\n",
+    "* P4 Last day to turn in late\n",
+    "* P5 Due today\n",
+    "* Q4 Released tonight at 5 pm\n",
+    "* Exam\n",
+    "  * We estimate results will be available Thursday\n",
+    "* Quiz 4 released tonight : due Friday\n",
+    "* Office Hours - long lines?\n",
+    "  * Check the calendar\n",
+    "  * Find a partner!\n",
+    "  * Be patient on Tuesdays and Wednesdays"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "72348536",
@@ -382,7 +403,7 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
@@ -396,7 +417,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.8.8"
+   "version": "3.9.13"
  }
 },
 "nbformat": 4,

+%% Cell type:markdown id:47524973 tags:
+
+# Announcements - Tabular Data, CSV Files
+
+* Download ALL files for today's lecture
+* P3 Last day to request regrades
+* P4 Last day to turn in late
+* P5 Due today
+* Q4 Released tonight at 5 pm
+* Exam
+  * We estimate results will be available Thursday
+* Quiz 4 released tonight : due Friday
+* Office Hours - long lines?
+  * Check the calendar
+  * Find a partner!
+  * Be patient on Tuesdays and Wednesdays
+
 %% Cell type:markdown id:72348536 tags:

 # Comma Separated Values (CSV)

 %% Cell type:code id:ba562f5e tags:

 ``` python
 import csv
 ```

 %% Cell type:markdown id:ddcf7595 tags:

 ### Warmup 1

 - Use the `list` type's `index` method to look up the index of "ice cream"
 - Take a look at other list methods: https://www.w3schools.com/python/python_ref_list.asp

 %% Cell type:code id:a0fecc18 tags:

 ``` python
 dairy = ["milk", "ice cream", "cheese", "yogurt" ]
 print()
 ```

 %% Cell type:markdown id:a1a4e2e7 tags:

 ### Warmup 2
 Use `in` operator to complete the condition to check if food_shelf contains any dairy products.

 %% Cell type:code id:eae06501 tags:

 ``` python
 food_shelf = ["peanut butter", "milk", "bread", "cheese", "YOGURT"]
 for item in food_shelf:
    if ???:
        print(item, "is dairy")
    else:
        print(item, "is not dairy")
 ```

 %% Cell type:markdown id:8a5f548e tags:

 ## Warmup 3
 Determine median of a list.

 Examples:
 - Median of [1, 2, 3, 4, 5] is: 3
 - Median of [1, 2, 3, 4, 5, 6] is: 3.5

 %% Cell type:code id:2f610ffe tags:

 ``` python
 def median(some_items):
    """
    Returns median of a list passed as argument
    """
    pass
 ```

 %% Cell type:code id:e9340eaa tags:

 ``` python
 nums = [5, 4, 3, 2, 1]
 print(nums, median(nums))

 nums = [6, 5, 4, 3, 2, 1]
 print(nums, median(nums))

 vals = ["A", "C", "B"]
 print(vals, median(vals))

 vals = ["A", "C", "B", "D"]
 # print(vals, median(vals)) # does not work due to TypeError
 ```

 %% Cell type:markdown id:a9d5085c tags:

 ## Learning Objectives:

 - Open an Excel file and export it to a Comma Separated Value file.
 - Open a CSV file in TextEditor/Jupyter and connect the elements of the CSV file to the rows and columns in the spreadsheet.
 - Use pre-written Python code to read a CSV file into a list of lists.
 - Write Python statements with double list indexing to access any element of a CSV file via a list of lists.
 - Write code that answers questions about CSV data by writing for loops on lists of lists.

 %% Cell type:code id:9d936c1c tags:

 ``` python
 # inspired by https://automatetheboringstuff.com/2e/chapter16/
 def process_csv(filename):
    # open the file; it's a UTF-8 encoded text file
    example_file = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    example_reader = csv.reader(example_file)
    # use the built-in list function to convert this into a list of lists
    example_data = list(example_reader)
    # close the file to tidy up our workspace
    example_file.close()
    # return the list of lists

    return example_data
 ```

 %% Cell type:markdown id:89621c98 tags:

 ### Student Information Survey data

 %% Cell type:code id:d3c252b4 tags:

 ``` python
 # TODO: call the process_csv function and store the list of lists in cs220_csv
 ```

 %% Cell type:code id:5838ae5f tags:

 ``` python
 # Store the header row into cs220_header, using indexing
 cs220_header = ???
 cs220_header
 ```

 %% Cell type:code id:66fda88d tags:

 ``` python
 # TODO: Store all of the data rows into cs220_data, using slicing
 cs220_data = ???

 # TODO: use slicing to display the first 3 rows of data
 cs220_data[:3]
 ```

 %% Cell type:markdown id:4267fe3e tags:

 ### What is the Sleep habit for the 2nd student?

 %% Cell type:code id:4b8dbe8b tags:

 ``` python
 # bad example: we hard-coded the column index
 ```

 %% Cell type:markdown id:4f125240 tags:

 What if we decided to add a new column before "Sleep habit"? A hard-coded column index would then point at the wrong column, and your code would no longer work.

 Instead of hard-coding the column index, use the `index` method to look up the column index from the header variable. This also makes your code much more readable.

 %% Cell type:code id:f2e52e06 tags:

 ``` python
 ```

 %% Cell type:markdown id:5d298a4c tags:

 ### What is the Lecture of the 4th student?

 %% Cell type:code id:3617b3de tags:

 ``` python
 ```

 %% Cell type:markdown id:059de363 tags:

 ### Create a list containing Age of all students 10 years from now

 %% Cell type:code id:45909f22 tags:

 ``` python
 ```

 %% Cell type:markdown id:8e18663d tags:

 ### cell function

 - It would be very helpful to define a cell function, which can handle missing data and type conversions

 %% Cell type:code id:bba90038 tags:

 ``` python
 def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name

    # TODO: get the value of cs220_data at the specified cell

    # TODO: handle missing values, by returning None

    # TODO: handle type conversions
 ```

 %% Cell type:markdown id:b7c8e726 tags:

 ### Find average age per lecture.

 %% Cell type:code id:f0a05e42 tags:

 ``` python
 # TODO: initialize 4 lists for the 4 lectures


 # Iterate over the data and populate the lists


 # TODO: compute average age of each lecture
 print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
 print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
 print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
 print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
 ```

 %% Cell type:markdown id:64fd0945 tags:

 ### Find all unique zip codes.

 %% Cell type:code id:c28e77ce tags:

 ``` python
 # TODO: initialize a list to keep track of zip codes
 zip_codes = []

 for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")

    if zip_code is not None:
        zip_codes.append(zip_code)

 zip_codes # How do we get the unique values?
 ```

 %% Cell type:markdown id:31a381fe tags:

 ## Self-practice

 %% Cell type:markdown id:8ac26620 tags:

 ### How many students are both a procrastinator and a pet owner?

 %% Cell type:markdown id:172141ea tags:

 ### What percentage of 18-year-olds have their major declared as "Other"?

 %% Cell type:markdown id:d9a7a2b1 tags:

 ### How old is the oldest basil/spinach-loving Business major?

--- a/s23/Gurmail_lecture_notes/15_CSV_Files/lec_15_CSV_template_Gurmail_lec2.ipynb
+++ b/s23/Gurmail_lecture_notes/15_CSV_Files/lec_15_CSV_template_Gurmail_lec2.ipynb
 {
 "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "7778f7e0",
+   "metadata": {},
+   "source": [
+    "# Announcements - Tabular Data, CSV Files\n",
+    "\n",
+    "* Download ALL files for today's lecture\n",
+    "* P3 Last day to request regrades\n",
+    "* P4 Last day to turn in late\n",
+    "* P5 Due today\n",
+    "* Q4 Released tonight at 5 pm\n",
+    "* Exam\n",
+    "  * We estimate results will be available Thursday\n",
+    "* Quiz 4 released tonight : due Friday\n",
+    "* Office Hours - long lines?\n",
+    "  * Check the calendar\n",
+    "  * Find a partner!\n",
+    "  * Be patient on Tuesdays and Wednesdays"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "72348536",
@@ -382,7 +403,7 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
@@ -396,7 +417,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.8.8"
+   "version": "3.9.13"
  }
 },
 "nbformat": 4,

+%% Cell type:markdown id:7778f7e0 tags:
+
+# Announcements - Tabular Data, CSV Files
+
+* Download ALL files for today's lecture
+* P3 Last day to request regrades
+* P4 Last day to turn in late
+* P5 Due today
+* Q4 Released tonight at 5 pm
+* Exam
+  * We estimate results will be available Thursday
+* Quiz 4 released tonight : due Friday
+* Office Hours - long lines?
+  * Check the calendar
+  * Find a partner!
+  * Be patient on Tuesdays and Wednesdays
+
 %% Cell type:markdown id:72348536 tags:

 # Comma Separated Values (CSV)

 %% Cell type:code id:ba562f5e tags:

 ``` python
 import csv
 ```

 %% Cell type:markdown id:ddcf7595 tags:

 ### Warmup 1

 - Use the `list` type's `index` method to look up the index of "ice cream"
 - Take a look at other list methods: https://www.w3schools.com/python/python_ref_list.asp

 %% Cell type:code id:a0fecc18 tags:

 ``` python
 dairy = ["milk", "ice cream", "cheese", "yogurt" ]
 print()
 ```

 %% Cell type:markdown id:a1a4e2e7 tags:

 ### Warmup 2
 Use `in` operator to complete the condition to check if food_shelf contains any dairy products.

 %% Cell type:code id:eae06501 tags:

 ``` python
 food_shelf = ["peanut butter", "milk", "bread", "cheese", "YOGURT"]
 for item in food_shelf:
    if ???:
        print(item, "is dairy")
    else:
        print(item, "is not dairy")
 ```

 %% Cell type:markdown id:8a5f548e tags:

 ## Warmup 3
 Determine median of a list.

 Examples:
 - Median of [1, 2, 3, 4, 5] is: 3
 - Median of [1, 2, 3, 4, 5, 6] is: 3.5

 %% Cell type:code id:2f610ffe tags:

 ``` python
 def median(some_items):
    """
    Returns median of a list passed as argument
    """
    pass
 ```

 %% Cell type:code id:e9340eaa tags:

 ``` python
 nums = [5, 4, 3, 2, 1]
 print(nums, median(nums))

 nums = [6, 5, 4, 3, 2, 1]
 print(nums, median(nums))

 vals = ["A", "C", "B"]
 print(vals, median(vals))

 vals = ["A", "C", "B", "D"]
 # print(vals, median(vals)) # does not work due to TypeError
 ```

 %% Cell type:markdown id:a9d5085c tags:

 ## Learning Objectives:

 - Open an Excel file and export it to a Comma Separated Value file.
 - Open a CSV file in TextEditor/Jupyter and connect the elements of the CSV file to the rows and columns in the spreadsheet.
 - Use pre-written Python code to read a CSV file into a list of lists.
 - Write Python statements with double list indexing to access any element of a CSV file via a list of lists.
 - Write code that answers questions about CSV data by writing for loops on lists of lists.

 %% Cell type:code id:9d936c1c tags:

 ``` python
 # inspired by https://automatetheboringstuff.com/2e/chapter16/
 def process_csv(filename):
    # open the file; it's a UTF-8 encoded text file
    example_file = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    example_reader = csv.reader(example_file)
    # use the built-in list function to convert this into a list of lists
    example_data = list(example_reader)
    # close the file to tidy up our workspace
    example_file.close()
    # return the list of lists

    return example_data
 ```

 %% Cell type:markdown id:89621c98 tags:

 ### Student Information Survey data

 %% Cell type:code id:d3c252b4 tags:

 ``` python
 # TODO: call the process_csv function and store the list of lists in cs220_csv
 ```

 %% Cell type:code id:5838ae5f tags:

 ``` python
 # Store the header row into cs220_header, using indexing
 cs220_header = ???
 cs220_header
 ```

 %% Cell type:code id:66fda88d tags:

 ``` python
 # TODO: Store all of the data rows into cs220_data, using slicing
 cs220_data = ???

 # TODO: use slicing to display the first 3 rows of data
 cs220_data[:3]
 ```

 %% Cell type:markdown id:4267fe3e tags:

 ### What is the Sleep habit for the 2nd student?

 %% Cell type:code id:4b8dbe8b tags:

 ``` python
 # bad example: we hard-coded the column index
 ```

 %% Cell type:markdown id:4f125240 tags:

 What if we decided to add a new column before "Sleep habit"? A hard-coded column index would then point at the wrong column, and your code would no longer work.

 Instead of hard-coding the column index, use the `index` method to look up the column index from the header variable. This also makes your code much more readable.

 %% Cell type:code id:f2e52e06 tags:

 ``` python
 ```

 %% Cell type:markdown id:5d298a4c tags:

 ### What is the Lecture of the 4th student?

 %% Cell type:code id:3617b3de tags:

 ``` python
 ```

 %% Cell type:markdown id:059de363 tags:

 ### Create a list containing Age of all students 10 years from now

 %% Cell type:code id:45909f22 tags:

 ``` python
 ```

 %% Cell type:markdown id:8e18663d tags:

 ### cell function

 - It would be very helpful to define a cell function, which can handle missing data and type conversions

 %% Cell type:code id:bba90038 tags:

 ``` python
 def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and
    the column name of a CSV file.
    """
    # TODO: get the index of col_name

    # TODO: get the value of cs220_data at the specified cell

    # TODO: handle missing values, by returning None

    # TODO: handle type conversions
 ```

 %% Cell type:markdown id:b7c8e726 tags:

 ### Find average age per lecture.

 %% Cell type:code id:f0a05e42 tags:

 ``` python
 # TODO: initialize 4 lists for the 4 lectures


 # Iterate over the data and populate the lists


 # TODO: compute average age of each lecture
 print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
 print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
 print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
 print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))
 ```

 %% Cell type:markdown id:64fd0945 tags:

 ### Find all unique zip codes.

 %% Cell type:code id:c28e77ce tags:

 ``` python
 # TODO: initialize a list to keep track of zip codes
 zip_codes = []

 for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")

    if zip_code is not None:
        zip_codes.append(zip_code)

 zip_codes # How do we get the unique values?
 ```

 %% Cell type:markdown id:31a381fe tags:

 ## Self-practice

 %% Cell type:markdown id:8ac26620 tags:

 ### How many students are both a procrastinator and a pet owner?

 %% Cell type:markdown id:172141ea tags:

 ### What percentage of 18-year-olds have their major declared as "Other"?

 %% Cell type:markdown id:d9a7a2b1 tags:

 ### How old is the oldest basil/spinach-loving Business major?