Commit 04aa65b0 authored by LOUIS TYRRELL OLIPHANT

lec 15 CSVs

parent 5c5a8cf6
%% Cell type:markdown id: tags:
## Warmup
%% Cell type:code id: tags:
``` python
# Warmup 1: Mutating a list
my_groceries = []
#use .append() to add an item to the end of a list
print(my_groceries)
#use .extend() to add *ALL* items from one list to another list
print(my_groceries)
#use .pop() to remove an item at a specific index
print(my_groceries)
#use .remove() to take out an item by its value
print(my_groceries)
```
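%% Cell type:markdown id: tags:
For comparison with the warmup above, here is a minimal sketch of the four mutating methods on a separate, made-up list. The name `snacks` and its items are illustrative assumptions, not part of the lecture data.

```python
# Hypothetical list used only to illustrate the four methods
snacks = []
snacks.append("chips")                # add one item to the end
snacks.extend(["salsa", "pretzels"])  # add every item from another list
removed = snacks.pop(0)               # remove and return the item at index 0
snacks.remove("pretzels")             # remove the first item equal to "pretzels"
print(snacks, removed)
```

Note that `.pop()` returns the item it removes, while `.remove()` returns `None`.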
%% Cell type:code id: tags:
``` python
# Warmup #2: Take a look at these list methods
# https://www.w3schools.com/python/python_ref_list.asp
dairy = ["milk", "ice cream", "cheese", "yogurt" ]
#use the .index() method to get the index of "ice cream"
```
%% Cell type:code id: tags:
``` python
# Warmup #3: Because a list is a sequence, we can use the 'in' operator
food_shelf = ["peanut butter", "milk", "bread", "cheese", "YOGURT"]
for item in food_shelf:
    if ???: # see if item.lower() is in the dairy list
        print(item, "is dairy")
    else:
        print(item, "is not dairy")
```
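%% Cell type:markdown id: tags:
The same two ideas, `.index()` and the `in` operator, can be sketched on unrelated data. The lists `citrus` and `basket` below are hypothetical and chosen only for illustration.

```python
# Hypothetical lists, not part of the lecture data
citrus = ["lemon", "lime", "orange"]
basket = ["Lemon", "apple", "ORANGE"]

# .index() returns the position of the first matching item
print(citrus.index("lime"))

matches = []
for fruit in basket:
    # compare in lowercase so that "Lemon" matches "lemon"
    if fruit.lower() in citrus:
        matches.append(fruit)
print(matches)
```

Lowercasing before the membership test matters because `in` on a list of strings compares whole values exactly, including case.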
%% Cell type:markdown id: tags:
# Comma-Separated Values (CSV) Files
## Readings
- [Sweigart Ch 16 (through "Reading Data from Reader Objects in a for Loop")](https://automatetheboringstuff.com/2e/chapter16/)
## Learning Objectives
After this lecture you will be able to...
- Open a spreadsheet file and export it to a comma-separated values (CSV) file.
- View a CSV file in Jupyter Lab's *CSV Viewer*.
- Open a CSV file in TextEditor/Jupyter and connect the elements of the CSV file to the rows and columns in the spreadsheet.
- Use pre-written Python code to read a CSV file into a list of lists.
- Write Python statements with double list indexing to access any element of a CSV file via a list of lists.
- Write code that answers questions about CSV data by writing for loops on lists of lists.
%% Cell type:markdown id: tags:
## CSV File Format
CSV is a non-proprietary format for sharing tables of data. Each row of the table is one line of the file, and the values within a row are separated by commas. It is very common (though not required) for the top row of a CSV file to contain the header names for the columns of data:
```
Animal,Country,Population
panda,China,1864
lemur,Madagascar,2500
macaque,India,100000
```
You can open CSV files using a spreadsheet program such as Microsoft Excel or Apple Numbers. In Jupyter Lab you can open a CSV file for viewing by double-clicking the file name. To open it for editing, right-click the file name and select *Open With -> Editor* in the context menu, which opens the file in a text editor.
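
To connect the text of a CSV file to rows and columns, here is a minimal sketch that parses a table like the sample above from a string rather than a file. Using `io.StringIO` in place of a real file is an assumption made only to keep the example self-contained.

```python
import csv
import io

# A small table in CSV form, held in a string instead of a file
csv_text = (
    "Animal,Country,Population\n"
    "panda,China,1864\n"
    "lemur,Madagascar,2500\n"
    "macaque,India,100000\n"
)

# csv.reader yields one list of strings per line of text
rows = list(csv.reader(io.StringIO(csv_text)))
header = rows[0]
for row in rows[1:]:
    print(row)
```

Every value comes back as a string, including the population numbers, so numeric columns need an explicit conversion such as `int()` before doing arithmetic.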
%% Cell type:markdown id: tags:
## Reading a CSV in Python
Python has a module for working with CSV files called `csv`. One common approach for reading in all the data from a CSV file is to load the data as a list of lists. Look at the function defined in the cell below which takes a file name, loads the data from the file using methods from the csv module and returns a list of lists.
%% Cell type:code id: tags:
``` python
# Now let's store the contents of the CSV file into Python lists
# copied from https://automatetheboringstuff.com/chapter14/
import csv

def process_csv(filename):
    # open the file; it's a UTF-8 text file
    exampleFile = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    exampleReader = csv.reader(exampleFile)
    # use the built-in list function to convert this into a list of lists
    exampleData = list(exampleReader)
    # close the file to tidy up our workspace
    exampleFile.close()
    # return the list of lists
    return exampleData
```
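%% Cell type:markdown id: tags:
As an alternative to opening and closing the file by hand, the same idea can be sketched with a `with` statement, which closes the file automatically. The function name `process_csv_with` and the temporary demo file are assumptions for illustration; this is not the version used in the lecture.

```python
import csv
import os
import tempfile

def process_csv_with(filename):
    # Hypothetical variant of process_csv(); the name is an assumption.
    # newline="" is what the csv module documentation recommends for
    # file objects passed to csv.reader.
    with open(filename, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))

# Small round-trip demo on a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write('Name,HEX,RGB\nAqua,#00FFFF,"rgb(0,255,255)"\n')
    demo = process_csv_with(path)
print(demo)
```

The quoted field `"rgb(0,255,255)"` is read as a single value even though it contains commas, because the reader treats text inside double quotes as one field.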
%% Cell type:markdown id: tags:
Finish the code in the cell below by using the `process_csv()` function to load the `colors.csv` file as a list of lists and store the data in the `colors` variable, then print the loaded data. Open the `colors.csv` file in a separate tab to see the data in a spreadsheet-style layout, and compare it with the printed list of lists.
%% Cell type:code id: tags:
``` python
##TODO load the 'colors.csv' data and store and print the data
colors = ...
print(colors)
```
%% Cell type:markdown id: tags:
You can use double indexing to access specific elements from the loaded data. Recall that the first indexing will access the row and the second indexing will access the column within that row:
```python
colors[3][0]
```
```
Aqua
```
```python
colors[3][2]
```
```
rgb(0,255,255)
```
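
Double indexing can also be tried without loading the file. The sketch below types the first four rows of `colors.csv` by hand; the variable name `colors_sample` is an assumption used only here.

```python
# First four rows of colors.csv, typed in by hand as a list of lists
colors_sample = [
    ["Name", "HEX", "RGB"],
    ["Alice Blue", "#F0F8FF", "rgb(239,247,255)"],
    ["Antique White", "#FAEBD7", "rgb(249,234,214)"],
    ["Aqua", "#00FFFF", "rgb(0,255,255)"],
]

row = colors_sample[3]      # the first index picks the row (a list)
print(row[0])               # the second index picks the column within that row
print(colors_sample[3][2])  # both steps written as one expression
```

Because the header occupies row 0, the first color in the file sits at row index 1.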
On web pages you can describe colors using the color's name, the color's hex value, or the color's rgb value. The function defined in the cell below accepts any of these three ways of describing a color and displays a tiny swatch of that color.
%% Cell type:code id: tags:
``` python
from IPython.display import HTML, display

def show_color(color):
    # remove spaces so that a name like 'Alice Blue' becomes 'AliceBlue'
    color = color.replace(' ', '')
    color_html = '<div style="background-color: {}; width: 50px; height: 20px;"></div>'.format(color)
    display(HTML(color_html))
```
%% Cell type:markdown id: tags:
### Using Double Indexing
Finish the code in the cell below by calling `show_color()` with specific values from the `colors` variable.
%% Cell type:code id: tags:
``` python
## Display Blanched Almond (index 9) using its name (index 0)
print(colors[9][0])
show_color(colors[9][0])
## Display Blue Violet (index 11) using the HEX value (index 1)
print(colors[11][1])
show_color(colors[11][1])
## Display Khaki (index 61) using the rgb value (index 2)
print(...)
show_color(...)
## You pick a color and display the color swatch using the name, HEX value, and rgb value
## also print the name, HEX value, and rgb value
```
%% Cell type:markdown id: tags:
Now let's work with our survey data. Finish the code in the cell below to load the data using the `process_csv()` function. Then separate the header row from the data rows, and print the header and the first 2 rows of the data.
%% Cell type:code id: tags:
``` python
# Call the process_csv function and store the list of lists in cs220_csv
cs220_csv = process_csv('cs220_survey_data.csv')
cs220_csv
cs220_header = ...
cs220_data = ...
print(cs220_header)
print(...)
```
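%% Cell type:markdown id: tags:
Separating a header row from the data rows is usually done with indexing and slicing. Here is a minimal sketch on a made-up table; `table`, `header`, `data`, and the values in them are hypothetical and unrelated to the survey file.

```python
# Hypothetical table: the first row holds the column names
table = [
    ["Name", "Age"],
    ["Ada", "19"],
    ["Grace", "21"],
    ["Alan", "20"],
]

header = table[0]   # a single row: the column names
data = table[1:]    # a slice: every row after the header
print(header)
print(data[:2])     # the first two data rows
```

After the split, `data[0]` refers to the first record rather than the header, which keeps row indexes aligned with "1st student", "2nd student", and so on (offset by one).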
%% Cell type:markdown id: tags:
## CSVs as a List of Lists
Finish the cells below to access and print the desired item. Since `cs220_data` is a list of lists, you will need to use double indexing.
%% Cell type:code id: tags:
``` python
# Print out the lecture number of the 4th student...by hardcoding its row and column....
cs220_data[???][???] # [row][col]
```
%% Cell type:code id: tags:
``` python
# Print out the sleeping habit for the 2nd student...by hardcoding its row and column....
cs220_data[???][???]
```
%% Cell type:code id: tags:
``` python
# Print out how many students completed the survey.
len(???)
```
%% Cell type:code id: tags:
``` python
# Print out every student's sleep habits and major
for i in range(len(cs220_data)):
    current_sleep_habit = ???
    current_major = ???
    print(current_sleep_habit + '\t\t' + current_major)
```
%% Cell type:code id: tags:
``` python
# FIX: Print out every student's age in 10 years.
for i in range(???):
    current_age = cs220_data[i][2]
    print(current_age + 10)
```
%% Cell type:markdown id: tags:
## It would be nice to have a helper function!
Let's introduce `cell`
%% Cell type:code id: tags:
``` python
# Remember creating cs220_header?
cs220_header
```
%% Cell type:code id: tags:
``` python
# Get the column index of "Pizza Topping"
cs220_header.index(???)
```
%% Cell type:code id: tags:
``` python
# We want to invoke something like...
# cell(24, "Cats or Dogs")
# cell(63, "Zip Code")
def cell(row_idx, col_name):
    col_idx = ??? # get the index of col_name
    val = ??? # get the value of cs220_data at the specified cell
    return val
```
%% Cell type:code id: tags:
``` python
# Print out the lecture number of the 4th student... using the cell function
cell(???, ???)
```
%% Cell type:code id: tags:
``` python
# Print out every student's sleep habits and major using the cell function
for i in range(len(cs220_data)):
    current_sleep_habit = ???
    current_major = ???
    print(current_sleep_habit + '\t\t' + current_major)
```
%% Cell type:code id: tags:
``` python
# Print out every student's age in 10 years using the cell function
# ... that didn't really help us here!
for i in range(len(cs220_data)):
    current_age = cell(i, "Age")
    if current_age is not None:
        print(current_age + 10)
```
%% Cell type:code id: tags:
``` python
# Improve the cell function so it returns the appropriate type.
# If there is nothing in the cell, return None
def cell(row_idx, col_name):
    col_idx = cs220_header.index(col_name)
    val = cs220_data[row_idx][col_idx]
    if ???:
        return None
    elif ???:
        return int(val)
    else:
        return val
```
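%% Cell type:markdown id: tags:
One possible shape for a type-aware lookup is sketched below on made-up data. The function name `typed_cell`, the `int_columns` parameter, and the sample table are all assumptions for illustration; this is not necessarily how the lecture's `cell` function is meant to be completed.

```python
# Hypothetical header and data rows, not the survey file
sample_header = ["Section", "Age", "Major"]
sample_data = [
    ["A", "19", "Engineering"],
    ["B", "", "Other"],
]

def typed_cell(header, data, row_idx, col_name, int_columns=("Age",)):
    """Look up one value by row index and column name, converting its type."""
    col_idx = header.index(col_name)   # column name -> column index
    val = data[row_idx][col_idx]       # double indexing: [row][col]
    if val == "":
        return None                    # an empty cell has no value
    if col_name in int_columns:
        return int(val)                # numeric columns become ints
    return val                         # everything else stays a string

print(typed_cell(sample_header, sample_data, 0, "Age"))
print(typed_cell(sample_header, sample_data, 1, "Age"))
```

Passing the header and data in as parameters, instead of relying on global variables, is a design choice that lets the same helper work on more than one table.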
%% Cell type:code id: tags:
``` python
# Print out every student's age in 10 years using the cell function
# ... much better!
for i in range(len(cs220_data)):
    current_age = cell(i, "Age")
    if current_age is not None:
        print(current_age + 10)
```
%% Cell type:code id: tags:
``` python
# Get the average age of each lecture...
```
%% Cell type:markdown id: tags:
## You try!
%% Cell type:markdown id: tags:
Complete the challenges below. First try completing the problem directly using the list of lists (e.g. double indexing \[\]\[\]), then try using the `cell` function!
%% Cell type:code id: tags:
``` python
# Of all runners, how many are procrastinators?
```
%% Cell type:code id: tags:
``` python
# What percentage of 18-year-olds have their major declared as "Other"?
```
%% Cell type:code id: tags:
``` python
# Does the oldest basil/spinach-loving Business major prefer cats, dogs, or neither?
```
%% Cell type:markdown id: tags:
## Summary
The `csv` module can be used to load CSV files and store them as a list of lists. Use double-indexing and helper functions to work with data in this format.
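
As a closing illustration, here is a minimal sketch of answering a question about tabular data with a `for` loop over a list of lists. The column names and values are hypothetical; the real survey file may use different headers.

```python
# Hypothetical header and data rows, not the survey file
header = ["Section", "Age", "Runner"]
data = [
    ["A", "19", "Yes"],
    ["B", "", "No"],
    ["A", "21", "Yes"],
    ["B", "20", "No"],
]

age_idx = header.index("Age")
section_idx = header.index("Section")

# Average age in section "A", skipping rows with a missing age
total = 0
count = 0
for row in data:
    if row[section_idx] == "A" and row[age_idx] != "":
        total += int(row[age_idx])  # values are strings until converted
        count += 1

average = total / count if count > 0 else None
print(average)
```

Looking up column positions once with `header.index()` avoids hardcoding numbers like `row[1]`, so the loop keeps working if the column order changes.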
Name,HEX,RGB
Alice Blue,#F0F8FF,"rgb(239,247,255)"
Antique White,#FAEBD7,"rgb(249,234,214)"
Aqua,#00FFFF,"rgb(0,255,255)"
Aquamarine,#7FFFD4,"rgb(127,255,211)"
Azure,#F0FFFF,"rgb(239,255,255)"
Beige,#F5F5DC,"rgb(244,244,219)"
Bisque,#FFE4C4,"rgb(255,226,196)"
Black,#000000,"rgb(0,0,0)"
Blanched Almond,#FFEBCD,"rgb(255,234,204)"
Blue,#0000FF,"rgb(0,0,255)"
Blue Violet,#8A2BE2,"rgb(137,43,226)"
Brown,#A52A2A,"rgb(165,40,40)"
Burlywood,#DEB887,"rgb(221,183,135)"
Cadet Blue,#5F9EA0,"rgb(94,158,160)"
Chartreuse,#7FFF00,"rgb(127,255,0)"
Chocolate,#D2691E,"rgb(209,104,30)"
Coral,#FF7F50,"rgb(255,127,79)"
Cornflower Blue,#6495ED,"rgb(99,147,237)"
Cornsilk,#FFF8DC,"rgb(255,247,219)"
Crimson,#DC143C,"rgb(219,20,61)"
Cyan,#00FFFF,"rgb(0,255,255)"
Dark Blue,#00008B,"rgb(0,0,140)"
Dark Cyan,#008B8B,"rgb(0,140,140)"
Dark Goldenrod,#B8860B,"rgb(183,135,10)"
Dark Gray,#A9A9A9,"rgb(168,168,168)"
Dark Green,#006400,"rgb(0,99,0)"
Dark Khaki,#BDB76B,"rgb(188,183,107)"
Dark Magenta,#8B008B,"rgb(140,0,140)"
Dark Olive Green,#556B2F,"rgb(84,107,45)"
Dark Orange,#FF8C00,"rgb(255,140,0)"
Dark Orchid,#9932CC,"rgb(153,51,204)"
Dark Red,#8B0000,"rgb(140,0,0)"
Dark Salmon,#E9967A,"rgb(232,150,122)"
Dark Sea Green,#8FBC8F,"rgb(142,188,142)"
Dark Slate Blue,#483D8B,"rgb(71,61,140)"
Dark Slate Gray,#2F4F4F,"rgb(45,79,79)"
Dark Turquoise,#00CED1,"rgb(0,206,209)"
Dark Violet,#9400D3,"rgb(147,0,211)"
Deep Pink,#FF1493,"rgb(255,20,147)"
Deep Sky Blue,#00BFFF,"rgb(0,191,255)"
Dim Gray,#696969,"rgb(104,104,104)"
Dodger Blue,#1E90FF,"rgb(30,142,255)"
Firebrick,#B22222,"rgb(178,33,33)"
Floral White,#FFFAF0,"rgb(255,249,239)"
Forest Green,#228B22,"rgb(33,140,33)"
Fuchsia,#FF00FF,"rgb(255,0,255)"
Gainsboro,#DCDCDC,"rgb(219,219,219)"
Ghost White,#F8F8FF,"rgb(247,247,255)"
Gold,#FFD700,"rgb(255,214,0)"
Goldenrod,#DAA520,"rgb(216,165,33)"
Gray,#BEBEBE,"rgb(191,191,191)"
Web Gray,#808080,"rgb(127,127,127)"
Green,#00FF00,"rgb(0,255,0)"
Web Green,#008000,"rgb(0,127,0)"
Green Yellow,#ADFF2F,"rgb(173,255,45)"
Honeydew,#F0FFF0,"rgb(239,255,239)"
Hot Pink,#FF69B4,"rgb(255,104,181)"
Indian Red,#CD5C5C,"rgb(204,91,91)"
Indigo,#4B0082,"rgb(73,0,130)"
Ivory,#FFFFF0,"rgb(255,255,239)"
Khaki,#F0E68C,"rgb(239,229,140)"
Lavender,#E6E6FA,"rgb(229,229,249)"
Lavender Blush,#FFF0F5,"rgb(255,239,244)"
Lawn Green,#7CFC00,"rgb(124,252,0)"
Lemon Chiffon,#FFFACD,"rgb(255,249,204)"
Light Blue,#ADD8E6,"rgb(173,216,229)"
Light Coral,#F08080,"rgb(239,127,127)"
Light Cyan,#E0FFFF,"rgb(224,255,255)"
Light Goldenrod,#FAFAD2,"rgb(249,249,209)"
Light Gray,#D3D3D3,"rgb(211,211,211)"
Light Green,#90EE90,"rgb(142,237,142)"
Light Pink,#FFB6C1,"rgb(255,181,193)"
Light Salmon,#FFA07A,"rgb(255,160,122)"
Light Sea Green,#20B2AA,"rgb(33,178,170)"
Light Sky Blue,#87CEFA,"rgb(135,206,249)"
Light Slate Gray,#778899,"rgb(119,135,153)"
Light Steel Blue,#B0C4DE,"rgb(175,196,221)"
Light Yellow,#FFFFE0,"rgb(255,255,224)"
Lime,#00FF00,"rgb(0,255,0)"
Lime Green,#32CD32,"rgb(51,204,51)"
Linen,#FAF0E6,"rgb(249,239,229)"
Magenta,#FF00FF,"rgb(255,0,255)"
Maroon,#B03060,"rgb(175,48,96)"
Web Maroon,#800000,"rgb(127,0,0)"
Medium Aquamarine,#66CDAA,"rgb(102,204,170)"
Medium Blue,#0000CD,"rgb(0,0,204)"
Medium Orchid,#BA55D3,"rgb(186,84,211)"
Medium Purple,#9370DB,"rgb(147,112,219)"
Medium Sea Green,#3CB371,"rgb(61,178,112)"
Medium Slate Blue,#7B68EE,"rgb(122,104,237)"
Medium Spring Green,#00FA9A,"rgb(0,249,153)"
Medium Turquoise,#48D1CC,"rgb(71,209,204)"
Medium Violet Red,#C71585,"rgb(198,20,132)"
Midnight Blue,#191970,"rgb(25,25,112)"
Mint Cream,#F5FFFA,"rgb(244,255,249)"
Misty Rose,#FFE4E1,"rgb(255,226,224)"
Moccasin,#FFE4B5,"rgb(255,226,181)"
Navajo White,#FFDEAD,"rgb(255,221,173)"
Navy Blue,#000080,"rgb(0,0,127)"
Old Lace,#FDF5E6,"rgb(252,244,229)"
Olive,#808000,"rgb(127,127,0)"
Olive Drab,#6B8E23,"rgb(107,142,35)"
Orange,#FFA500,"rgb(255,165,0)"
Orange Red,#FF4500,"rgb(255,68,0)"
Orchid,#DA70D6,"rgb(216,112,214)"
Pale Goldenrod,#EEE8AA,"rgb(237,232,170)"
Pale Green,#98FB98,"rgb(153,249,153)"
Pale Turquoise,#AFEEEE,"rgb(175,237,237)"
Pale Violet Red,#DB7093,"rgb(219,112,147)"
Papaya Whip,#FFEFD5,"rgb(255,239,214)"
Peach Puff,#FFDAB9,"rgb(255,216,186)"
Peru,#CD853F,"rgb(204,132,63)"
Pink,#FFC0CB,"rgb(255,191,204)"
Plum,#DDA0DD,"rgb(221,160,221)"
Powder Blue,#B0E0E6,"rgb(175,224,229)"
Purple,#A020F0,"rgb(160,33,239)"
Web Purple,#800080,"rgb(127,0,127)"
Rebecca Purple,#663399,"rgb(102,51,153)"
Red,#FF0000,"rgb(255,0,0)"
Rosy Brown,#BC8F8F,"rgb(188,142,142)"
Royal Blue,#4169E1,"rgb(63,104,224)"
Saddle Brown,#8B4513,"rgb(140,68,17)"
Salmon,#FA8072,"rgb(249,127,114)"
Sandy Brown,#F4A460,"rgb(244,163,96)"
Sea Green,#2E8B57,"rgb(45,140,86)"
Seashell,#FFF5EE,"rgb(255,244,237)"
Sienna,#A0522D,"rgb(160,81,45)"
Silver,#C0C0C0,"rgb(191,191,191)"
Sky Blue,#87CEEB,"rgb(135,206,234)"
Slate Blue,#6A5ACD,"rgb(107,89,204)"
Slate Gray,#708090,"rgb(112,127,142)"
Snow,#FFFAFA,"rgb(255,249,249)"
Spring Green,#00FF7F,"rgb(0,255,127)"
Steel Blue,#4682B4,"rgb(68,130,181)"
Tan,#D2B48C,"rgb(209,181,140)"
Teal,#008080,"rgb(0,127,127)"
Thistle,#D8BFD8,"rgb(216,191,216)"
Tomato,#FF6347,"rgb(255,99,71)"
Turquoise,#40E0D0,"rgb(63,224,209)"
Violet,#EE82EE,"rgb(237,130,237)"
Wheat,#F5DEB3,"rgb(244,221,178)"
White,#FFFFFF,"rgb(255,255,255)"
White Smoke,#F5F5F5,"rgb(244,244,244)"
Yellow,#FFFF00,"rgb(255,255,0)"
Yellow Green,#9ACD32,"rgb(153,204,51)"