- [Sweigart Ch 16 (through "Reading Data from Reader Objects in a for Loop")](https://automatetheboringstuff.com/2e/chapter16/)
## Learning Objectives
After this lecture you will be able to...
- Open a spreadsheet file and export it to a comma-separated values (CSV) file.
- View a CSV file in Jupyter Lab's *CSV Viewer*.
- Open a CSV file in a text editor or Jupyter and connect the elements of the CSV file to the rows and columns in the spreadsheet.
- Use pre-written Python code to read a CSV file into a list of lists.
- Write Python statements with double list indexing to access any element of a CSV file via a list of lists.
- Write code that answers questions about CSV data by writing for loops on lists of lists.
%% Cell type:markdown id: tags:
## CSV File Format
CSV is a non-proprietary file format for sharing tables of data. Each line of a CSV file is one row of the table, and the values within a row are separated by commas. It is very common (though not required) for the top row of a CSV file to contain the header names for the columns of data:
```
Animal,Country,Population
panda,China,1864
lemur,Madagascar,2500
macaque,India,100000
```
You can open CSV files using a spreadsheet program like Microsoft Excel or Apple Numbers. In Jupyter Lab, you can open a CSV file for viewing by double-clicking the file name. To open it for editing, right-click the file name and select *Open With -> Editor* in the context menu, which opens the file in a text editor.
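To make the link between the raw text and the spreadsheet view concrete, here is a minimal sketch that splits CSV text by hand. The `csv_text` string is a hypothetical stand-in for a small animal table like the one above, and the naive `split(",")` approach is only an illustration: it assumes no value contains a comma or quotation marks, which is one reason real code usually relies on Python's `csv` module instead.

```python
# Hypothetical CSV text standing in for the contents of a small file.
csv_text = """Animal,Country,Population
panda,China,1864
lemur,Madagascar,2500
macaque,India,100000"""

# Each line of the text corresponds to one spreadsheet row.
lines = csv_text.split("\n")

# Splitting a line on commas gives that row's column values.
rows = []
for line in lines:
    rows.append(line.split(","))

print(rows[0])  # the header row: ['Animal', 'Country', 'Population']
print(rows[1])  # the first data row: ['panda', 'China', '1864']
```

Notice that every value, including the population, comes back as a string.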
%% Cell type:markdown id: tags:
## Reading a CSV in Python
Python has a module for working with CSV files called `csv`. One common approach for reading all the data from a CSV file is to load it as a list of lists. Look at the function defined in the cell below, which takes a file name, loads the data from the file using the `csv` module, and returns a list of lists.
%% Cell type:code id: tags:
``` python
# Now let's store the contents of the CSV file into Python lists
# copied from https://automatetheboringstuff.com/chapter14/
import csv

def process_csv(filename):
    # open the file; it's a UTF-8 text file
    exampleFile = open(filename, encoding="utf-8")
    # prepare it for reading as a CSV object
    exampleReader = csv.reader(exampleFile)
    # use the built-in list function to convert this into a list of lists
    exampleData = list(exampleReader)
    # close the file to tidy up our workspace
    exampleFile.close()
    # return the list of lists
    return exampleData
```
%% Cell type:markdown id: tags:
Finish the code in the cell below by using the `process_csv()` function to load the `colors.csv` file as a list of lists, store the data in the `colors` variable, and then print the loaded data. Open `colors.csv` in a separate tab to see the data in a spreadsheet-style layout, and compare what you see on that tab with the printed list of lists.
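As an aside, the same loading step is often written with a `with` statement, which closes the file automatically. The sketch below is one possible variant, not the course's required approach: `read_csv_rows` and `animals.csv` are hypothetical names, and the file is created in a temporary folder only so the example can run on its own.

```python
import csv
import os
import tempfile

def read_csv_rows(filename):
    """Hypothetical helper: return the rows of a CSV file as a list of lists."""
    # The with statement closes the file when the block ends.
    # newline="" is what the csv module documentation recommends for csv.reader.
    with open(filename, encoding="utf-8", newline="") as csv_file:
        return list(csv.reader(csv_file))

# Create a tiny throwaway CSV file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "animals.csv")
    with open(path, "w", encoding="utf-8", newline="") as out_file:
        out_file.write("Animal,Country,Population\npanda,China,1864\n")
    animals = read_csv_rows(path)

print(animals)  # [['Animal', 'Country', 'Population'], ['panda', 'China', '1864']]
```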
%% Cell type:code id: tags:
``` python
## TODO: load the 'colors.csv' data, then store and print the data
colors = ...
print(colors)
```
%% Cell type:markdown id: tags:
You can use double indexing to access specific elements of the loaded data. Recall that the first index selects the row and the second index selects the column within that row:
```python
colors[3][0]
```
```
Aqua
```
```python
colors[3][2]
```
```
rgb(0,100,100)
```
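Double indexing is really two single indexing steps in a row. The sketch below shows this on a small hand-built list of lists; the `table` variable and its animal data are hypothetical stand-ins used only for illustration, not the contents of `colors.csv`.

```python
# A small hypothetical table in the same list-of-lists shape.
table = [
    ["Animal", "Country", "Population"],
    ["panda", "China", "1864"],
    ["lemur", "Madagascar", "2500"],
]

# Step 1: the first index picks out one row, which is itself a list.
row = table[2]
print(row)  # ['lemur', 'Madagascar', '2500']

# Step 2: the second index picks out one column value from that row.
print(row[1])  # Madagascar

# Both steps combined into a single expression.
print(table[2][1])  # Madagascar
```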
On web pages you can describe colors using the color's name, the color's hex value, or the color's rgb value. The function defined in the cell below accepts any of these three ways of describing a color and displays a tiny swatch of that color.
Finish the code in the cell below by calling `show_color()` with specific values from the `colors` variable.
%% Cell type:code id: tags:
``` python
## Display Blanched Almond (index 9) using its name (index 0)
print(colors[9][0])
show_color(colors[9][0])
## Display Blue Violet (index 11) using the HEX value (index 1)
print(colors[11][1])
show_color(colors[11][1])
## Display Khaki (index 61) using the rgb value (index 2)
print(...)
show_color(...)
## You pick a color and display the color swatch using the name, HEX value, and rgb value
## also print the name, HEX value, and rgb value
```
%% Cell type:markdown id: tags:
Now let's work with our survey data. Finish the code in the cell below to load the data using the `process_csv()` function. Then separate the header row from the data rows, and print the header and the first 2 rows of the data.
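One common way to separate a header row from the rest of a list of lists is indexing plus slicing. The sketch below assumes the header is the first row and uses a hypothetical `table` of animal data rather than the survey file.

```python
# Hypothetical table whose first row holds the column names.
table = [
    ["Animal", "Country", "Population"],
    ["panda", "China", "1864"],
    ["lemur", "Madagascar", "2500"],
    ["macaque", "India", "100000"],
]

header = table[0]   # the first row only
data = table[1:]    # a slice holding every row after the first

print(header)    # ['Animal', 'Country', 'Population']
print(data[:2])  # the first two data rows
```

After the split, `data[0]` is the first real record, so row indexes no longer need to skip over the header.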
%% Cell type:code id: tags:
``` python
# Call the process_csv function and store the list of lists in cs220_csv
cs220_csv = process_csv('cs220_survey_data.csv')
cs220_csv
cs220_header = ...
cs220_data = ...
print(cs220_header)
print(...)
```
%% Cell type:markdown id: tags:
## CSVs as a List of Lists
Finish the cells below to access and print the desired item. Since `cs220_data` is a list of lists, you will need to use double indexing.
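Keep in mind that every value loaded from a CSV file is a string, even when it looks like a number. The sketch below loops over a hypothetical `data` list (animal populations, not the survey data) and converts each value with `int()` before doing arithmetic. It assumes every population cell contains digits; an empty cell would make `int()` raise a `ValueError`.

```python
# Hypothetical data rows; note that the populations are strings.
data = [
    ["panda", "China", "1864"],
    ["lemur", "Madagascar", "2500"],
    ["macaque", "India", "100000"],
]

total = 0
for i in range(len(data)):
    # Convert the string in column 2 to an int before adding.
    population = int(data[i][2])
    total += population

print(total)  # 104364
```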
%% Cell type:code id: tags:
``` python
# Print out the lecture number of the 4th student...by hardcoding its row and column....
cs220_data[???][???]  # [row][col]
```
%% Cell type:code id: tags:
``` python
# Print out the sleeping habit for the 2nd student...by hardcoding its row and column....
cs220_data[???][???]
```
%% Cell type:code id: tags:
``` python
# Print out how many students completed the survey.
len(???)
```
%% Cell type:code id: tags:
``` python
# Print out every student's sleep habits and major
for i in range(len(cs220_data)):
    current_sleep_habit = ???
    current_major = ???
    print(current_sleep_habit + '\t\t' + current_major)
```
%% Cell type:code id: tags:
``` python
# FIX: Print out every student's age in 10 years.
for i in range(???):
    current_age = cs220_data[i][2]
    print(current_age + 10)
```
%% Cell type:markdown id: tags:
## It would be nice to have a helper function!
Let's introduce `cell`
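The idea behind such a helper is to look up a column by its name rather than by a hardcoded number. The sketch below shows that idea on hypothetical `header` and `data` variables; `lookup` is an illustrative name, not the course's `cell` function, and it returns the raw string without any type conversion.

```python
# Hypothetical header and data rows for illustration.
header = ["Animal", "Country", "Population"]
data = [
    ["panda", "China", "1864"],
    ["lemur", "Madagascar", "2500"],
]

def lookup(row_idx, col_name):
    """Hypothetical helper: return the value at a row index and column name."""
    # list.index returns the position of the first matching item.
    col_idx = header.index(col_name)
    return data[row_idx][col_idx]

print(lookup(1, "Country"))     # Madagascar
print(lookup(0, "Population"))  # 1864 (still a string)
```

A call like `lookup(1, "Country")` stays correct even if the columns are later reordered, as long as `header` is updated to match.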
%% Cell type:code id: tags:
``` python
# Remember creating cs220_header?
cs220_header
```
%% Cell type:code id: tags:
``` python
# Get the column index of "Pizza Topping"
cs220_header.index(???)
```
%% Cell type:code id: tags:
``` python
# We want to invoke something like...
# cell(24, "Cats or Dogs")
# cell(63, "Zip Code")
def cell(row_idx, col_name):
    col_idx = ???  # get the index of col_name
    val = ???  # get the value of cs220_data at the specified cell
    return val
```
%% Cell type:code id: tags:
``` python
# Print out the lecture number of the 4th student... using the cell function
cell(???, ???)
```
%% Cell type:code id: tags:
``` python
# Print out every student's sleep habits and major using the cell function
for i in range(len(cs220_data)):
    current_sleep_habit = ???
    current_major = ???
    print(current_sleep_habit + '\t\t' + current_major)
```
%% Cell type:code id: tags:
``` python
# Print out every student's age in 10 years using the cell function
# ... that didn't really help us here!
for i in range(len(cs220_data)):
    current_age = cell(i, "Age")
    if current_age != None:
        print(current_age + 10)
```
%% Cell type:code id: tags:
``` python
# Improve the cell function so it returns the appropriate type.
# If there is nothing in the cell, return None
def cell(row_idx, col_name):
    col_idx = cs220_header.index(col_name)
    val = cs220_data[row_idx][col_idx]
    if ???:
        return None
    elif ???:
        return int(val)
    else:
        return val
```
%% Cell type:code id: tags:
``` python
# Print out every student's age in 10 years using the cell function
# ... much better!
for i in range(len(cs220_data)):
    current_age = cell(i, "Age")
    if current_age != None:
        print(current_age + 10)
```
%% Cell type:code id: tags:
``` python
# Get the average age of each lecture...
```
%% Cell type:markdown id: tags:
## You try!
%% Cell type:markdown id: tags:
Complete the challenges below. First try solving each problem directly with the list of lists (i.e., double indexing with `[][]`), then try using the `cell` function!
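Questions like these usually follow a pattern: loop over the rows, count the rows that match one condition, and count how many of those also match a second condition. The sketch below applies that pattern to hypothetical animal data (including a made-up row with an empty population cell), not to the survey, so the column names and values are assumptions for illustration only.

```python
# Hypothetical header and data; the last row has an empty population cell.
header = ["Animal", "Country", "Population"]
data = [
    ["panda", "China", "1864"],
    ["lemur", "Madagascar", "2500"],
    ["macaque", "India", "100000"],
    ["tiger", "India", ""],
]

country_idx = header.index("Country")
pop_idx = header.index("Population")

india_count = 0            # rows matching the first condition
india_with_population = 0  # rows matching both conditions

for row in data:
    if row[country_idx] == "India":
        india_count += 1
        if row[pop_idx] != "":
            india_with_population += 1

# Percentage of India rows that have a population recorded.
percentage = 100 * india_with_population / india_count
print(india_count, india_with_population, percentage)  # 2 1 50.0
```

If no row matched the first condition, the division would fail with a `ZeroDivisionError`, so real code may need to check for a zero count first.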
%% Cell type:code id: tags:
``` python
# Of all runners, how many are procrastinators?
```
%% Cell type:code id: tags:
``` python
# What percentage of 18-year-olds have their major declared as "Other"?
```
%% Cell type:code id: tags:
``` python
# Does the oldest basil/spinach-loving Business major prefer cats, dogs, or neither?
```
%% Cell type:markdown id: tags:
## Summary
The `csv` module can be used to load CSV files and store them as a list of lists. Use double-indexing and helper functions to work with data in this format.