Commit 98cc8c28 authored by LOUIS TYRRELL OLIPHANT

lec 16 list practice

parent e4a0714f
%% Cell type:markdown id: tags:
## Announcements
### CS 220 Enrichment Activities
Are you interested in working on a real-world data set and learning about the full data processing pipeline?
Voluntary working groups will learn about data management, data wrangling/processing, modeling, and reporting/communication skills.
**When: Thursday, March 6th @ 4pm**
**Where: Computer Science Room 1325**
### Resources To Improve In The Course
* **CS 220 Office Hours** -- As I'm sure you know, you can go to the [course office hours](https://sites.google.com/wisc.edu/cs220-oh-sp25/) to get help with labs and projects.
* **CS Learning Center** -- Offers free [small group tutoring](https://www.cs.wisc.edu/computer-sciences-learning-center-cslc/), not for debugging your programs, but for talking about course concepts.
* **Undergraduate Learning Center** -- Provides tutoring and [academic support](https://engineering.wisc.edu/student-services/undergraduate-learning-center/). They have [drop-in tutoring](https://intranet.engineering.wisc.edu/undergraduate-students/ulc/drop-in-tutoring/).
%% Cell type:markdown id: tags:
## Warmup
%% Cell type:code id: tags:
``` python
# Warmup 0: Hotkeys
# We move quickly, it's good to know some hotkeys!
# All-around good-to-knows...
# Ctrl+A: Select all the text in a cell.
# Ctrl+C: Copy selected text.
# Ctrl+X: Cut selected text.
# Ctrl+V: Paste text from clipboard.
# Ctrl+S: Save.
# Jupyter-specific good-to-knows...
# Ctrl+Enter: Run Cell
# Ctrl+/: Comment/uncomment sections of code.
# Esc->A: Insert cell above
# Esc->B: Insert cell below
# Esc->Shift+L: Toggle line numbers (not working on my machine, but can look under View->Show Line Numbers).
```
%% Cell type:code id: tags:
``` python
# Warmup 1: Empty List
weekend_plans = [] # I have no weekend plans :(
print(weekend_plans)
# TODO add three things to your weekend plans using .append
print(weekend_plans)
# TODO add three things to your weekend plans using .extend
print(weekend_plans)
```
%% Cell type:code id: tags:
``` python
# Warmup 2: Tic-Tac-Toe
board = []
# TODO using .append(), add three lists of row data for tic-tac-toe (Noughts and Crosses)
# make up the placement of X's and O's.
# TODO now use nested loops to print the board
# TODO print out the center value using double indexing
print(board[1][1])
```
%% Cell type:markdown id: tags:
# List Practice
**Readings**
- Optional: [Set Data Type](https://docs.python.org/3.10/library/stdtypes.html#set-types-set-frozenset)
- Optional: [W3Schools on Set Data Type](https://www.w3schools.com/python/python_sets.asp)
**Objectives**
- Understand and use the `set` data type for removing duplicates
- Create helper functions for filtering data
- Use the `sort()` and `sorted()` functions and understand the difference between them.
%% Cell type:markdown id: tags:
## Set Data Type
Another data type similar to a list is a set. To create a set you use curly braces instead of square brackets. Sets cannot contain duplicate items.
```python
my_set = {2, 3, 3, 4}
print(my_set)
```
```
{2, 3, 4}
```
One common use for sets is to remove duplicates from a list. You can do this by converting the list to a set and then converting it back again. Because sets are unordered, the items in the resulting list may appear in a different order.
```python
groceries = ['apples','oranges','kiwis','apples']
groceries = list(set(groceries))
print(groceries)
```
```
['apples', 'oranges', 'kiwis']
```
You can read more about sets and the methods that you can use with them at [W3Schools](https://www.w3schools.com/python/python_sets.asp).
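Since a set does not keep its items in any guaranteed order, one way to get a predictable result after removing duplicates is to sort the set. The sketch below is only an illustration; the variable names `unique_groceries` and `sorted_groceries` are made up for this example.

```python
groceries = ['apples', 'oranges', 'kiwis', 'apples']

# Converting to a set drops the duplicate 'apples',
# but the order of the items in a set is not guaranteed.
unique_groceries = set(groceries)
print(len(groceries), len(unique_groceries))

# sorted() accepts a set and always returns a new list in sorted order.
sorted_groceries = sorted(unique_groceries)
print(sorted_groceries)
```

The first `print` shows the counts before and after removing duplicates, and the second shows the unique items in alphabetical order.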
### You Try It
Use the `.add()` and `.discard()` methods on the set below: add at least 4 weekend plans, one of them a duplicate, and then remove one of the weekend plans.
%% Cell type:code id: tags:
``` python
# A set is similar to a list. However, it is unordered and its items are unique.
# The method names are also a little different!
weekend_plans = set() # creates an empty set
## TODO: use .add() to add 4 items to weekend_plans with one being a duplicate
print(weekend_plans)
# TODO: use .discard() to remove one of the items from the set
# Unlike a list's remove, this will not throw an error if DNE (does not exist).
print(weekend_plans)
```
%% Cell type:markdown id: tags:
## Helper Functions
Last class we created the `cell()` function to help with selecting values from the survey data. Let's take this a step further today and create a range of helper functions that we can use to answer more challenging questions. Investing time in good helper functions makes these questions faster to answer, and the functions are flexible enough that the same ones can answer a variety of questions.
First, let's create our `process_csv()` function to help with loading the data and then split the data into the header and data portions. Finish the code in the cells below:
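Before finishing those cells, it may help to see the same header/data split on a tiny in-memory CSV. This is only a sketch: the text in `sample_text`, including its column names and values, is invented for illustration and is not the real survey file.

```python
import csv
import io

# Hypothetical stand-in for a CSV file: three columns, two data rows.
sample_text = "Lecture,Age,Song\nLEC001,19,Hello\nLEC002,21,Yesterday\n"

# csv.reader accepts any iterable of lines, so a StringIO works like an open file.
sample_rows = list(csv.reader(io.StringIO(sample_text)))

sample_header = sample_rows[0]   # the first row holds the column names
sample_data = sample_rows[1:]    # every remaining row is data

print(sample_header)
print(sample_data[0])
print(sample_header.index("Age"))
```

Notice that every value comes back as a string (`'19'`, not `19`), which is why a helper such as `cell()` is useful for converting types.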
%% Cell type:code id: tags:
``` python
import csv
# source: Automate the Boring Stuff with Python Ch 12
def process_csv(filename):
    exampleFile = open(filename, encoding="utf-8")
    exampleReader = csv.reader(exampleFile)
    exampleData = list(exampleReader)
    exampleFile.close()
    return exampleData
```
%% Cell type:code id: tags:
``` python
# TODO: Separate the data into 2 parts...
# a header row, and a list of data rows
cs220_csv = process_csv('cs220_survey_data.csv')
cs220_header = ...
cs220_data = ...
```
%% Cell type:markdown id: tags:
Let's make the `cell()` function again, but this time let's make it a little more robust. If you recall, we filtered out empty values and returned `None` instead of an empty string. We also converted the `Age` column's values to `int` (and filtered out any value in which a decimal point was found).
Let's take that a step further and think about every column: which values would we want to filter out and return as `None`, and which data type would we want to return? The `Latitude` and `Longitude` columns contain values that are best treated as floats. Can you think of any other changes to data types or specific values that should be filtered out?
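One possible way to think about "convert the value, or return `None` if it cannot be converted" is sketched below on plain strings. The helper name `convert` is hypothetical and is not part of the lecture code, and the use of `try`/`except` is an assumption about what tools are available; checking for a `"."` in the string is another option.

```python
def convert(value, to_type):
    """Return value converted with to_type, or None if it is empty or invalid."""
    if value == "":
        return None
    try:
        return to_type(value)
    except ValueError:
        # e.g. int("19.5") raises ValueError, so the value is filtered out
        return None

print(convert("19", int))
print(convert("19.5", int))
print(convert("43.0731", float))
print(convert("", float))
```

The same idea can be applied column by column inside `cell()`, choosing `int` or `float` depending on the column name.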
Make those changes to the cell function below:
%% Cell type:code id: tags:
``` python
# Remember the improved cell function
def cell(row_idx, col_name):
    col_idx = cs220_header.index(col_name)
    val = cs220_data[row_idx][col_idx]
    if val == "":
        return None
    elif col_name == "Age":
        return int(val)
    ##TODO add an elif for converting latitude and longitude to float
    else:
        return val
```
%% Cell type:markdown id: tags:
Let's test our improved `cell()` function and make sure it is working properly.
Since the `Age`, `Latitude` and `Longitude` should now all be numbers, let's loop through the data and check the return type for these columns. Finish the code in the cell below.
%% Cell type:code id: tags:
``` python
for i in range(len(cs220_data)):
    age = cell(i,'Age')
    if age is None or type(age) == int:
        pass
    else:
        print("found an age which is the wrong type:",age)
    ##TODO add checks for latitude and longitude
```
%% Cell type:markdown id: tags:
### Helper Function: Getting smallest value in column
Now let's think about functions as tools. What tool would help us answer the questions we might have? One possible question revolves around finding the smallest or largest value in a column.
And if we think for a moment about the design of the function, we can make it very flexible. In other words it might help with multiple possible questions. Some possible questions involving the smallest value in a column might be:
* What is the age of the youngest person who answered the survey?
* What is the age of the youngest person in your lecture?
* What song did the youngest person enter?
All of these questions involve a youngest person. But notice two things:
* Knowing the index of this person allows us to find out other properties about the person
* Working with a subset of the data (e.g. youngest in your lecture) is a common practice
To handle these different ways of looking for the youngest, we can create our function like so:
```python
def get_smallest(col_name, indexes=None):
    """Returns the index of the smallest value
    for the column with col_name, looking only
    at the rows in indexes. If indexes is None
    then looks at all rows."""
```
Notice that the function returns the index, not the value. The function also has an optional `indexes` parameter which can be used if you already have a subset of the rows you want to work with.
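To see why returning an index is useful, here is a minimal sketch of the same idea on a plain list instead of the survey data. The function name `index_of_smallest` and the list `ages` are made up for this illustration, and unlike the lecture's version this sketch returns `None` when no row has a value.

```python
def index_of_smallest(values, indexes=None):
    """Return the index of the smallest non-None item in values,
    looking only at the positions in indexes (all positions if None)."""
    if indexes is None:
        indexes = range(len(values))
    smallest_index = None
    for i in indexes:
        if values[i] is None:
            continue  # skip missing values
        if smallest_index is None or values[i] < values[smallest_index]:
            smallest_index = i
    return smallest_index

ages = [21, None, 18, 30, 18]
print(index_of_smallest(ages))          # search every position
print(index_of_smallest(ages, [0, 3]))  # search only positions 0 and 3
```

Because the comparison is a strict `<`, ties go to the first position found. Once you have the index, you can look up any other property stored at that position.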
Finish writing the function below.
%% Cell type:code id: tags:
``` python
def get_smallest(col_name, indexes=None):
    """Returns the index of the smallest value
    for the column with col_name, looking only
    at the rows in indexes. If indexes is None
    then looks at all rows."""
    if indexes is None:
        indexes = list(range(0,len(cs220_data)))
    smallest_value = None
    smallest_index = 0
    for i in indexes:
        val = cell(i,col_name)
        if val is None:
            continue
        ##TODO FINISH Function
    return smallest_index
```
%% Cell type:markdown id: tags:
Okay, if we have written our function well, we can now use it to answer the questions we have. Use `get_smallest()` to answer the questions in the cell below.
%% Cell type:code id: tags:
``` python
##TODO What is the age of the youngest person who answered the survey?
##TODO What song did the youngest person enter?
```
%% Cell type:markdown id: tags:
### Helper Function: Filter rows by a column that matches a value
Notice we didn't ask one question -- What is the age of the youngest person in your lecture? To answer this we first need to get a list of just the rows that are for your lecture.
Let's create a filter function that will return a list of the indexes that meet some matching criteria. We can still use the optional `indexes` parameter idea.
```python
def filter_match(col_name,col_value,indexes=None):
    """returns a subset of indexes where the
    col_name has a value of col_value. If indexes
    is None then looks at all rows"""
```
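The optional `indexes` parameter is what lets filters be chained: the output of one filter becomes the input of the next. The sketch below shows this on two small parallel lists rather than the survey data; `matching_indexes`, `lectures`, and `runners` are hypothetical names and values chosen only for this example.

```python
def matching_indexes(values, target, indexes=None):
    """Return the positions in indexes where values[i] equals target."""
    if indexes is None:
        indexes = range(len(values))
    result = []
    for i in indexes:
        if values[i] == target:
            result.append(i)
    return result

lectures = ["LEC001", "LEC002", "LEC001", "LEC001"]
runners = ["Yes", "Yes", "No", "Yes"]

# First filter: which positions are in LEC001?
in_lec001 = matching_indexes(lectures, "LEC001")
print(in_lec001)

# Second filter, restricted to the positions found by the first one.
lec001_runners = matching_indexes(runners, "Yes", in_lec001)
print(lec001_runners)
print(len(lec001_runners))
```

Counting how many rows match is then just `len()` of the returned list.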
Finish the code in the cell below.
%% Cell type:code id: tags:
``` python
def filter_match(col_name,col_value,indexes=None):
    if indexes is None:
        indexes = list(range(len(cs220_data)))
    ret_value = []
    for i in indexes:
        val = cell(i,col_name)
        ##TODO: Finish function
    return ret_value
```
%% Cell type:markdown id: tags:
Okay, use this `filter_match()` function, perhaps combined with `get_smallest()`, to answer the questions in the cell below.
%% Cell type:code id: tags:
``` python
## TODO how many people are in your lecture who filled out the survey?
## TODO What is the age of the youngest person in your lecture?
```
%% Cell type:markdown id: tags:
### Helper Function: Filter rows by a column that contains a value
Exact matching is only one way that we might want to filter our data. Another possible way would be to check whether a column contains a particular value as a portion of what was entered. For example, how many primary majors are Engineering majors? Since the field is a string and "Engineering" is only a portion of the value, we really want to see if the field contains the value.
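A case-insensitive "contains" check can be built from the `in` operator and `.lower()`. The sketch below works on a small made-up list named `majors` (not the real survey values) and skips missing entries.

```python
majors = ["Engineering: Mechanical", "Computer Science",
          "engineering: industrial", None]

search = "Engineering".lower()  # lowercase the search text once
matches = []
for i in range(len(majors)):
    major = majors[i]
    if major is None:
        continue  # nothing was entered for this row
    if search in major.lower():  # substring test, ignoring case
        matches.append(i)

print(matches)
```

Lowercasing both sides means that "Engineering" and "engineering" are treated as the same text.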
Finish the `filter_contains()` function below.
%% Cell type:code id: tags:
``` python
def filter_contains(col_name,col_value,indexes=None):
    """returns a subset of indexes where the column
    col_name has values that are strings and col_value is
    a portion of the value (case insensitive)
    """
    if indexes is None:
        indexes = list(range(len(cs220_data)))
    ret_value = []
    col_value = col_value.lower()
    for i in indexes:
        val = cell(i,col_name)
        if val is None:
            continue
        val = val.lower()
        ##TODO FINISH FUNCTION
    return ret_value
```
%% Cell type:markdown id: tags:
With our functions we can answer some rather challenging questions.
%% Cell type:code id: tags:
``` python
## TODO: How many people in your lecture are majoring in Engineering?
## TODO: What Bruno Mars songs did people enter?
```
%% Cell type:markdown id: tags:
Come up with your own questions where you can use these functions.
%% Cell type:code id: tags:
``` python
## TODO: What is your question? Write the question then write code to answer it.
```
%% Cell type:markdown id: tags:
## Sorting
There are two common ways in Python to sort a list. One way modifies the original list, and the other creates a new sorted list without modifying the original.
- The `sorted()` function creates a new sorted list and leaves the original list unmodified.
- The `.sort()` method sorts the original list, mutating it.
```python
x = [2, 4, 1]
y = sorted(x)
print(x)
print(y)
```
```
[2, 4, 1]
[1, 2, 4]
```
```python
x = [2, 4, 1]
y = x.sort()
print(x)
print(y)
```
```
[1, 2, 4]
None
```
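A few further differences may be worth keeping in mind; the sketch below is illustrative, with made-up variable names. `sorted()` works on any iterable (tuples, strings, sets) and always returns a list, while `.sort()` exists only on lists. Both accept the optional `reverse` and `key` arguments.

```python
scores = (3, 1, 2)                    # a tuple has no .sort() method
print(sorted(scores))                 # sorted() still works and returns a list
print(sorted("bca"))                  # a string is sorted character by character
print(sorted(scores, reverse=True))   # largest first

names = ["Cindy", "alice", "Bob"]
names.sort()                # in place; uppercase letters sort before lowercase
print(names)
names.sort(key=str.lower)   # compare the lowercased names instead
print(names)
```

The `key` argument is handy when the default ordering, which places uppercase letters before lowercase ones, is not what you want.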
%% Cell type:code id: tags:
``` python
# TODO Sort using sorted() function
x = [2, 4, 1]
y = ...
print(x)
print(y)
```
%% Cell type:code id: tags:
``` python
# TODO Sort using sort() method
x = [2, 4, 1]
y = ...
print(x)
print(y)
```
%% Cell type:markdown id: tags:
## You Try It
Use sorting and the functions above to find:
- A sorted list of songs from those who run and are over 20
- A sorted list of majors of the procrastinators, removing duplicates
%% Cell type:code id: tags:
``` python
##TODO: Sorted list of songs from runners over 20
```
%% Cell type:code id: tags:
``` python
##TODO: Sorted list of majors by procrastinators -- no duplicates
```
%% Cell type:code id: tags:
``` python
# Lecture 16 worksheet answers
# https://www.msyamkumar.com/cs220/s22/materials/lec-16-worksheet.pdf
# The purpose of this worksheet is to prepare you for exam questions.
# You should do the worksheet by hand, then check your work.
# If you have questions please make a public post on Piazza and include the Given
# Students, feel free to answer each other's questions
```
%% Cell type:code id: tags:
``` python
# Problem 1 Given:
nums = [100, 2, 3, 40, 99]
words = ["three", "two", "one"]
```
%% Cell type:code id: tags:
``` python
# Problem 1 answers
print(nums[-1])
print(nums[1:3])
print(words[1])
print(words[1][1])
print(words[1][-2] * nums[2])
print()
print(words.index("two"))
print(nums[words.index("two")])
print(nums[:1] + words[:1])
print(",".join(words))
print((",".join(words))[4:7])
```
%% Output
99
[2, 3]
two
w
www
1
2
[100, 'three']
three,two,one
e,t
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
# Problem 2 Given:
rows = [["x", "y","name"], [3,4,"Alice"], [9,1,"Bob"], [-3,4,"Cindy"]]
header = rows[0]
data = rows[1:]
X = 0
Y = 1
NAME = 2
```
%% Cell type:code id: tags:
``` python
# Problem 2 answers
print(len(rows))
print(len(data))
print(len(header))
print(rows[1][-1])
print(data[1][-1])
print()
print(header.index("name"))
print(data[-1][header.index("name")])
print((data[0][X] + data[1][X] + data[2][X]) / 3)
print((data[-1][X] ** 2 + data[-1][Y] ** 2) ** 0.5)
print(min(data[0][NAME], data[1][NAME], data[2][NAME]))
```
%% Output
4
3
3
Alice
Bob
2
Cindy
3.0
5.0
Alice
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
# Problem 3 Given:
rows = [ ["Food Science", "24000", "0.049188446", "62000"],
["CS", "783000", "0.049518657", "78000"],
["Microbiology", "70000", "0.050880749", "60000"],
["Math", "433000", "0.05293608", "66000"] ]
hd = ["major", "students", "unemployed", "salary"]
```
%% Cell type:code id: tags:
``` python
# Problem 3 answers
print(rows[1][0])
print(rows[3][hd.index("students")])
print(len(hd) == len(rows[1]))
print(rows[0][1] + rows[2][1])
```
%% Output
CS
433000
True
2400070000
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
# Problem 4 Given:
rows = [ ["city", "state", "y14", "y15"],
["Chicago", "Illinois", "411", "478"],
["Milwaukee", "Wisconsin", "90", "145"],
["Detroit", "Michigan", "298", "295"] ]
hd = rows[0]
rows = rows[1:] #this removes the header and stores the result in rows
```
%% Cell type:code id: tags:
``` python
# Problem 4 answers:
print(rows[0][hd.index("city")])
print(rows[0][hd.index("y14")])
print(rows[2][hd.index("y14")] < rows[2][hd.index("y15")])
print(", ".join(rows[-1][:2]))
```
%% Output
Chicago
411
False
Detroit, Michigan
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
# Problem 5 Given:
```