Compare revisions

5087efbe
--- a/Old_exams/Final_Exam/f21-final/f21-final-answers.txt
+++ b/Old_exams/Final_Exam/f21-final/f21-final-answers.txt
+1) B
+2) C
+3) A
+4) D
+5) D
+6) C
+7) A
+8) C
+9) A
+10) C
+11) D
+12) B
+13) A
+14) B
+15) D
+16) B
+17) B
+18) C
+19) C
+20) A
+21) B
+22) C
+23) B
+24) B
+25) C
+26) B
+27) B
+28) D
+29) C
+30) A
--- a/Old_exams/Final_Exam/f21-final/f21-final.pdf
+++ b/Old_exams/Final_Exam/f21-final/f21-final.pdf
--- a/Old_exams/Final_Exam/f22-final/f22-final-answers.txt
+++ b/Old_exams/Final_Exam/f22-final/f22-final-answers.txt
+1) A
+2) B
+3) D
+4) A
+5) A
+6) D
+7) C
+8) C
+9) B
+10) C
+11) A
+12) B
+13) C
+14) B
+15) B
+16) A
+17) A
+18) C
+19) A
+20) C
+21) D
+22) B
+23) D
+24) C
+25) C
+26) D
+27) C
+28) C
+29) C
+30) C
\ No newline at end of file
--- a/Old_exams/Final_Exam/f22-final/f22-final.pdf
+++ b/Old_exams/Final_Exam/f22-final/f22-final.pdf
--- a/Old_exams/Final_Exam/s22-final/s22-final-answers.txt
+++ b/Old_exams/Final_Exam/s22-final/s22-final-answers.txt
+1) C
+2) C
+3) D
+4) D
+5) B
+6) B
+7) A
+8) E
+9) A
+10) C
+11) A
+12) C
+13) C
+14) A
+15) B
+16) A
+17) B
+18) D
+19) A
+20) B
+21) B
+22) D
+23) B
+24) B
+25) A
+26) E
+27) B
+28) A
+29) A
+30) A
--- a/Old_exams/Final_Exam/s22-final/s22-final.pdf
+++ b/Old_exams/Final_Exam/s22-final/s22-final.pdf
--- a/Old_exams/Final_Exam/s23-final/s23-final-answers.pdf
+++ b/Old_exams/Final_Exam/s23-final/s23-final-answers.pdf
--- a/Old_exams/Final_Exam/s23-final/s23-final.pdf
+++ b/Old_exams/Final_Exam/s23-final/s23-final.pdf
--- a/Old_exams/Final_Exam/su23-final/su23-final-solution.pdf
+++ b/Old_exams/Final_Exam/su23-final/su23-final-solution.pdf
--- a/Old_exams/Final_Exam/su23-final/su23-final.pdf
+++ b/Old_exams/Final_Exam/su23-final/su23-final.pdf
--- a/Projects/P3/.gitkeep
+++ b/Projects/P3/.gitkeep
--- a/Projects/P4/.gitkeep
+++ b/Projects/P4/.gitkeep
--- a/Projects/P5/.gitkeep
+++ b/Projects/P5/.gitkeep
--- a/Projects/P6/.gitkeep
+++ b/Projects/P6/.gitkeep
--- a/cs320.cs.wisc.edu @ a358174d
+++ b/cs320.cs.wisc.edu @ a358174d
-Subproject commit a358174dbd16992bb01668111a6c50b345cee33e
--- a/lecture_material/03-performance1/reading1.html
+++ b/lecture_material/03-performance1/reading1.html
--- a/lecture_material/03-performance1/reading1.ipynb
+++ b/lecture_material/03-performance1/reading1.ipynb
-%% Cell type:markdown id: tags:
-
-# Running and Timing Programs
-
-In this notebook, we'll learn how to write programs that can launch other programs and time how long it takes to do things (you'll often be combining these skills to time how long it takes to run a program).
-
-Both these skills are covered in much more detail in an optional reading, Chapter 17 of Automate the Boring Stuff: https://automatetheboringstuff.com/2e/chapter17/.  If you decide to read that, we recommend skipping the middle sections, "Multithreading" through "Project: Multithreaded XKCD Downloader"
-
-## Running Programs
-
-### Example 1: Running `pwd`
-
-Remember that running the `pwd` program in a shell tells you what directory you're currently in.  Let's write some Python code to run the `pwd` program automatically and capture the output.  We'll do this with the `check_output` function in the `subprocess` module (https://docs.python.org/3/library/subprocess.html#subprocess.check_output) -- let's import that.
-
-%% Cell type:code id: tags:
-
-``` python
-from subprocess import check_output
-```
-
-%% Cell type:markdown id: tags:
-
-In the simplest form, we can pass a program name (as a string) to the function, which will capture and return the output:
-
-%% Cell type:code id: tags:
-
-``` python
-output = check_output("pwd")
-output
-```
-
-%% Output
-
-    b'/home/trh/lec3\n'
-
-%% Cell type:markdown id: tags:
-
-What type is that output?  It looks like a string, but with a "b" in front.  Hmmmm....
-
-%% Cell type:code id: tags:
-
-``` python
-type(output)
-```
-
-%% Output
-
-    bytes
-
-%% Cell type:markdown id: tags:
-
-The `bytes` type in Python is a sequence, like a string.  The difference is that a `bytes` object holds raw byte values (integers from 0 to 255), which may encode text (as in this case) or any other kind of data.  If we know the encoding of a bytes sequence, we can convert it to a string as follows:
-
-%% Cell type:code id: tags:
-
-``` python
-str_output = str(output, encoding="utf-8")
-str_output
-```
-
-%% Output
-
-    '/home/trh/lec3\n'
-
-%% Cell type:code id: tags:
-
-``` python
-type(str_output)
-```
-
-%% Output
-
-    str
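As a hedged aside (not part of the original notebook), the same conversion can be written with `bytes.decode`, or skipped entirely by passing `text=True` to `check_output` (Python 3.7 and later). The sketch below runs the current Python interpreter instead of `pwd` so that it does not depend on the platform; that substitution is an assumption made purely for illustration.

```python
import sys
from subprocess import check_output

# Run the current interpreter as the child program (an illustrative stand-in for `pwd`).
cmd = [sys.executable, "-c", "print('hello')"]

raw = check_output(cmd)                 # bytes, as in the example above
decoded = raw.decode("utf-8")           # equivalent to str(raw, encoding="utf-8")
as_text = check_output(cmd, text=True)  # check_output decodes the output for us

print(type(raw).__name__, type(decoded).__name__, type(as_text).__name__)
```

Using `text=True` is convenient when the output is known to be text; keeping the raw `bytes` is the safer choice when a program may emit binary data.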
-
-%% Cell type:markdown id: tags:
-
-### Example 2: Checking Versions
-
-What version of git do we have on this computer?  From the command line, we could run `git --version` to find out.  But let's do that in code.  This is a little trickier because we have both a program name, `git`, and an argument, `--version`.  The `check_output` function supports two ways of running programs with arguments.
-
-Way 1: pass `shell=True`
-
-%% Cell type:code id: tags:
-
-``` python
-check_output("git --version", shell=True)
-```
-
-%% Output
-
-    b'git version 2.17.1\n'
-
-%% Cell type:markdown id: tags:
-
-Or (preferred), we can pass the program and arguments in one list:
-
-%% Cell type:code id: tags:
-
-``` python
-check_output(["git", "--version"])
-```
-
-%% Output
-
-    b'git version 2.17.1\n'
-
-%% Cell type:markdown id: tags:
-
-Let's actually do the string manipulation work to isolate the version:
-
-%% Cell type:code id: tags:
-
-``` python
-output = str(check_output(["git", "--version"]), encoding="utf-8")
-output
-```
-
-%% Output
-
-    'git version 2.17.1\n'
-
-%% Cell type:code id: tags:
-
-``` python
-parts = output.strip().split()
-parts
-```
-
-%% Output
-
-    ['git', 'version', '2.17.1']
-
-%% Cell type:code id: tags:
-
-``` python
-version = parts[-1]
-version
-```
-
-%% Output
-
-    '2.17.1'
-
-%% Cell type:markdown id: tags:
-
-If we needed to have a specific version, we might use the above to have an assert like this:
-
-```
-assert version == "2.17.1"
-```
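An exact string match like the one above is brittle, since any newer release fails it. One possible alternative, sketched here under the assumption that the version starts with dot-separated integers, is to compare tuples of integers. The `parse_version` helper is hypothetical and not part of the original notebook.

```python
def parse_version(version):
    """Turn a string like '2.17.1' into a tuple of ints, e.g. (2, 17, 1)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    return tuple(parts)

version = "2.17.1"  # value taken from the example above
assert parse_version(version) >= (2, 17)

# Tuples compare numerically, whereas strings compare character by character,
# so "2.9" sorts after "2.17" as text even though 2.9 is the older release.
print(parse_version("2.9") < parse_version("2.17"))
```

The comparison on the last line is the reason for converting to integers first: as plain strings, `"2.9" > "2.17"`.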
-
-What if the program isn't installed, or we pass it some arguments that cause it to fail (exit with a non-zero status), as in the following example?  We'll want to catch some exceptions in these scenarios:
-
-%% Cell type:code id: tags:
-
-``` python
-import subprocess
-
-try:
-    output = str(check_output(["git", "--oops"]), encoding="utf-8")
-except FileNotFoundError:
-    print("program not installed?")
-except subprocess.CalledProcessError as e:
-    print("program crashed")
-    # if there were any output before it crashed, we could look at it
-    # with this:
-    print("OUTPUT:", e.output)
-```
-
-%% Output
-
-    program crashed
-    OUTPUT: b''
-
-%% Cell type:markdown id: tags:
-
-### Example 3: Making Animations
-
-A common situation is that there will be some program that does something useful that we can't directly do in Python, and we'll want to write Python code to run these external programs to make use of their features.
-
-For example, the `ffmpeg` program can make an animated video by glueing together a bunch of `.png` image files in sequence.  There are ways to make animations directly in Python, but for now let's see how we can execute `ffmpeg` with `check_output` to make a video.
-
-First, you should install the `ffmpeg` program on Ubuntu so we can use it -- run the following in the shell:
-
-```
-sudo apt install ffmpeg
-```
-
-Now, let's write some code to make a series of plots with a red dot in different positions, and save those plots as `0.png`, `1.png`, etc.  The idea is that these images are similar enough that if you flipped through them, it would look like a rough video.
-
-%% Cell type:code id: tags:
-
-``` python
-import os
-import matplotlib
-from matplotlib import pyplot as plt
-```
-
-%% Cell type:code id: tags:
-
-``` python
-%matplotlib inline
-```
-
-%% Cell type:code id: tags:
-
-``` python
-matplotlib.rcParams["font.size"] = 16
-```
-
-%% Cell type:code id: tags:
-
-``` python
-def plot_circle(filename, x, y):
-    fig, ax = plt.subplots()
-    ax.set_xlim(0, 1)
-    ax.set_ylim(0, 1)
-    ax.plot(x, y, 'ro', markersize=20)
-    fig.savefig(os.path.join("img", filename))
-
-if not os.path.exists("img"):
-    os.mkdir("img")
-
-plot_circle("0.png", x=0, y=0)
-plot_circle("1.png", x=0.25, y=0.25)
-plot_circle("2.png", x=0.5, y=0.5)
-plot_circle("3.png", x=0.75, y=0.25)
-plot_circle("4.png", x=1, y=0)
-```
-
-%% Output
-
-
-
-
-
-
-
-
-
-
-
-%% Cell type:markdown id: tags:
-
-Let's check that we created the png files in the `img` directory:
-
-%% Cell type:code id: tags:
-
-``` python
-os.listdir("img")
-```
-
-%% Output
-
-    ['0.png', '1.png', '4.png', '2.png', '3.png']
-
-%% Cell type:markdown id: tags:
-
-Let's also check that they look right.  In the `IPython.display` module, there are `Image(...)` and `HTML(...)` functions that are useful for loading pictures and HTML directly into our notebook.  Let's use the first function to check that our `0.png` file looks right.
-
-%% Cell type:code id: tags:
-
-``` python
-from IPython.display import Image, HTML
-Image(filename='img/0.png')
-```
-
-%% Output
-
-
-    <IPython.core.display.Image object>
-
-%% Cell type:markdown id: tags:
-
-Great!  Now, from the command line, try running this command, inside the same directory where this notebook is running:
-
-```
-ffmpeg -y -framerate 5 -i img/%d.png out.mp4
-```
-
-If it succeeds, there should be an out.mp4 file generated.  Try downloading it to your computer via the Jupyter interface (don't try to open it directly from Jupyter) and open it on your laptop.  Cool, huh?
-
-Now let's try running that same command with `check_output`.  We'll need to break up all the arguments into different entries in a list:
-
-%% Cell type:code id: tags:
-
-``` python
-check_output(['ffmpeg', '-y', '-framerate', '5', '-i', 'img/%d.png', 'out.mp4'])
-```
-
-%% Output
-
-    b''
-
-%% Cell type:markdown id: tags:
-
-There was no output, which is fine.  But it should have created an `out.mp4` file, as before.  You can embed `.mp4` video files in websites with the `<video>` tag.  This is great because we can inject HTML using the `HTML(...)` function from earlier:
-
-%% Cell type:code id: tags:
-
-``` python
-HTML("This is <b>bold</b> text.")
-```
-
-%% Output
-
-    <IPython.core.display.HTML object>
-
-%% Cell type:markdown id: tags:
-
-The natural thing to do is to inject some HTML to embed the `out.mp4` animation we just created:
-
-%% Cell type:code id: tags:
-
-``` python
-HTML("""
-<video width="320" height="240" controls>
-  <source src="out.mp4" type="video/mp4">
-</video>
-""")
-```
-
-%% Output
-
-    <IPython.core.display.HTML object>
-
-%% Cell type:markdown id: tags:
-
-## Measuring Time
-
-The easiest way to measure how long something takes is to check the time before and after we do it.  We can check with the `time` function inside the `time` module.  This function returns the number of seconds elapsed since Jan 1, 1970:
-
-%% Cell type:code id: tags:
-
-``` python
-import time
-now = time.time()
-now
-```
-
-%% Output
-
-    1580137926.8447468
-
-%% Cell type:code id: tags:
-
-``` python
-minutes = now / 60
-hours = minutes / 60
-days = hours / 24
-years = days / 365
-years # should be about number of years since 1970 -- is it?
-```
-
-%% Output
-
-    50.10584496590395
-
-%% Cell type:markdown id: tags:
-
-Let's use this to time how long a print call takes:
-
-%% Cell type:code id: tags:
-
-``` python
-before = time.time()
-print("I'm printing something")
-after = time.time()
-print("It took", (after-before), "seconds to print")
-```
-
-%% Output
-
-    I'm printing something
-    It took 0.0007390975952148438 seconds to print
-
-%% Cell type:markdown id: tags:
-
-A slightly cleaner version of the same code that computes milliseconds (1 ms is 1/1000 of a second):
-
-%% Cell type:code id: tags:
-
-``` python
-t0 = time.time()
-print("I'm printing something")
-t1 = time.time()
-ms = (t1-t0) * 1000
-print("It took", ms, "ms to print")
-```
-
-%% Output
-
-    I'm printing something
-    It took 0.7138252258300781 ms to print
-
-%% Cell type:markdown id: tags:
-
-How long does it take to append something to the end of a list?
-
-%% Cell type:code id: tags:
-
-``` python
-L = []
-t0 = time.time()
-L.append("test")
-t1 = time.time()
-us = (t1-t0) * 1e6 # microseconds (1 second is 1,000,000 microseconds)
-print("microseconds:", us)
-```
-
-%% Output
-
-    microseconds: 57.45887756347656
-
-%% Cell type:markdown id: tags:
-
-The problem with the above measurement is that it varies significantly each time you try it, and we can easily end up measuring something other than append time.  For example, what if calling `time.time()` is much slower than calling `L.append("test")`?  It is better to perform an operation many times between checking the start and stop times, and then divide to get the average cost of the operation:
-
-%% Cell type:code id: tags:
-
-``` python
-L = []
-append_count = 1000000 # do 1 million appends
-t0 = time.time()
-for i in range(append_count):
-    L.append("test")
-t1 = time.time()
-us = (t1-t0) / append_count * 1e6 # microseconds (1 second is 1,000,000 microseconds)
-print("microseconds:", us)
-```
-
-%% Output
-
-    microseconds: 0.11862897872924805
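As a hedged sketch that goes beyond the original notebook, the same averaging idea can be expressed with two standard-library tools: `time.perf_counter()`, a monotonic high-resolution clock intended for measuring intervals, and `timeit.timeit`, which repeats a callable a given number of times. The names `L2`, `us_manual`, and `us_timeit` are illustrative only.

```python
import time
import timeit

L = []
append_count = 1_000_000  # do 1 million appends

# Manual loop, timed with perf_counter instead of time.time()
t0 = time.perf_counter()
for i in range(append_count):
    L.append("test")
t1 = time.perf_counter()
us_manual = (t1 - t0) / append_count * 1e6  # average microseconds per append

# timeit calls the function `number` times and returns the total seconds
L2 = []
total = timeit.timeit(lambda: L2.append("test"), number=append_count)
us_timeit = total / append_count * 1e6

print("manual loop (us per append):", us_manual)
print("timeit (us per append):", us_timeit)
```

The two numbers will usually differ somewhat, because the `timeit` version also pays for a function call on every append; both include some loop overhead, so they are upper bounds on the cost of `append` itself.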
-
-%% Cell type:markdown id: tags:
-
-### Example 1: the `in` operator
-
-The `in` operator can be used to check whether a value is in a list or a set, but it's much faster on a set.  If your code needs to perform the `in` operation a lot, this is a good reason to use a set rather than a list.  Let's review how `in` works on each:
-
-%% Cell type:code id: tags:
-
-``` python
-L = ["A", "B", "C"]
-S = {"A", "B", "C"}
-```
-
-%% Cell type:code id: tags:
-
-``` python
-"A" in L, "D" in L, "A" in S, "D" in S
-```
-
-%% Output
-
-    (True, False, True, False)
-
-%% Cell type:markdown id: tags:
-
-Let's see how fast `in` is if we are checking over 1 million numbers in a list or a set.
-
-%% Cell type:code id: tags:
-
-``` python
-seq_size = 1000000
-L = list(range(seq_size))
-S = set(range(seq_size))
-
-# return average microseconds to perform lookup
-def time_lookup(data, search):
-    trials = 1000
-    t0 = time.time()
-    for i in range(trials):
-        found = search in data
-    t1 = time.time()
-    return (t1-t0)*1e6/trials
-
-time_lookup(L, 0), time_lookup(S, 0)
-```
-
-%% Output
-
-    (0.03719329833984375, 0.03743171691894531)
-
-%% Cell type:markdown id: tags:
-
-Ok, looks like looking up `0` (the first number) is about equally fast in either data structure.
-
-What if we look up a number that's not stored?
-
-%% Cell type:code id: tags:
-
-``` python
-time_lookup(L, -1), time_lookup(S, -1)
-```
-
-%% Output
-
-    (10068.64070892334, 0.0438690185546875)
-
-%% Cell type:markdown id: tags:
-
-Whoa, now the list is >10K times slower!  What if we look up the last item in the list?
-
-%% Cell type:code id: tags:
-
-``` python
-time_lookup(L, 999999), time_lookup(S, 999999)
-```
-
-%% Output
-
-    (11480.518341064453, 0.0591278076171875)
-
-%% Cell type:markdown id: tags:
-
-The set is fast again, but the list is still really slow (about as slow as looking up something that doesn't exist).  What if we look up a number in the middle?
-
-%% Cell type:code id: tags:
-
-``` python
-time_lookup(L, 500000), time_lookup(S, 500000)
-```
-
-%% Output
-
-    (5739.615678787231, 0.05888938903808594)
-
-%% Cell type:markdown id: tags:
-
-Well, checking for something in the middle of a list is about twice as fast as checking for the last item.  Can you guess why?
-
-It turns out that while sets are designed around making `in` fast, running `in` on a list amounts to looping over every item, much like a call to the following function.
-
-%% Cell type:code id: tags:
-
-``` python
-def is_in(L, search):
-    for item in L:
-        if search == item: # if this is True early in the list, the search is fast
-            return True
-    return False
-```
-
-%% Cell type:markdown id: tags:
-
-How does the list size factor in when we perform an `in` and we don't find anything?  Let's do an experiment to find out.
-
-%% Cell type:code id: tags:
-
-``` python
-from pandas import Series
-
-times = Series(dtype=float)
-
-for size in [1000, 2000, 5000, 10000]:
-    L = list(range(size))
-    microseconds = time_lookup(L, -1)
-    times.loc[size] = microseconds
-times
-```
-
-%% Output
-
-    1000      9.510994
-    2000     18.847942
-    5000     47.323465
-    10000    95.795155
-    dtype: float64
-
-%% Cell type:code id: tags:
-
-``` python
-ax = times.plot.line(color="r")
-
-# following makes plot look better (only necessary if we plan to share it with others)
-ax.spines["right"].set_visible(False)
-ax.spines["top"].set_visible(False)
-ax.set_xlabel("List Size")
-ax.set_ylabel("Lookup Miss Time (μs)")
-None
-```
-
-%% Output
-
-
-
-%% Cell type:markdown id: tags:
-
-Looking at the above, we would say that the `in` operator on a list scales *linearly*.  In other words, doubling the list size doubles the time it takes to perform the operation.
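Timings are noisy, so one way to sanity-check the linear trend is to count work instead of measuring seconds. The sketch below is an illustration, not part of the original notebook: `count_comparisons` is a hypothetical variant of the linear search that also reports how many `==` checks it performed on a miss.

```python
def count_comparisons(data, search):
    """Linear search that returns (found, number of == checks performed)."""
    checks = 0
    for item in data:
        checks += 1
        if search == item:
            return True, checks
    return False, checks

# -1 is never in the list, so every search below is a miss
for size in [1000, 2000, 4000]:
    found, checks = count_comparisons(list(range(size)), -1)
    print(size, found, checks)
```

On a miss the number of checks equals the list size, so doubling the size doubles the work, which matches the shape of the timing plot.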
-
-### Example 2: Ratio Search
-
-Not all functions we'll encounter will scale linearly.  For example, consider this one, which checks whether the ratio of any two numbers in a list matches the ratio we're searching for:
-
-%% Cell type:code id: tags:
-
-``` python
-def ratio_search(L, ratio):
-    for numerator in L:
-        for denominator in L:
-            if numerator / denominator == ratio:
-                return True
-    return False
-
-ratio_search([1, 2, 3, 4], 0.75)
-```
-
-%% Output
-
-    True
-
-%% Cell type:code id: tags:
-
-``` python
-ratio_search([1, 2, 3, 4], 0.2)
-```
-
-%% Output
-
-    False
-
-%% Cell type:markdown id: tags:
-
-Let's see how it scales when we search for a ratio we know we won't find.
-
-%% Cell type:code id: tags:
-
-``` python
-import random, string
-
-times = Series(dtype=float)
-
-for i in range(6):
-    size = i * 1000
-    L = list(range(1, size+1)) # don't include 0, because we need to divide
-
-    t0 = time.time()
-    found = ratio_search(L, -1)
-    t1 = time.time()
-
-    times.loc[size] = t1-t0
-times
-```
-
-%% Output
-
-    0       0.000003
-    1000    0.056291
-    2000    0.222333
-    3000    0.499056
-    4000    0.890178
-    5000    1.397182
-    dtype: float64
-
-%% Cell type:code id: tags:
-
-``` python
-ax = times.plot.line(color="r")
-
-# following makes plot look better (only necessary if we plan to share it with others)
-ax.spines["right"].set_visible(False)
-ax.spines["top"].set_visible(False)
-ax.set_xlabel("List Size")
-ax.set_ylabel("Search Miss Time (s)")
-None
-```
-
-%% Output
-
-
-
-%% Cell type:markdown id: tags:
-
-The above is an example of quadratic scaling: doubling the list size quadruples the time it takes to run!
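The quadratic growth can also be seen without a clock by counting how many ratios get tested. This is a hedged sketch rather than code from the original notebook: `count_ratio_checks` is a hypothetical helper that uses the same nested loops as `ratio_search` but returns the number of divisions performed.

```python
def count_ratio_checks(data, ratio):
    """Same loops as ratio_search, but returns how many ratios were tested."""
    checks = 0
    for numerator in data:
        for denominator in data:
            checks += 1
            if numerator / denominator == ratio:
                return checks
    return checks

# -1 can never be the ratio of two positive numbers, so every search is a miss
for size in [100, 200, 400]:
    data = list(range(1, size + 1))  # skip 0 to avoid dividing by zero
    print(size, count_ratio_checks(data, -1))
```

On a miss the count is the list size squared, so each doubling of the size multiplies the work by four.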
-
-%% Cell type:markdown id: tags:
-
-# Conclusion
-
-In this notebook, we've learned how to automatically run programs and time code.  Together, these skills provide the empirical basis for exploring performance and scalability.  Soon, we'll be learning a bit of theory (complexity analysis) and notation (big-O) for thinking about what happens to performance as we add more data.
-
-%% Cell type:code id: tags:
-
-``` python
-```
-%% Cell type:markdown id: tags:
-
-# Running and Timing Programs
-
-In this notebook, we'll learn how to write programs that can launch other programs and time how long it takes to do things (you'll often be combining these skills to time how long it takes to run a program).
-
-Both these skills are covered in much more detail in an optional reading, Chapter 17 of Automate the Boring Stuff: https://automatetheboringstuff.com/2e/chapter17/.  If you decide to read that, we recommend skipping the middle sections, "Multithreading" through "Project: Multithreaded XKCD Downloader"
-
-## Running Programs
-
-### Example 1: Running `pwd`
-
-Remember that running the `pwd` program in a shell tells you what directory you're currently in.  Let's write some Python code to run the `pwd` program automatically and capture the output.  We'll do this with the `check_output` function in the `subprocess` module (https://docs.python.org/3/library/subprocess.html#subprocess.check_output) -- let's import that.
-
-%% Cell type:code id: tags:
-
-``` python
-from subprocess import check_output
-```
-
-%% Cell type:markdown id: tags:
-
-In the simplest form, we can pass a program name (as a string) to the function, which will capture and return the output:
-
-%% Cell type:code id: tags:
-
-``` python
-output = check_output("pwd")
-output
-```
-
-%% Output
-
-    b'/home/trh/lec3\n'
-
-%% Cell type:markdown id: tags:
-
-What type is that output?  It looks like a string, but with a "b" in front.  Hmmmm....
-
-%% Cell type:code id: tags:
-
-``` python
-type(output)
-```
-
-%% Output
-
-    bytes
-
-%% Cell type:markdown id: tags:
-
-The `bytes` type in Python is a sequence, like a string.  The difference is that a `bytes` object holds raw byte values (integers from 0 to 255), which may encode text (as in this case) or any other kind of data.  If we know the encoding of a bytes sequence, we can convert it to a string as follows:
-
-%% Cell type:code id: tags:
-
-``` python
-str_output = str(output, encoding="utf-8")
-str_output
-```
-
-%% Output
-
-    '/home/trh/lec3\n'
-
-%% Cell type:code id: tags:
-
-``` python
-type(str_output)
-```
-
-%% Output
-
-    str
-
-%% Cell type:markdown id: tags:
-
-### Example 2: Checking Versions
-
-What version of git do we have on this computer?  From the command line, we could run `git --version` to find out.  But let's do that in code.  This is a little trickier because we have both a program name, `git`, and an argument, `--version`.  The `check_output` function supports two ways of running programs with arguments.
-
-Way 1: pass `shell=True`
-
-%% Cell type:code id: tags:
-
-``` python
-check_output("git --version", shell=True)
-```
-
-%% Output
-
-    b'git version 2.17.1\n'
-
-%% Cell type:markdown id: tags:
-
-Or (preferred), we can pass the program and arguments in one list:
-
-%% Cell type:code id: tags:
-
-``` python
-check_output(["git", "--version"])
-```
-
-%% Output
-
-    b'git version 2.17.1\n'
-
-%% Cell type:markdown id: tags:
-
-Let's actually do the string manipulation work to isolate the version:
-
-%% Cell type:code id: tags:
-
-``` python
-output = str(check_output(["git", "--version"]), encoding="utf-8")
-output
-```
-
-%% Output
-
-    'git version 2.17.1\n'
-
-%% Cell type:code id: tags:
-
-``` python
-parts = output.strip().split()
-parts
-```
-
-%% Output
-
-    ['git', 'version', '2.17.1']
-
-%% Cell type:code id: tags:
-
-``` python
-version = parts[-1]
-version
-```
-
-%% Output
-
-    '2.17.1'
-
-%% Cell type:markdown id: tags:
-
-If we needed to have a specific version, we might use the above to have an assert like this:
-
-```
-assert version == "2.17.1"
-```
-
-What if the program isn't installed, or we pass it some arguments that cause it to fail (exit with a non-zero status), as in the following example?  We'll want to catch some exceptions in these scenarios:
-
-%% Cell type:code id: tags:
-
-``` python
-import subprocess
-
-try:
-    output = str(check_output(["git", "--oops"]), encoding="utf-8")
-except FileNotFoundError:
-    print("program not installed?")
-except subprocess.CalledProcessError as e:
-    print("program crashed")
-    # if there were any output before it crashed, we could look at it
-    # with this:
-    print("OUTPUT:", e.output)
-```
-
-%% Output
-
-    program crashed
-    OUTPUT: b''
-
-%% Cell type:markdown id: tags:
-
-### Example 3: Making Animations
-
-A common situation is that there will be some program that does something useful that we can't directly do in Python, and we'll want to write Python code to run these external programs to make use of their features.
-
-For example, the `ffmpeg` program can make an animated video by glueing together a bunch of `.png` image files in sequence.  There are ways to make animations directly in Python, but for now let's see how we can execute `ffmpeg` with `check_output` to make a video.
-
-First, you should install the `ffmpeg` program on Ubuntu so we can use it -- run the following in the shell:
-
-```
-sudo apt install ffmpeg
-```
-
-Now, let's write some code to make a series of plots with a red dot in different positions, and save those plots as `0.png`, `1.png`, etc.  The idea is that these images are similar enough that if you flipped through them, it would look like a rough video.
-
-%% Cell type:code id: tags:
-
-``` python
-import os
-import matplotlib
-from matplotlib import pyplot as plt
-```
-
-%% Cell type:code id: tags:
-
-``` python
-%matplotlib inline
-```
-
-%% Cell type:code id: tags:
-
-``` python
-matplotlib.rcParams["font.size"] = 16
-```
-
-%% Cell type:code id: tags:
-
-``` python
-def plot_circle(filename, x, y):
-    fig, ax = plt.subplots()
-    ax.set_xlim(0, 1)
-    ax.set_ylim(0, 1)
-    ax.plot(x, y, 'ro', markersize=20)
-    fig.savefig(os.path.join("img", filename))
-
-if not os.path.exists("img"):
-    os.mkdir("img")
-
-plot_circle("0.png", x=0, y=0)
-plot_circle("1.png", x=0.25, y=0.25)
-plot_circle("2.png", x=0.5, y=0.5)
-plot_circle("3.png", x=0.75, y=0.25)
-plot_circle("4.png", x=1, y=0)
-```
-
-%% Output
-
-
-
-
-
-
-
-
-
-
-
-%% Cell type:markdown id: tags:
-
-Let's check that we created the png files in the `img` directory:
-
-%% Cell type:code id: tags:
-
-``` python
-os.listdir("img")
-```
-
-%% Output
-
-    ['0.png', '1.png', '4.png', '2.png', '3.png']
-
-%% Cell type:markdown id: tags:
-
-Let's also check that they look right.  In the `IPython.display` module, there are `Image(...)` and `HTML(...)` functions that are useful for loading pictures and HTML directly into our notebook.  Let's use the first function to check that our `0.png` file looks right.
-
-%% Cell type:code id: tags:
-
-``` python
-from IPython.display import Image, HTML
-Image(filename='img/0.png')
-```
-
-%% Output
-
-
-    <IPython.core.display.Image object>
-
-%% Cell type:markdown id: tags:
-
-Great!  Now, from the command line, try running this command, inside the same directory where this notebook is running:
-
-```
-ffmpeg -y -framerate 5 -i img/%d.png out.mp4
-```
-
-If it succeeds, there should be an out.mp4 file generated.  Try downloading it to your computer via the Jupyter interface (don't try to open it directly from Jupyter) and open it on your laptop.  Cool, huh?
-
-Now let's try running that same command with `check_output`.  We'll need to break up all the arguments into different entries in a list:
-
-%% Cell type:code id: tags:
-
-``` python
-check_output(['ffmpeg', '-y', '-framerate', '5', '-i', 'img/%d.png', 'out.mp4'])
-```
-
-%% Output
-
-    b''
-
-%% Cell type:markdown id: tags:
-
-There was no output, which is fine.  But it should have created an `out.mp4` file, as before.  You can embed `.mp4` video files in websites with the `<video>` tag.  This is great because we can inject HTML using the `HTML(...)` function from earlier:
-
-%% Cell type:code id: tags:
-
-``` python
-HTML("This is <b>bold</b> text.")
-```
-
-%% Output
-
-    <IPython.core.display.HTML object>
-
-%% Cell type:markdown id: tags:
-
-The natural thing to do is to inject some HTML to embed the `out.mp4` animation we just created:
-
-%% Cell type:code id: tags:
-
-``` python
-HTML("""
-<video width="320" height="240" controls>
-  <source src="out.mp4" type="video/mp4">
-</video>
-""")
-```
-
-%% Output
-
-    <IPython.core.display.HTML object>
-
-%% Cell type:markdown id: tags:
-
-## Measuring Time
-
-The easiest way to measure how long something takes is to check the time before and after we do it.  We can check with the `time` function inside the `time` module.  This function returns the number of seconds elapsed since Jan 1, 1970:
-
-%% Cell type:code id: tags:
-
-``` python
-import time
-now = time.time()
-now
-```
-
-%% Output
-
-    1580137926.8447468
-
-%% Cell type:code id: tags:
-
-``` python
-minutes = now / 60
-hours = minutes / 60
-days = hours / 24
-years = days / 365
-years # should be about number of years since 1970 -- is it?
-```
-
-%% Output
-
-    50.10584496590395
-
-%% Cell type:markdown id: tags:
-
-Let's use this to time how long a print call takes:
-
-%% Cell type:code id: tags:
-
-``` python
-before = time.time()
-print("I'm printing something")
-after = time.time()
-print("It took", (after-before), "seconds to print")
-```
-
-%% Output
-
-    I'm printing something
-    It took 0.0007390975952148438 seconds to print
-
-%% Cell type:markdown id: tags:
-
-A slightly cleaner version of the same code that computes milliseconds (1 ms is 1/1000 of a second):
-
-%% Cell type:code id: tags:
-
-``` python
-t0 = time.time()
-print("I'm printing something")
-t1 = time.time()
-ms = (t1-t0) * 1000
-print("It took", ms, "ms to print")
-```
-
-%% Output
-
-    I'm printing something
-    It took 0.7138252258300781 ms to print
-
-%% Cell type:markdown id: tags:
-
-How long does it take to append something to the end of a list?
-
-%% Cell type:code id: tags:
-
-``` python
-L = []
-t0 = time.time()
-L.append("test")
-t1 = time.time()
-us = (t1-t0) * 1e6 # microseconds (1 second is 1,000,000 microseconds)
-print("microseconds:", us)
-```
-
-%% Output
-
-    microseconds: 57.45887756347656
-
-%% Cell type:markdown id: tags:
-
-The problem with the above measurement is that it varies significantly each time you try it, and we can easily end up measuring something other than append time.  For example, what if calling `time.time()` is much slower than calling `L.append("test")`?  It is better to perform an operation many times between checking the start and stop times, and then divide to get the average cost of the operation:
-
-%% Cell type:code id: tags:
-
-``` python
-L = []
-append_count = 1000000 # do 1 million appends
-t0 = time.time()
-for i in range(append_count):
-    L.append("test")
-t1 = time.time()
-us = (t1-t0) / append_count * 1e6 # microseconds (1 second is 1,000,000 microseconds)
-print("microseconds:", us)
-```
-
-%% Output
-
-    microseconds: 0.11862897872924805
-
-%% Cell type:markdown id: tags:
-
-### Example 1: the `in` operator
-
-The `in` operator can be used to check whether a value is in a list or a set, but it's much faster on a set.  If your code needs to perform the `in` operation a lot, this is a good reason to use a set rather than a list.  Let's review how `in` works on each:
-
-%% Cell type:code id: tags:
-
-``` python
-L = ["A", "B", "C"]
-S = {"A", "B", "C"}
-```
-
-%% Cell type:code id: tags:
-
-``` python
-"A" in L, "D" in L, "A" in S, "D" in S
-```
-
-%% Output
-
-    (True, False, True, False)
-
-%% Cell type:markdown id: tags:
-
-Let's see how fast `in` is if we are checking over 1 million numbers in a list or a set.
-
-%% Cell type:code id: tags:
-
-``` python
-seq_size = 1000000
-L = list(range(seq_size))
-S = set(range(seq_size))
-
-# return average microseconds to perform lookup
-def time_lookup(data, search):
-    trials = 1000
-    t0 = time.time()
-    for i in range(trials):
-        found = search in data
-    t1 = time.time()
-    return (t1-t0)*1e6/trials
-
-time_lookup(L, 0), time_lookup(S, 0)
-```
-
-%% Output
-
-    (0.03719329833984375, 0.03743171691894531)
-
-%% Cell type:markdown id: tags:
-
-Ok, looks like looking up `0` (the first number) is about equally fast in either data structure.
-
-What if we look up a number that's not stored?
-
-%% Cell type:code id: tags:
-
-``` python
-time_lookup(L, -1), time_lookup(S, -1)
-```
-
-%% Output
-
-    (10068.64070892334, 0.0438690185546875)
-
-%% Cell type:markdown id: tags:
-
-Whoa, now the list is >10K times slower!  What if we look up the last item in the list?
-
-%% Cell type:code id: tags:
-
-``` python
-time_lookup(L, 999999), time_lookup(S, 999999)
-```
-
-%% Output
-
-    (11480.518341064453, 0.0591278076171875)
-
-%% Cell type:markdown id: tags:
-
-The set is fast again, but the list is still really slow (about as slow as looking up something that doesn't exist).  What if we look up a number in the middle?
-
-%% Cell type:code id: tags:
-
-``` python
-time_lookup(L, 500000), time_lookup(S, 500000)
-```
-
-%% Output
-
-    (5739.615678787231, 0.05888938903808594)
-
-%% Cell type:markdown id: tags:
-
-Well, checking for something in the middle of a list is about twice as fast as checking for the last item.  Can you guess why?
-
-It turns out that while sets are designed around making `in` fast, running `in` on a list amounts to looping over every item, much like a call to the following function.
-
-%% Cell type:code id: tags:
-
-``` python
-def is_in(L, search):
-    for item in L:
-        if search == item: # if this is True early in the list, the search is fast
-            return True
-    return False
-```
-
-%% Cell type:markdown id: tags:
-
-How does the list size factor in when we perform an `in` and we don't find anything?  Let's do an experiment to find out.
-
-%% Cell type:code id: tags:
-
-``` python
-from pandas import Series
-
-times = Series(dtype=float) # explicit dtype avoids the empty-Series dtype warning in some pandas versions
-
-for size in [1000, 2000, 5000, 10000]:
-    L = list(range(size))
-    microseconds = time_lookup(L, -1)
-    times.loc[size] = microseconds
-times
-```
-
-%% Output
-
-    1000      9.510994
-    2000     18.847942
-    5000     47.323465
-    10000    95.795155
-    dtype: float64
-
-%% Cell type:code id: tags:
-
-``` python
-ax = times.plot.line(color="r")
-
-# following makes plot look better (only necessary if we plan to share it with others)
-ax.spines["right"].set_visible(False)
-ax.spines["top"].set_visible(False)
-ax.set_xlabel("List Size")
-ax.set_ylabel("Lookup Miss Time (μs)")
-None
-```
-
-%% Output
-
-
-
-%% Cell type:markdown id: tags:
-
-Looking at the above, we would say that the `in` operator on a list scales *linearly*.  In other words, doubling the list size doubles the time it takes to perform the operation.
-
-### Example 2: Ratio Search
-
-Not all functions we'll encounter will scale linearly.  For example, consider this one, which checks whether the ratio of any two numbers in a list matches the ratio we're searching for:
-
-%% Cell type:code id: tags:
-
-``` python
-def ratio_search(L, ratio):
-    for numerator in L:
-        for denominator in L:
-            if numerator / denominator == ratio:
-                return True
-    return False
-
-ratio_search([1, 2, 3, 4], 0.75)
-```
-
-%% Output
-
-    True
-
-%% Cell type:code id: tags:
-
-``` python
-ratio_search([1, 2, 3, 4], 0.2)
-```
-
-%% Output
-
-    False
-
-%% Cell type:markdown id: tags:
-
-Let's see how it scales when we search for a ratio we know we won't find.
-
-%% Cell type:code id: tags:
-
-``` python
-import time
-
-times = Series(dtype=float)
-
-for i in range(6):
-    size = i * 1000
-    L = list(range(1, size+1)) # don't include 0, because we need to divide
-
-    t0 = time.time()
-    found = ratio_search(L, -1)
-    t1 = time.time()
-
-    times.loc[size] = t1-t0
-times
-```
-
-%% Output
-
-    0       0.000003
-    1000    0.056291
-    2000    0.222333
-    3000    0.499056
-    4000    0.890178
-    5000    1.397182
-    dtype: float64
-
-%% Cell type:code id: tags:
-
-``` python
-ax = times.plot.line(color="r")
-
-# following makes plot look better (only necessary if we plan to share it with others)
-ax.spines["right"].set_visible(False)
-ax.spines["top"].set_visible(False)
-ax.set_xlabel("List Size")
-ax.set_ylabel("Search Miss Time (seconds)")
-None
-```
-
-%% Output
-
-
-
-%% Cell type:markdown id: tags:
-
-The above is an example of quadratic scaling: doubling the list size quadruples the time it takes to run!
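
One way to see why, without relying on noisy timings, is to count how many ratios are checked when the search misses. The `count_ratio_checks` helper below is a hypothetical variant of `ratio_search` written only for this illustration:

```python
def count_ratio_checks(L, ratio):
    """Same loops as ratio_search, but return how many ratios were checked."""
    checks = 0
    for numerator in L:
        for denominator in L:
            checks += 1
            if numerator / denominator == ratio:
                return checks
    return checks

# -1 can never match because every number in the list is positive
small = count_ratio_checks(list(range(1, 101)), -1)  # 100 items
large = count_ratio_checks(list(range(1, 201)), -1)  # 200 items
print(small, large, large / small)
```

With 100 items the nested loops check 100 × 100 ratios; with 200 items they check 200 × 200, which is four times as many.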
-
-%% Cell type:markdown id: tags:
-
-# Conclusion
-
-In this notebook, we've learned how to automatically run programs and time code.  Together, these skills provide the empirical basis for exploring performance and scalability.  Soon, we'll be learning a bit of theory (complexity analysis) and notation (big-O) for thinking about what happens to performance as we add more data.
-
-%% Cell type:code id: tags:
-
-``` python
-```
--- a/lecture_material/03-performance1/lecture.ipynb
+++ b/lecture_material/03-performance1/lecture.ipynb
 %% Cell type:markdown id:d15a3b25 tags:

 # Performance 1

-%% Cell type:markdown id:64bfcf90 tags:
+%% Cell type:markdown id:cd7f646c tags:

 ### A few shortcuts
 * shift + enter = execute a cell (= Run) and move to the next cell
 * ctrl + enter = execute a cell and stay in the same cell
 * ESC + A = add a cell above the current cell
 * ESC + B = add a cell below the current cell
 * ctrl + / = toggle comment(s) (that is, adds/removes #)

 %% Cell type:markdown id:ea8c9210 tags:

 Recommendation: include all `import` statements in a cell at the top of the notebook file or your script file (`.py`).

 ### Two styles of import

 1. `from <module> import <some_function, some_variable>`
    - invocation `some_function()`
 2. `import <module>`
    - invocation `<module>.some_function()`
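
A minimal sketch of the two styles side by side. It uses the standard-library `math` module purely as a stand-in; `math` is not otherwise part of this lecture.

```python
# Style 1: "from" import; the imported name is used directly
from math import sqrt

# Style 2: plain import; names are reached through the module
import math

root_direct = sqrt(16)            # no module prefix needed
root_via_module = math.sqrt(16)   # module prefix required

print(root_direct, root_via_module)
```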

 %% Cell type:code id:4782ff79 tags:

 ``` python
 # import statements

 # TODO: use from style of import for importing "check_output" from subprocess
 from subprocess import check_output

 # TODO: use import style of import for importing "time" module
 import time
 ```

 %% Cell type:markdown id:de8ea97c tags:

 ### How to open documentation about a function inside `jupyter`?
 Press "Shift + tab" after entering function name.

 %% Cell type:code id:4a61ed24 tags:

 ``` python
 # TODO: open documentation for check_output
 check_output
 ```

 %% Output

    <function subprocess.check_output(*popenargs, timeout=None, **kwargs)>

 %% Cell type:markdown id:fc3025fc tags:

 ### What does `check_output` do?

 Enables us to run a command with or without arguments. It returns the output of the command.
 - Argument: command to run
 - Return value: output of the command as a `bytes` object.

 %% Cell type:code id:e09ab658 tags:

 ``` python
 # TODO: invoke check_output to execute "pwd"
 pwd_output = check_output("pwd")
 pwd_output
 ```

 %% Output

-    b'/home/msyamkumar/temp/cs320-lecture-notes/lec_04_Performance_1\n'
+    b'/home/gurmail.singh\n'

 %% Cell type:code id:668afc99 tags:

 ``` python
 # TODO: use type function call to check the output type of check_output
 type(pwd_output)
 ```

 %% Output

    bytes

 %% Cell type:markdown id:6dffec77 tags:

 ### What is a `bytes` object?

 - `bytes` is an example of a sequence.

 - Recall that `list`, `str`, `tuple` are examples of Python sequences.
 - Key sequence features:
    - indexing `seq[index]`
    - slicing `seq[start index:exclusive end index]`
    - iteration `for val in seq:`
-    - length `len(seq)`
+    - length `len(seq)`
    - existence / constituency match `<val> in seq`
 - indexing:
    - begins with 0 and increases by 1 for every value
    - can use negative values: -1 represents index for last value, -2 penultimate, etc.,
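
The sequence features listed above can be tried on a small `bytes` value. This is only a sketch: `data` is a made-up value standing in for the output of a command.

```python
# A hypothetical bytes value standing in for the output of a command
data = b"/home/user\n"

first = data[0]              # indexing a bytes object yields an int (47 is "/")
chunk = data[1:5]            # slicing yields another bytes object
values = [b for b in data]   # iteration yields ints, one per byte
size = len(data)             # number of bytes
has_home = b"home" in data   # membership also accepts a bytes subsequence
last = data[-1]              # negative index: last byte (10 is "\n")

print(first, chunk, size, has_home, last)
```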

 %% Cell type:code id:70e41d0b-a12a-4e35-be97-083aa91af28f tags:

 ``` python
 # TODO: use indexing to extract value at index 0
 pwd_output[0]
 ```

 %% Output

    47

 %% Cell type:markdown id:60504389 tags:

 ### `bytes` conversion to `str`
 - requires details about encoding
 - `str(<bytes_variable>, <encoding>)`
 - Most programs on Linux use `utf-8` encoding
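
`bytes` objects also have a `decode` method, which gives the same result as the two-argument `str` call. A small sketch, assuming a made-up `raw` value that contains one non-ASCII character:

```python
raw = b"caf\xc3\xa9\n"   # hypothetical command output; \xc3\xa9 is "é" in UTF-8

via_str = str(raw, "utf-8")        # the form used in this lecture
via_decode = raw.decode("utf-8")   # equivalent method call

# Six bytes become five characters because "é" takes two bytes in UTF-8
print(via_str == via_decode, len(raw), len(via_str))
```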

 %% Cell type:code id:09e5bd77 tags:

 ``` python
 # Can we just convert bytes directly into str?
 # Not really; you need to specify the encoding (otherwise you get the b'...' representation as text)
 str(pwd_output)
 ```

 %% Output

-    "b'/home/msyamkumar/temp/cs320-lecture-notes/lec_04_Performance_1\\n'"
+    "b'/home/gurmail.singh\\n'"

 %% Cell type:code id:fbc2ad71 tags:

 ``` python
 # TODO: let's try utf-8 encoding
 pwd_output_str = str(pwd_output, "utf-8")
 pwd_output_str
 ```

 %% Output

-    '/home/msyamkumar/temp/cs320-lecture-notes/lec_04_Performance_1\n'
+    '/home/gurmail.singh\n'

 %% Cell type:markdown id:5ce854cd tags:

 Recall that, when you print a `str`, escape sequences such as `\n` are rendered instead of being shown literally.

 %% Cell type:code id:8e0c335c tags:

 ``` python
 print(pwd_output_str)
 ```

 %% Output

-    /home/msyamkumar/temp/cs320-lecture-notes/lec_04_Performance_1
+    /home/gurmail.singh
    

 %% Cell type:code id:da2f1bc1 tags:

 ``` python
 # You must use the correct encoding; otherwise the conversion may raise an error or, as here, silently produce garbage
 str(pwd_output, "cp273")
 ```

 %% Output

-    '\x07Ç?_Á\x07_Ë`/_,Í_/Ê\x07ÈÁ_ø\x07[Ë\x93\x16\x90\x05%Á[ÈÍÊÁ\x05>?ÈÁË\x07%Á[^\x90\x94^&ÁÊÃ?Ê_/>[Á^\x91\x8e'
+    '\x07Ç?_Á\x07ÅÍÊ_/Ñ%\x06ËÑ>ÅÇ\x8e'

 %% Cell type:markdown id:11ce814c tags:

 ### `str` methods recap

 - `<str_variable>.strip()`: removes leading and trailing whitespace
 - `<str_variable>.split(<separator>)`: returns list of strings split by separator

 %% Cell type:code id:8d8bb61a-39ce-404b-81a1-992434ef26de tags:

 ``` python
 # TODO: try strip method
 pwd_output_str.strip()
 ```

 %% Output

-    '/home/msyamkumar/temp/cs320-lecture-notes/lec_04_Performance_1'
+    '/home/gurmail.singh'

 %% Cell type:code id:fabe2232 tags:

 ``` python
 # TODO: try split method using "/" as separator
 pwd_output_str.split("/")
 ```

 %% Output

-    ['',
-     'home',
-     'msyamkumar',
-     'temp',
-     'cs320-lecture-notes',
-     'lec_04_Performance_1\n']
+    ['', 'home', 'gurmail.singh\n']

 %% Cell type:code id:27590458-53b4-4b8d-b7db-25ea54e304c3 tags:

 ``` python
 # You can chain method or function calls together
 # TODO: first strip and then split the string
 pwd_output_str.strip().split("/")
 ```

 %% Output

-    ['',
-     'home',
-     'msyamkumar',
-     'temp',
-     'cs320-lecture-notes',
-     'lec_04_Performance_1']
+    ['', 'home', 'gurmail.singh']

 %% Cell type:markdown id:a83a11b3 tags:

 ### What does `check_output` do when the command doesn't exist?
 - `FileNotFoundError`
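
If a missing program should not stop the notebook, the exception can be caught. The sketch below is one possible approach; `run_or_none` is a hypothetical helper, and `no-such-command-cs320` is a made-up name assumed not to exist on the system.

```python
from subprocess import check_output

def run_or_none(command):
    """Return the command's output as bytes, or None if the program is missing."""
    try:
        return check_output(command)
    except FileNotFoundError:
        return None

# "no-such-command-cs320" is a made-up name assumed not to exist on the system
result = run_or_none("no-such-command-cs320")
print(result)
```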

 %% Cell type:code id:b865a24c-bf5f-4fbf-be5b-0b578df66df8 tags:

 ``` python
 # TODO: invoke check_output by passing "hahaha" as argument
 check_output("hahaha")
 ```

 %% Output

    ---------------------------------------------------------------------------
    FileNotFoundError                         Traceback (most recent call last)
 Cell     In[13], line 2
          1 # TODO: invoke check_output by passing "hahaha" as argument
    ----> 2 check_output("hahaha")
-File     /usr/lib/python3.10/subprocess.py:420, in check_output(timeout, *popenargs, **kwargs)
-        417         empty = b''
-        418     kwargs['input'] = empty
-    --> 420 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
-        421            **kwargs).stdout
-File     /usr/lib/python3.10/subprocess.py:501, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
-        498     kwargs['stdout'] = PIPE
-        499     kwargs['stderr'] = PIPE
-    --> 501 with Popen(*popenargs, **kwargs) as process:
-        502     try:
-        503         stdout, stderr = process.communicate(input, timeout=timeout)
-File     /usr/lib/python3.10/subprocess.py:969, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask, pipesize)
-        965         if self.text_mode:
-        966             self.stderr = io.TextIOWrapper(self.stderr,
-        967                     encoding=encoding, errors=errors)
-    --> 969     self._execute_child(args, executable, preexec_fn, close_fds,
-        970                         pass_fds, cwd, env,
-        971                         startupinfo, creationflags, shell,
-        972                         p2cread, p2cwrite,
-        973                         c2pread, c2pwrite,
-        974                         errread, errwrite,
-        975                         restore_signals,
-        976                         gid, gids, uid, umask,
-        977                         start_new_session)
-        978 except:
-        979     # Cleanup if the child failed starting.
-        980     for f in filter(None, (self.stdin, self.stdout, self.stderr)):
-File     /usr/lib/python3.10/subprocess.py:1845, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, gid, gids, uid, umask, start_new_session)
-       1843     if errno_num != 0:
-       1844         err_msg = os.strerror(errno_num)
-    -> 1845     raise child_exception_type(errno_num, err_msg, err_filename)
-       1846 raise child_exception_type(err_msg)
+File     /usr/lib/python3.10/subprocess.py:421, in check_output(timeout, *popenargs, **kwargs)
+        418         empty = b''
+        419     kwargs['input'] = empty
+    --> 421 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
+        422            **kwargs).stdout
+File     /usr/lib/python3.10/subprocess.py:503, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
+        500     kwargs['stdout'] = PIPE
+        501     kwargs['stderr'] = PIPE
+    --> 503 with Popen(*popenargs, **kwargs) as process:
+        504     try:
+        505         stdout, stderr = process.communicate(input, timeout=timeout)
+File     /usr/lib/python3.10/subprocess.py:971, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask, pipesize)
+        967         if self.text_mode:
+        968             self.stderr = io.TextIOWrapper(self.stderr,
+        969                     encoding=encoding, errors=errors)
+    --> 971     self._execute_child(args, executable, preexec_fn, close_fds,
+        972                         pass_fds, cwd, env,
+        973                         startupinfo, creationflags, shell,
+        974                         p2cread, p2cwrite,
+        975                         c2pread, c2pwrite,
+        976                         errread, errwrite,
+        977                         restore_signals,
+        978                         gid, gids, uid, umask,
+        979                         start_new_session)
+        980 except:
+        981     # Cleanup if the child failed starting.
+        982     for f in filter(None, (self.stdin, self.stdout, self.stderr)):
+File     /usr/lib/python3.10/subprocess.py:1863, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, gid, gids, uid, umask, start_new_session)
+       1861     if errno_num != 0:
+       1862         err_msg = os.strerror(errno_num)
+    -> 1863     raise child_exception_type(errno_num, err_msg, err_filename)
+       1864 raise child_exception_type(err_msg)
    FileNotFoundError: [Errno 2] No such file or directory: 'hahaha'

 %% Cell type:markdown id:8512a276 tags:

 ### How can we use `check_output` to execute a command with arguments?

 - option 1: pass the command with arguments as a string and pass `True` as argument to parameter `shell`
 - option 2: pass a list of strings; for example: `[<command>, <arg1>, <arg2>]`

 %% Cell type:markdown id:9a5d0722 tags:

 ### git --version

 %% Cell type:code id:fad47b1d-1f52-4c8e-8df4-0847fe1718c6 tags:

 ``` python
 # TODO: use option 1 to run "git --version"
 check_output("git --version", shell=True)
 ```

 %% Output

    b'git version 2.34.1\n'

 %% Cell type:markdown id:632699d8 tags:

 What would happen if we switch the order of the two arguments? Recall that positional arguments should come before keyword arguments.

 %% Cell type:code id:6efc81d6 tags:

 ``` python
 check_output(shell=True, "git --version")
 ```

 %% Output

      Cell In[15], line 1
        check_output(shell=True, "git --version")
                                                ^
    SyntaxError: positional argument follows keyword argument

 %% Cell type:code id:847b3c3d-bd9b-45d9-acf7-745a562c579e tags:

 ``` python
 # TODO: use option 2 to run "git --version"
 check_output(["git", "--version"])
 ```

 %% Output

    b'git version 2.34.1\n'

 %% Cell type:code id:7bcf8ef4 tags:

 ``` python
 # TODO: combine check_output with str typecast
 git_version_str = str(check_output(["git", "--version"]), "utf-8")
 ```

 %% Cell type:code id:d0d6c938 tags:

 ``` python
 # TODO: write code to extract just the version number
 print(git_version_str.strip().split(" ")[-1]) # option 1
 print(git_version_str[-7:-1]) # option 2
 ```

 %% Output

    2.34.1
    2.34.1

 %% Cell type:markdown id:2ef0b826-cb54-4d43-abe4-7dda3425ce65 tags:

 ### How long does it take to run code?

 Let's learn about the `time` function in the `time` module. It returns the current time as the number of seconds since the epoch.

 What is the epoch? It is 00:00:00 UTC on January 1, 1970. **FUN FACT**: the epoch is considered the beginning of time for computers.
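
As a quick check of the epoch, `time.gmtime` can convert a number of seconds since the epoch into a calendar date in UTC. The sketch below also shows the usual pattern of subtracting two readings; the `sum` call is just placeholder work to measure.

```python
import time

# gmtime(0) converts "0 seconds since the epoch" to a UTC calendar time
epoch = time.gmtime(0)
print(epoch.tm_year, epoch.tm_mon, epoch.tm_mday)

# Elapsed time is the difference between two readings
start = time.time()
total = sum(range(100_000))   # placeholder work to measure
elapsed_ms = (time.time() - start) * 1e3
print(elapsed_ms)
```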

 %% Cell type:code id:89b5982c-40b5-498f-9acf-f05a9d233f59 tags:

 ``` python
 # TODO: invoke time module time function
 # keep in mind that we used import style of import
 time.time()
 # number of seconds since Jan 1, 1970
 ```

 %% Output

-    1675231286.744164
+    1706617377.4253736

 %% Cell type:code id:0c41fd03-d2b8-4641-a288-4b95c48b5d24 tags:

 ``` python
 start_time = time.time()
 # DO SOMETHING (e.g., check_output)
 end_time = time.time()

 print(end_time - start_time)
 ```

 %% Output

-    3.814697265625e-05
+    4.5299530029296875e-05

 %% Cell type:code id:31737e44 tags:

 ``` python
 # TODO: let's convert to milliseconds
 print((end_time-start_time) * 1e3)

 # TODO: let's convert to microseconds
 print((end_time-start_time) * 1e6)
 ```

 %% Output

-    0.03814697265625
-    38.14697265625
+    0.045299530029296875
+    45.299530029296875

 %% Cell type:markdown id:01651895 tags:

 How long does it take to run simple computations (example: 4 + 5)?

 %% Cell type:code id:0b739045-bfbe-439e-ac57-7e3f069db7b6 tags:

 ``` python
 start_time = time.time()
 x = 4 + 5
 end_time = time.time()

 print(end_time - start_time)
 ```

 %% Output

-    7.677078247070312e-05
+    4.9591064453125e-05

 %% Cell type:markdown id:1e378bfb tags:

 How long does it take to print simple computations (example: 4 + 5)?

 %% Cell type:code id:2665edb8-bd7d-4cf7-baf5-5143bd6f25a0 tags:

 ``` python
 start_time = time.time()
 print(4 + 5)
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

    9
-    1.7392635345458984
+    0.9014606475830078

 %% Cell type:markdown id:a0b5c105 tags:

 Printing is a relatively slow operation. If your program prints a lot of things, its performance might suffer!

 %% Cell type:markdown id:030b867b tags:

 How long does it take to run a Python program?

 Recall that the `-c` option lets us pass code to the interpreter directly on the command line:
 `python3 -c "code"`

 %% Cell type:code id:991218f8 tags:

 ``` python
 start_time = time.time()
 check_output(["python3", "-c", "print(4 + 5)"])
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    30.985593795776367
+    31.372547149658203

 %% Cell type:markdown id:1c68a8cd tags:

 ### Every time we run a command, we get a slightly different timing. How can we eliminate the noise?

 %% Cell type:markdown id:7dad0036 tags:

 Let's try this with "pwd".

 %% Cell type:code id:5b4dba6f tags:

 ``` python
 start_time = time.time()
 check_output("pwd")
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    2.9201507568359375
+    4.441499710083008

 %% Cell type:markdown id:f853aab6 tags:

 Recall that the built-in `range` function produces a sequence of integers starting at 0.

 %% Cell type:code id:b45228b7 tags:

 ``` python
 iters = 1000

 start_time = time.time()
 for i in range(iters):
    check_output("pwd")
 end_time = time.time()

 print((end_time-start_time) * 1e3 / iters)
 ```

 %% Output

-    1.6988284587860107
+    1.8102648258209229

 %% Cell type:markdown id:ff7e8971 tags:

 ### Data structures review
 - lists (sequence: ordered)
 - sets (not a sequence: not ordered):
    - indexing doesn't work, but `in` operator works
    - only stores unique values
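
A short sketch of these differences, using made-up values:

```python
values = [11, 22, 33, 22]      # a list keeps order and duplicates
unique_values = set(values)    # a set keeps only unique values

print(values[0])                           # indexing works on a list
print(len(values), len(unique_values))     # the duplicate 22 is stored once in the set
print(22 in values, 22 in unique_values)   # `in` works on both

try:
    unique_values[0]           # sets do not support indexing
except TypeError as err:
    index_error = str(err)
    print(index_error)
```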

 %% Cell type:code id:10a09035 tags:

 ``` python
 # TODO: create a simple list of integers
 some_numbers = [11, 22, 33]
 some_numbers
 ```

 %% Output

    [11, 22, 33]

 %% Cell type:code id:553b126b tags:

 ``` python
 # TODO: use range() to produce a list containing 1000000 numbers
 some_numbers = list(range(1000000))
 ```

 %% Cell type:markdown id:a73c753a tags:

 `in` operator: existence / constituency match

 %% Cell type:code id:e927c0b7 tags:

 ``` python
 100 in some_numbers
 ```

 %% Output

    True

 %% Cell type:code id:896176d0 tags:

 ``` python
 -20 in some_numbers
 ```

 %% Output

    False

 %% Cell type:markdown id:bc5f912b tags:

 How long does the `in` operator take? For a list, it depends on where the item we are searching for is located.
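
A single measurement is noisy, so one option is to average over several trials. The sketch below assumes a hypothetical helper, `average_lookup_ms`, and uses `time.perf_counter`, a standard-library clock intended for measuring short intervals; the exact numbers will vary from machine to machine.

```python
import time

def average_lookup_ms(container, target, trials=100):
    """Average time in milliseconds for `target in container` over several trials."""
    start = time.perf_counter()
    for _ in range(trials):
        target in container
    end = time.perf_counter()
    return (end - start) * 1e3 / trials

numbers = list(range(100_000))
early = average_lookup_ms(numbers, 0)      # found at the first position
missing = average_lookup_ms(numbers, -1)   # forces a scan of the whole list
print(early, missing)
```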

 %% Cell type:code id:e0eda15b tags:

 ``` python
 # TODO: time how long it takes to find 99 in some_numbers
 start_time = time.time()
 99 in some_numbers
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    0.09298324584960938
+    0.08988380432128906

 %% Cell type:code id:ae667c2f tags:

 ``` python
 # TODO: time how long it takes to find 999999 in some_numbers
 start_time = time.time()
 999999 in some_numbers
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    11.307954788208008
+    11.118888854980469

 %% Cell type:code id:e77a228d-1709-4f9a-b8ff-7c97fbda38bc tags:

 ``` python
 # TODO: time how long it takes to find -1 in some_numbers
 start_time = time.time()
 -1 in some_numbers
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    11.208295822143555
+    9.768009185791016

 %% Cell type:code id:3ba890dc tags:

 ``` python
 # TODO: create a simple set of numbers
 some_set = {11, 22, 33}
 some_set
 ```

 %% Output

    {11, 22, 33}

 %% Cell type:code id:64d7b73f-d8a9-43c7-a3a3-01e9aaa9a09f tags:

 ``` python
 # TODO: convert some_numbers into set
 some_set = set(some_numbers)
 ```

 %% Cell type:code id:a55e59ef-31e3-4468-a0db-aad1c3fc291d tags:

 ``` python
 # TODO: time how long it takes to find -1 in some_set
 start_time = time.time()
 -1 in some_set
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    0.06175041198730469
+    0.11181831359863281

 %% Cell type:markdown id:d15a3b25 tags:

 # Performance 1

-%% Cell type:markdown id:64bfcf90 tags:
+%% Cell type:markdown id:cd7f646c tags:

 ### A few shortcuts
 * shift + enter = execute a cell (= Run) and move to the next cell
 * ctrl + enter = execute a cell and stay in the same cell
 * ESC + A = add a cell above the current cell
 * ESC + B = add a cell below the current cell
 * ctrl + / = toggle comment(s) (that is, adds/removes #)

 %% Cell type:markdown id:ea8c9210 tags:

 Recommendation: include all `import` statements in a cell at the top of the notebook file or your script file (`.py`).

 ### Two styles of import

 1. `from <module> import <some_function, some_variable>`
    - invocation `some_function()`
 2. `import <module>`
    - invocation `<module>.some_function()`

 %% Cell type:code id:4782ff79 tags:

 ``` python
 # import statements

 # TODO: use from style of import for importing "check_output" from subprocess
 from subprocess import check_output

 # TODO: use import style of import for importing "time" module
 import time
 ```

 %% Cell type:markdown id:de8ea97c tags:

 ### How to open documentation about a function inside `jupyter`?
 Press "Shift + tab" after entering function name.

 %% Cell type:code id:4a61ed24 tags:

 ``` python
 # TODO: open documentation for check_output
 check_output
 ```

 %% Output

    <function subprocess.check_output(*popenargs, timeout=None, **kwargs)>

 %% Cell type:markdown id:fc3025fc tags:

 ### What does `check_output` do?

 Enables us to run a command with or without arguments. It returns the output of the command.
 - Argument: command to run
 - Return value: output of the command as a `bytes` object.

 %% Cell type:code id:e09ab658 tags:

 ``` python
 # TODO: invoke check_output to execute "pwd"
 pwd_output = check_output("pwd")
 pwd_output
 ```

 %% Output

-    b'/home/msyamkumar/temp/cs320-lecture-notes/lec_04_Performance_1\n'
+    b'/home/gurmail.singh\n'

 %% Cell type:code id:668afc99 tags:

 ``` python
 # TODO: use type function call to check the output type of check_output
 type(pwd_output)
 ```

 %% Output

    bytes

 %% Cell type:markdown id:6dffec77 tags:

 ### What is a `bytes` object?

 - `bytes` is an example of a sequence.

 - Recall that `list`, `str`, `tuple` are examples of Python sequences.
 - Key sequence features:
    - indexing `seq[index]`
    - slicing `seq[start index:exclusive end index]`
    - iteration `for val in seq:`
-    - length `len(seq)`
+    - length `len(seq)`
    - existence / constituency match `<val> in seq`
 - indexing:
    - begins with 0 and increases by 1 for every value
    - can use negative values: -1 represents index for last value, -2 penultimate, etc.,

 %% Cell type:code id:70e41d0b-a12a-4e35-be97-083aa91af28f tags:

 ``` python
 # TODO: use indexing to extract value at index 0
 pwd_output[0]
 ```

 %% Output

    47

 %% Cell type:markdown id:60504389 tags:

 ### `bytes` conversion to `str`
 - requires details about encoding
 - `str(<bytes_variable>, <encoding>)`
 - Most programs on Linux use `utf-8` encoding

 %% Cell type:code id:09e5bd77 tags:

 ``` python
 # Can we just convert bytes directly into str?
 # Not really; you need to specify the encoding (otherwise you get the b'...' representation as text)
 str(pwd_output)
 ```

 %% Output

-    "b'/home/msyamkumar/temp/cs320-lecture-notes/lec_04_Performance_1\\n'"
+    "b'/home/gurmail.singh\\n'"

 %% Cell type:code id:fbc2ad71 tags:

 ``` python
 # TODO: let's try utf-8 encoding
 pwd_output_str = str(pwd_output, "utf-8")
 pwd_output_str
 ```

 %% Output

-    '/home/msyamkumar/temp/cs320-lecture-notes/lec_04_Performance_1\n'
+    '/home/gurmail.singh\n'

 %% Cell type:markdown id:5ce854cd tags:

 Recall that, when you print a `str`, escape sequences such as `\n` are rendered instead of being shown literally.

 %% Cell type:code id:8e0c335c tags:

 ``` python
 print(pwd_output_str)
 ```

 %% Output

-    /home/msyamkumar/temp/cs320-lecture-notes/lec_04_Performance_1
+    /home/gurmail.singh
    

 %% Cell type:code id:da2f1bc1 tags:

 ``` python
 # You must use the correct encoding; otherwise the conversion may raise an error or, as here, silently produce garbage
 str(pwd_output, "cp273")
 ```

 %% Output

-    '\x07Ç?_Á\x07_Ë`/_,Í_/Ê\x07ÈÁ_ø\x07[Ë\x93\x16\x90\x05%Á[ÈÍÊÁ\x05>?ÈÁË\x07%Á[^\x90\x94^&ÁÊÃ?Ê_/>[Á^\x91\x8e'
+    '\x07Ç?_Á\x07ÅÍÊ_/Ñ%\x06ËÑ>ÅÇ\x8e'

 %% Cell type:markdown id:11ce814c tags:

 ### `str` methods recap

 - `<str_variable>.strip()`: removes leading and trailing whitespace
 - `<str_variable>.split(<separator>)`: returns list of strings split by separator

 %% Cell type:code id:8d8bb61a-39ce-404b-81a1-992434ef26de tags:

 ``` python
 # TODO: try strip method
 pwd_output_str.strip()
 ```

 %% Output

-    '/home/msyamkumar/temp/cs320-lecture-notes/lec_04_Performance_1'
+    '/home/gurmail.singh'

 %% Cell type:code id:fabe2232 tags:

 ``` python
 # TODO: try split method using "/" as separator
 pwd_output_str.split("/")
 ```

 %% Output

-    ['',
-     'home',
-     'msyamkumar',
-     'temp',
-     'cs320-lecture-notes',
-     'lec_04_Performance_1\n']
+    ['', 'home', 'gurmail.singh\n']

 %% Cell type:code id:27590458-53b4-4b8d-b7db-25ea54e304c3 tags:

 ``` python
 # You can chain method or function calls together
 # TODO: first strip and then split the string
 pwd_output_str.strip().split("/")
 ```

 %% Output

-    ['',
-     'home',
-     'msyamkumar',
-     'temp',
-     'cs320-lecture-notes',
-     'lec_04_Performance_1']
+    ['', 'home', 'gurmail.singh']

 %% Cell type:markdown id:a83a11b3 tags:

 ### What does `check_output` do when the command doesn't exist?
 - `FileNotFoundError`

 %% Cell type:code id:b865a24c-bf5f-4fbf-be5b-0b578df66df8 tags:

 ``` python
 # TODO: invoke check_output by passing "hahaha" as argument
 check_output("hahaha")
 ```

 %% Output

    ---------------------------------------------------------------------------
    FileNotFoundError                         Traceback (most recent call last)
 Cell     In[13], line 2
          1 # TODO: invoke check_output by passing "hahaha" as argument
    ----> 2 check_output("hahaha")
-File     /usr/lib/python3.10/subprocess.py:420, in check_output(timeout, *popenargs, **kwargs)
-        417         empty = b''
-        418     kwargs['input'] = empty
-    --> 420 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
-        421            **kwargs).stdout
-File     /usr/lib/python3.10/subprocess.py:501, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
-        498     kwargs['stdout'] = PIPE
-        499     kwargs['stderr'] = PIPE
-    --> 501 with Popen(*popenargs, **kwargs) as process:
-        502     try:
-        503         stdout, stderr = process.communicate(input, timeout=timeout)
-File     /usr/lib/python3.10/subprocess.py:969, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask, pipesize)
-        965         if self.text_mode:
-        966             self.stderr = io.TextIOWrapper(self.stderr,
-        967                     encoding=encoding, errors=errors)
-    --> 969     self._execute_child(args, executable, preexec_fn, close_fds,
-        970                         pass_fds, cwd, env,
-        971                         startupinfo, creationflags, shell,
-        972                         p2cread, p2cwrite,
-        973                         c2pread, c2pwrite,
-        974                         errread, errwrite,
-        975                         restore_signals,
-        976                         gid, gids, uid, umask,
-        977                         start_new_session)
-        978 except:
-        979     # Cleanup if the child failed starting.
-        980     for f in filter(None, (self.stdin, self.stdout, self.stderr)):
-File     /usr/lib/python3.10/subprocess.py:1845, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, gid, gids, uid, umask, start_new_session)
-       1843     if errno_num != 0:
-       1844         err_msg = os.strerror(errno_num)
-    -> 1845     raise child_exception_type(errno_num, err_msg, err_filename)
-       1846 raise child_exception_type(err_msg)
+File     /usr/lib/python3.10/subprocess.py:421, in check_output(timeout, *popenargs, **kwargs)
+        418         empty = b''
+        419     kwargs['input'] = empty
+    --> 421 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
+        422            **kwargs).stdout
+File     /usr/lib/python3.10/subprocess.py:503, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
+        500     kwargs['stdout'] = PIPE
+        501     kwargs['stderr'] = PIPE
+    --> 503 with Popen(*popenargs, **kwargs) as process:
+        504     try:
+        505         stdout, stderr = process.communicate(input, timeout=timeout)
+File     /usr/lib/python3.10/subprocess.py:971, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask, pipesize)
+        967         if self.text_mode:
+        968             self.stderr = io.TextIOWrapper(self.stderr,
+        969                     encoding=encoding, errors=errors)
+    --> 971     self._execute_child(args, executable, preexec_fn, close_fds,
+        972                         pass_fds, cwd, env,
+        973                         startupinfo, creationflags, shell,
+        974                         p2cread, p2cwrite,
+        975                         c2pread, c2pwrite,
+        976                         errread, errwrite,
+        977                         restore_signals,
+        978                         gid, gids, uid, umask,
+        979                         start_new_session)
+        980 except:
+        981     # Cleanup if the child failed starting.
+        982     for f in filter(None, (self.stdin, self.stdout, self.stderr)):
+File     /usr/lib/python3.10/subprocess.py:1863, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, gid, gids, uid, umask, start_new_session)
+       1861     if errno_num != 0:
+       1862         err_msg = os.strerror(errno_num)
+    -> 1863     raise child_exception_type(errno_num, err_msg, err_filename)
+       1864 raise child_exception_type(err_msg)
    FileNotFoundError: [Errno 2] No such file or directory: 'hahaha'

 %% Cell type:markdown id:8512a276 tags:

 ### How can we use `check_output` to execute a command with arguments?

 - option 1: pass the command and its arguments as a single string, and pass `True` as the argument to the `shell` parameter
 - option 2: pass a list of strings; for example: `[<command>, <arg1>, <arg2>]`
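The two options describe the same command in different shapes. As a minimal sketch (not part of the original lecture), the standard-library function `shlex.split` shows how a command string like the one used in option 1 breaks down into the list of strings that option 2 expects. The variable names here are only illustrative.

```python
import shlex

# Option 1 style: the whole command as a single string (used with shell=True).
command_string = "git --version"

# Option 2 style: the same command as a list of strings.
# shlex.split applies shell-like splitting rules to the string.
command_list = shlex.split(command_string)
print(command_list)  # ['git', '--version']
```

The list form avoids going through a shell, so it is often preferred when the arguments come from variables rather than being typed by hand.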

 %% Cell type:markdown id:9a5d0722 tags:

 ### git --version

 %% Cell type:code id:fad47b1d-1f52-4c8e-8df4-0847fe1718c6 tags:

 ``` python
 # TODO: use option 1 to run "git --version"
 check_output("git --version", shell=True)
 ```

 %% Output

    b'git version 2.34.1\n'

 %% Cell type:markdown id:632699d8 tags:

 What would happen if we switch the order of the two arguments? Recall that positional arguments must come before keyword arguments.

 %% Cell type:code id:6efc81d6 tags:

 ``` python
 check_output(shell=True, "git --version")
 ```

 %% Output

      Cell In[15], line 1
        check_output(shell=True, "git --version")
                                                ^
    SyntaxError: positional argument follows keyword argument

 %% Cell type:code id:847b3c3d-bd9b-45d9-acf7-745a562c579e tags:

 ``` python
 # TODO: use option 2 to run "git --version"
 check_output(["git", "--version"])
 ```

 %% Output

    b'git version 2.34.1\n'

 %% Cell type:code id:7bcf8ef4 tags:

 ``` python
 # TODO: combine check_output with str typecast
 git_version_str = str(check_output(["git", "--version"]), "utf-8")
 ```

 %% Cell type:code id:d0d6c938 tags:

 ``` python
 # TODO: write code to extract just the version number
 print(git_version_str.strip().split(" ")[-1]) # option 1
 print(git_version_str[-7:-1]) # option 2 (assumes the version is exactly 6 characters long)
 ```

 %% Output

    2.34.1
    2.34.1
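Both options above depend on the exact shape of the output string. As an alternative sketch (an assumption on my part, not something the lecture uses), a regular expression can pull out the dotted version number regardless of its length. The sample string below simply mirrors the decoded output shown earlier.

```python
import re

# Sample text in the same shape as the decoded `git --version` output.
sample_output = "git version 2.34.1\n"

# Match two or more dot-separated groups of digits, e.g. "2.34.1" or "2.9".
match = re.search(r"\d+(?:\.\d+)+", sample_output)
version = match.group(0) if match else None
print(version)  # 2.34.1
```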

 %% Cell type:markdown id:2ef0b826-cb54-4d43-abe4-7dda3425ce65 tags:

 ### How long does it take to run code?

 Let's learn about the `time` function in the `time` module. It returns the current time as the number of seconds since the epoch.

 What is the epoch? On Unix-like systems, the epoch is January 1, 1970, 00:00:00 UTC. **FUN FACT**: the epoch is considered the beginning of time for computers.
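To make the epoch concrete, here is a small sketch (not from the original notebook) that converts timestamps into readable dates using the standard-library `datetime` module. Timestamp `0` should map to the epoch itself, and the value returned by `time.time()` maps to the current moment.

```python
import time
from datetime import datetime, timezone

# Timestamp 0 corresponds to the epoch itself.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())  # 1970-01-01T00:00:00+00:00

# The current time, expressed as seconds since the epoch, converted to a date.
now_seconds = time.time()
now = datetime.fromtimestamp(now_seconds, tz=timezone.utc)
print(now.isoformat())
```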

 %% Cell type:code id:89b5982c-40b5-498f-9acf-f05a9d233f59 tags:

 ``` python
 # TODO: invoke time module time function
 # keep in mind that we used the `import time` style of import
 time.time()
 # number of seconds since Jan 1, 1970
 ```

 %% Output

-    1675231286.744164
+    1706617377.4253736

 %% Cell type:code id:0c41fd03-d2b8-4641-a288-4b95c48b5d24 tags:

 ``` python
 start_time = time.time()
 # DO SOMETHING (e.g., check_output)
 end_time = time.time()

 print(end_time - start_time)
 ```

 %% Output

-    3.814697265625e-05
+    4.5299530029296875e-05

 %% Cell type:code id:31737e44 tags:

 ``` python
 # TODO: let's convert to milliseconds
 print((end_time-start_time) * 1e3)

 # TODO: let's convert to microseconds
 print((end_time-start_time) * 1e6)
 ```

 %% Output

-    0.03814697265625
-    38.14697265625
+    0.045299530029296875
+    45.299530029296875

 %% Cell type:markdown id:01651895 tags:

 How long does it take to run simple computations (example: 4 + 5)?

 %% Cell type:code id:0b739045-bfbe-439e-ac57-7e3f069db7b6 tags:

 ``` python
 start_time = time.time()
 x = 4 + 5
 end_time = time.time()

 print(end_time - start_time)
 ```

 %% Output

-    7.677078247070312e-05
+    4.9591064453125e-05

 %% Cell type:markdown id:1e378bfb tags:

 How long does it take to print simple computations (example: 4 + 5)?

 %% Cell type:code id:2665edb8-bd7d-4cf7-baf5-5143bd6f25a0 tags:

 ``` python
 start_time = time.time()
 print(4 + 5)
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

    9
-    1.7392635345458984
+    0.9014606475830078

 %% Cell type:markdown id:a0b5c105 tags:

 Printing is a relatively slow operation. If your program prints a lot of output, its performance may suffer!

 %% Cell type:markdown id:030b867b tags:

 How long does it take to run a Python program?

 Recall that `python3 -c "code"` runs the given code string directly from the command line, without starting an interactive session.

 %% Cell type:code id:991218f8 tags:

 ``` python
 start_time = time.time()
 check_output(["python3", "-c", "print(4 + 5)"])
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    30.985593795776367
+    31.372547149658203

 %% Cell type:markdown id:1c68a8cd tags:

 ### Every time we time a command, we get a slightly different measurement. How can we reduce the noise?

 %% Cell type:markdown id:7dad0036 tags:

 Let's try this with "pwd".

 %% Cell type:code id:5b4dba6f tags:

 ``` python
 start_time = time.time()
 check_output("pwd")
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    2.9201507568359375
+    4.441499710083008

 %% Cell type:markdown id:f853aab6 tags:

 Recall that the built-in `range` function produces a sequence of integers, starting at 0 by default.

 %% Cell type:code id:b45228b7 tags:

 ``` python
 iters = 1000

 start_time = time.time()
 for i in range(iters):
    check_output("pwd")
 end_time = time.time()

 print((end_time-start_time) * 1e3 / iters)
 ```

 %% Output

-    1.6988284587860107
+    1.8102648258209229
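The averaging loop above can be wrapped in a reusable helper. The sketch below is one possible way to do that, not the lecture's code: `average_ms` is a hypothetical name, and it swaps `time.time` for `time.perf_counter`, a standard-library clock intended for measuring short durations. The workload being timed is a stand-in so that the example runs anywhere.

```python
import time

def average_ms(func, iters=1000):
    """Return the average time of one call to func, in milliseconds."""
    start = time.perf_counter()
    for _ in range(iters):
        func()
    end = time.perf_counter()
    return (end - start) * 1e3 / iters

# Stand-in workload; in the notebook this could be a call to check_output.
avg = average_ms(lambda: sum(range(100)))
print(avg)
```

Running many iterations and dividing by the count smooths out one-off delays, which is the same idea as the `range(iters)` loop in the cell above.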

 %% Cell type:markdown id:ff7e8971 tags:

 ### Data structures review
 - lists (sequence: ordered)
 - sets (not a sequence: not ordered):
    - indexing doesn't work, but `in` operator works
    - only stores unique values
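The differences listed above can be checked directly. This is a small illustrative sketch with made-up values; the variable names are not from the notebook.

```python
numbers_list = [11, 22, 33, 22]   # lists keep order and duplicates
numbers_set = set(numbers_list)   # sets keep only unique values

print(len(numbers_list))  # 4
print(len(numbers_set))   # 3
print(22 in numbers_set)  # True

# Indexing works on a list but raises TypeError on a set.
print(numbers_list[0])    # 11
try:
    numbers_set[0]
    indexing_failed = False
except TypeError as error:
    indexing_failed = True
    print("indexing a set failed:", error)
```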

 %% Cell type:code id:10a09035 tags:

 ``` python
 # TODO: create a simple list of integers
 some_numbers = [11, 22, 33]
 some_numbers
 ```

 %% Output

    [11, 22, 33]

 %% Cell type:code id:553b126b tags:

 ``` python
 # TODO: use range() to produce a list containing 1000000 numbers
 some_numbers = list(range(1000000))
 ```

 %% Cell type:markdown id:a73c753a tags:

 `in` operator: membership test (does the value exist in the collection?)

 %% Cell type:code id:e927c0b7 tags:

 ``` python
 100 in some_numbers
 ```

 %% Output

    True

 %% Cell type:code id:896176d0 tags:

 ``` python
 -20 in some_numbers
 ```

 %% Output

    False

 %% Cell type:markdown id:bc5f912b tags:

 How long does the `in` operator take? For a list, it depends on the position of the item we are searching for, because the list is scanned from the beginning.

 %% Cell type:code id:e0eda15b tags:

 ``` python
 # TODO: time how long it takes to find 99 in some_numbers
 start_time = time.time()
 99 in some_numbers
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    0.09298324584960938
+    0.08988380432128906

 %% Cell type:code id:ae667c2f tags:

 ``` python
 # TODO: time how long it takes to find 999999 in some_numbers
 start_time = time.time()
 999999 in some_numbers
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    11.307954788208008
+    11.118888854980469

 %% Cell type:code id:e77a228d-1709-4f9a-b8ff-7c97fbda38bc tags:

 ``` python
 # TODO: time how long it takes to find -1 in some_numbers
 start_time = time.time()
 -1 in some_numbers
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    11.208295822143555
+    9.768009185791016

 %% Cell type:code id:3ba890dc tags:

 ``` python
 # TODO: create a simple set of numbers
 some_set = {11, 22, 33}
 some_set
 ```

 %% Output

    {11, 22, 33}

 %% Cell type:code id:64d7b73f-d8a9-43c7-a3a3-01e9aaa9a09f tags:

 ``` python
 # TODO: convert some_numbers into set
 some_set = set(some_numbers)
 ```

 %% Cell type:code id:a55e59ef-31e3-4468-a0db-aad1c3fc291d tags:

 ``` python
 # TODO: time how long it takes to find -1 in some_set
 start_time = time.time()
 -1 in some_set
 end_time = time.time()

 print((end_time-start_time) * 1e3)
 ```

 %% Output

-    0.06175041198730469
+    0.11181831359863281
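A single measurement of a very fast operation is noisy, so the set lookup above may not look much faster than a short list scan. As a hedged sketch (using the standard-library `timeit` module, which the lecture does not use), repeating each lookup many times makes the difference between a list and a set easier to see. The collection size and repeat count are arbitrary choices for illustration.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
repeats = 100

# -1 is not present, so the list must be scanned from start to end each time.
list_total = timeit.timeit(lambda: -1 in as_list, number=repeats)
set_total = timeit.timeit(lambda: -1 in as_set, number=repeats)

print(list_total * 1e3 / repeats, "ms per list lookup")
print(set_total * 1e3 / repeats, "ms per set lookup")
```

The exact numbers vary by machine, so no expected output is shown here.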

--- a/lecture_material/04-performance2/solution.ipynb
+++ b/lecture_material/04-performance2/solution.ipynb
+%% Cell type:markdown id:d617eefb tags:
+
+# Performance 2
+
+%% Cell type:code id:783117c5-146f-454a-963e-ed2873b8a6d3 tags:
+
+``` python
+# known import statements
+import pandas as pd
+import csv
+from subprocess import check_output
+
+# new import statements
+import zipfile
+from io import TextIOWrapper
+```
+
+%% Cell type:markdown id:4e2be82d tags:
+
+### Let's take a look at the files inside the current working directory.
+
+%% Cell type:code id:4eaa8a8d tags:
+
+``` python
+str(check_output(["ls", "-lh"]), encoding="utf-8").split("\n")
+```
+
+%% Output
+
+    ['total 21M',
+     '-rw-rw-r-- 1 gurmail.singh gurmail.singh 2.0K Jan 30 20:49 lec2.ipynb',
+     '-rw-rw-r-- 1 gurmail.singh gurmail.singh 5.2K Feb  1 13:08 lecture.ipynb',
+     '-rw------- 1 gurmail.singh gurmail.singh 230K Feb  1 13:09 nohup.out',
+     'drwxrwxr-x 3 gurmail.singh gurmail.singh 4.0K Jan 30 20:42 paper',
+     '-rw-rw-r-- 1 gurmail.singh gurmail.singh   39 Jan 25 18:32 paper1.txt',
+     'drwxrwxr-x 8 gurmail.singh gurmail.singh 4.0K Jan 30 14:06 s24',
+     'drwx------ 3 gurmail.singh gurmail.singh 4.0K Jan 30 12:31 snap',
+     '-rw-rw-r-- 1 gurmail.singh gurmail.singh  21M Feb  1 12:44 wi.zip',
+     '']
+
+%% Cell type:markdown id:b8c7dc7f tags:
+
+### Let's `unzip` "wi.zip".
+
+%% Cell type:code id:ed32cf4c tags:
+
+``` python
+check_output(["unzip", "wi.zip"])
+```
+
+%% Output
+
+    b'Archive:  wi.zip\n  inflating: wi.csv                  \n'
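The import cell brings in `zipfile` and `TextIOWrapper`, which suggests reading a CSV from inside the archive without extracting it first. The sketch below shows one way this could work, under stated assumptions: it builds a tiny zip file in memory, and `demo.csv` and its two rows are made-up stand-ins rather than the contents of "wi.zip".

```python
import csv
import io
import zipfile
from io import TextIOWrapper

# Build a tiny zip archive in memory so the example is self-contained.
# "demo.csv" and its rows are hypothetical sample data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("demo.csv", "state_code,loan_amount\nWI,100\nWI,250\n")

buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    # zf.open gives a binary file object; TextIOWrapper decodes it to text.
    with zf.open("demo.csv") as binary_file:
        text_file = TextIOWrapper(binary_file, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text_file))

print(rows)
```

With a real archive on disk, the same pattern would apply by passing the zip file's path to `zipfile.ZipFile` instead of the in-memory buffer.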
+
+%% Cell type:markdown id:4eac1b48 tags:
+
+### Let's take a look at the files inside the current working directory.
+
+%% Cell type:code id:a6852e43 tags:
+
+``` python
+str(check_output(["ls", "-lh"]), encoding="utf-8").split("\n")
+```
+
+%% Output
+
+    ['total 198M',
+     '-rw-rw-r-- 1 gurmail.singh gurmail.singh 2.0K Jan 30 20:49 lec2.ipynb',
+     '-rw-rw-r-- 1 gurmail.singh gurmail.singh 5.2K Feb  1 13:08 lecture.ipynb',
+     '-rw------- 1 gurmail.singh gurmail.singh 230K Feb  1 13:09 nohup.out',
+     'drwxrwxr-x 3 gurmail.singh gurmail.singh 4.0K Jan 30 20:42 paper',
+     '-rw-rw-r-- 1 gurmail.singh gurmail.singh   39 Jan 25 18:32 paper1.txt',
+     'drwxrwxr-x 8 gurmail.singh gurmail.singh 4.0K Jan 30 14:06 s24',
+     'drwx------ 3 gurmail.singh gurmail.singh 4.0K Jan 30 12:31 snap',
+     '-rw-rw-r-- 1 gurmail.singh gurmail.singh 177M Jan 14  2022 wi.csv',
+     '-rw-rw-r-- 1 gurmail.singh gurmail.singh  21M Feb  1 12:44 wi.zip',
+     '']
+
+%% Cell type:markdown id:8ba94151 tags:
+
+### Traditional way of reading data using pandas
+
+%% Cell type:code id:529a4bd2 tags:
+
+``` python
+df = pd.read_csv("wi.csv")
+```
+
+%% Output
+
+    /tmp/ipykernel_36341/3756477020.py:1: DtypeWarning: Columns (22,23,24,26,27,28,29,30,31,32,33,38,43,44) have mixed types. Specify dtype option on import or set low_memory=False.
+      df = pd.read_csv("wi.csv")
+
+%% Cell type:code id:570485b8 tags:
+
+``` python
+df.head(5) # Top 5 rows within the DataFrame
+```
+
+%% Output
+
+       activity_year                   lei  derived_msa-md state_code  \
+    0           2020  549300FX7K8PTEQUU487           31540         WI
+    1           2020  549300FX7K8PTEQUU487           99999         WI
+    2           2020  549300FX7K8PTEQUU487           99999         WI
+    3           2020  549300FX7K8PTEQUU487           99999         WI
+    4           2020  549300FX7K8PTEQUU487           33460         WI
+    
+       county_code  census_tract conforming_loan_limit  \
+    0      55025.0  5.502500e+10                     C
+    1      55013.0  5.501397e+10                     C
+    2      55127.0  5.512700e+10                     C
+    3      55127.0  5.512700e+10                     C
+    4      55109.0  5.510912e+10                     C
+    
+           derived_loan_product_type             derived_dwelling_category  \
+    0        Conventional:First Lien  Single Family (1-4 Units):Site-Built
+    1        Conventional:First Lien  Single Family (1-4 Units):Site-Built
+    2                  VA:First Lien  Single Family (1-4 Units):Site-Built
+    3  Conventional:Subordinate Lien  Single Family (1-4 Units):Site-Built
+    4                  VA:First Lien  Single Family (1-4 Units):Site-Built
+    
+             derived_ethnicity  ... denial_reason-2 denial_reason-3  \
+    0   Not Hispanic or Latino  ...             NaN             NaN
+    1   Not Hispanic or Latino  ...             NaN             NaN
+    2   Not Hispanic or Latino  ...             NaN             NaN
+    3  Ethnicity Not Available  ...             NaN             NaN
+    4   Not Hispanic or Latino  ...             NaN             NaN
+    
+       denial_reason-4  tract_population  tract_minority_population_percent  \
+    0              NaN              3572                              41.15
+    1              NaN              2333                               9.90
+    2              NaN              5943                              13.26
+    3              NaN              5650                               7.63
+    4              NaN              7210                               4.36
+    
+       ffiec_msa_md_median_family_income  tract_to_msa_income_percentage  \
+    0                              96600                              64
+    1                              68000                              87
+    2                              68000                             104
+    3                              68000                             124
+    4                              97300                              96
+    
+       tract_owner_occupied_units  tract_one_to_four_family_homes  \
+    0                         812                             910
+    1                        1000                            2717
+    2                        1394                            1856
+    3                        1712                            2104
+    4                        2101                            2566
+    
+       tract_median_age_of_housing_units
+    0                                 45
+    1                                 34
+    2                                 44
+    3                                 36
+    4                                 22
+    
+    [5 rows x 99 columns]
+
+%% Cell type:markdown id:bad7dce4 tags:
+
+### How can we see all the column names?
+
+%% Cell type:code id:d0a98751 tags:
+
+``` python
+df.columns
+```
+
+%% Output
+
+    Index(['activity_year', 'lei', 'derived_msa-md', 'state_code', 'county_code',
+           'census_tract', 'conforming_loan_limit', 'derived_loan_product_type',
+           'derived_dwelling_category', 'derived_ethnicity', 'derived_race',
+           'derived_sex', 'action_taken', 'purchaser_type', 'preapproval',
+           'loan_type', 'loan_purpose', 'lien_status', 'reverse_mortgage',
+           'open-end_line_of_credit', 'business_or_commercial_purpose',
+           'loan_amount', 'loan_to_value_ratio', 'interest_rate', 'rate_spread',
+           'hoepa_status', 'total_loan_costs', 'total_points_and_fees',
+           'origination_charges', 'discount_points', 'lender_credits', 'loan_term',
+           'prepayment_penalty_term', 'intro_rate_period', 'negative_amortization',
+           'interest_only_payment', 'balloon_payment',
+           'other_nonamortizing_features', 'property_value', 'construction_method',
+           'occupancy_type', 'manufactured_home_secured_property_type',
+           'manufactured_home_land_property_interest', 'total_units',
+           'multifamily_affordable_units', 'income', 'debt_to_income_ratio',
+           'applicant_credit_score_type', 'co-applicant_credit_score_type',
+           'applicant_ethnicity-1', 'applicant_ethnicity-2',
+           'applicant_ethnicity-3', 'applicant_ethnicity-4',
+           'applicant_ethnicity-5', 'co-applicant_ethnicity-1',
+           'co-applicant_ethnicity-2', 'co-applicant_ethnicity-3',
+           'co-applicant_ethnicity-4', 'co-applicant_ethnicity-5',
+           'applicant_ethnicity_observed', 'co-applicant_ethnicity_observed',
+           'applicant_race-1', 'applicant_race-2', 'applicant_race-3',
+           'applicant_race-4', 'applicant_race-5', 'co-applicant_race-1',
+           'co-applicant_race-2', 'co-applicant_race-3', 'co-applicant_race-4',
+           'co-applicant_race-5', 'applicant_race_observed',
+           'co-applicant_race_observed', 'applicant_sex', 'co-applicant_sex',
+           'applicant_sex_observed', 'co-applicant_sex_observed', 'applicant_age',
+           'co-applicant_age', 'applicant_age_above_62',
+           'co-applicant_age_above_62', 'submission_of_application',
+           'initially_payable_to_institution', 'aus-1', 'aus-2', 'aus-3', 'aus-4',
+           'aus-5', 'denial_reason-1', 'denial_reason-2', 'denial_reason-3',
+           'denial_reason-4', 'tract_population',
+           'tract_minority_population_percent',
+           'ffiec_msa_md_median_family_income', 'tract_to_msa_income_percentage',
+           'tract_owner_occupied_units', 'tract_one_to_four_family_homes',
+           'tract_median_age_of_housing_units'],
+          dtype='object')
--- a/lecture_material/04-performance2/template_lec_001.ipynb
+++ b/lecture_material/04-performance2/template_lec_001.ipynb
+%% Cell type:markdown id:1a6cc54c tags:
+
+# Performance 2
+
+%% Cell type:code id:783117c5-146f-454a-963e-ed2873b8a6d3 tags:
+
+``` python
+# known import statements
+import pandas as pd
+import csv
+from subprocess import check_output
+
+# new import statements
+import zipfile
+from io import TextIOWrapper
+```
+
+%% Cell type:markdown id:66db2ad0 tags:
+
+### Let's take a look at the files inside the current working directory.
+
+%% Cell type:code id:6cef713e tags:
+
+``` python
+str(check_output(["ls", "-lh"]), encoding="utf-8").split("\n")
+```
+
+%% Cell type:markdown id:c76f819d tags:
+
+### Let's `unzip` "wi.zip".
+
+%% Cell type:code id:0e87ec01 tags:
+
+``` python
+check_output(["unzip", "wi.zip"])
+```
+
+%% Cell type:markdown id:274fa49a tags:
+
+### Let's take a look at the files inside the current working directory.
+
+%% Cell type:code id:a2da3cd0 tags:
+
+``` python
+str(check_output(["ls", "-lh"]), encoding="utf-8").split("\n")
+```
+
+%% Cell type:markdown id:90b11343 tags:
+
+### Traditional way of reading data using pandas
+
+%% Cell type:code id:a3175526 tags:
+
+``` python
+df = pd.read_csv("wi.csv")
+```
+
+%% Cell type:code id:13e6e034 tags:
+
+``` python
+df.head(5) # Top 5 rows within the DataFrame
+```
+
+%% Cell type:markdown id:5c79984c tags:
+
+### How can we see all the column names?
+
+%% Cell type:code id:08d9501d tags:
+
+``` python
+df.columns
+```