From dd1632af4bf264336c6a60e0b6d65faff185ee12 Mon Sep 17 00:00:00 2001
From: msyamkumar <msyamkumar@wisc.edu>
Date: Wed, 16 Nov 2022 09:22:07 -0600
Subject: [PATCH] New lec 29 notebooks

---
 .../lec_28_pandas2-checkpoint.ipynb | 5626 -----------------
 .../lec_28_pandas2_template-checkpoint.ipynb | 1208 ----
 .../demo_lec_28-checkpoint.ipynb | 1038 ---
 .../demo_lec_28_template-checkpoint.ipynb | 1151 ----
 .../lec_29_web1-checkpoint.ipynb | 3255 ----------
 .../lec_29_web1_template-checkpoint.ipynb | 767 ---
 .../pandas1-checkpoint.ipynb | 1736 -----
 .../pandas_1_worksheet-checkpoint.ipynb | 2042 ------
 f22/meena_lec_notes/lec-29/lec_29_web1.ipynb | 2124 ++-----
 .../lec-29/lec_29_web1_template.ipynb | 284 +-
 f22/meena_lec_notes/lec-29/my_course_data.csv | 18 -
 11 files changed, 705 insertions(+), 18544 deletions(-)
 delete mode 100644 f22/meena_lec_notes/lec-28/.ipynb_checkpoints/lec_28_pandas2-checkpoint.ipynb
 delete mode 100644 f22/meena_lec_notes/lec-28/.ipynb_checkpoints/lec_28_pandas2_template-checkpoint.ipynb
 delete mode 100644 f22/meena_lec_notes/lec-29/.ipynb_checkpoints/demo_lec_28-checkpoint.ipynb
 delete mode 100644 f22/meena_lec_notes/lec-29/.ipynb_checkpoints/demo_lec_28_template-checkpoint.ipynb
 delete mode 100644 f22/meena_lec_notes/lec-29/.ipynb_checkpoints/lec_29_web1-checkpoint.ipynb
 delete mode 100644 f22/meena_lec_notes/lec-29/.ipynb_checkpoints/lec_29_web1_template-checkpoint.ipynb
 delete mode 100644 f22/meena_lec_notes/lec-29/.ipynb_checkpoints/pandas1-checkpoint.ipynb
 delete mode 100644 f22/meena_lec_notes/lec-29/.ipynb_checkpoints/pandas_1_worksheet-checkpoint.ipynb
 delete mode 100644 f22/meena_lec_notes/lec-29/my_course_data.csv

diff --git a/f22/meena_lec_notes/lec-28/.ipynb_checkpoints/lec_28_pandas2-checkpoint.ipynb b/f22/meena_lec_notes/lec-28/.ipynb_checkpoints/lec_28_pandas2-checkpoint.ipynb
deleted file mode 100644
index df2d96e..0000000
--- a/f22/meena_lec_notes/lec-28/.ipynb_checkpoints/lec_28_pandas2-checkpoint.ipynb
+++ /dev/null
@@ 
-1,5626 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "from pandas import Series, DataFrame\n", - "# We can explictly import Series and DataFrame, why might we do this?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Series Review\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Series from `list`" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 54\n", - "1 22\n", - "2 19\n", - "3 73\n", - "4 80\n", - "dtype: int64" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores_list = [54, 22, 19, 73, 80]\n", - "scores_series = Series(scores_list)\n", - "scores_series\n", - "\n", - "# What is the terminology for: 0, 1, 2, ... ?? A: index\n", - "# What is the terminology for: 54, 22, 19, .... ?? A: value" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Selecting certain scores.\n", - "What are all the scores `> 50`?" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 True\n", - "1 False\n", - "2 False\n", - "3 True\n", - "4 True\n", - "dtype: bool" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores_series > 50" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Answer:** Boolean indexing. Try the following..." 
- ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 54\n", - "1 22\n", - "4 80\n", - "dtype: int64" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores_series[[True, True, False, False, True]] # often called a \"mask\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We are really writing a \"mask\" for our data." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 54\n", - "3 73\n", - "4 80\n", - "dtype: int64" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores_series[scores_series > 50]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Series from `dict`" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Rita 5\n", - "Therese 3\n", - "Janice 6\n", - "dtype: int64\n", - "Rita 3\n", - "Therese 7\n", - "Janice 4\n", - "dtype: int64\n", - "Therese 5\n", - "Janice 5\n", - "Rita 8\n", - "dtype: int64\n" - ] - } - ], - "source": [ - "# Imagine we hire students and track their weekly hours\n", - "week1 = Series({\"Rita\": 5, \"Therese\": 3, \"Janice\": 6})\n", - "week2 = Series({\"Rita\": 3, \"Therese\": 7, \"Janice\": 4})\n", - "week3 = Series({\"Therese\": 5, \"Janice\": 5, \"Rita\": 8}) # Wrong order! 
Will this matter?\n", - "print(week1)\n", - "print(week2)\n", - "print(week3)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### For everyone in Week 1, add 3 to their hours " - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Rita 8\n", - "Therese 6\n", - "Janice 9\n", - "dtype: int64" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "week1 = week1 + 3\n", - "week1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Total up everyone's hours" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Janice 18\n", - "Rita 19\n", - "Therese 18\n", - "dtype: int64" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "total_hours = week1 + week2 + week3\n", - "total_hours" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### What is week1 / week3 ?" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Janice 1.8\n", - "Rita 1.0\n", - "Therese 1.2\n", - "dtype: float64" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "week1 / week3\n", - "# Notice that we didn't have to worry about the order of indices" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### What type of values are stored in week1 > week2?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Rita 8\n", - "Therese 6\n", - "Janice 9\n", - "dtype: int64\n", - "Rita 3\n", - "Therese 7\n", - "Janice 4\n", - "dtype: int64\n" - ] - }, - { - "data": { - "text/plain": [ - "Rita True\n", - "Therese False\n", - "Janice True\n", - "dtype: bool" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "print(week1)\n", - "print(week2)\n", - "week1 > week2 \n", - "# Notice that indices are ordered the same" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### What is week1 > week3?" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Rita 8\n", - "Therese 6\n", - "Janice 9\n", - "dtype: int64\n", - "Therese 5\n", - "Janice 5\n", - "Rita 8\n", - "dtype: int64\n" - ] - }, - { - "data": { - "text/plain": [ - "Janice True\n", - "Rita False\n", - "Therese True\n", - "dtype: bool" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "print(week1)\n", - "print(week3)\n", - "# week1 > week3 # Does not work (ValueError) because indices are not in same order\n", - "\n", - "# How can we fix this?\n", - "week1.sort_index() > week3.sort_index()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "# Lecture 28: Pandas 2 - DataFrames\n", - "\n", - "\n", - "Learning Objectives:\n", - "- Create a DataFrame from \n", - " - a dictionary of Series, lists, or dicts\n", - " - a list of Series, lists, dicts\n", - "- Select a column, row, cell, or rectangular region of a DataFrame\n", - "- Convert CSV files into DataFrames and DataFrames into CSV Files\n", - "- Access the head or tail of a DataFrame" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - 
"source": [ - "**Big Idea**: Data Frames store 2-dimensional data in tables! It is a collection of Series." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## You can create a DataFrame in a variety of ways!\n", - "\n", - "- dictionary of Series\n", - "- dictionary of lists\n", - "- dictionary of dictionaries\n", - "- list of dictionarines\n", - "- list of lists\n", - "\n", - "### From a dictionary of Series" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Alice</td>\n", - " <td>6</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Bob</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Cindy</td>\n", - " <td>8</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Dan</td>\n", - " <td>9</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "0 Alice 6\n", - "1 Bob 7\n", - "2 Cindy 8\n", - "3 Dan 9" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "names = Series([\"Alice\", \"Bob\", \"Cindy\", \"Dan\"])\n", - "scores = Series([6, 7, 8, 9])\n", - "\n", - "# to make a dictionary of Series, need to write column names for the keys\n", - "DataFrame({\n", - " \"Player name\": names,\n", - " \"Score\": scores\n", - "})" - ] - }, - { - "cell_type": 
"markdown", - "metadata": {}, - "source": [ - "### From a dictionary of lists" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Alice</td>\n", - " <td>6</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Bob</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Cindy</td>\n", - " <td>8</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Dan</td>\n", - " <td>9</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "0 Alice 6\n", - "1 Bob 7\n", - "2 Cindy 8\n", - "3 Dan 9" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "name_list = [\"Alice\", \"Bob\", \"Cindy\", \"Dan\"]\n", - "score_list = [6, 7, 8, 9]\n", - "\n", - "# this is the same as above, reminding us that Series act like lists\n", - "DataFrame({\n", - " \"Player name\": name_list,\n", - " \"Score\": score_list\n", - "})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### From a dictionary of dictionaries\n", - "We need to make up keys to match the things in each column" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " 
vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Alice</td>\n", - " <td>6</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Bob</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Cindy</td>\n", - " <td>8</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Dan</td>\n", - " <td>9</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "0 Alice 6\n", - "1 Bob 7\n", - "2 Cindy 8\n", - "3 Dan 9" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "data = {\n", - " \"Player name\": {0: \"Alice\", 1: \"Bob\", 2: \"Cindy\", 3: \"Dan\"},\n", - " \"Score\": {0: 6, 1: 7, 2: 8, 3: 9}\n", - "}\n", - "DataFrame(data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### From a list of dicts" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " 
<td>Alice</td>\n", - " <td>6</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Bob</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Cindy</td>\n", - " <td>8</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Dan</td>\n", - " <td>9</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "0 Alice 6\n", - "1 Bob 7\n", - "2 Cindy 8\n", - "3 Dan 9" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "data = [\n", - " {\"Player name\": \"Alice\", \"Score\": 6},\n", - " {\"Player name\": \"Bob\", \"Score\": 7},\n", - " {\"Player name\": \"Cindy\", \"Score\": 8},\n", - " {\"Player name\": \"Dan\", \"Score\": 9}\n", - "]\n", - "DataFrame(data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### From a list of lists" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>0</th>\n", - " <th>1</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Alice</td>\n", - " <td>6</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Bob</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Cindy</td>\n", - " <td>8</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Dan</td>\n", - " <td>9</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " 0 1\n", - "0 Alice 6\n", - 
"1 Bob 7\n", - "2 Cindy 8\n", - "3 Dan 9" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "data = [\n", - " [\"Alice\", 6],\n", - " [\"Bob\", 7],\n", - " [\"Cindy\", 8],\n", - " [\"Dan\", 9]\n", - "]\n", - "DataFrame(data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explicitly naming the columns\n", - "We have to add the column names, we do this with `columns = [name1, name2, ....]` " - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Alice</td>\n", - " <td>6</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Bob</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Cindy</td>\n", - " <td>8</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Dan</td>\n", - " <td>9</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "0 Alice 6\n", - "1 Bob 7\n", - "2 Cindy 8\n", - "3 Dan 9" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "data = [\n", - " [\"Alice\", 6],\n", - " [\"Bob\", 7],\n", - " [\"Cindy\", 8],\n", - " [\"Dan\", 9]\n", - "]\n", - "DataFrame(data, columns=[\"Player name\", \"Score\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 
Explicitly naming the indices\n", - "We can use `index = [name1, name2, ...]` to rename the index of each row" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>A</th>\n", - " <td>Alice</td>\n", - " <td>6</td>\n", - " </tr>\n", - " <tr>\n", - " <th>B</th>\n", - " <td>Bob</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>C</th>\n", - " <td>Cindy</td>\n", - " <td>8</td>\n", - " </tr>\n", - " <tr>\n", - " <th>D</th>\n", - " <td>Dan</td>\n", - " <td>9</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "A Alice 6\n", - "B Bob 7\n", - "C Cindy 8\n", - "D Dan 9" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "data = [\n", - " {\"Player name\": \"Alice\", \"Score\": 6},\n", - " {\"Player name\": \"Bob\", \"Score\": 7},\n", - " {\"Player name\": \"Cindy\", \"Score\": 8},\n", - " {\"Player name\": \"Dan\", \"Score\": 9}\n", - "]\n", - "DataFrame(data, index = [\"A\", \"B\", \"C\", \"D\"]) # must have a name for each row" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explicitly naming the columns" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type 
{\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Name</th>\n", - " <th>Age</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>A</th>\n", - " <td>Hope</td>\n", - " <td>10</td>\n", - " </tr>\n", - " <tr>\n", - " <th>B</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>C</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " <tr>\n", - " <th>D</th>\n", - " <td>Love</td>\n", - " <td>11</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Name Age\n", - "A Hope 10\n", - "B Peace 7\n", - "C Joy 4\n", - "D Love 11" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# You try: \n", - "# Make a DataFrame of 4 people you know with different ages\n", - "# Give names to both the columns and rows\n", - "ages = [\n", - " [\"Hope\", 10],\n", - " [\"Peace\", 7],\n", - " [\"Joy\", 4],\n", - " [\"Love\", 11]\n", - "]\n", - "DataFrame(ages, index = [\"A\", \"B\", \"C\", \"D\"], columns = [\"Name\", \"Age\"])\n", - "\n", - "# Share how you did with this with your neighbor\n", - "# If you both did it the same way, try it a different way." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Select a column, row, cell, or rectangular region of a DataFrame\n", - "### Data lookup: Series\n", - "- `s.loc[X]` <- lookup by pandas index\n", - "- `s.iloc[X]` <- lookup by integer position" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Alice 6\n", - "Bob 7\n", - "Cindy 8\n", - "Dan 9\n", - "dtype: int64" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "hours = Series({\"Alice\":6, \"Bob\":7, \"Cindy\":8, \"Dan\":9})\n", - "hours" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Lookup Bob's hours by pandas index.\n", - "hours.loc[\"Bob\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "8" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Lookup Bob's hours by integer position.\n", - "hours.iloc[2]" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "8" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Lookup Cindy's hours by pandas index.\n", - "hours.loc[\"Cindy\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Data lookup: DataFrame\n", - "\n", - "\n", - "- `d.loc[r]` lookup ROW by pandas ROW index\n", - "- `d.iloc[r]` lookup ROW by ROW integer position\n", - "- `d[c]` lookup COL by pandas COL index\n", - "- `d.loc[r, c]` lookup by pandas ROW index and pandas COL index\n", - "- `d.iloc[r, c]` lookup by ROW integer position 
and COL integer position" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>10</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Love</td>\n", - " <td>11</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 10\n", - "P Peace 7\n", - "J Joy 4\n", - "L Love 11" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# We often call the object that we make df\n", - "data = [\n", - " [\"Hope\", 10],\n", - " [\"Peace\", 7],\n", - " [\"Joy\", 4],\n", - " [\"Love\", 11]\n", - "]\n", - "df = DataFrame(data, index = [\"H\", \"P\", \"J\", \"L\"], columns = [\"Player name\", \"Score\"])\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are 3 different ways of accessing row L? 
" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Player name Love\n", - "Score 11\n", - "Name: L, dtype: object\n", - "Player name Love\n", - "Score 11\n", - "Name: L, dtype: object\n", - "Player name Love\n", - "Score 11\n", - "Name: L, dtype: object\n" - ] - } - ], - "source": [ - "#df[\"L\"] # Nope!\n", - "print(df.loc[\"L\"])\n", - "print(df.iloc[3])\n", - "print(df.iloc[-1])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### How about accessing a column?" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>10</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Love</td>\n", - " <td>11</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 10\n", - "P Peace 7\n", - "J Joy 4\n", - "L Love 11" - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "H Hope\n", - "P Peace\n", - "J Joy\n", - "L Love\n", - "Name: Player name, dtype: object\n" - ] - } - ], - "source": [ - "print(df[\"Player name\"])\n", - "#df[0] # Doesn't work!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are 3 different ways to access a single cell?" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>10</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Love</td>\n", - " <td>11</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 10\n", - "P Peace 7\n", - "J Joy 4\n", - "L Love 11" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Joy\n", - "Joy\n", - "Joy\n" - ] - } - ], - "source": [ - "# How to access Cindy?\n", - "#print(df[\"C\", \"Player name\"]) # Nope!\n", - "print(df.loc[\"J\", \"Player name\"])\n", - 
"print(df[\"Player name\"].loc[\"J\"])\n", - "print(df.iloc[2, 0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How to set values for a specific entry?\n", - "\n", - "- `d.loc[r, c] = new_val`\n", - "- `d.iloc[r, c] = new_val`" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>10</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Luisa</td>\n", - " <td>11</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 10\n", - "P Peace 7\n", - "J Joy 4\n", - "L Luisa 11" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#change player D's name\n", - "df.loc[\"L\", \"Player name\"] = \"Luisa\"\n", - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: 
right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>10</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Luisa</td>\n", - " <td>14</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 10\n", - "P Peace 7\n", - "J Joy 4\n", - "L Luisa 14" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# then add 3 to that player's score using .loc\n", - "df.loc[\"L\",\"Score\"] += 3\n", - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>17</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Luisa</td>\n", - " <td>14</td>\n", - " </tr>\n", - " </tbody>\n", - 
"</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 17\n", - "P Peace 7\n", - "J Joy 4\n", - "L Luisa 14" - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# add 7 to a different player's score using .iloc\n", - "df.iloc[0, 1] += 7\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find the max score and the mean score" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "17 10.5\n" - ] - } - ], - "source": [ - "# find the max and mean of the \"Score\" column\n", - "print(df[\"Score\"].max(), df[\"Score\"].mean())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find the highest scoring player" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>17</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Luisa</td>\n", - " <td>14</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 17\n", - "P Peace 7\n", - "J 
Joy 4\n", - "L Luisa 14" - ] - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'Hope'" - ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "highest_scorer = df[\"Score\"].idxmax()\n", - "df[\"Player name\"].loc[highest_scorer]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Slicing a DataFrame\n", - "\n", - "- `df.iloc[ROW_SLICE, COL_SLICE]` <- make a rectangular slice from the DataFrame using integer positions\n", - "- `df.loc[ROW_SLICE, COL_SLICE]` <- make a rectangular slice from the DataFrame using index" - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "P Peace 7\n", - "J Joy 4" - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.iloc[1:3, 0:2]" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - 
"<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "P Peace 7\n", - "J Joy 4" - ] - }, - "execution_count": 37, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.loc[\"P\":\"J\", \"Player name\":\"Score\"] # notice that this way is inclusive of endpoints" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set values for sliced DataFrame\n", - "\n", - "- `d.loc[ROW_SLICE, COL_SLICE] = new_val` <- set value by ROW INDEX and COL INDEX\n", - "- `d.iloc[ROW_SLICE, COL_SLICE] = new_val` <- set value by ROW Integer position and COL Integer position" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - 
" <th>H</th>\n", - " <td>Hope</td>\n", - " <td>17</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>7</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>4</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Luisa</td>\n", - " <td>14</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 17\n", - "P Peace 7\n", - "J Joy 4\n", - "L Luisa 14" - ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>17</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>12</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>9</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Luisa</td>\n", - " <td>14</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 17\n", - "P Peace 12\n", - "J Joy 9\n", - "L Luisa 14" - ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.loc[\"P\":\"J\", \"Score\"] += 5\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - 
"### Pandas allows slicing of non-contiguous columns" - ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "P Peace\n", - "L Luisa\n", - "Name: Player name, dtype: object" - ] - }, - "execution_count": 40, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# just get Player name for Index B and D\n", - "df.loc[[\"P\", \"L\"],\"Player name\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>17</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>14</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>9</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Luisa</td>\n", - " <td>16</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 17\n", - "P Peace 14\n", - "J Joy 9\n", - "L Luisa 16" - ] - }, - "execution_count": 41, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# add 2 to the people in rows B and D\n", - "df.loc[[\"P\", \"L\"],\"Score\"] += 2\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Boolean indexing on a DataFrame\n", - "\n", - "- `d[BOOL SERIES]` <- makes a new DF of all rows that lined up 
were True" - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>17</td>\n", - " </tr>\n", - " <tr>\n", - " <th>P</th>\n", - " <td>Peace</td>\n", - " <td>14</td>\n", - " </tr>\n", - " <tr>\n", - " <th>J</th>\n", - " <td>Joy</td>\n", - " <td>9</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Luisa</td>\n", - " <td>16</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 17\n", - "P Peace 14\n", - "J Joy 9\n", - "L Luisa 16" - ] - }, - "execution_count": 42, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Make a Series of Booleans based on Score >= 15" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "H True\n", - "P False\n", - "J False\n", - "L True\n", - "Name: Score, dtype: bool" - ] - }, - "execution_count": 43, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "b = df[\"Score\"] >= 15\n", - "b" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### use b to slice the DataFrame\n", - "if b is true, include this row in the new df" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - 
"metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>17</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Luisa</td>\n", - " <td>16</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Player name Score\n", - "H Hope 17\n", - "L Luisa 16" - ] - }, - "execution_count": 44, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df[b]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### do the last two things in a single step" - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Player name</th>\n", - " <th>Score</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>H</th>\n", - " <td>Hope</td>\n", - " <td>17</td>\n", - " </tr>\n", - " <tr>\n", - " <th>L</th>\n", - " <td>Luisa</td>\n", - " <td>16</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - 
"text/plain": [ - " Player name Score\n", - "H Hope 17\n", - "L Luisa 16" - ] - }, - "execution_count": 45, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df[df[\"Score\"] >= 15]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Creating DataFrame from csv" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Index</th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>0</td>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " <td>333.13</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>1</td>\n", - " <td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>2</td>\n", - " <td>Split</td>\n", - " <td>Horror,Thriller</td>\n", - " <td>M. 
Night Shyamalan</td>\n", - " <td>James McAvoy, Anya Taylor-Joy, Haley Lu Richar...</td>\n", - " <td>2016</td>\n", - " <td>117</td>\n", - " <td>7.3</td>\n", - " <td>138.12M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>3</td>\n", - " <td>Sing</td>\n", - " <td>Animation,Comedy,Family</td>\n", - " <td>Christophe Lourdelet</td>\n", - " <td>Matthew McConaughey,Reese Witherspoon, Seth Ma...</td>\n", - " <td>2016</td>\n", - " <td>108</td>\n", - " <td>7.2</td>\n", - " <td>270.32</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>4</td>\n", - " <td>Suicide Squad</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>David Ayer</td>\n", - " <td>Will Smith, Jared Leto, Margot Robbie, Viola D...</td>\n", - " <td>2016</td>\n", - " <td>123</td>\n", - " <td>6.2</td>\n", - " <td>325.02</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>1063</td>\n", - " <td>Guardians of the Galaxy Vol. 
2</td>\n", - " <td>Action, Adventure, Comedy</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Zoe Saldana, Dave Bautista, Vin D...</td>\n", - " <td>2017</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>389.81</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1064</th>\n", - " <td>1064</td>\n", - " <td>Baby Driver</td>\n", - " <td>Action, Crime, Drama</td>\n", - " <td>Edgar Wright</td>\n", - " <td>Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon...</td>\n", - " <td>2017</td>\n", - " <td>113</td>\n", - " <td>7.6</td>\n", - " <td>107.83</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>1065</td>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>1066</td>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>1067</td>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>1068 rows × 9 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Index Title Genre \\\n", - "0 0 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", - "1 1 Prometheus Adventure,Mystery,Sci-Fi \n", - "2 2 Split Horror,Thriller \n", - "3 3 Sing Animation,Comedy,Family \n", - "4 4 Suicide Squad Action,Adventure,Fantasy \n", - "... ... ... ... \n", - "1063 1063 Guardians of the Galaxy Vol. 
2 Action, Adventure, Comedy \n", - "1064 1064 Baby Driver Action, Crime, Drama \n", - "1065 1065 Only the Brave Action, Biography, Drama \n", - "1066 1066 Incredibles 2 Animation, Action, Adventure \n", - "1067 1067 A Star Is Born Drama, Music, Romance \n", - "\n", - " Director Cast \\\n", - "0 James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... \n", - "1 Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael ... \n", - "2 M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... \n", - "3 Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... \n", - "4 David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... \n", - "... ... ... \n", - "1063 James Gunn Chris Pratt, Zoe Saldana, Dave Bautista, Vin D... \n", - "1064 Edgar Wright Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon... \n", - "1065 Joseph Kosinski Josh Brolin, Miles Teller, Jeff Bridges, Jenni... \n", - "1066 Brad Bird Craig T. Nelson, Holly Hunter, Sarah Vowell, H... \n", - "1067 Bradley Cooper Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... \n", - "\n", - " Year Runtime Rating Revenue \n", - "0 2014 121 8.1 333.13 \n", - "1 2012 124 7.0 126.46M \n", - "2 2016 117 7.3 138.12M \n", - "3 2016 108 7.2 270.32 \n", - "4 2016 123 6.2 325.02 \n", - "... ... ... ... ... \n", - "1063 2017 136 7.6 389.81 \n", - "1064 2017 113 7.6 107.83 \n", - "1065 2017 134 7.6 18.34 \n", - "1066 2018 118 7.6 608.58 \n", - "1067 2018 136 7.6 215.29 \n", - "\n", - "[1068 rows x 9 columns]" - ] - }, - "execution_count": 46, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# it's that easy! 
\n", - "df = pd.read_csv(\"IMDB-Movie-Data.csv\")\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### View the first few lines of the DataFrame\n", - "- `.head(n)` gets the first n lines, 5 is the default" - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Index</th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>0</td>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " <td>333.13</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>1</td>\n", - " <td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>2</td>\n", - " <td>Split</td>\n", - " <td>Horror,Thriller</td>\n", - " <td>M. 
Night Shyamalan</td>\n", - " <td>James McAvoy, Anya Taylor-Joy, Haley Lu Richar...</td>\n", - " <td>2016</td>\n", - " <td>117</td>\n", - " <td>7.3</td>\n", - " <td>138.12M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>3</td>\n", - " <td>Sing</td>\n", - " <td>Animation,Comedy,Family</td>\n", - " <td>Christophe Lourdelet</td>\n", - " <td>Matthew McConaughey,Reese Witherspoon, Seth Ma...</td>\n", - " <td>2016</td>\n", - " <td>108</td>\n", - " <td>7.2</td>\n", - " <td>270.32</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>4</td>\n", - " <td>Suicide Squad</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>David Ayer</td>\n", - " <td>Will Smith, Jared Leto, Margot Robbie, Viola D...</td>\n", - " <td>2016</td>\n", - " <td>123</td>\n", - " <td>6.2</td>\n", - " <td>325.02</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Index Title Genre \\\n", - "0 0 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", - "1 1 Prometheus Adventure,Mystery,Sci-Fi \n", - "2 2 Split Horror,Thriller \n", - "3 3 Sing Animation,Comedy,Family \n", - "4 4 Suicide Squad Action,Adventure,Fantasy \n", - "\n", - " Director Cast \\\n", - "0 James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... \n", - "1 Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael ... \n", - "2 M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... \n", - "3 Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... \n", - "4 David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... 
\n", - "\n", - " Year Runtime Rating Revenue \n", - "0 2014 121 8.1 333.13 \n", - "1 2012 124 7.0 126.46M \n", - "2 2016 117 7.3 138.12M \n", - "3 2016 108 7.2 270.32 \n", - "4 2016 123 6.2 325.02 " - ] - }, - "execution_count": 47, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### get the first 2 rows" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Index</th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>0</td>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " <td>333.13</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>1</td>\n", - " <td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Index Title Genre Director \\\n", - "0 0 Guardians 
of the Galaxy Action,Adventure,Sci-Fi James Gunn \n", - "1 1 Prometheus Adventure,Mystery,Sci-Fi Ridley Scott \n", - "\n", - " Cast Year Runtime Rating \\\n", - "0 Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... 2014 121 8.1 \n", - "1 Noomi Rapace, Logan Marshall-Green, Michael ... 2012 124 7.0 \n", - "\n", - " Revenue \n", - "0 333.13 \n", - "1 126.46M " - ] - }, - "execution_count": 48, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.head(2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### View the first few lines of the DataFrame\n", - "- `.tail(n)` gets the last n lines, 5 is the default" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Index</th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>1063</td>\n", - " <td>Guardians of the Galaxy Vol. 
2</td>\n", - " <td>Action, Adventure, Comedy</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Zoe Saldana, Dave Bautista, Vin D...</td>\n", - " <td>2017</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>389.81</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1064</th>\n", - " <td>1064</td>\n", - " <td>Baby Driver</td>\n", - " <td>Action, Crime, Drama</td>\n", - " <td>Edgar Wright</td>\n", - " <td>Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon...</td>\n", - " <td>2017</td>\n", - " <td>113</td>\n", - " <td>7.6</td>\n", - " <td>107.83</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>1065</td>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>1066</td>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>1067</td>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Index Title Genre \\\n", - "1063 1063 Guardians of the Galaxy Vol. 
2 Action, Adventure, Comedy \n", - "1064 1064 Baby Driver Action, Crime, Drama \n", - "1065 1065 Only the Brave Action, Biography, Drama \n", - "1066 1066 Incredibles 2 Animation, Action, Adventure \n", - "1067 1067 A Star Is Born Drama, Music, Romance \n", - "\n", - " Director Cast \\\n", - "1063 James Gunn Chris Pratt, Zoe Saldana, Dave Bautista, Vin D... \n", - "1064 Edgar Wright Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon... \n", - "1065 Joseph Kosinski Josh Brolin, Miles Teller, Jeff Bridges, Jenni... \n", - "1066 Brad Bird Craig T. Nelson, Holly Hunter, Sarah Vowell, H... \n", - "1067 Bradley Cooper Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... \n", - "\n", - " Year Runtime Rating Revenue \n", - "1063 2017 136 7.6 389.81 \n", - "1064 2017 113 7.6 107.83 \n", - "1065 2017 134 7.6 18.34 \n", - "1066 2018 118 7.6 608.58 \n", - "1067 2018 136 7.6 215.29 " - ] - }, - "execution_count": 49, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.tail()" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Index</th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>1065</td>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles 
Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>1066</td>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>1067</td>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Index Title Genre Director \\\n", - "1065 1065 Only the Brave Action, Biography, Drama Joseph Kosinski \n", - "1066 1066 Incredibles 2 Animation, Action, Adventure Brad Bird \n", - "1067 1067 A Star Is Born Drama, Music, Romance Bradley Cooper \n", - "\n", - " Cast Year Runtime \\\n", - "1065 Josh Brolin, Miles Teller, Jeff Bridges, Jenni... 2017 134 \n", - "1066 Craig T. Nelson, Holly Hunter, Sarah Vowell, H... 2018 118 \n", - "1067 Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... 2018 136 \n", - "\n", - " Rating Revenue \n", - "1065 7.6 18.34 \n", - "1066 7.6 608.58 \n", - "1067 7.6 215.29 " - ] - }, - "execution_count": 50, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.tail(3)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are the first and the last years in our dataset?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "First year: 2006, Last year: 2020\n" - ] - } - ], - "source": [ - "print(\"First year: {}, Last year: {}\".format(df[\"Year\"].min(), df[\"Year\"].max()))" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Index</th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>114</th>\n", - " <td>114</td>\n", - " <td>Harry Potter and the Deathly Hallows: Part 2</td>\n", - " <td>Adventure,Drama,Fantasy</td>\n", - " <td>David Yates</td>\n", - " <td>Daniel Radcliffe, Emma Watson, Rupert Grint, M...</td>\n", - " <td>2011</td>\n", - " <td>130</td>\n", - " <td>8.1</td>\n", - " <td>380.96</td>\n", - " </tr>\n", - " <tr>\n", - " <th>314</th>\n", - " <td>314</td>\n", - " <td>Harry Potter and the Order of the Phoenix</td>\n", - " <td>Adventure,Family,Fantasy</td>\n", - " <td>David Yates</td>\n", - " <td>Daniel Radcliffe, Emma Watson, Rupert Grint, B...</td>\n", - " <td>2007</td>\n", - " <td>138</td>\n", - " <td>7.5</td>\n", - " <td>292</td>\n", - " </tr>\n", - " <tr>\n", - " <th>417</th>\n", - " <td>417</td>\n", - " <td>Harry Potter and the Deathly Hallows: Part 1</td>\n", - " <td>Adventure,Family,Fantasy</td>\n", - " 
<td>David Yates</td>\n", - " <td>Daniel Radcliffe, Emma Watson, Rupert Grint, B...</td>\n", - " <td>2010</td>\n", - " <td>146</td>\n", - " <td>7.7</td>\n", - " <td>294.98</td>\n", - " </tr>\n", - " <tr>\n", - " <th>472</th>\n", - " <td>472</td>\n", - " <td>Harry Potter and the Half-Blood Prince</td>\n", - " <td>Adventure,Family,Fantasy</td>\n", - " <td>David Yates</td>\n", - " <td>Daniel Radcliffe, Emma Watson, Rupert Grint, M...</td>\n", - " <td>2009</td>\n", - " <td>153</td>\n", - " <td>7.5</td>\n", - " <td>301.96</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Index Title \\\n", - "114 114 Harry Potter and the Deathly Hallows: Part 2 \n", - "314 314 Harry Potter and the Order of the Phoenix \n", - "417 417 Harry Potter and the Deathly Hallows: Part 1 \n", - "472 472 Harry Potter and the Half-Blood Prince \n", - "\n", - " Genre Director \\\n", - "114 Adventure,Drama,Fantasy David Yates \n", - "314 Adventure,Family,Fantasy David Yates \n", - "417 Adventure,Family,Fantasy David Yates \n", - "472 Adventure,Family,Fantasy David Yates \n", - "\n", - " Cast Year Runtime Rating \\\n", - "114 Daniel Radcliffe, Emma Watson, Rupert Grint, M... 2011 130 8.1 \n", - "314 Daniel Radcliffe, Emma Watson, Rupert Grint, B... 2007 138 7.5 \n", - "417 Daniel Radcliffe, Emma Watson, Rupert Grint, B... 2010 146 7.7 \n", - "472 Daniel Radcliffe, Emma Watson, Rupert Grint, M... 2009 153 7.5 \n", - "\n", - " Revenue \n", - "114 380.96 \n", - "314 292 \n", - "417 294.98 \n", - "472 301.96 " - ] - }, - "execution_count": 52, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "### What are the rows that correspond to movies whose title contains \"Harry\" ? \n", - "df[df[\"Title\"].str.contains(\"Harry\")]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What is the movie at index 6 ? 
" - ] - }, - { - "cell_type": "code", - "execution_count": 53, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Index 6\n", - "Title La La Land\n", - "Genre Comedy,Drama,Music\n", - "Director Damien Chazelle\n", - "Cast Ryan Gosling, Emma Stone, Rosemarie DeWitt, J....\n", - "Year 2016\n", - "Runtime 128\n", - "Rating 8.3\n", - "Revenue 151.06M\n", - "Name: 6, dtype: object" - ] - }, - "execution_count": 53, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.iloc[6]" - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Index</th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>0</td>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " <td>333.13</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>1</td>\n", - " <td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " </tr>\n", - " <tr>\n", - " 
<th>2</th>\n", - " <td>2</td>\n", - " <td>Split</td>\n", - " <td>Horror,Thriller</td>\n", - " <td>M. Night Shyamalan</td>\n", - " <td>James McAvoy, Anya Taylor-Joy, Haley Lu Richar...</td>\n", - " <td>2016</td>\n", - " <td>117</td>\n", - " <td>7.3</td>\n", - " <td>138.12M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>3</td>\n", - " <td>Sing</td>\n", - " <td>Animation,Comedy,Family</td>\n", - " <td>Christophe Lourdelet</td>\n", - " <td>Matthew McConaughey,Reese Witherspoon, Seth Ma...</td>\n", - " <td>2016</td>\n", - " <td>108</td>\n", - " <td>7.2</td>\n", - " <td>270.32</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>4</td>\n", - " <td>Suicide Squad</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>David Ayer</td>\n", - " <td>Will Smith, Jared Leto, Margot Robbie, Viola D...</td>\n", - " <td>2016</td>\n", - " <td>123</td>\n", - " <td>6.2</td>\n", - " <td>325.02</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>1063</td>\n", - " <td>Guardians of the Galaxy Vol. 
2</td>\n", - " <td>Action, Adventure, Comedy</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Zoe Saldana, Dave Bautista, Vin D...</td>\n", - " <td>2017</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>389.81</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1064</th>\n", - " <td>1064</td>\n", - " <td>Baby Driver</td>\n", - " <td>Action, Crime, Drama</td>\n", - " <td>Edgar Wright</td>\n", - " <td>Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon...</td>\n", - " <td>2017</td>\n", - " <td>113</td>\n", - " <td>7.6</td>\n", - " <td>107.83</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>1065</td>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>1066</td>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>1067</td>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>1068 rows × 9 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Index Title Genre \\\n", - "0 0 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", - "1 1 Prometheus Adventure,Mystery,Sci-Fi \n", - "2 2 Split Horror,Thriller \n", - "3 3 Sing Animation,Comedy,Family \n", - "4 4 Suicide Squad Action,Adventure,Fantasy \n", - "... ... ... ... \n", - "1063 1063 Guardians of the Galaxy Vol. 
2 Action, Adventure, Comedy \n", - "1064 1064 Baby Driver Action, Crime, Drama \n", - "1065 1065 Only the Brave Action, Biography, Drama \n", - "1066 1066 Incredibles 2 Animation, Action, Adventure \n", - "1067 1067 A Star Is Born Drama, Music, Romance \n", - "\n", - " Director Cast \\\n", - "0 James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... \n", - "1 Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael ... \n", - "2 M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... \n", - "3 Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... \n", - "4 David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... \n", - "... ... ... \n", - "1063 James Gunn Chris Pratt, Zoe Saldana, Dave Bautista, Vin D... \n", - "1064 Edgar Wright Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon... \n", - "1065 Joseph Kosinski Josh Brolin, Miles Teller, Jeff Bridges, Jenni... \n", - "1066 Brad Bird Craig T. Nelson, Holly Hunter, Sarah Vowell, H... \n", - "1067 Bradley Cooper Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... \n", - "\n", - " Year Runtime Rating Revenue \n", - "0 2014 121 8.1 333.13 \n", - "1 2012 124 7.0 126.46M \n", - "2 2016 117 7.3 138.12M \n", - "3 2016 108 7.2 270.32 \n", - "4 2016 123 6.2 325.02 \n", - "... ... ... ... ... 
\n", - "1063 2017 136 7.6 389.81 \n", - "1064 2017 113 7.6 107.83 \n", - "1065 2017 134 7.6 18.34 \n", - "1066 2018 118 7.6 608.58 \n", - "1067 2018 136 7.6 215.29 \n", - "\n", - "[1068 rows x 9 columns]" - ] - }, - "execution_count": 54, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Notice that there are two index columns\n", - "- That happened because when you write a csv from pandas to a file, it writes a new index column\n", - "- So if the dataFrame already contains an index, you are going to get two index columns\n", - "- Let's fix that problem" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### How can you use slicing to get just columns with Title and Year?" - ] - }, - { - "cell_type": "code", - "execution_count": 55, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Year</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>2014</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Prometheus</td>\n", - " <td>2012</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Split</td>\n", - " <td>2016</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Sing</td>\n", - " <td>2016</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>Suicide Squad</td>\n", - " <td>2016</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " 
<td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>Guardians of the Galaxy Vol. 2</td>\n", - " <td>2017</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1064</th>\n", - " <td>Baby Driver</td>\n", - " <td>2017</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>Only the Brave</td>\n", - " <td>2017</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>Incredibles 2</td>\n", - " <td>2018</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>A Star Is Born</td>\n", - " <td>2018</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>1068 rows × 2 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Title Year\n", - "0 Guardians of the Galaxy 2014\n", - "1 Prometheus 2012\n", - "2 Split 2016\n", - "3 Sing 2016\n", - "4 Suicide Squad 2016\n", - "... ... ...\n", - "1063 Guardians of the Galaxy Vol. 2 2017\n", - "1064 Baby Driver 2017\n", - "1065 Only the Brave 2017\n", - "1066 Incredibles 2 2018\n", - "1067 A Star Is Born 2018\n", - "\n", - "[1068 rows x 2 columns]" - ] - }, - "execution_count": 55, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df2 = df[[\"Title\", \"Year\"]]\n", - "df2\n", - "# notice that this does not have the 'index' column" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### How can you use slicing to get rid of the first column?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 56, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " <td>333.13</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Split</td>\n", - " <td>Horror,Thriller</td>\n", - " <td>M. 
Night Shyamalan</td>\n", - " <td>James McAvoy, Anya Taylor-Joy, Haley Lu Richar...</td>\n", - " <td>2016</td>\n", - " <td>117</td>\n", - " <td>7.3</td>\n", - " <td>138.12M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Sing</td>\n", - " <td>Animation,Comedy,Family</td>\n", - " <td>Christophe Lourdelet</td>\n", - " <td>Matthew McConaughey,Reese Witherspoon, Seth Ma...</td>\n", - " <td>2016</td>\n", - " <td>108</td>\n", - " <td>7.2</td>\n", - " <td>270.32</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>Suicide Squad</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>David Ayer</td>\n", - " <td>Will Smith, Jared Leto, Margot Robbie, Viola D...</td>\n", - " <td>2016</td>\n", - " <td>123</td>\n", - " <td>6.2</td>\n", - " <td>325.02</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>Guardians of the Galaxy Vol. 
2</td>\n", - " <td>Action, Adventure, Comedy</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Zoe Saldana, Dave Bautista, Vin D...</td>\n", - " <td>2017</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>389.81</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1064</th>\n", - " <td>Baby Driver</td>\n", - " <td>Action, Crime, Drama</td>\n", - " <td>Edgar Wright</td>\n", - " <td>Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon...</td>\n", - " <td>2017</td>\n", - " <td>113</td>\n", - " <td>7.6</td>\n", - " <td>107.83</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>1068 rows × 8 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Title Genre \\\n", - "0 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", - "1 Prometheus Adventure,Mystery,Sci-Fi \n", - "2 Split Horror,Thriller \n", - "3 Sing Animation,Comedy,Family \n", - "4 Suicide Squad Action,Adventure,Fantasy \n", - "... ... ... \n", - "1063 Guardians of the Galaxy Vol. 
2 Action, Adventure, Comedy \n", - "1064 Baby Driver Action, Crime, Drama \n", - "1065 Only the Brave Action, Biography, Drama \n", - "1066 Incredibles 2 Animation, Action, Adventure \n", - "1067 A Star Is Born Drama, Music, Romance \n", - "\n", - " Director Cast \\\n", - "0 James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... \n", - "1 Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael ... \n", - "2 M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... \n", - "3 Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... \n", - "4 David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... \n", - "... ... ... \n", - "1063 James Gunn Chris Pratt, Zoe Saldana, Dave Bautista, Vin D... \n", - "1064 Edgar Wright Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon... \n", - "1065 Joseph Kosinski Josh Brolin, Miles Teller, Jeff Bridges, Jenni... \n", - "1066 Brad Bird Craig T. Nelson, Holly Hunter, Sarah Vowell, H... \n", - "1067 Bradley Cooper Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... \n", - "\n", - " Year Runtime Rating Revenue \n", - "0 2014 121 8.1 333.13 \n", - "1 2012 124 7.0 126.46M \n", - "2 2016 117 7.3 138.12M \n", - "3 2016 108 7.2 270.32 \n", - "4 2016 123 6.2 325.02 \n", - "... ... ... ... ... 
\n", - "1063 2017 136 7.6 389.81 \n", - "1064 2017 113 7.6 107.83 \n", - "1065 2017 134 7.6 18.34 \n", - "1066 2018 118 7.6 608.58 \n", - "1067 2018 136 7.6 215.29 \n", - "\n", - "[1068 rows x 8 columns]" - ] - }, - "execution_count": 56, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = df.iloc[:, 1:] #all the rows, not column 0\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write a df to a csv file" - ] - }, - { - "cell_type": "code", - "execution_count": 57, - "metadata": {}, - "outputs": [], - "source": [ - "df.to_csv(\"better_movies.csv\", index = False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Practice on your own.....Data Analysis with Data Frames\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are all the movies that have above average run time (long movies)? " - ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " 
<td>333.13</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Split</td>\n", - " <td>Horror,Thriller</td>\n", - " <td>M. Night Shyamalan</td>\n", - " <td>James McAvoy, Anya Taylor-Joy, Haley Lu Richar...</td>\n", - " <td>2016</td>\n", - " <td>117</td>\n", - " <td>7.3</td>\n", - " <td>138.12M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>Suicide Squad</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>David Ayer</td>\n", - " <td>Will Smith, Jared Leto, Margot Robbie, Viola D...</td>\n", - " <td>2016</td>\n", - " <td>123</td>\n", - " <td>6.2</td>\n", - " <td>325.02</td>\n", - " </tr>\n", - " <tr>\n", - " <th>6</th>\n", - " <td>La La Land</td>\n", - " <td>Comedy,Drama,Music</td>\n", - " <td>Damien Chazelle</td>\n", - " <td>Ryan Gosling, Emma Stone, Rosemarie DeWitt, J....</td>\n", - " <td>2016</td>\n", - " <td>128</td>\n", - " <td>8.3</td>\n", - " <td>151.06M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1060</th>\n", - " <td>Just Mercy</td>\n", - " <td>Biography, Crime, Drama</td>\n", - " <td>Destin Daniel Cretton</td>\n", - " <td>Michael B. Jordan, Jamie Foxx, Brie Larson, Ch...</td>\n", - " <td>2019</td>\n", - " <td>137</td>\n", - " <td>7.6</td>\n", - " <td>50.4</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>Guardians of the Galaxy Vol. 
2</td>\n", - " <td>Action, Adventure, Comedy</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Zoe Saldana, Dave Bautista, Vin D...</td>\n", - " <td>2017</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>389.81</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>463 rows × 8 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Title Genre \\\n", - "0 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", - "1 Prometheus Adventure,Mystery,Sci-Fi \n", - "2 Split Horror,Thriller \n", - "4 Suicide Squad Action,Adventure,Fantasy \n", - "6 La La Land Comedy,Drama,Music \n", - "... ... ... \n", - "1060 Just Mercy Biography, Crime, Drama \n", - "1063 Guardians of the Galaxy Vol. 2 Action, Adventure, Comedy \n", - "1065 Only the Brave Action, Biography, Drama \n", - "1066 Incredibles 2 Animation, Action, Adventure \n", - "1067 A Star Is Born Drama, Music, Romance \n", - "\n", - " Director \\\n", - "0 James Gunn \n", - "1 Ridley Scott \n", - "2 M. Night Shyamalan \n", - "4 David Ayer \n", - "6 Damien Chazelle \n", - "... ... 
\n", - "1060 Destin Daniel Cretton \n", - "1063 James Gunn \n", - "1065 Joseph Kosinski \n", - "1066 Brad Bird \n", - "1067 Bradley Cooper \n", - "\n", - " Cast Year Runtime \\\n", - "0 Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... 2014 121 \n", - "1 Noomi Rapace, Logan Marshall-Green, Michael ... 2012 124 \n", - "2 James McAvoy, Anya Taylor-Joy, Haley Lu Richar... 2016 117 \n", - "4 Will Smith, Jared Leto, Margot Robbie, Viola D... 2016 123 \n", - "6 Ryan Gosling, Emma Stone, Rosemarie DeWitt, J.... 2016 128 \n", - "... ... ... ... \n", - "1060 Michael B. Jordan, Jamie Foxx, Brie Larson, Ch... 2019 137 \n", - "1063 Chris Pratt, Zoe Saldana, Dave Bautista, Vin D... 2017 136 \n", - "1065 Josh Brolin, Miles Teller, Jeff Bridges, Jenni... 2017 134 \n", - "1066 Craig T. Nelson, Holly Hunter, Sarah Vowell, H... 2018 118 \n", - "1067 Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... 2018 136 \n", - "\n", - " Rating Revenue \n", - "0 8.1 333.13 \n", - "1 7.0 126.46M \n", - "2 7.3 138.12M \n", - "4 6.2 325.02 \n", - "6 8.3 151.06M \n", - "... ... ... \n", - "1060 7.6 50.4 \n", - "1063 7.6 389.81 \n", - "1065 7.6 18.34 \n", - "1066 7.6 608.58 \n", - "1067 7.6 215.29 \n", - "\n", - "[463 rows x 8 columns]" - ] - }, - "execution_count": 58, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "long_movies = df [df[\"Runtime\"] > df[\"Runtime\"].mean()]\n", - "long_movies" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Which long movie has the lowest rating?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 59, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "3.2" - ] - }, - "execution_count": 59, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "min_rating = long_movies[\"Rating\"].min()\n", - "min_rating" - ] - }, - { - "cell_type": "code", - "execution_count": 60, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>646</th>\n", - " <td>Tall Men</td>\n", - " <td>Fantasy,Horror,Thriller</td>\n", - " <td>Jonathan Holbrook</td>\n", - " <td>Dan Crisafulli, Kay Whitney, Richard Garcia, P...</td>\n", - " <td>2016</td>\n", - " <td>133</td>\n", - " <td>3.2</td>\n", - " <td>0</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Title Genre Director \\\n", - "646 Tall Men Fantasy,Horror,Thriller Jonathan Holbrook \n", - "\n", - " Cast Year Runtime Rating \\\n", - "646 Dan Crisafulli, Kay Whitney, Richard Garcia, P... 
2016 133 3.2 \n", - "\n", - " Revenue \n", - "646 0 " - ] - }, - "execution_count": 60, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Which movies had this min rating?\n", - "long_movies[long_movies[\"Rating\"] == min_rating]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are all long movies with someone in the cast named \"Emma\" ? " - ] - }, - { - "cell_type": "code", - "execution_count": 61, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>6</th>\n", - " <td>La La Land</td>\n", - " <td>Comedy,Drama,Music</td>\n", - " <td>Damien Chazelle</td>\n", - " <td>Ryan Gosling, Emma Stone, Rosemarie DeWitt, J....</td>\n", - " <td>2016</td>\n", - " <td>128</td>\n", - " <td>8.3</td>\n", - " <td>151.06M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>92</th>\n", - " <td>The Help</td>\n", - " <td>Drama</td>\n", - " <td>Tate Taylor</td>\n", - " <td>Emma Stone, Viola Davis, Octavia Spencer, Bryc...</td>\n", - " <td>2011</td>\n", - " <td>146</td>\n", - " <td>8.1</td>\n", - " <td>169.71M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>114</th>\n", - " <td>Harry Potter and the Deathly Hallows: Part 2</td>\n", - " <td>Adventure,Drama,Fantasy</td>\n", - " <td>David Yates</td>\n", - " <td>Daniel Radcliffe, Emma Watson, Rupert Grint, 
M...</td>\n", - " <td>2011</td>\n", - " <td>130</td>\n", - " <td>8.1</td>\n", - " <td>380.96</td>\n", - " </tr>\n", - " <tr>\n", - " <th>157</th>\n", - " <td>Crazy, Stupid, Love.</td>\n", - " <td>Comedy,Drama,Romance</td>\n", - " <td>Glenn Ficarra</td>\n", - " <td>Steve Carell, Ryan Gosling, Julianne Moore, Em...</td>\n", - " <td>2011</td>\n", - " <td>118</td>\n", - " <td>7.4</td>\n", - " <td>84.24</td>\n", - " </tr>\n", - " <tr>\n", - " <th>253</th>\n", - " <td>The Amazing Spider-Man 2</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>Marc Webb</td>\n", - " <td>Andrew Garfield, Emma Stone, Jamie Foxx, Paul ...</td>\n", - " <td>2014</td>\n", - " <td>142</td>\n", - " <td>6.7</td>\n", - " <td>202.85</td>\n", - " </tr>\n", - " <tr>\n", - " <th>314</th>\n", - " <td>Harry Potter and the Order of the Phoenix</td>\n", - " <td>Adventure,Family,Fantasy</td>\n", - " <td>David Yates</td>\n", - " <td>Daniel Radcliffe, Emma Watson, Rupert Grint, B...</td>\n", - " <td>2007</td>\n", - " <td>138</td>\n", - " <td>7.5</td>\n", - " <td>292</td>\n", - " </tr>\n", - " <tr>\n", - " <th>367</th>\n", - " <td>The Amazing Spider-Man</td>\n", - " <td>Action,Adventure</td>\n", - " <td>Marc Webb</td>\n", - " <td>Andrew Garfield, Emma Stone, Rhys Ifans, Irrfa...</td>\n", - " <td>2012</td>\n", - " <td>136</td>\n", - " <td>7.0</td>\n", - " <td>262.03</td>\n", - " </tr>\n", - " <tr>\n", - " <th>417</th>\n", - " <td>Harry Potter and the Deathly Hallows: Part 1</td>\n", - " <td>Adventure,Family,Fantasy</td>\n", - " <td>David Yates</td>\n", - " <td>Daniel Radcliffe, Emma Watson, Rupert Grint, B...</td>\n", - " <td>2010</td>\n", - " <td>146</td>\n", - " <td>7.7</td>\n", - " <td>294.98</td>\n", - " </tr>\n", - " <tr>\n", - " <th>472</th>\n", - " <td>Harry Potter and the Half-Blood Prince</td>\n", - " <td>Adventure,Family,Fantasy</td>\n", - " <td>David Yates</td>\n", - " <td>Daniel Radcliffe, Emma Watson, Rupert Grint, M...</td>\n", - " <td>2009</td>\n", - " <td>153</td>\n", - " <td>7.5</td>\n", 
- " <td>301.96</td>\n", - " </tr>\n", - " <tr>\n", - " <th>609</th>\n", - " <td>Beautiful Creatures</td>\n", - " <td>Drama,Fantasy,Romance</td>\n", - " <td>Richard LaGravenese</td>\n", - " <td>Alice Englert, Viola Davis, Emma Thompson,Alde...</td>\n", - " <td>2013</td>\n", - " <td>124</td>\n", - " <td>6.2</td>\n", - " <td>19.45</td>\n", - " </tr>\n", - " <tr>\n", - " <th>717</th>\n", - " <td>Noah</td>\n", - " <td>Action,Adventure,Drama</td>\n", - " <td>Darren Aronofsky</td>\n", - " <td>Russell Crowe, Jennifer Connelly, Anthony Hopk...</td>\n", - " <td>2014</td>\n", - " <td>138</td>\n", - " <td>5.8</td>\n", - " <td>101.16M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>879</th>\n", - " <td>Saving Mr. Banks</td>\n", - " <td>Biography,Comedy,Drama</td>\n", - " <td>John Lee Hancock</td>\n", - " <td>Emma Thompson, Tom Hanks, Annie Rose Buckley, ...</td>\n", - " <td>2013</td>\n", - " <td>125</td>\n", - " <td>7.5</td>\n", - " <td>83.3</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1044</th>\n", - " <td>Little Women</td>\n", - " <td>Drama, Romance</td>\n", - " <td>Greta Gerwig</td>\n", - " <td>Saoirse Ronan, Emma Watson, Florence Pugh, Eli...</td>\n", - " <td>2019</td>\n", - " <td>135</td>\n", - " <td>7.8</td>\n", - " <td>108.1</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Title Genre \\\n", - "6 La La Land Comedy,Drama,Music \n", - "92 The Help Drama \n", - "114 Harry Potter and the Deathly Hallows: Part 2 Adventure,Drama,Fantasy \n", - "157 Crazy, Stupid, Love. 
Comedy,Drama,Romance \n", - "253 The Amazing Spider-Man 2 Action,Adventure,Sci-Fi \n", - "314 Harry Potter and the Order of the Phoenix Adventure,Family,Fantasy \n", - "367 The Amazing Spider-Man Action,Adventure \n", - "417 Harry Potter and the Deathly Hallows: Part 1 Adventure,Family,Fantasy \n", - "472 Harry Potter and the Half-Blood Prince Adventure,Family,Fantasy \n", - "609 Beautiful Creatures Drama,Fantasy,Romance \n", - "717 Noah Action,Adventure,Drama \n", - "879 Saving Mr. Banks Biography,Comedy,Drama \n", - "1044 Little Women Drama, Romance \n", - "\n", - " Director Cast \\\n", - "6 Damien Chazelle Ryan Gosling, Emma Stone, Rosemarie DeWitt, J.... \n", - "92 Tate Taylor Emma Stone, Viola Davis, Octavia Spencer, Bryc... \n", - "114 David Yates Daniel Radcliffe, Emma Watson, Rupert Grint, M... \n", - "157 Glenn Ficarra Steve Carell, Ryan Gosling, Julianne Moore, Em... \n", - "253 Marc Webb Andrew Garfield, Emma Stone, Jamie Foxx, Paul ... \n", - "314 David Yates Daniel Radcliffe, Emma Watson, Rupert Grint, B... \n", - "367 Marc Webb Andrew Garfield, Emma Stone, Rhys Ifans, Irrfa... \n", - "417 David Yates Daniel Radcliffe, Emma Watson, Rupert Grint, B... \n", - "472 David Yates Daniel Radcliffe, Emma Watson, Rupert Grint, M... \n", - "609 Richard LaGravenese Alice Englert, Viola Davis, Emma Thompson,Alde... \n", - "717 Darren Aronofsky Russell Crowe, Jennifer Connelly, Anthony Hopk... \n", - "879 John Lee Hancock Emma Thompson, Tom Hanks, Annie Rose Buckley, ... \n", - "1044 Greta Gerwig Saoirse Ronan, Emma Watson, Florence Pugh, Eli... 
\n", - "\n", - " Year Runtime Rating Revenue \n", - "6 2016 128 8.3 151.06M \n", - "92 2011 146 8.1 169.71M \n", - "114 2011 130 8.1 380.96 \n", - "157 2011 118 7.4 84.24 \n", - "253 2014 142 6.7 202.85 \n", - "314 2007 138 7.5 292 \n", - "367 2012 136 7.0 262.03 \n", - "417 2010 146 7.7 294.98 \n", - "472 2009 153 7.5 301.96 \n", - "609 2013 124 6.2 19.45 \n", - "717 2014 138 5.8 101.16M \n", - "879 2013 125 7.5 83.3 \n", - "1044 2019 135 7.8 108.1 " - ] - }, - "execution_count": 61, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "long_movies[long_movies[\"Cast\"].str.contains(\"Emma\")]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What is the title of the shortest movie?" - ] - }, - { - "cell_type": "code", - "execution_count": 62, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "792 Ma vie de Courgette\n", - "Name: Title, dtype: object" - ] - }, - "execution_count": 62, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df[df[\"Runtime\"] == df[\"Runtime\"].min()][\"Title\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What movie had the highest revenue?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 63, - "metadata": {}, - "outputs": [], - "source": [ - "# What movie had the highest revenue?\n", - "# df[\"Revnue\"].max() did not work\n", - "# we need to clean our data\n", - "\n", - "def format_revenue(revenue):\n", - " #TODO: Check the last character of the string\n", - " if type(revenue) == float: # need this in here if we run code multiple times\n", - " return revenue\n", - " elif revenue[-1] == 'M': # some have an \"M\" at the end\n", - " return float(revenue[:-1]) * 1e6\n", - " else:\n", - " return float(revenue) * 1e6" - ] - }, - { - "cell_type": "code", - "execution_count": 64, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 333130000.0\n", - "1 126460000.0\n", - "2 138120000.0\n", - "3 270320000.0\n", - "4 325020000.0\n", - "Name: Revenue, dtype: float64\n" - ] - }, - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " <th>Revenue (float)</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " <td>333.13</td>\n", - " <td>333130000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " 
<td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " <td>126460000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Split</td>\n", - " <td>Horror,Thriller</td>\n", - " <td>M. Night Shyamalan</td>\n", - " <td>James McAvoy, Anya Taylor-Joy, Haley Lu Richar...</td>\n", - " <td>2016</td>\n", - " <td>117</td>\n", - " <td>7.3</td>\n", - " <td>138.12M</td>\n", - " <td>138120000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Sing</td>\n", - " <td>Animation,Comedy,Family</td>\n", - " <td>Christophe Lourdelet</td>\n", - " <td>Matthew McConaughey,Reese Witherspoon, Seth Ma...</td>\n", - " <td>2016</td>\n", - " <td>108</td>\n", - " <td>7.2</td>\n", - " <td>270.32</td>\n", - " <td>270320000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>Suicide Squad</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>David Ayer</td>\n", - " <td>Will Smith, Jared Leto, Margot Robbie, Viola D...</td>\n", - " <td>2016</td>\n", - " <td>123</td>\n", - " <td>6.2</td>\n", - " <td>325.02</td>\n", - " <td>325020000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>Guardians of the Galaxy Vol. 
2</td>\n", - " <td>Action, Adventure, Comedy</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Zoe Saldana, Dave Bautista, Vin D...</td>\n", - " <td>2017</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>389.81</td>\n", - " <td>389810000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1064</th>\n", - " <td>Baby Driver</td>\n", - " <td>Action, Crime, Drama</td>\n", - " <td>Edgar Wright</td>\n", - " <td>Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon...</td>\n", - " <td>2017</td>\n", - " <td>113</td>\n", - " <td>7.6</td>\n", - " <td>107.83</td>\n", - " <td>107830000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " <td>18340000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " <td>608580000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " <td>215290000.0</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>1068 rows × 9 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Title Genre \\\n", - "0 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", - "1 Prometheus Adventure,Mystery,Sci-Fi \n", - "2 Split Horror,Thriller \n", - "3 Sing Animation,Comedy,Family \n", - "4 Suicide Squad Action,Adventure,Fantasy \n", - "... ... ... 
\n", - "1063 Guardians of the Galaxy Vol. 2 Action, Adventure, Comedy \n", - "1064 Baby Driver Action, Crime, Drama \n", - "1065 Only the Brave Action, Biography, Drama \n", - "1066 Incredibles 2 Animation, Action, Adventure \n", - "1067 A Star Is Born Drama, Music, Romance \n", - "\n", - " Director Cast \\\n", - "0 James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... \n", - "1 Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael ... \n", - "2 M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... \n", - "3 Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... \n", - "4 David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... \n", - "... ... ... \n", - "1063 James Gunn Chris Pratt, Zoe Saldana, Dave Bautista, Vin D... \n", - "1064 Edgar Wright Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon... \n", - "1065 Joseph Kosinski Josh Brolin, Miles Teller, Jeff Bridges, Jenni... \n", - "1066 Brad Bird Craig T. Nelson, Holly Hunter, Sarah Vowell, H... \n", - "1067 Bradley Cooper Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... \n", - "\n", - " Year Runtime Rating Revenue Revenue (float) \n", - "0 2014 121 8.1 333.13 333130000.0 \n", - "1 2012 124 7.0 126.46M 126460000.0 \n", - "2 2016 117 7.3 138.12M 138120000.0 \n", - "3 2016 108 7.2 270.32 270320000.0 \n", - "4 2016 123 6.2 325.02 325020000.0 \n", - "... ... ... ... ... ... 
\n", - "1063 2017 136 7.6 389.81 389810000.0 \n", - "1064 2017 113 7.6 107.83 107830000.0 \n", - "1065 2017 134 7.6 18.34 18340000.0 \n", - "1066 2018 118 7.6 608.58 608580000.0 \n", - "1067 2018 136 7.6 215.29 215290000.0 \n", - "\n", - "[1068 rows x 9 columns]" - ] - }, - "execution_count": 64, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What movie had the highest revenue?\n", - "revenue = df[\"Revenue\"].apply(format_revenue) # apply a function to a column\n", - "print(revenue.head())\n", - "max_revenue = revenue.max()\n", - "\n", - "# make a copy of our df\n", - "rev_df = df.copy()\n", - "rev_df[\"Revenue (float)\"] = revenue\n", - "rev_df" - ] - }, - { - "cell_type": "code", - "execution_count": 65, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " <th>Revenue (float)</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>50</th>\n", - " <td>Star Wars: Episode VII - The Force Awakens</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>J.J. 
Abrams</td>\n", - " <td>Daisy Ridley, John Boyega, Oscar Isaac, Domhna...</td>\n", - " <td>2015</td>\n", - " <td>136</td>\n", - " <td>8.1</td>\n", - " <td>936.63</td>\n", - " <td>936630000.0</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Title Genre \\\n", - "50 Star Wars: Episode VII - The Force Awakens Action,Adventure,Fantasy \n", - "\n", - " Director Cast Year \\\n", - "50 J.J. Abrams Daisy Ridley, John Boyega, Oscar Isaac, Domhna... 2015 \n", - "\n", - " Runtime Rating Revenue Revenue (float) \n", - "50 136 8.1 936.63 936630000.0 " - ] - }, - "execution_count": 65, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Now we can answer the question!\n", - "rev_df[rev_df[\"Revenue (float)\"] == max_revenue]" - ] - }, - { - "cell_type": "code", - "execution_count": 66, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " <th>Revenue (float)</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>50</th>\n", - " <td>Star Wars: Episode VII - The Force Awakens</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>J.J. 
Abrams</td>\n", - " <td>Daisy Ridley, John Boyega, Oscar Isaac, Domhna...</td>\n", - " <td>2015</td>\n", - " <td>136</td>\n", - " <td>8.1</td>\n", - " <td>936.63</td>\n", - " <td>936630000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1006</th>\n", - " <td>Avengers: Endgame</td>\n", - " <td>Action, Adventure, Drama</td>\n", - " <td>Anthony Russo</td>\n", - " <td>Joe Russo, Robert Downey Jr., Chris Evans, Mar...</td>\n", - " <td>2019</td>\n", - " <td>181</td>\n", - " <td>8.4</td>\n", - " <td>858.37</td>\n", - " <td>858370000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>87</th>\n", - " <td>Avatar</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>James Cameron</td>\n", - " <td>Sam Worthington, Zoe Saldana, Sigourney Weaver...</td>\n", - " <td>2009</td>\n", - " <td>162</td>\n", - " <td>7.8</td>\n", - " <td>760.51</td>\n", - " <td>760510000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1007</th>\n", - " <td>Avengers: Infinity War</td>\n", - " <td>Action, Adventure, Sci-Fi</td>\n", - " <td>Anthony Russo</td>\n", - " <td>Joe Russo, Robert Downey Jr., Chris Hemsworth,...</td>\n", - " <td>2018</td>\n", - " <td>149</td>\n", - " <td>8.4</td>\n", - " <td>678.82</td>\n", - " <td>678820000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>85</th>\n", - " <td>Jurassic World</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>Colin Trevorrow</td>\n", - " <td>Chris Pratt, Bryce Dallas Howard, Ty Simpkins,...</td>\n", - " <td>2015</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>652.18</td>\n", - " <td>652180000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>974</th>\n", - " <td>Dark Places</td>\n", - " <td>Drama,Mystery,Thriller</td>\n", - " <td>Gilles Paquet-Brenner</td>\n", - " <td>Charlize Theron, Nicholas Hoult, Christina 
Hen...</td>\n", - " <td>2015</td>\n", - " <td>113</td>\n", - " <td>6.2</td>\n", - " <td>0</td>\n", - " <td>0.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>183</th>\n", - " <td>Realive</td>\n", - " <td>Sci-Fi</td>\n", - " <td>Mateo Gil</td>\n", - " <td>Tom Hughes, Charlotte Le Bon, Oona Chaplin, Ba...</td>\n", - " <td>2016</td>\n", - " <td>112</td>\n", - " <td>5.9</td>\n", - " <td>0</td>\n", - " <td>0.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>218</th>\n", - " <td>A Dark Song</td>\n", - " <td>Drama,Horror</td>\n", - " <td>Liam Gavin</td>\n", - " <td>Mark Huberman, Susan Loughnane, Steve Oram,Cat...</td>\n", - " <td>2016</td>\n", - " <td>100</td>\n", - " <td>6.1</td>\n", - " <td>0</td>\n", - " <td>0.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>397</th>\n", - " <td>Absolutely Anything</td>\n", - " <td>Comedy,Sci-Fi</td>\n", - " <td>Terry Jones</td>\n", - " <td>Simon Pegg, Kate Beckinsale, Sanjeev Bhaskar, ...</td>\n", - " <td>2015</td>\n", - " <td>85</td>\n", - " <td>6.0</td>\n", - " <td>0</td>\n", - " <td>0.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>815</th>\n", - " <td>I.T.</td>\n", - " <td>Crime,Drama,Mystery</td>\n", - " <td>John Moore</td>\n", - " <td>Pierce Brosnan, Jason Barry, Karen Moskow, Kai...</td>\n", - " <td>2016</td>\n", - " <td>95</td>\n", - " <td>5.4</td>\n", - " <td>0</td>\n", - " <td>0.0</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>1068 rows × 9 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Title Genre \\\n", - "50 Star Wars: Episode VII - The Force Awakens Action,Adventure,Fantasy \n", - "1006 Avengers: Endgame Action, Adventure, Drama \n", - "87 Avatar Action,Adventure,Fantasy \n", - "1007 Avengers: Infinity War Action, Adventure, Sci-Fi \n", - "85 Jurassic World Action,Adventure,Sci-Fi \n", - "... ... ... \n", - "974 Dark Places Drama,Mystery,Thriller \n", - "183 Realive Sci-Fi \n", - "218 A Dark Song Drama,Horror \n", - "397 Absolutely Anything Comedy,Sci-Fi \n", - "815 I.T. 
Crime,Drama,Mystery \n", - "\n", - " Director \\\n", - "50 J.J. Abrams \n", - "1006 Anthony Russo \n", - "87 James Cameron \n", - "1007 Anthony Russo \n", - "85 Colin Trevorrow \n", - "... ... \n", - "974 Gilles Paquet-Brenner \n", - "183 Mateo Gil \n", - "218 Liam Gavin \n", - "397 Terry Jones \n", - "815 John Moore \n", - "\n", - " Cast Year Runtime \\\n", - "50 Daisy Ridley, John Boyega, Oscar Isaac, Domhna... 2015 136 \n", - "1006 Joe Russo, Robert Downey Jr., Chris Evans, Mar... 2019 181 \n", - "87 Sam Worthington, Zoe Saldana, Sigourney Weaver... 2009 162 \n", - "1007 Joe Russo, Robert Downey Jr., Chris Hemsworth,... 2018 149 \n", - "85 Chris Pratt, Bryce Dallas Howard, Ty Simpkins,... 2015 124 \n", - "... ... ... ... \n", - "974 Charlize Theron, Nicholas Hoult, Christina Hen... 2015 113 \n", - "183 Tom Hughes, Charlotte Le Bon, Oona Chaplin, Ba... 2016 112 \n", - "218 Mark Huberman, Susan Loughnane, Steve Oram,Cat... 2016 100 \n", - "397 Simon Pegg, Kate Beckinsale, Sanjeev Bhaskar, ... 2015 85 \n", - "815 Pierce Brosnan, Jason Barry, Karen Moskow, Kai... 2016 95 \n", - "\n", - " Rating Revenue Revenue (float) \n", - "50 8.1 936.63 936630000.0 \n", - "1006 8.4 858.37 858370000.0 \n", - "87 7.8 760.51 760510000.0 \n", - "1007 8.4 678.82 678820000.0 \n", - "85 7.0 652.18 652180000.0 \n", - "... ... ... ... 
\n", - "974 6.2 0 0.0 \n", - "183 5.9 0 0.0 \n", - "218 6.1 0 0.0 \n", - "397 6.0 0 0.0 \n", - "815 5.4 0 0.0 \n", - "\n", - "[1068 rows x 9 columns]" - ] - }, - "execution_count": 66, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Or more generally...\n", - "rev_df.sort_values(by=\"Revenue (float)\", ascending=False)" - ] - }, - { - "cell_type": "code", - "execution_count": 67, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " <td>333.13</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Split</td>\n", - " <td>Horror,Thriller</td>\n", - " <td>M. 
Night Shyamalan</td>\n", - " <td>James McAvoy, Anya Taylor-Joy, Haley Lu Richar...</td>\n", - " <td>2016</td>\n", - " <td>117</td>\n", - " <td>7.3</td>\n", - " <td>138.12M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Sing</td>\n", - " <td>Animation,Comedy,Family</td>\n", - " <td>Christophe Lourdelet</td>\n", - " <td>Matthew McConaughey,Reese Witherspoon, Seth Ma...</td>\n", - " <td>2016</td>\n", - " <td>108</td>\n", - " <td>7.2</td>\n", - " <td>270.32</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>Suicide Squad</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>David Ayer</td>\n", - " <td>Will Smith, Jared Leto, Margot Robbie, Viola D...</td>\n", - " <td>2016</td>\n", - " <td>123</td>\n", - " <td>6.2</td>\n", - " <td>325.02</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>Guardians of the Galaxy Vol. 
2</td>\n", - " <td>Action, Adventure, Comedy</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Zoe Saldana, Dave Bautista, Vin D...</td>\n", - " <td>2017</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>389.81</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1064</th>\n", - " <td>Baby Driver</td>\n", - " <td>Action, Crime, Drama</td>\n", - " <td>Edgar Wright</td>\n", - " <td>Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon...</td>\n", - " <td>2017</td>\n", - " <td>113</td>\n", - " <td>7.6</td>\n", - " <td>107.83</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>1068 rows × 8 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Title Genre \\\n", - "0 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", - "1 Prometheus Adventure,Mystery,Sci-Fi \n", - "2 Split Horror,Thriller \n", - "3 Sing Animation,Comedy,Family \n", - "4 Suicide Squad Action,Adventure,Fantasy \n", - "... ... ... \n", - "1063 Guardians of the Galaxy Vol. 
2 Action, Adventure, Comedy \n", - "1064 Baby Driver Action, Crime, Drama \n", - "1065 Only the Brave Action, Biography, Drama \n", - "1066 Incredibles 2 Animation, Action, Adventure \n", - "1067 A Star Is Born Drama, Music, Romance \n", - "\n", - " Director Cast \\\n", - "0 James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... \n", - "1 Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael ... \n", - "2 M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... \n", - "3 Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... \n", - "4 David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... \n", - "... ... ... \n", - "1063 James Gunn Chris Pratt, Zoe Saldana, Dave Bautista, Vin D... \n", - "1064 Edgar Wright Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon... \n", - "1065 Joseph Kosinski Josh Brolin, Miles Teller, Jeff Bridges, Jenni... \n", - "1066 Brad Bird Craig T. Nelson, Holly Hunter, Sarah Vowell, H... \n", - "1067 Bradley Cooper Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... \n", - "\n", - " Year Runtime Rating Revenue \n", - "0 2014 121 8.1 333.13 \n", - "1 2012 124 7.0 126.46M \n", - "2 2016 117 7.3 138.12M \n", - "3 2016 108 7.2 270.32 \n", - "4 2016 123 6.2 325.02 \n", - "... ... ... ... ... \n", - "1063 2017 136 7.6 389.81 \n", - "1064 2017 113 7.6 107.83 \n", - "1065 2017 134 7.6 18.34 \n", - "1066 2018 118 7.6 608.58 \n", - "1067 2018 136 7.6 215.29 \n", - "\n", - "[1068 rows x 8 columns]" - ] - }, - "execution_count": 67, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What is the average runtime for movies by \"Francis Lawrence\"?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 68, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "126.75" - ] - }, - "execution_count": 68, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "fl_movies = df[df[\"Director\"] == \"Francis Lawrence\"]\n", - "fl_movies[\"Runtime\"].mean()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Which director had the highest average rating? " - ] - }, - { - "cell_type": "code", - "execution_count": 69, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'Christopher Nolan': 8.533333333333333,\n", - " 'Martin Scorsese': 7.916666666666667,\n", - " 'Quentin Tarantino': 7.840000000000001,\n", - " 'David Fincher': 7.8199999999999985,\n", - " 'Denis Villeneuve': 7.8,\n", - " 'J.J. Abrams': 7.58,\n", - " 'Guy Ritchie': 7.5,\n", - " 'David Yates': 7.433333333333334,\n", - " 'Danny Boyle': 7.42,\n", - " 'Antoine Fuqua': 7.040000000000001,\n", - " 'Zack Snyder': 7.040000000000001,\n", - " 'Woody Allen': 7.019999999999999,\n", - " 'Peter Berg': 6.860000000000001,\n", - " 'Ridley Scott': 6.85,\n", - " 'Justin Lin': 6.82,\n", - " 'Michael Bay': 6.483333333333334,\n", - " 'Paul W.S. Anderson': 5.766666666666666,\n", - " 'M. 
Night Shyamalan': 5.533333333333332}" - ] - }, - "execution_count": 69, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# one way is to make a python dict of director, list of ratings\n", - "director_dict = dict()\n", - "\n", - "# make the dictionary: key is director, value is list of ratings\n", - "for i in range(len(df)):\n", - " director = df.loc[i, \"Director\"]\n", - " rating = df.loc[i, \"Rating\"]\n", - " #print(i, director, rating)\n", - " if director not in director_dict:\n", - " director_dict[director] = []\n", - " director_dict[director].append(rating)\n", - "\n", - "# make a ratings dict key is directory, value is average\n", - "# only include directors with > 4 movies\n", - "ratings_dict = {k: sum(v) / len(v) for (k, v) in director_dict.items() if len(v) > 4}\n", - "\n", - "#sort a dict by values\n", - "dict(sorted(ratings_dict.items(), key = lambda t:t[-1], reverse = True))" - ] - }, - { - "cell_type": "code", - "execution_count": 70, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Director\n", - "Christopher Nolan 8.533333\n", - "Martin Scorsese 7.916667\n", - "Quentin Tarantino 7.840000\n", - "David Fincher 7.820000\n", - "Denis Villeneuve 7.800000\n", - "J.J. Abrams 7.580000\n", - "Guy Ritchie 7.500000\n", - "David Yates 7.433333\n", - "Danny Boyle 7.420000\n", - "Antoine Fuqua 7.040000\n", - "Zack Snyder 7.040000\n", - "Woody Allen 7.020000\n", - "Peter Berg 6.860000\n", - "Ridley Scott 6.850000\n", - "Justin Lin 6.820000\n", - "Michael Bay 6.483333\n", - "Paul W.S. Anderson 5.766667\n", - "M. 
Night Shyamalan 5.533333\n", - "Name: Rating, dtype: float64" - ] - }, - "execution_count": 70, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# FOR DEMONSTRATION PURPOSES ONLY\n", - "# We haven't learnt about \"groupby\"\n", - "# Pandas has many operations which will be helpful!\n", - "\n", - "# Consider what you already know, and what Pandas can solve\n", - "# when formulating your solutions.\n", - "rating_groups = df.groupby(\"Director\")[\"Rating\"]\n", - "rating_groups.mean()[rating_groups.count() > 4].sort_values(ascending=False)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Extra Practice: Make up some of your own questions about the movies" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.7" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/f22/meena_lec_notes/lec-28/.ipynb_checkpoints/lec_28_pandas2_template-checkpoint.ipynb b/f22/meena_lec_notes/lec-28/.ipynb_checkpoints/lec_28_pandas2_template-checkpoint.ipynb deleted file mode 100644 index b775fd1..0000000 --- a/f22/meena_lec_notes/lec-28/.ipynb_checkpoints/lec_28_pandas2_template-checkpoint.ipynb +++ /dev/null @@ -1,1208 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "from pandas import Series, DataFrame\n", - "# We can explictly import Series and DataFrame, why might we do this?" 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Series Review\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Series from `list`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "scores_list = [54, 22, 19, 73, 80]\n", - "scores_series = Series(scores_list)\n", - "scores_series\n", - "\n", - "# What is the terminology for: 0, 1, 2, ... ?? A: \n", - "# What is the terminology for: 54, 22, 19, .... ?? A: " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Selecting certain scores.\n", - "What are all the scores `> 50`?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Answer:** Boolean indexing. Try the following..." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "scores_series[[True, True, False, False, True]] # often called a \"mask\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We are really writing a \"mask\" for our data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Series from `dict`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Imagine we hire students and track their weekly hours\n", - "week1 = Series({\"Rita\":5, \"Therese\":3, \"Janice\": 6})\n", - "week2 = Series({\"Rita\":3, \"Therese\":7, \"Janice\": 4})\n", - "week3 = Series({\"Therese\":5, \"Janice\":5, \"Rita\": 8}) # Wrong order! 
Will this matter?\n", - "print(week1)\n", - "print(week2)\n", - "print(week3)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### For everyone in Week 1, add 3 to their hours " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "week1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Total up everyone's hours" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "total_hours = ???\n", - "total_hours" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### What is week1 / week3 ?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "???\n", - "# Notice that we didn't have to worry about the order of indices" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### What type of values are stored in week1 > week2?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(week1)\n", - "print(week2)\n", - "???\n", - "# Notice that indices are ordered the same" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### What is week1 > week3?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(week1)\n", - "print(week3)\n", - "??? # Does it work?\n", - "\n", - "# How can we fix this?" 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "# Lecture 28: Pandas 2 - DataFrames\n", - "\n", - "\n", - "Learning Objectives:\n", - "- Create a DataFrame from \n", - " - a dictionary of Series, lists, or dicts\n", - " - a list of Series, lists, dicts\n", - "- Select a column, row, cell, or rectangular region of a DataFrame\n", - "- Convert CSV files into DataFrames and DataFrames into CSV Files\n", - "- Access the head or tail of a DataFrame" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Big Idea**: Data Frames store 2-dimensional data in tables! It is a collection of Series." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## You can create a DataFrame in a variety of ways!\n", - "\n", - "- dictionary of Series\n", - "- dictionary of lists\n", - "- dictionary of dictionaries\n", - "- list of dictionarines\n", - "- list of lists\n", - "\n", - "### From a dictionary of Series" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "names = Series([\"Alice\", \"Bob\", \"Cindy\", \"Dan\"])\n", - "scores = Series([6, 7, 8, 9])\n", - "\n", - "# to make a dictionary of Series, need to write column names for the keys\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### From a dictionary of lists" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "name_list = [\"Alice\", \"Bob\", \"Cindy\", \"Dan\"]\n", - "score_list = [6, 7, 8, 9]\n", - "\n", - "# this is the same as above, reminding us that Series act like lists\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### From a dictionary of dictionaries\n", - "We need to make up keys to match the things in each column" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = {\n", - " 
\"Player name\": {0: \"Alice\", 1: \"Bob\", 2: \"Cindy\", 3: \"Dan\"},\n", - " \"Score\": {0: 6, 1: 7, 2: 8, 3: 9}\n", - "}\n", - "data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### From a list of dicts" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = [\n", - " {\"Player name\": \"Alice\", \"Score\": 6},\n", - " {\"Player name\": \"Bob\", \"Score\": 7},\n", - " {\"Player name\": \"Cindy\", \"Score\": 8},\n", - " {\"Player name\": \"Dan\", \"Score\": 9}\n", - "]\n", - "data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### From a list of lists" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = [\n", - " [\"Alice\", 6],\n", - " [\"Bob\", 7],\n", - " [\"Cindy\", 8],\n", - " [\"Dan\", 9]\n", - "]\n", - "data\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explicitly naming the columns\n", - "We have to add the column names, we do this with `columns = [name1, name2, ....]`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = [\n", - " [\"Alice\", 6],\n", - " [\"Bob\", 7],\n", - " [\"Cindy\", 8],\n", - " [\"Dan\", 9]\n", - "]\n", - "data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explicitly naming the indices\n", - "We can use `index = [name1, name2, ...]` to rename the index of each row" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = [\n", - " {\"Player name\": \"Alice\", \"Score\": 6},\n", - " {\"Player name\": \"Bob\", \"Score\": 7},\n", - " {\"Player name\": \"Cindy\", \"Score\": 8},\n", - " {\"Player name\": \"Dan\", \"Score\": 9}\n", - "]\n", - "data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], 
- "source": [ - "# TODO: \n", - "# Make a DataFrame of 4 people you know with different ages\n", - "# Give names to both the columns and rows\n", - "\n", - "# Share how you did with this with your neighbor\n", - "# If you both did it the same way, try it a different way." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Select a column, row, cell, or rectangular region of a DataFrame\n", - "### Data lookup: Series\n", - "- `s.loc[X]` <- lookup by pandas index\n", - "- `s.iloc[X]` <- lookup by integer position" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "hours = Series({\"Alice\": 6, \"Bob\": 7, \"Cindy\": 8, \"Dan\": 9})\n", - "hours" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Lookup Bob's hours by pandas index.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Lookup Bob's hours by integer position.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Lookup Cindy's hours by pandas index.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Data lookup: DataFrame\n", - "\n", - "\n", - "- `d.loc[r]` lookup ROW by pandas ROW index\n", - "- `d.iloc[r]` lookup ROW by ROW integer position\n", - "- `d[c]` lookup COL by pandas COL index\n", - "- `d.loc[r, c]` lookup by pandas ROW index and pandas COL index\n", - "- `d.iloc[r, c]` lookup by ROW integer position and COL integer position" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# We often call the object that we make df\n", - "data = [\n", - " [\"Hope\", 10],\n", - " [\"Peace\", 7],\n", - " [\"Joy\", 4],\n", - " [\"Love\", 11]\n", - "]\n", - "df = DataFrame(data, index = [\"H\", \"P\", \"J\", \"L\"], columns = 
[\"Player name\", \"Score\"])\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are 3 different ways of accessing row L? " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### How about accessing a column?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are 3 different ways to access a single cell?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How to set values for a specific entry?\n", - "\n", - "- `d.loc[r, c] = new_val`\n", - "- `d.iloc[r, c] = new_val`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#change player D's name\n", - "df.loc[\"L\", \"Player name\"] = \"Luisa\"\n", - "df" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# then add 3 to that player's score using .loc\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# add 7 to a different player's score using .iloc\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find the max score and the mean score" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# find the max and mean of the \"Score\" column\n", - 
"print(df[\"Score\"].max(), df[\"Score\"].mean())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find the highest scoring player" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Slicing a DataFrame\n", - "\n", - "- `df.iloc[ROW_SLICE, COL_SLICE]` <- make a rectangular slice from the DataFrame using integer positions\n", - "- `df.loc[ROW_SLICE, COL_SLICE]` <- make a rectangular slice from the DataFrame using index" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df.iloc[1:3, 0:2]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df.loc[\"P\":\"J\", \"Player name\":\"Score\"] # notice that this way is inclusive of endpoints" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set values for sliced DataFrame\n", - "\n", - "- `d.loc[ROW_SLICE, COL_SLICE] = new_val` <- set value by ROW INDEX and COL INDEX\n", - "- `d.iloc[ROW_SLICE, COL_SLICE] = new_val` <- set value by ROW Integer position and COL Integer position" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df.loc[\"P\":\"J\", \"Score\"] += 5\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Pandas allows slicing of non-contiguous columns" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# just get Player name for Index P and L\n", - "df.loc[[\"P\", \"L\"],\"Player name\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# add 2 to the 
people in rows P and L\n", - "df.loc[[\"P\", \"L\"],\"Score\"] += 2\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Boolean indexing on a DataFrame\n", - "\n", - "- `d[BOOL SERIES]` <- makes a new DF of all rows that lined up were True" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Make a Series of Booleans based on Score >= 15" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "b" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### use b to slice the DataFrame\n", - "if b is true, include this row in the new df" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### do the last two things in a single step" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Creating DataFrame from csv" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# it's that easy! 
\n", - "df = pd.read_csv(\"IMDB-Movie-Data.csv\")\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### View the first few lines of the DataFrame\n", - "- `.head(n)` gets the first n lines, 5 is the default" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### get the first 2 rows" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### View the first few lines of the DataFrame\n", - "- `.tail(n)` gets the last n lines, 5 is the default" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are the first and last years in our dataset?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Extract Year column\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"First year: {}, Last year: {}\".format(???))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are the rows that correspond to movies whose title contains \"Harry\" ? \n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What is the movie at index 6 ? 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Notice that there are two index columns\n", - "- That happened because when you write a csv from pandas to a file, it writes a new index column\n", - "- So if the dataFrame already contains an index, you are going to get two index columns\n", - "- Let's fix that problem" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### How can you use slicing to get just columns with Title and Year?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df2 = ???\n", - "df2\n", - "# notice that this does not have the 'index' column" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### How can you use slicing to get rid of the first column?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df = df.iloc[???] #all the rows, not column 0\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write a df to a csv file" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df.to_csv(\"better_movies.csv\", index = False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Practice on your own.....Data Analysis with Data Frames\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are all the movies that have above average run time (long movies)? 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "long_movies = ???\n", - "long_movies" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Which long movie has the lowest rating?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# of these movies, what was the min rating? \n", - "min_rating = ???\n", - "min_rating" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Which movies had this min rating?\n", - "???" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are all long movies with someone in the cast named \"Emma\" ? " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "???" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What is the title of the shortest movie?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "???" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What movie had the highest revenue?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df[\"Revnue\"].max() # does not work, Why?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# We need to clean our data\n", - "# Some movies have M at the end and others don't.\n", - "# All revenues are in millions of dollars.\n", - "def format_revenue(revenue):\n", - " \"\"\" \n", - " Checks the last character of the string and formats accordingly\n", - " \"\"\"\n", - " if type(revenue) == float: # need this in here if we run code multiple times\n", - " return revenue\n", - " elif revenue[-1] == 'M': # some have an \"M\" at the end\n", - " return ??? # TODO: convert relevant part of the string to float and multiple by 1e6\n", - " else:\n", - " return ??? # TODO: convert to float and multiple by 1e6" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# What movie had the highest revenue?\n", - "revenue = df[\"Revenue\"].apply(format_revenue) # apply a function to a column; returns a Series\n", - "print(revenue.head())\n", - "max_revenue = revenue.max()\n", - "\n", - "# make a copy of our df\n", - "rev_df = df.copy()\n", - "rev_df[\"Revenue (float)\"] = revenue\n", - "rev_df" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Now we can answer the question!\n", - "???" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Or more generally...\n", - "rev_df.sort_values(by = \"Revenue (float)\", ascending = False)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What is the average runtime for movies by \"Francis Lawrence\"?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### More complicated questions..." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Which director had the highest average rating? \n", - "\n", - "# one way is to make a python dict of director, list of ratings\n", - "director_dict = dict()\n", - "\n", - "# make the dictionary: key is director, value is list of ratings\n", - "for i in range(len(df)):\n", - " director = df.loc[i, \"Director\"]\n", - " rating = df.loc[i, \"Rating\"]\n", - " #print(i, director, rating)\n", - " if director not in director_dict:\n", - " director_dict[director] = []\n", - " director_dict[director].append(rating)\n", - "\n", - "# make a ratings dict key is directory, value is average\n", - "# only include directors with > 4 movies\n", - "ratings_dict = {k:sum(v)/len(v) for (k,v) in director_dict.items() if len(v) > 4}\n", - "\n", - "#sort a dict by values\n", - "dict(sorted(ratings_dict.items(), key=lambda t:t[-1], reverse=True))\n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# FOR DEMONSTRATION PURPOSES ONLY\n", - "# We haven't (and will not) learn about \"groupby\"\n", - "# Pandas has many operations which will be helpful!\n", - "\n", - "# Consider what you already know, and what Pandas can solve\n", - "# when formulating your solutions.\n", - "rating_groups = df.groupby(\"Director\")[\"Rating\"]\n", - "rating_groups.mean()[rating_groups.count() > 4].sort_values(ascending=False)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Extra Practice: Make up some of your own questions about the movies" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": 
"python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.7" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/demo_lec_28-checkpoint.ipynb b/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/demo_lec_28-checkpoint.ipynb deleted file mode 100644 index 7cdd847..0000000 --- a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/demo_lec_28-checkpoint.ipynb +++ /dev/null @@ -1,1038 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Pandas 1 worksheet\n", - "\n", - "- Observe syntax, predict output and run cell to confirm your output" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.core.display import display, HTML\n", - "display(HTML(\"<style>.container { width:100% !important; }</style>\"))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Learning objectives" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - " - Pandas helps deal with tabular (tables) data\n", - " - List of list is not adequate alternative to excel\n", - " - Series: new data structure\n", - " - hybrid of a dict and a list\n", - " - Python dict \"key\" equivalent to \"index\" in pandas\n", - " - Python list \"index\" quivalent to \"integer position\" in pandas\n", - " - supports complicated expressions within lookup [...]\n", - " - element-wise operation\n", - " - boolean indexing\n", - " - DataFrames aka tables (next lecture)\n", - " - built from series\n", - " - each series will be a column in the table" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# pandas comes with Anaconda installation\n", - "If for some reason, you don't have 
pandas installed, run the following command in terminal or powershell\n", - "<pre> pip install pandas </pre>" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pandas.Series" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Module naming abbreviation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pd.Series" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a series from a dict" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#create a series from a dict\n", - "d = {\"one\":7, \"two\":8, \"three\":9}\n", - "d" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s = pd.Series({\"one\":7, \"two\":8, \"three\":9})\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# IP index value\n", - "# 0 one 7\n", - "# 1 two 8\n", - "# 2 three 9\n", - "\n", - "# dtype: int64" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing values with index (.loc[...])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# dict access with key\n", - "d[\"one\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s.loc[\"one\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s.loc[\"two\"]" - ] - }, - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing values with integer position (.iloc[...])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s.iloc[0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s.iloc[1]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s.iloc[-1]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s[\"one\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s[0]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing multiple values with a list of integer positions" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s[[0, 2]]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#series access with a list of indexes\n", - "s[[\"one\", \"three\"]]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a series from a list" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Series created from a list\n", - "num_list = [100, 200, 300]\n", - "s = pd.Series([100, 200, 300])\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# IP index value\n", - "# 0 0 100\n", - "# 1 1 200\n", - "# 2 2 300\n", - "# dtype: int64" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(s.loc[1])\n", - "print(s.iloc[1])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ 
- "letters_list = [\"A\", \"B\", \"C\", \"D\"]\n", - "letters = pd.Series(letters_list)\n", - "# letters[-1] #Avoid negative indexes, unless we use .iloc" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Slicing series using integer positions" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "letters_list = [\"A\", \"B\", \"C\", \"D\"]\n", - "letters = pd.Series(letters_list)\n", - "letters" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#list slicing reveiw\n", - "letters_list" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sliced_letter_list = letters_list[2:]\n", - "sliced_letter_list" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sliced_letter_list[0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#series slicing\n", - "letters" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sliced_letters = letters[2:]\n", - "sliced_letters" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sliced_letters.loc[2]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sliced_letters.iloc[0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# sliced_letter.loc[0] # index 0 doesn't exist in the sliced series!" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sliced_letters[2]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Note: integer positions get renumbered, whereas indexes do not.\n", - "\n", - "# IP Index values\n", - "# 0 2 c\n", - "# 1 3 d\n", - "# 2 4 e\n", - "# 3 5 f\n", - "# dtype: object" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Slicing series using index" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s = pd.Series({\"one\":7, \"two\":8, \"three\":9})\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#slicing with indexes\n", - "s[\"two\":]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Element-wise operations\n", - "1. SERIES op SCALAR\n", - "2. 
SERIES op SERIES" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#list recap\n", - "nums = [1, 2, 3]\n", - "nums * 3" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "snum = pd.Series(nums)\n", - "snum" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "snum * 3" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "snum + 3" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "snum / 3" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "nums" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# nums / 3 # doesn't work with lists" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "snum" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "snum += 2\n", - "snum" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#list recap\n", - "l1 = [1, 2, 3]\n", - "l2 = [4, 5, 6]\n", - "l1 + l2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s1 = pd.Series(l1)\n", - "s2 = pd.Series(l2)\n", - "print(s1)\n", - "print(s2)\n", - "s1 + s2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(s1)\n", - "print(s2)\n", - "s1 * s2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(s1)\n", - "print(s2)\n", - "s1 / s2" - ] - }, - { - "cell_type": "code", - 
"execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(s1)\n", - "print(s2)\n", - "s2 ** s1" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(s1)\n", - "print(s2)\n", - "s1 < s2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## What happens to element-wise operation if we have two series with different sizes?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pd.Series([1,2,3]) + pd.Series([4,5])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Series with different types" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pd.Series([\"a\", \"Alice\", True, 1, 4.5, [1,2], {\"a\":\"Alice\"}])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How do you merge two series?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s1 = pd.Series([1,2,3]) \n", - "s2 = pd.Series([4,5])\n", - "print(s1)\n", - "print(s2)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s = pd.concat( [s1, s2] )\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s.loc[0]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Element-wise Ambiguity" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s1 = pd.Series({\"A\":10, \"B\": 20 })\n", - "s2 = pd.Series({\"B\":1, \"A\": 2 })\n", - "print(s1)\n", - "print(s2)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# INDEX ALIGNMENT\n", - "s1 + s2" - ] - }, - { - "cell_type": "markdown", 
- "metadata": {}, - "source": [ - "## How to insert an index-value pair?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s = pd.Series({\"A\":10, \"B\": 20 })\n", - "print(s)\n", - "s[\"Z\"] = 100\n", - "s" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Boolean indexing" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s = pd.Series([10, 2, 3, 15])\n", - "s" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How to extract numbers > 8?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "b = pd.Series([True, False, False, True])\n", - "b" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s[b]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "b = s > 8\n", - "b" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s[b]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s[s > 8]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s[pd.Series([True, False, False, True])]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Element-wise String operations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "words = pd.Series([\"APPLE\", \"boy\", \"CAT\", \"dog\"])\n", - "words" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# 
words.upper() # can't call string functions on Series" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "words.str.upper()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#words[BOOLEAN SERIES]\n", - "#How do we get BOOLEAN SERIES?\n", - "b = words == words.str.upper()\n", - "b" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "words[b]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "words[words == words.str.upper()]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How to get the odd numbers from a list?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s = pd.Series([10, 19, 11, 30, 35])\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s % 2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "b = s % 2 == 1\n", - "b" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s[b]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## BOOLEAN OPERATORS on series: and, or, not " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How to get numbers < 12 or numbers > 33?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# s[s < 12 or s > 33] # doesn't work with or, and, not" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# use | instead of or\n", - "s[ s < 12 | s > 33]\n", - "# error because precedence is so high\n", - "# s[ s < (12 | s) > 33]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Use lots of parenthesis\n", - "s[ (s < 12) | (s > 33)]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# AND is &\n", - "s[ (s > 12) & (s < 33)]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# NOT is ~\n", - "s[ ~((s > 12) & (s < 33))]" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.8" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/demo_lec_28_template-checkpoint.ipynb b/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/demo_lec_28_template-checkpoint.ipynb deleted file mode 100644 index a7984f8..0000000 --- a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/demo_lec_28_template-checkpoint.ipynb +++ /dev/null @@ -1,1151 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Pandas 1" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - 
"data": { - "text/html": [ - "<style>.container { width:100% !important; }</style>" - ], - "text/plain": [ - "<IPython.core.display.HTML object>" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "from IPython.core.display import display, HTML\n", - "display(HTML(\"<style>.container { width:100% !important; }</style>\"))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Learning objectives" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - " - Pandas:\n", - " - Python module: tools for doing Data Science\n", - " - helps deal with tabular (tables) data\n", - " - List of list is not adequate alternative to excel\n", - " - Series: new data structure\n", - " - hybrid of a dict and a list\n", - " - Python dict \"key\" equivalent to \"index\" in pandas\n", - " - Python list \"index\" quivalent to \"integer position\" in pandas\n", - " - supports complicated expressions within lookup [...]\n", - " - element-wise operation\n", - " - boolean indexing\n", - " - DataFrames aka tables (next lecture)\n", - " - built from series\n", - " - each series will be a column in the table" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# pandas comes with Anaconda installation\n", - "If for some reason, you don't have pandas installed, run the following command in terminal or powershell\n", - "<pre> pip install pandas </pre>" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "# importing pandas module\n", - "import pandas" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Module naming abbreviation" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "# Common abbrievation for pandas module\n", - "import pandas as pd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a series from a dict" - ] - 
}, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'one': 7, 'two': 8, 'three': 9}" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# create a series from a dict\n", - "d = {\"one\": 7, \"two\": 8, \"three\": 9}\n", - "d" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series(d)\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series({\"one\": 7, \"two\": 8, \"three\": 9}) # equivalent to the above example\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "# IP index value\n", - "# 0 one 7\n", - "# 1 two 8\n", - "# 2 three 9\n", - "\n", - "# dtype: int64" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing values with index (.loc[...])" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# dict access with key\n", - "d[\"one\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.loc[\"one\"]" - ] - }, - { - "cell_type": "code", - 
"execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "8" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.loc[\"two\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing values with integer position (.iloc[...])" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.iloc[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "9" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.iloc[-1]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Regular lookups with just [ ]" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[\"one\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[0]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing multiple values with a list of integer positions" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[[1, 2]]" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": 
{}, - "outputs": [ - { - "data": { - "text/plain": [ - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# series access with a list of indexes\n", - "s[[\"two\", \"three\"]]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a series from a list" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 100\n", - "1 200\n", - "2 300\n", - "dtype: int64" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Series created from a list\n", - "num_list = [100, 200, 300]\n", - "s = pd.Series(num_list)\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "# IP index value\n", - "# 0 0 100\n", - "# 1 1 200\n", - "# 2 2 300\n", - "# dtype: int64" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "200\n", - "200\n" - ] - } - ], - "source": [ - "print(s.loc[1])\n", - "print(s.iloc[1])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "pandas looks for an index when we do a [ ] lookup, by default" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [], - "source": [ - "letters_list = [\"A\", \"B\", \"C\", \"D\"]\n", - "letters = pd.Series(letters_list)\n", - "# letters[-1] # Avoid negative indexes, unless we use .iloc" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Slicing series using integer positions" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 A\n", - "1 B\n", - "2 C\n", - "3 D\n", - "dtype: object" - ] - }, - "execution_count": 21, - 
"metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "letters_list = [\"A\", \"B\", \"C\", \"D\"]\n", - "letters = pd.Series(letters_list)\n", - "letters" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['C', 'D']" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# list slicing\n", - "sliced_letter_list = letters_list[2:]\n", - "sliced_letter_list" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Sliced Series retains original Series index, whereas integer positions are renumbered." - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "2 C\n", - "3 D\n", - "dtype: object" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sliced_letters = letters[2:]\n", - "sliced_letters" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [], - "source": [ - "# Note: integer positions get renumbered, whereas indexes do not.\n", - "\n", - "# IP Index values\n", - "# 0 2 C\n", - "# 1 3 D\n", - "# dtype: object" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "C\n", - "C\n" - ] - } - ], - "source": [ - "print(sliced_letters.loc[2])\n", - "print(sliced_letters.iloc[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Slicing series using index" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series({\"one\": 7, \"two\": 8, 
\"three\": 9})\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#slicing with indexes\n", - "s[\"two\":]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Statistics on Series" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 44\n", - "1 32\n", - "2 19\n", - "3 67\n", - "4 23\n", - "5 23\n", - "6 92\n", - "7 47\n", - "8 47\n", - "9 78\n", - "10 84\n", - "dtype: int64" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores = pd.Series([44, 32, 19, 67, 23, 23, 92, 47, 47, 78, 84])\n", - "scores" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "50.54545454545455" - ] - }, - "execution_count": 29, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores.mean()" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "26.051347897426098" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores.std()" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "47.0" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores.median()" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 23\n", - "1 47\n", - "dtype: int64" - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - 
} - ], - "source": [ - "scores.mode()" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "1.00 92.0\n", - "0.75 72.5\n", - "0.50 47.0\n", - "0.25 27.5\n", - "0.00 19.0\n", - "dtype: float64" - ] - }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores.quantile([1.0, 0.75, 0.5, 0.25, 0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## CS220 information survey data" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [], - "source": [ - "# Modified from https://automatetheboringstuff.com/chapter14/\n", - "import csv\n", - "def process_csv(filename):\n", - " example_file = open(filename, encoding=\"utf-8\")\n", - " example_reader = csv.reader(example_file)\n", - " example_data = list(example_reader)\n", - " example_file.close()\n", - " return example_data\n", - "\n", - "data = process_csv(\"cs220_survey_data.csv\")\n", - "header = data[0]\n", - "data = data[1:]" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['lecture', 'age', 'major', 'topping']" - ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "header" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[['LEC001', '19', 'Computer Science', 'basil/spinach'],\n", - " ['LEC002', '18', 'Engineering', 'pineapple'],\n", - " ['LEC003', '19', 'Business', 'pepperoni']]" - ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "data[:3]" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[19, 18, 19, 19, 19]" - ] - }, - "execution_count": 37, - "metadata": {}, - 
"output_type": "execute_result" - } - ], - "source": [ - "# use list comprehension to extract just ages\n", - "age_list = [int(row[1]) for row in data if row[1] != \"\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [], - "source": [ - "cs220_ages = pd.Series(age_list)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Unique values in a Series" - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "19 290\n", - "18 214\n", - "20 178\n", - "21 101\n", - "22 41\n", - "23 13\n", - "17 11\n", - "25 7\n", - "24 6\n", - "26 4\n", - "28 3\n", - "29 2\n", - "30 2\n", - "27 2\n", - "34 1\n", - "37 1\n", - "35 1\n", - "16 1\n", - "33 1\n", - "32 1\n", - "31 1\n", - "46 1\n", - "dtype: int64" - ] - }, - "execution_count": 42, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cs220_ages.value_counts()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Series sorting\n", - "- can be done using index or values" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "16 1\n", - "17 11\n", - "18 214\n", - "19 290\n", - "20 178\n", - "21 101\n", - "22 41\n", - "23 13\n", - "24 6\n", - "25 7\n", - "26 4\n", - "27 2\n", - "28 3\n", - "29 2\n", - "30 2\n", - "31 1\n", - "32 1\n", - "33 1\n", - "34 1\n", - "35 1\n", - "37 1\n", - "46 1\n", - "dtype: int64" - ] - }, - "execution_count": 43, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cs220_ages.value_counts().sort_index()" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "46 1\n", - "32 1\n", - "33 1\n", - "16 1\n", - "35 1\n", - "37 1\n", - "34 1\n", - "31 1\n", - "27 2\n", - "30 2\n", - "29 2\n", - "28 3\n", - "26 4\n", - "24 6\n", - "25 7\n", 
- "17 11\n", - "23 13\n", - "22 41\n", - "21 101\n", - "20 178\n", - "18 214\n", - "19 290\n", - "dtype: int64" - ] - }, - "execution_count": 44, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cs220_ages.value_counts().sort_values()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Statistics" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# find the mode\n", - "print(cs220_ages.mode())\n", - "\n", - "# find the age of the 75th percentile\n", - "print(ages.quantile(.75))\n", - "\n", - "# how many ages are > 25 ? \n", - "print(len(cs220_ages[cs220_ages > 25]))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# We can plot the data\n", - "age_plot = cs220_ages.value_counts().sort_index().plot.bar()\n", - "age_plot.set(xlabel = \"age\", ylabel = \"count\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Element-wise operations\n", - "1. SERIES op SCALAR\n", - "2. 
SERIES op SERIES" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "## Series from a dict\n", - "game1_points = pd.Series({\"Chris\": 10, \"Kiara\": 3, \"Mikayla\": 7, \"Ann\": 8, \"Trish\": 6})\n", - "print(game1points)\n", - "game2_points = pd.Series({\"Kiara\": 7, \"Chris\": 3, \"Trish\": 11, \"Mikayla\": 2, \"Ann\": 5})\n", - "print(game2points)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Pandas can perform operations on two series by matching up their indices\n", - "total = game1_points + game2_points\n", - "total" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "## Who has the most points?\n", - "print(total.max())\n", - "print(total.idxmax())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(total['Kiara'], total[2])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.8" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/lec_29_web1-checkpoint.ipynb b/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/lec_29_web1-checkpoint.ipynb deleted file mode 100644 index e214a4a..0000000 --- a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/lec_29_web1-checkpoint.ipynb +++ /dev/null @@ -1,3255 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Web 1 - How to get data from the Internet" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": 
[ - "import requests\n", - "import json\n", - "import pandas as pd\n", - "from pandas import Series, DataFrame" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### P10 check-in" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "# It is very important to check auto-grader test results on p10 in a timely manner.\n", - "# Take a few minutes to verify if you hardcoded the slashes in P10 rather than using os.path.join? \n", - " # Your code won't clear auto-grader if you hardcode either \"/\" or \"\\\" \n", - " # for *ANY* relative path in the entire project\n", - "# Check your code and check the autograder as soon as possible." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 1: Read the data from \"IMDB-Movie-Data.csv\" into a pandas DataFrame called \"movies\"" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Index</th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>0</td>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " 
<td>333.13</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>1</td>\n", - " <td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>2</td>\n", - " <td>Split</td>\n", - " <td>Horror,Thriller</td>\n", - " <td>M. Night Shyamalan</td>\n", - " <td>James McAvoy, Anya Taylor-Joy, Haley Lu Richar...</td>\n", - " <td>2016</td>\n", - " <td>117</td>\n", - " <td>7.3</td>\n", - " <td>138.12M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>3</td>\n", - " <td>Sing</td>\n", - " <td>Animation,Comedy,Family</td>\n", - " <td>Christophe Lourdelet</td>\n", - " <td>Matthew McConaughey,Reese Witherspoon, Seth Ma...</td>\n", - " <td>2016</td>\n", - " <td>108</td>\n", - " <td>7.2</td>\n", - " <td>270.32</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>4</td>\n", - " <td>Suicide Squad</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>David Ayer</td>\n", - " <td>Will Smith, Jared Leto, Margot Robbie, Viola D...</td>\n", - " <td>2016</td>\n", - " <td>123</td>\n", - " <td>6.2</td>\n", - " <td>325.02</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>1063</td>\n", - " <td>Guardians of the Galaxy Vol. 
2</td>\n", - " <td>Action, Adventure, Comedy</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Zoe Saldana, Dave Bautista, Vin D...</td>\n", - " <td>2017</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>389.81</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1064</th>\n", - " <td>1064</td>\n", - " <td>Baby Driver</td>\n", - " <td>Action, Crime, Drama</td>\n", - " <td>Edgar Wright</td>\n", - " <td>Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon...</td>\n", - " <td>2017</td>\n", - " <td>113</td>\n", - " <td>7.6</td>\n", - " <td>107.83</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>1065</td>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>1066</td>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>1067</td>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>1068 rows × 9 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Index Title Genre \\\n", - "0 0 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", - "1 1 Prometheus Adventure,Mystery,Sci-Fi \n", - "2 2 Split Horror,Thriller \n", - "3 3 Sing Animation,Comedy,Family \n", - "4 4 Suicide Squad Action,Adventure,Fantasy \n", - "... ... ... ... \n", - "1063 1063 Guardians of the Galaxy Vol. 
2 Action, Adventure, Comedy \n", - "1064 1064 Baby Driver Action, Crime, Drama \n", - "1065 1065 Only the Brave Action, Biography, Drama \n", - "1066 1066 Incredibles 2 Animation, Action, Adventure \n", - "1067 1067 A Star Is Born Drama, Music, Romance \n", - "\n", - " Director Cast \\\n", - "0 James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... \n", - "1 Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael ... \n", - "2 M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... \n", - "3 Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... \n", - "4 David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... \n", - "... ... ... \n", - "1063 James Gunn Chris Pratt, Zoe Saldana, Dave Bautista, Vin D... \n", - "1064 Edgar Wright Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon... \n", - "1065 Joseph Kosinski Josh Brolin, Miles Teller, Jeff Bridges, Jenni... \n", - "1066 Brad Bird Craig T. Nelson, Holly Hunter, Sarah Vowell, H... \n", - "1067 Bradley Cooper Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... \n", - "\n", - " Year Runtime Rating Revenue \n", - "0 2014 121 8.1 333.13 \n", - "1 2012 124 7.0 126.46M \n", - "2 2016 117 7.3 138.12M \n", - "3 2016 108 7.2 270.32 \n", - "4 2016 123 6.2 325.02 \n", - "... ... ... ... ... 
\n", - "1063 2017 136 7.6 389.81 \n", - "1064 2017 113 7.6 107.83 \n", - "1065 2017 134 7.6 18.34 \n", - "1066 2018 118 7.6 608.58 \n", - "1067 2018 136 7.6 215.29 \n", - "\n", - "[1068 rows x 9 columns]" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "movies = pd.read_csv(\"IMDB-Movie-Data.csv\")\n", - "movies" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 2: fixing duplicate index columns\n", - "\n", - "Notice that there are two index columns\n", - "- That happened because when you write a csv from pandas to a file, it writes a new index column\n", - "- So if the DataFrame already contains an index, you are going to get two index columns\n", - "- Let's fix that problem" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " <td>333.13</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley 
Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Split</td>\n", - " <td>Horror,Thriller</td>\n", - " <td>M. Night Shyamalan</td>\n", - " <td>James McAvoy, Anya Taylor-Joy, Haley Lu Richar...</td>\n", - " <td>2016</td>\n", - " <td>117</td>\n", - " <td>7.3</td>\n", - " <td>138.12M</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Sing</td>\n", - " <td>Animation,Comedy,Family</td>\n", - " <td>Christophe Lourdelet</td>\n", - " <td>Matthew McConaughey,Reese Witherspoon, Seth Ma...</td>\n", - " <td>2016</td>\n", - " <td>108</td>\n", - " <td>7.2</td>\n", - " <td>270.32</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>Suicide Squad</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>David Ayer</td>\n", - " <td>Will Smith, Jared Leto, Margot Robbie, Viola D...</td>\n", - " <td>2016</td>\n", - " <td>123</td>\n", - " <td>6.2</td>\n", - " <td>325.02</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>Guardians of the Galaxy Vol. 
2</td>\n", - " <td>Action, Adventure, Comedy</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Zoe Saldana, Dave Bautista, Vin D...</td>\n", - " <td>2017</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>389.81</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1064</th>\n", - " <td>Baby Driver</td>\n", - " <td>Action, Crime, Drama</td>\n", - " <td>Edgar Wright</td>\n", - " <td>Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon...</td>\n", - " <td>2017</td>\n", - " <td>113</td>\n", - " <td>7.6</td>\n", - " <td>107.83</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>1068 rows × 8 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Title Genre \\\n", - "0 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", - "1 Prometheus Adventure,Mystery,Sci-Fi \n", - "2 Split Horror,Thriller \n", - "3 Sing Animation,Comedy,Family \n", - "4 Suicide Squad Action,Adventure,Fantasy \n", - "... ... ... \n", - "1063 Guardians of the Galaxy Vol. 
2 Action, Adventure, Comedy \n", - "1064 Baby Driver Action, Crime, Drama \n", - "1065 Only the Brave Action, Biography, Drama \n", - "1066 Incredibles 2 Animation, Action, Adventure \n", - "1067 A Star Is Born Drama, Music, Romance \n", - "\n", - " Director Cast \\\n", - "0 James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... \n", - "1 Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael ... \n", - "2 M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... \n", - "3 Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... \n", - "4 David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... \n", - "... ... ... \n", - "1063 James Gunn Chris Pratt, Zoe Saldana, Dave Bautista, Vin D... \n", - "1064 Edgar Wright Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon... \n", - "1065 Joseph Kosinski Josh Brolin, Miles Teller, Jeff Bridges, Jenni... \n", - "1066 Brad Bird Craig T. Nelson, Holly Hunter, Sarah Vowell, H... \n", - "1067 Bradley Cooper Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... \n", - "\n", - " Year Runtime Rating Revenue \n", - "0 2014 121 8.1 333.13 \n", - "1 2012 124 7.0 126.46M \n", - "2 2016 117 7.3 138.12M \n", - "3 2016 108 7.2 270.32 \n", - "4 2016 123 6.2 325.02 \n", - "... ... ... ... ... 
\n", - "1063 2017 136 7.6 389.81 \n", - "1064 2017 113 7.6 107.83 \n", - "1065 2017 134 7.6 18.34 \n", - "1066 2018 118 7.6 608.58 \n", - "1067 2018 136 7.6 215.29 \n", - "\n", - "[1068 rows x 8 columns]" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#use slicing to retain all the rows and columns excepting for column with integer position 0\n", - "movies = movies.iloc[:, 1:] \n", - "movies" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "movies.to_csv(\"better_movies.csv\", index = False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 3: Which movie has highest rating?" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>54</th>\n", - " <td>The Dark Knight</td>\n", - " <td>Action,Crime,Drama</td>\n", - " <td>Christopher Nolan</td>\n", - " <td>Christian Bale, Heath Ledger, Aaron Eckhart,Mi...</td>\n", - " <td>2008</td>\n", - " <td>152</td>\n", - " <td>9.0</td>\n", - " <td>533.32</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Title Genre Director \\\n", - "54 The Dark Knight Action,Crime,Drama Christopher Nolan \n", - 
"\n", - " Cast Year Runtime Rating \\\n", - "54 Christian Bale, Heath Ledger, Aaron Eckhart,Mi... 2008 152 9.0 \n", - "\n", - " Revenue \n", - "54 533.32 " - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "max_rating = movies[\"Rating\"].max()\n", - "movies[movies[\"Rating\"] == max_rating]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 4: Which movies were released in 2020?" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>998</th>\n", - " <td>Hamilton</td>\n", - " <td>Biography, Drama, History</td>\n", - " <td>Thomas Kail</td>\n", - " <td>Lin-Manuel Miranda, Phillipa Soo, Leslie Odom ...</td>\n", - " <td>2020</td>\n", - " <td>160</td>\n", - " <td>8.6</td>\n", - " <td>612.82</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1000</th>\n", - " <td>Soorarai Pottru</td>\n", - " <td>Drama</td>\n", - " <td>Sudha Kongara</td>\n", - " <td>Suriya, Madhavan, Paresh Rawal, Aparna Balamurali</td>\n", - " <td>2020</td>\n", - " <td>153</td>\n", - " <td>8.6</td>\n", - " <td>5.93</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1022</th>\n", - " <td>Soul</td>\n", - " <td>Animation, Adventure, Comedy</td>\n", - " <td>Pete Docter</td>\n", - " <td>Kemp 
Powers, Jamie Foxx, Tina Fey, Graham Norton</td>\n", - " <td>2020</td>\n", - " <td>100</td>\n", - " <td>8.1</td>\n", - " <td>121.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1031</th>\n", - " <td>Dil Bechara</td>\n", - " <td>Comedy, Drama, Romance</td>\n", - " <td>Mukesh Chhabra</td>\n", - " <td>Sushant Singh Rajput, Sanjana Sanghi, Sahil Va...</td>\n", - " <td>2020</td>\n", - " <td>101</td>\n", - " <td>7.9</td>\n", - " <td>263.61</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1047</th>\n", - " <td>The Trial of the Chicago 7</td>\n", - " <td>Drama, History, Thriller</td>\n", - " <td>Aaron Sorkin</td>\n", - " <td>Eddie Redmayne, Alex Sharp, Sacha Baron Cohen,...</td>\n", - " <td>2020</td>\n", - " <td>129</td>\n", - " <td>7.8</td>\n", - " <td>0.12</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1048</th>\n", - " <td>Druk</td>\n", - " <td>Comedy, Drama</td>\n", - " <td>Thomas Vinterberg</td>\n", - " <td>Mads Mikkelsen, Thomas Bo Larsen, Magnus Milla...</td>\n", - " <td>2020</td>\n", - " <td>117</td>\n", - " <td>7.8</td>\n", - " <td>21.71</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Title Genre \\\n", - "998 Hamilton Biography, Drama, History \n", - "1000 Soorarai Pottru Drama \n", - "1022 Soul Animation, Adventure, Comedy \n", - "1031 Dil Bechara Comedy, Drama, Romance \n", - "1047 The Trial of the Chicago 7 Drama, History, Thriller \n", - "1048 Druk Comedy, Drama \n", - "\n", - " Director Cast \\\n", - "998 Thomas Kail Lin-Manuel Miranda, Phillipa Soo, Leslie Odom ... \n", - "1000 Sudha Kongara Suriya, Madhavan, Paresh Rawal, Aparna Balamurali \n", - "1022 Pete Docter Kemp Powers, Jamie Foxx, Tina Fey, Graham Norton \n", - "1031 Mukesh Chhabra Sushant Singh Rajput, Sanjana Sanghi, Sahil Va... \n", - "1047 Aaron Sorkin Eddie Redmayne, Alex Sharp, Sacha Baron Cohen,... \n", - "1048 Thomas Vinterberg Mads Mikkelsen, Thomas Bo Larsen, Magnus Milla... 
\n", - "\n", - " Year Runtime Rating Revenue \n", - "998 2020 160 8.6 612.82 \n", - "1000 2020 153 8.6 5.93 \n", - "1022 2020 100 8.1 121.0 \n", - "1031 2020 101 7.9 263.61 \n", - "1047 2020 129 7.8 0.12 \n", - "1048 2020 117 7.8 21.71 " - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "movies[movies[\"Year\"] == 2020]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 5a: What does this function do?" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def format_revenue(revenue):\n", - " if type(revenue) == float: # need this in here if we run code multiple times\n", - " return revenue\n", - " elif revenue[-1] == 'M': # some have an \"M\" at the end\n", - " return float(revenue[:-1]) * 1e6\n", - " else: # otherwise, assume millions.\n", - " return float(revenue) * 1e6" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 5b: Using the above function, create a new column called \"Revenue in dollars\" by applying appropriate conversion to Revenue column." 
- ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " <th>Revenue in dollars</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>Guardians of the Galaxy</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...</td>\n", - " <td>2014</td>\n", - " <td>121</td>\n", - " <td>8.1</td>\n", - " <td>333.13</td>\n", - " <td>333130000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>Prometheus</td>\n", - " <td>Adventure,Mystery,Sci-Fi</td>\n", - " <td>Ridley Scott</td>\n", - " <td>Noomi Rapace, Logan Marshall-Green, Michael ...</td>\n", - " <td>2012</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>126.46M</td>\n", - " <td>126460000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>Split</td>\n", - " <td>Horror,Thriller</td>\n", - " <td>M. 
Night Shyamalan</td>\n", - " <td>James McAvoy, Anya Taylor-Joy, Haley Lu Richar...</td>\n", - " <td>2016</td>\n", - " <td>117</td>\n", - " <td>7.3</td>\n", - " <td>138.12M</td>\n", - " <td>138120000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>Sing</td>\n", - " <td>Animation,Comedy,Family</td>\n", - " <td>Christophe Lourdelet</td>\n", - " <td>Matthew McConaughey,Reese Witherspoon, Seth Ma...</td>\n", - " <td>2016</td>\n", - " <td>108</td>\n", - " <td>7.2</td>\n", - " <td>270.32</td>\n", - " <td>270320000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>Suicide Squad</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>David Ayer</td>\n", - " <td>Will Smith, Jared Leto, Margot Robbie, Viola D...</td>\n", - " <td>2016</td>\n", - " <td>123</td>\n", - " <td>6.2</td>\n", - " <td>325.02</td>\n", - " <td>325020000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>...</th>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " <td>...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1063</th>\n", - " <td>Guardians of the Galaxy Vol. 
2</td>\n", - " <td>Action, Adventure, Comedy</td>\n", - " <td>James Gunn</td>\n", - " <td>Chris Pratt, Zoe Saldana, Dave Bautista, Vin D...</td>\n", - " <td>2017</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>389.81</td>\n", - " <td>389810000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1064</th>\n", - " <td>Baby Driver</td>\n", - " <td>Action, Crime, Drama</td>\n", - " <td>Edgar Wright</td>\n", - " <td>Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon...</td>\n", - " <td>2017</td>\n", - " <td>113</td>\n", - " <td>7.6</td>\n", - " <td>107.83</td>\n", - " <td>107830000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1065</th>\n", - " <td>Only the Brave</td>\n", - " <td>Action, Biography, Drama</td>\n", - " <td>Joseph Kosinski</td>\n", - " <td>Josh Brolin, Miles Teller, Jeff Bridges, Jenni...</td>\n", - " <td>2017</td>\n", - " <td>134</td>\n", - " <td>7.6</td>\n", - " <td>18.34</td>\n", - " <td>18340000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " <td>608580000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1067</th>\n", - " <td>A Star Is Born</td>\n", - " <td>Drama, Music, Romance</td>\n", - " <td>Bradley Cooper</td>\n", - " <td>Lady Gaga, Bradley Cooper, Sam Elliott, Greg G...</td>\n", - " <td>2018</td>\n", - " <td>136</td>\n", - " <td>7.6</td>\n", - " <td>215.29</td>\n", - " <td>215290000.0</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "<p>1068 rows × 9 columns</p>\n", - "</div>" - ], - "text/plain": [ - " Title Genre \\\n", - "0 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", - "1 Prometheus Adventure,Mystery,Sci-Fi \n", - "2 Split Horror,Thriller \n", - "3 Sing Animation,Comedy,Family \n", - "4 Suicide Squad Action,Adventure,Fantasy \n", - "... ... ... 
\n", - "1063 Guardians of the Galaxy Vol. 2 Action, Adventure, Comedy \n", - "1064 Baby Driver Action, Crime, Drama \n", - "1065 Only the Brave Action, Biography, Drama \n", - "1066 Incredibles 2 Animation, Action, Adventure \n", - "1067 A Star Is Born Drama, Music, Romance \n", - "\n", - " Director Cast \\\n", - "0 James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... \n", - "1 Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael ... \n", - "2 M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... \n", - "3 Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... \n", - "4 David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... \n", - "... ... ... \n", - "1063 James Gunn Chris Pratt, Zoe Saldana, Dave Bautista, Vin D... \n", - "1064 Edgar Wright Ansel Elgort, Jon Bernthal, Jon Hamm, Eiza Gon... \n", - "1065 Joseph Kosinski Josh Brolin, Miles Teller, Jeff Bridges, Jenni... \n", - "1066 Brad Bird Craig T. Nelson, Holly Hunter, Sarah Vowell, H... \n", - "1067 Bradley Cooper Lady Gaga, Bradley Cooper, Sam Elliott, Greg G... \n", - "\n", - " Year Runtime Rating Revenue Revenue in dollars \n", - "0 2014 121 8.1 333.13 333130000.0 \n", - "1 2012 124 7.0 126.46M 126460000.0 \n", - "2 2016 117 7.3 138.12M 138120000.0 \n", - "3 2016 108 7.2 270.32 270320000.0 \n", - "4 2016 123 6.2 325.02 325020000.0 \n", - "... ... ... ... ... ... \n", - "1063 2017 136 7.6 389.81 389810000.0 \n", - "1064 2017 113 7.6 107.83 107830000.0 \n", - "1065 2017 134 7.6 18.34 18340000.0 \n", - "1066 2018 118 7.6 608.58 608580000.0 \n", - "1067 2018 136 7.6 215.29 215290000.0 \n", - "\n", - "[1068 rows x 9 columns]" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "movies[\"Revenue in dollars\"] = movies[\"Revenue\"].apply(format_revenue)\n", - "movies" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 6: What are the top 10 highest-revenue movies?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " <th>Revenue in dollars</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>50</th>\n", - " <td>Star Wars: Episode VII - The Force Awakens</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>J.J. Abrams</td>\n", - " <td>Daisy Ridley, John Boyega, Oscar Isaac, Domhna...</td>\n", - " <td>2015</td>\n", - " <td>136</td>\n", - " <td>8.1</td>\n", - " <td>936.63</td>\n", - " <td>936630000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1006</th>\n", - " <td>Avengers: Endgame</td>\n", - " <td>Action, Adventure, Drama</td>\n", - " <td>Anthony Russo</td>\n", - " <td>Joe Russo, Robert Downey Jr., Chris Evans, Mar...</td>\n", - " <td>2019</td>\n", - " <td>181</td>\n", - " <td>8.4</td>\n", - " <td>858.37</td>\n", - " <td>858370000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>87</th>\n", - " <td>Avatar</td>\n", - " <td>Action,Adventure,Fantasy</td>\n", - " <td>James Cameron</td>\n", - " <td>Sam Worthington, Zoe Saldana, Sigourney Weaver...</td>\n", - " <td>2009</td>\n", - " <td>162</td>\n", - " <td>7.8</td>\n", - " <td>760.51</td>\n", - " <td>760510000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1007</th>\n", - " <td>Avengers: Infinity War</td>\n", - " <td>Action, Adventure, Sci-Fi</td>\n", - " <td>Anthony Russo</td>\n", - 
" <td>Joe Russo, Robert Downey Jr., Chris Hemsworth,...</td>\n", - " <td>2018</td>\n", - " <td>149</td>\n", - " <td>8.4</td>\n", - " <td>678.82</td>\n", - " <td>678820000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>85</th>\n", - " <td>Jurassic World</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>Colin Trevorrow</td>\n", - " <td>Chris Pratt, Bryce Dallas Howard, Ty Simpkins,...</td>\n", - " <td>2015</td>\n", - " <td>124</td>\n", - " <td>7.0</td>\n", - " <td>652.18</td>\n", - " <td>652180000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>76</th>\n", - " <td>The Avengers</td>\n", - " <td>Action,Sci-Fi</td>\n", - " <td>Joss Whedon</td>\n", - " <td>Robert Downey Jr., Chris Evans, Scarlett Johan...</td>\n", - " <td>2012</td>\n", - " <td>143</td>\n", - " <td>8.1</td>\n", - " <td>623.28</td>\n", - " <td>623280000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>998</th>\n", - " <td>Hamilton</td>\n", - " <td>Biography, Drama, History</td>\n", - " <td>Thomas Kail</td>\n", - " <td>Lin-Manuel Miranda, Phillipa Soo, Leslie Odom ...</td>\n", - " <td>2020</td>\n", - " <td>160</td>\n", - " <td>8.6</td>\n", - " <td>612.82</td>\n", - " <td>612820000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1066</th>\n", - " <td>Incredibles 2</td>\n", - " <td>Animation, Action, Adventure</td>\n", - " <td>Brad Bird</td>\n", - " <td>Craig T. 
Nelson, Holly Hunter, Sarah Vowell, H...</td>\n", - " <td>2018</td>\n", - " <td>118</td>\n", - " <td>7.6</td>\n", - " <td>608.58</td>\n", - " <td>608580000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>54</th>\n", - " <td>The Dark Knight</td>\n", - " <td>Action,Crime,Drama</td>\n", - " <td>Christopher Nolan</td>\n", - " <td>Christian Bale, Heath Ledger, Aaron Eckhart,Mi...</td>\n", - " <td>2008</td>\n", - " <td>152</td>\n", - " <td>9.0</td>\n", - " <td>533.32</td>\n", - " <td>533320000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>12</th>\n", - " <td>Rogue One</td>\n", - " <td>Action,Adventure,Sci-Fi</td>\n", - " <td>Gareth Edwards</td>\n", - " <td>Felicity Jones, Diego Luna, Alan Tudyk, Donnie...</td>\n", - " <td>2016</td>\n", - " <td>133</td>\n", - " <td>7.9</td>\n", - " <td>532.17</td>\n", - " <td>532170000.0</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Title \\\n", - "50 Star Wars: Episode VII - The Force Awakens \n", - "1006 Avengers: Endgame \n", - "87 Avatar \n", - "1007 Avengers: Infinity War \n", - "85 Jurassic World \n", - "76 The Avengers \n", - "998 Hamilton \n", - "1066 Incredibles 2 \n", - "54 The Dark Knight \n", - "12 Rogue One \n", - "\n", - " Genre Director \\\n", - "50 Action,Adventure,Fantasy J.J. Abrams \n", - "1006 Action, Adventure, Drama Anthony Russo \n", - "87 Action,Adventure,Fantasy James Cameron \n", - "1007 Action, Adventure, Sci-Fi Anthony Russo \n", - "85 Action,Adventure,Sci-Fi Colin Trevorrow \n", - "76 Action,Sci-Fi Joss Whedon \n", - "998 Biography, Drama, History Thomas Kail \n", - "1066 Animation, Action, Adventure Brad Bird \n", - "54 Action,Crime,Drama Christopher Nolan \n", - "12 Action,Adventure,Sci-Fi Gareth Edwards \n", - "\n", - " Cast Year Runtime \\\n", - "50 Daisy Ridley, John Boyega, Oscar Isaac, Domhna... 2015 136 \n", - "1006 Joe Russo, Robert Downey Jr., Chris Evans, Mar... 2019 181 \n", - "87 Sam Worthington, Zoe Saldana, Sigourney Weaver... 
2009 162 \n", - "1007 Joe Russo, Robert Downey Jr., Chris Hemsworth,... 2018 149 \n", - "85 Chris Pratt, Bryce Dallas Howard, Ty Simpkins,... 2015 124 \n", - "76 Robert Downey Jr., Chris Evans, Scarlett Johan... 2012 143 \n", - "998 Lin-Manuel Miranda, Phillipa Soo, Leslie Odom ... 2020 160 \n", - "1066 Craig T. Nelson, Holly Hunter, Sarah Vowell, H... 2018 118 \n", - "54 Christian Bale, Heath Ledger, Aaron Eckhart,Mi... 2008 152 \n", - "12 Felicity Jones, Diego Luna, Alan Tudyk, Donnie... 2016 133 \n", - "\n", - " Rating Revenue Revenue in dollars \n", - "50 8.1 936.63 936630000.0 \n", - "1006 8.4 858.37 858370000.0 \n", - "87 7.8 760.51 760510000.0 \n", - "1007 8.4 678.82 678820000.0 \n", - "85 7.0 652.18 652180000.0 \n", - "76 8.1 623.28 623280000.0 \n", - "998 8.6 612.82 612820000.0 \n", - "1066 7.6 608.58 608580000.0 \n", - "54 9.0 533.32 533320000.0 \n", - "12 7.9 532.17 532170000.0 " - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "movies.sort_values(by = \"Revenue in dollars\", ascending = False).head(10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 7: Which shortest movies (below average runtime) have highest rating?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Title</th>\n", - " <th>Genre</th>\n", - " <th>Director</th>\n", - " <th>Cast</th>\n", - " <th>Year</th>\n", - " <th>Runtime</th>\n", - " <th>Rating</th>\n", - " <th>Revenue</th>\n", - " <th>Revenue in dollars</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>96</th>\n", - " <td>Kimi no na wa</td>\n", - " <td>Animation,Drama,Fantasy</td>\n", - " <td>Makoto Shinkai</td>\n", - " <td>Ryûnosuke Kamiki, Mone Kamishiraishi, Ryô Nari...</td>\n", - " <td>2016</td>\n", - " <td>106</td>\n", - " <td>8.6</td>\n", - " <td>4.68</td>\n", - " <td>4680000.0</td>\n", - " </tr>\n", - " <tr>\n", - " <th>249</th>\n", - " <td>The Intouchables</td>\n", - " <td>Biography,Comedy,Drama</td>\n", - " <td>Olivier Nakache</td>\n", - " <td>François Cluzet, Omar Sy, Anne Le Ny, Audrey F...</td>\n", - " <td>2011</td>\n", - " <td>112</td>\n", - " <td>8.6</td>\n", - " <td>13.18</td>\n", - " <td>13180000.0</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " Title Genre Director \\\n", - "96 Kimi no na wa Animation,Drama,Fantasy Makoto Shinkai \n", - "249 The Intouchables Biography,Comedy,Drama Olivier Nakache \n", - "\n", - " Cast Year Runtime Rating \\\n", - "96 Ryûnosuke Kamiki, Mone Kamishiraishi, Ryô Nari... 2016 106 8.6 \n", - "249 François Cluzet, Omar Sy, Anne Le Ny, Audrey F... 
2011 112 8.6 \n", - "\n", - " Revenue Revenue in dollars \n", - "96 4.68 4680000.0 \n", - "249 13.18 13180000.0 " - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "short_movies = movies[movies[\"Runtime\"] < movies[\"Runtime\"].mean()]\n", - "short_movies[short_movies[\"Rating\"] == short_movies[\"Rating\"].max()]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Learning Objectives\n", - "\n", - "- Make a request for data using requests.get(URL)\n", - "- Check the status of a request/response\n", - "- Extract the text of a response\n", - "- Create a json file from a response\n", - "- State and practice good etiquette when getting data" - ] - }, - { - "attachments": { - "Client_server.png": { - "image/png": "" - } - }, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Core Ideas:\n", - " - Network structure\n", - " - Client / server\n", - " - Request / response\n", - " \n", - " \n", - " \n", - " - HTTP protocol\n", - " - URL\n", - " - Headers\n", - " - Status Codes\n", - " - The requests module" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## HTTP Status Codes you need to know\n", - "- 200: success\n", - "- 404: not found\n", - "\n", - "Here is a list of all status codes, you do NOT need to memorize it: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## requests.get : Simple string example\n", - "- URL: https://www.msyamkumar.com/hello.txt" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "200\n", - "Hello CS220 / CS319 students! Welcome to my website. 
Hope you are staying safe and healthy!\n", - "\n" - ] - } - ], - "source": [ - "url = \"https://www.msyamkumar.com/hello.txt\"\n", - "r = requests.get(url) # r is the response\n", - "print(r.status_code)\n", - "print(r.text)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "404\n", - "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n", - "<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>meena/hello.txttttt</Key><RequestId>9PAFR0FANW1CRPTP</RequestId><HostId>Y+VL63r3qTktX1ZLIpaUvaSXOhstWA4yhSSA6RKCRumeA5+WK+ht7TbROpUZtVjmpGT/QaJcYA0=</HostId></Error>\n" - ] - } - ], - "source": [ - "# Q: What if the web site does not exist?\n", - "typo_url = \"https://www.msyamkumar.com/hello.txttttt\"\n", - "r = requests.get(typo_url)\n", - "print(r.status_code)\n", - "print(r.text)\n", - "\n", - "# A: We get a 404 (client error)" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "ename": "AssertionError", - "evalue": "", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m/var/folders/k6/kcy8b4f57hx9f1wh4sbs8mn40000gn/T/ipykernel_11873/2682133174.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mtypo_url\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"https://www.msyamkumar.com/hello.txttttt\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrequests\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtypo_url\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0;32massert\u001b[0m 
\u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstatus_code\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m200\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstatus_code\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mAssertionError\u001b[0m: " - ] - } - ], - "source": [ - "# We can check for a status_code error by using an assert\n", - "typo_url = \"https://www.msyamkumar.com/hello.txttttt\"\n", - "r = requests.get(typo_url)\n", - "assert r.status_code == 200\n", - "print(r.status_code)\n", - "print(r.text)" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "ename": "HTTPError", - "evalue": "404 Client Error: Not Found for url: https://www.msyamkumar.com/hello.txttttt", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mHTTPError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m/var/folders/k6/kcy8b4f57hx9f1wh4sbs8mn40000gn/T/ipykernel_11873/4051826470.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Instead of using an assert, we often use raise_for_status()\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrequests\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtypo_url\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m 
\u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mraise_for_status\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m#similar to asserting r.status_code == 200\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/requests/models.py\u001b[0m in \u001b[0;36mraise_for_status\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 951\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 952\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mhttp_error_msg\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 953\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mHTTPError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhttp_error_msg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 954\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 955\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mHTTPError\u001b[0m: 404 Client Error: Not Found for url: https://www.msyamkumar.com/hello.txttttt" - ] - } - ], - "source": [ - "# Instead of using an assert, we often use raise_for_status()\n", - "r = requests.get(typo_url)\n", - "r.raise_for_status() #similar to asserting r.status_code == 200\n", - "r.text\n", - "\n", - "# Note the error you get.... 
We will use this in the next cell" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "ename": "NameError", - "evalue": "name 'HTTPError' is not defined", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mHTTPError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m/var/folders/k6/kcy8b4f57hx9f1wh4sbs8mn40000gn/T/ipykernel_11873/2028031330.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrequests\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtypo_url\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mraise_for_status\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m#similar to asserting r.status_code == 200\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/requests/models.py\u001b[0m in \u001b[0;36mraise_for_status\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 952\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mhttp_error_msg\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 953\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mHTTPError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhttp_error_msg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 954\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mHTTPError\u001b[0m: 404 Client Error: Not Found for 
url: https://www.msyamkumar.com/hello.txttttt", - "\nDuring handling of the above exception, another exception occurred:\n", - "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m/var/folders/k6/kcy8b4f57hx9f1wh4sbs8mn40000gn/T/ipykernel_11873/2028031330.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mraise_for_status\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m#similar to asserting r.status_code == 200\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;32mexcept\u001b[0m \u001b[0mHTTPError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# What's still wrong here?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"oops!!\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mNameError\u001b[0m: name 'HTTPError' is not defined" - ] - } - ], - "source": [ - "# Let's try to catch that error\n", - "\n", - "try:\n", - " r = requests.get(typo_url)\n", - " r.raise_for_status() #similar to asserting r.status_code == 200\n", - " r.text\n", - "except HTTPError as e: # What's still wrong here?\n", - " print(\"oops!!\", e)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# we often need to prepend the names of exceptions with the name of the module\n", - "# fix the error from above\n", - "\n", - "try:\n", - " r = requests.get(typo_url)\n", - " r.raise_for_status() #similar to asserting r.status_code == 200\n", - " r.text\n", - "except requests.HTTPError as e: 
#correct way to catch the error.\n", - " print(\"oops!!\", e)\n", - " \n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## requests.get : JSON file example\n", - "- URL: https://www.msyamkumar.com/scores.json\n", - "- `json.load` (FILE_OBJECT)\n", - "- `json.loads` (STRING)" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\n", - " \"alice\": 100,\n", - " \"bob\": 200,\n", - " \"cindy\": 300\n", - "}\n", - "\n", - "<class 'dict'> {'alice': 100, 'bob': 200, 'cindy': 300}\n" - ] - } - ], - "source": [ - "# GETting a JSON file, the long way\n", - "url = \"https://www.msyamkumar.com/scores.json\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "urltext = r.text\n", - "print(urltext)\n", - "d = json.loads(urltext)\n", - "print(type(d), d)" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "<class 'dict'> {'alice': 100, 'bob': 200, 'cindy': 300}\n" - ] - } - ], - "source": [ - "# GETting a JSON file, the shortcut way\n", - "url = \"https://www.msyamkumar.com/scores.json\"\n", - "#Shortcut to bypass using json.loads()\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "d2 = r.json()\n", - "print(type(d2), d2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Good GET Etiquette\n", - "\n", - "Don't make a lot of requests to the same server all at once.\n", - " - Requests use up the server's time\n", - " - Major websites will often ban users who make too many requests\n", - " - You can break a server....similar to DDoS attacks (DON'T DO THIS)\n", - " \n", - "In CS220 we will usually give you a link to a copied file to avoid overloading the site.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## DEMO: Course Enrollment" - ] - }, - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "Explore the API!\n", - "\n", - "https://coletnelson.us/cs220-api/classes\n", - "\n", - "https://coletnelson.us/cs220-api/classes_as_txt\n", - "\n", - "https://coletnelson.us/cs220-api/classes/MATH_221\n", - "\n", - "https://coletnelson.us/cs220-api/classes/COMPSCI_200\n", - "\n", - "... etc\n", - "\n", - "https://coletnelson.us/cs220-api/all_data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the list of classes." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### When the data is `json`" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "<class 'list'>\n", - "['PSYCH_202', 'COMPSCI_537', 'COMPSCI_300', 'CHEM_104', 'COMPSCI_200', 'MATH_114', 'PSYCH_456', 'COMPSCI_252', 'COMPSCI_400', 'MATH_221', 'BIOLOGY_101', 'COMPSCI_354', 'CHEM_103', 'COMPSCI_639', 'PSYCH_401', 'COMPSCI_240', 'STATS_302']\n" - ] - } - ], - "source": [ - "url = \"https://coletnelson.us/cs220-api/classes\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "classes_list = r.json()\n", - "print(type(classes_list))\n", - "print(classes_list)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### When the data is `text`" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "<class 'str'>\n", - "PSYCH_202\n", - "COMPSCI_537\n", - "COMPSCI_300\n", - "CHEM_104\n", - "COMPSCI_200\n", - "MATH_114\n", - "PSYCH_456\n", - "COMPSCI_252\n", - "COMPSCI_400\n", - "MATH_221\n", - "BIOLOGY_101\n", - "COMPSCI_354\n", - "CHEM_103\n", - "COMPSCI_639\n", - "PSYCH_401\n", - "COMPSCI_240\n", - "STATS_302\n" - ] - } - ], - "source": [ - "url = \"https://coletnelson.us/cs220-api/classes_as_txt\"\n", - "r = requests.get(url)\n", - 
"r.raise_for_status()\n", - "classes_txt = r.text\n", - "print(type(classes_txt))\n", - "print(classes_txt)" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['PSYCH_202',\n", - " 'COMPSCI_537',\n", - " 'COMPSCI_300',\n", - " 'CHEM_104',\n", - " 'COMPSCI_200',\n", - " 'MATH_114',\n", - " 'PSYCH_456',\n", - " 'COMPSCI_252',\n", - " 'COMPSCI_400',\n", - " 'MATH_221',\n", - " 'BIOLOGY_101',\n", - " 'COMPSCI_354',\n", - " 'CHEM_103',\n", - " 'COMPSCI_639',\n", - " 'PSYCH_401',\n", - " 'COMPSCI_240',\n", - " 'STATS_302']" - ] - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "classes_txt_as_list = classes_txt.split('\\n')\n", - "classes_txt_as_list" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get data for a specific class" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "<class 'dict'>\n", - "{'credits': 3, 'description': 'Learn the process of incrementally developing small (200-500 lines) programs along with the fundamental Computer Science topics. These topics include: problem abstraction and decomposition, the edit-compile-run cycle, using variables of primitive and more complex data types, conditional and loop-based flow control, basic testing and debugging techniques, how to define and call functions (methods), and IO processing techniques. Also teaches and reinforces good programming practices including the use of a consistent style, and meaningful documentation. 
Intended for students who have no prior programming experience.', 'keywords': ['computer', 'science', 'programming', 'java'], 'name': 'Programming 1', 'number': 'COMPSCI_200', 'requisites': [], 'sections': [{'instructor': 'Jim Williams', 'location': '132 Noland Hall', 'subsections': [{'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '9:30am - 10:45am'}, 'number': 'LAB_311'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '11:00am - 12:15pm'}, 'number': 'LAB_312'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '2:30pm - 3:45pm'}, 'number': 'LAB_314'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '4:00pm - 5:15pm'}, 'number': 'LAB_315'}], 'time': {'thursday': '8:00am - 9:15am', 'tuesday': '8:00am - 9:15am'}, 'number': 'LEC_001'}, {'instructor': 'Jim Williams', 'location': '132 Noland Hall', 'subsections': [{'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '9:30am - 10:45am'}, 'number': 'LAB_321'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '1:00pm - 2:15pm'}, 'number': 'LAB_323'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '2:30pm - 3:45pm'}, 'number': 'LAB_324'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '4:00pm - 5:15pm'}, 'number': 'LAB_325'}], 'time': {'thursday': '11:00am - 12:15pm', 'tuesday': '11:00am - 12:15pm'}, 'number': 'LEC_002'}, {'instructor': 'Marc Renault', 'location': '113 Brogden Psychology Building', 'subsections': [{'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '9:30am - 10:45am'}, 'number': 'LAB_331'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '11:00am - 12:15pm'}, 'number': 'LAB_332'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '1:00pm - 2:15pm'}, 'number': 'LAB_333'}, {'location': '1350 Computer Sciences and 
Statistics', 'time': {'tuesday': '2:30pm - 3:45pm'}, 'number': 'LAB_334'}], 'time': {'friday': '1:20pm - 2:10pm', 'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'LEC_003'}, {'instructor': 'Marc Renault', 'location': '113 Brogden Psychology Building', 'subsections': [{'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '9:30am - 10:45am'}, 'number': 'LAB_341'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '11:00am - 12:15pm'}, 'number': 'LAB_342'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '1:00pm - 2:15pm'}, 'number': 'LAB_343'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '2:30pm - 3:45pm'}, 'number': 'LAB_344'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '4:00pm - 5:15pm'}, 'number': 'LAB_345'}], 'time': {'friday': '3:30pm - 4:20pm', 'monday': '3:30pm - 4:20pm', 'wednesday': '3:30pm - 4:20pm'}, 'number': 'LEC_004'}], 'subject': 'Computer Science'}\n" - ] - } - ], - "source": [ - "url = \"https://coletnelson.us/cs220-api/classes/COMPSCI_200\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "cs200_data = r.json()\n", - "print(type(cs200_data))\n", - "print(cs200_data) # Too much data? 
Try print(cs200_data.keys())" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "dict_keys(['credits', 'description', 'keywords', 'name', 'number', 'requisites', 'sections', 'subject'])" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cs200_data.keys()" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "3" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Get the number of credits the course is worth\n", - "cs200_data['credits']" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['computer', 'science', 'programming', 'java']" - ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Get the list of keywords for the course\n", - "cs200_data['keywords']" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'Programming 1'" - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Get the official course name\n", - "cs200_data['name']" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "4" - ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Get the number of sections offered.\n", - "len(cs200_data['sections'])" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[{'credits': 3, 'description': 'Behavior, including its development, motivation, frustrations, emotion, 
intelligence, learning, forgetting, personality, language, thinking, and social behavior.', 'keywords': ['psychology', 'behavior', 'emotion', 'intelligence', 'brain'], 'name': 'Introduction to Psychology', 'number': 'PSYCH_202', 'requisites': [], 'sections': [{'instructor': 'Jeff Henriques', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'thursday': '9:30am - 10:45am', 'tuesday': '9:30am - 10:45am'}, 'number': 'LEC_001'}, {'instructor': 'Jeff Henriques', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'thursday': '11:00am - 12:15pm', 'tuesday': '11:00am - 12:15pm'}, 'number': 'LEC_002'}, {'instructor': 'C. Shawn Green', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'monday': '8:00am - 9:15am', 'wednesday': '8:00am - 9:15am'}, 'number': 'LEC_003'}, {'instructor': 'Patti Coffey', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_004'}, {'instructor': 'Sarah Gavac', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_005'}, {'instructor': 'Patti Coffey', 'location': '101 Brogden Psychology Building', 'subsections': [], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_006'}, {'instructor': 'Baoyu Wang', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'monday': '4:30pm - 5:15pm', 'wednesday': '4:30pm - 5:15pm'}, 'number': 'LEC_009'}], 'subject': 'Psychology'}, {'credits': 4, 'description': 'Input-output hardware, interrupt handling, properties of magnetic tapes, discs and drums, associative memories and virtual address translation techniques. 
Batch processing, time sharing and real-time systems, scheduling resource allocation, modular software systems, performance measurement and system evaluation.', 'keywords': ['computer', 'science', 'operating', 'system', 'systems'], 'name': 'Introduction to Operating Systems', 'number': 'COMPSCI_537', 'requisites': [['COMPSCI_354', 'COMPSCI_400']], 'sections': [{'instructor': 'Andrea Arpaci-Dusseau', 'location': '1125 DeLuca Biochemistry Building', 'subsections': [{'location': '2317 Engineering Hall', 'time': {'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_301'}, {'location': '1325 Computer Sciences and Statistics', 'time': {'wednesday': '12:05pm - 12:55pm'}, 'number': 'DIS_302'}, {'location': '1325 Computer Sciences and Statistics', 'time': {'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_303'}, {'location': '2255 Engineering Hall', 'time': {'wednesday': '3:30pm - 4:20pm'}, 'number': 'DIS_304'}, {'location': '1325 Computer Sciences and Statistics', 'time': {'wednesday': '4:15pm - 5:25pm'}, 'number': 'DIS_305'}], 'time': {'thursday': '11:00am - 12:15pm', 'tuesday': '11:00am - 12:15pm'}, 'number': 'LEC_001'}], 'subject': 'Computer Science'}, {'credits': 3, 'description': 'Introduces students to Object-Oriented Programming using classes and objects to solve more complex problems. Introduces array-based and linked data structures: including lists, stacks, and queues. Programming assignments require writing and developing multi-class (file) programs using interfaces, generics, and exception handling to solve challenging real world problems. Topics reviewed include reading/writing data and objects from/to files and exception handling, and command line arguments. Topics introduced: object-oriented design; class vs. 
object; create and define interfaces and iterators; searching and sorting; abstract data types (List,Stack,Queue,PriorityQueue(Heap),Binary Search Tree); generic interfaces (parametric polymorphism); how to design and write test methods and classes; array based vs. linked node implementations; introduction to complexity analysis; recursion.', 'keywords': ['computer', 'science', 'programming', 'java'], 'name': 'Programming 2', 'number': 'COMPSCI_300', 'requisites': [['COMPSCI_200']], 'sections': [{'instructor': 'Gary Dahl', 'location': 'AB20 Weeks Hall for Geological Sciences', 'subsections': [], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_001'}, {'instructor': 'Gary Dahl', 'location': '132 Noland Hall', 'subsections': [], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_002'}, {'instructor': 'Mouna Ayari Ben Hadj Kacem', 'location': 'AB20 Weeks Hall for Geological Sciences', 'subsections': [], 'time': {'friday': '11:00am - 11:50pm', 'monday': '11:00am - 11:50pm', 'wednesday': '11:00am - 11:50pm'}, 'number': 'LEC_003'}, {'instructor': 'Mouna Ayari Ben Hadj Kacem', 'location': '1310 Sterling Hall', 'subsections': [], 'time': {'friday': '2:25pm - 3:15pm', 'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'LEC_004'}], 'subject': 'Computer Science'}, {'credits': 5, 'description': 'Principles and application of chemical equilibrium, coordination chemistry, oxidation-reduction and electrochemistry, kinetics, nuclear chemistry, introduction to organic chemistry. 
Lecture, lab, and discussion.', 'keywords': ['chemistry'], 'name': 'General Chemistry II', 'number': 'CHEM_104', 'requisites': [['MATH_114'], ['CHEM_103']], 'sections': [{'instructor': 'Linda Zelewski', 'location': 'B10 Ingraham Hall', 'subsections': [{'location': '123 Van Hise Hall', 'time': {'monday': '2:25pm - 5:25pm', 'thursday': '11:00am - 11:50am', 'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_401'}, {'location': '123 Van Hise Hall', 'time': {'monday': '2:25pm - 5:25pm', 'thursday': '12:05pm - 12:55pm', 'tuesday': '12:05pm - 12:55pm'}, 'number': 'DIS_402'}, {'location': 'B387 Chemistry Building', 'time': {'monday': '2:25pm - 5:25pm', 'thursday': '11:00am - 11:50am', 'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_403'}, {'location': 'B387 Chemistry Building', 'time': {'monday': '2:25pm - 5:25pm', 'thursday': '12:05pm - 12:55pm', 'tuesday': '12:05pm - 12:55pm'}, 'number': 'DIS_404'}], 'time': {'thursday': '9:30am - 10:45am', 'tuesday': '9:30am - 10:45am'}, 'number': 'LEC_001'}, {'instructor': 'Lea Gustin', 'location': '204 Educational Sciences', 'subsections': [{'location': '2377 Chemistry Building', 'time': {'monday': '9:55am - 10:45am', 'tuesday': '5:40pm - 8:40pm', 'wednesday': '9:55am - 10:45am'}, 'number': 'DIS_421'}, {'location': '2377 Chemistry Building', 'time': {'monday': '11:00am - 11:50am', 'tuesday': '5:40pm - 8:40pm', 'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_422'}, {'location': '2381 Chemistry Building', 'time': {'monday': '11:00am - 11:50am', 'tuesday': '5:40pm - 8:40pm', 'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_423'}, {'location': '2377 Chemistry Building', 'time': {'monday': '12:05pm - 12:55pm', 'tuesday': '5:40pm - 8:40pm', 'wednesday': '12:05pm - 12:55pm'}, 'number': 'DIS_424'}], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_002'}], 'subject': 'Chemistry'}, {'credits': 3, 'description': 'Learn the process of incrementally developing small (200-500 lines) programs along with the 
fundamental Computer Science topics. These topics include: problem abstraction and decomposition, the edit-compile-run cycle, using variables of primitive and more complex data types, conditional and loop-based flow control, basic testing and debugging techniques, how to define and call functions (methods), and IO processing techniques. Also teaches and reinforces good programming practices including the use of a consistent style, and meaningful documentation. Intended for students who have no prior programming experience.', 'keywords': ['computer', 'science', 'programming', 'java'], 'name': 'Programming 1', 'number': 'COMPSCI_200', 'requisites': [], 'sections': [{'instructor': 'Jim Williams', 'location': '132 Noland Hall', 'subsections': [{'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '9:30am - 10:45am'}, 'number': 'LAB_311'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '11:00am - 12:15pm'}, 'number': 'LAB_312'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '2:30pm - 3:45pm'}, 'number': 'LAB_314'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '4:00pm - 5:15pm'}, 'number': 'LAB_315'}], 'time': {'thursday': '8:00am - 9:15am', 'tuesday': '8:00am - 9:15am'}, 'number': 'LEC_001'}, {'instructor': 'Jim Williams', 'location': '132 Noland Hall', 'subsections': [{'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '9:30am - 10:45am'}, 'number': 'LAB_321'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '1:00pm - 2:15pm'}, 'number': 'LAB_323'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '2:30pm - 3:45pm'}, 'number': 'LAB_324'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '4:00pm - 5:15pm'}, 'number': 'LAB_325'}], 'time': {'thursday': '11:00am - 12:15pm', 'tuesday': '11:00am - 12:15pm'}, 'number': 'LEC_002'}, {'instructor': 'Marc Renault', 
'location': '113 Brogden Psychology Building', 'subsections': [{'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '9:30am - 10:45am'}, 'number': 'LAB_331'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '11:00am - 12:15pm'}, 'number': 'LAB_332'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '1:00pm - 2:15pm'}, 'number': 'LAB_333'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '2:30pm - 3:45pm'}, 'number': 'LAB_334'}], 'time': {'friday': '1:20pm - 2:10pm', 'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'LEC_003'}, {'instructor': 'Marc Renault', 'location': '113 Brogden Psychology Building', 'subsections': [{'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '9:30am - 10:45am'}, 'number': 'LAB_341'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '11:00am - 12:15pm'}, 'number': 'LAB_342'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '1:00pm - 2:15pm'}, 'number': 'LAB_343'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '2:30pm - 3:45pm'}, 'number': 'LAB_344'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '4:00pm - 5:15pm'}, 'number': 'LAB_345'}], 'time': {'friday': '3:30pm - 4:20pm', 'monday': '3:30pm - 4:20pm', 'wednesday': '3:30pm - 4:20pm'}, 'number': 'LEC_004'}], 'subject': 'Computer Science'}, {'credits': 5, 'description': 'The two semester sequence MATH_112-MATH_113 covers similar material as MATH_114, but in a slower pace.', 'keywords': ['math', 'mathematics', 'algebra', 'trigonometry'], 'name': 'Algebra and Trigonometry', 'number': 'MATH_114', 'requisites': [], 'sections': [{'instructor': 'Sharad Chandarana', 'location': 'B130 Van Vleck Hall', 'subsections': [{'location': 'B113 Van Vleck Hall', 'time': {'monday': '7:45am - 8:35am', 'wednesday': '7:45am - 8:35am'}, 'number': 'DIS_301'}, 
{'location': 'B113 Van Vleck Hall', 'time': {'monday': '8:50am - 9:40am', 'wednesday': '8:50am - 9:40am'}, 'number': 'DIS_303'}, {'location': 'B219 Van Vleck Hall', 'time': {'monday': '8:50am - 9:40am', 'wednesday': '8:50am - 9:40am'}, 'number': 'DIS_304'}, {'location': 'B113 Van Vleck Hall', 'time': {'monday': '9:55am - 10:45am', 'wednesday': '9:55am - 10:45am'}, 'number': 'DIS_305'}, {'location': 'B219 Van Vleck Hall', 'time': {'monday': '9:55am - 10:45am', 'wednesday': '9:55am - 10:45am'}, 'number': 'DIS_306'}, {'location': 'B341 Van Vleck Hall', 'time': {'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_307'}, {'location': 'B317 Van Vleck Hall', 'time': {'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_308'}, {'location': 'B341 Van Vleck Hall', 'time': {'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'DIS_309'}, {'location': 'B329 Van Vleck Hall', 'time': {'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'DIS_310'}, {'location': 'B317 Van Vleck Hall', 'time': {'monday': '7:45am - 8:35am', 'wednesday': '7:45am - 8:35am'}, 'number': 'DIS_311'}], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_001'}, {'instructor': 'Sharad Chandarana', 'location': '19 Ingraham Hall', 'subsections': [{'location': '591 Van Hise Hall', 'time': {'thursday': '8:50am - 9:40am', 'tuesday': '8:50am - 9:40am'}, 'number': 'DIS_321'}, {'location': 'B219 Van Vleck Hall', 'time': {'thursday': '9:55am - 10:45am', 'tuesday': '9:55am - 10:45am'}, 'number': 'DIS_322'}, {'location': '4020 Vilas Hall', 'time': {'thursday': '11:00am - 11:50am', 'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_323'}, {'location': '599 Van Hise Hall', 'time': {'thursday': '11:00am - 11:50am', 'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_324'}, {'location': 'B341 Van Vleck Hall', 'time': {'thursday': '1:20pm - 2:10pm', 'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_325'}, 
{'location': '223 Van Hise Hall', 'time': {'thursday': '1:20pm - 2:10pm', 'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_326'}, {'location': '223 Van Hise Hall', 'time': {'thursday': '2:25pm - 3:15pm', 'tuesday': '2:25pm - 3:15pm'}, 'number': 'DIS_328'}, {'location': 'B219 Van Vleck Hall', 'time': {'thursday': '3:30pm - 4:20pm', 'tuesday': '3:30pm - 4:20pm'}, 'number': 'DIS_329'}, {'location': 'B341 Van Vleck Hall', 'time': {'thursday': '3:30pm - 4:20pm', 'tuesday': '3:30pm - 4:20pm'}, 'number': 'DIS_330'}], 'time': {'friday': '8:50am - 9:40am', 'monday': '8:50am - 9:40am', 'wednesday': '8:50am - 9:40am'}, 'number': 'LEC_002'}], 'subject': 'Mathematics'}, {'credits': 4, 'description': 'The systematic study of the individual in a social context, including social interaction, motivation, attitudes, conformity, communication, leadership, personal relationships, and behavior in small groups.', 'keywords': ['psychology', 'science', 'social', 'interaction', 'behavior'], 'name': 'Introductory Social Psychology', 'number': 'PSYCH_456', 'requisites': [['PSYCH_202']], 'sections': [{'instructor': 'Abigail Letak', 'location': '6104 Sewell Social Sciences', 'subsections': [{'location': '6121 Sewell Social Sciences', 'time': {'tuesday': '8:50am - 9:40am'}, 'number': 'DIS_301'}, {'location': '6121 Sewell Social Sciences', 'time': {'tuesday': '9:55am - 10:45am'}, 'number': 'DIS_302'}, {'location': '6121 Sewell Social Sciences', 'time': {'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_303'}, {'location': '6121 Sewell Social Sciences', 'time': {'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_304'}, {'location': '6121 Sewell Social Sciences', 'time': {'tuesday': '2:25pm - 3:15pm'}, 'number': 'DIS_305'}], 'time': {'friday': '11:00am - 11:50am', 'monday': '11:00am - 11:50am', 'wednesday': '11:00am - 11:50am'}, 'number': 'LEC_001'}], 'subject': 'Psychology'}, {'credits': 2, 'description': 'Logic components built with transistors, rudimentary Boolean algebra, basic combinational logic 
design, basic synchronous sequential logic design, basic computer organization and design, introductory machine- and assembly-language programming.', 'keywords': ['computer', 'science', 'engineering', 'programming'], 'name': 'Introduction to Computer Engineering', 'number': 'COMPSCI_252', 'requisites': [], 'sections': [{'instructor': 'Joseph Krachey', 'location': '1610 Engineering Hall', 'subsections': [], 'time': {'friday': '2:25pm - 3:15pm', 'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'LEC_001'}, {'instructor': 'Adil Ibrahim', 'location': '113 Brogden Psychology Building', 'subsections': [], 'time': {'friday': '8:50am - 9:40am', 'monday': '8:50am - 9:40am', 'wednesday': '8:50am - 9:40am'}, 'number': 'LEC_002'}, {'instructor': 'Adil Ibrahim', 'location': '113 Brogden Psychology Building', 'subsections': [], 'time': {'friday': '12:05pm - 12:55pm', 'monday': '12:05pm - 12:55pm', 'wednesday': '12:05pm - 12:55pm'}, 'number': 'LEC_005'}], 'subject': 'Computer Science'}, {'credits': 3, 'description': 'The third course in our programming fundamentals sequence. It presumes that students understand and use functional and object-oriented design and abstract data types as needed. This course introduces balanced search trees, graphs, graph traversal algorithms, hash tables and sets, and complexity analysis and about classes of problems that require each data type. Students are required to design and implement using high quality professional code, a medium sized program, that demonstrates knowledge and use of latest language features, tools, and conventions. Additional topics introduced will include as needed for projects: inheritance and polymorphism; anonymous inner classes, lambda functions, performance analysis to discover and optimize critical code blocks. Students learn about industry standards for code development. 
Students will design and implement a medium size project with a more advanced user-interface design, such as a web or mobile application with a GUI and event- driven implementation; use of version-control software.', 'keywords': ['computer', 'science', 'programming', 'java'], 'name': 'Programming 3', 'number': 'COMPSCI_400', 'requisites': [['COMPSCI_300']], 'sections': [{'instructor': 'Gary Dahl', 'location': 'AB20 Weeks Hall for Geological Sciences', 'subsections': [], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_001'}, {'instructor': 'Gary Dahl', 'location': '132 Noland Hall', 'subsections': [], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_002'}, {'instructor': 'Mouna Ayari Ben Hadj Kacem', 'location': 'AB20 Weeks Hall for Geological Sciences', 'subsections': [], 'time': {'friday': '11:00am - 11:50pm', 'monday': '11:00am - 11:50pm', 'wednesday': '11:00am - 11:50pm'}, 'number': 'LEC_003'}, {'instructor': 'Mouna Ayari Ben Hadj Kacem', 'location': '1310 Sterling Hall', 'subsections': [], 'time': {'friday': '2:25pm - 3:15pm', 'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'LEC_004'}], 'subject': 'Computer Science'}, {'credits': 5, 'description': 'Introduction to differential and integral calculus and plane analytic geometry; applications; transcendental functions.', 'keywords': ['math', 'mathematics', 'calculus', 'analytical', 'geometry', 'differential', 'integral'], 'name': 'Calculus and Analytical Geometry 1', 'number': 'MATH_221', 'requisites': [['MATH_114']], 'sections': [{'instructor': 'Laurentiu Maxim', 'location': '6210 Sewell Social Sciences', 'subsections': [{'location': 'B231 Van Vleck Hall', 'time': {'monday': '7:45am - 8:35am', 'wednesday': '7:45am - 8:35am'}, 'number': 'DIS_301'}, {'location': 'B215 Van Vleck Hall', 'time': {'monday': '7:45am - 8:35am', 'wednesday': '7:45am - 8:35am'}, 'number': 'DIS_302'}, {'location': 'B309 Van Vleck Hall', 
'time': {'monday': '3:30pm - 4:20pm', 'wednesday': '3:30pm - 4:20pm'}, 'number': 'DIS_303'}, {'location': 'B211 Van Vleck Hall', 'time': {'monday': '3:30pm - 4:20pm', 'wednesday': '3:30pm - 4:20pm'}, 'number': 'DIS_304'}, {'location': 'B129 Van Vleck Hall', 'time': {'monday': '11:00am - 11:50am', 'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_305'}, {'location': 'B131 Van Vleck Hall', 'time': {'monday': '11:00am - 11:50am', 'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_306'}, {'location': 'B231 Van Vleck Hall', 'time': {'monday': '12:05pm - 12:55pm', 'wednesday': '12:05pm - 12:55pm'}, 'number': 'DIS_307'}, {'location': 'B215 Van Vleck Hall', 'time': {'monday': '12:05pm - 12:55pm', 'wednesday': '12:05pm - 12:55pm'}, 'number': 'DIS_308'}, {'location': 'B313 Van Vleck Hall', 'time': {'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_309'}, {'location': 'B309 Van Vleck Hall', 'time': {'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_310'}, {'location': 'B305 Van Vleck Hall', 'time': {'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'DIS_311'}, {'location': 'B105 Van Vleck Hall', 'time': {'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'DIS_312'}, {'location': 'B321 Van Vleck Hall', 'time': {'friday': '9:55am - 10:45am', 'monday': '9:55am - 10:45am', 'wednesday': '9:55am - 10:45am'}, 'number': 'DIS_313'}], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_001'}], 'subject': 'Mathematics'}, {'credits': 3, 'description': 'General biological principles. 
Topics include: evolution, ecology, animal behavior, cell structure and function, genetics and molecular genetics and the physiology of a variety of organ systems emphasizing function in humans.', 'keywords': ['biology', 'science', 'animal', 'evolution', 'genetics', 'ecology'], 'name': 'Animal Biology', 'number': 'BIOLOGY_101', 'requisites': [], 'sections': [{'instructor': 'Sharon Thoma', 'location': '272 Bascom Hall', 'subsections': [], 'time': {'friday': '11:00am - 11:50am', 'monday': '11:00am - 11:50am', 'wednesday': '11:00am - 11:50am'}, 'number': 'LEC_001'}, {'instructor': 'Sharon Thoma', 'location': '272 Bascom Hall', 'subsections': [], 'time': {'friday': '12:05pm - 12:55pm', 'monday': '12:05pm - 12:55pm', 'wednesday': '12:05pm - 12:55pm'}, 'number': 'LEC_002'}], 'subject': 'Biology'}, {'credits': 3, 'description': 'An introduction to fundamental structures of computer systems and the C programming language with a focus on the low-level interrelationships and impacts on performance. Topics include the virtual address space and virtual memory, the heap and dynamic memory management, the memory hierarchy and caching, assembly language and the stack, communication and interrupts/signals, compiling and assemblers/linkers.', 'keywords': ['computer', 'science', 'engineering', 'electrical', 'machine', 'programming'], 'name': 'Machine Organization and Programming', 'number': 'COMPSCI_354', 'requisites': [['COMPSCI_252'], ['COMPSCI_300']], 'sections': [{'instructor': 'James Skrentny', 'location': '132 Noland Hall', 'subsections': [], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_001'}, {'instructor': 'James Skrentny', 'location': '132 Noland Hall', 'subsections': [], 'time': {'thursday': '4:00pm - 5:15pm', 'tuesday': '4:00pm - 5:15pm'}, 'number': 'LEC_002'}], 'subject': 'Computer Science'}, {'credits': 4, 'description': 'Introduction. 
Stoichiometry and the mole concept, the behavior of gases, liquids and solids, thermochemistry, electronic structure of atoms and chemical bonding, descriptive chemistry of selected elements and compounds, intermolecular forces. For students taking one year or more of college chemistry; serves as a prereq for CHEM_104; lecture, lab and discussion.', 'keywords': ['chemistry'], 'name': 'General Chemistry I', 'number': 'CHEM_103', 'requisites': [], 'sections': [{'instructor': 'Unknown', 'location': 'B10 Ingraham Hall', 'subsections': [{'location': '49 Sellery Residence Hall', 'time': {'monday': '3:30pm - 4:20pm', 'wednesday': '3:30pm - 4:20pm'}, 'number': 'DIS_301'}, {'location': '2307 Chemistry Building', 'time': {'monday': '4:35pm - 5:25pm', 'wednesday': '4:35pm - 5:25pm'}, 'number': 'DIS_302'}, {'location': '123 Van Hise Hall', 'time': {'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_303'}, {'location': '123 Van Hise Hall', 'time': {'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'DIS_304'}], 'time': {'friday': '11:00am - 11:50am', 'monday': '11:00am - 11:50am', 'wednesday': '11:00am - 11:50am'}, 'number': 'LEC_001'}], 'subject': 'Chemistry'}, {'credits': 3, 'description': 'This course introduces students to the software development of user interfaces (UIs). Topics covered include state-of-the-art (1) UI paradigms, such as event-driven interfaces, direct-manipulation interfaces, and dialogue-based interaction; (2) methods for capturing, interpreting, and responding to different forms of user input and states, including pointing, text entry, speech, touch, gestures, user activity, context, and physiological states; and (3) platform-specific UI development APIs, frameworks, and toolkits for platforms including web/mobile/desktop interfaces, natural user interfaces, and voice user interfaces. 
Through readings, lectures, and hands-on-activities, students will learn about the fundamental concepts, technologies, and methods in building user interfaces. Assignments will provide an opportunity to gain hands-on experience in the use of state-of-the-art UI development tools and build a UI development portfolio.', 'keywords': ['computer', 'science', 'building', 'user', 'interface', 'interfaces', 'design', 'ui'], 'name': 'Building User Interfaces', 'number': 'COMPSCI_639', 'requisites': [['COMPSCI_300']], 'sections': [{'instructor': 'Bilge Mutlu', 'location': '1221 Computer Sciences and Statistics', 'subsections': [], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_002'}], 'subject': 'Computer Science'}, {'credits': 3, 'description': 'Focuses on the role that psychological principles, research evidence and social science play in the laws of U.S. society, especially in the policies and mechanisms of social control of human behavior. The course will address the ways that society defines membership, and the role of psychology in how it determines who should be excluded or restricted from open society, in order to maintain a more civil society. In addition to learning the factual information about how selected processes work in the legal and social context, students will be asked to consider the role they can play as citizens in supporting or changing these social processes. The course will take a particular interest in psycholegal issues \"in action\" and in learning about the clinical-legal processes used to determine the disposition of individuals considered marginal in society. 
Finally, the course will address the mechanisms that are used to exclude individuals from open society through criminal and civil court processes, the role of psychology as a science, and the role of psychologists as behavioral experts in criminal and civil courts, and in shaping social policies.', 'keywords': ['psychology', 'science', 'law', 'social', 'policy', 'behavior'], 'name': 'Psychology, Law, and Social Policy', 'number': 'PSYCH_401', 'requisites': [['PSYCH_202']], 'sections': [{'instructor': 'Gregory Van Rybroek', 'location': '121 Brogden Psychology Building', 'subsections': [], 'time': {'monday': '4:00pm - 5:15pm', 'wednesday': '4:00pm - 5:15pm'}, 'number': 'LEC_001'}], 'subject': 'Psychology'}, {'credits': 3, 'description': 'Basic concepts of logic, sets, partial order and other relations, and functions. Basic concepts of mathematics (definitions, proofs, sets, functions, and relations) with a focus on discrete structures: integers, bits, strings, trees, and graphs. Propositional logic, Boolean algebra, and predicate logic. Mathematical induction and recursion. Invariants and algorithmic correctness. Recurrences and asymptotic growth analysis. 
Fundamentals of counting.', 'keywords': ['computer', 'science', 'math', 'mathematics', 'discrete', 'logic', 'algorithm', 'algorithms'], 'name': 'Introduction To Discrete Mathematics', 'number': 'COMPSCI_240', 'requisites': [['MATH_221']], 'sections': [{'instructor': 'Beck Hasti', 'location': '105 Brogden Psychology Building', 'subsections': [{'location': '1257 Computer Sciences and Statistics', 'time': {'tuesday': '8:50am - 9:40am'}, 'number': 'DIS_310'}, {'location': '1257 Computer Sciences and Statistics', 'time': {'thursday': '8:50am - 9:40am'}, 'number': 'DIS_311'}, {'location': '3024 Engineering Hall', 'time': {'tuesday': '9:55am - 10:45am'}, 'number': 'DIS_312'}, {'location': '2345 Engineering Hall', 'time': {'thursday': '9:55am - 10:45am'}, 'number': 'DIS_313'}, {'location': '2535 Engineering Hall', 'time': {'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_314'}, {'location': '2535 Engineering Hall', 'time': {'thursday': '11:00am - 11:50am'}, 'number': 'DIS_315'}, {'location': 'B309 Van Vleck Hall', 'time': {'tuesday': '9:55am - 10:45am'}, 'number': 'DIS_316'}], 'time': {'friday': '9:55am - 10:45am', 'monday': '9:55am - 10:45am', 'wednesday': '9:55am - 10:45am'}, 'number': 'LEC_001'}, {'instructor': 'Beck Hasti', 'location': '132 Noland Hall', 'subsections': [{'location': 'B211 Van Vleck Hall', 'time': {'thursday': '11:00am - 11:50am'}, 'number': 'DIS_320'}, {'location': 'B211 Van Vleck Hall', 'time': {'tuesday': '12:05pm - 12:55pm'}, 'number': 'DIS_321'}, {'location': '2255 Engineering Hall', 'time': {'thursday': '12:05pm - 12:55pm'}, 'number': 'DIS_322'}, {'location': '2349 Engineering Hall', 'time': {'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_323'}, {'location': '1263 Computer Sciences and Statistics', 'time': {'thursday': '1:20pm - 2:10pm'}, 'number': 'DIS_324'}, {'location': '3418 Engineering Hall', 'time': {'tuesday': '2:25pm - 3:15pm'}, 'number': 'DIS_325'}, {'location': '3418 Engineering Hall', 'time': {'thursday': '2:25pm - 3:15pm'}, 'number': 
'DIS_326'}], 'time': {'friday': '1:20pm - 2:10pm', 'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'LEC_002'}, {'instructor': 'Beck Hasti', 'location': '168 Noland Hall', 'subsections': [{'location': '1263 Computer Sciences and Statistics', 'time': {'tuesday': '8:50am - 9:40am'}, 'number': 'DIS_330'}, {'location': '1263 Computer Sciences and Statistics', 'time': {'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_331'}, {'location': '3024 Engineering Hall', 'time': {'thursday': '9:55am - 10:45am'}, 'number': 'DIS_332'}, {'location': '2349 Engineering Hall', 'time': {'thursday': '12:05am - 12:55am'}, 'number': 'DIS_333'}], 'time': {'friday': '2:25pm - 3:15pm', 'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'LEC_003'}], 'subject': 'Computer Science'}, {'credits': 3, 'description': 'Graphical and numerical exploration of data; standard errors; distributions for statistical models including binomial, Poisson, normal; estimation; hypothesis testing; randomization tests; basic principles of experimental design; regression; ANOVA; categorical data analysis; goodness of fit; application. 
(intended for students wishing to take additional statistics courses).', 'keywords': ['statistics', 'statistical', 'math', 'mathematics', 'methods'], 'name': 'Accelerated Introduction to Statistical Methods', 'number': 'STATS_302', 'requisites': [['MATH_221']], 'sections': [{'instructor': 'Unknown', 'location': '331 Service Memorial Institute', 'subsections': [{'location': '212 Educational Sciences', 'time': {'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_311'}, {'location': '1313 Sterling Hall', 'time': {'wednesday': '7:45am - 8:35am'}, 'number': 'DIS_312'}, {'location': '1313 Sterling Hall', 'time': {'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_313'}], 'time': {'monday': '4:00pm - 5:15pm', 'wednesday': '4:00pm - 5:15pm'}, 'number': 'LEC_001'}], 'subject': 'Statistics'}]\n" - ] - } - ], - "source": [ - "# Collect all the class data in a list called 'all_class_data'\n", - "all_class_data = []\n", - "for class_num in classes_list:\n", - " url = \"https://coletnelson.us/cs220-api/classes/\" + class_num\n", - " r = requests.get(url)\n", - " r.raise_for_status()\n", - " class_data = r.json()\n", - " all_class_data.append(class_data)\n", - "\n", - "print(all_class_data) # Too much data? 
Try print(len(all_class_data))" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "17\n" - ] - } - ], - "source": [ - "print(len(all_class_data))" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "3 PSYCH_202 Introduction to Psychology\n", - "4 COMPSCI_537 Introduction to Operating Systems\n", - "3 COMPSCI_300 Programming 2\n", - "5 CHEM_104 General Chemistry II\n", - "3 COMPSCI_200 Programming 1\n", - "5 MATH_114 Algebra and Trigonometry\n", - "4 PSYCH_456 Introductory Social Psychology\n", - "2 COMPSCI_252 Introduction to Computer Engineering\n", - "3 COMPSCI_400 Programming 3\n", - "5 MATH_221 Calculus and Analytical Geometry 1\n", - "3 BIOLOGY_101 Animal Biology\n", - "3 COMPSCI_354 Machine Organization and Programming\n", - "4 CHEM_103 General Chemistry I\n", - "3 COMPSCI_639 Building User Interfaces\n", - "3 PSYCH_401 Psychology, Law, and Social Policy\n", - "3 COMPSCI_240 Introduction To Discrete Mathematics\n", - "3 STATS_302 Accelerated Introduction to Statistical Methods\n" - ] - } - ], - "source": [ - "# Print the number of credits, course number, and name for each class.\n", - "for spec_class in all_class_data:\n", - " print(spec_class['credits'], spec_class['number'], spec_class['name'])" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "3.4705882352941178" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What is the average number of credits per course?\n", - "num_credits = 0 \n", - "for spec_class in all_class_data:\n", - " num_credits += spec_class['credits']\n", - "num_credits / len(all_class_data)" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": 
[ - { - "data": { - "text/plain": [ - "['Biology',\n", - " 'Chemistry',\n", - " 'Computer Science',\n", - " 'Mathematics',\n", - " 'Psychology',\n", - " 'Statistics']" - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What are the unique subjects?\n", - "subjects = []\n", - "for spec_class in all_class_data:\n", - " subjects.append(spec_class['subject'])\n", - "list(set(subjects))" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['COMPSCI_300', 'COMPSCI_200', 'COMPSCI_400']" - ] - }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Besides PYSCH 202, what are the course numbers of the courses\n", - "# with the most sections offered (not including subsections)?\n", - "high_courses = []\n", - "high_sections = 0\n", - "for spec_class in all_class_data:\n", - " current_course_num = spec_class['number']\n", - " current_num_sects = len(spec_class['sections'])\n", - " \n", - " if current_course_num == 'PSYCH_202':\n", - " continue\n", - " \n", - " if current_num_sects == high_sections:\n", - " high_courses.append(current_course_num)\n", - " elif current_num_sects > high_sections:\n", - " high_courses = []\n", - " high_courses.append(current_course_num)\n", - " high_sections = current_num_sects\n", - "high_courses" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Can we make a Pandas dataframe? Yes!" 
- ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>credits</th>\n", - " <th>description</th>\n", - " <th>keywords</th>\n", - " <th>name</th>\n", - " <th>number</th>\n", - " <th>requisites</th>\n", - " <th>sections</th>\n", - " <th>subject</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>3</td>\n", - " <td>Behavior, including its development, motivatio...</td>\n", - " <td>[psychology, behavior, emotion, intelligence, ...</td>\n", - " <td>Introduction to Psychology</td>\n", - " <td>PSYCH_202</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Jeff Henriques', 'location': ...</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>4</td>\n", - " <td>Input-output hardware, interrupt handling, pro...</td>\n", - " <td>[computer, science, operating, system, systems]</td>\n", - " <td>Introduction to Operating Systems</td>\n", - " <td>COMPSCI_537</td>\n", - " <td>[[COMPSCI_354, COMPSCI_400]]</td>\n", - " <td>[{'instructor': 'Andrea Arpaci-Dusseau', 'loca...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>3</td>\n", - " <td>Introduces students to Object-Oriented Program...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 2</td>\n", - " <td>COMPSCI_300</td>\n", - " <td>[[COMPSCI_200]]</td>\n", - " <td>[{'instructor': 'Gary Dahl', 'location': 'AB20...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " 
<tr>\n", - " <th>3</th>\n", - " <td>5</td>\n", - " <td>Principles and application of chemical equilib...</td>\n", - " <td>[chemistry]</td>\n", - " <td>General Chemistry II</td>\n", - " <td>CHEM_104</td>\n", - " <td>[[MATH_114], [CHEM_103]]</td>\n", - " <td>[{'instructor': 'Linda Zelewski', 'location': ...</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>3</td>\n", - " <td>Learn the process of incrementally developing ...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 1</td>\n", - " <td>COMPSCI_200</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Jim Williams', 'location': '1...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>5</th>\n", - " <td>5</td>\n", - " <td>The two semester sequence MATH_112-MATH_113 co...</td>\n", - " <td>[math, mathematics, algebra, trigonometry]</td>\n", - " <td>Algebra and Trigonometry</td>\n", - " <td>MATH_114</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Sharad Chandarana', 'location...</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>6</th>\n", - " <td>4</td>\n", - " <td>The systematic study of the individual in a so...</td>\n", - " <td>[psychology, science, social, interaction, beh...</td>\n", - " <td>Introductory Social Psychology</td>\n", - " <td>PSYCH_456</td>\n", - " <td>[[PSYCH_202]]</td>\n", - " <td>[{'instructor': 'Abigail Letak', 'location': '...</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>7</th>\n", - " <td>2</td>\n", - " <td>Logic components built with transistors, rudim...</td>\n", - " <td>[computer, science, engineering, programming]</td>\n", - " <td>Introduction to Computer Engineering</td>\n", - " <td>COMPSCI_252</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Joseph Krachey', 'location': ...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>8</th>\n", - " <td>3</td>\n", - " <td>The third course in our programming 
fundamenta...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 3</td>\n", - " <td>COMPSCI_400</td>\n", - " <td>[[COMPSCI_300]]</td>\n", - " <td>[{'instructor': 'Gary Dahl', 'location': 'AB20...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>9</th>\n", - " <td>5</td>\n", - " <td>Introduction to differential and integral calc...</td>\n", - " <td>[math, mathematics, calculus, analytical, geom...</td>\n", - " <td>Calculus and Analytical Geometry 1</td>\n", - " <td>MATH_221</td>\n", - " <td>[[MATH_114]]</td>\n", - " <td>[{'instructor': 'Laurentiu Maxim', 'location':...</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>10</th>\n", - " <td>3</td>\n", - " <td>General biological principles. Topics include:...</td>\n", - " <td>[biology, science, animal, evolution, genetics...</td>\n", - " <td>Animal Biology</td>\n", - " <td>BIOLOGY_101</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Sharon Thoma', 'location': '2...</td>\n", - " <td>Biology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>11</th>\n", - " <td>3</td>\n", - " <td>An introduction to fundamental structures of c...</td>\n", - " <td>[computer, science, engineering, electrical, m...</td>\n", - " <td>Machine Organization and Programming</td>\n", - " <td>COMPSCI_354</td>\n", - " <td>[[COMPSCI_252], [COMPSCI_300]]</td>\n", - " <td>[{'instructor': 'James Skrentny', 'location': ...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>12</th>\n", - " <td>4</td>\n", - " <td>Introduction. 
Stoichiometry and the mole conce...</td>\n", - " <td>[chemistry]</td>\n", - " <td>General Chemistry I</td>\n", - " <td>CHEM_103</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Unknown', 'location': 'B10 In...</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>13</th>\n", - " <td>3</td>\n", - " <td>This course introduces students to the softwar...</td>\n", - " <td>[computer, science, building, user, interface,...</td>\n", - " <td>Building User Interfaces</td>\n", - " <td>COMPSCI_639</td>\n", - " <td>[[COMPSCI_300]]</td>\n", - " <td>[{'instructor': 'Bilge Mutlu', 'location': '12...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>14</th>\n", - " <td>3</td>\n", - " <td>Focuses on the role that psychological princip...</td>\n", - " <td>[psychology, science, law, social, policy, beh...</td>\n", - " <td>Psychology, Law, and Social Policy</td>\n", - " <td>PSYCH_401</td>\n", - " <td>[[PSYCH_202]]</td>\n", - " <td>[{'instructor': 'Gregory Van Rybroek', 'locati...</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>15</th>\n", - " <td>3</td>\n", - " <td>Basic concepts of logic, sets, partial order a...</td>\n", - " <td>[computer, science, math, mathematics, discret...</td>\n", - " <td>Introduction To Discrete Mathematics</td>\n", - " <td>COMPSCI_240</td>\n", - " <td>[[MATH_221]]</td>\n", - " <td>[{'instructor': 'Beck Hasti', 'location': '105...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>16</th>\n", - " <td>3</td>\n", - " <td>Graphical and numerical exploration of data; s...</td>\n", - " <td>[statistics, statistical, math, mathematics, m...</td>\n", - " <td>Accelerated Introduction to Statistical Methods</td>\n", - " <td>STATS_302</td>\n", - " <td>[[MATH_221]]</td>\n", - " <td>[{'instructor': 'Unknown', 'location': '331 Se...</td>\n", - " <td>Statistics</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " credits description 
\\\n", - "0 3 Behavior, including its development, motivatio... \n", - "1 4 Input-output hardware, interrupt handling, pro... \n", - "2 3 Introduces students to Object-Oriented Program... \n", - "3 5 Principles and application of chemical equilib... \n", - "4 3 Learn the process of incrementally developing ... \n", - "5 5 The two semester sequence MATH_112-MATH_113 co... \n", - "6 4 The systematic study of the individual in a so... \n", - "7 2 Logic components built with transistors, rudim... \n", - "8 3 The third course in our programming fundamenta... \n", - "9 5 Introduction to differential and integral calc... \n", - "10 3 General biological principles. Topics include:... \n", - "11 3 An introduction to fundamental structures of c... \n", - "12 4 Introduction. Stoichiometry and the mole conce... \n", - "13 3 This course introduces students to the softwar... \n", - "14 3 Focuses on the role that psychological princip... \n", - "15 3 Basic concepts of logic, sets, partial order a... \n", - "16 3 Graphical and numerical exploration of data; s... \n", - "\n", - " keywords \\\n", - "0 [psychology, behavior, emotion, intelligence, ... \n", - "1 [computer, science, operating, system, systems] \n", - "2 [computer, science, programming, java] \n", - "3 [chemistry] \n", - "4 [computer, science, programming, java] \n", - "5 [math, mathematics, algebra, trigonometry] \n", - "6 [psychology, science, social, interaction, beh... \n", - "7 [computer, science, engineering, programming] \n", - "8 [computer, science, programming, java] \n", - "9 [math, mathematics, calculus, analytical, geom... \n", - "10 [biology, science, animal, evolution, genetics... \n", - "11 [computer, science, engineering, electrical, m... \n", - "12 [chemistry] \n", - "13 [computer, science, building, user, interface,... \n", - "14 [psychology, science, law, social, policy, beh... \n", - "15 [computer, science, math, mathematics, discret... \n", - "16 [statistics, statistical, math, mathematics, m... 
\n", - "\n", - " name number \\\n", - "0 Introduction to Psychology PSYCH_202 \n", - "1 Introduction to Operating Systems COMPSCI_537 \n", - "2 Programming 2 COMPSCI_300 \n", - "3 General Chemistry II CHEM_104 \n", - "4 Programming 1 COMPSCI_200 \n", - "5 Algebra and Trigonometry MATH_114 \n", - "6 Introductory Social Psychology PSYCH_456 \n", - "7 Introduction to Computer Engineering COMPSCI_252 \n", - "8 Programming 3 COMPSCI_400 \n", - "9 Calculus and Analytical Geometry 1 MATH_221 \n", - "10 Animal Biology BIOLOGY_101 \n", - "11 Machine Organization and Programming COMPSCI_354 \n", - "12 General Chemistry I CHEM_103 \n", - "13 Building User Interfaces COMPSCI_639 \n", - "14 Psychology, Law, and Social Policy PSYCH_401 \n", - "15 Introduction To Discrete Mathematics COMPSCI_240 \n", - "16 Accelerated Introduction to Statistical Methods STATS_302 \n", - "\n", - " requisites \\\n", - "0 [] \n", - "1 [[COMPSCI_354, COMPSCI_400]] \n", - "2 [[COMPSCI_200]] \n", - "3 [[MATH_114], [CHEM_103]] \n", - "4 [] \n", - "5 [] \n", - "6 [[PSYCH_202]] \n", - "7 [] \n", - "8 [[COMPSCI_300]] \n", - "9 [[MATH_114]] \n", - "10 [] \n", - "11 [[COMPSCI_252], [COMPSCI_300]] \n", - "12 [] \n", - "13 [[COMPSCI_300]] \n", - "14 [[PSYCH_202]] \n", - "15 [[MATH_221]] \n", - "16 [[MATH_221]] \n", - "\n", - " sections subject \n", - "0 [{'instructor': 'Jeff Henriques', 'location': ... Psychology \n", - "1 [{'instructor': 'Andrea Arpaci-Dusseau', 'loca... Computer Science \n", - "2 [{'instructor': 'Gary Dahl', 'location': 'AB20... Computer Science \n", - "3 [{'instructor': 'Linda Zelewski', 'location': ... Chemistry \n", - "4 [{'instructor': 'Jim Williams', 'location': '1... Computer Science \n", - "5 [{'instructor': 'Sharad Chandarana', 'location... Mathematics \n", - "6 [{'instructor': 'Abigail Letak', 'location': '... Psychology \n", - "7 [{'instructor': 'Joseph Krachey', 'location': ... Computer Science \n", - "8 [{'instructor': 'Gary Dahl', 'location': 'AB20... 
Computer Science \n", - "9 [{'instructor': 'Laurentiu Maxim', 'location':... Mathematics \n", - "10 [{'instructor': 'Sharon Thoma', 'location': '2... Biology \n", - "11 [{'instructor': 'James Skrentny', 'location': ... Computer Science \n", - "12 [{'instructor': 'Unknown', 'location': 'B10 In... Chemistry \n", - "13 [{'instructor': 'Bilge Mutlu', 'location': '12... Computer Science \n", - "14 [{'instructor': 'Gregory Van Rybroek', 'locati... Psychology \n", - "15 [{'instructor': 'Beck Hasti', 'location': '105... Computer Science \n", - "16 [{'instructor': 'Unknown', 'location': '331 Se... Statistics " - ] - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "all_course_frame = DataFrame(all_class_data)\n", - "all_course_frame" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### We may want to do some \"plumbing\" with our data." - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>credits</th>\n", - " <th>description</th>\n", - " <th>keywords</th>\n", - " <th>name</th>\n", - " <th>number</th>\n", - " <th>subject</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>3</td>\n", - " <td>Behavior, including its development, motivatio...</td>\n", - " <td>[psychology, behavior, emotion, intelligence, ...</td>\n", - " <td>Introduction to Psychology</td>\n", - " <td>PSYCH_202</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " 
<td>4</td>\n", - " <td>Input-output hardware, interrupt handling, pro...</td>\n", - " <td>[computer, science, operating, system, systems]</td>\n", - " <td>Introduction to Operating Systems</td>\n", - " <td>COMPSCI_537</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>3</td>\n", - " <td>Introduces students to Object-Oriented Program...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 2</td>\n", - " <td>COMPSCI_300</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>5</td>\n", - " <td>Principles and application of chemical equilib...</td>\n", - " <td>[chemistry]</td>\n", - " <td>General Chemistry II</td>\n", - " <td>CHEM_104</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>3</td>\n", - " <td>Learn the process of incrementally developing ...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 1</td>\n", - " <td>COMPSCI_200</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>5</th>\n", - " <td>5</td>\n", - " <td>The two semester sequence MATH_112-MATH_113 co...</td>\n", - " <td>[math, mathematics, algebra, trigonometry]</td>\n", - " <td>Algebra and Trigonometry</td>\n", - " <td>MATH_114</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>6</th>\n", - " <td>4</td>\n", - " <td>The systematic study of the individual in a so...</td>\n", - " <td>[psychology, science, social, interaction, beh...</td>\n", - " <td>Introductory Social Psychology</td>\n", - " <td>PSYCH_456</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>7</th>\n", - " <td>2</td>\n", - " <td>Logic components built with transistors, rudim...</td>\n", - " <td>[computer, science, engineering, programming]</td>\n", - " <td>Introduction to Computer Engineering</td>\n", - " <td>COMPSCI_252</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " 
<tr>\n", - " <th>8</th>\n", - " <td>3</td>\n", - " <td>The third course in our programming fundamenta...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 3</td>\n", - " <td>COMPSCI_400</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>9</th>\n", - " <td>5</td>\n", - " <td>Introduction to differential and integral calc...</td>\n", - " <td>[math, mathematics, calculus, analytical, geom...</td>\n", - " <td>Calculus and Analytical Geometry 1</td>\n", - " <td>MATH_221</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>10</th>\n", - " <td>3</td>\n", - " <td>General biological principles. Topics include:...</td>\n", - " <td>[biology, science, animal, evolution, genetics...</td>\n", - " <td>Animal Biology</td>\n", - " <td>BIOLOGY_101</td>\n", - " <td>Biology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>11</th>\n", - " <td>3</td>\n", - " <td>An introduction to fundamental structures of c...</td>\n", - " <td>[computer, science, engineering, electrical, m...</td>\n", - " <td>Machine Organization and Programming</td>\n", - " <td>COMPSCI_354</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>12</th>\n", - " <td>4</td>\n", - " <td>Introduction. 
Stoichiometry and the mole conce...</td>\n", - " <td>[chemistry]</td>\n", - " <td>General Chemistry I</td>\n", - " <td>CHEM_103</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>13</th>\n", - " <td>3</td>\n", - " <td>This course introduces students to the softwar...</td>\n", - " <td>[computer, science, building, user, interface,...</td>\n", - " <td>Building User Interfaces</td>\n", - " <td>COMPSCI_639</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>14</th>\n", - " <td>3</td>\n", - " <td>Focuses on the role that psychological princip...</td>\n", - " <td>[psychology, science, law, social, policy, beh...</td>\n", - " <td>Psychology, Law, and Social Policy</td>\n", - " <td>PSYCH_401</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>15</th>\n", - " <td>3</td>\n", - " <td>Basic concepts of logic, sets, partial order a...</td>\n", - " <td>[computer, science, math, mathematics, discret...</td>\n", - " <td>Introduction To Discrete Mathematics</td>\n", - " <td>COMPSCI_240</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>16</th>\n", - " <td>3</td>\n", - " <td>Graphical and numerical exploration of data; s...</td>\n", - " <td>[statistics, statistical, math, mathematics, m...</td>\n", - " <td>Accelerated Introduction to Statistical Methods</td>\n", - " <td>STATS_302</td>\n", - " <td>Statistics</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " credits description \\\n", - "0 3 Behavior, including its development, motivatio... \n", - "1 4 Input-output hardware, interrupt handling, pro... \n", - "2 3 Introduces students to Object-Oriented Program... \n", - "3 5 Principles and application of chemical equilib... \n", - "4 3 Learn the process of incrementally developing ... \n", - "5 5 The two semester sequence MATH_112-MATH_113 co... \n", - "6 4 The systematic study of the individual in a so... 
\n", - "7 2 Logic components built with transistors, rudim... \n", - "8 3 The third course in our programming fundamenta... \n", - "9 5 Introduction to differential and integral calc... \n", - "10 3 General biological principles. Topics include:... \n", - "11 3 An introduction to fundamental structures of c... \n", - "12 4 Introduction. Stoichiometry and the mole conce... \n", - "13 3 This course introduces students to the softwar... \n", - "14 3 Focuses on the role that psychological princip... \n", - "15 3 Basic concepts of logic, sets, partial order a... \n", - "16 3 Graphical and numerical exploration of data; s... \n", - "\n", - " keywords \\\n", - "0 [psychology, behavior, emotion, intelligence, ... \n", - "1 [computer, science, operating, system, systems] \n", - "2 [computer, science, programming, java] \n", - "3 [chemistry] \n", - "4 [computer, science, programming, java] \n", - "5 [math, mathematics, algebra, trigonometry] \n", - "6 [psychology, science, social, interaction, beh... \n", - "7 [computer, science, engineering, programming] \n", - "8 [computer, science, programming, java] \n", - "9 [math, mathematics, calculus, analytical, geom... \n", - "10 [biology, science, animal, evolution, genetics... \n", - "11 [computer, science, engineering, electrical, m... \n", - "12 [chemistry] \n", - "13 [computer, science, building, user, interface,... \n", - "14 [psychology, science, law, social, policy, beh... \n", - "15 [computer, science, math, mathematics, discret... \n", - "16 [statistics, statistical, math, mathematics, m... 
\n", - "\n", - " name number \\\n", - "0 Introduction to Psychology PSYCH_202 \n", - "1 Introduction to Operating Systems COMPSCI_537 \n", - "2 Programming 2 COMPSCI_300 \n", - "3 General Chemistry II CHEM_104 \n", - "4 Programming 1 COMPSCI_200 \n", - "5 Algebra and Trigonometry MATH_114 \n", - "6 Introductory Social Psychology PSYCH_456 \n", - "7 Introduction to Computer Engineering COMPSCI_252 \n", - "8 Programming 3 COMPSCI_400 \n", - "9 Calculus and Analytical Geometry 1 MATH_221 \n", - "10 Animal Biology BIOLOGY_101 \n", - "11 Machine Organization and Programming COMPSCI_354 \n", - "12 General Chemistry I CHEM_103 \n", - "13 Building User Interfaces COMPSCI_639 \n", - "14 Psychology, Law, and Social Policy PSYCH_401 \n", - "15 Introduction To Discrete Mathematics COMPSCI_240 \n", - "16 Accelerated Introduction to Statistical Methods STATS_302 \n", - "\n", - " subject \n", - "0 Psychology \n", - "1 Computer Science \n", - "2 Computer Science \n", - "3 Chemistry \n", - "4 Computer Science \n", - "5 Mathematics \n", - "6 Psychology \n", - "7 Computer Science \n", - "8 Computer Science \n", - "9 Mathematics \n", - "10 Biology \n", - "11 Computer Science \n", - "12 Chemistry \n", - "13 Computer Science \n", - "14 Psychology \n", - "15 Computer Science \n", - "16 Statistics " - ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Remove the 'sections' and 'requisites' column.\n", - "new_course_frame = all_course_frame.loc[:, \"credits\":\"number\"]\n", - "new_course_frame[\"subject\"] = all_course_frame.loc[:, \"subject\"]\n", - "new_course_frame" - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th 
{\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>credits</th>\n", - " <th>description</th>\n", - " <th>keywords</th>\n", - " <th>name</th>\n", - " <th>number</th>\n", - " <th>subject</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>3</td>\n", - " <td>Behavior, including its development, motivatio...</td>\n", - " <td>psychology, behavior, emotion, intelligence, b...</td>\n", - " <td>Introduction to Psychology</td>\n", - " <td>PSYCH_202</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>4</td>\n", - " <td>Input-output hardware, interrupt handling, pro...</td>\n", - " <td>computer, science, operating, system, systems</td>\n", - " <td>Introduction to Operating Systems</td>\n", - " <td>COMPSCI_537</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>3</td>\n", - " <td>Introduces students to Object-Oriented Program...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 2</td>\n", - " <td>COMPSCI_300</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>5</td>\n", - " <td>Principles and application of chemical equilib...</td>\n", - " <td>chemistry</td>\n", - " <td>General Chemistry II</td>\n", - " <td>CHEM_104</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>3</td>\n", - " <td>Learn the process of incrementally developing ...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 1</td>\n", - " <td>COMPSCI_200</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>5</th>\n", - " <td>5</td>\n", - " <td>The two semester sequence MATH_112-MATH_113 co...</td>\n", - " <td>math, mathematics, algebra, trigonometry</td>\n", - " <td>Algebra and 
Trigonometry</td>\n", - " <td>MATH_114</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>6</th>\n", - " <td>4</td>\n", - " <td>The systematic study of the individual in a so...</td>\n", - " <td>psychology, science, social, interaction, beha...</td>\n", - " <td>Introductory Social Psychology</td>\n", - " <td>PSYCH_456</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>7</th>\n", - " <td>2</td>\n", - " <td>Logic components built with transistors, rudim...</td>\n", - " <td>computer, science, engineering, programming</td>\n", - " <td>Introduction to Computer Engineering</td>\n", - " <td>COMPSCI_252</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>8</th>\n", - " <td>3</td>\n", - " <td>The third course in our programming fundamenta...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 3</td>\n", - " <td>COMPSCI_400</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>9</th>\n", - " <td>5</td>\n", - " <td>Introduction to differential and integral calc...</td>\n", - " <td>math, mathematics, calculus, analytical, geome...</td>\n", - " <td>Calculus and Analytical Geometry 1</td>\n", - " <td>MATH_221</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>10</th>\n", - " <td>3</td>\n", - " <td>General biological principles. Topics include:...</td>\n", - " <td>biology, science, animal, evolution, genetics,...</td>\n", - " <td>Animal Biology</td>\n", - " <td>BIOLOGY_101</td>\n", - " <td>Biology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>11</th>\n", - " <td>3</td>\n", - " <td>An introduction to fundamental structures of c...</td>\n", - " <td>computer, science, engineering, electrical, ma...</td>\n", - " <td>Machine Organization and Programming</td>\n", - " <td>COMPSCI_354</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>12</th>\n", - " <td>4</td>\n", - " <td>Introduction. 
Stoichiometry and the mole conce...</td>\n", - " <td>chemistry</td>\n", - " <td>General Chemistry I</td>\n", - " <td>CHEM_103</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>13</th>\n", - " <td>3</td>\n", - " <td>This course introduces students to the softwar...</td>\n", - " <td>computer, science, building, user, interface, ...</td>\n", - " <td>Building User Interfaces</td>\n", - " <td>COMPSCI_639</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>14</th>\n", - " <td>3</td>\n", - " <td>Focuses on the role that psychological princip...</td>\n", - " <td>psychology, science, law, social, policy, beha...</td>\n", - " <td>Psychology, Law, and Social Policy</td>\n", - " <td>PSYCH_401</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>15</th>\n", - " <td>3</td>\n", - " <td>Basic concepts of logic, sets, partial order a...</td>\n", - " <td>computer, science, math, mathematics, discrete...</td>\n", - " <td>Introduction To Discrete Mathematics</td>\n", - " <td>COMPSCI_240</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>16</th>\n", - " <td>3</td>\n", - " <td>Graphical and numerical exploration of data; s...</td>\n", - " <td>statistics, statistical, math, mathematics, me...</td>\n", - " <td>Accelerated Introduction to Statistical Methods</td>\n", - " <td>STATS_302</td>\n", - " <td>Statistics</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " credits description \\\n", - "0 3 Behavior, including its development, motivatio... \n", - "1 4 Input-output hardware, interrupt handling, pro... \n", - "2 3 Introduces students to Object-Oriented Program... \n", - "3 5 Principles and application of chemical equilib... \n", - "4 3 Learn the process of incrementally developing ... \n", - "5 5 The two semester sequence MATH_112-MATH_113 co... \n", - "6 4 The systematic study of the individual in a so... 
\n", - "7 2 Logic components built with transistors, rudim... \n", - "8 3 The third course in our programming fundamenta... \n", - "9 5 Introduction to differential and integral calc... \n", - "10 3 General biological principles. Topics include:... \n", - "11 3 An introduction to fundamental structures of c... \n", - "12 4 Introduction. Stoichiometry and the mole conce... \n", - "13 3 This course introduces students to the softwar... \n", - "14 3 Focuses on the role that psychological princip... \n", - "15 3 Basic concepts of logic, sets, partial order a... \n", - "16 3 Graphical and numerical exploration of data; s... \n", - "\n", - " keywords \\\n", - "0 psychology, behavior, emotion, intelligence, b... \n", - "1 computer, science, operating, system, systems \n", - "2 computer, science, programming, java \n", - "3 chemistry \n", - "4 computer, science, programming, java \n", - "5 math, mathematics, algebra, trigonometry \n", - "6 psychology, science, social, interaction, beha... \n", - "7 computer, science, engineering, programming \n", - "8 computer, science, programming, java \n", - "9 math, mathematics, calculus, analytical, geome... \n", - "10 biology, science, animal, evolution, genetics,... \n", - "11 computer, science, engineering, electrical, ma... \n", - "12 chemistry \n", - "13 computer, science, building, user, interface, ... \n", - "14 psychology, science, law, social, policy, beha... \n", - "15 computer, science, math, mathematics, discrete... \n", - "16 statistics, statistical, math, mathematics, me... 
\n", - "\n", - " name number \\\n", - "0 Introduction to Psychology PSYCH_202 \n", - "1 Introduction to Operating Systems COMPSCI_537 \n", - "2 Programming 2 COMPSCI_300 \n", - "3 General Chemistry II CHEM_104 \n", - "4 Programming 1 COMPSCI_200 \n", - "5 Algebra and Trigonometry MATH_114 \n", - "6 Introductory Social Psychology PSYCH_456 \n", - "7 Introduction to Computer Engineering COMPSCI_252 \n", - "8 Programming 3 COMPSCI_400 \n", - "9 Calculus and Analytical Geometry 1 MATH_221 \n", - "10 Animal Biology BIOLOGY_101 \n", - "11 Machine Organization and Programming COMPSCI_354 \n", - "12 General Chemistry I CHEM_103 \n", - "13 Building User Interfaces COMPSCI_639 \n", - "14 Psychology, Law, and Social Policy PSYCH_401 \n", - "15 Introduction To Discrete Mathematics COMPSCI_240 \n", - "16 Accelerated Introduction to Statistical Methods STATS_302 \n", - "\n", - " subject \n", - "0 Psychology \n", - "1 Computer Science \n", - "2 Computer Science \n", - "3 Chemistry \n", - "4 Computer Science \n", - "5 Mathematics \n", - "6 Psychology \n", - "7 Computer Science \n", - "8 Computer Science \n", - "9 Mathematics \n", - "10 Biology \n", - "11 Computer Science \n", - "12 Chemistry \n", - "13 Computer Science \n", - "14 Psychology \n", - "15 Computer Science \n", - "16 Statistics " - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Turn 'keywords' into a series of Strings and remove the '[', ']', '''\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].astype('string')\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"[\", \"\", regex=False)\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"]\", \"\", regex=False)\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"'\", \"\", regex=False)\n", - "new_course_frame" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 
Pandas Operations" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "5" - ] - }, - "execution_count": 37, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What is the most number of credits a course offers?\n", - "new_course_frame[\"credits\"].max()" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "2" - ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What is the least number of credits a course offers?\n", - "new_course_frame[\"credits\"].min()" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "credits 2\n", - "description Logic components built with transistors, rudim...\n", - "keywords computer, science, engineering, programming\n", - "name Introduction to Computer Engineering\n", - "number COMPSCI_252\n", - "subject Computer Science\n", - "Name: 7, dtype: object" - ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What is the info for that course?\n", - "new_course_frame.iloc[new_course_frame[\"credits\"].idxmin()]" - ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>credits</th>\n", - " <th>description</th>\n", - " <th>keywords</th>\n", - " <th>name</th>\n", - " 
<th>number</th>\n", - " <th>subject</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>3</td>\n", - " <td>Introduces students to Object-Oriented Program...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 2</td>\n", - " <td>COMPSCI_300</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>3</td>\n", - " <td>Learn the process of incrementally developing ...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 1</td>\n", - " <td>COMPSCI_200</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>7</th>\n", - " <td>2</td>\n", - " <td>Logic components built with transistors, rudim...</td>\n", - " <td>computer, science, engineering, programming</td>\n", - " <td>Introduction to Computer Engineering</td>\n", - " <td>COMPSCI_252</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>8</th>\n", - " <td>3</td>\n", - " <td>The third course in our programming fundamenta...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 3</td>\n", - " <td>COMPSCI_400</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>11</th>\n", - " <td>3</td>\n", - " <td>An introduction to fundamental structures of c...</td>\n", - " <td>computer, science, engineering, electrical, ma...</td>\n", - " <td>Machine Organization and Programming</td>\n", - " <td>COMPSCI_354</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " credits description \\\n", - "2 3 Introduces students to Object-Oriented Program... \n", - "4 3 Learn the process of incrementally developing ... \n", - "7 2 Logic components built with transistors, rudim... \n", - "8 3 The third course in our programming fundamenta... \n", - "11 3 An introduction to fundamental structures of c... 
\n", - "\n", - " keywords \\\n", - "2 computer, science, programming, java \n", - "4 computer, science, programming, java \n", - "7 computer, science, engineering, programming \n", - "8 computer, science, programming, java \n", - "11 computer, science, engineering, electrical, ma... \n", - "\n", - " name number subject \n", - "2 Programming 2 COMPSCI_300 Computer Science \n", - "4 Programming 1 COMPSCI_200 Computer Science \n", - "7 Introduction to Computer Engineering COMPSCI_252 Computer Science \n", - "8 Programming 3 COMPSCI_400 Computer Science \n", - "11 Machine Organization and Programming COMPSCI_354 Computer Science " - ] - }, - "execution_count": 40, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What courses contain the keyword \"programming\"?\n", - "mask = new_course_frame[\"keywords\"].str.contains(\"programming\")\n", - "new_course_frame[mask]" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'Psychology, Law, and Social Policy'" - ] - }, - "execution_count": 41, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What course has the most lengthy description?\n", - "idx_max_desc = new_course_frame[\"description\"].str.len().idxmax()\n", - "new_course_frame.iloc[idx_max_desc]['name']" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write it out to a CSV file on your drive\n", - "You now have your own copy!" 
- ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [], - "source": [ - "# Write it all out to a single CSV file\n", - "new_course_frame.to_csv(\"my_course_data.csv\", index=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Other Cool APIs" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "- City of Madison Transit: http://transitdata.cityofmadison.com/\n", - "- Reddit: https://reddit.com/r/UWMadison.json\n", - "- Lord of the Rings: https://the-one-api.dev/\n", - "- Pokemon: https://pokeapi.co/\n", - "\n", - "Remember: Be judicious when making requests; don't overwhelm the server! :)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Next Time\n", - "What other documents can we get via the Web? HTML is very popular! We'll explore this." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.7" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/lec_29_web1_template-checkpoint.ipynb b/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/lec_29_web1_template-checkpoint.ipynb deleted file mode 100644 index b51ca54..0000000 --- a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/lec_29_web1_template-checkpoint.ipynb +++ /dev/null @@ -1,767 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Web 1 - How to get data from the Internet" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# New module\n", - "import requests\n", - "\n", - "# Known modules\n", - "import 
json\n", - "import pandas as pd\n", - "from pandas import Series, DataFrame" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### P10 check-in" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# It is very important to check auto-grader test results on p10 in a timely manner.\n", - "# Take a few minutes to verify if you hardcoded the slashes in P10 rather than using os.path.join? \n", - " # Your code won't clear auto-grader if you hardcode either \"/\" or \"\\\" \n", - " # for *ANY* relative path in the entire project\n", - "# Check your code and check the autograder as soon as possible." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 1: Read the data from \"IMDB-Movie-Data.csv\" into a pandas DataFrame called \"movies\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 2: fixing duplicate index columns\n", - "\n", - "Notice that there are two index columns\n", - "- That happened because when you write a csv from pandas to a file, it writes a new index column\n", - "- So if the DataFrame already contains an index, you are going to get two index columns\n", - "- Let's fix that problem" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#use slicing to retain all the rows and columns excepting for column with integer position 0\n", - "movies = movies.iloc[:, 1:] \n", - "movies" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "movies.to_csv(\"better_movies.csv\", index = False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 3: Which movie has highest rating?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 4: Which movies were released in 2020?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 5a: What does this function do?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def format_revenue(revenue):\n", - " if type(revenue) == float: # need this in here if we run code multiple times\n", - " return revenue\n", - " elif revenue[-1] == 'M': # some have an \"M\" at the end\n", - " return float(revenue[:-1]) * 1e6\n", - " else: # otherwise, assume millions.\n", - " return float(revenue) * 1e6" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 5b: Using the above function, create a new column called \"Revenue in dollars\" by applying appropriate conversion to Revenue column." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 6: What are the top 10 highest-revenue movies?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Warmup 7: Which shortest movies (below average runtime) have highest rating?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Learning Objectives\n", - "\n", - "- Make a request for data using requests.get(URL)\n", - "- Check the status of a request/response\n", - "- Extract the text of a response\n", - "- Create a json file from a response\n", - "- State and practice good etiquette when getting data" - ] - }, - { - "attachments": { - "Client_server.png": { - "image/png": "" - } - }, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Core Ideas:\n", - " - Network structure\n", - " - Client / server\n", - " - Request / response\n", - " \n", - " \n", - " \n", - " - HTTP protocol\n", - " - URL\n", - " - Headers\n", - " - Status Codes\n", - " - The requests module" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## HTTP Status Codes you need to know\n", - "- 200: success\n", - "- 404: not found\n", - "\n", - "Here is a list of all status codes, you do NOT need to memorize it: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## requests.get : Simple string example\n", - "- URL: https://www.msyamkumar.com/hello.txt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "url = \"https://www.msyamkumar.com/hello.txt\"\n", - "r = requests.get(url) # r is the response\n", - "print(r.status_code)\n", - "print(r.text)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Q: What if the web site does not exist?\n", - "typo_url = \"https://www.msyamkumar.com/hello.txttttt\"\n", - "r = requests.get(typo_url)\n", - "print(r.status_code)\n", - "print(r.text)\n", - "\n", - "# A: " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], 
- "source": [ - "# We can check for a status_code error by using an assert\n", - "typo_url = \"https://www.msyamkumar.com/hello.txttttt\"\n", - "r = requests.get(typo_url)\n", - "assert r.status_code == 200\n", - "print(r.status_code)\n", - "print(r.text)\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Instead of using an assert, we often use raise_for_status()\n", - "r = requests.get(typo_url)\n", - "r.raise_for_status() #similar to asserting r.status_code == 200\n", - "r.text\n", - "\n", - "# Note the error you get.... We will use this in the next cell" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Let's try to catch that error\n", - "\n", - "try:\n", - "\n", - "except:\n", - " print(\"oops!!\", e)\n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# we often need to prepend the names of exceptions with the name of the module\n", - "# fix the error from above\n", - "\n", - "try:\n", - "\n", - "except:\n", - " print(\"oops!!\", e)\n", - " \n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## requests.get : JSON file example\n", - "- URL: https://www.msyamkumar.com/scores.json\n", - "- `json.load` (FILE_OBJECT)\n", - "- `json.loads` (STRING)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# GETting a JSON file, the long way\n", - "url = \"https://www.msyamkumar.com/scores.json\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "urltext = r.text\n", - "print(urltext)\n", - "d = json.loads(urltext)\n", - "print(type(d), d)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# GETting a JSON file, the shortcut way\n", - "url = \"https://www.msyamkumar.com/scores.json\"\n", - 
"#Shortcut to bypass using json.loads()\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "d2 = r.json()\n", - "print(type(d2), d2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Good GET Etiquette\n", - "\n", - "Don't make a lot of requests to the same server all at once.\n", - " - Requests use up the server's time\n", - " - Major websites will often ban users who make too many requests\n", - " - You can break a server....similar to DDoS attacks (DON'T DO THIS)\n", - " \n", - "In CS220 we will usually give you a link to a copied file to avoid overloading the site.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## DEMO: Course Enrollment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Explore the API!\n", - "\n", - "https://coletnelson.us/cs220-api/classes\n", - "\n", - "https://coletnelson.us/cs220-api/classes_as_txt\n", - "\n", - "https://coletnelson.us/cs220-api/classes/MATH_221\n", - "\n", - "https://coletnelson.us/cs220-api/classes/COMPSCI_200\n", - "\n", - "... etc\n", - "\n", - "https://coletnelson.us/cs220-api/all_data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the list of classes." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### When the data is `json`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "url = \"https://coletnelson.us/cs220-api/classes\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "classes_list = r.json()\n", - "print(type(classes_list))\n", - "print(classes_list)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### When the data is `text`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "url = \"https://coletnelson.us/cs220-api/classes_as_txt\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "classes_txt = r.text\n", - "print(type(classes_txt))\n", - "print(classes_txt)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "classes_txt_as_list = ???" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get data for a specific class" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "url = \"https://coletnelson.us/cs220-api/classes/COMPSCI_200\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "cs200_data = r.json()\n", - "print(type(cs200_data))\n", - "print(cs200_data) # Too much data? 
Try print(cs200_data.keys())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cs200_data.keys()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the number of credits the course is worth\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the list of keywords for the course\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the official course name\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the number of sections offered.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Collect all the class data in a list called 'all_class_data'\n", - "all_class_data = []\n", - "for class_num in classes_list:\n", - " url = \"https://coletnelson.us/cs220-api/classes/\" + class_num\n", - " r = requests.get(url)\n", - " r.raise_for_status()\n", - " class_data = r.json()\n", - " all_class_data.append(???)\n", - "\n", - "print(all_class_data) # Too much data? 
Try print(len(all_class_data))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(len(all_class_data))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Print the number of credits, course number, and name for each class.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# What is the average number of credits per course?\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# What are the unique subjects?\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Besides PSYCH 202, what are the course numbers of the courses\n", - "# with the most sections offered (not including subsections)?\n", - "high_courses = []\n", - "high_sections = 0\n", - "for spec_class in all_class_data:\n", - " pass\n", - "high_courses" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Can we make a Pandas dataframe? Yes!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "all_course_frame = DataFrame(all_class_data)\n", - "all_course_frame" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### We may want to do some \"plumbing\" with our data." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Remove the 'sections' and 'requisites' column.\n", - "new_course_frame = all_course_frame.loc[:, \"credits\":\"number\"]\n", - "new_course_frame[\"subject\"] = all_course_frame.loc[:, \"subject\"]\n", - "new_course_frame" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Turn 'keywords' into a series of Strings and remove the '[', ']', '''\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].astype('string')\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"[\", \"\", regex=False)\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"]\", \"\", regex=False)\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"'\", \"\", regex=False)\n", - "new_course_frame" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Pandas Operations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# What is the most number of credits a course offers?\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# What is the least number of credits a course offers?\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# What is the info for that course?\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# What courses contain the keyword \"programming\"?\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# What course has the most lengthy description?\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write it out to a CSV 
file on your drive\n", - "You now have your own copy!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write it all out to a single CSV file\n", - "new_course_frame.to_csv(\"my_course_data.csv\", index=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Other Cool APIs" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "- City of Madison Transit: http://transitdata.cityofmadison.com/\n", - "- Reddit: https://reddit.com/r/UWMadison.json\n", - "- Lord of the Rings: https://the-one-api.dev/\n", - "- Pokemon: https://pokeapi.co/\n", - "\n", - "Remember: Be judicious when making requests; don't overwhelm the server! :)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Next Time\n", - "What other documents can we get via the Web? HTML is very popular! We'll explore this." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.7" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/pandas1-checkpoint.ipynb b/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/pandas1-checkpoint.ipynb deleted file mode 100644 index 1e84430..0000000 --- a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/pandas1-checkpoint.ipynb +++ /dev/null @@ -1,1736 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Pandas 1" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<style>.container { width:100% !important; }</style>" - ], - "text/plain": [ - 
"<IPython.core.display.HTML object>" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "from IPython.core.display import display, HTML\n", - "display(HTML(\"<style>.container { width:100% !important; }</style>\"))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Learning objectives" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - " - Pandas:\n", - " - Python module: tools for doing Data Science\n", - " - helps deal with tabular (tables) data\n", - " - List of list is not adequate alternative to excel\n", - " - Series: new data structure\n", - " - hybrid of a dict and a list\n", - " - Python dict \"key\" equivalent to \"index\" in pandas\n", - " - Python list \"index\" quivalent to \"integer position\" in pandas\n", - " - supports complicated expressions within lookup [...]\n", - " - element-wise operation\n", - " - boolean indexing\n", - " - DataFrames aka tables (next lecture)\n", - " - built from series\n", - " - each series will be a column in the table" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# pandas comes with Anaconda installation\n", - "If for some reason, you don't have pandas installed, run the following command in terminal or powershell\n", - "<pre> pip install pandas </pre>" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "# importing pandas module\n", - "import pandas" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Module naming abbreviation" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "# Common abbrievation for pandas module\n", - "import pandas as pd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a series from a dict" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - 
"text/plain": [ - "{'one': 7, 'two': 8, 'three': 9}" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# create a series from a dict\n", - "d = {\"one\": 7, \"two\": 8, \"three\": 9}\n", - "d" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "pandas.core.series.Series" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "pd.Series" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series(d)\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series({\"one\": 7, \"two\": 8, \"three\": 9}) # equivalent to the above example\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "# IP index value\n", - "# 0 one 7\n", - "# 1 two 8\n", - "# 2 three 9\n", - "\n", - "# dtype: int64" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing values with index (.loc[...])" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'one': 7, 'two': 8, 'three': 9}" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "d" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - 
}, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# dict access with key\n", - "d[\"one\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.loc[\"one\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "8" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.loc[\"two\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing values with integer position (.iloc[...])" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.iloc[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "9" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.iloc[-1]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Regular lookups with just [ ]" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[\"one\"]" - ] - }, - { 
- "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[0]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing multiple values with a list of integer positions" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[[1, 2]]" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# series access with a list of indexes\n", - "s[[\"two\", \"three\"]]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a series from a list" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 100\n", - "1 200\n", - "2 300\n", - "dtype: int64" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Series created from a list\n", - "num_list = [100, 200, 300]\n", - "s = pd.Series(num_list)\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [], - "source": [ - "# IP index value\n", - "# 0 0 100\n", - "# 1 1 200\n", - "# 2 2 300\n", - "# dtype: int64" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "200\n", - "200\n" - ] - } - ], - "source": [ - "print(s.loc[1])\n", - "print(s.iloc[1])" - ] - }, - 
{ - "cell_type": "markdown", - "metadata": {}, - "source": [ - "pandas looks for an index when we do a [ ] lookup, by default" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'D'" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "letters_list = [\"A\", \"B\", \"C\", \"D\"]\n", - "letters = pd.Series(letters_list)\n", - "letters\n", - "# letters[-1] # Avoid negative indexes, unless we use .iloc\n", - "letters.iloc[-1]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Slicing series using integer positions" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 A\n", - "1 B\n", - "2 C\n", - "3 D\n", - "dtype: object" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "letters_list = [\"A\", \"B\", \"C\", \"D\"]\n", - "letters = pd.Series(letters_list)\n", - "letters" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['C', 'D']" - ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# list slicing\n", - "sliced_letter_list = letters_list[2:]\n", - "sliced_letter_list" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Sliced Series retains original Series index, whereas integer positions are renumbered." 
- ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "2 C\n", - "3 D\n", - "dtype: object" - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sliced_letters = letters[2:]\n", - "sliced_letters" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [], - "source": [ - "# Note: integer positions get renumbered, whereas indexes do not.\n", - "\n", - "# IP Index values\n", - "# 0 2 C\n", - "# 1 3 D\n", - "# dtype: object" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "C\n", - "C\n" - ] - } - ], - "source": [ - "print(sliced_letters.loc[2])\n", - "print(sliced_letters.iloc[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Slicing series using index" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 29, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series({\"one\": 7, \"two\": 8, \"three\": 9})\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#slicing with indexes\n", - "s[\"two\":]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Statistics on Series" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 44\n", - "1 32\n", - "2 19\n", - "3 67\n", - "4 23\n", - "5 23\n", - "6 92\n", - "7 47\n", - "8 47\n", 
- "9 78\n", - "10 84\n", - "dtype: int64" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores = pd.Series([44, 32, 19, 67, 23, 23, 92, 47, 47, 78, 84])\n", - "scores" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "50.54545454545455" - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores.mean()" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "26.051347897426098" - ] - }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores.std()" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "47.0" - ] - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores.median()" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 23\n", - "1 47\n", - "dtype: int64" - ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores.mode()" - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "1.00 92.0\n", - "0.75 72.5\n", - "0.50 47.0\n", - "0.25 27.5\n", - "0.00 19.0\n", - "dtype: float64" - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "scores.quantile([1.0, 0.75, 0.5, 0.25, 0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## CS220 information survey data" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [], - "source": [ - "# Modified from 
https://automatetheboringstuff.com/chapter14/\n", - "import csv\n", - "def process_csv(filename):\n", - " example_file = open(filename, encoding=\"utf-8\")\n", - " example_reader = csv.reader(example_file)\n", - " example_data = list(example_reader)\n", - " example_file.close()\n", - " return example_data\n", - "\n", - "data = process_csv(\"cs220_survey_data.csv\")\n", - "header = data[0]\n", - "data = data[1:]" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['lecture', 'age', 'major', 'topping']" - ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "header" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[['LEC001', '19', 'Computer Science', 'basil/spinach'],\n", - " ['LEC002', '18', 'Engineering', 'pineapple'],\n", - " ['LEC003', '19', 'Business', 'pepperoni']]" - ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "data[:3]" - ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [], - "source": [ - "# use list comprehension to extract just ages\n", - "# age_list = [int(row[1]) for row in data if row[1] != \"\"]\n", - "age_list = [int(row[header.index(\"age\")]) for row in data if row[header.index(\"age\")] != \"\"]\n", - "#age_list" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 19\n", - "1 18\n", - "2 19\n", - "3 19\n", - "4 19\n", - " ..\n", - "877 19\n", - "878 20\n", - "879 21\n", - "880 19\n", - "881 18\n", - "Length: 882, dtype: int64" - ] - }, - "execution_count": 41, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cs220_ages = pd.Series(age_list)\n", - "cs220_ages" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - 
"source": [ - "## Unique values in a Series" - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "19 290\n", - "18 214\n", - "20 178\n", - "21 101\n", - "22 41\n", - "23 13\n", - "17 11\n", - "25 7\n", - "24 6\n", - "26 4\n", - "28 3\n", - "29 2\n", - "30 2\n", - "27 2\n", - "34 1\n", - "37 1\n", - "35 1\n", - "16 1\n", - "33 1\n", - "32 1\n", - "31 1\n", - "46 1\n", - "dtype: int64" - ] - }, - "execution_count": 42, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cs220_ages.value_counts()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Series sorting\n", - "- can be done using index or values" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "16 1\n", - "17 11\n", - "18 214\n", - "19 290\n", - "20 178\n", - "21 101\n", - "22 41\n", - "23 13\n", - "24 6\n", - "25 7\n", - "26 4\n", - "27 2\n", - "28 3\n", - "29 2\n", - "30 2\n", - "31 1\n", - "32 1\n", - "33 1\n", - "34 1\n", - "35 1\n", - "37 1\n", - "46 1\n", - "dtype: int64" - ] - }, - "execution_count": 43, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cs220_ages.value_counts().sort_index()" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "46 1\n", - "32 1\n", - "33 1\n", - "16 1\n", - "35 1\n", - "37 1\n", - "34 1\n", - "31 1\n", - "27 2\n", - "30 2\n", - "29 2\n", - "28 3\n", - "26 4\n", - "24 6\n", - "25 7\n", - "17 11\n", - "23 13\n", - "22 41\n", - "21 101\n", - "20 178\n", - "18 214\n", - "19 290\n", - "dtype: int64" - ] - }, - "execution_count": 44, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cs220_ages.value_counts().sort_values()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Statistics" - ] - }, - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "### What is the mode of CS220 student ages?" - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 19\n", - "dtype: int64\n" - ] - } - ], - "source": [ - "print(cs220_ages.mode())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What is the 75th percentile of ages?" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "20.0\n" - ] - } - ], - "source": [ - "print(cs220_ages.quantile(.75))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Element-wise operations\n", - "1. SERIES op SCALAR\n", - "2. SERIES op SERIES" - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Chris 10\n", - "Kiara 3\n", - "Mikayla 7\n", - "Ann 8\n", - "Trish 6\n", - "dtype: int64\n", - "Kiara 7\n", - "Chris 3\n", - "Trish 11\n", - "Mikayla 2\n", - "Ann 5\n", - "Meena 20\n", - "dtype: int64\n" - ] - } - ], - "source": [ - "## Series from a dict\n", - "game1_points = pd.Series({\"Chris\": 10, \"Kiara\": 3, \"Mikayla\": 7, \"Ann\": 8, \"Trish\": 6})\n", - "print(game1_points)\n", - "game2_points = pd.Series({\"Kiara\": 7, \"Chris\": 3, \"Trish\": 11, \"Mikayla\": 2, \"Ann\": 5, \"Meena\": 20})\n", - "print(game2_points)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Give 2 additional points for every player's game 1 score" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Chris 12\n", - "Kiara 5\n", - "Mikayla 9\n", - "Ann 10\n", - "Trish 8\n", - "dtype: int64" - ] - }, - "execution_count": 49, - "metadata": {}, - "output_type": "execute_result" - } 
- ], - "source": [ - "game1_points + 2" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Chris 12\n", - "Kiara 5\n", - "Mikayla 9\n", - "Ann 10\n", - "Trish 8\n", - "dtype: int64" - ] - }, - "execution_count": 50, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "game1_points = game1_points + 2\n", - "game1_points" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Give 3 additional points for every player's game 2 score" - ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Kiara 10\n", - "Chris 6\n", - "Trish 14\n", - "Mikayla 5\n", - "Ann 8\n", - "Meena 23\n", - "dtype: int64" - ] - }, - "execution_count": 51, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "game2_points += 3\n", - "game2_points" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Compute total of two series" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Ann 18.0\n", - "Chris 18.0\n", - "Kiara 15.0\n", - "Meena NaN\n", - "Mikayla 14.0\n", - "Trish 22.0\n", - "dtype: float64" - ] - }, - "execution_count": 52, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Pandas can perform operations on two series by matching up their indices\n", - "total = game1_points + game2_points\n", - "total" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Who has the highest points?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 53, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "22.0\n", - "Trish\n" - ] - } - ], - "source": [ - "## Who has the most points?\n", - "print(total.max())\n", - "print(total.idxmax())" - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "15.0 15.0\n" - ] - } - ], - "source": [ - "print(total['Kiara'], total[2])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Boolean indexing\n", - "- applying boolean expressions on a Series\n", - "- boolean expression will be specified within the pair of [ ]\n", - "- Boolean operators:\n", - " - & means 'and'\n", - " - | means 'or'\n", - " - ~ means 'not'\n", - " - we must use () for compound boolean expressions" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s = pd.Series([10, 2, 3, 15])\n", - "s" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find all values > 8" - ] - }, - { - "cell_type": "code", - "execution_count": 55, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 True\n", - "1 False\n", - "2 False\n", - "3 True\n", - "dtype: bool" - ] - }, - "execution_count": 55, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# gives a boolean Series, where each value is True if the original Series values satifies the condition\n", - "b = s > 8\n", - "b" - ] - }, - { - "cell_type": "code", - "execution_count": 56, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "3 15\n", - "dtype: int64" - ] - }, - "execution_count": 56, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# now let's apply the boolean expression, which gives a boolean Series\n", - "s[b]" - ] - }, - { - "cell_type": 
"code", - "execution_count": 57, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "3 15\n", - "dtype: int64" - ] - }, - "execution_count": 57, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Equivalently, you can directly specify boolean expression inside the [ ]\n", - "s[s > 8]" - ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "3 15\n", - "dtype: int64" - ] - }, - "execution_count": 58, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Decomposing the steps here\n", - "# Above example is equivalent to\n", - "b = pd.Series([True, False, False, True])\n", - "s[b]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### How many students are 25 years or older?" - ] - }, - { - "cell_type": "code", - "execution_count": 59, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 False\n", - "1 False\n", - "2 False\n", - "3 False\n", - "4 False\n", - " ... 
\n", - "877 False\n", - "878 False\n", - "879 False\n", - "880 False\n", - "881 False\n", - "Length: 882, dtype: bool" - ] - }, - "execution_count": 59, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cs220_ages > 25" - ] - }, - { - "cell_type": "code", - "execution_count": 60, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "38 32\n", - "93 46\n", - "169 30\n", - "170 35\n", - "173 29\n", - "266 28\n", - "369 28\n", - "409 27\n", - "479 26\n", - "495 31\n", - "669 26\n", - "686 26\n", - "696 30\n", - "698 34\n", - "732 29\n", - "756 26\n", - "786 27\n", - "790 37\n", - "794 28\n", - "804 33\n", - "dtype: int64" - ] - }, - "execution_count": 60, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "cs220_ages[cs220_ages > 25]" - ] - }, - { - "cell_type": "code", - "execution_count": 61, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "20\n" - ] - } - ], - "source": [ - "print(len(cs220_ages[cs220_ages > 25]))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### How many students are in the age range 18 to 20, inclusive?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "(ages >= 18) & (ages <= 20)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ages[(ages >= 18) & (ages <= 20)]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "len(ages[(ages >= 18) & (ages <= 20)])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### what percentage of students are ages 18 OR 21?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "len((ages[ (ages == 18) | (ages == 20)])) / len(ages)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Plotting age information as a bar plot" - ] - }, - { - "cell_type": "code", - "execution_count": 62, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[Text(0.5, 0, 'age'), Text(0, 0.5, 'count')]" - ] - }, - "execution_count": 62, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAVL0lEQVR4nO3dfbRldX3f8feHGYoaHuRhQGCowzJDFEyC6e00CW3BYAMJtaARO3bFNUlpx64FgqltHUxaSVLMJBVTq2KDBUUjkolGmQajAqKWRhgGRGB4qFMYYcIIVyEBVihrMfn2j7Nnc3Ln3HvO3DvnnnPnvl9r7XX2+e393ft37t3nfu4+++GkqpAkCWC/UXdAkjQ+DAVJUstQkCS1DAVJUstQkCS1DAVJUmtooZDkJUk2JflOki1JfrNpPyzJDUm+2zwe2lVzcZKtSR5Mcsaw+iZJ6i3Duk4hSYAfqapnk+wP3AJcBLwZeLKq1idZBxxaVe9JciLwWWAVcAxwI3BCVe2cbh1HHHFErVixYij9l6R91R133PGDqlrWa9rSYa20OmnzbPN0/2Yo4GzgtKb9auDrwHua9mur6nng4SRb6QTEt6Zbx4oVK9i8efMwui9J+6wk35tu2lCPKSRZkuQu4Anghqq6DTiqqnYANI9HNrMfCzzaVb69aZMkzZOhhkJV7ayqk4HlwKokr51h9vRaxG4zJWuTbE6yeXJyci/1VJIE83T2UVX9JZ2Pic4EHk9yNEDz+EQz23bguK6y5cBjPZZ1RVVNVNXEsmU9PxKTJM3SMM8+Wpbk5c34S4E3AA8AG4E1zWxrgOua8Y3A6iQHJDkeWAlsGlb/JEm7G9qBZuBo4OokS+iEz4aq+tMk3wI2JDkPeAQ4F6CqtiTZANwHvACcP9OZR5KkvW9op6TOh4mJifLsI0naM0nuqKqJXtO8olmS1DIUJEmtYR5T0IBWrLt+2mnb1p81jz2RtNi5pyBJahkKkqSWoSBJahkKkqSWoSBJahkKkqSWoSBJahkKkqSWoSBJahkKkqSWoSBJahkKkqSWoSBJahkKkqSWoSBJahkKkqSWoSBJahkKkqSWoSBJahkKkqSWoSBJahkKkqSWoSBJahkKkqTW0EIhyXFJbk5yf5ItSS5q2i9J8hdJ7mqGX+yquTjJ1iQPJjljWH2TJPW2dIjLfgF4d1XdmeQg4I4kNzTTfr+qPtA9c5ITgdXAScAxwI1JTqiqnUPsoySpy9D2FKpqR1Xd2Yw/A9wPHDtDydnAtVX1fFU9DGwFVg2rf5Kk3c3LMYUkK4DXAbc1TRckuTvJVUkObdqOBR7tKtvOzCEiSdrLhh4KSQ4EPg+8q6qeBj4GvAo4GdgBXLZr1h7l1WN5a5NsTrJ5cnJyOJ2WpEVqqKGQZH86gfCZqvo
TgKp6vKp2VtXfAB/nxY+ItgPHdZUvBx6busyquqKqJqpqYtmyZcPsviQtOsM8+yjAlcD9VfXBrvaju2Z7E3BvM74RWJ3kgCTHAyuBTcPqnyRpd8M8++gU4O3APUnuatreC7wtycl0PhraBrwDoKq2JNkA3EfnzKXzPfNIkubX0EKhqm6h93GCL81Qcylw6bD6JEmamVc0S5JahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqbV01B3YV6xYd/2007atP2seeyJJs+eegiSpZShIklqGgiSpZShIklqGgiSpZShIklpDC4UkxyW5Ocn9SbYkuahpPyzJDUm+2zwe2lVzcZKtSR5Mcsaw+iZJ6m2YewovAO+uqtcAPw2cn+REYB1wU1WtBG5qntNMWw2cBJwJXJ5kyRD7J0maYmihUFU7qurOZvwZ4H7gWOBs4OpmtquBc5rxs4Frq+r5qnoY2AqsGlb/JEm7m5djCklWAK8DbgOOqqod0AkO4MhmtmOBR7vKtjdtU5e1NsnmJJsnJyeH2m9JWmyGHgpJDgQ+D7yrqp6eadYebbVbQ9UVVTVRVRPLli3bW92UJDHkUEiyP51A+ExV/UnT/HiSo5vpRwNPNO3bgeO6ypcDjw2zf5Kkv22YZx8FuBK4v6o+2DVpI7CmGV8DXNfVvjrJAUmOB1YCm4bVP0nS7oZ5l9RTgLcD9yS5q2l7L7Ae2JDkPOAR4FyAqtqSZANwH50zl86vqp1D7J8kaYqhhUJV3ULv4wQAp09Tcylw6bD6JEmamVc0S5JahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqTXMex9pHqxYd/2M07etP2ueeiJpX+CegiSpZShIklqGgiSpZShIklqGgiSpZShIkloDhUKSmwZpkyQtbDNep5DkJcDLgCOSHMqLX695MHDMkPsmSZpn/S5eewfwLjoBcAcvhsLTwEeH1y1J0ijMGApV9SHgQ0neWVUfnqc+SZJGZKDbXFTVh5P8LLCiu6aqPjWkfkmSRmCgUEjyaeBVwF3Azqa5AENBkvYhg94QbwI4sapqmJ2RJI3WoNcp3Au8YpgdkSSN3qB7CkcA9yXZBDy/q7Gq/tlQeiVJGolBQ+GSPV1wkquAfwo8UVWvbdouAf41MNnM9t6q+lIz7WLgPDrHLC6sqq/s6TolSXMz6NlH35jFsj8JfITdD0b/flV9oLshyYnAauAkOtdE3JjkhKraiSRp3gx6m4tnkjzdDP8vyc4kT89UU1XfBJ4csB9nA9dW1fNV9TCwFVg1YK0kaS8ZKBSq6qCqOrgZXgL8Ep29gNm4IMndSa5qbp0BcCzwaNc825s2SdI8mtVdUqvqi8DPzaL0Y3SudzgZ2AFc1rSnx7w9T39NsjbJ5iSbJycne80iSZqlQS9ee3PX0/3oXLewx9csVNXjXcv8OPCnzdPtwHFdsy4HHptmGVcAVwBMTEx43YQk7UWDnn30xq7xF4BtdI4D7JEkR1fVjubpm+hc/wCwEbgmyQfpHGheCWza0+VLkuZm0LOPfnVPF5zks8BpdG67vR14H3BakpPp7GVso3MXVqpqS5INwH10Qud8zzySpPk36MdHy4EPA6fQ+YN+C3BRVW2frqaq3taj+coZ5r8UuHSQ/kiShmPQA82foPMRzzF0zgr6n02bJGkfMmgoLKuqT1TVC83wSWDZEPslSRqBQUPhB0l+OcmSZvhl4IfD7Jgkaf4NGgr/Engr8H061xe8Bdjjg8+SpPE26Cmpvw2sqaqnAJIcBnyATlhIkvYRg+4p/MSuQACoqieB1w2nS5KkURk0FPbruk/Rrj2FQfcyJEkLxKB/2C8D/jzJ5+hcp/BWvKZAkvY5g17R/Kkkm+ncBC/Am6vqvqH2TJI07wb+CKgJAYN
AkvZhs7p1tiRp32QoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqWUoSJJahoIkqTW0UEhyVZInktzb1XZYkhuSfLd5PLRr2sVJtiZ5MMkZw+qXJGl6w9xT+CRw5pS2dcBNVbUSuKl5TpITgdXASU3N5UmWDLFvkqQehhYKVfVN4MkpzWcDVzfjVwPndLVfW1XPV9XDwFZg1bD6Jknqbek8r++oqtoBUFU7khzZtB8L3No13/amTUO0Yt31M07ftv6seeqJpHExLgea06Otes6YrE2yOcnmycnJIXdLkhaX+Q6Fx5McDdA8PtG0bweO65pvOfBYrwVU1RVVNVFVE8uWLRtqZyVpsZnvUNgIrGnG1wDXdbWvTnJAkuOBlcCmee6bJC16QzumkOSzwGnAEUm2A+8D1gMbkpwHPAKcC1BVW5JsAO4DXgDOr6qdw+qbJKm3oYVCVb1tmkmnTzP/pcClw+qPJKm/cTnQLEkaA4aCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWktHsdIk24BngJ3AC1U1keQw4I+AFcA24K1V9dQo+idJi9Uo9xReX1UnV9VE83wdcFNVrQRuap5LkubROH18dDZwdTN+NXDO6LoiSYvTqEKhgK8muSPJ2qbtqKraAdA8HtmrMMnaJJuTbJ6cnJyn7krS4jCSYwrAKVX1WJIjgRuSPDBoYVVdAVwBMDExUcPqoCQtRiPZU6iqx5rHJ4AvAKuAx5McDdA8PjGKvknSYjbvoZDkR5IctGsc+HngXmAjsKaZbQ1w3Xz3TZIWu1F8fHQU8IUku9Z/TVV9OcntwIYk5wGPAOeOoG+StKjNeyhU1UPAT/Zo/yFw+nz3R5L0onE6JVWSNGKjOvtIC9yKddfPOH3b+rPmqSeS9ib3FCRJLUNBktQyFCRJLUNBktQyFCRJLUNBktQyFCRJLUNBktQyFCRJLUNBktQyFCRJLUNBktQyFCRJLUNBktQyFCRJLUNBktQyFCRJLUNBktQyFCRJLb+jWSMx03c8+/3O0ugYCg2/iF6S/PhIktTFUJAktfz4SIuKHxNKMxu7UEhyJvAhYAnwP6pq/Yi7pDHjQWppeMYqFJIsAT4K/BNgO3B7ko1Vdd9oeya5l6HFYaxCAVgFbK2qhwCSXAucDRgKWtTcO9J8GbdQOBZ4tOv5duAfDFrsf3IaZ6P6wz6X98WoavvVj6q2X/1C/Xl1S1UNPPOwJTkXOKOq/lXz/O3Aqqp6Z9c8a4G1zdMfAx6cYZFHAD+YZXdGVTvKdS/E2lGu29e8MGpHue5xfc2vrKplPadU1dgMwM8AX+l6fjFw8RyWt3mh1S7Ufvvz8jWPa+1C7feoXvO4XadwO7AyyfFJ/g6wGtg44j5J0qIxVscUquqFJBcAX6FzSupVVbVlxN2SpEVjrEIBoKq+BHxpLy3uigVYO8p1L8TaUa7b17wwake57gX3msfqQLMkabTG7ZiCJGmEDAVJUstQkCS1DAUtaEmOHOG6Dx/VujV/RrmNzdZcts19OhT29TdtkkOSrE/yQJIfNsP9TdvL57DcP+sz/eAkv5Pk00n+xZRpl/epfUWSjyX5aJLDk1yS5J4kG5Ic3af2sCnD4cCmJIcmOWyA13Vm1/ghSa5McneSa5Ic1ad2fZIjmvGJJA8BtyX5XpJT+9TemeQ3kryqXx971E4kuTnJHyY5LskNSf4qye1JXjdA/YFJfivJlqZuMsmtSX5lgNpFtX019bPexua4fc1lG5n1ttnLPhMKi/FNC2wAngJOq6rDq+pw4PVN2x/3We9PTTP8PeDkPuv9BBDg88DqJJ9PckAz7af71H6
Szg0OHwVuBp4DzgL+F/Df+9T+ALija9hM535Zdzbj/by/a/wyYAfwRjoXTf5Bn9qzqmrXLQP+C/DPq+pH6dzR97I+tYcCLwduTrIpya8lOWaA/gJcDvwecD3w58AfVNUhwLpmWj+fAR4CzgB+E/hvwNuB1yd5/0yFLL7tC+a2jc1l+5rLNjKXbXN3c7mMepwG4J6u8ZuBv9+Mn0Cfy72Bh4EPAI8Am4BfA44ZcL2bgF8A3kZnQ3xL03468K0B6q8DfgVYDvxb4D8CK4Grgff3qX1wNtOa6TuBrzU/q6nDc31q75ry/NeB/w0cDtzZp/bbXeOPzLTcHrX/Dvgy8OPdv7s92EbunG5dA6z7AWBpM37rdNveAOv9R3T+mH+/+VmvncPP69sz1TbzfGfK89ubx/2AB9y+9t42Nsftay7byKy3zZ7L29OCcR0W6Zv2q8B/AI7qajsKeA9wY5/ae4GV00x7tE/t/cB+U9rWAFuA7w36eoH/vCe/p2ae5XT+S/0gcBDw0B5sI9vpBO+76fz3nK5pd/epfWfz8/454BLgvwL/mM5/358edPvqalsCnAl8ok/tt4CfB84Fvgec07SfygD3tqGzd/EPm/E38rfvLdbvD/u+tn3N+Due6zY2x+1rLtvIrLfNnsvb04JxHRbpm/ZQ4HfpBOJTwJPNG+p3gcP61L4F+LFppp3Tp/b3gDf0aD8T+G6f2t8CDuzR/qPA5/bg9/1G4Fbg+3tQ874pw7Km/RXApwaoPw34I+DbwD10rrxfC+zfp+7aOWzXP0nnti9/BryazrcS/iWdP5A/O2D9pqbmll2/c2AZcKHb197bxuayfc1lG2nqX99j23xHv22z57Lm0pFxG2Z40y4d1i9kL7xpf2LKm/aEpr3vm7aZ79XAG6a+EYAzB6w9fS/X/sJ8rRd4KfDaQWuH+JqHXfua2dZ21c92G1nFix/FnkTnv+BfHHC93bUn0vkveqxre9T/OPAbC6HfU5bV95+caWtnW7iQBuBXF1rtIPXAhXS+T+KLwDbg7K5p/T57nUvtO0dUO+s+j3Lde2G9D8zhNc+6ns5/u7fSOcD6O8BNwH8Cvgn8+h7Wfm3caxfwa97YY3h213i/17zb8va0YCEOTPmsfyHUDlJPZ2/owGZ8RbNBXdQ8/7a147HuBf6alwAvA54GDm7aX0r/z8gXXO1C7TedM6P+kM4nJac2jzua8VP7veapw9jdJXW2ktw93SQ6B8fGrnYv1C+pqmcBqmpbktOAzyV5ZVNv7Xise6G+5heqaifw10n+b1U93SznuSR/sw/WLtR+TwAX0TlL699X1V1Jnquqb/Sp621PU2RcB+BxOuc/v3LKsAJ4bBxr98K6vwacPKVtKfApYKe147HuBfyabwNe1ozv19V+CP0/elpwtQu53828u86a+ghz+YRjtoXjNgBX0pzF02PaNeNYuxfWvRx4xTTTTrF2PNa9gF/zAdO0H0HXefz7Su1C7veUmrPoc43TTIPfpyBJau0zt7mQJM2doSBJahkKkqSWoSBJahkK0iwl+WKSO5rbnq9t2s5L8n+SfD3Jx5N8pGlf1twC+vZmOGW0vZd68+wjaZaSHFZVTyZ5KZ175p9B5xbPPwU8Q+cage9U1QVJrgEur6pbkvxdOjc+fM3IOi9NY5+5olkagQuTvKkZP47Ol9d8o6qeBEjyx3S+zwM6N6Q7MWkvIj44yUFV9cx8dljqx1CQZqG5XcQbgJ+pqr9O8nU6N72b7r///Zp5n5uXDkqz5DEFaXYOAZ5qAuHVdL4m8mXAqel8n+9S4Je65v8qcMGuJ0lOns/OSoMyFKTZ+TKwtLmh4W/Tue3xX9D5nt7bgBvpfFfwXzXzXwhMpPNF7vcB/2b+uyz154FmaS9KcmBVPdvsKXwBuKqqvjDqfkmDck9B2rsuSXIXne8ofpjOl9tIC4Z7CpKklnsKkqSWoSBJahkKkqSWoSBJahkKkqSWoSBJav1/Dw/IUUxWqiE
AAAAASUVORK5CYII=\n", - "text/plain": [ - "<Figure size 432x288 with 1 Axes>" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "age_plot = cs220_ages.value_counts().sort_index().plot.bar()\n", - "age_plot.set(xlabel = \"age\", ylabel = \"count\")" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.8" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/pandas_1_worksheet-checkpoint.ipynb b/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/pandas_1_worksheet-checkpoint.ipynb deleted file mode 100644 index f70acea..0000000 --- a/f22/meena_lec_notes/lec-29/.ipynb_checkpoints/pandas_1_worksheet-checkpoint.ipynb +++ /dev/null @@ -1,2042 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Pandas 1 worksheet\n", - "\n", - "- Observe syntax, predict output and run cell to confirm your output" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<style>.container { width:100% !important; }</style>" - ], - "text/plain": [ - "<IPython.core.display.HTML object>" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "from IPython.core.display import display, HTML\n", - "display(HTML(\"<style>.container { width:100% !important; }</style>\"))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Learning objectives" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - " - Pandas helps deal with tabular (tables) data\n", - " - List of list is not adequate 
alternative to excel\n", - " - Series: new data structure\n", - " - hybrid of a dict and a list\n", - " - Python dict \"key\" equivalent to \"index\" in pandas\n", - " - Python list \"index\" quivalent to \"integer position\" in pandas\n", - " - supports complicated expressions within lookup [...]\n", - " - element-wise operation\n", - " - boolean indexing\n", - " - DataFrames aka tables (next lecture)\n", - " - built from series\n", - " - each series will be a column in the table" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# pandas comes with Anaconda installation\n", - "If for some reason, you don't have pandas installed, run the following command in terminal or powershell\n", - "<pre> pip install pandas </pre>" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "pandas.core.series.Series" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "pandas.Series" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Module naming abbreviation" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "pandas.core.series.Series" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "pd.Series" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a series from a dict" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'one': 7, 'two': 8, 'three': 9}" - ] - }, - "execution_count": 6, - 
"metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#create a series from a dict\n", - "d = {\"one\":7, \"two\":8, \"three\":9}\n", - "d" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series({\"one\":7, \"two\":8, \"three\":9})\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "# IP index value\n", - "# 0 one 7\n", - "# 1 two 8\n", - "# 2 three 9\n", - "\n", - "# dtype: int64" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing values with index (.loc[...])" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# dict access with key\n", - "d[\"one\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.loc[\"one\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "8" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.loc[\"two\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing values with integer position (.iloc[...])" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": 
"execute_result" - } - ], - "source": [ - "s.iloc[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "8" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.iloc[1]" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "9" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.iloc[-1]" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[\"one\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "7" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[0]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessing multiple values with a list of integer positions" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[[0, 2]]" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#series access with a list of indexes\n", - "s[[\"one\", \"three\"]]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a series from a list" - ] - }, - { - 
"cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 100\n", - "1 200\n", - "2 300\n", - "dtype: int64" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Series created from a list\n", - "num_list = [100, 200, 300]\n", - "s = pd.Series([100, 200, 300])\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [], - "source": [ - "# IP index value\n", - "# 0 0 100\n", - "# 1 1 200\n", - "# 2 2 300\n", - "# dtype: int64" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "200\n", - "200\n" - ] - } - ], - "source": [ - "print(s.loc[1])\n", - "print(s.iloc[1])" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [], - "source": [ - "letters_list = [\"A\", \"B\", \"C\", \"D\"]\n", - "letters = pd.Series(letters_list)\n", - "# letters[-1] #Avoid negative indexes, unless we use .iloc" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Slicing series using integer positions" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 A\n", - "1 B\n", - "2 C\n", - "3 D\n", - "dtype: object" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "letters_list = [\"A\", \"B\", \"C\", \"D\"]\n", - "letters = pd.Series(letters_list)\n", - "letters" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['A', 'B', 'C', 'D']" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#list slicing reveiw\n", - "letters_list" - ] - }, - { - "cell_type": "code", - "execution_count": 25, 
- "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['C', 'D']" - ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sliced_letter_list = letters_list[2:]\n", - "sliced_letter_list" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'C'" - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sliced_letter_list[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 A\n", - "1 B\n", - "2 C\n", - "3 D\n", - "dtype: object" - ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#series slicing\n", - "letters" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "2 C\n", - "3 D\n", - "dtype: object" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sliced_letters = letters[2:]\n", - "sliced_letters" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'C'" - ] - }, - "execution_count": 29, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sliced_letters.loc[2]" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'C'" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sliced_letters.iloc[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [], - "source": [ - "# sliced_letter.loc[0] # index 0 doesn't exist in the sliced series!" 
- ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'C'" - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sliced_letters[2]" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [], - "source": [ - "# Note: integer positions get renumbered, whereas indexes do not.\n", - "\n", - "# IP Index values\n", - "# 0 2 c\n", - "# 1 3 d\n", - "# 2 4 e\n", - "# 3 5 f\n", - "# dtype: object" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Slicing series using index" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "one 7\n", - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series({\"one\":7, \"two\":8, \"three\":9})\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "two 8\n", - "three 9\n", - "dtype: int64" - ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#slicing with indexes\n", - "s[\"two\":]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Element-wise operations\n", - "1. SERIES op SCALAR\n", - "2. 
SERIES op SERIES" - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[1, 2, 3, 1, 2, 3, 1, 2, 3]" - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#list recap\n", - "nums = [1, 2, 3]\n", - "nums * 3" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 1\n", - "1 2\n", - "2 3\n", - "dtype: int64" - ] - }, - "execution_count": 37, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "snum = pd.Series(nums)\n", - "snum" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 3\n", - "1 6\n", - "2 9\n", - "dtype: int64" - ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "snum * 3" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 4\n", - "1 5\n", - "2 6\n", - "dtype: int64" - ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "snum + 3" - ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 0.333333\n", - "1 0.666667\n", - "2 1.000000\n", - "dtype: float64" - ] - }, - "execution_count": 40, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "snum / 3" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[1, 2, 3]" - ] - }, - "execution_count": 41, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "nums" - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [], - "source": [ - "# nums / 3 # doesn't work with 
lists" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 1\n", - "1 2\n", - "2 3\n", - "dtype: int64" - ] - }, - "execution_count": 43, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "snum" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 3\n", - "1 4\n", - "2 5\n", - "dtype: int64" - ] - }, - "execution_count": 44, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "snum += 2\n", - "snum" - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[1, 2, 3, 4, 5, 6]" - ] - }, - "execution_count": 45, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#list recap\n", - "l1 = [1, 2, 3]\n", - "l2 = [4, 5, 6]\n", - "l1 + l2" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 1\n", - "1 2\n", - "2 3\n", - "dtype: int64\n", - "0 4\n", - "1 5\n", - "2 6\n", - "dtype: int64\n" - ] - }, - { - "data": { - "text/plain": [ - "0 5\n", - "1 7\n", - "2 9\n", - "dtype: int64" - ] - }, - "execution_count": 46, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s1 = pd.Series(l1)\n", - "s2 = pd.Series(l2)\n", - "print(s1)\n", - "print(s2)\n", - "s1 + s2" - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 1\n", - "1 2\n", - "2 3\n", - "dtype: int64\n", - "0 4\n", - "1 5\n", - "2 6\n", - "dtype: int64\n" - ] - }, - { - "data": { - "text/plain": [ - "0 4\n", - "1 10\n", - "2 18\n", - "dtype: int64" - ] - }, - "execution_count": 47, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "print(s1)\n", - 
"print(s2)\n", - "s1 * s2" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 1\n", - "1 2\n", - "2 3\n", - "dtype: int64\n", - "0 4\n", - "1 5\n", - "2 6\n", - "dtype: int64\n" - ] - }, - { - "data": { - "text/plain": [ - "0 0.25\n", - "1 0.40\n", - "2 0.50\n", - "dtype: float64" - ] - }, - "execution_count": 48, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "print(s1)\n", - "print(s2)\n", - "s1 / s2" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 1\n", - "1 2\n", - "2 3\n", - "dtype: int64\n", - "0 4\n", - "1 5\n", - "2 6\n", - "dtype: int64\n" - ] - }, - { - "data": { - "text/plain": [ - "0 4\n", - "1 25\n", - "2 216\n", - "dtype: int64" - ] - }, - "execution_count": 49, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "print(s1)\n", - "print(s2)\n", - "s2 ** s1" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 1\n", - "1 2\n", - "2 3\n", - "dtype: int64\n", - "0 4\n", - "1 5\n", - "2 6\n", - "dtype: int64\n" - ] - }, - { - "data": { - "text/plain": [ - "0 True\n", - "1 True\n", - "2 True\n", - "dtype: bool" - ] - }, - "execution_count": 50, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "print(s1)\n", - "print(s2)\n", - "s1 < s2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## What happens to element-wise operation if we have two series with different sizes?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 5.0\n", - "1 7.0\n", - "2 NaN\n", - "dtype: float64" - ] - }, - "execution_count": 51, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "pd.Series([1,2,3]) + pd.Series([4,5])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Series with different types" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 a\n", - "1 Alice\n", - "2 True\n", - "3 1\n", - "4 4.5\n", - "5 [1, 2]\n", - "6 {'a': 'Alice'}\n", - "dtype: object" - ] - }, - "execution_count": 52, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "pd.Series([\"a\", \"Alice\", True, 1, 4.5, [1,2], {\"a\":\"Alice\"}])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How do you merge two series?" - ] - }, - { - "cell_type": "code", - "execution_count": 53, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 1\n", - "1 2\n", - "2 3\n", - "dtype: int64\n", - "0 4\n", - "1 5\n", - "dtype: int64\n" - ] - } - ], - "source": [ - "s1 = pd.Series([1,2,3]) \n", - "s2 = pd.Series([4,5])\n", - "print(s1)\n", - "print(s2)" - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 1\n", - "1 2\n", - "2 3\n", - "0 4\n", - "1 5\n", - "dtype: int64" - ] - }, - "execution_count": 54, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.concat( [s1, s2] )\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 55, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 1\n", - "0 4\n", - "dtype: int64" - ] - }, - "execution_count": 55, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s.loc[0]" - ] - 
}, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Element-wise Ambiguity" - ] - }, - { - "cell_type": "code", - "execution_count": 56, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "A 10\n", - "B 20\n", - "dtype: int64\n", - "B 1\n", - "A 2\n", - "dtype: int64\n" - ] - } - ], - "source": [ - "s1 = pd.Series({\"A\":10, \"B\": 20 })\n", - "s2 = pd.Series({\"B\":1, \"A\": 2 })\n", - "print(s1)\n", - "print(s2)" - ] - }, - { - "cell_type": "code", - "execution_count": 57, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "A 12\n", - "B 21\n", - "dtype: int64" - ] - }, - "execution_count": 57, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# INDEX ALIGNMENT\n", - "s1 + s2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How to insert an index-value pair?" - ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "A 10\n", - "B 20\n", - "dtype: int64\n" - ] - }, - { - "data": { - "text/plain": [ - "A 10\n", - "B 20\n", - "Z 100\n", - "dtype: int64" - ] - }, - "execution_count": 58, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series({\"A\":10, \"B\": 20 })\n", - "print(s)\n", - "s[\"Z\"] = 100\n", - "s" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Boolean indexing" - ] - }, - { - "cell_type": "code", - "execution_count": 59, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "1 2\n", - "2 3\n", - "3 15\n", - "dtype: int64" - ] - }, - "execution_count": 59, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series([10, 2, 3, 15])\n", - "s" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How to extract numbers > 8?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 60, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 True\n", - "1 False\n", - "2 False\n", - "3 True\n", - "dtype: bool" - ] - }, - "execution_count": 60, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "b = pd.Series([True, False, False, True])\n", - "b" - ] - }, - { - "cell_type": "code", - "execution_count": 61, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "3 15\n", - "dtype: int64" - ] - }, - "execution_count": 61, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[b]" - ] - }, - { - "cell_type": "code", - "execution_count": 62, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "1 2\n", - "2 3\n", - "3 15\n", - "dtype: int64" - ] - }, - "execution_count": 62, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 63, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 True\n", - "1 False\n", - "2 False\n", - "3 True\n", - "dtype: bool" - ] - }, - "execution_count": 63, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "b = s > 8\n", - "b" - ] - }, - { - "cell_type": "code", - "execution_count": 64, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "3 15\n", - "dtype: int64" - ] - }, - "execution_count": 64, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[b]" - ] - }, - { - "cell_type": "code", - "execution_count": 65, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "3 15\n", - "dtype: int64" - ] - }, - "execution_count": 65, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[s > 8]" - ] - }, - { - "cell_type": "code", - "execution_count": 66, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ 
- "0 10\n", - "3 15\n", - "dtype: int64" - ] - }, - "execution_count": 66, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s[pd.Series([True, False, False, True])]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Element-wise String operations" - ] - }, - { - "cell_type": "code", - "execution_count": 67, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 APPLE\n", - "1 boy\n", - "2 CAT\n", - "3 dog\n", - "dtype: object" - ] - }, - "execution_count": 67, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "words = pd.Series([\"APPLE\", \"boy\", \"CAT\", \"dog\"])\n", - "words" - ] - }, - { - "cell_type": "code", - "execution_count": 68, - "metadata": {}, - "outputs": [], - "source": [ - "# words.upper() # can't call string functions on Series" - ] - }, - { - "cell_type": "code", - "execution_count": 69, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 APPLE\n", - "1 BOY\n", - "2 CAT\n", - "3 DOG\n", - "dtype: object" - ] - }, - "execution_count": 69, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "words.str.upper()" - ] - }, - { - "cell_type": "code", - "execution_count": 70, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 True\n", - "1 False\n", - "2 True\n", - "3 False\n", - "dtype: bool" - ] - }, - "execution_count": 70, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#words[BOOLEAN SERIES]\n", - "#How do we get BOOLEAN SERIES?\n", - "b = words == words.str.upper()\n", - "b" - ] - }, - { - "cell_type": "code", - "execution_count": 71, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 APPLE\n", - "2 CAT\n", - "dtype: object" - ] - }, - "execution_count": 71, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "words[b]" - ] - }, - { - "cell_type": "code", - "execution_count": 72, - "metadata": {}, - 
"outputs": [ - { - "data": { - "text/plain": [ - "0 APPLE\n", - "2 CAT\n", - "dtype: object" - ] - }, - "execution_count": 72, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "words[words == words.str.upper()]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How to get the odd numbers from a list?" - ] - }, - { - "cell_type": "code", - "execution_count": 73, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "1 19\n", - "2 11\n", - "3 30\n", - "4 35\n", - "dtype: int64" - ] - }, - "execution_count": 73, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s = pd.Series([10, 19, 11, 30, 35])\n", - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 74, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 0\n", - "1 1\n", - "2 1\n", - "3 0\n", - "4 1\n", - "dtype: int64" - ] - }, - "execution_count": 74, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s % 2" - ] - }, - { - "cell_type": "code", - "execution_count": 75, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 False\n", - "1 True\n", - "2 True\n", - "3 False\n", - "4 True\n", - "dtype: bool" - ] - }, - "execution_count": 75, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "b = s % 2 == 1\n", - "b" - ] - }, - { - "cell_type": "code", - "execution_count": 76, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "1 19\n", - "2 11\n", - "3 30\n", - "4 35\n", - "dtype: int64" - ] - }, - "execution_count": 76, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 77, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "1 19\n", - "2 11\n", - "4 35\n", - "dtype: int64" - ] - }, - "execution_count": 77, - "metadata": {}, - "output_type": "execute_result" - } - ], - 
"source": [ - "s[b]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## BOOLEAN OPERATORS on series: and, or, not " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## How to get numbers < 12 or numbers > 33?" - ] - }, - { - "cell_type": "code", - "execution_count": 78, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 10\n", - "1 19\n", - "2 11\n", - "3 30\n", - "4 35\n", - "dtype: int64" - ] - }, - "execution_count": 78, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "s" - ] - }, - { - "cell_type": "code", - "execution_count": 79, - "metadata": {}, - "outputs": [], - "source": [ - "# s[s < 12 or s > 33] # doesn't work with or, and, not" - ] - }, - { - "cell_type": "code", - "execution_count": 80, - "metadata": {}, - "outputs": [ - { - "ename": "ValueError", - "evalue": "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m<ipython-input-80-743185dad521>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# use | instead of or\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0ms\u001b[0m\u001b[0;34m[\u001b[0m \u001b[0ms\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m12\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0ms\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m33\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;31m# error because precedence is so high\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;31m# s[ s < (12 | s) > 
33]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36m__nonzero__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1440\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mfinal\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1441\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__nonzero__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1442\u001b[0;31m raise ValueError(\n\u001b[0m\u001b[1;32m 1443\u001b[0m \u001b[0;34mf\"The truth value of a {type(self).__name__} is ambiguous. \"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1444\u001b[0m \u001b[0;34m\"Use a.empty, a.bool(), a.item(), a.any() or a.all().\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mValueError\u001b[0m: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." 
- ] - } - ], - "source": [ - "# use | instead of or\n", - "s[ s < 12 | s > 33]\n", - "# error because precedence is so high\n", - "# s[ s < (12 | s) > 33]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Use lots of parenthesis\n", - "s[ (s < 12) | (s > 33)]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# AND is &\n", - "s[ (s > 12) & (s < 33)]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# NOT is ~\n", - "s[ ~((s > 12) & (s < 33))]" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.8" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/f22/meena_lec_notes/lec-29/lec_29_web1.ipynb b/f22/meena_lec_notes/lec-29/lec_29_web1.ipynb index e214a4a..655dcac 100644 --- a/f22/meena_lec_notes/lec-29/lec_29_web1.ipynb +++ b/f22/meena_lec_notes/lec-29/lec_29_web1.ipynb @@ -1375,7 +1375,7 @@ "metadata": {}, "source": [ "## requests.get : Simple string example\n", - "- URL: https://www.msyamkumar.com/hello.txt" + "- URL: https://cs220.cs.wisc.edu/hello.txt" ] }, { @@ -1394,7 +1394,7 @@ } ], "source": [ - "url = \"https://www.msyamkumar.com/hello.txt\"\n", + "url = \"https://cs220.cs.wisc.edu/hello.txt\"\n", "r = requests.get(url) # r is the response\n", "print(r.status_code)\n", "print(r.text)" @@ -1409,20 +1409,38 @@ "name": "stdout", "output_type": "stream", "text": [ - "404\n", - "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n", - "<Error><Code>NoSuchKey</Code><Message>The specified key does not 
exist.</Message><Key>meena/hello.txttttt</Key><RequestId>9PAFR0FANW1CRPTP</RequestId><HostId>Y+VL63r3qTktX1ZLIpaUvaSXOhstWA4yhSSA6RKCRumeA5+WK+ht7TbROpUZtVjmpGT/QaJcYA0=</HostId></Error>\n" + "403\n", + "<html>\n", + "<head><title>403 Forbidden</title></head>\n", + "<body>\n", + "<h1>403 Forbidden</h1>\n", + "<ul>\n", + "<li>Code: AccessDenied</li>\n", + "<li>Message: Access Denied</li>\n", + "<li>RequestId: RSKJ3EYVNRYREDMQ</li>\n", + "<li>HostId: l6ZMrsw5g6KOT3fA0zTwyNHdXcngrnGkpT2nJe92rIBllfDi2Vbrz6jLPcUVl3yvQ+45SAg8ebg=</li>\n", + "</ul>\n", + "<h3>An Error Occurred While Attempting to Retrieve a Custom Error Document</h3>\n", + "<ul>\n", + "<li>Code: AccessDenied</li>\n", + "<li>Message: Access Denied</li>\n", + "</ul>\n", + "<hr/>\n", + "</body>\n", + "</html>\n", + "\n" ] } ], "source": [ "# Q: What if the web site does not exist?\n", - "typo_url = \"https://www.msyamkumar.com/hello.txttttt\"\n", + "typo_url = \"https://cs220.cs.wisc.edu/hello.txttttt\"\n", "r = requests.get(typo_url)\n", "print(r.status_code)\n", "print(r.text)\n", "\n", - "# A: We get a 404 (client error)" + "# A: We get a 403 (Forbidden error)\n", + "# The most common error that you will encounter is 404 (File not found)" ] }, { @@ -1437,14 +1455,14 @@ "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m/var/folders/k6/kcy8b4f57hx9f1wh4sbs8mn40000gn/T/ipykernel_11873/2682133174.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mtypo_url\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"https://www.msyamkumar.com/hello.txttttt\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mr\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mrequests\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtypo_url\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstatus_code\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m200\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstatus_code\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "Input \u001b[0;32mIn [14]\u001b[0m, in \u001b[0;36m<cell line: 4>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m typo_url \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhttps://cs220.cs.wisc.edu/hello.txttttt\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 3\u001b[0m r \u001b[38;5;241m=\u001b[39m requests\u001b[38;5;241m.\u001b[39mget(typo_url)\n\u001b[0;32m----> 4\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m r\u001b[38;5;241m.\u001b[39mstatus_code \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m200\u001b[39m\n\u001b[1;32m 5\u001b[0m \u001b[38;5;28mprint\u001b[39m(r\u001b[38;5;241m.\u001b[39mstatus_code)\n\u001b[1;32m 6\u001b[0m \u001b[38;5;28mprint\u001b[39m(r\u001b[38;5;241m.\u001b[39mtext)\n", "\u001b[0;31mAssertionError\u001b[0m: " ] } ], "source": [ "# We can check for a status_code error by using an assert\n", - "typo_url = \"https://www.msyamkumar.com/hello.txttttt\"\n", + "typo_url = \"https://cs220.cs.wisc.edu/hello.txttttt\"\n", "r = requests.get(typo_url)\n", "assert r.status_code == 200\n", "print(r.status_code)\n", @@ -1458,14 +1476,14 @@ "outputs": [ { "ename": "HTTPError", - 
"evalue": "404 Client Error: Not Found for url: https://www.msyamkumar.com/hello.txttttt", + "evalue": "403 Client Error: Forbidden for url: https://cs220.cs.wisc.edu/hello.txttttt", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mHTTPError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m/var/folders/k6/kcy8b4f57hx9f1wh4sbs8mn40000gn/T/ipykernel_11873/4051826470.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Instead of using an assert, we often use raise_for_status()\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrequests\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtypo_url\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mraise_for_status\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m#similar to asserting r.status_code == 200\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/requests/models.py\u001b[0m in \u001b[0;36mraise_for_status\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 951\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 952\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mhttp_error_msg\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 953\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mHTTPError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhttp_error_msg\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mresponse\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 954\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 955\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mHTTPError\u001b[0m: 404 Client Error: Not Found for url: https://www.msyamkumar.com/hello.txttttt" + "Input \u001b[0;32mIn [15]\u001b[0m, in \u001b[0;36m<cell line: 3>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;66;03m# Instead of using an assert, we often use raise_for_status()\u001b[39;00m\n\u001b[1;32m 2\u001b[0m r \u001b[38;5;241m=\u001b[39m requests\u001b[38;5;241m.\u001b[39mget(typo_url)\n\u001b[0;32m----> 3\u001b[0m \u001b[43mr\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mraise_for_status\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m#similar to asserting r.status_code == 200\u001b[39;00m\n\u001b[1;32m 4\u001b[0m r\u001b[38;5;241m.\u001b[39mtext\n", + "File \u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/requests/models.py:960\u001b[0m, in \u001b[0;36mResponse.raise_for_status\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 957\u001b[0m http_error_msg \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mu\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m Server Error: \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m for url: \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m'\u001b[39m \u001b[38;5;241m%\u001b[39m (\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mstatus_code, reason, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39murl)\n\u001b[1;32m 959\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m http_error_msg:\n\u001b[0;32m--> 960\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m HTTPError(http_error_msg, 
response\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m)\n", + "\u001b[0;31mHTTPError\u001b[0m: 403 Client Error: Forbidden for url: https://cs220.cs.wisc.edu/hello.txttttt" ] } ], @@ -1484,19 +1502,10 @@ "metadata": {}, "outputs": [ { - "ename": "NameError", - "evalue": "name 'HTTPError' is not defined", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mHTTPError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m/var/folders/k6/kcy8b4f57hx9f1wh4sbs8mn40000gn/T/ipykernel_11873/2028031330.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrequests\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtypo_url\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mraise_for_status\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m#similar to asserting r.status_code == 200\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/requests/models.py\u001b[0m in \u001b[0;36mraise_for_status\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 952\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mhttp_error_msg\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 953\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mHTTPError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhttp_error_msg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 
954\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mHTTPError\u001b[0m: 404 Client Error: Not Found for url: https://www.msyamkumar.com/hello.txttttt", - "\nDuring handling of the above exception, another exception occurred:\n", - "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m/var/folders/k6/kcy8b4f57hx9f1wh4sbs8mn40000gn/T/ipykernel_11873/2028031330.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mraise_for_status\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m#similar to asserting r.status_code == 200\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;32mexcept\u001b[0m \u001b[0mHTTPError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# What's still wrong here?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"oops!!\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mNameError\u001b[0m: name 'HTTPError' is not defined" + "name": "stdout", + "output_type": "stream", + "text": [ + "oops!! 
403 Client Error: Forbidden for url: https://cs220.cs.wisc.edu/hello.txttttt\n" ] } ], @@ -1507,15 +1516,23 @@ " r = requests.get(typo_url)\n", " r.raise_for_status() #similar to asserting r.status_code == 200\n", " r.text\n", - "except HTTPError as e: # What's still wrong here?\n", + "except requests.exceptions.HTTPError as e: # What's still wrong here?\n", " print(\"oops!!\", e)" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 17, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "oops!! 403 Client Error: Forbidden for url: https://cs220.cs.wisc.edu/hello.txttttt\n" + ] + } + ], "source": [ "# we often need to prepend the names of exceptions with the name of the module\n", "# fix the error from above\n", @@ -1525,8 +1542,7 @@ " r.raise_for_status() #similar to asserting r.status_code == 200\n", " r.text\n", "except requests.HTTPError as e: #correct way to catch the error.\n", - " print(\"oops!!\", e)\n", - " \n" + " print(\"oops!!\", e)" ] }, { @@ -1534,14 +1550,14 @@ "metadata": {}, "source": [ "## requests.get : JSON file example\n", - "- URL: https://www.msyamkumar.com/scores.json\n", + "- URL: https://cs220.cs.wisc.edu/scores.json\n", "- `json.load` (FILE_OBJECT)\n", "- `json.loads` (STRING)" ] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 18, "metadata": {}, "outputs": [ { @@ -1560,7 +1576,7 @@ ], "source": [ "# GETting a JSON file, the long way\n", - "url = \"https://www.msyamkumar.com/scores.json\"\n", + "url = \"https://cs220.cs.wisc.edu/scores.json\"\n", "r = requests.get(url)\n", "r.raise_for_status()\n", "urltext = r.text\n", @@ -1571,7 +1587,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 19, "metadata": {}, "outputs": [ { @@ -1584,7 +1600,7 @@ ], "source": [ "# GETting a JSON file, the shortcut way\n", - "url = \"https://www.msyamkumar.com/scores.json\"\n", + "url = 
\"https://cs220.cs.wisc.edu/scores.json\"\n", "#Shortcut to bypass using json.loads()\n", "r = requests.get(url)\n", "r.raise_for_status()\n", @@ -1610,1597 +1626,761 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## DEMO: Course Enrollment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Explore the API!\n", - "\n", - "https://coletnelson.us/cs220-api/classes\n", - "\n", - "https://coletnelson.us/cs220-api/classes_as_txt\n", - "\n", - "https://coletnelson.us/cs220-api/classes/MATH_221\n", + "### Explore real-world JSON\n", "\n", - "https://coletnelson.us/cs220-api/classes/COMPSCI_200\n", + "How to explore an unknown JSON?\n", + "- If you run into a `dict`, try `.keys()` method to look at the keys of the dictionary, then use lookup process to explore further\n", + "- If you run into a `list`, iterate over the list and print each item\n", "\n", - "... etc\n", - "\n", - "https://coletnelson.us/cs220-api/all_data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the list of classes." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### When the data is `json`" + "### Weather for UW-Madison campus\n", + "- URL: https://api.weather.gov/gridpoints/MKX/37,63/forecast" ] }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "<class 'list'>\n", - "['PSYCH_202', 'COMPSCI_537', 'COMPSCI_300', 'CHEM_104', 'COMPSCI_200', 'MATH_114', 'PSYCH_456', 'COMPSCI_252', 'COMPSCI_400', 'MATH_221', 'BIOLOGY_101', 'COMPSCI_354', 'CHEM_103', 'COMPSCI_639', 'PSYCH_401', 'COMPSCI_240', 'STATS_302']\n" + "<class 'dict'>\n" ] } ], "source": [ - "url = \"https://coletnelson.us/cs220-api/classes\"\n", + "# TODO: GET the forecast\n", + "url = \"https://api.weather.gov/gridpoints/MKX/37,63/forecast\"\n", "r = requests.get(url)\n", "r.raise_for_status()\n", - "classes_list = r.json()\n", - "print(type(classes_list))\n", - "print(classes_list)" + "weather_data = r.json()\n", + "\n", + "# TODO: explore the type of the data structure \n", + "print(type(weather_data))\n", + "\n", + "# display the data\n", + "# weather_data # uncomment to see the whole JSON" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": 21, "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['@context', 'type', 'geometry', 'properties']\n", + "<class 'dict'>\n" + ] + } + ], "source": [ - "#### When the data is `text`" + "# TODO: display the keys of the weather_data dict\n", + "print(list(weather_data.keys()))\n", + "\n", + "# TODO: lookup the value corresponding to the 'properties'\n", + "weather_data[\"properties\"]\n", + "\n", + "# TODO: you know what to do next ... 
explore type again\n", + "print(type(weather_data[\"properties\"]))" ] }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "<class 'str'>\n", - "PSYCH_202\n", - "COMPSCI_537\n", - "COMPSCI_300\n", - "CHEM_104\n", - "COMPSCI_200\n", - "MATH_114\n", - "PSYCH_456\n", - "COMPSCI_252\n", - "COMPSCI_400\n", - "MATH_221\n", - "BIOLOGY_101\n", - "COMPSCI_354\n", - "CHEM_103\n", - "COMPSCI_639\n", - "PSYCH_401\n", - "COMPSCI_240\n", - "STATS_302\n" + "['updated', 'units', 'forecastGenerator', 'generatedAt', 'updateTime', 'validTimes', 'elevation', 'periods']\n", + "<class 'list'>\n" ] } ], "source": [ - "url = \"https://coletnelson.us/cs220-api/classes_as_txt\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "classes_txt = r.text\n", - "print(type(classes_txt))\n", - "print(classes_txt)" + "# TODO: display the keys of the properties dict\n", + "print(list(weather_data[\"properties\"].keys()))\n", + "\n", + "# TODO: lookup the value corresponding to the 'periods'\n", + "# weather_data[\"properties\"][\"periods\"] # uncomment to see the output\n", + "\n", + "# TODO: you know what to do next ... 
explore type again\n", + "print(type(weather_data[\"properties\"][\"periods\"]))" ] }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 23, "metadata": {}, "outputs": [ { "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>number</th>\n", + " <th>name</th>\n", + " <th>startTime</th>\n", + " <th>endTime</th>\n", + " <th>isDaytime</th>\n", + " <th>temperature</th>\n", + " <th>temperatureUnit</th>\n", + " <th>temperatureTrend</th>\n", + " <th>windSpeed</th>\n", + " <th>windDirection</th>\n", + " <th>icon</th>\n", + " <th>shortForecast</th>\n", + " <th>detailedForecast</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>1</td>\n", + " <td>Today</td>\n", + " <td>2022-11-16T08:00:00-06:00</td>\n", + " <td>2022-11-16T18:00:00-06:00</td>\n", + " <td>True</td>\n", + " <td>34</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>5 to 10 mph</td>\n", + " <td>NW</td>\n", + " <td>https://api.weather.gov/icons/land/day/snow,60...</td>\n", + " <td>Snow Showers Likely</td>\n", + " <td>Snow showers likely. Cloudy, with a high near ...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1</th>\n", + " <td>2</td>\n", + " <td>Tonight</td>\n", + " <td>2022-11-16T18:00:00-06:00</td>\n", + " <td>2022-11-17T06:00:00-06:00</td>\n", + " <td>False</td>\n", + " <td>23</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>5 to 10 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/night/snow?...</td>\n", + " <td>Chance Snow Showers</td>\n", + " <td>A chance of snow showers after 9pm. 
Mostly clo...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>2</th>\n", + " <td>3</td>\n", + " <td>Thursday</td>\n", + " <td>2022-11-17T06:00:00-06:00</td>\n", + " <td>2022-11-17T18:00:00-06:00</td>\n", + " <td>True</td>\n", + " <td>28</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>10 to 15 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/day/snow,30...</td>\n", + " <td>Chance Snow Showers</td>\n", + " <td>A chance of snow showers. Mostly cloudy, with ...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>3</th>\n", + " <td>4</td>\n", + " <td>Thursday Night</td>\n", + " <td>2022-11-17T18:00:00-06:00</td>\n", + " <td>2022-11-18T06:00:00-06:00</td>\n", + " <td>False</td>\n", + " <td>17</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>15 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/night/bkn?s...</td>\n", + " <td>Mostly Cloudy</td>\n", + " <td>Mostly cloudy, with a low around 17. West wind...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>4</th>\n", + " <td>5</td>\n", + " <td>Friday</td>\n", + " <td>2022-11-18T06:00:00-06:00</td>\n", + " <td>2022-11-18T18:00:00-06:00</td>\n", + " <td>True</td>\n", + " <td>23</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>15 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/day/bkn?siz...</td>\n", + " <td>Mostly Cloudy</td>\n", + " <td>Mostly cloudy, with a high near 23. West wind ...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>5</th>\n", + " <td>6</td>\n", + " <td>Friday Night</td>\n", + " <td>2022-11-18T18:00:00-06:00</td>\n", + " <td>2022-11-19T06:00:00-06:00</td>\n", + " <td>False</td>\n", + " <td>12</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>15 mph</td>\n", + " <td>SW</td>\n", + " <td>https://api.weather.gov/icons/land/night/bkn?s...</td>\n", + " <td>Mostly Cloudy</td>\n", + " <td>Mostly cloudy, with a low around 12. 
Southwest...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>6</th>\n", + " <td>7</td>\n", + " <td>Saturday</td>\n", + " <td>2022-11-19T06:00:00-06:00</td>\n", + " <td>2022-11-19T18:00:00-06:00</td>\n", + " <td>True</td>\n", + " <td>23</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>15 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/day/bkn/sno...</td>\n", + " <td>Mostly Cloudy then Slight Chance Snow Showers</td>\n", + " <td>A slight chance of snow showers after noon. Mo...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>7</th>\n", + " <td>8</td>\n", + " <td>Saturday Night</td>\n", + " <td>2022-11-19T18:00:00-06:00</td>\n", + " <td>2022-11-20T06:00:00-06:00</td>\n", + " <td>False</td>\n", + " <td>7</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>10 to 15 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/night/cold?...</td>\n", + " <td>Mostly Cloudy</td>\n", + " <td>Mostly cloudy, with a low around 7. West wind ...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>8</th>\n", + " <td>9</td>\n", + " <td>Sunday</td>\n", + " <td>2022-11-20T06:00:00-06:00</td>\n", + " <td>2022-11-20T18:00:00-06:00</td>\n", + " <td>True</td>\n", + " <td>23</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>10 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/day/few?siz...</td>\n", + " <td>Sunny</td>\n", + " <td>Sunny, with a high near 23.</td>\n", + " </tr>\n", + " <tr>\n", + " <th>9</th>\n", + " <td>10</td>\n", + " <td>Sunday Night</td>\n", + " <td>2022-11-20T18:00:00-06:00</td>\n", + " <td>2022-11-21T06:00:00-06:00</td>\n", + " <td>False</td>\n", + " <td>15</td>\n", + " <td>F</td>\n", + " <td>rising</td>\n", + " <td>10 mph</td>\n", + " <td>SW</td>\n", + " <td>https://api.weather.gov/icons/land/night/sct?s...</td>\n", + " <td>Partly Cloudy</td>\n", + " <td>Partly cloudy. 
Low around 15, with temperature...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>10</th>\n", + " <td>11</td>\n", + " <td>Monday</td>\n", + " <td>2022-11-21T06:00:00-06:00</td>\n", + " <td>2022-11-21T18:00:00-06:00</td>\n", + " <td>True</td>\n", + " <td>34</td>\n", + " <td>F</td>\n", + " <td>falling</td>\n", + " <td>5 to 10 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/day/few?siz...</td>\n", + " <td>Sunny</td>\n", + " <td>Sunny. High near 34, with temperatures falling...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>11</th>\n", + " <td>12</td>\n", + " <td>Monday Night</td>\n", + " <td>2022-11-21T18:00:00-06:00</td>\n", + " <td>2022-11-22T06:00:00-06:00</td>\n", + " <td>False</td>\n", + " <td>18</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>5 mph</td>\n", + " <td>SW</td>\n", + " <td>https://api.weather.gov/icons/land/night/sct?s...</td>\n", + " <td>Partly Cloudy</td>\n", + " <td>Partly cloudy, with a low around 18.</td>\n", + " </tr>\n", + " <tr>\n", + " <th>12</th>\n", + " <td>13</td>\n", + " <td>Tuesday</td>\n", + " <td>2022-11-22T06:00:00-06:00</td>\n", + " <td>2022-11-22T18:00:00-06:00</td>\n", + " <td>True</td>\n", + " <td>36</td>\n", + " <td>F</td>\n", + " <td>falling</td>\n", + " <td>5 to 10 mph</td>\n", + " <td>SW</td>\n", + " <td>https://api.weather.gov/icons/land/day/sct?siz...</td>\n", + " <td>Mostly Sunny</td>\n", + " <td>Mostly sunny. 
High near 36, with temperatures ...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>13</th>\n", + " <td>14</td>\n", + " <td>Tuesday Night</td>\n", + " <td>2022-11-22T18:00:00-06:00</td>\n", + " <td>2022-11-23T06:00:00-06:00</td>\n", + " <td>False</td>\n", + " <td>26</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>5 mph</td>\n", + " <td>S</td>\n", + " <td>https://api.weather.gov/icons/land/night/bkn?s...</td>\n", + " <td>Mostly Cloudy</td>\n", + " <td>Mostly cloudy, with a low around 26.</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], "text/plain": [ - "['PSYCH_202',\n", - " 'COMPSCI_537',\n", - " 'COMPSCI_300',\n", - " 'CHEM_104',\n", - " 'COMPSCI_200',\n", - " 'MATH_114',\n", - " 'PSYCH_456',\n", - " 'COMPSCI_252',\n", - " 'COMPSCI_400',\n", - " 'MATH_221',\n", - " 'BIOLOGY_101',\n", - " 'COMPSCI_354',\n", - " 'CHEM_103',\n", - " 'COMPSCI_639',\n", - " 'PSYCH_401',\n", - " 'COMPSCI_240',\n", - " 'STATS_302']" + " number name startTime \\\n", + "0 1 Today 2022-11-16T08:00:00-06:00 \n", + "1 2 Tonight 2022-11-16T18:00:00-06:00 \n", + "2 3 Thursday 2022-11-17T06:00:00-06:00 \n", + "3 4 Thursday Night 2022-11-17T18:00:00-06:00 \n", + "4 5 Friday 2022-11-18T06:00:00-06:00 \n", + "5 6 Friday Night 2022-11-18T18:00:00-06:00 \n", + "6 7 Saturday 2022-11-19T06:00:00-06:00 \n", + "7 8 Saturday Night 2022-11-19T18:00:00-06:00 \n", + "8 9 Sunday 2022-11-20T06:00:00-06:00 \n", + "9 10 Sunday Night 2022-11-20T18:00:00-06:00 \n", + "10 11 Monday 2022-11-21T06:00:00-06:00 \n", + "11 12 Monday Night 2022-11-21T18:00:00-06:00 \n", + "12 13 Tuesday 2022-11-22T06:00:00-06:00 \n", + "13 14 Tuesday Night 2022-11-22T18:00:00-06:00 \n", + "\n", + " endTime isDaytime temperature temperatureUnit \\\n", + "0 2022-11-16T18:00:00-06:00 True 34 F \n", + "1 2022-11-17T06:00:00-06:00 False 23 F \n", + "2 2022-11-17T18:00:00-06:00 True 28 F \n", + "3 2022-11-18T06:00:00-06:00 False 17 F \n", + "4 2022-11-18T18:00:00-06:00 True 23 F \n", + "5 2022-11-19T06:00:00-06:00 
False 12 F \n", + "6 2022-11-19T18:00:00-06:00 True 23 F \n", + "7 2022-11-20T06:00:00-06:00 False 7 F \n", + "8 2022-11-20T18:00:00-06:00 True 23 F \n", + "9 2022-11-21T06:00:00-06:00 False 15 F \n", + "10 2022-11-21T18:00:00-06:00 True 34 F \n", + "11 2022-11-22T06:00:00-06:00 False 18 F \n", + "12 2022-11-22T18:00:00-06:00 True 36 F \n", + "13 2022-11-23T06:00:00-06:00 False 26 F \n", + "\n", + " temperatureTrend windSpeed windDirection \\\n", + "0 None 5 to 10 mph NW \n", + "1 None 5 to 10 mph W \n", + "2 None 10 to 15 mph W \n", + "3 None 15 mph W \n", + "4 None 15 mph W \n", + "5 None 15 mph SW \n", + "6 None 15 mph W \n", + "7 None 10 to 15 mph W \n", + "8 None 10 mph W \n", + "9 rising 10 mph SW \n", + "10 falling 5 to 10 mph W \n", + "11 None 5 mph SW \n", + "12 falling 5 to 10 mph SW \n", + "13 None 5 mph S \n", + "\n", + " icon \\\n", + "0 https://api.weather.gov/icons/land/day/snow,60... \n", + "1 https://api.weather.gov/icons/land/night/snow?... \n", + "2 https://api.weather.gov/icons/land/day/snow,30... \n", + "3 https://api.weather.gov/icons/land/night/bkn?s... \n", + "4 https://api.weather.gov/icons/land/day/bkn?siz... \n", + "5 https://api.weather.gov/icons/land/night/bkn?s... \n", + "6 https://api.weather.gov/icons/land/day/bkn/sno... \n", + "7 https://api.weather.gov/icons/land/night/cold?... \n", + "8 https://api.weather.gov/icons/land/day/few?siz... \n", + "9 https://api.weather.gov/icons/land/night/sct?s... \n", + "10 https://api.weather.gov/icons/land/day/few?siz... \n", + "11 https://api.weather.gov/icons/land/night/sct?s... \n", + "12 https://api.weather.gov/icons/land/day/sct?siz... \n", + "13 https://api.weather.gov/icons/land/night/bkn?s... 
\n", + "\n", + " shortForecast \\\n", + "0 Snow Showers Likely \n", + "1 Chance Snow Showers \n", + "2 Chance Snow Showers \n", + "3 Mostly Cloudy \n", + "4 Mostly Cloudy \n", + "5 Mostly Cloudy \n", + "6 Mostly Cloudy then Slight Chance Snow Showers \n", + "7 Mostly Cloudy \n", + "8 Sunny \n", + "9 Partly Cloudy \n", + "10 Sunny \n", + "11 Partly Cloudy \n", + "12 Mostly Sunny \n", + "13 Mostly Cloudy \n", + "\n", + " detailedForecast \n", + "0 Snow showers likely. Cloudy, with a high near ... \n", + "1 A chance of snow showers after 9pm. Mostly clo... \n", + "2 A chance of snow showers. Mostly cloudy, with ... \n", + "3 Mostly cloudy, with a low around 17. West wind... \n", + "4 Mostly cloudy, with a high near 23. West wind ... \n", + "5 Mostly cloudy, with a low around 12. Southwest... \n", + "6 A slight chance of snow showers after noon. Mo... \n", + "7 Mostly cloudy, with a low around 7. West wind ... \n", + "8 Sunny, with a high near 23. \n", + "9 Partly cloudy. Low around 15, with temperature... \n", + "10 Sunny. High near 34, with temperatures falling... \n", + "11 Partly cloudy, with a low around 18. \n", + "12 Mostly sunny. High near 36, with temperatures ... \n", + "13 Mostly cloudy, with a low around 26. " ] }, - "execution_count": 21, + "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "classes_txt_as_list = classes_txt.split('\\n')\n", - "classes_txt_as_list" + "# TODO: extract periods list into a variable\n", + "periods_list = weather_data[\"properties\"][\"periods\"]\n", + "\n", + "# TODO: create a DataFrame using periods_list\n", + "# TODO: What does each inner data structure represent in your DataFrame?\n", + "# Keep in mind that outer data structure is a list.\n", + "# A. 
rows (because outer data structure is a list)\n", + "periods_df = DataFrame(periods_list)\n", + "periods_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Get data for a specific class" + "#### What are the maximum and minimum observed temperatures? Include the temperatureUnit in your display" ] }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "<class 'dict'>\n", - "{'credits': 3, 'description': 'Learn the process of incrementally developing small (200-500 lines) programs along with the fundamental Computer Science topics. These topics include: problem abstraction and decomposition, the edit-compile-run cycle, using variables of primitive and more complex data types, conditional and loop-based flow control, basic testing and debugging techniques, how to define and call functions (methods), and IO processing techniques. Also teaches and reinforces good programming practices including the use of a consistent style, and meaningful documentation. 
Intended for students who have no prior programming experience.', 'keywords': ['computer', 'science', 'programming', 'java'], 'name': 'Programming 1', 'number': 'COMPSCI_200', 'requisites': [], 'sections': [{'instructor': 'Jim Williams', 'location': '132 Noland Hall', 'subsections': [{'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '9:30am - 10:45am'}, 'number': 'LAB_311'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '11:00am - 12:15pm'}, 'number': 'LAB_312'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '2:30pm - 3:45pm'}, 'number': 'LAB_314'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '4:00pm - 5:15pm'}, 'number': 'LAB_315'}], 'time': {'thursday': '8:00am - 9:15am', 'tuesday': '8:00am - 9:15am'}, 'number': 'LEC_001'}, {'instructor': 'Jim Williams', 'location': '132 Noland Hall', 'subsections': [{'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '9:30am - 10:45am'}, 'number': 'LAB_321'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '1:00pm - 2:15pm'}, 'number': 'LAB_323'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '2:30pm - 3:45pm'}, 'number': 'LAB_324'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '4:00pm - 5:15pm'}, 'number': 'LAB_325'}], 'time': {'thursday': '11:00am - 12:15pm', 'tuesday': '11:00am - 12:15pm'}, 'number': 'LEC_002'}, {'instructor': 'Marc Renault', 'location': '113 Brogden Psychology Building', 'subsections': [{'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '9:30am - 10:45am'}, 'number': 'LAB_331'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '11:00am - 12:15pm'}, 'number': 'LAB_332'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '1:00pm - 2:15pm'}, 'number': 'LAB_333'}, {'location': '1350 Computer Sciences and 
Statistics', 'time': {'tuesday': '2:30pm - 3:45pm'}, 'number': 'LAB_334'}], 'time': {'friday': '1:20pm - 2:10pm', 'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'LEC_003'}, {'instructor': 'Marc Renault', 'location': '113 Brogden Psychology Building', 'subsections': [{'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '9:30am - 10:45am'}, 'number': 'LAB_341'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '11:00am - 12:15pm'}, 'number': 'LAB_342'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '1:00pm - 2:15pm'}, 'number': 'LAB_343'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '2:30pm - 3:45pm'}, 'number': 'LAB_344'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '4:00pm - 5:15pm'}, 'number': 'LAB_345'}], 'time': {'friday': '3:30pm - 4:20pm', 'monday': '3:30pm - 4:20pm', 'wednesday': '3:30pm - 4:20pm'}, 'number': 'LEC_004'}], 'subject': 'Computer Science'}\n" + "Minimum observed temperature is: 7 degree F\n", + "Maximum observed temperature is: 36 degree F\n" ] } ], "source": [ - "url = \"https://coletnelson.us/cs220-api/classes/COMPSCI_200\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "cs200_data = r.json()\n", - "print(type(cs200_data))\n", - "print(cs200_data) # Too much data? 
Try print(cs220_data.keys())" + "min_temp = periods_df[\"temperature\"].min()\n", + "idx_min = periods_df[\"temperature\"].idxmin()\n", + "min_unit = periods_df.loc[idx_min, \"temperatureUnit\"]\n", + "\n", + "max_temp = periods_df[\"temperature\"].max()\n", + "idx_max = periods_df[\"temperature\"].idxmax()\n", + "max_unit = periods_df.loc[idx_max, \"temperatureUnit\"]\n", + "\n", + "print(\"Minimum observed temperature is: {} degree {}\".format(min_temp, min_unit))\n", + "print(\"Maximum observed temperature is: {} degree {}\".format(max_temp, max_unit))" ] }, { - "cell_type": "code", - "execution_count": 23, + "cell_type": "markdown", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "dict_keys(['credits', 'description', 'keywords', 'name', 'number', 'requisites', 'sections', 'subject'])" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], "source": [ - "cs200_data.keys()" + "#### Which days' `detailedForecast` contains `snow`?" 
] }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 25, "metadata": {}, "outputs": [ { "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>number</th>\n", + " <th>name</th>\n", + " <th>startTime</th>\n", + " <th>endTime</th>\n", + " <th>isDaytime</th>\n", + " <th>temperature</th>\n", + " <th>temperatureUnit</th>\n", + " <th>temperatureTrend</th>\n", + " <th>windSpeed</th>\n", + " <th>windDirection</th>\n", + " <th>icon</th>\n", + " <th>shortForecast</th>\n", + " <th>detailedForecast</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>1</td>\n", + " <td>Today</td>\n", + " <td>2022-11-16T08:00:00-06:00</td>\n", + " <td>2022-11-16T18:00:00-06:00</td>\n", + " <td>True</td>\n", + " <td>34</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>5 to 10 mph</td>\n", + " <td>NW</td>\n", + " <td>https://api.weather.gov/icons/land/day/snow,60...</td>\n", + " <td>Snow Showers Likely</td>\n", + " <td>Snow showers likely. Cloudy, with a high near ...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1</th>\n", + " <td>2</td>\n", + " <td>Tonight</td>\n", + " <td>2022-11-16T18:00:00-06:00</td>\n", + " <td>2022-11-17T06:00:00-06:00</td>\n", + " <td>False</td>\n", + " <td>23</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>5 to 10 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/night/snow?...</td>\n", + " <td>Chance Snow Showers</td>\n", + " <td>A chance of snow showers after 9pm. 
Mostly clo...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>2</th>\n", + " <td>3</td>\n", + " <td>Thursday</td>\n", + " <td>2022-11-17T06:00:00-06:00</td>\n", + " <td>2022-11-17T18:00:00-06:00</td>\n", + " <td>True</td>\n", + " <td>28</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>10 to 15 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/day/snow,30...</td>\n", + " <td>Chance Snow Showers</td>\n", + " <td>A chance of snow showers. Mostly cloudy, with ...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>6</th>\n", + " <td>7</td>\n", + " <td>Saturday</td>\n", + " <td>2022-11-19T06:00:00-06:00</td>\n", + " <td>2022-11-19T18:00:00-06:00</td>\n", + " <td>True</td>\n", + " <td>23</td>\n", + " <td>F</td>\n", + " <td>None</td>\n", + " <td>15 mph</td>\n", + " <td>W</td>\n", + " <td>https://api.weather.gov/icons/land/day/bkn/sno...</td>\n", + " <td>Mostly Cloudy then Slight Chance Snow Showers</td>\n", + " <td>A slight chance of snow showers after noon. Mo...</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], "text/plain": [ - "3" + " number name startTime endTime \\\n", + "0 1 Today 2022-11-16T08:00:00-06:00 2022-11-16T18:00:00-06:00 \n", + "1 2 Tonight 2022-11-16T18:00:00-06:00 2022-11-17T06:00:00-06:00 \n", + "2 3 Thursday 2022-11-17T06:00:00-06:00 2022-11-17T18:00:00-06:00 \n", + "6 7 Saturday 2022-11-19T06:00:00-06:00 2022-11-19T18:00:00-06:00 \n", + "\n", + " isDaytime temperature temperatureUnit temperatureTrend windSpeed \\\n", + "0 True 34 F None 5 to 10 mph \n", + "1 False 23 F None 5 to 10 mph \n", + "2 True 28 F None 10 to 15 mph \n", + "6 True 23 F None 15 mph \n", + "\n", + " windDirection icon \\\n", + "0 NW https://api.weather.gov/icons/land/day/snow,60... \n", + "1 W https://api.weather.gov/icons/land/night/snow?... \n", + "2 W https://api.weather.gov/icons/land/day/snow,30... \n", + "6 W https://api.weather.gov/icons/land/day/bkn/sno... 
\n", + "\n", + " shortForecast \\\n", + "0 Snow Showers Likely \n", + "1 Chance Snow Showers \n", + "2 Chance Snow Showers \n", + "6 Mostly Cloudy then Slight Chance Snow Showers \n", + "\n", + " detailedForecast \n", + "0 Snow showers likely. Cloudy, with a high near ... \n", + "1 A chance of snow showers after 9pm. Mostly clo... \n", + "2 A chance of snow showers. Mostly cloudy, with ... \n", + "6 A slight chance of snow showers after noon. Mo... " ] }, - "execution_count": 24, + "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Get the number of credits the course is worth\n", - "cs200_data['credits']" + "# Which periods have \"snow\" in their detailedForecast?\n", + "snow_days_df = periods_df[periods_df[\"detailedForecast\"].str.contains(\"snow\")]\n", + "snow_days_df" ] }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "['computer', 'science', 'programming', 'java']" + "0 Today\n", + "1 Tonight\n", + "2 Thursday\n", + "6 Saturday\n", + "Name: name, dtype: object" ] }, - "execution_count": 25, + "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Get the list of keywords for the course\n", - "cs200_data['keywords']" + "snow_days_df[\"name\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Which day's `detailedForecast` has the longest description?" 
] }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "'Programming 1'" + "'Thursday'" ] }, - "execution_count": 26, + "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Get the official course name\n", - "cs200_data['name']" + "idx_max_desc = periods_df[\"detailedForecast\"].str.len().idxmax()\n", + "periods_df.loc[idx_max_desc, 'name']" ] }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "4" + "'A chance of snow showers. Mostly cloudy, with a high near 28. West wind 10 to 15 mph, with gusts as high as 25 mph. Chance of precipitation is 30%. New snow accumulation of less than half an inch possible.'" ] }, - "execution_count": 27, + "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Get the number of sections offered.\n", - "len(cs200_data['sections'])" + "# What was that forecast?\n", + "periods_df.loc[idx_max_desc, 'detailedForecast']" ] }, { - "cell_type": "code", - "execution_count": 28, + "cell_type": "markdown", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[{'credits': 3, 'description': 'Behavior, including its development, motivation, frustrations, emotion, intelligence, learning, forgetting, personality, language, thinking, and social behavior.', 'keywords': ['psychology', 'behavior', 'emotion', 'intelligence', 'brain'], 'name': 'Introduction to Psychology', 'number': 'PSYCH_202', 'requisites': [], 'sections': [{'instructor': 'Jeff Henriques', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'thursday': '9:30am - 10:45am', 'tuesday': '9:30am - 10:45am'}, 'number': 'LEC_001'}, {'instructor': 'Jeff Henriques', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'thursday': '11:00am - 12:15pm', 'tuesday': 
'11:00am - 12:15pm'}, 'number': 'LEC_002'}, {'instructor': 'C. Shawn Green', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'monday': '8:00am - 9:15am', 'wednesday': '8:00am - 9:15am'}, 'number': 'LEC_003'}, {'instructor': 'Patti Coffey', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_004'}, {'instructor': 'Sarah Gavac', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_005'}, {'instructor': 'Patti Coffey', 'location': '101 Brogden Psychology Building', 'subsections': [], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_006'}, {'instructor': 'Baoyu Wang', 'location': '105 Brogden Psychology Building', 'subsections': [], 'time': {'monday': '4:30pm - 5:15pm', 'wednesday': '4:30pm - 5:15pm'}, 'number': 'LEC_009'}], 'subject': 'Psychology'}, {'credits': 4, 'description': 'Input-output hardware, interrupt handling, properties of magnetic tapes, discs and drums, associative memories and virtual address translation techniques. 
Batch processing, time sharing and real-time systems, scheduling resource allocation, modular software systems, performance measurement and system evaluation.', 'keywords': ['computer', 'science', 'operating', 'system', 'systems'], 'name': 'Introduction to Operating Systems', 'number': 'COMPSCI_537', 'requisites': [['COMPSCI_354', 'COMPSCI_400']], 'sections': [{'instructor': 'Andrea Arpaci-Dusseau', 'location': '1125 DeLuca Biochemistry Building', 'subsections': [{'location': '2317 Engineering Hall', 'time': {'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_301'}, {'location': '1325 Computer Sciences and Statistics', 'time': {'wednesday': '12:05pm - 12:55pm'}, 'number': 'DIS_302'}, {'location': '1325 Computer Sciences and Statistics', 'time': {'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_303'}, {'location': '2255 Engineering Hall', 'time': {'wednesday': '3:30pm - 4:20pm'}, 'number': 'DIS_304'}, {'location': '1325 Computer Sciences and Statistics', 'time': {'wednesday': '4:15pm - 5:25pm'}, 'number': 'DIS_305'}], 'time': {'thursday': '11:00am - 12:15pm', 'tuesday': '11:00am - 12:15pm'}, 'number': 'LEC_001'}], 'subject': 'Computer Science'}, {'credits': 3, 'description': 'Introduces students to Object-Oriented Programming using classes and objects to solve more complex problems. Introduces array-based and linked data structures: including lists, stacks, and queues. Programming assignments require writing and developing multi-class (file) programs using interfaces, generics, and exception handling to solve challenging real world problems. Topics reviewed include reading/writing data and objects from/to files and exception handling, and command line arguments. Topics introduced: object-oriented design; class vs. 
object; create and define interfaces and iterators; searching and sorting; abstract data types (List,Stack,Queue,PriorityQueue(Heap),Binary Search Tree); generic interfaces (parametric polymorphism); how to design and write test methods and classes; array based vs. linked node implementations; introduction to complexity analysis; recursion.', 'keywords': ['computer', 'science', 'programming', 'java'], 'name': 'Programming 2', 'number': 'COMPSCI_300', 'requisites': [['COMPSCI_200']], 'sections': [{'instructor': 'Gary Dahl', 'location': 'AB20 Weeks Hall for Geological Sciences', 'subsections': [], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_001'}, {'instructor': 'Gary Dahl', 'location': '132 Noland Hall', 'subsections': [], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_002'}, {'instructor': 'Mouna Ayari Ben Hadj Kacem', 'location': 'AB20 Weeks Hall for Geological Sciences', 'subsections': [], 'time': {'friday': '11:00am - 11:50pm', 'monday': '11:00am - 11:50pm', 'wednesday': '11:00am - 11:50pm'}, 'number': 'LEC_003'}, {'instructor': 'Mouna Ayari Ben Hadj Kacem', 'location': '1310 Sterling Hall', 'subsections': [], 'time': {'friday': '2:25pm - 3:15pm', 'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'LEC_004'}], 'subject': 'Computer Science'}, {'credits': 5, 'description': 'Principles and application of chemical equilibrium, coordination chemistry, oxidation-reduction and electrochemistry, kinetics, nuclear chemistry, introduction to organic chemistry. 
Lecture, lab, and discussion.', 'keywords': ['chemistry'], 'name': 'General Chemistry II', 'number': 'CHEM_104', 'requisites': [['MATH_114'], ['CHEM_103']], 'sections': [{'instructor': 'Linda Zelewski', 'location': 'B10 Ingraham Hall', 'subsections': [{'location': '123 Van Hise Hall', 'time': {'monday': '2:25pm - 5:25pm', 'thursday': '11:00am - 11:50am', 'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_401'}, {'location': '123 Van Hise Hall', 'time': {'monday': '2:25pm - 5:25pm', 'thursday': '12:05pm - 12:55pm', 'tuesday': '12:05pm - 12:55pm'}, 'number': 'DIS_402'}, {'location': 'B387 Chemistry Building', 'time': {'monday': '2:25pm - 5:25pm', 'thursday': '11:00am - 11:50am', 'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_403'}, {'location': 'B387 Chemistry Building', 'time': {'monday': '2:25pm - 5:25pm', 'thursday': '12:05pm - 12:55pm', 'tuesday': '12:05pm - 12:55pm'}, 'number': 'DIS_404'}], 'time': {'thursday': '9:30am - 10:45am', 'tuesday': '9:30am - 10:45am'}, 'number': 'LEC_001'}, {'instructor': 'Lea Gustin', 'location': '204 Educational Sciences', 'subsections': [{'location': '2377 Chemistry Building', 'time': {'monday': '9:55am - 10:45am', 'tuesday': '5:40pm - 8:40pm', 'wednesday': '9:55am - 10:45am'}, 'number': 'DIS_421'}, {'location': '2377 Chemistry Building', 'time': {'monday': '11:00am - 11:50am', 'tuesday': '5:40pm - 8:40pm', 'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_422'}, {'location': '2381 Chemistry Building', 'time': {'monday': '11:00am - 11:50am', 'tuesday': '5:40pm - 8:40pm', 'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_423'}, {'location': '2377 Chemistry Building', 'time': {'monday': '12:05pm - 12:55pm', 'tuesday': '5:40pm - 8:40pm', 'wednesday': '12:05pm - 12:55pm'}, 'number': 'DIS_424'}], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_002'}], 'subject': 'Chemistry'}, {'credits': 3, 'description': 'Learn the process of incrementally developing small (200-500 lines) programs along with the 
fundamental Computer Science topics. These topics include: problem abstraction and decomposition, the edit-compile-run cycle, using variables of primitive and more complex data types, conditional and loop-based flow control, basic testing and debugging techniques, how to define and call functions (methods), and IO processing techniques. Also teaches and reinforces good programming practices including the use of a consistent style, and meaningful documentation. Intended for students who have no prior programming experience.', 'keywords': ['computer', 'science', 'programming', 'java'], 'name': 'Programming 1', 'number': 'COMPSCI_200', 'requisites': [], 'sections': [{'instructor': 'Jim Williams', 'location': '132 Noland Hall', 'subsections': [{'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '9:30am - 10:45am'}, 'number': 'LAB_311'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '11:00am - 12:15pm'}, 'number': 'LAB_312'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '2:30pm - 3:45pm'}, 'number': 'LAB_314'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'wednesday': '4:00pm - 5:15pm'}, 'number': 'LAB_315'}], 'time': {'thursday': '8:00am - 9:15am', 'tuesday': '8:00am - 9:15am'}, 'number': 'LEC_001'}, {'instructor': 'Jim Williams', 'location': '132 Noland Hall', 'subsections': [{'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '9:30am - 10:45am'}, 'number': 'LAB_321'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '1:00pm - 2:15pm'}, 'number': 'LAB_323'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '2:30pm - 3:45pm'}, 'number': 'LAB_324'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'wednesday': '4:00pm - 5:15pm'}, 'number': 'LAB_325'}], 'time': {'thursday': '11:00am - 12:15pm', 'tuesday': '11:00am - 12:15pm'}, 'number': 'LEC_002'}, {'instructor': 'Marc Renault', 
'location': '113 Brogden Psychology Building', 'subsections': [{'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '9:30am - 10:45am'}, 'number': 'LAB_331'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '11:00am - 12:15pm'}, 'number': 'LAB_332'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '1:00pm - 2:15pm'}, 'number': 'LAB_333'}, {'location': '1350 Computer Sciences and Statistics', 'time': {'tuesday': '2:30pm - 3:45pm'}, 'number': 'LAB_334'}], 'time': {'friday': '1:20pm - 2:10pm', 'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'LEC_003'}, {'instructor': 'Marc Renault', 'location': '113 Brogden Psychology Building', 'subsections': [{'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '9:30am - 10:45am'}, 'number': 'LAB_341'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '11:00am - 12:15pm'}, 'number': 'LAB_342'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '1:00pm - 2:15pm'}, 'number': 'LAB_343'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '2:30pm - 3:45pm'}, 'number': 'LAB_344'}, {'location': '1370 Computer Sciences and Statistics', 'time': {'tuesday': '4:00pm - 5:15pm'}, 'number': 'LAB_345'}], 'time': {'friday': '3:30pm - 4:20pm', 'monday': '3:30pm - 4:20pm', 'wednesday': '3:30pm - 4:20pm'}, 'number': 'LEC_004'}], 'subject': 'Computer Science'}, {'credits': 5, 'description': 'The two semester sequence MATH_112-MATH_113 covers similar material as MATH_114, but in a slower pace.', 'keywords': ['math', 'mathematics', 'algebra', 'trigonometry'], 'name': 'Algebra and Trigonometry', 'number': 'MATH_114', 'requisites': [], 'sections': [{'instructor': 'Sharad Chandarana', 'location': 'B130 Van Vleck Hall', 'subsections': [{'location': 'B113 Van Vleck Hall', 'time': {'monday': '7:45am - 8:35am', 'wednesday': '7:45am - 8:35am'}, 'number': 'DIS_301'}, 
{'location': 'B113 Van Vleck Hall', 'time': {'monday': '8:50am - 9:40am', 'wednesday': '8:50am - 9:40am'}, 'number': 'DIS_303'}, {'location': 'B219 Van Vleck Hall', 'time': {'monday': '8:50am - 9:40am', 'wednesday': '8:50am - 9:40am'}, 'number': 'DIS_304'}, {'location': 'B113 Van Vleck Hall', 'time': {'monday': '9:55am - 10:45am', 'wednesday': '9:55am - 10:45am'}, 'number': 'DIS_305'}, {'location': 'B219 Van Vleck Hall', 'time': {'monday': '9:55am - 10:45am', 'wednesday': '9:55am - 10:45am'}, 'number': 'DIS_306'}, {'location': 'B341 Van Vleck Hall', 'time': {'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_307'}, {'location': 'B317 Van Vleck Hall', 'time': {'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_308'}, {'location': 'B341 Van Vleck Hall', 'time': {'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'DIS_309'}, {'location': 'B329 Van Vleck Hall', 'time': {'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'DIS_310'}, {'location': 'B317 Van Vleck Hall', 'time': {'monday': '7:45am - 8:35am', 'wednesday': '7:45am - 8:35am'}, 'number': 'DIS_311'}], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_001'}, {'instructor': 'Sharad Chandarana', 'location': '19 Ingraham Hall', 'subsections': [{'location': '591 Van Hise Hall', 'time': {'thursday': '8:50am - 9:40am', 'tuesday': '8:50am - 9:40am'}, 'number': 'DIS_321'}, {'location': 'B219 Van Vleck Hall', 'time': {'thursday': '9:55am - 10:45am', 'tuesday': '9:55am - 10:45am'}, 'number': 'DIS_322'}, {'location': '4020 Vilas Hall', 'time': {'thursday': '11:00am - 11:50am', 'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_323'}, {'location': '599 Van Hise Hall', 'time': {'thursday': '11:00am - 11:50am', 'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_324'}, {'location': 'B341 Van Vleck Hall', 'time': {'thursday': '1:20pm - 2:10pm', 'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_325'}, 
{'location': '223 Van Hise Hall', 'time': {'thursday': '1:20pm - 2:10pm', 'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_326'}, {'location': '223 Van Hise Hall', 'time': {'thursday': '2:25pm - 3:15pm', 'tuesday': '2:25pm - 3:15pm'}, 'number': 'DIS_328'}, {'location': 'B219 Van Vleck Hall', 'time': {'thursday': '3:30pm - 4:20pm', 'tuesday': '3:30pm - 4:20pm'}, 'number': 'DIS_329'}, {'location': 'B341 Van Vleck Hall', 'time': {'thursday': '3:30pm - 4:20pm', 'tuesday': '3:30pm - 4:20pm'}, 'number': 'DIS_330'}], 'time': {'friday': '8:50am - 9:40am', 'monday': '8:50am - 9:40am', 'wednesday': '8:50am - 9:40am'}, 'number': 'LEC_002'}], 'subject': 'Mathematics'}, {'credits': 4, 'description': 'The systematic study of the individual in a social context, including social interaction, motivation, attitudes, conformity, communication, leadership, personal relationships, and behavior in small groups.', 'keywords': ['psychology', 'science', 'social', 'interaction', 'behavior'], 'name': 'Introductory Social Psychology', 'number': 'PSYCH_456', 'requisites': [['PSYCH_202']], 'sections': [{'instructor': 'Abigail Letak', 'location': '6104 Sewell Social Sciences', 'subsections': [{'location': '6121 Sewell Social Sciences', 'time': {'tuesday': '8:50am - 9:40am'}, 'number': 'DIS_301'}, {'location': '6121 Sewell Social Sciences', 'time': {'tuesday': '9:55am - 10:45am'}, 'number': 'DIS_302'}, {'location': '6121 Sewell Social Sciences', 'time': {'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_303'}, {'location': '6121 Sewell Social Sciences', 'time': {'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_304'}, {'location': '6121 Sewell Social Sciences', 'time': {'tuesday': '2:25pm - 3:15pm'}, 'number': 'DIS_305'}], 'time': {'friday': '11:00am - 11:50am', 'monday': '11:00am - 11:50am', 'wednesday': '11:00am - 11:50am'}, 'number': 'LEC_001'}], 'subject': 'Psychology'}, {'credits': 2, 'description': 'Logic components built with transistors, rudimentary Boolean algebra, basic combinational logic 
design, basic synchronous sequential logic design, basic computer organization and design, introductory machine- and assembly-language programming.', 'keywords': ['computer', 'science', 'engineering', 'programming'], 'name': 'Introduction to Computer Engineering', 'number': 'COMPSCI_252', 'requisites': [], 'sections': [{'instructor': 'Joseph Krachey', 'location': '1610 Engineering Hall', 'subsections': [], 'time': {'friday': '2:25pm - 3:15pm', 'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'LEC_001'}, {'instructor': 'Adil Ibrahim', 'location': '113 Brogden Psychology Building', 'subsections': [], 'time': {'friday': '8:50am - 9:40am', 'monday': '8:50am - 9:40am', 'wednesday': '8:50am - 9:40am'}, 'number': 'LEC_002'}, {'instructor': 'Adil Ibrahim', 'location': '113 Brogden Psychology Building', 'subsections': [], 'time': {'friday': '12:05pm - 12:55pm', 'monday': '12:05pm - 12:55pm', 'wednesday': '12:05pm - 12:55pm'}, 'number': 'LEC_005'}], 'subject': 'Computer Science'}, {'credits': 3, 'description': 'The third course in our programming fundamentals sequence. It presumes that students understand and use functional and object-oriented design and abstract data types as needed. This course introduces balanced search trees, graphs, graph traversal algorithms, hash tables and sets, and complexity analysis and about classes of problems that require each data type. Students are required to design and implement using high quality professional code, a medium sized program, that demonstrates knowledge and use of latest language features, tools, and conventions. Additional topics introduced will include as needed for projects: inheritance and polymorphism; anonymous inner classes, lambda functions, performance analysis to discover and optimize critical code blocks. Students learn about industry standards for code development. 
Students will design and implement a medium size project with a more advanced user-interface design, such as a web or mobile application with a GUI and event- driven implementation; use of version-control software.', 'keywords': ['computer', 'science', 'programming', 'java'], 'name': 'Programming 3', 'number': 'COMPSCI_400', 'requisites': [['COMPSCI_300']], 'sections': [{'instructor': 'Gary Dahl', 'location': 'AB20 Weeks Hall for Geological Sciences', 'subsections': [], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_001'}, {'instructor': 'Gary Dahl', 'location': '132 Noland Hall', 'subsections': [], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_002'}, {'instructor': 'Mouna Ayari Ben Hadj Kacem', 'location': 'AB20 Weeks Hall for Geological Sciences', 'subsections': [], 'time': {'friday': '11:00am - 11:50pm', 'monday': '11:00am - 11:50pm', 'wednesday': '11:00am - 11:50pm'}, 'number': 'LEC_003'}, {'instructor': 'Mouna Ayari Ben Hadj Kacem', 'location': '1310 Sterling Hall', 'subsections': [], 'time': {'friday': '2:25pm - 3:15pm', 'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'LEC_004'}], 'subject': 'Computer Science'}, {'credits': 5, 'description': 'Introduction to differential and integral calculus and plane analytic geometry; applications; transcendental functions.', 'keywords': ['math', 'mathematics', 'calculus', 'analytical', 'geometry', 'differential', 'integral'], 'name': 'Calculus and Analytical Geometry 1', 'number': 'MATH_221', 'requisites': [['MATH_114']], 'sections': [{'instructor': 'Laurentiu Maxim', 'location': '6210 Sewell Social Sciences', 'subsections': [{'location': 'B231 Van Vleck Hall', 'time': {'monday': '7:45am - 8:35am', 'wednesday': '7:45am - 8:35am'}, 'number': 'DIS_301'}, {'location': 'B215 Van Vleck Hall', 'time': {'monday': '7:45am - 8:35am', 'wednesday': '7:45am - 8:35am'}, 'number': 'DIS_302'}, {'location': 'B309 Van Vleck Hall', 
'time': {'monday': '3:30pm - 4:20pm', 'wednesday': '3:30pm - 4:20pm'}, 'number': 'DIS_303'}, {'location': 'B211 Van Vleck Hall', 'time': {'monday': '3:30pm - 4:20pm', 'wednesday': '3:30pm - 4:20pm'}, 'number': 'DIS_304'}, {'location': 'B129 Van Vleck Hall', 'time': {'monday': '11:00am - 11:50am', 'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_305'}, {'location': 'B131 Van Vleck Hall', 'time': {'monday': '11:00am - 11:50am', 'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_306'}, {'location': 'B231 Van Vleck Hall', 'time': {'monday': '12:05pm - 12:55pm', 'wednesday': '12:05pm - 12:55pm'}, 'number': 'DIS_307'}, {'location': 'B215 Van Vleck Hall', 'time': {'monday': '12:05pm - 12:55pm', 'wednesday': '12:05pm - 12:55pm'}, 'number': 'DIS_308'}, {'location': 'B313 Van Vleck Hall', 'time': {'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_309'}, {'location': 'B309 Van Vleck Hall', 'time': {'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_310'}, {'location': 'B305 Van Vleck Hall', 'time': {'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'DIS_311'}, {'location': 'B105 Van Vleck Hall', 'time': {'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'DIS_312'}, {'location': 'B321 Van Vleck Hall', 'time': {'friday': '9:55am - 10:45am', 'monday': '9:55am - 10:45am', 'wednesday': '9:55am - 10:45am'}, 'number': 'DIS_313'}], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_001'}], 'subject': 'Mathematics'}, {'credits': 3, 'description': 'General biological principles. 
Topics include: evolution, ecology, animal behavior, cell structure and function, genetics and molecular genetics and the physiology of a variety of organ systems emphasizing function in humans.', 'keywords': ['biology', 'science', 'animal', 'evolution', 'genetics', 'ecology'], 'name': 'Animal Biology', 'number': 'BIOLOGY_101', 'requisites': [], 'sections': [{'instructor': 'Sharon Thoma', 'location': '272 Bascom Hall', 'subsections': [], 'time': {'friday': '11:00am - 11:50am', 'monday': '11:00am - 11:50am', 'wednesday': '11:00am - 11:50am'}, 'number': 'LEC_001'}, {'instructor': 'Sharon Thoma', 'location': '272 Bascom Hall', 'subsections': [], 'time': {'friday': '12:05pm - 12:55pm', 'monday': '12:05pm - 12:55pm', 'wednesday': '12:05pm - 12:55pm'}, 'number': 'LEC_002'}], 'subject': 'Biology'}, {'credits': 3, 'description': 'An introduction to fundamental structures of computer systems and the C programming language with a focus on the low-level interrelationships and impacts on performance. Topics include the virtual address space and virtual memory, the heap and dynamic memory management, the memory hierarchy and caching, assembly language and the stack, communication and interrupts/signals, compiling and assemblers/linkers.', 'keywords': ['computer', 'science', 'engineering', 'electrical', 'machine', 'programming'], 'name': 'Machine Organization and Programming', 'number': 'COMPSCI_354', 'requisites': [['COMPSCI_252'], ['COMPSCI_300']], 'sections': [{'instructor': 'James Skrentny', 'location': '132 Noland Hall', 'subsections': [], 'time': {'thursday': '2:30pm - 3:45pm', 'tuesday': '2:30pm - 3:45pm'}, 'number': 'LEC_001'}, {'instructor': 'James Skrentny', 'location': '132 Noland Hall', 'subsections': [], 'time': {'thursday': '4:00pm - 5:15pm', 'tuesday': '4:00pm - 5:15pm'}, 'number': 'LEC_002'}], 'subject': 'Computer Science'}, {'credits': 4, 'description': 'Introduction. 
Stoichiometry and the mole concept, the behavior of gases, liquids and solids, thermochemistry, electronic structure of atoms and chemical bonding, descriptive chemistry of selected elements and compounds, intermolecular forces. For students taking one year or more of college chemistry; serves as a prereq for CHEM_104; lecture, lab and discussion.', 'keywords': ['chemistry'], 'name': 'General Chemistry I', 'number': 'CHEM_103', 'requisites': [], 'sections': [{'instructor': 'Unknown', 'location': 'B10 Ingraham Hall', 'subsections': [{'location': '49 Sellery Residence Hall', 'time': {'monday': '3:30pm - 4:20pm', 'wednesday': '3:30pm - 4:20pm'}, 'number': 'DIS_301'}, {'location': '2307 Chemistry Building', 'time': {'monday': '4:35pm - 5:25pm', 'wednesday': '4:35pm - 5:25pm'}, 'number': 'DIS_302'}, {'location': '123 Van Hise Hall', 'time': {'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'DIS_303'}, {'location': '123 Van Hise Hall', 'time': {'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'DIS_304'}], 'time': {'friday': '11:00am - 11:50am', 'monday': '11:00am - 11:50am', 'wednesday': '11:00am - 11:50am'}, 'number': 'LEC_001'}], 'subject': 'Chemistry'}, {'credits': 3, 'description': 'This course introduces students to the software development of user interfaces (UIs). Topics covered include state-of-the-art (1) UI paradigms, such as event-driven interfaces, direct-manipulation interfaces, and dialogue-based interaction; (2) methods for capturing, interpreting, and responding to different forms of user input and states, including pointing, text entry, speech, touch, gestures, user activity, context, and physiological states; and (3) platform-specific UI development APIs, frameworks, and toolkits for platforms including web/mobile/desktop interfaces, natural user interfaces, and voice user interfaces. 
Through readings, lectures, and hands-on-activities, students will learn about the fundamental concepts, technologies, and methods in building user interfaces. Assignments will provide an opportunity to gain hands-on experience in the use of state-of-the-art UI development tools and build a UI development portfolio.', 'keywords': ['computer', 'science', 'building', 'user', 'interface', 'interfaces', 'design', 'ui'], 'name': 'Building User Interfaces', 'number': 'COMPSCI_639', 'requisites': [['COMPSCI_300']], 'sections': [{'instructor': 'Bilge Mutlu', 'location': '1221 Computer Sciences and Statistics', 'subsections': [], 'time': {'thursday': '1:00pm - 2:15pm', 'tuesday': '1:00pm - 2:15pm'}, 'number': 'LEC_002'}], 'subject': 'Computer Science'}, {'credits': 3, 'description': 'Focuses on the role that psychological principles, research evidence and social science play in the laws of U.S. society, especially in the policies and mechanisms of social control of human behavior. The course will address the ways that society defines membership, and the role of psychology in how it determines who should be excluded or restricted from open society, in order to maintain a more civil society. In addition to learning the factual information about how selected processes work in the legal and social context, students will be asked to consider the role they can play as citizens in supporting or changing these social processes. The course will take a particular interest in psycholegal issues \"in action\" and in learning about the clinical-legal processes used to determine the disposition of individuals considered marginal in society. 
Finally, the course will address the mechanisms that are used to exclude individuals from open society through criminal and civil court processes, the role of psychology as a science, and the role of psychologists as behavioral experts in criminal and civil courts, and in shaping social policies.', 'keywords': ['psychology', 'science', 'law', 'social', 'policy', 'behavior'], 'name': 'Psychology, Law, and Social Policy', 'number': 'PSYCH_401', 'requisites': [['PSYCH_202']], 'sections': [{'instructor': 'Gregory Van Rybroek', 'location': '121 Brogden Psychology Building', 'subsections': [], 'time': {'monday': '4:00pm - 5:15pm', 'wednesday': '4:00pm - 5:15pm'}, 'number': 'LEC_001'}], 'subject': 'Psychology'}, {'credits': 3, 'description': 'Basic concepts of logic, sets, partial order and other relations, and functions. Basic concepts of mathematics (definitions, proofs, sets, functions, and relations) with a focus on discrete structures: integers, bits, strings, trees, and graphs. Propositional logic, Boolean algebra, and predicate logic. Mathematical induction and recursion. Invariants and algorithmic correctness. Recurrences and asymptotic growth analysis. 
Fundamentals of counting.', 'keywords': ['computer', 'science', 'math', 'mathematics', 'discrete', 'logic', 'algorithm', 'algorithms'], 'name': 'Introduction To Discrete Mathematics', 'number': 'COMPSCI_240', 'requisites': [['MATH_221']], 'sections': [{'instructor': 'Beck Hasti', 'location': '105 Brogden Psychology Building', 'subsections': [{'location': '1257 Computer Sciences and Statistics', 'time': {'tuesday': '8:50am - 9:40am'}, 'number': 'DIS_310'}, {'location': '1257 Computer Sciences and Statistics', 'time': {'thursday': '8:50am - 9:40am'}, 'number': 'DIS_311'}, {'location': '3024 Engineering Hall', 'time': {'tuesday': '9:55am - 10:45am'}, 'number': 'DIS_312'}, {'location': '2345 Engineering Hall', 'time': {'thursday': '9:55am - 10:45am'}, 'number': 'DIS_313'}, {'location': '2535 Engineering Hall', 'time': {'tuesday': '11:00am - 11:50am'}, 'number': 'DIS_314'}, {'location': '2535 Engineering Hall', 'time': {'thursday': '11:00am - 11:50am'}, 'number': 'DIS_315'}, {'location': 'B309 Van Vleck Hall', 'time': {'tuesday': '9:55am - 10:45am'}, 'number': 'DIS_316'}], 'time': {'friday': '9:55am - 10:45am', 'monday': '9:55am - 10:45am', 'wednesday': '9:55am - 10:45am'}, 'number': 'LEC_001'}, {'instructor': 'Beck Hasti', 'location': '132 Noland Hall', 'subsections': [{'location': 'B211 Van Vleck Hall', 'time': {'thursday': '11:00am - 11:50am'}, 'number': 'DIS_320'}, {'location': 'B211 Van Vleck Hall', 'time': {'tuesday': '12:05pm - 12:55pm'}, 'number': 'DIS_321'}, {'location': '2255 Engineering Hall', 'time': {'thursday': '12:05pm - 12:55pm'}, 'number': 'DIS_322'}, {'location': '2349 Engineering Hall', 'time': {'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_323'}, {'location': '1263 Computer Sciences and Statistics', 'time': {'thursday': '1:20pm - 2:10pm'}, 'number': 'DIS_324'}, {'location': '3418 Engineering Hall', 'time': {'tuesday': '2:25pm - 3:15pm'}, 'number': 'DIS_325'}, {'location': '3418 Engineering Hall', 'time': {'thursday': '2:25pm - 3:15pm'}, 'number': 
'DIS_326'}], 'time': {'friday': '1:20pm - 2:10pm', 'monday': '1:20pm - 2:10pm', 'wednesday': '1:20pm - 2:10pm'}, 'number': 'LEC_002'}, {'instructor': 'Beck Hasti', 'location': '168 Noland Hall', 'subsections': [{'location': '1263 Computer Sciences and Statistics', 'time': {'tuesday': '8:50am - 9:40am'}, 'number': 'DIS_330'}, {'location': '1263 Computer Sciences and Statistics', 'time': {'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_331'}, {'location': '3024 Engineering Hall', 'time': {'thursday': '9:55am - 10:45am'}, 'number': 'DIS_332'}, {'location': '2349 Engineering Hall', 'time': {'thursday': '12:05am - 12:55am'}, 'number': 'DIS_333'}], 'time': {'friday': '2:25pm - 3:15pm', 'monday': '2:25pm - 3:15pm', 'wednesday': '2:25pm - 3:15pm'}, 'number': 'LEC_003'}], 'subject': 'Computer Science'}, {'credits': 3, 'description': 'Graphical and numerical exploration of data; standard errors; distributions for statistical models including binomial, Poisson, normal; estimation; hypothesis testing; randomization tests; basic principles of experimental design; regression; ANOVA; categorical data analysis; goodness of fit; application. 
(intended for students wishing to take additional statistics courses).', 'keywords': ['statistics', 'statistical', 'math', 'mathematics', 'methods'], 'name': 'Accelerated Introduction to Statistical Methods', 'number': 'STATS_302', 'requisites': [['MATH_221']], 'sections': [{'instructor': 'Unknown', 'location': '331 Service Memorial Institute', 'subsections': [{'location': '212 Educational Sciences', 'time': {'tuesday': '1:20pm - 2:10pm'}, 'number': 'DIS_311'}, {'location': '1313 Sterling Hall', 'time': {'wednesday': '7:45am - 8:35am'}, 'number': 'DIS_312'}, {'location': '1313 Sterling Hall', 'time': {'wednesday': '11:00am - 11:50am'}, 'number': 'DIS_313'}], 'time': {'monday': '4:00pm - 5:15pm', 'wednesday': '4:00pm - 5:15pm'}, 'number': 'LEC_001'}], 'subject': 'Statistics'}]\n" - ] - } - ], "source": [ - "# Collect all the class data in a list called 'all_class_data'\n", - "all_class_data = []\n", - "for class_num in classes_list:\n", - " url = \"https://coletnelson.us/cs220-api/classes/\" + class_num\n", - " r = requests.get(url)\n", - " r.raise_for_status()\n", - " class_data = r.json()\n", - " all_class_data.append(class_data)\n", - "\n", - "print(all_class_data) # Too much data? Try print(len(all_class_data))" + "### Write it out to a CSV file on your drive\n", + "You now have your own copy!" 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "17\n" - ] - } - ], - "source": [ - "print(len(all_class_data))" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "3 PSYCH_202 Introduction to Psychology\n", - "4 COMPSCI_537 Introduction to Operating Systems\n", - "3 COMPSCI_300 Programming 2\n", - "5 CHEM_104 General Chemistry II\n", - "3 COMPSCI_200 Programming 1\n", - "5 MATH_114 Algebra and Trigonometry\n", - "4 PSYCH_456 Introductory Social Psychology\n", - "2 COMPSCI_252 Introduction to Computer Engineering\n", - "3 COMPSCI_400 Programming 3\n", - "5 MATH_221 Calculus and Analytical Geometry 1\n", - "3 BIOLOGY_101 Animal Biology\n", - "3 COMPSCI_354 Machine Organization and Programming\n", - "4 CHEM_103 General Chemistry I\n", - "3 COMPSCI_639 Building User Interfaces\n", - "3 PSYCH_401 Psychology, Law, and Social Policy\n", - "3 COMPSCI_240 Introduction To Discrete Mathematics\n", - "3 STATS_302 Accelerated Introduction to Statistical Methods\n" - ] - } - ], - "source": [ - "# Print the number of credits, course number, and name for each class.\n", - "for spec_class in all_class_data:\n", - " print(spec_class['credits'], spec_class['number'], spec_class['name'])" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "3.4705882352941178" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What is the average number of credits per course?\n", - "num_credits = 0 \n", - "for spec_class in all_class_data:\n", - " num_credits += spec_class['credits']\n", - "num_credits / len(all_class_data)" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - 
"['Biology',\n", - " 'Chemistry',\n", - " 'Computer Science',\n", - " 'Mathematics',\n", - " 'Psychology',\n", - " 'Statistics']" - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What are the unique subjects?\n", - "subjects = []\n", - "for spec_class in all_class_data:\n", - " subjects.append(spec_class['subject'])\n", - "list(set(subjects))" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['COMPSCI_300', 'COMPSCI_200', 'COMPSCI_400']" - ] - }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Besides PYSCH 202, what are the course numbers of the courses\n", - "# with the most sections offered (not including subsections)?\n", - "high_courses = []\n", - "high_sections = 0\n", - "for spec_class in all_class_data:\n", - " current_course_num = spec_class['number']\n", - " current_num_sects = len(spec_class['sections'])\n", - " \n", - " if current_course_num == 'PSYCH_202':\n", - " continue\n", - " \n", - " if current_num_sects == high_sections:\n", - " high_courses.append(current_course_num)\n", - " elif current_num_sects > high_sections:\n", - " high_courses = []\n", - " high_courses.append(current_course_num)\n", - " high_sections = current_num_sects\n", - "high_courses" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Can we make a Pandas dataframe? Yes!" 
- ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>credits</th>\n", - " <th>description</th>\n", - " <th>keywords</th>\n", - " <th>name</th>\n", - " <th>number</th>\n", - " <th>requisites</th>\n", - " <th>sections</th>\n", - " <th>subject</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>3</td>\n", - " <td>Behavior, including its development, motivatio...</td>\n", - " <td>[psychology, behavior, emotion, intelligence, ...</td>\n", - " <td>Introduction to Psychology</td>\n", - " <td>PSYCH_202</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Jeff Henriques', 'location': ...</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>4</td>\n", - " <td>Input-output hardware, interrupt handling, pro...</td>\n", - " <td>[computer, science, operating, system, systems]</td>\n", - " <td>Introduction to Operating Systems</td>\n", - " <td>COMPSCI_537</td>\n", - " <td>[[COMPSCI_354, COMPSCI_400]]</td>\n", - " <td>[{'instructor': 'Andrea Arpaci-Dusseau', 'loca...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>3</td>\n", - " <td>Introduces students to Object-Oriented Program...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 2</td>\n", - " <td>COMPSCI_300</td>\n", - " <td>[[COMPSCI_200]]</td>\n", - " <td>[{'instructor': 'Gary Dahl', 'location': 'AB20...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " 
<tr>\n", - " <th>3</th>\n", - " <td>5</td>\n", - " <td>Principles and application of chemical equilib...</td>\n", - " <td>[chemistry]</td>\n", - " <td>General Chemistry II</td>\n", - " <td>CHEM_104</td>\n", - " <td>[[MATH_114], [CHEM_103]]</td>\n", - " <td>[{'instructor': 'Linda Zelewski', 'location': ...</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>3</td>\n", - " <td>Learn the process of incrementally developing ...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 1</td>\n", - " <td>COMPSCI_200</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Jim Williams', 'location': '1...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>5</th>\n", - " <td>5</td>\n", - " <td>The two semester sequence MATH_112-MATH_113 co...</td>\n", - " <td>[math, mathematics, algebra, trigonometry]</td>\n", - " <td>Algebra and Trigonometry</td>\n", - " <td>MATH_114</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Sharad Chandarana', 'location...</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>6</th>\n", - " <td>4</td>\n", - " <td>The systematic study of the individual in a so...</td>\n", - " <td>[psychology, science, social, interaction, beh...</td>\n", - " <td>Introductory Social Psychology</td>\n", - " <td>PSYCH_456</td>\n", - " <td>[[PSYCH_202]]</td>\n", - " <td>[{'instructor': 'Abigail Letak', 'location': '...</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>7</th>\n", - " <td>2</td>\n", - " <td>Logic components built with transistors, rudim...</td>\n", - " <td>[computer, science, engineering, programming]</td>\n", - " <td>Introduction to Computer Engineering</td>\n", - " <td>COMPSCI_252</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Joseph Krachey', 'location': ...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>8</th>\n", - " <td>3</td>\n", - " <td>The third course in our programming 
fundamenta...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 3</td>\n", - " <td>COMPSCI_400</td>\n", - " <td>[[COMPSCI_300]]</td>\n", - " <td>[{'instructor': 'Gary Dahl', 'location': 'AB20...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>9</th>\n", - " <td>5</td>\n", - " <td>Introduction to differential and integral calc...</td>\n", - " <td>[math, mathematics, calculus, analytical, geom...</td>\n", - " <td>Calculus and Analytical Geometry 1</td>\n", - " <td>MATH_221</td>\n", - " <td>[[MATH_114]]</td>\n", - " <td>[{'instructor': 'Laurentiu Maxim', 'location':...</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>10</th>\n", - " <td>3</td>\n", - " <td>General biological principles. Topics include:...</td>\n", - " <td>[biology, science, animal, evolution, genetics...</td>\n", - " <td>Animal Biology</td>\n", - " <td>BIOLOGY_101</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Sharon Thoma', 'location': '2...</td>\n", - " <td>Biology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>11</th>\n", - " <td>3</td>\n", - " <td>An introduction to fundamental structures of c...</td>\n", - " <td>[computer, science, engineering, electrical, m...</td>\n", - " <td>Machine Organization and Programming</td>\n", - " <td>COMPSCI_354</td>\n", - " <td>[[COMPSCI_252], [COMPSCI_300]]</td>\n", - " <td>[{'instructor': 'James Skrentny', 'location': ...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>12</th>\n", - " <td>4</td>\n", - " <td>Introduction. 
Stoichiometry and the mole conce...</td>\n", - " <td>[chemistry]</td>\n", - " <td>General Chemistry I</td>\n", - " <td>CHEM_103</td>\n", - " <td>[]</td>\n", - " <td>[{'instructor': 'Unknown', 'location': 'B10 In...</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>13</th>\n", - " <td>3</td>\n", - " <td>This course introduces students to the softwar...</td>\n", - " <td>[computer, science, building, user, interface,...</td>\n", - " <td>Building User Interfaces</td>\n", - " <td>COMPSCI_639</td>\n", - " <td>[[COMPSCI_300]]</td>\n", - " <td>[{'instructor': 'Bilge Mutlu', 'location': '12...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>14</th>\n", - " <td>3</td>\n", - " <td>Focuses on the role that psychological princip...</td>\n", - " <td>[psychology, science, law, social, policy, beh...</td>\n", - " <td>Psychology, Law, and Social Policy</td>\n", - " <td>PSYCH_401</td>\n", - " <td>[[PSYCH_202]]</td>\n", - " <td>[{'instructor': 'Gregory Van Rybroek', 'locati...</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>15</th>\n", - " <td>3</td>\n", - " <td>Basic concepts of logic, sets, partial order a...</td>\n", - " <td>[computer, science, math, mathematics, discret...</td>\n", - " <td>Introduction To Discrete Mathematics</td>\n", - " <td>COMPSCI_240</td>\n", - " <td>[[MATH_221]]</td>\n", - " <td>[{'instructor': 'Beck Hasti', 'location': '105...</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>16</th>\n", - " <td>3</td>\n", - " <td>Graphical and numerical exploration of data; s...</td>\n", - " <td>[statistics, statistical, math, mathematics, m...</td>\n", - " <td>Accelerated Introduction to Statistical Methods</td>\n", - " <td>STATS_302</td>\n", - " <td>[[MATH_221]]</td>\n", - " <td>[{'instructor': 'Unknown', 'location': '331 Se...</td>\n", - " <td>Statistics</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " credits description 
\\\n", - "0 3 Behavior, including its development, motivatio... \n", - "1 4 Input-output hardware, interrupt handling, pro... \n", - "2 3 Introduces students to Object-Oriented Program... \n", - "3 5 Principles and application of chemical equilib... \n", - "4 3 Learn the process of incrementally developing ... \n", - "5 5 The two semester sequence MATH_112-MATH_113 co... \n", - "6 4 The systematic study of the individual in a so... \n", - "7 2 Logic components built with transistors, rudim... \n", - "8 3 The third course in our programming fundamenta... \n", - "9 5 Introduction to differential and integral calc... \n", - "10 3 General biological principles. Topics include:... \n", - "11 3 An introduction to fundamental structures of c... \n", - "12 4 Introduction. Stoichiometry and the mole conce... \n", - "13 3 This course introduces students to the softwar... \n", - "14 3 Focuses on the role that psychological princip... \n", - "15 3 Basic concepts of logic, sets, partial order a... \n", - "16 3 Graphical and numerical exploration of data; s... \n", - "\n", - " keywords \\\n", - "0 [psychology, behavior, emotion, intelligence, ... \n", - "1 [computer, science, operating, system, systems] \n", - "2 [computer, science, programming, java] \n", - "3 [chemistry] \n", - "4 [computer, science, programming, java] \n", - "5 [math, mathematics, algebra, trigonometry] \n", - "6 [psychology, science, social, interaction, beh... \n", - "7 [computer, science, engineering, programming] \n", - "8 [computer, science, programming, java] \n", - "9 [math, mathematics, calculus, analytical, geom... \n", - "10 [biology, science, animal, evolution, genetics... \n", - "11 [computer, science, engineering, electrical, m... \n", - "12 [chemistry] \n", - "13 [computer, science, building, user, interface,... \n", - "14 [psychology, science, law, social, policy, beh... \n", - "15 [computer, science, math, mathematics, discret... \n", - "16 [statistics, statistical, math, mathematics, m... 
\n", - "\n", - " name number \\\n", - "0 Introduction to Psychology PSYCH_202 \n", - "1 Introduction to Operating Systems COMPSCI_537 \n", - "2 Programming 2 COMPSCI_300 \n", - "3 General Chemistry II CHEM_104 \n", - "4 Programming 1 COMPSCI_200 \n", - "5 Algebra and Trigonometry MATH_114 \n", - "6 Introductory Social Psychology PSYCH_456 \n", - "7 Introduction to Computer Engineering COMPSCI_252 \n", - "8 Programming 3 COMPSCI_400 \n", - "9 Calculus and Analytical Geometry 1 MATH_221 \n", - "10 Animal Biology BIOLOGY_101 \n", - "11 Machine Organization and Programming COMPSCI_354 \n", - "12 General Chemistry I CHEM_103 \n", - "13 Building User Interfaces COMPSCI_639 \n", - "14 Psychology, Law, and Social Policy PSYCH_401 \n", - "15 Introduction To Discrete Mathematics COMPSCI_240 \n", - "16 Accelerated Introduction to Statistical Methods STATS_302 \n", - "\n", - " requisites \\\n", - "0 [] \n", - "1 [[COMPSCI_354, COMPSCI_400]] \n", - "2 [[COMPSCI_200]] \n", - "3 [[MATH_114], [CHEM_103]] \n", - "4 [] \n", - "5 [] \n", - "6 [[PSYCH_202]] \n", - "7 [] \n", - "8 [[COMPSCI_300]] \n", - "9 [[MATH_114]] \n", - "10 [] \n", - "11 [[COMPSCI_252], [COMPSCI_300]] \n", - "12 [] \n", - "13 [[COMPSCI_300]] \n", - "14 [[PSYCH_202]] \n", - "15 [[MATH_221]] \n", - "16 [[MATH_221]] \n", - "\n", - " sections subject \n", - "0 [{'instructor': 'Jeff Henriques', 'location': ... Psychology \n", - "1 [{'instructor': 'Andrea Arpaci-Dusseau', 'loca... Computer Science \n", - "2 [{'instructor': 'Gary Dahl', 'location': 'AB20... Computer Science \n", - "3 [{'instructor': 'Linda Zelewski', 'location': ... Chemistry \n", - "4 [{'instructor': 'Jim Williams', 'location': '1... Computer Science \n", - "5 [{'instructor': 'Sharad Chandarana', 'location... Mathematics \n", - "6 [{'instructor': 'Abigail Letak', 'location': '... Psychology \n", - "7 [{'instructor': 'Joseph Krachey', 'location': ... Computer Science \n", - "8 [{'instructor': 'Gary Dahl', 'location': 'AB20... 
Computer Science \n", - "9 [{'instructor': 'Laurentiu Maxim', 'location':... Mathematics \n", - "10 [{'instructor': 'Sharon Thoma', 'location': '2... Biology \n", - "11 [{'instructor': 'James Skrentny', 'location': ... Computer Science \n", - "12 [{'instructor': 'Unknown', 'location': 'B10 In... Chemistry \n", - "13 [{'instructor': 'Bilge Mutlu', 'location': '12... Computer Science \n", - "14 [{'instructor': 'Gregory Van Rybroek', 'locati... Psychology \n", - "15 [{'instructor': 'Beck Hasti', 'location': '105... Computer Science \n", - "16 [{'instructor': 'Unknown', 'location': '331 Se... Statistics " - ] - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "all_course_frame = DataFrame(all_class_data)\n", - "all_course_frame" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### We may want to do some \"plumbing\" with our data." - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>credits</th>\n", - " <th>description</th>\n", - " <th>keywords</th>\n", - " <th>name</th>\n", - " <th>number</th>\n", - " <th>subject</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>3</td>\n", - " <td>Behavior, including its development, motivatio...</td>\n", - " <td>[psychology, behavior, emotion, intelligence, ...</td>\n", - " <td>Introduction to Psychology</td>\n", - " <td>PSYCH_202</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " 
<td>4</td>\n", - " <td>Input-output hardware, interrupt handling, pro...</td>\n", - " <td>[computer, science, operating, system, systems]</td>\n", - " <td>Introduction to Operating Systems</td>\n", - " <td>COMPSCI_537</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>3</td>\n", - " <td>Introduces students to Object-Oriented Program...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 2</td>\n", - " <td>COMPSCI_300</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>5</td>\n", - " <td>Principles and application of chemical equilib...</td>\n", - " <td>[chemistry]</td>\n", - " <td>General Chemistry II</td>\n", - " <td>CHEM_104</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>3</td>\n", - " <td>Learn the process of incrementally developing ...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 1</td>\n", - " <td>COMPSCI_200</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>5</th>\n", - " <td>5</td>\n", - " <td>The two semester sequence MATH_112-MATH_113 co...</td>\n", - " <td>[math, mathematics, algebra, trigonometry]</td>\n", - " <td>Algebra and Trigonometry</td>\n", - " <td>MATH_114</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>6</th>\n", - " <td>4</td>\n", - " <td>The systematic study of the individual in a so...</td>\n", - " <td>[psychology, science, social, interaction, beh...</td>\n", - " <td>Introductory Social Psychology</td>\n", - " <td>PSYCH_456</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>7</th>\n", - " <td>2</td>\n", - " <td>Logic components built with transistors, rudim...</td>\n", - " <td>[computer, science, engineering, programming]</td>\n", - " <td>Introduction to Computer Engineering</td>\n", - " <td>COMPSCI_252</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " 
<tr>\n", - " <th>8</th>\n", - " <td>3</td>\n", - " <td>The third course in our programming fundamenta...</td>\n", - " <td>[computer, science, programming, java]</td>\n", - " <td>Programming 3</td>\n", - " <td>COMPSCI_400</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>9</th>\n", - " <td>5</td>\n", - " <td>Introduction to differential and integral calc...</td>\n", - " <td>[math, mathematics, calculus, analytical, geom...</td>\n", - " <td>Calculus and Analytical Geometry 1</td>\n", - " <td>MATH_221</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>10</th>\n", - " <td>3</td>\n", - " <td>General biological principles. Topics include:...</td>\n", - " <td>[biology, science, animal, evolution, genetics...</td>\n", - " <td>Animal Biology</td>\n", - " <td>BIOLOGY_101</td>\n", - " <td>Biology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>11</th>\n", - " <td>3</td>\n", - " <td>An introduction to fundamental structures of c...</td>\n", - " <td>[computer, science, engineering, electrical, m...</td>\n", - " <td>Machine Organization and Programming</td>\n", - " <td>COMPSCI_354</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>12</th>\n", - " <td>4</td>\n", - " <td>Introduction. 
Stoichiometry and the mole conce...</td>\n", - " <td>[chemistry]</td>\n", - " <td>General Chemistry I</td>\n", - " <td>CHEM_103</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>13</th>\n", - " <td>3</td>\n", - " <td>This course introduces students to the softwar...</td>\n", - " <td>[computer, science, building, user, interface,...</td>\n", - " <td>Building User Interfaces</td>\n", - " <td>COMPSCI_639</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>14</th>\n", - " <td>3</td>\n", - " <td>Focuses on the role that psychological princip...</td>\n", - " <td>[psychology, science, law, social, policy, beh...</td>\n", - " <td>Psychology, Law, and Social Policy</td>\n", - " <td>PSYCH_401</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>15</th>\n", - " <td>3</td>\n", - " <td>Basic concepts of logic, sets, partial order a...</td>\n", - " <td>[computer, science, math, mathematics, discret...</td>\n", - " <td>Introduction To Discrete Mathematics</td>\n", - " <td>COMPSCI_240</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>16</th>\n", - " <td>3</td>\n", - " <td>Graphical and numerical exploration of data; s...</td>\n", - " <td>[statistics, statistical, math, mathematics, m...</td>\n", - " <td>Accelerated Introduction to Statistical Methods</td>\n", - " <td>STATS_302</td>\n", - " <td>Statistics</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " credits description \\\n", - "0 3 Behavior, including its development, motivatio... \n", - "1 4 Input-output hardware, interrupt handling, pro... \n", - "2 3 Introduces students to Object-Oriented Program... \n", - "3 5 Principles and application of chemical equilib... \n", - "4 3 Learn the process of incrementally developing ... \n", - "5 5 The two semester sequence MATH_112-MATH_113 co... \n", - "6 4 The systematic study of the individual in a so... 
\n", - "7 2 Logic components built with transistors, rudim... \n", - "8 3 The third course in our programming fundamenta... \n", - "9 5 Introduction to differential and integral calc... \n", - "10 3 General biological principles. Topics include:... \n", - "11 3 An introduction to fundamental structures of c... \n", - "12 4 Introduction. Stoichiometry and the mole conce... \n", - "13 3 This course introduces students to the softwar... \n", - "14 3 Focuses on the role that psychological princip... \n", - "15 3 Basic concepts of logic, sets, partial order a... \n", - "16 3 Graphical and numerical exploration of data; s... \n", - "\n", - " keywords \\\n", - "0 [psychology, behavior, emotion, intelligence, ... \n", - "1 [computer, science, operating, system, systems] \n", - "2 [computer, science, programming, java] \n", - "3 [chemistry] \n", - "4 [computer, science, programming, java] \n", - "5 [math, mathematics, algebra, trigonometry] \n", - "6 [psychology, science, social, interaction, beh... \n", - "7 [computer, science, engineering, programming] \n", - "8 [computer, science, programming, java] \n", - "9 [math, mathematics, calculus, analytical, geom... \n", - "10 [biology, science, animal, evolution, genetics... \n", - "11 [computer, science, engineering, electrical, m... \n", - "12 [chemistry] \n", - "13 [computer, science, building, user, interface,... \n", - "14 [psychology, science, law, social, policy, beh... \n", - "15 [computer, science, math, mathematics, discret... \n", - "16 [statistics, statistical, math, mathematics, m... 
\n", - "\n", - " name number \\\n", - "0 Introduction to Psychology PSYCH_202 \n", - "1 Introduction to Operating Systems COMPSCI_537 \n", - "2 Programming 2 COMPSCI_300 \n", - "3 General Chemistry II CHEM_104 \n", - "4 Programming 1 COMPSCI_200 \n", - "5 Algebra and Trigonometry MATH_114 \n", - "6 Introductory Social Psychology PSYCH_456 \n", - "7 Introduction to Computer Engineering COMPSCI_252 \n", - "8 Programming 3 COMPSCI_400 \n", - "9 Calculus and Analytical Geometry 1 MATH_221 \n", - "10 Animal Biology BIOLOGY_101 \n", - "11 Machine Organization and Programming COMPSCI_354 \n", - "12 General Chemistry I CHEM_103 \n", - "13 Building User Interfaces COMPSCI_639 \n", - "14 Psychology, Law, and Social Policy PSYCH_401 \n", - "15 Introduction To Discrete Mathematics COMPSCI_240 \n", - "16 Accelerated Introduction to Statistical Methods STATS_302 \n", - "\n", - " subject \n", - "0 Psychology \n", - "1 Computer Science \n", - "2 Computer Science \n", - "3 Chemistry \n", - "4 Computer Science \n", - "5 Mathematics \n", - "6 Psychology \n", - "7 Computer Science \n", - "8 Computer Science \n", - "9 Mathematics \n", - "10 Biology \n", - "11 Computer Science \n", - "12 Chemistry \n", - "13 Computer Science \n", - "14 Psychology \n", - "15 Computer Science \n", - "16 Statistics " - ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Remove the 'sections' and 'requisites' column.\n", - "new_course_frame = all_course_frame.loc[:, \"credits\":\"number\"]\n", - "new_course_frame[\"subject\"] = all_course_frame.loc[:, \"subject\"]\n", - "new_course_frame" - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th 
{\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>credits</th>\n", - " <th>description</th>\n", - " <th>keywords</th>\n", - " <th>name</th>\n", - " <th>number</th>\n", - " <th>subject</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>3</td>\n", - " <td>Behavior, including its development, motivatio...</td>\n", - " <td>psychology, behavior, emotion, intelligence, b...</td>\n", - " <td>Introduction to Psychology</td>\n", - " <td>PSYCH_202</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>4</td>\n", - " <td>Input-output hardware, interrupt handling, pro...</td>\n", - " <td>computer, science, operating, system, systems</td>\n", - " <td>Introduction to Operating Systems</td>\n", - " <td>COMPSCI_537</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>3</td>\n", - " <td>Introduces students to Object-Oriented Program...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 2</td>\n", - " <td>COMPSCI_300</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>3</th>\n", - " <td>5</td>\n", - " <td>Principles and application of chemical equilib...</td>\n", - " <td>chemistry</td>\n", - " <td>General Chemistry II</td>\n", - " <td>CHEM_104</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>3</td>\n", - " <td>Learn the process of incrementally developing ...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 1</td>\n", - " <td>COMPSCI_200</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>5</th>\n", - " <td>5</td>\n", - " <td>The two semester sequence MATH_112-MATH_113 co...</td>\n", - " <td>math, mathematics, algebra, trigonometry</td>\n", - " <td>Algebra and 
Trigonometry</td>\n", - " <td>MATH_114</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>6</th>\n", - " <td>4</td>\n", - " <td>The systematic study of the individual in a so...</td>\n", - " <td>psychology, science, social, interaction, beha...</td>\n", - " <td>Introductory Social Psychology</td>\n", - " <td>PSYCH_456</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>7</th>\n", - " <td>2</td>\n", - " <td>Logic components built with transistors, rudim...</td>\n", - " <td>computer, science, engineering, programming</td>\n", - " <td>Introduction to Computer Engineering</td>\n", - " <td>COMPSCI_252</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>8</th>\n", - " <td>3</td>\n", - " <td>The third course in our programming fundamenta...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 3</td>\n", - " <td>COMPSCI_400</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>9</th>\n", - " <td>5</td>\n", - " <td>Introduction to differential and integral calc...</td>\n", - " <td>math, mathematics, calculus, analytical, geome...</td>\n", - " <td>Calculus and Analytical Geometry 1</td>\n", - " <td>MATH_221</td>\n", - " <td>Mathematics</td>\n", - " </tr>\n", - " <tr>\n", - " <th>10</th>\n", - " <td>3</td>\n", - " <td>General biological principles. Topics include:...</td>\n", - " <td>biology, science, animal, evolution, genetics,...</td>\n", - " <td>Animal Biology</td>\n", - " <td>BIOLOGY_101</td>\n", - " <td>Biology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>11</th>\n", - " <td>3</td>\n", - " <td>An introduction to fundamental structures of c...</td>\n", - " <td>computer, science, engineering, electrical, ma...</td>\n", - " <td>Machine Organization and Programming</td>\n", - " <td>COMPSCI_354</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>12</th>\n", - " <td>4</td>\n", - " <td>Introduction. 
Stoichiometry and the mole conce...</td>\n", - " <td>chemistry</td>\n", - " <td>General Chemistry I</td>\n", - " <td>CHEM_103</td>\n", - " <td>Chemistry</td>\n", - " </tr>\n", - " <tr>\n", - " <th>13</th>\n", - " <td>3</td>\n", - " <td>This course introduces students to the softwar...</td>\n", - " <td>computer, science, building, user, interface, ...</td>\n", - " <td>Building User Interfaces</td>\n", - " <td>COMPSCI_639</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>14</th>\n", - " <td>3</td>\n", - " <td>Focuses on the role that psychological princip...</td>\n", - " <td>psychology, science, law, social, policy, beha...</td>\n", - " <td>Psychology, Law, and Social Policy</td>\n", - " <td>PSYCH_401</td>\n", - " <td>Psychology</td>\n", - " </tr>\n", - " <tr>\n", - " <th>15</th>\n", - " <td>3</td>\n", - " <td>Basic concepts of logic, sets, partial order a...</td>\n", - " <td>computer, science, math, mathematics, discrete...</td>\n", - " <td>Introduction To Discrete Mathematics</td>\n", - " <td>COMPSCI_240</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>16</th>\n", - " <td>3</td>\n", - " <td>Graphical and numerical exploration of data; s...</td>\n", - " <td>statistics, statistical, math, mathematics, me...</td>\n", - " <td>Accelerated Introduction to Statistical Methods</td>\n", - " <td>STATS_302</td>\n", - " <td>Statistics</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " credits description \\\n", - "0 3 Behavior, including its development, motivatio... \n", - "1 4 Input-output hardware, interrupt handling, pro... \n", - "2 3 Introduces students to Object-Oriented Program... \n", - "3 5 Principles and application of chemical equilib... \n", - "4 3 Learn the process of incrementally developing ... \n", - "5 5 The two semester sequence MATH_112-MATH_113 co... \n", - "6 4 The systematic study of the individual in a so... 
\n", - "7 2 Logic components built with transistors, rudim... \n", - "8 3 The third course in our programming fundamenta... \n", - "9 5 Introduction to differential and integral calc... \n", - "10 3 General biological principles. Topics include:... \n", - "11 3 An introduction to fundamental structures of c... \n", - "12 4 Introduction. Stoichiometry and the mole conce... \n", - "13 3 This course introduces students to the softwar... \n", - "14 3 Focuses on the role that psychological princip... \n", - "15 3 Basic concepts of logic, sets, partial order a... \n", - "16 3 Graphical and numerical exploration of data; s... \n", - "\n", - " keywords \\\n", - "0 psychology, behavior, emotion, intelligence, b... \n", - "1 computer, science, operating, system, systems \n", - "2 computer, science, programming, java \n", - "3 chemistry \n", - "4 computer, science, programming, java \n", - "5 math, mathematics, algebra, trigonometry \n", - "6 psychology, science, social, interaction, beha... \n", - "7 computer, science, engineering, programming \n", - "8 computer, science, programming, java \n", - "9 math, mathematics, calculus, analytical, geome... \n", - "10 biology, science, animal, evolution, genetics,... \n", - "11 computer, science, engineering, electrical, ma... \n", - "12 chemistry \n", - "13 computer, science, building, user, interface, ... \n", - "14 psychology, science, law, social, policy, beha... \n", - "15 computer, science, math, mathematics, discrete... \n", - "16 statistics, statistical, math, mathematics, me... 
\n", - "\n", - " name number \\\n", - "0 Introduction to Psychology PSYCH_202 \n", - "1 Introduction to Operating Systems COMPSCI_537 \n", - "2 Programming 2 COMPSCI_300 \n", - "3 General Chemistry II CHEM_104 \n", - "4 Programming 1 COMPSCI_200 \n", - "5 Algebra and Trigonometry MATH_114 \n", - "6 Introductory Social Psychology PSYCH_456 \n", - "7 Introduction to Computer Engineering COMPSCI_252 \n", - "8 Programming 3 COMPSCI_400 \n", - "9 Calculus and Analytical Geometry 1 MATH_221 \n", - "10 Animal Biology BIOLOGY_101 \n", - "11 Machine Organization and Programming COMPSCI_354 \n", - "12 General Chemistry I CHEM_103 \n", - "13 Building User Interfaces COMPSCI_639 \n", - "14 Psychology, Law, and Social Policy PSYCH_401 \n", - "15 Introduction To Discrete Mathematics COMPSCI_240 \n", - "16 Accelerated Introduction to Statistical Methods STATS_302 \n", - "\n", - " subject \n", - "0 Psychology \n", - "1 Computer Science \n", - "2 Computer Science \n", - "3 Chemistry \n", - "4 Computer Science \n", - "5 Mathematics \n", - "6 Psychology \n", - "7 Computer Science \n", - "8 Computer Science \n", - "9 Mathematics \n", - "10 Biology \n", - "11 Computer Science \n", - "12 Chemistry \n", - "13 Computer Science \n", - "14 Psychology \n", - "15 Computer Science \n", - "16 Statistics " - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Turn 'keywords' into a series of Strings and remove the '[', ']', '''\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].astype('string')\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"[\", \"\", regex=False)\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"]\", \"\", regex=False)\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"'\", \"\", regex=False)\n", - "new_course_frame" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 
Pandas Operations" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "5" - ] - }, - "execution_count": 37, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What is the most number of credits a course offers?\n", - "new_course_frame[\"credits\"].max()" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "2" - ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What is the least number of credits a course offers?\n", - "new_course_frame[\"credits\"].min()" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "credits 2\n", - "description Logic components built with transistors, rudim...\n", - "keywords computer, science, engineering, programming\n", - "name Introduction to Computer Engineering\n", - "number COMPSCI_252\n", - "subject Computer Science\n", - "Name: 7, dtype: object" - ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What is the info for that course?\n", - "new_course_frame.iloc[new_course_frame[\"credits\"].idxmin()]" - ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>credits</th>\n", - " <th>description</th>\n", - " <th>keywords</th>\n", - " <th>name</th>\n", - " 
<th>number</th>\n", - " <th>subject</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>3</td>\n", - " <td>Introduces students to Object-Oriented Program...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 2</td>\n", - " <td>COMPSCI_300</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>4</th>\n", - " <td>3</td>\n", - " <td>Learn the process of incrementally developing ...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 1</td>\n", - " <td>COMPSCI_200</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>7</th>\n", - " <td>2</td>\n", - " <td>Logic components built with transistors, rudim...</td>\n", - " <td>computer, science, engineering, programming</td>\n", - " <td>Introduction to Computer Engineering</td>\n", - " <td>COMPSCI_252</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>8</th>\n", - " <td>3</td>\n", - " <td>The third course in our programming fundamenta...</td>\n", - " <td>computer, science, programming, java</td>\n", - " <td>Programming 3</td>\n", - " <td>COMPSCI_400</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " <tr>\n", - " <th>11</th>\n", - " <td>3</td>\n", - " <td>An introduction to fundamental structures of c...</td>\n", - " <td>computer, science, engineering, electrical, ma...</td>\n", - " <td>Machine Organization and Programming</td>\n", - " <td>COMPSCI_354</td>\n", - " <td>Computer Science</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " credits description \\\n", - "2 3 Introduces students to Object-Oriented Program... \n", - "4 3 Learn the process of incrementally developing ... \n", - "7 2 Logic components built with transistors, rudim... \n", - "8 3 The third course in our programming fundamenta... \n", - "11 3 An introduction to fundamental structures of c... 
\n", - "\n", - " keywords \\\n", - "2 computer, science, programming, java \n", - "4 computer, science, programming, java \n", - "7 computer, science, engineering, programming \n", - "8 computer, science, programming, java \n", - "11 computer, science, engineering, electrical, ma... \n", - "\n", - " name number subject \n", - "2 Programming 2 COMPSCI_300 Computer Science \n", - "4 Programming 1 COMPSCI_200 Computer Science \n", - "7 Introduction to Computer Engineering COMPSCI_252 Computer Science \n", - "8 Programming 3 COMPSCI_400 Computer Science \n", - "11 Machine Organization and Programming COMPSCI_354 Computer Science " - ] - }, - "execution_count": 40, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What courses contain the keyword \"programming\"?\n", - "mask = new_course_frame[\"keywords\"].str.contains(\"programming\")\n", - "new_course_frame[mask]" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'Psychology, Law, and Social Policy'" - ] - }, - "execution_count": 41, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# What course has the most lengthy description?\n", - "idx_max_desc = new_course_frame[\"description\"].str.len().idxmax()\n", - "new_course_frame.iloc[idx_max_desc]['name']" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write it out to a CSV file on your drive\n", - "You now have your own copy!" 
- ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, "outputs": [], "source": [ "# Write it all out to a single CSV file\n", - "new_course_frame.to_csv(\"my_course_data.csv\", index=False)" + "periods_df.to_csv(\"campus_weather.csv\", index=False)" ] }, { @@ -3247,7 +2427,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.7" + "version": "3.9.12" } }, "nbformat": 4, diff --git a/f22/meena_lec_notes/lec-29/lec_29_web1_template.ipynb b/f22/meena_lec_notes/lec-29/lec_29_web1_template.ipynb index b51ca54..85b3f9b 100644 --- a/f22/meena_lec_notes/lec-29/lec_29_web1_template.ipynb +++ b/f22/meena_lec_notes/lec-29/lec_29_web1_template.ipynb @@ -232,7 +232,7 @@ "metadata": {}, "source": [ "## requests.get : Simple string example\n", - "- URL: https://www.msyamkumar.com/hello.txt" + "- URL: https://cs220.cs.wisc.edu/hello.txt" ] }, { @@ -241,7 +241,7 @@ "metadata": {}, "outputs": [], "source": [ - "url = \"https://www.msyamkumar.com/hello.txt\"\n", + "url = \"https://cs220.cs.wisc.edu/hello.txt\"\n", "r = requests.get(url) # r is the response\n", "print(r.status_code)\n", "print(r.text)" @@ -254,7 +254,7 @@ "outputs": [], "source": [ "# Q: What if the web site does not exist?\n", - "typo_url = \"https://www.msyamkumar.com/hello.txttttt\"\n", + "typo_url = \"https://cs220.cs.wisc.edu/hello.txttttt\"\n", "r = requests.get(typo_url)\n", "print(r.status_code)\n", "print(r.text)\n", @@ -269,7 +269,7 @@ "outputs": [], "source": [ "# We can check for a status_code error by using an assert\n", - "typo_url = \"https://www.msyamkumar.com/hello.txttttt\"\n", + "typo_url = \"https://cs220.cs.wisc.edu/hello.txttttt\"\n", "r = requests.get(typo_url)\n", "assert r.status_code == 200\n", "print(r.status_code)\n", @@ -326,7 +326,7 @@ "metadata": {}, "source": [ "## requests.get : JSON file example\n", - "- URL: https://www.msyamkumar.com/scores.json\n", + "- URL: https://cs220.cs.wisc.edu/scores.json\n", "- 
`json.load` (FILE_OBJECT)\n", "- `json.loads` (STRING)" ] @@ -338,7 +338,7 @@ "outputs": [], "source": [ "# GETting a JSON file, the long way\n", - "url = \"https://www.msyamkumar.com/scores.json\"\n", + "url = \"https://cs220.cs.wisc.edu/scores.json\"\n", "r = requests.get(url)\n", "r.raise_for_status()\n", "urltext = r.text\n", @@ -354,7 +354,7 @@ "outputs": [], "source": [ "# GETting a JSON file, the shortcut way\n", - "url = \"https://www.msyamkumar.com/scores.json\"\n", + "url = \"https://cs220.cs.wisc.edu/scores.json\"\n", "#Shortcut to bypass using json.loads()\n", "r = requests.get(url)\n", "r.raise_for_status()\n", @@ -380,114 +380,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## DEMO: Course Enrollment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Explore the API!\n", - "\n", - "https://coletnelson.us/cs220-api/classes\n", - "\n", - "https://coletnelson.us/cs220-api/classes_as_txt\n", - "\n", - "https://coletnelson.us/cs220-api/classes/MATH_221\n", - "\n", - "https://coletnelson.us/cs220-api/classes/COMPSCI_200\n", + "### Explore real-world JSON\n", "\n", - "... etc\n", + "How to explore an unknown JSON?\n", + "- If you run into a `dict`, try `.keys()` method to look at the keys of the dictionary, then use lookup process to explore further\n", + "- If you run into a `list`, iterate over the list and print each item\n", "\n", - "https://coletnelson.us/cs220-api/all_data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the list of classes." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### When the data is `json`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "url = \"https://coletnelson.us/cs220-api/classes\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "classes_list = r.json()\n", - "print(type(classes_list))\n", - "print(classes_list)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### When the data is `text`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "url = \"https://coletnelson.us/cs220-api/classes_as_txt\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "classes_txt = r.text\n", - "print(type(classes_txt))\n", - "print(classes_txt)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "classes_txt_as_list = ???" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get data for a specific class" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "url = \"https://coletnelson.us/cs220-api/classes/COMPSCI_200\"\n", - "r = requests.get(url)\n", - "r.raise_for_status()\n", - "cs200_data = r.json()\n", - "print(type(cs200_data))\n", - "print(cs200_data) # Too much data? 
Try print(cs220_data.keys())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cs200_data.keys()" + "### Weather for UW-Madison campus\n", + "- URL: https://api.weather.gov/gridpoints/MKX/37,63/forecast" ] }, { @@ -496,70 +396,11 @@ "metadata": {}, "outputs": [], "source": [ - "# Get the number of credits the course is worth\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the list of keywords for the course\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the official course name\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get the number of sections offered.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Collect all the class data in a list called 'all_class_data'\n", - "all_class_data = []\n", - "for class_num in classes_list:\n", - " url = \"https://coletnelson.us/cs220-api/classes/\" + class_num\n", - " r = requests.get(url)\n", - " r.raise_for_status()\n", - " class_data = r.json()\n", - " all_class_data.append(???)\n", + "# TODO: GET the forecast\n", "\n", - "print(all_class_data) # Too much data? 
Try print(len(all_class_data))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(len(all_class_data))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Print the number of credits, course number, and name for each class.\n" + "# TODO: explore the type of the data structure \n", + "\n", + "# display the data\n" ] }, { @@ -568,7 +409,11 @@ "metadata": {}, "outputs": [], "source": [ - "# What is the average number of credits per course?\n" + "# TODO: display the keys of the weather_data dict\n", + "\n", + "# TODO: lookup the value corresponding to the 'properties'\n", + "\n", + "# TODO: you know what to do next ... explore type again\n" ] }, { @@ -577,7 +422,11 @@ "metadata": {}, "outputs": [], "source": [ - "# What are the unique subjects?\n" + "# TODO: display the keys of the properties dict\n", + "\n", + "# TODO: lookup the value corresponding to the 'periods'\n", + "\n", + "# TODO: you know what to do next ... explore type again\n" ] }, { @@ -586,20 +435,18 @@ "metadata": {}, "outputs": [], "source": [ - "# Besides PYSCH 202, what are the course numbers of the courses\n", - "# with the most sections offered (not including subsections)?\n", - "high_courses = []\n", - "high_sections = 0\n", - "for spec_class in all_class_data:\n", - " pass\n", - "high_courses" + "# TODO: extract periods list into a variable\n", + "\n", + "# TODO: create a DataFrame using periods_list\n", + "# TODO: What does each inner data structure represent in your DataFrame?\n", + "# Keep in mind that outer data structure is a list." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Can we make a Pandas dataframe? Yes!" + "#### What is the maximum and minimum observed temperatures? 
Include the temperatureUnit in your display" ] }, { @@ -608,15 +455,23 @@ "metadata": {}, "outputs": [], "source": [ - "all_course_frame = DataFrame(all_class_data)\n", - "all_course_frame" + "min_temp = \n", + "idx_min = \n", + "min_unit = \n", + "\n", + "max_temp = \n", + "idx_max = \n", + "max_unit = \n", + "\n", + "print(\"Minimum observed temperature is: {} degree {}\".format(min_temp, min_unit))\n", + "print(\"Maximum observed temperature is: {} degree {}\".format(max_temp, max_unit))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### We may want to do some \"plumbing\" with our data." + "#### Which days `detailedForecast` contains `snow`?" ] }, { @@ -625,10 +480,8 @@ "metadata": {}, "outputs": [], "source": [ - "# Remove the 'sections' and 'requisites' column.\n", - "new_course_frame = all_course_frame.loc[:, \"credits\":\"number\"]\n", - "new_course_frame[\"subject\"] = all_course_frame.loc[:, \"subject\"]\n", - "new_course_frame" + "snow_days_df = \n", + "snow_days_df" ] }, { @@ -637,46 +490,14 @@ "metadata": {}, "outputs": [], "source": [ - "# Turn 'keywords' into a series of Strings and remove the '[', ']', '''\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].astype('string')\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"[\", \"\", regex=False)\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"]\", \"\", regex=False)\n", - "new_course_frame[\"keywords\"] = new_course_frame[\"keywords\"].str.replace(\"'\", \"\", regex=False)\n", - "new_course_frame" + "# Extract only the name column information for the subset DataFrame\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Pandas Operations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# What is the most number of credits a course offers?\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - 
"metadata": {}, - "outputs": [], - "source": [ - "# What is the least number of credits a course offers?\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# What is the info for that course?\n" + "#### Which day's `detailedForecast` has the most lengthy description?" ] }, { @@ -685,7 +506,8 @@ "metadata": {}, "outputs": [], "source": [ - "# What courses contain the keyword \"programming\"?\n" + "idx_max_desc = \n", + "periods_df.iloc[idx_max_desc]['name']" ] }, { @@ -694,7 +516,7 @@ "metadata": {}, "outputs": [], "source": [ - "# What course has the most lengthy description?\n" + "# What was that forecast?\n" ] }, { @@ -712,7 +534,7 @@ "outputs": [], "source": [ "# Write it all out to a single CSV file\n", - "new_course_frame.to_csv(\"my_course_data.csv\", index=False)" + "periods_df.to_csv(\"campus_weather.csv\", index=False)" ] }, { @@ -759,7 +581,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.7" + "version": "3.9.12" } }, "nbformat": 4, diff --git a/f22/meena_lec_notes/lec-29/my_course_data.csv b/f22/meena_lec_notes/lec-29/my_course_data.csv deleted file mode 100644 index 7c9adb1..0000000 --- a/f22/meena_lec_notes/lec-29/my_course_data.csv +++ /dev/null @@ -1,18 +0,0 @@ -credits,description,keywords,name,number,subject -3,"Behavior, including its development, motivation, frustrations, emotion, intelligence, learning, forgetting, personality, language, thinking, and social behavior.","psychology, behavior, emotion, intelligence, brain",Introduction to Psychology,PSYCH_202,Psychology -4,"Input-output hardware, interrupt handling, properties of magnetic tapes, discs and drums, associative memories and virtual address translation techniques. 
Batch processing, time sharing and real-time systems, scheduling resource allocation, modular software systems, performance measurement and system evaluation.","computer, science, operating, system, systems",Introduction to Operating Systems,COMPSCI_537,Computer Science -3,"Introduces students to Object-Oriented Programming using classes and objects to solve more complex problems. Introduces array-based and linked data structures: including lists, stacks, and queues. Programming assignments require writing and developing multi-class (file) programs using interfaces, generics, and exception handling to solve challenging real world problems. Topics reviewed include reading/writing data and objects from/to files and exception handling, and command line arguments. Topics introduced: object-oriented design; class vs. object; create and define interfaces and iterators; searching and sorting; abstract data types (List,Stack,Queue,PriorityQueue(Heap),Binary Search Tree); generic interfaces (parametric polymorphism); how to design and write test methods and classes; array based vs. linked node implementations; introduction to complexity analysis; recursion.","computer, science, programming, java",Programming 2,COMPSCI_300,Computer Science -5,"Principles and application of chemical equilibrium, coordination chemistry, oxidation-reduction and electrochemistry, kinetics, nuclear chemistry, introduction to organic chemistry. Lecture, lab, and discussion.",chemistry,General Chemistry II,CHEM_104,Chemistry -3,"Learn the process of incrementally developing small (200-500 lines) programs along with the fundamental Computer Science topics. These topics include: problem abstraction and decomposition, the edit-compile-run cycle, using variables of primitive and more complex data types, conditional and loop-based flow control, basic testing and debugging techniques, how to define and call functions (methods), and IO processing techniques. 
Also teaches and reinforces good programming practices including the use of a consistent style, and meaningful documentation. Intended for students who have no prior programming experience.","computer, science, programming, java",Programming 1,COMPSCI_200,Computer Science -5,"The two semester sequence MATH_112-MATH_113 covers similar material as MATH_114, but in a slower pace.","math, mathematics, algebra, trigonometry",Algebra and Trigonometry,MATH_114,Mathematics -4,"The systematic study of the individual in a social context, including social interaction, motivation, attitudes, conformity, communication, leadership, personal relationships, and behavior in small groups.","psychology, science, social, interaction, behavior",Introductory Social Psychology,PSYCH_456,Psychology -2,"Logic components built with transistors, rudimentary Boolean algebra, basic combinational logic design, basic synchronous sequential logic design, basic computer organization and design, introductory machine- and assembly-language programming.","computer, science, engineering, programming",Introduction to Computer Engineering,COMPSCI_252,Computer Science -3,"The third course in our programming fundamentals sequence. It presumes that students understand and use functional and object-oriented design and abstract data types as needed. This course introduces balanced search trees, graphs, graph traversal algorithms, hash tables and sets, and complexity analysis and about classes of problems that require each data type. Students are required to design and implement using high quality professional code, a medium sized program, that demonstrates knowledge and use of latest language features, tools, and conventions. Additional topics introduced will include as needed for projects: inheritance and polymorphism; anonymous inner classes, lambda functions, performance analysis to discover and optimize critical code blocks. Students learn about industry standards for code development. 
Students will design and implement a medium size project with a more advanced user-interface design, such as a web or mobile application with a GUI and event- driven implementation; use of version-control software.","computer, science, programming, java",Programming 3,COMPSCI_400,Computer Science -5,Introduction to differential and integral calculus and plane analytic geometry; applications; transcendental functions.,"math, mathematics, calculus, analytical, geometry, differential, integral",Calculus and Analytical Geometry 1,MATH_221,Mathematics -3,"General biological principles. Topics include: evolution, ecology, animal behavior, cell structure and function, genetics and molecular genetics and the physiology of a variety of organ systems emphasizing function in humans.","biology, science, animal, evolution, genetics, ecology",Animal Biology,BIOLOGY_101,Biology -3,"An introduction to fundamental structures of computer systems and the C programming language with a focus on the low-level interrelationships and impacts on performance. Topics include the virtual address space and virtual memory, the heap and dynamic memory management, the memory hierarchy and caching, assembly language and the stack, communication and interrupts/signals, compiling and assemblers/linkers.","computer, science, engineering, electrical, machine, programming",Machine Organization and Programming,COMPSCI_354,Computer Science -4,"Introduction. Stoichiometry and the mole concept, the behavior of gases, liquids and solids, thermochemistry, electronic structure of atoms and chemical bonding, descriptive chemistry of selected elements and compounds, intermolecular forces. For students taking one year or more of college chemistry; serves as a prereq for CHEM_104; lecture, lab and discussion.",chemistry,General Chemistry I,CHEM_103,Chemistry -3,"This course introduces students to the software development of user interfaces (UIs). 
Topics covered include state-of-the-art (1) UI paradigms, such as event-driven interfaces, direct-manipulation interfaces, and dialogue-based interaction; (2) methods for capturing, interpreting, and responding to different forms of user input and states, including pointing, text entry, speech, touch, gestures, user activity, context, and physiological states; and (3) platform-specific UI development APIs, frameworks, and toolkits for platforms including web/mobile/desktop interfaces, natural user interfaces, and voice user interfaces. Through readings, lectures, and hands-on-activities, students will learn about the fundamental concepts, technologies, and methods in building user interfaces. Assignments will provide an opportunity to gain hands-on experience in the use of state-of-the-art UI development tools and build a UI development portfolio.","computer, science, building, user, interface, interfaces, design, ui",Building User Interfaces,COMPSCI_639,Computer Science -3,"Focuses on the role that psychological principles, research evidence and social science play in the laws of U.S. society, especially in the policies and mechanisms of social control of human behavior. The course will address the ways that society defines membership, and the role of psychology in how it determines who should be excluded or restricted from open society, in order to maintain a more civil society. In addition to learning the factual information about how selected processes work in the legal and social context, students will be asked to consider the role they can play as citizens in supporting or changing these social processes. The course will take a particular interest in psycholegal issues ""in action"" and in learning about the clinical-legal processes used to determine the disposition of individuals considered marginal in society. 
Finally, the course will address the mechanisms that are used to exclude individuals from open society through criminal and civil court processes, the role of psychology as a science, and the role of psychologists as behavioral experts in criminal and civil courts, and in shaping social policies.","psychology, science, law, social, policy, behavior","Psychology, Law, and Social Policy",PSYCH_401,Psychology -3,"Basic concepts of logic, sets, partial order and other relations, and functions. Basic concepts of mathematics (definitions, proofs, sets, functions, and relations) with a focus on discrete structures: integers, bits, strings, trees, and graphs. Propositional logic, Boolean algebra, and predicate logic. Mathematical induction and recursion. Invariants and algorithmic correctness. Recurrences and asymptotic growth analysis. Fundamentals of counting.","computer, science, math, mathematics, discrete, logic, algorithm, algorithms",Introduction To Discrete Mathematics,COMPSCI_240,Computer Science -3,"Graphical and numerical exploration of data; standard errors; distributions for statistical models including binomial, Poisson, normal; estimation; hypothesis testing; randomization tests; basic principles of experimental design; regression; ANOVA; categorical data analysis; goodness of fit; application. (intended for students wishing to take additional statistics courses).","statistics, statistical, math, mathematics, methods",Accelerated Introduction to Statistical Methods,STATS_302,Statistics -- GitLab