add lab-p13 and p13

Commit a8f11123 authored 1 year ago by Ashwin Maran
--- a/p13/README.md
+++ b/p13/README.md
+# Project 13 (P13): World University Rankings
+
+
+## Corrections and clarifications:
+
+* None yet.
+
+**Find any issues?** Report to us:
+
+- Iffat Nafisa <nafisa@wisc.edu>
+- Jodi Lawson <jlawson6@wisc.edu>
+
+
+## Instructions:
+
+This project will focus on **SQL** and **plotting**. To start, download [`p13.ipynb`](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-s23-projects/-/tree/main/p13/p13.ipynb), [`p13_test.py`](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-s23-projects/-/tree/main/p13/p13_test.py), and [`p13_expected.html`](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-s23-projects/-/tree/main/p13/p13_expected.html).
+
+**Important Warning:** You must **not** manually download any of the other files. In particular, you are **not** allowed to manually download the file `QSranking.json`. You **must** download this file using Python in your `p13.ipynb` notebook as a part of the project. Otherwise, your code may pass on **your computer**, but **fail** on the testing computer.
+
+You will work on `p13.ipynb` and hand it in. You should follow the provided directions for each question. Questions have **specific** directions on what **to do** and what **not to do**.
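
The warning above requires fetching `QSranking.json` from Python rather than by hand. A minimal sketch of one way to do that with the standard library follows. The `download` helper name and the placeholder URL are assumptions for illustration; the notebook's own directions, which may prescribe a different library or function signature, take precedence.

```python
import os
import urllib.request


def download(url, filename):
    """Fetch `url` and save it as `filename`, skipping the fetch if the file already exists.

    Hypothetical helper: the name and signature are assumptions, not the project's API.
    """
    if os.path.exists(filename):
        return filename + " already exists!"
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(filename, "wb") as f:
        f.write(data)
    return filename + " created!"


# Placeholder only: the real URL is given in the notebook, not in this README.
# download("https://example.com/path/to/QSranking.json", "QSranking.json")
```

Skipping the fetch when the file already exists keeps repeated notebook runs from downloading the data again.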
+
+------------------------------
+
+## IMPORTANT Submission instructions:
+- Review the [Grading Rubric](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-s23-projects/-/tree/main/p13/rubric.md) to ensure that you don't lose points during code review.
+- You must **save your notebook file** before you run the cell containing **export**.
+- Log in to [Gradescope](https://www.gradescope.com/) and upload the zip file into the P13 assignment.
+- If you completed the project with a **partner**, make sure to **add their name** by clicking "Add Group Member" in Gradescope when uploading the P13 zip file.
+
+   <img src="images/add_group_member.png" width="400">
+
+   **Warning:** You will have to add your partner on Gradescope even if you have filled out this information in your `p13.ipynb` notebook.
+
+- It is **your responsibility** to make sure that your project clears auto-grader tests on the Gradescope test system. Otter test results should be available within a few minutes of your submission. You should be able to see both the PASS / FAIL results for the 20 test cases and your total score, both of which are accessible via the Gradescope Dashboard (as in the image below):
+
+    <img src="images/gradescope.png" width="400">
+- **Important:** After you submit, you **need to verify** that your code is visible on Gradescope. If you displayed the output of a large variable anywhere in your notebook, **we will not be able to view your submission**. Make sure you don't have any large outputs in any of your cells, and verify after submission that your code can be viewed.
+- If you feel you have been incorrectly graded on a particular question by the Gradescope autograder, please make a regrade request.
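
Before uploading, it can help to confirm locally that the notebook is small enough to be displayed. The sketch below mirrors the 500 KB limit that `p13_test.py` enforces in `check_file_size`. The function name `notebook_size_ok` is an assumption for illustration and is not part of the project files.

```python
import os

MAX_FILE_SIZE = 500  # KB, same limit as in p13_test.py


def notebook_size_ok(path, max_kb=MAX_FILE_SIZE):
    """Return True if the file at `path` is strictly smaller than `max_kb` KB (1 KB = 1000 bytes here)."""
    return os.path.getsize(path) < max_kb * 10**3
```

Calling `notebook_size_ok("p13.ipynb")` after saving the notebook gives a quick signal that large cell outputs still need to be cleared.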
--- a/p13/images/README.md
+++ b/p13/images/README.md
+# Images
+
+Images from p13 are stored here.
--- a/p13/images/add_group_member.png
+++ b/p13/images/add_group_member.png
--- a/p13/images/gradescope.png
+++ b/p13/images/gradescope.png
--- a/p13/images/q10.png
+++ b/p13/images/q10.png
--- a/p13/images/q16.png
+++ b/p13/images/q16.png
--- a/p13/images/q17.png
+++ b/p13/images/q17.png
--- a/p13/images/q18.png
+++ b/p13/images/q18.png
--- a/p13/images/q20.png
+++ b/p13/images/q20.png
--- a/p13/images/q5.png
+++ b/p13/images/q5.png
--- a/p13/images/q6.png
+++ b/p13/images/q6.png
--- a/p13/images/q8.png
+++ b/p13/images/q8.png
--- a/p13/images/q9.png
+++ b/p13/images/q9.png
--- a/p13/p13.ipynb
+++ b/p13/p13.ipynb
--- a/p13/p13_expected.html
+++ b/p13/p13_expected.html
--- a/p13/p13_test.py
+++ b/p13/p13_test.py
+#!/usr/bin/python
+import os, json, math
+from collections import namedtuple
+import pandas as pd
+import numpy as np
+from bs4 import BeautifulSoup
+
+HTML_TEST_FILE = 'p13_expected.html'
+
+MAX_FILE_SIZE = 500 # units - KB
+REL_TOL = 6e-04  # relative tolerance for floats
+ABS_TOL = 15e-03  # absolute tolerance for floats
+
+PASS = "PASS"
+
+TEXT_FORMAT = "text"  # question type when expected answer is a str, int, float, or bool
+TEXT_FORMAT_NAMEDTUPLE = "text namedtuple"  # question type when expected answer is a namedtuple
+TEXT_FORMAT_UNORDERED_LIST = "text list_unordered"  # question type when the expected answer is a list where the order does *not* matter
+TEXT_FORMAT_ORDERED_LIST = "text list_ordered"  # question type when the expected answer is a list where the order does matter
+TEXT_FORMAT_SPECIAL_ORDERED_LIST = "text list_special_ordered"  # question type when the expected answer is a list where order does matter, but with possible ties. Elements are ordered according to values in special_ordered_json (with ties allowed)
+TEXT_FORMAT_DICT = "text dict"  # question type when the expected answer is a dictionary
+HTML_FORMAT = "html" # question type when the expected answer is a DataFrame
+FILE_JSON_FORMAT = "file json" # question type when the expected answer is a JSON file
+
+def return_expected_json():
+    expected_json =    {"1": (HTML_FORMAT, None),
+                        "2": (HTML_FORMAT, None),
+                        "3": (HTML_FORMAT, None),
+                        "4": (HTML_FORMAT, None),
+                        "5": (HTML_FORMAT, None),
+                        "6": (HTML_FORMAT, None),
+                        "7": (HTML_FORMAT, None),
+                        "8": (HTML_FORMAT, None),
+                        "9": (HTML_FORMAT, None),
+                        "10": (HTML_FORMAT, None),
+                        "11": (TEXT_FORMAT, 0.5213253604130499),
+                        "12": (TEXT_FORMAT, 0.557397228343763),
+                        "13": (HTML_FORMAT, None),
+                        "14": (HTML_FORMAT, None),
+                        "15": (HTML_FORMAT, None),
+                        "16": (HTML_FORMAT, None),
+                        "17": (HTML_FORMAT, None),
+                        "18": (HTML_FORMAT, None),
+                        "19": (TEXT_FORMAT, 56),
+                        "20": (HTML_FORMAT, None)}
+
+    return expected_json
+
+def check_cell(qnum, actual):
+    expected_json = return_expected_json()
+    format, expected = expected_json[qnum[1:]]
+    try:
+        if format == TEXT_FORMAT:
+            return simple_compare(expected, actual)
+        elif format == TEXT_FORMAT_UNORDERED_LIST:
+            return list_compare_unordered(expected, actual)
+        elif format == TEXT_FORMAT_ORDERED_LIST:
+            return list_compare_ordered(expected, actual)
+        elif format == TEXT_FORMAT_DICT:
+            return dict_compare(expected, actual)
+        elif format == TEXT_FORMAT_NAMEDTUPLE:
+            return namedtuple_compare(expected ,actual)
+        elif format == HTML_FORMAT:
+            return check_cell_html(qnum[1:], actual)
+        elif format == FILE_JSON_FORMAT:
+            return check_json(expected, actual)
+        else:
+            if expected != actual:
+                return "expected %s but found %s " % (repr(expected), repr(actual))
+    except:
+        if expected != actual:
+            return "expected %s" % (repr(expected))
+    return PASS
+
+
+
+def simple_compare(expected, actual, complete_msg=True):
+    actual = getattr(actual, "tolist", lambda: actual)()
+    msg = PASS
+    if type(expected) == type:
+        if expected != actual:
+            if type(actual) == type:
+                msg = "expected %s but found %s" % (expected.__name__, actual.__name__)
+            else:
+                msg = "expected %s but found %s" % (expected.__name__, repr(actual))
+    elif type(expected) != type(actual) and not (type(expected) in [float, int] and type(actual) in [float, int]):
+        msg = "expected to find type %s but found type %s" % (type(expected).__name__, type(actual).__name__)
+    elif type(expected) == float:
+        if not math.isclose(actual, expected, rel_tol=REL_TOL, abs_tol=ABS_TOL):
+            msg = "expected %s" % (repr(expected))
+            if complete_msg:
+                msg = msg + " but found %s" % (repr(actual))
+    else:
+        if expected != actual:
+            msg = "expected %s" % (repr(expected))
+            if complete_msg:
+                msg = msg + " but found %s" % (repr(actual))
+    return msg
+
+
+def list_compare_ordered(expected, actual, obj="list"):
+    msg = PASS
+    if type(expected) != type(actual):
+        msg = "expected to find type %s but found type %s" % (type(expected).__name__, type(actual).__name__)
+        return msg
+    for i in range(len(expected)):
+        if i >= len(actual):
+            msg = "expected missing %s in %s" % (repr(expected[i]), obj)
+            break
+        if type(expected[i]) in [int, float, bool, str]:
+            val = simple_compare(expected[i], actual[i])
+        elif type(expected[i]) in [list]:
+            val = list_compare_ordered(expected[i], actual[i], "sub" + obj)
+        elif type(expected[i]) in [dict]:
+            val = dict_compare(expected[i], actual[i])
+        elif type(expected[i]).__name__ in namedtuples:
+            val = namedtuple_compare(expected[i], actual[i])
+        if val != PASS:
+            msg = "at index %d of the %s, " % (i, obj) + val
+            break
+    if len(actual) > len(expected) and msg == PASS:
+        msg = "found unexpected %s in %s" % (repr(actual[len(expected)]), obj)
+    if len(expected) != len(actual):
+        msg = msg + " (found %d entries in %s, but expected %d)" % (len(actual), obj, len(expected))
+
+    if len(expected) > 0 and type(expected[0]) in [int, float, bool, str]:
+        if msg != PASS and list_compare_unordered(expected, actual, obj) == PASS:
+            try:
+                msg = msg + " (%s may not be ordered as required)" % (obj)
+            except:
+                pass
+    return msg
+
+
+def list_compare_helper(larger, smaller):
+    msg = PASS
+    j = 0
+    for i in range(len(larger)):
+        if i == len(smaller):
+            msg = "expected %s" % (repr(larger[i]))
+            break
+        found = False
+        while not found:
+            if j == len(smaller):
+                val = simple_compare(larger[i], smaller[j - 1], False)
+                break
+            val = simple_compare(larger[i], smaller[j], False)
+            j += 1
+            if val == PASS:
+                found = True
+                break
+        if not found:
+            msg = val
+            break
+    return msg
+
+
+def list_compare_unordered(expected, actual, obj="list"):
+    msg = PASS
+    if type(expected) != type(actual):
+        msg = "expected to find type %s but found type %s" % (type(expected).__name__, type(actual).__name__)
+        return msg
+    try:
+        sort_expected = sorted(expected)
+        sort_actual = sorted(actual)
+    except:
+        msg = "unexpected datatype found in %s; expected entries of type %s" % (obj, type(expected[0]).__name__)
+        return msg
+
+    if len(actual) == 0 and len(expected) > 0:
+        msg = "in the %s, missing " % (obj) + repr(expected[0])
+    elif len(actual) > 0 and len(expected) > 0:
+        val = simple_compare(sort_expected[0], sort_actual[0])
+        if val.startswith("expected to find type"):
+            msg = "in the %s, " % (obj) + simple_compare(sort_expected[0], sort_actual[0])
+        else:
+            if len(expected) > len(actual):
+                msg = "in the %s, missing " % (obj) + list_compare_helper(sort_expected, sort_actual)
+            elif len(expected) < len(actual):
+                msg = "in the %s, found un" % (obj) + list_compare_helper(sort_actual, sort_expected)
+            if len(expected) != len(actual):
+                msg = msg + " (found %d entries in %s, but expected %d)" % (len(actual), obj, len(expected))
+                return msg
+            else:
+                val = list_compare_helper(sort_expected, sort_actual)
+                if val != PASS:
+                    msg = "in the %s, missing " % (obj) + val + ", but found un" + list_compare_helper(sort_actual,
+                                                                                               sort_expected)
+    return msg
+
+def list_compare_special_init(expected, special_order):
+    real_expected = []
+    for i in range(len(expected)):
+        if real_expected == [] or special_order[i-1] != special_order[i]:
+            real_expected.append([])
+        real_expected[-1].append(expected[i])
+    return real_expected
+
+
+def list_compare_special(expected, actual, special_order):
+    expected = list_compare_special_init(expected, special_order)
+    msg = PASS
+    expected_list = []
+    for expected_item in expected:
+        expected_list.extend(expected_item)
+    val = list_compare_unordered(expected_list, actual)
+    if val != PASS:
+        msg = val
+    else:
+        i = 0
+        for expected_item in expected:
+            j = len(expected_item)
+            actual_item = actual[i: i + j]
+            val = list_compare_unordered(expected_item, actual_item)
+            if val != PASS:
+                if j == 1:
+                    msg = "at index %d " % (i) + val
+                else:
+                    msg = "between indices %d and %d " % (i, i + j - 1) + val
+                msg = msg + " (list may not be ordered as required)"
+                break
+            i += j
+
+    return msg
+
+
+def dict_compare(expected, actual, obj="dict"):
+    msg = PASS
+    if type(expected) != type(actual):
+        msg = "expected to find type %s but found type %s" % (type(expected).__name__, type(actual).__name__)
+        return msg
+    try:
+        expected_keys = sorted(list(expected.keys()))
+        actual_keys = sorted(list(actual.keys()))
+    except:
+        msg = "unexpected datatype found in keys of dict; expect a dict with keys of type %s" % (
+            type(list(expected.keys())[0]).__name__)
+        return msg
+    val = list_compare_unordered(expected_keys, actual_keys, "dict")
+    if val != PASS:
+        msg = "bad keys in %s: " % (obj) + val
+    if msg == PASS:
+        for key in expected:
+            if expected[key] == None or type(expected[key]) in [int, float, bool, str]:
+                val = simple_compare(expected[key], actual[key])
+            elif type(expected[key]) in [list]:
+                val = list_compare_ordered(expected[key], actual[key], "value")
+            elif type(expected[key]) in [dict]:
+                val = dict_compare(expected[key], actual[key], "sub" + obj)
+            elif type(expected[key]).__name__ in namedtuples:
+                val = namedtuple_compare(expected[key], actual[key])
+            if val != PASS:
+                msg = "incorrect val for key %s in %s: " % (repr(key), obj) + val
+    return msg
+
+def parse_df_html_table(html, question=None):
+    soup = BeautifulSoup(html, 'html.parser')
+
+    if question is None:
+        tables = soup.find_all('table')
+        assert(len(tables) == 1)
+        table = tables[0]
+    else:
+        table = soup.find('table', {"data-question": str(question)})
+
+    rows = []
+    for tr in table.find_all('tr'):
+        rows.append([])
+        for cell in tr.find_all(['td', 'th']):
+            rows[-1].append(cell.get_text())
+
+    cells = {}
+    for r in range(1, len(rows)):
+        for c in range(1, len(rows[0])):
+            rname = rows[r][0]
+            cname = rows[0][c]
+            cells[(rname,cname)] = rows[r][c]
+    return cells
+
+def check_cell_html(qnum, actual):
+    try:
+        actual_cells = parse_df_html_table(actual)
+    except Exception as e:
+        return "expected to find type DataFrame but found type %s instead" % type(actual).__name__
+    try:
+        with open(HTML_TEST_FILE, encoding='utf-8') as f:
+            expected_cells = parse_df_html_table(f.read(), qnum)
+    except Exception as e:
+        return "ERROR! Could not find table in %s. Please make sure you have downloaded %s correctly." % (HTML_TEST_FILE, HTML_TEST_FILE)
+
+    for location, expected in expected_cells.items():
+        location_name = "column {} at index {}".format(location[1], location[0])
+        actual = actual_cells.get(location, None)
+        if actual is None:
+            return "in location %s, expected to find %s" % (location_name, repr(expected))
+        try:
+            actual_ans = float(actual)
+            expected_ans = float(expected)
+            if math.isnan(actual_ans) and math.isnan(expected_ans):
+                continue
+        except Exception as e:
+            actual_ans, expected_ans = actual, expected
+        msg = simple_compare(expected_ans, actual_ans)
+        if msg != PASS:
+            return "in location %s, " % location_name + msg
+    expected_cols = list(set(["column %s" %loc[1] for loc in expected_cells]))
+    actual_cols = list(set(["column %s" %loc[1] for loc in actual_cells]))
+    msg = list_compare_unordered(expected_cols, actual_cols, "DataFrame")
+    if msg != PASS:
+        return msg
+    expected_rows = list(set(["row at index %s" %loc[0] for loc in expected_cells]))
+    actual_rows = list(set(["row at index %s" %loc[0] for loc in actual_cells]))
+    msg = list_compare_unordered(expected_rows, actual_rows, "DataFrame")
+    if msg != PASS:
+        return msg
+    return PASS
+
+
+def check_json(expected, actual):
+    msg = PASS
+    if expected not in os.listdir("."):
+        return "file %s not found" % expected
+    elif actual not in os.listdir("."):
+        return "file %s not found" % actual
+    try:
+        with open(expected, encoding='utf-8') as e:
+            expected_data = json.load(e)
+    except json.JSONDecodeError:
+        return "file %s is broken and cannot be parsed; please redownload the file" % expected
+    try:
+        with open(actual, encoding='utf-8') as a:
+            actual_data = json.load(a)
+    except json.JSONDecodeError:
+        return "file %s is broken and cannot be parsed" % actual
+    if type(expected_data) == list:
+        msg = list_compare_ordered(expected_data, actual_data, 'file ' + actual)
+    elif type(expected_data) == dict:
+        msg = dict_compare(expected_data, actual_data)
+    return msg
+
+def check(qnum, actual):
+    msg = check_cell(qnum, actual)
+    if msg == PASS:
+        return True
+    print("<b style='color: red;'>ERROR:</b> " + msg)
+
+
+def check_file_size(path):
+    size = os.path.getsize(path)
+    assert size < MAX_FILE_SIZE * 10**3, "Your file is too big to be processed by Gradescope; please delete unnecessary output cells so your file size is < %s KB" % MAX_FILE_SIZE
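
To make the float tolerances concrete, the sketch below reproduces the `math.isclose` call from `simple_compare` using the `REL_TOL` and `ABS_TOL` values above and the expected answer for question 11. The helper name `within_tolerance` is an assumption for illustration. Integer answers such as question 19's `56` skip this branch and are compared with `!=` rather than a tolerance.

```python
import math

REL_TOL = 6e-04   # relative tolerance, as in p13_test.py
ABS_TOL = 15e-03  # absolute tolerance, as in p13_test.py


def within_tolerance(expected, actual):
    """Same closeness test that simple_compare applies to float answers."""
    return math.isclose(actual, expected, rel_tol=REL_TOL, abs_tol=ABS_TOL)


expected_q11 = 0.5213253604130499
print(within_tolerance(expected_q11, 0.5213))  # True
print(within_tolerance(expected_q11, 0.53))    # True: off by about 0.0087, under ABS_TOL
print(within_tolerance(expected_q11, 0.54))    # False: off by about 0.0187, over ABS_TOL
```

For answers near 0.5, `REL_TOL` contributes only about 0.0003, so the absolute tolerance of 0.015 is the bound that decides the result.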
--- a/p13/rubric.md
+++ b/p13/rubric.md
+# Project 13 (P13) grading rubric
+
+## Code reviews
+
+- A TA / grader will be reviewing your code after the deadline.
+- They will make deductions based on the Rubric provided below.
+- To ensure that you don’t lose any points in code review, you must review the rubric and make sure that you have followed the instructions provided in the project correctly.
+
+## Rubric
+
+### General guidelines:
+
+- `import` statements are not placed in the required cell at the top of the notebook, or additional `import` statements beyond those stated in the directions are used (-3)
+- Required outputs are not visible, or the notebook file was not saved before running the cell containing "export" (-3)
+
+### Question specific guidelines:
+
+- `bar_plot` (2)
+	- Function does not create correct bar plot (-2)
+
+- `scatter_plot` (2)
+	- Function does not create correct scatter plot (-2)
+
+- `horizontal_bar_plot` (1)
+	- Function does not create correct horizontal bar plot (-1)
+
+- `pie_plot` (1)
+	- Function does not create correct pie plot (-1)
+
+- `get_regression_line` (1)
+	- Function logic is incorrect (-1)
+
+- `regression_line_plot` (1)
+	- Function logic is incorrect (-1)
+
+- `regression_line_plot` (2)
+	- Required function is not used (-1)
+	- Function does not create the correct scatter plot or the correct line plot using `df["fit"]` (-1)
+
+- `conn` (3)
+	- Data structure is defined more than once (-1)
+	- Did not close the connection to `conn` at the end (-2)
+
+- Q1 (3)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-1)
+
+- Q2 (4)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+
+- Q3 (4)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+
+- Q4 (4)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+
+- Q5 (6)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+	- Required function is not used (-1)
+	- Plot is not properly labeled (-1)
+
+- Q6 (6)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+	- Required function is not used (-1)
+	- Plot is not properly labeled (-1)
+
+- Q7 (4)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+
+- Q8 (6)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+	- Required function is not used (-1)
+	- Plot is not properly labeled (-1)
+
+- Q9 (6)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+	- Required function is not used (-1)
+	- Plot is not properly labeled (-1)
+
+- Q10 (6)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+	- Required function is not used (-1)
+	- Plot is not properly labeled (-1)
+
+- Q11 (3)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-1)
+
+- Q12 (3)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-1)
+
+- Q13 (4)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+
+- Q14 (4)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+
+- Q15 (4)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+
+- Q16 (6)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+	- Required function is not used (-1)
+	- Plot is not properly labeled (-1)
+
+- Q17 (6)
+	- Did not use SQL to answer (-2)
+	- Incorrect logic is used to answer (-2)
+	- Required function is not used (-1)
+	- Plot is not properly labeled (-1)
+
+- Q18 (4)
+	- Incorrect logic is used to answer (-2)
+	- Required function is not used (-1)
+	- Plot is not properly labeled (-1)
+
+- Q19 (2)
+	- Incorrect logic is used to answer (-1)
+	- Required function is not used (-1)
+
+- Q20 (2)
+	- Required function is not used (-1)
+	- Plot is not properly labeled (-1)
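
Several rubric items above hinge on answering with SQL through a single `conn` that is closed at the end. The sketch below illustrates that pattern with the standard-library `sqlite3` module on an in-memory database. The table name `rankings`, its columns, and the rows are made up for illustration and are not the project's data.

```python
import sqlite3

# Define the connection once; an in-memory database stands in for the project's database file.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE rankings (institution TEXT, year INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO rankings VALUES (?, ?, ?)",
    [("A", 2019, 90.0), ("B", 2019, 80.0), ("A", 2020, 95.0)],
)

# Let SQL do the filtering, aggregation, and ordering instead of Python loops.
query = """
SELECT year, AVG(score) AS avg_score
FROM rankings
GROUP BY year
ORDER BY year
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2019, 85.0), (2020, 95.0)]

# Close the connection once, at the very end.
conn.close()
```

Keeping the aggregation inside the query is what the "Did not use SQL to answer" deduction targets: the Python side only receives the finished result.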