In this project, you'll create a multi-container application for
looking up the addresses of houses in Madison with a given zipcode.
One set of containers will host the data and provide access via gRPC.
Their functionality is identical so that your application can continue
to operate even if one container fails. Another set of containers
will provide an HTTP interface to the data. This second set won't
actually store the original data, but will communicate with the first
set of containers to get the data necessary to answer queries. The
...
...
Before starting, please review the [general project directions](../projects.md).
* none yet
## Introduction
You'll need to write code and Dockerfiles to start 5 containers like this:
...
...
Take a look at the provided Docker compose file (you may not modify
it). Note that there are two services, "cache" with 3 replicas and
"dataset" with 2 replicas.
"dataset" with 2 replicas. The cache replicas will forward random
ports on the VM (probably not 8000-8002) to port 8080 inside the
containers.
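If you want to poke at the cache servers manually, you'll need to discover which host ports Docker actually chose. Here is a minimal sketch (assuming the compose prefix is "p2", so the containers are named p2-cache-1 through p2-cache-3):

```python
# sketch: print the host port mapped to 8080 for each cache replica
# (assumes the compose prefix is "p2"; adjust if you exported something else)
import subprocess

for i in range(1, 4):
    name = f"p2-cache-{i}"
    # "docker port CONTAINER 8080" prints something like "0.0.0.0:49153"
    mapping = subprocess.check_output(["docker", "port", name, "8080"], text=True)
    print(name, "->", mapping.strip())
```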
You should have Dockerfiles named "Dockerfile.cache" and "Dockerfile.dataset" that we can build like this to produce the Docker images for these two services:
...
...
`export PROJECT=p2`
Whatever this is set to will be a prefix for the container names. For
example, if it is "abc", your first cache container will be named
"abc-cache-1". The autograder will may use a prefix other than p2.
"abc-cache-1". The autograder may use a prefix other than p2 and
will modify the build commands accordingly.
Web requests to the caching layer specify a zipcode, and the number of
addresses that should be returned (the "limit"). To find the answer,
...
...
alternate between the two dataset containers to balance the load. If
one dataset server is down, whether temporarily or in the long run, the cache
server should attempt to use the other dataset server to obtain the result.
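A minimal sketch of that failover idea follows; `DatasetStub` and the `Lookup` RPC are placeholder names, so your actual .proto may define something different:

```python
# sketch: prefer one dataset server, fall back to the other if it is unreachable
# (DatasetStub/Lookup are placeholder names, not part of the official spec)
import grpc

def lookup_with_failover(stubs, preferred, request):
    """stubs: list of two gRPC stubs; preferred: 0 or 1 (which server to try first)."""
    last_err = None
    for idx in (preferred, 1 - preferred):
        try:
            return idx, stubs[idx].Lookup(request)   # hypothetical RPC name
        except grpc.RpcError as err:
            last_err = err                           # that server is down; try the other
    raise last_err                                   # both servers failed
```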
**Hint 1:** you should test your gRPC server before working on the HTTP/caching server. Testing the gRPC server independently will probably involve writing some simple client programs beyond what we ask you to do.
**Hint 2:** think about whether there is any .sh script that will help you quickly test code changes. For example, you may want it to rebuild your Dockerfiles, cleanup an old Compose cluster, and deploy a new cluster.
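For Hint 1, a throwaway client can be as small as the sketch below (the `dataset_pb2*` modules, `DatasetStub`, `Lookup`, and the message fields are placeholders for whatever your .proto actually generates):

```python
# quick manual test client for the gRPC dataset server (all names are placeholders)
import grpc
import dataset_pb2, dataset_pb2_grpc   # generated from your .proto

channel = grpc.insecure_channel("localhost:5000")   # wherever your server is listening
stub = dataset_pb2_grpc.DatasetStub(channel)
resp = stub.Lookup(dataset_pb2.LookupRequest(zipcode="53706", limit=4))
print(resp)
```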
## Part 1: gRPC Server (Dataset Layer)
...
...
The server should read Madison addresses from "addresses.csv.gz" (downloaded from https://data-cityofmadison.opendata.arcgis.com/datasets/a72d02a4fda34327ae68dd0c2fd07455_20/explore) prior to the first request so it is ready to return addresses. Given a zipcode, it should return "limit" number of addresses (return the first ones according to an alphanumeric sort).
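One possible shape for that lookup logic is sketched below; the CSV column names are guesses, so check the real header of addresses.csv.gz before relying on them:

```python
# sketch: load the dataset once at startup, then answer lookups from memory
# (column names "zipcode" and "address" are assumptions -- inspect the real CSV)
import csv, gzip

with gzip.open("addresses.csv.gz", "rt") as f:
    rows = list(csv.DictReader(f))

def lookup(zipcode, limit):
    matches = [r["address"] for r in rows if r["zipcode"] == zipcode]
    matches.sort()          # alphanumeric sort, then keep the first `limit` entries
    return matches[:limit]
```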
Create a Dockerfile.dataset that lets you build a Docker image with your code
and any necessary resources. Note that we won't install any Python
packages (such as the gRPC tools) on our test VM, so it is important
that compiling your .proto file is one of the steps that happens
...
...
dataset at build time.
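As noted above, the .proto compilation has to happen while the image is being built, for example with a `RUN python3 -m grpc_tools.protoc ...` step, or equivalently via a small helper script like the sketch below (the file name `dataset.proto` is an assumption):

```python
# compile_proto.py -- sketch of a build-time step that generates the *_pb2 modules
# inside the image ("dataset.proto" is an assumed name)
from grpc_tools import protoc   # provided by the grpcio-tools package

ret = protoc.main([
    "grpc_tools.protoc",     # argv[0] placeholder expected by protoc.main
    "-I.",
    "--python_out=.",        # writes dataset_pb2.py
    "--grpc_python_out=.",   # writes dataset_pb2_grpc.py
    "dataset.proto",
])
if ret != 0:
    raise SystemExit("proto compilation failed")
```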
## Part 2: HTTP Server (Cache Layer)
Create an HTTP server in a "cache.py" file. You can do this with the
help of the Flask framework: https://flask.palletsprojects.com/en/stable/.
Here is some starter code you can use:
```python
...
...
to get real addresses to return. Note that the Docker compose
file passes in a "PROJECT" environment variable that you can access
via `os.environ`. When you deploy server.py in a Docker container
with the help of compose, the two dataset servers will be reachable at
"<PROJECT>-dataset-1:5000" and "<PROJECT>-dataset-2:5000", so you can
"\<PROJECT\>-dataset-1:5000" and "\<PROJECT\>-dataset-2:5000", so you can
create the gRPC channels/stubs accordingly in cache.py.
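For example, cache.py might set up its channels and stubs like this (a sketch; `DatasetStub` stands in for whatever stub class your .proto generates):

```python
# sketch: one channel/stub per dataset server, with hostnames built from the
# PROJECT prefix that compose passes in (DatasetStub is a placeholder name)
import os
import grpc
import dataset_pb2_grpc

project = os.environ["PROJECT"]
addrs = [f"{project}-dataset-1:5000", f"{project}-dataset-2:5000"]
channels = [grpc.insecure_channel(a) for a in addrs]
stubs = [dataset_pb2_grpc.DatasetStub(ch) for ch in channels]
```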
Your cache.py program should alternate between sending requests to
dataset server 1 or 2 in order to balance load (the first request
should go to server 1). In the "source" field of the returned JSON
value, return "1" or "2" to indicate to a client where cache.py
obtained the answer.
value, return "1" or "2" to indicate to a client which dataset server
cache.py relied on to obtain the answer.
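A minimal sketch of that alternation, reusing the `stubs` list from the previous sketch (the `Lookup` RPC, its message fields, and the "addrs" key are assumptions):

```python
# sketch: round-robin between the two dataset servers and report the "source"
import dataset_pb2   # placeholder generated module

request_count = 0    # even -> dataset server 1, odd -> dataset server 2

def query_dataset(zipcode, limit):
    global request_count
    idx = request_count % 2        # 0 on the very first request, so server 1 goes first
    request_count += 1
    resp = stubs[idx].Lookup(dataset_pb2.LookupRequest(zipcode=zipcode, limit=limit))
    return {"addrs": list(resp.addrs), "source": str(idx + 1)}
```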
## Part 3: Retry
...
...
Specifications:
* implement an LRU cache of size 3 (a sketch of one possible structure appears after this list)
* a cache entry should consist of a zipcode and 8 corresponding addresses
* if an HTTP request specifies a limit <8 and there IS a corresponding cache entry, just slice the cache entry to get the desired number of addresses
* if an HTTP request specifies a limit <8 and there IS NOT a corresponding cache entry, request 8 addresses from the dataset server anyway so we can create a cache entry useful for subsequent requests
* if an HTTP request specifies a limit >8, we will not be able to use the cache to respond to the request, but you should still add the first 8 addresses to the cache (if not already present)
* caching should allow the HTTP servers to continue to function in a limited capacity even if all the dataset servers are down
* the "source" entry should be "cache" (no gRPC call necessary), or "1" or "2" (got the data from a dataset server)
...
...
You have some flexibility about how you write your code, but we must be able to run it like this: