@@ -22,7 +22,7 @@ Before starting, please review the [general project directions](../projects.md).
## Part 1: Communication (gRPC)
In this project, the client program `client.py` will communicate with a server, `server.py`, via gRPC. We provide starter code for the client program. Your job is to write a `.proto` file to generate a gRPC stub (used by our client) and a servicer class that you will inherit from in `server.py`.
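For orientation, here is a minimal sketch of what such a `.proto` could look like, assuming the three RPCs described in Part 2 (`Upload`, `ColSum`, `Purge`). Every service, message, and field name below is a placeholder; the real names and fields must match whatever the provided `client.py` expects:

```protobuf
syntax = "proto3";

// Placeholder names: check client.py for the service/message/field
// names the starter code actually uses.
service Table {
    rpc Upload (UploadReq) returns (UploadResp);
    rpc ColSum (ColSumReq) returns (ColSumResp);
    rpc Purge (PurgeReq) returns (PurgeResp);
}

message UploadReq {
    bytes csv_data = 1;   // raw bytes of the uploaded CSV table
}

message UploadResp {
    string error = 1;
}

message ColSumReq {
    string column = 1;    // column name to sum
    string format = 2;    // "csv" or "parquet"
}

message ColSumResp {
    int64 total = 1;
    string error = 2;
}

message PurgeReq {}

message PurgeResp {
    string error = 1;
}
```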
Take a moment to look at the client code and answer the following questions:
...
...
@@ -38,7 +38,7 @@ Now build the .proto on your VM. Install the tools like this:
Then use `grpc_tools.protoc` to build your `.proto` file.
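For example, if the file were named `table.proto` (the name is your choice), the build step could look like this, producing `table_pb2.py` (messages) and `table_pb2_grpc.py` (stub and servicer):

```sh
python3 -m grpc_tools.protoc -I=. --python_out=. --grpc_python_out=. table.proto
```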
...
...
@@ -53,11 +53,9 @@ python3 client.py workload
# should see multiple "TODO"s
```
In P3, `client.py` takes in a batch of operation commands stored in a file named `workload` and executes them line by line. Inspect both the `workload` file content and the client code (i.e., `read_workload_file()`) to understand how each text command leads to one gRPC call. A separate `purge` workload file is provided and *should not be modified*. The client can use an RPC call, `Purge()`, to reset the server and remove all files stored by the remote peer.
Create a `Dockerfile.server` to build an image that will also let you run your server in a container. It should be possible to build and run your server like this:
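The exact commands depend on how you name and run things; one plausible shape, assuming the image is tagged `p3-server` to match the container listing shown below:

```sh
docker build -f Dockerfile.server -t p3-server .
# you may also need a port mapping or shared network so client.py can reach it
docker run -d --name server p3-server
```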
@@ -70,7 +68,7 @@ Like P2, the compose file assumes a "PROJECT" environment variable. You can set
export PROJECT=p3
```
The client program should then be able to communicate with the server program the same way it did outside of a container. Once your client program successfully interacts with the dockerized server, you should similarly draft a `Dockerfile.client` to build a container for `client.py`. Finally, test your setup with `docker compose`:
4c899de6e43f p3-server "python3 -u /server.…" 2 seconds ago ...
```
**HINT:** consider writing a .sh script that helps you redeploy code changes. Every time you modify the source code (`client.py`, `server.py`, or `benchmark.py`), you may want to rebuild the images, bring down the previous Docker cluster, and re-instantiate a new one.
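For instance, a redeploy script along these lines (assuming the default compose file name and that `PROJECT` is already exported):

```sh
#!/bin/bash
# Rebuild images, tear down the old cluster, and start a fresh one.
docker compose down
docker compose build
docker compose up -d
```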
## Part 2: Server Implementation
...
...
@@ -90,8 +88,8 @@ You will need to implement three RPC calls on the server side:
### Upload
This method should:
1. Recover the uploaded CSV table from the *binary* bytes carried by the RPC request message.
2. Write the table to a CSV file, and write the same table to another file in Parquet format.
**HINT 1:** You are free to decide the names and locations of the stored files. However, the server must keep these records to process future queries (for instance, you can add paths to a data structure like a list or dictionary).
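To make the hint concrete, here is a minimal sketch of an `Upload` handler, assuming `pandas` (with `pyarrow`) is available and that the request carries the table in a bytes field named `csv_data`; the generated module, class, and field names are placeholders taken from the `.proto` sketch in Part 1:

```python
import io
import uuid

import pandas as pd

# Placeholder generated modules; your names depend on your .proto file.
import table_pb2
import table_pb2_grpc

class TableServicer(table_pb2_grpc.TableServicer):
    def __init__(self):
        self.uploads = []  # (csv_path, parquet_path) pairs for later queries

    def Upload(self, request, context):
        # 1. Recover the table from the raw CSV bytes in the request.
        df = pd.read_csv(io.BytesIO(request.csv_data))

        # 2. Write the same table in both formats under generated names.
        name = uuid.uuid4().hex
        csv_path = f"{name}.csv"
        parquet_path = f"{name}.parquet"
        df.to_csv(csv_path, index=False)
        df.to_parquet(parquet_path)

        # Remember where the files live so ColSum can find them later.
        self.uploads.append((csv_path, parquet_path))
        return table_pb2.UploadResp(error="")
```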
...
...
@@ -99,13 +97,9 @@ This method should:
### ColSum
Whenever your server receives a column summation request, it should loop over all the data files that have been uploaded, compute a local sum for each such file, and finally return a total sum for the whole table.
For example, assume the server has received two uploaded files, sample1.csv and sample2.csv, which contain the following records respectively:
```
x,y,z
...
...
@@ -133,13 +127,13 @@ s c w # should print 0
You can assume columns contain only integers. The table does not have a fixed schema (i.e., a given column is not guaranteed to appear in every uploaded file). You should skip a file if it lacks the target column (e.g., z and w in the above example).
The server should sum over either Parquet or CSV files according to the input `format` (not both). For a given column, querying with `format="parquet"` should produce the same result as `format="csv"`, though performance may differ.
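Continuing the servicer sketch from the Upload section (with placeholder `column` and `format` request fields), the loop could look like this:

```python
    def ColSum(self, request, context):
        total = 0
        for csv_path, parquet_path in self.uploads:
            # Read the file matching the requested format (not both).
            if request.format == "parquet":
                df = pd.read_parquet(parquet_path)
            else:
                df = pd.read_csv(csv_path)

            # Skip any file that lacks the target column.
            if request.column not in df.columns:
                continue
            total += int(df[request.column].sum())

        return table_pb2.ColSumResp(total=total, error="")
```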
### Purge
This method facilitates testing and subsequent benchmarking. The method should:
1. Remove all local files previously uploaded via `Upload()`
2. Reset all associated server state (e.g., names, paths, etc.)
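Continuing the same sketch (and assuming `os` is imported at the top of the file), a `Purge` handler might look like this:

```python
    def Purge(self, request, context):
        # Delete every file recorded by Upload, then forget about it.
        for csv_path, parquet_path in self.uploads:
            os.remove(csv_path)
            os.remove(parquet_path)
        self.uploads = []  # reset the associated server state
        return table_pb2.PurgeResp(error="")
```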
## Part 3: Multi-threading Server/Client
...
...
@@ -157,7 +151,7 @@ python3 client.py workload 1 # set to use only 1 thread
### Server
Now that `client.py` sends concurrent requests, you must correspondingly protect your server from data races with `threading.Lock()`. Make sure only one thread at a time can modify the server state (e.g., lists of names or paths). Note that you don't need to explicitly create threads for `server.py`, as gRPC can do that for you. The following example code creates a thread pool with 8 threads:
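A sketch of that pattern, using the placeholder generated names from the earlier sketches:

```python
from concurrent import futures

import grpc

def serve():
    # gRPC dispatches each incoming RPC onto one of these 8 worker threads,
    # so the servicer methods above can run concurrently.
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=8))
    table_pb2_grpc.add_TableServicer_to_server(TableServicer(), server)
    server.add_insecure_port("[::]:5440")  # port number is a placeholder
    server.start()
    server.wait_for_termination()
```

Shared state such as the `self.uploads` list from the earlier sketches would then be created alongside a `threading.Lock()` in `__init__`, with every read-modify-write wrapped in `with self.lock:` as described above.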