More specifically, you will need to manually create *N* threads for `client.py`.
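A rough sketch of the thread-creation pattern, assuming a hypothetical `run_share()` helper and a simple round-robin split of the workload (neither is prescribed by the project):

```python
import threading

def run_share(share):
    # hypothetical helper: process one slice of the workload
    # (e.g., upload its files or issue its RPCs)
    pass

def run_workload(tasks, num_threads):
    # split the workload into num_threads slices and process them in parallel
    shares = [tasks[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=run_share, args=(s,)) for s in shares]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker to finish
```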
**HINT:** Before moving on to the `server`, test your multi-threaded client by running it with a single thread:
```bash
python3 client.py workload 1  # set to use only 1 thread
```
### Server
...
...
## Part 4: Benchmarking the System
Congratulations, you have implemented a minimal parallel data system! Let's write a small script, `benchmark.py`, to benchmark it at different scales (i.e., numbers of worker threads). Overall, the script is expected to perform the following tasks:
1. Run `client.py` multiple times with different threading parameters and record their execution times.
2. Plot the data to visualize the performance trend.
### Driving the Client
`benchmark.py` should collect 4 pairs of data points by running `client.py` with 1, 2, 4, and 8 thread(s). Wrap each `client.py` execution with a pair of timestamps and compute the execution time from their difference. Make sure you always reset the server before sending the `workload` by issuing a `Purge()` command through `client.py`:
```bash
python3 client.py purge
# allow some time for the reset to complete
time.sleep(3)
# test follows...
```
You may also want `benchmark.py` to wait a few seconds for the `server` to be ready to accept client RPC requests.
**HINT 1:** You can get a timestamp with `time.time()`.
**HINT 2:** There are multiple tools for launching a Python program from within another, such as `os.system()` and `subprocess.run()`.
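Putting these hints together, here is a minimal sketch of the timing loop, reusing the `purge` and `workload` invocations shown above (the 3-second pause and the `check=True` error handling are assumptions you can adjust):

```python
import subprocess
import time

measurements = {}
for n in [1, 2, 4, 8]:
    # reset the server before each run
    subprocess.run(["python3", "client.py", "purge"], check=True)
    time.sleep(3)  # give the reset some time to complete

    start = time.time()
    subprocess.run(["python3", "client.py", "workload", str(n)], check=True)
    measurements[n] = time.time() - start
    print(f"{n} thread(s): {measurements[n]:.2f} s")
```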
### Visualizing the Results
Plot a simple line graph of the execution times acquired in the previous step. Save the figure to a file called `plot.png`. Your figure must include at least the 4 data points mentioned above.

**HINT 1:** `matplotlib` is a standard toolkit for visualizing your data.

**HINT 2:** `benchmark.py` needs no more than 50 lines of code. Don't overcomplicate your solution.
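One possible way to produce `plot.png`, assuming the thread counts and the measured times have been collected into two lists (the numbers below are placeholders, not expected results):

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt

threads = [1, 2, 4, 8]
times = [12.3, 7.1, 4.2, 3.0]  # placeholder values; use your measurements

plt.plot(threads, times, marker="o")
plt.xlabel("number of threads")
plt.ylabel("execution time (seconds)")
plt.savefig("plot.png")
```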
## Submission
You have some flexibility in how you organize your project files. However, we need to be able to easily run your code. In order to be graded, please push everything necessary so that we will be able to run your client and server as follows:

```sh
git clone YOUR_REPO
cd YOUR_REPO

# run server
docker build . -t p3
docker run -d -m 512m -p 127.0.0.1:5440:5440 p3

# run clients
python3 upload.py simple.csv
python3 csvsum.py x
python3 parquetsum.py x
```
Your deliverable should also work with the `docker-compose.yaml` we provide:
1. `Dockerfile.client` must launch `benchmark.py` **(NOT `client.py`)**. To achieve this, you need to copy both `client.py` and the driver `benchmark.py` to the image, as well as `workload`, `purge`, and the input CSV files. It is sufficient to submit a minimal working set, as we may test your code with different datasets and workloads.
2. `Dockerfile.server` must launch `server.py`.

Please do include the files built from the `.proto`.

**Requirement:** Do **NOT** submit the `venv` directory (e.g., use `.gitignore`).
## Grading
Copy `autograde.py` to your working directory, then run `python3 -u autograde.py` to test your work. This constitutes 75% of the total score. You can add the `-v` flag to get verbose output from the autograder.

If you want to manually test on a somewhat bigger dataset, run `python3 bigdata.py`. This generates 100 million rows across 400 files and uploads them. The "x" column only contains 1's, so if you sum over it, you should get 100000000.

The other 25% of the total score will be graded by us. Locking and performance-related details are hard to test automatically, so here is a checklist of things we will be looking for:

- are there 8 threads?
- is the lock held when shared data structures are accessed?
- is the lock released when files are read or written?
- does the summation RPC use either parquets or CSVs based on the passed argument?
- when a parquet is read, is the needed column the only one that is read?

After pushing your code to the designated GitLab repository, you can also verify your submission. To do so, simply copy `check_sub.py` to your working directory and run