From 82ce5a9f032730a3ad2ef5ac174fcd0e1d63428a Mon Sep 17 00:00:00 2001
From: Jing Lan <jlan25@cs544-jlan25.cs.wisc.edu>
Date: Thu, 20 Feb 2025 23:42:10 -0600
Subject: [PATCH] p3 draft update

---
 p3/README.md | 71 +++++++++++++++++++++-------------------------------
 1 file changed, 28 insertions(+), 43 deletions(-)

diff --git a/p3/README.md b/p3/README.md
index 719dc3b..0474c45 100644
--- a/p3/README.md
+++ b/p3/README.md
@@ -152,7 +152,7 @@ More specifically, you will need to manually create *N* threads for `client.py`
 **HINT:** Before moving to work on the `server`, test your multi-threading client by running it with a single thread:

 ```bash
-python3 client.py workload 1
+python3 client.py workload 1 # set to use only 1 thread
 ```

 ### Server
@@ -172,60 +172,45 @@ grpc.server(

 ## Part 4: Benchmarking the System

+Congratulations, you have implemented a minimal parallel data system! Let's write a small script to benchmark it at different scales (i.e., with different numbers of worker threads). Overall, the script is expected to perform the following tasks:
+1. Run `client.py` multiple times with different threading parameters and record each run's execution time.
+2. Plot the data to visualize the performance trend.

-## Grading
+### Driving the Client

-<!-- Details about the autograder are coming soon. -->
+Each run of `benchmark.py` should collect 4 data pairs (thread count, execution time) by running `client.py` with 1, 2, 4, and 8 thread(s). Wrap each `client.py` execution in a pair of timestamps, then calculate the execution time from their difference. Make sure you always reset the server before sending the `workload` by issuing a `Purge()` command through `client.py`:

-Copy `autograde.py` to your working directory
-then run `python3 -u autograde.py` to test your work.
-This constitutes 75% of the total score. You can add `-v` flag to get a verbose output from the autograder.
+```bash
+python3 client.py purge
+# allow some time for the reset to complete
+sleep 3
+# test follows...
+```

-If you want to manually test on a somewhat bigger dataset, run
-`python3 bigdata.py`. This generates 100 millions rows across 400
-files and uploads them. The "x" column only contains 1's, so you if
-sum over it, you should get 100000000.
+You may also want `benchmark.py` to wait a few seconds at startup so the `server` is ready to accept client RPC requests.

-The other 25% of the total score will be graded by us.
-Locking and performance-related details are hard to automatically
-test, so here's a checklist of things we'll be looking for:
+**HINT 1:** You can get a timestamp with `time.time()`.

-- are there 8 threads?
-- is the lock held when shared data structures accessed?
-- is the lock released when files are read or written?
-- does the summation RPC use either parquets or CSVs based on the passed argument?
-- when a parquet is read, is the needed column the only one that is read?
+**HINT 2:** There are multiple ways to launch a Python program from within another; examples are `os.system()` and `subprocess.run()`. A minimal sketch that combines both steps appears in the next subsection.

-## Submission
+### Visualizing the Results

-You have some flexibility in how your organize your project
-files. However, we need to be able to easily run your code. In order
-to be graded, please ensure to push anything necessary so that we'll
-be able to run your client and server as follows:
+Plot a simple line graph of the execution times collected in the previous step. Save the figure to a file called `plot.png`. Your figure must include at least 4 data points as mentioned above.
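+
+Below is a minimal sketch of how `benchmark.py` could combine the two steps (timing the client runs, then plotting). It assumes `client.py` supports the `purge` and `workload N` invocations described above and that `matplotlib` is installed; helper names such as `run_client()` are illustrative, not required.
+
+```python
+import subprocess
+import time
+
+import matplotlib
+matplotlib.use("Agg")  # render to a file; no display needed
+import matplotlib.pyplot as plt
+
+THREADS = [1, 2, 4, 8]
+
+def run_client(n):
+    # reset server state, then give it a moment to settle
+    subprocess.run(["python3", "client.py", "purge"], check=True)
+    time.sleep(3)
+    # time one full workload run with n threads
+    start = time.time()
+    subprocess.run(["python3", "client.py", "workload", str(n)], check=True)
+    return time.time() - start
+
+if __name__ == "__main__":
+    time.sleep(3)  # wait for the server to start accepting RPCs
+    times = [run_client(n) for n in THREADS]
+
+    plt.plot(THREADS, times, marker="o")
+    plt.xlabel("number of threads")
+    plt.ylabel("execution time (seconds)")
+    plt.savefig("plot.png")
+```
+
+Note that timing the whole `client.py` process includes interpreter startup overhead, which is fine here since we only care about the relative trend across thread counts.
+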
-```sh
-git clone YOUR_REPO
-cd YOUR_REPO
+**HINT 1:** `matplotlib` is a standard toolkit for visualizing your data.

-# copy in tester code and client programs...
-python3 -m venv venv
-source venv/bin/activate
-pip3 install grpcio==1.66.1 grpcio-tools==1.66.1 numpy==2.1.1 protobuf==5.27.2 pyarrow==17.0.0 setuptools==75.1.0
+**HINT 2:** `benchmark.py` needs no more than 50 lines of code. Don't overcomplicate your solution.

-# run server
-docker build . -t p3
-docker run -d -m 512m -p 127.0.0.1:5440:5440 p3
+## Submission

-# run clients
-python3 upload.py simple.csv
-python3 csvsum.py x
-python3 parquetsum.py x
-```
+Your deliverable should work with the `docker-compose.yaml` we provide:
+
+1. `Dockerfile.client` must launch `benchmark.py` **(NOT `client.py`)**. To achieve this, you need to copy both `client.py` and the driver `benchmark.py` into the image, as well as `workload`, `purge`, and the input CSV files. It is sufficient to submit a minimal working set, as we may test your code with different datasets and workloads.
+2. `Dockerfile.server` must launch `server.py`.

-Please do include the files built from the .proto. Do NOT include the venv directory.
+**Requirement:** Do **NOT** submit the `venv` directory (e.g., use `.gitignore`).
+
+## Grading

-After pushing your code to the designated GitLab repository,
-you can also verify your submission.
-To do so, simply copy `check_sub.py` to your working directory and run
-the command `python3 check_sub.py`
+TBD
\ No newline at end of file
--
GitLab