From 82ce5a9f032730a3ad2ef5ac174fcd0e1d63428a Mon Sep 17 00:00:00 2001
From: Jing Lan <jlan25@cs544-jlan25.cs.wisc.edu>
Date: Thu, 20 Feb 2025 23:42:10 -0600
Subject: [PATCH] p3 draft update

---
 p3/README.md | 71 +++++++++++++++++++++-------------------------------
 1 file changed, 28 insertions(+), 43 deletions(-)

diff --git a/p3/README.md b/p3/README.md
index 719dc3b..0474c45 100644
--- a/p3/README.md
+++ b/p3/README.md
@@ -152,7 +152,7 @@ More specifically, you will need to manually create *N* threads for `client.py`
 **HINT:** Before moving to work on the `server`, test your multi-threading client by running it with a single thread:

 ```bash
-python3 client.py workload 1
+python3 client.py workload 1 # set to use only 1 thread
 ```

 ### Server
@@ -172,60 +172,45 @@ grpc.server(

 ## Part 4: Benchmarking the System

+Congratulations, you have implemented a minimal parallel data system! Let's write a small script to benchmark it at different scales (i.e., with different numbers of worker threads). Overall, the script is expected to perform the following tasks:
+1. Run `client.py` multiple times with different threading parameters and record each run's execution time.
+2. Plot the data to visualize the performance trend.

-## Grading
+### Driving the Client

-<!-- Details about the autograder are coming soon. -->
+Each run of `benchmark.py` should collect 4 data pairs (thread count, execution time) by running `client.py` with 1, 2, 4, and 8 thread(s). Wrap each `client.py` execution in a pair of timestamps, then calculate the execution time from their difference. Make sure you always reset the server before sending the `workload` by issuing a `Purge()` command through `client.py`:

-Copy `autograde.py` to your working directory
-then run `python3 -u autograde.py` to test your work.
-This constitutes 75% of the total score. You can add `-v` flag to get a verbose output from the autograder.
+```bash
+python3 client.py purge
+# allow some time for the reset to complete
+sleep 3
+# test follows...
+```

-If you want to manually test on a somewhat bigger dataset, run
-`python3 bigdata.py`. This generates 100 millions rows across 400
-files and uploads them. The "x" column only contains 1's, so you if
-sum over it, you should get 100000000.
+You may also want `benchmark.py` to wait a few seconds at startup so the `server` is ready to accept client RPC requests.

-The other 25% of the total score will be graded by us.
-Locking and performance-related details are hard to automatically
-test, so here's a checklist of things we'll be looking for:
+**HINT 1:** You can get a timestamp with `time.time()`.

-- are there 8 threads?
-- is the lock held when shared data structures accessed?
-- is the lock released when files are read or written?
-- does the summation RPC use either parquets or CSVs based on the passed argument?
-- when a parquet is read, is the needed column the only one that is read?
+**HINT 2:** There are multiple ways to launch a Python program from within another; examples are `os.system()` and `subprocess.run()`. A minimal sketch that combines both steps appears in the next subsection.

-## Submission
+### Visualizing the Results

-You have some flexibility in how your organize your project
-files. However, we need to be able to easily run your code. In order
-to be graded, please ensure to push anything necessary so that we'll
-be able to run your client and server as follows:
+Plot a simple line graph of the execution times collected in the previous step. Save the figure to a file called `plot.png`. Your figure must include at least 4 data points as mentioned above.
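+
+Below is a minimal sketch of how `benchmark.py` could combine the two steps (timing the client runs, then plotting). It assumes `client.py` supports the `purge` and `workload N` invocations described above and that `matplotlib` is installed; helper names such as `run_client()` are illustrative, not required.
+
+```python
+import subprocess
+import time
+
+import matplotlib
+matplotlib.use("Agg")  # render to a file; no display needed
+import matplotlib.pyplot as plt
+
+THREADS = [1, 2, 4, 8]
+
+def run_client(n):
+    # reset server state, then give it a moment to settle
+    subprocess.run(["python3", "client.py", "purge"], check=True)
+    time.sleep(3)
+    # time one full workload run with n threads
+    start = time.time()
+    subprocess.run(["python3", "client.py", "workload", str(n)], check=True)
+    return time.time() - start
+
+if __name__ == "__main__":
+    time.sleep(3)  # wait for the server to start accepting RPCs
+    times = [run_client(n) for n in THREADS]
+
+    plt.plot(THREADS, times, marker="o")
+    plt.xlabel("number of threads")
+    plt.ylabel("execution time (seconds)")
+    plt.savefig("plot.png")
+```
+
+Note that timing the whole `client.py` process includes interpreter startup overhead, which is fine here since we only care about the relative trend across thread counts.
+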
-```sh
-git clone YOUR_REPO
-cd YOUR_REPO
+**HINT 1:** `matplotlib` is a standard toolkit for visualizing your data.

-# copy in tester code and client programs...
-python3 -m venv venv
-source venv/bin/activate
-pip3 install grpcio==1.66.1 grpcio-tools==1.66.1 numpy==2.1.1 protobuf==5.27.2 pyarrow==17.0.0 setuptools==75.1.0
+**HINT 2:** `benchmark.py` needs no more than 50 lines of code. Don't overcomplicate your solution.

-# run server
-docker build . -t p3
-docker run -d -m 512m -p 127.0.0.1:5440:5440 p3
+## Submission

-# run clients
-python3 upload.py simple.csv
-python3 csvsum.py x
-python3 parquetsum.py x
-```
+Your deliverable should work with the `docker-compose.yaml` we provide:
+
+1. `Dockerfile.client` must launch `benchmark.py` **(NOT `client.py`)**. To achieve this, you need to copy both `client.py` and the driver `benchmark.py` into the image, as well as `workload`, `purge`, and the input CSV files. It is sufficient to submit a minimal working set, as we may test your code with different datasets and workloads.
+2. `Dockerfile.server` must launch `server.py`.

-Please do include the files built from the .proto. Do NOT include the venv directory.
+**Requirement:** Do **NOT** submit the `venv` directory (e.g., use `.gitignore`).
+
+## Grading

-After pushing your code to the designated GitLab repository,
-you can also verify your submission.
-To do so, simply copy `check_sub.py` to your working directory and run
-the command `python3 check_sub.py`
+TBD
\ No newline at end of file
--
GitLab