More specifically, you will need to manually create *N* threads for `client.py`.
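A rough sketch of the thread-creation pattern, assuming a hypothetical `run_share()` helper and a simple round-robin split of the workload (neither is prescribed by the project):

```python
import threading

def run_share(share):
    # hypothetical helper: process one slice of the workload
    # (e.g., upload its files or issue its RPCs)
    pass

def run_workload(tasks, num_threads):
    # split the workload into num_threads slices and process them in parallel
    shares = [tasks[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=run_share, args=(s,)) for s in shares]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker to finish
```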
**HINT:** Before moving on to the `server`, test your multi-threaded client by running it with a single thread:
```bash
python3 client.py workload 1  # set to use only 1 thread
```
### Server
...
...
## Part 4: Benchmarking the System
Congratulations, you have implemented a minimal parallel data system! Let's write a small script, `benchmark.py`, to benchmark it at different scales (i.e., numbers of worker threads). Overall, the script is expected to perform the following tasks:
1. Run `client.py` multiple times with different threading parameters and record their execution times.
2. Plot the data to visualize the performance trend.
### Driving the Client
`benchmark.py` should collect 4 pairs of data points by running `client.py` with 1, 2, 4, and 8 thread(s). Wrap each `client.py` execution with a pair of timestamps and compute the execution time from their difference. Make sure you always reset the server before sending the `workload` by issuing a `Purge()` command through `client.py`:
```bash
python3 client.py purge
# allow some time for the reset to complete
time.sleep(3)
# test follows...
```
You may also want `benchmark.py` to wait a few seconds for the `server` to be ready to accept client RPC requests.
**HINT 1:** You can get a timestamp with `time.time()`.
**HINT 2:** There are multiple tools for launching a Python program from within another, such as `os.system()` and `subprocess.run()`.
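Putting these hints together, here is a minimal sketch of the timing loop, reusing the `purge` and `workload` invocations shown above (the 3-second pause and the `check=True` error handling are assumptions you can adjust):

```python
import subprocess
import time

measurements = {}
for n in [1, 2, 4, 8]:
    # reset the server before each run
    subprocess.run(["python3", "client.py", "purge"], check=True)
    time.sleep(3)  # give the reset some time to complete

    start = time.time()
    subprocess.run(["python3", "client.py", "workload", str(n)], check=True)
    measurements[n] = time.time() - start
    print(f"{n} thread(s): {measurements[n]:.2f} s")
```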
### Visualizing the Results
Plot a simple line graph of the execution times acquired in the previous step. Save the figure to a file called `plot.png`. Your figure must include at least the 4 data points mentioned above.

**HINT 1:** `matplotlib` is a standard toolkit for visualizing your data.

**HINT 2:** `benchmark.py` needs no more than 50 lines of code. Don't overcomplicate your solution.
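One possible way to produce `plot.png`, assuming the thread counts and the measured times have been collected into two lists (the numbers below are placeholders, not expected results):

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt

threads = [1, 2, 4, 8]
times = [12.3, 7.1, 4.2, 3.0]  # placeholder values; use your measurements

plt.plot(threads, times, marker="o")
plt.xlabel("number of threads")
plt.ylabel("execution time (seconds)")
plt.savefig("plot.png")
```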
## Submission
You have some flexibility in how you organize your project files. However, we need to be able to easily run your code. In order to be graded, please push everything necessary so that we will be able to run your client and server as follows:

```sh
git clone YOUR_REPO
cd YOUR_REPO

# run server
docker build . -t p3
docker run -d -m 512m -p 127.0.0.1:5440:5440 p3

# run clients
python3 upload.py simple.csv
python3 csvsum.py x
python3 parquetsum.py x
```
Your deliverable should also work with the `docker-compose.yaml` we provide:
1. `Dockerfile.client` must launch `benchmark.py` **(NOT `client.py`)**. To achieve this, you need to copy both `client.py` and the driver `benchmark.py` to the image, as well as `workload`, `purge`, and the input CSV files. It is sufficient to submit a minimal working set, as we may test your code with different datasets and workloads.
2. `Dockerfile.server` must launch `server.py`.

Please do include the files built from the `.proto`.

**Requirement:** Do **NOT** submit the `venv` directory (e.g., use `.gitignore`).
## Grading
Copy `autograde.py` to your working directory, then run `python3 -u autograde.py` to test your work. This constitutes 75% of the total score. You can add the `-v` flag to get verbose output from the autograder.

If you want to manually test on a somewhat bigger dataset, run `python3 bigdata.py`. This generates 100 million rows across 400 files and uploads them. The "x" column only contains 1's, so if you sum over it, you should get 100000000.

The other 25% of the total score will be graded by us. Locking and performance-related details are hard to test automatically, so here is a checklist of things we will be looking for:

- are there 8 threads?
- is the lock held when shared data structures are accessed?
- is the lock released when files are read or written?
- does the summation RPC use either parquets or CSVs based on the passed argument?
- when a parquet is read, is the needed column the only one that is read?

After pushing your code to the designated GitLab repository, you can also verify your submission. To do so, simply copy `check_sub.py` to your working directory and run