From 1b7325924123936c1f1f34b15bdd953306553f92 Mon Sep 17 00:00:00 2001
From: Jing Lan <jlan25@cs544-jlan25.cs.wisc.edu>
Date: Fri, 21 Feb 2025 00:57:58 -0600
Subject: [PATCH] p3 draft update

---
 p3/README.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/p3/README.md b/p3/README.md
index 0ec059c..5242625 100644
--- a/p3/README.md
+++ b/p3/README.md
@@ -91,6 +91,8 @@ This method should:
 1. Recover the uploaded CSV table from *binary* bytes carried by the RPC request message.
 2. Write the table to a CSV file and write the same table to another file in Parquet format.
 
+**Requirement:** Write two files to disk per upload (one CSV, one Parquet). We will test your server with a 512 MB memory limit, so do *NOT* keep the table data in memory once the files are written.
+
 **HINT 1:** You are free to decide the names and locations of the stored files. However, the server must keep these records to process future queries (for instance, you can add paths to a data structure like a list or dictionary).
 
 **HINT 2:** Both `pandas` and `pyarrow` provide interfaces to write a table to file.
@@ -166,7 +168,7 @@ grpc.server(
 
 ## Part 4: Benchmarking the System
 
-Congratulations, you have implemented a minimal parallel data system! Let's write a small script to finally benchmark it with different scales (i.e., number of worker threads). Overall, the script is expected to perform the following tasks:
+Congratulations, you have implemented a minimal multithreaded data system! Let's finally write a small script to benchmark it at different scales (i.e., different numbers of worker threads). Overall, the script is expected to perform the following tasks:
 
 1. Run `client.py` multiple times with different therading parameters, record their execution time.
 2. plot the data to visualize the performance trend.
@@ -198,7 +200,7 @@ Plot a simple line graph with the execution time acquired by the previous step.

 ## Submission
 
-Delirable should work with the `docker-compose.yaml` we provide:
+Your deliverables should work with the `docker-compose.yaml` we provide:
 
 1. `Dockerfile.client` must launch `benchmark.py` **(NOT `client.py`)**. To achieve this, you need to copy both `client.py` and the driver `benchmark.py` to the image, as well as `workload`, `purge`, and the input CSV files. It is sufficient to submit a minimal working set as we may test your code with different datasets and workloads.
 2. `Dockerfile.server` must launch `server.py`.
-- 
GitLab
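The benchmarking tasks listed in Part 4 (run `client.py` with several thread counts, record each execution time, then plot) can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the assignment's reference solution: the patch does not say how `client.py` receives its thread count, so the helper `time_runs` (a hypothetical name) takes a caller-supplied function that builds the command line for each thread count.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, List


def time_runs(
    build_cmd: Callable[[int], List[str]],
    thread_counts: Iterable[int],
) -> Dict[int, float]:
    """Run one client command per thread count; return wall-clock seconds for each.

    build_cmd is a caller-supplied (hypothetical) function mapping a thread
    count to the argv list for a single client run; the real client.py
    interface is an assumption left to the caller.
    """
    timings: Dict[int, float] = {}
    for n in thread_counts:
        cmd = build_cmd(n)
        start = time.perf_counter()
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed run is never recorded as a valid timing.
        subprocess.run(cmd, check=True)
        timings[n] = time.perf_counter() - start
    return timings
```

For example, assuming (hypothetically) that `client.py` accepts a `--threads N` flag, `time_runs(lambda n: [sys.executable, "client.py", "--threads", str(n)], [1, 2, 4, 8])` would return a dict mapping each thread count to elapsed seconds; passing its sorted keys and values to `matplotlib.pyplot.plot` gives the line graph Part 4 asks for. Measuring wall-clock time around the whole subprocess includes process startup, which is usually what an end-to-end benchmark should capture.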