From 677df2492fc8a866391e830fdc8c858e07cb575b Mon Sep 17 00:00:00 2001
From: wyang338 <wyang338@wisc.edu>
Date: Thu, 6 Mar 2025 04:32:31 +0000
Subject: [PATCH] Fix the wrong expected file size in Part 1 and sum of blocks
 in Part 2.

---
 p4/README.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/p4/README.md b/p4/README.md
index ccf0be7..919487e 100644
--- a/p4/README.md
+++ b/p4/README.md
@@ -96,9 +96,10 @@ In this part, your task is to implement the `DbToHdfs` gRPC call (you can find t
 4. Upload the generated table to `/hdma-wi-2021.parquet` in the HDFS, with **2x** replication and a **1-MB** block size, using PyArrow (https://arrow.apache.org/docs/python/generated/pyarrow.fs.HadoopFileSystem.html).
 
 To check whether the upload was correct, you can use `docker exec -it` to enter the gRPC server's container and use HDFS command `hdfs dfs -du -h <path>`to see the file size. The expected result is:
-```
-15.3 M 30.5 M hdfs://nn:9000/hdma-wi-2021.parquet
-```
+
+```
+14.4 M 28.9 M hdfs://nn:9000/hdma-wi-2021.parquet
+```
 
 **Hint 1:** We used similar tables in lecture: https://git.doit.wisc.edu/cdis/cs/courses/cs544/s25/main/-/tree/main/lec/15-sql
 
@@ -117,7 +118,7 @@ In this part, your task is to implement the `BlockLocations` gRPC call (you can find t
 For example, running `docker exec -it p4-server-1 python3 /client.py BlockLocations -f /hdma-wi-2021.parquet` should show something like this:
 
 ```
-{'7eb74ce67e75': 15, 'f7747b42d254': 6, '39750756065d': 11}
+{'7eb74ce67e75': 15, 'f7747b42d254': 7, '39750756065d': 8}
 ```
 
 Note: DataNode location is the randomly generated container ID for the
-- 
GitLab
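
A note on why the corrected numbers hang together (not part of the patch itself): a 14.4 M file stored with a 1-MB block size splits into 15 blocks, and at 2x replication that gives 30 block replicas cluster-wide, which matches the corrected Part 2 sum 15 + 7 + 8 = 30 (the old values summed to 32). The 28.9 M figure is likewise roughly 2 × 14.4 M, the total size across replicas reported by `hdfs dfs -du`. A minimal sketch of that arithmetic, assuming the sizes are in MiB as `-h` reports them:

```python
import math

# Figures taken from the patch; units assumed to be MiB per `hdfs dfs -du -h`.
file_mib = 14.4       # corrected single-copy size of /hdma-wi-2021.parquet
block_mib = 1         # block size used at upload time
replication = 2       # replication factor used at upload time

# Blocks per copy: a partial trailing block still counts as one block.
blocks = math.ceil(file_mib / block_mib)

# Total block replicas across all DataNodes.
total_replicas = blocks * replication

# Corrected per-DataNode counts from the BlockLocations example in the patch.
per_datanode = {'7eb74ce67e75': 15, 'f7747b42d254': 7, '39750756065d': 8}

print(blocks, total_replicas, sum(per_datanode.values()))  # 15 30 30
```

The same check against the old values (15 + 6 + 11 = 32) fails, which is what this patch corrects.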