From 6ba89a5d84ef09ccae3c26ae06a6f735e4dd2a31 Mon Sep 17 00:00:00 2001
From: wyang338 <weichuyang777@gmail.com>
Date: Sun, 2 Mar 2025 21:28:51 -0600
Subject: [PATCH] Remove the format restriction on partitioned Parquet.

---
 p4/README.md | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/p4/README.md b/p4/README.md
index d8eedc2..b61ed0b 100644
--- a/p4/README.md
+++ b/p4/README.md
@@ -157,19 +157,10 @@ In this part, your task is to implement the `PartitionByCounty` and `CalcAvgLoan
 
 Imagine a scenario where there could be many queries differentiated by `county`, and one of them is to get the average loan amount for a county. In this case, it might be much more efficient to generate a set of 1x Parquet files filtered by county, and then read data from these partitioned, relatively much smaller tables for computation.
 
-**PartitionByCounty:** To be more specific, you need to categorize the contents of that parquet file just stored in HDFS using `county_id` as the key. For each `county_id`, create a new parquet file that records all entries under that county, and then save them with a **1x replication**. Files should be written into folder `/partitioned/` and name for each should be their `county_id`.
+**PartitionByCounty:** Specifically, you need to partition the contents of the Parquet file just stored in HDFS, using `county_id` as the key. For each `county_id`, create a new Parquet file containing all entries for that county, and save it with **1x replication**. Files should be written to the `/partitioned/` folder.
 
 **CalcAvgLoan:** To be more specific, for a given `county_id` , you need to return a int value, indicating the average `loan_amount` of that county. **Note:** You are required to perform this calculation based on the partitioned parquet files generated by `FilterByCounty`. `source` field in proto file can ignored in this part.
 
-The inside of the partitioned directory should look like this:
-
-      ```
-      ├── partitioned/
-      │   ├── 55001.parquet
-      │   ├── 55003.parquet
-      │   └── ...
-      ```
-
 The root directory on HDFS should now look like this:
 ```
 14.4 M  43.2 M  hdfs://boss:9000/hdma-wi-2021.parquet
-- 
GitLab