Remove the format restriction on partitioned Parquet.

Commit 6ba89a5d authored 1 week ago by wyang338
--- a/p4/README.md
+++ b/p4/README.md
@@ -157,19 +157,10 @@ In this part, your task is to implement the `PartitionByCounty` and `CalcAvgLoan

 Imagine a scenario where there could be many queries differentiated by `county`, and one of them is to get the average loan amount for a county. In this case, it might be much more efficient to generate a set of Parquet files partitioned by county (stored with 1x replication), and then read from these much smaller per-county tables for computation.

-**PartitionByCounty:** To be more specific, you need to categorize the contents of that parquet file just stored in HDFS using `county_id` as the key. For each `county_id`, create a new parquet file that records all entries under that county, and then save them with a **1x replication**. Files should be written into folder `/partitioned/` and name for each should be their `county_id`.
+**PartitionByCounty:** To be more specific, you need to categorize the contents of that parquet file just stored in HDFS using `county_id` as the key. For each `county_id`, create a new parquet file that records all entries under that county, and then save them with a **1x replication**. Files should be written into folder `/partitioned/`.

 **CalcAvgLoan:** To be more specific, for a given `county_id`, you need to return an int value indicating the average `loan_amount` of that county. **Note:** You are required to perform this calculation based on the partitioned parquet files generated by `PartitionByCounty`. The `source` field in the proto file can be ignored in this part.
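
The grouping and averaging logic described for the two calls above can be sketched in plain Python. This is only an illustration under stated assumptions, not the assignment's reference solution: the helper names `partition_by_county` and `calc_avg_loan`, the in-memory `sample_rows`, and the choice to truncate the average with integer division are all hypothetical. Reading the source Parquet file from HDFS and writing each group under `/partitioned/` with 1x replication are deliberately left out.

```python
from collections import defaultdict


def partition_by_county(rows):
    """Group loan records by their county_id (hypothetical helper).

    In the real task each group would be written as its own Parquet
    file under /partitioned/ with 1x replication; here the groups
    simply stay in memory.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["county_id"]].append(row)
    return dict(partitions)


def calc_avg_loan(partitions, county_id):
    """Return the average loan_amount for one county as an int.

    Assumption: the average is truncated with integer division,
    and None is returned when the county has no records.
    """
    rows = partitions.get(county_id, [])
    if not rows:
        return None
    total = sum(row["loan_amount"] for row in rows)
    return total // len(rows)


# Made-up sample records, for illustration only.
sample_rows = [
    {"county_id": 55001, "loan_amount": 100000},
    {"county_id": 55001, "loan_amount": 150001},
    {"county_id": 55003, "loan_amount": 200000},
]

partitions = partition_by_county(sample_rows)
avg_55001 = calc_avg_loan(partitions, 55001)
```

Because `calc_avg_loan` only looks at the records of the requested county, it never has to scan the full table, which is the efficiency argument made above.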

-The inside of the partitioned directory should look like this:
-
-      ```
-      ├── partitioned/
-      │   ├── 55001.parquet
-      │   ├── 55003.parquet
-      │   └── ...
-      ```
-
 The root directory on HDFS should now look like this:
 ```
 14.4 M  43.2 M  hdfs://boss:9000/hdma-wi-2021.parquet