Commit 6ba89a5d authored by wyang338

Remove the format restriction on partitioned Parquet.

parent 37aae097
......@@ -157,19 +157,10 @@ In this part, your task is to implement the `PartitionByCounty` and `CalcAvgLoan
Imagine a scenario where there could be many queries differentiated by `county`, and one of them is to get the average loan amount for a county. In this case, it might be much more efficient to generate a set of 1x Parquet files filtered by county, and then read data from these partitioned, relatively much smaller tables for computation.
-**PartitionByCounty:** To be more specific, you need to categorize the contents of that parquet file just stored in HDFS using `county_id` as the key. For each `county_id`, create a new parquet file that records all entries under that county, and then save them with a **1x replication**. Files should be written into folder `/partitioned/` and name for each should be their `county_id`.
+**PartitionByCounty:** To be more specific, you need to categorize the contents of that parquet file just stored in HDFS using `county_id` as the key. For each `county_id`, create a new parquet file that records all entries under that county, and then save them with a **1x replication**. Files should be written into folder `/partitioned/`.
**CalcAvgLoan:** To be more specific, for a given `county_id`, you need to return an int value indicating the average `loan_amount` of that county. **Note:** You are required to perform this calculation based on the partitioned parquet files generated by `FilterByCounty`. The `source` field in the proto file can be ignored in this part.
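The two steps above can be sketched in memory as follows. This is only an illustration of the grouping and averaging logic, not the assignment's implementation: it does no Parquet or HDFS I/O, the function names `partition_by_county` and `calc_avg_loan` are hypothetical, the sample rows are made up, and truncating the average with floor division is an assumption about how the int result should be produced.

```python
from collections import defaultdict


def partition_by_county(rows):
    """Group row dicts by county_id (stand-in for one Parquet file per county)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["county_id"]].append(row)
    return dict(partitions)


def calc_avg_loan(partitions, county_id):
    """Average loan_amount for one county, reading only that county's partition."""
    rows = partitions.get(county_id, [])
    if not rows:
        return None  # no partition exists for this county
    # Floor division yields an int; the rounding rule is an assumption.
    return sum(r["loan_amount"] for r in rows) // len(rows)


# Made-up sample rows, for illustration only.
rows = [
    {"county_id": 55001, "loan_amount": 100000},
    {"county_id": 55001, "loan_amount": 150000},
    {"county_id": 55003, "loan_amount": 200000},
]
partitions = partition_by_county(rows)
avg = calc_avg_loan(partitions, 55001)
```

The point of the design is that `calc_avg_loan` touches only the rows of the requested county, which mirrors reading one small partitioned file instead of scanning the full table.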
-The inside of the partitioned directory should look like this:
-```
-├── partitioned/
-│ ├── 55001.parquet
-│ ├── 55003.parquet
-│ └── ...
-```
The root directory on HDFS should now look like this:
```
14.4 M 43.2 M hdfs://boss:9000/hdma-wi-2021.parquet
......