**Hint 2:** Think about whether there is any .sh script that will help you quickly test code changes. For example, you may want it to rebuild your Docker images, clean up an old Compose cluster, and deploy a new cluster.
**Hint 3:** You might find it really helpful to use the commands below to clean up the disk space occupied by Docker images/containers/networks/volumes during the development of this project.
```bash
docker image prune -a -f
docker container prune -f
docker network prune -f
docker volume prune -f
```
```
+----------+
| count(*) |
+----------+
|   447367 |
+----------+
```
2. What are the actual types for those loans? (One way to check this from Python is sketched below.)
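If you would rather poke at these questions from Python than from a SQL shell, something like the sketch below can work. The `mysql-connector-python` package, the connection settings, and the `loans`/`loan_type` names are all assumptions; substitute whatever your database actually uses:

```python
import mysql.connector  # assumption: the project database is MySQL

# Placeholder connection settings and table/column names.
conn = mysql.connector.connect(
    host="mysql", user="root", password="abc", database="CS544")
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM loans")
print(cur.fetchone()[0])   # should match the 447367 shown above
cur.execute("SELECT DISTINCT loan_type FROM loans")
print(cur.fetchall())      # question 2: the actual loan types
conn.close()
```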
Note: the DataNode location is the randomly generated container ID for the container running the DataNode, so yours will be different, and the distribution of blocks across different nodes may also vary.
The documents [here](https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-hdfs/WebHDFS.html) describe how we can interact with HDFS via web requests. Many [examples](https://requests.readthedocs.io/en/latest/user/quickstart/) show these web requests being made with the `curl` command, but you'll adapt those examples to use `requests.get`. By default, WebHDFS runs on port `9870`, so use port `9870` instead of `9000` to access HDFS for this part.
**Hint 1:** Note that if `r` is a response object, then `r.content` will contain some bytes, which you could convert to a dictionary; alternatively, `r.json()` does this for you.
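Putting that hint together with the WebHDFS docs, a request for a file's block locations might look like the sketch below. The hostname `boss` is a placeholder for whatever your NameNode container/service is called, and `GETFILEBLOCKLOCATIONS` is one of the operations listed in the WebHDFS documentation linked above:

```python
import requests

# Placeholder NameNode hostname; use your Compose service name instead.
url = "http://boss:9870/webhdfs/v1/hdma-wi-2021.parquet"
r = requests.get(url, params={"op": "GETFILEBLOCKLOCATIONS"})
r.raise_for_status()

# r.json() parses the response bytes into a dictionary for us.
for block in r.json()["BlockLocations"]["BlockLocation"]:
    # Each entry describes one block: its byte range and the DataNodes
    # (random container IDs, in this cluster) holding replicas of it.
    print(block["offset"], block["length"], block["hosts"])
```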
**CalcAvgLoan:** To be more specific, modify this gRPC call so that the server correctly handles requests even when one (or even two) DataNodes are offline. Try reading the partitioned parquet file first; if that fails, fall back to the large `hdma-wi-2021.parquet` file and complete the computation from it. What's more, you have to return with the `source` field filled (a sketch of this fallback logic follows the list below):
1. "partitioned": calculation performed on parquet partitioned before 1. "partitioned": calculation performed on parquet partitioned before
2. "unpartitioned": parquet partitioned before is lost, and calculation performed on the initial unpartitioned table 2. "unpartitioned": parquet partitioned before is lost, so the result is from calculation performed on the initial unpartitioned table
To simulate a DataNode failure, you may use `docker kill` to terminate a node and then wait until you confirm that the number of `live DataNodes` has decreased using the `hdfs dfsadmin -fs <hdfs_path> -report` command.
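If you would rather script that wait than re-run the report by hand, a loop like the sketch below works. It assumes the `hdfs` CLI is on the PATH where you run it, and `hdfs://boss:9000` stands in for your `<hdfs_path>`:

```python
import re
import subprocess
import time

def live_datanodes(hdfs_path: str) -> int:
    # Run the same report command and parse its "Live datanodes (N):" line.
    report = subprocess.run(
        ["hdfs", "dfsadmin", "-fs", hdfs_path, "-report"],
        capture_output=True, text=True, check=True,
    ).stdout
    m = re.search(r"Live datanodes \((\d+)\)", report)
    return int(m.group(1)) if m else 0

before = live_datanodes("hdfs://boss:9000")   # placeholder <hdfs_path>
# ... docker kill one DataNode container here ...
while live_datanodes("hdfs://boss:9000") >= before:
    time.sleep(5)   # HDFS can take a while to mark the node dead
```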
## Submission