a column of data without reading the whole file. Check out the `columns`
parameter of [`pyarrow.parquet.read_table`](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html).
You can also find an example in the [lecture notes](https://git.doit.wisc.edu/cdis/cs/courses/cs544/s25/main/-/tree/main/lec/14-file-formats?ref_type=heads).
**Requirement:** when the server is asked to sum over a column of a
Parquet file, it should read only the data from that column, not from the other columns.
**Note 1:** we will run your server with a 512-MB limit on RAM. Any
individual files we upload will fit within that limit, but the total
size of the files uploaded will exceed that limit. That's why your
server will have to do sums by reading the files (instead of just
keeping all table data in memory).
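One way to stay under the memory limit, sketched under the assumption that uploads are stored as CSV files with a header row and integer values, is to keep only the file paths in memory and stream each file row by row whenever a sum is requested. The helper `sum_column_across_csv_files` and the sample files are hypothetical.

```python
import csv
import os
import tempfile
from typing import Iterable


def sum_column_across_csv_files(paths: Iterable[str], column: str) -> int:
    """Sum `column` over every CSV file in `paths`.

    Files are opened one at a time and rows are streamed with csv.DictReader,
    so memory use stays small regardless of the total size of all uploads.
    Assumes each file has a header row and integer values in `column`.
    """
    total = 0
    for path in paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                total += int(row[column])
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical uploads: the server remembers only these paths.
        uploaded_paths = []
        for name, text in [("a.csv", "x,y\n1,2\n3,4\n"), ("b.csv", "x,y\n5,6\n")]:
            path = os.path.join(tmp, name)
            with open(path, "w", newline="") as f:
                f.write(text)
            uploaded_paths.append(path)
        print(sum_column_across_csv_files(uploaded_paths, "x"))  # 9
```

Peak memory here is bounded by a single row rather than by the total upload size; the trade-off is that every sum re-reads the files from disk.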
**Note 2:** the `bigdata.py` client randomly generates a large volume of
CSV-formatted data and uploads it via gRPC. You are *required* to
test your upload implementation with this script, and it will be used
as part of our tests.
## Part 4: Locking
```bash
docker build . -t p3
# run server in new container
docker run --name=yournetid -d -m 512m -v ./inputs:/inputs p3
```
Whenever you push to `main`, we run `autobadger` on your `main` branch. We then push our results to your repository under `Issues`.
This issue will contain the contents of `autobadger` as well as some other metadata and notes. This will almost always be your project's final grade, though we do manual reviews of your code as well to check against cheating and hardcoding. We also take the highest grade of all your submissions. In other words, if you get 100 on a GitLab issue, then you are done! :)
**It is important to note that it is *your responsibility* to verify**:
1. You receive a GitLab issue (within a reasonable amount of time, i.e. an hour, but normally much shorter than that)
2. The results you see align with what you expect.
If there is an issue with (1) or (2), double-check your code and give it some time before you push again or [rerun your GitLab pipeline](https://piazza.com/class/m64hzy9v23v398/post/85) manually. If the issue is not resolved after a few attempts, reach out to your [TA](https://tyler.caraza-harter.com/cs544/s25/messages.html?topic=ta) or visit us in office hours.
> **NOTE**: around or after the deadline, it is better to manually rerun the pipeline (if you suspect that your code is fine) than to push to `main` again. We use your latest push to check against the project's deadline.
As such, it is _highly recommended_ to start early, push often, and not wait till the minutes before the deadline to submit! Give yourself a buffer against unexpected issues.
Since it is your responsibility to verify your GitLab issue (and your submission), we will not accept revision requests that result from not checking the status of your GitLab issues beforehand.
> **NOTE**: Be careful not to push after the deadline unless your intention is to submit late (see policy below).
### Miscellaneous
* projects have four parts; for notebooks, use big headers to divide your work into the four parts ("# Part 1: ...")
* for question-based project work (Q1, Q2, etc.), include comments like ("# Q1: ...") before the answers