@@ -112,11 +112,11 @@ Remember that if you have a Spark DataFrame `df`, you can get the underlying RDD
**REMEMBER TO INCLUDE `#q1` AT THE TOP OF THIS CELL**
-#### Q2: How many problems are there with a `cf_rating` of at least 1600, having `private_tests`, and a name containing "_A." (Case Sensitive)? Answer by directly using the RDD API. Answer by using the DataFrame API.
+#### Q2: How many problems are there with a `cf_rating` of at least 1600, having `private_tests`, and a name containing "_A." (Case Sensitive)? Answer by using the DataFrame API.
This is the same question as Q1, and you should get the same answer. This is to give you practice interacting with Spark in different ways.
-#### Q3: How many problems are there with a `cf_rating` of at least 1600, having `private_tests`, and a name containing "_A." (Case Sensitive)? Answer by directly using the RDD API. Answer by using Spark SQL.
+#### Q3: How many problems are there with a `cf_rating` of at least 1600, having `private_tests`, and a name containing "_A." (Case Sensitive)? Answer by using Spark SQL.
Before you can use `spark.sql`, write the problem data to a Hive table so that you can refer to it by name.