Just some Spark tutorial with Scala
For this project we will be consuming data from here.
| Concept | Description |
|---|---|
| RDD | Low-level distributed collection (rarely used directly now) |
| DataFrame | Table-like abstraction; most common API |
| SparkSession | Entry point to Spark functionality |
| Transformations | .filter(), .select(), .map(), etc. |
| Actions | .show(), .collect(), .count(), etc. |
| Caching | df.cache() for repeated use |
| Reading files | .read.csv, .read.json, .read.parquet |
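
To make the concepts above concrete, here is a minimal sketch that touches each one in turn. It assumes Spark SQL is on the classpath and runs in local mode. The object name `SparkConceptsSketch`, the helper `adultsOf`, the column names, the sample rows, and the commented-out file path are all made up for illustration; they are not part of this project's data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SparkConceptsSketch {

  // Transformations: filter and select are lazy; nothing runs until an action is called.
  // (Hypothetical helper; assumes the input has "name", "age" and "city" columns.)
  def adultsOf(people: DataFrame): DataFrame =
    people.filter(col("age") >= 18).select("name", "city")

  def main(args: Array[String]): Unit = {
    // SparkSession: the entry point. "local[*]" runs Spark in-process on all cores.
    val spark = SparkSession.builder()
      .appName("spark-concepts-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // enables toDF and the encoders used by map below

    // DataFrame: built from hypothetical in-memory rows so the sketch needs no input file.
    val people = Seq(
      ("Ana", 34, "Lisbon"),
      ("Bo", 17, "Oslo"),
      ("Cy", 52, "Lisbon")
    ).toDF("name", "age", "city")

    // Reading files: with real data, a reader call would replace the Seq above, e.g.
    // val people = spark.read.option("header", "true").option("inferSchema", "true")
    //   .csv("path/to/people.csv") // placeholder path

    val adults = adultsOf(people)

    // Caching: worthwhile here because several actions reuse the same result.
    adults.cache()

    // Actions: each of these triggers a Spark job.
    adults.show()
    println(adults.count())          // 2 for the sample rows above
    adults.collect().foreach(println)

    // map works on a typed Dataset; here the names are upper-cased.
    adults.select("name").as[String].map(_.toUpperCase).show()

    // RDD: the low-level view of the same data, rarely needed directly.
    println(adults.rdd.getNumPartitions)

    spark.stop()
  }
}
```

Keeping the transformation in its own function is only a convenience for this sketch: it makes the lazy step easy to reuse and to check separately from the actions that trigger execution.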