Data engineering is core to any big data analytics project. A key function data engineers often perform is aggregating large amounts of data into groupings that serve many different uses in data science. However, as data volume and complexity increase, performing these aggregations becomes more challenging.
In this eBook, we cover:
- Why cluster computing makes Apache Spark™ the ideal processing engine for complex aggregations.
- The different types of aggregations that you can perform with Spark.