- Pitfalls in pseudo-random number sampling at scale with Apache Spark - Jun 27, 2017.
Large-scale simulation of random number generation is feasible with today’s high-speed, scalable distributed computing frameworks. Here is how it can be achieved using Apache Spark.
Apache Spark, GitHub, Random, RDD
- Apache Spark Key Terms, Explained - Jun 13, 2016.
An overview of 13 core Apache Spark concepts, presented with focus and clarity in mind. A great beginner's introduction to essential Spark terminology.
Apache Spark, Databricks, Dataset, Explained, Key Terms, RDD, Tungsten
- Apache Spark: RDD, DataFrame or Dataset? - Feb 3, 2016.
There are now 3 Apache Spark APIs. Here’s how to choose the right one.
Apache Spark, API, Dataset, Java, RDD, Scala
- The Big ‘Big Data’ Question: Hadoop or Spark? - Aug 5, 2015.
Because they share many similarities, Hadoop and Spark are often wrongly treated as the same thing. Bernard carefully explains the differences between the two and how to choose the right one (or both) for your business needs.
Apache Spark, Bernard Marr, Data Science Tools, Distributed Systems, Hadoop, Machine Learning, Performance, RDD