Webinar: High Performance Hadoop With Python, May 5th


On May 5th, Dr. Kristopher Overholt and Dr. Matthew Rocklin of Continuum Analytics will present a webinar on High Performance Hadoop with Python. Reserve your spot today!



Python is the fastest-growing Open Data Science language and is used more than 50% of the time to extract value from Big Data in Spark. However, both PySpark and SparkR incur JVM overhead and cross-language serialization (Python/Java or R/Java) when interacting with Spark, which lengthens the time-to-value from your Big Data. What if there were a way to use the entire Python ecosystem and get high performance, without refactoring your Hadoop-based data science investments?

Anaconda, the leading Open Data Science Platform, delivers high performance Python for Hadoop. You can apply your existing Python-based data science investments to your existing Hadoop or HPC clusters. Anaconda bypasses the typical Hadoop performance issues, builds on existing high performance scientific and array-based computing in Python, and now uses Dask, the powerful parallel execution framework, to deliver fast results on any enterprise Hadoop distribution, such as Cloudera and Hortonworks.

In this webinar, you'll learn to:

  • Analyze NYC taxi data stored in HDFS using distributed DataFrames on a cluster
  • Create interactive distributed visualizations of global temperature data
  • Distribute in-memory natural language processing and interactive queries on text data in HDFS
  • Wrap and parallelize existing legacy code on custom file formats

Sincerely,

Team Anaconda
