Search results for spark streaming

    Found 48 documents, 10397 searched:

  • 7 Steps to Mastering Apache Spark 2.0 (Silver Blog)

    ...tible with Spark 2.0 is available as a spark package. Step 6: Structured Streaming with Infinite DataFrames   For much of Spark’s short history, Spark streaming has continued to evolve, to simplify writing streaming applications. Today, developers need more than just a streaming programming...

    https://www.kdnuggets.com/2016/09/7-steps-mastering-apache-spark.html

  • Apache Spark Key Terms, Explained

    ...nteractively. Spark powers a stack of libraries including SQL, DataFrames, and Datasets, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. You can combine these libraries seamlessly in the same application. As well, Spark runs on a laptop, Apache Hadoop, Apache Mesos,...

    https://www.kdnuggets.com/2016/06/spark-key-terms-explained.html

  • Apache Spark Introduction for Beginners (Silver Blog)

    ...tion “How to get over limitations of Hadoop MapReduce?” is APACHE SPARK. Don’t interpret that Spark and Hadoop are Competitors. Evolution with Apache Spark Spark was presented by Apache Software Foundation for accelerating the Hadoop computational registering programming process and overcoming its...

    https://www.kdnuggets.com/2018/10/apache-spark-introduction-beginners.html

  • Spark SQL for Real-Time Analytics

    …on all the partitions in parallel. RDDs are resilient: if a partition is lost due to a node crash, it can be reconstructed from the original sources. Spark Streaming provides an abstraction called DStream (discrete streams) which is a continuous stream of data. DStreams are created from input data…

    https://www.kdnuggets.com/2015/09/spark-sql-real-time-analytics.html

  • Spark 2.0 Preview Now on Databricks Community Edition: Easier, Faster, Smarter

    ....” To give you a teaser, we have measured the amount of time (in nanoseconds) it would take to process a row on one core for some of the operators in Spark 1.6 vs. Spark 2.0, and the table below is a comparison that demonstrates the power of the new Tungsten engine. Spark 1.6 includes expression...

    https://www.kdnuggets.com/2016/05/spark-2-preview-databricks-community-edition.html

  • Spark 2015 Year In Review

    ...data science, including DataFrames, Machine Learning Pipelines, and R support. 2. Platform APIs 3. Project Tungsten and Performance Optimizations 4. Spark Streaming With this rapid pace of development, we are also happy to see how quickly users adopt new versions. For example, the graph below...

    https://www.kdnuggets.com/2016/01/spark-2015-year-in-review.html

  • Top Spark Ecosystem Projects

    ...relational table Spark SQL - execute SQL queries written using either a basic SQL syntax or HiveQL, and read data from an existing Hive installation Spark Streaming - an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams...

    https://www.kdnuggets.com/2016/03/top-spark-ecosystem-projects.html

  • Learn how to use PySpark in under 5 minutes (Installation + Tutorial)

    ...ilder.master("local[*]").getOrCreate() num_samples = 100000000 def inside(p): x, y = random.random(), random.random() return x*x + y*y < 1 count = spark.sparkContext.parallelize(range(0, num_samples)).filter(inside).count() pi = 4 * count / num_samples print(pi) The output should be: Please note...

    https://www.kdnuggets.com/2019/08/learn-pyspark-installation-tutorial.html

  • Introduction to Apache Spark

    ...e Apache Hive variant of SQL — called the Hive Query Language (HQL) — and it supports many sources of data, including Hive tables, Parquet, and JSON. Spark Streaming : Spark Streaming is a Spark component that enables processing of live streams of data. Examples of data streams include logfiles...

    https://www.kdnuggets.com/2018/07/introduction-apache-spark.html

  • Exclusive Interview: Matei Zaharia, creator of Apache Spark, on Spark, Hadoop, Flink, and Big Data in 2020

    ...eaching a broader audience than Hadoop users. Most of the development activity in Apache Spark is now in the built-in libraries, including Spark SQL, Spark Streaming, MLlib and GraphX. Out of these, the most popular are Spark Streaming and Spark SQL: about 50-60% of users use each of them...

    https://www.kdnuggets.com/2015/05/interview-matei-zaharia-creator-apache-spark.html

  • Fast Big Data: Apache Flink vs Apache Spark for Streaming Data

    …new technology and innovation area along with technical writing. His main focuses are on web architecture, web technologies, java/j2ee, Open source, big data and semantic technologies. Related Apache Flink and the case for stream processing Exclusive Interview: Matei Zaharia, creator of Apache…

    https://www.kdnuggets.com/2015/11/fast-big-data-apache-flink-spark-streaming.html

  • A Community Event for Innovative Spark Apps: A Datapalooza Dispatch

    ...lities of Spark to find vehicles described in AMBER Alert reports in car traffic video feeds. Live video feed from traffic cameras are ported through Spark Streaming into a Spark Cluster. To extract the images of the individual cars, the live feed is processed through SIFT from OpenCV, which...

    https://www.kdnuggets.com/2015/11/datapalooza-dispatch-kobielus.html

  • Spark and the Remorseless Recrystallization of the Open Source Analytics Ecosystem

    …rs. None of these open-source projects is carved in stone. And even if any of them were, stone is not immune to erosion. Fast streams can etch deep channels in seemingly solid substrates. Original. Related: Hadoop and Big Data: The Top 6 Questions Answered Introduction to Spark with Python Spark +…

    https://www.kdnuggets.com/2016/01/spark-crystallization-open-source-analytics-ecosystem.html

  • Spark SQL for Real Time Analytics – Part Two

    …ta in a DStream to RDD Standard RDD Operations : map, countByValue, reduce, join Stateful operations: window, countByValueAndWindow Window Operations Spark Streaming provides windowed computations, which allows applying transformations over a sliding window of data. This figure illustrates this…

    https://www.kdnuggets.com/2015/09/spark-sql-real-time-analytics-part2.html

  • Spark Summit 2015 San Francisco – Day 2 Keynote Highlights

    ...e Spark Community from various parts of the world. The conference was three-day long (July 15-17, 2015). Leading production users of Spark, SparkSQL, Spark Streaming and other relevant technologies discussed project development and use of Spark Stack in variety of verticals and applications. On...

    https://www.kdnuggets.com/2015/06/spark-summit-2015-keynote-highlights-day2.html

  • Spark Summit 2015 San Francisco – Day 1 Keynote Highlights

    ...che Spark Community from various parts of the world. The three-day event (July 15-17, 2015) is still on. Leading production users of Spark, SparkSQL, Spark Streaming and other relevant technologies are discussing project development and use of Spark Stack in variety of verticals and applications....

    https://www.kdnuggets.com/2015/06/spark-summit-2015-keynote-highlights-day1.html

  • Deep Learning With Apache Spark: Part 2

    ...cial API, so is worth taking a look of it. Some of the advantages of this library compared to the ones that joins Spark with DL are: In the spirit of Spark and Spark MLlib, it provides easy-to-use APIs that enable deep learning in very few lines of code. It focuses on ease of use and integration,...

    https://www.kdnuggets.com/2018/05/deep-learning-apache-spark-part-2.html

  • Natural Language Processing Library for Apache Spark – free to use

    ...arks report a 4x speedup by just copying the data within the JVM process (and much more when using GPUs). We see the same issue when using spaCy with Spark: Spark is highly optimized for loading & transforming data, but running an NLP pipeline requires copying all the data outside the Tungsten...

    https://www.kdnuggets.com/2017/11/natural-language-processing-library-apache-spark.html

  • The top 5 Big Data courses to help you break into the industry

    ...ing tables and views with Apache Spark SQL Distributed processing Distributed Data persistence Common patterns in Apache Spark Data Processing Apache Spark Streaming introduction to DStreams Working with DataFrames and Schemes Working with Datasets in Scala Apache Spark Streaming: Processing many...

    https://www.kdnuggets.com/2016/08/simplilearn-5-big-data-courses.html

  • Practical Apache Spark in 10 Minutes

    ...public dataset with Iris classification is available here. To move forward, download the file bezdekIris.data to the working folder.   Part 5 - Streaming Spark is a powerful tool which can be applied to solve many interesting problems. Some of them have been discussed in our previous posts....

    https://www.kdnuggets.com/2019/01/practical-apache-spark-10-minutes.html

  • A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets

    ...high-level and domain specific operations, saves space, and executes at superior speeds. As we examined the lessons we learned from early releases of Spark—how to simplify Spark for developers, how to optimize and make it performant—we decided to elevate the low-level RDD APIs to a high-level...

    https://www.kdnuggets.com/2017/08/three-apache-spark-apis-rdds-dataframes-datasets.html

  • Spark Streaming Innovation Contest

    Spark Streaming Innovation Contest- Build a Spark Streaming Application and Win $10,000! Participate in Spark Streaming Contest, build a Spark streaming application and get a chance to win $10,000. StreamAnalytix announced the Spark Streaming Innovation Contest in which participating teams are...

    https://www.kdnuggets.com/2017/02/impetus-spark-streaming-innovation-contest.html

  • Top Big Data Processing Frameworks

    ...Apache Hadoop ecosystem. It can be used by systems beyond Hadoop, including Apache Spark. Here is an in-depth article on cluster and YARN basics. 2. Spark Spark is the heir apparent to the Big Data processing kingdom. Spark and Hadoop are often contrasted as an "either/or" choice, but that isn't...

    https://www.kdnuggets.com/2016/03/top-big-data-processing-frameworks.html

  • Build, Test and Run Spark Applications at No Cost with StreamAnalytix Visual Spark Studio

    ...ipelines within minutes using the intuitive drag and drop interface and a wide array of pre-built Spark operators. Working with StreamAnalytix Visual Spark Studio is extremely easy Visual Spark Studio is lightweight, less than 2GB on disk. Developers can download it onto their Windows, Mac, or a...

    https://www.kdnuggets.com/2017/10/impetus-streamanalytix-visual-spark-studio.html

  • The Big ‘Big Data’ Question: Hadoop or Spark?

    ...files in a distributed way (the file system) so it requires one provided by a third-party. For this reason many Big Data projects involve installing Spark on top of Hadoop, where Spark’s advanced analytics applications can make use of data stored using the Hadoop Distributed File System (HDFS)....

    https://www.kdnuggets.com/2015/08/big-data-question-hadoop-spark.html

  • Apache Flink: The Next Distributed Data Processing Revolution? (Silver Blog, Jul 2017)

    …k is in my opinion easier to use than the API of Apache Spark and, besides the fact that Apache Flink has a more flexible windowing system than Apache Spark, it is also much faster than Apache Spark when network attached storage (NAS) is used in the computing cluster. In terms of batch processing,…

    https://www.kdnuggets.com/2017/07/apache-flink-distributed-data-processing-revolution.html

  • A powerful new IDE to build, test, and run Apache Spark applications on your desktop for free!

    ...rameworks today. Even though Spark’s popularity has grown significantly, unavailability of Spark talent is impacting a broader and deeper adoption of Spark. Why Spark skills are not growing at the same pace With the rapidly changing big data technology landscape, Spark itself is evolving, and...

    https://www.kdnuggets.com/2018/02/impetus-visual-spark-studio.html

  • [ebook] 7 Steps for a Developer to Learn Apache Spark

    ...es Continuous Applications with Structured Streaming Machine Learning for Humans Download the eBook Sincerely, Databricks Team. Databricks: 160 Spear Street, 13th Floor, San Francisco, CA 94105 US © Databricks 2018. All rights reserved. Apache, Apache Spark, Spark and the Spark...

    https://www.kdnuggets.com/2018/04/databricks-ebook-7-steps-learn-apache-spark.html

  • MLlib: Apache Spark component for machine learning

    ...plementations of well-known and well-understood machine learning algorithms, user friendly documentation and consistent APIs, better integration with Spark SQL, Streaming, and GraphX, addressing practical machine learning pipelines. If only a fraction of these areas come to fruition, the future of...

    https://www.kdnuggets.com/2014/07/mllib-apache-spark-component-machine-learning.html

  • Rapidly Build and Run Apache Spark Applications in the Cloud with StreamAnalytix on AWS Marketplace

    ...isual development environment Build, train, calibrate, deploy, and monitor machine learning models on batch and real-time data Built-in operators for Spark MLlib, Spark ML, PMML, H2O, and TensorFlow Introduce custom logic in the language of choice, including Java, Python, Scala, and SQL Built-in...

    https://www.kdnuggets.com/2019/03/impetus-apache-spark-applications.html

  • Introduction to Big Data with Apache Spark

    …– Machine learning library built on the top of Spark and supports many complex machine learning algorithms which runs 100x faster than map-reduce. 3) Spark Streaming – Supports analytical and interactive applications built on live streaming data. 4) Shark (SQL) – Used for querying structured data….

    https://www.kdnuggets.com/2015/06/introduction-big-data-apache-spark.html

  • Interview: Beth Smith, General Manager of the IBM Analytics Platform business, on Analytics, Hadoop, Spark

    ...y clients around the world, reinforces the fact that ODP addresses an unmet need and broader purpose for Hadoop. Q5. You mentioned the rise of Apache Spark. Given Spark is so new, why do you think Spark has gained so much interest? Rate of adoption of technology is always driven by need and ease of...

    https://www.kdnuggets.com/2015/06/interview-beth-smith-ibm-analytics-hadoop-spark.html

  • Big Data Developer Conference, Santa Clara: Day 3 Highlights

    ...ions between nodes, name servers and applications.   Mohammed Guller, Chief Architect, Glassbeam gave a hands-on training session on Spark Core, Spark Streaming, Scala, Data Frames, etc. He asked the audience to start with learning important ideas hidden behind the technology. He gave a quick...

    https://www.kdnuggets.com/2015/04/big-data-developer-conference-highlights-day3.html

  • Apache Spark, the hot new trend in Big Data

    ...nerality or platform compatibility in both directions meaning it integrates nicely with SQL engines (Shark), Machine Learning (MLlib), and streaming (Spark Streaming) without requiring new software installed on the cluster using Hadoop’s new YARN cluster manager.   At Alpine, we have made it...

    https://www.kdnuggets.com/2014/04/apache-spark-hot-new-trend-big-data.html

  • BigData TechCon San Francisco Report: Focus on Spark

    ...e was reasonably mature and could be used for processing of up to 100 nodes and 1 TB data without any problems. The newer modules in the Spark family (SparkStreaming, SparkSQL, MLLib, ...) can be used and are well along but may become easier to use over the next 6 months. On the first day, Sameer...

    https://www.kdnuggets.com/2014/11/bigdata-techcon-san-francisco-report-focus-spark.html

  • Big Data Bootcamp, Austin: Day 1 Highlights

    ...first class citizen for data driven companies. This data is continuously processed and transformed to derive new data feeds. He briefly described how Spark Streaming and Apache Kafka work. He concluded by talking about resource management tools YARN and Mesos. Srini also gave hands-on tutorial on...

    https://www.kdnuggets.com/2015/04/big-data-bootcamp-austin-highlights-day1.html

  • A Vision for Making Deep Learning Simple (Silver Blog, Sep 2017)

    ...and predicts what objects are in the images that we just loaded. This prediction, of course, is done in parallel with all the benefits that come with Spark: from sparkdl import readImages, DeepImagePredictor predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels",...

    https://www.kdnuggets.com/2017/09/databricks-vision-making-deep-learning-simple.html

  • Spark Summit – Explore the future of Data Science and Machine Learning, San Francisco, June 5-7 – KDnuggets Offer

    ...Spark Summit, 160 Spear Street, Floor 13, San Francisco, CA 94105, USA. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation with, and does...

    https://www.kdnuggets.com/2017/04/spark-summit-san-francisco-june.html

  • YARN is All the Rage at Hadoop Summit 2014

    ...a separate processing framework for each stage, but the demo showed how to leverage the versatility of the Spark runtime to combine Shark, MLlib, and Spark Streaming and the same time perform all of the processing by a single, small program. This arrangement allows us to reuse code and memory...

    https://www.kdnuggets.com/2014/06/yarn-all-rage-hadoop-summit.html

  • Top Data Science Courses on Udemy

    ...Neighbor Collaborative Filtering Decision Trees & Random Forests Ensemble Learning Tools you will learn Python machine learning libraries Apache Spark and its MLLib package Overview Data Science and Machine Learning with Python is a comprehensive walk-through of how to use Python to analyzing...

    https://www.kdnuggets.com/2016/04/top-data-science-courses-udemy.html

  • Spark with Scala – ACM Professional Development Seminar, Santa Clara, Aug 5

    ...for data analysis. We will cover the latest Spark version 2. The course and labs cover: Scala Primer (if needed, optional) Spark ecosystem Installing Spark Spark shell for interactive data analysis Spark Data models : RDDs / Dataframes / Dataset Spark streaming Labs will cover: text data...

    https://www.kdnuggets.com/2017/06/acm-spark-scala-professional-development-seminar.html

  • Spark Summit Europe – Big Ideas About Big Data- KDnuggets Offer

    ...Sign up for our mailing list to be notified when registration opens for other upcoming events like Spark Summit East in February 2018. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation with, and does...

    https://www.kdnuggets.com/2017/08/spark-summit-big-data-europe-dublin-october.html

  • GBDC: Real-Time Big Data Developer (focus on Spark, Storm, Flink, Kafka), Santa Clara, Apr 23-24

    ...ions, Use Cases and hands-on sessions. On day one, we will introduce Spark Core Concepts, Scala, SBT & Labs. Day two will consist of Spark SQL, Spark Streaming, Machine Learning, Spark MLlib, Data Frames, Advanced Spark & hands-on sessions. Also, Technical talks on Storm, Flink, Kafka...

    https://www.kdnuggets.com/2015/03/gbdc-real-time-big-data-developer-santa-clara-april.html

  • Strata + Hadoop World 2015 Singapore – Day 2 Highlights

    …ses). Eric talked about how we can architect pipeline with kafka, spark and memSQL. Here’s how it works: Data from streams is pushed to Apache Kafka. Spark Streaming ingests event data from Apache Kafka, then it will be filtered by event type and enriched each event with time and geo-location data….

    https://www.kdnuggets.com/2015/12/strata-hadoop-2015-singapore-highlights-day2.html

  • Ranking Popular Distributed Computing Packages for Data Science

    ...-time data feeds at scale. Similar to Apache Spark, Apache Flink (8) is also a framework capable of both batch and stream processing. However, Apache Spark bills itself as a batch-processor that can handle streaming, while Apache Flink is suited for heavy stream processing with some batch tasks....

    https://www.kdnuggets.com/2018/03/top-distributed-computing-packages-data-science.html

  • Spark – The Definitive Guide – exclusive preview

    ...g with TensorFrames Download the sneak peek of Spark: The Definitive Guide from Databricks to learn more. Sincerely, The Databricks Team, 160 Spear Street, 13th Floor, San Francisco, CA 94105 USA © Databricks 2017. All rights reserved. Apache, Apache Spark, Spark and the Spark logo...

    https://www.kdnuggets.com/2017/09/databricks-spark-definitive-guide-preview.html

  • Unlock the Power of Spark with IBM Watson and Twitter

    ...o enrich your data with new insights and build more powerful analytics. To show what's possible, I created a simple open-source application that uses Spark Streaming to create a feed of live tweets and enrich the data with emotion/tone scores from the Watson Tone Analyzer service. Read how it...

    https://www.kdnuggets.com/2015/10/ibm-watson-spark-twitter.html

  • The Big Data Ecosystem is Too Damn Big (2016 Silver Blog)

    ...umber of components, fragmentation can be rampant. In the Spark world, you can use Resilient Distributed Datasets (RDDs), DataFrames or Datasets. And Spark developers can use the new Spark Structured Streams for data in motion. But what about Kafka Streams? Those are shiny and new too. To Code or...

    https://www.kdnuggets.com/2016/06/big-data-ecosystem-too-damn-big.html
