Patterns for Streaming Realtime Analytics
Design patterns are well known for solving recurring problems in software engineering. Along similar lines, we can define patterns for streaming realtime analytics and avoid reinventing the wheel. Here are the major patterns we have identified.
By Srinath Perera (Director, Research, WSO2 Inc.)
Stream processing technologies like Apache Samza and Apache Storm have received much attention under the theme of large-scale streaming analytics. However, these tools force every programmer to design and implement real-time analytics processing from first principles.
Realtime Streaming Analytics Patterns
Before looking at the patterns, let’s first agree on the terminology. Realtime Streaming Analytics accepts input as a set of streams, where each stream consists of many events ordered in time. Each event has many attributes, but all events in the same stream have the same set of attributes, or schema.
Pattern 1: Preprocessing
Preprocessing is often done as a projection from one data stream to another or through filtering. Potential operations include:
- Filtering and removing some events
- Reshaping a stream by removing, renaming, or adding new attributes to a stream
- Splitting and combining attributes in a stream
- Transforming attributes
For example, from a Twitter data stream, we might choose to extract the fields author, timestamp, and location, and then filter the events based on the location of the author.
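The projection and filtering described above can be sketched in plain Python. This is only an illustration, not the API of any particular stream processor: the function name `preprocess`, the dictionary-based event format, and the sample data are all assumptions made for this example.

```python
from typing import Any, Dict, Iterable, Iterator, Set

Event = Dict[str, Any]  # hypothetical event format: one dict per event


def preprocess(tweets: Iterable[Event], allowed_locations: Set[str]) -> Iterator[Event]:
    """Project each event onto three fields, then filter by author location."""
    for tweet in tweets:
        # Projection: keep only the attributes we care about.
        projected = {
            "author": tweet.get("author"),
            "timestamp": tweet.get("timestamp"),
            "location": tweet.get("location"),
        }
        # Filtering: drop events whose location is not of interest.
        if projected["location"] in allowed_locations:
            yield projected


# Made-up sample events, in time order.
sample_tweets = [
    {"author": "alice", "timestamp": 1, "location": "London", "text": "hi"},
    {"author": "bob", "timestamp": 2, "location": "Paris", "text": "salut"},
    {"author": "carol", "timestamp": 3, "location": "London", "text": "hello"},
]

london_events = list(preprocess(sample_tweets, {"London"}))
```

Because `preprocess` is a generator, it handles one event at a time and never needs the whole stream in memory, which is the property a real streaming engine would also rely on.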
Pattern 2: Alerts and Thresholds
This pattern detects a condition and generates an alert when it holds (e.g., an alarm on high temperature). These alerts can be based on a simple value or on more complex conditions such as a rate of increase.
For example, in the TFL (Transport for London) demo video, which is based on transit data from London, we trigger a speed alert when a bus exceeds a given speed limit.
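A simple threshold alert of this kind might look like the following sketch. The event fields (`bus_id`, `timestamp`, `speed`), the function name `speed_alerts`, and the sample readings are hypothetical; they are not taken from the TFL demo itself.

```python
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]  # hypothetical event format: one dict per reading


def speed_alerts(readings: Iterable[Event], speed_limit: float) -> Iterator[Event]:
    """Emit one alert event for every reading whose speed exceeds the limit."""
    for reading in readings:
        if reading["speed"] > speed_limit:
            yield {
                "bus_id": reading["bus_id"],
                "timestamp": reading["timestamp"],
                "speed": reading["speed"],
                "limit": speed_limit,
            }


# Made-up sample readings, in time order.
sample_readings = [
    {"bus_id": "B1", "timestamp": 1, "speed": 28.0},
    {"bus_id": "B2", "timestamp": 2, "speed": 45.5},
    {"bus_id": "B1", "timestamp": 3, "speed": 52.0},
]

alerts = list(speed_alerts(sample_readings, speed_limit=40.0))
```

A condition based on rate of increase would follow the same shape, but would need to remember the previous reading per bus in order to compare consecutive values.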
Pattern 3: Simple Counting and Counting with Windows
Pattern 4: Joining Event Streams
Pattern 5: Data Correlation, Missing Events, and Erroneous Data
Pattern 6: Interacting with Databases
Pattern 7: Detecting Temporal Event Sequence Patterns
Pattern 8: Tracking (track something over space or time)
Pattern 9: Detecting Trends (rise, fall, turn, triple bottom)
Pattern 10: Running the same Query in Batch and Realtime Pipelines
Pattern 11: Detecting and Switching to Detailed Analysis
Pattern 12: Using a Machine Learning Model
Pattern 13: Online Control
The full post is at https://iwringer.wordpress.com/2015/08/03/patterns-for-streaming-realtime-analytics/
Bio: Srinath Perera is a scientist, software architect, and programmer who works on distributed systems.