Three Ways Big Data and Machine Learning Reinvent Online Video Experience

With traditional TV viewing on the decline, we discuss several ways Big Data and Machine Learning can assist with online video, including redefining user recommendations, improving video buffering and leveraging MAM orchestration.

By Yana Yelina


Let’s face it: traditional TV is fading. Viewing habits have changed completely, with audiences now favoring online video. In this competitive market, where big players like Netflix and Hulu are racing for the most eyeballs, it can be difficult to keep audiences tuned to your video content.

According to NewVantage Partners, big data and machine learning (ML) deliver true value to enterprises. Combining these technologies lets providers gain advanced customer intelligence, automate mission-critical workflows, and, in turn, significantly improve the viewer experience.

A number of online video providers have already started reaping these benefits. Let’s take a dive into some success stories.

Betting on Big Data to Refine Recommendations

Viewer interests and behavior tend to change rapidly, and it might be quite difficult to predict what content they will acquire and enjoy next. To address that, businesses need to leverage huge amounts of value-rich data for thorough analysis.

For example, Netflix sits on a virtual goldmine of data. The OTT giant feeds many data sources into the machine learning algorithms behind its recommendation engine.

Netflix has billions of member ratings and receives several million daily stream plays — with info about duration, time of day, and device type. The company also analyzes social data, movie metadata (actors, director, genre, parental rating, and reviews), films’ popularity, queue items, demographics, location, language, and much more.

To make good use of this abundance of data, Netflix applies all sorts of ML approaches, including clustering algorithms, linear and logistic regression, Markov chains, and association rules.
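To make one of these techniques concrete, here is a minimal sketch of a first-order Markov chain that suggests what a viewer might watch next. It is an illustration only, not Netflix’s implementation: the function names, the title identifiers, and the sample viewing histories are all hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(sessions):
    """Count title-to-title transitions and normalize them into probabilities."""
    counts = defaultdict(Counter)
    for history in sessions:
        for current, nxt in zip(history, history[1:]):
            counts[current][nxt] += 1
    return {
        title: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for title, followers in counts.items()
    }

def recommend_next(transitions, last_watched, k=2):
    """Return up to k titles most likely to follow last_watched."""
    followers = transitions.get(last_watched, {})
    return sorted(followers, key=followers.get, reverse=True)[:k]

# Hypothetical viewing histories, one list per viewer
sessions = [
    ["drama_a", "thriller_b", "comedy_c"],
    ["drama_a", "thriller_b", "thriller_d"],
    ["comedy_c", "drama_a", "thriller_b"],
    ["drama_a", "comedy_c"],
]
transitions = fit_transitions(sessions)
print(recommend_next(transitions, "drama_a"))  # ['thriller_b', 'comedy_c']
```

A production recommender would combine many such signals (ratings, metadata, time of day, device) rather than rely on the last title alone.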

In turn, this ML-based model allows the media provider to automatically deliver personalized offerings in terms of content, payment methods, and subscription types. According to Netflix, the results are impressive, with 75% of its viewership coming directly from its recommendation engine.

The company also puts viewing data to good use by personalizing the artwork shown for each movie. What does that mean? Netflix’s ML algorithms analyze a user’s preferences, namely favorite genres and actors, to select the most relevant imagery for a film.

For instance, if your viewing history says you like comedies, the artwork of the recommended content will likely include a comedian. And if you love romantic movies, expect most recommendations to feature romantic scenes.

Netflix Figure 1



Netflix also detects your cast preferences and features a favorite actor or actress in the artwork.

Netflix Figure 2



Netflix says the steps it took to personalize artwork led to a meaningful improvement in how viewers discover new content. The video provider also plans to extend this approach beyond artwork to the other elements that describe its titles, such as synopses, metadata, and trailers.
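The artwork selection described in this section can be sketched as a simple matching problem: score each candidate image by how well its tags overlap with what the viewer has watched. This is a deliberately simplified illustration, not Netflix’s actual method; the tag names, file names, and scoring rule are assumptions made for the example.

```python
from collections import Counter

def build_profile(viewing_history):
    """Count how often each genre or actor tag appears in the titles a viewer watched."""
    profile = Counter()
    for title_tags in viewing_history:
        profile.update(title_tags)
    return profile

def pick_artwork(profile, candidates):
    """Choose the candidate image whose tags overlap most with the viewer profile."""
    def score(item):
        _, tags = item
        # Counter returns 0 for tags the viewer has never watched
        return sum(profile[tag] for tag in tags)
    return max(candidates.items(), key=score)[0]

# Hypothetical viewing history: one set of tags per watched title
history = [
    {"comedy", "actor_x"},
    {"comedy", "actor_y"},
    {"romance", "actor_x"},
]
# Hypothetical artwork variants for a single film
candidates = {
    "poster_comedian.jpg": {"comedy", "actor_y"},
    "poster_couple.jpg": {"romance"},
    "poster_star.jpg": {"drama", "actor_z"},
}
profile = build_profile(history)
print(pick_artwork(profile, candidates))  # poster_comedian.jpg
```

A real system would also need to learn from whether the chosen image actually led to a play, which this static scoring does not capture.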

Leveraging MAM Orchestration to Accelerate Video Processing

Video providers are under constant pressure to deliver content to a vast number of viewers. To do so, they run batches of media asset management (MAM) operations across dispersed systems and technologies, communications, and work orders.

MAM orchestration might be the best way to address this challenge. Empowered by ML, an MAM orchestrator can automate a wealth of operations, from media ingestion and transcoding to media processing and playout. For example, such a solution can spare you the need to manually categorize video (spotting adult content, violence, racism, objectionable language, and so on) to meet compliance regulations.
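As a rough illustration of what such orchestration automates, the sketch below chains the stages named above (ingestion, transcoding, classification, playout) into one pipeline that records progress and stops on the first failure. Everything here is hypothetical: the `Asset` class, the step functions, and the keyword check that stands in for an ML classifier are not taken from any real MAM product.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    log: list = field(default_factory=list)
    labels: set = field(default_factory=set)

def ingest(asset):
    asset.log.append("ingested")

def transcode(asset):
    asset.log.append("transcoded")

def classify(asset):
    # Stand-in for an ML classifier: a real system would inspect frames and audio
    if "fight" in asset.name:
        asset.labels.add("violence")
    asset.log.append("classified")

def playout(asset):
    asset.log.append("scheduled for playout")

PIPELINE = [ingest, transcode, classify, playout]

def run_pipeline(asset, steps=PIPELINE):
    """Run every step in order, stopping at the first failure and recording it."""
    for step in steps:
        try:
            step(asset)
        except Exception as exc:
            asset.log.append(f"{step.__name__} failed: {exc}")
            break
    return asset

asset = run_pipeline(Asset("street_fight_documentary.mp4"))
print(asset.log)
print(asset.labels)
```

Real orchestrators add retries, parallel branches, and persistence on top of this basic idea of declaring steps once and letting the engine run them.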

ML algorithms can also greatly enhance an orchestrator’s ability to flag content that is inappropriate for a particular region or country, whether for political or religious reasons or because it is too controversial. With such an automated process in place, you won’t blunder by releasing in Singapore a movie about the life of LGBT groups, since explicit promotion of a homosexual lifestyle in movies and other video content is strictly prohibited there.
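Once a classifier has labeled a video, the regional check itself can be quite simple. The sketch below assumes the labels already exist and compares them with a per-region rule table; the region names, label names, and rules are placeholders, and real restrictions would come from legal and compliance teams.

```python
# Hypothetical per-region restrictions; real rules come from legal and compliance teams
RESTRICTED_LABELS = {
    "region_a": {"graphic_violence"},
    "region_b": {"graphic_violence", "political_satire"},
}

def blocked_regions(content_labels, target_regions, rules=RESTRICTED_LABELS):
    """Map each target region to the labels that prevent release there."""
    conflicts = {}
    for region in target_regions:
        hits = content_labels & rules.get(region, set())
        if hits:
            conflicts[region] = hits
    return conflicts

# Labels assumed to have been produced earlier by an ML classifier
labels = {"political_satire", "comedy"}
print(blocked_regions(labels, ["region_a", "region_b", "region_c"]))
# {'region_b': {'political_satire'}}
```

Keeping the rules in data rather than code means compliance staff can update them without touching the orchestrator.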

Besides, you won’t have to hire additional staff to support dubbing and subtitling, manage scheduling, or specify film metadata such as genres and subgenres, country of production, distribution regions, crew, and more.

Trailer creation, a labor-intensive manual process that may take up to 30 days, is another thing an ML-enabled orchestrator could automate. By identifying and analyzing faces, video tone, and pace, it could produce a screen-ready preview, much as IBM Watson helped create a preview for the thriller “Morgan” after being trained on the structure of existing trailers.

To sum up, by handling ever-growing volumes of content in a timely manner, ML-enabled MAM orchestrators can help you reach a new level of operational efficiency and launch your solution faster and more flexibly, at a lower operating cost.

Netflix relies heavily on such an approach. Its orchestration engine, Conductor, has already helped orchestrate over 2.6 million process flows, from simple linear workflows to very complex ones that run over multiple days.

Below is a diagram of Conductor’s architecture:

Conductor Architecture



And here’s an example of a Conductor workflow:

Conductor Workflow


ML to Put an End to Video Buffering

According to Wistia, 20% of US viewers have no connection capable of streaming HD, and it’s not just people in remote or rural areas who might face this problem. Moreover, 80% of those people tend to abandon the stream if the video starts buffering.

Indeed, buffering is a major impediment to a compelling viewing experience. But offering high-quality streaming to the global audience is a real technical challenge. Is ML able to tackle this?

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) say “yes”. They’re making strides in improving the viewing experience with their AI system, Pensieve.

Pensieve is a system that uses ML (specifically, reinforcement learning) to learn an adaptive bitrate (ABR) algorithm that delivers video at the best possible resolution while avoiding buffering breaks, and that adapts to a wide range of network environments and quality-of-experience (QoE) metrics. Its ABR decisions don’t rely on pre-programmed models or assumptions about the environment. Instead, Pensieve learns from observations collected by client video players.

Video Buffering



As a result, more viewers can relish top-quality streams, even if they enter a tunnel with sketchy connectivity or are in a teeming area with thousands of other network users.

The team, headed by MIT Professor Mohammad Alizadeh, says Pensieve also generalizes well and outperforms the best state-of-the-art schemes by up to 25% in average QoE, even on networks for which it wasn’t explicitly trained.
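To give a feel for what “QoE” measures, the sketch below scores a playback session with a linear formula of the kind commonly used in ABR research: reward the bitrate of each chunk, penalize rebuffering time, and penalize abrupt quality switches. For contrast, it also includes a fixed, pre-programmed rule of the sort that learned approaches aim to beat. This is not Pensieve’s code; the function names, the bitrate ladder, and the default penalty weight are assumptions chosen for illustration.

```python
def qoe(bitrates_mbps, rebuffer_seconds, rebuffer_penalty=4.3):
    """Linear QoE: reward bitrate, penalize stalls and quality switches.

    rebuffer_penalty is an assumed weight; tune it to your own quality model.
    """
    quality = sum(bitrates_mbps)
    stalls = rebuffer_penalty * sum(rebuffer_seconds)
    switches = sum(abs(b - a) for a, b in zip(bitrates_mbps, bitrates_mbps[1:]))
    return quality - stalls - switches

def rate_based_choice(ladder_mbps, recent_throughput_mbps):
    """Fixed rule: highest bitrate not exceeding the mean of recent throughput samples."""
    estimate = sum(recent_throughput_mbps) / len(recent_throughput_mbps)
    affordable = [b for b in ladder_mbps if b <= estimate]
    return max(affordable) if affordable else min(ladder_mbps)

# Hypothetical bitrate ladder in Mbps
ladder = [0.3, 0.75, 1.2, 1.85, 2.85, 4.3]
print(rate_based_choice(ladder, [2.0, 3.0, 2.5]))  # 1.85

# A steady mid-quality session versus one that overshoots, stalls, and oscillates
steady = qoe([1.2, 1.2, 1.2, 1.2], [0, 0, 0, 0])
greedy = qoe([4.3, 0.3, 4.3, 0.3], [0, 2.0, 0, 2.0])
print(steady > greedy)  # True
```

The comparison shows why chasing the highest bitrate can hurt: stalls and quality swings cost more than the extra resolution gains.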

MIT is not alone in the war against buffering. In 2017, Netflix launched its own ML-powered algorithm, Dynamic Optimizer, which individually analyzes each video frame and compresses it as much as possible without sacrificing image quality.

What is more, this algorithm can differentiate between video types. For example, action-packed sequences get an increase in bitrate, while simpler animated content is eased back. “We can now optimize [a stream] scene by scene with an almost infinite matrix of possibilities,” says Ioannis Katsavounidis, a senior research scientist at Netflix.
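The idea of spending more bits on complex scenes and fewer on simple ones can be sketched as a proportional split of a bitrate budget. This is not Netflix’s algorithm: the complexity scores, scene names, and the minimum bitrate are hypothetical, and the sketch assumes scenes of equal length.

```python
def allocate_bitrates(scene_complexity, average_kbps, floor_kbps=200):
    """Split a bitrate budget across scenes in proportion to complexity.

    Assumes equal-length scenes, so the mean of the result stays near
    average_kbps unless the floor lifts very simple scenes.
    """
    total = sum(scene_complexity.values())
    budget = average_kbps * len(scene_complexity)
    return {
        scene: max(floor_kbps, round(budget * score / total))
        for scene, score in scene_complexity.items()
    }

# Hypothetical complexity scores between 0 and 1, e.g. from motion analysis
complexity = {"car_chase": 0.9, "dialogue": 0.4, "flat_animation": 0.2}
print(allocate_bitrates(complexity, average_kbps=1500))
# {'car_chase': 2700, 'dialogue': 1200, 'flat_animation': 600}
```

The overall average stays at the target, so the stream is no heavier for viewers on poor connections, yet the demanding scenes look better.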

The result is a seamless stream for all viewers, especially those with a poor internet connection.

Are There Any Stumbling Blocks?

Any business endeavor, especially when it’s intertwined with highly innovative technologies, presupposes certain pitfalls. And when it comes to big data and AI, these are usually data privacy and bias challenges.

  1. Information protection. When you collect large amounts of customer data, the risk of leaking sensitive information is rather high. Moreover, your potential clients may not want you to use their personal information. To address this issue, make sure your solution fully complies with relevant privacy rules, including GDPR.
  2. Bias. We’re still in the infancy of the AI age, and errors are quite common. To wit, Google Photos once labeled photos of Black people as gorillas. Indeed, there’s growing concern about bias in AI. To minimize the risks, your algorithms should be continuously trained on data that is as extensive and as diverse as possible.

Benefits Won’t Be Long in Coming

Viewing habits are constantly changing, and innovative technologies can help you succeed in satisfying customer needs. By applying custom big data solutions and AI to their business operations, big brands have already managed to improve employee efficiency, boost sales, increase customer loyalty, and enhance user experience. Are you ready to jump on the bandwagon?

Bio: Yana Yelina is a Technology Writer at Oxagile, a New York-based software company that develops custom big data solutions. Yana’s articles have been featured on ITProPortal, Dataconomy, insideBigData, Datafloq, CloudTweaks, and Business2Community, to name a few. Yana is passionate about the untapped potential of technology and explores the benefits it can bring businesses of every stripe. You can connect with Yana via LinkedIn or Twitter.