Get Out of the Sandbox – Put Your Models in Production, Aug 10 Webinar

Learn how to deploy your data science work in production, in both batch and real-time environments, where people and programs can use it simply and confidently.

By Dataiku. Sponsored Post.

Imagine that you and your team spent months putting together a powerful analytics or data science model with great insights, but nobody ever used the dashboards or automated decisions built on it. It sounds demoralizing.

Getting the most impact from all your work means deploying these prototypes in production, which means putting them in an environment where people and programs can use them simply and confidently.

Production and Its Discontents

The reality is that everyone has his or her own definition of production. But if we were forced to put our heads together to propose a single definition for a production environment, it would be something like this:

"An environment that is runnable whenever necessary and able to serve external consumers for their day-to-day decisions, whether those consumers are humans or software."

This is distinct from what we call a design (or development) environment, which can crash without having any significant impact on the business. The design environment is also often referred to as a sandbox, because while it offers a place to experiment, it can't help or hurt anything else.

How can you move from a sandbox to production and ensure that everything goes smoothly?

Who "Owns" Production in an Organization?

For some organizations, production is the realm of IT, while design is the realm of data science and analytics. In these organizations, the analytics team connects to data sources and builds models -- in R, SAS, SPSS, and other languages -- in the design environment and then hands these models over to IT, who will often recode the models in yet another language, such as Java or C, and then make them available for broader use. The IT department then supervises these models and the databases and applications they impact. Ultimately, the whole process can take weeks or even months.

For organizations on the other side of the spectrum, where the analytics team handles deployment, the complexity of managing and monitoring workflows in production can spread the team too thin or call for skillsets that not all teams have.

Batch and Real-Time Environments

In addition to the aforementioned differences between companies in who "owns" production, there are also different types of environments. Depending on your real-time needs, your production environment will generally fall into one of two types of architecture: batch or real-time. Let's use some examples to define them.

In an example of a batch environment, the analytics team at a chain of gasoline stations has built and deployed a model to predict the next day's raw materials requirements in each location across the country. The model will use previous sales data to automatically generate predictions of what each location will need, which can then trigger automatic purchasing and dispatch decisions, as well as flow into dashboards and reports.
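To make the batch idea concrete, here is a minimal Python sketch of what such a nightly job might look like. Everything in it is an assumption made for illustration: the function names, the data layout, and especially the "model", which is just a moving average standing in for whatever the analytics team actually built.

```python
from statistics import mean


def forecast_next_day(sales_history, window=7):
    """Hypothetical stand-in model: average of the last `window` days of sales."""
    return mean(sales_history[-window:])


def run_nightly_batch(sales_by_location, stock_by_location, window=7):
    """Score every location in one scheduled pass and derive order quantities."""
    orders = {}
    for location, history in sales_by_location.items():
        if not history:
            # No sales data yet for this location, so nothing to predict.
            continue
        predicted = forecast_next_day(history, window)
        shortfall = predicted - stock_by_location.get(location, 0.0)
        orders[location] = {
            "predicted_demand": predicted,
            # Only order when predicted demand exceeds stock on hand.
            "order_quantity": max(0.0, shortfall),
        }
    return orders
```

The point of the sketch is the shape of the work: the job runs on a schedule, processes all locations at once, and writes results that purchasing systems, dashboards, and reports can pick up later. Nobody is waiting on an individual prediction.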

In an example of a real-time environment, a potential buyer of a car applies for a loan, and the data from the loan application is fed into a credit scoring model by querying an API, generating a live assessment of the creditworthiness of the applicant. Now, the loan can be approved or declined, or a company representative can be prompted to ask for more information, all of this in real time.
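The loan example can be sketched the same way. The Python below is a simplified illustration, not a real credit model: the field names, the scoring formula, and the 0.5 approval threshold are all made up, and a real deployment would sit behind a web server rather than being called directly.

```python
import json

# Hypothetical fields the scoring model needs from the loan application.
REQUIRED_FIELDS = ("annual_income", "loan_amount")


def score_application(application):
    """Hypothetical stand-in credit model: returns a score between 0 and 1."""
    income = application["annual_income"]
    loan = application["loan_amount"]
    ratio = loan / income if income > 0 else float("inf")
    # A lower loan-to-income ratio gives a higher score, clamped to [0, 1].
    return max(0.0, min(1.0, 1.0 - ratio / 2.0))


def handle_request(body):
    """Handle one JSON request body and return a JSON response string."""
    application = json.loads(body)
    missing = [field for field in REQUIRED_FIELDS if field not in application]
    if missing:
        # Prompt the representative to ask the applicant for more information.
        return json.dumps({"decision": "need_more_information", "missing": missing})
    score = score_application(application)
    decision = "approve" if score >= 0.5 else "decline"
    return json.dumps({"decision": decision, "score": score})
```

Here the unit of work is a single request, and the caller waits for the answer. For example, `handle_request('{"annual_income": 60000}')` returns a response asking for the missing `loan_amount` instead of a decision.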

Batch and real-time production environments address different concerns and require different types of infrastructure. Your organization's infrastructure will be determined by its needs, so neither is, on its own, better than the other.

What's Next?

Clearly, what production analytics means varies significantly between businesses (and, in some cases, between different teams at the same company, each with its own needs and expectations).

On August 10, 2017, at 11 a.m. EDT, Dataiku will be hosting a webinar to discuss the Dataiku approach to production and the features that facilitate it.

Click here to sign up!