Datapalooza: Produce Your Data Application Development Concert, Nov 10-12, San Francisco

Datapalooza will enable you to take your data-science skills to the next level. You’ll gain hands-on experience, enjoy one-on-one coaching, and learn how to build a practical data-science product in just three days - Nov 10-12 in San Francisco.

Data science is a tightly orchestrated industrial process in the modern economy.

That’s not how it’s always been. Some old-school data scientists still consider themselves soloists. In other words, they fancy themselves skilled artisans who apply creativity, attention, and loving care to the weaving of intricate statistical tapestries.

However, in the 21st century, that's not what pays the bills. Data scientists increasingly hold down time-sensitive operational responsibilities in many organizations, such as running real-world experiments, also known as “A/B testing,” to target offers across diverse customer segments, interaction channels, and other use cases.

This new reality demands that working data scientists evolve into something akin to an industrial operation: a statistical modeling production line. Accelerating, automating, and scaling the data-science workflow is critical in this regard. The key processes that need to be industrialized are data discovery, profiling, sampling, and preparation, as well as model building, scoring, and deployment. Automation can also help control the cost of developing, scoring, validating, and deploying a growing number and variety of models against ever-expanding big-data collections.
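To make the production-line idea concrete, here is a minimal sketch in Python of a few of those stages (profiling, sampling, preparation, model building, and scoring) chained into one repeatable run. It uses only the standard library and a toy one-variable linear model. Every function name here (profile, sample, prepare, build_model, score, run_pipeline) is a hypothetical illustration, not the API of any particular tool, and a real workflow would add discovery, validation, and deployment steps.

```python
import random
import statistics


def profile(rows):
    """Profiling: summarize each numeric column (count, mean, stdev)."""
    columns = list(zip(*rows))
    return [
        {
            "count": len(col),
            "mean": statistics.mean(col),
            "stdev": statistics.pstdev(col),
        }
        for col in columns
    ]


def sample(rows, fraction, seed=0):
    """Sampling: draw a reproducible random subset of the rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def prepare(rows):
    """Preparation: split (x, y) rows into feature and target lists."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return xs, ys


def build_model(xs, ys):
    """Model building: fit y = slope * x + intercept by least squares."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score(model, xs):
    """Scoring: apply the fitted model to new feature values."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def run_pipeline(rows, fraction=0.5, seed=0):
    """Run every stage in order so the whole workflow is one call."""
    summary = profile(rows)
    subset = sample(rows, fraction, seed)
    xs, ys = prepare(subset)
    model = build_model(xs, ys)
    return summary, model


if __name__ == "__main__":
    # Toy data (an assumption for illustration): y = 2x + 1 exactly.
    rows = [(float(x), 2.0 * x + 1.0) for x in range(20)]
    summary, model = run_pipeline(rows)
    print("profile:", summary)
    print("model (slope, intercept):", model)
    print("score for x=10:", score(model, [10.0]))
```

Because each stage is a plain function with a fixed seed for sampling, the same run can be repeated, scheduled, or scaled out without manual handoffs, which is the point of treating the workflow as a production line.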

Data scientists will be swamped with unmanageable workloads if they don't begin to offload many seemingly "artisan" tasks to automated tooling. You can accelerate your modeling automation initiatives by following these steps:

  • Virtualize access to data, metadata, rules, and predictive models, as well as to data integration, data warehousing, and advanced analytic applications through a BI semantic virtualization layer;
  • Unify access, governance, orchestration, automation, and administration across these resources within a service-oriented architecture;
  • Explore commercial tools that support maximum automation of model development, scoring, deployment, and execution;
  • Consolidate, accelerate, and deepen predictive analytics through integration into big-data platforms with scalable in-database execution; and
  • Migrate existing analytical data marts into multidomain big-data platforms with unified data, metadata, and model governance within a service-oriented virtualization framework.

Rest assured, data scientists: automating many of your tasks will not put you out of work. Automating the modeling process will boost your productivity by an order of magnitude, freeing you from drudgery so you can focus on the sorts of exploration, modeling, and visualization challenges that demand expert human judgment.

Want to accelerate your delivery of data-science models, applications, and outcomes?

Get your ticket here for the first Datapalooza, which will take place next week, November 10-12, at Galvanize in San Francisco.

Sponsored by the Spark Technology Center, Datapalooza gives you three days of hands-on experience and one-on-one coaching as you build a practical data-science product. In doing so, you’ll be addressing real-world data-science challenges that require creative pattern thinking, machine learning, cognitive computing, natural language processing, and stream computing.

You should also explore this informational IBM Analytics resource page on Spark.

James Kobielus is an industry veteran and serves as IBM Big Data Evangelist; Senior Program Director for Product Marketing in Big Data Analytics; and Team Lead, Technical Marketing, IBM Big Data & Analytics Hub. He spearheads IBM's thought leadership activities in Big Data, Hadoop, enterprise data warehousing, advanced analytics, business intelligence, and data management.