Recommendations for Big Data Success for the Long Haul

Overcome common big data challenges by identifying the right POC (proof-of-concept) use case for Hadoop and leveraging its success to build executive and stakeholder buy-in.

By Nathan Nickels (MetaScale), Aug 2014.

The process of identifying and developing a relevant business use case for Hadoop continues to stump IT managers and stall their attempts to derive new value from their data and achieve big data success. Figuring out the right proof-of-concept (POC) use case for your organization can help bring stakeholders together to build a long-term strategy for successful big data implementation.

Use case development requires education on big data technologies. Hadoop is a departure from the traditional enterprise data warehouse approach, from both an architecture and a skills standpoint. You need to gather concise materials that explain what Hadoop is, why it is important to your organization and what the benefits are. There is a wealth of information readily available. The key is to collect materials that are tailored to your organization’s specific needs and pain points.

Once your team has educated itself and the necessary executives and business stakeholders, you need to set up a Hadoop cluster if you have not done so already. Setting up a small cluster for a POC is relatively easy. It can be done on old, discarded equipment (with some local storage) or on inexpensive commodity hardware.

Next, identify a small, low-impact use case for implementation on the cluster. The application should be a good candidate for Hadoop so that the POC has a strong chance of succeeding. When selecting a use case, identify the problems you are attempting to solve. Pick something small and achievable – do not attempt to boil the ocean. Most importantly, identify pain points that can be solved with Hadoop. Cost reduction and processing constraints are the kinds of problems that Hadoop excels at solving.

Like any new technology, Hadoop is the “shiny new object” that you will want to test, explore and apply to all kinds of problems at once. Your Hadoop environment can become complex very quickly, so it is imperative that you stay focused on your initial use case. Once the POC is successful, you will have the ammunition to demonstrate Hadoop’s value to the organization and build stakeholder buy-in.

Long-term planning is essential to a successful big data implementation. You need to decide what the physical hardware architecture is going to be. You also need to decide whether the environment will be created and deployed by your team, created and deployed by other departments within your organization, or outsourced to a solution provider (such as MetaScale).

There are several factors that need to be considered when designing the architecture. For starters, set up the initial logical and physical architectures, and incorporate data layers into your design. Aim to de-normalize and persist as much data as possible to enable the creation of your enterprise data hub. You also need to formulate system development plans for functionality.
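
As a rough illustration of what de-normalizing data means in practice, the sketch below flattens two related tables into self-contained records, so that a later job can read each record without performing a join. The table contents, field names and the denormalize function are hypothetical and chosen only for this example; they are not taken from any MetaScale or Hadoop API.

```python
# Hypothetical normalized data: customers keyed by ID, orders referring to them.
customers = {
    1: {"name": "Acme Corp", "region": "Midwest"},
    2: {"name": "Globex", "region": "West"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 250.0},
    {"order_id": 101, "customer_id": 2, "amount": 75.5},
    {"order_id": 102, "customer_id": 1, "amount": 40.0},
]


def denormalize(orders, customers):
    """Copy customer attributes onto each order so no join is needed at read time."""
    flat = []
    for order in orders:
        customer = customers[order["customer_id"]]
        flat.append({
            **order,
            "customer_name": customer["name"],
            "customer_region": customer["region"],
        })
    return flat


flat_orders = denormalize(orders, customers)
for record in flat_orders:
    print(record)
```

The trade-off is deliberate: customer details are repeated on every order, which costs storage, but each record can then be processed independently, which tends to suit the batch-oriented, scan-heavy workloads Hadoop is typically used for.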

[Figure: Hadoop – a logical view]

Big data technologies such as Hadoop require a fundamental shift in your approach to data management. In order to manage risk, MetaScale recommends that organizations bring in outside help to get started. Developing a partnership with a proven solution provider allows you to focus on your core competencies of extracting value from your data while compressing the time to value of your big data initiatives.

Nathan Nickels Bio: Nathan Nickels, @BigDataMadeEasy, leads Marketing and Operations at MetaScale, a big data company of Sears Holdings Corporation. As a member of the MetaScale team, Nathan is committed to helping IT and business professionals understand how big data tools such as Hadoop and NoSQL can benefit their organizations.