Interview: Dave McCrory, Basho on Why Data Gravity Cannot be Ignored in Architecture Design

We discuss data gravity and its implications, Riak Enterprise 2.0, Riak CS 1.5, the competitive landscape, challenges and more.

Dave McCrory is Chief Technology Officer at Basho Technologies. Dave most recently served as SVP of engineering at Warner Music Group, where he led over 100 engineers building the company’s new Digital Services Platform, based on an open source enterprise platform as a service. His extensive experience in the cloud and virtualization industry includes positions as a senior architect on Cloud Foundry at VMware and as a cloud architect at Dell.

Earlier in his career, he founded two companies that had successful exits: Hyper9 (acquired by SolarWinds) and Surgient (acquired by Quest Software). Dave is well known for inventing the concept and coining the term “Data Gravity,” which states that as data accumulates, there is a greater likelihood that additional services and applications will be attracted to this data and add to it.


Here is the second and final part of my interview with him:

Anmol Rajpurohit: Q6. What do you mean by "data gravity"? What role does it play in designing data management architecture for Enterprise IT?

Dave McCrory: Data gravity describes the effect that as data accumulates, there is a greater likelihood that additional services and applications will be attracted to this data, much as gravity acts on objects around a planet. As mass and density increase, so does the strength of the gravitational pull, and as things get closer to the mass, they accelerate towards it at increasing velocity. Although services and applications have their own gravity, data is the most massive and dense, meaning it has the most gravity. If data becomes large enough, it can become virtually impossible to move. Usually, as services and applications interact with data, they cause even more rapid growth of the data itself, creating a continuous cycle of data growth.
When designing data management architectures, it’s important to take data gravity into consideration. It’s easy to get data into your services and applications; however, getting it out can be difficult and expensive. Whether you’re creating a single-user application or deploying a company-wide project, you need to consider the implications of data gravity. The stronger the data gravity involved, the more cautious you should be when choosing or designing your data storage solution and deciding where to deploy it (locally or in the cloud).

AR: Q7. What are your favorite enhancements in Riak Enterprise 2.0 and Riak CS 1.5?

DM: In Riak Enterprise 2.0, my favorite enhancement is the redesigned Riak Search, which is integrated with Apache Solr. It enables integration with a wider variety of existing software through client query APIs. In Riak CS 1.5, I really like the improved Amazon S3 compatibility. Our expanded S3 storage API compatibility includes multi-object delete, PUT Object copy and cache control headers, which provide more flexible integration with content delivery networks (CDNs).

AR: Q8. How do you distinguish Riak from its increasing competition? Can you share any client use cases that were particularly interesting to you?

DM: Riak is easier to use and scales better than our competition. When companies are switching from an RDBMS to a distributed system or NoSQL database, they want a simple, scalable and easy-to-use solution. Riak offers extremely easy operations and great scalability.

A use case I find particularly interesting is Tapjoy, a mobile advertising and monetization platform. Tapjoy uses Riak to manage more than 250,000 operations per second without having to employ additional engineering staff. Managing that amount of data with one of our competitors’ solutions wouldn’t be possible with the staff Tapjoy has. Riak helps them keep costs down and reduce complexity while still guaranteeing performance and uptime.

AR: Q9. What are the most underrated challenges of working with Distributed Storage?

DM: The biggest challenge with distributed storage, and with all storage at this point, is dealing with the overwhelming amount of data that enterprises are generating while maintaining ease of operations along with performance and scalability. Also, networks are unreliable. Distributed systems architects always have to be ready to address and solve any network problems quickly and gracefully. It can get pretty stressful; I don’t think many people understand unless they’ve experienced it themselves.

AR: Q10. What is the best advice you have received in your career?

DM: Always hire people who are better/smarter than you are.

AR: Q11. If you were a fresher starting your career journey today, how would you shape your career?

DM: I would focus on learning to code, statistics and probability, presentation skills and building a social media presence.

AR: Q12. On a personal note, what book (or article) did you read recently and would strongly recommend? What keeps you busy when you are away from work?

DM: The Metaform – The Platform of Everything by my friend Jonathan Murray (@Adamalthus)

Outside of work, I enjoy spending time with my family and traveling. I also enjoy working on building a model for Data Gravity and studying Information Theory, as well as Asian movies (subtitled), anime, racing cars, and fine whiskey and bourbon.