Interview: Reiner Kappenberger, HP Security Voltage on How to Secure Data-in-Motion
We discuss the security concerns in Big Data, challenges in securing Big Data locally and over cloud, and open source solutions – Knox and Ranger.
Twitter Handle: @hey_anmol
Reiner Kappenberger's background ranges from device management in the telecommunications sector to GIS and database systems. He holds a Diploma in computer science from the FH Regensburg, Germany.
Here is the second part of my interview with him:
Anmol Rajpurohit: Q5. What are the top security concerns in the Big Data ecosystem? Has anything changed here in the past five years?
Reiner Kappenberger: When we’re talking about big data in an enterprise environment, the importance of security is critical to the success of the business. Organizations have to protect sensitive customer, partner and internal information and adhere to compliance requirements.
The aggregation of data makes for an even more alluring target for hackers and data thieves. Hadoop presents brand new challenges to data risk management: the potential concentration of vast amounts of sensitive corporate and personal data in a low-trust environment.
Hadoop has had a reputation for being difficult to secure, for a number of reasons. While it is an amazing technology innovation, it also raises security concerns. One of the great things about a Hadoop deployment is that it combines data from multiple enterprise systems in real time, but this creates new, previously unknown protection requirements. As these multiple types of data come together in the Hadoop “data lake,” the data is accessed by many different users with varying analytic needs.
The major Hadoop distributions have recognized that there is a greater need for security in the Big Data space, as it creates a centralized repository of all data inside a company. The traditional approach taken by the distributions focuses on authentication, authorization and data-at-rest protection. While those mechanisms are vital steps toward data protection, they offer only limited protection against today's attacks. What more people are focusing on today is protecting the data itself, so that it is protected at rest, in motion and in use. This data-centric protection approach enables customers to perform regular business operations while assuring them that, even should the unthinkable happen, the data is unusable to an attacker.
AR: Q6. What are the major challenges in securing a Big Data implementation? How does a Cloud-based implementation impact Security?
RK: Organizations face the risk of even further reduced control if Hadoop or other big data clusters are deployed in a cloud environment.
One of the major challenges is that organizations need to be aware that data-at-rest protection does not secure data in motion or in use, leaving the potential for major compliance and exploitable security gaps. An organization's security posture has to include protection for data in motion and in use in analytics.
It’s really important to de-identify data as close to its source as possible, never allowing sensitive information to reach the cloud in its live and vulnerable form. Encryption, tokenization and data masking are key to protecting that data.
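Data masking is the simplest of the three techniques to illustrate. The sketch below is not HP Security Voltage's implementation; it is just a minimal, stdlib-only example of irreversibly masking a card number near the source so that only the last four digits ever leave the trusted environment.

```python
import re

def mask_pan(pan: str) -> str:
    """Irreversibly mask a primary account number, keeping only the
    last four digits visible (a common masking convention)."""
    digits = re.sub(r"\D", "", pan)       # strip spaces, dashes, etc.
    return "*" * (len(digits) - 4) + digits[-4:]

# Example: mask_pan("4111 1111 1111 1111") yields "************1111"
```

Because the mapping is one-way, masked values are suitable for display and analytics that only need partial identifiers; reversible protection (encryption or tokenization) is needed when authorized re-identification is required.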
Access is also an issue – organizations need to securely re-identify select data fields for live data access but only by authorized users and applications when required for business needs.
When moving a Big Data environment into the cloud there are additional challenges, as the company is handing over to a third party not only the data but also all of the management controls associated with it.
If the data contains sensitive information, this becomes even more complex, as businesses need audits covering the third party, which might not always be possible.
To solve those challenges, the best approach is to de-identify the information with either format-preserving encryption or tokenization before it is transmitted to the third party. This way the data itself is protected immediately, and there is no association with the real information. Even when data being processed by the third party is logged by an application, no information actually leaks into the wrong hands. The data is protected completely, and the scope of any audits can be dramatically reduced.
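The vault-style tokenization idea can be sketched in a few lines. This is not HP Security Voltage's product (which uses stateless format-preserving encryption and tokenization), just an illustrative stdlib-only toy: sensitive values are replaced with random tokens that preserve the original format, and the vault mapping allows authorized re-identification later. A production system would persist the vault securely and gate every detokenize call behind access control.

```python
import secrets
import string

class TokenVault:
    """Toy vault-based tokenizer: replaces sensitive values with random,
    format-preserving tokens and keeps the mapping so that authorized
    callers can re-identify them. Illustrative only."""

    def __init__(self):
        self._forward = {}   # real value -> token
        self._reverse = {}   # token -> real value

    def tokenize(self, value: str) -> str:
        if value in self._forward:           # stable token per value
            return self._forward[value]
        while True:
            # Preserve format: digits map to digits, letters to letters,
            # separators pass through unchanged.
            token = "".join(
                secrets.choice(string.digits) if c.isdigit()
                else secrets.choice(string.ascii_uppercase) if c.isalpha()
                else c
                for c in value
            )
            if token not in self._reverse and token != value:
                break
        self._forward[value] = token
        self._reverse[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # In a real deployment this call would require authorization.
        return self._reverse[token]

vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")
# token has the same length and dash positions as the original,
# so downstream systems and logs never see the real value.
```

Because the token carries no mathematical relationship to the original value, anything the third party logs or stores is useless to an attacker, which is exactly how the audit scope gets reduced.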
AR: Q7. What were the key factors behind the rapid growth and success of HP Security Voltage in the last few years?
RK: We continue to listen to our customers and the market, and innovate and deliver easy-to-use, standards-based data-centric encryption technologies. As a result, HP Security Voltage solutions are in use at almost 1,000 enterprise customers, including some of the world's leading brand-name companies in payments, banking, retail, insurance, energy, healthcare and government, and six of the top eight U.S. payment processors.
Another key factor is the wide platform support that HP Security Voltage provides. Taking a broader enterprise approach for security is vital for any business and HP Security Voltage’s ability to support almost any current and legacy platform and OS in their environment is critical for the adoption of a data-centric security approach.
AR: Q8. What are your thoughts on the Open Source data security solutions such as Knox and Ranger?
RK: Knox and Ranger are important aspects of an overall security approach. They help customers understand who is doing what in their environment and ensure that access controls are present at all times. They also give businesses the ability to track what is being done, which is important for efficient operation and for performing mandatory audits.
As a base for a security approach, Knox and Ranger are vital, and they should be combined with additional measures such as a data-centric security practice. All these elements work hand in hand to protect sensitive data in Hadoop installations with industry standards-based technologies, and to provide visibility into how data is being used for better threat detection.
Anmol Rajpurohit is a software development intern at Salesforce. He is a MDP Fellow and graduate mentor at UCI-Calit2. He has presented his research work at various conferences including IEEE Big Data 2013. He is currently a graduate student (MS, Computer Science) at UC, Irvine.