
Published on KDnuggets, June 2021 (21:n22)

7 Data Security Best Practices for 2021


Here are seven data security best practices to adopt this year.



Photo by Pixabay from Pexels

 

Data security is a growing concern for every company as cybercrime continues to thrive. Data scientists work with potentially sensitive data every day, so they face more pressure than most. A data breach can leave you unable to do your job, on top of the financial and reputational losses it may cause.

IBM's 2020 Cost of a Data Breach Report puts the average cost of a breach at $3.86 million. A serious incident can also damage your reputation or even cost you your job. With that in mind, here are seven data security best practices to adopt this year.

 

1. Use Only What You Need

 
One of the best ways to secure your data is to minimize what you store. It's tempting to collect as much data as possible, especially when training machine learning models, but every extra record is one more thing an attacker can steal. You can only lose what you have, so go through your databases and delete anything that isn't necessary.

Part of this principle is keeping an up-to-date inventory of the data you hold. Once you no longer need some information, purge it from your database. Holding on to legacy data doesn't help you; it only gives you more to lose.
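As a minimal sketch of a retention purge, assuming records live in a SQL table with an ISO-8601 timestamp column, the snippet below uses Python's built-in `sqlite3` module to delete rows older than a retention window. The table name `events`, the column `collected_at`, and the 365-day window are hypothetical placeholders, not recommendations.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical retention window; take the real value from your data policy.
RETENTION_DAYS = 365


def purge_expired(conn: sqlite3.Connection, now: datetime,
                  retention_days: int = RETENTION_DAYS) -> int:
    """Delete rows older than the retention window and return how many were removed.

    Assumes a hypothetical table `events` whose `collected_at` column holds
    ISO-8601 timestamps, all written in UTC with the same format.
    """
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    # Same-format, same-timezone ISO-8601 strings sort chronologically,
    # so a plain string comparison finds the expired rows.
    cur = conn.execute("DELETE FROM events WHERE collected_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Usage with an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, collected_at TEXT, payload TEXT)")
now = datetime(2021, 6, 1, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, (now - timedelta(days=30)).isoformat(), "recent"),
        (2, (now - timedelta(days=400)).isoformat(), "stale"),
    ],
)
print(purge_expired(conn, now))  # 1 (only the 400-day-old row is deleted)
```

In practice you would run a job like this on a schedule and log what it removed, so your data inventory stays accurate.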

 

2. Mask Sensitive Data

 
The “use only what you need” principle applies to the type of data you store, too. Many data science operations don’t require user-specific information, so don’t store it if you don’t need it. If you must use sensitive data like personal identifiers, you should mask it.

One common way to mask sensitive data is substitution, which replaces real values with realistic but fictitious ones. Tokenization, which swaps real values for random tokens, is another option and is generally safer, because the original values are stored separately in a secured database (often called a token vault) rather than alongside the tokens. Whichever method you use, first scrub your data of any sensitive information you don't need.
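The tokenization approach can be sketched in a few lines of Python. This is a minimal, in-memory illustration assuming string values such as email addresses; the `TokenVault` class and the `tok_` prefix are hypothetical, and a production vault would sit in a separate, access-controlled datastore rather than in the same process as the masked data.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault, for illustration only.

    A real vault would live in a separate, access-controlled database,
    away from the tokenized data it protects.
    """

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Return the existing token so the same input always maps to the
        # same token; joins and group-bys on the masked column still work.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # Random tokens carry no information about the original value.
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # guard against a rare collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only code with access to the vault can recover the original value.
        return self._token_to_value[token]


vault = TokenVault()
orders = [
    {"email": "alice@example.com", "total": 120},
    {"email": "bob@example.com", "total": 80},
    {"email": "alice@example.com", "total": 45},
]
masked = [{**o, "email": vault.tokenize(o["email"])} for o in orders]
# masked[0]["email"] == masked[2]["email"], but neither reveals the address
```

Mapping each value to a stable token is a design choice: it keeps analyses such as "orders per customer" possible on the masked data, at the cost of revealing which records share a value.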

 

3. Collaborate Carefully

 
Data science is often a collaborative process, so think about how you communicate with collaborators. Email may be convenient, but it isn't end-to-end encrypted by default, which makes it unsuitable for sharing data or database credentials. Services built specifically for sensitive file sharing are a better option.

You should also keep trust to a minimum, no matter who you're working with: people should only be able to access what's critical for their job. If possible, consider obfuscating information before sharing it to limit the impact of any breach.
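One way to keep trust to a minimum and obfuscate before sharing is to give each collaborator role an explicit list of fields and strip everything else. The sketch below assumes records are plain Python dictionaries; the role names, field names, and the `share_view` helper are hypothetical.

```python
# Hypothetical roles and field names; adapt them to your own schema.
ALLOWED_FIELDS = {
    "analyst": {"region", "order_total"},
    "support": {"customer_id", "region"},
}


def share_view(records: list[dict], role: str) -> list[dict]:
    """Return copies of records that keep only the fields the role needs."""
    allowed = ALLOWED_FIELDS.get(role)
    if allowed is None:
        # Deny by default: a role nobody has defined gets no data at all.
        raise PermissionError(f"no field list defined for role {role!r}")
    return [{k: v for k, v in r.items() if k in allowed} for r in records]


rows = [{"customer_id": 7, "email": "a@example.com", "region": "EU", "order_total": 42.0}]
print(share_view(rows, "analyst"))  # [{'region': 'EU', 'order_total': 42.0}]
```

Denying unknown roles by default means a new collaborator sees nothing until someone decides what they actually need.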

 

4. Encrypt as Much as Possible

 
When you share data, encrypt it in transit. You should also encrypt it at rest in your database. Encryption isn't a cure-all for your security concerns, but it's a low-cost way to add another layer of protection.

Many of today's best data encryption tools won't slow your processes much, either. Look for an option that can encrypt your data both at rest and in transit. Encryption won't necessarily stop breaches from happening, but it will reduce their cost, since stolen encrypted data is useless without the keys.
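For data in transit, a minimal Python sketch using the standard-library `ssl` module might look like the following: it builds a client context that verifies server certificates and hostnames and refuses protocol versions older than TLS 1.2. It covers transport only; encryption at rest would typically come from your database's built-in encryption or a dedicated library. The function names are hypothetical, and the TLS 1.2 floor is an assumed policy choice.

```python
import socket
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a TLS client context for sending data to a remote service."""
    # create_default_context() loads the system's trusted CA certificates and
    # enables certificate verification plus hostname checking.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse legacy protocol versions (assumed policy: TLS 1.2 or newer).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def send_encrypted(host: str, port: int, payload: bytes) -> None:
    """Send bytes to host:port over a verified TLS connection."""
    ctx = make_client_tls_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # server_hostname enables SNI and lets the context check the certificate's name.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(payload)
```

Starting from `create_default_context()` rather than a bare `SSLContext` keeps the secure defaults and only tightens them.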

 

5. Secure More Than Your Databases

 
Remember that security applies to more than just where you store your data. Your databases should be the area you pay the most attention to, but they shouldn’t be your only concern. Backups, connected applications, and analytics servers can all serve as backdoors to your data, so they need protection too.

Any program, drive, or file that touches your data should be secured. This is easier when fewer things connect to your data, so minimizing what has access to your databases both simplifies your job and offers more protection.
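Backups and exports are easy to overlook. As a minimal sketch for POSIX systems (Linux, macOS), the Python snippet below writes a backup file that only its owner can read and audits a directory for files that group members or other users can access. The function names are hypothetical, and file permissions are only one layer; they complement, rather than replace, encrypting the backups themselves.

```python
import os
import stat
from pathlib import Path


def write_private_file(path: Path, data: bytes) -> None:
    """Write data to path so that only the owning user can read or write it (POSIX)."""
    # 0o600 applies when the file is created; the umask can only remove bits.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # If the file already existed, os.open kept its old mode, so tighten it
    # before any data is written.
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)


def find_exposed_files(directory: Path) -> list[Path]:
    """Return regular files under directory with any group or other permission bits set."""
    exposed = []
    for p in sorted(directory.rglob("*")):
        if p.is_file() and p.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
            exposed.append(p)
    return exposed
```

Running an audit like `find_exposed_files` over backup and export directories on a schedule can catch files that were copied with loose permissions.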

 

6. Take Care With Third-Party Cloud Vendors

 
If you use a third-party cloud like AWS, be careful not to become complacent about security. Unfortunately, many users do: a recent study found that 82% of companies give these vendors highly privileged access. Third-party clouds are not inherently risky, but you do need to take security into your own hands.

Check your permissions to make sure your vendor and other applications have only the least privilege they need. Use strong credentials, including multi-factor authentication, and rotate them regularly. If you're unsure where to start, many of these vendors publish security best practices you can reference.
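Since the section uses AWS as its example, here is what least privilege can look like as an AWS IAM policy document, built as a Python dictionary: read access to a single S3 prefix, plus a small audit helper that flags wildcard actions such as `s3:*`. The bucket and prefix names and both helper functions are hypothetical, and wildcard detection is only a rough heuristic, not a complete permissions review.

```python
def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy allowing reads from one S3 prefix and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects under the prefix only
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # List the bucket, restricted to keys under the prefix
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


def wildcard_actions(policy: dict) -> list[str]:
    """Return Allow actions containing '*', a common sign of over-broad access."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    flagged = []
    for st in statements:
        if st.get("Effect") != "Allow":
            continue
        actions = st.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        flagged.extend(a for a in actions if "*" in a)
    return flagged


policy = read_only_prefix_policy("example-analytics-data", "reports")
print(wildcard_actions(policy))  # []
print(wildcard_actions({"Statement": {"Effect": "Allow", "Action": "*", "Resource": "*"}}))  # ['*']
```

The dictionary can be serialized with `json.dumps` before it is attached to a role or user.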

 

7. Establish a Clear Governance Policy

 
Finally, you should establish a clear and specific governance policy for your whole team. Having a written document of what people should and shouldn’t do will help ensure safe user behavior. If someone makes a mistake that jeopardizes security, you can refer to the policy to see what went wrong.

Your governance policy should define everyone's role in security. You might set up a rotating schedule for who monitors and documents incoming and outgoing data, or give everyone a fixed role. Whatever you choose, make it specific and clear, and ensure everyone understands it.
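If you opt for a rotating schedule, writing the rotation down explicitly removes any ambiguity about who is on duty. A minimal Python sketch, assuming a hypothetical four-person roster that rotates weekly starting Monday, 4 January 2021 (the names, start date, and `on_duty` function are all illustrative):

```python
from datetime import date

# Hypothetical roster; list order defines the rotation.
TEAM = ["Ana", "Ben", "Chidi", "Dara"]
# The rotation starts on this Monday (assumed start date).
ROTATION_START = date(2021, 1, 4)


def on_duty(day: date, team: list[str] = TEAM) -> str:
    """Return who monitors and documents incoming and outgoing data during day's week."""
    # Whole weeks since the start date; floor division also handles earlier dates.
    weeks = (day - ROTATION_START).days // 7
    return team[weeks % len(team)]


print(on_duty(date(2021, 1, 13)))  # Ben (second week of the rotation)
```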

 

Data Science Security Must Improve

 
Data science is playing an increasingly central role in business today. As this trend continues, your work becomes a more valuable target for cybercriminals. Data science teams must embrace security in light of these rising threats.

Start with these best practices, then look for other, smaller areas where you can increase security. When your data is secure, you can work with confidence and impress potential clients.

 
Bio: Devin Partida is a big data and technology writer, as well as the Editor-in-Chief of ReHack.com.
