5 Things You Need to Know about Big Data

We take a look at five things you need to know about Big Data.

There’s a lot of social media and general internet buzz regarding Big Data, but what exactly is it? Here are 5 interesting things to know about Big Data.

1. What is it?

Simply put, Big Data refers to large data sets that are computationally analysed to reveal patterns and trends relating to a certain aspect of the data. There’s no minimum amount of data needed for it to be categorised as Big Data, as long as there’s enough to draw solid conclusions.
M-Brain explains the different facets of Big Data through the 8 V’s.

Fig. 1: M-Brain – Big Data with 8 V’s

2. How can I access Big Data?

Big Data is available from a huge number of sources, and that number only increases as time goes on. A simple Google search will turn up a data repository for just about everything, yet a lot of people aren’t aware of just how much data is already available for access and analysis. KDnuggets maintains an extensive list of Datasets for Data Mining and Data Science at https://www.kdnuggets.com/datasets/index.html

How you can access and utilise this data can be split into six parts:

Data Extraction

Before anything happens, some data is needed. This can be gained in a number of ways, normally via an API call to a company’s web service.
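As a rough illustration of the API route, the sketch below requests JSON from a web service and pulls out the records. The endpoint URL, the "results" key and the function names are all assumptions made up for this example; a real service will document its own URL, authentication and response layout.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint, used only for illustration.
API_URL = "https://api.example.com/v1/measurements"


def parse_records(raw_json):
    """Turn a JSON response body into a list of record dicts.

    Assumes the service wraps its rows in a top-level "results" key.
    """
    payload = json.loads(raw_json)
    return payload.get("results", [])


def fetch_records(url=API_URL, timeout=10):
    """Call the web service and return its records."""
    with urlopen(url, timeout=timeout) as response:
        return parse_records(response.read().decode("utf-8"))


# The parsing step can be tried without any network access.
sample = '{"results": [{"city": "Leeds", "temp_c": 11.5}]}'
print(parse_records(sample))
```

Keeping the parsing separate from the network call makes it easy to check the logic against a saved response before pointing it at a live service.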

Data Storage

The main difficulty with Big Data is deciding how it will be stored. The right choice depends on the budget and expertise of whoever is setting up the storage, as most providers require some programming knowledge to implement. A good provider should give you a safe, straightforward place to store and query your data.
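To give a feel for what "store and query" means in practice, here is a minimal sketch using SQLite from Python's standard library. The table name, columns and sample rows are invented for illustration, and a genuinely large data set would normally live in a hosted or distributed store rather than an in-memory database.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; a file path
# or a hosted database would be used for real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, temp_c REAL)")

# Made-up sample rows.
rows = [("Leeds", 11.5), ("Leeds", 13.0), ("Cardiff", 14.0)]
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
conn.commit()

# Query the stored data: average temperature per city.
averages = conn.execute(
    "SELECT city, AVG(temp_c) FROM readings GROUP BY city ORDER BY city"
).fetchall()
print(averages)
```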

Data Cleaning

Like it or not, data sets come in all shapes and sizes. Before you can even think about how the data will be stored, you need to make sure it is in a clean and acceptable format.
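What counts as "clean" varies from one data set to the next, but a typical pass might look like the sketch below. The field names and the cleaning rules (trim whitespace, convert numbers, drop missing values and duplicates) are assumptions chosen for the example rather than a fixed recipe.

```python
def clean_records(records):
    """Return tidy copies of the input rows.

    - trims whitespace and normalises the case of city names
    - converts temperatures to floats
    - drops rows with missing or non-numeric temperatures
    - drops rows that duplicate an earlier row once tidied
    """
    cleaned = []
    seen = set()
    for row in records:
        city = str(row.get("city", "")).strip().title()
        try:
            temp = float(row.get("temp_c"))
        except (TypeError, ValueError):
            continue  # missing or unparseable value
        key = (city, temp)
        if not city or key in seen:
            continue
        seen.add(key)
        cleaned.append({"city": city, "temp_c": temp})
    return cleaned


raw = [
    {"city": " leeds ", "temp_c": "11.5"},
    {"city": "Leeds", "temp_c": 11.5},      # duplicate once tidied
    {"city": "Cardiff", "temp_c": "n/a"},   # not a number
    {"city": "Cardiff", "temp_c": None},    # missing
    {"city": "cardiff", "temp_c": "14"},
]
print(clean_records(raw))
```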

Data Mining

Data mining is the process of discovering insights within a database. The aim is to make predictions and inform decisions based on the data currently held.
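One simple flavour of data mining is counting which items tend to turn up together, the idea behind "customers who bought this also bought that". The sketch below does this for a handful of invented shopping baskets; real projects would use far more data and usually a dedicated library.

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = share of baskets that contain the pair.
top_pair, count = pair_counts.most_common(1)[0]
support = count / len(baskets)
print(top_pair, support)
```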

Data Analysis

Once all the data has been collected it needs to be analysed to look for interesting patterns and trends. A good data analyst will spot something out of the ordinary, or something that hasn’t been reported by anyone else.
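Spotting "something out of the ordinary" can start with something as basic as flagging values that sit far from the average. The sketch below uses a z-score rule with a threshold of two standard deviations; both the threshold and the order figures are assumptions for illustration, and an analyst would normally look at flagged values by hand before drawing conclusions.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up daily order counts with one unusual day.
daily_orders = [102, 98, 101, 97, 103, 99, 100, 250]
print(find_outliers(daily_orders))
```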

Data Visualisation

Perhaps the most important part is the visualisation of the data. This step takes all the work done so far and turns it into a visual that, ideally, anyone can understand. This can be done using libraries such as Plotly and D3.js, or software such as Tableau.

3. Are there careers related to Big Data?

With access to Big Data growing, it should come as no surprise that the number of related careers is on the rise as well. According to Data Motion, a Big Data Engineer earns an average salary of $150,000 a year.

Fig. 2: Top 10 Big Data Jobs

It’s worth noting that 88% of Data Scientists hold at least a Master’s degree, which makes a postgraduate qualification a strong asset for anyone looking to enter the field (https://www.burtchworks.com/files/2014/07/Burtch-Works-Study_DS_final.pdf).

4. Is it a growing industry?

In short, yes. Both general interest in Big Data and access to it are on the rise. This Google Trends chart (https://g.co/trends/pxXJa) shows the increase in popularity of the search term ‘Big Data’ between 2004 and the present day.

Fig. 3: Google Trends for Big Data, 2004-2018

According to IDC, “Worldwide revenues for big data and business analytics (BDA) will reach $150.8 billion in 2017, an increase of 12.4 percent over 2016”. The company goes on to estimate that by 2020, big data revenues could top $210 billion.
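For a sense of scale, the quoted figures can be turned into an implied 2016 revenue and an implied yearly growth rate up to 2020. This is a back-of-the-envelope calculation based only on the numbers quoted above, not a figure published by IDC, and it treats $210 billion as the 2020 value even though the estimate is that revenues could exceed it.

```python
revenue_2017 = 150.8   # billions of dollars, from the IDC quote
growth_2017 = 0.124    # 12.4 percent increase over 2016
revenue_2020 = 210.0   # billions of dollars, the 2020 estimate

# Implied 2016 revenue: undo the 12.4 percent increase.
revenue_2016 = revenue_2017 / (1 + growth_2017)

# Compound annual growth rate needed to get from 2017 to 2020 (3 years).
years = 2020 - 2017
cagr = (revenue_2020 / revenue_2017) ** (1 / years) - 1

print(f"Implied 2016 revenue: ${revenue_2016:.1f} billion")
print(f"Implied 2017-2020 growth: {cagr:.1%} per year")
```

On these numbers, 2016 revenue works out at roughly $134 billion, and reaching $210 billion by 2020 would need growth of a little under 12 percent a year.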

5. How do I learn more?

Big Data is a broad subject that draws on several areas of knowledge. Someone looking to work in the field would need a range of skills, including one or more of the following:

  • Knowledge of a programming language used for data analysis, such as R, Python, SAS or SQL
  • A good understanding of Maths and Statistics
  • Experience of scraping web pages
  • Basic Excel skills

Websites such as Coursera (https://www.coursera.org/specializations/big-data) and Simplilearn (https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training) offer online Big Data courses.

If you’re looking for a University course, Masters Portal (www.mastersportal.eu/study-options/268927258/data-science-big-data-united-kingdom.html) lists 95 Masters Degrees in Data Science & Big Data in the UK. A typical syllabus (www.stir.ac.uk/postgraduate/programme-information/prospectus/computing-science-and-mathematics/bigdata/) might involve:

  • Mathematics for Big Data
  • Python scripting
  • Business and scientific applications of Big Data
  • Big databases and NoSQL including MongoDB, Cassandra and Neo4j
  • Analytics, machine learning and data visualisation using Weka, R and scikit-learn
  • Optimisation and heuristics for big problems
  • Cluster computing with Hadoop, Spark, Hive and MapReduce