Know Your Data: Part 1

This article introduces the different types of data sets, data objects and attributes.

By Krishna Kumar Tiwari, Data Science Architect at InMobi

In my previous article, we discussed the chemistry between Big Data and Machine Learning. We also concluded that to build a good learning model, we first need to understand our data well.

The picture below represents the machine learning and data mining process in general. Data cleaning and feature extraction are the most tedious steps, but you need to be good at them to make your model more accurate.


What is Data or Data Set?

A data set is a collection of data objects and their attributes. Attributes capture the basic characteristics of an object. Let's look at the famous Titanic data set, which is available here. It lists passenger information along with each passenger's survival status (survived = 0/1).

Each row represents a passenger object, and the columns are the characteristics of that passenger, also called attributes. Attribute values are numbers or categories. For example, Age is a number, but Pclass represents the passenger class (1 means 1st class, 2 means 2nd class, …).
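As a minimal sketch of this row and column view, the snippet below builds a tiny record set in plain Python. The three rows are invented for illustration and are not actual records from the Titanic data set, and the helper name `attribute_values` is hypothetical. Only the attribute names follow the ones mentioned above.

```python
# Each dict is one data object (a passenger); each key is an attribute.
# The values are invented for illustration, not real Titanic records.
passengers = [
    {"Survived": 0, "Pclass": 3, "Age": 30.0},
    {"Survived": 1, "Pclass": 1, "Age": 45.0},
    {"Survived": 1, "Pclass": 2, "Age": 8.0},
]


def attribute_values(data_set, attribute):
    """Return the values one attribute (column) takes across all objects (rows)."""
    return [obj[attribute] for obj in data_set]


print(attribute_values(passengers, "Pclass"))
```

Reading a column this way is often the first step in deciding what type of attribute you are dealing with.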


Types of Attributes?

To understand the data, you first need to understand the different types of attributes. The type of an attribute depends on which of the following properties it possesses:

Distinctness: =, !=
Order: < >
Addition: + -
Multiplication: * /


Nominal: follows only the distinctness property. Examples: ID numbers, eye color, zip codes.


Ordinal: follows distinctness and order. Examples: rankings (e.g., taste of potato chips on a scale from 1–10), grades, height in {tall, medium, short}.


Interval: follows distinctness, order and addition. Examples: calendar dates, temperatures in Celsius or Fahrenheit.


Ratio: follows all four properties. Examples: temperature in Kelvin, length, time, counts.
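The four attribute types described above are conventionally called nominal, ordinal, interval and ratio, in that order. The sketch below encodes which properties each type supports and then uses temperature to show why Celsius is only an interval attribute while Kelvin is a ratio attribute. The names `SUPPORTED`, `supports` and `celsius_to_kelvin` are hypothetical helpers written for this example.

```python
# Properties each attribute type supports, following the list above.
SUPPORTED = {
    "nominal": {"distinctness"},
    "ordinal": {"distinctness", "order"},
    "interval": {"distinctness", "order", "addition"},
    "ratio": {"distinctness", "order", "addition", "multiplication"},
}


def supports(attribute_type, prop):
    """True if the given attribute type possesses the given property."""
    return prop in SUPPORTED[attribute_type]


def celsius_to_kelvin(c):
    return c + 273.15


# Differences are meaningful on an interval scale:
# the gap between two temperatures is the same in both units.
gap_c = 20.0 - 10.0
gap_k = celsius_to_kelvin(20.0) - celsius_to_kelvin(10.0)

# Ratios are meaningful only on a ratio scale.
ratio_c = 20.0 / 10.0  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
ratio_k = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)  # roughly 1.035

print(supports("interval", "multiplication"), gap_c, round(ratio_k, 3))
```

The practical consequence is that a statistic such as a mean is defensible for interval and ratio attributes, while a ratio or percentage change only makes sense for ratio attributes.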

Based on the number of values an attribute can take, attributes can also be divided into two types.

Discrete attributes: These have a finite or countably infinite set of values. Zip code and Pclass are good examples. Binary attributes are a special form of discrete attribute (the survived attribute in the Titanic data set is an example of a binary attribute).

Continuous attributes: These take real numbers as values, for example fare in the Titanic data set.
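A rough way to tell these apart in practice is to look at the values a column actually holds. The function below is only a heuristic sketch: it assumes that any float value signals a continuous attribute and that exactly two distinct values signal a binary one. The function name `value_kind` and the sample values are made up for illustration.

```python
def value_kind(values):
    """Guess whether a column is binary, discrete or continuous.

    Heuristic only: a column of floats that happen to be whole numbers
    would still be reported as continuous.
    """
    distinct = set(values)
    if any(isinstance(v, float) for v in distinct):
        return "continuous"
    if len(distinct) == 2:
        return "binary"
    return "discrete"


print(value_kind([0, 1, 1, 0]))        # survived-style column
print(value_kind([1, 2, 3, 3]))        # Pclass-style column
print(value_kind([12.5, 80.0, 9.75]))  # fare-style column
```

A real data set needs a human check on top of this, since an integer-coded attribute can still be nominal (a zip code, for instance).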


Types of Data Sets?

Record Data Sets: These consist of a collection of records with a fixed set of attributes, also known as structured data. Below are a few examples of record data sets.


Data Matrix



Document Data



Transaction Data
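The three record data forms listed above can be sketched in plain Python as follows. This is an illustration under simple assumptions: the numbers, sentences and shopping items are all invented, and splitting on whitespace stands in for real text tokenization.

```python
from collections import Counter

# Data matrix: every object has the same numeric attributes,
# so the data set is a grid of rows (objects) by columns (attributes).
data_matrix = [
    [1.2, 3.4, 5.6],
    [2.1, 0.4, 9.9],
]

# Document data: each document becomes a vector of term counts.
documents = [
    "data mining finds patterns in data",
    "graph data links objects",
]
term_counts = [Counter(doc.split()) for doc in documents]

# Transaction data: each record is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "jam"},
]
all_items = sorted(set().union(*transactions))

print(term_counts[0]["data"], all_items)
```

All three are still record data: each row, document or transaction is one object described by a set of attributes.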


Graph Data Sets:


World Wide Web



Molecular Structures
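Both examples above can be stored as nodes plus the links between them. The sketch below uses an adjacency dictionary for a few hypothetical web pages and an edge list for a water molecule (one oxygen atom bonded to two hydrogen atoms). The page names and the helper `in_degree` are made up for this example.

```python
# World Wide Web: pages are nodes, hyperlinks are directed edges.
# Page names are hypothetical.
web_links = {
    "home.html": ["about.html", "blog.html"],
    "about.html": ["home.html"],
    "blog.html": ["home.html", "about.html"],
}


def in_degree(graph, node):
    """Count how many nodes link to the given node."""
    return sum(node in targets for targets in graph.values())


# Molecular structure: atoms are nodes, chemical bonds are undirected edges.
water_bonds = [("O", "H1"), ("O", "H2")]
atoms = sorted({atom for bond in water_bonds for atom in bond})

print(in_degree(web_links, "home.html"), atoms)
```

In graph data the relationships carry as much information as the objects themselves, which is why a flat table is a poor fit.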


Ordered Data Sets:


Sequential Data



Temporal Data
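In ordered data, the position of each value matters. As a small sketch: sequential data has an order but no timestamps (a genetic sequence is a common example), while temporal data attaches a time to every observation. The sequence string and the readings below are invented for illustration.

```python
from datetime import datetime

# Sequential data: order matters even though there are no timestamps.
sequence = "GATTACA"
adjacent_pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]

# Temporal data: each observation carries a timestamp (values invented).
readings = [
    (datetime(2024, 1, 1, 9, 0), 10.5),
    (datetime(2024, 1, 1, 8, 0), 9.0),
    (datetime(2024, 1, 1, 10, 0), 12.0),
]
readings.sort(key=lambda reading: reading[0])  # put observations in time order

# Change between consecutive observations only makes sense once sorted.
changes = [
    later[1] - earlier[1]
    for earlier, later in zip(readings, readings[1:])
]

print(adjacent_pairs, changes)
```

Shuffling either structure destroys information, which is the defining difference from record data.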


In this article, we covered the different types of data sets, data objects and attributes. In my next article, we will look at the issues that affect data sets and how to identify and deal with them. We will also walk through an example of feature extraction on the Titanic data set.

Thanks for reading. Please share your thoughts, feedback and ideas in the comments. You can also reach me at @simplykk87 on Twitter and LinkedIn.


Introduction to Data Mining, Pang-Ning Tan (Michigan State University), Michael Steinbach (University of Minnesota), Vipin Kumar (University of Minnesota)

Bio: Krishna Kumar Tiwari is a Data Science Architect at InMobi.

Original. Reposted with permission.