Data Mining and Predictive Analytics Glossary

Here, we have collected definitions of common terms used in data science and big data.

By Algolytics.

As Predictive Analytics (also called Data Mining or Data Science) is gaining momentum and spreading across companies and sectors, we have created a short guide to some common terms in this field. We hope you like it!

Analytical CRM (aCRM): supports decision-making processes that improve customer interactions or increase the value of customer interactions; aCRM aims at storing, analyzing and applying the knowledge about customers and about ways to approach them effectively.

Big Data: both an often-abused buzzword and a real trend in today’s world, reflecting the growing amount of data being captured, processed, aggregated, stored and analyzed every day. Wikipedia describes Big Data as a “collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools (…)”.

Business Intelligence: the applications, infrastructure, tools or processes for analyzing data and presenting information to help company executives, managers and others make more informed business decisions.

Churn Analysis (Attrition Analysis): profiles the customers who are likely to stop using the company’s services or products and identifies those whose churn is likely to cause the biggest loss. Results of churn analysis are used to prepare new offers for valuable customers at risk of defection.

Conjoint (Trade-off) Analysis: compares different variants of a given offer on the basis of their utility to customers. It forecasts the likely acceptance of a product or service if brought to market, and can be used for product line management, price setting etc.

Credit Scoring: assessing the creditworthiness of an entity (usually a person or company), used by banks and other lenders to estimate how likely a borrower is to repay his/her debts.

Cross / Up selling: a marketing notion of selling complementary products (cross-selling) or upgraded, higher-value products (up-selling) to specific customers, considering their characteristics and past behavior.

Customer Segmentation & Profiling: grouping of customers with similar profiles and behavior based on the available customer data, describing and comparing such groups.

Data Mart: a subset of the data stored by an organization that is focused on a single subject or department, such as Sales, Finance, or Marketing.

Data Warehouse: central repository of data that is collected and/or stored by an enterprise’s various business systems.

Data Quality: the processes and techniques involved in ensuring the reliability and application efficiency of data. Data is of high quality if it reliably reflects underlying processes and fits the intended uses in operations, decision making and planning.

ETL (Extract-Transform-Load): a process in data warehousing responsible for pulling data out of a source, transforming it so that it meets the needs of the processes that will use it in later stages, and placing it into a target database.
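The three ETL stages can be illustrated with a minimal Python sketch using only the standard library. Everything specific here is an assumption made for illustration: the `RAW_CSV` sample, the `payments` table, and the rule that rows with unparseable values are simply skipped. A real pipeline would typically log or quarantine such rows.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the second row has an invalid amount.
RAW_CSV = "customer_id,amount\n1, 10.50\n2,abc\n3,7\n"


def extract(text):
    """Extract: read raw rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and drop rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["customer_id"]), float(row["amount"])))
        except ValueError:
            continue  # assumption: silently skip bad rows
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total)
```

Keeping the three stages as separate functions makes each one easy to test and replace independently.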

Fraud Detection: identifying characteristics of suspicious or fraudulent transfers, orders and other illegal activities against an organization or company, and designing relevant triggers in IT systems that raise warnings whenever such transactions are attempted or made.

Hadoop: another hot topic in Big Data nowadays; Apache Hadoop is an open-source software framework for distributed storage and processing of very large data sets on computer clusters built from commodity (inexpensive, off-the-shelf) hardware. It enables massive data storage and faster processing by spreading the work across many machines in parallel.

IoT (Internet of Things): the notion of a wide-ranging network of electronic devices of various kinds (personal, household-level, industrial) and purposes (healthcare, leisure, media consumption, shopping, manufacturing, environment control etc.) that exchange data over the Internet and coordinate their activities with one another.

LTV (Lifetime Value) of a customer: the anticipated discounted profit that a customer will generate for a company during his/her lifetime.
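As a rough illustration of "anticipated discounted profit", the sketch below sums expected per-period profits, each discounted back to the present. The function name `lifetime_value`, the three-period horizon, the profit figures and the 10% discount rate are all assumptions chosen for the example; real LTV models usually also account for the probability that the customer is still active in each period.

```python
def lifetime_value(profits, discount_rate):
    """Sum expected per-period profits, discounting period t by (1 + r) ** t.

    `profits` lists the expected profit for periods 1, 2, 3, ...
    """
    return sum(
        profit / (1 + discount_rate) ** t
        for t, profit in enumerate(profits, start=1)
    )


# Hypothetical customer: 100 units of profit per year for three years.
expected_profits = [100.0, 100.0, 100.0]
ltv = lifetime_value(expected_profits, discount_rate=0.10)
print(round(ltv, 2))  # about 248.69
```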

Machine Learning: a discipline that studies methods and algorithms of automated learning from data, through which computer systems can adjust their operations according to the feedback they receive. The term is closely related to artificial intelligence, data mining and statistical methods.

Market Basket Analysis: identifying combinations of products or services that frequently co-occur in transactions, for example products that are often purchased together. Results of such analysis are used to recommend additional purchases, inform decisions on placing products in relation to one another etc.
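One simple way to find products that co-occur is to count how often each pair appears in the same transaction and divide by the number of transactions (the pair's "support"). The sketch below does this with the Python standard library; the `pair_support` helper and the sample baskets are made up for illustration, and production systems typically use dedicated association-rule algorithms such as Apriori rather than brute-force pair counting.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Return the share of transactions in which each product pair co-occurs."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so ("bread", "butter") is always the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()}


# Hypothetical transaction data.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
support = pair_support(transactions)
print(support[("bread", "butter")])  # 0.75: together in 3 of 4 baskets
```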

OLAP (On-Line Analytical Processing): tools enabling a user to easily make and browse reports summarizing relevant data, analyzing them from various perspectives.

Predictive Analytics: the practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends. Applied to business, predictive models and analysis are used to analyze current data and historical facts in order to better understand customers, products and partners and to identify potential risks and opportunities for a company.

Real Time Decisioning (RTD): helps companies make the best sales/marketing decisions in real time (with near-zero latency). For example, RTD systems (scoring systems) can score and rate customers at the very moment of their interaction with the company, using various business rules or models.

Retention (Customer Retention): refers to the percentage of customer relationships that, once established, a business is able to maintain on a long-term basis.
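A minimal sketch of this percentage, assuming retention is measured over a single period as the share of customers present at the start who are still present at the end. The `retention_rate` helper and the customer IDs are hypothetical; newly acquired customers (here `"e"`) are deliberately not counted.

```python
def retention_rate(customers_at_start, customers_at_end):
    """Share of the starting customers who are still customers at period end."""
    start = set(customers_at_start)
    if not start:
        raise ValueError("no customers at the start of the period")
    retained = start & set(customers_at_end)
    return len(retained) / len(start)


# Hypothetical customer IDs: "c" left during the period, "e" joined.
rate = retention_rate(["a", "b", "c", "d"], ["a", "b", "d", "e"])
print(f"{rate:.0%}")  # 75%
```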

Social Network Analysis (SNA): the mapping and measuring of relationships and flows between people, groups, organizations, computers, URLs, and other connected information/knowledge entities. The nodes in the network are the people and groups while the links show relationships or flows between the nodes. SNA provides both a visual and a mathematical analysis of human relationships.
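As one example of the "mathematical analysis" side of SNA, the sketch below computes degree centrality: the fraction of other nodes each node is directly linked to. The `degree_centrality` helper and the names in the edge list are invented for illustration, the network is assumed to be undirected, and self-links are ignored. Degree centrality is only one of many SNA measures.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to (number of neighbors) / (number of other nodes)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # assumption: ignore self-links
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical relationships between four people.
edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Ann", "Dan"), ("Bob", "Cat")]
centrality = degree_centrality(edges)
print(max(centrality, key=centrality.get))  # Ann is linked to everyone else
```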

Survival Analysis: estimates how long a customer will remain subscribed to a service, or the probability of a customer’s defection in subsequent periods of time. This information allows the company to determine the predicted period of retaining the customer and introduce an appropriate loyalty policy.
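A common technique for this kind of estimate is the Kaplan-Meier estimator, which is used here as an illustrative choice rather than something the glossary prescribes. The sketch below computes the probability that a customer is still subscribed after each observed churn time. The `kaplan_meier` function and the sample data are assumptions; `observed` is `False` for customers who had not churned when observation ended (censored records).

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] for each time with a churn event."""
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Customers still subscribed (and still observed) just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Customers who churned exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical subscription lengths in months; False means still subscribed.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for month, prob in curve:
    print(month, round(prob, 2))
```

With this sample, the estimated probability of still being subscribed falls to roughly 0.8 after month 2, 0.6 after month 3 and 0.3 after month 5.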

Text Mining: the analysis of data contained in natural language text. It works by computing statistics for words and phrases in source data, thus expressing text structure in numerical terms, and then analyzing it with traditional data mining techniques.
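To show what "expressing text structure in numerical terms" can look like, the sketch below turns two short documents into a term-count matrix (a simple bag-of-words representation). The `term_counts` helper, the tokenizing regular expression and the sample sentences are assumptions for illustration; real text mining pipelines usually add steps such as stop-word removal, stemming and TF-IDF weighting.

```python
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count each word (letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Hypothetical customer comments.
docs = ["The service was great, great support.", "Support was slow."]

# Shared vocabulary: every distinct word across all documents, sorted.
vocab = sorted(set().union(*(term_counts(d) for d in docs)))

# One row per document, one column per vocabulary word.
matrix = [[term_counts(d)[word] for word in vocab] for d in docs]

print(vocab)
for row in matrix:
    print(row)
```

Each row of `matrix` is a numeric vector that standard data mining techniques, such as clustering or classification, can take as input.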

Unstructured Data: information that either does not have a pre-defined data model or is not organized in a pre-defined manner. It usually refers to information that doesn’t reside in a traditional row-column database, for example e-mail messages or comments.

Web Mining (Web Data Mining): the use of data mining techniques to automatically discover and extract information from Web sites, documents and services.


