Richard Boire on The Data Discovery: Investing in Customer Insight
A leading analytics consultant shares his experience with data discovery and customer analytics. How do you deal with situations where "we don't know what we don't know"?
Richard Boire is the Founder of Boire Filler Group, a provider of data analytics and predictive modeling in the Canadian market, and a frequent speaker at many analytics and data mining conferences, including PAW Toronto, Mar 18-21, 2013. The following is an excerpt from his blog on
The Data Discovery: Investing in Customer Insight.
In many of our engagements with new clients, the old Donald Rumsfeld phrase "We don't know what we don't know" is very applicable as these organizations commence their journey into database analytics. In these situations, there is no clearly definable objective or goal when undertaking these projects. In fact, these companies look for outside consultation in the creation of a roadmap that represents both strategy and tactics on which database analytics projects they should undertake. These types of exercises don't yield an immediate return on investment, which can be a barrier for many organizations that fail to appreciate the long-term benefits that a data discovery can yield. It is often very difficult to convince these organizations of the longer-term benefits of database analytics, as they are focused on achieving short-term gains to resolve an immediate business need. In many ways, this is the true challenge of a data discovery: we seek to strike a balance between longer-term analytics needs and establishing an immediate ROI.
In undertaking these types of projects, the only common feature is the open-ended nature of the assignment, as an end solution to a specific business problem is not necessarily our goal. Instead, exploration and discovery are the intent of the project, with the goal being to build an analytical roadmap. Yet even open-ended projects require some process to provide the guidelines and steps necessary for their success. Typically, this process involves four steps, which are:
- Data Audit
- Preliminary Analysis
The first stage of this project is where the analytics practitioner attempts to increase their knowledge of the client's current business and results. In data mining and analytics, it is widely agreed that analytics projects require both domain knowledge and data mining expertise in order to really optimize a given solution. Domain knowledge is specific knowledge that pertains to the business. It covers both knowledge that is unique to the client's industry sector (finance, retail, etc.) and knowledge that is unique to the mechanics of how that client's business runs. The preparation stage of the project allows the practitioner to increase their domain knowledge of this business. Of course, the practitioner's domain knowledge will never be as exhaustive as the client's, but the objective here is to obtain an adequate level of this knowledge in order to conduct an effective discovery exercise.
The initial tasks here are to conduct extensive interviews with key business stakeholders from marketing, IT, analytics (if there is such an area), finance, and the executive team, depending on availability. During these meetings, key business issues and challenges are identified, and an understanding of what data is available is established. Business reports, analyses, or any other documents that provide results and meaningful information about the business are shared with the practitioner. At the end of this stage, a data extract is requested that consists of all the files and fields that will be required for the remainder of the project.
Data audits have been discussed in the past and are a core prerequisite in any data discovery exercise. At this stage of the process, the practitioner attempts to become "intimate" with the data, which describes a much stronger relationship with the data than the standard phrase "data knowledge".
Once the data extract is received by the practitioner, the data is loaded into their system, where standardized reports are produced that essentially provide the following results:
- Data completeness or coverage as indicated by the number of missing values in a variable
- How values or outcomes are distributed within a given variable
- Data inconsistencies and data gaps: do values change over time, and are there groups of records where certain data anomalies exist?
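As a rough illustration of the three report types listed above, here is a minimal Python sketch of a data audit over a small in-memory extract. It is not the tooling used in the engagements described here. The `audit` function, the `MISSING_MARKERS` set, the field names, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: these strings mark a missing value; a real extract may use others.
MISSING_MARKERS = {"", "NA", "N/A", "null"}


def is_missing(value):
    """Treat None and the marker strings above as missing."""
    return value is None or str(value).strip() in MISSING_MARKERS


def audit(records, period_field=None):
    """Build a per-variable audit report from a list of dict records.

    For each variable the report holds:
      - missing / coverage: data completeness
      - distribution: frequency of each non-missing value
      - missing_rate_by_period: share of missing values per period,
        a simple way to spot gaps that appear over time
    """
    fields = sorted({key for row in records for key in row})
    report = {}
    for field in fields:
        values = [row.get(field) for row in records]
        present = [v for v in values if not is_missing(v)]
        entry = {
            "missing": len(values) - len(present),
            "coverage": len(present) / len(values) if values else 0.0,
            "distribution": Counter(present),
        }
        if period_field and field != period_field:
            totals = Counter()
            missing = Counter()
            for row in records:
                period = row.get(period_field)
                totals[period] += 1
                if is_missing(row.get(field)):
                    missing[period] += 1
            entry["missing_rate_by_period"] = {
                period: missing[period] / totals[period] for period in totals
            }
        report[field] = entry
    return report


# Hypothetical extract: income stops being populated in 2012.
records = [
    {"customer_id": "1", "region": "ON", "income": "52000", "signup_year": "2011"},
    {"customer_id": "2", "region": "QC", "income": "61000", "signup_year": "2011"},
    {"customer_id": "3", "region": "ON", "income": "", "signup_year": "2012"},
    {"customer_id": "4", "region": "ON", "income": "NA", "signup_year": "2012"},
]

report = audit(records, period_field="signup_year")
for field, entry in report.items():
    print(field, entry["coverage"], dict(entry["distribution"]))
print(report["income"]["missing_rate_by_period"])
```

In this toy extract, the income field is populated for every 2011 record and for no 2012 record. That is the kind of gap over time the third report type is meant to surface. A real audit would typically run over full files and use numeric summaries for continuous variables, not raw value counts.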