McDowell Interview: PASS Business Analytics Conference, Microsoft Data Mining
I interviewed Douglas McDowell about the PASS Business Analytics Conference, SQL Server, Microsoft Data Mining, lesser-known but useful features of SQL, NodeXL, Big Data and more.
Gregory Piatetsky, Mar 13, 2013.
Douglas McDowell is the CEO of North America for SolidQ (www.solidq.com). He is a Microsoft Most Valuable Professional (MVP) for SQL Server and serves on the Board of Directors for the Professional Association for SQL Server (PASS). He is an author and contributing editor for SQL Server Magazine.
I spoke to Douglas ahead of the PASS Business Analytics Conference in Chicago, April 10-12. (Note: KDnuggets readers can save $150 when registering for the PASS BA Conference by using the BAC13KDN discount code.)
GP: 1. What is the PASS Business Analytics conference?
[McD] It's a very exciting time for data professionals as more and more organizations turn to data-driven insights to stay ahead in today's competitive marketplace. Staying up to speed in this constantly changing world of data can be a challenge - that's where the PASS Business Analytics Conference fits in.
The conference was established to meet the needs of a growing Business Analytics community affiliated with Microsoft technologies such as Excel, SharePoint, SQL Server, Parallel Data Warehouse, Azure, Hadoop and more. The event is geared towards data and business analysts, data scientists, architects, and business analytics / business intelligence professionals. It covers a wide range of topics, including data exploration and visualization, predictive analytics, content management and architecture, information strategies and much more.
2. What is the role of SQL Server in the Microsoft ecosystem?
[McD] As a partner and insider I have listened to Microsoft's vernacular shift from "SQL Server" to "Data Platform" and other similar terms. Some might think it a de-prioritization of SQL Server, but that would be a mistake. Microsoft is focused on the exploding business analytics (BA) needs of clients and understands it requires a complete toolbox of complementary technologies to deliver it all. As far as I can see, SQL Server is and will be a core component of BA for Microsoft going forward. Whether it be in the cloud or on-premise, SQL Server will hold critical features and therefore the Microsoft licensing model for core BA functionality. I see SQL Server getting more robust and more integrated with the rest of the Microsoft BA platform (since SQL Server will not and should not contain everything).
3. What types of analytics does SQL Server support for different tasks?
[McD] SQL Server includes a ton in the box and then is very extensible. Leaving the extensibility aside you have a lot to work with:
- Classification: Decision Trees, Naive Bayes, Neural Networks, ...
- Clustering: Clustering, Sequence Clustering
- Outlier detection: Clustering if you have no previous knowledge; if you have some previous outliers already marked, also predictive algorithms, including all those used for classification plus Logistic Regression
- Social network analysis: depending on the task, you might use Clustering and Classification algorithms
- Recommendations: any predictive algorithm, including Decision Trees, Naive Bayes, Neural Networks, Linear & Logistic Regression
- Visualizations: there are many data mining viewers shipped with SQL Server, and additionally you can use Excel and Visio with Data Mining Add-Ins
When I want to go deep on SQL Server data mining I go to SolidQ in-house expert Dejan Sarka, @DejanSarka. Props go to Dejan for the bullets above.
If you circle back to the overall extensibility of the Microsoft platform (default features are just a starting point), it all gets even more interesting. These days the buzz is all around social media analytics. I am fascinated by my recent find: Dr. Marc Smith's NodeXL project that lets you do mapping and correlation of social activity using the Microsoft toolset. Check out this graph Marc made correlating Twitter hashtag activity surrounding the PASS BA Conference (#passbac and #sqlpass).
Marc is giving a session at the PASS BA Conference titled "Charting Collections of Social Media Connections with NodeXL"; that session alone will be worth attending the conference for. If NodeXL is interesting to you, there is a short post out on the BA Conference blog, and you can download the bits on
4. What are some lesser-known but useful features of Excel and SQL Server for analytics and visualization?
[McD] Wow, I could spend some time here, but I will just dish a few highlights with links. Most folks have no clue about the Data Mining Add-ins for Excel and don't know about coupling PowerPivot with Data Mining. On the horizon is a very cool new Excel-based tool called Data Explorer that will really aid in analytics prep and classifying social data - definitely check that one out!
5. Is "Big Data" overhyped and do you expect a trough of disappointment?
[McD] No. But it is greatly misunderstood. I think we sort of backed into the "Big Data" label... true big data is about the three V's: variety, velocity and volume... but big data technologies will add tremendous value even if all three are not met.
For instance, they are actively used on complex data problems regardless of data volume. Taking a step back, let's look at another type of big data: email. Consider where we were as individual email users just a few years ago... paaainfully surgically pruning unimportant mail, sorting the rest into innumerable folders, assigning categories and such, blindly dumping email of a certain age just to comply with quotas or disk management constraints.
Today most of us have given up on deciding what could be important or how to locate it later and simply archive all of it to be indexed by our 25+GB Office 365 inbox. Corporate data and corporate data keepers are moving along the same continuum: better to simply dump it all into cavernous schema-later data stores so they have it later. And later usually equates to a time of more wisdom, when it can be used to ask questions never considered when the data was born. Forecasting and other advanced analytics are not unlike an unbalanced seesaw where we must balance each unit of foresight with a multiplier of history.
6. What is your view on the NoSQL movement - when is it better than SQL and when is it not?
[McD] I prefer the common alternative definition "NOSQL" as Not Only SQL. As with my thoughts on Big Data, I think that NoSQL technologies, specifically ones like Hadoop, are very necessary data architectures for the variety, velocity and volume of today's and tomorrow's data. NoSQL repositories are critical first stops for data en route to ultimately being structured for specific uses, such as reporting and diagnostic analysis or highly complex forecasting and similar predictive analytics. They are also the storehouses for data swept off the cleansing-process threshing floor and kept for unforeseen future use.
7. Is there a "NoSQL Server" in the plans? What are Microsoft solutions for Big Data and Hadoop integration?
[McD] Microsoft is investing heavily in Big Data and Hadoop. It has never been easier to stand up a Hadoop cluster; anyone can do it now in the cloud using HDInsight Services for Windows Azure. Microsoft has also partnered with Hortonworks as they are poised to release the first HDP on Windows. Microsoft has publicly committed to future full-featured Apache Hadoop on Windows releases, and has also publicly announced future support for combining Hadoop and RDBMS query loads handled by the PolyBase advanced query engine in SQL Server 2012 Parallel Data Warehouse Edition. This is all public news; everyone should keep an eye on Microsoft's leadership in this space going forward.
8. Can you tell us about Windows Azure Marketplace for Data and Analytics?
[McD] The more data that you can access and use to enrich your existing datasets, the more value you will derive from your analytics. The Azure Marketplace is a great source along with countless others (e.g. www.data.gov/, statcan.gc.ca/), free as well as fee-based. The cool thing about the Azure Marketplace is that Microsoft refuses to quit with just making more datasets accessible; they are committed to offering increasingly robust cloud-based tools to perform analysis with Azure and private datasets. This is another space in which to keep your eye on Microsoft's activity.
9. What brought you to become interested in Business Intelligence and SQL Server?
[McD] I was not always in technology. I moved over from the restaurant industry where I was always very focused on guest needs and service. This made the transition to consulting very natural; as I progressed with database engagements I was always more interested in *why* clients were storing information and what value they could draw from it. So I was constantly doing iterative design and development to support client requests, and that naturally led me down the reporting, analysis and analytics path. SQL Server was an obvious choice for me based on technologies used during my first employment experiences - then I found the community surrounding it made it far more accessible than any other platform, and it was fun and rewarding as I gained expertise. I love the SQL Server community and PASS and welcome new business analytics members to PASS - community is what makes the Microsoft platform so strong.
10. What recent book have you read and liked?
[McD] I have been working my way through "Allen & Mike's Really Cool Telemark Tips: 109 amazing tips to improve your tele-skiing." I just started telemarking this winter - it has been fun and very humbling to start all over again.