The Future of Analytics and Data Science
Learn about the current and future issues of data science, and possible solutions, from this interview with IADSS Co-founder Dr. Usama Fayyad, following his keynote speech at ODSC Boston 2019.
By Kate Strachnyi, DATACATED.
Kate Strachnyi: Given the huge diversity of roles for people in data, what behavioral changes or tools are getting adopted in the future?
Usama Fayyad: So I think the tools and behavioral changes in organizations are maturing, probably in an expensive way more than the right way, meaning they're going through good and bad experiences of hiring data scientists. Some of them are seeing the value, some are seeing they made bad hires, and now they have to recover from that by firing or replacing. I think what's coming out of that is organizations are beginning to understand that they need to do a more thorough evaluation. And one of my biggest rules about hiring data scientists is that it takes one to know one. So if you don't have a good data scientist on board, your chances of hiring another good data scientist are slim.
So then where do you begin if you’re starting a department or don’t have a good data scientist on board already, which is why you’re trying to recruit, right? How do you solve that? We believe that by developing the standards, meaning good descriptions of what the roles are, what the positions are, and what training is required for each of these roles, we can actually make it a lot easier for people to sift through a lot of resumes, home in on the ones that look promising, home in on the interviews that are likely to be valuable, and then know what to ask in those interviews. We shared a lot of feedback from candidates who say, hey, I interviewed at ten different places for the same job. And the interviews, other than two little bits around programming, had almost nothing in common. And each interview was a whole-day affair with a completely different kind of approach to it.
Kate Strachnyi: Well, one thing I'll say is if there are, let's say ten common questions that are expected to be asked of a data scientist, you can expect the answers to those questions to be posted on Google somewhere. So people will just memorize that and come in for interviews. That's another worry to think about.
Usama Fayyad: Of course. And that's why there's no substitute for doing a live follow up where you dig deeper. It's not enough to ask the canned question. When you're doing a video interview, there are tools that can check for these behaviors to see if somebody is looking somewhere else or if somebody else is sitting in the background whispering the answers. I'm amazed there is technology now where people use AI to detect whether to flag something in a video interview that's proctored, and there are companies which offer these services. And when you get a red flag, you drill down and say, do you really know this area? Let me ask you a few follow-up questions. And typically, somebody who's cheating would collapse very quickly.
Kate Strachnyi: Relevant to what we were talking about, there are a lot of people who want to be data scientists, but there are also a lot of technological innovations in AI coming into play that help data scientists do their jobs. So do you think that the skills gap is going to close because, basically, robots are taking our jobs? Is that a problem?
Usama Fayyad: I think it's the MIT data lab or the MIT media lab where they came up with the motto "AI is not about replacing the human with a robot. It is about taking the robot out of the human". So I think what is happening with AI and a lot of these technologies is they are making our jobs easier. I actually do not believe at all that they're capable of replacing our jobs. The jobs that they are capable of replacing are the very mundane, very robotic, very repetitive types of tasks that I think machines are better off doing than humans. We need the humans because, to this date, we don't know how to build a machine that has something that most humans have, which is common sense and the ability to come up with judgments quickly in new situations.
The analogy I like to use is autonomous driving. I don't think we will see autonomous driving in the near term. It will probably take more than 30 years. But I do believe that there are many areas today where these AI algorithms can help us a lot. So, avoiding collisions when it's obvious that you're going to collide with a driver who is distracted. Getting warnings and applying brakes. These are helpful ones. Following lanes can also be helpful. A lot of the tasks these tools assist you with, for some people parallel parking, can now be automated, and it's a good thing. So in these areas, you can automate much of this, but so far we have not been able to build machines that can anticipate situations that we haven't seen before, quickly react, map knowledge from another similar situation to this one, and apply it effectively. I have many examples of that, and that's why I don't believe autonomous driving will happen, at least in my lifetime. But I think the machines are advanced enough to do a lot of the mundane tasks and help me when I'm distracted or when I'm incompetent or whatever. We haven't yet figured out how to create that general intelligence, which seems to exist in humans and also in many animals.
Kate Strachnyi: Okay. So you're saying we're safe for now?
Usama Fayyad: Yes. In fact, historically looking at the past two AI winters, and I think there will be a new AI winter because of all the hype, we created a lot more jobs than we've eliminated. So you open up a whole bunch of new areas where people can do a lot of higher-value work.
Kate Strachnyi: Removing the mundanity from the requirements for human activities frees the human to be more responsive, creative, and proactive. Hopefully, there should be many benefits to many areas of the industry rather than a detriment. Do you agree?
Usama Fayyad: I completely agree and in fact, that is completely supported. I'll use a very basic example that has little to do with data science but is related. Accountants over a hundred years ago opened up these huge ledgers and spent days adding numbers and double-checking that they hadn't made an error. In addition, there were all sorts of tricks to avoid errors and to double-check yourself with these ledgers that get dusty and are impossible to access. Nowadays, no one at all would ever think about doing accounting without software doing the actual mundane work of keeping track of the numbers, adding them up, doing all the right things, creating the balance sheets and all of that. That, to me, is an example where accountants can now think about more strategic things. They can think about things such as “Was this expense necessary?”, “Does this make sense?”, “Could we save money here?”, “Could we utilize the assets better?” etc. Stuff that they never had time to think about. And that's really where the value is in managing money.
Kate Strachnyi: What’s the impact of data technologies on the expectations from the business?
Usama Fayyad: The biggest thing we've seen is a huge wave of digitization. I think, and this is near and dear to my heart, in a lot of digitization, or what's called digital transformation efforts, people start digitizing a lot of the manual tasks, often making them more accurate, less repetitive, less boring, and faster. All of that good stuff. But data ends up being an afterthought. So what happens is they create what we call "instant technical debt" because you have now built mechanisms to digitize and you forgot about questions such as "How do I capture the right data?" "How do I represent that data?" "How do I store that data?" "How do I retrieve it at the right time?" and "What level of data?". Humans typically will consume data at the level of graphs and summaries, while machines like a machine learning algorithm want the detail of every little transaction and what was around it.
And that is completely non-consumable by humans but necessary for learning algorithms. So, to me, what's happening now is people are rethinking and saying, okay, if I'm really doing a proper digitization, I want to make sure that I put in the right brains and the right intelligence to design it in such a way that I'm capturing the right data, managing the data correctly, and most importantly, enabling the algorithms to consume it. Machine learning algorithms are very finicky: they require data in a certain format and completely collapse if it's not in that format. And that's what I think is changing now and becoming better, especially with big data, which makes it easy to deal with the different types of data.
Kate Strachnyi: On enterprise risk tolerance with data, and the balance between information security and information utilization, what are your thoughts as a Chief Data Officer?
Usama Fayyad: A huge and very important topic. I'm a strong believer that you can have maximum utilization with maximum privacy. You just have to be careful about how you do it. So many organizations are obsessed with data leaks, attacks, and hacks. It turns out that most of the threats are internal. And a lot of these internal threats are from people who intentionally or unintentionally end up installing bad software, malware, etc. That's called social engineering. That's how the bad guys get you to bring it in, even if you're not connected to the outside. And in fact, the very famous breaches happened that way, including some of the famous ones in the news. The thing I want to say here is many organizations assume that once you're in, once the perimeter is secure, you're safe.
And that's a very bad assumption. And by the way, with IoT, the Internet of Things, that's becoming a super bad assumption because there is no such thing as a perimeter in that world. So, the proper practice here is simple, right? Data should be encrypted. The keys should only be accessible by people who actually have reasons to access the data. And the management of the keys needs to be active enough to make sure that nobody is keeping keys for historical reasons. And the keys get refreshed all the time. The keys can be changed instantly so that people can be denied access the moment something bad happens. Those technologies, by the way, are available today; they're just not used, out of laziness. So if you do it properly and you make sure it's the proper access, often a lot of that utilization of information can be done by algorithms.
No human needs to actually look at it. And the beauty of a machine learning algorithm that is looking at a dataset is that it doesn't need any of what we consider private information. The PII (personally identifiable information), for example, is useless to an algorithm. If you have a name or a social security number, the algorithm throws it out because it's a unique identifier for each data record. It has no predictive value unless it's a bad algorithm. But it would glean the overall predictive pattern that says, oh, when people use this product and this feature, they tend to run into these kinds of problems. That's the useful stuff that comes out of it. Or here, our customers were looking for something and it's an opportunity for us to double our sales. So these things can be gleaned from the data through algorithms that can be run securely, without humans actually having access to it and without endangering the privacy of that data. You just need to have a very well controlled and architected story on who gets access to the data, when, and for what reasons.
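As a rough illustration of the point about unique identifiers, here is a minimal Python sketch, not a description of any particular product or of Dr. Fayyad's own method. It assumes a simple heuristic: a column whose value differs in every record behaves like an identifier and is dropped before modeling. The function names, column names, and sample records are all hypothetical and invented for this example.

```python
from typing import Dict, List

Record = Dict[str, object]


def identifier_like_columns(records: List[Record]) -> List[str]:
    """Return columns whose value is different in every record (assumed heuristic)."""
    if len(records) < 2:
        return []  # uniqueness says nothing with fewer than two records
    return [
        col for col in records[0].keys()
        if len({rec[col] for rec in records}) == len(records)
    ]


def drop_columns(records: List[Record], columns: List[str]) -> List[Record]:
    """Return copies of the records without the given columns."""
    blocked = set(columns)
    return [{k: v for k, v in rec.items() if k not in blocked} for rec in records]


# Hypothetical support-ticket data, made up for illustration only.
records = [
    {"name": "Ann Lee", "ssn": "000-00-0001", "feature": "export", "had_problem": True},
    {"name": "Bo Chan", "ssn": "000-00-0002", "feature": "export", "had_problem": True},
    {"name": "Cy Diaz", "ssn": "000-00-0003", "feature": "search", "had_problem": False},
    {"name": "Di Egan", "ssn": "000-00-0004", "feature": "search", "had_problem": False},
]

pii_like = identifier_like_columns(records)
training_rows = drop_columns(records, pii_like)
print(pii_like)          # columns treated as identifiers
print(training_rows[0])  # what a learning algorithm would actually see
```

On this toy data, the name and social security number columns are removed, while the feature and outcome columns, which carry the pattern "people using this feature tend to run into problems", remain. The heuristic is deliberately crude: it would also flag a continuous measurement that happens to be unique per row, and it would miss names that repeat, so a real pipeline would more likely rely on an explicit list of sensitive fields.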
Original. Reposted with permission.
Bio: Kate Strachnyi is an author, Advisory Board Member of IADSS, Udemy instructor, and host of the Datacated Weekly, a project dedicated to helping others learn about various topics in the data realm.