
AI, Data Science, Machine Learning: Main Developments in 2016, Key Trends in 2017

2017 is here. Check out an encore installment in our "Main Developments in 2016 and Key Trends in 2017" series, where experts weigh in with their opinions.

At KDnuggets, we try to keep our finger on the pulse of main events and developments in industry, academia, and technology. We also do our best to look forward to key trends on the horizon.

Over the past few weeks, we published a series of posts outlining expert opinions in data science, machine learning, artificial intelligence, and related fields.

In this encore post, we bring you the collected responses to a combined question, answered by experts from all of the previous posts' fields, and we add a second question this time around.

1. "What were the main developments in AI, Data Science, Machine Learning in 2016 and what Key Trends do you expect in 2017?"

2. "How can we get more women involved in this field?"

Without further delay, here is what we found.


Jennifer Chayes, Distinguished Scientist & Managing Director, Microsoft Research New England and Microsoft Research New York City

I’d like to thank one of my researchers, Alekh Agarwal, for great input here.

Main developments in 2016:

  • Reinforcement Learning: Many advances were made on the empirical side, notably AlphaGo. Also remarkable was the number of new platforms for testing RL methods, including Project Malmo from Microsoft, Universe from OpenAI, DeepMind Lab from Google, and TorchCraft from Facebook. There were major contributions from Microsoft Research NYC: On the practical side, the Decision Service is the first deployable system for RL: arxiv.org/abs/1606.03966; on the theory side, the Contextual Decision Process https://arxiv.org/abs/1610.09512 is the first statistical foundation for RL with contextual policies.
  • Fairness Issues: Of note, there were several works trying to quantify what it means for fairness to be a part of algorithm design, see e.g. the 2016 FATML workshop: www.fatml.org/schedule/2016.
  • Physics and AI: My collaborators and I have developed a non-equilibrium statistical physics explanation of the “unreasonable effectiveness” of learning (shallow) neural nets: www.pnas.org/content/113/48/E7655.full?sid=e1bedfb2-c8ab-441c-bcd2-dbdd7005721c, which we are working to extend to DNNs.

Trends for 2017:

  • Reinforcement learning: More advances, particularly spurred by the release of the new platforms in 2016.
  • Physics and AI: The development of new algorithms for deep neural nets, including more principled algorithms with better performance, based on non-equilibrium physics ensembles.
  • Non-convex optimization: Over the last few years, there has been increasing understanding of when and how we can solve non-convex problems in ML. Expect to see additional breakthroughs here in the next year or two.

Getting more women involved in the field:

The way to increase the number of women in AI, ML and data science is two-fold. First, we must expand the definitions of the fields to include their interaction with the other sciences, including the biological and social sciences. A prime example of the contribution of the social science approach to AI is the development of a new field to make ML more fair, accountable and transparent (FATML), a community in which many of the leaders are women. Second, we need to establish networks among the women who are already in the fields. A great example is Women in ML http://wimlworkshop.org/ , an organization co-founded ten years ago by four women in ML who were rooming together during the 2005 NIPS conference. The first WiML workshop took place at NIPS 2006 with almost 100 participants. WiML 2016 had almost 600 participants, which was roughly 10% of NIPS. The networks established through WiML tend to keep women in the field, and in particular to expose them to new opportunities.

Jill Dyche, Vice President, SAS Best Practices

Natural language. Writing this feels like an anachronism, since I was working with natural language processing back in the early 1990s. Some smart folks at a company called AICorp had figured out how to pose English questions to a database. A program would parse the question into words that would be referenced against an established lexicon that could be used to automatically translate the request into SQL. Thus the question, “What were last year’s revenues for Widget x compared to the prior year’s?” would reliably (though not necessarily quickly) return an accurate answer.
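The lexicon-driven approach described above can be illustrated with a toy translator. This is a hedged sketch under our own assumptions, not AICorp's system: the lexicon, the product list, the `sales` table, and its columns are all hypothetical, and a real parser would handle far more phrasing than this keyword lookup.

```python
# Hypothetical lexicon: words a user might type -> SQL column names.
MEASURES = {"revenue": "revenue", "revenues": "revenue", "units": "units_sold"}
# Hypothetical list of product names the parser recognizes.
PRODUCTS = ["Widget X", "Widget Y"]


def translate(question: str, current_year: int) -> str:
    """Map a narrow class of English questions onto SQL via lexicon lookup."""
    # Normalize case, curly apostrophes, the question mark, and possessives.
    text = question.lower().replace("\u2019", "'").replace("?", "").replace("'s", "")
    words = text.split()
    measure = next((MEASURES[w] for w in words if w in MEASURES), None)
    product = next((p for p in PRODUCTS if p.lower() in text), None)
    if measure is None or product is None:
        raise ValueError("question uses words outside the lexicon")
    years = []
    if "last year" in text:
        years.append(current_year - 1)
    if "prior year" in text:
        # Assumption: "the prior year" means the year before last year.
        years.append(current_year - 2)
    if not years:
        raise ValueError("no time period recognized")
    year_list = ", ".join(str(y) for y in sorted(years))
    return (
        f"SELECT year, SUM({measure}) FROM sales "
        f"WHERE product = '{product}' AND year IN ({year_list}) "
        "GROUP BY year ORDER BY year;"
    )


print(translate("What were last year's revenues for Widget X compared to the prior year's?", 2016))
```

Because product names come only from a fixed list, user text is never pasted into the SQL string; anything outside the lexicon is rejected rather than guessed at.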

As you can imagine, this opened up a whole new world to business users accustomed to waiting for programmers to email them datasets that they would then be forced to pick through, often extrapolating the right answer.

Flash forward to an hour ago, when I asked Alexa to dim the lights in my office. As consumers become more comfortable using commercially available natural language functions (“Okay, Google”), the ability to combine natural language, voice recognition, and advanced analytics will continue to offer promising new capabilities.

Now, where are my keys? “Alexa...?”

As for getting more women involved in the field, I both appreciate and dread this question. Why? Because my knee-jerk response is to answer in a way that separates women from men.

But then again, in the tech world women are separated from men, both physically (more women work in the service sector, more men have offices with doors) and metaphorically. So I’ll stand by my knee-jerk answer: Offer women communities. After all, there are fewer of us around, so it’s much harder to find other like-minded colleagues who share our experiences. (This phenomenon is even more prevalent in the minority and LGBTQ communities.) We yearn to hear from women who have overcome what we’re grappling with now, to hang out and learn together. And we want to hear from men—especially from men in power—about how to widen the circle. Bring us together and acknowledge we’re a sub-group who wants to participate, be better, and help our companies thrive. Invite more of us to the recruiting table, and into the board room. Studies show that revenues will rise—and thus so will our numbers.

Ian Goodfellow, Research Scientist at OpenAI

1. What were the main developments in AI, Data Science, Machine Learning in 2016 and what Key Trends do you expect in 2017?

One of the largest trends of 2016 was the mainstreaming of reinforcement learning, beginning with AlphaGo's victory. We've begun to see machine learning move from datasets to environments, beginning with OpenAI Gym and culminating in OpenAI's Universe and DeepMind Lab.

Adversarial training has become very popular during the past year, with Yann LeCun describing it as the best idea in machine learning in the last ten years. "Adversarial" was a more common word in ICLR 2017 submission titles than "reinforcement," "variational," "convolutional" or "unsupervised."
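One prominent form of adversarial training is the generative adversarial network (GAN) of Goodfellow et al. (2014), in which a generator G and a discriminator D play a two-player minimax game. The original objective is restated below for reference, with p_data the data distribution and p_z the generator's noise prior; for a fixed generator, the optimal discriminator compares the data density with the density p_g of generated samples G(z).

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]

D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

The discriminator is trained to tell real from generated samples, while the generator is trained to make that task as hard as possible.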

On the commercial side, we've seen NVIDIA's stock soar, numerous deep learning startup acquisitions, and the rise of far more players in the autonomous driving space.

In 2017, I think we can expect to continue to see an influx of people and dollars into the field. The deep learning hardware market will become more competitive and more complex, with new specialized hardware taking on GPUs. Companies that were not traditionally associated with machine learning will build machine learning teams to streamline their business.

As the field grows, researchers will be able to diversify and populate previously niche areas, like AI safety, machine learning security and privacy, and economic effects of AI.

2. How can we get more women involved in this field?

This is a complex issue and I will comment on just one aspect of it where I have some experience. This year OpenAI ran the first-ever Self-Organizing Conference on Machine Learning. We made a major effort to boost participation from groups that have traditionally been underrepresented in machine learning, and we learned a lot in the process. For conference organizers, I can definitely suggest having a good code of conduct (we used one created by the Ada Initiative) that ensures everyone will feel welcomed. We also found that it's very important to do a lot of outreach. We sought out several people from underrepresented groups and explicitly invited them to the conference.

We found that people from underrepresented groups were less likely to be able to attend (due to not being able to get time off from work, obligations to care for family members, etc.) so this extra effort was necessary to have a more balanced conference. Finally, we gave out travel grants to members of these groups who would not be able to attend otherwise. Overall, we feel that these efforts greatly improved the conference for all participants.

Nikita Johnson, Founder, RE•WORK

1. At RE•WORK events over the course of 2016, both unsupervised learning and reinforcement learning featured more prominently in talks and discussions, from participating startups to the industry leaders in the field. I'm sure this will continue to advance further over the next year. In 2017, I am expecting to see further advancements in applying deep learning to understand and predict videos, working towards summarizing what happens in a video clip.

2. It's a subject we are all aware of at RE•WORK as we are currently an all-female team! To try to get more women involved in the field, we have created a series of 'Women in Machine Intelligence' dinners to provide a welcoming and supportive environment for women to present their latest work to their peers, but also for women to attend to meet other women working in the field. The attendees are always a mix of large established companies, startups and researchers, to provide an interesting networking environment for women looking for inspiration, new job roles, or new collaborative partners at the event. We'll continue to grow these types of events in 2017 and hope to see yet more women becoming role-models for the next generation interested in AI and data science!

Zachary Chase Lipton, PhD student in the Computer Science and Engineering department at the University of California, San Diego; Contributing editor at KDnuggets, blogger at Approximately Correct

So much has happened in 2016, it's hard to say where to begin. On the industry side, I would say that 2016 was the year that data science went beyond analytics. For a long time, machine learning academics have had big ideas about intelligent agents making real-time decisions in real-world settings. And we've had some experimental success and fancy mathematics to support these ideas. But when you looked at industry, most people clung to more mundane "analytics" applications. In this shallow view, machine learning algorithms are useful mainly for data visualization, building dashboards, basically decision support. For many years, a small cadre of major players like Google, IBM and Microsoft have integrated machine learning into live systems. But it's been rare. Over the last year, we've seen story after story of ML powering self-driving cars, neural machine translation, personal assistants, and more. I expect these trends to continue.

As in any year, research tended to lead practice, even within companies. While the major developments in live products all relied on supervised learning, this paradigm has several frustrating limitations. First, supervised learning models conditional probabilities P(Y|X), assuming data is observed, but not tampered with. Supervised learning tells us about correlation, but not causality. It tells us nothing about what might happen to Y when we take an action, i.e., P(Y|do(X)). Another limitation of supervised machine learning is that our most effective algorithms (deep learning) tend to rely on thousands or millions of labeled examples.
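The gap between P(Y|X) and P(Y|do(X)) shows up in even the smallest example. The sketch below assumes a hypothetical model with binary variables in which a confounder Z drives both X and Y while X has no effect on Y at all; all probabilities are made up. Conditioning on X (what a supervised learner estimates) suggests a strong effect, while intervening on X (computed here with the standard backdoor adjustment over Z) shows none.

```python
# Hypothetical causal model: Z -> X, Z -> Y, and no arrow from X to Y.
P_Z1 = 0.5  # P(Z = 1)


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x, z):
    # Z strongly influences X.
    p1 = 0.9 if z == 1 else 0.1
    return p1 if x == 1 else 1 - p1


def p_y1_given_zx(z, x):
    # Y depends only on Z; the argument x is deliberately ignored.
    return 0.8 if z == 1 else 0.2


def observational(x):
    """P(Y=1 | X=x): condition on having observed X = x."""
    joint = sum(p_z(z) * p_x_given_z(x, z) * p_y1_given_zx(z, x) for z in (0, 1))
    marginal = sum(p_z(z) * p_x_given_z(x, z) for z in (0, 1))
    return joint / marginal


def interventional(x):
    """P(Y=1 | do(X=x)): set X externally, so Z keeps its prior (backdoor adjustment)."""
    return sum(p_z(z) * p_y1_given_zx(z, x) for z in (0, 1))


print(round(observational(1), 2), round(observational(0), 2))    # 0.74 0.26
print(round(interventional(1), 2), round(interventional(0), 2))  # 0.5 0.5
```

A model trained only on observational pairs (X, Y) would learn the 0.74 versus 0.26 gap and wrongly predict that setting X = 1 raises Y.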

From a research perspective, I saw 2016 as a year in which the primary focus of the applied machine learning community shifted beyond supervised learning. Two developments stand out: deep reinforcement learning (DRL) and generative adversarial networks (GANs). While these ideas have roots in papers from 2013 and 2014, respectively, this year each gave birth to a large community of dedicated researchers. Deep reinforcement learning weds the representational power of deep neural networks with the reinforcement learning paradigm, in which an agent optimizes a policy to increase the reward signal it receives. While the methods were made famous on games like Atari and Go, we're now starting to see more research on deep reinforcement learning for practical applications. Generative adversarial networks are a creative new approach to unsupervised learning, developed by Ian Goodfellow. Initially GANs were used primarily for generative modeling, but new papers extend them to tasks like auto-encoding and semi-supervised learning.
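The reinforcement learning paradigm described above (an agent adjusting a policy to increase its reward) can be made concrete with a small policy-gradient sketch. To keep it self-contained, this swaps the deep network for a plain table of logits and the Atari or Go environment for a hypothetical two-armed bandit with made-up payout rates; it is the textbook REINFORCE update with a running-average baseline, not the method of any specific paper mentioned here.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline on a Bernoulli multi-armed bandit.

    arm_means are hypothetical payout probabilities; returns the final policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # one logit per action (stand-in for a neural net)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < arm_means[action] else 0.0
        advantage = reward - baseline
        # Gradient of log softmax w.r.t. logit k: 1[k == action] - probs[k].
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi
        # Update the baseline after use so it never depends on the current action.
        baseline += 0.05 * (reward - baseline)
    return softmax(theta)


policy = train_bandit([0.2, 0.8])
print(policy)  # probability mass should concentrate on the better-paying arm
```

Subtracting the baseline leaves the expected gradient unchanged but reduces its variance, which is why it is a standard companion to REINFORCE.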

Looking forward to 2017, I see a few big trends. For one, there's a sense in industry and academia that dialogue systems may be the next big area to fall. With recent progress in speech recognition and sequence tagging, the major primitives are already in place. We should note, there are still some big open questions. What should the objectives of an artificial interlocutor be? But for constrained domains, I expect rapid progress. I also expect to see more advanced efforts to apply DRL to real-world problems. A final prediction is that machine learning will start to realize its promise in the medical domain. For some problems, like radiology, the basic technology should already be in place and the remaining hurdles may mostly be HCI and regulatory issues. For other problems, like predicting treatment response, I expect to see researchers start to think seriously about reinforcement learning for medicine.

The lack of women and traditionally underrepresented racial groups in machine learning, and computer science broadly, is a question I think about constantly. I'd like to help in any way I can. And yet I'm also aware that as a Caucasian man, there may be certain limitations on my ability to be a leader in this movement. Perhaps the biggest thing we can do is to give the women who are entering this field and embarking on their careers every possible opportunity to succeed. I think this includes some amount of affirmative action. It also includes being critical of the selection criteria we use. If CS departments prioritize students with many years of programming experience, this could reinforce the discrimination that keeps women out of CS in high school.

Another thing we can do is support the organizations that are already working well. Women in Machine Learning (WiML) is a fantastic group that has organized workshops co-located with NIPS. Companies and universities should sponsor their events and support the great work they do to promote women in our field and showcase their work. This year, I attended as the co-author of a paper by my collaborator at UCSD, Subarna Tripathi. As stark as the demographic numbers seem, I'm optimistic about gender diversity in machine learning. Already, there are so many prominent ladies at the top of the field that everyone, regardless of gender, can look up to. Anima Anandkumar, Joelle Pineau, Jennifer Chayes, Finale Doshi-Velez, Anca Dragan, Fei-Fei Li, Suchi Saria are a few that pop out at me. Among my peer group I've also been grateful to learn a lot from Been Kim, Hanie Sedghi, Subarna Tripathi, and my collaborator on approximatelycorrect.com, Victoria Krakovna.

Hilary Mason, Founder at Fast Forward Labs

At Fast Forward Labs we do applied machine learning research and data science advising for a variety of clients with interesting opportunities. This means I have a few different perspectives on the interesting developments in the field.

From a purely algorithmic perspective, I'm very excited about probabilistic programming, and how it is becoming useful for combining human knowledge with data to learn something rigorous about the world. I'm also optimistic about the continued pace of expanding possibilities for using neural networks, particularly in constrained compute environments or for complex NLP problems.
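Probabilistic programming generalizes a basic idea: encode prior knowledge as a probability distribution and update it with data. The simplest hand-worked instance is the conjugate Beta-binomial update sketched below. It is not a probabilistic programming system, and the prior Beta(2, 18) (a hypothetical expert's belief that some rate is around 10%) and the data (30 successes in 100 trials) are invented for illustration.

```python
from fractions import Fraction


def beta_binomial_posterior(prior_alpha, prior_beta, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives Beta(a + s, b + f)."""
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact as a fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical expert prior: rate around 10%, worth about 20 pseudo-observations.
prior = (2, 18)
posterior = beta_binomial_posterior(*prior, successes=30, failures=70)
print(posterior, beta_mean(*posterior))  # (32, 88) 4/15
```

The posterior mean, 32/120 = 4/15 (about 0.27), sits between the prior guess of 0.1 and the raw data rate of 0.3, weighted by how much evidence each side carries.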

From an application perspective, we're still at the very beginning of meaningful automation. In 2017 we'll see significant automation in very narrow domains. I expect much of this will be for high-value problems, so think of finance and medicine, not consumer applications, just yet.

Finally, there's a ton of interesting work happening around the practice of data science, machine learning, and AI. What does it mean to be a data scientist? What kinds of processes do we need to effectively explore data and tie those into meaningful products? How do we build new on-ramps into the field? What will data science even look like in ten years, and what tools do we need to build to get there?

One of the things that I love about working in data science is that people come to the practice from a variety of different backgrounds, both academically and culturally, and that makes working in data science richer than other, more rigidly defined, disciplines. Diversity isn't just a matter of gender, either. Nor is it a pipeline problem. There's a textbook of content that could go here, and I'm hardly an expert, but this is just common sense: If you want to see more diversity in the set of people working in data science and machine learning, offer opportunities to a more diverse set of people. They'll be amazing. And when you invite them into your environment, do your best to make it welcoming.

Michael O'Connell, Chief Analytics Officer, TIBCO

Some key developments in 2016 included:

  • More emphasis on “representative data”, rigorous analytics and data science to identify insights and understand business issues.
  • Mainstreaming of machine learning and predictive analytics for business, customer and engineering applications.
  • Rise in deep learning, beyond the big internet companies, especially for some specialized applications e.g. fraud in the banking system.
  • Continued rise in engineering analytics – especially in IIoT applications, where anomaly detection is foundational.
  • Significant uptick in “systems of insight”, where insights from analytics are transformed into notifications, alerts and actions on the business.
  • Continued migration to governed data discovery across the corporate landscape – providing self-service, but with guidance and best practices; along with performance, governance and security.
  • Beginnings of hybrid cloud adoption with scalable tenant resources and contextual routing, along with hybrid data and elastic compute engines.
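As a concrete instance of the anomaly detection described above as foundational for IIoT, here is a minimal rolling z-score detector for a stream of sensor readings. The window size, the 3-sigma threshold, and the readings are illustrative assumptions, not values from the source, and real deployments typically use more robust methods.

```python
from statistics import mean, stdev


def zscore_anomalies(readings, window=10, threshold=3.0):
    """Return indices whose reading lies more than `threshold` standard deviations
    from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        past = readings[i - window:i]
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical temperature sensor: a repeating normal pattern with one spike.
normal = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 19.9, 20.3, 19.7, 20.0]
readings = normal * 3
readings[25] = 35.0
print(zscore_anomalies(readings))  # [25]
```

Note that the spike inflates the standard deviation of the next few windows, so this simple detector is briefly less sensitive right after an anomaly.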

For 2017, I see more action in all these areas, especially in “systems of insight” – turning insights from visual and predictive analytics into actions. This includes real-time streaming analytics for rapid intervention and action, at moments of truth in business processes. I also see less of a boundary between data preparation and visual analytics, as data wrangling and insight discovery become more intertwined.

My TIBCO Data Science team currently comprises ~40% women. I've made no deliberate attempt to favor women in the hiring process. On analysis, I've been drawn to the work ethic, passion, productivity and professionalism of the women on my team.

I like the dynamic of men and women on my team. It mirrors our day to day world, and I think it helps bring forth a rich spectrum of ideas. My team is also heterogeneous on age, race, industry background and skills across stats, computing, data, viz, software development and presentation. The entire team enjoys the mentoring and collaboration across these dimensions.

Data Science requires strength in analytics, computing, data and business. It's enormously satisfying to see these dimensions addressed across the team in virtual, collaborative workstreams; creating value, impact and action in our software, solutions and customer deployments.

Elena Sharova, Computer scientist with specialization in machine learning; Data scientist in financial risk measurement

In my view 2016 has been a very good year for AI, Data Science (DS) and Machine Learning (ML). Firstly, the breadth of industries and areas where AI, DS and ML are being applied has seen considerable growth. From art and literature to science and business, I am seeing new applications almost on a daily basis. Secondly, awareness of this field and its potential has greatly increased; more and more people are looking to study it or change careers. Finally, 2016 has delivered timely lessons of caution about the degree of reliance on models.

The key trends in 2017 should continue along the same lines: finding new areas for application and increased emphasis on validation of modelling assumptions.

As in Computer Science or other technical fields, it may take some time to achieve a good gender representation. However, I am convinced that DS and ML will attract more women in the near future. This is because a successful data scientist requires skills such as storytelling and an eye for detail which often come naturally to women. Again, the breadth of application will ensure that female data scientists can pick a niche, like data journalism or data science applications in psychology and sociology, if they choose to do so. I have personally been inspired by both female and male role models, and as long as we maintain supportive and collaborative work and study environments, the gender imbalance will go away.

Tamara Sipes, Principal Data Scientist at Optum/UnitedHealth Group

Key Trends:

i) I see data science productivity tools such as the R caret package, DataRobot and similar products gaining more popularity. These allow data scientists to quickly test a variety of methods, parameters, and sampling techniques and compare them using a predetermined evaluation metric.
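Tools like caret and DataRobot automate this kind of comparison; the core loop is to score every candidate on the same cross-validation folds with one metric chosen in advance. The stdlib-only sketch below is not the API of either tool: the two candidate models (a mean predictor and a least-squares line), the interleaved 5-fold split, mean squared error as the metric, and the synthetic data are all illustrative choices.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate: ordinary least squares for y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def k_fold_mse(fit, xs, ys, k=5):
    """Held-out mean squared error over k interleaved folds (index i goes to fold i % k)."""
    errors = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return mean(errors)


def compare(candidates, xs, ys, k=5):
    """Score every candidate on identical folds; return the best name and all scores."""
    scores = {name: k_fold_mse(fit, xs, ys, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic data: a line plus an alternating +/-0.5 wiggle.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]
best, scores = compare({"mean": fit_mean, "linear": fit_line}, xs, ys)
print(best, scores)
```

Fixing the folds and the metric before looking at results is what makes the comparison fair; otherwise the "winner" can simply be the model that got the luckiest split.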

ii) There are many indicators that Deep Learning and Ensemble Modeling will continue to be utilized more in the day to day applications, not just in data science competitions.

iii) Dealing with streaming data and evolving or changing data will likely be the focus in the near future as well. The models built on historical data that has since evolved need to be refreshed and fine-tuned accordingly.
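One simple way to act on the point about refreshing models is to monitor error on a sliding window of recent data and refit when it drifts. The sketch below assumes a deliberately trivial model (a constant-level predictor), a hypothetical mean-absolute-error tolerance, and made-up data; production systems would use richer models and formal drift tests.

```python
from collections import deque
from statistics import mean


class DriftAwareMeanModel:
    """Constant-level predictor that is refit on recent data when its window error grows."""

    def __init__(self, history, window=20, tolerance=2.0):
        self.level = mean(history)          # initial fit on historical data
        self.recent = deque(maxlen=window)  # sliding window of the newest observations
        self.tolerance = tolerance          # hypothetical max mean absolute error allowed
        self.refreshes = 0

    def observe(self, y):
        self.recent.append(y)
        window_error = mean(abs(v - self.level) for v in self.recent)
        # Only judge drift once the window is full, to avoid reacting to a few points.
        if len(self.recent) == self.recent.maxlen and window_error > self.tolerance:
            self.level = mean(self.recent)  # refresh: refit on recent data only
            self.refreshes += 1

    def predict(self):
        return self.level


model = DriftAwareMeanModel(history=[10.0] * 100)
for y in [20.0] * 50:  # the process has shifted since the historical data was collected
    model.observe(y)
print(model.predict(), model.refreshes)  # 20.0 1
```

Refitting only on a trigger, rather than after every point, keeps the model stable when the data is merely noisy and lets it adapt when the process has genuinely changed.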

Including women: We can start by reaching out to middle and high school level students and building up from there. Mini seminars or motivating talks by well-established female data scientists or researchers in the field would be my recommendation.