Matthew Mayo (@mattmayo13) holds a Master's degree in computer science and a graduate diploma in data mining. As Managing Editor, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.
This post takes a different approach to constructing pipelines. The title gives the difference away: instead of hand-crafting pipelines, tuning hyperparameters, and performing model selection ourselves, we will automate these processes.
In this post, we will use grid search to optimize models built from a number of different types of estimators; we will then compare the models and properly evaluate the best hyperparameters that each has to offer.
Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which attempts to optimize model hyperparameter combinations.
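As a rough illustration of the idea (not code from the post itself), grid search amounts to scoring every combination of candidate hyperparameter values and keeping the best one. The sketch below uses only the standard library; the toy dataset, the threshold-rule "model", and the names `predict`, `accuracy`, `grid_search`, and `PARAM_GRID` are all hypothetical stand-ins. For brevity it scores on the same data it searches over, whereas a real workflow would normally use cross-validation, for example through a library tool such as scikit-learn's `GridSearchCV`.

```python
from itertools import product

# Toy 1-D dataset of (feature, label) pairs. Purely illustrative.
DATA = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (3.5, 1)]


def predict(x, threshold, flip):
    """Hypothetical stand-in model: a threshold rule with two hyperparameters."""
    label = 1 if x > threshold else 0
    return 1 - label if flip else label


def accuracy(params, data):
    """Fraction of examples the rule labels correctly under the given hyperparameters."""
    correct = sum(predict(x, **params) == y for x, y in data)
    return correct / len(data)


def grid_search(param_grid, data):
    """Score every combination in the grid and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(params, data)
        # Keep the first combination that achieves the highest score.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Assumed candidate values for each hyperparameter (3 x 2 = 6 combinations).
PARAM_GRID = {"threshold": [1.0, 2.0, 3.0], "flip": [False, True]}

best_params, best_score = grid_search(PARAM_GRID, DATA)
print(best_params, best_score)
```

The cost grows with the product of the candidate list lengths, which is why grids are usually kept small or restricted to the hyperparameters that matter most.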
Third-year Ph.D. student David Abel, of Brown University, was in attendance at NIPS 2017, and he laboriously compiled and formatted a fantastic 43-page set of notes for the rest of us. Get them here.
As we bid farewell to one year and look to ring in another, KDnuggets has solicited opinions from numerous Machine Learning and AI experts as to the most important developments of 2017 and their 2018 key trend predictions.
As we bid farewell to one year and look to ring in another, KDnuggets has solicited opinions from numerous Big Data experts as to the most important developments of 2017 and their 2018 key trend predictions.
Do you assume that deep learning is only being used for toy problems and in self-learning scenarios? This post includes several firsthand accounts of organizations using deep neural networks to solve real-world problems.
Recently we had a look at a framework for textual data science tasks in their totality. Now we focus on putting together a generalized approach to attacking text data preprocessing, regardless of the specific textual data science task you have in mind.
Wikipedia is a rich source of well-organized textual data, and a vast collection of knowledge. What we will do here is build a corpus from the set of English Wikipedia articles, which is freely and conveniently available online.