A post describing the key differences between Pandas and Spark's DataFrame format, including specifics on important regular processing features, with code samples.
With all of the success that deep learning is experiencing, the detractors and cheerleaders can be seen coming out of the woodwork. What is the real validity of deep learning, and is it simply hype?
The integration of TensorFlow with Spark leverages the distributed framework for hyperparameter tuning and model deployment at scale. Both time savings and improved error rates are demonstrated.
Feature engineering plays a major role in solving data science problems. Here, we will learn about feature hashing, or the hashing trick, a method for turning arbitrary features into a sparse binary vector.
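As a rough illustration of the hashing trick mentioned above, here is a minimal sketch, not the linked post's own code. The function names `hash_features` and `to_dense`, the bucket count of 16, and the choice of MD5 as a stable hash are all assumptions made for this example.

```python
import hashlib


def hash_features(tokens, n_buckets=16):
    """Map arbitrary string features to the active indices of a binary vector.

    Each token is hashed to a bucket in [0, n_buckets). The sparse
    representation is the sorted list of buckets that are switched on.
    Different tokens may collide in the same bucket; that is an accepted
    trade-off of the technique.
    """
    active = set()
    for tok in tokens:
        # MD5 is used only because it is stable across runs (assumption);
        # Python's built-in hash() is randomized per process for strings.
        digest = hashlib.md5(tok.encode("utf-8")).hexdigest()
        active.add(int(digest, 16) % n_buckets)
    return sorted(active)


def to_dense(indices, n_buckets=16):
    """Expand the sparse index list into a dense 0/1 list for inspection."""
    vec = [0] * n_buckets
    for i in indices:
        vec[i] = 1
    return vec


sparse = hash_features(["color=red", "size=large", "city=Paris"])
dense = to_dense(sparse)
```

The vector length is fixed up front, so unseen feature values need no dictionary update; the cost is that collisions become more likely as the number of buckets shrinks.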
A detailed explanation of one of the most used machine learning algorithms, k-Nearest Neighbors, and its implementation from scratch in Python. Enhance your algorithmic understanding with this hands-on coding exercise.
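For a flavor of what a from-scratch k-Nearest Neighbors classifier can look like, here is a minimal sketch in plain Python. It is not the linked post's implementation; the function names, the Euclidean distance choice, and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    scored = [
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, then keep the labels of the k closest points.
    scored.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in scored[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]

near_first_cluster = knn_predict(points, labels, (2, 2), k=3)       # "a"
near_second_cluster = knn_predict(points, labels, (8.5, 8.5), k=3)  # "b"
```

This brute-force version compares the query against every training point, which is fine for small data but scales linearly with the training set size.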
With mathematical rigor and narrative flair, Adam Kucharski reveals the tangled history of betting and science. The house can seem unbeatable. In this book, Kucharski shows us just why it isn't. Even better, he shows us how the search for the perfect bet has been crucial for the scientific pursuit of a better world.
R vs Python for Data Science: The Winner is ...; 60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning; Top 20 Python Machine Learning Open Source Projects; 50+ Data Science and Machine Learning Cheat Sheets.
Are you interested in massive amounts of data for research? Yahoo has just released the largest-ever machine learning dataset to the research community.
Research Leaders in Data Science and Big Data reflect on the most important research advances in 2015 and the key trends expected to dominate throughout 2016.
The top 10 deep learning projects on GitHub include a number of libraries, frameworks, and educational resources. Have a look at the tools others are using, and the resources they are learning from.
An overview of attention mechanisms and memory in deep neural networks and why they work, including some specific applications in natural language processing and beyond.
There are many deep learning resources freely available online, but it can be confusing knowing where to begin. Go from a vague understanding of deep neural networks to knowledgeable practitioner in 7 steps!
We often look back at the past year, or at the overall history of rare events, and try to extrapolate the future odds of the same rare event from that record. We illustrate here that rare past events are of no use in understanding how rare the same events will be in the future!
A well-built resume is key to getting through the first door in the process of getting hired as a Data Scientist. Learn more about how to present yourself as a true DS and which pitfalls to avoid.
Jake Porway is a machine learning and technology enthusiast and the founder of DataKind, a nonprofit that helps organizations use the power of data science in the service of humanity. He will do a Reddit AMA on Jan 13, 2016.
We witness an explosion of Big Data in finance, biology, medicine, marketing, and other fields. This book describes the important statistical ideas for learning from large and sparse data in a common conceptual framework.
Hiring Data Scientists is no easy job, particularly when there are plenty of posers. Here is a handy list of questions to help separate the wheat from the chaff.
There are only five questions machine learning can answer: Is this A or B? Is this weird? How much/how many? How is it organized? What should I do next? We examine these questions in detail and what they imply for data science.