A great introductory post from DataRobot on getting started with data science in the Python ecosystem, including cleaning data and performing predictive modeling.
Strata + Hadoop World is the leading event on how big data and ubiquitous, real-time computing are shaping the course of business and society. Win a free KDnuggets pass to Strata + Hadoop World New York City.
Using the Google Places API and IMDb API, we selected movie locations in The Golden City which every movie fan should visit while they are in town, and optimized sightseeing by solving the travelling salesman problem.
Data science may be a relatively recent buzzword, but the collection of tools and techniques to which it refers comes from a broad range of disciplines. Physics has a wealth of concepts to learn from, as evidenced in this piece.
Check out part 2 of this tutorial on building chatbots with deep neural networks. This part gets practical, using Python and TensorFlow for the implementation.
Statistics can often be the most intimidating aspect of data science for aspiring data scientists to learn. Gain some personal perspective from someone who has traveled the path.
We outline preprocessing steps for finding, removing, and cleaning data to prepare it for machine learning and how tools like MATLAB can help with data exploration, identification of key traits, and communicating the findings.
Finally, a #TensorFlow book for humans; Great math-free simple intro explanation video: Deep Learning Demystified; Does #sentiment analysis work? A tidy analysis of Yelp reviews; JupyterLab: the next generation of the #Jupyter Notebook
Discover a set of techniques and methodologies to analyze and explore telecommunications data in order to improve business and operational performance. This new course debuts Sep 14 at Analytics Experience 2016 in Las Vegas.
Learn the latest business practices, concepts, methodologies and techniques in advanced analytics, data mining, survival analysis, explaining analytics to decision makers, fraud detection, and more with the SAS Business Knowledge Series.
PAW Government provides the best information on applying predictive analytics to government with a special track that includes technical training on the most relevant tools and concepts. Get an extra KDnuggets discount with code KDN150.
Analysts are often on the lookout for patterns, and often end up relying on spurious ones. This post looks at some spurious patterns in univariate, bivariate & multivariate analysis.
A series of videos and write-ups covering the basics of data science for beginners. This first video is about the kinds of questions that data science can answer.
Sharpen your edge with an online master’s degree in data science and analytics (MS) from the University of Missouri Informatics Institute. Apply today!
This post covers predicting award counts by the United States in an international beer competition. Exploratory data analysis and Bayesian methods are also covered.
Understand emerging big data trends, develop new technical skills through hands-on workshops, analyze multiple industry case studies, and learn emerging best practices in big data. Use code KDNUGGETS to save.
Check out the first of a 3-part introductory series on machine learning in Python, fueled by the Titanic dataset. This is a great place to start for a machine learning newcomer.
This discussion examines what it is about deep learning architectures that allows them to scale, and addresses some assumptions that often inhibit an understanding of this topic.
If you have heard about the Internet of Things many times by now, it's time to join the conversation. Explore the many open source tools & projects related to the Internet of Things.
Become irreplaceable at Level Bootcamp by learning how to use data to solve real problems. Get 15% KDnuggets discount for upcoming programs in Boston, Seattle, Charlotte, Silicon Valley, and online.
Why Big Data is in Trouble: They Forgot About Applied Statistics; In Deep Learning, Architecture Engineering is the New Feature Engineering; 5 Big Data Projects You Can No Longer Overlook; What Has Pokemon Got To Do With Big Data?
The millions of people around the world playing Pokémon last weekend (and crashing the servers on a regular basis) showed me a glimpse of the future. There may well be an opportunity for real-time Big Data - I will give you a glimpse.
The final installment of this comprehensive overview on building an end-to-end data science portfolio project focuses on bringing it all together, and concludes the project quite nicely.
When it comes to business value and ROI, does machine learning live up to the claims? We’ll explore a pure machine learning approach through the lens of a typical enterprise use case.
Continuum Analytics CTO Peter Wang will show how you, an analytics leader, and your team can continuously leverage the latest innovations in data, analytics and computation by joining the big data party in the Open Data Science tent.
There are lots of flame wars involving different data science and analytics tools... but this isn't one of them. Check out the quantitative results and analysis of a Burtch Works survey on the subject.
This is a fast-paced, vendor-agnostic, technical overview of the Big Data landscape. No prior knowledge of databases or programming is assumed. Use code KDNUGGETS to save - extra discount if you register by July 31.
Through our project-based graduate program, you'll get an ethical approach to data science, helping businesses untangle the complexities of data collection and analytics to build a better business and a more equitable society. Now that's a beautiful thing.
The second part of this comprehensive overview on building an end-to-end data science portfolio project concentrates on data exploration and preparation.
Learn about some interesting projects featured at SciPy 2016, brought to you by an attendee who put in the work to bring you this great list of projects.
Algorithmia introduces a solution for hosting and distributing locally-trained deep learning models on Algorithmia using GPUs in the cloud, where they become smart API endpoints for other developers to use.
Check out 5 Big Data projects that you are not likely to have seen before, but which may be useful to you, and perhaps even scratch an itch you didn't know you had.
Dataquest's founder has put together a fantastic resource on building a data science portfolio. This first of three parts lays the groundwork, with subsequent posts over the following 2 days. Very comprehensive!
Bayesian #MachineLearning, Explained; JupyterLab: the next generation of the #Jupyter Notebook; On the importance of democratizing #ArtificialIntelligence
Here's a curated short list of interesting and insightful talks to watch from SciPy 2016 to help guide your search through the volume of great video material emerging from the conference.
Build in-demand skills for the growing analytics field with the Northwestern University Master of Science in Predictive Analytics degree, completely online.
This tutorial, which takes the form of a set of IPython notebooks, introduces the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects.
This "classic" (but very topical and certainly relevant) post discusses issues that Big Data can face when it forgets, or ignores, applied statistics. As great a discussion today as it was 2 years ago.
Here is a collection of introductory predictive analytics terms and concepts, presented for the newcomer in a straightforward, no-frills definition style.
This week is your last chance to get the Best Price for the O'Reilly Artificial Intelligence Conference happening in New York September 26-27. Register with your KDnuggets discount code now!
Top Machine Learning MOOCs and Online Lectures; Bayesian Machine Learning, Explained; 10 Algorithm Categories for A.I., Big Data, and Data Science; 5 Deep Learning Projects You Can No Longer Overlook; The Hard Problems AI Can't (Yet) Touch
In this wide-ranging interview, we discuss the role of IBM global chief data officer, 4 key ideas of cognitive computing, risks of AI, IBM Data Science Experience, healthcare, basketball, sports analytics, and more.
Topic modeling is a great way to get a bird's eye view of a large document collection using machine learning. Here are 3 ways to use the open source Python tool Gensim to choose the best topic model.
This is a summary of the basic principle behind a new paper on multiple test correction for streams and cascades of statistical hypothesis tests, showing how to strictly control the risk of making a mistake over a series of tests and draw appropriate conclusions.
An interesting excerpt from Burtch Works' recently published Burtch Works Study: Salaries of Data Scientists 2016, focusing on trends disrupting the data science market.
Dr. Philip S. Yu wins ACM KDD Innovation Award for his influential research and scientific contributions on mining, fusion and anonymization of big data.
With a focus on leveraging algorithms and balancing human and AI capital, here are the top 10 algorithm categories used to implement A.I., Big Data, and Data Science.
Prof. Wei Wang wins ACM SIGKDD 2016 Service Award for her significant technical contributions to the principles, practice and application of data mining and for her outstanding services to society and the data mining community.
The Impetus Data Warehouse Workload Migration product is a proven, cost-effective, and low-risk solution to offload a traditional data warehouse to a Big Data warehouse. Contact us for a proof-of-concept.
Statistical Data Analysis in #Python (#Jupyter Notebooks); Modern Pandas: idiomatic Pandas notebook collection; New (free) book by @rdpeng: #rstats Programming for #DataScience
Want to know about Bayesian machine learning? Sure you do! Get a great introductory explanation here, as well as suggestions where to go for further study.
Learn examples of success with text exploration, what engineers and scientists can (and should) do with text data, and the consequences of collecting data and doing nothing with it.
This post is a summary of Serban, et al. "A Survey of Available Corpora for Building Data-Driven Dialogue Systems," which is of increasing relevance given the recent state of conversational AI.
Successful analytics in the big data era does not start with data and software, but with hands-on, immersive training and goal-driven strategy - get it from The Modeling Agency online in August.
This post evaluates four different strategies for solving a problem with machine learning, where customized models built from semi-supervised "deep" features using transfer learning outperform models built from scratch, and rival state-of-the-art methods.
Learn how to get started with predictive modeling and overcome strategic and tactical limitations that cause data mining projects to fall short of their potential. Next webinar is July 14.
Join The Big Data Channel and Innovation Enterprise for three summits September 8 & 9 in Boston, where KDnuggets readers get a 10% discount. Register now!
A unique opportunity to solve complex real-world big data challenges for the China mobile market - predict users' demographic characteristics based on their app usage, geolocation, and mobile device properties.
There are a number of "mainstream" deep learning projects out there, but many more niche projects flying under the radar. Have a look at 5 such projects worth checking out.
Data Mining History: The Invention of Support Vector Machines; Storytelling: The Power to Influence in Data Science; Support Vector Machines: A Simple Explanation; Big Data, Bible Codes, and Bonferroni
It's tempting to consider the progress of AI as though it were a single monolithic entity, advancing towards human intelligence on all fronts. But today's machine learning only addresses problems with simple, easily quantified objectives.
Learn about the NYU Stern MS in Business Analytics, the only premier global degree program of its kind designed for senior level professionals focused on the intersection of business strategy and data science.
This certificate program brings together the computational, analytical, and communication skills and the tools needed to analyze big data to make better business decisions. Classes run Sep 8 - Dec 15 in Wilmington, DE.
A comprehensive step-by-step guide to designing, setting up, executing and deploying data mining techniques in marketing. Use code VBM93 for 20% discount.
Here is a preview of the Data Science Summit, July 12-13 in San Francisco, where you can meet quality people and hear exciting talks like the 9 described here. Get 20% off with the code SFDATASCIENCE.
This discussion will focus on 2 particular statistical issues to be on the lookout for in your own work and in the work of others mining and learning from Big Data, with real-world examples emphasizing the importance of statistical processes in practice.
This post explains what’s new in the 2.0 version of the FICO Decision Management Suite, and how it can be used by data scientists and others to create stronger customer relationships and provide strategic competitive advantage.
This tutorial uses the recently released Genie (an acronym for General Evolving Networked Intelligence Engine) platform to learn from P2P (peer-to-peer) loan data. Experts and non-experts alike can leverage Genie to analyze Big Data, recognize objects, events, and patterns, and more.
In the Intelligent World Hackathon, running now through August 2, you can be one of the first developers to access GE smart LED streetlight network data and build urban apps on Predix, GE’s new IIoT data analytics platform.
Read some impressions from a visit to Strata Silicon Valley in March. The focus is on integration of data science and machine learning tools, as well as the simplification of related processes.
Logit Academy's full-time 12-week hands-on program is taught by data scientists from top institutions in Southern California, including UCLA, USC and Caltech. Register now for the next session, which begins Sep 19.
The #BigData Ecosystem is Too Damn Big!; A Practical Introduction to #DeepLearning with Caffe and #Python; What do Postgres, Kafka, and Bitcoin have in common?
Data scientists need to share results, which is different than talking shop with other data scientists. Read about influencing people and telling stories as a data scientist.
In the previous article, we looked at techniques to compare categorical variables with the help of an example. In this article, we shall look at techniques to compare variables of mixed type, i.e. numerical and categorical variables together.
Everyone wants to leverage analytics, but should everyone dive into the deep end right away? Heed some sensible advice on getting started with analytics, and assessing the true upfront investment.
This third part of an introduction to linear regression moves past the topics covered in the first to discuss linearity, normality, outliers, and other topics of interest.
Microsoft Research Machine Learning Videos; Free Machine Learning Training Pathway; Andrew Ng's New Book; Coursera Removing Free Online Courses; Free Books!
Positions at Center for Data Science and Public Policy at U. Chicago; Business Analytics Lecturer at U. Iowa; IBM Social Good Fellow; Data quality postdoc at McMaster U; Asst. Prof. of Marketing at Yale, and more.
The Big Data Ecosystem is Too Damn Big; 5 More Machine Learning Projects You Can No Longer Overlook; 7 Steps to Mastering Machine Learning With Python; Machine Learning Trends and the Future of Artificial Intelligence
The story starts in Paris in 1989, when I benchmarked neural networks against kernel methods, but the real invention of SVMs happened when Bernhard decided to implement Vladimir Vapnik's algorithm.
This post discusses 3 particular tutorial sessions of impact from the recent ICML 2016 conference held in New York. Check out some innovative ideas on Deep Residual Networks, Memory Networks for Language Understanding, and Non-Convex Optimization.
We introduce the concept of topic modelling and explain two methods: Latent Dirichlet Allocation and TextRank. The techniques are ingenious in how they work – try them yourself.