The Myth of Model Interpretability
Deep networks are widely regarded as black boxes. But are they truly uninterpretable in any way that logistic regression is not?
on Apr 27, 2015 in Deep Learning, Deep Neural Network, Interpretability, Support Vector Machines, Zachary Lipton
New Hybrid Rare-Event Sampling Technique for Fraud Detection
The proposed hybrid sampling methodology may prove useful when building and validating machine learning models for applications where the target event is rare, such as fraud detection.
on Apr 26, 2015 in Bootstrap sampling, Fraud Detection, Sampling
Data Mining: New Comprehensive Textbook by Charu Aggarwal
This comprehensive data mining textbook covers the field from the basics to advanced topics, along with their applications, and can be used for both introductory and advanced data mining courses.
on Apr 23, 2015 in Book, Charu Aggarwal, Data Mining
Top 10 R Packages to be a Kaggle Champion
Kaggle top ranker Xavier Conort shares insights on the “10 R Packages to Win Kaggle Competitions”.
on Apr 21, 2015 in Kaggle, R Packages, random forests algorithm, Success, SVM, Text Analysis, Xavier Conort
Algorithmia Tested: Human vs Automated Tag Generation
Algorithmia, the marketplace for algorithms, can host APIs for a wide range of text analytics and information retrieval tasks. This case study uses automatic post tagging to demonstrate the platform's effectiveness and ease of use.
on Apr 21, 2015 in Algorithmia, API, Grant Marshall, Information Retrieval, Python, Text Analytics
Algorithmia: Building a web site explorer in 5 easy steps
We show how to use Algorithmia for quickly building a functional web site explorer in 5 steps: GetLinks, PageRank, Url2text, Summarizer and AutoTag.
on Apr 20, 2015 in Algorithmia, API, Page Rank, Search Engine, Web Mining
Cartoon: A solution for Data Scientists' allergies caused by Big Data
With allergies on the rise and a big trend toward gluten-free everything, a new KDnuggets cartoon envisions a possible solution for Data Scientists' allergies.
on Apr 17, 2015 in Allergy, Big Data, Cartoon, Data Scientist
Data Science 101: Preventing Overfitting in Neural Networks
Overfitting is a major problem for Predictive Analytics, and especially for Neural Networks. Here is an overview of key methods to avoid overfitting, including regularization (L2 and L1), max-norm constraints, and dropout.
on Apr 17, 2015 in Neural Networks, Nikhil Buduma, Overfitting, Regularization
Interview: Ksenija Draskovic, Verizon on Dissecting the Anatomy of Predictive Analytics Projects
We discuss Predictive Analytics use cases at Verizon Wireless, advantages of a unified data view, model selection and common causes of failure.
on Apr 15, 2015 in Customer Intelligence, Interview, Ksenija Draskovic, Optimization, Predictive Analytics, Project Fail, Use Cases, Verizon
Awesome Public Datasets on GitHub
A long, categorized list of large datasets (available for public use) to try your analytics skills on. Which one would you pick?
on Apr 6, 2015 in Datasets, Finance, GitHub, Government, Machine Learning, NLP, Open Data, Time series data
Hadoop as a Service: 18 Cloud Options
Hadoop as a service in the cloud makes big data applications and projects easier to approach, and each of these 18 platforms offers its own approach.
on Apr 2, 2015 in AWS, Big Data Services, Cloud, Cloudera, Hadoop, Hortonworks, Information Management, MapR, Microsoft Azure
Computing Platforms for Analytics, Data Mining, Data Science
The poll results suggest a split between a majority of data miners and data scientists, who work with growing but still "PC-size" data in the GB range, and a smaller group of Big Data analysts who work with cloud-sized data. Cloud computing, Unix, and especially Mac gained in popularity.
on Apr 1, 2015 in Apple, Cloud Computing, Poll