How to Compute the Statistical Significance of Two Classifiers’ Performance Difference
To determine whether a result is statistically significant, a researcher calculates a p-value: the probability of observing an effect at least as extreme as the one actually observed, given that the null hypothesis is true. Here we demonstrate how to use it to test whether the performance difference between two models is significant.
on Mar 30, 2016 in Classifier, Cross-validation, Model Performance, Statistical Significance
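As a rough illustration of the idea (not the linked article’s own code), one common way to get such a p-value from cross-validation is a paired permutation (sign-flip) test on per-fold scores. The sketch below assumes both classifiers were evaluated on the same k folds; the function name and the fold accuracies are hypothetical. Because cross-validation training sets overlap, fold scores are not fully independent, so treat the resulting p-value as approximate.

```python
import itertools
from statistics import mean


def paired_permutation_pvalue(scores_a, scores_b):
    """Two-sided exact paired permutation (sign-flip) test.

    Hypothetical helper. Under the null hypothesis that both classifiers
    perform equally well, each per-fold difference is equally likely to be
    positive or negative, so we enumerate every sign pattern.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    extreme = 0
    total = 0
    # 2**k sign patterns: fine for typical k = 5 or 10 folds.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        stat = abs(mean(s * d for s, d in zip(signs, diffs)))
        # Small tolerance so float rounding does not drop ties.
        if stat >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical per-fold accuracies of two classifiers on the same 10 folds.
acc_a = [0.91, 0.89, 0.92, 0.90, 0.88, 0.93, 0.90, 0.91, 0.89, 0.92]
acc_b = [0.88, 0.87, 0.90, 0.89, 0.86, 0.90, 0.88, 0.89, 0.88, 0.90]
print(paired_permutation_pvalue(acc_a, acc_b))
```

With k folds the smallest attainable two-sided p-value is 2 / 2^k, so 5 folds can never reach p < 0.05; that is one reason 10 folds (or repeated cross-validation) is often preferred for this kind of comparison.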
100 Active Blogs on Analytics, Big Data, Data Mining, Data Science, Machine Learning
Stay on top of your data science skills game! Here’s a list of about 100 of the most active and interesting blogs on Big Data, Data Science, Data Mining, Machine Learning, and Artificial Intelligence.
on Mar 29, 2016 in Big Data, Blogs, Data Science, Deep Learning, Hadoop, Machine Learning
Don’t Buy Machine Learning
In many projects, the effort spent on Machine Learning R&D is usually a small fraction of the total, or is absent altogether because it is planned for a later phase, after the application has been built.
on Mar 28, 2016 in Advice, Industry, Machine Learning
Cartoon: Citizen Data Scientist At Work
The KDnuggets cartoon examines a Citizen Data Scientist at work, along with his previous careers as a citizen dentist and a citizen pilot.
on Mar 26, 2016 in Cartoon, Citizen Data Scientist, Humor
How to combat financial fraud by using big data?
Financial fraud methods are becoming more sophisticated, and the techniques to combat such attacks also need to evolve. Big data has brought novel fraud detection and prevention techniques, such as behavioral analysis and real-time detection, that give fraud fighting a new perspective.
on Mar 25, 2016 in Alibaba, Banking, Big Data, Fraud, Fraud Detection, Fraud Prevention
XGBoost: Implementing the Winningest Kaggle Algorithm in Spark and Flink
An overview of XGBoost4J, a JVM-based implementation of XGBoost, one of the most successful recent machine learning algorithms in Kaggle competitions, with distributed support for Spark and Flink.
on Mar 24, 2016 in Apache Spark, Distributed Systems, Flink, Kaggle, XGBoost
Top 10 Data Science Resources on Github
The top 10 data science projects on Github consist chiefly of tutorials and educational resources for learning and doing data science. Have a look at the resources others are using and learning from.
on Mar 24, 2016 in Coursera, GitHub, IPython, Johns Hopkins, Open Source, Top 10
Doing Data Science: A Kaggle Walkthrough – Cleaning Data
Gain insight into the process of cleaning data for a specific Kaggle competition, including a step-by-step overview.
on Mar 23, 2016 in Data Cleaning, Data Preparation, Kaggle, Pandas, Python
R Learning Path: From beginner to expert in R in 7 steps
This learning path is mainly for novice R users who are just getting started, but it also covers some of the latest changes in the language that might appeal to more advanced R users.
on Mar 23, 2016 in 7 Steps, Data Preparation, Data Science Education, Data Visualization, DataCamp, Hadley Wickham, Learning Path, Maps, R
Lift Analysis – A Data Scientist’s Secret Weapon
Gain insight into using lift analysis as a data science metric. Understand how to use it to evaluate the performance and quality of a machine learning model.
on Mar 22, 2016 in Data Science, Lift charts, Metrics
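As a small illustration (a sketch, not the linked article’s code), one common definition of cumulative lift is the positive rate among the top-scored fraction of records divided by the overall positive rate. The helper name and the sample data below are hypothetical; labels are assumed to be 0/1 and higher scores to mean “more likely positive”.

```python
def lift_at(scores, labels, fraction):
    """Cumulative lift of the top `fraction` of records ranked by score.

    Hypothetical helper: lift = (positive rate in the top slice) /
    (positive rate over all records). A value of 1.0 means the model
    does no better than picking records at random.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty scores and labels")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")

    # Rank by descending score; ties keep their original order (stable sort).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


# Hypothetical model scores and true 0/1 labels for 10 customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for frac in (0.1, 0.2, 0.5, 1.0):
    print(frac, lift_at(scores, labels, frac))
```

Evaluating the lift over several fractions, as in the loop, gives the points of a cumulative lift chart; by construction the lift at fraction 1.0 is always 1.0.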
Must Know Tips for Deep Learning Neural Networks
Deep learning is a white-hot research topic. Pick up some solid deep learning neural network tips and tricks from a PhD researcher.
on Mar 22, 2016 in Convolutional Neural Networks, Deep Learning
Netflix Prize Analyzed: Movie Ratings and Recommender Systems
A 195-page monograph by a top-1% Netflix Prize contestant. Learn about the famous machine learning competition. Improve your machine learning skills. Learn how to build recommender systems.
on Mar 18, 2016 in Free ebook, Netflix, Recommender Systems
The Data Science Game – Student Competition
The Data Science Game returns this year, with university students competing for dominance. Details of this edition and further information are provided here.
on Mar 17, 2016 in Competition, Data Science, France, Kaggle, Paris, Student Competition
New KDnuggets Tutorials Page: Learn R, Python, Data Visualization, Data Science, and more
Introducing the new KDnuggets Tutorials page, with useful resources for learning about Business Analytics, Big Data, Data Science, Data Mining, R, Python, Data Visualization, Spark, Deep Learning and more.
on Mar 16, 2016 in Data Science Education, Online Education, Python, R
The Evolution of the Data Scientist
We trace the evolution of Data Science from ancient mathematics to statistics and early neural networks, to present successes like AlphaGo and self-driving cars, and look into the future.
on Mar 16, 2016 in Automated, Data Scientist, Demis Hassabis, Evolution, Mathematics, Statistics
How to tell a great analyst from a good analyst
Good analysts help businesses stay competitive, but great analysts set the business apart from its competition. Learn more about how to become a great analyst by going the extra mile.
on Mar 15, 2016 in Analyst, Data Science Skills, Quandl
What Should Data Scientists Know About Psychology?
Because of their training in the scientific method, data management, statistics/data analysis, subject matter expertise, and translating results into substantive knowledge, psychology researchers must have a solid understanding of data science, and vice versa.
on Mar 14, 2016 in Data Scientist, Methodology, Psychology
What is the influence of Big Data in Medicine?
The 360-degree customer view is the idea that companies can get a complete view of customers by aggregating data from a user’s various touch points. Big data is helping to make this idea a reality, which will revolutionize healthcare.
on Mar 14, 2016 in Big Data, Customer Analytics, Healthcare
3 Viable Ways to Extract Data from the Open Web
We look at 3 main ways to handle data extraction from the open web, along with some tips on when each one makes the most sense as a solution.
on Mar 11, 2016 in Crawler, import.io, Web Mining, Web services, Webhose.io
The Data Science Puzzle, Explained
The puzzle of data science is examined through the relationship between several key concepts in the data science realm. As we will see, these concepts are far from etched in stone, and divergent opinions are inevitable; this is but another opinion to consider.
on Mar 10, 2016 in Artificial Intelligence, Data Mining, Data Science, Deep Learning, Explained, Machine Learning
Deriving Better Insights from Time Series Data with Cycle Plots
Visualization plays a key role in analyzing time series data and understanding underlying trends. Here we demonstrate the cycle plot, which shows both the cycle or trend and the day-of-the-week or month-of-the-year effect.
on Mar 9, 2016 in CleverTap, Data Visualization, Time Series
Top February stories: 21 Must-Know Data Science Interview Q&A; Gartner 2016 MQ for Advanced Analytics: gainers and losers
21 Must-Know Data Science Interview Questions and Answers; Top 10 TED Talks for the Data Scientists; Gartner 2016 Magic Quadrant for Advanced Analytics Platforms: gainers and losers.
on Mar 8, 2016 in Top stories
AI and Machine Learning: Top Influencers and Brands
Onalytica gives us a new list of the top 100 Artificial Intelligence and Machine Learning influencers and brands, and provides some insight into the relationships between them.
on Mar 8, 2016 in About Gregory Piatetsky, AI, Influencers, Kirk D. Borne, Machine Learning, Onalytica, Top list
Watch the Geek Rap Video – Predictive Analytics Song
“PREDICT THIS!” is the first pop song to present analytics content with Gangnam Style humor and media-blending ’80s throwback visuals. The rapper, formerly known as Dr. Eric Siegel (co-founder of Predictive Analytics World), said, “I only answer to ‘Dr. Data’ now.”
on Mar 8, 2016 in Eric Siegel, Humor, Music, Predictive Analytics
Self-Paced E-Learning course: Credit Risk Modeling
The course covers basic and advanced modeling, including Probability of Default (PD), Loss Given Default (LGD), and Exposure At Default (EAD) models, as well as stress testing.
on Mar 8, 2016 in Bart Baesens, Credit Risk, Online Education, Risk Modeling
Introducing GraphFrames, a Graph Processing Library for Apache Spark
An overview of Spark's new GraphFrames, a graph processing library based on DataFrames, built in a collaboration between Databricks, UC Berkeley's AMPLab, and MIT.
on Mar 7, 2016 in Apache Spark, Databricks, Graph Analytics
Fastest Growing Programming Languages and Computing Frameworks
A new model for ranking programming languages and predicting the growth of user adoption. Includes current language rankings and predictions.
on Mar 7, 2016 in Data Science, Javascript, Programming Languages, SQL, Trends
The Data Science Process
What does a day in the life of a data scientist look like? Here is a very helpful framework that is both a way to understand what data scientists do and a cheat sheet for breaking down any data science problem.
on Mar 4, 2016 in CRISP-DM, Data Science, Methodology, Springboard
scikit-feature: Open-Source Feature Selection Repository in Python
scikit-feature is an open-source feature selection repository in Python, with around 40 popular algorithms from feature selection research. It is developed by the Data Mining and Machine Learning Lab at Arizona State University.
on Mar 3, 2016 in Data Mining, Data Science, Feature Extraction, Feature Selection, Machine Learning, Python
Top Big Data Processing Frameworks
A discussion of 5 Big Data processing frameworks: Hadoop, Spark, Flink, Storm, and Samza. An overview of each is given and comparative insights are provided, along with links to external resources on particular related topics.
on Mar 3, 2016 in Apache Samza, Apache Spark, Apache Storm, Flink, Hadoop
Top Spark Ecosystem Projects
Apache Spark has developed a rich ecosystem, including both official and third party tools. We have a look at 5 third party projects which complement Spark in 5 different ways.
on Mar 2, 2016 in Apache Mesos, Apache Spark, Cassandra, Databricks, Distributed Systems
New Salford Predictive Modeler 8
Salford Predictive Modeler software suite: Faster. More Comprehensive Machine Learning. More Automation. Better results. Take a giant step forward in your data science productivity with SPM 8. Download and try it today!
on Mar 1, 2016 in Data Science Platform, Decision Trees, Gradient Boosting, Predictive Modeler, Regression, Salford Systems
The Mirage of a Citizen Data Scientist
The term "citizen data scientist" has been irritating me recently. I explain why I think it is both a bad term and a bad idea, and what we need instead.
on Mar 1, 2016 in Citizen Data Scientist, Data Analyst, Data Scientist, Gartner, Overfitting
Dynamic Data Visualization with PHP and MySQL: Election Spending
Learn how to fetch data from a MySQL database using PHP and create dynamic charts with that data, using an interesting example of New Hampshire primary election spending.
on Mar 1, 2016 in Data Visualization, FusionCharts, MySQL, PHP
Distributed TensorFlow Has Arrived
Google has open sourced its distributed version of TensorFlow. Get the info on it here, and catch up on some other TensorFlow news at the same time.
on Mar 1, 2016 in Deep Learning, Distributed Systems, Google, Matthew Mayo, TensorFlow
Data Science and Disability
Data Science and Artificial Intelligence have come to the forefront of technology in the last few years. Learn how practitioners are taking a more philanthropic outlook on life, supporting people living with both physical and mental disabilities.
on Mar 1, 2016 in Data Science, Disability, Healthcare