Applied Statistics Is A Way Of Thinking, Not Just A Toolbox
The choice of tools in applied statistics is driven by the objective, the structure of the data, and the nature of the uncertainty in the numbers, whereas in academic statistics it's driven by publishing or teaching. Here we present some common statistical tools and their overlapping genealogy.
on May 29, 2015 in Applied Statistics, Randy Bartlett, Statistics, Toolbox
Insights from Data Science Handbook
Here you can find the perspectives of lead data scientists on topics ranging from the definition of data science and metric selection when solving a problem to work ethics, the art of storytelling, and why data science is important in today's world.
on May 28, 2015 in Data Science, Data Science Fellows, Data Science Jargon, DJ Patil, Handbook, Hilary Mason
Miner3D Data Visualization System Version 8
The new software features a redesigned user interface, making it a perfect complement to Excel. The new graphics visualization engine is faster and smoother.
on May 27, 2015 in Data Visualization, Miner3D
Dark Knowledge Distilled from Neural Network
Geoff Hinton never stopped generating new ideas. This post is a review of his research on “dark knowledge”. What’s that supposed to mean?
on May 26, 2015 in Dark Knowledge, Deep Learning, Geoff Hinton, Neural Networks, Ran Bi
R vs Python for Data Science: The Winner is …
In the battle of "best" data science tools, Python and R both have their pros and cons. Selecting one over the other will depend on the use cases, the cost of learning, and the other tools commonly required.
on May 26, 2015 in Data Science Tools, DataCamp, Python, Python vs R, R
R leads RapidMiner, Python catches up, Big Data tools grow, Spark ignites
R is the most popular overall tool among data miners, although Python usage is growing faster. RapidMiner continues to be the most popular suite for data mining/data science. Hadoop/Big Data tools usage grew to 29%, propelled by 3x growth in Spark. Other tools with strong growth include H2O (0xdata), Actian, MLlib, and Alteryx.
on May 25, 2015 in Actian, Apache Spark, Data Mining Software, H2O, Knime, Poll, Python, R, RapidMiner, SQL
Exclusive Interview: Matei Zaharia, creator of Apache Spark, on Spark, Hadoop, Flink, and Big Data in 2020
Apache Spark is one of the hottest Big Data technologies of 2015. KDnuggets talks to Matei Zaharia, creator of Apache Spark, about the key things to know about it, why it is not a replacement for Hadoop, how it is better than Flink, and his vision for Big Data in 2020.
on May 22, 2015 in Apache Spark, Big Data, Databricks, Flink, Hadoop, Matei Zaharia, MLlib, Spark SQL
Top 10 Data Mining Algorithms, Explained
The top 10 data mining algorithms, selected by top researchers, are explained here, including what they do, the intuition behind each algorithm, available implementations, why to use them, and interesting applications.
on May 21, 2015 in Algorithms, Apriori, Bayesian, Boosting, C4.5, CART, Data Mining, Explained, K-means, K-nearest neighbors, Naive Bayes, Page Rank, Support Vector Machines, Top 10
I’ve Been Replaced by an Analytics Robot
A veteran statistician reflects on the journey from the statistician of the past to the data scientist of today, how the work he used to do became automated, and what future data scientists can expect.
on May 20, 2015 in Automation, Data Science, Future, History, Robots
Most Viewed Data Mining Videos on YouTube
The top Data Mining YouTube videos, from the likes of Google and Revolution Analytics, cover topics ranging from statistics in data mining to using R for data mining to data mining in sports.
on May 18, 2015 in Ayasdi, Data Mining, Google, Grant Marshall, R, Rattle, Revolution Analytics, Statistica, Text Mining, Weka, Youtube
How to Lead a Data Science Contest without Reading the Data
We examine a “wacky” boosting method that lets you climb the public leaderboard without even looking at the data. But there is a catch, so read on before trying to win Kaggle competitions with this approach.
on May 17, 2015 in Accuracy, Benchmark, Competition, Kaggle, Model Performance
Data Science for Workforce Optimization: Reducing Employee Attrition
Predictive analytics is extending its reach; see how it is affecting the workforce analytics domain. In this presentation, Pasha Roberts explains what it offers students, managers, and practitioners.
on May 15, 2015 in Pasha Roberts, PAW, Talent Analytics, Workforce Analytics
Surprising Random Correlations
An interesting demo showing how easy it is to find surprising correlations in real data. Is the German unemployment rate related to Apple stock? Is the 10-year Treasury rate related to the price of Red Winter Wheat? You will be surprised.
on May 14, 2015 in Correlation, Overfitting, Quandl, Random
Seven Techniques for Data Dimensionality Reduction
Performing data mining on high-dimensional data sets: a comparative study of dimensionality reduction techniques such as Missing Values Ratio, Low Variance Filter, PCA, Random Forests / Ensemble Trees, and more.
By Rosaria Silipo on May 14, 2015 in Data Processing, High-dimensional, Knime, Rosaria Silipo
Plotly: Online Dashboards That Update Your Data and Graphs
A new online visualization option from Plot.ly lets you build data visualizations and graphs that update dynamically.
on May 13, 2015 in Data Visualization, Plotly
Machine Learning Wars: Amazon vs Google vs BigML vs PredicSis
Comparing 4 Machine Learning APIs (Amazon Machine Learning, BigML, Google Prediction API, and PredicSis) on real data from Kaggle, we find the most accurate, the fastest, the best tradeoff, and a surprise last place.
on May 12, 2015 in Amazon, BigML, Google, Louis Dorard, Machine Learning, PredicSis
Cartoon: Data Scientist Mother
We revisit a KDnuggets cartoon that looks at the Mother of All Data. Enjoy, and don't forget the mothers in your life - Big Data predicted that 67.53% of you would remember!
on May 10, 2015 in Cartoon
Most Viewed Big Data Videos on YouTube
The top Big Data YouTube videos, from the likes of Hortonworks and Kirk D. Borne, cover diverse topics including Hadoop, Big Data Trends, Deep Learning, and Big Data Leadership.
on May 9, 2015 in Big Data, Cloudera, Deep Learning, Google, Grant Marshall, Hadoop, IBM, Kirk D. Borne, TED, Youtube
The Inconvenient Truth About Data Science
Data is never clean; you will spend most of your time cleaning and preparing data; 95% of tasks do not require deep learning; and more inconvenient wisdom.
on May 5, 2015 in Advice, Data Cleaning, Data Science
Data Scientists Automated and Unemployed by 2025?
Will Data Scientists be unemployed by 2025? A majority of voters in the latest KDnuggets Poll expect expert-level Data Science to be automated in 10 years or less.
on May 5, 2015 in Automation, Data Scientist, Poll
Top LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science – Discussions up, Engagement down
While discussions are growing, comments and engagement are falling, especially since 2012. We cluster the groups into 4 quadrants by activity level and identify the most active and engaged groups. Open groups are twice as active as closed ones.
on May 4, 2015 in About KDnuggets, LinkedIn, LinkedIn Groups
WebDataCommons – the Data and Framework for Web-scale Mining
The WebDataCommons project extracts the largest publicly available hyperlink graph; large product, address, recipe, and review corpora; and millions of HTML tables from the Common Crawl web corpus, and provides the extracted data for public download.
on May 1, 2015 in Big Data Analytics, Graph Databases, RDF, Web Mining
How To Become a Data Scientist And Get Hired
A data scientist should be able to choose the right technology, understand the business context, and solve a wide range of problems. To hire the right data scientist, check the list of tips in the post.
on May 1, 2015 in Business, Data Scientist, Hiring, Salary