- Applied Statistics Is A Way Of Thinking, Not Just A Toolbox - May 29, 2015.
The choice of tools in applied statistics is driven by the objective, the structure of the data, and the nature of the uncertainty in the numbers, whereas in academic statistics it's driven by publishing or teaching. Here we provide some of the common statistical tools and their overlapping genealogy.
Applied Statistics, Randy Bartlett, Statistics, Toolbox
- Insights from Data Science Handbook - May 28, 2015.
Here you can find the perspectives of lead data scientists on topics ranging from the definition of data science and metrics selection while solving a problem to work ethics, the art of storytelling, and why data science is important in today's world.
Data Science, Data Science Fellows, Data Science Jargon, DJ Patil, Handbook, Hilary Mason
- Miner3D Data Visualization System Version 8 - May 27, 2015.
The new software features a redesigned user interface, making it a perfect complement for Excel. The new graphics visualization engine is faster and smoother.
Data Visualization, Miner3D
- KDnuggets™ News 15:n17, May 27: R wins Annual Poll; Top 10 Algorithms; Interview with Spark Creator - May 27, 2015.
R leads RapidMiner, Python catches up - Annual Software Poll; Top 10 Data Mining Algorithms; Exclusive Interview: Matei Zaharia, creator of Apache Spark; 5 Not-to-be-Missed Ideas about Big Data.
- Dark Knowledge Distilled from Neural Network - May 26, 2015.
Geoff Hinton never stopped generating new ideas. This post is a review of his research on “dark knowledge”. What’s that supposed to mean?
Dark Knowledge, Deep Learning, Geoff Hinton, Neural Networks, Ran Bi
- R vs Python for Data Science: The Winner is … - May 26, 2015.
In the battle of "best" data science tools, Python and R both have their pros and cons. Selecting one over the other will depend on the use cases, the cost of learning, and other common tools required.
Data Science Tools, DataCamp, Python, Python vs R, R
- R leads RapidMiner, Python catches up, Big Data tools grow, Spark ignites - May 25, 2015.
R is the most popular overall tool among data miners, although Python usage is growing faster. RapidMiner continues to be the most popular suite for data mining/data science. Hadoop/Big Data tools usage grew to 29%, propelled by 3x growth in Spark. Other tools with strong growth include H2O (0xdata), Actian, MLlib, and Alteryx.
Actian, Apache Spark, Data Mining Software, H2O, Knime, Poll, Python, R, RapidMiner, SQL
- Exclusive Interview: Matei Zaharia, creator of Apache Spark, on Spark, Hadoop, Flink, and Big Data in 2020 - May 22, 2015.
Apache Spark is one of the hottest Big Data technologies in 2015. KDnuggets talks to Matei Zaharia, creator of Apache Spark, about key things to know about it, why it is not a replacement for Hadoop, how it is better than Flink, and his vision for Big Data in 2020.
Apache Spark, Big Data, Databricks, Flink, Hadoop, Matei Zaharia, MLlib, Spark SQL
- Top 10 Data Mining Algorithms, Explained - May 21, 2015.
Top 10 data mining algorithms, selected by top researchers, are explained here, including what they do, the intuition behind each algorithm, available implementations, why to use them, and interesting applications.
Algorithms, Apriori, Bayesian, Boosting, C4.5, CART, Data Mining, Explained, K-means, K-nearest neighbors, Naive Bayes, Page Rank, Support Vector Machines, Top 10
- I’ve Been Replaced by an Analytics Robot - May 20, 2015.
A veteran statistician reflects on the journey from the statistician of the past to the data scientist of today, how the work he used to do became automated, and what future data scientists can expect.
Automation, Data Science, Future, History, Robots
- Most Viewed Data Mining Videos on YouTube - May 18, 2015.
The top Data Mining YouTube videos by those like Google and Revolution Analytics cover topics ranging from statistics in data mining to using R for data mining to data mining in sports.
Ayasdi, Data Mining, Google, Grant Marshall, R, Rattle, Revolution Analytics, Statistica, Text Mining, Weka, Youtube
- How to Lead a Data Science Contest without Reading the Data - May 17, 2015.
We examine a “wacky” boosting method that lets you climb the public leaderboard without even looking at the data. But there is a catch, so read on before trying to win Kaggle competitions with this approach.
Accuracy, Benchmark, Competition, Kaggle, Model Performance
- Data Science for Workforce Optimization: Reducing Employee Attrition - May 15, 2015.
Predictive analytics is extending its reach; see how it is affecting the workforce analytics domain. In this presentation, Pasha Roberts explains what is in it for students, managers, and practitioners.
Pasha Roberts, PAW, Talent Analytics, Workforce Analytics
- Surprising Random Correlations - May 14, 2015.
An interesting demo showing how easy it is to find surprising correlations in real data. Is the German unemployment rate related to Apple stock? Is the 10-year Treasury rate related to the price of Red Winter Wheat? You will be surprised.
Correlation, Overfitting, Quandl, Random
- Seven Techniques for Data Dimensionality Reduction - May 14, 2015.
Performing data mining with high-dimensional data sets: a comparative study of different feature selection techniques such as Missing Values Ratio, Low Variance Filter, PCA, and Random Forests / Ensemble Trees.
Data Processing, High-dimensional, Knime, Rosaria Silipo
- Plotly: Online Dashboards That Update Your Data and Graphs - May 13, 2015.
A new online visualization option from Plot.ly allows you to have data visualizations and graphs that update dynamically.
Data Visualization, Plotly
- Machine Learning Wars: Amazon vs Google vs BigML vs PredicSis - May 12, 2015.
Comparing 4 Machine Learning APIs (Amazon Machine Learning, BigML, Google Prediction API, and PredicSis) on real data from Kaggle, we find the most accurate, the fastest, the best tradeoff, and a surprise last place.
Amazon, BigML, Google, Louis Dorard, Machine Learning, PredicSis
- Cartoon: Data Scientist Mother - May 10, 2015.
We revisit the KDnuggets Cartoon that looks at the Mother of All Data. Enjoy, and don't forget the mothers in your life. Big Data predicted that 67.53% of you would remember!
Cartoon
- Most Viewed Big Data Videos on YouTube - May 9, 2015.
The top Big Data YouTube videos by those like Hortonworks and Kirk D. Borne cover diverse topics including Hadoop, Big Data Trends, Deep Learning, and Big Data Leadership.
Big Data, Cloudera, Deep Learning, Google, Grant Marshall, Hadoop, IBM, Kirk D. Borne, TED, Youtube
- The Inconvenient Truth About Data Science - May 5, 2015.
Data is never clean, you will spend most of your time cleaning and preparing data, 95% of tasks do not require deep learning, and more inconvenient wisdom.
Advice, Data Cleaning, Data Science
- Data Scientists Automated and Unemployed by 2025? - May 5, 2015.
Will Data Scientists be unemployed by 2025? A majority of voters in the latest KDnuggets Poll expect expert-level Data Science to be automated in 10 years or less.
Automation, Data Scientist, Poll
- Top LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science – Discussions up, Engagement down - May 4, 2015.
While discussions are growing, comments and engagement are falling, especially since 2012. We cluster groups into 4 quadrants by activity level and identify the most active and engaged groups. Open groups are twice as active as closed ones.
About KDnuggets, LinkedIn, LinkedIn Groups
- WebDataCommons – the Data and Framework for Web-scale Mining - May 1, 2015.
The WebDataCommons project extracts the largest publicly available hyperlink graph, large product-, address-, recipe-, and review corpora, as well as millions of HTML tables from the Common Crawl web corpus and provides the extracted data for public download.
Big Data Analytics, Graph Databases, RDF, Web Mining
- How To Become a Data Scientist And Get Hired - May 1, 2015.
A data scientist should be able to choose the right technology, understand the business context, and solve a wide range of problems. To hire the right data scientist, check the list of tips in the post.
Business, Data Scientist, Hiring, Salary