- Hot or Not: Data Science Trends in 2015 - Dec 24, 2014.
CrowdFlower infographic predicts the hot trends for data science in 2015 and which trends will fade away.
CrowdFlower, Data Democratization, Data Science, Infographic, Predictions for 2015, Social Good, Trends
- Interview: Brian Hampton, San Francisco 49ers on Playing Football the Analytics Way - Dec 19, 2014.
We discuss the role of analytics in football, the underrated challenges, evolution since the era of draft trade value chart and analytics-supported team selection.
Analytics, Brian Hampton, Challenges, Coaching, Competition, Football, NFL, Sports, Team
- IBM Watson Analytics vs. Microsoft Azure Machine Learning (Part 1) - Dec 16, 2014.
The IBM Watson Analytics prototype seeks to abstract away data science, taking ordinary natural language queries and answering them based on the content of uploaded datasets. Microsoft Azure Machine Learning goes the opposite route, streamlining existing data mining methodology for fast results and integration with MS's other cloud services.
Azure ML, Cloud Analytics, Data Mining Software, IBM Watson, Zachary Lipton
- 16 NoSQL, NewSQL Databases To Watch - Dec 15, 2014.
NoSQL and NewSQL databases have become much more important with the proliferation of big, mobile, and networked data, and these sixteen database solutions are some of the biggest up-and-comers.
Hadoop, InformationWeek, MongoDB, NoSQL, Oracle, VoltDB
- Most Demanded Data Science and Data Mining Skills - Dec 15, 2014.
Our analysis of most demanded data scientist skills shows that Data Science is a team effort focused on business analytics, with top 5 platform skills being SQL, Python, R, SAS, and Hadoop.
Data Science Skills, Data Scientist, Hadoop, New York-NY, Python, R, SAS, Skills, SQL
- Interview: Daqing Zhao, Macys.com on Building Effective Data Models for Marketing - Dec 11, 2014.
We discuss the challenges in identifying the fair price of ad media, recommendations for building effective models for online marketing, unique challenges of Mobile channel, selection of Big Data tools, and more.
Daqing Zhao, Data Models, Data Science Skills, Hadoop, Interview, Macy's, Marketing, Mobile, Tools
- Geoff Hinton AMA: Neural Networks, the Brain, and Machine Learning - Dec 9, 2014.
In a wide-ranging Q&A, Geoff Hinton addresses the future of deep learning, its biological inspirations, and his research philosophy.
Backpropagation, Deep Learning, Geoff Hinton, Michael Jordan, Neural Networks, Neuroscience, Zachary Lipton
- SlamData Open Source Analytics Tool for MongoDB - Dec 4, 2014.
SlamData is an open source SQL-based tool designed to make accessing data in MongoDB easy for developers and non-developers alike with the goal of making application intelligence easier.
MongoDB, NoSQL, Open Source, SlamData, SQL
- Top 10 Big Data Companies by Revenue - Dec 1, 2014.
IBM, HP, Dell, and SAP lead the list of Big Data companies with the most revenue from big data hardware, software, and IT services.
Big Data Vendors, Dell, HP, IBM, Revenue, SAP, SAS, Top 10, Wikibon
- Geoffrey Hinton talks about Deep Learning, Google and Everything - Dec 1, 2014.
A review of Dr. Geoffrey Hinton’s Ask Me Anything on Reddit. He talked about his current research and his thoughts on some deep learning issues.
Deep Learning, DeepMind, Geoff Hinton, Google, Neural Networks, Reddit, Yann LeCun
- Most Popular Slideshare Presentations on Big Data - Nov 28, 2014.
Hadoop, the cloud, and Microsoft Azure are just a few of the many topics covered by the top Big Data SlideShare presentations retrieved from the SlideShare API.
Big Data, Presentation, SlideShare
- Most Popular Slideshare Presentations on Data Science - Nov 25, 2014.
Top SlideShare data science presentations provide a unique view on topics like data science management, using Python and NumPy in your data science project, and leveraging data science for enterprise big data.
API, Big Data, Data Science Skills, Data Science Tutorial, Python, SlideShare
- 9 Must-Have Skills You Need to Become a Data Scientist - Nov 22, 2014.
Burtch Works details the top 9 data science skills that potential data scientists must have to be competitive in this growing marketplace from the perspective of a recruiter.
Burtch Works, Data Science Skills, Data Scientist, Hiring, MOOC, Unicorn
- Top KDnuggets tweets, Nov 17-18: Keep this #Python Cheat Sheet handy; Is #BigData The Most Hyped Technology Ever? - Nov 19, 2014.
Keep this #Python Cheat Sheet handy when learning to code; Is #BigData The Most Hyped Technology Ever? No (at least not yet); How to become a data scientist in 8 (not so) easy steps; R and Hadoop make Machine Learning Possible for Everyone.
Big Data Hype, Cheat Sheet, Data Scientist, Data Visualization, Python
- Why Azure ML is the Next Big Thing for Machine Learning? - Nov 17, 2014.
With advanced capabilities, free access, strong support for R, cloud hosting benefits, drag-and-drop development and many more features, Azure ML is ready to take the consumerization of ML to the next level.
Azure ML, Cloud Computing, Hadoop, Machine Learning, Marketplace, Microsoft Azure, Nate Silver, Predictive Analytics, Strata
- R and Hadoop make Machine Learning Possible for Everyone - Nov 16, 2014.
R and Hadoop make machine learning approachable enough for inexperienced users to begin analyzing and visualizing interesting data to start down the path in this lucrative field.
Data Science Skills, Hadoop, Hadoop 2.0, Joel Horwitz, LinkedIn, Machine Learning, R
- Most Popular Slideshare Presentations on Data Mining - Nov 13, 2014.
SlideShare data mining presentations cover many topics, offering a unique way of consuming data mining content and exploring a variety of slideshows, both narrow and broad in scope.
API, Data Mining Training, Python, SlideShare
- IBM Watson Analytics – Will it Replace Data Scientists? - Nov 11, 2014.
We review the beta version of IBM Watson Analytics, a service that aims to provide an automated data scientist and is intended for business users who want to move beyond spreadsheets for analysis.
Data Scientist, IBM, Visualization, Watson
- To Hire Quants, Fix Your Hiring Process - Nov 7, 2014.
Hiring talented quants requires an up-to-date hiring process including components like competitive salaries, special bonuses, expedient timelines, and that extra special touch to make your company stand out to quality candidates.
Analytics Team, Burtch Works, Competition, Hiring, Salary
- DrivenData: Data Science Competitions for Social Good - Nov 4, 2014.
DrivenData plans to bring cutting-edge practices in data science and crowdsourcing to some of the world's biggest social challenges and the organizations taking them on.
Competition, Crowdsourcing, Data Science, DrivenData, Nonprofit, Social Good
- Cartoon: Halloween Costume for Big Data - Oct 28, 2014.
New KDnuggets cartoon looks at the appropriate Halloween costume for Big Data and its companion, No Privacy.
Big Data, Cartoon, Halloween, Privacy
- CRISP-DM, still the top methodology for analytics, data mining, or data science projects - Oct 28, 2014.
CRISP-DM remains the most popular methodology for analytics, data mining, and data science projects, with 43% share in latest KDnuggets Poll, but a replacement for unmaintained CRISP-DM is long overdue.
CRISP-DM, Data Mining, James Taylor, Methodology, Poll
- Will Deep Learning take over Machine Learning, make other algorithms obsolete? - Oct 27, 2014.
Will deep learning take over machine learning and make other algorithms obsolete, or is it too complex to use on simpler problems? We look at both sides of this discussion.
Deep Learning, Machine Learning, Quora
- Big Data accelerates medical research? Or not? - Oct 26, 2014.
Take a look at how big data in healthcare brings big opportunities, but along with those opportunities come great risks if statistics aren't carefully applied to those large datasets.
Big Data, Healthcare, Overfitting, Research
- TweetNLP: Twitter Natural Language Processing - Oct 24, 2014.
A short overview of Natural Language Processing tools and utilities developed by Prof. Noah Smith, CMU and his team to analyze Twitter data.
Advanced Analytics, ARK, CMU, Datasets, NLP, Speech, Tools, Twitter
- Supermarket customers segmentation using Self-Organizing Mapping - Oct 23, 2014.
See how a leading European supermarket chain improved customer value and profitability and identified key customer groups by applying business intelligence and analytics techniques like self-organizing maps.
Business Intelligence, Clustering, Consumer Insights, Neural Networks
- Interactive Network and Graph Data Repository - Oct 17, 2014.
The network repository currently hosts over 500 graphs/networks that span 19 collections of graphs from social science, machine learning, scientific computing, and many others.
Datasets, Graph Analytics, Graph Visualization, Network Graph
- ADW, free software to measure semantic similarity - Oct 13, 2014.
ADW is software for measuring semantic similarity of arbitrary pairs of lexical items, from word senses to texts, based on "Align, Disambiguate, and Walk", a WordNet-based state-of-the-art semantic similarity approach. Get it on GitHub.
Natural Language Processing, Semantic Analysis, Similarity, WordNet
- Develve statistical software, free for non-commercial use - Oct 10, 2014.
Check out Develve 2.0, a six-sigma tool, the new version featuring new utilities for measure system analysis and the design of sophisticated experiments.
Experimentation, Free Software, Statistical Modeling
- Perfume, computer programming, and Harvard - Oct 8, 2014.
What is the connection between Perfume, computer programming, and Harvard education? Peter Bruce explains.
edX, Harvard, Programming, Statistics.com
- Request: Top Business Analytics Journals? - Oct 7, 2014.
For a young business school professor in business analytics, what are the five to eight A-level journals in which he/she should try to publish?
Bruce Golden, Business Analytics, Business School, Journal
- Taming the Internet of Things – KNIME Case Study - Sep 30, 2014.
With increasing interest in the Internet of Things (IoT), see how KNIME can be applied to collect data from IoT sensors, enrich that data, transform it, analyze it, and finally visualize it.
Internet of Things, IoT, Knime, Rosaria Silipo, Smart City
- Automotive Customer Churn Prediction Results, part 2 - Sep 29, 2014.
Learn how to apply neural networks and self-organizing maps to visualize the macroscopic relationships between clients and the maintenance evolution of cars over the years.
Churn, Gregory Philippatos, Neural Networks, Predictive Analytics, Visualization
- Automotive Customer Churn Prediction using SVM and SOM - Sep 27, 2014.
A Case Study of predicting customer churn using Life Time Cycle approach and advanced machine learning methods including SVM and Self-Organizing Mapping.
Churn, Gregory Philippatos, Predictive Analytics, Product Analytics
- Apache Spark: O’Reilly Certification, EU Training, University Program - Sep 26, 2014.
Recent news on Apache Spark includes developer certification from O'Reilly, upcoming training workshops in EU by Databricks, and Spark tutorial events at major universities.
Academics, Apache Spark, Big Data, Certification, Databricks, Paco Nathan, Strata, Training
- Top KDnuggets tweets, Sep 19-21: Dilbert funniest cartoons on #BigData, data mining; Guess which pattern is random - Sep 22, 2014.
Guess which pattern is random, which machine-generated? Dilbert 20 funniest cartoons on #BigData, data mining, privacy; Data Scientist Cartoon; Neural Networks and Deep Learning, free online book (draft).
Cartoon, Deep Learning, Dilbert, Free ebook, Neural Networks, Random
- Most Viewed Web Mining Lectures - Sep 18, 2014.
Discover interesting lectures on topics like mining information networks and identifying influential members of online communities in this list of the top viewed web mining lectures on videolectures.net.
Text Mining, Videolectures, Web Analytics, Web Mining
- Rattle package for Data Mining and Data Science in R - Sep 17, 2014.
Try the newly-released version of Rattle, the open source R package for data mining, and enjoy accessing a huge array of data mining algorithms through a convenient interface.
Data Mining Software, Free Software, Graham Williams, Open Source, R, Togaware
- Most Viewed Machine Learning Talks at Videolectures - Sep 11, 2014.
Discover lectures from a variety of summer schools and conference tutorials on machine learning in this list of the top lectures on the subject from videolectures.net.
Machine Learning, Summer School, Tutorials, Videolectures
- Hiring Data Scientists: What to look for? - Sep 9, 2014.
Know key characteristics of what makes up a good data scientist based upon the three authors’ consulting and research experience, having collaborated with many companies world-wide on the topics of big data and analytics.
Analytics, Big Data, Business, Data Mining, Data Scientist, Hiring, Programming, Skills, Statistics
- Most Viewed Data Mining Talks at Videolectures - Sep 9, 2014.
Watch the 25 most viewed data mining lectures on VideoLectures.NET to learn about topics ranging from general big-data tutorials to monetizing data mining startups.
Big Data, Data Mining, Data Mining Training, Data Science, Tutorials, Videolectures
- Cartoon: Robot Labor Day 2050 - Sep 1, 2014.
Amidst all the discussion about robots and automation taking over human jobs, new KDnuggets cartoon looks at how Labor Day can evolve by 2050.
Cartoon, Labor Day, Robots
- Dataiku Data Science Studio - Aug 26, 2014.
Data Science Studio (DSS) from Dataiku is a complete Data Science software tool for developers and analysts, which significantly shortens the time-consuming load-clean-train-test-deploy cycles of building predictive applications. A community edition and a free trial are available.
Data Mining Software, Data Preparation, Data Science, Dataiku, Florian Douetteau, Prediction
- Deep Learning – important resources for learning and understanding - Aug 21, 2014.
New and fundamental resources for learning about Deep Learning - the hottest machine learning method, which is approaching human performance level.
Deep Learning, Image Recognition, Machine Learning, Yann LeCun, Yoshua Bengio
- Sibyl: Google’s system for Large Scale Machine Learning - Aug 20, 2014.
A review of a 2014 keynote talk about Sibyl, Google's system for large scale machine learning. The Parallel Boosting algorithm and several design principles are introduced.
Algorithms, Boosting, Google, Machine Learning, Sibyl
- Interview: Pedro Domingos: the Master Algorithm, new type of Deep Learning, great advice for young researchers - Aug 19, 2014.
Top researcher Pedro Domingos on useful maxims for Data Mining, Machine Learning as the Master Algorithm, new type of Deep Learning called sum-product networks, Big Data and startups, and great advice to young researchers.
Advice, Deep Learning, KDD-2014, Machine Learning, Pedro Domingos, Startups
- Four main languages for Analytics, Data Mining, Data Science - Aug 18, 2014.
New KDnuggets Poll shows the growing dominance of four main languages for Analytics, Data Mining, and Data Science: R, SAS, Python, and SQL - used by 91% of data scientists - and decline in popularity of other languages, except for Julia and Scala.
Analytics Languages, Data Mining, Data Science, Julia, Poll, Python, R, SAS, Scala, SQL
- Top Research Leaders in Data Mining, Data Science, and KDD - Aug 16, 2014.
We identify the top researchers in Data Mining, Data Science, and KDD. Jiawei Han, Philip Yu, and Christos Faloutsos remain the leaders, but they are joined by many fast rising young researchers - the leaders of tomorrow.
Christos Faloutsos, Data Mining, Hans-Peter Kriegel, Jian Pei, Jiawei Han, KDD, Philip S. Yu, Researchers, Top list
- Interesting Social Media Datasets - Aug 13, 2014.
Learn about some of the many interesting social media datasets available to you, some of which are quite new, and the different features and challenges they offer you for your next big data science project.
Challenge, Data Visualization, Datasets, Open Data, Social Media Analytics
- OpenML: Share, Discover and Do Machine Learning - Aug 11, 2014.
OpenML is designed to share, organize and reuse data, code and experiments, so that scientists can make discoveries more efficiently. It is an interesting idea to build a network of machine learning.
Kaggle, Machine Learning, OpenML, Ran Bi, Weka
- Interview: Michael Berthold, President and Founder of KNIME, on Data Mining, Startups, and Visual Workflow - Aug 9, 2014.
We discuss KNIME's key features and how it compares to the competition, the KNIME business model, Pharma, planned development, and the transition from an academic project to a company.
Knime, Konstanz University, Michael Berthold, Open Source
- BAT: China’s Three Big Data Leaders - Aug 5, 2014.
We examine the “three big mountains” of the Chinese Internet and Big Data industry: Baidu, Alibaba, and Tencent (together called BAT), and look into their different strategies and focuses.
Alibaba, Baidu, Big Data, China, Liyang Tang, Search Infrastructure, Social Media Analytics, Tencent
- Book: Data Classification: Algorithms and Applications - Aug 2, 2014.
Learn a wide variety of data classification techniques and their methods, domains, and variations in this comprehensive survey of the area of data classification.
Algorithms, Book, Charu Aggarwal, Classification, CRC Press
- 18 essential Hadoop tools - Aug 1, 2014.
Hadoop tools develop at a rapid rate, and keeping up with the latest can be difficult. Here we detail 18 of the most essential tools that work well with Hadoop.
Apache Spark, Data Infrastructure, Hadoop
- Interview: Sastry Malladi, StubHub on Designing Big Data Architecture for the Unknown Future - Jul 28, 2014.
We discuss the Big Data architecture at StubHub, important factors in architecture design, hybrid approach of using Big Data along with traditional data warehouses, challenges, importance of meta-data and more.
Architecture, Challenges, Design, Hadoop, Interview, Metadata, Personalization, Recommendation, Sastry Malladi, StubHub
- Containers: The Enabler of YARN - Jul 28, 2014.
The evolution of a data-center operating system is discussed along with the underlying challenges and approaches being followed. Containers play a big role in enabling the required abstraction and deliver additional benefits.
Altiscale, Applications, Containers, Docker, Hadoop, MapReduce, Mesos, Virtualization, YARN
- Data for Good: data-driven projects for social good - Jul 26, 2014.
Data for Good is an exciting new non-profit seeking to highlight the various data science projects and resources that can ultimately contribute to the social good.
Data Science, Government, Open Data, Social Good, Social Participation
- Spotting Bad Data Visualizations - Jul 22, 2014.
Good (or bad) Data visualizations can significantly help (or hurt) your case. Learn more about how poorly people can spot bad data visualizations.
Data Visualization, Software Advice
- MicroStrategy Analytics Desktop – visual tool, free download - Jul 18, 2014.
MicroStrategy Analytics Desktop is a fast, easy, and beautiful way to explore data and share your insights. Effortlessly build dashboards with a wide range of interactive visualizations. Free download.
Data Visualization, free download, MicroStrategy
- How Xbox, Big Data & Statistical Analysis Can Measure Public Opinion - Jul 11, 2014.
Could the Xbox gaming platform and Big Data hold the key to generating accurate measures of public opinion, such as election polling? A team of statistical scientists think so.
Big Data, Statistics, Survey, Xbox
- When Watson Meets Machine Learning - Jul 2, 2014.
Our report on a recent Cognitive Systems meetup co-sponsored by IBM Watson and NYU Center for Data Science, IBM Watson Ecosystem, and machine learning applications, from healthcare to cognitive toys. You will want Fang!
App, Cognitive Computing, IBM, Machine Learning, Ran Bi, Watson
- The Impact Cycle – how to think of actionable insights - Jun 29, 2014.
The IMPACT Cycle provides a guiding framework for thinking about the steps for being an effective analytical consultant, and can be a tool to help you drive effectiveness through your analytical teams.
Analytics Consultant, Business Analytics, Business Strategy, Data Analytics, Jean-Paul Isson
- Do you need a Masters Degree to become a Data Scientist? - Jun 27, 2014.
Leading analytics experts answer the question: "Do you need a Masters Degree to become a Data Scientist?" Read practical tips and interesting commentary.
Data Science Education, Data Scientist, LinkedIn Groups, Master of Science
- Interview: Samaneh Moghaddam, Applied Researcher, eBay on Opinion Mining – Typical Projects and Major Challenges - Jun 27, 2014.
We discuss typical sentiment analysis problems at eBay, underrated challenges, career motivation, important soft skills and more.
Advice, Challenges, eBay, Interview, Samaneh Moghaddam, Skills
- Data Science Skills and Business Problems - Jun 27, 2014.
Discover what skills a data scientist benefits from learning and how the concept of a data scientist, and what businesses expect of them, has developed over time.
Alex Jones, Business Analytics, Data Science Skills, DJ Patil, McKinsey, Unicorn
- Domino – A Platform For Modern Data Analysis - Jun 26, 2014.
Tools that facilitate data science best practices have not yet matured to match their counterparts in the world of software engineering. Domino is a platform built from the ground up to fill in these gaps and accelerate modern analytical workflows.
Business Analytics, Data Analysis, Data Science Platform, Domino, Tools
- XLMiner solves Big Data Problems in Excel - Jun 26, 2014.
XLMiner, a part of Analytic Solver Platform integrated software for predictive and prescriptive analytics - forecasting, data mining, optimization and simulation, lets you solve small or Big Data problems in Excel.
Data Mining, Excel, Forecasting, Optimization, XLMiner
- CRN 50 Big Data Business Analytics Companies - Jun 25, 2014.
We examine the CRN top 50 Big Data Business Analytics companies. They are younger (average age is 10), and 44% were founded since 2010.
Big Data, Business Analytics, Companies, CRN
- Does Deep Learning Have Deep Flaws? - Jun 19, 2014.
A recent study of neural networks found that for every correctly classified image, one can generate an "adversarial", visually indistinguishable image that will be misclassified. This suggests potential deep flaws in all neural networks, including possibly a human brain.
Artificial Intelligence, Deep Learning, Google, Image Recognition, Neural Networks
- KDnuggets Analytics, Data Mining, Data Science Software Poll – Analyzed - Jun 17, 2014.
We analyze the results of KDnuggets Software Poll, including correlations between tools, and relationships between commercial, free, and Hadoop/Big Data tools. We identify a potential capability gap. Download anonymized data and analyze it yourself.
Data Mining Software, Hadoop, Poll, R, RapidMiner
- Cartoon: Big Data and World Cup Football - Jun 17, 2014.
New KDnuggets Cartoon takes a fresh look at Big Data insights and World Cup 2014 in Soccer. What should a player do when Big Data predicts his behavior?
Big Data, Cartoon, World Cup
- The Cardinal Sin of Data Mining and Data Science: Overfitting - Jun 14, 2014.
Overfitting leads to the public losing trust in research findings, many of which turn out to be false. We examine some famous examples, "the decline effect", Miss America age, and suggest approaches for avoiding overfitting.
Dean Abbott, John Ioannidis, Kirk D. Borne, Overfitting, S&P 500
- NYU Data Science Program – Things to Know - Jun 13, 2014.
An inside summary of the NYU Data Science program launched last year: what it is, and what makes it special.
Data Science, Deep Learning, New York-NY, NYU, Ran Bi, Yann LeCun
- The Algorithm that Runs the World Can Now Run More of It - Jun 13, 2014.
The most important algorithm, used for optimizing almost everything, is linear programming. New advances allow linear programming problems to be solved faster using the new commercial parallel simplex solver.
Algorithms, FICO, Linear Programming, Optimization, Qi Huangfu, Simplex
- Top 10 Data Analysis Tools for Business - Jun 13, 2014.
Ten free, easy-to-use, and powerful tools to help you analyze and visualize data, analyze social networks, do optimization, search more efficiently, and solve your data analysis problems.
Data Analysis, Knime, RapidMiner, Tableau, Top 10, Wolfram
- Huge Big Data Poster and Reference - Jun 12, 2014.
A really Big poster "Do You Know Big Data" includes: What it is, Leading tools, What is a Data Scientist, What questions should we ask of databases, Visual techniques, Statistical algorithms, Privacy, and more.
Altamira, Big Data, Bob Gourley, CTOvision, Poster
- DLib: Library for Machine Learning - Jun 10, 2014.
DLib is an open source C++ library implementing a variety of machine learning algorithms, including classification, regression, clustering, data transformation, and structured prediction.
C++, DLib, Machine Learning, Open Source, Tools
- The First Law of Data Science: Do Umbrellas Cause Rain? - Jun 9, 2014.
Michael Brodie on the first law of data science, the role of data curation in Big Data analysis, and Thomas Piketty's economic theories.
Causation, Confirmation Bias, Correlation, Data Curation, Michael Brodie, Piketty
- KDnuggets 15th Annual Analytics, Data Mining, Data Science Software Poll: RapidMiner Continues To Lead - Jun 7, 2014.
With over 3,000 data miners taking part in KDnuggets 15th Annual Software Poll, RapidMiner continues to lead. Free software is used much more outside US, and Hadoop usage grows fastest in Asia.
Data Mining Software, Excel, Hadoop, Knime, Poll, Python, R, RapidMiner, SAS, SQL, SQL Server, Weka
- Data Lakes vs Data Warehouses - Jun 7, 2014.
Data Warehouses, traditionally popular for business intelligence tasks, are being replaced by less-structured Data Lakes which allow more flexibility.
Business Intelligence, Data Lakes, Data Science Platform, Data Visualization, Data Warehouse, DataRPM
- Data Science Last Mile - Jun 6, 2014.
This post discusses the Data Science "Last Mile", the final work to take the discovered insights and deliver them in a highly usable format or integrate them into a specific application.
Alpine, Data Science, Joel Horwitz, Predictive Analytics
- Big Data Strategy: Datafication - Jun 5, 2014.
Datafication of everything enables new ways of creating value and becoming more competitive. Oracle Big Data Strategist Paul Sonderegger explains.
Datafication, Las Vegas-NV, Oracle, Paul Sonderegger, Strategy
- OpenNN, An Open Source Library For Neural Networks - Jun 2, 2014.
OpenNN is an open source class library written in C++ which implements neural networks, and runs on Windows, Apple, or Linux.
Neural Networks, Open Source, OpenNN
- Interview: Kirk Borne, Data Scientist, GMU on Big Data in Astrophysics and Correlation vs. Causality - May 30, 2014.
We discuss how to build the best data models, significance of correlation and causality in Predictive Analytics, and impact of Big Data on Astrophysics.
Correlation, Interview, Kirk D. Borne, Predictive Analytics, Recommendations
- Vowpal Wabbit: Fast Learning on Big Data - May 26, 2014.
Vowpal Wabbit is a fast out-of-core machine learning system, which can learn from huge, terascale datasets faster than any other current algorithm. We also explain the cute name.
Fast Learning, John Langford, Machine Learning, Microsoft, Vowpal Wabbit
- Where to Learn Deep Learning – Courses, Tutorials, Software - May 26, 2014.
Deep Learning is a very hot Machine Learning technique which has been achieving remarkable results recently. We give a list of free resources for learning and using Deep Learning.
Andrew Ng, Deep Learning, Geoff Hinton, Machine Learning, Yann LeCun
- Interview: Richard Wendell, VP, Data Science, TE Connectivity on Strategy for Analytics Projects - May 23, 2014.
We discuss the last mile of the execution path of Analytics projects, five critical pillars of success and data-driven decision making through advanced analytics.
Advanced Analytics, Big Data Strategy, Project Fail, Richard Wendell, TE Connectivity
- Stacking the Deck: The Next Wave of Opportunity in Big Data - May 20, 2014.
A leading venture capitalist explains why the Big Data infrastructure market is mostly mature and where the next big area of opportunities related to Big Data lies.
Chip Hazard, Full Stack Analytics, Machine Learning, Network Effects, Startups, VC
- Exclusive: Tamr at the New Frontier of Big Data Curation - May 19, 2014.
Our exclusive profile of Tamr (formerly Data Tamer), the latest startup from the legendary Michael Stonebraker, which emerged from stealth mode to address the new field of Big Data Curation.
Andy Palmer, Data Curation, Machine Learning, Michael Brodie, Michael Stonebraker, Startups, Tamr
- Poll Results: Data Types/Sources Analyzed - May 17, 2014.
Trends in data sources for data mining include: table data dominates, followed by time series and text; audio and JSON grow in popularity, while itemsets decline; 70% access DB engines, but only 20% access NoSQL stores; Hadoop and MongoDB are used more for text; Europe is lagging in NoSQL usage.
Data types, Hadoop, NoSQL, Poll, Relational Databases
- Predict Soccer World Cup 2014 Winner, Get Prizes from RapidMiner - May 16, 2014.
Use a free edition of RapidMiner to have fun and bring sports predictions to another level by making a prediction of Soccer (Futbol) World Cup 2014, which starts on June 12 in Brazil.
Boston-MA, Brazil, Competition, RapidMiner, Soccer, World Cup
- Big Data Landscape, v 3.0, analyzed - May 15, 2014.
We analyze the Big Data Landscape and identify the most popular market segments in Analytics, Infrastructure, Applications, Open Source, and Data Sources categories. It is still early - only 4.5% of companies had exits.
Big Data, Big Data Analytics, Data Platform, Infrastructure, Landscape, Open Source, Startups
- Guide to Data Science Cheat Sheets - May 12, 2014.
Selection of the most useful Data Science cheat sheets, covering SQL, Python (including NumPy, SciPy and Pandas), R (including Regression, Time Series, Data Mining), MATLAB, and more.
Cheat Sheet, Data Science, Python, R, SQL
- Cartoon: Data Visualization meets 3-D Printer - May 11, 2014.
New KDnuggets Cartoon looks at what happens when Data Visualization meets 3-D Printer.
3-D Printing, Cartoon, Data Visualization
- Did Target Really Predict a Teen’s Pregnancy? The Inside Story - May 7, 2014.
We examine the origin and the facts behind this explosive story, the importance of headlines, and how unsubstantiated assumptions gain traction and mainstream attention and help create myths around Predictive Analytics.
Book, Charles Duhigg, Eric Siegel, Predictive Analytics, Pregnancy, Target
- JMP White Paper: Advantages of Bootstrap Forest for Yield Analysis - May 7, 2014.
This white paper highlights practical examples on how to use partitioning techniques for semiconductor manufacturing data. These methods also have wider applicability.
Bootstrap Forests, JMP, White Paper
- Poincare Conjecture, Perelman way, and Topology of social networks - May 3, 2014.
We examine the connections between the $1 million proof of the Poincare conjecture by a reclusive math genius and the topological behavior and information diffusion over social networks.
Mathematics, Social Networks, Topology
- Cartoon: Data Scientist Salary Negotiation - Apr 29, 2014.
New KDnuggets Cartoon looks at a Data Scientist salary negotiation.
Cartoon, Data Scientist, Hadoop, Salary
- 9 Free Books for Learning Data Mining and Data Analysis - Apr 29, 2014.
Whether you are learning data science for the first time or refreshing your memory or catching up on latest trends, these free books will help you excel through self-study.
Alex Ivanovs, Algorithms, Analysis, Data Mining, Free ebook, Programming
- New Book: Social Media Mining – free PDF download - Apr 22, 2014.
Social Media Mining integrates social media, social network analysis, and data mining to enable students, practitioners, researchers, and managers to understand the basics and potentials of this field.
Book, Free ebook, Huan Liu, Social Media, Social Media Analytics
- Elusive Data Scientists Driving High Salaries - Apr 21, 2014.
Recent study tracks experience, salary, industry and location of Data scientists, finds they are earning base salaries over $200K. Download free report.
Burtch Works, Data Scientist, free download, Salary, Survey
- Interactive Big Data Timeline - Apr 8, 2014.
A very interesting interactive Big Data timeline takes you from the beginning of information overload in the 1880s to Business Intelligence, World Wide Web, Hadoop, Cloud, and more.
3Vs of Big Data, ERP, Gil Press, Hadoop, IBM, Information Overload, Timeline
- Employee Churn 201: Calculating Employee Value - Apr 4, 2014.
Much has been written about customer churn. This post examines employee churn, an equally important problem, and its unique dynamics.
Employee Churn, Employee Value, GitHub, Pasha Roberts, R, Talent Analytics
- Is Data Scientist the right career path for you? Candid advice - Mar 28, 2014.
Candid advice from an industry veteran reveals the true picture behind the much-talked-about Data Scientist "glamour" and helps people have the right expectations for a Data Science career.
Advice, Career, Data Science, Data Scientist, Hadoop, Paco Nathan, Recommendation, Visualization
- Fractal Analytics Interview Highlights - Mar 27, 2014.
Fractal Analytics CEO on starting the company, competing with the best, managing attrition, attributes he looks for when hiring, 4 different analytics career tracks, strategic bets, and advice for starting data scientists.
Advice, Career, Fractal Analytics, Hiring, Interview
- Gartner 2014 Magic Quadrant for Advanced Analytics Platforms – view report - Mar 25, 2014.
Pioneering predictive analytics vendor RapidMiner was positioned in the Leaders quadrant of the first "Gartner Magic Quadrant for Advanced Analytics Platforms" - view the full report.
Advanced Analytics, Gartner, Magic Quadrant, RapidMiner
- Data Scientists Salary Survey: US, Canada, Australia lead - Mar 21, 2014.
Data Scientists Salary Survey shows that industry data scientists are in a sweet spot, especially in the US, Canada, and Australia, with an average salary of $135K. European and Asian data scientists' salaries are significantly lower.
Asia, Australia, Canada, Data Scientist, Europe, Industry, Poll, Salary, USA
- Machine Learning in 7 Pictures - Mar 18, 2014.
Basic machine learning concepts of Bias vs Variance Tradeoff, Avoiding overfitting, Bayesian inference and Occam's razor, Feature combination, Non-linear basis functions, and more - explained via pictures.
Basis functions, Bayesian, Concepts, Machine Learning, Pictures, Variance
- Evolution of Fraud Analytics – An Inside Story - Mar 14, 2014.
The amazing analytic innovations in payment fraud prevention can be grouped into three major categories: large data-set modeling, sparse data-set modeling, and false-positive reductions - a view from the inside.
False positive, FICO, Fraud analytics, Fraud Prevention, Neural Networks, Sparse data
- How Many Data Scientists are out there? - Mar 13, 2014.
We examine indeed, LinkedIn, Kaggle, and other sources to investigate how many data scientists - in name and in function - are out there, and how strong the demand is.
Data Scientist, indeed, Kaggle, LinkedIn, McKinsey
- Introduction to Random Forests® for Beginners – free ebook - Mar 6, 2014.
Random Forests is one of the most powerful and successful machine learning techniques. This free ebook will help beginners to leverage the power of Random Forests.
Beginners, Decision Trees, ebook, Free, Kaggle, random forests algorithm, Salford Systems
- The Do’s and Don’ts of Data Mining - Mar 1, 2014.
Leading data mining and analytics experts give their favorite do's and don'ts, from "Do plan for data to be messy" to "Do not underestimate the power of a simpler-to-understand solution".
- 10 Most Influential Analytics Leaders in India - Feb 25, 2014.
Analytics India Magazine’s annual ranking of the 10 Most Influential Analytics Leaders in India, in terms of Impact, Leadership, Entrepreneurship and Analytics evangelism.
2014, Analytics Leader, India, Influencers
- SAS, IBM, RapidMiner, Knime leaders in Gartner MQ for Advanced Analytics Platforms - Feb 24, 2014.
Gartner's new Magic Quadrant(tm) for Advanced Analytics Platforms names 4 companies as leaders: SAS, IBM, RapidMiner, and Knime. A copy of the report, with evaluations for 15 more companies, is available thanks to RapidMiner.
Advanced Analytics, Gartner, IBM, Knime, Magic Quadrant, RapidMiner, SAS
- Qualitative Analytics: Why numbers do not tell the complete story? - Feb 21, 2014.
Data scientists love numbers, yet not all data is numerical. Qualitative analytics should not be ignored, especially given the unique value it provides.
Customer Experience, Qualitative Analytics, Qualitative Research, Quantitative Analytics, Web Analytics
- KDnuggets Exclusive: Part 2 of the Interview with Yann LeCun - Feb 20, 2014.
We discuss how far AI is likely to go, how Data Science is to Statistics as Computer Science was to Math, Big Data hype and reality, and advice to beginning Data Scientists.
AI, Artificial Intelligence, Big Data Hype, NYU, Singularity, Yann LeCun
- KDnuggets Exclusive: Interview with Yann LeCun, Deep Learning Expert, Director of Facebook AI Lab - Feb 20, 2014.
We discuss what enabled Deep Learning to achieve remarkable successes recently, his argument with Vapnik about (deep) neural nets vs kernel (support vector) machines, and what kind of AI we can expect from Facebook.
Andrew Ng, Deep Learning, Facebook, Interview, NYU, Support Vector Machines, Vladimir Vapnik, Yann LeCun
- Alpine Data Science Periodic Table - Feb 19, 2014.
One of the most clever giveaways at the recent Strata Conference in Santa Clara was a Periodic Table of Data Science from Alpine.
Alpine, Data Science, Periodic-Table, Strata 2014
- Anaconda: Free enterprise-ready Python for Big data, Predictive Analytics - Feb 15, 2014.
125+ cross-platform tested and optimized Python packages for advanced analytics, totally free, even for commercial use.
Anaconda, Cross-Platform, Free Enterprise-Ready, Python
- One Page R: A Survival Guide to Data Science with R - Feb 14, 2014.
A collection of useful one-page resources for a data miner, data scientist, and/or a decision scientist. The modules include code, lectures, and one-page recipes for getting things done.
- Cartoon: Data Scientist Valentine Day Prediction - Feb 13, 2014.
New KDnuggets cartoon looks at a Data Scientist Valentine's Day prediction.
Cartoon, Data Scientist, Humor, Valentine's Day
- Book: Mining of Massive Datasets, 2nd Edition, free download - Feb 12, 2014.
The second edition of this landmark book adds Jure Leskovec as a coauthor and has 3 new chapters, on mining large graphs, dimensionality reduction, and machine learning. You can still freely download a PDF version.
Anand Rajaraman, Jeff Ullman, Jure Leskovec, Mining Massive Datasets, Stanford, Textbook
- 3 Ways to Test the Accuracy of Your Predictive Models - Feb 8, 2014.
3 different methods for testing the accuracy of predictive models from 3 leading analytics experts: Karl Rexer, John Elder, and Dean Abbott explain how to use lift charts, randomization testing, and bootstrap sampling.
Bootstrap sampling, Dean Abbott, Decile tables, John Elder, Karl Rexer, Lift charts, Predictive Models, Randomization, Target shuffling
- 10 Emerging Analytics Startups in India - Feb 7, 2014.
India is becoming a powerhouse in Analytics, and here are 10 emerging Indian Analytics startups to watch in 2014: Crayon Data, Flutura, Axtria, Flytxt, Sapience Analytics, SIBIA Analytics, Ideal Analytics, FORMCEPT, IQR Consulting, and StatLabs.
Chennai-India, Crayon Data, Flutura, India, Kolkata-India, Startups
- Deep Learning Wins Dogs vs Cats competition on Kaggle - Feb 5, 2014.
A Deep Learning expert wins the Kaggle Dogs vs Cats image competition with an almost perfect result.
Cats, Competition, convnet, Deep Learning, Dogs, Facebook, Kaggle
- CMSR Data Miner and Rule-Engine Software Suite – free academic use - Feb 4, 2014.
CMSR - Cramer Modeling, Segmentation and Rules - is a data miner and rule-engine suite with rule engines as a unique feature. Rule engines provide rule-based predictive model evaluation.
CMSR, Cramer, Rosella Software, Rule Engines, Sequential Rule Engine
- More Data Mining with Weka - Jan 30, 2014.
This online course teaches both principles and practical data mining techniques, lets students work on very big datasets, classify text, experiment with clustering, and much more.
Association Rules, Clustering, Data Mining with Weka, Online Education, Text Classification, Weka
- Determining the Value of Insights - Jan 30, 2014.
With the value of Consumer Insights being questioned to justify ROI, Market Research professionals need to figure out ways to quantify the value of those insights. Determining the value of insights is no easy task and requires focus on three key components.
Efficiency, Insight Effectiveness, Insight Quality, Market Research
- Viewpoint: Why your company should NOT use “Big Data” - Jan 27, 2014.
Hardcore analytics (and Big Data) can add value, but only marginally and only for companies that have already mastered using the data they already have. The ‘obvious’ information from your own data can get you 90%+ of the total impact, so start there. The hard part is executing the basic insights across the organization.
80/20 Principle, Hardcore Analytics, Pair Search, Quality Score, Sort Order
- Using Data Mining to Predict the Winter Olympics Medal Counts in Sochi - Jan 25, 2014.
Could data mining techniques accurately predict the medal counts at the Olympics? A predictive model could give us an estimate of the number of medals each nation might win; but how close could we get to the actual outcomes? It was a tantalizing project …
Olympics, Russia, Sports