A recent post has generated an intense discussion about whether to look for "unicorn" data scientists who combine all the needed skills, or whether that skillset is best filled by a team. Here are the highlights, including a proposal for how to train well-rounded data scientists.
The emergence of Apache Spark is a key development for Big Analytics; 5 Free Excel add-Ins to help Marketers analyze #BigData; Key Skills of Top @kaggle Competitors: R (90%), Random Forests (60%); Netflix open sources Suro: data traffic "cop" which directs #BigData to destination
Data Mining Book Review: "Visualize This" from @flowingdata; Top NYU Professor Vasant Dhar on Data Science and Prediction - what do they mean; Analysis reveals #MOOC problems: student participation drops dramatically.
Covers 15 real-world applications on data mining with R, including R code and data, covering business background and problems, data extraction and exploration, data preprocessing, modeling, model evaluation, findings and model deployment.
The rapidly rising term "Data Scientist" caught up with "Statistician" and surpassed "Data Miner" on Google Trends. However, "Statistics" remains a lot more popular than "Data Science", which raises the question: What do Data Scientists do? Clearly, it is not Data Science.
The DMCS (Data Mining Case Studies) 2013 Practice Prize was awarded at the ICDM 2013 conference for work on a novel and successful credit card fraud detection system, implemented in a Turkish bank. The Prize was partially sponsored by KDnuggets.
What do "Data Science" and #BigData mean? Is there something unique about them? What skills do "data scientists" need to be productive in a world deluged by data? What are the implications for scientific inquiry?
Partner with business teams to understand objectives and scope analytical projects that deliver insights and results; work in a cross-functional manner with other consultants, analysts, statisticians, data engineers, and external vendors to deliver insights and solutions.
Poll Results: R has a big lead, but Python is gaining; Who are Data Scientists and why they are or are not unicorns; 2014 Predictions: Machine-generated data will grow; #BigData + Big Pharma = Big Privacy Catastrophe
SEARCH is a statistical technique for understanding complex interactions among explanatory variables in describing a wide variety of phenomena. Awards for US grad students/postdocs trying to understand complex interactions in large databases.
Poll results show that R has a big lead, but Python is gaining among data scientists; We re-analyze top LinkedIn Groups for Analytics, Big Data and Data Science; Top 2013 Stories on KDnuggets and more.
More people than ever are interested in how big data and analytics can give them an edge. Watch the panelists, Gregory Piatetsky-Shapiro, Editor of KDnuggets, and Michael Karasick, VP of Research at IBM's acclaimed Almaden Research Center, as they delve into these topics and give us a look at what they think will be the hottest topics and developments of 2014.
A guest blog by Skytree CEO Martin Hack looks at 2 key trends in Predictive Analytics in 2014: high-performance machine learning will penetrate the mainstream, and privacy issues associated with Big Data will be debated by business owners and consumers alike.
A billion rows per second in Python; #BigData Dashboard Dizziness - what you get after careful consideration of 437 charts; Import.io turns any website into a database; 2014 Predictions: Machine-generated data
Highlights of the IEEE ICDM 2013 Conference on Data Mining: Good organization in icy conditions, How to do clustering in high dimensions, Discovering unexpected sequential patterns, and perspectives on #BigData.
Seeking a methodology for quantifying the value of different types of business data in order to inform large-scale investment decisions concerning improving data infrastructure, supply chain and management.
Facebook hires Deep Learning expert Yann LeCun to head its new AI lab; New Data Mining and Machine Learning books from CRC Press - Save 25%; Import.io turns any website into a database; 2014 World Cup Group Stage, per ESPN: Brazil, Argentina, Germany, France advance
New Book: A Programmer's Guide to Data Mining - Free Download; 3 Stages of Big Data; New Poll: Did you switch between R, Python, or other Data Science Languages? Top LinkedIn Groups for Analytics, Big Data
Written by leaders in the data mining community, this new book provides an in-depth introduction to the application of data mining and business analytics techniques and tools in scientific research, medicine, industry, commerce, and diverse other sectors.
More fuel thrown into Data Science Wars: Python vs. R; Data Science Toolbox virtual environments for command-line data science; T-index is like academic H-index; Movie Analytics in India: Dhoom 3 to Don 3
We revisit our analysis of the top 30 LinkedIn groups for Analytics, Big Data, Data Mining, and Data Science and identify the largest, fastest growing, and most active groups. In 2013 the growth rate of the top groups more than doubled, and growth rate correlated with activity level.
The LIONbook on machine learning and optimization, written by co-founders of LionSolver software, is provided free for personal and non-profit usage. Chapter 16 looks at Visualizing graphs and networks by nonlinear maps.
Highlights include Focus on CRM, Big Data perhaps not so big, The Ascendance of R, Challenges in the use of analytics, High Job Satisfaction, and a ranking of analytics software by several measures, including Ease-of-use and cost.
Work with product, business, community and development teams to define, analyze and refine KPIs for overall product and new features. Drive the creation of a robust analytics tech stack to log and analyze all product data.
New Poll: Did you switch between R and Python; 3 Stages of Big Data; Why the statistical community is disconnected from Big Data and how to fix it; Why RapidMiner? By Usama Fayyad; and more analytics/data mining news
The MS in Predictive Analytics at DePaul University addresses the growing demand for data scientists with 4 timely and in-demand concentrations: Marketing, Computational Methods, Hospitality, and Health-Care Analytics.
From power tools to automobiles, health monitoring machines to wind turbines, our Big Data group is focused on using expertise in data mining and machine learning to improve lives through our products.
Predictive analytics professionals will be beating down the doors of this international conference to hear from PAW keynote speakers. Don't miss your chance to save on PAW registration - register by Jan 24 with Early Bird Pricing.
The goal of this challenge is to encourage innovative visualizations of web data, especially interdisciplinary approaches. Use any of 4 huge datasets: web traffic, Twitter data, social bookmarking, or academic co-authorship.
A public list of R #rstats freelancers - great resource; Top 10 Big Ideas in Harvard Statistics Class; 3 stages of Big Data to help clarify the confusion; Trifacta, maker of #BigData platform for machine-learning powered data visualization
New KDnuggets Poll focuses on the controversy around whether Python is displacing R as the language for Data Science, or whether R remains the dominant language. Please vote if you switched between R, Python, or another data analysis language in 2013.
Harvard CS109 Data Science Course, Resources Free and Online; Open Source Data Science Masters Curriculum; Gates Foundation Grants: Big Data for Social Good; Statistical Community and Big Data disconnect
The Big Ideas in Statistics include: Conditioning (the soul of statistics), Random variables and random vectors, Stories, Symmetry, Linearity of expectation, LOTUS, Variance, covariance, and correlation.
R is great for stats on one file, but for more complex data analysis use Python; How Facebook's own EdgeRank algorithm is killing it; Gates Foundation awards grants for using Big Data for Social Good; Preview of book Data Mining Applications with R
Highlights from a vigorous discussion on the statistical community and Big Data, including: Are data scientists reinventing statistics? Did statisticians miss the boat in the 1990s? Is more data always better? Statistics 2.0?
With the current release of RapidMiner v6, and the introduction of application wizards to help business analysts instantly work with their data, RapidMiner will continue to be the platform of choice for anyone analyzing Big Data.
Google's "Deep Learning" is outsmarting its human employees; Udacity Creates Online Degree Program For Data Science; JSON and #BigData will Shape the Internet of Things: RESTful APIs a key component; The Case Against #BigData In Sports
This certificate program brings together the computational, analytical and communication skills necessary to discover and implement data-supported solutions to business questions. Classes run Feb 13-May 22.
CIO Review special report on the 20 Most Promising Data Analytics Companies, covering Big Data, real-time insights, enterprise analytics, employee analytics, health care, and even neuroscience-based data analytics.
A good collection of open source resources for Data Science Masters Curriculum, covering Math, Algorithms, Databases, Data Mining, Machine Learning, Natural Language Processing, Data Analysis and Visualization, and Python.