Download IKANOW’s “Beyond the Kill Chain” eBook
A complete security platform, powered by business intelligence, lets CISOs go beyond the kill chain, and manage risks, not just react to intrusions - download free ebook.
on Apr 30, 2015 in Cybersecurity, Free ebook, IKANOW
Closer Look: Data Mining and Knowledge Discovery Journal
Data Mining and Knowledge Discovery Journal has a new EIC and a high impact factor, and allows researchers to publish open access.
on Apr 29, 2015 in DMKD Journal, Geoff Webb, Johannes Fuernkranz, Journal
Interview: Haile Owusu, Mashable on Riding the Wave of Viral Content
We discuss Mashable’s milestones, data-driven digital publishing, digital media tracking, viral prediction, and Mashable Velocity.
on Apr 29, 2015 in Content Curation, Haile Owusu, Interview, Mashable, Metrics, Natural Language Processing, Prediction
Data Scientists Thoughts that Inspire
Inspirational thoughts from leading data scientists, including Yann LeCun, Erin Shellman, Daniel Tunkelang, Claudia Perlich, and Jake Porway. What inspires you?
on Apr 29, 2015 in Andy Rey, Claudia Perlich, Daniel Tunkelang, Facebook, Jake Porway, LinkedIn, Yann LeCun
Kaggle Competition (Facebook recruiting): Human or Robot?
Facebook and Kaggle are launching an Engineering competition for 2015 - leaders will earn an opportunity to interview for a software engineer role at Facebook, working on world-class Machine Learning problems.
In this competition, you'll be chasing down robots for an online auction site.
on Apr 28, 2015 in Bots, Competition, Facebook, Humans vs Machines, Kaggle, Software Engineer
Top KDnuggets tweets, Apr 21-27: Great discussion: Building Big Data systems in academia, industry; Deep Learning in a Nutshell
Great discussion: Building #BigData systems in academia, industry; DeepLearning in a Nutshell - what it is, how it works, why care?; Basics of #DeepLearning to Get You Started; Top LinkedIn Groups for #Analytics, #BigData.
on Apr 28, 2015 in Big Data, Crime, Deep Learning, San Francisco, Text Analysis
Text Analytics, Text Mining Courses on Statistics.com
Text analytics or text mining is the natural extension and essential part of predictive analytics and Data Science - learn key skills with Statistics.com online courses.
on Apr 28, 2015 in Natural Language Processing, Nitin Indurkhya, NLTK, Sentiment Analysis, Statistics.com, Text Analytics
TMA Predictive Analytics Data Mining Training [Wash. DC, May | Toronto, Aug]
Successful analytics in the big data era does not start with data and software, but with hands-on, immersive training and goal-driven strategy - get it from The Modeling Agency in Washington DC (May), Toronto (Aug).
on Apr 28, 2015 in Canada, Data Mining Training, DC, The Modeling Agency, Toronto, Washington
On the Shelf: Data Science Books
Here are some great books About Data Science, Data Science for Businesses, Data Science in Popular Culture, Data Science How Tos & Manuals, and more - brought to you by UC Berkeley online Master of Information and Data Science.
on Apr 28, 2015 in Book, Data Mining Books, Data Science Education, Online Education, UC Berkeley
How to become a Data Scientist – brief answer
The most important steps to become a Data Scientist: learn Python, develop a deep understanding of machine learning, and try to stay up to date. See the post for more details.
on Apr 28, 2015 in Andrew Ng, Data Scientist, Geoff Hinton, Python, Quora
Webinar: Data Mining: Failure to Launch
Learn how to get started with predictive modeling and overcome strategic and tactical limitations that cause data mining projects to fall short of their potential. Next webinar is May 5.
on Apr 27, 2015 in Data Mining, Failure to Launch, The Modeling Agency
Upcoming Webcasts on Analytics, Big Data, Data Science – Apr 28 and beyond
Solving Big Data Challenges, Implementing a Better Search Experience, Data Scientists Compensation, Maximizing ROI, Identifying Customers Across Platforms, The Fast Data Challenge with Michael Stonebraker, and more.
on Apr 27, 2015 in Burtch Works, Expert System, Lavastorm, Looker, Michael Stonebraker, Salford Systems
Interview: Mario Vinasco, Facebook on Advancing Marketing Analytics through Rigorous Experimentation
We discuss marketing analytics at Facebook, multi-channel performance assessment, success factors, lessons from Look Back feature, advice, and more.
on Apr 27, 2015 in Apache Hive, Career, Data Science, Experimentation, Facebook, Interview, Mario Vinasco, Marketing Analytics, Predictive Analytics, Trends
The Myth of Model Interpretability
Deep networks are widely regarded as black boxes. But are they truly uninterpretable in any way that logistic regression is not?
on Apr 27, 2015 in Deep Learning, Deep Neural Network, Interpretability, Support Vector Machines, Zachary Lipton
Top /r/MachineLearning Posts, Apr 19-25: Neural nets for nipple detection; NHL Goal celebration hack
Convolutional neural nets and Android App for nipple detection (NSFW), NHL goal detection, Geoff Hinton's recent AI talk, top machine learning podcasts, and matrix multiplication in deep learning.
on Apr 27, 2015 in Deep Learning, Geoff Hinton, Grant Marshall, Machine Learning, Podcast, Reddit
Top stories for Apr 19-25: Top LinkedIn Groups for Analytics, Big Data, Data Mining; 10 R Packages for a Kaggle Champion
Top LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science - from "Big Bang to Now"; Top 10 R Packages to be a Kaggle Champion; Deep Learning to Fight Crime.
on Apr 26, 2015 in Top stories
New Hybrid Rare-Event Sampling Technique for Fraud Detection
A proposed hybrid sampling methodology may prove useful when building and validating machine learning models for applications where the target event is rare, such as fraud detection.
on Apr 26, 2015 in Bootstrap sampling, Fraud Detection, Sampling
Interview: Emmanuel Letouzé, Data-Pop Alliance on Big Data for Development and Future Prospects
We discuss the field of Big Data for Development, current projects and future plans for Data-Pop Alliance, public participation opportunities, advice, and more.
on Apr 25, 2015 in Advice, Big Data, Comic, Data-Pop Alliance, Emmanuel Letouze, Interview, Trends
Open drives Boston Open Data Science Conference, May 30-31
Data science is built on transparency, effort, and the exchange of ideas. Join Open Data Science Conference, Boston, May 30-31, 2015.
on Apr 25, 2015 in Boston, Data Science, MA, ODSC, Open Source, Python, R, Sheamus McGovern
Data Science Open House Apr 29, Online or In-Person, NYC
Data science educator Metis, creator of the Metis Data Science Bootcamp in New York City, is hosting an open house on Apr 29, in person in NYC and live online. Attend to meet the instructors, students, and alumni.
on Apr 24, 2015 in Bootcamp, Data Science, Metis, New York City, NY
Big Data Bootcamp, Austin: Day 3 Highlights
Highlights from the presentations by Big Data and Analytics leaders/consultants on day 3 of Big Data Bootcamp in Austin.
on Apr 24, 2015 in Accenture, Bootcamp, Forrester, Global Big Data Conference, Hadoop, HBase, Hortonworks, Infochimps, NoSQL
HappyGrumpy – Free Twitter Sentiment Analysis and Data
HappyGrumpy has made available interesting data of Twitter sentiment changes and sentiment distribution around the world, by country, and over time.
on Apr 24, 2015 in Sentiment Analysis, Twitter
MapR on Open Data Platform: Why we declined
Why did MapR decline to participate in the Open Data Platform? Our concerns include redundancy with Apache Software Foundation Governance, misdefined “core”, and lack of participation from Hadoop leaders.
on Apr 24, 2015 in Cloudera, Hadoop, MapR, Open Data Platform
Big Data Bootcamp, Austin: Day 2 Highlights
Highlights from the presentations by Big Data and Analytics leaders/consultants on day 2 of Big Data Bootcamp in Austin.
on Apr 23, 2015 in Bootcamp, Career, Cassandra, Data Analytics, DataStax, Global Big Data Conference, NoSQL
Data Mining: New Comprehensive Textbook by Charu Aggarwal
This comprehensive data mining textbook explores the different aspects of data mining, from basics to advanced, and their applications, and may be used for both introductory and advanced data mining courses.
on Apr 23, 2015 in Book, Charu Aggarwal, Data Mining
Interview: Emmanuel Letouzé, Data-Pop Alliance on the Role of Big Data in Economic Development
We discuss the emerging Big Data ecosystem, its key players, and the severe consequences of inadequate statistical capabilities across many African nations.
on Apr 23, 2015 in Africa, Data-Pop Alliance, Economics, ecosystem, Emmanuel Letouze, United Nations
Top /r/MachineLearning Posts, Apr 12-18: Andrew Ng AMA, Autoencoders, and Deep Learning Textbooks
Andrew Ng's AMA, a probabilistic view of Autoencoders, open source sentiment analysis, deep learning textbooks, and Airbnb's host matching are all discussed this week on /r/MachineLearning.
on Apr 23, 2015 in AirBnB, Andrew Ng, Baidu, Deep Learning, Grant Marshall, Open Source, Reddit, Sentiment Analysis, Textbook
Salford Quickstart Data Mining Training in Washington, DC, May 15
Get step-by-step instruction for the most popular data mining techniques, learn to start your own data mining projects, and apply your new data mining knowledge to create immediate value.
on Apr 23, 2015 in Data Mining Training, DC, Salford Systems, Washington
Data Lakes for Big Data, Free MOOC from EMC
What can Big Data and Data Lakes do for you? Find out in our FREE Data Lakes for Big Data MOOC.
on Apr 23, 2015 in Data Lakes, EMC, MOOC, Online Education
Big Data Bootcamp, Austin: Day 1 Highlights
Highlights from the presentations by Big Data and Analytics leaders/consultants on day 1 of Big Data Bootcamp 2015 in Austin.
on Apr 22, 2015 in Apache Spark, Bootcamp, Hadoop, MapReduce, MongoDB, NoSQL, Relational Databases, Sharding, Spark SQL
Interview: Emmanuel Letouzé, Data-Pop Alliance on Big Data and Human Rights – A Complex Affair
We discuss the founding story of Data-Pop Alliance, the applications and implications of Big Data on Human Rights and the need for penetration of Data Literacy.
on Apr 22, 2015 in Challenges, Data Literacy, Data-Pop Alliance, Ebola, Emmanuel Letouze, Harvard, Interview, Opportunities
Deep Learning to Fight Crime
We look at how Deep Learning, Spark, and the H2O Machine Learning platform can be used to analyze and predict crime in San Francisco and Chicago.
on Apr 22, 2015 in Apache Spark, CA, Chicago, Crime, Deep Learning, H2O, IL, San Francisco
UC Analytics Summit 2015, Cincinnati, May 29
UC Summit will feature two analytics leaders: John Elder and Stephen Few, 4 afternoon tracks focusing on descriptive / prescriptive / predictive analytics, building your analytics team, and more.
on Apr 21, 2015 in Business Analytics, Cincinnati, John Elder, OH
KDnuggets Poll: Future of Predictive Analytics: Human or Machine?
The robots are taking over many jobs - will they take yours and mine? New KDnuggets Poll is asking if and when automation will reach the level of human data scientists.
on Apr 21, 2015 in Artificial Intelligence, Automation, Poll
Map of the Complexity Sciences – from von Neumann & Kolmogorov to Hofstadter and Piatetsky-Shapiro (?)
A map of the Complexity Sciences traces its intellectual heritage from Isaac Newton and Henri Poincare to John von Neumann, Andrei Kolmogorov, and Duncan Watts, and includes an unexpectedly familiar name.
on Apr 21, 2015 in About Gregory Piatetsky, Brian Castellani, Complexity
Interview: Michael Li, Data Incubator on Bridging the Data Science Skills Gap between Academia and Industry
We discuss the response from hiring companies, recommendations for aspirants, retaining data science talent, advice, and more.
on Apr 21, 2015 in Academics, Advice, Career, Data Science Skills, Industry, Interview, Machine Learning, Recommendations, Trends
Top 10 R Packages to be a Kaggle Champion
Kaggle top ranker Xavier Conort shares insights on the “10 R Packages to Win Kaggle Competitions”.
on Apr 21, 2015 in Kaggle, R Packages, random forests algorithm, Success, SVM, Text Analysis, Xavier Conort
Top KDnuggets tweets, Apr 14-20: Modern Methods for Sentiment Analysis; Basics of SQL, RDBMS – must have skills
Great overview: Modern Methods for Sentiment Analysis #word2vec; Basics of SQL and RDBMS - must have skills for data science; The 7 Most Unusual Applications of Big Data; Extensive, but a little confusing site: Understanding Data Visualization.
on Apr 21, 2015 in About Gregory Piatetsky, Data Visualization, Sentiment Analysis, SQL, word2vec
Salford Webinar: Maximizing ROI with State-of-the-art Data Science Techniques, Apr 28
ROI is a key measure for many business decisions. We will show how using state-of-the-art data science techniques, like TreeNet gradient boosting, we can optimize product promotion options and maximize revenue and wider gain.
on Apr 21, 2015 in ROI, Salford Systems, TreeNet
The Imminent Future of Predictive Modeling
Predictive modeling tools and services are undergoing an inevitable step-change which will free data scientists to focus on applications and insight, and result in more powerful and robust models than ever before. Amongst the key enabling technologies are new hugely scalable cross-validation frameworks, and meta-learning.
on Apr 21, 2015 in Automation, MLaaS, PAW, Predictive Analytics World
Algorithmia Tested: Human vs Automated Tag Generation
Algorithmia, the marketplace for algorithms, can be a platform for hosting APIs to do a plethora of text analytics and information retrieval tasks. Automatic post tagging is done in this case study to demonstrate the effectiveness and ease-of-use of the platform.
on Apr 21, 2015 in Algorithmia, API, Grant Marshall, Information Retrieval, Python, Text Analytics
Penn State Online Business Analytics Certificate
Penn State World Campus 9-credit online Graduate Certificate in Business Analytics: teaches you Business Strategies, Marketing Analytics, and Prescriptive Analytics. Applications due June 16.
on Apr 21, 2015 in Business Analytics, Certificate, Online Education, Penn State
Upcoming Webcasts on Analytics, Big Data, Data Science – Apr 21 and beyond
Solving Big Data Challenges, Impact of User-Generated Reviews, Implementing a Better Search Experience, Maximizing ROI using Data Science, Identifying Customers Across Platforms, The Fast Data Challenge with Michael Stonebraker, and more.
on Apr 20, 2015 in Customer Analytics, In-Memory Computing, ROI
Interview: Michael Li, Data Incubator on Data-driven Hiring for Data Scientists
We discuss the launch of the Data Incubator, its business model, why we need data-driven hiring, selection process for the incubator program and alumni feedback.
on Apr 20, 2015 in Data Incubator, Data Scientist, Fellowship, Hiring, Incubation, Interview, Michael Li, PhD
Webinar: Identifying Users Across Platforms with a Universal ID, Apr 28, by Looker + Segment
Any serious customer analysis requires that each customer is counted once and only once - a difficult problem, especially with customer touchpoints across devices. This webinar shows how using a universal ID helps solve this problem.
on Apr 20, 2015 in Customer Analytics, Looker
Linkurious Enterprise democratizes graph visualization
Linkurious announces the launch of Linkurious Enterprise, the first data visualization platform for graph databases.
on Apr 20, 2015 in Graph Visualization, Linkurious, Neo4j, Security, Social Network Analysis
smartcon: Big Data, Big Ideas conference with world-renowned experts, Istanbul, 26-27 May
Big data and its wide range of innovative business applications will be discussed at smartcon 2015 in Istanbul, May 26-27, led by world-renowned experts including Alex Pentland, Usama Fayyad, Amr Awadallah, and Andreas Weigend.
on Apr 20, 2015 in Alex Pentland, Big Data, Istanbul, Turkey, Usama Fayyad
Algorithmia: Building a web site explorer in 5 easy steps
We show how to use Algorithmia for quickly building a functional web site explorer in 5 steps: GetLinks, PageRank, Url2text, Summarizer and AutoTag.
on Apr 20, 2015 in Algorithmia, API, Page Rank, Search Engine, Web Mining
PAW San Francisco 5 Min Recap – Predictive Analytics World
PAW San Francisco: 550+ Data Professionals, 85+ conference sessions, 4 conferences, Dean Abbott on 3-legged stool of good data, domain expertise, and advanced analytics, and more.
on Apr 20, 2015 in CA, Dean Abbott, PAW, Predictive Analytics World, San Francisco
Top stories for Apr 12-18: Awesome Public Datasets on GitHub; Cloud Machine Learning Wars: Amazon vs IBM Watson vs Microsoft Azure
Awesome Public Datasets on GitHub; Preventing Overfitting in Neural Networks; Cloud Machine Learning Wars: Amazon vs IBM Watson vs Microsoft Azure; The Grammar of Data Science: Python vs R.
on Apr 19, 2015 in Top stories
Top LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science – from “Big Bang” to Now
We examine top LinkedIn groups in Analytics, Big Data, Data Mining, and Data Science from the Big Bang era of group creation to present, and identify the largest groups, the fastest growing groups, and 2 main clusters.
on Apr 19, 2015 in About KDnuggets, LinkedIn, LinkedIn Groups
Cartoon: A solution for Data Scientists allergies caused by Big Data
With more and more allergies and a big trend towards gluten-free everything, a new KDnuggets cartoon envisions a possible solution for Data Scientists' allergies.
on Apr 17, 2015 in Allergy, Big Data, Cartoon, Data Scientist
Interview: Ksenija Draskovic, Verizon on Conquering Fear and Cherishing Creativity for Success in Data Science
We discuss career advice, motivation, key qualities sought in Data Science practitioners, and more.
on Apr 17, 2015 in Advice, Career, Data Science, Interview, Ksenija Draskovic, Success, Verizon
Ventana Predictive Analytics Research, take part and get exclusive report
Our partner Ventana Research is conducting research into the next generation of predictive analytics. Tell us about your analytics experience and methods, and get an Amazon certificate and a free report with findings and best practices.
on Apr 17, 2015 in Predictive Analysis, Survey, Ventana Research
Data Science 101: Preventing Overfitting in Neural Networks
Overfitting is a major problem for Predictive Analytics and especially for Neural Networks. Here is an overview of key methods to avoid overfitting, including regularization (L2 and L1), Max norm constraints and Dropout.
on Apr 17, 2015 in Neural Networks, Nikhil Buduma, Overfitting, Regularization
Interview: Ksenija Draskovic, Verizon on How to Not Get Lost in the Big Data Wilderness
We discuss recommendations for data-driven decision making, challenges and benefits of using unstructured data, managing expectations and key trends.
on Apr 16, 2015 in Analytics, Interview, Ksenija Draskovic, Success, Trends, Unstructured data, Verizon
Cloud Machine Learning Wars: Amazon vs IBM Watson vs Microsoft Azure
Amazon recently announced Amazon Machine Learning, a cloud machine learning solution for Amazon Web Services. Able to pull data effortlessly from RDS, S3 and Redshift, the product could pose a significant threat to Microsoft Azure ML and IBM Watson Analytics.
on Apr 16, 2015 in Amazon, Azure ML, IBM Watson, Logistic Regression, Machine Learning, MetaMind, Prediction, Regression, Zachary Lipton
Math of Ideas: A Word is Worth a Thousand Vectors
Word vectors give us a simple and flexible platform for understanding text. Here are a few diverse examples that should help build your confidence in developing and deploying NLP systems and in understanding what problems they can solve.
on Apr 16, 2015 in NLP, Stitch Fix, Text Analytics, word2vec
The State of the Text Analytics Industry – 2015 White Paper
This free whitepaper gives the perspectives of industry experts from leading firms on the culture, benefits, challenges, data, and technology currently impacting the text analytics market.
on Apr 16, 2015 in Data-Driven Business, Text Analytics, White Paper
Domo: From Big Data to Big Decisions Infographic and BI Guide
A new business intelligence guide by DOMO called From Big Data To Better Decisions details the growing importance of collecting, understanding, and applying data to make better decisions.
on Apr 15, 2015 in Big Data, Business Intelligence, Domo, Infographic
Interview: Ksenija Draskovic, Verizon on Dissecting the Anatomy of Predictive Analytics Projects
We discuss Predictive Analytics use cases at Verizon Wireless, advantages of a unified data view, model selection and common causes of failure.
on Apr 15, 2015 in Customer Intelligence, Interview, Ksenija Draskovic, Optimization, Predictive Analytics, Project Fail, Use Cases, Verizon
Baby Boom: Udemy Excel Tutorial on Analyzing Large Data Sets
This tutorial not only shows how to use Excel Pivot Tables and Graphs, but teaches the mindset needed in exploratory data analysis - look beneath the surface, consider the non-obvious interpretations, and question everything (including the data).
on Apr 15, 2015 in CA, Data Wrangling, Excel, Tutorials, Udemy
Webinar: Implementing a Better Search Experience, April 28
Learn how to make SharePoint more than a place where you put documents and start transforming your collected knowledge into your *collective* knowledge.
on Apr 15, 2015 in Expert System, Search Quality, SharePoint, Topic Modeling
Top /r/MachineLearning Posts, Apr 5-11: Amazon Machine Learning, Numerical Optimization, and Conditional Random Fields
Amazon Machine Learning as a Service, Numerical Optimization, Extracting data from NYTimes recipes, Intro to Machine Learning with scikit-learn, and more.
on Apr 14, 2015 in Amazon, Deep Learning, Kaggle, Machine Learning, Probability, Python, Reddit, scikit-learn
Interview: Michael Lurye, Time Warner Cable on Key Lessons from Shifting to Hadoop
We discuss the key lessons from shifting to Hadoop, data management in today’s world, future of Data Science, advice and more.
on Apr 14, 2015 in Data Quality, Data Warehousing, Hadoop, Interview, Mike Lurye, Time Warner Cable, Trends
Provalis Research WordStat for Stata combines Numerical, Text Analysis
This new collaboration couples the cutting-edge numerical analysis of Stata with the unique text analytics functionality of Provalis Research.
on Apr 14, 2015 in Provalis, Stata, Text Mining, WordStat
Top KDnuggets tweets, Apr 6-13: Languages have more “happy” words, esp. Spanish; Popular similarity measures in Python
Languages have more "happy" words than unhappy; 5 most popular #similarity measures implementation in Python; Brilliant! Dilbert on Resume embellishing: if engineer, fire him; if marketer ...; Top programming languages change rapidly: SQL, C#, C++ down, Python, Node.js up.
on Apr 14, 2015 in Dilbert, Programming Languages, Python, Similarity
TDWI Chicago Special Offer – Respond by Apr 17
A special invitation and discount for you and your team to attend TDWI Chicago, May 3-8. Join leading industry experts, analysts, practitioners, and solution providers who deliver a world-class education program.
on Apr 14, 2015 in Chicago, IL, TDWI
Interview: Michael Lurye, Time Warner Cable on Big Data and the Insatiable Demand for BI
We discuss EDM at Time Warner Cable, data sources, complementing legacy data warehouses with Big Data solutions, vendor selection and build vs. buy decision.
on Apr 13, 2015 in Big Data, Business Intelligence, Data Management, Data Warehouse, Hadoop, Interview, Mike Lurye, Time Warner Cable
Upcoming Webcasts on Analytics, Big Data, Data Science – Apr 14 and beyond
Women in Data, Business Value from Big Data Quickly, Impact of User-Generated Reviews on Purchase Behavior, Maximizing ROI using Data Science, Apache Ignite, and much more.
on Apr 13, 2015 in Big Data ROI, Michael Stonebraker, Product reviews, Women
UMass Amherst Big Data Report
A new report from UMass Amherst covers the strength of Massachusetts and the 5 UMass campuses in Big Data and Data Science, and projects 120K Big Data related jobs in Massachusetts by 2018.
on Apr 13, 2015 in Amherst, MA, Massachusetts, UMass
KDnuggets Free Pass to Strata Hadoop World London, 5-7 May, 2015
Strata + Hadoop World has been called "mind-blowing", "an amazing event", "the most interesting and informative conference". Win free registration via KDnuggets.
on Apr 13, 2015 in Hadoop, London, Strata, UK
NYC Data Science Academy Bootcamps, Classes on R, Python, and Machine Learning
NYC Data Science Academy upcoming schedule includes 7 bootcamp events and 4 classes on Data Science, R, Python, and Machine Learning. Register now.
on Apr 13, 2015 in Bootcamp, Data Science Education, NYC Data Science Academy, Python, R
CourseBuffet: Organizing MOOC Courses on Big Data, Data Science, Statistics
CourseBuffet organizes MOOCs in a course catalog format and includes over 100 courses on Big Data, Data Mining, Data Science, and Statistics.
on Apr 12, 2015 in Data Science Education, MOOC, Online Education
Top stories for Apr 5-11: 10 things statistics taught us about big data analysis; Awesome Public Datasets on GitHub
10 things statistics taught us about big data analysis; The Grammar of Data Science: Python vs R; Predictive Analytics Innovation Summit (San Diego) Highlights; Awesome Public Datasets on GitHub.
on Apr 12, 2015 in Top stories
Wikibon Big Data Vendor Revenue and Market Forecast, 2020
Wikibon finds that the Big Data market is maturing, with growth rate slowing from 60% in 2013 to 40% in 2014. Wikibon expects the Big Data market to top $61 billion in 2020.
on Apr 11, 2015 in Big Data Market, Market Forecast, Wikibon
Join us for Predictive Analytics Events in Chicago June 2015
The leading, world-renowned events in predictive analytics are coming to Chicago this June. Build your skillset and knowledge, learn from experts, and enjoy great networking opportunities. Early bird rates until Apr 24.
on Apr 10, 2015 in Chicago, IL, PAW, Predictive Analytics World
Interview: Xia Wang, AstraZeneca on Big Data and the Promise of Effective Healthcare
We discuss challenges in analyzing text data, Big Data impact on translational bioinformatics, advice, desired skills in data scientists, and more.
on Apr 10, 2015 in Advice, AstraZeneca, Bioinformatics, Career, Challenges, Healthcare, Interview, Xia Wang
Top /r/MachineLearning Posts, Mar 29-Apr 4: Andrew Ng AMA, Deep Learning for NLP, and OpenCL Convnets
Andrew Ng's upcoming AMA, scikit-learn updates, Richard Socher's Deep Learning NLP videos, Criteo's huge new dataset, and convolutional neural networks on OpenCL are the top topics discussed this week on /r/MachineLearning.
on Apr 10, 2015 in Andrew Ng, Convolutional Neural Networks, Datasets, Deep Learning, NLP, Python, Reddit, scikit-learn
March 2015 Analytics, Big Data, Data Mining Acquisitions and Startups Activity
March 2015 acquisitions, startups, and company activity in Analytics, Big Data, Data Mining, and Data Science: Apple buys Acunu, Algorithmia Launches, Dataminr raises $130M, PatentVector, Looker, and more.
on Apr 9, 2015 in Acunu, Algorithmia, Apple, Dataminr, Looker, Startups
Algorithmia – How Marketplaces are Fostering Innovation?
We have a marketplace for almost everything – mobile apps, cabs, hotels, and what not. But, not for algorithms. Algorithmia takes up that challenge.
on Apr 9, 2015 in Algorithmia, API, California, Crowdsourcing, Innovation, Marketplace, Social Networks
Interview: Xia Wang, AstraZeneca on Unraveling Patient Treatment Journey by NLP on Clinical Notes
We discuss Analytics at AstraZeneca, prominent use cases, how NLP helped in understanding the patient treatment journey in diabetes, data sources, insights, and more.
on Apr 9, 2015 in AstraZeneca, Healthcare, Insights, NLP, Recommendations, Research, Xia Wang
Inside Deep Learning: Computer Vision With Convolutional Neural Networks
Deep Learning-powered image recognition is now performing better than human vision on many tasks. We examine how human and computer vision extract features from raw pixels, and explain how deep convolutional neural networks work so well.
on Apr 9, 2015 in Computer Vision, Convolutional Neural Networks, Deep Learning, Image Recognition, Nikhil Buduma
What do you want to learn? Big Data TechCon How-To Conference, Apr 26-28, Boston
Our survey ahead of the Big Data TechCon conference in Boston finds the most interest in learning Predictive analytics, Data visualization, Spark, Deep learning / Machine Learning, Hadoop and other components of the Hadoop stack, and Python.
on Apr 9, 2015 in Apache Spark, Big Data, Boston, Data Visualization, Deep Learning, Hadoop, MA, Techcon
Machine Learning 201: Does Balancing Classes Improve Classifier Performance?
The author investigates if balancing classes improves performance for logistic regression, SVM, and Random Forests, and finds where it helps the performance and where it does not.
on Apr 9, 2015 in Balancing Classes, random forests algorithm, Regression, SVM
Interview: Ravi Iyer, Ranker on Dealing with Inherent Bias in Crowdsourcing Data
We discuss the challenges of analyzing crowdsourcing data, tools and technologies, competitive landscape, advice, trends, and more.
on Apr 8, 2015 in Advice, Bias, Challenges, Crowdsourcing, Interview, Ranker, Ravi Iyer
Predictive Analytics Innovation Summit, San Diego: Day 2 Highlights
Highlights from the presentations by Predictive Analytics leaders from eBay, LinkedIn and Facebook on day 2 of Predictive Analytics Innovation Summit 2015 in San Diego.
on Apr 8, 2015 in A/B Testing, CA, eBay, Facebook, IE Group, LinkedIn, Marketing, Predictive Analytics, San Diego
Top stories in March: 7 common Machine Learning mistakes; Deep Learning for Text Understanding from Scratch
7 common mistakes when doing Machine Learning; Deep Learning for Text Understanding from Scratch; More Free Data Mining, Data Science Books and Resources; The Grammar of Data Science: Python vs R.
on Apr 7, 2015 in Top stories
Interview: Ravi Iyer, Ranker on Why Crowdsourcing Needs Data Science
We discuss the dynamics of the Ranker crowdsourcing platform, key factors for effectiveness, the role of data science in crowdsourcing, and more.
on Apr 7, 2015 in Analytics, Crowdsourcing, Data Science, Interview, Ranker, Ravi Iyer
Predictive Analytics Innovation Summit, San Diego: Day 1 Highlights
Highlights from the presentations by Predictive Analytics leaders from The Data Incubator, Tamr, Sony and Facebook on day 1 of Predictive Analytics Innovation Summit 2015 in San Diego.
on Apr 7, 2015 in CA, Data Curation, Facebook, IE Group, Marketing, Predictive Analytics, San Diego, Sony, Summit, Tamr
Be Smarter Than Your Devices: Learn About Big Data
If the Apple Watch rollout proves anything, it might be this: Going forward, we’ll all have to be as smart about data as our devices. Also, learn about the origins of the term "Big Data".
on Apr 7, 2015 in Apple Watch, Big Data, IoT, Nitin Indurkhya, Privacy, Tim Cook
Wharton Successful Applications of Customer Analytics Conf., Apr 30, Philadelphia
Wharton Customer Analytics Initiative (WCAI) helps define Customer Analytics, with a conference dedicated to real-world applications that balance high-level rigor and business know-how. Case studies include Nielsen, Google, Cablevision, and MLB.
on Apr 7, 2015 in Customer Analytics, MLB, PA, Philadelphia, WCAI, Wharton
ICDM 2015: Nobel Prize Winner, Machine Learning Guru, and Facebook Data Scientist to Keynote
Nobel Prize Winner, Machine Learning Guru, and Facebook Data Scientist will be keynote speakers for the 2015 IEEE International Conference on Data Mining series (ICDM).
on Apr 7, 2015 in Atlantic City, Facebook, ICDM, IEEE, Michael Jordan, NJ
Top KDnuggets tweets, Apr 2-5: The Data Science ecosystem: Data wrangling useful tools and tips
The #datascience ecosystem part 2: Data wrangling useful tools and tips; 10 R Packages to Win Kaggle Competitions; Forrester Wave #BigData Predictive #Analytics Solutions 2015, gainers, losers; How Microsoft uses Big Data to predict traffic jams in advance.
on Apr 6, 2015 in Data Wrangling, Forrester, Kaggle, Microsoft, R, Traffic
Upcoming Webcasts on Analytics, Big Data, Data Science – Apr 7 and beyond
More Accurate Predictive Analytic Models, Enterprise Data Rapid Sense-making, Data Mining - Failure to Launch, Disrupting Traditional Analyst Workflows, Making Sense of Hadoop, and more.
on Apr 6, 2015 in Data Lakes, Hadoop, Lavastorm
WCAI Research Opportunity, Apr 24: He Said, She Bought – User-Generated Reviews and Purchase Behavior
The data collected by a UK-based big box retailer, with all website visits, page views, reviews read, and purchases made gives researchers an unprecedented opportunity to look at how customers shop for products. Register for the webinar on Apr 24.
on Apr 6, 2015 in Bazaarvoice, Consumer Analytics, Product reviews, User Generated Content, WCAI
Women Analytics Book Authors – Meta List
Meta Brown is on a mission to promote accomplished women in analytics. Her catalog includes hundreds of women who have published books on many analytics topics, and is useful for finding experts to present at your event, comment on an issue, or work for you.
on Apr 6, 2015 in Book, Meta Brown, Women
Awesome Public Datasets on GitHub
A long, categorized list of large datasets (available for public use) to try your analytics skills on. Which one would you pick?
on Apr 6, 2015 in Datasets, Finance, GitHub, Government, Machine Learning, NLP, Open Data, Time series data
Interview: Beth Diaz, Washington Post on How Dark Social is Shadowing Modern Analytics
We discuss recent events at Washington Post, growth initiatives, the growing pain of Dark Social, how to deal with it, audience analytics, advice and more.
on Apr 6, 2015 in Advice, Analytics, Beth Diaz, Challenges, Dark Social, Interview, Jeff Bezos, Washington Post
Top stories for Mar 29 – Apr 4: Deep Learning, Dimensionality, and Autoencoders; The Grammar of Data Science: Python vs R
Deep Learning, The Curse of Dimensionality, and Autoencoders; The Grammar of Data Science: Python vs R; Data Science as a profession - time is now; Forrester Wave Big Data Predictive Analytics 2015: Gainers and Losers.
on Apr 5, 2015 in Top stories
Additions to KDnuggets Directory in March 2015
TDWI Chicago, XLDB 2015, Text Analytics East, Sentiment Symposium NYC, Sliderule Intro to Data Science, U.Pacific MS in Analytics, ReportMiner, MoData, Iepy open-source Info Extraction, and more meetings, companies, education, and software.
on Apr 5, 2015 in Data Science Education, MS in Analytics, New York City, Santa Clara
Poll: Machine Learning APIs
Poll from Bart Baesens at KU Leuven asks about your usage of Machine Learning APIs and other predictive analytics tools.
on Apr 4, 2015 in API, Bart Baesens, Machine Learning, Poll
Blockspring: Out-run programmers with your spreadsheet
Blockspring for Google Sheets lets you run over 1000 functions from your spreadsheets - create interactive data visualizations, run algorithms, pull data sources, execute db queries, automate tweets and emails, make API calls, and more.
on Apr 4, 2015 in Blockspring, Google, Spreadsheet
Big Data Developer Conference, Santa Clara: Day 3 Highlights
Highlights from the presentations/tutorials by Data Science leaders from VISA, Glassbeam, Unravel on day 3 of Big Data Developer Conference, Santa Clara.
on Apr 3, 2015 in Apache Spark, Developers, Global Big Data Conference, Hadoop, Highlights, Security, Spark SQL
Interview: Alessandro Gagliardi, Glassdoor on the Fun and Boring Part of Data Scientist Job
We discuss interesting trends, motivation, different aspects of a data scientist's job, advice, and more.
on Apr 3, 2015 in Advice, Alessandro Gagliardi, Career, Data Scientist, Glassdoor, Jobs, Trends
Watson Developer Cloud-Visual Recognition
IBM Bluemix is a cloud platform which offers both Platform as a Service and Mobile Backend as a Service. Its services include Speech to Text, Text to Speech, Visual Recognition, Concept Insights, and Tradeoff Analytics.
on Apr 3, 2015 in App, BlueMix, IBM, IBM Watson, Image Recognition, Ran Bi
Forrester Wave(tm) Big Data Predictive Analytics 2015: Gainers and Losers
IBM, SAS, and SAP lead in Forrester Wave(tm) Big Data Predictive Analytics Solutions for Q2, 2015. We compare with a previous Forrester Wave for 2013 and examine gainers and losers.
on Apr 3, 2015 in Alpine, Alteryx, Angoss, FICO, Forrester, IBM, Knime, Mike Gualtieri, Oracle, Predictive Analytics, RapidMiner, SAP, SAS
Top KDnuggets tweets, Mar 30 – Apr 01: Very useful! Data Visualization with ggplot2 CheatSheet
Very useful! Data Visualization with ggplot2 Cheat Sheet; Great Data Science resource: Intro to Statistics using Python, Pandas; 14 Best Python Pandas Features; Data Science shows why taxis can never compete.
on Apr 2, 2015 in Andrew Ng, Cheat Sheet, ggplot2, Lionel Messi, Pandas, Python, Soccer, Uber
Chapter Download from “Data Mining Techniques” (3rd edition)
Download this chapter from "Data Mining Techniques" (3rd Edition), by Gordon Linoff and Michael Berry, and learn how to create derived variables, which allow the statistical modeling process to incorporate human insights.
on Apr 2, 2015 in Data Mining, Derived Variables, Gordon Linoff, JMP, Michael Berry
Big Data Developer Conference, Santa Clara: Day 2 Highlights
Highlights from the presentations/tutorials by Data Science leaders from Cloudera, LinkedIn, Intel, MapR, Locbit and others on day 2 of Big Data Developer Conference 2015.
on Apr 2, 2015 in Cloudera, Developers, Global Big Data Conference, Highlights, Hortonworks, Intel, LinkedIn, MapR, Security
100+ upcoming April – October 2015 Meetings in Analytics, Big Data, Data Mining, Data Science
Coming soon: INFORMS Business Analytics, PASS Business Analytics, Big Data Week, Text by the Bay, Big Data Techcon, Big Data Innovation Summit, Wharton Successful Applications of Customer Analytics, and many more.
on Apr 2, 2015 in Boston, CA, Chicago, IL, London, MA, New York City, NY, San Diego, San Francisco, Santa Clara, UK
Text By the Bay conference, San Francisco, Apr 24-25
The inaugural Text By the Bay conference has an amazing program, with speakers from top universities, Big text data powerhouses, Growing global players, Startups, Text/NLP tech providers, and more. KDnuggets discount.
on Apr 2, 2015 in CA, NLP, San Francisco, Sentiment Analysis, Startups, Text Analytics, Topic Modeling
Pacific 1-year MS in Analytics in San Francisco
Get an MS in Analytics in San Francisco: a 1-year flexible hybrid program for working professionals, with industry-sponsored cases and projects and state-of-the-art facilities and technology. Learn more.
on Apr 2, 2015 in CA, MS in Analytics, San Francisco, University of the Pacific
Hazy Forecast for Consumer Privacy in the Next Decade
A majority of experts felt that developing a privacy framework that would be both popular and functional was next to impossible in the near future. With time, privacy is likely to become a class issue, with consumers who have the money being able to secure their data better.
on Apr 2, 2015 in Consumer Analytics, Hal Varian, Privacy
Hadoop as a Service: 18 Cloud Options
Hadoop as a service in the cloud makes big data applications and projects easier to approach, and these 18 platforms each provide their own unique solutions.
on Apr 2, 2015 in AWS, Big Data Services, Cloud, Cloudera, Hadoop, Hortonworks, Information Management, MapR, Microsoft Azure
Big Data Developer Conference, Santa Clara: Day 1 Highlights
Highlights from the presentations/tutorials by Data Science leaders from ElephantScale, SciSpike, Twitter and Informatica on day 1 of Big Data Developer Conference, Santa Clara.
on Apr 1, 2015 in Developers, Elephant Scale, Global Big Data Conference, Highlights, Informatica, MongoDB, Parquet, SciSpike, Twitter
Interview: Alessandro Gagliardi, Glassdoor on the Indispensable Skills for Data Scientists
We discuss Analytics at Glassdoor, important lessons, major factors affecting job satisfaction, challenges of working on Twitter Data, indispensable components of Data Science education.
on Apr 1, 2015 in Alessandro Gagliardi, Data Science Skills, Data Scientist, Glassdoor, Interview, Jobs, Prediction, Twitter
Gold Mine or Blind Alley? Functional Programming for Big Data & Machine Learning
Functional programming is touted as a solution for big data problems. Why is it advantageous? Why might it not be? And who is using it now?
on Apr 1, 2015 in Big Data, Functional Programming, Haskell, Zachary Lipton
A Data Scientist Advice to Business Schools
To remain relevant, business school graduates must learn to speak to Data Scientists, whose domain expertise is playing a vital role in an organization's ability to compete in today's market.
on Apr 1, 2015 in Advice, Business Schools, Data Scientist, Sean McClure
Big Data for the Common Good “Collider”, at Frankfurt / Berkeley
The Frankfurt Big Data Lab and ODBMS.org cooperate with the Center for Entrepreneurship & Technology (CET) at UC Berkeley to enable the creation of project proposals for Big Data for the Common Good.
on Apr 1, 2015 in Big Data, Frankfurt, Germany, Social Good, UC Berkeley
Computing Platforms for Analytics, Data Mining, Data Science
The poll results suggest a split between a majority of data miners and data scientists who work with growing but still "PC-size", small GB-sized data, and a smaller group of Big Data analysts who work with cloud-sized data. Cloud computing, Unix, and especially Mac gained in popularity.
on Apr 1, 2015 in Apple, Cloud Computing, Poll