KDnuggets™ News 15:n11, Apr 15: Big Data Predictive Analytics Gainers & Losers; Awesome Public Datasets
Awesome Public Datasets on GitHub; Gold Mine or Blind Alley? Functional Programming for Machine Learning; Inside Deep Learning - Convolutional networks; KDnuggets Free Pass to Strata Hadoop World London.
Features
- Forrester Wave(tm) Big Data Predictive Analytics 2015: Gainers and Losers - Apr 3, 2015.
IBM, SAS, and SAP lead in Forrester Wave(tm) Big Data Predictive Analytics Solutions for Q2, 2015. We compare with a previous Forrester Wave for 2013 and examine gainers and losers.
- Awesome Public Datasets on GitHub - Apr 6, 2015.
A long, categorized list of large datasets (available for public use) to try your analytics skills on. Which one would you pick?
- Gold Mine or Blind Alley? Functional Programming for Big Data & Machine Learning - Apr 1, 2015.
Functional programming is touted as a solution for big data problems. Why is it advantageous? Why might it not be? And who is using it now?
- Inside Deep Learning: Computer Vision With Convolutional Neural Networks - Apr 9, 2015.
Deep Learning-powered image recognition is now performing better than human vision on many tasks. We examine how human and computer vision extract features from raw pixels, and explain how deep convolutional neural networks work so well.
- Interview: Alessandro Gagliardi, Glassdoor on the Indispensable Skills for Data Scientists - Apr 1, 2015.
We discuss Analytics at Glassdoor, important lessons, major factors affecting job satisfaction, challenges of working on Twitter Data, and indispensable components of Data Science education.
- Join us for Predictive Analytics Events in Chicago June 2015 - Apr 10, 2015.
The leading, world-renowned events in predictive analytics are coming to Chicago this June. Build your skillset and knowledge, learn from experts, and enjoy great networking opportunities. Early bird rates until Apr 24.
- KDnuggets Free Pass to Strata Hadoop World London, 5-7 May, 2015 - Apr 13, 2015.
Strata + Hadoop World has been called "mind-blowing", "an amazing event", "the most interesting and informative conference". Win free registration via KDnuggets.
Software (see also All Software)
- Blockspring: Out-run programmers with your spreadsheet - Apr 4, 2015.
Blockspring for Google Sheets lets you run over 1000 functions from your spreadsheets - create interactive data visualizations, run algorithms, pull data sources, execute db queries, automate tweets and emails, make API calls, and more.
- Provalis Research WordStat for Stata combines Numerical, Text Analysis - Apr 14, 2015.
This new collaboration couples the cutting-edge numerical analysis of Stata with the unique text analytics functionality of Provalis Research.
- Watson Developer Cloud-Visual Recognition - Apr 3, 2015.
IBM Bluemix is a cloud platform which offers both Platform as a Service and Mobile Backend as a Service. Its services include Speech to Text, Text to Speech, Visual Recognition, Concept Insights, and Tradeoff Analytics.
Opinions (see also All Opinions for this month)
- Algorithmia - How Marketplaces are Fostering Innovation? - Apr 9, 2015.
We have a marketplace for almost everything - mobile apps, cabs, hotels, and what not. But not for algorithms. Algorithmia takes up that challenge.
- Machine Learning 201: Does Balancing Classes Improve Classifier Performance? - Apr 9, 2015.
The author investigates whether balancing classes improves performance for logistic regression, SVM, and Random Forests, and finds where it helps and where it does not.
- Be Smarter Than Your Devices: Learn About Big Data - Apr 7, 2015.
If the Apple Watch rollout proves anything, it might be this: Going forward, we'll all have to be as smart about data as our devices. Also, learn about the origins of the term "Big Data".
- Hazy Forecast for Consumer Privacy in the Next Decade - Apr 2, 2015.
A majority of experts felt that developing a privacy framework that would be both popular and functional was next to impossible in the near future. With time, privacy is likely to become a class issue, with consumers who have the money being able to secure their data better.
- A Data Scientist Advice to Business Schools - Apr 1, 2015.
To remain relevant, business school graduates must learn to speak to Data Scientists, whose domain expertise is playing a vital role in an organization's ability to compete in today's market.
Interviews (see also All Interviews for this month)
- Interview: Ravi Iyer, Ranker on Dealing with Inherent Bias in Crowdsourcing Data - Apr 8, 2015.
We discuss the challenges of analyzing crowdsourcing data, tools and technologies, competitive landscape, advice, trends, and more.
- Interview: Ravi Iyer, Ranker on Why Crowdsourcing Needs Data Science - Apr 7, 2015.
We discuss the dynamics of Ranker crowdsourcing platform, key factors for effectiveness, role of data science in crowdsourcing, and more.
- Interview: Michael Lurye, Time Warner Cable on Key Lessons from Shifting to Hadoop - Apr 14, 2015.
We discuss the key lessons from shifting to Hadoop, data management in today's world, future of Data Science, advice and more.
- Interview: Michael Lurye, Time Warner Cable on Big Data and the Insatiable Demand for BI - Apr 13, 2015.
We discuss EDM at Time Warner Cable, data sources, complementing legacy data warehouses with Big Data solutions, vendor selection and build vs. buy decision.
- Interview: Xia Wang, AstraZeneca on Big Data and the Promise of Effective Healthcare - Apr 10, 2015.
We discuss challenges in analyzing text data, Big Data impact on translational bioinformatics, advice, desired skills in data scientists, and more.
- Interview: Xia Wang, AstraZeneca on Unraveling Patient Treatment Journey by NLP on Clinical Notes - Apr 9, 2015.
We discuss Analytics at AstraZeneca, prominent use cases, how NLP helped understanding patient treatment journey in diabetes, data sources, insights, and more.
- Interview: Beth Diaz, Washington Post on How Dark Social is Shadowing Modern Analytics - Apr 6, 2015.
We discuss recent events at Washington Post, growth initiatives, the growing pain of Dark Social, how to deal with it, audience analytics, advice and more.
- Interview: Alessandro Gagliardi, Glassdoor on the Fun and Boring Part of Data Scientist Job - Apr 3, 2015.
We discuss interesting trends, motivation, different aspects of data scientist job, advice, and more.
Reports (see also All Reports for this month)
- Predictive Analytics Innovation Summit, San Diego: Day 1 Highlights - Apr 7, 2015.
Highlights from the presentations by Predictive Analytics leaders from The Data Incubator, Tamr, Sony and Facebook on day 1 of Predictive Analytics Innovation Summit 2015 in San Diego.
- Predictive Analytics Innovation Summit, San Diego: Day 2 Highlights - Apr 8, 2015.
Highlights from the presentations by Predictive Analytics leaders from eBay, LinkedIn and Facebook on day 2 of Predictive Analytics Innovation Summit 2015 in San Diego.
- Big Data Developer Conference, Santa Clara: Day 1 Highlights - Apr 1, 2015.
Highlights from the presentations/tutorials by Data Science leaders from ElephantScale, SciSpike, Twitter and Informatica on day 1 of Big Data Developer Conference, Santa Clara.
- Big Data Developer Conference, Santa Clara: Day 2 Highlights - Apr 2, 2015.
Highlights from the presentations/tutorials by Data Science leaders from Cloudera, LinkedIn, Intel, MapR, Locbit and others on day 2 of Big Data Developer Conference 2015.
- Big Data Developer Conference, Santa Clara: Day 3 Highlights - Apr 3, 2015.
Highlights from the presentations/tutorials by Data Science leaders from VISA, Glassbeam, Unravel on day 3 of Big Data Developer Conference, Santa Clara.
News (see also All News)
- Top /r/MachineLearning Posts, Apr 5-11: Amazon Machine Learning, Numerical Optimization, and Conditional Random Fields - Apr 14, 2015.
Amazon Machine Learning as a Service, Numerical Optimization, Extracting data from NYTimes recipes, Intro to Machine Learning with scikit-learn, and more.
- Top stories for Apr 5-11: 10 things statistics taught us about big data analysis; Awesome Public Datasets on GitHub - Apr 12, 2015.
10 things statistics taught us about big data analysis; The Grammar of Data Science: Python vs R; Predictive Analytics Innovation Summit (San Diego) Highlights; Awesome Public Datasets on GitHub.
- Top /r/MachineLearning Posts, Mar 29-Apr 4: Andrew Ng AMA, Deep Learning for NLP, and OpenCL Convnets - Apr 10, 2015.
Andrew Ng's upcoming AMA, scikit-learn updates, Richard Socher's Deep Learning NLP videos, Criteo's huge new dataset, and convolutional neural networks on OpenCL are the top topics discussed this week on /r/MachineLearning.
- March 2015 Analytics, Big Data, Data Mining Acquisitions and Startups Activity - Apr 9, 2015.
March 2015 acquisitions, startups, and company activity in Analytics, Big Data, Data Mining, and Data Science: Apple buys Acunu, Algorithmia Launches, Dataminr raises $130M, PatentVector, Looker, and more.
- Top stories in March: 7 common Machine Learning mistakes; Deep Learning for Text Understanding from Scratch - Apr 7, 2015.
7 common mistakes when doing Machine Learning; Deep Learning for Text Understanding from Scratch; More Free Data Mining, Data Science Books and Resources; The Grammar of Data Science: Python vs R.
- Top stories for Mar 29 - Apr 4: Deep Learning, Dimensionality, and Autoencoders; The Grammar of Data Science: Python vs R - Apr 5, 2015.
Deep Learning, The Curse of Dimensionality, and Autoencoders; The Grammar of Data Science: Python vs R; Data Science as a profession - time is now; Forrester Wave Big Data Predictive Analytics 2015: Gainers and Losers.
- Additions to KDnuggets Directory in March 2015 - Apr 5, 2015.
TDWI Chicago, XLDB 2015, Text Analytics East, Sentiment Symposium NYC, Sliderule Intro to Data Science, U.Pacific MS in Analytics, ReportMiner, MoData, Iepy open-source Info Extraction, and more meetings, companies, education, and software.
- Poll: Machine Learning APIs - Apr 4, 2015.
Poll from Bart Baesens at KU Leuven asks about your usage of Machine Learning APIs and other predictive analytics tools.
- Big Data for the Common Good "Collider", at Frankfurt / Berkeley - Apr 1, 2015.
The Frankfurt Big Data Lab and ODBMS.org cooperate with the Center for Entrepreneurship & Technology (CET) at UC Berkeley to enable the creation of project proposals for Big Data for the Common Good.
Webcasts and Webinars (see also All Webcasts and Webinars)
- Upcoming Webcasts on Analytics, Big Data, Data Science - Apr 14 and beyond - Apr 13, 2015.
Women in Data, Business Value from Big Data Quickly, Impact of User-Generated Reviews on Purchase Behavior, Maximizing ROI using Data Science, Apache Ignite, and much more.
- Upcoming Webcasts on Analytics, Big Data, Data Science - Apr 7 and beyond - Apr 6, 2015.
More Accurate Predictive Analytic Models, Enterprise Data Rapid Sense-making, Data Mining - Failure to Launch, Disrupting Traditional Analyst Workflows, Making Sense of Hadoop, and more.
- WCAI Research Opportunity, Apr 24: He Said, She Bought - User-Generated Reviews and Purchase Behavior - Apr 6, 2015.
The data collected by a UK-based big box retailer, covering all website visits, page views, reviews read, and purchases made, gives researchers an unprecedented opportunity to look at how customers shop for products. Register for the webinar on Apr 24.
Courses (see also All Courses)
- NYC Data Science Academy Bootcamps, Classes on R, Python, and Machine Learning - Apr 13, 2015.
NYC Data Science Academy upcoming schedule includes 7 bootcamp events and 4 classes on Data Science, R, Python, and Machine Learning. Register now.
- CourseBuffet: Organizing MOOC Courses on Big Data, Data Science, Statistics - Apr 12, 2015.
CourseBuffet organizes MOOCs in a course catalog format and includes over 100 courses on Big Data, Data Mining, Data Science, and Statistics.
- Pacific 1-year MS in Analytics in San Francisco - Apr 2, 2015.
Get an MS in Analytics in San Francisco: a 1-year flexible hybrid program for working professionals, with industry-sponsored cases and projects and state-of-the-art facilities and technology - learn more.
Meetings (see also All Meetings)
- TDWI Chicago Special Offer - Respond by Apr 17 - Apr 14, 2015.
A special invitation and discount for you and your team to attend TDWI Chicago, May 3-8. Join leading industry experts, analysts, practitioners, and solution providers who deliver a world-class education program.
- What do you want to learn? Big Data TechCon How-To Conference, Apr 26-28, Boston - Apr 9, 2015.
Our survey ahead of the Big Data TechCon conference in Boston finds most interest in learning Predictive analytics, Data visualization, Spark, Deep learning / Machine Learning, Hadoop and other components of the Hadoop stack, and Python.
- Wharton Successful Applications of Customer Analytics Conf., Apr 30, Philadelphia - Apr 7, 2015.
Wharton Customer Analytics Initiative (WCAI) helps define Customer Analytics, with a conference dedicated to real-world applications that balance high-level rigor and business know-how. Case studies include Nielsen, Google, Cablevision, and MLB.
- ICDM 2015: Nobel Prize Winner, Machine Learning Guru, and Facebook Data Scientist to Keynote - Apr 7, 2015.
Nobel Prize Winner, Machine Learning Guru, and Facebook Data Scientist will be keynote speakers for the 2015 IEEE International Conference on Data Mining series (ICDM).
- 100+ upcoming April - October 2015 Meetings in Analytics, Big Data, Data Mining, Data Science - Apr 2, 2015.
Coming soon: INFORMS Business Analytics, PASS Business Analytics, Big Data Week, Text by the Bay, Big Data TechCon, Big Data Innovation Summit, Wharton Successful Applications of Customer Analytics, and many more.
- Text By the Bay conference, San Francisco, Apr 24-25 - Apr 2, 2015.
The inaugural Text By the Bay conference has an amazing program, with speakers from top universities, Big text data powerhouses, Growing global players, Startups, Text/NLP tech providers, and more. KDnuggets discount.
Jobs (see also All Jobs)
- NREL: Senior Scientist Computational Statistics - Apr 10, 2015.
Work with NREL scientists by introducing modern statistical methodologies in the research and analysis of large-scale temporal datasets, laboratory data, mathematical models, and simulations related to renewable energy and energy efficiency.
- Columbia University: Director of Analysis and Reporting - Apr 8, 2015.
Lead research and assessment through strategic analytics to support Columbia College, provide leadership and technical guidance to manage research needs around design, analysis, benchmarking and reporting.
- Cox: Manager, Advanced Analytics - Apr 3, 2015.
Identify and work on high-impact business and marketing problems and develop viable solutions through data analysis, predictive modeling, and advanced analytics techniques.
Academic and Research positions (see also All Academic positions)
- UW Tacoma, The Milgard School of Business: Director, Center for Information Based Management - Apr 7, 2015.
Teach courses focused on business analytics and demonstrate effective management and leadership capabilities and strong industry involvement. Non-tenure track full time position.
Publications
- UMass Amherst Big Data Report - Apr 13, 2015.
New report from UMass Amherst covers the strength of Massachusetts and the 5 UMass campuses in Big Data and Data Science, and projects 120K Big Data related jobs in Mass by 2018.
- Wikibon Big Data Vendor Revenue and Market Forecast, 2020 - Apr 11, 2015.
Wikibon finds that the Big Data market is maturing, with growth rate slowing from 60% in 2013 to 40% in 2014. Wikibon expects the Big Data market to top $61 billion in 2020.
- Women Analytics Book Authors - Meta List - Apr 6, 2015.
Meta Brown is on a mission to promote accomplished women in analytics - her catalog includes hundreds of women who published books on many analytics topics - useful for finding experts to present at your event, comment on an issue or work for you.
- Chapter Download from "Data Mining Techniques" (3rd edition) - Apr 2, 2015.
Download this chapter from "Data Mining Techniques" (3rd Edition), by Gordon Linoff and Michael Berry, and learn how to create derived variables, which allow the statistical modeling process to incorporate human insights.
- Hadoop as a Service: 18 Cloud Options - Apr 2, 2015.
Hadoop as a service in the cloud makes big data applications and projects easier to approach, and these 18 platforms each provide their own unique solutions.
Top Tweets (see also All top tweets for this month)
- Top KDnuggets tweets, Apr 6-13 - Apr 14, 2015.
Languages have more "happy" words than unhappy;
5 most popular #similarity measures implementation in Python;
Brilliant! Dilbert on Resume embellishing: if engineer, fire him; if marketer ...;
Top programming languages change rapidly: SQL, C#, C++ down, Python, Node.js up.
- Top KDnuggets tweets, Apr 2-5: The Data Science ecosystem - Apr 6, 2015.
The #datascience ecosystem part 2: Data wrangling useful tools and tips;
10 R Packages to Win Kaggle Competitions;
Forrester Wave #BigData Predictive #Analytics Solutions 2015, gainers, losers;
How Microsoft uses Big Data to predict traffic jams in advance.
- Top KDnuggets tweets, Mar 30 - Apr 01 - Apr 2, 2015.
Very useful! Data Visualization with ggplot2 Cheat Sheet;
Great Data Science resource: Intro to Statistics using Python, Pandas;
14 Best Python Pandas Features;
Data Science shows why taxis can never compete.
CFP - Calls for Papers (see also All Calls for Papers)
- Due Apr 15, The 2015 Int. Conf. on Data Mining (DMIN'15), Las Vegas, NV, USA. July 27-30, 2015.
- Due Apr 17, Proposals for WSDM 2016 data mining competition, at WSDM 2016, Bay Area, CA. February 2016.
- Due Apr 17, IEEE/ACM ASONAM 2015: The 2015 IEEE/ACM Int. Conf. on Advances in Social Network Analysis and Mining, Paris, France. Aug 25-28, 2015.
- Due Apr 19, The inaugural INNS Big Data 2015 conference, San Francisco, CA, USA. Aug 8-10, 2015.
- Due May 1, MUD2 - 2nd Int. Workshop on Mining Urban Data, Emerging Learning Paradigms and Applications for Smart Cities, at ICML 2015, Lille, France. Jul 6-11, 2015.
- Due Jun 5, Population Informatics for Big Data (PopInfo'15), at KDD-2015, Sydney, Australia. 10 August 2015.
- Due Jun 5, ODDx3: KDD 2015 Workshop on Outlier Definition, Detection, and Description, at KDD-2015, Sydney, Australia. 10 August 2015.
- Due Jun 8, MoReBikeS: 2015 ECML-PKDD Challenge on "Model Reuse with Bike rental Station data", Discovery Challenge n. 1 of ECML PKDD 2015, Porto, Portugal. Sep 7-11, 2015.
- Due Jun 8, LMCE 2015, Second Int. Workshop on Learning over Multiple Contexts, at ECML PKDD 2015, Porto, Portugal. Sep 11, 2015.
- Due Jun 10, ECMLPKDD 2015: PhD Session at The European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, at ECMLPKDD 2015, Porto, Portugal. Sep 7, 2015.
- Due Jun 22, LD4KD2015 - ECML/PKDD workshop on Linked Data for Knowledge Discovery, at ECMLPKDD 2015, Porto, Portugal. Sep 11, 2015.
- Due Jul 9, SIGMOD 2016: ACM SIGMOD Int. Conf. on Management of Data, San Francisco, CA, USA. June 26 - July 1, 2016.
- Due Dec 18, 11th Int. Conf. on Machine Learning and Data Mining, MLDM 2016, New York, NY, USA. July 17-19, 2016.