KDnuggets™ News 15:n09, Mar 25: Deep Learning from Scratch; 10 steps to Kaggle Success; US CDS DJ Patil Cartoon
Deep Learning for Text Understanding from Scratch; New Poll: Computing platform; 10 Steps to Success in Kaggle Data; Cartoon: US Chief Data Scientist Most Difficult Challenge; SQL-like Query Language for Real-time Streaming Analytics.
Features | Software | Opinions | Interviews | News | Webcasts | Courses | Meetings | Jobs | Publications | Tweets | CFP | Quote
Features
- Deep Learning for Text Understanding from Scratch - Mar 13, 2015.
Forget about the meaning of words, forget about grammar, forget about syntax, forget even the very concept of a word. Now let the machine learn everything by itself.
- New Poll: Computing platform for your analytics, data mining, data science work or research - Mar 14, 2015.
New KDnuggets Poll is asking: What computing platform do you use for analytics, data mining, data science work or research? Please vote.
- 10 Steps to Success in Kaggle Data Science Competitions - Mar 11, 2015.
The author, ranked in the top 10 in five Kaggle competitions, shares his 10 steps for success. These also apply to any well-defined predictive analytics or modeling problem with a closed dataset.
- Cartoon: US Chief Data Scientist Most Difficult Challenge - Mar 13, 2015.
New KDnuggets cartoon looks at the most difficult challenge facing the first US Chief Data Scientist DJ Patil @dpatil.
- SQL-like Query Language for Real-time Streaming Analytics - Mar 12, 2015.
We need a SQL-like query language for real-time streaming analytics to be expressive, short, and fast, to define core operations that cover 90% of problems, and to be easy to follow and learn.
- Deep Learning, The Curse of Dimensionality, and Autoencoders - Mar 12, 2015.
Autoencoders are an extremely exciting new approach to unsupervised learning and for many machine learning tasks they have already surpassed the decades of progress made by researchers handpicking features.
- PAW: Learn the ways predictive analytics bolsters insurance - Mar 23, 2015.
Insurance relies greatly on predictive analytics - learn about advances and best practices at several PAW Business insurance-related sessions in San Francisco and Chicago.
- Machine Learning Table of Elements Decoded - Mar 11, 2015.
Machine learning packages for Python, Java, Big Data, Lua/JS/Clojure, Scala, C/C++, CV/NLP, and R/Julia are represented using a cute but ill-fitting metaphor of a periodic table. We extract the useful links.
- KDnuggets Free Pass to Big Data TechCon How-To Conference, Apr 26-28, Boston - Mar 15, 2015.
Win a free KDnuggets Pass for Big Data TechCon in Boston - the conference to learn HOW-TO master and analyze Big Data. Learn Hadoop, Spark, Yarn, HBase, R, and Hive from the smartest, hardest-working faculty.
- Do We Need More Training Data or More Complex Models? - Mar 23, 2015.
Do we need more training data? Which models will suffer from performance saturation as data grows large? Do we need larger models or more complicated models, and what is the difference?
Software (see also All Software )
- Automatic Statistician is here: Dr. Mo - Mar 21, 2015.
Dr. Mo, Automatic Statistician is here! Using Artificial Intelligence, self-learning algorithm, multimodel technology Dr. Mo achieves Super Accuracy and Speed. Simple use and simple output for non-statisticians.
Opinions (see also All Opinions for this month )
- Data science done well looks easy, which is a big problem - Mar 24, 2015.
Data Science done well looks too easy and that poses a major public relations problem for serious data scientists. The really tricky twist is that bad data science looks easy too.
- Top 10 UK Big Data Professionals - Mar 23, 2015.
The top 10 Big Data Professionals in the UK include CEOs, journalists, an Information Commissioner, and Analytics leaders from leading companies and organizations.
- 5 Lessons from a Data Science Chat - Mar 19, 2015.
Data science applications, key challenges, appropriate skills and more - key takeaways from a data science Tweet chat.
- Small Data requires Specialized Deep Learning and Yann LeCun response - Mar 19, 2015.
For industries that have relatively small data sets (less than a petabyte), a Specialized Deep Learning approach based on unsupervised learning and domain knowledge is needed.
- Report - MLconf: what industry leaders say about machine learning - Mar 14, 2015.
MLconf was hosted in 4 different cities (NYC, Seattle, Atlanta, and San Francisco), with speakers from big, established companies and from emerging startups, bringing more ideas and experience into the game.
Interviews (see also All Interviews for this month )
- Interview: Beena Ammanath, GE on Data Science - It's Not Just Science! - Mar 24, 2015.
We discuss benefits and challenges of Data Lake, trends, life lessons, motivation, desired skills, and more.
- Interview: Beena Ammanath, GE on the Industrial Internet for Data-driven Innovation - Mar 23, 2015.
We discuss the role of Analytics at GE, Industrial Internet and how it is different from consumer internet, and the key capabilities of Predix.
- Interview: Brad Klingenberg, StitchFix on Decoding Fashion through Analytics and ML - Mar 21, 2015.
We discuss the challenges in making personal styling recommendations, unexpected insights, interesting trends, motivation, advice, desired qualities in data scientists and more.
- Interview: Brad Klingenberg, StitchFix on Building Analytics-powered Personal Stylist - Mar 20, 2015.
We discuss StitchFix, how it leverages Analytics, understanding customer preferences, and pros-and-cons of involving human judgement in the recommendation process.
- Interview: Vince Darley, King.com on What do you need to become Top Grossing Game - Mar 19, 2015.
We discuss common characteristics of games that achieved top ranking, career advice, trends, desired qualities in data scientists and more.
- Interview: Vince Darley, King.com on the Serious Analytics behind Casual Gaming - Mar 18, 2015.
We discuss key characteristics of social gaming data, ML use cases at King, infrastructure challenges, major problems with A-B testing and recommendations to resolve them.
- Interview: Dave McCrory, Basho on Why Data Gravity Cannot be Ignored in Architecture Design - Mar 17, 2015.
We discuss data gravity and its implications, Riak Enterprise 2.0, Riak CS 1.5, competitive landscape, challenges and more.
- Interview: Dave McCrory, Basho on Distributed Database Needs of a Future Enterprise - Mar 16, 2015.
We discuss the future of distributed storage for enterprise, Scale-up vs. Scale-out, software design patterns in Cloud era, microservices model and the place for legacy database in modern enterprise IT.
- Interview: Kenneth Viciana, Equifax on Data Governance - Red Tape or Catalyst? - Mar 14, 2015.
We discuss recommendations for Data Governance policies, advice, Big Data trends, qualities sought in Data Scientists, and more.
- Interview: Kenneth Viciana, Equifax on Data Lake & Other Strategies for Insights Culture - Mar 13, 2015.
We discuss the responsibilities of Enterprise Data Strategy team at Equifax, why Data Lake, Equifax Decision360, how to set up Insights Culture and bottlenecks for value delivery from Big Data.
- Interview: Josh Hemann, Activision on Why the Tolerance for Ambiguity is Vital - Mar 12, 2015.
We discuss handling bias in data, other data quality concerns, advice, desired qualities, and more.
- Interview: Josh Hemann, Activision on Taming the Beast of Gaming Big Data - Mar 11, 2015.
We discuss Analytics challenges at Activision, event data from games such as Call of Duty, balancing aesthetics and inference in visualization, problem with stacked charts and more.
News (see also All News )
- 2015 SIGKDD Data Science/Data Mining PhD Dissertation Award - Nominations due Apr 30 - Mar 21, 2015.
This annual award by ACM SIGKDD seeks to recognize outstanding research by doctoral candidates in the field of data mining, data science, and knowledge discovery. Nominations due Apr 30.
- Top /r/MachineLearning Posts, Mar 8-14: Word vectors, Hardware for Deep Learning, and Neural Graphics Engines - Mar 19, 2015.
Word vectors in NLP, Machine Learning's place in programming, hardware for deep learning, Machine Learning interviews, and neural graphics engines are all topics covered this week on /r/MachineLearning.
- Ontotext Introduces the S4 Developer Challenge - Mar 17, 2015.
The challenge will award a cash prize to developers that write the most interesting demo, application or show case utilizing the S4 capabilities for text analytics, linked data and knowledge graphs. Submissions due Mar 31.
- Participate in the Rexer Analytics 2015 Data Miner Survey - Mar 14, 2015.
Data Analysts, Predictive Modelers, Data Scientists, Data Miners, and all other types of analytic professionals, students, and academics - please participate in the Rexer Analytics 2015 Data Miner Survey.
- Feb 2015 Analytics, Big Data, Data Mining Acquisitions and Startups Activity - Mar 12, 2015.
Feb 2015 acquisitions, startups, and company activity in Analytics, Big Data, Data Mining, and Data Science: @Kaggle cuts 1/3 of staff, Infosys buys Panaya, RapidMiner gets $15M, Palantir buys Fancy That, Hitachi buys Pentaho, and more.
- Top stories for Mar 15-21: Deep Learning for Text Understanding from Scratch; White House on Big Data and Differential Pricing - Mar 22, 2015.
Deep Learning for Text Understanding from Scratch; 7 common mistakes when doing Machine Learning; White House report on Big Data and Differential Pricing; Why Data Gravity Cannot be Ignored.
- Top stories for Mar 8-14: 7 common Machine Learning mistakes; Deep Learning for Text Understanding from Scratch - Mar 15, 2015.
7 common mistakes when doing Machine Learning; Deep Learning for Text Understanding from Scratch; SQL-like Query Language for Real-time Streaming Analytics; 10 Steps to Success in Kaggle Data Science Competitions.
Webcasts and Webinars (see also All Webcasts and Webinars )
- Upcoming Webcasts on Analytics, Big Data, Data Science - Mar 24 and beyond - Mar 24, 2015.
Addressing the Challenges of Data Variety, Semantic Publishing, How to scale faster with NoSQL, Data Mining - Failure to Launch, Disrupting Traditional Analyst Workflows, and more.
- Ontotext Webinar: Semantic Publishing, Enhancing Content and Engagement, Mar 26 - Mar 17, 2015.
Ontotext will show how news and media publishers can use semantic publishing technology to more efficiently generate content while increasing audience engagement through personalization and recommendations.
Courses (see also All Courses )
- TMA Predictive Analytics Data Mining Training [Wash. DC, May | Toronto, Aug] - Mar 24, 2015.
Successful analytics in the big data era does not start with data and software, but with hands-on, immersive training and goal-driven strategy - get it from The Modeling Agency in Washington DC (May), Toronto (Aug).
- PACE Data Mining Bootcamps, San Diego, April - Mar 17, 2015.
Every class is taught by SDSC experienced data scientists, delivering practical, hands-on training in an intimate classroom setting limited to 25 participants. Early bird till Mar 31.
- NYC Data Science Courses, Bootcamps, Meetups - Mar 17, 2015.
NYC Data Science Academy spring schedule includes 3 classes, 3 Meetups, 7 bootcamp events on Data Science, R, Python, Machine Learning, scikit-learn, and related topics.
- Coursera: Process Mining: Data science in Action, April 2015 - Mar 14, 2015.
Due to the big success of the first run, this 6 week online course is repeated on Coursera, starting April 1. This free course provides data science knowledge that can be applied directly to analyze and improve processes in a variety of domains.
- Simplilearn Big Data and Analytics Courses - CAREER30 - Mar 12, 2015.
Get Big Data and Analytics certification - a big plus for your career - with Simplilearn courses on Analytics, Big Data, Hadoop, SAS, R, Cloud Computing, and more, now at 30% discounted prices until Mar 30.
Meetings (see also All Meetings )
- PASS Business Analytics Conference, Santa Clara, April 20-22 - Mar 24, 2015.
Aimed at business and data analytics professionals, it brings a lineup of world-class analytics speakers, fresh insights, compelling content and powerful networking. KDnuggets discount.
- Catch the Wave at Analytics 2015, the Leading Analytics, Big Data Conference, Huntington Beach, April 12-14 - Mar 23, 2015.
Analytics 2015 will go beyond the typical "buzz" about Big Data and the cloud, providing unique opportunities to learn about potential analytics applications to the Internet of Things, as well as practical implementations of cognitive computing, unstructured data analytics, and real-time decisions based on streaming data.
- GBDC: Real-Time Big Data Developer (focus on Spark, Storm, Flink, Kafka), Santa Clara, Apr 23-24 - Mar 23, 2015.
A fast paced, vendor agnostic, technical overview of the Apache Spark landscape, with technical sessions, use cases and hands-on sessions. Get KDnuggets discount.
- PAW: Learn the ways predictive analytics bolsters insurance - Mar 23, 2015.
Insurance relies greatly on predictive analytics - learn about advances and best practices at several PAW Business insurance-related sessions in San Francisco and Chicago.
- useR 2015 - R User conference, Aalborg, Denmark, June 30 - July 3 - Mar 20, 2015.
The open source R language is a leading tool for data scientists. Attend useR! conference, the main annual event of the R community, June 30 - July 3, in Aalborg, Denmark.
- IEEE ICDM 2015 Call for Papers, Workshops, Contest proposals, demos, and tutorials - Mar 16, 2015.
ICDM '15: The 15th IEEE International Conference on Data Mining, a leading research conference in the field, calls for workshop proposals, contest proposals, papers, demo proposals, and tutorial proposals. Conference dates: Nov 14-17, Atlantic City, NJ, USA.
- Strata + Hadoop World London, 5-7 May 2015 - Mar 13, 2015.
Strata + Hadoop World has been called "mind-blowing", "an amazing event", "the most interesting and informative conference". See for yourself in London and get a special KDnuggets discount.
- Wharton Successful Applications of Customer Analytics Conference, Apr 30, Philadelphia - Mar 11, 2015.
Wharton Customer Analytics Initiative (WCAI) helps define Customer Analytics, with conference dedicated to real-world applications that balance high-level rigor and business know-how. Case studies include Nielsen, Google, Cablevision, and MLB.
Jobs (see also All Jobs )
- NPD: Head of Global Validation & Input - Mar 20, 2015.
Initial focus on standardizing and developing efficient processes to validate data from input through client delivery.
- NPD: Head of Global Data Classification (Data Scientist) - Mar 20, 2015.
Run our Global Data Classification organization with an initial focus on standardizing processes and developing productivity measurements.
- Brandman University: Assistant Director of Marketing Analytics - Mar 19, 2015.
Building and maintaining the intelligence infrastructure as well as predictive analytics behind all marketing campaigns.
- Blue Apron: Director of Business Intelligence - Mar 16, 2015.
Lead the data team in New York City. Since launching in 2012, the business has grown dramatically to deliver over million meals nationwide each month.
- D-Wave Systems (Quantum Computing): Machine Learning Researcher - Mar 14, 2015.
The science fiction future is here. Help design and implement novel machine learning and deep learning algorithms that leverage the power of the D-Wave quantum computer.
Publications
- White House report on Big Data and Differential Pricing - Mar 14, 2015.
White House report examines how companies are using big data and analytics to charge different prices to different customers (price discrimination), looks at both benefits and risks, and concludes that many concerns can be addressed by existing anti-discrimination and consumer protection laws.
- Top Big Data influencers of 2014, according to HadoopSphere - Mar 13, 2015.
Top big data influencers of 2014 include analysts Mike Gualtieri and Curt Monash, IBM and TDWI media, Spark and Scala products, Ben Lorica @bigdata and Gregory Piatetsky @kdnuggets on social media, Data Collective and AngelList co-founder.
Top Tweets (see also All top tweets for this month )
- Top KDnuggets tweets, Mar 19-22 - Mar 23, 2015.
Tensor methods for #MachineLearning: fast, accurate, scalable, need open-source libs; #DataScience and Reproducibility: Explaining when the experiment does not work; Google #DeepLearning FaceNet is the best ever for recognizing faces; Tibco survey #BigData top use cases: Customer & Experience Analytics, Risk/Threat.
- Top KDnuggets tweets, Mar 16-18 - Mar 19, 2015.
Also Sirius - a free, open-source version of Siri;
#PI art: the first 13,689 digits of pi;
Great tutorial + #Python code: 1-Layer Neural Networks.
- Top KDnuggets tweets, Mar 12-15 - Mar 16, 2015.
Cartoon: top challenge for US Chief Data Scientist DJ Patil; In-depth intro to #MachineLearning, #Statistics, R: 15 hours of videos; Amazing! Forget coding word meaning, grammar, syntax - now #DeepLearning can learn everything.
- Top KDnuggets tweets, Mar 09-11 - Mar 12, 2015.
Comprehensive learning path from noob to Kaggler in Python;
10 steps for success in Kaggle competitions;
Machine learning packages #Python #Java #BigData #Lua #Clojure #Scala, R;
Very useful LeaRning Path on R - Step by Step Guide.
CFP - Calls for Papers (see also All Calls for Papers )
- Due Mar 29, Contest proposals, IEEE ICDM 2015 Int. Conf. on Data Mining, Atlantic City, NJ, USA. Nov 14-17
- Due Apr 17, Proposals for WSDM Cup - WSDM 2016, Ninth ACM Int. Conf. on Web Search and Data Mining, San Francisco Bay Area, CA, USA. February 2016
- Due Apr 30, 2015 SIGKDD Doctoral Dissertation Award, Presented at KDD 2015: Sydney, Australia. Aug 10-13, 2015
- Due May 1, Workshop on Law and Big Data: Empirical and Data-Centric Techniques for Legal, Judicial, and Administrative Systems, at 15th Int. Conf. on Artificial Intelligence and Law (ICAIL 2015), San Diego, CA, USA. Jun 12, 2015
- Due May 8, The Fourteenth Int. Symposium on Intelligent Data Analysis (IDA 2015), Saint-Etienne, France. Oct 22-24, 2015.
- Due May 8, Horizon presentations for early-stage research of potentially ground-breaking nature, at IDA 2015, Saint-Etienne, France. Oct 22-24, 2015.
- Due May 18, IEEE 2015 Int. Conf. on Data Science and Advanced Analytics (DSAA'2015), Paris, France. 19-21 Oct 2015.
- Due Jun 3, Papers, IEEE ICDM 2015 Int. Conf. on Data Mining, Atlantic City, NJ, USA. Nov 14-17
- Due Jul 1, 2015 IEEE Int. Conf. on Big Data (IEEE Big Data 2015), Santa Clara, CA, USA. Oct 29 - Nov 1, 2015
- Due Jul 1, 4th Int. Conf. on Big Data Analytics, Hyderabad, India. Dec 15-18, 2015
- Due Jul 3, Demos, IEEE ICDM 2015 Int. Conf. on Data Mining, Atlantic City, NJ, USA. Nov 14-17
- Due Aug 1, Tutorial Proposals, IEEE ICDM 2015 Int. Conf. on Data Mining, Atlantic City, NJ, USA. Nov 14-17, 2015
Quote
This is you, these are bureaucratic and legal obstacles, and that is the government data you want. Welcome aboard, US Chief Data Scientist DJ Patil.
Cartoon: US Chief Data Scientist Most Difficult Challenge