- Data for Democracy: The First Two Months of D4D - Feb 20, 2017.
Hear how the Data for Democracy organisation uses Data Science for democracy and the well-being of human societies.
- More Data or Better Algorithms: The Sweet Spot - Jan 17, 2017.
We examine the sweet spot for data-driven Machine Learning companies, where it is neither too easy nor too hard to collect the needed data.
- Data Sources for Cool Data Science Projects - Dec 20, 2016.
One of the biggest obstacles to successful projects has been getting access to interesting data. Here are some more cool public data sources you can use for your next project.
- Largest Dataset Analyzed Poll shows surprising stability, more junior Data Scientists - Nov 8, 2016.
The majority (57%) of respondents worked only with Gigabyte-range data. More junior Data Scientists enter the market, but Petabyte Big Data Scientists still stand apart.
- What is Academic Torrents and Where is Data Sharing Going? - Oct 26, 2016.
Learn more about Academic Torrents, a platform for researchers to share data. It consists of a site where users can search for datasets and a BitTorrent backbone that makes sharing data scalable and fast.
- New Poll: What was the largest dataset you analyzed / data mined? - Oct 22, 2016.
The new KDnuggets Poll asks: What was the largest dataset you analyzed / data mined? Please vote.
- Data Science Basics: 3 Insights for Beginners - Sep 22, 2016.
For data science beginners, 3 elementary issues are given overview treatment: supervised vs. unsupervised learning, decision tree pruning, and training vs. testing datasets.
- 10 Data Acquisition Strategies for Startups - Jun 14, 2016.
An interesting discussion of the myriad ways in which startups may choose to acquire data, often the most overlooked and important aspect of a startup's success (or failure).
- Top KDnuggets tweets, May 25-31: 19 Free eBooks to learn #programming with #Python; Awesome collection of public datasets on Github - Jun 1, 2016.
Introducing Hybrid lda2vec Algorithm via Stitch Fix; #DeepLearning and Deep #Gaussian Processes - explainer; Awesome collection of public #datasets on Github; #DataScience foundations: 19 Free eBooks to learn #programming with #Python.
- Top 10 Open Dataset Resources on Github - May 31, 2016.
The top open dataset repositories on Github include a variety of data, freely available for use by researchers, practitioners, and students alike.
- Datasets Over Algorithms - May 3, 2016.
The average elapsed time between key algorithm proposals and corresponding advances is about 18 years; the average elapsed time between key dataset availabilities and corresponding advances is less than 3 years, 6 times faster.
- CrowdSignals.io, Building Big Mobile Social Sensor dataset - Mar 25, 2016.
CrowdSignals.io is a crowdfunding campaign to generate the largest mobile and sensor dataset available to the Data Science community for use in research and product development.
- Interconnecting World Open Data Portals, Mar 8 Webinar - Feb 24, 2016.
Join OpenDataSoft for a web conference to contribute to building the next evolution of the List of 1600 Open Data portals worldwide, dubbed Open Data Inception by its creators.
- 9 Must-Have Datasets for Investigating Recommender Systems - Feb 11, 2016.
Gain some insight into a variety of useful datasets for recommender systems, including data descriptions, appropriate uses, and some practical comparison.
- Tour of Real-World Machine Learning Problems - Dec 26, 2015.
The tour lists 20 interesting real-world machine learning problems for data science enthusiasts to learn by solving.
- Poll Results: Where is Big Data? For most, Largest Dataset Analyzed is in laptop-size GB range - Aug 18, 2015.
A majority of data scientists (56%) work in the Gigabyte dataset range. We note a small increase in Petabyte (web-scale) data miners, and a decline in Megabyte data miners. The US, Australia/NZ, and Asia lead in percentage of Terabyte and Petabyte analysts.
- Interview: Andrew Duguay, Prevedere on Economic Intelligence from Integrating Public Datasets - Jul 30, 2015.
We discuss Analytics at Prevedere Software, understanding the impact of external factors on a company’s performance, features of the in-memory correlation engine, and economic intelligence by Prevedere.
- Additions to KDnuggets Directory in April - May 3, 2015.
20+ new meetings, including Smartcon (Istanbul), Collab. Data Science, Boston Data Festival, SIGMOD 2016, ICDM 2016; Awesome public datasets; DecisionIQ, VisualText and more.
- KDnuggets™ News 15:n11, Apr 15: Big Data Predictive Analytics Gainers & Losers; Awesome Public Datasets - Apr 15, 2015.
Awesome Public Datasets on GitHub; Gold Mine or Blind Alley? Functional Programming for Machine Learning; Inside Deep Learning - Convolutional networks; KDnuggets Free Pass to Strata Hadoop World London.
- Top /r/MachineLearning Posts, Mar 29-Apr 4: Andrew Ng AMA, Deep Learning for NLP, and OpenCL Convnets - Apr 10, 2015.
Andrew Ng's upcoming AMA, scikit-learn updates, Richard Socher's Deep Learning NLP videos, Criteo's huge new dataset, and convolutional neural networks on OpenCL are the top topics discussed this week on /r/MachineLearning.
- Awesome Public Datasets on GitHub - Apr 6, 2015.
A long, categorized list of large datasets (available for public use) to try your analytics skills on. Which one would you pick?
- Interview: Anthony Bak, Ayasdi on Novel Insights using Topological Summaries - Jan 29, 2015.
We discuss examples of Topological Data Analysis (TDA) revealing new insights, the recommended approach for creating Topological Summaries, manual vs. automated approaches, and trends.
- Top /r/MachineLearning posts, Jan 11-17 - Jan 18, 2015.
SVMs, open source datasets, Bayesian decision theory, game AI, and deep learning visualizations are all featured in the past week's top /r/MachineLearning posts.
- SBP15 Grand Data Challenge - Dec 5, 2014.
Use social media analytics on public data to help analyze and explore social inequality and aid the disadvantaged in SBP15 Grand Data Challenge. Submissions due Jan 20.
- Free Urban Data – What’s It Good For? - Nov 1, 2014.
In this Big Data article, see how the growing number of free urban datasets, released as more cities join free data programs, can be applied to solve interesting problems.
- TweetNLP: Twitter Natural Language Processing - Oct 24, 2014.
A short overview of Natural Language Processing tools and utilities developed by Prof. Noah Smith, CMU and his team to analyze Twitter data.
- Top KDnuggets tweets, Oct 17-19: Air traffic analyzed to predict Ebola spread; Cool public data for data science - Oct 20, 2014.
Air traffic data analyzed to predict Ebola spread; Some cool public data sources you can use for your next data science project; Data science can't be point and click! Finding random correlation is too easy; Bayes Rule in an animated gif.
- Interactive Network and Graph Data Repository - Oct 17, 2014.
The network repository currently hosts over 500 graphs/networks that span 19 collections of graphs from social science, machine learning, scientific computing, and many others.
- MOOC: “Process Mining: Data science in Action” - Sep 10, 2014.
This 6-week online course provides data science knowledge that can be applied directly to analyze and improve processes in a variety of domains.
- Top KDnuggets tweets, Aug 13-14: Boyfriend as a statistically “significant” other - Aug 15, 2014.
xkcd: Boyfriend as a statistically "significant" other; Interesting Social Media Datasets; Sibyl: a System for Large Scale Machine Learning at Google; We don't need such hype: "Big Data scientists get 100 recruiter emails a day".
- Interesting Social Media Datasets - Aug 13, 2014.
Learn about some of the many interesting social media datasets available to you, some of which are quite new, and the different features and challenges they offer you for your next big data science project.
- Top KDnuggets tweets, May 30 – Jun 1: Guide to Setting Up an R-Hadoop ; 100+ Interesting Data Sets - Jun 2, 2014.
Tutorial: Step-by-Step Guide to Setting Up an R - #Hadoop System; 100+ Interesting Data Sets for Statistics (and Data Science); #BigData sets available for free - big list from Data Science Central; Twitter to release all tweets to scientists - a research boon and an ethical dilemma.
- US Open Data Action Plan and Datasets - May 31, 2014.
We summarize the key findings in the recently released US Open Data Action Plan, highlighting the principles, commitments, datasets released and future outlook.
- Top KDnuggets tweets, Mar 21-23: Machine Learning in Parallel with SVM; Good Data Sets for Data Science Practice - Mar 24, 2014.
Machine Learning in Parallel with SVM, GLM; Good Data Sets for Data Science Practice: Big enough, requires data engineering, rich; Cartoon: Why Madame Zaza, Fortune Teller, changes to Predictive Analytics; Top 45 #BigData Tools and Platforms for Developers.