- Dealing with Data Leakage - Oct 8, 2021.
Target leakage and data leakage represent challenging problems in machine learning. Be prepared to recognize and avoid these potentially messy problems.
Cross-validation, Data Science, Datasets, Machine Learning, Modeling, Training Data
- KDnuggets™ News 21:n38, Oct 6: Build a Strong Data Science Portfolio; Surpassing Trillion Parameters with Switch Transformers — a path to AGI? - Oct 6, 2021.
How to Build Strong Data Science Portfolio as a Beginner; Surpassing Trillion Parameters and GPT-3 with Switch Transformers — a path to AGI?; How Deep Is That Data Lake?; Data Science Process Lifecycle; Use These Unique Data Sets to Sharpen Your Data Science Skills; How to Auto-Detect the Date/Datetime Columns and Set Their Datatype When Reading a CSV File in Pandas
AGI, AI, Data Processing, Data Science, Data Science Process, Datasets, Pandas, Portfolio, Transformer
- Use These Unique Data Sets to Sharpen Your Data Science Skills - Sep 29, 2021.
Want to get your hands on some real-world data sets right now? Kick off your bootcamp prep with this list of hot-button data sets curated to help you hone different data science skills.
Data Science Skills, Datasets
- Don’t Touch a Dataset Without Asking These 10 Questions - Sep 20, 2021.
Selecting the right dataset is critical for the success of your AI project.
Datasets, Distribution, Outliers, Privacy, Standardization
- 3 Data Acquisition, Annotation, and Augmentation Tools - Aug 27, 2021.
Check out these 3 projects found around GitHub that can help with your data acquisition, annotation, and augmentation tasks.
Computer Vision, Data Annotation, Data Labeling, Datasets, GitHub, NLP, Synthetic Data
- KDnuggets™ News 21:n32, Aug 25: Open Source Datasets for Computer Vision; Django’s 9 Most Common Applications - Aug 25, 2021.
Open Source Datasets for Computer Vision; Django’s 9 Most Common Applications; How to Select an Initial Model for your Data Science Problem; Automate Microsoft Excel and Word Using Python; Stack Overflow Survey Data Science Highlights
Computer Vision, Datasets, Django, Microsoft, Modeling, Open Source, Python, StackOverflow
- Open Source Datasets for Computer Vision - Aug 18, 2021.
Access to high-quality, noise-free, large-scale datasets is crucial for training complex deep neural network models for computer vision applications. Many open-source datasets are developed for use in image classification, pose estimation, image captioning, autonomous driving, and object segmentation. These datasets must be paired with the appropriate hardware and benchmarking strategies to optimize performance.
Computer Vision, Datasets, Open Source
- eBook: How to use third-party data to make smarter decisions - Jul 7, 2021.
Get yourself a copy of this eBook and learn how to use third-party data to make smarter decisions.
AWS, Datasets, ebook
- Using External Data to Accelerate Business in a Post-Vaccinated World - Jun 21, 2021.
Join this webinar, Jun 24, 2021, to learn how companies are developing insights to better prepare for growth opportunities, improve business performance and mitigate risk in a post-pandemic economy.
AWS, Business, Datasets, Webinar
- The Data Matters: Choosing the right data to analyze can make or break your analysis - Jun 15, 2021.
We started Nomad Data to help data scientists and business analysts quickly find the right commercial datasets to match their specific use case. We catalog use cases of data and use machine learning and AI to match analysis goals with datasets.
Consumer Analytics, Datasets, Geospatial
- 9 Deadly Sins of Machine Learning Dataset Selection - Jun 11, 2021.
Avoid endless pain in model debugging by focusing on datasets upfront.
Datasets, Machine Learning
- Great New Resource for Natural Language Processing Research and Applications - May 27, 2021.
The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.
Datasets, NLP, Research
- Awesome list of datasets in 100+ categories - May 20, 2021.
With an estimated 44 zettabytes of data in existence in our digital world today and approximately 2.5 quintillion bytes of new data generated daily, there is a lot of data out there you could tap into for your data science projects. It's pretty hard to curate through such a massive universe of data, but this collection is a great start. Here, you can find data from cancer genomes to UFO reports, as well as years of air quality data to 200,000 jokes. Dive into this ocean of data to explore as you learn how to apply data science techniques or leverage your expertise to discover something new.
Big Data, Data Science, Datasets
- Introducing The NLP Index - Apr 29, 2021.
The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.
Datasets, NLP, Research
- Great News for KDnuggets subscribers! You now have access to the WorldData.AI Partners Plan at no cost - Mar 29, 2021.
Great News for KDnuggets subscribers! You now have access to the WorldData.AI Partners Plan at no cost, including access to some of the premium datasets only available to enterprise members. Connect your data to many of the 3.5 billion WorldData datasets and improve your Data Science and Machine Learning models! Subscribe to KDnuggets to get access.
About KDnuggets, Datasets, Geospatial, Sentiment Analysis, WorldData.AI
- 8 Places for Data Professionals to Find Datasets - Dec 17, 2020.
Here is a curated list of sites and resources invaluable for data professionals to acquire practice datasets.
Data Science, Datasets, Google, Government, Kaggle, Reddit, UCI
- Top Google AI, Machine Learning Tools for Everyone - Aug 18, 2020.
Google is much more than a search company. Learn about all the tools they are developing to help turn your ideas into reality through Google AI.
AI, AutoML, Bias, Data Science Platforms, Datasets, Google, Google Cloud, Google Colab, Machine Learning, TensorFlow
- The List of Top 10 Lists in Data Science - Aug 14, 2020.
The list of Top 10 lists that Data Scientists -- from enthusiasts to those who want to jump start a career -- must know to smoothly navigate a path through this field.
Algorithms, Data Science, Data Science Skills, Datasets, Influencers, LinkedIn, Python, Top 10
- New Poll: What was the largest dataset you analyzed / data mined? - Jun 9, 2020.
Take part in KDnuggets' latest survey to have your voice heard, and let the community know the size of the largest dataset you have worked with.
Big Data, Datasets, Largest, Poll
- Dataset Splitting Best Practices in Python - May 26, 2020.
If you are splitting your dataset into training and testing data, you need to keep some things in mind. This discussion of 3 best practices for doing so includes a demonstration of how to implement them in Python.
Datasets, Python, scikit-learn, Training Data, Validation
- Data context and how to get started with understanding COVID-19 data - Apr 22, 2020.
If you are already applying your Data Science skills or getting ready to contribute to analyzing COVID-19 data, then be sure to take sufficient time to appreciate the context of the numbers to focus on what's most important as we collaborate on this global battle.
Coronavirus, COVID-19, Data.world, Datasets
- 3 Best Sites to Find Datasets for your Data Science Projects - Apr 9, 2020.
When first learning data science, you will inevitably find yourself looking for more datasets to practice with. Here, we recommend the 3 best sites to find datasets to spark your next data science project.
Coronavirus, Data, Data Science, Datasets, Kaggle
- 10 Must-read Machine Learning Articles (March 2020) - Apr 9, 2020.
This list will feature some of the recent work and discoveries happening in machine learning, as well as guides and resources for both beginner and intermediate data scientists.
AI, API, Cloud, Data Analytics, Datasets, fast.ai, Machine Learning, Neural Networks, Social Media
- 21 Machine Learning Projects – Datasets Included - Mar 9, 2020.
Upgrading your machine learning, AI, and Data Science skills requires practice. To practice, you need to develop models with a large amount of data. Finding good datasets to work with can be challenging, so this article discusses more than 20 great datasets along with machine learning project ideas for you to tackle today.
Chatbot, Datasets, Google Trends, Machine Learning, Project, Uber
- The Big Bad NLP Database: Access Nearly 300 Datasets - Feb 28, 2020.
Check out this database of nearly 300 freely-accessible NLP datasets, curated from around the internet.
Datasets, NLP, Text Mining
- Passive Data Collection and Actionable Results: What to Know - Feb 21, 2020.
There are plenty of ways to get actionable results by using passive data. However, such an outcome will not happen without careful forethought. Data analysts must consider several crucial specifics, including what questions they want and expect the information to answer, and how they'll apply the findings to aid the business.
Analytics, Customer Analytics, Data Curation, Datasets
- Google Dataset Search Provides Access to 25 Million Datasets - Jan 29, 2020.
Google's dataset search is out of beta, and provides centralized access to 25 million datasets.
Data Science, Datasets, Google, Search
- The 5 Most Useful Techniques to Handle Imbalanced Datasets - Jan 22, 2020.
This post explains the various techniques you can use to handle imbalanced datasets.
Balancing Classes, Datasets, Metrics, Python, Sampling, Unbalanced
- What is Data Catalog and Why You Should Care? - Dec 23, 2019.
Learn why data catalogs could be just the thing you need to meet the challenges of data and metadata management and collaboration.
Compliance, Consistency, Data Catalog, Data Governance, Datasets, Metadata, Reddit
- Data Sources 101 - Oct 28, 2019.
Data collection is one of the first steps of the data lifecycle — you need to get all the data you require in the first place. To collect the right data, you need to know where to find it and determine the effort involved in collecting it. This article answers the most basic question: where does all the data you need (or might need) come from?
Big Data, Data Science, Datasets, Unstructured data
- Know Your Data: Part 2 - Oct 8, 2019.
To build an effective learning model, you must understand the quality issues that exist in data and how to detect and deal with them. In general, data quality issues are categorized into four major sets.
Beginners, Data Preparation, Data Preprocessing, Datasets
- Training a Machine Learning Engineer - Oct 3, 2019.
There is no clear outline on how to study Machine Learning/Deep Learning, so many individuals apply every algorithm they have heard of and hope that one of the implemented algorithms works for the problem at hand. Below, I've listed some of the steps that one should adopt while solving a machine learning problem.
Architecture, Datasets, Machine Learning, Machine Learning Engineer
- Know Your Data: Part 1 - Sep 30, 2019.
This article will introduce the different types of data sets, data objects, and attributes.
Beginners, Datasets
- Version Control for Data Science: Tracking Machine Learning Models and Datasets - Sep 13, 2019.
I am a Git god, why do I need another version control system for Machine Learning Projects?
Data Science, Datasets, Machine Learning, Modeling, Version Control
- How to Automate Tasks on GitHub With Machine Learning for Fun and Profit - May 3, 2019.
Check out this tutorial on how to build a GitHub App that predicts and applies issue labels using TensorFlow and public datasets.
Datasets, GitHub, Python, TensorFlow
- Synthetic Data Generation: A must-have skill for new data scientists - Dec 27, 2018.
A brief rundown of methods/packages/ideas to generate synthetic data for self-driven data science projects and deep diving into machine learning methods.
Classification, Clustering, Datasets, Machine Learning, Python, Synthetic Data
- Handling Imbalanced Datasets in Deep Learning - Dec 4, 2018.
It’s important to understand why we should do it so that we can be sure it’s a valuable investment. Class balancing techniques are only really necessary when we actually care about the minority classes.
Balancing Classes, Datasets, Deep Learning, Keras, Python
- Machine Learning Classification: A Dataset-based Pictorial - Nov 5, 2018.
In order to relate machine learning classification to the practical, let's see how this concept plays out, step by step (and with images), specifically in direct relation to a dataset.
Datasets, Machine Learning, Supervised Learning
- New Poll: What was the largest dataset you analyzed / data mined? - Oct 12, 2018.
New KDnuggets Poll is asking: What was the largest dataset you analyzed / data mined? Please vote and we will analyze the trends and publish the results.
Big Data, Datasets, Largest, Poll
- Semantic Interoperability: Are you training your AI by mixing data sources that look the same but aren’t? - Oct 9, 2018.
Semantic interoperability is a challenge in AI systems, especially since data has become increasingly complex. The other issue is that semantic interoperability may be compromised when people use the same system differently.
AI, Datasets, Healthcare, Semantic Analysis
- Introducing VisualData: A Search Engine for Computer Vision Datasets - Sep 26, 2018.
Instead of building your own dataset, you can draw on a rich collection of computer vision datasets contributed by academic researchers, hobbyists, and companies.
Computer Vision, Datasets
- Announcing Microsoft Research Open Data, a cloud hosted platform for sharing datasets - Jun 28, 2018.
Microsoft announces Microsoft Research Open Data, datasets representing many years of data curation and research efforts by Microsoft that were published as research outcomes.
Datasets, Microsoft, Microsoft Research, Research
- How (dis)similar are my train and test data? - Jun 7, 2018.
This article examines a scenario where your machine learning model can fail.
Data Science, Datasets, Feature Selection, Machine Learning, Training Data
- Human Involvement Helps Researchers Perfect New Algorithms to Train Robots - Mar 22, 2018.
Many underestimate the role of humans in the successful deployment of AI solutions. The Alegion engine produces AI training data and enables content moderation, sentiment analysis, data enrichment, tagging, categorization, and more.
AI, Alegion, Datasets, Humans vs Machines
- Training Sets, Test Sets, and 10-fold Cross-validation - Jan 9, 2018.
More generally, in evaluating any data mining algorithm, if our test set is a subset of our training data the results will be optimistic and often overly optimistic. So that doesn’t seem like a great idea.
Cross-validation, Data Mining, Datasets, Machine Learning
- 70 Amazing Free Data Sources You Should Know - Dec 20, 2017.
70 free data sources for 2017 on government, crime, health, financial and economic data, marketing and social media, journalism and media, real estate, company directory and review, and more to start working on your data projects.
Big Data, Business, Crime, Datasets, Finance, Government, Health, Journalism, Octoparse, Social Media
- How (and Why) to Create a Good Validation Set - Nov 24, 2017.
The definitions of training, validation, and test sets can be fairly nuanced, and the terms are sometimes inconsistently used. In the deep learning community, “test-time inference” is often used to refer to evaluating on data in production, which is not the technical definition of a test set.
Cross-validation, Datasets, Rachel Thomas, Training Data, Validation
- Building a Wikipedia Text Corpus for Natural Language Processing - Nov 23, 2017.
Wikipedia is a rich source of well-organized textual data, and a vast collection of knowledge. What we will do here is build a corpus from the set of English Wikipedia articles, which is freely and conveniently available online.
Datasets, Natural Language Processing, NLP, Text Mining, Wikidata, Wikipedia
- 5 Machine Learning Projects You Can No Longer Overlook – Episode VI - Sep 20, 2017.
Deep learning, data preparation, data visualization, oh my! Check out the latest installment of '5 Machine Learning Projects You Can No Longer Overlook' for insight on... well, what machine learning projects you can no longer overlook.
Data Visualization, Datasets, Deep Learning, Javascript, Machine Learning, Netflix, Overlook, Python, Spark
- The new Enigma Public – the platform connecting people to data - Sep 11, 2017.
Public data has tremendous potential, and different people can use it to solve a variety of problems. Enigma relaunches Enigma Public, the platform connecting people to data.
Datasets, Government, Healthcare, Social Good
- Interesting Things Learned as a Student of Machine Learning - Jun 29, 2017.
Did you ever learn something you didn't really want to? The path to machine learning mastery is paved with such collateral knowledge. Here are a few examples of such information I have gleaned while trekking away.
Datasets, Humor, Machine Learning
- Data for Democracy: The First Two Months of D4D - Feb 20, 2017.
Let’s hear how the Data for Democracy organisation uses data science for democracy and the well-being of human societies.
Datasets, Elections, Healthcare, Politics
- More Data or Better Algorithms: The Sweet Spot - Jan 17, 2017.
We examine the sweet spot for data-driven Machine Learning companies, where it is neither too easy nor too hard to collect the needed data.
Algorithms, Big Data, Data, Datasets, Machine Learning
- Data Sources for Cool Data Science Projects - Dec 20, 2016.
One of the biggest obstacles to successful projects has been getting access to interesting data. Here are some more cool public data sources you can use for your next project.
Data Incubator, Datasets, Elections, Healthcare, Michael Li
- Largest Dataset Analyzed Poll shows surprising stability, more junior Data Scientists - Nov 8, 2016.
The majority (57%) of respondents only worked with Gigabyte range data. More junior Data Scientists enter the market, but Petabyte Big Data Scientists still stand apart.
Asia, Big Data, Datasets, Europe, Largest, Poll, USA
- What is Academic Torrents and Where is Data Sharing Going? - Oct 26, 2016.
Learn more about Academic Torrents, a platform for researchers to share data consisting of a site where users can search for datasets, and a BitTorrent backbone which makes sharing data scalable and fast.
Datasets, Reproducibility, Research
- New Poll: What was the largest dataset you analyzed / data mined? - Oct 22, 2016.
New KDnuggets Poll is asking: What was the largest dataset you analyzed / data mined? Please vote
Big Data, Datasets, Largest, Poll
- Data Science Basics: 3 Insights for Beginners - Sep 22, 2016.
For data science beginners, 3 elementary issues are given overview treatment: supervised vs. unsupervised learning, decision tree pruning, and training vs. testing datasets.
Algorithms, Beginners, Datasets, Overfitting, Supervised Learning, Unsupervised Learning
- 10 Data Acquisition Strategies for Startups - Jun 14, 2016.
An interesting discussion of the myriad methods in which startups may choose to acquire data, often the most overlooked and important aspect of a startup's success (or failure).
Acquisitions, Crowdsourcing, Datasets, Startups
- Top KDnuggets tweets, May 25-31: 19 Free eBooks to learn #programming with #Python; Awesome collection of public datasets on Github - Jun 1, 2016.
Introducing Hybrid lda2vec Algorithm via Stitch Fix; #DeepLearning and Deep #Gaussian Processes - explainer; Awesome collection of public #datasets on Github; #DataScience foundations: 19 Free eBooks to learn #programming with #Python.
Datasets, Free ebook, GitHub, Python, Top tweets
- Top 10 Open Dataset Resources on Github - May 31, 2016.
The top open dataset repositories on Github include a variety of data, freely available for use by researchers, practitioners, and students alike.
Datasets, GitHub, Machine Learning, Open Data
- Datasets Over Algorithms - May 3, 2016.
The average elapsed time between key algorithm proposals and corresponding advances is about 18 years; the average elapsed time between key dataset availabilities and corresponding advances is less than 3 years, 6 times faster.
Algorithms, Datasets
- CrowdSignals.io, Building Big Mobile Social Sensor dataset - Mar 25, 2016.
CrowdSignals.io is a crowdfunding campaign to generate the largest mobile and sensor dataset available to the Data Science community for use in research and product development.
Big Data, Crowdsourcing, Datasets, IoT, Mobile, Sensors
- Interconnecting World Open Data Portals, Mar 8 Webinar - Feb 24, 2016.
Join OpenDataSoft for a web conference to contribute to building the next evolution of the List of 1600 Open Data portals worldwide, dubbed Open Data Inception by its creators.
Datasets, Open Data, Webinar
- 9 Must-Have Datasets for Investigating Recommender Systems - Feb 11, 2016.
Gain some insight into a variety of useful datasets for recommender systems, including data descriptions, appropriate uses, and some practical comparisons.
Datasets, Lab41, Recommender Systems
- Tour of Real-World Machine Learning Problems - Dec 26, 2015.
The tour lists 20 interesting real-world machine learning problems for data science enthusiasts to learn by solving.
Datasets, Kaggle, Learning from Data, Machine Learning, Research, UCI
- Poll Results: Where is Big Data? For most, Largest Dataset Analyzed is in laptop-size GB range - Aug 18, 2015.
A majority of data scientists (56%) work in Gigabyte dataset range. We note a small increase in Petabyte (web-scale) data miners, and a decline in Megabyte data miners. US, Australia/NZ, and Asia lead in percentage of Terabyte and Petabyte analysts.
Asia, Australia, Big Data, Datasets, Europe, Largest, Poll, USA
- Interview: Andrew Duguay, Prevedere on Economic Intelligence from Integrating Public Datasets - Jul 30, 2015.
We discuss Analytics at Prevedere Software, understanding the impact of external factors on a company’s performance, features of the in-memory correlation engine, and economic intelligence by Prevedere.
Andrew Duguay, Datasets, Economics, In-Memory Computing, Interview, Performance, Prevedere, Use Cases
- Additions to KDnuggets Directory in April - May 3, 2015.
20+ new meetings, including Smartcon (Istanbul), Collab. Data Science, Boston Data Festival, SIGMOD 2016, ICDM 2016; Awesome public datasets; DecisionIQ, VisualText and more.
CA, Datasets, San Francisco
- KDnuggets™ News 15:n11, Apr 15: Big Data Predictive Analytics Gainers & Losers; Awesome Public Datasets - Apr 15, 2015.
Awesome Public Datasets on GitHub; Gold Mine or Blind Alley? Functional Programming for Machine Learning; Inside Deep Learning - Convolutional networks; KDnuggets Free Pass to Strata Hadoop World London.
Crowdsourcing, Datasets, Deep Learning, Free Pass, Marketplace, Neural Networks, Predictive Analytics, Strata
- Top /r/MachineLearning Posts, Mar 29-Apr 4: Andrew Ng AMA, Deep Learning for NLP, and OpenCL Convnets - Apr 10, 2015.
Andrew Ng's upcoming AMA, scikit-learn updates, Richard Socher's Deep Learning NLP videos, Criteo's huge new dataset, and convolutional neural networks on OpenCL are the top topics discussed this week on /r/MachineLearning.
Andrew Ng, Convolutional Neural Networks, Datasets, Deep Learning, NLP, Python, Reddit, scikit-learn
- Awesome Public Datasets on GitHub - Apr 6, 2015.
A long, categorized list of large datasets (available for public use) to try your analytics skills on. Which one would you pick?
Datasets, Finance, GitHub, Government, Machine Learning, NLP, Open Data, Time series data
- Interview: Anthony Bak, Ayasdi on Novel Insights using Topological Summaries - Jan 29, 2015.
We discuss examples of Topological Data Analysis (TDA) revealing new insights, recommended approach for creating Topological Summaries, Manual vs Automation approach and trends.
Anthony Bak, Automating, Ayasdi, Datasets, Success, TDA, Topological Data Analysis, Topology, Trends
- Top /r/MachineLearning posts, Jan 11-17 - Jan 18, 2015.
SVMs, open source datasets, Bayesian decision theory, game AI, and deep learning visualizations are all featured in the past week's top /r/MachineLearning posts.
AI, Bayesian, Datasets, Deep Learning, Games, Grant Marshall, Machine Learning, Open Source, Reddit, SVM, Visualization
- SBP15 Grand Data Challenge - Dec 5, 2014.
Use social media analytics on public data to help analyze and explore social inequality and aid the disadvantaged in SBP15 Grand Data Challenge. Submissions due Jan 20.
Challenge, Conference, Datasets, Social Media
- Free Urban Data – What’s It Good For? - Nov 1, 2014.
See how the increasing availability of free urban datasets that has come with more cities participating in free data programs can be applied to solve interesting problems in this Big Data article.
Big Data Journal, Datasets, Infrastructure, Location Analytics, Smart City
- TweetNLP: Twitter Natural Language Processing - Oct 24, 2014.
A short overview of Natural Language Processing tools and utilities developed by Prof. Noah Smith, CMU and his team to analyze Twitter data.
Advanced Analytics, ARK, CMU, Datasets, NLP, Speech, Tools, Twitter
- Top KDnuggets tweets, Oct 17-19: Air traffic analyzed to predict Ebola spread; Cool public data for data science - Oct 20, 2014.
Air traffic data analyzed to predict Ebola spread; Some cool public data sources you can use for your next data science project; Data science can't be point and click! Finding random correlation is too easy; Bayes Rule in an animated gif.
Bayes Rule, Data Science, Datasets, Ebola, Overfitting
- Interactive Network and Graph Data Repository - Oct 17, 2014.
The network repository currently hosts 500+ graphs/networks that span 19 collections of graphs from social science, machine learning, scientific computing, and many others.
Datasets, Graph Analytics, Graph Visualization, Network Graph
- MOOC: “Process Mining: Data science in Action” - Sep 10, 2014.
This 6-week online course provides data science knowledge that can be applied directly to analyze and improve processes in a variety of domains.
Coursera, Data Science Course, Datasets, MOOC, Process Mining
- Top KDnuggets tweets, Aug 13-14: Boyfriend as a statistically “significant” other - Aug 15, 2014.
xkcd: Boyfriend as a statistically "significant" other; Interesting Social Media Datasets; Sibyl: a System for Large Scale Machine Learning at Google; We don't need such hype: "Big Data scientists get 100 recruiter emails a day".
Cartoon, Data Scientist, Datasets, Google, Machine Learning, Sibyl, Social Media, xkcd
- Interesting Social Media Datasets - Aug 13, 2014.
Learn about some of the many interesting social media datasets available to you, some of which are quite new, and the different features and challenges they offer you for your next big data science project.
Challenge, Data Visualization, Datasets, Open Data, Social Media Analytics
- Top KDnuggets tweets, May 30 – Jun 1: Guide to Setting Up an R-Hadoop; 100+ Interesting Data Sets - Jun 2, 2014.
Tutorial: Step-by-Step Guide to Setting Up an R - #Hadoop System; 100+ Interesting Data Sets for Statistics (and Data Science); #BigData sets available for free - big list from Data Science Central; Twitter to release all tweets to scientists - a research boon and an ethical dilemma.
Datasets, Hadoop, R, Twitter
- US Open Data Action Plan and Datasets - May 31, 2014.
We summarize the key findings in the recently released US Open Data Action Plan, highlighting the principles, commitments, datasets released and future outlook.
Datasets, Government, Open Data, Social Participation, White House
- Top KDnuggets tweets, Mar 21-23: Machine Learning in Parallel with SVM; Good Data Sets for Data Science Practice - Mar 24, 2014.
Machine Learning in Parallel with SVM, GLM; Good Data Sets for Data Science Practice: Big enough, requires data engineering, rich; Cartoon: Why Madame Zaza, Fortune Teller, changes to Predictive Analytics; Top 45 #BigData Tools and Platforms for Developers
Cartoon, Data Science Platform, Datasets, Machine Learning, Platform, Support Vector Machines, Tools