Search results for metadata

Found 237 documents, 5949 searched:

Containerization of PySpark Using Kubernetes
This article demonstrates how to use Spark on Kubernetes. It also includes a brief comparison of the various cluster managers available for Spark.
https://www.kdnuggets.com/2020/08/containerization-pyspark-kubernetes.html
A Tour of End-to-End Machine Learning Platforms
An end-to-end machine learning platform needs a holistic approach. If you’re interested in learning more about a few well-known ML platforms, you’ve come to the right place!
https://www.kdnuggets.com/2020/07/tour-end-to-end-machine-learning-platforms.html
Building a Content-Based Book Recommendation Engine
In this blog, we will see how we can build a simple content-based recommender system using Goodreads data.
https://www.kdnuggets.com/2020/07/building-content-based-book-recommendation-engine.html
Powerful CSV processing with kdb+
This article provides a glimpse into the available tools to work with CSV files and describes how kdb+ and its query language q raise CSV processing to a new level of performance and simplicity.
https://www.kdnuggets.com/2020/07/powerful-csv-processing-kdb.html
Wrapping Machine Learning Techniques Within AI-JACK Library in R
The article shows an approach to solving the problem of selecting the best technique in machine learning. This can be done in R using just one library, AI-JACK, and the article shows how to use this tool.
https://www.kdnuggets.com/2020/07/wrapping-machine-learning-techniques-ai-jack-library-r.html
Four Ways to Apply NLP in Financial Services
Natural language processing (NLP) is increasingly used to review unstructured content or spot trends in markets. How is Refinitiv Labs applying NLP in financial services to meet challenges around investment decision-making and risk management?
https://www.kdnuggets.com/2020/06/four-ways-apply-nlp-financial-services.html
Faster machine learning on larger graphs with NumPy and Pandas
One of the most exciting features of StellarGraph 1.0 is a new graph data structure — built using NumPy and Pandas — that results in significantly lower memory usage and faster construction times.
https://www.kdnuggets.com/2020/05/faster-machine-learning-larger-graphs-numpy-pandas.html
The Architecture Used at LinkedIn to Improve Feature Management in Machine Learning Models
The new typed feature schema streamlined the reusability of features across thousands of machine learning models.
https://www.kdnuggets.com/2020/05/architecture-linkedin-feature-management-machine-learning-models.html
Dockerize Jupyter with the Visual Debugger
A step-by-step guide to enabling and using visual debugging in Jupyter in a Docker container.
https://www.kdnuggets.com/2020/04/dockerize-jupyter-visual-debugger.html
ModelDB 2.0 is here!
We are excited to announce that ModelDB 2.0 is now available! We have learned a lot since building ModelDB 1.0, so we decided to rebuild from the ground up.
https://www.kdnuggets.com/2020/03/verta-modeldb-20.html
Five Interesting Data Engineering Projects
As the role of the data engineer continues to grow in the field of data science, so does the number of tools being developed to support wrangling all that data. Five of these tools (along with a few bonus tools) that deserve attention for your data pipeline work are reviewed here.
https://www.kdnuggets.com/2020/03/data-engineering-projects.html
Scaling Your Data Strategy
This article presents a particular vision for a cohesive data strategy for addressing large-scale problems with data-driven solutions, based on prior professional experiences.
https://www.kdnuggets.com/2020/03/scaling-data-strategy.html
Building a Mature Machine Learning Team
After spending a lot of time thinking about the paths that software companies take toward ML maturity, the author created this framework to follow as you adopt ML and then mature as an organization. The framework covers every aspect of building a team, including product, process, technical, and organizational readiness, and recognizes the importance of cross-functional expertise and process improvements for bringing AI-driven products to market.
https://www.kdnuggets.com/2020/03/mature-machine-learning-team.html
How To Build Your Own Feedback Analysis Solution
Automating the analysis of customer feedback will sound like a great idea after reading a couple hundred reviews. Building an NLP solution to provide in-depth analysis of what your customers are thinking is a serious undertaking, and this guide helps you scope out the entire project.
https://www.kdnuggets.com/2020/03/build-feedback-analysis-solution.html
Can Edge Analytics Become a Game Changer?
Edge analytics is considered to be the future of sensor handling, and this article discusses its benefits and architecture of modern edge devices, gateways, and sensors. Deep Learning for edge analytics is also considered along with a review of experiments in human and chess figure detection using edge devices.
https://www.kdnuggets.com/2020/02/edge-analytics-game-changer.html
Introducing fastpages: An easy to use blogging platform with extra features for Jupyter Notebooks
This article introduces the easy-to-use blogging platform fastpages. fastpages relies on GitHub Pages for hosting and GitHub Actions to automate the creation of your blog, and contains extra features for Jupyter Notebooks.
https://www.kdnuggets.com/2020/02/introducing-fastpages-blogging-platform-jupyter-notebooks.html
How Kubeflow Can Add AI to Your Kubernetes Deployments
As Kubernetes is capable of working with other solutions, it is possible to integrate it with a collection of tools that can almost fully automate your development pipeline. Some of those third-party tools even allow you to integrate AI into Kubernetes. One such tool you can integrate with Kubernetes is Kubeflow. Read more about it here.
https://www.kdnuggets.com/2020/02/kubeflow-ai-kubernetes-deployments.html
The Death of Data Scientists – will AutoML replace them?
Soon after tech giants Google and Microsoft introduced their AutoML services to the world, the popularity and interest in these services skyrocketed. We first review AutoML, compare the platforms available, and then test them out against real data scientists to answer the question: will AutoML replace us?
https://www.kdnuggets.com/2020/02/data-scientists-automl-replace.html
Observability for Data Engineering
Going beyond traditional monitoring techniques and goals, understanding if a system is working as intended requires a new concept in DevOps, called Observability. Learn more about this essential approach to bring more context to your system metrics.
https://www.kdnuggets.com/2020/02/observability-data-engineering.html
Managing Machine Learning Cycles: Five Learnings from comparing Data Science Experimentation/ Collaboration Tools
Machine learning projects require handling different versions of data, source code, hyperparameters, and environment configuration. Numerous tools are on the market for managing this variety, and this review features important lessons learned from an ongoing evaluation of the current landscape.
https://www.kdnuggets.com/2020/01/managing-machine-learning-cycles.html
Geovisualization with Open Data
In this post I want to show how to use publicly available (open) data to create geo visualizations in Python. Maps are a great way to communicate and compare information when working with geolocation data. There are many frameworks for plotting maps; here I focus on matplotlib and geopandas (and give a glimpse of mplleaflet).
https://www.kdnuggets.com/2020/01/open-data-germany-maps-viz.html
7 AI Use Cases Transforming Live Sports Production and Distribution
Here are 7 powerful AI-led use cases, both for linear television and for OTT apps, that are transforming the live sports production landscape.
https://www.kdnuggets.com/2020/01/7-ai-use-cases-transforming-live-sports-production-distribution.html
What is Data Catalog and Why You Should Care?
Learn why data catalogs could be just the thing you need to meet the challenges of data and metadata management and collaboration.
https://www.kdnuggets.com/2019/12/data-catalog.html
Ontotext Platform 3.0 for Enterprise Knowledge Graphs Released
Ontotext Platform 3.0 features significant technology improvements to enable simpler and faster graph navigation, including GraphQL interfaces to make it easier for application developers to access knowledge graphs without tedious development of back-end APIs or complex SPARQL.
https://www.kdnuggets.com/2019/12/ontotext-platform-enterprise-knowledge-graphs.html
Spark NLP 101: LightPipeline
A Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. Now let’s see how this can be done in Spark NLP using Annotators and Transformers.
https://www.kdnuggets.com/2019/11/spark-nlp-101-lightpipeline.html
Topics Extraction and Classification of Online Chats
This article covers how to automatically identify the topics within a corpus of textual data using unsupervised topic modelling, and then apply a supervised classification algorithm to assign topic labels to each textual document, using the result of the previous step as target labels.
https://www.kdnuggets.com/2019/11/topics-extraction-classification-online-chats.html
How to Create a Vocabulary for NLP Tasks in Python
This post will walk through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.
https://www.kdnuggets.com/2019/11/create-vocabulary-nlp-tasks-python.html
Everything a Data Scientist Should Know About Data Management
For full-stack data science mastery, you must understand data management along with all the bells and whistles of machine learning. This high-level overview is a road map for the history and current state of the expansive options for data storage and infrastructure solutions.
https://www.kdnuggets.com/2019/10/data-scientist-data-management.html
Beyond Word Embedding: Key Ideas in Document Embedding
This literature review on document embedding techniques thoroughly covers the many ways practitioners develop rich vector representations of text -- from single sentences to entire books.
https://www.kdnuggets.com/2019/10/beyond-word-embedding-document-embedding.html
The Last SQL Guide for Data Analysis You’ll Ever Need
This is it: the last SQL guide for data analysis you'll ever need! OK, maybe it’s actually the first. But it’ll give you a solid head start.
https://www.kdnuggets.com/2019/10/last-sql-guide-data-analysis-ever-need.html
Natural Language in Python using spaCy: An Introduction
This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.
https://www.kdnuggets.com/2019/09/natural-language-python-using-spacy-introduction.html
Scikit-Learn & More for Synthetic Dataset Generation for Machine Learning
While mature algorithms and extensive open-source libraries are widely available for machine learning practitioners, sufficient data to apply these techniques remains a core challenge. Discover how to leverage scikit-learn and other tools to generate synthetic data appropriate for optimizing and fine-tuning your models.
https://www.kdnuggets.com/2019/09/scikit-learn-synthetic-dataset.html
Automate your Python Scripts with Task Scheduler: Windows Task Scheduler to Scrape Alternative Data
In this tutorial, you will learn how to use Task Scheduler to scrape data from the Lazada (e-commerce) website and dump it into a SQLite database.
https://www.kdnuggets.com/2019/09/automate-python-scripts-task-scheduler.html
How LinkedIn, Uber, Lyft, Airbnb and Netflix are Solving Data Management and Discovery for Machine Learning Solutions
As machine learning evolves, the need for tools and platforms that automate the lifecycle management of training and testing datasets is becoming increasingly important. Fast growing technology companies like Uber or LinkedIn have been forced to build their own in-house data lifecycle management solutions to power different groups of machine learning models.
https://www.kdnuggets.com/2019/08/linkedin-uber-lyft-airbnb-netflix-solving-data-management-discovery-machine-learning-solutions.html
Detecting stationarity in time series data
Explore how to determine if your time series data is generated by a stationary process and how to handle the necessary assumptions and potential interpretations of your result.
https://www.kdnuggets.com/2019/08/stationarity-time-series-data.html
Neural Code Search: How Facebook Uses Neural Networks to Help Developers Search for Code Snippets
Developers are always searching for answers to questions about their code. But how do they ask the right questions? Facebook is creating new NLP neural networks to help search code repositories that may advance information retrieval algorithms.
https://www.kdnuggets.com/2019/07/neural-code-facebook-uses-neural-networks.html
How to Showcase the Impact of Your Data Science Work
You're a Data Scientist -- or preparing to land your first job -- and communicating your work to others, especially employers, so they understand your impact is essential. These five tips will help you help others appreciate your data science.
https://www.kdnuggets.com/2019/07/showcase-impact-data-science-work.html
Understanding Cloud Data Services
Ready to move your systems to a cloud vendor or just learning more about big data services? This overview will help you understand big data system architectures, components, and offerings with an end-to-end taxonomy of what is available from the big three cloud providers.
https://www.kdnuggets.com/2019/06/understanding-cloud-data-services.html
Predict Age and Gender Using Convolutional Neural Network and OpenCV
Age and gender estimation from a single face image are important tasks in intelligent applications. As such, let's build a simple age and gender detection model in this detailed article.
https://www.kdnuggets.com/2019/04/predict-age-gender-using-convolutional-neural-network-opencv.html
Data Pipelines, Luigi, Airflow: Everything you need to know
This post focuses on the workflow management system (WMS) Airflow: what it is, what you can do with it, and how it differs from Luigi.
https://www.kdnuggets.com/2019/03/data-pipelines-luigi-airflow-everything-need-know.html
Top 7 Data Science Use Cases in Travel
To satisfy the needs of a growing number of consumers and process enormous chunks of data, data science algorithms are vital. Let’s consider several widespread and efficient data science use cases in the travel industry.
https://www.kdnuggets.com/2019/02/top-7-data-science-use-cases-travel.html
The Role of the Data Engineer is Changing
The role of the data engineer in a startup data team is changing rapidly. Are you thinking about it the right way?
https://www.kdnuggets.com/2019/01/role-data-engineer-changing.html
Supervised Learning: Model Popularity from Past to Present
An extensive look at the history of machine learning models, using historical data from the number of publications of each type to attempt to answer the question: what is the most popular model?
https://www.kdnuggets.com/2018/12/supervised-learning-model-popularity-from-past-present.html
Text Preprocessing in Python: Steps, Tools, and Examples
We outline the basic steps of text preprocessing, which are needed for transferring text from human language to machine-readable format for further processing. We will also discuss text preprocessing tools.
https://www.kdnuggets.com/2018/11/text-preprocessing-python.html
Hadoop for Beginners
An introduction to Hadoop, a framework that enables you to store and process large data sets in parallel and distributed fashion.
https://www.kdnuggets.com/2018/09/hadoop-beginners.html
Comparison of the Most Useful Text Processing APIs
There is a need to compare different APIs to understand key pros and cons they have and when it is better to use one API instead of the other. Let us proceed with the comparison.
https://www.kdnuggets.com/2018/08/comparison-most-useful-text-processing-apis.html
Docker Cheat Sheet
This comprehensive cheat sheet will assist Docker users, experienced and new, in getting containers up-and-running quickly. We list commands that will allow users to install, build, ship and run Docker containers.
https://www.kdnuggets.com/2018/08/docker-cheat-sheet.html
Understanding Language Syntax and Structure: A Practitioner’s Guide to NLP
Knowledge about the structure and syntax of language is helpful in many areas like text processing, annotation, and parsing for further operations such as text classification or summarization.
https://www.kdnuggets.com/2018/08/understanding-language-syntax-and-structure-practitioners-guide-nlp-3.html
Text Classification & Embeddings Visualization Using LSTMs, CNNs, and Pre-trained Word Vectors
In this tutorial, I classify Yelp round-10 review datasets. After processing the review comments, I trained three models in three different ways and obtained three word embeddings.
https://www.kdnuggets.com/2018/07/text-classification-lstm-cnn-pre-trained-word-vectors.html
Building a Basic Keras Neural Network Sequential Model
The approach basically coincides with Chollet's Keras 4 step workflow, which he outlines in his book "Deep Learning with Python," using the MNIST dataset, and the model built is a Sequential network of Dense layers. A building block for additional posts.
https://www.kdnuggets.com/2018/06/basic-keras-neural-network-sequential-model.html
Top 20 Python Libraries for Data Science in 2018
Our selection actually contains more than 20 libraries, as some of them are alternatives to each other and solve the same problem. Therefore we have grouped them as it's difficult to distinguish one particular leader at the moment.
https://www.kdnuggets.com/2018/06/top-20-python-libraries-data-science-2018.html
Generating Text with RNNs in 4 Lines of Code
Want to generate text with little trouble, and without building and tuning a neural network yourself? Let's check out a project which allows you to "easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code."
https://www.kdnuggets.com/2018/06/generating-text-rnn-4-lines-code.html
Beyond Data Lakes and Data Warehousing
We give a comprehensive review of data lakes and data warehouses, and look at what the future holds for total data integration.
https://www.kdnuggets.com/2018/05/data-lakes-data-warehousing-integration-revolution.html
Complete Guide to Build ConvNet HTTP-Based Application using TensorFlow and Flask RESTful Python API
In this tutorial, a CNN is built, trained, and tested against the CIFAR10 dataset. To make the model remotely accessible, a Flask Web application is created using Python to receive an uploaded image and return its classification label using HTTP.
https://www.kdnuggets.com/2018/05/complete-guide-convnet-tensorflow-flask-restful-python-api.html
50+ Useful Machine Learning & Prediction APIs, 2018 Edition
Extensive list of 50+ APIs in Face and Image Recognition, Text Analysis, NLP, Sentiment Analysis, Language Translation, Machine Learning and prediction.
https://www.kdnuggets.com/2018/05/50-useful-machine-learning-prediction-apis-2018-edition.html
Jupyter Notebook for Beginners: A Tutorial
The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. Although it is possible to use many different programming languages within Jupyter Notebooks, this article will focus on Python as it is the most common use case.
https://www.kdnuggets.com/2018/05/jupyter-notebook-beginners-tutorial.html
Text Data Preprocessing: A Walkthrough in Python
This post will serve as a practical walkthrough of a text data preprocessing task using some common Python tools.
https://www.kdnuggets.com/2018/03/text-data-preprocessing-walkthrough-python.html
Quick Feature Engineering with Dates Using fast.ai
The fast.ai library is a collection of supplementary wrappers for a host of popular machine learning libraries, designed to remove the necessity of writing your own functions to take care of some repetitive tasks in a machine learning workflow.
https://www.kdnuggets.com/2018/03/feature-engineering-dates-fastai.html
Text Processing in R
There are good reasons to want to use R for text processing, namely that we can do it, and that we can fit it in with the rest of our analyses. Furthermore, there is a lot of very active development going on in the R text analysis community right now.
https://www.kdnuggets.com/2018/03/text-processing-r.html
Graph Databases Burst into the Mainstream
What do Amazon, Facebook, Google, IBM, Microsoft and Twitter have in common? They're all adopters of graph databases - a hot technology that continues to evolve.
https://www.kdnuggets.com/2018/02/graph-databases-burst-into-the-mainstream.html
Data Science at the Command Line: Exploring Data
See what's available in the freely-available book "Data Science at the Command Line" by digging into data exploration in the terminal.
https://www.kdnuggets.com/2018/02/data-science-command-line-book-exploring-data.html
Training and Visualising Word Vectors
In this tutorial I want to show how you can implement a skip-gram model in TensorFlow to generate word vectors for any text you are working with, and then use TensorBoard to visualize them.
https://www.kdnuggets.com/2018/01/training-visualising-word-vectors.html
Elasticsearch for Dummies
In this blog, you’ll get to know the basics of Elasticsearch, its advantages, how to install it, and how to index documents using Elasticsearch.
https://www.kdnuggets.com/2018/01/elasticsearch-overview.html
70 Amazing Free Data Sources You Should Know
70 free data sources for 2017 on government, crime, health, financial and economic data, marketing and social media, journalism and media, real estate, company directory and review, and more to start working on your data projects.
https://www.kdnuggets.com/2017/12/big-data-free-sources.html
A General Approach to Preprocessing Text Data
Recently we had a look at a framework for textual data science tasks in their totality. Now we focus on putting together a generalized approach to attacking text data preprocessing, regardless of the specific textual data science task you have in mind.
https://www.kdnuggets.com/2017/12/general-approach-preprocessing-text-data.html
Are Data Lakes Fake News?
The quick answer is yes, and the biggest problem is that the term “Data Lakes” has been overloaded by vendors and analysts with different meanings, resulting in an ill-defined and blurry concept.
https://www.kdnuggets.com/2017/09/data-lakes-fake-news.html
How Convolutional Neural Networks Accomplish Image Recognition?
Image recognition is a very interesting and challenging field of study. Here we explain the concepts, applications, and techniques of image recognition using Convolutional Neural Networks.
https://www.kdnuggets.com/2017/08/convolutional-neural-networks-image-recognition.html
How to squeeze the most from your training data
In many cases, getting enough well-labelled training data is a huge hurdle for developing accurate prediction systems. Here is an innovative approach which uses SVM to get the most from training data.
https://www.kdnuggets.com/2017/07/squeeze-most-from-training-data.html
Spotlight on the Remarkable Potential of AI in KYC (Know Your Customer)
Most people would have heard of the headline-making tremendous achievements in artificial intelligence (AI): Systems defeating world champions in board games like GO and winning quiz shows. These are small realizations of AI, but there is a silent revolution taking place in other areas, including Regulatory Compliance in Financial Services.
https://www.kdnuggets.com/2017/07/spotlight-remarkable-potential-ai-kyc.html
7 Ways to Get High-Quality Labeled Training Data at Low Cost
Labeled training data is needed for machine learning, but getting such data is not simple or cheap. We review 7 approaches, including repurposing, harvesting free sources, retraining models on progressively higher quality data, and more.
https://www.kdnuggets.com/2017/06/acquiring-quality-labeled-training-data.html
Text Mining 101: Mining Information From A Resume
We show a framework for mining relevant entities from a text resume, and how to separate parsing logic from entity specification.
https://www.kdnuggets.com/2017/05/text-mining-information-resume.html
Must-Know: What are common data quality issues for Big Data and how to handle them?
Let's have a look at common quality issues facing Big Data in terms of the key characteristics of Big Data – Volume, Velocity, Variety, Veracity, and Value.
https://www.kdnuggets.com/2017/05/must-know-common-data-quality-issues-big-data.html
Models: From the Lab to the Factory
In this post, we’ll go over techniques to avoid these scenarios through the process of model management and deployment.
https://www.kdnuggets.com/2017/04/models-from-lab-factory.html
Difference Between Big Data and Internet of Things
If you cannot manage real-time streaming data and make real-time analytics and real-time decisions at the edge, then you are not doing IOT or IOT analytics, in my humble opinion. So what is required to support these IOT data management and analytic requirements?
https://www.kdnuggets.com/2017/04/difference-big-data-internet-of-things.html
17 More Must-Know Data Science Interview Questions and Answers, Part 3
The third and final part of 17 new must-know Data Science interview questions and answers covers A/B testing, data visualization, Twitter influence evaluation, and Big Data quality.
https://www.kdnuggets.com/2017/03/17-data-science-interview-questions-answers-part-3.html
Gartner Data Science Platforms – A Deeper Look
Thomas Dinsmore's critical examination of the Gartner 2017 MQ of Data Science Platforms, including which vendors are out, in, or have big changes, Hadoop and Spark integration, open source software, and what Data Scientists actually use.
https://www.kdnuggets.com/2017/03/thomaswdinsmore-gartner-data-science-platforms.html
Introduction to Natural Language Processing, Part 1: Lexical Units
This series explores core concepts of natural language processing, starting with an introduction to the field and explaining how to identify lexical units as a part of data preprocessing.
https://www.kdnuggets.com/2017/02/datascience-introduction-natural-language-processing-part1.html
Machine Learning and Cyber Security Resources
An overview of useful resources about applications of machine learning and data mining in cyber security, including important websites, papers, books, tutorials, courses, and more.
https://www.kdnuggets.com/2017/01/machine-learning-cyber-security.html
The big data ecosystem for science: Climate Science and Climate Change
Climate change is one of the most pressing challenges for human society in the 21st century. We review the Big Data ecosystem for studying climate change.
https://www.kdnuggets.com/2016/12/big-data-ecosystem-science-climate-change.html
Smart Data Platform – The Future of Big Data Technology
Data processing and analytical modelling are major bottlenecks in today’s big data world because human intelligence is needed to decide the relationships between data, the required data engineering tasks, and the analytical models and their parameters. This article discusses the Smart Data Platform, which aims to help solve such problems.
https://www.kdnuggets.com/2016/12/smart-data-platform-future-big-data-technology.html
Data Science and Big Data, Explained
This article is meant to give the non-data scientist a solid overview of the many concepts and terms behind data science and big data. While related terms will be mentioned at a very high level, the reader is encouraged to explore the references and other resources for additional detail.
https://www.kdnuggets.com/2016/11/big-data-data-science-explained.html
LinkedIn Knowledge Graph – KDnuggets Interview
We interview LinkedIn about their recently published LinkedIn Knowledge Graph which connects their many millions of members, jobs, companies, and more.
https://www.kdnuggets.com/2016/10/interview-creators-linkedin-knowledge-graph.html
Top 10 Data Science Videos on Youtube
Learning and the future are the key topics in the recent Youtube videos on Data Science. The main questions revolve around: “how to become a Data Scientist”, “what is a data scientist”, and “where data science is going”. But why is there so little explanation of data science to the masses?
https://www.kdnuggets.com/2016/10/top-10-data-science-videos-youtube.html
Contest 2nd Place: Automating Data Science
This post discusses some considerations, options, and opportunities for automating aspects of data science and machine learning. It is the second place recipient (tied) in the recent KDnuggets blog contest.
https://www.kdnuggets.com/2016/08/automating-data-science.html
Doing Statistics with SQL
This post covers how to perform some basic in-database statistical analysis using SQL.
https://www.kdnuggets.com/2016/08/doing-statistics-sql.html
Improving Nudity Detection and NSFW Image Recognition
This post discusses improvements made in a tricky machine learning classification problem: nude and/or NSFW, or not?
https://www.kdnuggets.com/2016/06/algorithmia-improving-nudity-detection-nsfw-image-recognition.html
Data Lake Plumbers: Operationalizing the Data Lake
Gain insight into data lakes, their benefits, when they are appropriate, and how to operationalize them. How do they compare to the data warehouse?
https://www.kdnuggets.com/2016/02/data-lakes-plumbers-operationalizing.html
Data-Planet Statistical Datasets
Data-Planet Statistical Datasets provides easy access to an extensive repository of standardized and structured statistical data, with more than 25 billion data points from more than 70 source organizations.
https://www.kdnuggets.com/2015/11/data-planet-statistical-datasets.html
Integrating Python and R into a Data Analysis Pipeline, Part 1
The first in a series of blog posts that outline the basic strategy for integrating Python and R, run through the different steps involved in this process, and give a real example of how and why you would want to do this.
https://www.kdnuggets.com/2015/10/integrating-python-r-data-analysis-part1.html
The Data Science Machine, or ‘How To Engineer Feature Engineering’
MIT researchers have developed what they refer to as the Data Science Machine, which combines feature engineering and an end-to-end data science pipeline into a system that beats nearly 70% of humans in competitions. Is this game-changing?
https://www.kdnuggets.com/2015/10/data-science-machine.html
Interview: Thanigai Vellore, Art.com on Delivering Contextually Relevant Search Experience
We discuss the role of Analytics at Art.com, the polyglot data architecture at Art.com, the use cases for Hadoop, vendor selection, supporting semantic search and experience with Avro.
https://www.kdnuggets.com/2015/07/interview-thanigai-vellore-art-search-experience.html
In Machine Learning, What is Better: More Data or better Algorithms
The gross over-generalization that “more data gives better results” is misleading. Here we explain in which scenarios more data or more features are helpful and in which they are not, and how the choice of algorithm affects the end result.
https://www.kdnuggets.com/2015/06/machine-learning-more-data-better-algorithms.html
Interview: Joseph Babcock, Netflix on Genie, Lipstick, and Other In-house Developed Tools
We discuss the role of analytics in content acquisition, data architecture at Netflix, organizational structure, and open-source tools from Netflix.
https://www.kdnuggets.com/2015/06/interview-joseph-babcock-netflix-in-house-developed-tools.html
Top 30 Social Network Analysis and Visualization Tools
We review major tools and packages for Social Network Analysis and visualization, which have wide applications including biology, finance, sociology, network theory, and many other domains.
https://www.kdnuggets.com/2015/06/top-30-social-network-analysis-visualization-tools.html
Exclusive: Interview with Chris Wiggins, NYTimes Chief Data Scientist
New York Times Chief Data Scientist Chris Wiggins on the transformation of digital journalism, key Data Science skills, favorite tools, why better wrong than nice, and how Thomas Jefferson is very relevant today.
https://www.kdnuggets.com/2015/01/exclusive-interview-chris-wiggins-nytimes-chief-data-scientist.html
Interesting Social Media Datasets
Learn about some of the many interesting social media datasets available to you, some of which are quite new, and the different features and challenges they offer you for your next big data science project.
https://www.kdnuggets.com/2014/08/interesting-social-media-datasets.html
Interview: Sastry Malladi, StubHub on Designing Big Data Architecture for the Unknown Future
We discuss the Big Data architecture at StubHub, important factors in architecture design, hybrid approach of using Big Data along with traditional data warehouses, challenges, importance of meta-data and more.
https://www.kdnuggets.com/2014/07/interview-sastry-malladi-stubhub-big-data-architecture.html
Is Data Scientist the right career path for you? Candid advice
Candid advice from an industry veteran reveals the true picture behind the much-talked-about Data Scientist "glamour" and helps people have the right expectations for a Data Science career.
https://www.kdnuggets.com/2014/03/data-scientist-right-career-path-candid-advice.html
KDnuggets™ News 14:n07, Mar 26
Features (3) | Opinions (3) | Software (2) | News (3) | Webcasts (2) | Courses (1) | Meetings (3) | Jobs (5) | Academic Read more »
https://www.kdnuggets.com/2014/n07.html
The Do’s and Don’ts of Data Mining
Leading data mining and analytics experts give their favorite do's and don'ts, from "Do plan for data to be messy" to "Do not underestimate the power of a simpler-to-understand solution".
https://www.kdnuggets.com/2014/03/data-mining-do-and-dont.html
Top Datasets on Reddit
Most popular dataset posts on Reddit include NFL Game Metadata, Reddit top 2.5 Million posts, Zillow housing prices, and, of course, a database of cat pictures.
https://www.kdnuggets.com/2013/12/top-datasets-on-reddit.html
KDnuggets™ News 14:n01, Jan 8
coming on Jan 8
https://www.kdnuggets.com/2014/n01.html
2013 Dec News: Analytics, Big Data, Data Mining and Data Science Features, News, and Software
All (95) | News, Software (27) | Courses, Events (12) | Jobs | Academic | Publications (38) Top stories for Dec 22-29: Data Mining Applications Read more »
https://www.kdnuggets.com/2013/12/news-software.html
2013 Dec: Analytics, Big Data, Data Mining and Data Science News
All (95) | News, Software (27) | Courses, Events (12) | Jobs | Academic | Publications (38) Unicorn Data Scientists vs Data Science Teams - Read more »
https://www.kdnuggets.com/2013/12/index.html
Data: Government, State, City, Local and Public
This is a directory of government, federal, state, city, local and other public datasets. See also Data APIs, Hubs, Marketplaces, Platforms, Portals, and Search Engines.
https://www.kdnuggets.com/datasets/government-local-public.html
Datasets for Data Science, Machine Learning, AI & Analytics
KDnuggets subscribers now have access to the WorldData.AI Partners Plan at no cost! Check out the world’s largest external curated data platform, integrating data Read more »
https://www.kdnuggets.com/datasets/index.html
Gartner Magic Quadrant for Business Intelligence and Analytics Platforms
In 2012 Data Discovery became a mainstream part of BI and analytic architecture. The market also saw increased activity in real time, content and predictive analytics. The leaders in the market include Microsoft, IBM, Tableau, QlikTech, Oracle, SAS, MicroStrategy, Tibco Spotfire, Information Builders, and SAP.
https://www.kdnuggets.com/2013/02/gartner-magic-quadrant-for-business-intelligence-analytics-platforms.html


Top Posts