Search results for metadata

    Found 237 documents, 5949 searched:

  • Containerization of PySpark Using Kubernetes

    This article demonstrates how to use Spark on Kubernetes. It also includes a brief comparison of the cluster managers available for Spark.

    https://www.kdnuggets.com/2020/08/containerization-pyspark-kubernetes.html

  • A Tour of End-to-End Machine Learning Platforms

    An end-to-end machine learning platform needs a holistic approach. If you’re interested in learning more about a few well-known ML platforms, you’ve come to the right place!

    https://www.kdnuggets.com/2020/07/tour-end-to-end-machine-learning-platforms.html

  • Building a Content-Based Book Recommendation Engine

    In this blog, we will see how we can build a simple content-based recommender system using Goodreads data.

    https://www.kdnuggets.com/2020/07/building-content-based-book-recommendation-engine.html

  • Powerful CSV processing with kdb+

    This article provides a glimpse into the available tools to work with CSV files and describes how kdb+ and its query language q raise CSV processing to a new level of performance and simplicity.

    https://www.kdnuggets.com/2020/07/powerful-csv-processing-kdb.html

  • Wrapping Machine Learning Techniques Within AI-JACK Library in R

    The article presents an approach to the problem of selecting the best machine learning technique. This can be done in R using a single library, AI-JACK, and the article shows how to use it.

    https://www.kdnuggets.com/2020/07/wrapping-machine-learning-techniques-ai-jack-library-r.html

  • Four Ways to Apply NLP in Financial Services

    Natural language processing (NLP) is increasingly used to review unstructured content or spot trends in markets. How is Refinitiv Labs applying NLP in financial services to meet challenges around investment decision-making and risk management?

    https://www.kdnuggets.com/2020/06/four-ways-apply-nlp-financial-services.html

  • Faster machine learning on larger graphs with NumPy and Pandas

    One of the most exciting features of StellarGraph 1.0 is a new graph data structure — built using NumPy and Pandas — that results in significantly lower memory usage and faster construction times.

    https://www.kdnuggets.com/2020/05/faster-machine-learning-larger-graphs-numpy-pandas.html

  • The Architecture Used at LinkedIn to Improve Feature Management in Machine Learning Models

    The new typed feature schema streamlined the reusability of features across thousands of machine learning models.

    https://www.kdnuggets.com/2020/05/architecture-linkedin-feature-management-machine-learning-models.html

  • Dockerize Jupyter with the Visual Debugger

    A step-by-step guide to enabling and using visual debugging in Jupyter in a Docker container.

    https://www.kdnuggets.com/2020/04/dockerize-jupyter-visual-debugger.html

  • ModelDB 2.0 is here!

    We are excited to announce that ModelDB 2.0 is now available! We have learned a lot since building ModelDB 1.0, so we decided to rebuild from the ground up.

    https://www.kdnuggets.com/2020/03/verta-modeldb-20.html

  • Five Interesting Data Engineering Projects

    As the role of the data engineer continues to grow in the field of data science, so does the number of tools being developed to support wrangling all that data. Five of these tools that you should pay attention to for your data pipeline work are reviewed here (along with a few bonus tools).

    https://www.kdnuggets.com/2020/03/data-engineering-projects.html

  • Scaling Your Data Strategy

    This article presents a particular vision for a cohesive data strategy for addressing large-scale problems with data-driven solutions, based on prior professional experiences.

    https://www.kdnuggets.com/2020/03/scaling-data-strategy.html

  • Building a Mature Machine Learning Team

    After spending a lot of time thinking about the paths that software companies take toward ML maturity, this framework was created to follow as you adopt ML and then mature as an organization. The framework covers every aspect of building a team including product, process, technical, and organizational readiness, as well as recognizes the importance of cross-functional expertise and process improvements for bringing AI-driven products to market.

    https://www.kdnuggets.com/2020/03/mature-machine-learning-team.html

  • How To Build Your Own Feedback Analysis Solution

    Automating the analysis of customer feedback will sound like a great idea after reading a couple hundred reviews. Building an NLP solution to provide in-depth analysis of what your customers are thinking is a serious undertaking, and this guide helps you scope out the entire project.

    https://www.kdnuggets.com/2020/03/build-feedback-analysis-solution.html

  • Can Edge Analytics Become a Game Changer?

    Edge analytics is considered to be the future of sensor handling, and this article discusses its benefits and architecture of modern edge devices, gateways, and sensors. Deep Learning for edge analytics is also considered along with a review of experiments in human and chess figure detection using edge devices.

    https://www.kdnuggets.com/2020/02/edge-analytics-game-changer.html

  • Introducing fastpages: An easy to use blogging platform with extra features for Jupyter Notebooks

    This article introduces fastpages, an easy-to-use blogging platform. fastpages relies on GitHub Pages for hosting and GitHub Actions to automate the creation of your blog, and it includes extra features for Jupyter Notebooks.

    https://www.kdnuggets.com/2020/02/introducing-fastpages-blogging-platform-jupyter-notebooks.html

  • How Kubeflow Can Add AI to Your Kubernetes Deployments

    As Kubernetes is capable of working with other solutions, it is possible to integrate it with a collection of tools that can almost fully automate your development pipeline. Some of those third-party tools even allow you to integrate AI into Kubernetes. One such tool you can integrate with Kubernetes is Kubeflow. Read more about it here.

    https://www.kdnuggets.com/2020/02/kubeflow-ai-kubernetes-deployments.html

  • The Death of Data Scientists – will AutoML replace them?

    Soon after tech giants Google and Microsoft introduced their AutoML services to the world, the popularity and interest in these services skyrocketed. We first review AutoML, compare the platforms available, and then test them out against real data scientists to answer the question: will AutoML replace us?

    https://www.kdnuggets.com/2020/02/data-scientists-automl-replace.html

  • Observability for Data Engineering

    Going beyond traditional monitoring techniques and goals, understanding if a system is working as intended requires a new concept in DevOps, called Observability. Learn more about this essential approach to bring more context to your system metrics.

    https://www.kdnuggets.com/2020/02/observability-data-engineering.html

  • Managing Machine Learning Cycles: Five Learnings from comparing Data Science Experimentation/ Collaboration Tools

    Machine learning projects require handling different versions of data, source code, hyperparameters, and environment configuration. Numerous tools are on the market for managing this variety, and this review features important lessons learned from an ongoing evaluation of the current landscape.

    https://www.kdnuggets.com/2020/01/managing-machine-learning-cycles.html

  • Geovisualization with Open Data

    In this post I want to show how to use publicly available (open) data to create geovisualizations in Python. Maps are a great way to communicate and compare information when working with geolocation data. There are many frameworks for plotting maps; here I focus on matplotlib and geopandas (and give a glimpse of mplleaflet).

    https://www.kdnuggets.com/2020/01/open-data-germany-maps-viz.html

  • 7 AI Use Cases Transforming Live Sports Production and Distribution

    Here are 7 powerful AI-led use cases, both for linear television and for OTT apps, that are transforming the live sports production landscape.

    https://www.kdnuggets.com/2020/01/7-ai-use-cases-transforming-live-sports-production-distribution.html

  • What is Data Catalog and Why You Should Care?

    Learn why data catalogs could be just the thing you need to meet the challenges of data and metadata management and collaboration.

    https://www.kdnuggets.com/2019/12/data-catalog.html

  • Ontotext Platform 3.0 for Enterprise Knowledge Graphs Released

    Ontotext Platform 3.0 features significant technology improvements to enable simpler and faster graph navigation, including GraphQL interfaces to make it easier for application developers to access knowledge graphs without tedious development of back-end APIs or complex SPARQL.

    https://www.kdnuggets.com/2019/12/ontotext-platform-enterprise-knowledge-graphs.html

  • Spark NLP 101: LightPipeline

    A Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. Now let’s see how this can be done in Spark NLP using Annotators and Transformers.

    https://www.kdnuggets.com/2019/11/spark-nlp-101-lightpipeline.html

  • Topics Extraction and Classification of Online Chats

    This article covers how to automatically identify the topics within a corpus of textual data using unsupervised topic modelling, and then apply a supervised classification algorithm to assign topic labels to each textual document, using the result of the previous step as target labels.

    https://www.kdnuggets.com/2019/11/topics-extraction-classification-online-chats.html

  • How to Create a Vocabulary for NLP Tasks in Python

    This post will walk through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.

    https://www.kdnuggets.com/2019/11/create-vocabulary-nlp-tasks-python.html

  • Everything a Data Scientist Should Know About Data Management

    For full-stack data science mastery, you must understand data management along with all the bells and whistles of machine learning. This high-level overview is a road map for the history and current state of the expansive options for data storage and infrastructure solutions.

    https://www.kdnuggets.com/2019/10/data-scientist-data-management.html

  • Beyond Word Embedding: Key Ideas in Document Embedding

    This literature review on document embedding techniques thoroughly covers the many ways practitioners develop rich vector representations of text -- from single sentences to entire books.

    https://www.kdnuggets.com/2019/10/beyond-word-embedding-document-embedding.html

  • The Last SQL Guide for Data Analysis You’ll Ever Need

    This is it: the last SQL guide for data analysis you'll ever need! OK, maybe it’s actually the first. But it’ll give you a solid head start.

    https://www.kdnuggets.com/2019/10/last-sql-guide-data-analysis-ever-need.html

  • Natural Language in Python using spaCy: An Introduction

    This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.

    https://www.kdnuggets.com/2019/09/natural-language-python-using-spacy-introduction.html

  • Scikit-Learn & More for Synthetic Dataset Generation for Machine Learning

    While mature algorithms and extensive open-source libraries are widely available for machine learning practitioners, sufficient data to apply these techniques remains a core challenge. Discover how to leverage scikit-learn and other tools to generate synthetic data appropriate for optimizing and fine-tuning your models.

    https://www.kdnuggets.com/2019/09/scikit-learn-synthetic-dataset.html

  • Automate your Python Scripts with Task Scheduler: Windows Task Scheduler to Scrape Alternative Data

    In this tutorial, you will learn how to use Task Scheduler to scrape data from the Lazada e-commerce website and store it in a SQLite database.

    https://www.kdnuggets.com/2019/09/automate-python-scripts-task-scheduler.html

  • How LinkedIn, Uber, Lyft, Airbnb and Netflix are Solving Data Management and Discovery for Machine Learning Solutions

    As machine learning evolves, the need for tools and platforms that automate the lifecycle management of training and testing datasets is becoming increasingly important. Fast growing technology companies like Uber or LinkedIn have been forced to build their own in-house data lifecycle management solutions to power different groups of machine learning models.

    https://www.kdnuggets.com/2019/08/linkedin-uber-lyft-airbnb-netflix-solving-data-management-discovery-machine-learning-solutions.html

  • Detecting stationarity in time series data

    Explore how to determine if your time series data is generated by a stationary process and how to handle the necessary assumptions and potential interpretations of your result.

    https://www.kdnuggets.com/2019/08/stationarity-time-series-data.html

  • Neural Code Search: How Facebook Uses Neural Networks to Help Developers Search for Code Snippets

    Developers are always searching for answers to questions about their code. But how do they ask the right questions? Facebook is creating new NLP neural networks to help search code repositories that may advance information retrieval algorithms.

    https://www.kdnuggets.com/2019/07/neural-code-facebook-uses-neural-networks.html

  • How to Showcase the Impact of Your Data Science Work

    You're a Data Scientist -- or preparing to land your first job -- and communicating your work to others, especially employers, so they understand your impact is essential. These five tips will help you help others appreciate your data science.

    https://www.kdnuggets.com/2019/07/showcase-impact-data-science-work.html

  • Understanding Cloud Data Services

    Ready to move your systems to a cloud vendor or just learning more about big data services? This overview will help you understand big data system architectures, components, and offerings with an end-to-end taxonomy of what is available from the big three cloud providers.

    https://www.kdnuggets.com/2019/06/understanding-cloud-data-services.html

  • Predict Age and Gender Using Convolutional Neural Network and OpenCV

    Age and gender estimation from a single face image are important tasks in intelligent applications. As such, let's build a simple age and gender detection model in this detailed article.

    https://www.kdnuggets.com/2019/04/predict-age-gender-using-convolutional-neural-network-opencv.html

  • Data Pipelines, Luigi, Airflow: Everything you need to know

    This post focuses on the workflow management system (WMS) Airflow: what it is, what you can do with it, and how it differs from Luigi.

    https://www.kdnuggets.com/2019/03/data-pipelines-luigi-airflow-everything-need-know.html

  • Top 7 Data Science Use Cases in Travel

    To satisfy all the needs of the growing number of consumers and process enormous data chunks, data science algorithms are vital. Let’s consider several widespread and efficient data science use cases in the travel industry.

    https://www.kdnuggets.com/2019/02/top-7-data-science-use-cases-travel.html

  • The Role of the Data Engineer is Changing

    The role of the data engineer in a startup data team is changing rapidly. Are you thinking about it the right way?

    https://www.kdnuggets.com/2019/01/role-data-engineer-changing.html

  • Supervised Learning: Model Popularity from Past to Present

    An extensive look at the history of machine learning models, using historical data from the number of publications of each type to attempt to answer the question: what is the most popular model?

    https://www.kdnuggets.com/2018/12/supervised-learning-model-popularity-from-past-present.html

  • Text Preprocessing in Python: Steps, Tools, and Examples

    We outline the basic steps of text preprocessing, which are needed for transferring text from human language to machine-readable format for further processing. We will also discuss text preprocessing tools.

    https://www.kdnuggets.com/2018/11/text-preprocessing-python.html

  • Hadoop for Beginners

    An introduction to Hadoop, a framework that enables you to store and process large data sets in parallel and distributed fashion.

    https://www.kdnuggets.com/2018/09/hadoop-beginners.html

  • Comparison of the Most Useful Text Processing APIs

    There is a need to compare different APIs to understand key pros and cons they have and when it is better to use one API instead of the other. Let us proceed with the comparison.

    https://www.kdnuggets.com/2018/08/comparison-most-useful-text-processing-apis.html

  • Docker Cheat Sheet

    This comprehensive cheat sheet will assist Docker users, experienced and new, in getting containers up-and-running quickly. We list commands that will allow users to install, build, ship and run Docker containers.

    https://www.kdnuggets.com/2018/08/docker-cheat-sheet.html

  • Understanding Language Syntax and Structure: A Practitioner’s Guide to NLP

    Knowledge about the structure and syntax of language is helpful in many areas like text processing, annotation, and parsing for further operations such as text classification or summarization.

    https://www.kdnuggets.com/2018/08/understanding-language-syntax-and-structure-practitioners-guide-nlp-3.html

  • Text Classification & Embeddings Visualization Using LSTMs, CNNs, and Pre-trained Word Vectors

    In this tutorial, I classify Yelp round-10 review datasets. After processing the review comments, I trained three models in three different ways and obtained three word embeddings.

    https://www.kdnuggets.com/2018/07/text-classification-lstm-cnn-pre-trained-word-vectors.html

  • Building a Basic Keras Neural Network Sequential Model

    The approach basically coincides with Chollet's Keras 4 step workflow, which he outlines in his book "Deep Learning with Python," using the MNIST dataset, and the model built is a Sequential network of Dense layers. A building block for additional posts.

    https://www.kdnuggets.com/2018/06/basic-keras-neural-network-sequential-model.html

  • Top 20 Python Libraries for Data Science in 2018

    Our selection actually contains more than 20 libraries, as some of them are alternatives to each other and solve the same problem. Therefore we have grouped them as it's difficult to distinguish one particular leader at the moment.

    https://www.kdnuggets.com/2018/06/top-20-python-libraries-data-science-2018.html

  • Generating Text with RNNs in 4 Lines of Code

    Want to generate text with little trouble, and without building and tuning a neural network yourself? Let's check out a project which allows you to "easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code."

    https://www.kdnuggets.com/2018/06/generating-text-rnn-4-lines-code.html

  • Beyond Data Lakes and Data Warehousing

    We give a comprehensive review of data lakes and data warehouses, and look at what the future holds for total data integration.

    https://www.kdnuggets.com/2018/05/data-lakes-data-warehousing-integration-revolution.html

  • Complete Guide to Build ConvNet HTTP-Based Application using TensorFlow and Flask RESTful Python API

    In this tutorial, a CNN is to be built, and trained and tested against the CIFAR10 dataset. To make the model remotely accessible, a Flask Web application is created using Python to receive an uploaded image and return its classification label using HTTP.

    https://www.kdnuggets.com/2018/05/complete-guide-convnet-tensorflow-flask-restful-python-api.html

  • 50+ Useful Machine Learning & Prediction APIs, 2018 Edition

    Extensive list of 50+ APIs in Face and Image Recognition, Text Analysis, NLP, Sentiment Analysis, Language Translation, Machine Learning and Prediction.

    https://www.kdnuggets.com/2018/05/50-useful-machine-learning-prediction-apis-2018-edition.html

  • Jupyter Notebook for Beginners: A Tutorial

    The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. Although it is possible to use many different programming languages within Jupyter Notebooks, this article will focus on Python as it is the most common use case.

    https://www.kdnuggets.com/2018/05/jupyter-notebook-beginners-tutorial.html

  • Text Data Preprocessing: A Walkthrough in Python

    This post will serve as a practical walkthrough of a text data preprocessing task using some common Python tools.

    https://www.kdnuggets.com/2018/03/text-data-preprocessing-walkthrough-python.html

  • Quick Feature Engineering with Dates Using fast.ai

    The fast.ai library is a collection of supplementary wrappers for a host of popular machine learning libraries, designed to remove the necessity of writing your own functions to take care of some repetitive tasks in a machine learning workflow.

    https://www.kdnuggets.com/2018/03/feature-engineering-dates-fastai.html

  • Text Processing in R

    There are good reasons to want to use R for text processing, namely that we can do it, and that we can fit it in with the rest of our analyses. Furthermore, there is a lot of very active development going on in the R text analysis community right now.

    https://www.kdnuggets.com/2018/03/text-processing-r.html

  • Graph Databases Burst into the Mainstream

    What do Amazon, Facebook, Google, IBM, Microsoft and Twitter have in common? They're all adopters of graph databases - a hot technology that continues to evolve.

    https://www.kdnuggets.com/2018/02/graph-databases-burst-into-the-mainstream.html

  • Data Science at the Command Line: Exploring Data

    See what's available in the freely-available book "Data Science at the Command Line" by digging into data exploration in the terminal.

    https://www.kdnuggets.com/2018/02/data-science-command-line-book-exploring-data.html

  • Training and Visualising Word Vectors

    In this tutorial I want to show how you can implement a skip-gram model in TensorFlow to generate word vectors for any text you are working with, and then use TensorBoard to visualize them.

    https://www.kdnuggets.com/2018/01/training-visualising-word-vectors.html

  • Elasticsearch for Dummies

    In this blog, you’ll get to know the basics of Elasticsearch, its advantages, how to install it, and how to index documents with it.

    https://www.kdnuggets.com/2018/01/elasticsearch-overview.html

  • 70 Amazing Free Data Sources You Should Know

    70 free data sources for 2017 on government, crime, health, financial and economic data, marketing and social media, journalism and media, real estate, company directory and review, and more to start working on your data projects.

    https://www.kdnuggets.com/2017/12/big-data-free-sources.html

  • A General Approach to Preprocessing Text Data

    Recently we had a look at a framework for textual data science tasks in their totality. Now we focus on putting together a generalized approach to attacking text data preprocessing, regardless of the specific textual data science task you have in mind.

    https://www.kdnuggets.com/2017/12/general-approach-preprocessing-text-data.html

  • Are Data Lakes Fake News?

    The quick answer is yes, and the biggest problem is that the term “Data Lakes” has been overloaded by vendors and analysts with different meanings, resulting in an ill-defined and blurry concept.

    https://www.kdnuggets.com/2017/09/data-lakes-fake-news.html

  • How Convolutional Neural Networks Accomplish Image Recognition?

    Image recognition is a very interesting and challenging field of study. Here we explain concepts, applications and techniques of image recognition using Convolutional Neural Networks.

    https://www.kdnuggets.com/2017/08/convolutional-neural-networks-image-recognition.html

  • How to squeeze the most from your training data

    In many cases, getting enough well-labelled training data is a huge hurdle for developing accurate prediction systems. Here is an innovative approach which uses SVM to get the most from training data.

    https://www.kdnuggets.com/2017/07/squeeze-most-from-training-data.html

  • Spotlight on the Remarkable Potential of AI in KYC (Know Your Customer)

    Most people have heard of the headline-making achievements in artificial intelligence (AI): systems defeating world champions in board games like Go and winning quiz shows. These are small realizations of AI, but there is a silent revolution taking place in other areas, including Regulatory Compliance in Financial Services.

    https://www.kdnuggets.com/2017/07/spotlight-remarkable-potential-ai-kyc.html

  • 7 Ways to Get High-Quality Labeled Training Data at Low Cost

    Labeled training data is needed for machine learning, but getting such data is neither simple nor cheap. We review 7 approaches, including repurposing, harvesting free sources, retraining models on progressively higher quality data, and more.

    https://www.kdnuggets.com/2017/06/acquiring-quality-labeled-training-data.html

  • Text Mining 101: Mining Information From A Resume

    We show a framework for mining relevant entities from a text resume, and how to separate parsing logic from entity specification.

    https://www.kdnuggets.com/2017/05/text-mining-information-resume.html

  • Must-Know: What are common data quality issues for Big Data and how to handle them?

    Let's have a look at common quality issues facing Big Data in terms of the key characteristics of Big Data – Volume, Velocity, Variety, Veracity, and Value.

    https://www.kdnuggets.com/2017/05/must-know-common-data-quality-issues-big-data.html

  • Models: From the Lab to the Factory

    In this post, we’ll go over techniques to avoid these scenarios through the process of model management and deployment.

    https://www.kdnuggets.com/2017/04/models-from-lab-factory.html

  • Difference Between Big Data and Internet of Things

    If you cannot manage real-time streaming data and make real-time analytics and real-time decisions at the edge, then you are not doing IOT or IOT analytics, in my humble opinion. So what is required to support these IOT data management and analytic requirements?

    https://www.kdnuggets.com/2017/04/difference-big-data-internet-of-things.html

  • 17 More Must-Know Data Science Interview Questions and Answers, Part 3

    The third and final part of 17 new must-know Data Science interview questions and answers covers A/B testing, data visualization, Twitter influence evaluation, and Big Data quality.

    https://www.kdnuggets.com/2017/03/17-data-science-interview-questions-answers-part-3.html

  • Gartner Data Science Platforms – A Deeper Look

    Thomas Dinsmore's critical examination of the Gartner 2017 MQ for Data Science Platforms, including vendors who are out, in, or have big changes, Hadoop and Spark integration, open source software, and what Data Scientists actually use.

    https://www.kdnuggets.com/2017/03/thomaswdinsmore-gartner-data-science-platforms.html

  • Introduction to Natural Language Processing, Part 1: Lexical Units

    This series explores core concepts of natural language processing, starting with an introduction to the field and explaining how to identify lexical units as a part of data preprocessing.

    https://www.kdnuggets.com/2017/02/datascience-introduction-natural-language-processing-part1.html

  • Machine Learning and Cyber Security Resources

    An overview of useful resources about applications of machine learning and data mining in cyber security, including important websites, papers, books, tutorials, courses, and more.

    https://www.kdnuggets.com/2017/01/machine-learning-cyber-security.html

  • The big data ecosystem for science: Climate Science and Climate Change

    Climate change is one of the most pressing challenges for human society in the 21st century. We review the Big Data ecosystem for studying climate change.

    https://www.kdnuggets.com/2016/12/big-data-ecosystem-science-climate-change.html

  • Smart Data Platform – The Future of Big Data Technology

    Data processing and analytical modelling are major bottlenecks in today’s big data world, due to the need for human intelligence to decide relationships between data, required data engineering tasks, analytical models and their parameters. This article discusses how a Smart Data Platform can help solve such problems.

    https://www.kdnuggets.com/2016/12/smart-data-platform-future-big-data-technology.html

  • Data Science and Big Data, Explained

    This article is meant to give the non-data scientist a solid overview of the many concepts and terms behind data science and big data. While related terms will be mentioned at a very high level, the reader is encouraged to explore the references and other resources for additional detail.

    https://www.kdnuggets.com/2016/11/big-data-data-science-explained.html

  • LinkedIn Knowledge Graph – KDnuggets Interview

    We interview LinkedIn about their recently published LinkedIn Knowledge Graph which connects their many millions of members, jobs, companies, and more.

    https://www.kdnuggets.com/2016/10/interview-creators-linkedin-knowledge-graph.html

  • Top 10 Data Science Videos on Youtube

    Learning and the future are the key topics in the recent Youtube videos on Data Science. The main questions revolve around: “how to become a Data Scientist”, “what is a data scientist”, and “where data science is going”. But why is there so little explanation of data science to the masses?

    https://www.kdnuggets.com/2016/10/top-10-data-science-videos-youtube.html

  • Contest 2nd Place: Automating Data Science

    This post discusses some considerations, options, and opportunities for automating aspects of data science and machine learning. It is the second place recipient (tied) in the recent KDnuggets blog contest.

    https://www.kdnuggets.com/2016/08/automating-data-science.html

  • Doing Statistics with SQL

    This post covers how to perform some basic in-database statistical analysis using SQL.

    https://www.kdnuggets.com/2016/08/doing-statistics-sql.html

  • Improving Nudity Detection and NSFW Image Recognition

    This post discusses improvements made to a tricky machine learning classification problem: nude and/or NSFW, or not?

    https://www.kdnuggets.com/2016/06/algorithmia-improving-nudity-detection-nsfw-image-recognition.html

  • Data Lake Plumbers: Operationalizing the Data Lake

    Gain insight into data lakes, their benefits, when they are appropriate, and how to operationalize them. How do they compare to the data warehouse?

    https://www.kdnuggets.com/2016/02/data-lakes-plumbers-operationalizing.html

  • Data-Planet Statistical Datasets

    Data-Planet Statistical Datasets provides easy access to an extensive repository of standardized and structured statistical data, with more than 25 billion data points from more than 70 source organizations.

    https://www.kdnuggets.com/2015/11/data-planet-statistical-datasets.html

  • Integrating Python and R into a Data Analysis Pipeline, Part 1

    The first in a series of blog posts that outline the basic strategy for integrating Python and R, run through the different steps involved in this process, and give a real example of how and why you would want to do this.

    https://www.kdnuggets.com/2015/10/integrating-python-r-data-analysis-part1.html

  • The Data Science Machine, or ‘How To Engineer Feature Engineering’

    MIT researchers have developed what they refer to as the Data Science Machine, which combines feature engineering and an end-to-end data science pipeline into a system that beats nearly 70% of humans in competitions. Is this game-changing?

    https://www.kdnuggets.com/2015/10/data-science-machine.html

  • Interview: Thanigai Vellore, Art.com on Delivering Contextually Relevant Search Experience

    We discuss the role of Analytics at Art.com, the polyglot data architecture at Art.com, the use cases for Hadoop, vendor selection, supporting semantic search and experience with Avro.

    https://www.kdnuggets.com/2015/07/interview-thanigai-vellore-art-search-experience.html

  • In Machine Learning, What is Better: More Data or better Algorithms

    The gross over-generalization that “more data gives better results” is misleading. Here we explain in which scenarios more data or more features are helpful and in which they are not, and how the choice of algorithm affects the end result.

    https://www.kdnuggets.com/2015/06/machine-learning-more-data-better-algorithms.html

  • Interview: Joseph Babcock, Netflix on Genie, Lipstick, and Other In-house Developed Tools

    We discuss the role of analytics in content acquisition, data architecture at Netflix, organizational structure, and open-source tools from Netflix.

    https://www.kdnuggets.com/2015/06/interview-joseph-babcock-netflix-in-house-developed-tools.html

  • Top 30 Social Network Analysis and Visualization Tools

    We review major tools and packages for Social Network Analysis and visualization, which have wide applications including biology, finance, sociology, network theory, and many other domains.

    https://www.kdnuggets.com/2015/06/top-30-social-network-analysis-visualization-tools.html

  • Exclusive: Interview with Chris Wiggins, NYTimes Chief Data Scientist

    New York Times Chief Data Scientist Chris Wiggins on the transformation of digital journalism, key Data Science skills, favorite tools, why better wrong than nice, and how Thomas Jefferson is very relevant today.

    https://www.kdnuggets.com/2015/01/exclusive-interview-chris-wiggins-nytimes-chief-data-scientist.html

  • Interesting Social Media Datasets

    Learn about some of the many interesting social media datasets available to you, some of which are quite new, and the different features and challenges they offer you for your next big data science project.

    https://www.kdnuggets.com/2014/08/interesting-social-media-datasets.html

  • Interview: Sastry Malladi, StubHub on Designing Big Data Architecture for the Unknown Future

    We discuss the Big Data architecture at StubHub, important factors in architecture design, hybrid approach of using Big Data along with traditional data warehouses, challenges, importance of meta-data and more.

    https://www.kdnuggets.com/2014/07/interview-sastry-malladi-stubhub-big-data-architecture.html

  • Is Data Scientist the right career path for you? Candid advice

    Candid advice from an industry veteran reveals the true picture behind the much-talked-about Data Scientist "glamour" and helps people have the right expectations for a Data Science career.

    https://www.kdnuggets.com/2014/03/data-scientist-right-career-path-candid-advice.html

  • KDnuggets™ News 14:n07, Mar 26

    Features (3) | Opinions (3) | Software (2) | News (3) | Webcasts (2) | Courses (1) | Meetings (3) | Jobs (5) | Academic

    https://www.kdnuggets.com/2014/n07.html

  • The Do’s and Don’ts of Data Mining

    Leading data mining and analytics experts give their favorite do's and don'ts, from "Do plan for data to be messy" to "Do not underestimate the power of a simpler-to-understand solution".

    https://www.kdnuggets.com/2014/03/data-mining-do-and-dont.html

  • Top Datasets on Reddit

    Most popular dataset posts on Reddit include NFL Game Metadata, Reddit top 2.5 Million posts, Zillow housing prices, and, of course, a database of cat pictures.

    https://www.kdnuggets.com/2013/12/top-datasets-on-reddit.html

  • KDnuggets™ News 14:n01, Jan 8

    coming on Jan 8

    https://www.kdnuggets.com/2014/n01.html

  • 2013 Dec News: Analytics, Big Data, Data Mining and Data Science Features, News, and Software

    All (95) | News, Software (27) | Courses, Events (12) | Jobs | Academic | Publications (38) Top stories for Dec 22-29: Data Mining Applications

    https://www.kdnuggets.com/2013/12/news-software.html

  • 2013 Dec: Analytics, Big Data, Data Mining and Data Science News

    All (95) | News, Software (27) | Courses, Events (12) | Jobs | Academic | Publications (38) Unicorn Data Scientists vs Data Science Teams -

    https://www.kdnuggets.com/2013/12/index.html

  • Data: Government, State, City, Local and Public

    This is a directory of government, federal, state, city, local and other public datasets. See also Data APIs, Hubs, Marketplaces, Platforms, Portals, and Search Engines.

    https://www.kdnuggets.com/datasets/government-local-public.html

  • Datasets for Data Science, Machine Learning, AI & Analytics

    KDnuggets subscribers now have access to the WorldData.AI Partners Plan at no cost! Check out the world’s largest external curated data platform, integrating data

    https://www.kdnuggets.com/datasets/index.html

  • Gartner Magic Quadrant for Business Intelligence and Analytics Platforms

    In 2012 Data Discovery became a mainstream part of BI and analytic architecture. The market also saw increased activity in real time, content and predictive analytics. The leaders in the market include Microsoft, IBM, Tableau, QlikTech, Oracle, SAS, MicroStrategy, Tibco Spotfire, Information Builders, and SAP.

    https://www.kdnuggets.com/2013/02/gartner-magic-quadrant-for-business-intelligence-analytics-platforms.html
