What follows is then an effort to draw an architecture to access knowledge on AI and follow emergent dynamics, a gateway of pre-existing knowledge on the topic that will allow you to scout around for additional information and eventually create new knowledge on AI.
A well-known model that learns vectors or words from their co-occurrence information is GlobalVectors (GloVe). While word2vec is a predictive model — a feed-forward neural network that learns vectors to improve the predictive ability, GloVe is a count-based model.
In this blog, we’ll try to understand the different interpretations of this “distant” notion. We will also look into the outlier detection and treatment techniques while seeing their impact on different types of machine learning models.
I reported that you can multiply the speed of common (fast) random number generators such as PCG and xorshift128+ by a factor of three or four by vectorizing them using SIMD instructions. Is this actually useful in practice?
The vast majority of text classification articles and tutorials on the internet are binary text classification such as email spam filtering and sentiment analysis. Real world problem are much more complicated than that.
Core principles for successful data visualization, including tips on how to reduce clutter, preattentive processing and how to integrate text within the graph.
Sentiment analysis is widely used, especially as a part of social media analysis for any domain, be it a business, a recent movie, or a product launch, to understand its reception by the people and what they think of it based on their opinions or, you guessed it, sentiment!
There is a need to compare different APIs to understand key pros and cons they have and when it is better to use one API instead of the other. Let us proceed with the comparison.
A summary of the key points from the Google Cloud Next in San Francisco, "What’s New with TensorFlow?", including neural networks, TensorFlow Lite, data pipelines and more.
This comprehensive cheat sheet will assist Docker users, experienced and new, in getting containers up-and-running quickly. We list commands that will allow users to install, build, ship and run Docker containers.
Realizing that there is a legitimate knowledge gap between UX Designers and Data Scientists, I have decided to attempt addressing the needs from the Data Scientist’s perspective.
At the most basic level, probability seeks to answer the question, "What is the chance of an event happening?" To calculate the chance of an event happening, we also need to consider all the other events that can occur.
Detailed knowledge of your data is key to understanding it! We review several important methods that to understand the data, including summary statistics with visualization, embedding methods like PCA and t-SNE, and Topological Data Analysis.
Using the Python gradient boosting library LightGBM, this article introduces fraud detection systems, with code samples included to help you get started.
Auto-Keras is an open source software library for automated machine learning. Auto-Keras provides functions to automatically search for architecture and hyperparameters of deep learning models.
Named entity recognition (NER) , also known as entity chunking/extraction , is a popular technique used in information extraction to identify and segment the named entities and classify or categorize them under various predefined classes.
In this post, I will explore the implementation of reinforcement learning in trading. The Financial industry has been exploring the applications of Artificial Intelligence and Machine Learning for their use-cases, but the monetary risk has prompted reluctance.
This is an overview of some basic functionality of the MXNet ndarray package for creating tensor-like objects, and using the autograd package for performing automatic differentiation.
In this post we’ll give an introduction to the exploratory and visualization t-SNE algorithm. t-SNE is a powerful dimension reduction and visualization technique used on high dimensional data.
Whether you're a novice data science enthusiast setting up TensorFlow for the first time, or a seasoned AI engineer working with terabytes of data, getting your libraries, packages, and frameworks installed is always a struggle. Learn how datmo, an open source python package, helps you get started in minutes.
Knowledge about the structure and syntax of language is helpful in many areas like text processing, annotation, and parsing for further operations such as text classification or summarization.
Cross-validation is frequently used to train, measure and finally select a machine learning model for a given dataset because it helps assess how the results of a model will generalize to an independent data set in practice.
At base, RL is a complex algorithm for mapping observed entities and measures into some set of actions, while optimizing for a long-term or short-term reward.
In this post, I'll go over the two mindsets most people switch between when doing programming work specifically for data science: the prototype mindset and the production mindset.
By using the within-cluster sum of squares as cost function, data points in the same cluster will be similar to each other, whereas data points in different clusters will have a lower level of similarity.
I will highlight some of the most important steps which are used heavily in Natural Language Processing (NLP) pipelines and I frequently use them in my NLP projects.
We offer an interactive, decision tree-style tool, which examines the data you have and proposes a set of potentially appropriate visualizations to represent your dataset.
This article covers defining statistics, descriptive statistics, measures of central tendency, and measures of spread. This article assumes no prior knowledge of statistics, but does require at least a general knowledge of Python.