- Multilabel Document Categorization, step by step example - Aug 31, 2021.
This detailed guide explores a two-stage approach, unsupervised learning with LDA followed by supervised learning with BERT, to develop a domain-specific document categorizer on unlabeled documents.
- Data Compression via Dimensionality Reduction: 3 Main Methods - Dec 10, 2020.
Lift the curse of dimensionality by mastering the application of three important techniques that will help you reduce the dimensionality of your data, even if it is not linearly separable.
- Roadmap to Natural Language Processing (NLP) - Oct 19, 2020.
Check out this introduction to some of the most common techniques and models used in Natural Language Processing (NLP).
- Innovating versus Doing: NLP and CORD19 - Jun 30, 2020.
How I learned to trust the process and find value in the road most traveled.
- Beyond Word Embedding: Key Ideas in Document Embedding - Oct 11, 2019.
This literature review on document embedding techniques thoroughly covers the many ways practitioners develop rich vector representations of text -- from single sentences to entire books.
- An Overview of Topics Extraction in Python with Latent Dirichlet Allocation - Sep 4, 2019.
A recurring task in NLP is understanding a large corpus of texts through topic extraction. Whether you analyze users’ online reviews, product descriptions, or text entered in search bars, understanding key topics will always come in handy.
- Topic Modeling with LSA, PLSA, LDA & lda2Vec - Aug 30, 2018.
This article is a comprehensive overview of Topic Modeling and its associated techniques.
- America’s Next Topic Model - Jul 15, 2016.
Topic modeling is a great way to get a bird's-eye view of a large document collection using machine learning. Here are 3 ways to use the open source Python tool Gensim to choose the best topic model.
- Bayesian Machine Learning, Explained - Jul 13, 2016.
Want to know about Bayesian machine learning? Sure you do! Get a great introductory explanation here, as well as suggestions on where to go for further study.
- Text Mining 101: Topic Modeling - Jul 1, 2016.
We introduce the concept of topic modeling and explain two methods: Latent Dirichlet Allocation and TextRank. The techniques are ingenious in how they work – try them yourself.
- Interview: Thomas Levi, PlentyOfFish on What does Big Data tell us about Romance - Jul 30, 2014.
We discuss interesting research on the state of romance in the US, how PlentyOfFish is managing competition, his personal journey from String Theory to Data Science, career advice and more.
- Interview: Thomas Levi, POF on How Online Dating is Improving Matching through Big Data - Jul 29, 2014.
We discuss Big Data use cases at Plenty of Fish, insights from text mining of user profiles, using topic modeling for developing user archetypes, challenges and more.
- Interview: Samaneh Moghaddam, Applied Researcher, eBay on Aspect-based Opinion Mining - Jun 26, 2014.
We discuss aspect-based opinion mining, major challenges, cold start items, the need for accurate opinion mining models for cold start items and how factorized LDA can be leveraged.