How I Used Deep Learning To Train A Chatbot To Talk Like Me
In this post, we’ll look at how we can use a deep learning model to train a chatbot on my past social media conversations, in the hope of getting the chatbot to respond to messages the way that I would.
Dataset Creation
A big part of machine learning involves dataset preprocessing. The data archives from each of these sources come differently formatted, and contain parts that we don’t really need (the pictures section of our FB data, for example).
The Hangouts data is formatted a bit differently from the Facebook data, and the LinkedIn messages are in a CSV format. Our goal, with all these datasets, is to create one unified file that contains pairs in the form of (FRIENDS_MESSAGE, YOUR_RESPONSE).
To do that, I wrote a Python script that you can check out here. This script will create two different files. One will be a NumPy object (conversationDictionary.npy) that contains all of the input-output pairs. The other will be a large txt file (conversationData.txt) that contains these pairs in sentence form, one after the other. Normally, I love being able to share datasets, but I’m keeping this one private because it has a lot of private conversations, and I don’t think my friends would be happy if they were floating around on the internet. But here’s a snapshot of what the final dataset looks like.
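To make the pairing step concrete, here is a minimal sketch of how (FRIENDS_MESSAGE, YOUR_RESPONSE) pairs could be pulled out of one conversation. It is not the actual script: the `build_pairs` function, the `(sender, text)` input format, and the choice to merge consecutive messages from the same sender are all assumptions made for illustration.

```python
def build_pairs(messages, my_name):
    """Turn a chronologically ordered list of (sender, text) tuples
    into a {friend_message: my_response} dictionary."""
    # Merge consecutive messages from the same sender into one turn.
    turns = []
    for sender, text in messages:
        text = text.strip()
        if not text:
            continue
        if turns and turns[-1][0] == sender:
            turns[-1] = (sender, turns[-1][1] + " " + text)
        else:
            turns.append((sender, text))

    # Keep every turn of mine that directly follows someone else's turn.
    pairs = {}
    for (prev_sender, prev_text), (sender, text) in zip(turns, turns[1:]):
        if sender == my_name and prev_sender != my_name:
            pairs[prev_text] = text
    return pairs


# Hypothetical conversation, already parsed out of an archive.
messages = [
    ("Friend", "hey"),
    ("Friend", "you free tonight?"),
    ("Me", "yeah"),
    ("Me", "what's up"),
    ("Friend", "movie?"),
    ("Me", "lol sure"),
]
pairs = build_pairs(messages, my_name="Me")
```

Because the pairs live in a dictionary, a repeated friend message keeps only the most recent response; a list of tuples would preserve duplicates.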
Word Vectors
LOL. Lmao. Wtf. These are all words that showed up quite frequently in our conversation data file. While they are common in the realm of social media, they aren’t in a lot of traditional datasets. Normally, my first instinct when approaching any NLP task is to simply use pre-trained vectors, as they are trained on large corpora for a large number of iterations. However, given that we have so many words and acronyms that aren’t in typical pre-trained word vector lists, generating our own word vectors is critical to making sure that the words get represented properly.
To generate word vectors, we use the classic approach of a Word2Vec model. The basic idea is that the model creates word vectors by looking at the contexts in which words appear in sentences. Words with similar contexts will be placed close together in the vector space. For a more detailed overview of how a Word2Vec model is created and trained, check out this great blog post by one of my good friends, Rohan Varma.
I trained the Word2Vec model in this Python script here, which saves the word vectors in a NumPy object.
**Update: I later learned that the Tensorflow Seq2Seq function trains word embeddings from scratch, so I don’t end up using these word vectors, but it was still good practice.**
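To illustrate what "looking at the context" means, here is a minimal sketch of the (center word, context word) pairs that the skip-gram variant of Word2Vec trains on. The `skipgram_pairs` function and the window size are assumptions for illustration; the training script linked above may be set up differently.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for words at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs("lol that was so funny".split(), window=1)
```

Words that keep showing up with the same context words end up with similar training pairs, which is what pulls their vectors close together.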
Creating a Seq2Seq Model with Tensorflow
Now that we’ve created the dataset and generated our word vectors, we can move on to coding the Seq2Seq model. I created and trained the model in this Python script. I’ve tried to comment the code to the best of my ability, so hopefully you can follow along. The crux of the model lies in Tensorflow’s embedding_rnn_seq2seq() function. You can find documentation for it here.
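Seq2Seq models of this kind are typically fed fixed-length sequences of integer word IDs, so each (message, response) pair has to be encoded and padded before training. The sketch below shows one common way to do that. The token names, the ID assignments, and the `build_vocab` and `encode_pair` helpers are assumptions for illustration, not taken from the training script.

```python
PAD, GO, EOS, UNK = "<pad>", "<go>", "<eos>", "<unk>"  # hypothetical token names


def build_vocab(sentences):
    """Map every word to an integer ID, reserving IDs 0-3 for special tokens."""
    vocab = {PAD: 0, GO: 1, EOS: 2, UNK: 3}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode_pair(message, response, vocab, max_len):
    """Return (encoder_input, decoder_input, decoder_target), each of length max_len."""
    def to_ids(text):
        return [vocab.get(word, vocab[UNK]) for word in text.split()]

    def pad(ids):
        return ids + [vocab[PAD]] * (max_len - len(ids))

    message_ids = to_ids(message)[:max_len]
    response_ids = to_ids(response)[:max_len - 1]  # leave room for GO / EOS
    encoder_input = pad(message_ids)
    decoder_input = pad([vocab[GO]] + response_ids)    # what the decoder reads
    decoder_target = pad(response_ids + [vocab[EOS]])  # what it should predict
    return encoder_input, decoder_input, decoder_target


vocab = build_vocab(["you free tonight", "lol sure"])
encoder_input, decoder_input, decoder_target = encode_pair(
    "you free tonight", "lol sure", vocab, max_len=5
)
```

The decoder input is the target shifted right by one position, so at each step the decoder learns to predict the next word of the response.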
Tracking the Training Progress
One of the interesting aspects of this project was getting a chance to look at how the responses changed as the network trained. At different points in the training loop, I tested the network on an input string, and output all of the non-pad and non-EOS tokens in the response. At first, you can see that the responses were mainly blank, as the network repeatedly output padding and EOS tokens. This is normal since padding tokens are by far the most frequent tokens in the whole dataset. Then, you can see that the network starts to output ‘lol’ for every single input string it is given. This makes sense intuitively since ‘lol’ gets used so often these days that it kind of is an acceptable response to anything. Slowly, you start to see more complete thoughts and grammatical structure come up in the responses. This could be due to a bit of overfitting as well.
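The filtering step described above can be sketched in a few lines. The `visible_tokens` helper, the tiny vocabulary, and the example outputs are hypothetical; they only mimic the three stages of training mentioned in the paragraph.

```python
def visible_tokens(output_ids, id_to_word, skip_ids):
    """Join the predicted words, dropping padding and EOS tokens."""
    return " ".join(id_to_word[i] for i in output_ids if i not in skip_ids)


# Hypothetical vocabulary and network outputs at three points in training.
id_to_word = {0: "<pad>", 1: "<eos>", 2: "lol", 3: "sure"}
skip_ids = {0, 1}

early = visible_tokens([0, 0, 1, 0], id_to_word, skip_ids)   # only pad / EOS
middle = visible_tokens([2, 1, 0, 0], id_to_word, skip_ids)  # the 'lol' phase
later = visible_tokens([2, 3, 1, 0], id_to_word, skip_ids)   # a fuller reply
```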
Setting up the Facebook Messenger Chatbot
Now that we have a decently trained Seq2Seq model, let’s look at how to set up a simple FB messenger chatbot. The process was not too difficult, as it took me a little less than 30 minutes by following all the steps in this great tutorial. The basic idea is that we set up a server using a simple Express app, host it on Heroku, and then set up a Facebook App/Page to connect to it. I won’t go into too much detail beyond that, since I really thought that the author did a great job of explaining everything step by step, but at the end, you should have a Facebook app like this.
And you should be able to message your bot (this initial behavior is just echoing everything it gets sent).
Deploying our trained Tensorflow Model
So, now it’s time to put everything together. Since I haven’t found a good interface between Tensorflow and Node (I don’t know if there’s an officially supported wrapper), I decided to deploy my model using a Flask server, and have the chatbot’s Express app interact with it.
You can check out the Flask server code here and the chatbot’s index.js file here.
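To show the kind of request and response the two servers could exchange, here is a dependency-free sketch. It deliberately swaps Flask for a plain WSGI callable from the Python standard library so that it runs anywhere; in Flask the same logic would sit inside a route function. The JSON field names (`message`, `reply`) and the `generate_reply` placeholder are assumptions, not the contract used by the actual server.

```python
import io
import json


def generate_reply(message):
    """Hypothetical stand-in for running the saved Seq2Seq model on a message."""
    return "lol" if message.strip() else ""


def app(environ, start_response):
    """WSGI app: accepts POST JSON {"message": ...}, returns JSON {"reply": ...}."""
    headers = [("Content-Type", "application/json")]
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", headers)
        return [json.dumps({"error": "POST only"}).encode("utf-8")]

    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length) or b"{}"
    message = json.loads(raw).get("message", "")
    start_response("200 OK", headers)
    return [json.dumps({"reply": generate_reply(message)}).encode("utf-8")]


# Simulate the request the Express app would send, without opening a socket.
body = json.dumps({"message": "you free tonight?"}).encode("utf-8")
environ = {
    "REQUEST_METHOD": "POST",
    "CONTENT_LENGTH": str(len(body)),
    "wsgi.input": io.BytesIO(body),
}
statuses = []
response = b"".join(app(environ, lambda status, headers: statuses.append(status)))
reply = json.loads(response)["reply"]
```

To serve it for real, the callable can be passed to `wsgiref.simple_server.make_server`; the Express app would then POST each incoming message to that address and send the `reply` field back to the user.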
Testing it Out!
If you’d like to chat with this bot, just go ahead and go to this link or go to this Facebook page and hit the Send Message button. It might take a while to respond for the first time, since the server needs to start up.
It’s probably difficult to judge whether or not the bot actually does talk like me (since not a lot of you have talked to me online LOL), but I’d say it’s doing alright! The grammar is passable, considering social media standards. You can cherry-pick a couple of good results, but most are pretty nonsensical. Here are some of the ones which help me sleep better at night because Skynet definitely isn’t happening any time soon.
I thought the first one was especially funny because “juju green” actually seems to be a combination of Juju Smith-Schuster, a Steelers wide receiver, and Draymond Green, who is a forward for the Golden State Warriors. Interesting combination.
Okay let's be real. The performance is not that great right now to say the least. Let's think about ways to improve it though!
Ways to Improve
As you can probably tell from interacting with the chatbot, there is definitely room for a lot of improvement. After a couple of messages, it quickly becomes clear that having a sustained conversation simply isn’t possible. The chatbot isn’t able to connect thoughts together, and some of the responses seem random and incoherent. Here are some ways that could improve our chatbot’s performance.
- Incorporate other datasets to help the network learn from a larger conversation corpus. This would remove a bit of the "individualness" of the chatbot since it's strictly trained on my own conversations right now. However, I believe it would help generate more realistic conversations.
- Handle scenarios where the encoder message has nothing to do with the decoder message. An example is when one conversation ends and you start a new one the next day. The topic of conversation could be completely unrelated, which could affect the model's training.
- Using bidirectional LSTMs, attention mechanisms, and bucketing.
- Tuning hyperparameters such as number of LSTM units, number of LSTM layers, choice of optimizer, number of training iterations, etc.
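As a rough sketch of the second idea in the list above, one option is to split the message history into separate conversations whenever there is a long silence, and only build pairs inside each conversation. The `split_conversations` helper, the `(timestamp, sender, text)` format, and the six-hour threshold are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def split_conversations(messages, max_gap=timedelta(hours=6)):
    """Group (timestamp, sender, text) tuples into conversations,
    starting a new one whenever the silence exceeds max_gap."""
    conversations = []
    for message in sorted(messages, key=lambda m: m[0]):
        last = conversations[-1][-1] if conversations else None
        if last is not None and message[0] - last[0] <= max_gap:
            conversations[-1].append(message)
        else:
            conversations.append([message])
    return conversations


# Hypothetical history: a goodnight message, then a new topic the next morning.
history = [
    (datetime(2017, 7, 1, 21, 0), "Friend", "good night"),
    (datetime(2017, 7, 2, 9, 0), "Me", "did you see the game"),
    (datetime(2017, 7, 2, 9, 5), "Friend", "yeah crazy"),
]
conversations = split_conversations(history)
```

With this split, "good night" and "did you see the game" never end up as an input-output pair, since they fall in different conversations.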
I'd be curious to hear other suggestions in the comments too!
How You Can Build Your Own
If you’ve been following along, you should have a general idea of what’s needed to create a chatbot that talks like you. Let’s go over the steps one final time. Detailed instructions are available in the GitHub repo README.
- Find all the online social media sites in which you’ve had a conversation with someone, and download a copy of your data.
- Extract all the (MESSAGE, RESPONSE) pairs with CreateDataset.py or your own script.
- (OPTIONAL) Generate word vectors for each of the words that show up in our conversations through Word2Vec.py.
- Create, train, and save the sequence to sequence model in Seq2Seq.py.
- Create the Facebook chatbot.
- Create a Flask server where you deploy the saved Seq2Seq model.
- Edit the index.js file in your Express app so it can communicate with the Flask server.
Interesting Papers
- Sequence to Sequence Learning
- Seq2Seq with Attention
- Neural Conversational Model
- Generative Hierarchical Models
- Persona Based Model
- Deep RL for Dialogue Generation
- Attention with Intention
- Diversity Promoting Objective Functions
- Copying Mechanisms in Seq2Seq
Other Helpful Posts
- Great blog post on Seq2Seq
- Slides from a Tensorflow course at Stanford
- Tensorflow Seq2Seq documentation
- Helpful video tutorial on using Tensorflow’s Seq2Seq functions
Shout out to Amit Tallapragada, Arvind Sankar, and Neil Chen for helping me out with Flask and Javascript stuff.
Original. Reposted with permission.