Tutorial: Building a Twitter Sentiment Analysis Process

Tutorial on collecting and analyzing tweets using the “Text Analysis by AYLIEN” extension for RapidMiner.



Step 3. Categorizing tweets

So we’ve determined the sentiment of the tweets, but as we said at the beginning, we also want to categorize them in some way. We can do this fairly easily with the Categorize Operator from the Text Analysis Extension, but first we need to prepare our data for analysis.

First, we’ll use a Data to Documents Operator to generate Documents from our existing data set, which makes the tweets easier to categorize:

[Image: tweet-classification]

Next, we’ll add a Categorize Operator, which classifies our text against a particular taxonomy (simply put, a set of predefined categories). In this case we’re using the IAB QAG taxonomy, a standard used in the digital advertising industry for categorizing content:

[Image: tweet-categorization]

Now our Process is starting to take shape. Because we converted our data into Documents before categorizing it, we now need to reverse that step and create a data set from the categorized Documents, which makes the results easier to visualize and understand as a whole:

[Image: tweet-transformation]

Here’s what our completed Process looks like:

[Image: tweet-complete-process]

Connect the Operators and hit Run. The Process we’ve built now collects tweets, analyzes their sentiment, categorizes them against a taxonomy, and finally displays the results in an ExampleSet, like the one below:

[Image: tweet-example-set]
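If it helps to see the round trip from this step outside RapidMiner, here is a minimal Python sketch of the same three stages: wrap each row in a document, assign each document a category from a fixed taxonomy, then flatten the documents back into rows. It is only an illustration under stated assumptions, not the AYLIEN Categorize Operator. The tiny keyword taxonomy (TOY_TAXONOMY, a hand-picked handful of IAB-style top-level labels) and the helper names data_to_documents, categorize and documents_to_data are all hypothetical, and simple keyword matching stands in for AYLIEN's actual classifier.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL: a hand-made subset of IAB-style top-level categories.
# Keyword overlap stands in for AYLIEN's real taxonomy classifier.
TOY_TAXONOMY = {
    "Sports": {"match", "goal", "league", "score"},
    "Technology & Computing": {"app", "software", "phone", "laptop"},
    "Food & Drink": {"pizza", "coffee", "recipe", "restaurant"},
}


@dataclass
class Document:
    """One tweet's text plus the other columns carried alongside it."""
    text: str
    metadata: dict = field(default_factory=dict)


def data_to_documents(rows, text_column):
    """Stage 1: wrap each row's text in a Document; other columns become metadata."""
    return [
        Document(text=row[text_column],
                 metadata={k: v for k, v in row.items() if k != text_column})
        for row in rows
    ]


def categorize(doc, taxonomy=TOY_TAXONOMY):
    """Stage 2: label the Document with the category whose keywords overlap most."""
    words = {w.strip(".,!?#@").lower() for w in doc.text.split()}
    best, best_hits = "Uncategorized", 0
    for label, keywords in taxonomy.items():
        hits = len(words & keywords)
        if hits > best_hits:  # strict '>' so ties keep the earlier label
            best, best_hits = label, hits
    doc.metadata["category"] = best
    return doc


def documents_to_data(docs, text_column):
    """Stage 3: flatten Documents back into one row (dict) per tweet."""
    return [{text_column: d.text, **d.metadata} for d in docs]


if __name__ == "__main__":
    tweets = [
        {"text": "What a goal! Best match of the league so far", "sentiment": "positive"},
        {"text": "My new phone app keeps crashing", "sentiment": "negative"},
    ]
    docs = [categorize(d) for d in data_to_documents(tweets, "text")]
    for row in documents_to_data(docs, "text"):
        print(row["category"], "|", row["sentiment"], "|", row["text"])
    # prints:
    # Sports | positive | What a goal! Best match of the league so far
    # Technology & Computing | negative | My new phone app keeps crashing
```

The point of the sketch is the shape of the pipeline: the sentiment column survives the trip through the document stage untouched, and the category is simply added as one more column when the documents are turned back into rows, much like the final ExampleSet.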