Finding the Story in the TweetStack: Mining Spatio-temporal Clusters for Event Correlation and Visualization
Project Members
Rahul Potharaju, Andrew Newell, Cristina Nita-Rotaru
Abstract
In recent years, social media activity has reached unprecedented levels. Hundreds of millions of users now participate in online social networks and forums, subscribe to microblogging services, or maintain blogs. Twitter, in particular, is currently the leading microblogging service, with more than 175 million subscribers. Twitter users post short text messages, called tweets, to report their current thoughts and actions, comment on breaking news, and engage in discussions. This work presents time-series-based clustering of real-time stream data as a precursor to applying more sophisticated natural language processing or machine learning techniques. First, we show that entities that are related in the physical world can be clustered together based solely on the structure of their timelines, without the aid of heavy natural language processing. Second, by converting the inherent timeline structure into a symbolic representation, we intend to cluster the time series of different words to obtain an initial set of clusters that can then be analyzed further using other techniques.
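The abstract does not say which symbolic representation is used, so the following is only a minimal sketch of the general idea under stated assumptions: a SAX-style encoding (z-normalization, piecewise averaging, and a 4-letter alphabet with standard-normal quartile breakpoints), followed by grouping words whose timelines map to the same string. All function names, parameters, and the example counts are hypothetical and are not taken from the project.

```python
import bisect
from collections import defaultdict
from statistics import mean, pstdev

# Assumption: a 4-letter alphabet whose breakpoints are the quartiles of a
# standard normal distribution, as in SAX-style encodings.
ALPHABET = "abcd"
BREAKPOINTS = (-0.6745, 0.0, 0.6745)


def znormalize(series):
    """Rescale a series to zero mean and unit variance."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A flat timeline has no shape; map every point to zero.
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each chunk."""
    n = len(series)
    if not 0 < segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    return [
        mean(series[i * n // segments:(i + 1) * n // segments])
        for i in range(segments)
    ]


def symbolize(series, segments=4):
    """Convert a count timeline into a short string describing its shape."""
    reduced = paa(znormalize(series), segments)
    return "".join(
        ALPHABET[bisect.bisect_right(BREAKPOINTS, value)] for value in reduced
    )


def group_by_shape(timelines, segments=4):
    """Group terms whose timelines share the same symbolic string."""
    groups = defaultdict(list)
    for term, counts in timelines.items():
        groups[symbolize(counts, segments)].append(term)
    return dict(groups)


if __name__ == "__main__":
    # Invented hourly mention counts, for illustration only.
    timelines = {
        "earthquake": [1, 1, 2, 1, 9, 12, 4, 2],
        "quake": [2, 2, 4, 2, 18, 24, 8, 4],
        "coffee": [9, 8, 7, 6, 2, 1, 1, 2],
    }
    for word, terms in group_by_shape(timelines).items():
        print(word, terms)
```

Because each timeline is z-normalized before encoding, two words with the same burst pattern but different volumes receive the same string. Exact string matching is the crudest possible grouping; a distance between strings could replace it to form looser initial clusters.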