2020 Symposium Posters


Twitter Topic and Bias Identification: A Graph-based Approach

Primary Investigator:
Julia (Taylor) Rayz

Project Members:
Xiaonan Jing, Yi Zhang, Qingyuan Hu

Abstract
An unusual fall semester began at Purdue University last month amid the COVID-19 pandemic, and the new measures implemented to protect the Boilermakers drew attention on social networks. Twitter serves as a data source for many Natural Language Processing (NLP) tasks. With a rapidly changing online environment and a vast amount of textual data generated daily, it can be challenging to identify real-time topics on Twitter. Yet tracing real-time topics is important for learning about user interests and behaviors. Furthermore, potential content bias can be detected within these topics to help identify online security concerns. In this project, we are interested in detecting sub-topics on Twitter related to the popular “COVID-19” event at Purdue University. We employ graph structures, which are powerful tools for modeling the relationships between textual elements. A Graph-of-Words (GoW)-based word association model is implemented to trace daily changes in Twitter content over the two weeks from Aug 19 to Sep 2. In particular, we apply the Markov Clustering Algorithm (MCL) along with a graph node removal approach to identify the daily content clusters. Additionally, we leverage FastText word embeddings to identify content bias in tweets.
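The graph pipeline named in the abstract can be illustrated with a minimal sketch. It assumes whitespace-tokenized tweets, a sliding co-occurrence window for building the Graph-of-Words, and a textbook MCL loop (expansion as a matrix power, inflation as an elementwise power followed by column renormalization). The poster does not describe its node removal approach, so `remove_hubs` is a hypothetical stand-in that simply drops the highest-degree words, such as an event keyword that links every sub-topic. All function names, the window size, and the parameter values are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def build_graph_of_words(docs, window=3):
    """Graph-of-Words: link each pair of distinct words that fall inside a
    sliding window of `window` tokens; edge weight = co-occurrence count."""
    weights = defaultdict(float)
    for tokens in docs:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + window]:
                if other != word:
                    weights[tuple(sorted((word, other)))] += 1.0
    return dict(weights)


def remove_hubs(edges, k=1):
    """Hypothetical node removal (an assumption, not the poster's method):
    drop the k words with the highest weighted degree."""
    degree = defaultdict(float)
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    hubs = set(sorted(degree, key=lambda n: (-degree[n], n))[:k])
    return {e: w for e, w in edges.items() if e[0] not in hubs and e[1] not in hubs}


def _normalize(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(edges, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov Clustering: alternate expansion (matrix power) and inflation
    (elementwise power + renormalization) until the matrix stops changing."""
    nodes = sorted({name for e in edges for name in e})
    if not nodes:
        return []
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), w in edges.items():
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = w
    for i in range(n):
        m[i][i] = 1.0  # self-loops, a common MCL preprocessing step
    m = _normalize(m)
    for _ in range(max_iter):
        expanded = m
        for _ in range(expansion - 1):
            expanded = _matmul(expanded, m)
        # Inflation; entries below 1e-12 are pruned to exactly zero.
        inflated = _normalize([[x ** inflation if x > 1e-12 else 0.0 for x in row]
                               for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each non-empty row belongs to an attractor; its non-zero columns form a cluster.
    clusters = {frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Toy example: "covid" co-occurs with two otherwise separate sub-topics.
tweets = [
    "covid mask campus".split(),
    "mask campus covid".split(),
    "covid football season".split(),
    "football season covid".split(),
]
print(mcl(remove_hubs(build_graph_of_words(tweets), k=1)))
# → [['campus', 'mask'], ['football', 'season']]
```

In this toy run, removing the shared keyword leaves two separate word communities, and MCL returns one cluster per community. In MCL, expansion spreads flow along longer paths, while inflation strengthens strong (intra-cluster) flow and weakens weak (inter-cluster) flow; the `inflation` parameter therefore controls how fine-grained the resulting clusters are.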