Data Acquisition to Improve Machine Learning Fairness through Multi-Armed Bandits
Principal Investigator:
Romila Pradhan
Jahid Hasan
Abstract
Over the past few decades, the widespread use of machine learning (ML) has shifted attention from how these systems are built to the consequences they produce. Numerous instances of bias have been reported in ML-based systems deployed in sensitive domains such as law, finance, and human resources, and this has become a serious concern. Many studies have shown that bias in ML models originates from biased training data, making the data itself the root cause of the problem. Several existing data preparation techniques can mitigate this issue; however, they are problem-specific and can negatively affect downstream uses of the data. A more effective alternative is to intervene at earlier stages of the data science pipeline, such as data acquisition, where improvements can substantially benefit downstream analyses. To this end, we employ a comprehensive solution for fair data acquisition that includes data source selection, source merging, clustering of data instances, and, finally, a multi-armed-bandit-based approach to acquiring the data that most improves model fairness.
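To make the bandit-based acquisition step concrete, the sketch below shows one plausible instantiation, not the project's actual implementation. It assumes that clusters of candidate data points act as arms, that scikit-learn's LogisticRegression is the downstream model, that demographic parity difference is the fairness metric, and that arms are selected with UCB1; all data, and the helper names make_pool and demographic_parity_diff, are synthetic placeholders introduced only for illustration.

```python
# Hedged sketch: UCB1 bandit over clusters of candidate data; the reward is the
# reduction in demographic parity difference after acquiring a batch from an arm
# and retraining. Synthetic data and helper names are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_pool(n, group_bias):
    """Synthetic candidate pool: features X, labels y, sensitive attribute a."""
    a = rng.integers(0, 2, size=n)                        # sensitive group membership
    X = rng.normal(loc=(a * group_bias)[:, None], scale=1.0, size=(n, 3))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y, a

def demographic_parity_diff(model, X, a):
    """|P(yhat=1 | a=0) - P(yhat=1 | a=1)| on a fixed evaluation set."""
    yhat = model.predict(X)
    return abs(yhat[a == 0].mean() - yhat[a == 1].mean())

def train_and_score(X, y, X_eval, a_eval):
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return demographic_parity_diff(model, X_eval, a_eval)

# Arms: candidate pools, e.g., clusters produced by the earlier clustering step.
arms = [make_pool(500, bias) for bias in (0.0, 0.8, 1.5)]
X_eval, y_eval, a_eval = make_pool(1000, 1.0)             # held-out evaluation set
X_tr, y_tr, a_tr = make_pool(100, 1.5)                    # small, skewed initial set

baseline = train_and_score(X_tr, y_tr, X_eval, a_eval)
counts, rewards = np.zeros(len(arms)), np.zeros(len(arms))
batch = 20

for t in range(1, 31):
    if t <= len(arms):                                     # pull each arm once first
        k = t - 1
    else:                                                  # then UCB1: mean + bonus
        ucb = rewards / counts + np.sqrt(2 * np.log(t) / counts)
        k = int(np.argmax(ucb))

    Xk, yk, ak = arms[k]
    idx = rng.choice(len(yk), size=batch, replace=False)   # acquire a batch from arm k
    X_tr = np.vstack([X_tr, Xk[idx]])
    y_tr = np.concatenate([y_tr, yk[idx]])

    unfairness = train_and_score(X_tr, y_tr, X_eval, a_eval)
    reward = baseline - unfairness                         # reward = fairness improvement
    baseline = unfairness
    counts[k] += 1
    rewards[k] += reward

print("pulls per arm:", counts, "final demographic parity diff:", round(baseline, 3))
```

Under these assumptions, arms whose batches consistently shrink the fairness gap accumulate higher average rewards and are pulled more often, so acquisition effort concentrates on the data sources or clusters that most improve downstream fairness.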