Valuation-based Data Acquisition to Improve Machine Learning Fairness
Principal Investigator:
Romila Pradhan
Ekta, Romila Pradhan
Abstract
Machine learning algorithms are increasingly being used in a variety of applications and are
heavily relied upon to make decisions that impact people’s lives. ML models are often praised for
their precision, yet they can discriminate against certain groups due to biased data. Historical
inequities can propagate through machine learning, posing a challenge to developing models that
are fair and unbiased for all. One of the major sources of bias is the data used to train these
models. It is important to address the biases in the training data, as they can lead to unfair and
unjust results when the model is deployed in real-world applications. Data-induced bias can be
mitigated using three families of methods: pre-processing, in-processing, and
post-processing. This study investigates data acquisition as a potential bias mitigation
technique, which is closest to pre-processing in the machine learning pipeline.