Fairness Debugging of Tree-based Models using Machine Unlearning
Primary Investigator: Romila Pradhan
Project Members
Tanmay Surve, Dr. Romila Pradhan
Abstract
Machine learning (ML) is fast becoming the standard choice for data science applications that involve automated decision-making in sensitive domains such as finance, healthcare, crime prevention, and justice management. Designed carefully, ML-based systems have the potential to eliminate undesirable aspects of human decision-making such as biased judgments. However, concern continues to mount that these systems reinforce the systemic biases and discrimination often reflected in their training data. Tree-based models, such as decision trees and random forests, are among the most widely used machine learning models, primarily because of their predictive power in supervised learning tasks and their ease of interpretation. Given their success on most tasks, it is of interest to identify the root causes of unexpected and discriminatory behavior in tree-based models. However, there has been little work on understanding and debugging tree-based classifiers in the context of fairness. We introduce an algorithm that identifies the top-k data points or patterns in the training dataset that are responsible for model bias. A key component of our algorithm is its use of recent advances in machine unlearning research. Using machine unlearning techniques, our algorithm finds the training data points or patterns responsible for biased predictions on the test dataset much faster than naively retraining the model.
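To make the comparison concrete, the following minimal sketch shows the naive leave-one-out retraining baseline that the abstract contrasts with machine unlearning. It is not the project's algorithm: the depth-1 decision stump, the use of statistical parity difference as the bias measure, and the names `fit_stump`, `parity_gap`, and `top_k_responsible` are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical row layout: (feature value, protected group 0/1, label 0/1).
Row = Tuple[float, int, int]


def fit_stump(rows: Sequence[Row]) -> float:
    """Fit a depth-1 tree: choose the threshold t with the fewest training
    errors for the rule 'predict 1 if x >= t' (ties keep the lowest t)."""
    candidates = sorted({x for x, _, _ in rows})
    best_t, best_err = candidates[0], len(rows) + 1
    for t in candidates:
        err = sum((x >= t) != bool(y) for x, _, y in rows)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def parity_gap(threshold: float, test: Sequence[Row]) -> float:
    """Statistical parity difference on the test set:
    |P(pred=1 | group=1) - P(pred=1 | group=0)|.
    Assumes both groups appear in the test set."""
    rates = []
    for group in (0, 1):
        members = [x for x, g, _ in test if g == group]
        rates.append(sum(x >= threshold for x in members) / len(members))
    return abs(rates[1] - rates[0])


def top_k_responsible(
    train: List[Row], test: Sequence[Row], k: int
) -> List[Tuple[float, int]]:
    """Naive baseline: retrain once per training point and rank points by
    how much their removal reduces the test-set parity gap.
    Returns (bias reduction, training index) pairs, largest reduction first."""
    base = parity_gap(fit_stump(train), test)
    scores = []
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]  # drop point i, then retrain
        gap = parity_gap(fit_stump(reduced), test)
        scores.append((base - gap, i))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return scores[:k]
```

This baseline needs one full retraining per candidate point, which is the cost the abstract says unlearning avoids. An unlearning-based variant would replace the `fit_stump(reduced)` call with an incremental update of the already trained model.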
Fuzzy Logic to the Rescue: Cracking the Code on Grooming Stages' Fuzziness!
Primary Investigator: Tatiana Ringenberg
Project Members
Siva Sahitya Simhadri
Abstract
Online grooming refers to the practice in which an adult builds a relationship with a child or young person with the intention of exploiting them for sexual purposes. The number of internet grooming offenses reported to the police is growing and has increased by more than 80% in the last four years. Offenders groom children online through five stages. A robust approach to detecting and intervening in such conversations at the earlier stages is urgently needed. Until now, grooming chats have been characterized as crisp sets (i.e., each chat line belongs to only one of the five stages). The primary objective of this work is to depart from this conventional method and represent the grooming stages using fuzzy membership functions. We propose a framework to classify predator conversations into the different grooming stages. The dataset used for this task was annotated by two annotators with over 80% reliability.
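The difference between crisp and fuzzy stage assignment can be illustrated with a minimal sketch. It is not the framework proposed in this work: the placeholder labels `stage_1` to `stage_5`, the per-stage evidence scores, and the max-normalization used as a membership function are all hypothetical.

```python
from typing import Dict

# Placeholder labels; the abstract does not name the five stages.
STAGES = ["stage_1", "stage_2", "stage_3", "stage_4", "stage_5"]


def crisp_label(scores: Dict[str, float]) -> str:
    """Crisp set: the chat line belongs to exactly one stage,
    the one with the highest score (ties keep the earliest stage)."""
    return max(STAGES, key=lambda s: scores.get(s, 0.0))


def fuzzy_memberships(scores: Dict[str, float]) -> Dict[str, float]:
    """Fuzzy set: the chat line gets a degree of membership in [0, 1]
    for every stage. Assumes non-negative scores and normalizes by the
    largest one, so the strongest stage has degree 1.0."""
    peak = max(scores.get(s, 0.0) for s in STAGES)
    if peak <= 0:
        return {s: 0.0 for s in STAGES}
    return {s: scores.get(s, 0.0) / peak for s in STAGES}
```

With hypothetical scores of 2.0, 4.0, and 1.0 for the first three stages, `crisp_label` keeps only `stage_2`, while `fuzzy_memberships` also records partial membership in `stage_1` and `stage_3`, which is the information a crisp labeling discards.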