Protecting personal information is growing increasingly important to the general public, to the point that major tech companies now advertise the privacy features of their products. Despite this, it remains challenging to implement applications that do not leak private information, whether directly or indirectly through timing behavior, memory access patterns, or control-flow side channels. Existing security and cryptographic techniques such as secure multiparty computation (MPC) provide solutions for privacy-preserving computation, but they can be difficult to use, for non-experts and experts alike.
This dissertation develops the design, theory, and implementation of language-based techniques that help programmers write privacy-critical applications under a strong threat model. The proposed languages support private structured data, such as trees that may hide their structural information, and complex policies that go beyond whether a particular field of a record is private. More crucially, the approaches described in this dissertation decouple privacy and programmatic concerns, allowing programmers to implement privacy-preserving applications modularly, i.e., to develop application logic and to update and audit privacy policies independently. Secure-by-construction applications are derived automatically by combining a standard program with a separately specified security policy.
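To make the decoupling concrete, here is a minimal hypothetical sketch (in Python rather than the dissertation’s languages): the application logic is an ordinary tree traversal, while a separately specified policy controls how much tree structure a derived view may reveal. All names here are illustrative, not the dissertation’s actual API.

```python
# Hypothetical illustration of program/policy decoupling; not the
# dissertation's actual language or mechanism.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def tree_sum(t: Optional[Node]) -> int:
    # Standard program: written with no knowledge of the privacy policy.
    return 0 if t is None else t.value + tree_sum(t.left) + tree_sum(t.right)

# Separately specified policy: reveal structure only up to a depth bound,
# hiding the tree's exact size and shape below that bound.
POLICY = {"max_visible_depth": 2}

def policy_view(t: Optional[Node], depth: int = 0) -> Optional[Node]:
    # Combining the standard program with the policy yields the view a
    # secure-by-construction variant would expose.
    if t is None or depth >= POLICY["max_visible_depth"]:
        return None
    return Node(t.value,
                policy_view(t.left, depth + 1),
                policy_view(t.right, depth + 1))
```

In this toy version, auditing the policy means inspecting only POLICY and policy_view; tree_sum never needs to change.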
As machine learning continues to grow and surprise us, its complexity grows as well; indeed, many machine learning models have become black boxes. Yet there is a prevailing need for practicality. This dissertation addresses that need in the context of generative modeling and synthetic data, a recently popular application of generative models. First, Lightweight Chained Universal Approximators (LiCUS) is proposed. Motivated by statistical sampling principles, LiCUS tackles a simplified generative task with its universal approximation property while keeping its computational bottleneck minimal. Compared with a generative adversarial network (GAN) and a variational auto-encoder (VAE), LiCUS empirically yields synthetic data with greater utility for a classifier on the Modified National Institute of Standards and Technology (MNIST) dataset. Second, following up on its potential for informative synthetic data, LiCUS undergoes an extensive synthetic data supplementation experiment. The experiment largely serves as an informative starting point for practical use of synthetic data via LiCUS. In addition, using a gold standard of reserved data as a benchmark, the experimental results suggest that additional data collection may generally outperform models supplemented with synthetic data, at least when using LiCUS. Given that the experiment was conducted on two datasets, future research could involve further experimentation on a greater number and variety of datasets, such as images. Lastly, generative machine learning generally demands large datasets, which are not guaranteed in practice. To alleviate this demand, one can incorporate expert knowledge, as demonstrated by applying an expert-informed Wasserstein GAN with gradient penalty (WGAN-GP) to network flow traffic from NASA’s Operational Simulation for Small Satellites (NOS3). A directly applied WGAN-GP would fail to respect the physical limitations between satellite components and the permissible communications amongst them. By arming the WGAN-GP with the network auditing software Argus, the informed WGAN-GP could produce permissible satellite network flows when given as few as 10,000 flows. In all, this dissertation illustrates how machine learning processes can be modified under a more practical lens to incorporate pre-existing statistical principles and expert knowledge.
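While LiCUS itself is specific to this dissertation, the WGAN-GP objective used in the satellite experiment is standard. The sketch below, written in PyTorch and assuming a tabular critic over flow features (an assumption, not the dissertation’s code), shows the gradient penalty term that distinguishes WGAN-GP from a plain GAN.

```python
import torch

def gradient_penalty(critic, real, fake):
    # WGAN-GP penalty: push the critic's gradient norm toward 1 on points
    # interpolated between real and generated samples (tabular flows here).
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(scores.sum(), interp, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

# Critic loss (per batch, with penalty coefficient lam, typically 10):
#   loss_D = critic(fake).mean() - critic(real).mean()
#            + lam * gradient_penalty(critic, real, fake)
```

An expert-informed variant would additionally filter or penalize generated flows that violate known physical or communication constraints, as the dissertation does with Argus.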
The last decade has witnessed an unprecedented rise in the application of machine learning in high-stakes automated decision-making systems such as hiring, policing, bail sentencing, and medical screening. The long-lasting impact of these intelligent systems on human life has drawn attention to their fairness implications. A majority of subsequent studies targeted the existing historically unfair decision labels in the training data as the primary source of bias and strove either to remove them from the dataset (de-biasing) or to avoid learning discriminatory patterns from them during training. In this thesis, we show that label bias is not a necessary condition for unfair outcomes from a machine learning model. We develop theoretical and empirical evidence showing that biased model outcomes can be introduced by a range of different data properties and components of the machine learning development pipeline.
In this thesis, we first prove that machine learning models are expected to introduce bias even when the training data does not include label bias, using a proof-by-construction technique in our formal analysis. We demonstrate that machine learning models, trained to optimize for joint accuracy, introduce bias even when the underlying training data is free from label bias but includes other forms of disparity. We identify two data properties that lead to the introduction of bias in machine learning: group-wise disparity in feature predictivity and group-wise disparity in the rates of missing values. The experimental results suggest that a wide range of classifiers trained on synthetic or real-world datasets are prone to introducing bias under feature disparity and missing-value disparity, independently of or in conjunction with label bias. We further analyze the trade-off between fairness and established techniques for improving the generalization of machine learning models, such as adversarial training and increased model complexity. We report that adversarial training sacrifices fairness to achieve robustness against noisy (typically adversarial) samples. We propose a fair re-weighted adversarial training method that improves the fairness of adversarially trained models while sacrificing minimal adversarial robustness. Finally, we observe that although increasing model complexity typically improves generalization accuracy, it does not bring a commensurate reduction in the disparities in prediction rates.
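As an illustration of the quantities involved, the following simplified Python sketch computes the group-wise prediction-rate disparity and a hypothetical group re-weighting of the kind a fair re-weighted training scheme might use; the thesis’s actual method is more involved, and these helper names are ours.

```python
import numpy as np

def prediction_rate_gap(y_pred, group):
    # Disparity in positive prediction rates across demographic groups:
    # the gap between the most- and least-favored groups.
    rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
    return max(rates.values()) - min(rates.values())

def fairness_weights(group):
    # Hypothetical re-weighting: upweight under-represented groups so each
    # group contributes equally to the (adversarial) training loss.
    counts = {g: (group == g).sum() for g in np.unique(group)}
    return np.array([len(group) / (len(counts) * counts[g]) for g in group])

y_pred = np.array([1, 1, 0, 1, 0, 0])
group = np.array([0, 0, 0, 1, 1, 1])
print(prediction_rate_gap(y_pred, group))  # 0.333...: group 0 favored
print(fairness_weights(group))             # equal here; groups are balanced
```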
This thesis unveils a vital limitation of machine learning that has yet to receive significant attention in the FairML literature. Conventional FairML work reduces the fairness task to something as simple as de-biasing data or avoiding learning discriminatory patterns. The reality is far more complicated: everything from deciding which features to collect up to algorithmic choices such as optimizing for robustness can act as a source of bias in model predictions. This calls for detailed investigation of the fairness implications of machine learning development practices. In addition, identifying sources of bias can facilitate pre-deployment fairness audits of machine learning-driven automated decision-making systems.
In this era of digital surveillance and data breaches, it is important to understand how users protect their smartphone privacy. Detailed information is lacking on the prevalence of, and the factors and motivations influencing, the adoption of privacy-enhancing tools and settings on mobile devices. This study aimed to address this knowledge gap by investigating the use of privacy tools among smartphone users and examining the impact of factors such as demographics, awareness levels, and device platforms. The study surveyed 342 participants recruited through Amazon Mechanical Turk (MTurk), gathering data on user characteristics, privacy concerns, experiences with breaches, and use of various privacy tools. Statistical analysis showed that demographic factors, particularly age, significantly influenced the use of privacy tools, aligning with previous research. Users with a higher awareness of digital privacy risks were more likely to adopt privacy-enhancing tools. The study found no significant difference in the prevalence and type of privacy tools used between iOS and Android users. The study’s focus on privacy-enhancing tools among smartphone users and its proposed hypotheses provide valuable insights for law enforcement and forensic practitioners, aiding in digital investigations, evidence collection, and understanding of user behavior related to smartphone privacy measures. The study’s outcomes contribute to the digital forensics, cybersecurity, and privacy domains by providing insights into user behaviors, motivations, and the factors shaping privacy tool adoption on smartphones. These findings can inform the development of more user-centric privacy tools, policies, and educational campaigns, ultimately enhancing digital privacy protection and supporting law enforcement investigations in the digital age.
Human error is one of the most prominent challenges facing cybersecurity today. Attackers manipulate people’s natural inclination to make mistakes, using social engineering tactics to exploit psychological vulnerabilities, gain trust, and access sensitive information. Trust plays a critical role in human interaction, in both the physical and digital realms, making it an attractive target for attackers. However, cultural backgrounds, which reflect individual and societal beliefs and values, are often overlooked in cybersecurity risk assessments, despite significantly influencing human behavior. As human factors in cybersecurity become increasingly crucial, this study was conducted to investigate the relationship between trust and cybersecurity risks across diverse cultural groups: to understand the differences in risky cybersecurity behaviors among various cultural groups and to investigate the impact of differing perceptions of trust on engagement in risky behaviors. The outcome of this research provides insights into the critical role cultural backgrounds play in shaping human behavior in the context of cybersecurity. Its findings may have significant implications for enhancing overall cybersecurity measures by identifying and addressing human-related vulnerabilities that may be unique to specific cultural groups.
Insider threats are among the most costly and prevalent cybersecurity incidents. Modern organizations lack an effective way to detect and deter insider threat events; traditional mitigation approaches that focus on recruitment processes and workplace behavior have proven insufficient. Current analytic detection tools do not map technical indicators to organizational policies. This limitation results in poor risk calculations, leading to inaccurate risk mitigation decisions regarding insider threats. This paper proposes a pragmatic, data-driven approach that uses policy-mapped technical indicators to assess insider threat risk. Our approach provides a quantitative insider threat risk score to facilitate informed decision-making by policymakers. Using computer simulation modeling and synthetic data to iterate over common threat scenarios, we increase the probability of detecting an insider threat event. This novel approach provides quantitative analysis with distinct advantages over the qualitative risk matrices commonly used in industry to forecast and assess organizational risk.
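A minimal sketch of the idea, with hypothetical indicators, weights, and probabilities (not the paper’s actual data): each technical indicator is weighted by the organizational policy it maps to, and Monte Carlo simulation over synthetic scenarios yields an expected quantitative risk score.

```python
import random

# Hypothetical policy-mapped indicators: each technical indicator carries a
# weight derived from the organizational policy it violates, plus an
# estimated per-period trigger probability.
INDICATORS = {
    "after_hours_login":  {"weight": 0.2, "p": 0.10},
    "bulk_file_download": {"weight": 0.5, "p": 0.03},
    "usb_exfiltration":   {"weight": 0.3, "p": 0.01},
}

def simulate_risk_score(trials=100_000):
    # Monte Carlo over synthetic threat scenarios: the quantitative risk
    # score is the expected policy-weighted sum of triggered indicators.
    total = 0.0
    for _ in range(trials):
        total += sum(ind["weight"] for ind in INDICATORS.values()
                     if random.random() < ind["p"])
    return total / trials

print(f"Expected insider threat risk score: {simulate_risk_score():.4f}")
```

Unlike a qualitative risk matrix, a score like this can be recomputed as indicator probabilities are re-estimated from new telemetry.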
A retrospective on the Grand Challenges Workshop sponsored by the Computing Research Association
Wearable devices are ubiquitous: there are over 1.1 billion wearable devices in the market today, and the market is projected to grow at an annual rate of 14.6% through 2030. These devices collect and store a large amount of data, much of it in the cloud. For many years, law enforcement organizations have been encountering cases that involve a wearable device in some capacity, and there are examples of these devices helping in crime and insurance fraud investigations. One such article analyzes 5 case studies and 57 news articles and shows how the framing of wearables in the context of the crimes helped those cases. However, there still is not enough awareness and understanding among law enforcement agencies of how to leverage the data collected by these devices to solve crimes. Many of the fitness trackers and smartwatches in the market today offer broadly similar functionality, tracking an individual’s fitness-related activities, heart rate, sleep, temperature, and stress. One of the major players in the smartwatch space is Fitbit. Fitbit synchronizes the data it collects directly to the Fitbit cloud. It provides an Android app and a web application for users to access some, but not all, of these data. Application developers, on the other hand, can use the Fitbit APIs to access a user’s data; these APIs can also be leveraged by law enforcement agencies to aid digital forensic investigations. Previous studies have developed tools that use the Fitbit Web APIs, but for purposes other than forensic research, and of the few studies on using fitness tracker data for forensic investigations, very few have used the Fitbit developer APIs. This study therefore proposes a proof-of-concept platform that law enforcement agencies can leverage to access and view the data stored on the Fitbit cloud about a person of interest. The results display data in 12 categories (activity, body, sleep, breathing, devices, friends, nutrition, heart rate variability, ECG, temperature, oxygen level, and cardio data) in a tabular format that is easily viewable and searchable; this data can be further utilized for various analyses. The tool developed is open source and well documented, so anyone can reproduce the process.
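For illustration, retrieving a category from the Fitbit Web API amounts to an authenticated HTTPS GET against the documented endpoints. The snippet below is a minimal Python sketch, not the platform’s actual code, and assumes a valid OAuth 2.0 access token obtained with the account holder’s consent or via legal process.

```python
import requests

BASE = "https://api.fitbit.com"
TOKEN = "<oauth2-access-token>"  # placeholder; never hard-code real tokens

def fetch_category(endpoint, token=TOKEN):
    # Authenticated GET against a documented Fitbit Web API endpoint.
    resp = requests.get(f"{BASE}{endpoint}",
                        headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    return resp.json()

# Two of the documented endpoints behind the tool's 12 data categories:
heart = fetch_category("/1/user/-/activities/heart/date/today/1d.json")
sleep = fetch_category("/1.2/user/-/sleep/date/2024-01-15.json")
```

The JSON responses can then be flattened into the tabular, searchable views the platform presents.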
The challenge of providing data privacy and integrity while maintaining efficient performance for honest users is a persistent concern in cryptography. Attackers exploit advances in parallel hardware and custom circuits to gain an advantage over regular users. One such method is the use of Application-Specific Integrated Circuits (ASICs) to optimize key derivation function (KDF) algorithms, giving adversaries a significant advantage in password guessing and recovery attacks. Other examples include the use of graphics processing units (GPUs) and field-programmable gate arrays (FPGAs).
We propose a focused approach to closing the gap between adversarial advantage and honest-user performance by leveraging the hardware optimization AES-NI (Advanced Encryption Standard New Instructions), which is widely available in modern x86 microprocessors. Honest users can negate the adversary’s advantage by diminishing the utility of its specialized computational power. We explore the impact of AES-NI on the Argon2i KDF, a widely used and recommended password hashing function.
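As a point of reference, the sketch below derives a key with Argon2i via the argon2-cffi Python library (an assumed stand-in; the paper’s implementation may differ), showing the cost parameters whose tuning governs the attacker/defender trade-off discussed here.

```python
from argon2.low_level import hash_secret_raw, Type

# Argon2i key derivation with explicit cost parameters. Raising memory_cost
# is the usual knob for eroding ASIC/GPU advantage; an AES-NI-assisted
# variant would aim to shrink the remaining gap on commodity x86 hardware.
key = hash_secret_raw(
    secret=b"correct horse battery staple",
    salt=b"16-byte-salt!!!!",  # use a random per-password salt in practice
    time_cost=3,               # passes over memory
    memory_cost=64 * 1024,     # KiB of memory to fill (64 MiB)
    parallelism=4,             # lanes
    hash_len=32,
    type=Type.I,               # Argon2i: data-independent memory addressing
)
```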
Through our analysis, we demonstrate the effectiveness of incorporating AES-NI in reducing the advantage gained by attackers using ASICs. We also discuss the security and performance trade-offs to provide guidelines for practical implementation in deployed cryptosystems.
The progressive integration of microcontrollers into various domains has transformed traditional mechanical systems into modern cyber-physical systems (CPS). However, the beginning of this transformation predated the era of hyper-interconnectedness that characterizes our contemporary world. As such, the principles and visions guiding the design choices of this transformation did not account for many of today’s security challenges. Many designers envisioned their systems operating in an air-gapped-like fashion where few security threats loom. With the hyper-connectivity of today’s world, however, many CPS find themselves in uncharted territory for which they are unprepared.
An example of this evolution is the Controller Area Network (CAN). CAN emerged during the transformation of many mechanical systems into cyber-physical systems as a pivotal communication standard, reducing vehicle wiring and enabling efficient data exchange. CAN’s features, including noise resistance, decentralization, error handling, and fault confinement mechanisms, made it a widely adopted communication medium not only in transportation but also in diverse domains such as factories, elevators, medical equipment, avionics, and naval systems.
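To ground the discussion, the sketch below uses the python-can library (an assumption; any SocketCAN-capable stack would do) to transmit and receive a classic CAN frame: an 11-bit arbitration ID and up to 8 data bytes, broadcast to every node with no sender authentication, a property central to the security challenges discussed next.

```python
import can  # python-can 4.x, using the Linux SocketCAN backend

# A classic CAN frame carries an 11-bit arbitration ID and up to 8 data
# bytes. Any node may transmit any ID, and frames carry no sender
# authentication: the bus trusts whatever it hears.
with can.Bus(interface="socketcan", channel="can0") as bus:
    frame = can.Message(arbitration_id=0x123,
                        data=[0x11, 0x22, 0x33, 0x44],
                        is_extended_id=False)
    bus.send(frame)
    reply = bus.recv(timeout=1.0)  # returns None if nothing arrives
```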
The increasing connectivity of modern vehicles through CD players, USB sticks, Bluetooth, and WiFi access has exposed CAN systems to unprecedented security challenges and highlighted the need to bolster their security posture. This dissertation addresses the urgent need to enhance the security of modern cyber-physical systems in the face of emerging threats by proposing a proactive vulnerability identification and defense construction approach and applying it to CAN as a lucid case study. By adopting this proactive approach, vulnerabilities can be systematically identified, and robust defense mechanisms can be constructed to safeguard the resilience of CAN systems.
We focus on developing vulnerability scanning techniques and innovative defense system designs tailored for CAN systems. By systematically identifying vulnerabilities before they are discovered and exploited by external actors, we minimize the risks associated with cyber-attacks, ensuring the longevity and reliability of CAN systems. Furthermore, the defense mechanisms proposed in this research overcome the limitations of existing solutions, providing holistic protection against CAN threats while respecting CAN’s performance requirements and operational conditions.
It is important to emphasize that while this dissertation focuses on CAN, the techniques and rationale used here can be replicated to secure other cyber-physical systems. Specifically, because CAN is present in many cyber-physical systems, it shares many performance and security challenges with those systems, which makes most of the techniques and approaches used here readily transferable to them. By accentuating the importance of proactive security, this research endeavors to establish a foundational approach to cyber-physical systems security and resiliency. It recognizes the evolving nature of cyber-physical systems and the specific security challenges facing each system in today’s hyper-connected world, and hence focuses on a single case study.
Social engineering attacks, especially trust exploitation, have become a focus of attention for cybercriminals attempting to manipulate or deceive users into taking actions that further expose their vulnerabilities. This has also become a growing field of research, as these interactions rest on complex social dynamics that are constantly taken advantage of. Finding the “weakest link” is a popular method of identifying how these exploits take place, generally by observing when individuals fall for a social engineering attack. However, valuable insights for hardening security may also be gained by observing patterns among users who are resistant or vigilant to these attacks, primarily in their personality traits, which have been found to be a more accurate indicator of behavior than self-reported intentions. Survey responses (n=120) indicate a correlation between high test scores in trust exploitation exercises and Conscientiousness in the Big 5 Personality Model (p<0.001). No significant correlation was seen between self-reported cybersecurity habits and actual security behavior.
The proliferation of Internet access has enabled the rapid and widespread exchange of information globally. The World Wide Web has become the primary communications platform for many people and has surpassed other traditional media outlets in terms of reach and influence. However, many nation-states impose various levels of censorship on their citizens’ Internet communications. There is little consensus about what constitutes “objectionable” online content deserving of censorship. Some people consider the censorship activities occurring in many nations to be violations of international human rights (e.g., the rights to freedom of expression and assembly). This multi-study dissertation explores Internet censorship methods and systems. Using combinations of quantitative, qualitative, and systematic literature review methods, this thesis provides an interdisciplinary view of the domain of Internet censorship. The author presents a reference model for Internet censorship technologies: an abstraction to facilitate a conceptual understanding of the ways in which Internet censorship occurs from a system design perspective. The author then characterizes the technical threats to Internet communications, producing a comprehensive taxonomy of Internet censorship methods. Finally, this work provides a novel research framework for revealing how nation-state censors operate, based on a globally representative sample. Of the 70 nations analyzed, 62 used at least one Internet censorship method against their citizens. The results reveal worldwide trends in Internet censorship based on historical evidence and Internet measurement data.
Internet users require secure means of communication. Virtual Private Networks (VPNs) often serve this purpose for consumers and businesses. The research aims of this paper were to analyze and implement the new VPN protocol, WireGuard. The authors explain the cryptographic primitives used, build server and client implementations of WireGuard peers, and present the benefits and drawbacks of this new technology. The outcome was a functional WireGuard client and server implementation, capable of tunneling all Internet traffic through a cloud-based virtual private server (VPS), with minimal manual configuration necessary from the end user. The code is publicly available.
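For illustration, here is a minimal Python sketch of the peer setup (not the authors’ published code): key generation via the standard wg(8) utility, and a client configuration that routes all traffic through the VPS by setting AllowedIPs to 0.0.0.0/0. The server key and endpoint are placeholders.

```python
import subprocess

def wg_keypair():
    # Generate a WireGuard Curve25519 keypair via the wg(8) CLI.
    priv = subprocess.run(["wg", "genkey"], capture_output=True,
                          text=True, check=True).stdout.strip()
    pub = subprocess.run(["wg", "pubkey"], input=priv, capture_output=True,
                         text=True, check=True).stdout.strip()
    return priv, pub

priv, pub = wg_keypair()
# Minimal client config tunneling all traffic through a VPS peer:
print(f"""[Interface]
PrivateKey = {priv}
Address = 10.0.0.2/32
DNS = 1.1.1.1

[Peer]
PublicKey = <server-public-key>
Endpoint = vps.example.com:51820
AllowedIPs = 0.0.0.0/0
PersistentKeepalive = 25""")
```

Writing this file to /etc/wireguard/wg0.conf and running wg-quick up wg0 brings the tunnel up on most Linux systems.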
The National Institute of Standards and Technology defines social engineering as an attack vector that deceives an individual into divulging confidential information or performing unwanted actions. Methods of social engineering include phishing, pretexting, tailgating, baiting, vishing, SMSishing, and quid pro quo. These attacks can have devastating effects, especially in the healthcare sector, where there are budgetary and time constraints. To address these issues, this study used cybersecurity experts to identify the social engineering attacks most important to the healthcare sector and to rank the underlying factors in terms of cost, success rate, and data breach. By creating a ranking that can be continually updated, organizations can provide more effective training to users and reduce the overall risk of a successful attack. This study identified phishing attacks via email, voice, and SMS as the most important to defend against, primarily due to the number of attacks. Baiting and quid pro quo consistently ranked lower in priority.
Social engineering attacks have been a rising issue in recent years, affecting a multitude of industries. One industry of great interest to hackers is healthcare, due to the high value of patient information. Social engineering attacks are common mainly because of their ease of execution and the high probability of victimization. A popular way of combatting social engineering attacks is increasing users’ ability to detect indicators of attack, which requires a level of cybersecurity education. While the number of cybersecurity training programs is increasing, social engineering attacks are still very successful. Therefore, education programs need to be improved to effectively increase users’ ability to notice indicators of attack. This research aimed to answer the question: what teaching method results in the greatest learning gains for understanding social engineering concepts? This was done by investigating text-based, gamification, and adversarial thinking teaching methods. These three teaching methods were used to deliver lessons on an online platform to a sample of Purdue students. Analysis showed that both the text-based and adversarial thinking methods produced significant improvement in the understanding of social engineering concepts within the student sample. A follow-up test did not find any single method to be best among the three. However, this study did find two teaching methods that can be used to develop training programs to help decrease the total number of successful social engineering attacks across industries.