publications
2024
- [NeurIPS] LLM-Check: Investigating Detection of Hallucinations in Large Language Models. Gaurang Sriramanan, Siddhant Bharti, Vinu Sankar Sadasivan, Shoumik Saha, Priyatham Kattakinda, and Soheil Feizi. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
While Large Language Models (LLMs) have become immensely popular due to their outstanding performance on a broad range of tasks, these models are prone to producing hallucinations: outputs that are fallacious or fabricated yet often appear plausible or tenable at a glance. In this paper, we conduct a comprehensive investigation into the nature of hallucinations within LLMs and explore effective techniques for detecting such inaccuracies in various real-world settings. Prior approaches to detect hallucinations in LLM outputs, such as consistency checks or retrieval-based methods, typically assume access to multiple model responses or large databases. These techniques, however, tend to be computationally expensive in practice, thereby limiting their applicability to real-time analysis. In contrast, in this work, we seek to identify hallucinations within a single response in both white-box and black-box settings by analyzing the internal hidden states, attention maps, and output prediction probabilities of an auxiliary LLM. We also study hallucination detection in scenarios where ground-truth references are available, such as in the setting of Retrieval-Augmented Generation (RAG). We demonstrate that the proposed detection methods are extremely compute-efficient, with speedups of up to 45x and 450x over other baselines, while achieving significant improvements in detection performance across diverse datasets.
@inproceedings{sriramanan2024llmcheck,
  title     = {{LLM}-Check: Investigating Detection of Hallucinations in Large Language Models},
  author    = {Sriramanan, Gaurang and Bharti, Siddhant and Sadasivan, Vinu Sankar and Saha, Shoumik and Kattakinda, Priyatham and Feizi, Soheil},
  booktitle = {The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year      = {2024},
  url       = {https://openreview.net/forum?id=LYx4w3CAgy},
}
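LLM-Check, above, scores a single response using signals from an auxiliary LLM: hidden states, attention maps, and output prediction probabilities. The sketch below illustrates only the last and simplest of those signals, under explicit assumptions: it computes the mean negative log-likelihood of a response under a small off-the-shelf model (gpt2 is an arbitrary stand-in), and a caller would flag responses above some hypothetical threshold. It is not the paper's detector, just a minimal illustration of probability-based scoring.

```python
# Hedged sketch: probability-based scoring of a single response with an auxiliary LM.
# The model choice, the plain mean-NLL score, and any flagging threshold are
# illustrative assumptions, not the LLM-Check method itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM as the auxiliary scorer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def response_nll(prompt: str, response: str) -> float:
    """Mean negative log-likelihood of the response tokens, conditioned on the prompt."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Shift so position t predicts token t+1, then keep only the response positions.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return -token_ll[:, prompt_len - 1:].mean().item()

# Higher scores mean the auxiliary LM finds the response less likely; a (hypothetical)
# threshold over this score could route responses to closer inspection.
print(f"{response_nll('Who wrote Hamlet? ', 'Hamlet was written by Charles Dickens.'):.2f}")
```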
- [ICML] Fast Adversarial Attacks on Language Models In One GPU Minute. Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan, Priyatham Kattakinda, Atoosa Chegini, and Soheil Feizi. 2024.
In this paper, we introduce BEAST, a novel class of fast, beam-search-based adversarial attacks on Language Models (LMs). BEAST employs interpretable parameters, enabling attackers to balance attack speed, success rate, and the readability of adversarial prompts. The computational efficiency of BEAST allows us to investigate its applications on LMs for jailbreaking, eliciting hallucinations, and privacy attacks. Our gradient-free targeted attack can jailbreak aligned LMs with high attack success rates within one minute. For instance, BEAST can jailbreak Vicuna-7B-v1.5 in under one minute with a success rate of 89%, whereas a gradient-based baseline takes over an hour to achieve a 70% success rate on a single Nvidia RTX A6000 48GB GPU. Additionally, we discover a unique outcome wherein our untargeted attack induces hallucinations in LM chatbots. Through human evaluations, we find that our untargeted attack causes Vicuna-7B-v1.5 to produce 15% more incorrect outputs compared to its outputs in the absence of our attack. We also find that, 22% of the time, BEAST causes Vicuna to generate outputs that are not relevant to the original prompt. Further, we use BEAST to generate adversarial prompts in a few seconds that can boost the performance of existing membership inference attacks for LMs. We believe that our fast attack, BEAST, has the potential to accelerate research in LM security and privacy.
@misc{sadasivan2024fastadversarialattackslanguage,
  title         = {Fast Adversarial Attacks on Language Models In One GPU Minute},
  author        = {Sadasivan, Vinu Sankar and Saha, Shoumik and Sriramanan, Gaurang and Kattakinda, Priyatham and Chegini, Atoosa and Feizi, Soheil},
  year          = {2024},
  eprint        = {2402.15570},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CR},
  url           = {https://arxiv.org/abs/2402.15570},
}
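BEAST is described above as a beam-search attack whose knobs trade off speed, success rate, and prompt readability. As a rough, hedged illustration of that search pattern only, the sketch below grows candidate adversarial suffixes token by token and keeps the lowest-loss beams. The word-level vocabulary, the toy loss, and all parameter values are placeholders; the actual attack operates on the target LM's own tokens and loss.

```python
# Hedged sketch of a beam-search style prompt attack in the spirit of BEAST.
# candidate_tokens, loss_fn, beam_width, branch, and suffix_len are illustrative
# placeholders, not the authors' implementation.
import heapq
import random

def beam_search_attack(prompt, candidate_tokens, loss_fn,
                       beam_width=5, branch=10, suffix_len=8):
    """Greedily grow adversarial suffixes, keeping the beam_width lowest-loss beams."""
    beams = [(loss_fn(prompt), [])]  # (loss of prompt + suffix, suffix tokens)
    for _ in range(suffix_len):
        expansions = []
        for _, suffix in beams:
            # Sample a few candidate next tokens per beam; beam_width and branch are
            # the speed/quality knobs the abstract alludes to.
            for tok in random.sample(candidate_tokens, k=min(branch, len(candidate_tokens))):
                new_suffix = suffix + [tok]
                new_loss = loss_fn(prompt + " " + " ".join(new_suffix))
                expansions.append((new_loss, new_suffix))
        beams = heapq.nsmallest(beam_width, expansions, key=lambda x: x[0])
    return beams[0]

# Toy usage: a fake loss that prefers suffixes containing "please", standing in for
# the target model's loss on an attacker-chosen continuation.
vocab = ["please", "now", "ignore", "format", "json", "story"]
best_loss, best_suffix = beam_search_attack("Tell me how to ...", vocab,
                                            lambda text: -text.count("please"))
print(best_loss, best_suffix)
```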
- [Computers & Security] MAlign: Explainable static raw-byte based malware family classification using sequence alignment. Shoumik Saha, Sadia Afroz, and Atif Hasan Rahman. Computers & Security, 2024.
Malware classification and analysis have long been an arms race between antivirus systems and malware authors. Though static analysis is vulnerable to evasion techniques, it remains popular as the first line of defense in antivirus systems. However, most static analyzers fail to gain the trust of practitioners due to their black-box nature. We propose MAlign, a novel static malware family classification approach inspired by genome sequence alignment that can not only classify malware families but also provide explanations for its decisions. MAlign encodes raw bytes as nucleotides and adopts genome sequence alignment approaches to create a signature of a malware family based on the conserved code segments in that family, without any human labor or expertise. We evaluate MAlign on two malware datasets, and it outperforms other state-of-the-art machine learning-based malware classifiers (by 4.49%∼0.07%), especially on small datasets (by 19.48%∼1.2%). Furthermore, we explain the signatures generated by MAlign for different malware families, illustrating the kinds of insights it can provide to analysts, and show its efficacy as an analysis tool. Additionally, we evaluate its theoretical and empirical robustness against some common attacks. In this paper, we approach static malware analysis from a unique perspective, aiming to strike a delicate balance among performance, interpretability, and robustness.
@article{SAHA2024103714,
  title    = {MAlign: Explainable static raw-byte based malware family classification using sequence alignment},
  journal  = {Computers \& Security},
  volume   = {139},
  pages    = {103714},
  year     = {2024},
  issn     = {0167-4048},
  doi      = {10.1016/j.cose.2024.103714},
  url      = {https://www.sciencedirect.com/science/article/pii/S0167404824000154},
  author   = {Saha, Shoumik and Afroz, Sadia and Rahman, Atif Hasan},
  keywords = {Malware, Sequence alignment, Explainability, Machine learning, Adversarial},
}
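The MAlign abstract above hinges on encoding raw executable bytes as nucleotides so that genome sequence alignment tooling can extract conserved segments as family signatures. Below is a minimal sketch of one plausible encoding, two bits per nucleotide and four nucleotides per byte; this particular mapping is an assumption for illustration, not necessarily the encoding the paper uses.

```python
# Hedged sketch of a raw-byte -> nucleotide encoding. The 2-bits-per-symbol mapping
# is an illustrative assumption, not necessarily MAlign's exact scheme.
NUCLEOTIDES = "ACGT"

def bytes_to_nucleotides(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    out = []
    for b in data:
        for shift in (6, 4, 2, 0):
            out.append(NUCLEOTIDES[(b >> shift) & 0b11])
    return "".join(out)

# Example: the first bytes of a PE header become a short "DNA" string that
# off-the-shelf alignment tools can consume.
print(bytes_to_nucleotides(b"MZ\x90\x00"))  # -> CATCCCGGGCAAAAAA
```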
- [arXiv] Demystifying Behavior-Based Malware Detection at Endpoints. Yigitcan Kaya, Yizheng Chen, Shoumik Saha, Fabio Pierazzi, Lorenzo Cavallaro, David Wagner, and Tudor Dumitras. 2024.
Machine learning is widely used for malware detection in practice. Prior behavior-based detectors most commonly rely on traces of programs executed in controlled sandboxes. However, sandbox traces are unavailable to the last line of defense offered by security vendors: malware detection at endpoints. A detector at endpoints consumes the traces of programs running on real-world hosts, as sandbox analysis might introduce intolerable delays. Despite their success in sandboxes, research hints at potential challenges for ML methods at endpoints, e.g., highly variable malware behaviors. Nonetheless, the impact of these challenges on existing approaches, and how well their excellent sandbox performance translates to the endpoint scenario, remain unquantified. We present the first measurement study of the performance of ML-based malware detectors at real-world endpoints. Leveraging a dataset of sandbox traces and a dataset of in-the-wild program traces, we evaluate two scenarios where the endpoint detector was trained on (i) sandbox traces (convenient and accessible) and (ii) endpoint traces (less accessible, as they require collecting telemetry data). This allows us to identify a wide gap between prior methods’ sandbox-based detection performance (over 90%) and their endpoint performance (below 20% and 50% in (i) and (ii), respectively). We pinpoint and characterize the challenges contributing to this gap, such as label noise, behavior variability, and sandbox evasion. To close this gap, we propose techniques that yield a relative improvement of 5-30% over the baselines. Our evidence suggests that applying detectors trained on sandbox data to endpoint detection (scenario (i)) is challenging. The most promising direction is training detectors on endpoint data (scenario (ii)), which marks a departure from widespread practice. We implement a leaderboard for realistic detector evaluations to promote research.
@misc{kaya2024demystifyingbehaviorbasedmalwaredetection,
  title         = {Demystifying Behavior-Based Malware Detection at Endpoints},
  author        = {Kaya, Yigitcan and Chen, Yizheng and Saha, Shoumik and Pierazzi, Fabio and Cavallaro, Lorenzo and Wagner, David and Dumitras, Tudor},
  year          = {2024},
  eprint        = {2405.06124},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CR},
  url           = {https://arxiv.org/abs/2405.06124},
}
2023
- [ICLR] DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness. Shoumik Saha, Wenxiao Wang, Yigitcan Kaya, Soheil Feizi, and Tudor Dumitras. 2023.
Machine Learning (ML) models have been used for malware detection for over two decades. This has ignited an ongoing arms race between malware authors and antivirus systems, compelling researchers to propose defenses that protect malware-detection models against evasion attacks. However, most, if not all, existing defenses against evasion attacks suffer from sizable performance degradation and/or can defend against only specific attacks, which makes them less practical in real-world settings. In this work, we develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. After showing how DRSM is theoretically robust against attacks with contiguous adversarial bytes, we verify its performance and certified robustness experimentally, where we observe only marginal accuracy drops as the cost of robustness. To our knowledge, we are the first to offer certified robustness in the realm of static detection of malware executables. More surprisingly, by evaluating DRSM against 9 empirical attacks of different types, we observe that the proposed defense is empirically robust to some extent against a diverse set of attacks, some of which even fall outside the scope of its original threat model. In addition, we collected 15.5K recent benign raw executables from diverse sources, which will be made public as a dataset called PACE (Publicly Accessible Collection(s) of Executables), to alleviate the scarcity of publicly available benign datasets for studying malware detection and to provide future research with more representative data of the time.
@misc{saha2023drsmderandomizedsmoothingmalware,
  title         = {DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness},
  author        = {Saha, Shoumik and Wang, Wenxiao and Kaya, Yigitcan and Feizi, Soheil and Dumitras, Tudor},
  year          = {2023},
  eprint        = {2303.13372},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CR},
  url           = {https://arxiv.org/abs/2303.13372},
}
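The window ablation scheme in the DRSM abstract classifies an executable's byte windows independently and aggregates votes, so that a contiguous run of adversarial bytes can only flip a bounded number of votes. The sketch below shows that voting-and-margin idea under simplified assumptions: the window size, the base_classifier interface, and the conservative certificate rule are placeholders rather than the exact DRSM construction.

```python
# Hedged sketch of window ablation with majority voting for a raw-byte classifier.
# Window size, classifier interface, and certificate rule are simplified assumptions,
# not the exact DRSM construction.
from collections import Counter

def smoothed_predict(raw_bytes: bytes, base_classifier, window_size=512):
    """Classify each window independently; return the majority label and vote margin."""
    votes = Counter()
    for start in range(0, len(raw_bytes), window_size):
        votes[base_classifier(raw_bytes[start:start + window_size])] += 1
    (top_label, top_votes), *rest = votes.most_common()
    runner_up = rest[0][1] if rest else 0
    return top_label, top_votes - runner_up

def is_certified(margin: int, patch_len: int, window_size: int) -> bool:
    """A contiguous adversarial patch of patch_len bytes touches at most
    patch_len // window_size + 2 aligned windows (a conservative bound), so the
    majority vote cannot change if the margin exceeds twice that count."""
    touched = patch_len // window_size + 2
    return margin > 2 * touched

# Toy usage with a stand-in classifier (crude byte heuristic, purely illustrative).
toy_clf = lambda w: "malware" if w.count(0xCC) > 10 else "benign"
label, margin = smoothed_predict(b"\x00" * 4096, toy_clf)
print(label, margin, is_certified(margin, patch_len=600, window_size=512))
```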
- [arXiv] Contrastive Self-Supervised Learning Based Approach for Patient Similarity: A Case Study on Atrial Fibrillation Detection from PPG Signal. Subangkar Karmaker Shanto, Shoumik Saha, Atif Hasan Rahman, Mohammad Mehedy Masud, and Mohammed Eunus Ali. 2023.
@misc{shanto2023contrastiveselfsupervisedlearningbased,
  title         = {Contrastive Self-Supervised Learning Based Approach for Patient Similarity: A Case Study on Atrial Fibrillation Detection from PPG Signal},
  author        = {Shanto, Subangkar Karmaker and Saha, Shoumik and Rahman, Atif Hasan and Masud, Mohammad Mehedy and Ali, Mohammed Eunus},
  year          = {2023},
  eprint        = {2308.02433},
  archiveprefix = {arXiv},
  primaryclass  = {eess.SP},
  url           = {https://arxiv.org/abs/2308.02433},
}