publications
2024
- [NeurIPS] LLM-Check: Investigating Detection of Hallucinations in Large Language Models. Gaurang Sriramanan, Siddhant Bharti, Vinu Sankar Sadasivan, Shoumik Saha, Priyatham Kattakinda, and Soheil Feizi. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
While Large Language Models (LLMs) have become immensely popular due to their outstanding performance on a broad range of tasks, these models are prone to producing hallucinations: outputs that are fallacious or fabricated yet often appear plausible or tenable at a glance. In this paper, we conduct a comprehensive investigation into the nature of hallucinations within LLMs and explore effective techniques for detecting such inaccuracies in various real-world settings. Prior approaches to detect hallucinations in LLM outputs, such as consistency checks or retrieval-based methods, typically assume access to multiple model responses or large databases. These techniques, however, tend to be computationally expensive in practice, thereby limiting their applicability to real-time analysis. In contrast, in this work, we seek to identify hallucinations within a single response in both white-box and black-box settings by analyzing the internal hidden states, attention maps, and output prediction probabilities of an auxiliary LLM. We also study hallucination detection in scenarios where ground-truth references are available, such as in the setting of Retrieval-Augmented Generation (RAG). We demonstrate that the proposed detection methods are extremely compute-efficient, with speedups of up to 45x and 450x over other baselines, while achieving significant improvements in detection performance across diverse datasets.
@inproceedings{sriramanan2024llmcheck,
  title     = {{LLM}-Check: Investigating Detection of Hallucinations in Large Language Models},
  author    = {Sriramanan, Gaurang and Bharti, Siddhant and Sadasivan, Vinu Sankar and Saha, Shoumik and Kattakinda, Priyatham and Feizi, Soheil},
  booktitle = {The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year      = {2024},
  url       = {https://openreview.net/forum?id=LYx4w3CAgy},
}
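LLM-Check, above, scores a single response using signals from an auxiliary LLM: hidden states, attention maps, and output prediction probabilities. The sketch below illustrates only the last and simplest of those signals, under explicit assumptions: it computes the mean negative log-likelihood of a response under a small off-the-shelf model (gpt2 is an arbitrary stand-in), and a caller would flag responses above some hypothetical threshold. It is not the paper's detector, just a minimal illustration of probability-based scoring.

```python
# Hedged sketch: probability-based scoring of a single response with an auxiliary LM.
# The model choice, the plain mean-NLL score, and any flagging threshold are
# illustrative assumptions, not the LLM-Check method itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM as the auxiliary scorer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def response_nll(prompt: str, response: str) -> float:
    """Mean negative log-likelihood of the response tokens, conditioned on the prompt."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Shift so position t predicts token t+1, then keep only the response positions.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return -token_ll[:, prompt_len - 1:].mean().item()

# Higher scores mean the auxiliary LM finds the response less likely; a (hypothetical)
# threshold over this score could route responses to closer inspection.
print(f"{response_nll('Who wrote Hamlet? ', 'Hamlet was written by Charles Dickens.'):.2f}")
```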
- [ICML] Fast Adversarial Attacks on Language Models In One GPU Minute. Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan, Priyatham Kattakinda, Atoosa Chegini, and Soheil Feizi. 2024.
In this paper, we introduce BEAST, a novel class of fast, beam-search-based adversarial attacks on Language Models (LMs). BEAST employs interpretable parameters, enabling attackers to balance attack speed, success rate, and the readability of adversarial prompts. The computational efficiency of BEAST allows us to investigate its applications on LMs for jailbreaking, eliciting hallucinations, and privacy attacks. Our gradient-free targeted attack can jailbreak aligned LMs with high attack success rates within one minute. For instance, BEAST can jailbreak Vicuna-7B-v1.5 in under one minute with a success rate of 89%, whereas a gradient-based baseline takes over an hour to achieve a 70% success rate on a single Nvidia RTX A6000 48GB GPU. Additionally, we discover a unique outcome wherein our untargeted attack induces hallucinations in LM chatbots. Through human evaluations, we find that our untargeted attack causes Vicuna-7B-v1.5 to produce 15% more incorrect outputs compared to its outputs in the absence of our attack. We also find that, 22% of the time, BEAST causes Vicuna to generate outputs that are not relevant to the original prompt. Further, we use BEAST to generate adversarial prompts in a few seconds that can boost the performance of existing membership inference attacks for LMs. We believe that our fast attack, BEAST, has the potential to accelerate research in LM security and privacy.
@misc{sadasivan2024fastadversarialattackslanguage,
  title         = {Fast Adversarial Attacks on Language Models In One GPU Minute},
  author        = {Sadasivan, Vinu Sankar and Saha, Shoumik and Sriramanan, Gaurang and Kattakinda, Priyatham and Chegini, Atoosa and Feizi, Soheil},
  year          = {2024},
  eprint        = {2402.15570},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CR},
  url           = {https://arxiv.org/abs/2402.15570},
}
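BEAST is described above as a beam-search attack whose knobs trade off speed, success rate, and prompt readability. As a rough, hedged illustration of that search pattern only, the sketch below grows candidate adversarial suffixes token by token and keeps the lowest-loss beams. The word-level vocabulary, the toy loss, and all parameter values are placeholders; the actual attack operates on the target LM's own tokens and loss.

```python
# Hedged sketch of a beam-search style prompt attack in the spirit of BEAST.
# candidate_tokens, loss_fn, beam_width, branch, and suffix_len are illustrative
# placeholders, not the authors' implementation.
import heapq
import random

def beam_search_attack(prompt, candidate_tokens, loss_fn,
                       beam_width=5, branch=10, suffix_len=8):
    """Greedily grow adversarial suffixes, keeping the beam_width lowest-loss beams."""
    beams = [(loss_fn(prompt), [])]  # (loss of prompt + suffix, suffix tokens)
    for _ in range(suffix_len):
        expansions = []
        for _, suffix in beams:
            # Sample a few candidate next tokens per beam; beam_width and branch are
            # the speed/quality knobs the abstract alludes to.
            for tok in random.sample(candidate_tokens, k=min(branch, len(candidate_tokens))):
                new_suffix = suffix + [tok]
                new_loss = loss_fn(prompt + " " + " ".join(new_suffix))
                expansions.append((new_loss, new_suffix))
        beams = heapq.nsmallest(beam_width, expansions, key=lambda x: x[0])
    return beams[0]

# Toy usage: a fake loss that prefers suffixes containing "please", standing in for
# the target model's loss on an attacker-chosen continuation.
vocab = ["please", "now", "ignore", "format", "json", "story"]
best_loss, best_suffix = beam_search_attack("Tell me how to ...", vocab,
                                            lambda text: -text.count("please"))
print(best_loss, best_suffix)
```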
- [Computers & Security] MAlign: Explainable static raw-byte based malware family classification using sequence alignment. Shoumik Saha, Sadia Afroz, and Atif Hasan Rahman. Computers & Security, 2024.
Malware classification and analysis have long been an arms race between antivirus systems and malware authors. Though static analysis is vulnerable to evasion techniques, it remains popular as the first line of defense in antivirus systems. However, most static analyzers fail to gain the trust of practitioners due to their black-box nature. We propose MAlign, a novel static malware family classification approach inspired by genome sequence alignment that can not only classify malware families but also provide explanations for its decisions. MAlign encodes raw bytes as nucleotides and adopts genome sequence alignment approaches to create a signature of a malware family based on the conserved code segments in that family, without any human labor or expertise. We evaluate MAlign on two malware datasets, and it outperforms other state-of-the-art machine learning-based malware classifiers (by 4.49%∼0.07%), especially on small datasets (by 19.48%∼1.2%). Furthermore, we explain the signatures generated by MAlign for different malware families, illustrating the kinds of insights it can provide to analysts, and show its efficacy as an analysis tool. Additionally, we evaluate its theoretical and empirical robustness against some common attacks. In this paper, we approach static malware analysis from a unique perspective, aiming to strike a delicate balance among performance, interpretability, and robustness.
@article{SAHA2024103714,
  title    = {MAlign: Explainable static raw-byte based malware family classification using sequence alignment},
  journal  = {Computers \& Security},
  volume   = {139},
  pages    = {103714},
  year     = {2024},
  issn     = {0167-4048},
  doi      = {10.1016/j.cose.2024.103714},
  url      = {https://www.sciencedirect.com/science/article/pii/S0167404824000154},
  author   = {Saha, Shoumik and Afroz, Sadia and Rahman, Atif Hasan},
  keywords = {Malware, Sequence alignment, Explainability, Machine learning, Adversarial},
}
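The MAlign abstract above hinges on encoding raw executable bytes as nucleotides so that genome sequence alignment tooling can extract conserved segments as family signatures. Below is a minimal sketch of one plausible encoding, two bits per nucleotide and four nucleotides per byte; this particular mapping is an assumption for illustration, not necessarily the encoding the paper uses.

```python
# Hedged sketch of a raw-byte -> nucleotide encoding. The 2-bits-per-symbol mapping
# is an illustrative assumption, not necessarily MAlign's exact scheme.
NUCLEOTIDES = "ACGT"

def bytes_to_nucleotides(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    out = []
    for b in data:
        for shift in (6, 4, 2, 0):
            out.append(NUCLEOTIDES[(b >> shift) & 0b11])
    return "".join(out)

# Example: the first bytes of a PE header become a short "DNA" string that
# off-the-shelf alignment tools can consume.
print(bytes_to_nucleotides(b"MZ\x90\x00"))  # -> CATCCCGGGCAAAAAA
```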
- [arXiv] Demystifying Behavior-Based Malware Detection at Endpoints. Yigitcan Kaya, Yizheng Chen, Shoumik Saha, Fabio Pierazzi, Lorenzo Cavallaro, David Wagner, and Tudor Dumitras. 2024.
Machine learning is widely used for malware detection in practice. Prior behavior-based detectors most commonly rely on traces of programs executed in controlled sandboxes. However, sandbox traces are unavailable to the last line of defense offered by security vendors: malware detection at endpoints. A detector at endpoints consumes the traces of programs running on real-world hosts, as sandbox analysis might introduce intolerable delays. Despite their success in sandboxes, research hints at potential challenges for ML methods at endpoints, e.g., highly variable malware behaviors. Nonetheless, the impact of these challenges on existing approaches, and how well their excellent sandbox performance translates to the endpoint scenario, remain unquantified. We present the first measurement study of the performance of ML-based malware detectors at real-world endpoints. Leveraging a dataset of sandbox traces and a dataset of in-the-wild program traces, we evaluate two scenarios where the endpoint detector was trained on (i) sandbox traces (convenient and accessible) and (ii) endpoint traces (less accessible, as they require collecting telemetry data). This allows us to identify a wide gap between prior methods’ sandbox-based detection performance (over 90%) and their endpoint performance (below 20% and 50% in (i) and (ii), respectively). We pinpoint and characterize the challenges contributing to this gap, such as label noise, behavior variability, and sandbox evasion. To close this gap, we propose techniques that yield a relative improvement of 5-30% over the baselines. Our evidence suggests that applying detectors trained on sandbox data to endpoint detection (scenario (i)) is challenging. The most promising direction is training detectors on endpoint data (scenario (ii)), which marks a departure from widespread practice. We implement a leaderboard for realistic detector evaluations to promote research.
@misc{kaya2024demystifyingbehaviorbasedmalwaredetection,
  title         = {Demystifying Behavior-Based Malware Detection at Endpoints},
  author        = {Kaya, Yigitcan and Chen, Yizheng and Saha, Shoumik and Pierazzi, Fabio and Cavallaro, Lorenzo and Wagner, David and Dumitras, Tudor},
  year          = {2024},
  eprint        = {2405.06124},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CR},
  url           = {https://arxiv.org/abs/2405.06124},
}
2023
- [ICLR] DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness. Shoumik Saha, Wenxiao Wang, Yigitcan Kaya, Soheil Feizi, and Tudor Dumitras. 2023.
Machine Learning (ML) models have been used for malware detection for over two decades. This has ignited an ongoing arms race between malware authors and antivirus systems, compelling researchers to propose defenses that protect malware-detection models against evasion attacks. However, most, if not all, existing defenses against evasion attacks suffer from sizable performance degradation and/or can defend against only specific attacks, which makes them less practical in real-world settings. In this work, we develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. After showing how DRSM is theoretically robust against attacks with contiguous adversarial bytes, we verify its performance and certified robustness experimentally, where we observe only marginal accuracy drops as the cost of robustness. To our knowledge, we are the first to offer certified robustness in the realm of static detection of malware executables. More surprisingly, by evaluating DRSM against 9 empirical attacks of different types, we observe that the proposed defense is empirically robust to some extent against a diverse set of attacks, some of which even fall outside the scope of its original threat model. In addition, we collected 15.5K recent benign raw executables from diverse sources, which will be made public as a dataset called PACE (Publicly Accessible Collection(s) of Executables), to alleviate the scarcity of publicly available benign datasets for studying malware detection and to provide future research with more representative data of the time.
@misc{saha2023drsmderandomizedsmoothingmalware,
  title         = {DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness},
  author        = {Saha, Shoumik and Wang, Wenxiao and Kaya, Yigitcan and Feizi, Soheil and Dumitras, Tudor},
  year          = {2023},
  eprint        = {2303.13372},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CR},
  url           = {https://arxiv.org/abs/2303.13372},
}
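The window ablation scheme in the DRSM abstract classifies an executable's byte windows independently and aggregates votes, so that a contiguous run of adversarial bytes can only flip a bounded number of votes. The sketch below shows that voting-and-margin idea under simplified assumptions: the window size, the base_classifier interface, and the conservative certificate rule are placeholders rather than the exact DRSM construction.

```python
# Hedged sketch of window ablation with majority voting for a raw-byte classifier.
# Window size, classifier interface, and certificate rule are simplified assumptions,
# not the exact DRSM construction.
from collections import Counter

def smoothed_predict(raw_bytes: bytes, base_classifier, window_size=512):
    """Classify each window independently; return the majority label and vote margin."""
    votes = Counter()
    for start in range(0, len(raw_bytes), window_size):
        votes[base_classifier(raw_bytes[start:start + window_size])] += 1
    (top_label, top_votes), *rest = votes.most_common()
    runner_up = rest[0][1] if rest else 0
    return top_label, top_votes - runner_up

def is_certified(margin: int, patch_len: int, window_size: int) -> bool:
    """A contiguous adversarial patch of patch_len bytes touches at most
    patch_len // window_size + 2 aligned windows (a conservative bound), so the
    majority vote cannot change if the margin exceeds twice that count."""
    touched = patch_len // window_size + 2
    return margin > 2 * touched

# Toy usage with a stand-in classifier (crude byte heuristic, purely illustrative).
toy_clf = lambda w: "malware" if w.count(0xCC) > 10 else "benign"
label, margin = smoothed_predict(b"\x00" * 4096, toy_clf)
print(label, margin, is_certified(margin, patch_len=600, window_size=512))
```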
- [arXiv] Contrastive Self-Supervised Learning Based Approach for Patient Similarity: A Case Study on Atrial Fibrillation Detection from PPG Signal. Subangkar Karmaker Shanto, Shoumik Saha, Atif Hasan Rahman, Mohammad Mehedy Masud, and Mohammed Eunus Ali. 2023.
@misc{shanto2023contrastiveselfsupervisedlearningbased,
  title         = {Contrastive Self-Supervised Learning Based Approach for Patient Similarity: A Case Study on Atrial Fibrillation Detection from PPG Signal},
  author        = {Shanto, Subangkar Karmaker and Saha, Shoumik and Rahman, Atif Hasan and Masud, Mohammad Mehedy and Ali, Mohammed Eunus},
  year          = {2023},
  eprint        = {2308.02433},
  archiveprefix = {arXiv},
  primaryclass  = {eess.SP},
  url           = {https://arxiv.org/abs/2308.02433},
}