Shoumik Saha

CS Ph.D. Student

UMD College Park

smksaha@umd.edu

I am a fourth-year Computer Science Ph.D. student at the University of Maryland, College Park, where I am fortunate to be advised by Prof. Soheil Feizi. I spent both this summer and last summer as an Applied Scientist Intern at Amazon AWS. My research journey began with machine learning for security, particularly malware detection. Over time, my interests have shifted toward security and reliability in machine learning itself. These days, I focus on improving the robustness and reliability of generative AI and AI agents.

If you check out my CV, you’ll see a consistent theme: I enjoy exploring challenges from both sides of the coin (attack and defense, red team and blue team, or however you’d like to frame it). Sound interesting? Feel free to reach out to discuss my research or potential collaborations!

Before joining the Ph.D. program, I earned my B.Sc. from Bangladesh University of Engineering and Technology (BUET). I then gained valuable experience as a full-time lecturer at United International University and as a part-time research assistant in a research lab at BUET.

news

May 27, 2025	Joined Amazon AWS as an Applied Scientist Intern
May 16, 2025	Research got featured in The New York Times!
May 15, 2025	Paper on AI-text Detection accepted to ACL 2025!
Jan 19, 2025	Paper (co-authored) accepted to SaTML 2025!
Dec 20, 2024	Completed all Ph.D. coursework (Fall 2024)

selected publications

arXiv
Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks

Shoumik Saha, Jifan Chen, Sam Mayers, Sanjay Krishna Gouda, Zijian Wang, and Varun Kumar

arXiv preprint arXiv:2510.01359, 2025

Code-capable large language model (LLM) agents are increasingly embedded in software engineering workflows where they can read, write, and execute code, raising the stakes of safety-bypass ("jailbreak") attacks beyond text-only settings. Prior evaluations emphasize refusal or harmful-text detection, leaving open whether agents actually compile and run malicious programs. We present JAWS-BENCH (Jailbreaks Across WorkSpaces), a benchmark spanning three escalating workspace regimes that mirror attacker capability: empty (JAWS-0), single-file (JAWS-1), and multi-file (JAWS-M). We pair this with a hierarchical, executable-aware Judge Framework that tests (i) compliance, (ii) attack success, (iii) syntactic correctness, and (iv) runtime executability, moving beyond refusal to measure deployable harm. Using seven LLMs from five families as backends, we find that under prompt-only conditions in JAWS-0, code agents accept 61% of attacks on average; 58% are harmful, 52% parse, and 27% run end-to-end. Moving to the single-file regime in JAWS-1 drives compliance to 100% for capable models and yields a mean attack success rate (ASR) of 71%; the multi-file regime (JAWS-M) raises mean ASR to 75%, with 32% instantly deployable attack code. Across models, wrapping an LLM in an agent substantially increases vulnerability (ASR rises 1.6x) because initial refusals are frequently overturned during later planning/tool-use steps. Category-level analyses identify which attack classes are most vulnerable and most readily deployable, while others exhibit large execution gaps. These findings motivate execution-aware defenses, code-contextual safety filters, and mechanisms that preserve refusal decisions throughout the agent’s multi-step reasoning and tool use.
@article{saha2025breaking, title = {Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks}, author = {Saha, Shoumik and Chen, Jifan and Mayers, Sam and Gouda, Sanjay Krishna and Wang, Zijian and Kumar, Varun}, journal = {arXiv preprint arXiv:2510.01359}, year = {2025}, url = {https://arxiv.org/pdf/2510.01359}, }
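The hierarchical judge described in this abstract can be pictured as a cascade: a response only counts toward a later stage if it passed every earlier one. The sketch below is a minimal illustration under stated assumptions, not the JAWS-BENCH implementation. The compliance, attack-success, and execution checks are hypothetical callables standing in for LLM judges and a sandbox; only the syntax stage is concrete, using Python's `ast.parse`. The helper `stage_rates` (also hypothetical) turns per-sample verdicts into a funnel of the kind reported above (accepted, harmful, parses, runs).

```python
import ast
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Stages of a hypothetical hierarchical judge, in order; a sample only
# reaches a stage if it passed every earlier one.
STAGES = ("compliance", "attack_success", "syntax", "executable")


@dataclass
class Verdict:
    deepest_stage: Optional[str]  # last stage passed, or None if the agent refused


def parses_as_python(code: str) -> bool:
    """Syntactic-correctness stage: does the generated code parse as Python?"""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):  # ValueError: null bytes on older Python versions
        return False


def judge(
    response: str,
    code: str,
    complied: Callable[[str], bool],      # hypothetical LLM judge: did the agent comply?
    harmful: Callable[[str, str], bool],  # hypothetical LLM judge: is the attack goal achieved?
    runs: Callable[[str], bool],          # hypothetical sandboxed runtime check
) -> Verdict:
    checks = (
        lambda: complied(response),
        lambda: harmful(response, code),
        lambda: parses_as_python(code),
        lambda: runs(code),
    )
    deepest = None
    for stage, check in zip(STAGES, checks):
        if not check():
            break  # later stages are never evaluated
        deepest = stage
    return Verdict(deepest)


def stage_rates(verdicts: List[Verdict]) -> Dict[str, float]:
    """Fraction of samples that passed each stage (a non-increasing funnel)."""
    depth = [-1 if v.deepest_stage is None else STAGES.index(v.deepest_stage) for v in verdicts]
    return {stage: sum(d >= i for d in depth) / len(verdicts) for i, stage in enumerate(STAGES)}
```

Because evaluation stops at the first failure, the per-stage rates can only shrink from one stage to the next, which is what separates "agreed to help" from "produced code that actually runs".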
ACL
Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing

Shoumik Saha and Soheil Feizi

ACL (Association for Computational Linguistics), 2025

The growing use of large language models (LLMs) for text generation has led to widespread concerns about AI-generated content detection. However, an overlooked challenge is AI-polished text, where human-written content undergoes subtle refinements using AI tools. This raises a critical question: should minimally polished text be classified as AI-generated? Such classification can lead to false plagiarism accusations and misleading claims about AI prevalence in online content. In this study, we systematically evaluate twelve state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation (APT-Eval) dataset, which contains 15K samples refined at varying AI-involvement levels. Our findings reveal that detectors frequently flag even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models. These limitations highlight the urgent need for more nuanced detection methodologies.
@article{saha2025almost, title = {Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing}, author = {Saha, Shoumik and Feizi, Soheil}, journal = {ACL (Association for Computational Linguistics)}, year = {2025}, url = {https://arxiv.org/abs/2502.15666}, data = {https://huggingface.co/datasets/smksaha/apt-eval} }
NeurIPS
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text

Yize Cheng, Vinu Sankar Sadasivan, Mehrdad Saberi, Shoumik Saha, and Soheil Feizi

NeurIPS (Conference on Neural Information Processing Systems), 2025

The increasing capabilities of Large Language Models (LLMs) have raised concerns about their misuse in AI-generated plagiarism and social engineering. While various AI-generated text detectors have been proposed to mitigate these risks, many remain vulnerable to simple evasion techniques such as paraphrasing. However, recent detectors have shown greater robustness against such basic attacks. In this work, we introduce Adversarial Paraphrasing, a training-free attack framework that universally humanizes any AI-generated text to evade detection more effectively. Our approach leverages an off-the-shelf instruction-following LLM to paraphrase AI-generated content under the guidance of an AI text detector, producing adversarial examples that are specifically optimized to bypass detection. Extensive experiments show that our attack is both broadly effective and highly transferable across several detection systems. For instance, a simple paraphrasing attack ironically increases the true positive rate at a 1% false positive rate (T@1%F) by 8.57% on RADAR and 15.03% on Fast-DetectGPT, whereas adversarial paraphrasing guided by OpenAI-RoBERTa-Large reduces T@1%F by 64.49% on RADAR and by a striking 98.96% on Fast-DetectGPT. Across a diverse set of detectors, including neural network-based, watermark-based, and zero-shot approaches, our attack achieves an average T@1%F reduction of 87.88% under the guidance of OpenAI-RoBERTa-Large. We also analyze the tradeoff between text quality and attack success and find that our method can significantly reduce detection rates with, in most cases, only slight degradation in text quality. Our adversarial setup highlights the need for more robust and resilient detection strategies in light of increasingly sophisticated evasion techniques.
@article{cheng2025adversarial, title = {Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text}, author = {Cheng, Yize and Sadasivan, Vinu Sankar and Saberi, Mehrdad and Saha, Shoumik and Feizi, Soheil}, journal = {NeurIPS (Conference on Neural Information Processing Systems)}, year = {2025}, url = {https://arxiv.org/abs/2506.07001}, }
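T@1%F, the metric quoted in this abstract, is the true positive rate on AI-generated text at the detector threshold that keeps the false positive rate on human-written text at 1%. A minimal sketch, assuming detector scores where higher means "more likely AI-generated" and picking the threshold empirically from a sample of human-written texts (the function name `tpr_at_fpr` is illustrative, not from the paper's code):

```python
from typing import Sequence


def tpr_at_fpr(human_scores: Sequence[float], ai_scores: Sequence[float],
               target_fpr: float = 0.01) -> float:
    """TPR on AI-generated texts at an empirical threshold with FPR <= target_fpr on human texts.

    Assumes higher scores mean "more likely AI-generated" and 0 <= target_fpr < 1.
    """
    human = sorted(human_scores, reverse=True)
    allowed_fp = int(target_fpr * len(human))  # number of human texts we may misflag
    # Flag only scores strictly above this human score, so at most
    # `allowed_fp` human texts end up above the threshold.
    threshold = human[allowed_fp]
    return sum(s > threshold for s in ai_scores) / len(ai_scores)
```

An evasion attack lowers T@1%F by pushing AI-text scores below this threshold; since the threshold is calibrated on human text only, the attack does not move it.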
EMNLP
ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision-Language Model Performance

Kazi Tasnim Zinat, Saad Mohammad Abrar, Shoumik Saha, Sharmila Duppala, Saimadhav Naga Sakhamuri, and Zhicheng Liu

EMNLP (Empirical Methods in Natural Language Processing), 2025

Vision-Language Models (VLMs) have shown both impressive capabilities and notable failures in data visualization understanding tasks, but we have limited understanding of how specific properties within a visualization type affect model performance. We present ProcVQA, a benchmark designed to analyze how VLM performance can be affected by the structure type and structural density of visualizations depicting frequent patterns mined from sequence data. ProcVQA consists of mined process visualizations spanning three structure types (linear sequences, trees, and graphs) with varying levels of structural density (quantified using the number of nodes and edges), with expert-validated QA pairs on these visualizations. We evaluate 21 proprietary and open-source models on two major tasks: visual data extraction (VDE) and visual question answering (VQA) with four categories of questions. Our analysis reveals three key findings. First, models exhibit steep performance drops on multi-hop reasoning, with question type and structure type impacting the degradation. Second, structural density strongly affects VDE performance: hallucinations and extraction errors increase with edge density, even in frontier models. Third, extraction accuracy does not necessarily translate into strong reasoning ability. By isolating structural factors through controlled visualization generation, ProcVQA enables precise identification of VLM limitations.
@article{zinatprocvqa, title = {ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision-Language Model Performance}, author = {Zinat, Kazi Tasnim and Abrar, Saad Mohammad and Saha, Shoumik and Duppala, Sharmila and Sakhamuri, Saimadhav Naga and Liu, Zhicheng}, journal = {EMNLP (Empirical Methods in Natural Language Processing)}, year = {2025}, url = {https://mail.hdi.cs.umd.edu/papers/ProcVQA_EMNLP25.pdf}, }
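The two structural axes in ProcVQA, structure type and density measured by node and edge counts, can be illustrated with a small classifier over a directed edge list. This is a hypothetical sketch, not part of the benchmark's code: it labels a process graph `linear` if it is a single rooted path, `tree` if every node except one root has exactly one parent and all nodes are reachable from that root, and `graph` otherwise. Nodes are inferred from the edges, so isolated nodes are not represented.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def structure_stats(edges: Iterable[Tuple[str, str]]) -> Tuple[str, int, int]:
    """Return (structure type, #nodes, #edges) for a directed process graph given as edges."""
    edges = list(edges)
    nodes = {n for edge in edges for n in edge}  # isolated nodes cannot appear
    indeg, outdeg, children = defaultdict(int), defaultdict(int), defaultdict(list)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        children[u].append(v)

    roots = [n for n in nodes if indeg[n] == 0]
    is_tree = (len(roots) == 1 and len(edges) == len(nodes) - 1
               and all(indeg[n] <= 1 for n in nodes))
    if is_tree:
        # Rule out a detached cycle: every node must be reachable from the root.
        seen, stack = {roots[0]}, [roots[0]]
        while stack:
            for child in children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        is_tree = len(seen) == len(nodes)

    if is_tree and all(outdeg[n] <= 1 for n in nodes):
        kind = "linear"
    elif is_tree:
        kind = "tree"
    else:
        kind = "graph"
    return kind, len(nodes), len(edges)
```

Holding the structure type fixed while varying the returned node and edge counts is one way to picture the kind of controlled density sweep the abstract describes.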
NeurIPS
LLM-Check: Investigating Detection of Hallucinations in Large Language Models

Gaurang Sriramanan, Siddhant Bharti, Vinu Sankar Sadasivan, Shoumik Saha, Priyatham Kattakinda, and Soheil Feizi

NeurIPS (Conference on Neural Information Processing Systems), 2024

While Large Language Models (LLMs) have become immensely popular due to their outstanding performance on a broad range of tasks, these models are prone to producing hallucinations: outputs that are fallacious or fabricated yet often appear plausible or tenable at a glance. In this paper, we conduct a comprehensive investigation into the nature of hallucinations within LLMs and explore effective techniques for detecting such inaccuracies in various real-world settings. Prior approaches to detecting hallucinations in LLM outputs, such as consistency checks or retrieval-based methods, typically assume access to multiple model responses or large databases. These techniques, however, tend to be computationally expensive in practice, limiting their applicability to real-time analysis. In contrast, we seek to identify hallucinations within a single response in both white-box and black-box settings by analyzing the internal hidden states, attention maps, and output prediction probabilities of an auxiliary LLM. We also study hallucination detection in scenarios where ground-truth references are available, such as Retrieval-Augmented Generation (RAG). We demonstrate that the proposed detection methods are extremely compute-efficient, with speedups of up to 45x and 450x over other baselines, while achieving significant improvements in detection performance over diverse datasets.
@inproceedings{sriramanan2024llmcheck, title = {{LLM}-Check: Investigating Detection of Hallucinations in Large Language Models}, author = {Sriramanan, Gaurang and Bharti, Siddhant and Sadasivan, Vinu Sankar and Saha, Shoumik and Kattakinda, Priyatham and Feizi, Soheil}, booktitle = {NeurIPS (Conference on Neural Information Processing Systems)}, year = {2024}, url = {https://openreview.net/forum?id=LYx4w3CAgy}, }
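The abstract mentions scoring a single response using an auxiliary LLM's attention maps and output probabilities. The sketch below computes three illustrative signals of that kind from plain Python lists: perplexity and mean token entropy from the output distributions, and the log-determinant of a causal attention map, which for a lower-triangular matrix reduces to the sum of its log-diagonal entries. These are assumptions for illustration; the exact scores, layers, and aggregation used in LLM-Check may differ, and any decision threshold would need calibration on labeled data.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability of the tokens actually generated."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


def mean_entropy(distributions: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) of the next-token distributions."""
    def entropy(dist: Sequence[float]) -> float:
        return -sum(p * math.log(p) for p in dist if p > 0)
    return sum(entropy(d) for d in distributions) / len(distributions)


def causal_attention_logdet(attn: Sequence[Sequence[float]]) -> float:
    """Log-determinant of a lower-triangular (causal) attention map.

    The determinant of a triangular matrix is the product of its diagonal,
    so the log-determinant is the sum of the log-diagonal entries.
    """
    return sum(math.log(attn[i][i]) for i in range(len(attn)))
```

Each function needs only one forward pass worth of quantities, which is the property that makes single-response scoring cheap compared with sampling many responses.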
ICML
Fast Adversarial Attacks on Language Models In One GPU Minute

Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan, Priyatham Kattakinda, Atoosa Chegini, and Soheil Feizi

ICML (International Conference on Machine Learning), 2024

In this paper, we introduce a novel class of fast, beam search-based adversarial attacks (BEAST) for Language Models (LMs). BEAST employs interpretable parameters, enabling attackers to balance attack speed, success rate, and the readability of adversarial prompts. The computational efficiency of BEAST allows us to investigate its applications on LMs for jailbreaking, eliciting hallucinations, and privacy attacks. Our gradient-free targeted attack can jailbreak aligned LMs with high attack success rates within one minute. For instance, using a single Nvidia RTX A6000 48GB GPU, BEAST can jailbreak Vicuna-7B-v1.5 in under one minute with a success rate of 89%, whereas a gradient-based baseline takes over an hour to achieve a 70% success rate. Additionally, we discover a unique outcome wherein our untargeted attack induces hallucinations in LM chatbots. Through human evaluations, we find that our untargeted attack causes Vicuna-7B-v1.5 to produce 15% more incorrect outputs than it does in the absence of our attack. We also find that 22% of the time, BEAST causes Vicuna to generate outputs that are not relevant to the original prompt. Further, we use BEAST to generate adversarial prompts in a few seconds that can boost the performance of existing membership inference attacks for LMs. We believe that our fast attack, BEAST, has the potential to accelerate research in LM security and privacy.
@misc{sadasivan2024fastadversarialattackslanguage, title = {Fast Adversarial Attacks on Language Models In One GPU Minute}, author = {Sadasivan, Vinu Sankar and Saha, Shoumik and Sriramanan, Gaurang and Kattakinda, Priyatham and Chegini, Atoosa and Feizi, Soheil}, journal = {ICML (International Conference on Machine Learning)}, year = {2024}, eprint = {2402.15570}, archiveprefix = {arXiv}, primaryclass = {cs.CR}, url = {https://arxiv.org/abs/2402.15570}, }
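Beam search over adversarial tokens can be done without gradients: at each step every beam is extended with a few candidate tokens, and only the best-scoring extensions survive. The sketch below assumes two hypothetical callables, `propose` (in an LM attack it would draw candidate next tokens from the model itself, which helps readability) and `loss` (the attack objective, e.g. the negative log-likelihood of a target response). The parameters `k1` (beam width) and `k2` (candidates per beam) are the kind of interpretable knobs the abstract mentions; this is an illustration, not the released BEAST code.

```python
from typing import Callable, List, Sequence

Tokens = List[str]


def beam_search_attack(
    propose: Callable[[Tokens, int], Sequence[str]],  # hypothetical: k2 candidate next tokens for a prefix
    loss: Callable[[Tokens], float],                  # hypothetical attack objective (lower is better)
    k1: int,      # beam width: how many partial suffixes survive each step
    k2: int,      # candidates proposed per beam at each step
    length: int,  # number of adversarial tokens to generate
) -> Tokens:
    beams: List[Tokens] = [[]]
    for _ in range(length):
        # Expand every beam with its proposed tokens (at most k1 * k2 candidates).
        candidates = [beam + [tok] for beam in beams for tok in propose(beam, k2)]
        # Gradient-free: ranking needs only forward evaluations of the objective.
        candidates.sort(key=loss)
        beams = candidates[:k1]
    return beams[0]
```

Larger `k1` and `k2` evaluate more candidates per step (roughly `k1 * k2` objective calls), trading speed for a better chance of finding a low-loss suffix.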
ICLR
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness

Shoumik Saha, Wenxiao Wang, Yigitcan Kaya, Soheil Feizi, and Tudor Dumitras

2023

Machine Learning (ML) models have been used for malware detection for over two decades. This has ignited an ongoing arms race between malware authors and antivirus systems, compelling researchers to propose defenses for malware-detection models against evasion attacks. However, most if not all existing defenses against evasion attacks suffer from sizable performance degradation and/or can defend against only specific attacks, which makes them less practical in real-world settings. In this work, we develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. After showing how DRSM is theoretically robust against attacks with contiguous adversarial bytes, we verify its performance and certified robustness experimentally, observing only marginal accuracy drops as the cost of robustness. To our knowledge, we are the first to offer certified robustness in the realm of static detection of malware executables. More surprisingly, by evaluating DRSM against 9 empirical attacks of different types, we observe that the proposed defense is empirically robust to some extent against a diverse set of attacks, some of which even fall outside the scope of its original threat model. In addition, we collected 15.5K recent benign raw executables from diverse sources, which will be made public as a dataset called PACE (Publicly Accessible Collection(s) of Executables) to alleviate the scarcity of publicly available benign datasets for studying malware detection and to provide future research with more representative, up-to-date data.
@misc{saha2023drsmderandomizedsmoothingmalware, title = {DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness}, author = {Saha, Shoumik and Wang, Wenxiao and Kaya, Yigitcan and Feizi, Soheil and Dumitras, Tudor}, journal = {ICLR (International Conference on Learning Representations)}, year = {2023}, eprint = {2303.13372}, archiveprefix = {arXiv}, primaryclass = {cs.CR}, url = {https://arxiv.org/abs/2303.13372}, }
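The certification argument behind window ablation fits in a few lines: split the byte sequence into non-overlapping windows, classify each window separately, and take a majority vote. Assuming the adversary overwrites (rather than inserts) one contiguous run of `patch_len` bytes, so window boundaries do not shift, the run touches at most ceil(patch_len / w) + 1 windows, and each touched window can shrink the vote margin by at most 2. The base classifier below is a hypothetical callable standing in for MalConv; window size, tie handling, and the exact threat model in DRSM may differ from this sketch.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def split_windows(data: bytes, w: int) -> List[bytes]:
    """Non-overlapping, contiguous windows of w bytes (the last one may be shorter)."""
    return [data[i:i + w] for i in range(0, len(data), w)]


def smoothed_predict(data: bytes, w: int, base: Callable[[bytes], int]) -> Tuple[int, int]:
    """Majority vote of the base classifier over windows; returns (label, vote margin).

    Assumes non-empty input so that at least one window exists.
    """
    votes = Counter(base(win) for win in split_windows(data, w)).most_common()
    top_label, top_count = votes[0]
    runner_up = votes[1][1] if len(votes) > 1 else 0
    return top_label, top_count - runner_up


def is_certified(margin: int, patch_len: int, w: int) -> bool:
    """True if no overwrite of patch_len contiguous bytes can flip the vote.

    Such a patch touches at most ceil(patch_len / w) + 1 windows, and each touched
    window can move one vote from the winner to a rival, shrinking the margin by 2.
    """
    touched = math.ceil(patch_len / w) + 1
    return margin > 2 * touched
```

The strict inequality is deliberately conservative: a margin of exactly `2 * touched` could end in a tie, so it is not counted as certified.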
Computers & Security
MAlign: Explainable static raw-byte based malware family classification using sequence alignment

Shoumik Saha, Sadia Afroz, and Atif Hasan Rahman

Computers & Security, 2024

For a long time, malware classification and analysis have been an arms race between antivirus systems and malware authors. Though static analysis is vulnerable to evasion techniques, it is still popular as the first line of defense in antivirus systems. However, most static analyzers have failed to gain the trust of practitioners due to their black-box nature. We propose MAlign, a novel static malware family classification approach inspired by genome sequence alignment that can not only classify malware families but also provide explanations for its decisions. MAlign encodes raw bytes using nucleotides and adopts genome sequence alignment approaches to create a signature of a malware family based on the conserved code segments in that family, without any human labor or expertise. We evaluate MAlign on two malware datasets, and it outperforms other state-of-the-art machine learning-based malware classifiers (by 0.07% to 4.49%), especially on small datasets (by 1.2% to 19.48%). Furthermore, we explain the signatures that MAlign generates for different malware families, illustrating the kinds of insights it can provide to analysts, and show its efficacy as an analysis tool. Additionally, we evaluate its theoretical and empirical robustness against some common attacks. In this paper, we approach static malware analysis from a unique perspective, aiming to strike a delicate balance among performance, interpretability, and robustness.
@article{SAHA2024103714, title = {MAlign: Explainable static raw-byte based malware family classification using sequence alignment}, journal = {Computers & Security}, volume = {139}, pages = {103714}, year = {2024}, issn = {0167-4048}, doi = {https://doi.org/10.1016/j.cose.2024.103714}, url = {https://www.sciencedirect.com/science/article/pii/S0167404824000154}, author = {Saha, Shoumik and Afroz, Sadia and Rahman, Atif Hasan}, keywords = {Malware, Sequence alignment, Explainability, Machine learning, Adversarial}, }
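To make the nucleotide idea concrete: a byte has 8 bits, so it can be written as four 2-bit symbols, that is, four bases from {A, C, G, T}, after which standard sequence-alignment scoring applies. The mapping below (00→A, 01→C, 10→G, 11→T) is an assumption for illustration, and Needleman-Wunsch global alignment with a linear gap penalty is a textbook stand-in; MAlign's actual pipeline builds family signatures from conserved segments and may use a different encoding and different alignment tools.

```python
from typing import List

# Assumed mapping: each 2-bit chunk of a byte becomes one base (4 bases per byte).
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    out: List[str] = []
    for byte in data:
        for shift in (6, 4, 2, 0):  # most significant bit pair first
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch score with a linear gap penalty, O(len(a) * len(b)) time."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

Only two DP rows are kept, so memory grows with the shorter dimension rather than the full matrix; recovering the aligned segments themselves would require the full traceback matrix.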
IEEE SaTML
ML-Based Behavioral Malware Detection Is Far From a Solved Problem

Yigitcan Kaya, Yizheng Chen, Marcus Botacin, Shoumik Saha, Fabio Pierazzi, Lorenzo Cavallaro, David Wagner, and Tudor Dumitras

IEEE SaTML (Conference on Secure and Trustworthy Machine Learning), 2025

Malware detection is a ubiquitous application of Machine Learning (ML) in security. In behavioral malware analysis, the detector relies on features extracted from program execution traces. The research literature has focused on detectors trained with features collected from sandbox environments and evaluated on samples also analyzed in a sandbox. However, in deployment, a malware detector at endpoint hosts often must rely on traces captured from endpoint hosts, not from a sandbox. Thus, there is a gap between the literature and real-world needs. We present the first measurement study of the performance of ML-based malware detectors at real-world endpoints. Leveraging a dataset of sandbox traces and a dataset of in-the-wild program traces, we evaluate two scenarios: (i) an endpoint detector trained on sandbox traces (convenient and easy to train), and (ii) an endpoint detector trained on endpoint traces (more challenging to train, since we need to collect telemetry data). We discover a wide gap between the performance measured using prior evaluation methods in the literature (over 90%) and the expected performance in endpoint detection (about 20% in scenario (i) to 50% in scenario (ii)). We characterize the ML challenges that arise in this domain and contribute to this gap, including label noise, distribution shift, and spurious features. Moreover, we show several techniques that achieve 5–30% relative performance improvements over the baselines. Our evidence suggests that applying detectors trained on sandbox data to endpoint detection is challenging. The most promising direction is training detectors directly on endpoint data, which marks a departure from current practice. To promote progress, we will enable researchers to perform realistic detector evaluations against our real-world dataset.
@misc{kaya2025mlbasedbehavioralmalwaredetection, title = {ML-Based Behavioral Malware Detection Is Far From a Solved Problem}, author = {Kaya, Yigitcan and Chen, Yizheng and Botacin, Marcus and Saha, Shoumik and Pierazzi, Fabio and Cavallaro, Lorenzo and Wagner, David and Dumitras, Tudor}, year = {2025}, eprint = {2405.06124}, archiveprefix = {arXiv}, primaryclass = {cs.CR}, url = {https://arxiv.org/abs/2405.06124}, }