Trustworthy Retrieval-Augmented Generation for Adversarially Robust Medical Large Language Model Agents

Shane Bush; Leonard Bamos

Authors

Shane Bush, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, USA.
Leonard Bamos, Department of Computer Science, University of Houston, Houston, TX, USA.

Keywords:

Retrieval-Augmented Generation; Medical AI Agents; Adversarial Robustness; Trustworthiness; Large Language Models; Clinical Decision Support; System Governance

Abstract

The rapid integration of large language models into clinical decision support has created a new class of medical artificial intelligence agents capable of synthesizing vast medical knowledge, yet their safe deployment is undermined by adversarial threats and unresolved trustworthiness gaps. This paper examines the design space for retrieval-augmented generation architectures that serve as the cognitive core of these agents, focusing on the interplay between adversarial robustness, clinical reliability, and system-level governance. We argue that simply inserting a retrieval component into a generative model does not inherently confer trustworthiness; rather, it introduces a complex sociotechnical surface that adversaries can exploit through data poisoning, prompt injection, and corpus manipulation. Starting from a system-of-systems perspective, we dissect the layered infrastructure of medical large language model agents, analyze the emergent adversarial attack taxonomy specific to retrieval-augmented generation, and articulate architectural principles for embedding robustness without sacrificing clinical performance. The discussion extends to structural trade-offs involving latency, fairness, sustainability, and regulatory alignment, emphasizing that trustworthiness cannot be localized to a single model module but must be distributed across data governance, retrieval curation, model alignment, and human oversight. Policy implications for medical device regulation, continuous monitoring, and multi-stakeholder accountability are explored. By synthesizing cross-domain insights from machine learning security, health informatics, and infrastructure studies, we outline a holistic framework for building adversarially resilient medical agents that retain the flexibility of retrieval-augmented generation while upholding rigorous safety standards.

