Explainable Medical Decision-Making through Adversarially Hardened LLM Agents and Semantic-Aware Multi-Label Image Hash Retrieval
Keywords:
explainable AI, large language models, adversarial robustness, medical image retrieval, semantic hashing, multi-label learning, clinical decision support, agentic systems, health informatics
Abstract
The rapid adoption of large language models (LLMs) in clinical environments has introduced unprecedented opportunities for decision support, yet it has simultaneously magnified concerns surrounding adversarial vulnerability, explainability, and the scalable retrieval of multimodal evidence. This paper presents a comprehensive systems-level analysis of an integrated framework that combines adversarially hardened LLM agents with semantic-aware multi-label image hash retrieval to deliver robust and interpretable medical decision-making. The discussion moves beyond algorithmic novelty to examine the structural trade-offs inherent in deploying such a hybrid system within real-world healthcare infrastructures. We explore how adversarial hardening techniques, including input purification, representation smoothing, and constrained decoding, can be embedded into the agent architecture without undermining clinical fluency or diagnostic accuracy. In parallel, we investigate the role of deep semantic hashing that preserves multi-label diagnostic relationships, enabling efficient similarity-preserving retrieval of medical images from large-scale repositories while offering traceable evidence paths for model recommendations. The interplay between the LLM agent and the retrieval engine is analyzed through the lenses of latency, memory footprint, trust calibration, and failure mode containment. Special attention is devoted to the governance challenges that arise when explainability mechanisms must satisfy both algorithmic transparency requirements and the epistemic needs of heterogeneous clinical stakeholders. The paper further interrogates fairness considerations in multi-label retrieval when disease prevalence distributions are imbalanced, and outlines deployment pathways that account for data sovereignty, model updating cadence, and energy sustainability. 
By synthesizing perspectives from adversarial machine learning, information retrieval, human-computer interaction, and health informatics, the work articulates a design philosophy in which explainability is not retrofitted but engineered as a first-class property of a secure, retrieval-augmented agentic system. The analysis concludes with policy-oriented reflections on certification, auditability, and cross-institutional governance for AI-enabled clinical decision support.
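To make the retrieval idea concrete, the following is a minimal sketch of similarity-preserving lookup over binary hash codes, returning the Hamming distance and diagnostic labels of each hit as a traceable evidence record. It assumes the hash codes have already been produced by some learned hashing model, which is not shown. The image identifiers, 8-bit codes, labels, and the function names `hamming` and `retrieve` are all hypothetical and chosen only for illustration; this is not the framework's actual implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index: image id -> (binary hash code, set of diagnostic labels).
# Real systems would use longer codes produced by a trained hashing network.
Index = Dict[str, Tuple[int, Set[str]]]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")


def retrieve(query_code: int, index: Index, k: int = 2) -> List[Tuple[str, int, Set[str]]]:
    """Return the k nearest images by Hamming distance.

    Each hit carries its distance and labels so that a downstream agent
    can cite the retrieved cases as evidence for a recommendation.
    Ties are broken by image id to keep the ranking deterministic.
    """
    ranked = sorted(
        index.items(),
        key=lambda item: (hamming(query_code, item[1][0]), item[0]),
    )
    return [
        (image_id, hamming(query_code, code), labels)
        for image_id, (code, labels) in ranked[:k]
    ]


# Illustrative data only (assumed, not taken from any dataset).
INDEX: Index = {
    "img_a": (0b10110010, {"cardiomegaly", "edema"}),
    "img_b": (0b10110110, {"cardiomegaly"}),
    "img_c": (0b01001101, {"pneumothorax"}),
}

hits = retrieve(0b10110011, INDEX, k=2)
for image_id, distance, labels in hits:
    print(image_id, distance, sorted(labels))
```

Because each hit exposes its distance and labels, the agent's recommendation can point back to specific retrieved cases rather than to an opaque similarity score. A linear scan is used here for clarity; a large repository would need a dedicated index structure.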
Copyright (c) 2026 International Journal of Clinical and Translational Medicine

This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



