Explainable AI Framework for Wearable Signal–Driven Medical Decision-Making Using Large Language Model Agents and PPG Representations

Malcolm Koskinen; Akshay R. Arora; Rendres May

Authors

Malcolm Koskinen, Department of Computer Science, Colorado State University, Fort Collins, CO, USA.
Akshay R. Arora, Department of Computer Science, University of New Hampshire, Durham, NH, USA.
Rendres May, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA.

Keywords:

Explainable AI, Large Language Model Agents, Photoplethysmography, Wearable Devices, Medical Decision-Making, Adversarial Robustness, Foundation Models

Abstract

The proliferation of wearable devices that generate continuous photoplethysmography (PPG) signals has created new opportunities for personalized clinical decision-making outside traditional healthcare settings. However, translating raw biosignals into trustworthy, actionable medical insights demands architectures that simultaneously handle physiological complexity, ensure interpretability, and remain robust under distribution shift and adversarial threats. This paper introduces an explainable artificial intelligence framework that couples PPG-specific foundation model representations with large language model (LLM) agents to produce context-aware, natural-language medical decision support. The framework employs a layered design in which self-supervised PPG encoders extract structured, semantically rich embeddings, a registry of LLM agents performs causal reasoning and evidence synthesis, and an integrated explainability layer generates counterfactual narratives, feature attributions, and uncertainty estimates. We analyze the system-level trade-offs among computational efficiency, latency, and explainability fidelity, and we discuss the essential role of human-in-the-loop governance in high-stakes environments. Adversarial robustness is addressed through input purification modules and agent-level security enhancements that mitigate prompt injection and representation manipulation. The paper also examines cross-domain implications by comparing PPG-based decision systems with electrocardiogram and imaging paradigms, highlighting the distinct challenges imposed by inter-subject variability, motion artifacts, and consumer-grade sensor noise. We further reflect on regulatory readiness, fairness across demographic groups, and the sustainability of deploying large-scale models in resource-constrained edge environments. Rather than proposing a single algorithm, this work contributes a comprehensive architectural blueprint and a critical discussion of the structural, ethical, and operational forces shaping the next generation of explainable, wearable-driven AI in medicine.

