Foundation Model–Driven Multimodal Health Retrieval: Integrating PPG Physiological Signals and Medical Imaging via Scalable Deep Hashing

Aarav Chopra; Sunil J. Gandhi; Yuetang Pan

Authors

Aarav Chopra, Department of Computer Science, University of Alabama at Birmingham, Birmingham, AL, USA.
Sunil J. Gandhi, Department of Computer Science, Binghamton University, Binghamton, NY, USA.
Yuetang Pan, School of Information Technology, University of Cincinnati, Cincinnati, OH, USA.

Keywords:

foundation model, multimodal health retrieval, photoplethysmography, medical imaging, deep hashing, self-supervised learning, scalable systems, fairness, governance

Abstract

The rapid proliferation of wearable sensors and medical imaging has created unprecedented volumes of heterogeneous health data, motivating the development of intelligent retrieval systems that can exploit both continuous physiological signals and visual diagnostic evidence. This paper presents a system-level investigation of foundation model–driven multimodal health retrieval that integrates photoplethysmogram (PPG) signals and medical imaging through scalable deep hashing. We argue that the confluence of self-supervised foundation models pre-trained on diverse physiological and imaging corpora, coupled with advanced asymmetric deep hashing techniques, offers a transformative pathway toward cross-modal semantic search in clinical environments. Departing from task-specific model training, the proposed architecture leverages modality-specific foundation models to produce unimodal embeddings that are then aligned in a shared latent space and encoded into compact binary hash codes via margin-scalable, self-supervised hashing mechanisms. The paper examines critical system dimensions including modular design, infrastructure requirements, computational sustainability, robustness against distributional shifts, fairness across demographic groups, and governance frameworks that reconcile performance with patient privacy and regulatory compliance. By synthesizing insights from medical AI, hashing theory, and socio-technical systems research, we delineate the structural trade-offs inherent in building a production-grade multimodal retrieval system for health analytics. The discussion further elaborates on deployment strategies that span edge–cloud continua, the necessity of continuous model monitoring, and the policy implications of embedding such retrieval engines within clinical decision support pipelines.
The paper advances a forward-looking perspective on how foundation model–driven hashing can unlock new classes of health applications while remaining attentive to the ethical and operational complexities of large-scale healthcare AI.

