Cross-Modal Deep Hashing Framework for Wearable PPG Signal Retrieval with Self-Supervised Semantic Representation Learning
Keywords:
deep hashing, photoplethysmography, self-supervised learning, cross-modal retrieval, wearable computing, semantic representation
Abstract
The proliferation of wearable photoplethysmography sensors has generated vast streams of cardiovascular data, creating an urgent need for efficient, semantic-aware retrieval mechanisms that can operate across heterogeneous contextual modalities. This paper presents a cross-modal deep hashing framework designed for PPG signal retrieval that integrates self-supervised semantic representation learning to extract robust, modality-invariant features from unlabeled physiological time series and associated metadata. The framework maps PPG segments and their corresponding semantic descriptors into a shared binary Hamming space, enabling fast approximate nearest neighbor search while preserving clinically meaningful similarities. A comprehensive system-level analysis is conducted, addressing architectural choices that balance quantization error against retrieval precision, the integration of contrastive and masked reconstruction objectives for representation learning, and the trade-offs inherent in deploying such models on resource-constrained wearable edge devices. The discussion extends to governance and policy considerations, including data privacy, fairness across demographic groups, and the sustainability of large-scale health retrieval infrastructures. By emphasizing structural robustness, bias mitigation, and cross-modal alignment, the proposed framework offers a principled pathway toward scalable, privacy-preserving, and equitable health monitoring systems. The paper concludes with an examination of deployment scenarios, evaluation benchmarks, and future directions for cross-modal biosignal retrieval in real-world healthcare ecosystems.
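As a rough illustration of the retrieval step described above, the following sketch shows how real-valued embeddings could be turned into binary codes and searched by Hamming distance. It is a minimal example under stated assumptions, not the paper's implementation: sign thresholding as the binarization rule, the function names `binarize`, `quantization_error`, and `hamming_search`, the 16-bit code length, and the random arrays standing in for encoder outputs are all hypothetical.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Map real-valued embeddings to {0, 1} codes by thresholding at zero.

    Assumption: sign thresholding; the paper may use a different rule.
    """
    return (embeddings >= 0).astype(np.uint8)


def quantization_error(embeddings: np.ndarray) -> float:
    """Mean squared gap between embeddings and their {-1, +1} codes."""
    codes = np.where(embeddings >= 0, 1.0, -1.0)
    return float(np.mean((embeddings - codes) ** 2))


def hamming_search(query_code: np.ndarray, db_codes: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k database codes nearest to the query.

    Distance is the number of differing bits; ties keep database order.
    """
    distances = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(distances, kind="stable")[:k]


if __name__ == "__main__":
    # Hypothetical stand-ins for encoder outputs (not real PPG features).
    rng = np.random.default_rng(0)
    ppg_embeddings = rng.standard_normal((5, 16))

    # A descriptor embedding assumed to be aligned with segment 2:
    # scaling by a positive factor keeps every sign, so the codes match.
    descriptor_embedding = 0.5 * ppg_embeddings[2]

    db_codes = binarize(ppg_embeddings)
    query_code = binarize(descriptor_embedding)

    print("nearest segments:", hamming_search(query_code, db_codes, k=3))
    print("quantization error:", quantization_error(ppg_embeddings))
```

The quantization error here measures how far the continuous embeddings sit from their binary targets, which is the quantity the abstract describes as being balanced against retrieval precision. A production system would typically pack the bits and use popcount instructions rather than comparing unpacked arrays.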
License
Copyright (c) 2026 International Journal of Clinical and Translational Medicine

This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



