Federated Learning for Privacy-Preserving Immune Gene Typing and Cross-Cohort Immunogenomic Analysis from Long-Read Sequencing

Xavier Norales; Kasper Hawkins; Brandon Karrett

Authors

Xavier Norales, Department of Computer Science, University of New Hampshire, Durham, NH, USA.
Kasper Hawkins, Department of Computer Science, George Mason University, Fairfax, VA, USA.
Brandon Karrett, Department of Computer Science, University of North Texas, Denton, TX, USA.

Keywords:

federated learning, privacy-preserving genomics, immune gene typing, long-read sequencing, cross-cohort analysis, differential privacy, secure aggregation, immunogenomics, data governance

Abstract

The rapid adoption of long-read sequencing technologies has enabled high-resolution typing of highly polymorphic immune genes, such as those in the major histocompatibility complex, yet the aggregation of such data across multiple cohorts for immunogenomic association studies introduces significant privacy risks. This paper proposes a federated learning framework designed to enable privacy-preserving immune gene typing and cross-cohort immunogenomic analysis from distributed long-read sequencing datasets. We conceptualize a system architecture that integrates local model training on cohort-specific sequencing repositories with secure aggregation protocols, differential privacy mechanisms, and decentralized governance structures. The framework addresses critical trade-offs between model fidelity, communication efficiency, statistical power, and protection against re-identification attacks. We examine the infrastructural demands of deploying such a system across heterogeneous clinical and research sites, including the need for harmonized variant calling pipelines, standardized immune gene annotations, and robust quality control measures that preserve privacy while ensuring biological validity. Furthermore, we analyze the governance and policy implications of federated immunogenomic analysis, including consent management, data sovereignty, and equitable access to derived models. By drawing parallels to existing federated learning deployments in medical imaging and electronic health records, we discuss sustainability, fairness, and robustness challenges specific to polymorphic gene typing. Our analysis concludes that while federated learning offers a compelling paradigm for multi-cohort immunogenomic discovery, its successful implementation requires careful orchestration of algorithmic, regulatory, and ethical dimensions.
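To make the architecture described in the abstract concrete, the following is a minimal sketch of one aggregation round that combines per-site update clipping, a toy form of secure aggregation, and Gaussian noise for differential privacy. It is an illustration under stated assumptions, not the framework's actual implementation. The function names (`clip_update`, `pairwise_masks`, `federated_round`) and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and the pairwise floating-point masks only stand in for a real secure aggregation protocol, which would typically use finite-field arithmetic and key agreement.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= clip_norm or norm == 0.0:
        return list(update)
    scale = clip_norm / norm
    return [x * scale for x in update]


def pairwise_masks(num_sites, dim, rng):
    """Toy stand-in for secure aggregation (assumption, not a real protocol).

    Each pair of sites (i, j) shares a random vector that site i adds and
    site j subtracts, so all masks cancel when the server sums the uploads.
    """
    masks = [[0.0] * dim for _ in range(num_sites)]
    for i in range(num_sites):
        for j in range(i + 1, num_sites):
            shared = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]
                masks[j][k] -= shared[k]
    return masks


def federated_round(site_updates, clip_norm, noise_multiplier, rng):
    """Return a noisy average of clipped, masked site updates."""
    num_sites = len(site_updates)
    dim = len(site_updates[0])

    # Each site clips its local update to bound its influence (sensitivity).
    clipped = [clip_update(u, clip_norm) for u in site_updates]

    # Each site uploads only a masked vector; the server never sees raw updates.
    masks = pairwise_masks(num_sites, dim, rng)
    masked = [[c[k] + m[k] for k in range(dim)] for c, m in zip(clipped, masks)]

    # Summing the uploads cancels the masks and leaves the sum of clipped updates.
    total = [sum(v[k] for v in masked) for k in range(dim)]

    # Gaussian noise with standard deviation noise_multiplier * clip_norm
    # is added to the sum before averaging.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / num_sites for x in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical cohort sites, each with a two-parameter model update.
    updates = [[3.0, 4.0], [0.0, 0.0], [0.6, 0.8]]
    print(federated_round(updates, clip_norm=1.0, noise_multiplier=0.5, rng=rng))
```

In this sketch, a larger `noise_multiplier` gives stronger privacy at the cost of a noisier global update, which is one instance of the trade-off between model fidelity and protection against re-identification that the abstract mentions. Converting the noise level into a formal privacy budget would require a separate accounting step that is not shown here.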

