Graph Foundation Models for Protein Electrostatics: Transfer Learning Across Ionization and Stability Prediction Tasks

Xavier C. Horton; Gamjamin Weang; Rhristopher Yimpson; Fann D. James

Authors

Xavier C. Horton, Department of Computer Science, University of Houston, Houston, TX, USA.
Gamjamin Weang, Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV, USA.
Rhristopher Yimpson, School of Computing, Clemson University, Clemson, SC, USA.
Fann D. James, Department of Computer Science, Colorado State University, Fort Collins, CO, USA.

Keywords:

protein electrostatics; graph foundation models; transfer learning; ionization states; protein stability; molecular graphs; large-scale systems

Abstract

Predicting the electrostatic properties of proteins remains a fundamental challenge for molecular biology and drug design, particularly the accurate estimation of ionization states and thermodynamic stability across diverse sequence and structural contexts. Recent breakthroughs in deep learning have opened a pathway toward graph-based foundation models that can capture complex physical interactions at scale, yet a comprehensive systems perspective spanning architecture design, transfer learning strategies, infrastructure deployment, and socio-technical implications remains underdeveloped. This paper presents a long-form analysis of graph foundation models tailored for protein electrostatics, with a focus on transfer learning between the prediction of residue-level pKa values and the estimation of mutation-induced stability changes. We examine the architectural trade-offs that arise when enforcing equivariance, incorporating multi-scale attention, and designing pre-training objectives that reconcile physical priors with data-driven learning. We then discuss how such models can be fine-tuned for distinct downstream tasks while managing catastrophic forgetting, calibration, and domain shift. Beyond algorithmic concerns, we address the computational infrastructure required to train and serve these large models sustainably, and we interrogate the fairness and representational biases that may emerge from uneven coverage of protein families in training corpora. Governance, policy, and reproducibility frameworks are evaluated alongside deployment scenarios in industrial drug discovery pipelines. By weaving together structural design, system engineering, and regulatory foresight, this paper provides a holistic reference for the next generation of protein electrostatics models, arguing that scientific impact and societal robustness must evolve in tandem with architectural innovation.

