GeoProt-GNN: Geometry-Aware Graph Neural Networks for Predicting Functional Ionization Landscapes in Protein Structures

Pankaj A. Pathak; Senjay Waidav; Florian D. Hayes

Authors

Pankaj A. Pathak, Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV, USA.
Senjay Waidav, Department of Computer Science, University of New Hampshire, Durham, NH, USA.
Florian D. Hayes, Department of Computer Science, University of Central Florida, Orlando, FL, USA.

Keywords:

protein ionization, graph neural networks, geometry-aware deep learning, pKa prediction, molecular systems, AI governance, sustainable computing, structural bioinformatics

Abstract

Predicting the functional ionization landscape of a protein, encoded by the pKa values of its titratable residues, remains a fundamental challenge, and it underpins mechanistic understanding of enzyme catalysis, molecular recognition, and pH-dependent structural transitions. Traditional computational methods rooted in continuum electrostatics or empirical parameterization are valuable but often fail to capture the subtle geometric and dynamic features that govern site-specific protonation equilibria. This work introduces GeoProt-GNN, a system architecture that uses geometry-aware graph neural networks to predict residue-level ionization states directly from three-dimensional protein structures. We examine the system-level design principles required to build, deploy, and govern such a predictive infrastructure. Our discussion moves beyond algorithmic novelty to emphasize three design concerns: the structural trade-off between all-atom resolution and computational tractability, the integration of equivariant message-passing mechanisms that respect roto-translational symmetries, and multi-scale graph representations that balance local chemical detail with long-range electrostatic context. We detail the large-scale training pipeline, which combines curated structural data from the Protein Data Bank and the AlphaFold Database, and highlight data governance, provenance tracking, and the sustainability metrics of model development. Robustness and fairness are analyzed in terms of representation bias across protein families and the need for well-calibrated uncertainty quantification in safety-critical applications such as drug design. We also outline governance frameworks and policy implications for the responsible deployment of ionization landscape predictors, including model documentation standards, biosecurity considerations, and alignment with FAIR data principles. The modular architecture of GeoProt-GNN enables cross-domain extensions to redox potential prediction, metal-binding site identification, and the design of pH-responsive biological systems. By framing the system as a sociotechnical infrastructure, we provide a blueprint for the next generation of machine learning tools in structural biology that are not only accurate but also robust, fair, sustainable, and ethically aligned.
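
As a brief illustration of how residue-level pKa values encode an ionization landscape, the sketch below converts pKa values into protonated fractions across a pH range using the Henderson-Hasselbalch relation for a single, non-interacting titratable site. It is not the GeoProt-GNN model: the function names, residue labels, and pKa values are hypothetical and chosen only for illustration, and coupling between sites is ignored.

```python
from typing import Dict, List


def protonated_fraction(pka: float, ph: float) -> float:
    """Protonated fraction of one isolated titratable site at a given pH.

    Henderson-Hasselbalch for a single site: f = 1 / (1 + 10**(pH - pKa)).
    Assumes the site does not interact with other titratable sites.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def ionization_profile(
    pkas: Dict[str, float], ph_values: List[float]
) -> Dict[str, List[float]]:
    """Map each site to its protonated fraction at every pH in ph_values."""
    return {
        site: [protonated_fraction(pka, ph) for ph in ph_values]
        for site, pka in pkas.items()
    }


if __name__ == "__main__":
    # Hypothetical per-residue pKa values (illustrative only, not model output).
    example_pkas = {"ASP_A": 3.5, "HIS_B": 6.8, "LYS_C": 10.4}
    ph_grid = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

    profile = ionization_profile(example_pkas, ph_grid)
    for site, fractions in profile.items():
        row = "  ".join(f"{f:.3f}" for f in fractions)
        print(f"{site}: {row}")
```

At pH equal to the pKa a site is half protonated, and each pH unit away from the pKa shifts the ratio of protonated to deprotonated forms by a factor of ten. This is why an error of about one pKa unit can change the predicted dominant charge state near physiological pH.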
