Protein Sciences and Drug Discovery

CITRIS-CDSS Innovation


Protein-Ligand Dataset Curation

  1. Y. Wang*, K. Sun*, J. Li, X. Guan, O. Zhang, D. Bagni, Y. Zhang, H. A. Carlson, T. Head-Gordon (2025). A Workflow to Create a High-Quality Protein-Ligand Binding Dataset for Training, Validation, and Prediction Tasks. Digital Discovery, 4, 1209-1220 [link]
  2. O. Zhang, S. A. Naik, Z. H. Liu, J. Forman-Kay, T. Head-Gordon (2024). A Curated Rotamer Library for Common Post-Translational Modifications of Proteins. Bioinformatics, 40 (7), btae444 [link]
  3. J. Li, X. Guan, O. Zhang, K. Sun, Y. Wang, D. Bagni, T. Head-Gordon (2024). Leak Proof PDBBind: A Reorganized Dataset of Protein-Ligand Complexes for More Generalizable Binding Affinity Prediction. [link]

Software Platforms for Drug Discovery

  1. J. Li, O. Zhang, F. L. Kearns, M. Haghighatlari, C. Parks, R. E. Amaro, T. Head-Gordon (2024). Mining for Potent Inhibitors through Artificial Intelligence and Physics: A Unified Methodology for Ligand Based and Structure Based Drug Design. J. Chem. Inf. Model., 64 (24), 9082–9097 [link]
  2. J. Purnomo, C. Kim, K. Sun, Y. Wang, T. Head-Gordon (2025). More Accurate Binding Free Energy Prediction Using Protein Homology and Ligand-Based Transfer Learning. submitted
  3. K. Sun*, Y. Wang*, J. Purnomo, T. Head-Gordon (2025). Polaris Challenge: Data-driven Priors to Improve Docking for Pose Prediction. *equal contribution. In revision, J. Chem. Inf. Model.

LLMs for Drug Discovery and LLM-Driven Research Labs

  1. J. M. Cavanagh, K. Sun, A. Gritsevskiy, D. Bagni, T. D. Bannister, T. Head-Gordon (2025). SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration. submitted [link]
  2. K. Sun, D. Bagni*, J. M. Cavanagh*, Y. Wang, A. Gritsevskiy, J. Sawyer, T. Head-Gordon (2025). SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models. *equal contribution. ACS Cent. Sci., in press [link]
  3. Z. Zheng, O. Zhang, H. Nguyen, N. Rampal, A. Alawadhi, Z. Rong, T. Head-Gordon, C. Borgs, J. Chayes, O. Yaghi (2023). ChatGPT Research Group for Optimizing Crystallinity of MOFs and COFs. ACS Cent. Sci., 9, 2161–2170 [link]

Intrinsically Disordered Proteins (IDPs)

  1. O. Zhang, Z. H. Liu, J. D. Forman-Kay, T. Head-Gordon (2025). Deep Learning of Proteins with Local and Global Regions of Disorder. submitted [link]
  2. Z. H. Liu, M. Tsanai, O. Zhang, J. D. Forman-Kay, T. Head-Gordon (2025). Biological Insights from Integrative Modeling of Intrinsically Disordered Protein Systems. Curr. Opin. Struct. Biol., 93, 103063 [link]
  3. H. Ghafouri, T. Lazar, A. Del Conte, L. G. Tenorio Ku, PED Consortium, P. Tompa, S. C. E. Tosatto, A. M. Monzon (2024). PED in 2024: Improving the Community Deposition of Structural Ensembles for Intrinsically Disordered Proteins. Nucleic Acids Research, gkad947 [link]
  4. O. Zhang, M. Haghighatlari, J. Li, J. Correia Teixeira, A. Namini, Z. H. Liu, J. Forman-Kay, T. Head-Gordon (2023). Learning to Evolve Structural Ensembles of Unfolded and Disordered Proteins Using Experimental Solution Data. J. Chem. Phys. (ML special issue), 158, 174113 [link]
  5. Z. H. Liu, O. Zhang, J. M. C. Teixeira, J. Li, T. Head-Gordon, J. D. Forman-Kay (2023). SPyCi-PDB: A Modular Command-Line Interface for Back-Calculating Experimental Datatypes of Protein Structures. J. Open Source Software, 8 (85), 4861 [link]
  6. Z. H. Liu, J. M. C. Teixeira, O. Zhang, T. E. Tsangaris, J. Li, C. C. Gradinaru, T. Head-Gordon, J. D. Forman-Kay (2023). Local Disordered Region Sampling (LDRS) for Ensemble Modeling of Proteins with Experimentally Undetermined or Low Confidence Prediction Segments. Bioinformatics, 39 (12), btad739 [link]
  7. J. M. C. Teixeira, Z. H. Liu, A. Namini, J. Li, R. M. Vernon, M. Krzeminski, A. A. Shamandy, O. Zhang, M. Haghighatlari, L. Yu, T. Head-Gordon, J. D. Forman-Kay (2022). IDPConformerGenerator: A Flexible Software Suite for Sampling Conformational Space of Disordered Protein States. J. Phys. Chem. A (editor’s choice), 126 (35), 5985–6003 [link]
  8. J. Lincoff, M. Krzeminski, M. Haghighatlari, J. M. C. Teixeira, G.-N. W. Gomes, C. C. Gradinaru, J. Forman-Kay, T. Head-Gordon (2020). Extended Experimental Inferential Structure Determination Method for Evaluating the Structural Ensembles of Disordered Protein States. Commun. Chem., 3, 74 [link]