Latent Space Embedding Methods for Chemical Molecules: Principles and Applications

Haotian Chen, Tao Yang, Xiaotong Liu

Prog Chem ›› 2025, Vol. 37 ›› Issue (10) : 1456-1478.

Progress in Chemistry

DOI: 10.7536/PC20250308
Review

Abstract

Effective representation of chemical molecules is key to advancing cheminformatics and the research and development of new materials. In recent years, data-driven molecular representation techniques have developed rapidly. Compared with traditional hand-crafted descriptors and graph-structure analysis methods, they can effectively reduce noise and information redundancy and thereby support efficient, accurate property prediction. Embedding representations compress information efficiently, enrich data representation, and preserve semantic relationships, and they have been widely used in fields such as deep learning and data mining. Inspired by word embeddings in natural language processing (NLP), researchers have begun to apply similar methods to construct latent spaces for chemical molecules and have proposed a variety of embedding methods for molecular property prediction and molecular structure generation. This review first explains the general principles of embedding in machine learning and then discusses, in turn, latent space representation methods for chemical elements and latent space embedding techniques for molecules. By examining how techniques from NLP and graph embedding have been adapted to molecular embedding, the review shows that current molecular embedding methods are gradually evolving toward multimodality, self-supervised learning, and dynamic modeling, and it outlines prospects for future research.
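To make the ideas of compression and semantic retention more concrete, the following minimal Python sketch contrasts a sparse one-hot encoding of elements with a dense embedding lookup, and then mean-pools element vectors into a single fixed-length molecule vector. It is an illustration under stated assumptions, not a method taken from this review: the three-dimensional vectors in `ELEMENT_EMBEDDINGS` are hypothetical, hand-chosen values (a data-driven method would learn them from data), and the helper names `one_hot`, `cosine`, and `molecule_embedding` are introduced here for illustration only.

```python
import math

# Hypothetical, hand-chosen 3-D element vectors used purely for illustration.
# A data-driven embedding method would learn these values from data instead.
ELEMENT_EMBEDDINGS = {
    "H": [0.1, 0.9, 0.0],
    "C": [0.8, 0.2, 0.1],
    "N": [0.7, 0.3, 0.3],
    "O": [0.6, 0.2, 0.7],
}


def one_hot(symbol, vocab):
    """Sparse baseline: one dimension per vocabulary entry, so length grows with the vocabulary."""
    return [1.0 if s == symbol else 0.0 for s in vocab]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def molecule_embedding(atoms):
    """Mean-pool the element vectors of a molecule's atoms into one fixed-length vector.

    Mean pooling ignores atom order and also discards bond connectivity.
    """
    if not atoms:
        raise ValueError("a molecule needs at least one atom")
    dim = len(ELEMENT_EMBEDDINGS[atoms[0]])
    total = [0.0] * dim
    for atom in atoms:
        for i, x in enumerate(ELEMENT_EMBEDDINGS[atom]):
            total[i] += x
    return [x / len(atoms) for x in total]


if __name__ == "__main__":
    vocab = sorted(ELEMENT_EMBEDDINGS)
    # Distinct one-hot vectors are orthogonal, so every pair of elements looks equally dissimilar.
    print(cosine(one_hot("C", vocab), one_hot("N", vocab)))  # → 0.0
    # Dense vectors can encode graded similarity (here only because of the hand-chosen values).
    print(cosine(ELEMENT_EMBEDDINGS["C"], ELEMENT_EMBEDDINGS["N"]))
    print(cosine(ELEMENT_EMBEDDINGS["C"], ELEMENT_EMBEDDINGS["H"]))
    # Water (H2O) becomes a 3-D vector regardless of how many atoms it contains.
    print(molecule_embedding(["H", "H", "O"]))
```

The one-hot length is tied to the vocabulary size, whereas the embedding dimension is a fixed design choice. Mean pooling is permutation-invariant, so atom order does not matter, but it discards which atoms are bonded to which; graph-based molecular embeddings, such as those grouped under "Graph theory-driven molecular embedding" in the contents, also make use of that connectivity.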

Contents

1 Introduction

2 Principles of embedding in machine learning

2.1 Word embedding

2.2 Graph embedding

2.3 Multimodal embedding

3 Element latent space representation methods

3.1 Attribute-based element representation

3.2 Element representation based on physicochemical knowledge

3.3 Data-driven element embedding

4 Advances in molecular latent space embedding

4.1 Traditional chemical feature-based molecular descriptors

4.2 Graph theory-driven molecular embedding

4.3 Data-driven molecular embedding

4.4 Multimodal molecular embedding

5 Conclusion and outlook

5.1 Current status and key technology

5.2 Future research prospects

Key words

molecular embedding / machine learning / representation learning / property prediction / multimodality / self-supervised learning

Cite this article

Haotian Chen, Tao Yang, Xiaotong Liu. Latent Space Embedding Methods for Chemical Molecules: Principles and Applications[J]. Progress in Chemistry. 2025, 37(10): 1456-1478. https://doi.org/10.7536/PC20250308

Funding

National Natural Science Foundation of China (22272009)
National Natural Science Foundation of China (22203008)