Progress in Chemistry

Abbreviation (ISO4): Prog Chem      Editor in chief: Jincai ZHAO

Review

Optimizing Metabolic Pathways by Using Bioretrosynthesis Tools

  • Liu Fufeng
  • Liu Xuzhi
  • Li Jinbi
  • Lu Fuping *
  • College of Biotechnology, Tianjin University of Science & Technology, Tianjin 300457, China
*e-mail:

Received date: 2023-09-16

  Revised date: 2023-11-02

  Online published: 2024-02-26

Supported by

National Key R&D Program of China (2021YFC2102701)

Abstract

Biocatalysis has become an important technology in the field of biosynthesis because of its mild reaction conditions, high efficiency, high specificity and low cost. Biosynthetic systems contain highly integrated metabolic networks, and the study of multi-enzyme catalytic systems has become an inevitable trend in the field, so exploring unknown multi-enzyme synthesis pathways starting from known products is of great significance. In this review, the concepts of multi-enzyme systems and the retrosynthesis process are introduced, and the design methods, advantages and disadvantages of retrosynthesis tools are summarized. The tools are then divided into host-based and host-free tools. For each type, representative retrosynthesis tools are described to analyze their design processes and differences. Finally, the possibility of artificial intelligence-assisted multi-enzyme systems is discussed, and the optimization and development of tools for constructing multi-enzyme pathways are forecast.

Contents

1 Introduction

2 Multienzyme catalysis

3 Methods for building retrosynthesis tools

4 Introduction to the retrosynthesis tools

4.1 Host-based retrosynthetic tools

4.2 Host-free retrosynthetic tools

5 Artificial intelligence fuels the development of multi-enzyme systems

6 Conclusion and outlook

Cite this article

Liu Fufeng, Liu Xuzhi, Li Jinbi, Lu Fuping. Optimizing Metabolic Pathways by Using Bioretrosynthesis Tools[J]. Progress in Chemistry, 2024, 36(4): 501-510. DOI: 10.7536/PC230906

1 Introduction

As more genome-scale metabolic networks are reconstructed and more biological data are mined, more theories, algorithms and tools are needed to support them. Enzymes, as elements with catalytic functions, have great advantages over chemical catalysts, and can be combined through rational design and multiple assembly into composite catalytic systems that achieve cascade catalysis[1,2][3,4]. Enzyme cascade catalysis can produce various chemical products with high conversion in a short time[5]. Understanding how selected enzymes convert biological feedstocks into biochemicals in multienzyme cascade reactions provides an important foundation for applying biocatalysis, metabolic engineering and synthetic biology to environmentally friendly bioprocesses[6~9], and enzyme catalysis is also valuable for the conversion of non-natural compounds[10]. The construction of multienzyme cascade reaction systems and in vitro organelles has further enabled the synthesis of structurally complex compounds, solving a series of problems arising in biosynthetic applications[11~13]. However, finding the best synthesis pathway from a known product remains a challenge when exploring multi-enzyme synthesis pathways, and manual design based on personal prior knowledge is limited by lack of experience and incomplete data. Therefore, drawing on information related to multi-enzyme reactions (Figure 1), researchers have developed data-driven tools for retrosynthetic pathway analysis of target compounds, and these tools have performed well in application tests[14]. Automated retrosynthetic pathway analysis and the related tools thus play a very important role in the study of multienzyme synthetic networks[15]. To help users choose these tools reasonably and effectively, this paper briefly introduces and summarizes the computational tools for constructing retrosynthetic pathways.
Fig. 1 Word cloud of terms related to multi-enzyme catalyzed reactions

2 Multienzyme catalysis

Multi-enzyme catalysis covers both concurrent one-pot multi-enzyme biocatalytic reactions and reactions in which components are added in sequence or the process steps are extended; it is environmentally friendly and reduces the cost of converting reaction intermediates. It bridges single-enzyme catalysis and whole-cell catalysis, and has been a focus of researchers in the field of biocatalysis[16,17]. The use of enzymatic cascades throughout a reaction is a particularly attractive aspect of biocatalysis because of their general reaction compatibility and their ability to carry out multistep reactions in cell-free systems (in vitro) or in whole cells[18]. In addition, multienzyme catalysis reduces substrate transport and reaction time, and also reduces intermediate losses due to diffusion, producing fewer by-products and contaminants[19]. In recent years, multi-enzyme catalytic reactions have become highly significant for industrial development, and significant progress has been made in the synthesis of drugs, cosmetics and nutritional compounds[20].
At present, multienzyme cascade catalytic systems are mainly divided into in vivo and in vitro systems. In vivo multi-enzyme catalysis mainly refers to the process in which a substrate is converted into a specific product through multi-step catalytic reactions by a variety of enzymes within an organism. It allows the applicability of a pathway to be evaluated within the endogenous metabolic system of the host organism in a specific environment, but the range of feasible pathways is limited by its dependence on host cells. In vitro multienzyme catalysis is a rapidly developing approach in synthetic biology, in which a substrate is converted into a target product by purified enzymes and coenzymes through defined biochemical reactions[21]. In vitro multienzyme cascades are usually extracellular reconstitutions of natural metabolic pathways, or assemblies of different types of artificial enzymes that convert substrates into target products. Cell extracts and pure enzymes are commonly used to construct multienzyme cascade catalytic systems in vitro[22]. In vitro multienzyme cascades have many advantages: they have a greater ability to synthesize complex molecules, do not require separation of intermediates, and can shift unfavorable equilibria towards the products and reduce problems such as toxic effects[23,24]. However, in a single system it is challenging to find reaction conditions that are optimal for all enzymes. In addition, enzyme stability is a major problem, and performance may be impaired by destabilizing or inhibitory interactions between cascade components or by poorly matched reaction conditions[25]. Optimization of such systems is therefore usually unavoidable, and many parameters, such as the design of the synthetic route, the selection of enzymes, the reaction conditions and the process design, can change the performance of in vitro enzyme cascades (Figure 2). In vitro multi-enzyme catalytic systems can therefore be optimized by combining experimental methods with data-driven tools[26].
Fig. 2 Factors influencing multi-enzyme cascade reactions

3 Methods for building retrosynthesis tools

With the development of applications of multi-enzyme catalysis, multi-enzyme cascade reaction systems have been widely used in different fields, and there have been many successful cases of designed synthetic biological pathways, although the number of reaction steps is often relatively large. At the same time, tools for designing biological retrosynthetic pathways have emerged. Bioretrosynthesis is mainly used to produce high-value compounds from renewable resources or with engineered enzymes, designing optimal synthetic pathways with microbial metabolic networks as the source of metabolic pathways in synthetic biology[27]. Bioretrosynthetic design first defines the target molecule to be synthesized, and then uses consistent biochemical reaction rules as templates to identify the potential precursors and enzymes needed for the reaction. The resulting reaction pathway is relatively short, and the reaction efficiency can be significantly improved[28]. To explore unknown regions of feasible metabolite transformations, computational tools for designing new bioretrosynthetic pathways need to be developed.
An analysis of progress in existing retrosynthetic pathway design methods shows that there are three main approaches (Figure 3): pathway search based on existing knowledge, template-based pathway construction, and template-free pathway construction[29].
Fig. 3 Design methods for retrosynthesis pathways

Early researchers searched for pathways manually on the basis of existing knowledge, extracting possible biosynthetic pathways from existing reaction databases such as MetaCyc (Metabolic Pathways From all Domains of Life) and KEGG (Kyoto Encyclopedia of Genes and Genomes) and ranking them by experience[30][31]. These methods usually fail for complex compounds whose synthetic pathways are not available in the databases.
With the further development of data-driven methods, machine learning (ML) has become the main tool for retrosynthetic pathway design and has been applied to every step of the design process[32,33]. Analysis of these tools shows that the main difference between ML-based retrosynthesis tools when designing metabolic pathways is whether or not they depend on templates. In template-based pathway construction, template-based or rule-based models (novoStoic, RetroPath RL, etc.) match a query molecule against a collection of generalized reaction rules, i.e., subgraph patterns that highlight the changes in a molecule during a biochemical reaction[34][35][36]. The rules are either summarized manually by experts or extracted from reaction databases[37]. The target molecule is converted into reactants by applying the matching reaction rules in the template library. Although rule-based approaches have yielded promising results in bioretrosynthesis, they neglect the effect of long-range substituents on the reaction center. In addition, reaction rules that are too specific or too general lead to predicted routes that are too conservative or unrealistic, respectively, which requires laborious and time-consuming optimization by experts, and such rules cannot predict reactions outside the database[38][39]. Template-free pathway construction predicts precursors with a model that has no predefined reaction templates. It works either on molecular sequences, converting the SMILES string of the product molecule into the SMILES strings of the reactant molecules, or on molecular graphs, breaking chemical bonds of the product and adding atoms or groups as appropriate to form the reactants. Template-free retrosynthesis tools (BioNavi-NP, etc.) train ML models on reaction databases and predict precursors from the input molecule of interest[40].
Whether template-based or template-free, the workflow of ML-based retrosynthetic pathway design tools consists of four main steps (Figure 4).
Fig. 4 Design flow of retrosynthesis tools: (1) design metabolic pathways; (2) refine data; (3) identify and prune pathways; (4) score pathways

(1) Design metabolic pathways and generate metabolic networks. These tools first combine known reactions in the database, retrieve the desired molecule through homology alignment and construct new metabolic pathways based on reaction rules, or convert the target molecule into possible precursors by random bond breaking based on its sequence and structure. The various prediction algorithms are a very important driver of this step.
(2) Refine the data. The metabolic network generated in the first step contains all possible reactants and enzymes that could generate the target product, and random bond breaking based on the molecular structure alone may yield candidate precursors that do not exist in nature. Retrosynthesis tools therefore need to evaluate the most likely reactions and the optimal reactants, removing unnecessary substances and metabolic pathways in the process.
(3) Identify and prune pathways. By comparing the pathways in the metabolic network with databases of biological metabolic pathways (MetaCyc, KEGG, etc.), the tool can identify whether a pathway is new, and can also obtain the compounds involved and some biochemical characteristics of the reactions.
(4) Score and prioritize the generated pathways. The results of the first three steps are used to score and rank the pathways in order to pinpoint the optimal pathway most likely to produce the target molecule. In addition, the retrosynthesis tool takes into account the toxicity, yield and economics of the compounds, among other factors, and thereby suggests an optimal route suitable for production[41].
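The four steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any of the reviewed tools is implemented: the reversed reactions in `RETRO_STEPS`, the set `NATURAL`, the single entry in `KNOWN_PATHWAYS` and the placeholder compound `unnatural-X` are all hypothetical, and the score is simply the number of reaction steps.

```python
from collections import deque

# Hypothetical reversed reactions: product -> possible direct precursors.
# Compound names are illustrative; "unnatural-X" stands for a candidate
# precursor that does not occur in nature.
RETRO_STEPS = {
    "resveratrol": ["coumaroyl-CoA", "unnatural-X"],
    "unnatural-X": ["tyrosine"],
    "coumaroyl-CoA": ["coumarate"],
    "coumarate": ["tyrosine", "cinnamate"],
    "cinnamate": ["phenylalanine"],
}
NATURAL = {"resveratrol", "coumaroyl-CoA", "coumarate", "cinnamate",
           "tyrosine", "phenylalanine"}
KNOWN_PATHWAYS = {("tyrosine", "coumarate", "coumaroyl-CoA", "resveratrol")}


def generate(target, starts, max_steps):
    """Step 1: enumerate precursor chains (substrate first) back from the target."""
    found, queue = [], deque([(target,)])
    while queue:
        chain = queue.popleft()
        if chain[0] in starts:
            found.append(chain)
        elif len(chain) - 1 < max_steps:
            for precursor in RETRO_STEPS.get(chain[0], []):
                if precursor not in chain:  # never revisit a compound
                    queue.append((precursor,) + chain)
    return found


def refine(pathways):
    """Step 2: drop pathways that pass through non-natural compounds."""
    return [p for p in pathways if all(c in NATURAL for c in p)]


def identify(pathways):
    """Step 3: label each pathway as already known or novel."""
    return [(p, "known" if p in KNOWN_PATHWAYS else "novel") for p in pathways]


def score(labelled):
    """Step 4: rank by number of reaction steps, fewest first."""
    return sorted(labelled, key=lambda item: len(item[0]) - 1)


def design_pathways(target, starts, max_steps=4):
    return score(identify(refine(generate(target, starts, max_steps))))


ranked = design_pathways("resveratrol", {"tyrosine", "phenylalanine"})
for pathway, label in ranked:
    print(" -> ".join(pathway), f"({label})")
```

A real tool would replace each function with a far richer model (reaction rules or an ML predictor in step 1, toxicity and yield estimates in step 4), but the division of labour between the steps is the same.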

4 Introduction to the retrosynthesis tools

Many computational tools are now available for designing biosynthetic metabolic pathways, and they fall into two main categories based on practicality and user habits. The first category explores the retrosynthetic pathway of the target compound within a host (such as Escherichia coli or cyanobacteria) and ultimately yields a reaction pathway with few reaction steps, readily available intermediates and low cost. Table 1 summarizes some host-based retrosynthesis tools.
Table 1 Summary of host-based retrosynthesis tools

Name | Host | Method | Availability | Database resource | Advantage
PathPred | bacteria, plants | rule-based | https://www.genome.jp/tools/pathpred/ | KEGG | selectable host
MRE | E. coli | reaction search | http://www.cbrc.kaust.edu.sa/mre/ | KEGG | visual interface
EcoSynther | E. coli | reaction search | http://www.rxnfinder.org/ecosynther/ | Rhea, KEGG | no need to specify precursors
Pathway Hunter Tool (PHT) | E. coli | reaction search | http://www.pht.uni-koeln.de | BRENDA, PROSITE, KEGG | visual interface
PATHcre8 | cyanobacteria | reaction search | https://www.cbrc.kaust.edu.sa/pathcre8/ | KEGG | comprehensive scoring
FMM | animals, plants, fungi, prokaryotes | reaction search | http://FMM.mbc.nctu.edu.tw/ | KEGG | wider choice of hosts
The second category of tools designs multiple bioretrosynthetic pathways from a known product in an in vitro environment, independent of any host. Such tools (Table 2) are typically used to design host-free in vitro synthesis pathways.
Table 2 Summary of host-free retrosynthesis tools

Name | Method | Availability | Database resource | Advantage
novoStoic | rule-based | code | MetRxn | pathway score ranking
RouteSearch | reaction search | software | MetaCyc | simple operation
enviPath | rule-based | web server | EAWAG-BBD | visual interface
BNICE.ch | rule-based | web server | KEGG, MetaCyc (multiple) | simple operation
RetroPath2.0 | rule-based | web server | MetaNetX 2.0 | visual pathway workflow
RetroPath RL | rule-based | code | MetaNetX | upgraded version of RetroPath2.0
RetroBioCat | rule-based | web server | PubChem | new biotransformation database
BioNavi-NP | template-free | web server, code | KEGG, MetaCyc (multiple) | higher pathway hit rate

4.1 Host-based retrosynthetic tools

Host-based retrosynthesis tools design the synthesis pathway of the target compound by simulating the internal environment of an organism, constructing a complete metabolic network in vivo, which facilitates intracellular reaction experiments. However, each tool usually supports only a single type of host, and selecting among multiple hosts is often difficult. The main existing tools are as follows.

4.1.1 EcoSynther

EcoSynther is a tool for exploring the metabolic pathways of target substances in E. coli, based on the Rhea and KEGG databases and the genome-scale metabolic network of E. coli. It automatically retrieves the synthetic pathway of a target substance when its precursor is unknown, while taking the physiological state of E. coli under specific conditions into account[42]. When EcoSynther is used to construct a synthesis pathway, the main carbon source and the oxygen requirement can be set, and heterologous reactions are searched in the reaction database according to these parameters. The reactions found are then introduced into the E. coli model to construct a complete metabolic network in the host. Finally, flux balance analysis is carried out according to the substrate, reaction conditions and parameters set by the user, and the theoretical yield of the target compound is calculated[43][44]. EcoSynther displays the resulting pathways in a visual interface, which is convenient for users to view and record. EcoSynther can successfully find the synthesis pathways of lycopene and resveratrol, which involve tyrosine and other precursors.
The tool does not require a precursor molecule to be specified, which helps to explore pathways from different precursors to target compounds in E. coli and to minimize the impact of heterologous metabolites on the production pathway. However, because pathway construction depends on specific databases, the amount of data is limited and needs to be further expanded.
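The flux balance analysis step mentioned above is usually posed as a linear program. The formulation below is the generic textbook form, not a description of EcoSynther's internal model; the symbols $S$ (stoichiometric matrix), $v$ (vector of reaction fluxes), the flux bounds and the yield $Y$ are notation introduced here for illustration.

```latex
\begin{aligned}
\max_{v}\quad & v_{\mathrm{product}} \\
\text{subject to}\quad & S\,v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \text{for every reaction } j,
\end{aligned}
\qquad
Y = \frac{v_{\mathrm{product}}^{*}}{\left| v_{\mathrm{uptake}} \right|}
```

Here $S\,v = 0$ expresses the steady-state assumption for all internal metabolites, the bounds encode user settings such as the carbon source and oxygen supply, and the theoretical yield $Y$ is the optimal product flux $v_{\mathrm{product}}^{*}$ divided by the substrate uptake flux.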

4.1.2 Pathway Hunter Tool (PHT)

Pathway Hunter Tool (PHT) is a fast, stable and user-friendly tool designed to explore the shortest metabolic synthesis pathway in a host. PHT can also calculate the average shortest path, the average alternative path, and the 10 metabolites with the highest node matching degree in the metabolic network. The tool first uses metabolite atomic mass values, the fingerprint algorithm in CDK (Chemistry Development Kit) and the Tanimoto coefficient to search for and calculate the structural similarity of metabolites[45][46]. Then, when designing the shortest path, PHT uses the breadth-first search (BFS) algorithm and Hall logic to exclude in advance the side metabolites that may interrupt or lengthen the path[47][48]. The tool defines the shortest path as the minimum number of reaction steps between substrate and product, and outputs the shortest path between two metabolites based on structural similarity and the BFS algorithm. PHT offers four options for the de novo design of metabolic pathways: the shortest pathway from one substrate to one product, the shortest pathways from one substrate to multiple feasible products, the shortest pathways from multiple feasible substrates to one product, and statistical analysis of metabolic pathways. The feasible pathways are output in three forms: text, GML file, and enzyme-enzyme connection matrix. Ding Dewu et al. used PHT to analyze the shortest path between α-D-glucose and pyruvate in the metabolic network of E. coli K-12 MG1655 and the enzymes required for the reactions[49].
The tool is fast and accurate, and its visual interface is user-friendly and easy to operate. It offers users a wide choice of hosts, but for the chosen host it only reports the availability of the enzymes. In addition, the tool relies on BRENDA, PROSITE, KEGG and other databases, and the metabolite data that can be loaded are limited[50][51].
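The two ingredients named above, Tanimoto similarity and breadth-first search, are easy to illustrate. The sketch below is not PHT's code: fingerprints are assumed to be plain sets of "on" bit positions rather than CDK fingerprints, `TOY_NETWORK` is a heavily simplified, hypothetical network, and side metabolites are excluded through a user-supplied set rather than by PHT's own logic.

```python
from collections import deque


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def shortest_path(network, substrate, product, excluded=frozenset()):
    """Breadth-first search for a path with the fewest reaction steps."""
    queue = deque([[substrate]])
    visited = {substrate}
    while queue:
        path = queue.popleft()
        if path[-1] == product:
            return path
        for neighbour in network.get(path[-1], []):
            if neighbour not in visited and neighbour not in excluded:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # product is not reachable


# Hypothetical toy network: metabolite -> metabolites reachable in one reaction.
# The link through ATP mimics a misleading shortcut via a side metabolite.
TOY_NETWORK = {
    "glucose": ["glucose-6-phosphate", "ATP"],
    "ATP": ["pyruvate"],
    "glucose-6-phosphate": ["fructose-6-phosphate"],
    "fructose-6-phosphate": ["glyceraldehyde-3-phosphate"],
    "glyceraldehyde-3-phosphate": ["phosphoenolpyruvate"],
    "phosphoenolpyruvate": ["pyruvate"],
}

print(tanimoto({1, 2, 3}, {2, 3, 4}))
print(shortest_path(TOY_NETWORK, "glucose", "pyruvate"))
print(shortest_path(TOY_NETWORK, "glucose", "pyruvate", excluded={"ATP"}))
```

Without the exclusion set, the search returns the chemically meaningless two-step route through ATP; excluding the side metabolite forces the search onto the longer main-chain route, which is the reason such metabolites are removed before the search.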

4.1.3 PathPred

PathPred is a web server for predicting enzyme-catalyzed metabolic pathways. It predicts multi-step synthetic pathways of target compounds based on local RDM pattern matching and global chemical structure alignment against a database of reactants[52]. Unlike other tools, PathPred offers two options for constructing pathways: (1) xenobiotic biodegradation pathways in bacteria; (2) enzyme-catalyzed biosynthesis pathways of secondary metabolites in plants. Another unique feature of PathPred is its potential to link prediction results to genomic information. The tool first performs a global similarity search on the database with the SIMCOMP program, then performs a local similarity search on the matched compounds and uses the matched patterns to query reaction intermediates. This step is repeated, and finally the compounds are ranked by the sum of the pathway score and the maximum similarity score among the candidate compounds, and the predicted synthesis pathways are displayed as a tree diagram[53][54]. PathPred successfully predicted the reaction pathway from 1,2,3,4-tetrachlorobenzene to glycolate and the reaction from tetrachlorocatechin to tetrachlorurate.
The visual interface of PathPred is user-friendly, but many pathways, such as biodegradation pathways of environmental compounds and biosynthesis pathways of secondary metabolites, are still missing from the tool, and it cannot evaluate the applicability of predicted pathways in the specific environment of the host.

4.1.4 PATHcre8

PATHcre8 can construct metabolic pathways in cyanobacteria and other organisms. After obtaining the required metabolic network, it ranks the pathways according to reaction thermodynamics, toxicity of intermediates, product consumption and host-specific information, and its ranking accuracy is higher than that of other tools. PATHcre8 draws its data mainly from the compound and reaction information in the KEGG database. It uses the extracted data to construct a host-independent, bipartite general metabolic network representing compounds and metabolic reactions, and then assigns reaction scores and compound scores to the network: reaction scores are estimated from a combination of the Gibbs energy (ΔG) and the enzyme copy number (eCN) of the host, with KEGG Orthology used to identify the enzymes required by the reaction pathways, and compound scores are estimated from a combination of the compound toxicity score (cTOX) and the hypothetical product consumption score (cPC). The pathways are ranked by their overall scores, and the top K target pathways are finally obtained with a k-shortest acyclic path algorithm[55][56]. When exploring isoprene synthesis, PATHcre8 retrieved the experimentally confirmed isoprene synthesis pathway from acetoacetyl-CoA, and it also predicted an effective synthesis pathway for cocaine.
Validation results for PATHcre8 apply only to cyanobacteria. Preferred host organisms often contain compounds that are consumed by competing reactions, which reduces the yield of the desired product[57]. On the other hand, the preferred host organism may lack the natural metabolic reactions required to produce the target compound. Retrosynthetic pathway design tools therefore need to be able to screen for the most efficient multienzyme-catalyzed reaction pathways.
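To make the idea of retrieving the top K acyclic pathways from a weighted network concrete, here is a minimal sketch. It uses a simple best-first enumeration of loop-free paths, which is not necessarily the algorithm PATHcre8 implements, and it assumes non-negative edge weights. The graph `TOY_GRAPH` and its weights are hypothetical stand-ins for combined reaction and compound penalties (lower is better).

```python
import heapq


def k_shortest_simple_paths(graph, source, target, k):
    """Return up to k loop-free paths in order of increasing total weight.

    Partial paths are expanded best-first; with non-negative weights, each
    complete path popped from the heap is the cheapest one not yet reported.
    """
    heap = [(0.0, [source])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:  # keep the path acyclic
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return found


# Hypothetical network: node -> {neighbour: penalty of the connecting reaction}.
TOY_GRAPH = {
    "A": {"B": 1.0, "C": 2.5},
    "B": {"C": 1.0, "D": 4.0},
    "C": {"D": 1.0},
}

for cost, path in k_shortest_simple_paths(TOY_GRAPH, "A", "D", k=2):
    print(cost, " -> ".join(path))
```

This enumeration can grow exponentially on large networks; production tools typically rely on dedicated k-shortest-path algorithms for that reason.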

4.1.5 FMM

FMM is a web server for reconstructing and analyzing metabolic pathways from one metabolite to another across different species. Its data come mainly from the KEGG database and other comprehensive biological databases. FMM also has a unique advantage: it can connect metabolites that appear in different KEGG maps. The tool first collects metabolite, enzyme and reaction information from the KEGG database, uses the BFS algorithm to construct a reaction matrix, and then uses the identifiers in the reaction matrix to construct reaction pathways from one metabolite to another across different species. Finally, a method based on the metabolic pathway alignment score (M-PAS) identifies and ranks all candidate metabolic pathways, and the ranking results are compared and analyzed to determine which genes from which species can be cloned into the chosen microorganism to produce the target product[58]. FMM can retrieve the metabolic pathway from 4-coumarate to naringenin and the three enzymes required, as well as the metabolic pathway from tyrosine to resveratrol and the three enzymes required.
After the two metabolites of interest are specified, FMM lets the user select species from four categories: animals, plants, fungi and prokaryotes. When metabolic pathways are compared, one pathway is generated in the main species column (microorganisms commonly used in synthetic biology) and multiple pathways in the comparison species column (several organisms commonly used in the laboratory), and the introduction of exogenous enzymes is recommended for the exogenous pathway. However, the compound and reaction information collected by FMM is limited by the KEGG database, and the data need to be continuously expanded and updated[59].

4.1.6 MRE

Modulating the expression of competing endogenous pathways has emerged as one of the effective strategies for optimizing heterologous production pathways[60]. MRE is a web server that suggests actual enzymes for exogenous metabolic reactions and generates reaction pathways from one metabolite to another while taking the host's competing endogenous reactions into account; its data come mainly from the KEGG database. When designing metabolic pathways from scratch, MRE first constructs a host-independent directed graph of the metabolic network from the reaction information in the database, and then classifies the enzymatic reactions as endogenous or exogenous for the host specified by the user. Thermodynamic data are then used to assign weights to the metabolic reactions, and the resulting output is a host-independent metabolic network in which the pathways are ranked according to a measure of net thermodynamic favorability. When exploring the heterologous biosynthesis pathway of naringenin from L-tyrosine, MRE detected the conversion of coumaroyl-CoA to naringenin chalcone, and it successfully reproduced the synthesis pathways of 1,3-propanediol, 1,2-propanediol and artemisinic acid.
MRE searches pathways quickly, and its top-ranked metabolic pathways are identical to the best pathways currently known. In addition, it can recommend exogenous enzymes on the basis of the endogenous metabolic system. However, some entries in the KEGG database still lack EC numbers and related gene annotations, which limits MRE's ability to explore more effective metabolic pathways[61].

4.2 Host-free retrosynthetic tools

Host-free bioretrosynthesis tools mainly use in vitro multi-enzyme systems to design the synthetic pathway of target compounds in a host-free environment, generating a network of candidate synthetic pathways and screening out the best one. They are not constrained by cell structure and can screen a wider range of pathways, which gives them an advantage over host-based tools in this respect. The main retrosynthesis tools at present are as follows.

4.2.1 RetroPath2.0

RetroPath2.0 is an open-source workflow for automated retrosynthesis based on reaction rules. Its data come mainly from the MetaNetX 2.0 database, and the workflow runs in the visual workflow software KNIME[62]. It takes a set of metabolite SMILES strings as input and encodes the changes in bonding pattern that occur when a set of substrates is converted into a set of products as reaction rules. For retrosynthesis, the rules are reversed to explore the synthesis pathways of products. After generating the synthesis pathways, the tool scores each reaction according to the availability of enzyme sequences for the reaction, and evaluates enzyme specificity through the degree of substrate generalization in the SMARTS rule, so that pathways scoring below a predefined value are deleted and suitable candidate pathways are output[63]. RetroPath2.0 was used to search for synthetic pathways to styrene, and found pathways for the bioproduction of styrene from phenylalanine using heterologous enzymes as well as five alternative pathways starting from endogenous compounds of Escherichia coli.
The interactive interface of RetroPath2.0 allows users to test and prototype quickly and is easy to use, so that users without programming experience can quickly understand the tool and use it to build retrosynthesis pathways. In addition, the reaction rules of the tool are not built around Enzyme Commission (EC) nomenclature but are an automated translation of enzymatic reactions extracted from the database, which gives an accurate view of enzyme capabilities[64]. However, RetroPath2.0 cannot evaluate the applicability of a pathway to the endogenous metabolic system of a host organism in a specific environment, and its data source is incomplete. RetroPath2.0 also ignores expert knowledge and experience when ranking pathways, so it needs continued improvement.
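The idea of reversing forward reaction rules and discarding low-scoring ones can be shown with a deliberately simplified sketch. It is not RetroPath2.0 code: rules here are plain (substrate, product, score) triples over compound names instead of SMARTS patterns, the scores are made up, and `intermediate-X` is a hypothetical compound added only to show the effect of the score threshold.

```python
# Hypothetical forward rules: (substrate, product, score between 0 and 1).
FORWARD_RULES = [
    ("phenylalanine", "cinnamate", 0.9),
    ("cinnamate", "styrene", 0.8),
    ("phenylalanine", "intermediate-X", 0.7),
    ("intermediate-X", "styrene", 0.2),
]


def reverse_rules(rules, min_score):
    """Map each product to its substrates, dropping rules below min_score."""
    reverse = {}
    for substrate, product, rule_score in rules:
        if rule_score >= min_score:
            reverse.setdefault(product, []).append(substrate)
    return reverse


def retrosynthesis(target, available, reverse, max_steps):
    """Depth-limited search from the target back to any available compound."""
    if target in available:
        return [[target]]
    if max_steps == 0:
        return []
    routes = []
    for precursor in reverse.get(target, []):
        for route in retrosynthesis(precursor, available, reverse, max_steps - 1):
            routes.append(route + [target])
    return routes


strict = reverse_rules(FORWARD_RULES, min_score=0.5)
print(retrosynthesis("styrene", {"phenylalanine"}, strict, max_steps=3))
```

Lowering `min_score` admits the low-confidence rule and a second route appears, which mirrors the trade-off between overly specific and overly general rules discussed in Section 3.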

4.2.2 RetroPath RL

RetroPath RL is a modular command-line tool for building retrosynthetic pathways and an upgrade of RetroPath2.0. The tool takes the target compound and reaction rules as input and uses a Monte Carlo tree search algorithm to construct feasible reactions. It selects atoms based on their bond distance from the reaction center in order to choose appropriate precursors; the reaction pathway is then constructed according to the diameter rules and scored with a biochemical score based on chemical similarity and estimated enzyme sequence availability, and the toxicity of the compounds is also predicted[65,66]. Results on a manually curated dataset of 20 compounds show that RetroPath RL recovers 75% of the exact pathways described in the literature under stringent settings. RetroPath RL can produce multiple synthetic pathways to furoic acid starting from pyruvate, acetyl-CoA and other substrates.
The memory requirements of the exhaustive search algorithm used by RetroPath2.0 essentially limit it to five-step pathways, whereas RetroPath RL can explore longer pathways and find more solutions in the same time, and the pathways can be ranked separately. However, it still relies on metabolic databases; when database data are limited, the accuracy of the tool is limited as well, and the number of microbial strains it supports is small, so further improvement is needed.
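The reason a tree search can go deeper than an exhaustive search is that it spends its budget selectively. A common selection rule in Monte Carlo tree search is UCT, sketched below; the exact selection policy and reward used by RetroPath RL may differ, and the child names, rewards and visit counts are invented for illustration.

```python
import math


def uct_score(total_reward, visits, parent_visits, exploration=math.sqrt(2)):
    """Upper confidence bound for trees: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # always try an unvisited child first
    mean_reward = total_reward / visits
    bonus = exploration * math.sqrt(math.log(parent_visits) / visits)
    return mean_reward + bonus


def select_child(children, parent_visits, exploration=math.sqrt(2)):
    """children maps a child name to (total_reward, visits)."""
    return max(
        children,
        key=lambda name: uct_score(*children[name], parent_visits, exploration),
    )


# Hypothetical statistics for two candidate rule applications at one node.
children = {"rule_a": (8.0, 10), "rule_b": (1.0, 2)}
print(select_child(children, parent_visits=12))                   # favours the less explored child
print(select_child(children, parent_visits=12, exploration=0.0))  # pure exploitation
```

With the default exploration constant the rarely visited `rule_b` is chosen even though `rule_a` has the higher mean reward, which is how the search keeps probing alternative precursors instead of expanding every branch to a fixed depth.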

4.2.3 RetroBioCat

Retrobiocat is a Computer-Aided Synthesis Planning tool for biocatalytic reactions and cascades.Compared with CASP(Computer-Aided Synthesis Planning)tools such as Chematica or ASKCOS,Retrobiocat is able to describe the reaction steps of enzymes and the use of reaction rules[67][68]。 the use of this tool usually follows three steps:in the first step,the reaction rules are extracted from the known database and coded into a set of professionally coded reaction rules,based on which the pathway of biocatalytic retrosynthesis is generated[69]; in the second step,the tool will identify specific enzymes in each step of the reaction.When a specific enzyme is selected,the tool can identify the enzymes used in the literature precedents and provide suggestions on whether they are applicable to the current path.the source of the literature precedents is the synthetic biotransformation literature precedent database developed by the researchers.potential pathways are then constructed based on factors such as the availability of the target substrate and the number of steps in the reaction pathway.in the third step,the proposed enzymes and reaction pathways are scored based on factors such as substrate availability,enzyme selectivity,thermodynamics,cofactor use,and substrate solubility.in terms of exploring Potential synthetic pathways,the tool is mainly implemented in two ways:One is the CASP network exploration mode,in which the user can explore different pathways to the target molecule by expanding the biocatalytic network.Network exploration mode is very useful for scientists who are not familiar with biocatalysis,showing users the catalytic network of each intermediate in a visual form.one is the path exploration mode,in which the tool automatically generates paths before ranking them based on a user-defined weighted score,and then relies on the weighted score to determine which paths are the most promising[70]
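Ranking by a user-defined weighted score, as in the pathway exploration mode, amounts to a weighted sum over per-pathway criteria. The sketch below only illustrates that idea and does not reproduce RetroBioCat's scoring: the criterion names, their values (assumed to be normalised to the range 0 to 1, higher is better) and the two routes are hypothetical.

```python
def rank_pathways(pathways, weights):
    """Sort pathways by the weighted sum of their criteria, best first."""
    def weighted_score(criteria):
        return sum(weights[name] * criteria[name] for name in weights)

    return sorted(pathways, key=lambda p: weighted_score(p["criteria"]), reverse=True)


# Hypothetical candidate routes with criteria normalised to [0, 1].
PATHWAYS = [
    {"name": "route_1",
     "criteria": {"few_steps": 0.9, "substrate_available": 0.2, "literature_precedent": 0.5}},
    {"name": "route_2",
     "criteria": {"few_steps": 0.5, "substrate_available": 1.0, "literature_precedent": 0.8}},
]

equal = {"few_steps": 1.0, "substrate_available": 1.0, "literature_precedent": 1.0}
short_first = {"few_steps": 3.0, "substrate_available": 0.5, "literature_precedent": 0.5}

print([p["name"] for p in rank_pathways(PATHWAYS, equal)])
print([p["name"] for p in rank_pathways(PATHWAYS, short_first)])
```

The same two routes swap places when the weights change, which is why letting the user set the weights matters: the "most promising" pathway depends on whether step count, substrate availability or literature support is the priority.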
RetroBioCat offers an intuitive, user-friendly interactive interface and provides literature precedents for the enzymes proposed for each reaction, which makes it a valuable resource. In addition to the synthetic biotransformation database, BiocatDB provides a search portal for studying the substrate specificity of specific enzymes or reactions, giving a more detailed understanding of the enzymes available for specific biotransformations. Overall, RetroBioCat outperforms RetroPath 2.0 and RetroPath RL in accuracy and search range in pathway construction tests, but the reaction rules currently encoded in the tool are not sufficient to construct more reaction pathways. Extending it will require more advanced pathway generation algorithms, such as Monte Carlo tree search. Moreover, some enzymatic reactions are incompletely covered, so synthesis pathways may not be found for more complex target compounds, and the tool still needs continued optimization.
RetroBioCat has already been applied successfully in experiments. Gao Dengke et al. constructed an efficient multi-enzyme catalytic synthesis pathway for L-homophenylalanine (L-HPA) with the assistance of RetroBioCat, as shown in Figure 5. The pathway produces L-HPA from pyruvic acid and benzaldehyde in whole cells, and the expression level of the rate-limiting enzymes is improved through protein engineering. To achieve efficient production of L-HPA, the initial step was set to 4 and the maximum initial node to 7 during pathway construction in RetroBioCat; 2-oxo-4-phenylbutyric acid was used as the raw material and free ammonia as the amino donor, and L-HPA was synthesized by phenylalanine dehydrogenase (PHEDHS), which is conducive to industrial application[71]. RetroBioCat further suggested that 2-oxo-4-phenylbutyric acid could be obtained from (E)-2-oxo-4-phenylbut-3-enoic acid by ene reductase-catalyzed ene reduction. Finally, the route was manually optimized to achieve efficient synthesis of L-HPA. RetroBioCat can thus effectively construct multi-enzyme cascade retrosynthetic pathways, can serve as a powerful tool for computer-aided construction of retrosynthetic pathways, and can provide a reference for optimizing retrosynthetic pathway design tools.
Fig. 5 Synthesis path of L-HPA. (1) L-HPA; (2) 2-oxo-4-phenylbutanoic acid; (3) (E)-2-oxo-4-phenylbut-3-enoic acid; (4) 4-hydroxy-2-oxo-4-phenylbutanoic acid; (5) pyruvate; (6) benzaldehyde

4.2.4 BioNavi-NP

BioNavi-NP is a single-step bioretrosynthesis prediction model trained on general organic and biosynthetic reactions with an end-to-end transformer neural network; its data come mainly from the MetaCyc (version 23.5) and KEGG databases. To mitigate overfitting in enzymatic reaction prediction, the tool trains an NLP (natural language processing) model with the Seq2Seq (sequence-to-sequence) method on a combination of 62 000 natural product-like organic reactions and 33 000 biochemical reactions[72][73]. When constructing a retrosynthetic pathway, BioNavi-NP takes the SMILES sequence of the target compound as input; for model training, each extracted reaction is split into a source sequence and a target sequence. Each step is predicted by a transformer neural network that is built, trained and tested with the PyTorch and OpenNMT frameworks[74][75][76]. Beam search, the best-performing model and the Retro* algorithm are then used to predict multiple feasible precursors and output the best proposed pathway. Results for the tool trained on the USPTO_NPL dataset and BioChem data show that it is 1.7 times more accurate than existing conventional rule-based methods. BioNavi-NP predicted synthetic pathways for sterhirsutin J and glutaric acid. Sterhirsutin J is decomposed into polyalkane sesquiterpenes and colletorin D acid, which then leads to a defined biosynthetic pathway; the synthetic pathways of glutaric acid include the lysine degradation pathway and the glutamic acid production pathway, both of which are established synthetic pathways.
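Splitting a reaction into a source and a target sequence can be sketched as below. This is an illustrative assumption about the data preparation, not BioNavi-NP's published code: the token pattern is a simplified SMILES tokenizer, the helper names `tokenize` and `make_training_pair` are hypothetical, and the choice of product as source and precursors as target reflects the retrosynthetic direction. The example reaction is the aldol step of Fig. 5 (pyruvic acid and benzaldehyde giving 4-hydroxy-2-oxo-4-phenylbutanoic acid).

```python
import re

# Simplified SMILES token pattern: bracket atoms, two-letter halogens,
# two-digit ring closures, then any other single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def make_training_pair(product_smiles, precursor_smiles):
    """Build (source, target) strings of space-separated tokens.

    Source is the product; target is the precursors joined by '.'.
    """
    source = " ".join(tokenize(product_smiles))
    target = " ".join(tokenize(".".join(precursor_smiles)))
    return source, target


product = "OC(CC(=O)C(=O)O)c1ccccc1"           # 4-hydroxy-2-oxo-4-phenylbutanoic acid
precursors = ["CC(=O)C(=O)O", "O=Cc1ccccc1"]   # pyruvic acid, benzaldehyde
source, target = make_training_pair(product, precursors)
print("source:", source)
print("target:", target)
```

Space-separated tokens are the usual input form for sequence-to-sequence toolkits, so each pair can be written to parallel source and target text files.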
BioNavi-NP is a typical rule-free model that does not require template-based training. Compared with the template-based RetroPath 2.0 and RetroBioCat, BioNavi-NP searches a wider range of synthetic pathways with higher accuracy. It also has shortcomings, such as low accuracy in reading molecules, possible semantic errors and read failures, and its transformer neural network needs further optimization.
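The beam search used when decoding precursor sequences can be illustrated with a toy decoder. The sketch below is a generic beam search, not BioNavi-NP's implementation: `toy_next_token_logprobs` is a hypothetical stand-in for a trained transformer, and its probabilities are invented for the example. It shows why keeping several candidates matters: the most probable first token ("C") does not lead to the most probable complete sequence.

```python
import math


def toy_next_token_logprobs(prefix):
    """Hypothetical decoder: log-probabilities of the next token."""
    table = {
        (): {"C": 0.6, "O": 0.4},
        ("C",): {"C": 0.5, "O": 0.3, "<eos>": 0.2},
        ("O",): {"C": 0.9, "<eos>": 0.1},
    }
    # Any prefix not in the table is assumed to end the sequence.
    probs = table.get(tuple(prefix), {"<eos>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


def beam_search(next_logprobs, beam_width, max_len):
    """Return up to beam_width finished sequences, best first.

    Sequences still unfinished after max_len steps are discarded.
    """
    beams = [([], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, logp in next_logprobs(tokens).items():
                extended = (tokens + [tok], score + logp)
                if tok == "<eos>":
                    finished.append(extended)
                else:
                    candidates.append(extended)
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # keep only the best partial sequences
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


for tokens, score in beam_search(toy_next_token_logprobs, beam_width=2, max_len=4):
    print("".join(tokens[:-1]), round(math.exp(score), 2))
```

Each finished sequence would correspond to one candidate precursor set, and a route-level planner then chooses among these candidates when assembling a multi-step pathway.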

5 Artificial intelligence fuels the development of multi-enzyme systems

In a data-driven environment, computer-aided design plays a vital role in enzyme design and pathway construction. Machine learning is now widely used in this field, but traditional machine learning has shown some disadvantages, such as the limitations of autoregressive sampling strategies, unclear classification of dataset types, and inaccurate information sources for the algorithms. Artificial intelligence has helped to address these problems and has been applied to the artificial design of enzymes[77~79]. Deep learning-based retrosynthetic pathway construction tools use a series of algorithms to encode the input molecules from representations such as molecular fingerprints, SMILES, IUPAC names and molecular graphs, in order to identify and construct possible precursor structures and analyze the required enzymes[80,81][82]. Similar retrosynthetic pathway design tools have emerged, but the models remain black boxes: it is difficult for non-specialists to explain in detail how a model constructs its predictions, and interpretability methods themselves require additional knowledge[83]. In addition, most models take input mainly in SMILES format, which still suffers from problems such as semantic inconsistency and reading errors that reduce model performance; double verification can effectively mitigate this problem[84,85]. Some models currently use graph neural networks to recognize the input molecular structure, and 3D graph neural networks have been applied in the medical field, so combining SMILES with graph neural networks is expected to solve this problem[86,87][88]. For retrosynthesis tools that rely on reaction databases, problems remain in precursor construction, such as the size of the reaction database, the accuracy of the templates, inconsistencies in database integration, and incomplete data, including ambiguity, errors, redundancy, or inconsistency with the literature. A workaround is therefore needed for nonstandardized nomenclature across databases, which requires constant mapping and perhaps periodic updates as new knowledge is discovered. Artificial intelligence thus needs further optimization before it can fully support multi-enzyme systems.
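One simple form that a verification step for SMILES input could take is a syntactic pre-check before the string reaches the model. The sketch below is an assumption about what such a check might look like, not a method taken from the cited works; the function name `smiles_syntax_issues` is hypothetical. It only checks bracket, parenthesis and ring-closure pairing, and does not validate valence or chemistry, for which a full cheminformatics parser would be needed.

```python
def smiles_syntax_issues(smiles):
    """Return a list of simple syntax problems (empty if none found)."""
    issues = []
    depth = 0            # current parenthesis nesting
    open_rings = set()   # ring-closure labels seen an odd number of times
    in_bracket = False   # inside a [...] atom, where digits are not ring labels
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                issues.append("unmatched ')'")
                depth = 0
        elif ch == "%":
            # Two-digit ring closure such as %12: toggle the whole label.
            open_rings ^= {smiles[i:i + 3]}
            i += 2
        elif ch.isdigit():
            open_rings ^= {ch}  # first occurrence opens, second closes
        i += 1
    if in_bracket:
        issues.append("unclosed '['")
    if depth > 0:
        issues.append("unclosed '('")
    if open_rings:
        labels = ", ".join(sorted(open_rings))
        issues.append(f"unclosed ring bond(s): {labels}")
    return issues


for s in ["c1ccccc1C(=O)O", "c1ccccc", "CC(=O"]:
    print(s, smiles_syntax_issues(s))
```

A string that passes this check can still be chemically invalid, so in practice it would be paired with a second, independent representation such as a molecular graph, in the spirit of the combination discussed above.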

6 Conclusion and outlook

The most effective current approach for designing synthetic routes to known target compounds is to combine chemical and biological retrosynthetic analysis in a single tool. This approach requires not only access to large databases of catalytic reactions and an understanding of how biological and chemical catalysts can be combined productively to shorten a synthetic process, but also basic research into the advantages and limitations of biological and chemical catalysis in cascade pathways, so that catalyst deactivation and harmful by-products can be dealt with effectively. In addition, limited computing performance and an incomplete understanding of natural metabolism are common problems that slow the rational design of synthetic pathways with retrosynthesis tools. We therefore still need to keep mining new information to improve and update the databases, and to further optimize the development methods of retrosynthesis tools, in order to improve performance, achieve higher accuracy, and make greater contributions to the field of bioretrosynthesis.
[1]
Choi J M, Han S S, Kim H S. Biotechnol. Adv., 2015, 33(7): 1443.

[2]
Wang H Y, Hu X, Hu Y J, Zhu N, Guo K. Progress in Chemistry, 2022, 34(8): 1796.

( 王慧悦, 胡欣, 胡玉静, 朱宁, 郭凯. 化学进展, 2022, 34(8): 1796.)

[3]
Li H, Shi X D, Li J L. Progress in Chemistry, 2022, 34(3): 568.

( 李红, 史晓丹, 李洁龄. 化学进展, 2022, 34(3): 568.)

[4]
Wu J J X, Wei H. Progress in Chemistry, 2021, 33(1): 42.

( 武江洁星, 魏辉. 化学进展, 2021, 33(1): 42.)

[5]
Du C C, Hu P C, Ren L J. Appl. Microbiol. Biotechnol., 2023, 107(1): 9.

[6]
Intasian P, Prakinee K, Phintha A, Trisrivirat D, Weeranoppanant N, Wongnate T, Chaiyen P. Chem. Rev., 2021, 121(17): 10367.

[7]
Nestl B M, Hammer S C, Nebel B A, Hauer B. Angew. Chem. Int. Ed., 2014, 53(12): 3070.

[8]
Zhao Z T, Zhang Z Z, Liang Z H. Progress in Chemistry, 2022, 34(11): 2386.

( 赵自通, 张真真, 梁志宏. 化学进展, 2022, 34(11): 2386.)

[9]
Huang W Q, Wang Y X, Tian W S, Wang J, Tu P F, Wang X H, Shi S B, Liu X. China Journal of Chinese Materia Medica, 2023, 48(2): 336.

( 黄文倩, 王迎夏, 田维圣, 王娟, 屠鹏飞, 王晓晖, 史社坡, 刘晓. 中国中药杂志, 2023, 48(2): 336.)

[10]
Simić S, Zukić E, Schmermund L, Faber K, Winkler C K, Kroutil W. Chem. Rev., 2022, 122(1): 1052.

[11]
Yi D, Bayer T, Badenhorst C P S, Wu S K, Doerr M, Höhne M, Bornscheuer U T. Chem. Soc. Rev., 2021, 50(14): 8003.

[12]
Benítez-Mateos A I, Roura Padrosa D, Paradisi F. Nat. Chem., 2022, 14(5): 489.

[13]
Sharma A, Gupta G, Ahmad T, Mansoor S, Kaur B. Food Rev. Int., 2021, 37(2): 121.

[14]
Zeng T, Wu R B. Synthetic Biology, 2023, 4(3): 535.

( 曾涛, 巫瑞波. 合成生物学, 2023, 4(3): 535.)

[15]
Hossain G S, Nadarajan S P, Zhang L, Ng T K, Foo J L, Ling H, Choi W J, Chang M W. Front. Microbiol., 2018, 9: 155.

[16]
Sperl J M, Sieber V. ACS Catal., 2018, 8(3): 2385.

[17]
Shi J F, Wu Y Z, Zhang S H, Tian Y, Yang D, Jiang Z Y. Chem. Soc. Rev., 2018, 47(12): 4295.

[18]
Bell E L, Finnigan W, France S P, Green A P, Hayes M A, Hepworth L J, Lovelock S L, Niikura H, Osuna S, Romero E, Ryan K S, Turner N J, Flitsch S L. Nat. Rev. Meth. Primers, 2021, 1: 46.

[19]
Ren S Z, Li C H, Jiao X B, Jia S R, Jiang Y J, Bilal M, Cui J D. Chem. Eng. J., 2019, 373: 1254.

[20]
Hwang E T, Lee S. ACS Catal., 2019, 9(5): 4402.

[21]
Hold C, Billerbeck S, Panke S. Nat. Commun., 2016, 7: 12971.

[22]
Chi C B, Zhang W C, Luo M X, Zhang M, Chen G. Chem. Eng. J., 2023, 458: 141321.

[23]
Siedentop R, Claaßen C, Rother D, Lütz S, Rosenthal K. Catalysts, 2021, 11(10): 1183.

[24]
Lopez-Gallego F, Schmidt-Dannert C. Curr. Opin. Chem. Biol., 2010, 14(2): 174.

[25]
López-Gallego F. Methods in Enzymology, 2019, 617: 385.

[26]
Zhang Y C, Nie N, Zhang Y F. Chin. J. Catal., 2022, 43(7): 1749.

[27]
Wei Y X, Han Y L, Lu D N, Qiu T. Journal of Tsinghua University Science and Technology, 2023, 63(5): 697.

( 魏奕新, 韩一蕾, 卢滇楠, 邱彤. 清华大学学报(自然科学版), 2023, 63(5): 697.)

[28]
Law J, Zsoldos Z, Simon A, Reid D, Liu Y, Khew S Y, Johnson A P, Major S, Wade R A, Ando H Y. J. Chem. Inf. Model., 2009, 49(3): 593.

[29]
Yu T H, Boob A G, Volk M J, Liu X, Cui H Y, Zhao H M. Nat. Catal., 2023, 6(2): 137.

[30]
Caspi R, Billington R, Keseler I M, Kothari A, Krummenacker M, Midford P E, Ong W K, Paley S, Subhraveti P, Karp P D. Nucleic Acids Res., 2020, 48(D1): D445.

[31]
Kanehisa M, Goto S, Sato Y, Kawashima M, Furumichi M, Tanabe M. Nucleic Acids Res., 2014, 42(D1): D199.

[32]
Li S W. Master's Dissertation of Lanzhou University, 2023.

( 李思徵. 兰州大学硕士论文, 2023.)

[33]
Chen Y Y, Song D Q, Li Y J, Zhao H P. Chemistry, 2022, 85(08): 951.

( 陈颖莹, 荣丹琪, 李元晶, 赵鸿萍. 化学通报, 2022, 85(08): 951.)

[34]
Wang L, Ng C Y, Dash S, Maranas C D. Biochem. Soc. Trans., 2018, 46(3): 513.

[35]
Koch M, Duigou T, Faulon J L. ACS Synth Biol, 2019:

[36]
Hafner J, Payne J, MohammadiPeyhani H, Hatzimanikatis V, Smolke C. Nat. Commun., 2021, 12: 1760.

[37]
Hatzimanikatis V, Li C H, Ionita J A, Henry C S, Jankowski M D, Broadbelt L J. Bioinformatics, 2005, 21(8): 1603.

[38]
Coley C W, Green W H, Jensen K F. Acc. Chem. Res., 2018, 51(5): 1281.

[39]
Segler M H S, Waller M P. Chemistry (Weinheim an der Bergstrasse, Germany), 2017, 23(25): 6118.

[40]
Zheng S J, Zeng T, Li C T, Chen B H, Coley C W, Yang Y D, Wu R B. Nat. Commun., 2022, 13: 3342.

[41]
Hadadi N, Hatzimanikatis V. Curr. Opin. Chem. Biol., 2015, 28: 99.

[42]
Morgat A, Lombardot T, Axelsen K B, Aimo L, Niknejad A, Hyka-Nouspikel N, Coudert E, Pozzato M, Pagni M, Moretti S, Rosanoff S, Onwubiko J, Bougueleret L, Xenarios I, Redaschi N, Bridge A. Nucleic Acids Res., 2017, 45(7): 4279.

[43]
Orth J D, Thiele I, Palsson B Ø. Nat. Biotechnol., 2010, 28(3): 245.

[44]
Ding S Z, Liao X P, Tu W Z, Wu L, Tian Y, Sun Q P, Chen J N, Hu Q N. ACS Chem. Biol., 2017, 12(11): 2823.

[45]
Steinbeck C, Han Y Q, Kuhn S, Horlacher O, Luttmann E, Willighagen E. J. Chem. Inf. Comput. Sci., 2003, 43(2): 493.

[46]
Xue L, Godden J W, Stahura F L, Bajorath J. J. Chem. Inf. Comput. Sci., 2003, 43(4): 1151.

[47]
Tzanov A. Computing reviews, 2013, 54(12): 725.

[48]
Nadathur G, Miller D. J. ACM, 1990, 37(4): 777.

[49]
Ding D W, Ding Y R, Cai Y J, Chen S W, Xu W B. Computers and Applied Chemistry, 2008, 25(1): 4.

( 丁德武, 丁彦蕊, 蔡宇杰, 陈守文, 须文波. 计算机与应用化学, 2008, 25(1): 4.)

[50]
Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, Schomburg D. Nucleic Acids Res., 2004, 32: D431.

[51]
Sigrist C J A, de Castro E, Cerutti L, Cuche B A, Hulo N, Bridge A, Bougueleret L, Xenarios I. Nucleic Acids Res., 2013, 41(D1): D344.

[52]
Oh M, Yamada T, Hattori M, Goto S, Kanehisa M. J. Chem. Inf. Model., 2007, 47(4): 1702.

[53]
Hattori M, Tanaka N, Kanehisa M, Goto S. Nucleic Acids Res., 2010, 38: W652.

[54]
Moriya Y, Shigemizu D, Hattori M, Tokimatsu T, Kotera M, Goto S, Kanehisa M. Nucleic Acids Res., 2010, 38(suppl_2): W138.

[55]
Faust K, Croes D, van Helden J. Biosystems, 2011, 105(2): 109.

[56]
Topkis D M. IEEE Trans. Commun., 1988, 36(7): 855.

[57]
Motwalli O, Uludag M, Mijakovic I, Alazmi M, Bajic V B, Gojobori T, Gao X, Essack M. ACS Synth. Biol., 2020, 9(12): 3217.

[58]
Li Y L, de Ridder D, de Groot M J, Reinders M J. BMC Syst. Biol., 2008, 2(1): 111.

[59]
Chou C H, Chang W C, Chiu C M, Huang C C, Huang H D. Nucleic Acids Res., 2009, 37(suppl_2): W129.

[60]
Solomon K V, Moon T S, Ma B, Sanders T M, Prather K L J. ACS Synth. Biol., 2013, 2(3): 126.

[61]
Kuwahara H, Alazmi M, Cui X F, Gao X. Nucleic Acids Res., 2016, 44(W1): W217.

[62]
Moretti S, Tran V, Mehl F, Ibberson M, Pagni M. Nucleic Acids Res., 2021, 49(D1): D570.

[63]
Ni Z F, Stine A E, Tyo K E J, Broadbelt L J. Metab. Eng., 2021, 65: 79.

[64]
Delépine B, Duigou T, Carbonell P, Faulon J L. Metab. Eng., 2018, 45: 158.

[65]
Segler M H S, Preuss M, Waller M P. Nature, 2018, 555(7698): 604.

[66]
Han S J, Kwon S, Kim K S. Cancer Cell Int., 2021, 21(1): 152.

[67]
Grzybowski B A, Szymkuć S, Gajewska E P, Molga K, Dittwald P, Wołos A, Klucznik T. Chem, 2018, 4(3): 390.

[68]
Struble T J, Alvarez J C, Brown S P, Chytil M, Cisar J, DesJarlais R L, Engkvist O, Frank S A, Greve D R, Griffin D J, Hou X J, Johannes J W, Kreatsoulas C, Lahue B, Mathea M, Mogk G, Nicolaou C A, Palmer A D, Price D J, Robinson R I, Salentin S, Xing L, Jaakkola T, Green W H, Barzilay R, Coley C W, Jensen K F. J. Med. Chem., 2020, 63(16): 8667.

[69]
Duigou T, Du Lac M, Carbonell P, Faulon J L. Nucleic Acids Res., 2019, 47(D1): D1229.

[70]
Finnigan W, Hepworth L J, Flitsch S L, Turner N J. Nat. Catal., 2021, 4(2): 98.

[71]
Gao D K, Song W, Wu J, Guo L, Gao C, Liu J, Chen X L, Liu L M. Angew. Chem. Int. Ed., 2022, 61(36): e202207077.

[72]
Nadkarni P M, Ohno-Machado L, Chapman W W. J. Am. Med. Inform. Assoc., 2011, 18(5): 544.

[73]
Zhang Y, Li D, Wang Y H, Fang Y, Xiao W D. Appl Sci-Basel, 2019, 9(8): 13.

[74]
Han K, Wang Y H, Chen H T, Chen X H, Guo J Y, Liu Z H, Tang Y H, Xiao A, Xu C J, Xu Y X, Yang Z H, Zhang Y M, Tao D C. IEEE Trans. Pattern Anal. Mach. Intell., 2023, 45(1): 87.

[75]
Li Q B, Wen Z Y, Wu Z M, Hu S X, Wang N B, Li Y, Liu X, He B S. IEEE Trans. Knowl. Data Eng., 2023, 35(4): 3347.

[76]
Tan Z X, Wang S, Yang Z H, Chen G, Huang X C, Sun M S, Liu Y. AI Open, 2020, 1: 5.

[77]
Li Y, Huang C, Ding L Z, Li Z X, Pan Y J, Gao X. Methods, 2019, 166: 4.

[78]
Ching T, Himmelstein D S, Beaulieu-Jones B K, Kalinin A A, Do B T, Way G P, Ferrero E, Agapow P M, Zietz M, Hoffman M M, Xie W, Rosen G L, Lengerich B J, Israeli J, Lanchantin J, Woloszynek S, Carpenter A E, Shrikumar A, Xu J B, Cofer E M, Lavender C A, Turaga S C, Alexandari A M, Lu Z Y, Harris D J, DeCaprio D, Qi Y J, Kundaje A, Peng Y F, Wiley L K, Segler M H S, Boca S M, Swamidass S J, Huang A, Gitter A, Greene C S. J. R. Soc. Interface, 2018, 15(141): 47.

[79]
Ding S Z, Jiang X Q, Meng C, Sun L X, Wang Z Q, Yang H B, Shen G W, Xia N. Science China Chemistry, 2023, 53(01): 66.

( 丁邵珍, 江小琴, 孟超, 孙丽霞, 王正权, 杨弘宾, 沈国文, 夏宁. 中国科学: 化学, 2023, 53(01): 66.)

[80]
Capecchi A, Probst D, Reymond J L. J. Cheminf., 2020, 12: 43.

[81]
Li C Y, Feng J H, Liu S H, Yao J F. Comput. Intell. Neurosci., 2022, 2022: 8464452.

[82]
Handsel J, Matthews B, Knight N J, Coles S J. J. Cheminf., 2021, 13(1): 79.

[83]
Cui W X, Liu S H, Jiang F, Zhao D B. IEEE Trans. Multimedia, 2023, 25: 816.

[84]
Zhang Z Q, Xie A L, Guan J H, Zhou S G. Bioinformatics, 2023, 39(8): btad462.

[85]
Zhou Y, Wu S K, Bornscheuer U T. Chem. Commun., 2021, 57(82): 10661.

[86]
Li J X, Peng H, Cao Y W, Dou Y T, Zhang H K, Yu P, He L F. IEEE Trans. Knowl. Data Eng., 2023, 35(1): 560.

[87]
Wang W, Suo X Y, Wei X Y, Wang B, Wang H, Dai H N, Zhang X L. IEEE Trans. Knowl. Data Eng., 2023, 35(4): 3938.

[88]
Moon K, Im H J, Kwon S. Bioinformatics, 2023, 39(6): btad371.
