Organization of the database

Last updated: 01.16.2021

The Monosaccharide Biosynthesis Pathways Database (MBPD) contains two types of data:

1) Experimentally characterized enzymes and their homologs from SwissProt: Experimentally characterized enzymes were obtained from the literature and used either to generate HMM profiles or as queries for BLASTp-based searches. For the HMM profiles, homologs from SwissProt were also used. Together these form a manually curated dataset containing the following numbers and types:

Total number of enzymes: 1252

  • Direct Enzyme Assay (DEA): 508
  • One Pot Enzyme Assay (OPEA): 11
  • Inferred from Complementation: 53
  • Inferred from Homology (IH): 680

Enzymes used to generate profiles: 1186

Enzymes used as BLASTp queries: 66

Number of monosaccharides illustrated by the above set of enzymes: 74

Number of monosaccharides with completely characterized pathways: 66

Enzymes belonging to this section are stored as ‘Experimental data’. They are indexed by unique identifiers called PUIDs. A PUID consists of the UniProt identifier of the protein suffixed with two integers. For single-domain proteins, the two integers correspond to the residue numbers of the N- and C-terminal amino acids. Each domain of a multi-domain protein is stored separately; in these cases, the two integers correspond to the residue numbers of the first and last residues of the domain.
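As an illustration of the PUID layout described above, the following is a minimal sketch that splits a PUID into its UniProt identifier and residue range. The underscore separator, the example identifier and the names PUID, parse_puid and region_length are assumptions made for this sketch; they are not taken from MBPD, so adjust the pattern to the identifiers actually displayed on the site.

```python
import re
from typing import NamedTuple


class PUID(NamedTuple):
    """A UniProt identifier plus the first and last residue of the stored region."""
    uniprot_id: str
    start: int
    end: int


# Assumption: the three fields are joined by underscores, e.g. "P12345_1_350".
# The actual separator used by MBPD may differ.
_PUID_PATTERN = re.compile(r"^(?P<acc>[A-Za-z0-9]+)_(?P<start>\d+)_(?P<end>\d+)$")


def parse_puid(puid: str) -> PUID:
    """Split a PUID string into (uniprot_id, start, end)."""
    match = _PUID_PATTERN.match(puid.strip())
    if match is None:
        raise ValueError(f"not a recognizable PUID: {puid!r}")
    start, end = int(match["start"]), int(match["end"])
    # Residue numbers are 1-based and the range must not be reversed.
    if start < 1 or end < start:
        raise ValueError(f"invalid residue range in PUID: {puid!r}")
    return PUID(match["acc"], start, end)


def region_length(puid: PUID) -> int:
    """Number of residues covered by the protein or domain (inclusive range)."""
    return puid.end - puid.start + 1
```

For a single-domain protein the range spans the whole sequence, whereas each domain of a multi-domain protein would yield its own PUID sharing the same UniProt identifier.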

2) Homologs of monosaccharide biosynthesis enzymes obtained from completely sequenced genomes: The functional annotation pipeline developed from the above set of enzymes (section 1) was used to search for homologs in 12939 completely sequenced bacterial and archaeal genomes.

Enzymes belonging to this section are stored as ‘Predictions’.


‘Experimental data’ and ‘Predictions’ can be accessed in the following ways.

Browse:

By function: This feature allows a user to search the database for proteins based on a broad level function such as Reductase, Aminotransferase, etc. Where applicable, additional filters are provided to make the search stringent. For example, for the Reductase family, 3-reductase, 4-reductase and reductase are available as three additional filters.

By pathway: This feature can be used to select a pathway from a dropdown menu and search for the organisms that contain it, along with the associated enzymes. Some PUIDs are listed under the ‘uncharacterized’ pathway: their sequences were used to generate profiles on the basis of sequence homology, but for lack of experimental evidence, the biological process/pathway they are involved in is not known.

By genome: This feature allows retrieval of the monosaccharide biosynthesis pathways encoded by an organism. Dropdown menus for genus, species and strain names make it easy to choose the organism. Results are displayed in two sections: (i) monosaccharides and the corresponding experimentally characterized and predicted enzymes of their biosynthesis pathways and (ii) all predictions from the selected genome which participate in monosaccharide biosynthesis. Note: ‘Browse by genome’ fetches both ‘experimental data’ and ‘predictions’ belonging to the selected organism.

Search:

By keyword: This feature can be used to search for enzymes by gene name, UniProt ID, PDB ID, functional category and PubMed ID.

By sequence: Besides retrieving data stored in the database, it is also possible to search the database for homologs of a protein of interest. This is made possible by the ‘search by sequence’ option. The HMM profiles and BLASTp queries that were used to scan whole genome sequences have been integrated into the website to find homologs of the sequences input by a user. For predictions from a genome not included in MBPD, kindly send the protein fasta file to jaya_srivastava [at] iitb [dot] ac [dot] in. We will reply with predictions at the earliest opportunity. Additionally, you can go to Download data -> ‘Python script for genome scan’ and access the GitHub repository, which contains the code and required input files to run the scan.
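Before sending a protein fasta file or running the scan locally, it may help to confirm that the file parses cleanly and contains only protein sequences. The following is a minimal sketch of such a check. It is not part of MBPD or of the genome scan script; the function name read_protein_fasta and the accepted residue alphabet are assumptions made for this sketch.

```python
from typing import Dict, Iterable

# Assumption: the 20 standard residues plus common ambiguity codes
# (B, X, Z), selenocysteine (U), pyrrolysine (O) and the stop symbol (*).
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYBXZUO*")


def read_protein_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence ID: sequence} from FASTA-formatted lines.

    Raises ValueError on missing or duplicate headers and on characters
    outside PROTEIN_LETTERS (for example, if the file holds nucleotide
    records with digits or other unexpected symbols).
    """
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without a sequence ID")
            current_id = fields[0]  # ID is the first word of the header
            if current_id in records:
                raise ValueError(f"duplicate sequence ID: {current_id}")
            records[current_id] = ""
        else:
            if current_id is None:
                raise ValueError("sequence data found before any header")
            residues = line.upper()
            unexpected = set(residues) - PROTEIN_LETTERS
            if unexpected:
                raise ValueError(
                    f"{current_id}: unexpected characters {sorted(unexpected)}"
                )
            records[current_id] += residues
    return records
```

A file on disk can be checked by passing an open file handle, for example `with open("proteins.fasta") as handle: records = read_protein_fasta(handle)`, where the file name is only a placeholder.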