2005 News Stories
GADU/GNARE Uses TeraGrid for Protein Sequence Analysis

Authors: Dr. Natalia Maltsev, Computational Biology Group (UC/ANL), and Rick Stevens (UC/ANL).

GADU/GNARE, the Genome Analysis and Database Update system developed in the Mathematics and Computer Science Division of Argonne National Laboratory, has successfully used TeraGrid resources to perform periodic high-throughput analysis of all publicly available protein sequences with bioinformatics tools (e.g., Blast and Blocks). For example, the NCBI non-redundant protein database currently contains 2.3 million sequences, and analyzing this data with Blast and Blocks requires on the order of 7 million processes.

A typical Blast or Blocks workflow includes several steps: the input file is split into smaller files that are submitted to individual nodes on the TeraGrid, each node executes the bioinformatics tool on its file, and each node then parses its results. After all the nodes finish parsing, the results are concatenated and the final output is sent back to the submit host. All of the workflow on the Grid is managed by the GriPhyN Virtual Data System, using Condor-G and Globus. Figure 1 shows a DAG representing the BLAST workflow that was used to execute jobs on the TeraGrid.

The results of the analysis are then stored in a relational (Oracle) database. The stored data is used to build various bioinformatics applications. PUMA2 is one such application: it contains the analysis of 1031 genomes pre-computed on TeraGrid and other grid resources. The results are used by algorithms for automated annotation of the sequence data and are displayed to the user for further interactive analysis. The results of these analyses are also used by other resources, including Pathos (Microbial informatics Core for NIH Great Lakes Center of Excellence in Biodefense and emerging infections), TarGet (NIH Midwest Structural Biology Center), MetaGenomes (DOE Microbial Genomes program), and others.
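The split, execute, parse, and concatenate steps described above can be illustrated with a small Python sketch. It runs everything locally and in sequence, and it is not the GADU implementation: the function names are hypothetical, and `fake_tool` is an assumed stand-in that only reports sequence lengths, where the real workflow would invoke Blast or Blocks on a grid node.

```python
from typing import Callable, List, Tuple


def split_fasta(text: str, records_per_chunk: int) -> List[str]:
    """Split FASTA text into chunks of at most records_per_chunk records."""
    records: List[List[str]] = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])          # header line starts a new record
        elif records and line.strip():
            records[-1].append(line)        # sequence line joins current record
    chunks = []
    for i in range(0, len(records), records_per_chunk):
        group = records[i:i + records_per_chunk]
        chunks.append("\n".join("\n".join(rec) for rec in group) + "\n")
    return chunks


def fake_tool(chunk: str) -> str:
    """Hypothetical stand-in for Blast/Blocks: emit 'name<TAB>length' rows."""
    rows, name, length = [], None, 0
    for line in chunk.splitlines():
        if line.startswith(">"):
            if name is not None:
                rows.append(f"{name}\t{length}")
            name, length = line[1:].strip(), 0
        else:
            length += len(line.strip())
    if name is not None:
        rows.append(f"{name}\t{length}")
    return "\n".join(rows)


def parse_result(raw: str) -> List[Tuple[str, int]]:
    """Parse one chunk's raw tool output into (name, length) tuples."""
    parsed = []
    for row in raw.splitlines():
        if row:
            name, length = row.rsplit("\t", 1)
            parsed.append((name, int(length)))
    return parsed


def run_workflow(fasta_text: str, records_per_chunk: int,
                 tool: Callable[[str], str] = fake_tool) -> List[Tuple[str, int]]:
    """Split the input, run the tool per chunk, parse, then concatenate."""
    merged: List[Tuple[str, int]] = []
    for chunk in split_fasta(fasta_text, records_per_chunk):
        merged.extend(parse_result(tool(chunk)))  # on a grid, one node per chunk
    return merged
```

In this sketch the loop in `run_workflow` is the part that the grid would run in parallel, one chunk per node, with the final concatenation happening only after every chunk has been parsed.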
Table 1 gives the statistics of a Blast run using TeraGrid and Grid3 resources:
Figure 1: DAG showing a workflow for BLAST used on TeraGrid.
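As a rough illustration of what such a workflow DAG encodes, the sketch below builds a dependency graph for a split/run/parse/concatenate/transfer pipeline and derives a valid execution order with Python's standard `graphlib` module. The job names and the graph shape are assumptions based on the workflow description in the article; this does not reproduce Figure 1 and is not Virtual Data System or Condor-G syntax.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def build_blast_dag(n_chunks: int) -> Dict[str, Set[str]]:
    """Map each (hypothetical) job name to the set of jobs it depends on."""
    dag: Dict[str, Set[str]] = {"split": set()}
    for i in range(n_chunks):
        dag[f"run_{i}"] = {"split"}          # each tool run needs the split input
        dag[f"parse_{i}"] = {f"run_{i}"}     # each parse needs its own run
    # Concatenation waits for every parse job; transfer waits for concatenation.
    dag["concat"] = {f"parse_{i}" for i in range(n_chunks)}
    dag["transfer"] = {"concat"}
    return dag


def execution_order(n_chunks: int) -> List[str]:
    """Return one valid order in which the jobs could be executed."""
    return list(TopologicalSorter(build_blast_dag(n_chunks)).static_order())
```

The `run_i` and `parse_i` jobs for different chunks have no edges between them, which is what allows a scheduler to dispatch them to separate nodes at the same time.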
Figure 2: GADU is the Genome Analysis and Database Update tool of the Mathematics and Computer Science (MCS) Division at Argonne National Laboratory (ANL). GADU is an automated tool that periodically searches different DNA and protein databases for new and newly updated genomes of different organisms.
The TeraGrid project is funded by the National Science Foundation and includes 11 partners. Please email help@teragrid.org with questions or comments.