Specific Aims for Medicinal Plant Genomics Resource

High-throughput sequencing of genomes and transcriptomes has accelerated the pace of research across the life sciences. In plants, applying these approaches to model organisms and major agricultural crops (e.g., Arabidopsis, rice, sorghum, maize and poplar) has provided tremendous insight into plant metabolic processes. However, while primary and intermediary metabolism is conserved across the plant kingdom, the specialized secondary metabolic pathways leading to medicinal compounds are not. Indeed, a given medicinal compound is often produced by only a handful of plant genera or species. As a result, efforts to understand and manipulate these taxonomically restricted pathways, many of which produce compounds of pharmaceutical importance, have not benefited to the same extent from the genomics revolution. The proposed research will address this gap in our species-specific knowledge of plant metabolism by sequencing the transcriptomes, profiling their expression, and characterizing the associated metabolomes of 14 key medicinal plant species. Correlating gene expression with the production of specific pharmaceutically relevant metabolites will then allow genome-enabled identification of candidate pathway genes in these organisms. The resulting datasets will provide an unparalleled resource for the research community working at the interface of plant metabolism and human health.

List of Objectives for Medicinal Plant Genomics Resource

  • Obtain well-characterized and reproducible samples of plant material for the 14 taxonomically diverse medicinal plant species. For each species, up to 20 tissue samples anticipated to vary significantly in their concentrations of medicinal compounds will be selected: 10 core tissue samples and up to 10 additional samples chosen by the relevant species experts. RNA and metabolites will be extracted from each sample, and aliquots will be provided to the metabolomics and transcriptomics units for analysis.

  • Perform quality control on samples using LC-MS to assess the levels of 5-10 known, well-characterized medicinal compounds in each plant species. These validated, chemically diverse samples will then be used for whole transcriptome sequencing, gene expression profiling and quantitative metabolite profiling analyses.

  • Obtain 600-800 Mb of transcriptome sequence per species using next-generation sequencing of a normalized library made from pooled mRNA from 5 diverse tissues (e.g., roots, stems, flowers, young leaf buds and callus tissue). Assemble these reads into a virtual transcriptome for each species. Annotate the assembled transcriptome for putative gene function using bioinformatic approaches including sequence similarity, motif/domain searches, and subcellular localization predictions.

  • Employ Illumina RNA-Seq whole transcriptome sequencing to generate deep expression profiles in up to 20 chemically diverse tissues for each plant species. Map the expression data to the assembled virtual transcriptome of each species. Use these data to further improve the transcriptome assembly.

  • Perform quantitative metabolite profiling of the samples to determine the relative levels of known medicinal compounds and potential metabolic intermediates in each species.

  • Deposit all datasets into the relevant publicly accessible databases and make raw and processed datasets available through the project website. This site will provide a user-friendly data interface and will incorporate custom tools for the community to access, download, and compare the sequences, annotations, transcript expression and metabolite datasets. The data and website will provide a key, previously inaccessible link between the genome and metabolome of medicinal plants, and will give the community novel information about the genes and markers of medicinal compound synthesis.
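
As an illustration only, the correlation-based identification of candidate pathway genes described in the aims could be sketched as follows. The sketch assumes Pearson correlation across tissue samples and a cutoff of r >= 0.8; the proposal specifies neither. The function names, transcript names and all numerical values are hypothetical and are not project data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (one value per tissue)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length and at least 2 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        # A constant profile carries no co-variation signal; treat as uncorrelated.
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def rank_candidates(expression, metabolite, min_r=0.8):
    """Return (transcript, r) pairs with r >= min_r, highest correlation first.

    expression: dict mapping transcript name -> expression level per tissue
    metabolite: relative level of one compound in the same tissues, same order
    min_r:      assumed cutoff, chosen here only for illustration
    """
    scored = [(name, pearson(levels, metabolite)) for name, levels in expression.items()]
    hits = [(name, r) for name, r in scored if r >= min_r]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical values for five tissues of one species.
metabolite = [0.1, 0.5, 2.0, 4.0, 8.0]
expression = {
    "transcript_A": [1.0, 5.0, 20.0, 40.0, 80.0],    # rises with the compound
    "transcript_B": [50.0, 50.0, 50.0, 50.0, 50.0],  # constant across tissues
    "transcript_C": [80.0, 40.0, 20.0, 5.0, 1.0],    # falls as the compound rises
}

candidates = rank_candidates(expression, metabolite)
for name, r in candidates:
    print(f"{name}\tr = {r:.3f}")
```

With up to 20 tissues per species, a rank-based measure such as Spearman correlation and a correction for multiple testing across thousands of transcripts would be reasonable alternatives. This sketch leaves both out for brevity.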