Living Fossils: Applying Advances in Single Molecule Sequencing to Decode Large and Complex Genomes of Ancient Plant Lineages
A “living fossil”, a term coined by Darwin, is an ancient species that has not “radiated” to produce new species and has remained relatively unchanged since its first appearance in the fossil record. This proposal focuses on “living fossil” gymnosperm species that have survived with little to no change in morphology since their appearance in the Devonian. They predate the dinosaurs and have survived dramatic global changes and 5-6 mass extinctions. We will contrast four pairs of “living fossil” gymnosperms with their closest radiated lineages. These pairwise comparisons will uncover the molecular underpinnings of the “living fossils” and the genomic changes associated with speciation in the radiated lineages. Adding a fifth pair that includes Gnetum will fill a serious gap in our angiosperm-dominated Tree of Life and help settle the hotly debated placement of a gymnosperm lineage in the evolution of seed plants. We will sequence five gymnosperm species (both members of pairs 3 and 4, plus Gnetum) and analyse the 10 genomes as five species pairs: (1) Ginkgo (“living fossil”; sequence available) vs. Cycas (JGI project), (2) Picea (“living fossil”; sequence available) vs. Pinus (sequence available), (3) Metasequoia (“living fossil”) vs. Juniperus (this project), (4) Wollemia (“living fossil”) vs. Agathis (this project), and (5) Welwitschia (JGI project) vs. Gnetum (this project). We will use Single Molecule Real Time (SMRT) sequencing for whole-genome coverage (Aims 1A and 1B) and new algorithms developed at CSHL and JHU for assembly and annotation (Aim 1C). In Aim 2 we will perform comparative phylogenetic analysis of genes and gene families, and in Aim 3 we will identify Transposable Elements (TEs) and their activity. Together, these analyses will identify the genes and processes associated with “living fossils”, especially those conserved across the four species pairs.
We will address three pertinent questions: (1) How has the genome of “living fossils” enabled them to survive global changes while retaining their ancient morphology and their karyotype? (2) What changes, and in which genes or processes, are associated with the radiation of lineages as opposed to “living fossil” genomes? (3) What are the genetic and epigenetic mechanisms by which gymnosperm genomes remain viable despite consisting almost entirely of Transposable Elements? While addressing Darwin’s original observations of “living fossils”, our project will make important innovations in sequencing, assembling and annotating large genomes, a goal of this NSF-PGRP cycle. Using SMRT sequencing to generate high-quality, contiguous assemblies of these large, complex genomes has become possible only recently. This is due to advances in throughput, which increase the accuracy of single molecule sequencing, and to novel assembly algorithms that we have developed. The resulting assemblies will be a major improvement over current NGS draft genomes, which are fragmented and missing important portions of complex genomes.