Trinity genome-guided transcriptome assembly

(a) Phylogenetic tree of the NF-YB. To investigate the individual genome evolution, we compared the Chiifu genome with the inferred B. rapa ancestral genome. Extant cycads comprise 10 genera and approximately 360 species, two-thirds of which are on the International Union for Conservation of Nature Red List of threatened species5. Midline: median; boxes: interquartile range; whiskers: 5th and 95th percentile range. (b) Comparison of the first-layer convolution filters derived from feature map-based approaches and gradient-based TF-MoDISco on Drosophila-specific model. i, All native (S288C reference) promoter sequences (points) projected on the evolvability space learned from random sequences; coloured by their mean pairwise distance in the archetypal evolvability space between all promoter alleles across the 1,011 yeast isolates for that gene (orthologue evolvability dispersion). h, The proximity to the malleable archetype (x axis) and fitness responsivity (y axis) for the 80 genes with measured fitness responsivity.
Predicted (x axis) and experimentally measured (y axis) expression for (a, c) random test sequences (sampled separately from and not overlapping with the training data) and (b, d) native yeast promoter sequences containing random single base mutations. We found similar results in the ovary (Additional file 1: Fig. Previously, subgenome dominance was explained by the two-step theory, which suggests that B. rapa experienced a tetraploidization followed by fractionation and subsequent hybridization with a third genome, which shows less fractionation [16]. The average length and number of core gene CDSs were significantly higher than those of less conserved categories (Fig. Such complexity challenges plant genome assembly, and transcripts were assembled both de novo and genome-guided using Trinity (ref. 69). The resulting high-quality CCSs were mapped onto the reference genome for de-redundancy. We found that the small-genome species with low abundance of total TE-derived piRNAs corresponded to a higher abundance of transposon transcripts (Fig.
The distributions of the R-genes and known quantitative trait loci are shown in Fig. Running Trinity. Nuclear and plastid phylogenomic analyses strongly suggest that cycads and Ginkgo form a clade sister to all other living gymnosperms, in contrast to mitochondrial data, which place cycads alone in this position. The authors declare no competing interests. Full-length transcriptome assembly from RNA-seq data without a reference genome. The Cycas genome contains four homologues of the fitD gene family that were likely acquired via horizontal gene transfer from fungi, and these genes confer herbivore resistance in cycads. The Sanfensan oat, together with the reference genomes of its diploid and tetraploid ancestors presented here, constitutes an important community resource for cereal genomics and provides comprehensive and specific insights into the evolutionary history of oat. After gene predictions, we used InterProScan (version 5.30-69.0) [84] to conduct functional annotation of the 16 gene sets, and information of the annotated domains and gene ontology was extracted from the InterProScan results. Then, we merged syntenic genes of the 18 genomes and removed redundant syntenic genes (Additional file 2: Figure S35). g, Gene structure of the candidate gene A.satnudSFS4D01G000045. We used a 1,000-kb sliding window and a 500-kb step to calculate the values of each statistic.
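The 1,000-kb window with a 500-kb step mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `sliding_windows` and `count_in_windows` are hypothetical, and counting variant positions per window stands in for whatever statistic was actually computed.

```python
from bisect import bisect_left


def sliding_windows(chrom_length, window=1_000_000, step=500_000):
    """Yield half-open (start, end) windows along one chromosome.

    Defaults mirror the 1,000-kb window and 500-kb step quoted in the text;
    the last window is truncated at the chromosome end.
    """
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        yield start, end
        if end == chrom_length:
            break
        start += step


def count_in_windows(positions, chrom_length, window=1_000_000, step=500_000):
    """Count how many 0-based positions fall inside each sliding window."""
    ordered = sorted(positions)
    counts = []
    for start, end in sliding_windows(chrom_length, window, step):
        # bisect gives the index range of positions with start <= p < end.
        counts.append(bisect_left(ordered, end) - bisect_left(ordered, start))
    return counts
```

Setting `step` equal to `window` gives non-overlapping windows, so the same helper covers both layouts.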
Abundance of sense piRNAs corresponding to A. rhodopa retrotransposon transcripts (RPM normalization). ASTRAL uses the quartet trees of the maximum likelihood phylogenies of each gene to produce the topology of the species tree while quartet supports (bar charts) show the percentage of quartets that agree with a specific branch in the species tree. The remaining small RNAs were aligned with TEs to identify TE-derived piRNAs (see the Methods section). Genes of the LAFL family are well-known as core regulatory genes of seed development, including LEAFY COTYLEDON1 (LEC1), ABSCISIC ACID INSENSITIVE3 (ABI3), LEAFY COTYLEDON2 (LEC2) and FUSCA3 (FUS3), which encode master transcriptional regulators, interacting to form complexes that control embryo development and maturation38. f, Two-dimensional hierarchical cluster analysis of gene expression among single-copy homoeologous oat genes compared with organ-specific gene expression. Gene density was calculated based on an ancestral karyotype of Brassiceae (AKBr). In the soybean pan-genome, 5.75–14.09% of each genome sequence was absent in the reference genome [27]. A total of 649.68 Gb, 404.97 Gb, and 204.68 Gb of raw sequencing reads were generated for Sanfensan, A. insularis and A. longiglumis, respectively (Supplementary Table 1). The sample extracts were then analysed using the ultra high-performance liquid chromatography system Vanquish (ThermoFisher Scientific) and Q Exactive HF-X (ThermoFisher Scientific).
Experimentally measured (y axis) and transformer model predicted (x axis) expression level (o–r) or expression change from the starting sequence (k–n) in complex (k, m, o, q) or defined (l, n, p, r) medium using sequences from the random genetic drift (Fig. a, Sequence similarities of reads from different Avena species that were uniquely mapped to the A, C and D subgenomes of Sanfensan. d Total abundance of TE transcripts in testis. constructed the hexaploid ancestor of the tribe Brassiceae [44]. The positive correlation between TEs and TE-derived piRNAs found in Drosophila and L. migratoria is considered a balanced relationship for the host to counteract the damage suffered by TE invasion under normal conditions. b, Comparison of TE densities near genes in the three subgenomes of hexaploid oat. b, The 14 chromosomes of the tetraploid A. insularis. SSWM trajectories for (a) DBP7, a malleable promoter, and (b) UTH1, a robust promoter. College of Life Sciences, Shaanxi Normal University, Xi'an, China: Xuanzeng Liu, Muhammad Majid, Lina Zhao, Yimeng Nie, Lang He, Xiaojing Liu, Xiaoting He & Yuan Huang; School of Basic Medical Sciences, Xi'an Medical University, Xi'an, China; College of Life Science and Engineering, Henan University of Urban Construction, Pingdingshan, China. Brassicas evolved from the tPCK ancestor genomes before WGT [44, 54], with Brassica nigra emerging at about 6.5 MYA (million years ago), followed by the emergence of B. rapa and B. oleracea at about 4.5 MYA [55]. XC performed the experiments.
We found that an average of 10.06% and 3.47% of two copies were least and more FSGs (Additional file 2: Figure S21), and an average of 7.77%, 3.35%, and 1.86% of three copies were least, more, and most FSGs, respectively (Fig. c, DiscoVista species tree analysis: rows correspond to the nine hypothetical groups tested (see Supplementary Note 5 for details) and columns correspond to the results derived from the use of different datasets and methods. The diploids have either the AA or CC genomes; the tetraploids mainly contain the AABB or CCDD4 (previously AACC) genomes; all hexaploid species, including common oat, share the same AACCDD genomic constitutions5. Neuronal cells (C12, n=29 and C5, n=169 for Nvwa and sci-ATAC data respectively) and endothelial cells (C50, n=31 and C22, n=136 for Nvwa and sci-ATAC data respectively) were shown. (c) The phylogenetic tree of CSLE and CSLG genes. The result shows that the abundance of piRNAs corresponding to TEs was lower in the large-genome grasshopper (Fig. Polyploidization plays a positive role in increasing the richness of the plant kingdom by supporting plant speciation through frequent and recurrent polyploidization and re-diploidization events [1,2,3,4,5]. Furthermore, the freshly collected samples were used to estimate the genome size using flow cytometry (FCM) of propidium iodide-stained nuclei following the standard protocol [107, 108]. We also performed additional BLAST searches against the OneKP database and many other available genomes.
Consequently, modern sugarcane cultivars are interspecific hybrids with approximately 80% chromosomes from S. officinarum, 10–15% chromosomes from S. spontaneum, and 5–10% recombinant chromosomes5. In total, there were 30,166 genes in the inferred B. rapa ancestral genome, of which 13,116, 9182, and 7868 genes were in the LF, MF1, and MF2 subgenomes, respectively. Green boxes are exons, and lines between the boxes are introns. The dominance of the LF subgenome during intraspecific diversification in B. rapa. The consensus transposable element (TE) sequences generated above were imported to RepeatMasker (version 4.05)61 to identify and cluster repetitive elements. The blue bar refers to the number of homologous TF pairs between human and other species, the yellow bar refers to the number of human neuron-related TFs involved in homologous gene pairs, and the grey bar refers to the number of other species neuron-related TFs involved in homologous gene pairs. The tree was constructed using RAxML (the maximum-likelihood method) with PROTCATGTR amino acid substitution model and 500 bootstrap replicates. 1,000 promoter sequences represented by their evolvability vectors projected onto the 2D archetypal evolvability space and coloured by their associated fitness as reflected by their predicted growth rate relative to wild type (colour, Methods), estimated by first mapping sequences to expression with our model and then expression to fitness as measured and estimated previously11.
To accomplish quantitative analysis, different concentrations of standard were utilized. Phylogenetic analyses suggest that the fitD genes might have been acquired from fungi and then expanded before the divergence of C. panzhihuaensis and C. debaoensis (Fig. a, The spikelets and kernels of hulled and hulless oats. 1, The basic chromosome number reduction from 10 to 8 in S. spontaneum as described in the text. The collinearity between species was identified and plotted using MCScanX (python version) (Fig. The pie chart depicts the fraction of genome-wide repetitive elements. Genome-guided Trinity De novo Transcriptome Assembly. To understand whether the identified R-genes were correlated with the map positions of the known quantitative trait loci for crown rust, one of the most serious diseases of oats, DNA markers co-segregating with or flanking the known crown rust genes (Supplementary Table 22) were mapped to the hexaploid Sanfensan reference genome by BLASTn analyses. The high confidence 4,476,608 variant set was used for statistical estimations. At present, we do not have enough evidence to prove this conjecture, and we need more samples from extreme living environments. Chromosomal assembly was constructed based on proximity-guided assembly using our newly developed program, ALLHIC, which is designed for polyploid genome scaffolding (see Supplementary Note for details). The phylogenetic trees were generated using RAxML with PROTCATGTR model and 500 bootstrap replicates.
We also thank the core facility platform of Zhejiang University School of Medicine and the Center of Cryo-Electron Microscopy at Zhejiang University for computational resources, and the core facilities of Zhejiang University Medical Center and the Liangzhu Laboratory for technical support. Comparative analysis of repeat sequences in two species. Intact LTR-RTs were predicted using LTR_Finder (version 1.07) [87] with the parameters -D 15000 -d 1000 -L 7000 -l 100 -p 20 -C -M 0.9 and then further filtered and classified into Copia-like and Gypsy-like LTR-RT by LTR_retriever (version 1.9) [88]. The top 10 TEs with the highest number of copies are shown in Fig. As in other gymnosperm genomes, a large portion (76.14%) of the C. panzhihuaensis genome consists of ancient repetitive elements (Supplementary Note 4). Cultivated ACD-genome hexaploid oat originated around 0.5 mya from the hybridization between a paternal Al/As-genome diploid ancestor and a maternal CD-genome tetraploid that is closely related to A. insularis and originated from an allotetraploidy event between a paternal C-genome and a maternal D-genome diploid (Fig.
State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, China, Yuanying Peng, Honghai Yan, Pingping Zhou, Qiantao Jiang, Yan Li, Jirui Wang, Jian Ma, Ming Hao, Wei Li, Houyang Kang, Dengcai Liu, Youliang Zheng & Yuming Wei, National Oat Improvement Center, Baicheng Academy of Agricultural Sciences, Baicheng, China, Yuanying Peng, Laichun Guo, Chunlong Wang, Liming Wei & Changzhong Ren, Triticeae Research Institute, Sichuan Agricultural University, Chengdu, China, Yuanying Peng, Honghai Yan, Pingping Zhou, Kaiquan Yu, Xiaolong Dong, Xiaomeng Liu, Yun Peng, Jun Zhao, Di Deng, Yinghong Xu, Ying Li, Qiantao Jiang, Jirui Wang, Jian Ma, Ming Hao, Wei Li, Houyang Kang, Dengcai Liu, Youliang Zheng & Yuming Wei, China Oat and Buckwheat Research Center, Baicheng, China, Laichun Guo, Chunlong Wang, Liming Wei & Changzhong Ren, The Key Laboratory of Animal Disease and Human Health of Sichuan Province, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, China, Departments of Bioinformatics, DNA Stories Bioinformatics Center, Chengdu, China, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, China, State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China, University of Chinese Academy of Sciences, Beijing, China, Panxi Crops Research and Utilization Key Laboratory of Sichuan Province, Xichang University, Xichang, China, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China, CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
2a) revealed that the divergence between Aveneae and Triticeae took place after the speciation of Oryzoideae, with the approximate times for the two events being 28.2 and 47.9 million years ago (mya), respectively. STAG (https://github.com/davidemms/STAG) was also used to construct the species tree with default settings using low-copy genes (one to four copies). DISTRUCT95 was used to plot the population stratification results for K=1 through K=20 (Supplementary Fig. The gene flexibility is biased to the more fractionated subgenomes (MFs), in contrast to the more intact gene content of the dominant LF (least fractionated) subgenome. The DNA-grade samples were added to 95% ethanol and stored in a −20 °C freezer. Wenjing She was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The subgenome assignments were further validated by the quantified depth of coverage of the paired-end reads from A. longiglumis and A. insularis (Extended Data Fig. A species-specific de novo repeat library was constructed using MITE-Hunter54, LTR_FINDER (v1.0.5)55, and RepeatModeler (v2.0.1) (https://github.com/Dfam-consortium/RepeatModeler). One stage involves the process of the common ancestor of radish and the three Brassica species evolving to the common ancestor of B. rapa and B. oleracea. List of K2P divergence and abundance of 41 shared-TEs.
1c), which in some cases has been reported to bias phylogenetic inferences25,26, and instead may be best explained by incomplete lineage sorting, which is supported by our PhyloNet27 and coalescent analyses of nuclear genes (Supplementary Note 5). Conserved and Flexible represent conserved and flexible syntenic gene in the homoeologous pair. How all these gene families related to wood features are regulated in cycads relative to other gymnosperms will be important for understanding the differences in wood density. SNP density is also higher in rearranged regions (360.27 ± 48.41) than in non-rearranged regions (297.46 ± 12.65, P=0.001798). Recently, SVs have been reported to regulate gene expression and influence important traits such as flavor, fruit size, and flowering time [28, 32]. Hi-C raw reads were aligned to the reference-guided genome assembly of the scrambled haplotype using BWA (Li and Durbin, 2009). De novo transcriptome assembly was performed using Trinity v2.8.5 (Grabherr et al., 2011; Full-length transcriptome assembly from RNA-Seq data without a reference genome). 2.1 Concatenate the Trinity.fasta and Trinity.GG.fasta files into a single transcripts.fasta file. Defining alleles in an autopolyploid genome clarifies gene or gene family analysis, as demonstrated in P450 and other gene families. Scale bars, 1 cm. The multiple consensus sequences were aligned using MUSCLE91. Unknown TEs were further classified using TEclass (version 2.1.3)62. Further information on research design is available in the Nature Research Reporting Summary linked to this article. SVs were associated with the domestication of different B. rapa morphotypes.
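Step 2.1 above (merging Trinity.fasta and Trinity.GG.fasta into transcripts.fasta) could be done along the following lines. This is a minimal sketch and not part of the original workflow: the helper names `concatenate_fasta` and `count_records` are hypothetical, and it assumes both inputs are plain, uncompressed FASTA files whose record identifiers do not collide.

```python
from pathlib import Path


def concatenate_fasta(inputs, output):
    """Write every input FASTA file, in order, into one output file.

    A newline is added after any input that lacks a trailing one, so the
    last sequence line of one file never fuses with the next header.
    No de-duplication or identifier checking is attempted.
    """
    with open(output, "wb") as out:
        for path in inputs:
            data = Path(path).read_bytes()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return output


def count_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.startswith(b">"))
```

A call such as `concatenate_fasta(["Trinity.fasta", "Trinity.GG.fasta"], "transcripts.fasta")` produces the merged file; comparing `count_records` on the inputs and on the output is a quick check that no record was lost.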
Preliminary TF candidate genes were collected for each species (<1 × 10−5) by searching the Hidden Markov Model profile. Oligo-5SrDNA (red) and Oligo-6C343 (green) gave clear hybridization signals on the short and long arms of the tetraploid 3C, respectively, whereas both the Oligo-5SrDNA and Oligo-6C343 signals are observed on the long arm of the hexaploid 3C, suggesting the occurrence of an intrachromosomal rearrangement. We annotated the transcripts of LINE and Penelope according to their characteristic RT domains [117]. Second, using barley as the outgroup, we estimated the nonsynonymous to synonymous substitution rate ratios (Ka/Ks) of the 7,353 single-copy orthologs identified between the oat (sub)genomes. The lowest chromosome number recorded for natural Saccharum accession is a 2n=5x=40 S. spontaneum that no longer exists; however, exactly one haploid (1n=4x=32) S. spontaneum, AP85-441, generated from a culture of octoploid SES2086, provides a foundation for assembly of a prototypical version of the sugarcane chromosome set. In addition, the RNA-seq reads were mapped to the AP85-441 genome using HiSAT2 (ref. 75) version 2.10 and reassembled using StringTie (ref. 76) version 1.3.4, which is a reference-based RNA assembler. The other stage involves the process of intra-specific diversification since its divergence from the common ancestor of B. rapa and B. oleracea. d Correlation between SNP densities detected by resequencing data of the B. rapa germplasm (x-axis) and comparison of de novo assemblies (y-axis). Genes in the first round of annotation were kept if their structures did not improve significantly in the second round.
We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Fisher's exact test was used to examine whether the functional categories were over-represented. We annotated 1,256 tandemly duplicated genes and 3,375 dispersedly duplicated paralogs (Table 1). Materials for the RNA-Seq workshop on Trinity and Tuxedo, covering de novo and genome-guided transcript assembly and downstream analysis. Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. We also acknowledge T. Wan (Fairy Lake Botanical Garden) and D. Stevenson (New York Botanical Garden), who kindly commented on an earlier draft of the manuscript, and T. Takaso (University of the Ryukyus), who provided the video for swimming sperm of Cycas. To distinguish the subgenomes accurately and clarify the polyploidization history of the hexaploid oat, we sequenced and assembled its most likely ancestral species A. longiglumis (2n=2x=14, AlAl genome) and A. insularis (2n=4x=28, CCDD genome)5, resulting in >60× genome coverage for A. longiglumis (218.67 Gb) and A. insularis (374.77 Gb). The repeat profiles of the remaining 31 shared TEs are shown in Fig. We thank M. Chern, Department of Plant Pathology and the Genome Center, University of California, Davis, for improving the writing of this article. The * denotes the C. panzhihuaensis specific TPS genes. The genome sequence of C. elegans (GCA_000002985.3) was downloaded from WormBase database 50. The SV was previously reported to only occur in oil-type B. rapa and contributed to variation in flowering time [46].
Assembling the male-specific region of the Y chromosome (MSY) based on Nanopore long-read and Hi-C data resulted in 45.5 Mb of sequence distributed over 43 scaffolds, most of which aligned to the sex-differentiation region on chromosome 8 (Fig. The average gene density in the individual genome was significantly lower than that of the inferred B. rapa ancestral genome (P = 0, Fig. We consider that the TEs have recently been actively transposed in A. rhodopa, while more ancient proliferation events occurred in L. migratoria. In general, differential expansion, accumulation, and removal of TE sequences are major determinants of genome size variation [15,16,17]. RNA sequencing libraries were generated using NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, Ipswich, USA). The high-quality genome sequence for Cycas, the last major lineage of seed plants for which a high-quality genome assembly was lacking, closes an important gap in our understanding of genome structure and evolution in seed plants. Transcriptomes and PCR amplification from genomic DNA indicated that these genes occur in many Cycas species (Supplementary Note 16).
A maximum likelihood tree with 500 bootstrap replicates was constructed using RAxML. Consistent with the C-value paradox [76], studies in the orders Strepsiptera, Hymenoptera, and Dictyoptera found that genome size was not phylogenetically related to the inherent traits of these insects [77,78,79]. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. a, The number of markers from each linkage group of the hexaploid consensus map that are uniquely mapped to the individual chromosomes of the hexaploid Sanfensan reference genome. Divergence times were estimated based on independent rates and the Jukes-Cantor 1969 (JC69) model using the MCMCTree program in the PAML (v4.7) package. However, the precise evolutionary position of this WGD event remains ambiguous. The other three candidate genes were also analyzed using the same methods (Additional file 1: Supplementary note). In addition to B. rapa, we constructed an ancestral genome for four Brassiceae species by merging the genes of a reference genome for each of them.
In the tree of life, species with gigantic genomes (larger than 10 Gb) only account for a tiny fraction, including lungfishes [4], salamanders [5, 6], deep-sea crustaceans [7, 8], and orthoptera insects [9, 10]. 4, The ancestral chromosome SbChr07 (A5) fused into SbChr04 (A4) after an allopolyploidization event in Miscanthus27. The final assembly contained 436 corrected contigs with N50 of 75.27 Mb and a maximum length of 313.87 Mb. Oat has lagged behind in this regard, primarily due to the large genome10 that contains highly repetitive DNA sequences, and the fact that both the A- and D- subgenomes are similar to the A-genome diploid and were difficult to distinguish from one another in previous studies11. We used RaGOO (v1.1)47 with the default parameters to anchor the contigs of A. longiglumis to seven pseudochromosomes with the previously published As genome of the diploid species A. atlantica17 as the reference. We found that the expansion of TEs in the large-genome grasshopper species was more rapidly manifested by the accumulation of more repeat copies. The length of piRNAs is about 23–30 nt [59], so we speculate that there may be more piRNAs in L. migratoria. The raw pair-end reads of 64 S. spontaneum accessions were trimmed to remove the adaptors and low-quality bases using Trimmomatic82 after quality control by FastQC83. This study was supported by the Scientific Foundation of Urban Management Bureau of Shenzhen (No.
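The contig N50 quoted above is the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly length. A minimal Python sketch of that standard definition follows; the function name `n50` is illustrative, and the example values in the tests are made up rather than taken from any assembly described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and accumulated; the N50 is
    the length at which the running sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")
```

In practice the lengths would come from the assembly FASTA, for example by summing sequence line lengths per record.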
For gene prediction, we used a strategy that combined ab initio prediction, homology-based prediction and RNA-seq reads. To further investigate the evolutionary characteristics of these FSGs, we calculated SNPs in the FSGs and CSGs, and the results revealed that the average ratio of nonsynonymous to synonymous SNPs in FSGs was significantly higher than that in CSGs (P < 2.2e-16) (Fig. 2c, event 3), presumably occurred after the two rounds of WGD. However, it is surprising that the proportion of LINEs in the small-genome grasshopper (21.72%) is higher than that in the large-genome grasshopper (16.87%). Genetic differentiation (FST) and nucleotide diversity (π) were calculated within non-overlapping 100-kb windows using VCFtools (ref. 105) (v.0.1.13). The variant sites were annotated as SNPs and indels, and as intergenic or genic (including synonymous, nonsynonymous, intronic, upstream and downstream variants). We sampled three biological replicates for each tissue sample. For example, we identified three large inversions (size > 1 Mb); however, we were unable to further investigate these in 524 genomes, as such large SVs could not be accurately genotyped by short reads. The inner ring 5 indicates the miRNA location over the genome. However, the Piwi-interacting RNA (piRNA) pathway defends animal genomes against the harmful consequences of TE invasion by imposing small-RNA-mediated silencing. The four haplotypes (A, B, C and D) were split into four sub-genomes, each containing eight pseudo-molecules.
Further information on research design is available in the Nature Research Reporting Summary linked to this article. For BrPIN3.3, there was a 279-bp deletion that occurred in 300 of 329 heading accessions, while appearing in only two non-heading accessions (Fig. By comparing transcriptome data between 10 hulled and 12 hulless oats, we found that A.satnudSFS4D01G000045 is differentially expressed, with hulless oats having higher expression levels (P < 0.01, Student's t test) (Fig. i, Expression level of A.satnudSFS4D01G000045 in RNA samples equally mixed from seven tissues/conditions of 10 hulled and 12 hulless oat lines. Additionally, the TE transcript is degraded through a secondary piRNA pathway to form sense piRNAs, which in turn produce more antisense piRNAs through the exact targeting and cleavage of antisense piRNA precursors; this process is known as the Ping-Pong cycle [30, 33, 48, 49]. All branches are maximally supported by bootstrap values (ML) and posterior probabilities (ASTRAL). In addition, we discovered that the TE transcriptional expression analysis for the ovary was consistent with the testis (Additional file 1: Fig. In the S. spontaneum genome, we identified 123 sugar transporters from 9 subfamilies, including 4 in the TST family, 4 in the vacuolar glucose transporter (VGT) family, 3 in the plastidic glucose transporter (pGlcT) family, 4 in the inositol transporters (INT) family, 31 in the polyol transporter (PLT) family, 14 in the early response to dehydration 6-like (SFP) family, 6 in the SUT family, 22 in the SWEET family, and 35 in the sugar transporters family or hexose transporter family (STP) (Supplementary Table 18).
Following WGD, additional chromosomal rearrangements in these translocated regions may have further suppressed recombination (Fig. Blue refers to a mean AUROC greater than 0.9. Top right: Spearman's ρ and associated two-tailed P values. Plant illustrations were drawn by S. Li, Z. Li, D. Cui and X. Zeng. Introgressed S. spontaneum chromosomes in modern sugarcanes are randomly distributed in the AP85-441 genome, indicating random recombination among homologs in different S. spontaneum accessions. Cycads are long-lived woody plants that, unlike other extant gymnosperms, bear frond-like leaves clustered at the tip of the stem (ref. 4). We further confirmed that the relevant assembled regions were free of bacterial contamination. From the perspective of host adaptation, the low-level piRNA silencing in the gigantic-genome grasshopper species appears to be disadvantageous.
Genome-guided and de novo transcriptome assemblies were generated with Trinity v2.2.0 (ref. The x-axis represents the loci of the consensus sequence, and the y-axis is the depth of coverage for each position. This is indirect evidence that S. spontaneum is autopolyploid, and it reinforces the importance of allele-specific annotation for mining effective alleles of resistance genes in hybrid cultivars. We found that TEs have higher transcriptional activity in the testis, and this difference in TE activity between tissues is consistent in L. migratoria and A. rhodopa. d, e, Difference in predicted expression (y axis) at each evolutionary time step (x axis) under selection to maximize (red) or minimize (blue) the difference between expression in defined and complex medium, starting with either native sequences (d, as Fig. We found that 43.59-53.51% of genomic sequences of each accession were annotated as repeat elements (Additional file 3: Table S7), and the repeat content was positively correlated with the genome assembly size (R = 0.99, P = 3.8e-16) (Additional file 2: Figure S3). a, Comparison of the number of predicted R-genes identified in the genomes of hexaploid oat and its putative progenitors. 8 Benchmark of prediction performance.