|
Overview
Note: The following text comes from phytozome.org:
Genome Size / Loci
This version of the assembly (v. 0.9) is 296 Mb spread over 1,128 scaffolds with 2.3% gaps at 6.5x coverage. Half the genome is accounted for by 27 scaffolds 3.3 Mb or longer. The current gene set (clementine0.9) integrates 800k ESTs with homology and ab initio-based gene predictions (by GenomeScan, Fgenesh). 25,385 protein-coding loci have been predicted. Each encodes a primary transcript. There are an additional 10,591 alternative transcripts encoded on the genome generating a total of 35,976 transcripts. 16,808 primary transcripts have EST support over at least 50% of their length. A third of the primary transcripts (12,805) have EST support over 100% of their length.
Sequencing Method
Genomic sequence was generated by the IGCG, Genoscope, IGA and JGI using a whole genome shotgun approach using Sanger technology sequencing 2-3kb, 6-12kb insert libraries as well as a 39kb fosmid end library totaling 6x coverage.
Assembly Method
The genome was assembled with Arachne by Jeremy Schmutz at HudsonAlpha. Over 98% of the genome is in scaffolds over 50kb long.
Identification of Repeats
A repeat library had previously been generated from the sweet orange genome sequence. This library was used to mask 38% of the genome with RepeatMasker.
EST Alignments
EST sequences were collected from the following sources: 210,567 C. sinensis ESTs from GenBank; 118,365 C. clementina ESTs from GenBank; 401,708 ESTs from Life Technologies; 58,656 non-redundant EST assemblies built from sweet orange 454 EST sequences by Mohammed Mohiuddin. These 789,296 sequences were aligned and assembled into 72,320 assemblies on the haploid clementine genome using Brian Haas's PASA pipeline, which aligns ESTs to the best place in the genome via gmap, then filters hits to ensure proper splice boundaries.
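As a quick consistency check, the per-source EST counts quoted above can be summed and compared with the stated total of 789,296. The dictionary labels below are shorthand chosen for this sketch, not names used by phytozome.org.

```python
# EST counts per source, copied from the paragraph above.
# The labels are illustrative shorthand, not official dataset names.
est_sources = {
    "C. sinensis ESTs (GenBank)": 210_567,
    "C. clementina ESTs (GenBank)": 118_365,
    "Life Technologies ESTs": 401_708,
    "Sweet orange 454 EST assemblies": 58_656,
}

total = sum(est_sources.values())

# Report each source with its share of the combined input set.
for name, count in est_sources.items():
    print(f"{name}: {count:,} ({count / total:.1%})")
print(f"Total: {total:,}")
```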
Assembly metrics
Assembly size | 296 Mb
Number of scaffolds | 1,128
N50 | 3,278,304 bp
Predicted transcripts | 35,976
Annotated genes |
Assembly BUSCO score (embryophyta_odb10) | 98.5%
Annotation BUSCO score (embryophyta_odb10) | 94.1%
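For readers unfamiliar with the N50 metric: it is the scaffold length at which scaffolds of that length or longer together contain at least half of the assembled bases. The number of scaffolds needed to reach that point is often called L50, which corresponds to the "27 scaffolds 3.3 Mb or longer" statement in the Overview. The following is a minimal sketch of the calculation; the function name `n50_l50` and the toy lengths are made up for illustration and are not taken from the assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no scaffold lengths given")


# Hypothetical toy lengths, not real clementine scaffolds.
toy_lengths = [50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50} bp, L50 = {l50} scaffolds")
```

Applied to the real scaffold lengths from the assembly FASTA, the same function would be expected to reproduce the values reported above, assuming the site used this standard definition.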
Downloads
All assembly and annotation files are available for download by selecting the desired data type in the right-hand "Resources" sidebar. Each data type page provides a description of the available files and links to download them. Alternatively, you can browse all available files on the CGD data repository.
Assembly
The following text comes from phytozome.org:
Genomic sequence was generated by the IGCG, Genoscope, IGA and JGI using a whole genome shotgun approach using Sanger technology sequencing 2-3kb, 6-12kb insert libraries as well as a 39kb fosmid end library totaling 6x coverage. The genome was assembled with Arachne by Jeremy Schmutz at HudsonAlpha. Over 98% of the genome is in scaffolds over 50kb long.
Please note: if you download and use the JGI whole genome assembly and annotation, please abide by the requirements for this data as specified on phytozome.org's Citrus clementina download page.
Downloads
Gene Predictions
The following text comes from phytozome.org:
The current gene set (clementine0.9) integrates 800k ESTs with homology and ab initio-based gene predictions (by GenomeScan, Fgenesh). 25,385 protein-coding loci have been predicted. Each encodes a primary transcript. There are an additional 10,591 alternative transcripts encoded on the genome generating a total of 35,976 transcripts. 16,808 primary transcripts have EST support over at least 50% of their length. A third of the primary transcripts (12,805) have EST support over 100% of their length.
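The transcript counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the paragraph; the variable names are chosen for this example. It also shows that 12,805 is about a third of all 35,976 transcripts, whereas relative to the 25,385 primary transcripts it is closer to half.

```python
# Counts copied from the gene prediction summary above.
primary = 25_385          # protein-coding loci, one primary transcript each
alternative = 10_591      # additional alternative transcripts
half_supported = 16_808   # primary transcripts with EST support over >= 50% of length
fully_supported = 12_805  # primary transcripts with EST support over 100% of length

total = primary + alternative
print(f"Total transcripts: {total:,}")

# Shares of primary transcripts with partial and full EST support.
print(f">=50% EST support: {half_supported / primary:.1%} of primary transcripts")
print(f"100% EST support:  {fully_supported / primary:.1%} of primary transcripts")

# The same full-support count expressed against all transcripts.
print(f"100% EST support:  {fully_supported / total:.1%} of all transcripts")
```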
Please note: if you download and use the JGI whole genome assembly and annotation, please abide by the requirements for this data as specified on phytozome.org's Citrus clementina download page.
Downloads
Protein Homology
The protein homology analysis provided here was performed by the Main Bioinformatics Lab at WSU. Proteins from the C. clementina v1.0 assembly were compared against proteins from other genomes and databases using blastp with an e-value cutoff of 1e-6. Only the 10 best matches per protein were kept. The available files are in Excel 2007 format.
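The filtering described above (e-value cutoff of 1e-6, best 10 matches kept) can be sketched as follows. This is not the lab's actual pipeline, which is not described here. It assumes BLAST tabular output with the standard 12 columns, where the e-value is column 11 and the bit score is column 12, treats the cutoff as inclusive, and ranks hits by e-value with bit score as a tie-breaker. The function name `top_hits` and the demo rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def top_hits(tabular, max_evalue=1e-6, keep=10):
    """Keep the `keep` best hits per query with e-value <= `max_evalue`.

    `tabular` is an iterable of lines in 12-column BLAST tabular format.
    """
    hits = defaultdict(list)
    for row in csv.reader(tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue <= max_evalue:
            hits[query].append((subject, evalue, bitscore))
    # Rank by lowest e-value, then highest bit score, and truncate.
    return {
        query: sorted(found, key=lambda h: (h[1], -h[2]))[:keep]
        for query, found in hits.items()
    }


# Made-up rows for illustration only.
demo = "\n".join([
    "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "q1\ts2\t40.0\t100\t60\t0\t1\t100\t1\t100\t1e-3\t40",
    "q1\ts3\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-60\t230",
])
result = top_hits(io.StringIO(demo))
for subject, evalue, bitscore in result["q1"]:
    print(subject, evalue, bitscore)
```

In the demo, the hit to `s2` is dropped because its e-value exceeds the cutoff, and the remaining hits are ordered best first.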
Downloads
Repeats
The following text comes from phytozome.org:
A repeat library had previously been generated from the sweet orange genome sequence. This library was used to mask 38% of the genome with RepeatMasker.
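A figure such as "38% of the genome" can be checked against a masked sequence file. The sketch below assumes a soft-masked FASTA, in which repeats appear as lowercase bases; if the downloaded file is hard-masked with N instead, repeats cannot be distinguished from assembly gaps this way. The function name `masked_fraction` and the toy sequence are hypothetical.

```python
import io


def masked_fraction(fasta):
    """Fraction of bases that are soft-masked (lowercase) in a FASTA stream."""
    masked = total = 0
    for line in fasta:
        if line.startswith(">"):
            continue  # header line, not sequence
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


# Toy record for illustration: 6 of 20 bases are lowercase.
demo = io.StringIO(">toy_scaffold\nACGTacgtAC\nggNNACGTAC\n")
fraction = masked_fraction(demo)
print(f"Soft-masked fraction: {fraction:.1%}")
```

For a real genome, the same function can be given an open file handle instead of the in-memory demo.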
Please note: if you download and use the JGI whole genome assembly and annotation, please abide by the requirements for this data as specified on phytozome.org's Citrus clementina download page.
Downloads
|