Citrus clementina genome v0.9 (JGI)

Overview
Analysis Name: Citrus clementina genome v0.9 (JGI)
Method: Performed by JGI (v0.9)
Source: JGI Citrus clementina assembly/annotation v0.9 (165)
Date performed: 2011-02-01

Note: The following text comes from phytozome.org:

Genome Size / Loci
This version of the assembly (v0.9) is 296 Mb spread over 1,128 scaffolds, with 2.3% gaps, at 6.5x coverage. Half the genome is accounted for by 27 scaffolds 3.3 Mb or longer. The current gene set (clementine0.9) integrates 800k ESTs with homology-based and ab initio gene predictions (by GenomeScan, Fgenesh). 25,385 protein-coding loci have been predicted, each encoding a primary transcript. An additional 10,591 alternative transcripts are encoded on the genome, giving a total of 35,976 transcripts. 16,808 primary transcripts have EST support over at least 50% of their length. About half of the primary transcripts (12,805) have EST support over 100% of their length.
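The statement that half the genome is accounted for by 27 scaffolds of 3.3 Mb or longer describes what are usually called the L50 (scaffold count) and N50 (scaffold length) of an assembly. The following is a minimal sketch of how these two figures are commonly computed, assuming only a list of scaffold lengths. The function name n50_l50 and the toy lengths are illustrative and are not part of the JGI release.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths.

    N50 is the length of the shortest scaffold in the smallest set of
    longest scaffolds that together cover at least half the assembly;
    L50 is the number of scaffolds in that set.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest until half the total is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count


# Toy lengths (hypothetical, not the clementine scaffolds).
toy = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy)
print(n50, l50)  # 70 2
```

Applied to the 1,128 scaffold lengths of this assembly, a calculation of this kind would be expected to reproduce the reported figures of about 3.3 Mb and 27 scaffolds.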

Sequencing Method
Genomic sequence was generated by the IGCG, Genoscope, IGA and JGI using a whole-genome shotgun approach with Sanger sequencing of 2-3 kb and 6-12 kb insert libraries, as well as a 39 kb fosmid-end library, totaling 6x coverage.

Assembly Method
The genome was assembled with Arachne by Jeremy Schmutz at HudsonAlpha. Over 98% of the genome is in scaffolds over 50kb long.
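The figure of over 98% of the genome in scaffolds longer than 50 kb is a length-weighted fraction, not a fraction of the scaffold count. As a rough sketch of that calculation, assuming a plain list of scaffold lengths in base pairs (the function name and the toy values are hypothetical):

```python
def fraction_in_long_scaffolds(lengths, min_length=50_000):
    """Fraction of total assembly length held in scaffolds longer than min_length."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly has zero total length")
    # Sum only the scaffolds that exceed the length threshold.
    long_total = sum(n for n in lengths if n > min_length)
    return long_total / total


# Toy scaffold lengths in bp (hypothetical, not the clementine scaffolds).
toy = [3_000_000, 1_200_000, 400_000, 60_000, 30_000, 10_000]
print(f"{fraction_in_long_scaffolds(toy):.1%}")
```

Because the sum is weighted by length, many short scaffolds can fall below the threshold while still contributing only a small share of the assembly.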

Identification of Repeats
A repeat library had previously been generated from the sweet orange genome sequence. This library was used to mask 38% of the genome with RepeatMasker.
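A masked percentage such as the 38% quoted above can be checked directly from masked sequence. The sketch below assumes the masked sequences are already available as Python strings, and that masking is written either as lowercase bases (soft masking) or as N/X characters (hard masking). The function name is hypothetical and the sketch does not call RepeatMasker itself.

```python
def masked_fraction(sequences):
    """Fraction of bases that are masked across an iterable of sequences.

    A base counts as masked if it is lowercase (soft masking) or is N/X
    (hard masking). Note that N also marks assembly gaps, so for
    hard-masked input the result includes gap bases as well.
    """
    masked = 0
    total = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower() or base in "NX")
    if total == 0:
        raise ValueError("no sequence data given")
    return masked / total


# Toy sequences (hypothetical): 10 masked bases out of 20.
toy = ["ACGTacgtNN", "acgtACGTAC"]
print(masked_fraction(toy))  # 0.5
```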

EST Alignments
EST sequences were collected from the following sources: 210,567 C. sinensis ESTs from GenBank; 118,365 C. clementina ESTs from GenBank; 401,708 ESTs from Life Technologies; and 58,656 non-redundant EST assemblies built from sweet orange 454 EST sequences by Mohammed Mohiuddin. These 789,296 sequences were aligned and assembled into 72,320 assemblies on the haploid clementine genome using Brian Haas's PASA pipeline, which aligns ESTs to the best place in the genome via gmap, then filters hits to ensure proper splice boundaries.
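The total of 789,296 input sequences follows from the four sources listed above. As a small arithmetic check, using only the counts given in the text (the dictionary keys are shortened labels chosen here for illustration):

```python
# EST counts per source, as stated in the text.
est_sources = {
    "C. sinensis ESTs (GenBank)": 210_567,
    "C. clementina ESTs (GenBank)": 118_365,
    "ESTs (Life Technologies)": 401_708,
    "Sweet orange 454 EST assemblies": 58_656,
}

total_ests = sum(est_sources.values())
print(total_ests)  # 789296

# Share of the input contributed by each source.
for name, count in est_sources.items():
    print(f"{name}: {count / total_ests:.1%}")
```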

Assembly metrics

Assembly size: 296 Mb
Number of scaffolds: 1,128
N50: 3,278,304 bp
Predicted transcripts: 35,976
Annotated genes:
Assembly BUSCO score (embryophyta_odb10): 98.5%
Annotation BUSCO score (embryophyta_odb10): 94.1%