Citrus sinensis genome v1.0 (JGI)

Overview
Analysis Name: Citrus sinensis genome v1.0 (JGI)
Method: Performed by JGI (v1.0)
Source: JGI Citrus sinensis assembly/annotation v1.0 (154)
Date performed: 2011-02-01

Note: The following text comes from phytozome.org:

Genome Size / Loci
This version (v.1) of the assembly is 319 Mb spread over 12,574 scaffolds. Half the genome is accounted for by the 236 scaffolds that are 251 kb or longer. The current gene set (orange1.1) integrates 3.8 million ESTs with homology-based and ab initio gene predictions (see below). 25,376 protein-coding loci have been predicted, each with a primary transcript. An additional 20,771 alternative transcripts have been predicted, giving a total of 46,147 transcripts. 16,318 primary transcripts have EST support over at least 50% of their length. About two-fifths of the primary transcripts (10,813) have EST support over their entire length.
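The scaffold and transcript figures above can be checked with a little arithmetic. The sketch below shows the usual definition of N50 (the length of the scaffold at which the running total, counted from the longest scaffold down, first reaches half the assembly) and recomputes the transcript totals quoted in the text. The function name `n50_l50` and the toy scaffold lengths are illustrative assumptions; the real scaffold lengths are not listed on this page.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp.

    N50 is the length of the scaffold at which the cumulative sum,
    taken from longest to shortest, first reaches half the total.
    L50 is the number of scaffolds needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no scaffold lengths supplied")


# Toy scaffold lengths (hypothetical, NOT the real assembly).
toy_n50, toy_l50 = n50_l50([100, 80, 60, 40, 20])

# Figures taken from the text above.
primary_transcripts = 25_376
alternative_transcripts = 20_771
total_transcripts = primary_transcripts + alternative_transcripts

half_supported = 16_318   # EST support over at least 50% of length
fully_supported = 10_813  # EST support over the entire length

print(toy_n50, toy_l50)
print(total_transcripts)
print(round(half_supported / primary_transcripts, 3))
print(round(fully_supported / primary_transcripts, 3))
```

For the real assembly, the text corresponds to an L50 of 236 scaffolds and an N50 of roughly 251 kb.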

Sequencing Method
Genomic sequence was generated using a whole genome shotgun approach, with 2.0 Gb of sequence coming from GS FLX Titanium, 2.4 Gb from FLX Standard, 440 Mb from Sanger paired-end libraries, and 2.0 Gb from 454 paired-end libraries.
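As a rough illustration, the per-platform yields can be summed and divided by the assembly size to get a nominal sequencing depth. This is a back-of-the-envelope sketch only: it assumes the 319 Mb assembly size stands in for the genome size and that all raw sequence was usable. The resulting depth is not a figure reported by the source.

```python
# Raw sequence yields in Mb, as listed in the text above.
yields_mb = {
    "GS FLX Titanium": 2000,
    "FLX Standard": 2400,
    "Sanger paired-end": 440,
    "454 paired-end": 2000,
}

# Assumption: the assembly size is used as a proxy for genome size.
assembly_size_mb = 319

total_mb = sum(yields_mb.values())
nominal_depth = total_mb / assembly_size_mb

print(f"Total raw sequence: {total_mb} Mb")
print(f"Nominal depth: {nominal_depth:.1f}x")
```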

Assembly Method
The 25.5 million 454 reads and 623k Sanger sequence reads were generated through a collaborative effort by 454 Life Sciences, University of Florida and JGI. The assembly was generated by Brian Desany at 454 Life Sciences using the Newbler assembler.

Identification of Repeats
A de novo repeat library was built by running RepeatModeler (Arian Smit, Robert Hubley) on the genome. Sequences with Pfam domains associated with non-TE functions were removed from this library, which was then used with RepeatMasker to mask 31% of the genome.
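A masked fraction such as the 31% quoted above can be recomputed from a masked assembly. The sketch below assumes soft masking, where repeats are written in lowercase. The source does not say whether soft or hard masking was used, and the function name and toy sequences are illustrative only.

```python
def masked_fraction(sequences):
    """Return the fraction of bases that are lowercase (soft-masked).

    `sequences` is any iterable of sequence strings, for example the
    scaffold sequences read from a soft-masked FASTA file.
    """
    masked = 0
    total = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


# Toy sequences (hypothetical): 8 of 20 bases are lowercase.
toy_fraction = masked_fraction(["ACGTacgtAC", "ggggCCCCCC"])
print(toy_fraction)
```

For hard-masked output, where repeats are replaced by N or X, the same loop would count those characters instead of lowercase bases.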

EST Alignments
We aligned the sweet orange EST sequences using Brian Haas's PASA pipeline, which aligns each EST to its best-matching location in the genome via gmap and then filters hits to ensure proper splice boundaries.
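Once ESTs are aligned, the EST-support figures quoted under Genome Size / Loci (support over at least 50%, or over 100%, of a transcript's length) amount to measuring how much of each transcript is covered by at least one EST. The source does not describe how this was calculated, so the sketch below is one plausible definition rather than the pipeline's actual method. It assumes EST alignments have already been projected onto transcript coordinates as 0-based, half-open `(start, end)` intervals; the function name is hypothetical.

```python
def covered_fraction(transcript_len, est_intervals):
    """Fraction of a transcript covered by at least one EST interval.

    Intervals are 0-based, half-open (start, end) pairs in transcript
    coordinates. Overlapping intervals are counted only once.
    """
    if transcript_len <= 0:
        raise ValueError("transcript_len must be positive")
    covered = 0
    current_end = 0  # rightmost position already counted
    for start, end in sorted(est_intervals):
        start = max(start, current_end, 0)  # skip what is already counted
        end = min(end, transcript_len)      # clip to the transcript
        if end > start:
            covered += end - start
            current_end = end
    return covered / transcript_len


# Toy example (hypothetical): two overlapping ESTs plus one at the 3' end.
fraction = covered_fraction(1000, [(0, 300), (200, 500), (800, 1000)])
print(fraction)
print(fraction >= 0.5)
```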

Assembly metrics

Assembly size  319 Mb
Number of scaffolds 12,574
N50 250,548 bp
Predicted transcripts 46,147
Annotated genes  
Assembly BUSCO score (embryophyta_odb10) 92.2%
Annotation BUSCO score (embryophyta_odb10) 87.5%