Citrus clementina genome v0.9 (JGI)

Overview
Analysis Name: Citrus clementina genome v0.9 (JGI)
Method: Performed by JGI (v0.9)
Source: JGI Citrus clementina assembly/annotation v0.9 (165)
Date performed: 2011-02-01

Note: The following text comes from phytozome.org:

Genome Size / Loci
This version of the assembly (v0.9) is 296 Mb spread over 1,128 scaffolds, with 2.3% gaps, at 6.5x coverage. Half the genome is contained in 27 scaffolds of 3.3 Mb or longer. The current gene set (clementine0.9) integrates 800k ESTs with homology-based and ab initio gene predictions (GenomeScan, Fgenesh). 25,385 protein-coding loci have been predicted, each encoding a primary transcript. An additional 10,591 alternative transcripts bring the total to 35,976 transcripts. 16,808 primary transcripts have EST support over at least 50% of their length, and about half of the primary transcripts (12,805) have EST support over their full length.
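The statement that half the genome lies in 27 scaffolds of 3.3 Mb or longer corresponds to the assembly's L50 and N50 statistics (the Assembly metrics table lists N50 as 3,278,304 bp). Below is a minimal sketch of how N50/L50 and the gap percentage can be computed from scaffold sequences; the function names are hypothetical and it assumes gaps are written as runs of N.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50).

    Scaffolds are summed from longest to shortest; N50 is the length of the
    scaffold at which the running total first reaches half the assembly size,
    and L50 is how many scaffolds it took to get there.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("no scaffolds given")


def gap_fraction(sequences: Iterable[str]) -> float:
    """Fraction of assembled bases that are gap characters (N or n)."""
    total = gaps = 0
    for seq in sequences:
        total += len(seq)
        gaps += seq.upper().count("N")
    return gaps / total if total else 0.0
```

For example, `n50_l50([10, 8, 6, 4, 2])` returns `(8, 2)`: the two longest scaffolds already cover 18 of 30 bases.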

Sequencing Method
Genomic sequence was generated by the IGCG, Genoscope, IGA and JGI with a whole-genome shotgun approach, using Sanger sequencing of 2-3kb and 6-12kb insert libraries as well as a 39kb fosmid end library, for a total of 6x coverage.
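Sequencing depth ("6x coverage") is the total number of sequenced bases divided by the genome size. The sketch below shows that arithmetic plus the idealized Lander-Waterman (Poisson) estimate of the fraction of bases left unsampled, e^-c. It uses the 296 Mb assembly size as a stand-in for the true genome size, which is an assumption; real Sanger libraries are not uniformly distributed, so the Poisson figure is only a rough lower bound on what is missed.

```python
import math


def mean_coverage(total_sequenced_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_sequenced_bases / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases hit by no read."""
    return math.exp(-coverage)


# Illustrative numbers only, assuming the assembly size approximates the genome size.
GENOME_SIZE = 296e6
TARGET_DEPTH = 6.0
bases_needed = TARGET_DEPTH * GENOME_SIZE  # about 1.78 Gb of raw sequence
```

Under this model, 6x depth leaves roughly e^-6, or about 0.25%, of bases unsampled.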

Assembly Method
The genome was assembled with Arachne by Jeremy Schmutz at HudsonAlpha. Over 98% of the genome is in scaffolds over 50kb long.
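A claim such as "over 98% of the genome is in scaffolds over 50kb long" is a length-weighted fraction, not a count of scaffolds. A minimal sketch of that calculation from a list of scaffold lengths (the function name is hypothetical, and "over" is read as strictly greater than the threshold):

```python
from typing import Iterable


def fraction_in_long_scaffolds(lengths: Iterable[int], min_length: int = 50_000) -> float:
    """Fraction of assembled bases lying in scaffolds longer than min_length bp."""
    lengths = list(lengths)
    total = sum(lengths)
    if total == 0:
        return 0.0
    long_bases = sum(n for n in lengths if n > min_length)
    return long_bases / total
```

Because the fraction is weighted by length, many short scaffolds can exist while still holding under 2% of the bases.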

Identification of Repeats
A repeat library had previously been generated from the sweet orange genome sequence. This library was used to mask 38% of the genome with RepeatMasker.
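RepeatMasker annotations can overlap, so a masked percentage like the 38% quoted here should count each base only once. The sketch below merges overlapping intervals per scaffold before summing. It assumes 0-based, half-open coordinates and that the denominator is the full assembly length including gaps; both are assumptions, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def merged_length(intervals: List[Interval]) -> int:
    """Count bases covered by possibly overlapping intervals, each base once."""
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            # Overlapping or touching: extend the current merged run.
            cur_end = max(cur_end, end)
        else:
            # Disjoint: close the previous run (if any) and start a new one.
            if cur_start is not None and cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
    if cur_start is not None and cur_end is not None:
        covered += cur_end - cur_start
    return covered


def masked_percent(repeats: Dict[str, List[Interval]], assembly_size: int) -> float:
    """Percentage of the assembly covered by repeats, keyed by scaffold name."""
    masked = sum(merged_length(ivs) for ivs in repeats.values())
    return 100.0 * masked / assembly_size
```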

EST Alignments
EST sequences were collected from the following sources: 210,567 C. sinensis ESTs from GenBank; 118,365 C. clementina ESTs from GenBank; 401,708 ESTs from Life Technologies; and 58,656 non-redundant EST assemblies built from sweet orange 454 EST sequences by Mohammed Mohiuddin. These 789,296 sequences were aligned and assembled into 72,320 assemblies on the haploid clementine genome using Brian Haas's PASA pipeline, which aligns ESTs to their best genomic location with GMAP and then filters the hits to ensure proper splice boundaries.
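Filtering for "proper splice boundaries" generally means checking that each gap between aligned EST blocks looks like an intron with recognized donor/acceptor dinucleotides. The sketch below is a simplified illustration of that idea, not PASA's actual validation logic: the accepted set (GT-AG, GC-AG, AT-AC), the coordinate convention, and all names are assumptions.

```python
from typing import List, Tuple

# Assumed donor/acceptor pairs: canonical GT-AG plus the common GC-AG and AT-AC.
SPLICE_PAIRS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def intron_is_canonical(genome: str, start: int, end: int, strand: str = "+") -> bool:
    """Check intron genome[start:end] (0-based, half-open, forward-strand coordinates)."""
    intron = genome[start:end].upper()
    if strand == "-":
        # Read the intron in transcript orientation.
        intron = reverse_complement(intron)
    if len(intron) < 4:
        return False
    return (intron[:2], intron[-2:]) in SPLICE_PAIRS


def splices_ok(genome: str, exons: List[Tuple[int, int]], strand: str = "+") -> bool:
    """True if every gap between consecutive aligned blocks looks like a valid intron."""
    blocks = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        if not intron_is_canonical(genome, prev_end, next_start, strand):
            return False
    return True
```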

Assembly metrics

Assembly size: 296 Mb
Number of scaffolds: 1,128
N50: 3,278,304 bp
Predicted transcripts: 35,976
Annotated genes:
Assembly BUSCO score (embryophyta_odb10): 98.5%
Annotation BUSCO score (embryophyta_odb10): 94.1%
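BUSCO reports completeness as a one-line short summary of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`, where C (complete) is the sum of single-copy (S) and duplicated (D) BUSCOs. The table does not say which figure is quoted; assuming it is C, the sketch below parses such a line. The function name is hypothetical and the example numbers are made up for illustration.

```python
import re
from typing import Dict

_BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> Dict[str, float]:
    """Parse a BUSCO short-summary line into percentages plus the lineage size n."""
    match = _BUSCO_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    values: Dict[str, float] = {k: float(v) for k, v in match.groupdict().items()}
    values["n"] = int(match.group("n"))  # number of BUSCO groups searched
    return values


# Made-up example values, not the scores for this assembly.
example = parse_busco_summary("C:90.0%[S:85.0%,D:5.0%],F:4.0%,M:6.0%,n:100")
```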
Downloads

All assembly and annotation files are available for download by selecting the desired data type in the right-hand "Resources" sidebar. Each data type page provides a description of the available files and links to download them. Alternatively, you can browse all available files in the CGD data repository.

Assembly


Please note: if you download and use the JGI whole-genome assembly and annotation, please abide by the requirements for this data as specified on phytozome.org's Citrus clementina download page.

Downloads

Scaffolds (FASTA file) Cclementina_v0.9_scaffolds.fa.gz
Scaffolds w/ masked repeats (FASTA file) Cclementina_v0.9_scaffolds_RM.fa.gz
Scaffolds (GFF3 file) Cclementina_v0.9_scaffolds.gff3.gz
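The scaffold files are gzip-compressed FASTA. A minimal sketch for streaming scaffold names and lengths without decompressing to disk is shown below; it uses only the standard library, takes the first whitespace-delimited word of each header as the name (an assumption about the header format), and the scaffold names in the demo are hypothetical.

```python
import gzip
import io
from typing import IO, Dict, Iterator, Tuple


def fasta_lengths(handle: IO[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence name, length) for each record in a FASTA text stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:].split(maxsplit=1)[0], 0
        else:
            length += len(line)  # gap characters (N) are counted too
    if name is not None:
        yield name, length


def scaffold_lengths(path: str) -> Dict[str, int]:
    """Read e.g. Cclementina_v0.9_scaffolds.fa.gz and map scaffold name to length."""
    with gzip.open(path, "rt") as handle:
        return dict(fasta_lengths(handle))


# Self-contained demo with an in-memory gzip file and hypothetical names.
demo = gzip.compress(b">scaffold_1 test\nACGT\nAC\n>scaffold_2\nNNNN\n")
with gzip.open(io.BytesIO(demo), "rt") as fh:
    lengths = dict(fasta_lengths(fh))
```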


Gene Predictions



Downloads

Transcript sequences, mRNA (FASTA file) Cclementina_v0.9_transcript.fa.gz
Protein sequences (FASTA file) Cclementina_v0.9_peptide.fa.gz
Gene models (GFF3 file) Cclementina_v0.9_gene.gff3.gz
Transcripts (GFF3 file) Cclementina_v0.9_transcripts.gff3.gz
Alternate transcripts (GFF3 file) Cclementina_v0.9_alt_transcripts.gff3.gz
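The locus and transcript counts quoted for this gene set add up as 25,385 primary transcripts + 10,591 alternative transcripts = 35,976 transcripts. A minimal sketch for recomputing such counts from GFF3 gene models follows. It assumes genes are typed `gene` with an `ID` attribute and transcripts are typed `mRNA` with a `Parent` pointing to their gene, and that each gene has exactly one primary transcript; the actual layout of these files may differ, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def summarize_gene_models(lines: Iterable[str]) -> Dict[str, int]:
    genes = set()
    transcripts_per_gene: Counter = Counter()
    n_transcripts = 0
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section; no more features follow
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene" and "ID" in attrs:
            genes.add(attrs["ID"])
        elif cols[2] == "mRNA" and "Parent" in attrs:
            n_transcripts += 1
            for parent in attrs["Parent"].split(","):
                transcripts_per_gene[parent] += 1
    primary = len(transcripts_per_gene)  # assumes one primary transcript per gene
    return {"genes": len(genes), "transcripts": n_transcripts,
            "alternative": n_transcripts - primary}
```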


Protein Homology

The protein homology analysis provided here was performed by the Main Bioinformatics Lab at WSU. Proteins from the C. clementina v0.9 annotation were compared against proteins from other genomes and databases using BLASTP with an e-value cutoff of 1e-6, and only the 10 best matches for each protein were kept. The available files are in Excel 2007 format.
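The filtering described above (e-value cutoff of 1e-6, keep the best 10 matches per protein) can be sketched on BLAST+ tabular output (`-outfmt 6`), whose default 12 columns end with evalue and bitscore. This is not the WSU lab's documented pipeline: the input format, the inclusive cutoff, the tie-break on bitscore, and the function name are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def best_hits(rows: Iterable[List[str]], max_evalue: float = 1e-6,
              top_n: int = 10) -> Dict[str, List[Hit]]:
    """Keep hits with evalue <= max_evalue, then the top_n per query."""
    by_query: Dict[str, List[Hit]] = defaultdict(list)
    for row in rows:
        # outfmt 6 default columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hit = Hit(row[0], row[1], float(row[10]), float(row[11]))
        if hit.evalue <= max_evalue:
            by_query[hit.query].append(hit)
    # Rank by lowest e-value, breaking ties by highest bitscore.
    return {q: sorted(hits, key=lambda h: (h.evalue, -h.bitscore))[:top_n]
            for q, hits in by_query.items()}


# Usage with a hypothetical tab-separated file:
#   import csv
#   with open("hits.tsv") as fh:
#       top = best_hits(csv.reader(fh, delimiter="\t"))
```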

Downloads

ExPASy SwissProt Cclementina_v0.9_vs_sprot.xls
Malus x domestica (apple) v1.0 proteins Cclementina_v0.9_vs_apple.xls
TAIR10 (arabidopsis) proteins Cclementina_v0.9_vs_arabidopsis.xls
Prunus persica (peach) v1.0 proteins Cclementina_v0.9_vs_peach.xls
Vitis vinifera (grape) proteins Cclementina_v0.9_vs_grape.xls
Populus trichocarpa (poplar) v2.0 proteins Cclementina_v0.9_vs_poplar.xls


Repeats



Downloads

RepeatMasker repeats (GFF3 file) Cclementina_v0.9_repeats.gff3.gz
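GFF3 coordinates are 1-based and inclusive, while many interval tools expect BED's 0-based, half-open coordinates. A minimal conversion sketch for the repeat GFF3 is shown below; it keeps the feature type as the BED name field, and the example feature type and scaffold name are hypothetical rather than taken from the actual file.

```python
from typing import Iterable, Iterator, Tuple


def gff3_to_bed(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (seqid, start, end, feature_type) in BED coordinates."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        start, end = int(cols[3]), int(cols[4])
        # 1-based inclusive [start, end] -> 0-based half-open [start - 1, end)
        yield cols[0], start - 1, end, cols[2]


def bed_line(record: Tuple[str, int, int, str]) -> str:
    return "\t".join(str(field) for field in record)
```

The interval length is unchanged by the conversion: a GFF3 feature from 11 to 20 covers 10 bases, as does the BED interval [10, 20).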