Abstract:
The transcriptome of the flatback mud crab (Eurypanopeus depressus) was assembled from 15 individuals sequenced on two Illumina HiSeq 2000 lanes (PE100) using Trinity 2.0.3 and annotated with Trinotate 2.0.1 on a custom MySQL database from The Broad Institute. We exposed three flatback mud crabs to each of four treatments (12 individuals in total) for 72 hours to assess the up- and downregulation of genes in muscle tissue: non-aerated control, aerated control, oil only (Marlin platform Dorado, 1 g/l), and oil-dispersant (Marlin platform Dorado, 1 g/l; COREXIT 9500, 0.1 g/l). To account for stress caused by the laboratory treatments, we also analyzed muscle tissue from three reference individuals that were sacrificed without exposure to any laboratory treatment. This dataset reports upregulated and downregulated gene expression. NCBI accession numbers are provided for each sample.
Suggested Citation:
Vazquez-Miranda, Hernan; Thoma, Brent P; Wong, Juliet M; Felder, Darryl L; Crandall, Keith A, and Bracken-Grissom, Heather D. 2017. Annotated transcriptome and associated datasets of flatback mud crabs (Eurypanopeus depressus) exposed to dispersed oil. Distributed by: GRIIDC, Harte Research Institute, Texas A&M University–Corpus Christi. doi:10.7266/N71C1TZC
Data Parameters and Units:
Trinity gene number (gene_id), Trinity isoform number (transcript_id), SwissProt hit all ORFs (sprot_Top_BLASTX_hit), UniProt90 hit all ORFs (TrEMBL_Top_BLASTX_hit), rRNA identity (RNAMMER), protein number (prot_id), protein coordinates (prot_coords), SwissProt hit protein (sprot_Top_BLASTP_hit), UniProt90 hit protein (TrEMBL_Top_BLASTP_hit), Protein Family hit (Pfam), Signal Peptide hit (SignalP), Transmembrane HMMer hit (TmHMM), Gene Ontology eggnog (eggnog), Gene Ontology Blast (gene_ontology_blast), Gene Ontology Pfam (gene_ontology_pfam).
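As an illustration of how the Gene Ontology columns listed above might be read, the following Python sketch splits one annotation cell into individual GO terms. It assumes the layout commonly seen in Trinotate reports, where multiple hits are separated by a backtick, the parts of each hit are separated by a caret, and a lone period marks a missing value. The function name parse_go_field and the example cell are hypothetical and are not taken from this dataset.

```python
def parse_go_field(field):
    """Split a Trinotate-style GO cell into (go_id, namespace, term) tuples.

    Assumed layout (not confirmed by this dataset's documentation):
    hits separated by "`", parts of a hit separated by "^", "." for no hit.
    """
    if field is None or field.strip() in ("", "."):
        return []
    terms = []
    for entry in field.strip().split("`"):
        parts = entry.split("^")
        if len(parts) != 3:
            # Skip entries that do not match the assumed three-part layout.
            continue
        terms.append(tuple(parts))
    return terms


# Hypothetical cell value, for illustration only.
example_cell = (
    "GO:0005737^cellular_component^cytoplasm"
    "`GO:0005524^molecular_function^ATP binding"
)
parsed_terms = parse_go_field(example_cell)
for go_id, namespace, term in parsed_terms:
    print(go_id, namespace, term, sep="\t")
```

The same function could be applied to either gene_ontology_blast or gene_ontology_pfam, provided the cells in the spreadsheet follow the assumed delimiters.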
Methods:
We conducted a de novo transcriptome assembly in Trinity 2.0.3 using paired-end reads that passed QC from 15 total libraries. Trinity was run on the FIU Panther Cluster in the high-performance computing (HPC) environment with 24 cores and 256 GB of RAM with the following parameters: minimum kmer coverage of 4, maximum memory 252 GB, reverse single-stranded libraries, 24 CPUs, Butterfly maximum heap space 10 GB, Butterfly initial heap space 10 GB, Butterfly CPUs 24, and Inchworm CPUs 24. Trinotate was used to annotate the assembly. Gene expression and enrichment were calculated with edgeR, DESeq2, and GOseq. Transcript and peptide columns were not used in this annotation.
Raw data were deposited directly in NCBI GenBank under: BioProject PRJNA376168; BioSamples SAMN06351232-SAMN06351246; TSA GFJG00000000, Eurypanopeus depressus Replicate 1. This Transcriptome Shotgun Assembly (TSA) project has been deposited at DDBJ/EMBL/GenBank under the accession GFJG00000000. The version described in this dataset is the first version, GFJG01000000.
This submission includes one list of contents and 15 folders:
DS1 – complete transcriptomic assembly of 15 samples in Trinity fasta format, and spreadsheet with complete transcriptomic annotations for Eurypanopeus depressus from Trinotate. Columns include Trinity gene ID, transcript (isoform) ID, SwissProt Top BLASTX hit (sprot_Top_BLASTX), EMBL UniProt Top BLASTX Hit (TrEMBL_Top_BLASTX), RNAMMER hit, Protein Uniprot ID number (prot_id), protein coordinates (prot_coords), SwissProt Top BLASTP hit (sprot_Top_BLASTP), EMBL UniProt Top BLASTP Hit (TrEMBL_Top_BLASTP), Protein Family (Pfam), Signal Peptide (SignalP), Transmembrane HMMer (TmHMM), EggNog pathways, Gene Ontology (GO) from BLAST (gene_ontology_blast), and GO from Pfam (gene_ontology_pfam).
DS2 – spreadsheet with significantly regulated annotated genes from edgeR. Downregulated features in blue, and upregulated features in orange.
DS3 – spreadsheet with significantly regulated annotated isoforms from edgeR. Downregulated features in blue, and upregulated features in orange.
DS4 – spreadsheet with significantly regulated annotated genes from DESeq2. Downregulated features in blue, and upregulated features in orange.
DS5 – spreadsheet with significantly regulated annotated isoforms from DESeq2. Downregulated features in blue, and upregulated features in orange.
DS6 – spreadsheet with counts of upregulated genes’ GOs from edgeR
DS7 – spreadsheet with counts of upregulated isoforms’ GOs from edgeR
DS8 – spreadsheet with counts of upregulated genes’ GOs from DESeq2
DS9 – spreadsheet with counts of upregulated isoforms’ GOs from DESeq2
DS10 – spreadsheet with counts of downregulated genes’ GOs from edgeR
DS11 – spreadsheet with counts of downregulated isoforms’ GOs from edgeR
DS12 – spreadsheet with counts of downregulated genes’ GOs from DESeq2
DS13 – spreadsheet with counts of downregulated isoforms’ GOs from DESeq2
DS14 – collection of spreadsheets with features ranked by log2 fold change (logFC) from edgeR and DESeq2 feature counts between pairwise experimental condition comparisons (NC, OO, OD), with P-values, False Discovery Rate (FDR) adjustments, EMBL UniProt Top BLASTX Hit (TrEMBL_Top_BLASTX), and Gene Ontology (GO) from BLAST (gene_ontology_blast)
DS15 – Gene Ontology term enrichment analyses with GOseq, based on gene count matrices from edgeR and DESeq2
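To show how the logFC and FDR values in the pairwise comparison spreadsheets relate to the upregulated and downregulated labels, here is a minimal Python sketch. The cutoffs (FDR of 0.05, absolute logFC of 1), the sign convention (positive logFC meaning higher expression in the treatment), the function name classify_feature, and the example rows are all assumptions made for illustration. This record does not state which thresholds were applied.

```python
def classify_feature(log_fc, fdr, fdr_cutoff=0.05, min_abs_log_fc=1.0):
    """Label one feature from its log2 fold change and FDR-adjusted P-value.

    Both cutoffs are illustrative defaults, not the values used for this dataset.
    """
    if fdr > fdr_cutoff or abs(log_fc) < min_abs_log_fc:
        return "not significant"
    # Assumed convention: positive logFC means higher expression in the treatment.
    return "upregulated" if log_fc > 0 else "downregulated"


# Hypothetical rows; the feature names and values are invented.
rows = [
    {"feature": "gene_A", "logFC": 2.3, "FDR": 0.001},
    {"feature": "gene_B", "logFC": -1.8, "FDR": 0.02},
    {"feature": "gene_C", "logFC": 0.4, "FDR": 0.0005},
    {"feature": "gene_D", "logFC": 3.1, "FDR": 0.2},
]

calls = {row["feature"]: classify_feature(row["logFC"], row["FDR"]) for row in rows}
for feature, call in calls.items():
    print(feature, call, sep="\t")
```

In this sketch gene_C is excluded because its fold change is small despite a low FDR, and gene_D is excluded because its FDR exceeds the cutoff despite a large fold change. Requiring both conditions is one common way to define a significantly regulated feature.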