Abstract:
This dataset contains a targeted reconstruction of a population genome from oil-contaminated beach sand samples collected at Pensacola Municipal Beach (30.32 N 87.17 W), Florida, on 2010-10-20. The dataset includes the metagenome-assembled genome (MAG) sequence, NCBI accession details (NCBI BioProject: PRJNA530182; BioSample: SAMN12824187; Sample name: Candidatus Macondimonas diazotrophica strain:KTK01), and sampling details, including the location, depth and date of sample collection. This dataset supports the publication: Karthikeyan, S., Rodriguez-R, L. M., Heritier-Robbins, P., Kim, M., Overholt, W. A., Gaby, J. C., … Konstantinidis, K. T. (2019). “Candidatus Macondimonas diazotrophica”, a novel gammaproteobacterial genus dominating crude-oil-contaminated coastal sediments. The ISME Journal. doi:10.1038/s41396-019-0400-5
Suggested Citation:
Smruthi Karthikeyan, Luis M Rodriguez-R, Patrick Heritier-Robbins, Minjae Kim, Will A Overholt, John C. Gaby, Janet K. Hatt, Jim C. Spain, Ramon Rosselló-Móra, Markus Huettel, Joel E Kostka, Konstantinos T Konstantinidis. 2020. Targeted reconstruction of a population genome in oil contaminated beach sand samples collected from the Pensacola Municipal Beach, Florida on 2010-10-20. Distributed by: GRIIDC, Harte Research Institute, Texas A&M University–Corpus Christi. doi:10.7266/n7-9k94-bg96
Publications:
Karthikeyan, S., Rodriguez-R, L. M., Heritier-Robbins, P., Kim, M., Overholt, W. A., Gaby, J. C., … Konstantinidis, K. T. (2019). “Candidatus Macondimonas diazotrophica”, a novel gammaproteobacterial genus dominating crude-oil-contaminated coastal sediments. The ISME Journal. doi:10.1038/s41396-019-0400-5
Purpose:
To recover and describe an abundant member of the microbial communities in the oil-contaminated Pensacola Beach sands, and to characterize the microbial community in Gulf of Mexico sediments.
Data Parameters and Units:
The dataset consists of one Excel file (GRIIDC_R5_x278_000_0002.xlsx), one Word document (Methodology.docx), one FASTA file (JAACZP01.1.fasta), and one GenBank flat file (GBFF; JAACZP01.1.gbk.gbff). Together they include gene sequences, statistical data on allele frequency, nif gene abundance, the methodology, NCBI accession details, and sampling details.
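The contig count, total length and N50 of the FASTA file can be checked with a few lines of standard-library Python. This is a minimal sketch, assuming a plain-text FASTA file; the helper names (read_fasta, n50, assembly_stats) are hypothetical. N50 is taken as the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The values obtained need not match the Methods exactly if the deposited assembly differs from the one described there.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the running total (longest first) reaches half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def assembly_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Contig count, total length (bp) and N50 (bp) for a FASTA file."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    return {"contigs": len(lengths), "total_length": sum(lengths), "n50": n50(lengths)}


# Usage with the file name from this dataset:
#   with open("JAACZP01.1.fasta") as handle:
#       print(assembly_stats(handle))
```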
The file "GRIIDC_R5_x278_000_0002.xlsx" contains: NCBI accession details (BioProject number), Assembly (https://www.ncbi.nlm.nih.gov/Traces/wgs/JAACZP01?display=contigs&page=1), WGS accession (JAACZP000000000), BioSample accession number (SAMN12824187), Package (MIMAG: metagenome-assembled genome, soil; version 5.0), Organism, and sample details [isolation source, collection date (DD-Mon-YY), geographic location, latitude and longitude (degrees N and degrees W), reference for biomaterial, relationship to oxygen, isolate (MAG-01), depth (cm), elevation, broad-scale environmental context, local-scale environmental context, and environmental medium].
Methods:
Sample collection and DNA extraction: Beach sand samples were collected from Pensacola Municipal Beach (Florida, USA) before (pre-spill/clean) and after (oil-contaminated and recovered beach sands) the oil slick reached the shoreline, as described elsewhere (Rodriguez-R et al. 2015). Sixteen shotgun metagenomic datasets (Rodriguez-R et al. 2015) and 122 16S rRNA gene amplicon datasets (Huettel et al. 2018), sequenced from various sampling time points, were used in the analyses. Initial sample processing and sequencing were performed as described previously (Rodriguez-R et al. 2015).
Targeted reconstruction of the population genome: To recover the population carrying the abundant nifH allele, the target allele was searched with BLAST against the genes predicted on all assembled metagenomic contigs, and the contigs containing genes identical to the target were extracted. This yielded 8 contigs totaling 94 kbp (the training set). These contigs were used to construct fragment recruitment plots from all metagenomic datasets. The contigs had high and even sequencing coverage in the oil-contaminated samples, ranging from 5× to 100×, and were virtually absent from the recovered and pre-spill datasets. Next, the remaining assembled contigs whose tetranucleotide signatures were similar to that of the training contig set, based on log-likelihood estimates, were identified (about 6,000 contigs totaling 16 Mbp). Because these contigs came from different metagenomes, they were re-assembled with IDBA-UD (Peng, Leung et al. 2012) to reduce redundancy, resulting in 122 contigs with a total length of 2.5 Mbp (N50 of 75 kbp). The resulting draft population genome had a CheckM (Parks, Imelfort et al. 2015) completeness of 96.39% and contamination of 0.32%. The likely taxonomic affiliation of this bin, obtained with the MiGA webserver (www.microbial-genomes.org), was an order within the class Gammaproteobacteria.
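The Methods do not specify the exact log-likelihood model used to compare tetranucleotide signatures. As an illustration only, the sketch below assumes a simple multinomial model of canonical tetranucleotide frequencies: a candidate contig is kept when its mean per-tetranucleotide log-likelihood under a model built from the training contigs exceeds its log-likelihood under a background model built from all contigs. The function names, the pseudocount of 1 and the threshold of 0 are hypothetical choices, not the authors' parameters.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical sketch: the authors' actual log-likelihood model is not specified.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


# Contig orientation is arbitrary, so reverse complements are merged (136 canonical 4-mers).
CANONICAL_TETRAS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetra_counts(seq: str) -> Counter:
    """Count canonical tetranucleotides, skipping windows containing non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def tetra_model(seqs, pseudocount: float = 1.0) -> dict:
    """Log-probability of each canonical tetranucleotide, pooled over seqs with a pseudocount."""
    total = Counter()
    for s in seqs:
        total.update(tetra_counts(s))
    denom = sum(total.values()) + pseudocount * len(CANONICAL_TETRAS)
    return {t: math.log((total[t] + pseudocount) / denom) for t in CANONICAL_TETRAS}


def select_similar(candidates: dict, training, background, threshold: float = 0.0) -> list:
    """Names of candidate contigs whose mean per-tetranucleotide log-likelihood ratio
    (training model vs background model) exceeds the threshold."""
    train_model = tetra_model(training)
    bg_model = tetra_model(background)
    selected = []
    for name, seq in candidates.items():
        counts = tetra_counts(seq)
        n = sum(counts.values())
        if n == 0:
            continue  # nothing scorable: contig too short or only ambiguous bases
        llr = sum(c * (train_model[t] - bg_model[t]) for t, c in counts.items()) / n
        if llr > threshold:
            selected.append(name)
    return selected
```

In this sketch, `training` would hold the 8 training contigs, `candidates` the remaining assembled contigs, and `background` all contigs. Using a likelihood ratio against a background model, rather than the raw likelihood, keeps contigs that are merely low in complexity from scoring well.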
Provenance and Historical References:
Huettel, M., Overholt, W. A., Kostka, J. E., Hagan, C., Kaba, J., Wells, W. B., & Dudley, S. (2018). Degradation of Deepwater Horizon oil buried in a Florida beach influenced by tidal pumping. Marine Pollution Bulletin, 126, 488–500. doi:10.1016/j.marpolbul.2017.10.061
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25(7), 1043–1055. doi:10.1101/gr.186072.114
Peng, Y., Leung, H. C. M., Yiu, S. M., & Chin, F. Y. L. (2012). IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28(11), 1420–1428. doi:10.1093/bioinformatics/bts174
Rodriguez-R, L. M., Overholt, W. A., Hagan, C., Huettel, M., Kostka, J. E., & Konstantinidis, K. T. (2015). Microbial community successional patterns in beach sands impacted by the Deepwater Horizon oil spill. The ISME Journal, 9(9), 1928–1940. doi:10.1038/ismej.2015.5