Abstract:
This dataset includes three subsections. The first contains links to the 16S SSU rRNA gene sequences that were generated and deposited in the Sequence Read Archive (SRA) at NCBI (BioProject PRJNA414249). The second contains the raw oxygen concentrations for core profiles. The third contains the R code used to analyze these data, including the R data files (RDS) used as input for predicting microbial community composition across the Gulf, the models themselves (random forest regressions for each OTU), and the resulting predictions. Samples were collected from August 2012 to August 2015.
Suggested Citation:
Will A. Overholt, Joel E. Kostka. 2018. Benthic microbial community composition across the northern and southern Gulf of Mexico, 2012-2015. Distributed by: GRIIDC, Harte Research Institute, Texas A&M University–Corpus Christi. doi:10.7266/N70G3HN3
Purpose:
Our objectives were to (1) characterize unimpacted sedimentary microbial communities to establish a baseline, (2) map the biogeographical patterns in microbial community structure across the Gulf of Mexico, (3) use this map to generate a Gulf of Mexico biogeography model of microbial community structure that can predict the abundance of dominant microbial populations, and (4) determine whether impacted regions had returned to baseline conditions.
Data Parameters and Units:
Oxygen spreadsheet: Site (named sampling location), Cast (1, 2, 3: which deployment of the multicorer was sampled; in almost all cases only one deployment was performed and this number is 1), Core (1, 2, 3: replicates from a single multicorer deployment, nearly always 1), Date (“MM-YYYY”), Water Depth (m), Latitude (decimal degrees), Longitude (decimal degrees), Depth (mmbsf, mm below the seafloor: sediment column depth, with 0 referring to the surface), Oxygen (µmol/L), Region (either NGoM or SGoM depending on sampling location, northern vs southern Gulf of Mexico).
Sequences: Spreadsheet detailing sample names and the BioProject ID for the NCBI SRA database: Sample_Name, bioproject accession number, collection date (MM-YY), sediment depth (cmbsf, cm below the seafloor), Water Depth (m), env_biom (NCBI defined), env_feature (NCBI defined), env_material (NCBI defined), geo_loc_name (NCBI defined), lat_lon (decimal degrees), Core Name (the name of the site sampled), Cast_Replicate_Number (core replicates from a specific sampling time point: 1-3 = cast replicates, a-c = core replicates within a cast, tech = technical replicates, i.e. replicate DNA extraction from the same sediment).
R_script+Models: Defined in the attached csv form (index_README.csv). The root directory contains all R scripts, input_files contains datasets that are utilized by the R code, R_data_files contains R data structures generated by the R code, and R_scripts_on_cluster has 2 scripts to generate model results using GA Tech’s high performance computing cluster. Operational taxonomic units clustered by taxonomic names at the class level and taxonomic assignments
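As an illustration of how the Oxygen spreadsheet columns fit together, the following minimal Python sketch groups rows into one depth-sorted profile per Site, Cast and Core. It assumes the spreadsheet has been exported to CSV and that the header spellings match the parameter names listed above; both are assumptions, since the actual file name and headers are not given here. The rows in SAMPLE_CSV are synthetic placeholders, not values from the dataset, and load_profiles is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Synthetic rows for illustration only; these are NOT values from the dataset.
# Header spellings are assumed from the parameter list and may differ in the file.
SAMPLE_CSV = """Site,Cast,Core,Date,Water Depth,Latitude,Longitude,Depth,Oxygen,Region
SiteA,1,1,08-2012,1000,28.0,-88.0,2,90.0,NGoM
SiteA,1,1,08-2012,1000,28.0,-88.0,0,200.0,NGoM
SiteA,1,1,08-2012,1000,28.0,-88.0,1,150.0,NGoM
SiteB,1,1,08-2015,1500,20.0,-93.0,0,210.0,SGoM
SiteB,1,1,08-2015,1500,20.0,-93.0,1,180.0,SGoM
"""


def load_profiles(handle):
    """Group rows into depth-sorted oxygen profiles keyed by (Site, Cast, Core)."""
    profiles = defaultdict(list)
    for row in csv.DictReader(handle):
        key = (row["Site"], int(row["Cast"]), int(row["Core"]))
        # Depth is in mm below the seafloor (0 = surface); Oxygen is in µmol/L.
        profiles[key].append((float(row["Depth"]), float(row["Oxygen"])))
    for points in profiles.values():
        points.sort()  # order each profile from the surface downward
    return dict(profiles)


profiles = load_profiles(io.StringIO(SAMPLE_CSV))
for key, points in profiles.items():
    print(key, points)
```

To use this with the real file, replace io.StringIO(SAMPLE_CSV) with an open file handle and adjust the column names to match the actual headers.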