Dataset Available

Social media data (texts, images and videos URLs) with geo-referenced locations during major hurricanes (Beryl, Hanna, Harvey, Imelda and Laura) that occurred in Texas from 2017 to 2024

Authors: Vysyaraju, Uday, Yalong Pi, and David Retchless

Published On: Jul 11 2025 16:19 UTC

DOI: https://doi.org/10.7266/9eb44a2x

UDI: O1.x158.000:0003

Files

Individual Files

No. of Downloads: 0

No. of Files: 5

File Size: 1.58 GB

File Format(s):
xlsx, docx

Project Information

Funded By:
Texas OneGulf

Funding Cycle:
Texas OneGulf 2023

Research Group:
Towards Targeted Risk Mitigation: Community Engaged, Fast Impact Estimation of Extreme Weather using Big Social and Climate Data

Point of Contact

Yalong Pi
Texas A&M University
piyalong@tamu.edu

Data Collection Period

2017-08-23

2024-07-09

Theme keywords

weather, disaster self reports, geo-tag, images, social media, climate, hurricane, tropical storms, tropical weather events, Texas tropical storms, Hurricane Harvey, Hurricane Beryl, Hurricane Laura, Tropical Storm Imelda, Disaster Impact Estimation Prediction model, historical climate variables, wind speed, precipitation, air temperature, storm conditions, Hurricane Hanna

Abstract:

This dataset contains social media data (text, image URLs, and video URLs) with geo-referenced locations from major hurricanes (Beryl, Hanna, Harvey, Imelda, and Laura) that occurred in Texas from 2017 to 2024. Tweets were collected using X.com's API. Each record contains the tweet text, place coordinates, image URL, and other metadata. Retweets and tweets without images or geo-tags were excluded. Collection ran from two weeks before to two weeks after each hurricane event. The resulting dataset contains roughly 600,000 to 700,000 tweets. The image URLs can be used directly to download the images.
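The collection rules described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' collection code: the function names, the "RT @" retweet heuristic, and the choice of fields used to detect an image or a geo-tag are all assumptions, and the dates in the usage note are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta

# Two weeks on either side of an event, as stated in the abstract.
WINDOW = timedelta(weeks=2)


def collection_window(event_start: date, event_end: date) -> tuple[date, date]:
    """Return the (first day, last day) of the collection window for one event."""
    return event_start - WINDOW, event_end + WINDOW


def keep_record(record: dict) -> bool:
    """Apply the stated exclusions: no retweets, must have an image and a geo-tag.

    Assumptions (not confirmed by the dataset documentation):
    - a retweet is recognized by text starting with "RT @";
    - an image is present if `url` or `preview_image_url` is non-empty;
    - a geo-tag is present if `geo` or `place_id` is non-empty.
    """
    text = record.get("text") or ""
    is_retweet = text.startswith("RT @")
    has_image = bool(record.get("url") or record.get("preview_image_url"))
    has_geo = bool(record.get("geo") or record.get("place_id"))
    return not is_retweet and has_image and has_geo
```

For example, with a hypothetical event running from 2020-01-15 to 2020-01-20, `collection_window(date(2020, 1, 15), date(2020, 1, 20))` gives a window from 2020-01-01 to 2020-02-03.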

Suggested Citation:

Vysyaraju, Uday, Yalong Pi, and David Retchless. 2025. Social media data (texts, images and videos URLs) with geo-referenced locations during major hurricanes (Beryl, Hanna, Harvey, Imelda and Laura) that occurred in Texas from 2017 to 2024. Distributed by: GRIIDC, Harte Research Institute, Texas A&M University–Corpus Christi. doi:10.7266/9eb44a2x

Purpose:

These data can serve as social media (self-report) input for deep learning models that rapidly estimate disaster damage. The input can be text, images, or either in combination with other data such as climate and demographic information.

Data Parameters and Units:

Description (text)
lang (language identifier, text): en = English, es = Spanish, hi = Hindi, cy = Welsh, zxx = media or Twitter card only, qme = media links, qht = hashtags only
geo (X's geo object information, text)
preview_image_url (URL to the static placeholder preview, URL)
url (URL to the media file on Twitter, URL)
media_key (unique identifier of the expanded media content, text)
text (UTF-8 text of the Tweet, text)
place_id (place IDs that can be retrieved from geo/reverse_geocode, text)
Place (named locations with corresponding geo coordinates, text)
id (unique ID of each tweet, text)
images (compressed RGB image)
entities (arrays for hashtags, URLs, user mentions, symbols (cashtags), media (photos, videos, GIFs), and polls)
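As a hedged illustration of how the lang codes listed above might be used, the sketch below tallies records by language label. It assumes each record has already been loaded from the xlsx files into a Python dict keyed by the parameter names; the loading step, the `LANG_LABELS` table name, and the `count_by_language` helper are hypothetical and not part of the dataset.

```python
from collections import Counter

# Code-to-label table taken from the lang parameter description.
LANG_LABELS = {
    "en": "English",
    "es": "Spanish",
    "hi": "Hindi",
    "cy": "Welsh",
    "zxx": "media or Twitter card only",
    "qme": "media links",
    "qht": "hashtags only",
}


def count_by_language(records):
    """Count records per language label; codes not listed are grouped as 'other (<code>)'."""
    counts = Counter()
    for record in records:
        code = record.get("lang")
        label = LANG_LABELS.get(code, f"other ({code})")
        counts[label] += 1
    return counts
```

Grouping unlisted codes under an explicit "other" label, rather than dropping them, keeps the totals equal to the number of input records.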

Methods:

The researchers used the X.com API with a $5,000/month subscription to collect the data. The collection code and time windows were researched and defined before data collection began, in order to maximize the amount of data retrieved. The 1 million tweet limit was reached in the third week of data collection, so the subscription was canceled after one month.