Abstract:
This dataset contains geo-referenced social media data (tweet text plus image and video URLs) posted during major hurricanes (Beryl, Hanna, Harvey, Imelda, and Laura) that affected Texas between 2017 and 2024. Tweets were collected through the X.com API. Each record contains the tweet text, place coordinates, image URL, and other metadata. Retweets and tweets without images or geo-tags were excluded. For each hurricane, the collection window runs from two weeks before to two weeks after the event. The dataset contains roughly 600,000 to 700,000 tweets. The image URLs can be used directly to download the images.
Suggested Citation:
Vysyaraju, Uday, Yalong Pi, and David Retchless. 2025. Social media data (texts, images and videos URLs) with geo-referenced locations during major hurricanes (Beryl, Hanna, Harvey, Imelda and Laura) that occurred in Texas from 2017 to 2024. Distributed by: GRIIDC, Harte Research Institute, Texas A&M University–Corpus Christi. doi:10.7266/9eb44a2x
Purpose:
These data can serve as social media (self-reported) input to deep learning models for rapid estimation of disaster damage. The input can be text, images, or either of these combined with other data such as climate and demographic information.
Data Parameters and Units:
Description (text)
lang (language identifier, text; en = English, es = Spanish, hi = Hindi, cy = Welsh, zxx = media or Twitter card only, qme = media links, qht = hashtags only)
geo (X's geo object information, text)
preview_image_url (URL to the static placeholder preview, URL)
url (URL to the media file on Twitter, URL)
media_key (unique identifier of the expanded media content, text)
text (UTF-8 text of the Tweet, text)
place_id (place ID that can be retrieved from geo/reverse_geocode, text)
Place (named location with corresponding geo coordinates, text)
id (unique ID of each tweet, text)
images (compressed RGB image)
entities (arrays for hashtags, URLs, user mentions, symbols (cashtags), media (photos, videos, GIFs), and polls)
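As an illustration of how the fields above might be consumed, the following is a minimal Python sketch. It assumes each record has already been loaded as a dictionary keyed by the field names listed above; the file format is not specified here, so loading is left out. The helper names (describe_lang, image_url, is_usable) and the sample record are hypothetical and are not part of the dataset.

```python
from typing import Optional

# Language codes exactly as listed in the parameter description.
LANG_LABELS = {
    "en": "English",
    "es": "Spanish",
    "hi": "Hindi",
    "cy": "Welsh",
    "zxx": "media or Twitter card only",
    "qme": "media links",
    "qht": "hashtags only",
}


def describe_lang(code: str) -> str:
    """Return a readable label for a lang code, flagging unlisted codes."""
    return LANG_LABELS.get(code, f"unlisted code: {code}")


def image_url(record: dict) -> Optional[str]:
    """Pick a downloadable URL: prefer 'url', fall back to 'preview_image_url'."""
    return record.get("url") or record.get("preview_image_url") or None


def is_usable(record: dict) -> bool:
    """Assumed check mirroring the dataset's stated scope: a record needs
    an image URL and some location information (geo or place_id)."""
    has_location = bool(record.get("geo") or record.get("place_id"))
    return image_url(record) is not None and has_location


# Hypothetical record for demonstration only; values are placeholders.
sample = {
    "id": "0000000000000000000",
    "text": "placeholder tweet text",
    "lang": "en",
    "place_id": "placeholder-place-id",
    "url": "https://example.com/media/placeholder.jpg",
}

label = describe_lang(sample["lang"])
usable = is_usable(sample)
```

A record that passes is_usable can then be handed to whatever download or modeling step follows; records with an unlisted lang code are flagged rather than dropped, so the decision stays with the user.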
Methods:
The researchers collected the data through the X.com API under a $5,000/month subscription. The collection code and time windows were researched and defined before data collection began in order to maximize the amount of data retrieved. The 1 million tweet limit was reached in the third week of collection, so the subscription was canceled after one month.
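The collection window of two weeks before to two weeks after each event can be sketched in Python as follows. This is not the researchers' collection code; the function name collection_window and the event dates used in the example are placeholders chosen for illustration, not dates taken from the dataset.

```python
from datetime import date, timedelta
from typing import Tuple


def collection_window(
    event_start: date, event_end: date, pad_days: int = 14
) -> Tuple[date, date]:
    """Return (window_start, window_end), padding the event by pad_days
    on each side. The default of 14 days matches the two-week padding
    described in the abstract."""
    if event_end < event_start:
        raise ValueError("event_end must not precede event_start")
    pad = timedelta(days=pad_days)
    return event_start - pad, event_end + pad


def in_window(day: date, window: Tuple[date, date]) -> bool:
    """True if day falls inside the window, endpoints included."""
    start, end = window
    return start <= day <= end


# Placeholder event dates for demonstration only.
window = collection_window(date(2017, 8, 25), date(2017, 8, 30))
```

Fixing the windows ahead of time in this way lets the total request volume be estimated against the monthly tweet limit before any quota is spent.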