Abstract:
We collected tweets through the X.com API using a geolocated post search restricted to a rectangular bounding box around Texas. Each record contains the tweet text, the longitude and latitude of the associated place (as a bounding box or a point), the image URL, and other metadata. Retweets and tweets without images or geo-tags were excluded. For each hurricane event in Texas since 2010, we collected data from two weeks before to two weeks after the event. The resulting dataset contains roughly 600,000 to 700,000 tweets. The image URLs can be used directly to download the images.
Suggested Citation:
Uday Vysyaraju, Yalong Pi, and David Retchless. Social media data (texts, images and videos URLs) with geo-reference during major hurricanes that occurred in Texas since 2010. Distributed by: GRIIDC, Harte Research Institute, Texas A&M University–Corpus Christi. doi:10.7266/9eb44a2x
Purpose:
These data can serve as social media (self-reported) input for deep learning models that rapidly estimate disaster damage. The input can be text or images, used alone or in combination with other metadata such as climate and demographic information.
Data Parameters and Units:
Description (text)
lang (language identifier, text)
geo (X's geo object information, text)
preview_image_url (URL to the static placeholder preview, URL)
url (URL to the media file on Twitter, URL)
media_key (unique identifier of the expanded media content, text)
text (UTF-8 text of the Tweet, text)
place_id (place IDs that can be retrieved from geo/reverse_geocode, text)
Place (named locations with corresponding geo in coordinates, text)
id (unique ID of each tweet, text)
images (compressed RGB image)
entities (arrays for hashtags, URLs, user mentions, symbols (cashtags), media (photos, videos, GIFs), and polls)
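The geo parameter above is stored as text and may describe either a point or a bounding box. The following is a minimal Python sketch of how a user might turn that text into a single longitude/latitude pair. It assumes the text is JSON, with point data under a "coordinates" key and box data under a "bbox" key ordered west, south, east, north. That layout and the function name lon_lat_from_geo are assumptions for illustration and should be checked against the actual files.

```python
import json
from typing import Optional, Tuple


def lon_lat_from_geo(geo_text: str) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) from a geo field stored as JSON text, or None.

    Assumed layout (not confirmed by the dataset description):
      point: {"coordinates": {"type": "Point", "coordinates": [lon, lat]}}
      box:   {"bbox": [west, south, east, north]}
    """
    try:
        geo = json.loads(geo_text)
    except (TypeError, ValueError):
        return None  # missing value or text that is not valid JSON
    if not isinstance(geo, dict):
        return None

    # Prefer an exact point when one is present.
    point = geo.get("coordinates")
    coords = point.get("coordinates") if isinstance(point, dict) else None
    if isinstance(coords, list) and len(coords) == 2:
        return float(coords[0]), float(coords[1])

    # Otherwise fall back to the center of the bounding box.
    bbox = geo.get("bbox")
    if isinstance(bbox, list) and len(bbox) == 4:
        west, south, east, north = bbox
        return (west + east) / 2.0, (south + north) / 2.0

    return None
```

Using the box center is a simplification: place-level boxes can span a whole city or county, so the resulting coordinate is only as precise as the place itself.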
Methods:
The researchers collected the data through the X.com API under a $5,000/month subscription. The collection code and time windows were researched and defined before data collection began in order to maximize the amount of data collected. The 1 million tweet limit was reached in the third week of data collection, so the subscription was canceled after one month.
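The time windows and exclusion rules described in this record can be sketched in Python as follows. This is an illustration under stated assumptions, not the collection code used by the researchers. The Texas bounding box values are rough approximations, the "RT @" prefix check is a heuristic stand-in for retweet detection, and all function names are hypothetical.

```python
from datetime import date, timedelta

# Approximate rectangle around Texas as (west, south, east, north) in degrees.
# Assumed values for illustration; the record does not state the exact box.
TEXAS_BBOX = (-106.65, 25.84, -93.51, 36.50)


def collection_window(event_day, days=14):
    """Return (start, end) dates spanning `days` before and after an event."""
    return event_day - timedelta(days=days), event_day + timedelta(days=days)


def in_bbox(lon, lat, bbox=TEXAS_BBOX):
    """Check whether a point falls inside the rectangle."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def keep_tweet(record):
    """Apply the stated exclusions: no retweets, image and geo-tag required."""
    # Heuristic assumption: retweets are recognized by their text prefix.
    is_retweet = (record.get("text") or "").startswith("RT @")
    has_image = bool(record.get("url") or record.get("preview_image_url"))
    has_geo = bool(record.get("geo") or record.get("place_id"))
    return not is_retweet and has_image and has_geo
```

For example, calling collection_window(date(2017, 8, 25)), where the date is used only as an illustrative event day, returns the pair August 11, 2017 and September 8, 2017.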