Python script to download public Tweets from a given Twitter account into a format suitable for AI text generation.
You can view examples of AI-generated tweets from datasets retrieved with this tool in the
Inspired by popular demand due to the success of @dril_gpt2.
First, install the Python script dependencies:
pip3 install twint==2.1.4 fire tqdm
Then download the download_tweets.py script from this repo.
The script is used via a command line interface. After cd-ing into the directory where the script is stored in a terminal, run it with a Twitter username as the argument. For example, to download all tweets (sans retweets/replies/quote tweets) from Twitter user @dril, run:
python3 download_tweets.py dril
The script can also download tweets from multiple usernames at one time. To do so, first create a text file (.txt) with the list of usernames. Then, run the script, passing the file name in place of a single username.
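A minimal sketch of preparing that file from Python. It assumes the script expects one username per line without a leading @; the README does not spell out the exact format, and the file name usernames.txt is hypothetical.

```python
from pathlib import Path

# Hypothetical accounts to download; a leading "@" is tolerated and stripped below.
usernames = ["@dril", "minimaxir"]

# Assumed format: one bare username per line.
lines = [name.lstrip("@") for name in usernames]
Path("usernames.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")

# Read the file back to confirm what the script would receive.
print(Path("usernames.txt").read_text(encoding="utf-8").splitlines())
```

The resulting file can then be passed to download_tweets.py instead of a single username.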
The tweets will be downloaded to a single-column CSV titled
The parameters you can pass to the command line interface (positionally or explicitly) are:
@user tags in the tweet text [default: False]
#hashtags in the tweet text [default: False]
gpt-2-simple has a special case for single-column CSVs, where it will automatically process the text for best training and generation (i.e. by adding <|startoftext|> and <|endoftext|> to each tweet, allowing independent generation of tweets).
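The wrapping idea can be sketched in plain Python as below. This is an illustration, not gpt-2-simple's own code: the helper name wrap_tweets and the header name "tweet" are hypothetical, and the sketch assumes the CSV's first row is a header, which it skips.

```python
import csv
import io

START = "<|startoftext|>"
END = "<|endoftext|>"

def wrap_tweets(csv_file) -> str:
    """Wrap each row of a single-column CSV in start/end tokens (illustrative only)."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row (assumed to be present)
    # One tweet per line, delimited so each can be generated independently.
    return "".join(f"{START}{row[0]}{END}\n" for row in reader if row)

# Tiny in-memory stand-in for a downloaded CSV; quoting keeps the comma inside one field.
sample = io.StringIO('tweet\nfirst example tweet\n"second, with a comma"\n')
print(wrap_tweets(sample), end="")
```

Because each tweet is bracketed by its own start and end tokens, the model learns where a tweet begins and ends rather than treating the dataset as one continuous text.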
You can use this Colaboratory notebook (optimized from the original notebook for this use case) to train the model on your downloaded tweets and generate massive amounts of tweets from it. Note that without a lot of data, the model might easily overfit; you may want to train for fewer steps.
When generating, you'll always need to include certain parameters to decode the tweets, e.g.:
gpt2.generate(sess, length=200, temperature=0.7, prefix='<|startoftext|>', truncate='<|endoftext|>', include_prefix=False)
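With these settings, each generated sample is cut down to the text between the start token and the first end token. If you instead have raw generated text containing several tweets, a similar extraction can be sketched with a regular expression. The helper split_tweets is hypothetical and not part of gpt-2-simple.

```python
import re

START = "<|startoftext|>"
END = "<|endoftext|>"

def split_tweets(raw_text: str) -> list[str]:
    """Return every complete tweet found between START and END tokens (hypothetical helper)."""
    pattern = f"{re.escape(START)}(.*?){re.escape(END)}"
    # re.S lets a tweet span multiple lines; the non-greedy .*? stops at the first END.
    return [t.strip() for t in re.findall(pattern, raw_text, flags=re.S)]

raw = f"{START}first tweet{END}\n{START}second\ntweet{END}\n{START}cut off mid-generation"
print(split_tweets(raw))
```

A trailing tweet that never reached an end token is dropped, which mirrors the intent of passing truncate: only complete tweets are kept.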
Max Woolf (@minimaxir)
Max's open-source projects are supported by his Patreon and GitHub Sponsors. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.
This repo has no affiliation with Twitter Inc.