Twitter Poetry

This is a toy project by David Abelman, created in 2014. The algorithm pulls data from Twitter, passes it through three MapReduce stages, and outputs a rhyming, half-sensical poem composed of tweets from around the world.

if i say "you're pretty" don't tell me "no i'm not"
— @Jakeadelics November 19, 2014

i'm not being sarcastic when i say you're hot.
— @darchwjack November 14, 2014

you say i dream too big and i say you think too small
— @Shabilla_rahma November 21, 2014

hope when the moment comes you say... i, i did it all
— @sharifahmirrah November 21, 2014

The stages below briefly describe how the poems are generated. See the code.

Stage 1: Tweets collected from Twitter

The user enters a set of search terms (these can include and/or conditions, as well as words and phrases to exclude). A script then crawls Twitter according to these search terms. The matching tweets are saved, along with all their metadata, in one large text file.

Stage 2: MapReduce to select unique tweets

A set of mappers parses the text file tweet by tweet, pulling out details such as tweet text, username, post date and tweet language. Each tweet is then passed through a filter to establish whether it is valid for further analysis. Factors considered here are whether the tweet is within a specified length range (number of words), whether the tweet starts with a specified word (optional), whether the tweet contains certain required terms, whether the tweet contains certain banned terms, and so on.
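The filter described above could look roughly like the following sketch. It is not the project's actual code: the function name `is_valid_tweet`, its parameter names and the default length range are assumptions made for illustration, and terms are matched as whole, whitespace-separated words.

```python
def is_valid_tweet(text, min_words=4, max_words=14, first_word=None,
                   required_terms=(), banned_terms=()):
    """Return True if the tweet text passes every configured check.

    All names and default values here are hypothetical.
    """
    words = text.lower().split()

    # Length check: the number of words must fall inside the allowed range.
    if not (min_words <= len(words) <= max_words):
        return False

    # Optional check: the tweet must start with a given word.
    if first_word is not None and (not words or words[0] != first_word.lower()):
        return False

    # Every required term must appear as a word in the tweet.
    if any(term.lower() not in words for term in required_terms):
        return False

    # No banned term may appear as a word in the tweet.
    if any(term.lower() in words for term in banned_terms):
        return False

    return True
```

A mapper would call this once per parsed tweet and emit only the tweets for which it returns True.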

The tweets that pass the filter are sorted by tweet content and sent to a set of reducers, so that all tweets with identical text are grouped together. The reducers then run a 'minimum' function on the tweet date within each group, selecting the earliest tweet. Thus the content is attributed to its first author, and the duplicated instances of the tweet are not output from the reducer.
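As a minimal sketch of this deduplication step, assuming each record is a `(text, posted, username)` tuple (a layout chosen here for illustration, not taken from the project), the sort-then-reduce logic can be imitated in plain Python:

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def first_authors(records):
    """Yield the earliest record for each distinct tweet text.

    records: iterable of (text, posted, username) tuples (hypothetical layout).
    """
    # Sorting by text stands in for the MapReduce shuffle/sort phase.
    ordered = sorted(records, key=itemgetter(0))
    for _text, group in groupby(ordered, key=itemgetter(0)):
        # The 'minimum' on the post date picks the first author.
        yield min(group, key=itemgetter(1))
```

For example, with hypothetical usernames, two copies of the same text posted on different days collapse to the earlier one.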

Stage 3: MapReduce to create groups of rhyming Tweets

The next mapper takes each unique tweet, and extracts the final word. A 'rhyme code' is looked up for this word, and output from the mapper along with the tweet. Two rhyming words should have the same rhyme code.
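The text does not say how the rhyme code is derived. One common approach, assumed here purely for illustration, is to take a word's phonemes from the last primary-stressed vowel onwards. The tiny `PRONUNCIATIONS` table below is a hand-written stand-in for a real pronunciation dictionary, using ARPAbet-style symbols in which a trailing `1` marks primary stress.

```python
import re

# Hypothetical miniature pronunciation table (ARPAbet-style phonemes).
PRONUNCIATIONS = {
    "not": ["N", "AA1", "T"],
    "hot": ["HH", "AA1", "T"],
    "small": ["S", "M", "AO1", "L"],
    "all": ["AO1", "L"],
}


def last_word(text):
    """Return the final word of a tweet, ignoring punctuation, or None."""
    words = re.findall(r"[a-z']+", text.lower())
    return words[-1] if words else None


def rhyme_code(word, pronunciations=PRONUNCIATIONS):
    """Return the phonemes from the last primary-stressed vowel onwards."""
    phones = pronunciations.get(word)
    if phones is None:
        return None  # unknown word: no rhyme code available
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return "-".join(phones[i:])
    return None


def map_tweet(text):
    """Emit (rhyme_code, text), or None when no code can be found."""
    word = last_word(text)
    code = rhyme_code(word) if word else None
    return (code, text) if code else None
```

Under this assumption 'not' and 'hot' share one code and 'small' and 'all' share another, which is the property the mapper relies on.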

The tweets are sorted by rhyme code and sent to the reducers, ensuring all rhyming lines are now grouped together. The reducers output 'sets' of rhyming tweets, all grouped by rhyme code.
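The grouping performed by these reducers can be sketched as follows. The function name `group_by_rhyme` is hypothetical, and dropping sets with fewer than two tweets is an assumption (a single tweet cannot form a couplet), not something the description states.

```python
from collections import defaultdict


def group_by_rhyme(pairs):
    """Collect (rhyme_code, text) pairs into sets of rhyming tweets."""
    groups = defaultdict(list)
    for code, text in pairs:
        groups[code].append(text)
    # Assumption: keep only sets that can yield at least one pair.
    return {code: texts for code, texts in groups.items() if len(texts) >= 2}
```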

Stage 4: MapReduce to calculate optimal rhyming couplets and order

Each 'set' of rhyming tweets is parsed by a final mapper. The mapper loops through all combinations of tweet pairs within the set, trying to find the optimum pair of rhyming tweets. Certain conditions filter out some pairs (for example, if the last word is the same for both tweets in the pair, or if the last words sound the same, such as 'their' and 'there'). Provided the pair makes it through these conditions, three things are scored: the scansion (the CMU Pronouncing Dictionary allows us to analyse the stress and metre of the sentence), the semantic similarity of the lines (using the overlap of non-common words), and the number of words recognised as English (to weight against typo-ridden tweets). These scores are combined for each tweet pair, and the pairs are ranked by total score within the rhyme set. The top-scoring pair of tweets for each rhyme set is sent to the reducer, and will ultimately form a couplet in the output poem.
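The pair search can be sketched as below. This is a simplified illustration rather than the project's scoring: it implements only the identical-last-word filter and a word-overlap similarity score, and leaves out the homophone filter, the scansion score and the English-word score. `COMMON_WORDS` and all function names are assumptions.

```python
from itertools import combinations

# Hypothetical list of common words ignored when measuring overlap.
COMMON_WORDS = {"a", "and", "i", "it", "the", "to", "you"}
PUNCTUATION = ".,!?\"'"


def last_word(text):
    """Final whitespace-separated word, lowercased, without punctuation."""
    return text.lower().split()[-1].strip(PUNCTUATION)


def content_words(text):
    """Set of non-common words in the text."""
    words = {w.strip(PUNCTUATION) for w in text.lower().split()}
    return words - COMMON_WORDS - {""}


def similarity(a, b):
    """Overlap of non-common words: shared words divided by all words."""
    wa, wb = content_words(a), content_words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def best_pair(tweets, score=similarity):
    """Return (score, first, second) for the top pair, or None."""
    best = None
    for a, b in combinations(tweets, 2):
        if last_word(a) == last_word(b):
            continue  # a word does not count as rhyming with itself
        s = score(a, b)
        if best is None or s > best[0]:
            best = (s, a, b)
    return best
```

Passing a different `score` function is one way the remaining scores could be combined into a single total before ranking.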

The output pairs of tweets are sent to one final reducer, along with the score calculated for each pair. The pairs are sorted by score, so the highest-scoring pairs appear first in the poem. The reducer outputs lines of the poem until either the score falls below a minimum threshold (i.e. the scansion drops to a less than satisfactory level) or a maximum number of output lines is reached (i.e. the poem is getting too long).
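The two stopping conditions can be illustrated with a short sketch. The function name and the default values for `min_score` and `max_lines` are made up for the example.

```python
def assemble_poem(scored_pairs, min_score=0.5, max_lines=8):
    """Build the poem from (score, first, second) tuples.

    Pairs are emitted best-first until the score falls below min_score
    or adding another couplet would exceed max_lines.
    """
    lines = []
    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    for score, first, second in ranked:
        if score < min_score or len(lines) + 2 > max_lines:
            break
        lines.extend([first, second])
    return lines
```

Because the pairs are ranked before output, stopping at the first pair below the threshold is enough: every later pair scores at least as low.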

Warning: content of poems is not my own, does not necessarily represent my opinion, and may contain offensive language.

