I've had a lot of requests to explain how the haiku generator works. A lot of people wonder why the haiku seem so pertinent to their own lives, contain lines from poems they've written, and so on. The answer is that the output is based entirely on your own journal. The generator has no knowledge of English (or any other language) at all!
Yes, the output may not be strictly 5-7-5, although there is code that tries to count syllables correctly. Counting syllables accurately in English (my primary language) is hard! Some syllable information is taken from GCIDE, a free dictionary.
Well, make more of them in your blog! The meme only uses words you have written!
Then, in all likelihood at some point your LiveJournal was rude or depressing :-) I wouldn't worry about it, it's just a silly meme.
The script has four steps:
The Markov chain code builds an index, so that for any two words you've used one after the other, it knows the probability of each word that might come third. For example, if you'd used "I am", then the probability table for the next word might look like:
| Word | Probability |
|---|---|
| angry | 30% |
| happy | 50% |
| silly | 10% |
| smelly | 10% |
The program then rolls a virtual die; 30% of the time it would use the word "angry" in the haiku, 50% of the time "happy", 10% "smelly", and so on.
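As a rough illustration of the index and the die roll described above, here is a minimal Python sketch. It is not the site's actual code: the names `build_index` and `next_word` and the toy journal are made up for this example, and the real generator may store its probabilities differently.

```python
import random
from collections import Counter, defaultdict

def build_index(words):
    """Map each pair of consecutive words to a count of the words that followed it."""
    index = defaultdict(Counter)
    for first, second, third in zip(words, words[1:], words[2:]):
        index[(first, second)][third] += 1
    return index

def next_word(index, pair, rng=random):
    """Roll the 'virtual die': pick a follower in proportion to how often it was seen."""
    followers = index.get(pair)
    if not followers:
        return None  # this pair was never followed by anything
    choices = list(followers)
    weights = [followers[word] for word in choices]
    return rng.choices(choices, weights=weights, k=1)[0]

# Hypothetical toy journal matching the table above: "I am" is followed by
# "angry" 3 times, "happy" 5 times, "silly" once and "smelly" once.
journal = ("I am angry . " * 3 + "I am happy . " * 5
           + "I am silly . I am smelly .").split()

index = build_index(journal)
counts = index[("I", "am")]
total = sum(counts.values())
probabilities = {word: n / total for word, n in counts.items()}
```

Here `probabilities` holds the same kind of table as above, and calling `next_word(index, ("I", "am"))` repeatedly would return "happy" about half the time.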
So the program picks the first two words of the haiku from words you have used; it then chooses each following word by rolling this virtual die. For each word, it uses a simple algorithm to estimate the number of syllables. If the number of syllables on a generated line doesn't match the requested form, it drops the line and tries again.
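The generate-and-retry loop can be sketched like this. Everything here is an assumption made for illustration: the vowel-group rule in `estimate_syllables` is a common rough heuristic rather than the site's algorithm (which also draws on GCIDE), `generate_line` starts every line from a fresh random pair for simplicity, and the `max_attempts` limit is invented.

```python
import random
import re
from collections import defaultdict

def estimate_syllables(word):
    """Crude estimate: count runs of vowels, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def build_index(words):
    """Map each word pair to the list of words seen after it (repeats act as weights)."""
    index = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        index[(a, b)].append(c)
    return index

def generate_line(index, target, rng, max_attempts=1000):
    """Return a list of words totalling exactly `target` syllables, or None."""
    starts = list(index)
    for _ in range(max_attempts):
        pair = rng.choice(starts)
        line = list(pair)
        syllables = sum(estimate_syllables(w) for w in line)
        while syllables < target and pair in index:
            word = rng.choice(index[pair])
            line.append(word)
            syllables += estimate_syllables(word)
            pair = (pair[1], word)
        if syllables == target:
            return line
        # Overshot the target or hit a dead end: drop the line and try again.
    return None

# Hypothetical journal text, for illustration only.
journal = "the cat sat on the mat and the dog ran off to the park".split()
index = build_index(journal)
rng = random.Random(1)
haiku = [generate_line(index, target, rng) for target in (5, 7, 5)]
```

Because the syllable counts are only estimates, a line that this code accepts as five syllables may still read as four or six, which is one reason the output is not always strictly 5-7-5.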
If you haven't got many entries, there will often be only one possibility for the generator to choose from; for example, the words "I met" might only have been followed by "Sally" in your journal. This is why it sometimes parrots back entire sentences from your journal. If you come back later after writing more posts, they will be included in your new haiku.
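To see why a short journal gets parroted back, you can measure how many word pairs leave the generator no choice at all. This is only an illustrative sketch; `follower_sets`, `forced_fraction` and both sample journals are hypothetical.

```python
from collections import defaultdict

def follower_sets(words):
    """Map each word pair to the set of distinct words that ever followed it."""
    followers = defaultdict(set)
    for a, b, c in zip(words, words[1:], words[2:]):
        followers[(a, b)].add(c)
    return followers

def forced_fraction(words):
    """Fraction of word pairs that were only ever followed by one distinct word."""
    followers = follower_sets(words)
    if not followers:
        return 0.0
    forced = sum(1 for options in followers.values() if len(options) == 1)
    return forced / len(followers)

short_journal = "I met Sally at the park".split()
longer_journal = ("I met Sally at the park . I met Tom at the shop . "
                  "I met Sally at the shop").split()

short_forced = forced_fraction(short_journal)
longer_forced = forced_fraction(longer_journal)
```

In the short journal every pair has exactly one follower, so the chain can only replay the original sentence. In the longer one, "I met" can lead to either "Sally" or "Tom", and the output starts to mix sentences.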
Hope this helps; if you have any questions feel free to post in my LJ. I'll try and get back to you :-)
I wanted a cool signature for my email, so I wrote a haiku generator and fed it the King James Bible from Project Gutenberg. This was kind of fun, and then I came up with the idea of running it over people's LiveJournals. I then procrastinated about it for ages, and eventually finished it. Woo :-)
The most important change to the Haiku meme is thanks to Tom Lynch. Many thanks to him for pointing me in the direction of entropic chunking!
Much open source software lets this site function; it is running via web.py on lighttpd, and of course is written in the Python programming language. Feedparser provides an invaluable interface which allows the site to grab blog feeds.