Baby Names and Markov Chains

Recently I had the pleasure of reading Shape by Prof. Jordan Ellenberg. In Chapter 4, he describes Markov's use of his chains to model alliteration in Russian poetry.

Markov's son, also named Andrey Markov, was one of the seminal contributors to early constructive mathematics and recursion theory, the topics of my own research. Markov chains themselves are very topical given today's excitement about "generative AI" and chatbots.

Ellenberg describes an amusing use of Markov chains to generate baby names: compute the distribution of n-grams in a corpus of names, then take a random walk according to that distribution.

I couldn't help but try this out for myself.

I wrote a small program in Go to generate baby names using Markov chains. I grabbed a corpus of the top 1000 baby names (boys and girls) from 2023. Using the distribution of 4-grams produced good results.

Go is enjoyable to use for this kind of work because of go generate, go:embed, and TinyGo, a wonderful compiler that produces very small binaries for the WebAssembly target.

Go generate some baby names here. The program runs entirely in your browser and generates names in chunks of 100 as you scroll. The algorithm constructs many familiar names, like Maddison and Sawyer, and some original ones, like Skylerine and Deanora.

I was lucky to have Jordan as a professor at Wisconsin, and he was gracious enough to serve on my thesis committee. Listening to him (yeah, I bought the audiobook too) took me right back to Madison.