I was speaking to someone the other day about test-driven development, and he used a word that was unfamiliar to me: “kata.” I had done TDD as part of my software engineering class (writing unit tests first, and then writing code to pass those tests), but had never heard of kata before.
After clarifying how to spell it, I googled the term and found http://codekata.com/. The kittens distracted me at first, but then I found some interesting discussion about deliberately practicing coding as a way to hone one’s craftsmanship.
I do view programming as a sort of craft, somewhere between an art and a science, so I found this intriguing. I decided to try my hand at one.
I scrolled a little way down the list and picked one that sounded interesting: Kata05, Bloom Filters.
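The purpose of this kata is to build a sort of “spellchecker” that can quickly and memory-efficiently check whether a word is valid. Each dictionary word is run through several hash functions, and each hash sets one bit in a shared bit array. A word is then considered valid only if every bit that its hashes point to is set.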
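To make the idea concrete, here is a minimal Python sketch of a Bloom filter. It is an illustration, not the code from this post: the class name BloomFilter, its add and membership methods, and the choice to derive the k bit positions by hashing the word with SHA-256 under k different numeric prefixes are all assumptions made for this example.

```python
import hashlib
from collections.abc import Iterator


class BloomFilter:
    """Minimal Bloom filter sketch: a bit array plus k hash functions.

    Assumption: the k bit positions come from hashing "<i>:<word>" with
    SHA-256 for i = 0..k-1. This is one simple choice, not necessarily
    the one the kata or the post used.
    """

    def __init__(self, size_bits: int, num_hashes: int) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # Packed bit array; every bit starts at 0.
        self.bits = bytearray((size_bits + 7) // 8)

    def _indices(self, word: str) -> Iterator[int]:
        # Yield one bit position per hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, word: str) -> None:
        # Set every bit this word hashes to.
        for idx in self._indices(word):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, word: str) -> bool:
        # A single 0 bit proves the word was never added; all 1 bits only
        # means "probably present", because other words may have set them.
        return all(
            self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indices(word)
        )


words = ["apple", "banana", "cherry"]
bf = BloomFilter(size_bits=1024, num_hashes=3)
for w in words:
    bf.add(w)

print(all(w in bf for w in words))  # True: added words are never missed
print("zzzzzz" in bf)  # usually False, but a false positive is possible
```

The asymmetry in `__contains__` is the whole point of the structure: a lookup can wrongly say “yes”, but it can never wrongly say “no”.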
I started by deciding on an interface and writing some basic tests to check valid and invalid words, then set to work on solving the problem. After a little bit of time (not really free of distractions, unfortunately), I had something that I thought would work. But it didn’t pass all my tests – it said that all the invalid words were valid!
Ah, but that’s one of the tricky things about this problem: a Bloom filter guarantees that every valid word will be reported as present (no false negatives), but it can also produce false positives. That’s what I was running into. My table was essentially full, so every word, valid or not, found all of its bits already set. In hindsight this makes sense: I was computing 8 hashes per word, I had only 65,536 slots in my table, and there were at least 75,000 words, so at least 600,000 bits were being set in a table of 65,536. Of course there were collisions!
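The saturation can be quantified with the standard approximation for a Bloom filter’s false-positive rate, p ≈ (1 - e^(-kn/m))^k, where m is the number of bits, k the number of hashes per word, and n the number of words. The short sketch below plugs in the numbers above; the function name false_positive_rate is hypothetical.

```python
import math


def false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Standard approximation of a Bloom filter's false-positive probability.

    After inserting n items with k hashes each into m bits, a given bit is
    still 0 with probability about exp(-k*n/m). A lookup for an absent word
    is a false positive when all k of its bits happen to be 1.
    """
    fraction_zero = math.exp(-k_hashes * n_items / m_bits)
    return (1 - fraction_zero) ** k_hashes


# Numbers from the post: 65,536 slots, 8 hashes per word, ~75,000 words.
rate = false_positive_rate(65_536, 8, 75_000)
print(f"{rate:.4f}")  # about 0.999: almost every absent word looks present
```

With kn/m above 9, only about one bit in ten thousand is expected to remain 0, which matches the behavior of every invalid word passing.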
I decided to test with a much smaller dictionary, and this time all the tests passed. It’s a satisfying feeling to run a battery of unit tests and see every one report OK. Once I made the hash table larger, the original word list passed everything as well.
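How much larger does the table need to be? A common rule of thumb, assumed here rather than taken from the kata, is m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions for a target false-positive rate p. The hypothetical helper bloom_parameters below applies it.

```python
import math


def bloom_parameters(n_items: int, target_fp_rate: float) -> tuple[int, int]:
    """Suggest (m_bits, k_hashes) for n items at a target false-positive rate.

    Uses the textbook sizing formulas m = -n ln(p) / (ln 2)^2 and
    k = (m / n) ln 2, rounding k to the nearest whole hash count.
    """
    m_bits = math.ceil(-n_items * math.log(target_fp_rate) / math.log(2) ** 2)
    k_hashes = max(1, round(m_bits / n_items * math.log(2)))
    return m_bits, k_hashes


m_bits, k_hashes = bloom_parameters(75_000, 0.01)
print(m_bits, k_hashes)
```

For 75,000 words and a 1% target this suggests roughly 719,000 bits (about 90 KB) and 7 hashes, i.e. about eleven times the 65,536 slots of the first attempt.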
Some people recommend doing the same kata again and again. John Sonmez makes some good points about why repeating kata isn’t particularly helpful for honing one’s craftsmanship, and I think I agree with him: we need to keep pushing ourselves with new challenges to get better. That said, kata done under artificial restrictions, such as trying novel techniques or using a very different style of programming language, could still be beneficial.
I’ll at least look at doing more in the future!