5.2.1 A Note on HAMT and Time Complexity

You will see, in documentation of HAMT-based collections such as Clojure’s, claims that lookups and point updates take “effectively constant” time. You’ll also see notations like “O(log₃₂ n)”. To my surprise, I have sometimes seen these locutions even in peer-reviewed papers.

This is all marketing, not computer science. They’re trying to get you not to be put off by the fact that not only are lookups and point updates on functional types slower than with traditional mutable hash tables, but they get somewhat slower yet as the collections get large, which hash table operations do not. I agree with their goal — for the vast majority of applications, the difference is of no consequence, and you shouldn’t worry about it — but not with the way they’re trying to make the point.

For one thing, they never define “effectively”, but sometimes they say it loudly and bang the table a bit, hoping you’ll overlook the absence of a definition: “effectively constant”. For another, as I have explained, the base of the logarithm in a Big-O expression doesn’t matter, because logarithms of different bases are related by a constant factor, and time complexity abstracts away constant factors. So writing “O(log₃₂ n)” is an abuse of the notation; it’s the same as O(log n).
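The base-change identity behind this point is easy to check directly. Here is a quick sketch in Python (the variable names are mine, purely illustrative):

```python
import math

n = 1_000_000

# Log base 32 is just log base 2 scaled by a constant:
#   log32(n) = log2(n) / log2(32), and log2(32) = 5.
# Big-O discards that constant factor, so O(log32 n) is the same as O(log n).
log32_n = math.log(n, 32)
log2_n_over_5 = math.log2(n) / 5

print(log32_n, log2_n_over_5)  # the two values agree
```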

In my view, the way to make the point — that the performance costs of well-implemented functional collections are usually not worth worrying about — is twofold. First, we shouldn’t forget that time complexity is not all there is to time performance: there are also constant-factor differences, which can be quite substantial. Fully characterizing the performance of an implementation of an algorithm requires mentioning both. The mistake is to try to cram both pieces of information into the Big-O expression, which is specifically designed to isolate one of them and ignore the other.
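To make the constant-factor point concrete, here is a toy sketch with entirely made-up cost numbers (nothing here is measured from any real implementation): a “constant-time” operation with a large constant can lose to a logarithmic-time operation with a small one until the collection gets quite big.

```python
import math

# Hypothetical per-operation costs in arbitrary time units (invented for
# illustration only). Complexity class alone doesn't tell you which is faster.
def constant_op_cost(n):
    return 50                  # constant time, but a large constant factor

def log_op_cost(n):
    return 5 * math.log2(n)    # logarithmic time, small constant factor

for n in (10**3, 10**6, 10**9):
    print(n, constant_op_cost(n), round(log_op_cost(n), 1))
```

At a thousand elements the logarithmic operation is (in this made-up model) the cheaper of the two; only at larger sizes does the constant-time one pull ahead. Both the exponent of growth and the constant in front matter.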

And second, we should point out that logarithmic time isn’t actually all that different from constant time. A logarithmic-time operation takes only twice as long on a 10⁶-element collection as on a 10³-element collection, and only four times as long on a 10¹²-element collection. So yes, it does slow down a little as the collection grows, but for most use cases this isn’t a problem at all. In particular, the gap between constant time and logarithmic time is far smaller than that between logarithmic time and linear time. I’m not going to gaslight you by saying that there is “effectively” no difference between constant and logarithmic, but I will say that it is only in extreme circumstances that the difference matters.
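The growth-rate arithmetic above is easy to verify: the cost of a logarithmic-time operation scales with log n, so the ratio of costs between two collection sizes is the ratio of their logarithms (any base gives the same ratio).

```python
import math

small, medium, huge = 10**3, 10**6, 10**12

# Ratio of logarithms = ratio of costs for a logarithmic-time operation.
print(math.log2(medium) / math.log2(small))  # ≈ 2: twice as long
print(math.log2(huge) / math.log2(small))    # ≈ 4: four times as long
```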