Dynamic Tuples (Modern Common Lisp with FSet)

5.10 Dynamic Tuples

Tuples, like maps, are collections of key/value pairs updated functionally. But where maps are designed for situations where the key values are an open class — they’re created at runtime, there can be arbitrarily many of them, and you don’t know in advance what they’re going to be — tuples are designed for the case where the keys are a closed class: you know what they are at compile time. Also, in a map, the range values are all logically of the same type, while in a tuple, each key has its own value type.

Perhaps the best analogy is that a tuple is like a row in a relational table. Each key corresponds to a relational column, and of course the columns can contain different types. You could also think of tuples as like objects where the keys correspond to slots, but at least for now, tuples have no notion of class; there’s no way to write methods that dispatch on a type field, except manually using good old ecase or the like. So I think the relational row — a heterogeneous collection of keyed values without associated behaviors — is the clearest analog.
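
The manual dispatch mentioned above might look like the following sketch. It uses a plain property list as a stand-in for a tuple so that it runs without FSet; the :kind key, the shape names, and the function area are all hypothetical names chosen for illustration, not part of FSet.

```lisp
;; A plist stands in for a tuple here; :kind plays the role of a type field.
(defun area (shape)
  "Dispatch by hand on the :kind field of SHAPE, since there is no class to dispatch on."
  (ecase (getf shape :kind)
    (:circle (* pi (expt (getf shape :radius) 2)))
    (:rectangle (* (getf shape :width) (getf shape :height)))))

(area '(:kind :rectangle :width 3 :height 4))   ; => 12
```

Because ecase signals an error when no clause matches, a row with an unexpected :kind fails loudly instead of silently returning nil.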

Since Lisp is dynamically typed, you can, of course, use a map for this purpose anyway; in a statically typed language you’d have to give all the column types some common supertype, but Lisp spares you that ceremony. So a map is a viable choice, even when the values are of different types. Nonetheless, there are three reasons you might want to use a tuple rather than a map:

1. It’s better FSet style, as it communicates immediately to the reader of your code that the keys are known at compile time and the values are (probably) of different types.
2. FSet allows you to declare the value type of a key, and will enforce it for you. Keys can also have individual defaults.
3. In a case where you have many instances of a map all with the same set of keys, you would be paying a space cost that could get to be significant. For instance, a CHAMP map with four key/value pairs would typically take 14 words (112 8-bit bytes, on a 64-bit machine), while a four-key dynamic tuple brings that down to 9 words, about 36% smaller. Given millions of instances, that could start to matter.
(There’s a caveat. Dynamic tuples maintain a descriptor for every set of keys ever used in a single tuple over the lifetime of the Lisp session. The assumption is that code that builds tuples will tend to add the keys in the same order every time, or at worst, in one of a small set of possible orders; so the space used by the descriptors will be amortized over many instances. This is a fairly reasonable assumption in light of the intended use case, where they’re being built by constructor-like functions in the code, but if they’re being built in some more data-dependent way, it might not pan out.)
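
The space figures in the third reason can be checked with a little arithmetic. The sketch below uses only the word counts quoted in the text; the variable names, the function bytes-saved, and the one-million instance count are illustrative assumptions, and the calculation ignores the shared descriptor space described in the caveat.

```lisp
(defparameter *bytes-per-word* 8)   ; 64-bit machine
(defparameter *map-words* 14)       ; four-pair CHAMP map, figure from the text
(defparameter *tuple-words* 9)      ; four-key dynamic tuple, figure from the text

(defun bytes-saved (instances)
  "Bytes saved by storing INSTANCES four-key tuples instead of four-pair maps."
  (* instances (- *map-words* *tuple-words*) *bytes-per-word*))

(* *map-words* *bytes-per-word*)                      ; => 112 bytes per map
(* *tuple-words* *bytes-per-word*)                    ; => 72 bytes per tuple
(round (* 100 (- 1 (/ *tuple-words* *map-words*))))   ; => 36 (percent smaller)
(bytes-saved 1000000)                                 ; => 40000000, about 40 MB
```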

Dynamic tuples are implemented quite unlike any other FSet type. Each key is assigned a number. Each descriptor contains a vector of pairs of integers, implemented as bytes (bitfields) of a fixnum; one integer is the key number, and the other is the index of the value in tuples using this descriptor. Lookup is done by linear search through the vector for a pair whose key number matches the key being looked for. For small tuples, cache effects make this quite fast. For larger ones, FSet does something clever: it dynamically reorders the descriptor vector so that frequently-used keys tend to migrate toward the front. If the frequency of access of the different keys follows something like a power-law distribution, with some keys being accessed much more frequently than others (not an unreasonable assumption), then searches that have to scan past the first few pairs will be relatively rare.
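
To make the mechanism concrete, here is a minimal sketch in plain Common Lisp; it is not FSet's actual code. The 16-bit field widths, the function names, and the reordering policy (swap a found pair one step toward the front) are all assumptions made for illustration; the text says only that frequently used keys tend to migrate forward, not which policy achieves that.

```lisp
;; Field widths are assumptions for this sketch, not FSet's actual layout.
(defun make-pair (key-number value-index)
  "Pack KEY-NUMBER into the low 16 bits and VALUE-INDEX into the next 16."
  (dpb value-index (byte 16 16) key-number))

(defun pair-key (pair) (ldb (byte 16 0) pair))
(defun pair-index (pair) (ldb (byte 16 16) pair))

(defun tuple-lookup (pairs values key-number)
  "Linear search of the descriptor vector PAIRS for KEY-NUMBER.
On a hit, swap the pair one step toward the front and return the value
from VALUES; return NIL when the key is absent."
  (let ((pos (position key-number pairs :key #'pair-key)))
    (when pos
      (let ((index (pair-index (aref pairs pos))))
        (when (plusp pos)
          (rotatef (aref pairs pos) (aref pairs (1- pos))))
        (svref values index)))))

;; One descriptor (keys numbered 7, 3, 9) and one tuple's value vector.
(defparameter *pairs*
  (vector (make-pair 7 0) (make-pair 3 1) (make-pair 9 2)))
(defparameter *values* (vector "Ada" 36 :engineer))

(tuple-lookup *pairs* *values* 9)   ; => :ENGINEER
(map 'list #'pair-key *pairs*)      ; => (7 9 3)
```

Note that reordering touches only the descriptor. Because each pair carries its own value index, the value vectors of the tuples sharing the descriptor never need to be rearranged.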

Also, even though the linear search means that tuple operations tend to have O(n) time complexity in the size of the tuple, this is much less likely to be an issue than it may initially sound. Where the sizes of maps are often directly related to the size of the input that a program is processing, tuple sizes are practically always independent of the size of the input; a larger input will typically cause more tuple instances to be consed, but the size of each one won’t change. (Do you add more slots to your CLOS objects as the input gets large? That would be very unusual.) So the time complexity of your program as a whole won’t be affected.