5.9 Transients

I’ve shamelessly copied the idea of transients from Rich Hickey’s Clojure. The basic idea is that when we’re originally populating a collection, we don’t need the updates to be implemented functionally, because we know we’re holding the only reference to the collection; the updates can be performed by mutation much of the time, as long as it’s done in such a way that the resulting tree still has the correct structure. Once it’s fully populated, we convert it back to a persistent collection (the usual kind in FSet), and then we can pass it around without worrying about it being further mutated underneath us.

Transients are available only for the CHAMP collections: ch-set, ch-map, ch-bag, ch-2-relation, ch-replay-set, and ch-replay-map. Transients have a very narrow API, with only a few operations supported: adding a single element or pair, removing a single element or map key, retrieving a single element, testing membership, and the like. Iterating over a transient is unsupported or at least discouraged (more on this below). Operations that combine two collections, such as union, are definitely not supported. You see that transients are useful primarily for the specific scenario of populating a new collection, and little else — but of course, populating a new collection is very common.

Transients are mutable objects. Instead of the usual FSet functional update operations — with, less, etc. — you update them using operations such as include!, exclude!, and (setf lookup!). include! and exclude! correspond to with and less, or perhaps more clearly, to includef and excludef, which expand to with and less respectively. Be sure you understand the semantic differences:

FSET2> (defparameter *psa* (set))
*PSA*
FSET2> (defparameter *psb* *psa*)
*PSB*
FSET2> (includef *psa* 'foo)
##{ FOO }
FSET2> *psb*
##{ }                                              ; still empty
FSET2> (defparameter *tsa* (make-transient *psa*))
*TSA*
FSET2> (defparameter *tsb* *tsa*)
*TSB*
FSET2> (include! *tsa* 'bar)
#<TRANSIENT-CH-SET 334>                            ; transients don't print their contents
FSET2> (contains? *tsb* 'bar)
T
BAR
FSET2> *psa*
##{ FOO }                                          ; original persistent is unchanged

Since transients are mutable objects, they are therefore distinguished by identity, following FSet recommended practice. A transient is only ever equal? to itself, independent of its contents. Transients are assigned a unique serial number when created, which is used for printing, as in the example above.

When accessing a transient map, you can use lookup or @ as usual, but to add or update a key/value pair, you must use setf with lookup!, or else 3-argument include!:

FSET2> (defparameter *tm* (make-transient (map)))
*TM*
FSET2> (setf (lookup! *tm* 'foo) 3)
3
FSET2> (@ *tm* 'foo)
3
T
FOO
FSET2> (lookup! *tm* 'foo)
3
T
FOO
FSET2> (include! *tm* 'bar 7)
#<TRANSIENT-CH-MAP 335>
FSET2> (@ *tm* 'bar)
7
T
BAR

As you see, you can use lookup! for accessing, too, if you find it more consistent, but that’s a little odd since the transient map won’t be modified in that case. It’s your choice.

If you forget and try to use with, less, includef, excludef, (setf (@ ...) ...) or (setf (lookup ...) ...) with a transient, you will get a runtime error.

You might have noticed, in the above examples, that to create a transient collection, you must first create a persistent one. This is partly to make it easy to tell FSet what type of transient to create, but it’s also because the persistent collection doesn’t actually have to be empty; if it isn’t, the transient will be initialized with its contents.

This doesn’t, however, mean that you can make any persistent update go faster by making a transient from the persistent, applying a single update, and making it back into a persistent; that would actually be a little slower than doing a normal functional update. The performance benefits of transients accrue when nodes in the internal tree are updated multiple times via a single transient. So you won’t see much benefit from making an already-populated persistent into a transient unless the number of updates you do to the transient before converting it back is at least a good fraction, like maybe half, of its initial size. There’s one exception to this, though: If you repeatedly modify the value for a given map key, or the multiplicity of a bag element, you will see significant performance gains using a transient.

The function make-persistent, that extracts a persistent collection from a transient one, has two modes, controlled by its copy? keyword parameter. If copy? is false — the default case — then make-persistent runs in O(1) time; but it resets the transient, so that if you continue to update it afterwards, it is as if the transient was just created: you’ll lose the performance benefit.

However, if copy? is true, then make-persistent copies out the contents of the transient, instead of reusing its internal tree, and leaves it unchanged; this takes O(n) time (though it’s a pretty fast O(n)), but you can then continue to update the transient, and it will be just as fast as it was. Copying also does compaction, squeezing out the extra space allocated in the transient nodes to allow for updates. So, in the common case in which you’re discarding the transient after calling make-persistent on it, let copy? default to false — unless you’ve just created a very large collection that you expect to keep in memory for a long time, such that the compaction might be desirable; in that case, consider passing true for copy?.

If you want to iterate over the current state of a transient collection, in most cases you will need to just make a persistent from it and iterate over that, in any of the several ways FSet gives you to iterate. As just mentioned, (make-persistent ... :copy? t) does that without disturbing the transient, and the fact that it takes O(n) time is likely of little consequence if you’re about to perform another O(n) operation on the collection anyway by iterating over it. But for the replay collections, transient-ch-replay-set and transient-ch-replay-map, you don’t need to make a copy, as these have an at-index operation to retrieve an element or pair by its index; this can be used with dotimes or the like to iterate over the collection. However, there’s a tradeoff; this is an O(log n) operation. (The reason the replay collections have this operation when the others don’t is that their ordering is well-defined, being the order in which elements or keys were first added.)

By default, transients are unsynchronized and are not thread-safe. However, make-transient has keyword parameter synchronized?, which if true, adds locking to serialize all operations on the transient.