I’ve shamelessly copied the idea of transients from Rich Hickey’s Clojure. The basic idea is that when we’re originally populating a collection, we don’t need the updates to be implemented functionally, because we know we’re holding the only reference to the collection; the updates can be performed by mutation much of the time, as long as it’s done in such a way that the resulting tree still has the correct structure. Once it’s fully populated, we convert it back to a persistent collection (the usual kind in FSet), and then we can pass it around without worrying about it being further mutated underneath us.
Transients are available only for the CHAMP collections: ch-set, ch-map,
ch-bag, ch-2-relation, ch-replay-set, and ch-replay-map. Transients
have a very narrow API, with only a few operations supported: adding a single element or pair,
removing a single element or map key, retrieving a single element, testing membership, and the like.
Iterating over a transient is unsupported or at least discouraged (more on this below). Operations
that combine two collections, such as union, are definitely not supported. You see that
transients are useful primarily for the specific scenario of populating a new collection, and little
else — but of course, populating a new collection is very common.
Transients are mutable objects. Instead of the usual FSet functional update operations —
with, less, etc. — you update them using operations such as include!,
exclude!, and (setf lookup!). include! and exclude! correspond to
with and less, or perhaps more clearly, to includef and excludef, which
expand to with and less respectively. Be sure you understand the semantic
differences:
FSET2> (defparameter *psa* (set))
*PSA*
FSET2> (defparameter *psb* *psa*)
*PSB*
FSET2> (includef *psa* 'foo)
##{ FOO }
FSET2> *psb*
##{ } ; still empty
FSET2> (defparameter *tsa* (make-transient *psa*))
*TSA*
FSET2> (defparameter *tsb* *tsa*)
*TSB*
FSET2> (include! *tsa* 'bar)
#<TRANSIENT-CH-SET 334> ; transients don't print their contents
FSET2> (contains? *tsb* 'bar)
T
BAR
FSET2> *psa*
##{ FOO } ; original persistent is unchanged
Since transients are mutable objects, they are therefore distinguished by identity, following FSet
recommended practice. A transient is only ever equal? to itself, independent of its contents.
Transients are assigned a unique serial number when created, which is used for printing, as in the
example above.
When accessing a transient map, you can use lookup or @ as usual, but to add or update
a key/value pair, you must use setf with lookup!, or else 3-argument include!:
FSET2> (defparameter *tm* (make-transient (map))) *TM* FSET2> (setf (lookup! *tm* 'foo) 3) 3 FSET2> (@ *tm* 'foo) 3 T FOO FSET2> (lookup! *tm* 'foo) 3 T FOO FSET2> (include! *tm* 'bar 7) #<TRANSIENT-CH-MAP 335> FSET2> (@ *tm* 'bar) 7 T BAR
As you see, you can use lookup! for accessing, too, if you find it more consistent, but
that’s a little odd since the transient map won’t be modified in that case. It’s your choice.
If you forget and try to use with, less, includef, excludef, (setf
(@ ...) ...) or (setf (lookup ...) ...) with a transient, you will get a runtime error.
You might have noticed, in the above examples, that to create a transient collection, you must first create a persistent one. This is partly to make it easy to tell FSet what type of transient to create, but it’s also because the persistent collection doesn’t actually have to be empty; if it isn’t, the transient will be initialized with its contents.
This doesn’t, however, mean that you can make any persistent update go faster by making a transient from the persistent, applying a single update, and making it back into a persistent; that would actually be a little slower than doing a normal functional update. The performance benefits of transients accrue when nodes in the internal tree are updated multiple times via a single transient. So you won’t see much benefit from making an already-populated persistent into a transient unless the number of updates you do to the transient before converting it back is at least a good fraction, like maybe half, of its initial size. There’s one exception to this, though: If you repeatedly modify the value for a given map key, or the multiplicity of a bag element, you will see significant performance gains using a transient.
The function make-persistent, that extracts a persistent collection from a transient one, has
two modes, controlled by its copy? keyword parameter. If copy? is false — the
default case — then make-persistent runs in O(1) time; but it resets the transient,
so that if you continue to update it afterwards, it is as if the transient was just created: you’ll
lose the performance benefit.
However, if copy? is true, then make-persistent copies out the contents of the
transient, instead of reusing its internal tree, and leaves it unchanged; this takes O(n) time
(though it’s a pretty fast O(n)), but you can then continue to update the transient, and it will be
just as fast as it was. Copying also does compaction, squeezing out the extra space allocated in
the transient nodes to allow for updates. So, in the common case in which you’re discarding the
transient after calling make-persistent on it, let copy? default to false —
unless you’ve just created a very large collection that you expect to keep in memory for a
long time, such that the compaction might be desirable; in that case, consider passing true for
copy?.
If you want to iterate over the current state of a transient collection, in most cases you will need
to just make a persistent from it and iterate over that, in any of the several ways FSet gives you
to iterate. As just mentioned, (make-persistent ... :copy? t) does that without disturbing
the transient, and the fact that it takes O(n) time is likely of little consequence if you’re about
to perform another O(n) operation on the collection anyway by iterating over it. But for the replay
collections, transient-ch-replay-set and transient-ch-replay-map, you don’t need to
make a copy, as these have an at-index operation to retrieve an element or pair by its index;
this can be used with dotimes or the like to iterate over the collection. However, there’s a
tradeoff; this is an O(log n) operation. (The reason the replay collections have this operation
when the others don’t is that their ordering is well-defined, being the order in which elements or
keys were first added.)
By default, transients are unsynchronized and are not thread-safe. However,
make-transient has keyword parameter synchronized?, which if true, adds locking to
serialize all operations on the transient.