The Common Lisp implementation of FSet posed a number of challenging design choices. I wanted each implementation to be, as much as possible, idiomatic and well integrated into its source language. This went double for the CL implementation, as it's the one I am using personally :) It occurred to me that CLOS could be used to define a single, global, yet extensible ordering/equality relation across all types of interest, and this is probably the most unusual feature of FSet/CL. Because the ordering/equality relation is global, it is not necessary to supply one when creating collections. (In fact, you can't even override the global relation for a particular collection. This is probably a bug, but a pretty minor one, because of the extensibility.)
Specifically, the generic function compare
takes two
arguments and must return one of these four symbols: :less
,
:greater
, :equal
, or :unequal
. If you
look in order.lisp
you will see the orderings I have established
for the built-in types I thought were important: primarily numbers, strings,
and symbols, but also lists and vectors. Note that I have defined not only
the orderings within each of these
types, but also the orderings between
each pair of types. This allows FSet/CL collections to be completely untyped;
you can put any combination of types of objects into one, just as you can
with Lisp lists and vectors. I think this is very much in the spirit of Lisp
:)
(I defined compare
on lists and vectors because I thought
these methods might be handy, but be aware that if you make a set containing
some of these, or a map using them as keys, you must not modify them while
they're in the set or map. See the section on Tuples.)
I faced a quandary, though, trying to decide what to do about some of the
existing CL operations. FSet is somewhat at odds with CL both
terminologically and semantically, a conflict exacerbated by the fact that CL
is somewhat inconsistent and occasionally just screwy. Yet it is clearly very
important that FSet coexist well with code that uses lists and CL sequences,
because those are ubiquitous in existing code, and even a dedicated FSet user
such as myself can't entirely escape using them in new code either. My first
thought, as I approached this problem, was to try to integrate FSet sequences
into the CL generic sequence system by defining new versions of all the CL
operations, that would do CLOS dispatches to the builtin implementations when
invoked on lists and vectors, and to the FSet implementations when invoked on
FSet types. This sounds like the right thing, but turned out to be
problematic, as the CL operations are just too differently designed from the
FSet ones, in two ways. First is the obvious: the CL types are mutable, while
the FSet ones are not. The best example of why this is a problem concerns the
expansion of (setf (elt x n) y)
. This form would mean something
quite different in the two cases: applied to a Lisp sequence, it would modify
the sequence in place; but the only reasonable meaning on an FSet sequence
would be to update the place x
to hold a new sequence. It would
be bizarre to choose between these two semantics at runtime based on the
class of the sequence! Clearly, true genericity across these two systems is a
chimera.
The second divergence of design is that the FSet types are more semantically loaded: they know whether they're sets or sequences, for instance, and they know what equivalence relation is in force. In contrast, the Lisp philosophy is to use bare data structures as the representation, and choose the semantics at the time the operation is applied. I don't know that this divergence produces any outright ridiculous results like the above example, but it was an endless annoyance to me as I tried to write this generic interface, and I suspect it would also have been annoying to users.
And yet, it's silly for FSet not to support familiar operations such as
find-if
. So the approach of not providing any genericity at all
was also clearly wrong (as I quickly discovered when I tried to use FSet that
way).
Considering all the competing pressures on the design at some length, I have finally settled on an approach guided by these principles:
find-if
etc.The nastiest aspect of this approach — both for me and, I expect,
for FSet users — has to do with the Lisp names first
,
rest
, last
, and second
through
seventh
. First, let's note the inconsistencies already in CL:
last
doesn't return the last element of a list, but rather
the last cons in the list. (This is what I meant by "screwy".
Why not call it lastcons
? I know, historical reasons...)For FSet seqs — as I call the sequence type, to
distinguish it from Lisp sequences — I felt it was important to treat
first
and last
symmetrically, since the
implementation is symmetric. Also, there is a whole structure of related
operations operating on the ends of seqs; I have called these
with-first
, with-last
, less-first
,
less-last
, push-first
, push-last
,
pop-first
, and pop-last
, the first four being the
functional update operations, and the last four being the modify macros that
invoke the first four, respectively. Because of the importance of these
operations, I felt it would be acceptable to force some compromises on
existing code. So FSet exports a generic function first
which
includes a compatibility method for lists; here the impact on existing code
is limited to the cost of the CLOS dispatch. For last
, however,
I think that a compatibility method that returns the same thing as
cl:last
when invoked on a list would only exacerbate the
existing screwiness; so FSet doesn't supply one, but instead exports
lastcons
in the hope that users will agree that it's a better
name, and won't mind changing their code to use it instead.
FSet does not, however, export any version of rest
, or of
second
through seventh
. For rest
, I
think less-first
is better as it participates in the symmetric
structure shown above (though, it's a bit long). And second
through seventh
normally show up when short lists are being used
as quick-and-dirty tuples; one could argue that it's poor form to use them at
all, instead of using (defstruct (... (:type list)) ...)
. (From
the perspective of FSet, with its symmetric functional sequences, it strikes
me as a little unfortunate that the names that caught on as modern
replacements for car
and cdr
were
first
and rest
instead of, say, head
and tail
. first
and rest
, it seems to
me, are naturally applied to any sequence, where head
and
tail
more specifically suggest the structure of a list. As it
turns out, lists still have a role to play even in heavily FSet-ified code (a
point I'll return to below); this role is not just as another kind of
sequence, I would argue, but specifically takes advantage of their head/tail
structure. So FSet exports head
and tail
as
synonyms for car
and cdr
.)
Another issue. I mentioned above that generic-function versions of
find-if
etc. are provided. They are, but they are not completely
compatible when invoked on FSet types (they are completely
compatible, of course, when invoked on Lisp sequences). The primary
differences are:
:test
is #'fset:equal?
rather
than #'eql
. Yes, it's a little weird to key this off of the
collection class. But remember, our purpose is not true genericity, but
merely mental convenience. In that light, if you're invoking one of these
on an FSet collection, it's fair to conclude that you're working within
an FSet context, and FSet semantics should apply.:test-not
keyword is not supported (there seems to be
general agreement that it was a bad idea to begin with).Okay. Here is a complete list of builtin CL symbols for which FSet exports
its own versions. I expect that normally users will want to
shadowing-import
these, though it is certainly possible to keep
the Lisp versions instead, and refer to the FSet versions with an explicit
package prefix. (Of course, you don't have to import any FSet symbols at all;
you can refer to all of them with a package prefix. You may wish to define
f:
or fs:
as an abbreviation for fset:
in this case.) But if you do want to make substantial and idiomatic use of
FSet, I recommend you shadowing-import
the FSet versions of
these, or at least most of them:
set
fset:set
. The CL set
is archaic and nearly useless in the modern style, and easily replaced
with (setf (symbol-value ...) ...)
, which is stylistically
preferable anyway. fset:set
names both a type and a
"constructor macro" for that type; more on this below.map
cl:map
has
perfectly legitimate uses, though it's rarer than mapcar
.
Personally, I never use it because I use my GMap macro
instead, but others might miss it. (The similar FSet operation is
called image
.) Again, fset:map
names both a
type and a constructor macro.union
, intersection
, and
set-difference
first
and last
first
has a compatibility method for lists; I've also
included one for CL sequences (once the price of the CLOS dispatch is
being paid, there's no reason not to). last
, as described
above, similarly returns the last element of a seq, or list, or CL
sequence.find
, find-if
, find-if-not
count
, count-if
, count-if-not
position
, position-if
,
position-if-not
remove
, remove-if
, remove-if-not
substitute
, substitute-if
,
substitute-if-not
subseq
, reverse
, sort
,
stable-sort
search
(not yet implemented), mismatch
(not
yet implemented)find
etc., count
etc., and
remove
etc. have methods on sets, bags, and maps as well.
When called on an FSet collection, the default for :test
is #'equal?
; the :test-not
keyword is not
supported; and the :start
, :end
, and
:from-end
keywords are supported only on seqs, not on
sets, bags, or maps. On a map, only the domain elements are
examined.some
, every
, notany
,
notevery
Here are some Lisp sequence operations for which FSet does not export a symbol with the same name, with comments on each. I have omitted destructive operations, which FSet of course has no equivalent for.
elt
lookup
; see also @
, below.length
size
.remove-duplicates
, merge
Along with using heterogeneous trees to save space, the seq implementation
tries to save even more space by using strings, when possible, for the short
leaf vectors. This is possible when the contiguous sequence of up to 8
consecutive elements of the seq that happen to wind up in a particular leaf
vector are all characters. In Lisp implementations where
base-char
is distinct from character
, FSet will
even use base-string
s when possible. The upshot is:
base-char
s interspersed with the occasional
extended-char
quite efficiently.)Since seqs have log-time concatenation and subsequence operations, they can actually be very useful for string manipulation.
@
MacroCommon Lisp is, of course, a "Lisp-2", meaning a two-namespace Lisp in
which a given name can have both a function binding and a value binding. In a
Lisp-1, like Scheme, it would be nice to set the FSet types up as objects
that could be called as functions as well as having the various operations
invoked on them (some object systems for other Lisps have supported this,
such as that of T). Then,
instead of (lookup m x)
where m
is a map, you could
write simply (m x)
. But this doesn't work in a Lisp-2, since the
car
of a form is evaluated in the function namespace; you'd have
to write something like (funcall m x)
, which isn't any more
elegant.
What we can do, however, is give you a shorter name to type than
funcall
. Thus is born @
. It is a macro that expands
to code that tests whether its first argument is a Lisp function; if so, it
funcalls it (with the supplied arguments); otherwise it calls
lookup
on it. So you can use it when writing higher-order code
to make it accept a map as well as a unary Lisp function. (Also, a seq
behaves as a map from index to element, and a set behaves as a map from value
to boolean.)
Whether you want to use @
generally, in place of
lookup
, is a matter of taste. Both are setf
-able;
though doing (setf (@ m x) y)
works only if m
is an
FSet collection. Remember, this doesn't modify the collection; it's
equivalent to (setf m (with m x y))
.
The design of the FSet bag interface is unfortunately not informed by much actual experience, on my part, with bags. I have tried to guess what operations are likely to be useful. One open question, in my mind, is how often one wants to treat a bag in a set-like way (e.g., in which iterating over the bag would return some members multiple times) versus in a map-like way (where iteration would return each member and its multiplicity as a pair). It's possible I've overemphasized the map-like style of use.
The binary bag operations (bag-sum
etc.) will accept a set
for one operand and convert as necessary.
I have been careful not to add fixnum declarations to those variables and structure slots holding multiplicities. This is out of the thought, probably mistaken, that these bag operations might have some use in cryptography or something where you might want arbitrarily large multiplicities. I may change my mind about this. Feedback invited.
Along with the more familiar sets, bags, sequences, and maps, FSet/CL includes a "dynamic tuple" type. This was something of an experiment, born of the observation that linear searches of short vectors are actually very fast on modern processors because of cache effects. I had seen this approach taken for storing sparse slots in an object system ("sparse" meaning that any given object may have values for only a few of the possible slots), and wondered whether I could put it to use for functional tuples. I also wanted to try my hand at dynamic reordering. I think the result is interesting.
Tuples are abstractly similar to maps, at least in an untyped language
like Lisp, but have a very different usage profile. The big difference is
that the key set is bounded, and indeed normally fixed at compile time. Tuple
keys are thus analogous to slots of classes; they're not arbitrary values. In
fact, there is a class tuple-key
; keys must be instances of this
class. So tuples are really more like structs than maps, except that (a)
update is functional (of course) and (b) there's only one tuple type; any key
can be added to any tuple. Despite that, the representation is sparse: any
given tuple takes only space proportional to the number of pairs it actually
contains.
Here is an example of a situation in which the use of tuples is
appropriate — indeed, highly recommended. Suppose you have instances of
a mutable class, and you want to create a map mapping those instances to some
range type, but rather than the mapping being by identity, you want it to be
by some part of the content of the instances. You might be inclined to define
a compare
method on this class that compares the instances by
comparing an appropriate subset of their attributes. But there's a danger
here. If any of the values of those attributes ever changes after the
instance has been used as a map key, the instance will, in general, no longer
compare the same way against the other instances in the map, thus blowing one
of the key invariants of the map's internal structure and rendering it more
or less useless.
What you should do instead is, for each instance used as a key in this map, make a tuple containing the values of the attributes in question, and then use the tuple as the map key. If you attach the tuple to the instance by a new attribute, then the object can always know what tuple it has been indexed by in the map; and if the values in that tuple differ, at some point, from those in the instance, then you know that the instance values have changed since the instance was last indexed in the map, and you have handy all the information handy to re-index the instance (removing the old mapping, constructing the new tuple, and then adding the new mapping).
The implementation gets its speed by arranging for lookup to be done by
scanning consecutive memory locations (thus taking advantage of cache
effects) rather than by ordered trees. Thus the performance characteristics
of tuples are quite different from those of the other FSet classes; where the
other FSet classes do lookups in log time but with a significant constant
factor, tuple lookups use linear search but with a very small constant factor
— making them faster than even a well-compiled assoc
. This
is partly because of better use of the cache, and partly because of a clever
scheme for dynamic reordering that makes frequently used slots cheaper to
access.
(Note to Lisp implementors: I think tuples, with some minor adjustments, might make a great data structure for deep-bound special variables — shallow binding being incompatible with modern SMP implementations, and ordinary alists being potentially quadratic in recursive algorithms.)