FSet is a functional, set-theoretic collections library for Common Lisp. "Set-theoretic" just means that the types and operations FSet provides are inspired by set theory. "Functional" means that, unlike the types of most collections libraries (and Common Lisp arrays and hash tables), FSet collections are not mutable objects that are updated by side-effect; instead, the various FSet update operations return a new collection, leaving their operand(s) unchanged. Thus FSet brings the benefits of the functional programming style to collection-related code.
What those benefits are is the subject of a separate essay (and many other writings by other people). In this tutorial I will try to show some of them by example. We have just a few preliminaries before we get started. First, this tutorial is addressed to those who have acquired at least some familiarity with Common Lisp; it's not the place to start for newcomers to the language. Second, you will want to have a copy of FSet running so you can follow along. Use Quicklisp to install it:
(ql:quickload "fset")
And thirdly, I have to explain one detail that pervades the examples.
Interactive experimentation is an important feature of Lisp, and a natural
part of interactive use is using variables to hold interesting values.
Unfortunately, some CL implementations don't treat interactive
setq
of a previously undeclared variable in a manner most
conducive to interactive use; they issue warnings and/or declare the variable
special, neither of which is desirable in this situation. Accordingly, FSet
(actually Misc-Extensions, a library FSet uses) provides a macro
isetq
which does what we want: it sets the variable without
complaint and without declaring it special. Remember that isetq
is for interactive use only, not in programs! And if you are using an
implementation that accepts interactive setq
of a fresh variable
without complaint, you can use setq
instead of
isetq
when working through the examples.
FSet provides four major types: sets, maps, seqs, and bags. Sets are the
most basic: a set is an unordered collection of unique members. (FSet imposes
an ordering, but this is a standard ordering FSet uses for its own purposes,
and can't be changed for a particular set instance.) Maps are perhaps the
most useful: a map, like a hash table, associates each of a set of keys with
a unique value. Seqs are FSet's sequences; I use a shortened form of the word
"sequence" because sequence
is, of course, already a CL type
name. And finally, bags (another term is "multisets") are somewhat like sets
except that a bag remembers the number of times each member has been added to
it.
The examples assume one is typing to a REPL. User input is shown in bold. To follow along, which I highly recommend, assuming you have installed and loaded FSet, first do this:
(in-package fset-user)
The simplest way to create a set is with the function
empty-set
:
(isetq es (empty-set)) #{ }
Now we can add an element to the set with with
:
(isetq s (with es 3)) #{ 3 }
You can see that the with
function has returned a new set
without modifying the original:
es #{ }
The function empty?
tells you whether a set is empty. (FSet
uses the trailing question mark rather than the older -p
convention for the names of boolean-valued functions.)
(empty? s) NIL (empty? es) T
If the set is not empty, arb
will give you one of its
members:
(arb s) 3 T
Note that arb
is not guaranteed to give you any particular
member, nor is its job to give you a "randomly selected" member. Its only
postcondition is to give you some member of the set. (The second
value of T
indicates that the set was, in fact, nonempty; FSet's
philosophy is to use multiple values to indicate special cases, when that
makes sense, instead of signalling errors.)
Having obtained a member of the set, we can remove it with
less
:
(less s 3) #{ }
This may not seem too exciting yet, but it is already enough to write certain algorithms which have this general form:
(let ((workset (some-set))) (loop while (not (empty? workset)) (let ((x (arb workset))) (setq workset (less workset x)) (some-operation-on x) (dolist (y (successors-of-some-kind x)) (if (some-predicate y) (setq workset (with workset y)))))))
Another important function on sets is size
, which returns the
number of members:
(size s) 1
To test membership of a particular value, there's
contains?
:
(contains? s 2) NIL (contains? s 3) T
Constructing literal sets by repeated applications of with
gets tedious, so FSet provides a constructor macro which is simply called
set
. (We have shadowed cl:set
, which is archaic
anyway, in the fset-user
package.) For example:
(set) #{ } (isetq s1 (set 1 2 3)) #{ 1 2 3 } (isetq s2 (set 'c 1 "foo" 'a 'c)) #{ 1 A C "foo" }
You can see four important things from this:
The set
constructor macro has a syntax for including all the
elements of another set (this is why it's a macro):
(set ($ s1) 'x) #{ 1 2 3 X }
Okay, given a couple of sets we can start to do some more interesting things with them.
(equal? s1 (set 3 2 1)) T (union s1 s2) #{ 1 2 3 A C "foo" } (intersection s1 s2) #{ 1 } (set-difference s1 s2) #{ 2 3 } (set-difference-2 s1 s2) #{ 2 3 } #{ A C "foo" }
(That last one is returning both differences as two values.)
FSet provides alternate versions of many CL functions, which are intended
to be shadowing-imported into your code package (and are, obviously, into
fset-user
). Most of these are CLOS generic functions, with
compatibility methods so they will do the same thing as the CL builtins if
invoked on non-FSet types. However, you see we have a separate predicate
equal?
rather than shadowing cl:equal
.
There are also a couple of useful predicates on pairs of sets:
(subset? (set 1 2) s1) T (subset? s1 (set 1 2)) NIL (disjoint? s1 (set 1 2)) NIL (disjoint? s1 (set 4 5)) T
You can create an empty map with empty-map
; with
on a map takes two arguments:
(empty-map) #{| |} (isetq m1 (with (empty-map) 'x 1)) #{| (X 1) |}
Retrieving the value associated with a map key is done with
lookup
or its synonym @
(which to use is a
stylistic choice I'll discuss further below; herein I'll mostly use
@
):
(lookup m1 'x) 1 T (@ m1 'x) 1 T (@ m1 'y) NIL NIL
As you see, these functions return a second value indicating whether the
key was in the map. This is in case you're using nil
as a range
element; since such usage is rare, most of the time you can ignore the second
value. Also as you see, the "default default" for the primary value, when the
key is not in the map, is nil
. The default can be changed,
though, either by supplying an argument to empty-map
when first
creating the map, or later by using with-default
:
(isetq m2 (with (empty-map 0) 'x 1)) #{| (X 1) |}/0 (@ m2 'x) 1 T (@ m2 'y) 0 NIL (with-default m1 3) #{| (X 1) |}/3
(When the default is not nil
, it is printed after the map,
separated by a slash.)
There is also a constructor macro for maps called map
-- the
obvious choice of name except that it requires shadowing cl:map
,
which is a little unfortunate (but see the discussion of GMap below). The
map
macro has a more complex syntax than set
since
it accepts pairs:
(isetq m3 (map ('a 1) ('b 2))) #{| (A 1) (B 2) |}
Note that the sublists immediately within the map
form are
not forms, but pairs of forms (e.g. ('a 1)
is not evaluated, but
'a
and 1
are evaluated separately). The macro also
supports the $
syntax for including the pairs of another map:
(isetq m4 (map ('b 3) ('c 4))) #{| (B 3) (C 4) |} (isetq m5 (map ($ m3) ('D 42) ($ m4))) #{| (A 1) (B 3) (C 4) (D 42) |}
As you see, when two of the maps included with $
have pairs
with the same key, the one on the right wins.
There are some interesting and powerful operations to combine maps.
map-difference-2
returns, as two values, the mappings in its
first argument that are not in its second, and conversely:
(map-difference-2 m3 m5) #{| (B 2) |} #{| (B 3) (C 4) (D 42) |}
The functions map-union
and map-intersection
return a map whose domain is respectively the union or intersection of the
domains of the arguments. To compute the range value for each key that
appears on both maps, they call a function supplied as their optional third
argument; this function defaults to one that just returns its second
argument, so the second map shadows the first one:
(map-union m3 m4) #{| (A 1) (B 3) (C 4) |} (isetq m6 (with-default (map ('a (set 1)) ('b (set 2))) (set))) #{| (A #{ 1 }) (B #{ 2 }) |}/#{ } (isetq m7 (with-default (map ('b (set "x")) ('c (set "y"))) (set))) #{| (B #{ "x "}) (C #{" y" }) |}/#{ } (map-union m6 m7 #'union) #{| (A #{ 1 }) (B #{ 2 "x "}) (C #{ "y "}) |}/#{ }
"Seq" is what FSet calls a sequence, since the type sequence
is already used by CL. As you can guess from looking at the section numbering
in this tutorial :), and to be consistent with CL itself, seq indexing is
0-origin. (If you find this comment odd because you've never encountered a
1-origin language, consider yourself fortunate. They are getting rare these
days.)
As you can also guess by now, seq creation can be done with
empty-seq
or the seq
constructor macro:
(empty-seq) #[ ] (isetq q1 (seq 1 2 'a)) #[ 1 2 A ] (isetq q2 (seq "x" ($ q1) "y")) #["x" 1 2 A "y"]
Seqs have operations to access, add, and remove elements at the ends:
(first q1) 1 T (last q1) A T (with-first q1 3) #[ 3 1 2 A ] (with-last q1 7) #[ 1 2 A 7 ] (less-first q2) #[ 1 2 A "y" ] (less-last q2) #[ "x" 1 2 A ]
For indexing purposes, a seq is like a map whose keys are integers.
Indexing is done with @
(or lookup
):
(@ q1 2) A T (@ q1 17) NIL NIL
As you see, we again use a second value to indicate whether the index was in bounds. Like maps, seqs allow you to specify the default to be returned for an out-of-bounds index:
(setq q2 (with-default q2 0)) #[ "x" 1 2 A "y" ]/0 (@ q2 47) 0 NIL
There are two different ways to get a new element into a seq:
with
updates an element at a specific index (it can also be used
to append an element); insert
inserts an element at the given
index:
(with q1 1 7) #[ 1 7 A ] (insert q1 1 7) #[ 1 7 2 A ] (with q1 3 42) #[ 1 2 A 42 ]
If with
is given an out-of-bounds index, it extends the
sequence as needed by inserting copies of the default:
(with q2 8 "yow!") #[ "x" 1 2 A "y" 0 0 0 "yow!" ]/0
To delete the element at a given index, use less
:
(less q1 1) #[ 1 A ]
More interesting operations on seqs are subseq
and
concat
:
(isetq q3 (concat q1 q2)) #[ 1 2 A "x" 1 2 A "y" ] (subseq q3 1 5) #[ 2 A "x" 1 ]
Seqs also support the generic CL sequence operations find
,
position
, etc.
A bag (sometimes called a multiset) is much like a set except that it associates with each member a natural number (nonnegative integer) called the multiplicity, which represents the number of occurrences of the member in the bag. So, for instance, if you start with an empty bag and add a member to it twice, the resulting bag will contain that member with multiplicity 2. Removing the same member once leaves it in the bag with multiplicity 1.
The basics should be familiar by now (once you get used to the print
syntax, at least; the #%
inclusions are for members with
multiplicity greater than 1):
(empty-bag) #{% %} (isetq b1 (bag 1 2 1)) #{% #%(1 2) 2 %} (with b1 3) #{% #%(1 2) 2 3 %} (less b1 1) #{% 1 2 %} (bag ($ b1) 2 3) #{% #%(1 2) #%(2 2) 3 %} (isetq b2 (bag 2 2 3)) #{% #%{2 2} 3 %} (union b1 b2) #{% #%(1 2) #%(2 2) 3 %} (intersection b1 b2) #{% 2 %} (bag-sum b1 b2) #{% #%(1 2) #%(2 3) 3 %} (bag-product b1 b2) #{% #%(2 2) %} (size b1) 3 (set-size b1) 2
union
and intersection
take the max
and min
of the multiplicities of each element, respectively;
bag-sum
and bag-product
are also available. The
standard operation size
returns the total multiplicity; to get
the number of unique members, use set-size
.
This section introduces some idioms and convenience features that make programming with FSet easier, and the resulting code more elegant. It also explains how to extend FSet to handle your own types.
Although CL arrays and hash tables are useful only as mutable structures, and lists certainly can be treated as mutable also, I'd like to point out that the functional use of lists is quite common in CL code. For instance, constructing a list by consing things onto it treats the list functionally: each cons remains unmodified after its creation. And there are many commonly used CL operations that operate on lists without modifying them. The point here is that functional collections are not weird; they're really quite familiar.
FSet fully supports nesting of collections. Sets of sets, maps keyed by sequences of bags, ... all these combinations Just Work with no further effort on your part. Such types don't necessarily arise frequently during ordinary application programming, but they do show up in compiler and static analysis work. The state table of an LALR parser generator, for instance, is basically a map keyed by sets of sequences (the LR(0) items); such a type can be manipulated quite directly and elegantly in FSet.
FSet provides several macros which are handy for updating variables that
hold collections. (Their names all end in "f", following setf
.)
These might appear at first glance to be mutating operators, but of course,
they're not:
(isetq s (set 1 2)) #{ 1 2 } (isetq s1 s) #{ 1 2 } (adjoinf s 3) #{ 1 2 3 } s1 #{ 1 2 }
Does this seem strange? It's the same behavior you would see with
incf
and push
, for instance, two extremely common
CL macros. You see? Functional collections really aren't as weird as they may
sound.
Some of these also work with maps, and there's a setf
method
for @
(or lookup
). The ability, mentioned above, to
specify a default for the map is very handy here:
(isetq m2 (empty-map 0)) #{| |}/0 (incf (@ m2 'x)) 1 m2 #{| (x 1) |}/0 (isetq m3 (empty-map (empty-set))) #{| |}/#{} (adjoinf (@ m3 'foo) "bar") #{ "bar" } m3 #{| (FOO #{ "bar" }) |}/#{}
This works to multiple levels too, e.g., a map from something to maps from something else to sets.
FSet has its own CLOS generic function, convert
, to handle
conversions; this GF has a plethora of methods already, and more can
certainly be added. Its argument order is the reverse of
cl:coerce
: it takes the target type (a symbol) as its first
argument, and uses eql
-dispatch on it (see the code).
A few methods take additional keyword parameters to specify some aspect of
the conversion. For example, there are two ways to represent a bag as a list:
either as a flat list possibly with repeated elements, or as an alist where
the cdr of each pair gives the multiplicity. When converting from an FSet
bag, you can specify the target type as either list
or
alist
to select the desired representation; but
alist
is of course not a CL type, and can't be dispatched on, so
to convert from an alist to a bag, you would say :from-type
'alist
.
A brief summary of the important conversions:
The first thing to be said about iteration in FSet is that if the purpose
of iterating over a collection is to compute a value (which could be a new
collection), there is probably no need to iterate explictly at all. The
functions image
, filter
, and reduce
provide very elegant and clear ways to do implicit iteration to compute a
value.
On a set or sequence, image
applies a function to each member
and returns respectively a set or sequence of the results. It's similar to
good old cl:mapcar
, except that the image of a set is another
set, which will be smaller if multiple members of the original set map to the
same value:
(image (lambda (x) (if (oddp x) (1+ x) x)) (set 1 2 3 4)) #{ 2 4 } (image #'1+ (seq 1 2 3 4)) #[ 2 3 4 5 ]
filter
applies a predicate to each member of a set or
sequence and returns a set or sequence of those members for which the
predicate returned true (like cl:remove-if-not
):
(filter #'oddp (set 1 2 3 4)) #{ 1 3 } (filter #'oddp (seq 4 3 2 1)) #[ 3 1 ]
And reduce
is a generic version of cl:reduce
:
(reduce #'union (set (set 1 2) (set 2 3) (set 3 4))) #{ 1 2 3 4 } (reduce #'concat (seq (seq 1 2) (seq 2 3) (seq 3 4))) #[ 1 2 2 3 3 4 ]
If you're not used to using these operators, I highly recommend practicing at every opportunity — as I say, any time you're iterating over a collection to compute a value. They're an important part of the FSet style.
do-set
etc.)For imperative iteration, FSet provides the iteration macros
do-set
, do-seq
, do-map
,
do-bag
, and do-bag-pairs
(all analogous to
dolist
). These are all pretty straightforward.
do-map
takes two variables, which are bound to the key and value
of successive pairs:
(do-map (x y my-map) (format t "~A --> ~A~%" x y))
For each member of the bag, do-bag
executes its body as many
times as the multiplicity of that member. If you want the body executed only
once per member, use do-bag-pairs
, which gives you the
multiplicity in a second variable, rather like do-map
.
GMap is an iteration macro I first developed as an undergraduate in 1980.
I forgot about it for some years, but when I dusted it off a few years ago it
still seemed well-designed to me. It implements the map-reduce paradigm,
which is gaining currency lately (though it's not a distributed
implementation like Google's MapReduce). You can think of it as a
generalization of cl:map
(which it predated), which is in turn a
generalization of good old mapcar
. It also bears some relation
to Waters' Series package; it's not as powerful (the implementation is far
simpler) and yet it handles a lot of common cases. It's somewhat redundant
with image
/filter
/reduce
, but can do
some things they can't, like parallel iteration of two or more
collections.
The basic idea is simple: we start with something resembling
mapcar
, and then add just enough syntax to allow us to specify
different ways to generate each argument of the function being mapped, and
different ways to collect the results. Some examples:
(gmap :sum #'* (:index 0 3) (:list '(7 3 5))) 13 (gmap :set #'list (:index 0 3) (:vector #(q r s))) #{ (0 Q) (1 R) (2 S) } (gmap :map #'values (:index 0 3) (:seq (seq 'b 'k 'n))) #{| (0 B) (1 K) (2 N) |}
For complete documentation and more examples see gmap.lisp
,
which is in the Misc-Extensions subsystem loaded by FSet. FSet defines new
argument- and result-types :set
, :map
,
:seq
, and :bag
.
Another macro in Misc-Extensions, also dating back to 1980, is used
heavily in the FSet implementation, and you may find you want to use it also.
This is my generalization of let
. It makes working with multiple
values much more convenient; it also generalizes let
and
let*
into a construct that can do both parallel and serial
binding in any combination. Check it out, in new-let.lisp
. This
macro is upward-compatible with cl:let
; it is exported as both
let
and nlet
, so you don't have to shadow
cl:let
if you don't want to.
Another way you can iterate over collections in FSet is by using the
provided iterators. The generic function iterator
returns a
closure of one argument: if passed :more?
it returns true iff
more values are available; if passed :done?
it returns true iff
more values are not available; if passed :get
and more
values are available it returns the next value (or, on a map, the next pair
as two values) and an additional true value; if passed :get
when
no more values are available, returns two or three nil
values.
Note that the iterator is a stateful object, unlike most everything else
in FSet. If you want an iterator that behaves functionally rather than
statefully, your best bet is to convert
the collection to a
list.
Still another way you can iterate in FSet is by indexing on a
seq. Sets, maps, and bags support an analogous operation called
at-rank
which takes an integer and returns the member or pair at
that rank within the collection's ordering. You can use rank
to
find the rank of a member, or for non-members, the rank it would have if it
were in the collection. This is currently the only way to iterate over a
subinterval of a collection.
Although supporting several different kinds of equality is something of a
Lisp tradition, FSet relies on a single, global equality predicate,
equal?
. On the built-in Lisp types supported by FSet it behaves
mostly like equal
, but unlike the latter, equal?
is
extensible for new types. We'll see how that works in a moment. For now I
just want to make the point that FSet is not like the CL sequence interface,
where you can specify the equivalence relation to use on each operation. In
principle it would be possible for FSet to let you specify the equivalence
relation used by a particular collection, but currently it doesn't even
support this (it might someday). So, if you want to control how the members
of a set are distinguished (and also how they're ordered), you need to define
your own type.
The equality predicate and the ordering relation are connected in this
way. The ordering relation is a CLOS generic function compare
,
which takes two arguments and returns one of the symbols :less
,
:greater
, :equal
, or :unequal
. So
equal?
just calls compare
and returns true if the
result is :equal
.
The compare
results :less
and
:greater
are straightforward — they indicate that FSet
should order its first argument before or after the second, respectively. In
rare cases it is necessary to use the fourth possible value,
:unequal
, to indicate that while the two values are not equal,
the relevant compare
method is unable to assign an ordering to
them. (We will see some examples shortly.) FSet goes to some trouble to
handle this case correctly, and does so with only a small loss of performance
as long as the sets of mutually :unequal
values remain small.
FSet already has compare
methods for the following CL
types:
null
: the type of one object, nil
real
: the type of rationals (integers and ratios) and the
various float types; these are ordered by <
, of course,
but distinguished by eql
, meaning that comparing a rational
and a float of equal value returns :unequal
, as does
comparing two floats of equal value but different representationscharacter
symbol
; distinct uninterned symbols of the same name
compare :unequal
, as do symbols of the same name of
:unequal
packages (see below; this is a very rare case)string
: compared by length first, then
lexicographicallyvector
: also compared by length first, then
lexicographicallylist
: compared lexicographically; dotted lists are
supportedpackage
pathname
There is a potential problem in that some of these types are mutable in CL: strings, vectors, and lists. It is very important not to modify these while they're in FSet collections! Doing so will probably break the ordering invariant that FSet relies on. There are some situations in which you can get away with it, but my suggestion is to avoid it generally.
There is a rare situation you will probably never encounter, but I think
it's instructive to understand it because it shows how mutability can sneak
in in ways you might not expect, and it's also useful to see how it can be
handled. Symbols are compared by first comparing their packages; if the
packages are distinct, they are compared by name. Sounds reasonable, yes? But
there's a small matter of the rename-package
operation CL
provides. Use of this operation could change the ordering of packages and
therefore of symbols — except that FSet protects itself, by squirreling
away the original name of the package. This means that if you have a package
foo
, rename it to bar
, and then create a new
package named foo
, the two packages will compare ... you've
guessed it ... :unequal
. (I've oversimplified slightly here; see
the code for the details.)
If you mix types within a set, as I have in some of the examples in section 1, you will also see the effects of the cross-type comparison methods, which control the ordering of values of different types. FSet defines these for the built-in types it supports; the ordering is the same as that of the above list. For user-defined types, it falls back to comparing the type names. We will see in a moment how to make cross-type comparisons more efficient for new types.
One more point about these built-in compare
methods. While
those for simple types like numbers supply orderings that will likely be
generally useful in application code, that's not their purpose. In
particular, note that a shorter string will always be ordered before a longer
one, independent of their contents; only if the lengths are equal do we look
at the contents. That's not an ordering that's likely to be useful to
application code, but doing it this way makes sets of strings or symbols, or
maps keyed by strings or symbols, much more efficient in the common case that
they are of varied lengths. In general, I would discourage you from counting
on FSet's built-in orderings; if you want a set of strings, say, in a
particular order, wrap the strings in a struct or class type and specify the
ordering for that type yourself, as we will now see how to do.
Here we discuss how to extend FSet so you can use your own struct and class types in FSet collections.
The first question to ask of such a type is whether it models an immutable value or a mutable object. If it's a value, you will want to index it by content; if it's an object, you should almost certainly index it by identity. The danger of indexing a mutable object by content is that that content could change while the object is in an FSet collection, invalidating the latter's ordering invariant.
In general, to tell FSet about your type you need to define a method on
compare
. In most cases, this method should return either
:equal
, :less
, or :greater
, but there
might be rare cases when it's necessary to return :unequal
.
Suppose, for instance, you are determining the order in which two values
should go by hashing their representations and then comparing the hash
values. Since it's possible for distinct contents to hash to the same hash
value, your compare
method should return :unequal
in that case.
FSet provides several convenience facilities to assist you in interfacing
your own types to it. For value types, take a look at
compare-slots
: it takes the two values to be compared and (as a
rest argument) a list of accessors; it applies the first accessor to each
value and compares the results; if that comparison returns :less
or :greater
it returns that result, else it moves on to the
second accessor, etc. Here's an example:
(defclass point () ((x :accessor point-x) (y :accessor point-y))) (defmethod compare ((p1 point) (p2 point)) (compare-slots p1 p2 'x 'y)) ;;; or, if you prefer: (defmethod compare ((p1 point) (p2 point)) (compare-slots p1 p2 #'point-x #'point-y))
Here the points will be ordered first by X coordinate, then by Y coordinate. For each slot, you can specify it either as a quoted slot name or as a function-quoted accessor name; or, you can supply an arbitrary function on your type.
The second convenience facility is for (mutable) object classes: all you
have to do, if your class is a standard (CLOS) class, is to supply
identity-ordering-mixin
as one of its superclasses. That's
it!
The third convenience facility is a macro for defining cross-type comparison methods; simply add, after your type definition, a form like
(define-cross-type-compare-methods point)
While FSet has a fallback mechanism that orders otherwise unordered types by name, it is more efficient, if you are actually going to be mixing types within a collection, to define the cross-type methods explicitly. For types that will not be in mixed collections, this doesn't matter.
The fourth convenience facility is for cases where, for the purposes of
your application, you specifically want lexicographic comparison of strings
or other sequences. The generic function
compare-lexicographically
takes two strings, CL sequences, or
seqs, and returns the result of a lexicographic comparison. To use it, define
a new class that holds the string or other sequence in question, and in the
compare
method for that type, call
compare-lexicographically
.
Let me get one thing out of the way. No functional collection implementation can be as fast as well-implemented imperative collections, for the simple reason that the functional implementation has to do more allocation and therefore puts a greater load on the garbage collector. The goal of FSet isn't to be the fastest collection implementation period; it's to be fast enough to make functional collections a reasonable choice for ordinary application programming. I'll say more about this below.
Because FSet is implemented with balanced binary trees, it achieves good time complexity across a wide variety of operations. Of particular note:
union
etc.)
all run in linear time (or better -- see below).subseq
and concat
take
logarithmic time; you can use them freely without worrying about the
copying cost you would pay with CL sequences.FSet also uses a heterogeneous tree design that has much less space overhead than the more common homogeneous trees.
Time complexities of common operations:
These operations run in constant time or amortized constant time:
empty?
, size
, set-size
,
arb
, all operations on an iterator
These operations run in time logarithmic in the size of the collection:
contains?
, lookup
(aka @
),
with
, less
, subseq
,
concat
These operations run in time linear in the size of the collection or the
total sizes of the two collections: equal?
, union
,
intersection
, set-difference
,
set-difference-2
, map-union
,
map-intersection
, map-difference
These operations run in O(n log n) time:
sort
, stable-sort
There is an important optimization that applies to the
collection-combining algorithms. While they run in linear time in the normal
case, they can run faster than that in the common special case that the
collections being combined are historically related, in the sense that one
has been computed from the other or both have been computed from a common
ancestor. In the extreme case, if their arguments are eq
, they
run in constant time. If one argument has been computed by applying a small
number of with
, less
, or update operations to the
other (or both have been computed this way from a common ancestor), they run
in logarithmic time. This means, for example, you can take a large map, hand
it to someone, have them make a point modification and hand it back to you,
and then map-difference
will tell you what they did to it in log
time! And even if they make many changes to it, or construct a completely new
map and hand it back to you, the worst that can happen is that the algorithm
will fall back to linear time.
FSet provides an additional type, tuple
. This type is
somewhat experimental; its usefulness is not clearly established. Tuples are
logically similar to maps, but intended for use in situations where the keys
are all compile-time constants (and thus there is a bounded set of them). See
the comments in tuples.lisp
for more information.
You will have noticed already a major difference in design philosophy
between CL's collections and FSet's. Besides being functional, the FSet
collections are semantically loaded: each collection knows what kind
it is (set, seq, map, bag) and, as previously discussed, knows what
equivalence relation it uses (since there is a single global equivalence
relation). In contrast, CL collections are just bare data structures; you
have to remember the intended semantics — for instance, is a given list
a sequence, a set, an alist, etc. — and if you're using an equivalence
relation other than eql
, you have to remember that too, and
supply it explicitly to each operation. This semantic loading makes FSet
collections easier and more elegant to use. It does require, however, that
you decide in advance what kind of collection you're creating, a decision CL
allows you to defer to some extent.
Again, FSet's goal is not to be the fastest collection library — it can't compete with imperative collections. Rather, it is designed to be a good default choice for ordinary application programming: the thing you use when you don't have a specific reason to do something different.
A good analogy is generic arithmetic. In CL, most of the time that you write an arithmetic operation (in ordinary application code, at least), you use generic arithmetic. Users of other languages look at that and think that the overhead must make Lisp slow; but we know that, first, there's an awful lot of code in the world that isn't particularly performance-sensitive, and second, for those few places where it matters, Lisp provides ways to reduce or eliminate the overhead, such as type declarations and specialized arrays.
So I would encourage you to think about FSet the same way you think of generic arithmetic. Yes, there's some overhead, but there are benefits in convenience, elegance, and generality that usually outweigh the overhead. And when that's not the case, Lisp provides alternatives that are less general, harder to use, but higher in performance.
By the way, I wouldn't suggest that the FSet types supplant all uses of lists. Lists are very convenient for certain situations. I certainly still find uses for them even in FSet-heavy code; but they are one tool in the toolbox rather than the primary tool.
One place where functional collections are particularly useful is at interfaces. The problem with passing mutable collections across interfaces is the possibility of inadvertent modification of a shared structure. If I pass you a mutable set that I am also using internally, I have to trust that you will not modify it; if you overlook this requirement (or I fail to document it), and treat the set as your own datum that you can modify for your own purposes, you're likely to break my code. The only easy way to prevent this, with mutable collections, is to copy the set before I hand it to you; but that's an unpalatable linear-time operation. If, on the other hand, I'm using functional collections, there's no problem: I can hand you the same set I'm using internally, knowing you can't inadvertently modify my copy.
This issue probably arises less in Lisp than it would in, say, Java, because (as I commented above) it's quite common in Lisp to treat lists as functional collections; while it's certainly possible for you to modify a list I've passed to you, you're more likely to be aware that that could be a bad idea. Still, it could come up with a vector or hash table.
I expect there will be two primary ways that FSet gets used; I'll call them "FSet-light" and "FSet-heavy". In the "light" mode of use, FSet types and operations will be sprinkled into code — perhaps, code that was originally written without FSet — but the code will mostly use Lisp collections. In the "heavy" mode, on the other hand, code uses FSet for most or all of its collections.
There are some choices to be made based on this difference. The major one
is whether to import the external FSet symbols into your package. This is
clearly desirable for FSet-heavy code (see the defpackage
:fset-user
form in fset.lisp
for an example of how to do
this, including a list of symbols you will need to
shadowing-import
from either fset
or
cl
). For FSet-light code, on the other hand, it may be better
not to do this, but to refer to FSet symbols with an explicit
fset:
package prefix; this is particularly true when FSet is
being added to existing code (and maybe doubly true on a large project with
multiple developers).
A small choice is whether to use lookup
or @
. In
FSet-light code you should probably use lookup
as the name will
be easier to recognize, but in FSet-heavy code I think you may as well use
@
(I do).