Tutorial: FSet for Common Lisp

0 Introduction

FSet is a functional, set-theoretic collections library for Common Lisp. "Set-theoretic" just means that the types and operations FSet provides are inspired by set theory. "Functional" means that, unlike the types of most collections libraries (and Common Lisp arrays and hash tables), FSet collections are not mutable objects that are updated by side-effect; instead, the various FSet update operations return a new collection, leaving their operand(s) unchanged. Thus FSet brings the benefits of the functional programming style to collection-related code.

What those benefits are is the subject of a separate essay (and many other writings by other people). In this tutorial I will try to show some of them by example. We have just a few preliminaries before we get started. First, this tutorial is addressed to those who have acquired at least some familiarity with Common Lisp; it's not the place to start for newcomers to the language. Second, you will want to have a copy of FSet running so you can follow along. Use Quicklisp to install it:

(ql:quickload "fset")

And thirdly, I have to explain one detail that pervades the examples. Interactive experimentation is an important feature of Lisp, and a natural part of interactive use is using variables to hold interesting values. Unfortunately, some CL implementations don't treat interactive setq of a previously undeclared variable in a manner most conducive to interactive use; they issue warnings and/or declare the variable special, neither of which is desirable in this situation. Accordingly, FSet (actually Misc-Extensions, a library FSet uses) provides a macro isetq which does what we want: it sets the variable without complaint and without declaring it special. Remember that isetq is for interactive use only, not in programs! And if you are using an implementation that accepts interactive setq of a fresh variable without complaint, you can use setq instead of isetq when working through the examples.

1 The Major FSet Types

FSet provides four major types: sets, maps, seqs, and bags. Sets are the most basic: a set is an unordered collection of unique members. (FSet imposes an ordering, but this is a standard ordering FSet uses for its own purposes, and can't be changed for a particular set instance.) Maps are perhaps the most useful: a map, like a hash table, associates each of a set of keys with a unique value. Seqs are FSet's sequences; I use a shortened form of the word "sequence" because sequence is, of course, already a CL type name. And finally, bags (another term is "multisets") are somewhat like sets except that a bag remembers the number of times each member has been added to it.

The examples assume one is typing to a REPL. User input is shown in bold. To follow along, which I highly recommend, assuming you have installed and loaded FSet, first do this:

(in-package fset-user)

1.0 Sets

The simplest way to create a set is with the function empty-set:

(isetq es (empty-set))
#{ }

Now we can add an element to the set with with:

(isetq s (with es 3))
#{ 3 }

You can see that the with function has returned a new set without modifying the original:

es
#{ }

The function empty? tells you whether a set is empty. (FSet uses the trailing question mark rather than the older -p convention for the names of boolean-valued functions.)

(empty? s)
NIL
(empty? es)
T

If the set is not empty, arb will give you one of its members:

(arb s)
3
T

Note that arb is not guaranteed to give you any particular member, nor is its job to give you a "randomly selected" member. Its only postcondition is to give you some member of the set. (The second value of T indicates that the set was, in fact, nonempty; FSet's philosophy is to use multiple values to indicate special cases, when that makes sense, instead of signalling errors.)

Having obtained a member of the set, we can remove it with less:

(less s 3)
#{ }

This may not seem too exciting yet, but it is already enough to write certain algorithms which have this general form:

(let ((workset (some-set)))
  (loop while (not (empty? workset))
    (let ((x (arb workset)))
      (setq workset (less workset x))
      (some-operation-on x)
      (dolist (y (successors-of-some-kind x))
        (if (some-predicate y)
          (setq workset (with workset y)))))))

Another important function on sets is size, which returns the number of members:

(size s)
1

To test membership of a particular value, there's contains?:

(contains? s 2)
NIL
(contains? s 3)
T

Constructing literal sets by repeated applications of with gets tedious, so FSet provides a constructor macro which is simply called set. (We have shadowed cl:set, which is archaic anyway, in the fset-user package.) For example:

(set)
#{ }
(isetq s1 (set 1 2 3))
#{ 1 2 3 }
(isetq s2 (set 'c 1 "foo" 'a 'c))
#{ 1 A C "foo" }

You can see four important things from this:

FSet handles many builtin Lisp types (we'll get to the whole list later)
mixing types in a collection is no problem (of course, this being Lisp)
sets ignore duplication of members
FSet imposes its own ordering on sets

The set constructor macro has a syntax for including all the elements of another set (this is why it's a macro):

(set ($ s1) 'x)
#{ 1 2 3 X }

Okay, given a couple of sets we can start to do some more interesting things with them.

(equal? s1 (set 3 2 1))
T
(union s1 s2)
#{ 1 2 3 A C "foo" }
(intersection s1 s2)
#{ 1 }
(set-difference s1 s2)
#{ 2 3 }
(set-difference-2 s1 s2)
#{ 2 3 }
#{ A C "foo" }

(That last one is returning both differences as two values.)

FSet provides alternate versions of many CL functions, which are intended to be shadowing-imported into your code package (and are, obviously, into fset-user). Most of these are CLOS generic functions, with compatibility methods so they will do the same thing as the CL builtins if invoked on non-FSet types. However, you see we have a separate predicate equal? rather than shadowing cl:equal.

There are also a couple of useful predicates on pairs of sets:

(subset? (set 1 2) s1)
T
(subset? s1 (set 1 2))
NIL
(disjoint? s1 (set 1 2))
NIL
(disjoint? s1 (set 4 5))
T

1.1 Maps

You can create an empty map with empty-map; with on a map takes two arguments:

(empty-map)
#{| |}
(isetq m1 (with (empty-map) 'x 1))
#{| (X 1) |}

Retrieving the value associated with a map key is done with lookup or its synonym @ (which to use is a stylistic choice I'll discuss further below; herein I'll mostly use @):

(lookup m1 'x)
1
T
(@ m1 'x)
1
T
(@ m1 'y)
NIL
NIL

As you see, these functions return a second value indicating whether the key was in the map. This is in case you're using nil as a range element; since such usage is rare, most of the time you can ignore the second value. Also as you see, the "default default" for the primary value, when the key is not in the map, is nil. The default can be changed, though, either by supplying an argument to empty-map when first creating the map, or later by using with-default:

(isetq m2 (with (empty-map 0) 'x 1))
#{| (X 1) |}/0
(@ m2 'x)
1
T
(@ m2 'y)
0
NIL
(with-default m1 3)
#{| (X 1) |}/3

(When the default is not nil, it is printed after the map, separated by a slash.)

There is also a constructor macro for maps called map -- the obvious choice of name except that it requires shadowing cl:map, which is a little unfortunate (but see the discussion of GMap below). The map macro has a more complex syntax than set since it accepts pairs:

(isetq m3 (map ('a 1) ('b 2)))
#{| (A 1) (B 2) |}

Note that the sublists immediately within the map form are not forms, but pairs of forms (e.g. ('a 1) is not evaluated, but 'a and 1 are evaluated separately). The macro also supports the $ syntax for including the pairs of another map:

(isetq m4 (map ('b 3) ('c 4)))
#{| (B 3) (C 4) |}
(isetq m5 (map ($ m3) ('D 42) ($ m4)))
#{| (A 1) (B 3) (C 4) (D 42) |}

As you see, when two of the maps included with $ have pairs with the same key, the one on the right wins.

There are some interesting and powerful operations to combine maps. map-difference-2 returns, as two values, the mappings in its first argument that are not in its second, and conversely:

(map-difference-2 m3 m5)
#{| (B 2) |}
#{| (B 3) (C 4) (D 42) |}

The functions map-union and map-intersection return a map whose domain is respectively the union or intersection of the domains of the arguments. To compute the range value for each key that appears on both maps, they call a function supplied as their optional third argument; this function defaults to one that just returns its second argument, so the second map shadows the first one:

(map-union m3 m4)
#{| (A 1) (B 3) (C 4) |}
(isetq m6 (with-default (map ('a (set 1)) ('b (set 2))) (set)))
#{| (A #{ 1 }) (B #{ 2 }) |}/#{ }
(isetq m7 (with-default (map ('b (set "x")) ('c (set "y"))) (set)))
#{| (B #{ "x "}) (C #{" y" }) |}/#{ }
(map-union m6 m7 #'union)
#{| (A #{ 1 }) (B #{ 2 "x "}) (C #{ "y "}) |}/#{ }

1.2 Seqs

"Seq" is what FSet calls a sequence, since the type sequence is already used by CL. As you can guess from looking at the section numbering in this tutorial :), and to be consistent with CL itself, seq indexing is 0-origin. (If you find this comment odd because you've never encountered a 1-origin language, consider yourself fortunate. They are getting rare these days.)

As you can also guess by now, seq creation can be done with empty-seq or the seq constructor macro:

(empty-seq)
#[ ]
(isetq q1 (seq 1 2 'a))
#[ 1 2 A ]
(isetq q2 (seq "x" ($ q1) "y"))
#["x" 1 2 A "y"]

Seqs have operations to access, add, and remove elements at the ends:

(first q1)
1
T
(last q1)
A
T
(with-first q1 3)
#[ 3 1 2 A ]
(with-last q1 7)
#[ 1 2 A 7 ]
(less-first q2)
#[ 1 2 A "y" ]
(less-last q2)
#[ "x" 1 2 A ]

For indexing purposes, a seq is like a map whose keys are integers. Indexing is done with @ (or lookup):

(@ q1 2)
A
T
(@ q1 17)
NIL
NIL

As you see, we again use a second value to indicate whether the index was in bounds. Like maps, seqs allow you to specify the default to be returned for an out-of-bounds index:

(setq q2 (with-default q2 0))
#[ "x" 1 2 A "y" ]/0
(@ q2 47)
0
NIL

There are two different ways to get a new element into a seq: with updates an element at a specific index (it can also be used to append an element); insert inserts an element at the given index:

(with q1 1 7)
#[ 1 7 A ]
(insert q1 1 7)
#[ 1 7 2 A ]
(with q1 3 42)
#[ 1 2 A 42 ]

If with is given an out-of-bounds index, it extends the sequence as needed by inserting copies of the default:

(with q2 8 "yow!")
#[ "x" 1 2 A "y" 0 0 0 "yow!" ]/0

To delete the element at a given index, use less:

(less q1 1)
#[ 1 A ]

More interesting operations on seqs are subseq and concat:

(isetq q3 (concat q1 q2))
#[ 1 2 A "x" 1 2 A "y" ]
(subseq q3 1 5)
#[ 2 A "x" 1 ]

Seqs also support the generic CL sequence operations find, position, etc.

1.3 Bags

A bag (sometimes called a multiset) is much like a set except that it associates with each member a natural number (nonnegative integer) called the multiplicity, which represents the number of occurrences of the member in the bag. So, for instance, if you start with an empty bag and add a member to it twice, the resulting bag will contain that member with multiplicity 2. Removing the same member once leaves it in the bag with multiplicity 1.

The basics should be familiar by now (once you get used to the print syntax, at least; the #% inclusions are for members with multiplicity greater than 1):

(empty-bag)
#{% %}
(isetq b1 (bag 1 2 1))
#{% #%(1 2) 2 %}
(with b1 3)
#{% #%(1 2) 2 3 %}
(less b1 1)
#{% 1 2 %}
(bag ($ b1) 2 3)
#{% #%(1 2) #%(2 2) 3 %}
(isetq b2 (bag 2 2 3))
#{% #%{2 2} 3 %}
(union b1 b2)
#{% #%(1 2) #%(2 2) 3 %}
(intersection b1 b2)
#{% 2 %}
(bag-sum b1 b2)
#{% #%(1 2) #%(2 3) 3 %}
(bag-product b1 b2)
#{% #%(2 2) %}
(size b1)
3
(set-size b1)
2

union and intersection take the max and min of the multiplicities of each element, respectively; bag-sum and bag-product are also available. The standard operation size returns the total multiplicity; to get the number of unique members, use set-size.

2 Programming with FSet

This section introduces some idioms and convenience features that make programming with FSet easier, and the resulting code more elegant. It also explains how to extend FSet to handle your own types.

2.0 Some comments on functional collections and Lisp

Although CL arrays and hash tables are useful only as mutable structures, and lists certainly can be treated as mutable also, I'd like to point out that the functional use of lists is quite common in CL code. For instance, constructing a list by consing things onto it treats the list functionally: each cons remains unmodified after its creation. And there are many commonly used CL operations that operate on lists without modifying them. The point here is that functional collections are not weird; they're really quite familiar.

2.1 Nested collections

FSet fully supports nesting of collections. Sets of sets, maps keyed by sequences of bags, ... all these combinations Just Work with no further effort on your part. Such types don't necessarily arise frequently during ordinary application programming, but they do show up in compiler and static analysis work. The state table of an LALR parser generator, for instance, is basically a map keyed by sets of sequences (the LR(0) items); such a type can be manipulated quite directly and elegantly in FSet.

2.2 Assignment and updating

FSet provides several macros which are handy for updating variables that hold collections. (Their names all end in "f", following setf.) These might appear at first glance to be mutating operators, but of course, they're not:

(isetq s (set 1 2))
#{ 1 2 }
(isetq s1 s)
#{ 1 2 }
(adjoinf s 3)
#{ 1 2 3 }
s1
#{ 1 2 }

Does this seem strange? It's the same behavior you would see with incf and push, for instance, two extremely common CL macros. You see? Functional collections really aren't as weird as they may sound.

Some of these also work with maps, and there's a setf method for @ (or lookup). The ability, mentioned above, to specify a default for the map is very handy here:

(isetq m2 (empty-map 0))
#{| |}/0
(incf (@ m2 'x))
1
m2
#{| (x 1) |}/0
(isetq m3 (empty-map (empty-set)))
#{| |}/#{}
(adjoinf (@ m3 'foo) "bar")
#{ "bar" }
m3
#{| (FOO #{ "bar" }) |}/#{}

This works to multiple levels too, e.g., a map from something to maps from something else to sets.

2.3 Conversions

FSet has its own CLOS generic function, convert, to handle conversions; this GF has a plethora of methods already, and more can certainly be added. Its argument order is the reverse of cl:coerce: it takes the target type (a symbol) as its first argument, and uses eql-dispatch on it (see the code).

A few methods take additional keyword parameters to specify some aspect of the conversion. For example, there are two ways to represent a bag as a list: either as a flat list possibly with repeated elements, or as an alist where the cdr of each pair gives the multiplicity. When converting from an FSet bag, you can specify the target type as either list or alist to select the desired representation; but alist is of course not a CL type, and can't be dispatched on, so to convert from an alist to a bag, you would say :from-type 'alist.

A brief summary of the important conversions:

set, map, seq, or bag to/from list or vector
set to/from bag
set to/from seq

2.4 Iteration

The first thing to be said about iteration in FSet is that if the purpose of iterating over a collection is to compute a value (which could be a new collection), there is probably no need to iterate explictly at all. The functions image, filter, and reduce provide very elegant and clear ways to do implicit iteration to compute a value.

On a set or sequence, image applies a function to each member and returns respectively a set or sequence of the results. It's similar to good old cl:mapcar, except that the image of a set is another set, which will be smaller if multiple members of the original set map to the same value:

(image (lambda (x) (if (oddp x) (1+ x) x)) (set 1 2 3 4))
#{ 2 4 }
(image #'1+ (seq 1 2 3 4))
#[ 2 3 4 5 ]

filter applies a predicate to each member of a set or sequence and returns a set or sequence of those members for which the predicate returned true (like cl:remove-if-not):

(filter #'oddp (set 1 2 3 4))
#{ 1 3 }
(filter #'oddp (seq 4 3 2 1))
#[ 3 1 ]

And reduce is a generic version of cl:reduce:

(reduce #'union (set (set 1 2) (set 2 3) (set 3 4)))
#{ 1 2 3 4 }
(reduce #'concat (seq (seq 1 2) (seq 2 3) (seq 3 4)))
#[ 1 2 2 3 3 4 ]

If you're not used to using these operators, I highly recommend practicing at every opportunity — as I say, any time you're iterating over a collection to compute a value. They're an important part of the FSet style.

2.4.0 Iteration macros (`do-set` etc.)

For imperative iteration, FSet provides the iteration macros do-set, do-seq, do-map, do-bag, and do-bag-pairs (all analogous to dolist). These are all pretty straightforward. do-map takes two variables, which are bound to the key and value of successive pairs:

(do-map (x y my-map)
  (format t "~A --> ~A~%" x y))

For each member of the bag, do-bag executes its body as many times as the multiplicity of that member. If you want the body executed only once per member, use do-bag-pairs, which gives you the multiplicity in a second variable, rather like do-map.

2.4.1 GMap

GMap is an iteration macro I first developed as an undergraduate in 1980. I forgot about it for some years, but when I dusted it off a few years ago it still seemed well-designed to me. It implements the map-reduce paradigm, which is gaining currency lately (though it's not a distributed implementation like Google's MapReduce). You can think of it as a generalization of cl:map (which it predated), which is in turn a generalization of good old mapcar. It also bears some relation to Waters' Series package; it's not as powerful (the implementation is far simpler) and yet it handles a lot of common cases. It's somewhat redundant with image/filter/reduce, but can do some things they can't, like parallel iteration of two or more collections.

The basic idea is simple: we start with something resembling mapcar, and then add just enough syntax to allow us to specify different ways to generate each argument of the function being mapped, and different ways to collect the results. Some examples:

(gmap :sum #'* (:index 0 3) (:list '(7 3 5)))
13
(gmap :set #'list (:index 0 3) (:vector #(q r s)))
#{ (0 Q) (1 R) (2 S) }
(gmap :map #'values (:index 0 3) (:seq (seq 'b 'k 'n)))
#{| (0 B) (1 K) (2 N) |}

For complete documentation and more examples see gmap.lisp, which is in the Misc-Extensions subsystem loaded by FSet. FSet defines new argument- and result-types :set, :map, :seq, and :bag.

Another macro in Misc-Extensions, also dating back to 1980, is used heavily in the FSet implementation, and you may find you want to use it also. This is my generalization of let. It makes working with multiple values much more convenient; it also generalizes let and let* into a construct that can do both parallel and serial binding in any combination. Check it out, in new-let.lisp. This macro is upward-compatible with cl:let; it is exported as both let and nlet, so you don't have to shadow cl:let if you don't want to.

2.4.2 Iterators

Another way you can iterate over collections in FSet is by using the provided iterators. The generic function iterator returns a closure of one argument: if passed :more? it returns true iff more values are available; if passed :done? it returns true iff more values are not available; if passed :get and more values are available it returns the next value (or, on a map, the next pair as two values) and an additional true value; if passed :get when no more values are available, returns two or three nil values.

Note that the iterator is a stateful object, unlike most everything else in FSet. If you want an iterator that behaves functionally rather than statefully, your best bet is to convert the collection to a list.

2.4.3 Indexing and rank access

Still another way you can iterate in FSet is by indexing on a seq. Sets, maps, and bags support an analogous operation called at-rank which takes an integer and returns the member or pair at that rank within the collection's ordering. You can use rank to find the rank of a member, or for non-members, the rank it would have if it were in the collection. This is currently the only way to iterate over a subinterval of a collection.

2.5 Equality and comparison in FSet

Although supporting several different kinds of equality is something of a Lisp tradition, FSet relies on a single, global equality predicate, equal?. On the built-in Lisp types supported by FSet it behaves mostly like equal, but unlike the latter, equal? is extensible for new types. We'll see how that works in a moment. For now I just want to make the point that FSet is not like the CL sequence interface, where you can specify the equivalence relation to use on each operation. In principle it would be possible for FSet to let you specify the equivalence relation used by a particular collection, but currently it doesn't even support this (it might someday). So, if you want to control how the members of a set are distinguished (and also how they're ordered), you need to define your own type.

The equality predicate and the ordering relation are connected in this way. The ordering relation is a CLOS generic function compare, which takes two arguments and returns one of the symbols :less, :greater, :equal, or :unequal. So equal? just calls compare and returns true if the result is :equal.

The compare results :less and :greater are straightforward — they indicate that FSet should order its first argument before or after the second, respectively. In rare cases it is necessary to use the fourth possible value, :unequal, to indicate that while the two values are not equal, the relevant compare method is unable to assign an ordering to them. (We will see some examples shortly.) FSet goes to some trouble to handle this case correctly, and does so with only a small loss of performance as long as the sets of mutually :unequal values remain small.

2.6 Member types FSet knows about

FSet already has compare methods for the following CL types:

null: the type of one object, nil
real: the type of rationals (integers and ratios) and the various float types; these are ordered by <, of course, but distinguished by eql, meaning that comparing a rational and a float of equal value returns :unequal, as does comparing two floats of equal value but different representations
character
symbol; distinct uninterned symbols of the same name compare :unequal, as do symbols of the same name of :unequal packages (see below; this is a very rare case)
string: compared by length first, then lexicographically
vector: also compared by length first, then lexicographically
list: compared lexicographically; dotted lists are supported
package
pathname

There is a potential problem in that some of these types are mutable in CL: strings, vectors, and lists. It is very important not to modify these while they're in FSet collections! Doing so will probably break the ordering invariant that FSet relies on. There are some situations in which you can get away with it, but my suggestion is to avoid it generally.

There is a rare situation you will probably never encounter, but I think it's instructive to understand it because it shows how mutability can sneak in in ways you might not expect, and it's also useful to see how it can be handled. Symbols are compared by first comparing their packages; if the packages are distinct, they are compared by name. Sounds reasonable, yes? But there's a small matter of the rename-package operation CL provides. Use of this operation could change the ordering of packages and therefore of symbols — except that FSet protects itself, by squirreling away the original name of the package. This means that if you have a package foo, rename it to bar, and then create a new package named foo, the two packages will compare ... you've guessed it ... :unequal. (I've oversimplified slightly here; see the code for the details.)

If you mix types within a set, as I have in some of the examples in section 1, you will also see the effects of the cross-type comparison methods, which control the ordering of values of different types. FSet defines these for the built-in types it supports; the ordering is the same as that of the above list. For user-defined types, it falls back to comparing the type names. We will see in a moment how to make cross-type comparisons more efficient for new types.

One more point about these built-in compare methods. While those for simple types like numbers supply orderings that will likely be generally useful in application code, that's not their purpose. In particular, note that a shorter string will always be ordered before a longer one, independent of their contents; only if the lengths are equal do we look at the contents. That's not an ordering that's likely to be useful to application code, but doing it this way makes sets of strings or symbols, or maps keyed by strings or symbols, much more efficient in the common case that they are of varied lengths. In general, I would discourage you from counting on FSet's built-in orderings; if you want a set of strings, say, in a particular order, wrap the strings in a struct or class type and specify the ordering for that type yourself, as we will now see how to do.

2.7 Adding new member types

Here we discuss how to extend FSet so you can use your own struct and class types in FSet collections.

The first question to ask of such a type is whether it models an immutable value or a mutable object. If it's a value, you will want to index it by content; if it's an object, you should almost certainly index it by identity. The danger of indexing a mutable object by content is that that content could change while the object is in an FSet collection, invalidating the latter's ordering invariant.

In general, to tell FSet about your type you need to define a method on compare. In most cases, this method should return either :equal, :less, or :greater, but there might be rare cases when it's necessary to return :unequal. Suppose, for instance, you are determining the order in which two values should go by hashing their representations and then comparing the hash values. Since it's possible for distinct contents to hash to the same hash value, your compare method should return :unequal in that case.

FSet provides several convenience facilities to assist you in interfacing your own types to it. For value types, take a look at compare-slots: it takes the two values to be compared and (as a rest argument) a list of accessors; it applies the first accessor to each value and compares the results; if that comparison returns :less or :greater it returns that result, else it moves on to the second accessor, etc. Here's an example:

(defclass point ()
    ((x :accessor point-x)
     (y :accessor point-y)))
(defmethod compare ((p1 point) (p2 point))
  (compare-slots p1 p2 'x 'y))
;;; or, if you prefer:
(defmethod compare ((p1 point) (p2 point))
  (compare-slots p1 p2 #'point-x #'point-y))

Here the points will be ordered first by X coordinate, then by Y coordinate. For each slot, you can specify it either as a quoted slot name or as a function-quoted accessor name; or, you can supply an arbitrary function on your type.

The second convenience facility is for (mutable) object classes: all you have to do, if your class is a standard (CLOS) class, is to supply identity-ordering-mixin as one of its superclasses. That's it!

The third convenience facility is a macro for defining cross-type comparison methods; simply add, after your type definition, a form like

(define-cross-type-compare-methods point)

While FSet has a fallback mechanism that orders otherwise unordered types by name, it is more efficient, if you are actually going to be mixing types within a collection, to define the cross-type methods explicitly. For types that will not be in mixed collections, this doesn't matter.

The fourth convenience facility is for cases where, for the purposes of your application, you specifically want lexicographic comparison of strings or other sequences. The generic function compare-lexicographically takes two strings, CL sequences, or seqs, and returns the result of a lexicographic comparison. To use it, define a new class that holds the string or other sequence in question, and in the compare method for that type, call compare-lexicographically.

2.8 Performance considerations

Let me get one thing out of the way. No functional collection implementation can be as fast as well-implemented imperative collections, for the simple reason that the functional implementation has to do more allocation and therefore puts a greater load on the garbage collector. The goal of FSet isn't to be the fastest collection implementation period; it's to be fast enough to make functional collections a reasonable choice for ordinary application programming. I'll say more about this below.

Because FSet is implemented with balanced binary trees, it achieves good time complexity across a wide variety of operations. Of particular note:

Lookup (including seq indexing) and point updates take logarithmic time.
The operations that combine two collections (union etc.) all run in linear time (or better -- see below).
Seq operations subseq and concat take logarithmic time; you can use them freely without worrying about the copying cost you would pay with CL sequences.

FSet also uses a heterogeneous tree design that has much less space overhead than the more common homogeneous trees.

Time complexities of common operations:

These operations run in constant time or amortized constant time: empty?, size, set-size, arb, all operations on an iterator

These operations run in time logarithmic in the size of the collection: contains?, lookup (aka @), with, less, subseq, concat

These operations run in time linear in the size of the collection or the total sizes of the two collections: equal?, union, intersection, set-difference, set-difference-2, map-union, map-intersection, map-difference

These operations run in O(n log n) time: sort, stable-sort

There is an important optimization that applies to the collection-combining algorithms. While they run in linear time in the normal case, they can run faster than that in the common special case that the collections being combined are historically related, in the sense that one has been computed from the other or both have been computed from a common ancestor. In the extreme case, if their arguments are eq, they run in constant time. If one argument has been computed by applying a small number of with, less, or update operations to the other (or both have been computed this way from a common ancestor), they run in logarithmic time. This means, for example, you can take a large map, hand it to someone, have them make a point modification and hand it back to you, and then map-difference will tell you what they did to it in log time! And even if they make many changes to it, or construct a completely new map and hand it back to you, the worst that can happen is that the algorithm will fall back to linear time.

2.9 Tuples

FSet provides an additional type, tuple. This type is somewhat experimental; its usefulness is not clearly established. Tuples are logically similar to maps, but intended for use in situations where the keys are all compile-time constants (and thus there is a bounded set of them). See the comments in tuples.lisp for more information.

3 FSet Philosophy and Style

You will have noticed already a major difference in design philosophy between CL's collections and FSet's. Besides being functional, the FSet collections are semantically loaded: each collection knows what kind it is (set, seq, map, bag) and, as previously discussed, knows what equivalence relation it uses (since there is a single global equivalence relation). In contrast, CL collections are just bare data structures; you have to remember the intended semantics — for instance, is a given list a sequence, a set, an alist, etc. — and if you're using an equivalence relation other than eql, you have to remember that too, and supply it explicitly to each operation. This semantic loading makes FSet collections easier and more elegant to use. It does require, however, that you decide in advance what kind of collection you're creating, a decision CL allows you to defer to some extent.

3.0 When and why to use FSet?

Again, FSet's goal is not to be the fastest collection library — it can't compete with imperative collections. Rather, it is designed to be a good default choice for ordinary application programming: the thing you use when you don't have a specific reason to do something different.

A good analogy is generic arithmetic. In CL, most of the time that you write an arithmetic operation (in ordinary application code, at least), you use generic arithmetic. Users of other languages look at that and think that the overhead must make Lisp slow; but we know that, first, there's an awful lot of code in the world that isn't particularly performance-sensitive, and second, for those few places where it matters, Lisp provides ways to reduce or eliminate the overhead, such as type declarations and specialized arrays.

So I would encourage you to think about FSet the same way you think of generic arithmetic. Yes, there's some overhead, but there are benefits in convenience, elegance, and generality that usually outweigh the overhead. And when that's not the case, Lisp provides alternatives that are less general, harder to use, but higher in performance.

By the way, I wouldn't suggest that the FSet types supplant all uses of lists. Lists are very convenient for certain situations. I certainly still find uses for them even in FSet-heavy code; but they are one tool in the toolbox rather than the primary tool.

One place where functional collections are particularly useful is at interfaces. The problem with passing mutable collections across interfaces is the possibility of inadvertent modification of a shared structure. If I pass you a mutable set that I am also using internally, I have to trust that you will not modify it; if you overlook this requirement (or I fail to document it), and treat the set as your own datum that you can modify for your own purposes, you're likely to break my code. The only easy way to prevent this, with mutable collections, is to copy the set before I hand it to you; but that's an unpalatable linear-time operation. If, on the other hand, I'm using functional collections, there's no problem: I can hand you the same set I'm using internally, knowing you can't inadvertently modify my copy.

This issue probably arises less in Lisp than it would in, say, Java, because (as I commented above) it's quite common in Lisp to treat lists as functional collections; while it's certainly possible for you to modify a list I've passed to you, you're more likely to be aware that that could be a bad idea. Still, it could come up with a vector or hash table.

3.1 Some style considerations

I expect there will be two primary ways that FSet gets used; I'll call them "FSet-light" and "FSet-heavy". In the "light" mode of use, FSet types and operations will be sprinkled into code — perhaps, code that was originally written without FSet — but the code will mostly use Lisp collections. In the "heavy" mode, on the other hand, code uses FSet for most or all of its collections.

There are some choices to be made based on this difference. The major one is whether to import the external FSet symbols into your package. This is clearly desirable for FSet-heavy code (see the defpackage :fset-user form in fset.lisp for an example of how to do this, including a list of symbols you will need to shadowing-import from either fset or cl). For FSet-light code, on the other hand, it may be better not to do this, but to refer to FSet symbols with an explicit fset: package prefix; this is particularly true when FSet is being added to existing code (and maybe doubly true on a large project with multiple developers).

A small choice is whether to use lookup or @. In FSet-light code you should probably use lookup as the name will be easier to recognize, but in FSet-heavy code I think you may as well use @ (I do).