What Is a Functional Datatype? (Modern Common Lisp with FSet)

Next: Is Lisp a Functional Language?, Up: Conceptual Background [Contents][Index]

3.1 What Is a Functional Datatype? ¶

You may have noticed that I refer to FSet collections as “functional” rather than “immutable”. There is more to functional semantics than mere immutability: functional collections also support “functional update” operations: operations that take a collection and return a new collection that is different in some specified way. Some “immutable” collections libraries do not provide these.

Functional collections are therefore also persistent, in the data-structure sense: since performing a functional update returns a new collection, the collection in its previous state still exists and can be operated on further if desired. (The term “persistent” is also used for data written to nonvolatile storage; this meaning is not intended here.)

To develop an exact definition of functional collections, or functional datatypes in general, let me start with some simple examples in C++ and Java. The first could be in either language:

    int a = 3;
    int b = a;
    ++a;     // b is unchanged

Simple, right? Everyone knows that incrementing a doesn’t affect b. This is because the type int has an immediate implementation: there is no way that b could be an alias of a, because the assignment copies the value itself, not a pointer to the value. We say that int has value semantics.

Now consider this, in Java:

    int[] a = { 3 };
    int[] b = a;
    a[0] = 4;    // b is also affected

Here we see a case where assignment creates an alias to a mutable object, so that mutations to the object are visible through the alias. This example demonstrates reference semantics.

Interestingly, to have reference semantics, a type must be aliasing (meaning that assignment creates an alias) and mutable. C++, for instance, has collection types that are mutable but not aliasing:

    std::vector<int> a{3};
    std::vector<int> b = a;
    a[0] = 4;   // b is unchanged!

Why is b not an alias of a? Because std::vector redefines assignment to copy the contents. So even though the implementation is pointer-mediated — as opposed to immediate, like int — the class arranges for assignment not to create an alias. (Of course, this being C++, you could create an alias using an explicit reference or pointer, but let’s leave that aside. And yes, C++ pedants, I am overlooking the distinction between initialization and assignment, which doesn’t matter for this discussion.)

Curiously enough, the result is that (in the absence of explicit references or pointers) the standard C++ collections have value semantics despite being mutable! I said above that the Java int[] example shows a case where assignment creates an alias to a mutable object. Here we see that both aspects are required: the type must be aliasing and mutable to have reference semantics; otherwise, it has value semantics.

Let’s have another look at that first example:

    int a = 3;
    int b = a;
    ++a;     // b is unchanged

How do we want to describe the action performed by the third statement? The usual way is to say that ++a is an abbreviation for a = a + 1; and that’s valid. But interestingly, it’s equally valid to consider int to be a mutable but unaliasing type, like std::vector. We could just as well say that the increment operation is mutating a in place; in fact, that’s often how we think about it, and it’s also very likely to be how it’s implemented at the machine-instruction level. We see, then, that if a type is unaliasing, the mutable/immutable distinction no longer has the force we expect: it no longer determines whether the type has reference or value semantics.

So the first requirement for a functional datatype is simply that it have value semantics.

Okay, back to Common Lisp and FSet. Collections have to have pointer-mediated implementations because they’re of arbitrary size, and Common Lisp gives us no way to redefine assignment. So aliasing is forced on us. (This is true of most languages.) To get value semantics, then, the FSet types must be immutable. As we will see, though, FSet provides operations that superficially resemble mutations; in practice, you can mentally pretend that FSet types are mutable but unaliasing, like std::vector, although that’s not how they’re actually implemented — the apparently mutating operations are constructing a new representation and assigning it back to the variable holding the collection.

If that sounds odd to you as a CL programmer, let me assure you that it is not. Consider this example:

    (let* ((a (list 4))
          (b a))
      (push 3 a))   ; b is unchanged

Here the push form expands to (setq a (cons 3 a)), which obviously doesn’t touch b. Although Lisp lists are technically a reference type, since cons cells can be mutated, in practice they are rarely mutated once they are fully constructed. (By “fully constructed” I’m referring to the fact that lists are often built backwards and then reversed in place with nreverse; usually, once that reversal is done, the list is fully constructed.) So Lisp lists are almost always used as a functional type, and the language supports this usage fairly well.