This page is intended for anyone interested in extending FSet.
1
and the floating point number 1.0
represent mathematically equal values, yet are distinct. The natural
treatment is to consider them equivalent, though I guess nothing goes
wrong if we arbitrarily consider the integer to be less than the float.
A better example is when we're arranging for sets of objects to have a
desired canonical order, but our true equivalence relation is still
object identity — we occasionally have duplicate objects which we
want to treat as distinct.)
I don't know how often that happens, but it seemed to me that it
might happen occasionally, and so, because my goal was to make FSet as
broadly useful as possible, I decided to go to the effort to support
equivalent-but-unequal elements. In one case this was critical: some of
the FSet/Java classes take their ordering from the hash values, in case
the element class doesn't implement Comparable
at all; and
of course there's no guarantee that hash values won't collide. In the
remaining cases, it's not clear how important it is. Well, time will
tell.
That said, it should be possible to do a pretty good job of this abstraction in Lisp using macros, and it might be interesting to see the result. Anyone who feels inspired to tackle this is welcome to do so.
The really right solution, seems to me, would be to code the algorithms in some more abstract formalism that, when combined with information about the implementation strategy, could be translated into the target language. This would be a very fun project but would have taken me quite far afield — and I really wanted to get this thing working so I could use it.
These fall into several categories.
No, a node is not just its set of out-edges; it has an identity. An alternative design, one suggested by the standard mathematical formulation, would be to use any set of distinct point objects, such as the natural numbers, to represent the nodes; then the graph can be represented as a map from nodes to (maps from edge labels to target nodes). This works pretty well, but still has one problem: as new versions of the graph are constructed, some nodes may become unreachable from the root set of interest; yet mappings for those nodes will, by default, remain indefinitely. Consider that when one is using mutable graphs made of mutable nodes in the usual way, the garbage collector takes care of unreachable nodes; now, because we are no longer representing the graph as language-level objects connected by pointers that the GC understands, we no longer get the benefit of GC for our graph. So, we have to GC it ourselves, probably by occasionally copying it starting from a root set (which we will have to compute explicitly).
While GCing our graphs ourselves could work well enough, there is possibly a better solution involving weak data structures. (I refuse to use the misnomer "weak pointer" despite its prevalence — weakness is not a property of the pointer, but of the location in which the pointer is stored.) If we could use weak functional maps to represent the graph, and use language-level objects to represent the nodes, then the built-in GC would work on our graphs.
But making a usable implementation of a weak functional map seems to be highly nontrivial. First, the primitives provided by languages for constructing weak data structures vary, and in most cases appear to be inadequate to our task. This goes doubly for Common Lisp, where the standard does not provide for them at all; and though most major implementations that do have some support for them, the nature of that support varies. (For a good overview of what's possible, see this paper by Bruno Haible.) Secondly, tree data structures don't like to have their keys disappear; those keys are needed to navigate the tree! There may be a solution to this problem using lazy cleanup of subtrees containing deleted keys, but it will definitely complicate FSet's algorithms.
I'm sure we can think of more...
ext:freeze-type
proclamations, at least
on the structs declared in wb-trees.lisp
. Also, a few spots
may deserve dynamic-extent
declarations.The last work I've done on FSet/Java was a few years ago, before the
release of Java 1.5. I see that since then, some new collection
implementations have been added. Of these, the most interesting is
ConcurrentSkipListSet
. By providing thread-safe update without
locking, this class eliminates one of the motivations for using FSet sets.
Fair enough. I still think the stylistic benefits of functional collections
make them superior in most cases. It will be interesting to see whether this
argument gains any traction.
extends
wildcard types (? I think).Collection
that need to be implemented (I think there
are).NavigableSet
etc.
interfaces.It looks like we'll have to maintain separate versions for Java 1.4, 1.5, and 1.6. Yuck.
(As of November 2008, it appears unlikely that FSet/Java will ever see the light of day, but let me know if you're interested in it.)
I think my priorities for new implementations would be in roughly this order:
The upshot of all this is that while I would love to see an FSet implementation for Python — which is probably the second most commercially important semi-functional language, after Java — it sounds like persuading the Python powers that be to endorse FSet (e.g., by adding it to the standard library) is going to be very hard.
TreeMap
),
but they
are of relatively limited functionality. Anyway, FSet is definitely in
the spirit of the language, and I would bet the maintainers would be
delighted to add an FSet implementation to the standard library, if
someone were to write one.