Clojure Overloads the Term "Namespace"

Published on Saturday, 2013-08-10

Just a quick PSA on a topic I don’t think I’ve heard about anywhere else: the term “namespace” in Clojure.

“Namespace” means at least two things in Clojure, and they often overlap. The first meaning is the one you’re likely thinking of: the namespaces by which we organize code, that vars live in. This is the sort of namespace referred to by the clojure.core functions and macros that have ns in the name (ns, in-ns, ns-map, the-ns, *ns*, …), as well as several that contain namespace spelled out. The notable exception is clojure.core/namespace itself.

user> (doc namespace)
-------------------------
clojure.core/namespace
([x])
  Returns the namespace String of a symbol or keyword, or nil if not present.

This actually doesn’t strictly have anything to do with the first meaning of “namespace”. It refers to the part of symbols and keywords preceding the / if present. E.g.,

user> (namespace :foo)
nil
user> (namespace 'bar)
nil
user> (namespace ::bam)
"user"
user> (namespace :flaz.bang/chizzle)
"flaz.bang"
user> (namespace `taco-sam)
"user"
user> (namespace #'first)
ClassCastException clojure.lang.Var cannot be cast to clojure.lang.Named
user> (namespace second)
ClassCastException clojure.core$second cannot be cast to clojure.lang.Named

What tends to confuse the issue is that though the two don’t have to have anything to do with each other, they often do. For example, if I enter:

user> (clojure.string/trim "  HOITS")
"HOITS"

I’ve written code that syntactically contains a symbol with the namespace "clojure.string", and the compiler uses this when resolving the symbol to figure out that it needs to look for a var interned under the name trim in the clojure.string namespace. But we all know that we may well have done something like this:

(ns user
  (:require [clojure.string :as s]))

(s/join " and " ["you" "me" "me" "you"])
;; => "you and me and me and you"

In this case we’re using the symbol s/join in the code, and even though we know that it’s referring to the #'clojure.string/join var, the actual namespace of the symbol in the code is not clojure.string:

(namespace 's/join) ;; => "s"

The compiler is letting us use this symbol even though there is no namespace (in the normal sense) called "s".

The namespace function actually complements the name function:

(for [x [:foo 'bar :baz.bang/toots 'fizzle/wizzle]]
  [x (namespace x) (name x)])

([:foo nil "foo"]
 [bar nil "bar"]
 [:baz.bang/toots "baz.bang" "toots"]
 [fizzle/wizzle "fizzle" "wizzle"])

In summary, both symbols and keywords can have “namespaces” which may or may not match an existing “code-namespace” (and there are uses for all four possibilities).



Self-referential Seqs Considered Dangerous

Published on Saturday, 2013-07-27

There was a lively conversation on freenode’s Clojure channel a few days ago about strange behavior involving self-referential chunked seqs.

The normal gotcha with chunked seqs is that more elements of the sequence may be realized than you expect, which is usually only noticeable if your lazy sequence has side-effects associated with its realization:

(take 3 (map print (range)))

;; returns (nil nil nil) and prints
;; 012345678910111213141516171819202122232425262728293031

Note that the return value here ((nil nil nil)) is as expected, since print returns nil, but the associated side effects are not.

The discussion centered around an example where the return value itself is unexpected, due to the sequence being self-referential. A self-referential immutable data structure is only possible with some flavor of delayed evaluation – in this case, we use lazy seqs and vars. Haskell achieves self-referentiality using the lazy nature of the entire execution model, but while this is a common tactic in Haskell, it is not a common one in Clojure (for reasons that should become apparent).

Self-referential Vanilla Seqs

For a simple example of the Clojure approach, we can try to define the Fibonacci series self-referentially:

(def fibs (list* 0 1 (map + fibs (rest fibs))))
;; IllegalArgumentException Don't know how to create ISeq from:
;; clojure.lang.Var$Unbound  clojure.lang.RT.seqFrom (RT.java:505)

This fails when the compiler tries to evaluate (list* 0 1 (map + fibs (rest fibs))) before assigning any value to #'fibs. The var exists, but has an “Unbound” value, and the rest function is trying to treat it as a seq right away.

We can try to delay evaluation a little bit by wrapping the expression in lazy-seq:

(def fibs2 (lazy-seq (list* 0 1 (map + fibs2 (rest fibs2)))))

(take 10 fibs2)
;; StackOverflowError   user/fn--4171 (NO_SOURCE_FILE:1)

This one surprised me for a few minutes. We successfully suppress any errors at compile time, but when we try to get some of the elements the same kind of issue as before pops up – when we try to evaluate the body of the lazy-seq expression, the rest function is trying to walk a step or two into the sequence, which requires evaluating the body of the lazy-seq expression, which requires…

Okay, so since the culprit is the rest function being a little too eager, presumably we can just wrap that bit in lazy-seq to get the behavior we need:

(def fibs3 (list* 0 1 (map + fibs3 (lazy-seq (rest fibs3)))))

(take 10 fibs3)
;; IllegalArgumentException Don't know how to create ISeq from:
;; clojure.lang.Var$Unbound  clojure.lang.RT.seqFrom (RT.java:505)

This one I was sure would work. Note that (take 2 fibs3) works fine here, so the exception is not thrown until the lazy-seq internal to the map function is being forced. The culprit this time is not the third argument to map but the second – since fibs3 is unbound at the time the arguments to map are being evaluated, the thing passed to map is the “unbound” value which doesn’t change later on. So we need to delay that as well:

(def fibs4 (list* 0 1 (map + (lazy-seq fibs4)
                             (lazy-seq (rest fibs4)))))

(take 10 fibs4)
;; (0 1 1 2 3 5 8 13 21 34)

Phew. It’s worth mentioning that we can cover both args at once by lifting the lazy-seq up around the call to map:

(def fibs5 (list* 0 1 (lazy-seq (map + fibs5 (rest fibs5)))))

(take 10 fibs5)
;; (0 1 1 2 3 5 8 13 21 34)

So self-referential seqs are possible if you’re careful about what gets evaluated when. If this weren’t argument enough to avoid the technique, it gets worse when we introduce chunked sequences.

Self-referential Chunked Seqs

Chunked sequences are less lazy, because they realize their elements in groups of 32 for better performance. Due to implementation details this leads to problems when an element of a sequence depends on the preceding element and both elements happen to be part of the same chunk. This might not be so bad if it blew up like in the previous examples, but surprisingly we can silently get incorrect results.

I had a hard time coming up with a “natural” example of this, so instead we’ll have to settle for a contrived example. We’ll define a sequence of numbers where all but the first several numbers are filtered from (range) by checking if the element of the sequence at the index (rem n 10) is a number (which it always should be).

(def foos
  (list* -3 -2 -1
            ;; we expect this function to always return true
    (filter #(number?
               ;; Using nth here instead throws an
               ;; IndexOutOfBoundsException.  Why is left
               ;; as an exercise for the reader.
               (first (drop (rem % 10) foos)))
            (range))))

(take 20 foos)
;; => (-3 -2 -1 0 1 2 10 11 12 20 21 22 30 31 32 33 34 35 36 37)

We expect the result to be (-3 -2 -1 0 1 2 3 4 5 ...), but instead we see a gap from 3 to 9. Just to verify that this is caused by chunking, we can tweak the code to use (iterate inc 0), which is an unchunked equivalent of (range):

(def foos2
  (list* -3 -2 -1
    (filter #(number?
               (first (drop (rem % 10) foos2)))
            (iterate inc 0))))

(take 20 foos2)
;; => (-3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16)

What’s going on is that the first chunk of (range) is realized and run through the filter function before the chunk being returned from filter is properly populated, and as a result the expression (first (drop (rem % 10) foos)) is returning nil. So the effect is the elements of the sequence temporarily appear as if they were nil.

As indicated in a comment above, sometimes this sort of thing can instead cause an IndexOutOfBoundsException, and certainly in a lot of numeric cases an unexpected nil will mean a NullPointerException. So it won’t always silently give incorrect results.

Self-reference is an innocent-enough technique in Haskell due to the fully-lazy execution model. Clojure’s partial laziness provides a handful of various ways to screw yourself up, and so you’re much better off avoiding self-reference altogether.

Epilogue

It’s worth distinguishing the sort of self-reference talked about above from a function that returns a lazy sequence which is internally dependent on the return value from a separate (recursive) call of the same function. A great example of this is computing primes, which by one strategy requires having a list of smaller primes to test for divisibility.

We can do this with a function that returns a list of primes (using defn instead of def), which internally calls itself. You still have to be careful with this kind of recursion (see the comment in the code for how chunking is a danger here), but I think it’s a little easier to reason about and avoids some of the self-reference problems discussed above.

(defn primes
  []
  (cons 2
    (lazy-seq
      ;; don't make the recursive call until the
      ;; lazy-seq is forced
      (let [ps (primes)]
              ;; using (drop 3 (range)) here gives a
              ;; StackOverflowError due to chunking
        (for [n (iterate inc 3)
              :let [root (Math/sqrt n)]
              :when (->> ps
                         (take-while #(<= % root))
                         (filter #(zero? (rem n %)))
                         (empty?))]
          n)))))

(nth (primes) 9999) ;; => 104729

Even less problematic (and actually idiomatic) is a function that returns a lazy-seq and uses a recursive call for the tail, like this simplified version of filter:

(defn filter
  [func coll]
  (lazy-seq
    (when-let [[x & xs] (seq coll)]
      (if (func x)
        (cons x (filter func xs))
        (filter func xs)))))

Thanks to @ltw_ for helping this all to make more sense.



Object-Oriented Qubits

Published on Tuesday, 2013-07-23

I recently found out about the Quipper quantum programming language (hat tip to @senorflor) and was inspired to write a Clojure library that allows playing with Qubits at the repl. In contrast to my quantum circuits app, which only allows ahead-of-time design of entire circuits, this library lets you create qubits on the fly and interact with them in an open-ended way.

The part I thought was the most interesting was how appropriate common object-oriented techniques were for modeling the qubits. Some of the worst aspects of popular object-oriented design are hidden state and unexpected side effects. But both of these turn out to be perfect for qubits. Qubits have more state information than is possible to extract (so it’s in a sense fundamentally hidden), and the act of observing them will generally change their state. Not only that, if the qubit is entangled with other qubits, observing one will change the state of the others as well.

For modeling this in Clojure I used a deftype with a ref in one of its fields (none of the code here includes the actual math of quantum computation, which is buried away in the data/-prefixed namespace):

(deftype Qubit [name system])

(defn init-system
  "Creates a singleton system with the qubit's value 0,
   and sets the qubit's ref to that system."
  [^Qubit q]
  (let [system (data/single-qubit-system q 0)]
    (dosync
     (alter (.system q) (constantly system)))))

(defn qubit
  "Creates a fresh qubit, with a name if supplied."
  ([] (qubit (gensym "qubit-")))
  ([name']
     (doto (->Qubit (name name') (ref nil))
       (init-system))))

When two qubits become entangled, their internal “systems” are merged, and both of their refs are pointed to the same merged system, which is what allows them to have internal references to each other:

(defn update-system-pointers!
  "Given a system-map, updates all the .system refs of the :qubits
   list to point to that map."
  [system]
  (doseq [^Qubit q (:qubits system)]
    (alter (.system q) (constantly system))))

(defn merge-systems!
  "Updates the system properties of the qubits so that they are all
  together."
  [qs]
  (dosync
   (let [systems (distinct (map (fn [^Qubit q] @(.system q)) qs))]
     (when (> (count systems) 1)
       (let [system (reduce data/merge-systems systems)]
         (update-system-pointers! system))))))

(defn single-qubit-gate-fn
  "Given a gate definition [[a b] [c d]], returns a function that
   takes a primary qubit and optional control qubits and executes
   the gate on it."
  [gate]
  (fn [^Qubit q & controls]
    (dosync
     (when (seq controls)
       (merge-systems! (cons q controls)))
     (let [new-system (data/apply-single-qubit-gate gate
                                                    @(.system q)
                                                    q
                                                    controls)]
       (update-system-pointers! new-system)))
    q))

Observation is just as side-effecting as applying a gate:

(defn observe
  "Returns 0 or 1."
  [^Qubit q]
  (dosync
   (let [[outcome new-system] (data/observe @(.system q) q)]
     ;; if the qubit was previously entangled, detangle it
     (if (> (count (:qubits new-system)) 1)
       (let [new-system-1 (data/factor-qubit-from-system new-system q)
             new-system-2 (data/single-qubit-system q outcome)]
         (update-system-pointers! new-system-1)
         (update-system-pointers! new-system-2))
       (update-system-pointers! new-system))
     outcome)))

I doubt this approach would be useful for doing anything serious related to quantum computation, but for the sake of learning how qubits behave I think it could be promising. The rest of the code is on Github.



Motivation

Published on Saturday, 2013-07-20

I’ve avoided blogging about software development despite doing it professionally for several years now, because it didn’t fit with what I wanted my original blog to be about, and I’ve developed a high enough bar for posting on that blog that it would be inconveniently prohibitive.

So today I’m throwing this blog up to give myself an outlet when necessary. Topics will include everything I do professionally, programming-language, and miscellaneous hacker stuff.

Quality will be questionable.