Compiling queries without EVAL or COMPILE

When I saw Gary King's post about using EVAL to match strings against expressions, I had to write something about alternative strategies.

Peter Seibel has a good followup about compiling. He writes:

What he should have done is written a compiler. Luckily Common Lisp comes with a Lisp compiler built in so all we have to do to write our own compiler is translate our source language into Lisp.

I agree that Gary should have used a compiler, but there's a problem with Peter's alternative: the Common Lisp compiler is usually quite a heavy thing to invoke.

Gary's example problem can be solved without using EVAL or COMPILE. The trick is to produce a tree of closures that implement the semantics of the expression. cl-ppcre, for example, uses this strategy to great effect, and SICP also covers it in detail.

What does it mean to compile to a tree of closures? It means that for a given form, a closure is produced that calls a particular subsequent function if the form is successful (whatever that means) and another subsequent function if the form fails. The user is given the root of the tree, and invoking it with a target string will work through the functions in the tree until a success or failure function returns.

This can minimize the work by using the short-circuiting semantics of operators like AND and OR, without using AND and OR directly.

Here's Gary's example expression:

  '(or (and "space" "mission")
       (and (or "moon" "lunar") "landing")))

Here's a dispatching function that looks at a form and calls a specific closure creator depending on the kind of the form:

(defun make-matcher-aux (form success failure)
  (etypecase form
    (cons
     (ecase (first form)
       (or (make-or-matcher (rest form) success failure))
       (and (make-and-matcher (rest form) success failure))))
    (string
     (make-string-matcher form success failure))))

There are three different creator functions to implement.

For strings, I'm going to stick with Peter's simplification of the string matching function:

(defun find-word-in-string (word string)
  (search word string :test #'char-equal))

String forms are successful if FIND-WORD-IN-STRING returns true:

(defun make-string-matcher (string success failure)
  (lambda (target-string)
    (if (find-word-in-string string target-string)
        (funcall success target-string)
        (funcall failure target-string))))

AND-forms are successful if all the subforms are successful. This can be recursively defined like so:

(defun make-and-matcher (forms success failure)
  (if (null forms)
      success
      (make-matcher-aux (first forms)
                        (make-and-matcher (rest forms) success failure)
                        failure)))

At first glance it may not be obvious how the semantics are implemented. The key parts are:

  • an empty AND-form is successful (base case)
  • the success closure for the first form's closure requires the success of the rest of the AND forms
  • the failure closure is passed along; that means failure happens as early as the first closure

OR-forms are successful if any of the subforms are true. This can be recursively defined like so:

(defun make-or-matcher (forms success failure)
  (if (null forms)
      failure
      (make-matcher-aux (first forms)
                        success
                        (make-or-matcher (rest forms) success failure))))

In contrast to AND-forms, OR-forms break down like this:

  • an empty OR-form fails
  • the success closure isn't changed; that means success can happen as early as the first closure
  • the failure closure is forgiving; it's a chain of the rest of the closures from the OR-form, so the testing can keep going

Finally, a front-end function starts the compilation going with very simple success and failure functions:

(defun make-matcher (form)
  (make-matcher-aux form (constantly t) (constantly nil)))

When these functions are all compiled, they each produce compiled closures. In contrast to Peter's solution, COMPILE is used only once, possibly in the distant past, to produce the closure-creator functions.

It's easy to add more operators. For example, here's the closure creator for a NOT operator:

(defun make-not-matcher (form success failure)
  (make-matcher-aux form failure success))

NOT just reverses the meaning of success and failure.

Here are some benchmarks on SBCL 1.0 on an iBook G4:

(defparameter *example-expression*
  '(or (and "space" "mission")
    (and (or "moon" "lunar") "landing")))

(defparameter *test-strings*
  '("The NASA lunar program was a smashing success."
    "The moon landing was a triumph of technology and elbow grease."
    "Unmanned space missions have also been very successful."
    "The shuttle program is a joke, though."))

(defparameter *peter-matcher* (compile-expression *example-expression*))
(defparameter *zach-matcher* (make-matcher *example-expression*))

(defun test-matching (count fun)
  (dotimes (i count)
    (dolist (string *test-strings*)
      (funcall fun string))))

(time (test-matching 1000 *peter-matcher*))
Evaluation took:
  0.345 seconds of real time
  0.296151 seconds of user run time
  0.02539 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  2,990,080 bytes consed.

(time (test-matching 1000 *zach-matcher*))
Evaluation took:
  0.334 seconds of real time
  0.287856 seconds of user run time
  0.025078 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  2,998,272 bytes consed.

The matching speeds of the two compiled functions are comparable. The big difference, though, comes when you actually produce the compiled functions:

(time (dotimes (i 1000)
        (setf *zach-matcher* (make-matcher *example-expression*))))
Evaluation took:
  0.004 seconds of real time
  0.001744 seconds of user run time
  6.3e-4 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  122,760 bytes consed.

(time (dotimes (i 1000)
        (setf *peter-matcher* (compile-expression *example-expression*))))
Evaluation took:
  5.763 seconds of real time
  4.560934 seconds of user run time
  0.631804 seconds of system run time
  [Run times include 0.128 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  92,549,416 bytes consed.

This result is partly because SBCL has a pretty slow compiler. Other compilers vary, but they're all between 100 and 1000 times slower at COMPILE-EXPRESSSION.

MAKE-MATCHER is pretty simple. In a more flexible system, you could move work out to generic functions that specialize on the form type and the operator of the form. Chris Riesbeck talks about this in Graham Crackers, using Graham's raytracer as an example.

As Peter mentioned, it would also be helpful to keep track of which strings have been seen, and their presence or absence in the target, to avoid making multiple tests.

The main point, though, is that avoiding EVAL by using COMPILE is partly an exercise in shuffling a big cost from runtime to a smaller cost at compile-time. The closure creation scheme not only matches the run-time performance of COMPILE, it reduces the compile-time cost as well, sometimes dramatically. When compile efficiency is important, it's a handy tool to have in the toolbox.

If you want to experiment with the different approaches, gwking.lisp includes the code from Gary, Peter, me, and adeht, along with a little bit of benchmark code.

Tags:

Comments

this is the best approach i've seen so far. thanks xach

July 2014

S M T W T F S
  12345
6789101112
13141516171819
20212223242526
2728293031  
Powered by LiveJournal.com