2012-11-01

Generics

Recently at RacketCon, I gave a talk about the new Generics library that ships with Racket 5.3. In this blog post, I’ll offer a slightly expanded commentary about the new library. For the ten-minute video explanation, see the Youtube video. The accompanying slides are available here.

Introduction

Probably the first question that comes to your mind is: what are generics and what do we mean by generic programming? Let me illustrate with some example code:
> (vector-ref #(1 2 3) 2)
3
> (list-ref '(1 2 3) 2)
3
Both of the lines above operate on sequence-like datatypes, but for each of the datatypes we use different functions: vector-ref and list-ref. This is also the case with other datatypes like dictionary-like datatypes: hash-ref, assoc, etc. Or all the different kinds of equality: string=?, boolean=?, and =. These specialized operations may seem redundant. Ideally, we’d have generic functions that don’t care about the specific datatypes that we operate over.

Thankfully, Racket does provide these. For dictionaries, we have functions like dict-ref and dict-set that operate over any kind of dictionary-like type. For sequences, sequence-ref, sequence-append, and so on. These generic interfaces are all built-in to the standard library. You might wonder, however, if you can define your own generic functions.

As of version 5.3, you just need to (require racket/generic) to get all the tools you need to define your own generic interface. In the rest of the article I’ll show you how to define your own generic interface and how to use it. If you’ve seen Rust’s traits or Clojure’s protocols, our design will feel familiar.

Examples

The running example will be the implementation of a simple queue interface. Our queue will contain five operations: queue-enqueue, queue-dequeue, queue-head, queue-empty?, and queue-length.

The first thing to do is to require the library:
> (require racket/generic)
Then we use define-generics to define a generic interface:
> (define-generics queue
    [queue-enqueue queue elem]
    [queue-dequeue queue]
    [queue-head queue]
    [queue-empty? queue]
    [queue-length queue])
The method headers above define the methods that concrete implementations of the generic interface need to provide. Each header needs to contain at least one argument that matches the name of the interface; this argument will be used for dispatch. The form defines several identifiers: queue?, gen:queue, queue/c, and each of the generic functions. The first is a predicate that checks whether a given value implements the interface. The identifier prefixed with gen: is a binding that’s used to supply methods for the interface. The queue/c specifies a contract combinator for the interface, which I’ll describe later in the article.

We now have the generic functions, but they’re not very useful without concrete implementations. To implement a generic interface, you first need to define a structure type. We’ll define a simple functional queue (Okasaki 1998, p.42) that uses two lists:
> (struct simple-queue (front back)
    #:methods gen:queue
    [; helper function to balance lists
     (define (check-front queue)
       (match queue
         [(simple-queue '() back)
          (simple-queue (reverse back) '())]
         [_ queue]))
     ; enqueue an element
     (define (queue-enqueue queue elem)
       (match queue
         [(simple-queue front back)
          (check-front (simple-queue front (cons elem back)))]))
     ; dequeue an element
     (define (queue-dequeue queue)
       (match queue
         [(simple-queue (cons x xs) back)
          (check-front (simple-queue xs back))]))
     ; get the head of the queue
     (define (queue-head queue)
       (match queue
         [(simple-queue (cons x xs) back) x]))
     ; check if the queue is empty
     (define (queue-empty? queue)
       (empty? (simple-queue-front queue)))
     ; get the queue's length
     (define (queue-length queue)
       (+ (length (simple-queue-front queue))
          (length (simple-queue-back queue))))])
> (define empty-queue
    (simple-queue '() '()))
Using the #:methods keyword and the gen:queue binding, we can specify the methods that a simple-queue implements. Note that a #:methods block may also contain helper functions (like check-front) and definitions that are used to define the methods. Each method has the same method header as the corresponding headers in the interface definition.

We can check that our new queue actually works with the generic functions:
> (queue-head (queue-enqueue empty-queue 5))
5
> (queue-empty? empty-queue)
#t
> (queue-length (queue-enqueue (queue-enqueue empty-queue 7) 5))
2
It works! For any structure type, we can define methods in the same way. For example, we can define an efficient persistent queue (Okasaki 1998, p.64) that implements the same methods. This time, the implementation will use lazy evaluation with streams:
> (struct persistent-queue (front-len front back-len back)
    #:methods gen:queue
    [; helper function to balance lists
     (define (check queue)
       (match queue
         [(persistent-queue front-len front back-len back)
          (if (<= back-len front-len)
              queue
              (persistent-queue
               (+ front-len back-len)
               (stream-append front (stream-reverse back))
               0 stream-null))]))
     ; enqueue an element
     (define (queue-enqueue queue elem)
       (match queue
         [(persistent-queue front-len front back-len back)
          (check (persistent-queue
                  front-len front
                  (+ 1 back-len) (stream-cons elem back)))]))
     ; dequeue an element
     (define (queue-dequeue queue)
       (match queue
         [(persistent-queue front-len front back-len back)
          (check (persistent-queue
                  (- front-len 1) (stream-rest front)
                  back-len back))]))
     ; get the head of the queue
     (define (queue-head queue)
       (match queue
         [(persistent-queue front-len front back-len back)
          (stream-first front)]))
     ; check if the queue is empty
     (define (queue-empty? queue)
       (= 0 (persistent-queue-front-len queue)))
     ; get the queue's length
     (define (queue-length queue)
       (+ (persistent-queue-front-len queue)
          (persistent-queue-back-len queue)))])
> (define empty-persistent-queue
    (persistent-queue 0 stream-null 0 stream-null))
Our operations from before work as expected:
> (queue-head (queue-enqueue empty-persistent-queue 5))
5
> (queue-empty? empty-persistent-queue)
#t
> (queue-length (queue-enqueue (queue-enqueue empty-persistent-queue 7) 5))
2

Contracts

Earlier, I mentioned that the generic interface also comes with a contract form that’s automatically defined. You can use these to attach dynamic checks to your implementations.

For example, we can write a contract that restricts our queues to only accept integers as data elements:
> (define int-queue/c
    (recursive-contract
     (queue/c [queue-enqueue (-> int-queue/c integer? int-queue/c)]
              [queue-dequeue (-> int-queue/c int-queue/c)]
              [queue-head    (-> int-queue/c integer?)]
              [queue-empty?  (-> int-queue/c boolean?)]
              [queue-length  (-> int-queue/c integer?)])))
For the queue interface, the automatically defined queue/c combinator allows us to specify contracts on each of the methods in the interface. We also use a recursive contract here just so that we can reference the int-queue/c contract within itself.

We can apply the contract to a particular queue:
> (define/contract checked-queue
    int-queue/c
    empty-queue)
> (queue-enqueue checked-queue 42)
#<simple-queue>
> (queue-enqueue checked-queue "not an integer")
checked-queue: contract violation
 expected: integer?
 given: "not an integer"
 in: the 2nd argument of
     the queue-enqueue method of
      (recursive-contract
       (queue/c
        (queue-enqueue
         (-> int-queue/c integer? int-queue/c))
        (queue-dequeue
         (-> int-queue/c int-queue/c))
        (queue-head (-> int-queue/c integer?))
        (queue-empty? (-> int-queue/c boolean?))
        (queue-length (-> int-queue/c integer?))))
 contract from: (definition checked-queue)
 blaming: top-level
 at: eval:15.0
The second use of queue-enqueue causes a contract error as expected, since we can’t add a string to an integer queue. You can also provide a constructor for your integer queue that’s contracted to produce int-queue/cs. Any queues created with that constructor will be checked for integers.

Also, you might have noticed that the queues we wrote above don’t protect against dequeueing or taking the head of empty queues. To prevent this, we can write contracts that ensure these operations raise contract errors on empty queues. Since we want to enforce this for all queues instead of just some of them, we apply contracts to the generic functions:
> (define non-empty-queue/c
    (flat-named-contract
     'non-empty-queue
     (λ (q) (and (queue? q) (not (queue-empty? q))))))
> (define/contract (checked-dequeue queue)
    (-> non-empty-queue/c queue?)
    (queue-dequeue queue))
> (define/contract (checked-head queue)
    (-> non-empty-queue/c any/c)
    (queue-head queue))
> (checked-head empty-persistent-queue)
checked-head: contract violation
 expected: non-empty-queue
 given: #<persistent-queue>
 in: the 1st argument of
      (-> non-empty-queue any/c)
 contract from: (function checked-head)
 blaming: top-level
 at: eval:20.0
The checked-head function raises a contract error as expected instead of an exception from a stream function. In a real implementation, you would just export the original generic functions with contracts attached using contract-out instead of defining checked versions like we did here.

Summary

Racket 5.3 has made the process of defining and using generic interfaces much easier. The new library is still under active development and we plan to experiment with additional features and performance improvements. The full code from this article can be found in the following gist: https://gist.github.com/3995200

Bibliography

Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, 1998.

1 comments:

Unknown said...

Interesting stuff. These days I find data + interfaces all I really need to in terms of "object orientation".

Given all this, is it possible to define new sequence methods in a similar way?