2007-12-29

The Design of Extended Exercises

One of the highlights of the TeachScheme! method is to create Extended Exercises. Several of these pepper How to Design Programs, and even more have been created since to deal with a variety of interesting problem scenarios (e.g., illustrating graphics via t-shirt design, explaining networking by having machines play roles in a theatrical play, demonstrating communication with foreign sites by processing data from a microfinance institution, etc). Through an Extended Exercise a student learns about how computer science connects to domains, develops practice building programs incrementally, learns to build earlier assignments that later assignments can depend on, and so forth.

Here is a preliminary articulation of some principles that I think govern a good Extended Exercise, with an emphasis on their “form factor”.

  • Pick a domain. Whether the domain looks inward (a computing activity such as networking) or outward (such as social networking) doesn't matter. If it does look inward, try to make it more applicable through the judicious use of data (the same exercise can look very dry or very applied depending on what data you choose). For instance, our networking exercises is presented in terms of Shakespeare's Hamlet.
  • If necessary, write a Teachpack. A domain almost certainly needs a Teachpack to reduce the programming burden on students. For instance, the microfinance exercise uses a Teachpack to hide the ugly details of screen-scraping (which in turn need to be constantly updated by a vigilant maintainer).
  • Try to provide a non-trivial dataset in the Teachpack. Good data can make an assignment more enjoyable—e.g., our networking example provides an excerpt from Act 2, Scene 2 of Hamlet (“What a piece of work is man!”)—and in cases where the exercise depends on connecting to an external site (e.g., the microfinance example), the data may be essential.
  • Structure the assignment to have five to eight questions: not much fewer (too few steps) nor much more (too much to grasp).
  • Try to decompose the problem using good principles of stepwise refinment, using your own wisdom in these matters. By showing students several such examples, we hope for them to build up an intuition for the process. Your decomposition may not be strictly linear; that's okay. But it should be progressive.
  • Perhaps the most important point: At every step, try to have a full, working application. That means the Teachpack may need to export several interfaces, each one taking more parameters and accepting more functionality than the previous one. Otherwise the student needs to have all the parts working before they can understand whether even one works in context, leading to a frustrating learning experience and encouraging wanton hacking as they try (and invariably fail) to quickly get to a working system.
  • Design interfaces carefully to make judicious use of first-class functions. It is inevitable that students will need to provide functions (not just flat data) to what your Teachpack exports. Show them the invocations of your Teachpack in terms of named functions (that they define).
  • Try to provide a few extra-credit routes for ambitious students. Options include letting students peel back even more of the Teachpack, or adding interesting features.

PLT Scheme v372

PLT Scheme version 372 is now available from http://download.plt-scheme.org/
This is mostly a bug-fix release.
Changes:
  • DrScheme now supports name completion via Ctl-/ (Windows and X) or Cmd-/ (Mac OS X). Completion is sensitive to the current language in DrScheme, but it is not sensitive to lexical bindings.
  • DrScheme's stepper now supports the "check-expect", "check-within", and "check-error" forms of the testing.ss teachpack.
  • A number of bug fixes and small improvements for ProfessorJ. The grammar for the current release slightly differs from the one in HtDC.

Feedback Welcome.

2007-12-19

Your security hole is my fun hack, or: computing factorial in DrScheme with a click-powered loop.

One of the many changes in v4.0 is to close a security hole in DrScheme. Specifically, DrScheme v371 lets the program in the definitions window get a hold of the editor containing said program and manipulate it programmatically. There are lots of bad things one might do with this fact, like circumventing DrScheme's protections and cause it to crash, or even spontaneously exit.

But, we can do something even more fun. Put the following program into a DrScheme window (in v371) and set the language to the mzscheme/textual language. Change "input" to whatever number you wish to compute the factorial of and then hit the Run button until your program transforms itself into the final result.

(define input 10)
(require (lib "mred.ss" "mred") (lib "class.ss"))
(let* ([ed (let-syntax ([m (λ (stx) (with-syntax ([x (syntax-source stx)]) #'x))])
             (m))]
       [mth (regexp-match 
             #rx"^; ([0-9]+) ([0-9]+)" 
             (send ed get-text 0 
                   (send ed paragraph-end-position 0)))]
       [lckd (send ed is-locked?)])
  (send ed begin-edit-sequence)
  (send ed lock #f)
  (if mth
      (let ([n (string->number (list-ref mth 1))]
            [acc (string->number (list-ref mth 2))])
        (send ed delete 0 (send ed paragraph-end-position 0))
        (if (= n 1)
            (begin (send ed delete 0 (send ed paragraph-end-position 0))
                   (send ed insert (format "~a\n#|" acc) 0)
                   (send ed insert "\n|#" (send ed last-position)))
            (begin (send ed delete 0 (send ed paragraph-end-position 0))
                   (send ed insert (format "; ~a ~a" (- n 1) (* n acc)) 0 0))))
      (send ed insert (format "; ~a 1\n" input) 0))
  (send ed lock lckd)
  (send ed end-edit-sequence))

2007-11-12

Getting rid of set-car! and set-cdr!

Functional is Beautiful

Scheme is a “mostly functional” language. Although Schemers don’t hesitate to use set! when mutation solves a problem best, Scheme programmers prefer to think functionally. Purely functional programs are easier to test, they make better and more reliable APIs, and our environments, compilers, and run-time systems take advantage of functional style.

A Schemer’s functional bias is especially strong when writing programs that process and produce lists. The map function, which does both, is a thing of beauty:

  (define (map f l)
   (cond
     [(null? l) '()]
     [else (cons (f (car l)) (map f (cdr l)))]))

The map function is most beautiful when the given f is functional. If f has side-effects, the the above implementation over-specifies map, which is traditionally allowed to process the list in any order that it wants (though PLT Scheme guarantees left-to-right order, as above). Arguably, when some other Schemer provides a non-functional f, then it’s their problem; they have to deal with the consequences (which may well be minor compared to some benefits of using mutation).

The map function might also receive a non-list, but the map implementor can guard against such misuse of map by wrapping it with a check,

  (define (checked-map f l)
    (if (list? l)
        (map f l)
        (error 'map "not a list")))

and then exporting checked-map instead of the raw map. This kind of checking gives nicer error messages, and it helps hide implementation details of map. We could further also imagine that the raw map is compiled without run-time checks on car and cdr.

The Problem with Mutable Pairs

What if someone calls checked-map like this?:

  (define l (list 1 2 3 4 5))
  (checked-map (lambda (x)
                 (set-cdr! (cddr l) 5))
               l)

The f provided to map in this case is not purely functional. Moreover, it uses mutation in a particularly unfortunate way: the list? test in checked-map succeeds, because the argument is initially a list, and the mutation is ultimately discovered by a call to cdr --- but only if checks haven't been disabled.

If you’re a Schemer, then unless you’ve seen this before, or unless you thought a bit about the title of this section, then you probably didn't think of the above test case for map. A Schemer’s view of lists is so deeply functional that it's hard to make this particular leap.

Furthermore, this example is not contrived. If you have either Chez Scheme version 6.1 or a pre-200 MzScheme sitting around, calling map as above leads to a seg fault or an invalid memory access:

  Chez Scheme Version 6.1
  Copyright (c) 1998 Cadence Research Systems

  > (define l (list 1 2 3 4 5))
  > (map (lambda (x) (set-cdr! (cddr l) 5)) l)

  Error: invalid memory reference.
  Some debugging context may have been lost.

The map example illustrates how mutable pairs can break a Schemer’s natural and ingrained model of programming. Of course, if optimizing and providing friendly error messages for map were the only issues with mutable pairs, then it wouldn’t matter; Scheme implementors are smart enough to (eventually) get this right. Unfortunately, the underlying problem is more pervasive.

In the API for a typical Scheme library, lists can be used for many kinds of input and output. Flags for options might be provided in a list. A function might provide information about the current configuration (e.g., the current items in a GUI list box) in a list. Procedures or methods that deal gracefully with list mutation are few and far between. In most cases, the result of unexpected mutation is merely a bad error message; sometimes, however, unexpected mutation of a list can break the library’s internal invariants. In the worst case, the library whose internal invariants are broken plays some role in a system’s overall security.

Mutable lists also interfere with the language’s extensibility. The PLT Scheme contract system, for example, offers a way to wrap an exported function with a contract that constrains its input and outputs, which are optionally (in principle) enforced by run-time checks. Higher-order contracts, such as “a list of functions that consume and produce numbers”, require wrappers on sub-pieces, and these wrappers can be installed only by copying the enclosing list. Copying a mutable list changes the semantics of a program, however, whereas contracts are supposed to enforce invariants without otherwise changing the program. Copying an immutable list creates no such problem.

Finally, mutable lists make the language’s specification messy. The R6RS editors spent considerable energy trying to pin down the exception-raising guarantees of map; the possibility of mutable pairs made it difficult to provide much of a guarantee. The standard says that implementations should check that the lists provided to map are the same length, but it’s not worth much to require that check, since an argument’s length as a list can change via mutation to the list’s pairs.

Switching to Immutable Pairs

The designers of PLT Scheme long ago recognized the problems of mutable pairs, and we introduced functions like cons-immutable and list-immutable to support programming with immutable lists. These additions solved some problems --- but only in the cases where we were careful to use immutable lists. The R6RS editors also recognized the problems of mutable pairs, so that set-car! and set-cdr! were banished to their own library --- but programmers are still free to use that library.

While these are worthwhile steps for many reasons, they do not solve the underlying problem. Library implementors who deal in lists must still either set up elaborate guards against mutation, pretend that the problem doesn't matter, or require the use of a special immutable-list datatype that is incompatible with libraries whose authors set up elaborate guards or ignore the problem.

Why all this hassle? If most Scheme code really does use and expect pairs in a functional way, can't we just switch to immutable pair? Most Scheme code will still work, untold security holes will have been closed, specifications will become instantly tighter, and language extensions like contracts will work better.

Schemers have been reluctant to make this leap, because it has never been clear just how much code relies on mutable pairs. We don’t know how much the switch will cost in porting time and long-term incompatibility, and we don’t really know how much we will gain. We won't know until we try it.

For PLT Scheme v4.0, we’re going to try it. In our main dialects of Scheme (such as the mzscheme language), cons will create immutable pairs, and pair? and list? will recognize only immutable pairs and lists. The set-car! and set-cdr procedures will not exist. A new set of procedure mcons, mcar, mcdr, set-mcar!, and set-mcdr! will support mutable pairs. (A related v4.0 change is that define-struct by default creates immutable structure types.)

Of course, PLT Scheme v4.0 will support an R5RS language where cons is mcons, and so on, so many old programs can still run easily in the new version. The difference is that interoperability between R5RS libraries and PLT Scheme libraries will be less direct than before.

Experience So Far

PLT Scheme v3.99.0.2 exists already in a branch of our SVN repository, and it will soon move to the SVN trunk. That is, we have already ported at least a half million lines of Scheme code to a dialect without set-car! and set-cdr!.

The conversion took about eight hours. Obviously, relatively little code had to change. The following are the typical porting scenarios:

  • The reverse! and append! functions were frequently used for “linear updates” by performance-conscious implementors. As our underlying Scheme implementation has improved, however, the performance benefits of these functions has become less. All uses could be replaced with reverse and append.

  • The set-cdr! operation was often used to implement an internal queue. Such internal queues were easily changed to use mcons, mcar, mcdr, and set-mcdr!.

  • An association-list mapping was sometimes updated with set-cdr! when a mapping was present, otherwise the list was extended. Since the extension case was supported, it was easy to just update the list functionally. (The relevant lists were short; if the lists were long, the right change would be to use a hash table instead of a list.)

  • A pair was sometime used for an updatable mapping where a distinct structure type is better. The quick solution was to throw in a mutable box in place of the value.

The PLT Scheme code might be better positioned for the switch than arbitrary Scheme code. Most of it was written by a handful of people who understood the problems of mutable pairs, and who might therefore shy away from them. However, the PLT Scheme code base includes a lot of code that was not written specifically for PLT Scheme, including Slatex, Tex2page, and many SRFI reference implementations. With the exception of SRFI-9, which generalizes set! to work with pairs, the SRFI implementations were remarkably trouble free. (Thanks to Olin Shivers for making mutation optional in the “linear update” functions like reverse! from SRFIs 1 and 32.)

In addition, we looked at a number of standard Scheme benchmarks, which can be found here:

http://svn.plt-scheme.org/plt/trunk/collects/tests/mzscheme/benchmarks/common/

Of the 28 benchmarks, eight of them mutate pairs. Four of those are trivially converted to functional programs, along the lines of the scenarios above. One, destruct, is designed specifically to test mutation performance, so it makes no sense to port. Another, sort1, is a sorting algorithm that inherently relies on mutation; a functional sort is obviously possible, but that would be a different benchmark. The conform benchmark uses mutable pairs for tables in a relatively non-local way; as a modern Scheme program, it would probably be written with structures, but it’s not trivial to port. The peval benchmark uses pairs to represent Scheme programs, and it partially evaluates the program by mutating it, so it is not trivial to port. To summarize, out of 28 old, traditional benchmark programs, only two represent interesting programs that are not easily adapted to immutable pairs. (They run in PLT Scheme’s R5RS language, of course.)

Finally, we selected a useful third-party library that is not included with PLT Scheme. We checked the generic SSAX implementation (not the PLT Scheme version), and we found a couple of uses of set-car! and set-cdr!. Again, they fall into the above queue and association-list categories that are easily and locally converted.

Meanwhile, as we start to use v3.99 to run scripts in our day-to-day work, immutable pairs have so far created no difficulty at all. So far, then, our optimism in trying immutable pairs seems to be justified; it just might work.

But Its Lisp Tradition!

A typical response to news of the demise of mutable pairs is that it will create lot of trouble, because mutable pairs are Scheme tradition, and surely lots of useful old code exploits them in lots of places.

We’re eager to hear whether anyone has such code. Our initial hypothesis is that practically all old code falls into one of two categories:

  • The code is easily ported to immutable pairs, along the same lines as above (i.e., local queues and small association lists).

  • The code so old and generic that it can be run as an R5RS program. It won’t call into the large PLT Scheme set of libraries that will expect immutable pairs, and it can easily be used as a library with wrappers that convert mutable pairs back and forth with immutable pairs.

Frankly, we’re not so eager to hear opinions based on guesswork about existing code and how it might get used. Download v3.99 from SVN or as a nightly build when it becomes available; let us know your guesses about how running your old code would go, but then let us know what actually happens.

The immutable-pairs plan for v4.0 is not set in stone, but we won’t make the decision based on guesswork. More libraries (other than R5RS) to aid compatibility may be useful, but so far we don’t have a tangible need for them. In any case, we’ll revert to mutable pairs only if significant experience with the pre-release version demonstrates that it really won’t work.

2007-09-14

Don't say "abstract" (instead say "general")

The word "abstract" is common in computer science. An abstract thing is one where some part of the whole is unspecified. For instance, the expression "3*x + 3" is an abstraction of the expression "3*4+3", because the "x" is unspecified. Likewise, a function is an abstraction over some set of values, supplied when the function is called.
The word "general" is not at all common in computer science. In non-computer-science use, the word "general" is used to describe things that may be applied to more than one thing or situation. For instance, a "more general solution" is one that applies not just to the problem at hand, but instead to a larger set of problems.
From a computer science perspective, things that are abstract are also general. Things that are general are also abstract. Substituting the word "general" for the word "abstract" would not be a terrible hurdle.
From a non-computer-science perspective, however, "general" and "abstract" have very different implications. Something that is general is better: it is more useful, it applies more frequently. Something that is abstract, though, is worse: it is lacking detail, it is non-concrete.
This is one difference--the major difference?--between computer science (and of course mathematics) and the real world: the abstract is no less concrete. We can abstract over expressions using functions, and we can even abstract over syntactic things, using hygienic macros. The result of such abstraction is a perfectly well-defined element in our universe of expressions.
In computer science, then, the pejorative sense of the word "abstract" is misleading, and the use of the terms "abstract" and "abstraction" merely provides ammunition for those who wish that we could all still be writing assembly language.
I suggest instead the use of the word "general."
John "purveyor of barbarous neologisms" Clements

2007-09-09

Completions in DrScheme (finally)

DrScheme now supports a language- sensitive (but not lexical- scope sensitive) completion feature. Type <menukey>-/ and see what names are available to finish off the word you're typing.

Thanks to Jacob (and do follow that link; we all need a little more love in our lives) and Mike for taking the initiative to actually implement what is probably the most requested feature in DrScheme at the moment.

2007-09-06

How many occurrences of car in the PLT source code?

Lets play a guessing game. See who can guess:

  • How many occurrences of the identifier 'car' there are in the PLT tree (when using 'read' and just counting the symbols that come out)?
  • Where does 'car' rank on the list of the most commonly used identifiers?
  • What is the most common identifier, and how many occurrences of it are there?
UPDATE: The two files raw-hattori and raw-kajitani.ss are generated files containing solutions to Paint by Numbers problems and about 30,000 occurrences of x and o. Discounting them, this is the list of the top ten identifiers and the number of occurrences:
((define 25294)
 (quote 24101)
 (lambda 18883)
 (let 14796)
 (send 14349)
 (x 11877)
 (if 11118)
 (... 8474)
 (car 7610)
 (syntax 6537))
The identifier cdr ranks 21st with 5,259 occurrences, let* has 3,066 which, when combined with let comes out at 17,862, still not enough to pass lambda. Speaking of combining, λ has 2,271 occurrences, which is also not enough to move lambda. Finally map comes in 32nd with 3,853 occurrences and foldl beats out foldr (1168th place with 75 occurrences vs 1451st place with 58 occurrences).

2007-09-03

Birthday Easter Eggs in DrScheme

DrScheme has five birthday easter eggs in it, one for each of the main contributers to the PLT Scheme infrastructure (Matthias, Matthew, Eli, Shriram, and me). I put four of them in there, and mostly concentrated on making them fun. Matthew added mine and the best part of that one is figuring out on earth it shows up (it is quite tricky to find the code that actually makes that one appear).

I don't want to ruin the fun of searching for the Easter Eggs yourself, but just to get you started, do have a look at plt/collects/framework/private/bday.ss for Matthias, Matthew, Shriram, and Eli's birthdays. Mine is July 2nd.

Happy Hunting!

2007-08-22

New Debugger Features

As Eli mentioned, v371 introduces support for debugging several files at a time, as well as new buttons for stepping Over and Out of expressions in the debugger.

Debugging across multiple files is easy. Start by opening the "main" file that you want to debug and all of the files it requires (directly or indirectly) that you want to debug along with it. Then click Debug in the main file's frame. For example, if I wanted to see what the FrTime dataflow engine (in frp-core.ss) does when a particular program (say demo-module.ss) runs, I would open these two files and click Debug in the frame for demo-module.ss.

As each required file loads, DrScheme offers the option of debugging it. If you choose "yes", then the file is included in the debugging session, so you can set breakpoints and step into it. (Note that this will make the code in the file run more slowly, and single-stepping at calls to its functions will bring you into it.) A file can only participate in one debugging session at a time, so if you're already debugging it with some other program, DrScheme will tell you so (instead of asking whether to debug it). For best results, all of the files you debug should be modules. Once a file is included in the debugging session, you can set breakpoints and step into it as if you were debugging it by itself.

As soon as you can debug programs that span several files, it's particularly valuable to be able to do more than set breakpoints and single-step. This is the motivation for the new Over and Out buttons, which are also quite simple. If the execution marker is at the start of an expression that's not in tail position, then you can step over the entire expression, which is equivalent to setting a one-shot breakpoint at the end of the expression and continuing. (If you've set breakpoints inside the expression, or inside any functions it calls, then execution may suspend before reaching the end.) Likewise, if execution is suspended and the current expression is evaluating within a debugging-enabled context, then you can step out to the innermost such context. This would be difficult to simulate by hand, since you'd need to keep track of recent callers.

At any given point, either or both of the Over and Out buttons may be disabled, but over the course of a session they can eliminate a lot of tedium.

The screenshot above shows a session debugging frp-core.ss as used by demo-module.ss. Execution is suspended on a right paren, so stepping Over is disabled, but we see the expression's value at the upper left, we've moused over b to see its value at the upper right, and it's possible to step Out.

2007-08-18

PLT Scheme v371

PLT Scheme version 371 is now available from http://download.plt-scheme.org/ This is mostly a bug-fix release. Changes:

  • The debugger now works across multiple files and supports "step over" and "step out" operations.
  • HtDP teachpacks: the world.ss teachpack now exports two add-line functions: one from image.ss and one for adding lines to scenes.
  • ProfessorJ now includes a language level between Intermediate and Advanced, Intermediate + access, that includes all of Intermediate and introduces access modifiers and overloading. The language manuals contain the complete details.

Feedback Welcome.

2007-08-07

PLT Modules and Separate Compilation

For my summer job this year, I'm programming in Common Lisp; this is the first time I've used the language for anything more than toy examples. The experience has given me new appreciation for the PLT module system and how it enables separate compilation.

Lisp has a package system, of course, but it's not the same thing. It's primarily a tool to make sure that the symbols in one part of the program don't collide with the symbols in another part (unless you ask them to). Packages aren't about abstraction: while you can specify which symbols are exported from the package and which aren't, that's just a suggestion that's not enforced by the language.

(You'll notice, by the way, that I used the word "symbol" and not "identifier," which is the more common term in the study of programming languages, in the previous paragraph. That's deliberate: the Lisp package system works on symbols, not identifiers, so it also affects quoted, literal symbols. In my experience, this is sometimes helpful, sometimes a real pain, and usually completely unexpected. But that's a topic for another post.)

Also, there's no real relationship between Lisp packages and files. One package can be spread across multiple files, and one file can contain code in several different packages.

All this means that separate compilation in Lisp is a real problem. There is a system, ASDF, that attempts to address this need. (For more details, consult the closest thing to a homepage that I could find for ASDF.) I'm no expert on ASDF, but essentially the programmer specifies the dependencies between source files, in a set of files that exist parallel to the Lisp source. (ASDF does support grouping source files into larger chunks and specifying dependencies between those chunks, but as far as I can tell that's largely a convenience thing.)

The key thing for separate compilation, of course, is the dependencies. With ASDF, the programmer specifies those manually, and then ASDF basically does a topological sort such that if file a depends on file b, then ASDF ensures that a is compiled and loaded before b is compiled, and again before B is loaded. (This should start sounding a little familiar to folks who've worked in the area where PLT's modules and macros intersect.)

So far, so good. Unfortunately, there are a couple of problems with this setup. First, the dependencies between files are specified outside the language. This means that, if you happen to forget one, the results are not well-defined. If ASDF happens to choose an order that's consistent with the dependency you left out, everything will just work, and you won't have any indication that there's a problem. If, however, it doesn't, then you'll get random "undefined function" and "undefined symbol" errors---if you're lucky (at least in SBCL, the implementation of Common Lisp that I use at my job). In PLT, by contrast, inter-module dependencies are part of the language, so the compiler will always give you an undefined-identifier error when it tries to compile a module in which you've forgotten a require form. Big win, in my opinion (although we could argue about whether this should be an error or a warning, and whether the compiler should report lots of errors or just one before giving up completely).

Second, because ASDF lives outside the compiler, it can't be very smart about how macros affect separate compilation. I don't fully understand this, perhaps because the folks who've been mentoring me at my job haven't thought it worth the time to explain it to me fully. But it appears that, if you change a macro that's used in other files, or change a function that's called by a macro at expansion time, you have to do the effect of a make clean in a distressingly large number of cases. This is a real problem when you've got a large source base (~200K LOC, I think) and you're trying to speed up builds, as we are, and it's especially problematic if you're trying to run unrelated parts of the build in parallel.

I've certainly griped about the complexity of the interaction between PLT's modules and macros in the past. But after this summer, I have to say it's awfully nice to have a module system that Just Works for separate compilation. Nicely done, Matthew.

(I've pointed the folks at work at Matthew's ICFP 02 paper, but as that technique requires a lot of support from the compiler, and we don't have the resources to add the necessary support to SBCL ourselves, I don't know that it'll be more than a "wouldn't it be nice if we could do that?")

(Answer to rhetorical question in preceding paragraph: Yes. Yes it would.)

2007-08-06

macros and hygiene, resumed

The Friday entry demonstrates how to break hygiene for a macro that defines a generator. Ryan Culpepper, the local macrologist, reminded me that expanding into this macro goes wrong in the syntax-case world:

(define-syntax define-that-expands-into-define/y 
  (syntax-rules ()
    ((_ (name arg ...) body ...) 
     (define/y (name arg ...) body ...))))

(define-that-expands-into-define/y (bar)
  (yield 1)
  (yield 2)
  'finished)
Run this in Pretty Big [DrScheme] and you get a strange note concerning MrEd's yield or run it in MzScheme [Textual] and you get an error message about 'yield' being unbound. What gives? The 'stx' of datum->syntax-object is the syntactic context of the new macro but it doesn't bind yield; it just uses it. So the definition of yield in define/y must be a different one according to the hygiene standards. Ergo yield is free at the top-leve [MzScheme] or bound to the yield import from MrEd [Pretty Big]. ;; --- How can we try to fix this? The explanation suggests we use a different macro definition for define/y, one that uses a context that is guaranteed from the body of an instance of define/y:
(require (lib "control.ss"))

(define-syntax (define/y stx)
  (syntax-case stx ()
    [(_ (name arg ...) body0 body ...)
     (with-syntax 
         ((yield-name 
           (datum->syntax-object (syntax body0) 'yield)))
       (syntax
        (define (name arg ...)
          (define (yield-name x)
            (control resume-here
             (set! name 
                   (lambda ()
                     (prompt (resume-here 'dummy))))
             x))
          (prompt body0 body ...))))]))

(define-syntax define-that-expands-into-define/y 
  (syntax-rules ()
    ((_ (name arg ...) body ...) 
     (define/y (name arg ...) body ...))))

;; --- try it out ---

(define-that-expands-into-define/y (bar)
  (yield 1)
  (yield 2)
  'finished)

(list (bar) (bar) (bar) (bar))
Run it. You will find that it works as expected. Tomorrow, time permitting, I will tell you what's wrong with it and how you can fix it.

2007-08-03

control and macros

After reading the posts on control operators, Vlado Zlatanov decided to look into prompt, control, fcontrol and the rest of the goodies in control.ss. So based on the example from the blog post I did this python-like snippet:

(define/y (step) 
  (yield 1)
  (yield 2)
  (yield 3)
  'finished)
He decided to look into turning it into a macro, such that the above ends up being correct code. When he got stuck, he asked on our mailing list and the resulting dialog was so informative that I decided to blog it. My first replay was this suggestion:
(define-syntax define/y
  (syntax-rules ()
    [(_ yield-name (name arg ...) body ...)
     (define (name arg ...)
       (define exit-with #f)
       (define (switch-control-context th)
         (call/cc 
          (lambda (k)
            (set! exit-with k)
            (th))))
       (define (yield-name x)
         (call/cc 
          (lambda (resume-here)
            (set! name 
               (lambda () 
                 (switch-control-context 
                  (lambda () 
                     (resume-here 'dummy)))))
            (exit-with x))))
       (switch-control-context (lambda () body ...)))]))
I sent this out with two suggestions. First, use control.ss to simplify the code. Second, use syntax-case to eliminate the need for the programmer-user of define/y to specify the name of yield. So, here is the prompt-based code:
(require (lib "control.ss"))

(define-syntax define/y
  (syntax-rules ()
    [(_ yield-name (name arg ...) body ...)
     (define (name arg ...)
       (define (yield-name x)
         (control resume-here
            (set! name
                  (lambda ()
                    (prompt (resume-here 'dummy))))
            x))
       (prompt body ...))]))

(define/y yield (step) 
  (yield 1)
  (yield 2)
  (yield 3)
  'finished)

(equal? '(1 2 3) (list (step) (step) (step)))
This time I include a test case that assures the proper return behavior of yield. The definition of define/y shows how to mark the return point with prompt and how to switch to this point with control so that your generator can resume the traversal at the place where it was interrupted. For the second challenge, I wrote this definition:
(require (lib "control.ss"))

(define-syntax (define/y stx)
  (syntax-case stx ()
    [(_ (name arg ...) body ...)
     (with-syntax 
         ((yield-name (datum->syntax-object stx 'yield)))
       (syntax
        (define (name arg ...)
          (define (yield-name x)
            (control resume-here
             (set! name 
                   (lambda ()
                     (prompt (resume-here 'dummy))))
             x))
          (prompt body ...))))]))

(define/y (step) 
  (yield 1)
  (yield 2)
  (yield 3)
  'finished)

(equal? '(1 2 3) (list (step) (step) (step)))
If you compare the two macro definitions, you notice very little difference. Indeed, what really differs is the "interface" (the API), that is, the way you can use the macro: see the test case. What also differs is that the definition uses syntax-case and with-syntax to inject yield into the body of define/y. In response, Vlado wrote "but isn't this non-hygienic." Here is my response:
Hygiene is a uniformity default imposed on the expander with a provision for programmers to choose the non-default. I chose this word carefully when I coined the phrase. So what you have *is* a hygienic solution.
In other words, injecting an identifier into a macro is not a violation of hygiene at all. It's just means using the full power of the macro system.

Experience Report: Scheme in Commercial Web Application Development

Our paper “Experience Report: Scheme in Commercial Web Application Development” is online. As the title suggests, it describes our experiences over the past year developing a number of web-based applications in PLT Scheme. If we'd chosen a language like Java or Ruby we could have used a large number of libraries developed for web apps, whereas PLT Scheme has relatively few libraries in this area, and they haven't been tested under high load. So we were gambling that Scheme would make us so productive we could develop our own libraries and the applications we were contracted to produce in the same time it would take to develop just the applications in another language. It was a gamble that paid off. You'll have to read the paper for all the details, but suffice to say we delivered the applications on time (and more are in development) and our libraries already compare well against big names like Ruby on Rails and J2EE.

On thing that got cut from the paper was our use of Flapjax is parts of the interface. If you write complicated Javascript to take a look at it. It really does simplify event handling, and our code using Flapjax is half the size of our original code without it.

Update: This is more or less the same post as on Untyping

2007-08-02

Relationally-Parametric Polymorphic Contracts

We've been making progress on the connection between types and contracts. This paper is a step towards answering the question, “What would polymorphic types (a la Standard ML) look like in a contract world?” If you haven't thought much about polymorphic types, you may find the answer has some subtlety; if you have, hopefully you will find the answer reasonable.

Arjun and I want to point out that some of the work in this paper was already in an earlier paper that Jacob and Robby wrote, but the material was excised from the public version, so we weren't aware of it. But there is some fresh material here as well, and anyway Robby and I have been gabbing about this question for years.

2007-07-30

control, resumed

Since at least some people helped me re-re-invent prompt after my last post, I thought I would remind people that PLT Scheme is the only production system in the world that provides delimited and (truly) composable continuations directly (and w/o loss of TCO properties). So here is the same fragment again:

(require (lib "control.ss"))

(define (generate-one-element-at-a-time a-list)
  (define (control-state)
    (for-each (lambda (an-element-from-a-list)
  (control resume-here
    (set! control-state resume-here)
    an-element-from-a-list))
              a-list)
    'you-fell-off-the-end-off-the-list)
  (define (generator) (prompt (control-state)))
  generator)
Take a look, compare and contrast with the previous post. Time permitting, I will continue to show you another control poem soon. P.S. See Adding Delimited and Composable Control to a Production Programming Environment for details.

2007-07-27

call/cc and self-modifying code

Today I wrote this short illustration of call/cc and posted it on wikipedia:

;; [LISTOF X] -> ( -> X u 'you-fell-off-the-end-off-the-list)
(define (generate-one-element-at-a-time a-list)
  ;; (-> X u 'you-fell-off-the-end-off-the-list)
  ;; this is the actual generator, producing one item from a-list at a time
  (define (generator)
     (call/cc control-state)) 
  ;; [CONTINUATION X] -> EMPTY
  ;; hand the next item from a-list to "return" (or an end-of-list marker)'
  (define (control-state return)
     (for-each 
        (lambda (an-element-from-a-list)
           (set! return ;; fixed
             (call/cc
               (lambda (resume-here)
                 (set! control-state resume-here)
                 (return an-element-from-a-list)))))
        a-list)
     (return 'you-fell-off-the-end-off-the-list))
  ;; time to return the generator
  generator)
It reminded of all the talk in the 1980s and 1990s that self-modifying code is bad. But look at the elegant assignment to control-state within its body. It's such a poem, I thought I'd share it with people since nobody else blogs here anywya.

2007-06-14

Small is Beautiful, Large is Useful, and Scheme is Both

They say, Scheme is small and this is good.

Have you heard of X? No? It is the smallest computational basis. It is a single function that can compute everything a Turing machine can compute; a Church lambda calculus; a Post model; a RAM; a what-have-you model of computation. Indeed, X is so simple that two equations suffice to specify it completely [Barendregt, page 166]. Imagine that: a complete language report in two lines; a compiler that fits in a few K instead of Ms; no more arguments about smallness.

Small alone can't be any good. If you used X alone, your programs would be the size of the universe or something like that. That's what the theory of computability teaches us [Church and Turing]. Adding LAMBDA and a few primitives to get a pure functional language isn't good enough either. That's what the theory of expressiveness shows [Felleisen; Mitchell; Riecke]. And, using an R5RS Scheme to build large systems with many people at a dozen sites isn't doable either. That's what the PLT experience determined.

When we set out to construct DrScheme using MzScheme, we also conducted an experiment:

Could we really build a graphical system that manages (shared) resources and that provides excellent error feedback with just plain Scheme?
Could we just add enough libraries to do all this? Or would we have to change the kernel of the language? As much as we tried to keep MzScheme small, it became clear quickly that we needed exceptions, structures, module-like features, a mechanism for concurrency, a way to manage resources such as windows, tcp connections, and so on. The list isn't infinite but it is much longer than I expected. Our "Revenge of the Son of the LISP machine" paper is a good summary for the state of the art around 1999 [Flatt and Son].

As language designers we stepped back time and again to look at our monster. What could we remove? What would we have to add in response? For some five years, we had first-class modules (units) and first-class classes in the core of the language. We had almost banned macros. They were so ugly I stopped teaching about them because I did want to use our own dog food in my courses but I couldn't stomach the macro system. It was such a step back from Eugene's extend-syntax. But then Matthew figured out the next big step in macro and module technology [Flatt, You Want It When?]. And with that out went units and classes from the core and many other things. So we learned lessons, and we need to keep building systems to learn more.

I have no question that the idea of Scheme is beautiful. At the same time, I have also learned that if I wish to use this beautiful idea in practice, I need to add the ingredients that it takes to build large systems. R6RS reflects this insight, and I am happy about it.

2007-06-09

R6RS is "perfect"

When I read the "side by side" and "head to head" descriptions of the alternatives facing the Scheme community (see Comp.Lang.Scheme and the R6RS mailing list), I am wondering which one is which and which one is better.

  • Is it really good that Scheme (the spec) doesn't support a module system?
  • Is it really good that almost all major implementations support their own version of a module system?
  • Is it really good that programmers can't even leave the module structure intact when porting code?

Imagine your own similar questions and add them here. We have lived in a side-by-side universe for a long time, and there are quite a few programmers who have suffered from this not-really-the-same-language problem. Besides the module system, there are other not-quite-the-same-but-related features that implementations have and programmers wish to use.

The R6RS process has pushed several major implementors/implementations to agree on a design for module systems and other constructs. Their report declares that they are ready to put a large amount of work in to get from r5rs to r6rs. I believe that this step would help the community in several arenas, listed in increasing order of relevance:

  1. the academic publishing business
  2. the fund raising business
  3. adapting each others innovations
  4. supporting programmers who learn on one and switch to another implementation
  5. supporting commercial programmers who need reassurance that there is more than one implementation and implementor [ever attended Commercial Uses of Functional Programming?]

Is the document perfect? Is every construct exactly the 'right thing'? Of course not! Guy and Gerry revised their first Scheme report because they didn't get it 'right'. R3RS and R4RS and R5RS revised flaws in R(n-1)RS because the authors/editors didn't get it 'right'. It is extremely difficult, and usually impossible, to get the design of a complex artifacts (such as a programming language) 'right' the first time. In these cases, it's all about the feedback loop and revising your design based on observations. (Remember the 'science' part in the name of our discipline?) Indeed, 'right' doesn't exist; what exists is 'most pragmatic and internally beautiful,' and nothing else.

Our choice is quite simple: move forward as a community with some amount of convergence (r6rs) or split into dozens of mutually incompatible sub-communities (status quo, including SRFIs).

Also posted as "R6RS is perfect" at the R6RS discussion list.

2007-05-22

PLT Scheme version 370

PLT Scheme version 370 is now available from

http://download.plt-scheme.org/

Some highlights:
  • The conservative garbage collector (CGC) has been replaced with a precise garbage collector (3m) in the standard build. For most users, this change simply amounts to "better performance in space and time". For example, a long-running DrScheme instance typically uses much less memory than before.

    The new memory manager also supports a new "Limit Memory..." option (in DrScheme's "Scheme" menu) to limit the memory use of a programming running inside DrScheme.

    For those who work with C-implemented libraries and extensions, the switch to precise collection may complicate interoperability. To a large extent, however, (lib "foreign.ss") works the same with both collectors. (But note that the 3m is a moving collector, so be careful with passing Scheme objects to C.)

    Although our pre-built binaries use the new collector, builds from source using the conservative collector are still supported.

  • For a program written with one of the the "How to Design Programs" (HtDP) languages, DrScheme saves the program with meta-information that identifies the language and records the teachpacks used by the program. DrScheme's teachpack GUI now works only with the HtDP languages. In other languages, use require to access teachpacks.

    The meta-information is in the form of a reader extension that turns the file content into a module-based program, which means that teaching-language files can be loaded directly into MzScheme or used with other PLT Scheme tools.

  • The HtDP "world.ss" and "image.ss" teachpacks have been revised, including support for the creation of animated GIFs.

  • Unit-based servlets are no longer supported in the web server. Use module-based servlets, instead. (Servlets can be implemented using a unit within a module, but the web server's API is provided through a module.)

  • A new (lib "unit.ss") library replaces the old one and provides a simpler and more flexible syntax. The (lib "unitsig.ss") library is deprecated but still available as (lib "unitsig200.ss"), and the old (lib "unit.ss") is available as (lib "unit200.ss").

Feedback Welcome,

The PLT Scheme Team

2007-05-16

Looking for small Scheme scripts

As part of the Typed Scheme project, we are looking for small Scheme scripts that we can port from PLT Scheme to Typed Scheme. We would like to investigate if Typed Scheme is capable of checking idiomatic PLT Scheme code, as represented by scripts written by members of the PLT Scheme community.

Therefore, if you have a simple PLT Scheme program which handles a scripting/processing task, and you are willing to share it with us for the improvement of Typed Scheme, please let me know. Typed Scheme currently handles all of 'core' MzScheme, as well as many of the collections (the major exceptions are the class and unit systems).

In return, we will inform you of any bugs that we discover during the port.

More information about Typed Scheme is available from the webpage: http://www.ccs.neu.edu/~samth/typed-scheme.html

2007-05-10

Adjusting DrScheme's Keybindings

Check out Kyle Smith's blog post on how to change DrScheme's keybindings:

XML Transformation in Scheme

Selenium is a tool for testing web applications, the core of which is a Javascript library that controls a web browser. With the Selenium IDE you can convert your actions in a web browser into tests, and with the Selenium Remote Control you can control a web browser from code. I've recently been working on adding Selenium Remote Control bindings to PLT Scheme, which has resulted in a nice and hopefully instructional demonstration of XML transformation in PLT Scheme

The Selenium Remote Control is controlled by sending simple messages over HTTP. The format of the messages isn't important. What is, is that there are a lot of them, and the API is specified in a file called iedoc.xml that comes with Selenium. The Java/Python/Ruby bindings are generated using XSL. If I was to use XSL I'd have a processing pipeline that uses three languages (XSL, Java, Scheme) which is two more than I'd like. Hence I turned to WebIt!, an XML transformation DSL written in Scheme, to create an all Scheme pipeline. The rest of this post wshows the steps I used to transform the Selenium API into Scheme code using WebIt! I think this is interesting in its own right, but also serves as a nice demonstration of the power of macros, which WebIt! makes extensive use of.

My first step is to get an idea of the structure of the XML. The bits I'm interested in look like this:

<function name="click">
  <param name="locator">an element locator</param>
  <comment>Clicks on a link, button, checkbox or radio button.
  If the click action causes a new page to load (like a link usually
  does), call waitForPageToLoad.</comment>
</function>

Let's read in the XML file and extract all the function elements. For this I'll use SSAX and SXPath:

(require
 (planet "ssax.ss" ("lizorkin" "ssax.plt" 1))
 (only (planet "sxml.ss" ("lizorkin" "sxml.plt" 1)) sxpath))

(define api
  (with-input-from-file "iedoc.xml"
    (lambda () (ssax:xml->sxml (current-input-port) '()))))

(define functions
  ((sxpath '(// function)) api))

Ok, so we have all the functions. Now let's parse them into a more useful datastructure. Here's my first attempt:

(require (planet "xml.ss" ("jim" "webit.plt" 1 5)))

;; struct function : string (listof string)
(define-struct function (name params))

;; parse-function : sxml -> function
(define (parse-function fn)
  (xml-match fn
    [(function name: ,name
               (param name: ,param-name ,desc) ...
               (comment ,_ ...))
     (make-function name (list param-name ...))]))

(map parse-function functions)

The xml-match macro is a pattern matcher for SXML. You specify the “shape” of the SXML, and if the input matches the pattern the following expressions are evaluated:

(xml-match value
  [(pattern expression ...)]...)

The simplified form of a pattern is:

  • (element ...) matches an element with the given name.
  • name: value matches an attribute with the given name and value.
  • ,binding binds the value of binding to the given name in the scope of the following expressions.
  • ... matches zero or more of the preceeding patterns.

In our example the pattern is:

     (function name: ,name
               (param name: ,param-name ,desc) ...
               (comment ,_ ...))

So we're looking for an element called function with an attribute called name the value of which is bound to name. Then follows zero or more param elements, with attribute name, the value of which is bound to param-name. Finally we expect a comment element which can contain any amount of data. The use of _ as the binding name is a common convention to indicate data we don't care about but must still match to make our pattern complete.

I run the code in DrScheme and see the result:

xml-match: no matching clause found

Oops. So our pattern isn't complete. We've also seen one flaw of WebIt!: it doesn't give very good error messages. However we can easily fix this by adding a catch all pattern that raises an error telling us what we failed to match. The code follows. Notice that I've also added pretty printing to make the unmatched SXML easier to read.

(require (lib "pretty.ss"))

;; parse-function : sxml -> function
(define (parse-function fn)
  (xml-match fn
    [(function name: ,name
               (param name: ,param-name ,desc) ...
               (comment ,_ ...))
     (make-function name (list param-name ...))]
    [,err (let ([op (open-output-string)])
            (pretty-print err op)
            (error (format "Didn't match ~n~a~n" (get-output-string op))))]))

Run this code and you'll see the error occurs as we don't allow the description to contain more than one element. This is easily fixed by extending the pattern to ,desc .... The next error is more interesting. The function element contains a return element. The WebIt! pattern language doesn't allows us to express optional patterns, so we have to duplicate our pattern and include the case of return. This also requires we extend the defintion of the function structure.

;; struct function : string string (listof string)
(define-struct function (name return params))

;; parse-function : sxml -> function
(define (parse-function fn)
  (xml-match fn
    [(function name: ,name
               (param name: ,param-name ,desc ...) ...
               (comment ,_ ...))
     (make-function name "void" (list param-name ...))]
    [(function name: ,name
               (return type: ,type ,return-desc ...)
               (param name: ,param-name ,desc ...) ...
               (comment ,_ ...))
     (make-function name type (list param-name ...))]
    [,err (let ([op (open-output-string)])
            (pretty-print err op)
            (error (format "Didn't match ~n~a~n" (get-output-string op))))]))

This works! This is as far as I want to go in this article. We've seen how we can use SSAX. SXPath, and WebIt! to create XML transforms in pure Scheme. There is a lot more to all of these packages but what we've used is sufficient for many uses. The rest of the code to create Scheme from the API is quite straightforward and specific to Selenium. If you're curious read the source of the Selenium PLaneT package, which will be released soon.

This post also appears on Untyping

2007-05-03

Macros Matter

Thank you Jens for setting up this Blog.

PLT Scheme is a 12-year old project now and it is definitely time to open it up to the world. The language and the project has contributed numerous ideas and products to the world. This covers programming languages (units, mixins, an implementation of cml-style concurrency, etc); programming tools (drscheme, check-syntax, transparent repls, module browsers, etc), programming pedagogy (htdp, htdc); program engineering (we resurrected the "expression" problem, web programming and continuations); and some more.

Time and again, people have asked me what I consider the one 'feature' that distinguishes us from the rest of the hordes of programming languages. I always respond with a single word:

                                            macros.

We have pushed macros hard, and we have accomplished a lot with them. I conjecture that without macros, we would never have achieved the level of productivity that this group displays.

Of course, everyone else in academia works on types. ML's module type system of the third kind and Haskell's system-complete type system are serious challenges to anyone. It is probably true that you shouldn't consider yourself a programmer if you can't read and write some of those type-laden programs, and I seriously believe that they are the next generation of influential languages.

For the generation-after-the-next then, I see "macros" as one of the big topics (next to concurrency). A real programmer will have to know how Lisp and Scheme-style macros can reduce labor by orders of magnitude, how macros provide the tools for creating the "ultimate abstraction" in the form of domain-specific and embedded languages (Hudak's words). And there is no better place to start with than PLT Scheme's macro system.

So I would like to dedicate this blog to all things macros and everything else that matters in (and to) PLT Scheme.