Full of ... umm ... ideas, yeah that's it

Well I've been stewing over the metawidget thing for nearly a year now, just from the time I first tried to implement it, using XML. Stewing on a lot of levels. I think it's an area where I can do something fairly cool with long-term value, without knowing a whole lot more than I already do, so I'm glad to be working on it. Anyway it's one of those fundamental problems... why bother working with distributed AI systems (like I wish I could be doing) if you can't even have a decent remote GUI.

And I've been going round and round on what language to use. XML is right out, I think, because it would require a separate language to describe client-side behavior. It's important to offload basic "view/control" behavior to the client, and not require so many round-trips to the server like X does. (By client, I mean the terminal at which you log in and do something, not in the X sense of the word.) In a browser-based world you could use JavaScript but I doubt that is really lightweight to implement, and it has limitations.

There was an Ask Slashdot about functional languages which got me looking into a bunch of them. Haskell is hard to understand, and there are others along that same branch of the tree, too, with various differences... and solutions to problems I didn't know I had.

Fundamentally, the most important requirements for the language I use to implement a remote GUI system are: portable, preferably ASCII; interpreted; able to load new code over a network and run it; wicked fast; and probably a modicum of OO support. So I considered Scheme. I even started writing a client using Guile linked to a C program which would be fleshed out to provide all the graphics functions. I managed to draw some Bezier curves from Scheme into an X window. Got stuck for a while as I learned more Scheme syntax; ordered a copy of SICP so I could get really good at it. (Why oh why didn't I get a CS degree instead of EE...) Wrote a Scheme "server" also, which would pass code across a network socket for the client to execute. The client logs in, and gets back the initial code it needs to get going. Subscribed to X "expose" events in the C code, and responded to them by calling a Scheme refresh method.

I discovered some MIT students had built Postscript-like graphics operators in Scheme. My guess is that this code converts a high-level stack of primitives into a low-level set of primitives which are guaranteed not to overlap, and are easy to paint. I'm speculating that a Postscript printer works this way too. If there had been enough memory in early printers to be able to have a complete bitmap image of the entire page, Postscript would be less complicated, because if you paint one object on top of another, you just go ahead and change the colors of the right pixels in the image. But if you instead break up the overlapping primitives into polygons which do not overlap and have only a single fill per polygon, then it would be possible to generate the pixels for a "row" of dots across the paper in more or less real time by just searching the entire list of polygons to see which ones interact with that row. So to save memory that's probably what they did. Anyhow this Scheme thing from MIT only generates Postscript output, so it's not necessarily that useful. Depends on how low a level it goes to.
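To make that speculation concrete, here's a toy sketch (in Python, with made-up names) of the row-at-a-time idea: once everything has been reduced to non-overlapping primitives (simplified here to axis-aligned rectangles), producing one row of dots is just a scan over the whole display list.

```python
# A guess at how row-at-a-time rendering could work once the page has been
# reduced to non-overlapping primitives; here a "shape" is just an
# axis-aligned rectangle (x0, y0, x1, y1, color). All names are illustrative.
def render_row(shapes, y, width):
    row = [None] * width                      # one scanline of pixels
    for (x0, y0, x1, y1, color) in shapes:    # shapes are assumed disjoint
        if y0 <= y < y1:                      # does this shape touch row y?
            for x in range(max(0, x0), min(width, x1)):
                row[x] = color
    return row
```

With real polygons you'd intersect each edge with the row to get spans instead of rectangle bounds, but the memory story is the same: you never need the whole page bitmap at once.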

Then I talked to a friend of mine who is a Scheme fan, and we discussed it for a couple of days, and somehow I started reconsidering Postscript. He seemed to think it would be the best language for this, for several reasons. I had a hunch that a stack-based language would be more efficient than Scheme anyway. He also has experience both writing parsers and doing graphics, and he wants to help. I brought up NeWS, since I had heard it was like Display Postscript with a built-in event model. Display Postscript, as used on NeXT and Unix workstations, and the Display PDF thing Apple is using on OSX, all depend on the underlying OS to provide the event model: the means by which the app finds out that the mouse moved or was clicked, or that a key was typed. DPS defines only how to draw things, and maybe how to change existing shapes that have already been drawn, but not how to get events back from the human input devices, nor how to handle such events in Postscript code.

After that I did some more web-hunting. I'd had an order form for an OpenLook documentation CD, which had some incidental NeWS info and sample programs, on my desk at work for several weeks. I was loath to order it because most of it wasn't about NeWS, and I suspected that what there was, was probably stuff I could find on the 'net somewhere. So a few days ago I did a really serious Internet search: web sites (tricky, because you have to do a case-sensitive search, otherwise "news" is such a common word that it finds way too much stuff), dejanews, and I even used archie. I only found a couple of tidbits, but they proved useful. One - the NeWS Book by James Gosling et al. is still in print. So I ordered a copy pronto. Two - it's easy enough to find test/demo programs on the FTP sites, but no actual implementations of NeWS itself. Three - it was common knowledge among programmers old enough to have used NeWS that X sucks in comparison, and that Sun's dumping it was a major setback to technological progress that has yet to be rectified. Four - one of these programmers tried to clone it. It was called rbuss; only version 0.1 was ever released, though. I contacted the author, and he graciously provided me a copy of it. So far I can't get it to compile; it was originally built on a Sun, not a Linux box, and there are a lot of #includes that are wrong and functions that shouldn't be overridden because they already exist, and so on. I tried for an hour or two, and haven't gotten back to it yet. Of course there is Ghostscript too, but I assumed it's too big to swallow whole, and it's known for being spaghetti code too... but maybe I can at least learn how to do some things, if I can find the places where they are done.

I want to get a chance to try a vintage implementation of NeWS because I've never actually seen it run, but it's hard to find. But by coincidence I saw a message on the Classic Computers mailing list about somebody who's got some old Suns to get rid of because he's moving. So I will probably end up with a couple of them that run SunOS 4.something, which should include NeWS; and the documentation and development tools too.

Well this weekend I read the NeWS Book, and it turns out NeWS is a lot like what I need to build. So much so that it's uncanny, in fact. It was designed for being able to put interaction code on the client. It's multithreaded (in a cooperative way that needs updating to a full preemptive threading model, IMO). It has arbitrary nesting of "canvases", which can either be windows or sub-panes within windows. It has all the power of Postscript; it is compatible, and adds on what's necessary for turning it into an interactive system. As I was reading the early parts of the book, I realized it would be easy to implement the concept of an "object" by putting both methods and variables into a dictionary (a fundamental concept in Postscript, sort of like a Hashtable in Java, except they get used everywhere). Because methods are just executable arrays of keywords and parameters, a "function pointer" is nothing special; it's just a pointer to an array. Postscript is almost fundamentally OO, except that in the base version there are only a few kinds of objects and you can't easily define more of them. But if you can fake it with a dictionary, that's good enough. Well, guess what? That's almost exactly what the NeWS system does. They even implemented classes and inheritance (I was thinking of cutting corners and just using "objects" directly... if you want another instance, think of it as cloning an existing object which is close to what you want, rather than "instantiating a class"). The built-in basic widget toolkit uses widget classes just like a modern OO toolkit, so you can subclass and override for custom behavior.
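The dictionary trick is small enough to show. Here's a minimal Python sketch (all names invented) of cloning in place of instantiation, with a method stored as just another value in the dict:

```python
# Faking objects with dictionaries, roughly the NeWS style: data and
# callable "methods" live side by side, and making a new instance means
# cloning an existing object that is close to what you want.
def clone(proto, **overrides):
    obj = dict(proto)        # shallow copy, like copying a Postscript dict
    obj.update(overrides)
    return obj

def send(obj, method, *args):
    # a "function pointer" is nothing special; it's just a value we look up
    return obj[method](obj, *args)

button = {
    "label": "OK",
    "paint": lambda self: "[" + self["label"] + "]",
}
cancel = clone(button, label="Cancel")
```

`send(cancel, "paint")` finds the inherited paint method but sees the overridden label, which is the whole game: prototypes instead of classes.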

So I guess I will have to get used to the wacky syntax of Postscript, because circumstances are leading me down this path just as much as my own ideas are.

There are two fundamental problems which have been my nemesis in my past graphics work. One is: given two paths consisting of an arbitrary collection of lines, arcs and Bezier curves, find their intersection. What makes it hard is that the method for finding the intersection of anything with a Bezier curve is not straightforward; you can do it algorithmically by trial and error, but not in constant time. My buddy Rob says maybe this would be a reason not to use these curves. But I really like how useful Bezier curves are in drawing things, and they are fundamental in Postscript, probably for good reason. Rob says that Knuth claimed that the Metafont system in TeX used quadratic curves instead because they are more efficient. I have used them, and they were not much fun to use interactively, and it's also not easy to convert certain kinds of geometric shapes (such as arcs) into that form. (Rob had this chore in a previous job where we worked together.) But in Postscript arcs are a separate type anyway, because a Bezier curve also cannot exactly reproduce an arc, although it can come very close. So there is still the question of whether I ought to try to preserve total Postscript compatibility in this new system, or just do whatever is most efficient. It seems like a shame to build something that is close but not quite the same, because Postscript is so widely used; there is a lot of legacy content, and there are applications, that would be easy to display. I thought maybe I could avoid solving this problem by never clipping shapes to anything but rectangular areas, but even doing that still involves finding the intersection of a Bezier curve with a horizontal or vertical line, and if I could do that, I could probably intersect it with other things too. On X, I could just convert every "canvas" into an X "window", and let it do the clipping. (It would be clipping polygons, because it can't render Beziers at all, so I would have to convert them to polygons, or individual dots.)
But it has to work on other systems besides X; and anyway doing this would force me to leave out a lot of cool stuff that Postscript and NeWS can do, such as shaped windows, clipping to curved shapes for artistic effect, etc. It would just end up being a big hole in the functionality.
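To illustrate the trial-and-error nature of the problem, here's a Python sketch of intersecting a cubic Bezier with a horizontal line by recursive de Casteljau subdivision. This is my own guess at a standard approach, not anything taken from a real interpreter; it converges nicely, but not in constant time:

```python
def _lerp(a, b, t):
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

def _split(p0, p1, p2, p3, t=0.5):
    # de Casteljau: split one cubic Bezier into two halves at parameter t
    a, b, c = _lerp(p0, p1, t), _lerp(p1, p2, t), _lerp(p2, p3, t)
    d, e = _lerp(a, b, t), _lerp(b, c, t)
    f = _lerp(d, e, t)
    return (p0, a, d, f), (f, e, c, p3)

def hline_intersections(curve, y, eps=1e-6):
    # x-coordinates where a cubic Bezier crosses the horizontal line at y.
    # The control points bound the curve, so if y is outside their range
    # the whole segment can be discarded; otherwise subdivide until the
    # remaining piece is smaller than eps.
    ys = [p[1] for p in curve]
    if min(ys) > y or max(ys) < y:
        return []
    xs = [p[0] for p in curve]
    if max(xs) - min(xs) < eps and max(ys) - min(ys) < eps:
        return [xs[0]]
    left, right = _split(*curve)
    return hline_intersections(left, y, eps) + hline_intersections(right, y, eps)
```

The same bounding-box-prune-and-subdivide idea generalizes to curve-vs-curve intersection (subdivide both until the boxes are tiny), which is part of why the running time can't be pinned down in advance.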

The second fundamental problem is how to stroke an arbitrary path. It's easy for a line segment, but it's hard to handle the corners where line segments meet. For Bezier curves, you either have to build two curves which are parallel, inside and outside (and that is also an intractable problem - I think I saw a proof on the web that it's impossible, in fact) or convert to line segments first, and stroke those. Now Rob tells me that the most efficient way to draw a Bezier curve is to use an error diffusion technique, and the result is individual dots, not line segments. So that's incompatible with the method which might have to be used to stroke the dang things. Postscript appears to use line segments, and then stroke those; I did this little experiment to learn more:

/box-stroke { 
        newpath
        moveto 
        0 72 rlineto
        72 0 rlineto
        0 -72 rlineto
        closepath
        20 setlinewidth
        strokepath
        0 setlinewidth
        stroke
} def
0 setlinejoin
100 100 box-stroke
1 setlinejoin
200 100 box-stroke
2 setlinejoin
300 100 box-stroke
Notice how there is some overlap, but the stroke of the line (which by default would just be a rectangle with the same length as the line) gets extended just enough to meet the next segment's stroke. At least for closed curves, each segment only does this on one end of itself. But not all angles are 90 degrees. Here's a little more complicated one:
/shape-stroke {
        newpath
        moveto
        0 72 rlineto
        10 -50 30 70 104 -20 rcurveto
        0 -72 rlineto 
        closepath
        10 setlinewidth
        strokepath
        0 setlinewidth
        stroke
} def
4 4 scale
20 80 shape-stroke
The Bezier curve obviously gets converted to line segments and then those are stroked. But the overlap is really complicated around the tight part of the bend. The miter joint where the left line meets the curve is exaggerated because of the acuteness of the angle. (Setting a shorter miterlimit would cause it to be chopped off completely; I experimented with that a little too.) I used Gimp to color what I think would be the fill areas for the bottom and left line-strokes. The curve has pieces which overlap the left line, and the left and bottom lines' strokes also overlap a little. But I'm not sure if the exaggerated miter sticking out is part of the curve's stroke, or the line's. Actually, it probably belongs to the line, because it appears the "far end" of each stroke (the second point on a line, or the fourth on a curve) is the one that tries to meet up with the next segment.
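As an aside, the size of that miter spike is pure trigonometry: the ratio of miter length to line width is 1/sin(theta/2), where theta is the angle at which the two segments meet, and that ratio is what Postscript compares against the current miterlimit before chopping the joint to a bevel. A tiny Python check:

```python
import math

def miter_ratio(angle_degrees):
    # Ratio of miter-spike length to line width for two stroked segments
    # meeting at the given angle. Postscript bevels the joint when this
    # ratio exceeds the current miterlimit.
    return 1.0 / math.sin(math.radians(angle_degrees) / 2.0)
```

The default miterlimit of 10 therefore cuts off joints sharper than about 11.5 degrees, which is why only quite acute angles like the one in my experiment produce those exaggerated spikes.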

Here's the whole PS file in case you want to zoom in for a closer look.
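And here's roughly how the curve-to-segments conversion I observed could be done. This is a Python sketch of the usual recursive-subdivision flattening (I'm guessing at the method; the real interpreter may do something different):

```python
import math

def _split(p0, p1, p2, p3):
    # de Casteljau: split a cubic Bezier in half
    mid = lambda a, b: ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    a, b, c = mid(p0, p1), mid(p1, p2), mid(p2, p3)
    d, e = mid(a, b), mid(b, c)
    f = mid(d, e)
    return (p0, a, d, f), (f, e, c, p3)

def _chord_dist(p, a, b):
    # perpendicular distance from point p to the line through a and b
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs((p[0] - a[0]) * dy - (p[1] - a[1]) * dx) / length

def flatten(curve, tol=0.25):
    # Replace the curve by chained line segments, subdividing until the
    # control points sit within tol of the chord (a simple flatness test).
    p0, p1, p2, p3 = curve
    if max(_chord_dist(p1, p0, p3), _chord_dist(p2, p0, p3)) <= tol:
        return [p0, p3]
    left, right = _split(*curve)
    return flatten(left, tol)[:-1] + flatten(right, tol)
```

Once the curve is a polyline, each little segment can be stroked like any other line, joints and all, which matches what the strokepath output looks like.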

So if I keep going down this path (no pun intended), I will have to solve both of these problems. I think the rest will be relatively straightforward, just a lot of work. And then when I have most of the functionality of NeWS reproduced, I should be able to build higher-level objects like action trees associated with canvases, rich text widgets, a reusable structured drawing canvas, etc.

A couple of other random ideas that came up the last few days:

A "workstation" should be considered to be a collection of displays and human-interface devices. That will make it easier to build larger, more functional clusters; I cannot buy a monitor big enough for my taste, and it will be many years before I can, but what if I could display different apps on different monitors? You can do this using special graphics drivers and the "virtual desktop" concept, of course, but it's a little too specialized, requires special hardware and all of it has to be in the same machine. Display drivers which exist today might not work with the next rev of the OS. But if every display is just a metawidget container on the network, a collection of machines can be clustered together and become one virtual workstation. Transparently being able to resume a session is important, but if you sometimes log in on a single-screen workstation, and sometimes on one that has several screens, then it'll be necessary to store some preferences/profile stuff on the server for those different situations. Not only does the number of screens change but the whole window-system paradigm might be different; some allow you to only see one app at a time, some have tiled windows, some have overlapping windows, some have touchscreens intended as a main display, some have touchscreens intended only as a supplemental input device to hold the "action" widgets while you view the main document on the main screen; and so on. The simplest possible "workstation" would probably be an LED which blinks to indicate something; or a button that you push to notify the computer of some event (I got home, I'd like to store a GPS waypoint right here, uh oh there's a burglar, please turn this light on, whatever). 
In your car, a whole collection of little stuff like this could be collectively thought of as a workstation; the GPS waypoint button, the little touchscreen control panels on the doors (for windows and locks), the little one in the dash for the radio, GPS, climate control, maps, etc., the instrument cluster, the HUD, the smart-card reader you use to "log in", are all part of the set of devices which you access in this futuristic scenario of driving your car. So regardless how many computers are required to operate them, when you "log in" in this context, you should be able to transfer the UIs for various apps to wherever it's convenient, and these preferences should be remembered with no further intervention. Some of the apps you will also use outside of the car, and some you won't. Some apps which you use at home might occasionally be useful in the car. If your car has wireless internet access, there's no reason you couldn't run a program at home, remotely from the car. Your bored passenger could play Tetris on his window-control touchscreen, without having to have it pre-installed.

Clustering might or might not require a "middleware" machine to coordinate the event traffic and provide a facade to the server, so it can pretend there is just one workstation. I think some collaboration software has to run somewhere, but it could be one of the machines in the cluster or a server-side process. A middleware tier could be useful for optimizing things for really slow devices anyway. Maybe the higher levels of abstraction of Postscript metawidget code could be offloaded to a more powerful mediator, so that the underpowered display device can just be given simple drawing commands and send back raw events.

Anyway some of this is just dreaming, but it's important to keep the cardinality of things "n" instead of "1" to make sure it's possible. This is why threading and reentrancy are important in the Postscript interpreter. Ideally it would be possible to have multiple interpreters running, maybe even nested inside each other, but it's also good to try to avoid it when possible for efficiency... every interpreter has a lot of internal state.

I'm wondering if a stack of dictionaries, which is normally used to look up Postscript operators, could be replaced by a dictionary containing trees for each "definition". Basically version control with branching. It would be more efficient in that lookup would be done in constant time instead of taking longer the more dictionaries are in the stack. It might require maintaining independent hashes for each user of the dictionary, because each user can create new versions and branches. (A user being an interpreter process, or a block of code which makes its own local definitions.) Also want to check into whether a trie would be a good way to implement a dictionary. I suspect it's best for large data sets when you've got memory to burn, and Postscript dictionaries probably don't get big enough to justify it.
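A rough Python sketch of what I mean (all names hypothetical): one flat dictionary whose entries carry per-level versions, so a name lookup never walks a stack, while begin/end still give block-local definitions.

```python
# Sketch of replacing the dict stack with one flat dict of version lists.
# Lookup is a single hash probe; begin/end push and pop versions instead
# of pushing and popping whole dictionaries.
class VersionedDicts:
    def __init__(self):
        self.defs = {}       # name -> list of (level, value), newest last
        self.level = 0

    def begin(self):         # like Postscript's `begin`: enter a new scope
        self.level += 1

    def end(self):           # like `end`: discard this scope's definitions
        for name, versions in list(self.defs.items()):
            while versions and versions[-1][0] == self.level:
                versions.pop()
            if not versions:
                del self.defs[name]
        self.level -= 1

    def define(self, name, value):
        self.defs.setdefault(name, []).append((self.level, value))

    def lookup(self, name):
        return self.defs[name][-1][1]    # constant time, no stack walk
```

The cost moves from lookup (the common case) to `end` (which has to sweep), and real branching per user would need more bookkeeping than this, but it shows the constant-time-lookup shape of the idea.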

Theming is very important. Programmers are tempted to write egocentric, inconsistent GUIs into their programs because GUI toolkits are not flexible enough to incorporate those same ideas into all of the apps at once. The canonical solution should be: if you think of a new UI idea, make a theme and advocate it. Don't build it into your product. X encourages this kind of misbehavior way too much, and now "skin is in" - there are several applications which bypass the various toolkits that already provide theme support, and provide their very own theme/skin support for just that one app. Now users waste more time customizing each app, instead of customizing everything at once. The metawidget system will do its best to prevent this sort of nonsense. The user must remain in control of the appearance of everything, and the system will be maximally flexible, so that the strangest, most unimaginable things can be done with themes, and there is no idea that cannot be implemented, at least in 2D space. 3D may even happen, later. A theme in the mw system will go way beyond the look and feel of the same old widgets. Because the application program does not use widgets at all - it describes the goal of the UI at a high level - the widgets can be realized in a myriad of ways. So in this context I'm reluctant to allow new code to override basic built-in functions the way you usually can in Postscript. But being able to do that is also essential for doing theming at all. So maybe there needs to be some layering: applications are not allowed to override functions and operators in certain dictionaries, nor to change certain methods in the metawidget classes (like having final methods in Java), but themes are allowed to change those things, and only the user is allowed to install a theme. This will not stop programmers from writing completely new widgets or metawidgets which have no namespace conflicts, which is OK, because it shouldn't stop actual UI innovation.
But I've got a lot more thinking to do on the mechanisms to encourage the right kinds of change and discourage the wrong ones.
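One possible shape for that layering, sketched in Python with invented names: every definition records the minimum privilege level allowed to rebind it, so applications can add new names freely but can't rebind protected ones, while themes (installed only by the user) can.

```python
# Hypothetical layering sketch: APP code, THEME code, and SYSTEM code get
# increasing privilege; a definition's "lock" is the minimum level allowed
# to rebind it (roughly like final methods, but overridable by themes).
APP, THEME, SYSTEM = 0, 1, 2

class LayeredDict:
    def __init__(self):
        self.values = {}
        self.locks = {}      # name -> minimum level allowed to rebind it

    def define(self, name, value, caller, lock=APP):
        if name in self.locks and caller < self.locks[name]:
            raise PermissionError(name + " is protected")
        self.values[name] = value
        self.locks[name] = lock

    def lookup(self, name):
        return self.values[name]
```

The interesting policy questions (who counts as a theme, how locks are granted) are exactly the part I still have to think through; this only shows the mechanism.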

Maybe an application should never be allowed to directly create UI on the client. Instead it requests a metawidget which is not "standard" yet, and the server just happens to have a default implementation of it which the client can download... but later the user can go find another implementation and install that instead, if he doesn't like the default one.

BTW the idea of the client requesting the implementation of a feature that the application tried to use, but which the client has no implementation for yet, is also kind of important. I don't think NeWS did that. But Java does, and it's one of the touted features of things like RMI and JINI - any VM can run any code; if it's available on the 'net, it's available as a component in applications. But in Postscript, new definitions typically go only into the dictionary on the top of the stack, yet these newly-defined metawidgets will need to be saved for later, so the client doesn't have to keep asking the server for them over and over. I'm not sure what to do about it. Maybe when the client asks for code explicitly, it's a special case? Anyway, should these new definitions also be cached to disk, if the client has one, or should they be only in memory, so that if an implementation is updated, the client can be reset and will re-get everything from the server?
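The caching part, at least, is simple. A Python sketch (names made up) of fetch-on-miss, kept in memory only so that resetting the client re-gets everything from the server:

```python
# Hypothetical client-side cache of downloaded definitions: ask the server
# once per name, remember the answer in memory only.
class DefinitionCache:
    def __init__(self, fetch):
        self.fetch = fetch     # callable that gets code from the server
        self.cache = {}        # in-memory: a client reset clears it

    def get(self, name):
        if name not in self.cache:
            self.cache[name] = self.fetch(name)
        return self.cache[name]
```

The hard questions are the ones the cache doesn't answer: where the cached definitions live in the dictionary structure, and how the client learns that a definition it holds has gone stale.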

These ideas are incompatible with NeWS as it existed before. I wanted to think of the metawidget stuff as an extra layer on top, with full NeWS backwards-compatibility being a possible goal... but maybe some mw ideas will require changes to the interpreter itself.

Because the OO support is not necessarily strongly typed (well, maybe it is in the NeWS implementation, but with my idea of simply cloning objects, you could potentially, say, receive an event which claims to be a mouse event but is actually missing the x, y fields that are normally there), exceptions are more likely. Maybe there should be a better global strategy for handling the uncaught ones. Since the code is all interpreted anyway, there could be a client-side debugger. The user could do several things - add in the missing or messed-up data that the failing function was looking for, and try again; fix the code so it won't happen again; send the developer a bug report; or even send in the fix as a patch. The trend with open systems is for users to become more like developers anyway; this is just an extreme case of it. Maybe it would work. Or maybe it only works in a "power user" mode; the default exception handler only lets you know what happened, submits a bug report, and then quits the app just like a plain old compiled app would dump core and be finished. From what I've read, early Smalltalk and Symbolics Lisp systems, back when they were standalone systems, had integrated debuggers which allowed you to fix code at runtime and then keep going right where you left off.
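A sketch of the dispatch side of that idea, in Python and entirely hypothetical: failures caused by malformed events go to a pluggable handler, which by default could file a bug report and, in power-user mode, could open a debugger instead.

```python
# Hypothetical global strategy for uncaught errors from loosely-typed
# events: catch the failure at the dispatch boundary and hand it to a
# replaceable handler rather than killing the whole client.
def dispatch(handler, event, on_error):
    try:
        return handler(event)
    except KeyError as err:      # e.g. a "mouse" event missing its x, y fields
        return on_error(event, err)

def paint_cursor(event):
    return (event["x"], event["y"])

def report(event, err):
    # default handler: describe what happened; a power-user handler could
    # drop into a debugger here and let the user patch the data or the code
    return "bug report: missing " + err.args[0] + " in " + str(event.get("type"))
```

Swapping `report` for an interactive handler is the whole extension point; the app code (`paint_cursor`) doesn't change either way.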