Last updated Sunday, 8 September 2024

KBG.sh Nihil est in intellectu quod non sit prius in sensu

The Dubsar Project

This page contains a somewhat informal précis of the Dubsar system as it exists as of this writing (6 October 2024). I invite any and all interested readers to contact me if you have feedback or questions about the project! My email address can be found at the home page of this site. This document is intended for individuals with some background knowledge of computer typography, but I will try not to be too esoteric where possible. In particular, I assume familiarity with terms like text shaping, kerning, hinting, etc. It is also intended to function as a companion to my recent presentation on Dubsar at the 2024 Text Encoding Initiative Conference in Buenos Aires, Argentina, from where I am currently writing! This is a living document and will be updated along with Dubsar itself, as time permits.

Introduction

Working with internationalized text is monumentally difficult; doing it correctly is even harder. The question of what text is and whether there's a better way to do things than the status quo has consumed much of my spare thought over the past year. After much deliberation, I am here to tell you that I believe the answer is "yes". To understand where things went wrong, what we need is a kind of genealogical critique of the past century of text.

My basic hypothesis is that we computer scientists think of computer text in the way we do because we have falsely extrapolated from the limitations of early computers, inherited from their direct typographical ancestor, the teletype, to an imagined ontology of progressively less complex kinds of text, with so-called "plain text" at the most primitive layer. In other words, it is my opinion that there is no such thing as "plain" text. Now, that is certainly a bold claim, and wait a second, weren't we talking about fonts? Allow me to explain.

Those of us who came of age in the ASCII era are likely to conceive of writing as a linear or unidimensional sequence of discrete symbols, which we wrap into lines for visual convenience. What I want to argue is that this model is fundamentally incorrect with respect to non-Latin writing systems, and isn't even true of the Latin alphabet itself. While it is undoubtedly true that the problem of text is inherently complex, that complexity is only exacerbated by the past half-century of attempting to shove the square peg of the world's writing systems into the round hole of ASCII-like encodings. While I don't think it's wrong to speak of graphemes, the reality is that sequences of graphemes, when rendered visually, are not compositional. What this means is that we simply cannot achieve good typography by trying to map graphemes onto individualized/discretized concrete letterforms. Nor can we sidestep this issue by reifying all instances of non-compositional grapheme sequences as predetermined ligatures (as does OpenType). In many writing systems, the actual number of "ligatures" (taken in this sense) is essentially unbounded.

Recently, I attended the fantastic WAVE conference in Cambridge, UK (short for Writing As Visual Experience). In speaking with other attendees, a point I found myself reiterating time and time again (I'm sure to the consternation of at least some!) is that we must understand computer text as an independent modality of writing and language that stands on its own. It is easy to think of computer text as completely parasitic on printed or even written forms of text, but this is simply not so. There are manifold ways in which we interact with computer text that simply have no clear analogue in these antecedent forms. There is no copy and paste or arbitrary insertion in hot metal typesetting, or typewriters, or a pen or pencil. And precisely because that is the case, just the mechanics of how writing works in its original context do not necessarily offer any clear idea on how its computerized form should work.

Dubsar, from the Sumerian word for scribe, is my attempt to address at least some of these problems. Dubsar is a stroke-based font format, where glyphs are drawn programmatically. A Dubsar font is a program where each glyph for which a font has coverage is an entrypoint. Dubsar fonts and individual glyphs may be arbitrarily parameterized, though some parameters such as size, thickness, line-height, DPI, and others are standard. Crucially, the code for each glyph in a Dubsar font has access to the entire graphical context, which it can arbitrarily modify (up to erasing anything previously drawn). Dubsar fonts are written in a programming language, one that I have unimaginatively named DubsarScript, that is compiled down to a virtual instruction set, the Dubsar VM. DubsarScript and its VM feature a graphics model that is somewhat reminiscent of PostScript, though with important differences, some of which are enumerated later in this document.

My basic criterion for what belongs in Dubsar (as opposed to some other layer above) is that a font should fully specify the "logic" or mechanics of a writing system. A consumer of a Nastaliq font should not, for instance, have to know or care about the fact that words in Nastaliq vertically overlap. Dubsar allows for the particulars of individual writing systems to be ignored by presenting a programming interface that is highly abstract. Inputting text moves the cursor, yes, and an embedder can query where the cursor is at a given moment, but it is the responsibility of the font to actually move it by a specific amount.

Related Work

METAFONT

The astute reader might make the natural comparison with Donald Knuth's METAFONT. And they would be correct to, as METAFONT is among the chief inspirations for Dubsar. In fact, it is fair to say that Dubsar itself was essentially born out of the frustrations I experienced in trying to actually design a now abandoned font, Monsalvat, with METAFONT.

However, while there are many similarities between Dubsar and METAFONT, the differences are profound. First: while METAFONT is ostensibly stroke-based, perusing the source code of actual existing METAFONTs will immediately disappoint anyone enthused by the elegance of the METAFONT programming model: real-world METAFONTs nearly always use the language as a rather baroque way to define what are essentially outline fonts. This is because METAFONT has a debilitating limitation: there is no way to manipulate the envelope of a stroke as a path unto itself. This is significant, because punch-cut typefaces do not really correspond to the union of nib or brush strokes. For instance, the stems and serifs of a typical Latin typeface do not meet at the squared-off angles that are typical of text written with a pen nib. Rather, they are gently curved at their intersections. This is essentially an artifact of the physical process of cutting metal type, much like how serifs themselves are basically artifacts of stone cutting, but found themselves baptized as graphical features in their own right. Regardless, gently curved intersections are a significant stylistic component of many typefaces. METAFONT is well-suited to digitizing "calligraphic" style typefaces, but it cannot easily accommodate anything else. While there is no theoretical reason why pen stroke envelopes could not be "reified" in a hypothetical METAFONT extension, it would come with deep mathematical complexity involving higher-order Bézier curves, which seems unlikely to be added to a program that is essentially dormant.

A second and more significant difference is that in the METAFONT model, glyphs are drawn individually, rasterized before a document is typeset, and augmented with manually specified kerning tables. While METAFONTs as a whole can be parameterized (and are to fantastic effect by Knuth and others) this is simply a convention. Despite its elegance, METAFONT is still fundamentally organized around the notion of discretized glyphs.

SIL Graphite

SIL Graphite is another technology that invites an obvious comparison with Dubsar. SIL describes Graphite as a "'smart font' system developed specifically to handle the complexities of lesser-known languages of the world." Clearly, the remits of Graphite and Dubsar are quite similar. Graphite is layered on top of TrueType (a predecessor of OpenType that is limited to quadratic Bézier curves), so it does not share Dubsar's stroke-based graphics model. Though expressed through a different mechanism, Graphite is similar to Dubsar in that it does not predefine shaping models like OpenType. Rather, Graphite allows a font designer to express essentially arbitrary shaping logic by specifying a finite state transducer from the text input to glyphs and their positions. Though OpenType's Universal Shaping Engine provides a somewhat similar feature, there remains some functionality that lies beyond its capabilities. Graphite was an important early inspiration for Dubsar.

HarfBuzz WebAssembly Shaping

Finally, I would be remiss if I did not mention that recent versions of HarfBuzz have added support for a custom extension to OpenType that allows font designers to embed more or less arbitrary WebAssembly code that can function as a custom shaping model. While this feature is very much in the spirit of Dubsar, it still does not achieve the same degree of conceptual economy as Dubsar.

Technical Details

This section is mostly intended for a technical audience. These descriptions are very much a sketch of the general design of Dubsar, and not meant to be a comprehensive account. I am currently working on proper documentation for DubsarScript and the DubsarVM.

The Dubsar system is written in OCaml. A benchmark I set for myself was that the entire Dubsar system should be fewer than 5000 lines of code. Currently, Dubsar sits at around 2500 lines, but that does not include a rasterizer, which is still a work in progress and weighs in at another 1500 lines so far (as it stands, I am mostly generating PDFs or SVGs rather than rasterizing myself). In any case, we are talking about orders of magnitude less code than the typical open source text stack (FriBiDi, HarfBuzz, FreeType).

The whole of Dubsar is divided into two parts: the DubsarScript compiler and the Dubsar VM interpreter. Dubsar fonts are written in the DubsarScript language, which is compiled to the VM. Dubsar VM bytecodes are then interpreted by an implementation of the VM instruction set. Dubsar also specifies a standard API for embedders, by which the state of the VM (for instance, the position of the cursor) can be queried.

DubsarScript

DubsarScript is the object-level programming language in which Dubsar fonts are written. It uses a Pascal-like syntax.

Dubsar VM

The Dubsar VM is a "true" stack-based virtual machine. This means that, unlike the JVM, CLR, and WebAssembly, it does not have user-visible registers (called "locals") in addition to an evaluation stack. Unlike the TrueType Hinting VM, but like the JVM, CLR, and WebAssembly, the Dubsar VM is inherently safe, as the loading process specifies a type-checking phase. The Dubsar VM has been carefully defined so that its execution model is entirely endian-independent. As a consequence of this decision, it is impossible to individually access any quantity smaller than 32 bits.

The Dubsar VM has the following basic data types:

All of these types are first class; they can be stored in variables and passed to and returned from functions. Note that functions are intentionally absent from this list. Restricting the VM to a first-order language profoundly simplifies the metatheory and the (ongoing, forthcoming) formal proof of type safety. Another notable, intentional restriction is that Dubsar functions may not recurse. This decision has profound implications with respect to performance, and most importantly, it allows embedders of Dubsar VMs to statically determine the maximum size of the evaluation and return stacks at load-time. Checking for stack overflows, as it turns out, is a significant source of performance penalties for WebAssembly modules.
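To make the load-time stack bound concrete, here is a minimal sketch in Python (not DubsarScript, and not the actual Dubsar loader). It assumes that a loader already knows, for each function, the peak number of evaluation-stack cells its own body uses and which functions it calls. The table layout and every name in it are hypothetical. Because the call graph has no cycles, a single depth-first walk yields a worst-case bound for both stacks.

```python
# Hypothetical font: function name -> (peak evaluation-stack cells used by
# its own body, names of the functions it calls). Not a real Dubsar format.
FUNCTIONS = {
    "glyph_a": (4, ["stem", "bowl"]),
    "stem": (3, ["serif"]),
    "bowl": (6, []),
    "serif": (2, []),
}


def stack_bounds(functions, entry):
    """Return (max evaluation-stack cells, max return-stack frames) for entry.

    Raises ValueError if the call graph contains a cycle (i.e. recursion).
    """
    in_progress = set()
    memo = {}

    def visit(name):
        if name in memo:
            return memo[name]
        if name in in_progress:
            raise ValueError(f"recursive call involving {name!r}")
        in_progress.add(name)
        own_depth, callees = functions[name]
        callee_eval, callee_frames = 0, 0
        for callee in callees:
            cells, frames = visit(callee)
            callee_eval = max(callee_eval, cells)
            callee_frames = max(callee_frames, frames)
        in_progress.discard(name)
        # Conservative: assume every call happens at the caller's peak depth.
        memo[name] = (own_depth + callee_eval, 1 + callee_frames)
        return memo[name]

    return visit(entry)


print(stack_bounds(FUNCTIONS, "glyph_a"))
```

The bound is deliberately pessimistic. A real loader could tighten it by tracking the stack depth at each call site, but even this coarse version lets an embedder allocate both stacks once and skip overflow checks while glyphs run.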

Every glyph in a Dubsar font is a subroutine that receives some number of parameters on the evaluation stack, executes code that typically manipulates the drawing surface, then returns the next position, orientation, and size of the cursor on the evaluation stack. Dubsar fonts may also have font-wide parameters that are specified when the font is invoked. The current value of these parameters is also passed to each glyph along with glyph-specific parameters.
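The calling convention can be pictured with a small Python sketch. It models the evaluation stack as ordinary arguments and return values, and everything named here (`Cursor`, `draw_bar`, `typeset`, the `thickness` parameter key) is a hypothetical stand-in rather than part of Dubsar's real API. The point it illustrates is only that each glyph receives the cursor plus parameters and hands back the next cursor.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cursor:
    x: float
    y: float
    angle: float  # radians, counter-clockwise from the positive x axis
    size: float   # scale factor applied to the glyph's design units


def draw_bar(surface, cursor, font_params, advance=0.6):
    """Hypothetical glyph: record one stroke, then return the next cursor."""
    thickness = font_params["thickness"] * cursor.size
    length = advance * cursor.size
    dx, dy = math.cos(cursor.angle), math.sin(cursor.angle)
    end = (cursor.x + length * dx, cursor.y + length * dy)
    surface.append(("stroke", (cursor.x, cursor.y), end, thickness))
    # The glyph, not the embedder, decides how far the cursor moves.
    return Cursor(end[0], end[1], cursor.angle, cursor.size)


def typeset(glyphs, start, font_params):
    """Run glyph entry points in order, threading the cursor through them."""
    surface, cursor = [], start
    for glyph in glyphs:
        cursor = glyph(surface, cursor, font_params)
    return surface, cursor


surface, end = typeset([draw_bar, draw_bar], Cursor(0.0, 0.0, 0.0, 10.0),
                       {"thickness": 0.1})
print(len(surface), end)
```

An embedder in this sketch never computes an advance width itself. It only reads the cursor that comes back.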

As mentioned, the Dubsar graphics model is similar to PostScript. Paths are first class data, which consist of sequences of Bézier curves, line segments, and elliptical arcs. DubsarScript uses Hobby splines to specify Bézier curves. Similar to METAFONT, paths are stroked with pens. Unlike METAFONT, stroking a path returns a new path, which then must be explicitly drawn. Paths may be combined with set-theoretic type operators such as union, difference, etc. Unlike PostScript, the graphical state of the drawing surface is "structured": exactly which paths have been drawn with which pens can be inspected by glyphs, and already drawn objects can be erased at a later point.
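Two of these ideas, stroking as a function from paths to paths and a drawing surface that remembers what was drawn, can be sketched briefly in Python. This is an illustration under heavy simplifying assumptions: only a straight segment with butt caps is stroked, whereas real pens and curved paths need a proper envelope computation. `stroke_segment`, `Surface`, and the pen names are all hypothetical.

```python
import math


def stroke_segment(p0, p1, width):
    """Return the closed outline of segment p0-p1 stroked at the given width.

    Only straight segments with butt caps are handled here.
    """
    (x0, y0), (x1, y1) = p0, p1
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("cannot stroke a zero-length segment")
    # Unit normal to the segment, scaled to half the pen width.
    nx = -(y1 - y0) / length * width / 2
    ny = (x1 - x0) / length * width / 2
    return [(x0 + nx, y0 + ny), (x1 + nx, y1 + ny),
            (x1 - nx, y1 - ny), (x0 - nx, y0 - ny)]


class Surface:
    """A drawing surface that records which outline was drawn with which pen."""

    def __init__(self):
        self._objects = {}  # handle -> (pen name, outline), in drawing order
        self._next_handle = 0

    def draw(self, pen, outline):
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = (pen, outline)
        return handle

    def drawn_with(self, pen):
        return [h for h, (p, _) in self._objects.items() if p == pen]

    def erase(self, handle):
        del self._objects[handle]

    def __len__(self):
        return len(self._objects)


surface = Surface()
outline = stroke_segment((0, 0), (10, 0), 2)  # stroking only builds a path
stem = surface.draw("broad_nib", outline)     # drawing is a separate step
surface.erase(stem)                           # a later glyph may undo it
```

Keeping the stroked outline as plain data is what allows a later glyph to inspect it or remove it, which an unstructured pixel buffer does not allow.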

That a cursor's size and rotation are specified is a curious aspect of the design, and requires further explanation. In my opinion, a natural treatment of hieroglyphic scripts such as Egyptian, Maya, and Luwian requires that the cursor be able to move within a conceptual "line", into and out of blocks of glyphs. In Egyptological and Mayanist literature, it is common to speak of so-called "quadrats". For anyone who has seen Maya, it is visually obvious what a quadrat is: each "pebble" (as one early European commentator described it) is a quadrat, though it may surprise those unfamiliar with Maya to know that each quadrat can and typically does contain more than one grapheme. An Egyptological quadrat is more subtle: Egyptian scribes seem to have had the aesthetic sensibility that hieroglyphs looked best when they were compactly nestled up close to each other, into roughly square regions. Look carefully at most Egyptian texts, and this pattern will become apparent once you know what to look for. What we are describing here is, essentially, a sort of two-dimensional kerning. As for rotation, to return to the earlier example of Nastaliq, the most natural way to interact with Nastaliq text is with a (potentially) diagonal cursor that moves with the angle of the text.

Kerning

Glyphs in Dubsar fonts are able to measure how close they are to previously drawn glyphs by several different metrics, and adjust their position accordingly. This means that Dubsar fonts can specify kerning in a "programmatic" fashion, without the need to manually specify kerning tables. While manual kerning isn't so onerous for languages like English, even for more complicated Latin-based alphabets like Vietnamese it can quickly become daunting. Dynamic collision detection is also particularly important for high quality Nastaliq typography.
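As a rough illustration of what "measuring" might look like, the Python sketch below uses one possible metric: the tightest horizontal gap between the right-hand profile of what is already drawn and the left-hand profile of the incoming glyph, sampled row by row. This is an assumption made for the example, not a description of the metrics Dubsar actually provides, and `kern_offset` and the profile dictionaries are hypothetical.

```python
def kern_offset(prev_right, next_left, target_gap):
    """Return the x offset at which to place the next glyph.

    prev_right maps a row (y) to the rightmost inked x of what is already
    drawn; next_left maps a row to the leftmost inked x of the new glyph in
    its own coordinates. Rows absent from either profile are ignored.
    """
    shared = prev_right.keys() & next_left.keys()
    if not shared:
        raise ValueError("no overlapping rows to measure")
    # After shifting by `offset`, the gap on row y is
    # (next_left[y] + offset) - prev_right[y]. Pick the offset that makes
    # the tightest of those gaps equal to target_gap.
    return max(prev_right[y] - next_left[y] for y in shared) + target_gap


# An "A"-like shape (wide at the baseline) followed by a "V"-like shape
# (wide at the top). Row 0 is the baseline, row 4 the cap height.
a_right = {0: 10, 2: 8, 4: 6}
v_left = {0: 4, 2: 2, 4: 0}

kerned = kern_offset(a_right, v_left, target_gap=1)
boxed = max(a_right.values()) - min(v_left.values()) + 1  # bounding boxes only
print(kerned, boxed)
```

With these profiles the measured placement tucks the second shape four units closer than bounding boxes alone would, without any pair-specific table entry.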

Hinting

While hinting has fallen somewhat out of fashion with the advent of high-DPI displays (my understanding is that newer versions of iOS and macOS simply ignore hinting tables altogether), I believe there is still a place for it. The inherent awkwardness of outline fonts is magnified in the presence of hinting, because outline fonts are inherently not "semantic". By that I mean that an outline font engine has no knowledge of what each part of a glyph does, or how it should dynamically behave. If a human were manually rasterizing a font, he or she would simply know that a stem should never have gaps.

Because stroke-based fonts are more "high-level" representations of a glyph, they already need less manual hinting. However, Dubsar also allows fonts to be parameterized by DPI, in which case the current target DPI is passed to each glyph drawn, which it can then use to decide how it should draw itself. In this way, Dubsar fonts do not need separate hinting information layered above the code for drawing the glyphs themselves.
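A glyph that knows the target DPI can make the kind of decision a human rasterizer would. The Python sketch below shows one such decision for a stem, assuming a hypothetical `stem_width_px` helper and an arbitrary 150 DPI cutoff below which widths are snapped to whole pixels. Neither the name nor the threshold comes from Dubsar.

```python
def stem_width_px(design_width_pt, dpi, snap_below_dpi=150):
    """Convert a stem width in points to pixels, snapping at low resolution."""
    exact = design_width_pt * dpi / 72  # 72 points per inch
    if dpi >= snap_below_dpi:
        return exact
    # At low resolution, use a whole number of pixels and never less than
    # one, so the stem cannot thin out or vanish.
    return float(max(1, round(exact)))


for dpi in (96, 300):
    print(dpi, stem_width_px(0.6, dpi))
```

Because the rule lives in the same code that draws the stem, there is no separate table that can drift out of sync with the glyph's shape.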

Justification

Dubsar fonts (will eventually) perform justification where it is required. This is a controversial decision and one that I didn't take lightly. Recall here my criterion for what is proper to a typography engine. After careful consideration, it is my opinion that justification is irreducibly particular to an individual writing system. For those literate in Western alphabets, this is a surprising conclusion. Isn't justification just adding space? Yes: in Latin, Greek, Cyrillic, Georgian, Armenian and many other similar writing systems. By contrast, in Arabic, text is traditionally justified by stretching certain letters, with words separated by generally fixed-size spaces, in a system called kashida.
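The contrast between the two approaches can be put in a few lines of Python. This is a deliberately naive sketch: it spreads the slack evenly, whereas real kashida placement follows priority rules about which connections may stretch and by how much. Both function names and the `stretch_points` representation are assumptions made for illustration.

```python
def justify_with_spaces(word_widths, line_width):
    """Latin-style: return the width each inter-word space must take."""
    gaps = len(word_widths) - 1
    if gaps < 1:
        raise ValueError("need at least two words to adjust spaces")
    return (line_width - sum(word_widths)) / gaps


def justify_with_kashida(word_widths, stretch_points, space_width, line_width):
    """Kashida-style: spaces stay fixed; return the elongation per stretch point.

    stretch_points[i] is the number of places in word i where a connection
    may be elongated.
    """
    natural = sum(word_widths) + space_width * (len(word_widths) - 1)
    total_points = sum(stretch_points)
    if total_points == 0:
        raise ValueError("no stretchable connections on this line")
    return (line_width - natural) / total_points


print(justify_with_spaces([30, 20, 40], line_width=100))
print(justify_with_kashida([30, 20, 40], [1, 0, 3], space_width=4,
                           line_width=110))
```

The first function needs to know nothing about the words beyond their widths. The second cannot even be called without knowledge of where each word may stretch, which is the sense in which justification is particular to the writing system.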

This aspect of Dubsar is highly distinct from, for example, HarfBuzz, where the quantum of shaping (so to speak) is a span of text of the same typeface. While I am resolute that justification is within Dubsar's purview, how best to implement this feature concretely remains an area of active research. My basic assumption is that Dubsar's typographical quantum is a two-dimensional block that glyphs may query the boundaries of and place themselves within accordingly. For now, Dubsar fonts draw glyphs on an infinitely long single line.

Conclusion

Dubsar is currently under active research and development. However, it is not simply an academic curiosity: it is my intent to get it to a point where it is usable for "real" typography. Right now, I am aiming for a public release in the next 6 months to a year, depending on whether funding can be secured (such that I can work on it full time). The public release will be under an open source license, likely AGPLv3. While that might seem highly restrictive, consider that software embedding Dubsar need only implement the VM, not the DubsarScript compiler. The Dubsar VM has been carefully designed such that implementations in most languages should be well under 10k lines of code. At some point, I will probably release a C99 reference implementation of the VM under a less restrictive license.

Detractors of METAFONT have consistently questioned whether type designers would be willing to learn a programming language to design a typeface. Given the paucity of third-party METAFONTs, perhaps this criticism has some teeth. However, much has changed in the nearly 50 years since METAFONT was first released, and knowledge of programming is much more widespread. In terms of accessibility, I would ask which is the higher barrier to entry: a simple programming language, or a huge, expensive, and complicated GUI font design application?

All that being said, designing a font without any visual aid is hard. In preparing a (fragmentary) Maya font, my method was to actually sketch the glyphs onto grid paper, and then input the coordinates of e.g. endpoints of line segments manually. This may seem tedious, but it wound up being far less so than using Inkscape! I'm currently investigating the possibility of designing custom graph paper and a compass-like tool that could simplify this process. After all, consider that the original Bézier splines were in fact physical objects!

As mentioned at the beginning, please do not hesitate to get in touch with me if you feel so inclined. I am especially interested in hearing from users of minoritized writing systems or specialists in ancient scripts. Thank you for reading!