TeX Study

I tried to draw Knuth.

TeX is a typesetting system designed by Donald E. Knuth. I studied it to implement something similar in a list editor.

It is a pipeline design: the content goes through a parser and a macro expander, which turn it into a stream of tokens. An evaluator interprets the tokens, builds elements and lays them out.

I won't cover the parser or the macro expander in this post. I'm not interested in them because I've seen enough of that, and the TeXbook explains them quite well. Instead, I describe how the evaluator works.

Box Model

TeX operates on lists of boxes and spacing. Each box is a rectangle with a reference point, and perhaps a drawing associated with it. The spacing is called glue. It has parameters telling how much it is allowed to shrink or stretch.
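To make the list-of-boxes-and-glue idea concrete, here is a minimal sketch of how I might represent it. The class and field names (`Box`, `Glue`, `natural`) are my own assumptions for illustration, not TeX's internal names.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Dimensions are measured from the reference point on the baseline.
    width: float   # extent to the right of the reference point
    height: float  # extent above the baseline
    depth: float   # extent below the baseline


@dataclass
class Glue:
    natural: float        # preferred size of the space
    stretch: float = 0.0  # how much it may grow
    shrink: float = 0.0   # how much it may shrink


# Two letter-like boxes separated by one stretchable space.
items = [Box(5.0, 7.0, 0.0), Glue(3.0, 1.5, 1.0), Box(5.5, 7.0, 0.0)]

# The natural width of the list is the sum of box widths and glue sizes.
natural_width = sum(
    item.width if isinstance(item, Box) else item.natural for item in items
)
```

The list itself is the whole data model at this stage: nothing has a position until the list gets packed into a containing box.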

When TeX knows how to lay out a list of elements, it wraps them into a containing box. This freezes them into place, relative to the containing box.

There are two basic methods to pack the elements inside a box: horizontal and vertical. The elements are laid out in order by their reference points. If a spread parameter is given, it is used to resize the glue proportionally. In this way horizontal boxes can be justified to spread evenly across the page.
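The proportional resizing can be sketched roughly as follows. This is a simplified sketch under my own assumptions: `hpack` and its `spread` argument are hypothetical names, `spread` is taken as the amount added to the natural width, and only finite stretch and shrink are handled.

```python
from dataclasses import dataclass


@dataclass
class Box:
    width: float


@dataclass
class Glue:
    natural: float
    stretch: float = 0.0
    shrink: float = 0.0


def hpack(items, spread=0.0):
    """Place items left to right; return (x offset of each item, total width)."""
    glues = [item for item in items if isinstance(item, Glue)]
    # Positive spread uses the stretch amounts, negative spread the shrink amounts.
    if spread >= 0:
        total = sum(g.stretch for g in glues)
    else:
        total = sum(g.shrink for g in glues)
    ratio = spread / total if total > 0 else 0.0

    offsets = []
    x = 0.0
    for item in items:
        offsets.append(x)
        if isinstance(item, Box):
            x += item.width
        elif spread >= 0:
            x += item.natural + ratio * item.stretch
        else:
            x += item.natural + ratio * item.shrink
    return offsets, x


row = [Box(10.0), Glue(4.0, 2.0, 1.0), Box(10.0), Glue(4.0, 2.0, 1.0), Box(10.0)]
offsets, width = hpack(row, spread=4.0)
```

Each glue receives a share of the spread in proportion to its own stretch, so equal glues grow equally and the boxes end up evenly spaced.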

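The stretch and shrink of a glue can also be marked as fill, which makes them infinitely flexible. There are several orders of fill (fil, fill and filll in TeX). The highest order present takes priority over the remaining glue and absorbs all of the leftover space. The major use of this feature is to align and center things.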
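Here is a small sketch of how the priority rule could work when stretching. The names `stretch_order` and `distribute_stretch` are hypothetical, and I only handle stretching, not shrinking.

```python
from dataclasses import dataclass


@dataclass
class Glue:
    natural: float
    stretch: float = 0.0
    stretch_order: int = 0  # assumed encoding: 0 = finite, higher = stronger fill


def distribute_stretch(glues, excess):
    """Split excess space (>= 0) among the glues of the highest order present."""
    active = [g for g in glues if g.stretch > 0]
    if not active:
        return [g.natural for g in glues]
    top = max(g.stretch_order for g in active)
    total = sum(g.stretch for g in active if g.stretch_order == top)
    sizes = []
    for g in glues:
        if g.stretch > 0 and g.stretch_order == top:
            sizes.append(g.natural + excess * g.stretch / total)
        else:
            # Weaker glue keeps its natural size.
            sizes.append(g.natural)
    return sizes


# Centering: fill glue on both sides, ordinary word space in the middle.
left = Glue(0.0, 1.0, stretch_order=1)
space = Glue(3.0, 1.5, stretch_order=0)
right = Glue(0.0, 1.0, stretch_order=1)
sizes = distribute_stretch([left, space, right], 20.0)
```

The two fill glues split the leftover space equally while the word space stays at its natural size, which is what centers the content.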

Internally every box has a width, height, depth and shift. The shift parameter moves the box's position in the layout. Any of these parameters can be negative, and the algorithms do not treat that as a special case.
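As a sketch of how these four parameters might combine when a horizontal list is wrapped, consider the following. The function name `wrap_horizontal` is hypothetical, and the sign convention (a positive shift lowers the box) is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    width: float
    height: float
    depth: float
    shift: float = 0.0  # assumed convention: positive moves the box down


def wrap_horizontal(children):
    """Compute the dimensions of a box containing children side by side."""
    width = sum(c.width for c in children)
    # A lowered child reaches less far up and further down, and vice versa.
    height = max((c.height - c.shift for c in children), default=0.0)
    depth = max((c.depth + c.shift for c in children), default=0.0)
    return Box(width, height, depth)


# The second box is raised by 3 units, like a superscript.
row = wrap_horizontal([Box(5.0, 7.0, 2.0), Box(4.0, 6.0, 0.0, shift=-3.0)])
```

Negative values pass through the same arithmetic as positive ones, so no branch is needed for them.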

Evaluator

The evaluator interprets the tokens according to its state. The states can be roughly divided into horizontal and vertical. There's also an environment, which tells things such as how large the page is and which font size and line height are in use.

Some tokens can cause the mode to change. For example, text in vertical mode always switches to unrestricted horizontal mode. A vertical token causes a recovery back to vertical mode. But if a vertical token appears in restricted horizontal mode, it produces an error. The unrestricted mode applies the Knuth-Plass line breaking algorithm to the list of boxes it produces. There's also an equivalent vertical page mode, which does similar partitioning.
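The mode switching described above can be sketched as a small state machine. The enum and function names are hypothetical, and tokens are reduced to just two kinds, which is a heavy simplification of what TeX really distinguishes.

```python
from enum import Enum, auto


class Mode(Enum):
    VERTICAL = auto()
    HORIZONTAL = auto()             # unrestricted: building a paragraph
    RESTRICTED_HORIZONTAL = auto()  # inside an explicit horizontal box


class Kind(Enum):
    HORIZONTAL = auto()  # e.g. a character of text
    VERTICAL = auto()    # e.g. a vertical skip


def step(mode, kind):
    """Return the mode the evaluator is in after seeing one token."""
    if kind is Kind.HORIZONTAL:
        if mode is Mode.VERTICAL:
            return Mode.HORIZONTAL  # text starts a new paragraph
        return mode
    # From here on the token is vertical material.
    if mode is Mode.HORIZONTAL:
        # The paragraph ends here; line breaking would run at this point.
        return Mode.VERTICAL
    if mode is Mode.RESTRICTED_HORIZONTAL:
        raise ValueError("vertical token in restricted horizontal mode")
    return mode


mode = Mode.VERTICAL
for kind in [Kind.HORIZONTAL, Kind.HORIZONTAL, Kind.VERTICAL]:
    mode = step(mode, kind)
```

After the three tokens the evaluator is back in vertical mode, having opened and closed one paragraph along the way.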

Opinions

I see TeX as a good starting point for solving my layout problems. It has a logical structure and it's not difficult to think about how to extend it.
