Implemented linear abstract machine

I've provided Python & Javascript code in this post. Here's the download links:

Update: These have few bugs. Specifically, finding the instruction boundaries in this kind of a format is trickier than presented here.

Bytecode & Instruction set

The bytecode this virtual machine runs is a near-verbatim translation of its textual representation.

A trivial example:

type: 1 ⊸ (1 ⊗ 1) ⊗ !1

program:  (insl ; exp)

bytecode: [1, 4, 17, 16, 18]

Here's the whole instruction set:

(f; g)       : a ⊗ a       → f a ⊗ g a
assl         : a ⊗ (b ⊗ c) → (a ⊗ b) ⊗ c
assr         : (a ⊗ b) ⊗ c → a ⊗ (b ⊗ c)
insl         : a           → 1 ⊗ a
dell         : 1 ⊗ a       → a
exch         : a ⊗ b       → b ⊗ a
app          : (a ⇒ b) ⊗ a → b
sel num      : a & ...     → a
ind num      : a           → a ⊕ ...
inst         : !a          → a
dup          : !a          → !a ⊗ !a
drop         : !a          → 1
alt (f;g...) : a           → f a ⊕ g a ...
opt (f;g...) : a           → f a & g a ...
cur (f)      : a           → (b ⇒ f (b ⊗ a))
exp (f)      : a           → !(f a)

There is an octet assigned for every operation:

var SPLIT = 1,
    ASSL  = 2,
    ASSR  = 3,
    INSL  = 4,
    DELL  = 5,
    EXCH  = 6,
    APP   = 7,
    SEL   = 8,
    IND   = 9,
    INST  = 10,
    DUP   = 11,
    DROP  = 12,
    ALT   = 13,
    OPT   = 14,
    CUR   = 15,
    EXP   = 16,
    SEP   = 17, END   = 18, UNR  = 19;

The SEP and END are markers for ; and ). The UNR

Data structures

The virtual machine transforms a state by applying instructions into it. In some functional language you might describe it like this:

run : data → [code] → data
run = foldl step

step : data → code → data

type data = nil | cons data data | comp data code | susp

We copy the "nil" "cons" from LISP. They're exactly the same thing as they're in LISP. In the code I've supplied I don't even provide nil and is_nil.

var nil = null;

function is_nil(sp) {
    return sp === null;
}

function cons(a, b) {
    return [a, b];
}

function is_cons(sp) {
    return (sp instanceof Array);
}

The "comp" is irreducible program state. This is produced by normal-form expressions such as IND, OPT, CUR, EXP.

If the normal-form expression is an exponent, it needs a structure that can be copied, otherwise the structure can be destroyed when consumed.

function Comp(sp, codes, pcs) {
    this.sp = sp;
    this.codes = codes;
    this.pcs = pcs;
}

function comp(sp, code, pc) {
    /* Exponents a bit different from other structures. */
    if (!(sp instanceof Comp) || is_comp(sp, EXP)) {
        sp = new Comp(sp, [code], [pc]);
    }
    else {
        sp.codes.push(code);
        sp.pcs.push(pc);
    }
    return sp;
}

function is_comp(sp, op) {
    if (sp instanceof Comp) {
        var code = sp.codes[sp.codes.length-1];
        var pc = sp.pcs[sp.pcs.length-1];
        return (code[pc] === op);
    }
    return false;
}

function uncomp(sp) {
    if (sp.codes.length == 1) {
        return {sp:sp.sp, code:sp.codes[0], pc:sp.pcs[0]};
    }
    else {
        var code = sp.codes.pop();
        var pc = sp.pcs.pop();
        return {sp:sp, code:code, pc:pc};
    }
}

There is also a structure that suspends the execution if accessed. The suspension is eventually filled and the execution continues.

function is_susp(sp) {
    return (sp instanceof Promise);
}

Virtual machine

The virtual machine keeps a tab of concurrently running routines. Spawned coroutine does an eager run.

function spawn(pending, code, pc, sp) {
    return new Promise(function(resolve, reject) {
        run(resolve, code, pc, sp);
    });
}

run is introduced in the interpreter -section.

Bytecode access

There are some numeric values in the stream. They're treated as signed big-endian variable-length-quantities.

function getnum(code, pc) {
    var octet = code[pc];
    pc += 1;
    sign = octet & 64;
    num = (octet & 63) - (octet & sign);
    while (octet >= 128) {
        octet = code[pc];
        pc += 1;
        num = (num << 7) | (octet & 127);
    }
    return {num:num, code:code, pc:pc};
}

This way we don't have an upper bound to the enumeration size.

At compound instructions we have to collect the instruction boundaries from the stream. To do it we use this routine:

function forward_after(code, pc, octet) {
    depth = 0
    while ((depth > 0) || (code[pc] !== octet)) {
        switch (octet) {
            case END:
                depth -= 1;
                break;
            case SPLIT: case ALT: case OPT: case CUR: case EXP:
                depth += 1;
                break;
        }
        pc += 1;
    }
    if (code[pc] != octet) {
        throw "type error";
    }
    pc += 1;
    return pc;
}

It finds the next SEP/END -boundary in the stream, correctly counting the "parentheses" in the bytecode.

Interpreter

The interpreter is a large asynchronous function with a loop inside.

function run(resolve, code, pc, sp) {
    while ((pc < code.length) && (code[pc] != END) && (code[pc] != SEP)) {
        switch(code[pc]) {

The SPLIT instruction establishes what this machine is all about. It's meant to be a concurrently running system. Subroutines are spawned, consed together and then the program continues.

            case SPLIT:
                if (is_cons(sp)) {
                    var pc1 = pc+1;
                    var pc2 = forward_after(code, pc1, SEP);
                    var pc3 = forward_after(code, pc2, END);
                    sp = cons(spawn(code, pc1, sp[0]),
                              spawn(code, pc2, sp[1]));
                    pc = pc3;
                    break;
                } else {
                    throw "type error";
                }

The ASSL reorganizes tuples, or if there's suspended state, waits for it.

            case ASSL:
                if (is_cons(sp) && is_cons(sp[1])) {
                    sp = cons(cons(sp[0], sp[1][0]), sp[1][1]);
                    pc += 1;
                    break;
                } else if (is_cons(sp) && is_susp(sp[1])) {
                    return sp[1].then(function(sp1) {
                        run(resolve, code, pc, cons(sp[0], sp1));
                    });
                } else {
                    throw "type error";
                }

ASSR does the same but is the inverse operation for ASSL:

            case ASSR:
                if (is_cons(sp) && is_cons(sp[0])) {
                    sp = cons(sp[0][0], cons(sp[0][1], sp[1]));
                    pc += 1;
                    break;
                } else if (is_cons(sp) && is_susp(sp[0])) {
                    return sp[0].then(function(sp0) {
                        run(resolve, code, pc, cons(sp0, sp[1]));
                    });
                } else {
                    throw "type error";
                }

INSL creates an empty branch into a tuple, that you can SPLIT into.

            case INSL:
                sp = cons(null, sp);
                pc += 1;
                break;

DELL removes a branch that became useless.

            case DELL:
                if (is_cons(sp) && (sp[0] === null)) {
                    sp = sp[1];
                } else if (is_cons(sp) && is_susp(sp[0])) {
                    sp[0].then(function(sp0) {
                        if (sp0 !== null) {
                            throw "type error";
                        }
                    });
                    sp = sp[1];
                } else {
                    throw "type error";
                }
                break;

EXCH switches items in a tuple around.

            case EXCH:
                if (is_cons(sp)) {
                    sp = cons(sp[1], sp[0]);
                } else {
                    throw "type error";
                }
                break;

APP does a "call" that is a linear equivalent for a function call.

            case APP:
                if (is_cons(sp) && is_comp(sp[0], CUR)) {
                    var a = sp[1];
                    var ref = uncomp(sp[0]);
                    sp = ref.sp;
                    return spawn(ref.code, ref.pc+1, cons(sp, a)).then(function(sp) {
                        run(resolve, code, pc + 1, sp);
                    });
                } else if (is_cons(sp) && is_susp(sp[0])) {
                    return sp[0].then(function(sp0) {
                        run(resolve, code, pc, cons(sp0, sp[1]));
                    });
                } else {
                    throw "type error";
                }
                break;

SEL selects an option. OPT provides several ways to use some resource.

            case SEL:
                if (is_comp(sp, OPT)) {
                    var n = getnum(code, pc+1);
                    var ref = uncomp(sp);
                    ref.pc += 1;
                    for (var i=0; i<n.num; i++){
                        ref.pc = forward_after(ref.code, ref.pc, SEP);
                    }
                    return spawn(ref.code, ref.pc, sp).then(function(sp) {
                        run(resolve, code, n.pc, sp);
                    });
                } else {
                    throw "type error";
                }
                break;

IND places an element behind an index, so that alternatives can be presented.

            case IND:
                sp = comp(sp, code, pc);
                pc = getnum(code, pc+1).pc;
                break;

INST instantiates an exponential. This is an one possible eliminator for a "garbage collected", "referenced" object.

            case INST:
                if (is_comp(sp, EXP)) {
                    var ref = uncomp(sp);
                    return spawn(ref.code, ref.pc+1, sp).then(function(sp) {
                        run(resolve, code, pc+1, sp);
                    });
                } else {
                    throw "type error";
                }
                break;

The DUP is a duplicator that duplicates such references.

            case DUP:
                if (is_comp(sp, EXP)) {
                    sp = cons(sp, sp);
                } else {
                    throw "type error";
                }
                break;

DROP discards a reference.

            case DROP:
                if (is_comp(sp, EXP)) {
                    sp = null;
                } else {
                    throw "type error";
                }
                break;

The ALT matches on the IND.

            case ALT:
                if (is_comp(sp, IND)) {
                    var ref = uncomp(sp);
                    var n = getnum(ref.code, ref.pc+1);
                    pc += 1;
                    for (var i=0; i<n.num; i++){
                        pc = forward_after(code, pc, SEP);
                    }
                    return spawn(code, pc, sp).then(function(sp) {
                        pc = forward_after(code, pc, END);
                        run(resolve, code, pc, sp);
                    });
                } else {
                    throw "type error";
                }
                break;

OPT, presents an option, like mentioned. CUR, produces a linear function. EXP, produces a referenceable that can be copied. There's some additional rules for it.

            case OPT: case CUR: case EXP:
                sp = comp(sp, code, pc);
                pc = forward_after(code, pc+1, END);
                break;

Undefined reference, or a bad opcode stops the system. Additional opcodes can be defined to provide some runtime environment.

            case UNR:
            default:
                throw "type error";
                /* some sort of issue */
        }
    }
    resolve(sp);
}

If the procedure finishes, it signals an another coroutine to resolve. In some cases this behaves like a call stack.

The input to EXP is required to be trivially duplicated. What this means is exactly this:

─────────────────────
nil is trivial to dup

───────────────────────
exp x is trivial to dup

x is trivial to dup
y is trivial to dup
───────────────────────
x ⊗ y is trivial to dup

The above virtual machine is running correctly in case this holds. Otherwise the first element to consume the object will ruin it.

Entry points

To use the virtual machine you write some segment of code, turn it into bytecode and then spawn it. The result is a promise.

function main() {
    var code = [
        SPLIT, INSL, SEP, EXP, END
    ];
    var result = spawn(code, 0, cons(null, null));
    result.then(function(value){
        console.log("result is: ", value);
    });
}

If you have a type-checked total program, the result failing is a bug. However note that the use of results this way is difficult because portions of the result may be suspended. Ideally you would describe your interface with a type and validate against it before passing the control.

Use-cases for this system

It's a complete implementation, you could supply it with some standard libraries and builtin values but that's it.

I think there are use-cases for this system that should be seriously considered.

A "better" scheme/LISP

You could theoretically just pick up the verbatim forms and start writing code with them. This implementation isn't quite perfected for this purpose but it's possible to do that.

The main feature here is that you don't need to know the evaluation order to write useful interactive programs in the LAM bytecode.

For example you could define a communication scheme:

entry : terminal ⊸ terminal

type terminal = (terminal ⊗ string) & (string ⊸ terminal)

Now you can write an interactive program. This would run if we had a correct environment for it.

insl exch (SEL 1; "what is your name?") app
SEL 0 (SEL 1; insl ("hello: ";) concat) app

Lets do step-by-tep evaluation by hand. The description of the instruction set earlier on allows to do this.

terminal          [insl, exch]
terminal ⊗ 1      [split]
  -------------------------------- left
  terminal        [SEL 1]
  string ⊸ terminal
  -------------------------------- right
  1               [string literal]
  string
----------------------------------
(string ⊸ terminal) ⊗ string [app]
terminal

This is the evaluation for the first line. SEL 1 behaves like a printout system call, giving an opportunity to send a string. We send a string literal that queries for the name from the user.

Next we evaluate the second part.

terminal            [SEL 0]
terminal ⊗ string   [split]
  ------------------------------- left
  terminal          [SEL 1]
  string ⊸ terminal
  ------------------------------- right
  string            [insl (string literal;)]
  string ⊗ string   [concat]
(string ⊸ terminal) ⊗ string [app]
terminal

This catenates the name to a string and returns it.

We have a dynamically typed description of the program that doesn't require evaluation order to be defined to it in order to run. That's because the data dependencies here entirely define the evaluation order.

The communication scheme I provided is very restricted, I mean, with a real terminal you can do much more than just send and receive strings.

That's interesting though.

Razor-sharp specifications

If you can describe an interface with a type, it means for plenty of other things as well. It allows specifications between software to be razor sharp precise. This can turn out to be quite helpful in practice.

POSIX.1 spec for read() function start like this:

The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.

Apparently the spec declares that we can do multiple concurrent reads, however it must happen into a file descriptor that actually is a file.

I'm not entirely sure how to translate the POSIX read() into a linear type, because there is a huge amount of things that are presentable about the behavior of this system now. You can describe that notion that parallel read can be only done on file descriptors.

splitfd : ∀a. (a:fd rw) ⊗ a allows concurrent reads ⊸ fd r ⊗ fd r ⊗ fd w

If we happen to know something about the file descriptor, we can use it. for example the "a allows concurrent reads" is something you can describe in a type and then you observe it somehow.

Abstraction for async/await

When playing with linear logic you quickly notice that function calls are unlike control flow constructs there.

Javascript is collecting up evidence that merging control flow and function calls leads to a heck of bad ideas. The Javascript and the solutions it has weaved to the "callback hell" is comical in badness. It's as if PHP had leaked into the web client.

If you wonder why JS conceived the "callback hell", it's because implementing first-class continuations is difficult. It's much simpler to add callbacks than give an interpreter a way to wait for a response from an event loop without stopping the interpreter entirely.

Unfortunately not having first-class continuations mean that there's disjointness between calling functions that have to wait vs. functions that run synchronously. This means that programming in Javascript resembles a lot like programming in C, despite that everything is dynamic enough to cause trouble for efficient implementation anyway.

Technically you can introduce first-class continuations into javascript with a macro that rewrites every function into continuation-passing-style. Unfortunately this is incompatible with "vanilla" Javascript and building a bridge to communicate between means that you undertake an effort similar to reimplementing Javascript.

So we go an another route where we introduce async/await & promises to neatly bridge continuation-passing-style and traditional Javascript. It's a tradeoff, though meanwhile the language has gotten so complex that it's difficult and unpleasant to use. And you still can't compose those old-style functions with new-style functions, yet the old-style functions remain much faster than the new-style function so they remain there.

The first-class continuations aren't an obvious choice because something is indeed lost when they're introduced into the language. If you happen to use some resources badly enough you can make a program so convoluted that it's impossible to reason anything about it by any other way than running it.

Of course you get the problems anyway because you already have callbacks/promises/async-await that implement the part first-class continuations would have introduced in the first place. Here's a simple demonstration of it:

var x = 5;
do_things(function thing() {
    return x; // Does this return '5' or '6'?
});
x += 1;

Depending on what is going on inside do_things, the callback gets called without knowledge about what's in 'x', unless you read the rest of the program.

It's enough to have mutable values that you can pass with a reference. In the example a closure scope stands as a reference for the mutable variable.

If you didn't have mutable shareable references as a build-primitive, you wouldn't need distinction between async/await in the first place.

Has normalization schemes

The linear abstract machine presented here is a sort of a functional language. There are several ways to reduce the expression aside evaluating it.

Though this is already something that leads from not having a fixed evaluation order, it means you can evaluate the program anywhere in it without worrying that you do something that you shouldn't do yet.

Improvement to Elm?

Elm has recently gained popularity by being a simple functional language with some practical gains in web-development. Mainly the programs are concise and simple to reason about.

Elm provides some primitives for building apps from functions, and then provides a message passing layer to communicate. With linear abstract machine you could do the same, except that you don't need to restrict the communication as much.

init   : cmd ⊸ model
view   : model           ⊸ model ⊗ html message
update : model ⊗ message ⊸ model ⊕ 1
exit   : model           ⊸ 1

The Elm equivalents would be:

init : (model, cmd)
view : model → html message
update : model → message → (model, cmd)
exit : model → action

There's some distinctions here, though the elm model duplicates and produces garbage when model is being updated while the "linear elm" wouldn't need to do that.

It'd be sort of cool if you could make interactive programming a simple matter of composing things together.