Implemented linear abstract machine
I've provided Python & Javascript code in this post. Here's the download links:
Update: These have few bugs. Specifically, finding the instruction boundaries in this kind of a format is trickier than presented here.
Bytecode & Instruction set
The bytecode this virtual machine runs is a near-verbatim translation of its textual representation.
A trivial example:
type: 1 ⊸ (1 ⊗ 1) ⊗ !1
program: (insl ; exp)
bytecode: [1, 4, 17, 16, 18]
Here's the whole instruction set:
(f; g) : a ⊗ a → f a ⊗ g a
assl : a ⊗ (b ⊗ c) → (a ⊗ b) ⊗ c
assr : (a ⊗ b) ⊗ c → a ⊗ (b ⊗ c)
insl : a → 1 ⊗ a
dell : 1 ⊗ a → a
exch : a ⊗ b → b ⊗ a
app : (a ⇒ b) ⊗ a → b
sel num : a & ... → a
ind num : a → a ⊕ ...
inst : !a → a
dup : !a → !a ⊗ !a
drop : !a → 1
alt (f;g...) : a → f a ⊕ g a ...
opt (f;g...) : a → f a & g a ...
cur (f) : a → (b ⇒ f (b ⊗ a))
exp (f) : a → !(f a)
There is an octet assigned for every operation:
var SPLIT = 1,
ASSL = 2,
ASSR = 3,
INSL = 4,
DELL = 5,
EXCH = 6,
APP = 7,
SEL = 8,
IND = 9,
INST = 10,
DUP = 11,
DROP = 12,
ALT = 13,
OPT = 14,
CUR = 15,
EXP = 16,
SEP = 17, END = 18, UNR = 19;
The SEP
and END
are markers for ;
and )
.
The UNR
Data structures
The virtual machine transforms a state by applying instructions into it. In some functional language you might describe it like this:
run : data → [code] → data
run = foldl step
step : data → code → data
type data = nil | cons data data | comp data code | susp
We copy the "nil" "cons" from LISP.
They're exactly the same thing as they're in LISP.
In the code I've supplied I don't even provide nil
and is_nil
.
var nil = null;
function is_nil(sp) {
return sp === null;
}
function cons(a, b) {
return [a, b];
}
function is_cons(sp) {
return (sp instanceof Array);
}
The "comp" is irreducible program state.
This is produced by normal-form expressions such as
IND
, OPT
, CUR
, EXP
.
If the normal-form expression is an exponent, it needs a structure that can be copied, otherwise the structure can be destroyed when consumed.
function Comp(sp, codes, pcs) {
this.sp = sp;
this.codes = codes;
this.pcs = pcs;
}
function comp(sp, code, pc) {
/* Exponents a bit different from other structures. */
if (!(sp instanceof Comp) || is_comp(sp, EXP)) {
sp = new Comp(sp, [code], [pc]);
}
else {
sp.codes.push(code);
sp.pcs.push(pc);
}
return sp;
}
function is_comp(sp, op) {
if (sp instanceof Comp) {
var code = sp.codes[sp.codes.length-1];
var pc = sp.pcs[sp.pcs.length-1];
return (code[pc] === op);
}
return false;
}
function uncomp(sp) {
if (sp.codes.length == 1) {
return {sp:sp.sp, code:sp.codes[0], pc:sp.pcs[0]};
}
else {
var code = sp.codes.pop();
var pc = sp.pcs.pop();
return {sp:sp, code:code, pc:pc};
}
}
There is also a structure that suspends the execution if accessed. The suspension is eventually filled and the execution continues.
function is_susp(sp) {
return (sp instanceof Promise);
}
Virtual machine
The virtual machine keeps a tab of concurrently running routines. Spawned coroutine does an eager run.
function spawn(pending, code, pc, sp) {
return new Promise(function(resolve, reject) {
run(resolve, code, pc, sp);
});
}
run
is introduced in the interpreter -section.
Bytecode access
There are some numeric values in the stream. They're treated as signed big-endian variable-length-quantities.
function getnum(code, pc) {
var octet = code[pc];
pc += 1;
sign = octet & 64;
num = (octet & 63) - (octet & sign);
while (octet >= 128) {
octet = code[pc];
pc += 1;
num = (num << 7) | (octet & 127);
}
return {num:num, code:code, pc:pc};
}
This way we don't have an upper bound to the enumeration size.
At compound instructions we have to collect the instruction boundaries from the stream. To do it we use this routine:
function forward_after(code, pc, octet) {
depth = 0
while ((depth > 0) || (code[pc] !== octet)) {
switch (octet) {
case END:
depth -= 1;
break;
case SPLIT: case ALT: case OPT: case CUR: case EXP:
depth += 1;
break;
}
pc += 1;
}
if (code[pc] != octet) {
throw "type error";
}
pc += 1;
return pc;
}
It finds the next SEP/END -boundary in the stream, correctly counting the "parentheses" in the bytecode.
Interpreter
The interpreter is a large asynchronous function with a loop inside.
function run(resolve, code, pc, sp) {
while ((pc < code.length) && (code[pc] != END) && (code[pc] != SEP)) {
switch(code[pc]) {
The SPLIT instruction establishes what this machine is all about. It's meant to be a concurrently running system. Subroutines are spawned, consed together and then the program continues.
case SPLIT:
if (is_cons(sp)) {
var pc1 = pc+1;
var pc2 = forward_after(code, pc1, SEP);
var pc3 = forward_after(code, pc2, END);
sp = cons(spawn(code, pc1, sp[0]),
spawn(code, pc2, sp[1]));
pc = pc3;
break;
} else {
throw "type error";
}
The ASSL reorganizes tuples, or if there's suspended state, waits for it.
case ASSL:
if (is_cons(sp) && is_cons(sp[1])) {
sp = cons(cons(sp[0], sp[1][0]), sp[1][1]);
pc += 1;
break;
} else if (is_cons(sp) && is_susp(sp[1])) {
return sp[1].then(function(sp1) {
run(resolve, code, pc, cons(sp[0], sp1));
});
} else {
throw "type error";
}
ASSR does the same but is the inverse operation for ASSL:
case ASSR:
if (is_cons(sp) && is_cons(sp[0])) {
sp = cons(sp[0][0], cons(sp[0][1], sp[1]));
pc += 1;
break;
} else if (is_cons(sp) && is_susp(sp[0])) {
return sp[0].then(function(sp0) {
run(resolve, code, pc, cons(sp0, sp[1]));
});
} else {
throw "type error";
}
INSL creates an empty branch into a tuple, that you can SPLIT into.
case INSL:
sp = cons(null, sp);
pc += 1;
break;
DELL removes a branch that became useless.
case DELL:
if (is_cons(sp) && (sp[0] === null)) {
sp = sp[1];
} else if (is_cons(sp) && is_susp(sp[0])) {
sp[0].then(function(sp0) {
if (sp0 !== null) {
throw "type error";
}
});
sp = sp[1];
} else {
throw "type error";
}
break;
EXCH switches items in a tuple around.
case EXCH:
if (is_cons(sp)) {
sp = cons(sp[1], sp[0]);
} else {
throw "type error";
}
break;
APP does a "call" that is a linear equivalent for a function call.
case APP:
if (is_cons(sp) && is_comp(sp[0], CUR)) {
var a = sp[1];
var ref = uncomp(sp[0]);
sp = ref.sp;
return spawn(ref.code, ref.pc+1, cons(sp, a)).then(function(sp) {
run(resolve, code, pc + 1, sp);
});
} else if (is_cons(sp) && is_susp(sp[0])) {
return sp[0].then(function(sp0) {
run(resolve, code, pc, cons(sp0, sp[1]));
});
} else {
throw "type error";
}
break;
SEL selects an option. OPT
provides several ways to use some resource.
case SEL:
if (is_comp(sp, OPT)) {
var n = getnum(code, pc+1);
var ref = uncomp(sp);
ref.pc += 1;
for (var i=0; i<n.num; i++){
ref.pc = forward_after(ref.code, ref.pc, SEP);
}
return spawn(ref.code, ref.pc, sp).then(function(sp) {
run(resolve, code, n.pc, sp);
});
} else {
throw "type error";
}
break;
IND places an element behind an index, so that alternatives can be presented.
case IND:
sp = comp(sp, code, pc);
pc = getnum(code, pc+1).pc;
break;
INST instantiates an exponential. This is an one possible eliminator for a "garbage collected", "referenced" object.
case INST:
if (is_comp(sp, EXP)) {
var ref = uncomp(sp);
return spawn(ref.code, ref.pc+1, sp).then(function(sp) {
run(resolve, code, pc+1, sp);
});
} else {
throw "type error";
}
break;
The DUP is a duplicator that duplicates such references.
case DUP:
if (is_comp(sp, EXP)) {
sp = cons(sp, sp);
} else {
throw "type error";
}
break;
DROP discards a reference.
case DROP:
if (is_comp(sp, EXP)) {
sp = null;
} else {
throw "type error";
}
break;
The ALT matches on the IND.
case ALT:
if (is_comp(sp, IND)) {
var ref = uncomp(sp);
var n = getnum(ref.code, ref.pc+1);
pc += 1;
for (var i=0; i<n.num; i++){
pc = forward_after(code, pc, SEP);
}
return spawn(code, pc, sp).then(function(sp) {
pc = forward_after(code, pc, END);
run(resolve, code, pc, sp);
});
} else {
throw "type error";
}
break;
OPT, presents an option, like mentioned. CUR, produces a linear function. EXP, produces a referenceable that can be copied. There's some additional rules for it.
case OPT: case CUR: case EXP:
sp = comp(sp, code, pc);
pc = forward_after(code, pc+1, END);
break;
Undefined reference, or a bad opcode stops the system. Additional opcodes can be defined to provide some runtime environment.
case UNR:
default:
throw "type error";
/* some sort of issue */
}
}
resolve(sp);
}
If the procedure finishes, it signals an another coroutine to resolve. In some cases this behaves like a call stack.
The input to EXP is required to be trivially duplicated. What this means is exactly this:
─────────────────────
nil is trivial to dup
───────────────────────
exp x is trivial to dup
x is trivial to dup
y is trivial to dup
───────────────────────
x ⊗ y is trivial to dup
The above virtual machine is running correctly in case this holds. Otherwise the first element to consume the object will ruin it.
Entry points
To use the virtual machine you write some segment of code, turn it into bytecode and then spawn it. The result is a promise.
function main() {
var code = [
SPLIT, INSL, SEP, EXP, END
];
var result = spawn(code, 0, cons(null, null));
result.then(function(value){
console.log("result is: ", value);
});
}
If you have a type-checked total program, the result failing is a bug. However note that the use of results this way is difficult because portions of the result may be suspended. Ideally you would describe your interface with a type and validate against it before passing the control.
Use-cases for this system
It's a complete implementation, you could supply it with some standard libraries and builtin values but that's it.
I think there are use-cases for this system that should be seriously considered.
A "better" scheme/LISP
You could theoretically just pick up the verbatim forms and start writing code with them. This implementation isn't quite perfected for this purpose but it's possible to do that.
The main feature here is that you don't need to know the evaluation order to write useful interactive programs in the LAM bytecode.
For example you could define a communication scheme:
entry : terminal ⊸ terminal
type terminal = (terminal ⊗ string) & (string ⊸ terminal)
Now you can write an interactive program. This would run if we had a correct environment for it.
insl exch (SEL 1; "what is your name?") app
SEL 0 (SEL 1; insl ("hello: ";) concat) app
Lets do step-by-tep evaluation by hand. The description of the instruction set earlier on allows to do this.
terminal [insl, exch]
terminal ⊗ 1 [split]
-------------------------------- left
terminal [SEL 1]
string ⊸ terminal
-------------------------------- right
1 [string literal]
string
----------------------------------
(string ⊸ terminal) ⊗ string [app]
terminal
This is the evaluation for the first line.
SEL 1
behaves like a printout system call,
giving an opportunity to send a string.
We send a string literal that queries for the name from the user.
Next we evaluate the second part.
terminal [SEL 0]
terminal ⊗ string [split]
------------------------------- left
terminal [SEL 1]
string ⊸ terminal
------------------------------- right
string [insl (string literal;)]
string ⊗ string [concat]
(string ⊸ terminal) ⊗ string [app]
terminal
This catenates the name to a string and returns it.
We have a dynamically typed description of the program that doesn't require evaluation order to be defined to it in order to run. That's because the data dependencies here entirely define the evaluation order.
The communication scheme I provided is very restricted, I mean, with a real terminal you can do much more than just send and receive strings.
That's interesting though.
Razor-sharp specifications
If you can describe an interface with a type, it means for plenty of other things as well. It allows specifications between software to be razor sharp precise. This can turn out to be quite helpful in practice.
POSIX.1 spec for read()
function start like this:
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
Apparently the spec declares that we can do multiple concurrent reads, however it must happen into a file descriptor that actually is a file.
I'm not entirely sure how to translate the POSIX read()
into a linear type,
because there is a huge amount of things that are presentable about the
behavior of this system now.
You can describe that notion that parallel read can be only done on
file descriptors.
splitfd : ∀a. (a:fd rw) ⊗ a allows concurrent reads ⊸ fd r ⊗ fd r ⊗ fd w
If we happen to know something about the file descriptor, we can use it. for example the "a allows concurrent reads" is something you can describe in a type and then you observe it somehow.
Abstraction for async/await
When playing with linear logic you quickly notice that function calls are unlike control flow constructs there.
Javascript is collecting up evidence that merging control flow and function calls leads to a heck of bad ideas. The Javascript and the solutions it has weaved to the "callback hell" is comical in badness. It's as if PHP had leaked into the web client.
If you wonder why JS conceived the "callback hell", it's because implementing first-class continuations is difficult. It's much simpler to add callbacks than give an interpreter a way to wait for a response from an event loop without stopping the interpreter entirely.
Unfortunately not having first-class continuations mean that there's disjointness between calling functions that have to wait vs. functions that run synchronously. This means that programming in Javascript resembles a lot like programming in C, despite that everything is dynamic enough to cause trouble for efficient implementation anyway.
Technically you can introduce first-class continuations into javascript with a macro that rewrites every function into continuation-passing-style. Unfortunately this is incompatible with "vanilla" Javascript and building a bridge to communicate between means that you undertake an effort similar to reimplementing Javascript.
So we go an another route where we introduce async/await & promises to neatly bridge continuation-passing-style and traditional Javascript. It's a tradeoff, though meanwhile the language has gotten so complex that it's difficult and unpleasant to use. And you still can't compose those old-style functions with new-style functions, yet the old-style functions remain much faster than the new-style function so they remain there.
The first-class continuations aren't an obvious choice because something is indeed lost when they're introduced into the language. If you happen to use some resources badly enough you can make a program so convoluted that it's impossible to reason anything about it by any other way than running it.
Of course you get the problems anyway because you already have callbacks/promises/async-await that implement the part first-class continuations would have introduced in the first place. Here's a simple demonstration of it:
var x = 5;
do_things(function thing() {
return x; // Does this return '5' or '6'?
});
x += 1;
Depending on what is going on inside do_things
,
the callback gets called without knowledge about what's in 'x',
unless you read the rest of the program.
It's enough to have mutable values that you can pass with a reference. In the example a closure scope stands as a reference for the mutable variable.
If you didn't have mutable shareable references as a build-primitive, you wouldn't need distinction between async/await in the first place.
Has normalization schemes
The linear abstract machine presented here is a sort of a functional language. There are several ways to reduce the expression aside evaluating it.
Though this is already something that leads from not having a fixed evaluation order, it means you can evaluate the program anywhere in it without worrying that you do something that you shouldn't do yet.
Improvement to Elm?
Elm has recently gained popularity by being a simple functional language with some practical gains in web-development. Mainly the programs are concise and simple to reason about.
Elm provides some primitives for building apps from functions, and then provides a message passing layer to communicate. With linear abstract machine you could do the same, except that you don't need to restrict the communication as much.
init : cmd ⊸ model
view : model ⊸ model ⊗ html message
update : model ⊗ message ⊸ model ⊕ 1
exit : model ⊸ 1
The Elm equivalents would be:
init : (model, cmd)
view : model → html message
update : model → message → (model, cmd)
exit : model → action
There's some distinctions here, though the elm model duplicates and produces garbage when model is being updated while the "linear elm" wouldn't need to do that.
It'd be sort of cool if you could make interactive programming a simple matter of composing things together.