Dynamic-static typing

In this post I try to abstract away the details of type checking, such that the resulting type system can no longer be placed on the dynamically/statically typed axis.

I guess it's a good opportunity to pick on Harper's blog post where he shat on people using dynamically typed languages, proposing that we stick to our languages because of clever marketing rather than any rationale.

In the post Harper finds it important to be able to state and enforce invariants. Dynamically typed languages are usually so dynamic that you cannot check that any invariants hold.

Either way, the invariants may or may not be there. Even dynamically typed programs tend to accumulate useful invariants as they are written. It is harmful that you cannot programmatically ensure those invariants actually hold.

Dynamic typing avoids the equally hideous situation of excessive invariants, though. In my opinion excessive invariants are a much worse problem than the inability to check invariants, because the former usually force, or at least guide, you to write more code just to keep the invariants satisfied. This is true not only of C, but also of languages such as ML with type inference, because the inference itself tends to introduce more invariants in order to work through the code.

In the same post Harper admits that...

languages that admit both modes of reasoning enable you and liberate you from the tyranny of a single type.

It would seem that we agree on this one. But it appears as if Harper completely disregarded the motivation for having dynamic typing in the first place. I think he meant something other than what you're about to see next.

Introduction

The main task in admitting "both modes of reasoning" is to find the correspondences between the dynamic and static object spaces.

Assume we have the following script:

a = [1, 2, "hello"]
print(type(a))

We want to print out the type of the list we just constructed. The desired output of this would be List(Integer | String).
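To make the goal concrete, here is a minimal sketch in Python of what such a type() could compute; Union, List and reconstruct are names invented for this sketch rather than parts of any real library, and Python's builtin type names show up in place of Integer and String.

class Union:
    def __init__(self, members):
        self.members = frozenset(members)
    def __repr__(self):
        return " | ".join(sorted(t.__name__ for t in self.members))

class List:
    def __init__(self, item):
        self.item = item
    def __repr__(self):
        item = self.item.__name__ if isinstance(self.item, type) else repr(self.item)
        return "List(%s)" % item

def reconstruct(value):
    # Compute a type term describing a runtime value.
    if isinstance(value, list):
        items = {type(v) for v in value}
        return List(items.pop() if len(items) == 1 else Union(items))
    return type(value)

a = [1, 2, "hello"]
print(reconstruct(a))   # List(int | str)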

I know you already see some problems that seem intractable here.

  1. Both types and values exist in the same object space. Type theorists tend to treat these as if they were in different universes entirely.
  2. We cannot know whether the type annotation is conclusive. Append 1.5 to the list and the type annotation should change to List(Integer | Rational | String).
  3. How do the internals of dynamic types correspond to static types?

When I started writing this post, I did not know the answer to the last question here. I think it is potentially the most interesting one. The answer comes from higher-order type operators, and it is the reason for writing the whole post in the first place.

The second problem is likely something we have to live with, but it can be made much smaller by making module contents immutable. At first it would seem counterintuitive to have a dynamically typed language with immutable modules, but this actually makes a lot of sense.

Types and values in the same object space would seem like a showstopper at first. On closer inspection it doesn't seem to matter at all. Sure, values and types come from different universes, and you may get types of types that come from yet another universe. They can still share the same namespace without a problem.
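Python already demonstrates this in practice: types are ordinary values, and asking for the type of a type just climbs one universe higher while everything stays in one namespace.

a = [1, 2, "hello"]
print(type(a))         # <class 'list'> -- a value that describes a
print(type(type(a)))   # <class 'type'> -- a value that describes list
print(type(type))      # <class 'type'> -- Python ties the tower off here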

Immutable modules

Programs no longer follow the Unix process model in their design. Modern software consists of many small tasks that operate at the same time; there is an operating system both inside and outside a process. Making modules immutable allows different tasks to use the same modules without inadvertently interacting.

When modules are immutable we would appear to lose the live programming capabilities of dynamically typed languages, but actually the opposite happens. The only reliable way to do live programming is to save the state, shut down, and restart with the new program loading the previous state.
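As a rough illustration, the cycle could look like this in Python; using pickle as the state format and a resume() entry point in the new program are assumptions of this sketch.

import importlib
import pickle

STATE_FILE = "state.pickle"

def save_and_quit(state):
    # Save the state before shutting down.
    with open(STATE_FILE, "wb") as f:
        pickle.dump(state, f)
    raise SystemExit

def restart(module_name):
    # Start the edited program and hand it the previous state.
    with open(STATE_FILE, "rb") as f:
        state = pickle.load(f)
    module = importlib.reload(importlib.import_module(module_name))
    module.resume(state)   # hypothetical entry point in the new code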

Additional type information makes it easier to do live programming. The verified task boundaries make it reliable and safe. When these aspects are combined, live programming becomes a commonplace and accepted software development technique.

People who have done some live programming know that it is incredibly effective whenever something interactive has to be written. Development time may drop a hundredfold when you see immediate feedback for every change you make.

Constructors

How should the data constructors work so that they make sense both dynamically and statically typed?

We might have data constructors defined like this:

data Foo(a) =
    foo(bool, Foo(a))
    bar(int, a)
    woot

You could create objects like this:

a = foo(true, woot)
b = foo(false, bar(3, "h"))
c = bar(4, 2)

With types such as these:

a :: Foo(?)
b :: Foo(String)
c :: Foo(Integer)

We will get to the type constructor Foo(a) later; first, let's discuss what the types of the data constructors should be:

foo  :: Constructor(Foo(a), (Bool, Foo(a)))
bar  :: Constructor(Foo(a), (Integer,a))
woot :: Constant(Foo(?))

Constructors are both pattern matchers and callables. Constants are instances of Foo(?) while also being valid pattern matchers.

class Constructor(a, b)
    Pattern(a, b)
    Call(b -> a)

class Constant(a)
    Pattern(a, unit)

Two builtin methods are implemented for patterns:

matching    :: (Pattern(a,_), a) -> bool
deconstruct :: (Pattern(a,b), a) -> b

They are used to implement case clauses such as:

show(a):
    case a
    on foo(a,b)
        print("foo", a, b)
    on bar(c,d)
        print("bar", c, show(d))
    on woot
        print("woot")
show :: Foo(Stringify) -> Unit !
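A dynamic reading of this machinery can be sketched in Python; the tagged-tuple representation and the class names below are choices made for the sketch, not a specification.

class Constructor:
    def __init__(self, tag, arity):
        self.tag, self.arity = tag, arity
    def __call__(self, *args):
        assert len(args) == self.arity
        return (self.tag,) + args   # the dynamic representation

class Constant:
    def __init__(self, tag):
        self.tag = tag
    def __repr__(self):
        return self.tag

def matching(pattern, obj):
    if isinstance(pattern, Constant):
        return obj is pattern
    return isinstance(obj, tuple) and obj[:1] == (pattern.tag,)

def deconstruct(pattern, obj):
    return obj[1:]

foo, bar, woot = Constructor("foo", 2), Constructor("bar", 2), Constant("woot")

def show(a):
    # Roughly what the case clause above would compile down to.
    if matching(foo, a):
        b, c = deconstruct(foo, a)
        print("foo", b, c)
    elif matching(bar, a):
        c, d = deconstruct(bar, a)
        print("bar", c, d)
    elif matching(woot, a):
        print("woot")

show(foo(False, bar(3, "h")))   # foo False ('bar', 3, 'h')
show(woot)                      # woot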

I have not decided how to represent coeffects/exceptions/IO, but they have to be provided in the type annotations.

The data constructors come from the representation selected for a type. Tagged unions have this kind of constructor; record types would have a different kind of constructor.

Proper exception handling may require that tagged and openly extensible tree hierarchies can be created.

Operators

The base utilities in the language are provided by abstract operators that any object in the system may implement. Here are some interesting ones with their associated types:

call :: Operator((Call(a -> b), a) -> b)
(==) :: Operator((a,b) -> c)
    eq [a|b] (a,b) -> c
(+)  :: Operator((a,b) -> c)
    add [a|b] (a,b) -> c
hash :: Operator(Hashable -> Integer)
type :: Operator(Typeable -> Scheme)

Some of these operators select a method directly by the type of the argument. Operators that relate two arguments, such as equality and addition, select a method by type constructor.

Each type constructor has a coercible relation defined for it. This relation describes the acceptable forms of implicit conversion and the function that performs each conversion.

For example, to resolve eq [a|b] (a,b) -> c, the type constructors are first collected from the a and b arguments. Then:

x, y in selector
there must exist a unique y such that forall x != y: coercible(x, y)
convert every element to y, select the operation from y

This approach allows the type system to infer the types, but also gives enough leeway to represent algebraic structures in code.
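The rule can be demonstrated with a small Python sketch; the coercible table and the resolve and add functions here are illustrative inventions.

# coercible maps a (from, to) pair to the function doing the conversion.
coercible = {
    (int, float): float,
}

def resolve(selector):
    # There must exist a unique y that every other x coerces to.
    targets = [y for y in selector
               if all(x == y or (x, y) in coercible for x in selector)]
    if len(targets) != 1:
        raise TypeError("no unique coercion target in %r" % (selector,))
    return targets[0]

def add(a, b):
    y = resolve({type(a), type(b)})
    a = a if type(a) is y else coercible[type(a), y](a)
    b = b if type(b) is y else coercible[type(b), y](b)
    return y.__add__(a, b)   # the operation is selected from y

print(add(1, 2.5))   # 3.5 -- int converts to float, float's add is used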

Kinds with subtyping

Just as in Haskell, we are going to need kinds. Here the kinds have to recognize covariance, contravariance, bivariance and some forms of abstract invariance.

So we have slightly more kinds here. They are not used in typeclasses; rather, they are needed to determine the shapes of types and how to construct valid types.

bool  ................ +
list  ................ ± -> +
builtin  ............. F -> +
record_immutable  .... R+ -> +
record_mutable  ...... R± -> +
singleton  ........... O -> +          (module or ffi library)
linalg  .............. (+, I, I) -> +
product  ............. (+...) -> +
map  ................. (±, ±) -> +
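To show what I mean by kinds determining the shapes of types, here is a Python encoding of part of the table above; the string markers are made up for illustration, and this sketch only checks shape, leaving the recorded variances for the inferencer to use.

# Kinds as (parameter variances, result variance); a bare "+" marks a proper type.
KINDS = {
    "bool": "+",
    "list": (["±"], "+"),
    "map":  (["±", "±"], "+"),
}

def apply_kind(ctor, args):
    # Check a type application such as list(bool) and give the kind of the result.
    kind = KINDS[ctor]
    if not isinstance(kind, tuple):
        raise TypeError("%s takes no type arguments" % ctor)
    params, result = kind
    if len(args) != len(params):
        raise TypeError("%s expects %d type argument(s)" % (ctor, len(params)))
    for arg in args:
        if KINDS[arg] != "+":
            raise TypeError("%s is not a proper type" % arg)
    return result

print(apply_kind("list", ["bool"]))   # +, so list(bool) is a valid type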

In a real use case these would likely look slightly different:

Foo    :: Interface((vp) -> vp)
Foo(a) :: vp

The purpose of representing types would be to issue explicit type conversions and to annotate objects for type checking.

What's the type of Interface or vp? I guess the issue could be sidestepped by treating these as representations of the idea of a type. And you could always just leave them untyped as well. Some functions, such as those involved in creating new types, may be difficult to type-infer anyway.

The type() operation will be run for every element when a module is created, but it doesn't have to succeed for every element in a module.
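In Python terms the idea might look like the sketch below, where type_of stands in for whatever the type() operator ends up being.

def annotate_module(namespace, type_of):
    # Run type() over every module element; elements it fails on
    # simply remain without a static annotation.
    types = {}
    for name, value in namespace.items():
        try:
            types[name] = type_of(value)
        except NotImplementedError:
            pass
    return types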

CFFI-viewpoint

A working C FFI that interacts in a meaningful way with the type inferencer is one of the most obvious points where large gains can be made.

If C libraries are typed well enough, they are relatively safe to use in a high-level, type-inferred environment.
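Python's ctypes gives a rough feel for what typing a C library buys, assuming a platform where libm can be located this way; once argtypes and restype are declared, calls are checked and converted at the boundary.

import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]   # pow :: (double, double)
libm.pow.restype  = ctypes.c_double                      #        -> double

print(libm.pow(2.0, 10.0))   # 1024.0
# libm.pow("2", 10) would now be rejected at the call boundary.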

An optimizing compiler can be made to take in functions along with their type inference results and produce well-optimized code.

I took the interfaces from an existing FFI library I've written and noted how they correspond to types in the new system:

api.headers  ....... module-like constructs
ffi.library

ffi.array  ......... obvious type constructors
ffi.bitmask
ffi.cfunc
ffi.pointer
ffi.struct
ffi.union
ffi.signed
ffi.unsigned

ffi.wrap  .......... wrapped type, type constructor

ffi.handle  ........ instances of a type
ffi.mem              data constructors
ffi.callback

ffi.pool  .......... semi-automatic pool allocation for trivial uses.
                     a data constructor (often hidden)

Finally

There are some details I left out, and details that I am not yet aware of. But I hope this is interesting enough.

During the next week I'll be writing a subtyping type inferencer in RPython. At first I thought about writing it in the language itself, but I haven't figured out the whole syntax yet, and this is such an important part of the language that I can't leave it for last.

I will release a working prototype of this language sometime during this year.
