Indexed multidimensional arrays

I wrote some APL & Egison-inspired code to work with multidimensional arrays. This code has been written in Python. I plan to implement some variation of this into Lever 0.10.0.

a = tensor(0, [
    tensor(1, [1, 0, 0, 5]),
    tensor(1, [0, 1, 0, 2]),
    tensor(1, [0, 0, 1, 2]),
    tensor(1, [0, 0, 0, 1]),
])
b = tensor(0, [
    tensor(1, [1, 2]),
    tensor(1, [3, 4]),
    tensor(1, [0, 2]),
    tensor(1, [1, 0]),
])
c = l(a, {1:cnt("k")}) * l(b, {0:"k"})

The line producing value c is doing matrix multiplication expressed with tensors and tensor indice relabeling. The rows in a and columns in b become a contraction pair after multiplication. Here are the printouts for a, b and c:

╭1 0 0 5╮
│0 1 0 2│
│0 0 1 2│
╰0 0 0 1╯
╭1 2╮
│3 4│
│0 2│
╰1 0╯
╭6 2╮
│5 4│
│2 2│
╰1 0╯

The code for printing out the tensors were sufficiently challenging to write as well.

You can try all of this code yourself because everything of it has been supplied along this blog post. tensors.py is more of a study rather than a production quality library. I wrote it to see how to put this together.

Definition of the data structure

The library represents tensors with a multidimensional array. It disregards the order in layout in order to produce tensors. I thought this is as unusual, but similar to how the underlying ordering is disregarded in set data structures.

The choice of information presented in each parameter inside the data structure is not perfectly planned out. There is maybe a better way to structure this that I have not found out yet.

It is assumed that none of the dimensions in an array ever reduce to zero length. This is prohibited as it would mean that the tensor collapses into nothing.

Pretty printing

I start by describing the pretty printing as being able to see what the heck is happening is very useful during development of code.

if all(isinstance(i, int) and i >= 0 for i in shape):
    if len(shape)/2 <= max(shape):
        shape = range(max(shape)+1)
        signature = ""
if signature is None:
    signature = "{" + ", ".join(repr(i) for i in self.shape) + "}"
indexer = tensor_reshape(self.shape, self.indices, shape, self.indices)
return "".join(generate_table(indexer, self.values)) + signature

The shape describes the layout of in which order the dimensions are layed out in the values -list. If the shape proposes that this is a mostly linear tensor with natural numbers as indices then we print out the table as if it was fully populated with indices. Otherwise the table is generated in the order in which it has been layouted in, with a trailing signature.

odds = indexer[1::2]
even = indexer[0::2]
rows = []
stride = 1
for count, _ in odds:
    stride *= count
    rows.append(stride)
row_stride = stride
cols = []
for count, _ in even:
    stride *= count
    cols.append(stride)
col_stride = stride
if len(rows) < len(cols):
    rows.append(row_stride)
if len(rows) > len(cols):
    cols.append(col_stride)

We use a display where the tensor has been interleaved such that even ranked indices are columns and odd ranked indices are rows. The above code calculates stride counts to later determine where to put parentheses.

col_size = [1 for col in range(row_stride)]
for j, index in enumerate(take_indices(odds + even)):
    k = j % len(col_size)
    col_size[k] = max(col_size[k], len(repr(values[index])))

The column sizes are computed so that every column can be spaced into equal widths. After this point we have everything so that the printing can start.

view1 = [u"\u256D", u"\u2502", u"\u2570", u"\u256E", u"\u2502", u"\u256F"]
view2 = [u"\u239B", u"\u239C", u"\u239D", u"\u239E", u"\u239F", u"\u23A0"]
view = view1

First I tried to use the characters purpose-made for drawing long parentheses, but I ended up settling to graphic characters because they look much better in the terminal.

The view2 is the alternative visual I tried. For curious people, this is how the alternative visual looks like:

⎛1 0 0 5⎞
⎜0 1 0 2⎟
⎜0 0 1 2⎟
⎝0 0 0 1⎠

Some monospace renderers render this properly but others produce small jitters that hinder the readability.

sp = ""
for j, i in enumerate(take_indices(odds + even)):
    k = j % len(col_size)
    yield sp
    for z, n in enumerate(reversed(rows)):
        if j % n == 0:
            p = (j - j % row_stride)
            uc = (p % cols[-1-z] == 0)
            lc = ((p + row_stride) % cols[-1-z] == 0)
            if uc and lc:
                yield u"\u27EE".encode('utf-8')
            elif uc:
                yield view[0].encode('utf-8')
            elif lc:
                yield view[2].encode('utf-8')
            else:
                yield view[1].encode('utf-8')
    yield repr(values[i]).rjust(col_size[k])
    sp = " "
    for z, n in enumerate(rows):
        if (j+1) % n == 0:
            p = (j - j % row_stride)
            uc = ((p + row_stride) % cols[z] == 0)
            lc = (p % cols[z] == 0)
            if uc and lc:
                yield u"\u27EF".encode('utf-8')
            elif uc:
                yield view[5].encode('utf-8')
            elif lc:
                yield view[3].encode('utf-8')
            else:
                yield view[4].encode('utf-8')
    if (j+1) % len(col_size) == 0:
        sp = "\n"

The upper part determines how to print the opening parentheses and the lower part determines how to print the closing parentheses. To get the uc and lc conditions right I had to think very carefully about what the cols and rows -strides mean. I quickly forgot what they mean after I got this to work.

Unary operations

Every unary operation is taken to an element directly so it is trivial to write an implementation for applying them.

def unary_operation(op, self):
    return Tensor(self.shape,
        [op(value) for value in self.values], self.indices)

The binary operations are much harder to implement.

Binary operations

Most of the magic below happens inside the tensor_reshape and take_indices.

def binary_operation(op, self, other):
    self = to_tensor(self)
    other = to_tensor(other)
    contracts = set()
    shape = []
    indices = {}
    for key in self.shape:
        if key in indices:
            continue
        count1 = self.indices.get(key, 1)
        count2 = other.indices.get(key, 1)
        assert count1 == count2 or count1 == 1 or count2 == 1, "shape error"
        indices[key] = max(count1, count2)
        shape.append(key)
    for key2 in other.shape:
        if key2 in indices:
            continue
        if is_contracting(key2) and contracting_pair(key2) in indices:
            key1 = contracting_pair(key2)
            contracts.add(key1)
        else:
            key1 = key2
            shape.append(key1)
        count1 = self.indices.get(key1, 1)
        count2 = other.indices.get(key2, 1)
        assert count1 == count2 or count1 == 1 or count2 == 1, "shape error"
        indices[key1] = max(count1, count2)
    indexer1 = tensor_reshape(self.shape, self.indices, shape, indices)
    indexer2 = tensor_reshape(other.shape, other.indices, shape, indices)
    indices1 = take_indices(indexer1, 0)
    indices2 = take_indices(indexer2, 0)
    values = [op(self.values[i], other.values[j])
        for i, j in itertools.izip(indices1, indices2)]
    num = Tensor(shape, values, indices)
    for key in contracts:
        num = fold_dimension(contracting_fold(key), num, key)
    return num

The code mentions contracting pairs, they are pair of indices that are programmed to fold and merge away from the result when they meet each other.

This design has been heavily inspired by Egison's tensor paper that is an attempt to implement Einstein summation notation in a programming language. It describes symbols that name each dimension and they behave more like annotations than being actual indices representing dimensions. But the way how I've presented it here gives it all whole new interesting meanings.

It is as if tensors never needed to have their dimensions ordered and the choice of symbols for basis and the order of those symbols was always an unavoidable representational detail.

Folding and indexing

Folding and indexing both "destroy" an index, but only in the sense that it is no longer needed in the result. The index is folded into the length of '1' meaning that the value does not expand in that direction. It is as if every index that we know and do not know about would be present in every tensor simultaneously.

def fold_dimension(op, self, rank=0):
    if rank not in self.indices:
        return self
    strides = tensor_strides(self.shape, self.indices)
    count, stride = strides.pop(rank)
    width = count*stride
    shape = [i for i in self.shape if i != rank]
    indices = dict((i, self.indices[i]) for i in shape)
    indexer = [strides[i] for i in shape]
    values = [reduce(op,
        (self.values[j] for j in range(i, i+width, stride)))
        for i in take_indices(indexer, 0)]
    if len(shape) == 0:
        return values[0]
    return Tensor(shape, values, indices)

def index_dimension(self, rank, index):
    if rank not in self.indices:
        return self
    strides = tensor_strides(self.shape, self.indices)
    count, stride = strides.pop(rank)
    assert 0 <= index < count, "index range error"
    width = count*stride
    shape = [i for i in self.shape if i != rank]
    indices = dict((i, self.indices[i]) for i in shape)
    indexer = [strides[i] for i in shape]
    values = [self.values[i] for i in take_indices(indexer, index*stride)]
    if len(shape) == 0:
        return values[0]
    return Tensor(shape, values, indices)

The contraction mechanics automate folding such that fold does not need to be called very often.

Reindexing

In a traditional APL-style arrays implementation we would be transposing and stacking indices to adjust how they compose. In this tensors library we need a concept for reindexing indices as physical reordering of the array would not do it.

def l(self, new_names):
    strides = tensor_strides(self.shape, self.indices)
    contracts = set()
    indices = {}
    new_strides = {}
    shape = []
    for rank, (count1, stride1) in strides.iteritems():
        rank = new_names.get(rank, rank)
        if is_contracting(rank) and contracting_pair(rank) in indices:
            key = contracting_pair(rank)
            contracts.add(rank)
        if rank in new_strides:
            count2, stride2 = new_strides[rank]
            assert count1 == count2, "shape error"
            new_strides[rank] = count1, stride1+stride2
        else:
            new_strides[rank] = count1, stride1
            indices[rank] = count1
            shape.append(rank)
    indexer = []
    for rank in shape:
        indexer.append(new_strides[rank])
    values = [self.values[i] for i in take_indices(indexer, 0)]
    num = Tensor(shape, values, indices)
    for key in contracts:
        num = fold_dimension(contracting_fold(key), num, key)
    return num

The reindexing allows to trigger contraction but it may be also used for taking a diagonal and transposing the tensor. It allows to compact a lot of functionality into a short notation.

Shuffling

Reversing, rolling and slicing are all using the shuffling mechanism. The shuffler does exactly what it says, it shuffles the values inside a tensor.

def reverse_dimension(self, rank):
    if rank not in self.indices:
        return self
    offset = 0
    strides = tensor_strides(self.shape, self.indices)
    indices = {}
    shuffles = []
    for index in self.shape:
        count, stride = strides[index]
        if index != rank:
            shuffle = (count, stride, stride-stride*count, count-1)
        else:
            offset += stride*(count-1)
            shuffle = (count, -stride, -stride+stride*count, count-1)
        indices[index] = count
        shuffles.append(shuffle)
    values = [self.values[i] for i in take_shuffle(shuffles, offset)]
    return Tensor(self.shape, values, indices)

def roll_dimension(self, rank, amount):
    if rank not in self.indices:
        return self
    offset = 0
    strides = tensor_strides(self.shape, self.indices)
    indices = {}
    shuffles = []
    for index in self.shape:
        if index != rank:
            shuffle = (count, stride, stride-stride*count, count-1)
        else:
            amount %= count
            offset += stride*amount
            shuffle = (count, stride, stride-stride*count, count-1-amount)
        indices[index] = count
        shuffles.append(shuffle)
    values = [self.values[i] for i in take_shuffle(shuffles, offset)]
    return Tensor(self.shape, values, indices)

def slice_dimension(self, rank, start, stop):
    if rank not in self.indices:
        return self
    offset = 0
    strides = tensor_strides(self.shape, self.indices)
    indices = {}
    shuffles = []
    for index in self.shape:
        if index != rank:
            shuffle = (count, stride, stride-stride*count, count-1)
        else:
            assert 0 <= start < stop <= count, "slice error"
            offset += stride*start
            count = stop-start
            shuffle = (count, stride, stride-stride*count, count-1)
        indices[index] = count
        shuffles.append(shuffle)
    values = [self.values[i] for i in take_shuffle(shuffles, offset)]
    return Tensor(self.shape, values, indices)

The name 'roll' refers to the old APL operation that rotate the values, as if in a fortune wheel. The 'slice' corresponds to the like-named Python operation that takes a slice from a list.

Tensor generation and construction

We may either generate tensors automatically, or then feed in values to fill into a tensor.

def generate_tensor(dimensions, function):
    shape = []
    counts = []
    indices = {}
    for key, count in dimensions:
        assert key not in indices, "shape error"
        indices[key] = count
        counts.append(count)
        shape.append(key)
    values = [function(state) for state in index_generator(counts)]
    return Tensor(shape, values, indices)

def index_generator(counts):
    state = [0] * len(counts)
    while True:
        yield state
        for i in range(len(counts)):
            state[i] += 1
            if state[i] < counts[i]:
                break
            else:
                state[i] = 0
        else:
            break

The main tool for constructing tensors is supposed to be this function tensor.

def tensor(index, objects):
    objects = [to_tensor(n) for n in objects]
    shape   = []
    indices = {}
    for t in objects:
        for col, count in t.indices.iteritems():
            assert not is_contracting_pair(index, col), "cannot re-introduce a rank"
            if col not in indices:
                shape.append(col)
                indices[col] = count
            assert indices[col] == count or count == 1, "shape error"
            if is_contracting(col):
                assert contracting_pair(col) not in indices, "polarity error"
            indices[col] = max(indices[col], count)
    values = [t.values[i]
        for t in objects
        for i in take_indices(tensor_reshape(
            t.shape, t.indices, shape, indices))]
    indices[index] = len(objects)
    shape.append(index)
    return Tensor(shape, values, indices)

It takes the index and the objects to be put into that index. To make this trivial I require that you cannot reintroduce an index that is already present in the objects. I also require that the input objects are unable to trigger a contraction.

The tensor reshaping & indexing methods

A lot of the core functionality is packed into few functions that are then reused in the program. These are the functions to index the underlying data structure.

The tensor_reshape takes an old tensor and reshapes it into the shape of a new tensor.

def tensor_reshape(shape, indices, newshape, newindices):
    vector = tensor_strides(shape, indices)
    new_vector = []
    for index in newshape:
        if is_contracting(index):
            coindex = contracting_pair(index)
            if coindex in vector:
                count1, stride = vector.get(coindex, (1,0))
            else:
                count1, stride = vector.get(index, (1,0))
        else:
            count1, stride = vector.get(index, (1,0))
        count2 = newindices[index]
        assert count1 == count2 or count1 == 1, (index, count1, count2)
        new_vector.append((count2, stride))
    return new_vector

The tensor_srides calculate a stride for each index and places them into a map for easy access. It sets stride=0 for everything that has only one element because they are not present in the array. This tweak makes it easy to spread the single element to fill a row or a column.

It would also suggest that vectors are an odd form of a quotient. A real number divided into multiple pieces across an index.

def tensor_strides(shape, indices, stride=1):
    vector = {}
    for index in shape:
        count = indices.get(index, 1)
        vector[index] = (count, (stride if count > 1 else 0))
        stride *= count
    return vector

The take_indices and take_shuffle are such functions that I have been on the borderline of merging them together, but they seem to have their distinct places where they are useful, so I have kept them separate. They generate offsets and indices into the flat array that represents the tensor.

def take_indices(vector, offset=0):
    state = [0] * len(vector)
    while True:
        yield offset
        for i in range(len(vector)):
            offset   += vector[i][1]
            state[i] += 1
            if state[i] < vector[i][0]:
                break
            else:
                offset  -= state[i]*vector[i][1]
                state[i] = 0
        else:
            break

def take_shuffle(shuffles, offset=0):
    state = [0] * len(shuffles)
    while True:
        yield offset
        for i in range(len(shuffles)):
            count, step, skip, skipi = shuffles[i]
            offset += (step if state[i] != skipi else skip)
            state[i] += 1
            if state[i] < count:
                break
            else:
                state[i] = 0
        else:
            break

Contraction

Some of the operations above react to the contracting pairs with a fold. The contracting pair provides the folding operation and can be an any object. I expect thought that it is a natural number or a symbol representing an invariant quantity.

def is_contracting(index):
    return True

def is_contracting_pair(a, b):
    if isinstance(a, cnt) and a.index == b:
        return True
    if isinstance(b, cnt) and b.index == a:
        return True
    return False

def contracting_pair(index):
    if isinstance(index, cnt):
        return index.index
    else:
        return cnt(index)

Every symbol originally was not a contracting pair, so there is a bit of history left of that in the code. I did not clean it up because eventually there is a time when you have to stop and move on. It came when I verified that this is a workable concept.

The one last remaining thing was to provide a tool to represent contraction. I originally thought about negative values for this purpose. But I found it to be problematic because zero is a valid rank. We are taught to treat zero as a number like every other with the only difference that it is a neutral element.

class cnt(object):
    def __init__(self, index):
        self.index = index

    def __eq__(self, other):
        if isinstance(other, cnt):
            return self.index == other.index
        return False

    def __ne__(self, other):
        return not (self == other)

    def __hash__(self):
        return hash(self.index)

    def __repr__(self):
        return "cnt({})".format(self.index)

With the contraction rules I default to addition and do the contraction eagerly. This is different from the Egison paper. I decided to associate the operation to the invariant quantity used for contraction.

class csym(object):
    def __init__(self, op):
        self.op = op

    def __repr__(self):
        return "<csym({}) {}>".format(self.op, id(self))

The purpose of this post

This is the first time when I tried a concept called 'semantic linefeeds' when writing a blog post. I suppose that helped me to explain my line of thought better and made it easier to write the text. It also gave it a different tone. thumbs up a good idea, learned from Brandon Rhodes.

You may also like to look at the arrays.py. That was the prototype leading to this beauty here.

But why did I write this post? I spent three days to get this through and if I always spent that much to blogging I would be blogging half of my time. I think that this is something unusual that I do not see every day. I would like to know what smart people think about this approach. A lot of time I get good feedback at irc.freenode.net#proglangdesign and I am grateful for that. But I guess it would be interesting to hear what a mathematician thinks about the concepts and claims of tensors presented here.