Indexed multidimensional arrays
I wrote some APL & Egison-inspired code to work with multidimensional arrays. This code has been written in Python. I plan to implement some variation of this into Lever 0.10.0.
a = tensor(0, [
tensor(1, [1, 0, 0, 5]),
tensor(1, [0, 1, 0, 2]),
tensor(1, [0, 0, 1, 2]),
tensor(1, [0, 0, 0, 1]),
])
b = tensor(0, [
tensor(1, [1, 2]),
tensor(1, [3, 4]),
tensor(1, [0, 2]),
tensor(1, [1, 0]),
])
c = l(a, {1:cnt("k")}) * l(b, {0:"k"})
The line producing value c
is doing matrix multiplication
expressed with tensors and tensor indice relabeling.
The rows in a
and columns in b
become a contraction pair
after multiplication.
Here are the printouts for a
, b
and c
:
╭1 0 0 5╮
│0 1 0 2│
│0 0 1 2│
╰0 0 0 1╯
╭1 2╮
│3 4│
│0 2│
╰1 0╯
╭6 2╮
│5 4│
│2 2│
╰1 0╯
The code for printing out the tensors were sufficiently challenging to write as well.
You can try all of this code yourself because everything of it has been supplied along this blog post. tensors.py is more of a study rather than a production quality library. I wrote it to see how to put this together.
Definition of the data structure
The library represents tensors with a multidimensional array. It disregards the order in layout in order to produce tensors. I thought this is as unusual, but similar to how the underlying ordering is disregarded in set data structures.
The choice of information presented in each parameter inside the data structure is not perfectly planned out. There is maybe a better way to structure this that I have not found out yet.
It is assumed that none of the dimensions in an array ever reduce to zero length. This is prohibited as it would mean that the tensor collapses into nothing.
Pretty printing
I start by describing the pretty printing as being able to see what the heck is happening is very useful during development of code.
if all(isinstance(i, int) and i >= 0 for i in shape):
if len(shape)/2 <= max(shape):
shape = range(max(shape)+1)
signature = ""
if signature is None:
signature = "{" + ", ".join(repr(i) for i in self.shape) + "}"
indexer = tensor_reshape(self.shape, self.indices, shape, self.indices)
return "".join(generate_table(indexer, self.values)) + signature
The shape
describes the layout
of in which order the dimensions are layed out
in the values
-list.
If the shape proposes that this is a mostly linear
tensor with natural numbers as indices
then we print out the table as if it was
fully populated with indices.
Otherwise the table is generated in the order in
which it has been layouted in, with a trailing signature.
odds = indexer[1::2]
even = indexer[0::2]
rows = []
stride = 1
for count, _ in odds:
stride *= count
rows.append(stride)
row_stride = stride
cols = []
for count, _ in even:
stride *= count
cols.append(stride)
col_stride = stride
if len(rows) < len(cols):
rows.append(row_stride)
if len(rows) > len(cols):
cols.append(col_stride)
We use a display where the tensor has been interleaved such that even ranked indices are columns and odd ranked indices are rows. The above code calculates stride counts to later determine where to put parentheses.
col_size = [1 for col in range(row_stride)]
for j, index in enumerate(take_indices(odds + even)):
k = j % len(col_size)
col_size[k] = max(col_size[k], len(repr(values[index])))
The column sizes are computed so that every column can be spaced into equal widths. After this point we have everything so that the printing can start.
view1 = [u"\u256D", u"\u2502", u"\u2570", u"\u256E", u"\u2502", u"\u256F"]
view2 = [u"\u239B", u"\u239C", u"\u239D", u"\u239E", u"\u239F", u"\u23A0"]
view = view1
First I tried to use the characters purpose-made for drawing long parentheses, but I ended up settling to graphic characters because they look much better in the terminal.
The view2
is the alternative visual I tried.
For curious people,
this is how the alternative visual looks like:
⎛1 0 0 5⎞
⎜0 1 0 2⎟
⎜0 0 1 2⎟
⎝0 0 0 1⎠
Some monospace renderers render this properly but others produce small jitters that hinder the readability.
sp = ""
for j, i in enumerate(take_indices(odds + even)):
k = j % len(col_size)
yield sp
for z, n in enumerate(reversed(rows)):
if j % n == 0:
p = (j - j % row_stride)
uc = (p % cols[-1-z] == 0)
lc = ((p + row_stride) % cols[-1-z] == 0)
if uc and lc:
yield u"\u27EE".encode('utf-8')
elif uc:
yield view[0].encode('utf-8')
elif lc:
yield view[2].encode('utf-8')
else:
yield view[1].encode('utf-8')
yield repr(values[i]).rjust(col_size[k])
sp = " "
for z, n in enumerate(rows):
if (j+1) % n == 0:
p = (j - j % row_stride)
uc = ((p + row_stride) % cols[z] == 0)
lc = (p % cols[z] == 0)
if uc and lc:
yield u"\u27EF".encode('utf-8')
elif uc:
yield view[5].encode('utf-8')
elif lc:
yield view[3].encode('utf-8')
else:
yield view[4].encode('utf-8')
if (j+1) % len(col_size) == 0:
sp = "\n"
The upper part determines
how to print the opening parentheses
and the lower part determines
how to print the closing parentheses.
To get the uc
and lc
conditions right
I had to think very carefully about
what the cols
and rows
-strides mean.
I quickly forgot what they mean
after I got this to work.
Unary operations
Every unary operation is taken to an element directly so it is trivial to write an implementation for applying them.
def unary_operation(op, self):
return Tensor(self.shape,
[op(value) for value in self.values], self.indices)
The binary operations are much harder to implement.
Binary operations
Most of the magic below happens
inside the tensor_reshape
and take_indices
.
def binary_operation(op, self, other):
self = to_tensor(self)
other = to_tensor(other)
contracts = set()
shape = []
indices = {}
for key in self.shape:
if key in indices:
continue
count1 = self.indices.get(key, 1)
count2 = other.indices.get(key, 1)
assert count1 == count2 or count1 == 1 or count2 == 1, "shape error"
indices[key] = max(count1, count2)
shape.append(key)
for key2 in other.shape:
if key2 in indices:
continue
if is_contracting(key2) and contracting_pair(key2) in indices:
key1 = contracting_pair(key2)
contracts.add(key1)
else:
key1 = key2
shape.append(key1)
count1 = self.indices.get(key1, 1)
count2 = other.indices.get(key2, 1)
assert count1 == count2 or count1 == 1 or count2 == 1, "shape error"
indices[key1] = max(count1, count2)
indexer1 = tensor_reshape(self.shape, self.indices, shape, indices)
indexer2 = tensor_reshape(other.shape, other.indices, shape, indices)
indices1 = take_indices(indexer1, 0)
indices2 = take_indices(indexer2, 0)
values = [op(self.values[i], other.values[j])
for i, j in itertools.izip(indices1, indices2)]
num = Tensor(shape, values, indices)
for key in contracts:
num = fold_dimension(contracting_fold(key), num, key)
return num
The code mentions contracting pairs, they are pair of indices that are programmed to fold and merge away from the result when they meet each other.
This design has been heavily inspired by Egison's tensor paper that is an attempt to implement Einstein summation notation in a programming language. It describes symbols that name each dimension and they behave more like annotations than being actual indices representing dimensions. But the way how I've presented it here gives it all whole new interesting meanings.
It is as if tensors never needed to have their dimensions ordered and the choice of symbols for basis and the order of those symbols was always an unavoidable representational detail.
Folding and indexing
Folding and indexing both "destroy" an index, but only in the sense that it is no longer needed in the result. The index is folded into the length of '1' meaning that the value does not expand in that direction. It is as if every index that we know and do not know about would be present in every tensor simultaneously.
def fold_dimension(op, self, rank=0):
if rank not in self.indices:
return self
strides = tensor_strides(self.shape, self.indices)
count, stride = strides.pop(rank)
width = count*stride
shape = [i for i in self.shape if i != rank]
indices = dict((i, self.indices[i]) for i in shape)
indexer = [strides[i] for i in shape]
values = [reduce(op,
(self.values[j] for j in range(i, i+width, stride)))
for i in take_indices(indexer, 0)]
if len(shape) == 0:
return values[0]
return Tensor(shape, values, indices)
def index_dimension(self, rank, index):
if rank not in self.indices:
return self
strides = tensor_strides(self.shape, self.indices)
count, stride = strides.pop(rank)
assert 0 <= index < count, "index range error"
width = count*stride
shape = [i for i in self.shape if i != rank]
indices = dict((i, self.indices[i]) for i in shape)
indexer = [strides[i] for i in shape]
values = [self.values[i] for i in take_indices(indexer, index*stride)]
if len(shape) == 0:
return values[0]
return Tensor(shape, values, indices)
The contraction mechanics automate folding
such that fold
does not need to be called very often.
Reindexing
In a traditional APL-style arrays implementation we would be transposing and stacking indices to adjust how they compose. In this tensors library we need a concept for reindexing indices as physical reordering of the array would not do it.
def l(self, new_names):
strides = tensor_strides(self.shape, self.indices)
contracts = set()
indices = {}
new_strides = {}
shape = []
for rank, (count1, stride1) in strides.iteritems():
rank = new_names.get(rank, rank)
if is_contracting(rank) and contracting_pair(rank) in indices:
key = contracting_pair(rank)
contracts.add(rank)
if rank in new_strides:
count2, stride2 = new_strides[rank]
assert count1 == count2, "shape error"
new_strides[rank] = count1, stride1+stride2
else:
new_strides[rank] = count1, stride1
indices[rank] = count1
shape.append(rank)
indexer = []
for rank in shape:
indexer.append(new_strides[rank])
values = [self.values[i] for i in take_indices(indexer, 0)]
num = Tensor(shape, values, indices)
for key in contracts:
num = fold_dimension(contracting_fold(key), num, key)
return num
The reindexing allows to trigger contraction but it may be also used for taking a diagonal and transposing the tensor. It allows to compact a lot of functionality into a short notation.
Shuffling
Reversing, rolling and slicing are all using the shuffling mechanism. The shuffler does exactly what it says, it shuffles the values inside a tensor.
def reverse_dimension(self, rank):
if rank not in self.indices:
return self
offset = 0
strides = tensor_strides(self.shape, self.indices)
indices = {}
shuffles = []
for index in self.shape:
count, stride = strides[index]
if index != rank:
shuffle = (count, stride, stride-stride*count, count-1)
else:
offset += stride*(count-1)
shuffle = (count, -stride, -stride+stride*count, count-1)
indices[index] = count
shuffles.append(shuffle)
values = [self.values[i] for i in take_shuffle(shuffles, offset)]
return Tensor(self.shape, values, indices)
def roll_dimension(self, rank, amount):
if rank not in self.indices:
return self
offset = 0
strides = tensor_strides(self.shape, self.indices)
indices = {}
shuffles = []
for index in self.shape:
if index != rank:
shuffle = (count, stride, stride-stride*count, count-1)
else:
amount %= count
offset += stride*amount
shuffle = (count, stride, stride-stride*count, count-1-amount)
indices[index] = count
shuffles.append(shuffle)
values = [self.values[i] for i in take_shuffle(shuffles, offset)]
return Tensor(self.shape, values, indices)
def slice_dimension(self, rank, start, stop):
if rank not in self.indices:
return self
offset = 0
strides = tensor_strides(self.shape, self.indices)
indices = {}
shuffles = []
for index in self.shape:
if index != rank:
shuffle = (count, stride, stride-stride*count, count-1)
else:
assert 0 <= start < stop <= count, "slice error"
offset += stride*start
count = stop-start
shuffle = (count, stride, stride-stride*count, count-1)
indices[index] = count
shuffles.append(shuffle)
values = [self.values[i] for i in take_shuffle(shuffles, offset)]
return Tensor(self.shape, values, indices)
The name 'roll' refers to the old APL operation that rotate the values, as if in a fortune wheel. The 'slice' corresponds to the like-named Python operation that takes a slice from a list.
Tensor generation and construction
We may either generate tensors automatically, or then feed in values to fill into a tensor.
def generate_tensor(dimensions, function):
shape = []
counts = []
indices = {}
for key, count in dimensions:
assert key not in indices, "shape error"
indices[key] = count
counts.append(count)
shape.append(key)
values = [function(state) for state in index_generator(counts)]
return Tensor(shape, values, indices)
def index_generator(counts):
state = [0] * len(counts)
while True:
yield state
for i in range(len(counts)):
state[i] += 1
if state[i] < counts[i]:
break
else:
state[i] = 0
else:
break
The main tool for constructing tensors
is supposed to be this function tensor
.
def tensor(index, objects):
objects = [to_tensor(n) for n in objects]
shape = []
indices = {}
for t in objects:
for col, count in t.indices.iteritems():
assert not is_contracting_pair(index, col), "cannot re-introduce a rank"
if col not in indices:
shape.append(col)
indices[col] = count
assert indices[col] == count or count == 1, "shape error"
if is_contracting(col):
assert contracting_pair(col) not in indices, "polarity error"
indices[col] = max(indices[col], count)
values = [t.values[i]
for t in objects
for i in take_indices(tensor_reshape(
t.shape, t.indices, shape, indices))]
indices[index] = len(objects)
shape.append(index)
return Tensor(shape, values, indices)
It takes the index and the objects to be put into that index. To make this trivial I require that you cannot reintroduce an index that is already present in the objects. I also require that the input objects are unable to trigger a contraction.
The tensor reshaping & indexing methods
A lot of the core functionality is packed into few functions that are then reused in the program. These are the functions to index the underlying data structure.
The tensor_reshape
takes an old tensor
and reshapes it into the shape of a new tensor.
def tensor_reshape(shape, indices, newshape, newindices):
vector = tensor_strides(shape, indices)
new_vector = []
for index in newshape:
if is_contracting(index):
coindex = contracting_pair(index)
if coindex in vector:
count1, stride = vector.get(coindex, (1,0))
else:
count1, stride = vector.get(index, (1,0))
else:
count1, stride = vector.get(index, (1,0))
count2 = newindices[index]
assert count1 == count2 or count1 == 1, (index, count1, count2)
new_vector.append((count2, stride))
return new_vector
The tensor_srides
calculate a stride for each index
and places them into a map for easy access.
It sets stride=0
for everything that has only one element
because they are not present in the array.
This tweak makes it easy to spread the single element
to fill a row or a column.
It would also suggest that vectors are an odd form of a quotient. A real number divided into multiple pieces across an index.
def tensor_strides(shape, indices, stride=1):
vector = {}
for index in shape:
count = indices.get(index, 1)
vector[index] = (count, (stride if count > 1 else 0))
stride *= count
return vector
The take_indices
and take_shuffle
are such functions
that I have been on the borderline of
merging them together,
but they seem to have their distinct places
where they are useful,
so I have kept them separate.
They generate offsets and indices into the flat array
that represents the tensor.
def take_indices(vector, offset=0):
state = [0] * len(vector)
while True:
yield offset
for i in range(len(vector)):
offset += vector[i][1]
state[i] += 1
if state[i] < vector[i][0]:
break
else:
offset -= state[i]*vector[i][1]
state[i] = 0
else:
break
def take_shuffle(shuffles, offset=0):
state = [0] * len(shuffles)
while True:
yield offset
for i in range(len(shuffles)):
count, step, skip, skipi = shuffles[i]
offset += (step if state[i] != skipi else skip)
state[i] += 1
if state[i] < count:
break
else:
state[i] = 0
else:
break
Contraction
Some of the operations above react to the contracting pairs with a fold. The contracting pair provides the folding operation and can be an any object. I expect thought that it is a natural number or a symbol representing an invariant quantity.
def is_contracting(index):
return True
def is_contracting_pair(a, b):
if isinstance(a, cnt) and a.index == b:
return True
if isinstance(b, cnt) and b.index == a:
return True
return False
def contracting_pair(index):
if isinstance(index, cnt):
return index.index
else:
return cnt(index)
Every symbol originally was not a contracting pair, so there is a bit of history left of that in the code. I did not clean it up because eventually there is a time when you have to stop and move on. It came when I verified that this is a workable concept.
The one last remaining thing was to provide a tool to represent contraction. I originally thought about negative values for this purpose. But I found it to be problematic because zero is a valid rank. We are taught to treat zero as a number like every other with the only difference that it is a neutral element.
class cnt(object):
def __init__(self, index):
self.index = index
def __eq__(self, other):
if isinstance(other, cnt):
return self.index == other.index
return False
def __ne__(self, other):
return not (self == other)
def __hash__(self):
return hash(self.index)
def __repr__(self):
return "cnt({})".format(self.index)
With the contraction rules I default to addition and do the contraction eagerly. This is different from the Egison paper. I decided to associate the operation to the invariant quantity used for contraction.
class csym(object):
def __init__(self, op):
self.op = op
def __repr__(self):
return "<csym({}) {}>".format(self.op, id(self))
The purpose of this post
This is the first time when I tried a concept called 'semantic linefeeds' when writing a blog post. I suppose that helped me to explain my line of thought better and made it easier to write the text. It also gave it a different tone. thumbs up a good idea, learned from Brandon Rhodes.
You may also like to look at the arrays.py. That was the prototype leading to this beauty here.
But why did I write this post?
I spent three days to get this through
and if I always spent that much to blogging
I would be blogging half of my time.
I think that this is something unusual
that I do not see every day.
I would like to know what smart people
think about this approach.
A lot of time I get good feedback at
irc.freenode.net#proglangdesign
and I am grateful for that.
But I guess it would be interesting to hear
what a mathematician thinks about
the concepts and claims of tensors presented here.