Hashing, comparison and equality, questions for readers

I have some open-ended questions that I have not found a satisfying answer yet. I believe they can be solved but it has not pressed against me so badly that I would have had to. I think that some of you readers would like to figure this out.

Lever has three mechanisms that may interact unfavorably.

Hash lookup

Hash table lookup relies on two functions:

hash(object)
a == b

The first one returns a hash number for the object, to quickly differentiate between values. For this to work the following constraints must hold:

x == y and hash(x) == hash(y)
x == y and y == x

If the hash and equality do not match for an object, the hash table lookups will produce false negatives.

Identity

When we are programming we expect that some rules hold. Notably, that certain objects are only equal with itself. It is common to use equality and inequality operations for this:

a == b
a != b

Unless we are dealing with illogical objects, the following constraints hold:

a != b and not a == b
true  == true
false == false
null  == null

Lever doesn't have a separate operator for checking identity, but for various reasons it has a tool to represent an object by its identity:

Id(a) == Id(b)

This allows us to use ordinary hash tables and sets to plot out values by their identities. It is extremely useful if we intend to search through objects and values, treating them as a graph, for example, to print them.

Comparison

Finally, for some objects we want to compare them. Is this larger than the another object? Or smaller?

a == b
a != b
a <= b
a >= b
a < b
a > b
min(a, b)
max(a, b)

Minimum and maximum are not usually in this list because they are derived from the above rules, but there is an interesting experimental feature in Lever's min and max functions:

min(null, null) == null
min(null, x) == x
min(x, null) == x

max(null, null) == null
max(null, x) == x
max(x, null) == x

Because null represents absence, if either one of the values passed to a max or min function is null, then it means that whatever is left is always smallest or largest.

But am I right? If we have a null and an any other value to compare with, the comparison with the following rules would be absurd:

x < null == true
null < x == true

The above rule would be absurd because it makes the sort() function impossible. Or how would you sort the lower list?

[1, 2, null]

Where the null is supposed to be in that list? The current approach taken by Lever is to throw an error on comparison when it encounters elements it cannot compare.

The problem

When comparing values or even hash table plotting them, sometimes we want to cross type boundaries during the comparison. For example, we would still want to sort the following list:

[1, 2.3, 1.2, 9, 0]

And end up with:

[0, 1, 1.2, 2.3, 9]

This means that the comparison must be valid between floating point and integer numbers:

3.2 < 3 == false
2 < 3.9 == true

The '<' and '==' are multimethods in Lever. The multimethod observes the types. To get the idea here's a sequence of events what happens during comparison:

3.2 < 3
'<' multimethod gets the argument list [3.2, 3]
It takes interface() of both arguments
[float, int]
Is the key in the '<'s dispatch table?
  If there is, take a function from the table and call it.
Otherwise call '<'.default(3.2, 3)

So we check into the dispatch table with the exact types for an implementation, if there's none we will call the default implementation for a method.

With arithmetic I have solved this by having a default implementation that calls coerce() on the arguments and re-attempts the operation once.

Unfortunately it is uncertain whether we can use the same coerce() for both arithmetic and comparative operations. I don't know if I can use the same approach at all here.

The problem is that we want to do this:

true + true == 2

But the coercion of [bool, bool] to [int, int], or anything else we put into coerce(), can end up violating the rules we expect to be true in the identity, sorting and hash table lookup comparisons.

Or is there even a problem to solve in the first place?

If you have answers or further insights on this subject, I have established a gitter channel for Lever. You can come to chat into gitter.im/leverlanguage/Lobby any time. :) The chats are logged so it is worthwhile to write there if I didn't appear to be around. I will see them anyway.

Similar posts