gusl: (Default)
[personal profile] gusl
I've just realized that using hash tables properly requires a notion of when two objects have equal value.

How come I never had to worry about this in the past?

---

I want my hash table to encode an equivalence relation (permutations of node labels), automatically dealing with my label-switching problem.

---

Do any of the normal languages/libraries let you pass an equality test, and adapt the hash function accordingly? I imagine you'd need to pass a function that brings the objects into a normal form.

---

I've installed R's hash library, but it seems to be pretty basic.

(no subject)

Date: 2009-11-18 10:53 pm (UTC)
From: [identity profile] pbrane.livejournal.com
Yeah, this kind of thing can really bite you in the ass if your notions of equals and hash value aren't compatible. Ugly sneaky bugs.

(no subject)

Date: 2009-11-19 01:23 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Do any of the normal languages/libraries let you pass an equality test, and adapt the hash function accordingly? I imagine you'd need to pass a function that brings the objects into a normal form.

(no subject)

Date: 2009-11-20 01:12 pm (UTC)
From: (Anonymous)
Ah yes. I remember when I was 16 and working with hash tables for the first time that the awesome performance gains I had weren't that awesome after all. And of course by then the code was live, in production, with the object tables being persisted, and now corrupted to hell and gone. It makes me a little nostalgic and wistful that all my hard won knowledge of pointer optimizations is irrelevant in the age of JAVA and managed code.

(no subject)

Date: 2009-11-20 08:26 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Thanks for the comment. Who is this?

(no subject)

Date: 2009-11-18 11:07 pm (UTC)
From: [identity profile] en-ki.livejournal.com
Because it's rare to want to check a hash table with a mutable value different from what you stored in the first place, so eq and equals are usually the same?

(no subject)

Date: 2009-11-19 01:59 am (UTC)
From: [identity profile] hober.livejournal.com
Common Lisp's make-hash-table optionally takes a :test keyword argument.

See also KMP's article.

(no subject)

Date: 2009-11-19 02:02 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
I feel like I knew that! I've definitely passed :test to a bunch of other functions.

Thanks!

(no subject)

Date: 2009-11-19 02:04 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
When you add a new pair, does it run the test against everything that's stored? I imagine that would destroy the efficiency of using hashes.

(no subject)

Date: 2009-11-19 02:36 am (UTC)
ikeepaleopard: (Default)
From: [personal profile] ikeepaleopard
Typically it'll test against some subset of the things that have the same key. That is, everything that hashes to the same key will go in a linked list or something, then it will look through the list to find the element that is actually equal.

(no subject)

Date: 2009-11-19 06:13 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
It sounds like I didn't state my question very well.

The problem is I'm not assuming that the hash function implements EQUAL.
And it sounds like we're using "key" in different ways: to me, "key" is the thing on the left in a key-value pair, i.e. you apply the hash function to the key.

Suppose I define EQUALity between trees so that:

[[a b] c] EQUAL [c [b a]]

I don't know how to make them hash to the same thing (other than by converting them to a normal form before hashing)... and what's the use of having a :test argument if EQUAL things hash to different things?

(no subject)

Date: 2009-11-19 06:45 am (UTC)
ikeepaleopard: (Default)
From: [personal profile] ikeepaleopard
Oh sorry, I was moving between two things in my head that are both referred to as keys. The input to the hash function, and the hashkey you get out which is the input to a map that outputs a bucket of key-value pairs whose keys all have the same hashkey.

The contract for hash functions typically requires a = b => hash(a) = hash(b). If you're using some complicated equivalence relation on keys then you still need to follow that contract. If that's difficult or slow, you need to restructure your code or figure out if you really wanted a hash table. There's no magic bullet.

(no subject)

Date: 2009-11-19 07:37 am (UTC)
From: [identity profile] bhudson.livejournal.com
There is the option, used, I gather, in some model-checking codes, to toss out the a and b and just keep hash(a) and hash(b). Then you get imperfect membership tests, but you can prove w.h.p. results about how likely it is that you inaccurately say "yes, it's there" when it's not -- you just need the number of possible hash values to be superlinear (and various other independence assumptions that you make when you use a hash table).

(no subject)

Date: 2009-11-19 08:21 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Both of these comments are tangential to the question of why you'd pass an equality test as an argument to the hash table creation, given that when you add a new element, you want to avoid comparing it to all the keys in the table (which is the whole point of using a hash table).

(no subject)

Date: 2009-11-19 04:48 pm (UTC)
From: [identity profile] wjl.livejournal.com
when you add a new element (k, v), you add (k, v) to the bucket table[hash(k)]. ("add" and not "assign", since more than one k may have the same hash value.) this shouldn't require any equality tests (unless you want to make sure to overwrite the old value for k, if any exists).

when you look up a key k, you iterate through the bucket table[hash(k)] looking for some key-value pair whose key is equal to k. if you find one, you return v. this is where the equality test comes in, since hash values may overlap -- table[hash(k)] may contain many (k', v') pairs for which k' is not equal to k.

in any case, the complexity guarantees hash tables give you depend upon the hash function being good enough that the buckets don't become too full. for instance, if everything just hashed to 1, everything would go in the same bucket, and hash table lookup would degenerate into linear search.

(no subject)

Date: 2009-11-19 04:50 pm (UTC)
From: [identity profile] wjl.livejournal.com
p.s. O'Caml's Hashtbl module comes with a functor Hashtbl.Make(...), to which you can pass any type along with an equality predicate on that type and a hash function on that type. using the functor basically amounts to patching in your own equality and hash functions.

February 2020

S M T W T F S
      1
2345678
9101112131415
16171819202122
23242526272829

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags