I found a couple of glitches while doing the lab myself with the C++ routines I pointed to.

- On my system, although ints are 32 bits, the constants RAND_MAX and LONG_PRIME are 16-bit values (so at most 2^15-1). This gives far too little randomness for checking large sets of items. I've reposted distrib.h, which now simulates (badly) a generator of 32 random bits. If this also happens on your system, you may want to replace the lines

hashes[i][0] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

hashes[i][1] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

in the genajbj method of count_min_sketch.cpp with, for example,

hashes[i][0] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

hashes[i][0] *= RAND_MAX;

hashes[i][0] += int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

hashes[i][1] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

hashes[i][1] *= RAND_MAX;

hashes[i][1] += int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

- Watch out: CountMinSketch::update takes an int, but CountMinSketch::estimate returns an unsigned int. Subtracting unsigned values wraps around instead of going negative, which can give nonsensical results, so cast to a signed type where needed.
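
To illustrate the second glitch, here is a minimal sketch of a signed difference between two unsigned estimates. The helper name signed_diff is hypothetical (it is not part of count_min_sketch.cpp), and it assumes that long long is wider than unsigned int, which is the case on common platforms.

```cpp
// Hypothetical helper, not part of count_min_sketch.cpp.
// Computes a - b as a signed value. Writing a - b directly on two
// unsigned ints wraps around modulo 2^N whenever b > a.
// Assumes long long is wider than unsigned int (true on common platforms).
long long signed_diff(unsigned int a, unsigned int b) {
    // Widen both operands to a signed type BEFORE subtracting.
    return static_cast<long long>(a) - static_cast<long long>(b);
}
```

For example, with estimates 3 and 5, the expression 3u - 5u is a huge positive number, while signed_diff(3u, 5u) is -2. The same idea applies when comparing the result of estimate against an exact count kept in an int.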

Hopefully these things don't show up in other languages.
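
For the first glitch, one possible alternative to patching each line is to build a wider random value once and reuse it. The sketch below is not what distrib.h does. The names rand32 and random_coefficient are hypothetical, and it only relies on the guarantee that RAND_MAX is at least 32767, so that each rand() call provides at least 15 bits. The low bits of some rand() implementations are of poor quality, and the modulo step introduces a slight bias, so take it as an illustration only.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not from distrib.h or count_min_sketch.cpp).
// Builds a 32-bit value from three rand() calls, keeping only the low
// 15 bits of each call, which exist on every platform (RAND_MAX >= 32767).
std::uint32_t rand32() {
    std::uint32_t hi  = static_cast<std::uint32_t>(std::rand()) & 0x7FFFu; // 15 bits
    std::uint32_t mid = static_cast<std::uint32_t>(std::rand()) & 0x7FFFu; // 15 bits
    std::uint32_t lo  = static_cast<std::uint32_t>(std::rand()) & 0x3u;    //  2 bits
    return (hi << 17) | (mid << 2) | lo;                                   // 32 bits total
}

// Hypothetical helper: a hash coefficient in [1, prime - 1].
// Assumes prime > 1; the modulo makes the distribution slightly non-uniform.
std::uint32_t random_coefficient(std::uint32_t prime) {
    return 1u + rand32() % (prime - 1u);
}
```

With a prime larger than 2^15, random_coefficient can return values that a single rand() call on a 15-bit system never reaches.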
