Last lecture today, on predictor evaluation, more predictors, and clustering.
No lab after all.
That is about it for the seminar.
MIRI Seminar on Data Streams, spring 2014 edition
Tuesday, May 27, 2014
Tuesday, May 20, 2014
Wednesday May 21st
We'll have a lecture on frequent pattern mining. Slides will be posted here right before class.
No lab today.
Tuesday, May 13, 2014
Monday, April 28, 2014
Next two weeks
No lab this Wednesday the 30th, sorry. There will be fewer exercises and more labs in the second half of the course, on data mining.
If you have time, start downloading MOA http://moa.cms.waikato.ac.nz/ and playing with it.
Also, I'm at a workshop next week (May 6th-9th), so no theory and no lab.
Ricard
Tuesday, April 22, 2014
Tuesday, April 8, 2014
Saturday, April 5, 2014
Glitches in lab 1, C++ source
I found a couple of glitches when doing the lab myself with the C++ routines I pointed to.
- On my system, although ints are 32 bits, the constants RAND_MAX and LONG_PRIME fit in 16 bits (so they are at most 2^15-1). This gives far too little randomness for testing on large sets of items. I've reposted distrib.h, which simulates (badly) a 32-bit random number generator. Also, if this happens on your system, you may want to replace the lines
hashes[i][0] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);
hashes[i][1] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);
in the genajbj method of count_min_sketch.cpp with, for example,
hashes[i][0] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);
hashes[i][0] *= RAND_MAX;
hashes[i][0] += int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);
hashes[i][1] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);
hashes[i][1] *= RAND_MAX;
hashes[i][1] += int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);
- Watch out: CountMinSketch::update takes an int, but CountMinSketch::estimate returns an unsigned int. Subtractions involving unsigned values can wrap around and give nonsensical results; use casts where appropriate.
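The unsigned pitfall in the last point can be sketched as follows. This is a minimal illustration, not code from the lab sources: signed_error and wrapped_error are hypothetical helpers, and their arguments merely stand in for a value returned by CountMinSketch::estimate and a true count you keep yourself.

```cpp
#include <cassert>

// Hypothetical helper (not part of count_min_sketch.cpp): subtracts a signed
// true count from an unsigned estimate in a wider signed type, so a negative
// difference stays negative instead of wrapping around.
inline long long signed_error(unsigned int estimate, int true_count) {
    return static_cast<long long>(estimate) - static_cast<long long>(true_count);
}

// The problematic form, for comparison: the int operand is converted to
// unsigned before the subtraction, so a "negative" result wraps around
// to a very large positive value.
inline unsigned int wrapped_error(unsigned int estimate, int true_count) {
    return estimate - true_count;
}
```

For example, with an estimate of 3 and a true count of 5, signed_error gives -2, whereas wrapped_error gives a huge positive number. Casting both operands before subtracting is one possible fix; comparing the two values first and subtracting the smaller from the larger would work as well.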
Hopefully these things don't show up in other languages.
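As a variation on the first workaround above, two small rand() draws can be glued together with a shift instead of a multiplication by RAND_MAX. This is only a sketch under assumptions: combine15 and rand30 are hypothetical names, not functions from distrib.h or count_min_sketch.cpp, and the code relies only on the C standard's guarantee that RAND_MAX is at least 32767.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: glue two 15-bit pieces into one 30-bit value.
// Both pieces are assumed to lie in [0, 2^15 - 1].
inline std::uint32_t combine15(std::uint32_t high, std::uint32_t low) {
    assert(high < (1u << 15) && low < (1u << 15));
    return (high << 15) | low;
}

// Hypothetical helper: draw a value in [0, 2^30 - 1] from two rand() calls,
// keeping only the low 15 bits of each call so that it behaves the same way
// whether RAND_MAX is 2^15 - 1 or larger.
inline std::uint32_t rand30() {
    std::uint32_t high = static_cast<std::uint32_t>(std::rand()) & 0x7FFFu;
    std::uint32_t low  = static_cast<std::uint32_t>(std::rand()) & 0x7FFFu;
    return combine15(high, low);
}
```

The low-order bits of rand() are of poor quality in some implementations, so this is still a rough substitute. If your compiler supports C++11, std::mt19937 from <random> is probably a better source of 32-bit values.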