MIRI Seminar on Data Streams, spring 2014 edition: Welcome to the Spring 2014 (and first) edition of the MIRI Seminar on Data Streams

Logistics

Instructor: Ricard Gavaldà. Mail: gavalda AT lsi then the UPC domain.
Start date: Wednesday March 19th, 2014
End date: Roughly end of May, 2014
3 ECTS credits, to be enrolled in and claimed through MIRI at the end of the course.
Time and places:

Theory and problems: every Wednesday, 10:00 - 12:00, room A4104
Labs: approximately every other Wednesday, 14:00 - 16:00, room C6S303
There will be no lab on Wednesday, March 19th
There will be some gaps due to travel or other commitments. They will be announced in advance.
I will post all the materials here, including pretty detailed descriptions of the lab assignments. It should be possible, though harder, to follow the seminar on your own.

Intended audience & prerequisites:

Students enrolled in MIRI, particularly in the Advanced Computing and Data Mining & Business Intelligence specialities.
But everybody is welcome.
Some familiarity with probabilistic reasoning and algorithmics is assumed. Some familiarity with machine learning / data mining is helpful, but probably not essential. Some programming may be necessary. I will try to give as much freedom as possible in the choice of programming language, but MOA is mostly Java.

Evaluation: based on a few exercise sets and a few deliverable lab assignments. I promise the workload will not be too heavy. I am open to discussing alternative evaluation methods.
Website: http://www.lsi.upc.edu/~gavalda/DataStreamSeminar

Overview

Streaming is one of the central aspects of the "Big Data" slogan. At planetary scale, we are today generating more data than we can store, and most of it will never be seen by human eyes. It often takes the form of sequences or streams of data items, arriving at high speed, potentially infinite, and evolving over time. The data stream paradigm contrasts with the usual input-compute-output algorithmic paradigm in this sequential nature, and also in the strong computational requirements it imposes: one pass over the data, small memory, small computation time per data item, and the ability to give answers in real time and at any time.
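
To make these requirements concrete, here is a minimal sketch of a one-pass computation: a running mean and variance maintained with Welford's update rule. It is only an illustration and is not taken from the seminar material; the class name RunningStats and the sample values are made up for this example. Each item is seen once, memory use is constant, the work per item is constant, and an answer is available after every item.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass mean and variance (Welford's update), constant memory.

    Hypothetical helper written for illustration only.
    """
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Constant work per item; the item itself is not stored.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Sample variance; returns 0.0 when fewer than two items were seen.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def demo():
    """Feed a small made-up stream and return (count, mean, variance)."""
    stats = RunningStats()
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        stats.update(x)  # stats.mean and stats.variance() are valid here too
    return stats.count, stats.mean, stats.variance()
```

Calling stats.mean or stats.variance() inside the loop gives the "anytime" answer for the prefix of the stream seen so far.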

In this seminar we will:

Describe some of the scenarios where this paradigm is necessary (sensor networks, smart cities, social media, network monitoring, ...).
Study and experiment with algorithms for computing basic and not so basic queries over data streams.
Study and experiment with algorithms for mining knowledge from data streams (predictive models, clustering, pattern mining), as an extension of traditional data mining and machine learning.

The seminar will consist of 1) theory/problem sessions, where the instructor will present the main ideas and where we will go over the solutions to problem sets distributed in previous sessions, and 2) lab sessions, where we will experiment with or implement some of the methods described; for Part II (data stream mining) the MOA stream mining framework will be used.

Very preliminary syllabus

Part I: Algorithms for Data Streams

The Data Stream Model: Scenarios and definition
Computing statistics on streams
Matrix and graph sketches
Detecting change on streams

Part II: Data Stream Mining

Predictive models
Clustering
Mining frequent patterns
Evaluation

Course material

See the pages with Slides, Bibliography and Links, and Lab material to the right.

Thursday, March 13, 2014
