Paolo Frasconi

B010476: Apprendimento Automatico (2009/2010)

B010476: Machine Learning (2009/2010)

 

Laurea Magistrale in Ingegneria Informatica, Università degli Studi di Firenze

M.Sc. Computer Engineering, University of Florence


    Comunicazioni:
  • 01.02.10. Sarò in congedo per motivi di studio presso la K.U. Leuven fino al 31.08.10. Il ricevimento studenti è sospeso fino a tale data ma potete contattarmi per posta elettronica.
  • 10.12.09. 11 Dicembre, ultima lezione (inizio alle 14:45).
  • 19.11.09. Il 27 Novembre non ci sarà lezione per consentire agli interessati di partecipare al workshop su "Computer Graphics", aula 001 Morgagni.
  • 06.11.09. Il 12 Novembre alle 16:00 avremo un seminario del Dr. Jesse Davis (U. Washington), aula seminari di Energetica, al posto della lezione.
  • 02.11.09. Il 5 Novembre non ci sarà lezione.
  • Il 22 Ottobre non ci sarà lezione
  • Il corso inizia il 30 Settembre 2009
    News:
  • 01.02.10. I am on sabbatical leave at K.U. Leuven until 31.08.10. No office hours until then. However you can email me.
  • 10.12.09. Last class on December 11th (will begin at 2:45pm).
  • 19.11.09. No class on November 27th, so interested people can attend the "Computer Graphics Workshop", room 001 Morgagni.
  • 06.11.09. On November 12th 4pm there will be a un talk given by Dr. Jesse Davis (U. Washington), aula seminari di Energetica, instead of class.
  • 02.11.09. No class on November 5th.
  • No class on October 22nd.
  • Classes will begin on September 30th, 2009

Docente

Lecturer

Orario delle lezioni

Class schedule

Mercoledi 17:00-19:00 Aula 207 (S. Marta)
Giovedi 16:00-18:00 Aula 207 (S. Marta)
Venerdi 14:00-17:00 Aula 204 (S. Marta)
Wednesday 5:00pm-7:00pm Room 207 (S. Marta)
Thursday 4:00pm-6:00pm Room 207 (S. Marta)
Friday 2:00pm-5:00pm Room 204 (S. Marta)

Obiettivi

Goals

Crediti

Credits

Prerequisiti

Prerequisites

Esame

Exams

Libri suggeriti

Suggested books

Programma sintetico

Topics

Calendario e materiale didattico

Calendar and handouts

DataArgomentiLetture e materiale*
30/09/2009 Aspetti organizzativi del corso. Introduzione: l'apprendimento automatico per risolvere problemi difficili in IA. HTB09 1.
1/10/2009 Apprendimento con supervisione. Tipologie di ingressi e uscite. Esempi di applicazioni. Compiti: regressione, classificazione, classificazione multi-classe, ranking, uscite strutturate. Schema astratto di algoritmo di apprendimento supervisionato. Assunzioni sui dati. HTB09 2. Appunti della lezione
2/10/2009 Regressione lineare ordinaria. Stima ai minimi quadrati. Interpretazione geometrica. Analisi statistica. HTB09 3.2.
7/10/2009 Teorema di Gauss-Markov. Decomposizione bias-varianza. Selezione di un sottoinsieme di predittori: best-subset, forward selection, backward selection. Discussione. HTB09 3.2.2, 3.3, Dimostrazione del teorema di Gauss-Markov
8/10/2009 Esercitazione: OLR in Matlab, espansione in funzioni base polinomiali e RBF, regolarizzazione con Ridge regression. HTB09 3.4.1, Codice Matlab
9/10/2009 Sparsità e selezione di sottoinsiemi di predittori. Cenni su compressive sensing. Lasso. Forward stagewise regression. LARS. HTB09 3.4.
CVX software for Matlab
Mini-survey on compressive sensing: Baraniuk (2007)
LARS paper: Efron et al. (2004)
LARS software: R Matlab
14/10/2009 Introduzione alla classificazione: approccio generativo e discriminante. Il principio di massima verosimiglianza. Esempi. Proprietà degli stimatori a massima verosimiglianza: consistenza, equivarianza, gaussianità asintotica, ottimalità Wasserman 2004, cap. 9 Massima verosimiglianza
15/10/2009 Modelli generativi per la classificazione. Analisi discriminante lineare (Gaussiana). HTB09 4.3.
16/10/2009 Errore di Bayes. Esempi. Regressione logistica. Ottimizzazione con Newton-Raphson. Modelli lineari generalizzati (esempio: regressione di Poisson). HTB09 4.4. M. Jordan. Why the logistic function? - UAI 1995
21/10/2009 Regressione softmax (regressione logistica multiclasse). Introduzione a Naive Bayes. HTB09 4.4, 6.6.3. T. Mitchell: "Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression"
23/10/2009 Naive Bayes 2. Applicazione a data set generali con attributi categorici e numerici. Modelli di eventi alternativi per la categorizzazione del testo. Decomposizione dell'errore di generalizazione. Learning curves. Naive Bayes vs. Logistic Regression. A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification - AAAI 1998
U. Fayyad and K. Irani: Multi-interval discretization of continuous-valued attributes for classification learning - IJCAI 1993
A. Ng and M. Jordan. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes - NIPS 2002
28/10/2009 Iperpiano a massimo margine. Teoria di Karush Kuhn Tucker per risolvere problemi di ottimizzazione convessi in forma duale. Struttura della soluzione. HTB09 4.5. Schoelkopf and Smola (2002) Learning with Kernels Ch. 6, 7.3
29/10/2009 Macchine a vettori di supporto (C-SVC, nu-SVC). Analisi dei vettori di supporto, scelta di C. HTB09 12.1, 12.2.
30/10/2009 Introduzione informale ai kernels. Panoramica sui risolutori per SVM. SVM come minimizzazione del rischio empirico regolarizzato. Hinge vs. logit loss. HTB09 12.3.1, 12.3.2.
L. Bottou and C.J. Lin (2006). Support Vector Machine Solvers.
L. Bottou Trade-offs of Large-Scale Learning - NIPS 2008
LibSVM software
4/11/2009 Regressione a vettori di supporto. Kernels validi, feature map, teorema di Mercer. Feature map di Mercer. Introduzione alla feature map RKHS. HTB09 12.3.6, 5.8.
Schoelkopf and Smola (2002) Learning with Kernels Ch. 2
Cucker and Smale (2001): On the Mathematical Foundations of Learning
6/11/2009 Spazi di Hilbert con kernel rirpoducente. Kernel trick, trasformazione di algoritmi esistenti via kernel: (voted) perceptron, ridge regression, (dimostrazione del representer theorem per la loss quadratica). Esempi di progetto di kernels: istogrammi per immagini, Kernels su insiemi. Multi-instance learning. HTB09 5.8, 12.3.7
Y. Freund and R.E. Schapire: Large Margin Classification Using the Perceptron Algorithm
K. Grauman and T. Derrel: The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features - ICCV 2005
11/11/2009 Kernels convoluzionali. Kernels per dati strutturati: k-grams per la classificazione di proteine, kernels per alberi sintattici, kernels di Tanimoto kernels per la classificazione di piccole molecole. D. Haussler (1999): Convolution Kernels on Discrete Structures - MLJ 1999
C. Leslie et al.: The spectrum kernel: A string kernel for SVM protein classification - PSB 2002
P. Kuksa et al.: Scalable Algorithms for String Kernels with Inexact Matching - NIPS 2008
M. Collins and N. Duffy Convolution Kernels for Natural Language - NIPS 2001
L. Ralaivola et al.: Graph kernels for chemical informatics - NN 2005
12/11/2009 Jesse Davis (U. Washington). Inductive Learning: Beyond the Standard Setting J. Davis and P. Domingos. Deep Transfer via Second-Order Markov Logic ICML 2009
13/11/2009 Introduzione ai modelli grafici probabilistici. Probabilità e probabilità soggettive (credenze). Independenza condizionale e modelli di dipendenza. Markov networks: semantica e decomposizione della distribuzione congiunta (Teorema di Hammersley-Clifford). Dettagli in KF09 Cap 1-4
D. Koller et al. (2007) Graphical models in a Nutshell in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
18/11/2009 Bayesian networks. Introduzione all'inferenza (eliminazione di variabili, belief propagation). A. Darwiche Bayesian Networks Communications of the ACM, to appear.
19/11/2009 Idee principali nell'algoritmo del junction tree.
Apprendimento di parametri in Bayesian networks: Stima a massima verosimiglianza con dati completi.
D. Koller et al. (2007)Graphical models in a Nutshell in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
20/11/2009 Apprendimento con dati mancanti. Expectation-Maximization. Modelli basati su misture (misture di Gaussiane). K-means. HTB09 8.5, 13.2.1, 13.2.3, 14.3.6, 14.3.7. T. Minka Expectation-Maximization as lower bound maximization Tech. Report. 1999.
25/11/2009 Plates, dipendenza tra parametri in Bayesian networks, idee fondamentali sulle reti probabilistiche relazionali.
Apprendimento con ascesa del gradiente e EM per Bayesian networks con variabili non osservate.
K. Nigam et al. (2006)Semi-Supervised Text Classification Using EM in O Chapelle et al.: Semi-supervised learning
L. Getoor et al. (2007)Probabilistic Relational Models in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
26/11/2009 Apprendimento bayesiano di Bayesian networks. Stima massima a posteriori e regolarizzazione. Distribuzioni Beta e Dirichlet come priors. D. HeckermanA tutorial on learning with Bayesian networks.Technical Report MSR-TR-95-06, Microsoft Research, 1996.
02/12/2009 Apprendimento relazionale. Dati sequenziali. Hidden Markov Models. Procedura forward backward di Baum-Welch. Algoritmo di Viterbi per inferenza MAP. L.R. Rabiner A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE 77(2):257-286, 1989.
03/12/2009 EM per l'apprendimento negli HMMs. HMMs come modelli log-lineari. Etichettamento discriminativo di sequenze: Conditional random fields. C. Sutton and A. McCallum (2007) An Introduction to Conditional Random Fields for Relational Learning. in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
04/12/2009 Seminario (M. Lippi). Markov logic networks e applicazioni. P. Domingos et al. (2008) Markov Logic in L. De Raedt et al. Probabilistic Inductive Logic Programming.
M. Lippi and P. Frasconi (2009) Prediction of protein beta-residue contacts by Markov logic networks with grounding-specific weights Bioinformatics 25(18):2326-2333. Alchemy (Markov Logic software)
09/12/2009 Seminario (M. Lippi). Programmazione logica induttiva (FOIL, Tilde). R. Quinlan and R.M. Cameron-Jones (1993) FOIL: A midterm report, ECML '93.
H. Blockeel and L. De Raedt (1998)
Top-down induction of first-order logical decision trees. Artificial Intelligence 101(1-2):285--297.
10/12/2009 Topic modeling. Latent Dirichlet Allocation. D. Blei et al. (2003) Latent Dirichlet Allocation Journal of Machine Learning Research 3, 993--1022
T.L Griffiths and M. Steyvers (2004) Finding scientific topics Proceedings of the National Academy of Sciences 101(Supp. 1):5228
Mallet (LDA and CFR Software)
11/12/2009 Ensemble methods, bagging, boosting, random forests. HTB09 8.7, 10, 15.
Y. Freund and R.E. Schapire (1999) A Short Introduction to Boosting Journal of Japanese Society for Artificial Intelligence 14(5):771-780.
L. Breiman (2001) Random ForestsMachine Learning 45.
Random forest software
*parte del materiale è protetto da password di accesso. La password viene comunicata esclusivamente agli studenti a lezione.
DateTopicsReadings and handouts*
30/09/2009 Administrivia. Introduction. Solving difficult AI problems: the machine learning approach. HTB09 1.
1/10/2009 Supervised learning. Types of inputs and outputs. Application examples. Tasks: regression, classification, multi-class classification, ranking, structured-output learning. Abstract view of a supervised learning algorithm. Assumptions on data. HTB09 2. Class notes
2/10/2009 Ordinary Linear Regression. Least squares estimates. Geometric interpretation. Statistical analysis. HTB09 3.2.
7/10/2009 Gauss-Markov Theorem. Bias-variance decomposition. Subset selection: best subset, forward and backward stepwise. Discussion. HTB09 3.2.2, 3.3, Proof of Gauss-Markov Theorem
8/10/2009 Practice: OLR in Matlab (toy examples), basis functions (polynomial, RBF), regularization with Ridge regression. HTB09 3.4.1, Matlab code
9/10/2009 Sparsity and subset selection. Basic ideas of compressive sensing. Lasso. Forward stagewise regression and LARS. HTB09 3.4.
CVX software for Matlab
Mini-survey on compressive sensing: Baraniuk (2007)
LARS paper: Efron et al. (2004)
LARS software: R Matlab
14/10/2009 Introduction to classification: generative and discriminative approaches. The maximum likelihood principle. Examples. Properties of maximum likelihood estimators: consistency, equivariance, asyntotic normality, optimality. Wasserman 2004, ch. 9 Maximum likelihood
15/10/2009 Generative models for classification. Linear (Gaussian) discriminant analysis. HTB09 4.3.
16/10/2009 Bayes error rate. Examples. Logistic regression. Newton-Raphson optimization. Generalized linear models (example: Poisson regression). HTB09 4.4. M. Jordan. Why the logistic function? - UAI 1995
21/10/2009 Softmax regression (multiclass logistic regression). Introduction to Naive Bayes. HTB09 4.4, 6.6.3. T. Mitchell: "Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression"
23/10/2009 Naive Bayes 2. Application to general data sets with categorical and numerical attributes. Alternative event models for text categorization. Decomposition of the generalization error. Learning curves. Naive Bayes vs. Logistic Regression. A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification - AAAI 1998
U. Fayyad and K. Irani: Multi-interval discretization of continuous-valued attributes for classification learning - IJCAI 1993
A. Ng and M. Jordan. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes - NIPS 2002
28/10/2009 Maximal margin hyperplane. Karush Kuhn Tucker theory for solving convex optimization problems in dual form. Structure of the MMH solution. HTB09 4.5. Schoelkopf and Smola (2002) Learning with Kernels Ch. 6, 7.3
29/10/2009 Support vector machines (C-SVC, nu-SVC). Analysis of support vectors. Choosing C. HTB09 12.1, 12.2.
30/10/2009 Informal introduction to kernels. Overview of SVM solvers. SVM as regularized empirical loss minimization. Hinge vs. logit loss. HTB09 12.3.1, 12.3.2.
L. Bottou and C.J. Lin (2006). Support Vector Machine Solvers.
L. Bottou Trade-offs of Large-Scale Learning - NIPS 2008
LibSVM software
4/11/2009 Support vector regression. Valid kernels, feature map, and Mercer theorem. Mercer feature map. Introduction to the RKHS feature map. HTB09 12.3.6, 5.8.
Schoelkopf and Smola (2002) Learning with Kernels Ch. 2
Cucker and Smale (2001): On the Mathematical Foundations of Learning
6/11/2009 Reproducing kernel Hilbert spaces. Kernel trick and kernelizing existing algorithms: (voted) perceptron, ridge regression, (proof of representer theorem for square loss). Examples of kernel design: image histograms. Kernels on sets. Multi-instance learning. HTB09 5.8, 12.3.7
Y. Freund and R.E. Schapire: Large Margin Classification Using the Perceptron Algorithm
K. Grauman and T. Derrel: The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features - ICCV 2005
11/11/2009 Convolution kernels. Kernels for structured instances: k-grams for protein classification, subtree kernels for parse trees, Tanimoto kernels for small molecule classification. D. Haussler (1999): Convolution Kernels on Discrete Structures - MLJ 1999
C. Leslie et al.: The spectrum kernel: A string kernel for SVM protein classification - PSB 2002
P. Kuksa et al.: Scalable Algorithms for String Kernels with Inexact Matching - NIPS 2008
M. Collins and N. Duffy Convolution Kernels for Natural Language - NIPS 2001
L. Ralaivola et al.: Graph kernels for chemical informatics - NN 2005
12/11/2009 Jesse Davis (U. Washington). Inductive Learning: Beyond the Standard Setting J. Davis and P. Domingos. Deep Transfer via Second-Order Markov Logic ICML 2009
13/11/2009 Introduction to probabilistic graphical models. Review of probabilities and subjective probability (beliefs). Conditional independence and dependency models. Markov networks: semantics and decomposition of the joint distribution (Hammersley-Clifford theorem). Details in KF09 Ch 1-4
D. Koller et al. (2007)Graphical models in a Nutshell in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
18/11/2009 Bayesian networks. Introduction to inference (variable elimination, belief propagation). A. Darwiche Bayesian Networks Communications of the ACM, to appear.
19/11/2009 Sketch of the junction tree algorithm.
Parameter learning in Bayesian networks: Maximum likelihood estimation with complete data.
D. Koller et al. (2007)Graphical models in a Nutshell in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
20/11/2009 Learning with missing data. Expectation-Maximization. Mixture models (mixtures of Gaussians). K-means. HTB09 8.5, 13.2.1, 13.2.3, 14.3.6, 14.3.7. T. Minka Expectation-Maximization as lower bound maximization Tech. Report. 1999.
25/11/2009 Plates, parameter dependences in Bayesian networks, core ideas of probabilistic relational networks.
Gradient ascent and EM-based learning in Bayesian networks with hidden variables.
K. Nigam et al. (2006)Semi-Supervised Text Classification Using EM in O Chapelle et al.: Semi-supervised learning
L. Getoor et al. (2007)Probabilistic Relational Models in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
26/11/2009 Bayesian learning of Bayesian networks. Maximum a posteriori and regularization. Beta and Dirichlet distributions as priors. D. HeckermanA tutorial on learning with Bayesian networks.Technical Report MSR-TR-95-06, Microsoft Research, 1996.
02/12/2009 Relational learning. Sequential data. Hidden Markov Models. Baum-Welch's forward backward procedure, Viterbi's algorithm for MAP inference. L.R. Rabiner A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE 77(2):257-286, 1989.
03/12/2009 EM learning for HMMs. HMMs as log-linear models. Discriminative sequence labeling: Conditional random fields. C. Sutton and A. McCallum (2007) An Introduction to Conditional Random Fields for Relational Learning. in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
04/12/2009 Guest lecture (M. Lippi). Markov logic networks and applications. P. Domingos et al. (2008) Markov Logic in L. De Raedt et al. Probabilistic Inductive Logic Programming.
M. Lippi and P. Frasconi (2009) Prediction of protein beta-residue contacts by Markov logic networks with grounding-specific weights Bioinformatics 25(18):2326-2333.
Alchemy (Markov Logic software)
09/12/2009 Guest lecture (M. Lippi). Inductive logic programming (FOIL, Tilde). R. Quinlan and R.M. Cameron-Jones (1993) FOIL: A midterm report, ECML '93.
H. Blockeel and L. De Raedt (1998)
Top-down induction of first-order logical decision trees. Artificial Intelligence 101(1-2):285--297.
10/12/2009 Topic modeling. Latent Dirichlet Allocation. D. Blei et al. (2003) Latent Dirichlet Allocation Journal of Machine Learning Research 3, 993--1022
T.L Griffiths and M. Steyvers (2004) Finding scientific topics Proceedings of the National Academy of Sciences 101(Supp. 1):5228
Mallet (LDA and CFR Software)
11/12/2009 Ensemble methods, bagging, boosting, random forests. HTB09 8.7, 10, 15.
Y. Freund and R.E. Schapire (1999) A Short Introduction to Boosting Journal of Japanese Society for Artificial Intelligence 14(5):771-780.
L. Breiman (2001) Random ForestsMachine Learning 45.
Random forest software
*some handouts are password protected. The password is only revealed to students in class.