Paolo Frasconi

B010476: Apprendimento Automatico (2010/2011)

B010476: Machine Learning (2010/2011)

 

Laurea Magistrale in Ingegneria Informatica, Università degli Studi di Firenze

M.Sc. Computer Engineering, University of Florence


    Anni precedenti:
    Previous years:
    Comunicazioni:
  • Le lezioni riprendono il 12 Gennaio. Il 23 Dicembre non c'č ricevimento. Buone feste a tutti!
  • Apprendimento automatico, secondo posto in Fastest Rising Translations of Google Zeitgeist 2010
  • Il Rettore ha convocato per il 5 Novembre un' assemblea di Ateneo. L'attivitā didattica č sospesa dalle 10:00 alle 14:00
  • Il 20 Ottobre non ci sarā lezione
  • Il corso inizia il 13 Ottobre 2010
  • Il corso inizia il 29 Settembre 2010
    News:
  • Classes start again on Jan. 10. No office hours on Dec 23. Best wishes for the Holiday Season!
  • Apprendimento automatico, 2nd place in Fastest Rising Translations of Google Zeitgeist 2010
  • The Chancelor has convened on November 5th a university assembly. Teaching activity is suspended from 10:00 to 14:00
  • No class on October 20
  • Classes will begin on October 13, 2010
  • Classes will begin on September 29, 2010

Docente

Lecturer

Orario delle lezioni

Class schedule

Mercoledi 16:00-18:00 Aula 205 (S. Marta)
Giovedi 17:00-19:00 Aula 205 (S. Marta)
Venerdi 10:15-13:15 Aula 204 (S. Marta)
Wednesday 4:00pm-6:00pm Room 205 (S. Marta)
Thursday 5:00pm-7:00pm Room 205 (S. Marta)
Friday 10:15am-1:15pm Room 204 (S. Marta)

Obiettivi

Goals

Crediti

Credits

Prerequisiti

Prerequisites

Laboratorio virtuale

Virtual lab

Esame

Exams

Libri suggeriti

Suggested books

Programma sintetico

Topics

Calendario e materiale didattico

Calendar and handouts

DataArgomentiLetture e materiale*
13/10/2010 Aspetti organizzativi del corso. Introduzione: l'apprendimento automatico per risolvere problemi difficili in IA. HTB09 1.
14/10/2010 Apprendimento con supervisione. Tipologie di ingressi e uscite. Esempi di applicazioni. Compiti: regressione, classificazione, classificazione multi-classe, ranking, uscite strutturate. Schema astratto di algoritmo di apprendimento supervisionato. Assunzioni sui dati. HTB09 2. Appunti della lezione
15/10/2010 Regressione lineare ordinaria. Stima ai minimi quadrati. Interpretazione geometrica. Analisi statistica. Teorema di Gauss-Markov. Decomposizione bias-varianza. HTB09 3.2 Dimostrazione del teorema di Gauss-Markov
21/10/2010 Selezione di un sottoinsieme di predittori: best-subset, forward selection, backward selection. Regressione ridge e lasso. Discussione. HTB09 3.3, 3.4
22/10/2010 Il principio di massima verosimiglianza. Esempi. Proprietà degli stimatori a massima verosimiglianza: consistenza, equivarianza, gaussianità asintotica, ottimalità. Minimi quadrati per la regressione come caso particolare di massima verosimiglianza. Wasserman 2004, cap. 9 Massima verosimiglianza
27/10/2010 Modelli generativi per la classificazione. Analisi discriminante lineare (Gaussiana). Errore di Bayes. Esempi. HTB09 4.3. DHS 5.8.2
28/10/2010 Regressione logistica. Ottimizzazione con Newton-Raphson. Modelli lineari generalizzati (esempio: regressione di Poisson). HTB09 4.4. M. Jordan. Why the logistic function? - UAI 1995
29/10/2010 Regressione softmax (regressione logistica multiclasse). Introduzione a Naive Bayes. HTB09 4.4, 6.6.3. T. Mitchell: Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression
3/11/2010 Naive Bayes. Laplace smoothing. Caso di attributi categorici. Discretizzazione di attributi continui. Modello multinomiale per la categorizzazione del testo. A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification - AAAI 1998
U. Fayyad and K. Irani: Multi-interval discretization of continuous-valued attributes for classification learning - IJCAI 1993
4/11/2010 Decomposizione dell'errore di generalizzazione. Learning curves. Naive Bayes vs. Logistic Regression. Introduzione a SVM: Iperpiano a massimo margine. A. Ng and M. Jordan. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes - NIPS 2002
10/11/2010 Iperpiano a massimo margine (MMH). Teoria di Karush Kuhn Tucker per problemi di ottimizzazione convessi. Struttura della soluzione di MMH. HTB09 4.5. Schoelkopf and Smola (2002) Learning with Kernels Ch. 6, 7.3
11/11/2010 Macchine a vettori di supporto (C-SVC, nu-SVC). Analisi dei vettori di supporto. Risolutori per SVM. SMO. SVM come minimizzazione di una loss empirica con regolarizzazione. Hinge vs. logit (regressione logistica) loss. HTB09 12.1, 12.2. L. Bottou and C.J. Lin (2006). Support Vector Machine Solvers.
L. Bottou Trade-offs of Large-Scale Learning - NIPS 2008
LibSVM software
12/11/2010 Regressione a vettori di supporto. Introduzione informale ai kernels: Kernel polinomiale e Gaussiano. Perceptron e Kernel Perceptron. Voted-perceptron. HTB09 12.3.1, 12.3.2. Cristianini and Shawe-Taylor, Ch. 2
Y. Freund and R.E. Schapire: Large Margin Classification Using the Perceptron Algorithm
17/11/2010 Teorema di Mercer. Spazi di Hilbert a nucleo riproducente. Teorema di rappresentazione. Kernel ridge regression. HTB09 5.8, 12.3
T. Poggio and S. Smale The Mathematics of Learning: Dealing with Data (quite readable)
F. Cucker and S. Smale On the mathematical foundations of learning (very detailed but somewhat difficult paper)
18/11/2010 Progetto di kernels. Proprietā di chiusura. Min-kernel. Kernels su istogrammi. Pyramidal match kernel. Kernels convoluzionali. K. Grauman and T. Derrel: The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features - ICCV 2005
D. Haussler (1999): Convolution Kernels on Discrete Structures - MLJ 1999
19/11/2010 Kernels per dati strutturati: k-grams per classificazione di proteine, kernels per alberi sintattici. Feature hashing. C. Leslie et al.: The spectrum kernel: A string kernel for SVM protein classification - PSB 2002
P. Kuksa et al.: Scalable Algorithms for String Kernels with Inexact Matching - NIPS 2008
M. Collins and N. Duffy Convolution Kernels for Natural Language - NIPS 2001
K. Weinberger et al. Feature Hashing for Large Scale Multitask Learning ICML 2009.
24/11/2010 Panoramica sui kernels per grafi. Teoria statistica dell'apprendimento. Bounds per la generalizzazione. Caso di spazi di ipotesi a dimensione finita. VC-dimension e bound di Vapnick. L. Ralaivola et al. (2005) Graph kernels for chemical informatics
Bousquet et al. (2004)Introduction to Statistical Learning Theory
25/11/2010 Introduzione ai modelli grafici probabilistici. Probabilitā soggettive (credenze). Indipendenza condizionale e modelli di indipendenza. Mappe di indipendenza non orientate (Markov networks). Details in KF09 Ch 1-4
D. Koller et al. (2007)Graphical models in a Nutshell in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
26/11/2010 Reti di Markov: decomposizione della distribuzione congiunta (Teorema di Hammersley-Clifford). Reti di Bayes. D-separazione. Parametrizzazione. Introduzione all'inferenza (eliminazione di variabili). A. Darwiche Bayesian Networks Communications of the ACM.
1/12/2010 Inferenza con l'algoritmo del junction tree.
D. Koller et al. (2007)Graphical models in a Nutshell in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
02/12/2010 Software per reti Bayesiane.
Apprendimento dei parmetri nelle reti Bayesiane: Stima a massima verosimiglianza con dati completamente osservati. Apprendimento Bayesiano, priors Beta e di Dirichlet.
Bayesian networks software: Hugin, Genie.
D. Heckerman (1997) Bayesian Networks for Data Mining Data Mining and Knowledge Discovery.
03/12/2010 Apprendimento con dati mancanti. Expectation-Maximization (EM). Modelli a mistura e misture di gaussiane (GMM). K-means. HTB09 8.5, 13.2.1, 13.2.3, 14.3.6, 14.3.7. T. Minka Expectation-Maximization as lower bound maximization Tech. Report. 1999.
9/12/2010 EM per l'apprendimento nelle reti Bayesiane. Plates, dipendenze tra i parametri nelle reti Bayesiane. Idee base su reti probabilistiche relazionali.
Apprendimento semi-supervisionato. Auto-addestramento, uso di modelli a mistura, Naive Bayes semi-supervisionato.
K. Nigam et al. (2006) Semi-Supervised Text Classification Using EM in O Chapelle et al.: Semi-supervised learning
L. Getoor et al. (2007) Probabilistic Relational Models in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
10/12/2010 Co-training. Semi-supervised SVM. Regolarizzazione entropica per regressione logistica.
Introduzione agli Hidden Markov Models
Zhu and Goldberg Introduction to Semi-Supervised Learning 2009.
L.R. Rabiner A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE 77(2):257-286, 1989.
15/12/2010 Hidden Markov Models (Inferenza e procedura forward-backward di Baum-Welch) L.R. Rabiner A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE 77(2):257-286, 1989.
16/12/2010 Algoritmo di Viterbi per HMMs. Apprendimento con EM. HMMs discriminativi (Maximum Entropy Markov Models).
17/12/2010 Discriminative sequence labeling: Conditional random fields. Structured-output learning and SVM-HMM C. Sutton and A. McCallum (2007) An Introduction to Conditional Random Fields for Relational Learning. in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
Y. Altun et al (2007). Support Vector Machine Learning for Interdependent and Structured Output Spaces. in BakIr et al: Predicting Structured Data
Mallet (LDA and CFR Software)
12/1/2011 Apprendumento statistico relazionale. Introduzione alla logica di Markov. P. Domingos et al. (2008) Markov Logic in L. De Raedt et al. Probabilistic Inductive Logic Programming.
13/1/2011 Logica di Markov e applicazioni. M. Lippi and P. Frasconi (2009) Prediction of protein beta-residue contacts by Markov logic networks with grounding-specific weights Bioinformatics 25(18):2326-2333.
Alchemy (Markov Logic software)
14/1/2011 Topic modeling. Latent Dirichlet Allocation. D. Blei et al. (2003) Latent Dirichlet Allocation Journal of Machine Learning Research 3, 993--1022
T.L Griffiths and M. Steyvers (2004) Finding scientific topics Proceedings of the National Academy of Sciences 101(Supp. 1):5228
Y. Wang (2008) Distributed Gibbs Sampling of Latent Topic Models: The Gritty Details
Mallet (LDA and CFR Software)
*parte del materiale è protetto da password di accesso. La password viene comunicata esclusivamente agli studenti a lezione.
DateTopicsReadings and handouts*
13/10/2010 Administrivia. Introduction. Solving difficult AI problems: the machine learning approach. HTB09 1.
14/10/2010 Supervised learning. Types of inputs and outputs. Application examples. Tasks: regression, classification, multi-class classification, ranking, structured-output learning. Abstract view of a supervised learning algorithm. Assumptions on data. HTB09 2. Class notes
15/10/2010 Ordinary Linear Regression. Least squares estimates. Geometric interpretation. Statistical analysis. Gauss-Markov Theorem. Bias-variance decomposition. HTB09 3.2.
21/10/2010 Subset selection: best subset, forward and backward stepwise. Ridge regression and Lasso. Discussion. HTB09 3.3, 3.4
22/10/2010 The maximum likelihood principle. Examples. Properties of maximum likelihood estimators: consistency, equivariance, asyntotic normality, optimality. Ordinary least squares as a special case of MLE. Wasserman 2004, Ch. 9 Maximum likelihood
27/10/2010 Generative models for classification. Linear (Gaussian) discriminant analysis. Bayes risk. Esempi. HTB09 4.3. DHS 5.8.2
28/10/2010 Logistic regression. Newton-Raphson optimization. Generalized linear models (example: Poisson regression). HTB09 4.4. M. Jordan. Why the logistic function? - UAI 1995
29/10/2010 Softmax regression (multiclass logistic regression). Introduction to Naive Bayes. HTB09 4.4, 6.6.3. T. Mitchell Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression
3/11/2010 Naive Bayes. Laplace smoothing. General categorical attributes. Discretization of continuous attributes. Multinomial model for text categorization. A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification - AAAI 1998
U. Fayyad and K. Irani: Multi-interval discretization of continuous-valued attributes for classification learning - IJCAI 1993
4/11/2010 Decomposition of the generalization error. Learning curves. Naive Bayes vs. Logistic Regression. Introduction to SVM: Maximal margin hyperplane. A. Ng and M. Jordan. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes - NIPS 2002
10/11/2010 Maximal margin hyperplane. Karush Kuhn Tucker theory for solving convex optimization problems in dual form. Structure of the MMH solution. HTB09 4.5. Schoelkopf and Smola (2002) Learning with Kernels Ch. 6, 7.3
11/11/2010 Support vector machines (C-SVC, nu-SVC). Analysis of support vectors. Overview of SVM solvers. SMO. SVM as regularized empirical loss minimization. Hinge vs. logit (logistic regression) loss. HTB09 12.1, 12.2. L. Bottou and C.J. Lin (2006). Support Vector Machine Solvers.
L. Bottou Trade-offs of Large-Scale Learning - NIPS 2008
LibSVM software
12/11/2010 Support vector regression. Informal introduction to kernels: Polynomial and Gaussian kernel. Perceptron and Kernel Perceptron. Voted-perceptron. HTB09 12.3.1, 12.3.2. Cristianini and Shawe-Taylor, Ch. 2
Y. Freund and R.E. Schapire: Large Margin Classification Using the Perceptron Algorithm
17/11/2010 Mercer's theorem. Reproducing kernel Hilbert spaces. Representer theorem. Kernel ridge regression. HTB09 5.8, 12.3
T. Poggio and S. Smale The Mathematics of Learning: Dealing with Data (quite readable)
F. Cucker and S. Smale On the mathematical foundations of learning (very detailed but somewhat difficult paper)
18/11/2010 Designing kernels. Closure properties. Min-kernel. Kernels on histograms. Pyramidal match kernel. Convolution kernels. K. Grauman and T. Derrel: The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features - ICCV 2005
D. Haussler (1999): Convolution Kernels on Discrete Structures - MLJ 1999
19/11/2010 Kernels for structured instances: k-grams for protein classification, subtree kernels for parse trees. Feature hashing. C. Leslie et al.: The spectrum kernel: A string kernel for SVM protein classification - PSB 2002
P. Kuksa et al.: Scalable Algorithms for String Kernels with Inexact Matching - NIPS 2008
M. Collins and N. Duffy Convolution Kernels for Natural Language - NIPS 2001
K. Weinberger et al. Feature Hashing for Large Scale Multitask Learning ICML 2009.
24/11/2010 Overview of graph kernels. Learning theory. Generalization bounds. Case of finite-dimensional hypothesis spaces. VC-dimension and Vapnick's bound. L. Ralaivola et al. (2005) Graph kernels for chemical informatics
Bousquet et al. (2004)Introduction to Statistical Learning Theory
25/11/2010 Introduction to probabilistic graphical models. Review of probabilities and subjective probability (beliefs). Conditional independence and dependency models. Undirected I-maps (Markov networks). Details in KF09 Ch 1-4
D. Koller et al. (2007)Graphical models in a Nutshell in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
26/11/2010 Markov networks: decomposition of the joint distribution (Hammersley-Clifford theorem). Bayesian networks. D-separation. Parameterization. Introduction to inference (variable elimination). A. Darwiche Bayesian Networks Communications of the ACM.
1/12/2010 Inference with the junction tree algorithm.
D. Koller et al. (2007)Graphical models in a Nutshell in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
02/12/2010 Using Bayesian network software.
Parameter learning in Bayesian networks: Maximum likelihood estimation with complete data. Bayesian learning, Beta and Dirichlet priors.
Bayesian networks software: Hugin, Genie.
D. Heckerman (1997) Bayesian Networks for Data Mining Data Mining and Knowledge Discovery.
03/12/2010 Learning with missing data. Expectation-Maximization (EM). Mixture models and mixtures of Gaussians (GMM). K-means. HTB09 8.5, 13.2.1, 13.2.3, 14.3.6, 14.3.7. T. Minka Expectation-Maximization as lower bound maximization Tech. Report. 1999.
9/12/2010 EM for learning Bayesian Networks. Plates, parameter dependences in Bayesian networks, core ideas of probabilistic relational networks.
Semi-supervised learning: self-training, using mixture models, semi-supervised Naive Bayes.
K. Nigam et al. (2006) Semi-Supervised Text Classification Using EM in O Chapelle et al.: Semi-supervised learning
L. Getoor et al. (2007) Probabilistic Relational Models in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
10/12/2010 Co-training. Semi-supervised SVM and Logistic Regression.
Introduction to Hidden Markov Models
Zhu and Goldberg Introduction to Semi-Supervised Learning 2009.
L.R. Rabiner A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE 77(2):257-286, 1989.
15/12/2010 Hidden Markov Models (Baum-Welch's forward backward procedure) L.R. Rabiner A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE 77(2):257-286, 1989.
16/12/2010 Viterbi Algorithm for HMMs. Learning with EM. Discriminative HMMs (Maximum Entropy Markov Models).
17/12/2010 Discriminative sequence labeling: Conditional random fields. Structured-output learning and SVM-HMM C. Sutton and A. McCallum (2007) An Introduction to Conditional Random Fields for Relational Learning. in L. Getoor and B. Taskar: Introduction to Statistical Relational Learning
Y. Altun et al (2007). Support Vector Machine Learning for Interdependent and Structured Output Spaces. in BakIr et al: Predicting Structured Data
Mallet (LDA and CFR Software)
12/1/2011 Overview of statistical relational learning. Introduction to Markov logic. P. Domingos et al. (2008) Markov Logic in L. De Raedt et al. Probabilistic Inductive Logic Programming.
13/1/2011 Markov logic and applications. M. Lippi and P. Frasconi (2009) Prediction of protein beta-residue contacts by Markov logic networks with grounding-specific weights Bioinformatics 25(18):2326-2333.
Alchemy (Markov Logic software)
14/1/2011 Topic modeling. Latent Dirichlet Allocation. D. Blei et al. (2003) Latent Dirichlet Allocation Journal of Machine Learning Research 3, 993--1022
T.L Griffiths and M. Steyvers (2004) Finding scientific topics Proceedings of the National Academy of Sciences 101(Supp. 1):5228
Y. Wang (2008) Distributed Gibbs Sampling of Latent Topic Models: The Gritty Details
Mallet (LDA and CFR Software)
*some handouts are password protected. The password is only revealed to students in class.