Course: M35209F/Μ36209P - Text Analytics (MSc Data Science)

Course code: INF312

INF312 - Ion Androutsopoulos

Documents
Directory ta_slides_2025_26: The slides of 2025-26.
Name, Size, Date
Introduction and course organization.
2.18 MB 1/9/26, 5:34 PM
N-gram language models, estimating probabilities from corpora, entropy, cross-entropy, perplexity, edit distance, context-aware spelling correction, beam-search decoding.
2.89 MB 1/9/26, 5:34 PM
Representing documents as bags of words. Boolean and TF-IDF features. Feature selection and extraction using information gain and SVD. Obtaining word embeddings from PMI scores. Word and text clustering with k-means. Text classification with k nearest neighbors. Linear and logistic regression, stochastic gradient descent. Evaluating classifiers with precision, recall, F1, ROC AUC. Practical advice and diagnostics for text classification with supervised machine learning.
3.63 MB 1/16/26, 10:40 PM
Multi-Layer Perceptrons (MLPs) and backpropagation. Dropout, batch and layer normalization. MLPs for text classification, regression, token classification (e.g., for POS tagging, named entity recognition). Pre-training word embeddings, Word2Vec. Advice for training large neural networks.
2.34 MB 1/27/26, 11:50 AM
Recurrent neural networks (RNNs), GRUs/LSTMs. Bidirectional and stacked RNNs. RNNs with self-attention or global max-pooling. RNNs in text and token classification. RNN language models. Obtaining word embeddings from character-based RNNs. Hierarchical RNNs. Sequence-to-sequence RNN models with attention, applications in machine translation. Optional slides: Variational dropout. Universal sentence encoders, LASER. Pre-training RNN language models, ELMo.
3.35 MB 2/10/26, 10:24 AM