M35209F/Μ36209P - Text Analytics (MSc Data Science)

Documents

The slides of 2023-24.

Filename	Description	Size	Date
ta_slides_part00_introduction.pdf	Introduction and course organization.	2.19 MB	1/8/24
ta_slides_part01_ngrams.pdf	n-gram language models, estimating probabilities from corpora, entropy, cross-entropy, perplexity. Edit distance, context-aware spelling correction, beam-search decoding.	2.91 MB	1/10/24
ta_slides_part02_text_classification_with_mostly_linear_models.pdf	Text classification with (mostly) linear models: Representing texts as bags of words. Boolean and TF-IDF features. Feature selection using information gain. Text classification with k-NN and Naive Bayes. Precision, recall, F1, AUC. Obtaining word embeddings from PMI scores using SVD-based dimensionality reduction. k-means. Linear and logistic regression, (stochastic) gradient descent. Practical advice and diagnostics for text classification with supervised machine learning. Optional slides: semi-supervised classification with Expectation Maximization (EM), lexicon-based features, sentiment lexica, Support Vector Machines (SVMs) and kernels.	5.61 MB	1/10/24
ta_slides_part03_text_classification_with_mlps.pdf	Perceptrons, training them with SGD, limitations. Multi-Layer Perceptrons (MLPs) and backpropagation. MLPs for text classification, regression, token classification (e.g., for POS tagging, NER). Dropout, batch/layer normalization. Pre-training word embeddings with Word2Vec. Advice for training deep neural networks.	2.9 MB	1/12/24
ta_slides_part04_nlp_with_rnns.pdf	Recurrent neural networks (RNNs), GRUs/LSTMs. Applications in token classification (e.g., named entity recognition). RNN language models. RNNs with self-attention and applications in text classification. Bidirectional and stacked RNNs. Obtaining word embeddings from character-based RNNs. Hierarchical RNNs. Sequence-to-sequence RNN models with attention, applications in machine translation. Optional slides: Universal sentence encoders, LASER. Pre-training language models, ELMo.	3.39 MB	1/30/24
ta_slides_part05_nlp_with_cnns.pdf	Quick background on Convolutional Neural Networks (CNNs) in Computer Vision. Image to text generation with CNN encoders and RNN decoders. Text processing with CNNs.	2.31 MB	2/13/24
ta_slides_part06_nlp_with_transformers.pdf	Key-query-value attention, Transformer encoders and decoders. Pre-trained Transformers, fine-tuning, prompting, instruction-tuning, BERT, SMITH, BART, T5, GPT-3, InstructGPT, ChatGPT. Retrieval-augmented generation (RAG), Transformers with tools. Data augmentation for NLP.	6.09 MB	2/21/24
ta_slides_part07_speech_recognition.pdf	Introduction to automatic speech recognition (ASR). Encoding speech frames with pre-trained Transformers, wav2vec, HuBERT. ASR models: encoder/decoder models, encoder-only models, use of language models. ASR evaluation measures. Optional older material: MFCC vectors, HMM models.	2.91 MB	3/6/24