Course: M35209F/Μ36209P - Text Analytics (MSc Data Science)

Course code: INF312

INF312 - Ion Androutsopoulos

Documents
Root directory: in_class_demos_2025_26 (files for the in-class demos of 2025-26)
Name | Size | Date
Introduction to the NLTK and spaCy modules. Tokenization and N-gram Language Models (LMs). Cross-entropy and perplexity of LMs. Beam-search decoding. | 80.67 KB | 1/21/26, 6:33 PM
Introduction to PyTorch. Text classification with MLPs using tf-idf vectors and centroids of pretrained word2vec embeddings. Example with word2vec word embeddings with gensim. | 395.91 KB | 2/5/26, 7:35 PM
Introduction to the scikit-learn library. Text classification with both linear & non-linear classifiers. Learning curves. Pipelines and hyper-parameter tuning via grid or randomized search. | 899.28 KB | 2/1/26, 10:12 PM