Open eClass of the Athens University of Economics and Business | Natural Language Processing

Natural Language Processing (MSc CS & MSc ISDS)

INF210 - Ion Androutsopoulos

Course Description

This course is part of the MSc in Computer Science and the MSc in Information Systems Development and Security of the Department of Informatics, Athens University of Economics and Business. The course covers algorithms, models and systems that allow computers to "understand" and generate natural language text, with emphasis on deep learning methods for natural language processing (NLP) and large language models (LLMs). Time permitting, a brief introduction to deep learning for speech processing and multimodal LLMs is also provided.

Syllabus
Course Objectives/Goals

After successfully completing the course, students will be able to:
• understand how important NLP algorithms, tools, and LLMs work,
• implement NLP algorithms and tailor NLP tools and LLMs for particular applications,
• select and implement appropriate NLP algorithms for particular applications,
• evaluate the effectiveness and efficiency of NLP systems, including LLMs.

Course Syllabus

n-gram language models, entropy, cross-entropy, perplexity, context-aware spelling correction, beam-search decoding. Boolean and TF-IDF features. Information gain, SVD. k-NN, k-means. Linear and logistic regression, stochastic gradient descent. Precision, recall, F1, AUC. Multi-Layer Perceptrons (MLPs), backpropagation. Dropout, batch/layer normalization. Pre-training word embeddings, Word2Vec. Recurrent neural networks (RNNs), GRUs/LSTMs, RNN language models, RNNs with self-attention, bidirectional, stacked, hierarchical RNNs, encoder-decoder RNNs. Text processing with Convolutional Neural Networks (CNNs), image-to-text with CNNs-RNNs. Transformer encoders, BERT. Encoder-decoder Transformers, BART, T5. Decoder-only Transformers, GPT-x. Prompting, supervised fine-tuning, RLHF, DPO. Parameter-efficient training, LoRA. Retrieval-augmented generation (RAG), LLMs with tools, agents, ReAct. Adding vision to LLMs, LLaVA, InstructBLIP. Data augmentation for NLP. Introduction to automatic speech recognition (ASR). Deep learning encoders of speech segments, wav2vec, HuBERT, encoder-decoder and encoder-only ASR models. Dialog system architectures, intent recognition and dialog tracking using neural models, dialog systems based on LLMs.

Bibliography

There is no required textbook. Extensive notes in the form of slides are provided.

Recommended books:
- Speech and Language Processing, Daniel Jurafsky and James H. Martin, Pearson Education, 2nd edition, 2009, ISBN-13: 978-0135041963. A draft of the 3rd edition is freely available (https://web.stanford.edu/~jurafsky/slp3/).
- Deep Learning for Natural Language Processing: A Gentle Introduction, Mihai Surdeanu and Marco A. Valenzuela-Escarcega, Cambridge University Press, 2024, ISBN-13: 978-1316515662. Free draft available (https://clulab.org/gentlenlp/text.html).
- Neural Network Methods for Natural Language Processing, Yoav Goldberg, Morgan & Claypool Publishers, 2017, ISBN-13: 978-1627052986.

Prerequisites/Prior Knowledge

Basic knowledge of calculus, linear algebra, probability theory. For the programming assignments, programming experience in Python is required. An introduction to natural language processing and machine learning libraries (e.g., NLTK, spaCy, scikit-learn, PyTorch) will be provided, and students will have the opportunity to use these libraries in the course’s assignments. For assignments that require training neural networks, cloud virtual machines with GPUs (e.g., in Google’s Colab) can be used.

Assessment Methods

In each part of the course, study exercises are provided (solved and unsolved, some requiring programming), some of which must be submitted as assignments. The final grade is the average of the final examination grade (50%) and the grade of the submitted study and programming exercises (50%), provided that the final examination grade is at least 5/10. Otherwise, the final grade equals the final examination grade.
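
As an illustration of the grading rule above, here is a minimal Python sketch. The function name `final_grade` and its parameter names are hypothetical. The sketch assumes both grades are on a 0-10 scale (the text states this only for the final examination) and applies no rounding, since no rounding policy is given.

```python
def final_grade(exam: float, exercises: float, threshold: float = 5.0) -> float:
    """Combine the two grades following the rule described above.

    Assumption: both grades are on a 0-10 scale and no rounding is applied.
    """
    if exam >= threshold:
        # Exam passed: equal-weight average of exam and submitted exercises.
        return 0.5 * exam + 0.5 * exercises
    # Exam below the threshold: the exercises are not counted.
    return exam


print(final_grade(6.0, 9.0))  # exam passed, exercises count: 7.5
print(final_grade(4.0, 9.0))  # exam below 5/10, exercises ignored: 4.0
```

In the second call the exercise grade has no effect, because the examination grade is below 5/10.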

Instructors

Instructor: Ion Androutsopoulos (http://www.aueb.gr/users/ion/contact.html)

Announcements

Summer School in Robotics and AI
Wednesday, March 11, 2026 at 6:31 PM

Unofficial grades, February 2026
Sunday, February 15, 2026 at 8:24 PM

Assignment grading guidelines
Thursday, October 24, 2024 at 1:24 PM
