Subject description - BECM36NLPT

BECM36NLPT Natural Language Processing and Translation
Roles: P
Extent of teaching: 2P+2C
Department: 13136
Language of teaching: EN
Guarantors: Bojar O.
Completion: Z,ZK
Lecturers: Bojar O.
Credits: 6
Tutors: Bojar O.
Semester: L

Annotation:

The course covers the area of natural language processing (NLP) by means of an in-depth focus on the task of machine translation (MT).

Content:

The course covers the area of natural language processing (NLP) through an in-depth focus on the task of machine translation (MT). Machine translation has been one of the key disciplines in computational linguistics for several decades and has utilized a broad range of concepts from the field: morphological analysis, surface syntax, and deep (typically Praguian) syntax, before it fully switched to deep-learning approaches, first convolutional and primarily recurrent neural networks, and ultimately gave the world the self-attentive network structure of the Transformer model.

We start by setting up research methodology: designing, conducting and evaluating experiments in NLP, with a particular focus on the evaluation of text outputs. Two introductory lectures are then devoted to layers of linguistic analysis and to neural networks in NLP, respectively. Training data are a critical prerequisite, and the fourth lecture discusses their acquisition and processing and the relevance of the text domain. The subsequent three lectures are devoted to “classical AI” search-based NLP methods, discussing how they benefit, but can also suffer, when incorporating the underlying linguistic concepts of morphology and syntax too rigidly. The rest of the course is devoted to the Transformer model: its basic properties, the representations it learns, and its use in multi-lingual and multi-modal settings, culminating with pre-trained models, including large language models, and methods of self-learning.

The theoretical basis is supported by practical experience gained through project work, in a team or alone.
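To give a flavour of the self-attentive mechanism at the heart of the Transformer mentioned above, here is a minimal illustrative sketch of scaled dot-product attention (as introduced by Vaswani et al., 2017), written in plain Python for clarity rather than efficiency; the function names are ours, not part of any course material.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    For each query, compute dot products with all keys, scale by
    sqrt(d_k), turn the scores into weights with softmax, and return
    the weighted average of the value vectors.
    """
    d_k = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

With a single query that matches the first of two keys, the output is dominated by the first value vector; in a real Transformer the same computation runs over learned projections of the input, in parallel across multiple heads.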

Course outlines:

1. NLP Experiment. Methods of Evaluation. Evaluating Text Generation Tasks, incl. Machine Translation.
2. Layers of Linguistic Analysis. Overview of Approaches to Machine Translation.
3. Processing Text with Neural Networks. Neural Machine Translation.
4. Data Acquisition and Preparation. Alignment.
5. Classical AI Search Space. Phrase-Based Machine Translation.
6. Search Space Factorization. Morphology in Machine Translation.
7. Syntactic Analysis. Structural Approaches to Machine Translation.
8. Transformer Architecture. Caveats on Interpreting Results.
9. Word and Sentence Representations.
10. Transfer Learning in NLP. Multi-lingual Models. Multi-lingual Machine Translation.
11. Multi-modality and NLP. Non-text Modalities in Translation.
12. Pre-trained Models, Large Language Models, Instruction Tuning. Emergent Properties.
13. Self-Learning in NLP. Unsupervised Machine Translation.

Exercises outline:

Literature:

Koehn, Philipp. Statistical Machine Translation. Cambridge University Press, 2009.
Koehn, Philipp. Neural Machine Translation. Cambridge University Press, 2020.
Bojar, Ondřej. Čeština a strojový překlad: Strojový překlad našincům, našinci strojovému překladu. 2012. ISBN 978-80-904571-4-0. 168 pp.
Kocmi, Tom; Macháček, Dominik; Bojar, Ondřej. The Reality of Multi-Lingual Machine Translation. 2021. ISBN 978-80-88132-11-0. 191 pp.
Vaswani, Ashish, et al. Attention Is All You Need. Advances in Neural Information Processing Systems, 2017.
Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics, 2021, 8: 842-866.
Ouyang, Long, et al. Training Language Models to Follow Instructions with Human Feedback. Advances in Neural Information Processing Systems, 2022, 35: 27730-27744.

Requirements:

Subject is included into these academic programs:

Program        Branch           Role   Recommended semester
MPPRGAI_2025   Common courses   P      2


Page updated 19.4.2026 17:51:07. Proposal and Realization: I. Halaška (K336), J. Novák (K336).