Module I: Speech Technologies

Course description

Introduction: description of the field, a brief historical outline of the development of intelligent audio and speech systems and the importance of research and knowledge acquisition in this field for the Slovenian language. 

Basic characteristics of the auditory perception of different sounds, and of the production and perception of speech in human speech communication. Representations of audio and speech signals.

Computer processing of sound and speech signals: pre-processing, sound signal features, speech signal segmentation, sound and speech databases.
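
As an illustration of the pre-processing, feature and segmentation steps named above, the following is a minimal sketch and not course material. It assumes a toy signal (silence followed by a 440 Hz tone at 8 kHz), a pre-emphasis coefficient of 0.97, 25 ms frames with a 12.5 ms hop, and a simple energy threshold for segmentation. All function names and parameter values are hypothetical choices made for this example.

```python
import math

def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frames):
    """Mean squared amplitude of each frame (a basic sound signal feature)."""
    return [sum(x * x for x in frame) / len(frame) for frame in frames]

def active_frames(energy, relative_threshold=0.1):
    """Mark frames whose energy exceeds a fraction of the maximum energy."""
    limit = relative_threshold * max(energy)
    return [e > limit for e in energy]

# Toy signal (assumption): 0.1 s of silence, then 0.1 s of a 440 Hz tone.
sample_rate = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
signal = silence + tone

frames = frame_signal(pre_emphasize(signal), frame_len=200, hop=100)
energy = short_time_energy(frames)
active = active_frames(energy)

# Index of the first frame classified as containing sound.
first_active = active.index(True)
print("frames:", len(frames), "first active frame:", first_active)
```

Practical systems typically replace the energy feature with spectral features and the fixed threshold with a trained detector; the sketch only shows how the stages connect.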

Systems for the recognition of various sounds: recognition of environmental and animal sounds, recognition of traffic sounds, audio verification of machine operation. 

Speech recognition systems: speaker recognition and verification, isolated word and continuous speech recognition, spontaneous speech recognition. Statistical acoustic and language modeling, semantic speech analysis. 

Artificial speech: systems for speech synthesis in general, grapheme-to-phoneme conversion, prosody modeling, speech-synthesis procedures. Assessment of speech synthesis systems. 

Dialogue: automated dialogue systems in general, system configurations, dialogue management, knowledge representations, multimodality, assessment of dialogue systems.

Study programme in which the course is offered

2nd Cycle Postgraduate Study Programme in Electrical Engineering

Objectives and competences

The aim of this course is to acquaint students with the field of machine hearing and speech technologies and to introduce the algorithms, techniques, and methods used to accomplish different tasks in this field.

Learning and teaching methods

  • Lectures 
  • Interactive teaching 
  • Practical assignments 
  • Seminar work 

Intended learning outcomes

After successful completion of the course, students should be able to: 

  • define the main approaches to the representation, description, synthesis and recognition of different sound and speech signals, 
  • describe the characteristics, components, structure and capabilities of machine hearing systems and speech technologies, 
  • use selected programming solutions (APIs) for the development of machine hearing systems and spoken human-machine communication systems, 
  • distinguish between the different tasks of machine hearing systems and speech technologies, and between the representation and processing methods needed to accomplish them, 
  • combine basic procedures for representation and processing of sound data into complex systems for the recognition of different sounds, and for the recognition and synthesis of speech, 
  • evaluate the accuracy and reliability of machine hearing and speech systems. 
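
As an illustration of the last outcome, one widely used accuracy measure for speech recognition is the word error rate (WER): the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal example and not course material. The function name and the example sentences are hypothetical, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# One of five reference words is missing from the hypothesis.
wer = word_error_rate("the flight leaves at noon", "the flight leaves noon")
print(wer)
```

WER can exceed 1.0 when the hypothesis contains many insertions, so it is an error rate rather than a percentage of correct words.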

Lecturer's references

  1. CATALBAS, Mehmet Cem, DOBRIŠEK, Simon. Dynamic speaker localization based on a novel lightweight R-CNN model. Neural computing & applications, ISSN 0941-0643, 2023, pp. 1-15. 
  2. DOBRIŠEK, Simon, GOLOB, Žiga, ŽGANEC GROS, Jerneja. Finite-state super transducers for compact language resource representation in edge voice-AI. Systems science & control engineering, ISSN 2164-2583, 2022, vol. 10, no. 1, pp. 636-644. 
  3. ŽGANEC GROS, Jerneja, GOLOB, Žiga, VESNICER, Boštjan, ŽGANEC, Mario, ŠKRLJ, Simona, METLIČAR PREZELJ, Tina, ODAR, Jure, SEDEVČIČ, Robert, ČERNE, Tomaž, DOBRIŠEK, Simon. Postopek za določanje končnih super pretvornikov za učinkovit zapis slovarjev izgovarjav v vgrajenih sistemih [Method for determining finite-state super transducers for the efficient representation of pronunciation dictionaries in embedded systems] : P-202100200 – 20211110. Ljubljana: Urad RS za intelektualno lastnino, 2021. 12 pp. 
  4. DOBRIŠEK, Simon, ŽGANEC GROS, Jerneja, ŽIBERT, Janez, MIHELIČ, France, PAVEŠIĆ, Nikola. Speech database of spoken flight information enquiries SOFES 1.0. Ljubljana: CLARIN.SI, 2017. 
  5. GAJŠEK, Rok, MIHELIČ, France, DOBRIŠEK, Simon. Speaker state recognition using an HMM-based feature extraction method. Computer speech & language, ISSN 0885-2308, Jan. 2013, vol. 27, no. 1, pp. 135-150. 

Study materials

  1. Mihelič F., Signali, Založba FE in FRI, Ljubljana, 2014 
  2. Pavešić N., Razpoznavanje vzorcev: uvod v analizo in razumevanje vidnih in slušnih vzorcev, 3rd revised and expanded edition, Založba FE in FRI, Ljubljana, 2012 
  3. Lyon R. F., Human and Machine Hearing: Extracting Meaning from Sound, Cambridge University Press, 2017 
  4. Jurafsky D., Martin J. H., Speech and Language Processing, Stanford University, 3rd ed., 2023 

University of Ljubljana, Faculty of Electrical Engineering, Tržaška cesta 25, 1000 Ljubljana

E: dekanat@fe.uni-lj.si T: 01 4768 411