Module I: Speech Technologies

Subject description

Introduction: description of the field, a brief historical outline of the development of intelligent audio and speech systems and the importance of research and knowledge acquisition in this field for the Slovenian language.

Basic characteristics of the auditory perception of different sounds and the synthesis and perception of speech in human speech communication. Representations of audio and speech signals.

Computer processing of sound and speech signals: pre-processing, sound signal features, speech signal segmentation, sound and speech databases.
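As a rough illustration of the pre-processing and feature topics listed above, the following minimal sketch applies pre-emphasis, splits a signal into overlapping frames, and computes two simple per-frame features (short-time energy and zero-crossing rate). The function names, the pre-emphasis coefficient of 0.97, and the 25 ms / 10 ms framing at 16 kHz are illustrative assumptions, not material prescribed by the course.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def split_into_frames(signal, frame_len, hop_len):
    """Cut the signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop_len)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_features(signal, frame_len=400, hop_len=160):
    """Return one (energy, zero-crossing rate) pair per frame of the pre-emphasized signal."""
    emphasized = pre_emphasis(signal)
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in split_into_frames(emphasized, frame_len, hop_len)]


# Usage: one second of a synthetic 440 Hz tone sampled at 16 kHz (hypothetical test signal).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = frame_features(tone)
```

Real systems typically go further (windowing, spectral features such as mel-frequency cepstral coefficients), but the same pipeline of filtering, framing, and per-frame description applies.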

Systems for the recognition of various sounds: recognition of environmental and animal sounds, recognition of traffic sounds, audio verification of machine operation.

Speech recognition systems: speaker recognition and verification, isolated word and continuous speech recognition, spontaneous speech recognition. Statistical acoustic and language modeling, semantic speech analysis.

Artificial speech: systems for speech synthesis in general, grapheme-to-phoneme conversion, prosody modeling, speech-synthesis procedures. Assessment of speech synthesis systems.

Dialogue: automated dialogue systems in general, system configurations, dialogue management, knowledge representations, multimodality, assessment of dialogue systems.

The subject is taught in programs

Objectives and competences

The aim of this course is to acquaint students with the field of machine hearing and speech technologies and to introduce the algorithms, techniques, and methods used to accomplish tasks in this field.

Teaching and learning methods

Lectures
Interactive teaching
Practical assignments
Seminar work

Expected study results

After successful completion of the course, students should be able to:

define the main approaches to the representation, description, synthesis and recognition of different sound and speech signals,
describe the characteristics, components, structure and capabilities of machine hearing systems and speech technologies,
use selected programming solutions (APIs) for the development of machine hearing systems and spoken man-machine communication systems,
distinguish between the different tasks of machine hearing systems and speech technologies, and between the representation and processing methods needed to accomplish them,
combine basic procedures for representation and processing of sound data into complex systems for the recognition of different sounds, and for the recognition and synthesis of speech,
evaluate the accuracy and reliability of machine hearing and speech systems.
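As an illustration of the last outcome, one widely used accuracy measure for speech recognition is the word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below is a minimal version that assumes whitespace-tokenized transcripts; the function name is hypothetical, and the syllabus itself does not prescribe a particular metric.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Usage: one deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.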

Basic sources and literature

Mihelič F., Signali, Založba FE in FRI, Ljubljana, 2014
Pavešić N., Razpoznavanje vzorcev: uvod v analizo in razumevanje vidnih in slušnih vzorcev, 3rd revised and expanded ed., Založba FE in FRI, Ljubljana, 2012
Lyon R. F., Human and Machine Hearing: Extracting Meaning from Sound, Cambridge University Press, 2017
Jurafsky D., Martin J. H., Speech and Language Processing, 3rd ed., Stanford University, 2023