Subject description
Within the course, students familiarize themselves with the development and application of intelligent audio and speech systems. Fundamental aspects of human auditory perception are covered, along with basic methods for the analysis and synthesis of speech and other sound signals.
The course addresses methods for preprocessing and analyzing the characteristics of captured sound signals, enabling the recognition of distinctive sounds such as various environmental sounds, animal sounds, and traffic sounds. Attention is also given to the possibility of sound analysis for verifying the operation of machines and devices and for diagnosing their malfunctions.
Special emphasis is placed on systems for speech and speaker recognition and on the statistical modeling of spoken languages. Systems for generating synthetic speech and human-machine communication systems are also discussed, including dialogue management systems, models for representing knowledge in such systems, and multimodality in communication.
Objectives and competences
The aim of this course is to acquaint students with the field of machine hearing and speech technologies and to introduce the algorithms, techniques, and methods used to accomplish tasks in this field.
Teaching and learning methods
Lectures, interactive teaching, practical assignments, seminar work.
Expected study results
After successful completion of the course, students should be able to:
- define the main approaches to the representation, description, synthesis and recognition of different sound and speech signals,
- describe the characteristics, components, structure and capabilities of machine hearing systems and speech technologies,
- use selected programming solutions (APIs) for the development of machine hearing systems and spoken human-machine communication systems,
- distinguish between the different tasks of machine hearing systems and speech technologies, and between the representation and processing methods needed to accomplish these tasks,
- combine basic procedures for representation and processing of sound data into complex systems for the recognition of different sounds, and for the recognition and synthesis of speech,
- evaluate the accuracy and reliability of machine hearing and speech systems.
Basic sources and literature
- N. Pavešić: Razpoznavanje vzorcev: uvod v analizo in razumevanje vidnih in slušnih signalov, 3rd revised and expanded edition. Založba FE in FRI, 2012. ISBN 978-961-243-201-0. [COBISS.SI-ID 260256256]
- I. McLoughlin: Applied speech and audio processing: with Matlab examples. Cambridge University Press, 2009. ISBN 978-0-521-51954-0. [COBISS.SI-ID 7828564]
- J. Davies, M. Grobelnik, D. Mladenić: Semantic knowledge management: integrating ontology management, knowledge discovery, and human language technologies. Springer, 2009. ISBN 978-3-540-88844-4. [COBISS.SI-ID 22434599]
- S. Narayanan, A. Alwan: Text-to-Speech Synthesis. Prentice Hall Professional Technical Reference, 2005. ISBN 0-13-145661-X. [COBISS.SI-ID 4613972]