Speech and Image Technology

Subject description

The course offers a comprehensive insight into the field of speech and image technologies, with a particular emphasis on pattern recognition systems. The introductory part of the course covers the history and basic characteristics of the field, including the fundamentals of auditory perception and human speech communication, as well as methods of speech information representation and encoding. It then focuses on speech processing, covering speech capture, preprocessing, and features of speech signals, as well as procedures for speech analysis and the use of speech databases. Various types of speech recognition systems are presented, as well as methods for statistical modeling of acoustic and language representations, and semantic speech analysis.

Special attention is paid to synthetic speech, where the course focuses on the structure of speech-synthesis systems, grapheme-to-phoneme conversion, prosody modeling, and methods of synthetic speech signal generation. The course also addresses human-computer dialogue systems, methods for dialogue management, and the corresponding evaluation methodology.

In the context of image technologies, the course first introduces basic concepts, the importance of image technologies, transformations of image data, color spaces, image encoding, and processing of image data in the spatial and frequency domains. It also introduces noise models, restoration techniques, morphological operations, and edge detection procedures. Advanced algorithms for local descriptors, object detection and recognition, and subspaces for data representation are also presented. The course also covers image segmentation procedures based on clustering and other approaches such as mean-shift.

Objectives and competences

The aim of this course is to acquaint students with the field of speech and image technologies and to introduce various algorithms, techniques, and methods for accomplishing tasks in this field.

Teaching and learning methods

Lectures

Interactive teaching

Practical assignments

Expected study results

After successful completion of the course, students should be able to:

  • define the main approaches to the representation, description, synthesis and recognition of speech and image signals,
  • describe the characteristics, components, structure and capabilities of speech and image-based technologies,
  • use selected programming solutions (APIs) for the development of spoken man-machine communication systems, image processing and image recognition applications,
  • distinguish between different tasks of speech and image technologies and representation and processing methods needed to achieve these tasks,
  • combine basic procedures for representation and processing of speech and image data into complex systems for recognition and synthesis of images and speech,
  • evaluate the accuracy and reliability of speech and image technology systems.

Basic sources and literature

  1. Mihelič F., Signali, Založba FE in FRI, Ljubljana, 2006. 

  2. Pavešić N., Razpoznavanje vzorcev: uvod v analizo in razumevanje vidnih in slušnih vzorcev, revised and expanded edition, Založba FE in FRI, Ljubljana, 2012.

  3. Rabiner L., Schafer R., Theory and Applications of Digital Speech Processing, 1st ed., Prentice Hall, 2010.

  4. Gonzalez R. C., Woods R. E., Digital Image Processing, 3rd ed., Prentice Hall, 2007.

  5. Gonzalez R. C., Woods R. E., Eddins S. L., Digital Image Processing Using MATLAB, 2nd ed., Gatesmark Publishing, 2009.

University of Ljubljana, Faculty of Electrical Engineering, Tržaška cesta 25, 1000 Ljubljana

E:  dekanat@fe.uni-lj.si T:  01 4768 411