Speech and Image Technology

Course description

Introduction: description of the field, short outline of the historical development of speech and image technologies.

Basic characteristics of visual and auditory perception and human speech-based communication. Representation of speech and image patterns.

Pattern recognition: structural description, pattern recognition systems in general, feature extraction, learning, classification and clustering in pattern recognition systems.

Speech processing: acquisition and preprocessing, speech features, speech signal segmentation, databases of speech.

Speech recognition: types of speech-recognition systems, statistical modelling, acoustic and language modelling, semantic analysis of speech.

Artificial speech: systems for speech synthesis in general, grapheme-to-phoneme conversion, prosody modelling, speech-synthesis procedures.

Dialogue: automated dialogue systems in general, approaches to designing human-computer dialogue systems, assessment of dialogue systems.

Image technologies: terminology, use-cases, basic image transformations, color images and color spaces, image coding.

Image processing: image processing in the spatial and frequency domains, noise models and image restoration, morphological operations and algorithms, edge detection.

Advanced algorithms: local descriptors and their applications, object detection in images, object recognition from image data, subspaces for data representation.

Image segmentation: clustering techniques and their application to image segmentation, mean-shift.

The course is carried out in the study programme

Multimedia, 1st cycle

Objectives and competences

The aim of this course is to acquaint students with the field of speech and image technologies and to introduce various algorithms, techniques, and methods for accomplishing tasks in this field.

Learning and teaching methods


Interactive teaching

Practical assignments

Intended learning outcomes

After successful completion of the course, students should be able to:

  • define the main approaches to the representation, description, synthesis and recognition of speech and image signals,
  • describe the characteristics, components, structure and capabilities of speech and image-based technologies,
  • use selected programming solutions (APIs) for the development of spoken man-machine communication systems, image processing and image recognition applications,
  • distinguish between different tasks of speech and image technologies and the representation and processing methods needed to accomplish these tasks,
  • combine basic procedures for representation and processing of speech and image data into complex systems for recognition and synthesis of images and speech,
  • evaluate the accuracy and reliability of speech and image technology systems.

Lecturer's references

1. GRM, Klemen, SCHEIRER, Walter J., ŠTRUC, Vitomir. Face hallucination using cascaded super-resolution and identity priors. IEEE transactions on image processing, ISSN 1057-7149, 2020, vol. 29, no. 1, pp. 2150-2165.

2. KOVAČ, Jure, ŠTRUC, Vitomir, PEER, Peter. Frame-based classification for cross-speed gait recognition. Multimedia tools and applications, ISSN 1380-7501, Mar. 2019, vol. 78, no. 5, pp. 5621-5643.

3. KRIŽAJ, Janez, PEER, Peter, ŠTRUC, Vitomir, DOBRIŠEK, Simon. Simultaneous multi-descent regression and feature learning for facial landmarking in depth images. Neural computing & applications, ISSN 0941-0643, 2019, pp. 1-18.

4. ULČAR, Matej, DOBRIŠEK, Simon, ROBNIK ŠIKONJA, Marko. Razpoznavanje slovenskega govora z metodami globokih nevronskih mrež. Uporabna informatika, ISSN 1318-1882, 2019, vol. 27, no. 3, pp. 96-109.

5. CATALBAS, Mehmet Cem, DOBRIŠEK, Simon. 3D moving sound source localization via conventional microphones. Elektronika ir elektrotechnika, ISSN 1392-1215. [Print ed.], 2017, vol. 23, no. 4, pp. 63-69.


Study materials

  1. Mihelič F., Žibert J., Hajdinjak M., Štruc V., Skripta za predmet Govorne in slikovne tehnologije, Izdaja, Ljubljana, Fakulteta za elektrotehniko, 2012.
  2. Mihelič F., Signali, Založba FE in FRI, Ljubljana, 2006.
  3. Pavešić N., Razpoznavanje vzorcev: uvod v analizo in razumevanje vidnih in slušnih vzorcev, corrected and expanded ed., Založba FE in FRI, Ljubljana, 2012.
  4. Rabiner L., Schafer R., Theory and Applications of Digital Speech Processing, 1st ed., Prentice Hall, 2010.
  5. Gonzalez R. C., Woods R. E., Digital Image Processing, 3rd ed., Prentice Hall, 2007.
  6. Gonzalez R. C., Woods R. E., Eddins S. L., Digital Image Processing Using MATLAB, 2nd ed., Gatesmark Publishing, 2009.

Univerza v Ljubljani, Fakulteta za elektrotehniko, Tržaška cesta 25, 1000 Ljubljana

E: dekanat@fe.uni-lj.si, T: 01 4768 411