Systems for processing large amounts of data

Course description

Data collection: smartphones, sensors and internet-connected devices, the web; cleaning and preparation of data; data anonymization and de-identification.

Data storage: scalable relational databases, NoSQL databases, understanding the trade-off between data consistency, performance and availability.

Data processing: event-driven processing, parallel processing (map-reduce), extraction of structured data from unstructured sources.

Analysis: efficient algorithms for processing and analysing data, machine learning.

Visualization: procedures and challenges of visualizing large amounts of data, other modalities of data presentation (sonification, etc.).

Applications of the presented techniques: systems for context detection, smart systems (smart cities, smart transport, etc.), medical applications, social networks, financial systems.
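To illustrate the map-reduce parallelization named under data processing, here is a minimal single-process sketch of a word count. It is an assumption-laden toy, not part of the course materials: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework such as Hadoop would run the map and reduce steps on many machines and perform the grouping itself.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce sequentially over the input records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input; each string stands in for one line of a large file.
    print(word_count(["big data big systems", "data systems"]))
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` only on its own key, both steps can in principle be distributed across workers without coordination beyond the shuffle.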

Course is carried out on study programme

Objectives and competences

Is familiar with the concept of "big data". Is able to assess the volume of data, the rate of events, their diversity, and the key challenges associated with large amounts of data.

Knows the differences between relational and NoSQL databases, can choose between them, and can evaluate their suitability for a given use.

Knows the strengths and weaknesses of the map-reduce model and can evaluate it in comparison with relational databases.

Can apply basic analytical and visualization techniques for working with large amounts of data in a given use case.

Learning and teaching methods

Lectures or mentoring

Seminar

Intended learning outcomes

Understanding of the concept of "big data": data volume, the rate and diversity of events, and the key challenges associated with large amounts of data.

Understanding of relational databases, their capabilities and limitations.

Understanding of the capabilities, strengths and weaknesses of NoSQL databases.

Understanding of the map-reduce model, its strengths and weaknesses, and how it compares with relational databases.

Understanding of basic analytical and visualization techniques for working with large amounts of data.

Lecturer's references

  1. DROBNIČ, Franc, KOS, Andrej, PUSTIŠEK, Matevž. On the interpretability of machine learning models and experimental feature selection in case of multicollinear data. Electronics. May 2020, no. 5, 761, pp. 1-15, illus. ISSN 2079-9292. https://www.mdpi.com/2079-9292/9/5/761, DOI: 10.3390/electronics905076 [COBISS.SI-ID 14438659]
  2. KREN, Matej, KOS, Andrej, SEDLAR, Urban. Mining the IPTV channel change event stream to discover insight and detect ads. Mathematical problems in engineering. [Print ed.]. 2016, vol. 2016, pp. 1-5, illus. ISSN 1024-123X. http://www.hindawi.com/journals/mpe/2016/2541814/, DOI: 10.1155/2016/2541814. [COBISS.SI-ID 11307860]
  3. KREN, Matej, KOS, Andrej, SEDLAR, Urban. Modeling opinion of IPTV viewers based on implicit feedback and content metadata. IEEE access. 2019, vol. 7, pp. 14455-14462, illus. ISSN 2169-3536. https://ieeexplore.ieee.org/document/8607973, DOI: 10.1109/ACCESS.2019.2891837. [COBISS.SI-ID 12380756]
  4. MIHELJ, Jernej, ZHANG, Yuan, KOS, Andrej, SEDLAR, Urban. Crowdsourced traffic event detection and source reputation assessment using smart contracts. Sensors. Aug.-1 2019, iss. 15, 3267, pp. 1-17, illus. ISSN 1424-8220. https://www.mdpi.com/1424-8220/19/15/3267, DOI: 10.3390/s19153267. [COBISS.SI-ID 12587860]
  5. SEDLAR, Urban, ŠTEFANIĆ JUŽNIČ, Leon, KREN, Matej, RABZELJ, Matej, KOS, Andrej, VOLK, Mojca. IoT cybersecurity : research challenges and opportunities ahead. IEEE IoT newsletter. May 2020, 1 online resource (5 pp.), illus. https://iot.ieee.org/newsletter/may-2020/iot-cybersecurity-research-challenges-and-opportunities-ahead.html?highlight=WyJpb3QiLCJpb3QncyIsIidpb3QncyIsImN5YmVyc2VjdXJpdHkiLCJpb3QgY3liZXJzZWN1cml0eSJd. [COBISS.SI-ID 26780675]

Study materials

  1. European Commission: http://www.internet-of-things-research.eu/pdf/Converging_Technologies_for_Smart_Environments_and_Integrated_Ecosystems_IERC_Book_Open_Access_2013.pdf
  2. Tom White: Hadoop: The Definitive Guide, 3rd Edition; Storage and Analysis at Internet Scale; O'Reilly Media
  3. Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman: Mining of Massive Datasets, http://i.stanford.edu/~ullman/mmds/book.pdf
  4. Jimmy Lin, Chris Dyer: Data-Intensive Text Processing with MapReduce, http://lintool.github.io/MapReduceAlgorithms/MapReduce-book-final.pdf
  5. Tamara Munzner: Visualization Analysis and Design (2014 Draft) http://www.cs.ubc.ca/~tmm/courses/533/book/vispmp-draft.pdf
  6. Scott Murray: Interactive Data Visualization for the Web: An Introduction to Designing with D3, O'Reilly Media

Univerza v Ljubljani, Fakulteta za elektrotehniko, Tržaška cesta 25, 1000 Ljubljana

E:  dekanat@fe.uni-lj.si T:  01 4768 411