Web information extraction and retrieval

Basic information

Course lecturer:

Course type: professional elective course

ECTS credits: 6

Semester: 2nd semester

Course code: 63551

Course description

Content of the course:

This course will cover the following topics:

 

  • Information Retrieval and Web Search

      • Basic Concepts of Information Retrieval

      • Information Retrieval Models

      • Relevance Feedback

      • Evaluation Measures

      • Text and Web Page Pre-Processing

      • Inverted Index and Its Compression

      • Latent Semantic Indexing

      • Web Search

      • Meta-Search: Combining Multiple Rankings

 

  • Web Crawling

      • A Basic Crawler Algorithm

      • Implementation Issues

      • Universal Crawlers

      • Focused Crawlers

      • Topical Crawlers

 

  • Structured Data Extraction

      • Wrapper Induction

      • Instance-Based Wrapper Learning

      • Automatic Wrapper Generation

      • String Matching and Tree Matching

      • Multiple Alignment

      • Building DOM Trees

      • Extraction Based on a Single List Page or Multiple Pages

 

  • Information Integration

      • Schema-Level Matching

      • Domain and Instance-Level Matching

      • Combining Similarities

      • 1:m Match

      • Integration of Web Query Interfaces

      • Constructing a Unified Global Query Interface

     

  • Opinion Mining and Sentiment Analysis

      • Document Sentiment Classification

      • Sentence Subjectivity and Sentiment Classification

      • Opinion Lexicon Expansion

      • Aspect-Based Opinion Mining

      • Opinion Search and Retrieval
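Several of the listed topics are algorithmic. As one illustration of "Inverted Index and Its Compression", here is a minimal sketch, assuming a toy in-memory collection: it builds an inverted index (term to sorted document IDs), then compresses each postings list by storing the gaps between consecutive document IDs and encoding them with a variable-byte code. The tokenizer, the function names and the byte convention (7 data bits per byte, high bit set on the last byte of each number) are illustrative assumptions, not material taken from the course.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Illustrative pre-processing (assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: list[str]) -> dict[str, list[int]]:
    # Map each term to the sorted list of IDs of the documents that contain it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def vbyte_encode(numbers: list[int]) -> bytes:
    # Variable-byte code: 7 data bits per byte, high bit marks the final byte.
    out = bytearray()
    for n in numbers:
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk.reverse()  # most significant 7-bit group first
        chunk[-1] |= 0x80  # flag the last byte of this number
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data: bytes) -> list[int]:
    # Accumulate 7-bit groups until a byte with the high bit set ends a number.
    numbers, n = [], 0
    for byte in data:
        if byte & 0x80:
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers


def compress_postings(postings: list[int]) -> bytes:
    # Store the first ID, then the gaps between consecutive (sorted) IDs.
    if not postings:
        return b""
    gaps = [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]
    return vbyte_encode(gaps)


def decompress_postings(data: bytes) -> list[int]:
    # Prefix-sum the decoded gaps to recover the original document IDs.
    postings, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        postings.append(total)
    return postings
```

For example, with docs = ["Web crawling and web search", "Inverted index compression", "Web data extraction"], build_index(docs)["web"] is [0, 2]; its gaps [0, 2] encode to the two bytes 0x80 0x82. Gap encoding helps because the postings lists of frequent terms consist mostly of small gaps, and any gap below 128 fits in a single byte.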

 

Objectives

The main objective of this course is to teach students how to develop programs for web search (including surface web and deep web search) and for the extraction of structured data from both static and dynamic web pages. Besides the basic concepts of web search and retrieval, students will learn the relevant techniques and approaches. Students who complete the course successfully will be able to develop programs for automatic web search and for structured data extraction from web pages (including search and extraction from online social media).

Teaching and learning methods

Lectures, seminars, homework assignments, oral presentations, project work.


University of Ljubljana, Faculty of Electrical Engineering, Tržaška cesta 25, 1000 Ljubljana

E:  dekanat@fe.uni-lj.si T:  01 4768 411