150413 Msc2

MSc defense Marcel Heinz


  • Date: 13 April 2015 (Monday)
  • Time: 15.00
  • Room: B 013
  • Presenter: Marcel Heinz
  • Supervisor: Ralf Lämmel
  • 2nd reviewer: Martin Leinberger


Revising Wikipedia's computer language domain based on bad smells


An ontology is a collection of pieces of information that represents either general knowledge or a specific domain. There are various ways to create an ontology. Besides manual construction, information extraction programs retrieve information from available sources according to a specified set of criteria.

Wikipedia is currently the biggest available encyclopedia online. Therefore it is the target of numerous information extraction approaches. The first part of this thesis is concerned with information extraction from Wikipedia. It shows what may be extracted from pages and how it can be mapped to an underlying model.

The second part is concerned with the analysis of an extracted ontology. Common techniques from the field of software engineering are used to analyze an ontology that was extracted from Wikipedia. A bad smell approach with a preceding metric analysis is proposed. The bad smells are inspired by related work from various areas such as semantic wikis, source code refactoring and general ontologies.

Based on the results from the second part, the third part contains a proposal on how to improve the extracted ontology. The proposal is inspired by related work on general ontologies and source code refactoring. For ontologies, one may distinguish between three primary revising activities, namely refine, prune and refactor. In this thesis, a three-stage pruning algorithm is adapted from an existing approach designed for general ontologies. Its goal is to remove irrelevant or flawed information from the extracted ontology. Finally, a small set of refactorings is described that are needed to improve the structure of the extracted ontology. The refinement activity is out of scope for this thesis. Exemplary approaches to refinement are summarized in the context of related work.