Information Extraction and Computational Linguistics: A Case for Probabilistic Datalog
Supervisor: Dr Thomas Roelleke
Research group(s): Risk & Information Management
Probabilistic Datalog (PDatalog) is a rule-based programming paradigm that provides a high-level data abstraction. It can be applied to information management tasks such as classification, summarisation, semantic (knowledge-based) retrieval, prediction and recommendation. This project aims to explore how methods and algorithms from information extraction and computational linguistics can be modelled in PDatalog. The syntax and meaning of language can be captured in rules (ontologies), and the semantics of a text can be modelled as a set of facts and rules. The purpose of the project is to investigate how probabilistic reasoning can be applied to extract information and to reason about language, which raises numerous challenges. The main hypothesis is that many knowledge engineers (data analysts) can benefit from a high-level abstraction for modelling the methods used in information extraction and computational linguistics.
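To make the idea of "a text as a set of probabilistic facts and rules" concrete, here is a minimal Python sketch. It is not PDatalog syntax, and it is not a faithful PDatalog engine. The predicates (`capitalised`, `after_title`, `in_gazetteer`, `person`), the probabilities, and the helper `derive` are all hypothetical. The sketch also assumes that all facts are independent. Under that assumption, a conjunction of body facts multiplies their probabilities, and alternative derivations of the same head are combined with noisy-or, 1 - ∏(1 - p). The combination is exact only when the derivations do not share base facts. A full probabilistic engine would have to track which facts each derivation depends on.

```python
from collections import defaultdict

# Hypothetical extracted facts: (predicate, argument) -> probability.
# The probabilities stand in for the confidences of an information extractor.
facts = {
    ("capitalised", "Smith"): 0.9,
    ("after_title", "Smith"): 0.8,   # e.g. the token follows "Dr"
    ("capitalised", "London"): 0.9,
    ("in_gazetteer", "Smith"): 0.3,
}

# Hypothetical rules: head predicate <- conjunction of body predicates,
# all over the same single variable X (a deliberate simplification).
rules = [
    ("person", ["capitalised", "after_title"]),
    ("person", ["in_gazetteer"]),
]


def derive(facts, rules):
    """One round of rule application, assuming independent facts.

    A conjunctive body multiplies the probabilities of its facts; several
    derivations of the same head fact are combined with noisy-or.
    """
    constants = {arg for (_, arg) in facts}
    miss = defaultdict(lambda: 1.0)  # running product of (1 - p) per head fact
    for head, body in rules:
        for x in constants:
            p = 1.0
            for pred in body:
                p *= facts.get((pred, x), 0.0)  # missing fact => probability 0
            if p > 0.0:
                miss[(head, x)] *= 1.0 - p
    return {key: 1.0 - m for key, m in miss.items()}


if __name__ == "__main__":
    print(derive(facts, rules))
```

Here "Smith" is derived as a `person` with probability 1 - (1 - 0.72)(1 - 0.3) = 0.804. "London" is not derived, because neither rule body is fully satisfied for it. In this sketch the two rules happen to have disjoint bodies, so the noisy-or combination is exact.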