Abstract:
|
One important issue when constructing Information
Extraction (IE) systems is how to obtain the knowledge needed for
identifying relevant information in a document. In most approaches to
this issue, human expert intervention is necessary at many steps of
the acquisition process.
In this paper we describe Essence, a new methodology that
significantly reduces the
need for human intervention. It is based on ELA, a new algorithm for
acquiring information extraction patterns.
The distinctive features of Essence and ELA are that 1) they
automatically acquire IE patterns from an unrestricted text corpus
representative of the domain, 2) they do so by identifying
regularities in the surrounding context of concept words that are
semantically relevant to the IE task, using non-domain-specific
lexical knowledge tools and semantic relations from WordNet, and
3) they restrict human intervention to the definition of the task
and the validation and typification of the set of IE patterns obtained.
The use of a general-purpose ontology and general-purpose syntactic
tools makes the methodology easy to port and
reduces expert effort. Results of applying this
methodology to acquire extraction
patterns in a MUC-like task are also shown. |