A linguistic approach to semantic extraction from text
Keywords:
Ontology learning, text understanding, machine learning, linguistics-based approach.
Abstract. Ontology learning from text is the process of distilling knowledge, both implicit and explicit, from text. Machines acquire knowledge either through human intervention or through an automatic approach that requires no human involvement, i.e. unsupervised ontology learning, which relies on unsupervised, automatic understanding of text. Text understanding relies on either Machine Learning or a linguistics-based approach. Both approaches require a semantic representation of the text. This paper describes the context of ontology learning, emphasizing the extraction of semantic content. We review the possible approaches and propose a heuristics-based linguistic model for the automatic extraction of semantic content. The model draws on the structure of the English sentence and on corpus-based facts showing that sentence length is bounded. This leads to the conclusion that finite state automata can be used to heuristically detect clause boundaries within sentences. We show a clause-semantics retrieval example that could not be solved using other methods currently available. The semantics of the whole sentence can be obtained by combining the semantics of each individual constituent clause, based on the sentence structure found. A further paper will present the complete automaton for clause boundary detection, together with detailed results and a comparison to other available approaches.
License
Attribution - Non-commercial (CC BY-NC). Under this license the user can copy, distribute and publicly display the work and can create derivative works as long as these new creations acknowledge the authorship of the original work and are not used commercially.
Authors retain the copyright and full publishing rights without restrictions.