A linguistic approach to semantic extraction from text


  • Abel Browarnik
  • Oded Maimon

Keywords:

Ontology learning, text understanding, machine learning, linguistics-based approach.


Abstract. Ontology learning from text is the process of distilling knowledge, both implicit and explicit. Machines acquire knowledge either through human intervention or automatically, without human involvement, i.e. through unsupervised ontology learning based on unsupervised, automatic text understanding. Text understanding relies on either machine learning or a linguistics-based approach. Both approaches require that a semantic representation of the text be obtained. This paper describes the context of ontology learning, emphasizing the extraction of semantic content. We review the possible approaches and propose a heuristics-based linguistic model for the automatic extraction of semantic content. The model examines the structure of the English sentence and corpus-based facts showing that sentence length is bounded. This leads to the conclusion that finite state automata can be used to heuristically detect clause boundaries within sentences. We show a clause-semantics retrieval example that could not be solved using other currently available methods. The semantics of the whole sentence can be obtained by combining the semantics of each constituent clause, based on the sentence structure found. A further paper will present the complete automaton for clause boundary detection, together with detailed results and a comparison to other available approaches.

Author biographies

Abel Browarnik

Department of Industrial Engineering, Tel Aviv University.

Oded Maimon

Department of Industrial Engineering, Tel Aviv University.