Financial concepts extraction and lexical simplification in Spanish

Authors

  • Blanca Carbajo Coronado Autonomous University of Madrid
  • Antonio Moreno Sandoval Autonomous University of Madrid

DOI:

https://doi.org/10.58859/rael.v23i1.590

Keywords:

financial language, automatic simplification, linguistic resource, Spanish, specialised lexicon

Abstract

This paper addresses concept extraction and lexical simplification in Spanish financial texts. In our approach, concept extraction means identifying relevant terms and phrases with AI language models, while lexical simplification aims to make complex financial concepts more accessible. Terms were annotated in the FinT-esp financial corpus, and the mT5 neural model was used for term extraction. Of the terms the model detected, 96% had not been manually annotated before, which points to its generative capability. For lexical simplification, the paper proposes three main strategies: paraphrasing, synonym substitution, and translation. All three are integrated into an interactive interface that addresses the issue of sentence length. The work contributes to financial concept detection and offers a method for simplifying financial language in Spanish.
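
The abstract reports the share of detected terms that were absent from the manual annotations but does not say how that share was computed. The sketch below shows one plausible way to compute it, assuming exact matching after lowercasing and whitespace normalisation. The function name `novel_term_share` and the toy term lists are hypothetical and are not taken from the paper or from FinT-esp.

```python
def novel_term_share(detected, annotated):
    """Return the fraction of detected terms that are absent from the annotations.

    Assumption: two terms match when they are equal after lowercasing and
    collapsing whitespace. The paper may have used a different criterion.
    """
    def normalise(term):
        return " ".join(term.lower().split())

    detected_set = {normalise(t) for t in detected}
    annotated_set = {normalise(t) for t in annotated}
    if not detected_set:
        return 0.0
    novel = detected_set - annotated_set
    return len(novel) / len(detected_set)


# Toy data invented for illustration only (not from the FinT-esp corpus).
annotated = ["tipo de interés", "flujo de caja"]
detected = ["Tipo de interés", "deuda neta", "ebitda", "fondo de comercio"]

share = novel_term_share(detected, annotated)
print(f"{share:.0%} of detected terms were not in the manual annotation")
```

With these toy lists, three of the four detected terms are new, so the share is 0.75. A stricter or looser matching rule (for example, lemmatised matching) would change the figure, so the matching criterion should be stated alongside any reported percentage.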

Author Biographies

Blanca Carbajo Coronado, Autonomous University of Madrid

Blanca Carbajo Coronado holds a BA in Translation and Interpreting and is currently a PhD student at the Universidad Autónoma de Madrid with a scholarship (FPU) awarded by the Spanish Ministry of Science, Innovation and Universities. Her thesis deals with cause-effect relations in financial narratives using computational linguistic methods.

Antonio Moreno Sandoval, Autonomous University of Madrid

Antonio Moreno-Sandoval is Professor of Linguistics, Director of the Computational Linguistics Laboratory at the UAM and Director of the UAM-IIC Chair in Computational Linguistics. Since 2010 he has been a Senior Researcher at the Institute of Knowledge Engineering (IIC-UAM) within the Social Business Analytics group.

References

Alarcón, R., Moreno, L., & Martínez, P. (2023). EASIER corpus: A lexical simplification resource for people with cognitive impairments. PLoS ONE, 18(4). doi: https://doi.org/10.1371/journal.pone.0283622

García Asensio, M. A., & Montolío, E. (2018). Cuestiones del léxico. In E. Montolío (Dir.), Manual de escritura académica y profesional: Estrategias gramaticales y discursivas (pp. 175-220). Barcelona: Ariel Letras.

Gisbert, A. (2021). Financial Narratives. In A. Moreno-Sandoval (Ed.), Financial Narrative Processing in Spanish (pp. 15-50). Valencia: Tirant.

Lang, C., Wachowiak, L., Heinisch, B., & Gromann, D. (2021). Transforming Term Extraction: Transformer-Based Approaches to Multilingual Term Extraction Across Domains. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 3607-3620). Online: Association for Computational Linguistics. doi: https://doi.org/10.18653/v1/2021.findings-acl.316

Mateo Martínez, J. (2007). El lenguaje de las ciencias económicas. In E. Alcaraz, J. Mateo, & F. Yus (Eds.), Las lenguas profesionales y académicas (pp. 191-203). Barcelona: Ariel.

Rigouts Terryn, A., Hoste, V., Drouin, P., & Lefever, E. (2020). TermEval 2020: Shared Task on Automatic Term Extraction Using the Annotated Corpora for Term Extraction Research (ACTER) Dataset. In Proceedings of the 6th International Workshop on Computational Terminology (pp. 85-94). Marseille, France: European Language Resources Association.

Rigouts Terryn, A., Hoste, V., & Lefever, E. (2022). A supervised sequential labelling approach to automatic term extraction. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication, 28(1), 157-189.

Román Mínguez, V. (2016). Conocimiento temático y terminológico en traducción contable (inglés-español). Linguae Revista de la Sociedad Española de Lenguas Modernas, 3, 227-250.

Saggion, H. (2017). Automatic Text Simplification. In G. Hirst (Ed.), Synthesis Lectures on Human Language Technologies (Vol. 37). Morgan & Claypool Publishers.

Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., & Raffel, C. (2021). mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 483-498). Online: Association for Computational Linguistics. doi: https://doi.org/10.48550/arXiv.2010.11934

Published

2024-01-31

Section

New Articles