Financial concepts extraction and lexical simplification in Spanish
DOI: https://doi.org/10.58859/rael.v23i1.590
Keywords: financial language, automatic simplification, linguistic resource, Spanish, specialised lexicon
Abstract
This paper examines concept extraction and lexical simplification in the Spanish financial domain. In our approach, concept extraction consists of identifying relevant terms and phrases with AI language models, while lexical simplification aims to make complex financial concepts more accessible. For this study, terms were annotated in the FinT-esp financial corpus, and the mT5 neural model was used to extract terms. The model performed strongly: 96% of the terms it detected had not previously been annotated manually, which highlights its generative capability. For lexical simplification, the paper proposes three main strategies (paraphrasing, synonym substitution, and translation), all integrated into an interactive interface that also addresses the issue of sentence length. This research contributes to financial concept detection and offers an effective method for simplifying financial language in Spanish.
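Of the three simplification strategies, synonym substitution is the simplest to illustrate. The sketch below is not the authors' implementation, which the abstract does not describe; it assumes a small hand-written glossary (`GLOSSARY`, hypothetical and unrelated to the paper's resources) and a hypothetical `simplify` function that swaps each complex term for a plainer Spanish alternative, trying multi-word terms before the single words they contain.

```python
import re

# Hypothetical glossary, for illustration only: each complex financial term
# maps to a plainer Spanish alternative. Keys must be lowercase.
GLOSSARY = {
    "capitalización bursátil": "valor en bolsa",
    "dividendo a cuenta": "pago adelantado de beneficios",
    "dividendo": "reparto de beneficios",
    "apalancamiento": "endeudamiento",
    "liquidez": "dinero disponible",
}


def simplify(sentence: str, glossary: dict[str, str]) -> str:
    """Replace every glossary term found in `sentence` with its simpler synonym."""
    # Try longer terms first: Python's regex alternation takes the first
    # alternative that matches, so "dividendo a cuenta" must precede "dividendo".
    terms = sorted(glossary, key=len, reverse=True)
    # \b restricts matches to whole words; IGNORECASE also catches "Liquidez".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )

    def replace(match: re.Match) -> str:
        found = match.group(0)
        simple = glossary[found.lower()]
        # Preserve an initial capital (e.g. at the start of a sentence).
        if found[0].isupper():
            simple = simple[0].upper() + simple[1:]
        return simple

    return pattern.sub(replace, sentence)


print(simplify("La empresa anunció un dividendo a cuenta y redujo su apalancamiento.", GLOSSARY))
# → La empresa anunció un pago adelantado de beneficios y redujo su endeudamiento.
```

Because this sketch matches exact word forms, inflected variants such as "dividendos" are left untouched, and it cannot distinguish the senses of a polysemous term; a fuller system would need lemmatisation and context. Paraphrasing and translation are not covered here.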
License
Copyright (c) 2024 Blanca Carbajo Coronado, Antonio Moreno Sandoval
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Attribution-NonCommercial (CC BY-NC): under this license, users may copy, distribute, and publicly display the work, and may create derivative works, provided that these acknowledge the authorship of the original work and are not used commercially.
Authors retain the copyright and full publishing rights without restrictions.