Do GPT-3.5 and GPT-4 Have a Writing Style Different from Human Style? An Exploratory Study for Spanish

Authors

  • Lara Alonso Simón, Universidad Complutense de Madrid
  • Ana María Fernández-Pampillón Cesteros, Universidad Complutense de Madrid
  • Marianela Fernández Trinidad, Universidad Complutense de Madrid
  • Manuel Márquez Cruz, Universidad Complutense de Madrid

DOI:

https://doi.org/10.58859/rael.v23i1.666

Keywords:

writing style, large language models, GPT-3.5, GPT-4, corpus linguistics

Abstract

This study uses statistical techniques to verify that the generative language models behind ChatGPT, GPT-3.5 (free version) and GPT-4 (paid version), have a writing style of their own, distinct from that of humans, and that they can be distinguished by at least three types of features: lexical features, punctuation marks, and syntactic sentence structure. Determining whether large language models have a style of their own is relevant for detecting automatically authored texts. In previous work, a comparable corpus of human and machine-generated texts in Spanish was built and, through a qualitative study, a set of linguistic and stylistic features specific to each author was identified. The present work demonstrates quantitatively that the 17 identified lexical and punctuation variables show statistically significant differences between human authors and the GPT-3.5 and GPT-4 models.
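As an illustration of the kind of feature-based statistical comparison the abstract describes, the sketch below extracts a few simple lexical and punctuation variables from each text and tests whether human and model groups differ significantly. The specific features (type-token ratio, comma rate, mean word length) and the choice of the Mann-Whitney U test are assumptions for illustration, not the variables or tests used in the paper.

```python
# Hedged sketch: per-text lexical/punctuation features plus a significance
# test across two author groups. Features and test are illustrative only.
import re
from scipy.stats import mannwhitneyu

def features(text):
    """Compute a few simple stylistic variables for one text."""
    tokens = re.findall(r"\w+", text.lower())
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "commas_per_100_tokens": 100 * text.count(",") / len(tokens),
        "mean_word_length": sum(map(len, tokens)) / len(tokens),
    }

def compare(human_texts, model_texts, feature):
    """Two-sided Mann-Whitney U test on one feature across the two groups."""
    h = [features(t)[feature] for t in human_texts]
    m = [features(t)[feature] for t in model_texts]
    _, p = mannwhitneyu(h, m)
    return p  # p < 0.05 would suggest a significant stylistic difference

# Toy corpora (invented examples, not the paper's corpus)
human = ["El corpus humano tiene frases variadas, con incisos, y pausas."] * 5
model = ["El texto generado repite estructuras similares sin pausas"] * 5
print(compare(human, model, "commas_per_100_tokens"))
```

In a real study, each group would contain many independent texts and the test would be repeated per variable, with a correction for multiple comparisons.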

References

Alonso Simón, L., Gonzalo Gimeno, J. A., Fernández-Pampillón Cesteros, A. M.ª, Fernández Trinidad, M., & Escandell Vidal, M.ª V. (2023). Using Linguistic Knowledge for Automated Text Identification. In M. Montes y Gómez et al. (Eds.), Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2023). Jaén, Spain, September 26. https://ceur-ws.org/Vol-3496/autextification-paper17.pdf

Berber Sardinha, T. (2024). AI-generated vs human-authored texts: A multidimensional comparison. Applied Corpus Linguistics, 4(1). https://doi.org/10.1016/j.acorp.2023.100083

Cañete, J., Chaperon, G., Fuentes, R., Ho, J.-H., Kang, H., & Pérez, J. (2020). Spanish pretrained BERT model and evaluation data. arXiv:2308.02976v1. https://doi.org/10.48550/arXiv.2308.02976

Cardenuto, J. P., Yang, J., Padilha, R., Wan, R., Moreira, D., Li, H., Wang, S., Andaló, F., Marcel, S., & Rocha, A. (2023). The Age of Synthetic Realities: Challenges and Opportunities. APSIPA Transactions on Signal and Information Processing, 12(1), 1–62. https://doi.org/10.1561/116.00000138

Casal, J. E., & Kessler, M. (2023). Can linguists distinguish between ChatGPT/AI and human writing?: A study of research ethics and academic publishing. Research Methods in Applied Linguistics, 2(3). https://doi.org/10.1016/j.rmal.2023.100068

Corizzo, R., & Leal-Arenas, S. (2023). A Deep Fusion Model for Human vs. Machine-Generated Essay Classification. In D. Wang & T. Toyoizumi (Eds.), Proceedings of the International Joint Conference on Neural Networks (IJCNN). Gold Coast, Australia, June 18–23. https://doi.org/10.1109/IJCNN54540.2023.10191322

Crothers, E. N., Japkowicz, N., & Viktor, H. L. (2023). Machine-Generated Text: A Comprehensive Survey of Threat Models and Detection Methods. arXiv:2210.07321. https://doi.org/10.1109/ACCESS.2023.3294090

Desaire, H., Chua, A. E., Isom, M., Jarosova, R., & Hua, D. (2023). Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools. Cell Reports Physical Science, 4(6). https://doi.org/10.1016/j.xcrp.2023.101426

Fernández Vítores, D. (2023). El español: una lengua viva. Informe 2023. In C. Pastor Villalba (Dir.), Instituto Cervantes (Coord.), El español en el mundo. Anuario del Instituto Cervantes 2023 (pp. 19–142). Madrid: Instituto Cervantes.

Fröhling, L., & Zubiaga, A. (2021). Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover. PeerJ Computer Science, 7, 1–23. https://doi.org/10.7717/peerj-cs.443

Guo, B., Zhang, X., Wang, Z., Jiang, M., Nie, J., Ding, Y., Yue, J., & Wu, Y. (2023). How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection. arXiv:2301.07597v1. https://doi.org/10.48550/arXiv.2301.07597

Hadi, M. U., Al-Tashi, O., Qureshi, R., Shah, A., Muneer, A., Irfan, M., Zafar, A., Shaikh, M., Akhtar, N., Wu, J., & Mirjalili, S. (2023). Large Language Models: A Comprehensive Survey of its Applications, Challenges, Limitations, and Future Prospects. TechRxiv. https://doi.org/10.36227/techrxiv.23589741.v4

He, Z., Mao, R., & Liu, Y. (2024). Predictive model on detecting ChatGPT responses against human responses. Applied and Computational Engineering, 44(1), 18–25. https://doi.org/10.54254/2755-2721/44/20230078

Jawahar, G., Abdul-Mageed, M., & Lakshmanan, L. V. S. (2020). Automatic Detection of Machine Generated Text: A Critical Survey. In D. Scott, N. Bel, & C. Zong (Eds.), Proceedings of the 28th International Conference on Computational Linguistics (pp. 2296–2309). Barcelona: International Committee on Computational Linguistics. arXiv:2011.01314. https://doi.org/10.48550/arXiv.2011.01314

Jurafsky, D., & Martin, J. H. (2024). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (3rd ed. draft). Stanford University. Retrieved from https://web.stanford.edu/~jurafsky/slp3/

[...]

Published

2025-01-31

Issue

Section

Artículos Nuevos (New Articles)