Compiling a Corpus of User-Generated Content Units for the Detection of Social Problems

Authors

DOI:

https://doi.org/10.58859/rael.v23i1.668

Keywords:

subcorpus, user-generated content, elderly institutional abuse, ALLEGRO, DIAPASON

Abstract

Social media platforms such as Facebook, X, and Instagram provide valuable information about everyday and global social problems through their user-generated content units. These platforms turn their users into social sensors capable of identifying problems such as violence against women or natural hazards. Within the field of crowdsensing, which aims to extract useful information from these social sensors, we present a proposal for identifying problems in ALLEGRO (http://allegro.ucam.edu/). Building on previous studies of this smart multimodal system, our research first explains the conceptual framework for addressing one such problem, namely elderly institutional abuse, within DIAPASON, the text analysis module of ALLEGRO. We also detail the methodology and challenges of compiling a subcorpus of tweets related to this problem. This subcorpus will contribute to ALLEGRO's comprehensive corpus of social problems, which is being built as a training set for deep learning models in text classification.

Author Biography

Rocío Jiménez-Briones, Universidad Autónoma de Madrid

Senior Lecturer in English language and linguistics at the Universidad Autónoma de Madrid (Spain)

References

Alameda Hernández, Á. (2024). Social media detection of texts on the social problem of violence against women within the multimodal intelligent system ALLEGRO. In R. Jiménez-Briones & A. Corral Esteban (Eds.), Approaches to Knowledge Representation and Language (pp. 63-75). Granada: Comares.

Alayiaboozar, E., & Hojjatpanah, A. A. (2022). Steps for creating two Persian specialized corpora. International Journal of Information Science and Management (IJISM), 20(4), 231-243.

An, J., & Weber, I. (2015). Whom should we sense in “social sensing” - analysing which users work best for social media now-casting. EPJ Data Science, 4(22), 1-22. https://doi.org/10.1140/epjds/s13688-015-0058-9

Appio, F. P., Lima, M., & Paroutis, S. (2019). Understanding Smart Cities: Innovation ecosystems, technological advancements, and societal challenges. Technological Forecasting & Social Change, 142, 1-14. https://doi.org/10.1016/j.techfore.2018.12.018

Arthur, R., Boulton, C. A., Shotton, H., & Williams, H. T. P. (2018). Social sensing of floods in the UK. PLoS ONE, 13(1), e0189327. https://doi.org/10.1371/journal.pone.0189327

Bates, C. G., & Ciment, J. (Eds.) (2013). Global Social Issues: An Encyclopedia. New York: Sharpe Reference.

Bhatia, V., Sánchez Hernández, P., & Pérez Paredes, P. (2011). Specialized languages: Corpora, meta-analyses and applications. Researching Specialized Languages, 47(1), 1-8. https://doi.org/10.1075/scl.47.02bha

Bazzaz, A. S., Haghi, K. M., Mahdipour, E., & Jameii, S. M. (2021). Big data analytics meets social media: A systematic review of techniques, open issues, and future directions. Telematics and Informatics, 57, 101517. https://doi.org/10.1016/j.tele.2020.101517

Best, J. (1995). Typification and social problems construction. In J. Best (Ed.), Images of Issues: Typifying Contemporary Social Problems (pp. 1-10). London: Routledge.

Best, J. (2017). Social Problems (3rd ed.). New York City: W.W. Norton & Company.

Bowker, L., & Pearson, J. (2002). Working with Specialized Language: A Practical Guide to Using Corpora. London and New York: Routledge. https://doi.org/10.4324/9780203469255

Eitzen, S., Baca Zinn, M., & Eitzen Smith, K. (2014). Social Problems (13th ed.). Boston: Pearson.

Espinoza-Arias, P., Poveda-Villalón, M., García-Castro, R., & Corcho, O. (2019). Ontological representation of smart city data: From devices to cities. Applied Sciences, 9(1), 1-23. https://doi.org/10.3390/app9010032

Felices Lago, Á. (2024). Description of social problems by means of schemas related to the Income domain in the DIAPASON platform. In R. Jiménez-Briones & A. Corral Esteban (Eds.), Approaches to Knowledge Representation and Language (pp. 27-43). Granada: Comares.

Felices Lago, Á. (2025). Towards the characterization of WORK problem schemas in the DIAPASON ontology. Sintagma, 37.

Fernández-Martínez, N. J. (2024). Exploring the creation of synthetic corpora of negative communicative functions for the task of communicative function identification. In R. Jiménez-Briones & A. Corral Esteban (Eds.), Approaches to Knowledge Representation and Language (pp. 77-93). Granada: Comares.

Gething, L. (1994). Health professional attitudes towards ageing and older people: Preliminary report of the reactions to Ageing Questionnaire. Australian Journal on Ageing, 13(2), 77-81. https://doi.org/10.1111/j.1741-6612.1994.tb00646.x

Gething, L., Fethney, J., McKee, K., Goff, M., Churchward, M., & Matthews, S. (2002). Knowledge, stereotyping and attitudes towards self-ageing. Australasian Journal on Ageing, 21(2), 74-79. https://doi.org/10.1111/j.1741-6612.2002.tb00421.x

Ghani, N. A., Hamid, S., Targio Hashem, I. A., & Ahmed, E. (2019). Social media big data analytics: A survey. Computers in Human Behavior, 101, 417-428. https://doi.org/10.1016/j.chb.2018.08.039

Giffinger, R., Fertner, C., Kramar, H., Kalasek, R., Pichler-Milanovic, N., & Meijers, E. (2007). Smart cities - Ranking of European medium-sized cities. Retrieved from http://www.smart-cities.eu/download/smart_cities_final_report.pdf.

Govada, S. S., Spruijt, W., & Rodgers, T. (2017). Smart city concept and framework. In T. M. V. Kumar (Ed.), Smart Economy in Smart Cities. Advances in 21st Century Human Settlements (pp. 187-198). New York City: Springer. https://doi.org/10.1007/978-981-10-1610-3_7

Holtgraves, T. (1994). Communication in context: Effects of speaker status on the comprehension of indirect requests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(5), 1205-1218. https://doi.org/10.1037//0278-7393.20.5.1205

Jiménez-Briones, R., & Felices Lago, Á. (2024). La formalización del conocimiento en DIAPASON a través de una muestra de problemas poblacionales y macroeconómicos. In F. Olmo-Cazevieille (Ed.), Investigación Lingüística en Entornos Digitales (pp. 187-216). Valencia: Tirant Lo Blanch.

Klaus, D., Engstler, H., Mahne, K., Wolff, J. K., Simonson, J., Wurm, S., & Tesch-Römer, C. (2017). Cohort Profile: The German Ageing Survey (DEAS). International Journal of Epidemiology, 46(4), 1105-1105. https://doi.org/10.1093/ije/dyw326

Laidlaw, K., Power, M. J., Schmidt, S., & the WHOQOL-OLD Group (2007). The attitudes to ageing questionnaire (AAQ): Development and psychometric properties. International Journal of Geriatric Psychiatry, 22, 367-379. https://doi.org/10.1002/gps.1683

Lee, H. J., Lee, D. K., & Song, W. (2019). Relationships between social capital, social capital satisfaction, self-esteem, and depression among elderly urban residents: Analysis of secondary survey data. International Journal of Environmental Research and Public Health, 16, 1445, 2-13. https://doi.org/10.3390/ijerph16081445

Li, W., Wu, W., Wang, H., Cheng, X., Chen, H., Zhou, Z., & Ding, R. (2017). Crowd intelligence in AI 2.0 era. Frontiers of Information Technology & Electronic Engineering, 18, 15-43. https://doi.org/10.1016/j.intell.2017.04.004

Marginean, I. (2014). Quality of Life Diagnosis (QoLD). In A. C. Michalos (Ed.), Encyclopedia of Quality of Life and Well-Being Research (pp. 5333-5339). Dordrecht: Springer. https://doi.org/10.1007/978-94-007-0753-5_2358

Musto, C., Semeraro, G., Lops, P., & De Gemmis, M. (2015). CrowdPulse: A framework for real-time semantic analysis of social streams. Information Systems, 54, 127-146. https://doi.org/10.1016/j.is.2015.06.007

OECD (Organisation for Economic Co-operation and Development) (n.d.). OECD iLibrary. Retrieved from https://www.oecd-ilibrary.org/.

OECD (2020). How’s Life? 2020: Measuring Well-Being. Paris: OECD Publishing. Retrieved from https://www.oecd-ilibrary.org/economics/how-s-life/volume-/issue-_9870c393-en. https://doi.org/10.1787/9870c393-en

OECD (2022). “Population” (indicator). https://doi.org/10.1787/d434f82b-en

Parrillo, V. N. (Ed.) (2008). Encyclopedia of Social Problems. Los Angeles: Sage. https://doi.org/10.4135/9781412963930

Pérez, L. (2013). Illocutionary constructions: (Multiple source)-in-target metonymies, illocutionary ICMs, and specification link. Language & Communication, 33(2), 128-149. https://doi.org/10.1016/j.langcom.2013.02.001

Periñán-Pascual, C. (2023a). Exploring user-generated content to detect community problems: The ontological model of ALLEGRO. In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2023) - V. 2 (pp. 224-230). KEOD. https://doi.org/10.5220/0012203300003598

Periñán-Pascual, C. (2023b). From Smart City to Smart Society: A quality-of-life ontological model for problem detection from user-generated content. Applied Ontology, 18(3), 263-306. https://doi.org/10.3233/AO-230281

Periñán-Pascual, C. (2024a). Modelización de las quejas de los ciudadanos como artefactos digitales culturales: DIAPASON. In F. Olmo-Cazevieille (Ed.), Investigación Lingüística en Entornos Digitales (pp. 129-156). Valencia: Tirant Lo Blanch.

Periñán-Pascual, C. (2024b). Exploring problems through social media: The case of Beach Quality. In R. Jiménez-Briones & A. Corral Esteban (Eds.), Approaches to Knowledge Representation and Language (pp. 11-26). Granada: Comares.

Periñán-Pascual, C. (2024c). Minería de textos para investigadores lingüistas. Valencia: Tirant Lo Blanch.

Princeton University (2010). About WordNet. Retrieved from https://wordnet.princeton.edu/.

Rueda Estrada, J. D., & Martín Martín, F. J. (2011). El maltrato a personas mayores. Instrumentos para la detección del maltrato institucional. Alternativas, 18, 7-33. https://doi.org/10.14198/ALTERN2011.18.01

Sakaki, T., Okazaki, M., & Matsuo, Y. (2013). Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Transactions on Knowledge and Data Engineering, 25(4), 919-931. https://doi.org/10.1109/TKDE.2012.29

Searle, J. R. (1969). Speech Acts. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139173438

Seccombe, K., & Kornblum, W. (2020). Social Problems (16th ed.). Boston: Pearson.

Stefanowitsch, A. (2003). A construction-based approach to indirect speech acts. In K.-U. Panther & L. Thornburg (Eds.), Metonymy and Pragmatic Inferencing (pp. 105-126). Amsterdam/Philadelphia: John Benjamins. https://doi.org/10.1075/pbns.113.09ste

Trott, S., & Bergen, B. (2017). A theoretical model of indirect request comprehension. In Proceedings of the AAAI Fall Symposium Series on Artificial Intelligence for Human-Robot Interaction (pp. 129-132). Arlington, VA.

Ureña Gómez-Moreno, P. (2024). Integrating corpus methodology into the construction of an intelligent crowdsensing system. In R. Jiménez-Briones & A. Corral Esteban (Eds.), Approaches to Knowledge Representation and Language (pp. 45-62). Granada: Comares.

Walker, R. W. (2022). Elderly Financial Abuse in New Zealand: Is the Law Sufficient? (Master’s dissertation). University of Canterbury. Retrieved from https://ir.canterbury.ac.nz/items/5f894749-d5d1-4f47-a6c0-511e9bd299b7.

Wang, D., Szymanski, B. K., Abdelzaher, T., Ji, H., & Kaplan, L. (2019). The age of social sensing. IEEE Computer, 52(1), 36-45. https://doi.org/10.1109/MC.2018.2890173

Published

2025-01-31

Section

New Articles
