1
|
Stella M, Swanson TJ, Li Y, Hills TT, Teixeira AS. Cognitive networks detect structural patterns and emotional complexity in suicide notes. Front Psychol 2022; 13:917630. [PMID: 36570999 PMCID: PMC9773561 DOI: 10.3389/fpsyg.2022.917630] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/11/2022] [Accepted: 11/17/2022] [Indexed: 12/13/2022]
Abstract
Communicating one's mindset means transmitting complex relationships between concepts and emotions. Using network science and word co-occurrences, we reconstruct conceptual associations as communicated in 139 genuine suicide notes, i.e., notes left by individuals who took their lives. We find that, despite their negative context, suicide notes are surprisingly positively valenced. Through emotional profiling, their ending statements are found to be markedly more emotional than their main body: The ending sentences in suicide notes elicit deeper fear/sadness but also stronger joy/trust and anticipation than the main body. Furthermore, by using data from the Emotional Recall Task, we model emotional transitions within these notes as co-occurrence networks and compare their structure against emotional recalls from mentally healthy individuals. Supported by psychological literature, we introduce emotional complexity as an affective analog of structural balance theory, measuring how elementary cycles (closed triads) of emotion co-occurrences mix positive, negative and neutral states in narratives and recollections. At the group level, authors of suicide narratives display a higher complexity than healthy individuals, i.e., lower levels of coherently valenced emotional states in triads. An entropy measure identified a similar tendency for suicide notes to shift more frequently between contrasting emotional states. Both the groups of authors of suicide notes and healthy individuals exhibit less complexity than random expectation. Our results demonstrate that suicide notes possess highly structured and contrastive narratives of emotions, more complex than expected by null models and healthy populations.
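The triad-based notion of emotional complexity described above can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the paper, that consecutive emotions in a sequence are linked and that complexity is approximated by the share of closed triads whose three emotions do not all carry the same valence. The sequences, valence labels and function names are hypothetical.

```python
from itertools import combinations

def cooccurrence_edges(sequences):
    """Link each pair of distinct emotions that appear consecutively."""
    edges = set()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                edges.add(frozenset((a, b)))
    return edges

def closed_triads(edges):
    """Return every triple of nodes whose three pairwise links all exist."""
    nodes = sorted({n for e in edges for n in e})
    return [
        t for t in combinations(nodes, 3)
        if all(frozenset(p) in edges for p in combinations(t, 2))
    ]

def mixed_fraction(triads, valence):
    """Share of triads that mix at least two different valence labels."""
    if not triads:
        return 0.0
    mixed = sum(len({valence[n] for n in t}) > 1 for t in triads)
    return mixed / len(triads)

# Hypothetical toy data: two emotion sequences and their valence labels.
sequences = [
    ["joy", "trust", "hope", "joy"],
    ["sad", "fear", "joy", "sad"],
]
valence = {"joy": "+", "trust": "+", "hope": "+", "sad": "-", "fear": "-"}

triads = closed_triads(cooccurrence_edges(sequences))
complexity = mixed_fraction(triads, valence)
```

In this toy case one triad is coherently positive and the other mixes positive and negative states, so the proxy sits halfway between fully coherent and fully mixed.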
Affiliation(s)
- Massimo Stella
  - CogNosco Lab, Department of Computer Science, University of Exeter, Exeter, United Kingdom
  - Correspondence: Massimo Stella
- Trevor J. Swanson
  - Department of Psychology, University of Kansas, Lawrence, KS, United States
- Ying Li
  - Max Planck Institute for Human Development, Berlin, Germany
  - Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Thomas T. Hills
  - Department of Psychology, University of Warwick, Coventry, United Kingdom
- Andreia S. Teixeira
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
  - INESC-ID, Lisbon, Portugal
|
2
|
Network-based prediction of the disclosure of ideation about self-harm and suicide in online counseling sessions. COMMUNICATIONS MEDICINE 2022; 2:156. [PMID: 36474010 PMCID: PMC9723576 DOI: 10.1038/s43856-022-00222-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Accepted: 11/23/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND In psychological services, the transition to the disclosure of ideation about self-harm and suicide (ISS) is a critical point warranting attention. This study developed and tested a succinct descriptor to predict such transitions in an online synchronous text-based counseling service. METHOD We analyzed two years' worth of counseling sessions (N = 49,770) from Open Up, a 24/7 service in Hong Kong. Sessions from Year 1 (N = 20,618) were used to construct a word affinity network (WAN), which depicts the semantic relationships between words. Sessions from Year 2 (N = 29,152), including 1168 with explicit ISS, were used to train and test the downstream ISS prediction model. We divided and classified these sessions into ISS blocks (ISSBs), blocks prior to ISSBs (PISSBs), and non-ISS blocks (NISSBs). To detect PISSB, we adopted complex network approaches to examine the distance among different types of blocks in WAN. RESULTS Our analyses find that words within a block tend to form a module in WAN and that network-based distance between modules is a reliable indicator of PISSB. The proposed model yields a c-statistic of 0.79 in identifying PISSB. CONCLUSIONS This simple yet robust network-based model could accurately predict the transition point of suicidal ideation prior to its explicit disclosure. It can potentially improve the preparedness and efficiency of help-providers in text-based counseling services for mitigating self-harm and suicide.
|
3
|
Rajabi E, Etminani K. Knowledge-graph-based explainable AI: A systematic review. J Inf Sci 2022. [DOI: 10.1177/01655515221112844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In recent years, knowledge graphs (KGs) have been widely applied in various domains for different purposes. The semantic model of KGs can represent knowledge through a hierarchical structure based on classes of entities, their properties, and their relationships. The construction of large KGs can enable the integration of heterogeneous information sources and help Artificial Intelligence (AI) systems be more explainable and interpretable. This systematic review examines a selection of recent publications to understand how KGs are currently being used in eXplainable AI systems. To achieve this goal, we design a framework and divide the use of KGs into four categories: extracting features, extracting relationships, constructing KGs, and KG reasoning. We also identify where KGs are mostly used in eXplainable AI systems (pre-model, in-model, and post-model) according to the aforementioned categories. Based on our analysis, KGs have been mainly used in pre-model XAI for feature and relation extraction. They were also utilised for inference and reasoning in post-model XAI. We found several studies that leveraged KGs to explain the XAI models in the healthcare domain.
Affiliation(s)
- Enayat Rajabi
  - Shannon School of Business, Cape Breton University, Canada
- Kobra Etminani
  - Center for Applied Intelligent Systems Research (CAISR), Halmstad University, Sweden
|
4
|
Lara-Martínez P, Obregón-Quintana B, Reyes-Manzano CF, López-Rodríguez I, Guzmán-Vargas L. A multiplex analysis of phonological and orthographic networks. PLoS One 2022; 17:e0274617. [PMID: 36107963 PMCID: PMC9477335 DOI: 10.1371/journal.pone.0274617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 08/31/2022] [Indexed: 11/24/2022]
Abstract
The study of natural language using a network approach has made it possible to characterize novel properties ranging from the level of individual words to phrases or sentences. A natural way to quantitatively evaluate similarities and differences between spoken and written language is by means of a multiplex network defined in terms of a similarity distance between words. Here, we use a multiplex representation of words based on orthographic or phonological similarity to evaluate their structure. We report that from the analysis of topological properties of networks, there are different levels of local and global similarity when comparing written vs. spoken structure across 12 natural languages from 4 language families. In particular, it is found that the difference between the phonetic and written layers is markedly higher for French and English, while for the other languages analyzed, this separation is relatively smaller. We conclude that the multiplex approach allows us to explore additional properties of the interaction between spoken and written language.
Affiliation(s)
- Pablo Lara-Martínez
  - Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad de México, México
- C. F. Reyes-Manzano
  - Tecnológico Nacional de México, Tecnológico de Estudios Superiores de Ixtapaluca, Ixtapaluca, Estado de México, México
- Irene López-Rodríguez
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Ciudad de México, México
- Lev Guzmán-Vargas
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Ciudad de México, México
|
5
|
Emotional profiling and cognitive networks unravel how mainstream and alternative press framed AstraZeneca, Pfizer and COVID-19 vaccination campaigns. Sci Rep 2022; 12:14445. [PMID: 36002554 PMCID: PMC9400577 DOI: 10.1038/s41598-022-18472-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/25/2022] [Accepted: 08/12/2022] [Indexed: 11/10/2022]
Abstract
COVID-19 vaccines have been largely debated by the press. To understand how mainstream and alternative media debated vaccines, we introduce a paradigm reconstructing time-evolving narrative frames via cognitive networks and natural language processing. We study Italian news articles massively re-shared on Facebook/Twitter (up to 5 million times), covering 5745 vaccine-related news from 17 news outlets over 8 months. We find consistently high trust/anticipation and low disgust in the way mainstream sources framed “vaccine/vaccino”. These emotions were crucially missing in alternative outlets. News titles from alternative sources framed “AstraZeneca” with sadness, absent in mainstream titles. Initially, mainstream news linked mostly “Pfizer” with side effects (e.g. “allergy”, “reaction”, “fever”). With the temporary suspension of “AstraZeneca”, negative associations shifted: Mainstream titles prominently linked “AstraZeneca” with side effects, while “Pfizer” underwent a positive valence shift, linked to its higher efficacy. Simultaneously, thrombosis and fearful conceptual associations entered the frame of vaccines, while death changed context, i.e. rather than hopefully preventing deaths, vaccines could be reported as potential causes of death, increasing fear. Our findings expose crucial aspects of the emotional narratives around COVID-19 vaccines adopted by the press, highlighting the need to understand how alternative and mainstream media report vaccination news.
|
6
|
Shuang C, Tao R, Ying Q, Yang S. What we achieve on text extractive summarization based on graph? JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-220433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Dealing with the explosive growth of web sources on the Internet requires the use of efficient systems. Automatic text summarization is capable of addressing this issue. Recent years have seen remarkable success in the use of graph theory on text extractive summarization. However, the understanding of why and how they perform so well is still not clear. In this paper, we intend to seek a better understanding of graph models, which can benefit from graph extractive summarization. Additionally, analysis has been performed qualitatively with the graph models in the design of recent graph extractive summarization. Based on the knowledge acquired from the survey, our work could provide more clues for future research on extractive summarization.
Affiliation(s)
- Chen Shuang
  - Software College, Northeastern University, Hunnan District, Shenyang, China
- Ren Tao
  - Software College, Northeastern University, Hunnan District, Shenyang, China
- Qv Ying
  - Software College, Northeastern University, Hunnan District, Shenyang, China
- Shi Yang
  - Software College, Northeastern University, Hunnan District, Shenyang, China
|
7
|
Cognitive Networks Extract Insights on COVID-19 Vaccines from English and Italian Popular Tweets: Anticipation, Logistics, Conspiracy and Loss of Trust. BIG DATA AND COGNITIVE COMPUTING 2022. [DOI: 10.3390/bdcc6020052] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023]
Abstract
Monitoring social discourse about COVID-19 vaccines is key to understanding how large populations perceive vaccination campaigns. This work reconstructs how popular and trending posts framed semantically and emotionally COVID-19 vaccines on Twitter. We achieve this by merging natural language processing, cognitive network science and AI-based image analysis. We focus on 4765 unique popular tweets in English or Italian about COVID-19 vaccines between December 2020 and March 2021. One popular English tweet contained in our data set was liked around 495,000 times, highlighting how popular tweets could cognitively affect large parts of the population. We investigate both text and multimedia content in tweets and build a cognitive network of syntactic/semantic associations in messages, including emotional cues and pictures. This network representation indicates how online users linked ideas in social discourse and framed vaccines along specific semantic/emotional content. The English semantic frame of “vaccine” was highly polarised between trust/anticipation (towards the vaccine as a scientific asset saving lives) and anger/sadness (mentioning critical issues with dose administering). Semantic associations with “vaccine,” “hoax” and conspiratorial jargon indicated the persistence of conspiracy theories and vaccines in extremely popular English posts. Interestingly, these were absent in Italian messages. Popular tweets with images of people wearing face masks used language that lacked the trust and joy found in tweets showing people with no masks. This difference indicates a negative effect attributed to face-covering in social discourse. Behavioural analysis revealed a tendency for users to share content eliciting joy, sadness and disgust and to like sad messages less. Both patterns indicate an interplay between emotions and content diffusion beyond sentiment. After its suspension in mid-March 2021, “AstraZeneca” was associated with trustful language driven by experts. 
After the deaths of a small number of vaccinated people in mid-March, popular Italian tweets framed “vaccine” by crucially replacing earlier levels of trust with deep sadness. Our results stress how cognitive networks and innovative multimedia processing open new ways for reconstructing online perceptions about vaccines and trust.
|
8
|
Knowledge Source Rankings for Semi-Supervised Topic Modeling. INFORMATION 2022. [DOI: 10.3390/info13020057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Recent work suggests knowledge sources can be added into the topic modeling process to label topics and improve topic discovery. The knowledge sources typically consist of a collection of human-constructed articles, each describing a topic (article-topic) for an entire domain. However, these semisupervised topic models assume a corpus to contain topics on only a subset of a domain. Therefore, during inference, the model must consider which article-topics were theoretically used to generate the corpus. Since the knowledge sources tend to be quite large, the many article-topics considered slow down the inference process. The increase in execution time is significant, with knowledge source input greater than 10^3 becoming unfeasible for use in topic modeling. To increase the applicability of semisupervised topic models, approaches are needed to speed up the overall execution time. This paper presents a way of ranking knowledge source topics to satisfy the above goal. Our approach utilizes a knowledge source ranking, based on the PageRank algorithm, to determine the importance of an article-topic. By applying our ranking technique we can eliminate low scoring article-topics before inference, speeding up the overall process. Remarkably, this ranking technique can also improve perplexity and interpretability. Results show our approach to outperform baseline methods and significantly aid semisupervised topic models. In our evaluation, knowledge source rankings yield a 44% increase in topic retrieval f-score, a 42.6% increase in inter-inference topic elimination, a 64% increase in perplexity, a 30% increase in token assignment accuracy, a 20% increase in topic composition interpretability, and a 5% increase in document assignment interpretability over baseline methods.
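The idea of ranking article-topics with PageRank and dropping low scorers before inference can be sketched as follows. This is a plain power-iteration PageRank over a hypothetical link graph between articles. The abstract does not say how the graph is built, so the toy links, the damping value and the `keep_top` helper are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power-iteration PageRank; `links` maps a node to the nodes it points to."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new[t] += share
        rank = new
    return rank

def keep_top(rank, k):
    """Keep the k highest-ranked article-topics, discarding the rest."""
    return sorted(rank, key=rank.get, reverse=True)[:k]

# Hypothetical knowledge source: each article links to related articles.
links = {
    "physics": ["math"],
    "chemistry": ["math", "physics"],
    "math": ["physics"],
    "trivia": ["math"],
}
rank = pagerank(links)
kept = keep_top(rank, 2)
```

Only the retained article-topics would then be passed to the topic model, which is where the reduction in inference time would come from.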
|
9
|
Oliva SZ, Oliveira-Ciabati L, Dezembro DG, Júnior MSA, de Carvalho Silva M, Pessotti HC, Pollettini JT. Text structuring methods based on complex network: a systematic review. Scientometrics 2021. [DOI: 10.1007/s11192-020-03785-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
|
10
|
Lara-Martínez P, Obregón-Quintana B, Reyes-Manzano CF, López-Rodríguez I, Guzmán-Vargas L. Comparing phonological and orthographic networks: A multiplex analysis. PLoS One 2021; 16:e0245263. [PMID: 33524013 PMCID: PMC7850493 DOI: 10.1371/journal.pone.0245263] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/03/2020] [Accepted: 12/26/2020] [Indexed: 11/17/2022]
Abstract
The complexity of natural language can be explored by means of multiplex analyses at different scales, from single words to groups of words or sentence levels. Here, we plan to investigate a multiplex word-level network, which comprises an orthographic and a phonological network defined in terms of distance similarity. We systematically compare basic structural network properties to determine similarities and differences between them, as well as their combination in a multiplex configuration. As a natural extension of our work, we plan to evaluate the preservation of the structural network properties and information-based quantities from the following perspectives: (i) presence of similarities across 12 natural languages from 4 linguistic families (Romance, Germanic, Slavic and Uralic), (ii) increase of the size of the number of words (corpus) from 10^4 to 50 × 10^3, and (iii) robustness of the networks. Our preliminary findings reinforce the idea of common organizational properties among natural languages. Once concluded, this work will contribute to the characterization of similarities and differences in the orthographic and phonological perspectives of language networks at a word-level.
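A two-layer word network of this kind can be sketched in a few lines. The sketch assumes that two words are linked in a layer when the edit distance between their forms is at most 1, which is one common convention and not necessarily the threshold used by the authors. The ASCII transcriptions below are rough, made-up stand-ins for phonological forms.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def layer(forms, max_dist=1):
    """Edges between words whose forms lie within `max_dist` edits."""
    words = sorted(forms)
    return {
        frozenset((u, v))
        for i, u in enumerate(words)
        for v in words[i + 1:]
        if edit_distance(forms[u], forms[v]) <= max_dist
    }

# Orthographic forms are the spellings themselves.
spelling = {w: w for w in ["cat", "cut", "kit", "night", "knight"]}
# Hypothetical, simplified transcriptions (not real IPA).
sound = {"cat": "kat", "cut": "kut", "kit": "kit",
         "night": "nait", "knight": "nait"}

ortho = layer(spelling)
phon = layer(sound)
shared = ortho & phon                       # links present in both layers
overlap = len(shared) / len(ortho | phon)   # simple edge-overlap ratio
```

Comparing the edge sets of the two layers, as in the last two lines, is one simple way to quantify how far the written and spoken structures diverge.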
Affiliation(s)
- Pablo Lara-Martínez
  - Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad de México, México
- Cesar F. Reyes-Manzano
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Ciudad de México, México
- Irene López-Rodríguez
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Ciudad de México, México
- Lev Guzmán-Vargas
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Ciudad de México, México
|
11
|
Wu Y, Zhao S, Guo R. A novel community answer matching approach based on phrase fusion heterogeneous information network. Inf Process Manag 2021. [DOI: 10.1016/j.ipm.2020.102408] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 10/23/2022]
|
12
|
A neural knowledge graph evaluator: Combining structural and semantic evidence of knowledge graphs for predicting supportive knowledge in scientific QA. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2020.102309] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022]
|
13
|
Stella M. Text-mining forma mentis networks reconstruct public perception of the STEM gender gap in social media. PeerJ Comput Sci 2020; 6:e295. [PMID: 33816946 PMCID: PMC7924458 DOI: 10.7717/peerj-cs.295] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/24/2020] [Accepted: 08/17/2020] [Indexed: 06/12/2023]
Abstract
Mindset reconstruction maps how individuals structure and perceive knowledge, a map unfolded here by investigating language and its cognitive reflection in the human mind, i.e., the mental lexicon. Textual forma mentis networks (TFMN) are glass boxes introduced for extracting and understanding mindsets' structure (in Latin forma mentis) from textual data. Combining network science, psycholinguistics and Big Data, TFMNs successfully identified relevant concepts in benchmark texts, without supervision. Once validated, TFMNs were applied to the case study of distorted mindsets about the gender gap in science. Focusing on social media, this work analysed 10,000 tweets mostly representing individuals' opinions at the beginning of posts. "Gender" and "gap" elicited a mostly positive, trustful and joyous perception, with semantic associates that: celebrated successful female scientists, related gender gap to wage differences, and hoped for a future resolution. The perception of "woman" highlighted jargon of sexual harassment and stereotype threat (a form of implicit cognitive bias) about women in science "sacrificing personal skills for success". The semantic frame of "man" highlighted awareness of the myth of male superiority in science. No anger was detected around "person", suggesting that tweets got less tense around genderless terms. No stereotypical perception of "scientist" was identified online, differently from real-world surveys. This analysis thus identified that Twitter discourse mostly starting conversations promoted a majorly stereotype-free, positive/trustful perception of gender disparity, aimed at closing the gap. Hence, future monitoring against discriminating language should focus on other parts of conversations like users' replies. TFMNs enable new ways for monitoring collective online mindsets, offering data-informed ground for policy making.
|
14
|
Ramirez-Arellano A. Classification of Literary Works: Fractality and Complexity of the Narrative, Essay, and Research Article. ENTROPY 2020; 22:e22080904. [PMID: 33286673 PMCID: PMC7848887 DOI: 10.3390/e22080904] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/22/2020] [Revised: 08/11/2020] [Accepted: 08/15/2020] [Indexed: 11/18/2022]
Abstract
A complex network as an abstraction of a language system has attracted much attention during the last decade. Linguistic typological research using quantitative measures is a current research topic based on the complex network approach. This research aims at showing the node degree, betweenness, shortest path length, clustering coefficient, and nearest neighbourhoods’ degree, as well as more complex measures such as: the fractal dimension, the complexity of a given network, the Area Under Box-covering, and the Area Under the Robustness Curve. The literary works of Mexican writers were classified according to their genre. Precisely 87% of the full word co-occurrence networks were classified as a fractal. Also, empirical evidence is presented that supports the conjecture that lemmatisation of the original text is a renormalisation process of the networks that preserves their fractal property and reveals stylistic attributes by genre.
Affiliation(s)
- Aldo Ramirez-Arellano
  - Sección de Estudios de Posgrado e Investigación, Unidad Profesional Interdisciplinaria de Ingeniería y Ciencias Sociales y Administrativas, Instituto Politécnico Nacional, Ciudad de México 07738, Mexico
|
15
|
|
16
|
Forma Mentis Networks Reconstruct How Italian High Schoolers and International STEM Experts Perceive Teachers, Students, Scientists, and School. EDUCATION SCIENCES 2020. [DOI: 10.3390/educsci10010017] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
This study investigates how students and researchers shape their knowledge and perception of educational topics. The mindset or forma mentis of 159 Italian high school students and of 59 international researchers in science, technology, engineering and maths (STEM) are reconstructed through forma mentis networks, i.e., cognitive networks of concepts connected by free associations and enriched with sentiment labels. The layout of conceptual associations between positively/negatively/neutrally perceived concepts is informative on how people build their own mental constructs or beliefs about specific topics. Researchers displayed mixed positive/neutral mental representations of “teacher”, “student”, and “scientist”. Students’ conceptual associations of “scientist” were highly positive and largely non-stereotypical, although links about the “mad scientist” stereotype persisted. Students perceived “teacher” as a complex figure, associated with positive aspects like mentoring/knowledge transmission but also with negative sides revolving around testing and grading. “School” elicited stronger differences between the two groups. In the students’ mindset, “school” was surrounded by a negative emotional aura or set of associations, indicating an anxious perception of the school setting, mixing scholastic concepts, anxiety-eliciting words, STEM disciplines like maths and physics, and exam-related notions. Researchers’ positive stance of “school” included concepts of fun, friendship, and personal growth instead. Along the perspective of Education Research, the above results are discussed as quantitative evidence for test- and STEM anxiety co-occurring in the way Italian students perceive education places and their actors. Detecting these patterns in student populations through forma mentis networks offers new, simple to gather yet detailed knowledge for future data-informed intervention policies and action research.
|
17
|
Baeza-Blancas E, Obregón-Quintana B, Hernández-Gómez C, Gómez-Meléndez D, Aguilar-Velázquez D, Liebovitch LS, Guzmán-Vargas L. Recurrence Networks in Natural Languages. ENTROPY 2019; 21:e21050517. [PMID: 33267231 PMCID: PMC7515007 DOI: 10.3390/e21050517] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/07/2019] [Revised: 05/14/2019] [Accepted: 05/17/2019] [Indexed: 11/16/2022]
Abstract
We present a study of natural language using the recurrence network method. In our approach, the repetition of patterns of characters is evaluated without considering the word structure in written texts from different natural languages. Our dataset comprises 85 eBooks written in 17 different European languages. The similarity between patterns of length m is determined by the Hamming distance and a value r is considered to define a matching between two patterns, i.e., a repetition is defined if the Hamming distance is equal to or less than the given threshold value r. In this way, we calculate the adjacency matrix, where a connection between two nodes exists when a matching occurs. Next, the recurrence network is constructed for the texts and some representative network metrics are calculated. Our results show that average values of network density, clustering, and assortativity are larger than their corresponding shuffled versions, while for metrics such as closeness, both original and random sequences exhibit similar values. Moreover, our calculations show similar average values for density among languages that belong to the same linguistic family. In addition, the application of a linear discriminant analysis leads to well-separated clusters of family languages based on the network-density properties. Finally, we discuss our results in the context of the general characteristics of written texts.
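The construction described above (character patterns of length m, Hamming distance, threshold r, adjacency matrix) can be sketched as follows. The sketch assumes that patterns are taken with a sliding window of step one, which the abstract does not specify. The function names and the toy string are illustrative only.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def recurrence_network(text, m, r):
    """Adjacency matrix over all length-m character patterns of `text`.

    Nodes i and j are connected when the patterns starting at positions
    i and j differ in at most r characters.
    """
    patterns = [text[i:i + m] for i in range(len(text) - m + 1)]
    n = len(patterns)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if hamming(patterns[i], patterns[j]) <= r:
            adj[i][j] = adj[j][i] = 1
    return adj

def density(adj):
    """Fraction of possible undirected links that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(sum(row) for row in adj) / 2
    return edges / (n * (n - 1) / 2)

# Toy text: the patterns are "abc", "bca", "cab", "abc".
adj = recurrence_network("abcabc", m=3, r=0)
d = density(adj)
```

With r = 0 only exact repetitions are linked, so the first and last patterns form the single edge here. Raising r admits approximate repetitions and increases the density.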
Affiliation(s)
- Edgar Baeza-Blancas
  - Departamento de Física, Escuela Superior de Física y Matemáticas, Ciudad de México 07738, Mexico
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Ciudad de México 07340, Mexico
- Domingo Gómez-Meléndez
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Zacatecas 98000, Mexico
- Daniel Aguilar-Velázquez
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Ciudad de México 07340, Mexico
- Larry S. Liebovitch
  - Department of Physics, Queens College, City University of New York, New York, NY 11367, USA
  - Advanced Consortium on Cooperation, Conflict, and Complexity (AC4), Earth Institute, Columbia University, New York, NY 10027, USA
  - Graduate Center, City University of New York, New York, NY 10016, USA
- Lev Guzmán-Vargas
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Ciudad de México 07340, Mexico
  - Correspondence: Tel.: +52-55-5729600 (ext. 56873)
|