1
|
Day A. Exploratory analysis of text duplication in peer-review reveals peer-review fraud and paper mills. Scientometrics 2022. [DOI: 10.1007/s11192-022-04504-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
|
2
|
Abd-Elaal ES, Gamage SH, Mills JE. Assisting academics to identify computer generated writing. European Journal of Engineering Education 2022; 47:725-745. [DOI: 10.1080/03043797.2022.2046709] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/12/2021] [Accepted: 01/31/2022] [Indexed: 09/02/2023]
Affiliation(s)
- El-Sayed Abd-Elaal
  - UniSA STEM, University of South Australia, South Australia, Australia
  - Structural Engineering Department, Mansoura University, Mansoura, Egypt
- Julie E. Mills
  - UniSA STEM, University of South Australia, South Australia, Australia
|
3
|
Cabanac G, Labbé C. Prevalence of nonsensical algorithmically generated papers in the scientific literature. J Assoc Inf Sci Technol 2021. [DOI: 10.1002/asi.24495] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/05/2022]
Affiliation(s)
- Guillaume Cabanac
  - Computer Science Department, University of Toulouse, IRIT UMR 5505 CNRS, Toulouse, France
- Cyril Labbé
  - Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
|
4
|
Classification of abrupt changes along viewing profiles of scientific articles. J Informetr 2021. [DOI: 10.1016/j.joi.2021.101158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
5
|
Millington T, Luz S. Analysis and Classification of Word Co-Occurrence Networks From Alzheimer’s Patients and Controls. Frontiers in Computer Science 2021. [DOI: 10.3389/fcomp.2021.649508] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
In this paper we construct word co-occurrence networks from transcript data of controls and patients with potential Alzheimer’s disease using the ADReSS challenge dataset of spontaneous speech. We examine measures of the structure of these networks for significant differences, finding that networks from Alzheimer’s patients have a lower heterogeneity and centralization, but a higher edge density. We then use these measures, a network embedding method and some measures from the word frequency distribution to classify the transcripts into control or Alzheimer’s, and to estimate the cognitive test score of a participant based on the transcript. We find it is possible to distinguish between the AD and control networks on structure alone, achieving 66.7% accuracy on the test set, and to predict cognitive scores with a root mean squared error of 5.675. Using the network measures is more successful than using the network embedding method. However, if the networks are shuffled we find relatively few of the measures are different, indicating that word frequency drives many of the network properties. This observation is borne out by the classification experiments, where word frequency measures perform similarly to the network measures.
|
6
|
Modha S, Majumder P, Mandl T. An empirical evaluation of text representation schemes to filter the social media stream. J EXP THEOR ARTIF IN 2021. [DOI: 10.1080/0952813x.2021.1907792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Sandip Modha
  - Information Retrieval and Language Processing Lab, DA-IICT Gandhinagar, India
  - Information Retrieval and Language Processing Lab, LDRP-ITR, Gandhinagar, India
- Prasenjit Majumder
  - Information Retrieval and Language Processing Lab, DA-IICT Gandhinagar, India
- Thomas Mandl
  - Information Retrieval and Language Processing Lab, DA-IICT Gandhinagar, India
  - Information Retrieval and Language Processing Lab, University of Hildesheim, Hildesheim, Germany
|
7
|
|
8
|
Tlitova A, Toschev A, Talanov M, Kurnosov V. Meta-Analysis of Cross-Language Plagiarism and Self-Plagiarism Detection Methods for Russian-English Language Pair. Frontiers in Computer Science 2020. [DOI: 10.3389/fcomp.2020.523053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
|
9
|
Luo W, Lu N, Ni L, Zhu W, Ding W. Local community detection by the nearest nodes with greater centrality. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.01.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 10/25/2022]
|
10
|
Xia F, Liu J, Nie H, Fu Y, Wan L, Kong X. Random Walks: A Review of Algorithms and Applications. IEEE Transactions on Emerging Topics in Computational Intelligence 2020. [DOI: 10.1109/tetci.2019.2952908] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Indexed: 11/07/2022]
|
11
|
Brito ACM, Silva FN, Amancio DR. A complex network approach to political analysis: Application to the Brazilian Chamber of Deputies. PLoS One 2020; 15:e0229928. [PMID: 32191720 PMCID: PMC7081992 DOI: 10.1371/journal.pone.0229928] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/14/2019] [Accepted: 02/17/2020] [Indexed: 11/29/2022]
Abstract
In this paper, we introduce a network-based methodology to study how political entities evolve over time. We constructed networks of voting data from the Brazilian Chamber of Deputies, where deputies are nodes and edges are represented by voting similarity among deputies. The Brazilian Chamber of Deputies is characterized by a multi-party political system. Thus, we would expect a broad spectrum of ideas to be represented. Our results, however, revealed that plurality of ideas is not present at all: the effective number of communities representing ideas based on agreement/disagreement in propositions is about 3 over the entire studied time span. The obtained results also revealed different patterns of coalitions between distinct parties. Finally, we also found signs of early party isolation before presidential impeachment proceedings effectively started. We believe that the proposed framework could be used to complement the study of political dynamics and even applied in similar social networks where individuals are organized in a complex manner.
Affiliation(s)
- Filipi Nascimento Silva
  - São Carlos Institute of Physics, University of São Paulo, São Carlos, SP, Brazil
  - Indiana University Network Science Institute, Bloomington, Indiana, United States of America
- Diego Raphael Amancio
  - Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, SP, Brazil
|
12
|
Frolov D, Nascimento S, Fenner T, Mirkin B. Parsimonious generalization of fuzzy thematic sets in taxonomies applied to the analysis of tendencies of research in data science. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.09.082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
13
|
Ma T, Wang C, Wang J, Cheng J, Chen X. Particle-swarm optimization of ensemble neural networks with negative correlation learning for forecasting short-term wind speed of wind farms in western China. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.07.074] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 11/30/2022]
|
14
|
Djenouri Y, Djenouri D, Belhadi A, Fournier-Viger P, Chun-Wei Lin J, Bendjoudi A. Exploiting GPU parallelism in improving bees swarm optimization for mining big transactional databases. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.06.060] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/29/2022]
|
15
|
Document vectorization method using network information of words. PLoS One 2019; 14:e0219389. [PMID: 31318881 PMCID: PMC6638850 DOI: 10.1371/journal.pone.0219389] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/02/2019] [Accepted: 06/21/2019] [Indexed: 12/02/2022]
Abstract
We propose a new method for vectorizing a document using the relational characteristics of the words in the document. For the relational characteristics, we use two types of relational information of a word: 1) the centrality measures of the word and 2) the number of times that the word is used with other words in the document. We propose these methods mainly because information regarding the relations of a word to other words in the document is likely to better represent the unique characteristics of the document than the frequency-based methods (e.g., term frequency and term frequency–inverse document frequency). In experiments using a corpus consisting of 14 documents pertaining to four different topics, the results of clustering analysis using cosine similarities between vectors of relational information for words were comparable to (and in some cases more accurate than) those obtained using vectors of frequency-based methods. The clustering analysis using vectors of tie weights between words yielded the most accurate result. Although the results obtained for the small dataset used in this study can hardly be generalized, they suggest that at least in some cases, vectorization of a document using the relational characteristics of the words can provide more accurate results than the frequency-based vectors.
|
16
|
Abstract
One way to increase the understanding of texts by machines is to add semantic information to lexical items through metadata tags, a process also called semantic annotation. Several semantic aspects can be added to words, among them information about the nature of the concept denoted, through association with a category of an ontology. The application of ontologies in the annotation task can span multiple domains. However, this research focused on top-level ontologies because of their generalizing character. Because annotation is an arduous task that demands time and specialized personnel, much work has been done on ways to perform semantic annotation automatically. Machine learning techniques are the most effective approaches in the annotation process. Another factor of great importance for the success of training supervised learning algorithms is the use of a corpus that is sufficiently large and able to capture the linguistic variance of natural language. In this sense, this article presents an automatic approach to enrich documents from the American English corpus through a CRF model for semantic annotation using top-level ontologies from Schema.org. The research uses two approaches of the model, obtaining promising results for the development of semantic annotation based on top-level ontologies. Although it is a new line of research, the use of top-level ontologies for automatic semantic enrichment of texts can contribute significantly to the improvement of text interpretation by machines.
|
17
|
de Arruda HF, Marinho VQ, Costa LDF, Amancio DR. Paragraph-based representation of texts: A complex networks approach. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2018.12.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 10/27/2022]
|
18
|
Lima TS, de Arruda HF, Silva FN, Comin CH, Amancio DR, Costa LDF. The dynamics of knowledge acquisition via self-learning in complex networks. Chaos (Woodbury, N.Y.) 2018; 28:083106. [PMID: 30180654 DOI: 10.1063/1.5027007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/26/2018] [Accepted: 07/16/2018] [Indexed: 06/08/2023]
Abstract
Studies regarding knowledge organization and acquisition are of great importance to understand areas related to science and technology. A common way to model the relationship between different concepts is through complex networks. In such representations, networks' nodes store knowledge and edges represent their relationships. Several studies that considered this type of structure and knowledge acquisition dynamics employed one or more agents to discover node concepts by walking on the network. In this study, we investigate a different type of dynamics adopting a single node as the "network brain." Such a brain represents a range of real systems such as the information about the environment that is acquired by a person and is stored in the brain. To store the discovered information in a specific node, the agents walk on the network and return to the brain. We propose three different dynamics and test them on several network models and on a real system, which is formed by journal articles and their respective citations. The results revealed that, according to the adopted walking models, the efficiency of self-knowledge acquisition has only a weak dependency on topology and search strategy.
Affiliation(s)
- Thales S Lima
  - Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Henrique F de Arruda
  - Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Filipi N Silva
  - São Carlos Institute of Physics, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Cesar H Comin
  - Department of Computer Science, Federal University of São Carlos, São Carlos, São Paulo 13565-905, Brazil
- Diego R Amancio
  - Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Luciano da F Costa
  - São Carlos Institute of Physics, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
|
19
|
Tien NM, Labbé C. Detecting automatically generated sentences with grammatical structure similarity. Scientometrics 2018. [DOI: 10.1007/s11192-018-2789-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/27/2022]
|
20
|
Gruginskie LADS, Vaccaro GLR. Lawsuit lead time prediction: Comparison of data mining techniques based on categorical response variable. PLoS One 2018; 13:e0198122. [PMID: 29856787 PMCID: PMC5983432 DOI: 10.1371/journal.pone.0198122] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/31/2017] [Accepted: 05/14/2018] [Indexed: 11/18/2022]
Abstract
The quality of the judicial system of a country can be verified by the overall length time of lawsuits, or the lead time. When the lead time is excessive, a country’s economy can be affected, leading to the adoption of measures such as the creation of the Saturn Center in Europe. Although there are performance indicators to measure the lead time of lawsuits, the analysis and the fit of prediction models are still underdeveloped themes in the literature. To contribute to this subject, this article compares different prediction models according to their accuracy, sensitivity, specificity, precision, and F1 measure. The database used was from TRF4—the Tribunal Regional Federal da 4a Região—a federal court in southern Brazil, corresponding to the 2nd Instance civil lawsuits completed in 2016. The models were fitted using support vector machine, naive Bayes, random forests, and neural network approaches with categorical predictor variables. The lead time of the 2nd Instance judgment was selected as the response variable measured in days and categorized in bands. The comparison among the models showed that the support vector machine and random forest approaches produced measurements that were superior to those of the other models. The evaluation of the models was made using k-fold cross-validation similar to that applied to the test models.
Affiliation(s)
- Guilherme Luís Roehe Vaccaro
  - Graduate Program in Production Engineering and Systems, Unisinos, São Leopoldo, Rio Grande do Sul, Brazil
  - Graduate Program in Business and Management, Unisinos, Porto Alegre, Rio Grande do Sul, Brazil
|
21
|
|
22
|
Yu D, Wang W, Zhang S, Zhang W, Liu R. Hybrid self-optimized clustering model based on citation links and textual features to detect research topics. PLoS One 2017; 12:e0187164. [PMID: 29077747 PMCID: PMC5659815 DOI: 10.1371/journal.pone.0187164] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 08/15/2017] [Accepted: 10/14/2017] [Indexed: 11/18/2022]
Abstract
The challenge of detecting research topics in a specific research field has attracted attention from researchers in the bibliometrics community. In this study, to solve two problems of clustering papers, i.e., the influence of different distributions of citation links and involved textual features on similarity computation, the authors propose a hybrid self-optimized clustering model to detect research topics by extending the hybrid clustering model to identify "core documents". First, the Amsler network, consisting of bibliographic coupling and co-citation links, is created to calculate the citation-based similarity based on the cosine angle of papers. Second, the cosine similarity is also used to compute the text-based similarity, which consists of the textual statistical and topological features. Then, the cosine angle of the linear combination of citation- and text-based similarity is considered as the hybrid similarity. Finally, the Louvain method is applied to cluster papers, and the terms based on term frequency are used to label clusters. To test the performance of the proposed model, a dataset related to the data envelopment analysis field is used for comparison and analysis of clustering results. Based on the benchmark built, different clustering methods with different citation links or textual features are compared according to evaluation measures. The results show that the proposed model can obtain reasonable and effective clustering results, and the research topics of data envelopment analysis field are also analyzed based on the proposed model. As different features are considered in the proposed model compared with previous hybrid clustering models, the proposed clustering model can provide inspiration for further studies on topic identification by other researchers.
Affiliation(s)
- Dejian Yu
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
- Wanru Wang
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
- Shuai Zhang
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
- Wenyu Zhang
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
- Rongyu Liu
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
|
23
|
Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines. Scientometrics 2016. [DOI: 10.1007/s11192-016-2209-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]
|
24
|
de Arruda HF, Costa LDF, Amancio DR. Topic segmentation via community detection in complex networks. Chaos (Woodbury, N.Y.) 2016; 26:063120. [PMID: 27368785 DOI: 10.1063/1.4954215] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 06/06/2023]
Abstract
Many real systems have been modeled in terms of network concepts, and written texts are a particular example of information networks. In recent years, the use of network methods to analyze language has allowed the discovery of several interesting effects, including the proposition of novel models to explain the emergence of fundamental universal patterns. While syntactical networks, one of the most prevalent networked models of written texts, display both scale-free and small-world properties, such a representation fails in capturing other textual features, such as the organization in topics or subjects. We propose a novel network representation whose main purpose is to capture the semantical relationships of words in a simple way. To do so, we link all words co-occurring in the same semantic context, which is defined in a threefold way. We show that the proposed representations favor the emergence of communities of semantically related words, and this feature may be used to identify relevant topics. The proposed methodology to detect topics was applied to segment selected Wikipedia articles. We found that, in general, our methods outperform traditional bag-of-words representations, which suggests that a high-level textual representation may be useful to study the semantical features of texts.
Affiliation(s)
- Henrique F de Arruda
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, São Paulo, Brazil
- Luciano da F Costa
  - São Carlos Institute of Physics, University of São Paulo, São Carlos, São Paulo, Brazil
- Diego R Amancio
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, São Paulo, Brazil
|
25
|
Silva FN, Amancio DR, Bardosova M, Costa LDF, Oliveira ON. Using network science and text analytics to produce surveys in a scientific topic. J Informetr 2016. [DOI: 10.1016/j.joi.2016.03.008] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 02/05/2023]
|