Zinovyev A, Czerwinska U, Cantini L, Barillot E, Frahm KM, Shepelyansky DL. Collective intelligence defines biological functions in Wikipedia as communities in the hidden protein connection network. PLoS Comput Biol 2020; 16:e1007652. [PMID: 32069277 PMCID: PMC7048313 DOI: 10.1371/journal.pcbi.1007652] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/08/2019] [Revised: 02/28/2020] [Accepted: 01/13/2020] [Indexed: 11/23/2022]
Abstract
English Wikipedia, containing more than five million articles, has approximately eleven thousand web pages devoted to proteins or genes, most of which were generated by the Gene Wiki project. These pages contain information about interactions between proteins and their functional relationships. At the same time, they are interconnected with other Wikipedia pages describing biological functions, diseases, drugs and other topics curated by independent, uncoordinated collective efforts. Therefore, Wikipedia contains a directed network of protein functional relations or physical interactions embedded into the global network of encyclopedia terms, which defines a hidden (indirect) functional proximity between proteins. We applied the recently developed reduced Google matrix (REGOMAX) algorithm to extract the network of hidden functional connections between proteins in Wikipedia. In this network, we discovered tight communities which reflect areas of interest in molecular biology or medicine and which can be considered definitions of biological functions shaped by collective intelligence. Moreover, by comparing two snapshots of the Wikipedia graph (from 2013 and 2017), we studied the evolution of the network of direct and hidden protein connections. We concluded that the hidden connections are more dynamic than the direct ones and that the size of the hidden interaction communities grows with time. We recapitulate the results of the Wikipedia protein community analysis and annotation in the form of an interactive online map, which can serve as a portal to the Gene Wiki project.
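The abstract does not spell out how REGOMAX separates direct from hidden links. The following is a minimal sketch, assuming the standard reduced Google matrix formulation from the REGOMAX literature: the Google matrix G is split into a small set r of nodes of interest (here, protein pages) and the rest s, and the reduced matrix G_R = G_rr + G_rs (1 - G_ss)^(-1) G_sr is decomposed as G_R = G_rr + G_pr + G_qr, where G_pr comes from the leading eigenmode of G_ss and G_qr carries the indirect ("hidden") paths through the rest of the network. The function names, the toy adjacency matrix and the damping factor alpha = 0.85 are illustrative assumptions, not the authors' code or data.

```python
import numpy as np


def google_matrix(adj: np.ndarray, alpha: float = 0.85) -> np.ndarray:
    """Column-stochastic Google matrix G = alpha*S + (1 - alpha)/N (assumed alpha).

    adj[i, j] = 1 means a link from page j to page i, so column j holds
    the jump probabilities out of page j. Dangling pages (no out-links)
    get a uniform column 1/N.
    """
    n = adj.shape[0]
    s = adj.astype(float)
    out_deg = s.sum(axis=0)
    dangling = out_deg == 0
    s[:, ~dangling] /= out_deg[~dangling]  # normalize each column by its out-degree
    s[:, dangling] = 1.0 / n
    return alpha * s + (1.0 - alpha) / n


def reduced_google_matrix(g: np.ndarray, r):
    """Return (G_R, G_rr, G_pr, G_qr) for the node indices r (hypothetical helper)."""
    r = np.asarray(r)
    s = np.setdiff1d(np.arange(g.shape[0]), r)  # all remaining nodes
    g_rr = g[np.ix_(r, r)]
    g_rs = g[np.ix_(r, s)]
    g_sr = g[np.ix_(s, r)]
    g_ss = g[np.ix_(s, s)]

    # Leading (Perron) eigenvalue of G_ss with its right and left eigenvectors.
    vals, right = np.linalg.eig(g_ss)
    k = np.argmax(np.abs(vals))
    lam = vals[k].real
    psi_r = right[:, k].real
    vals_l, left = np.linalg.eig(g_ss.T)
    psi_l = left[:, np.argmax(np.abs(vals_l))].real

    # Projector P_c onto the leading mode and its complement Q_c = 1 - P_c.
    eye = np.eye(len(s))
    p_c = np.outer(psi_r, psi_l) / (psi_l @ psi_r)
    q_c = eye - p_c
    g_bar = q_c @ g_ss @ q_c

    # (1 - G_ss)^(-1) = P_c / (1 - lam) + Q_c (1 - Q_c G_ss Q_c)^(-1) Q_c
    g_pr = g_rs @ p_c @ g_sr / (1.0 - lam)
    g_qr = g_rs @ q_c @ np.linalg.solve(eye - g_bar, q_c @ g_sr)
    g_red = g_rr + g_pr + g_qr
    return g_red, g_rr, g_pr, g_qr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = (rng.random((30, 30)) < 0.1).astype(int)  # toy directed graph
    np.fill_diagonal(adj, 0)
    g_red, g_rr, g_pr, g_qr = reduced_google_matrix(google_matrix(adj), np.arange(5))
    hidden = g_qr.copy()
    np.fill_diagonal(hidden, 0.0)  # off-diagonal part: candidate hidden links
    print(np.round(hidden, 4))
```

In the REGOMAX papers, G_pr typically has nearly identical columns that mostly reflect the PageRank of the selected nodes, so the off-diagonal part of G_qr is the component usually read as the hidden connection network. A useful sanity check is that G_R stays column-stochastic and that its PageRank equals the full-graph PageRank restricted to r and renormalized.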
Affiliation(s)
- Andrei Zinovyev
- Institut Curie, PSL Research University, F-75005 Paris, France
- INSERM, U900, F-75005 Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Urszula Czerwinska
- Institut Curie, PSL Research University, F-75005 Paris, France
- INSERM, U900, F-75005 Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Laura Cantini
- Institut Curie, PSL Research University, F-75005 Paris, France
- INSERM, U900, F-75005 Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Computational Systems Biology Team, Institut de Biologie de l’Ecole Normale Supérieure, CNRS UMR8197, INSERM U1024, Ecole Normale Supérieure, PSL Research University, F-75005 Paris, France
- Emmanuel Barillot
- Institut Curie, PSL Research University, F-75005 Paris, France
- INSERM, U900, F-75005 Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Klaus M. Frahm
- Laboratoire de Physique Théorique, IRSAMC, Université de Toulouse, CNRS, UPS, F-31062 Toulouse, France
- Dima L. Shepelyansky
- Laboratoire de Physique Théorique, IRSAMC, Université de Toulouse, CNRS, UPS, F-31062 Toulouse, France