1. Arık SÖ, Shor J, Sinha R, Yoon J, Ledsam JR, Le LT, Dusenberry MW, Yoder NC, Popendorf K, Epshteyn A, Euphrosine J, Kanal E, Jones I, Li CL, Luan B, Mckenna J, Menon V, Singh S, Sun M, Ravi AS, Zhang L, Sava D, Cunningham K, Kayama H, Tsai T, Yoneoka D, Nomura S, Miyata H, Pfister T. A prospective evaluation of AI-augmented epidemiology to forecast COVID-19 in the USA and Japan. NPJ Digit Med 2021; 4:146. [PMID: 34625656 PMCID: PMC8501040 DOI: 10.1038/s41746-021-00511-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/09/2021] [Accepted: 08/25/2021] [Indexed: 12/18/2022]
Abstract
The COVID-19 pandemic has highlighted the global need for reliable models of disease spread. We propose an AI-augmented forecast modeling framework that provides daily predictions of the expected number of confirmed COVID-19 deaths, cases, and hospitalizations over the following 4 weeks. We present an international, prospective evaluation of our models' performance across all states and counties in the USA and prefectures in Japan. Nationally, the incident mean absolute percentage error (MAPE) for predicting COVID-19-associated deaths during prospective deployment remained consistently <8% (US) and <29% (Japan), while the cumulative MAPE remained <2% (US) and <10% (Japan). We show that our models perform well even during periods of considerable change in population behavior and are robust to demographic differences across geographic locations. We further demonstrate that our framework provides meaningful explanatory insights, with the models accurately adapting to local and national policy interventions. Our framework enables counterfactual simulations, which indicate that continuing non-pharmaceutical interventions alongside vaccination is essential for a faster recovery from the pandemic and that delaying the application of interventions has a detrimental effect; these simulations also allow exploration of the consequences of different vaccination strategies. The COVID-19 pandemic remains a global emergency. In the face of substantial challenges ahead, the approach presented here has the potential to inform critical decisions.
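The headline accuracy figures are MAPE values. The Python below is a minimal sketch that contrasts incident and cumulative MAPE. It assumes the standard definition MAPE = (100/n) * sum(|y_t - yhat_t| / |y_t|). It also assumes that a cumulative forecast equals the last observed cumulative total plus the forecast incident counts. The 4-day window and the baseline of 1000 are invented for illustration and do not come from the study.

```python
from itertools import accumulate
from typing import List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error (in percent) over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def to_cumulative(baseline: float, incident: Sequence[float]) -> List[float]:
    """Turn daily incident counts into running totals starting from `baseline` (assumed)."""
    return [baseline + total for total in accumulate(incident)]


# Hypothetical 4-day window: observed vs. forecast daily deaths.
actual_incident = [100, 120, 80, 100]
forecast_incident = [90, 130, 88, 100]

incident_mape = mape(actual_incident, forecast_incident)
cumulative_mape = mape(
    to_cumulative(1000, actual_incident),
    to_cumulative(1000, forecast_incident),
)
print(f"incident MAPE:   {incident_mape:.2f}%")    # incident MAPE:   7.08%
print(f"cumulative MAPE: {cumulative_mape:.2f}%")  # cumulative MAPE: 0.52%
```

Each cumulative value adds the same large baseline to both series. The same absolute errors therefore become much smaller relative errors. This is one reason cumulative MAPE tends to sit far below incident MAPE.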
Affiliation(s)
- Sercan Ö Arık
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA.
- Joel Shor
- Google, Japan, Shibuya, 3-Chome-21-3, Tokyo, Japan
- Jinsung Yoon
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA
- Long T Le
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA
- Elli Kanal
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA
- Isaac Jones
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA
- Chun-Liang Li
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA
- Beth Luan
- Google, Japan, Shibuya, 3-Chome-21-3, Tokyo, Japan
- Joe Mckenna
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA
- Vikas Menon
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA
- Mimi Sun
- Google Health, 1600 Amphitheatre Parkway, Mountain View, CA, USA
- Leyou Zhang
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA
- Dario Sava
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA
- Thomas Tsai
- Harvard School of Public Health, 677 Huntington Ave, Boston, MA, USA
- Daisuke Yoneoka
- Department of Health Policy and Management, School of Medicine, Keio University, 35 Shinanomachi, Shinjuku-ku, Tokyo, Japan
- Division of Biostatistics and Bioinformatics, Graduate School of Public Health, St Luke's International University, 3-6-2 Tsukiji, Chuo-ku, Tokyo, Japan
- Shuhei Nomura
- Department of Health Policy and Management, School of Medicine, Keio University, 35 Shinanomachi, Shinjuku-ku, Tokyo, Japan
- Department of Global Health Policy, Graduate School of Medicine, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, Japan
- Hiroaki Miyata
- Department of Health Policy and Management, School of Medicine, Keio University, 35 Shinanomachi, Shinjuku-ku, Tokyo, Japan
- Department of Healthcare Quality Assessment, Graduate School of Medicine, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, Japan
- Tomas Pfister
- Google Cloud AI, 1170 Bordeaux Dr, Sunnyvale, CA, USA
3. Sakakibara Y, Hachiya T, Uchida M, Nagamine N, Sugawara Y, Yokota M, Nakamura M, Popendorf K, Komori T, Sato K. COPICAT: a software system for predicting interactions between proteins and chemical compounds. Bioinformatics 2012; 28:745-6. [PMID: 22257668 DOI: 10.1093/bioinformatics/bts031] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
Since tens of millions of chemical compounds have accumulated in public chemical databases, fast, comprehensive computational methods for predicting interactions between chemical compounds and proteins are needed for virtual screening of lead compounds. Previously, we proposed a novel method for predicting protein-chemical interactions using two-layer support vector machine classifiers that require only readily available biochemical data, i.e. the amino acid sequences of proteins and the structural formulas of chemical compounds. In this article, the method has been implemented as the COPICAT web service, with an easy-to-use front-end interface. Users can simply submit a protein-chemical interaction prediction job using a pre-trained classifier, or can even train their own classification model by uploading training data. COPICAT's fast and accurate computational prediction has enhanced lead compound discovery against a database of tens of millions of chemical compounds, implying that the search space for drug discovery is extended more than 1000-fold compared with currently widely used high-throughput screening methodologies. Availability: The COPICAT server is available at http://copicat.dna.bio.keio.ac.jp. All functions, including the prediction function, are freely available via anonymous login without registration. Registered users, however, can use the system more intensively.
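The COPICAT abstract describes learning from protein-compound pairs using only amino acid sequences and compound structures. It does not detail the two-layer SVM design, so the Python sketch below swaps in a different, well-known construction: a tensor-product pairwise kernel. This kernel multiplies a protein similarity (a normalized k-mer spectrum kernel) by a compound similarity (Tanimoto on fingerprint bit sets). Representing compounds as hypothetical sets of integer fingerprint bits is an assumption; in practice those bits would come from a cheminformatics toolkit. A kernel of this kind could be supplied to an SVM that accepts precomputed kernel values.

```python
from collections import Counter
from math import sqrt
from typing import FrozenSet


def spectrum_kernel(p: str, q: str, k: int = 2) -> float:
    """Normalized k-mer (spectrum) kernel between two amino acid sequences."""
    def kmer_counts(s: str) -> Counter:
        return Counter(s[i:i + k] for i in range(len(s) - k + 1))

    cp, cq = kmer_counts(p), kmer_counts(q)
    # Counter returns 0 for k-mers absent from q, so only shared k-mers contribute.
    dot = sum(n * cq[kmer] for kmer, n in cp.items())
    norm = sqrt(sum(n * n for n in cp.values()) * sum(n * n for n in cq.values()))
    return dot / norm if norm else 0.0


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pair_kernel(protein1: str, compound1: FrozenSet[int],
                protein2: str, compound2: FrozenSet[int], k: int = 2) -> float:
    """Tensor-product kernel on (protein, compound) pairs: K_protein * K_compound."""
    return spectrum_kernel(protein1, protein2, k) * tanimoto(compound1, compound2)


# Hypothetical toy inputs: short peptide strings and made-up fingerprint bit sets.
print(pair_kernel("MKVL", frozenset({1, 2, 3}), "MKVA", frozenset({2, 3, 4})))
```

Both factors are valid kernels, and the product of two kernels is again a kernel. The pair score is therefore high only when both the proteins and the compounds are similar.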
Affiliation(s)
- Yasubumi Sakakibara
- Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Yokohama 223-8522, Japan.
4. Popendorf K, Tsuyoshi H, Osana Y, Sakakibara Y. Murasaki: a fast, parallelizable algorithm to find anchors from multiple genomes. PLoS One 2010; 5:e12651. [PMID: 20885980 PMCID: PMC2945767 DOI: 10.1371/journal.pone.0012651] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 05/20/2010] [Accepted: 08/06/2010] [Indexed: 12/24/2022]
Abstract
Background: With the number of available genome sequences increasing rapidly, the magnitude of sequence data required for multiple-genome analyses is a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short, well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved using pairwise alignment tools like BLASTZ through progressive alignment tools like TBA, but the computational requirements for sequence comparisons of multiple genomes quickly become a limiting factor as the number and scale of genomes grow. Methodology/Principal Findings: Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours of CPU time (42 minutes of wall-clock time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near-linear time, compared to the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy.
Conclusions/Significance: Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with computational efficiency significantly greater than existing methods. Murasaki is available under GPL at http://murasaki.sourceforge.net.
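The spaced-seed idea behind Murasaki's anchoring can be illustrated with a short sketch. The Python below is not Murasaki's implementation. It assumes a fixed, hypothetical seed pattern `11011`, where `1` marks a position that must match and `0` marks a position allowed to mismatch. It uses a plain Python dict instead of Murasaki's adaptively generated hash functions, and it runs serially. For each spaced-seed key it records every position in every genome. It then keeps only the keys found in all input genomes, as candidate anchor seeds.

```python
from collections import defaultdict
from typing import Dict, List


def spaced_key(seq: str, start: int, pattern: str) -> str:
    """Characters of seq at the '1' positions of pattern, read from `start`."""
    return "".join(seq[start + i] for i, bit in enumerate(pattern) if bit == "1")


def find_anchor_seeds(genomes: Dict[str, str], pattern: str) -> Dict[str, Dict[str, List[int]]]:
    """Map spaced-seed keys to per-genome positions, keeping keys present in every genome."""
    table = defaultdict(lambda: defaultdict(list))  # key -> genome name -> start positions
    span = len(pattern)
    for name, seq in genomes.items():
        for start in range(len(seq) - span + 1):
            table[spaced_key(seq, start, pattern)][name].append(start)
    return {key: dict(hits) for key, hits in table.items() if len(hits) == len(genomes)}


# Toy sequences: they differ at index 2, which the '0' in the pattern ignores.
print(find_anchor_seeds({"a": "ACGTACGT", "b": "ACCTACGA"}, "11011"))
# {'ACTA': {'a': [0], 'b': [0]}}
```

The toy sequences differ at the position covered by the `0`, yet they still share a seed. This mismatch tolerance is what spaced seeds buy over contiguous k-mers.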
Affiliation(s)
- Kris Popendorf
- Department of Biosciences and Informatics, Keio University, Yokohama, Japan
- Hachiya Tsuyoshi
- Department of Biosciences and Informatics, Keio University, Yokohama, Japan
- Yasunori Osana
- Department of Computer and Informatics Science, Seikei University, Musashino-shi, Tokyo, Japan
- Yasubumi Sakakibara
- Department of Biosciences and Informatics, Keio University, Yokohama, Japan
5. Shang WH, Hori T, Toyoda A, Kato J, Popendorf K, Sakakibara Y, Fujiyama A, Fukagawa T. Chickens possess centromeres with both extended tandem repeats and short non-tandem-repetitive sequences. Genome Res 2010; 20:1219-28. [PMID: 20534883 DOI: 10.1101/gr.106245.110] [Citation(s) in RCA: 136] [Impact Index Per Article: 9.7] [Indexed: 01/25/2023]
Abstract
The centromere is essential for faithful chromosome segregation by providing the site for kinetochore assembly. Although the role of the centromere is conserved throughout evolution, the DNA sequences associated with centromere regions are highly divergent among species, and it remains to be determined how centromere DNA directs kinetochore formation. Despite the active use of chicken DT40 cells in studies of chromosome segregation, the sequence of the chicken centromere was unclear. Here, we performed a comprehensive analysis of chicken centromere DNA, which revealed unique features of chicken centromeres compared with previously studied vertebrates. Centromere DNA sequences from the chicken macrochromosomes, with the exception of chromosome 5, contain chromosome-specific homogeneous tandem repetitive arrays that span several hundred kilobases. In contrast, the centromeres of chromosomes 5, 27, and Z do not contain tandem repetitive sequences and span non-tandem-repetitive sequences of only approximately 30 kb. To test the function of these centromere sequences, we conditionally removed the centromere from the Z chromosome using genetic engineering and showed that the non-tandem-repeat sequence of chromosome Z is a functional centromere.
Affiliation(s)
- Wei-Hao Shang
- Department of Molecular Genetics, National Institute of Genetics and The Graduate University for Advanced Studies (SOKENDAI), Mishima, Shizuoka 411-8540, Japan
6. Nishito Y, Osana Y, Hachiya T, Popendorf K, Toyoda A, Fujiyama A, Itaya M, Sakakibara Y. Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data. BMC Genomics 2010; 11:243. [PMID: 20398357 PMCID: PMC2867830 DOI: 10.1186/1471-2164-11-243] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Received: 09/07/2009] [Accepted: 04/16/2010] [Indexed: 11/21/2022]
Abstract
Background: Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168 and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing of the whole genomes of several laboratory-domesticated B. subtilis 168 derivatives has already been attempted using short-read sequencing data, assembling the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly given our aim of assembling one fully connected scaffold from short reads around 35 bp in length. Results: We applied a comparative genome assembly method, which combines de novo assembly and reference-guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These alleles are specific for γ-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacks a polyketide synthesis operon, has a disrupted plipastatin production operon, and possesses previously unidentified transposases.
Conclusions: The determination of the whole genome sequence of Bacillus subtilis natto enabled detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B. subtilis natto harbors but B. subtilis 168 lacks. Multiple genome-level comparisons among five closely related Bacillus species were also carried out. The determined genome sequence of B. subtilis natto and gene annotations are available from the Natto genome browser http://natto-genome.org/.
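The reference-guided half of the comparative assembly described in the natto abstract can be pictured with a small sketch. Contigs are ordered and oriented by where they align on a closely related reference, which is the role B. subtilis 168 plays for natto. The Python below is a hypothetical illustration, not the authors' pipeline. It assumes each contig's reference start coordinate and strand were already determined by some aligner. It also joins neighbors with a fixed run of `N` characters, whereas real gaps would be sized from reference coordinates or closed experimentally, as with the long PCR used in this study.

```python
from dataclasses import dataclass
from typing import List

# Complement table for DNA; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class PlacedContig:
    name: str
    seq: str
    ref_start: int  # assumed: leftmost reference coordinate of the contig's alignment
    reverse: bool   # assumed: True if the contig aligns to the reference minus strand


def scaffold_by_reference(contigs: List[PlacedContig], gap: int = 10) -> str:
    """Order contigs by reference position, orient them, and join them with N gaps."""
    ordered = sorted(contigs, key=lambda c: c.ref_start)
    pieces = [reverse_complement(c.seq) if c.reverse else c.seq for c in ordered]
    return ("N" * gap).join(pieces)


contigs = [
    PlacedContig("contig2", "GGTA", ref_start=500, reverse=True),
    PlacedContig("contig1", "ACGT", ref_start=100, reverse=False),
]
print(scaffold_by_reference(contigs, gap=3))  # ACGTNNNTACC
```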
Affiliation(s)
- Yukari Nishito
- Department of Biosciences and Informatics, Keio University, Hiyoshi, Kohoku-ku, Yokohama, Japan
9. Sakakibara Y, Popendorf K, Ogawa N, Asai K, Sato K. Stem kernels for RNA sequence analyses. J Bioinform Comput Biol 2007; 5:1103-22. [PMID: 17933013 DOI: 10.1142/s0219720007003028] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 02/19/2007] [Revised: 07/08/2007] [Accepted: 07/09/2007] [Indexed: 11/18/2022]
Abstract
Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNA and are used for structural alignment of RNA sequences. However, such stochastic models cannot sufficiently discriminate member sequences of an RNA family from nonmembers and hence cannot reliably detect noncoding RNA regions in genome sequences. A novel kernel function, the stem kernel, is proposed for the discrimination and detection of functional RNA sequences using support vector machines (SVMs). The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structures. The stem kernel examines all possible common base pairs and stem structures of arbitrary lengths, including pseudoknots, between two RNA sequences, and calculates the inner product of common stem structure counts. An efficient dynamic programming algorithm is developed to calculate the stem kernels. The stem kernels are then applied to discriminate members of an RNA family from nonmembers using SVMs. The study indicates that the discrimination ability of the stem kernel is strong compared with conventional methods. Furthermore, the potential application of the stem kernel is demonstrated by the detection of remotely homologous RNA families in terms of secondary structures; this application was chosen because string kernels have been shown to work for the remote homology detection of protein sequences. These experimental results encourage us to apply the stem kernel to find novel RNA families in genome sequences.
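The stem kernel is described as an extension of the all-subsequences string kernel. The Python below is a minimal sketch of that base kernel only, not of the stem kernel itself, which additionally counts common base pairs and stems. It computes K(s, t), the number of pairs of (index) subsequences of s and t that spell the same string, counting the empty subsequence. It fills an O(|s||t|) dynamic-programming table with the recurrence K(i, j) = K(i, j-1) + K(i-1, j) - K(i-1, j-1) + [s_i = t_j] * K(i-1, j-1), where K(0, j) = K(i, 0) = 1. This recurrence follows from the standard recursion for the all-subsequences kernel.

```python
from math import sqrt


def all_subsequences_kernel(s: str, t: str) -> int:
    """Count pairs of (index) subsequences of s and t spelling the same string,
    including the empty one, with an O(len(s) * len(t)) dynamic program."""
    # dp[i][j] is the kernel value for prefixes s[:i] and t[:j]; empty prefixes give 1.
    dp = [[1] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            dp[i][j] = dp[i][j - 1] + dp[i - 1][j] - dp[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                # New common subsequences that end by pairing s[i-1] with t[j-1].
                dp[i][j] += dp[i - 1][j - 1]
    return dp[len(s)][len(t)]


def normalized_kernel(s: str, t: str) -> float:
    """Cosine-normalized kernel, so each sequence has self-similarity 1."""
    return all_subsequences_kernel(s, t) / sqrt(
        all_subsequences_kernel(s, s) * all_subsequences_kernel(t, t)
    )


print(all_subsequences_kernel("ab", "ab"))  # 4: "", "a", "b", "ab"
print(all_subsequences_kernel("aa", "a"))   # 3: "" once, "a" twice
```

Normalizing the raw counts keeps long sequences from dominating an SVM simply because they contain more subsequences.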
Affiliation(s)
- Yasubumi Sakakibara
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan.