1. Robson B, Baek O. An ontology for very large numbers of longitudinal health records to facilitate data mining and machine learning. Inform Med Unlocked 2023. [DOI: 10.1016/j.imu.2023.101204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
2. Robson B, St Clair J. Principles of Quantum Mechanics for Artificial Intelligence in medicine. Discussion with reference to the Quantum Universal Exchange Language (Q-UEL). Comput Biol Med 2022; 143:105323. [PMID: 35240388 DOI: 10.1016/j.compbiomed.2022.105323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2021] [Revised: 01/30/2022] [Accepted: 02/13/2022] [Indexed: 11/22/2022]
Abstract
This paper reviews some basic principles of Quantum Mechanics, Quantum Computing, and Artificial Intelligence in terms of a specific unifying theme. This theme relates to the hyperbolic or split-complex imaginary numbers and their equivalent matrices, rediscovered by Dirac, and to the underlying mathematics of the previously described Q-UEL language based on them. Hyperbolic imaginary numbers h have the property hh = +1, in contrast to the more familiar imaginary number i, for which ii = -1. Examples of analogous matrices include the Hadamard gate matrix used in quantum computing and the Pauli spin matrices, and all Hermitian matrices of interest in quantum computing can readily be derived from these. They also relate to Dirac dualization, the spinor projectors of Quantum Field Theory, the non-wave-like part of quantum theory, collapse of the wave function, and a dualized form of classical probability theory that has advantages in automated reasoning for medicine.
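A minimal Python sketch of the hyperbolic (split-complex) arithmetic described above. The class `SplitComplex` and the helper `matmul` are hypothetical illustrations, not part of Q-UEL. The sketch checks that hh = +1 while ii = -1, and that the real 2×2 matrix standing for h is the Pauli X matrix, which, like the Hadamard matrix, squares to the identity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitComplex:
    """A number a + b*h, where the hyperbolic imaginary unit satisfies h*h = +1."""
    a: float
    b: float

    def __add__(self, other):
        return SplitComplex(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + bh)(c + dh) = ac + bd*(hh) + (ad + bc)h, with hh = +1
        return SplitComplex(self.a * other.a + self.b * other.b,
                            self.a * other.b + self.b * other.a)

    def as_matrix(self):
        # Real 2x2 representation: 1 -> identity, h -> [[0, 1], [1, 0]] (Pauli X)
        return [[self.a, self.b], [self.b, self.a]]


def matmul(x, y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


h = SplitComplex(0.0, 1.0)
print(h * h)                          # SplitComplex(a=1.0, b=0.0)
print(1j * 1j)                        # (-1+0j): the ordinary imaginary unit
pauli_x = h.as_matrix()
print(matmul(pauli_x, pauli_x))       # [[1.0, 0.0], [0.0, 1.0]]
i_matrix = [[0, -1], [1, 0]]          # real 2x2 representation of i
print(matmul(i_matrix, i_matrix))     # [[-1, 0], [0, -1]]
s = 1 / math.sqrt(2)
hadamard = [[s, s], [s, -s]]
print(matmul(hadamard, hadamard))     # identity, up to floating-point rounding
```

The matrix form is faithful to the algebra: multiplying the matrices of two split-complex numbers gives the matrix of their product.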
Affiliation(s)
- Barry Robson
- The Dirac Foundation, Oxfordshire, UK; Ingine Inc, USA.
- Jim St Clair
- Linux Foundation Public Health, San Francisco, USA
3. Robson B. Towards faster response against emerging epidemics and prediction of variants of concern. Inform Med Unlocked 2022; 31:100966. [PMID: 35611320 PMCID: PMC9119712 DOI: 10.1016/j.imu.2022.100966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 05/05/2022] [Accepted: 05/11/2022] [Indexed: 01/11/2023]
Abstract
The author, the journal Computers in Biology and Medicine (CBM), and Elsevier Press more generally played a helpful role very early in responding to COVID-19. Within a few days of the appearance of the "Wuhan Seafood isolate" genome on GenBank, the present author posted a bioinformatics study on ResearchGate in January 2020, "Preliminary Bioinformatics Studies on the Design of Synthetic Vaccines and Preventative Peptidomimetic Antagonists against the Wuhan Seafood Market Coronavirus. Possible Importance of the KRSFIEDLLFNKV Motif" DOI: 10.13140/RG.2.2.18275.09761. On February 2nd, 2020, a more thorough analysis was submitted to CBM, e-published on February 26, and formally published in April 2020, at about the same time as the virus, named 2019-nCoV, was identified as essentially SARS and renamed SARS-CoV-2. This was followed by four further papers describing in more detail some previously unreported aspects of the early investigation. The speed of the research and of writing the papers was made possible by knowledge-gathering tools. Based on this and on earlier experience with fast responses to emerging epidemics such as HIV and Mad Cow Disease, it is possible to envisage what a speedier response to emerging epidemics, and to new variants of concern in established epidemics, would look like.
Affiliation(s)
- B Robson
- Ingine Inc., Cleveland, Ohio, USA; The Dirac Foundation, Oxfordshire, UK
4. Searching for the principles of a less artificial A.I. Inform Med Unlocked 2022. [DOI: 10.1016/j.imu.2022.101018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
5. Robson B, Boray S, Weisman J. Mining real-world high dimensional structured data in medicine and its use in decision support. Some different perspectives on unknowns, interdependency, and distinguishability. Comput Biol Med 2021; 141:105118. [PMID: 34971979 DOI: 10.1016/j.compbiomed.2021.105118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/24/2021] [Revised: 11/18/2021] [Accepted: 12/02/2021] [Indexed: 11/03/2022]
Abstract
There are many difficulties in extracting and using knowledge from Real-World Data for medical analytic and predictive purposes, even when the data are already well structured in the manner of a large spreadsheet. Preparative curation and standardization ("normalization") of such data involve a variety of chores, but underlying them is an interrelated set of fundamental problems that can in part be dealt with automatically during the datamining and inference processes. These fundamental problems are reviewed here and illustrated and investigated with examples. They concern the treatment of unknowns, the need to avoid independence assumptions, and the appearance of entries that may not be fully distinguished from each other. Unknowns include errors detected as implausible (e.g., out-of-range) values that are subsequently converted to unknowns. These problems are compounded by high dimensionality and by the sparse data that inevitably arise in high-dimensional datamining, even when the data are extensive. All these considerations are different aspects of incomplete information, though they also relate to problems that arise if care is not taken to avoid or ameliorate the consequences of including the same information twice or more, or of combining misleading or inconsistent information. This paper addresses these aspects from a slightly different perspective, using the Q-UEL language and inference methods based on it, and borrowing some ideas from the mathematics of quantum mechanics and information theory. It takes the view that detecting and correcting the probabilistic elements of knowledge subsequently used in inference need only involve testing and adjusting them so that they satisfy certain extended notions of coherence between probabilities. This is by no means the only possible view; it is explored here and later compared with a related notion of consistency.
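The link between unknowns and coherence can be seen in a small, hypothetical Python illustration. Binary fields are cleaned by converting out-of-range entries to unknowns (`None`), and P(A), P(B), P(A|B) and P(B|A) are then each estimated from whatever records are known for that quantity. Bayes' rule requires P(A|B)P(B) = P(B|A)P(A); the sketch shows that such pairwise-available estimates can violate it, and that restricting to complete cases restores coherence. Complete-case analysis is a simple substitute chosen here for illustration (it discards information), not the correction method of the paper; the data and function names are invented.

```python
from fractions import Fraction

RAW = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 7), (None, 1), (1, 1), (0, None)]  # toy (A, B) pairs


def clean(value, allowed=(0, 1)):
    """Treat implausible (out-of-range) entries as unknown."""
    return value if value in allowed else None


def estimate(records, complete_case=False):
    """Return exact P(A), P(B), P(A|B), P(B|A); None marks an unknown entry."""
    if complete_case:
        records = [(a, b) for a, b in records if a is not None and b is not None]
    a_known = [a for a, _ in records if a is not None]
    b_known = [b for _, b in records if b is not None]
    both = [(a, b) for a, b in records if a is not None and b is not None]
    n_ab = sum(1 for a, b in both if a and b)
    p_a = Fraction(sum(a_known), len(a_known))
    p_b = Fraction(sum(b_known), len(b_known))
    p_a_given_b = Fraction(n_ab, sum(1 for _, b in both if b))
    p_b_given_a = Fraction(n_ab, sum(1 for a, _ in both if a))
    return p_a, p_b, p_a_given_b, p_b_given_a


def coherent(p_a, p_b, p_a_given_b, p_b_given_a):
    """Bayes' rule: both products must equal the same joint probability P(A, B)."""
    return p_a_given_b * p_b == p_b_given_a * p_a


records = [(clean(a), clean(b)) for a, b in RAW]
print(coherent(*estimate(records)))                      # False: 4/9 vs 8/21
print(coherent(*estimate(records, complete_case=True)))  # True: both equal 2/5
```

Exact fractions are used so that the coherence test is an equality rather than a tolerance check.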
Affiliation(s)
- Barry Robson
- Ingine Inc, Ohio, USA; The Dirac Foundation, Oxfordshire, UK.
- J Weisman
- The Dirac Foundation, Oxfordshire, UK.
6. Robson B. Testing machine learning techniques for general application by using protein secondary structure prediction. A brief survey with studies of pitfalls and benefits using a simple progressive learning approach. Comput Biol Med 2021; 138:104883. [PMID: 34598067 DOI: 10.1016/j.compbiomed.2021.104883] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/02/2021] [Revised: 09/05/2021] [Accepted: 09/17/2021] [Indexed: 01/05/2023]
Abstract
Many researchers have recently used the prediction of protein secondary structure (local conformational states of amino acid residues) to test advances in predictive and machine learning technology such as Neural Net Deep Learning. Protein secondary structure prediction continues to be a helpful tool in research in biomedicine and the life sciences, but it is also extremely enticing for testing predictive methods, such as neural nets, that are intended for different or more general purposes. A complication is highlighted here for researchers testing their methods for other applications. Modern protein databases inevitably contain important clues to the answer, so-called "strong buried clues", often in obscure form, and these are hard to avoid. This is because most proteins or parts of proteins in a modern protein database are related to others by biological evolution. For researchers developing machine learning and predictive methods, this can overstate the true quality of a predictive method and so confuse understanding of it. However, for researchers using the algorithms as tools, understanding strong buried clues is of great value, because they need to make maximum use of all available information. A simple method related to the GOR methods, but with some features of neural nets in the sense of progressive learning of large numbers of weights, is used to explore this. It can acquire tens of millions of weights, amounting to gigabytes, but they are learned stably by exhaustive sampling. The significance of the findings is discussed in the light of promising recent results from AlphaFold, developed by Google's DeepMind.
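For readers unfamiliar with the GOR family, here is a minimal GOR-I-style sketch in Python: each residue within ±8 positions of residue j contributes an information value log[P(S | residue at offset m) / P(S)] toward each state S, and the state with the largest sum is predicted. The pseudocount of 1 and the toy training strings are assumptions; this is the classic single-residue GOR idea, not the progressive-learning method of the paper. It also hints at why "strong buried clues" matter: predicting a sequence identical or homologous to the training data rewards memorization, so test sets should be separated from training sets by protein family.

```python
import math
from collections import Counter, defaultdict

HALF_WINDOW = 8  # GOR-style window: residues j-8 ... j+8 (17 positions)


def train(seqs, structs, pseudo=1.0):
    """Learn I(S; residue r at offset m) = log[P(S | r at m) / P(S)] from paired strings."""
    state_counts = Counter()
    pair_counts = defaultdict(Counter)  # (offset, residue) -> Counter of observed states
    for seq, ss in zip(seqs, structs):
        for j, state in enumerate(ss):
            state_counts[state] += 1
            for m in range(-HALF_WINDOW, HALF_WINDOW + 1):
                if 0 <= j + m < len(seq):
                    pair_counts[(m, seq[j + m])][state] += 1
    states = sorted(state_counts)
    total = sum(state_counts.values())
    info = {}
    for key, counts in pair_counts.items():
        n = sum(counts.values())
        for state in states:
            p_cond = (counts[state] + pseudo) / (n + pseudo * len(states))  # smoothed
            info[(key, state)] = math.log(p_cond / (state_counts[state] / total))
    return states, info


def predict(seq, states, info):
    """Sum information over the window and pick the highest-scoring state per residue."""
    predicted = []
    for j in range(len(seq)):
        scores = dict.fromkeys(states, 0.0)
        for m in range(-HALF_WINDOW, HALF_WINDOW + 1):
            if 0 <= j + m < len(seq):
                for state in states:
                    scores[state] += info.get(((m, seq[j + m]), state), 0.0)  # unseen -> 0
        predicted.append(max(states, key=scores.get))
    return "".join(predicted)


states, info = train(["AAAAAAAA", "VVVVVVVV"], ["HHHHHHHH", "EEEEEEEE"])
print(predict("AAAAAAAA", states, info))  # HHHHHHHH
```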
Affiliation(s)
- Barry Robson
- Ingine Inc. Ohio, USA and the Dirac Foundation Oxfordshire, UK.
7. Robson B. The use of knowledge management tools in viroinformatics. Example study of a highly conserved sequence motif in Nsp3 of SARS-CoV-2 as a therapeutic target. Comput Biol Med 2020; 125:103963. [PMID: 32828990 PMCID: PMC7424310 DOI: 10.1016/j.compbiomed.2020.103963] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/06/2020] [Revised: 08/07/2020] [Accepted: 08/07/2020] [Indexed: 12/16/2022]
Abstract
Knowledge management tools that assist in systematic review and exploration of scientific knowledge are of obvious potential importance in evidence-based medicine generally, but also in the design of therapeutics based on the protein subsequences and fold motifs of virus proteins, as considered here. Rapid access to bundles (clusters) of related elements of knowledge, gathered from diverse sources on the Internet and from growing knowledge repositories, seems particularly helpful when exploring less obvious therapeutic targets in viruses (for which knowledge new to the researcher is important), and when using the following concept. Subsequences of amino acid residue sequences of proteins that are conserved across strains and species are (a) more likely to be important targets and (b) less likely to exhibit escape mutations that would make them resistant to vaccines and therapeutic agents. However, the terms "conserved" and even "highly conserved" used by authors are matters of degree, depending on how distant from SARS-CoV-2 they wished to go in comparing other sequences. The binding site for the human ACE2 protein as virus receptor and the human antibody CR3022 binding site on the spike glycoprotein are rather variable by the criteria used in the present and preceding studies. To look for more strongly conserved targets, open reading frames of SARS-CoV-2 were examined for extremely highly conserved regions, meaning regions recognizable across many viruses and organisms. Most prominent is a motif found in SARS-CoV-2 non-structural protein 3 (Nsp3). It relates to a fold type called the macro domain, which has a remarkably wide distribution across organisms including humans, with significant homologies involving three especially conserved subsequences: (a) VVVNAANVYLKHGGGVAGALNK, (b) LHVVGPNVNKG, and (c) PLLSAGIFG.
Careful study of the variations of these and of the more variable sequences between and around them might provide a finer "scalpel" to ensure inhibition of a vital function of the virus without impairing the functions of related host macro domains.
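Because "conserved" is a matter of degree, it helps to quantify it. One common choice, assumed here and not necessarily the measure used in the paper, is the Shannon entropy of each column of a multiple sequence alignment: 0 bits means every sequence carries the same residue at that position. The Python sketch below computes an entropy profile and reports runs of fully conserved columns; the three aligned strings are made-up placeholders, not real viral or host sequences.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column (0.0 = perfectly conserved)."""
    n = len(column)
    return 0.0 - sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def conservation_profile(aligned):
    """Per-column entropy for equal-length (pre-aligned) sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must already be aligned to equal length")
    return [column_entropy(col) for col in zip(*aligned)]


def conserved_runs(profile, max_entropy=0.0, min_len=3):
    """Half-open (start, end) ranges of at least min_len columns with entropy <= max_entropy."""
    runs, start = [], None
    for i, h in enumerate(profile + [math.inf]):  # sentinel closes a trailing run
        if h <= max_entropy and start is None:
            start = i
        elif h > max_entropy and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


ALIGNED = ["ACDEFGHIK", "ACDEYGHIK", "ACDQFGHIK"]  # hypothetical placeholder alignment
print(conserved_runs(conservation_profile(ALIGNED)))  # [(0, 3), (5, 9)]
```

Raising `max_entropy` above zero relaxes "conserved" to "nearly conserved", which mirrors the point that the term depends on how distant the compared sequences are.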
Affiliation(s)
- B. Robson
- Ingine Inc., Cleveland, OH, USA; The Dirac Foundation, Oxfordshire, UK
8. Robson B. Bioinformatics studies on a function of the SARS-CoV-2 spike glycoprotein as the binding of host sialic acid glycans. Comput Biol Med 2020; 122:103849. [PMID: 32658736 PMCID: PMC7278709 DOI: 10.1016/j.compbiomed.2020.103849] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 05/07/2020] [Revised: 06/04/2020] [Accepted: 06/04/2020] [Indexed: 02/08/2023]
Abstract
SARS-CoV and SARS-CoV-2 do not appear to have hemagglutinin and neuraminidase functions. This is a mystery, because sugar-binding activities appear essential to many other viruses, including influenza and even most other coronaviruses, in order to bind to and escape from the glycans (sugars, oligosaccharides or polysaccharides) characteristic of cell surfaces, saliva and mucin. The S1 N-terminal domains (S1-NTD) of the spike protein, largely responsible for the bulk of the characteristic knobs at the ends of the spikes of SARS-CoV and SARS-CoV-2, are here predicted to be “hiding” sites for recognizing and binding glycans containing sialic acid. This may be important for infection and for the ability of the virus to locate ACE2, its known main host cell-surface receptor; if so, it becomes a pharmaceutical target. It might even open up the possibility of an alternative receptor to ACE2. The prediction method developed, which uses amino acid residue sequence alone to predict domains or proteins that bind to sialic acids, is naïve and will be advanced in future work. Nonetheless, it was surprising that such a very simple approach was so useful, and it can easily be reproduced in very few lines of computer program to help make quick comparisons between SARS-CoV-2 sequences and to consider the effects of viral mutations. This paper extends the studies of the author's previous SARS-CoV-2 papers. Designing vaccines and drugs must seek to avoid escape mutations. Strangely, SARS-CoV and SARS-CoV-2 appear to lack sialic acid binding functions. Sequence motifs are found, but they require a simple prediction method.
Affiliation(s)
- B Robson
- Ingine Inc. Cleveland Ohio USA and the Dirac Foundation, Oxfordshire, UK.
9. Robson B. Computers and viral diseases. Preliminary bioinformatics studies on the design of a synthetic vaccine and a preventative peptidomimetic antagonist against the SARS-CoV-2 (2019-nCoV, COVID-19) coronavirus. Comput Biol Med 2020; 119:103670. [PMID: 32209231 PMCID: PMC7094376 DOI: 10.1016/j.compbiomed.2020.103670] [Citation(s) in RCA: 126] [Impact Index Per Article: 31.5] [Received: 02/02/2020] [Revised: 02/17/2020] [Accepted: 02/17/2020] [Indexed: 12/19/2022]
Abstract
This paper concerns a study of the genome of the Wuhan Seafood Market isolate believed to represent the causative agent of the disease COVID-19. The aim is to find a short section or sections of viral protein sequence suitable for a preliminary design proposal for a peptide synthetic vaccine and a peptidomimetic therapeutic, and to explore some design possibilities. The project was originally directed towards a use case for the Q-UEL language and its implementation in a knowledge management and automated inference system for medicine called the BioIngine, but the focus here remains mostly on the virus itself. However, using Q-UEL systems to access relevant and emerging literature, and to interact with standard publicly available bioinformatics tools on the Internet, did help quickly identify sequences of amino acids that are well conserved across many coronaviruses including 2019-nCoV. KRSFIEDLLFNKV was found to be particularly well conserved in this study and corresponds to the region around one of the known cleavage sites of the SARS virus that are believed to be required for virus activation for cell entry. This sequence motif and surrounding variations formed the basis for proposing a specific synthetic vaccine epitope and peptidomimetic agent. The work can nonetheless be described in traditional bioinformatics terms and readily reproduced by others, albeit with the caveat that new data and research into 2019-nCoV are emerging and evolving at an explosive pace. Preliminary studies using molecular modeling and docking, and in that context the potential value of certain known herbal extracts, are also described. Bioinformatics studies are carried out on the COVID-19 virus. A sequence motif KRSFIEDLLFNKV is of particular interest. Based on the above, synthetic peptides are designed. Preliminary consideration is also given to non-peptide organic molecules.
Affiliation(s)
- B Robson
- Ingine Inc., Cleveland, Ohio, USA; The Dirac Foundation, Oxfordshire, UK.
10. Robson B. Extension of the Quantum Universal Exchange Language to precision medicine and drug lead discovery. Preliminary example studies using the mitochondrial genome. Comput Biol Med 2020; 117:103621. [PMID: 32072972 DOI: 10.1016/j.compbiomed.2020.103621] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/10/2019] [Revised: 01/12/2020] [Accepted: 01/12/2020] [Indexed: 12/21/2022]
Abstract
The Quantum Universal Exchange Language (Q-UEL), based on Dirac notation and algebra from quantum mechanics, along with its associated data mining and the Hyperbolic Dirac Net (HDN) for probabilistic inference, has proven to be a useful architectural principle for knowledge management, analysis and prediction systems in medicine. It has been described in several papers; its extension to clinical genomics and precision medicine is described here. Two use cases are studied: (a) bioinformatics in clinical decision support, especially for risk of type 2 diabetes using patients' mitochondrial DNA sequences, and (b) bioinformatics and computational biology (conformational) research examples related to drug discovery involving the recently discovered class of mitochondrial-derived peptides (MDPs). MDPs were surprising when first discovered as coded in small open reading frames (sORFs), and are emerging as having a fundamental role in metabolic control, longevity and disease. This project originally represented a language specification study relating to what genomics information is essential or useful to carry, and what processing will be needed. However, novel aspects introduced or discovered include the HDN-like neural nets and their use, along with more established methods, for prediction of type 2 diabetes, and in particular proposals of over 80 natural MDPs, most of which had not been described at the time of the study, as potential drug lead targets. Also, use of many medical records with simulated joining of mtDNA as performance tests led to some insightful observations regarding the behavior of HDN predictions where independent factors are involved.
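MDPs are encoded in small open reading frames, and mitochondrial DNA uses a different genetic code from the nuclear one. A minimal Python sketch of a forward-strand sORF scan, assuming NCBI translation table 2 (vertebrate mitochondrial), in which TAA, TAG, AGA and AGG are read as stops and TGA encodes tryptophan. Only ATG is used as a start by default (table 2 also permits alternative starts such as ATA), and the 10 to 100 codon length window is an illustrative assumption, not a criterion taken from the paper. A real scan would also cover the reverse strand.

```python
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}  # NCBI table 2; TGA is read as Trp, not stop


def find_sorfs(dna, starts=("ATG",), min_codons=10, max_codons=100):
    """Forward-strand ORFs as (start, end, n_codons); n_codons counts the start, not the stop."""
    dna = dna.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] in starts:
                stop = next((j for j in range(i + 3, len(dna) - 2, 3)
                             if dna[j:j + 3] in MITO_STOPS), None)
                if stop is None:
                    break  # no in-frame stop before the end: no complete ORF left in this frame
                n_codons = (stop - i) // 3
                if min_codons <= n_codons <= max_codons:
                    hits.append((i, stop + 3, n_codons))
                i = stop + 3  # resume after the stop, skipping nested in-frame starts
            else:
                i += 3
    return sorted(hits)


demo = "CC" + "ATG" + "GCT" * 10 + "AGA" + "TT"
print(find_sorfs(demo))  # [(2, 38, 11)]
```

Under the standard nuclear code, AGA would be read as arginine and this ORF would not end there, which is why the codon table must match the genome being scanned.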
Affiliation(s)
- Barry Robson
- Ingine Inc., Delaware, USA; The Dirac Foundation, Oxfordshire, UK.
11. Robson B, Boray S. Studies in the use of data mining, prediction algorithms, and a universal exchange and inference language in the analysis of socioeconomic health data. Comput Biol Med 2019; 112:103369. [PMID: 31377681 DOI: 10.1016/j.compbiomed.2019.103369] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/10/2019] [Revised: 07/22/2019] [Accepted: 07/23/2019] [Indexed: 12/18/2022]
Abstract
While clinical and biomedical information in digital form has been escalating, it is socioeconomic factors that are important determinants of health on the national and global scale. We show how the collective use of data mining and prediction algorithms to analyze socioeconomic population health data can stand beside classical correlation analysis in routine data analysis. The underlying theoretical basis is the Dirac notation and algebra, which is a scientific standard but unusual outside the physical sciences, combined with a theory of expected information first developed for analyzing sparse data but still largely confined to bioinformatics. The latter was important here because the records analyzed (which are for US counties and equivalents, not patients) are very few by contemporary data mining standards. The approach is very unlikely to be familiar to socioeconomic researchers, so the theory and the advantages of our inference nets over the Bayes Net are reviewed here, mostly using socioeconomic examples. While our expertise and focus are on novel analytical methods rather than socioeconomics per se, a significant negative (countertrending) relationship between population health and equity was initially surprising, at least to the present authors. This encouraged deeper exploration, including of the relationship between our data mining methods and traditional Pearson's correlation. The latter can lead to wrong conclusions if a phenomenon called Simpson's paradox applies, so this is also investigated. Also discussed is that, even for very few records, associative data mining can still demand significant computational resources due to a combinatorial explosion.
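Simpson's paradox is easy to reproduce with Pearson's correlation. In this Python sketch the two groups of points are invented (they are not county data): within each group the correlation is exactly -1, yet pooling the groups gives a strongly positive correlation because the groups sit at different levels. The last lines illustrate the combinatorial point: the number of attribute combinations to examine grows with the number of attributes, not with the number of records.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


group_1 = ([1, 2, 3], [3, 2, 1])
group_2 = ([6, 7, 8], [9, 8, 7])
pooled = (group_1[0] + group_2[0], group_1[1] + group_2[1])

print(pearson(*group_1), pearson(*group_2))  # -1.0 -1.0
print(round(pearson(*pooled), 3))            # 0.836


def n_combinations(n_attributes, max_size):
    """Number of attribute combinations of size 1..max_size."""
    return sum(math.comb(n_attributes, k) for k in range(1, max_size + 1))


print(n_combinations(100, 3))  # 166750, however many records there are
```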
Affiliation(s)
- Barry Robson
- Ingine Inc. Virginia, USA and the Dirac Foundation Oxfordshire, UK.
- S Boray
- Ingine Inc. Virginia, USA and the Dirac Foundation Oxfordshire, UK
12. Robson B. Bidirectional General Graphs for inference. Principles and implications for medicine. Comput Biol Med 2019; 108:382-399. [PMID: 31075569 DOI: 10.1016/j.compbiomed.2019.04.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/18/2019] [Revised: 04/03/2019] [Accepted: 04/04/2019] [Indexed: 12/17/2022]
Abstract
Probabilistic inference methods require a more general and realistic description of the world as a Bidirectional General Graph (BGG). While in its original form the Bayes Net (BN) has been promoted as a predictive tool, it is more immediately a way of testing a hypothesis or model about interactions in a system, usually considered on a causal basis. Once established, the model can be used in a predictive way, but the problem is that for a traditional BN the hypotheses or models that can be formed are, by definition, limited to the Directed Acyclic Graph (DAG). Three interrelated features are highlighted that represent deficiencies of the DAG and are corrected by conversion to a method based on a BGG: (i) lack of intrinsic representation of coherence by Bayes' rule, (ii) relatedly, the need to consider interdependence in parent nodes, and (iii) the need for management of a property called recurrence. These deficiencies can represent large errors in absolute estimates of probabilities, and while relative and renormalized probabilities ameliorate that, they can often make much of a net superfluous through cancellations by division. The Hyperbolic Dirac Net (HDN), based on Dirac's quantum mechanics, is a solution that led naturally to avoiding these deficiencies. It encodes bidirectional probabilities in an h-complex value rediscovered by Dirac, i.e. with the imaginary number h such that hh = +1. Properties of the HDN described previously are reviewed (though emphasis is on descriptions in familiar probability terms), the issue of recurrence is introduced, methods of construction are simplified, and the severity of the quantitative differences between BNs and analogous HDNs is exemplified. There is also discussion of how results compare with other approaches in practice.
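A minimal Python sketch of how a single h-complex value can carry two conditional probabilities at once. We assume the encoding ⟨A|B⟩ = P(A|B)·(1 + h)/2 + P(B|A)·(1 − h)/2, so the real part is the mean of the two directions and the h part is half their difference; this is an illustrative choice based on split-complex algebra, and the class `HValue` is hypothetical rather than the HDN implementation. Because (1 + h)/2 and (1 − h)/2 are idempotents whose product is zero, multiplying two such values multiplies the forward and backward probabilities separately. The helper `is_coherent` checks Bayes' rule, P(A|B)P(B) = P(B|A)P(A).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HValue:
    """real + imag*h with h*h = +1, used here to hold a forward/backward probability pair."""
    real: float
    imag: float

    @classmethod
    def bidirectional(cls, forward, backward):
        # forward*(1 + h)/2 + backward*(1 - h)/2
        return cls((forward + backward) / 2, (forward - backward) / 2)

    def __mul__(self, other):
        return HValue(self.real * other.real + self.imag * other.imag,
                      self.real * other.imag + self.imag * other.real)

    @property
    def forward(self):
        return self.real + self.imag   # recovers the (1 + h)/2 component

    @property
    def backward(self):
        return self.real - self.imag   # recovers the (1 - h)/2 component


def is_coherent(value, p_a, p_b, tol=1e-12):
    """Bayes' rule for <A|B>: P(A|B) * P(B) must equal P(B|A) * P(A)."""
    return math.isclose(value.forward * p_b, value.backward * p_a, abs_tol=tol)


a_b = HValue.bidirectional(0.8, 0.4)   # P(A|B) = 0.8, P(B|A) = 0.4
b_c = HValue.bidirectional(0.5, 0.6)   # P(B|C) = 0.5, P(C|B) = 0.6
chain = a_b * b_c                      # forward: P(A|B)P(B|C); backward: P(C|B)P(B|A)
print(round(chain.forward, 6), round(chain.backward, 6))  # 0.4 0.24
print(is_coherent(a_b, p_a=0.4, p_b=0.2))  # True
print(is_coherent(a_b, p_a=0.2, p_b=0.4))  # False
```

The chained product is shown only to demonstrate the algebra; whether such a product is a valid estimate of a longer-range conditional probability depends on independence assumptions that the paper discusses.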
Affiliation(s)
- Barry Robson
- Ingine Inc. Virginia, USA; The Dirac Foundation, Oxfordshire, UK.
13. Studies in the extensively automatic construction of large odds-based inference networks from structured data. Examples from medical, bioinformatics, and health insurance claims data. Comput Biol Med 2018; 95:147-166. [PMID: 29500985 DOI: 10.1016/j.compbiomed.2018.02.013] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/01/2018] [Revised: 02/19/2018] [Accepted: 02/19/2018] [Indexed: 12/11/2022]
Abstract
Theoretical and methodological principles are presented for the construction of very large inference nets for odds calculations, composed of hundreds, many thousands, or more elements, generated in this paper by structured data mining. It is argued that the usual small inference nets can sometimes represent rather simple, arbitrary estimates. Examples of applications in clinical and public health data analysis, medical claims data and detection of irregular entries, and bioinformatics data are presented. Construction of large nets benefits from application of a theory of expected information for sparse data and of the Dirac notation and algebra. The extent to which these are important here is briefly discussed. Purposes of the study include (a) exploration of the properties of large inference nets and of perturbation and tacit conditionality models, (b) using these to propose simpler models, including one that a physician could use routinely, analogous to a "risk score", (c) examination of the merit of describing optimal performance by a single measure that combines accuracy, specificity, and sensitivity in place of a ROC curve, and (d) relationship to methods for detecting anomalous and potentially fraudulent data.
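Odds-based inference is convenient because evidence combines multiplicatively on the odds scale, i.e. additively in log-odds, which is what makes a points-style "risk score" possible. The Python sketch below uses the textbook likelihood-ratio update, which assumes the findings are conditionally independent given the condition; large nets built from data mining aim to avoid exactly that assumption, so treat this as the simplest baseline rather than the paper's method. It also computes accuracy, sensitivity and specificity from a confusion matrix; the paper's single combined measure is not reproduced because its formula is not given in the abstract. All numbers are invented.

```python
import math


def posterior_odds(prior_prob, likelihood_ratios):
    """Odds(D | findings) = Odds(D) * LR_1 * ... * LR_n, assuming conditional independence."""
    log_odds = math.log(prior_prob / (1 - prior_prob))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)  # additive, like score points
    return math.exp(log_odds)


def odds_to_prob(odds):
    return odds / (1 + odds)


def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


odds = posterior_odds(0.10, [3.0, 2.0])   # prior 10%, two positive findings
print(round(odds_to_prob(odds), 3))       # 0.4
print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))
```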
14. Robson B. Studies in using a universal exchange and inference language for evidence based medicine. Semi-automated learning and reasoning for PICO methodology, systematic review, and environmental epidemiology. Comput Biol Med 2016; 79:299-323. [PMID: 27846446 DOI: 10.1016/j.compbiomed.2016.10.009] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 07/20/2016] [Revised: 09/28/2016] [Accepted: 10/11/2016] [Indexed: 11/24/2022]
Abstract
The Q-UEL language of XML-like tags and the associated software applications are providing a valuable toolkit for Evidence Based Medicine (EBM). In this paper the already existing applications, data bases, and tags are brought together with new ones. The particular Q-UEL embodiment used here is the BioIngine. The main challenge is one of bringing together the methods of symbolic reasoning and calculative probabilistic inference that underlie EBM and medical decision making. Some space is taken to review this background. The unification is greatly facilitated by Q-UEL's roots in the notation and algebra of Dirac, and by extending Q-UEL into the Wolfram programming environment. Further, the overall problem of integration is also a relatively simple one because of the nature of Q-UEL as a language for interoperability in healthcare and biomedicine, while the notion of workflow is facilitated because of the EBM best practice known as PICO. What remains difficult is achieving a high degree of overall automation because of a well-known difficulty in capturing human expertise in computers: the Feigenbaum bottleneck.
Affiliation(s)
- Barry Robson
- Ingine Inc. Delaware, USA, and The Dirac Foundation Clg, Oxfordshire, UK; St. Matthew's University School of Medicine, Cayman Islands, UK.
15. Robson B, Boray S. Studies of the role of a smart web for precision medicine supported by biobanking. Per Med 2016; 13:361-380. [DOI: 10.2217/pme-2015-0012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
The extraction of medical knowledge both by data mining many patient records and from authoritative natural language text on the Internet is important for clinical decision support and biomedical research. The samples in biobanks represent a further kind of information repository of recognized and increasing importance, so mechanisms being developed for a smart web for medicine should take them into account. While this paper is primarily a review of the Quantum Universal Exchange Language as an XML extension to enable a future smart web for healthcare and biomedicine, it is the first time that we have discussed the connection with biobanks and the design of the Quantum Universal Exchange Language's XML-like tags to support their use.
Affiliation(s)
- Barry Robson
- Ingine Inc. 46581 Riverwood Terrace, Potomac Falls, VA 20165 AND DE, USA
- The Dirac Foundation clg, Oxfordshire, UK
- St Matthew's University, Grand Cayman, USA
- The University of Wisconsin Stout, USA
- Srinidhi Boray
- Ingine Inc. 46581 Riverwood Terrace, Potomac Falls, VA 20165 AND DE, USA
16. Robson B, Boray S. Data-mining to build a knowledge representation store for clinical decision support. Studies on curation and validation based on machine performance in multiple choice medical licensing examinations. Comput Biol Med 2016; 73:71-93. [PMID: 27089305 PMCID: PMC7094475 DOI: 10.1016/j.compbiomed.2016.02.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 11/18/2015] [Revised: 02/05/2016] [Accepted: 02/17/2016] [Indexed: 11/23/2022]
Abstract
Extracting medical knowledge by structured data mining of many medical records and from unstructured data mining of natural language source text on the Internet will become increasingly important for clinical decision support. Output from these sources can be transformed into large numbers of elements of knowledge in a Knowledge Representation Store (KRS), here using the notation and to some extent the algebraic principles of the Q-UEL Web-based universal exchange and inference language described previously, rooted in Dirac notation from quantum mechanics and linguistic theory. In a KRS, semantic structures or statements about the world of interest to medicine are analogous to natural language sentences seen as formed from noun phrases separated by verbs, prepositions and other descriptions of relationships. A convenient method of testing and better curating these elements of knowledge is by having the computer use them to take the test of a multiple choice medical licensing examination. It is a venture which perhaps tells us almost as much about the reasoning of students and examiners as it does about the requirements for Artificial Intelligence as employed in clinical decision making. It emphasizes the role of context and of contextual probabilities as opposed to the more familiar intrinsic probabilities, and of a preliminary form of logic that we call presyllogistic reasoning.
Affiliation(s)
- Barry Robson
- Ingine Inc., DE, USA; The Dirac Foundation clg, Oxfordshire, UK; St. Matthew's University School of Medicine, Cayman Islands.
- Srinidhi Boray
- Ingine Inc., DE, USA; The Dirac Foundation clg, Oxfordshire, UK
17. Robson B, Boray S. Implementation of a web based universal exchange and inference language for medicine: Sparse data, probabilities and inference in data mining of clinical data repositories. Comput Biol Med 2015; 66:82-102. [PMID: 26386548 DOI: 10.1016/j.compbiomed.2015.07.015] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 04/06/2015] [Revised: 07/08/2015] [Accepted: 07/17/2015] [Indexed: 11/19/2022]
Abstract
We extend Q-UEL, our universal exchange language for interoperability and inference in healthcare and biomedicine, to the more traditional field of public health surveys. These are the type associated with screening, epidemiological and cross-sectional studies, and cohort studies, in some cases similar to clinical trials. A challenge is that there is some degree of split between (a) frequentist notions of probability as classical measures based only on the idea of counting and proportion, and on classical biostatistics as used in the above conservative disciplines, and (b) more subjectivist notions of uncertainty, belief, reliability, or confidence often used in automated inference and decision support systems. Samples in the above kind of public health survey are typically small compared with our earlier "Big Data" mining efforts. An issue addressed here is how much impact sparse data should have on decisions. We describe a new Q-UEL-compatible toolkit, including a data analytics application DiracMiner that also delivers more standard biostatistical results, DiracBuilder that uses its output to build Hyperbolic Dirac Nets (HDN) for decision support, and HDNcoherer that ensures that probabilities are mutually consistent. Use is exemplified by participation in a real-world health-screening project, and also by deployment in an industrial platform called the BioIngine, a cognitive computing platform for health management.
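The question of how much weight sparse data should carry can be illustrated with a simple Bayesian shrinkage estimate. This Python sketch substitutes a beta-binomial posterior mean (a pseudocount method) for the expected-information theory used in the Q-UEL tools, so it shows the general idea rather than DiracMiner's calculation; the prior mean of 0.5 and prior strength of 2 are arbitrary assumptions. With few observations the estimate stays close to the prior (2 successes out of 2 gives 0.75 rather than 1.0), and with many observations it approaches the observed proportion.

```python
def shrunk_probability(successes, trials, prior_mean=0.5, prior_strength=2.0):
    """Beta-binomial posterior mean: (k + a) / (n + a + b), where a + b = prior_strength."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (successes + a) / (trials + a + b)


for k, n in [(2, 2), (4, 5), (800, 1000)]:
    print(f"{k}/{n}: raw={k / n:.3f}  shrunk={shrunk_probability(k, n):.3f}")
```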
Affiliation(s)
- Barry Robson
- The Dirac Foundation clg, Oxfordshire, UK; St. Matthew's University School of Medicine, Cayman Islands. http://www.diractfoundation.org
- Srinidhi Boray
- The Dirac Foundation clg, Oxfordshire, UK; Ingine Inc., Potomac Falls, VA 20165, USA. http://www.ingine.com
18. Robson B, Caruso TP, Balis UG. Suggestions for a web based universal exchange and inference language for medicine. Continuity of patient care with PCAST disaggregation. Comput Biol Med 2015; 56:51-66. [DOI: 10.1016/j.compbiomed.2014.10.022] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/23/2014] [Revised: 10/23/2014] [Accepted: 10/25/2014] [Indexed: 10/24/2022]
19. Robson B. POPPER, a simple programming language for probabilistic semantic inference in medicine. Comput Biol Med 2015; 56:107-123. [DOI: 10.1016/j.compbiomed.2014.10.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/30/2014] [Revised: 09/19/2014] [Accepted: 10/12/2014] [Indexed: 11/29/2022]