1
|
Freites A, Corbett P, Rongier G, Geiger S. Automated Classification of Well Test Responses in Naturally Fractured Reservoirs Using Unsupervised Machine Learning. Transp Porous Media 2023. [DOI: 10.1007/s11242-023-01929-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
Abstract
Understanding the impact of fractures on fluid flow is fundamental for developing geoenergy reservoirs. Pressure transient analysis could play a key role for fracture characterization purposes if better links can be established between the pressure derivative responses (p′) and the fracture properties. However, pressure transient analysis is particularly challenging in the presence of fractures because they can manifest themselves in many different p′ curves. In this work, we aim to provide a proof-of-concept machine learning approach that allows us to effectively handle the diversity in fracture-related p′ curves by automatically classifying them and identifying the characteristic fracture patterns. We created a synthetic dataset from numerical simulation that comprised 2560 p′ curves that represent a wide range of fracture network properties. We developed an unsupervised machine learning approach that can distinguish the temporal variations in the p′ curves by combining dynamic time warping with k-medoids clustering. Our results suggest that the approach is effective at recognizing similar shapes in the p′ curves if the second pressure derivatives are used as the classification variable. Our analysis indicated that 12 clusters were appropriate to describe the full collection of p′ curves in this particular dataset. The classification exercise also allowed us to identify the key geological features that influence the p′ curves in this particular dataset, namely (1) the distance from the wellbore to the closest fracture(s), (2) the local/global fracture connectivity, and (3) the local/global fracture intensity. With additional training data to account for a broader range of fracture network properties, the proposed classification method could be expanded to other naturally fractured reservoirs and eventually serve as an interpretation framework for understanding how complex fracture network properties impact pressure transient behaviour.
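The combination of dynamic time warping (DTW) and k-medoids clustering named in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy curves, the finite-difference second derivative, the farthest-first initialisation and the choice of two clusters (the abstract reports 12 for its dataset) are all assumptions made for illustration.

```python
import math

def second_derivative(curve):
    """Central second difference, standing in for the second pressure derivative."""
    return [curve[i - 1] - 2 * curve[i] + curve[i + 1] for i in range(1, len(curve) - 1)]

def dtw_distance(a, b):
    """Classic dynamic time warping with an absolute-difference local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # row 0 of the cumulative cost table
    for x in a:
        curr = [inf]                        # column 0 is unreachable after row 0
        for j, y in enumerate(b, start=1):
            curr.append(abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return prev[-1]

def k_medoids(dist, k, n_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    # Deterministic farthest-first initialisation (an assumption of this sketch).
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(n_iter):
        # Assign every curve to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # The new medoid is the member with the smallest total distance to its cluster.
            best = min(members, key=lambda i: sum(dist[i][j] for j in members)) if members else medoids[c]
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Hypothetical toy curves: three time-shifted dips and three time-shifted steps.
n = 60
dips = [[-math.exp(-((x - c) / 4.0) ** 2) for x in range(n)] for c in (20, 25, 30)]
steps = [[math.tanh((x - c) / 4.0) for x in range(n)] for c in (20, 25, 30)]
features = [second_derivative(curve) for curve in dips + steps]
dist = [[dtw_distance(a, b) for b in features] for a in features]
medoids, labels = k_medoids(dist, k=2)
print(labels)
```

Because DTW tolerates shifts along the time axis, curves with the same shape but different timing end up close together, which is the property the abstract relies on. K-medoids suits this setting because it needs only pairwise distances, not coordinates in a vector space.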
|
2
|
Abstract
Characterisation of gender differences throughout peer-review publication process as revealed by thorough analysis of Royal Society of Chemistry submissions, publications and citation data.
The Royal Society of Chemistry is committed to investigating and addressing the barriers and biases that women face in the chemical sciences. The cornerstone of this is a thorough analysis of data regarding submissions, review and citations for Royal Society of Chemistry journals from January 2014 until July 2018, since the number and impact of publications and citations are important factors when seeking research funding and for academic career progression. We have applied standard statistical techniques to multiple data sources to perform this analysis, and have investigated whether interactions between variables are significant in affecting various outcomes (author gender; reviewer gender; reviewer recommendations and submission outcome) in addition to considering variables individually. By considering several different data sources, we found that approximately a third of chemistry researchers are female overall, although this varies considerably by chemistry sub-discipline. Rather than one dominant bias effect, we observe complex interactions and a gradual trickle-down decrease in this female percentage through the publishing process, with each of the following percentages lower than the one before: authors of submissions; authors of RSC submissions which are not rejected without peer review; authors of accepted RSC publications; authors of cited articles. The rate at which female authors progress through each of these publishing stages is lower than that for male authors. There is a decreasing female percentage when progressing from first authors to corresponding authors to reviewers, reflecting the decreasing female percentage with seniority in Chemistry research observed in the “Diversity landscape of the chemical sciences” report. Highlights and actions from this analysis form the basis of an accompanying report to be released by the Royal Society of Chemistry.
Affiliation(s)
- A E Day
- Royal Society of Chemistry, Thomas Graham House (290), Science Park, Milton Road, Cambridge, CB4 0WF, UK.
- P Corbett
- Royal Society of Chemistry, Thomas Graham House (290), Science Park, Milton Road, Cambridge, CB4 0WF, UK.
- J Boyle
- Royal Society of Chemistry, Thomas Graham House (290), Science Park, Milton Road, Cambridge, CB4 0WF, UK.
|
3
|
Abstract
Chemical named entity recognition (NER) has traditionally been dominated by conditional random fields (CRF)-based approaches, but given the success of the artificial neural network techniques known as “deep learning”, we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using rich per-token features and neural word embeddings, and producing a sequence of tags using bidirectional long short-term memory (LSTM) networks, a type of recurrent neural net. The second system eschews the rich feature set, and even tokenisation, in favour of character labelling using neural character embeddings and multiple LSTM layers. The third system is an ensemble that combines the results of the first two systems. Our original BioCreative V.5 competition entry was placed in the top group with the highest F scores, and subsequent experiments using transfer learning have achieved a final F score of 90.33% on the test data (precision 91.47%, recall 89.21%).
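As a quick consistency check on the figures quoted in this abstract, the sketch below computes the F score from precision and recall. It assumes the reported F score is the balanced F1 (the harmonic mean of precision and recall); the abstract does not state this explicitly, and `f_score` is a hypothetical helper name.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as quoted in the abstract, in percent.
f1 = f_score(91.47, 89.21)
print(f"{f1:.2f}")  # 90.33
```

Under that assumption, the result agrees with the 90.33% reported above.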
Affiliation(s)
- Peter Corbett
- Data Science Group, Technology Department, The Royal Society of Chemistry, Cambridge, UK.
- John Boyle
- Data Science Group, Technology Department, The Royal Society of Chemistry, Cambridge, UK.
|
4
|
Corbett P, Boyle J. Improving the learning of chemical-protein interactions from literature using transfer learning and specialized word embeddings. Database (Oxford) 2018; 2018:5053190. [PMID: 30010749 PMCID: PMC6044291 DOI: 10.1093/database/bay066] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 02/01/2018] [Revised: 06/11/2018] [Accepted: 06/11/2018] [Indexed: 12/20/2022]
Abstract
In this paper, we explore the application of artificial neural network ('deep learning') methods to the problem of detecting chemical-protein interactions in PubMed abstracts. We present here a system using multiple Long Short Term Memory layers to analyse candidate interactions, to determine whether there is a relation and which type. A particular feature of our system is the use of unlabelled data, both to pre-train word embeddings and also pre-train LSTM layers in the neural network. On the BioCreative VI CHEMPROT test corpus, our system achieves an F score of 61.51% (56.10% precision, 67.84% recall).
Affiliation(s)
- P Corbett
- Data Science Group, Technology Department, The Royal Society of Chemistry, Thomas Graham House (290), Science Park, Milton Road, Cambridge CB4 0WF, UK
- J Boyle
- Data Science Group, Technology Department, The Royal Society of Chemistry, Thomas Graham House (290), Science Park, Milton Road, Cambridge CB4 0WF, UK
|
5
|
Rebholz-Schuhmann D, Jimeno Yepes A, Li C, Kafkas S, Lewin I, Kang N, Corbett P, Milward D, Buyko E, Beisswanger E, Hornbostel K, Kouznetsov A, Witte R, Laurila JB, Baker CJ, Kuo CJ, Clematide S, Rinaldi F, Farkas R, Móra G, Hara K, Furlong LI, Rautschka M, Neves ML, Pascual-Montano A, Wei Q, Collier N, Chowdhury MFM, Lavelli A, Berlanga R, Morante R, Van Asch V, Daelemans W, Marina JL, van Mulligen E, Kors J, Hahn U. Assessment of NER solutions against the first and second CALBC Silver Standard Corpus. J Biomed Semantics 2011; 2 Suppl 5:S11. [PMID: 22166494 PMCID: PMC3239301 DOI: 10.1186/2041-1480-2-s5-s11] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Indexed: 11/16/2022] Open
Abstract
Background: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus was used for the First CALBC Challenge, which asked participants to annotate the corpus with their text processing solutions. Results: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performances of the annotation solutions were lower for entities from the categories CHED and PRGE than for entities categorized as DISO and SPE. The best performance over all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. Data sets from participants who did not use the annotated SSC-I data set for training were used to generate the harmonised Silver Standard Corpus II (SSC-II). The performances of the participants’ solutions were again measured against the SSC-II. The annotation solutions again showed better results for DISO and SPE than for CHED and PRGE. Conclusions: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II leads to better performance for the CPs’ annotation solutions in comparison to the SSC-I.
|
6
|
Rebholz-Schuhmann D, Jimeno Yepes AJ, Van Mulligen EM, Kang N, Kors J, Milward D, Corbett P, Buyko E, Beisswanger E, Hahn U. CALBC silver standard corpus. J Bioinform Comput Biol 2010; 8:163-79. [PMID: 20183881 DOI: 10.1142/s0219720010004562] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Received: 12/07/2009] [Accepted: 12/08/2009] [Indexed: 11/18/2022]
Abstract
The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. The generation of this corpus requires that the annotations from different automatic annotation systems be harmonized. In the first phase, the annotation systems from five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All annotations were delivered in a common annotation format that included concept identifiers in the boundary assignments and that enabled comparison and alignment of the results. During the harmonization phase, the results produced from those different systems were integrated in a single harmonized corpus ("silver standard" corpus) by applying a voting scheme. We give an overview of the processed data and the principles of harmonization--formal boundary reconciliation and semantic matching of named entities. Finally, all submissions of the participants were evaluated against that silver standard corpus. We found that species and disease annotations are better standardized amongst the partners than the annotations of genes and proteins. The raw corpus is now available for additional named entity annotations. Parts of it will be made available later on for a public challenge. We expect that we can improve corpus building activities both in terms of the numbers of named entity classes being covered, as well as the size of the corpus in terms of annotated documents.
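The voting scheme mentioned in this abstract can be pictured with a minimal sketch. The abstract gives no details of the scheme, so everything here is an assumption for illustration: annotations are modelled as `(start, end, group)` tuples, only exact matches count as agreement (the paper's boundary reconciliation and semantic matching are not reproduced), and the `min_votes` threshold and `harmonize` name are hypothetical.

```python
from collections import Counter

def harmonize(system_annotations, min_votes):
    """Keep annotations proposed by at least min_votes systems."""
    votes = Counter()
    for annotations in system_annotations:
        votes.update(set(annotations))  # each system votes at most once per annotation
    return sorted(a for a, n in votes.items() if n >= min_votes)

# Hypothetical output of five annotation systems for one sentence.
systems = [
    [(0, 7, "gene"), (12, 20, "disease")],
    [(0, 7, "gene"), (12, 20, "disease"), (25, 30, "species")],
    [(0, 7, "gene"), (25, 30, "species")],
    [(12, 21, "disease")],               # boundary disagreement, counted as a separate annotation
    [(0, 7, "gene")],
]
silver = harmonize(systems, min_votes=2)
print(silver)
```

The fourth system's `(12, 21, "disease")` annotation shows why boundary reconciliation matters: under exact matching it receives only one vote and is dropped, even though it overlaps an annotation that two other systems agree on.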
|
7
|
Bennett SN, Olson JR, Kershner JL, Corbett P. Propagule pressure and stream characteristics influence introgression: cutthroat and rainbow trout in British Columbia. Ecol Appl 2010; 20:263-277. [PMID: 20349846 DOI: 10.1890/08-0441.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/29/2023]
Abstract
Hybridization and introgression between introduced and native salmonids threaten the continued persistence of many inland cutthroat trout species. Environmental models have been developed to predict the spread of introgression, but few studies have assessed the role of propagule pressure. We used an extensive set of fish stocking records and geographic information system (GIS) data to produce a spatially explicit index of potential propagule pressure exerted by introduced rainbow trout in the Upper Kootenay River, British Columbia, Canada. We then used logistic regression and the information-theoretic approach to test the ability of a set of environmental and spatial variables to predict the level of introgression between native westslope cutthroat trout and introduced rainbow trout. Introgression was assessed using between four and seven co-dominant, diagnostic nuclear markers at 45 sites in 31 different streams. The best model for predicting introgression included our GIS propagule pressure index and an environmental variable that accounted for the biogeoclimatic zone of the site (r² = 0.62). This model was 1.4 times more likely to explain introgression than the next-best model, which consisted of only the propagule pressure index variable. We created a composite model based on the model-averaged results of the seven top models that included environmental, spatial, and propagule pressure variables. The propagule pressure index had the highest importance weight (0.995) of all variables tested and was negatively related to sites with no introgression. This study used an index of propagule pressure and demonstrated that propagule pressure had the greatest influence on the level of introgression between a native and introduced trout in a human-induced hybrid zone.
Affiliation(s)
- Stephen N Bennett
- Department of Watershed Sciences, Utah State University, 5210 Old Main Hill, Logan, Utah 84322-5210, USA.
|
8
|
Abstract
Background: Chemical named entities represent an important facet of biomedical text. Results: We have developed a system to use character-based n-grams, Maximum Entropy Markov Models and rescoring to recognise chemical names and other such entities, and to make confidence estimates for the extracted entities. An adjustable threshold allows the system to be tuned to high precision or high recall. At a threshold set for balanced precision and recall, we were able to extract named entities at an F score of 80.7% from chemistry papers and 83.2% from PubMed abstracts. Furthermore, we were able to achieve 57.6% and 60.3% recall at 95% precision, and 58.9% and 49.1% precision at 90% recall. Conclusion: These results show that chemical named entities can be extracted with good performance, and that the properties of the extraction can be tuned to suit the demands of the task.
Affiliation(s)
- Peter Corbett
- Unilever Centre For Molecular Science Informatics, Chemical Laboratory, University of Cambridge, CB2 1EW, UK.
|
9
|
Cheng T, Scott JN, Eigl B, Corbett P, McKinnon G. Clinical outcome of metastatic melanoma involving the CNS with or without (w/o) intracranial hemorrhage (IHC). J Clin Oncol 2007. [DOI: 10.1200/jco.2007.25.18_suppl.8577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
8577 Background: Melanoma has a high rate of ICH associated with CNS metastasis (mets). There is no data on the clinical outcome of these patients (pts) in the literature. Methods: A retrospective review of 52 pts with melanoma CNS mets diagnosed from 11/2001 to 09/2006 at our institution. Two pts were excluded due to a second malignancy. The clinical, radiological, and pathological characteristics of these pts were reviewed. Results: 39 pts were diagnosed after onset of neurological symptoms and 11 were diagnosed during staging. Of these, 14 (28%) were solitary, 23 (46%) were ≤ 4, and 13 (26%) were multiple brain mets. Mean age at brain mets diagnosis was 55 years (range 24–88). Mean follow-up was 25 months (mos) (range 3–61) and 64% were male. ICH was found in 22 pts (44%) with 13 pts (26%) having frank hemorrhage on cranial CT and/or MRI scan. ICH was less common (9%) in pts with brain mets diagnosed on staging. 11 pts (8 with ICH, 3 w/o ICH) underwent CNS mets resection through craniotomy followed by whole brain radiation (WBRT) and/or stereotactic radiotherapy (STR). 7 pts received best supportive care. The remaining pts received WBRT and/or STR. The median survival (MS) for the entire group from CNS mets diagnosis was 7.6 mos. For pts with ICH, overall MS was 9.6 mos with MS of 21.2 mos for pts who underwent craniotomy vs 2.9 mos for pts w/o craniotomy. One pt refused therapy and later died from frank ICH. For pts w/o ICH, overall MS was 6.0 mos with MS of 21.3 mos for pts who underwent craniotomy vs 4.2 mos for pts w/o craniotomy. Pts who underwent craniotomy followed by WBRT and/or STR fared the best with MS of 21.2 mos. Of these, 2 pts with solitary brain mets (1 with ICH, 1 w/o ICH) were alive with no relapse at 58 and 53 mos respectively. The cause of death was disease progression in almost all pts and 85% of pts died from CNS disease progression. One pt died from post-operative complications of craniotomy.
Conclusions: ICH by itself is not associated with a negative clinical outcome with appropriate clinical management. Aggressive surgical resection followed by radiotherapy in selected pts improves clinical outcome with prolonged survival possible in a minority of pts. No significant financial relationships to disclose.
Affiliation(s)
- T. Cheng
- Tom Baker Cancer Ctr, Calgary, AB, Canada; University of Calgary, Calgary, AB, Canada
- J. N. Scott
- Tom Baker Cancer Ctr, Calgary, AB, Canada; University of Calgary, Calgary, AB, Canada
- B. Eigl
- Tom Baker Cancer Ctr, Calgary, AB, Canada; University of Calgary, Calgary, AB, Canada
- P. Corbett
- Tom Baker Cancer Ctr, Calgary, AB, Canada; University of Calgary, Calgary, AB, Canada
- G. McKinnon
- Tom Baker Cancer Ctr, Calgary, AB, Canada; University of Calgary, Calgary, AB, Canada
|
10
|
Lauzon L, Corbett P, Dunscombe P. SU-FF-T-333: Orthovoltage Dosimetry: Percent Depth Dose Multiplicity From Two Energies. Med Phys 2007. [DOI: 10.1118/1.2760996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022] Open
|
11
|
Abstract
Forest fire occurrence is affected by multiple controls that operate at local to regional scales. At the spatial scale of forest stands, regional climatic controls may be obscured by local controls (e.g., stochastic ignitions, topography, and fuel loads), but the long-term role of such local controls is poorly understood. We report here stand-scale (<100 ha) fire histories of the past 5000 years based on the analysis of sediment charcoal at two lakes 11 km apart in southeastern British Columbia. The two lakes are today located in similar subalpine forests, and they likely have experienced the same late-Holocene climatic changes because of their close proximity. We evaluated two independent properties of fire history: (1) fire-interval distribution, a measure of the overall incidence of fire, and (2) fire synchroneity, a measure of the co-occurrence of fire (here, assessed at centennial to millennial time scales due to the resolution of sediment records). Fire-interval distributions differed between the sites prior to, but not after, 2500 yr before present. When the entire 5000-yr period is considered, no statistical synchrony between fire-episode dates existed between the two sites at any temporal scale, but for the last 2500 yr marginal levels of synchrony occurred at centennial scales. Each individual fire record exhibited little coherency with regional climate changes. In contrast, variations in the composite record (average of both sites) matched variations in climate evidenced by late-Holocene glacial advances. This was probably due to the increased sample size and spatial extent represented by the composite record (up to 200 ha) plus increased regional climatic variability over the last several millennia, which may have partially overridden local, non-climatic controls. 
We conclude that (1) over past millennia, neighboring stands with similar modern conditions may have experienced different fire intervals and asynchronous patterns in fire episodes, likely because local controls outweighed the synchronizing effect of climate; (2) the influence of climate on fire occurrence is more strongly expressed when climatic variability is relatively great; and (3) multiple records from a region are essential if climate-fire relations are to be reliably described.
Affiliation(s)
- Daniel G Gavin
- Department of Plant Biology, University of Illinois, Urbana, Illinois 61801, USA.
|
12
|
Corbett P, Howard J. Management of acute complications associated with sickle cell disease. Acute Med 2006; 5:8-12. [PMID: 21655499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
Sickle cell disease is the commonest haemoglobinopathy within the United Kingdom. Although the majority of patients will present to hospitals within major cities, this is not invariably the case. It is therefore important that all physicians on acute medical take are familiar with the acute management of sickle cell disease. This review encompasses the initial management, which is subdivided into analgesia, investigations and supportive care. In addition, the more severe complications of sickle cell disease, including the acute chest syndrome and stroke, are covered. It should be remembered that close collaboration is required with the haematology department, particularly in those patients with respiratory distress or stroke, so that prompt arrangements can be made if exchange transfusion is required.
Affiliation(s)
- P Corbett
- MRCP MRCPath PhD, Department of Haematology, Brighton and Sussex University Hospitals NHS Trust, Eastern Road, Brighton BN2 5BE
|
13
|
|
14
|
|
15
|
Russell P, Bannatyne P, Shearman RP, Fraser IS, Corbett P. Premature hypergonadotropic ovarian failure: clinicopathological study of 19 cases. Int J Gynecol Pathol 1982; 1:185-201. [PMID: 6820952 DOI: 10.1097/00004347-198202000-00006] [Citation(s) in RCA: 53] [Impact Index Per Article: 1.3] [Indexed: 01/22/2023]
Abstract
During the 5-year period 1977-1982, 57 patients below 35 years of age with secondary amenorrhea were assessed for hypergonadotropic (primary) ovarian failure. The histological findings within the ovaries as well as pertinent clinical and laboratory correlates are described. Nineteen had diagnostic ovarian biopsies performed. The importance of this technique is stressed. The ovaries of 14 patients showed absence of primordial follicles (true premature menopause); three others showed "resistant ovary syndrome" characterized by the presence of primordial follicles but little or no follicular development (including a case of galactosemia, in which the associated ovarian failure has been ascribed to follicular atresia). The remaining two revealed florid chronic perifollicular inflammatory reactions in the presence of both primordial and also developing follicles--one lymphoplasmacytic and the other granulomatous. The former has been previously suggested as evidence of an autoimmune process, but the latter has not hitherto been reported.
|
16
|
|