26
Janse RJ, Hoekstra T, Jager KJ, Zoccali C, Tripepi G, Dekker FW, van Diepen M. Conducting correlation analysis: important limitations and pitfalls. Clin Kidney J 2021; 14:2332-2337. [PMID: 34754428 PMCID: PMC8572982 DOI: 10.1093/ckj/sfab085] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 03/21/2021] [Accepted: 04/20/2021] [Indexed: 11/22/2022]
Abstract
The correlation coefficient is a statistical measure often used in studies to show an association between variables or to look at the agreement between two methods. In this paper, we will discuss not only the basics of the correlation coefficient, such as its assumptions and how it is interpreted, but also important limitations when using the correlation coefficient, such as its assumption of a linear association and its sensitivity to the range of observations. We will also discuss why the coefficient is invalid when used to assess agreement of two methods aiming to measure a certain value, and discuss better alternatives, such as the intraclass correlation coefficient and Bland-Altman's limits of agreement. The concepts discussed in this paper are supported with examples from literature in the field of nephrology.
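The point about agreement can be illustrated with a small sketch. The data below are invented for illustration, and the helper names `pearson_r` and `bland_altman_limits` are hypothetical; the limits use the usual mean difference ± 1.96 standard deviations of the differences. Method B reads about 10 units higher than method A, so the two methods do not agree, yet the correlation is still close to 1.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bland_altman_limits(x, y):
    """Return (bias, lower limit, upper limit) of agreement for y versus x."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Invented measurements: method B is method A plus a constant offset and small noise.
method_a = [40.0, 55.0, 62.0, 71.0, 88.0, 95.0]
noise = [0.5, -0.5, 0.3, -0.3, 0.2, -0.2]
method_b = [a + 10.0 + e for a, e in zip(method_a, noise)]

r = pearson_r(method_a, method_b)
bias, lower, upper = bland_altman_limits(method_a, method_b)
print(f"r = {r:.4f}, bias = {bias:.2f}, limits = ({lower:.2f}, {upper:.2f})")
```

In this toy case the high correlation says nothing about the systematic offset, whereas the bias and limits of agreement make it visible.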
27
Khan A, Naeem M, Bilal M, Khan A, Subhan F, Ikram M, Shah MIA, Ullah S, Ullah A, Ullah A. Assessing the physico-chemical parameters and some metals of underground water and associated soil in the arid and semiarid regions of Tank District, Khyber Pakhtunkhwa, Pakistan. ENVIRONMENTAL MONITORING AND ASSESSMENT 2021; 193:610. [PMID: 34462828 DOI: 10.1007/s10661-021-09370-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2020] [Accepted: 07/30/2021] [Indexed: 06/13/2023]
Abstract
Good-quality water and food are basic needs of humans, plants, and animals. Polluted groundwater and soil affect organisms directly and indirectly, which is a major environmental concern. In the current study, standard atomic absorption spectrometry protocols were used to investigate selected metals (lead, chromium, and iron) in the collected groundwater and soil samples. The Pearson correlation coefficient (r) applied to the groundwater and soil samples shows a perfect positive correlation between two water parameters, electrical conductivity (EC) and total dissolved solids (TDS), in all three sources. In the hand pump samples, r = 0.87 between water table (WT) and water source depth (WSD), and r = 1 between EC and TDS. In the bore hole samples, r = 0.66 between WT and WSD, r = 1 between EC and TDS, r = 0.70 between EC and Cr, and r = 0.70 between TDS and Cr; apart from EC and TDS, these correlations were weaker. In the tube well samples, the correlation between EC and TDS was high (r = 1). For the soil parameters, the correlation between Fe and Cr was r = 0.86 in the hand pump (soil) samples, r = 0.77 in the bore hole samples, and r = 0.69 in the tube well samples, while all other parameter correlations were lower (r = 0.60). A high relation was again observed between electrical conductivity and total dissolved solids (r = 1). Overall, the results showed that in most of the studied samples, the contents of the target metals were above the allowable limits set by the World Health Organization (WHO) and the United States Environmental Protection Agency (USEPA).
28
Jang B, Kim I, Kim JW. Effective Training Data Extraction Method to Improve Influenza Outbreak Prediction from Online News Articles: Deep Learning Model Study. JMIR Med Inform 2021; 9:e23305. [PMID: 34032577 PMCID: PMC8188311 DOI: 10.2196/23305] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/07/2020] [Revised: 10/13/2020] [Accepted: 04/01/2021] [Indexed: 11/13/2022]
Abstract
Background: Each year, influenza affects 3 to 5 million people and causes 290,000 to 650,000 fatalities worldwide. To reduce the fatalities caused by influenza, several countries have established influenza surveillance systems to collect early warning data. However, proper and timely warnings are hindered by a 1- to 2-week delay between the actual disease outbreaks and the publication of surveillance data. To address the issue, novel methods for influenza surveillance and prediction using real-time internet data (such as search queries, microblogging, and news) have been proposed. Some of the currently popular approaches extract online data and use machine learning to predict influenza occurrences in a classification mode. However, many of these methods extract training data subjectively, and it is difficult to capture the latent characteristics of the data correctly. There is a critical need to devise new approaches that focus on extracting training data by reflecting the latent characteristics of the data.
Objective: In this paper, we propose an effective method to extract training data in a manner that reflects the hidden features and improves the performance by filtering and selecting only the keywords related to influenza before the prediction.
Methods: Although word embedding provides a distributed representation of words by encoding the hidden relationships between various tokens, we enhanced the word embeddings by selecting keywords related to the influenza outbreak and sorting the extracted keywords using the Pearson correlation coefficient in order to solely keep the tokens with high correlation with the actual influenza outbreak. The keyword extraction process was followed by a predictive model based on long short-term memory that predicts the influenza outbreak. To assess the performance of the proposed predictive model, we used and compared a variety of word embedding techniques.
Results: Word embedding without our proposed sorting process showed 0.8705 prediction accuracy when 50.2 keywords were selected on average. Conversely, word embedding using our proposed sorting process showed 0.8868 prediction accuracy and an improvement in prediction accuracy of 12.6%, although smaller amounts of training data were selected, with only 20.6 keywords on average.
Conclusions: The sorting stage empowers the embedding process, which improves the feature extraction process because it acts as a knowledge base for the prediction component. The model outperformed other current approaches that use flat extraction before prediction.
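The sorting step described in the Methods can be sketched roughly as follows. This is only an illustration of ranking candidate keywords by their Pearson correlation with an outbreak series and keeping the top few; the word embedding and long short-term memory stages are not shown, and the weekly counts, keyword names, and the `rank_keywords` helper are all invented for the example.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_keywords(counts, target, top_k):
    """Sort keywords by correlation with the target series and keep the top_k."""
    scored = sorted(
        ((pearson_r(series, target), word) for word, series in counts.items()),
        reverse=True,
    )
    return [word for _, word in scored[:top_k]]

# Hypothetical weekly influenza case counts and keyword frequencies in news text.
weekly_cases = [12, 30, 55, 80, 60, 25]
keyword_counts = {
    "fever": [3, 8, 14, 21, 15, 6],
    "vaccine": [5, 6, 9, 12, 11, 7],
    "football": [9, 4, 7, 3, 8, 5],
}

selected = rank_keywords(keyword_counts, weekly_cases, top_k=2)
print(selected)
```

With these invented numbers, the unrelated keyword is dropped because its frequency does not track the case counts.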
29
Satish Kumar K, Venkata Rathnam E, Sridhar V. Tracking seasonal and monthly drought with GRACE-based terrestrial water storage assessments over major river basins in South India. THE SCIENCE OF THE TOTAL ENVIRONMENT 2021; 763:142994. [PMID: 33129527 DOI: 10.1016/j.scitotenv.2020.142994] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/13/2020] [Revised: 09/30/2020] [Accepted: 10/06/2020] [Indexed: 06/11/2023]
Abstract
Drought is a complex natural hazard that affects ecosystems and society in several ways and it is important to quantify drought at the river basin scale. Assessment of drought requires both hydrological observations and simulation models as the data are generally scarce. Therefore, we use remote sensing products to help understand drought conditions in four basins in South India. This study analysed the correlation among five drought indices for four seasons: gravity recovery and climate experiment - drought severity index (GRACE-DSI), standardized precipitation index (SPI), self-calibrated palmer drought severity index (sc_PDSI), standardized precipitation-evapotranspiration index (SPEI), and combined climatologic deviation index (CCDI) with GRACE terrestrial water storage anomalies (TWSA) using the Pearson correlation coefficient (r) from 2002 to 2016 over the Godavari, Krishna, Pennar, and Cauvery river basins. Basin scale drought events are evaluated using CCDI, GRACEDSI, sc_PDSI, SPI12, and SPEI12 at seasonal and monthly time scale. Characteristics of drought event analysis are calculated for CCDI monthly. The results showed that GRACE TWS is highly correlated with GRACE-DSI, CCDI, and sc_PDSI. Seasonally, high spatial correlations between CCDI and GRACE-DSI with GRACE TWS are evident for all the river basins. Additionally, correlation is found to exist between sc_PDSI and GRACE TWS as soil moisture content is an operating variable between them. The 12-month SPI and SPEI correlated better with GRACE TWS than the 3 and 6-month periods. Among the four basins, droughts in the Krishna Basin lasted 29 months, longer than in the rest of the basins between 2003 and 2005. Overall, CCDI and GRACE-DSI indices are found to be effective for examining and evaluating the drought conditions at the basin scale.
30
Sengupta S, Mohinuddin S, Arif M. Spatiotemporal dynamics of temperature and precipitation with reference to COVID-19 pandemic lockdown: perspective from Indian subcontinent. ENVIRONMENT, DEVELOPMENT AND SUSTAINABILITY 2021; 23:13778-13818. [PMID: 33551671 PMCID: PMC7845794 DOI: 10.1007/s10668-021-01238-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/19/2020] [Accepted: 01/11/2021] [Indexed: 05/09/2023]
Abstract
This study focuses exclusively on spatial and temporal changes in temperature and precipitation before and after the COVID-19 lockdown, and also examines the extent of their variation and the spatial relationship between them. Our main objective is to analyze the spatiotemporal changes of these two climatic variables in the Indian subcontinent for the period 2015-2020. Monthly precipitation and temperature data were collected from NOAA and NASA for the months of January to May across the four zones of India (northeast, northwest, central, and peninsular). To conduct a zone-wise statistical analysis, we adopted statistical process control (SPC) methods, namely exponentially weighted moving average (EWMA) control charts and individual charts (I-charts), to detect shifts in temperature and precipitation over the study period, and applied the Pearson correlation coefficient to measure the spatial association between the two variables. The findings revealed that temperature experienced many positive and negative trends over the span of 6 years, and a weak to moderate negative correlation was detected in many parts of the country in April 2020 after 2016. This study also identified a weak negative correlation, mainly in the NE zone, in 2020 after 2017. This research makes a scientific contribution to understanding monthly temperature and precipitation before and after the COVID-19 pandemic lockdown.
31
Chang YS, Abimannan S, Chiao HT, Lin CY, Huang YP. An ensemble learning based hybrid model and framework for air pollution forecasting. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2020; 27:38155-38168. [PMID: 32621183 DOI: 10.1007/s11356-020-09855-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/13/2020] [Accepted: 06/22/2020] [Indexed: 06/11/2023]
Abstract
With the advance of the economy and industry, the impact of air pollution has gradually gained attention. Many studies have exploited various machine learning techniques to build predictive models for pollutant concentration or air quality. However, enhancing prediction performance remains a common problem in existing studies. Traditional machine learning and deep learning methods, such as GBTR (gradient boosted tree regression), SVR (support vector machine-based regression), and LSTM (long short-term memory), are among the most promising approaches to address these problems. Previous research has shown that ensemble learning can improve predictive performance in other domains. In this paper, we propose a hybrid model and framework to improve the accuracy of air pollution forecasting. We not only exploit a stacking-based ensemble learning scheme that uses the Pearson correlation coefficient to calculate the correlation between different machine learning models in order to integrate various forecasting models, but also construct a framework based on Spark+Hadoop machine learning and the TensorFlow deep learning framework to physically integrate these models and demonstrate air pollution forecasting for the next 1 to 8 h. We also conduct experiments and compare the results with the GBTR, SVR, LSTM, and LSTM2 (version 2) models to demonstrate the proposed hybrid model's predictive performance. The experimental results show that the hybrid model is superior to the existing models used for predicting air pollution.
32
Fu W, Liu R, Wang H, Ali R, He Y, Cao Z, Qin Z. A Method of Multiple Dynamic Objects Identification and Localization Based on Laser and RFID. SENSORS 2020; 20:s20143948. [PMID: 32708565 PMCID: PMC7411997 DOI: 10.3390/s20143948] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/11/2020] [Revised: 07/12/2020] [Accepted: 07/13/2020] [Indexed: 11/16/2022]
Abstract
In an indoor environment, object identification and localization are paramount for human-object interaction. Visual or laser-based sensors can identify and localize an object based on its appearance, but these approaches are computationally expensive and not robust in environments with obstacles. Radio Frequency Identification (RFID) has a unique tag ID to identify the object, but it cannot accurately locate it. Therefore, in this paper, the data of RFID and a laser range finder are fused for better identification and localization of multiple dynamic objects in an indoor environment. The main method is to use the laser range finder to estimate the radial velocities of objects in a certain environment, and match them with the objects' radial velocities estimated from the RFID phase. The method also uses a fixed time series as a "sliding time window" to find the cluster with the highest similarity to each RFID tag in each window. Moreover, the Pearson correlation coefficient (PCC) is used in the update stage of the particle filter (PF) to estimate the moving path of each cluster in order to improve the accuracy in a complex environment with obstacles. The experiments were carried out with a SCITOS G5 robot. The results show that this method can achieve a matching rate of 90.18% and a localization accuracy of 0.33 m in an environment with obstacles. This method effectively improves the matching rate and localization accuracy of multiple objects in indoor scenes when compared to the Bray-Curtis (BC) similarity matching-based approach as well as the particle filter-based approach.
33
Weighted Mean Squared Deviation Feature Screening for Binary Features. ENTROPY 2020; 22:e22030335. [PMID: 33286109 PMCID: PMC7516793 DOI: 10.3390/e22030335] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/22/2020] [Revised: 03/13/2020] [Accepted: 03/13/2020] [Indexed: 11/16/2022]
Abstract
In this study, we propose a novel model-free feature screening method for ultrahigh dimensional binary features of binary classification, called weighted mean squared deviation (WMSD). Compared to the Chi-square statistic and mutual information, WMSD provides more opportunities to the binary features with probabilities near 0.5. In addition, the asymptotic properties of the proposed method are theoretically investigated under the assumption log p = o(n). The number of features is practically selected by a Pearson correlation coefficient method according to the property of power-law distribution. Lastly, an empirical study of Chinese text classification illustrates that the proposed method performs well when the dimension of selected features is relatively small.
34
Hou MX, Gao YL, Liu JX, Shang J, Zhu R, Yuan SS. A new method for mining information of co-expression network based on multi-cancers integrated data. BMC Med Genomics 2019; 12:155. [PMID: 31888692 PMCID: PMC6936053 DOI: 10.1186/s12920-019-0608-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/14/2019] [Accepted: 10/23/2019] [Indexed: 12/23/2022]
Abstract
Background: The gene co-expression network is a favorable method to reveal the nature of disease. With the development of cancer, building gene co-expression networks based on cancer data has become a hot spot. However, there is still a limited number of node measurement methods and node mining strategies for multi-cancer network construction.
Methods: In this paper, we introduce a new method for mining information from a co-expression network based on multi-cancer integrated data, named PMN. We construct the network by combining different types of relevance measures (linear and nonlinear rules) for different nodes, based on integrated gene expression data of multiple cancers from The Cancer Genome Atlas (TCGA). For mining genes, we combine different properties (local and global characteristics) of the nodes.
Results: We uncover more suspicious abnormally expressed genes and shared pathways of different cancers. We have also found some proven genes and pathways, as well as some suspicious factors and molecules that need clinical validation.
Conclusions: The results demonstrate that our method is very effective in mining co-expressed genes across multiple cancers.
35
Afyouni S, Smith SM, Nichols TE. Effective degrees of freedom of the Pearson's correlation coefficient under autocorrelation. Neuroimage 2019; 199:609-625. [PMID: 31158478 PMCID: PMC6693558 DOI: 10.1016/j.neuroimage.2019.05.011] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 11/05/2018] [Revised: 05/02/2019] [Accepted: 05/06/2019] [Indexed: 12/13/2022]
Abstract
The dependence between pairs of time series is commonly quantified by Pearson's correlation. However, if the time series are themselves dependent (i.e. exhibit temporal autocorrelation), the effective degrees of freedom (EDF) are reduced, the standard error of the sample correlation coefficient is biased, and Fisher's transformation fails to stabilise the variance. Since fMRI time series are notoriously autocorrelated, the issue of biased standard errors - before or after Fisher's transformation - becomes vital in individual-level analysis of resting-state functional connectivity (rsFC) and must be addressed anytime a standardised Z-score is computed. We find that the severity of autocorrelation is highly dependent on spatial characteristics of brain regions, such as the size of regions of interest and the spatial location of those regions. We further show that the available EDF estimators make restrictive assumptions that are not supported by the data, resulting in biased rsFC inferences that lead to distorted topological descriptions of the connectome on the individual level. We propose a practical "xDF" method that accounts not only for distinct autocorrelation in each time series, but instantaneous and lagged cross-correlation. We find the xDF correction varies substantially over node pairs, indicating the limitations of global EDF corrections used previously. In addition to extensive synthetic and real data validations, we investigate the impact of this correction on rsFC measures in data from the Young Adult Human Connectome Project, showing that accounting for autocorrelation dramatically changes fundamental graph theoretical measures relative to no correction.
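For orientation, the standard Z-score that this abstract says becomes biased can be sketched as below. This is the textbook Fisher transformation with standard error 1/sqrt(n - 3), which assumes n independent samples; it is not the xDF correction proposed in the paper, and the function name `fisher_z_score` and the input values are chosen only for illustration.

```python
import math

def fisher_z_score(r, n):
    """Naive Z-score for a sample correlation r computed from n samples.

    Assumes the n samples are independent. With autocorrelated time series
    the effective number of samples is smaller than n, so this score is
    overstated.
    """
    z = math.atanh(r)            # Fisher's transformation of r
    se = 1.0 / math.sqrt(n - 3)  # standard error under independence
    return z / se

naive_z = fisher_z_score(0.5, 103)
print(round(naive_z, 3))
```

Replacing `n` with a smaller effective sample size shrinks the score, which is the general direction of the corrections the abstract discusses.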
36
Meng Q, Li K, Zhao C. An Improved Particle Filtering Algorithm Using Different Correlation Coefficients for Nonlinear System State Estimation. BIG DATA 2019; 7:114-120. [PMID: 30892919 DOI: 10.1089/big.2018.0130] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
The particle filtering (PF) algorithm is increasingly widely used in many fields, especially in nonlinear and non-Gaussian situations. Because of the particle degeneracy limitation, various resampling methods have been researched. This article proposes an improved PF algorithm that incorporates different rank correlation coefficients to overcome the shortcomings of degeneracy. Iterative simulations in Matlab show that the proposed algorithm provides better accuracy than sequential importance resampling, the Gaussian sum particle filter, and Gaussian mixture sigma-point particle filters in Gaussian mixture noise.
37
Identification of Vibration Events in Rotating Blades Using a Fiber Optical Tip Timing Sensor. SENSORS 2019; 19:s19071482. [PMID: 30934662 PMCID: PMC6479853 DOI: 10.3390/s19071482] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/16/2019] [Revised: 03/13/2019] [Accepted: 03/24/2019] [Indexed: 11/25/2022]
Abstract
The blade tip timing (BTT) technique has been widely used in rotation machinery for non-contact blade vibration measurements. As BTT data is under-sampled, it requires complicated algorithms to reconstruct vibration parameters. Before reconstructing the vibration parameters, the right data segment should first be extracted from the massive volumes of BTT data that include noise from blade vibration events. This step requires manual intervention, is highly dependent on the skill of the operator, and has also made it difficult to automate BTT technique applications. This article proposes an included angle distribution (IAD) correlation method between adjacent revolutions to identify blade vibration events automatically in real time. All included angles of the rotor between any two adjacent blades were accurately detected by only one fiber optical tip timing sensor. Three formulas for calculating IAD correlation were then proposed to identify three types of blade vibration events: the blades’ overall vibrations, vibration of the adjacent two blades, and vibration of a specific blade. Further, the IAD correlation method was optimized in the calculating process to reduce computation load when identifying every blade’s vibration events. The presented IAD correlation method could be used for embedded, real-time, and automatic processing applications. Experimental results showed that the proposed method could identify all vibration events in rotating blades, even small events which may be wrongly identified by skillful operators.
38
Contribution of phenolic compounds, ascorbic acid and vitamin E to antioxidant activity of currant (Ribes L.) and gooseberry (Ribes uva-crispa L.) fruits. Food Chem 2019; 284:323-333. [PMID: 30744864 DOI: 10.1016/j.foodchem.2019.01.072] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 09/20/2018] [Revised: 01/09/2019] [Accepted: 01/11/2019] [Indexed: 01/14/2023]
Abstract
Berries of four gooseberry (Ribes uva-crispa L.) cultivars of Invicta, Rixanta, Karat and Black Negus and five currant (Ribes L.) cultivars of NS 11, Focus, Ben Gairn, Otelo and Viola were evaluated as potential sources of bioactive compounds with extraordinary antioxidant activity. Their total phenolic, flavonoid and anthocyanin contents were determined in the range of 3.52-30.77 g GA·kg⁻¹, 2.83-17.35 g RE·kg⁻¹ and 0.03-186.12 mg COG·100 g⁻¹, respectively. Furthermore, quantification of phenolic compounds and vitamins was established by high-performance liquid chromatography-diode array detection. Flavonoids were the most abundant phenolic substances in the range of 345.0-3726.5 mg·kg⁻¹. Ascorbic acid and vitamin E were established in the amounts of 6.2-14.04 g·kg⁻¹ and 0.43-12.85 mg·kg⁻¹, respectively. Considering all analyzed factors and antioxidant activities determined by various methods (DPPH, ACW and ACL), red gooseberry Black Negus and black currant Otelo were the most significant cultivars.
39
Noise Reduction Method of Underwater Acoustic Signals Based on Uniform Phase Empirical Mode Decomposition, Amplitude-Aware Permutation Entropy, and Pearson Correlation Coefficient. ENTROPY 2018; 20:e20120918. [PMID: 33266642 PMCID: PMC7512504 DOI: 10.3390/e20120918] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 10/06/2018] [Revised: 11/25/2018] [Accepted: 11/28/2018] [Indexed: 12/04/2022]
Abstract
Noise reduction of underwater acoustic signals is of great significance in the fields of military and ocean exploration. Based on the adaptive decomposition characteristic of uniform phase empirical mode decomposition (UPEMD), a noise reduction method for underwater acoustic signals is proposed, which combines amplitude-aware permutation entropy (AAPE) and Pearson correlation coefficient (PCC). UPEMD is a recently proposed improved empirical mode decomposition (EMD) algorithm that alleviates the mode splitting and residual noise effects of EMD. AAPE is a tool to quantify the information content of nonlinear time series. Unlike permutation entropy (PE), AAPE can reflect the amplitude information on time series. Firstly, the original signal is decomposed into a series of intrinsic mode functions (IMFs) by UPEMD. The AAPE of each IMF is calculated. The modes are separated into high-frequency IMFs and low-frequency IMFs, and all low-frequency IMFs are determined as useful IMFs (UIMFs). Then, the PCC between the high-frequency IMF with the smallest AAPE and the original signal is calculated. If PCC is greater than the threshold, the IMF is also determined as a UIMF. Finally, all UIMFs are reconstructed and the denoised signal is obtained. Chaotic signals with different signal-to-noise ratios (SNRs) are used for denoising experiments. Compared with EMD and extreme-point symmetric mode decomposition (ESMD), the proposed method has higher SNR and smaller root mean square error (RMSE). The proposed method is applied to noise reduction of real underwater acoustic signals. The results show that the method can further eliminate noise and the chaotic attractors are smoother and clearer.
40
Mohapatra S, Weisshaar JC. Modified Pearson correlation coefficient for two-color imaging in spherocylindrical cells. BMC Bioinformatics 2018; 19:428. [PMID: 30445904 PMCID: PMC6240329 DOI: 10.1186/s12859-018-2444-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/19/2018] [Accepted: 10/22/2018] [Indexed: 11/10/2022]
Abstract
The revolution in fluorescence microscopy enables sub-diffraction-limit ("superresolution") localization of hundreds or thousands of copies of two differently labeled proteins in the same live cell. In typical experiments, fluorescence from the entire three-dimensional (3D) cell body is projected along the z-axis of the microscope to form a 2D image at the camera plane. For imaging of two different species, here denoted "red" and "green", a significant biological question is the extent to which the red and green spatial distributions are positively correlated, anti-correlated, or uncorrelated. A commonly used statistic for assessing the degree of linear correlation between two image matrices R and G is the Pearson Correlation Coefficient (PCC). PCC should vary from −1 (perfect anti-correlation) to 0 (no linear correlation) to +1 (perfect positive correlation). However, in the special case of spherocylindrical bacterial cells such as E. coli or B. subtilis, we show that the PCC fails both qualitatively and quantitatively. PCC returns the same +1 value for 2D projections of distributions that are either perfectly correlated in 3D or completely uncorrelated in 3D. The PCC also systematically underestimates the degree of anti-correlation between the projections of two perfectly anti-correlated 3D distributions. The problem is that the projection of a random spatial distribution within the 3D spherocylinder is non-random in 2D, whereas PCC compares every matrix element of R or G with the constant mean value [Formula: see text] or [Formula: see text]. We propose a modified Pearson Correlation Coefficient (MPCC) that corrects this problem for spherocylindrical cell geometry by using the proper reference matrix for comparison with R and G. Correct behavior of MPCC is confirmed for a variety of numerical simulations and on experimental distributions of HU and RNA polymerase in live E. coli cells. The MPCC concept should be generalizable to other cell shapes.
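As a point of reference, the ordinary PCC between two image matrices can be sketched as follows. This shows only the standard statistic, in which every pixel is compared with the constant mean of its own image; it does not implement the modified coefficient (MPCC) or its reference matrix. The tiny matrices and the name `pcc_images` are invented for illustration.

```python
import math

def pcc_images(R, G):
    """Standard Pearson correlation between two equally sized image matrices."""
    r = [v for row in R for v in row]  # flatten the red image
    g = [v for row in G for v in row]  # flatten the green image
    mean_r = sum(r) / len(r)           # constant mean used as the reference for R
    mean_g = sum(g) / len(g)           # constant mean used as the reference for G
    num = sum((a - mean_r) * (b - mean_g) for a, b in zip(r, g))
    den = math.sqrt(
        sum((a - mean_r) ** 2 for a in r) * sum((b - mean_g) ** 2 for b in g)
    )
    return num / den

# Invented 2x2 "images": one proportional to R, one with the intensities reversed.
R = [[1.0, 2.0], [3.0, 4.0]]
G_same = [[2.0, 4.0], [6.0, 8.0]]
G_anti = [[4.0, 3.0], [2.0, 1.0]]

print(pcc_images(R, G_same), pcc_images(R, G_anti))
```

The two toy cases reach the ends of the −1 to +1 range, since one green image is a scaled copy of the red image and the other is its exact reversal.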
41
Huang J, Shi T, Gong B, Li X, Liao G, Tang Z. Fitting an Optical Fiber Background with a Weighted Savitzky-Golay Smoothing Filter for Raman Spectroscopy. APPLIED SPECTROSCOPY 2018; 72:1632-1644. [PMID: 30109810 DOI: 10.1177/0003702818785884] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
The Raman background arising from optical fiber materials poses a critical problem for fiber optic surface-enhanced Raman spectroscopy (SERS). A novel filter is developed to fit the optical fiber background from the measured SERS spectrum of the target sample. The general model of the filter is built by incorporating into the classic Savitzky-Golay (SG) smoothing filter model a weighted term that measures the similarity between the estimated background spectrum and the measured background spectrum. By selecting the Euclidean cosine coefficient (ECos) and the Pearson correlation coefficient (PCor), respectively, as the similarity index, two different models of the weighted SG smoothing filter are derived and named SG-ECos and SG-PCor accordingly. Furthermore, the algorithm is presented, implemented, successfully applied to experimentally measured SERS spectra of rhodamine 6G and crystal violet, and validated with mathematically simulated Raman spectra. Experimental and simulation results show that the SG-ECos filter is effective, fast, and flexible, and has some robustness to noise in background fitting. It is suggested that the proposed filter may also be applicable to other Raman spectra measurements to remove spectral contaminants originating from sampling substrates such as glass slides.
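The two similarity indices named above can be compared with a small sketch. It assumes that the Euclidean cosine coefficient is the usual cosine similarity (dot product divided by the product of Euclidean norms), which the abstract does not spell out; the Pearson coefficient is then the same quantity computed on mean-centered vectors. The spectra and function names are invented, and the weighted SG filter itself is not implemented here.

```python
import math

def cosine_coefficient(a, b):
    """Cosine similarity: dot product over the product of Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_coefficient(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_coefficient(centered_a, centered_b)

# Invented spectra: the second is the first shifted by a constant baseline.
estimated = [1.0, 2.0, 3.0, 2.0, 1.0]
measured = [v + 100.0 for v in estimated]

ecos = cosine_coefficient(estimated, measured)
pcor = pearson_coefficient(estimated, measured)
print(f"cosine = {ecos:.3f}, Pearson = {pcor:.3f}")
```

The constant baseline leaves the Pearson coefficient at 1 but lowers the cosine coefficient, so the two indices respond differently to an offset between spectra.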
42
Murari A, Lungaroni M, Peluso E, Gaudio P, Lerche E, Garzotti L, Gelfusa M. On the Use of Transfer Entropy to Investigate the Time Horizon of Causal Influences between Signals. ENTROPY 2018; 20:e20090627. [PMID: 33265716 PMCID: PMC7513156 DOI: 10.3390/e20090627] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/28/2018] [Revised: 08/14/2018] [Accepted: 08/15/2018] [Indexed: 11/30/2022]
Abstract
Understanding the details of the correlation between time series is an essential step on the route to assessing the causal relation between systems. Traditional statistical indicators, such as the Pearson correlation coefficient and the mutual information, have some significant limitations. More recently, transfer entropy has been proposed as a powerful tool to understand the flow of information between signals. In this paper, the comparative advantages of transfer entropy, for determining the time horizon of causal influence, are illustrated with the help of synthetic data. The technique has been specifically revised for the analysis of synchronization experiments. The investigation of experimental data from thermonuclear plasma diagnostics proves the potential and limitations of the developed approach.
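The idea of scanning a lag to find the time horizon of influence can be sketched with a simple plug-in estimator of transfer entropy for discrete signals. This is a minimal illustration under stated assumptions, not the authors' revised technique: the function name, the history length of one sample, and the synthetic binary signals are all hypothetical choices.

```python
import math
import random
from collections import Counter


def transfer_entropy(x, y, lag=1):
    """Plug-in estimate (in bits) of the transfer entropy from x to y.

    Uses one past sample of y and the sample of x that lies `lag` steps back:
    TE = sum p(y_t, y_{t-1}, x_{t-lag})
             * log2[ p(y_t | y_{t-1}, x_{t-lag}) / p(y_t | y_{t-1}) ]
    """
    start = max(lag, 1)
    triples = Counter()
    for t in range(start, len(y)):
        triples[(y[t], y[t - 1], x[t - lag])] += 1
    n = sum(triples.values())

    # Marginal counts derived from the same set of triples.
    pair_yy = Counter()   # (y_t, y_{t-1})
    pair_yx = Counter()   # (y_{t-1}, x_{t-lag})
    single_y = Counter()  # y_{t-1}
    for (y_now, y_prev, x_prev), count in triples.items():
        pair_yy[(y_now, y_prev)] += count
        pair_yx[(y_prev, x_prev)] += count
        single_y[y_prev] += count

    te = 0.0
    for (y_now, y_prev, x_prev), count in triples.items():
        p_joint = count / n
        cond_full = count / pair_yx[(y_prev, x_prev)]
        cond_self = pair_yy[(y_now, y_prev)] / single_y[y_prev]
        te += p_joint * math.log2(cond_full / cond_self)
    return te


# Synthetic data: y copies x with a delay of two samples.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0, 0] + x[:-2]

for lag in (1, 2, 3):
    print(lag, round(transfer_entropy(x, y, lag), 3))
```

On this synthetic pair the estimate is close to 1 bit at the true delay and close to 0 at the other lags. Plug-in estimates are biased upward for short or continuous-valued records, so real data would need binning and a significance check.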
|
43
|
Suh S, Kim YE, Shin D, Ko S. Effect of frozen-storage period on quality of American sirloin and mackerel (Scomber japonicus). Food Sci Biotechnol 2017; 26:1077-1084. [PMID: 30263639 PMCID: PMC6049543 DOI: 10.1007/s10068-017-0146-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/17/2017] [Revised: 04/27/2017] [Accepted: 04/27/2017] [Indexed: 11/26/2022] Open
Abstract
This study examined the effect of frozen-storage period on the quality of sirloin and mackerel (Scomber japonicus). The samples were evaluated after being kept in frozen storage at -17.9 °C for different periods of time (1, 8, 15, 22, and 29 days). Frozen storage increased ice crystal formation on the surface of both sirloin and mackerel. The frozen-storage period was positively correlated with drip loss in both sirloin and mackerel (p < 0.05) and negatively correlated with the hardness of sirloin (p < 0.05). During the frozen-storage period, the 2-thiobarbituric acid reactive substance level increased in mackerel while the level in sirloin was maintained; both levels were within safe limits. Consequently, a 29-day freezing period is postulated to have little effect on the quality of sirloin and mackerel.
|
44
|
Signal-to-Noise Ratio Enhancement Based on Empirical Mode Decomposition in Phase-Sensitive Optical Time Domain Reflectometry Systems. SENSORS 2017; 17:s17081870. [PMID: 28805725 PMCID: PMC5579958 DOI: 10.3390/s17081870] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/30/2017] [Revised: 08/11/2017] [Accepted: 08/12/2017] [Indexed: 11/16/2022]
Abstract
We propose a novel denoising method based on empirical mode decomposition (EMD) to improve the signal-to-noise ratio (SNR) for vibration sensing in phase-sensitive optical time domain reflectometry (φ-OTDR) systems. Raw Rayleigh backscattering traces are decomposed into a series of intrinsic mode functions (IMFs) and a residual component using an EMD algorithm. High frequency noise is eliminated by removing several IMFs at the position without vibration selected by the Pearson correlation coefficient (PCC). When the pulse width is 50 ns, the SNR of location information for the vibration events of 100 Hz and 1.2 kHz is increased to as high as 42.52 dB and 39.58 dB, respectively, with a 2 km sensing fiber, which demonstrates the excellent performance of this new method.
|
45
|
Oyeyemi KD, Aizebeokhai AP, Okagbue HI. Geostatistical exploration of dataset assessing the heavy metal contamination in Ewekoro limestone, Southwestern Nigeria. Data Brief 2017; 14:110-117. [PMID: 28795088 PMCID: PMC5537382 DOI: 10.1016/j.dib.2017.07.041] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/22/2017] [Accepted: 07/18/2017] [Indexed: 11/21/2022] Open
Abstract
The dataset for this article contains a geostatistical analysis of heavy metal contamination in limestone samples collected from the Ewekoro Formation in the eastern Dahomey basin, Ogun State, Nigeria. The samples were manually collected and analysed using a Microwave Plasma Atomic Absorption Spectrometer (MPAS). Analysis of the twenty samples showed different levels of heavy metal concentration. The nine elements analysed are arsenic, mercury, cadmium, cobalt, chromium, nickel, lead, vanadium and zinc. Descriptive statistics were used to explore the heavy metal concentrations individually. Pearson, Kendall tau and Spearman rho correlation coefficients were used to establish the relationships among the elements, and the analysis of variance showed a significant difference in the mean distribution of the heavy metal concentrations within and between the groups of the 20 samples analysed. The dataset can provide insights into the health implications of the contaminants, especially when the mean concentration levels of the heavy metals are compared with the recommended regulatory limit concentrations.
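For readers unfamiliar with how the three coefficients differ, the following sketch computes Pearson, Spearman rho and Kendall tau on a small made-up pair of variables. It is not taken from the dataset: the values and function names are hypothetical, and the rank and tau routines assume there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Made-up example: y increases monotonically but not linearly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]

print(round(pearson(x, y), 4), round(spearman(x, y), 4), kendall_tau(x, y))
```

Because the relationship is monotonic but curved, the two rank-based coefficients equal 1 while the Pearson coefficient falls slightly below 1.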
|
46
|
Huang J, Shi T, Tang Z, Zhu W, Liao G, Li X, Gong B, Zhou T. Extracting Optical Fiber Background from Surface-Enhanced Raman Spectroscopy Spectra Based on Bi-Objective Optimization Modeling. APPLIED SPECTROSCOPY 2017; 71:1808-1815. [PMID: 28436680 DOI: 10.1177/0003702817696088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
We propose a bi-objective optimization model for extracting optical fiber background from the measured surface-enhanced Raman spectroscopy (SERS) spectrum of the target sample in the application of fiber optic SERS. The model is built using curve fitting to resolve the SERS spectrum into several individual bands, and simultaneously matching some resolved bands with the measured background spectrum. The Pearson correlation coefficient is selected as the similarity index and its maximum value is pursued during the spectral matching process. An algorithm is proposed, programmed, and demonstrated successfully in extracting optical fiber background or fluorescence background from the measured SERS spectra of rhodamine 6G (R6G) and crystal violet (CV). The proposed model not only can be applied to remove optical fiber background or fluorescence background for SERS spectra, but also can be transferred to conventional Raman spectra recorded using fiber optic instrumentation.
|
47
|
Bhat FM, Riar CS. Cultivars effect on the physical characteristics of rice (rough and milled) (Oryza Sativa L.) of temperate region of Kashmir (India). Journal of Food Science and Technology 2017; 53:4258-4269. [PMID: 28115766 DOI: 10.1007/s13197-016-2420-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Revised: 08/29/2016] [Accepted: 11/24/2016] [Indexed: 11/25/2022]
Abstract
The aim of the present research was to evaluate the physical and engineering properties of traditional paddy and rice cultivars native to the temperate region of India. Length, width, thickness, equivalent diameter, surface area, aspect ratio, volume, bulk density, true density, porosity, thousand kernel weight, angle of repose and coefficient of friction were evaluated, as these are required in designing various post-harvest operations and storage structures. The low bulk density of the cultivars Mushki budgi, Mushki tujan and Kaw kareed may be due to their long awns, which were bulky and occupied more space. Wide variations were found in rice kernels with respect to colour, which determined the functional properties and energy requirement during polishing of these cultivars. Results indicated significant differences in the physical properties among various paddy and brown rice cultivars when compared with earlier reported results. Thousand kernel weight, width, arithmetic mean diameter and equivalent diameter showed significant positive correlations with sphericity, surface area, volume, true density, and angle of repose, but were negatively correlated with bulk density. These desirable characteristics should prompt agriculturists and institutions to preserve these races and encourage farmers to cultivate these cherished rice cultivars.
|
48
|
The Synergistic Effect of Selumetinib/Docetaxel Combination Therapy Monitored by [(18)F]FDG/[(18)F]FLT PET and Diffusion-Weighted Magnetic Resonance Imaging in a Colorectal Tumor Xenograft Model. Mol Imaging Biol 2016; 18:249-57. [PMID: 26276154 DOI: 10.1007/s11307-015-0881-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
Abstract
PURPOSE Positron emission tomography (PET) and diffusion-weighted MRI (DW-MRI) were used to characterize the treatment effects of the MEK1/2 inhibitor selumetinib (AZD6244), docetaxel, and their combination in HCT116 tumor-bearing mice on the molecular level. PROCEDURES Mice were treated with vehicle, selumetinib (25 mg/kg), docetaxel (15 mg/kg), or a combination of both drugs for 7 days and imaged at four time points with 2-deoxy-2-[(18)F]fluoro-D-glucose ([(18)F]FDG) or 3'-deoxy-3'-[(18)F]fluorothymidine ([(18)F]FLT) followed by DW-MRI to calculate the apparent diffusion coefficient (ADC). Data was cross-validated using the Pearson correlation coefficient (PCC) and compared to histology (IHC). RESULTS Each drug led to tumor growth inhibition but their combination resulted in regression. Separate analysis of PET or ADC could not provide significant differences between groups. Only PCC combined with IHC analysis revealed the highest therapeutic impact for combination therapy. CONCLUSION Combination treatment of selumetinib/docetaxel was superior to the respective mono-therapies shown by PCC of PET and ADC in conjunction with histology.
|
49
|
Chandran AKN, Yoo YH, Cao P, Sharma R, Sharma M, Dardick C, Ronald PC, Jung KH. Updated Rice Kinase Database RKD 2.0: enabling transcriptome and functional analysis of rice kinase genes. RICE (NEW YORK, N.Y.) 2016; 9:40. [PMID: 27540739 PMCID: PMC4991984 DOI: 10.1186/s12284-016-0106-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/04/2016] [Accepted: 07/08/2016] [Indexed: 05/20/2023]
Abstract
BACKGROUND Protein kinases catalyze the transfer of a phosphate moiety from a phosphate donor to the substrate molecule, thus playing critical roles in cell signaling and metabolism. Although plant genomes contain more than 1000 genes that encode kinases, knowledge is limited about the function of each of these kinases. A major obstacle that hinders progress towards kinase characterization is functional redundancy. To address this challenge, we previously developed the rice kinase database (RKD) that integrated omics-scale data within a phylogenetics context. RESULTS An updated version of rice kinase database (RKD) that contains metadata derived from NCBI GEO expression datasets has been developed. RKD 2.0 facilitates in-depth transcriptomic analyses of kinase-encoding genes in diverse rice tissues and in response to biotic and abiotic stresses and hormone treatments. We identified 261 kinases specifically expressed in particular tissues, 130 that are significantly up-regulated in response to biotic stress, 296 in response to abiotic stress, and 260 in response to hormones. Based on this update and Pearson correlation coefficient (PCC) analysis, we estimated that 19 out of 26 genes characterized through loss-of-function studies confer dominant functions. These were selected because they either had paralogous members with PCC values of <0.5 or had no paralog. CONCLUSION Compared with the previous version of RKD, RKD 2.0 enables more effective estimations of functional redundancy or dominance because it uses comprehensive expression profiles rather than individual profiles. The integrated analysis of RKD with PCC establishes a single platform for researchers to select rice kinases for functional analyses.
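The selection rule quoted in the abstract (paralogs with PCC < 0.5, or no paralog at all) can be sketched as a small filter over expression profiles. This is only an illustration: the function names and expression values are made up, and requiring every paralog to fall below the threshold is an assumption, since the abstract does not say how genes with several paralogs were handled.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two expression profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def likely_dominant(gene_profile, paralog_profiles, threshold=0.5):
    """True if the gene has no paralog, or if every paralog has PCC < threshold.

    `all` over an empty list is True, which covers the no-paralog case.
    """
    return all(pearson(gene_profile, p) < threshold for p in paralog_profiles)


# Hypothetical expression values across five tissues.
gene = [1.0, 5.0, 2.0, 8.0, 3.0]
co_expressed_paralog = [2.0, 10.0, 4.0, 16.0, 6.0]  # same pattern, PCC = 1
divergent_paralog = [8.0, 2.0, 7.0, 1.0, 6.0]       # opposite pattern

print(likely_dominant(gene, []))                      # no paralog
print(likely_dominant(gene, [divergent_paralog]))     # paralog with PCC < 0.5
print(likely_dominant(gene, [co_expressed_paralog]))  # co-expressed paralog
```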
|
50
|
Li WV, Chen Y, Li JJ. TROM: A Testing-Based Method for Finding Transcriptomic Similarity of Biological Samples. STATISTICS IN BIOSCIENCES 2016; 9:105-136. [PMID: 28781712 DOI: 10.1007/s12561-016-9163-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022]
Abstract
Comparative transcriptomics has gained increasing popularity in genomic research thanks to the development of high-throughput technologies including microarray and next-generation RNA sequencing that have generated numerous transcriptomic data. An important question is to understand the conservation and divergence of biological processes in different species. We propose a testing-based method TROM (Transcriptome Overlap Measure) for comparing transcriptomes within or between different species, and provide a different perspective, in contrast to traditional correlation analyses, about capturing transcriptomic similarity. Specifically, the TROM method focuses on identifying associated genes that capture molecular characteristics of biological samples, and subsequently comparing the biological samples by testing the overlap of their associated genes. We use simulation and real data studies to demonstrate that TROM is more powerful in identifying similar transcriptomes and more robust to stochastic gene expression noise than Pearson and Spearman correlations. We apply TROM to compare the developmental stages of six Drosophila species, C. elegans, S. purpuratus, D. rerio and mouse liver, and find interesting correspondence patterns that imply conserved gene expression programs in the development of these species. The TROM method is available as an R package on CRAN (https://cran.r-project.org/package=TROM) with manuals and source codes available at http://www.stat.ucla.edu/~jingyi.li/software-and-data/trom.html.
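Testing the overlap of two gene sets can be illustrated with a one-sided hypergeometric test. The abstract does not state which test statistic TROM uses, so this is a sketch under that assumption rather than the package's implementation; the function name and the gene counts are hypothetical, and the TROM manual should be consulted for the actual method.

```python
from math import comb


def overlap_p_value(n_total, n_a, n_b, n_overlap):
    """Probability of seeing at least `n_overlap` shared genes by chance.

    n_total: number of genes in the background population
    n_a, n_b: sizes of the two sets of associated genes
    Assumes set B is a uniformly random draw of n_b genes from the population.
    """
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_b)


# Toy numbers: 10 genes in total, two samples with 3 associated genes each.
p_full_overlap = overlap_p_value(10, 3, 3, 3)
p_any_overlap = overlap_p_value(10, 3, 3, 0)
print(p_full_overlap, p_any_overlap)
```

A small p-value indicates that the two samples share more associated genes than random sets of the same sizes would, which is the sense in which an overlap test measures similarity.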
|