1. Representative splitting cross validation.
- Author
- Xu, Lu; Hu, Ou; Guo, Yuwan; Zhang, Mengqin; Lu, Daowang; Cai, Chen-Bo; Xie, Shunping; Goodarzi, Mohammad; Fu, Hai-Yan; She, Yuan-Bin
- Subjects
- *
LATENT variables , *ESTIMATION theory , *MATHEMATICAL complex analysis , *MULTIVARIATE analysis , *PARTIAL least squares regression - Abstract
- Cross-validation (CV) is widely used to estimate model complexity, i.e. the number of significant latent variables (LVs), for multivariate calibration methods such as partial least squares (PLS). A basic consideration when developing and validating multivariate calibration models is that both the training and validation sets should be representative and distributed as uniformly as possible in the experimental space. Motivated by this idea, we propose a new CV method called representative splitting cross-validation (RSCV). In RSCV, the DUPLEX algorithm is first used to sequentially divide the original training set into k (in this work, k = 2, 4, 8 and 16) equal parts. A series of k-fold (k = 2, 4, 8 and 16) CVs is then performed based on this data splitting. Finally, the pooled root mean squared error of CV (RMSECV) is used to estimate model complexity. Five real multivariate calibration data sets were investigated, and RSCV was compared with leave-one-out CV (LOOCV), 10-fold CV and Monte Carlo CV (MCCV). With a maximum k of 16, RSCV proved to be a useful and stable method for selecting PLS LVs and can obtain simpler models with acceptable computational burden.
- Highlights
  • Representative splitting cross-validation (RSCV) was proposed.
  • The DUPLEX algorithm was used to split the raw data set.
  • RSCV is a fusion of serial k-fold cross-validations.
  • RSCV is stable and can obtain simpler models when necessary.
  [ABSTRACT FROM AUTHOR]
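The procedure described in the abstract, DUPLEX splitting into 2, 4, 8, ... parts, serial k-fold CVs on those splits, and a pooled RMSECV to pick the number of LVs, can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the function names (`duplex`, `duplex_folds`, `pls1_fit`, `rscv`) are hypothetical, PLS1 is implemented with a minimal NIPALS loop, and the DUPLEX seeding and alternation follow the standard description of that algorithm.

```python
import numpy as np

def duplex(X):
    """Split samples into two representative halves (standard DUPLEX scheme).

    Each half is seeded with the most distant remaining pair; the rest are
    alternately assigned by the max-min-distance rule.
    """
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(D), D.shape)
    A, remaining = [i, j], set(range(n)) - {i, j}
    sub = sorted(remaining)
    a, b = np.unravel_index(np.argmax(D[np.ix_(sub, sub)]), (len(sub), len(sub)))
    B = [sub[a], sub[b]]
    remaining -= set(B)
    turn = 0
    while remaining:
        rem = sorted(remaining)
        target = A if turn == 0 else B
        dist = D[np.ix_(rem, target)].min(axis=1)  # distance to nearest member
        pick = rem[int(np.argmax(dist))]
        target.append(pick)
        remaining.remove(pick)
        turn ^= 1
    return np.array(A), np.array(B)

def duplex_folds(X, idx, depth):
    """Recursively DUPLEX-split `idx` into 2**depth representative folds."""
    if depth == 0:
        return [idx]
    A, B = duplex(X[idx])
    return duplex_folds(X, idx[A], depth - 1) + duplex_folds(X, idx[B], depth - 1)

def pls1_fit(X, y, n_lv):
    """Minimal NIPALS PLS1; returns training means and a regression vector."""
    xm, ym = X.mean(axis=0), y.mean()
    Xr, yr = X - xm, y - ym
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        W.append(w); P.append(p); q.append((yr @ t) / tt)
        Xr = Xr - np.outer(t, p)   # deflate X and y before the next LV
        yr = yr - q[-1] * t
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients in the original X space
    return xm, ym, B

def rscv(X, y, max_lv, depths=(1, 2, 3, 4)):
    """Pool RMSECV over serial 2**depth-fold CVs (k = 2, 4, 8, 16) per LV count."""
    n = len(X)
    sse, total = np.zeros(max_lv), 0
    for depth in depths:
        folds = duplex_folds(X, np.arange(n), depth)
        for i, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            for lv in range(1, max_lv + 1):
                xm, ym, B = pls1_fit(X[train], y[train], lv)
                pred = ym + (X[test] - xm) @ B
                sse[lv - 1] += np.sum((pred - y[test]) ** 2)
        total += n                   # each k-fold CV predicts every sample once
    pooled = np.sqrt(sse / total)    # pooled RMSECV for each number of LVs
    return int(np.argmin(pooled)) + 1, pooled

# Demo on synthetic data: y depends on two directions of X plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
y = X[:, 0] + 0.5 * X[:, 1] + 0.05 * rng.normal(size=64)
best_lv, pooled = rscv(X, y, max_lv=4, depths=(1, 2))
```

The demo uses only depths 1 and 2 (k = 2 and 4) to keep the run short; the paper's full scheme would use depths up to 4 (k = 16) and select the LV count minimizing the pooled RMSECV.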
- Published
- 2018