8 results on '"Damir Nadramija"'
Search Results
2. Toward Generating Simpler QSAR Models: Nonlinear Multivariate Regression versus Several Neural Network Ensembles and Some Related Methods.
- Author
Bono Lucic, Damir Nadramija, Ivan Basic, and Nenad Trinajstic
- Published
- 2003
- Full Text
- View/download PDF
3. Estimation of Random Accuracy and its Use in Validation of Predictive Quality of Classification Models within Predictive Challenges
- Author
Dražen Vikić-Topić, Bono Lučić, Damir Nadramija, Viktor Bojović, Mario Lovrić, Ana Sović Kržić, Jadranko Batista, and Drago Bešlo
- Subjects
Contingency table ,0303 health sciences ,Correlation coefficient ,model validation ,QSPR ,QSAR ,two-class variable ,classification model ,contingency table ,estimation ,prediction ,test set ,correlation coefficient ,predictive error ,classification accuracy ,model ranking ,random accuracy ,010401 analytical chemistry ,General Chemistry ,01 natural sciences ,Measure (mathematics) ,0104 chemical sciences ,Set (abstract data type) ,03 medical and health sciences ,Chemistry ,Standard error ,Ranking ,Test set ,Statistics ,030304 developmental biology ,Variable (mathematics)
- Abstract
Shortcomings of Pearson's correlation coefficient as a measure of the accuracy of predicted model properties are analysed. Here we discuss two such cases that often arise when a model is applied to predict properties of a new, external set of compounds. The first problem with the correlation coefficient is its insensitivity to systematic error, which must be expected when predicting properties of a novel external set of compounds that is not a random sample drawn from the training set. The second problem is that an external set can be arbitrarily large or small and can have an arbitrary, uneven distribution of measured values of the target variable, which are not known in advance. Under these conditions, the correlation coefficient can be an overoptimistic measure of agreement between predicted and experimental values and can lead to overly optimistic conclusions about the predictive ability of the model. Because of these shortcomings, the standard error of prediction (root-mean-square error) is suggested as a better measure of a model's predictive capability. For classification models, the difference between the real accuracy and the most probable random accuracy of the model shows very good characteristics for ranking different models by predictive quality, and at the same time has an obvious interpretation. This work is licensed under a Creative Commons Attribution 4.0 International License.
- Published
- 2019
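The abstract of item 3 makes two technical points: Pearson's r ignores a systematic offset in predictions, and classification accuracy should be judged against random accuracy. A minimal sketch of both follows. It is not the authors' code: it assumes the "most probable random accuracy" can be approximated by the chance agreement computed from the contingency-table marginals (the quantity used in Cohen's kappa), which may differ from the paper's estimator, and all function names are hypothetical.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between experimental and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def rmse(y_true, y_pred):
    """Root-mean-square error, i.e. the standard error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def accuracy_above_random(tp, fn, fp, tn):
    """Observed accuracy minus chance accuracy estimated from the table marginals.

    Assumption: chance accuracy is computed as in Cohen's kappa; the paper's
    'most probable random accuracy' may be defined differently.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # P(actual pos) * P(predicted pos) + P(actual neg) * P(predicted neg)
    random_accuracy = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return accuracy - random_accuracy

if __name__ == "__main__":
    y_exp = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_pred = [y + 2.0 for y in y_exp]  # every prediction off by the same +2.0
    print(round(pearson_r(y_exp, y_pred), 6), rmse(y_exp, y_pred))  # 1.0 2.0
    # A classifier that always predicts "positive" on a 90/10 test set
    print(accuracy_above_random(90, 0, 10, 0))  # 0.0
    print(round(accuracy_above_random(40, 10, 10, 40), 3))  # 0.3
```

The shifted predictions keep a perfect correlation while the RMSE exposes the error, and the trivial classifier's 90% accuracy turns out to be no better than chance.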
4. Estimation of Random Accuracy and its Use in Validation of Predictive Quality of Classification Models within Predictive Challenges
- Author
Bono Lučić, Jadranko Batista, Viktor Bojović, Mario Lovrić, Ana Sović Kržić, Drago Bešlo, Damir Nadramija, and Dražen Vikić-Topić
- Published
- 2019
5. Modeling Anti-HIV Activity of HEPT Derivatives Revisited. Multiregression Models Are Not Inferior Ones
- Author
Ivan Bašic, Damir Nadramija, Mario Flajšlik, Dragan Amić, Bono Lučić, Theodore E. Simos, and George Maroulis
- Subjects
Anti-HIV activity ,Multivariate statistics ,Multivariate analysis ,Correlation coefficient ,LOO ,Artificial neural network ,business.industry ,Computer science ,Pattern recognition ,Machine learning ,computer.software_genre ,Data set ,Molecular descriptor ,Artificial intelligence ,business ,computer
- Abstract
Several quantitative structure‐activity studies of this data set of 107 HEPT derivatives have been performed since 1997, using the same set of molecules with (more or less) different classes of molecular descriptors. Multivariate regression (MR) and artificial neural network (ANN) models were developed, and in each study the authors concluded that ANN models are superior to MR ones. We re‐calculated multivariate regression models for this set of molecules using the same set of descriptors and compared our results with the previous ones. The two main reasons why previous studies overestimated the quality of the ANN models relative to the MR models are: (1) incorrect calculation of the leave‐one‐out (LOO) cross‐validated (CV) correlation coefficient for MR models in Luco et al., J. Chem. Inf. Comput. Sci. 37 392–401 (1997), and (2) incorrect estimation/interpretation of the LOO cross‐validated and predictive performance and power of ANN models. More precise and fairer comparison of fit and LOO ...
- Published
- 2007
- Full Text
- View/download PDF
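The first issue raised in the abstract of item 5, how the LOO cross-validated statistic of an MR model is computed, can be made concrete with a small sketch. It assumes the common definition Q² = 1 - PRESS/SS_tot, with the total sum of squares taken about the mean of all observations; the cited papers may define the LOO correlation coefficient somewhat differently, and the function names are hypothetical. Q² is computed both by refitting n models and by the hat-matrix shortcut, and contrasted with the fitted R², which is never smaller and therefore overstates predictive quality if reported in its place.

```python
import numpy as np

def _design(X):
    """Prepend an intercept column to the descriptor matrix."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(len(X)), X])

def r2_fit(X, y):
    """Coefficient of determination of the fit on the training data."""
    X1, y = _design(X), np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = np.sum((y - X1 @ coef) ** 2)
    return 1.0 - rss / np.sum((y - y.mean()) ** 2)

def loo_q2_refit(X, y):
    """LOO Q^2 by explicit refitting: each point is predicted by a model that never saw it."""
    X1, y = _design(X), np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(X1[keep], y[keep], rcond=None)
        press += (y[i] - X1[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def loo_q2_hat(X, y):
    """Same Q^2 from one full fit, using PRESS residuals e_i / (1 - h_ii)."""
    X1, y = _design(X), np.asarray(y, dtype=float)
    H = X1 @ np.linalg.pinv(X1)  # hat matrix (assumes full column rank)
    resid = y - H @ y
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=30)
    # The two LOO routes agree; the fitted R^2 is never below Q^2.
    print(r2_fit(X, y), loo_q2_refit(X, y), loo_q2_hat(X, y))
```

Because each PRESS residual is at least as large as the corresponding fit residual, swapping R² for Q² (or computing Q² from in-sample predictions) always flatters the model.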
6. Data Visualization of Multivariate (Non)Linear Regression Ensembles in QSAR/QSPR
- Author
Sonja Nikolić, Bono Lučić, and Damir Nadramija
- Subjects
Multivariate statistics ,Quantitative structure–activity relationship ,Data visualization ,business.industry ,Artificial intelligence ,Machine learning ,computer.software_genre ,business ,computer ,Nonlinear regression ,Mathematics
- Published
- 2006
- Full Text
- View/download PDF
7. Nonlinear Multivariate Polynomial Ensembles in QSAR/QSPR
- Author
Bono Lučić, Damir Nadramija, and Theodore E. Simos
- Subjects
QSAR/QSPR modeling ,selection of the most relevant molecular descriptors ,ensembles of multivariate regression models ,linear and nonlinear models
- Abstract
In this study we demonstrate the use of ensembles of linear and nonlinear multivariate regression models, based on multivariate polynomials of the initial descriptors, in QSAR/QSPR modeling. The data sets, which varied significantly in size (number of variables and number of points), were all previously reported in the literature, and the molecular structures were either obtained from the authors of these publications or generated in our laboratories. All data sets were encoded as SMILES and converted to 3D structures (SD files) by the CORINA program (www2.chemie.uni-erlangen.de/software/corina/). All descriptors were computed by the program DRAGON 2.1 (http://www.disat.unimib.it/chm/). Linear ensembles were built from multiple linear regression (MLR) models, and nonlinear ensembles consisted of multivariate polynomials, constructed as controlled subsets selected among the linear descriptors, their two-fold cross-products and squares, and the cubes of (only) single descriptors. Ensemble responses were computed as the mean, median, or weighted value of all intrinsic models. The models and ensembles discussed in this paper were constructed with NQSAR, a Windows console application available upon request. The results show a clear advantage of nonlinear ensembles over their linear counterparts when data sets contain 4 to 5 times more points than model coefficients. On the other hand, linear ensembles, which generally exhibit higher robustness and stability, are better suited to small data sets with many variables, outperforming nonlinear ensembles in predicting values of data points from an external data set. This can be explained by the fact that linear models are less affected by small variations than nonlinear models, while they benefit equally from the key ensemble features.
Primarily, we note the impact of including more variables, spread across optimized variable subsets, in the ensembles’ intrinsic models, each of which individually satisfies the aforementioned rule on over-fitting. The overall ensemble responses are more stable and robust, with higher predictive power than single models.
- Published
- 2005
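A minimal sketch of the descriptor pool and ensemble responses described in the abstract of item 7 follows. It is not NQSAR: the variable subsets are supplied by hand rather than optimized, the weights for the weighted response are arbitrary placeholders, and all names are hypothetical. Each intrinsic model is an ordinary least-squares MLR fitted on one subset of the expanded pool.

```python
import itertools
import numpy as np

def expand_descriptors(X):
    """Pool of linear terms, two-fold cross-products (squares included) and cubes of single descriptors."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    cols = [X[:, j] for j in range(d)]
    names = [f"x{j}" for j in range(d)]
    for j, k in itertools.combinations_with_replacement(range(d), 2):
        cols.append(X[:, j] * X[:, k])
        names.append(f"x{j}*x{k}")
    for j in range(d):
        cols.append(X[:, j] ** 3)
        names.append(f"x{j}^3")
    return np.column_stack(cols), names

def fit_mlr(Z, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ...]."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return coef

def ensemble_predict(Z_train, y_train, Z_new, subsets, how="mean", weights=None):
    """Fit one MLR per column subset (hypothetical helper) and combine their predictions."""
    preds = []
    for cols in subsets:
        coef = fit_mlr(Z_train[:, cols], y_train)
        preds.append(coef[0] + Z_new[:, cols] @ coef[1:])
    preds = np.array(preds)  # shape: (n_models, n_new)
    if how == "mean":
        return preds.mean(axis=0)
    if how == "median":
        return np.median(preds, axis=0)
    if how == "weighted":
        w = np.asarray(weights, dtype=float)
        return (w[:, None] * preds).sum(axis=0) / w.sum()
    raise ValueError(f"unknown aggregation: {how}")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))
    y = 1.0 + 2.0 * X[:, 0] - X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=40)
    Z, names = expand_descriptors(X)  # 3 + 6 + 3 = 12 candidate terms
    subsets = [[0, 7], [0, 1, 7], [0, 2, 7, 11]]  # hypothetical, hand-picked subsets
    print([[names[c] for c in s] for s in subsets])
    print(ensemble_predict(Z[:30], y[:30], Z[30:], subsets, how="median"))
```

With 30 training points and at most 5 coefficients per intrinsic model, each model stays within the 4-to-5-points-per-coefficient rule mentioned in the abstract.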
8. Toward generating simpler QSAR models: Nonlinear multivariate regression versus several neural network ensembles and some related methods
- Author
Nenad Trinajstić, Ivan Bašic, Damir Nadramija, and Bono Lučić
- Subjects
Multivariate statistics ,Quantitative structure–activity relationship ,Artificial neural network ,business.industry ,Genetic function ,Feature selection ,Field (mathematics) ,General Chemistry ,Machine learning ,computer.software_genre ,Computer Science Applications ,Nonlinear system ,Computational Theory and Mathematics ,Simple (abstract algebra) ,Variable selection ,derivatives ,outperforms ,Applied mathematics ,Artificial intelligence ,business ,computer ,Information Systems ,Mathematics
- Abstract
In this study we test whether a simple modeling procedure used in the field of QSAR/QSPR can produce simple models that are, at the same time, as accurate as robust Neural Network Ensemble (NNE) models. We present the results of applying two procedures for generating/selecting simple linear and nonlinear multiregression (MR) models: (1) a method for selecting the best possible MR models (named CROMRsel) and (2) the Genetic Function Approximation (GFA) method from the Cerius2 program package. The obtained MR models are strictly compared with several NNE models. For the comparison we selected four QSAR data sets previously studied by NNE (Tetko et al. J. Chem. Inf. Comput. Sci. 1996, 36, 794-803. Kovalishyn et al. J. Chem. Inf. Comput. Sci. 1998, 38, 651-659.): (1) 51 benzodiazepine derivatives, (2) 37 carboquinone derivatives, (3) 74 pyrimidines, and (4) 31 antimycin analogues. These data sets were parameterized with 7, 6, 27, and 53 descriptors, respectively. The modeled properties were anti-pentylenetetrazole activity, antileukemic activity, inhibition constants for dihydrofolate reductase from MB1428 E. coli, and antifilarial activity, respectively. Nonlinearities were introduced into the MR models through 2-fold and/or 3-fold cross-products of the initial (linear) descriptors. Then, using the CROMRsel and GFA programs (J. Chem. Inf. Comput. Sci. 1999, 39, 121-132), the best sets of I descriptors (I ≤ 8 in this paper), according to the fit and leave-one-out correlation coefficients, were selected for multiregression models. Two classes of models were obtained: (1) linear or nonlinear MR models generated starting from the complete set of descriptors, and (2) nonlinear MR models generated starting from the same set of descriptors that was used in the NNE modeling. In addition, the descriptor selection method from CROMRsel was compared with the GFA method included in the QSAR module of the Cerius2 program.
For each data set, the MR models were found to have better cross-validated statistical parameters than the corresponding NNE models, and CROMRsel was found to select somewhat better MR models than the GFA method. MR models are also much simpler than NNEs, which is an important and surprising finding, and they additionally express the calculated dependencies in functional form. Moreover, the MR models were shown to be better than all other models obtained by different methods on the same data sets ("old" multivariate regressions, functional-link-net models, back-propagation neural networks, genetic algorithm, and partial least squares models). This study also indicated that robust NNE models cannot generate good models when applied to small data sets, suggesting that it is perhaps better to apply robust methods (like NNE) to larger data sets.
- Published
- 2003
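The abstract of item 8 describes introducing nonlinearity through cross-products of the initial descriptors and then selecting the best small set of I descriptors by fit and LOO statistics. The sketch below illustrates that idea but is explicitly not CROMRsel or GFA: a plain greedy forward selection, scored by the LOO Q² of an ordinary least-squares MLR, stands in for both, and all helper names are hypothetical. The default max_terms=8 corresponds to the value 8 given for I in the abstract; the PRESS shortcut assumes far more data points than selected terms.

```python
import itertools
import numpy as np

def with_cross_products(X):
    """Linear descriptors plus all 2-fold cross-products x_j * x_k (j < k)."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    cols = [X[:, j] for j in range(d)]
    names = [f"x{j}" for j in range(d)]
    for j, k in itertools.combinations(range(d), 2):
        cols.append(X[:, j] * X[:, k])
        names.append(f"x{j}*x{k}")
    return np.column_stack(cols), names

def loo_q2(Z, y):
    """LOO Q^2 of an intercept MLR, via PRESS residuals e_i / (1 - h_ii)."""
    Z1 = np.column_stack([np.ones(len(y)), Z])
    H = Z1 @ np.linalg.pinv(Z1)
    resid = y - H @ y
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def forward_select(Z, y, max_terms=8):
    """Greedily add the term that most improves LOO Q^2; stop when nothing helps."""
    selected, best_q2 = [], -np.inf
    remaining = list(range(Z.shape[1]))
    while remaining and len(selected) < max_terms:
        q2, j = max((loo_q2(Z[:, selected + [c]], y), c) for c in remaining)
        if q2 <= best_q2:
            break
        selected.append(j)
        remaining.remove(j)
        best_q2 = q2
    return selected, best_q2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 4))
    y = 3.0 * X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=40)
    Z, names = with_cross_products(X)  # 4 linear + 6 cross-product terms
    chosen, q2 = forward_select(Z, y)
    print([names[c] for c in chosen], round(q2, 3))
```

Scoring candidates by LOO Q² rather than by the fitted R² penalizes terms that only improve the fit, which is one simple way to keep the selected model small.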
Catalog
Discovery Service for Jio Institute Digital Library