REAL

Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart

Tóth, Gergely and Bodai, Zsolt and Héberger, Károly (2013) Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart. JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 27 (10). pp. 837-844. ISSN 0920-654X

Abstract

The coefficient of determination (R²) and its leave-one-out cross-validated analogue (denoted Q² or R²cv) are the most frequently published values for characterizing the predictive performance of models. In this article we use R² and Q² in a reversed sense, to detect uncommon points, i.e. influential points, in any data set. The term (1 - Q²)/(1 - R²) corresponds to the ratio of the predictive residual sum of squares to the residual sum of squares. This ratio correlates with the number of influential points in experimental and random data sets. We propose an (approximate) F test on the (1 - Q²)/(1 - R²) term to quickly pre-estimate the presence of influential points in the training sets of models. The test is based on the routinely calculated Q² and R² values and warns model builders to verify the training set, to perform influence analysis, or even to switch to robust modeling. © 2013 Springer Science+Business Media Dordrecht.
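
The ratio described in the abstract can be illustrated with a minimal sketch. It assumes an ordinary least-squares model with an intercept, and it assumes Q² is computed as 1 - PRESS/TSS with the total sum of squares taken about the full-sample mean, so that (1 - Q²)/(1 - R²) reduces to PRESS/RSS. The function name `r2_q2_ratio` and the synthetic data are hypothetical and are not taken from the article. The abstract does not give the degrees of freedom of the proposed F test, so the sketch stops at the ratio.

```python
import numpy as np


def r2_q2_ratio(X, y):
    """Return (R2, Q2, PRESS/RSS) for an OLS fit with intercept.

    Assumption: Q2 = 1 - PRESS/TSS, with TSS about the full-sample mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept

    # Residual sum of squares of the model fitted on all points.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))

    # Predictive residual sum of squares: refit without point i,
    # then predict the left-out point.
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += float((y[i] - A[i] @ beta_i) ** 2)

    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    q2 = 1.0 - press / tss
    return r2, q2, press / rss


# Synthetic example data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 9.0, 10)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

r2, q2, ratio = r2_q2_ratio(x, y)
print(f"R2 = {r2:.4f}, Q2 = {q2:.4f}, (1 - Q2)/(1 - R2) = {ratio:.4f}")
```

For a least-squares fit the ratio cannot fall below 1, because each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual. How large the ratio must be before it signals influential points is what the proposed F test decides, and that threshold should be taken from the full text.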

Item Type: Article
Uncontrolled Keywords: training set; Quantitative structure activity relationships; PREDICTION; Leave-one-out cross-validation; Influence analysis; coefficient of determination
Subjects: Q Science > QD Chemistry
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 17 Dec 2013 07:42
Last Modified: 10 Jan 2015 12:43
URI: http://real.mtak.hu/id/eprint/8143
