REAL

Selection of optimal validation methods for quantitative structure–activity relationships and applicability domain

Héberger, Károly (2023) Selection of optimal validation methods for quantitative structure–activity relationships and applicability domain. SAR AND QSAR IN ENVIRONMENTAL RESEARCH. ISSN 1062-936X (In Press)

KH_SAR_QSAREnvironRes2023.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Text (Excel table with several sheets)
SupplInfoKH.xlsx - Supplemental Material
Available under License Creative Commons Attribution Non-commercial.


Abstract

This brief literature survey groups the numerical validation methods and highlights the contradictions and confusion concerning bias, variance and predictive performance. A multicriteria decision-making analysis was carried out using the sum of absolute ranking differences (SRD), illustrated with five case studies (seven examples). SRD was applied to compare external and cross-validation techniques and indicators of predictive performance, and to select optimal methods for determining the applicability domain (AD). The ordering of model validation methods agreed with the statements of the original authors, but those statements contradict one another, suggesting that any variant of cross-validation can be superior or inferior to other variants depending on the algorithm, the data structure and the circumstances. A simple fivefold cross-validation proved superior to the Bayesian Information Criterion in the vast majority of situations. Testing a numerical validation method in only one situation is not sufficient, even if that situation is well defined. SRD, as a preferable multicriteria decision-making algorithm, is suitable for tailoring validation techniques and for optimally determining the applicability domain for the data set in question.
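
The core SRD computation named in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: it assumes the reference ranking is taken from the row-wise mean of all methods (one possible choice of reference), it uses average ranks for ties, and it omits any normalization or validation of the SRD values. The function names `average_ranks` and `srd`, and the toy data, are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def srd(table, reference=None):
    """Sum of absolute ranking differences for each column of `table`.

    table: dict mapping a method name to its values over the same objects (rows).
    reference: optional reference values per object; if omitted, the row-wise
    mean over all methods is used (an assumption of this sketch).
    """
    columns = list(table.values())
    n_rows = len(columns[0])
    if reference is None:
        reference = [mean(col[i] for col in columns) for i in range(n_rows)]
    ref_ranks = average_ranks(reference)
    return {
        name: sum(abs(r - q) for r, q in zip(average_ranks(col), ref_ranks))
        for name, col in table.items()
    }


# Hypothetical toy data: three methods scored on four objects.
toy = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [1, 3, 2, 4]}
scores = srd(toy)
for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(name, value)
```

A smaller SRD value means the method's ranking lies closer to the reference ranking; in this toy table, method C reproduces the reference ranking exactly and method B deviates the most.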

Item Type: Article
Subjects: Q Science > QA Mathematics > QA76.9.D343 Data mining and searching techniques
Q Science > QD Chemistry > QD01 Analytical chemistry
Depositing User: Erika Bilicsi
Date Deposited: 30 May 2023 11:38
Last Modified: 30 May 2023 11:38
URI: http://real.mtak.hu/id/eprint/164354
