Comparison of supervised learning statistical methods for classifying commercial beers and identifying patterns

Koren, Dániel and Lőrincz, Laura and Kovács, Sándor and Kun-Farkas, Gabriella and Vecseriné Hegyes, Beáta and Sipos, László (2020) Comparison of supervised learning statistical methods for classifying commercial beers and identifying patterns. JOURNAL OF CHEMOMETRICS, 34 (e3216). ISSN 0886-9383 (print); 1099-128X (online)

Supervised learning statistical methods.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Official URL: https://onlinelibrary.wiley.com/doi/full/10.1002/c...

Abstract

In this study, 13 properties (alcohol, real extract, flavonoid, anthocyanin, glucose, fructose, maltose, and sucrose content; EBC [European Brewery Convention] and L*a*b* color; and bitterness) of 21 beers (alcohol-free pale lagers, alcohol-free beer-based mixed drinks, beer-based mixed drinks, international lagers, wheat beers, stouts, fruit beers) were determined. In the first step, multiple factor analysis (MFA) was performed on the whole dataset and five clusters (target classes) were determined; then bootstrapping was applied to create a balanced dataset in which every cluster contained 100 samples, giving a total sample size of 500. In the second step, 12 supervised learning algorithms (random trees [RND], Quinlan's C4.5 decision tree algorithm [C4.5], Iterative Dichotomiser 3 algorithm [ID3], cost-sensitive decision tree algorithm [CSMC4], cost-sensitive classification tree [CSCRT], k-nearest neighbors algorithm [KNN], radial basis function [RBF], multilayer perceptron neural network [MLP], prototype nearest neighbor [PNN], linear discriminant analysis [LDA], naïve Bayes with continuous variables [NBC], partial least squares discriminant analysis [PLS-DA]) were applied to classify each brand into the target classes. Several error rates were then calculated: re-substitution error rate (RER), cross-validated error rate (CV), bootstrap error (BOOT), leave-one-out error (LOO), and train-test error rate (TRAIN). The MFA discriminated five groups, each characterized by certain analytical parameters, and the other multivariate methods performed similarly. The methods were discriminated best by the BOOT, CV, and LOO error rates. The best estimation methods were C4.5, CSMC4, and CSCRT; these performed best on flavonoid content and EBC color. NBC was identified as the method most sensitive to the properties. Classification ability fluctuated greatly for three properties (glucose, maltose, sucrose). With the NBC method, marked fluctuation was also observed for the L*a*b* color parameters, flavonoid content, EBC color, and bitterness.
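The balancing and error-estimation steps described in the abstract can be sketched as below. This is a minimal illustration, not the authors' implementation. It assumes numeric feature vectors and uses a plain k-nearest-neighbors classifier (one of the 12 listed methods) as a stand-in for all of them. The function names and the parameters `k`, `folds`, `replicates`, and `test_fraction` are hypothetical. BOOT is shown as an out-of-bag bootstrap estimate, which is one common variant and is assumed here, because the abstract does not define it.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical helper names; KNN stands in for any of the 12 classifiers.


def balance_by_bootstrap(X, y, per_class, seed=0):
    """Resample every class with replacement to exactly `per_class` rows."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    Xb, yb = [], []
    for label in sorted(by_class):
        for _ in range(per_class):
            Xb.append(rng.choice(by_class[label]))
            yb.append(label)
    return Xb, yb


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def count_errors(train_X, train_y, test_X, test_y, k=3):
    """Number of test rows the classifier labels incorrectly."""
    return sum(knn_predict(train_X, train_y, x, k) != t for x, t in zip(test_X, test_y))


def split(X, y, test_idx):
    """Partition rows into (train_X, train_y, test_X, test_y) by an index set."""
    train = [i for i in range(len(y)) if i not in test_idx]
    test = sorted(test_idx)
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def resubstitution_error(X, y, k=3):
    # RER: fit and score on the same rows (optimistic by construction).
    return count_errors(X, y, X, y, k) / len(y)


def cv_error(X, y, folds=10, k=3, seed=0):
    # CV: every row is predicted once by a model fitted on the other folds.
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for f in range(folds):
        wrong += count_errors(*split(X, y, set(idx[f::folds])), k)
    return wrong / len(y)


def loo_error(X, y, k=3):
    # LOO: cross-validation with a single row per fold.
    return cv_error(X, y, folds=len(y), k=k)


def bootstrap_oob_error(X, y, replicates=50, k=3, seed=0):
    # BOOT (assumed out-of-bag variant): fit on a bootstrap draw, score unused rows.
    rng = random.Random(seed)
    n, wrong, tested = len(y), 0, 0
    for _ in range(replicates):
        drawn = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(drawn))
        if not oob:
            continue
        wrong += count_errors([X[i] for i in drawn], [y[i] for i in drawn],
                              [X[i] for i in oob], [y[i] for i in oob], k)
        tested += len(oob)
    return wrong / tested if tested else float("nan")


def train_test_error(X, y, test_fraction=0.3, k=3, seed=0):
    # TRAIN: a single random hold-out split.
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(test_fraction * len(y)))
    tr_X, tr_y, te_X, te_y = split(X, y, set(idx[:n_test]))
    return count_errors(tr_X, tr_y, te_X, te_y, k) / len(te_y)
```

With the study's design, `balance_by_bootstrap(X, y, per_class=100)` on five clusters would yield 500 rows. Balancing by bootstrap duplicates rows, so copies of a held-out row can also sit in the training part of CV, LOO, or the hold-out split. This tends to lower those estimates.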

Item Type:	Article
Uncontrolled Keywords:	beer, error estimation, fruit beer, learning algorithms, multiple factor analysis (MFA)
Subjects:	S Agriculture > S1 Agriculture (General)
Depositing User:	Dr. László Sipos
Date Deposited:	27 Sep 2020 16:50
Last Modified:	23 Sep 2025 08:28
URI:	https://real.mtak.hu/id/eprint/114881
