On the need for structure modelling in sequence prediction
- Source :
- Twomey, N, Diethe, T & Flach, P 2016, 'On the Need for Structure Modelling in Sequence Prediction', Machine Learning, vol. 104, no. 2, pp. 291–314. https://doi.org/10.1007/s10994-016-5571-y
- Publisher :
- Springer Nature
- Abstract :
- There is no uniform approach in the literature for modelling sequential correlations in sequence classification problems. It is easy to find examples of unstructured models (e.g. logistic regression) where correlations are not taken into account at all, but there are also many examples where the correlations are explicitly incorporated into a – potentially computationally expensive – structured classification model (e.g. conditional random fields). In this paper we lay theoretical and empirical foundations for clarifying the types of problem which necessitate direct modelling of correlations in sequences, and the types of problem where unstructured models that capture sequential aspects solely through features are sufficient. The theoretical work in this paper shows that the rate of decay of auto-correlations within a sequence is related to the excess classification risk that is incurred by ignoring the structural aspect of the data. This is an intuitively appealing result, demonstrating the intimate link between the auto-correlations and excess classification risk. Drawing directly on this theory, we develop well-founded visual analytics tools that can be applied a priori on data sequences and we demonstrate how these tools can guide practitioners in specifying feature representations based on auto-correlation profiles. Empirical analysis is performed on three sequential datasets. With baseline feature templates, structured and unstructured models achieve similar performance, indicating no initial preference for either model. We then apply the visual analytics tools to the datasets, and show that classification performance in all cases is improved over baseline results when our tools are involved in defining feature representations.
- Subjects :
- Conditional random field
Visual analytics
Computer science
02 engineering and technology
Machine learning
01 natural sciences
structure modelling
010104 statistics & probability
SPHERE
Artificial Intelligence
0202 electrical engineering, electronic engineering, information engineering
Feature (machine learning)
0101 mathematics
Baseline (configuration management)
Structure (mathematical logic)
Sequence
Autocorrelation
A priori and a posteriori
020201 artificial intelligence & image processing
Data mining
Jean Golding
Software
Details
- Language :
- English
- ISSN :
- 0885-6125
- Volume :
- 104
- Issue :
- 2-3
- Database :
- OpenAIRE
- Journal :
- Machine Learning
- Accession number :
- edsair.doi.dedup.....af6c7b7c9c20cbeb8d046e0041b40780
- Full Text :
- https://doi.org/10.1007/s10994-016-5571-y