Also, the function `maxDissim` can be used to create sub-samples using a maximum dissimilarity approach (Willett, 1999). Suppose there is a data set A with m samples and a larger data set B with n samples. We may want to create a sub-sample from B that is diverse when compared to A. To do this, for each sample in B, the function calculates the m dissimilarities between each point in A. The most dissimilar point in B is added to A and the process continues. There are many methods in R to calculate dissimilarity; caret uses the proxy package, and the manual for that package lists the available measures. There are also many ways to calculate which sample is "most dissimilar". The argument `obj` can be used to specify any function that returns a scalar measure; caret includes two functions, `minDiss` and `sumDiss`, that can be used to maximize the minimum and total dissimilarities, respectively.

As an example, the figure below shows a scatter plot of two chemical descriptors for the Cox2 data. Using an initial random sample of 5 compounds, we can select 20 more compounds from the data so that the new compounds are most dissimilar from the initial 5 that were specified. The panels in the figure show the results using several combinations of distance metrics and scoring functions. For these data, the distance measure has less of an impact than the scoring method for determining which compounds are most dissimilar.

```r
newSamp <- maxDissim(start, samplePool, n = 20)
```

The visualization below shows the data set (small points), the starting samples (larger blue points) and the order in which the other 20 samples are added.

Simple random sampling of time series is probably not the best way to resample time series data. Hyndman and Athanasopoulos (2013) discuss rolling forecasting origin techniques that move the training and test sets in time. caret contains a function called `createTimeSlices` that can create the indices for this type of splitting. The three parameters for this type of splitting are:

- `initialWindow`: the initial number of consecutive values in each training set sample
- `horizon`: the number of consecutive values in the test set sample
- `fixedWindow`: a logical; if FALSE, the training set always starts at the first sample and the training set size will vary over data splits

As an example, suppose we have a time series with 20 data points.
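As a concrete sketch of those three parameters, the snippet below applies `createTimeSlices` to an illustrative 20-point series; the vector `y` and the choices of `initialWindow = 5` and `horizon = 3` are assumptions for this example, not values from the text:

```r
library(caret)

# An illustrative 20-point series (the values themselves do not matter here)
y <- 1:20

# 5 consecutive training values per slice, 3 consecutive test values;
# fixedWindow = TRUE keeps every training window at exactly 5 points
slices <- createTimeSlices(y, initialWindow = 5, horizon = 3,
                           fixedWindow = TRUE)

slices$train[[1]]     # rows 1-5 form the first training set
slices$test[[1]]      # rows 6-8 form the first test set
length(slices$train)  # 13 slices in total for a 20-point series
```

With `fixedWindow = FALSE`, the second training set would instead grow to rows 1–6 while the test window still moves forward by one point.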
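Returning to the maximum dissimilarity sampling described earlier, the `maxDissim` call can be sketched end to end. The random two-column pool standing in for the Cox2 descriptors, and the objects `start` and `samplePool`, are illustrative assumptions:

```r
library(caret)

set.seed(1)
# A random two-descriptor pool standing in for the Cox2 data (illustrative)
pool <- as.data.frame(matrix(rnorm(200), ncol = 2))

# An initial random sample of 5 "compounds"; the rest are candidates
startIdx   <- sample(nrow(pool), 5)
start      <- pool[startIdx, ]
samplePool <- pool[-startIdx, ]

# Pick 20 more rows, each chosen to maximize the minimum dissimilarity
# (obj = minDiss) to everything already selected
newSamp <- maxDissim(start, samplePool, n = 20, obj = minDiss)

length(newSamp)  # 20 row indices into samplePool
```

Swapping in `obj = sumDiss` would score each candidate by its total, rather than minimum, dissimilarity to the already-selected set.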