Using vw hypersearch
`vw-hypersearch` is a simple wrapper around `vw` that helps in finding the lowest-loss hyper-parameter value (argmin). For example, say you want to find the `--l1` (L1-norm regularization) value that gives the lowest average loss on a train-set called `train.dat`. You can run:

    $ vw-hypersearch 1e-10 1 vw --l1 % train.dat

`vw-hypersearch` will train multiple times (but in an efficient way) until it finds the `--l1` value resulting in the lowest average training loss.
In the call:

- the `%` character is a placeholder for the (argmin) parameter we are looking for
- `1e-10` is the lower bound of the search range
- `1` is the upper bound of the search range

The lower and upper bounds are arguments to `vw-hypersearch`. Everything from `vw` onward is a normal `vw` argument, exactly as you would use it in training. The only change you must apply to the training command is to use `%` instead of the value of the parameter you're trying to optimize.
Calling `vw-hypersearch` without any arguments should provide a usage message.
Additional arguments can be passed to `vw-hypersearch` before the `vw` command itself:

- `-L` will do a log-space golden-section search instead of a simple (linear-space) golden-section search.
- `-t test.dat` (note: this must come before the `vw` argument) will search for the training parameter that results in a minimum loss on `test.dat` rather than on `train.dat` (ignoring the training errors).
- An optional 3rd numeric parameter will be interpreted as a `tolerance` parameter, directing `vw-hypersearch` to stop only when the difference between two consecutive run errors is smaller than `tolerance`. A combined example is sketched below.
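For illustration, here is one way these options might be combined. This is a sketch, not taken verbatim from the usage message: it assumes `-L` and `-t` may be used together and that options precede the numeric bounds, and `train.dat` / `test.dat` are placeholder file names.

    # Log-space search for --l1 in [1e-10, 1], evaluating each candidate on test.dat,
    # and stopping when two consecutive losses differ by less than 1e-4 (the tolerance)
    vw-hypersearch -L -t test.dat 1e-10 1 1e-4 vw --l1 % train.dat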
    # Find the learning-rate resulting in the lowest average loss for a logistic-loss train-set:
    vw-hypersearch 0.1 100 vw --loss_function logistic --learning_rate % train.dat

    # Find the --bootstrap value resulting in the lowest average loss.
    # vw-hypersearch will automatically search in integer-space since --bootstrap expects an integer:
    vw-hypersearch 2 16 vw --bootstrap % train.dat
`vw-hypersearch` conducts a golden-section search by default. This search method strikes a good balance between safety and efficiency: it keeps a bracketing interval around the minimum and shrinks it by a constant factor, requiring only one new `vw` run per iteration.
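As a back-of-the-envelope estimate (a general property of golden-section search, not a documented guarantee of `vw-hypersearch`): each iteration shrinks the bracketing interval by a factor of about 0.618, so narrowing an initial range of width R down to a tolerance t takes roughly

    n ≈ log(R / t) / log(1.618)

new `vw` runs, i.e. a few dozen trainings even for a wide search range, rather than an exhaustive grid of runs.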
A few caveats:

- Lowest average loss is not necessarily optimal.
- Your real goal should always be to find a minimal generalization error, not a minimal training error (see the example below).
- Some parameters do not have a convex loss; for these, `vw-hypersearch` will converge on a local minimum instead of the global one.
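For instance, to target held-out loss rather than training loss, the `-t` option described above can be used to evaluate each candidate on a separate set. This is a sketch; `validate.dat` is a placeholder file name.

    # Pick the --l1 value that minimizes loss on a held-out set instead of the train-set
    vw-hypersearch -t validate.dat 1e-10 1 vw --l1 % train.dat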
`vw-hypersearch` is written in Perl and is included with Vowpal Wabbit (in the `utl` subdirectory). In case of doubt, look at the source.