RCV1-V2 Example for VW
Once the RCV1 data download finishes, you need to put the data in the right format for VW. The following command does that:
zless train.dat.gz | sed -e 's/^-1/0 |features/' | sed -e 's/^1/1 |features/' | sed -e 's/$/ const:.01/'
The vw_process script encapsulates this command.
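For reference, a minimal sketch of what such a wrapper could look like is below; it simply packages the conversion command above and takes the gzipped data file as its first argument (the actual vw_process script shipped with the examples may differ in details):

```sh
#!/bin/sh
# Sketch of a vw_process-style wrapper: convert a gzipped RCV1 file
# (lines of "-1/1 feature:value ...") into VW's input format.
zless "$1" \
  | sed -e 's/^-1/0 |features/' \
  | sed -e 's/^1/1 |features/' \
  | sed -e 's/$/ const:.01/'
```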
The output of vw_process looks like:
1 |features 13:3.9656971e-02 24:3.4781646e-02 69:4.6296168e-02 85:6.1853945e-02 ... const:.01
0 |features 9:8.5609287e-02 14:2.9904654e-02 19:6.1031535e-02 20:2.1757640e-02 ... const:.01
...
From the above, you can see that the input data format is similar to SVMlight's feature:value sparse representation format. There are two important differences:
- A feature name can be any string not containing a colon ':', a space ' ', or a pipe '|' (these are special characters). In the example above, there is only one non-integer feature name, "const". In general this is pretty handy, because you can feed in much less processed data than most learning algorithms require.
- The features are divided into namespaces. The semantics of a namespace is: features with the same name but in different namespaces are different features. This example has just one namespace, "features", which is the simplest (and probably most common) case.
There are a couple of variations on the above format. If you want to importance-weight examples, place the importance weight after the label and before the first namespace; a missing importance weight is treated as 1 by default. Similarly, a feature with a value of 1 can be represented as just its name rather than name:1.
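For illustration, here are two made-up lines in this format (the feature names are hypothetical, not from the RCV1 data): the first example carries an importance weight of 2, and the feature sunny has no explicit value, so it is treated as sunny:1.

```
1 2 |features height:1.5 sunny const:.01
0 |features height:1.2 const:.01
```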
The command for training is the following:
./vw_process train.dat.gz | vw -l 20 --initial_t 128000 --power_t 1 --cache_file cache_train -f r_temp

Here:
- -l 20 is the initial learning rate. It is large, but decays quickly.
- --initial_t 128000 is the initial count. This is essentially a fictitious number of examples which the algorithm imagines has already passed by.
- --power_t 1 tells vw to decay the learning rate like 1/t. The combination of learning rate parameters implies that the learning rate decays as 2560000/(128000+t), where t is the count of examples seen so far (see the sketch after this list).
- The --cache_file flag stores the parsed data in VW's own compressed internal format. The second time you run the above command, it should be much faster: about 6 seconds on my desktop machine.
- -f r_temp stores the output regressor in the file r_temp.
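As a sketch of how the three learning rate flags combine (assuming the schedule takes the form l * (initial_t / (initial_t + t))^power_t for a single pass), the 2560000 above is just 20 * 128000:

```latex
\eta_t = l \left(\frac{\mathrm{initial\_t}}{\mathrm{initial\_t} + t}\right)^{\mathrm{power\_t}}
       = 20 \cdot \frac{128000}{128000 + t}
       = \frac{2560000}{128000 + t}
```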
Next, you test according to the following:
./vw_process test.dat.gz | vw -t --cache_file cache_test -i r_temp -p p_out
Here the flags are:
- -t tells vw to not use the labels for training.
- -i r_temp loads the regressor stored in r_temp before examples are processed.
- -p p_out makes the predictions be output to the file p_out.
To compute the test error rate with perf, first extract the labels from the test set:
zless test.dat.gz | cut -d ' ' -f 1 | sed -e 's/^-1/0/' > labels
and then type:
perf -ACC -files labels p_out -t 0.5
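Here perf refers to an external evaluation utility. If it is not installed, a rough error-rate check can be done with standard shell tools; this is a sketch assuming p_out contains one numeric prediction per line, in the same order as labels:

```sh
# Threshold each prediction at 0.5 and report the fraction of test
# examples where the thresholded prediction disagrees with the label.
paste labels p_out \
  | awk '{ pred = ($2 > 0.5) ? 1 : 0; if (pred != $1) errors++ }
         END { printf "error rate: %f\n", errors / NR }'
```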
The results on my machine are summarized by the following table:
| Method | Wall-clock execution time | Test set error rate |
|--------|---------------------------|---------------------|
| VW     | 3.0s                      | 5.54%               |
| svmsgd | 21.8s                     | 5.74%               |
- VW optimizes squared loss and then thresholds at 0.5, while svmsgd optimizes hinge loss, so their internal optimizations differ fundamentally. For svmsgd, 5.74% was the best error rate I found for one pass with lambda = 0.00001. Note that nobody is worrying about overfitting in parameter tuning here.
- The timing numbers are wall-clock execution time. svmsgd spends about 19s (wall-clock time) loading the data and getting it ready, and then trains in 0.38s (CPU time). VW runs fully online, so loading and training are fundamentally interleaved.