VW contains a contextual bandit module that allows you to optimize a predictor based on already-collected contextual bandit data. In other words, the module does not implement exploration; it assumes ...
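To make "optimizing from already-collected data" concrete, here is a minimal sketch of inverse propensity scoring (IPS), one standard way to estimate how a candidate policy would have performed on logged data. It is an illustration under stated assumptions, not VW's own code: the `LoggedEvent` record, the `ips_cost` helper, and the toy log are all hypothetical, and each logged event is assumed to carry the chosen action, its observed cost, and the probability with which the logging policy chose it.

```python
from typing import Callable, Hashable, List, NamedTuple


class LoggedEvent(NamedTuple):
    """One logged decision (hypothetical record layout for this sketch)."""
    context: Hashable    # what was observed when the decision was made
    action: int          # action the logging policy actually took
    cost: float          # cost observed for that action (lower is better)
    probability: float   # probability with which the logging policy took it


def ips_cost(policy: Callable[[Hashable], int], log: List[LoggedEvent]) -> float:
    """Estimate the average cost of `policy` from logged data via IPS.

    Only events where the candidate policy agrees with the logged action
    contribute, and each is reweighted by 1 / probability.
    """
    if not log:
        raise ValueError("log must contain at least one event")
    total = 0.0
    for event in log:
        if policy(event.context) == event.action:
            total += event.cost / event.probability
    return total / len(log)


# Toy log: a uniform-random logging policy over two actions (assumed data).
log = [
    LoggedEvent("morning", 1, 0.0, 0.5),
    LoggedEvent("morning", 2, 1.0, 0.5),
    LoggedEvent("evening", 1, 1.0, 0.5),
    LoggedEvent("evening", 2, 0.0, 0.5),
]


def always_one(context: Hashable) -> int:
    """Candidate policy that ignores the context."""
    return 1


def context_aware(context: Hashable) -> int:
    """Candidate policy that picks a different action per context."""
    return 1 if context == "morning" else 2


always_one_cost = ips_cost(always_one, log)
context_aware_cost = ips_cost(context_aware, log)
```

On this toy log the context-aware policy receives the lower estimated cost, which is the kind of comparison an offline learner can exploit without running any new exploration.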
This code illustrates the idea of contextual bandits (also known as associative search). The basic idea is that we have n machines, each of which has k different actions. All machines have different reward probabilities ...
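The setup described above can be sketched as a small simulation. This is a rough illustration, not the repository's implementation: it assumes the machine index acts as the context, rewards are Bernoulli (0 or 1), and the learner is a simple epsilon-greedy agent that keeps one value estimate per (machine, action) pair. The function name `run_epsilon_greedy` and the reward probabilities are hypothetical.

```python
import random
from typing import List


def run_epsilon_greedy(
    reward_probs: List[List[float]],
    steps: int,
    epsilon: float = 0.1,
    seed: int = 0,
) -> List[List[float]]:
    """Learn per-machine action values with epsilon-greedy exploration.

    reward_probs[m][a] is the (assumed) probability that action a on
    machine m pays a reward of 1. Returns the learned value estimates.
    """
    rng = random.Random(seed)
    n = len(reward_probs)       # number of machines (contexts)
    k = len(reward_probs[0])    # number of actions per machine
    counts = [[0] * k for _ in range(n)]
    values = [[0.0] * k for _ in range(n)]

    for _ in range(steps):
        machine = rng.randrange(n)  # the context shown to the agent
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: pick a random action
        else:
            # exploit: pick the best-looking action for this machine
            action = max(range(k), key=lambda a: values[machine][a])

        reward = 1.0 if rng.random() < reward_probs[machine][action] else 0.0

        # Incremental mean update of the estimate for (machine, action).
        counts[machine][action] += 1
        step_size = 1.0 / counts[machine][action]
        values[machine][action] += step_size * (reward - values[machine][action])

    return values


# Two machines, two actions each, with different best actions (assumed values).
reward_probs = [[0.1, 0.9], [0.8, 0.2]]
values = run_epsilon_greedy(reward_probs, steps=5000)
best_actions = [max(range(len(row)), key=row.__getitem__) for row in values]
```

Because the best action differs from machine to machine, a learner that ignores which machine it is facing cannot do well on both; keeping separate estimates per machine is what makes the problem "contextual".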