On-line problems such as website optimisation require to trade exploration and exploitation in order to learn and optimise an unknown target. For instance, in the Pascal Exploration & Exploitation Challenge 2011 I am organising with my UCL colleagues, a web server observes clicks from visitors to a webpage, and aims to maximise the ratio of clicks per page views. The relationship between the visitor-content pairs and clicks is unknown but it can be learnt from past observations. The content presented to the visitors is either chosen to improve our model of clicks in regions of uncertainty or it is based on the model that has been built so far. These two distinct motives are referred to as exploration and exploitation. Thus, the web server should explore enough to be able to build an accurate model of the visitor’s behaviour, while allowing for sufficient exploitation in order to earn clicks.
The problem of trading exploration and exploitation was first tackled in the “multi-armed bandit” formalism, in which the target observations are compared to rewards obtained when pulling arms of slot-machines. The first theoretical analysis focused on independent reward distributions for each arm. However, in a real-world scenario, arms (inputs) are rarely independent and modelling the dependencies is essential in order to obtain the best learning rate. Gaussian Processes (GP), for instance, are a powerful modelling tool that has been widely used in bayesian online optimisation, in combination with heuristics such as the Most Probable Improvement for selecting inputs where to sample the function. However, performance guarantees for the use of GP in a bandit setting were only found in 2010, when used in combination with the Upper Confidence Bound heuristic for trading exploration and exploitation (Srinivas et al., 2010).
The exploration/exploitation dilemma is a recurrent topic in many areas of research, e.g. global optimisation, reinforcement learning, tree search, recommender systems or information retrieval. The trading of exploration and exploitation is particularly of high importance in various large-scale applications, such as sponsored search advertising (Graepel et al., 2010) or content-based information retrieval (Auer at al., 2010), where the aim is to help users quickly access the information they are looking for. The workshop will provide an opportunity to present, compare and discuss the performance of different exploration/exploitation techniques as well as theoretical analysis of such algorithms. A particular focus of the workshop will be large scale applications. The results of the Exploration and Exploitation Challenge will also be presented during the workshop.
Call for papers
The workshop will be single-day, comprising of invited talks and presentations of contributed work, with time for discussion. Depending on quality and compatibility with the workshop objectives, slots for brief talks and posters will be allocated.
We invite contributions on the topic of trading exploration and exploitation in various domains including (but not limited to) bandit games, online optimisation, reinforcement learning, tree search, recommender systems, information retrieval, etc. Topics of interest include: applications and practical performance of techniques to trade exploration and exploitation, benchmark/comparison of different techniques, computational challenges in large scale applications, new techniques, theoretical analyses, etc.
Contributions should be communicated to the programme committee (the organisers) in form of an extended abstract (from 4 to 8 pages in the ICML conference paper style), sent by email to: louis [at] dorard [dot] me .
- adPredictor: Assorted Thoughts (machinedlearnings.com)