Last updated in May 2021. Topics covered include combinatorial bandits, semi-bandits, Thompson Sampling, submodularity, sequential decision making, surrogate losses, approximate solutions, and more. 📣/😈 marks denote paper importance from my own perspective.
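For readers new to the area, the core setting shared by many papers below (e.g. Thompson Sampling for combinatorial semi-bandits) can be illustrated with a minimal sketch. This is not the algorithm of any specific paper listed here; it is an illustrative Beta-Bernoulli Thompson Sampling loop for a top-m semi-bandit, where the super-arm is the m base arms with the highest posterior samples and semi-bandit feedback means one observation per played arm. The function name and parameters are my own for illustration.

```python
import random

def thompson_semi_bandit(true_means, m, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling for a top-m combinatorial semi-bandit.

    Each round: sample a mean for every base arm from its Beta posterior,
    play the m arms with the highest samples (the super-arm), observe a
    Bernoulli reward for each played arm, and update that arm's posterior.
    Returns the total reward collected over `horizon` rounds.
    """
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1] * n  # Beta(1, 1) prior: pseudo-count of successes
    beta = [1] * n   # pseudo-count of failures
    total = 0
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        super_arm = sorted(range(n), key=lambda i: -samples[i])[:m]
        for i in super_arm:  # semi-bandit feedback: per-arm observations
            reward = 1 if rng.random() < true_means[i] else 0
            alpha[i] += reward
            beta[i] += 1 - reward
            total += reward
    return total
```

With well-separated arms, e.g. `thompson_semi_bandit([0.9, 0.8, 0.2, 0.1], m=2, horizon=2000)`, the posteriors concentrate on the two best arms within a few dozen rounds, so the average per-round reward approaches the optimal 1.7 rather than the 1.0 of uniform play.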
📣📣😈 Combinatorial Multi-Armed Bandit: General Framework and Applications [link] [slides]
Wei Chen, Yajun Wang, Yang Yuan
ICML, 2013
📣 Thompson Sampling for Complex Online Problems [link]
Aditya Gopalan, Shie Mannor, Yishay Mansour
ICML, 2014
Efficient Learning in Large-Scale Combinatorial Semi-Bandits [link]
Zheng Wen, Branislav Kveton, Azin Ashkan
ICML, 2015
On Identifying Good Options under Combinatorially Structured Feedback in Finite Noisy Environments [link]
Yifan Wu, Andras Gyorgy, Csaba Szepesvari
ICML, 2015
📣 DCM Bandits: Learning to Rank with Multiple Clicks [link]
Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, Zheng Wen
ICML, 2016
Contextual Combinatorial Cascading Bandits [link]
Shuai Li, Baoxiang Wang, Shengyu Zhang, Wei Chen
ICML, 2016
📣📣😈 Thompson Sampling for Combinatorial Semi-Bandits [link] [journal version]
Siwei Wang, Wei Chen
ICML, 2018
📣 Exploiting structure of uncertainty for efficient matroid semi-bandits [link]
Pierre Perrault, Vianney Perchet, Michal Valko
ICML, 2019
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously [link]
Julian Zimmert, Haipeng Luo, Chen-Yu Wei
ICML, 2019
📣📣😈 Graphical Models Meet Bandits: A Variational Thompson Sampling Approach [link] [video]
Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel
ICML, 2020
(Locally) Differentially Private Combinatorial Semi-Bandits [link]
Xiaoyu Chen, Kai Zheng, Zixin Zhou, Yunchang Yang, Wei Chen, Liwei Wang
ICML, 2020
Combinatorial Pure Exploration for Dueling Bandit [link]
Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao
ICML, 2020
Adaptive Submodular Maximization in Bandit Setting [link]
Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan
NeurIPS, 2013
Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising [link]
Min Xu, Tao Qin, Tie-Yan Liu
NeurIPS, 2013
Combinatorial Pure Exploration of Multi-Armed Bandits [link]
Shouyuan Chen, Tian Lin, Irwin King, Michael R. Lyu, Wei Chen
NeurIPS, 2014
Online Decision-Making in General Combinatorial Spaces [link]
Arun Rajkumar, Shivani Agarwal
NeurIPS, 2014
Online combinatorial optimization with stochastic decision sets and adversarial losses [link]
Gergely Neu, Michal Valko
NeurIPS, 2014
Stochastic Online Greedy Learning with Semi-bandit Feedbacks [link]
Tian Lin, Jian Li, Wei Chen
NeurIPS, 2015
Combinatorial Bandits Revisited [link]
Richard Combes, Mohammad Sadegh Talebi Mazraeh Shahi, Alexandre Proutiere, Marc Lelarge
NeurIPS, 2015
Combinatorial Cascading Bandits [link]
Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari
NeurIPS, 2015
Linear Multi-Resource Allocation with Semi-Bandit Feedback [link]
Tor Lattimore, Koby Crammer, Csaba Szepesvari
NeurIPS, 2015
On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs [link]
Wei Cao, Jian Li, Yufei Tao, Zhize Li
NeurIPS, 2016
Combinatorial Multi-Armed Bandit with General Reward Functions [link]
Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu
NeurIPS, 2016
Combinatorial semi-bandit with known covariance [link]
RΓ©my Degenne, Vianney Perchet
NeurIPS, 2016
Contextual semibandits via supervised learning oracles [link]
Akshay Krishnamurthy, Alekh Agarwal, Miro Dudik
NeurIPS, 2016
📣 Learning Unknown Markov Decision Processes: A Thompson Sampling Approach [link]
Yi Ouyang, Mukul Gagrani, Ashutosh Nayyar, Rahul Jain
NeurIPS, 2017
📣 Interactive Submodular Bandit [link]
Lin Chen, Andreas Krause, Amin Karbasi
NeurIPS, 2017
Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications [link]
Qinshi Wang, Wei Chen
NeurIPS, 2017
Contextual bandits with surrogate losses: Margin bounds and efficient algorithms [link]
Dylan J. Foster, Akshay Krishnamurthy
NeurIPS, 2018
📣 Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward [link]
Lixing Chen, Jie Xu, Zhuo Lu
NeurIPS, 2018
Combinatorial Bandits with Relative Feedback [link]
Aadirupa Saha, Aditya Gopalan
NeurIPS, 2019
Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback [link]
Mingrui Zhang, Lin Chen, Hamed Hassani, Amin Karbasi
NeurIPS, 2019
📣 Improved Regret Bounds for Bandit Combinatorial Optimization [link]
Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi
NeurIPS, 2019
📣 Optimal Decision Tree with Noisy Outcomes [link]
Su Jia, Viswanath Nagarajan, Fatemeh Navidi, R. Ravi
NeurIPS, 2019
Connections Between Mirror Descent, Thompson Sampling and the Information Ratio [link]
Julian Zimmert, Tor Lattimore
NeurIPS, 2019
📣😈 Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits [link] [video]
Pierre Perrault, Etienne Boursier, Michal Valko, Vianney Perchet
NeurIPS, 2020
Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors [link]
Junya Honda, Akimichi Takemura
AISTATS, 2014
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits [link]
Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari
AISTATS, 2015
Improved Learning Complexity in Combinatorial Pure Exploration Bandits [link]
Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Ronald Ortner, Peter Bartlett
AISTATS, 2016
Efficient Bandit Combinatorial Optimization Algorithm with Zero-suppressed Binary Decision Diagrams [link]
Shinsaku Sakaue, Masakazu Ishihata, Shin-ichi Minato
AISTATS, 2018
📣 Combinatorial Semi-Bandits with Knapsacks [link]
Karthik Abinav Sankararaman, Aleksandrs Slivkins
AISTATS, 2019
📣📣😈 Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms [link]
Alihan Huyuk, Cem Tekin
AISTATS, 2019
Contextual Combinatorial Volatile Multi-armed Bandit with Adaptive Discretization [link]
Andi Nika, Sepehr Elahi, Cem Tekin
AISTATS, 2020
Information Complexity in Bandit Subset Selection [link]
Emilie Kaufmann, Shivaram Kalyanakrishnan
COLT, 2013
Online Learning with Feedback Graphs: Beyond Bandits [link]
Noga Alon, NicolΓ² Cesa-Bianchi, Ofer Dekel, Tomer Koren
COLT, 2015
First-order regret bounds for combinatorial semi-bandits [link]
Gergely Neu
COLT, 2015
📣📣 Thompson Sampling for Learning Parameterized Markov Decision Processes [link]
Aditya Gopalan, Shie Mannor
COLT, 2015
Pure Exploration of Multi-armed Bandit Under Matroid Constraints [link]
Lijie Chen, Anupam Gupta, Jian Li
COLT, 2016
Best-of-K-bandits [link]
Max Simchowitz, Kevin Jamieson, Benjamin Recht
COLT, 2016
📣📣😈 Tight Bounds for Bandit Combinatorial Optimization [link]
Alon Cohen, Tamir Hazan, Tomer Koren
COLT, 2017
📣 Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration [link]
Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, Ruosong Wang
COLT, 2017
Thompson Sampling for the MNL-Bandit [link]
Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi
COLT, 2017
Disagreement-Based Combinatorial Pure Exploration: Sample Complexity Bounds and an Efficient Algorithm [link]
Tongyi Cao, Akshay Krishnamurthy
COLT, 2019
Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem [link]
Nadav Merlis, Shie Mannor
COLT, 2019
📣 Tight Lower Bounds for Combinatorial Multi-Armed Bandits [link]
Nadav Merlis, Shie Mannor
COLT, 2020
Covariance-adapting algorithm for semi-bandits with application to sparse outcomes [link]
Pierre Perrault, Michal Valko, Vianney Perchet
COLT, 2020
Adaptive Submodular Maximization under Stochastic Item Costs [link]
Srinivasan Parthasarathy
COLT, 2020
Building Bridges: Viewing Active Learning from the Multi-Armed Bandit Lens [link]
Ravi Ganti, Alexander Gray
UAI, 2013
Optimal Resource Allocation with Semi-Bandit Feedback [link]
Tor Lattimore, Koby Crammer, Csaba Szepesvari
UAI, 2014
Matroid Bandits: Fast Combinatorial Optimization with Learning [link]
Branislav Kveton, Zheng Wen, Azin Ashkan, Hoda Eydgahi, Brian Eriksson
UAI, 2014
Thompson Sampling is Asymptotically Optimal in General Environments [link]
Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter
UAI, 2016
Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences [link]
Tanner Fiez, Shreyas Sekar, Liyuan Zheng, Lillian Ratliff
UAI, 2018
Cascading Linear Submodular Bandits: Accounting for Position Bias and Diversity in Online Learning to Rank [link]
Gaurush Hiranandani, Harvineet Singh, Prakhar Gupta, Iftikhar Ahamath Burhanuddin, Zheng Wen, Branislav Kveton
UAI, 2019
📣 Semi-bandit Optimization in the Dispersed Setting [link]
Maria-Florina Balcan, Travis Dick, Wesley Pegden
UAI, 2020
Submodular Bandit Problem Under Multiple Constraints [link]
Sho Takemori, Masahiro Sato, Takashi Sonoda, Janmajay Singh, Tomoko Ohkuma
UAI, 2020
Thompson Sampling for Combinatorial Bandits and Its Application to Online Feature Selection [link]
Audrey Durand, Christian GagnΓ©
AAAI, 2014
Thompson Sampling for Combinatorial Network Optimization in Unknown Environments [link]
Alihan Huyuk, Cem Tekin
IEEE/ACM Transactions on Networking, 2020