Machine Learning

Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation

Proxy causal learning (PCL) is a method for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder. This is achieved via two-stage regression: …
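
As a toy illustration of the two-stage idea (not the paper's deep architecture), the linear sketch below assumes an unobserved confounder U with a treatment-side proxy Z and an outcome-side proxy W: stage 1 regresses W on the treatment A and Z, stage 2 regresses the outcome on A and the stage-1 prediction, and the causal effect is read off by averaging the fitted bridge function over W. All variables and coefficients are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    U = rng.normal(size=n)                       # unobserved confounder
    Z = U + rng.normal(size=n)                   # treatment-side proxy
    W = U + rng.normal(size=n)                   # outcome-side proxy
    A = 0.8 * U + rng.normal(size=n)             # confounded treatment
    Y = 2.0 * A + 1.5 * U + rng.normal(size=n)   # true causal slope is 2.0

    # Stage 1: regress the outcome proxy W on (A, Z) to get E[W | A, Z].
    X1 = np.column_stack([A, Z, np.ones(n)])
    beta1 = np.linalg.lstsq(X1, W, rcond=None)[0]
    W_hat = X1 @ beta1

    # Stage 2: regress Y on (A, W_hat) to estimate the bridge function h(A, W).
    X2 = np.column_stack([A, W_hat, np.ones(n)])
    beta2 = np.linalg.lstsq(X2, Y, rcond=None)[0]

    def effect_at(a):
        # Average h(a, W) over the observed marginal of W.
        return np.mean(beta2[0] * a + beta2[1] * W + beta2[2])

    naive = np.linalg.lstsq(np.column_stack([A, np.ones(n)]), Y, rcond=None)[0][0]
    print("naive OLS slope:", naive)                             # biased, about 2.7
    print("PCL slope      :", effect_at(1.0) - effect_at(0.0))   # close to 2.0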

On Instrumental Variable Regression for Deep Offline Policy Evaluation

We show that the popular reinforcement learning (RL) strategy of estimating the state-action value (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding: the inputs and the output noise are correlated. …
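
The bias is the classic double-sampling problem. The toy sketch below (state values rather than Q-functions, on a hypothetical three-state chain) shows gradient descent on the empirical mean squared Bellman error settling on wrong values at the successor states, because the regression target r + V(s') shares randomness with the quantities being fit.

    import numpy as np

    # Hypothetical chain: s0 steps to s1 or s2 with prob 1/2 (reward 0);
    # s1 ends with reward 1; s2 ends with reward 0; no discounting.
    # True values: V = [0.5, 1.0, 0.0].
    rng = np.random.default_rng(0)
    n = 6000
    s  = np.concatenate([np.zeros(n, int), np.ones(n, int), np.full(n, 2)])
    s2 = np.concatenate([rng.integers(1, 3, size=n), np.zeros(2 * n, int)])
    r  = np.concatenate([np.zeros(n), np.ones(n), np.zeros(n)])
    live = np.concatenate([np.ones(n), np.zeros(2 * n)])   # 0 marks a terminal step

    V = np.zeros(3)
    for _ in range(500):                  # gradient descent on the empirical MSBE
        delta = r + live * V[s2] - V[s]
        g = np.zeros(3)
        np.add.at(g, s, -delta)           # d/dV[s] of 0.5 * delta**2
        np.add.at(g, s2, live * delta)    # the target depends on V too
        V -= 1.0 * g / len(s)
    print("MSBE solution:", V.round(3))   # about [0.5, 0.833, 0.167]: biased at s1, s2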

Learning Deep Features in Instrumental Variable Regression

Instrumental variable (IV) regression is a standard strategy for learning causal relationships between confounded treatment and outcome variables from observational data by using an instrumental variable, which affects the outcome only through the …
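
For intuition, here is a minimal linear two-stage least squares (2SLS) sketch, the classical procedure that the deep-feature approach generalizes by learning feature maps for both stages; the data, instrument, and coefficients below are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    U = rng.normal(size=n)                        # unobserved confounder
    Z = rng.normal(size=n)                        # instrument, independent of U
    X = Z + U + rng.normal(size=n)                # confounded treatment
    Y = 2.0 * X + 3.0 * U + rng.normal(size=n)    # true causal slope is 2.0

    def ols(A, y):
        return np.linalg.lstsq(A, y, rcond=None)[0]

    ones = np.ones(n)
    # Stage 1: regress the treatment on the instrument.
    X_hat = np.column_stack([Z, ones]) @ ols(np.column_stack([Z, ones]), X)
    # Stage 2: regress the outcome on the stage-1 prediction.
    slope = ols(np.column_stack([X_hat, ones]), Y)[0]
    print("naive OLS :", ols(np.column_stack([X, ones]), Y)[0])   # biased, about 3.0
    print("2SLS slope:", slope)                                   # close to 2.0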

Reproducing Kernel Methods for Nonparametric and Semiparametric Treatment Effects

We propose a family of reproducing kernel ridge estimators for nonparametric and semiparametric policy evaluation. The framework includes (i) treatment effects of the population, of subpopulations, and of alternative populations; (ii) the …
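
A minimal plug-in sketch in this spirit: kernel ridge regression of the outcome on (treatment, covariates) with an RBF kernel, reading off a dose-response value by averaging the fitted surface over the empirical covariate distribution. The kernel, regularizer, and data are assumptions for illustration, not the paper's estimators.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    X = rng.normal(size=n)                         # observed covariate
    A = 0.5 * X + rng.normal(size=n)               # treatment
    Y = np.sin(A) + X + 0.1 * rng.normal(size=n)   # outcome

    def rbf(P, Q, scale=1.0):
        d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * scale ** 2))

    F = np.column_stack([A, X])
    alpha = np.linalg.solve(rbf(F, F) + 1e-2 * np.eye(n), Y)   # ridge weights

    def dose_response(a):
        # E_X[f(a, X)]: evaluate at treatment level a for every observed X.
        G = np.column_stack([np.full(n, a), X])
        return (rbf(G, F) @ alpha).mean()

    print("mean outcome at a=1:", dose_response(1.0))   # close to sin(1), about 0.84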

Similarity-based Classification: Connecting Similarity Learning to Binary Classification

In real-world classification problems, pairwise supervision (i.e., a pair of patterns with a binary label indicating whether they belong to the same class or not) can often be obtained at a lower cost than ordinary class labels. Similarity learning …
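
One way to see the connection, sketched below on hypothetical Gaussian data: fit a binary scorer f(x) = w.x so that the sign of f(x)f(x') matches the pairwise same/different label. With pairwise supervision alone, the two classes are only identifiable up to a label flip.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    y = rng.choice([-1, 1], size=n)                    # hidden class labels
    X = y[:, None] * 1.5 + rng.normal(size=(n, 2))     # two Gaussian classes
    i, j = rng.integers(0, n, size=n), rng.integers(0, n, size=n)
    S = y[i] * y[j]                                    # +1 same class, -1 different

    w = 0.01 * rng.normal(size=2)
    for _ in range(500):                               # logistic loss on S * f(xi) * f(xj)
        fi, fj = X[i] @ w, X[j] @ w
        m = S * fi * fj                                # margin of the pair
        g = -S * 0.5 * (1 - np.tanh(m / 2))            # stable logistic derivative in m
        grad = (X[i] * (g * fj)[:, None] + X[j] * (g * fi)[:, None]).mean(0)
        w -= 0.1 * grad
    pred = np.sign(X @ w)
    print("accuracy up to label flip:", max((pred == y).mean(), (pred == -y).mean()))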

Uncoupled Regression from Pairwise Comparison Data

Uncoupled regression is the problem of learning a model from unlabeled data and a set of target values when the correspondence between them is unknown. Such a situation arises in predicting anonymized targets that involve sensitive information, e.g., …
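
A natural baseline under this setup (a sketch, not the paper's method): learn an ordering of the inputs from the pairwise comparisons with a logistic ranking loss, then calibrate scores to the known but uncoupled marginal of targets by quantile matching. Data and model are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    X = rng.normal(size=(n, 2))
    t = X @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=n)   # hidden targets
    targets_pool = np.sort(t)                  # marginal of targets, order unknown

    i, j = rng.integers(0, n, size=5000), rng.integers(0, n, size=5000)
    comp = np.sign(t[i] - t[j])                # pairwise comparison outcomes

    w = np.zeros(2)
    for _ in range(300):                       # logistic ranking loss on f(xi) - f(xj)
        m = comp * ((X[i] - X[j]) @ w)
        g = -comp * 0.5 * (1 - np.tanh(m / 2))
        w -= 0.5 * ((X[i] - X[j]) * g[:, None]).mean(0)

    # Quantile matching: the k-th smallest score gets the k-th smallest target.
    ranks = np.argsort(np.argsort(X @ w))
    pred = targets_pool[ranks]
    print("RMSE:", np.sqrt(np.mean((pred - t) ** 2)))   # small if the order is right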

Polynomial-time Algorithms for Multiple-arm Identification with Full-bandit Feedback

We study the problem of stochastic combinatorial pure exploration (CPE), where an agent sequentially pulls a set of single arms (a.k.a. a super arm) and tries to find the best super arm. Among the variety of CPE problem settings, we focus on the …
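
In the full-bandit feedback setting, the agent observes only the sum of rewards of the pulled super arm. The sketch below (a uniform-sampling baseline, not the paper's polynomial-time algorithms) shows that single-arm means are nonetheless recoverable by least squares on the subsets' indicator vectors; all quantities are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, T = 8, 3, 4000
    theta = rng.normal(size=d)                 # unknown single-arm means

    A = np.zeros((T, d))
    for t in range(T):                         # uniformly random size-k super arms
        A[t, rng.choice(d, size=k, replace=False)] = 1.0
    y = A @ theta + rng.normal(size=T)         # only the noisy SUM is observed

    theta_hat = np.linalg.lstsq(A, y, rcond=None)[0]
    best = np.sort(np.argsort(theta)[-k:])
    guess = np.sort(np.argsort(theta_hat)[-k:])
    print("true best super arm     :", best)   # with enough pulls the sets match
    print("estimated best super arm:", guess)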

Dueling Bandits with Qualitative Feedback

We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes qualitative rather than numeric feedback by pulling each arm. We employ the same regret as the dueling bandit (DB) …
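
A small sketch of what qualitative feedback means here, with hypothetical arms: each pull returns an ordered category rather than a number, and one natural way to compare arms is how often one draws a higher category than the other.

    import numpy as np

    rng = np.random.default_rng(0)
    # Categories 0 < 1 < 2, e.g. "bad" < "ok" < "good".
    probs = np.array([[0.5, 0.3, 0.2],      # arm 0: mostly low categories
                      [0.2, 0.3, 0.5]])     # arm 1: mostly high categories

    def pull(arm, n=1):
        return rng.choice(3, size=n, p=probs[arm])

    a, b = pull(0, 10000), pull(1, 10000)
    print("P(arm0 > arm1):", np.mean(a > b))
    print("P(arm1 > arm0):", np.mean(b > a))   # arm 1 wins the qualitative duel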

Alternate Estimation of a Classifier and The Class-Prior from Positive and Unlabeled Data

We consider the problem of learning a binary classifier only from positive data and unlabeled data (PU learning) and estimating the class-prior in unlabeled data under the case-control scenario. Most of the recent methods of PU learning require an …
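
A crude sketch of the alternating idea with a linear model and hypothetical Gaussian data: (a) given a prior estimate, take gradient steps on the unbiased PU risk of du Plessis et al.; (b) given the classifier, re-estimate the prior on the unlabeled data; repeat. This is an illustration of the alternation, not the paper's estimator.

    import numpy as np

    rng = np.random.default_rng(0)
    n_p, n_u, true_pi = 500, 2000, 0.4
    Xp = rng.normal(size=(n_p, 2)) + 1.5                        # positive sample
    yu = rng.random(n_u) < true_pi                              # hidden labels
    Xu = rng.normal(size=(n_u, 2)) + np.where(yu, 1.5, -1.5)[:, None]

    sig = lambda z: 0.5 * (1 + np.tanh(z / 2))                  # stable sigmoid
    w, pi = np.zeros(2), 0.5
    for _ in range(20):
        for _ in range(200):    # (a) gradient steps on the unbiased PU risk:
            gp, gu = sig(Xp @ w), sig(Xu @ w)
            # grad of  pi*E_p[l(f,+)] - pi*E_p[l(f,-)] + E_u[l(f,-)], logistic l
            grad = (pi * (Xp * (gp - 1)[:, None] - Xp * gp[:, None]).mean(0)
                    + (Xu * gu[:, None]).mean(0))
            w -= 0.5 * grad
        pi = sig(Xu @ w).mean() # (b) crude prior re-estimate on unlabeled data
    print("estimated prior:", round(pi, 3))   # should land near the true 0.4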

A Fully Adaptive Algorithm for Pure Exploration in Linear Bandits

We propose the first fully adaptive algorithm for pure exploration in linear bandits: the task of finding the arm with the largest expected reward, which depends linearly on an unknown parameter. While existing methods partially or entirely fix sequences …
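
To fix the setting, here is a naive adaptive elimination loop (not the proposed algorithm; the confidence radius is a heuristic): arms are feature vectors x, a pull returns x.theta plus noise, and the goal is to identify the arm maximizing x.theta.

    import numpy as np

    rng = np.random.default_rng(0)
    arms = rng.normal(size=(10, 4))
    theta = rng.normal(size=4)                       # unknown reward parameter
    best = int(np.argmax(arms @ theta))

    V, b = np.eye(4), np.zeros(4)                    # ridge regression statistics
    alive = list(range(10))
    for t in range(5000):
        Vi = np.linalg.inv(V)
        width = np.sqrt(np.einsum('ij,jk,ik->i', arms, Vi, arms))
        est = arms @ (Vi @ b)
        beta = 1.5 * np.sqrt(np.log(10 * (t + 2)))   # heuristic confidence radius
        top = max(est[k] - beta * width[k] for k in alive)
        alive = [k for k in alive if est[k] + beta * width[k] >= top]
        if len(alive) == 1:
            break
        k = max(alive, key=lambda a: width[a])       # pull the most uncertain arm
        x = arms[k]
        V += np.outer(x, x)
        b += x * (x @ theta + rng.normal())
    # With enough pulls, only the best arm should survive elimination.
    print("true best arm:", best, "| surviving candidates:", alive)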