[Paper Close Reading] Learning Bounds for Importance Weighting

Original paper: Learning Bounds for Importance Weighting

Abstract

1 Introduction

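In real-world machine learning, the distribution of the training data often differs from that of the test data. A common correction is importance weighting, which compensates for this bias by reweighting the cost of each training sample. A common choice of weight is $w(x) = P(x)/Q(x)$, where $P$ and $Q$ are the test (target) distribution and the training (source) distribution respectively; this yields an unbiased estimate of the generalization error. The approach has pitfalls, however: Figure 1 shows an example in which importance weighting fails.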

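Both the target and the source data follow Gaussian distributions that differ only in their standard deviations. Importance weighting is applied during training for different values of the ratio $\sigma_Q/\sigma_P$: it performs poorly when $\sigma_Q/\sigma_P=0.3$ and well when $\sigma_Q/\sigma_P=0.7$ (the closer the two distributions, the smaller the error). Much of the literature notes that importance weighting should be used with care and stresses the need for convergence bounds and learning guarantees for the technique.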

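Using standard generalization bounds, the authors show that importance weighting succeeds when the weights are bounded. That condition is rarely practical, so they also prove that convergence is still guaranteed under a weaker condition when the weights are unbounded: the second moment of the weights must be bounded, a condition related to the Rényi divergence between $P$ and $Q$. Building on this analysis, the authors explore alternative reweighting methods.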

2 Preliminaries

2.1 Rényi Divergences

The Rényi divergence measures how far apart two distributions $P$ and $Q$ are. It is defined as follows:

$$ D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log_2 \sum_x P(x) \left[ \frac{P(x)}{Q(x)} \right]^{\alpha - 1} $$

A simple transformation gives:

$$ d_\alpha(P \| Q) = 2^{D_\alpha(P \| Q)} = \left[ \sum_x \frac{P^\alpha(x)}{Q^{\alpha - 1}(x)} \right]^{\frac{1}{\alpha - 1}} $$
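As a rough illustration, both quantities can be computed directly for finite distributions. The sketch below assumes the standard base-2 definition $D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log_2 \sum_x P(x) [P(x)/Q(x)]^{\alpha - 1}$ and its exponential $d_\alpha = 2^{D_\alpha}$. The function names and the example distributions are made up for this note and do not come from the paper.

```python
import math

def d_alpha(p, q, alpha):
    """Exponentiated Renyi divergence d_alpha(P||Q) = 2 ** D_alpha(P||Q).

    p and q are lists of probabilities over the same finite domain.
    Assumes alpha > 0, alpha != 1, and q[i] > 0 wherever p[i] > 0.
    """
    total = sum(pi ** alpha / qi ** (alpha - 1) for pi, qi in zip(p, q) if pi > 0)
    return total ** (1.0 / (alpha - 1))

def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(P||Q) in bits (log base 2)."""
    return math.log2(d_alpha(p, q, alpha))

# Hypothetical example distributions (not from the paper).
P = [0.5, 0.5]
Q = [0.25, 0.75]

print(d_alpha(P, Q, 2))           # sum of P(x)^2 / Q(x) = 1 + 1/3
print(renyi_divergence(P, Q, 2))  # log2(4/3)
print(d_alpha(P, P, 2))           # identical distributions give 1
```

When the two distributions coincide, $d_\alpha$ equals 1 and the divergence is 0; the further $Q$ moves from $P$, the larger both become.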

2.2 Importance Weights

The importance weight for $P$ and $Q$ is defined as $w(x) = P(x)/Q(x)$. This gives the following lemma and proof (expectations are taken with respect to $Q$):

$$ \mathbb{E}_{x \sim Q}[w(x)] = 1, \qquad \mathbb{E}_{x \sim Q}[w^2(x)] = d_2(P \| Q), \qquad \sigma^2(w) = d_2(P \| Q) - 1 $$

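Because the expectation is taken with respect to $Q$, the mean is immediately 1: $\mathbb{E}_{x \sim Q}[w(x)] = \sum_x Q(x) \frac{P(x)}{Q(x)} = \sum_x P(x) = 1$. Using the Rényi divergence, the second moment (the expectation of the square) can be written as: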

$$ \mathbb{E}_{x \sim Q}[w^2(x)] = \sum_x Q(x) \left[ \frac{P(x)}{Q(x)} \right]^2 = \sum_x \frac{P^2(x)}{Q(x)} = d_2(P \| Q) $$

The variance is the second moment (the expectation of the square) minus the square of the expectation:

$$ \sigma^2(w) = \mathbb{E}_{x \sim Q}[w^2(x)] - \left( \mathbb{E}_{x \sim Q}[w(x)] \right)^2 = d_2(P \| Q) - 1 $$
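The three identities are easy to check on a small discrete example. The sketch below uses exact rational arithmetic; the distributions `P` and `Q` are hypothetical values chosen for illustration, and $d_2(P \| Q)$ is taken to be $\sum_x P^2(x)/Q(x)$.

```python
from fractions import Fraction as F

# Hypothetical target (P) and source (Q) distributions on three points.
P = [F(1, 2), F(3, 10), F(1, 5)]
Q = [F(1, 5), F(3, 10), F(1, 2)]

# Importance weights w(x) = P(x) / Q(x).
w = [p / q for p, q in zip(P, Q)]

# Moments of w under the source distribution Q.
mean_w = sum(q * wi for q, wi in zip(Q, w))
second_moment_w = sum(q * wi ** 2 for q, wi in zip(Q, w))
variance_w = second_moment_w - mean_w ** 2

# d_2(P||Q) written out as a sum.
d2 = sum(p ** 2 / q for p, q in zip(P, Q))

print(mean_w)           # 1
print(second_moment_w)  # 163/100
print(variance_w)       # 63/100
```

Here the second moment equals `d2` exactly, and the variance is `d2 - 1`, so a large divergence between $P$ and $Q$ translates directly into high-variance weights.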

The losses without and with the weights are as follows:

$$ R(h) = \mathbb{E}_{x \sim P}[L(h(x), f(x))], \qquad \widehat{R}_w(h) = \frac{1}{m} \sum_{i=1}^{m} w(x_i) L(h(x_i), f(x_i)) $$

Write $L_h(x)$ for $L(h(x), f(x))$. The unnormalized importance weighting of the loss is unbiased, so:

$$ \mathbb{E}_{x \sim Q}[w(x) L_h(x)] = \sum_x \frac{P(x)}{Q(x)} L_h(x) Q(x) = \sum_x P(x) L_h(x) = \mathbb{E}_{x \sim P}[L_h(x)] = R(h) $$
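A small simulation makes the unbiasedness concrete: draw samples from $Q$, weight each loss by $w(x)$, and compare the average with the expected loss under $P$. Everything in this sketch (the distributions, the per-point loss values `loss`, the sample size, and the seed) is a made-up assumption for illustration.

```python
import random

# Hypothetical finite domain with target P, source Q, and loss L_h(x) in [0, 1].
points = [0, 1, 2]
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]
loss = [0.0, 1.0, 0.5]

# Importance weights w(x) = P(x) / Q(x).
w = [p / q for p, q in zip(P, Q)]

# Expected loss under the target distribution: R(h) = E_{x~P}[L_h(x)].
true_risk = sum(p * l for p, l in zip(P, loss))

def weighted_empirical_risk(sample):
    """Average of w(x) * L_h(x) over a sample drawn from Q."""
    return sum(w[x] * loss[x] for x in sample) / len(sample)

def unweighted_empirical_risk(sample):
    """Plain average of L_h(x) over the same sample (estimates the risk under Q)."""
    return sum(loss[x] for x in sample) / len(sample)

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.choices(points, weights=Q, k=100_000)

print("R(h) under P:       ", true_risk)
print("weighted estimate:  ", weighted_empirical_risk(sample))
print("unweighted estimate:", unweighted_empirical_risk(sample))
```

The weighted average lands close to the target risk, whereas the unweighted average estimates the risk under $Q$ instead.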

The following lemma bounds the second moment:

$$ \mathbb{E}_{x \sim Q}[w^2(x) L_h^2(x)] \le d_{\alpha+1}(P \| Q) \, R(h)^{1 - \frac{1}{\alpha}} $$

For $\alpha=1$, the inequality becomes:

$$ R(h)^2 \le \mathbb{E}_{x \sim Q}[w^2(x) L_h^2(x)] \le d_2(P \| Q) $$
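The bound can be sanity-checked on a toy problem. The sketch below assumes the lemma takes the form $\mathbb{E}_{x \sim Q}[w^2 L_h^2] \le d_{\alpha+1}(P \| Q) R(h)^{1 - 1/\alpha}$ with a loss bounded by 1, and evaluates both sides for a few values of $\alpha \ge 1$. The distributions and loss values are hypothetical.

```python
# Hypothetical finite example: target P, source Q, loss L_h(x) in [0, 1].
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]
loss = [0.0, 1.0, 0.5]

def d_alpha(p, q, alpha):
    """Exponentiated Renyi divergence: [sum_x P^alpha / Q^(alpha-1)]^(1/(alpha-1))."""
    total = sum(pi ** alpha / qi ** (alpha - 1) for pi, qi in zip(p, q))
    return total ** (1.0 / (alpha - 1))

# R(h) = E_{x~P}[L_h(x)].
risk = sum(p * l for p, l in zip(P, loss))

# Left-hand side: second moment of the weighted loss under Q.
second_moment = sum(q * (p / q) ** 2 * l ** 2 for p, q, l in zip(P, Q, loss))

def lemma_bound(alpha):
    """Right-hand side d_{alpha+1}(P||Q) * R(h)^(1 - 1/alpha)."""
    return d_alpha(P, Q, alpha + 1) * risk ** (1.0 - 1.0 / alpha)

for alpha in (1, 2, 5):
    print(alpha, second_moment, "<=", lemma_bound(alpha))
```

For $\alpha = 1$ the exponent of $R(h)$ vanishes and the right-hand side reduces to $d_2(P \| Q)$, which matches the special case stated above.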

Proof:

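$$
\begin{aligned}
\mathbb{E}_{x \sim Q}[w^2(x) L_h^2(x)] &= \sum_x Q(x) \left[ \frac{P(x)}{Q(x)} \right]^2 L_h^2(x) = \sum_x P(x)^{\frac{1}{\alpha}} \left[ \frac{P(x)}{Q(x)} \right] P(x)^{\frac{\alpha - 1}{\alpha}} L_h^2(x) \\
&\le \left[ \sum_x P(x) \left[ \frac{P(x)}{Q(x)} \right]^{\alpha} \right]^{\frac{1}{\alpha}} \left[ \sum_x P(x) L_h^{\frac{2\alpha}{\alpha - 1}}(x) \right]^{\frac{\alpha - 1}{\alpha}} \\
&= d_{\alpha+1}(P \| Q) \left[ \sum_x P(x) L_h(x) L_h^{\frac{\alpha + 1}{\alpha - 1}}(x) \right]^{\frac{\alpha - 1}{\alpha}} \\
&\le d_{\alpha+1}(P \| Q) \, R(h)^{1 - \frac{1}{\alpha}}
\end{aligned}
$$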

The proof uses two inequalities, one of which is Hölder's inequality.

3 Learning Guarantees - Bounded Case

By Hoeffding's inequality,
