[Paper Reading] Self-Paced Boost Learning for Classification
Paper download
bib:
@INPROCEEDINGS{PiLi2016SPBL,
title = {Self-Paced Boost Learning for Classification},
author = {Te Pi and Xi Li and Zhongfei Zhang and Deyu Meng and Fei Wu and Jun Xiao and Yueting Zhuang},
booktitle = {IJCAI},
year = {2016},
pages = {1932--1938}
}
1. Abstract
Effectiveness and robustness are two essential aspects of supervised learning studies. For effective learning, ensemble methods are developed to build a strong effective model from ensemble of weak models. For robust learning, self-paced learning (SPL) is proposed to learn in a self-controlled pace from easy samples to complex ones. Motivated by simultaneously enhancing the learning effectiveness and robustness, we propose a unified framework, Self-Paced Boost Learning (SPBL). With an adaptive from-easy-to-hard pace in boosting process, SPBL asymptotically guides the model to focus more on the insufficiently learned samples with higher reliability. Via a max-margin boosting optimization with self-paced sample selection, SPBL is capable of capturing the intrinsic inter-class discriminative patterns while ensuring the reliability of the samples involved in learning. We formulate SPBL as a fully-corrective optimization for classification. The experiments on several real-world datasets show the superiority of SPBL in terms of both effectiveness and robustness.
Note:
- SPBL fuses self-paced learning (learning from easy samples to hard ones) with boosting (ensemble learning), so as to ensure effectiveness and robustness at the same time.
2. Algorithm
Problem: multi-class classification.
$$\widetilde{y}(x) = \arg\max_{r \in \{1, \dots, C\}} F_r(x; \Theta) \tag{1}$$
- $\{(x_i, y_i)\}_{i=1}^{n}$ denotes the labeled training data, which contains $n$ labeled samples; $x_i \in \mathbb{R}^d$ is the feature vector of the $i$-th sample, and $y_i \in \{1, \dots, C\}$ is the label of the $i$-th sample.
- $F_r(\cdot): \mathbb{R}^d \rightarrow \mathbb{R}$ is the confidence score for classifying a sample $x$ into class $r$.
Note that this essentially turns the multi-class problem into $C$ binary problems, corresponding to the OvA (one-vs-all) strategy. The advantage is that only $C$ classifiers (one per class) need to be trained; the drawback is class imbalance, since the "all" side of each binary problem contains far more samples.
- The final multi-class prediction is the class whose score is largest for the sample. In practice this can be viewed as taking the class with the largest probability after a softmax over the scores (thresholding), as in the sketch below.
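A minimal sketch of the prediction rule in Eq. (1), assuming the per-class confidence scores $F_r(x; \Theta)$ have already been computed and stacked into an array; names such as `scores` and `predict` are illustrative, not from the paper:

```python
import numpy as np

def predict(scores: np.ndarray) -> np.ndarray:
    """Eq. (1): pick the class with the largest confidence score.

    scores: (n_samples, C) array, scores[i, r] = F_{r+1}(x_i; Theta).
    Returns 1-based class labels in {1, ..., C}.
    """
    return np.argmax(scores, axis=1) + 1

# Example: 2 samples, C = 3 classes.
scores = np.array([[0.2, 1.5, -0.3],
                   [2.0, 0.1,  0.4]])
print(predict(scores))  # -> [2 1]
```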
Boosting:
Boosting is an ensemble-learning method whose goal is to combine several weak learners into a single strong learner.
$$F_r(x; W) = \sum_{j=1}^{k} w_{rj} h_j(x), \quad r \in \{1, \dots, C\} \tag{2}$$
- $h_j(x): \mathbb{R}^d \rightarrow \{0, 1\}$ denotes a weak binary classifier; $w_{rj}$ is the corresponding learner weight and is a parameter to be learned.
- $W = [w_1, \dots, w_C] \in \mathbb{R}^{k \times C}$ with each $w_r = [w_{r1}, \dots, w_{rk}]^{\mathsf{T}}$ (see the sketch below for Eq. (2) in matrix form).
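A minimal sketch of Eq. (2), assuming the weak-learner outputs are stacked into a matrix $H$ with $H_{ij} = h_j(x_i)$ (the same $H$ that appears in the notation for Eq. (3) below), so that all class scores are just the matrix product $HW$; the function name `ensemble_scores` is illustrative:

```python
import numpy as np

def ensemble_scores(H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Eq. (2): F_r(x_i; W) = sum_j w_{rj} h_j(x_i).

    H: (n, k) matrix of weak-learner outputs, H[i, j] = h_j(x_i).
    W: (k, C) weight matrix, W[:, r] = w_r.
    Returns an (n, C) matrix of per-class confidence scores.
    """
    return H @ W

# Example: n = 2 samples, k = 3 weak learners, C = 2 classes.
H = np.array([[1, 0, 1],
              [0, 1, 1]], dtype=float)
W = np.array([[0.5, 0.1],
              [0.2, 0.7],
              [0.3, 0.3]])
print(ensemble_scores(H, W))  # entry (i, r) is the score of class r+1 on sample i
```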
General objective of SPBL:
$$\begin{aligned}
\min_{W, v} \quad & \sum_{i=1}^{n} v_i \sum_{r=1}^{C} L(\rho_{ir}) + \sum_{i=1}^{n} g(v_i; \lambda) + \upsilon R(W) \\
\text{s.t.} \quad & \forall i, r,\; \rho_{ir} = H_{i:} w_{y_i} - H_{i:} w_{r};\quad W \geq 0;\quad v \in [0, 1]^{n}
\end{aligned} \tag{3}$$
- $H \in \mathbb{R}^{n \times k}$ with each entry $H_{ij} = h_j(x_i)$.
- $H_{i:} w_{y_i} = H_{i:} \times w_{y_i}$, where $w_{y_i} = [w_{y_i 1}, \dots, w_{y_i k}]^{\mathsf{T}}$; thus $\rho_{ir}$ is the margin by which the true class $y_i$ outscores class $r$ on sample $i$.
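To make the self-paced part of Eq. (3) concrete, here is a minimal sketch of the $v$-update under the classical hard SPL regularizer $g(v_i; \lambda) = -\lambda v_i$, for which the optimal weight has a closed form: $v_i = 1$ if the per-sample loss is below $\lambda$, else $0$. This regularizer is an assumption of the sketch, not necessarily the exact choice in the paper; the name `spl_weights` is illustrative:

```python
import numpy as np

def spl_weights(per_sample_loss: np.ndarray, lam: float) -> np.ndarray:
    """v-step of Eq. (3) under the hard SPL regularizer
    g(v_i; lambda) = -lambda * v_i (sketch assumption).

    per_sample_loss: length-n array, loss_i = sum_r L(rho_{ir}).
    Returns v in {0, 1}^n: "easy" samples (loss < lambda) are selected.
    """
    return (per_sample_loss < lam).astype(float)

# Example: with lambda = 1.0, only the two easy samples take part in the
# next W-update; raising lambda over iterations admits harder samples.
loss = np.array([0.3, 0.8, 2.5, 4.0])
print(spl_weights(loss, lam=1.0))  # -> [1. 1. 0. 0.]
```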
Specific formulation:
$$\begin{aligned}
\min_{W, v} \quad & \sum_{i, r} v_i \ln\!\left(1 + \exp(-\rho_{ir})\right) + \sum_{i=1}^{n} g(v_i; \lambda) + \upsilon \|W\|_{2, 1} \\
\text{s.t.} \quad & \forall i, r,\; \rho_{ir} = H_{i:} w_{y_i} - H_{i:} w_{r};\quad W \geq 0;\quad v \in [0, 1]^{n}
\end{aligned} \tag{4}$$
- $\|W\|_{2,1} = \sum_{j=1}^{k} \|W_{j:}\|_2$, which encourages row-wise sparsity of $W$, i.e., only a few weak learners receive nonzero weight.
- The logistic loss. My understanding: the loss simply wraps the negated margin in an exponential inside a log; the link to a binary-classification probability is that the probability comes from the sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$, so $\ln \sigma(x) = -\ln(1 + \exp(-x))$, i.e., the loss is the negative log of the sigmoid of the margin (see the numerical check below).
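A small numerical check of the two quantities defined above: the identity $\ln(1 + e^{-x}) = -\ln \sigma(x)$ behind the logistic loss, and the $\ell_{2,1}$ norm as the sum of row $\ell_2$ norms; function names here are illustrative:

```python
import numpy as np

def logistic_loss(rho: np.ndarray) -> np.ndarray:
    """ln(1 + exp(-rho)), the loss on a margin rho."""
    return np.log1p(np.exp(-rho))

def neg_log_sigmoid(rho: np.ndarray) -> np.ndarray:
    """-ln(sigmoid(rho)); should coincide with logistic_loss."""
    return -np.log(1.0 / (1.0 + np.exp(-rho)))

def l21_norm(W: np.ndarray) -> float:
    """||W||_{2,1} = sum over rows j of ||W_{j:}||_2."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

rho = np.linspace(-3, 3, 7)
print(np.allclose(logistic_loss(rho), neg_log_sigmoid(rho)))  # True

W = np.array([[3.0, 4.0],   # row norm 5
              [0.0, 0.0],   # zero row: contributes nothing, hence row sparsity
              [1.0, 0.0]])  # row norm 1
print(l21_norm(W))  # 6.0
```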
3. Summary
Solving the optimization objective involves the dual problem, which I honestly could not work out.