Weekly Report [3]
Jinning, 08/02/2018
[Newest Research Proposal]
[Project Github]
Propensity weighted BCE loss
Experiments were run both with and without propensity weighting. Both results use the same hyperparameters:
Non-propensity (weight: 1), epoch 10: IPS: 54.5463082902, IPS_std: 2.943
Propensity-weighted (weight: 1/propensity), epoch 10: IPS: 55.1079999611, IPS_std: 6.328
Intuition:
Introducing propensity weighting slightly improves the performance of the LR model. Overfitting likely sets in around epoch 10; it is possible that propensity weighting reduces bias and so mitigates the overfitting.
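A minimal sketch of the propensity-weighted BCE loss described above (NumPy; the function name and signature are illustrative, not taken from the project code). Passing no propensities reproduces the unweighted baseline; passing them weights each example's loss by \(\frac{1}{p}\):

```python
import numpy as np

def weighted_bce(y_true, y_pred, propensity=None, eps=1e-12):
    """Binary cross-entropy, optionally inverse-propensity weighted.

    propensity=None  -> every example has weight 1 (unweighted baseline)
    propensity given -> each example's loss is weighted by 1 / p
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    bce = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    w = np.ones_like(bce) if propensity is None else 1.0 / propensity
    return np.mean(w * bce)
```

The same weights can be passed to most frameworks' built-in BCE (e.g. a per-sample weight argument) rather than computed by hand.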
Clip Experiment
Clip the propensity-weighted BCE loss: the weight applied to the BCE loss is not simply \(\frac{1}{p}\) but \(\min\{\frac{1}{p}, c\}\), where \(c\) is a constant. In my experiment, \(c\) was selected from \(\{1, 5, 10, 20, 50, 100, 200, 300, 500\}\).
When \(c=1\), the weight is constant, so this is equivalent to unweighted BCE. When \(c=500\), every \(\frac{1}{p}\) in the data is already less than \(500\), so the clip never activates and this is equivalent to the unclipped \(\frac{1}{p}\times loss\).
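The clipped weight can be sketched as follows (NumPy; the function name is illustrative). It just caps the inverse propensity at \(c\):

```python
import numpy as np

def clipped_ips_weight(propensity, c):
    """Clipped inverse-propensity weight: min(1/p, c)."""
    return np.minimum(1.0 / np.asarray(propensity, dtype=float), c)
```

With \(c=1\) every weight collapses to 1 (unweighted BCE); with \(c\) above the largest \(\frac{1}{p}\), the clip is inert and plain inverse-propensity weighting is recovered.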
Result:
Note that this experiment was trained on the full training set but tested on the small test set, so the numbers are not entirely reliable.
The highest IPS appears around \(c=200\), so my guess is that the best \(c\) lies near the average or median of the inverse-propensity values.