Weekly Report [3]

Jinning, 08/02/2018

[Newest Research Proposal]

[Project Github]

Propensity weighted BCE loss

Ran the experiment both with and without propensity weighting. The two results use the same hyperparameters:

  • Non-Propensity (weight: 1) Epoch 10: IPS: 54.5463082902, IPS_std: 2.943

  • Propensity (weight: 1/propensity) Epoch 10: IPS: 55.1079999611, IPS_std: 6.328

Intuition:

Introducing propensity weighting slightly improves the IPS of the LR model (55.11 vs. 54.55), although the gap is smaller than the reported IPS_std of either run. Overfitting has probably set in by epoch 10. It's possible that propensity weighting reduces the bias and relieves overfitting.

Clip Experiment

Clip the propensity weight in the BCE loss. The weight applied to the BCE loss is then not simply \(\frac{1}{p}\), but \(\min\{\frac{1}{p}, c\}\), where \(c\) is a constant. In my experiment, \(c\) takes the values 1, 5, 10, 20, 50, 100, 200, 300, 500.

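When \(c=1\), this is equivalent to unweighted BCE, because \(\frac{1}{p} \ge 1\) for any propensity \(p \le 1\) and the weight is therefore always clipped to 1. When \(c=500\), since \(\frac{1}{p}\) is always less than \(500\) in this data, nothing is clipped and this is equivalent to \(\frac{1}{p}\times loss\).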
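The clipped weighting described above can be sketched in a few lines of plain Python. This is only an illustration, not the project's actual code: the function name `clipped_ipw_bce`, its arguments, and the choice to average over samples (rather than normalize by the sum of weights) are all assumptions.

```python
import math

def clipped_ipw_bce(probs, labels, propensities, c=float("inf"), eps=1e-12):
    """Mean BCE with each sample weighted by min(1/p, c).

    probs        -- predicted click probabilities (hypothetical name)
    labels       -- 0/1 targets
    propensities -- logging propensities p, each in (0, 1]
    c            -- clipping constant; the default (infinity) means no clipping
    """
    total = 0.0
    for q, y, p in zip(probs, labels, propensities):
        q = min(max(q, eps), 1.0 - eps)  # keep log() finite
        bce = -(y * math.log(q) + (1 - y) * math.log(1.0 - q))
        weight = min(1.0 / p, c)  # clipped inverse propensity
        total += weight * bce
    # Assumption: plain average over samples, not a weight-normalized average.
    return total / len(probs)

# Toy data, made up for illustration: 1/p is roughly 2, 100, 250.
probs = [0.9, 0.2, 0.6]
labels = [1, 0, 1]
propensities = [0.5, 0.01, 0.004]
for c in (1, 10, 200, 500):
    print(c, clipped_ipw_bce(probs, labels, propensities, c=c))
```

With this toy data, `c=1` gives the same value as the unweighted loss and `c=500` gives the same value as the unclipped \(\frac{1}{p}\) weighting, matching the two limiting cases described above.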

Result:

Note that this experiment was trained on the full training set but tested on the small test set, so the results are not very reliable.

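The highest IPS appears around \(c=200\). So my guess is that the best \(c\) is close to the mean or median of the inverse propensity values \(\frac{1}{p}\), the quantity that \(c\) bounds. This still needs to be checked against the data.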
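One way to check this guess is to compute the candidate thresholds directly from the logged propensities and compare them with 200. The sketch below assumes the guess refers to the inverse propensities \(\frac{1}{p}\), since that is the quantity \(c\) is compared against; the function name `candidate_clip_thresholds` and the sample values are hypothetical.

```python
import statistics

def candidate_clip_thresholds(propensities):
    """Return the mean and median of 1/p as candidate values for c."""
    inverse = [1.0 / p for p in propensities]
    return {
        "mean": statistics.mean(inverse),
        "median": statistics.median(inverse),
    }

# Made-up propensities for illustration only; replace with the logged values.
print(candidate_clip_thresholds([0.5, 0.25, 0.1]))
```

If neither statistic lands near 200 on the real data, the guess would need revisiting.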