SoftmaxWithLoss+OHEM

Main idea
- Choose the samples with the highest loss
- Don't backpropagate the loss for ignored samples

API description
- use_use_hard_mining: if false, the layer behaves as a traditional SoftmaxWithLoss
- batch_size: the number of samples taken into consideration
- hard_ratio: the fraction of batch_size treated as hard samples (those with the highest loss); if zero, the layer reduces to the plain softmax loss function
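
The selection described above can be sketched in plain Python. This is only an illustration, not the layer's actual implementation: the function names and the `mining_enabled` flag (standing in for the hard-mining switch) are hypothetical, and it assumes that the number of hard samples is `int(batch_size * hard_ratio)` and that the final loss is averaged over the kept samples.

```python
import math

def softmax_cross_entropy(logits, label):
    """Per-sample softmax cross-entropy, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def ohem_select(losses, batch_size, hard_ratio, mining_enabled=True):
    """Return the indices of the samples whose loss is backpropagated.

    Hypothetical helper: keeps the int(batch_size * hard_ratio) samples
    with the highest loss (assumed rounding rule). With mining disabled
    or hard_ratio == 0, every sample is kept (plain softmax loss).
    """
    if not mining_enabled or hard_ratio == 0:
        return list(range(len(losses)))
    # Keep at least one sample and never more than are available.
    num_hard = max(1, min(len(losses), int(batch_size * hard_ratio)))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:num_hard])

# Toy batch: four samples, two classes.
logits = [[2.0, 0.5], [0.1, 0.2], [3.0, -1.0], [0.0, 2.0]]
labels = [0, 0, 1, 1]
losses = [softmax_cross_entropy(z, y) for z, y in zip(logits, labels)]

kept = ohem_select(losses, batch_size=4, hard_ratio=0.5)
# Ignored samples contribute nothing to the loss, hence nothing to the gradient.
loss = sum(losses[i] for i in kept) / len(kept)
```

Here samples 1 and 2 have the highest loss, so only they would contribute to the backward pass; the other two are ignored.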