Sik-Ho Tang | Review -- Unsupervised Feature Learning via Non-Parametric Instance Discrimination. #134
Overview
Unsupervised Feature Learning via Non-Parametric Instance Discrimination. Instance Discrimination, by UC Berkeley / ICSI, Chinese University of Hong Kong, and Amazon Rekognition. 2018 CVPR, Over 1100 Citations.
Proposed Non-Parametric Softmax Classifier

The conventional parametric softmax treats each of the n training instances as its own class, with a weight vector w_j for every class j:

P(i | v) = \frac{\exp(w_i^T v)}{\sum_{j=1}^{n} \exp(w_j^T v)}

A non-parametric variant of the above softmax equation is to replace w_j^T v with v_j^T v and to enforce ||v|| = 1 via an L2-normalization layer. Then the probability becomes:

P(i | v) = \frac{\exp(v_i^T v / \tau)}{\sum_{j=1}^{n} \exp(v_j^T v / \tau)}

where \tau is a temperature parameter that controls the concentration level of the distribution.

The learning objective is then to maximize the joint probability \prod_{i=1}^{n} P_\theta(i | f_\theta(x_i)), or equivalently to minimize the negative log-likelihood over the training set:

J(\theta) = -\sum_{i=1}^{n} \log P(i | f_\theta(x_i))
Getting rid of these weight vectors is important, because the learning objective then focuses entirely on the feature representation and its induced metric, which can be applied everywhere in the space and to any new instances at test time. It also eliminates the need for computing and storing the gradients for the weight vectors {w_j}, making it scalable to big data applications.
This is suitable for scenarios where the number of classes is large. The non-parametric classifier is a kind of contrastive learning???
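As a rough illustration, here is a minimal PyTorch sketch of the non-parametric softmax loss, assuming a memory bank of L2-normalized instance features and a batch of L2-normalized features with their instance indices; the function and variable names are illustrative and not taken from the paper's released code.

```python
import torch
import torch.nn.functional as F

def nonparametric_softmax_loss(v, idx, memory, tau=0.07):
    """Negative log-likelihood of the non-parametric softmax (illustrative sketch).

    v:      (B, D) L2-normalized features of the current batch.
    idx:    (B,)   instance indices of the images in the batch (LongTensor).
    memory: (N, D) L2-normalized features of all N training instances.
    tau:    temperature controlling the concentration of the distribution.
    """
    logits = v @ memory.t() / tau          # (B, N) similarities v_j^T v / tau
    return F.cross_entropy(logits, idx)    # -log P(i | v), averaged over the batch
```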
Learning with A Memory Bank and NCE

Memory Bank

To compute the probability P(i | v), the representations {v_j} of all the images are needed. Instead of exhaustively computing these representations every time, a memory bank V is maintained to store them.

Separate notations are introduced for the memory bank and the features forwarded from the network: let V = {v_j} be the memory bank and f_i = f_\theta(x_i) be the feature of image x_i computed by the network.

During each learning iteration, the representation f_i as well as the network parameters \theta are optimized via stochastic gradient descent. Then f_i is written back to the memory bank at the corresponding instance entry, f_i \to v_i.

All the representations in the memory bank V are initialized as unit random vectors.
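The memory bank bookkeeping described above can be sketched as follows; the sizes and names are placeholders for illustration, not the paper's settings.

```python
import torch
import torch.nn.functional as F

num_instances, feat_dim = 50000, 128  # placeholder sizes, not the paper's settings

# All representations are initialized as unit random vectors.
memory = F.normalize(torch.randn(num_instances, feat_dim), dim=1)

@torch.no_grad()
def update_memory(memory, f, idx):
    """Write the freshly computed batch features back to their entries: f_i -> v_i.

    memory: (N, D) memory bank of per-instance representations.
    f:      (B, D) features of the current batch.
    idx:    (B,)   instance indices of the batch (LongTensor).
    """
    memory[idx] = F.normalize(f, dim=1)  # keep the stored representations on the unit sphere
```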
Noise-Contrastive Estimation (NCE)

Noise-Contrastive Estimation (NCE) is used to approximate the full softmax. The basic idea is to cast the multi-class classification problem into a set of binary classification problems, where each binary classification task is to discriminate between data samples and noise samples. (NCE was originally used in NLP. Please feel free to read NCE if interested.)

Specifically, the probability that feature representation v in the memory bank corresponds to the i-th example under the model is:

P(i | v) = \frac{\exp(v^T f_i / \tau)}{Z_i}, \quad Z_i = \sum_{j=1}^{n} \exp(v_j^T f_i / \tau)

Noise samples are assumed to be uniformly distributed: P_n = 1/n. The posterior probability of sample i with feature v being from the data distribution (denoted by D = 1) is:

h(i, v) := P(D = 1 | i, v) = \frac{P(i | v)}{P(i | v) + m P_n(i)}

The approximated training objective is to minimize the negative log-posterior distribution of data and noise samples:

J_{NCE}(\theta) = -\mathbb{E}_{P_d}[\log h(i, v)] - m \cdot \mathbb{E}_{P_n}[\log(1 - h(i, v'))]

Here, P_d denotes the actual data distribution. For P_d, v is the feature corresponding to x_i; whereas for P_n, v' is the feature from another image, randomly sampled according to the noise distribution P_n. Both v and v' are sampled from the non-parametric memory bank V.

Computing the normalizing constant Z_i is expensive, so it is treated as a constant and estimated via Monte Carlo approximation:

Z \simeq Z_i \simeq n \, \mathbb{E}_j[\exp(v_j^T f_i / \tau)] = \frac{n}{m} \sum_{k=1}^{m} \exp(v_{j_k}^T f_i / \tau)

where {j_k} is a random subset of indices.

NCE reduces the computational complexity from O(n) to O(1) per sample.
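A simplified sketch of this NCE objective is given below, assuming uniformly sampled noise indices and a precomputed Monte Carlo estimate Z of the normalizing constant; the names, the default m, and the device handling are illustrative assumptions rather than the paper's reference implementation.

```python
import torch

def nce_loss(f, idx, memory, Z, m=4096, tau=0.07, eps=1e-7):
    """NCE approximation of the non-parametric softmax (simplified sketch).

    f:      (B, D) features f_i of the current batch.
    idx:    (B,)   instance indices (the "data" entries in the memory bank).
    memory: (N, D) memory bank of all instance representations v_j.
    Z:      scalar Monte Carlo estimate of the normalizing constant Z_i.
    m:      number of noise samples per data sample.
    """
    B, _ = f.shape
    N = memory.shape[0]
    Pn = 1.0 / N  # uniform noise distribution

    # Positive (data) pairs: P(i | v) = exp(v_i^T f_i / tau) / Z
    pos = torch.exp((memory[idx] * f).sum(dim=1) / tau) / Z          # (B,)
    h_pos = pos / (pos + m * Pn)                                     # P(D=1 | i, v)

    # m uniformly sampled noise indices per data sample.
    noise_idx = torch.randint(0, N, (B, m), device=f.device)
    neg = torch.exp(torch.bmm(memory[noise_idx], f.unsqueeze(2)).squeeze(2) / tau) / Z  # (B, m)
    h_neg = neg / (neg + m * Pn)                                     # P(D=1 | i, v')

    # J_NCE = -E_Pd[log h(i, v)] - m * E_Pn[log(1 - h(i, v'))]
    return -(torch.log(h_pos + eps).mean() +
             torch.log(1.0 - h_neg + eps).sum(dim=1).mean())
```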
A proximal regularization term is further added so that each instance's representation evolves smoothly across training iterations: change the representation smoothly!!!
Weighted K-Nearest Neighbor Classifier

To classify a test image x̂, its feature f̂ = f_\theta(x̂) is first computed and then compared against the embeddings of all the images in the memory bank, using the cosine similarity s_i = cos(v_i, f̂).

The top k nearest neighbors, denoted by N_k, are then used to make the prediction via weighted voting: class c gets a total weight w_c = \sum_{i \in N_k} \alpha_i \cdot 1(c_i = c), where \alpha_i = \exp(s_i / \tau) is the contributing weight of neighbor x_i. In the experiments, \tau = 0.07 and k = 200.
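The weighted voting above can be sketched as follows for a single test image, assuming the memory bank features and the test feature are unit-normalized; names are illustrative, not the paper's code.

```python
import torch

def weighted_knn_predict(f_hat, memory, labels, num_classes, k=200, tau=0.07):
    """Weighted kNN prediction from memory-bank features (illustrative sketch).

    f_hat:  (D,)   L2-normalized feature of the test image.
    memory: (N, D) L2-normalized memory bank of training features v_i.
    labels: (N,)   class labels c_i of the training images (LongTensor).
    """
    sims = memory @ f_hat                     # cosine similarity s_i (unit-norm features)
    topk_sims, topk_idx = sims.topk(k)        # top-k nearest neighbors N_k
    alpha = torch.exp(topk_sims / tau)        # contributing weights alpha_i = exp(s_i / tau)

    # Accumulate class weights: w_c = sum_{i in N_k} alpha_i * 1(c_i = c)
    w = torch.zeros(num_classes).scatter_add_(0, labels[topk_idx], alpha)
    return w.argmax()                         # predicted class
```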
Sik-Ho Tang. Review — Unsupervised Feature Learning via Non-Parametric Instance Discrimination.