What is Zero-Shot Learning?

If you don't know something, you can relate it to familiar things and make a guess

Zero-Shot (0S) and Few-Shot Learning are designed to address robustness at test time. ML/DL models struggle with unseen samples because they are developed on datasets that cannot cover every scenario found in reality. Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes distinguishable properties of the non-observed classes [6].

Some types of auxiliary information:

  - Semantic attributes - hand-crafted, per-class properties (e.g. "has stripes", "lives in water")
  - Textual descriptions - a sentence or paragraph describing each class
  - Label embeddings - word or sentence vectors of the class names themselves

Given the auxiliary information, the 0S setting converts a multi-class classifier into a set of binary classifiers, which removes a major limitation of classification: the fixed label set. With a fixed label set, any change to the predefined class list requires re-training or re-finetuning the classifier, and classification over a very large number of classes brings challenges of its own [7]. A minimal sketch of this per-class binary scoring is shown below.
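As a toy illustration of how auxiliary attributes let a model score a class it was never trained on, here is a minimal numpy sketch. The attribute vectors and the predicted attribute scores are made up for illustration; a real system would predict the attributes from the input with a trained model.

```python
import numpy as np

# Hypothetical auxiliary information: each class is described by the same
# three attributes ("has stripes", "has hooves", "lives in water").
class_attributes = {
    "zebra":   np.array([1.0, 1.0, 0.0]),  # unseen class, known only via attributes
    "horse":   np.array([0.0, 1.0, 0.0]),
    "dolphin": np.array([0.0, 0.0, 1.0]),
}

def score_classes(predicted_attributes, attribute_table):
    """One binary decision per class: cosine similarity between the
    attributes predicted for the input and each class's attribute vector."""
    scores = {}
    for name, attrs in attribute_table.items():
        cos = predicted_attributes @ attrs / (
            np.linalg.norm(predicted_attributes) * np.linalg.norm(attrs)
        )
        scores[name] = float(cos)
    return scores

# Pretend an attribute predictor saw an input and produced these scores.
predicted = np.array([0.9, 0.8, 0.1])
print(score_classes(predicted, class_attributes))  # "zebra" scores highest
```

Note that adding a new class only requires adding a new row of auxiliary information; no classifier needs to be retrained.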


The 0S setting maps auxiliary attributes into high-dimensional features during the training stage, and this knowledge can then be referred to during the inference stage to handle unseen labels [1]. Hence, 0S is done in 2 stages:

  1. Training - knowledge about the attributes is captured
  2. Inference - the knowledge is used to categorize instances among a new set of classes, without any fine-tuning (both stages are sketched below)
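
To make the two stages concrete, here is a minimal numpy sketch of an attribute-based pipeline: the training stage fits a linear map from input features to the attribute space using only seen classes, and the inference stage categorizes among unseen classes by comparing predicted attributes against their attribute vectors. All data here is synthetic, and the least-squares map is just one of many possible model choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical auxiliary information: 5 binary attributes per class.
# Seen classes (used for training) cover the whole attribute space.
seen_attrs = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
], dtype=float)
# Unseen classes: never observed in training, known only via attributes.
unseen_attrs = np.array([
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1],
], dtype=float)

# Synthetic inputs: 64-dim features that are noisy projections of attributes.
proj = rng.normal(size=(5, 64))
labels = rng.integers(0, len(seen_attrs), size=300)
X = seen_attrs[labels] @ proj + 0.1 * rng.normal(size=(300, 64))

# --- Stage 1: Training ---
# Fit a least-squares map W from feature space to attribute space,
# using only instances of seen classes.
W, *_ = np.linalg.lstsq(X, seen_attrs[labels], rcond=None)  # shape (64, 5)

# --- Stage 2: Inference, no fine-tuning ---
# A new instance of unseen class 1 arrives; predict its attributes and
# pick the unseen class whose attribute vector is nearest.
x_new = unseen_attrs[1] @ proj
pred_attrs = x_new @ W
dists = np.linalg.norm(unseen_attrs - pred_attrs, axis=1)
print("predicted unseen class:", int(np.argmin(dists)))  # -> 1
```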

Zero-Shot Learning in NLP

Speaking of 0S applications in NLP, zero-shot text classification has recently been formulated as a textual entailment problem: the input text serves as the premise, each candidate label is turned into a hypothesis such as "This text is about sports.", and a Natural Language Inference (NLI) model decides, one label at a time, whether the premise entails the hypothesis.

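As a quick sketch of this entailment-style formulation, the snippet below uses the Hugging Face `transformers` zero-shot classification pipeline with an NLI model (here `facebook/bart-large-mnli`; any NLI-finetuned checkpoint would do):

```python
from transformers import pipeline

# The pipeline wraps the premise/hypothesis trick: each candidate label is
# inserted into a hypothesis template and scored by the NLI model.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The team clinched the championship with a last-minute goal.",
    candidate_labels=["sports", "politics", "technology"],
)
print(result["labels"][0], result["scores"][0])  # most likely label and its score
```

None of the candidate labels were seen during fine-tuning, and swapping in a different label list requires no retraining.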
There are many other zero-shot approaches besides the ones introduced above. Although zero-shot learning addresses data unavailability in scenarios with many classes, the zero-shot setting leads to high inference latency: with 100 candidate classes, a single input requires 100 complete binary predictions (i.e. feed-forward passes). Regardless, zero-shot learning is actively researched in both Computer Vision and Natural Language Processing.
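
To see this cost directly, one can time the pipeline from the previous snippet with label lists of growing size; latency should grow roughly linearly with the number of candidate labels, since each label adds one premise-hypothesis forward pass. Absolute timings will of course vary by hardware.

```python
import time
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
text = "The team clinched the championship with a last-minute goal."

# Hypothetical label lists of growing size (the labels themselves don't matter).
for n in (5, 20, 50):
    labels = [f"topic {i}" for i in range(n)]
    start = time.perf_counter()
    classifier(text, candidate_labels=labels)
    print(f"{n:>3} labels: {time.perf_counter() - start:.2f}s")
```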

References

[1] - Zero-Shot Learning: Can you classify an object without seeing it before?