Vokenization Explained!

This video explains a new approach to Visually supervise Language models that achieves performance gains on Language-Only tasks like the GLUE benchmark and SQuAD question answering. This is done by constructing a token-image matching (vokens) and classifying corresponding tokens with a a weakly supervised loss function. Thanks for watching! Please Subscribe! Paper Links: Vokenization: ImageBERT: VilBERT:

1 view