Background

State-of-the-art (SOTA) recommendation systems (RecSys) need to surface highly relevant items for a user query from an underlying pool of millions of items in less than a second. To estimate the relevance of even a single item, one needs a sophisticated deep neural network (DNN) to predict how much the user will like that item. These DNNs are enormous: they have billions of parameters and require tens of gigabytes of memory. This makes even a single forward pass computationally expensive, let alone running a forward pass for each of millions of items.
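To make the cost concrete, here is a back-of-envelope sketch. All numbers are illustrative assumptions (a hypothetical 2B-parameter model, a 1M-item pool, ~100 TFLOP/s of effective accelerator throughput), not measurements from any specific system:

```python
# Rough cost of scoring a large candidate pool with a big ranking DNN.
# Every number here is an assumption for illustration, not a measurement.

params = 2e9                  # assumed model size: 2 billion parameters
flops_per_item = 2 * params   # ~2 FLOPs per parameter per forward pass (multiply + add)
pool_size = 1e6               # candidate items to score for one query

total_flops = flops_per_item * pool_size  # FLOPs to score the whole pool
gpu_flops_per_s = 100e12                  # assumed effective throughput: 100 TFLOP/s

seconds = total_flops / gpu_flops_per_s
memory_gb = params * 4 / 1e9              # fp32 weights alone, in GB

print(f"~{seconds:.0f} s to score the pool; ~{memory_gb:.0f} GB of weights")
```

Under these assumptions, exhaustively scoring the pool with the full model takes tens of seconds on a single accelerator, orders of magnitude over a sub-second latency budget, which is why retrieval/ranking cascades and model-compression techniques such as distillation are needed.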