🔥 Search Hot Tweets
Search and analyze hot tweets posted within the last 6 hours by accounts on the KOL list (https://x.com/i/lists/1961235697677017443). Use the SoPilot plugin to comment quickly and secure an early spot in the replies.
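
A minimal sketch of the time-window step, assuming the list's tweets have already been fetched: the Tweet record, sample data, and view estimates below are hypothetical placeholders, and actually posting a reply via SoPilot is omitted (its API is not documented here).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Tweet:
    author: str
    text: str
    created_at: datetime
    est_views: int

now = datetime.now(timezone.utc)
# Stand-in for tweets fetched from the KOL list; real fetching would go
# through the X API or the SoPilot plugin (not shown, API unknown).
tweets = [
    Tweet("Guohao Li", "On-policy distillation and DAgger ...",
          now - timedelta(hours=2), 600),
    Tweet("Other KOL", "Older post outside the window",
          now - timedelta(hours=9), 1200),
]

cutoff = now - timedelta(hours=6)
hot = sorted(
    (t for t in tweets if t.created_at >= cutoff),  # keep only the last 6 hours
    key=lambda t: t.est_views,
    reverse=True,  # highest estimated reach first
)
for t in hot:
    print(f"{t.author}: ~{t.est_views} est. views -> comment early")
```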

Guohao Li 🐫
Very late to the party. When Thinking Machines Lab released their blog on On-Policy Distillation, my first reaction was that it sounded just like DAgger from 15 years ago: https://t.co/qh1NdMObBm. I finally had time to read the blog today, and sure enough, they mention DAgger. If you have access to an expert, there are quite a few on-policy imitation learning approaches you can try.

Here's one incremental idea that really works, from an obscure paper we published in 2018: an on-policy multi-teacher imitation learning method that chooses the best teacher for each state based on their value functions. We found that it's possible to imitate multiple imperfect teachers simultaneously and eventually outperform all of them. Naturally, the more diverse the teachers, the better. It would be interesting to reproduce this in the LLM era: https://t.co/xctBqEbPXb
Est. 600 views for your reply
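
As a hedged illustration of the idea in the tweet (not the paper's actual algorithm): a toy, tabular sketch in which the student rolls out its own policy and, at each visited state, the teacher with the highest value estimate supplies the action label, DAgger-style. The environment dynamics, teachers, and update rule are all made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 4

# Hypothetical imperfect teachers: each has a fixed policy and a value function.
teachers = [
    {"policy": rng.integers(n_actions, size=n_states),
     "value": rng.random(n_states)}
    for _ in range(3)
]

# Student: a tabular logit policy trained by cross-entropy on teacher labels.
logits = np.zeros((n_states, n_actions))

def student_action(s):
    p = np.exp(logits[s] - logits[s].max())
    p /= p.sum()
    return rng.choice(n_actions, p=p)

def env_step(s, a):
    # Toy placeholder dynamics, not from the paper.
    return (s + a + 1) % n_states

for _ in range(200):                 # on-policy rollouts under the student
    s = rng.integers(n_states)
    for _ in range(20):
        # Choose the best teacher for this state by its value estimate.
        best = max(teachers, key=lambda t: t["value"][s])
        label = best["policy"][s]
        # Cross-entropy gradient step toward the selected teacher's action.
        p = np.exp(logits[s] - logits[s].max())
        p /= p.sum()
        grad = p.copy()
        grad[label] -= 1.0
        logits[s] -= 0.5 * grad
        s = env_step(s, student_action(s))  # follow the *student's* action
```

The key departure from single-teacher DAgger is the per-state argmax over teacher value functions, which lets the student stitch together the strongest parts of several imperfect teachers.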
