SoPilot

🔥 Search Hot Tweets

Search and analyze hot tweets posted within the last 6 hours by accounts on the KOL list (https://x.com/i/lists/1961235697677017443). Use the SoPilot plugin to reply quickly and secure an early spot in the comment section.

Real-time Hot Tweet Analysis

Guohao Li 🐫

@guohao_li · 8.5K followers

Very late to the party. When Thinking Machines Lab released their blog post on On-Policy Distillation, my first reaction was that it should be just like DAgger from 15 years ago: https://t.co/qh1NdMObBm. I finally had time to read the post today and, sure enough, they mentioned DAgger. Actually, if you have access to an expert, there are quite a few on-policy imitation learning approaches you can try. Here’s one incremental idea that really works, from an obscure paper we published in 2018. It is basically an on-policy multi-teacher imitation learning method that chooses the best teacher for each state based on the teachers' value functions. We found that it’s possible to imitate multiple imperfect teachers simultaneously and eventually outperform all of them. Naturally, the more diverse those teachers are, the better. It would be interesting to reproduce this in the LLM era: https://t.co/xctBqEbPXb

Engagement: 152 · 6 · 2 · 19.8K
Posted 5d ago · Data updated 4d ago
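The multi-teacher idea in the tweet can be illustrated with a toy DAgger-style loop. At each state the learner itself visits, that state is labeled with the action of whichever teacher has the highest value estimate there, and the labels are aggregated and retrained on. This is a minimal sketch under stated assumptions, not the method from the 2018 paper (which is not identified here). The chain environment, the two hand-written teachers and their value functions, and every name (`Teacher`, `select_teacher_action`, `dagger_multi_teacher`, `chain_step`, `rollout`) are hypothetical. The "learner" is a lookup table that simply memorizes labels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = int
Action = int  # 1 = move right, 0 = move left


@dataclass
class Teacher:
    """Hypothetical expert: a policy plus a value estimate used to rank teachers."""
    policy: Callable[[State], Action]
    value: Callable[[State], float]


def select_teacher_action(state: State, teachers: List[Teacher]) -> Action:
    """Label a state with the action of the teacher whose value estimate is highest."""
    best = max(teachers, key=lambda t: t.value(state))
    return best.policy(state)


def chain_step(state: State, action: Action, n_states: int = 10) -> State:
    """Toy deterministic chain (an assumption): action 1 moves right, 0 moves left."""
    if action == 1:
        return min(n_states - 1, state + 1)
    return max(0, state - 1)


def rollout(policy: Callable[[State], Action], start: State, horizon: int) -> State:
    """Run a policy on the toy chain and return the state it ends in."""
    state = start
    for _ in range(horizon):
        state = chain_step(state, policy(state))
    return state


def dagger_multi_teacher(teachers: List[Teacher], start: State, horizon: int,
                         n_iters: int) -> Dict[State, Action]:
    """DAgger-style loop with per-state teacher selection (hypothetical sketch)."""
    dataset: Dict[State, Action] = {}  # aggregated labels across all iterations
    learner: Dict[State, Action] = {}  # tabular learner; unseen states default to 0
    for _ in range(n_iters):
        state = start
        for _ in range(horizon):
            # On-policy: the visited state comes from the learner's own trajectory,
            # and it is labeled by the best-valued teacher for that state.
            dataset[state] = select_teacher_action(state, teachers)
            # The learner, not the teacher, chooses the action actually executed.
            state = chain_step(state, learner.get(state, 0))
        learner = dict(dataset)  # "training" here is just memorizing the labels
    return learner


# Two imperfect hypothetical teachers: A is only competent on states 0-4, B on 5-9.
teacher_a = Teacher(policy=lambda s: 1 if s < 5 else 0,
                    value=lambda s: 1.0 if s < 5 else 0.0)
teacher_b = Teacher(policy=lambda s: 1 if s >= 5 else 0,
                    value=lambda s: 1.0 if s >= 5 else 0.0)

learner = dagger_multi_teacher([teacher_a, teacher_b], start=0, horizon=9, n_iters=10)
print("teacher A ends at", rollout(teacher_a.policy, 0, 9))
print("teacher B ends at", rollout(teacher_b.policy, 0, 9))
print("learner ends at", rollout(lambda s: learner.get(s, 0), 0, 9))
```

In this toy, teacher A alone ends at state 5 and teacher B alone never leaves state 0. The learner trained on per-state best-teacher labels reaches state 9, loosely mirroring the tweet's claim that imitating several imperfect teachers can beat each of them. Real value functions would be learned estimates, not the hand-coded indicators used here.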
Reply Suggestion

Est. 600 views for your reply