-
Notifications
You must be signed in to change notification settings - Fork 3.1k
fix bug about sampling time from beta distribution #1605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix bug about sampling time from beta distribution #1605
Conversation
fracapuano
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @KleistvonLiu thank you so much for the PR, and what a nice catch ⭐
Happy to get this merged if you can:
- (1) maintain sampling encapsulated (
sample_betais better thanself.beta_distributionin terms of statefulness of the policy object)
Again, thank you so very much! 🙏
Hi @fracapuano thank you for you review and reply ! I have modified the code and I hope it work well, please review it again. 🙏 |
fracapuano
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you, and what a nice catch @KleistvonLiu ⭐
* fix bug about sampling t from beta distribution * fix: address review comments ---------
* fix bug about sampling t from beta distribution * fix: address review comments ---------
* fix bug about sampling t from beta distribution * fix: address review comments ---------
* fix bug about sampling t from beta distribution * fix: address review comments ---------
* fix bug about sampling t from beta distribution * fix: address review comments ---------
* fix bug about sampling t from beta distribution * fix: address review comments ---------
* fix bug about sampling t from beta distribution * fix: address review comments ---------
What this does
The original implementation of Beta distribution for pi0 seems wrong. This PR compares sample results from three implementation (original openpi, previous lerobot and fixed), and shows that the fixed implementation is aligned to original openpi implementation and also the plot from Pi0 paper.
As shown in Pi0 paper, we sample time from Beta distribution and hope to emphasize timesteps close to noise action (in the Pi0's paper time close to 0, in Pi0's code time close to 1).

The following code compares the sample result from three implementation:
By running above code we can get following figure, which shows that the previous lerobot implementation is not as expected while the fixed one works well.
