- Stanford University
- California
- asap7772.github.io
- @Anikait_Singh_
- in/asap7772
- https://huggingface.co/Asap7772
Pinned
- fewshot-preference-optimization (Public)
  Few-Shot Preference Optimization (FSPO) personalizes LLMs by reframing reward modeling as a meta-learning problem, enabling rapid adaptation to user preferences with minimal labeled data. (A rough sketch of the idea appears after this list.)
- understanding-rlhf (Public)
  Learning from preferences is a common paradigm for fine-tuning language models, yet many algorithmic design decisions come into play. Our work finds that approaches employing on-policy sampling…
- OfflineRlWorkflow (Public)
  This repository accompanies the paper "A Workflow for Offline Model-Free Robotic RL".
- Cal-QL (Public, Python)
  A method that learns a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q-values… (A rough sketch of the idea appears after this list.)
- Personalized-Text-To-Image-Diffusion (Public, Python, 8 stars)
  Public implementation of PPD.
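
The fewshot-preference-optimization entry above describes reframing reward modeling as a meta-learning problem over users. Purely as an illustration of that idea (not the repository's actual training code), the sketch below averages a DPO-style preference loss over users, each contributing only a few labeled pairs; every function and field name here is hypothetical.

```python
# Illustrative sketch only: a few-shot, per-user DPO-style objective in the spirit of FSPO.
# All names (dpo_loss, fewshot_meta_objective, user dicts) are assumptions, not the repo's API.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for a batch of preference pairs, given policy and reference log-probs."""
    margins = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margins).mean()

def fewshot_meta_objective(users, beta=0.1):
    """Meta-objective: average the per-user preference loss, where each user supplies
    only a handful of labeled pairs, so adaptation must come from the few-shot context."""
    return torch.stack([
        dpo_loss(u["logp_chosen"], u["logp_rejected"],
                 u["ref_chosen"], u["ref_rejected"], beta)
        for u in users
    ]).mean()

# Toy usage: random log-probs stand in for a language model's outputs.
users = [{k: torch.randn(4) for k in
          ("logp_chosen", "logp_rejected", "ref_chosen", "ref_rejected")}
         for _ in range(3)]
print(fewshot_meta_objective(users))
```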
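
The Cal-QL entry describes a Q-function that is conservative yet calibrated. As a hedged sketch of how such a regularizer can look (the repository's implementation may differ), the snippet below applies a CQL-style penalty whose push-down term is clipped at a reference value, so learned Q-values are not driven below it; the tensor names and the choice of reference estimate are assumptions.

```python
# Illustrative sketch only: a calibrated conservative penalty in the spirit of Cal-QL.
# Tensor names and the reference-value estimate are assumptions, not the repository's code.
import torch

def calibrated_conservative_penalty(q_policy_actions, q_data_actions, ref_values, alpha=1.0):
    """CQL-style regularizer: push Q down on policy actions and up on dataset actions,
    but clip the push-down at a reference value (e.g. a return estimate for the behavior
    policy) so the learned Q-function stays calibrated instead of arbitrarily pessimistic."""
    clipped = torch.maximum(q_policy_actions, ref_values)  # never push below the reference
    return alpha * (clipped.mean() - q_data_actions.mean())

# Toy usage: random Q-values stand in for a critic network's outputs.
q_pi, q_data, v_ref = torch.randn(32), torch.randn(32), torch.zeros(32)
print(calibrated_conservative_penalty(q_pi, q_data, v_ref))
```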