- Stanford University
- California
- asap7772.github.io
- @Anikait_Singh_
- in/asap7772
- https://huggingface.co/Asap7772
Pinned
- fewshot-preference-optimization (Public)
  Few-Shot Preference Optimization (FSPO) personalizes LLMs by reframing reward modeling as a meta-learning problem, enabling rapid adaptation to user preferences with minimal labeled data. (A rough sketch of the idea appears after this list.)
- understanding-rlhf (Public)
  Learning from preferences is a common paradigm for fine-tuning language models, yet many algorithmic design decisions come into play. Our work finds that approaches employing on-policy sampling…
- OfflineRlWorkflow (Public)
  This repository accompanies the paper "A Workflow for Offline Model-Free Robotic RL".
- Cal-QL (Public, Python)
  A method that learns a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q-values… (A rough sketch of the idea appears after this list.)
- Personalized-Text-To-Image-Diffusion (Public, Python, 8 stars)
  Public implementation of PPD.
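
The fewshot-preference-optimization entry above describes reframing reward modeling as a meta-learning problem over users. Purely as an illustration of that idea (not the repository's actual training code), the sketch below averages a DPO-style preference loss over users, each contributing only a few labeled pairs; every function and field name here is hypothetical.

```python
# Illustrative sketch only: a few-shot, per-user DPO-style objective in the spirit of FSPO.
# All names (dpo_loss, fewshot_meta_objective, user dicts) are assumptions, not the repo's API.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for a batch of preference pairs, given policy and reference log-probs."""
    margins = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margins).mean()

def fewshot_meta_objective(users, beta=0.1):
    """Meta-objective: average the per-user preference loss, where each user supplies
    only a handful of labeled pairs, so adaptation must come from the few-shot context."""
    return torch.stack([
        dpo_loss(u["logp_chosen"], u["logp_rejected"],
                 u["ref_chosen"], u["ref_rejected"], beta)
        for u in users
    ]).mean()

# Toy usage: random log-probs stand in for a language model's outputs.
users = [{k: torch.randn(4) for k in
          ("logp_chosen", "logp_rejected", "ref_chosen", "ref_rejected")}
         for _ in range(3)]
print(fewshot_meta_objective(users))
```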
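
The Cal-QL entry describes a Q-function that is conservative yet calibrated. As a hedged sketch of how such a regularizer can look (the repository's implementation may differ), the snippet below applies a CQL-style penalty whose push-down term is clipped at a reference value, so learned Q-values are not driven below it; the tensor names and the choice of reference estimate are assumptions.

```python
# Illustrative sketch only: a calibrated conservative penalty in the spirit of Cal-QL.
# Tensor names and the reference-value estimate are assumptions, not the repository's code.
import torch

def calibrated_conservative_penalty(q_policy_actions, q_data_actions, ref_values, alpha=1.0):
    """CQL-style regularizer: push Q down on policy actions and up on dataset actions,
    but clip the push-down at a reference value (e.g. a return estimate for the behavior
    policy) so the learned Q-function stays calibrated instead of arbitrarily pessimistic."""
    clipped = torch.maximum(q_policy_actions, ref_values)  # never push below the reference
    return alpha * (clipped.mean() - q_data_actions.mean())

# Toy usage: random Q-values stand in for a critic network's outputs.
q_pi, q_data, v_ref = torch.randn(32), torch.randn(32), torch.zeros(32)
print(calibrated_conservative_penalty(q_pi, q_data, v_ref))
```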