[Model] Support math-shepherd-mistral-7b-prm model #9697
Review on `vllm/model_executor/models/bert.py` (outdated):
Can we add a factory method to `Pooler` that automatically merges the config with model-specific defaults?
For example, we should be able to write:
```python
self._pooler = Pooler.from_config_with_defaults(
    pooler_config,
    # These values are overridden if they are set inside the config
    pooling_type=PoolingType.CLS,
    normalize=True,
)
```
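To make the proposed merge semantics concrete, here is a minimal, self-contained sketch. `PoolerSketch`, the fields of the stand-in `PoolerConfig`, the string-valued `pooling_type`, and the `softmax=False` default are illustrative assumptions, not vLLM's actual `Pooler` or `PoolerConfig` API. The rule it shows: any field the user sets in the config wins, and the model-specific defaults passed by the caller fill in the rest.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, TypeVar

T = TypeVar("T")


class PoolingType(Enum):
    # Hypothetical subset of pooling strategies, for illustration only.
    LAST = "LAST"
    CLS = "CLS"
    MEAN = "MEAN"
    STEP = "STEP"


@dataclass
class PoolerConfig:
    # Hypothetical user-facing config: None means "not set by the user".
    pooling_type: Optional[str] = None
    normalize: Optional[bool] = None
    softmax: Optional[bool] = None
    step_tag_id: Optional[int] = None
    returned_token_ids: Optional[List[int]] = None


def _pick(value: Optional[T], default: T) -> T:
    # A value set in the config overrides the model-specific default.
    return default if value is None else value


@dataclass
class PoolerSketch:
    pooling_type: PoolingType
    normalize: bool
    softmax: bool
    step_tag_id: Optional[int] = None
    returned_token_ids: Optional[List[int]] = None

    @classmethod
    def from_config_with_defaults(
        cls,
        pooler_config: Optional[PoolerConfig],
        pooling_type: PoolingType,
        normalize: bool,
        softmax: bool = False,
        step_tag_id: Optional[int] = None,
        returned_token_ids: Optional[List[int]] = None,
    ) -> "PoolerSketch":
        cfg = pooler_config or PoolerConfig()
        # The config stores the pooling type by name, e.g. "STEP".
        resolved_type = (
            PoolingType[cfg.pooling_type]
            if cfg.pooling_type is not None
            else pooling_type
        )
        return cls(
            pooling_type=resolved_type,
            normalize=_pick(cfg.normalize, normalize),
            softmax=_pick(cfg.softmax, softmax),
            step_tag_id=_pick(cfg.step_tag_id, step_tag_id),
            returned_token_ids=_pick(cfg.returned_token_ids, returned_token_ids),
        )


# No user config: the model's defaults (CLS + normalize) are used as-is.
bert_pooler = PoolerSketch.from_config_with_defaults(
    None, pooling_type=PoolingType.CLS, normalize=True)

# The user sets only pooling_type and step_tag_id; normalize keeps its default.
step_pooler = PoolerSketch.from_config_with_defaults(
    PoolerConfig(pooling_type="STEP", step_tag_id=7),
    pooling_type=PoolingType.CLS,
    normalize=True,
)
```

Keeping the defaults as keyword arguments at the call site means each model declares its own defaults in one place, while the override logic lives in a single shared helper.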
Please check it.
Looks good now, thanks for your effort and patience!
Now you just have to get the tests to pass.
Signed-off-by: Went-Liang <[email protected]>
Excuse me, the test produced the following error (as shown in the image). This doesn't seem to be caused by my code changes. Could you please advise on how to handle this? @DarkLight1337
I have retried the failing test; let's see if it passes this time.
Looks like this issue comes from the main branch. I have asked those with permissions to force-merge this.
Thanks so much!!!
Support `peiyi9979/math-shepherd-mistral-7b-prm` as an embedding model.
As mentioned in #9314, a Process-Supervised Reward Model (PRM), which provides reward scores for the intermediate steps generated by an LLM, enables more fine-grained optimization in Reinforcement Learning (RL). This can help the community reproduce the OpenAI O1 model. PR #9424 allows any model that implements a `pooler` method to be used as an embedding model.
Therefore, this PR adds a `pooler` method to `LlamaForCausalLM`, introduces a `pooling-type` named `STEP`, and adds a `PoolerConfig` class so that users can configure the pooler method. In `STEP` mode, users can run the `peiyi9979/math-shepherd-mistral-7b-prm` model by setting the `pooling-step-tag-id` and `pooling-returned-token-ids` variables. `pooling-returned-token-ids` is the list of vocabulary indices whose scores are extracted, such as the token IDs of `good_token` and `bad_token` in the math-shepherd-mistral-7b-prm model. When `pooling-step-tag-id` is not None, only the scores at the positions of the `pooling-step-tag-id` token in the generated sequence are returned; otherwise, the scores for all tokens are returned.

The model can be served with:
A test corresponding to the example on the Hugging Face model page is:
Of course, you can also use it directly like this:
Thank you for taking the time to review this PR :)
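The `STEP` pooling mode described in this PR (select the `pooling-returned-token-ids` columns, then keep only the positions of the `pooling-step-tag-id` token) can be illustrated with a small pure-Python sketch. The function name `step_pool`, the list-of-lists logits layout, and applying a softmax over the selected columns are assumptions for illustration, not vLLM's actual implementation; the toy token IDs are made up and are not the real math-shepherd IDs.

```python
import math
from typing import List, Optional, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def step_pool(
    logits: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    returned_token_ids: Optional[Sequence[int]],
    step_tag_id: Optional[int],
    apply_softmax: bool = True,
) -> List[List[float]]:
    """Hypothetical STEP pooling over one sequence.

    logits: one row of vocabulary logits per token position.
    token_ids: the token ID at each position (same length as logits).
    """
    if len(logits) != len(token_ids):
        raise ValueError("logits and token_ids must have the same length")
    pooled: List[List[float]] = []
    for row, tok in zip(logits, token_ids):
        # With a step tag, keep only positions holding that tag token;
        # without one, every position is returned.
        if step_tag_id is not None and tok != step_tag_id:
            continue
        # Keep only the requested vocabulary columns (e.g. good/bad tokens).
        selected = [row[i] for i in returned_token_ids] if returned_token_ids else list(row)
        pooled.append(softmax(selected) if apply_softmax else selected)
    return pooled


# Toy example: vocabulary of 5 tokens; IDs 1/2 play "good"/"bad", 4 is the step tag.
logits = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
token_ids = [0, 4, 3, 4]
scores = step_pool(logits, token_ids, returned_token_ids=[1, 2], step_tag_id=4)
print(scores)  # two rows, one per step tag: roughly [[0.881, 0.119], [0.5, 0.5]]
```

Each returned row corresponds to one step tag; in this toy setup the first column would be read as the probability that the step is good. Passing `step_tag_id=None` returns one row per token instead.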