@DarkLight1337 DarkLight1337 commented Jun 25, 2024

The model tests take over 50 minutes, which is quite long and risks the run being interrupted. This PR (ported from vllm-project/vllm#4874) attempts to reduce the running time by sharing the HuggingFace cache between test runs so that models need not be downloaded each time.

Please share any concerns you may have regarding this approach. I'm also not sure how to test the resulting speed-up, since there is no guarantee that model tests are re-run on the same machine (and are hence able to utilize the cache effectively).

Note: hostPath volumes in Kubernetes have associated security risks. Is there another way for agent-stack-k8s to use a persistent volume?
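For context, a minimal sketch of what a hostPath-based cache mount might look like in a pod spec (the names, image, and node path here are hypothetical illustrations, not taken from this PR):

```yaml
# Hypothetical pod spec fragment: persist the HuggingFace cache across test
# runs by mounting a hostPath directory. hostPath ties the cache to a single
# node and is writable by any pod scheduled there, hence the security concern.
apiVersion: v1
kind: Pod
metadata:
  name: model-tests              # hypothetical name
spec:
  containers:
    - name: test-runner
      image: test-image          # hypothetical image
      env:
        - name: HF_HOME          # HuggingFace reads its cache location from HF_HOME
          value: /root/.cache/huggingface
      volumeMounts:
        - name: hf-cache
          mountPath: /root/.cache/huggingface
  volumes:
    - name: hf-cache
      hostPath:
        path: /var/lib/hf-cache  # directory on the node, shared between runs
        type: DirectoryOrCreate
```

A PersistentVolumeClaim backed by a cluster storage class would avoid the hostPath security issues, but whether the CI nodes have suitable storage classes available is an open question.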


DarkLight1337 commented Jun 26, 2024

I haven't noticed any significant improvement for models-test so far...
