[Bug]: `--fully-sharded-loras` doesn't work on V1 #20944

### Your current environment

<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 20.04.6 LTS (x86_64)
GCC version                  : (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version                : Could not collect
CMake version                : version 3.26.0
Libc version                 : glibc-2.31

==============================
       PyTorch Info
==============================
PyTorch version              : 2.7.0+cu126
Is debug build               : False
CUDA used to build PyTorch   : 12.6
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.11 | packaged by Anaconda, Inc. | (main, Jun  5 2025, 13:09:17) [GCC 11.2.0] (64-bit runtime)
Python platform              : Linux-5.15.0-1074-azure-x86_64-with-glibc2.31

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 11.7.99
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : 
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB
GPU 2: NVIDIA A100-SXM4-40GB
GPU 3: NVIDIA A100-SXM4-40GB
GPU 4: NVIDIA A100-SXM4-40GB
GPU 5: NVIDIA A100-SXM4-40GB
GPU 6: NVIDIA A100-SXM4-40GB
GPU 7: NVIDIA A100-SXM4-40GB

Nvidia driver version        : 560.35.03
cuDNN version                : Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.5.0
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Byte Order:                           Little Endian
Address sizes:                        48 bits physical, 48 bits virtual
CPU(s):                               96
On-line CPU(s) list:                  0-95
Thread(s) per core:                   1
Core(s) per socket:                   48
Socket(s):                            2
NUMA node(s):                         4
Vendor ID:                            AuthenticAMD
CPU family:                           23
Model:                                49
Model name:                           AMD EPYC 7V12 64-Core Processor
Stepping:                             0
CPU MHz:                              2445.441
BogoMIPS:                             4890.88
Hypervisor vendor:                    Microsoft
Virtualization type:                  full
L1d cache:                            3 MiB
L1i cache:                            3 MiB
L2 cache:                             48 MiB
L3 cache:                             384 MiB
NUMA node0 CPU(s):                    0-23
NUMA node1 CPU(s):                    24-47
NUMA node2 CPU(s):                    48-71
NUMA node3 CPU(s):                    72-95
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; untrained return thunk; SMT disabled
Vulnerability Spec rstack overflow:   Mitigation; safe RET, no microcode
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru arat umip rdpid

==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-cufile-cu12==1.11.1.6
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-ml-py==12.575.51
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvshmem-cu12==3.3.9
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] pynvml==12.0.0
[pip3] pyzmq==27.0.0
[pip3] torch==2.7.0+cu126
[pip3] torchaudio==2.7.0+cu126
[pip3] torchvision==0.22.0+cu126
[pip3] transformers==4.53.1
[pip3] triton==3.3.0
[conda] numpy                     2.2.6                    pypi_0    pypi
[conda] nvidia-cublas-cu12        12.6.4.1                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.6.80                  pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.6.77                  pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.6.77                  pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.5.1.17                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.3.0.4                 pypi_0    pypi
[conda] nvidia-cufile-cu12        1.11.1.6                 pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.7.77                pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.7.1.2                 pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.5.4.2                 pypi_0    pypi
[conda] nvidia-cusparselt-cu12    0.6.3                    pypi_0    pypi
[conda] nvidia-ml-py              12.575.51                pypi_0    pypi
[conda] nvidia-nccl-cu12          2.26.2                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.6.85                  pypi_0    pypi
[conda] nvidia-nvshmem-cu12       3.3.9                    pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.6.77                  pypi_0    pypi
[conda] pynvml                    12.0.0                   pypi_0    pypi
[conda] pyzmq                     27.0.0                   pypi_0    pypi
[conda] torch                     2.7.0+cu126              pypi_0    pypi
[conda] torchaudio                2.7.0+cu126              pypi_0    pypi
[conda] torchvision               0.22.0+cu126             pypi_0    pypi
[conda] transformers              4.53.1                   pypi_0    pypi
[conda] triton                    3.3.0                    pypi_0    pypi

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
Neuron SDK Version           : N/A
vLLM Version                 : N/A (dev)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
  	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	NIC0	NIC1	NIC2	NIC3	NIC4	NIC5	NIC6	NIC7	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NODE	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	24-47	1		N/A
GPU1	NODE	 X 	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	24-47	1		N/A
GPU2	SYS	SYS	 X 	NODE	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	0-23	0		N/A
GPU3	SYS	SYS	NODE	 X 	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	0-23	0		N/A
GPU4	SYS	SYS	SYS	SYS	 X 	NODE	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	SYS	SYS	72-95	3		N/A
GPU5	SYS	SYS	SYS	SYS	NODE	 X 	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	SYS	SYS	72-95	3		N/A
GPU6	SYS	SYS	SYS	SYS	SYS	SYS	 X 	NODE	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	48-71	2		N/A
GPU7	SYS	SYS	SYS	SYS	SYS	SYS	NODE	 X 	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	48-71	2		N/A
NIC0	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	 X 	NODE	SYS	SYS	SYS	SYS	SYS	SYS				
NIC1	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	NODE	 X 	SYS	SYS	SYS	SYS	SYS	SYS				
NIC2	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	 X 	NODE	SYS	SYS	SYS	SYS				
NIC3	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	NODE	 X 	SYS	SYS	SYS	SYS				
NIC4	SYS	SYS	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	 X 	NODE	SYS	SYS				
NIC5	SYS	SYS	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	NODE	 X 	SYS	SYS				
NIC6	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	 X 	NODE				
NIC7	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	NODE	 X 				

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_ib0
  NIC1: mlx5_ib1
  NIC2: mlx5_ib2
  NIC3: mlx5_ib3
  NIC4: mlx5_ib4
  NIC5: mlx5_ib5
  NIC6: mlx5_ib6
  NIC7: mlx5_ib7

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_REQUIRE_CUDA=cuda>=11.7 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511
NCCL_IB_PCI_RELAXED_ORDERING=1
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
NCCL_VERSION=2.13.4-1
NCCL_SOCKET_IFNAME=eth0
NCCL_DEBUG_SUBSYS=GRAPH,INIT,ENV
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NCCL_DEBUG=INFO
NVIDIA_PRODUCT_NAME=CUDA
PYTORCH_TYPE=stable
NVIDIA_CUDA_END_OF_LIFE=1
CUDA_DEVICE_ORDER=PCI_BUS_ID
CUDA_VERSION=11.7.0
NCCL_IB_TIMEOUT=22
LD_LIBRARY_PATH=/opt/nccl-rdma-sharp-plugins/lib:/opt/hpcx/ompi/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NCCL_IB_DISABLE=0
OMP_NUM_THREADS=92
PYTORCH_BUILD_VERSION=1.13.1
VLLM_USE_V1=1
NCCL_TOPO_FILE=/opt/microsoft/ndv4-topo.xml
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY


```

</details>


### 🐛 Describe the bug

```bash
vllm serve meta-llama/Llama-3.1-70B -tp 8 --chat-template ./template.jinja --fully-sharded-loras --no-enable-prefix-caching --enable-lora --max-lora-rank 8 --lora-modules meta-llama/Llama-3.1-70B-lora-8-0=/tmp/adapters/meta-llama/Llama-3.1-70B/lora-8-0
```

[Error Stack Trace](https://gist.github.com/shashwatj07/4a6b1dd00e80a8e0be6b35f223662c54)
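
For anyone who wants to narrow this down without the API server, the following is a minimal sketch of an offline reproduction. It assumes the offline `LLM` entry point accepts the same engine options as the CLI flags in the command above. The chat template is omitted because it only affects the server, the prompt and the `run_repro` / `build_engine_kwargs` helper names are hypothetical, and it is not confirmed that this path fails the same way as `vllm serve`.

```python
import os

# Model and adapter locations copied from the serve command in this report.
MODEL = "meta-llama/Llama-3.1-70B"
LORA_NAME = "meta-llama/Llama-3.1-70B-lora-8-0"
LORA_PATH = "/tmp/adapters/meta-llama/Llama-3.1-70B/lora-8-0"


def build_engine_kwargs() -> dict:
    """Engine options mirroring the CLI flags of the failing command."""
    return {
        "model": MODEL,
        "tensor_parallel_size": 8,       # -tp 8
        "enable_lora": True,             # --enable-lora
        "max_lora_rank": 8,              # --max-lora-rank 8
        "fully_sharded_loras": True,     # --fully-sharded-loras
        "enable_prefix_caching": False,  # --no-enable-prefix-caching
    }


def run_repro(fully_sharded: bool = True) -> None:
    """Hypothetical helper: load the model and run one LoRA request."""
    # Same engine selection as in the environment dump (VLLM_USE_V1=1).
    os.environ.setdefault("VLLM_USE_V1", "1")

    # Imported lazily so the kwargs can be inspected without vLLM installed.
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    kwargs = build_engine_kwargs()
    kwargs["fully_sharded_loras"] = fully_sharded
    llm = LLM(**kwargs)

    outputs = llm.generate(
        ["Hello"],  # placeholder prompt, not from the original report
        SamplingParams(max_tokens=8),
        lora_request=LoRARequest(LORA_NAME, 1, LORA_PATH),
    )
    print(outputs[0].outputs[0].text)
```

Calling `run_repro()` and then `run_repro(fully_sharded=False)` in separate processes would show whether the failure depends on `--fully-sharded-loras` alone or on LoRA with tensor parallelism in general.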

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
