### Your current environment

INFO 07-24 03:31:45 logger.py:36] Received request chat-d9aa01ce9bad4c01a22eb2d07e2c8392: prompt: '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n你是谁<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=None, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 128006, 882, 128007, 271, 57668, 21043, 112471, 128009, 128006, 78191, 128007, 271], lora_request: None, prompt_adapter_request: None.

INFO 07-24 03:31:45 async_llm_engine.py:173] Added request chat-d9aa01ce9bad4c01a22eb2d07e2c8392.

python3: /project/lib/Analysis/Allocation.cpp:43: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma -> mma layout conversion is only supported on Ampere"' failed.

Aborted (core dumped)

### 🐛 Describe the bug

Sending a chat request with the prompt `你是谁` ("Who are you?") crashes the server. Immediately after the request is added to the engine, the process aborts with the Triton assertion `mma -> mma layout conversion is only supported on Ampere`, raised in `mlir::triton::getCvtOrder` at `Allocation.cpp:43`. The full log is identical to the one shown under "Your current environment".
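The assertion message mentions Ampere, but the report does not say which GPU the server runs on. As a diagnostic aid, here is a minimal sketch that maps a CUDA compute capability to a coarse architecture label. The helper names `architecture_name` and `local_gpu_architecture` are hypothetical and not part of vLLM or Triton. Only `torch.cuda.is_available` and `torch.cuda.get_device_capability` are real PyTorch calls, and the sketch assumes the relevant GPU is device 0.

```python
from typing import Optional


def architecture_name(major: int, minor: int) -> str:
    """Map a CUDA compute capability to a coarse architecture label.

    Hypothetical helper for triage only; the table covers common
    data-center and desktop GPUs and is not exhaustive.
    """
    if major == 7 and minor in (0, 2):
        return "Volta"
    if (major, minor) == (7, 5):
        return "Turing"
    if major == 8 and minor in (0, 6, 7):
        return "Ampere"
    if (major, minor) == (8, 9):
        return "Ada"
    if major == 9:
        return "Hopper"
    return "unknown"


def local_gpu_architecture() -> Optional[str]:
    """Return the label for GPU 0, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch  # optional dependency; only needed for the live query
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # get_device_capability returns a (major, minor) tuple.
    return architecture_name(*torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    print("GPU 0 architecture:", local_gpu_architecture())
```

Including this label, together with the GPU model and the installed vLLM and Triton versions, in the environment section would make the report easier to triage.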