Commit b68f190

KKZ20 and littsk authored
[hotfix] fixing policies of sequence parallel (#4922)
* Add layer norm gradients all-reduce for sequence parallel
* Modify docs and polish code
* Polish code
* Skip pipeline inference test
* Fix parameter passing when calling get_autopolicy

Co-authored-by: littsk <[email protected]>
1 parent 3978b37 commit b68f190
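The first bullet in the commit message, the layer norm gradients all-reduce for sequence parallel, follows the usual pattern: when the sequence dimension is sharded, each rank's LayerNorm only sees its slice of the activations, so the LayerNorm weight and bias gradients must be summed across the sequence-parallel group before the optimizer step. Below is a minimal, hypothetical sketch of that pattern in plain PyTorch; `allreduce_layernorm_grads` and the single-process `gloo` group are illustrative stand-ins, not ColossalAI's actual implementation.

import os

import torch
import torch.distributed as dist
import torch.nn as nn


def allreduce_layernorm_grads(model: nn.Module, process_group=None) -> None:
    # Under sequence parallelism, LayerNorm runs on a shard of the sequence,
    # so its weight/bias gradients must be summed across the sequence-parallel ranks.
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            for p in module.parameters():
                if p.grad is not None:
                    dist.all_reduce(p.grad, op=dist.ReduceOp.SUM, group=process_group)


if __name__ == "__main__":
    # Single-process gloo group just so the example runs end to end.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
    model(torch.randn(2, 4)).sum().backward()
    allreduce_layernorm_grads(model)
    dist.destroy_process_group()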

File tree

  • colossalai/inference/tensor_parallel/engine.py

1 file changed: 1 addition, 1 deletion


colossalai/inference/tensor_parallel/engine.py

Lines changed: 1 addition & 1 deletion
@@ -213,7 +213,7 @@ def _shard_model_by(self, shardformer: ShardFormer, model: nn.Module) -> None:
         ), "Discrepancy between the tp size of TPInferEngine and the tp size of shard config"
         model_name = model.__class__.__name__
         assert model_name in self.supported_models, f"Unsupported model cls {model_name} for TP inference."
-        policy = get_autopolicy(model, inference_only=True)
+        policy = get_autopolicy(model, shard_config=self.shard_config)
         self.model, _ = shardformer.optimize(model, policy)
 
         if self.shard_config.inference_gptq:
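The one-line change passes the engine's full `shard_config` to the auto-policy lookup instead of the old `inference_only=True` flag, so the selected policy can react to settings such as sequence parallelism rather than a single boolean. The following self-contained sketch illustrates that shape of API; `ShardConfig`, `DummyPolicy`, and `get_autopolicy_sketch` are hypothetical stand-ins, not ColossalAI's real classes.

from dataclasses import dataclass

import torch.nn as nn


@dataclass
class ShardConfig:
    # Hypothetical subset of fields; a real shard config carries more options.
    tensor_parallel_size: int = 1
    enable_sequence_parallelism: bool = False
    inference_only: bool = True


class DummyPolicy:
    """Stand-in for a Shardformer-style policy object."""

    def __init__(self, model_cls: str, shard_config: ShardConfig):
        self.model_cls = model_cls
        self.shard_config = shard_config

    def describe(self) -> str:
        sp = "on" if self.shard_config.enable_sequence_parallelism else "off"
        return f"{self.model_cls} policy (sequence parallel: {sp})"


def get_autopolicy_sketch(model: nn.Module, shard_config: ShardConfig) -> DummyPolicy:
    # Select a policy by model class name and hand it the whole shard config,
    # mirroring the shape of the fixed call `get_autopolicy(model, shard_config=...)`.
    return DummyPolicy(model.__class__.__name__, shard_config)


if __name__ == "__main__":
    cfg = ShardConfig(tensor_parallel_size=2, enable_sequence_parallelism=True)
    policy = get_autopolicy_sketch(nn.Linear(8, 8), cfg)
    print(policy.describe())  # Linear policy (sequence parallel: on)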
