Commit e21dc49
[inference] rebase feature/smoothquant to main (#4842)
* [shardformer] fix GPT2DoubleHeadsModel (#4703)
* [hotfix] Fix import error: colossal.kernel without triton installed (#4722)
* [hotfix] remove triton kernels from kernel init
* revise bloom/llama kernel imports for infer
* [shardformer] fix whisper test failures due to significant accuracy differences (#4710)
* [shardformer] fix whisper test failures
* [shardformer] fix whisper test failures
* [shardformer] fix whisper test failures
* [shardformer] fix whisper test failures
* [doc] fix llama2 code link (#4726)
* [doc] fix llama2 code link
* [doc] fix llama2 code link
* [doc] fix llama2 code link
* [doc] Add user document for Shardformer (#4702)
* create shardformer doc files
* add docstring for seq-parallel
* update ShardConfig docstring
* add links to llama example
* add outdated message
* finish introduction & supporting information
* finish 'how shardformer works'
* finish shardformer.md English doc
* fix doctest fail
* add Chinese document
* [format] applied code formatting on changed files in pull request 4726 (#4727)
Co-authored-by: github-actions <[email protected]>
* [doc] add shardformer support matrix/update tensor parallel documents (#4728)
* add compatibility matrix for shardformer doc
* update tp doc
* Fixed some syntax errors in the documentation and code under applications/ (#4127)
Co-authored-by: flybird11111 <[email protected]>
* [shardformer] update pipeline parallel document (#4725)
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [legacy] remove deterministic data loader test
* [shardformer] update seq parallel document (#4730)
* update doc of seq parallel
* fix typo
* [example] add gpt2 HybridParallelPlugin example (#4653)
* add gpt2 HybridParallelPlugin example
* update readme and testci
* update test ci
* fix test_ci bug
* update requirements
* add requirements
* update requirements
* add requirement
* rename file
* [doc] polish shardformer doc (#4735)
* arrange position of chapters
* fix typos in seq parallel doc
* [shardformer] add custom policy in hybrid parallel plugin (#4718)
* add custom policy
* update assert
* [example] llama2 add fine-tune example (#4673)
* [shardformer] update shardformer readme
[shardformer] update shardformer readme
[shardformer] update shardformer readme
* [shardformer] update llama2/opt finetune example and shardformer update to llama2
* [shardformer] update llama2/opt finetune example and shardformer update to llama2
* [shardformer] update llama2/opt finetune example and shardformer update to llama2
* [shardformer] change dataset
* [shardformer] change dataset
* [shardformer] fix CI
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
[example] update opt example
[example] resolve comments
fix
fix
* [example] llama2 add finetune example
* [example] llama2 add finetune example
* [example] llama2 add finetune example
* [example] llama2 add finetune example
* fix
* update llama2 example
* update llama2 example
* fix
* update llama2 example
* update llama2 example
* update llama2 example
* update llama2 example
* update llama2 example
* update llama2 example
* Update requirements.txt
* update llama2 example
* update llama2 example
* update llama2 example
* [doc] explanation of loading large pretrained models (#4741)
* [kernel] update triton init #4740 (#4740)
* [legacy] clean up legacy code (#4743)
* [legacy] remove outdated codes of pipeline (#4692)
* [legacy] remove cli of benchmark and update optim (#4690)
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694)
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696)
* [legacy] clean up utils (#4700)
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742)
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci
* [format] applied code formatting on changed files in pull request 4743 (#4750)
Co-authored-by: github-actions <[email protected]>
* [misc] update pre-commit and run all files (#4752)
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
* [doc] explain suitable use case for each plugin
* [doc] put individual plugin explanation in front
* [doc] add model examples for each plugin
* [doc] put native colossalai plugins first in description section
* [chat]: update rm, add wandb and fix bugs (#4471)
* feat: modify forward fn of critic and reward model
* feat: modify calc_action_log_probs
* to: add wandb in sft and rm trainer
* feat: update train_sft
* feat: update train_rm
* style: modify type annotation and add warning
* feat: pass tokenizer to ppo trainer
* to: modify trainer base and maker base
* feat: add wandb in ppo trainer
* feat: pass tokenizer to generate
* test: update generate fn tests
* test: update train tests
* fix: remove action_mask
* feat: remove unused code
* fix: fix wrong ignore_index
* fix: fix mock tokenizer
* chore: update requirements
* revert: modify make_experience
* fix: fix inference
* fix: add padding side
* style: modify _on_learn_batch_end
* test: use mock tokenizer
* fix: use bf16 to avoid overflow
* fix: fix workflow
* [chat] fix gemini strategy
* [chat] fix
* sync: update colossalai strategy
* fix: fix args and model dtype
* fix: fix checkpoint test
* fix: fix requirements
* fix: fix missing import and wrong arg
* fix: temporarily skip gemini test in stage 3
* style: apply pre-commit
* fix: temporarily skip gemini test in stage 1&2
---------
Co-authored-by: Mingyan Jiang <[email protected]>
* [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)
* fix master param sync for hybrid plugin
* rewrite unwrap for ddp/fsdp
* rewrite unwrap for zero/gemini
* rewrite unwrap for hybrid plugin
* fix gemini unwrap
* fix bugs
* [bug] fix get_default_parser in examples (#4764)
* [doc] clean up outdated docs (#4765)
* [doc] clean up outdated docs
* [doc] fix linking
* [doc] fix linking
* [doc] add shardformer doc to sidebar (#4768)
* [chat]: add lora merge weights config (#4766)
* feat: modify lora merge weights fn
* feat: add lora merge weights config
* [lazy] support torch 2.0 (#4763)
* [lazy] support _like methods and clamp
* [lazy] pass transformers models
* [lazy] fix device move and requires grad
* [lazy] fix requires grad and refactor api
* [lazy] fix requires grad
* [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713)
* Fix the version check bug in colossalai run when generating the cmd.
* polish code
* [feature] add gptq for inference (#4754)
* [gptq] add gptq kernel (#4416)
* add gptq
* refactor code
* fix tests
* replace auto-gptq
* rename inference/quant
* refactor test
* add auto-gptq as an option
* reset requirements
* change assert and check auto-gptq
* add import warnings
* change test flash attn version
* remove example
* change requirements of flash_attn
* modify tests
* [skip ci] change requirements-test
* [gptq] faster gptq cuda kernel (#4494)
* [skip ci] add cuda kernels
* add license
* [skip ci] fix max_input_len
* format files & change test size
* [skip ci]
* [gptq] add gptq tensor parallel (#4538)
* add gptq tensor parallel
* add gptq tp
* delete print
* add test gptq check
* add test auto gptq check
* [gptq] combine gptq and kv cache manager (#4706)
* combine gptq and kv cache manager
* add init bits
* delete useless code
* add model path
* delete useless print and update test
* delete useless import
* move option gptq to shard config
* change replace linear to shardformer
* update bloom policy
* delete useless code
* fix import bug and delete useless code
* change colossalai/gptq to colossalai/quant/gptq
* update import linear for tests
* delete useless code and mv gptq_kernel to kernel directory
* fix triton kernel
* add triton import
* [inference] chatglm2 infer demo (#4724)
* add chatglm2
* add
* gather needed kernels
* fix some bugs
* finish context forward
* finish context stage
* fix
* add
* pause
* add
* fix bugs
* finish chatglm
* fix bug
* change some logic
* fix bugs
* change some logics
* add
* add
* add
* fix
* fix tests
* fix
* [release] update version (#4775)
* [release] update version
* [doc] revert versions
* initial commit: add colossal llama 2 (#4784)
* [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786)
* Add ColossalEval
* Delete evaluate in Chat
---------
Co-authored-by: Xu Yuanchen <[email protected]>
Co-authored-by: Tong Li <[email protected]>
* [doc] add llama2 domain-specific solution news (#4789)
* [doc] add llama2 domain-specific solution news
* [fix] fix weekly running example (#4787)
* [fix] fix weekly running example
* [fix] fix weekly running example
* [doc] polish shardformer doc (#4779)
* fix example format in docstring
* polish shardformer doc
* [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774)
* support unsharded saving/loading for model
* support optimizer unsharded saving
* update doc
* support unsharded loading for optimizer
* small fix
* update readme
* [lazy] support from_pretrained (#4801)
* [lazy] patch from pretrained
* [lazy] fix from pretrained and add tests
* [devops] update ci
* update
* [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800)
change filename:
pretraining.py -> trainin.py
(the file was misnamed; there should be no file called pretraining.py)
* [misc] add last_epoch in CosineAnnealingWarmupLR (#4778)
* [doc] add lazy init docs (#4808)
* [hotfix] fix norm type error in zero optimizer (#4795)
* [hotfix] Correct several erroneous code comments (#4794)
* [format] applied code formatting on changed files in pull request 4595 (#4602)
Co-authored-by: github-actions <[email protected]>
* fix format (#4815)
* [chat] fix gemini strategy (#4698)
* [chat] fix gemini strategy
* [chat] fix gemini strategy
* [chat] fix gemini strategy
* [chat] fix gemini strategy
* This is a combination of 2 commits:
[chat] fix gemini strategy
fix
* [chat] fix gemini strategy
update llama2 example
[chat] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* fix
* fix
* fix
* fix
* fix
* Update train_prompts.py
* Update Qwen-7B results (#4821)
Co-authored-by: Xu Yuanchen <[email protected]>
* [doc] update slack link (#4823)
* add autotune (#4822)
* update Colossal (#4832)
---------
Co-authored-by: flybird11111 <[email protected]>
Co-authored-by: Yuanheng Zhao <[email protected]>
Co-authored-by: binmakeswell <[email protected]>
Co-authored-by: Baizhou Zhang <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: digger yu <[email protected]>
Co-authored-by: Pengtai Xu <[email protected]>
Co-authored-by: Bin Jia <[email protected]>
Co-authored-by: ppt0011 <[email protected]>
Co-authored-by: Xuanlei Zhao <[email protected]>
Co-authored-by: Hongxin Liu <[email protected]>
Co-authored-by: Wenhao Chen <[email protected]>
Co-authored-by: littsk <[email protected]>
Co-authored-by: Jianghai <[email protected]>
Co-authored-by: Tong Li <[email protected]>
Co-authored-by: Yuanchen <[email protected]>
Co-authored-by: Xu Yuanchen <[email protected]>
Co-authored-by: Desperado-Jia <[email protected]>
Co-authored-by: Chandler-Bing <[email protected]>
Co-authored-by: Yan haixu <[email protected]>
File tree
1,520 files changed, +68,453 insertions, -49,520 deletions.
Top-level areas touched: .github, applications (Chat, Colossal-LLaMA-2, ColossalEval), colossalai (inference, kernel, lazy, legacy, shardformer, zero, and others), docs, examples, op_builder, requirements, and tests.