
Commit e21dc49

Xu-Kai, flybird11111, yuanheng-zhao, binmakeswell, and Baizhou Zhang authored
[inference] rebase feature/smoothquant to main (#4842)
* [shardformer] fix GPT2DoubleHeadsModel (#4703)
* [hotfix] Fix import error: colossal.kernel without triton installed (#4722)
* [hotfix] remove triton kernels from kernel init
* revise bloom/llama kernel imports for infer
* [shardformer] to fix whisper test failed due to significant accuracy differences. (#4710)
* [shardformer] fix whisper test failed
* [shardformer] fix whisper test failed
* [shardformer] fix whisper test failed
* [shardformer] fix whisper test failed
* [doc] fix llama2 code link (#4726)
* [doc] fix llama2 code link
* [doc] fix llama2 code link
* [doc] fix llama2 code link
* [doc] Add user document for Shardformer (#4702)
* create shardformer doc files
* add docstring for seq-parallel
* update ShardConfig docstring
* add links to llama example
* add outdated massage
* finish introduction & supporting information
* finish 'how shardformer works'
* finish shardformer.md English doc
* fix doctest fail
* add Chinese document
* [format] applied code formatting on changed files in pull request 4726 (#4727) Co-authored-by: github-actions <[email protected]>
* [doc] add shardformer support matrix/update tensor parallel documents (#4728)
* add compatibility matrix for shardformer doc
* update tp doc
* Optimized some syntax errors in the documentation and code under applications/ (#4127) Co-authored-by: flybird11111 <[email protected]>
* [shardformer] update pipeline parallel document (#4725)
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [legacy] remove deterministic data loader test
* [shardformer] update seq parallel document (#4730)
* update doc of seq parallel
* fix typo
* [example] add gpt2 HybridParallelPlugin example (#4653)
* add gpt2 HybridParallelPlugin example
* update readme and testci
* update test ci
* fix test_ci bug
* update requirements
* add requirements
* update requirements
* add requirement
* rename file
* [doc] polish shardformer doc (#4735)
* arrange position of chapters
* fix typos in seq parallel doc
* [shardformer] add custom policy in hybrid parallel plugin (#4718)
* add custom policy
* update assert
* [example] llama2 add fine-tune example (#4673)
* [shardformer] update shardformer readme [shardformer] update shardformer readme [shardformer] update shardformer readme
* [shardformer] update llama2/opt finetune example and shardformer update to llama2
* [shardformer] update llama2/opt finetune example and shardformer update to llama2
* [shardformer] update llama2/opt finetune example and shardformer update to llama2
* [shardformer] change dataset
* [shardformer] change dataset
* [shardformer] fix CI
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix [example] update opt example [example] resolve comments fix fix
* [example] llama2 add finetune example
* [example] llama2 add finetune example
* [example] llama2 add finetune example
* [example] llama2 add finetune example
* fix
* update llama2 example
* update llama2 example
* fix
* update llama2 example
* update llama2 example
* update llama2 example
* update llama2 example
* update llama2 example
* update llama2 example
* Update requirements.txt
* update llama2 example
* update llama2 example
* update llama2 example
* [doc] explaination of loading large pretrained models (#4741)
* [kernel] update triton init #4740 (#4740)
* [legacy] clean up legacy code (#4743)
* [legacy] remove outdated codes of pipeline (#4692)
* [legacy] remove cli of benchmark and update optim (#4690)
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694)
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696)
* [legacy] clean up utils (#4700)
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742)
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci
* [format] applied code formatting on changed files in pull request 4743 (#4750) Co-authored-by: github-actions <[email protected]>
* [misc] update pre-commit and run all files (#4752)
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
* [doc] explain suitable use case for each plugin
* [doc] put individual plugin explanation in front
* [doc] add model examples for each plugin
* [doc] put native colossalai plugins first in description section
* [chat]: update rm, add wandb and fix bugs (#4471)
* feat: modify forward fn of critic and reward model
* feat: modify calc_action_log_probs
* to: add wandb in sft and rm trainer
* feat: update train_sft
* feat: update train_rm
* style: modify type annotation and add warning
* feat: pass tokenizer to ppo trainer
* to: modify trainer base and maker base
* feat: add wandb in ppo trainer
* feat: pass tokenizer to generate
* test: update generate fn tests
* test: update train tests
* fix: remove action_mask
* feat: remove unused code
* fix: fix wrong ignore_index
* fix: fix mock tokenizer
* chore: update requirements
* revert: modify make_experience
* fix: fix inference
* fix: add padding side
* style: modify _on_learn_batch_end
* test: use mock tokenizer
* fix: use bf16 to avoid overflow
* fix: fix workflow
* [chat] fix gemini strategy
* [chat] fix
* sync: update colossalai strategy
* fix: fix args and model dtype
* fix: fix checkpoint test
* fix: fix requirements
* fix: fix missing import and wrong arg
* fix: temporarily skip gemini test in stage 3
* style: apply pre-commit
* fix: temporarily skip gemini test in stage 1&2

---------
Co-authored-by: Mingyan Jiang <[email protected]>

* [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)
* fix master param sync for hybrid plugin
* rewrite unwrap for ddp/fsdp
* rewrite unwrap for zero/gemini
* rewrite unwrap for hybrid plugin
* fix geemini unwrap
* fix bugs
* [bug] fix get_default_parser in examples (#4764)
* [doc] clean up outdated docs (#4765)
* [doc] clean up outdated docs
* [doc] fix linking
* [doc] fix linking
* [doc] add shardformer doc to sidebar (#4768)
* [chat]: add lora merge weights config (#4766)
* feat: modify lora merge weights fn
* feat: add lora merge weights config
* [lazy] support torch 2.0 (#4763)
* [lazy] support _like methods and clamp
* [lazy] pass transformers models
* [lazy] fix device move and requires grad
* [lazy] fix requires grad and refactor api
* [lazy] fix requires grad
* [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713)
* Fix the version check bug in colossalai run when generating the cmd.
* polish code
* [feature] add gptq for inference (#4754)
* [gptq] add gptq kernel (#4416)
* add gptq
* refactor code
* fix tests
* replace auto-gptq
* rname inferance/quant
* refactor test
* add auto-gptq as an option
* reset requirements
* change assert and check auto-gptq
* add import warnings
* change test flash attn version
* remove example
* change requirements of flash_attn
* modify tests
* [skip ci] change requirements-test
* [gptq] faster gptq cuda kernel (#4494)
* [skip ci] add cuda kernels
* add license
* [skip ci] fix max_input_len
* format files & change test size
* [skip ci]
* [gptq] add gptq tensor parallel (#4538)
* add gptq tensor parallel
* add gptq tp
* delete print
* add test gptq check
* add test auto gptq check
* [gptq] combine gptq and kv cache manager (#4706)
* combine gptq and kv cache manager
* add init bits
* delete useless code
* add model path
* delete usless print and update test
* delete usless import
* move option gptq to shard config
* change replace linear to shardformer
* update bloom policy
* delete useless code
* fix import bug and delete uselss code
* change colossalai/gptq to colossalai/quant/gptq
* update import linear for tests
* delete useless code and mv gptq_kernel to kernel directory
* fix triton kernel
* add triton import
* [inference] chatglm2 infer demo (#4724)
* add chatglm2
* add
* gather needed kernels
* fix some bugs
* finish context forward
* finish context stage
* fix
* add
* pause
* add
* fix bugs
* finish chatglm
* fix bug
* change some logic
* fix bugs
* change some logics
* add
* add
* add
* fix
* fix tests
* fix
* [release] update version (#4775)
* [release] update version
* [doc] revert versions
* initial commit: add colossal llama 2 (#4784)
* [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786)
* Add ColossalEval
* Delete evaluate in Chat

---------
Co-authored-by: Xu Yuanchen <[email protected]>
Co-authored-by: Tong Li <[email protected]>

* [doc] add llama2 domain-specific solution news (#4789)
* [doc] add llama2 domain-specific solution news
* [fix] fix weekly runing example (#4787)
* [fix] fix weekly runing example
* [fix] fix weekly runing example
* [doc] polish shardformer doc (#4779)
* fix example format in docstring
* polish shardformer doc
* [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774)
* support unsharded saving/loading for model
* support optimizer unsharded saving
* update doc
* support unsharded loading for optimizer
* small fix
* update readme
* [lazy] support from_pretrained (#4801)
* [lazy] patch from pretrained
* [lazy] fix from pretrained and add tests
* [devops] update ci
* update
* [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800) change filename: pretraining.py -> trainin.py there is no file named pretraing.py. wrong writing
* [misc] add last_epoch in CosineAnnealingWarmupLR (#4778)
* [doc] add lazy init docs (#4808)
* [hotfix] fix norm type error in zero optimizer (#4795)
* [hotfix] Correct several erroneous code comments (#4794)
* [format] applied code formatting on changed files in pull request 4595 (#4602) Co-authored-by: github-actions <[email protected]>
* fix format (#4815)
* [chat] fix gemini strategy (#4698)
* [chat] fix gemini strategy
* [chat] fix gemini strategy
* [chat] fix gemini strategy
* [chat] fix gemini strategy
* g# This is a combination of 2 commits. [chat] fix gemini strategy fox
* [chat] fix gemini strategy update llama2 example [chat] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* fix
* fix
* fix
* fix
* fix
* Update train_prompts.py
* Update Qwen-7B results (#4821) Co-authored-by: Xu Yuanchen <[email protected]>
* [doc] update slack link (#4823)
* add autotune (#4822)
* update Colossal (#4832)

---------

Co-authored-by: flybird11111 <[email protected]>
Co-authored-by: Yuanheng Zhao <[email protected]>
Co-authored-by: binmakeswell <[email protected]>
Co-authored-by: Baizhou Zhang <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: digger yu <[email protected]>
Co-authored-by: Pengtai Xu <[email protected]>
Co-authored-by: Bin Jia <[email protected]>
Co-authored-by: ppt0011 <[email protected]>
Co-authored-by: Xuanlei Zhao <[email protected]>
Co-authored-by: Hongxin Liu <[email protected]>
Co-authored-by: Wenhao Chen <[email protected]>
Co-authored-by: littsk <[email protected]>
Co-authored-by: Jianghai <[email protected]>
Co-authored-by: Tong Li <[email protected]>
Co-authored-by: Yuanchen <[email protected]>
Co-authored-by: Xu Yuanchen <[email protected]>
Co-authored-by: Desperado-Jia <[email protected]>
Co-authored-by: Chandler-Bing <[email protected]>
Co-authored-by: Yan haixu <[email protected]>
1 parent 068372a commit e21dc49
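
For context on the feature being rebased: SmoothQuant makes activations easier to quantize by migrating their per-channel outlier magnitudes into the weights before 8-bit quantization. The snippet below is a minimal NumPy sketch of that published smoothing step only; it is not the ColossalAI implementation added by this PR, and the function name, alpha default, and epsilon are illustrative.

import numpy as np

def smooth(X, W, alpha=0.5, eps=1e-8):
    # SmoothQuant-style smoothing for a linear layer Y = X @ W.
    # X: (tokens, c_in) calibration activations; W: (c_in, c_out) weights.
    act_max = np.abs(X).max(axis=0) + eps      # per-input-channel activation range
    w_max = np.abs(W).max(axis=1) + eps        # per-input-channel weight range
    s = act_max ** alpha / w_max ** (1.0 - alpha)
    # Dividing activations and scaling weights by s leaves X @ W unchanged
    # while flattening activation outliers, so INT8 ranges fit better.
    return X / s, W * s[:, None]

X = np.random.randn(8, 16)
W = np.random.randn(16, 4)
X_hat, W_hat = smooth(X, W)
assert np.allclose(X @ W, X_hat @ W_hat)       # the transform is output-preserving

The smoothing scales can be folded into the model offline, so inference kernels still see ordinary quantized matrix multiplications.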


1,520 files changed: +68,453 / -49,520 lines


.flake8

Lines changed: 0 additions & 22 deletions
This file was deleted.

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 blank_issues_enabled: true
 contact_links:
   - name: ❓ Simple question - Slack Chat
-    url: https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w
+    url: https://github.com/hpcaitech/public_assets/tree/main/colossalai/contact/slack
     about: This issue tracker is not for technical support. Please use our Slack chat, and ask the community for help.
   - name: ❓ Simple question - WeChat
     url: https://github.com/hpcaitech/ColossalAI/blob/main/docs/images/WeChat.png

.github/workflows/build_on_pr.yml

Lines changed: 2 additions & 1 deletion
@@ -141,7 +141,7 @@ jobs:
     runs-on: [self-hosted, gpu]
     container:
       image: hpcaitech/pytorch-cuda:1.12.0-11.3.0
-      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10 -v /data/scratch/llama-tiny:/data/scratch/llama-tiny
     timeout-minutes: 60
     defaults:
       run:
@@ -214,6 +214,7 @@ jobs:
           NCCL_SHM_DISABLE: 1
           LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
           TESTMON_CORE_PKGS: /__w/ColossalAI/ColossalAI/requirements/requirements.txt,/__w/ColossalAI/ColossalAI/requirements/requirements-test.txt
+          LLAMA_PATH: /data/scratch/llama-tiny

       - name: Store Testmon Cache
         run: |
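
The workflow changes above mount a tiny LLaMA checkpoint into the CI container and expose its location through the LLAMA_PATH environment variable. As an illustration only (a hypothetical test, not code from this commit; the only detail taken from the diff is the variable name), a test could pick the fixture up like this:

import os

import pytest
from transformers import AutoConfig

# Set by the CI workflows above; absent on machines without the mounted fixture.
LLAMA_PATH = os.environ.get("LLAMA_PATH")

@pytest.mark.skipif(LLAMA_PATH is None, reason="LLAMA_PATH not set; tiny LLaMA fixture unavailable")
def test_tiny_llama_fixture_loads():
    # Loading just the config is enough to confirm the checkpoint is reachable.
    config = AutoConfig.from_pretrained(LLAMA_PATH)
    assert config.model_type == "llama"

Gating on the environment variable keeps such a test runnable both on the self-hosted runners, where the volume is mounted, and on machines without the fixture.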

.github/workflows/build_on_schedule.yml

Lines changed: 2 additions & 1 deletion
@@ -13,7 +13,7 @@ jobs:
     runs-on: [self-hosted, 8-gpu]
     container:
       image: hpcaitech/pytorch-cuda:1.12.0-11.3.0
-      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10 -v /data/scratch/llama-tiny:/data/scratch/llama-tiny
     timeout-minutes: 40
     steps:
       - name: Check GPU Availability # ensure all GPUs have enough memory
@@ -64,6 +64,7 @@ jobs:
         env:
           DATA: /data/scratch/cifar-10
           LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+          LLAMA_PATH: /data/scratch/llama-tiny

       - name: Notify Lark
         id: message-preparation

.github/workflows/compatiblity_test_on_dispatch.yml

Lines changed: 2 additions & 1 deletion
@@ -50,7 +50,7 @@ jobs:
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
     container:
       image: ${{ matrix.container }}
-      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10 -v /data/scratch/llama-tiny:/data/scratch/llama-tiny
     timeout-minutes: 120
     steps:
       - name: Install dependencies
@@ -92,3 +92,4 @@ jobs:
           DATA: /data/scratch/cifar-10
           NCCL_SHM_DISABLE: 1
           LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+          LLAMA_PATH: /data/scratch/llama-tiny

.github/workflows/compatiblity_test_on_pr.yml

Lines changed: 2 additions & 1 deletion
@@ -41,7 +41,7 @@ jobs:
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
     container:
       image: ${{ matrix.container }}
-      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10 -v /data/scratch/llama-tiny:/data/scratch/llama-tiny
     timeout-minutes: 120
     concurrency:
       group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-test-${{ matrix.container }}
@@ -87,3 +87,4 @@ jobs:
           DATA: /data/scratch/cifar-10
           NCCL_SHM_DISABLE: 1
           LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+          LLAMA_PATH: /data/scratch/llama-tiny

.github/workflows/compatiblity_test_on_schedule.yml

Lines changed: 2 additions & 1 deletion
@@ -38,7 +38,7 @@ jobs:
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
     container:
       image: ${{ matrix.container }}
-      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
+      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10 -v /data/scratch/llama-tiny:/data/scratch/llama-tiny
     timeout-minutes: 120
     steps:
       - name: Install dependencies
@@ -85,6 +85,7 @@ jobs:
           DATA: /data/scratch/cifar-10
           NCCL_SHM_DISABLE: 1
           LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+          LLAMA_PATH: /data/scratch/llama-tiny

       - name: Notify Lark
         id: message-preparation

.github/workflows/doc_test_on_pr.yml

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ jobs:
       - name: Install ColossalAI
         run: |
           source activate pytorch
-          pip install -v .
+          CUDA_EXT=1 pip install -v .

       - name: Test the Doc
         run: |

.github/workflows/doc_test_on_schedule.yml

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ jobs:

       - name: Install ColossalAI
         run: |
-          pip install -v .
+          CUDA_EXT=1 pip install -v .

       - name: Install Doc Test Requirements
         run: |

.github/workflows/example_check_on_dispatch.yml

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ jobs:
         uses: actions/checkout@v3
       - name: Install Colossal-AI
         run: |
-          pip install -v .
+          CUDA_EXT=1 pip install -v .
       - name: Test the example
         run: |
           dir=${{ matrix.directory }}
