forked from hpcaitech/ColossalAI
Most #189
Open
jamesthesnake wants to merge 56 commits into main from most
Conversation
* add bench chatglm * fix bug and make utils --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
mchange bug
…ch#4938)
* merge kvcache with pipeline inference and refactor the code structure
* support ppsize > 2
* refactor pipeline code
* do pre-commit
* modify benchmark
* fix bench mark
* polish code
* add docstring and update readme
* refactor the code
* fix some logic bug of ppinfer
* polish readme
* fix typo
* skip infer test
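The kv-cache work in this commit caches past key/value tensors so each decoding step only computes attention for the newest token. The snippet below is a minimal, self-contained sketch of that idea in plain PyTorch; it is not ColossalAI's CaiInferEngine API, and all names and shapes are illustrative.

```python
# Minimal sketch of incremental decoding with a KV cache (illustrative only;
# not the ColossalAI pipeline-inference implementation).
import torch

def attend_with_cache(q, k_new, v_new, cache):
    """Append the newest K/V to the cache and attend over the full history.

    q, k_new, v_new: [batch, heads, 1, head_dim] tensors for the current token.
    cache: dict holding concatenated K/V from previous steps (or empty).
    """
    if "k" in cache:
        k = torch.cat([cache["k"], k_new], dim=2)   # [B, H, T_past + 1, D]
        v = torch.cat([cache["v"], v_new], dim=2)
    else:
        k, v = k_new, v_new
    cache["k"], cache["v"] = k, v                    # reused on the next step

    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # [B, H, 1, T]
    return torch.softmax(scores, dim=-1) @ v               # [B, H, 1, D]

# Usage: one cache per attention layer, carried across decoding steps.
cache = {}
for _ in range(4):
    q = torch.randn(1, 8, 1, 64)
    out = attend_with_cache(q, torch.randn(1, 8, 1, 64), torch.randn(1, 8, 1, 64), cache)
```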
…#4953) * [inference] Dynamic Batching for Single and Multiple GPUs (hpcaitech#4831) * finish batch manager * 1 * first * fix * fix dynamic batching * llama infer * finish test * support different lengths generating * del prints * del prints * fix * fix bug --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com> * [inference] Async dynamic batching (hpcaitech#4894) * finish input and output logic * add generate * test forward * 1 * [inference]Re push async dynamic batching (hpcaitech#4901) * adapt to ray server * finish async * finish test * del test --------- Co-authored-by: yuehuayingxueluo <[email protected]> * Revert "[inference]Re push async dynamic batching (hpcaitech#4901)" (hpcaitech#4905) This reverts commit fbf3c09. * Revert "[inference] Async dynamic batching (hpcaitech#4894)" This reverts commit fced140. * Revert "[inference] Async dynamic batching (hpcaitech#4894)" (hpcaitech#4909) This reverts commit fced140. * Add Ray Distributed Environment Init Scripts * support DynamicBatchManager base function * revert _set_tokenizer version * add driver async generate * add async test * fix bugs in test_ray_dist.py * add get_tokenizer.py * fix code style * fix bugs about No module named 'pydantic' in ci test * fix bugs in ci test * fix bugs in ci test * fix bugs in ci test * [infer]Add Ray Distributed Environment Init Scripts (hpcaitech#4911) * Revert "[inference] Async dynamic batching (hpcaitech#4894)" This reverts commit fced140. * Add Ray Distributed Environment Init Scripts * support DynamicBatchManager base function * revert _set_tokenizer version * add driver async generate * add async test * fix bugs in test_ray_dist.py * add get_tokenizer.py * fix code style * fix bugs about No module named 'pydantic' in ci test * fix bugs in ci test * fix bugs in ci test * fix bugs in ci test * support dynamic batch for bloom model and is_running function * [Inference]Test for new Async engine (hpcaitech#4935) * infer engine * infer engine * test engine * test engine * new manager * change step * add * test * fix * fix * finish test * finish test * finish test * finish test * add license --------- Co-authored-by: yuehuayingxueluo <[email protected]> * add assertion for config (hpcaitech#4947) * [Inference] Finish dynamic batching offline test (hpcaitech#4948) * test * fix test * fix quant * add default * fix * fix some bugs * fix some bugs * fix * fix bug * fix bugs * reset param --------- Co-authored-by: yuehuayingxueluo <[email protected]> Co-authored-by: Cuiqing Li <[email protected]> Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
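The dynamic-batching commits above revolve around a batch manager that keeps admitting waiting requests and retiring finished ones between decoding steps. The sketch below only illustrates that scheduling loop under a token budget; the class and method names are hypothetical and do not mirror the DynamicBatchManager interface added in these commits.

```python
# Hypothetical sketch of a dynamic batching loop: requests join the running
# batch whenever the token budget allows and leave as soon as they finish.
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    prompt_len: int
    max_new_tokens: int
    generated: int = 0

class TinyBatchManager:
    def __init__(self, max_batch_tokens: int = 4096):
        self.waiting: deque = deque()
        self.running: list = []
        self.max_batch_tokens = max_batch_tokens

    def _batch_tokens(self) -> int:
        return sum(r.prompt_len + r.generated for r in self.running)

    def add_request(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> None:
        # Admit waiting requests while the batch stays under the token budget.
        while self.waiting and self._batch_tokens() + self.waiting[0].prompt_len <= self.max_batch_tokens:
            self.running.append(self.waiting.popleft())
        # One decoding step: every running request produces one token
        # (a real engine would run the model forward pass here).
        for r in self.running:
            r.generated += 1
        # Retire finished requests so new ones can take their slots.
        self.running = [r for r in self.running if r.generated < r.max_new_tokens]

mgr = TinyBatchManager()
mgr.add_request(Request(prompt_len=16, max_new_tokens=3))
mgr.add_request(Request(prompt_len=8, max_new_tokens=5))
while mgr.running or mgr.waiting:
    mgr.step()
```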
…for llama token attention (hpcaitech#4965)
* adding flash-decoding
* clean
* adding kernel
* adding flash-decoding
* add integration
* add
* adding kernel
* adding kernel
* adding triton 2.1.0 features for inference
* update bloom triton kernel
* remove useless vllm kernels
* clean codes
* fix
* adding files
* fix readme
* update llama flash-decoding
---------
Co-authored-by: cuiqing.li <[email protected]>
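Flash-decoding speeds up the single-token decode step by splitting the long KV cache into chunks, computing partial attention per chunk, and merging the partials with their softmax statistics. The commit above adds this as a Triton 2.1.0 kernel; the PyTorch snippet below only illustrates the split-and-merge math for one head and one query token, with made-up sizes.

```python
# Illustrative split-KV ("flash-decoding") attention for one head and one
# query token, written in plain PyTorch rather than Triton.
import torch

def flash_decode_reference(q, K, V, chunk=256):
    """q: [d], K/V: [T, d]. Returns softmax(q K^T / sqrt(d)) V computed chunk-wise."""
    d = q.shape[-1]
    acc = torch.zeros_like(q)        # running weighted sum of V rows
    m = torch.tensor(-float("inf"))  # running max of scores (for stability)
    s = torch.tensor(0.0)            # running sum of exp(scores - m)
    for start in range(0, K.shape[0], chunk):
        k, v = K[start:start + chunk], V[start:start + chunk]
        scores = (k @ q) / d ** 0.5              # [chunk]
        m_new = torch.maximum(m, scores.max())
        scale = torch.exp(m - m_new)             # rescale previously merged partials
        p = torch.exp(scores - m_new)            # [chunk]
        acc = acc * scale + p @ v
        s = s * scale + p.sum()
        m = m_new
    return acc / s

q, K, V = torch.randn(64), torch.randn(1000, 64), torch.randn(1000, 64)
ref = torch.softmax((K @ q) / 64 ** 0.5, dim=0) @ V
assert torch.allclose(flash_decode_reference(q, K, V), ref, atol=1e-5)
```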
Co-authored-by: Xu Yuanchen <[email protected]>
* update doc * Update README.md --------- Co-authored-by: cuiqing.li <[email protected]>
…eased. (hpcaitech#4940) * Fix the bug where process groups were not being properly released. * test * Revert "test" This reverts commit 479900c.
* refactor pipeline into new CaiInferEngine
* updata llama modeling forward
* merge tp with pp
* update docstring
* optimize test workflow and example
* fix typo
* add assert and todo
* [release] update version * [hotfix] fix ci
* update moe module * support openmoe
…pcaitech#4926)
* [hotfix] Add layer norm gradients all-reduce for sequence parallel. (hpcaitech#4915)
* Add layer norm gradients all-reduce for sequence parallel.
* skip pipeline inference test
* [hotfix] fixing polices of sequence parallel (hpcaitech#4922)
* Add layer norm gradients all-reduce for sequence parallel.
* fix parameter passing when calling get_autopolicy
---------
Co-authored-by: littsk <[email protected]>
* Hotfix/add grad all reduce for sequence parallel (hpcaitech#4927)
* Add layer norm gradients all-reduce for sequence parallel.
* fix parameter passing when calling get_autopolicy
* fix bug using wrong variables
---------
Co-authored-by: littsk <[email protected]>
* fix policy initialization
* fix bloom and chatglm policices
* polish code of handling layernorm
* fix moe module
* polish code of class initializing
---------
Co-authored-by: Zhongkai Zhao <[email protected]>
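With sequence parallelism each rank sees a different slice of the sequence, so gradients of parameters that are replicated on every rank, such as LayerNorm affine weights, must be summed across the sequence-parallel group before the optimizer step; that is what these hotfixes wire into the shardformer policies. Below is a minimal sketch of the idea using a hypothetical helper, not ColossalAI's policy mechanism.

```python
# Sketch: sum gradients of replicated LayerNorm parameters across a
# sequence-parallel process group after backward (hypothetical helper).
import torch
import torch.distributed as dist

def allreduce_layernorm_grads(model: torch.nn.Module, sp_group) -> None:
    for module in model.modules():
        if isinstance(module, torch.nn.LayerNorm):
            for p in module.parameters():
                if p.grad is not None:
                    dist.all_reduce(p.grad, op=dist.ReduceOp.SUM, group=sp_group)

# Intended usage, after loss.backward() and before optimizer.step():
#   allreduce_layernorm_grads(model, sequence_parallel_group)
```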
hpcaitech#5007) Co-authored-by: github-actions <[email protected]>
* fix bug * fix * fix multiquery * fix multiquery --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* Refactor MoE Manager setup method
* unshard optim ckpt
* optim io
* update transformer version
* update requirements
* update ckpt
* update ckpt
* update ckpt
* fix engine
* fix engine
Co-authored-by: Xu Yuanchen <[email protected]>
* fix: add warning for EP different behavior
* fix: use shard_data in ep & tp model
* to: add used_capacity
* fix: fix router test
* feat: add create_ep_node_group
* feat: add create_ep_hierarchical_group fn
* feat: add HierarchicalAllToAll
* test: add hierarchical all2all test
* fix: fix test errors
* fix: simplify create_ep_hierarchical_group
* fix: add hierarchical_alltoall arg
* fix: fix environ typo
* revert: revert process mesh order
* to: add todo mark
* fix: skip hierarchical_comm if torch < 1.13.1
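The hierarchical all-to-all work above splits the expert-parallel exchange into an intra-node stage and an inter-node stage so that only node leaders cross the slower interconnect. The sketch below shows how such two-level process groups could be built with torch.distributed; the group layout and function name are assumptions, not the create_ep_hierarchical_group implementation from these commits.

```python
# Sketch: build intra-node groups plus one inter-node group of node leaders,
# the two levels a hierarchical all-to-all would use (illustrative layout).
import torch.distributed as dist

def build_hierarchical_groups(world_size: int, gpus_per_node: int):
    assert world_size % gpus_per_node == 0
    num_nodes = world_size // gpus_per_node
    rank = dist.get_rank()

    # Every rank must call dist.new_group for every group, even groups it
    # does not belong to, so all constructions happen unconditionally.
    intra_group = None
    for node in range(num_nodes):
        ranks = list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        g = dist.new_group(ranks)
        if rank in ranks:
            intra_group = g

    leader_ranks = [node * gpus_per_node for node in range(num_nodes)]
    inter_group = dist.new_group(leader_ranks)  # only node leaders use it

    return intra_group, (inter_group if rank in leader_ranks else None)
```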
…ng (hpcaitech#5018) * Fix serialization error with Tensor Parallel state saving * Refactor state_dict CPU transfer using tree_map
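Moving every tensor in a sharded state dict to CPU before serialization avoids pickling live CUDA storages; the fix above does this with a tree_map over the nested state. A small sketch of that pattern is below, using PyTorch's pytree utility (internal but widely used); the surrounding save logic is simplified.

```python
# Sketch: copy all tensors in a (possibly nested) state dict to CPU with a
# single tree_map before saving, instead of hand-written recursion.
import torch
from torch.utils._pytree import tree_map  # internal PyTorch utility

def state_dict_to_cpu(state):
    return tree_map(
        lambda x: x.detach().cpu() if isinstance(x, torch.Tensor) else x,
        state,
    )

model = torch.nn.Linear(4, 4)
cpu_state = state_dict_to_cpu(model.state_dict())
# torch.save(cpu_state, "model.pt")
```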
* [colossalai]fix typo * [inference] Add smmoothquant for llama (hpcaitech#4904) * [inference] add int8 rotary embedding kernel for smoothquant (hpcaitech#4843) * [inference] add smoothquant llama attention (hpcaitech#4850) * add smoothquant llama attention * remove uselss code * remove useless code * fix import error * rename file name * [inference] add silu linear fusion for smoothquant llama mlp (hpcaitech#4853) * add silu linear * update skip condition * catch smoothquant cuda lib exception * prcocess exception for tests * [inference] add llama mlp for smoothquant (hpcaitech#4854) * add llama mlp for smoothquant * fix down out scale * remove duplicate lines * add llama mlp check * delete useless code * [inference] add smoothquant llama (hpcaitech#4861) * add smoothquant llama * fix attention accuracy * fix accuracy * add kv cache and save pretrained * refactor example * delete smooth * refactor code * [inference] add smooth function and delete useless code for smoothquant (hpcaitech#4895) * add smooth function and delete useless code * update datasets * remove duplicate import * delete useless file * refactor codes (hpcaitech#4902) * rafactor code * add license * add torch-int and smoothquant license * Update flash_attention_patch.py To be compatible with the new change in the Transformers library, where a new argument 'padding_mask' was added to forward function of attention layer. huggingface/transformers#25598 * [kernel] support pure fp16 for cpu adam and update gemini optim tests (hpcaitech#4921) * [kernel] support pure fp16 for cpu adam (hpcaitech#4896) * [kernel] fix cpu adam kernel for pure fp16 and update tests (hpcaitech#4919) * [kernel] fix cpu adam * [test] update gemini optim test * [format] applied code formatting on changed files in pull request 4908 (hpcaitech#4918) Co-authored-by: github-actions <[email protected]> * [gemini] support gradient accumulation (hpcaitech#4869) * add test * fix no_sync bug in low level zero plugin * fix test * add argument for grad accum * add grad accum in backward hook for gemini * finish implementation, rewrite tests * fix test * skip stuck model in low level zero test * update doc * optimize communication & fix gradient checkpoint * modify doc * cleaning codes * update cpu adam fp16 case * [hotfix] fix torch 2.0 compatibility (hpcaitech#4936) * [hotfix] fix launch * [test] fix test gemini optim * [shardformer] fix vit * [test] add no master test for low level zero plugin (hpcaitech#4934) * [format] applied code formatting on changed files in pull request 4820 (hpcaitech#4886) Co-authored-by: github-actions <[email protected]> * [nfc] fix some typo with colossalai/ docs/ etc. 
(hpcaitech#4920) * [Refactor] Integrated some lightllm kernels into token-attention (hpcaitech#4946) * add some req for inference * clean codes * add codes * add some lightllm deps * clean codes * hello * delete rms files * add some comments * add comments * add doc * add lightllm deps * add lightllm cahtglm2 kernels * add lightllm cahtglm2 kernels * replace rotary embedding with lightllm kernel * add some commnets * add some comments * add some comments * add * replace fwd kernel att1 * fix a arg * add * add * fix token attention * add some comments * clean codes * modify comments * fix readme * fix bug * fix bug --------- Co-authored-by: cuiqing.li <[email protected]> Co-authored-by: CjhHa1 <[email protected]> * [test] merge old components to test to model zoo (hpcaitech#4945) * [test] add custom models in model zoo * [test] update legacy test * [test] update model zoo * [test] update gemini test * [test] remove components to test * [inference] add reference and fix some bugs (hpcaitech#4937) * add reference and fix some bugs * update gptq init --------- Co-authored-by: Xu Kai <[email protected]> * [Inference]ADD Bench Chatglm2 script (hpcaitech#4963) * add bench chatglm * fix bug and make utils --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com> * [Pipeline inference] Combine kvcache with pipeline inference (hpcaitech#4938) * merge kvcache with pipeline inference and refactor the code structure * support ppsize > 2 * refactor pipeline code * do pre-commit * modify benchmark * fix bench mark * polish code * add docstring and update readme * refactor the code * fix some logic bug of ppinfer * polish readme * fix typo * skip infer test * updated c++17 compiler flags (hpcaitech#4983) * [Inference] Dynamic Batching Inference, online and offline (hpcaitech#4953) * [inference] Dynamic Batching for Single and Multiple GPUs (hpcaitech#4831) * finish batch manager * 1 * first * fix * fix dynamic batching * llama infer * finish test * support different lengths generating * del prints * del prints * fix * fix bug --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com> * [inference] Async dynamic batching (hpcaitech#4894) * finish input and output logic * add generate * test forward * 1 * [inference]Re push async dynamic batching (hpcaitech#4901) * adapt to ray server * finish async * finish test * del test --------- Co-authored-by: yuehuayingxueluo <[email protected]> * Revert "[inference]Re push async dynamic batching (hpcaitech#4901)" (hpcaitech#4905) This reverts commit fbf3c09. * Revert "[inference] Async dynamic batching (hpcaitech#4894)" This reverts commit fced140. * Revert "[inference] Async dynamic batching (hpcaitech#4894)" (hpcaitech#4909) This reverts commit fced140. * Add Ray Distributed Environment Init Scripts * support DynamicBatchManager base function * revert _set_tokenizer version * add driver async generate * add async test * fix bugs in test_ray_dist.py * add get_tokenizer.py * fix code style * fix bugs about No module named 'pydantic' in ci test * fix bugs in ci test * fix bugs in ci test * fix bugs in ci test * [infer]Add Ray Distributed Environment Init Scripts (hpcaitech#4911) * Revert "[inference] Async dynamic batching (hpcaitech#4894)" This reverts commit fced140. 
* Add Ray Distributed Environment Init Scripts * support DynamicBatchManager base function * revert _set_tokenizer version * add driver async generate * add async test * fix bugs in test_ray_dist.py * add get_tokenizer.py * fix code style * fix bugs about No module named 'pydantic' in ci test * fix bugs in ci test * fix bugs in ci test * fix bugs in ci test * support dynamic batch for bloom model and is_running function * [Inference]Test for new Async engine (hpcaitech#4935) * infer engine * infer engine * test engine * test engine * new manager * change step * add * test * fix * fix * finish test * finish test * finish test * finish test * add license --------- Co-authored-by: yuehuayingxueluo <[email protected]> * add assertion for config (hpcaitech#4947) * [Inference] Finish dynamic batching offline test (hpcaitech#4948) * test * fix test * fix quant * add default * fix * fix some bugs * fix some bugs * fix * fix bug * fix bugs * reset param --------- Co-authored-by: yuehuayingxueluo <[email protected]> Co-authored-by: Cuiqing Li <[email protected]> Co-authored-by: CjhHa1 <cjh18671720497outlook.com> * [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (hpcaitech#4965) * adding flash-decoding * clean * adding kernel * adding flash-decoding * add integration * add * adding kernel * adding kernel * adding triton 2.1.0 features for inference * update bloom triton kernel * remove useless vllm kernels * clean codes * fix * adding files * fix readme * update llama flash-decoding --------- Co-authored-by: cuiqing.li <[email protected]> * fix ColossalEval (hpcaitech#4992) Co-authored-by: Xu Yuanchen <[email protected]> * [doc]Update doc for colossal-inference (hpcaitech#4989) * update doc * Update README.md --------- Co-authored-by: cuiqing.li <[email protected]> * [hotfix] Fix the bug where process groups were not being properly released. (hpcaitech#4940) * Fix the bug where process groups were not being properly released. * test * Revert "test" This reverts commit 479900c. 
* [hotfix] fix the bug of repeatedly storing param group (hpcaitech#4951) * [doc] add supported feature diagram for hybrid parallel plugin (hpcaitech#4996) * [Pipeline Inference] Merge pp with tp (hpcaitech#4993) * refactor pipeline into new CaiInferEngine * updata llama modeling forward * merge tp with pp * update docstring * optimize test workflow and example * fix typo * add assert and todo * [release] update version (hpcaitech#4995) * [release] update version * [hotfix] fix ci * [gemini] gemini support tp [gemini] gemini support tp [gemini] gemini support tp [gemini] gemini support tp [gemini] gemini support tp * fix fix fix * update checkpointIO update checkpointIO update checkpointIO update checkpointIO update checkpointIO update checkpointIO update checkpointIO update checkpointIO update checkpointIO * support fused layernorm support fused layernorm support fused layernorm * update fusedlayernorm update fusedlayernorm update fusedlayernorm * add sequence parallel to gemini add sequence parallel to gemini * fix * fix comments fix comments fix comments * fix * fix t5 * clear cache * fix * activate ci * activate ci * fix * fix * fix * fix * revert * modify tp gather method modify tp gather method modify tp gather method modify tp gather method * fix test --------- Co-authored-by: Xu Kai <[email protected]> Co-authored-by: Zian(Andy) Zheng <[email protected]> Co-authored-by: Hongxin Liu <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions <[email protected]> Co-authored-by: Baizhou Zhang <[email protected]> Co-authored-by: Zhongkai Zhao <[email protected]> Co-authored-by: digger yu <[email protected]> Co-authored-by: Cuiqing Li <[email protected]> Co-authored-by: cuiqing.li <[email protected]> Co-authored-by: CjhHa1 <[email protected]> Co-authored-by: Xu Kai <[email protected]> Co-authored-by: Jianghai <[email protected]> Co-authored-by: Bin Jia <[email protected]> Co-authored-by: アマデウス <[email protected]> Co-authored-by: yuehuayingxueluo <[email protected]> Co-authored-by: Yuanchen <[email protected]> Co-authored-by: Xu Yuanchen <[email protected]> Co-authored-by: littsk <[email protected]> Co-authored-by: ppt0011 <[email protected]>
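The merge entry above folds in, among other things, the smoothquant-for-llama work (hpcaitech#4904 and related commits). SmoothQuant makes activations easier to quantize by dividing each input channel by a per-channel scale and folding that scale into the weights. The snippet below is a small sketch of that smoothing step only, following the SmoothQuant paper; it is not ColossalAI's fused int8 kernels, and the [C_in, C_out] weight layout is an assumption for readability.

```python
# Sketch of the SmoothQuant smoothing step: migrate activation outliers into
# the weights so both become easier to quantize (alpha balances the two sides).
import torch

def smooth_scales(act_absmax, weight, alpha=0.5, eps=1e-5):
    """act_absmax: [C_in] per-channel max |activation| from calibration data.
    weight: [C_in, C_out] for Y = X @ W."""
    w_absmax = weight.abs().amax(dim=1)  # [C_in]
    return act_absmax.clamp(min=eps) ** alpha / w_absmax.clamp(min=eps) ** (1 - alpha)

def apply_smoothing(x, weight, s):
    # (X / s) @ (diag(s) @ W) == X @ W, so the output is unchanged.
    return x / s, weight * s.unsqueeze(1)

x, w = torch.randn(8, 16), torch.randn(16, 32)
s = smooth_scales(x.abs().amax(dim=0), w)
x_s, w_s = apply_smoothing(x, w, s)
assert torch.allclose(x_s @ w_s, x @ w, atol=1e-5)
```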
* [refactor]: replace inference args with extra_kwargs in ShardConfig
* modify shardconfig
* polish code
* fix policy bug in llama
* fix bug in auto policy
* remove setattr in ShardConfig
* update flash-context-attention
* adding kernels
* fix
* reset
* add build script
* add building process
* add llama2 exmaple
* add colossal-llama2 test
* clean
* fall back test setting
* fix test file
* clean
* clean
* clean
---------
Co-authored-by: cuiqing.li <[email protected]>
… loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (hpcaitech#5017)
* Use p2p
* Cannot bidirectonal send p2p
* Refactor tensor creation and serialization in P2P communication
* Fix llama forward args in flash attention
* Add flop estimate from megatron
* Support loading weight not in weight_map when strict=False in hybrid_parallel
* Use send_forward_recv_backward, etc in 1f1b
* Use dataclass for metdata
Remove torch.cuda.synchronize() as suggested
* Add comment about the torch.cuda.synchronize for potential error
* Typo
* Update hybrid_parallel_checkpoint_io.py
* Update p2p.py
* Update one_f_one_b.py
* Update p2p.py
---------
Co-authored-by: flybird11111 <[email protected]>
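One of the additions above is a Megatron-style analytic FLOP estimate for the llama benchmark. The usual approximation counts 2 FLOPs per multiply-accumulate in the projection, attention, and logits matmuls and multiplies by roughly 3 for forward plus backward. The sketch below uses the GPT-style formula from the Megatron papers; LLaMA's gated MLP and grouped-query attention change the constants, so this only conveys the flavor of the estimate, not the exact code added here.

```python
# Rough Megatron-style FLOP estimate for a GPT-like decoder (forward + backward).
def transformer_flops(batch, seq, layers, hidden, vocab, ffn_mult=4, include_backward=True):
    per_layer = (
        8 * batch * seq * hidden**2                 # Q, K, V and output projections
        + 4 * batch * seq**2 * hidden               # attention scores and weighted sum
        + 4 * ffn_mult * batch * seq * hidden**2    # two MLP matmuls
    )
    logits = 2 * batch * seq * hidden * vocab
    fwd = layers * per_layer + logits
    return fwd * (3 if include_backward else 1)     # backward is roughly 2x forward

# Example with LLaMA-7B-like shapes (32 layers, hidden 4096, vocab 32000).
print(f"{transformer_flops(1, 2048, 32, 4096, 32000) / 1e12:.1f} TFLOPs per sample")
```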
* support ddp * fix * fix * fix fix * support ddp * fix * fix * fix fix * simplify tests * fix * fix * fix fix fix * fix
…5055) * fix-llama * Update llama.py
…#5032) * feat: modify create_ep_hierarchical_group args * test: add ep tests * fix: remove get_process_group_ranks * fix: fix src_rank
* [inference] support only TP (hpcaitech#4998) * support only tp * enable tp * add support for bloom (hpcaitech#5008) * [refactor] refactor gptq and smoothquant llama (hpcaitech#5012) * refactor gptq and smoothquant llama * fix import error * fix linear import torch-int * fix smoothquant llama import error * fix import accelerate error * fix bug * fix import smooth cuda * fix smoothcuda * [Inference Refactor] Merge chatglm2 with pp and tp (hpcaitech#5023) merge chatglm with pp and tp * [Refactor] remove useless inference code (hpcaitech#5022) * remove useless code * fix quant model * fix test import bug * mv original inference legacy * fix chatglm2 * [Refactor] refactor policy search and quant type controlling in inference (hpcaitech#5035) * [Refactor] refactor policy search and quant type controling in inference * [inference] update readme (hpcaitech#5051) * update readme * update readme * fix architecture * fix table * fix table * [inference] udpate example (hpcaitech#5053) * udpate example * fix run.sh * fix rebase bug * fix some errors * update readme * add some features * update interface * update readme * update benchmark * add requirements-infer --------- Co-authored-by: Bin Jia <[email protected]> Co-authored-by: Zhongkai Zhao <[email protected]>
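The TP-only inference support above shards each linear layer's weight across GPUs so every rank computes a slice of the output features and the slices are gathered afterwards. A bare-bones column-parallel linear in that spirit is sketched below; it is illustrative only, not the shardformer policy code, and assumes torch.distributed is already initialized.

```python
# Sketch of a column-parallel linear layer for inference: each rank holds a
# slice of the output features and an all-gather reassembles the full output.
import torch
import torch.distributed as dist

class ColumnParallelLinear(torch.nn.Module):
    def __init__(self, in_features, out_features, group=None):
        super().__init__()
        self.group = group
        world = dist.get_world_size(group)
        assert out_features % world == 0
        self.local_out = out_features // world
        self.weight = torch.nn.Parameter(torch.randn(self.local_out, in_features) * 0.02)

    def forward(self, x):
        local = torch.nn.functional.linear(x, self.weight)  # [..., local_out]
        chunks = [torch.empty_like(local) for _ in range(dist.get_world_size(self.group))]
        dist.all_gather(chunks, local.contiguous(), group=self.group)
        return torch.cat(chunks, dim=-1)                     # [..., out_features]
```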
* added flash-decoidng of triton based on lightllm kernel
* add req
* clean
* clean
* delete build.sh
---------
Co-authored-by: cuiqing.li <[email protected]>
* [npu] setup device utils (hpcaitech#5047)
* [npu] add npu device support
* [npu] support low level zero
* [test] update npu zero plugin test
* [hotfix] fix import
* [test] recover tests
* [npu] gemini support npu (hpcaitech#5052)
* [npu] refactor device utils
* [gemini] support npu
* [example] llama2+gemini support npu
* [kernel] add arm cpu adam kernel (hpcaitech#5065)
* [kernel] add arm cpu adam
* [optim] update adam optimizer
* [kernel] arm cpu adam remove bf16 support
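The arm cpu adam commit above ports the fused CPU Adam update to ARM. For reference, the per-element math such a kernel implements is the standard Adam rule; a plain PyTorch version of one step is sketched below (the real kernel presumably vectorizes this in C++, which is an assumption on my part).

```python
# Reference Adam update for a single step; a fused CPU kernel performs the
# same math per element, just vectorized natively.
import torch

def adam_step(p, grad, exp_avg, exp_avg_sq, step,
              lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
    if weight_decay != 0.0:
        grad = grad + weight_decay * p
    exp_avg.mul_(betas[0]).add_(grad, alpha=1 - betas[0])               # first moment
    exp_avg_sq.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])  # second moment
    bias_c1 = 1 - betas[0] ** step
    bias_c2 = 1 - betas[1] ** step
    denom = (exp_avg_sq / bias_c2).sqrt().add_(eps)
    p.add_(exp_avg / bias_c1 / denom, alpha=-lr)
    return p
```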
* update examples and engine * fix choices * update example
hpcaitech#5072) Co-authored-by: github-actions <[email protected]>
…ark (hpcaitech#5074) * fix init model with random parameters * fix example
* [setup] refactor infer setup * [hotfix] fix infenrece behavior on 1 1 gpu * [exmaple] refactor inference examples
…signment index out of range (hpcaitech#5085)
…#5064) * hotfix/Fix get model policy strategy in ShardFormer * fix bug in auto policy
…ad it (hpcaitech#5084) * fix flash attn * fix fix
* llama 3d * update * fix autocast
* add langchain * add langchain * Add files via upload * add langchain * fix style * fix style: remove extra space * add pytest; modified retriever * add pytest; modified retriever * add tests to build_on_pr.yml * fix build_on_pr.yml * fix build on pr; fix environ vars * seperate unit tests for colossalqa from build from pr * fix container setting; fix environ vars * commented dev code * add incremental update * remove stale code * fix style * change to sha3 224 * fix retriever; fix style; add unit test for document loader * fix ci workflow config * fix ci workflow config * add set cuda visible device script in ci * fix doc string * fix style; update readme; refactored * add force log info * change build on pr, ignore colossalqa * fix docstring, captitalize all initial letters * fix indexing; fix text-splitter * remove debug code, update reference * reset previous commit * update LICENSE update README add key-value mode, fix bugs * add files back * revert force push * remove junk file * add test files * fix retriever bug, add intent classification * change conversation chain design * rewrite prompt and conversation chain * add ui v1 * ui v1 * fix atavar * add header * Refactor the RAG Code and support Pangu * Refactor the ColossalQA chain to Object-Oriented Programming and the UI demo. * resolved conversation. tested scripts under examples. web demo still buggy * fix ci tests * Some modifications to add ChatGPT api * modify llm.py and remove unnecessary files * Delete applications/ColossalQA/examples/ui/test_frontend_input.json * Remove OpenAI api key * add colossalqa * move files * move files * move files * move files * fix style * Add Readme and fix some bugs. * Add something to readme and modify some code * modify a directory name for clarity * remove redundant directory * Correct a type in llm.py * fix AI prefix * fix test_memory.py * fix conversation * fix some erros and typos * Fix a missing import in RAG_ChatBot.py * add colossalcloud LLM wrapper, correct issues in code review --------- Co-authored-by: YeAnbang <[email protected]> Co-authored-by: Orion-Zheng <[email protected]> Co-authored-by: Zian(Andy) Zheng <[email protected]> Co-authored-by: Orion-Zheng <[email protected]>
📌 Checklist before creating the PR
[doc/gemini/tensor/...]: A concise description
🚨 Issue number
📝 What does this PR do?
💥 Checklist before requesting a review
⭐️ Do you enjoy contributing to Colossal-AI?
Tell us more if you don't enjoy contributing to Colossal-AI.