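Background
PaddlePaddle 3.1 introduced a CUDA-like hardware integration scheme. It is an upgrade of the Custom Device hardware integration scheme, and its main feature is that it can reuse a large number of the CUDA kernels in Paddle's PHI operator library. So far, MetaX (metax_gpu) and Iluvatar CoreX (iluvatar_gpu) have been integrated through this scheme.
However, some CUDA kernels in the PHI operator library were not written with reuse by other modules in mind. In particular, some kernels have no function declaration, so CUDA-like hardware backends are forced to `#include` the `.cu` source files directly, which violates the coding conventions.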
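This activity therefore aims to standardize the CUDA kernels in the PHI operator library:
- In the Paddle repository, add a declaration file (`.h`) for every kernel that lacks a header;
- In the PaddleCustomDevice repository, fix the incorrect `#include` of `.cu` files so that the correct header is included instead.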
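Scope
- Repositories involved
- Affected files
All operator kernel `.cu` source files that are `#include`d into registration files in the PaddleCustomDevice repository, 136 in total.
The full file list is given in the table below.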
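Tasks
Fix goals
- In the PaddlePaddle repository, add header files for the kernels that lack declarations;
- In the PaddleCustomDevice repository, replace each incorrect `#include` of a `*.cu` file with an `#include` of the newly added header, and add the kernel implementation to the CMakeLists build list. The code that needs to change appears only under the `backends/metax_gpu` and `backends/iluvatar_gpu` directories.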
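The split described above (declaration in a header, implementation compiled separately, registration code seeing only the declaration) can be sketched in a single file as follows. This is an illustration under stated assumptions, not code from either repository: `DenseTensor` and `DeviceContext` are simplified stand-ins for the real PHI types, and `ExampleScaleKernel` and `RunExampleScale` are hypothetical names. For the real header layout and registration steps, follow the example linked from this issue.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Simplified stand-ins for illustration only; NOT the real phi types.
struct DenseTensor {
  std::vector<float> data;
};
struct DeviceContext {};

// (1) What a newly added header would contain: the declaration only.
//     "ExampleScaleKernel" is a hypothetical kernel name.
template <typename T, typename Context>
void ExampleScaleKernel(const Context& dev_ctx,
                        const DenseTensor& x,
                        T factor,
                        DenseTensor* out);

// (2) Backend-side code: it needs nothing beyond the declaration in (1),
//     which is why including the header is enough and the .cu is not needed.
inline DenseTensor RunExampleScale(const DenseTensor& x, float factor) {
  DeviceContext ctx;
  DenseTensor out;
  ExampleScaleKernel<float, DeviceContext>(ctx, x, factor, &out);
  return out;
}

// (3) The implementation. In a real tree this stays in its own source file
//     and is added to the build list; that source file must also instantiate
//     the template for the types it serves (not shown in this sketch).
template <typename T, typename Context>
void ExampleScaleKernel(const Context& dev_ctx,
                        const DenseTensor& x,
                        T factor,
                        DenseTensor* out) {
  (void)dev_ctx;  // unused in this sketch
  out->data.resize(x.data.size());
  for (std::size_t i = 0; i < x.data.size(); ++i) {
    out->data[i] = x.data[i] * static_cast<float>(factor);
  }
}

}  // namespace sketch
```

The point of the layout is that part (2) compiles without ever seeing part (3), so the implementation can be built once through the build system instead of being pulled in textually by every file that registers it.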
| No. | File | Claimed by / Status / PR |
|---|---|---|
| 1 | paddle/phi/kernels/fusion/gpu/distributed_fused_lamb_init_kernel.cu | @Le-soleile @YqGe585 |
| 2 | paddle/phi/kernels/fusion/gpu/fused_bias_act_kernel.cu | @Le-soleile |
| 3 | paddle/phi/kernels/fusion/gpu/fused_bias_dropout_residual_layer_norm_grad_kernel.cu | @wanglezz |
| 4 | paddle/phi/kernels/fusion/gpu/fused_bias_dropout_residual_layer_norm_kernel.cu | @wanglezz |
| 5 | paddle/phi/kernels/fusion/gpu/fused_embedding_eltwise_layernorm_kernel.cu | @wanglezz |
| 6 | paddle/phi/kernels/fusion/gpu/fused_layernorm_kernel.cu | @WanRui37 |
| 7 | paddle/phi/kernels/fusion/gpu/fused_seqpool_cvm_grad_kernel.cu | @SpongeBob0318 |
| 8 | paddle/phi/kernels/fusion/gpu/fused_seqpool_cvm_kernel.cu | @SpongeBob0318 |
| 9 | paddle/phi/kernels/fusion/gpu/fused_softmax_mask_grad_kernel.cu | @SpongeBob0318 |
| 10 | paddle/phi/kernels/fusion/gpu/fused_softmax_mask_kernel.cu | @youge325 |
| 11 | paddle/phi/kernels/fusion/gpu/fused_softmax_mask_upper_triangle_kernel.cu | |
| 12 | paddle/phi/kernels/fusion/gpu/fused_stack_transpose_quant_kernel.cu | @youge325 |
| 13 | paddle/phi/kernels/fusion/gpu/fused_transpose_split_quant_kernel.cu | @SpongeBob0318 |
| 14 | paddle/phi/kernels/fusion/gpu/fused_transpose_wlch_split_quant_kernel.cu | @SpongeBob0318 |
| 15 | paddle/phi/kernels/fusion/gpu/fusion_group_kernel.cu | @SpongeBob0318 |
| 16 | paddle/phi/kernels/fusion/gpu/masked_multihead_attention_kernel.cu | @Le-soleile |
| 17 | paddle/phi/kernels/fusion/gpu/qkv_unpack_mha_kernel.cu | @Le-soleile |
| 18 | paddle/phi/kernels/fusion/gpu/skip_layernorm_kernel.cu | @SpongeBob0318 |
| 19 | paddle/phi/kernels/gpu/affine_channel_grad_kernel.cu | @SpongeBob0318 |
| 20 | paddle/phi/kernels/gpu/affine_channel_kernel.cu | @SpongeBob0318 |
| 21 | paddle/phi/kernels/gpu/ap_facade_kernel.cu | @youge325 @Echo-Nie |
| 22 | paddle/phi/kernels/gpu/ap_trivial_fusion_begin_kernel.cu | @youge325 |
| 23 | paddle/phi/kernels/gpu/ap_trivial_fusion_end_kernel.cu | @youge325 |
| 24 | paddle/phi/kernels/gpu/ap_variadic_kernel.cu | @youge325 |
| 25 | paddle/phi/kernels/gpu/argsort_grad_kernel.cu | @Patrisam |
| 26 | paddle/phi/kernels/gpu/barrier_kernel.cu | @youge325 |
| 27 | paddle/phi/kernels/gpu/bce_loss_grad_kernel.cu | @Luxorion-12 |
| 28 | paddle/phi/kernels/gpu/bce_loss_kernel.cu | @tjujingzong |
| 29 | paddle/phi/kernels/gpu/binomial_kernel.cu | @tjujingzong |
| 30 | paddle/phi/kernels/gpu/bmm_grad_kernel.cu | @tjujingzong |
| 31 | paddle/phi/kernels/gpu/bmm_kernel.cu | @tjujingzong |
| 32 | paddle/phi/kernels/gpu/box_clip_kernel.cu | @algorithm1832 |
| 33 | paddle/phi/kernels/gpu/c_concat_kernel.cu | @algorithm1832 |
| 34 | paddle/phi/kernels/gpu/c_embedding_grad_kernel.cu | @algorithm1832 |
| 35 | paddle/phi/kernels/gpu/c_scatter_kernel.cu | @algorithm1832 |
| 36 | paddle/phi/kernels/gpu/c_softmax_with_cross_entropy_grad_kernel.cu | @youge325 |
| 37 | paddle/phi/kernels/gpu/cast_kernel.cu | @Patrisam |
| 38 | paddle/phi/kernels/gpu/class_center_sample_kernel.cu | @Patrisam |
| 39 | paddle/phi/kernels/gpu/collect_fpn_proposals_kernel.cu | @youge325 |
| 40 | paddle/phi/kernels/gpu/comm_init_all_kernel.cu | @youge325 |
| 41 | paddle/phi/kernels/gpu/complex_kernel.cu | |
| 42 | paddle/phi/kernels/gpu/correlation_grad_kernel.cu | @tjujingzong |
| 43 | paddle/phi/kernels/gpu/correlation_kernel.cu | @youge325 |
| 44 | paddle/phi/kernels/gpu/ctc_align_kernel.cu | |
| 45 | paddle/phi/kernels/gpu/cvm_grad_kernel.cu | @Le-soleile |
| 46 | paddle/phi/kernels/gpu/cvm_kernel.cu | @Le-soleile |
| 47 | paddle/phi/kernels/gpu/deformable_conv_grad_kernel.cu | @123wjr |
| 48 | paddle/phi/kernels/gpu/deformable_conv_kernel.cu | @123wjr |
| 49 | paddle/phi/kernels/gpu/elementwise_grad_kernel.cu | @LiaoYFBH |
| 50 | paddle/phi/kernels/gpu/embedding_with_scaled_gradient_grad_kernel.cu | @LiaoYFBH @metax666 |
| 51 | paddle/phi/kernels/gpu/exponential_kernel.cu | @LiaoYFBH |
| 52 | paddle/phi/kernels/gpu/flip_kernel.cu | @LiaoYFBH |
| 53 | paddle/phi/kernels/gpu/fused_token_prune_kernel.cu | @Le-soleile |
| 54 | paddle/phi/kernels/gpu/gather_grad_kernel.cu | @liangqi520 |
| 55 | paddle/phi/kernels/gpu/gelu_grad_kernel.cu | @Patrisam |
| 56 | paddle/phi/kernels/gpu/global_gather_kernel.cu | @Le-soleile |
| 57 | paddle/phi/kernels/gpu/global_scatter_kernel.cu | @Le-soleile |
| 58 | paddle/phi/kernels/gpu/group_norm_grad_kernel.cu | |
| 59 | paddle/phi/kernels/gpu/group_norm_kernel.cu | |
| 60 | paddle/phi/kernels/gpu/gru_kernel.cu | @algorithm1832 |
| 61 | paddle/phi/kernels/gpu/index_add_grad_kernel.cu | @algorithm1832 |
| 62 | paddle/phi/kernels/gpu/interpolate_grad_kernel.cu | @algorithm1832 |
| 63 | paddle/phi/kernels/gpu/interpolate_kernel.cu | @algorithm1832 |
| 64 | paddle/phi/kernels/gpu/kldiv_loss_grad_kernel.cu | @algorithm1832 |
| 65 | paddle/phi/kernels/gpu/kldiv_loss_kernel.cu | |
| 66 | paddle/phi/kernels/gpu/l1_norm_grad_kernel.cu | @Le-soleile |
| 67 | paddle/phi/kernels/gpu/l1_norm_kernel.cu | |
| 68 | paddle/phi/kernels/gpu/label_smooth_grad_kernel.cu | |
| 69 | paddle/phi/kernels/gpu/label_smooth_kernel.cu | |
| 70 | paddle/phi/kernels/gpu/lamb_kernel.cu | |
| 71 | paddle/phi/kernels/gpu/lgamma_kernel.cu | |
| 72 | paddle/phi/kernels/gpu/log_softmax_grad_kernel.cu | |
| 73 | paddle/phi/kernels/gpu/logsumexp_kernel.cu | |
| 74 | paddle/phi/kernels/gpu/lookup_table_grad_kernel.cu | @Le-soleile |
| 75 | paddle/phi/kernels/gpu/lookup_table_kernel.cu | @Le-soleile |
| 76 | paddle/phi/kernels/gpu/lu_solve_kernel.cu | @ChenMiaoi |
| 77 | paddle/phi/kernels/gpu/margin_cross_entropy_kernel.cu | @ChenMiaoi |
| 78 | paddle/phi/kernels/gpu/matrix_power_grad_kernel.cu | @ChenMiaoi |
| 79 | paddle/phi/kernels/gpu/matrix_power_kernel.cu | @ChenMiaoi |
| 80 | paddle/phi/kernels/gpu/mean_all_grad_kernel.cu | @Patrisam |
| 81 | paddle/phi/kernels/gpu/moe_unpermute_kernel.cu | @Le-soleile |
| 82 | paddle/phi/kernels/gpu/momentum_kernel.cu | |
| 83 | paddle/phi/kernels/gpu/mp_allreduce_sum_kernel.cu | @fsylmxx |
| 84 | paddle/phi/kernels/gpu/multiclass_nms3_kernel.cu | @fsylmxx |
| 85 | paddle/phi/kernels/gpu/multiplex_grad_kernel.cu | @fsylmxx |
| 86 | paddle/phi/kernels/gpu/nonzero_kernel.cu | |
| 87 | paddle/phi/kernels/gpu/pad3d_kernel.cu | |
| 88 | paddle/phi/kernels/gpu/partial_allgather_kernel.cu | @Le-soleile |
| 89 | paddle/phi/kernels/gpu/partial_concat_grad_kernel.cu | @Le-soleile |
| 90 | paddle/phi/kernels/gpu/partial_concat_kernel.cu | |
| 91 | paddle/phi/kernels/gpu/partial_recv_kernel.cu | @Le-soleile |
| 92 | paddle/phi/kernels/gpu/partial_send_kernel.cu | @Le-soleile |
| 93 | paddle/phi/kernels/gpu/psroi_pool_grad_kernel.cu | @xxiu1 |
| 94 | paddle/phi/kernels/gpu/quantize_linear_kernel.cu | |
| 95 | paddle/phi/kernels/gpu/reduce_kernel.cu | |
| 96 | paddle/phi/kernels/gpu/repeat_interleave_grad_kernel.cu | |
| 97 | paddle/phi/kernels/gpu/repeat_interleave_kernel.cu | |
| 98 | paddle/phi/kernels/gpu/rmsprop_kernel.cu | |
| 99 | paddle/phi/kernels/gpu/roi_align_grad_kernel.cu | |
| 100 | paddle/phi/kernels/gpu/roi_align_kernel.cu | @Le-soleile |
| 101 | paddle/phi/kernels/gpu/row_conv_grad_kernel.cu | @Le-soleile |
| 102 | paddle/phi/kernels/gpu/row_conv_kernel.cu | @Le-soleile |
| 103 | paddle/phi/kernels/gpu/seed_kernel.cu | @Le-soleile |
| 104 | paddle/phi/kernels/gpu/sequence_expand_kernel.cu | @Le-soleile |
| 105 | paddle/phi/kernels/gpu/set_value_kernel.cu | @Le-soleile |
| 106 | paddle/phi/kernels/gpu/shuffle_channel_grad_kernel.cu | @Le-soleile |
| 107 | paddle/phi/kernels/gpu/shuffle_channel_kernel.cu | @Le-soleile |
| 108 | paddle/phi/kernels/gpu/soft_relu_grad_kernel.cu | @Le-soleile |
| 109 | paddle/phi/kernels/gpu/spectral_norm_grad_kernel.cu | @Le-soleile |
| 110 | paddle/phi/kernels/gpu/spectral_norm_kernel.cu | @Le-soleile |
| 111 | paddle/phi/kernels/gpu/stack_grad_kernel.cu | |
| 112 | paddle/phi/kernels/gpu/stft_grad_kernel.cu | @Le-soleile |
| 113 | paddle/phi/kernels/gpu/sync_batch_norm_grad_kernel.cu | |
| 114 | paddle/phi/kernels/gpu/top_k_kernel.cu | |
| 115 | paddle/phi/kernels/gpu/uniform_random_batch_size_like_kernel.cu | @Le-soleile |
| 116 | paddle/phi/kernels/gpu/weighted_sample_neighbors_kernel.cu | |
| 117 | paddle/phi/kernels/gpu/yolo_box_head_kernel.cu | @Le-soleile |
| 118 | paddle/phi/kernels/gpu/yolo_box_post_kernel.cu | @Le-soleile |
| 119 | paddle/phi/kernels/kps/elementwise_kernel.cu | |
| 120 | paddle/phi/kernels/legacy/gpu/cal_aux_loss_grad_kernel.cu | @Le-soleile |
| 121 | paddle/phi/kernels/legacy/gpu/cal_aux_loss_kernel.cu | @Le-soleile |
| 122 | paddle/phi/kernels/legacy/gpu/expand_modality_expert_id_kernel.cu | @Le-soleile |
| 123 | paddle/phi/kernels/legacy/gpu/ext_build_src_rank_and_local_expert_id_kernel.cu | @Le-soleile |
| 124 | paddle/phi/kernels/legacy/gpu/fp8_quant_blockwise_kernel.cu | @Le-soleile |
| 125 | paddle/phi/kernels/legacy/gpu/int_bincount.cu | |
| 126 | paddle/phi/kernels/legacy/gpu/layer_norm_cuda_kernel.cu | |
| 127 | paddle/phi/kernels/legacy/gpu/moe_combine_grad_kernel.cu | |
| 128 | paddle/phi/kernels/legacy/gpu/moe_combine_kernel.cu | |
| 129 | paddle/phi/kernels/legacy/gpu/moe_combine_no_weight_kernel.cu | |
| 130 | paddle/phi/kernels/legacy/gpu/moe_gate_dispatch_grad_kernel.cu | |
| 131 | paddle/phi/kernels/legacy/gpu/moe_gate_dispatch_kernel.cu | |
| 132 | paddle/phi/kernels/legacy/gpu/moe_gate_dispatch_permute_grad_kernel.cu | @Le-soleile |
| 133 | paddle/phi/kernels/legacy/gpu/moe_gate_dispatch_permute_kernel.cu | @Le-soleile |
| 134 | paddle/phi/kernels/legacy/gpu/moe_ops_partial_nosoftmaxtopk_grad_kernel.cu | @Le-soleile |
| 135 | paddle/phi/kernels/legacy/gpu/moe_ops_partial_nosoftmaxtopk_kernel.cu | @Le-soleile |
| 136 | paddle/phi/kernels/legacy/kps/compare_kernel.cu | |
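Example fix and how to submit code
See #75226 (comment).
How to claim
Please claim tasks by posting a comment, for example:
【报名】:1、3、2-3
- Separate multiple tasks with the Chinese enumeration comma (、); a run of consecutive tasks can be written with a hyphen, e.g. 1-2
- PR submission format:
  - Submit a separate PR to each repository; submit the PaddleCustomDevice PR only after the Paddle PR has been merged
  - PR titles in both repositories start with 【CUDA Kernel No.xxx】, stating the task number
  - PR titles in the Paddle repository end with -part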
Dashboard
| Track | Tasks | Submitted / Claimed | Submission rate | Completed | Completion rate |
|---|---|---|---|---|---|
| CUDA Kernel standardization | 136 | 89 / 100 | 65.44% | 44 | 32.35% |
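Both rates in the table are consistent with dividing by the total task count (136) rather than by the number of claimed tasks (100). This is inferred from the numbers and is not stated in the issue. A small sketch of the arithmetic, with `Percent` as a hypothetical helper:

```cpp
#include <cstdio>
#include <string>

// Format part/total as a percentage with two decimal places.
inline std::string Percent(int part, int total) {
  char buf[16];
  std::snprintf(buf, sizeof(buf), "%.2f%%", 100.0 * part / total);
  return std::string(buf);
}
```

With the table's figures, `Percent(89, 136)` gives `65.44%` and `Percent(44, 136)` gives `32.35%`, matching the submission and completion rates shown.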
Statistics
In no particular order: @wanglezz (3) @SpongeBob0318 (9) @youge325 (11) @Le-soleile (9) @algorithm1832 (8) @tjujingzong (1) @LiaoYFBH (2) @xxiu1 (1)