
@Prithviraj-R
Collaborator

No description provided.

{
const std::uint32_t ic_chunks_per_hw_thread = 8;
-const std::uint32_t exec_size = 8;
+const std::uint32_t exec_size = 16; // BMG = 16, DG2 = 8
Collaborator Author


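@smarcink How should this be handled? exec_size depends on the platform (16 on BMG, 8 on DG2), so a single hardcoded constant only matches one of them.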
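One possible direction, sketched under assumptions: pick the value from the detected architecture instead of hardcoding it. GpuArch and exec_size_for are hypothetical names for illustration only; the real code would populate the architecture from whatever device query the project already performs, which is not shown in this PR excerpt. The values are the ones from the comment on the changed line.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical architecture tag (not part of the PR). The real code would
// derive it from its existing device-capability query.
enum class GpuArch { DG2, BMG, Unknown };

// Execution size per platform, using the values noted in the review comment
// (BMG = 16, DG2 = 8). Unknown hardware yields no value, so the caller has to
// decide explicitly instead of silently inheriting one platform's constant.
constexpr std::optional<std::uint32_t> exec_size_for(GpuArch arch) {
    switch (arch) {
        case GpuArch::BMG: return 16u;
        case GpuArch::DG2: return 8u;
        default:           return std::nullopt;
    }
}

// Compile-time checks of the mapping.
static_assert(*exec_size_for(GpuArch::BMG) == 16);
static_assert(*exec_size_for(GpuArch::DG2) == 8);
static_assert(!exec_size_for(GpuArch::Unknown).has_value());
```

Returning std::optional rather than a default number keeps unsupported hardware visible at the call site; a caller could fall back to a reference path or report an error there.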

const auto gws_x = cm_params_.slice_ic * (round_up_next_multiple(output_shape_.w, cm_params_.block_w) / cm_params_.block_w);
const auto gws_y = round_up_next_multiple(output_shape_.h, cm_params_.block_h) / cm_params_.block_h;
const auto gws_z = (params_.input_shape.n / cm_params_.block_batch) * out_ch_size;
const auto execsize = 2; // BMG = 2, DG2 = 1
Collaborator Author


@smarcink How should this be handled? Same issue as above: execsize is 2 on BMG and 1 on DG2.
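For reference, gws_x and gws_y compute round_up_next_multiple(extent, block) / block, which, assuming round_up_next_multiple rounds up to the nearest multiple, is a ceiling division: the number of blocks needed to cover the output extent. The sketch restates the three gws formulas as a standalone function with a worked example and selects execsize from a hypothetical architecture tag using the values in the comment (BMG = 2, DG2 = 1). round_up_next_multiple here is an assumed re-implementation (its real definition is not in this excerpt), and GpuArch, ceil_blocks, execsize_for, and compute_gws are illustrative names, not part of the PR.

```cpp
#include <cstdint>

// Assumed behaviour of the project's helper: round value up to a multiple of m (m > 0).
constexpr std::uint32_t round_up_next_multiple(std::uint32_t value, std::uint32_t m) {
    return ((value + m - 1) / m) * m;
}

// round_up_next_multiple(x, b) / b == ceil(x / b): blocks needed to cover x.
constexpr std::uint32_t ceil_blocks(std::uint32_t extent, std::uint32_t block) {
    return round_up_next_multiple(extent, block) / block;
}

// Hypothetical platform tag; only the two platforms named in the comment are modeled.
enum class GpuArch { DG2, BMG };

constexpr std::uint32_t execsize_for(GpuArch arch) {
    return arch == GpuArch::BMG ? 2u : 1u;  // BMG = 2, DG2 = 1
}

struct Gws { std::uint32_t x, y, z; };

// Same formulas as gws_x / gws_y / gws_z in the hunk, with the inputs passed in.
constexpr Gws compute_gws(std::uint32_t slice_ic,
                          std::uint32_t out_w, std::uint32_t block_w,
                          std::uint32_t out_h, std::uint32_t block_h,
                          std::uint32_t batch, std::uint32_t block_batch,
                          std::uint32_t out_ch_size) {
    return Gws{slice_ic * ceil_blocks(out_w, block_w),
               ceil_blocks(out_h, block_h),
               (batch / block_batch) * out_ch_size};  // truncating division
}

// Worked example with made-up sizes: 56x56 output, 8x2 blocks, slice_ic = 2,
// batch 1, block_batch 1, out_ch_size 4.
constexpr Gws example = compute_gws(2, 56, 8, 56, 2, 1, 1, 4);
static_assert(example.x == 14 && example.y == 28 && example.z == 4);
static_assert(ceil_blocks(57, 8) == 8);  // a partial block still gets a work-item
```

Note that gws_z uses plain integer division: with batch 3 and block_batch 2 it covers only one batch block, so the leftover batch is not reached by the z dimension unless the kernel handles it elsewhere.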

@Prithviraj-R requested a review from smarcink on July 15, 2024 09:46

Labels: None yet

Projects: None yet


1 participant