-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[microNPU] Improve cascader memory transfer estimates #10508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Change-Id: Idadc5f354dce42c8dbcdcbe281d324adddb41ba3
|
@ekalda would be able to take a look here ? |
ekalda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Some nits, can be done in a follow up :)
| read_latency: int = 0, | ||
| write_latency: int = 0, | ||
| burst_length: int = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: add these to the docstring as well
| int read_bandwidth; | ||
| /*! \brief The write bandwidth of the region in bytes per cycle */ | ||
| int write_bandwidth; | ||
| /*! \brief The read bandwidth of the region in bytes per cycle */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Update the docstring
| output_cycles *= reduce(lambda a, b: a * b, output_block, 1) | ||
| output_cycles = int(math.ceil(output_cycles)) | ||
| block_config.append(BlockConfig(output_block, 0, output_cycles)) | ||
| block_config.append(BlockConfig(output_block, output_block, 0, output_cycles)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use the larger of the input blocks?
| int read_cycles = bytes_transferred * home_region->read_bandwidth + | ||
| input_configs[i]->GetHomeRegion()->read_latency; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| int read_cycles = bytes_transferred * home_region->read_bandwidth + | |
| input_configs[i]->GetHomeRegion()->read_latency; | |
| int read_cycles = bytes_transferred * home_region->read_bandwidth + | |
| home_region->read_latency; |
manupak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets take it in a follow up :)
|
Thanks @ekalda @jacobbohlin ! this is merged now. |
Change-Id: Idadc5f354dce42c8dbcdcbe281d324adddb41ba3
This PR improves the cascader's memory transfer cycle estimates by taking memory burst length and latency into account. Copying tensors between external memories are also more accurately modeled.