@lsy323
Collaborator

@lsy323 lsy323 commented Feb 26, 2025

Add start_trace and stop_trace APIs to programmatically start and stop a profiling session. Before this PR, profiling could only be started for a fixed time duration or within a context manager. The new APIs give finer control over when a profiling session begins and ends.

The implementation is based on the profiler implementation in JAX.

Example usage:

import torch_xla.debug.profiler as xp

server = xp.start_server(8001)
xp.start_trace(profiling_dir)
# Run some computation
...
xp.stop_trace()
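
For callers who still want scoped profiling, the two new calls could be wrapped in a small context manager so that the session is closed even if the computation raises. This is only a sketch: `profile_session` is a hypothetical helper and not part of this PR, and it assumes `start_trace(log_dir)` and `stop_trace()` behave as in the example above. The profiler module is passed in as an argument so the helper can be exercised without a device.

```python
import contextlib


@contextlib.contextmanager
def profile_session(profiler, log_dir):
    """Hypothetical helper: run a block of code inside one profiling session.

    `profiler` is assumed to expose start_trace(log_dir) and stop_trace(),
    matching the calls shown in the PR description.
    """
    profiler.start_trace(log_dir)
    try:
        yield
    finally:
        # Always close the session, even if the profiled code raises.
        profiler.stop_trace()
```

Usage would look like `with profile_session(xp, profiling_dir): ...`, where `xp` is `torch_xla.debug.profiler`.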

@lsy323 lsy323 changed the title Support programmatically start and stop profiling session Add start_trace and stop_trace API in profiler Feb 26, 2025
@lsy323 lsy323 marked this pull request as ready for review February 26, 2025 07:39
@tengyifei
Collaborator

That's amazing

Collaborator

@yaochengji yaochengji left a comment

Thanks, Siyuan!

@miladm
Collaborator

miladm commented Feb 26, 2025

@lsy323

  • Please add this form of profiling to our documentation.
  • I wonder how large of a profile file we can create. Do we know the user experience impact if the profiling duration is super long?

cc @mikegre-google

@miladm miladm self-requested a review February 26, 2025 22:25
y.cpu()


class TestProfilerSession(absltest.TestCase):
Collaborator

Do we need a long-duration profile test?

Collaborator Author

No, I don't think we need a long-duration profile test in torch_xla. The goal of this PR is to improve usability for users; the capability of the profiler itself is out of scope for this PR (that should be covered in the underlying tsl library).

@lsy323
Collaborator Author

lsy323 commented Feb 26, 2025

Please add this form of profiling to our documentation.

Let me add it in a follow-up PR. I just realized we don't have a user guide on how to use the profiler.

I wonder how large of a profile file we can create. Do we know the user experience impact if the profiling duration is super long?

If the profiling session runs for a very long time, the traced content is omitted in TensorBoard.
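
One way to keep the captured window short is to trace only a few steps of a longer run. The following is a minimal sketch under stated assumptions, not something this PR provides: `run_with_profile_window` and its parameters are hypothetical, `profiler` is assumed to expose `start_trace(log_dir)` and `stop_trace()` as above, and `step_fn` is assumed to block until the device work for that step has finished.

```python
def run_with_profile_window(profiler, log_dir, step_fn, num_steps,
                            start_step, stop_step):
    """Hypothetical helper: profile only steps in [start_step, stop_step)."""
    if not 0 <= start_step < stop_step <= num_steps:
        raise ValueError("expected 0 <= start_step < stop_step <= num_steps")

    tracing = False
    try:
        for step in range(num_steps):
            if step == start_step:
                profiler.start_trace(log_dir)
                tracing = True
            step_fn(step)
            if step == stop_step - 1:
                # Last profiled step is done; close the session.
                profiler.stop_trace()
                tracing = False
    finally:
        # If a step raised while tracing, still close the session.
        if tracing:
            profiler.stop_trace()
```

Passing the profiler in as an argument keeps the helper independent of any particular backend, so it can be checked with a stand-in object before running on a device.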

@lsy323 lsy323 merged commit b4ba17b into master Feb 27, 2025
23 checks passed
@lsy323 lsy323 deleted the lsiyuan/profiler branch February 27, 2025 00:04