@benieric
Contributor

@benieric benieric commented Mar 26, 2025

Issue #, if available:

Description of changes:

  • This change should address errors such as: model_fn() takes x positional argument but y were given
  • The error occurs under a race condition in which validate_and_initialize_user_module() is called by two workers and the extra argument is calculated incorrectly
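
As an illustration of how running the setup twice could produce this error, here is a minimal sketch. It is a toy model, not the toolkit's code: ToyHandler, _extra_arg, initialize_user_module and load_model are hypothetical names, and the assumed mechanism (the extra-argument check compares the user function against a default that the first run has already replaced) may differ from the real implementation.

```python
import inspect


class ToyHandler:
    """Hypothetical stand-in for a handler service (not the toolkit's code)."""

    def __init__(self, context):
        self.context = context
        self.load_extra_arg = []

    def load(self, model_dir, context=None):
        # Default loader: accepts the model directory and an optional context.
        return "default model from " + model_dir

    def _extra_arg(self, reference_fn, user_fn):
        # Assumed rule: pass the context only when the user function has as
        # many parameters as the reference (default) function.
        n_reference = len(inspect.signature(reference_fn).parameters)
        n_user = len(inspect.signature(user_fn).parameters)
        return [self.context] if n_user == n_reference else []

    def initialize_user_module(self, user_load_fn):
        # The comparison uses self.load, which this method then overwrites.
        # A second run therefore compares the user function with itself.
        self.load_extra_arg = self._extra_arg(self.load, user_load_fn)
        self.load = user_load_fn

    def load_model(self, model_dir):
        return self.load(model_dir, *self.load_extra_arg)


def model_fn(model_dir):
    # User-provided loader that does not take a context argument.
    return "user model from " + model_dir


handler = ToyHandler(context={"worker": 1})

# First initialization: default has 2 parameters, user function has 1,
# so no extra argument is passed and loading works.
handler.initialize_user_module(model_fn)
first_result = handler.load_model("/opt/ml/model")

# Second initialization: self.load is already model_fn, the counts match,
# and the context is wrongly appended as an extra argument.
handler.initialize_user_module(model_fn)
try:
    handler.load_model("/opt/ml/model")
    second_error = None
except TypeError as exc:
    second_error = str(exc)

print(first_result)
print(second_error)
```

In this toy the first load succeeds and the second raises a TypeError of the same shape as the one reported above, which is consistent with the description that the extra argument is miscalculated once initialization has already run.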

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@benieric benieric marked this pull request as ready for review March 27, 2025 21:44
@arjkesh

arjkesh commented Mar 28, 2025

Can you add a test to check for regression against the race condition you described? Besides this, LGTM

@davidthomas426
Contributor

> Can you add a test to check for regression against the race condition you described? Besides this, LGTM

+1.

@benieric
Contributor Author

Thanks for taking a look. I'll work on adding a test for this.

Comment on lines +178 to +179
inference_handler.initialize(CONTEXT)
inference_handler.initialize(CONTEXT)
Contributor

Can you clarify why there are two threads calling initialize twice in a single python process?

Contributor Author

Yup, it is in the handle() method, specifically this block:

if not self.initialized:
    if self.attempted_init:
        logger.warn(
            "Model is not initialized, will try to load model again.\n"
            "Please consider increase wait time for model loading.\n"
        )
    self.initialize(context)

The test just assumes the failure condition has already occurred.
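
For readers who want the shape of such a regression check, here is a minimal sketch. It assumes a hypothetical GuardedHandler whose initialize() is made safe to call repeatedly with a lock and a flag; this is one generic way to get that property and is not necessarily the approach taken in this PR. All class, method and attribute names other than initialize() are made up for the example.

```python
import threading
import unittest


class GuardedHandler:
    """Hypothetical handler whose initialize() is safe to call repeatedly."""

    def __init__(self):
        self.initialized = False
        self.setup_calls = 0
        self._lock = threading.Lock()

    def _setup_user_module(self):
        # Stand-in for the one-time user module setup.
        self.setup_calls += 1

    def initialize(self, context):
        # The lock serializes callers; the flag turns later calls into no-ops.
        with self._lock:
            if self.initialized:
                return
            self._setup_user_module()
            self.initialized = True


class DoubleInitializeTest(unittest.TestCase):
    def test_second_initialize_is_a_no_op(self):
        # Mirrors the back-to-back initialize() calls in the test under review.
        handler = GuardedHandler()
        handler.initialize(None)
        handler.initialize(None)
        self.assertEqual(handler.setup_calls, 1)

    def test_concurrent_initialize_runs_setup_once(self):
        handler = GuardedHandler()
        threads = [
            threading.Thread(target=handler.initialize, args=(None,))
            for _ in range(8)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        self.assertEqual(handler.setup_calls, 1)
```

Saved as a module, the tests can be run with python -m unittest. Calling initialize() twice in sequence does not reproduce the timing of a real race, but it does exercise the state the race leaves behind.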

Contributor Author

From some personal testing, I was able to see that the model gets loaded and later fails when attempting to load again:

2025-03-25T18:41:48,682 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=7a3c6bfffe7cf36a-0000007c-00000000-9116165d8c9c5504-f2c269a2

...

2025-03-25T18:42:34,468 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: model_fn() takes 1 positional argument but 2 were given : 400

Contributor Author

After testing with the fix, I did not see this error.

Contributor

@davidthomas426 davidthomas426 left a comment

LGTM.

I'm still slightly confused about how a race condition can happen in this code. But I see that if the initialization function runs twice, it causes this problem, and I can see how this fixes that.

@davidthomas426 davidthomas426 merged commit 92b57dd into aws:main Mar 28, 2025
2 checks passed

3 participants