- 
                Notifications
    You must be signed in to change notification settings 
- Fork 13.8k
[FLINK-38581][model] Model Function supports error handling strategy #27163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[FLINK-38581][model] Model Function supports error handling strategy #27163
Conversation
03835af    to
    d377599      
    Compare
  
    d377599    to
    b4e212e      
    Compare
  
    | <ul> | ||
| <li><code>retry</code>: Retry sending the request. The retrying behavior is limited by retry-num, retry-fallback-strategy, retry-backoff-strategy and retry-backoff-base-interval.</li> | ||
| <li><code>failover</code>: Throw exceptions and fail the Flink job.</li> | ||
| <li><code>ignore</code>: Ignore the input that caused the error and continue. The error itself would be recorded in log.</li> | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the HTTP connector . We have introduced metadata columns to surface the error details and a flag to continueOnError. Can we have a strategy here that would surface the error and carry on. This allows the stream to "handle" the error with dead letter queues etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I tried to add error-string, http-status-code and http-headers-map as optional output metadata columns in the latest commit. Please take a look.
I did not add http-completion-state yet, as OpenAI's Java SDK did not provide direct access to this information. Maybe we can extend this function in future.
| .defaultValue(AbstractOpenAIModelFunction.ErrorHandlingStrategy.RETRY) | ||
| .withDescription( | ||
| Description.builder() | ||
| .text( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we retry the requests when the error is retryable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the retrying functionality has been added in reference to OpenAI's Java SDK's RetryingHttpClient.
| add(408); // Retry on request timeouts | ||
| add(409); // Retry on lock timeouts | ||
| add(429); // Retry on rate limits | ||
| add(500); // Retry internal errors | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not think internal errors should be retryable. The http connector allows this list to be supplied by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retryable code list is referenced from the OpenAI SDK RetryingHttpClient.kt#L173. Since the OpenAI REST API has a relatively stable behavior and this ModelFunction does not mean to support arbitrary HTTP server with LLM yet, I don't think we need to add this flexibility for users now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of retrying ourselves, can we just use the retry ability from the RetryingHttpClient?
662e73f    to
    3927931      
    Compare
  
    | @flinkbot run azure | 
3927931    to
    de40c8b      
    Compare
  
    
What is the purpose of the change
This issue aims to support configuring Model Function's behavior when the remote requests failed, so as to better adapt to different use cases like debugging or production.
Brief change log
Adds support for the following error handling strategies
Verifying this change
Added unit tests in ModelFunctionErrorHandlingStrategyTest to cover the newly introduced functions.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): noDocumentation