Skip to content

Conversation

@rarajes2
Copy link
Contributor

@rarajes2 rarajes2 commented Oct 31, 2025

COMPLETES #< CAI-7258 >

This pull request addresses

Adds the call keepalive error handling scenarios for 4xx and 5xx cases where for 4xx cases call cleanup happens and for 5xx keepalive retries are done based on the value received in the response header retry-after followed by fallback value if it's not present.

by making the following changes

Normal keepalive for call will be 10 minutes but if receives 5xx response then it will schedule the re-tries based on the after-retry header value and fallback to 30 seconds

Vidcast - https://app.vidcast.io/share/670a12d0-1b16-4561-8d86-f34df10e3a63

Change Type

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Tooling change
  • Internal code refactor

The following scenarios were tested

< ENUMERATE TESTS PERFORMED, WHETHER MANUAL OR AUTOMATED >

Tested the following with the burp suite (to update the response header)

  • For 4xx cases the call was being cleaned up
  • For 5xx cases with retry-after, it honors the value if provided else fallback to 30 sec retry interval
  • Tries max of 4 retries and if all fails it stops the retries
  • Failed 3 keepalives and then let 1 pass which did reset the counter and again letting the keepalive fail for 4 times stops sending further keepalive request

The GAI Coding Policy And Copyright Annotation Best Practices

  • GAI was not used (or, no additional notation is required)
  • Code was generated entirely by GAI
  • GAI was used to create a draft that was subsequently customized or modified
  • Coder created a draft manually that was non-substantively modified by GAI (e.g., refactoring was performed by GAI on manually written code)
  • Tool used for AI assistance (GitHub Copilot / Other - specify)
    • Github Copilot
    • Other - Please Specify
  • This PR is related to
    • Feature
    • Defect fix
    • Tech Debt
    • Automation

I certified that

  • I have read and followed contributing guidelines
  • I discussed changes with code owners prior to submitting this pull request
  • I have not skipped any automated checks
  • All existing and new tests passed
  • I have updated the documentation accordingly

Make sure to have followed the contributing guidelines before submitting.

@rarajes2 rarajes2 requested a review from a team as a code owner October 31, 2025 05:14
@rarajes2 rarajes2 marked this pull request as draft October 31, 2025 05:15
@rarajes2 rarajes2 added the validated If the pull request is validated for automation. label Nov 5, 2025
@rarajes2 rarajes2 marked this pull request as ready for review November 5, 2025 05:36
@rarajes2 rarajes2 changed the title update call keepalive retry logic fix(calling): update call keepalive retry logic Nov 5, 2025
Copy link
Contributor

@Kesari3008 Kesari3008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked at the test files for now, because there are comments for code changes.

Copy link
Contributor

@adhmenon adhmenon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to test this with the agent desktop?
Just want to ensure it does not break the flow.

Copy link
Contributor

@adhmenon adhmenon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.
But let us test once with agent desktop just to be safe.

expect(emitSpy).toHaveBeenCalledWith(CALL_EVENT_KEYS.DISCONNECT, call.getCorrelationId());
});

it('session refresh 403 ends the call without emitting error', async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

403 case is update here.

Comment on lines -1554 to -1560
setTimeout(() => {
/* We first post the status and then recursively call the handler which
* starts the timer again
*/
this.postStatus();
this.sendCallStateMachineEvt({type: 'E_CALL_ESTABLISHED'});
}, interval * 1000);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setTimeout is not required as it was causing the this.postStatus() call to be made twice - once here and next on invoking the handleCallEstablished through state machine.

Now we are only calling handleCallEstablished through state machine immediately as the handleCallEstablished makes the keepalive call inside the setInterval which adds the required delay and keeps the counter consistent.

Copy link
Contributor

@Kesari3008 Kesari3008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the PR but there are still significant changes needed. Please address these

this.emit(CALL_EVENT_KEYS.DISCONNECT, this.getCorrelationId());
this.callKeepaliveRetryCount = 0;

return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't skip logs uploading for final error cases, this return would cause that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

this.postStatus();
this.sendCallStateMachineEvt({type: 'E_CALL_ESTABLISHED'});
}, interval * 1000);
this.callKeepaliveRetryCount += 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't need to remove the setTimeout. Retry logic was implemented in a way that if it fails, we were immediately trying one more time and schedule the timeout to try it again after specific interval. Even if we wanted to avoid the postStatus being sent twice, we should just remove postStatus and keep setting the state machine to established with a interval timeout to trigger the keepalive retry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we wanted to avoid the postStatus being sent twice, we should just remove postStatus and keep setting the state machine to established with a interval timeout to trigger the keepalive retry

This logic is already there. I have removed the unnecessary setTimeout and postStatus, instead I am directly invoking the logic with the state machine. Please check the new logic. If required we can discuss over a call

this.postStatus();
this.sendCallStateMachineEvt({type: 'E_CALL_ESTABLISHED'});
}, interval * 1000);
this.callKeepaliveRetryCount += 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this logic with the first keepalive retry itself, count will start increasing

Copy link
Contributor Author

@rarajes2 rarajes2 Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only being increased in case of error.

endSpy
);

expect(emitted).toBe(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add check for for abort being false for non-final error cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

expect(retrySpy).not.toHaveBeenCalled();
});

it('503 during keepalive with retry-after does not emit and retries with interval', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should say does not emit what

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to -> 503 during keepalive with retry-after does not invoke emitterCb and retries with interval

logObj.file
);

expect(emitted).toBe(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should be check for accurate emitter callback where we emit error event instead of plain setting it true or false. Or just pass jest.fn() and check for it's invocation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check ensures that the passed emitterCb is being invoked. We don't need to make it complicated.


expect(clearInterval).toHaveBeenCalledTimes(1);
expect(funcSpy).toBeCalledTimes(1);
expect(emitSpy).toHaveBeenCalledWith(CALL_EVENT_KEYS.DISCONNECT, call.getCorrelationId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add expect to check for end API being invoked, media connection getting closed and call getting cleared from callCollection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that make it an integration test? There would be a separate unit test for that already present

call['handleCallEstablished']({} as CallEvent);
}

jest.advanceTimersByTime(DEFAULT_SESSION_TIMER);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should only advance here as per retyr after and not default session timer right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First time invocation of postStatus will only happen after the DEFAULT_SESSION_TIMER

jest.spyOn(call, 'postStatus').mockRejectedValue(errorPayload);

// Manually set the retry count to 3 so it becomes 4 after increment
call['callKeepaliveRetryCount'] = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should let the code run 4 times and count getting increased on its own. And add expect for postStatus being invoked 4 times too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@rarajes2 rarajes2 force-pushed the keepalive-retries-CAI-7258 branch from e2e01ce to e044e2d Compare November 18, 2025 14:47
@rarajes2 rarajes2 enabled auto-merge (squash) November 18, 2025 16:00
@rarajes2 rarajes2 merged commit d9fa606 into webex:next Nov 18, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

validated If the pull request is validated for automation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants