Conversation

@VickyStash
Contributor

@VickyStash VickyStash commented Oct 28, 2025

Details

This PR updates error handling by:

  • limiting operation retries. Previously, failed operations could retry endlessly; with this PR, the number of retries is capped at 5.
  • preventing data eviction when the error is not related to storage capacity.
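
The two behaviours listed above can be sketched as a small decision function. This is only an illustration, not the code from this PR: `decideRetryAction`, `MAX_RETRY_ATTEMPTS`, and the contents of `STORAGE_ERRORS` are assumptions made for the example; only the limit of 5 and the idea of a storage capacity check come from the description.

```typescript
// Assumed values for illustration; the real list of storage errors lives in the Onyx source.
const STORAGE_ERRORS = ['quotaexceedederror', 'out of memory'];
const MAX_RETRY_ATTEMPTS = 5;

type RetryAction = 'give-up' | 'retry' | 'evict-and-retry';

// True when the error name or message matches one of the known capacity errors.
function isStorageCapacityError(error: Error): boolean {
    const name = error.name.toLowerCase();
    const message = error.message.toLowerCase();
    return STORAGE_ERRORS.some((storageError) => storageError === name || message.indexOf(storageError) !== -1);
}

// retryAttempt counts the retries that have already been made.
function decideRetryAction(error: Error, retryAttempt = 0, hasEvictableKey = true): RetryAction {
    if (retryAttempt >= MAX_RETRY_ATTEMPTS) {
        return 'give-up';
    }
    if (!isStorageCapacityError(error)) {
        // Not a capacity problem: retry without evicting any data.
        return 'retry';
    }
    // Capacity problem: evict only if there is something to evict.
    return hasEvictableKey ? 'evict-and-retry' : 'give-up';
}
```

In this sketch a generic error is retried without touching stored data, while eviction is reserved for capacity errors.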

Related Issues

Expensify/App#73779

Automated Tests

Added to tests/unit/onyxUtilsTest.ts

Manual Tests

  1. Launch the iOS/Android app and open the logs.
  2. Send an expense.
  3. Observe the errors about some failing operations. Because the errors are not related to storage capacity, the data isn't evicted from storage.

Before
image

After

image

Author Checklist

  • I linked the correct issue in the ### Related Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android / native
    • Android / Chrome
    • iOS / native
    • iOS / Safari
    • MacOS / Chrome / Safari
    • MacOS / Desktop
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that the left part of a conditional rendering a React component is a boolean and NOT a string, e.g. myBool && <MyComponent />.
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.js or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If a new component is created I verified that:
    • A similar component doesn't exist in the codebase
    • All props are defined accurately and each prop has a /** comment above it */
    • The file is named correctly
    • The component has a clear name that is non-ambiguous and the purpose of the component can be inferred from the name alone
    • The only data being stored in the state is data necessary for rendering and nothing else
    • If we are not using the full Onyx data that we loaded, I've added the proper selector in order to ensure the component only re-renders when the data it is using changes
    • For Class Components, any internal methods passed to components event handlers are bound to this properly so there are no scoping issues (i.e. for onClick={this.submit} the method this.submit should be bound to this in the constructor)
    • Any internal methods bound to this are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this); if this.submit is never passed to a component event handler like onClick)
    • All JSX used for rendering exists in the render method
    • The component has the minimum amount of code necessary for its purpose, and it is broken down into smaller components in order to separate concerns and functions
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
  • I have checked off every checkbox in the PR author checklist, including those that don't apply to this PR.

Screenshots/Videos

Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari
MacOS: Desktop

Contributor

@chrispader chrispader left a comment

This looks good to me, I just have a few minor comments 🙌🏼

lib/OnyxUtils.ts Outdated
   * - Other errors: retries the operation
   */
- function evictStorageAndRetry<TMethod extends typeof Onyx.set | typeof Onyx.multiSet | typeof Onyx.mergeCollection | typeof Onyx.setCollection>(
+ function retryOperation<TMethod extends typeof Onyx.set | typeof Onyx.multiSet | typeof Onyx.mergeCollection | typeof Onyx.setCollection>(
Contributor

NAB: Do you think it would be feasible to add a type backing TMethod that requires an Onyx method to have an optional retryAttempt parameter in the last position? 👀

const isStorageCapacityError = STORAGE_ERRORS.some((storageError) => storageError === error?.name?.toLowerCase() || errorMessage?.includes(storageError));

if (!isStorageCapacityError) {
// @ts-expect-error No overload matches this call.
Contributor

My comment above was meant to remove the need for this // @ts-expect-error comment.

Contributor Author

This // @ts-expect-error comment isn't caused by the retryAttempt parameter. It exists on main as well, because the Onyx methods have different signatures and can't be overloaded.

// @ts-expect-error No overload matches this call.
return remove(keyForRemoval).then(() => onyxMethod(...args));

I've tried different approaches (adding retryOperation overloads; adding an assertion worked but didn't look much better).

@VickyStash VickyStash changed the title Update evictStorageAndRetry to not evict the data if error isn't storage related Update error handling in Onyx to prevent unnecessary data eviction Oct 30, 2025
const updatePromise = OnyxUtils.broadcastUpdate(key, valueWithoutNestedNullValues, hasChanged);

// If the value has not changed or the key got removed, calling Storage.setItem() would be redundant and a waste of performance, so return early instead.
if (!hasChanged && !retryAttempt) {
Contributor Author

I've updated the condition so that the Storage.setItem operation is not skipped when retryAttempt is defined.
I haven't added any cache-clearing logic, as I'm not sure it's needed:

  • currently the app doesn't roll back the cache when an operation fails (for any operation)
  • the subscribers (useOnyx, for example) are also notified with the updated values regardless of whether the storage operation succeeds
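
The updated early-return condition can be read as a small predicate. A minimal sketch, assuming a hypothetical helper name `shouldSkipStorageWrite`; only the expression `!hasChanged && !retryAttempt` comes from the diff.

```typescript
// Hypothetical helper mirroring the condition from the diff.
// The write is skipped only when the value is unchanged AND this is not a retry.
function shouldSkipStorageWrite(hasChanged: boolean, retryAttempt?: number): boolean {
    // Note: !retryAttempt is true for both undefined and 0,
    // so in this sketch both mean "not a retry".
    return !hasChanged && !retryAttempt;
}
```

On a retry the value already sits in the cache, so hasChanged is false even though the earlier storage write failed, which is why the retry must not be skipped.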

Contributor

I think it makes sense. Could we change the comment above to reflect the new logic?

Contributor

@fabioh8010 fabioh8010 left a comment

Minor things!

@VickyStash is it possible to add some unit tests to cover these retry mechanisms?

@VickyStash
Contributor Author

@VickyStash is it possible to add some unit tests to cover these retry mechanisms?

Sure, it's planned! I've mentioned it here.

@VickyStash
Contributor Author

While writing the tests, I've found some flaws in the eviction mechanism.

We get keyForEviction by checking recentlyAccessedKeys:

/**
 * Finds a key that can be safely evicted
 */
getKeyForEviction(): OnyxKey | undefined {
    for (const key of this.recentlyAccessedKeys) {
        if (!this.evictionBlocklist[key]) {
            return key;
        }
    }
    return undefined;
}

A key in recentlyAccessedKeys can refer to data that has already been removed.

So if we remove a value:

return remove(keyForRemoval).then(() => onyxMethod(...args));

We notify subscribers about that, so the key is added to recentlyAccessedKeys.

This means:

  • We can end up trying to evict a key that already has no data.
  • On retries, we always try to remove the key that was just removed, because it's the latest one in recentlyAccessedKeys.
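
The flaw can be reproduced with a toy model. This is a sketch under assumptions, not Onyx's real cache: `ToyCache`, its array-based `recentlyAccessedKeys`, and the way `remove` marks the key as accessed are hypothetical simplifications of the behaviour described above.

```typescript
// Toy model of the eviction bookkeeping; names and data structures are hypothetical.
class ToyCache {
    storage: Record<string, string> = {};

    recentlyAccessedKeys: string[] = [];

    evictionBlocklist: Record<string, boolean> = {};

    set(key: string, value: string): void {
        this.storage[key] = value;
        this.markAccessed(key);
    }

    remove(key: string): void {
        delete this.storage[key];
        // Subscribers are notified about the removal, which marks the key as accessed again.
        this.markAccessed(key);
    }

    markAccessed(key: string): void {
        if (this.recentlyAccessedKeys.indexOf(key) === -1) {
            this.recentlyAccessedKeys.push(key);
        }
    }

    // Same shape as the method quoted above: first key that is not blocklisted.
    getKeyForEviction(): string | undefined {
        for (const key of this.recentlyAccessedKeys) {
            if (!this.evictionBlocklist[key]) {
                return key;
            }
        }
        return undefined;
    }
}

const toyCache = new ToyCache();
toyCache.set('report_1', 'data');
const firstCandidate = toyCache.getKeyForEviction();
toyCache.remove('report_1');
// The key has no data left, yet it is still offered for eviction.
const secondCandidate = toyCache.getKeyForEviction();
const stillHasData = 'report_1' in toyCache.storage;
```

In this model both candidates are the same key even though its data is gone after the first removal, so a retry would evict nothing useful.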

@VickyStash VickyStash marked this pull request as ready for review October 31, 2025 11:07
@VickyStash VickyStash requested a review from a team as a code owner October 31, 2025 11:07
@melvin-bot melvin-bot bot requested review from lakchote and removed request for a team, chrispader and fabioh8010 October 31, 2025 11:08
Contributor

@fabioh8010 fabioh8010 left a comment

🚀

Collaborator

@tgolen tgolen left a comment

@VickyStash Did you do anything about those eviction problems you found while writing the tests? Is that something we need to fix here in this PR or is that going to be something we look at separately?

<dd><p>If we fail to set or merge we must handle this by
evicting some data from Onyx and then retrying to do
whatever it is we attempted to do.</p>
<dt><a href="#retryOperation">retryOperation()</a></dt>
Collaborator

Did you regenerate these docs after your recent changes? I don't think retryOperation() is exposed anywhere publicly, right?

Contributor Author

@tgolen I've re-generated it one more time.
Yeah, it's not public! This file is API-INTERNAL, which describes internal methods.

@VickyStash
Contributor Author

@VickyStash Did you do anything about those eviction problems you found while writing the tests? Is that something we need to fix here in this PR or is that going to be something we look at separately?

I haven't made any changes for it.

I have an idea for a fix in mind: updating this check to test not only for null, but for undefined as well.

// Add or remove this key from the recentlyAccessedKeys lists
if (value !== null) {
    cache.addLastAccessedKey(key, isCollectionKey(key));
} else {
    cache.removeLastAccessedKey(key);
}

This way we at least won't try to evict what we know is removed.
But I'm not sure if I should complicate this PR even more. Eager to hear what you think!
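
The proposed check could look roughly like the following. It is a sketch only: the idea was not implemented in this PR, and `updateRecentlyAccessedKeys` is a hypothetical standalone helper that models the recently accessed list as a plain array instead of using the real cache methods.

```typescript
// Hypothetical helper: returns a new list with the key added or removed.
function updateRecentlyAccessedKeys(recentlyAccessedKeys: string[], key: string, value: unknown): string[] {
    const withoutKey = recentlyAccessedKeys.filter((existingKey) => existingKey !== key);

    // Proposed check: treat both null and undefined as "the value was removed".
    if (value !== null && value !== undefined) {
        return withoutKey.concat(key);
    }

    return withoutKey;
}
```

With this check a key whose value is undefined is dropped from the list, so it would no longer be picked as an eviction candidate.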

@tgolen
Collaborator

tgolen commented Oct 31, 2025

As long as that issue doesn't prevent this PR from working, I think we should tackle it separately. The best next step would be to make a proposal in the #quality channel in Slack.

const updatePromise = OnyxUtils.scheduleNotifyCollectionSubscribers(collectionKey, mutableCollection, previousCollection);

return Storage.multiSet(keyValuePairs)
.catch((error) => OnyxUtils.retryOperation(error, setCollectionWithRetry, {collectionKey, collection}, retryAttempt))
@abzokhattab abzokhattab Oct 31, 2025

same as #694 (comment)

Contributor Author

@abzokhattab Onyx.setCollection/setCollectionWithRetry doesn't have any options param, right?
So there is nothing extra to pass.

lib/types.ts Outdated
isProcessingCollectionUpdate?: boolean;
};

type OnyxRetryOperation =
Contributor

NAB: Minor request, but OnyxRetryOperation sounds to me as if retrying were the main purpose of these operations. I would name this type something like RetriableOperation or RetriableOnyxOperation.


describe('retryOperation', () => {
it('should retry only one time if the operation is firstly failed and then passed', async () => {
const retryOperationSpy = jest.spyOn(OnyxUtils, 'retryOperation');
Contributor

Can we move this line out of the individual tests and store it globally?

describe('retryOperation', () => {
it('should retry only one time if the operation is firstly failed and then passed', async () => {
const retryOperationSpy = jest.spyOn(OnyxUtils, 'retryOperation');
const genericError = new Error('Generic storage error');
Contributor

Same for this actually


it('should stop retrying after MAX_STORAGE_OPERATION_RETRY_ATTEMPTS retries for failing operation', async () => {
const retryOperationSpy = jest.spyOn(OnyxUtils, 'retryOperation');
const genericError = new Error('Generic storage error');
Contributor

Same here


it('should not retry in case of storage capacity error and no keys to evict', async () => {
const retryOperationSpy = jest.spyOn(OnyxUtils, 'retryOperation');
const quotaError = new Error('out of memory');
Contributor

And here :)

Contributor

@chrispader chrispader left a comment

LGTM, I have minor requests but these are not blockers. Thanks for your work @VickyStash 🙌🏼
