perf: memoize blobs requests in the request scope #2777
Conversation
…asserting we only check blobs once per request for same key
📊 Package size report: 0.9%↑
🤖 This report was automatically generated by pkg-size-action
…h blobs with common tracing pattern
…-timeouts-on-runtime-v594
src/run/regional-blob-store.cts
Outdated
type BlobLRUCache = LRUCache<string, BlobType | typeof NullValue | Promise<BlobType | null>>

const FETCH_BEFORE_NEXT_PATCHED_IT = Symbol.for('nf-not-patched-fetch')
const IN_MEMORY_CACHE_MAX_SIZE = Symbol.for('nf-in-memory-cache-max-size')
How would a customer make a call on this number? Trial and error?
Should we maybe just use DEFAULT_FALLBACK_MAX_SIZE until there's an observed need to increase this?
Customers wouldn't use this directly. The idea was to honor the actual next.config.js setting (the cacheMaxMemorySize one, https://nextjs.org/docs/app/api-reference/config/next-config-js/incrementalCacheHandlerPath) that I do wire up here -
opennextjs-netlify/src/run/config.ts
Lines 53 to 56 in 41bca90
// honor the in-memory cache size from next.config (either one set by user or Next.js default)
setInMemoryCacheMaxSizeFromNextConfig(
  config.cacheMaxMemorySize ?? config.experimental?.isrMemoryCacheSize,
)
Note we check 2 settings because vercel/next.js#57953 landed in 14.1.0, which moved the experimental option into stable (and renamed it).
This comment might shed a bit more light on a few of the following ones.
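For illustration, a minimal next.config.js using this option might look like the sketch below (illustrative values only; per the Next.js docs, cacheMaxMemorySize is a byte limit and 0 disables the in-memory cache):

// next.config.js - illustrative sketch, not taken from this PR
/** @type {import('next').NextConfig} */
const nextConfig = {
  // upper bound (in bytes) for Next.js' in-memory incremental cache; 0 disables it
  cacheMaxMemorySize: 50 * 1024 * 1024, // 50 MB, the Next.js default
}

module.exports = nextConfig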
src/run/regional-blob-store.cts
Outdated
  : DEFAULT_FALLBACK_MAX_SIZE

extendedGlobalThis[IN_MEMORY_LRU_CACHE] =
  maxSize === 0
It looks like the user could have IN_MEMORY_LRU_CACHE set but IN_MEMORY_CACHE_MAX_SIZE set to 0, which would keep it off. Two dials like this probably warrant a warning message or similar for the user.
All of those extendedGlobalThis properties are meant to be private handling, not public API - users are not meant to mess with them.
globalThis is used here not as a way for the user to set things, but rather in case we end up with a duplicate of this module (like we did prior to #2774).
This is also similar setup that you can spot in few places in Next.js itself (like https://github.com/vercel/next.js/blob/9a5b5ce5cef468a14f5cd2438a6e667dcb7d7996/packages/next/src/server/use-cache/handlers.ts#L14-L26 for example)
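A rough sketch of that globalThis pattern (illustrative; getInMemoryLRUCache and the max value are assumptions, not the exact implementation):

import { LRUCache } from 'lru-cache'

const IN_MEMORY_LRU_CACHE = Symbol.for('nf-in-memory-lru-cache')

// Symbol.for() returns the same symbol for every copy of this module, so even
// if the module gets duplicated by bundling, all copies share one cache instance.
const extendedGlobalThis = globalThis as typeof globalThis & {
  [IN_MEMORY_LRU_CACHE]?: LRUCache<string, object>
}

export function getInMemoryLRUCache() {
  if (typeof extendedGlobalThis[IN_MEMORY_LRU_CACHE] === 'undefined') {
    extendedGlobalThis[IN_MEMORY_LRU_CACHE] = new LRUCache<string, object>({ max: 1000 })
  }
  return extendedGlobalThis[IN_MEMORY_LRU_CACHE]
}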
src/run/regional-blob-store.cts
Outdated
const extendedGlobalThis = globalThis as typeof globalThis & {
  [FETCH_BEFORE_NEXT_PATCHED_IT]?: typeof globalThis.fetch
  [IN_MEMORY_CACHE_MAX_SIZE]?: number
  [IN_MEMORY_LRU_CACHE]?: BlobLRUCache | null
Move noOpInMemoryCache here? Instead of null?
I realized these are types, but the intention of my comment still sort of stands. Can we try to have these be non-nullable if possible?
noOpInMemoryCache doesn't implement the same interface as BlobLRUCache, so it couldn't be used here. To make it non-nullable I would need to create a shim class that implements the LRUCache interface, and that doesn't seem worth writing and maintaining just to get rid of the null case to me?
src/run/regional-blob-store.cts
Outdated
extendedGlobalThis[IN_MEMORY_LRU_CACHE] =
  maxSize === 0
    ? null // if user sets 0 in their config, we should honor that and not use in-memory cache
    : new LRUCache<string, BlobType | typeof NullValue | Promise<BlobType | null>>({
Is LRUCache a lot better than just using {} for example?
LRUCache enforces some limits, so that a lambda that stays warm for a long time doesn't suddenly OOM. The memory/size accounting is not exact (hence the "estimate" naming for the size calculation) and should be thought of more as "ballpark" values.
If we don't use LRUCache we'd need to come up with some other mechanism to manage in-memory pruning, but I don't think that's worth doing given how battle-tested the lru-cache package is (with 200 million downloads a week).
Lastly - the default implementation for self-hosted Next.js also uses LRUCache for its in-memory cache (the size calculation was inspired by how it's used in that default implementation).
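A minimal sketch of a size-bounded LRU cache of the kind described here (the maxSize value and sizeCalculation are illustrative assumptions, not the exact code from the PR):

import { LRUCache } from 'lru-cache'

// illustrative: cap the cache at roughly 50 MB of estimated value size
const cache = new LRUCache<string, string>({
  maxSize: 50 * 1024 * 1024,
  // rough per-entry byte estimate; least-recently-used entries are evicted
  // once the sum of estimated sizes exceeds maxSize
  sizeCalculation: (value) => Buffer.byteLength(value, 'utf8') || 1,
})

cache.set('page:/blog/hello', JSON.stringify({ html: '<html>…</html>' }))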
src/run/regional-blob-store.cts
Outdated
  },
}

const getRequestSpecificInMemoryCache = (): RequestSpecificInMemoryCache => {
This looks like an interface/wrapper around LRUCache, should we pop this in another file? Perhaps in-memory-cache.cts?
Good idea!
I did a "refactor" in cff9f73 that did a few things:
- I created a run/storage directory so we don't pollute run with "implementation details" of storage handling.
- There is a "public" module run/storage/storage.ts that is meant to be used everywhere; other modules in run/storage should not be imported directly outside of the storage directory, to not leak abstractions/implementation details (added an eslint rule for that).
- regional-blob-store.cts is now exactly as it is in the main branch - it was just moved (with adjusted import paths only); it is now not meant to be used directly because it's considered an implementation detail.
- Created request-scoped-in-memory-cache.cts and moved the details of in-memory handling there (this is also an implementation detail not meant to be used directly outside of run/storage).
Hopefully this splits concerns reasonably well, keeping modules rather small and easier to digest, and provides some boundaries between concerns with a standard import/export contract.
src/shared/cache-types.cts
Outdated
  return false
}

const isHtmlBlob = (value: BlobType): value is HtmlBlob => {
This looks like it belongs in another file as it is not a "cache type"
Yeah, this is more of a possible value that is being stored in blobs, but not exactly a cache type. TagManifest is the same way, because it's also not really a cache type (it's not coming from Next.js - it's our custom handling for tracking tag invalidations).
Did some shuffling in ca233ef - I created a sibling module blob-types and moved those non-Next.js-CacheHandler-related types and type guards there. Actually fixing the type guards ( #2777 (comment) ) will be done in a separate commit.
The main question with this move for me was estimateBlobSize and whether it should live in this new module or in the in-memory module. I decided to move it to the in-memory module because size is only a concern/detail of the in-memory cache and not shared anywhere else (at least not right now). Maybe it could have its own module, but to me over-modularizing makes things less readable when you have to jump too much between different modules 🤷
src/shared/cache-types.cts
Outdated
  return false
}

export const estimateBlobSize = (valueToStore: BlobType | null | Promise<unknown>): number => {
Same here, this is not a "cache type" so I think we should put this in a different file
See #2777 (comment)
  return tracer
}

export function recordWarning(warning: Error, span?: Span) {
getCacheKeySpan looked to be non-nullable, can we make span non-nullable here as well?
opennextjs-netlify/src/shared/cache-types.cts
Lines 210 to 214 in 41bca90
recordWarning(
  new Error(
    `Blob size calculation did fallback to JSON.stringify. Kind: KnownKindFailed: ${knownKindFailed}, ${valueToStore.value?.kind ?? 'undefined'}`,
  ),
)
The code linked above doesn't have a span to pass, so that's why I made it optional.
Generally we prefer to pass the span explicitly instead of getting the "active span", because it's more predictable which span things get recorded on, but it's better to record at all than to skip recording because we don't have a reference to a span.
An alternative could be to make the code linked above get the active span and then make span here non-nullable, but I wanted to simplify its usage; I'd be fine making that change though (less "magic").
I did attempt the alternative, but to me it moved too much tracing-related logic and checks into a module that doesn't already have a reference to the containing span I mentioned above. The concern of finding the active span seems to belong much more in the tracer.cts module than in modules that might want to record warnings, so I'd prefer to keep the signature of this function as it is.
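A sketch of the fallback behaviour being described, using the standard OpenTelemetry API (illustrative; the actual recordWarning in tracer.cts may record the warning differently):

import { trace, type Span } from '@opentelemetry/api'

// if the caller has a span, record on it; otherwise fall back to the currently
// active span so the warning isn't silently dropped
export function recordWarning(warning: Error, span?: Span) {
  const spanToRecordOn = span ?? trace.getActiveSpan()
  if (!spanToRecordOn) {
    return
  }
  spanToRecordOn.addEvent('warning', {
    'warning.name': warning.name,
    'warning.message': warning.message,
  })
}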
…1629-intermittent-proxy-timeouts-on-runtime-v594
…pes to blob-types module
…-timeouts-on-runtime-v594
{
  // only */storage/storage.cjs is allowed to be imported
  // rest are implementation details that should not be used directly
  group: ['*/storage/*', '!*/storage/storage.cjs'],
Should we use storage/index.cjs?
I personally prefer not to, because in my experience using index modules makes navigating the code base harder once you have multiple modules with the same name in different directories, and having multiple index tabs open in an IDE is hellish, so I try to avoid it.
A similar example (just that instead of an index module, it's Next's app router pattern of having page.js/tsx): [screenshot omitted]
I can appreciate that storage/storage.cjs looks pretty weird, but I think it's better than the added mental load of tracking multiple index modules. Additionally, with IDEs auto-adding imports (like https://code.visualstudio.com/docs/languages/javascript#_auto-imports ) it's a bit "out of mind", as nowadays you usually don't even write your import statements manually.
A compromise could be to have an index that just re-exports and doesn't actually have any logic inside, but then it just pollutes the directory tree IMO.
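For reference, the restriction discussed in this thread maps onto ESLint's built-in no-restricted-imports rule; a sketch of such a rule in flat-config form (the repo's actual config may differ in file layout and messaging):

export default [
  {
    rules: {
      'no-restricted-imports': [
        'error',
        {
          patterns: [
            {
              // only */storage/storage.cjs is allowed to be imported;
              // everything else in storage/ is an implementation detail
              group: ['*/storage/*', '!*/storage/storage.cjs'],
              message: 'Import from storage/storage.cjs instead of storage internals.',
            },
          ],
        },
      ],
    },
  },
]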
} from '../headers.js'
import { setFetchBeforeNextPatchedIt } from '../regional-blob-store.cjs'
import { nextResponseProxy } from '../revalidate.js'
import { setFetchBeforeNextPatchedIt } from '../storage/storage.cjs'
Nothing to change here, just reading notes - both regional-blob-store and storage are unexpected places for me to import setFetchBeforeNextPatchedIt from, but I don't have a suggestion for an alternative at this time.
The only real alternative would be to not call this function (and just remove it altogether) and instead directly store fetch on globalThis to be consumed later.
const IN_MEMORY_LRU_CACHE = Symbol.for('nf-in-memory-lru-cache')
const extendedGlobalThis = globalThis as typeof globalThis & {
  [IN_MEMORY_CACHE_MAX_SIZE]?: number
  [IN_MEMORY_LRU_CACHE]?: BlobLRUCache | null
Noting for myself that we're actually using null and undefined as distinct values:
- undefined means the BlobLRUCache was not initialized yet
- null means the BlobLRUCache was initialized but resulted in no value
}

interface RequestScopedInMemoryCache {
  get(key: string): BlobType | null | Promise<BlobType | null> | undefined
Noting for myself that we're actually using null and undefined as distinct values:
- null means there was a NullValue in the cache/blob store
- undefined means there was no value found, the RequestContext doesn't exist, or LRUCache is not specified
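A sketch of how the read side could produce that distinction (illustrative; a plain Map stands in for the LRU cache, and NullValue mirrors the symbol used elsewhere in this PR):

const NullValue = Symbol.for('null-value')
type BlobType = { kind: string } // simplified placeholder for the real union type

function get(
  cache: Map<string, BlobType | typeof NullValue> | undefined,
  key: string,
): BlobType | null | undefined {
  // no request context / no in-memory cache configured: undefined
  if (!cache) return undefined

  const value = cache.get(key)
  // key was never stored: undefined
  if (typeof value === 'undefined') return undefined

  // a stored "null blob" round-trips back to null for callers
  return value === NullValue ? null : value
}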
tags.map(async (tag) => {
  try {
    await this.blobStore.setJSON(await this.encodeBlobKey(tag), data)
    await this.cacheStore.set(tag, data, 'tagManifest.set')
Just noting that this will add tracing (I believe that's a good thing)
Yes, I wanted to use this opportunity to capture all blob operations for debugging purposes.
const file = (await store.get(await encodeBlobKey(relPath), {
  type: 'json',
})) as HtmlBlob | null
const file = await cacheStore.get<HtmlBlob>(relPath, 'staticHtml.get')
Again noting that this will add tracing (again I think it's a good thing)
const file = (await store.get(await encodeBlobKey(relPath), {
  type: 'json',
})) as HtmlBlob | null
const file = await cacheStore.get<HtmlBlob>(relPath, 'staticHtml.get')
It's probably automatic, but I'm just calling it out in case - I see blobStore.set and tagManifest.set but I'm not seeing staticHtml.set anywhere.
We upload those only at build time:
opennextjs-netlify/src/build/content/static.ts
Lines 42 to 46 in b5754f4
await writeFile(
  join(destDir, await encodeBlobKey(path)),
  JSON.stringify({ html, isFallback } satisfies HtmlBlob),
  'utf-8',
)
@netlify/cli or buildbot uploads those blobs after the build command - we don't create build-time traces for that.
We don't add/update those anymore once deployed
// TODO: use metadata for this
lastModified = await tracer.withActiveSpan(
const cacheStore = getMemoizedKeyValueStoreBackedByRegionalBlobStore({ consistency: 'strong' })
Naming note - perhaps getMemoizedRegionalBlobStore or getRegionalBlobService?
inMemoryCache.set(key, getPromise)
return getPromise
},
async set(key: string, value: BlobType, otelSpanTitle: string) {
Not sure if this works, but maybe this?
async set<T extends BlobType>(key: string, value: T, otelSpanTitle: string) {
My idea here is to have as similar an interface between get/set as we can
I only added the generic to .get to save a few characters, as currently any time we use blobStore.get we have to do something like this:
opennextjs-netlify/src/run/handlers/cache.cts
Line 256 in b5754f4
})) as NetlifyCacheHandlerValue | null
so it just avoids adding the "as ... | null" cast in all those places while preserving the same type safety.
I don't think there needs to be parity between .get and .set just for parity's sake.
That said, adding a generic to .set is fine, and we could lock it down to the concrete types available here:
export type BlobType = NetlifyCacheHandlerValue | TagManifest | HtmlBlob
(for example preventing setting a TagManifest when a given code path should generally set a NetlifyCacheHandlerValue), so if you feel up for it - you can go ahead and make that change.
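A sketch of the signatures being discussed, with simplified placeholder types (illustrative; the real NetlifyCacheHandlerValue, TagManifest and HtmlBlob definitions live in the shared types module):

// simplified placeholders for the real blob types
type NetlifyCacheHandlerValue = { lastModified?: number; value?: { kind: string } }
type TagManifest = { revalidatedAt: number }
type HtmlBlob = { html: string; isFallback: boolean }

export type BlobType = NetlifyCacheHandlerValue | TagManifest | HtmlBlob

interface MemoizedKeyValueStore {
  // the generic narrows the return type at the call site and avoids
  // casts like `(await store.get(...)) as NetlifyCacheHandlerValue | null`
  get<T extends BlobType = BlobType>(key: string, otelSpanTitle: string): Promise<T | null>
  // a generic on set lets callers pin the value to one concrete member of the
  // union, e.g. store.set<NetlifyCacheHandlerValue>(key, value, 'cache.set')
  set<T extends BlobType>(key: string, value: T, otelSpanTitle: string): Promise<void>
}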
  return await encodeBlobKeyImpl(key)
}

export const getMemoizedKeyValueStoreBackedByRegionalBlobStore = (
I'm hopeful that the 3 implementations (NetlifyCacheHandlerValue | TagManifest | HtmlBlob) of this don't diverge too much. If we find ourselves adding any if statements in here for each of the types, the responsible thing to do quickly becomes splitting off.
If we can move size estimation out of the storage directory, we could add a similar eslint rule to the one I added for "private storage modules": disallow importing anything other than BlobType from the module that exports it within that directory. That would enforce that storage doesn't care about the details of individual types and only uses the allowed union of types.
// lru-cache types don't like using `null` for values, so we use a symbol to represent it and do conversion
// so it doesn't leak outside
const NullValue = Symbol.for('null-value')
Is there a case today where we save null in the blob store?
Answer - 404 pages in certain scenarios
…-timeouts-on-runtime-v594
Description
If a site is doing multiple fetches with data caching enabled, or is calling a method wrapped in unstable_cache that results in the same cache key multiple times while rendering a single page, we currently call blobStore.get repeatedly for each of those repeated uses.
This PR adds memoization of those calls scoped to the concrete request. We don't want to memoize them globally due to the distributed nature of serverless, with potentially many instances running at the same time, each potentially holding different state, but we should be able to safely memoize blob calls while handling the same request. This also allows reading your own writes within the scope of handling the same request.
This PR also replaces the similar handling we already had for tag manifests with a more generic one shared for all blob operations.
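A condensed sketch of the per-request memoization described above (names like memoizedGet and RequestScopedCache are illustrative assumptions; the real implementation is keyed off the request context and bounded by the in-memory LRU cache discussed in the review threads):

type BlobValue = { kind: string } // simplified placeholder for the real blob type

interface RequestScopedCache {
  cache: Map<string, BlobValue | Promise<BlobValue | null>>
}

async function memoizedGet(
  requestContext: RequestScopedCache,
  key: string,
  fetchBlob: (key: string) => Promise<BlobValue | null>,
): Promise<BlobValue | null> {
  const cached = requestContext.cache.get(key)
  if (typeof cached !== 'undefined') {
    // repeated lookups for the same key during this request reuse the first call,
    // awaiting the in-flight promise if it hasn't resolved yet
    return cached
  }

  const getPromise = fetchBlob(key)
  // store the promise immediately so concurrent lookups share one blobStore.get call
  requestContext.cache.set(key, getPromise)
  return getPromise
}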
Tests
opennextjs-netlify/tests/integration/revalidate-path.test.ts
Lines 77 to 81 in 8675b3e
Relevant links (GitHub issues, etc.) or a picture of cute animal
https://linear.app/netlify/issue/FRB-1629/intermittent-proxy-timeouts-on-runtime-v594