Skip to content

fix(provisioning): Context.App or Context.AppIfConfigured will return (val, nil) even if the app failed to provision or validate the first time #7070

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

alexandre-daubois
Copy link

@alexandre-daubois alexandre-daubois commented Jun 16, 2025

Fix dunglas/symfony-docker#801

When starting FrankenPHP with Mercure and the default prod config, the publisher and subscriber keys are empty. This causes the Mercure app provision to fail, return an error in Provision and be partially started. It is then in an inconsistent state, and Caddy panics (as shown in the stacks trace of the original issue).

By adding this new cleanup, we ensure the app isn't in the apps map anymore. Caddy doesn't panic anymore and the error message returned by Mercure is now successfully displayed in the console.

cc @dunglas

@CLAassistant
Copy link

CLAassistant commented Jun 16, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo, thanks, that's an interesting case. Yeah it's probably best if the failed app is removed...

@mholt mholt enabled auto-merge (squash) June 16, 2025 19:16
@francislavoie
Copy link
Member

francislavoie commented Jun 16, 2025

Hmm. I'm not sure this is a good enough solution. I think what might happen is if the app is referenced by more than one module (e.g. HTTP handler module gets a ref to the app), then it would try to provision the app many times (and probably fail every time), causing a lot of noise. Maybe the context should hold onto a map of failed apps (e.g. ctx.cfg.failedApps or something), skip trying to load that app again if it's in failedApps, and then get rid of failedApps after the whole provision step is done, but before starting the apps.

@mholt mholt disabled auto-merge June 16, 2025 19:29
@mholt
Copy link
Member

mholt commented Jun 16, 2025

@francislavoie Interesting... (I've disabled auto-merge while we figure this out)

Once a module fails to load, it should bubble the error up all the way, and cease trying to provision more modules, yeah? The config will proceed to unload and roll back to its previous state.

@francislavoie
Copy link
Member

I don't understand why Caddy continues to start then 🤔 With a bad config it should just throw it all away, no? Maybe I don't understand how Mercure is being used here.

@alexandre-daubois
Copy link
Author

Maybe I don't understand how Mercure is being used here.

Nothing fancy here actually, Mercure app provision just starts by checking that provided keys are not empty, and it returns an error if they are

@mholt
Copy link
Member

mholt commented Jun 16, 2025

@francislavoie Good question -- the stack trace shows that it's starting the http app, I'll study this a bit more.

I'm not strictly opposed to "noisy" logs though, but I do want to know why a failed module provisioning during config load allows the config to continue loading... I'll dig.

@mholt
Copy link
Member

mholt commented Jun 18, 2025

Okay, I had some time to digest this. I don't fully understand it yet.

Thanks @alexandre-daubois for proposing a fix. Even though I can't explain why, if the mercure module returns an error on Provision(), we would call Start() on any app, I do think it's the right thing to remove the app from the map if it fails to provision.

This does feel like a band-aid (but also proper bookkeeping).

As I don't have Docker to reproduce the issue, can anyone explain how it's possible that we're calling Start() here:

caddy/caddy.go

Lines 431 to 435 in 2f0fc62

// Start
err = func() error {
started := make([]string, 0, len(ctx.cfg.apps))
for name, a := range ctx.cfg.apps {
err := a.Start()

after an error is returned here:

caddy/caddy.go

Lines 401 to 405 in 2f0fc62

ctx, err := provisionContext(newCfg, start)
if err != nil {
globalMetrics.configSuccess.Set(0)
return ctx, err
}

a few lines above?

I pushed a commit to do the cleanup using defer, which is a little more robust and takes care of a failure from Validate() as well. We can probably merge this since it's at least more correct. (Thanks again!)

@mholt mholt added the under review 🧐 Review is pending before merging label Jun 18, 2025
@mholt mholt added this to the v2.10.1 milestone Jun 18, 2025
@mholt
Copy link
Member

mholt commented Jun 19, 2025

@alexandre-daubois -- or anyone else getting the error -- would you be able to confirm that the latest commit still "fixes" the issue?

And if anyone knows why Start() proceeds to be called despite an error being returned, I'd love to be enlightened 😅

@alexandre-daubois
Copy link
Author

I just tested your fix and everything seems fine! Thank you Matt!

@WeidiDeng
Copy link
Member

@mholt I think I know why caddy app starts even though provision failed. It's related to frankenphp.

In this line, frankenphp will try to load http app, and if the load is successful, metrics will be setup.

Let's dig how AppIfConfigured is implemented

func (ctx Context) AppIfConfigured(name string) (any, error) {
	if ctx.cfg == nil {
		return nil, fmt.Errorf("app module %s: %w", name, ErrNotConfigured)
	}
	if app, ok := ctx.cfg.apps[name]; ok {
		return app, nil
	}
	appRaw := ctx.cfg.AppsRaw[name]
	if appRaw == nil {
		return nil, fmt.Errorf("app module %s: %w", name, ErrNotConfigured)
	}
	return ctx.App(name)
}

This line is interesting, let's see in what ways the app will be there.

This app will be populated by App if not found.

func (ctx Context) App(name string) (any, error) {
	if app, ok := ctx.cfg.apps[name]; ok {
		return app, nil
	}
	appRaw := ctx.cfg.AppsRaw[name]
	modVal, err := ctx.LoadModuleByID(name, appRaw)
	if err != nil {
		return nil, fmt.Errorf("loading %s app module: %v", name, err)
	}
	if appRaw != nil {
		ctx.cfg.AppsRaw[name] = nil // allow GC to deallocate
	}
	return modVal, nil
}

It's possible that ctx.LoadModuleByID will fail, but the app is still stored inside ctx.cfg.apps. And subsequent calls will return this app without error.

Imagine frankenphp is provisioned first, and since it doesn't react when there is an error regarding caddy http app. Subsequently, when caddy uses http app, the error is silently ignored, and problems will ensue.

By removing the problematic app from the map, caddy will try to reinitialize this app, and inside caddy core, the error is caught and logged.

@WeidiDeng
Copy link
Member

@alexandre-daubois I have a suggestion, we can record the error for a particular app since when config is reloaded the new config itself should not change theoretically. And just like App will return the caddy app if it's found, we can return the error if it's found, saving the trouble of reinitializing the app again and watch it fail.

@dunglas I think frankenphp should handle the case when caddy http app fails to provision.

@alexandre-daubois
Copy link
Author

I'm having a look at the implementation then I'll investigate on the fix for FrankenPHP. Thanks!

@WeidiDeng WeidiDeng changed the title fix(provisioning): remove app from apps map when its provision failed fix(provisioning): Context.App or Context.AppIfConfigured will return (val, nil) even if the app failed to provision or validate the first time Jul 29, 2025
@WeidiDeng
Copy link
Member

@alexandre-daubois I pushed some commits that simplified the recording of the error, and it will return the error if it's found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
under review 🧐 Review is pending before merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

panic: runtime error: invalid memory address or nil pointer dereference
6 participants