- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.2k
Add get AppContext properties diagnostic IPC command #86610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Tagging subscribers to this area: @tommcdon Issue DetailsThese changes allow diagnostic tools to enumerate the AppContext properties of a running application over the diagnostic communication channel for CoreCLR (I'm not sure if Mono has the AppContext notion and have opted out of implementing it there by returning DS_IPC_E_NOTSUPPORTED; similarly for NativeAOT). The returned payload is exactly the same as the DS_PROCESS_COMMANDID_GET_PROCESS_ENV command. Each property name and value is concatenated together with an equal sign character ( Corresponding diagnostic repo change is dotnet/diagnostics#3901 Partially addresses #83756 cc @dotnet/dotnet-monitor 
 | 
| STATIC_CONTRACT_NOTHROW; | ||
| EP_ASSERT (props_array != NULL); | ||
|  | ||
| return Configuration::EnumerateKnobs([](const LPCWSTR& name, const LPCWSTR& value, void* context) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tapped into the Configuration class for this enumeration since it is the owner of the app config properties starting here and the diagnostic server/command handling is started later via here.
The other statically available places to access this information were possibly from the host or coreclr exports and contracts, but I felt that it was not appropriate to add the enumeration API there for external consumption; also, based on discussion in #83756, there may be some desire to filter the properties before the AppContext gets to report them, which feels like it would naturally fit in at configuration initialization. Finally, I will have a follow up PR that allows modification of these values, which the Configuration class can be more easily made to do than say the host and coreclr properties since those are already "sealed" at this point in time.
| STATIC_CONTRACT_NOTHROW; | ||
| EP_ASSERT (props_array != NULL); | ||
|  | ||
| return Configuration::EnumerateKnobs([](const LPCWSTR& name, const LPCWSTR& value, void* context) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any properties that we might want to omit from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the list for a very basic .NET 8 console app:
- FX_DEPS_FILE
- TRUSTED_PLATFORM_ASSEMBLIES
- NATIVE_DLL_SEARCH_DIRECTORIES
- PLATFORM_RESOURCE_ROOTS
- APP_CONTEXT_BASE_DIRECTORY
- APP_CONTEXT_DEPS_FILES
- PROBING_DIRECTORIES
- RUNTIME_IDENTIFIER
- System.Reflection.Metadata.MetadataUpdater.IsSupported
- HOST_RUNTIME_CONTRACT
- STARTUP_HOOKS
Other identifiers are added to this list when the app contains them in the runtimeOptions:configProperties section in the *.runtimeconfig.json file.
Probably HOST_RUNTIME_CONTRACT can be skipped because its value is a in-memory address to the host runtime contract. Everything else seems to be directory and file paths of various types.
My default position would be to only filter out HOST_RUNTIME_CONTRACT and leave everything else in in the list for reading values. In a future change that may allow modification of values, I can see some more of these being effectively made read-only, but I would have to defer to others on what's desirable to allowed to be read.
@vitek-karas and @elinor-fung, do you have suggestions on what to filter (if any and where to do it) for this initial PR that enables reading the app config values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The total size of all these switches can be in 10's kB. I assume that this event is going to be typically enabled by default. Do you really want to send 10's kB over the diagnostic channel during startup by default?
I would ask a different question. What do you expect that this data is going to be used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The total size of all these switches can be in 10's kB. I assume that this event is going to be typically enabled by default. Do you really want to send 10's kB over the diagnostic channel during startup by default?
There's already an existing diagnostic command for collecting the entire environment block of the target process. I would hazard a guess that it's has similar size characteristics. We haven't seen any significant slowdown during startup nor have we received any customer feedback about it.
The only one that I see right now that is significantly large is TRUSTED_PLATFORM_ASSEMBLIES, which we could probably do without. Maybe we can categorize the data and have those categories as part of the request payload on the diagnostic command in order to limit the response to the interested data.
I would ask a different question. What do you expect that this data is going to be used for?
Originally, we only wanted to get startup hook information, and append our own startup hook to the list which the process is held up during the diagnostics suspension (which occurs just before managed code execution is started); this request is #83756. It was suggested that this be generalized to get / set app context properties. Expanding it to this general concept also allows us to:
- read RUNTIME_IDENTIFIERto determine what type ofICorProfilerCallbackimplementation we should load in the target process, especially differentiating on glibc vs musl-libc implementations.
The following are hypothetical but likely candidates for use:
- read System.StartupHookProvider.IsSupportedand determine if we should bother with the startup hook at all. This would also allow us to automatically tell users if features that depend on the startup hook will work. Other app context switches may be useful; not sure how to determine the full set of "in box" switches though.
- read BUNDLE_PROBE\HOSTPOLICY_EMBEDDED\PINVOKE_OVERRIDEto check if the process is single file; some diagnostic scenarios do not work in single file applications. It would be good to know whether we bother trying to set up for those scenarios or be able to give the user a better indication why the diagnostic scenario didn't work (e.g. "sorry, this feature is not compatible with single file applications").
- read APP_CONTEXT_BASE_DIRECTORYand tell our profiler what that is; this may help in discovering application symbol information that was published with the application so that we can report that back to the .NET Monitor process in order to get SourceLink information. We can use that information in combination with diagnostic artifacts that are collected to help guide users or automated tooling back to real source code.
- (this is a stretch) read / set APP_PATHS/PROBING_DIRECTORIESto get our managed assemblies loadable without having to rely on managed assembly resolving and LoadFrom context.
Those are the obvious ones that I can think of right now; I can probably dig out more scenarios where some of this data may be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re single-file crash dump: Wouldn't it be cleaner if the runtime responded via the IPC channel if the dump could be collected or not? After all, the runtime knows the answer... using hints like single-file is just guessing. I understand that guessing is all we have right now, but if we're introducing new functionality why not design it for the original purpose?
I agree. I'm dropping the "other" bonus usage and won't mention it anymore unless others keep poking at it. I just want startup hooks #83756 and runtime identifiers #74476
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Startup hooks only execute at startup... they are never evaluated again after the first processing of the startup hooks at the very beginning of managed execution
We can extend this contract via diagnostic command. The startup hook instantiation is load assembly + invocation of the well-known entrypoint. Load assembly has built-in synchronization that ensures that given assembly is loaded exactly once. And once the entrypoint is executing, it can do its own synchronization if there is a chance that it can be invoked multiple times by the diagnostic command.
Once you have managed hook running in the process, you can do any inspection you want, with any precision you like. For example, you do not have to depend on guessing to figure out what kind of libc is loaded based on the RID. You can inspect the process from inside to see what kind of libc is in use. You do not need to depend on runtime adding built-in tracing for every piece of information that you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, you do not have to depend on guessing to figure out what kind of libc is loaded based on the RID.
Determining the libc type during diagnostic suspension is not for startup hook usage; it's for determining which variant of our profiler to use.
For loading the profiler, either AddStartupProfiler and AttachProfiler diagnostic commands are invoked, both of which take a full file path to the profiler implementation. The former command is only usable during the suspension and does not evaluate whether the library is loadable or not because the actual load comes later after the runtime has been resumed. There is no feedback on whether the library is loaded or not, so we can't even just try one and fallback to the other; we need to know the RID before the runtime is resumed. For the latter, we could just try one and then the other if the first fails, but that feels like a waste if we already know what the RID is due to needing it for AddStartupProfiler.
You do not need to depend on runtime adding built-in tracing for every piece of information that you need.
I have never asked for that. There is no tracing involved.
To make decisions on what needs to be loaded or enable at diagnostic suspension, we need diagnostic commands to get specific information (such as RID) because some features (such as profilers) require native-specific information and managed code is not executing at that time thus cannot help in a platform agnostic way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need diagnostic commands to get specific information (such as RID) because some features (such as profilers)
RID is not appropriate for this need. For example, what are you going to do when you see RID void-x64?
If you need libc type to load the right profiler flavor, we should introduce diagnostic command that returns the libc type.
Alternatively, you can find out the libc that the process has loaded using /proc file system. Is that an option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need libc type to load the right profiler flavor, we should introduce diagnostic command that returns the libc type.
I logged #74476 specifically for this a 9 months ago. I'll try to get to this if the diagnostics team cannot.
Alternatively, you can find out the libc that the process has loaded using
/procfile system. Is that an option?
No, .NET Monitor is typically not running in the same process namespace in its primary scenario (multi-container environments such as Kubernetes, where .NET Monitor is typically running as a sidecar container and maybe be running as a different user). We can however use that as a fallback mechanism of almost last resort if it happens to be running in the same namespace, which we may do for runtime versions lower than .NET 8.
| 
 I would agree to that only if we are guaranteed that RUNTIME_IDENTIFIER will always be available as one of the app config properties. | 
| 
 Mono supports AppContext, it gets registered here during startup, https://github.com/dotnet/runtime/blob/main/src/mono/mono/metadata/appdomain.c#L836, and kept in two internal arrays with keys and values, but after the properties have been installed into the managed app context, https://github.com/dotnet/runtime/blob/main/src/mono/mono/metadata/appdomain.c#L860, they seemed to be freed, so if we would need to keep them around in order to handle the IPC command without extracting the data from managed side, then we would need to alter the logic to not free the keys/values after they have been installed into the AppContext. The size of the AppContext data could be rather big, so its not optimal to keep double copies in memory just in case it is needed, so maybe a better alternative would be to extract the data from managed side if/when needed by this IPC command. | 
| How does this relate to: runtime/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/RuntimeEventSource.cs Line 129 in ce689d9 
 Do we need both? | 
| 
 That's not helpful in the intended scenario. We need to be able to get/set the startup hook (this PR is only helping with the "get" portion) at the diagnostic startup suspension point, which occurs before managed execution; event sources haven't executed yet at this point. | 
| 
 To be clear the question wasn't whether we can use the existing data. The question was whether we evaluate existing instrumentation for redundancy when adding new ones. | 
These changes allow diagnostic tools to enumerate the AppContext properties of a running application over the diagnostic communication channel for CoreCLR (I'm not sure if Mono has the AppContext notion and have opted out of implementing it there by returning DS_IPC_E_NOTSUPPORTED; similarly for NativeAOT).
The returned payload is exactly the same as the DS_PROCESS_COMMANDID_GET_PROCESS_ENV command. Each property name and value is concatenated together with an equal sign character (
=) and delimited with a null-terminator; this represents a single entry in the AppContext.Corresponding diagnostic repo change is dotnet/diagnostics#3901
Partially addresses #83756
cc @dotnet/dotnet-monitor