-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat(codecs): Add syslog encoder #23777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Original commit from syedriko
This is only a temporary change to make the diffs for future commits easier to follow.
- Introduce a `Pri` struct with fields for severity and facility as enum values. - `Pri` uses `strum` crate to parse string values into their appropriate enum variant. - Handles the responsibility of encoding the two enum values ordinal values into the `PRIVAL` value for the encoder. - As `Facility` and `Severity` enums better represent their ordinal mapping directly - The `Fixed` + `Field` subtyping with custom deserializer isn't necessary. Parsing a string that represents the enum by name or its ordinal representation is much simpler. - Likewise this removes the need for the get methods as the enum can provide both the `String` or `u8` representation as needed.
`SyslogSerializer::encode()` has been simplified. - Only matching `Event::Log` is relevant, an `if let` bind instead of `match` helps remove a redundant level of nesting. - This method only focuses on boilerplate now, delegating the rest to `ConfigDecanter` (_adapt `LogEvent` + encoder config_) and `SyslogMessage` (_encode into syslog message string_). - This removes some complexity during actual encoding logic, which should only be concerned about directly encoding from one representation to another, not complimentary features related to Vector config or it's type system. The new `ConfigDecanter` is where many of the original helper methods that were used by `SyslogSerializer::encode()` now reside. This change better communicates the scope of their usage. - Any interaction with `LogEvent` is now contained within the methods of this new struct. Likewise for the consumption of the encoder configuration (instead of queries to config throughout encoding). - The `decant_config()` method better illustrates an overview of the data we're encoding and where that's being sourced from via the new `SyslogMessage` struct, which splits off the actual encoding responsibility (see next commit).
`SyslogSerializerConfig` has been simplified. - Facility / Severity deserializer methods aren't needed, as per their prior refactor with `strum`. - The `app_name` default is set via `decant_config()` when not configured explicitly. - The other two fields calling a `default_nil_value()` method instead use an option value which encodes `None` into the expected `-` value. - Everything else does not need a serde attribute to apply a default, the `Default` trait on the struct is sufficient. - `trim_prefix` was removed as it didn't seem relevant. `tag` was also removed as it's represented by several subfields in RFC 5424 which RFC 3164 can also use. `SyslogMessage::encode()` refactors the original PR encoding logic: - Syslog Header fields focused, the PRI and final message value have already been prepared prior. They are only referenced at the end of `encode()` to combine into the final string output. - While less efficient than `push_str()`, each match variant has a clear structure returned via the array `join(" ")` which minimizes the noise of `SP` from the original PR. Value preparation prior to this is clear and better documented. - `Tag` is a child struct to keep the main logic easy to grok. `StructuredData` is a similar case.
No changes beyond relocating the code into a single file.
- Drop notes referring to original PR differences + StructuredData adaption references. None of it should be relevant going forward. - Revise some other notes. - Drop `add_log_source` method (introduced from the original PR author) in favor of using `StructuredData` support instead.
This should be simple and lightweight enough to justify for the DRY benefit? This way the method doesn't need to be duplicated redundantly. That was required because there is no trait for `FromRepr` provided via `strum`. That would require a similar amount of lines for the small duplication here. The `akin` macro duplicates the `impl` block for each value in the `&enums` array.
- `ConfigDecanter::get_message()` replaces the fallback method in favor of `to_string_lossy()` (a dedicated equivalent for converting `Value` type to a String type (_technically it is a CoW str, hence the follow-up with `to_string()`_)). - This also encodes the value better, especially for the default `log_namespace: false` as the message value (when `String`) is not quote wrapped, which matches the behaviour of the `text` encoder output. - Additionally uses the `LogEvent` method `get_message()` directly from `lib/vector-core/src/event /log_event.rs`. This can better retrieve the log message regardless of the `log_namespace` setting. - Encoding of RFC 5424 fields has changed to inline the `version` constant directly, instead of via a redundant variable. If there's ever multiple versions that need to be supported, it could be addressed then. - The RFC 5424 timestamp has a max precision of microseconds, thus this should be rounded and `AutoSi` can be used (_or `Micros` if it should have fixed padding instead of truncating trailing `000`_).
- The original PR author appears to have relied on a hard-coded timestamp key here. - `DateTime<Local>` would render the timestamp field with the local timezone offset, but other than that `DateTime<Utc>` would seem more consistent with usage in Vector, especially since any original TZ context is lost by this point? - Notes adjusted accordingly, with added TODO query for each encoding mode to potentially support configurable timezone.
- Move encoder config settings under a single `syslog` config field. This better mirrors configuration options for existing encoders like Avro and CSV. - `ConfigDecanter::value_by_key()` appears to accomplish roughly the same as the existing helper method `to_string_lossy()`. Prefer that instead. This also makes the `StructuredData` helper `value_to_string()` redundant too at a glance? - Added some reference for the priority value `PRIVAL`. - `Pri::from_str_variants()` uses the existing defaults for fallback, communicate that more clearly. Contextual note is no longer useful, removed.
To better communicate the allowed values, these two config fields can change from the `String` type to their appropriate enum type. - This relies on serde to deserialize the config value to the enum which adds a bit more noise to grok. - It does make `Pri::from_str_variants()` redundant, while the `into_variant()` methods are refactored to `deserialize()` with a proper error message emitted to match the what serde would normally emit for failed enum variant deserialization. - A drawback of this change is that these two config fields lost the ability to reference a different value path in the `LogEvent`. That'll be addressed in a future commit.
In a YAML config a string can optionally be wrapped with quotes, while a number that isn't quote wrapped will be treated as a number type. The current support was only for string numbers, this change now supports flexibility for config using ordinal values in YAML regardless of quote usage. The previous `Self::into_variant(&s)` logic could have been used instead of bringing in `serde-aux`, but the external helper attribute approach seems easier to grok/follow as the intermediary container still seems required for a terse implementation. The match statement uses a reference (_which requires a deref for `from_repr`_) to appease the borrow checker for the later borrow needed by `value` in the error message.
This seems redundant given the context? Mostly adds unnecessary noise. Could probably `impl Configurable` or similar to try workaround the requirement. The metadata description could generate the variant list similar to how it's been handled for error message handling?
Not sure if this is worthwhile, but it adopts error message convention elsewhere I've seen by managing them via Snafu.
Signed-off-by: Vitalii Parfonov <[email protected]>
…rity dynamic, payload_key optional Signed-off-by: Vitalii Parfonov <[email protected]>
Signed-off-by: Vitalii Parfonov <[email protected]>
Signed-off-by: Vitalii Parfonov <[email protected]>
Signed-off-by: Vitalii Parfonov <[email protected]>
Signed-off-by: Vitalii Parfonov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution. Since there are no documentation updates in this PR, there's nothing for me to review, so I'm approving on behalf of the Docs Team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @vparfonov, we will do a more thorough review shortly.
chrono.workspace = true | ||
csv-core = { version = "0.1.12", default-features = false } | ||
derivative.workspace = true | ||
derivative = { version = "2", default-features = false } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use workspace dependencies if they are used in >1 places.
snafu.workspace = true | ||
syslog_loose = { version = "0.23", default-features = false, optional = true } | ||
strum = { version = "0.26", features = ["derive"], optional = true } | ||
syslog_loose = { version = "0.22", default-features = false, optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you intentionally downgrade here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, i think it was mistake i will fix
TL;DR: It looks like this PR is trying to mix in an additional feature for the encoder config to support referencing My feedback below is regarding changes related to the TLDR concern raised above, when comparing to my stale PR and it's review feedback. EDIT: The concern appears to have been raised internally, so further review should wait until the refactor is pushed. I can also update my PR if that'd help with changes I note below. Regex for RFC 3164 tag encoding
Could you please explain when the I assume this is specifically for the use-case where you want the static encoder config to additionally support referencing a different You have this as:
Compared to my implementation:
NOTE: I don't use the convenience of From the looks of it with your regex addition, you're adding support for when As this doesn't actually seem like a valid requirement for the encoder to support (see reference links in next section for both RFCs on this encoding step), I would discourage it and defer to correction of the input before it is processed by the encoder. Serde alias
|
Thank you for the thorough feedback @polarathene. Some general comments from me to help move things forward:
|
I normally do :) However I was mostly focused on the PR description statements and comparing to my own equivalent snippets from my earlier PR. Along with the understanding that the current PR was going to see some notable refactoring... it was easier for me to structure it as I did for my own reference. As such it was more of a discussion with the PR authors decisions, to get on the same page rather than review the existing PR. IIRC when reviewing inline on proposed changes, it doesn't always convey the scope of lines the comment is for (only highlights the last one and a few above it), so without a suggested change proposed, it was clearer for me to use as reference for comparison, especially after any rework was done, I'm not particularly interested in that extra leg work to dig up the old context intended 😅
I was more curious about the deviation from what I already had there. |
Summary
This PR this is a continuation of work on the PR started here: #21307
Changes:
DynamicOrStatic<T>
enum was introduced to allowfacility
andseverity
to be configured in multiple ways:serde deserializers
were implemented to handle this complex, multi-format input, providing clear error messages for invalid values.except_fields
, this allows users to remove specified fields from the LogEvent before it is processed, which is useful for stripping internal or sensitive data before it's included in the final payload.#[serde(alias = "tag")]
to theapp_name
field, allowing users to use the more familiar tag option for RFC3164 configurations.Vector configuration
How did you test this PR?
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changelog
label to this PR.References
Notes
@vectordotdev/vector
to reach out to us regarding this PR.pre-push
hook, please see this template.make fmt
make check-clippy
(if there are failures it's possible some of them can be fixed withmake clippy-fix
)make test
git merge origin master
andgit push
.Cargo.lock
), pleaserun
make build-licenses
to regenerate the license inventory and commit the changes (if any). More details here.