Skip to content

Conversation

gwenaskell
Copy link

@gwenaskell gwenaskell commented Sep 18, 2025

Summary

This PR modifies the internal Splunk HEC source so that it can be extended with a custom set of extractors. It does not change the config or the behavior of the default implementation of the source.

Detailed explanation

  • a trait Extractor was created and added as a generic parameter to SplunkSource
  • the array of DefaultExtractors used in the EventIterator were grouped under a single DefaultExtractor type implementing Extractor and provided as the default type for the splunk source generic (the original DefaultExtractor type was renamed MetaExtractor)
  • this Extractor trait provides a constructor which is called when instantiating the EventIterator, passing a RequestMeta object containing information about the request being processed (splunk token, remote address)
  • the splunk source initialization logic was put out of the SplunkConfig into a listen method so that the SplunkSource can be easily initialized in a different context
  • some methods declared in the SplunkSource impl which did not depend on the SplunkSource struct (options, required_channel, lenient_json_content_type_check) were moved to plain functions to avoid the need for calling them with SplunkSource::<DefaultExtractor>::options()

Vector configuration

How did you test this PR?

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@github-actions github-actions bot added the domain: sources Anything related to the Vector's sources label Sep 18, 2025
@bruceg bruceg changed the title Refactor SplunkSource to support custom extractors chore(splunk_hec source): Refactor to support custom extractors Sep 18, 2025
@bruceg bruceg added source: splunk_hec Anything `splunk_hec` source related no-changelog Changes in this PR do not need user-facing explanations in the release changelog labels Sep 18, 2025
Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve of this approach and left some comments for cleanups.

}

pub struct RequestMeta {
token: Option<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice the type of this changed from Arc<str> to String. What motivated that?

Copy link
Author

@gwenaskell gwenaskell Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, the token is retrieved as an Option<String>. It is converted to an Arc<str> because this is the type expected by set_splunk_hec_token, but now we want to also make it available to the Extractor. I opted to keep the Option<String> type here because there is no explicit reason to use an Arc for that so it sounded more natural to keep a more "basic" type. What do you think?

),
],
token: f.meta.token.clone().map(Into::into),
extractors: E::new(f.meta, f.log_namespace),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be moved into EventIteratorGenerator instead of new being called here? How far up the call chain could that go?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8f0d2d0, lmk if that looks better

/// handle to EventsReceived registry
events_received: Registered<EventsReceived>,
/// Whether to store the HEC token in the log event metadata
store_hec_token: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this flag eventually be expressed as an optional extractor? I'd call it out of scope for this PR but something to think about.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done ✅ 8616be0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: sources Anything related to the Vector's sources no-changelog Changes in this PR do not need user-facing explanations in the release changelog source: splunk_hec Anything `splunk_hec` source related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants