Skip to content

Conversation

@yaauie
Copy link
Member

@yaauie yaauie commented Jan 30, 2025

Release notes

  • adds bin/logstash-plugin clean command to allow the plugin manager to prune orphaned dependencies

What does this PR do?

adds bin/logstash-plugin clean command to allow the plugin manager to prune orphaned dependencies

Why is it important/What is the impact to the user?

  • For users who update plugins, providing a way to clean out deactivated dependencies can reduce the disk footprint of the logstash installation
  • For building Logstash artifacts, a clean option allows us to build a true subset of the dependency graph after removing plugins that are not needed in a minimalized artifact

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • [ ] I have added tests that prove my fix is effective or that my feature works There are existing tests around LogStash::Bunder#invoke!(clean: true)

Author's Checklist

  • File follow-up to document after the asciidoc-freeze on main is lifted.

How to test this PR locally

  1. remove a plugin that has dependencies (like logstash-integration-aws)
    bin/logstash-plugin remove logstash-integration-aws
  2. run the plugin manger's clean command
    bin/logstash-plugin clean
  3. Observe that the orphaned plugin and dependencies are cleaned up and removed from vendor

Logs

╭─{ rye@perhaps:~/src/elastic/logstash@main (pluginmanager-clean-command ✘) }
╰─● bin/logstash-plugin remove logstash-integration-aws
Using system java: /Users/rye/.jenv/shims/java
Resolving dependencies......
Resolving dependencies......
Successfully removed logstash-integration-aws
[success (00:00:06)]

╭─{ rye@perhaps:~/src/elastic/logstash@main (pluginmanager-clean-command ✘) }
╰─● bin/logstash-plugin clean
Using system java: /Users/rye/.jenv/shims/java
cleaned inactive plugin logstash-integration-aws-7.1.8 (java)
cleaned inactive dependency aws-eventstream (1.3.0)
cleaned inactive dependency aws-partitions (1.1043.0)
cleaned inactive dependency aws-sdk-cloudfront (1.109.0)
cleaned inactive dependency aws-sdk-cloudwatch (1.109.0)
cleaned inactive dependency aws-sdk-core (3.217.0)
cleaned inactive dependency aws-sdk-kms (1.97.0)
cleaned inactive dependency aws-sdk-resourcegroups (1.77.0)
cleaned inactive dependency aws-sdk-s3 (1.179.0)
cleaned inactive dependency aws-sdk-sns (1.94.0)
cleaned inactive dependency aws-sdk-sqs (1.91.0)
cleaned inactive dependency aws-sigv4 (1.11.0)
cleaned inactive dependency jar-dependencies (0.4.1)
cleaned inactive dependency jmespath (1.6.2)
[success]

Copy link
Member

@donoghuc donoghuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems pretty straight forward! I like partitioning by plugin gem vs supporting gem dep.

> ~~~
> ╭─{ rye@perhaps:~/src/elastic/logstash@main (pluginmanager-clean-command ✘) }
> ╰─● bin/logstash-plugin remove logstash-integration-aws
> Using system java: /Users/rye/.jenv/shims/java
> Resolving dependencies......
> Resolving dependencies......
> Successfully removed logstash-integration-aws
> [success (00:00:06)]
>
> ╭─{ rye@perhaps:~/src/elastic/logstash@main (pluginmanager-clean-command ✘) }
> ╰─● bin/logstash-plugin clean
> Using system java: /Users/rye/.jenv/shims/java
> cleaned inactive plugin logstash-integration-aws-7.1.8 (java)
> cleaned inactive dependency aws-eventstream (1.3.0)
> cleaned inactive dependency aws-partitions (1.1043.0)
> cleaned inactive dependency aws-sdk-cloudfront (1.109.0)
> cleaned inactive dependency aws-sdk-cloudwatch (1.109.0)
> cleaned inactive dependency aws-sdk-core (3.217.0)
> cleaned inactive dependency aws-sdk-kms (1.97.0)
> cleaned inactive dependency aws-sdk-resourcegroups (1.77.0)
> cleaned inactive dependency aws-sdk-s3 (1.179.0)
> cleaned inactive dependency aws-sdk-sns (1.94.0)
> cleaned inactive dependency aws-sdk-sqs (1.91.0)
> cleaned inactive dependency aws-sigv4 (1.11.0)
> cleaned inactive dependency jar-dependencies (0.4.1)
> cleaned inactive dependency jmespath (1.6.2)
> [success]
> ~~~
@yaauie yaauie force-pushed the pluginmanager-clean-command branch from a9896fb to 131e22c Compare February 19, 2025 23:41
@yaauie yaauie requested a review from donoghuc February 20, 2025 00:36
Copy link
Member

@donoghuc donoghuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned about something... When i do a clean build (check out this code) and run ./gradlew --console=plain clean bootstrap assemble installDefaultGems to get a fresh build. Running the clean command a couple times removes some gems we actually need.... Once that happens I get messages about jar-dependencies not being able to be loaded etc. @yaauie Is this a problem with out I'm building/testing? Or is is this an issue we need to dig further in to?

➜  logstash git:(c4196a9676) ✗ bin/logstash-plugin clean
Using system java: /Users/cas/.jenv/shims/java
cleaned inactive dependency public_suffix (6.0.1)
cleaned inactive dependency ruby-maven (3.9.3)
➜  logstash git:(c4196a9676) ✗ bin/logstash-plugin clean
Using system java: /Users/cas/.jenv/shims/java
cleaned inactive dependency jar-dependencies (0.4.1)
➜  logstash git:(c4196a9676) ✗ bin/logstash-plugin clean
Using system java: /Users/cas/.jenv/shims/java

That said I did some additional manual testing with two example gems a and b:

logstash git:(c4196a9676) ✗ cat a.gemspec
Gem::Specification.new do |s|
    s.name = 'a'
    s.version = '0.1.1'
    s.summary = "A depends on B"
    s.authors = ["Test"]
    s.files = [__FILE__]
    s.add_runtime_dependency "winrm", ">= 0.1"
    s.metadata = { "logstash_plugin" => "true" }
  endlogstash git:(c4196a9676) ✗ cat b.gemspec
Gem::Specification.new do |s|
    s.name = 'b'
    s.version = '0.1.1'
    s.summary = "B - no dependencies"
    s.authors = ["Test"]
    s.files = [__FILE__]
    s.add_runtime_dependency "erubi", ">= 0"
    s.metadata = { "logstash_plugin" => "true" }
  end

I did combinations of adding/removing and cleaning and everything I could come up with worked as expected!

@yaauie
Copy link
Member Author

yaauie commented Feb 20, 2025

Once that happens I get messages about jar-dependencies not being able to be loaded etc

That is in fact concerning. We're just proxying to bundler, so I don't see why that dependency should get cleaned. I'll dig in.

@yaauie yaauie requested a review from donoghuc February 22, 2025 06:41
Comment on lines 51 to 59
%w(
full_gem_path
loaded_from
spec_file
cache_file
build_info_file
doc_dir
).map { |attr| spec.public_send(attr) }
.compact
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably reads easier, and avoids the metaprogramming indirection of using Object#public_send:

Suggested change
%w(
full_gem_path
loaded_from
spec_file
cache_file
build_info_file
doc_dir
).map { |attr| spec.public_send(attr) }
.compact
[
spec.full_gem_path,
spec.loaded_from,
spec.spec_file,
spec.cache_file,
spec.build_info_file,
spec.doc_dir,
].compact

@donoghuc
Copy link
Member

I've been questioning the utility of making this a stand alone command. I feel like as a consumer of remove I would expect that to leave the system in a clean state. It seems like we would be always reccomending using a clean right after a remove which makes me think we should just fold them in to the same action. Is there a case where we would want to remove without clean?

$stderr.puts("]")
else
verb = "cleaned"
FileUtils.rm_rf(file_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing files on disk seems like the most severe option. I get that bundle clean was not doing exactly what we want, i would think the next level of granularity would be to collect a list of innactive gems then use rubygems to uninstall each (for example ./vendor/jruby/bin/gem uninstall [orphaned-gem].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, and that is effectively what bundle clean does under the hood.

The trouble is that we are invoking a modified Bundler, which is the same Bundler that is managing the dependencies of the current process, and I don't have confidence that invoking the gem binary from jruby will invoke rubygems as-configured to remove the gems from our vendored location without also having side-effects.

@yaauie
Copy link
Member Author

yaauie commented Feb 25, 2025

I've been questioning the utility of making this a stand alone command. I feel like as a consumer of remove I would expect that to leave the system in a clean state. It seems like we would be always reccomending using a clean right after a remove which makes me think we should just fold them in to the same action. Is there a case where we would want to remove without clean?

I agree, in principle, but I don't know if anyone in the wild has a realistic expectation for the plugins that have been removed to be still present on disk (e.g., airgapped environments where they want to be able to re-install the plugins).

I think this stems from us conflating two closely related things, notably:

  • bundler's add and remove are primarily about adding and removing dependencies from the Gemfile, but add has the side-effect of needing to re-resolve the dependency graph and will install missing gems using rubygems if necessary
  • gem's install and uninstall actually installs/unpacks the gem, or removes the gem's various artifacts

Our bin/logstash-plugin install effectively maps to bundle add (relying on its invocation of bundle install), and our bin/logstash-plugin remove maps to bundle remove. Neither of them work directly with rubygems, but instead rely on our (modified) bundler to do so on its behalf.

Because bin/logstash-plugin is our plugin manager, I would expect it to fully manage the plugins installed state, so I would expect any command that "orphans" dependencies to remove them. This includes the remove, install, and update commands, since we can install a later version, or install a plugin that modifies the required versions of a shared dependency. I'm open to both of these commands cleaning the orphaned gems by default.

NOTE: the clean step was removed intentionally, apparently to work around a bug in update

I had considered adding a --[no-]clean flag to bin/logstash-plugin remove. It looks like the bit underneath bin/logstash-plugin install does have a --clean flag that gets propagated to bundle install, but I am unsure if it has the same risks of borking the dependency graph that directly invoking bundle clean does. There is also the ambiguity of what should happen if someone invokes the plugin manager multiple times and only sometimes tells it to clean (or to skip cleaning, if cleaning is the default).

@donoghuc
Copy link
Member

Correct me if i'm wrong, but it seems like our recent addition of the ability to remove multiple deps in a single invocation of remove did not introduce the ability to orphan dependencies? I wonder if the clean command is required or urgent. Deleting gems manually just seems very risky to me and im wondering if it is worth solving immediately. As you mention we are already in a tricky situation with our use of bundler and writing code to modify our gem environment outside of bundler just seems like we are asking for more trouble.

@robbavey
Copy link
Member

What is the practical effect and risk of not having a clean method -

  • In a limited/controlled environment, where we do not supply the plugin manager to the end user, and we are basically providing an "immutable" version of Logstash?
  • A more standard environment, where we still want to supply a reduced set of plugins, but still have the option of adding/updating plugins that are already present?

@robbavey
Copy link
Member

cc @jsvd

@yaauie yaauie requested a review from donoghuc February 27, 2025 15:19
@elastic-sonarqube
Copy link

Quality Gate passed Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

Gem::Uninstaller.new(gem_spec.name, removal_options.merge(version: gem_spec.version)).uninstall
end
rescue Gem::InstallError => e
fail "Failed to uninstall `#{gem_spec.full_name}`"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add anything from the error object here? e.message? Maybe more at debug level?

Copy link
Member

@donoghuc donoghuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the refactor to use Gem uninistall instead of deleting files out of band. Open question on whether we make this a top level command or just do it after any option that can orphan deps.

@mergify
Copy link
Contributor

mergify bot commented Mar 5, 2025

This pull request does not have a backport label. Could you fix it @yaauie? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • backport-8.x is the label to automatically backport to the 8.x branch.

@mergify
Copy link
Contributor

mergify bot commented Mar 5, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Mar 5, 2025
@donoghuc
Copy link
Member

donoghuc commented Mar 5, 2025

Closing in favor of #17203

@donoghuc donoghuc closed this Mar 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-8.x Automated backport to the 8.x branch with mergify

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants