Skip to content

Conversation

@maxmoehl
Copy link
Member

@maxmoehl maxmoehl commented Sep 10, 2025

Copy link
Contributor

@peanball peanball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments and points of discussion.

General:

  • I'm missing the "opt-in" nature and control of the feature and discussion of possible security risks when this feature is enabled overall. The "attacker" in this case is a rouge operator and that might be more of an organizational problem. But the app's operator shouldn't be able to override an org policy, etc.
  • Because this is an analog to bosh pcap, maybe we can explain the functionality and one of its main benefits (multiplexing of traffic captured across multiple instances) a bit earlier? Right now it's in the very last sentence of the very last section (before the references). If it's important it should be repeated.

@maxmoehl
Copy link
Member Author

Maybe a general note on the opt-in: there were no extensive discussions on what that "opt-in" would look like. In my opinion it should just be the existing SSH access and not a dedicated flag. Mainly because having SSH access already puts you in a position of being able to do a lot and we have this setup as a feature flag on the different levels of CF.

@peanball
Copy link
Contributor

peanball commented Sep 10, 2025

Maybe a general note on the opt-in: there were no extensive discussions on what that "opt-in" would look like. In my opinion it should just be the existing SSH access and not a dedicated flag. Mainly because having SSH access already puts you in a position of being able to do a lot and we have this setup as a feature flag on the different levels of CF.

I see your point. That said, capturing network traffic might show data that is not even stored plain-text in a database that you could gain access to via SSH. So maybe something worth discussing more extensively.

For future discussions and votes: My opinion is that network capture is a privilege beyond SSH.

Copy link
Member

@ameowlia ameowlia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Would a user be able to see unencrypted traffic contents?

  2. What roles would be able to use this comment for a given app?

  3. Would this feature always be on, or would there be a foundation wide flag?

  4. Could there be a cf event to log when this action is taken?

Comment on lines 110 to 117
```bash
# Capture HTTP traffic for myapp
cf pcap myapp --interface eth0 --filter "tcp port 80" --snaplen 1500

# Capture specific instance with custom filter
cf pcap myapp --instance 1 --filter "host database.example.com"
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give some examples of what the output would look like?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, please check again.

@peanball
Copy link
Contributor

Some replies to Amelia's questions. @maxmoehl, please correct me if I'm wrong.

  • Would a user be able to see unencrypted traffic contents?

If they capture on the interface lo, they would.

  • What roles would be able to use this comment for a given app?

This needs to be fleshed out in the RFC a bit. I think it depends on the scenario. In a test org or space a group of developers may want to capture stuff, in a production org maybe only select people after prior approval.

  • Would this feature always be on, or would there be a foundation wide flag?

Foundation wide general flag, and ideally additional per org/space/app(?) flag.

  • Could there be a cf event to log when this action is taken?

That is a good idea, for traceability purposes.

@beyhan beyhan added toc rfc CFF community RFC labels Sep 12, 2025
@maxmoehl
Copy link
Member Author

maxmoehl commented Sep 16, 2025

  • What roles would be able to use this comment for a given app?

This needs to be fleshed out in the RFC a bit. I think it depends on the scenario. In a test org or space a group of developers may want to capture stuff, in a production org maybe only select people after prior approval.

My current proposal (though I should maybe clarify this a bit more) does not introduce any additional permissions. As long as a user can SSH into the app they will be able to initiate a capture. That includes capturing plain-text traffic.

  • Would this feature always be on, or would there be a foundation wide flag?

Foundation wide general flag, and ideally additional per org/space/app(?) flag.

This is where it gets tricky. Foundational flag we can somehow make work via diego-release, it would need to control whether the binary is injected or not which makes this complicated as the injected binaries right now come via a BOSH package which is not configurable in any way. I would prefer not to add org/space/app flags beyond the SSH one1.

  • Could there be a cf event to log when this action is taken?

That is a good idea, for traceability purposes.

This comes down to the feature just being SSH with a special binary. There won't be any interaction with the CF API beyond SSH, so no special audit log will be written.

I will also spend some time today addressing the remaining comments.

Footnotes

  1. This also comes down to capacity. If we require a feature flag equivalent to the SSH one on every level I would need someone to help me make this work as it is quite a lot more work.

@maxmoehl maxmoehl marked this pull request as ready for review September 25, 2025 05:51
@beyhan beyhan requested review from a team, Gerg, beyhan, cweibel, rkoster and stephanme and removed request for a team September 29, 2025 06:49
@beyhan beyhan moved this from Inbox to In Progress in CF Community Sep 30, 2025
cweibel
cweibel previously approved these changes Oct 14, 2025
@rkoster
Copy link
Contributor

rkoster commented Oct 21, 2025

I'm concerned about the additional attack surface being introduced. I would highly recommend making this opt-in at the bosh level. As in don't deploy the binary unless a property is being set. For the simple reason that when a CVE is being discovered in the to be created binary an operator can just switch the property to mitigate the security risk.

In general it would be good to call out all the security controls that will be put in place around the to be created binary. For example it should probably be mounted from a read only location, be statically linked to prevent dynamic library injection, etc.

@maxmoehl
Copy link
Member Author

I'm concerned about the additional attack surface being introduced. I would highly recommend making this opt-in at the bosh level. As in don't deploy the binary unless a property is being set. For the simple reason that when a CVE is being discovered in the to be created binary an operator can just switch the property to mitigate the security risk.

Valid point, however, I'm not sure how to best achieve this. Right now the injection of such binaries is achieved via the various *_lifecycle BOSH packages of diego-release which simply can't be parameterized (see also my previous comment). Could someone from the diego side chime in on how we could best achieve this?

In general it would be good to call out all the security controls that will be put in place around the to be created binary. For example it should probably be mounted from a read only location, be statically linked to prevent dynamic library injection, etc.

I'll add that to the RFC: being written in Go it will be statically linked (without CGO), based on how the injection currently works I would copy it into the container instead of mounting it, possibly with a way to prevent that step with a flag.

I also have a small correction: it seems like the container does not have cap_net_admin but it seems like we also don't need it for basic captures like TCP so I will remove that part (thanks @thomasthal for pointing me in the right direction).

@beyhan
Copy link
Member

beyhan commented Nov 4, 2025

We clarified the open points in the comments above. The assumption was that the container is getting more privileges than before which is not the case. That is why the point with making the binary loaded dynamically is not valid any more.

That is why, we decided to start the final comment period during the TOC meeting today with the goal to accept this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rfc CFF community RFC toc

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

6 participants