Skip to content

Conversation

@brynblack
Copy link

@brynblack brynblack commented Feb 25, 2024

Description

This PR adds a module to Polykey-CLI that allows for the automated activation of the Polykey Agent through a systemd service, both in user-space and system-wide space.

Two module configurations are provided, both a programs module and a services module. Fundamentally, both have the same configuration options. The main difference here is that the programs module gives each user their own unit service that can be enabled on a per-user basis, while the system service can be managed by a root user, or members of the group polykey (a lot like Docker daemon).

At the moment, the module will require the user to provide a path to the password/recovery-code files, and will otherwise fail if not provided.

In the future by default, the system level service will read from a directory under /var/lib/polykey for the relevant password and recovery code files, and under the user-level service, will read from ~/.config/polykey. These paths may be overwritten in the module configuration.

feature-module Spec excalidraw

Issues Fixed

Tasks

  • 1. Implement the service module
  • 2. Implement the programs module
  • 3. Add minor finishing touches
  • 3. Test module and expected behaviour

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@brynblack brynblack self-assigned this Feb 25, 2024
@brynblack brynblack linked an issue Feb 28, 2024 that may be closed by this pull request
@tegefaulkes
Copy link
Contributor

tegefaulkes commented Mar 1, 2024

This PR isn't very clear. This is related to the service configuration using nix? Should we call it nix service configuration? Also, if there are no issues directly fixed by this then just remove that section.

Also, do we have a specification for this anywhere? is #143 related?

@CMCDragonkai
Copy link
Member

@brynblack please make sure you got an issue in nixpkgs-matrix - that should be an epic that governs all the relevant issues of moving into module style. This PR should then be part of that - possibly Related or a Fix depending of scope. Then spec out implementation wise - and important to tell people how to use it if there's a special flag. And if we need special flag that's not quite good, it's better if things "work by default".

@CMCDragonkai CMCDragonkai requested a review from amydevs March 1, 2024 23:11
@tegefaulkes
Copy link
Contributor

Ok, so after setting up the password file and restarting the service it just works.

❯ systemctl status polykey
● polykey.service - Polykey Agent
     Loaded: loaded (/etc/systemd/system/polykey.service; enabled; preset: enabled)
     Active: active (running) since Wed 2024-03-06 15:53:59 AEDT; 14s ago
   Main PID: 23832 (polykey-agent)
         IP: 29.7K in, 44.2K out
         IO: 23.2M read, 0B written
      Tasks: 78 (limit: 38078)
     Memory: 580.9M (peak: 701.9M)
        CPU: 6.707s
     CGroup: /system.slice/polykey.service
             └─23832 polykey-agent

Running agent status works too. so its using the correct default path.

❯ sudo npm run polykey -- agent status
[sudo] password for brian: 

> [email protected] polykey
> ts-node src/polykey.ts agent status

✔ Please enter the password … ************
status                  LIVE
pid                     23832
nodeId                  vu5gloq1a1abnf07q47vl74seln1haf5leeeck8dah58d198mf750
clientHost              ::1
clientPort              43095
agentHost               ::
agentPort               43627
upTime                  177
connectionsActive       2
nodesTotal              2
version                 0.1.8
sourceVersion           1.2.1-alpha.47
stateVersion            1
networkVersion          1
commitHash              2bcebd05046bbae86a7b9da67e22613720952627
libVersion              1.2.1-alpha.47-1-1
libSourceVersion        1.2.1-alpha.47
libStateVersion         1
libNetworkVersion       1

Note, that the service is running in root, so I have to sodo the command to have it get the correct default path. That's not ideal right now.

So there are a few things I'd like to point out.

  1. It seems simple enough, the service is added be default on the new platforms changes. But it doesn't run properly without providing a password file in some magic path. So I need to create a pass file in /run/keys/polykey before the service can even run. Ideally we have a safe way of defining secrets and then using config parameters to use them. Is this possible at all?
  2. The service itself is running as root. No path is specified so it defaults to ~/.local/share/polykey. For the root user this is /root/.local/share/polykey. From a usability standpoint if we wanted to use this node we'd have to specify nodeid, ip and port directly OR run the command as root with sudo polykey agent start. Ideally we'd be able to seamlessly access this node without using root or specifying the path/connection info. Maybe run the service as a user we configure?
  3. Can we add a way to configure flags when setting up the service? For example, can we override the -np --node-path that is used when running? Specify the recovery code or --fresh and the like?
  4. Can we configure the user and group the service runs as? I think it's best practice to not run a service as root if at all possible.
  5. Can we configure multiple Polykey services at once? I know it's possible to have multiple of the same service running under different names and configurations. Things like VPNs need this to run multiple networks with the same program.

@tegefaulkes
Copy link
Contributor

Not a problem with the service per se, but the agent still has issues with connections dropping out.

Mar 07 09:35:59 matrix-precision-3480-syzygy polykey[44478]: WARN:polykey.PolykeyAgent:Moving Task v0pf803kstlo0180keei7clko5c from Active to Queued
Mar 07 09:36:01 matrix-precision-3480-syzygy polykey[44478]: (node:44478) [DEP0112] DeprecationWarning: Socket.prototype._handle is deprecated
Mar 07 09:36:01 matrix-precision-3480-syzygy polykey[44478]: (Use `node --trace-deprecation ...` to show where the warning was created)
Mar 07 09:36:01 matrix-precision-3480-syzygy polykey[44478]: WARN:polykey.PolykeyAgent.NodeManager:Duplicate refreshBucket task was found for bucket 255, cancelling
Mar 07 09:36:01 matrix-precision-3480-syzygy polykey[44478]: WARN:polykey.PolykeyAgent.NodeManager:Duplicate refreshBucket task was found for bucket 254, cancelling
Mar 07 09:36:01 matrix-precision-3480-syzygy polykey[44478]: pid               44478
Mar 07 09:36:01 matrix-precision-3480-syzygy polykey[44478]: nodeId            vu5gloq1a1abnf07q47vl74seln1haf5leeeck8dah58d198mf750
Mar 07 09:36:01 matrix-precision-3480-syzygy polykey[44478]: clientHost        ::1
Mar 07 09:36:01 matrix-precision-3480-syzygy polykey[44478]: clientPort        41049
Mar 07 09:36:01 matrix-precision-3480-syzygy polykey[44478]: agentHost         ::
Mar 07 09:36:01 matrix-precision-3480-syzygy polykey[44478]: agentPort         36307
Mar 07 09:40:12 matrix-precision-3480-syzygy polykey[44478]: ErrorQUICClientInternal: Failed to send data on the QUICSocket
Mar 07 09:40:12 matrix-precision-3480-syzygy systemd[1]: polykey.service: Main process exited, code=exited, status=70/SOFTWARE
Mar 07 09:40:12 matrix-precision-3480-syzygy systemd[1]: polykey.service: Failed with result 'exit-code'.
Mar 07 09:40:12 matrix-precision-3480-syzygy systemd[1]: polykey.service: Consumed 35.082s CPU time, 315.7M memory peak, 0B memory swap peak, no IO, received 154.9K IP traffic, sent 270.3K IP traffic.

I can trigger this by disabling wifi on the laptop.

I thought I had fixed this in quic where send failures are thrown back to the connection making the send. The error codes it handles this way is very focused so it may be a new error code which is getting bubbled up to an internal error in stead. I'll need to look into this.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Mar 6, 2024 via email

@brynblack brynblack force-pushed the feature-module branch 21 times, most recently from 1a1a255 to 638d394 Compare March 13, 2024 06:29
@brynblack brynblack force-pushed the feature-module branch 3 times, most recently from a743dbd to af18f72 Compare March 27, 2024 07:09
@tegefaulkes
Copy link
Contributor

I'm going to do another review of this in a sec. Since I'm not directly working on this I usually don't have a full picture of all the decisions made and what the final constraints are. It's clear that what we thought was needed has evolved over time and we've made some changes to that. But I don't think you've updated the spec in #143 so I don't have a clear reference to todo to compare against for working out if this is done or not.

Remember that we refer to the spec while implementing, reviewing and even much later when writing up the R&D report. So it needs to be updated as the constraints change.

@brynblack
Copy link
Author

But I don't think you've updated the spec in #143 so I don't have a clear reference to todo to compare against for working out if this is done or not.

I want to just add on to this, there never really was a specification issue per say pertaining to the entire specification of this, issue #143 was more related to the secret zero problem itself. I have instead mostly been using this PR as the specification.

@brynblack
Copy link
Author

Something else to note I noticed CI has been failing, something to do with the identities and discovering gestalts by node. Investigating.

@CMCDragonkai
Copy link
Member

@brynblack can you update the spec - and describe at a high level what this will do - as well as provide excalidraw diagram.

Copy link
Contributor

@tegefaulkes tegefaulkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using mkOptional for required fields seems weird. Is there any alternative?

Otherwise it looks fine to my un-trained eye. So long as everything works with manual testing then I think we're good.

@brynblack
Copy link
Author

Using mkOptional for required fields seems weird. Is there any alternative?

Where exactly is this? Are you referring to mkOption? or lib.optionalString?

Copy link
Contributor

@tegefaulkes tegefaulkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if it works and you think it's good then I think we're good.

@CMCDragonkai
Copy link
Member

Needs to be made ready to merge @tegefaulkes.

@CMCDragonkai
Copy link
Member

This must start being used now. I'm reviewing the implementation along with the new platform profiles in our Orchestrator.

@tegefaulkes
Copy link
Contributor

Working on this now. It just needs review comments addressed and then merge?

brynblack and others added 4 commits April 8, 2024 13:15
feat: initial module

feat: added service

wip: changed from programs to services

wip: removed ExecStop

wip: changing from /root/... to /run/keys/... for secret zero

wip: module configuration refactoring

feat: improved module configuration

feat: added recovery option

feat: added -dsf CLI option

fix: ignore bootstarp failure

fix: renamed option to recoveryCodeOutFile

fix: minor fixes
Refactored code and clarified agent service intentions

fix: removed defaults

feat: added file permissions checker
- Fixed importing `* as fs` to just `fs`.
- Fixed descriptions of the recovery code file options. Made the distinction between them clear.
@tegefaulkes
Copy link
Contributor

Rebased on staging.

@tegefaulkes tegefaulkes merged commit 5070a60 into staging Apr 8, 2024
@CMCDragonkai
Copy link
Member

@CryptoTotalWar the above spec would need to be adapted for the Polykey-Docs too. It would come under deployment usage or service usage.

@CryptoTotalWar
Copy link

@CryptoTotalWar the above spec would need to be adapted for the Polykey-Docs too. It would come under deployment usage or service usage.

@CMCDragonkai Polykey-Docs Issue created to track documentation request. MatrixAI/Polykey-Docs#40

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

NixOS Service Configuration via flake.nix

6 participants