The filemanager tracks object state in S3 to show how objects are created, deleted or moved, and maintains a queryable table of the results. It does this by ingesting S3 events into a Postgres database and filling in object metadata such as the storage class. It also supports annotating records with JSON key-value pairs and tracking how objects move using S3 tags.
See the API guide for how to use the filemanager API and the architecture doc for details on design.
The documentation of the app has further details on how the filemanager works.
This service provides a RESTful API following OpenAPI conventions. The Swagger documentation of the production endpoint is available here:
https://file.prod.umccr.org/schema/swagger-ui
The filemanager has somewhat complex permission requirements because it prioritizes ingesting S3 events and fills in optional information from calls like HeadObject and GetObjectTagging. If permissions are lacking, the filemanager may not fail ingestion, and instead proceeds with partial information.
In general, the following S3 permissions are required for full functionality:
- For ingesting objects and calling HeadObject: s3:GetObject, s3:GetObjectVersion
- For crawling objects and listing buckets: s3:ListBucket, s3:ListBucketVersions
- For tagging objects to track moves: s3:GetObjectTagging, s3:GetObjectVersionTagging, s3:PutObjectTagging, s3:PutObjectVersionTagging
The filemanager can operate with a subset of these permissions, with reduced functionality. For example, objects are still ingested if HeadObject fails. This behaviour may change in the future. Note that the version-based permissions are not required if bucket versioning is not used.
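As an illustration of how the permissions above map onto resources, here is a minimal sketch that builds a plain JSON IAM policy. It assumes a single bucket (the ARN is a placeholder) and drops the version-based actions when versioning is not used. Object-level actions apply to the bucket/* resource, while the list actions apply to the bucket ARN itself. This is not the filemanager's actual CDK code.

```typescript
// Sketch only: a plain JSON IAM policy for the S3 permissions listed above.
// Assumption: one bucket; the bucket ARN passed in is a placeholder.

interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Object-level actions (resource: arn:aws:s3:::bucket/*).
const INGEST_ACTIONS = ["s3:GetObject", "s3:GetObjectVersion"];
const TAGGING_ACTIONS = [
  "s3:GetObjectTagging",
  "s3:GetObjectVersionTagging",
  "s3:PutObjectTagging",
  "s3:PutObjectVersionTagging",
];
// Bucket-level actions (resource: arn:aws:s3:::bucket).
const LIST_ACTIONS = ["s3:ListBucket", "s3:ListBucketVersions"];

// Version-based actions are only needed when bucket versioning is used.
function dropVersionActions(actions: string[], versioned: boolean): string[] {
  return versioned ? actions : actions.filter((a) => a.indexOf("Version") === -1);
}

function buildFileManagerS3Policy(bucketArn: string, versioned: boolean): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: dropVersionActions([...INGEST_ACTIONS, ...TAGGING_ACTIONS], versioned),
        Resource: [`${bucketArn}/*`],
      },
      {
        Effect: "Allow",
        Action: dropVersionActions(LIST_ACTIONS, versioned),
        Resource: [bucketArn],
      },
    ],
  };
}
```

For an unversioned bucket this yields s3:GetObject, s3:GetObjectTagging and s3:PutObjectTagging on objects, and s3:ListBucket on the bucket.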
Permissions are further complicated by the cache and archive buckets needing to be accessible across accounts. This means that the bucket policies defined in the infrastructure repo must grant the filemanager role the permissions above.
In general, take care when updating buckets used by the filemanager: missing permissions can cause errors, or records that contain only partial information.
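A cross-account bucket policy differs from an identity policy mainly in that each statement names a Principal. The sketch below assumes a placeholder bucket ARN and a hypothetical filemanager role ARN in another account; the real values live in the infrastructure repo. For cross-account access, the role's own IAM policy in its account must also allow these actions.

```typescript
// Hypothetical sketch: bucket policy statements that let a filemanager role
// in another account read, list and tag objects. All ARNs are placeholders.

interface BucketPolicyStatement {
  Sid: string;
  Effect: "Allow";
  Principal: { AWS: string };
  Action: string[];
  Resource: string[];
}

function crossAccountStatements(bucketArn: string, roleArn: string): BucketPolicyStatement[] {
  const principal = { AWS: roleArn };
  return [
    {
      // Object-level access for ingestion (HeadObject) and move tracking (tags).
      Sid: "FileManagerObjectAccess",
      Effect: "Allow",
      Principal: principal,
      Action: [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:GetObjectTagging",
        "s3:GetObjectVersionTagging",
        "s3:PutObjectTagging",
        "s3:PutObjectVersionTagging",
      ],
      Resource: [`${bucketArn}/*`],
    },
    {
      // Bucket-level access for crawling and listing.
      Sid: "FileManagerListAccess",
      Effect: "Allow",
      Principal: principal,
      Action: ["s3:ListBucket", "s3:ListBucketVersions"],
      Resource: [bucketArn],
    },
  ];
}

const policy = {
  Version: "2012-10-17",
  Statement: crossAccountStatements(
    "arn:aws:s3:::example-archive-bucket",
    "arn:aws:iam::111111111111:role/example-filemanager-role",
  ),
};
console.log(JSON.stringify(policy, null, 2));
```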
The filemanager also requires permissions to access:
- The database, using an RDS IAM connection with the orcabus-rds-connect-filemanager policy.
- The orcabus/file-manager-presign-user secret, with secretsmanager:GetSecretValue and secretsmanager:DescribeSecret, for presigning S3 URLs.
- The orcabus-event-source-queue SQS queue, for receiving events.
- VPC and CloudWatch resources.
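The non-S3 permissions can be pictured as IAM statements like the ones sketched here. This is an assumption about what such statements typically contain, not the contents of the orcabus-rds-connect-filemanager policy: the region, account, DB resource ID and DB user name are placeholders, and the SQS actions are the ones a Lambda SQS event source generally needs. VPC and CloudWatch permissions are omitted.

```typescript
// Hypothetical sketch of the non-S3 statements described above.
// Region, account, DB resource ID and DB user name are placeholders.

interface Statement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

function fileManagerServiceStatements(opts: {
  region: string;
  account: string;
  dbResourceId: string; // the instance or cluster resource ID used by RDS IAM auth
  dbUser: string;
}): Statement[] {
  const { region, account, dbResourceId, dbUser } = opts;
  return [
    {
      // RDS IAM authentication: connect as a database user with a token instead of a password.
      Effect: "Allow",
      Action: ["rds-db:connect"],
      Resource: [`arn:aws:rds-db:${region}:${account}:dbuser:${dbResourceId}/${dbUser}`],
    },
    {
      // Read the presign user's credentials. Secrets Manager appends a random
      // suffix to secret ARNs, hence the trailing wildcard.
      Effect: "Allow",
      Action: ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"],
      Resource: [`arn:aws:secretsmanager:${region}:${account}:secret:orcabus/file-manager-presign-user-*`],
    },
    {
      // Consume S3 events from the event source queue.
      Effect: "Allow",
      Action: ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
      Resource: [`arn:aws:sqs:${region}:${account}:orcabus-event-source-queue`],
    },
  ];
}
```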
This service uses a fully automated CI/CD pipeline that builds and deploys all changes to the main code branch.
There are no automated changelogs or versioned releases; however, semantic versioning is followed for any manual release, and conventional commits are used to enable future automation.
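Conventional commits make it possible to derive a semantic version bump from commit messages. The sketch below shows one way this could work, following the Conventional Commits rules (feat is a minor bump, fix is a patch bump, and a "!" or a BREAKING CHANGE footer is a major bump). It is a hypothetical illustration of possible future automation, not something the project currently runs.

```typescript
// Hypothetical sketch: infer a semver bump from Conventional Commit messages.

type Bump = "none" | "patch" | "minor" | "major";

// Matches headers like "feat(api): add filter" or "fix!: drop field".
const HEADER = /^(\w+)(\([^)]*\))?(!)?: .+/;
const ORDER: Bump[] = ["none", "patch", "minor", "major"];

function bumpFor(message: string): Bump {
  const [header, ...body] = message.split("\n");
  const match = HEADER.exec(header);
  if (!match) return "none"; // not a conventional commit
  const [, type, , bang] = match;
  if (bang || body.some((line) => /^BREAKING[ -]CHANGE: /.test(line))) return "major";
  if (type === "feat") return "minor";
  if (type === "fix") return "patch";
  return "none"; // e.g. chore, docs, refactor
}

// The release bump is the largest bump among all commits since the last release.
function releaseBump(messages: string[]): Bump {
  return messages
    .map(bumpFor)
    .reduce((a: Bump, b: Bump) => (ORDER.indexOf(b) > ORDER.indexOf(a) ? b : a), "none" as Bump);
}

function applyBump(version: string, bump: Bump): string {
  const [major, minor, patch] = version.split(".").map(Number);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      return version;
  }
}
```

For example, releaseBump(["fix: a", "feat: b"]) is "minor", so applyBump("1.2.3", "minor") gives "1.3.0".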
The filemanager is primarily a stateless service that consumes S3 events and maintains an API to query database records. It also has a stateful AccessKeySecret user, which is able to presign long-lived URLs, and an event source SQS queue that receives S3 events from the event bus.
There are 4 Lambda functions using RustFunction in the stateless stack:
- An IngestFunction, which converts S3 events and inserts them into the filemanager database.
- An ApiFunction, which responds to requests from API Gateway.
- A MigrateFunction, which migrates and makes changes to the database tables.
- An InventoryFunction, which consumes an S3 inventory instead of S3 events.
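To give a feel for what "converting S3 events" involves, here is a simplified sketch that maps an S3 event delivered through EventBridge ("Object Created" / "Object Deleted" detail types) to a flat record. The real IngestFunction is written in Rust; the field subset and the FileRecord shape here are illustrative assumptions, not the filemanager's schema.

```typescript
// Hypothetical sketch: flatten an EventBridge S3 event into a record.
// Only a subset of the event fields is modelled; FileRecord is illustrative.

interface S3EventBridgeEvent {
  "detail-type": string;
  time: string;
  detail: {
    bucket: { name: string };
    object: { key: string; size?: number; etag?: string; "version-id"?: string; sequencer?: string };
  };
}

type EventType = "Created" | "Deleted";

interface FileRecord {
  eventType: EventType;
  bucket: string;
  key: string;
  versionId: string | null;
  size: number | null;
  eventTime: string;
}

function toFileRecord(event: S3EventBridgeEvent): FileRecord | null {
  let eventType: EventType;
  switch (event["detail-type"]) {
    case "Object Created":
      eventType = "Created";
      break;
    case "Object Deleted":
      eventType = "Deleted";
      break;
    default:
      return null; // other S3 event types are ignored in this sketch
  }
  const { bucket, object } = event.detail;
  return {
    eventType,
    bucket: bucket.name,
    key: object.key,
    versionId: object["version-id"] ?? null, // absent for unversioned buckets
    size: object.size ?? null, // delete events carry no size
    eventTime: event.time,
  };
}
```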
You can access CDK commands using the pnpm wrapper scripts:
- cdk-stateless: Used to deploy the filemanager RustFunctions.
- cdk-stateful: Used to deploy the filemanager AccessKeySecret and EventSource.
The type of stack to deploy is determined by the context set in the ./bin/deploy.ts file. This ensures the correct stack is executed based on the provided context.
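Context-based selection usually comes down to validating a single context value and branching on it. In a CDK app that value would be read with app.node.tryGetContext(...) and supplied on the command line with cdk -c key=value. The sketch below keeps only the validation step so it runs without CDK; the set of modes mirrors the two wrapper scripts, but the actual context key lives in ./bin/deploy.ts and is not assumed here.

```typescript
// Hypothetical sketch of validating the deploy-mode context value.
// In ./bin/deploy.ts the value would come from app.node.tryGetContext(...).

type DeployMode = "stateless" | "stateful";

function isDeployMode(value: unknown): value is DeployMode {
  return value === "stateless" || value === "stateful";
}

// Fail fast on a missing or unknown value rather than deploying the wrong stack.
function selectStack(contextValue: unknown): DeployMode {
  if (isDeployMode(contextValue)) {
    return contextValue;
  }
  throw new Error(`Unknown deploy mode: ${String(contextValue)}; expected "stateless" or "stateful"`);
}
```

Failing on an unexpected value is a deliberate choice: silently defaulting to one stack type could deploy resources the caller did not intend.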
For example:
# Deploy a stateless stack
pnpm cdk-stateless <command>
# Deploy a stateful stack
pnpm cdk-stateful <command>
This CDK project manages multiple stacks. The root stack (the only one that does not include DeploymentPipeline in its stack ID) is deployed in the toolchain account and sets up a CodePipeline for cross-environment deployments to beta, gamma, and prod.
To list all available stacks, run the cdk-stateless or cdk-stateful script:
pnpm cdk-stateless ls
Output:
OrcaBusStatelessFileManagerStack
OrcaBusStatelessFileManagerStack/DeploymentPipeline/OrcaBusBeta/FileManagerStack (OrcaBusBeta-FileManagerStack)
OrcaBusStatelessFileManagerStack/DeploymentPipeline/OrcaBusGamma/FileManagerStack (OrcaBusGamma-FileManagerStack)
OrcaBusStatelessFileManagerStack/DeploymentPipeline/OrcaBusProd/FileManagerStack (OrcaBusProd-FileManagerStack)
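The naming rule above can be checked mechanically. This sketch classifies lines of the ls output shown above: the stack whose ID lacks DeploymentPipeline is the root (toolchain) stack, and each remaining line is a stage stack whose display name appears in parentheses. The parsing assumes exactly the line format in that output.

```typescript
// Sketch: split `cdk ls` output into the root stack and stage stacks,
// assuming lines of the form "<stack-id>" or "<stack-id> (<display-name>)".

interface StackListing {
  root: string[];
  stages: { id: string; name: string }[];
}

function classifyStacks(output: string): StackListing {
  const listing: StackListing = { root: [], stages: [] };
  for (const raw of output.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const match = /^(\S+)(?: \((\S+)\))?$/.exec(line);
    if (!match) continue; // skip lines that do not look like stack entries
    const [, id, name] = match;
    if (id.indexOf("DeploymentPipeline") !== -1) {
      listing.stages.push({ id, name: name ?? id });
    } else {
      listing.root.push(id);
    }
  }
  return listing;
}
```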
The root of the project is an AWS CDK project, and the main application logic lives inside the ./app folder.
The project is organized into the following directories:
- ./app: Contains the main application logic written in Rust.
- ./bin/deploy.ts: Serves as the entry point of the application. It initializes two stacks: stateless and stateful.
- ./infrastructure: Contains the infrastructure code for the project:
  - ./infrastructure/toolchain: Includes stacks for the stateless and stateful resources deployed in the toolchain account. These stacks primarily set up the CodePipeline for cross-environment deployments.
  - ./infrastructure/stage: Defines the stage stacks for different environments:
    - ./infrastructure/stage/functions: Contains the filemanager function definitions.
    - ./infrastructure/stage/config.ts: Contains environment-specific configuration (e.g., beta, gamma, prod).
    - ./infrastructure/stage/filemanager-stateless-stack.ts: The CDK stack entry point for provisioning the stateless resources required by the application in ./app.
    - ./infrastructure/stage/filemanager-stateful-stack.ts: The CDK stack entry point for provisioning the stateful resources required by the application in ./app.
- .github/workflows/pr-tests.yml: Configures GitHub Actions to run make check-all (linting and code style), the tests defined in ./test, and make test for the ./app directory.
- ./test: Contains tests for CDK code compliance against cdk-nag.
This project requires Rust for development. Installing Rust is recommended so that local bundling can be used; however, to only deploy the stack, pnpm and Node.js should be all that is required:
node --version
v22.9.0
# Update Corepack (if necessary, as per pnpm documentation)
npm install --global corepack@latest
# Enable Corepack to use pnpm
corepack enable pnpm
To install pnpm dependencies, run:
make install
A top-level Makefile contains commands to install, build, lint and test code. See the Makefile in the app directory for commands to run lints against the application code. The top-level Makefile also links to targets in the app Makefile.
Automated checks are enforced via pre-commit hooks, ensuring only checked code is committed. For details, consult the .pre-commit-config.yaml file.
To run linting and formatting checks on the whole project (this requires Rust), use:
make check-all
To automatically fix issues with ESLint and Prettier, run:
make fix
Tests for the application are contained in the app directory. Infrastructure and cdk-nag tests can be run using:
make test