Python package for interacting with CloudOS
- Requirements
- Installation
- Usage
- Configuration
- Commands
- Python API Usage
- Unit Testing
CloudOS CLI requires Python 3.9 or higher and several key dependencies for API communication, data processing, and user interface functionality.
click>=8.0.1
pandas>=1.3.4
numpy>=1.26.4
requests>=2.26.0
rich_click>=1.8.2
CloudOS CLI can be installed in multiple ways depending on your needs and environment. Choose the method that best fits your workflow.
The package is also available from PyPI:
pip install cloudos-cli
To update CloudOS CLI to the latest version using pip, you can run:
pip install --upgrade cloudos-cli
To check your current version:
cloudos --version
It is recommended to install it as a docker image using the `Dockerfile` and the `environment.yml` files provided.
To run the existing docker image available at quay.io:
docker run --rm -it quay.io/lifebitaiorg/cloudos-cli:latest
You will need Python >= 3.9 and pip installed.
Clone the repo and install it using pip:
git clone https://github.com/lifebit-ai/cloudos-cli
cd cloudos-cli
pip install -r requirements.txt
pip install .
NOTE: To be able to call the `cloudos` executable, ensure that the local clone of the `cloudos-cli` folder is included in the `PATH` variable, for example using the command `export PATH="/absolute/path/to/cloudos-cli:$PATH"`.
CloudOS CLI can be used both as a command-line interface tool for interactive work and as a Python package for scripting and automation.
To get general information about the tool:
cloudos --help
Usage: cloudos [OPTIONS] COMMAND [ARGS]...
CloudOS python package: a package for interacting with CloudOS.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --debug Show detailed error information and tracebacks │
│ --version Show the version and exit. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ bash CloudOS bash functionality. │
│ configure CloudOS configuration. │
│ cromwell Cromwell server functionality: check status, start and stop. │
│ datasets CloudOS datasets functionality. │
│ job CloudOS job functionality: run, check and abort jobs in CloudOS. │
│ procurement CloudOS procurement functionality. │
│ project CloudOS project functionality: list and create projects in CloudOS. │
│ queue CloudOS job queue functionality. │
│ workflow CloudOS workflow functionality: list and import workflows. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
This will show the implemented commands. Each implemented command has its own subcommands, each with its own `--help`:
cloudos job list --help
Collect workspace jobs from a CloudOS workspace in CSV or JSON format.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * --apikey -k TEXT Your CloudOS API key [required] │
│ * --cloudos-url -c TEXT The CloudOS url you are trying to access to. │
│ Default=https://cloudos.lifebit.ai. │
│ [required] │
│ * --workspace-id TEXT The specific CloudOS workspace id. [required] │
│ --output-basename TEXT Output file base name to save jobs list. Default=joblist │
│ --output-format [csv|json] The desired file format (file extension) for the output. │
│ For json option --all-fields will be automatically set to │
│ True. Default=csv. │
│ --all-fields Whether to collect all available fields from jobs or just │
│ the preconfigured selected fields. Only applicable when │
│ --output-format=csv. Automatically enabled for json │
│ output. │
│ --last-n-jobs TEXT The number of last workspace jobs to retrieve. You can │
│ use 'all' to retrieve all workspace jobs. Default=30. │
│ --page INTEGER Response page to retrieve. If --last-n-jobs is set, then │
│ --page value corresponds to the first page to retrieve. │
│ Default=1. │
│ --archived When this flag is used, only archived jobs list is │
│ collected. │
│ --filter-status TEXT Filter jobs by status (e.g., completed, running, failed, │
│ aborted). │
│ --filter-job-name TEXT Filter jobs by job name ( case insensitive ). │
│ --filter-project TEXT Filter jobs by project name. │
│ --filter-workflow TEXT Filter jobs by workflow/pipeline name. │
│ --last When workflows are duplicated, use the latest imported │
│ workflow (by date). │
│ --filter-job-id TEXT Filter jobs by specific job ID. │
│ --filter-only-mine Filter to show only jobs belonging to the current user. │
│ --filter-queue TEXT Filter jobs by queue name. Only applies to jobs running │
│ in batch environment. Non-batch jobs are preserved in │
│ results. │
│ --filter-owner TEXT Filter jobs by owner username. │
│ --verbose Whether to print information messages or not. │
│ --disable-ssl-verification Disable SSL certificate verification. Please, remember │
│ that this option is not generally recommended for │
│ security reasons. │
│ --ssl-cert TEXT Path to your SSL certificate file. │
│ --profile TEXT Profile to use from the config file │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯
CloudOS CLI uses a profile-based configuration system to store your credentials and settings securely. This eliminates the need to provide authentication details with every command and allows you to work with multiple CloudOS environments.
Configuration is saved in the `$HOME` folder regardless of operating system. A new folder named `.cloudos` will be created there, containing the files `credentials` and `config`. The structure will look like:
$HOME
└── .cloudos/
├── credentials # Stores API keys
└── config # Stores all other parameters
To facilitate the reuse of required parameters, you can create profiles.
To generate a profile called `default`, use the following command:
cloudos configure
This will prompt you for the API key, platform URL, project name, platform executor, repository provider, workflow name (if any), and session ID for interactive analysis. This becomes the default profile if no other profile is explicitly set. The default profile allows running all subcommands without adding the `--profile` option.
To generate a named profile, use the following command:
cloudos configure --profile {profile-name}
The same prompts will appear. If a profile with the same name already exists, the current parameters will appear in square brackets and can be overwritten or left unchanged by pressing Enter/Return.
Note
When there is already at least one profile defined, an additional question will appear asking whether to make the current profile the default.
Change the default profile with:
cloudos configure --profile {other-profile} --make-default
View all configured profiles and identify the default:
cloudos configure list-profiles
The response will look like:
Available profiles:
- default (default)
- second-profile
- third-profile
Remove any profile with:
cloudos configure remove-profile --profile second-profile
See Configuration section above for detailed information on setting up profiles and managing your CloudOS CLI configuration.
Projects in CloudOS provide logical separation of datasets, workflows, and results, making it easier to manage complex research initiatives. You can list all available projects or create new ones using the CLI.
You can get a summary of all available workspace projects in two different formats:
- CSV: A table with a minimum predefined set of columns by default, or all available columns using the `--all-fields` parameter
- JSON: All available information from projects in JSON format
To get a CSV table with all available projects for a given workspace:
cloudos project list --profile my_profile --output-format csv --all-fields
The expected output is something similar to:
Executing list...
Project list collected with a total of 320 projects.
Project list saved to project_list.csv
To get the same information in JSON format:
cloudos project list --profile my_profile --output-format json
You can create a new project in your CloudOS workspace using the project create
command. This command requires the name of the new project and will return the project ID upon successful creation.
cloudos project create --profile my_profile --new-project "My New Project"
The expected output is something similar to:
Project "My New Project" created successfully with ID: 64f1a23b8e4c9d001234abcd
Job queues are required for running jobs using AWS batch executor. The available job queues in your CloudOS workspace are listed in the "Compute Resources" section in "Settings". You can get a summary of all available workspace job queues in two formats:
- CSV: A table with a selection of the available job queue information. You can get all information using the `--all-fields` flag
- JSON: All available information from job queues in JSON format
This command allows you to view available computational queues and their configurations. Example command for getting all available job queues in JSON format:
cloudos queue list --profile my_profile --output-format json --output-basename "available_queues"
Executing list...
Job queue list collected with a total of 5 queues.
Job queue list saved to available_queues.json
This command will output the list of available job queues in JSON format and save it to a file named `available_queues.json`. You can use `--output-format csv` for a CSV file, or omit `--output-basename` to print to the console.
NOTE: The queue name that is visible in CloudOS and must be used with the `--job-queue` parameter is the one in the `label` field.
Job queues for platform workflows
Platform workflows (those provided by CloudOS in your workspace as modules) run on separate and specific AWS batch queues. Therefore, CloudOS will automatically assign the valid queue and you should not specify any queue using the `--job-queue` parameter. Any attempt to use this parameter will be ignored. Examples of such platform workflows are "System Tools" and "Data Factory" workflows.
You can get a summary of all available workspace workflows in two different formats:
- CSV: A table with a minimum predefined set of columns by default, or all available columns using the `--all-fields` parameter
- JSON: All available information from workflows in JSON format
To get a CSV table with all available workflows for a given workspace:
cloudos workflow list --profile my_profile --output-format csv --all-fields
The expected output is something similar to:
Executing list...
Workflow list collected with a total of 609 workflows.
Workflow list saved to workflow_list.csv
To get the same information in JSON format:
cloudos workflow list --profile my_profile --output-format json
Executing list...
Workflow list collected with a total of 609 workflows.
Workflow list saved to workflow_list.json
The collected workflows are those that can be found in the "WORKSPACE TOOLS" section in CloudOS.
You can import new workflows to your CloudOS workspaces. The requirements are:
- The workflow must be a Nextflow pipeline
- The workflow repository must be located at GitHub, GitLab or BitBucket Server (specified by the `--repository-platform` option; available options: `github`, `gitlab` and `bitbucketServer`)
- If your repository is private, you must have access to the repository and have linked your GitHub, GitLab or Bitbucket Server account to CloudOS
Usage of the workflow import command
To import GitHub workflows to CloudOS:
# Example workflow to import: https://github.com/lifebit-ai/DeepVariant
cloudos workflow import --profile my_profile --workflow-url "https://github.com/lifebit-ai/DeepVariant" --workflow-name "new_name_for_the_github_workflow" --repository-platform github
The expected output will be:
CloudOS workflow functionality: list and import workflows.
Executing workflow import...
[Message] Only Nextflow workflows are currently supported.
Workflow test_import_github_3 was imported successfully with the following ID: 6616a8cb454b09bbb3d9dc20
Optionally, you can add a link to your workflow documentation by providing the URL using the `--workflow-docs-link` parameter:
cloudos workflow import --profile my_profile --workflow-url "https://github.com/lifebit-ai/DeepVariant" --workflow-name "new_name_for_the_github_workflow" --workflow-docs-link "https://github.com/lifebit-ai/DeepVariant/blob/master/README.md" --repository-platform github
NOTE: Importing workflows using cloudos-cli is not yet available in all CloudOS workspaces. If you try to use this feature in a non-prepared workspace you will get the following error message: `It seems your API key is not authorised. Please check if your workspace has support for importing workflows using cloudos-cli`.
The job commands allow you to submit, monitor, and manage computational workflows on CloudOS. This includes both Nextflow pipelines and bash scripts, with support for various execution platforms.
You can submit Nextflow workflows to CloudOS using either configuration files or command-line parameters. Jobs can be configured with specific compute resources, execution platforms, parameters, etc.
First, configure your local environment to ease parameter input. We will try to submit a small toy example already available:
cloudos job run --profile my_profile --workflow-name rnatoy --job-config cloudos_cli/examples/rnatoy.config --resumable
As you can see, a file with the job parameters is used to configure the job. This file could be a regular `nextflow.config` file or any file with the following structure:
params {
reads = s3://lifebit-featured-datasets/pipelines/rnatoy-data
annot = s3://lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.bed.gff
}
In addition, parameters can also be specified using the command-line option `-p` or `--parameter`. For instance:
cloudos job run \
--profile my_profile \
--workflow-name rnatoy \
--parameter reads=s3://lifebit-featured-datasets/pipelines/rnatoy-data \
--parameter genome=s3://lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.Ggal71.500bpflank.fa \
--parameter annot=s3://lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.bed.gff \
--resumable
NOTE: the options `--job-config` and `--parameter` are completely compatible and complementary, so you can use a `--job-config` file and add additional parameters using `--parameter` in the same call.
If everything went well, you should see something like:
Executing run...
Job successfully launched to CloudOS, please check the following link: https://cloudos.lifebit.ai/app/advanced-analytics/analyses/62c83a1191fe06013b7ef355
Your assigned job id is: 62c83a1191fe06013b7ef355
Your current job status is: initializing
To further check your job status you can either go to https://cloudos.lifebit.ai/app/advanced-analytics/analyses/62c83a1191fe06013b7ef355 or use the following command:
cloudos job status \
--apikey $MY_API_KEY \
--cloudos-url https://cloudos.lifebit.ai \
--job-id 62c83a1191fe06013b7ef355
As you can see, the current status is `initializing`. This will change while the job progresses. To check the status, just run the suggested command.
Another option is to set the `--wait-completion` parameter, which runs the same job run command but waits for its completion:
cloudos job run --profile my_profile --workflow-name rnatoy --job-config cloudos_cli/examples/rnatoy.config --resumable --wait-completion
When setting this parameter, you can also set `--request-interval` to a larger value (default is 30 seconds) if the job is quite large. This ensures that the status requests are not sent too close to each other and flagged as spam by the API.
If the job takes less than `--wait-time` (3600 seconds by default), the previous command should have an output similar to:
Executing run...
Job successfully launched to CloudOS, please check the following link: https://cloudos.lifebit.ai/app/advanced-analytics/analyses/62c83a6191fe06013b7ef363
Your assigned job id is: 62c83a6191fe06013b7ef363
Please, wait until job completion or max wait time of 3600 seconds is reached.
Your current job status is: initializing.
Your current job status is: running.
Your job took 420 seconds to complete successfully.
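For larger jobs, a longer request interval and wait time can be combined with `--wait-completion`; the values below are only illustrative:
cloudos job run \
--profile my_profile \
--workflow-name rnatoy \
--job-config cloudos_cli/examples/rnatoy.config \
--resumable \
--wait-completion \
--request-interval 60 \
--wait-time 7200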
When there are duplicate `--workflow-name` values in the platform, you can add the `--last` flag to use the latest import of that pipeline in the workspace, based on the import date.
For example, if the pipeline `lifebit-process` was imported on May 23 2025 and again on May 30 2025, with the `--last` flag the import of May 30 2025 will be used.
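A minimal sketch of such a submission (assuming a workflow named `lifebit-process` has been imported more than once and the remaining settings come from the profile):
cloudos job run \
--profile my_profile \
--workflow-name lifebit-process \
--last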
AWS Executor Support
CloudOS supports AWS batch executor by default.
You can specify the AWS batch queue to use from the ones available in your workspace (see the job queue section above) by specifying its name with the `--job-queue` parameter. If none is specified, the most recent suitable queue in your workspace will be selected by default.
Example command:
cloudos job run --profile my_profile --workflow-name rnatoy --job-config cloudos_cli/examples/rnatoy.config --resumable
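To pin the job to a specific queue instead of the default selection, the queue label can be passed explicitly; the queue name below is a placeholder:
cloudos job run \
--profile my_profile \
--workflow-name rnatoy \
--job-config cloudos_cli/examples/rnatoy.config \
--job-queue "my-batch-queue" \
--resumable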
Note: From cloudos-cli 2.7.0, the default executor is AWS batch. The previous Apache Ignite executor is being progressively removed from CloudOS, so it will most likely not be available in your CloudOS. During this period, cloudos-cli still supports Ignite via the `--ignite` flag of the `cloudos job run` command. Please note that if you use the `--ignite` flag on a CloudOS without Ignite support, the command will fail.
Azure Execution Platform Support
CloudOS can also be configured to use Microsoft Azure compute platforms. If your CloudOS is configured to use Azure, you will need to take into consideration the following:
- When sending jobs to CloudOS using the `cloudos job run` command, please use the option `--execution-platform azure`
- Due to the lack of AWS batch queues in Azure, the `cloudos queue list` command is not available
Other than that, `cloudos-cli` works very similarly. For instance, this is a typical job submission command:
cloudos job run --profile my_profile --workflow-name rnatoy --job-config cloudos_cli/examples/rnatoy.config --resumable --execution-platform azure
HPC Execution Support
CloudOS is also prepared to use an HPC compute infrastructure. In that case, take into account the following when submitting jobs with the `cloudos job run` command:
- Use the parameter `--execution-platform hpc`
- Indicate the HPC ID using `--hpc-id XXXX`
Example command:
cloudos job run --profile my_profile --workflow-name rnatoy --job-config cloudos_cli/examples/rnatoy.config --execution-platform hpc --hpc-id $YOUR_HPC_ID
Please note that HPC execution does not support the following parameters, all of which will be ignored:
- `--job-queue`
- `--resumable` | `--do-not-save-logs`
- `--instance-type` | `--instance-disk` | `--cost-limit`
- `--storage-mode` | `--lustre-size`
- `--wdl-mainfile` | `--wdl-importsfile` | `--cromwell-token`
The following command allows you to get the path to the "Nextflow logs", "Nextflow standard output", and "trace" files. It can only be used on your own jobs, with any status.
Example:
cloudos job logs --profile my_profile --job-id "12345678910"
Executing logs...
Logs URI: s3://path/to/location/of/logs
Nextflow log: s3://path/to/location/of/logs/.nextflow.log
Nextflow standard output: s3://path/to/location/of/logs/stdout.txt
Trace file: s3://path/to/location/of/logs/trace.txt
The following command allows you to get the path where CloudOS stores the output files for a job. It can only be used on your own jobs, and only for jobs with "completed" status.
Example:
cloudos job results --profile my_profile --job-id "12345678910"
Executing results...
results: s3://path/to/location/of/results/results/
To get the working directory of a job submitted to CloudOS:
cloudos job workdir \
--apikey $MY_API_KEY \
--cloudos-url $CLOUDOS \
--job-id 62c83a1191fe06013b7ef355
Or with a defined profile:
cloudos job workdir \
--profile profile-name \
--job-id 62c83a1191fe06013b7ef355
The output should be something similar to:
CloudOS job functionality: run, check and abort jobs in CloudOS.
Finding working directory path...
Working directory for job 68747bac9e7fe38ec6e022ad: az://123456789000.blob.core.windows.net/cloudos-987652349087/projects/455654676/jobs/54678856765/work
The `clone` command allows you to create a new job based on an existing job's configuration, with the ability to override specific parameters.
The `resume` command allows you to create a new job (with the ability to override specific parameters) without re-running every step, only the steps that failed or where changes apply.
These commands are particularly useful for re-running jobs with slight modifications without having to specify all parameters or starting again from scratch.
Note
Only jobs initially run with `--resumable` can be resumed.
Basic usage:
cloudos job clone/resume \
--profile MY_PROFILE \
--job-id "60a7b8c9d0e1f2g3h4i5j6k7"
Clone/resume with parameter overrides:
cloudos job clone/resume \
--profile MY_PROFILE \
--job-id "60a7b8c9d0e1f2g3h4i5j6k7" \
--job-queue "high-priority-queue" \
--cost-limit 50.0 \
--instance-type "c5.2xlarge" \
--job-name "cloned_analysis_v2" \
--nextflow-version "24.04.4" \
--git-branch "dev" \
--nextflow-profile "production" \
--do-not-save-logs true \
--accelerate-file-staging true \
--workflow-name "updated-workflow" \
-p "input=s3://new-bucket/input.csv" \
-p "output_dir=s3://new-bucket/results"
Resuming a job without parameter overrides:
cloudos job resume \
--profile MY_PROFILE \
--job-id JOB_ID
Resuming with parameter overrides:
cloudos job resume \
--profile MY_PROFILE \
--job-id "60a7b8c9d0e1f2g3h4i5j6k7" \
--job-queue "high-priority-queue" \
--cost-limit 50.0 \
--instance-type "c5.2xlarge" \
--job-name "cloned_analysis_v2" \
--nextflow-version "24.04.4" \
--git-branch "dev" \
--nextflow-profile "production" \
--do-not-save-logs true \
--accelerate-file-staging true \
--workflow-name "updated-workflow" \
-p "input=s3://new-bucket/input.csv" \
-p "output_dir=s3://new-bucket/results"
Available override options:
- `--job-queue`: Specify a different job queue
- `--cost-limit`: Set a new cost limit (use -1 for no limit)
- `--instance-type`: Change the master instance type
- `--job-name`: Assign a custom name to the cloned/resumed job
- `--nextflow-version`: Use a different Nextflow version
- `--git-branch`: Switch to a different git branch
- `--nextflow-profile`: Change the Nextflow profile
- `--do-not-save-logs`: Enable/disable log saving
- `--accelerate-file-staging`: Enable/disable the fusion filesystem
- `--workflow-name`: Use a different workflow
- `-p, --parameter`: Override or add parameters (can be used multiple times)
Note
Parameters can be overridden or new ones can be added using the `-p` option.
Aborts jobs in the CloudOS workspace that are either running or initializing. It can be used with one or more job IDs provided as a comma-separated string using the `--job-ids` parameter.
Example:
cloudos job abort --profile my_profile --job-ids "680a3cf80e56949775c02f16"
Aborting jobs...
Job 680a3cf80e56949775c02f16 aborted successfully.
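To abort several jobs at once, pass a comma-separated list of IDs (the IDs below are placeholders):
cloudos job abort \
--profile my_profile \
--job-ids "680a3cf80e56949775c02f16,680a3cf80e56949775c02f17"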
To check the status of a submitted job, use the following command:
cloudos job status --profile my_profile --job-id 62c83a1191fe06013b7ef355
The expected output should be something similar to:
Executing status...
Your current job status is: completed
To further check your job status you can either go to https://cloudos.lifebit.ai/app/advanced-analytics/analyses/62c83a1191fe06013b7ef355 or repeat the command you just used.
Details of a job, including cost, status, and timestamps, can be retrieved with:
cloudos job details --profile my_profile --job-id 62c83a1191fe06013b7ef355
When using the defaults, the details are displayed in the standard output console and the expected output is something similar to:
Executing details...
Job Details
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Field ┃ Value ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Parameters │ -test=value │
│ │ --gaq=test │
│ │ cryo=yes │
│ Command │ echo 'test' > new_file.txt │
│ Revision │ sha256:6015f66923d7afbc53558d7ccffd325d43b4e249f41a6e93eef074c9505d2233 │
│ Nextflow Version │ None │
│ Execution Platform │ Batch AWS │
│ Profile │ None │
│ Master Instance │ c5.xlarge │
│ Storage │ 500 │
│ Job Queue │ nextflow-job-queue-5c6d3e9bd954e800b23f8c62-feee │
│ Accelerated File Staging │ None │
│ Task Resources │ 1 CPUs, 4 GB RAM │
└──────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
To change this behaviour and save the details to a local JSON file, set the parameter `--output-format=json`.
By default, all details are saved in a file with the basename `job_details`, for example `job_details.json` or `job_details.config`. This can be changed with the parameter `--output-basename=new_filename`.
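For example, a sketch that saves the details to a custom JSON file (the basename is arbitrary):
cloudos job details \
--profile my_profile \
--job-id 62c83a1191fe06013b7ef355 \
--output-format json \
--output-basename my_job_details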
The `details` subcommand can also take the `--parameters` flag, which creates a new `*.config` file that holds all parameters as a Nextflow configuration file, for example:
params {
parameter_one = value_one
parameter_two = value_two
parameter_three = value_three
}
This file can later be used when running a job with `cloudos job run --job-config job_details.config ...`.
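As a sketch, the two steps could be chained as follows, using the default `job_details.config` file name mentioned above:
# Extract the parameters of a previous job into job_details.config
cloudos job details --profile my_profile --job-id 62c83a1191fe06013b7ef355 --parameters
# Reuse them for a new submission
cloudos job run --profile my_profile --workflow-name rnatoy --job-config job_details.config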
Note
Job details can only be retrieved for your own jobs; you cannot see other users' job details.
You can get a summary of the workspace's last 30 submitted jobs (or a selected number of jobs using the `--last-n-jobs n` parameter) in two different formats:
- CSV: a table with a minimum predefined set of columns by default, or all the available columns using the `--all-fields` argument.
- JSON: all the available information from the workspace jobs, in JSON format (`--all-fields` is always enabled for this format).
To get a list with the workspace's last 30 submitted jobs, in CSV format, use:
cloudos job list --profile my_profile --output-format csv --all-fields
The expected output is something similar to:
Executing list...
Job list collected with a total of 30 jobs.
Job list saved to joblist.csv
In addition, a file named `joblist.csv` is created.
To get the same information, but for all the workspace's jobs and in JSON format, use the following command:
cloudos job list \
--cloudos-url $CLOUDOS \
--apikey $MY_API_KEY \
--workspace-id $WORKSPACE_ID \
--last-n-jobs all \
--output-format json
Executing list...
Job list collected with a total of 276 jobs.
Job list saved to joblist.json
You can find specific jobs within your workspace using the list filtering options. Filters can be combined to narrow down results, and all filtering is performed after retrieving jobs from the server.
Available filters:
- `--filter-status`: Filter jobs by execution status (e.g., completed, running, failed, aborted, queued, pending, initializing)
- `--filter-job-name`: Filter jobs by job name (case-insensitive partial matching)
- `--filter-project`: Filter jobs by project name (exact match required)
- `--filter-workflow`: Filter jobs by workflow/pipeline name (exact match required)
- `--filter-job-id`: Filter jobs by specific job ID (exact match required)
- `--filter-only-mine`: Show only jobs belonging to the current user
- `--filter-owner`: Show only jobs for the specified owner (exact match required, i.e. it needs to be quoted as "Name Surname")
- `--filter-queue`: Filter jobs by queue name (only applies to batch jobs)
Here are some examples:
Get all completed jobs from the last 50 jobs:
cloudos job list --profile my_profile --last-n-jobs 50 --filter-status completed
Find jobs with "analysis" in the name from a specific project:
cloudos job list --profile my_profile --filter-job-name analysis --filter-project "My Research Project"
Get all jobs using a specific workflow and queue:
cloudos job list --profile my_profile --filter-workflow rnatoy --filter-queue high-priority-queue
Note
- Project and workflow names must match exactly (case sensitive)
- Job name filtering is case insensitive and supports partial matches
- The `--last` flag can be used with `--filter-workflow` when multiple workflows have the same name (see the example below)
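A minimal sketch of that last case:
cloudos job list --profile my_profile --filter-workflow rnatoy --last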
Execute bash scripts on CloudOS for custom processing workflows. Bash jobs allow you to run shell commands with custom parameters and are ideal for data preprocessing or simple computational tasks.
A bash job can be sent to CloudOS using the command `bash` and the subcommand `job`. In this case, the `--workflow-name` must be a bash job already present in the platform. Bash jobs are identified by a bash icon (unlike Nextflow jobs, which are identified by a Nextflow icon).
cloudos bash job \
--profile my_profile \
--workflow-name ubuntu \
--parameter -test_variable=value \
--parameter --flag=activate \
--parameter send="yes" \
--job-name $JOB_NAME \
--command "echo 'send' > new_file.txt" \
--resumable
The `--command` parameter is required and sets up the command that will be run with the given parameters.
Each `--parameter` can have a different prefix, either '--', '-', or '', depending on the use case. The option can be used as many times as needed.
Note
At the moment, only string values are allowed for the `--parameter` option; adding a file path does not upload/download the file. This feature will be available in a future implementation.
If everything went well, you should see something like:
CloudOS bash functionality.
Job successfully launched to CloudOS, please check the following link: https://cloudos.lifebit.ai/app/advanced-analytics/analyses/682622d09f305de717327334
Your assigned job id is: 682622d09f305de717327334
Your current job status is: initializing
To further check your job status you can either go to https://cloudos.lifebit.ai/app/advanced-analytics/analyses/682622d09f305de717327334 or use the following command:
cloudos job status \
--apikey $MY_API_KEY \
--cloudos-url https://cloudos.lifebit.ai \
--job-id 682622d09f305de717327334
As you can see, the current status is `initializing`. This will change while the job progresses. To check the status, just run the suggested command.
Other options like `--wait-completion` are also available and work in the same way as for the `cloudos job run` command.
Check `cloudos bash job --help` for more details.
Run parallel bash jobs across multiple samples or datasets using array files. This is particularly useful for processing large datasets where each row represents a separate computational task.
When running a bash array job, you can specify an array file containing sample information and process each row in parallel. The CLI validates column names and provides flexible parameter mapping.
cloudos bash array-job --profile my_profile --command "echo {file}" --array-file my_array.csv --separator ,
- `--array-file`: Specifies the path to a file containing a set of columns used to run the bash job. This option is required when using the command `bash array-job`.
- `--separator`: Defines the separator to use in the array file. Supported separators are `,` (comma), `;` (semicolon), `tab`, `space` and `|` (pipe). This option is required when using the command `bash array-job`.
- `--list-columns`: Lists the columns available in the array file. This is useful for inspecting the structure of the file. This flag disables sending the job; it just prints the column list, one per line:
Columns:
- column1
- column2
- column3
- `--array-file-project`: Specifies the name of the project in which the array file is placed, if it is different from the project specified by `--project-name`.
- `--disable-column-check`: Disables the validation of columns in the array file. This means that each `--array-parameter` value is not checked against the header of the `--array-file`. For example, `--array-parameter --bar=foo`, without `--disable-column-check`, expects the array file to have a column 'foo' in the file header. If the column is not present, the CLI will throw an error. When the `--disable-column-check` flag is added, the column check is not performed and the bash array job is sent to the platform.
Note
Adding `--disable-column-check` will make the CLI command run without errors, but errors may appear when checking the job in the platform if the columns referenced with `--array-parameter` do not exist in the array file.
- `-a`/`--array-parameter`: Allows specifying a column name present in the header of the array file. Each parameter should be in the format `array_parameter_name=array_file_column`. For example, `-a --test=value` or `--array-parameter -test=value` specify a column named 'value' in the array file header. Adding array parameters not present in the header will cause an error. This option can be used multiple times to include as many array parameters as needed. It is similar to `-p, --parameter`; both can be interpolated in the bash array job command (either with `--command` or `--custom-script-path`), but this one can only be used to name a column present in the header of the array file.
For example, if the array file has the following content:
id,bgen,csv
1,s3://data/adipose.bgen,s3://data/adipose.csv
2,s3://data/blood.bgen,s3://data/blood.csv
3,s3://data/brain.bgen,s3://data/brain.csv
...
and the command needs to iterate over the `bgen` column, this can be specified as `--array-parameter file=bgen`, referring to the column in the header, as in the sketch below.
- `--custom-script-path`: Specifies the path to a custom script to run in the bash array job instead of a command. When adding this option, the `--command` parameter is ignored. To ensure the script runs successfully, you must either:
- Use a shebang line at the top of the script
The shebang (`#!`) tells the system which interpreter to use to run the script. The path should match the absolute path of python or another interpreter installed inside the docker container.
Examples:
#!/usr/bin/python3 --> for Python scripts
#!/usr/bin/Rscript --> for R scripts
#!/bin/bash --> for Bash scripts
Example Python Script:
#!/usr/bin/python3
print("Hello world")
- Or use an interpreter command in the executable field
If your script doesn’t have a shebang line, you can execute it by explicitly specifying the interpreter in the executable command:
python my_script.py
Rscript my_script.R
bash my_script.sh
This assumes the interpreter is available on the container’s $PATH. If not, you can use the full absolute path instead:
/usr/bin/python3 my_script.py
/usr/local/bin/Rscript my_script.R
- `--custom-script-project`: Specifies the name of the project in which the custom script is placed, if it is different from the project specified by `--project-name`.
These options provide flexibility for configuring and running bash array jobs, allowing you to tailor the execution to specific requirements.
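As an illustrative sketch of the custom script options above (the script path and project name are placeholders, and it is assumed the script is stored in File Explorer):
cloudos bash array-job \
--profile my_profile \
--array-file my_array.csv \
--separator , \
--custom-script-path "Data/scripts/my_script.py" \
--custom-script-project "SCRIPTS_PROJECT"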
The option `--parameter` can specify a file input located in a different project than the one given with `--project-name`. The files can only be located inside the project's `Data` subfolder, not `Cohorts` or `Analyses Results`. The accepted structures for different parameter projects are:
- `-p/--parameter "--file=<project>/Data/file.txt"`
- `-p/--parameter "--file=<project>/Data/subfolder/file.txt"`
- `-p/--parameter "--file=Data/subfolder/file.txt"` (the same project as `--project-name`)
- `-p/--parameter "--file=<project>/Data/subfolder/*.txt"`
- `-p/--parameter "--file=<project>/Data/*.txt"`
- `-p/--parameter "--file=Data/*.txt"` (the same project as `--project-name`)
The project should be specified at the beginning of the file path. For example:
cloudos bash array-job --profile my_profile -p file=Data/input.csv
This will point to the global project, specified with `--project-name`. In contrast:
cloudos bash array-job \
--profile my_profile \
-p data=Data/input.csv \
-p exp=PROJECT_EXPRESSION/Data/input.csv \
--project-name "ADIPOSE"
for the parameter `exp` it will point to a project named `PROJECT_EXPRESSION` in the File Explorer, while the `data` parameter will be found in the global project `ADIPOSE`.
Apart from files, the parameter can also take glob patterns, for example:
cloudos bash array-job \
--profile my_profile \
-p data=Data/input.csv \
-p exp="PROJECT_EXPRESSION/Data/*.csv" \
--project-name "ADIPOSE"
will take all files with the `csv` extension in the specified folder.
Note
When specifying glob patterns, depending on the terminal, it is best to wrap them in double quotes to avoid the terminal expanding the glob pattern locally, e.g. `-p exp="PROJECT_EXPRESSION/Data/*.csv"`.
Note
Project names in the `--parameter` option can start with or without a forward slash `/`. The following are equivalent: `-p data=/PROJECT1/Data/input.csv` and `-p data=PROJECT1/Data/input.csv`.
Manage files and folders within your CloudOS File Explorer programmatically. These commands provide comprehensive file management capabilities for organizing research data and results.
Browse files and folders within your CloudOS projects. Use the `--details` flag to get comprehensive information about file ownership, sizes, and modification dates.
cloudos datasets ls <path> --profile <profile>
The output of this command is a list of files and folders present in the specified project.
Note
If the `<path>` is left empty, the command will return the list of folders present in the selected project.
If you require more information on the files and folders listed, you can use the `--details` flag, which will output a table containing the following columns:
- Type (folder or file)
- Owner
- Size (in human readable format)
- Last updated
- Virtual Name (the file or folder name)
- Storage Path
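For example, to inspect a folder with the detailed table (the path is a placeholder):
cloudos datasets ls Data/results --details --profile my_profile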
Relocate files and folders within the same project or across different projects. This is useful for reorganizing data and moving results to appropriate locations.
Note
Files and folders can be moved programmatically from `Data` or any of its subfolders (i.e. `Data`, `Data/folder/file.txt`) to `Data` or any of its subfolders. Furthermore, only virtual folders can be destination folders.
The move can happen within the same project
cloudos datasets mv <source_path> <destination_path> --profile <profile>
But it can also happen across different projects within the same workspace by specifying the destination project name.
cloudos datasets mv <source_path> <destination_path> --profile <profile> --destination-project-name <project>
The `source_path` must be a full path, starting from the `Data` dataset and its folders; the `destination_path` must be a path starting with `Data` and ending with the folder into which the file/folder will be moved.
An example of such command is:
cloudos datasets mv Data/results/my_plot.png Data/plots
Change file and folder names while keeping them in the same location. This helps maintain organized file structures and clear naming conventions.
Note
Files and folders within the `Data` dataset can be renamed using the following command
cloudos datasets rename <path> <new_name> --profile my_profile
where `path` is the full path to the file/folder to be renamed and `new_name` is just the name (no path required, as the file will not be moved).
Note
Renaming is only possible for files and folders that are present in the `Data` dataset and that were created or uploaded by your user.
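For example, to rename a plot inside `Data` (paths and names are placeholders):
cloudos datasets rename Data/results/my_plot.png final_plot.png --profile my_profile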
Create copies of files and folders for backup purposes or to share data across projects without moving the original files.
Note
Files and folders can be copied programmatically from anywhere in the project to `Data` or any of its subfolders (i.e. `Data`, `Data/folder/file.txt`). Furthermore, only virtual folders can be destination folders.
The copy can happen within the same project
cloudos datasets cp <source_path> <destination_path> --profile <profile>
or it can happen across different projects within the same workspace
cloudos datasets cp <source_path> <destination_path> --profile <profile> --destination-project-name <project>
The `source_path` must be a full path; the `destination_path` must be a path starting with `Data` and ending with the folder into which the file/folder will be copied.
An example of such command is:
cloudos datasets cp AnalysesResults/my_analysis/results/my_plot.png Data/plots
Connect external S3 buckets or internal File Explorer folders to your interactive analysis sessions. This provides direct access to data without needing to copy files.
This subcommand uses the option `--session-id` to access the correct interactive session. This option can be added on the command line or defined in a profile, for convenience.
cloudos datasets link <S3_FOLDER_COMPLETE_PATH_OR_VIRTUAL_FOLDER_PATH> --profile <profile> --session-id <SESSION_ID>
For example, an S3 folder can be linked as follows:
cloudos datasets link s3://bucket/path/folder --profile test --session-id 1234
A virtual folder can be linked like this:
cloudos datasets link "Analyses Results/HLA" --session-id 1234
Note
If running the CLI inside a Jupyter session, the pre-configured CLI installation will already have the session ID configured, and only the `--apikey` needs to be added.
Note
Virtual folders in File Explorer (the ones a user has created in File Explorer, which are not actual storage locations) cannot be linked.
Create new organizational folders within your projects to maintain structured data hierarchies.
Note
New folders can be created within the `Data` dataset and its subfolders.
cloudos datasets mkdir <new_folder_path> --profile my_profile
Remove unnecessary files or empty folders from your File Explorer. Note that this removes files from CloudOS but not from underlying cloud storage.
Note
Files and folders can be removed within the `Data` dataset and its subfolders.
cloudos datasets rm <path> --profile my_profile
Note
If a file was uploaded by the user, you must use `--force` to remove it, and this will permanently remove the file. If the file is "linked" (e.g. an S3 folder or file), removing it using `cloudos datasets rm` will not remove it from the S3 bucket.
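For example (paths are placeholders), removing a linked file only detaches it from File Explorer, whereas removing a file you uploaded requires `--force` and deletes it permanently:
# Remove a linked file (the underlying S3 object is kept)
cloudos datasets rm Data/linked_folder/file.csv --profile my_profile
# Permanently remove a file uploaded by your user
cloudos datasets rm Data/uploads/old_file.csv --force --profile my_profile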
CloudOS supports procurement functionality to manage and list images associated with organizations within a given procurement. This feature is useful for administrators and users who need to view available container images across different organizations in their procurement.
You can get a list of images associated with organizations of a given procurement using the `cloudos procurement images ls` command. This command provides paginated results showing image configurations and metadata.
To list images for a specific procurement, use the following command:
cloudos procurement images ls \
--profile procurement_profile \
--procurement-id "your_procurement_id_here"
Command options:
- `--apikey`/`-k`: Your CloudOS API key (required)
- `--cloudos-url`/`-c`: The CloudOS URL you are trying to access (default: https://cloudos.lifebit.ai)
- `--procurement-id`: The specific CloudOS procurement ID (required)
- `--page`: The response page number (default: 1)
- `--limit`: The page size limit (default: 10)
- `--disable-ssl-verification`: Disable SSL certificate verification
- `--ssl-cert`: Path to your SSL certificate file
- `--profile`: Profile to use from the config file
Example usage:
# List images for the procurement (first page, 10 items)
cloudos procurement images ls --profile procurement_profile --procurement-id "your_procurement_id_here"
To get more results per page or navigate to different pages:
# Get 25 images from page 2
cloudos procurement images ls --profile procurement_profile --page 2 --limit 25 --procurement-id "your_procurement_id_here"
Output format:
The command returns detailed information about image configurations and pagination metadata in JSON format, including:
- Image configurations: Details about available container images
- Pagination metadata: Information about total pages, current page, and available items
This is particularly useful for understanding what container images are available across different organizations within your procurement and for programmatic access to image inventory.
You can set a custom image ID or name for an organization within a procurement using the `cloudos procurement images set` command. This allows you to override the default CloudOS images with your own custom images for specific organizations.
To set a custom image for an organization, use the following command:
cloudos procurement images set --profile procurement_profile --image-type "JobDefault" --provider "aws" --region "us-east-1" --image-id "ami-0123456789abcdef0" --image-name "custom-image-name" --procurement-id "your_procurement_id_here" --organisation-id "your_organization_id"
Set command options:
- `--apikey`/`-k`: Your CloudOS API key (required)
- `--cloudos-url`/`-c`: The CloudOS URL you are trying to access (default: https://cloudos.lifebit.ai)
- `--procurement-id`: The specific CloudOS procurement ID (required)
- `--organisation-id`: The organization ID where the change will be applied (required)
- `--image-type`: The CloudOS resource image type (required). Possible values: `RegularInteractiveSessions`, `SparkInteractiveSessions`, `RStudioInteractiveSessions`, `JupyterInteractiveSessions`, `JobDefault`, `NextflowBatchComputeEnvironment`
- `--provider`: The cloud provider (required). Currently only `aws` is supported
- `--region`: The cloud region (required). Currently only AWS regions are supported
- `--image-id`: The new image ID value (required)
- `--image-name`: The new image name value (optional)
- `--disable-ssl-verification`: Disable SSL certificate verification
- `--ssl-cert`: Path to your SSL certificate file
- `--profile`: Profile to use from the config file
Set command example:
# Set custom image for job execution
cloudos procurement images set --profile procurement_profile --image-type "JobDefault" --provider "aws" --region "us-east-1" --image-id "ami-0123456789abcdef0" --image-name "my-custom-job-image" --procurement-id "your_procurement_id_here" --organisation-id "your_organization_id"
You can reset an organization's image configuration back to CloudOS defaults using the `cloudos procurement images reset` command. This removes any custom image configurations and restores the original CloudOS defaults.
To reset an organization's image to defaults, use the following command:
cloudos procurement images reset --profile procurement_profile --image-type "JobDefault" --provider "aws" --region "us-east-1" --procurement-id "your_procurement_id_here" --organisation-id "your_organization_id"
Reset command options:
- `--apikey`/`-k`: Your CloudOS API key (required)
- `--cloudos-url`/`-c`: The CloudOS URL you are trying to access (default: https://cloudos.lifebit.ai)
- `--procurement-id`: The specific CloudOS procurement ID (required)
- `--organisation-id`: The organization ID where the change will be applied (required)
- `--image-type`: The CloudOS resource image type (required). Same values as for the `set` command
- `--provider`: The cloud provider (required). Currently only `aws` is supported
- `--region`: The cloud region (required). Currently only AWS regions are supported
- `--disable-ssl-verification`: Disable SSL certificate verification
- `--ssl-cert`: Path to your SSL certificate file
- `--profile`: Profile to use from the config file
Reset command example:
# Reset image configuration to CloudOS defaults
cloudos procurement images reset --profile procurement_profile --image-type "JobDefault" --provider "aws" --region "us-east-1" --procurement-id "your_procurement_id_here" --organisation-id "your_organization_id"
In order to run WDL pipelines, a Cromwell server must be running in CloudOS. You can check its status, start it or stop it using the following commands:
# Check Cromwell status
cloudos cromwell status --profile my_profile
Executing status...
Current Cromwell server status is: Stopped
# Cromwell start
cloudos cromwell start --profile my_profile
Starting Cromwell server...
Current Cromwell server status is: Initializing
Current Cromwell server status is: Running
# Cromwell stop
cloudos cromwell stop --profile my_profile
Stopping Cromwell server...
Current Cromwell server status is: Stopped
To run WDL workflows, the `cloudos job run` command can be used normally, adding two extra parameters:
- `--wdl-mainfile`: name of the mainFile (*.wdl) file used by the CloudOS workflow.
- `--wdl-importsfile` [Optional]: name of the workflow imports file (importsFile, *.zip).
All the rest of the `cloudos job run` functionality is available.
NOTE: WDL does not support profiles and therefore the `--nextflow-profile` option is not available. Instead, use `--job-config` and/or `--parameter`. The format of the job config file is expected to be the same as for Nextflow pipelines.
Example of job config file for WDL workflows:
params {
test.hello.name = aasdajdad
test.bye.nameTwo = asijdadads
test.number.x = 2
test.greeter.morning = true
test.wf_hello_in = bomba
test.arrayTest = ["lala"]
test.mapTest = {"some":"props"}
}
NOTE: when using the `--parameter` option, if the value needs quotes (`"`) you will need to escape them. E.g.: `--parameter test.arrayTest=[\"lala\"]`
cloudos job run --profile my_profile --project-name wdl-test --workflow-name "wdl-test" --wdl-mainfile hello.wdl --wdl-importsfile imports_7mb.zip --job-config cloudos/examples/wdl.config --wait-completion
Executing run...
WDL workflow detected
Current Cromwell server status is: Stopped
Starting Cromwell server...
Current Cromwell server status is: Initializing
Current Cromwell server status is: Running
*******************************************************************************
[WARNING] Cromwell server is now running. Plase, remember to stop it when your
job finishes. You can use the following command:
cloudos cromwell stop \
--cromwell-token $CROMWELL_TOKEN \
--cloudos-url $CLOUDOS \
--workspace-id $WORKSPACE_ID
*******************************************************************************
Job successfully launched to CloudOS, please check the following link: ****
Your assigned job id is: ****
Please, wait until job completion or max wait time of 3600 seconds is reached.
Your current job status is: initializing.
Your current job status is: running.
Your job took 60 seconds to complete successfully.
To illustrate how to import the package and use its functionality inside your own Python scripts, we will perform a job submission and check its status from within a Python script.
Again, we will set up the environment to ease the work:
import cloudos_cli.jobs.job as jb
import json
# GLOBAL VARS.
apikey = 'xxxxx'
cloudos_url = 'https://cloudos.lifebit.ai'
workspace_id = 'xxxxx'
project_name = 'API jobs'
workflow_name = 'rnatoy'
job_config = 'cloudos/examples/rnatoy.config'
First, create the `Job` object:
j = jb.Job(cloudos_url, apikey, None, workspace_id, project_name, workflow_name)
print(j)
Then, send the job:
j_id = j.send_job(job_config)
To check the status:
j_status = j.get_job_status(j_id)
j_status_h = json.loads(j_status.content)["status"]
print(j_status_h)
The status will change while your job progresses, so to check again just repeat the above code.
You can also collect your last 30 submitted jobs for a given workspace using the following command.
my_jobs_r = j.get_job_list(workspace_id)
my_jobs = j.process_job_list(my_jobs_r)
print(my_jobs)
Or inspect all the available workflows for a given workspace using the following command.
my_workflows_r = j.get_workflow_list(workspace_id)
my_workflows = j.process_workflow_list(my_workflows_r)
print(my_workflows)
Similarly, you can inspect all the available projects for a given workspace using the following command.
my_projects_r = j.get_project_list(workspace_id)
my_projects = j.process_project_list(my_projects_r)
print(my_projects)
You can even run WDL pipelines. First check the Cromwell server status and restart it if Stopped:
import cloudos_cli.clos as cl
import cloudos_cli.jobs.job as jb
import json
# GLOBAL VARS.
apikey = 'xxxxx'
cloudos_url = 'https://cloudos.lifebit.ai'
workspace_id = 'xxxxx'
project_name = 'wdl-test'
workflow_name = 'wdl-test'
mainfile = 'hello.wdl'
importsfile = 'imports_7mb.zip'
job_config = 'cloudos/examples/wdl.config'
# First create cloudos object
cl = cl.Cloudos(cloudos_url, apikey, None)
# Then, check Cromwell status
c_status = cl.get_cromwell_status(workspace_id)
c_status_h = json.loads(c_status.content)["status"]
print(c_status_h)
# Start Cromwell server
cl.cromwell_switch(workspace_id, 'restart')
# Check again Cromwell status (wait until status: 'Running')
c_status = cl.get_cromwell_status(workspace_id)
c_status_h = json.loads(c_status.content)["status"]
print(c_status_h)
# Send a job (wait until job has status: 'Completed')
j = jb.Job(cloudos_url, apikey, None, workspace_id, project_name, workflow_name, True, mainfile,
importsfile)
j_id = j.send_job(job_config, workflow_type='wdl', cromwell_id=json.loads(c_status.content)["_id"])
j_status = j.get_job_status(j_id)
j_status_h = json.loads(j_status.content)["status"]
print(j_status_h)
# Stop Cromwell server
cl.cromwell_switch(workspace_id, 'stop')
# Check again Cromwell status
c_status = cl.get_cromwell_status(workspace_id)
c_status_h = json.loads(c_status.content)["status"]
print(c_status_h)
Unit tests require 4 additional packages:
pytest>=6.2.5
requests-mock>=1.9.3
responses>=0.21.0
mock>=3.0.5
Command to run tests from the `cloudos-cli` main folder:
python -m pytest -s -v