Onboard Project: Usage
Matthew Beech edited this page Aug 1, 2025 · 3 revisions
This page details the silnlp.common.onboard_project script's usage and the configuration options available.
Cleans and uploads a Paratext project from a local machine to the MinIO bucket. Optionally performs other Onboarding tasks.
usage: python -m silnlp.common.onboard_project [--copy-from local_project] [--config path_to_config]
[--extract-corpora] [--collect-verse-counts] project
Arguments:
Argument | Purpose | Description |
---|---|---|
project | Paratext project name | (Required) The project will be stored on the bucket at Paratext/projects/project. |
--copy-from local_project | (Required) Path to a downloaded Paratext project folder. | This local project will be copied to the bucket. |
--config path_to_config | Path to a config.yml file | This is used to configure what optional Onboarding tasks will run. |
--extract-corpora | Runs silnlp.common.extract_corpora | Extracts corpora. See here for more information. |
--collect-verse-counts | Runs silnlp.common.collect_verse_counts | Collects verse counts. |
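To make the argument layout concrete, here is a minimal sketch of a parser that accepts the same command line as the usage string above. It is a reconstruction from the documented usage only, not the script's actual source; the function name `build_parser`, the project name `MyProject`, and the path `/path/to/MyProject` are placeholders chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction from the usage line; not the real implementation.
    parser = argparse.ArgumentParser(prog="silnlp.common.onboard_project")
    parser.add_argument("project", help="Paratext project name")
    parser.add_argument("--copy-from", metavar="local_project", default=None,
                        help="Path to a downloaded Paratext project folder")
    parser.add_argument("--config", metavar="path_to_config", default=None,
                        help="Path to a config.yml file")
    parser.add_argument("--extract-corpora", action="store_true",
                        help="Run silnlp.common.extract_corpora")
    parser.add_argument("--collect-verse-counts", action="store_true",
                        help="Run silnlp.common.collect_verse_counts")
    return parser


# Placeholder project name and local path, for illustration only.
args = build_parser().parse_args(
    ["--copy-from", "/path/to/MyProject", "--extract-corpora", "MyProject"]
)
print(args.project, args.copy_from, args.extract_corpora, args.collect_verse_counts)
```

The equivalent shell invocation would be `python -m silnlp.common.onboard_project --copy-from /path/to/MyProject --extract-corpora MyProject`.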
The config file contains the parameters for all of the optional onboarding tasks this script can execute.
Below is an example of an onboarding config:
extract_corpora:
  include: NT
  exclude: OT
verse_counts:
  output_folder: /root/M/MT/experiments/test_onboard_project
  files: "*.txt"  # quote the pattern: a bare leading * is YAML alias syntax
  deutero: false
  recount: false
extract_corpora options:
- include=[]: A list of books to include; e.g., 'NT', 'OT', 'GEN'.
- exclude=[]: A list of books to exclude; e.g., 'NT', 'OT', 'GEN'.
- markers=False: If true, include USFM markers in extraction.
- lemmas=False: If true, extract lemmas.
- project_vrefs=False: If true, extract project_vrefs.
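As an illustration of how the defaults listed above combine with a config section, the sketch below overlays the `extract_corpora` section from the example config onto those defaults. How the script actually merges them is not documented here, so `resolve_options` is a hypothetical helper, and rejecting unknown keys is an assumption rather than confirmed behaviour.

```python
# Defaults copied from the option list above.
EXTRACT_CORPORA_DEFAULTS = {
    "include": [],
    "exclude": [],
    "markers": False,
    "lemmas": False,
    "project_vrefs": False,
}


def resolve_options(section: dict, defaults: dict) -> dict:
    """Return the defaults overridden by whatever the config section sets.

    Hypothetical helper; the unknown-key check is an assumption.
    """
    unknown = set(section) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    return {**defaults, **section}


# The extract_corpora section of the example config, as a parsed mapping.
options = resolve_options({"include": "NT", "exclude": "OT"}, EXTRACT_CORPORA_DEFAULTS)
print(options)
```

Options the section leaves out (`markers`, `lemmas`, `project_vrefs`) keep their default of false.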
verse_counts options:
- output_folder=path_to_output_folder: Folder to store the verse counts.
- files=*.txt: Semicolon-delimited list of patterns of extract file names to count (e.g. 'en-*.txt;fr-NT.txt').
- deutero=False: If true, include counts for Deuterocanon books.
- recount=False: If true, force recount of verse counts.
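The semicolon-delimited `files` value can be read as a list of glob patterns, any one of which may match an extract file name. The following is a minimal sketch of that interpretation using the standard-library `fnmatch` module; whether the script matches patterns exactly this way is an assumption, and the helper name and the file names are made up for illustration.

```python
from fnmatch import fnmatchcase


def matches_files_option(file_name: str, files_option: str) -> bool:
    """Check a file name against a semicolon-delimited list of glob patterns."""
    # Split on ';' and ignore empty entries such as a trailing semicolon.
    patterns = [p.strip() for p in files_option.split(";") if p.strip()]
    return any(fnmatchcase(file_name, pattern) for pattern in patterns)


# Hypothetical extract file names, for illustration only.
candidates = ["en-KJV.txt", "fr-NT.txt", "fr-OT.txt", "en-KJV.csv"]
selected = [
    name for name in candidates
    if matches_files_option(name, "en-*.txt;fr-NT.txt")
]
print(selected)
```

With the pattern from the option description, only the names matching `en-*.txt` or exactly `fr-NT.txt` are selected.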