
Onboard Project: Usage

Matthew Beech edited this page Aug 1, 2025 · 3 revisions

Onboarding Projects

This page details the silnlp.common.onboard_project script's usage and the configuration options available.

onboard_project

Cleans and uploads a Paratext project from a local machine to the MinIO bucket. Optionally performs other Onboarding tasks.

usage: python -m silnlp.common.onboard_project [--copy-from local_project] [--config path_to_config]
[--extract-corpora] [--collect-verse-counts] project

Arguments:

| Argument | Purpose | Description |
|---|---|---|
| project | Paratext project name | (Required) The project will be stored on the bucket at Paratext/projects/project. |
| --copy-from local_project | Path to a downloaded Paratext project folder | (Required) This local project will be copied to the bucket. |
| --config path_to_config | Path to a config.yml file | This is used to configure which optional Onboarding tasks will run. |
| --extract-corpora | Runs silnlp.common.extract_corpora | Extracts corpora. See the extract_corpora documentation for more information. |
| --collect-verse-counts | Runs silnlp.common.collect_verse_counts | Collects verse counts. |
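If you run onboarding from another Python script, it can help to build the argument list in one place. The sketch below is a hypothetical helper (build_onboard_command is not part of silnlp); it only assembles the flags listed in the table above, placing the positional project name last as in the usage line. The project name and paths in the example are placeholders.

```python
import shlex
from typing import List, Optional


def build_onboard_command(
    project: str,
    copy_from: Optional[str] = None,
    config: Optional[str] = None,
    extract_corpora: bool = False,
    collect_verse_counts: bool = False,
) -> List[str]:
    """Assemble argv for silnlp.common.onboard_project (hypothetical helper)."""
    cmd = ["python", "-m", "silnlp.common.onboard_project"]
    if copy_from is not None:
        cmd += ["--copy-from", copy_from]
    if config is not None:
        cmd += ["--config", config]
    if extract_corpora:
        cmd.append("--extract-corpora")
    if collect_verse_counts:
        cmd.append("--collect-verse-counts")
    # The positional project name goes last, matching the usage line.
    cmd.append(project)
    return cmd


if __name__ == "__main__":
    cmd = build_onboard_command(
        "MyProject",
        copy_from="/home/user/Downloads/MyProject",
        config="onboard_config.yml",
        extract_corpora=True,
    )
    # Print a shell-ready version of the command.
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) from an environment where silnlp is installed, which avoids shell quoting issues with paths that contain spaces.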

config file

The config file contains the parameters for all of the optional onboarding tasks this script can execute.

Below is an example of an onboarding config:

extract_corpora:
  include: NT
  exclude: OT
verse_counts:
  output_folder: /root/M/MT/experiments/test_onboard_project
  files: "*.txt"
  deutero: false
  recount: false
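One pitfall when writing this file by hand or from code: in YAML, an unquoted value that begins with * is read as an alias, so a glob such as *.txt must be quoted ("*.txt"). The sketch below is a minimal, standard-library-only way to generate a config like the example above. render_onboard_config is a hypothetical helper that handles only the two-level task/parameter layout shown here, and it relies on JSON-style double-quoted strings, booleans, and [..] lists also being valid YAML.

```python
import json
from typing import Any, Dict


def render_onboard_config(config: Dict[str, Dict[str, Any]]) -> str:
    """Render a {task: {parameter: value}} mapping as YAML text (hypothetical helper)."""
    lines = []
    for task, params in config.items():
        lines.append(f"{task}:")
        for key, value in params.items():
            # json.dumps double-quotes strings (so "*.txt" is safe) and
            # renders booleans as true/false and lists as [..].
            lines.append(f"  {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "extract_corpora": {"include": "NT", "exclude": "OT"},
        "verse_counts": {
            "output_folder": "/root/M/MT/experiments/test_onboard_project",
            "files": "*.txt",
            "deutero": False,
            "recount": False,
        },
    }
    print(render_onboard_config(example), end="")
```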

Parameter Definitions

extract_corpora

  • include=[]: A list of books to include; e.g., 'NT', 'OT', 'GEN'.
  • exclude=[]: A list of books to exclude; e.g., 'NT', 'OT', 'GEN'.
  • markers=False: If true, include USFM markers in extraction.
  • lemmas=False: If true, extract lemmas.
  • project_vrefs=False: If true, extract the project's verse references (vrefs).
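The defaults listed above can be applied explicitly before running extraction, which also catches misspelled parameter names. This is a sketch, not the script's own code: extract_corpora_options is hypothetical, and wrapping a single string such as NT into a one-item list is an assumption based on the config example using include: NT rather than a list.

```python
from typing import Any, Dict, List, Union

# Defaults as listed in the extract_corpora parameter definitions.
EXTRACT_CORPORA_DEFAULTS: Dict[str, Any] = {
    "include": [],
    "exclude": [],
    "markers": False,
    "lemmas": False,
    "project_vrefs": False,
}


def as_book_list(value: Union[str, List[str], None]) -> List[str]:
    """Accept a single book/testament code or a list of codes (assumed behavior)."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)  # copy so the shared default lists are never mutated


def extract_corpora_options(config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the extract_corpora section over the documented defaults (hypothetical)."""
    section = config.get("extract_corpora") or {}
    unknown = set(section) - set(EXTRACT_CORPORA_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown extract_corpora parameter(s): {sorted(unknown)}")
    options = {**EXTRACT_CORPORA_DEFAULTS, **section}
    options["include"] = as_book_list(options["include"])
    options["exclude"] = as_book_list(options["exclude"])
    return options


if __name__ == "__main__":
    print(extract_corpora_options({"extract_corpora": {"include": "NT", "exclude": "OT"}}))
```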

collect_verse_counts

  • output_folder=path_to_output_folder: Folder to store the verse counts.
  • files=*.txt: Semicolon-delimited list of file-name patterns selecting which extract files to count (e.g., 'en-*.txt;fr-NT.txt').
  • deutero=False: If true, include counts for deuterocanonical books.
  • recount=False: If true, force recount of verse counts.
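The files parameter is a semicolon-delimited list of file-name patterns. A minimal sketch of that matching, assuming shell-style globs as handled by Python's fnmatch module: select_extract_files and the file names are hypothetical, and whether the script itself matches case-sensitively is not stated here. This sketch uses fnmatchcase, which is case-sensitive on every platform.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_extract_files(patterns: str, file_names: Iterable[str]) -> List[str]:
    """Return the sorted names matching any pattern in a ';'-delimited list."""
    # Split on ';' and drop empty entries such as a trailing separator.
    globs = [p.strip() for p in patterns.split(";") if p.strip()]
    return sorted(name for name in file_names if any(fnmatchcase(name, g) for g in globs))


if __name__ == "__main__":
    names = ["en-KJV.txt", "en-WEB.txt", "fr-NT.txt", "fr-LSG.txt", "notes.md"]
    print(select_extract_files("en-*.txt;fr-NT.txt", names))
    # → ['en-KJV.txt', 'en-WEB.txt', 'fr-NT.txt']
```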