pfizer-opensource · ivanmilevtues · Jul 4, 2025
diff --git a/.codeboarding/Data_Configuration_Management.md b/.codeboarding/Data_Configuration_Management.md
@@ -0,0 +1,65 @@
+```mermaid
+
+graph LR
+
+    Data_Configuration_Management["Data & Configuration Management"]
+
+    Data_Configuration_Management -- "configures" --> Main_Entry_Point_Orchestrator
+
+    Data_Configuration_Management -- "provides configuration to" --> Quality_Control_Alignment_Metrics
+
+    Data_Configuration_Management -- "provides reference data to" --> Sequence_Alignment_Modules
+
+    Data_Configuration_Management -- "provides reference data/configuration to" --> Variant_Analysis_Typing_Logic
+
+    click Data_Configuration_Management href "https://github.com/pfizer-opensource/LISTT/blob/main/.codeboarding//Data_Configuration_Management.md" "Details"
+
+```
+
+
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%[email protected]?style=flat-square)](mailto:[email protected])
+
+
+
+## Details
+
+
+
+These components and their relationships highlight a clear data-centric architecture where configuration and reference data are centrally managed and then distributed to the modules that consume them, ensuring consistency and proper execution across the pipeline.
+
+
+
+### Data & Configuration Management [[Expand]](./Data_Configuration_Management.md)
+
+This component is responsible for parsing command-line arguments, managing input/output file paths, and providing access to static configuration parameters (e.g., quality thresholds, reference lengths) and essential reference sequences/databases. It acts as the initial setup and data provisioning layer for the entire bioinformatics pipeline.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/pfizer-opensource/LISTT/blob/main/src/cmd_parse.py#L1-L1" target="_blank" rel="noopener noreferrer">`src/cmd_parse.py` (1:1)</a>
+
+- `variants/min_cov_metrics.csv` (1:1)
+
+- `variants/thresholds.csv` (1:1)
+
+- `variants/reference_lengths.csv` (1:1)
+
+- `alleles/ref_alleles.fasta` (1:1)
+
+- `ref/ospA_allele_fasta` (1:1)
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
diff --git a/.codeboarding/Pipeline_Orchestrator.md b/.codeboarding/Pipeline_Orchestrator.md
@@ -0,0 +1,223 @@
+```mermaid
+
+graph LR
+
+    Pipeline_Orchestrator["Pipeline Orchestrator"]
+
+    Command_Line_Argument_Parser["Command-Line Argument Parser"]
+
+    Sequence_Aligner_OspA_Specific_["Sequence Aligner (OspA Specific)"]
+
+    BLAST_Aligner["BLAST Aligner"]
+
+    Alignment_Quality_Extractor["Alignment Quality Extractor"]
+
+    Best_Alignment_Selector["Best Alignment Selector"]
+
+    Alignment_QC_Checker["Alignment QC Checker"]
+
+    Consensus_Sequence_Builder["Consensus Sequence Builder"]
+
+    Variant_Similarity_Analyzer["Variant Similarity Analyzer"]
+
+    Pipeline_Orchestrator -- "receives configuration from" --> Command_Line_Argument_Parser
+
+    Pipeline_Orchestrator -- "initiates alignment" --> Sequence_Aligner_OspA_Specific_
+
+    Pipeline_Orchestrator -- "initiates alignment" --> BLAST_Aligner
+
+    Pipeline_Orchestrator -- "requests quality extraction" --> Alignment_Quality_Extractor
+
+    Pipeline_Orchestrator -- "queries for best alignment" --> Best_Alignment_Selector
+
+    Pipeline_Orchestrator -- "requests QC check" --> Alignment_QC_Checker
+
+    Pipeline_Orchestrator -- "initiates consensus building" --> Consensus_Sequence_Builder
+
+    Pipeline_Orchestrator -- "requests variant analysis" --> Variant_Similarity_Analyzer
+
+    click Pipeline_Orchestrator href "https://github.com/pfizer-opensource/LISTT/blob/main/.codeboarding//Pipeline_Orchestrator.md" "Details"
+
+```
+
+
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%[email protected]?style=flat-square)](mailto:[email protected])
+
+
+
+## Details
+
+
+
+The Pipeline Orchestrator is the central control unit of the bioinformatics pipeline, managing the overall execution flow based on the input mode (NGS or Assembly). It coordinates various stages by invoking specialized modules for specific tasks, ensuring a proper sequence of operations from initial data processing to final analysis.
+
+
+
+### Pipeline Orchestrator [[Expand]](./Pipeline_Orchestrator.md)
+
+The main entry point and coordinator of the entire bioinformatics pipeline. It directs the flow, handles file system setup (e.g., creating output directories), and orchestrates calls to other modules based on the input data type (NGS reads or assembled genomes).
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `Pipeline Orchestrator` (1:1)
+
+
+
+
+
+### Command-Line Argument Parser
+
+Responsible for parsing and validating command-line arguments provided by the user, configuring the pipeline's execution parameters and input files.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `Command-Line Argument Parser` (1:1)
+
+
+
+
+
+### Sequence Aligner (OspA Specific)
+
+Performs sequence alignment of raw reads against reference sequences, specifically tailored for OspA gene analysis in NGS mode. It generates alignment files (e.g., BAM, VCF).
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `Sequence Aligner (OspA Specific)` (1:1)
+
+
+
+
+
+### BLAST Aligner
+
+Utilizes the BLAST algorithm to align assembled genomes or sequences against a reference database, primarily used in the assembly mode of the pipeline.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `BLAST Aligner` (1:1)
+
+
+
+
+
+### Alignment Quality Extractor
+
+Extracts and processes quality metrics from variant call format (VCF) files generated during the alignment stage, providing data for downstream quality control.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `Alignment Quality Extractor` (1:1)
+
+
+
+
+
+### Best Alignment Selector
+
+Analyzes multiple alignment results and determines the optimal alignment based on predefined quality criteria, ensuring the most reliable data is used for subsequent steps.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `Best Alignment Selector` (1:1)
+
+
+
+
+
+### Alignment QC Checker
+
+Performs quality control checks on the selected best alignment, evaluating metrics such as coverage to ensure the alignment meets the required quality thresholds for further analysis.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `Alignment QC Checker` (1:1)
+
+
+
+
+
+### Consensus Sequence Builder
+
+Constructs a consensus sequence from the aligned reads or assembled genome, incorporating variant information to generate a representative sequence.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `Consensus Sequence Builder` (1:1)
+
+
+
+
+
+### Variant Similarity Analyzer
+
+Compares the generated consensus sequence against known variants or reference sequences to determine serotype, species, or other relevant genetic characteristics.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `Variant Similarity Analyzer` (1:1)
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)