
2 Node Solr Deployment Steps (WIP)

Tim Bieniosek edited this page Mar 4, 2019 · 4 revisions

Airflow builds, deployments, runs

Stage:

  • Infrastructure: Terraform-built / updated VMs as appropriate (1 VM)
    • Mount for harvested data
    • VM for Airflow
    • within stage, same VM for Airflow database
    • ansible-server-playbook runs
  • Airflow code (from official airflow releases) deploy:
    • ansible playbook for bare bones airflow deployment
    • set up system-level airflow configs (logs, etc.)
  • Integration tests / Data QA testing (future?)
    • at the DAG repo level (future? manual?)
  • Deploy our airflow DAGS
    • use the above playbook, but point at git repos for various DAGs we copy to Airflow staging VM
    • whenever the airflow DAG git repo has a PR or master merge (or release?) with an 'airflow-stage' flag, trigger an ansible-playbook run that updates the DAGs on the Airflow stage VM
  • Dev eyeball test
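The deploy trigger described above (a PR or master merge carrying an 'airflow-stage' flag) could be sketched as a small predicate. This is only an illustration: the event field names (`type`, `branch`, `labels`) are hypothetical, not a real webhook payload schema.

```python
# Sketch: decide whether a DAG-repo git event should trigger the
# ansible-playbook run that updates DAGs on the Airflow stage VM.
# The event dict shape here is a placeholder, not a real webhook schema.

def should_deploy_to_stage(event: dict) -> bool:
    """True if the event is a master merge or a PR carrying the
    'airflow-stage' flag (e.g. a label on the PR)."""
    flagged = "airflow-stage" in event.get("labels", [])
    is_master_merge = event.get("type") == "merge" and event.get("branch") == "master"
    is_pr = event.get("type") == "pull_request"
    return flagged and (is_master_merge or is_pr)
```

A CI job or webhook receiver would call this and, when it returns True, kick off the ansible-play run.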

TUL Cob Stage => Airflow Stage Overlaps:

  • Solr Schema.xml update(s)
    • 2 node setup:
      • turn off replication
      • pause partial updates to solr staging index;
      • update schema on leader node
      • perform full reindex (with updated mappings / traject configs) => airflow needs to know which Solr to point at (as consumer); always check for new traject configs (default), or use the traject config version indicated by an envvar in airflow?;
      • restart partial updates to solr staging index;
      • turn on replication;
      • message that replication is done & clients of Solr can be deployed to use the updated index + schema (where does this happen?).
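The pause/resume steps above map onto Solr's ReplicationHandler commands (`disablereplication` / `enablereplication`, run against the leader core). A minimal sketch of the URLs an Airflow task would hit; the host and core names are placeholders:

```python
# Sketch: build ReplicationHandler URLs for pausing/resuming replication
# around a full reindex on the 2-node setup. The commands are Solr's
# standard disablereplication/enablereplication; host/core names are
# placeholders, not the actual stage values.

def replication_command_url(solr_base: str, core: str, command: str) -> str:
    """ReplicationHandler URL for a given command on the leader core."""
    return f"{solr_base}/solr/{core}/replication?command={command}"

def pause_replication_url(solr_base: str, core: str) -> str:
    return replication_command_url(solr_base, core, "disablereplication")

def resume_replication_url(solr_base: str, core: str) -> str:
    return replication_command_url(solr_base, core, "enablereplication")
```

An Airflow task would issue an HTTP GET to the pause URL before the reindex and to the resume URL after it completes.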
    • solrcloud setup:
      • create new solrcloud collection named after schema.xml release / version
      • deploy new schema.xml there
      • perform full index (with updated mappings / traject configs) => airflow needs to know which Solr to point at (as consumer); always check for new traject configs (default), or use the traject config version indicated by an envvar in airflow?;
      • start partial updates to solr staging index;
      • redeploy blacklight app node in cluster with updated envvar pointing to new solrcloud collection
      • iterate through every node in blacklight app cluster
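Creating the versioned collection above would go through the SolrCloud Collections API (`action=CREATE`). A sketch of building that request; the naming scheme, configset name, and shard/replica counts are assumptions, not decided values:

```python
# Sketch: create a SolrCloud collection named after the schema.xml
# release via the Collections API. The "catalog-<version>" naming
# scheme, configset name, and shard/replica defaults are placeholder
# assumptions.

def create_collection_url(solr_base: str, schema_version: str,
                          configset: str, num_shards: int = 1,
                          replication_factor: int = 2) -> str:
    name = f"catalog-{schema_version}"   # hypothetical naming scheme
    return (f"{solr_base}/solr/admin/collections?action=CREATE"
            f"&name={name}&numShards={num_shards}"
            f"&replicationFactor={replication_factor}"
            f"&collection.configName={configset}")
```

An Airflow task (or ansible play) would GET this URL once the updated schema.xml has been uploaded as the named configset.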
  • Traject configuration / mapping update(s) (i.e. indexing_config.rb changes)
    • 2 node setup:
      • pause partial updates to solr staging index;
      • perform full reindex (with updated mappings / traject configs) => airflow needs to know which Solr to point at (as consumer); always check for new traject configs (default), or use the traject config version indicated by an envvar in airflow?;
      • restart partial updates to solr staging index;
      • turn on replication;
      • message that replication is done & clients of Solr can be deployed to use the updated index + schema (where does this happen?).
    • solrcloud setup:
      • create new solrcloud collection named after traject config release / version
      • perform full index (with updated mappings / traject configs) => airflow needs to know which Solr to point at (as consumer); always check for new traject configs (default), or use the traject config version indicated by an envvar in airflow?;
      • start partial updates to solr staging index;
      • redeploy blacklight app node in cluster with updated envvar pointing to new solrcloud collection
      • iterate through every node in blacklight app cluster
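Iterating through the blacklight app cluster to repoint each node could start from a plan like the one below. The node list, envvar name, and URL shape are all placeholders for whatever the actual deploy tooling uses:

```python
# Sketch: for every node in the blacklight app cluster, compute the
# updated envvar pointing at the new SolrCloud collection. Node names,
# the SOLR_URL envvar, and the URL shape are placeholder assumptions.

def repoint_plan(nodes: list[str], solr_base: str, collection: str) -> dict:
    """Map each app node to the env update its redeploy should apply."""
    solr_url = f"{solr_base}/solr/{collection}"
    return {node: {"SOLR_URL": solr_url} for node in nodes}
```

A rolling redeploy would then apply each node's entry in turn, so the cluster never points at a mix of missing collections.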
  • Solr replication config update(s) (rare)
    • 2 node setup
      • should never change?
    • solrcloud setup
      • managed for us
  • Solr version updates (Solr 6.6 is the current version)
    • 2 node setup
      • rebuild infrastructure for new nodes with updated version
      • update airflow, blacklight, etc. to point to new solr urls (wherever that pointing configuration occurs, we hope consul)
    • solrcloud setup
      • Christina will find dev notes on how this was done at Stanford

Questions:

  • Currently, we presume a dev will deploy DAG changes in coordination with any jobs running on the existing DAG in stage; we would like better automated orchestration of this in the future.