veldhub/veld_chain__compare_tokenizations

veld chain veld_chain__compare_tokenizations

This repo contains chain velds that compare two tokenization tools, teitok-tools and xmlanntools, using their veldified versions, veld_code__teitok-tools and veld_code__xmlanntools respectively. In addition, veld_code__downloader is reused, and veld_code__jupyter_notebook_base is integrated directly into this chain repo as a git subtree.

requirements

  • git
  • docker compose (note: older docker compose versions require running docker-compose instead of docker compose)
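Which of the two compose commands is available can be checked up front. The following is a minimal sketch, not part of this repo; the function name `detect_compose` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper (not part of this repo): print the compose command
# that is available on this machine, preferring the newer plugin form.
detect_compose() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    echo "neither 'docker compose' nor 'docker-compose' was found" >&2
    return 1
  fi
}
```

Calling `detect_compose` prints the command to use in place of `docker compose` in the commands below, or fails with a message if neither form is installed.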

Clone this repo with all its submodules:

git clone --recurse-submodules https://github.com/veldhub/veld_chain__compare_tokenizations.git
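If the repo was already cloned without `--recurse-submodules`, the submodules can usually be fetched afterwards with standard git. This is a small sketch under that assumption; the function name `init_submodules` is hypothetical and not part of this repo.

```shell
# Hypothetical helper (not part of this repo): fetch all submodules of an
# existing clone. Takes the path to the clone, defaulting to the current
# directory.
init_submodules() {
  repo_dir="${1:-.}"
  git -C "$repo_dir" submodule update --init --recursive
}

# Example call, assuming the default clone directory name:
#   init_submodules veld_chain__compare_tokenizations
```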

how to reproduce

The following chain velds were used. Open the respective veld yaml file for more information.

all chains in one go

./veld_step_all.yaml

This chain reuses the individual chains described below and runs them all as a single batch.

docker compose -f veld_step_all.yaml up

all chains sequentially

./veld_step_1_download.yaml

Downloads a sample TEI XML file from the German ELTeC corpus.

docker compose -f veld_step_1_download.yaml up

./veld_step_2_xmlanntools.yaml

Runs xmlanntools to tokenize the TEI file.

docker compose -f veld_step_2_xmlanntools.yaml up

./veld_step_3_teitok.yaml

Runs teitok-tools to tokenize the TEI file.

docker compose -f veld_step_3_teitok.yaml up

./veld_step_4_jupyter_analysis.yaml

Launches a Jupyter notebook that compares the two enriched TEI files. After running the following command, open the notebook at http://localhost:8888/. The notebook is persisted at ./code/veld_code__jupyter_analysis/src/enrichment_summary.ipynb

docker compose -f veld_step_4_jupyter_analysis.yaml up
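The non-interactive steps 1 to 3 can also be run back to back from a small wrapper. This is a sketch, not part of this repo: the function name `run_steps` and the `COMPOSE` variable are hypothetical, and whether a failing container makes `up` return a non-zero status depends on the compose version and options, so the early stop is best-effort.

```shell
# Hypothetical wrapper (not part of this repo).
# COMPOSE can be overridden, e.g. COMPOSE="docker-compose" on older installs.
COMPOSE="${COMPOSE:-docker compose}"

# Run each given chain file in order; stop at the first command that
# returns a non-zero exit status.
run_steps() {
  for step in "$@"; do
    echo "running $step"
    # $COMPOSE is intentionally unquoted so "docker compose" splits into two words.
    $COMPOSE -f "$step" up || return 1
  done
}

# Example call for the three non-interactive steps:
#   run_steps veld_step_1_download.yaml veld_step_2_xmlanntools.yaml veld_step_3_teitok.yaml
```

Step 4 is left out of the example call because it starts a notebook server that keeps running until it is stopped.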
