Skip to content

Conversation

@mart-r
Copy link
Collaborator

@mart-r mart-r commented Aug 29, 2025

This PR adds only the ES parts from working_with_cogstack.
This is based on the newer cogstack2.py and relevant search notebook.

Currently:

  • Move code from WWC
  • Refactor code
  • Add a few simple tests for cogstack.py module
  • Add a test that runs through the notebook
  • Add workflow
  • Real world validation
  • Publish on PyPI
  • Documentation on usage alongside the MedCAT-specific parts

TODO later:

NOTE:
The "real world validation" was performed as follows:

  • For each of ES8, ES9, and OS
    • Run the server using
      • tests/.e2e/run_es8_locally.sh
      • tests/.e2e/run_es9_locally.sh
      • tests/.e2e/run_os_locally.sh
    • Run local tests
      • pytest tests/.e2e/test_locally.py

Later changes to this PR makes a number of important changes:

  • The Cogstack class in cogstack.py will now be distributed through PyPI as cogstack-es
    • Though import will still be from cogstack import CogStack
  • The package will be expected to be installed with a specified optional extra
    • Currently, either ES8, ES9, or OS for v8 or v9 of Elasticsearch or Opensearch (v2)
      • e.g pip install cogstack-es[ES9]
    • If installed without any of these, an exception is raised at import time to suggest you do so
  • The code between the different back ends (ES or OS) was seprated
    • The ES specifis code is in cogstack.es and the OS specified code in cogstack.os
    • The imports are dynamic, but elasticsearch will be preferred if both are available
  • Credentials can now be read automatically from environmental variables
    • The changable credentials.py is no longer
    • See the readme changes and/or src/cogstack/read_creds.py for details
  • CI tests against all combinations
    • Backends ES8, ES9, OS
    • Python 3.10, 3.11, 3.12, 3.13
    • Type checks are done with both ES and OS installed
    • But the rest (linting, tests) are done for all combinations separately

@vladd-bit
Copy link
Member

vladd-bit commented Oct 29, 2025

Looks good, the only thing can be add here is to use a specific docker image release for OS tests (3.2.0 for e.g) which we currently use on main cg in non-native-ES deployments, just so we know what behavior to expect in case we update these images in the future.

@mart-r
Copy link
Collaborator Author

mart-r commented Oct 29, 2025

Looks good, the only thing can be add here is to use a specific docker image release for OS tests (3.2.0 for e.g) which we currently use on main cg in non-native-ES deployments, just so we know what behavior to expect in case we update these images in the future.

So you're suggesting we explicitly set runs-on: ubuntu-24.04 for the workflows?

@vladd-bit
Copy link
Member

vladd-bit commented Oct 29, 2025

in this case, by OS I mean OpenSearch 😄 referenced in tests/.e2e/run_os_locally.sh ,

opensearchproject/opensearch:latest
to
opensearchproject/opensearch:3.2.0

@vladd-bit vladd-bit self-assigned this Oct 29, 2025
@mart-r
Copy link
Collaborator Author

mart-r commented Oct 29, 2025

in this case, by OS I mean OpenSearch 😄 referenced in tests/.e2e/run_os_locally.sh ,

opensearchproject/opensearch:latest to opensearchproject/opensearch:3.2.0

That does make more sense! Yes, will do. Though I ran these with :latest 5 days ago, so it will have used 3.3.1. So I'll use that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants