sofyalaski
Member

@sofyalaski sofyalaski commented Oct 3, 2025

Description

This is a large PR that introduces two changes.

  1. A new ingestor component. This adds many new components and a dependency on another backend. It can, however, be turned off in the config.json file by setting:
  "ingestorComponent": {
    "ingestorEnabled": false
  }
  2. A change to how the "Create Dataset" button below the filter sidebar on the Dataset page behaves, according to Change "create dataset" button #1912. At SciCatCon 2025, we discussed a new project to simplify ingestion. (The option to control the button is already present in the config: addDatasetEnabled.)

Motivation

At PSI with OpenEM, we have been working on a new ingestor backend that allows data ingestion from sites other than the one hosting the SciCat catalog. This is Point 1. We also plan to add the ingestor backend repo to SciCatProject.

Changes:

For point 2:

  • When the user wants to ingest a dataset into SciCat from the frontend, a dialog opens where the user enters dataset-specific information.
  • The user can provide a URL to a JSON schema for the scientific metadata. We only check that the provided JSON is a valid object.
  • If the user provided a schema, an additional set of questions is generated from that schema (with JSON Forms), and the user can fill in the scientific metadata details based on it.
  • A confirmation page lets the user review the entered metadata.
  • The dataset is added after the form is submitted.
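
The "valid object" check on the fetched schema could look roughly like the sketch below. This is only an illustration, not the code in this PR: the helper names isJsonObject and parseSchemaText are hypothetical, and it assumes the schema text has already been downloaded from the user-supplied URL.

```typescript
// Hypothetical helper: true only for plain JSON objects
// (arrays, null, and primitives are rejected).
function isJsonObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: parses the raw text fetched from the schema URL.
// Returns the parsed object, or null if the text is not a JSON object.
function parseSchemaText(text: string): Record<string, unknown> | null {
  try {
    const parsed: unknown = JSON.parse(text);
    return isJsonObject(parsed) ? parsed : null;
  } catch {
    // The text was not valid JSON at all.
    return null;
  }
}
```

Note that this mirrors the description above: it does not validate that the object is a well-formed JSON schema, only that it is an object.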

For point 1:

  • config.json changes include this new object:
"ingestorComponent": {
    "ingestorEnabled": true,
    "ingestorAutodiscoveryOptions": [
      {
        "mailDomain": "university.org",
        "description": "University/facility of Choice",
        "facilityBackend": "https://facility-ingestor.facility.org"
      }
    ]
  },
  • The ingestorEnabled value controls whether the component is turned on at all. When it is false, calls to the ingestor route are redirected to the 404 page. When it is true, the ingestor component is available at /ingestor/, with a link in the hamburger menu.

  • ingestorAutodiscoveryOptions is optional and holds an array of available facilities running the ingestor software.

  • facilityBackend is a reachable backend of the ingestor service.

  • mailDomain is treated as a regular expression and matched against the email of the logged-in user. On a match, the frontend automatically connects to the respective backend. A regular expression is used so that emails of the form "staff.university.org" or similar also match.

  • description is optional, but when mailDomain matches, it prefills the creationLocation property in the dataset schema.

  • The ingestor component (when used with the backend) looks similar to Point 2: it is a set of dialogs for ingesting a SciCat dataset and its scientific metadata, with most of the information prefilled. For this, it interacts with the ingestor backend, which does all the hard work, such as:

    • loading the set of available methods, which correspond to the available metadata extractors
    • upon method selection, extracting the metadata into a newly generated JSON object that is used as the scientificMetadata
    • creating a dataset in SciCat
    • creating a job that transfers the data to tape
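
As a rough illustration of the autodiscovery options described above, the matching could be sketched as follows. The interface shapes follow the config.json keys listed in this description; the function name findFacilityForEmail is hypothetical, and matching only the part after "@" is an assumption made for this sketch, not necessarily how the component does it.

```typescript
// Shapes derived from the config.json keys in this description.
interface IngestorAutodiscoveryOption {
  mailDomain: string; // interpreted as a regular expression
  description?: string; // optional, prefills creationLocation on a match
  facilityBackend: string; // URL of the facility's ingestor backend
}

interface IngestorComponentConfig {
  ingestorEnabled: boolean;
  ingestorAutodiscoveryOptions?: IngestorAutodiscoveryOption[];
}

// Hypothetical helper: returns the first facility whose mailDomain pattern
// matches the user's email domain, or undefined if none matches.
function findFacilityForEmail(
  email: string,
  config: IngestorComponentConfig,
): IngestorAutodiscoveryOption | undefined {
  if (!config.ingestorEnabled) {
    return undefined;
  }
  // Assumption: only the part after "@" is tested against the pattern.
  const domain = email.split("@").pop() ?? "";
  return (config.ingestorAutodiscoveryOptions ?? []).find((option) => {
    try {
      return new RegExp(option.mailDomain, "i").test(domain);
    } catch {
      // An invalid pattern in the config is treated as "no match".
      return false;
    }
  });
}
```

With the example config above, an email such as jane@staff.university.org would resolve to the https://facility-ingestor.facility.org backend, because the unanchored pattern "university.org" also matches subdomains.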

Tests included

  • Included for each change/fix?
  • Passing? (Merge will not be approved unless this is checked)

Documentation

  • swagger documentation updated [required]
  • official documentation updated [nice-to-have]

official documentation info

If you have updated the official documentation, please provide PR # and URL of the pages where the updates are included

Backend version

  • Does it require a specific version of the backend
  • which version of the backend is required:

sofyalaski and others added 30 commits December 17, 2024 13:17
…ent (SciCatProject#1673)

* fix: optimize condition editing logic in DatasetsFilterSettingsComponent

* if user creates duplicated condition, do nothing

* add snackbar notification for duplicate condition in DatasetsFilterSettingsComponent

* remove unused import

* remove panelClass from snackBar

* added e2e test for the change
* feat: add the new auth service to prepare for the new sdk

* try to fix some ai-bot review suggestions

* add the note for the good review suggestion from ai-bot

* remove old sdk and adjust types against the new one

* fix more types and issues against the new sdk

* finalize type error fixes

* remove prefix

* add the new sdk generation script for local development

* start fixing TODOs after newly generated sdk

* fixed sdk local generation for linux

* update the sdk package version and fix some more types

* detect the OS and use the right current directory path

* improve types and fix more TODOs

* improve types and fix TODOs after backend improvements

* finalize TODOs and FIXMEs fixes and type improvements with the new sdk

* fix some sourcery-ai comments

* fix some of the last TODOs

* adapted sdk generation to unix environment

* ignore the @SciCatProject that is generated with the sdk

* start fixing tests with the new sdk

* add needed stub classes and fix some more tests

* continue fixing unit tests

* try to fix e2e tests and revert some changes that need more attention for now

* changes to just run the tests

* use latest sdk

* update package-lock file

* fixing unit tests

* fix more unit tests

* continue fixing tests

* update the sdk

* fix last e2e test

* fix thumbnail unit tests

* revert some change

* finalize fixing unit tests

* revert the backend image changes after the tests pass

* add some improvements in the mocked objects for unit tests based on ai bot suggestion

* remove encodeURIComponent in the effects as it seems redundant

* fix test files after some changes

* try to use mock objects as much as possible

* update the sdk version

* update package-lock file

* update the sdk to latest

* BREAKING CHANGE: new sdk release

---------

Co-authored-by: martintrajanovski <[email protected]>
Co-authored-by: Jay <[email protected]>
Contributor

@sourcery-ai sourcery-ai bot left a comment

Sorry @sofyalaski, your pull request is larger than the review limit of 150000 diff characters

@sofyalaski sofyalaski changed the title Ingestor component feat: ingestor component for datasets Oct 6, 2025
@sofyalaski sofyalaski marked this pull request as ready for review October 7, 2025 12:49
Member

sbliven commented Oct 8, 2025

Great work putting this together @sofyalaski!
