docs/wiki-guide/Digital-Product-Lifecycle.md: 16 additions & 16 deletions
@@ -11,44 +11,44 @@ Although most of the engagement from the side of research teams is expected to (
The following adds additional context and direction to supplement the diagram, organized by project lifecycle stage.

-### Setup Phase:
+### Setup Phase

* NextGens and/or project[^1] PIs schedule a project consultation with the Senior Data Scientist. This will include scope and intended data usage for improved research convergence and to ensure projects start with all available resources in mind.
* In GitHub project repo, create an issue for each of the repositories for the digital products with the appropriate checklist:
-* **Code and workflows:** GitHub Repository ([Code checklist](Code-Checklist.md)).
-* **Datasets:** Hugging Face Dataset Repository ([Data checklist](Data-Checklist.md)).
-* For already published data usage, see the [Metadata Checklist](Metadata-Checklist.md).
-* **ML Models:** Hugging Face Model Repository ([Model checklist](Model-Checklist.md)).
+* **Code and workflows:** GitHub Repository ([Code checklist](Code-Checklist.md)).
+* **Datasets:** Hugging Face Dataset Repository ([Data checklist](Data-Checklist.md)).
+* For already published data usage, see the [Metadata Checklist](Metadata-Checklist.md).
+* **ML Models:** Hugging Face Model Repository ([Model checklist](Model-Checklist.md)).

-### Exploration Phase:
+### Exploration Phase

* Maintain record of any and all data utilized (source, license, citation, etc.).
-* See [Data Sources Template](https://docs.google.com/spreadsheets/d/1r4-_Ytg2bwGMxLpYrk4GVhx61JSOYXANsSFjryNmsDE/edit?usp=drive_link).
+* See [Data Sources Template](https://docs.google.com/spreadsheets/d/1r4-_Ytg2bwGMxLpYrk4GVhx61JSOYXANsSFjryNmsDE/edit?usp=drive_link).
* Document exploration of data.
-* This establishes an understanding of what the data is and how it can be used. For an example and guidance, consider the exploration and documentation done in the [Data Workshop](https://github.com/Imageomics/data-workshop-AH-2024).
+* This establishes an understanding of what the data is and how it can be used. For an example and guidance, consider the exploration and documentation done in the [Data Workshop](https://github.com/Imageomics/data-workshop-AH-2024).
* Record processing steps applied—maintained in a well-documented code repository (following [GitHub Guidance](GitHub-Repo-Guide.md))—and update Dataset Card(s) with information and links back to GitHub repository.
* Establish and update contributor list—follow the [Imageomics Author Guide](https://docs.google.com/spreadsheets/d/1GwlCukfoQPL8JI2yyWRD3g4uiMTO3tlGNE_qeb_xBCs/edit?usp=sharing).[^2]
-* Authors and author order for the paper and codebase (and/or dataset) may differ, all should be discussed.
+* Authors and author order for the paper and codebase (and/or dataset) may differ, all should be discussed.

-### Model Development Phase:
+### Model Development Phase

* Maintain a record of any and all base models utilized (source, license, citation, etc.).
* Record model experiments—scripts or Jupyter Notebooks, _documented_[^3] and maintained in GitHub for version control as different approaches are tried.
* Document model experiments and evaluation—record results of various tests performed and overall evaluation and comparison of these runs in Model Card(s) with links back to GitHub repository.
* Add all code used to generate figures to the project GitHub repository; including documentation for reproduction (e.g., package requirements, data info, instructions).
* Review (and revise as necessary) the Author/Contributor list(s).

-### Preparing for Publication:
+### Preparing for Publication Phase

* Project components should align with FAIR and Reproducibility principles:
-* Completed and fully documented GitHub Repository for code (recall [Code checklist](Code-Checklist.md)).
-* Completed and fully documented Hugging Face Dataset Repository for data products (recall [Data checklist](Data-Checklist.md)).
-* If using an already published dataset, all requisite metadata and provenance information included (recall [Metadata checklist](Metadata-Checklist.md)). Specifically, ensure that all attribution requirements and/or expectations have been appropriately met.
-* Completed and fully documented Hugging Face Model Repository for ML models (recall [Model checklist](Model-Checklist.md)).
+* Completed and fully documented GitHub Repository for code (recall [Code checklist](Code-Checklist.md)).
+* Completed and fully documented Hugging Face Dataset Repository for data products (recall [Data checklist](Data-Checklist.md)).
+* If using an already published dataset, all requisite metadata and provenance information included (recall [Metadata checklist](Metadata-Checklist.md)). Specifically, ensure that all attribution requirements and/or expectations have been appropriately met.
+* Completed and fully documented Hugging Face Model Repository for ML models (recall [Model checklist](Model-Checklist.md)).
* Schedule Review by Senior Data Scientist of data, model, and code repositories 3 weeks prior to camera-ready deadline (approval required for DOI generation).
* Review (and revise as necessary) the Author/Contributor list(s).

-[^1]: Here we use the term project at a smaller scale to mean any endeavor resulting in a digital product (dataset, ML model, code) and/or paper (e.g., for the purposes of this policy [SST](https://github.com/Imageomics/SST) is a *project*, while Butterflies is not).
+[^1]: Here we use the term project at a smaller scale to mean any endeavor resulting in a digital product (dataset, ML model, code) and/or paper (e.g., for the purposes of this policy [SST](https://github.com/Imageomics/SST) is a _project_, while Butterflies is not).

[^2]: Contributor lists should be started as early as possible and are subject to change as a project progresses; this is expected and the reason to review during each phase of development.