Imageomics · egrace479 · Jun 3, 2025 · Feb 14, 2025 · Feb 14, 2025 · Feb 17, 2025
diff --git a/.markdownlint.json b/.markdownlint.json
@@ -0,0 +1,5 @@
+{
+  "MD007": { "indent": 4 },
+  "no-hard-tabs": false,
+  "MD013": false
+}
diff --git a/docs/index.md b/docs/index.md
@@ -15,7 +15,7 @@ Check out our guides to get your project off on the right foot!
 
 - [The Hugging Face Repo Guide](wiki-guide/Hugging-Face-Repo-Guide.md): Analogous expected and suggested repository contents for Hugging Face repositories; there are notable differences from GitHub in both content and structure.
 
-- [Metadata Guide](wiki-guide/Metadata-Guide.md): Guide to metadata collection and documentation. This closely follows our [HF Dataset Card Template](wiki-guide/HF_DatasetCard_Template_mkdocs.md) sections.
+- [FAIR Guide](wiki-guide/FAIR-Guide.md): Guide to producing FAIR digital products, from metadata collection through product documentation and publication. This builds on the content in both the GitHub and Hugging Face Repository Guides, providing checklists to ensure [code](wiki-guide/Code-Checklist.md), [data](wiki-guide/Data-Checklist.md), and [model](wiki-guide/Model-Checklist.md) repositories are FAIR. The latter two closely follow our [HF Templates](wiki-guide/About-Templates.md).
 
 ### Project repo up, what's next?
 Check out our workflow guides for how to interact with your new repo:

diff --git a/docs/wiki-guide/Code-Checklist.md b/docs/wiki-guide/Code-Checklist.md
@@ -0,0 +1,107 @@
+# Code Checklist
+
+This checklist provides an overview of essential and recommended elements to include in a GitHub repository to ensure that it conforms to FAIR principles and best practices for reproducibility. Along with the generation of a DOI (see [DOI Generation](DOI-Generation.md) and [Digital Products Release and Licensing Policy](Digital-products-release-licensing-policy.md)), following this checklist ensures compliance with the FAIR Principles for research software.[^1]
+[^1]: Barker, M., Chue Hong, N. P., Katz, D. S., Lamprecht, A. L., Martinez-Ortiz, C., Psomopoulos, F., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., & Honeyman, T. (2022). Introducing the FAIR Principles for research software. _Scientific data_, 9(1), 622. [URL](https://doi.org/10.1038/s41597-022-01710-x).
+
+!!! tip "Pro tip"
+
+    Use the eye icon at the top of this page to access the source and copy the markdown for the checklist below into an issue on your GitHub [Repo](GitHub-Repo-Guide.md) or [Project](Guide-to-GitHub-Projects.md) so you can check the boxes as you add each element to your GitHub repository.
+
+## Required Files
+
+- [ ] **License**: Verify and include an appropriate license (e.g., `MIT` or `CC0-1.0`). See discussion in the [Repo Guide](GitHub-Repo-Guide.md/#license).
+- [ ] **README File**: Following the [Repo Guide](GitHub-Repo-Guide.md/#readme), provide a detailed `README.md` with:
+    - [ ] Overview of the project.
+    - [ ] Installation instructions.
+    - [ ] Basic usage examples.
+    - [ ] Links to related/created dataset(s).
+    - [ ] Links to related/created model(s).
+    - [ ] Acknowledge source code dependencies and contributors.
+    - [ ] Reference related datasets used in training or evaluation.
+- [ ] **Requirements File**: Provide a [file detailing software requirements](GitHub-Repo-Guide.md/#software-requirements-file), such as a `requirements.txt` or `pyproject.toml` for Python dependencies.
+- [ ] **Gitignore File**: Include a `.gitignore`; GitHub provides [premade templates](https://github.com/github/gitignore) tailored to particular languages (e.g., [R](https://github.com/github/gitignore/blob/main/R.gitignore) or [Python](https://github.com/github/gitignore/blob/main/Python.gitignore)), operating systems, etc.
+- [ ] **CITATION CFF**: A `CITATION.cff` file facilitates citation of your work; follow the guidance provided in the [Repo Guide](GitHub-Repo-Guide.md/#citation).
+
+### Data-Related
+
+- [ ] Preprocessing code.
+- [ ] Description of dataset(s), including description of training and testing sets (with links to relevant portions of dataset card, which will have more information).
+
+### Model-Related
+
+- [ ] Training code.
+- [ ] Inference/evaluation code.
+- [ ] Model weights (if not in Hugging Face model repository).
+- [ ] Description of model(s)/benchmark(s).
+- [ ] Explanation of training and testing (with links to relevant portions of model card, which will have more information).
+
+!!! note
+    The [bioclip GitHub repository](https://github.com/Imageomics/bioclip) provides an example of incorporating data- and model-related code into a GitHub repository as published open-source code for both data and model development.
+
+## General Information
+
+- [ ] **Repository Structure**: Ensure the code repository follows a clear and logical directory structure. (See [Repo Guide](GitHub-Repo-Guide.md/#general-repository-structure).)
+- [ ] **Code Comments**: Include meaningful inline comments and function descriptions for clarity.
+- [ ] **Random Seed Control**: Save seed(s) for random number generator(s) to ensure reproducible results.
+
+## Security Considerations
+
+- [ ] **Sensitive Data Handling**: Ensure no hardcoded sensitive information (e.g., API keys, credentials) is included in your repository. Such information can be shared through a config file on OSC.
+
+!!! note
+    The best practices described below will help you meet the above requirements. The more advanced development practices noted further down are included for educational purposes and are highly recommended. Though these may go beyond what is expected for a given project, we advise collaborators to at least discuss the topics covered in [Code Quality](#code-quality) and whether the other practices would be appropriate for their project.
+
+---
+
+## Best Practices
+
+The [Repo Guide](GitHub-Repo-Guide.md/) provides general guidance on repository structure, [collaborative workflow](The-GitHub-Workflow.md/), and [how to make and review pull requests (PR)](The-GitHub-Pull-Request-Guide.md/). Below, we highlight some best practices in checklist form to help you meet the requirements described above for a FAIR and Reproducible project.
+
+### Reproducibility
+
+- **Version Control**: Use Git for version control and commit regularly.
+- **Modularization**: Structure code into reusable and independent modules.
+- **Code Execution**: Provide Notebooks to demonstrate how to reproduce results.
+
+### Code Review & Maintenance
+
+- **Code Reviews**: Conduct regular peer reviews for quality assurance. Refer to the [GitHub PR Review Guide](The-GitHub-Pull-Request-Guide.md/#2-review-a-pull-request).
+- **Issue Tracking**: Use GitHub issues for tracking bugs and feature requests.
+- **Versioning**: Tag releases; changelogs can be auto-generated and are informative when PRs are appropriately scoped.
+
+### Installation and Dependencies
+
+- [ ] **Environment Setup**: Include setup instructions (e.g., `conda` environment file, `Dockerfile`).
+- [ ] **Dependency Management**: Use virtual environments and the frameworks that manage them (e.g., `venv`, `conda`, `uv` for Python) to isolate dependencies.
+
+---
+
+## More Advanced Development
+
+### Documentation
+
+- [ ] **API Documentation**: Generate API documentation (e.g., [`MkDocs`](https://www.mkdocs.org) for Python or wiki pages in the repo).
+- [ ] **Docstrings**: Add comprehensive docstrings for all functions, classes, and modules; documentation generators can incorporate them. Generative AI tools with access to your code, such as GitHub Copilot, can draft docstrings quite accurately, especially if you use type annotations.
+- [ ] **Example Scripts**: Include example scripts for common use cases.
+- [ ] **Configuration Files**: Use `yaml`, `json`, or `ini` for configuration settings.
+
+### Code Quality
+
+- [ ] **Consistent Style**: Follow coding style guidelines (e.g., `PEP 8` for Python).
+- [ ] **Linting**: Ensure the code passes a linter (e.g., `Ruff` for Python).
+- [ ] **Logging**: Use logging instead of print statements for better debugging (e.g., `logging` in Python).
+- [ ] **Error Handling**: Implement robust exception handling to avoid crashes or bogus results from input outside of code expectations.
+
+### Testing
+
+- [ ] **Unit Tests**: Write unit tests to validate core functionality.
+- [ ] **Integration Tests**: Ensure components work together correctly.
+- [ ] **Test Coverage**: Check test coverage, e.g., using [Coverage](https://coverage.readthedocs.io/).
+- [ ] **Continuous Integration (CI)**: Set up CI/CD pipelines (e.g., [GitHub Actions](https://docs.github.com/en/actions)) for automated testing.
+
+### Code Distribution & Deployment
+
+- [ ] **Packaging**: Package the code so it can be installed, and provide installation instructions (e.g., `setup.py`, `hatch`, `poetry`, `uv` for Python).
+- [ ] **Deployment Guide**: Document deployment procedures.
+
+!!! question "[Questions, Comments, or Concerns?](https://github.com/Imageomics/Imageomics-guide/issues)"
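As an illustration of the **Random Seed Control**, **Logging**, and **Error Handling** items in the Code Checklist above, here is a minimal Python sketch. It is one possible approach rather than a prescribed Imageomics pattern; the function names `seeded_sample` and `save_run_config` and the `run_config.json` file name are hypothetical, chosen only for this example.

```python
import json
import logging
import random

logger = logging.getLogger(__name__)


def seeded_sample(items, k, seed):
    """Return k items drawn reproducibly from items (hypothetical helper)."""
    # Reject input outside expectations instead of returning a bogus result.
    if not 0 <= k <= len(items):
        raise ValueError(f"k must be between 0 and {len(items)}, got {k}")
    # A local generator avoids depending on hidden global random state.
    rng = random.Random(seed)
    logger.info("Sampling %d of %d items with seed %d", k, len(items), seed)
    return rng.sample(items, k)


def save_run_config(path, seed):
    """Record the seed next to the results so the run can be repeated."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"seed": seed}, f, indent=2)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    seed = 42
    subset = seeded_sample(list(range(100)), k=5, seed=seed)
    save_run_config("run_config.json", seed)
    logger.info("Selected items: %s", subset)
```

Calling `seeded_sample` twice with the same seed yields the same subset, and saving the seed alongside the outputs lets a collaborator rerun the same selection later.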
diff --git a/docs/wiki-guide/DOI-Generation.md b/docs/wiki-guide/DOI-Generation.md
@@ -1,31 +1,28 @@
 # DOI Generation
 
 This guide discusses DOI generation for digital artifacts that may be associated with publications, such as datasets, models, and software.
-You are likely familiar with DOIs from citing (journal/arXiv/conference) papers, for which they are generated by the publisher and regularly used in citations. However, they are also invaluable for proper citation of code, models, and data. One may think of this in the manner they are handled on arXiv, where there are options for "Cite as:" or "for this version" (with the "v#" at the end) option when citing a preprint.
+You are likely familiar with DOIs from citing (journal/arXiv/conference) papers, for which they are generated by the publisher and regularly used in citations. However, they are also invaluable for proper citation of code, models, and data. Similar to how DOIs help track different versions of preprints on repositories like arXiv, they can provide persistent identification and versioning for your research artifacts beyond traditional publications.
 
 ## What is a DOI?
 
-A DOI (Digital Object Identifier) is a _persistent_ (permanent) digital identifier for any object (data, model, code, etc.) that _uniquely_ distinguishes it from other objects and links to information&mdash;metadata&mdash;about the object. The International DOI Foundation (IDF) is responsible for developing and administering the DOI system. See their [What is a DOI](https://www.doi.org/the-identifier/what-is-a-doi/) article for more information.
-
+A DOI (Digital Object Identifier) is a _persistent_ (permanent) digital identifier for any object (data, model, code, etc.) that _uniquely_ distinguishes it from other objects and links to information&mdash;metadata&mdash;about the object. The International DOI Foundation (IDF) is responsible for developing and administering the DOI system. See their [What is a DOI?](https://www.doi.org/the-identifier/what-is-a-doi/) article for more information.
 
 ## How do you generate a DOI?
 
 When publishing code, data, or models, there are various options for DOI generation, and selecting one is generally dependent on where the object of interest is published. We will go over the two standard methods used by the Institute here, and we mention a third option for completeness. A comparison of these three options is provided in the [Data Archive Options Comparative Overview](../pdfs/Data_Archive-Publication-Options-Comparative-Overview.pdf).
 
-
 ### 1. Generate a DOI on Hugging Face
 
-This is the simplest method for generating a DOI for a model or dataset since [Hugging Face partnered with DataCite to offer this option](https://huggingface.co/blog/introducing-doi). 
+This is the simplest method for generating a DOI for a model or dataset since [Hugging Face partnered with DataCite to offer this option](https://huggingface.co/blog/introducing-doi).
 
 !!! warning "Warning"
-    Though it is a very simple process, it is not one to be taken lightly, as there is no removing data once this has been done--any changes require generation of a ***new*** DOI for the updated version: the old version will be maintained in perpetuity!
+    Though it is a very simple process, it is not one to be taken lightly, as there is no removing data once this has been done--any changes require generation of a _**new**_ DOI for the updated version: the old version will be maintained in perpetuity!
 
 !!! warning "Warning"
     As stated in the [Imageomics Digital Products Release and Licensing Policy](Digital-products-release-licensing-policy.md), DOIs are not to be generated for Imageomics Organization Repositories until approval has been granted by the Senior Data Scientist or Institute Leadership.
 
 Hugging Face allows for the generation of a DOI through the settings tab on the Model or Dataset. For details on _how_ to generate a DOI with Hugging Face, please see the [Hugging Face DOI Documentation](https://huggingface.co/docs/hub/doi).
 
-
 ### 2. Generate a DOI with Zenodo
 
 This is the most common method used for generating a DOI for a GitHub repository, because [Zenodo](https://zenodo.org/) has a [GitHub integration](https://zenodo.org/account/settings/github/), which is accessed through your Zenodo account settings (for more information, please see [GitHub's associated Docs](https://docs.github.com/articles/referencing-and-citing-content)). Zenodo can also be used to generate DOIs for data, as is relatively common in biology. However, for direct use of ML models and datasets, there are many more advantages to using Hugging Face; please see the [Data Archive Options Comparative Overview](../pdfs/Data_Archive-Publication-Options-Comparative-Overview.pdf) for more information.[^1]
@@ -38,11 +35,11 @@ When your GitHub and Zenodo accounts are linked, there will be a list of availab
 ![Zenodo instructions and enabled repos](images/doi-generation/enabled_repos+intstructions.png){ loading=lazy, width="800" }
 
 !!! info "The Sync now button"
-    There is a "Sync now" button at the top right of the instructions, with information on when the last sync occurred. Observe that a badge appears for the enabled repository that <b>_has_</b> a DOI, while the one without just shows up as enabled; this will also be true for repositories to which you have access but that you did not submit to Zenodo yourself.
+    There is a "Sync now" button at the top right of the instructions, with information on when the last sync occurred. Observe that a badge appears for the enabled repository that **_has_** a DOI, while the one without just shows up as enabled; this will also be true for repositories to which you have access but that you did not submit to Zenodo yourself.
 
 #### Metadata Tracking
 
-When automatically generating a DOI with Zenodo, it uses information provided in your `CITATION.cff` file to populate the metadata for the record. However, there is important information that is not supported through this integration despite its inclusion in the `CITATION.cff` format in some cases. 
+When automatically generating a DOI with Zenodo, it uses information provided in your `CITATION.cff` file to populate the metadata for the record. However, there is important information that is not supported through this integration despite its inclusion in the `CITATION.cff` format in some cases.
 
 If your repository is likely to be updated repeatedly (i.e., generating new releases), then you may consider adding a `.zenodo.json` to preserve the remaining metadata on release sync with Zenodo for DOI. This metadata includes grant (funding) information, references (which may be included in your `CITATION.cff`), and a description of your repository/code.
 
@@ -70,8 +67,8 @@ Building on the alternate edit options, there is also the option to simply gener
 
 When creating a new record on Zenodo, please ensure that other members of your project have access, as appropriate. In particular, there should be at least one member of Institute leadership or the Senior Data Scientist added to the record with management permissions. This ensures the ability to maintain the metadata and address matters related to the record (which may extend beyond your tenure with the Institute) in a timely manner.
 
-
 ### 3. Generate a DOI with Dryad
 
 [Dryad](https://datadryad.org/stash/about) is another research data repository, similar to Zenodo, through which one can archive digital objects (such as, but not limited to, data) supporting scholarly publications, and obtain a DOI. It has a review process when depositing data and requires dedication to the public domain (CC0) of all digital objects uploaded. Imageomics through OSU is a member organization of Dryad, reducing or eliminating data deposit charge(s). To determine whether Dryad is a suitable archive for Institute data products supporting your publication, please consider the [Data Archive Options Comparative Overview](../pdfs/Data_Archive-Publication-Options-Comparative-Overview.pdf) for more information, and consult with the Institute's Senior Data Scientist.[^1]
 
+!!! question "[Questions, Comments, or Concerns?](https://github.com/Imageomics/Imageomics-guide/issues)"
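Returning to the `.zenodo.json` file mentioned under Metadata Tracking: the sketch below writes one from Python with the three kinds of metadata named there (a description, grant information, and references). The field names `description`, `grants`, and `references`, and the shapes of their values, are assumptions based on Zenodo's deposit metadata and should be checked against Zenodo's current documentation. Every value is a placeholder, and `build_zenodo_metadata` is a hypothetical helper, not part of any Zenodo tooling.

```python
import json
from pathlib import Path


def build_zenodo_metadata(description, grant_ids, references):
    """Assemble a metadata dict for `.zenodo.json` (hypothetical helper)."""
    return {
        "description": description,
        # Assumed shape: one object per grant, each with an "id" key.
        "grants": [{"id": grant_id} for grant_id in grant_ids],
        # Assumed shape: a list of free-text reference strings.
        "references": list(references),
    }


if __name__ == "__main__":
    metadata = build_zenodo_metadata(
        description="PLACEHOLDER: short description of the repository.",
        grant_ids=["PLACEHOLDER-GRANT-ID"],
        references=["PLACEHOLDER: formatted reference."],
    )
    # Write the file at the repository root so it is picked up on release.
    Path(".zenodo.json").write_text(
        json.dumps(metadata, indent=2) + "\n", encoding="utf-8"
    )
```

Generating the file from a script is optional; the same JSON can be written by hand and committed alongside `CITATION.cff`.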