diff --git a/docs/source/archive/gsoc/reports/2025/scancodeio_varsha.rst b/docs/source/archive/gsoc/reports/2025/scancodeio_varsha.rst
index 49ff029..ca2717a 100644
--- a/docs/source/archive/gsoc/reports/2025/scancodeio_varsha.rst
+++ b/docs/source/archive/gsoc/reports/2025/scancodeio_varsha.rst
@@ -1,48 +1,43 @@
-###################################################
- Adding ability to store/query downloaded packages
-###################################################
-**Organization:** `AboutCode `_
+=====================================================
+Adding Ability to Store and Query Downloaded Packages
+=====================================================
 
-**Project:** `ScanCode.io
-`_
+**Organization:** `AboutCode `__
 
-| **Varsha U N**
-| GitHub: `VarshaUN `_
-| LinkedIn: `Varsha U N `_
+**Project:** `ScanCode.io `__
 
-**Mentors:**
-
-- `Philippe Ombredanne `_
-- `Ayan Sinha Mahapatra `_
-
-**********
- Overview
-**********
+| **Contributor:** Varsha U N
+| **GitHub:** `VarshaUN `__
+| **LinkedIn:** `Varsha U N `__
 
-Currently ScanCode.io scans the packages but doesn’t store it. This
-makes it difficult for users to maintain a reference of packages used in
-their projects, meet source redistribution obligations, or revisit
-scanned packages for future.
+**Mentors:**
 
+- `Philippe Ombredanne `__
+- `Ayan Sinha Mahapatra `__
 
-This project enhanced ScanCode.io by adding the ability to store and
-query downloaded packages locally and re-use packages that were already
-scanned.
+Overview
+--------
 
-----
+ScanCode.io currently stores scanned packages on disk without a centralized index,
+leading to duplicate storage, project-specific data, and potential data loss when
+inputs are deleted. This project enhances ScanCode.io by introducing structured
+package storage and querying, enabling indexing, reuse across projects, and
+reliable preservation.
 
-****************
- Implementation
-****************
+Implementation
+--------------
 
 The project involved the following key components and steps:
 
+
 .. figure:: /_static/gsoc2025/scancodeio_varsha/project_flow.png
    :alt: Project Flow Diagram
    :align: center
    :width: 70%
 
-   Currently ScanCode.io downloads packages but does not store them. The new archiving system stores downloaded packages on the local filesystem and allows querying them.
+This project addresses the limitations of ScanCode.io's unstructured package
+storage by adding a system to index, reuse, and preserve packages reliably.
+
 
 Storage System Development:
 
@@ -79,37 +74,76 @@ Validation and Testing:
   `find`), testing normal cases, edge cases (e.g., empty files), and
   errors (e.g., duplicate origins).
 
-**********************
- Linked Pull Request:
-**********************
-Add download archiving system with local filesystem provider -
-(https://github.com/aboutcode-org/scancode.io/pull/1815)
+Linked Pull Requests
+--------------------
+
+.. list-table::
+   :widths: 10 40 20
+   :header-rows: 1
+
+   * - Sr. No
+     - Name
+     - Link
+   * - 1
+     - Add download archiving system
+     - `scancode.io#1815 `__
+   * - 2
+     - Support local package storage
+     - `scancode.io#1685 `__
+
+Related Issues
+--------------
+
+.. list-table::
+   :widths: 10 40 20
+   :header-rows: 1
+
+   * - Sr. No
+     - Name
+     - Link
+   * - 1
+     - Store and retrieve scanned packages
+     - `#1063 `__
+   * - 2
+     - Support local package storage
+     - `#1683 `__
+
+Pre-GSoC Work
+-------------
+
+Here are some PRs submitted before GSoC:
+
+- `Add bluefin-container image support `__
+- `Tag whitedout files `__
+- `Support python-private-classifier `__
+- `Parse labels in Dockerfile `__
+- `Add OCI labels to Dockerfile `__
+- `Extract LibreOffice documents `__
+
+Links
+-----
+
+- **Project Idea:** `GSoC 2025 Idea `__
+- **GSoC Project Page:** `GSoC 2025 `__
+- **Proposal:** `Project Proposal `__
+
+Future Work
+-----------
 
-****************
- Related Issue:
-****************
+Future enhancements include implementing the web UI for the `LocalFilesystemProvider`
+to enable package uploads, searches, listings, and retrievals in ScanCode.io, with
+Django views, templates, and URL routes, backed by comprehensive testing. Additionally,
+integrating an external cloud storage option (e.g., AWS S3) alongside the local
+filesystem will extend the `DownloadStore` interface, providing scalable and remote
+storage capabilities.
 
-Store and retrieve on demand scanned packages/archives -
-(https://github.com/aboutcode-org/scancode.io/issues/1063)
+Closing Note
+------------
 
-********
- Links:
-********
+During GSoC 2025, my mentors and I held weekly meetings to discuss progress,
+challenges, and next steps. I am deeply grateful to my mentors for their guidance
+and support, which greatly enriched my learning experience.
 
-| Project Idea: `Idea Link
- `_
-| GSoC Project Page: `GSoC 2025
- `_
-| Proposal: `Proposal Link
- `_
 
-***************
- Closing Notes
-***************
 
-During the GSoC coding period, my mentors and I had weekly meetings to
-discuss progress, challenges, and next steps. Thank you so much to my
-mentors for being there every step of the way during GSoC 2025. Your
-encouragement and insights made a huge difference in my learning
-journey.