=====================================================
Adding Ability to Store and Query Downloaded Packages
=====================================================

**Organization:** `AboutCode <https://aboutcode.org>`__

**Project:** `ScanCode.io <https://github.com/aboutcode-org/scancode.io>`__

| **Contributor:** Varsha U N
| **GitHub:** `VarshaUN <https://github.com/VarshaUN>`__
| **LinkedIn:** `Varsha U N <https://www.linkedin.com/in/varsha-un/>`__

**Mentors:**

- `Philippe Ombredanne <https://github.com/pombredanne>`__
- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`__

Overview
--------

ScanCode.io currently stores scanned packages on disk without a centralized index.
This leads to duplicate storage, keeps package data tied to individual projects, and
risks data loss when project inputs are deleted. It also makes it difficult for users
to maintain a reference of the packages used in their projects, meet source
redistribution obligations, or revisit previously scanned packages.

This project enhances ScanCode.io with structured package storage and querying, so
that downloaded packages can be indexed, reused across projects, and reliably
preserved.

Implementation
--------------

The project involved the following key components and steps:


.. figure:: /_static/gsoc2025/scancodeio_varsha/project_flow.png
   :alt: Project Flow Diagram
   :align: center
   :width: 70%

The new download archiving system addresses the limitations of ScanCode.io's
unstructured package storage. It stores downloaded packages on the local filesystem
and lets users query them, so packages can be indexed, reused across projects, and
preserved reliably.
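
The storage approach can be illustrated with a minimal Python sketch. It is not
ScanCode.io's implementation. The ``LocalPackageStore`` class, its ``put``, ``get``,
and ``find`` methods, the SHA-256 content addressing, and the JSON index file are all
assumptions made for illustration. The sketch shows how identical archives downloaded
from different origins can be written to disk only once while every origin stays
queryable. It also treats a repeated origin as an error.

.. code-block:: python

   import hashlib
   import json
   import tempfile
   from pathlib import Path


   class LocalPackageStore:
       """Hypothetical content-addressed store for downloaded package archives.

       Archives live under <root>/<first two hex chars>/<sha256> and a JSON
       index maps each download origin (for example a URL) to its SHA-256.
       """

       def __init__(self, root):
           self.root = Path(root)
           self.root.mkdir(parents=True, exist_ok=True)
           self.index_path = self.root / "index.json"
           # Reload the origin index so archived packages survive restarts.
           if self.index_path.exists():
               self.index = json.loads(self.index_path.read_text())
           else:
               self.index = {}

       def _path_for(self, sha256):
           # Shard by hash prefix to avoid one huge directory.
           return self.root / sha256[:2] / sha256

       def put(self, content, origin):
           """Archive content downloaded from origin; return its SHA-256."""
           if origin in self.index:
               # Assumption: recording the same origin twice is an error.
               raise ValueError(f"origin already archived: {origin}")
           sha256 = hashlib.sha256(content).hexdigest()
           path = self._path_for(sha256)
           if not path.exists():  # identical content is written only once
               path.parent.mkdir(parents=True, exist_ok=True)
               path.write_bytes(content)
           self.index[origin] = sha256
           self.index_path.write_text(json.dumps(self.index, indent=2))
           return sha256

       def find(self, origin):
           """Return the SHA-256 archived for origin, or None."""
           return self.index.get(origin)

       def get(self, sha256):
           """Return the archived bytes for sha256, or None if absent."""
           path = self._path_for(sha256)
           return path.read_bytes() if path.exists() else None


   if __name__ == "__main__":
       with tempfile.TemporaryDirectory() as tmp:
           store = LocalPackageStore(tmp)
           digest = store.put(b"example archive", "https://example.com/pkg-1.0.tar.gz")
           print(store.find("https://example.com/pkg-1.0.tar.gz") == digest)  # True

In this sketch, files are keyed by checksum rather than by origin, so deduplication
comes for free. The separate origin index keeps lookups by download URL cheap.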


Storage System Development:

`find`), testing normal cases, edge cases (e.g., empty files), and
errors (e.g., duplicate origins).

Linked Pull Requests
--------------------

.. list-table::
   :widths: 10 40 20
   :header-rows: 1

   * - No.
     - Name
     - Link
   * - 1
     - Add download archiving system with local filesystem provider
     - `scancode.io#1815 <https://github.com/aboutcode-org/scancode.io/pull/1815>`__
   * - 2
     - Support local package storage
     - `scancode.io#1685 <https://github.com/aboutcode-org/scancode.io/pull/1685>`__

Related Issues
--------------

.. list-table::
   :widths: 10 40 20
   :header-rows: 1

   * - No.
     - Name
     - Link
   * - 1
     - Store and retrieve on demand scanned packages/archives
     - `scancode.io#1063 <https://github.com/aboutcode-org/scancode.io/issues/1063>`__
   * - 2
     - Support local package storage
     - `scancode.io#1683 <https://github.com/aboutcode-org/scancode.io/issues/1683>`__

Pre-GSoC Work
-------------

Here are some PRs submitted before GSoC:

- `Add bluefin-container image support <https://github.com/aboutcode-org/scancode.io/pull/1620>`__
- `Tag whitedout files <https://github.com/aboutcode-org/scancode.io/pull/1529>`__
- `Support python-private-classifier <https://github.com/aboutcode-org/scancode-toolkit/pull/4075>`__
- `Parse labels in Dockerfile <https://github.com/aboutcode-org/scancode-toolkit/pull/3987>`__
- `Add OCI labels to Dockerfile <https://github.com/aboutcode-org/scancode-toolkit/pull/3987>`__
- `Extract LibreOffice documents <https://github.com/aboutcode-org/extractcode/pull/67>`__

Links
-----

- **Project Idea:** `GSoC 2025 Idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#scancodeio-add-ability-to-storequery-downloaded-packages>`__
- **GSoC Project Page:** `GSoC 2025 <https://summerofcode.withgoogle.com/programs/2025/projects/x7sA6uN6>`__
- **Proposal:** `Project Proposal <https://docs.google.com/document/d/1LfTGfatLfg9RB-OyLhlS4_h0-Tc9Q8QU1ObsCVDV_sM/edit?usp=sharing>`__

Future Work
-----------

Future work includes a web UI for the ``LocalFilesystemProvider`` that lets users
upload, search, list, and retrieve packages from within ScanCode.io. It would be built
with Django views, templates, and URL routes and backed by comprehensive tests.
Another direction is an external cloud storage backend (e.g., AWS S3) that extends the
``DownloadStore`` interface alongside the local filesystem provider, adding scalable
remote storage.
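
A pluggable storage interface can be pictured as an abstract class that several
backends implement. This sketch does not reproduce ScanCode.io's actual
``DownloadStore`` API. ``PackageStoreBackend``, its method names, ``InMemoryBackend``,
and the ``fetch_or_download`` helper are all hypothetical. ``InMemoryBackend`` is a
dependency-free stand-in for a remote object store such as S3.

.. code-block:: python

   from __future__ import annotations

   import hashlib
   from abc import ABC, abstractmethod


   class PackageStoreBackend(ABC):
       """Hypothetical storage interface that concrete backends implement."""

       @abstractmethod
       def put(self, content: bytes, origin: str) -> str:
           """Store content for an origin and return its SHA-256."""

       @abstractmethod
       def get(self, sha256: str) -> bytes | None:
           """Return stored content for a SHA-256, or None."""

       @abstractmethod
       def find(self, origin: str) -> str | None:
           """Return the SHA-256 recorded for an origin, or None."""


   class InMemoryBackend(PackageStoreBackend):
       """Stand-in for a remote object store (e.g. an S3 bucket)."""

       def __init__(self):
           self._blobs: dict[str, bytes] = {}
           self._origins: dict[str, str] = {}

       def put(self, content, origin):
           sha256 = hashlib.sha256(content).hexdigest()
           self._blobs.setdefault(sha256, content)  # keep one copy per checksum
           self._origins[origin] = sha256
           return sha256

       def get(self, sha256):
           return self._blobs.get(sha256)

       def find(self, origin):
           return self._origins.get(origin)


   def fetch_or_download(store: PackageStoreBackend, origin: str, download) -> bytes:
       """Reuse an archived package if present; otherwise download and archive it."""
       sha256 = store.find(origin)
       if sha256 is not None:
           cached = store.get(sha256)
           if cached is not None:
               return cached
       content = download(origin)
       store.put(content, origin)
       return content

``fetch_or_download`` depends only on the abstract methods. Replacing the in-memory
class with a filesystem-backed or S3-backed class would therefore leave calling code
unchanged.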

Closing Note
------------

During GSoC 2025, my mentors and I held weekly meetings to discuss progress,
challenges, and next steps. I am deeply grateful to my mentors for their guidance
and support, which greatly enriched my learning experience.
