@chinyeungli chinyeungli commented Apr 15, 2025

… of test code that will need to be removed).

Signed-off-by: Chin Yeung Li <[email protected]>
```
pkg:golang/github.com/*
pkg:golang/gitlab.com/*
pkg:golang/bitbucket.org/*
```

Signed-off-by: Chin Yeung Li <[email protected]>
 * Collect metadata from the API for the following namespaces:
 ```
 pkg:golang/github.com/*
 pkg:golang/gitlab.com/*
 pkg:golang/bitbucket.org/*
 ```
 * Add tests
 * Add "golang" to the "supported_ecosystems" list in api.py

Signed-off-by: Chin Yeung Li <[email protected]>
@chinyeungli chinyeungli requested a review from JonoYang April 15, 2025 22:40
@chinyeungli chinyeungli changed the title from "596 add on demand package data collection for golang" to "add on demand package data collection for golang #596" Apr 15, 2025
Member

@JonoYang JonoYang left a comment

@chinyeungli I am looking at https://github.com/package-url/purl-spec/blob/main/PURL-SPECIFICATION.rst#rules-for-each-purl-component and I am not sure we can add gitlab.com to the package namespace. Otherwise, the code looks good.


if from_go_lang:
    packages[0].type = "golang"
    packages[0].namespace = "github.com/" + packages[0].namespace
Member

@chinyeungli could there be golang packages not from github?

Contributor Author

Yes. Only golang packages from github use this map_fetchcode_supported_package function.
Others will use map_golang_package()

version = ""
if "@" in purl_str:
    version = purl_str.rpartition("@")[2]
subset = purl_str.partition("pkg:golang/gitlab.com/")[2].partition("@")[0]

license_text = package_data.get("licenses")
extracted_license_statement = [license_text]

download_url = (
Contributor

Go has some weird rules to encode upper case in these strings. See https://github.com/aboutcode-org/go-inspector/blob/442bc5b83d5aeff2b7a27937ec82b63277bc8f7c/src/go_inspector/utils.py#L211

We are adding support for getting the golang download URL to the PURL library. @pombredanne @chinyeungli I think we can reuse that here?
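
For context, a minimal sketch of the upper-case encoding mentioned above: in Go module proxy paths, each ASCII upper-case letter is written as `!` followed by its lower-case form. The function names and the `proxy.golang.org` base URL below are illustrative assumptions, not the go-inspector or packageurl-python helpers.

```python
def escape_go_path(value: str) -> str:
    """Escape a Go module path or version for use in a proxy URL.

    Each ASCII upper-case letter becomes '!' plus its lower-case form;
    every other character is kept unchanged.
    """
    return "".join(f"!{ch.lower()}" if "A" <= ch <= "Z" else ch for ch in value)


def go_proxy_zip_url(module: str, version: str, base: str = "https://proxy.golang.org") -> str:
    """Build a module zip URL (hypothetical helper; base URL is an assumption)."""
    return f"{base}/{escape_go_path(module)}/@v/{escape_go_path(version)}.zip"


print(escape_go_path("github.com/Azure/azure-sdk-for-go"))
print(go_proxy_zip_url("github.com/Azure/azure-sdk-for-go", "v1.2.3"))
```

Once the packageurl-python change lands, the library helper should replace a hand-rolled version like this.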

Contributor

Here is the PR for that: package-url/packageurl-python#195.

Contributor Author

@TG1999 Thanks. We can use the code there once it's merged.

chinyeungli and others added 5 commits July 29, 2025 15:30
    * This is so we can use the updated packageurl-python library

Signed-off-by: Jono Yang <[email protected]>
 * purldb depends on scancodeio which depends on sctk 32.4.0 (scancodeio 35.1.0 depends on scancode-toolkit==32.4.0)

Signed-off-by: Chin Yeung Li <[email protected]>
@pombredanne pombredanne changed the title from "add on demand package data collection for golang #596" to "add on demand package data collection for golang, gitlab and bitbucket #596" Sep 2, 2025
for item in data["values"]:
    version = item["name"]
    author = ""
    if "target" in item and item["target"]:
Contributor

What about using .get here?

target = item.get("target") or {}
author = target.get("author") or {}
if author.get("type") == "author":
    user = author.get("user") or {}
    author_display_name = user.get("author")
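
The same chained `.get` pattern could also be wrapped in a small helper. This is only a sketch: `dig` is a hypothetical name, and the sample payload (including the `display_name` key) is invented for illustration, so the real API response may use different keys.

```python
from typing import Any


def dig(data: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts, returning `default` if any level is missing or not a dict."""
    current = data
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


# Invented sample item; key names are assumptions, not the confirmed payload.
sample_item = {
    "name": "v1.0.0",
    "target": {"author": {"type": "author", "user": {"display_name": "Jane Doe"}}},
}

print(dig(sample_item, "target", "author", "user", "display_name", default=""))
print(repr(dig({"name": "v1.0.1", "target": None}, "target", "author", default="")))
```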

break

for tag in data:
    version_list.append(tag["name"])
Contributor

If the "name" property does not exist on a tag, it should not crash. We should log and continue.
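
Something along these lines, as a rough sketch. `collect_tag_names` is a hypothetical name, and the input is assumed to be the decoded JSON list of tag objects.

```python
import logging

logger = logging.getLogger(__name__)


def collect_tag_names(tags):
    """Return the tag names, logging and skipping any entry without a usable "name"."""
    version_list = []
    for tag in tags:
        name = tag.get("name") if isinstance(tag, dict) else None
        if not name:
            # Upstream contract may have changed; record it and keep going.
            logger.warning("Skipping tag without a 'name' property: %r", tag)
            continue
        version_list.append(name)
    return version_list


print(collect_tag_names([{"name": "v1.0.0"}, {"commit": {}}, {"name": "v1.1.0"}]))
```

The warning gives us a trace when upstream changes the response shape, without aborting the whole collection.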

data = response.json()
version_author_list = []
# Get all available versions
for item in data:
Contributor

IMO .get would be a better option.

]
data = response.json()
# Search for license files in the root directory
for item in data["values"]:
Contributor

IMO a .get will be a better choice

Contributor

@TG1999 TG1999 left a comment

@chinyeungli this mostly looks good to me. IMO we should use .get instead of directly getting an item using dict["foo"], so we can log that and know whenever the contract changes upstream.

Contributor

@TG1999 TG1999 left a comment

LGTM! Thanks

@TG1999 TG1999 merged commit edff9e1 into main Sep 9, 2025
3 of 6 checks passed

Development

Successfully merging this pull request may close these issues.

Add on-demand package data collection for golang

4 participants