add on demand package data collection for golang, gitlab and bitbucket #596 #608
… golang Signed-off-by: Chin Yeung Li <[email protected]>
… of test code that will need to be removed). Signed-off-by: Chin Yeung Li <[email protected]>
```
pkg:golang/github.com/*
pkg:golang/gitlab.com/*
pkg:golang/bitbucket.org/*
```
Signed-off-by: Chin Yeung Li <[email protected]>
Signed-off-by: Chin Yeung Li <[email protected]>
* Collect metadata from the API for the following "namespace" values:
```
pkg:golang/github.com/*
pkg:golang/gitlab.com/*
pkg:golang/bitbucket.org/*
```
* Add tests
* Add "golang" to the "supported_ecosystems" list in api.py

Signed-off-by: Chin Yeung Li <[email protected]>
@chinyeungli I am looking at https://github.com/package-url/purl-spec/blob/main/PURL-SPECIFICATION.rst#rules-for-each-purl-component and I am not sure whether we can put gitlab.com in the package namespace. Otherwise, the code looks good.
if from_go_lang:
    packages[0].type = "golang"
    packages[0].namespace = "github.com/" + packages[0].namespace
@chinyeungli could there be golang packages not from github?
Yes. Only golang packages from github use this map_fetchcode_supported_package function.
Others will use map_golang_package()
version = ""
if "@" in purl_str:
    version = purl_str.rpartition("@")[2]
subset = purl_str.partition("pkg:golang/gitlab.com/")[2].partition("@")[0]
@pombredanne https://github.com/package-url/purl-spec/blob/main/PURL-SPECIFICATION.rst#rules-for-each-purl-component
Does this mean we cannot have things like gitlab.com in the namespace field?
FYI, this is about getting the subset and version from a purl string, which are then passed to https://github.com/aboutcode-org/purldb/blob/596_add_on-demand_package_data_collection_for_golang/minecode/collectors/golang.py#L253
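For readers following the thread, here is a minimal sketch of the split being discussed, using the same `partition`/`rpartition` approach as the diff excerpt. The function name `split_gitlab_golang_purl` and the sample purl are hypothetical, not taken from the PR, and qualifiers (`?`) or subpaths (`#`) in the purl are deliberately not handled.

```python
def split_gitlab_golang_purl(purl_str, prefix="pkg:golang/gitlab.com/"):
    """Return (subset, version) for a GitLab-hosted golang purl string.

    Hypothetical helper for illustration only. Qualifiers and subpaths
    are not handled; a purl containing them would need real purl parsing.
    """
    if not purl_str.startswith(prefix):
        raise ValueError(f"Not a {prefix} purl: {purl_str!r}")

    # Everything after the last "@" is treated as the version, if present.
    version = ""
    if "@" in purl_str:
        version = purl_str.rpartition("@")[2]

    # Everything between the prefix and the first "@" is the subset
    # (the project path on gitlab.com).
    subset = purl_str[len(prefix):].partition("@")[0]
    return subset, version


# Sample purl made up for this example.
print(split_gitlab_golang_purl("pkg:golang/gitlab.com/some-group/some-project@v1.2.3"))
```

Checking the prefix up front makes a non-GitLab purl fail loudly, whereas a bare `partition` would quietly return an empty subset.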
Signed-off-by: Chin Yeung Li <[email protected]> Co-authored-by: Jono Yang <[email protected]>
Signed-off-by: Chin Yeung Li <[email protected]>
Signed-off-by: Chin Yeung Li <[email protected]>
Signed-off-by: Chin Yeung Li <[email protected]>
        
          
minecode/miners/golang.py (outdated)

license_text = package_data.get("licenses")
extracted_license_statement = [license_text]

download_url = (
Go has some weird rules to encode upper case in these strings. See https://github.com/aboutcode-org/go-inspector/blob/442bc5b83d5aeff2b7a27937ec82b63277bc8f7c/src/go_inspector/utils.py#L211
We are adding support for getting the golang download URL in the PURL library. @pombredanne @chinyeungli I think we can reuse that here?
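As a rough illustration of the upper-case rule mentioned above: in the Go module proxy protocol, each upper-case ASCII letter in a module path or version is replaced by `!` followed by its lower-case form. The sketch below assumes the public `proxy.golang.org` proxy and uses hypothetical helper names; it is not the code from go-inspector or packageurl-python.

```python
def escape_go_module_path(path):
    """Escape upper-case ASCII letters as "!" + lower-case letter.

    Hypothetical helper mirroring the Go module proxy escaping rule.
    """
    return "".join(
        "!" + char.lower() if "A" <= char <= "Z" else char
        for char in path
    )


def proxy_zip_url(module_path, version, proxy="https://proxy.golang.org"):
    """Build a module zip URL for a Go module proxy (assumed default proxy)."""
    escaped_path = escape_go_module_path(module_path)
    escaped_version = escape_go_module_path(version)
    return f"{proxy}/{escaped_path}/@v/{escaped_version}.zip"


print(escape_go_module_path("github.com/Azure/azure-sdk-for-go"))
# -> github.com/!azure/azure-sdk-for-go
```

Once the packageurl-python change is merged, the library's own implementation would be preferable to a local helper like this.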
Here is the PR for that: package-url/packageurl-python#195.
@TG1999 Thanks. We can use the code there once it's merged.
Signed-off-by: Chin Yeung Li <[email protected]>
…ages #596 Signed-off-by: Chin Yeung Li <[email protected]>
* This is so we can use the updated packageurl-python library Signed-off-by: Jono Yang <[email protected]>
* purldb depends on scancodeio which depends on sctk 32.4.0 (scancodeio 35.1.0 depends on scancode-toolkit==32.4.0) Signed-off-by: Chin Yeung Li <[email protected]>
Signed-off-by: Chin Yeung Li <[email protected]>
Signed-off-by: Chin Yeung Li <[email protected]>
Signed-off-by: Chin Yeung Li <[email protected]>
        
          
minecode/collectors/bitbucket.py (outdated)

for item in data["values"]:
    version = item["name"]
    author = ""
    if "target" in item and item["target"]:
What about using .get here?
target = item.get("target") or {}
author = target.get("author") or {}
if author.get("type") == "author":
    user = author.get("user") or {}
    author_display_name = user.get("author")
        
          
minecode/collectors/github.py (outdated)

break

for tag in data:
    version_list.append(tag["name"])
If the "name" property does not exist on a tag, this should not crash. We should log and continue.
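A minimal sketch of the log-and-continue behavior suggested here, assuming `data` is a list of tag dicts as in the excerpt above. The function name `collect_tag_names` is hypothetical and not part of the PR.

```python
import logging

logger = logging.getLogger(__name__)


def collect_tag_names(tags):
    """Return the "name" of each tag, skipping and logging malformed entries.

    Hypothetical helper for illustration; the input shape is assumed from
    the diff excerpt (a list of dicts with a "name" key).
    """
    version_list = []
    for tag in tags:
        name = tag.get("name") if isinstance(tag, dict) else None
        if not name:
            # Do not crash on an unexpected payload: record it and move on.
            logger.warning("Skipping tag without a 'name': %r", tag)
            continue
        version_list.append(name)
    return version_list


print(collect_tag_names([{"name": "v1.0.0"}, {"sha": "abc"}, {"name": "v1.1.0"}]))
# -> ['v1.0.0', 'v1.1.0']
```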
data = response.json()
version_author_list = []
# Get all available versions
for item in data:
IMO .get would be a better option.
        
          
minecode/miners/bitbucket.py (outdated)

]
data = response.json()
# Search for license files in the root directory
for item in data["values"]:
IMO .get would be a better choice here.
@chinyeungli this mostly looks good to me. IMO we should use .get instead of accessing an item directly with dict["foo"], so we can log the miss and know whenever the upstream contract changes.
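One way this could look, as a sketch only: a small helper that walks nested keys with safe lookups and logs the first missing key path, so a change in the upstream response shape shows up in the logs instead of as a `KeyError`. The helper name `get_nested` and the `display_name` key in the sample data are assumptions for illustration, not part of the PR.

```python
import logging

logger = logging.getLogger(__name__)


def get_nested(data, *keys, default=None):
    """Follow `keys` through nested dicts, logging the first missing step.

    Hypothetical helper; returns `default` when any step is absent or None.
    """
    current = data
    for depth, key in enumerate(keys):
        if not isinstance(current, dict) or current.get(key) is None:
            missing_path = ".".join(keys[: depth + 1])
            logger.warning("Upstream response is missing %r", missing_path)
            return default
        current = current[key]
    return current


# Sample item made up for this example.
item = {"name": "v1.0.0", "target": {"author": {"user": {"display_name": "Jane"}}}}
print(get_nested(item, "target", "author", "user", "display_name"))
print(get_nested({"name": "v1.0.0"}, "target", "author", default=""))
```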
Signed-off-by: Chin Yeung Li <[email protected]>
Signed-off-by: Chin Yeung Li <[email protected]>
Signed-off-by: Chin Yeung Li <[email protected]>
LGTM! Thanks