sarafarajnasardi

  • Add maven.py module with enhanced JAR detection for Maven packages
  • Detect Maven JARs via pom.properties files and URL pattern analysis
  • Convert JAR PURLs to correct Maven format (pkg:jar → pkg:maven)
  • Add comprehensive test suite covering all detection scenarios
  • Update scan_codebase and inspect_packages pipelines

Fixes #1836
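
For readers unfamiliar with the two detection routes listed above, here is a minimal sketch of how they could work. It is not the code in this PR: the function names (`parse_pom_properties`, `coordinates_from_maven_url`, `maven_purl`) and the `/maven2/` path marker are assumptions for illustration, it uses only the standard library, and it skips the percent-encoding and qualifier handling a real PURL library would perform.

```python
from urllib.parse import urlparse


def parse_pom_properties(text):
    """Return (groupId, artifactId, version) from pom.properties text, or None."""
    coords = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        coords[key.strip()] = value.strip()
    required = ("groupId", "artifactId", "version")
    if not all(coords.get(key) for key in required):
        return None
    return tuple(coords[key] for key in required)


def coordinates_from_maven_url(url, marker="/maven2/"):
    """Guess coordinates from a repository-layout URL (hypothetical helper).

    Assumes the layout <marker><group/as/path>/<artifact>/<version>/<file>.jar.
    """
    path = urlparse(url).path
    if marker not in path or not path.endswith(".jar"):
        return None
    segments = path.split(marker, 1)[1].split("/")
    if len(segments) < 4:
        return None
    *group_parts, artifact_id, version, filename = segments
    # The file name should start with "<artifact>-<version>".
    if not filename.startswith(f"{artifact_id}-{version}"):
        return None
    return ".".join(group_parts), artifact_id, version


def maven_purl(group_id, artifact_id, version):
    """Build a pkg:maven PURL string (no percent-encoding in this sketch)."""
    return f"pkg:maven/{group_id}/{artifact_id}@{version}"
```

With either helper returning a coordinate tuple, `maven_purl(*coords)` yields the `pkg:maven/...` form that would replace a `pkg:jar/...` PURL.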

Signed-off-by: Sara Faraj <[email protected]>
@sarafarajnasardi sarafarajnasardi force-pushed the maven-jar-return-jar-package-type-1836 branch from 74cf7dd to c4cdb81 Compare September 9, 2025 11:43
Member

@pombredanne pombredanne left a comment

Also, did you get all the tests passing locally? Can you paste the test run results?

@sarafarajnasardi
Author

Thanks for the feedback @pombredanne! You're absolutely right about code reuse being better than duplication. I've updated the implementation to leverage existing utilities from the ScanCode ecosystem:

Code Reuse Implementation ✅

  • Added conditional imports for packagedcode.maven utilities from scancode-toolkit
  • Integrated packageurl.contrib.url2purl for URL-to-PURL conversion from packageurl-python
  • Used packagedcode.utils.get_base_purl for canonical PURL normalization
  • Maintained graceful fallbacks when external utilities are unavailable to ensure backward compatibility
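
For context, a guarded import with a fallback, as described in the list above, usually has roughly the following shape. This is a sketch, not the PR's code: `packagedcode.maven` is the module named above, while `HAS_TOOLKIT_MAVEN` and `describe_backend` are hypothetical names added for illustration.

```python
# Guarded import: use the toolkit module when installed, otherwise fall back.
try:
    from packagedcode import maven as toolkit_maven
except ImportError:
    toolkit_maven = None

# Hypothetical flag recording which branch was taken at import time.
HAS_TOOLKIT_MAVEN = toolkit_maven is not None


def describe_backend():
    """Report which code path later calls would use (illustrative only)."""
    return "toolkit" if HAS_TOOLKIT_MAVEN else "fallback"
```

One consequence of this pattern is that a missing dependency silently changes behavior at runtime instead of failing at import time.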

Test Results ✅

All 8 tests now pass:

$ docker exec -it scancodeio-web-1 python manage.py test scanpipe.tests.pipes.test_maven -v 2

Found 8 test(s).
System check identified no issues (0 silenced).

test_detect_maven_jars_from_input_source_url ... ok
test_detect_maven_jars_from_pom_properties_basic ... ok  
test_extract_maven_coordinates_from_pom_properties ... ok
test_extract_maven_coordinates_from_url_invalid ... ok
test_extract_maven_coordinates_from_url_maven_central ... ok
test_extract_maven_coordinates_missing_fields ... ok
test_no_maven_jars_detected ... ok
test_validate_maven_coordinates_against_jar_package ... ok

----------------------------------------------------------------------
Ran 8 tests in 0.392s
OK

The implementation now properly reuses existing code from purldb, scancode-toolkit, and packageurl-python repositories as requested, while maintaining robust fallback mechanisms for when optional dependencies aren't available. No breaking changes to the existing API.


from packageurl import PackageURL

# Try to import existing Maven utilities for better code reuse
Member

This really feels vibe coded. Do not use conditional imports; this is bad form.

Member

@pombredanne pombredanne left a comment

So:

  1. reuse the code we have and drop your duplicated code entirely
  2. no conditional imports. Just do 1.
  3. the core mods should be in scancode-toolkit and then imported and reused here.
  4. you are likely smart. Do not vibe code or use LLMs to generate the stuff as that is wasting my precious review time. Write your own code please. I will not say that twice.

@sarafarajnasardi
Author

Thank you for the feedback! I've completely rewritten the Maven detection code to use existing ScanCode Toolkit functions with minimal custom logic. The new implementation leverages packagedcode.get_package_handler() directly instead of manual parsing, reducing the code from 60+ lines to 18 lines while maintaining all functionality.

@sarafarajnasardi sarafarajnasardi force-pushed the maven-jar-return-jar-package-type-1836 branch 2 times, most recently from 39246ac to c4cdb81 Compare September 12, 2025 09:03
Use toolkit functions instead of custom parsing
Simplify coordinate extraction logic

Signed-off-by: Sara Faraj <[email protected]>
Successfully merging this pull request may close these issues.

maven jar return 'jar' as package type instead of 'maven'
2 participants