peteretelej
Owner

Detect the encoding of diff files instead of failing on non-UTF-8 input, especially Windows-specific cases: Windows-1252, UTF-8 with a BOM, and Latin-1.
Strip BOM markers so they do not break reads.

Fixes #1

@peteretelej peteretelej requested a review from Copilot July 17, 2025 21:25
@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements encoding detection for diff files to fix issues with Windows-specific encodings (Windows-1252, BOM, Latin-1) that were previously causing failures. The changes enable robust handling of various file encodings instead of failing when UTF-8 decoding fails.

  • Replaces single UTF-8 file reading with multi-encoding detection (UTF-8, UTF-8-sig, cp1252, latin-1)
  • Adds BOM stripping functionality to handle UTF-8 files with byte order marks
  • Includes comprehensive test coverage for different encoding scenarios
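
The fallback described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code merged in `src/parser.py`: the names `CANDIDATE_ENCODINGS` and `decode_diff_bytes` are hypothetical, and the order of attempts simply follows the list in the first bullet.

```python
# Hypothetical sketch of a multi-encoding read; not the PR's actual code.
# Encodings are tried in the order the overview lists them.
CANDIDATE_ENCODINGS = ("utf-8", "utf-8-sig", "cp1252", "latin-1")


def decode_diff_bytes(raw: bytes) -> str:
    """Decode raw diff bytes, trying each candidate encoding in turn."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # This encoding does not fit; try the next one.
        # Plain "utf-8" accepts a BOM but keeps it as U+FEFF at the start,
        # so strip it explicitly before handing the text to a parser.
        if text.startswith("\ufeff"):
            text = text[1:]
        return text
    # Unreachable in practice: latin-1 maps every byte to a character.
    raise ValueError("could not decode diff content")


def read_diff_text(path: str) -> str:
    """Read a diff file as bytes and decode it with the fallback above."""
    with open(path, "rb") as handle:
        return decode_diff_bytes(handle.read())
```

Because `latin-1` accepts any byte sequence, it works as a last resort but can produce the wrong characters for text that was written in some other single-byte encoding, which is why it comes last in the list.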

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/parser.py | Implements new _read_diff_file method with multi-encoding support and BOM handling |
| tests/test_windows_repro.py | Adds test class to verify encoding scenarios work correctly |
| tests/test_data/minimal_*.diff | Test fixtures for UTF-8, Windows CRLF, BOM, and Latin-1 encoded diff files |
| .github/ISSUE_TEMPLATE/bug_report.yml | Simplifies bug report template by removing required fields and verbose descriptions |

Comments suppressed due to low confidence (1)

tests/test_windows_repro.py:33

  • The assertion only checks that chunks > 0 but doesn't validate the actual content was parsed correctly. Consider adding assertions that verify the parsed content matches expected values for each encoding scenario.
            assert result["chunks"] > 0, f"{filename} should work with fix"
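
One way to act on this suggestion is to assert on the parsed content for each encoding, not only on the chunk count. The sketch below is an assumption-heavy illustration: `parse_diff_bytes` is a hypothetical stand-in for the project's parser, and the `"added"` key and the sample diff are invented for the example. Only the `"chunks"` key appears in the original test.

```python
# Hypothetical stand-in for the project's parser; the real API may differ.
def parse_diff_bytes(raw: bytes) -> dict:
    for encoding in ("utf-8", "cp1252", "latin-1"):
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    if text.startswith("\ufeff"):
        text = text[1:]  # Drop a leading BOM left by plain UTF-8 decoding.
    lines = text.splitlines()
    return {
        # Count hunk headers as chunks.
        "chunks": sum(1 for line in lines if line.startswith("@@")),
        # Collect added lines, skipping the "+++" file header.
        "added": [
            line[1:]
            for line in lines
            if line.startswith("+") and not line.startswith("+++")
        ],
    }


# Invented sample diff with CRLF line endings and one non-ASCII character.
DIFF_TEXT = "--- a/menu.txt\r\n+++ b/menu.txt\r\n@@ -1 +1 @@\r\n-cafe\r\n+caf\u00e9\r\n"

SCENARIOS = {
    "utf-8": DIFF_TEXT.encode("utf-8"),
    "utf-8 with BOM": DIFF_TEXT.encode("utf-8-sig"),
    "cp1252": DIFF_TEXT.encode("cp1252"),
}


def check_all_scenarios() -> None:
    for name, raw in SCENARIOS.items():
        result = parse_diff_bytes(raw)
        assert result["chunks"] == 1, f"{name}: expected exactly one chunk"
        # Content check: the accented character must survive decoding.
        assert result["added"] == ["caf\u00e9"], f"{name}: wrong parsed content"


check_all_scenarios()
```

Comparing the decoded content against a known expected value catches the case where a file decodes without error under the wrong encoding and yields garbled characters, which a `chunks > 0` check would miss.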

@peteretelej peteretelej merged commit 0317d53 into main Jul 17, 2025
9 checks passed
@peteretelej peteretelej deleted the fix/encoding branch July 17, 2025 21:28
Successfully merging this pull request may close these issues.

[Bug]: windows: some diffs fail with "No valid diff content found"
