fix: Resolve XML validation error in OutputGitRepoXML function #16
- Fixed XML generation to properly handle special characters and CDATA sections
- Added protection against premature CDATA termination by escaping "]]>" sequences
- Improved XML formatting with consistent indentation and structure
- Simplified token placeholder replacement without breaking formatting
This commit adds support for a .gptinclude file, which allows users to explicitly specify which files should be included in the repository export. The feature complements the existing .gptignore functionality:
- When both .gptinclude and .gptignore exist, files are first filtered by the include patterns, then files matching any ignore pattern are excluded
- Added new command-line flag: -I/--include to specify a custom path to the .gptinclude file
- Default behavior looks for .gptinclude in repository root
- Added comprehensive tests for the new functionality
- Updated README.md with documentation and examples

With this change, users gain more fine-grained control over which parts of their repositories are processed by git2gpt, making it easier to focus on specific areas when working with AI language models.
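The include-then-ignore order described in this commit message can be sketched roughly as follows. This is only an illustration, not the code in prompt/prompt.go: the helper names `matchesAny` and `includeThenIgnore` are hypothetical, the standard library's `path.Match` stands in for whatever glob matcher the project actually uses, and treating an empty include list as "include everything" is an assumption.

```go
package main

import (
	"fmt"
	"path"
)

// matchesAny reports whether name matches at least one glob pattern.
// Malformed patterns are treated as non-matching in this sketch.
func matchesAny(name string, patterns []string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// includeThenIgnore is a hypothetical helper: a file is kept only if it
// passes the include patterns first and is not hit by any ignore pattern.
// Assumption: an empty include list means "include everything".
func includeThenIgnore(relPath string, include, ignore []string) bool {
	if len(include) > 0 && !matchesAny(relPath, include) {
		return false
	}
	return !matchesAny(relPath, ignore)
}

func main() {
	include := []string{"prompt/*"}
	ignore := []string{"prompt/*_test.go"}
	files := []string{"prompt/prompt.go", "prompt/gptinclude_test.go", "cmd/root.go"}
	for _, f := range files {
		fmt.Printf("%-28s %v\n", f, includeThenIgnore(f, include, ignore))
	}
}
```

Applying the include filter first keeps the ignore list as a pure exclusion step, so a file can never be re-added by an ignore rule.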
This commit fixes an issue where the XML export would fail with "unexpected EOF in CDATA section" errors when file content contained the CDATA end marker sequence ']]>'. The fix implements a CDATA handling strategy that:
- Detects all occurrences of ']]>' in file content
- Splits the content around these markers
- Creates multiple adjacent CDATA sections (CDATA sections cannot be nested) to preserve the original content
- Ensures all XML output is well-formed regardless of source content

This approach maintains the efficiency of CDATA for storing large code blocks while ensuring compatibility with all possible file content. Fixes the XML validation error that would occur when processing files containing CDATA end marker sequences.
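A minimal sketch of the splitting idea, assuming the common technique of rewriting each ']]>' as ']]' + end of section + start of a new section + '>'. The helpers `wrapCDATA` and `roundTrip` are hypothetical names and are not taken from prompt/prompt.go; the actual fix may split the content differently. The round trip through `encoding/xml` is there only to check that the output parses and that the original text comes back, including with consecutive markers.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// cdataEnd is the sequence that terminates a CDATA section.
const cdataEnd = "]]>"

// wrapCDATA (hypothetical helper) wraps content in CDATA sections.
// Each "]]>" is split so that "]]" closes one section and ">" opens the
// next, which keeps the marker from ending a section prematurely.
func wrapCDATA(content string) string {
	safe := strings.ReplaceAll(content, cdataEnd, "]]]]><![CDATA[>")
	return "<![CDATA[" + safe + "]]>"
}

// roundTrip parses the wrapped content back and returns the decoded text.
func roundTrip(content string) (string, error) {
	var v struct {
		Text string `xml:",chardata"`
	}
	doc := "<contents>" + wrapCDATA(content) + "</contents>"
	if err := xml.Unmarshal([]byte(doc), &v); err != nil {
		return "", err
	}
	return v.Text, nil
}

func main() {
	for _, s := range []string{"plain", "a]]>b", "x]]>]]>y"} {
		got, err := roundTrip(s)
		fmt.Printf("%q -> %q (same=%v, err=%v)\n", s, got, got == s, err)
	}
}
```

Because each replacement is self-contained, back-to-back ']]>' sequences are handled one after another without special casing.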
Pull Request Overview
This PR fixes XML validation errors and improves file filtering by adding support for an include list alongside the existing ignore list. Key changes include:
- Enhancing the OutputGitRepoXML function to safely handle CDATA sections.
 - Introducing functions to generate and process a .gptinclude file alongside .gptignore.
 - Updating command-line flags and function signatures in ProcessGitRepo and processRepository.
 
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description | 
|---|---|
| prompt/prompt.go | Updated XML output handling, added include list functions, and adjusted repo processing logic. | 
| prompt/gptinclude_test.go | Added test cases for the new include filtering functionality. | 
| cmd/root.go | Added a new flag for the include file and updated ProcessGitRepo invocation. | 
| README.md | Updated documentation to cover include file usage along with ignore file usage. | 
Files not reviewed (1)
- .gptinclude: Language not supported
 
Comments suppressed due to low confidence (2)
prompt/prompt.go:324
- [nitpick] Consider renaming variable 'process' to 'shouldProcess' for improved readability and clarity.
 
process := shouldProcess(relativeFilePath, includeList, ignoreList)
prompt/prompt.go:135
- Consider handling the error returned from getIgnoreList instead of ignoring it, to ensure any issues with reading the ignore file are properly reported.
 
ignoreList, _ = getIgnoreList(ignoreFilePath)
// Split content around CDATA end marker (]]>) and create multiple CDATA sections
contents := file.Contents
result.WriteString(" <contents>")
    
      
    
Copilot AI, Apr 29, 2025
Ensure that the algorithm splitting file.Contents into multiple CDATA sections properly handles consecutive ']]>' sequences to avoid malformed XML.
          
You are right. I messed up with the branches. Thank you again for this amazing project!
Pull Request: Fix XML validation error in OutputGitRepoXML function
Description
This PR fixes the XML validation error that occurs when using the -x flag to export repositories as XML. The specific error message was:

Changes Made
- Updated the OutputGitRepoXML function in prompt/prompt.go to properly handle XML special characters (&, <, >, ", ') in file paths

Testing Done
I tested this fix by:
- Running git2gpt -x -s -e -o output.xml . on repositories containing files with special XML characters
- Checking content that contains the CDATA end marker (]]>) to ensure it is properly escaped

Related Issue
Fixes #15
Before/After
Before: The tool fails with XML validation errors when files contain special characters
After: The tool successfully generates valid XML regardless of file content
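For the file-path part of the change, one way to escape the five XML special characters is the standard library's `xml.EscapeText`. The sketch below is an assumption about how such escaping could look; `escapePath` is a hypothetical helper and the PR may escape paths by other means.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escapePath (hypothetical helper) escapes &, <, >, " and ' so that a
// file path can be placed safely in XML text or an attribute value.
func escapePath(p string) (string, error) {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(p)); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	escaped, err := escapePath(`docs/a&b<c>.md`)
	if err != nil {
		fmt.Println("escape failed:", err)
		return
	}
	fmt.Println(escaped)
}
```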