Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions src/utils/__tests__/text-normalization.spec.ts
Original file line number Diff line number Diff line change
Expand Up @@ -75,5 +75,30 @@ describe("Text normalization utilities", () => {
expect(unescapeHtmlEntities("")).toBe("")
expect(unescapeHtmlEntities(undefined as unknown as string)).toBe(undefined)
})

it("unescapes square bracket entities (numeric)", () => {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to group all the square bracket tests together in a nested describe block for better organization? This would make the test structure clearer and easier to navigate.

expect(unescapeHtmlEntities("string[]")).toBe("string[]")
expect(unescapeHtmlEntities("array[0]")).toBe("array[0]")
expect(unescapeHtmlEntities("matrix[i][j]")).toBe("matrix[i][j]")
})

it("unescapes square bracket entities (named)", () => {
expect(unescapeHtmlEntities("string[]")).toBe("string[]")
expect(unescapeHtmlEntities("array[0]")).toBe("array[0]")
expect(unescapeHtmlEntities("matrix[i][j]")).toBe("matrix[i][j]")
})

it("handles C# array syntax with escaped square brackets", () => {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test coverage! This test case directly addresses the reported issue with C# array syntax. The test is well-documented with a comment explaining its purpose.

// Test case based on the reported issue
const input = "string[] myArray = new string[5];\nmyArray[2] = "hello";"
const expected = 'string[] myArray = new string[5];\nmyArray[2] = "hello";'
expect(unescapeHtmlEntities(input)).toBe(expected)
})

it("handles mixed square bracket entities", () => {
const input = "array[0] and [1]"
const expected = "array[0] and [1]"
expect(unescapeHtmlEntities(input)).toBe(expected)
})
})
})
4 changes: 4 additions & 0 deletions src/utils/text-normalization.ts
Original file line number Diff line number Diff line change
Expand Up @@ -91,5 +91,9 @@ export function unescapeHtmlEntities(text: string): string {
.replace(/"/g, '"')
.replace(/'/g, "'")
.replace(/'/g, "'")
.replace(/[/g, "[")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the order of replacements intentional here? I notice that the ampersand replacement is placed last, which seems correct since replacing it first could interfere with other entity replacements. Would it be worth adding a comment explaining why it must be last to prevent future reordering mistakes?

.replace(/]/g, "]")
.replace(/[/g, "[")
.replace(/]/g, "]")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this PR fixes the square bracket issue, are there other HTML entities that Gemini might escape that we're not handling? For example: parentheses (( and )), curly braces ({ and }), or other mathematical/programming symbols. Might be worth investigating what other entities Gemini commonly escapes in code contexts.

.replace(/&/g, "&")
}
Loading