Converting from GB18030 encoded bytes to chars throws Exception #110521

@uBpringlIoNaRys

Description

Converting GB18030-encoded bytes to a string throws an exception on .NET 9.

Reproduction Steps

The following code throws the exception on .NET 9 but not on .NET 8.

using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

var encoding = Encoding.GetEncoding("GB18030");

ReadOnlySpan<byte> encodedBytes = [0x95, 0x32, 0xB7, 0x37];

// This call throws; encoding.GetString and encoding.GetChars throw as well.
var actual = encoding.GetCharCount(encodedBytes);

Console.WriteLine($"CharCount: {actual}");

var bytes = encoding.GetBytes("𠈓");
Console.WriteLine($"EncodedBytes of character match encodedBytes span: {bytes.AsSpan().SequenceEqual(encodedBytes)}");
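
For context, the four bytes in the repro fall in the GB18030 four-byte range that maps linearly onto the supplementary planes (0x90308130 maps to U+10000), so the decoded character needs a UTF-16 surrogate pair. That would explain why the stack trace goes through the two-char `AddChar(Char ch1, Char ch2, Int32 numBytes)` overload. The snippet below is only a sketch of that arithmetic; `DecodeSupplementary` is a hypothetical helper written for illustration, not part of the runtime, and it does no range validation.

```csharp
using System;

// Hypothetical helper (not a runtime API): maps a GB18030 four-byte sequence
// in the supplementary range 0x90308130..0xE3329A35 to its Unicode code point.
// Assumes the input is already known to lie in that range.
static int DecodeSupplementary(ReadOnlySpan<byte> b)
{
    int linear = (b[0] - 0x90) * 12600  // 10 * 126 * 10 combinations per lead byte
               + (b[1] - 0x30) * 1260   // 126 * 10 combinations per second byte
               + (b[2] - 0x81) * 10     // 10 combinations per third byte
               + (b[3] - 0x30);
    return 0x10000 + linear;            // the range starts at U+10000
}

int codePoint = DecodeSupplementary([0x95, 0x32, 0xB7, 0x37]);

Console.WriteLine($"U+{codePoint:X}");                      // prints "U+20213"
Console.WriteLine(char.ConvertFromUtf32(codePoint).Length); // prints "2"
```

Since the code point is above U+FFFF, a correct `GetCharCount` for this input should return 2 rather than throw.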

Expected behavior

The conversion from bytes to chars does not throw an exception.

Actual behavior

Unhandled exception. System.ArgumentException: The output char buffer is too small to contain the decoded characters, encoding 'Chinese Simplified (GB18030)' fallback 'System.Text.DecoderReplacementFallback'. (Parameter 'chars')
   at System.Text.EncodingNLS.ThrowCharsOverflow()
   at System.Text.EncodingNLS.ThrowCharsOverflow(DecoderNLS decoder, Boolean nothingDecoded)
   at System.Text.EncodingCharBuffer.AddChar(Char ch1, Char ch2, Int32 numBytes)
   at System.Text.GB18030Encoding.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, DecoderNLS baseDecoder)
   at System.Text.GB18030Encoding.GetCharCount(Byte* bytes, Int32 count, DecoderNLS baseDecoder)
   at System.Text.Encoding.GetCharCount(ReadOnlySpan`1 bytes)
   at Program.<Main>$(String[] args) in C:\git\Receiver\SYC_Appliance\Test\GB18030DecodeFailure\Program.cs:line 12
   

Regression?

Yes

Known Workarounds

None

Configuration

.NET 9
Windows
x64

Other information

A change (861164c) modified the if statement in EncodingCharBuffer.AddChar. The logic that handles uninitialized chars while only "counting" (as GetCharCount does) may have changed with it.
