Skip to content

Conversation

@maxfortun
Copy link

@maxfortun maxfortun changed the title updated UTF16 buffer boundary XALANJ-2725 Jan 24, 2024
}
else if (m_highUTF16Surrogate != 0 && Encodings.isLowUTF16Surrogate(ch)) {
// The complete utf16 byte sequence is now available and may be serialized.
if (! m_encodingInfo.isInEncoding(m_highUTF16Surrogate, ch)) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1602: If ch is high surrogate, dump clean chars and retain ch. This works within the same buffer or across buffer boundaries
1610: if we have a retained high surrogate and ch is a valid low surrogate, we have 2 use cases.
1. encoding doesn't support the multibyte char, escape entity.
2. encoding does support the multibyte char, output chars as is.

// encoding doesn't contain ch and it's not part
// of a surrogate pair
// The right thing is to write out an entity
if(m_highUTF16Surrogate != 0) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not a valid scenario and should throw an error. I am not certain what is the correct way to address a situation where we have seen a high surrogate not followed by a low one.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I did encounter this scenario, that's why I coded this. But in theory this should not be happening.

@jkesselm
Copy link
Contributor

Looks more plausible, though I'm still not sure writing out an NCE for an isolated high surrogate is a win.

I still sorta hate the idea of adding Still More Conditionals to the rendering loop, but I don't see a great alternative.

This doesn't seem to fix what slipped between 2419 and master, alas. Lemme work on that and get back to you.

@jkesselm
Copy link
Contributor

Getting back to this issue ... I've got an idea on how to handle the look-behind with a bit less overhead, essentially as inlined tail-recursion, so we can avoid testing for leftover high surrogate every time. Lemme see what I can do with that.

@kubycsolutions
Copy link
Contributor

I believe a fix for this issue has already been committed; if you agree, we should close this pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants