-
Notifications
You must be signed in to change notification settings - Fork 46
UTF16 buffer boundary XALANJ-2725 #166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
UTF16 buffer boundary XALANJ-2725 #166
Conversation
| } | ||
| else if (m_highUTF16Surrogate != 0 && Encodings.isLowUTF16Surrogate(ch)) { | ||
| // The complete utf16 byte sequence is now available and may be serialized. | ||
| if (! m_encodingInfo.isInEncoding(m_highUTF16Surrogate, ch)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1602: If ch is high surrogate, dump clean chars and retain ch. This works within the same buffer or across buffer boundaries
1610: if we have a retained high surrogate and ch is a valid low surrogate, we have 2 use cases.
1. encoding doesn't support the multibyte char, escape entity.
2. encoding does support the multibyte char, output chars as is.
| // encoding doesn't contain ch and it's not part | ||
| // of a surrogate pair | ||
| // The right thing is to write out an entity | ||
| if(m_highUTF16Surrogate != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably not a valid scenario and should throw an error. I am not certain what is the correct way to address a situation where we have seen a high surrogate not followed by a low one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, I did encounter this scenario, that's why I coded this. But in theory this should not be happening.
|
Looks more plausible, though I'm still not sure writing out an NCE for an isolated high surrogate is a win. I still sorta hate the idea of adding Still More Conditionals to the rendering loop, but I don't see a great alternative. This doesn't seem to fix what slipped between 2419 and master, alas. Lemme work on that and get back to you. |
|
Getting back to this issue ... I've got an idea on how to handle the look-behind with a bit less overhead, essentially as inlined tail-recursion, so we can avoid testing for leftover high surrogate every time. Lemme see what I can do with that. |
|
I believe a fix for this issue has already been committed; if you agree, we should close this pull request. |
https://issues.apache.org/jira/browse/XALANJ-2725