fromUnicode() documentation bug
Posted: Thu Apr 01 2021 1:50 pm
All,
As part of trying to track down a real answer to this issue: https://forum.copperspice.com/viewtopic.php?f=11&t=1755
Unraveling this
Code:
for (unsigned int i = 0; i < commitStrLen;) {
    // A high surrogate starts a two-unit UTF-16 pair, so step over both units.
    const unsigned int ucWidth = commitStr.at(i).isHighSurrogate() ? 2 : 1;
    const QString oneCharUTF16 = commitStr.mid(i, ucWidth);
    const QByteArray oneChar = sqt->BytesForDocument(oneCharUTF16);
    sqt->InsertCharacter(std::string_view(oneChar.data(), oneChar.length()), EditModel::CharacterSource::directInput);
    i += ucWidth;
}
ultimately tracks back to a method that returns either toUtf8() of a string _or_ the QByteArray from QTextCodec::fromUnicode().
Under Qt, the QString documentation says:
=====
Detailed Description. QString stores a string of 16-bit QChars, where each QChar corresponds to one UTF-16 code unit. (Unicode characters with code values above 65535 are stored using surrogate pairs, i.e., two consecutive QChars.)
=====
So, whoever did this
Code:
QByteArray ScintillaQt::BytesForDocument(const QString &text) const
{
    if (IsUnicodeMode()) {
        // Unicode mode: encode the text as UTF-8.
        return text.toUtf8();
    } else {
        // Otherwise convert with the codec for the document's character set.
        QTextCodec *codec = QTextCodec::codecForName(
            CharacterSetID(CharacterSetOfDocument()));
        return codec->fromUnicode(text);
    }
}
should have really done text.toUtf16() so both sides of the if match.
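For anyone following along, the surrogate stepping in the loop above can be reproduced in plain C++ without Qt or CopperSpice. This is only a sketch: isHighSurrogate(), encodeUtf16(), and splitCharacters() are hypothetical helpers written for this post, not functions from Qt, CopperSpice, or Scintilla. It shows why a code point above 65535 takes two 16-bit units and why the loop advances by 2 when it sees a high surrogate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only; not Qt, CopperSpice, or Scintilla code.

// True when a UTF-16 code unit is the first half of a surrogate pair.
bool isHighSurrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// Encode one Unicode code point as one or two UTF-16 code units.
std::u16string encodeUtf16(char32_t cp) {
    // Reject values outside Unicode and the surrogate range itself.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp <= 0xFFFF) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Above 65535: split the 20 remaining bits across two units.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}

// Split UTF-16 text into per-character chunks: 2 units on a high
// surrogate, 1 unit otherwise, mirroring the ucWidth logic above.
std::vector<std::u16string> splitCharacters(const std::u16string &text) {
    std::vector<std::u16string> chunks;
    for (std::size_t i = 0; i < text.size();) {
        const std::size_t width = isHighSurrogate(text[i]) ? 2 : 1;
        chunks.push_back(text.substr(i, width));
        i += width;
    }
    return chunks;
}
```

Feeding it "A" followed by U+1F600 gives three code units but only two chunks, which is the situation the commit loop is guarding against. If the string class stored full 32-bit code points instead, the width test would presumably never fire.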
No problem. Hapless coder goes looking to see what fromUnicode() returns in its byte array under CopperSpice:
=====
Converts str from Unicode to the encoding of this codec and returns the result in a QByteArray. This method updates the state.
=====
Is the content QChar32, so that everything is 4 bytes wide and there is no need for high-surrogate logic?
Or is this still returning a QByteArray of UTF-16?
There is a similar issue for QTextCodec::convertFromUnicode.
Basically, every place in the documentation where a QByteArray is returned from something that could have been some form of a string, the documentation needs to spell out what form the content is really in.
There is no safe assumption here.
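To make the point concrete, here is a small plain C++ sketch of how the same character turns into different bytes depending on the target encoding. None of this is CopperSpice or Qt API; toUtf8Bytes(), toLatin1Bytes(), toUtf16LeBytes(), and toUtf32LeBytes() are hypothetical stand-ins for whatever a codec does internally, and the '?' substitution in the Latin-1 helper is my own assumption about how an unrepresentable character might be handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration only; not the CopperSpice or Qt API.

// UTF-8: 1 to 4 bytes per code point.
std::string toUtf8Bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// ISO-8859-1: always 1 byte. Anything above 0xFF cannot be represented,
// so this sketch substitutes '?' (an assumption, not documented behavior).
std::string toLatin1Bytes(char32_t cp) {
    return std::string(1, cp <= 0xFF ? static_cast<char>(cp) : '?');
}

// UTF-16 little-endian: 2 bytes, or 4 bytes for a surrogate pair.
std::string toUtf16LeBytes(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    auto put16 = [&out](char32_t unit) {
        out += static_cast<char>(unit & 0xFF);
        out += static_cast<char>((unit >> 8) & 0xFF);
    };
    if (cp <= 0xFFFF) {
        put16(cp);
    } else {
        const char32_t v = cp - 0x10000;
        put16(0xD800 + (v >> 10));
        put16(0xDC00 + (v & 0x3FF));
    }
    return out;
}

// UTF-32 little-endian: always 4 bytes, no surrogates involved.
std::string toUtf32LeBytes(char32_t cp) {
    std::string out;
    for (int shift = 0; shift < 32; shift += 8) {
        out += static_cast<char>((cp >> shift) & 0xFF);
    }
    return out;
}
```

For U+00E9 these produce 2, 1, 2, and 4 bytes respectively, and all four byte arrays differ. A caller who is handed only "a QByteArray" has no way to tell which of these it holds unless the documentation names the encoding.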