fromUnicode() documentation bug

seasoned_geek
Posts: 105
Joined: Thu Jun 11 2020 12:18 pm


Post by seasoned_geek

All,

As part of trying to track down a real answer to this issue: https://forum.copperspice.com/viewtopic.php?f=11&t=1755

Unraveling this loop


		for (unsigned int i = 0; i < commitStrLen;) {
			const unsigned int ucWidth = commitStr.at(i).isHighSurrogate() ? 2 : 1;
			const QString oneCharUTF16 = commitStr.mid(i, ucWidth);
			const QByteArray oneChar = sqt->BytesForDocument(oneCharUTF16);

			sqt->InsertCharacter(std::string_view(oneChar.data(), oneChar.length()), EditModel::CharacterSource::directInput);
			i += ucWidth;
		}
ultimately traces back to a method that returns either toUtf8() of the string _or_ the QByteArray from QTextCodec::fromUnicode().
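For readers less familiar with surrogate handling, the stepping logic of the loop above can be sketched in plain standard C++ without Qt. This is a minimal illustration that assumes the text is held as UTF-16 in a std::u16string; isHighSurrogate and splitCodePoints are hypothetical helper names, not Qt, CopperSpice, or Scintilla APIs.

```cpp
#include <cassert>  // for the assert-based checks
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true for a UTF-16 high (leading) surrogate, U+D800..U+DBFF.
bool isHighSurrogate(char16_t c) {
    return c >= 0xD800 && c <= 0xDBFF;
}

// Hypothetical helper: split UTF-16 text into one piece per code point.
// Width is 2 when the unit is a high surrogate, 1 otherwise (same rule as the loop above).
std::vector<std::u16string> splitCodePoints(const std::u16string &s) {
    std::vector<std::u16string> pieces;
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t width = isHighSurrogate(s[i]) ? 2 : 1;
        pieces.push_back(s.substr(i, width));  // substr clamps the count at the end of the string
        i += width;
    }
    return pieces;
}
```

For example, splitCodePoints(u"a\U0001F600b") yields three pieces, and the middle one (the emoji) is two UTF-16 units long.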

Under Qt, the QString documentation says:
=====
QString stores a string of 16-bit QChars, where each QChar corresponds to one UTF-16 code unit. (Unicode characters with code values above 65535 are stored using surrogate pairs, i.e., two consecutive QChars.)
=====
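The surrogate-pair rule in the quoted Qt documentation is simple arithmetic: subtract 0x10000, put the top 10 bits of the remaining 20-bit value into a high surrogate based at 0xD800, and the bottom 10 bits into a low surrogate based at 0xDC00. A sketch in standard C++; toSurrogatePair and fromSurrogatePair are hypothetical names for illustration only.

```cpp
#include <cassert>  // for the assert-based checks
#include <utility>

// Split a code point above U+FFFF into a UTF-16 surrogate pair.
// Precondition (not checked): 0x10000 <= cp <= 0x10FFFF.
std::pair<char16_t, char16_t> toSurrogatePair(char32_t cp) {
    const char32_t v = cp - 0x10000;                                    // 20-bit value
    const char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));    // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return {high, low};
}

// Recombine a high/low surrogate pair into the original code point.
char32_t fromSurrogatePair(char16_t high, char16_t low) {
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

For U+1F600, 0x1F600 - 0x10000 = 0xF600, giving the pair 0xD83D 0xDE00, which is why a single emoji occupies two consecutive QChars under Qt.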

So whoever wrote this


QByteArray ScintillaQt::BytesForDocument(const QString &text) const
{
	if (IsUnicodeMode()) {
		return text.toUtf8();
	} else {
		QTextCodec *codec = QTextCodec::codecForName(
				CharacterSetID(CharacterSetOfDocument()));
		return codec->fromUnicode(text);
	}
}
should really have called text.toUtf16() so that both branches of the if return the same kind of data.

No problem. The hapless coder goes looking for what fromUnicode() actually puts in its byte array under CopperSpice and finds this:
=====
Converts str from Unicode to the encoding of this codec and returns the result in a QByteArray. This method updates the state.
=====

Is the content made of QChar32, so that everything is 4 bytes wide and the high-surrogate logic is unnecessary?
Or is it still returning a QByteArray of UTF-16?
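For comparison, in UTF-32 every code point occupies exactly one 4-byte unit, so no surrogate logic is needed; only code points above U+FFFF cost an extra unit in UTF-16. The sketch below is plain standard C++ and says nothing about what CopperSpice's fromUnicode() actually returns; utf16UnitsFor is a hypothetical helper.

```cpp
#include <cassert>  // for the assert-based checks
#include <cstddef>
#include <string>

// Hypothetical helper: count the UTF-16 code units needed for a UTF-32 string.
// Code points above U+FFFF need a surrogate pair (2 units); everything else needs 1.
std::size_t utf16UnitsFor(const std::u32string &text) {
    std::size_t units = 0;
    for (char32_t cp : text) {
        units += (cp > 0xFFFF) ? 2 : 1;
    }
    return units;
}
```

For U"a\U0001F600" the UTF-32 string has 2 units while the UTF-16 form needs 3, so code that walks the data must know which of the two it was handed.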

The same question applies to QTextCodec::convertFromUnicode().

Basically, every place in the documentation where a method returns a QByteArray holding what could have been some form of string, the documentation needs to spell out which encoding the content is really in.

There is no safe assumption here.

ansel
Posts: 109
Joined: Fri Apr 10 2015 8:23 am

Re: fromUnicode() documentation bug

Post by ansel

The intent of the QTextCodec class is to convert between a QString and some specified text encoding. For Qt, the QString is in UTF-16, whereas for CopperSpice the QString is in UTF-8. In either case, the QByteArray returned by QTextCodec::fromUnicode() will be in whatever encoding was requested when calling QTextCodec::codecForName(). The purpose of the QTextCodec class has not changed, although we did redesign the implementation.
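To make the point concrete: the same text produces different bytes depending on the target encoding. The sketch below hand-rolls two encoders in standard C++ as stand-ins for what a codec does. encodeUtf8 and encodeLatin1 are hypothetical helpers, not the QTextCodec API; std::string stands in for QByteArray; and substituting '?' for characters Latin-1 cannot represent is an assumption made here, not documented QTextCodec behavior.

```cpp
#include <cassert>  // for the assert-based checks
#include <string>

// Hypothetical UTF-8 encoder: one to four bytes per code point.
// Input is assumed to contain valid code points (no lone surrogates).
std::string encodeUtf8(const std::u32string &text) {
    std::string out;  // used as a plain byte container
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical ISO-8859-1 (Latin-1) encoder: exactly one byte per code point.
// Code points above 0xFF cannot be represented; '?' is substituted (an assumption).
std::string encodeLatin1(const std::u32string &text) {
    std::string out;
    for (char32_t cp : text) {
        out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
    }
    return out;
}
```

With these, "é" (U+00E9) becomes the two bytes C3 A9 in UTF-8 but the single byte E9 in Latin-1, so a caller has to know which codec produced a byte array before it can step through it.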
Ansel Sermersheim
CopperSpice Cofounder
