Strings
What is a String?
Depending on compiler settings, a String in FPC is an alias for
ShortString (fixed length, up to 255 characters), AnsiString (variable length), or UnicodeString (UTF-16).
When {$H+} is not specified, or when {$H-} is set, String is an alias for ShortString.
A ShortString has a maximum length of 255 characters and the implicit code page CP_ACP. Short strings are always assumed to use the system code page.
When {$H+} is specified, String is an alias for AnsiString.
In this case, a String is essentially an AnsiString declared with the default system code page, that is, AnsiString(CP_ACP). If the default system code page is 65001, then any String is UTF-8.
With the {$mode delphiunicode} switch, String is an alias for UnicodeString.
On Windows, the system code page is commonly 1252. If the system code page is 1252, then any String is encoded in code page 1252.
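As a rough illustration of the rules above, the following sketch toggles {$H} inside one program and compares the size of String under each setting. The program name and the type names TShortAlias and TAnsiAlias are hypothetical, chosen only for this example.

```pascal
program StringAliasDemo;

{$mode objfpc}

// TShortAlias and TAnsiAlias are hypothetical names for this sketch.

{$H-}
type
  TShortAlias = string;  // with {$H-}, string means ShortString

{$H+}
type
  TAnsiAlias = string;   // with {$H+}, string means AnsiString

begin
  // A ShortString stores a length byte plus up to 255 characters.
  WriteLn('SizeOf(TShortAlias) = ', SizeOf(TShortAlias));
  // An AnsiString variable is a pointer to heap-allocated data.
  WriteLn('SizeOf(TAnsiAlias)  = ', SizeOf(TAnsiAlias));
end.
```

The first value should be 256, while the second should match the pointer size of the target platform.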
Display UTF-8 on a console
Alternatively, you can assign your UTF-8 text to a string variable.
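A minimal sketch of that approach might look like the program below. It assumes the source file is saved as UTF-8 (hence the {$codepage utf8} directive); the program name, the variable name greeting, and the sample text are placeholders. Whether the characters render correctly still depends on the console and the system code page.

```pascal
program Utf8OnConsole;

{$mode objfpc}{$H+}
{$codepage utf8} // assumption: this source file is saved as UTF-8

var
  greeting: string; // hypothetical variable name

begin
  // Assign UTF-8 text to a string variable, then print it.
  greeting := 'Góðan dag, señor. Grüße aus Zürich.';
  WriteLn(greeting);
end.
```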
Note
If you see garbage characters on the console, then either:
- your console might not support code page 65001, or
- your Windows installation does not support UTF-8 at the API level (it only supports reading and writing files in UTF-8).
See this answer from StackOverflow on how to enable code page 65001 on your console.
Warning
The same answer from StackOverflow also shows how to enable UTF-8 on Windows (system-wide).
DO NOT MISS the caveat section and the comments in that answer.
Enabling UTF-8 system-wide on Windows is currently in beta and could lead to unintended system-wide side effects.
Refs:
- https://wiki.freepascal.org/FPC_Unicode_support#Code_pages
- https://stackoverflow.com/a/57134096
- https://superuser.com/a/1435645
What is my system's default codepage?
Check the value of DefaultSystemCodePage. See https://www.freepascal.org/docs-html/rtl/system/defaultsystemcodepage.html
If its value is 65001, then you should be able to see UTF-8 characters on the console.
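One way to check is to print the variable from a small program. This is only a sketch; the program name is arbitrary, and the value printed depends on your system.

```pascal
program ShowDefaultCodePage;

{$mode objfpc}{$H+}

begin
  // DefaultSystemCodePage is a variable declared in the System unit.
  WriteLn('Default system code page: ', DefaultSystemCodePage);
end.
```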
Remove trailing chars at the end of a string
Contribution
Gustavo 'Gus' Carreno, from the Unofficial Free Pascal Discord server, shared a neat trick to remove trailing characters by using SetLength(str, length(str) - n);.
Let's say you have a loop that appends strings, leaving trailing characters at the end.
One way to avoid the trailing characters is to use a flag inside the for loop. The logic would be: do not add commas or spaces if we are at the end of the loop.
A simpler way is to use SetLength(str, length(str) - n_chars_to_remove);.
See the example below.
- The for loop completes a sentence with a comma and a space at the end. Lines 19-20.
- The trick: SetLength(line, length(line) - 2); removes the last 2 chars from the end of the sentence. Line 29.
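The program below is a small sketch of the SetLength trick, not the original listing. The names fruits, line and i, and the sample data, are made up for illustration.

```pascal
program RemoveTrailingChars;

{$mode objfpc}{$H+}

// A sketch only: fruits, line and i are hypothetical names.
// It builds a comma-separated sentence in a loop, then trims
// the trailing ', ' with SetLength.

const
  fruits: array[0..2] of string = ('apple', 'banana', 'cherry');

var
  line: string;
  i: Integer;

begin
  line := 'I like ';
  // Every item, including the last one, gets ', ' appended.
  for i := Low(fruits) to High(fruits) do
    line := line + fruits[i] + ', ';

  // At this point line ends with an unwanted ', '.
  WriteLn('Before: "', line, '"');

  // Shrinking the string by 2 drops the trailing ', '.
  // The length check avoids a negative length when line
  // is shorter than the 2 characters being removed.
  if Length(line) >= 2 then
    SetLength(line, Length(line) - 2);

  WriteLn('After:  "', line, '"');
end.
```

With this sample data, the first WriteLn shows the sentence ending in "cherry, " and the second shows it ending in "cherry".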