Strings
What is a String?
Depending on compiler settings, a String in FPC is an alias for ShortString (fixed length, up to 255 characters), AnsiString (variable length), or UnicodeString (UTF-16).
When {$H+} is not specified, or {$H-} is specified, String is an alias for ShortString.
A ShortString has a maximum length of 255 characters and the implicit code page CP_ACP. Short strings are always assumed to use the system code page.
When {$H+} is specified, String is an alias for AnsiString.
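A minimal sketch of how the {$H} switch changes what String means. The type names TShortAlias and TLongAlias are hypothetical, introduced only for this illustration, and the sketch assumes {$mode objfpc}. A ShortString occupies 256 bytes (one length byte plus 255 characters), while an AnsiString variable is a pointer to heap-allocated data.

```pascal
program StringAliasDemo;

{$mode objfpc}

{$H-}
type
  // Hypothetical name: with {$H-}, string means ShortString here.
  TShortAlias = string;

{$H+}
type
  // Hypothetical name: with {$H+}, string means AnsiString here.
  TLongAlias = string;

begin
  // ShortString: 1 length byte + 255 characters.
  WriteLn('SizeOf(TShortAlias) = ', SizeOf(TShortAlias));
  // AnsiString: a managed pointer, so this matches SizeOf(Pointer).
  WriteLn('SizeOf(TLongAlias)  = ', SizeOf(TLongAlias));
  WriteLn('SizeOf(Pointer)     = ', SizeOf(Pointer));
end.
```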
Any String is then essentially an AnsiString with the DefaultSystemCodePage declared in it, that is, AnsiString(CP_ACP). If the default system code page is 65001, then any String is UTF-8.
With the {$mode delphiunicode} switch, String is an alias for UnicodeString.
On Windows, the system code page is commonly 1252. In that case, any String is encoded in code page 1252.
Display UTF-8 on a console
Alternatively, you can assign your UTF-8 text to a string variable.
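A minimal sketch of assigning UTF-8 text to a string variable and writing it to the console. It assumes the source file is saved as UTF-8 and declares this with the {$codepage utf8} directive. The sample text is arbitrary. Whether the characters render correctly depends on the console and the system code page, so no particular output is claimed here.

```pascal
program DisplayUtf8;

{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

var
  greeting: string;
begin
  // Arbitrary sample text containing non-ASCII characters.
  greeting := 'Grüße, 世界';
  WriteLn(greeting);
end.
```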
Note
If you see garbage characters on the console:
- your console might not support code page 65001, or
- your Windows installation does not support UTF-8 at the API level (it can only read and write files in UTF-8).
See this answer from StackOverflow on how to enable code page 65001 on your console.
Warning
The same answer from StackOverflow also shows how to enable UTF-8 on Windows (system-wide).
DO NOT MISS the caveat section and the comments in that answer.
Enabling UTF-8 system-wide on Windows is currently in beta and could lead to unintended system-wide side effects.
Refs:
- https://wiki.freepascal.org/FPC_Unicode_support#Code_pages
- https://stackoverflow.com/a/57134096
- https://superuser.com/a/1435645
What is my system's default codepage?
See https://www.freepascal.org/docs-html/rtl/system/defaultsystemcodepage.html
If it says 65001, then you should be able to see UTF-8 characters on the console.
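One way to check the value is to print the DefaultSystemCodePage variable from the System unit and compare it with the CP_UTF8 constant (65001). This is a small sketch; the messages are illustrative only.

```pascal
program ShowDefaultCodePage;

{$mode objfpc}{$H+}

begin
  // DefaultSystemCodePage is declared in the System unit.
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);

  // CP_UTF8 is the System unit constant for code page 65001.
  if DefaultSystemCodePage = CP_UTF8 then
    WriteLn('The default system code page is UTF-8.')
  else
    WriteLn('The default system code page is not UTF-8.');
end.
```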
Remove trailing chars at the end of a string
Contribution
Gustavo 'Gus' Carreno, from the Unofficial Free Pascal Discord server, shared a neat trick to remove trailing characters by using SetLength(str, length(str) - n);.
Let's say you have a loop that appends strings, leaving trailing characters at the end.
One way to remove the trailing characters is to use a flag inside the for loop. The logic would be: do not add commas or spaces if we are at the end of the loop.
A simpler way is to use SetLength(str, length(str) - n_chars_to_remove);.
See the example below.
- The for loop completes a sentence with a comma and a space at the end.
- The trick, SetLength(line, length(line) - 2);, removes the last 2 chars from the end of the sentence.
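A minimal sketch of the trick, assuming a small hypothetical array named Fruits as input. The loop leaves a trailing comma and space, and SetLength then trims them. The length check is an added safeguard for the case where the string is shorter than the number of characters to remove.

```pascal
program RemoveTrailingChars;

{$mode objfpc}{$H+}

const
  // Hypothetical sample data for illustration.
  Fruits: array[0..2] of string = ('apple', 'banana', 'cherry');

var
  line: string;
  i: Integer;
begin
  line := '';

  // Each iteration appends an item followed by ', ',
  // so the result ends with an unwanted comma and space.
  for i := Low(Fruits) to High(Fruits) do
    line := line + Fruits[i] + ', ';

  WriteLn('Before: "', line, '"');  // Before: "apple, banana, cherry, "

  // Remove the last 2 characters (the trailing comma and space).
  if Length(line) >= 2 then
    SetLength(line, Length(line) - 2);

  WriteLn('After:  "', line, '"');  // After:  "apple, banana, cherry"
end.
```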