From 446ddeec53285f8d35a98a55c7b52f695f4a19be Mon Sep 17 00:00:00 2001
From: Per Cederqvist <ceder@lysator.liu.se>
Date: Fri, 28 Dec 2001 22:23:51 +0000
Subject: [PATCH] (Simple Data Types): Talk a little about character sets under
 HOLLERITH, without saying anything definite. (Bug 339).

---
 doc/Protocol-A.texi | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/doc/Protocol-A.texi b/doc/Protocol-A.texi
index 11a709a2a..5a629d7fb 100644
--- a/doc/Protocol-A.texi
+++ b/doc/Protocol-A.texi
@@ -1033,6 +1033,24 @@ nulls.
 
 Long live FORTRAN!
 
+@cindex character set
+@cindex Unicode
+The character set used in the strings is not yet specified by Protocol
+A. In the future, some Unicode encoding will probably be used, but it
+is not yet decided which one or how the transition will be handled.
+@url{http://bugzilla.lysator.liu.se/show_bug.cgi?id=99, Bug 99} is
+about the need for a Unicode roadmap; check that bug for the current
+state of the plans.
+
+For now, which character set to use is a local policy of each server
+installation. There is not yet any way in the protocol to specify the
+character set that a certain server uses. Most clients currently
+assume that ISO 8859-1 (Latin-1) is used, and the default collate
+table of lyskomd also assumes ISO 8859-1. Conference names must
+currently use an 8-bit character set encoding where whitespace is
+defined as in ASCII, or conference matching won't work.
+@reqlink{get-collate-table} contains some more information about
+character set issues.
 
 @anchor{BITSTRING}
 
@@ -9774,4 +9792,4 @@ End:
 @c LocalWords: rec recpt ref regexp regexps rkom sans stat struct submitters
 @c LocalWords: sven svensson swascii sync synched synching texinfo tkom kent
 @c LocalWords: ttykom uconf undef unmark userid username val varg yoruba dont
-@c LocalWords: Nyheter davby Testconf com
+@c LocalWords: Nyheter davby Testconf com Unicode roadmap
-- 
GitLab
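
Illustration (not part of the patch): a minimal Python sketch of what the new paragraphs mean for a client that, like most current clients, assumes ISO 8859-1. It relies on the HOLLERITH wire form defined elsewhere in Protocol-A.texi (a decimal length, the letter H, then the string), and it assumes the length counts bytes of the encoded string. The function names and the `charset` parameter are hypothetical; the protocol itself offers no way to ask a server which character set it uses.

```python
def encode_hollerith(text: str, charset: str = "iso-8859-1") -> bytes:
    """Encode text as <length>H<bytes>, assuming the server's local policy is `charset`."""
    payload = text.encode(charset)
    # Assumption: the length prefix counts encoded bytes, not abstract characters.
    return str(len(payload)).encode("ascii") + b"H" + payload


def decode_hollerith(data: bytes, charset: str = "iso-8859-1") -> str:
    """Decode one <length>H<bytes> string, ignoring any bytes after it."""
    length_part, sep, rest = data.partition(b"H")
    if not sep or not length_part.isdigit():
        raise ValueError("not a HOLLERITH string")
    length = int(length_part)
    if len(rest) < length:
        raise ValueError("truncated HOLLERITH string")
    return rest[:length].decode(charset)
```

The same text gets a different length prefix under a multi-byte encoding such as UTF-8, which is one reason the transition mentioned in the patch needs a plan: a client and server that disagree on the character set will still parse each other's strings but display them incorrectly.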