From 446ddeec53285f8d35a98a55c7b52f695f4a19be Mon Sep 17 00:00:00 2001
From: Per Cederqvist <ceder@lysator.liu.se>
Date: Fri, 28 Dec 2001 22:23:51 +0000
Subject: [PATCH] (Simple Data Types): Talk a little about character sets under
 	HOLLERITH, without saying anything definite.  (Bug 339).

---
 doc/Protocol-A.texi | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/doc/Protocol-A.texi b/doc/Protocol-A.texi
index 11a709a2a..5a629d7fb 100644
--- a/doc/Protocol-A.texi
+++ b/doc/Protocol-A.texi
@@ -1033,6 +1033,24 @@ nulls.
 
 Long live FORTRAN!
 
+@cindex character set
+@cindex Unicode
+The character set used in the strings is not yet specified by Protocol
+A.  In the future, some Unicode encoding will probably be used, but it
+is not yet decided which one or how the transition will be handled.
+@url{http://bugzilla.lysator.liu.se/show_bug.cgi?id=99, Bug 99} is
+about the need for a Unicode roadmap; check that bug for the current
+state of the plans.
+
+For now, which character set to use is a local policy of each server
+installation.  The protocol does not yet provide any way to specify the
+character set that a given server uses.  Most clients currently
+assume that ISO 8859-1 (Latin-1) is used, and the default collate
+table of lyskomd also assumes ISO 8859-1.  Conference names must
+currently use an 8-bit character set encoding where whitespace is
+defined as in ASCII, or conference matching won't work.
+@reqlink{get-collate-table} contains some more information about
+character set issues.
 
 
 @anchor{BITSTRING}
@@ -9774,4 +9792,4 @@ End:
 @c  LocalWords:  rec recpt ref regexp regexps rkom sans stat struct submitters
 @c  LocalWords:  sven svensson swascii sync synched synching texinfo tkom kent
 @c  LocalWords:  ttykom uconf undef unmark userid username val varg yoruba dont
-@c  LocalWords:  Nyheter davby Testconf com
+@c  LocalWords:  Nyheter davby Testconf com Unicode roadmap
-- 
GitLab