<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc iprnotified="no" ?>
<?rfc sortrefs="yes"?>
<?rfc strict="yes"?>
<?rfc symrefs="yes"?>
<?rfc toc="yes"?>
<?rfc tocdepth="3"?>
<?rfc rfcedstyle="yes"?>
<rfc category="std" ipr="trust200902" docName="draft-ietf-precis-7564bis-10" obsoletes="7564">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<front>
<title abbrev="PRECIS Framework">PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols</title>
<author initials="P." surname="Saint-Andre" fullname="Peter Saint-Andre">
<organization>Filament</organization>
<address>
<postal>
<street>18335 E 103rd Ave, Suite 203</street>
<city>Commerce City</city>
<region>CO</region>
<code>80022</code>
<country>USA</country>
</postal>
<phone>+1 720 256 6756</phone>
<email>[email protected]</email>
<uri>https://filament.com/</uri>
</address>
</author>
<author initials="M." surname="Blanchet" fullname="Marc Blanchet">
<organization>Viagenie</organization>
<address>
<postal>
<street>246 Aberdeen</street>
<city>Quebec</city>
<region>QC</region>
<code>G1R 2E1</code>
<country>Canada</country>
</postal>
<email>[email protected]</email>
<uri>http://www.viagenie.ca/</uri>
</address>
</author>
<date/>
<keyword>internationalization</keyword>
<keyword>i18n</keyword>
<keyword>Stringprep</keyword>
<abstract>
<t>Application protocols using Unicode code points in protocol strings
need to properly handle such strings in order to enforce
internationalization rules for strings placed in various protocol slots
(such as addresses and identifiers) and to perform valid comparison
operations (e.g., for purposes of authentication or authorization).
This document defines a framework enabling application protocols to
perform the preparation, enforcement, and comparison of
internationalized strings ("PRECIS") in a way that depends on the
properties of Unicode code points and thus is more agile with respect to
versions of Unicode. As a result, this framework provides a more
sustainable approach to the handling of internationalized strings than
the previous framework, known as Stringprep (RFC 3454). This document
obsoletes RFC 7564.</t>
</abstract>
</front>
<middle>
<section title="Introduction" anchor='intro'>
<t>Application protocols using Unicode code points <xref
target='Unicode'/> in protocol strings need to properly handle such
strings in order to enforce internationalization rules for strings
placed in various protocol slots (such as addresses and identifiers) and
to perform valid comparison operations (e.g., for purposes of
authentication or authorization). This document defines a framework
enabling application protocols to perform the preparation, enforcement,
and comparison of internationalized strings ("PRECIS") in a way that
depends on the properties of Unicode code points and thus is more agile with
respect to versions of Unicode. (PRECIS is restricted to Unicode and does
not support any other coded character set <xref target='RFC6365'/>.)</t>
<t>As described in the PRECIS problem statement <xref
target='RFC6885'/>, many IETF protocols have used the Stringprep
framework <xref target='RFC3454'/> as the basis for preparing,
enforcing, and comparing protocol strings that contain Unicode
code points, especially code points outside the ASCII range <xref
target='RFC20'/>. The Stringprep framework was developed during work on
the original technology for internationalized domain names (IDNs), here
called "IDNA2003" <xref target='RFC3490'/>, and Nameprep <xref
target="RFC3491"/> was the Stringprep profile for IDNs. At the time,
Stringprep was designed as a general framework so that other application
protocols could define their own Stringprep profiles. Indeed, a number
of application protocols defined such profiles.</t>
<t>After the publication of <xref target='RFC3454'/> in 2002, several
significant issues arose with the use of Stringprep in the IDN case, as
documented in the IAB's recommendations regarding IDNs <xref
target='RFC4690'/> (most significantly, Stringprep was tied to Unicode
version 3.2). Therefore, the newer IDNA specifications, here called
"IDNA2008" (<xref target='RFC5890'/>, <xref target='RFC5891'/>, <xref
target='RFC5892'/>, <xref target='RFC5893'/>, <xref target='RFC5894'/>),
no longer use Stringprep and Nameprep. This migration away from
Stringprep for IDNs prompted other "customers" of Stringprep to consider
new approaches to the preparation, enforcement, and comparison of
internationalized strings, as described in <xref target='RFC6885'/>.</t>
<t>This document defines a framework for a post-Stringprep approach to
the preparation, enforcement, and comparison of internationalized
strings in application protocols, based on several principles:</t>
<t>
<list style='numbers'>
<t>Define a small set of string classes that specify the Unicode
code points appropriate for common application protocol constructs
(where possible, maintaining compatibility with IDNA2008 to help
ensure a more consistent user experience).</t>
<t>Define each PRECIS string class in terms of Unicode code points
and their properties so that an algorithm can be used to determine
whether each code point or character category is (a) valid,
(b) allowed in certain contexts, (c) disallowed, or
(d) unassigned.</t>
<t>Use an "inclusion model" such that a string class consists only
of code points that are explicitly allowed, with the result that any
code point not explicitly allowed is forbidden.</t>
<t>Enable application protocols to define profiles of the PRECIS
string classes if necessary (addressing matters such as width
mapping, case mapping, Unicode normalization, and directionality)
but strongly discourage the multiplication of profiles beyond
necessity in order to avoid violations of the "Principle of Least
Astonishment".</t>
</list>
</t>
<t>It is expected that this framework will yield the following
benefits:</t>
<t>
<list style="symbols">
<t>Application protocols will be more agile with regard to Unicode
versions (recognizing that complete agility cannot be realized in
practice).</t>
<t>Implementers will be able to share code point tables and software
code across application protocols, most likely by means of software
libraries.</t>
<t>End users will be able to acquire more accurate expectations
about the code points that are acceptable in various contexts. Given
this more uniform set of string classes, it is also expected that
copy/paste operations between software implementing different
application protocols will be more predictable and coherent.</t>
</list>
</t>
<t>Whereas the string classes define the "baseline" code points for a
range of applications, profiling enables application protocols to apply
the string classes in ways that are appropriate for common constructs
such as usernames <xref target='I-D.ietf-precis-7613bis'/>, opaque
strings such as passwords <xref target='I-D.ietf-precis-7613bis'/>,
and nicknames <xref target='I-D.ietf-precis-7700bis'/>. Profiles are
responsible for defining the handling of right-to-left code points as
well as various mapping operations of the kind also discussed for IDNs
in <xref target='RFC5895'/>, such as case preservation or lowercasing,
Unicode normalization, mapping of certain code points to other code points
or to nothing, and mapping of fullwidth and halfwidth code points.</t>
<t>When an application applies a profile of a PRECIS string class, it
transforms an input string (which might or might not be conforming) into
an output string that definitively conforms to the profile. In
particular, this document focuses on the resulting ability to achieve
the following objectives:</t>
<t>
<list style='letters'>
<t>Enforcing all the rules of a profile for a single output
string, to check whether the output string conforms to the rules
of the profile (e.g., to determine if a string can be included in a
protocol slot, communicated to another entity within a protocol,
stored in a retrieval system, etc.).</t>
<t>Comparing two output strings to determine if they are equivalent,
typically through octet-for-octet matching to test for
"bit&nbhy;string identity" (e.g., to make an access decision for
purposes of authentication or authorization as further described
in <xref target='RFC6943'/>).</t>
</list>
</t>
<t>The opportunity to define profiles naturally introduces the
possibility of a proliferation of profiles, thus potentially undermining
the benefits of common code and violating user expectations. See <xref
target='profiles'/> for a discussion of this important topic.</t>
<t>In addition, it is extremely important for protocol designers and
application developers to understand that the transformation of an input
string to an output string is rarely reversible. As one relatively
simple example, case mapping would transform an input string of
"StPeter" to "stpeter", and information about the capitalization of the
first and third characters would be lost. Similar considerations apply
to other forms of mapping and normalization.</t>
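<t>The loss of information can be seen in a minimal Python sketch. It
assumes, purely for illustration, that case mapping is approximated by
Python's built-in str.lower(); the helper name case_map is hypothetical
and is not defined by this framework.</t>
<figure><artwork><![CDATA[
```python
def case_map(s: str) -> str:
    """Illustrative case mapping (assumption: plain lowercasing)."""
    return s.lower()


inputs = ["StPeter", "stpeter", "STPETER"]
outputs = {case_map(s) for s in inputs}

# All three inputs collapse to a single output string, so the
# original capitalization cannot be recovered from the output.
print(outputs)
```
]]></artwork></figure>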
<t>Although this framework is similar to IDNA2008 and includes by
reference some of the character categories defined in <xref
target='RFC5892'/>, it defines additional character categories to meet
the needs of common application protocols other than DNS.</t>
<t>The character categories and calculation rules defined under
Sections <xref target="PropertyCalculation" format="counter"/>
and <xref target="categories" format="counter" /> are normative and
apply to all Unicode code points. The code point table that
results from applying the character categories and calculation
rules to the latest version of Unicode can be found in an IANA
registry.</t>
</section>
<section title="Terminology" anchor="terms">
<t>Many important terms used in this document are defined in <xref
target='RFC5890'/>, <xref target='RFC6365'/>, <xref target='RFC6885'/>,
and <xref target='Unicode'/>. The terms "left-to-right" (LTR) and
"right-to-left" (RTL) are defined in Unicode Standard Annex #9 <xref
target='UAX9'/>.</t>
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in <xref
target='RFC2119'/>.</t>
</section>
<section title="Preparation, Enforcement, and Comparison" anchor="precis">
<t>This document distinguishes between three different actions that an
entity can take with regard to a string:</t>
<t>
<list style='symbols'>
<t>Enforcement entails applying all of the rules specified for a
particular string class or profile thereof to a single input string,
for the purpose of checking whether the string conforms to all of
the rules and thus determining if the string can be used in a given
protocol slot.</t>
<t>Comparison entails applying all of the rules specified for a
particular string class or profile thereof to two separate input strings,
for the purpose of determining if the two strings are equivalent.</t>
<t>Preparation primarily entails ensuring that the code points in a
single input string are allowed by the underlying PRECIS string
class, and sometimes also entails applying one or more of the rules
specified for a particular string class or profile thereof.
Preparation can be appropriate for constrained devices that can
to some extent restrict the code points in a string to a limited
repertoire of characters but that do not have the processing power or onboard
memory to perform operations such as Unicode normalization.
However, preparation does not ensure that an input string conforms
to all of the rules for a string class or profile thereof.
<list style='empty'><t>Note: The term "preparation" as used in
this specification and related documents has a much more limited
scope than it did in Stringprep; it essentially refers to a kind
of preprocessing of an input string, not the actual operations
that apply internationalization rules to produce an output string
(here termed "enforcement") or to compare two output strings (here
termed "comparison").</t></list>
</t>
</list>
</t>
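<t>The relationship between the three actions might be sketched in
Python as follows. This is a toy model under stated assumptions, not an
implementation of any PRECIS string class or profile: the allowed()
predicate is a hypothetical stand-in for a string class, and the
"profile" consists only of lowercasing followed by Unicode
normalization form C. All function names are illustrative.</t>
<figure><artwork><![CDATA[
```python
import unicodedata


def allowed(ch: str) -> bool:
    # Toy stand-in for a string class (assumption, not the PRECIS
    # definition): letters, numbers, and printable ASCII only.
    return unicodedata.category(ch)[0] in "LN" or 0x21 <= ord(ch) <= 0x7E


def prepare(s: str) -> str:
    """Preparation: only check that every code point is allowed."""
    if not all(allowed(ch) for ch in s):
        raise ValueError("string contains a disallowed code point")
    return s


def enforce(s: str) -> str:
    """Enforcement: apply the toy profile's rules, then validate."""
    mapped = unicodedata.normalize("NFC", s.lower())
    return prepare(mapped)


def compare(a: str, b: str) -> bool:
    """Comparison: enforce both inputs, then compare the outputs."""
    return enforce(a) == enforce(b)
```
]]></artwork></figure>
<t>In this sketch, prepare("StPeter") returns its input unchanged,
whereas enforce("StPeter") returns the mapped output string; this
mirrors the point that preparation alone does not ensure conformance
to all of the rules.</t>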
<t>In most cases, authoritative entities such as servers are responsible
for enforcement, whereas subsidiary entities such as clients are
responsible only for preparation. The rationale for this distinction is
that clients might not have the facilities (in terms of device memory and
processing power) to enforce all the rules regarding internationalized
strings (such as width mapping and Unicode normalization), although they
can more easily limit the repertoire of characters they offer to an end
user. By contrast, it is assumed that a server would have more capacity
to enforce the rules, and in any case acts as an authority regarding
allowable strings in protocol slots such as addresses and endpoint
identifiers. In addition, a client cannot necessarily be trusted to
properly generate such strings, especially for security-sensitive contexts
such as authentication and authorization.</t>
</section>
<section title="String Classes" anchor='classes'>
<section title="Overview" anchor='classes-overview'>
<t>Starting in 2010, various "customers" of Stringprep began to
discuss the need to define a post-Stringprep approach to the
preparation and comparison of internationalized strings other than
IDNs. This community analyzed the existing Stringprep profiles and
also weighed the costs and benefits of defining a relatively small set
of Unicode code points that would minimize the potential for user
confusion caused by visually similar code points (and thus be
relatively "safe") vs. defining a much larger set of Unicode
code points that would maximize the potential for user creativity (and
thus be relatively "expressive"). As a result, the community
concluded that most existing uses could be addressed by two string
classes:</t>
<t>
<list style='hanging'>
<t hangText="IdentifierClass:">a sequence of letters, numbers, and
some symbols that is used to identify or address a network entity
such as a user account, a venue (e.g., a chatroom), an information
source (e.g., a data feed), or a collection of data (e.g., a
file); the intent is that this class will minimize user confusion
in a wide variety of application protocols, with the result that
safety has been prioritized over expressiveness for this
class.</t>
<t hangText="FreeformClass:">a sequence of letters, numbers,
symbols, spaces, and other code points that is used for free-form
strings, including passwords as well as display elements such as
human-friendly nicknames for devices or for participants in a
chatroom; the intent is that this class will allow nearly any
Unicode code point, with the result that expressiveness has been
prioritized over safety for this class. Note well that protocol
designers, application developers, service providers, and end
users might not understand or be able to enter all of the
code points that can be included in the FreeformClass -- see <xref
target='security-freeformclass'/> for details.</t>
</list>
</t>
<t>Future specifications might define additional PRECIS string
classes, such as a class that falls somewhere between the
IdentifierClass and the FreeformClass. At this time, it is not clear
how useful such a class would be. In any case, because application
developers are able to define profiles of PRECIS string classes, a
protocol needing a construct between the IdentifierClass and the
FreeformClass could define a restricted profile of the FreeformClass
if needed.</t>
<t>The following subsections discuss the IdentifierClass and
FreeformClass in more detail, with reference to the dimensions
described in Section 5 of <xref target='RFC6885'/>. Each string
class is defined by the following behavioral rules:</t>
<t>
<list style='hanging'>
<t hangText='Valid:'>Defines which code points are treated as
valid for the string.</t>
<t hangText='Contextual Rule Required:'>Defines which code points
are treated as allowed only if the requirements of a contextual
rule are met (i.e., either CONTEXTJ or CONTEXTO as originally
defined in the IDNA2008 specifications).</t>
<t hangText='Disallowed:'>Defines which code points need to be
excluded from the string.</t>
<t hangText='Unassigned:'>Defines application behavior in the
presence of code points that are unknown (i.e., not yet
designated) for the version of Unicode used by the
application.</t>
</list>
</t>
<t>This document defines the valid, contextual rule required,
disallowed, and unassigned rules for the IdentifierClass and
FreeformClass. As described under <xref target='profiles'/>, profiles
of these string classes are responsible for defining the width
mapping, additional mappings, case mapping, normalization, and
directionality rules.</t>
</section>
<section title="IdentifierClass" anchor="classes-id">
<t>Most application technologies need strings that can be used to
refer to, include, or communicate protocol strings like usernames,
filenames, data feed identifiers, and chatroom names. We group such
strings into a class called "IdentifierClass" having the following
features.</t>
<section title='Valid' anchor='classes-id-valid'>
<t>
<list style='symbols'>
<t>Code points traditionally used as letters and numbers in
writing systems, i.e., the LetterDigits ("A") category first
defined in <xref target='RFC5892'/> and listed here under <xref
target='A'/>.</t>
<t>Code points in the range U+0021 through U+007E, i.e., the
(printable) ASCII7 ("K") category defined under <xref target='K'/>.
These code points are "grandfathered" into PRECIS and thus are
valid even if they would otherwise be disallowed according to
the property-based rules specified in the next section.</t>
</list>
</t>
<t><list style='empty'><t>Note: Although the PRECIS IdentifierClass
reuses the LetterDigits category from IDNA2008, the range of
code points allowed in the IdentifierClass is wider than the range of
code points allowed in IDNA2008. The main reason is that IDNA2008
applies the Unstable category before the LetterDigits category, thus
disallowing uppercase code points, whereas the IdentifierClass does
not apply the Unstable category.</t></list></t>
</section>
<section title='Contextual Rule Required' anchor='classes-id-contextual'>
<t>
<list style='symbols'>
<t>A number of code points from the Exceptions ("F") category
defined under <xref target='F'/> (see <xref target='F'/> for a
full list).</t>
<t>Joining code points, i.e., the JoinControl ("H") category
defined under <xref target='H'/>.</t>
</list>
</t>
</section>
<section title='Disallowed' anchor='classes-id-disallowed'>
<t>
<list style='symbols'>
<t>Old Hangul Jamo code points, i.e., the OldHangulJamo ("I")
category defined under <xref target='I'/>.</t>
<t>Control code points, i.e., the Controls ("L") category defined
under <xref target='L'/>.</t>
<t>Ignorable code points, i.e., the PrecisIgnorableProperties
("M") category defined under <xref target='M'/>.</t>
<t>Space code points, i.e., the Spaces ("N") category defined
under <xref target='N'/>.</t>
<t>Symbol code points, i.e., the Symbols ("O") category defined
under <xref target='O'/>.</t>
<t>Punctuation code points, i.e., the Punctuation ("P") category
defined under <xref target='P'/>.</t>
<t>Any code point that is decomposed and recomposed into something
other than itself under Unicode normalization form KC, i.e., the
HasCompat ("Q") category defined under <xref target='Q'/>.
These code points are disallowed even if they would otherwise be
valid according to the property-based rules specified in the
previous section.</t>
<t>Letters and digits other than the "traditional" letters and
digits allowed in IDNs, i.e., the OtherLetterDigits ("R")
category defined under <xref target='R'/>.</t>
</list>
</t>
</section>
<section title='Unassigned' anchor='classes-id-unassigned'>
<t>Any code points that are not yet designated in the Unicode
coded character set are considered unassigned for purposes of the
IdentifierClass, and such code points are to be treated as
disallowed. See <xref target='J'/>.</t>
</section>
<section title='Examples' anchor='classes-id-examples'>
<t>As described in the Introduction to this document, the string
classes do not handle all issues related to string preparation and
comparison (such as case mapping); instead, such issues are handled
at the level of profiles. Examples for profiles of the
IdentifierClass can be found in <xref target='I-D.ietf-precis-7613bis'/>
(the UsernameCaseMapped and UsernameCasePreserved profiles).</t>
</section>
</section>
<section title="FreeformClass" anchor="classes-free">
<t>Some application technologies need strings that can be used in a
free-form way, e.g., as a password in an authentication exchange (see
<xref target='I-D.ietf-precis-7613bis'/>) or a nickname in a
chatroom (see <xref target='I-D.ietf-precis-7700bis'/>). We group
such strings into a class called "FreeformClass" having the following
features.</t>
<t><list style='empty'><t>Security Warning: As mentioned, the
FreeformClass prioritizes expressiveness over safety; <xref
target='security-freeformclass'/> describes some of the security
hazards involved with using or profiling the
FreeformClass.</t></list></t>
<t><list style='empty'><t>Security Warning: Consult <xref
target='security-passwords'/> for relevant security considerations
when strings conforming to the FreeformClass, or a profile thereof,
are used as passwords.</t></list></t>
<section title='Valid' anchor='classes-free-valid'>
<t>
<list style='symbols'>
<t>Traditional letters and numbers, i.e., the LetterDigits ("A")
category first defined in <xref target='RFC5892'/> and listed
here under <xref target='A'/>.</t>
<t>Letters and digits other than the "traditional" letters and
digits allowed in IDNs, i.e., the OtherLetterDigits ("R")
category defined under <xref target='R'/>.</t>
<t>Code points in the range U+0021 through U+007E, i.e., the
(printable) ASCII7 ("K") category defined under <xref
target='K'/>.</t>
<t>Any code point that is decomposed and recomposed into something
other than itself under Unicode normalization form KC, i.e., the
HasCompat ("Q") category defined under <xref target='Q'/>.</t>
<t>Space code points, i.e., the Spaces ("N") category defined
under <xref target='N'/>.</t>
<t>Symbol code points, i.e., the Symbols ("O") category defined
under <xref target='O'/>.</t>
<t>Punctuation code points, i.e., the Punctuation ("P") category
defined under <xref target='P'/>.</t>
</list>
</t>
</section>
<section title='Contextual Rule Required' anchor='classes-free-contextual'>
<t>
<list style='symbols'>
<t>A number of code points from the Exceptions ("F") category
defined under <xref target='F'/> (see <xref target='F'/> for a
full list).</t>
<t>Joining code points, i.e., the JoinControl ("H") category
defined under <xref target='H'/>.</t>
</list>
</t>
</section>
<section title='Disallowed' anchor='classes-free-disallowed'>
<t>
<list style='symbols'>
<t>Old Hangul Jamo code points, i.e., the OldHangulJamo ("I")
category defined under <xref target='I'/>.</t>
<t>Control code points, i.e., the Controls ("L") category defined
under <xref target='L'/>.</t>
<t>Ignorable code points, i.e., the PrecisIgnorableProperties
("M") category defined under <xref target='M'/>.</t>
</list>
</t>
</section>
<section title='Unassigned' anchor='classes-free-unassigned'>
<t>Any code points that are not yet designated in the Unicode
coded character set are considered unassigned for purposes of the
FreeformClass, and such code points are to be treated as
disallowed.</t>
</section>
<section title='Examples' anchor='classes-free-examples'>
<t>As described in the Introduction to this document, the string
classes do not handle all issues related to string preparation and
comparison (such as case mapping); instead, such issues are handled
at the level of profiles. Examples for profiles of the
FreeformClass can be found in <xref target='I-D.ietf-precis-7613bis'/>
(the OpaqueString profile) and <xref target='I-D.ietf-precis-7700bis'/>
(the Nickname profile).</t>
</section>
</section>
<section title="Summary" anchor='classes-summary'>
<t>The following table summarizes the differences between the
IdentifierClass and the FreeformClass, i.e., the disposition
of a code point (as valid, contextual rule required, disallowed, or
unassigned) depending on its PRECIS category.</t>
<figure align="center">
<artwork align="center"><![CDATA[
+===============================+=================+===============+
| CATEGORY                      | IDENTIFIERCLASS | FREEFORMCLASS |
+===============================+=================+===============+
| (A) LetterDigits              | Valid           | Valid         |
+-------------------------------+-----------------+---------------+
| (B) Unstable                  | [N/A (unused)]                  |
+-------------------------------+-----------------+---------------+
| (C) IgnorableProperties       | [N/A (unused)]                  |
+-------------------------------+-----------------+---------------+
| (D) IgnorableBlocks           | [N/A (unused)]                  |
+-------------------------------+-----------------+---------------+
| (E) LDH                       | [N/A (unused)]                  |
+-------------------------------+-----------------+---------------+
| (F) Exceptions                | Contextual      | Contextual    |
|                               | Rule Required   | Rule Required |
+-------------------------------+-----------------+---------------+
| (G) BackwardCompatible        | [Handled by IDNA Rules]         |
+-------------------------------+-----------------+---------------+
| (H) JoinControl               | Contextual      | Contextual    |
|                               | Rule Required   | Rule Required |
+-------------------------------+-----------------+---------------+
| (I) OldHangulJamo             | Disallowed      | Disallowed    |
+-------------------------------+-----------------+---------------+
| (J) Unassigned                | Unassigned      | Unassigned    |
+-------------------------------+-----------------+---------------+
| (K) ASCII7                    | Valid           | Valid         |
+-------------------------------+-----------------+---------------+
| (L) Controls                  | Disallowed      | Disallowed    |
+-------------------------------+-----------------+---------------+
| (M) PrecisIgnorableProperties | Disallowed      | Disallowed    |
+-------------------------------+-----------------+---------------+
| (N) Spaces                    | Disallowed      | Valid         |
+-------------------------------+-----------------+---------------+
| (O) Symbols                   | Disallowed      | Valid         |
+-------------------------------+-----------------+---------------+
| (P) Punctuation               | Disallowed      | Valid         |
+-------------------------------+-----------------+---------------+
| (Q) HasCompat                 | Disallowed      | Valid         |
+-------------------------------+-----------------+---------------+
| (R) OtherLetterDigits         | Disallowed      | Valid         |
+-------------------------------+-----------------+---------------+
]]></artwork>
<postamble>Table 1: Comparative Disposition of Code Points</postamble>
</figure>
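<t>For implementers, the dispositions in Table 1 could be captured as a
simple lookup structure. The following Python sketch is one possible
encoding; the names Disposition, TABLE, disposition, and differing are
hypothetical, and the categories that the table marks as unused or as
handled by IDNA rules (B, C, D, E, and G) are deliberately omitted.</t>
<figure><artwork><![CDATA[
```python
from enum import Enum


class Disposition(Enum):
    VALID = "Valid"
    CONTEXTUAL = "Contextual Rule Required"
    DISALLOWED = "Disallowed"
    UNASSIGNED = "Unassigned"


V, C, D, U = (Disposition.VALID, Disposition.CONTEXTUAL,
              Disposition.DISALLOWED, Disposition.UNASSIGNED)

# Category letter -> (IdentifierClass, FreeformClass), per Table 1.
TABLE = {
    "A": (V, V), "F": (C, C), "H": (C, C), "I": (D, D),
    "J": (U, U), "K": (V, V), "L": (D, D), "M": (D, D),
    "N": (D, V), "O": (D, V), "P": (D, V), "Q": (D, V),
    "R": (D, V),
}

CLASS_INDEX = {"IdentifierClass": 0, "FreeformClass": 1}


def disposition(category: str, string_class: str) -> Disposition:
    """Look up the disposition of a category for a string class."""
    return TABLE[category][CLASS_INDEX[string_class]]


def differing() -> list:
    """Categories whose disposition differs between the classes."""
    return sorted(c for c, (i, f) in TABLE.items() if i != f)
```
]]></artwork></figure>
<t>Calling differing() on this structure yields the categories N, O, P,
Q, and R, which are exactly the rows where the IdentifierClass
disallows code points that the FreeformClass treats as valid.</t>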
</section>
</section>
<section title="Profiles" anchor="profiles">
<t>This framework document defines the valid,
contextual-rule-required, disallowed, and unassigned rules for the
IdentifierClass and the FreeformClass. A profile of a PRECIS string
class MUST define the width mapping, additional mappings (if any),
case mapping, normalization, and directionality rules. A profile MAY
also restrict the allowable code points above and beyond the definition
of the relevant PRECIS string class (but MUST NOT add as valid any
code points that are disallowed by the relevant PRECIS string class).
These matters are discussed in the following subsections.</t>
<t>Profiles of the PRECIS string classes are registered with the IANA
as described under <xref target='iana-profiles'/>. Profile names use
the following convention: they are of the form "Profilename of
BaseClass", where the "Profilename" string is a differentiator and
"BaseClass" is the name of the PRECIS string class being profiled; for
example, the profile of the FreeformClass used for opaque strings
such as passwords is the OpaqueString profile <xref
target='I-D.ietf-precis-7613bis'/>.</t>
<section title="Profiles Must Not Be Multiplied beyond Necessity" anchor="profiles-proliferation">
<t>The risk of profile proliferation is significant because having too
many profiles will result in different behavior across various
applications, thus violating what is known in user interface design as
the "Principle of Least Astonishment".</t>
<t>Indeed, we already have too many profiles. Ideally we would have
at most two or three profiles. Unfortunately, numerous application
protocols exist with their own quirks regarding protocol strings.
Domain names, email addresses, instant messaging addresses, chatroom
nicknames, filenames, authentication identifiers, passwords, and other
strings are already out there in the wild and need to be supported in
existing application protocols such as DNS, SMTP, the
Extensible Messaging and Presence Protocol (XMPP),
Internet Relay Chat (IRC), NFS, the Internet Small Computer System
Interface (iSCSI), the Extensible Authentication Protocol (EAP),
and the Simple Authentication and Security Layer (SASL), among
others.</t>
<t>Nevertheless, profiles must not be multiplied beyond necessity.</t>
<t>To help prevent profile proliferation, this document recommends
sensible defaults for the various options offered to profile creators
(such as width mapping and Unicode normalization). In addition, the
guidelines for designated experts provided under <xref
target='guidelines'/> are meant to encourage a high level of due
diligence regarding new profiles.</t>
</section>
<section title="Rules" anchor="profiles-rules">
<section title="Width Mapping Rule" anchor="profiles-principles-width">
<t>The width mapping rule of a profile specifies whether width
mapping is performed on a string, and how the
mapping is done. Typically, such mapping consists of mapping
fullwidth and halfwidth code points, i.e., code points with a
Decomposition Type of Wide or Narrow, to their decomposition
mappings; as an example, FULLWIDTH DIGIT ZERO (U+FF10) would be
mapped to DIGIT ZERO (U+0030).</t>
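<t>As a rough illustration of the width mapping described above, the
following Python sketch replaces each code point whose decomposition
mapping is tagged as wide or narrow with that mapping. The function
name width_map is hypothetical, and the sketch relies on the Unicode
data bundled with the Python interpreter, so results can differ
between Unicode versions.</t>
<figure>
<artwork><![CDATA[
```python
import unicodedata


def width_map(text: str) -> str:
    """Map fullwidth and halfwidth code points to their
    decomposition mappings; leave everything else untouched."""
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            # Entries look like "<wide> 0030": a tag, then hex values.
            hex_values = decomp.split()[1:]
            out.append("".join(chr(int(h, 16)) for h in hex_values))
        else:
            out.append(ch)
    return "".join(out)
```
]]></artwork>
</figure>
<t>With this sketch, FULLWIDTH DIGIT ZERO (U+FF10) becomes DIGIT ZERO
(U+0030), whereas a code point with a different compatibility tag,
such as SUPERSCRIPT TWO (U+00B2), is left unchanged.</t>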
<t>The normalization form specified by a profile (see below) has an
impact on the need for width mapping. Because width mapping is
performed as a part of compatibility decomposition, a profile
employing either normalization form KD (NFKD) or normalization form
KC (NFKC) does not need to specify width mapping. However, if
Unicode normalization form C (NFC) is used (as is recommended), then
the profile needs to specify whether to apply width mapping; in this
case, width mapping is in general RECOMMENDED because allowing
fullwidth and halfwidth code points to remain unmapped to their
compatibility variants would violate the "Principle of Least
Astonishment". For more information about the concept of width in
East Asian scripts within Unicode, see Unicode Standard Annex #11
<xref target='UAX11'/>.</t>
<t><list style='empty'><t>Note: Because the East Asian width
property is not guaranteed to be stable by the Unicode Standard
(see <eref target='http://unicode.org/policies/stability_policy.html'/>
for details), the results of applying a given width mapping rule
might not be consistent across different versions of Unicode.</t></list></t>
</section>
<section title="Additional Mapping Rule" anchor="profiles-principles-additional">
<t>The additional mapping rule of a profile specifies whether
additional mappings are performed on a string, such
as:</t>
<t>
<list>
<t>Mapping of delimiter code points (such as '@', ':', '/', '+',
and '-').</t>
<t>Mapping of special code points (e.g., non-ASCII space
code points to ASCII space or control code points to nothing).</t>
</list>
</t>
<t>The PRECIS mappings document <xref
target='RFC7790'/> describes such mappings in more
detail.</t>
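<t>The second kind of mapping listed above might be sketched in
Python as follows. The function names are hypothetical, and selecting
code points by the general categories Zs (space separators) and Cc
(controls) is an assumption made for this illustration rather than a
rule taken from a particular profile.</t>
<figure>
<artwork><![CDATA[
```python
import unicodedata


def map_spaces(text: str) -> str:
    """Replace every space separator (category Zs) with U+0020."""
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch
        for ch in text
    )


def drop_controls(text: str) -> str:
    """Map control code points (category Cc) to nothing."""
    return "".join(
        ch for ch in text if unicodedata.category(ch) != "Cc"
    )
```
]]></artwork>
</figure>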
</section>
<section title="Case Mapping Rule" anchor="profiles-principles-case">
<t>The case mapping rule of a profile specifies whether case mapping
(instead of case preservation) is performed on a
string, and how the mapping is applied (e.g., mapping uppercase and
titlecase code points to their lowercase equivalents).</t>
<t>If case mapping is desired (instead of case preservation), it is
RECOMMENDED to use the Unicode toLowerCase() operation defined in the
Unicode Standard <xref target='Unicode'/>. In contrast to the Unicode
toCaseFold() operation, the toLowerCase() operation is less likely to violate
the "Principle of Least Astonishment", especially when an application
merely wishes to convert uppercase and titlecase code points to the
lowercase equivalents while preserving lowercase code points. Although
the toCaseFold() operation can be appropriate when an application needs
to compare two strings (such as in search operations), in general few
application developers and even fewer users understand its implications,
so toLowerCase() is almost always the safer choice.</t>
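<t>The difference between the two operations can be seen in the
following Python sketch. It assumes that the built-in str.lower() and
str.casefold() methods are acceptable stand-ins for the Unicode
toLowerCase() and toCaseFold() operations; the wrapper function names
are hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
def to_lower(text: str) -> str:
    """Stand-in for toLowerCase(): lowercase code points are kept."""
    return text.lower()


def to_case_fold(text: str) -> str:
    """Stand-in for toCaseFold(): intended for comparison only."""
    return text.casefold()


# "Strasse" spelled with LATIN SMALL LETTER SHARP S (U+00DF).
word = "Stra\u00dfe"
lowered = to_lower(word)      # sharp s is preserved
folded = to_case_fold(word)   # sharp s is expanded to "ss"
```
]]></artwork>
</figure>
<t>Here the lowercased form still contains U+00DF, while the folded
form replaces it with "ss", which is the kind of change that users
might not expect to see in a stored identifier.</t>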
<t><list style='empty'><t>Note: Neither toLowerCase() nor toCaseFold() is
designed to handle various language-specific issues (such as so-called
"dotless i" in several Turkic languages). The reader is referred to
the PRECIS mappings document <xref target='RFC7790'/>, which describes
these issues in greater detail.</t></list></t>
<t>In order to maximize entropy and minimize the potential for false
accepts, it is NOT RECOMMENDED for application protocols to map
uppercase and titlecase code points to their lowercase equivalents
when strings conforming to the FreeformClass, or a profile thereof,
are used in passwords; instead, it is RECOMMENDED to preserve the
case of all code points contained in such strings and then perform
case-sensitive comparison. See also the related discussion
in <xref target="security-passwords"/> and
in <xref target='I-D.ietf-precis-7613bis'/>.</t>
</section>
<section title="Normalization Rule" anchor="profiles-principles-normalization">
<t>The normalization rule of a profile specifies which Unicode
normalization form (D, KD, C, or KC) is to be applied (see Unicode
Standard Annex #15 <xref target='UAX15'/> for background
information).</t>
<t>In accordance with <xref target='RFC5198'/>, normalization form C
(NFC) is RECOMMENDED.</t>
<t>Protocol designers and application developers need to understand
that the use of certain Unicode normalization forms, especially NFKC and NFKD,
can result in significant loss of information in various circumstances,
and that these circumstances can vary depending on the language and script
of the strings to which the normalization forms are applied. Extreme
care should be taken when specifying the use of these normalization forms.</t>
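<t>The following Python sketch illustrates the kind of information
loss mentioned above by listing the code points in a string that NFKC
changes but NFC leaves alone. The function name compat_changes is
hypothetical, and the output depends on the Unicode data shipped with
the Python interpreter.</t>
<figure>
<artwork><![CDATA[
```python
import unicodedata


def compat_changes(text: str) -> list:
    """Return (code point, NFKC result) pairs for every code point
    that NFKC alters but NFC does not."""
    changed = []
    for ch in text:
        nfc = unicodedata.normalize("NFC", ch)
        nfkc = unicodedata.normalize("NFKC", ch)
        if nfc == ch and nfkc != ch:
            changed.append((ch, nfkc))
    return changed


# SUPERSCRIPT TWO and LATIN SMALL LIGATURE FI survive NFC,
# but NFKC replaces them with "2" and "fi".
report = compat_changes("x\u00b2 \ufb01le")
```
]]></artwork>
</figure>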
</section>
<section title="Directionality Rule" anchor="profiles-principles-directionality">
<t>The directionality rule of a profile specifies how to treat
strings containing what are often called "right-to-left" (RTL)
code points (see Unicode Standard Annex #9 <xref target='UAX9'/>).
RTL code points come from scripts that are normally written from
right to left, and Unicode assigns right-to-left directionality to
the code points themselves. Some strings containing RTL
code points also contain "left-to-right" (LTR) code points, such as
ASCII numerals, as well as code points without directional properties.
Consequently, such strings are known as "bidirectional strings".</t>
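<t>A simple way to detect such strings is sketched below in Python.
This is not an implementation of the "Bidi Rule" or of the Unicode
bidirectional algorithm; it only inspects the Bidi_Class of each code
point. The function name and the choice of classes (R and AL counted
as right-to-left, L and EN counted as left-to-right) are
simplifying assumptions made for this illustration.</t>
<figure>
<artwork><![CDATA[
```python
import unicodedata

RTL_CLASSES = {"R", "AL"}   # assumption: strong RTL classes only
LTR_CLASSES = {"L", "EN"}   # assumption: ASCII digits count as LTR


def is_bidirectional(text: str) -> bool:
    """True if the string mixes RTL and LTR code points."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    has_rtl = bool(classes & RTL_CLASSES)
    has_ltr = bool(classes & LTR_CLASSES)
    return has_rtl and has_ltr
```
]]></artwork>
</figure>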
<t>Presenting bidirectional strings in different layout systems
(e.g., a user interface that is configured to handle primarily an
RTL script vs. an interface that is configured to handle primarily
an LTR script) can yield display results that, while predictable to
those who understand the display rules, are counter-intuitive to
casual users. In particular, the same bidirectional string (in
PRECIS terms) might not be presented in the same way to users of
those different layout systems, even though the presentation is
consistent within any particular layout system. In some
applications, these presentation differences might be considered
problematic and thus the application designers might wish to
restrict the use of bidirectional strings by specifying a
directionality rule. In other applications, these presentation
differences might not be considered problematic (this especially
tends to be true of more "free-form" strings) and thus no
directionality rule is needed.</t>
<t>The PRECIS framework does not directly address how to deal with
bidirectional strings across all string classes and profiles, and
does not define any new directionality rules, because at present there
is no widely accepted and implemented solution for the safe display
of arbitrary bidirectional strings beyond the Unicode bidirectional
algorithm <xref target='UAX9'/>. Although rules for management and
display of bidirectional strings have been defined for domain name
labels and similar identifiers through the "Bidi Rule" specified in
the IDNA2008 specification on right-to-left scripts <xref
target='RFC5893'/>, those rules are quite restrictive and are not
necessarily applicable to all bidirectional strings.</t>
<t>The authors of a PRECIS profile might believe that they need to
define a new directionality rule of their own. Because of the
complexity of the issues involved, such a belief is almost always
misguided, even if the authors have done a great deal of careful
research into the challenges of displaying bidirectional strings.
This document strongly suggests that profile authors who are
thinking about defining a new directionality rule think again, and
instead consider using the "Bidi Rule" <xref target='RFC5893'/> (for
profiles based on the IdentifierClass) or following the Unicode
bidirectional algorithm <xref target='UAX9'/> (for profiles based on
the FreeformClass or in situations where the IdentifierClass is not
appropriate).</t>
</section>
</section>
<section title="A Note about Spaces" anchor="profiles-space">
<t>With regard to the IdentifierClass, the consensus of the PRECIS
Working Group was that spaces are problematic for many reasons,
including the following:</t>
<t>
<list style='symbols'>
<t>Many Unicode code points are confusable with ASCII space.</t>
<t>Even if non-ASCII space code points are mapped to ASCII space
(U+0020), space code points are often not rendered in user
interfaces, leading to the possibility that a human user might
consider a string containing spaces to be equivalent to the same
string without spaces.</t>
<t>In some locales, some devices are known to generate a code point
other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a
user performs an action like hitting the space bar on a
keyboard.</t>
</list>
</t>
<t>One consequence of disallowing space code points in the
IdentifierClass might be to effectively discourage their use within
identifiers created in newer application protocols; given the
challenges involved with properly handling space code points
(especially non-ASCII space code points) in identifiers and other
protocol strings, the PRECIS Working Group considered this to be a
feature, not a bug.</t>
<t>However, the FreeformClass does allow spaces, which enables
application protocols to define profiles of the FreeformClass that are
more flexible than any profiles of the IdentifierClass. In addition,
as explained in <xref target="apps-constructs"/>, application
protocols can also define application-layer constructs containing
spaces.</t>
</section>
</section>
<section title="Applications" anchor="apps">
<section title="How to Use PRECIS in Applications" anchor="apps-howto">
<t>Although PRECIS has been designed with applications in mind,
internationalization is not suddenly made easy through the use of
PRECIS. Indeed, because it is extremely difficult for protocol
designers and application developers to do the right thing for all
users when supporting internationalized strings, often the safest
option is to support only the ASCII range <xref target='RFC20'/>
in various protocol slots. This state of affairs is unfortunate
but is the direct result of the complexities involved with human
languages (e.g., the vast number of code points, scripts, user
communities, and rules with their inevitable exceptions), the
kinds of strings that application developers and their users wish to
support, the wide range of devices that users employ to access
services enabled by various Internet protocols, and so on.</t>
<t>Despite these significant challenges, application and protocol
developers sometimes persevere in attempting to support internationalized
strings in their systems. These developers need to think carefully about
how they will use the PRECIS string classes, or profiles thereof, in their
applications. This section provides some guidelines to application
developers (and to expert reviewers of application protocol
specifications).</t>
<t>
<list style='symbols'>
<t>Don't define your own profile unless absolutely necessary (see
<xref target="profiles-proliferation"/>). Existing profiles have
been designed for wide reuse. It is highly likely that an existing
profile will meet your needs, especially given the ability to
specify further excluded code points (<xref
target='apps-exclusion'/>) and to build application-layer
constructs (see <xref target='apps-constructs'/>).</t>
<t>Do specify:
<list style='symbols'>
<t>Exactly which entities are responsible for preparation,
enforcement, and comparison of internationalized strings
(e.g., servers or clients).</t>
<t>Exactly when those entities need to complete their tasks
(e.g., a server might need to enforce the rules of a profile
before allowing a client to gain network access).</t>
<t>Exactly which protocol slots need to be checked against
which profiles (e.g., checking the address of a message's
intended recipient against the UsernameCaseMapped profile
<xref target='I-D.ietf-precis-7613bis'/> of the
IdentifierClass, or checking the password of a user against
the OpaqueString profile <xref
target='I-D.ietf-precis-7613bis'/> of the
FreeformClass).</t>
</list>
See <xref target='I-D.ietf-precis-7613bis'/> and <xref target='RFC7622'/> for definitions of these matters for several applications.</t>
</list>
</t>
</section>
<section title="Further Excluded Characters" anchor="apps-exclusion">
<t>An application protocol that uses a profile MAY specify particular
code points that are not allowed in relevant slots within that
application protocol, above and beyond those excluded by the string
class or profile.</t>
<t>That is, an application protocol MAY do either of the
following:</t>
<t>
<list style='numbers'>
<t>Exclude specific code points that are allowed by the relevant
string class.</t>
<t>Exclude code points matching certain Unicode properties (e.g.,
math symbols) that are included in the relevant PRECIS string
class.</t>
</list>
</t>
<t>As a result of such exclusions, code points that are defined as
valid for the PRECIS string class or profile will be defined as
disallowed for the relevant protocol slot.</t>
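<t>Both kinds of exclusion might be checked as in the following
Python sketch. It assumes that the input has already been enforced
against the relevant profile, and the excluded code points, the
excluded category (Sm, math symbols), and the function name are
hypothetical values chosen only for illustration.</t>
<figure>
<artwork><![CDATA[
```python
import unicodedata

# Hypothetical exclusions for one protocol slot.
EXCLUDED_POINTS = frozenset({"&", "'"})
EXCLUDED_CATEGORIES = frozenset({"Sm"})   # math symbols


def allowed_in_slot(text: str) -> bool:
    """Apply slot-specific exclusions on top of the profile.

    The caller is assumed to have enforced the profile already.
    """
    for ch in text:
        if ch in EXCLUDED_POINTS:
            return False
        if unicodedata.category(ch) in EXCLUDED_CATEGORIES:
            return False
    return True
```
]]></artwork>
</figure>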
<t>Typically, such exclusions are defined for the purpose of
backward compatibility with legacy formats within an application
protocol. These are defined for application protocols, not profiles,
in order to prevent multiplication of profiles beyond necessity (see
<xref target='profiles-proliferation'/>).</t>
</section>
<section title="Building Application-Layer Constructs" anchor="apps-constructs">
<t>Sometimes, an application-layer construct does not map in a
straightforward manner to one of the base string classes or a profile
thereof. Consider, for example, the "simple user name" construct in
the Simple Authentication and Security Layer (SASL) <xref
target='RFC4422'/>. Depending on the deployment, a simple user name
might take the form of a user's full name (e.g., the user's personal
name followed by a space and then the user's family name). Such a
simple user name cannot be defined as an instance of the
IdentifierClass or a profile thereof, because space code points are not
allowed in the IdentifierClass; however, it could be defined using a
space-separated sequence of IdentifierClass instances, as in the
following ABNF <xref target='RFC5234'/> from <xref
target='I-D.ietf-precis-7613bis'/>:</t>
<figure>
<artwork><![CDATA[
username = userpart *(1*SP userpart)
userpart = 1*(idpoint)
;
; an "idpoint" is a Unicode code point that
; can be contained in a string conforming to