Specification for RTF
---------------------

RTF text is a form of encoding of various text formatting properties,
document structures, and document properties, using the printable ASCII
character set. Special characters can also be encoded this way, although
RTF does not prevent the use of character codes outside the printable
ASCII set. The main encoding mechanism of "control words" provides a
name space that may later be used to expand the realm of RTF with
macros, programming, etc.

1. BASIC INGREDIENTS

Control words are of the form:

    \lettersequence <delimiter>

where <delimiter> is:

  . a space: the space is part of the control word.
  . a digit or -: means that a parameter follows. The following digit
    sequence is then delimited by a space or any other
    non-letter-or-digit, as for control words.
  . any other non-letter-or-digit: terminates the control word, but is
    not a part of the control word.

By "letter", here we mean just the upper and lower case ASCII letters.

Control symbols consist of a \ character followed by a single
non-letter. They require no further delimiting.

Notes: control symbols are compact, but there are not too many of them.
The number of possible control words is not limited. The parameter is
partially incorporated in control words, so that a program that does
not understand a control word can recognize and ignore the
corresponding parameter as well.

In addition to control words and control symbols, there are also the
braces:

    {    group start
    }    group end

The text grouping will be used for formatting and to delineate document
structure, such as the footnotes, headers, title, and so on.

The control words, control symbols, and braces constitute control
information. All other characters in RTF text constitute "plain text".

Since the characters \, {, and } have specific uses in RTF, the control
symbols \\, \{, and \} are provided to express the corresponding plain
characters.

2. WHAT RTF TEXT MEANS (SEMANTICS)

The reader of an RTF stream will be concerned with:

  . Separating control information from plain text.

  . Acting on control information. This is designed to be a relatively
    simple process, as described below. Some control information just
    contributes special characters to the plain text stream. Other
    information serves to change the "program state", which includes
    properties of the document as a whole and also a stack of "group
    states" that apply to parts. Note that the group state is saved by
    the { brace and is restored by the } brace. The current group state
    specifies:

      1. the "destination", or part of the document that the plain text
         is building up.
      2. the character formatting properties, such as bold or italic.
      3. the paragraph formatting properties, such as justified.
      4. the section formatting properties, such as number of columns.

  . Collecting and properly disposing of the remaining "plain text" as
    directed by the current group state.

In practice the RTF reader will proceed as follows:

  0. Read the next character.
  1. If it is {, stack the current state. The current state does not
     change. Continue.
  2. If it is }, unstack the current state from the stack. In general
     this will change the state.
  3. If it is \, collect the control word or control symbol and the
     parameter, if any. Look up the word or symbol in the symbol table
     (a constant table) and act according to the description there.
     The different actions are listed below. The parameter is left
     available for use by the action. Leave the read pointer before or
     after the delimiter, as appropriate. After the action, continue.
  4. Otherwise, write the "plain text" character to the current
     destination using the current formatting properties.

Given a symbol table entry, the possible actions are as follows:

  A. Change destination: change the destination to the one described in
     the entry. Most destination changes are legal only immediately
     after a {. Other restrictions may also apply (for example,
     footnotes may not be nested).
  B. Change formatting property: the symbol table entry will describe
     the property and whether the parameter is required.
  C. Special character: the symbol table entry will describe the
     character code. Go to step 4.
  D. End of paragraph: this could be viewed as just a special character.
  E. End of section: this could be viewed as just a special character.
  F. Ignore.

3. SPECIAL CHARACTERS

The special characters are explained as they exist in Mac Word. Clearly,
other characters may be added for interchange with other programs. If a
character name is not recognized by a reader, according to the rules
described above, it will simply be ignored.

    \chpgn     current page number (as in headers)
    \chftn     auto numbered footnote reference (footnote to follow
               in a group)
    \chpict    placeholder character for picture (picture to follow
               in a group)
    \chdate    current date (as in headers)
    \chtime    current time (as in headers)
    \|         formula character
    \~         non-breaking space
    \-         non-required hyphen
    \_         non-breaking hyphen
    \page      required page break
    \line      required line break (no paragraph break)
    \par       end of paragraph
    \sect      end of section and end of paragraph
    \tab       same as ASCII 9

For simplicity of operation, the ASCII codes 9 and 10 will be accepted
as \tab and \par respectively. ASCII 13 will be ignored. The control
code \<10> will be ignored. It may be used to include "soft" carriage
returns, which make the text easier to read but have no effect on its
interpretation.

4. DESTINATIONS

The change of destination will reset all properties to default. Changes
are legal only at the beginning of a group (by group here we mean the
text and controls enclosed in braces).

    \rtf         The destination is the document. The parameter is the
                 version number of the writer. This destination,
                 preceded by {, marks the beginning of an RTF document,
                 and the corresponding } marks the end. Legal only once,
                 after the initial {. Small scale interchange of RTF
                 where other methods for marking the end of string are
                 available, as in a string constant, need not include
                 this identification but will start with this
                 destination as the default.
    \pict        The destination is a picture. The group must
                 immediately follow a \chpict character. The plain text
                 describes the picture as a hex dump (string of
                 characters 0, 1, ..., 9, a, ..., e, f).
                 (Formatting properties to determine data
                 interpretation, size)
    \footnote    The destination is a footnote text. The group must
                 immediately follow the footnote reference character(s).
    \header      The destination is the header text for the current
                 section. The group must precede the first plain text
                 character in the section.
    \headerl     Same as above, but header for left-hand pages.
    \headerr     Same as above, but header for right-hand pages.
    \headerf     Same as above, but header for first page.
    \footer      Same as above, but footer.
    \footerl     Same as above, but footer for left-hand pages.
    \footerr     Same as above, but footer for right-hand pages.
    \footerf     Same as above, but footer for first page.
    \ftnsep      Same as above, but text is footnote separator.
    \ftnsepc     Same as above, but text is separator for continued
                 footnotes.
    \ftncn       Same as above, but text is continued footnote notice.
    \info        Text is the information block for the document. Parts
                 of the text are further classified by "properties" of
                 the text that are listed below, such as "title". These
                 are not formatting properties, but a device to delimit
                 and identify parts of the info from the text in the
                 group.
    \stylesheet  Text is the style sheet for the document. More
                 precisely, the text between semicolons is taken to be
                 style names, which will be defined to stand for the
                 formatting properties that are in effect.
    \fonttbl     Font table. See below.
    \colortbl    Color table. See below.
    \comment     Text will be ignored.

5. DOCUMENT FORMATTING PROPERTIES

(000 stands for a number which may be signed; the right-hand column
gives the default)

    \paperw000      paper width in twips                  12240
    \paperh000      paper height                          15840
    \margl000       left margin                           1800
    \margr000       right margin                          1800
    \margt000       top margin                            1440
    \margb000       bottom margin                         1440
    \facingp        facing pages
    \gutter000      gutter width
    \deftab000      default tab width                     720
    \widowctrl      enable widow control
    \endnotes       footnotes at end of section
    \ftnbj          footnotes at bottom of page           default
    \ftntj          footnotes beneath text (top just)
    \ftnstart000    starting footnote number              1
    \ftnrestart     restart footnote numbers each page
    \pgnstart000    starting page number                  1
    \linestart000   starting line number                  1
    \landscape      printed in landscape format

(the "next file" property will be encoded in the info text)

6. SECTION FORMATTING PROPERTIES

    \sectd          reset to default section properties
    \nobreak        break code
    \colbreak       break code                            default
    \pagebreak      break code
    \evenbreak      break code
    \oddbreak       break code
    \pgnrestart     restart page numbers at 1
    \pgndec         page number format decimal            default
    \pgnucrm        page number format uc roman
    \pgnlcrm        page number format lc roman
    \pgnucltr       page number format uc letter
    \pgnlcltr       page number format lc letter
    \pgnx000        auto page number x pos                720
    \pgny000        auto page number y pos                720
    \linemod000     line number modulus
    \linex000       line number - text distance           360
    \linerestart    line number restart at 1              default
    \lineppage      line number restart on each page
    \linecont       line number continued from prev section
    \headery000     header y position from top of page    720
    \footery000     footer y position from bottom of page 720
    \cols000        number of columns                     1
    \colsx000       space between columns                 720
    \endnhere       include endnotes in this section
    \titlepg        title page is special

7. PARAGRAPH FORMATTING PROPERTIES

    \pard           reset to default para properties
    \s000           style
    \ql             quad left                             default
    \qr             right
    \qj             justified
    \qc             centered
    \fi000          first line indent
    \li000          left indent
    \ri000          right indent
    \sb000          space before
    \sa000          space after
    \sl000          space between lines
    \keep           keep
    \keepn          keep with next para
    \sbys           side by side
    \pagebb         page break before
    \noline         no line numbering
    \brdrt          border top
    \brdrb          border bottom
    \brdrl          border left
    \brdrr          border right
    \box            border all around
    \brdrs          single thickness
    \brdrth         thick
    \brdrsh         shadow
    \brdrdb         double
    \tx000          tab position
    \tqr            right flush tab (these apply to last specified pos)
    \tqc            centered tab
    \tqdec          decimal aligned tab
    \tldot          leader dots
    \tlhyph         leader hyphens
    \tlul           leader underscore
    \tlth           leader thick line

8. CHARACTER FORMATTING PROPERTIES

    \plain          reset to default text properties
    \b              bold
    \i              italic
    \strike         strikethrough
    \outl           outline
    \shad           shadow
    \scaps          small caps
    \caps           all caps
    \v              invisible text
    \f000           font number n
    \fs000          font size in half points              24
    \ul             underline
    \ulw            word underline
    \uld            dotted underline
    \uldb           double underline
    \up000          superscript in half points
    \dn000          subscript in half points

9. INFO GROUP

The plain text in the group is used to specify the various fields of
the information block. The current field may be thought of as a
particular setting of the "sub-destination" property of the text.

    \title          following plain text is the title
    \subject        following text is the subject
    \operator
    \author
    \keywords
    \doccomm        comments (not to be confused with \comment)
    \version
    \nextfile       following text is name of "next" file

The other properties assign their parameters directly to the info block.

    \verno000       internal version number
    \creatim        creation time follows
    \yr000          year to be assigned to previously specified
                    time field
    \mo000
    \dy000
    \hr000
    \min000
    \sec000
    \revtim         revision time follows
    \printtim       print time follows
    \buptim         backup time follows
    \edmins000      editing minutes
    \nofpages000
    \nofwords000
    \nofchars000
    \id000          internal id number
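
EXAMPLE (ILLUSTRATIVE ONLY)

The delimiter rules of section 1 and the reader steps 0 to 4 of section 2 can be sketched roughly as follows. This is a minimal sketch and not part of the specification: the function names read_control and read_rtf, the two-entry property table, the reduced special character table, and the choice to emit ASCII 10 for \par are all assumptions made for illustration. Destinations are not modelled, so every plain text character goes to a single output.

```python
import string

LETTERS = set(string.ascii_letters)  # "letter" means ASCII letters only
DIGITS = set(string.digits)

# Hypothetical, heavily reduced symbol table (an assumption of this sketch).
SPECIAL = {"\\": "\\", "{": "{", "}": "}", "tab": "\t", "par": "\n"}
PROPERTIES = {"b": "bold", "i": "italic"}


def read_control(text, pos):
    """Parse the control word or symbol at text[pos] == '\\'.

    Returns (name, parameter or None, index of the next unread character).
    """
    pos += 1  # skip the backslash
    n = len(text)
    if pos >= n:
        return "", None, pos
    if text[pos] not in LETTERS:
        # Control symbol: one non-letter, no further delimiting.
        return text[pos], None, pos + 1
    start = pos
    while pos < n and text[pos] in LETTERS:
        pos += 1
    name = text[start:pos]
    param = None
    if pos < n and (text[pos] in DIGITS or text[pos] == "-"):
        end = pos + 1
        while end < n and text[end] in DIGITS:
            end += 1
        if text[end - 1] in DIGITS:  # a lone "-" carries no number
            param = int(text[pos:end])
        pos = end
    if pos < n and text[pos] == " ":
        pos += 1  # a delimiting space is part of the control word
    return name, param, pos


def _emit(runs, ch, state):
    """Append ch to the output, merging it into the last run if possible."""
    props = tuple(sorted(k for k, on in state.items() if on))
    if runs and runs[-1][1] == props:
        runs[-1] = (runs[-1][0] + ch, props)
    else:
        runs.append((ch, props))


def read_rtf(text):
    """Return a list of (plain text, active property names) runs."""
    state = {"bold": False, "italic": False}
    stack, runs, pos = [], [], 0
    while pos < len(text):
        ch = text[pos]
        if ch == "{":                      # step 1: save the group state
            stack.append(dict(state))
            pos += 1
        elif ch == "}":                    # step 2: restore the group state
            if stack:
                state = stack.pop()
            pos += 1
        elif ch == "\\":                   # step 3: control word or symbol
            name, _param, pos = read_control(text, pos)
            if name in SPECIAL:
                _emit(runs, SPECIAL[name], state)
            elif name in PROPERTIES:
                state[PROPERTIES[name]] = True
            elif name == "plain":
                state = dict.fromkeys(state, False)
            # Anything else is ignored together with its parameter.
        elif ch == "\r":                   # ASCII 13 is ignored
            pos += 1
        else:                              # step 4: plain text
            _emit(runs, ch, state)
            pos += 1
    return runs


print(read_rtf(r"{\rtf1 plain {\b bold} \{ok\}\tab end}"))
```

Because an unknown control word is parsed with the same delimiter rules as a known one, the reader can skip it and its parameter without knowing what it means, which is the property the notes in section 1 describe.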