David Thielen  

The RTF Specification

I have been through numerous battles trying to figure out the RTF format. This document is both a list of my open questions and a list of items I think I have figured out. If you can shed any more light on any of these, please shoot me an e-mail at david@thielen.com.

Questions

  • We have a problem: with "\f0\af0", the text after it is in italic, and in some cases bold. With just "\f0", it's fine. How is \af0, which is the same font as \f0, causing this change sometimes?
  • How do you set the color of an underline in a list bullet? I have looked at created RTF files and \ulc is not there, yet it works. It is in \listtext, but we are supposed to ignore that.
  • What are \irow and \irowband? They appear to be a 0-based count of the row number in a table. No idea why there are both tags.
  • And \lastrow seems to be an optional tag in the last row of a table.
  • \picw and \pich are defined as the picture width and height in pixels for a bitmap image. However, Word 2000 saves them as values that are 35.28 times the bitmap size.
  • What is the relation between \trleft, \clwWidth, and \cellx? It seems that \trleft + \clwWidth == \cellx (first cell). But that is not always true (it's not true in the rtf doc example).
  • How does \trgaph relate to the \trpadd* tags that set cell padding?
  • There is both row padding (\trpadd*) and cell padding (\clpadd*). Which one wins? And if it's the cell padding, why is the row padding there?
  • The spec says \listoverrideformat will always be followed by a number. But Word 2000 emits the tag with no number. And how do you tell which level a value is overridden for?
  • It appears that after the last \cell in a row, you must put “\pard \ql \li0\ri0\widctlpar\intbl” before doing the {\trowd…\row} or Word will GPF. Exactly what is necessary here (and why)? Also, this comes after the paragraph row text – so what paragraph are these values assigned to?
  • \slN: what unit is N in? Twips?
  • \trftsWidth says it is units for \clwWidth. Isn’t it units for \trwWidth?
  • A " character is saved as \'94, which is not Unicode for a quote. What is going on here?
  • What is \faauto? It's used a lot but never explained.
  • \trautofit DOES have a (usually small) effect even if \clwWidth and \trwWidth are set!
  • The width of the last cell in a row seems to be set by \cellx, not \clwWidth.
  • A bullet character in a list is saved as \u-3913 (0xf0b7). 0xb7 is Unicode for a middle dot (the Unicode bullet is U+2022), and that is much smaller than the bullet Word displays. 0xf*** is for user-defined chars, so where is Word getting this from?
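
Two of the questions above involve character escapes, and the arithmetic is easy to check. The following is a minimal sketch, not a full RTF reader. It assumes the document's ANSI code page is Windows-1252 (the notes above do not say which code page is in effect) and that the argument of \uN is a signed 16-bit value, so negative numbers wrap around. The helper names are hypothetical.

```python
def decode_hex_escape(hex_digits: str, codepage: str = "cp1252") -> str:
    """Decode the two hex digits of an RTF \\'hh escape using the given code page."""
    return bytes([int(hex_digits, 16)]).decode(codepage)


def decode_unicode_escape(n: int) -> int:
    """Map the signed 16-bit argument of an RTF \\uN escape to a code point."""
    return n + 0x10000 if n < 0 else n


quote = decode_hex_escape("94")        # the \'94 from the question above
bullet = decode_unicode_escape(-3913)  # the \u-3913 from the question above

print(f"U+{ord(quote):04X}")  # U+201D, a right double quotation mark
print(hex(bullet))            # 0xf0b7
```

Under those assumptions, \'94 is a code-page byte rather than a Unicode value, and \u-3913 lands at U+F0B7, which sits in the Unicode Private Use Area. That matches the "user-defined chars" observation but does not by itself say which font supplies the glyph.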

Answers

  • Word will write the Normal style out as the first entry in the stylesheet with no style number. However, OpenOffice will write it out as a later entry and will give it a style number.
  • If there is \pard\plain text instead of \pard\plain\fs24 text, then Word 2000 will display the text as 10 point, even though the spec says that \fs is in half-points and defaults to 24 (that is, 12 point).
  • What’s the difference between \line, \lbr, and \par? Well, \lbr3 == \line (not sure what \lbr0-2 mean), and while it is a hard line break like a \par, it does not start a new paragraph, and therefore the next line follows the left indent, not the first indent.
  • Does each cell have its own paragraph formatting attributes? It appears not, except for vertical alignment. Instead there are standard paragraph & character formatting within a cell.
  • Why are there row border values if each cell has border values? I don’t know but the cell border settings are what Word uses.
  • \plain means reset character formatting to everything off (bold, etc.), 12 point (or is it 10?), and the document default font. The values in style Normal are not used. And the only reset value that is not hardcoded across all documents is the font number. (Is there a list of what tags this resets?)
  • \pard means reset all paragraph formatting to default values (mostly 0’s). It does not use settings in the Normal style or any document level settings – everything is set to a hardcoded default. (Is there a list of what tags this resets?)
  • \s identifies the style for that paragraph – but has no effect on the format of that paragraph. In other words, nothing is changed in the formatting of a paragraph by the \s tag, all formatting comes from the formatting tags appearing in that paragraph.
  • In every RTF doc I have seen, \intbl precedes \itapN. The docs don’t say that’s required, but I have a feeling most RTF readers will blow up if this order is reversed.
  • There is no documentation for \brdrnone – what is it? I assume no border but it still takes up the border width with blank space.
  • Word sometimes writes a table paragraph with no \trowd…\row. In this case all you have is an \intbl, so use the table & cell settings from the previous paragraph. I have only seen this happen with outer tables, not nested tables.
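
The \plain and \line observations above suggest a defensive habit for an RTF writer: state the font size explicitly after every \plain, and use \line only when the break should stay inside the current paragraph. The following is a minimal sketch of that habit, not a tested recipe for any particular reader. The helper name, the indent values, and the single-entry font table are assumptions made for illustration.

```python
def rtf_paragraph(lines, half_points=24, first_indent=360, left_indent=0):
    """Build one RTF paragraph; `lines` are joined with \\line and closed with \\par.

    Indents are in twips. \\fsN takes half-points, so 24 means 12 point.
    """
    # Reset paragraph and character formatting, then restate the size explicitly
    # instead of relying on the reader's default after \plain.
    header = rf"\pard\plain\f0\fs{half_points}\fi{first_indent}\li{left_indent} "
    body = r"\line ".join(lines)
    return header + body + r"\par"


doc = (
    r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}"
    + rtf_paragraph(["First line", "same paragraph, follows the left indent"])
    + rtf_paragraph(["New paragraph, gets the first-line indent again"])
    + "}"
)
print(doc)
```

The text passed in is assumed to contain no backslashes or braces; a real writer would have to escape those.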

Comments from Robert Morley

  • For \irow and \irowband, have a look at the RTF specification 1.9 (and I think it was in 1.8 as well). They are, as you guessed, row numbers. The difference is that \irowband takes header rows into account...my understanding is that it numbers them all "-1", where \irow simply counts linearly. You may want to play with this and confirm, as I've never really looked at it and don't bother to emit either of them in the mini RTF writer I'm working on.
  • Similarly, \lastrow does indeed mark the last row in the table. It also seems to make no difference if you don't emit it.
  • For \pich and \picw, are the sizes given maybe related to the dpi of the picture? I'm only using WMF, which is different from how bitmaps are handled, so you'd have to experiment a bit with that.
  • \trleft is the left edge of the table row relative to the margin. It's often "related" to the cell width in some way for alignment purposes. For example, if you have a cell margin of 60 twips, then you might want to set \trleft to -60 so that the text aligns with previous non-table paragraphs instead of the border aligning with the text of the previous paragraph. Since \cellx - \clwWidth should leave you with the cell margin, \trleft and the cell margin will often, but not always, be related.
  • For \trgaph, I'm not 100% sure of the relationship, but I think it's additive with \trpadd commands...sort of like an external and internal margin. It might also come into play if you have cell spacing turned on. Since you obviously have Word, I'd suggest playing and see if you can figure out more. I haven't had a need to, myself.
  • I also suspect that \trpadd is a "base" for \clpadd...i.e., if \clpadd isn't specified, probably \trpadd would take effect. Again, I'm not sure, so you'd have to play.
  • Next, on to \cell and \row...not sure what's necessary there, but I suspect not much. The format I emit in, which seems to work fine, is to emit \trowd (and all associated data) ONCE at the beginning of the table (not per row or both at start and end of row, as the docs suggest...this may break some readers, though). Anyway, after that, I emit the various cell formatting options, followed by \intbl. Then at the end of the row, I simply emit \cell\row with no other special commands whatsoever. At the end of the table, I emit \intbl0\par. That seems to work flawlessly in Word 2002.
  • I'm almost certain that \trftsWidth is for \trwWidth. At least in the 1.9 specs, they've actually intermixed the two (\trwWidth and \clwWidth) within the same descriptions, which I suspect is just a cut & paste error where they didn't adjust the commands after pasting them from \clftsWidth.
  • \' is character-set dependent and in hex notation. In the Windows ANSI code page (Windows-1252), hex value 94 (decimal 148) is a closing quotation mark. Most likely, Word auto-replaced your straight quote with a closing quote. I suspect if you look in that same document, you'll probably also find a lot of \'93, which is an opening quote. It's probably better to use \ldblquote and \rdblquote, but Word doesn't seem to follow that standard when writing (though it respects it when reading).
  • \faauto is Font Alignment: Auto. There's also \fahang, \facenter, \faroman, \favar, and \fafixed. I believe these only apply to Far East documents, so are probably nothing for you to worry about.
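
The row layout Robert describes for \cell and \row can be written down as a small generator. This is a sketch of the sequence in his comment, not a verified recipe: the function name, the use of \cellx as the only cell formatting, and the sample values are assumptions, and he warns himself that emitting \trowd only once may break some readers.

```python
def rtf_table(rows, cell_right_edges, trleft=0):
    """Emit a table in the order described in the comment above.

    rows: list of rows, each a list of plain-text cell strings.
    cell_right_edges: right edge of each cell in twips, used for \\cellx.
    """
    # Row defaults are emitted once, at the start of the table.
    parts = [rf"\trowd\trleft{trleft}"]
    cell_defs = "".join(rf"\cellx{x}" for x in cell_right_edges)
    for row in rows:
        # Cell formatting options, followed by \intbl.
        parts.append(cell_defs + r"\intbl ")
        # Each cell ends with \cell; the row ends with \row and nothing else.
        parts.append("".join(text + r"\cell " for text in row))
        parts.append(r"\row ")
    # At the end of the table, leave table mode.
    parts.append(r"\intbl0\par")
    return "".join(parts)


print(rtf_table([["a", "b"]], [2000, 4000]))
```

The cell text is assumed to contain no backslashes or braces. Comparing this output with the "\pard \ql \li0\ri0\widctlpar\intbl" sequence from the Questions section is one way to test which of the two forms a given reader needs.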