The purpose of this page is to provide documentation for the TFM file
format. Much of the content of this page was taken directly from the
tftopl.web program, written by Donald Knuth (this explains the several
references to the TFtoPL program; these can be ignored).
TFM File Format
The idea behind TFM files is that typesetting routines like TeX
need a compact way to store the relevant information about several dozen fonts, and
computer centers need a compact way to store the relevant information about several
hundred fonts. TFM files are compact, and most of the information they
contain is highly relevant, so they provide a solution to the problem.
The information in a TFM file appears in a sequence of 8-bit bytes. Since
the number of bytes is always a multiple of 4, we could also regard the file as a
sequence of 32-bit words; but TeX uses the byte interpretation, and so
does TFtoPL. Note that the bytes are considered to be unsigned numbers.
The first 24 bytes (6 words) of a TFM file contain twelve 16-bit integers
that give the lengths of the various subsequent portions of the file. These twelve integers
are, in order:
TFM Length Data

| Name | Description |
|---|---|
| lf | length of the entire file, in words |
| lh | length of the header data, in words |
| bc | smallest character code in the font |
| ec | largest character code in the font |
| nw | number of words in the width table |
| nh | number of words in the height table |
| nd | number of words in the depth table |
| ni | number of words in the italic correction table |
| nl | number of words in the lig/kern table |
| nk | number of words in the kern table |
| ne | number of words in the extensible character table |
| np | number of font parameter words |

They are all nonnegative and less than 2^15. We must have
bc-1 <= ec <= 255
ne <= 256
lf = 6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np
Note that a font may contain as many as 256 characters (if bc=0 and
ec=255), and as few as 0 characters (if bc=ec+1).
Incidentally, when two or more 8-bit bytes are combined to form an integer of 16 or more bits, the most significant bytes appear first in the file. This is called big-endian order.
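The length fields and their constraints can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original tftopl.web text; the function name `read_lengths` and the tuple `LENGTH_NAMES` are hypothetical names chosen for illustration.

```python
import struct

# Field names in file order, as listed in the table above.
LENGTH_NAMES = ("lf", "lh", "bc", "ec", "nw", "nh",
                "nd", "ni", "nl", "nk", "ne", "np")


def read_lengths(data: bytes) -> dict:
    """Decode the twelve big-endian 16-bit length fields and check them."""
    if len(data) < 24:
        raise ValueError("a TFM file needs at least 24 bytes")
    # ">12H" means: big-endian, twelve unsigned 16-bit integers.
    values = struct.unpack(">12H", data[:24])
    if any(v >= 2**15 for v in values):
        raise ValueError("length fields must be less than 2^15")
    f = dict(zip(LENGTH_NAMES, values))
    if not (f["bc"] - 1 <= f["ec"] <= 255):
        raise ValueError("need bc-1 <= ec <= 255")
    if f["ne"] > 256:
        raise ValueError("need ne <= 256")
    expected = (6 + f["lh"] + (f["ec"] - f["bc"] + 1) + f["nw"] + f["nh"]
                + f["nd"] + f["ni"] + f["nl"] + f["nk"] + f["ne"] + f["np"])
    if f["lf"] != expected:
        raise ValueError("lf does not equal the sum of the parts")
    return f
```

A caller would typically pass the full file contents; only the first 24 bytes are inspected here.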
The rest of the TFM file may be regarded as a sequence of ten data arrays
having the informal specification
TFM File Body Structure

| Name | Range | Type |
|---|---|---|
| header | [0..lh-1] | stuff |
| char_info | [bc..ec] | char_info_word |
| width | [0..nw-1] | fix_word |
| height | [0..nh-1] | fix_word |
| depth | [0..nd-1] | fix_word |
| italic | [0..ni-1] | fix_word |
| lig_kern | [0..nl-1] | lig_kern_command |
| kern | [0..nk-1] | fix_word |
| exten | [0..ne-1] | extensible_recipe |
| param | [1..np] | fix_word |

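Because the arrays follow one another in the order listed, the starting word of each array follows from the length fields alone. The sketch below assumes the lengths are available as a dictionary keyed by the names lf, lh, and so on; the function name `array_offsets` is hypothetical.

```python
def array_offsets(f: dict) -> dict:
    """Return the starting word index of each array, given the length fields."""
    sizes = [
        ("header", f["lh"]),
        ("char_info", f["ec"] - f["bc"] + 1),
        ("width", f["nw"]),
        ("height", f["nh"]),
        ("depth", f["nd"]),
        ("italic", f["ni"]),
        ("lig_kern", f["nl"]),
        ("kern", f["nk"]),
        ("exten", f["ne"]),
        ("param", f["np"]),
    ]
    offsets = {}
    pos = 6  # the twelve 16-bit length fields occupy the first six words
    for name, size in sizes:
        offsets[name] = pos
        pos += size
    offsets["end"] = pos  # equals lf in a well-formed file
    return offsets
```

Multiplying a word offset by 4 gives the corresponding byte position in the file.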
The most important data type used here is a fix_word, which is a 32-bit
representation of a binary fraction. A fix_word is a signed quantity, with
the two's complement of the entire word used to represent negation. Of the 32 bits in a
fix_word, exactly 12 are to the left of the binary point; thus, the largest
fix_word value is 2048-2^{-20}, and the smallest is -2048.
We will see below, however, that all but one of the fix_word values will lie
between -16 and +16.
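As an illustration of the fix_word layout (12 bits to the left of the binary point, hence 20 to the right), the following Python sketch converts four bytes to a floating-point value. The function name `fix_word_to_float` is a hypothetical choice, not something defined by the TFM format.

```python
def fix_word_to_float(raw: bytes) -> float:
    """Interpret four big-endian bytes as a signed fix_word."""
    if len(raw) != 4:
        raise ValueError("a fix_word is exactly four bytes")
    # Two's complement over the entire 32-bit word, most significant byte first.
    value = int.from_bytes(raw, "big", signed=True)
    # 20 bits lie to the right of the binary point.
    return value / 2**20
```

For example, the bytes 0, 16, 0, 0 represent 2^{20}, which this function maps to 1.0.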
3.1 The header Array
The first data array is a block of header information, which contains general facts about
the font. The header must contain at least two words, and for TFM files to
be used with Xerox printing software it must contain at least 18 words, allocated as
described below. When different kinds of devices need to be interfaced, it may be
necessary to add further words to the header block.
header[0] is a 32-bit check sum that TeX will copy into the
DVI output file whenever it uses the font. Later on when the DVI
file is printed, possibly on another computer, the actual font that gets used is supposed
to have a check sum that agrees with the one in the TFM file used by
TeX. In this way, users will be warned about potential incompatibilities.
(However, if the check sum is zero in either the font file or the TFM file,
no check is made.) The actual relation between this check sum and the rest of the
TFM file is not important; the check sum is simply an identification number
with the property that incompatible fonts almost always have distinct check sums.
header[1] is a fix_word containing the design size of the
font, in units of TeX points (7227 TeX points = 254 cm).
This number must be at least 1.0; it is fairly arbitrary, but usually the design size
is 10.0 for a "10 point" font, i.e., a font that was designed to look best at a
10-point size, whatever that really means. When a TeX user asks for a font
at delta pt, the effect is to override the design size and replace it by
delta, and to multiply the x and y coordinates
of the points in the font image by a factor of delta divided by the design
size. All other dimensions in the TFM file are fix_word
numbers in design-size units. Thus, for example, the value of
param[6], one em or \quad, is often the
fix_word value 2^{20}=1.0, since many fonts have a design
size equal to one em. The other dimensions must be less than 16 design-size units in
absolute value; thus, header[1] and param[1] are the only
fix_word entries in the whole TFM file whose first byte might
be something besides 0 or 255.
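The scaling rule can be sketched as follows: a dimension stored in design-size units is multiplied by the design size, or by the requested size when the user asks for the font at a different size. The function and parameter names below are hypothetical, and the raw value is assumed to be already decoded as a signed 32-bit integer.

```python
from typing import Optional


def dimension_in_points(raw_fix_word: int, design_size_pt: float,
                        at_size_pt: Optional[float] = None) -> float:
    """Convert a dimension in design-size units to TeX points.

    raw_fix_word is the signed 32-bit integer value of the fix_word.
    If at_size_pt is given, it overrides the design size.
    """
    size = design_size_pt if at_size_pt is None else at_size_pt
    return (raw_fix_word / 2**20) * size
```

With a design size of 10.0, a quad stored as 2^{20} comes out as 10.0 points; the same font requested at 12 points gives 12.0.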
header[2..11], if present, contains 40 bytes that identify the character
coding scheme. The first byte, which must be between 0 and 39, is the number of
subsequent ASCII bytes actually relevant in this string, which is intended to specify
what character-code-to-symbol convention is present in the font. Examples are
ASCII for standard ASCII, TeX text for fonts like cmr10
and cmti9, TeX math extension for cmex10,
XEROX text for Xerox fonts, GRAPHIC for special-purpose
non-alphabetic fonts, UNSPECIFIED for the default case when there is no
information. Parentheses should not appear in this name. (Such a string is said to be
in BCPL format.)
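A BCPL-format field can be decoded by reading the length byte and then that many ASCII bytes. This is a minimal sketch; the function name `decode_bcpl` is hypothetical, and the bytes after the relevant ones are simply ignored.

```python
def decode_bcpl(field: bytes) -> str:
    """Decode a BCPL-format string: one length byte, then that many ASCII bytes."""
    if not field:
        raise ValueError("empty field")
    length = field[0]
    if length > len(field) - 1:
        raise ValueError("length byte exceeds the size of the field")
    return field[1:1 + length].decode("ascii")
```

The same function applies to the 40-byte coding scheme in header[2..11] and to the 20-byte font family in header[12..16].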
header[12..16], if present, contains 20 bytes that name the font family
(e.g., CMR or HELVETICA), in BCPL format. This
field is also known as the "font identifier."
header[17], if present, contains a first byte called the
seven_bit_safe_flag, then two bytes that are ignored, and a fourth byte
called the face. If the value of the fourth byte is less than 18, it has
the following interpretation as a "weight, slope, and expansion": Add 0 or 2 or 4
(for medium or bold or light) to 0 or 1 (for roman or italic) to 0 or 6 or 12 (for
regular or condensed or extended). For example, 13 is 0+1+12, so it represents medium
italic extended. A three-letter code (e.g., MIE) can be used for such
face data.
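The face byte can be split back into its three components with integer arithmetic. The sketch below assumes the three-letter code uses the initial letter of each word (M, B, L; R, I; R, C, E), which matches the MIE example above; the function name `face_code` is hypothetical.

```python
def face_code(face: int) -> str:
    """Translate a face byte below 18 into a three-letter code such as 'MIE'."""
    if not 0 <= face < 18:
        raise ValueError("only face values below 18 have this interpretation")
    weight = "MBL"[(face % 6) // 2]   # 0, 2, 4: medium, bold, light
    slope = "RI"[face % 2]            # 0, 1: roman, italic
    expansion = "RCE"[face // 6]      # 0, 6, 12: regular, condensed, extended
    return weight + slope + expansion
```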
header[18..whatever] might also be present; the individual words are
simply called header[18], header[19], etc., at the moment.
3.2 The char_info Array
Next comes the char_info array, which contains one char_info_word
per character. Each char_info_word contains six fields packed into four
bytes as follows.
char_info_word Data Type

| Byte no. | Name | Size |
|---|---|---|
| 1 | width_index | 8 bits |
| 2 | height_index (times 16) | 4 bits |
| 2 | depth_index | 4 bits |
| 3 | italic_index (times 4) | 6 bits |
| 3 | tag | 2 bits |
| 4 | remainder | 8 bits |

The actual width of a character is width[width_index], in design-size
units; this is a device for compressing information, since many characters have the same
width. Since it is quite common for many characters to have the same height, depth, or
italic correction, the TFM format imposes a limit of 16 different heights,
16 different depths, and 64 different italic corrections.
Incidentally, the relation width[0]=height[0]=depth[0]=italic[0]=0 should
always hold, so that an index of zero implies a value of zero. The width_index
should never be zero unless the character does not exist in the font, since a character
is valid if and only if it lies between bc and ec and has a
nonzero width_index.
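The packing of the six fields can be undone with shifts and masks. The following sketch is one possible way to do it in Python; the class name `CharInfo` and the function name `unpack_char_info` are hypothetical.

```python
from typing import NamedTuple


class CharInfo(NamedTuple):
    width_index: int
    height_index: int
    depth_index: int
    italic_index: int
    tag: int
    remainder: int


def unpack_char_info(word: bytes) -> CharInfo:
    """Split a four-byte char_info_word into its six fields."""
    if len(word) != 4:
        raise ValueError("a char_info_word is exactly four bytes")
    b1, b2, b3, b4 = word
    return CharInfo(
        width_index=b1,
        height_index=b2 >> 4,    # upper 4 bits of byte 2
        depth_index=b2 & 0x0F,   # lower 4 bits of byte 2
        italic_index=b3 >> 2,    # upper 6 bits of byte 3
        tag=b3 & 0x03,           # lower 2 bits of byte 3
        remainder=b4,
    )
```

A character is then valid when its code lies between bc and ec and the returned width_index is nonzero.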
The tag field in a char_info_word has four values that explain
how to interpret the remainder field.
The tag field in char_info_word

| tag value | Name | Description |
|---|---|---|
| 0 | no_tag | means that remainder is unused. |
| 1 | lig_tag | means that this character has a ligature/kerning program starting at lig_kern[remainder]. |
| 2 | list_tag | means that this character is part of a chain of characters of ascending sizes, and not the largest in the chain. The remainder field gives the character code of the next larger character. |
| 3 | ext_tag | means that this character code represents an extensible character, i.e., a character that is built up of smaller pieces so that it can be made arbitrarily large. The pieces are specified in exten[remainder]. |

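To illustrate list_tag, the sketch below follows a chain of successively larger characters. It assumes the tag and remainder of each character have already been extracted into two dictionaries keyed by character code; those dictionaries, the function name `size_chain`, and the cycle check are assumptions made for this example rather than anything prescribed by the format description.

```python
LIST_TAG = 2


def size_chain(start: int, tags: dict, remainders: dict) -> list:
    """Return the character codes reached from start by following list_tag links."""
    chain = [start]
    code = start
    while tags.get(code) == LIST_TAG:
        code = remainders[code]  # code of the next larger character
        if code in chain:
            # Defensive check: a malformed file could link back on itself.
            raise ValueError("cycle in list_tag chain")
        chain.append(code)
    return chain
```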
3.3 The lig_kern Array
The lig_kern array contains instructions in a simple programming language
that explains what to do for special letter pairs. Each word is a lig_kern_command
of four bytes.
lig_kern_command Data Type

| Byte no. | Field Name | Description |
|---|---|---|
| 1 | skip_byte | indicates that this is the final program step if the byte is 128 or more, otherwise the next step is obtained by skipping this number of intervening steps. |
| 2 | next_char | "if next_char follows the current character, then perform the operation and stop, otherwise continue." |
| 3 | op_byte | indicates a ligature step if less than 128, a kern step otherwise. |
| 4 | remainder | |

In a kern step, an additional space equal to kern[256*(op_byte-128)+remainder]
is inserted between the current character and next_char. This amount is
often negative, so that the characters are brought closer together by kerning; but it
might be positive.
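The index into the kern array for a kern step can be computed directly from the formula above. This is a minimal sketch with a hypothetical function name, `kern_index`.

```python
def kern_index(op_byte: int, remainder: int) -> int:
    """Return the index into the kern array for a kern step."""
    if op_byte < 128:
        raise ValueError("op_byte below 128 denotes a ligature step, not a kern")
    return 256 * (op_byte - 128) + remainder
```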
There are eight kinds of ligature steps, having op_byte codes 4a+2b+c
where 0 <= a <= b+c and 0 <= b,c <= 1. The character
whose code is remainder is inserted between the current character and
next_char; then the current character is deleted if b=0, and
next_char is deleted if c=0; then we pass over a
characters to reach the next current character (which may have a ligature/kerning program
of its own).
Notice that if a=0 and b=1, the current character is unchanged;
if a=b and c=1, the current character is changed but the next
character is unchanged. TFtoPL will check to see that infinite loops are
avoided.
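The encoding 4a+2b+c can be decoded with shifts and masks, and the constraint a <= b+c is what limits the possibilities to eight. The sketch below uses a hypothetical function name, `decode_lig_op`.

```python
def decode_lig_op(op_byte: int) -> tuple:
    """Split a ligature op_byte into (a, b, c) where op_byte = 4a + 2b + c."""
    if not 0 <= op_byte < 128:
        raise ValueError("op_byte of 128 or more denotes a kern step")
    a = op_byte >> 2        # number of characters to pass over afterwards
    b = (op_byte >> 1) & 1  # b = 0: the current character is deleted
    c = op_byte & 1         # c = 0: next_char is deleted
    if a > b + c:
        raise ValueError("invalid ligature step: need a <= b+c")
    return a, b, c
```

Enumerating all byte values below 128 and keeping those that pass the check yields exactly the codes 0, 1, 2, 3, 5, 6, 7, and 11.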
If the very first instruction of the lig_kern array has skip_byte=255,
the next_char byte is the so-called right boundary character of this font;
the value of next_char need not lie between bc and ec.
If the very last instruction of the lig_kern array has skip_byte=255,
there is a special ligature/kerning program for a left boundary character, beginning at
location 256*op_byte+remainder. The interpretation is that TeX
puts implicit boundary characters before and after each consecutive string of characters
from the same font. These implicit characters do not appear in the output, but they can
affect ligatures and kerning.
If the very first instruction of a character's lig_kern program has
skip_byte > 128, the program actually begins in location
256*op_byte+remainder. This feature allows access to large lig_kern
arrays, because the first instruction must otherwise appear in a location <=255.
Any instruction with skip_byte > 128 in the lig_kern array
must have 256*op_byte+remainder < nl. If such an instruction is
encountered during normal program execution, it denotes an unconditional halt; no
ligature command is performed.
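The rules for locating and stepping through a character's program can be sketched as below. The lig_kern array is assumed to be a list of (skip_byte, next_char, op_byte, remainder) tuples, and both function names are hypothetical. Boundary characters are not handled in this sketch.

```python
def program_start(lig_kern: list, remainder: int) -> int:
    """Return where a character's program really begins.

    remainder is the remainder field of a char_info_word whose tag is lig_tag.
    """
    skip, _, op, rem = lig_kern[remainder]
    if skip > 128:
        # Indirection: the real program starts elsewhere in the array.
        return 256 * op + rem
    return remainder


def find_step(lig_kern: list, start: int, next_char: int):
    """Return (op_byte, remainder) of the step matching next_char, or None."""
    i = start
    while i < len(lig_kern):
        skip, nxt, op, rem = lig_kern[i]
        if skip > 128:
            return None  # unconditional halt
        if nxt == next_char:
            return op, rem
        if skip >= 128:
            return None  # this was the final step
        i += skip + 1    # skip this many intervening steps
    return None
```

The returned pair still has to be interpreted as a ligature step or a kern step according to op_byte.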
3.4 The extensible_recipe Array
Extensible characters are specified by an extensible_recipe, which consists
of four bytes called top, mid, bot, and
rep (in this order). These bytes are the character codes of individual pieces
used to build up a large symbol. If top, mid, or bot
are zero, they are not present in the built-up result. For example, an extensible vertical
line is like an extensible bracket, except that the top and bottom pieces are missing.
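An extensible_recipe can be unpacked as shown in this short sketch, which represents an absent piece as None. The function name `unpack_exten` and the dictionary layout are hypothetical choices for illustration.

```python
def unpack_exten(word: bytes) -> dict:
    """Split a four-byte extensible_recipe into its pieces.

    A value of zero for top, mid, or bot means the piece is absent,
    which is represented here as None.
    """
    if len(word) != 4:
        raise ValueError("an extensible_recipe is exactly four bytes")
    top, mid, bot, rep = word
    return {
        "top": top or None,
        "mid": mid or None,
        "bot": bot or None,
        "rep": rep,
    }
```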
3.5 The param Array
The final portion of a TFM file is the param array, which is
another sequence of fix_word values.
param[1]=slant is the amount of italic slant, which is used to help position
accents. For example, slant=.25 means that when you go up one unit, you also
go .25 units to the right. The slant is a pure number; it's the only
fix_word other than the design size itself that is not scaled by the design
size.
param[2]=space is the normal spacing between words in text. Note that
character " " in the font need not have anything to do with blank spaces.
param[3]=space_stretch is the amount of glue stretching between words.
param[4]=space_shrink is the amount of glue shrinking between words.
param[5]=x_height is the height of letters for which accents don't have
to be raised or lowered.
param[6]=quad is the size of one em in the font.
param[7]=extra_space is the amount added to param[2] at the
ends of sentences.
When the character coding scheme is TeX math symbols, the font is supposed
to have 15 additional parameters called num1, num2,
num3, denom1, denom2, sup1, sup2,
sup3, sub1, sub2, supdrop, subdrop,
delim1, delim2, and axis_height, respectively.
When the character coding scheme is TeX math extension, the font is supposed
to have six additional parameters called default_rule_thickness and
big_op_spacing1 through big_op_spacing5.
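The parameter names can be attached to indices as in the sketch below. It assumes that the additional parameters follow the first seven (so they start at param[8]) and that the coding scheme string matches exactly; the function name `param_names` and the fallback label for unnamed parameters are hypothetical.

```python
BASE_PARAMS = ("slant", "space", "space_stretch", "space_shrink",
               "x_height", "quad", "extra_space")
MATH_SYMBOL_PARAMS = ("num1", "num2", "num3", "denom1", "denom2",
                      "sup1", "sup2", "sup3", "sub1", "sub2",
                      "supdrop", "subdrop", "delim1", "delim2", "axis_height")
MATH_EXTENSION_PARAMS = ("default_rule_thickness",) + tuple(
    f"big_op_spacing{i}" for i in range(1, 6))


def param_names(coding_scheme: str, np: int) -> dict:
    """Map parameter indices 1..np to names for the given coding scheme."""
    names = list(BASE_PARAMS)
    if coding_scheme == "TeX math symbols":
        names += MATH_SYMBOL_PARAMS
    elif coding_scheme == "TeX math extension":
        names += MATH_EXTENSION_PARAMS
    # Parameters beyond the known names get a generic label.
    return {i: names[i - 1] if i <= len(names) else f"param{i}"
            for i in range(1, np + 1)}
```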