Extending Font Knowledge

In some cases it may be necessary to add to the font knowledge of MathFlow. MathFlow can use a configuration text file called FontInfo.ini. If this file is placed in the same directory as the DLL, MathFlow automatically tries to use it when the DLL is loaded. This section explains how to use this feature. Using the techniques shown here, a user can assign a character set (encoding) to a font, create a new encoding, and define the PostScript font name to be used in EPS file generation.

MathFlow contains knowledge of the fonts and characters it works with, which results in improved formatting and translation into EPS. Most of this knowledge is in the form of tables built into the code. However, this information can be extended via the FontInfo.ini file external to the program, allowing it to be expanded and corrected without having to change the application itself.

Encodings

An encoding is a one-to-one correspondence between character meanings and integers. For example, ASCII is an encoding that maps characters onto numbers between 0 and 127, and "a" is assigned the number 97. Fonts are said to use or have a specific encoding. The font's encoding determines what character gets displayed when we pass a given number to the operating system. Note that character style and shape play no part in the encoding concept: a Times-Roman "a" has the same value as a Bookman-Italic "a" (assuming the fonts use the same encoding). A code-point is a particular value in an encoding. For example, "a" has the code-point 97 in the ASCII encoding.
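The ASCII example can be checked with a few lines of Python. This is only an illustration of the code-point idea using Python's built-in ord and chr functions; it is not part of MathFlow, and the helper name is_ascii_code_point is our own.

```python
# The code point of a character does not depend on any font:
# "a" is 97 in ASCII whichever typeface eventually draws it.
code_point = ord("a")
same_char = chr(code_point)

# Encoding the character as ASCII yields a single byte holding that value.
ascii_bytes = "a".encode("ascii")


def is_ascii_code_point(value: int) -> bool:
    """Return True if value lies in the ASCII range 0..127."""
    return 0 <= value <= 127
```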

The MTCode Encoding

Central to our font information is the MTCode encoding. MTCode assigns a 16-bit constant to every different character that our software works with. It is a superset of Unicode, a standard encoding that attempts to assign a unique number to each of the characters used in the world's languages. Unicode covers a lot of math, but not all the math characters that we need. For this reason, MTCode uses Unicode's Private Use Area (PUA), a range of 6400 code points (0xE000 to 0xF8FF), for its additional math characters. We use MTCode values as the key to all of the per-character information, such as human-readable character descriptions and token types (variable, operator, etc.).
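The Private Use Area arithmetic above can be verified with a short Python sketch. The range bounds come from the text; the constant and function names are hypothetical and are not MathFlow identifiers.

```python
# Bounds of Unicode's Private Use Area as given in the text (inclusive).
PUA_START = 0xE000
PUA_END = 0xF8FF

# Number of code points in the range.
PUA_SIZE = PUA_END - PUA_START + 1


def is_private_use(code_point: int) -> bool:
    """Return True if code_point falls inside the Private Use Area."""
    return PUA_START <= code_point <= PUA_END


def fits_in_16_bits(code_point: int) -> bool:
    """Return True if code_point can be stored as a 16-bit constant."""
    return 0 <= code_point <= 0xFFFF
```

A value for which is_private_use returns True has no meaning assigned by Unicode itself, which is why MTCode can use that range for its additional math characters.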

For more information on Unicode, see the Unicode Consortium. To find out how we use Unicode's Private Use Area, see the MTCode Encoding Tables.

Font Encodings

Every font is the expression of some character set. In fact, many fonts share the same character set. We use the term "font encoding" to represent a character set that might be shared by one or more fonts. Many applications (e.g. word processors) don't have to know a font's encoding --- the user hits a key, a code is sent to the application, the code is sent back to the operating system to select a character from a font for display. Our software needs to know more.

A font encoding can be thought of as a table with two columns: the position within the font (a numerical index) and an MTCode code-point value (the number from our own master character list, MTCode, that uniquely identifies the character). We give each font encoding a name (e.g. WindowsANSI, MacStd, Symbol). Many encodings are named after the single font that uses them. Other encodings are shared by many fonts. For example, standard ISO Latin-1 fonts on Windows all have the WindowsANSI encoding.

Our software represents every encoding other than MTCode as a mapping onto MTCode. That is, for every code-point in a given encoding, the mapping gives a unique MTCode code-point. Using this mapping, we can get at all the per-character information for any code-point in the encoding and, therefore, for any character in fonts that have that encoding. For this reason, knowing a font's encoding is very important.
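The two-step lookup described above (font to encoding, then font position to MTCode, then MTCode to per-character information) can be sketched in Python. This is a minimal illustration under stated assumptions, not how MathFlow stores its tables: the table names, the CharInfo type, the describe function, and the sample entries are all hypothetical. The sample MTCode values are ordinary Unicode values, which are valid because MTCode is a superset of Unicode.

```python
from typing import NamedTuple, Optional


class CharInfo(NamedTuple):
    """Hypothetical per-character record keyed by MTCode value."""
    description: str
    token_type: str


# Hypothetical per-character information, keyed by MTCode value.
MTCODE_INFO = {
    0x0061: CharInfo("Latin small letter a", "variable"),
    0x002B: CharInfo("plus sign", "operator"),
    0x03B1: CharInfo("Greek small letter alpha", "variable"),
}

# Hypothetical font encodings: position within the font -> MTCode value.
ENCODINGS = {
    "WindowsANSI": {0x61: 0x0061, 0x2B: 0x002B},
    "Symbol": {0x61: 0x03B1, 0x2B: 0x002B},
}

# Hypothetical record of which encoding each font has.
FONT_ENCODING = {
    "Times New Roman": "WindowsANSI",
    "Arial": "WindowsANSI",
    "Symbol": "Symbol",
}


def describe(font_name: str, position: int) -> Optional[CharInfo]:
    """Look up character information for a position in a font.

    Returns None when the font's encoding is unknown or when the
    position has no MTCode mapping in that encoding.
    """
    encoding_name = FONT_ENCODING.get(font_name)
    if encoding_name is None:
        return None  # nothing is known about this font
    mtcode = ENCODINGS[encoding_name].get(position)
    if mtcode is None:
        return None
    return MTCODE_INFO.get(mtcode)
```

In this sketch, position 0x61 resolves to "a" in a WindowsANSI font but to alpha in the Symbol font, which shows why the same number cannot be interpreted without knowing the font's encoding.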

Unfortunately, the computer's operating system tells us very little about the encodings of fonts, least of all for those containing math symbols. So, we have to keep our own knowledge of which encoding each font has. Of course, as people can create their own fonts, the set of font encodings is open-ended.

Extension Scenarios

Font information may be extended in the following ways:

Define that a font has a given existing encoding.
Define (or override) the PostScript font name of a font.
Define a new encoding and specify that certain font(s) have that encoding.
Define new MTCode values and their attributes (description, default style, token type) and use them in a new encoding.

Determining What MathFlow Knows

Because the font information extension mechanism described here also applies to MathType, it can be very useful to do the setup with MathType first to verify that the configuration file uses the proper syntax. When the file is finished, copy it into the same directory as the DLL.
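The copy step can be scripted. The following Python sketch assumes you already know the full path of the DLL and of the configuration file you tested with MathType; both paths and the function name install_font_info are hypothetical, and only the target file name FontInfo.ini comes from this section.

```python
import shutil
from pathlib import Path


def install_font_info(tested_ini: Path, dll_path: Path) -> Path:
    """Copy a tested configuration file next to the DLL.

    The copy is always named FontInfo.ini, whatever the source file
    is called, because that is the name the text says is looked for.
    """
    target = dll_path.parent / "FontInfo.ini"
    shutil.copyfile(tested_ini, target)
    return target
```

The function returns the path of the installed copy so that a calling script can log or check it.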

If you have a font installed on your computer that our software does not seem to know anything about, the first thing to do is to verify that this is the case. The easiest way to do this is via MathType's Insert Symbol dialog. Follow these steps:

1. Choose Insert Symbol from the Edit menu.
2. Choose Font in the View by menu.
3. Choose the font in question from the font menu just to the right of the View by menu.
4. Look at the Encoding name displayed directly under the character grid.

If the encoding name is "Unknown", it means our database of fonts and characters has no information for that font.