The world of content management is awash in terminology that can be challenging for the uninitiated. We have compiled a glossary of the more commonly used terms to help you get started. If you encounter a term that isn't listed here, or would like a fuller explanation of one that is, please feel free to contact us for an answer.

99.95% accuracy

Applied to key entry or OCR, this number is the percentage of characters captured correctly. For example, 99.95% accuracy means there are no more than 5 character errors per 10,000 characters, or approximately 1-2 character errors per page.
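
The arithmetic can be checked with a small Python sketch. The second call assumes a page of about 3,000 characters. That figure is only an illustration, since real pages vary with font and layout.

```python
def max_errors(accuracy_percent: float, characters: int) -> float:
    """Maximum number of character errors allowed at a given accuracy."""
    error_rate = (100.0 - accuracy_percent) / 100.0
    return error_rate * characters

# 99.95% accuracy over 10,000 characters
print(round(max_errors(99.95, 10_000), 2))  # 5.0

# Assumed (hypothetical) page of about 3,000 characters
print(round(max_errors(99.95, 3_000), 2))   # 1.5
```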

Aggregator

A web-based company that sells content from multiple sources, often focused on a particular subject. Currently, most aggregators focus on scientific, technical and medical information, but many are appearing in other fields such as libraries, technology and education.

API (Application Program Interface)

A set of routines, protocols, and tools that help programmers build software applications. For example, MS-Windows provides an API that enables programmers to write Windows applications that maintain a consistent look and “feel” across the Windows platform. Besides assisting programmers, a shared API makes the resulting programs easier for users to learn, because they have similar interfaces.

ASP (Active Server Page)

An HTML page that is tailored to the user by scripts processed on the server, on the fly, before the page is sent to the browser. The scripts typically select information from a database based on user preferences. Examples include “My Yahoo” and instant local weather reports on the desktop.

ASP (Application Service Provider)

A company that provides software, such as business applications and e-learning programs, online. This relieves users of the need to install and maintain the programs themselves.

Attribute

Provides additional information about an element in XML. For example, in the tag <note date="10/20/2003">, note is the element, date is the attribute, and 10/20/2003 is the value.
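
A minimal Python sketch, using the standard library's xml.etree.ElementTree module, reads the element name, attribute, and value from the note/date/10/20/2003 example. The text content inside the element is hypothetical.

```python
import xml.etree.ElementTree as ET

# Element "note" with attribute "date"; the text "Proofs due" is made up
note = ET.fromstring('<note date="10/20/2003">Proofs due</note>')

print(note.tag)          # note  (the element)
print(note.attrib)       # {'date': '10/20/2003'}  (attribute name -> value)
print(note.get("date"))  # 10/20/2003  (the value)
```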

CALS Tables (Computer Aided Logistics Support)

A model for representing data in table form, defined as part of the U.S. Department of Defense CALS document interchange initiative (military standard MIL-M-28001B). This table model has become a de facto standard in SGML usage.

CGM (Computer Graphics Metafile)

A graphics file format used to exchange and store two-dimensional graphics information, including raster (bitmap) and vector information. Because it is platform-independent and produces small files, CGM has become an international standard and was included in the CALS initiative standards.

CMS (Color Management System)

A system for providing consistent color across peripheral devices and across operating-system platforms. A color management system enables the user to match colors, recognize the limits of a device (colors that it cannot reproduce), simulate a color on a device, and calibrate devices.

CMS (Content Management System)

Content management systems provide a means of managing document content and design independently of each other. They also provide an environment in which content can be reused.

Conditional Text

A block of text in a master document specified for inclusion or exclusion depending on audience or output format. For example, conditional text might exclude certain graphics if the document is output as online help and include them for a print version, enabling one master document to serve many purposes. FrameMaker and Bookmaster are examples of software packages that support conditional text.
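
Conditional text can be pictured as a single list of content blocks, each carrying the conditions under which it is included. The Python sketch below is a simplified model under that assumption. The Block class, the condition names print and online_help, and the sample text are all hypothetical, and FrameMaker stores conditions in its own way.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    text: str
    # Conditions under which this block is included; empty means "always include"
    conditions: set[str] = field(default_factory=set)

def build(master: list[Block], output: str) -> list[str]:
    """Return the text of every block that applies to one output format."""
    return [b.text for b in master if not b.conditions or output in b.conditions]

# One master document serving two outputs (hypothetical content)
master = [
    Block("Installing the pump"),
    Block("[Figure 3: exploded view]", {"print"}),
    Block("Click Next to continue.", {"online_help"}),
    Block("Tighten all four bolts."),
]

print(build(master, "print"))
print(build(master, "online_help"))
```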

CSS (Cascading Style Sheet)

A file that defines custom formatting, including fonts, spacing, alignment, color, and so on, for structured documents such as HTML and XML. CSS separates document design from content, simplifying website maintenance. CSS is now supported by both Netscape and Internet Explorer.

DOM (Document Object Model)

An interface that allows programs and scripts to dynamically access and modify the content, structure, and style of XML documents. An XML document can be managed as a “DOM tree”, which allows random access to any part of the document content. Because the entire tree is held in memory, working with a large document through the DOM can require a great deal of available memory.
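
Python's standard library includes a lightweight DOM implementation, xml.dom.minidom, which can illustrate the idea. The sketch below parses a small document into an in-memory tree, jumps directly to one element, and modifies the tree in place. The element names and text are hypothetical.

```python
from xml.dom.minidom import parseString

# The whole document is parsed into an in-memory tree of nodes
doc = parseString("<manual><title>Pump Guide</title><chapter>Setup</chapter></manual>")

# Random access: go straight to an element by name
title = doc.getElementsByTagName("title")[0]
print(title.firstChild.data)  # Pump Guide

# Modify the tree: change existing text and append a new element
title.firstChild.data = "Pump Guide, 2nd Edition"
chapter = doc.createElement("chapter")
chapter.appendChild(doc.createTextNode("Maintenance"))
doc.documentElement.appendChild(chapter)

print(doc.documentElement.toxml())
```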

DPI (Dots Per Inch)

A measure of image resolution, expressed as the number of dots a device can display or print per linear inch. The more dots per inch, the sharper the image and the larger the file. The image prints and displays with more clarity, but more slowly. Internet use has inspired the development of highly efficient compression algorithms, such as JPEG, that greatly reduce file size with little visible loss of quality. Common resolutions include 72 dpi for web display, 300 or 600 dpi for laser printers, and much higher resolutions for high-quality print publishing.
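
The relationship between resolution and file size can be worked out directly. This sketch assumes an 8 x 10 inch image with 24-bit color and no compression. Because the pixel count grows with both width and height, doubling the dpi quadruples the uncompressed size.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of an image printed at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)

def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Raw image size in bytes, before any compression."""
    return width_px * height_px * bits_per_pixel // 8

# Assumed 8 x 10 inch image at three common resolutions
for dpi in (72, 300, 600):
    w, h = pixel_dimensions(8, 10, dpi)
    size_mb = uncompressed_bytes(w, h) / 1_000_000
    print(f"{dpi:>3} dpi: {w} x {h} pixels, about {size_mb:.1f} MB uncompressed")
```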

DTD (Document Type Definition)

A text file (or a section within the document itself) that accompanies an SGML or XML document and defines the markup codes and the rules for using them. The DTD enables an application to check and correctly interpret the document's structure. The DTD defines structure rather than appearance. As a result, the formatting applied to each element type can be changed, typically in a stylesheet, without making changes to the document itself.

EDD (Element Definition Document)

Accompanies a structured FrameMaker file and defines the elements, how they relate to each other, and how they are formatted. An EDD can be created from scratch or adapted from an existing DTD file. For more information, see the print and online documentation for FrameMaker, including the Structured FrameMaker Developer’s Guide located in the OnlineManuals folder of the FrameMaker 7.0 installation directory.

Element Attributes

Contains additional information about an XML element that is not part of its contents. An attribute consists of a name and a value separated by an equal sign. Unlike an element, an attribute can be given a default value. In the example <type_of_book type="fiction">, type_of_book is the element and type="fiction" is the attribute. The value must appear in quotation marks.
Attributes can be used to control element formatting, express descriptive information about an element, or store source and destination information. Attributes are defined by the developer of the DTD or schema and may be optional or required.

Element Type Declaration

In a DTD, declares an element type and its content model. The content model specifies which element types can appear as children of the element, in what order they must appear, and how many times each may occur.

Element, Structured FrameMaker

Structured FrameMaker documents consist of elements that may contain a paragraph, a text range, a heading, a table, a marker, a cross-reference or another FrameMaker item. An element may also contain other elements. The elements form the hierarchy that defines the structure of the document.

Element, XML

A logical data structure in an XML document. An element begins with a start tag and ends with an end tag or, for empty elements, consists of a single empty-element tag. Each element has a type, identified by a name that may be called its generic identifier or GI, and may have a set of attributes. An element can enclose other elements, as in the example “….”, in which the outer element contains two other elements.

GIF (Graphics Interchange Format)

A highly compressed graphic image format frequently used to display two-dimensional raster images on the internet. The GIF89a version also supports animated GIFs, made up of a short series of images within a single GIF file. GIF uses the LZW compression algorithm, which was patented by Unisys (the patents have since expired). Because GIF is limited to 256 colors, the JPEG format is usually preferred for color images and photographs on the internet.
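
The core idea of LZW is to build a dictionary of byte strings as the data is read. For each longest string already in the dictionary, the encoder emits one code. The Python sketch below shows only that dictionary logic. The real GIF encoder also uses variable-width codes, special clear and end-of-information codes, and bit packing, none of which are modeled here.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Emit one dictionary code for each longest previously seen byte string."""
    dictionary = {bytes([i]): i for i in range(256)}  # codes 0-255: single bytes
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                   # keep extending the match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code     # learn the new string
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary while decoding, so it never has to be stored."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:  # code is being defined by this very step
            entry = previous + previous[:1]
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)

text = b"ABABABABABABABAB"
codes = lzw_compress(text)
print(len(text), "bytes ->", len(codes), "codes:", codes)
```

Repetitive data compresses well because longer and longer repeated strings get their own codes.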

Glass Typewriter

Refers to text that is formatted inconsistently, using any method that makes the text display correctly. For example, spaces might be inserted to align text rather than using tabs or a table. The “glass typewriter” approach was common on older systems that preceded today’s sophisticated desktop publishing programs. It is very difficult to convert this inconsistently formatted text to structured formats like XML, SGML, and Structured FrameMaker.

HTML (Hypertext Markup Language)

A markup language used to indicate how text and images should display on the World Wide Web. HTML resembles old-fashioned typesetting code in that you place codes (tags) on each side of a block of text to indicate how it should appear. HTML is a standard recommended by the World Wide Web Consortium (W3C) and is supported by all major browsers.

IETM (Interactive Electronic Technical Manual)

Electronic version of a technical manual that can interact with the user. An IETM enables the user to hyperlink to figures, tables, chapters, and so on, rather than turning pages to access them. The IETM can request information from the user, then use that input to determine what it displays next. In a troubleshooting section, the user can click on a problem and the IETM can show the step-by-step solution. IETMs are frequently stored on CD-ROM.

JPEG (Joint Photographic Experts Group)

A graphic image format used to display monochrome, grayscale, or color images electronically. JPEG uses lossy compression, which discards detail the eye is unlikely to notice, to keep file sizes small while maintaining high image quality. It has become the primary format for displaying photographs on the internet.

Mapping

In the context of converting a file from FrameMaker to XML/SGML, mapping refers to the translation from a paragraph or font style (or a string of text) in the FrameMaker file to SGML tagging in the output file. For example, a style called ChapTitle may map to the SGML tag …. In this case, when the SGML encoding software encounters the paragraph style ChapTitle in the input file, it produces …
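
A mapping can be represented as a lookup table from style names to tags. In this Python sketch, the style name ChapTitle comes from the example in this entry. The target tag names (title, para, listitem) and the other style names are hypothetical.

```python
from xml.sax.saxutils import escape

# Hypothetical style-to-tag table; the tag names are illustrative only
STYLE_MAP = {
    "ChapTitle": "title",
    "Body": "para",
    "Bullet": "listitem",
}

def encode(paragraphs: list[tuple[str, str]]) -> str:
    """Wrap each (style, text) paragraph in the tag that its style maps to."""
    lines = []
    for style, text in paragraphs:
        tag = STYLE_MAP.get(style)
        if tag is None:
            # Fail loudly so an unmapped style is noticed, not silently dropped
            raise KeyError(f"no mapping defined for style {style!r}")
        lines.append(f"<{tag}>{escape(text)}</{tag}>")  # escape &, <, >
    return "\n".join(lines)

print(encode([("ChapTitle", "Getting Started"),
              ("Body", "Check the parts & tools.")]))
```

Raising an error on an unmapped style is a design choice. It surfaces gaps in the mapping table instead of letting text disappear during conversion.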

Master Format

A process that converts incoming data files to one master format in order to standardize them and enable conversion software to work uniformly. Files in the master format can be converted to varied output formats.

MathML (Mathematical Markup Language)

An XML-based language for describing mathematical notation so that it can be displayed and processed. MathML is a standard recommended by the World Wide Web Consortium (W3C) and is gaining use among mathematical software vendors.

Metadata

Data that describes other data. For example, an XML document or its DTD might contain metadata that enables it to interact with other applications. Similarly, database developers use the standard SQL Data Definition Language (DDL) to describe the tables that hold the real data.
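
The database example can be tried with Python's built-in sqlite3 module. In this sketch, the CREATE TABLE statement is the metadata. SQLite's PRAGMA table_info then lets a program read that metadata back without touching the data itself. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: metadata describing the table that will hold the real data
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, revised DATE)"
)
# The real data
conn.execute("INSERT INTO documents (title, revised) VALUES (?, ?)",
             ("Pump Guide", "2003-10-20"))

# Read the metadata (column names and declared types), not the data.
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(documents)")]
print(columns)
```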

Middleware

Software that mediates between the network and the applications, sometimes called “glue”. Providing such services as identification, authentication, authorization, directories, and security, middleware is essential to the efficient operation of a network.

MIF (Management Information Format)

A system-independent format that describes a managed hardware or software component. For example, Windows 95 requires the corresponding MIF file to install a new device. The hardware and operating system-independent Desktop Management Interface (DMI) uses MIF files to report information about system configuration.

Namespaces

In XML, a logically related set of names for elements and attributes in which each name is unique. Each namespace is identified by a URI. A single XML document can combine markup intended for several software applications. Namespaces let each application identify which elements and attributes belong to it, and they keep identically named elements from different vocabularies from colliding.
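
The following Python sketch uses xml.etree.ElementTree to show two vocabularies that each define an element named title. The namespace URIs and prefixes are hypothetical placeholders. What matters is that each name is qualified by its namespace.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define "title"; the URIs are placeholders
doc = (
    '<catalog xmlns:book="http://example.com/ns/book" '
    'xmlns:staff="http://example.com/ns/staff">'
    "<book:title>Pump Guide</book:title>"
    "<staff:title>Technical Writer</staff:title>"
    "</catalog>"
)
root = ET.fromstring(doc)

# Each application looks up names in its own namespace only
ns = {"b": "http://example.com/ns/book", "s": "http://example.com/ns/staff"}
print(root.find("b:title", ns).text)  # Pump Guide
print(root.find("s:title", ns).text)  # Technical Writer

# Internally, each tag name is qualified with its namespace URI
print([child.tag for child in root])
```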

OASIS (Organization for the Advancement of Structured Information Standards)

A nonprofit consortium that develops and promotes open standards for structured information. The home site for this group is http://www.oasis-open.org/ . The DTD repository it sponsors is at http://www.XML.org .

OCR (Optical Character Recognition)

The process of scanning characters on a page and interpreting them to produce an electronic, character-based file. A scanner or reader assesses the dark and light areas of the text, and recognition software matches each pattern with an alphabetic character or numeric digit.
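
Real OCR engines use far more sophisticated features and statistical models. The basic pattern-matching idea can still be sketched as a toy. A scanned glyph is stored as a grid of dark (1) and light (0) pixels, compared with stored templates, and matched to the closest one. The 3 x 5 templates here are hypothetical and cover only three letters.

```python
# Toy 3 x 5 templates (1 = dark, 0 = light); real OCR is far richer
TEMPLATES = {
    "I": ["010", "010", "010", "010", "010"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}

def distance(a: list[str], b: list[str]) -> int:
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))

def recognize(glyph: list[str]) -> str:
    """Return the character whose template is closest to the scanned glyph."""
    return min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))

# A noisy scan of "T": one pixel in the stem came out light
scanned = ["111", "010", "000", "010", "010"]
print(recognize(scanned))  # T
```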

Open Source

Open source refers to software whose source code the general public can read, modify, and redistribute, usually free of charge. Proponents argue that because more programmers are involved, this approach often produces code that runs more smoothly and evolves more quickly than commercially developed code. Software programmers share their changes within the community and review each other’s work. The Open Source Initiative (OSI) maintains the Open Source Definition and certifies licenses that meet it. See http://www.opensource.org/docs/definition.php for the specific distribution terms of open source code.

Parse

The process of inspecting data and dividing it into smaller units so that a program can act on the information. Compilers parse source code to translate it into object code, and end-user applications parse commands. In structured markup languages such as XML and SGML, a validating parser also checks that tags are applied legally, as defined in the Document Type Definition.

PDF (Portable Document Format)

A device- and resolution-independent format developed by Adobe and modeled after the PostScript language. A PDF file can be viewed, printed, and searched using the freely available Adobe Acrobat Reader. The PDF format is especially useful for documents where appearance is critical, because it reproduces them exactly as they were composed, including all graphics and formatting. PDF files can be compressed, are supported by all popular operating systems, and are compatible with most printers.

Prolog

The part of an XML file that precedes the actual marked-up document. The prolog can contain an XML declaration (which states the XML version), a document type declaration (which contains or points to a DTD), comments, and processing instructions, or it can be empty.

Raster

A method of representing images as pixels (picture elements), or points arranged in rows and columns, for display on an output device. Raster images are also called bitmap images. Common raster formats include GIF, JPEG, TIFF, and PCX.

Repository

A place where data is stored and maintained. A repository can consist of one or more databases or files. Data from the repository can be distributed over a network or made available locally to the user. The advantages of a repository may include controlled access to content, versioning, dynamic preview, and single-source publishing. Related terms include data warehouse and data mining.

Resolution

The clarity, sharpness, or fineness of detail achieved in displaying an image. The term is used in connection with monitors, printers, and bit-mapped graphic images. On printers, resolution indicates the number of dots on the horizontal and vertical axes per unit of measurement (such as an inch or centimeter). On monitors, screen resolution indicates the number of dots (pixels) on the screen. The more pixels per unit or on the screen, the finer and sharper the image.

RTF (Rich Text Format)

A file format established by Microsoft that includes formatting information in the same document as the text and graphics. The RTF format allows a file to be transferred across applications and platforms, which means a file can be transferred from one word-processing environment to another and between a PC and an Apple machine.

Sample Markup

Text of a sample document with the tags inserted, prepared as a preliminary step in the Proof of Concept phase. The sample markup might be an electronic file with its corresponding hard copy, or simply a hard copy with the tags written in.

Schema

A means of constraining XML markup constructs in ways that go beyond the abilities of a DTD. For example, a schema can restrict the data in a specific field to numeric values, whereas a DTD cannot. The idea of a schema originated with databases, where a schema defines the tables and fields and the relationships between them.

SGML (Standard Generalized Markup Language)

A language that specifies the rules for tagging elements of a document so that they can subsequently be interpreted into formatting. The tags are generic in the sense that they identify what each element is rather than how it should look, so the same tags can be formatted in different ways by different applications or stylesheets. For example, a tag may identify text as a level-one heading. A stylesheet or formatting application then specifies how level-one headings will look (font type, size, color, and so on), while the DTD defines which tags are allowed and how they may be combined. SGML is a convenient method for managing large documents that may be revised frequently and must be published in different formats, including paper, online, and databases. SGML, the parent of HTML and XML, is an international standard (ISO 8879) for information representation dating back to 1986.

SMIL (Synchronized Multimedia Integration Language)

(pronounced “smile”) A markup language designed to coordinate the display of various media (multimedia) on websites. SMIL uses a single timeline to synchronize the display of media on a page.

SQL (Structured Query Language)

An ANSI and ISO standard computer language for querying and managing data in relational databases.

Styled

A document is styled when like paragraphs are made to look alike by applying a stylesheet (or template) that defines their appearance. Stylesheets are available in most word processing and desktop publishing programs.

Stylesheet

A master template consisting of a collection of style definitions that can be applied to a document. Most word processing and desktop publishing software comes with a standard stylesheet (template) that defines styles for basic elements such as headings, body text and bulleted lists. Custom stylesheets can be created by the user and styles can be added, deleted or modified. Stylesheets are especially useful for maintaining uniformity across multiple documents or authors.

SVG (Scalable Vector Graphics)

An XML-based image format that describes two-dimensional vector graphics and combined vector/raster graphics. SVG images keep their clarity when scaled. For line art and diagrams, SVG files are often smaller than equivalent JPEGs or GIFs.
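
An SVG file is plain XML text that describes shapes by coordinates, so it can be generated with ordinary string handling. In this Python sketch the drawing itself is arbitrary. The viewBox fixes the coordinate system, so changing the display size changes only the width and height attributes, not the description of the shapes.

```python
import xml.etree.ElementTree as ET

def svg_icon(size_px: int) -> str:
    """Return an SVG circle crossed by a line, displayed at size_px square."""
    # Shapes use coordinates in a 100 x 100 viewBox, not pixels,
    # so the same description renders sharply at any display size.
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size_px}" '
        f'height="{size_px}" viewBox="0 0 100 100">'
        '<circle cx="50" cy="50" r="40" fill="none" stroke="black" stroke-width="4"/>'
        '<line x1="20" y1="50" x2="80" y2="50" stroke="black" stroke-width="4"/>'
        "</svg>"
    )

small, large = svg_icon(32), svg_icon(1024)
print(len(small), len(large))          # file size barely changes with display size
print(ET.fromstring(large).get("viewBox"))  # the text is well-formed XML
```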

Tag

In XML, a set of text characters that marks the start or end of an element (data unit). A tag is surrounded by angle brackets to distinguish it from data. For example, in the element <title>My Title</title>, <title> is the start tag, My Title is the data, and </title> is the end tag.

Template

See Stylesheet.

Text Frames

In desktop publishing, text frames (or text boxes) are used to position text on a page. For example, a page might contain three text frames, one for a sidebar and one for each of two columns. A text frame might also be used to continue a “story” on a later page. This can cause problems during a conversion process, because the logically connected text (the story) may not appear in connected text frames.

TIFF (Tagged Image File Format)

A raster (bitmap) image format with filenames usually ending with a .tiff or .tif extension. This format was developed in 1986 by a committee consisting of the Aldus Corporation (now part of Adobe) along with Microsoft and Hewlett-Packard. This format is widely used for faxes, medical imaging, and desktop publishing.
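
The “tags” in TIFF are numbered fields stored in an Image File Directory (IFD). The Python sketch below first reads the 8-byte header, which holds the byte order, the magic number 42, and the offset of the first IFD. It then reads the 12-byte entries of that IFD. The sample bytes are synthetic, and only single SHORT or LONG values are decoded. A complete reader would also follow value offsets and further IFDs.

```python
import struct

TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 259: "Compression"}

def read_first_ifd(data: bytes) -> dict[str, int]:
    """Return the simple numeric tags in a TIFF file's first IFD."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    tags = {}
    for i in range(count):
        # Each entry: tag number, field type, value count, value-or-offset (12 bytes)
        tag, field_type, n, raw = struct.unpack_from(endian + "HHI4s", data, ifd_offset + 2 + 12 * i)
        if field_type == 3 and n == 1:    # one SHORT, stored in the first 2 bytes
            (value,) = struct.unpack(endian + "H", raw[:2])
        elif field_type == 4 and n == 1:  # one LONG
            (value,) = struct.unpack(endian + "I", raw)
        else:
            continue  # other types/counts point elsewhere; skipped in this sketch
        tags[TAG_NAMES.get(tag, str(tag))] = value
    return tags

# Synthetic little-endian TIFF: header, then an IFD with two entries
sample = (
    b"II" + struct.pack("<HI", 42, 8)        # byte order, magic 42, first IFD at offset 8
    + struct.pack("<H", 2)                   # number of IFD entries
    + struct.pack("<HHIH2x", 256, 3, 1, 640) # ImageWidth = 640 (SHORT)
    + struct.pack("<HHII", 257, 4, 1, 480)   # ImageLength = 480 (LONG)
    + struct.pack("<I", 0)                   # no further IFDs
)
print(read_first_ifd(sample))  # {'ImageWidth': 640, 'ImageLength': 480}
```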

UML (Unified Modeling Language)

A standardized modeling language used to specify, visualize, and document the artifacts of an object-oriented system under development. UML is independent of the development process and of particular programming languages and tools.

Unicode

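Unicode assigns a unique number, called a code point, to every character in virtually all of the world's writing systems, including ideographic scripts such as Chinese. Early versions used a single 16-bit code space, which allows 65,536 codes. The standard has since been extended to more than a million code points (U+0000 through U+10FFFF), stored in encoding forms such as UTF-8 and UTF-16. This standard is maintained by the Unicode Consortium. For more info, see http://www.unicode.org/ .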
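
Python strings are sequences of Unicode code points, which makes the difference between a code point and its encoded bytes easy to see. This sketch prints the code point of a few arbitrarily chosen characters and the number of bytes each needs in UTF-8 and UTF-16. The last character lies outside the original 16-bit range and needs four bytes in UTF-16.

```python
samples = ["A", "é", "中", "\U0001F600"]  # Latin, accented Latin, CJK ideograph, emoji

for ch in samples:
    cp = ord(ch)                          # the character's code point
    utf8 = len(ch.encode("utf-8"))
    utf16 = len(ch.encode("utf-16-be"))   # "-be" avoids a byte-order mark
    print(f"U+{cp:04X}: {utf8} byte(s) in UTF-8, {utf16} byte(s) in UTF-16")

print(0x10FFFF + 1)  # size of the code space: 1,114,112 code points
```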

Unstyled

In an unstyled document, text elements have been formatted individually rather than applying styles from a stylesheet or template. For example, rather than tagging a heading as a Heading 1 style, which would imply its appearance (indent, spacing, font size, type, color) the text remains untagged (or some default style) and each characteristic is applied separately to the text. This approach can lead to inconsistency and makes the task of conversion very difficult.

URI (Uniform Resource Identifier)

A general term for identifiers of resources. A URI can be a URL, a URN, or both.

URL (Uniform Resource Locator)

Points to a unique location (address) on the internet. The first part of the URL, the scheme, indicates what protocol to use. For example, http:// specifies a web location, ftp:// specifies a file on an FTP server, file:// specifies a file on the local disk system, and mailto: specifies an email address. The second part specifies the host, by domain name or IP address, where the specific resource is located. Thus, in the URL https://www.clearpath.cc , https:// indicates a (secure) web location and www.clearpath.cc is the domain name.
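
Python's standard urllib.parse.urlparse function splits a URL into these parts. The example.com addresses below are placeholders.

```python
from urllib.parse import urlparse

# example.com addresses are placeholders
for url in ["https://www.clearpath.cc",
            "ftp://ftp.example.com/pub/manual.pdf",
            "mailto:info@example.com"]:
    parts = urlparse(url)
    print(f"scheme={parts.scheme!r} host={parts.netloc!r} path={parts.path!r}")
```

A mailto: URL has no host part, so the whole address ends up in the path.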

URN (Uniform Resource Name)

Specifies a name for a resource or unit of information but does not define its location (as a URL does). This allows the resource identified by the URN to exist in one or more locations and allows it to move but still be found.

Vector

A method of representing an image as a collection of independent lines and shapes, usually defined by mathematical formulas. These images are more easily modified than raster images and can be scaled without losing clarity. Adobe Illustrator, CorelDRAW, and AutoCAD are examples of vector drawing programs. Most programs have their own file format and unique file extension (such as .ai or .cdr).

Well-formed

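In XML, a document that conforms to the well-formedness constraints of the XML 1.0 specification. For example, it has exactly one root element, every start tag has a matching end tag, elements are properly nested, and attribute values are quoted. A well-formed document is not necessarily valid, since validity additionally requires conformance to a DTD or schema.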
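
A parser can check well-formedness directly. This Python sketch relies on xml.etree.ElementTree raising ParseError for input that is not well-formed XML. It does not check validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the text parses as XML (well-formedness only, not validity)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<note><to>Ann</to></note>"))      # True
print(is_well_formed("<note><to>Ann</note></to>"))      # False: overlapping tags
print(is_well_formed("<note date=10/20/2003></note>"))  # False: unquoted attribute value
print(is_well_formed("<to>Ann</to><to>Bob</to>"))       # False: two root elements
```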

WYSIWYG (What You See Is What You Get)

(pronounced wiz-ee-wig) Describes a program that allows a developer to see the end result of a coded or markup document as it is being developed. These programs use a graphical user interface (GUI) to make the text appear as it will in the final product and shield the codes or tags from view. Examples of software that enables a developer to create web pages with minimal knowledge of HTML code include PageMill (Adobe) and FrontPage (Microsoft).

XMetaL

A set of tools (from BlastRadius) designed to simplify the implementation of XML applications across an organization. XMetaL is a structured editor that can be customized for use with applications based on well-known DTDs. A CSS and a set of macros are required for data entry with each DTD. XMetaL provides three views of an XML file: plain text view (can see the underlying XML code), tags-on view (elements represented as symbols in a formatted document) and normal view (shows the formatted document and hides the markup). For more information see http://www.softquad.com/ .

XML (Extensible Markup Language)

A language that allows content to be separated completely from the mechanisms that specify how to render and display this content. XML is a subset of SGML designed especially for use on the internet and is supported by both major browsers. XML allows designers to create customized tags, enabling data to be defined, transmitted and validated between applications and organizations. This capacity for unlimited markup symbols is what makes XML “extensible”.

Both XML and HTML contain markup symbols to describe content. HTML describes content only in terms of how it is to be displayed on the Web, whereas XML also indicates what the data being described actually is. For example, a custom tag might indicate that the data it encloses is the name of a director and his agent. This information could then be processed as pure data as well as formatted in a specific way.
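
To illustrate the difference, this Python sketch uses hypothetical film, director, and agent tags. The names and data are invented. The same markup is first read as data and then formatted for display.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom tags: the names say what the data is, not how it looks
record = ET.fromstring(
    "<film>"
    "<director>Jane Doe</director>"
    "<agent>John Roe</agent>"
    "</film>"
)

# Processed as pure data...
data = {child.tag: child.text for child in record}
print(data)  # {'director': 'Jane Doe', 'agent': 'John Roe'}

# ...or formatted for display, here as an HTML fragment
print(f"<p><b>Director:</b> {data['director']} (agent: {data['agent']})</p>")
```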

XSL (Extensible Stylesheet Language)

A stylesheet language that specifies how XML data is formatted for display. XSL allows you to

  • Define an addressing mechanism in order to identify parts of an XML file (XPath)
  • Define tag conversions that allow you to convert XML data into different formats (XSLT)
  • Define display characteristics such as page size, margins, font information, text flow, and wrapping (XSL-FO)

XSL-FO (Extensible Stylesheet Language Formatting Objects)

A styling language written in XML syntax, similar to CSS but broader in scope. XSL-FO is a relatively new standard and has yet to achieve the maturity or widespread adoption of CSS.

XSLT (XSL Transformations)

A language, itself written in XML syntax, for transforming XML documents. XSLT is used to create other formats, such as HTML, from an XML source, or to convert XML content from one DTD to another.
