Golden Files


Introduction: Some CAD programs have the ability to snap a line to a grid. In a similar way, Pac-n-Zoom® has the ability to "snap" things to a previously defined shape.

Pac-n-Zoom snaps similar shapes to exactly the same shape to reduce the amount of noise in the image. For example, if a page full of identical 'e's from a word processor is printed out and scanned back in, the scanned 'e's will probably all be different. Among them, there would probably not be a single 'e' that was exactly correct, but each would be within a definite tolerance of perfect (assuming that the printer, paper, and scanner were within their specified tolerances).

The golden files are a set of perfect shapes. When an imperfect shape is within an acceptable tolerance of a perfect shape, Pac-n-Zoom will snap away the imperfection (which is due to noise) to leave a perfect shape. There are three primary benefits of exact conformation.

1. Visual Appeal: We have all struggled with fuzzy FAXes and blurred images. It's hard to argue against a sharper image.

2. Better Compression: In a number of compression algorithms, any repeating shape can be tagged, but as shown in the example of the 'e's (given above), the noise prevents initially identical shapes from repeating. By storing or transmitting only perfect shapes, compression can be much higher. The exact amount depends on the amount of tolerance allowed in noise sources (e.g. printer, paper, camera, camcorder, or scanner).

3. Mathematical Accuracy: The whole point of the blob compressor is to group shapes together to achieve mathematical consistency. With the golden shapes, we can deliver accurate mathematical shapes to the data tagger, which is an important step towards machines achieving human-like cognizance.

With accurate mathematical shapes being provided to the data tagger and third-party support being provided to the glider convolver, Pac-n-Zoom 2006 can become a graphical Rosetta stone that converts pictures from one program into fully compatible files of an entirely different application.
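
The snapping described above can be illustrated with a minimal sketch. It is not Pac-n-Zoom's actual algorithm: the pixel-difference measure, the '#'/'.' bitmap encoding, and the names `pixel_difference` and `snap` are all assumptions made for illustration.

```python
from typing import Optional

# Assumed encoding: a shape is a tuple of equal-length rows of '#' and '.'.
Shape = tuple[str, ...]


def pixel_difference(a: Shape, b: Shape) -> int:
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def snap(scanned: Shape, golden_shapes: dict[str, Shape], tolerance: int) -> Optional[str]:
    """Return the name of the closest golden shape within tolerance, else None."""
    best_name, best_diff = None, tolerance + 1
    for name, golden in golden_shapes.items():
        if len(golden) != len(scanned) or len(golden[0]) != len(scanned[0]):
            continue  # this sketch only compares bitmaps of identical size
        diff = pixel_difference(scanned, golden)
        if diff < best_diff:
            best_name, best_diff = name, diff
    return best_name


golden = {
    "bar": ("###", "...", "..."),
    "column": ("#..", "#..", "#.."),
}
noisy_bar = ("###", "..#", "...")  # one stray pixel of scanner noise
print(snap(noisy_bar, golden, tolerance=2))
```

With a tolerance of 2 the noisy shape snaps to "bar"; with a tolerance of 0 nothing matches and the shape would be kept as scanned.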

In the golden database, the golden file shapes are divided into two groups, text and graphics. The graphics can be named, and the text is categorized by the following three fields.

1. Font: The golden files will come with a few fonts, but we expect to add more all the time. By following the correct procedure, the end user can add fonts as well.

2. Pitch: The pitch is held as the maximum height and width of the characters across the entire character set.

3. Attributes: The person loading the golden files can specify as many attributes as desired.

As shown in the diagram, if the text from the image can be matched to text found in a golden file, we can create a formatted page of text from a scanned image when additional tools are used.
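
As a rough illustration of the three text fields, the sketch below models a category as a font name, a pitch, and a set of attributes, with the pitch computed as the maximum height and width over the character set. The `TextCategory` class, the `pitch` function, and the sample sizes are hypothetical and do not reflect the real golden file layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextCategory:
    """Hypothetical key for one group of golden text shapes."""
    font: str
    pitch: tuple[int, int]      # (max height, max width) in pixels
    attributes: frozenset[str]  # e.g. {"bold", "italics"}


def pitch(char_sizes: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (max height, max width) over every character in the set."""
    heights = [h for h, _ in char_sizes.values()]
    widths = [w for _, w in char_sizes.values()]
    return max(heights), max(widths)


# Made-up (height, width) pixel sizes for three characters of one font.
sizes = {"i": (12, 3), "W": (11, 14), "g": (15, 8)}
category = TextCategory("Courier", pitch(sizes), frozenset({"bold"}))
print(category.pitch)
```

Because the category is immutable and hashable, it could serve as a dictionary key when grouping golden shapes.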


Raster File Conversion: Pac-n-Zoom® cannot currently read TIFF files. FAXes are almost always TIFF G3, and scanned images are typically TIFF G4, TIFF LZW, or JPEG. Of these formats, Pac-n-Zoom can only read JPEG. The TIFF formats are owned by Adobe, but Adobe lets the software and associated tools be used without a license. This environment has allowed companies to convert from TIFF to bitmap. A short Internet search turned up the following programs, which are probably a small fraction of the possibilities.

1. BMP Smartz from Smart Image Converter
2. 2Bitmap from fCoder Group
3. Advanced Batch Converter from Gold-Software Development
4. Able Fax Tif View from Graphic Region

Segmentation: If the page is black and white (e.g., black text on a white background), threshold segmentation should be used. Threshold segmentation is a very simple (read: fast) segmentation. If a pixel is lighter than gray, it is set to white; if it is darker than gray, it is set to black. If there are any colors or shades of gray, full color segmentation should be used.
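
A minimal sketch of threshold segmentation follows. It assumes 8-bit grayscale pixels (0 is black, 255 is white) held in nested lists and a mid-gray threshold of 128; the function name and the choice to treat a pixel exactly at the threshold as white are assumptions, not documented Pac-n-Zoom behavior.

```python
def threshold_segment(gray_rows, threshold=128):
    """Map each 8-bit gray pixel to pure white (255) or pure black (0).

    Pixels at or above the threshold become white, the rest black.
    Treating the exact threshold value as white is an arbitrary choice.
    """
    return [
        [255 if pixel >= threshold else 0 for pixel in row]
        for row in gray_rows
    ]


page = [
    [250, 240, 30],
    [245, 20, 10],
]
print(threshold_segment(page))
```

Each pixel needs only one comparison, which is why this kind of segmentation is fast.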

Golden Database: The external database is optional, and it is not supported at the present time. A number of golden files can be loaded by Pac-n-Zoom when Pac-n-Zoom launches. If the document has a relatively homogeneous font set (such as a legal document), these files can be enough. All of these files are loaded into system memory, however, even when they are not needed by the document currently being processed.

Since all of these fonts are held in raster, a single font type (such as Courier), with various pitches and attributes, might require 500 files (or the equivalent amount of memory grouped into fewer files). At that rate, 100 different fonts require 50,000 files. If each file requires 50 KBytes of system memory, 2.5 GBytes of system memory are needed. This number would be reduced to about 900 MBytes because some commonality will be found between the fonts. An external database reduces the amount of system memory that is required.
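
The memory estimate above can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 GByte = 1,000,000 KBytes), which is what the 2.5 GByte figure implies; the function name is hypothetical.

```python
def raster_font_memory(fonts, files_per_font, kbytes_per_file):
    """Return (total files, total GBytes), using decimal units."""
    total_files = fonts * files_per_font
    total_gbytes = total_files * kbytes_per_file / 1_000_000
    return total_files, total_gbytes


files, gbytes = raster_font_memory(fonts=100, files_per_font=500, kbytes_per_file=50)
print(files, gbytes)

# The text's 900 MByte figure is an estimate; it implies that roughly
# this fraction of the raw total remains after shared shapes are merged.
retained_fraction = 900 / (gbytes * 1000)
print(retained_fraction)
```

The 900 MByte figure therefore corresponds to keeping a little over a third of the raw 2.5 GBytes.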

By manipulating the flags and attributes in the configuration file, the database can improve the ability to harvest data from a graphic file.

Raster Files: By storing the files in a *.html wrapper, we can accomplish the following things.
1. Formatted: Pac-n-Zoom will retain the original format within the acceptable tolerance.
2. Compressed: The *.html file will have all the recognized text and the original *.pnz file. A small penalty from the text and overhead of the *.html is therefore paid.
3. Accessible: With the *.html format, the document can be placed on the World Wide Web (WWW - or Internet). The document can be viewed with a standard Internet browser, if the browser has the Pac-n-Zoom plug-in which is free.
4. Secure: The normal Internet security procedures can be used.
5. Searchable: When the document is on the Internet, the viewer's favorite Internet search engine can be used. If the file is inside an intranet, the provider's search engine(s) can be used.

*.pnh Type 3: When the blob compressor snaps an image cluster to a golden cluster, the image cluster inherits the golden cluster attributes. Besides the golden attributes, the *.pnh file always provides the following information.

1. Size: The maximum pixel height and width of the cluster are given.
2. Location: The row and column of the initial (topmost, then leftmost) pixel of the cluster are given.

The author of the golden file decides how many golden cluster attributes are included, but the following are some of the more common ones.

1. Text: The letter, number, or other text character is specified.
2. Font: The style of the character (e.g., Times Roman or Courier) is specified.
3. Pitch: The size of the letter (e.g., 10, 12, etc.) is specified. Since the size of the cluster is given in pixels, the pitch can be estimated from the page size and the number of dots (both height and width) used in the output device. If re-sizeable fonts are used, the application file can resize the image in a limited way. Without raster to vector conversion, the graphics won't re-size in a graceful way.
4. Attribute: The emphasis of the letter (e.g., bold, italics, underlined) is specified. If the builder of the golden file is thorough, the combinations of attributes (e.g. bold and italics) are included.

The type 3 *.pnh file contains all the graphics needed to reconstruct the image, and in addition, it contains attributes about clusters that were snapped to a golden cluster.
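
The per-cluster information described in this section can be sketched as a small record. This is not the *.pnh file format, which is not specified here; the `ClusterRecord` class, the `describe_cluster` function, and the attribute keys are hypothetical. The sketch assumes pixels are (row, column) pairs with row 0 at the top of the image, so the "highest" pixel has the smallest row number.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterRecord:
    """Hypothetical in-memory view of one cluster's information."""
    height: int   # maximum pixel height of the cluster
    width: int    # maximum pixel width of the cluster
    row: int      # row of the initial (topmost, then leftmost) pixel
    column: int   # column of that same pixel
    golden_attributes: dict[str, str] = field(default_factory=dict)


def describe_cluster(pixels, golden_attributes=None):
    """Build a record from a non-empty set of (row, column) pixels.

    golden_attributes holds whatever was inherited from the golden
    cluster, or nothing if the cluster was not snapped.
    """
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    first_row, first_col = min(pixels)  # smallest row, then smallest column
    return ClusterRecord(
        height=max(rows) - min(rows) + 1,
        width=max(cols) - min(cols) + 1,
        row=first_row,
        column=first_col,
        golden_attributes=dict(golden_attributes or {}),
    )


pixels = {(5, 7), (5, 8), (6, 6), (7, 6), (7, 9)}
record = describe_cluster(pixels, {"text": "e", "font": "Courier", "attribute": "bold"})
print(record)
```

Note that the location is the first pixel in reading order, not the corner of the bounding box: here the cluster starts at column 7 even though its leftmost pixel is in column 6.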

Conversion Program: While Accelerated I/O has no intention of writing this program, this conversion program does have some utility across the corporate world as the following uses show.

1. Editing: A FAXed contract could be edited and FAXed back.
2. Checking: The numbers on the page need to be audited. This program would load the numbers from the page into a spreadsheet which could be a real time saver for some professions.
3. Conformity: In a world where the customer is always right, input from the customer can come in several different ways. If the information is on a paper form, the program can gather the different fields from the form to create a database consistent with information gathered by electronic means (such as a web site).

Application Files: There are a number of different application files (such as spreadsheets, databases, and word processors) that would benefit from being loaded with a FAX or scanned image. By cleaning most of the noise out of an image, Pac-n-Zoom can help turn the scanned information into usable computer data.


© 2004 - 2008 Accelerated I/O, Inc, All rights reserved.