Document Handling White Paper

Contents:

Electronic Documents

Current Implementations

Emerging Technology

Electronic Documents

    Without human-like segmentation, offices must rely on elaborate filing systems to organize paper, but it is very difficult (if not impossible) to keep track of thousands of pieces of paper as they make their way around the corporate circuit.

    When paper is lost, the company (if not its officers) incurs liability. For example, if the receipt supporting an expense is lost, the IRS will probably not allow the deduction.

    A quality segmenter can play a central role in mitigating these issues.

    On the image quality side, the storage method must be able to retain all necessary information on the document. To paraphrase an eminent philosopher, a system with too many exceptions is of little use. We need to handle all the pieces of paper. For example, a dental office might need to put paper notes, dental x-rays, check copies, and several other categories of documents into a single system. If we can't get the information into the computer, we have to continue to rely on the paper files.

    We are not advocating throwing the paper away. Everyone will want to take a few years to get comfortable with the electronic image first.

    Electronic images have a number of advantages over paper, and people will choose the electronic file over the paper file for various reasons. For example, to use a paper file, people usually have to get out of their chair (it could be considered work). With the electronic file, we do not have to move body mass to get what we want.

    Electronic documents are easier to copy and distribute. By using bandwidth instead of the Post Office, we can afford to be generous with copies. We don't have to purchase envelopes, stamps, or paper.

    Our colleagues can look at one of our files without bothering us, and we can set it up so they cannot lose it.

    To make all of this possible and practical, the segmentation must be at human level (or better), and the compression must take a 25 MByte scanned page down to about 10 KBytes.
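    To see where a figure like 25 MBytes can come from, and what ratio the 10 KByte target implies, here is a small worked calculation. It assumes, as our own illustration rather than a figure stated in this paper, a letter-size page (8.5 by 11 inches) scanned at 300 dpi in 24-bit color, and it counts 1 KByte as 1,000 bytes. The helper names are hypothetical.

```python
def raw_page_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Uncompressed size of a scanned page (hypothetical helper)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

def compression_ratio(original_bytes, compressed_bytes):
    """How many times smaller the compressed file is."""
    return original_bytes / compressed_bytes

# Assumed scan: letter-size page, 300 dpi, 24-bit color (3 bytes per pixel).
raw = raw_page_bytes(8.5, 11, 300, 3)   # 2550 x 3300 pixels -> 25,245,000 bytes
target = 10_000                          # "about 10 KBytes", taking 1 KByte = 1,000 bytes
print(f"{raw:,} bytes, ratio {compression_ratio(raw, target):,.1f}:1")
```

    Under these assumptions the raw page is about 25.2 million bytes, so reaching 10 KBytes means a compression ratio of roughly 2,500:1.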

    Even if this system were used only to store and distribute the critical tax and corporate documents that everyone seems to need on a regular basis, it would be worth its weight in gold. Of course, it can be applied much more broadly.

    When we take business trips, we could simply take photos of our receipts. Then, when our luggage suffers airline abuse that makes the shampoo bottle explode and destroys all those receipts (which probably add up to roughly a mortgage payment on our house), we could still file our expense report electronically, with all the necessary receipts, long before we get back to the office.

    Once the receipts are electronically attached to the reimbursement record, the IRS auditor can simply click on a receipt to verify its existence. With paper records, the filing clerk has to take the auditor's list of requests, pull the files, run the copies, refile the records, and send the copies to the auditor, and the cycle may repeat several times.

    IRS and other audits are a typical use of files, but obviously files are used for many other purposes as well.

    Computer records have proven to be much more durable, distributable, and trackable than paper documents. With computer records, it is easier and cheaper to establish, manage, and maintain the all-important "paper" trail.

    As an example, when the World Trade Center catastrophe occurred, nearly all the paper records were lost, but most of the computer records survived.

Current Implementations

    When pictures are highly complex and high quality is desired, most companies use a transform compressor (usually JPEG) that discards information, such as fine detail and some color resolution, as determined by a mathematical transform. The output is considered high quality, but contrast and detail are compromised, among other things.
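    The transform idea can be sketched in a few lines. The code below is a simplified illustration, not a JPEG codec: it applies an orthonormal 2-D discrete cosine transform (DCT) to one 8x8 block and divides every coefficient by a single uniform step size (a hypothetical value of 16). Real JPEG uses per-frequency quantization tables, zig-zag ordering, entropy coding, and usually chroma subsampling, none of which is shown. Coefficients that round to zero are the information that gets thrown away.

```python
import math

N = 8  # JPEG transforms the image in 8x8 pixel blocks

def dct_1d(values):
    """Orthonormal DCT-II of one row or column."""
    n = len(values)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(values))
        coeffs.append(scale * total)
    return coeffs

def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def quantize(coeffs, step):
    """Divide each coefficient by `step` and round. Coefficients smaller than
    about half a step become 0 and are discarded; in typical images these are
    mostly the high-frequency (fine-detail) terms."""
    return [[round(c / step) for c in row] for row in coeffs]

# A smooth horizontal ramp: its energy collects in a few low-frequency coefficients.
ramp = [[100 + 4 * x for x in range(N)] for _ in range(N)]
kept = quantize(dct_2d(ramp), step=16)
nonzero = sum(1 for row in kept for c in row if c != 0)
print(f"{nonzero} of {N * N} coefficients survive quantization")
```

    Because the ramp never changes from top to bottom, every coefficient outside the first row quantizes to zero; only a handful of the 64 values need to be stored.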

    TIFF G4 is the workhorse of the document imaging industry. It relies on threshold segmentation to produce bi-tonal (black-and-white) images.

    The documents need to be clean, because handwriting can be lost in the process and fine text may not be displayed accurately.
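    Threshold segmentation itself is simple, which is also why it loses information. In the sketch below (the function name, the 0 to 255 gray scale, and the cutoff of 128 are our assumptions), every pixel darker than the cutoff becomes black and everything else becomes white, so a faint pencil mark lighter than the cutoff simply disappears.

```python
def threshold(gray, level=128):
    """Bi-tonal segmentation of a grayscale image (0 = black, 255 = white).
    Returns 1 for pixels darker than `level` (black) and 0 otherwise (white)."""
    return [[1 if pixel < level else 0 for pixel in row] for row in gray]

# One scan line: printed text (30), light pencil handwriting (180), blank paper (240).
scan_line = [[30, 30, 180, 180, 240, 240]]
print(threshold(scan_line))  # -> [[1, 1, 0, 0, 0, 0]]: the pencil stroke turns white
```

    Lowering the cutoff makes noise vanish but loses more faint strokes; raising it keeps the strokes but also keeps smudges. A single global cutoff cannot do both.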

    This approach usually reduces a one-page document from 25 MBytes to 30 to 50 KBytes.
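    Those figures correspond to ratios of roughly 500:1 to 830:1 (taking 1 KByte as 1,000 bytes). Part of the reason bi-tonal pages compress this well is that each scan line breaks into long runs of white and black pixels. The sketch below shows only that run-length idea for a single line. CCITT Group 4 (ITU-T T.6), the coding TIFF G4 uses, goes further by coding most of each line relative to the line above and by using variable-length codes, neither of which is shown here. The function names are hypothetical.

```python
def run_lengths(row):
    """Alternating white/black run lengths for one bi-tonal scan line
    (0 = white, 1 = black). As in fax coding, the first run is white,
    so a line that starts with black begins with a zero-length run."""
    runs, current, count = [], 0, 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pixel, 1
    runs.append(count)
    return runs

def expand(runs):
    """Inverse of run_lengths: rebuild the scan line."""
    row = []
    for i, length in enumerate(runs):
        row.extend([i % 2] * length)  # even-numbered runs are white (0), odd are black (1)
    return row

line = [0] * 40 + [1] * 3 + [0] * 57 + [1] * 2
print(run_lengths(line))  # -> [40, 3, 57, 2]
```

    A 102-pixel line becomes four numbers. Anything that breaks the long runs, such as speckle or gray handwriting, costs extra codes.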

    With TIFF G4 technology, the file sizes are too large and the image quality is too low to meet the needs of the typical office environment.

Emerging Technology

    To be practical, the same document imaging system needs to be used on all the documents an office has.

    Therefore, the document imaging system required in a typical office needs to reproduce all the information on the document that humans can see. If segmentation is performed at this level of quality, the number of blobs (connected groups of pixels) will be very high, which makes usable compression harder to obtain.
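    A blob can be taken to mean a connected group of black pixels in a bi-tonal image. That is our reading of the term, not a definition from this paper. Under that reading, blobs can be found with a flood fill. The sketch below uses 8-connectivity, so pixels touching at a corner belong to the same blob, and its function name is hypothetical. Every speck of noise or broken stroke that segmentation preserves becomes one more blob for the compressor to handle.

```python
from collections import deque

def label_blobs(bitmap):
    """Return a list of blobs; each blob is a list of (row, col) black pixels
    (value 1) connected through any of their 8 neighbours."""
    rows = len(bitmap)
    cols = len(bitmap[0]) if bitmap else 0
    seen, blobs = set(), []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != 1 or (r, c) in seen:
                continue
            blob, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill from this seed pixel
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs

letter_and_speck = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1],  # the lone pixel on the right is a speck of noise
]
print(len(label_blobs(letter_and_speck)))  # -> 2
```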

    In theory, a geometrical compressor could obtain the required compression only if the blobs were repeatable. The blobs are not repeatable when segmentation is taken to human-like quality. The trade-off between segmentation and compression seems to result in a Catch-22 for today's equipment, but to paraphrase classical military strategy, the worst part of the problem contains the key to the solution.
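    One way to picture a geometrical compressor is the symbol-dictionary approach (used, for example, in JBIG2 text regions): store each distinct blob shape once and record every occurrence as a reference plus a position. The sketch below is our own illustration of that idea with exact matching, and its names are hypothetical. It shows why repeatability matters: a single stray pixel turns a repeat into a new dictionary entry.

```python
def shape_of(blob):
    """Split a blob (list of (row, col) black pixels) into its bounding-box
    origin and a position-independent shape (frozenset of offsets)."""
    top = min(r for r, _ in blob)
    left = min(c for _, c in blob)
    return (top, left), frozenset((r - top, c - left) for r, c in blob)

def dictionary_encode(blobs):
    """Exact-match symbol dictionary: each distinct shape is stored once and
    every blob becomes (shape_index, top, left)."""
    dictionary, index_of, placements = [], {}, []
    for blob in blobs:
        origin, shape = shape_of(blob)
        if shape not in index_of:
            index_of[shape] = len(dictionary)
            dictionary.append(shape)
        placements.append((index_of[shape], *origin))
    return dictionary, placements

corner = [(0, 0), (0, 1), (1, 0)]
same_corner_elsewhere = [(5, 7), (5, 8), (6, 7)]
corner_with_stray_pixel = [(10, 0), (10, 1), (11, 0), (11, 1)]
shapes, refs = dictionary_encode([corner, same_corner_elsewhere, corner_with_stray_pixel])
print(len(shapes), refs)  # -> 2 [(0, 0, 0), (0, 5, 7), (1, 10, 0)]
```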

    When we create documents on a computer, there is usually extreme repeatability and very few colors. For example, the word "repeatability" has two colors, two 'e's, two 'a's, two 't's, and two 'i's. There are also similarities and repetition between individual letters: the bottom of the 'i' is similar to the bottom of the 'l'. Nature, on the other hand, is hardly ever repeatable in either form or color. Every snowflake and every fingerprint is different. In fact, nature almost never does exactly the same thing twice.
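    The letter counts are easy to check, and the same count shows the payoff for a symbol dictionary. "repeatability" has 13 letters but only 9 distinct letters, so in a single font a dictionary would need 9 shapes rather than 13. The snippet is a trivial illustration with a hypothetical helper name.

```python
from collections import Counter

def repeated_letters(word):
    """Letters that occur more than once, with their counts."""
    return {letter: n for letter, n in Counter(word).items() if n > 1}

word = "repeatability"
print(repeated_letters(word))      # -> {'e': 2, 'a': 2, 't': 2, 'i': 2}
print(len(word), len(set(word)))   # -> 13 9
```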

    A geometrical compressor would have a field day on human-created documents before they were printed out, scanned in, crumpled up, written on, and otherwise mutilated and distorted (entropy attacks repeatability and symmetry). Typical office paper was created by humans and then distorted by entropic processes. If we could reverse the entropic distortions of our documents, we could recover the original (and repeatable) data they contain.

    Since the documents were made by humans, we know that, except for some photographs, they were originally very repeatable.

    Entropic distortion follows rules (or laws). For example, optical aberrations can be precisely simulated in a known environment. In the typical office, the environment is not known, but it can be approximated.
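    As a small example of simulating a distortion from a known rule, defocus blur can be modeled as convolving the image with a point spread function (PSF). The sketch below assumes a uniform 3x3 PSF, a crude stand-in for a real lens, with zero padding at the borders. The function name is hypothetical. A single sharp dot gets spread evenly over its neighbours while its total brightness is preserved.

```python
def convolve(image, kernel):
    """2-D convolution of `image` with a square, odd-sized `kernel`,
    treating pixels outside the image as 0."""
    k = len(kernel) // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    y, x = r + i, c + j
                    if 0 <= y < rows and 0 <= x < cols:
                        # Flipped kernel index makes this a true convolution.
                        total += image[y][x] * kernel[k - i][k - j]
            out[r][c] = total
    return out

box_psf = [[1 / 9] * 3 for _ in range(3)]  # assumed uniform defocus PSF
dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 1
blurred = convolve(dot, box_psf)
for row in blurred:
    print(" ".join(f"{v:.2f}" for v in row))
```

    Once the PSF of the distortion is known or estimated, restoration amounts to undoing this step as well as the noise allows.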

    Of course, there are a number of different ways a document can be compromised, but we can estimate how the more common distortions would occur.

    If we use image restoration procedures to correct the more common distortions, we can pull the image back to something closer to the original. Image restoration alone cannot recover the original perfectly, because we have to approximate the environment of distortion. But by then allowing a tolerance around what is expected, we can snap the image back to nearly exactly the expected original.
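    The "snap back" step can be sketched as nearest-template matching with a pixel tolerance. The code below assumes restoration has already produced aligned bi-tonal shapes (frozensets of black-pixel offsets). It also assumes the expected shapes, for example the glyphs of a known font, are available. The restoration itself is not shown, and all names and the tolerance value are hypothetical. A shape within `tolerance` differing pixels of an expected shape is replaced by it. Anything farther away is left alone.

```python
def pixel_difference(a, b):
    """Number of pixels that differ between two aligned shapes."""
    return len(a ^ b)  # symmetric difference: black in one shape but not the other

def snap(shape, expected, tolerance):
    """Return the closest expected shape if it is within `tolerance` pixels
    of `shape`; otherwise return `shape` unchanged."""
    best = min(expected, key=lambda e: pixel_difference(shape, e), default=None)
    if best is not None and pixel_difference(shape, best) <= tolerance:
        return best
    return shape

corner = frozenset({(0, 0), (0, 1), (1, 0)})
bar = frozenset({(0, 0), (1, 0), (2, 0), (3, 0)})
noisy_corner = corner | {(1, 1)}  # one extra pixel left over from scanning noise
print(snap(noisy_corner, [corner, bar], tolerance=1) == corner)  # -> True
```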

    With these techniques we do not recover the original image exactly, but we do recover the information we would draw from the image.

    Therefore, human-like segmentation combined with geometrical compression can meet the needs of the typical office, if image restoration is carried to within the tolerance of an expected artifact.

    The solution would be implemented in two steps. In the first step, we would segment and restore. In the second step, we would apply a geometrical compressor that tolerates acceptable pixel misplacements as illustrated here.


    The implementation is complicated and computationally intensive, but the compression could easily be 100 times greater than that of any conventional method. Furthermore, humans would not fault the quality.
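    The second step differs from exact symbol matching only in its tolerance. The sketch below uses hypothetical names and takes shapes as frozensets of aligned pixel offsets. It lets a new shape reuse an existing dictionary entry when the two differ by no more than `tolerance` pixels. It uses a greedy first-match rule where a practical coder would search for the best match. Raising the tolerance from 0 to 1 shrinks the dictionary in the example, because two slightly different renderings of the same shape now share one entry.

```python
def tolerant_encode(shapes, tolerance):
    """Greedy symbol dictionary: each shape reuses the first stored entry that
    differs from it by at most `tolerance` pixels, or becomes a new entry."""
    dictionary, indices = [], []
    for shape in shapes:
        match = next((i for i, entry in enumerate(dictionary)
                      if len(shape ^ entry) <= tolerance), None)
        if match is None:
            match = len(dictionary)
            dictionary.append(shape)
        indices.append(match)
    return dictionary, indices

corner = frozenset({(0, 0), (0, 1), (1, 0)})
corner_extra_pixel = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
bar = frozenset({(0, 0), (1, 0), (2, 0), (3, 0)})
page = [corner, corner_extra_pixel, bar, corner]

for tol in (0, 1):
    entries, refs = tolerant_encode(page, tol)
    print(tol, len(entries), refs)
# -> 0 3 [0, 1, 2, 0]
# -> 1 2 [0, 0, 1, 0]
```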

    The technique of combining advanced segmentation, image restoration, geometrical compression, and tolerating acceptable artifact distortions will likely have a dramatic impact on the document handling industry. High quality and extremely high compression should propel the industry into many new markets while simultaneously expanding its existing markets.


2004 - 2008 Accelerated I/O, Inc, All rights reserved.