Where is the vast majority of civilized man's knowledge and expression stored? It's not in people's minds. It's not on tape or computer media (yet). It's mostly on paper or canvas in libraries, galleries, museums, schools, businesses, and homes around the world. This includes knowledge and information dating back to the beginning of human civilization. What's the big problem with this information? It's hard to find, and it's hard and slow to get to. You often have to physically travel, sometimes far away, to find it. Even when you get there, the material may be damaged, lost, restricted, checked out, or not what you're really looking for. Searching for the information can be very time-consuming, even with computerized card catalogs, because you still have to find the actual works and manually scan through them. The difficulty of getting at this information limits its usefulness, and productivity and progress are limited along with it.
The computer is a tremendous tool. It enables a user to search, access, manipulate, file, reuse, and transmit information anywhere in the world instantly. The trick is to get the information into it. The scanner is the link between printed information and the computer. The scanner can take this information and turn it into a form that a computer can work with. It opens up the computer to an enormous and virtually unlimited source of input. With the power of the computer applied to this source of input, the possibilities are staggering. Information is power. Combine that computer processing power with the instantaneous and widespread capability of disseminating information on the Internet and the World Wide Web, and you have a potential for empowering the average person to a degree that's unprecedented in human history.
That's all impressive from a global standpoint, but for the average user, the question is: what can a scanner do for me? Here are some general uses:
I predict that scanners will soon become so important that they will be bundled with PCs, like modems and sound cards. A low-end scanner is about the same price as a high-end sound card, and for office use, a scanner is much more useful than a sound card. Printers and scanners complement each other, so it's likely they'll be bundled together with compatible resolutions.
I also predict that the huge memory requirements of scanned images will drive the need for more powerful computers with enhanced graphics-handling capability, more RAM, and more storage. In the storage area, this includes not only larger hard disk drives, but also high-capacity removable storage for archiving and transporting data. Standard 3.5" 1.44 MB diskettes are inadequate for this task.
As it becomes easier to incorporate color graphics into documents, demand will increase for high-quality color output devices, such as inkjet and color laser printers. The resolution and quality of both scanners and printers will increase until they meet or surpass film and high-quality printing processes.
Graphics made the World Wide Web the instant phenomenon that it is today. Scanners put more power into the hands of small office, academic, and home users. Scanned images will proliferate even more on Web pages (like this one), which will increase the demand for more bandwidth on the Web.
Scanners may displace dedicated fax machines in some office applications where fax demands are light and a computer, printer, and modem are already present. For multiple-page faxing, an auto-document feeder is useful. A scanner, with the appropriate hardware and software at the receiving end, can do something no ordinary fax machine can do: send high-resolution color faxes.
Digital cameras will proliferate and will displace film cameras to some degree. They won't, however, replace scanners for a long time. The resolution of a digital camera, in terms of the total number of points digitized, is orders of magnitude smaller than that of a scanner. A high-end digital camera, costing in the $1K+ range, will resolve 1000+ points across the entire image. A low-end scanner costing under $100 can resolve (with interpolation) 1000+ points in a fraction of an inch. A digital camera has the advantage of portability and immediacy and is very useful, but it serves a different purpose from a scanner. It's like the difference between RAM and hard disk storage: the two are complementary, not mutually exclusive. I do believe that digital cameras will become more popular as they get better and cheaper, but they'll displace film cameras, not scanners.
| Type | Usage |
|---|---|
| Handheld | Specialized applications, portable scanning of documents for OCR. Cheap, light, but limited in scan size and quality. |
| Sheet-Fed | Ideal for OCR of multiple pages of text. Many have auto-document feeders. Compact; some are portable; can be cheaper than flatbeds. Quality not quite up to the best flatbeds. Scan size theoretically unlimited in the vertical direction. Specialized variations include photo scanners, business card scanners, and combination keyboard-scanners. |
| Flatbed | Most versatile scanner. Can scan sheets, books, objects. Wide range of price and quality. Some have auto-document feeders and slide scanners as accessories. Takes a lot of desk space. |
| Film | For direct scans of negatives and slides, usually 35 mm. For professional photographic work of the highest quality. Compact, but more expensive than the above types. Has the widest dynamic range. |
| Drum | Highest quality for scans of sheets. Uses a photomultiplier tube. Has the highest resolution, dynamic range, and color fidelity. Extremely expensive; for graphic arts pros only. Usually owned by service bureaus. |
Scanners convert analog data (page images) to digital data. Digital data can have different bit depths depending on the application and scanner hardware:
Conditions:
Scanner: Microtek Scanmaker E3
Maximum resolution: 300 X 300 DPI optical resolution, 2400 X 2400 interpolated resolution
Interface: SCSI
Computer: Pentium 120 CPU
Memory: 49 MB RAM, 128K pipeline burst cache
Scan size: 8 1/2" X 11"
| Scan Type | Resolution | File Size | Scan Time |
|---|---|---|---|
| 1-bit B&W | 75 DPI | 258 KB | 13 secs |
| 1-bit B&W | 100 DPI | 458 KB | 18 secs |
| 1-bit B&W | 300 DPI | 4.11 MB | 65 secs |
| 1-bit B&W | 600 DPI | 16.44 MB | 175 secs |
| 24-bit Color | 75 DPI | 6.17 MB | 59 secs |
| 24-bit Color | 100 DPI | 10.96 MB | 81 secs |
| 24-bit Color | 300 DPI | 98 MB | * |
| 24-bit Color | 600 DPI | 394.4 MB | * |
| 24-bit Color | 1200 DPI | 1.58 GB | * |
| 24-bit Color | 2400 DPI | 2.4 GB | * |
* Not enough memory to scan.
This shows that scan time and memory required can vary tremendously with resolution. Color also takes much more time and memory than B&W. Scanning is processor and memory intensive, so the scan speed depends on the speed of the PC and the amount of RAM installed. The memory requirements are such that even if your scanner is capable of 1200 to 9600 DPI interpolated, you may find it impractical to use such high resolutions except for special purposes. (See Zooming In.)
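To see why memory climbs so quickly, here's a rough sketch that computes the raw pixel data for a page scan from its size, resolution, and bit depth. It's a simplification: it assumes uncompressed pixels, whole bytes per row, and no file header, so it won't reproduce the absolute figures my scanning software reported in the table above. What it does show is the square law: doubling the DPI quadruples the data, which is the same 4:1 ratio you see between the 300 DPI and 600 DPI rows. The function names are my own, for illustration.

```python
def scan_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a scan, truncated to whole pixels."""
    return int(width_in * dpi), int(height_in * dpi)


def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Bytes of raw pixel data, with each row rounded up to whole bytes.

    Assumption: no compression, no file header, no extra row padding.
    """
    width_px, height_px = scan_pixels(width_in, height_in, dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    # Same page size and resolutions as the test table (8 1/2" X 11").
    for bits, label in ((1, "1-bit B&W"), (24, "24-bit color")):
        for dpi in (75, 100, 300, 600, 1200, 2400):
            size = raw_scan_bytes(8.5, 11, dpi, bits)
            print(f"{label:13} {dpi:5} DPI {size / 1e6:10.2f} MB")
```

For example, a 24-bit color scan of a letter-size page at 300 DPI is 2550 X 3300 pixels, or about 25 MB of raw pixel data.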
Scan time may or may not be important, depending on your application. If you're using the scanner for photo imaging, you'll likely spend much more time editing and manipulating the photo scan than doing the scan itself. Scan time becomes important if you're scanning multiple pages for OCR or archiving purposes. However, for OCR work, the OCR processing time can be longer than the scan time.
One problem with trying to compare scan times is that there is no standard measurement. One manufacturer may quote the scan time at 300 DPI of a 4 X 6 photo, while another may quote the scan time for an 8 1/2 X 11 sheet at 100 DPI. Independent reviewers will test all the scanners they're reviewing under the same test conditions, but different reviewers may use different conditions. I've seen in some reviews that the relative ranking of the speed of different scanners will vary with the test conditions. The PC speed and configuration can have a big effect on the scan speed. Some scanners may have built-in intelligence and rely little on the PC speed, while others may heavily use PC resources and will be very sensitive to PC speed. Scanner drivers can vary in the way they use RAM and hard drive space to store scans, so the scan speed can be greatly affected by the amount of RAM or the speed of the PC's hard drive.
How much resolution do you need? It depends on your application. The more resolution, the more you pay, so you don't want to pay for resolution you don't need. However, if you have multiple applications for the scanner, get the resolution appropriate for the most demanding application you think you MIGHT have. Keep in mind that the important specification is image resolution. Image resolution is the resolution of the final output. That depends on the initial sample resolution and how much you blow it up or shrink it down. If you blow images up, you'll need more sample resolution than if you use them actual size. Conversely, if you shrink scanned images down, you need less sample resolution. If you don't know what you need, get the highest resolution scanner you can afford, or wait until the prices come down. Here are some rules of thumb on what scanning resolution is needed for different applications, based upon using the images actual size:
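The relationship between sample resolution and image resolution is just a ratio. A minimal sketch, assuming you know the DPI you want in the final output (300 DPI here, to match a 300 DPI printer like my HP660C) and the original and final widths; the function name is hypothetical:

```python
def required_scan_dpi(output_dpi, original_width_in, output_width_in):
    """Sample resolution needed so the final image has output_dpi at its final size.

    A scale factor above 1 (enlarging) needs more samples per inch of the
    original; below 1 (shrinking) needs fewer.
    """
    scale = output_width_in / original_width_in
    return output_dpi * scale


if __name__ == "__main__":
    # A 6" wide photo, printed at 300 DPI:
    print(required_scan_dpi(300, 6, 6))   # used at actual size
    print(required_scan_dpi(300, 6, 12))  # blown up to 12" wide
    print(required_scan_dpi(300, 6, 3))   # shrunk to 3" wide
```

Blowing a 6" photo up to 12" doubles the required scan resolution to 600 DPI; shrinking it to 3" halves it to 150 DPI.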
Scan size: 5.89" X 3.96" (4 X 6 photo)
DPI: 21
BMP size: 122,894 bytes
Color: 24-bit
Pixels: 246 X 166
| Quality | File Size (bytes) |
|---|---|
| 5% | 1,528 |
| 15% | 3,113 |
| 25% | 4,475 |
| 35% | 5,903 |
| 45% | 7,021 |
| 55% | 8,053 |
| 65% | 9,488 |
| 75% | 11,559 |
| 85% | 15,204 |
| 95% | 25,981 |
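Two numbers in this example can be checked with a little arithmetic. A 24-bit BMP stores 3 bytes per pixel and pads each row to a multiple of 4 bytes, and the common Windows format (assumed here) adds a 54-byte header; together these account exactly for the 122,894-byte file. Dividing that by each JPEG size gives the compression ratio. A quick sketch, with my own function names:

```python
# File sizes (bytes) from the JPEG quality table above.
JPEG_SIZES = {5: 1528, 15: 3113, 25: 4475, 35: 5903, 45: 7021,
              55: 8053, 65: 9488, 75: 11559, 85: 15204, 95: 25981}

# BITMAPFILEHEADER (14 bytes) + BITMAPINFOHEADER (40 bytes); 24-bit has no palette.
BMP_HEADER_BYTES = 14 + 40


def bmp_24bit_size(width_px, height_px):
    """Size of an uncompressed 24-bit BMP: 3 bytes/pixel, rows padded to 4 bytes."""
    row_bytes = width_px * 3
    padded_row = (row_bytes + 3) // 4 * 4
    return BMP_HEADER_BYTES + padded_row * height_px


if __name__ == "__main__":
    bmp = bmp_24bit_size(246, 166)
    print(f"BMP: {bmp:,} bytes")
    for quality, size in sorted(JPEG_SIZES.items()):
        print(f"{quality:3}% quality: {size:7,} bytes, {bmp / size:5.1f}:1")
```

At 75% quality, the JPEG is about 10.6 times smaller than the BMP; at 5%, about 80 times smaller; at 95%, less than 5 times smaller.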
JPEG at 75% quality, 11,559 bytes
This looks virtually identical to the original BMP file, but is much smaller. 75% quality is a safe compromise between quality and compression for most images.
JPEG at 55% quality, 8,053 bytes
This still looks acceptable, but some artifacts can be seen in the hills at the left and along the horizon. Some of the details are starting to get blurry. You can get away with this level of compression if you have an image with large details, or where distortions in the detail are not apparent, such as in pictures of trees or grass. Also, you shouldn't have detailed areas next to solid-color areas or else you'll see color artifacts in the solid areas.
JPEG at 25% quality, 4,475 bytes
The details along the horizon and left side are very blurry. Detail is lost on the hills. The sky shows some blockiness on the right side. It's marginally acceptable where the high degree of compression matters more than the quality.
JPEG at 15% quality, 3,113 bytes
This looks like a view through a wet window. Much detail is lost. The sky is severely blocky, with many color artifacts. This is probably unacceptable for this image, but there may be some images where this level of quality works. The only way to tell is to try it and see.
The raw image was somewhat blurry, so I sharpened it up with PhotoImpact. The sharpening process enhances the edges of objects, but it also introduces some "noise" into the picture. The original photograph was taken with a $40 point-and-shoot camera on 35 mm film, processed at a discount store, so it's not the sharpest original in the world. I don't know if it's true, but I read somewhere that mass-market photofinishers tend to print negatives slightly out of focus to hide dust, grain, and scratches. This means that if you scan these prints at high resolution, they'll be blurry. For maximum resolution and quality, dedicated negative scanners are the best, but that's only necessary if you're a very serious user or a graphic arts professional. The point is that if you're using the scanner for Web images, you don't need very much resolution. Even the cheapest scanners are adequate.
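I don't know exactly what filter PhotoImpact uses, but a basic sharpening filter shows why sharpening brings out both edges and noise. This sketch is a generic 3-tap kernel (-1, 3, -1) applied to one row of 8-bit grayscale values, not PhotoImpact's actual algorithm; it boosts each pixel by how much it differs from its neighbors:

```python
def sharpen_row(row):
    """Apply a 3-tap sharpening kernel (-1, 3, -1) to one row of 0-255 values.

    The kernel weights sum to 1, so flat areas are unchanged. A pixel at either
    end uses itself as its missing neighbor, and results are clipped to 0..255.
    """
    n = len(row)
    out = []
    for i, value in enumerate(row):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        sharpened = 3 * value - left - right
        out.append(min(255, max(0, sharpened)))
    return out


if __name__ == "__main__":
    edge = [50, 50, 50, 200, 200, 200]   # a dark-to-light edge
    grain = [100, 104, 100, 104, 100]    # a flat area with slight grain
    print(sharpen_row(edge))
    print(sharpen_row(grain))
```

The edge gets a dark undershoot and a bright overshoot on either side ([50, 50, 0, 255, 200, 200]), which makes it look crisper. But the 4-level ripple of grain in the flat area grows to a 20-level ripple ([96, 112, 92, 112, 96]), which is the "noise" that sharpening introduces.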
I would guess, based on the inherent design differences between sheet-fed and flatbed scanners, that flatbeds are more reliable. The sheet-feds have to handle paper and are more prone to jamming, just like printers. Paper tends to shed particles, which can clog the mechanics or dirty the optics. Dust and other contaminants can get into the innards of a sheet-fed more easily than a flatbed, which tends to be sealed up. It's like the difference between a floppy disk drive and a hard disk drive.
The component most likely to go out first is the scanner's lamp. Many scanners, like my E3, use a standard fluorescent lamp that stays on all the time to stabilize its color temperature. Other scanners use cold cathode lamps with 10,000-hour lives, which is probably beyond the useful life of the scanner. On the other hand, even though standard fluorescent lamps don't last as long as cold cathode lamps, they are cheap to replace, about $5.
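Whether a rated lamp life outlasts the scanner depends mostly on how many hours a day the lamp is lit. A back-of-the-envelope sketch (the daily usage figures are just examples, not measurements):

```python
def lamp_life_years(rated_hours, hours_lit_per_day):
    """Years until the rated lamp life is used up at a steady daily usage."""
    return rated_hours / (hours_lit_per_day * 365)


if __name__ == "__main__":
    for hours_per_day in (24, 8, 2):
        years = lamp_life_years(10_000, hours_per_day)
        print(f"10,000-hour lamp lit {hours_per_day:2} h/day: {years:4.1f} years")
```

A 10,000-hour lamp that's lit around the clock lasts a little over a year (10,000 / 8,760 hours), but at 2 hours a day it lasts almost 14 years.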
Most flatbed and sheet-fed scanners use CCD (charge-coupled device) arrays with a system of mirrors and lenses to project the scanned image onto the array. A recent innovation that does away with the mirrors and lenses is the CIS (contact image sensor). CIS uses a long, thin array of sensor elements next to a row of color LEDs that provide a light source. The advantage of CIS technology is that it allows the scanner to be very thin and light. It also uses less power, which can allow these scanners to be powered by batteries or by the power from the USB port. As with any new technology, it is going through some growing pains, so the image quality is still inferior to traditional CCD designs. This may change in the future. For most home and office uses, the space savings (in height, not desk area) and the lower power consumption are not a big deal, so there's no desperate need for this technology.
If you're a serious film photographer and want the ultimate in picture quality for print publishing, get a film scanner. Unfortunately, they're much more expensive than flatbeds and are much more specialized. The professional-level ones are over $1000, but prices are coming down to the point where lower-priced models are affordable by serious amateurs. They are not quite mass-market yet (and it's uncertain if they will ever be), so you don't see as many new models or such intense price competition as in the flatbed market. HP has one, the Photosmart, that is geared toward high-end consumers and is priced below $500. It can handle not only slides and negatives, but color prints up to 5X7. Other big players in this arena are the traditional camera companies, like Kodak, Nikon, Minolta, Polaroid, Konica, and Olympus. All have models above and below $1000. Microtek's ScanMaker 35T Plus is the venerable scanner manufacturer's slide scanner. (See the Scanner Manufacturers links below.) Color slides have tremendous dynamic range, much more so than prints. You need a good film scanner, preferably with a bit depth of 30 bits or more, to capture and take advantage of that dynamic range. Since film scanners are aimed at professionals or serious amateurs, performance is more important than ease of setup. That's why these scanners mostly have SCSI interfaces.
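Why 30 bits? A 30-bit scanner records 10 bits per color channel, or 1,024 levels. Expressing the ratio between the brightest and darkest levels in optical density units (log base 10) gives a theoretical ceiling on the density range a given bit depth can encode. This is an idealized sketch under that assumption; a real scanner's usable range is also limited by sensor noise and optics, so it will be lower:

```python
import math


def levels(bits_per_channel):
    """Number of distinct levels per color channel."""
    return 2 ** bits_per_channel


def max_density_range(bits_per_channel):
    """Theoretical density range (in D units) the bit depth can represent.

    Assumption: ideal, noise-free quantization, with the brightest-to-darkest
    ratio approximated as the number of levels.
    """
    return math.log10(levels(bits_per_channel))


if __name__ == "__main__":
    # 24-bit, 30-bit, and 36-bit color = 8, 10, and 12 bits per channel.
    for total_bits in (24, 30, 36):
        per_channel = total_bits // 3
        print(f"{total_bits}-bit color: {levels(per_channel):5} levels/channel, "
              f"max density range {max_density_range(per_channel):.2f}")
```

By this measure, 24-bit color tops out around 2.4 D, while 30-bit color reaches about 3.0 D.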
SCSI scanners may come with their own SCSI card. My E3, for instance, came with a low-end Adaptec SCSI card that only officially supports one device. Other SCSI scanners require you to provide your own SCSI card. SCSI cards can cost as little as $50 for a low-end model, to more than $200 for high-end cards. Some scanners have their own proprietary AT-bus interface cards; some are plug-and-play. In any case, unless you already have a SCSI card installed, you need to open up your PC, which can be a pain for some users. SCSI cards also require a precious interrupt, which may not be available if you already have a lot of peripherals on the system (not that the archaic PC architecture has a lot of interrupts to spare - don't get me started on a diatribe about this).
There are many different kinds and price ranges of SCSI interface cards. The cards that are typically bundled with low-cost scanners are very simple cards that are intended to be used only with the scanner. They run in programmed I/O mode, which means the CPU has to get involved with each byte of data transferred. The Adaptec card that came with my Microtek E3 is one such card. While scanning, it totally ties up the CPU, so all other processes are frozen. For more money, you can get a general-purpose bus-mastering SCSI interface card that will not tie up the CPU as much while scanning. It can also interface with more than one device. This is an advantage: if you can spare an interrupt to install the SCSI card, you can plug multiple devices into that SCSI card and use only that one interrupt. However, SCSI peripherals can be tricky to set up. There are several flavors of SCSI (SCSI-1, SCSI-2, Ultra-SCSI, wide SCSI, etc.), so you can run into complications if you try to drive different types of SCSI devices from the same card. Different types of SCSI use different connectors, which are not compatible. There are adapters available to convert from one type to another (e.g. 68-pin to 50-pin), but they tend to be expensive. SCSI also has cable-length limitations and termination requirements that you have to keep in mind.
If you already have an interrupt allocated for the parallel port, you don't need another interrupt to run a parallel port scanner. These scanners have pass-through connectors so you can hook up your printer to the same port. If you also have a parallel-port Zip drive, you could (theoretically) daisy-chain all three. However, this can cause compatibility problems, particularly with the printer. You could use a switch box, but that can also cause problems, depending on how picky your printer is about signal quality. You also have to remember to manually set the switch box to the right position. Like SCSI, the parallel interface has cable length limitations, but they are less well-defined than SCSI's. Usually, it's the printer that's the pickiest about the cabling.
The easiest interface solution, in my opinion, is the new Universal Serial Bus (USB). This bus is found on most new computers and is supported by Windows 98. At 12 million bits/sec, it's not as fast as SCSI, but it's much easier to set up and is faster than most parallel ports. It's also much more expandable. Theoretically, you could hook 127 USB devices to the same bus, not that you'd want to or could afford to. USB connectors are smaller and easier to hook up than parallel port or SCSI cables. They are also thinner and can be longer. The USB port provides power to low-powered peripherals like joysticks. Most scanners use too much power to be powered from the USB port, but with the new low-powered CIS technology, this may change. USB is relatively new, but more and more peripherals are coming out with it. A USB scanner would be my choice if I had a new PC and needed a new scanner. If you have an old PC without built-in USB ports, you can add a USB card. Newer Macs, like the iMac, also have USB ports. If you have a Mac, you need to make sure the software that comes with the scanner supports it.
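To put 12 million bits/sec in perspective, here's the minimum time to move an uncompressed 24-bit, 300 DPI letter-size scan (2550 X 3300 pixels, 25,245,000 bytes) over the bus. It's a best case that assumes the bus runs at its full nominal rate; protocol overhead and the scanner's own speed will make real transfers slower. The 480 Mbit/sec figure is the nominal high-speed rate of USB 2.0.

```python
def transfer_seconds(num_bytes, bits_per_second):
    """Best-case time to move num_bytes at a nominal bus rate (no overhead)."""
    return num_bytes * 8 / bits_per_second


if __name__ == "__main__":
    scan_bytes = 2550 * 3300 * 3  # 8.5" x 11" at 300 DPI, 3 bytes per pixel
    for name, rate in (("USB full speed (12 Mbit/s)", 12_000_000),
                       ("USB 2.0 high speed (480 Mbit/s)", 480_000_000)):
        print(f"{name}: {transfer_seconds(scan_bytes, rate):.2f} s")
```

That works out to about 16.8 seconds at 12 Mbit/sec, versus well under a second at 480 Mbit/sec.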
You have to decide what your priorities are and read the reviews and manufacturers' specs to see how different scanners compare in each of these factors. See the links below to go to those specs and reviews. Unless you have limited, specialized applications, the best type of scanner to get is a flatbed. It's the most versatile, but it does take a lot of space. Personally, I like my Scanmaker E3. It works great. The software bundle is very good, especially PhotoImpact. I haven't yet had a need for an optical resolution higher than 300 DPI, and my HP660C printer is limited to 300 DPI anyway. However, the E3 is not state of the art anymore. For a few dollars more, you can get a 600 X 300 scanner from several companies. 600 X 600 and even 600 X 1200 scanners at reasonable prices are becoming more common.
Optical character recognition is not an easy job. It requires a tremendous amount of computer power, and only with more powerful processors like the Pentium has it become practical for home computer use. If you consider all the thousands of fonts that are available, in all different sizes and spacings, the OCR program has to be very flexible to recognize them. Some characters are very difficult to tell apart, such as 1, I, l, 0, O, and o, especially with mixed fonts of different sizes. The better OCR programs can be "trained" to improve recognition quality, especially with unusual fonts. The quality of the original is also important for accurate recognition. Dirt or smudges on the original, or copies with blurry or broken letters, are also difficult to recognize. Decimal points can get lost if they're too small, or spots can appear as periods in the wrong place.
The hardware requirements for good OCR work are not too demanding. The scan resolution required depends on the size of the text. The smallest font size that can be scanned effectively is usually around 6 points. At this size, you need to scan at 300-400 DPI. For bigger fonts, you can and should scan at a lower resolution. Over-scanning doesn't help and can even hurt. The OCR program has to sift through all that scan data to recognize the characters, so there's no use overloading it with unnecessary samples. Normally, documents are scanned as line art.
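One way to read the 6-point rule of thumb: a point is 1/72 inch, so 6-point type scanned at 300 DPI gets 6 X 300 / 72 = 25 pixels per em. If you assume the goal is to give larger type about the same number of pixels (my assumption, not a rule from any OCR vendor), the scan resolution can drop in proportion to the font size. A sketch with hypothetical function names:

```python
POINTS_PER_INCH = 72


def pixels_per_em(point_size, dpi):
    """Pixels spanned by one em (the nominal font size) at a given scan resolution."""
    return point_size * dpi / POINTS_PER_INCH


def ocr_scan_dpi(point_size, reference_pt=6, reference_dpi=300):
    """Scan resolution giving point_size type the same pixels per em that
    reference_pt type gets at reference_dpi (6 pt at 300 DPI by default)."""
    return reference_dpi * reference_pt / point_size


if __name__ == "__main__":
    for pt in (6, 8, 10, 12):
        dpi = ocr_scan_dpi(pt)
        print(f"{pt:2}-point type: scan at about {dpi:.0f} DPI "
              f"({pixels_per_em(pt, dpi):.0f} pixels per em)")
```

Under that assumption, 10-point type needs about 180 DPI and 12-point type about 150 DPI.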
The accuracy of the OCR process depends mostly on the software. There are many OCR programs, and most scanners come bundled with one. The OCR software bundled with lower-end scanners is often a simpler or out-of-date version of a full-featured OCR package. Which is the best OCR program? New programs and updates appear regularly, so the situation changes constantly. In general, Caere's OmniPage and Xerox's TextBridge have gotten the best ratings. An old "lite" version of OmniPage came with my Scanmaker E3. It does a good job on text in the 10-point range. However, when I tried it on 6-point type, it made a lot of errors, though I didn't try optimizing it. I later tried the latest and greatest version of Xerox's TextBridge (on a UMAX 600P scanner) with the default setup, and it read the same text with few errors. Many OCR programs, including TextBridge, allow for training the OCR to improve its accuracy. An OCR program can be set up to feed text directly into a word processor, where it can appear as a menu item in the File menu.
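Comparing OCR programs by eye ("a lot of errors" versus "few errors") works, but if you want a number, a common approach is to count the minimum number of character insertions, deletions, and substitutions (the Levenshtein edit distance) between the OCR output and text you know is correct. A minimal sketch; the sample strings are made up to show the 1/l and 0/O confusions mentioned earlier:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_accuracy(truth, ocr_output):
    """Fraction of characters correct: 1 - errors / length of the true text."""
    return 1 - edit_distance(truth, ocr_output) / len(truth)


if __name__ == "__main__":
    truth = "Total: 10.50"
    ocr = "Tota1: lO50"  # l read as 1, 1 as l, 0 as O, decimal point lost
    print(edit_distance(truth, ocr), f"{character_accuracy(truth, ocr):.0%}")
```

Here the four errors in 12 characters give about 67% character accuracy.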
Recent hardware innovations that have popped up since the last update are the Universal Serial Bus interface and Contact Image Sensor (CIS) technology, which I discuss above. I also got a new computer with USB, but I haven't seen any reason to replace my good old reliable SCSI Microtek E3 with a hot new USB model.
I also added some more information about film scanners. I checked and deleted some of the dead links in the reviews links, but kept some of the old ones since some of the information is still valid, if not the prices. I updated the manufacturers links. Some manufacturers (like Storm recently) have died or changed names. New ones have popped up. The old reliable brands (HP, Microtek, Umax, and Mustek) are still plugging along. I added a few new retailers. I haven't gotten around to checking all the other links, so if there are a few dead ones, I'll prune them out later.
I admit that when I first created this page, I went overboard on animated GIFs and horizontal rules, and it's way too long. If I ever get time, I'll re-design and re-organize this page. I'm still dabbling with Web page styles. To see one of my latest pages, go to Bay Area Biking or the even more recent North Coast and Redwood Empire pages.
2009 Update Notes:
Geez, it's been 10 years since I updated this page. It's so old, I'm tempted to just start over. Instead, I just cleaned it up a bit, fixed the spelling errors, deleted lots of dead links, and added some new ones. The page is interesting from a historical perspective. While some of the details and predictions are a bit outdated, the principles are still valid. Many changes have occurred since this was first written. USB has become the only interface for scanners. USB 2.0 has increased the speed of the USB interface. SCSI and parallel port scanners are no longer available. All-in-one multi-function scanner-printer-faxes have gotten more popular and are cost- and space-saving alternatives to having a separate printer and scanner. Many small scanner companies have disappeared. Microtek has stopped selling to consumers in North America. Digital cameras have pretty much displaced film cameras. Konica and Minolta have exited the consumer camera and scanner business.
I got rid of my good old reliable SCSI Microtek E3 scanner and got a USB Epson 1650 and a Canon CanoScan CIS scanner. I use digital cameras for photography now, so I rarely use the scanners for scanning pictures, except for old photos to be used in digital picture frames. See my web page on outdoor photography for more on digital photography. I mostly use the scanners for scanning documents so I have a softcopy of them and can e-mail them.
Click here to go to Ron Horii's Bay Area Back Pages (lots of scanned photos)
Previous version: 12/11/97.
Latest update: 3/6/2009.