CDIA+ CompTIA Document Imaging Exam Notes

CompTIA CDIA+ Exam: 225-030
85 questions
Conventional, linear format.
90 minutes allotted time.
Passing Score: 700 out of 900 possible.

Goals define activities
Activities define documents and data
Documents and data define technology requirements

A process metric is an indicator of the process, e.g. how many, how fast. Metrics are taken before, during and after an implementation.

Business value vs Technical value
Technical value – ask if it is reliable, easy to maintain and cost efficient
Business value – When designing a solution, ask what is the strategic value, is it user friendly, is it used often, and what happens if the solution is not available.

Imaging Solution Project Team Members

Project Sponsor – Senior manager who is responsible for the project’s success.
Executive Review Committee – managers who support and provide direction for the project
Project team – Members of a team to provide requirements, support and commitment to the project
Project Manager – Coordinates team roles, meetings and schedules
Outside Consultants – Third party sources of expertise and experience, propose products and solutions
System Users – A user group is a subset of users with similar needs. Org charts can be used to identify user groups. High level skill assessments can categorize users to determine training and support needs.

Project Charter

The charter is documentation of the project scope, goals, a member roster, change management, escalation and risk mitigation.
Scope – definition of the high level goals and constraints of the project. The project scope determines the budget, timeframe and credibility of the project.
Scope Creep (requirement creep) – small changes to project goals and requirements that are outside of the original plan. These add up and throw original scheduling, requirements and timeframes off of the plan.

Requirements Document

The requirements document sums up information gathered during the research phase, and can be used to validate user needs and vendor proposals. The project sponsor should sign off on this document.
The requirements document contains:

  • Description of goal
  • Detailed features required
  • Document volume and retrieval patterns
  • System environment that the system must live in
  • Security specifications
  • Maintenance and support needs


Document Technologies

Role of the database layer – The database layer exists for organization, categorization and labeling of the converted images. This speeds lookup and retrieval of documents. This also provides centralized storage, and facilitates security permissions.

Imaging Technology – Converting, using scanning or rendition, hard copy or electronic data into electronically stored images. These images can be compressed to lessen storage needs. Technology can also be applied to the images to facilitate document management, such as bar codes, optical character recognition and ICR for indexing.

A DMS (Document Management System) is a product that provides an infrastructure for the digitized images. Software that provides this functionality can manage version control, library services, and security. The DMS may also provide an audit trail to show who accessed what documents and when.

A portal combines different information sources into a unified point of access to the document imaging system. The portal provides access to documents, and also may allow adding or removing documents, task management, security management and search capabilities.

Content Management systems are content-centric: the display of information is the goal, not the documents themselves. A CM solution stores its data in a database and can display the information independently from the original document.

A workflow automates the process behind the document imaging solution. Document routing, using defined rules about the steps required to process information, is the heart of the workflow. Roles are assigned to workers according to the functions they perform. Routing is defined to determine the actions that occur after each step is completed. Routing may need to go in multiple directions at once, or may follow a sequential path.
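
As a minimal sketch of rule-based routing, each completed step can map to the step or steps that follow it. The step names, roles and the ROUTES table here are hypothetical and not taken from any specific workflow product. One next step gives a sequential path; several next steps give routing in multiple directions at once.

```python
# Hypothetical routing table: completed step -> list of next steps.
# One next step is sequential routing; several is parallel routing.
ROUTES = {
    "scan": ["index"],
    "index": ["review_accounting", "review_legal"],  # parallel branch
    "review_accounting": ["archive"],
    "review_legal": ["archive"],
    "archive": [],  # end of the workflow
}

# Hypothetical role assignments: which role performs each step.
ROLES = {
    "scan": "scan operator",
    "index": "indexer",
    "review_accounting": "accounting clerk",
    "review_legal": "legal reviewer",
    "archive": "records manager",
}


def next_steps(completed_step):
    """Return (step, role) pairs to route the document to next."""
    return [(step, ROLES[step]) for step in ROUTES[completed_step]]


print(next_steps("scan"))
print(next_steps("index"))
```

Keeping the rules in a table rather than in code means the routing can be changed without touching the logic that applies it.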

Groupware – allows groups to simultaneously work on multiple items, and synchronize their tasks. Groupware allows for calendaring and communication, scheduling and management of tasks.

Knowledge management is the process of gathering the knowledge of the workforce and centralizing that information. Knowledge base applications are an example of this process. KM solutions include search capabilities, the ability to control versioning of documents, and document hierarchy.

EDI Electronic Data Interchange – Transferring information electronically, over a variety of communication interfaces- LAN/WAN, POTS and modems, database to database synchronization, XML data exchange.

ERM- Enterprise Report Management

RM Records Management – The systematic control of records. Produces reliable records over time, with an emphasis on the life cycle of data. The system should address creation of records, distribution, access, and document retention and destruction.

A Record is defined as anything written down and preserved, or an account of events.
A Vital Record is regularly referenced or required for day-to-day operations.
Unitized Record – All related materials are in the same location
Document Series – a collection of documents
Document Class – Set of document types with similar parameters
Document Type – Documents that are similar format or content
ODMA Open Document Management API – standardized Application Programming Interface for document management systems.
WebDAV Web-based Distributed Authoring and Versioning – Allows collaborative work on documents and the ability to track modifications and revisions.

Groups involved with records management – ARMA (Association of Records Managers and Administrators) ICRM (Institute of Certified Records Managers)

Microfilm and Microfiche Types
CAR Computer Assisted Retrieval 16mm film with a computer based index for the film. Uses indicator marks on the film itself to find images in the index.
Jacket – all record assets are placed on microfilm, these images are “jacketed” together to unitize the record.
COM Computer Output to Microfilm – information written directly to microfilm or microfiche, and indexed on a computer.
Aperture Cards – 35mm film slides are inserted to punch cards for each record. The card provides the index, the slide contains a large document (example: maps, blueprints)

Disaster Recovery vs. Business Continuity

Disaster Recovery Plan – A corporate plan to re-implement services in the event of an outage. Test the plan (and document the test) at least yearly. The DR plan should include a complete inventory of all devices.
Business Continuity Plan – Processes and methods to minimize business disruption. Should contain information about specific events, contracts and a contact list.
Redundancy – Multiple components designed for fail-over
Clustering – Strategy for redundancy and load balancing
Fault Tolerance – Operation is continued if a fault occurs

Hot Site – A fully equipped and operational data processing facility, ready to go. Very expensive.
Cold Site – Conditioned space, possibly with communications, environmental controls and power. No live data.
Warm Site – Conditioned space with communications, environmental controls and power. Equipment is in place. Data may be near line or brought in via removable media such as tape.

RAID – Redundant Array of Inexpensive Disks – multi disk arrays offer fault tolerance at the cost of speed and capacity.

SLA Service Level Agreement- This document defines what should be delivered, when the services will be delivered and how they will be provisioned. It should also detail response time, what is specifically covered and escalation procedures.

DM System Types

Archival systems replace media that is used to store seldom-used information. These systems lower the cost of storage and provide quick accessibility to information.
File and Retrieval systems replace active media systems. These systems increase productivity and reduce labor cost.
Strategic Business/Enterprise systems integrate the needs of the entire business and allow addressing the needs of the entire business more efficiently. These systems are often integrated into other IT systems.


One Time Costs – the amount of initial investment (hardware, licenses, training, conversion)
Ongoing Costs- recurring bills – labor, maintenance and support, supplies and media
Conversion Costs – The cost of converting the existing files and documents to the new system. Depending on implementation, this can be a recurring or one time expense.
ROI Return on Investment – The return gained from the system relative to its cost. The return can be financial or another quantitative measure.
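
The cost categories above can be combined into a simple ROI figure. This is a rough sketch using the common formula (benefit minus cost, divided by cost). The function names and all dollar figures are hypothetical, and a real business case would usually also account for the time value of money.

```python
def total_cost(one_time, ongoing_per_year, years):
    """One-time costs plus ongoing costs over the period."""
    return one_time + ongoing_per_year * years


def simple_roi(total_benefit, cost):
    """Simple ROI: net gain divided by cost, as a fraction."""
    return (total_benefit - cost) / cost


# Hypothetical figures, for illustration only.
cost = total_cost(one_time=50_000, ongoing_per_year=10_000, years=3)
roi = simple_roi(total_benefit=120_000, cost=cost)
print(cost)          # 80000
print(f"{roi:.0%}")  # 50%
```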

Three major types of Risk

Financial Risk – Includes investment risk, supplier risk, legal risk. Code Escrow can be used to mitigate the risk of a supplier going out of business.
Technical Risk – Proprietary integrated systems must have a long life. Can the network foundation support the endeavor?
Operational Risk- Will the users and business processes support the new imaging system?

Risk management steps:
Identify the risks
Analyze risks – the probability of the risk occurring
Plan the project and the mitigation of risks. Have a contingency plan
Implement the project
Track and Control the risks

How to design a solution

Designing A Solution: Identify the scope, the users, the database, hardware, interfaces and implementation. Use this information to create a conceptual design document, aka a Recommendation For Proposed Solution. Gear this towards the audience, with high level conceptual information for managers and more detailed for line workers.

Document Scanning And Information Insertion

Document Preparation – Prepping of the documents to be added to the document management system – Are the documents in order, free of staples and paper clips? Problems with misfeeds can lower quality and decrease the productivity of the system.

After the documents are scanned, where will they go? A safe strategy will consider retaining them for a period of time. Consider marking the documents as scanned somehow. The scanned documents should also be checked for quality.

Watermarking Images – A pattern inserted into a digital image. Watermarking can be used to assert copyright, to show an origination source or identify forged or stolen material.

Common Document file formats

ISO Latin1 (pure text) text only, small size, indexable
TIF (Tagged Image File) rasterized images of the original document. Large output files, but can be compressed
PDF (Portable Document Format) Freely readable and portable across multiple document systems. Image sizing can be controlled via quality sliders. Text elements are stored as characters. Format is owned by Adobe Systems.
DjVu – More compressed and less functionality than PDF. Owned by LizardTech
LDF (LuraDoc format) breaks the documents into scanned images and text layers.
GIF Graphics Interchange Format – line art representation of original document. Portable format.
JPG Joint Photographic Experts Group – compressed graphic images. JPG is a lossy format.
BMP Bitmap- Large file size in a portable image format. BMP is not lossy.

Image Enhancement /Image Processing add ins

Image enhancement can be performed by add in boards that increase the performance and abilities of the system. These boards can do additional image processing such as scaling, rotation and OCR among other tricks.

Image Enhancement Terms and Techniques
Adaptive Thresholding – in black and white conversion, the threshold where a black/white decision is made is adjusted dynamically according to contrast.
Thresholding – the threshold where a black/white decision is made. The level is set manually.
Cropping – removing borders or portions of an image
Deskewing – straightening out a skewed image.
Despeckling – removing stray speckles from an image, leaves a clean background.
Dithering – a technique that simulates shades of gray or color using patterns of dots. This can soften image edges and reduce effective resolution, but leaves a cleaner looking image.
Edge Detection – frames the target scan area by finding the document edge.
Inverting– reverses the colors, black to white.
Rotating – turning the image either to fit a format, or to fix the appearance of a scanned original.
Screen Scraping – capturing input from a specific area of the scanned image.
Background Dropout – dumping the background pattern or image during the scanning process. Saves space, leaves a clean image.
CAR/LAR Courtesy Amount Recognition/Legal Amount Recognition – In check scanning, the software compares the two amounts and returns the value.
ICR Intelligent Character Recognition – used to recognize handwriting. Learns and becomes more accurate as time goes on.
OCR Optical Character Recognition – reads character text in a scanned image.
OMR Optical Mark Recognition – reads checkmarks, boxes etc. on a form.
Forms Processing – The ability to read just the data from a filled out form, dropping all but the data and storing the text only. This may include the ability to act on the data inserted to the form.
Annotations– Additional notes outside of the document main.
Redactions – use of a black area over certain parts of a document, to keep sensitive information private.
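
To illustrate the difference between the Thresholding and Adaptive Thresholding entries above, here is a small pure-Python sketch. The function names, the sample pixel values and the local-mean rule are assumptions chosen for illustration; real imaging software uses more refined methods. The fixed version applies one manually chosen level to the whole image, while the adaptive version compares each pixel with the average of its neighborhood.

```python
def threshold(image, level):
    """Fixed thresholding: pixels at or above `level` become white (1)."""
    return [[1 if px >= level else 0 for px in row] for row in image]


def adaptive_threshold(image, radius=1):
    """Adaptive thresholding: compare each pixel with its neighborhood mean."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            # Collect the pixels in the window, clipped at the image border.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            out_row.append(1 if image[y][x] >= local_mean else 0)
        result.append(out_row)
    return result


# Tiny grayscale sample (0 = black, 255 = white): dark strokes in columns
# 1 and 4, on a background that is darker on the right half of the page.
page = [
    [200, 40, 200, 120, 20, 120],
    [200, 40, 200, 120, 20, 120],
]
print(threshold(page, 128))      # right half turns entirely black
print(adaptive_threshold(page))  # strokes in columns 1 and 4 survive
```

With the fixed level of 128 the shaded right half of the page is lost, while the adaptive version keeps both strokes readable.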

Scanner types and terms

Scanners are commonly available in flatbed or sheetfed form factors. An auto document feeder (ADF) allows for moving through a stack of documents easily. Scanners can also be hand-held, overhead, large format or film scanners. MICR scanners can read magnetic ink on checks. Scanners can scan into a binary large object (blob) or turn the document into machine readable text via OCR.

Scanners are rated on duty cycle and pages per minute (ppm). Duty cycle is the amount of use until maintenance. The ppm rating for a scanner is based on a certain resolution (dpi); a higher dpi will scan slower but improve OCR results. The resolution of an image is the number of dots per inch (dpi). The higher the dpi, the larger the file size.
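
The relationship between dpi and file size can be worked out directly: an uncompressed image needs one sample per dot, so size grows with the square of the resolution. The sketch below uses hypothetical function names and a hypothetical batch size, and it ignores compression and document preparation time, both of which change real-world numbers considerably.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size: pixel count times bits per pixel, in bytes."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel / 8


def scan_minutes(pages, ppm):
    """Time to scan a batch at the rated pages per minute."""
    return pages / ppm


# Letter-size page (8.5 x 11 in), bitonal (1 bit per pixel).
for dpi in (200, 300):
    size = uncompressed_bytes(8.5, 11, dpi, 1)
    print(dpi, "dpi:", round(size / 1024), "KB")

print(scan_minutes(pages=3000, ppm=60), "minutes")
```

Going from 200 to 300 dpi multiplies the uncompressed size by 2.25, which is why resolution is usually set only as high as OCR accuracy requires.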

Standardized Scanner Drivers – TWAIN and ISIS (Image and Scanner Interface Specification) – standards allow for programmers to have an expected feature set, abstracting the actual hardware layer. This allows developers to write to a single driver and use it across many equipment manufacturers.

Data Retrieval planning

Indexing refers to cataloging the documents in the document management system. Using index fields, it is possible to find documents based on the information they contain. The indexing scheme should identify the fields to be indexed. Index information can be manually keyed in or extracted automatically via barcodes, MICR, OCR, metadata, or other automated methods.

Retrieval Methods

Indexes – Find items based on document indexes
Full Text Retrieval – Search for word strings
Federated Search – Searching across multiple data sources
Category Search – Topical searching
Keyword Search – Specific words, can use Boolean operators
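
Two of the retrieval methods above, index lookup and keyword search with Boolean operators, can be sketched in a few lines. The sample documents, field names and function names are hypothetical; a real system would keep the index in a database rather than in memory.

```python
# Hypothetical documents: index fields plus OCR text.
documents = [
    {"id": 1, "fields": {"type": "invoice", "customer": "Acme"},
     "text": "invoice for scanner maintenance"},
    {"id": 2, "fields": {"type": "contract", "customer": "Acme"},
     "text": "maintenance contract renewal"},
    {"id": 3, "fields": {"type": "invoice", "customer": "Globex"},
     "text": "invoice for tape media"},
]


def find_by_index(field, value):
    """Index retrieval: match on an index field value."""
    return [d["id"] for d in documents if d["fields"].get(field) == value]


def build_keyword_index(docs):
    """Map each word to the set of document ids containing it."""
    index = {}
    for doc in docs:
        for word in doc["text"].lower().split():
            index.setdefault(word, set()).add(doc["id"])
    return index


keyword_index = build_keyword_index(documents)


def keyword_and(*words):
    """Boolean AND: documents containing every word."""
    sets = [keyword_index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []


def keyword_or(*words):
    """Boolean OR: documents containing any of the words."""
    sets = [keyword_index.get(w.lower(), set()) for w in words]
    return sorted(set.union(*sets)) if sets else []


print(find_by_index("customer", "Acme"))      # [1, 2]
print(keyword_and("invoice", "maintenance"))  # [1]
print(keyword_or("tape", "contract"))         # [2, 3]
```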

A data retrieval system needs to be user friendly and accessible for maximum user acceptance.

Data Bridge – the link between two disparate document systems, whether used as a one time conversion tool or for integration/join/data view purposes.

Storage options

Storage is a balance of speed against price. Devise a system that keeps infrequently accessed documents on slower, inexpensive media.

Storage formats have problems – corrupt media, securing information, obsolescence, capacity. Account for these issues with backup strategies in the DR plan. Migrate obsolete formats to newer technology.

WORM technology- Write Once Read Many – this technology allows for write protection. This may be a legal requirement, guarding against unauthorized changes. Using WORM media provides data integrity and non-repudiation.
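
The write-once behavior can be modeled in software to show the idea. This is an illustrative in-memory stand-in, not how WORM media is actually implemented; the class name, method names and file name are hypothetical. A second write to the same name is refused, and a hash of the stored content gives a simple integrity check.

```python
import hashlib


class WriteOnceStore:
    """In-memory stand-in for WORM media: each name can be written once."""

    def __init__(self):
        self._data = {}

    def write(self, name, content):
        if name in self._data:
            raise PermissionError(f"{name} is already written and cannot change")
        self._data[name] = bytes(content)

    def read(self, name):
        return self._data[name]

    def digest(self, name):
        """SHA-256 of the stored content, usable as an integrity check."""
        return hashlib.sha256(self._data[name]).hexdigest()


store = WriteOnceStore()
store.write("contract.tif", b"original scan")
try:
    store.write("contract.tif", b"altered scan")
    rejected = False
except PermissionError:
    rejected = True

print(rejected)                    # True
print(store.read("contract.tif"))  # b'original scan'
```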

Jukeboxes are used to load multiple media sources under a unified interface. JMS (jukebox management software) is used to control these subsystems, and manage archiving and indexing.

NAS Network Attached Storage
SAN Storage Area Network
Generally speaking, a SAN deals with block level access and NAS is file level access.
Storage Virtualization – Unification of storage: create a single pool managed across varied platforms and locations.

DASD Direct Access Storage Devices – An example is a hard drive. RAID is one type.

HSM Hierarchical Storage Management -software that migrates documents between on line and near line storage. The HSM software may leave an index file on the online storage system for speeding access to a migrated file.
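
A rough sketch of the migration rule an HSM might apply follows. The 90 day idle limit, the catalog layout and the function name are all assumptions for illustration. Documents that have not been accessed recently move to near line storage, and a stub is left online so the file can still be located.

```python
from datetime import date, timedelta


def migrate_stale(online, nearline, today, max_idle_days=90):
    """Move documents not accessed within max_idle_days to near-line storage.

    A stub entry is left online so a lookup can still locate the file.
    """
    cutoff = today - timedelta(days=max_idle_days)
    for name, entry in list(online.items()):
        if entry.get("stub"):
            continue  # already migrated
        if entry["last_access"] < cutoff:
            nearline[name] = entry
            online[name] = {"stub": True, "location": "nearline"}


# Hypothetical catalog entries.
online = {
    "invoice-001.tif": {"last_access": date(2024, 1, 5)},
    "invoice-002.tif": {"last_access": date(2024, 6, 20)},
}
nearline = {}
migrate_stale(online, nearline, today=date(2024, 7, 1))
print(online)
print(nearline)
```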

RAID – Redundant Array of Inexpensive Disks. A redundancy strategy that stripes or mirrors data across multiple drives. Not all RAID levels are fault tolerant. RAID can be implemented via software or hardware; hardware is more expensive but generally performs better.

Common RAID levels
RAID 0 – Stripe set, no parity. Fast, but more prone to failure.
RAID 1 – Mirrored disks. Faster reads, slower writes than a single disk.
RAID 5 – Disk striping with parity. 3 drives minimum. Fault tolerant and fast.
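
The capacity trade-off and the parity idea behind these levels can be shown with a short sketch. The function names and drive sizes are hypothetical, and the capacity rule assumes equal-size drives and a single mirror set for RAID 1. The XOR step shows why a striped set with parity can rebuild one lost block from the remaining blocks.

```python
def usable_capacity(level, drives, drive_size):
    """Usable capacity for the RAID levels listed above (equal-size drives)."""
    if level == 0:
        return drives * drive_size          # striping, no redundancy
    if level == 1:
        return drive_size                   # every drive holds the same data
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_size    # one drive's worth of parity
    raise ValueError("unsupported RAID level")


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together (RAID 5 style parity)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


print(usable_capacity(5, drives=4, drive_size=500))  # 1500

# Parity lets a lost block be rebuilt from the surviving blocks.
a, b = b"DOCS", b"SCAN"
parity = xor_blocks(a, b)
rebuilt_b = xor_blocks(a, parity)
print(rebuilt_b)  # b'SCAN'
```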

Tape Storage – Good for offsiting and cold storage. Less expensive solution, slow writes and reads. Common formats are DAT and DLT.

Optical Disc – DVD and CD using either rewritable or write once technology. Discs can be written Disc At Once or Track At Once, and packet writing software allows files to be added incrementally. Once a disc is finalized it cannot be added to. Longevity may be an issue for burned CDs and DVDs.

Conversion Strategies

No Conversion – Let sleeping dogs lie. Leave the data on the original media.
Day Forward – The old data stays where it is; new data is brought into the new system.
Day Forward with On Demand – New documents are scanned. Old data is scanned when it is needed.
Back File Conversion – Either partial or complete, brings old documents into the system.

In House vs. Out of House Conversion – Outside firms have trained staff and specialized hardware, and must guarantee quality (at a high cost). In house conversion can be cheaper on the whole, but must consider costs in labor, equipment, training and quality control.

System design and Implementation

Prototypes can be built to simulate a new system. They are useful for turning up unconsidered factors and reducing risk for project stakeholders. A pilot is not a prototype: pilot programs are small scale rollouts of a finalized solution. Prototype systems use test data and do not intrude on active systems and business processes. After testing with the prototype system, gather feedback on what worked and what didn’t. Use this opportunity to set expectations and address issues that arose during the test.

Implementation Planning – Try to break the implementation into phases to reduce the impact of change. Project management software can be used to show dates, milestones and deliverables. This should be controlled by the implementation manager. The plan should define what constitutes a successful rollout.
System Testing
UAT User Acceptance Testing – Does the product function as expected?
Performance Testing – Does the product perform as expected?
Load Testing – Does system performance suffer unexpectedly under a heavy load?
Stress Testing – Overloading the system in an attempt to produce a failure.
Regression Testing – Verifying all bugs have been fixed and no new ones have arisen.

Test Plan -how the test will be undertaken. Includes the specifications of the test, the data used in the test, how it will be used, a record of the test procedure, and a report on findings.

System Documentation
The implementation of the system should include technical, system and user documentation.

Documentation is part of the overall deliverable system. The documentation needs to be expandable and updated when needed.
System Administration: Backups/DR, user accounts and security, maintenance and troubleshooting
User: Help docs, user guides
Customization: changes made to the base system for implementation

The system users will require the most detailed training, with the initial rollout training being the most intensive. Evaluate the user group for skills and resistance to change. User skills can be assessed in multiple ways including surveys, observation and testing.

Industry Acronyms

ADF Auto Document Feeder
ADO ActiveX Data Objects
ADU Automatic Duplexing Unit
AP Accounts Payable
API Application Programming Interface
AVI Audio Video Interleave
B2B Business to Business
B2C Business to Customer
BMP Bit Map
CAR Computer Assisted Retrieval
CD Compact Disc
CDR Compact Disc Recordable
CEO Chief Executive Officer
CFO Chief Financial Officer
CIO Chief Information Officer
CMYK Cyan, Magenta, Yellow, Black
COLD Computer Output to Laser Disk
CRM Customer Relationship Management
CSS Cascading Style Sheets
DDS Digital Data Storage
DIS Document Imaging Solution
DLT Digital Linear Tape
DM/DI Document Management / Document Imaging
DMS Document Management System
DMS/DIS Document Management System/Document Imaging System
dpi dots per inch
DRM Digital Rights Management
DSL Digital Subscriber Line
DVD Digital Video Disc
DVD-R Digital Video Disc-Recordable
DVD+R Digital Video Disc+Recordable
ECC Error Correction Control
EDM Electronic Document Management
EDMS Electronic Document Management System
EDS Electronic Document System
ERP Enterprise Resource Planning
FTP File Transfer Protocol
GB Gigabyte
GIF Graphics Interchange Format
HSM Hierarchical Storage Management
HTTP Hypertext Transfer Protocol
ICR Intelligent Character Recognition
IDE Integrated Drive Electronics
IMS Image Management System
ISDN Integrated Services Digital Network
ISIS Image and Scanner Interface Specification
IT Information Technology
JDBC Java Database Connectivity
JPEG Joint Photographic Experts Group
JSP JavaServer Pages
KB Kilobyte
LAN Local Area Network
LDAP Lightweight Directory Access Protocol
MB Megabyte
MFD Multi-functional Device
MICR Magnetic Ink Character Recognition
MO Magneto-Optical
NAS Network Attached Storage
OCR Optical Character Recognition
ODBC Open Database Connectivity
ODMA Open Document Management API
OMR Optical Mark Recognition
PCL Printer Control Language
PDL Page Description Language
PKI Public Key Infrastructure
POP3 Post Office Protocol version 3
ppm pages per minute
PS Postscript
RAID Redundant Array of Independent Disks
RFI Request For Information
RFP Request For Proposal
RGB Red, Green, Blue
ROI Return on Investment
SAN Storage Area Network
SCSI Small Computer System Interface
SGML Standard Generalized Markup Language
SME Subject Matter Expert
SMTP Simple Mail Transfer Protocol
SNMP Simple Network Management Protocol
SQL Structured Query Language
SSL Secure Sockets Layer
TCP/IP Transmission Control Protocol / Internet Protocol
TFTP Trivial File Transfer Protocol
TIF Tagged Image File Format
USB Universal Serial Bus
VPN Virtual Private Network
WAN Wide Area Network
WCMS Web Content Management System
WORM Write Once, Read Many
XML Extensible Markup Language