Glossary Intelligent Document Processing

Below is a glossary of essential terms to help you understand and maximize intelligent document automation.

Francesco Cavina

CEO & Co-Founder

IDP Glossary: essential terms in Intelligent Document Processing

The world of Intelligent Document Processing uses technical terms related to OCR, artificial intelligence, data extraction, and document automation. In recent years, this vocabulary has also expanded to include concepts related to language and multimodal models, such as LLMs, SLMs, VLMs, SVLMs, MLLMs, and document-based RAG.

In this glossary, you will find the essential definitions needed to understand how an IDP platform works and which concepts are most important in business document digitization processes.

Basic IDP Concepts

Intelligent Document Processing

Intelligent Document Processing, often abbreviated as IDP, is a set of technologies that enables the capture, classification, reading, and extraction of data from digital or paper documents.

Unlike simple OCR, an IDP solution does not merely convert an image into text; it interprets the content of the document and returns structured data that can be validated and integrated into business systems.

IDP is used to automate the management of invoices, utility bills, identity documents, passports, payslips, contracts, forms, and other recurring documents.

More information about IDP

Optical Character Recognition

Optical Character Recognition, or OCR, is a technology that recognizes letters, numbers, and symbols in a document and converts them into machine-readable digital text.

OCR is a fundamental component of document automation, but on its own it is not sufficient to understand the meaning of data. For this reason, in modern IDP platforms, it is combined with artificial intelligence, machine learning, NLP, and advanced extraction models.

More Information OCR

Zonal OCR

Zonal OCR is an OCR technique that extracts text only from specific, predefined areas of a document.

Instead of reading the entire page, the system focuses on precise zones, such as the “total,” “date,” “document number,” or “tax code” field.

This technique can increase accuracy on documents with a fixed layout, but it is less effective on semi-structured or variable documents, where information may change position.

Template matching

Template matching is a technique that compares a document with a predefined template in order to identify fields and information in known positions.

It is effective on structured or highly standardized documents, such as forms that are always the same, but becomes less suitable when layouts change frequently.

In modern IDP platforms, template matching can be complemented or replaced by more flexible AI models, capable of understanding content even when the position of the data varies.

Intelligent Character Recognition

Intelligent Character Recognition, or ICR, is an evolution of OCR specialized in recognizing handwritten text.

It is more complex than traditional OCR because it must handle the variability of handwriting. It is used, for example, on manually completed forms, annotations, paper documents, and extended signatures.

Document AI

Document AI refers to the broader field that applies artificial intelligence to the understanding, processing, and generation of content from documents.

It includes technologies such as OCR, automatic classification, data extraction, semantic analysis, intelligent validation, and multimodal models.

While IDP is the application-level solution that transforms documents into business-ready data, Document AI is the technological discipline that makes this process possible.

Document Automation

Document automation is the process that makes it possible to reduce or eliminate manual activities related to document management.

It can include capturing, reading, classifying, extracting, verifying, and sending data to management software, ERP systems, CRM platforms, or corporate databases.

IDP represents an advanced form of document automation because it uses artificial intelligence to manage variable and complex documents as well.

More information about Document Automation

Types of Documents

Structured Document

A structured document follows a fixed and predictable schema. The layout, fields, position of information, and format are generally the same across different copies of the same document.

An example of a structured document is an identity card, where first name, last name, date of birth, document number, and other data are found in defined areas.

Semi-Structured Document

A semi-structured document contains recurring information, but not always in the same position or in the same format.

This is typical of invoices, utility bills, or payslips. In these documents, fields such as date, amount, tax code, VAT number, or document number are often present, but they may change position depending on the supplier, model, or layout.

Unstructured Document

An unstructured document does not follow a fixed schema and may contain information distributed freely throughout the text.

Contracts, letters, reports, formal communications, and narrative documents are examples of unstructured documents.

To process them correctly, an IDP solution must understand the context and meaning of the information, not just its position in the document.

More information about Structured, Semi-Structured, and Unstructured Documents

Native PDF and Scanned PDF

A native PDF, or born-digital PDF, is generated directly by software and contains text that is already selectable and searchable.

A scanned PDF, on the other hand, is essentially an image of the document and requires OCR to extract the text.

This distinction is important because it affects the accuracy and type of processing required: a low-quality scanned document is more prone to reading errors than a native PDF.

The Stages of Intelligent Document Processing

Data capture

Data capture is the process of acquiring data from documents, images, PDFs, emails, or other channels.

In IDP, it represents one of the first stages of the workflow, because it makes it possible to transform a document into a processable data source.

Pre-processing

Pre-processing is the document preparation stage that takes place before extraction.

It may include operations such as image deskewing, noise removal, contrast enhancement, or page separation.

Good pre-processing improves the quality of OCR and subsequent extraction, especially on scanned or low-quality documents.

Document parsing

Document parsing is the process that analyzes the structure of a document in order to separate, recognize, and organize the information it contains.

It may involve paragraphs, tables, headers, rows, fields, and sections of the document.

Parsing is a technical stage within the IDP workflow, not a synonym for it: it is used to transform a complex document into ordered data on which classification and extraction can then operate.

Document Classification

Document classification consists of automatically assigning one or more categories to a document.

For example, an IDP platform can recognize whether a file is an invoice, a utility bill, an identity document, a payslip, or a contract.

This step is important because it makes it possible to apply the most suitable extraction model to each document type.

Document indexing

Document indexing is the process of assigning metadata, labels, and search keys to a document.

It is used to make documents easier to archive, find, and retrieve within a document management system.

In the IDP context, indexing can be based on automatically extracted data, such as document type, date, customer name, case number, tax code, supplier, or processing status.

Information Extraction

Information extraction is the process of identifying and retrieving specific data from a document.

It may involve personal data, amounts, dates, tax codes, VAT numbers, document numbers, addresses, or other relevant information.

In IDP, data extraction is not based only on the position of the text, but also on the meaning of the fields.

Data Validation

Data validation is the check that verifies whether the extracted information is correct, complete, and consistent.

It can be automatic, manual, or hybrid.

For example, a system can check whether a tax code has a valid structure, whether a date is consistent, or whether the total amount of an invoice matches the sum of the amounts shown in the document.

Post-processing

Post-processing is the stage that follows data extraction.

It is used to normalize, correct, validate, or enrich information before it is sent to business systems.

For example, a date can be converted into a standard format, or an amount can be checked against other values present in the document.

Human-in-the-loop

Human-in-the-loop, often abbreviated as HITL, refers to the process in which a human operator intervenes to verify, correct, or confirm the results produced by an artificial intelligence model.

This approach combines the speed of automation with human control, increasing data reliability in more complex or ambiguous cases.

Extracted Data and Output Structure

Key-Value Pair

A key-value pair is a data structure composed of a label, the key, and its corresponding value.

Example:

Invoice total → €1,220.00

It is one of the main formats in which IDP returns extracted data, because it makes each piece of information immediately identifiable and integrable into business systems.

Line-item

A line item is a single row in a table or list within a document, such as the rows of an invoice or receipt.

Extracting line items requires recognizing the table structure and the correct association between columns and values, such as description, quantity, price, and total.

Table Extraction

Table extraction is the process that identifies and reconstructs tables present in a document, preserving the relationship between rows, columns, and cells.

It is one of the most complex IDP tasks, because tables vary greatly in layout and may extend across multiple pages.

Signature Recognition

Signature recognition is the ability to identify the presence, position, and, where applicable, status of a signature within a document.

It is useful, for example, for verifying that a contract or form has been signed before continuing with the workflow.

Entity

An entity is a meaningful piece of information identified within a text: names of people, companies, dates, amounts, addresses, or codes.

Named Entity Recognition, or NER, is the technique that automatically identifies and classifies these entities, and it is one of the building blocks of data extraction in documents.

Extraction Schema

An extraction schema is the definition of the fields that the system must retrieve from a specific type of document, together with the expected format for each field.

It works as a reference data model: for example, it establishes that an invoice should include the extraction of invoice number, date, supplier, taxable amount, VAT, and total.

Bounding Box

A bounding box is the coordinate-based rectangle that identifies the exact position of an element, such as a field, word, or table, within the document image.

It is the basis of visual grounding, because it makes it possible to link each extracted data point to the precise location from which it originates.

AI Technologies and Models

Algorithms

An algorithm is an ordered sequence of instructions that enables a computer system to perform a task or solve a problem.

In the context of Intelligent Document Processing, algorithms are used to classify documents, recognize patterns, extract data, compare information, and automate decisions within the document workflow.

Machine Learning

Machine Learning is a branch of artificial intelligence that enables systems to learn from examples and data.

In IDP, it is used to recognize document types, identify relevant fields, and progressively improve extraction accuracy.

Supervised Learning

Supervised learning is a Machine Learning technique in which a model is trained using already labeled data.

In the IDP context, it can be used to teach the system to recognize specific document types or fields, such as invoices, tax codes, amounts, dates, or document numbers.

The model learns by comparing input examples with the expected correct output and progressively improves its predictive capability.

Unsupervised Learning

Unsupervised learning is a Machine Learning technique in which the model analyzes unlabeled data to identify patterns, similarities, or recurring groups.

In IDP, it can be useful for automatically recognizing similar documents, identifying layout clusters, detecting anomalies, or organizing large document archives without having to manually define every category in advance.

Reinforcement Learning

Reinforcement learning is a Machine Learning method in which a model learns through positive or negative feedback received from the environment.

In the document context, it is less common than supervised learning, but it can be applied in advanced scenarios where the system must progressively optimize its decisions, for example by choosing the best validation path or improving the workflow based on previous outcomes.

Deep Learning

Deep Learning is an advanced Machine Learning technique based on neural networks.

It is particularly useful in the processing of complex documents, images, variable layouts, and text that is not always perfectly readable.

In IDP, it is often used to improve OCR, classification, layout understanding, and field recognition.

Natural Language Processing

Natural Language Processing, or NLP, is the technology that enables software to analyze and understand human language.

In IDP, it is used to interpret texts, identify entities, understand relationships between words, and recognize the meaning of information contained in documents.

Computer Vision

Computer Vision is a technology that enables systems to analyze images and visual content.

In IDP, it is used to recognize layouts, tables, signatures, stamps, fields, sections, and other graphic elements present in documents.

Large Language Model

A Large Language Model, or LLM, is an artificial intelligence model trained on large amounts of text to understand and generate natural language.

In IDP, LLMs make it possible to interpret the meaning of documents, not just the position of the text.

However, when used in a generic way, LLMs can be costly, slow, and prone to hallucinations. For this reason, in document contexts, smaller and more specialized models are often preferred.

Small Language Model

A Small Language Model, or SLM, is a smaller language model specialized for specific tasks.

Compared with a general-purpose LLM, it offers greater speed, lower costs, and more control, while maintaining high accuracy in the domain for which it is optimized.

In IDP, it can be used, for example, for a specific document category, such as invoices, utility bills, or payslips.

Vision-Language Model

A Vision-Language Model, or VLM, is a model that processes text and images together, “seeing” the document instead of only reading its transcribed text.

This allows it to interpret layouts, tables, field positions, and graphic elements without depending exclusively on OCR.

Multimodal models are the foundation of the most recent IDP solutions, because they treat the document in its full visual and textual form.

Small Vision-Language Model

A Small Vision-Language Model, or SVLM, is a lighter and more specialized version of a vision-language model.

It combines visual and linguistic understanding, but with a more efficient architecture than large multimodal models.

In the IDP context, SVLMs are useful when large volumes of documents need to be analyzed while maintaining fast response times and low processing costs.

They can support tasks such as layout recognition, field detection, table reading, scanned document checking, and visual verification of extracted information.

Compared with a general-purpose VLM, an SVLM can be optimized for specific domains, such as invoices, utility bills, identity documents, passports, or payslips, improving the balance between accuracy, speed, and operating cost.

More information about SVLMs

Multimodal Large Language Model

A Multimodal Large Language Model, or MLLM, is a large-scale model capable of processing multiple types of input, such as text, images, and documents.

In IDP, it can read textual content, interpret the visual structure of the page, and reason about the relationships between fields, tables, signatures, stamps, and sections of the document.

MLLMs are particularly useful for complex or poorly standardized documents, but they require strict controls over costs, response times, security, and the risk of hallucination.

Document Embeddings

Document embeddings are numerical representations of the content of a document, sentence, or field.

They allow AI systems to compare information based on meaning, not only on the exact presence of words.

In IDP, they can be used for semantic search, document classification, deduplication, comparison between similar documents, and retrieval of relevant information within corporate archives.

Document RAG

Retrieval-Augmented Generation, or RAG, is an approach that combines generative models with information retrieval from document sources.

Before generating an answer, the system searches for relevant content in the available documents and uses it as context.

In the IDP context, RAG can be useful for querying document archives, answering questions about contracts, manuals, or business cases, and reducing the risk that the model produces answers not grounded in the original documents.

Generative AI

Generative AI refers to artificial intelligence models capable of generating new content, such as text, structured data, or summaries, starting from an input.

In IDP, it can be used to extract and reorganize information in natural language, but it requires appropriate controls to avoid hallucinations and ensure data reliability.

Model Training

Model training is the process through which an artificial intelligence system learns to recognize patterns, fields, and information from real examples.

The more representative and high-quality the training data is, the greater the model’s accuracy will be in real-world cases.

Fine-Tuning

Fine-tuning is the specialization of a pre-trained model on a specific task or domain through an additional set of targeted examples.

In IDP, it makes it possible to adapt a general model to a particular type of document or industry, improving its accuracy without having to train it from scratch.

Generalization

Generalization is the ability of an artificial intelligence model to work correctly even on new data that differs from the data used during training.

In IDP, this is an important characteristic because real documents can vary greatly in layout, quality, format, and content.

A model with good generalization can extract data even from documents it has never seen before, while a model that is too closely tied to training examples risks working well only on cases very similar to those it already knows.

Overfitting

Overfitting occurs when a model learns the details of the training data too well, including noise, exceptions, or characteristics that cannot be generalized.

In these cases, the model may achieve good results on the documents used for training, but perform poorly on new documents.

In the IDP context, overfitting can lead to unreliable extractions when layouts, suppliers, formats, or scanning conditions change.

Zero-Shot and Few-Shot

Zero-shot and few-shot describe a model’s ability to perform a task with no examples or with only a few examples provided at the time of the request, respectively.

In IDP, they are useful for handling new or rare documents for which dedicated training is not available.

Prompt

A prompt is the instruction or input provided to a generative model in order to obtain the desired result.

In document extraction, a well-constructed prompt guides the model to return the required fields in the correct format.

Token

A token is the minimum unit of text that a model processes. It may correspond to a word, part of a word, or a symbol.

The number of tokens affects the costs and processing times of a language model.

Quality, Control, and Accuracy

Confidence score

A confidence score is a score that indicates the level of certainty with which the system has recognized or extracted a data point.

A high value indicates greater reliability, while a low value may require human review.

For example, if the system extracts a tax code with a low confidence score, it can send it to an operator for verification.

Accuracy

Accuracy measures how correct the results produced by the system are compared with the real data.

In IDP, it is a fundamental indicator for evaluating the effectiveness of a document automation solution.

Precision, Recall, and F1

Precision, recall, and F1 are metrics used to evaluate extraction quality more precisely than overall accuracy.

Precision indicates how many of the extracted data points are correct.

Recall indicates how many of the data points present in the document were actually retrieved.

F1 is the harmonic mean of precision and recall and provides a synthetic, balanced measure.

Error Rate

The error rate indicates the percentage of data recognized or extracted incorrectly.

Reducing the error rate is one of the main objectives of an IDP platform, especially in business processes where data quality is critical.

Human Error

Human error is an unintentional mistake caused by a person during a manual activity.

It may involve data entry, incorrect reading of a document, duplication of information, or omissions.

Document automation reduces the risk of human error in repetitive, high-volume processes.

Hallucination

Hallucination is the phenomenon in which a generative model produces plausible information that is not present in the document.

In document extraction, it is a critical risk, because an invented data point may appear correct even though it has no basis in the source.

Techniques such as visual grounding, confidence scores, and validation help contain this risk, ensuring that every extracted data point is verifiable.

Visual Grounding

Visual grounding, or source anchoring, is the system’s ability to link each extracted data point to the precise location in the document from which it originates.

This can be done through coordinates, boxes, lines, or highlighted areas in the document.

It makes it possible to verify the origin of each piece of information and makes AI more transparent, preventing the system from behaving like a black box.

Benchmark

A benchmark is a standardized set of documents and metrics used to measure and compare the performance of different systems on the same task.

In IDP, it is used to objectively evaluate the accuracy of OCR, classification, and extraction.

Ground Truth

Ground truth is the set of correct, manually verified data against which a system’s results are measured.

It is essential both for training models and for evaluating their accuracy reliably.

Integration, Business, and Compliance

Document Workflow

A document workflow is the operational flow that a document follows within an organization.

It may include receipt, classification, data extraction, approval, archiving, and delivery to other systems.

IDP makes it possible to automate many stages of the document workflow.

Business Process Management

Business Process Management, or BPM, is the set of methods and tools used to design, manage, monitor, and optimize business processes.

In the IDP context, BPM can orchestrate the document workflow end to end: document receipt, classification, data extraction, validation, approval, and delivery to business systems.

The integration between IDP and BPM makes it possible to transform document automation into a structured, measurable, and controllable process.

Straight-Through Processing

Straight-through processing, or STP, indicates the percentage of documents processed end to end without any human intervention.

It is often also referred to as the automation rate.

It is one of the most important value indicators of an IDP solution, because it measures how much manual work is actually eliminated.

API

An API is an interface that allows two software systems to communicate with each other.

In IDP, APIs make it possible to connect the document extraction platform to management systems, ERP systems, CRM platforms, databases, or other business systems.

Data Mapping

Data mapping is the process that defines the correspondence between data extracted from a document and the fields present in another system.

For example, the “Invoice total” field extracted from an invoice can be linked to the “amount_total” field of an ERP or business management system.

In the IDP context, data mapping is essential for correctly integrating extracted information into destination systems, avoiding association errors or data loss.

ERP

An ERP is a management system used by companies to administer processes such as accounting, purchasing, sales, inventory, human resources, and production.

An IDP platform can send data extracted from documents directly to the ERP, reducing manual data entry.

RPA

Robotic Process Automation, or RPA, is a technology that automates repetitive tasks performed on software and digital interfaces.

It can be integrated with IDP to automate broader processes, such as accounts payable invoice management or data entry into management systems.

Manual Data Entry

Manual data entry is the manual input of data by an operator.

It is often a repetitive, slow, and error-prone activity.

IDP makes it possible to reduce manual data entry by automatically transforming documents into structured data.

Low-Code / No-Code

Low-code and no-code refer to approaches that make it possible to create applications, workflows, or software configurations with little or no traditional programming.

Low-code platforms require minimal technical skills, while no-code platforms are also designed for business users who are not developers.

In the IDP context, low-code and no-code tools can allow users to configure extraction models, validation rules, approval workflows, and integrations without having to write complex code.

Cloud and On-Premise

Cloud and on-premise refer to two software delivery models.

In the cloud model, the platform is hosted on servers managed by the provider and accessible via the internet.

In the on-premise model, on the other hand, it is installed on the customer’s internal infrastructure.

The choice affects security, maintenance, updates, and data control, and should be evaluated based on industry requirements and applicable regulations.

Data Sovereignty

Data sovereignty is the principle whereby data remains subject to the legislation and infrastructure of a specific jurisdiction.

For example, a European company may require that data be stored on European servers in compliance with the GDPR.

It is a central requirement in regulated industries and in the processing of documents containing personal data.

GDPR

The GDPR, General Data Protection Regulation, is the European regulation on the protection of personal data.

In IDP, it is relevant because processed documents often contain personal and sensitive data, whose processing must comply with specific requirements for security, retention, and transparency.

Audit Trail

An audit trail is the chronological record of all operations performed on a document or on a data point.

It makes it possible to know who did what, when, and with what outcome.

It ensures process traceability and is important for internal controls, regulatory compliance, and accountability management.

Articles in the same category

Make or buy in IDP: how to choose the right document automation solution

Build or buy an IDP platform? a practical guide to assessing costs, timelines, scalability, and risks when choosing the best document automation solution.

Read it now

OCR vs IDP: differences and which technology to choose

OCR and IDP are two key technologies for document automation: OCR makes it possible to read text from images and PDFs, while IDP understands document content and transforms it into structured data ready for business processes.

Read it now

What Is Document AI? Its Evolution Over the Years and Main Tasks

Document AI represents the evolution of technologies designed to understand, classify, extract, and generate data from documents, from rule-based systems to multimodal models and complete IDP platforms.

Read it now

What is artificial intelligence and why is it important for businesses

Artificial intelligence helps businesses automate tasks, analyze data, manage documents, and make processes more efficient. In this article, we explore what AI is, how it works, and where it can generate real value within a company.

Read it now

What is OCR and how has it evolved: from traditional techniques to Vision Language Models

OCR converts text from images and PDFs into digital content, but today it's only the first step. With VLM and IDP, advanced systems don't just read: they understand documents, structure data and enable automation.

Read it now

Small Vision Language Models (SVLM): what they are and why they are transforming document processing

Small Vision Language Models (SVLM) are artificial intelligence models capable of simultaneously processing visual and textual content. Born as a compact evolution of generalist VLMo, they are used in numerous domains.

Read it now