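Granite 4.0 3B Vision: A Vision Model for Enterprise Documents
Granite 4.0 3B Vision is a compact vision-language model built for enterprise document understanding. It is strong at parsing complex tables, turning charts into structured output, and extracting semantic key-value pairs. It ships as a LoRA adapter paired with Granite 4.0 Micro, which keeps vision and language modular, supports text-only fallback and integration into mixed pipelines, and makes it easier to extract structured information reliably in enterprise settings.
Original content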
Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents
Published March 31, 2026
By Madison Lee, Rogerio Feris, Eli Schwartz, Dhiraj Joshi, Pengyuan Li, and Isaac Sanchez (ibm-granite)
Today we’re excited to announce Granite 4.0 3B Vision, a compact vision-language model (VLM) designed for enterprise document understanding. It’s purpose-built for reliable information extraction from complex documents, forms, and structured visuals. Granite 4.0 3B Vision excels at the following capabilities:
Table Extraction: Accurately parsing complex table structures (e.g., multi-row, multi-column) from document images
Chart Understanding: Converting charts and figures into structured machine-readable formats, summaries, or executable code
Semantic Key-Value Pair (KVP) Extraction: Identifying and grounding semantically meaningful key-value field pairs across diverse document layouts
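To make "multi-row, multi-column" concrete, here is a minimal sketch of what a parsed table with spanning cells has to resolve to. The cell record layout `(row, col, rowspan, colspan, text)`, the helper name `expand_spans`, and the sample values are all hypothetical and made up for illustration; the article does not specify the model's output format.

```python
# Illustrative only: the cell record format below is an assumption,
# not the output format of Granite 4.0 3B Vision.

def expand_spans(cells, n_rows, n_cols):
    """Expand cells with row/column spans into a dense n_rows x n_cols grid."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for row, col, rowspan, colspan, text in cells:
        # Copy the cell text into every grid position the span covers.
        for r in range(row, row + rowspan):
            for c in range(col, col + colspan):
                grid[r][c] = text
    return grid


# Made-up sample table with a two-row header.
cells = [
    (0, 0, 2, 1, "Region"),    # header cell spanning two rows
    (0, 1, 1, 2, "Revenue"),   # header cell spanning two columns
    (1, 1, 1, 1, "Q1"),
    (1, 2, 1, 1, "Q2"),
    (2, 0, 1, 1, "EMEA"),
    (2, 1, 1, 1, "10"),
    (2, 2, 1, 1, "12"),
]

grid = expand_spans(cells, n_rows=3, n_cols=3)
for line in grid:
    print(line)
```

A flat, row-by-row reading of such a table loses the link between "Revenue" and both quarter columns, which is why spanning cells are the hard part of table extraction.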
The model ships as a LoRA adapter on top of Granite 4.0 Micro, our dense language model, keeping vision and language modular for text-only fallbacks and seamless integration into mixed pipelines. It continues to support vision-language tasks such as producing detailed natural-language descriptions from images (e.g., “Describe this image in detail”). The model can be used standalone or in tandem with Docling to enhance document processing pipelines with deep visual understanding capabilities.
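The modularity described above follows from how LoRA works in general: the base weights stay frozen and the adapter adds a low-rank update that can be switched off. The following is a minimal plain-Python sketch of that idea, not the Granite implementation. The class name `LoRALinear`, the `adapter_enabled` flag, and the tiny matrices are hypothetical, and how the released model actually routes text-only requests is not described here.

```python
# Generic LoRA illustration; every name and number here is an assumption.

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen base weight W plus an optional low-rank update (alpha / r) * B @ A."""

    def __init__(self, w, a, b, alpha):
        self.w = w                    # base weight, shape (out, in), never modified
        self.a = a                    # down-projection, shape (r, in)
        self.b = b                    # up-projection, shape (out, r)
        self.scale = alpha / len(a)   # r is the number of rows of A
        self.adapter_enabled = True

    def forward(self, x):
        y = matvec(self.w, x)
        if self.adapter_enabled:
            # Low-rank path: project down to rank r, then back up.
            delta = matvec(self.b, matvec(self.a, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y


layer = LoRALinear(
    w=[[1.0, 0.0], [0.0, 1.0]],   # identity as a stand-in base weight
    a=[[1.0, 1.0]],               # rank r = 1
    b=[[0.5], [-0.5]],
    alpha=2.0,
)
x = [1.0, 2.0]

with_adapter = layer.forward(x)   # base output plus the adapter's update
layer.adapter_enabled = False
base_only = layer.forward(x)      # identical to the untouched base layer
```

Because the base weights are never modified, disabling the adapter recovers the base model's behavior exactly, which is the property a text-only fallback relies on.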
How Granite 4.0 3B Vision Was Built
Granite 4.0 3B Vision’s performance is the result of three key investments: a purpose-built chart understanding dataset constructed via a novel code-guided data augmentation approach, a novel variant of the DeepStack architecture that enables high-detail visual feature injection, and a modular design that keeps the model practical for enterpri