IBM's Granite 4.1 Fits 512,000 Tokens in Memory

May 1, 2026


Published: May 1, 2026 at 12:15 AM

Updated: May 1, 2026 at 12:15 AM


What happened

IBM just launched Granite 4.1, a family of open-source models with a 512,000-token context window, meaning they can hold roughly 400,000 words in working memory at once. That's enough to process a full novel or dozens of research papers in a single pass. The 8B-parameter version matches IBM's previous flagship model while staying small enough to run cheaply. Instead of the trendy mixture-of-experts approach, which routes each token through only a few specialized sub-networks, Granite 4.1 uses an old-fashioned dense architecture in which every parameter is active for every token. The models were trained in five phases, including reinforcement learning on math, code, and structured queries. They ship under Apache 2.0, so you can use them commercially without licensing headaches.
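The dense versus mixture-of-experts distinction is easier to see in a toy layer. This is a minimal pure-Python sketch, not IBM's code: the sizes, the top-2 routing, and every helper name (`dense_layer`, `moe_layer`, `param_counts`) are hypothetical choices for illustration. A dense layer touches all of its weights for every token, while an MoE layer lets a router pick a few experts and skips the rest.

```python
import math
import random


def matvec(weights, x):
    # Multiply a matrix (stored as a list of rows) by the vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def dense_layer(weights, x):
    # Dense: one weight matrix processes every token, so every parameter is used.
    return matvec(weights, x)


def moe_layer(experts, router, x, top_k=2):
    # Mixture-of-experts (toy version): the router scores each expert for this
    # token, only the top_k highest-scoring experts run, and their outputs are
    # blended with softmax weights computed over the chosen experts' scores.
    scores = matvec(router, x)
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    peak = max(scores[i] for i in chosen)  # subtract the max for a stable softmax
    gates = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(gates)
    out = [0.0] * len(experts[0])
    for gate, i in zip(gates, chosen):
        y = matvec(experts[i], x)
        out = [o + (gate / total) * yi for o, yi in zip(out, y)]
    return out, chosen


def param_counts(d, n_experts, top_k):
    # (total, active per token) for the toy MoE layer: expert matrices plus router.
    total = n_experts * d * d + n_experts * d
    active = top_k * d * d + n_experts * d
    return total, active


def random_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d, n_experts = 4, 8
x = [random.uniform(-1, 1) for _ in range(d)]
dense_w = random_matrix(d, d)
experts = [random_matrix(d, d) for _ in range(n_experts)]
router = random_matrix(n_experts, d)

y_dense = dense_layer(dense_w, x)
y_moe, used = moe_layer(experts, router, x, top_k=2)
print("dense params used per token:", d * d)            # 16 of 16
print("MoE experts used:", used)                         # 2 of 8 experts
print("MoE (total, active) params:", param_counts(d, n_experts, top_k=2))  # (160, 64)
```

In the dense layer, parameter count and compute per token are the same number; in the MoE layer, most parameters sit idle for any given token. That is the trade-off the article's "old-fashioned" label refers to.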

Why it matters

Translation: because a long report fits in a single prompt, your document Q&A bot can finally answer "what did page 47 say about revenue recognition?" without forgetting page 12.
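To make the page-47 example concrete, here is a minimal sketch of a pre-flight check a document Q&A pipeline might run before sending a whole report in one prompt. It assumes the article's own ratio of roughly 400,000 words per 512,000 tokens and uses word counts as a stand-in for a real tokenizer; the helper names and the 2,000-token answer reserve are hypothetical.

```python
import math

CONTEXT_TOKENS = 512_000             # context length reported for Granite 4.1
WORDS_PER_TOKEN = 400_000 / 512_000  # ~0.78, taken from the article's estimate (assumption)


def estimate_tokens(text):
    # Rough heuristic: word count divided by words per token. A real pipeline
    # should count tokens with the model's own tokenizer instead.
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def build_prompt(pages, question):
    # Label each page so an answer can point back to "page 47" or "page 12".
    body = "\n\n".join(f"[Page {n}]\n{text}" for n, text in enumerate(pages, start=1))
    return f"{body}\n\nQuestion: {question}"


def fits_in_context(prompt, reserve_for_answer=2_000):
    # Keep headroom for the model's reply; the 2,000-token reserve is arbitrary.
    return estimate_tokens(prompt) + reserve_for_answer <= CONTEXT_TOKENS


question = "What did page 47 say about revenue recognition?"
report = [" ".join(["word"] * 500) for _ in range(300)]    # 300 pages x 500 words
archive = [" ".join(["word"] * 500) for _ in range(1000)]  # 1,000 pages x 500 words

print(fits_in_context(build_prompt(report, question)))   # True
print(fits_in_context(build_prompt(archive, question)))  # False
```

The page labels are what let the model cite a specific page; the budget check only decides whether the whole document can go in at once or still needs to be split.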
