
Magistral-Small-2509: Open-Source AI with Vision Support

TL;DR

Magistral‑Small‑2509 is an open‑source 24‑billion‑parameter language‑and‑vision model released by Mistral in mid‑September 2025. The model adds multimodal input and reasoning features using special [THINK] tokens and a long 128k context window (huggingface.co). A free Hugging Face demo exists but has tight ZeroGPU quotas, so testing is limited. The API costs $0.50 per million input tokens and $1.50 per million output tokens (venturebeat.com), cheaper than Claude Sonnet 4.5 but more expensive than DeepSeek.

What It Does

Magistral‑Small‑2509 is Mistral’s September 2025 release of its small reasoning model. It supports both text and images, uses [THINK] tokens to lay out intermediate reasoning steps, and delivers the final answer in a separate segment (huggingface.co). The model claims improvements over v1.1 in reasoning and tone and offers multilingual support across more than 20 languages (huggingface.co). It is licensed under Apache 2.0 (huggingface.co), so it can be self‑hosted. A Gradio demo is available, but it runs on a shared free GPU pool with strict quotas, so users may see errors after a single request. Full access is through Mistral’s API or by running the model locally with quantized weights.

Who It’s For / Not For

For:

  • Developers and researchers needing a self‑hostable reasoning model with vision support.

  • Teams wanting to build agents or RAG systems and keep costs lower than premium models.

  • Users comfortable running a 24‑B parameter model on a high‑end GPU or paying for API access.

Not for:

  • Casual users seeking a plug‑and‑play chat assistant—there is no polished consumer UI and free demos are throttled.

  • Organizations requiring guaranteed zero‑data retention for chat; Mistral’s ZDR is unavailable on Le Chat (help.mistral.ai).

  • Anyone who needs the highest performance available; larger models or closed models like Claude Sonnet 4.5 will outperform it.

Hands‑On Test

Setup

  1. Navigated to the public Gradio demo at akhaliq-magistral-small-2509.hf.space (no account needed).

  2. The chat UI loads instantly but runs on the free ZeroGPU queue. I typed a simple question about France’s capital.

  3. The first response was returned after ~12 s; on a subsequent message the demo showed a “ZeroGPU quota exceeded” banner and refused further requests—typical of free Hugging Face spaces.

Core workflow

  • Input: typed a question. The model responded with intermediate reasoning enclosed in [THINK] tags followed by a “Final answer” segment, confirming that the [THINK] mechanism is visible in its output.

  • Result: the answer correctly identified Paris as France’s capital and mentioned the Eiffel Tower’s history. Reasoning steps were logical but verbose.

  • Image input: the demo allows image uploads, but the quota restriction prevented testing vision.
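The visible [THINK] reasoning can be separated from the final answer with a small parser. A minimal sketch, assuming the reasoning is delimited by [THINK]…[/THINK] markers as described on the model card (serving stacks may strip or rename these tokens):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a Magistral-style response into (reasoning, final answer).

    Assumes the reasoning is wrapped in [THINK]...[/THINK] markers;
    adjust the pattern if your serving stack renames these tokens.
    """
    match = re.search(r"\[THINK\](.*?)\[/THINK\]", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()  # no reasoning block found
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer

raw_output = "[THINK]The user asks for France's capital. That is Paris.[/THINK]Final answer: Paris."
reasoning, answer = split_reasoning(raw_output)
```

Keeping the reasoning and answer separate makes it easy to log chains of thought for debugging while showing end users only the final segment.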

Export/Share

  • There is no built‑in export. Users can copy text manually or call the API to retrieve output programmatically.

  • The model’s Apache 2.0 license allows running it on local infrastructure.
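Retrieving output programmatically instead of copying from the demo UI can be sketched as below. The chat-completions endpoint is Mistral's standard one, but the model id "magistral-small-2509" and the exact payload shape are assumptions; check the API reference before use:

```python
import json
import os

# Hypothetical sketch: fetch a completion from Mistral's API rather than
# copying text out of the Gradio demo. Model id is an assumption.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(prompt: str, model: str = "magistral-small-2509") -> dict:
    """Build the JSON payload for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("What is the capital of France?")
headers = {
    "Authorization": f"Bearer {os.environ.get('MISTRAL_API_KEY', '')}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)  # send with any HTTP client, e.g. urllib or requests
```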

Rough performance notes

  • Latency for a single reply was around 12 s on the free demo.

  • Subsequent calls were blocked due to GPU quota limits.

  • On a single RTX 4090, the model reportedly fits with quantization (venturebeat.com), so self‑hosting should offer better throughput.
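The single-4090 claim checks out on a back-of-envelope basis. A rough weight-only memory estimate (ignoring KV cache and activation overhead, which add several more GB):

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Weight-only memory footprint in GB; ignores KV cache and activations."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

full = weight_memory_gb(24, 16)   # bf16: 48 GB, too big for a 24 GB card
quant = weight_memory_gb(24, 4)   # 4-bit quantized: 12 GB, fits a 4090
```

So full-precision weights alone exceed a 4090's 24 GB, while 4-bit quantization leaves roughly half the card free for KV cache and activations.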

Pricing (as tested)

| Plan | Inference cost | Notes |
| --- | --- | --- |
| Free demo | $0 | Hugging Face Space; limited by ZeroGPU quotas, roughly one request per day. |
| Magistral Small API (venturebeat.com) | $0.50 per million input tokens; $1.50 per million output tokens | Pay‑as‑you‑go; no free tier beyond a small credit. |
| Magistral Medium API (venturebeat.com) | $2 per million input tokens; $5 per million output tokens | Larger model. |
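To make the rates concrete, a quick sketch of per-request cost at the published prices (the token counts here are illustrative):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price: float, out_price: float) -> float:
    """Cost of one request given per-million-token prices in USD."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Magistral Small vs. Medium for a 10k-input / 2k-output request:
small = request_cost_usd(10_000, 2_000, 0.50, 1.50)   # about $0.008
medium = request_cost_usd(10_000, 2_000, 2.00, 5.00)  # about $0.03
```

At these rates a million such requests on Small would run roughly $8,000, versus $30,000 on Medium, which is the gap that makes the smaller model attractive for high-volume agent or RAG workloads.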

Privacy & Security

| Aspect | Summary |
| --- | --- |
| Data retention | Uploaded documents in Mistral’s ecosystem are stored on European cloud infrastructure only while part of an active library and are permanently deleted when removed or when an account is closed (help.mistral.ai). Le Chat conversations are retained until users delete them; zero‑data retention is not available on Le Chat, though it can be requested on the API platform (help.mistral.ai). |
| Model/provider disclosure | Mistral openly publishes weights and states that Magistral Small 1.2 is licensed Apache 2.0 (huggingface.co). |
| Compliance claims | Mistral replicates encrypted backups across multiple EU zones and allows clients to request a SOC 2 report (intercom.help). |
| Notable policies | Vision uploads are optional; users can remove documents or conversations to delete data (help.mistral.ai). |

Strengths

  • Open and self‑hostable: Apache 2.0 license and publicly available weights allow local deployment.

  • Reasoning visibility: [THINK] tokens display the chain of thought before the final answer, which can aid debugging or prompt‑engineering.

  • Multimodal support: Accepts both text and images in one model (huggingface.co).

  • Lower price than some competitors: Input/output pricing is cheaper than Claude Sonnet 4.5 (venturebeat.com).

Gaps

  • Limited demo access: The free Hugging Face space rapidly hits a ZeroGPU quota, making testing difficult; there is no official web app for consumers.

  • Inconsistent performance: On the free queue responses are slow; running locally requires expensive hardware (at least a 24 GB GPU).

  • No built‑in sharing/export: Outputs must be copied manually or via API.

  • Privacy opt‑outs: Zero‑data retention is not available on Le Chat and must be requested separately (help.mistral.ai).

  • Quality vs. large models: Reasoning quality is decent but not on par with top models like Claude Sonnet 4.5.

Alternatives (Quick Compare)

| Tool | Why pick it | Why skip it |
| --- | --- | --- |
| DeepSeek V3.2‑Exp | Cheaper input cost: $0.028 per million tokens on cache hits and $0.42 per million for outputs (venturebeat.com). Supports long contexts and caching for cost savings. | Uses sparse attention; may require caching logic to get the low price; not as open. |
| Claude Sonnet 4.5 | Top‑tier reasoning and coding performance with built‑in agent tools and code execution (anthropic.com). Consumer‑friendly interface with memory and file management. | Expensive at $3/$15 per million tokens (anthropic.com); proprietary closed model; no self‑hosting. |
| Apertus 8B | Fully transparent Swiss LLM with open weights and data; smaller (8B), so easier to run locally (swiss-ai.org). | Lacks vision support; still early, and performance may be weaker. |

Verdict

Magistral‑Small‑2509 is an interesting open‑source model that adds vision and explicit reasoning to Mistral’s lineup. It targets developers who want to build agentic workflows or RAG systems and need an affordable API or self‑hosted alternative. However, the lack of a stable consumer UI and the tiny free demo limit its accessibility. Teams with serious workloads will need to provision GPUs or pay for the API. For hobbyists or marketers who need plug‑and‑play chat, DeepSeek or Claude may offer better usability. Overall, Magistral Small is a promising step toward transparent multimodal reasoning but remains a tool for tinkerers rather than end users.

Media

  1. Mistral wind icon — A stylized “M” with swirling winds evoking the Mistral name — Generated via imagegen tool (internal file).

    5ff5f6d6-fb32-4b33-a051-e8d94c027a01.png

  2. Workflow diagram — Simple flow chart showing user input, the model’s [THINK] reasoning, final answer, and export step — Generated via imagegen tool (internal file).

    0724626a-2b2c-4ca7-bf1d-d1c6f02897b4.png

Sources

  • Mistral AI. Magistral‑Small‑2509 model card — Describes updates over v1.1, including multimodality, improved reasoning with the [THINK] prompt, multilingual support, the Apache 2.0 license, and the 128k context (huggingface.co, accessed {{today_iso}}).

  • Mistral AI. Changelog — Confirms that Magistral Small 1.2 was released on 18 Sept 2025 (docs.mistral.ai, accessed {{today_iso}}).

  • VentureBeat. Mistral rolls out Magistral Small and Medium 1.2 models with vision — Explains that the model can run on a single RTX 4090, notes API pricing of $0.50 input/$1.50 output for Small and $2/$5 for Medium, and compares costs to DeepSeek (venturebeat.com, accessed {{today_iso}}).

  • Mistral AI Help Center. How are uploaded documents stored? — States that documents are stored on EU cloud infrastructure and deleted when removed or when the account is closed (help.mistral.ai, accessed {{today_iso}}).

  • Mistral AI Help Center. What does Zero Data Retention mean? — Notes that ZDR is unavailable on Le Chat and requires a request on the API platform (help.mistral.ai, accessed {{today_iso}}).

  • Mistral AI Help Center (screenshot). Data security — Says that data is safeguarded with encrypted backups across EU zones and that clients can request a SOC 2 report (intercom.help, accessed {{today_iso}}).

  • Vals AI. Magistral Small 1.2 cost analysis — Shows a cost of $0.50 per million input tokens and $1.50 per million output tokens (vals.ai, accessed {{today_iso}}).

  • VentureBeat. DeepSeek V3.2‑Exp — Reports an input cost of $0.028 per million tokens (cache hits) and $0.42 per million output tokens (venturebeat.com, accessed {{today_iso}}).

  • Anthropic. Introducing Claude Sonnet 4.5 — Notes that Sonnet 4.5 retains Sonnet 4 pricing at $3 per million input tokens and $15 per million output tokens (anthropic.com, accessed {{today_iso}}).

  • Swiss AI Initiative. Apertus LLM announcement — Explains that Apertus is an open‑weight model trained on 15 trillion tokens, with a chat interface requiring login (swiss-ai.org, accessed {{today_iso}}).

Author

VanQuicktech
