Our records indicate you downloaded OpenVINO™ AI inferencing software from Intel in the past. We wanted to make you aware that a new release of the OpenVINO™ toolkit is now available for you to upgrade to.
This release includes ongoing GPU optimizations for scalable LLM performance and NPU updates for simplified deployment.
Key Highlights:
More Gen AI coverage and framework integrations to minimize code changes
- New models supported
- On CPUs & GPUs: Qwen3-Embedding-0.6B, Qwen3-Reranker-0.6B, Mistral-Small-24B-Instruct-2501
- On NPUs: Gemma-3-4b-it and Qwen2.5-VL-3B-Instruct
- Preview: Mixture of Experts (MoE) models optimized for CPUs and GPUs, validated for Qwen3-30B-A3B
- GenAI pipeline integrations: Qwen3-Embedding-0.6B and Qwen3-Reranker-0.6B for enhanced retrieval/ranking, and Qwen2.5-VL-7B for video pipelines (a minimal embedding sketch follows this list).
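As a minimal sketch of the new retrieval integration, the snippet below embeds a few documents and a query with OpenVINO GenAI. It assumes the TextEmbeddingPipeline API available in recent OpenVINO GenAI releases; the model directory path is a placeholder for a Qwen3-Embedding-0.6B model exported to OpenVINO format.

    import openvino_genai as ov_genai

    # Placeholder path to an exported Qwen3-Embedding-0.6B model directory.
    embedder = ov_genai.TextEmbeddingPipeline("Qwen3-Embedding-0.6B-ov", "CPU")

    docs = [
        "OpenVINO deploys models across CPUs, GPUs, and NPUs.",
        "Qwen3-Reranker-0.6B scores query/document pairs for ranking.",
    ]
    doc_vectors = embedder.embed_documents(docs)  # one embedding per document
    query_vector = embedder.embed_query("Which devices does OpenVINO support?")

The query embedding can then be compared against the document embeddings (for example by cosine similarity) to rank results before reranking.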
Broader LLM model support and more model compression techniques
- Gold support for Windows ML* enables developers to deploy AI models and applications effortlessly across CPUs, GPUs, and NPUs on AI PCs powered by Intel® Core™ Ultra processors.
- The Neural Network Compression Framework (NNCF) ONNX backend now supports INT8 static post-training quantization (PTQ) and INT8/INT4 weight-only compression to ensure accuracy parity with OpenVINO IR format models. SmoothQuant algorithm support has been added for INT8 quantization (see the NNCF sketch after this list).
- Accelerated multi-token generation for GenAI, leveraging optimized GPU kernels to deliver faster inference, smarter KV-cache reuse, and scalable LLM performance.
- GPU plugin updates include improved performance with prefix caching for chat history scenarios and enhanced LLM accuracy with dynamic quantization support for INT8 (a prefix-caching sketch also follows this list).
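To illustrate the NNCF ONNX backend, here is a minimal sketch of INT8 static post-training quantization on an ONNX model; the model file name, input name ("input"), and random calibration data are placeholders for your own model and dataset.

    import numpy as np
    import onnx
    import nncf

    model = onnx.load("model.onnx")  # placeholder FP32 ONNX model

    # A small calibration set, mapped to the model's input name (assumed "input").
    samples = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(100)]
    calibration_dataset = nncf.Dataset(samples, lambda s: {"input": s})

    # INT8 static post-training quantization via the NNCF ONNX backend.
    quantized_model = nncf.quantize(model, calibration_dataset)
    onnx.save(quantized_model, "model_int8.onnx")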
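And a sketch of the prefix-caching path, assuming the SchedulerConfig knob exposed by OpenVINO GenAI; the model directory is a placeholder.

    import openvino_genai as ov_genai

    scheduler_config = ov_genai.SchedulerConfig()
    scheduler_config.enable_prefix_caching = True  # reuse KV-cache across chat turns

    # Placeholder model directory; prefix caching helps most in multi-turn chat.
    pipe = ov_genai.LLMPipeline("llm-model-ov", "GPU", scheduler_config=scheduler_config)
    print(pipe.generate("Hello!", max_new_tokens=32))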
More portability and performance to run AI at the edge, in the cloud, or locally
- Announcing support for Intel® Core™ Ultra Processor Series 3 (formerly codenamed Panther Lake)
- Encrypted blob format support has been added for secure model deployment with OpenVINO™ GenAI. Model weights and artifacts are stored and transmitted in an encrypted format, reducing the risk of IP theft during deployment. Developers can deploy with minimal code changes using OpenVINO GenAI pipelines (a loading sketch follows this list).
- OpenVINO™ Model Server and OpenVINO™ GenAI now extend support for agentic AI scenarios with new features such as output parsing, improved chat templates for reliable multi-turn interactions, and preview functionality for the Qwen3-30B-A3B model. OpenVINO™ Model Server also introduces a preview for audio endpoints.
- NPU deployment is simplified through batch support, which automatically reshapes models to batch size = 1 for compatibility with older driver versions, enabling seamless model execution across all Intel® Core™ Ultra processors regardless of driver version (a reshape sketch follows this list).
- The improved NVIDIA Triton Server* integration with the OpenVINO backend now enables developers to utilize Intel GPUs or NPUs for deployment.
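For the encrypted blob workflow, the sketch below loads decrypted IR from memory with the core OpenVINO API so plaintext artifacts never touch disk; decrypt_bytes() is a hypothetical user-supplied routine and the file names are placeholders. The OpenVINO GenAI pipelines build on the same mechanism with minimal extra code.

    import numpy as np
    import openvino as ov

    def decrypt_bytes(path: str) -> bytes:
        # Hypothetical decryption routine; substitute your own scheme (e.g. AES-GCM).
        with open(path, "rb") as f:
            return f.read()

    core = ov.Core()
    xml_bytes = decrypt_bytes("model.xml.enc")  # placeholder encrypted artifacts
    bin_bytes = decrypt_bytes("model.bin.enc")

    # read_model accepts in-memory IR plus a weights tensor.
    weights = ov.Tensor(np.frombuffer(bin_bytes, dtype=np.uint8))
    model = core.read_model(xml_bytes, weights)
    compiled = core.compile_model(model, "CPU")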
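The NPU batch handling described above amounts to pinning a static batch of 1 before compilation; here is a minimal sketch with a placeholder model and input shape.

    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")  # placeholder IR model

    # Pin batch size to 1, mirroring what the NPU plugin now does automatically.
    model.reshape([1, 3, 224, 224])  # placeholder input shape
    compiled = core.compile_model(model, "NPU")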