Sunday, 04 May 2025 23:48

Intel® Extension for PyTorch* v2.7.10+xpu

Install
pip install intel-extension-for-pytorch==2.7.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
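After installing, a quick sanity check is to import PyTorch and the extension and ask PyTorch whether an XPU device is visible. The sketch below assumes a PyTorch build that exposes the torch.xpu module. The helper name xpu_environment_report is hypothetical. Both imports are guarded, so the script also runs on machines without the XPU stack.

```python
import importlib


def xpu_environment_report():
    """Return a small dict describing the local PyTorch / extension / XPU setup."""
    report = {"torch": None, "ipex": None, "xpu_available": False}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report  # PyTorch itself is not installed
    report["torch"] = str(torch.__version__)
    try:
        ipex = importlib.import_module("intel_extension_for_pytorch")
        report["ipex"] = str(ipex.__version__)
    except ImportError:
        pass  # extension missing; torch alone may still expose XPU
    # torch.xpu exists in recent PyTorch releases; guard for older builds.
    xpu = getattr(torch, "xpu", None)
    report["xpu_available"] = bool(xpu is not None and xpu.is_available())
    return report


if __name__ == "__main__":
    print(xpu_environment_report())
```

On a working setup the "ipex" entry should show the installed extension version, and "xpu_available" should be True.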


Intel® Extension for PyTorch* v2.7.10+xpu is a new release that supports Intel® GPU platforms (Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors, and Intel® Data Center GPU Max Series) and is based on PyTorch* 2.7.0.

Highlights

  • Intel® oneDNN v3.7.1 integration
  • Large Language Model (LLM) optimization
    Intel® Extension for PyTorch* optimizes typical LLMs such as Llama 2, Llama 3, Phi-3-mini, Qwen2, and GLM-4 on the Intel® Arc™ Graphics family. Compared to the previous release, newly optimized LLM inference models on Intel® Data Center GPU Max Series platforms include Llama 3.3, Phi-3.5-mini, Qwen2.5, and Mistral-7B. A full list of optimized models can be found in the LLM Optimizations Overview. The supported transformers version has been updated to 4.48.3.
  • Serving framework support
    Intel® Extension for PyTorch* offers extensive support for serving ecosystems, including vLLM and TGI, to improve performance and flexibility for LLM workloads on Intel® GPU platforms (intensively verified on Intel® Data Center GPU Max Series and Intel® Arc™ B-Series graphics on Linux). vLLM/TGI features such as chunked prefill and MoE (Mixture of Experts) are backed by kernels provided in Intel® Extension for PyTorch*. This release adds sliding window support to ipex.llm.modules.PagedAttention.flash_attn_varlen_func to meet the needs of models such as Phi3 and Mistral, which enable sliding window attention by default.
  • [Prototype] QLoRA/LoRA finetuning using BitsAndBytes
    Intel® Extension for PyTorch* supports QLoRA/LoRA finetuning with BitsAndBytes on Intel® GPU platforms. This release includes several enhancements for better performance and functionality:
    • The performance of the NF4 dequantize kernel has been improved by approximately 4.4× to 5.6× across different shapes compared to the previous release.
    • _int_mm support in INT8 has been added to enable INT8 LoRA finetuning in PEFT (with float optimizers such as adamw_torch).
  • Codegen support removal
    Codegen support has been removed from Intel® Extension for PyTorch*, which now reuses the codegen capability of Torch XPU Operators so that codegen changes remain interoperable with their usage in Intel® Extension for PyTorch*.
  • [Prototype] Python 3.13t support
    Adds prototype support for Python 3.13t (the free-threaded CPython build) and provides prebuilt binaries on the download server.
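For the LLM optimization highlight, models are usually prepared with ipex.llm.optimize before inference. The sketch below makes several assumptions: a Hugging Face transformers causal LM from the supported list, an available XPU device, and float16 weights. The helper names and whatever model id you pass are placeholders, and keyword arguments should be checked against the API documentation for the installed version. Imports are deferred into the functions so the file can be loaded on machines without the XPU stack.

```python
def load_optimized_llm(model_id, dtype_name="float16"):
    """Load a Hugging Face causal LM onto XPU and apply ipex.llm.optimize."""
    # Deferred imports: only needed when a model is actually loaded.
    import torch
    import intel_extension_for_pytorch as ipex
    from transformers import AutoModelForCausalLM, AutoTokenizer

    dtype = getattr(torch, dtype_name)  # e.g. torch.float16 or torch.bfloat16
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
    model = model.eval().to("xpu")
    # Apply the extension's LLM-specific optimizations for supported architectures.
    model = ipex.llm.optimize(model, dtype=dtype, device="xpu", inplace=True)
    return tokenizer, model


def generate_text(tokenizer, model, prompt, max_new_tokens=32):
    """Greedy generation with the optimized model; inputs go to the same device."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to("xpu")
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python run_llm.py <huggingface-model-id>
    if len(sys.argv) > 1:
        tok, mdl = load_optimized_llm(sys.argv[1])
        print(generate_text(tok, mdl, "What is oneDNN?"))
```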
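The sliding window support added to flash_attn_varlen_func concerns packed batches. In flash-attention-style varlen kernels, several sequences are commonly concatenated into one tensor and delimited by cumulative sequence lengths (often called cu_seqlens). The pure-Python sketch below builds the boolean mask that causal sliding window attention implies for such a packed batch. It does not call the Intel® Extension for PyTorch* API. The function name and the window-counting convention are assumptions.

```python
from typing import List, Sequence


def varlen_sliding_window_mask(cu_seqlens: Sequence[int], window: int) -> List[List[bool]]:
    """Boolean attention mask for a packed batch of variable-length sequences.

    mask[q][k] is True when token q may attend to token k. Tokens only see
    earlier tokens of their own sequence (causal) and at most `window` keys,
    counting themselves. This convention is an assumption; libraries differ
    by one in how they count the window.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    total = cu_seqlens[-1]
    mask = [[False] * total for _ in range(total)]
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        for q in range(start, end):
            # Clip at the sequence start so the window never crosses into the previous sequence.
            for k in range(max(start, q - window + 1), q + 1):
                mask[q][k] = True
    return mask


if __name__ == "__main__":
    # Two sequences of lengths 3 and 2 packed together, window of 2 keys.
    for row in varlen_sliding_window_mask([0, 3, 5], window=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Real kernels apply this restriction while computing attention instead of materializing a mask, which is what keeps long Phi3 or Mistral contexts affordable.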
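The NF4 dequantize kernel maps each 4-bit code to one of 16 fixed "normal float" levels and rescales it by a per-block absmax. The pure-Python sketch below shows only that arithmetic and says nothing about how the optimized XPU kernel is written. The codebook is the NF4 table used by bitsandbytes. The high-nibble-first packing order and the helper names quantize_nf4 and dequantize_nf4 are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# NF4 levels as used by bitsandbytes (16 values normalized to [-1, 1]).
NF4_CODEBOOK = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_nf4(values: Sequence[float], blocksize: int) -> Tuple[bytes, List[float]]:
    """Blockwise absmax scaling, nearest NF4 level, two 4-bit codes per byte."""
    absmax: List[float] = []
    codes: List[int] = []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scale = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        absmax.append(scale)
        for v in block:
            x = v / scale
            codes.append(min(range(16), key=lambda c: abs(NF4_CODEBOOK[c] - x)))
    if len(codes) % 2:
        codes.append(7)  # pad an odd count with the code for 0.0
    # Assumption: the first value of each pair goes in the high nibble.
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, absmax


def dequantize_nf4(packed: bytes, absmax: Sequence[float], blocksize: int, n: int) -> List[float]:
    """Look up each 4-bit code and rescale it by its block's absmax."""
    out: List[float] = []
    for i in range(n):
        byte = packed[i // 2]
        code = (byte >> 4) if i % 2 == 0 else (byte & 0x0F)
        out.append(NF4_CODEBOOK[code] * absmax[i // blocksize])
    return out
```

Dequantization is a table lookup plus one multiply per element, so a fast kernel is mostly about memory access patterns. That is where shape-dependent speedups such as the reported 4.4× to 5.6× would come from.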
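The INT8 LoRA path relies on an int8 × int8 matrix multiply that accumulates into int32, the same input and output dtypes as torch._int_mm. The pure-Python sketch below illustrates that arithmetic with per-tensor symmetric quantization. It is not how the extension or PEFT implements the feature, and the helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]

INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def quantize_symmetric_int8(m: Matrix) -> Tuple[IntMatrix, float]:
    """Per-tensor symmetric quantization: q = round(v / scale), scale = absmax / 127."""
    absmax = max(abs(v) for row in m for v in row) or 1.0
    scale = absmax / 127.0
    # round() is round-half-to-even; clamping keeps values inside the int8 range.
    q = [[max(-128, min(127, round(v / scale))) for v in row] for row in m]
    return q, scale


def int8_matmul_int32(a: IntMatrix, b: IntMatrix) -> IntMatrix:
    """int8 x int8 -> int32 matrix multiply."""
    inner = len(b)
    out = [
        [sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]
    # Python ints never overflow, so check the int32 accumulator range explicitly.
    if any(not INT32_MIN <= v <= INT32_MAX for row in out for v in row):
        raise OverflowError("int32 accumulator overflow")
    return out


def int8_linear(x: Matrix, w: Matrix) -> Matrix:
    """Approximate x @ w by quantizing both operands to int8 and rescaling the result."""
    qx, sx = quantize_symmetric_int8(x)
    qw, sw = quantize_symmetric_int8(w)
    acc = int8_matmul_int32(qx, qw)
    return [[v * sx * sw for v in row] for row in acc]
```

Because the scales factor out of the sum, the whole inner loop stays in integer arithmetic. Only one float multiply per output element is needed at the end.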
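To check whether the prebuilt Python 3.13t binaries are running on a free-threaded interpreter, the standard library exposes the build flag and the runtime GIL state. This sketch uses only CPython facilities (sysconfig's Py_GIL_DISABLED and sys._is_gil_enabled, added in 3.13). The helper name is hypothetical.

```python
import sys
import sysconfig


def free_threading_status() -> dict:
    """Report whether this interpreter is a free-threaded build and whether the GIL is on."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from Python 3.13; older interpreters always hold a GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(is_gil_enabled()) if is_gil_enabled is not None else True
    return {
        "version": sys.version.split()[0],
        "free_threaded_build": build,
        "gil_enabled": gil_enabled,
    }


if __name__ == "__main__":
    print(free_threading_status())
```

On a free-threaded build, importing an extension module that has not declared free-threading support can re-enable the GIL at runtime. Calling this helper after importing the extension therefore shows which mode the process actually ended up in.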