The PyTorch Foundation has released version 2.5, adding Intel GPU support and several performance optimisations, aiming to enhance usability across a wider range of hardware.

In a significant development for the machine learning community, the PyTorch Foundation has unveiled PyTorch version 2.5, incorporating a range of enhancements and new capabilities. Notably, the updated version introduces support for Intel GPUs alongside a suite of performance optimisations, marking a critical step in extending PyTorch’s utility across a broader spectrum of hardware platforms.

The announcement of PyTorch 2.5 was made at the recent PyTorch conference, where attendees were given a first preview of the anticipated Intel GPU support. Intel’s Eikan Wang and Min Jean Cho offered insights into the modifications made to PyTorch to accommodate Intel hardware. These included a major update that generalises the PyTorch runtime and device layers, making it easier to integrate new hardware backends. Moreover, Intel-specific backends for torch.compile and torch.distributed have been introduced.

Kismat Singh, Intel’s Vice President of Engineering for AI Frameworks, shared optimistic projections regarding the impact of this new compatibility. “With PyTorch 2.5, we’ve added support for Intel client GPUs. This effectively enables PyTorch to run on Intel-equipped laptops and desktops, potentially unlocking 40 million devices this year, with expectations to expand to around 100 million by the end of next year,” Singh stated.

Alongside hardware compatibility improvements, PyTorch 2.5 introduces the FlexAttention API, a tool designed to simplify experimentation with different attention mechanisms in machine learning models. Traditionally, such experimentation required tedious hand-coding using PyTorch operators, often resulting in slow runtimes and excessive memory use. The new API enables users to write concise and idiomatic PyTorch code, which the compiler subsequently transforms into optimised kernels. These kernels avoid extra memory usage, offering performance on par with manually crafted solutions.
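The core idea behind FlexAttention is that the user supplies a small callable that modifies each raw attention score given the query and key positions, and the compiler fuses it into the kernel. The sketch below illustrates that score-modification pattern in plain, torch-free Python on toy scalar embeddings; the function names (`attention`, `causal_mask`) are illustrative, not the actual FlexAttention API:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, score_mod):
    """Toy single-head attention over scalar token features.

    score_mod(score, q_idx, kv_idx) tweaks each raw score before the
    softmax, mirroring the callable that FlexAttention accepts.
    """
    out = []
    for i, qi in enumerate(q):
        scores = [score_mod(qi * kj, i, j) for j, kj in enumerate(k)]
        weights = softmax(scores)
        out.append(sum(w * vj for w, vj in zip(weights, v)))
    return out

def causal_mask(score, q_idx, kv_idx):
    """Mask out future tokens, as in a causal language model."""
    return score if kv_idx <= q_idx else float("-inf")

# Token 0 attends only to itself; token 1 attends to both tokens.
result = attention([1.0, 1.0], [1.0, 1.0], [2.0, 4.0], causal_mask)
```

Swapping `causal_mask` for a different score modifier (a sliding window, relative-position bias, and so on) changes the attention variant without touching the surrounding loop, which is the flexibility the API is named for.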

Several performance boosts have also been integrated into this release, albeit in beta form. A standout is the Fused Flash Attention backend, which delivers up to a 75% speedup over its predecessor when operating on NVIDIA H100 GPUs. Furthermore, the new regional compilation feature in torch.compile targets repeated modules within models, such as Transformer layers, significantly reducing overall compilation latency while maintaining robust performance.
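The latency saving from regional compilation comes from compiling one representative region (say, a single Transformer layer) and reusing the compiled artifact for every repeated instance, rather than tracing the whole model at once. The plain-Python sketch below illustrates that caching idea under that assumption; `RegionalCompiler` and `TransformerLayer` are hypothetical names, not the torch.compile implementation:

```python
class RegionalCompiler:
    """Illustrative cache: compile each distinct layer type once and
    reuse the result for every repeated instance of that type."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn   # the (expensive) compilation step
        self.cache = {}                # layer type -> compiled artifact
        self.compile_count = 0         # how many real compilations ran

    def compile(self, layer):
        key = type(layer).__name__
        if key not in self.cache:
            self.compile_count += 1
            self.cache[key] = self.compile_fn(layer)
        return self.cache[key]

class TransformerLayer:
    """Stand-in for a repeated model block."""

compiler = RegionalCompiler(lambda layer: f"kernel<{type(layer).__name__}>")
model = [TransformerLayer() for _ in range(12)]
compiled = [compiler.compile(layer) for layer in model]
# Twelve identical layers, but only one compilation pass is paid for.
```

In real usage the analogous move is applying torch.compile to the repeated submodule rather than the full model, so cold-start compile time scales with the number of distinct regions instead of total depth.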

The release also introduces Flight Recorder, a diagnostic tool designed to troubleshoot and analyse stuck jobs during distributed training sessions. It employs an in-memory circular buffer to document diagnostic information, which is subsequently offloaded to a file upon detecting a problem. This functionality allows for post-mortem analysis using heuristic scripts to pinpoint potential issues, ranging from data shortages to network errors.
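The mechanism described above, a bounded in-memory ring that keeps only the most recent diagnostic events and dumps them to a file when something goes wrong, can be sketched in a few lines of stdlib Python. This is an illustration of the circular-buffer idea, not PyTorch's actual Flight Recorder interface:

```python
import json
import time
from collections import deque

class FlightRecorder:
    """Illustrative in-memory circular buffer for diagnostic events."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently evicts the oldest entry when full,
        # which is exactly the circular-buffer behaviour we want.
        self.buffer = deque(maxlen=capacity)

    def record(self, event, **details):
        """Append a timestamped diagnostic event, evicting the oldest if full."""
        self.buffer.append({"ts": time.time(), "event": event, **details})

    def dump(self, path):
        """Offload the buffered events to a file for post-mortem analysis."""
        with open(path, "w") as f:
            json.dump(list(self.buffer), f, indent=2)

rec = FlightRecorder(capacity=3)
for i in range(5):
    rec.record("collective_start", rank=0, seq=i)
# Only the 3 most recent events survive in the ring buffer.
```

Keeping the buffer bounded means recording stays cheap during healthy runs, while the dump on failure still captures the recent history that post-mortem heuristics need.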

The reception amongst the user community, especially on platforms like Reddit, has been overwhelmingly positive. Users hailed the Intel GPU support as transformative and expressed enthusiasm for improvements to torch.compile and the promising capabilities of the FlexAttention API. This sentiment encapsulates a broader appreciation for the PyTorch team’s continuous efforts toward innovation and responsiveness to user needs.

The comprehensive details of PyTorch 2.5, including code and release notes, are readily accessible on GitHub, offering developers and researchers alike a robust resource for their ongoing projects.

Source: Noah Wire Services
