mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-20 21:14:14 +08:00
Update Sleef to include fix for FMA4 detection (#20450)
Summary:
FMA4 support is in bit 16 of register ECX, not EDX of the "extended processor info" (0x80000001).

Once we verify that this change fixes https://github.com/pytorch/pytorch/issues/12112, I'll make a PR for upstream Sleef.

The mapping of registers to reg is:

```
reg[0] = eax
reg[1] = ebx
reg[2] = ecx <---
reg[3] = edx
```

Bit 16 of EDX is PAT (Page Attribute Table) on AMD CPUs, which is widely supported. Intel CPUs do not set this bit. This causes "Illegal instruction" errors on AMD CPUs that do not support FMA4.

See https://github.com/pytorch/pytorch/issues/12112
See https://github.com/shibatch/sleef/issues/261

http://developer.amd.com/wordpress/media/2012/10/254811.pdf (Page 20)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20450

Differential Revision: D15324405

Pulled By: colesbury

fbshipit-source-id: 96fb344c646998ff5da19e4cdbf493f5a4e9892a
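The difference between the two checks described above can be sketched in a few lines of C. This is an illustration only, not the actual Sleef code: the function names `has_fma4` and `has_fma4_buggy`, the `REG_*` constants, and the `FMA4_BIT` macro are hypothetical, and `reg` is assumed to already hold the result of CPUID leaf 0x80000001 in the eax, ebx, ecx, edx order listed in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register order from the commit message: eax, ebx, ecx, edx.
 * These names are hypothetical, chosen for this sketch. */
enum { REG_EAX = 0, REG_EBX = 1, REG_ECX = 2, REG_EDX = 3 };

/* FMA4 is reported in bit 16 of ECX for CPUID leaf 0x80000001. */
#define FMA4_BIT 16u

/* Intended check: test bit 16 of ECX (reg[2]).
 * reg[] is assumed to hold the output of CPUID leaf 0x80000001. */
static bool has_fma4(const uint32_t reg[4]) {
    return ((reg[REG_ECX] >> FMA4_BIT) & 1u) != 0;
}

/* Mistaken check: tests bit 16 of EDX (reg[3]) instead.
 * On AMD CPUs that bit is PAT, so this can report FMA4
 * on a CPU that does not implement it. */
static bool has_fma4_buggy(const uint32_t reg[4]) {
    return ((reg[REG_EDX] >> FMA4_BIT) & 1u) != 0;
}
```

With register values where only bit 16 of EDX is set (PAT present, FMA4 absent), `has_fma4_buggy` returns true while `has_fma4` returns false, which matches the "Illegal instruction" failure mode described in the summary.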
committed by
Facebook Github Bot
parent
101176870e
commit
b46a630836
2
third_party/sleef
vendored
Submodule third_party/sleef updated: 191f655caa...9b249c53a8