Update Sleef to include fix for FMA4 detection (#20450)

Summary:
FMA4 support is reported in bit 16 of register ECX, not EDX, of the
"extended processor info" CPUID leaf (0x80000001).

Once we verify that this change fixes https://github.com/pytorch/pytorch/issues/12112, I'll make a PR for upstream Sleef.

The mapping of registers to reg is:

```
  reg[0] = eax
  reg[1] = ebx
  reg[2] = ecx <---
  reg[3] = edx
```

Bit 16 of EDX is PAT (Page Attribute Table) on AMD CPUs, which is widely
supported; Intel CPUs do not set this bit. Because the old check read
EDX, FMA4 was wrongly detected on AMD CPUs that do not support it, which
caused "Illegal instruction" errors.
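
For illustration only, here is a minimal sketch of the corrected check next
to the old one. It is not the actual Sleef code: the names has_fma4,
has_fma4_buggy and query_ext_processor_info are hypothetical, and the
CPUID query assumes GCC or Clang on x86 (it uses __get_cpuid from
<cpuid.h>).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register order as in the mapping above: eax, ebx, ecx, edx. */
enum { REG_EAX = 0, REG_EBX = 1, REG_ECX = 2, REG_EDX = 3 };

#define EXT_PROCESSOR_INFO_LEAF 0x80000001u
#define FMA4_BIT 16

/* Corrected check: FMA4 is bit 16 of ECX. */
static int has_fma4(const uint32_t reg[4]) {
    assert(reg != NULL);
    return (int)((reg[REG_ECX] >> FMA4_BIT) & 1u);
}

/* Old check, shown only for contrast: bit 16 of EDX is PAT on AMD,
 * so this reports FMA4 on AMD CPUs that do not have it. */
static int has_fma4_buggy(const uint32_t reg[4]) {
    assert(reg != NULL);
    return (int)((reg[REG_EDX] >> FMA4_BIT) & 1u);
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Fill reg[] from CPUID leaf 0x80000001.
 * Returns 1 on success, 0 if the CPU does not support that leaf. */
static int query_ext_processor_info(uint32_t reg[4]) {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(EXT_PROCESSOR_INFO_LEAF, &eax, &ebx, &ecx, &edx)) {
        return 0;
    }
    reg[REG_EAX] = eax;
    reg[REG_EBX] = ebx;
    reg[REG_ECX] = ecx;
    reg[REG_EDX] = edx;
    return 1;
}
#endif
```

With register values where only EDX bit 16 is set (PAT, no FMA4), has_fma4
returns 0 while has_fma4_buggy returns 1, which is the misdetection
described above.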

See https://github.com/pytorch/pytorch/issues/12112
See https://github.com/shibatch/sleef/issues/261

Reference: http://developer.amd.com/wordpress/media/2012/10/254811.pdf (Page 20)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20450

Differential Revision: D15324405

Pulled By: colesbury

fbshipit-source-id: 96fb344c646998ff5da19e4cdbf493f5a4e9892a
Sam Gross
2019-05-14 08:29:38 -07:00
committed by Facebook Github Bot
parent 101176870e
commit b46a630836