CI: Handle error with MacOS and transformers
A change in transformers introduced an error in the MacOS CI, which is
handled in this PR.
Context
For context on why we use torch 2.2 for MacOS, check #2431.
Unfortunately, as of today, the available GH workers for MacOS still
haven't improved.
Description
The error was introduced by
https://github.com/huggingface/transformers/pull/37785, which results in
`torch.load` failing when using torch < 2.6.
The proposed solution is to plug into pytest, intercept the test report,
check for the specific error, and mark the test as skipped instead.
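A minimal sketch of that hook in `conftest.py` is shown below; `pytest_runtest_makereport` is a real pytest hook, but the matched error string is only a placeholder for the actual message surfaced via transformers:

```python
# conftest.py -- sketch only; the matched message below is a placeholder for the
# real error transformers raises when torch.load is refused on torch < 2.6.
import platform

import pytest


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    is_torch_load_error = (
        call.excinfo is not None and "torch.load" in str(call.excinfo.value)  # placeholder check
    )
    if report.when == "call" and report.failed and platform.system() == "Darwin" and is_torch_load_error:
        # Convert the failure into a skip; setting wasxfail makes pytest report it
        # as an expected failure instead of an error, keeping the MacOS CI green.
        report.outcome = "skipped"
        report.wasxfail = "torch.load not usable with torch < 2.6 on MacOS"
```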
Alternative solutions
The proposed solution is obviously an ugly hack. However, these are
errors we cannot fix directly: they originate in a dependency and only
occur because of the old torch version we're forced to use (thus fixing
them in transformers is probably not an option).
Instead of altering the test report, the individual failing tests could
get an explicit skip marker when MacOS is detected. However, since the
number of affected tests is in the hundreds, this is very impractical
and would add a lot of noise to the tests.
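For illustration, such a per-test marker would look roughly like this (the test name is hypothetical):

```python
# Hypothetical example of the per-test alternative; repeated over hundreds of
# tests, markers like this would clutter the suite considerably.
import platform

import pytest


@pytest.mark.skipif(
    platform.system() == "Darwin",
    reason="torch.load fails with torch < 2.6 on MacOS",
)
def test_some_affected_case():
    ...
```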
Alternatively, we could move forward with the proposal in #2431 and
remove MacOS completely from the CI. I do, however, still have the faint
hope that GH will provide arm64 workers with more RAM in the future,
allowing us to switch.
Regression tests
In general, for regression tests, we need two steps:
1. Creating the regression artifacts, in this case the adapter
checkpoint and the expected output of the model.
2. Running the regression tests, i.e. loading the adapter and checking
that the output of the model is the same as the expected output.
My approach is to re-use as much code as possible between those two
steps. Therefore, the same test script can be used for both, with only
an environment variable to distinguish between the two. Step 1 is
invoked by calling:
`REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py`
and to run the second step, we call:
`pytest tests/regression/test_regression.py`
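As a rough sketch, the mode switch inside the test module boils down to reading this environment variable (the helper below is an assumption, not the actual implementation):

```python
# Sketch of the mode switch; step 1 (create artifacts) and step 2 (check against
# them) share the same tests and only branch on this flag.
import os


def str_to_bool(value: str) -> bool:
    return value.lower() in ("1", "true", "yes")


CREATION_MODE = str_to_bool(os.environ.get("REGRESSION_CREATION_MODE", "false"))
```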
Creating regression artifacts
The first step will create an adapter checkpoint and an output for the
given PEFT version and test setting in a new directory. E.g., it will
create a directory `tests/regression/lora_opt-125m_bnb_4bit/0.5.0/` that
contains `adapter_model.bin` and `output.pt`.
Before this step runs, there is a check that the git repo is clean (no
dirty worktree) and that the commit is tagged (i.e. corresponds to a
release version of PEFT). Otherwise, we may accidentally create
regression artifacts that do not correspond to any PEFT release.
The easiest way to get such a clean state (say, for PEFT v0.5.0) is by
checking out a tagged commit, e.g.:
`git checkout v0.5.0`
before running the first step.
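A rough sketch of the clean-worktree and tagged-commit check described above, using standard git commands (the exact implementation in the test suite may differ):

```python
# Sketch of the clean-worktree / tagged-commit guard run before creating artifacts.
import subprocess


def assert_clean_and_tagged():
    # A dirty worktree means the artifacts would not correspond to any committed state.
    dirty = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    ).stdout.strip()
    if dirty:
        raise RuntimeError("Worktree is dirty; commit or stash your changes first.")
    # The current commit must carry a release tag such as v0.5.0.
    tags = subprocess.run(
        ["git", "tag", "--points-at", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    if not tags:
        raise RuntimeError("HEAD is not tagged; check out a release, e.g. `git checkout v0.5.0`.")
```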
The first step will also skip the creation of regression artifacts if
they already exist.
It is possible to circumvent all the aforementioned checks by setting
the environment variable `REGRESSION_FORCE_MODE` to True like so:
`REGRESSION_FORCE_MODE=True REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py`
You should only do this if you know exactly what you're doing.
Running regression tests
The second step is much simpler. It will load the adapter and the
output created in the first step, and compare that stored output to the
output of a new PEFT model that uses the loaded adapter. The outputs
should be the same.
If more than one version is discovered for a given test setting, all of
them are tested.
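In code, that comparison could look roughly like this (the helper structure is an assumption; `PeftModel.from_pretrained` and `torch.testing.assert_close` are the real APIs used for loading and comparing):

```python
# Sketch of the step 2 comparison for one test setting and one stored PEFT version.
import os

import torch
from peft import PeftModel


def check_regression(base_model, artifact_dir, inputs):
    # Load the adapter checkpoint created in step 1 onto a freshly built base model.
    peft_model = PeftModel.from_pretrained(base_model, artifact_dir)
    peft_model.eval()
    with torch.no_grad():
        output = peft_model(**inputs).logits
    # Compare against the expected output stored alongside the adapter.
    expected = torch.load(os.path.join(artifact_dir, "output.pt"))
    torch.testing.assert_close(output, expected)
```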
Notes
Regression artifacts are stored on HF Hub.
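For running the tests locally, fetching them could be as simple as the snippet below (the repo id is a placeholder; `snapshot_download` is the real huggingface_hub function):

```python
# Placeholder repo id -- the actual Hub repository holding the artifacts may differ.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="<org>/peft-regression-artifacts")
```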