1 Commits

Author SHA1 Message Date
9c11a3e59a Attempt at adding a cache for models (#2327)
This change introduces CI caching for datasets and hub artifacts across runner operating systems, with the goal of reducing the number of test runs that fail because of network faults. As a bonus, it might make the CI a bit faster.

The following artifacts are cached: ${HF_HOME}/hub/**

Note that .lock files and *.pyc files are excluded. We're not simply caching all of $HF_HOME because it also contains the datasets and modules directories. The datasets cache was acting up during testing (no details; it was just dropped, and we may explore this later, but we're not using that many datasets). The modules directory is just code, which is probably not a good idea to cache anyway.
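The selection rule above can be sketched in Python. This is only an illustration of which files end up in the cache, not the workflow configuration from this commit; the helper name cacheable_files and the EXCLUDE_PATTERNS tuple are hypothetical.

```python
import fnmatch
from pathlib import Path

# Assumption: these two patterns mirror the exclusions named in the message.
EXCLUDE_PATTERNS = ("*.lock", "*.pyc")


def cacheable_files(hf_home):
    """Return files under <hf_home>/hub that would be cached, as POSIX-style
    paths relative to hf_home. Anything outside hub/ (datasets, modules) is
    ignored, and lock files and compiled bytecode are skipped."""
    hub = Path(hf_home) / "hub"
    if not hub.is_dir():
        return []
    selected = []
    for path in sorted(hub.rglob("*")):
        if not path.is_file():
            continue
        if any(fnmatch.fnmatch(path.name, pat) for pat in EXCLUDE_PATTERNS):
            continue
        selected.append(path.relative_to(hf_home).as_posix())
    return selected
```

Calling cacheable_files(os.environ["HF_HOME"]) on a runner would list what the cache contains under these assumptions.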

The cache action has a post-processing step that uploads new data to the cache. Only one runner is allowed to upload, because GitHub Actions locks cache creation, so if two runners create the cache concurrently, both may fail. The uploading runner is currently the Ubuntu job in the Python 3.10 run.
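The single-writer rule can be expressed as a small predicate. This is a sketch under assumptions: the function name may_save_cache and the matrix values such as "ubuntu-latest" are hypothetical and not taken from the workflow file.

```python
def may_save_cache(runner_os, python_version, writer=("ubuntu", "3.10")):
    """Return True only for the single matrix job allowed to upload the cache.

    All other jobs restore the cache but never save it, so two jobs cannot
    race to create the same cache entry.
    """
    writer_os, writer_python = writer
    return runner_os.startswith(writer_os) and python_version == writer_python


# Hypothetical CI matrix: exactly one combination should be the writer.
MATRIX = [
    (os_name, py)
    for os_name in ("ubuntu-latest", "windows-latest", "macos-latest")
    for py in ("3.9", "3.10", "3.11")
]
WRITERS = [job for job in MATRIX if may_save_cache(*job)]
```

With this matrix, WRITERS contains only the Ubuntu job on Python 3.10.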

If this change turns out to be ineffective, we can move to forbidding access to the hub altogether (HF_HUB_OFFLINE=1) and updating the cache once per day. Let's first see whether this is already enough to decrease the failure rate.
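A minimal sketch of the offline fallback, assuming the tests are launched with pytest from a Python wrapper. The helper names offline_env and run_tests_offline are hypothetical; HF_HUB_OFFLINE itself is the environment variable named above.

```python
import os
import subprocess
import sys


def offline_env(base=None):
    """Return a copy of the environment with hub access disabled."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_OFFLINE"] = "1"
    return env


def run_tests_offline(pytest_args):
    """Run pytest in a child process that can only use already cached hub files."""
    cmd = [sys.executable, "-m", "pytest", *pytest_args]
    return subprocess.run(cmd, env=offline_env(), check=False).returncode
```

Setting the variable in the child's environment rather than in the parent process keeps the daily cache-refresh job, which still needs network access, unaffected.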
2025-01-23 10:54:25 +01:00