9e99198e5e
Use | for Optional and Union typing (#41646)
...
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-16 14:29:54 +00:00
4df2529d79
🚨 🚨 🚨 Fully remove Tensorflow and Jax support library-wide (#40760)
...
* setup
* start the purge
* continue the purge
* more and more
* more
* continue the quest: remove loading tf/jax checkpoints
* style
* fix configs
* oops, forgot conflict
* continue
* still grinding
* always more
* in the zone
* never stop
* should fix doc
* fix
* fix
* fix
* fix tests
* still tests
* fix non-deterministic
* style
* remove last rebase issues
* onnx configs
* still on the grind
* always more references
* nearly the end
* could it really be the end?
* small fix
* add converters back
* post rebase
* latest qwen
* add back all converters
* explicitly add functions in converters
* re-add
2025-09-18 18:27:39 +02:00
5ac3c5171a
Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
...
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 18:27:27 +02:00
738b223f57
Add captured actual outputs to CI artifacts (#40965)
...
* fix
* fix
* Remove `# TODO: ???` as it makes me `???`
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 15:40:53 +02:00
80f4c0c6a0
CI when PR merged to main (#40451)
...
* up
* up
* up
* up
* up
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-27 10:56:18 +02:00
1054494dd6
Update notification service amd_daily_ci_workflows definition (#40314)
2025-08-20 17:49:46 +02:00
5d906740d2
Update CI with nightly torch workflow file (#40306)
...
* fix nightly ci
* Apply suggestions from code review
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-08-20 16:59:00 +02:00
4668ef1459
Update notification service MI325 (#40078)
...
add mi325 to amd_daily_ci_workflows
2025-08-12 10:22:52 +02:00
43001fd3c6
Fix time_spent in notification_service.py. (#40081)
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-11 18:30:58 +02:00
f4d57f2f0c
Revert "fix notification_service.py about time_spent" (#40044)
...
Revert "fix `notification_service.py` about `time_spent` (#40037)"
This reverts commit d2ba153b29feb9cc0e9818c1ce63a07679b47250.
2025-08-08 22:32:24 +02:00
d2ba153b29
fix notification_service.py about time_spent (#40037)
...
temp
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-08 17:11:16 +02:00
1e0665a191
Simplify conditional code (#39781)
...
* Use !=
Signed-off-by: cyy <cyyever@outlook.com>
* Use get
Signed-off-by: cyy <cyyever@outlook.com>
* Format
* Simplify bool operations
Signed-off-by: cyy <cyyever@outlook.com>
---------
Signed-off-by: cyy <cyyever@outlook.com>
2025-07-30 12:32:10 +00:00
54cbea5615
more info in model_results.json (#39783)
...
more info
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-30 11:43:10 +02:00
95faabf0a6
Apply several ruff SIM rules (#37283)
...
* Apply ruff SIM118 fix
Signed-off-by: cyy <cyyever@outlook.com>
* Apply ruff SIM910 fix
Signed-off-by: cyy <cyyever@outlook.com>
* Apply ruff SIM101 fix
Signed-off-by: cyy <cyyever@outlook.com>
* Format code
Signed-off-by: cyy <cyyever@outlook.com>
* More fixes
Signed-off-by: cyy <cyyever@outlook.com>
---------
Signed-off-by: cyy <cyyever@outlook.com>
2025-07-29 11:40:34 +00:00
fb58377700
Slack CI bot: set default result for non-existing artifacts (#39499)
...
* Set default result for non-existing artifacts
* FMT
* Address review comments
2025-07-18 11:45:47 +00:00
79941c61ce
Fix missing definition of diff_file_url in notification service (#39445)
...
Fix missing definition of diff_file_url
2025-07-16 12:09:18 +02:00
0dc2df5dda
CI workflow for performed test regressions (#39198)
...
* WIP script to compare test runs for models
* Update line normalization logic
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-07-16 04:20:02 +02:00
508a704055
No more Tuple, List, Dict (#38797)
...
* No more Tuple, List, Dict
* make fixup
* More style fixes
* Docstring fixes with regex replacement
* Trigger tests
* Redo fixes after rebase
* Fix copies
* [test all]
* update
* [test all]
* update
* [test all]
* make style after rebase
* Patch the hf_argparser test
* Patch the hf_argparser test
* style fixes
* style fixes
* style fixes
* Fix docstrings in Cohere test
* [test all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-17 19:37:18 +01:00
e8b292e35f
Fix utils/notification_service.py (#38556)
...
* fix
* fix
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-03 13:59:31 +00:00
5f49e180a6
Add mi300 to amd daily ci workflows definition (#38415)
2025-05-28 09:17:41 +02:00
eb74cf977b
Use one utils/notification_service.py (#38379)
...
* step 1
* step 2
* step 3
* step 4
* step 5
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 16:15:29 +02:00
4a03044ddb
Hot fix for AMD CI workflow (#38349)
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-25 11:15:31 +02:00
d0c9c66d1c
new failure CI reports for all jobs (#38298)
...
* new failures
* report_repo_id
* report_repo_id
* report_repo_id
* More fixes
* More fixes
* More fixes
* ruff
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-24 19:15:02 +02:00
feec294dea
CI reporting improvements (#38230)
...
update
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-20 19:34:58 +02:00
b1375177fc
add job links to new model failure report (#37973)
...
* update for job link
* style
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-06 15:10:29 +02:00
afbc293e2b
More fault tolerant notification service (#37924)
...
* Let notification service succeed even when artifacts and reported jobs on github have mismatch
* Use default trace msg if no trace msg available
* Add pop_default helper fn
* style
2025-05-05 15:19:48 +02:00
da4ff2a5f5
Add Optional to remaining types (#37808)
...
More Optional typing
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-28 14:20:45 +01:00
d9e76656ae
Fix new failure reports not including anything other than tests/models/ (#37415)
...
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-10 14:47:23 +02:00
4f139f5a50
Send trainer/fsdp/deepspeed CI job reports to a single channel (#37411)
...
* send trainer/fsdp/deepspeed channel
* update
* change name
* no .
* final
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-10 13:17:31 +02:00
c6814b4ee8
Update ruff to 0.11.2 (#36962)
...
* update
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:00:11 +01:00
90b46e983f
Remove old benchmark code (#35730)
...
* remove traces of the old deprecated benchmarks
* also remove old tf benchmark example, which uses deleted code
* run doc builder
2025-01-21 17:56:43 +00:00
40821a2478
Fix CI slack reporting issue (#34833)
...
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-11-20 21:36:13 +01:00
9360f1827d
Tiny update after #34383 (#34404)
...
* update
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-28 12:01:05 +01:00
fce1fcfe71
Ping team members for new failed tests in daily CI (#34171)
...
* ping
* fix
* fix
* fix
* remove runner
* update members
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-17 16:11:52 +02:00
f2122cc6eb
Upload new model failure report to Hub (#32264)
...
upload
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-29 09:42:54 +02:00
d4564df1d4
Revive Nightly/Past CI (#31159)
...
* build
* build
* build
* build
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-20 18:57:24 +02:00
3714f3f86b
Upload (daily) CI results to Hub (#31168)
...
* build
* build
* build
* build
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-04 21:20:54 +02:00
a3cdff417b
save the list of new model failures (#31013)
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-24 15:20:25 +02:00
1432f641b8
Finally fix the missing new model failure CI report (#30968)
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-22 17:48:26 +02:00
82c1625ec3
Save other CI jobs' result (torch/tf pipeline, example, deepspeed etc) (#30699)
...
* update
* update
* update
* update
* update
* update
* update
* update
* Update utils/notification_service.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-13 17:27:44 +02:00
884e3b1c53
Rename artifact name prev_ci_results to ci_results (#30697)
...
* rename
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-07 16:59:16 +02:00
fbb41cd420
consistent job / pytest report / artifact name correspondence (#30392)
...
* better names
* run better names
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-04-24 22:32:42 +02:00
58a939c6b7
Fix quantization tests (#29914)
...
* revert back to torch 2.1.1
* run test
* switch to torch 2.2.1
* update dockerfile
* fix awq tests
* fix test
* run quanto tests
* update tests
* split quantization tests
* fix
* fix again
* final fix
* fix report artifact
* build docker again
* Revert "build docker again"
This reverts commit 399a5f9d9308da071d79034f238c719de0f3532e.
* debug
* revert
* style
* new notification system
* testing notification
* rebuild docker
* fix_prev_ci_results
* typo
* remove warning
* fix typo
* fix artifact name
* debug
* issue fixed
* debug again
* fix
* fix time
* test notif with failing test
* typo
* issues again
* final fix ?
* run all quantization tests again
* remove name to clear space
* revert modification done on workflow
* fix
* build docker
* build only quant docker
* fix quantization ci
* fix
* fix report
* better quantization_matrix
* add print
* revert to the basic one
2024-04-09 17:10:29 +02:00
b17b54d3dd
Refactor daily CI workflow (#30012)
...
* separate jobs
* separate jobs
* use channel name directly instead of ID
* use channel name directly instead of ID
* use channel name directly instead of ID
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-04-05 15:49:51 +02:00
f54d82cace
[CI] Quantization workflow (#29046)
...
* [CI] Quantization workflow
* build dockerfile
* fix dockerfile
* update self-scheduled.yml
* test build dockerfile on push
* fix torch install
* update to Python 3.10
* update aqlm version
* uncomment build dockerfile
* tests if the scheduler works
* fix docker
* do not trigger on push again
* add additional runs
* test again
* all good
* style
* Update .github/workflows/self-scheduled.yml
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* test build dockerfile with torch 2.2.0
* fix extra
* clean
* revert changes
* Revert "revert changes"
This reverts commit 4cb52b8822da9d1786a821a33e867e4fcc00d8fd.
* revert correct change
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-28 10:09:25 -05:00
4735866141
Split daily CI using 2 level matrix (#28773)
...
* update / add new workflow files
* Add comment
* Use env.NUM_SLICES
* use scripts
* use scripts
* use scripts
* Fix
* using one script
* Fix
* remove unused file
* update
* fail-fast: false
* remove unused file
* fix
* fix
* use matrix
* inputs
* style
* update
* fix
* fix
* no model name
* add doc
* allow args
* style
* pass argument
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-31 18:04:43 +01:00
95346e9dcd
Add artifact name in job step to maintain job / artifact correspondence (#28682)
...
* avoid using job name
* apply to other files
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-31 15:58:17 +01:00
79e7655906
Fix notification_service.py (#27903)
...
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-08 14:55:02 +01:00
9f1f11a2e7
Show new failing tests in a more clear way in slack report (#27881)
...
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-07 15:09:30 +01:00
e0d2e69582
restructure AMD scheduled CI (#27743)
...
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-04 15:32:05 +01:00