Commit Graph

586 Commits

a15d77cd0c Remove upper version bound of pandas (#41677)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-17 17:31:41 +02:00
c01ceffeb4 Enable faiss-cpu on Windows (#41678)
faiss-cpu is supported on Windows

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-17 13:00:57 +00:00
af2a66ced9 Migrate transformers cli to Typer (#41487)
* Add typer-slim as explicit dependency

* Migrate CLI to Typer

* code quality

* bump release candidate

* adapt test_cli.py

* Remove ./commands + adapt tests

* fix quality

* consistency

* doctested

* do not serve model in chat

* style

* will it fix them?

* fix test

* capitalize classes

* Rebase

* Rebase

* tests + fixup

tests + fixup

* custom error message

* fix ?

* should be good

* fix caplog globally

* inner caplog

* last attempt

* Retry

* Let's try with capsys disabled

---------

Co-authored-by: Lysandre <hi@lysand.re>
2025-10-16 13:29:42 +02:00
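A minimal sketch of the Typer pattern the CLI moved to in #41487 above. The subcommand names and options here are illustrative assumptions, not the actual `transformers` CLI surface:

```python
# Hypothetical Typer-based CLI in the style of the migration in #41487.
import typer

app = typer.Typer(help="Example CLI built with Typer.")

@app.command()
def chat(model: str = typer.Argument(..., help="Model id to chat with")) -> None:
    """Start an interactive chat session."""
    typer.echo(f"Chatting with {model}")

@app.command()
def serve(port: int = typer.Option(8000, help="Port to serve on")) -> None:
    """Serve a model over HTTP."""
    typer.echo(f"Serving on port {port}")

if __name__ == "__main__":
    app()  # Typer dispatches `chat` / `serve` subcommands automatically
```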
96d245a83d torch 2.9 don't ❤️ torchcodec 💔 (#41610)
pin

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-15 14:34:00 +02:00
b4067472ae Bump to hfh 1.0.0.rc5 to fix test (#41508) 2025-10-10 12:12:08 +02:00
1a3a5f5289 Remove SigOpt (#41479)
* remove sigopt

* style
2025-10-09 18:05:55 +02:00
89a4115a6b Validate processing kwargs with @strict from huggingface_hub (#40793)
* initial design draft

* delete

* fix a few tests

* fix

* fix the rest of tests

* common-kwargs

* why does the runner complain about typing with "|"?

* revert

* forgot to delete

* update

* fix last issues

* add more details in docs

* pin the latest hub release

* fix tests for new models

* also fast image processor

* fix copies

* image processing ast validated

* fix more tests

* fix typo and copies

* bump

* style

* fix some tests

* fix copies

* pin rc4 and mark all TypedDict as non-total

* delete typed dict adaptor

* address comments

* delete optionals
2025-10-08 16:14:09 +02:00
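A sketch of the idea behind #40793 above: processor kwargs are declared as non-total TypedDicts so every key is optional but typed, and unexpected keys can be rejected. The real validation is done by the `@strict` machinery in huggingface_hub; this standalone checker only illustrates the concept, and `ImagesKwargs` here is a simplified stand-in:

```python
# Toy kwargs validation against a non-total TypedDict schema (assumption:
# simplified stand-in for the huggingface_hub @strict validation).
from typing import TypedDict, get_type_hints

class ImagesKwargs(TypedDict, total=False):  # non-total: all keys optional
    do_resize: bool
    size: dict[str, int]

def validate_kwargs(schema: type, kwargs: dict) -> None:
    allowed = get_type_hints(schema)  # field name -> declared type
    for key in kwargs:
        if key not in allowed:
            raise TypeError(f"Unexpected kwarg {key!r}; allowed: {sorted(allowed)}")

validate_kwargs(ImagesKwargs, {"do_resize": True})   # ok
# validate_kwargs(ImagesKwargs, {"resize": True})    # raises TypeError
```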
c528f50663 Remove Python 3.9 classifier (#41410)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-08 12:20:36 +00:00
34dcd73b57 v5 dev version (#41436) 2025-10-08 10:45:33 +02:00
c562c5d801 [v5] Bump accelerate to 1.1.0 (#41234)
* bump to 1.1.0 !

* bump accelerate

* fix

* None

* fixed !

* style
2025-10-07 17:18:32 +02:00
55b172b8eb 🚨 Bump to Python 3.10 and rework how we check 3rd-party libraries existence (#41268)
* cleanup

* add check

* fix

* remove all global variables

* fix

* add lru caches everywhere

* fix

* fix

* style

* improve

* reorder all functions

* fix order

* improve

* fix

* fix

* fix
2025-10-06 11:04:19 +02:00
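A minimal sketch of the cached third-party availability check pattern from #41268 above ("add lru caches everywhere"). The helper name mirrors transformers' import utilities, but the body is a simplified assumption:

```python
# Cached "is this package installed?" check; repeated calls are free.
import importlib.metadata
import importlib.util
from functools import lru_cache

@lru_cache
def is_package_available(pkg_name: str) -> bool:
    if importlib.util.find_spec(pkg_name) is None:
        return False
    try:
        importlib.metadata.version(pkg_name)  # also confirm metadata exists
        return True
    except importlib.metadata.PackageNotFoundError:
        return False

print(is_package_available("torch"))
```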
ca975f1cb8 [V5] Remove deprecated transformers.onnx (#41214)
* Remove deprecated transformers.onnx

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove onnx docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-10-01 12:17:04 +00:00
fcd483f0ff Bump hfh prerelease version (#41175) 2025-09-29 16:28:36 +02:00
44682e7131 Adapt and test huggingface_hub v1.0.0 (#40889)
* Adapt and test huggingface_hub v1.0.0.rc0

* forgot to bump hfh

* bump

* code quality

* code quality

* relax dependency table

* fix has_file

* install hfh 1.0.0.rc0 in circle ci jobs

* repository

* push to hub now returns a commit url

* catch HfHubHTTPError

* check commit on branch

* add it back

* fix ?

* remove deprecated test

* uncomment another test

* trigger

* no proxies

* many more small changes

* fix load PIL Image from httpx

* require 1.0.0.rc0

* fix mocked tests

* fix others

* unchange

* unchange

* args

* Update .circleci/config.yml

* Bump to 1.0.0.rc1

* bump kernels version

* fix deps
2025-09-25 11:13:50 +00:00
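A sketch of two behavior changes called out in #40889 above, assuming huggingface_hub >= 1.0: per the commit notes, push to hub now returns a commit URL, and Hub HTTP failures surface as `HfHubHTTPError`. The repo id is hypothetical:

```python
# Hedged sketch of the hfh v1.0 behaviors noted in the commit above.
from huggingface_hub.errors import HfHubHTTPError
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
try:
    commit = model.push_to_hub("my-user/my-model")  # hypothetical repo id
    print(commit)  # per the commit notes, now a commit URL rather than None
except HfHubHTTPError as err:  # raised on HTTP errors from the Hub
    print(f"Push failed: {err}")
```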
43a613c8da Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
Update ruff to 0.13.1, target Python 3.10, and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-09-24 06:37:21 +00:00
8a52288dba Remove optax (#41030)
Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:30:39 +00:00
dfc230389c 🚨 [v5] remove deprecated entry point (#40997)
* remove old entry point

* update references to transformers-cli
2025-09-19 14:40:27 +00:00
4df2529d79 🚨🚨🚨 Fully remove Tensorflow and Jax support library-wide (#40760)
* setup

* start the purge

* continue the purge

* more and more

* more

* continue the quest: remove loading tf/jax checkpoints

* style

* fix configs

* oups forgot conflict

* continue

* still grinding

* always more

* in the zone

* never stop

* should fix doc

* fix

* fix

* fix

* fix tests

* still tests

* fix non-deterministic

* style

* remove last rebase issues

* onnx configs

* still on the grind

* always more references

* nearly the end

* could it really be the end?

* small fix

* add converters back

* post rebase

* latest qwen

* add back all converters

* explicitly add functions in converters

* re-add
2025-09-18 18:27:39 +02:00
c5553b4120 Fix trainer tests (#40823)
* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-17 16:05:17 +00:00
96a5774f2e [serve] re-enable tests (#40717)
run tests
2025-09-05 15:15:34 +01:00
a2a8a3ca1e [tests] fix blip2 edge case (#40699) 2025-09-05 11:35:29 +01:00
1363fceeec remove the redundant non maintained jieba and use rjieba instead (#40383)
* porting not maintained jieba to rjieba

* Fix format

* replaced the line with rjieba instead of removing it

* cut_all is not included as a parameter; cut_all is a separate function in rjieba

* rev

* jieba remove installation

* Trigger tests

* Update tokenization_cpm.py

* Update tokenization_cpm_fast.py

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-30 13:28:52 +02:00
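A sketch of the jieba -> rjieba switch from #40383 above. Per the commit notes, rjieba has no `cut_all=` parameter on `cut`; full-mode segmentation is a separate function. Exact signatures are assumptions:

```python
# Hedged comparison of the old jieba call and its rjieba replacement.
import rjieba

text = "我来到北京清华大学"  # "I came to Tsinghua University in Beijing"

print(rjieba.cut(text))      # precise mode, roughly jieba.cut(text)
print(rjieba.cut_all(text))  # full mode, roughly jieba.cut(text, cut_all=True)
```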
36fddebcee pin pytest-rerunfailures<16.0 (#40561)
pin pytest-rerunfailures<16.0

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-30 12:58:44 +02:00
ce48e9cac0 Dev version 2025-08-29 20:17:34 +02:00
11a6b95553 Oupsy (#40544)
fix bump!
2025-08-29 12:59:49 +02:00
b07144ac27 tokenizers bump tokenizers version (#40540)
* bump tokenizers version

* use rc0

* ?

* fml

* update
2025-08-29 12:34:41 +02:00
34108a2230 Continuous batching refactor (#40426)
* Rework of the CB example

* Further rework of CB example

* Refactor PA cache, slice on tokens, add debug prints -- WIP

* Slice cache -- WIP

* Added a mechanism to check batched outputs in CB script

* Less logging, debug flag for slice, !better reset! -- WIP

* QOL and safety margins

* Refactor and style

* Better saving of cb example

* Fix

* Fixes and QOL

* More information about metrics

* Further logging

* Style

* Licenses

* Removed some comments

* Add a slice input flag

* Fix in example

* Added back some open-telemetry deps

* Removed some aux functions

* Added FA2 option to example script

* Fixed math (all of it)

* Added a simple example

* Renamed core to classes

* Made allocation of attention mask optional

* Style
2025-08-26 13:01:42 +02:00
2df0c323cb byebye torch 2.1 (#40317)
* Bump minimum torch version to 2.2

* Remove is_torch_greater_or_equal_than_2_2

* update versions table

* Deprecate is_torch_sdpa_available (except for backward compat), remove require_torch_sdpa
2025-08-20 15:03:46 +01:00
a5f0b505a0 Remove OTel SDK dependencies (#40305) 2025-08-20 12:31:44 +02:00
00b4dfb786 Add chat_template (jinja2) as an extra dependency (#40128)
* add jinja2 as a dependency

* Make jinja2 a core dependency in install_requires

- Add jinja2 to install_requires list in setup.py for automatic installation
- Add jinja2 to runtime version checks in dependency_versions_check.py
- Resolves issue where pip install transformers doesn't install jinja2

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Make jinja2 a core dependency in install_requires

* Make jinja2 an extra dependency instead of adding a core dep

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-18 14:31:40 +00:00
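Why jinja2 matters in #40128 above: chat templates are Jinja templates rendered over a message list. A minimal sketch with a toy template, not an actual model's chat_template:

```python
# Rendering a toy chat template with jinja2 (illustrative only).
from jinja2 import Template

template = Template(
    "{% for m in messages %}<|{{ m.role }}|>{{ m.content }}\n{% endfor %}"
)
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]
print(template.render(messages=messages))
```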
eb6e26acf3 Dev version 2025-08-05 18:09:30 +02:00
7c38d8fc23 Add GPT OSS model from OpenAI (#39923)
* fix

* nice

* where i am at

* Bro this works

* Update src/transformers/integrations/tensor_parallel.py

* cleanups

* yups that was breaking

* Update src/transformers/models/openai_moe/modeling_openai_moe.py

* gather on experts and not mlp

* add changes for latest convert branch

* adds options to get output_router_logits from config

* bring chat template + special tokens back into the script.

* initial commit

* update

* working with shards

* add model.safetensors.index.json

* fix

* fix

* mxfp4 flag

* rm print

* Fix PAD/EOS/BOS (#18)

* fix pad/eos/bos

* base model maybe one day

* add some doc

* special tokens based on harmony.

* add in tokenizer config as well.

* prepare for rebase with main

* Fix for initialize_tensor_parallelism now returning a 4-tuple

```
[rank0]:   File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
[rank0]:     model = AutoModelForCausalLM.from_pretrained(
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
[rank0]:     tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 3)
```

* mxfp4

* mxfp4 draft

* fix

* fix import

* draft

* draft impl

* finally working !

* simplify

* add import

* working version

* consider blocks and scales

* device mesh fix

* initial commit

* add working dequant + quant logic

* update

* non nan, gibberish output

* working EP + quantization finally !

* start cleaning

* remove reversing process

* style

* some cleaning

* initial commit

* more cleaning

* more cleaning

* simplify

* more cleaning

* rm duplicated function

* changing tp_plan

* update tp plan check

* add loading attribute

* dequantizing logic

* use subfunctions

* import cleaning

* update_param_name

* adds clamped swiglu

* add clamping to training path

* simplify dequant logic

* update

* Bad merge

* more simplifications & tests

* fix !

* fix registering custom attention

* fix order

* fixes

* some test nits

* nits

* nit

* fix

* Clamp sink logits

* Clean

* Soft-max trick

* Clean up

* p

* fix deepspeed

* update both modeling and modular for cleanup

* contiguous

* update tests

* fix top_k router call

* revert renaming

* test nits

* small fixes for EP

* fix path for our local tests

* update as I should not have broken that!

* fix the loss of mixtral

* revert part of the changes related to router_scores, kernel probably not ready for that!

* deleting a small nit

* update arch

* fix post processing

* update

* running version but not expected output

* moving to cuda

* initial commit

* revert

* erroring when loading on cpu

* updates

* del blocks, scales

* fix

* style

* rm comm

* comment

* add comment

* style

* remove duplicated lines

* Fix minor issue with weight_map conversion script

* fix sampling params

* rename to final name

* update pre-final version of template

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

* fix batched inference

* serve fixes

* swizzle !

* update final chat template by Matt.

* fix responses; pin oai

* simplify

* Thanks Matt for his tireless efforts!

Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* fix

* Use ROCm kernels from HUB

* Make kernel modes explicit

* update final chat template by Matt. x2

* Thanks Matt for his tireless efforts!

Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>

* Fix installation

* Update setup.py

Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>

* allow no content

* fix: update message handling in write_tokenizer function

* Fix template logic for user message role

* last nits for CB and flash_paged!

* there was one bad merge

* fix CB (hardcode for now, its just using kv groups instead)

* fix

* better fix for device_map

* minor device fix

* Fix flash paged

* updates

* Revert "remove dtensors, not explicit (#39840)"

This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* update

* Revert "remove dtensors, not explicit (#39840)"

This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* fix merge

* fix

* Fix line break with custom model identity

* nits testing

* to locals first and pass sliding window to flash paged

* register modes for MegaBlocksMoeMlp

* add integration test in fixtures -> now update the tests to use it!

* update integration tests

* initial fix

* style and update tests

* fix

* chore(gpt oss): remove mlp_bias from configuration

It was just a leftover.

* stats

* Integration tests

* whoops

* Shouldn't move model

* Ensure assistant messages without thinking always go to "final" channel

* More checks to ensure expected format

* Add pad_token_id to model configuration in write_model function (#51)

* Add oai fix fast tests (#59)

* Fix some fast tests

* Force some updates

* Remove unnecessary fixes

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

* reasoning -> Reasoning

* Add additional integration tests

* fixup

* Slight fixes

* align chat template with harmony

* simplify

* Add comment

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* Revert fixup

* skip 2 test remove todo

* merge

* padding side should be left for integration tests

* fix modular wrt to changes made to modeling

* style

* isort

* fix copies for the loss

* mmmm

---------

Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: edbeeching <edbeeching@gmail.com>
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan@openai.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal>
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Akos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Matt <rocketknight1@gmail.com>
2025-08-05 18:02:18 +02:00
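A hedged usage sketch for the GPT OSS addition in #39923 above. The model id follows OpenAI's published checkpoints; the generation kwargs are illustrative assumptions:

```python
# Loading and prompting GPT OSS via the standard auto classes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain MXFP4 in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:]))  # strip the prompt tokens
```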
379209b603 add libcst to extras["testing"] in setup.py (#39761)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 16:58:51 +02:00
14adcbd937 Fix AMD dockerfile for audio models (#39669) 2025-07-28 19:05:41 +02:00
c3401d6fad dev version 4.55 2025-07-25 21:11:20 +02:00
f90de364c2 Rename huggingface_cli to hf (#39630)
* Rename huggingface_cli to hf

* hfh
2025-07-25 14:10:04 +02:00
4741e1f1b7 [timm] new timm pin (#39640) 2025-07-24 16:01:59 +00:00
328ca9cf1d [dependencies] Update datasets pin (#39500)
* pyarrow pin

* make fixup

* test?

* like this?

* like this?

* like this?

* datasets pin

* comment
2025-07-18 12:05:28 +00:00
de5ca373ac Responses API in transformers serve (#39155)
* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* Responses API (to be merged into #39155) (#39338)

* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* use openai

* validate request, including detecting unused fields

* dict indexing

* dict var access

* tmp commit (tests failing)

* add slow

* use oai output type in completions

* (little rebase errors)

* working spec?

* guard type hint

* type hints. fix state (CB can now load different models)

* type hints; fn names; error type

* add docstrings

* responses + kv cache

* metadata support; fix kv cache; error event

* add output_index and content_index

* docstrings

* add test_build_response_event

* docs/comments

* gate test requirements; terminate cb manager on model switch

* nasty type hints

* more type hints

* disable validation by default; enable force models

* todo

---------

Co-authored-by: Lysandre <hi@lysand.re>

* Slight bugfixes

* PR comments from #39338

* make fixup

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2025-07-16 14:16:16 +02:00
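A sketch of calling the Responses API served by `transformers serve` (#39155 above) with the official openai client (>= 1.66, where `client.responses` exists). The base URL, port, and model name are assumptions for illustration:

```python
# Pointing the openai client at a locally running `transformers serve`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.responses.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # hypothetical served model
    input="Say hello in French.",
)
print(response)
```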
70e57e4710 Add mistral common support (#38906)
* wip: correct docstrings

* Add mistral-common support.

* quality

* wip: add requested methods

* wip: fix tests

* wip: add internally some methods not being supported in mistral-common

* wip

* wip: add opencv dependency and update test list

* wip: add mistral-common to testing dependencies

* wip: revert some test changes

* wip: ci

* wip: ci

* clean

* check

* check

* check

* wip: add hf image format to apply_chat_template and return pixel_values

* wip: make mistral-common non-installed safe

* wip: clean zip

* fix: from_pretrained

* fix: path and base64

* fix: path and import root

* wip: add docs

* clean

* clean

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-07-11 16:26:58 +00:00
9bc675b3b6 Fix link for testpypi (#39360)
fix link
2025-07-11 15:34:01 +02:00
38c3931362 [server] add tests and fix passing a custom generation_config (#39230)
* add tests; fix passing a custom generation_config

* tool integration test

* add install step

* add accelerate as dep to serving

* add todo
2025-07-10 13:41:38 +00:00
e8f90b5397 Split transformers chat and transformers serve (#38443)
* Next token

* Split chat and serve

* Support both generation methods

* Style

* Generation Config

* temp

* temp

* Finalize serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Finalize chat.py

* Update src/transformers/commands/serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Lucain's comments

Co-authored-by: Lucain <lucain@huggingface.co>

* Update

* Last comments on PR

* Better error handling

* Better error handling

* CI errors

* CI errors

* Add tests

* Fix tests

* Fix tests

* [chat] Split chat/serve (built on top of lysandre's PR) (#39031)

* Next token

* Split chat and serve

* Support both generation methods

* Style

* Generation Config

* temp

* temp

* Finalize serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Finalize chat.py

* Update src/transformers/commands/serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Lucain's comments

Co-authored-by: Lucain <lucain@huggingface.co>

* Update

* Last comments on PR

* Better error handling

* Better error handling

* CI errors

* CI errors

* Add tests

* Fix tests

* Fix tests

* streaming tool call

* abstract tool state; set tool start as eos

* todos

* server working on models without tools

* rm chat's deprecated flags

* chat defaults

* kv cache persists across calls

* add server docs

* link

* Update src/transformers/commands/serving.py

* Apply suggestions from code review

* i love merge conflicts

* solve multi turn with tiny-agents

* On the fly switching of the models

* Remove required positional arg

---------

Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: célina <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>

* Protect names

* Fix tests

---------

Co-authored-by: célina <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-06-30 15:10:53 +02:00
5154497607 Dev version 2025-06-26 18:04:36 +02:00
08bf7f1afe Add kernelize to transformers (#38205)
* fix

* fix

* fix flow

* remove non compiling path

* change

* style

* fix

* update

* update pin

* revert
2025-06-24 17:38:54 +02:00
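A sketch of what "kernelize" refers to in #38205 above: swapping eligible layers for optimized kernels pulled from the Hub via the `kernels` package. The exact API surface shown here (`kernelize`, `Mode`) is an assumption based on that package:

```python
# Hedged sketch: replace supported ops with Hub kernels for inference.
import torch
from kernels import Mode, kernelize  # assumed kernels-package API
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", dtype=torch.float16, device_map="cuda"
)
model = kernelize(model, mode=Mode.INFERENCE)
```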
3542e0b844 build: 📌 Remove upper bound on PyTorch (#38789)
build: 📌 remove upper bound on the torch dependency, as a fix for the issue that originally motivated the pin was released in torch 2.7.1
2025-06-12 16:34:13 +02:00
88912b8e95 Remove isort from dependencies (#38616)
Removed isort as a dependency
2025-06-05 16:42:49 +00:00
8c59cdb3f8 pin pandas (#38605)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-05 11:33:06 +02:00
211f2b0875 Add CB (#38085)
* stash for now

* initial commit

* small updated

* up

* up

* works!

* nits and fixes

* don't loop too much

* finish working example

* update

* fix the small freeblocks issue

* feat: stream inputs to continuous batch

* fix: update attn from `eager` to `sdpa`

* refactor: fmt

* refactor: cleanup unnecessary code

* feat: add `update` fn to `PagedAttentionCache`

* feat: broken optimal block size computation

* fix: debugging invalid cache logic

* fix: attention mask

* refactor: use custom prompts for example

* feat: add streaming output

* fix: prefill split

refactor: add docstrings and remove unsound/redundant logic
fix: compute optimal blocks logic

* fix: send decoded tokens when `prefilling_split` -> `decoding`

* refactor: move logic to appropriate parent class

* fix: remove truncation as we split prefilling anyways

refactor: early return when we have enough selected requests

* feat: add paged attention forward

* push graph

* add paged sdpa

* update

* better mps defaults

* feat: add progress bar for `generate_batch`

* feat: add opentelemetry metrics (ttft + batch fill %age)

* feat: add tracing

* Add cuda graphs (#38059)

* draft cudagraphs addition

* nits

* styling

* update

* fix

* kinda draft of what it should look like

* fixes

* lol

* not sure why inf everywhere

* can generate but output is shit

* some fixes

* we should have a single device synch

* broken outputs but it does run

* refactor

* updates

* updates with some fixes

* fix mask causality

* another commit that casts after

* add error

* simplify example

* update

* updates

* revert llama changes

* fix merge conflicts

* fix: tracing and metrics

* my updates

* update script default values

* fix block allocation issue

* fix prefill split attention mask

* no bugs

* add paged eager

* fix

* update

* style

* feat: add pytorch traces

* fix

* fix

* refactor: remove pytorch profiler data

* style

* nits

* cleanup

* draft test file

* fix

* fix

* fix paged and graphs

* small renamings

* cleanups and push

* refactor: move tracing and metrics logic to utils

* refactor: trace more blocks of code

* nits

* nits

* update

* to profile or not to profile

* refactor: create new output object

* causal by default

* cleanup but generations are still off for IDK what reason

* simplifications but not running still

* this does work.

* small quality of life updates

* nits

* update

* fix the scheduler

* fix warning

* ol

* fully fixed

* nits

* different generation parameters

* nice

* just style

* feat: add cache memory usage

* feat: add kv cache free memory

* feat: add active/waiting count & req latency

* do the sampling

* fix: synchronize CUDA only if available and improve error handling in ContinuousBatchingManager

* fix on mps

* feat: add dashboard & histogram buckets

* perf: improve waiting reqs data structures

* attempt to compile, but we should only do it on mps AFAIK

* feat: decouple scheduling logic

* just a draft

* cleanup and fixup

* optional

* style

* update

* update

* remove the draft documentation

* fix import as well

* update

* fix the test

* style doomed

---------

Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>
2025-05-22 17:43:48 +02:00
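A sketch of the continuous-batching entry point added in #38085 above; the commit messages reference `generate_batch` and a paged attention cache. The attention-implementation name, signature, and output structure below are assumptions:

```python
# Hedged sketch of batched generation through the continuous-batching path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.float16,
    device_map="cuda",
    attn_implementation="sdpa_paged",  # paged-attention backend name assumed from the commit notes
)

prompts = ["Hello!", "Write a haiku about GPUs."]
batch = [tokenizer(p).input_ids for p in prompts]  # one token-id list per request

outputs = model.generate_batch(
    inputs=batch, generation_config=GenerationConfig(max_new_tokens=32)
)
print(outputs)  # per-request generation results; exact structure is an assumption
```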
7b7bb8df97 Protect ParallelInterface (#38262)
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-21 17:45:38 +02:00