Mirror of https://github.com/huggingface/transformers.git (synced 2025-10-25 04:04:35 +08:00)

Compare commits: reverse_te ... update-qua (259 commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| 1523e08a9e | |||
| 4e90b99ed9 | |||
| 18871599c9 | |||
| d6a5c23f71 | |||
| ae5cbf804b | |||
| c57eafdaa1 | |||
| d4e1acbb7c | |||
| 28fb02fc05 | |||
| 40821a2478 | |||
| 3cb8676a91 | |||
| bf42c3bd4b | |||
| 67890de3b8 | |||
| f297af55df | |||
| 8cadf76e1c | |||
| 9d16441e4f | |||
| 9470d65324 | |||
| 145fbd46cb | |||
| 3033509327 | |||
| befbbf2f98 | |||
| 469eddbe2d | |||
| 05ebe8b9b0 | |||
| eedc113914 | |||
| b99ca4d28b | |||
| 15dd625a0f | |||
| dc42330388 | |||
| 427b62ed1a | |||
| fdb9230485 | |||
| 7b9e51c1a0 | |||
| 5fa4f64605 | |||
| 581524389a | |||
| e3a5889ef0 | |||
| ce1d328e3b | |||
| 4bff54f921 | |||
| 54739a320e | |||
| 5de58d5955 | |||
| 3cd78be34e | |||
| 0db91c3c8d | |||
| 1a0cd69435 | |||
| d8a5d31d9c | |||
| dadb286f06 | |||
| eed11f34ab | |||
| 759a378ee5 | |||
| 20142ab542 | |||
| 7df93d6ffb | |||
| 7693b62268 | |||
| 1ef6c5f1c5 | |||
| e80a65ba4f | |||
| 9568a9dfc5 | |||
| 8568bf1bcf | |||
| 36759f3312 | |||
| 1c471fc307 | |||
| c772d4d91e | |||
| eb0ab3ed4b | |||
| 1646ffb4d1 | |||
| 3ee24e2208 | |||
| 13493215ab | |||
| 8d50fda644 | |||
| b0c0ba7b4d | |||
| 52ea4aa589 | |||
| 7b3d615bc2 | |||
| f5dbfab7f3 | |||
| 8ba3e1505e | |||
| a3d69a8994 | |||
| 68f8186a89 | |||
| e7c36a9d57 | |||
| be8748a53c | |||
| 33eef99250 | |||
| 6de2a4d1f1 | |||
| 25f510a9c6 | |||
| 3ea3ab62d8 | |||
| 134ba90da9 | |||
| 768f3c016e | |||
| a06a0d1263 | |||
| 1cf17077bf | |||
| 6938524a28 | |||
| 7bbc624743 | |||
| e83aaaa86b | |||
| 9f28d0c5d0 | |||
| d2bae7ee9d | |||
| f2d5dfbab2 | |||
| 082e57e0d4 | |||
| 74d3824cc0 | |||
| 45b0c7680c | |||
| 663c851239 | |||
| 893ad04fad | |||
| 5e1fd4e204 | |||
| d0b1d8d888 | |||
| eb811449a2 | |||
| bfa021be05 | |||
| 0a6795af12 | |||
| 1112c54604 | |||
| a86bd6f2d8 | |||
| 48831b7d11 | |||
| 34927b0f73 | |||
| 187439c3fa | |||
| ef976a7e18 | |||
| 33868a057c | |||
| e2ac16b28a | |||
| 86701f2b6f | |||
| 4cc0813e28 | |||
| 6beb3f1691 | |||
| b53e44e847 | |||
| 2801d7bcf6 | |||
| df8640cedb | |||
| 203e27059b | |||
| c443d8d536 | |||
| 114dd812dd | |||
| 294c170ff9 | |||
| b5919e12f7 | |||
| 4ca004eac6 | |||
| ab98f0b0a1 | |||
| dca93ca076 | |||
| 1b86772de5 | |||
| f38531619d | |||
| 405b562698 | |||
| 48872fd6ae | |||
| 9f06fb0505 | |||
| 5251fe6271 | |||
| eab6c491d4 | |||
| 241d79026f | |||
| 8a734ea2c3 | |||
| 913330ca9f | |||
| 0f764a5af7 | |||
| 25a9fc584a | |||
| cd277618d4 | |||
| 9bee9ff5db | |||
| e4449bb790 | |||
| f55595b177 | |||
| 4e2e8809ff | |||
| e9ad460494 | |||
| f339042b0b | |||
| 34620e8f0a | |||
| 56c45d5757 | |||
| 0ab0a42651 | |||
| 8755dd26b7 | |||
| 5392f12e16 | |||
| 004530aa05 | |||
| 9e3d704e23 | |||
| 626c610a4d | |||
| 439334c8fb | |||
| a1835195d1 | |||
| 655bec2da7 | |||
| 63ca6d9771 | |||
| 808d6c50f8 | |||
| fe76b60370 | |||
| a769ed45e1 | |||
| 6cc4a67b3d | |||
| d21dbd1520 | |||
| a17f287ac0 | |||
| 084e946cfd | |||
| 1f7539c829 | |||
| fc1ae7f30f | |||
| c1753436db | |||
| 8b3b9b48fc | |||
| 92bcdff2ef | |||
| 9360f1827d | |||
| fc465bb196 | |||
| fddbd3c13c | |||
| 1d06379331 | |||
| 6a62a6d1b5 | |||
| f73f5e62e2 | |||
| e447185b1f | |||
| 186b8dc190 | |||
| 8814043c8c | |||
| 223855314f | |||
| 9f365fe0ac | |||
| 5779bac4c4 | |||
| 940a6bd343 | |||
| 3d99f1746e | |||
| a308d28d39 | |||
| 4c6e0c9252 | |||
| 1c5918d910 | |||
| d9989e0b9a | |||
| fe35073319 | |||
| e288616606 | |||
| 450b9cbfac | |||
| 6432ad8bb5 | |||
| dd267fca72 | |||
| 30c76d5b28 | |||
| 2112027d0c | |||
| b29c24ff1e | |||
| f0b3ef9e2e | |||
| 9643069465 | |||
| f0e640adfa | |||
| 05863817d6 | |||
| 65753d6065 | |||
| b0f0c61899 | |||
| e50bf61dec | |||
| c42b3223db | |||
| d9f733625c | |||
| 1fb575fcf0 | |||
| 343c8cb86f | |||
| 5ba85de7a4 | |||
| 049682a5a6 | |||
| 644d5287b2 | |||
| b03dc0a87e | |||
| 4b14aa1bcd | |||
| 688eeac81e | |||
| a65a6ce7fe | |||
| e7c3fa7f57 | |||
| 96f67c068b | |||
| eef6b0ba42 | |||
| c14ccbcd64 | |||
| 7a08a772cc | |||
| c31a6ff474 | |||
| 104599d7a8 | |||
| 51e395d13e | |||
| eb6a734995 | |||
| 84b17e03f1 | |||
| 681fc43713 | |||
| 93352e81f5 | |||
| b644178ed4 | |||
| 73d65e637b | |||
| 5077bc034f | |||
| 21d5025826 | |||
| 32590b5ecb | |||
| f701b98e4a | |||
| a4122813d1 | |||
| 24bdc94da5 | |||
| ca541bd4f4 | |||
| 816f442496 | |||
| e46e3bc173 | |||
| 6604764007 | |||
| e95ea479ee | |||
| 0437d6cd03 | |||
| 5a5b590d06 | |||
| b54109c746 | |||
| 6ba31a8a94 | |||
| 7a06d07e14 | |||
| c1c7e89620 | |||
| f51ac9e059 | |||
| 1d2c29f0b3 | |||
| 9470c00042 | |||
| 7f5088503f | |||
| f2846ad2b7 | |||
| b57c7bce21 | |||
| fce1fcfe71 | |||
| aa3e35ac67 | |||
| 6d2b203339 | |||
| 3f06f95ebe | |||
| 3a10c6192b | |||
| bd5dc10fd2 | |||
| cc7d8b87e1 | |||
| 98bad9c6d6 | |||
| 9ba021ea75 | |||
| d087165db0 | |||
| 9d6998c759 | |||
| 554ed5d1e0 | |||
| 8c33cf4eec | |||
| 67acb0b123 | |||
| 0f49deacbf | |||
| d00f1ca860 | |||
| 65442718c4 | |||
| d314ce70bf | |||
| 5ee9e786d1 | |||
| 4de1bdbf63 | |||
| 293e6271c6 | |||
| 23874f5948 | |||
| dd4216b766 |
2
.github/ISSUE_TEMPLATE/bug-report.yml
vendored
@@ -55,7 +55,7 @@ body:
 - deepspeed: HF Trainer/Accelerate: @muellerzr
 - ray/raytune: @richardliaw, @amogkam
 - Big Model Inference: @SunMarc
-- quantization (bitsandbytes, autogpt): @SunMarc
+- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber

 Documentation: @stevhliu

2
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -59,7 +59,7 @@ Integrations:
 - deepspeed: HF Trainer/Accelerate: @muellerzr
 - ray/raytune: @richardliaw, @amogkam
 - Big Model Inference: @SunMarc
-- quantization (bitsandbytes, autogpt): @SunMarc
+- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber

 Documentation: @stevhliu

20
.github/workflows/benchmark.yml
vendored
@@ -16,23 +16,22 @@ env:
|
||||
jobs:
|
||||
benchmark:
|
||||
name: Benchmark
|
||||
strategy:
|
||||
matrix:
|
||||
group: [aws-g5-4xlarge-cache, aws-p4d-24xlarge-plus]
|
||||
runs-on:
|
||||
group: aws-g5-4xlarge-cache
|
||||
group: ${{ matrix.group }}
|
||||
if: |
|
||||
(github.event_name == 'pull_request' && contains( github.event.pull_request.labels.*.name, 'run-benchmark') )||
|
||||
(github.event_name == 'push' && github.ref == 'refs/heads/main')
|
||||
container:
|
||||
image: huggingface/transformers-pytorch-gpu
|
||||
options: --gpus all --privileged --ipc host
|
||||
steps:
|
||||
- name: Get repo
|
||||
if: github.event_name == 'pull_request'
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
ref: ${{ github.event.pull_request.head.sha }}
|
||||
|
||||
- name: Get repo
|
||||
if: github.event_name == 'push'
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
ref: ${{ github.sha }}
|
||||
ref: ${{ github.event.pull_request.head.sha || github.sha }}
|
||||
|
||||
- name: Install libpq-dev & psql
|
||||
run: |
|
||||
@@ -67,6 +66,9 @@ jobs:
|
||||
python3 benchmark/llama.py "${{ github.head_ref || github.ref_name }}" "$commit_id" "$commit_msg"
|
||||
env:
|
||||
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
|
||||
# Enable this to see debug logs
|
||||
# HF_HUB_VERBOSITY: debug
|
||||
# TRANSFORMERS_VERBOSITY: debug
|
||||
PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}
|
||||
PGUSER: transformers_benchmarks
|
||||
PGPASSWORD: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGPASSWORD }}
|
||||
|
||||
642
.github/workflows/build-docker-images.yml
vendored
@@ -3,7 +3,7 @@ name: Build docker images (scheduled)
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- build_ci_docker_image*
|
||||
- update-quantization-docker
|
||||
repository_dispatch:
|
||||
workflow_call:
|
||||
inputs:
|
||||
@@ -18,341 +18,341 @@ concurrency:
|
||||
cancel-in-progress: false
|
||||
|
||||
jobs:
|
||||
latest-docker:
|
||||
name: "Latest PyTorch + TensorFlow [dev]"
|
||||
runs-on:
|
||||
group: aws-general-8-plus
|
||||
steps:
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-all-latest-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-all-latest-gpu${{ inputs.image_postfix }}
|
||||
# Push CI images still need to be re-built daily
|
||||
-
|
||||
name: Build and push (for Push CI) in a daily basis
|
||||
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
if: inputs.image_postfix != '-push-ci'
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-all-latest-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-all-latest-gpu-push-ci
|
||||
# latest-docker:
|
||||
# name: "Latest PyTorch + TensorFlow [dev]"
|
||||
# runs-on:
|
||||
# group: aws-general-8-plus
|
||||
# steps:
|
||||
# -
|
||||
# name: Set up Docker Buildx
|
||||
# uses: docker/setup-buildx-action@v3
|
||||
# -
|
||||
# name: Check out code
|
||||
# uses: actions/checkout@v4
|
||||
# -
|
||||
# name: Login to DockerHub
|
||||
# uses: docker/login-action@v3
|
||||
# with:
|
||||
# username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
# password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
# -
|
||||
# name: Build and push
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-all-latest-gpu
|
||||
# build-args: |
|
||||
# REF=main
|
||||
# push: true
|
||||
# tags: huggingface/transformers-all-latest-gpu${{ inputs.image_postfix }}
|
||||
# # Push CI images still need to be re-built daily
|
||||
# -
|
||||
# name: Build and push (for Push CI) in a daily basis
|
||||
# # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# # The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
# if: inputs.image_postfix != '-push-ci'
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-all-latest-gpu
|
||||
# build-args: |
|
||||
# REF=main
|
||||
# push: true
|
||||
# tags: huggingface/transformers-all-latest-gpu-push-ci
|
||||
|
||||
- name: Post to Slack
|
||||
if: always()
|
||||
uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
with:
|
||||
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
title: 🤗 Results of the transformers-all-latest-gpu-push-ci docker build
|
||||
status: ${{ job.status }}
|
||||
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
# - name: Post to Slack
|
||||
# if: always()
|
||||
# uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
# with:
|
||||
# slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
# title: 🤗 Results of the transformers-all-latest-gpu-push-ci docker build
|
||||
# status: ${{ job.status }}
|
||||
# slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
|
||||
latest-torch-deepspeed-docker:
|
||||
name: "Latest PyTorch + DeepSpeed"
|
||||
runs-on:
|
||||
group: aws-general-8-plus
|
||||
steps:
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-pytorch-deepspeed-latest-gpu${{ inputs.image_postfix }}
|
||||
# latest-torch-deepspeed-docker:
|
||||
# name: "Latest PyTorch + DeepSpeed"
|
||||
# runs-on:
|
||||
# group: aws-general-8-plus
|
||||
# steps:
|
||||
# -
|
||||
# name: Set up Docker Buildx
|
||||
# uses: docker/setup-buildx-action@v3
|
||||
# -
|
||||
# name: Check out code
|
||||
# uses: actions/checkout@v4
|
||||
# -
|
||||
# name: Login to DockerHub
|
||||
# uses: docker/login-action@v3
|
||||
# with:
|
||||
# username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
# password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
# -
|
||||
# name: Build and push
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-pytorch-deepspeed-latest-gpu
|
||||
# build-args: |
|
||||
# REF=main
|
||||
# push: true
|
||||
# tags: huggingface/transformers-pytorch-deepspeed-latest-gpu${{ inputs.image_postfix }}
|
||||
|
||||
- name: Post to Slack
|
||||
if: always()
|
||||
uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
with:
|
||||
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER}}
|
||||
title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu docker build
|
||||
status: ${{ job.status }}
|
||||
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
# - name: Post to Slack
|
||||
# if: always()
|
||||
# uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
# with:
|
||||
# slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER}}
|
||||
# title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu docker build
|
||||
# status: ${{ job.status }}
|
||||
# slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
|
||||
# Can't build 2 images in a single job `latest-torch-deepspeed-docker` (for `nvcr.io/nvidia`)
|
||||
latest-torch-deepspeed-docker-for-push-ci-daily-build:
|
||||
name: "Latest PyTorch + DeepSpeed (Push CI - Daily Build)"
|
||||
runs-on:
|
||||
group: aws-general-8-plus
|
||||
steps:
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
# Push CI images still need to be re-built daily
|
||||
-
|
||||
name: Build and push (for Push CI) in a daily basis
|
||||
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
if: inputs.image_postfix != '-push-ci'
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
|
||||
# # Can't build 2 images in a single job `latest-torch-deepspeed-docker` (for `nvcr.io/nvidia`)
|
||||
# latest-torch-deepspeed-docker-for-push-ci-daily-build:
|
||||
# name: "Latest PyTorch + DeepSpeed (Push CI - Daily Build)"
|
||||
# runs-on:
|
||||
# group: aws-general-8-plus
|
||||
# steps:
|
||||
# -
|
||||
# name: Set up Docker Buildx
|
||||
# uses: docker/setup-buildx-action@v3
|
||||
# -
|
||||
# name: Check out code
|
||||
# uses: actions/checkout@v4
|
||||
# -
|
||||
# name: Login to DockerHub
|
||||
# uses: docker/login-action@v3
|
||||
# with:
|
||||
# username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
# password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
# # Push CI images still need to be re-built daily
|
||||
# -
|
||||
# name: Build and push (for Push CI) in a daily basis
|
||||
# # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# # The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
# if: inputs.image_postfix != '-push-ci'
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-pytorch-deepspeed-latest-gpu
|
||||
# build-args: |
|
||||
# REF=main
|
||||
# push: true
|
||||
# tags: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
|
||||
|
||||
- name: Post to Slack
|
||||
if: always()
|
||||
uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
with:
|
||||
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu-push-ci docker build
|
||||
status: ${{ job.status }}
|
||||
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
# - name: Post to Slack
|
||||
# if: always()
|
||||
# uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
# with:
|
||||
# slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
# title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu-push-ci docker build
|
||||
# status: ${{ job.status }}
|
||||
# slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
|
||||
doc-builder:
|
||||
name: "Doc builder"
|
||||
# Push CI doesn't need this image
|
||||
if: inputs.image_postfix != '-push-ci'
|
||||
runs-on:
|
||||
group: aws-general-8-plus
|
||||
steps:
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-doc-builder
|
||||
push: true
|
||||
tags: huggingface/transformers-doc-builder
|
||||
# doc-builder:
|
||||
# name: "Doc builder"
|
||||
# # Push CI doesn't need this image
|
||||
# if: inputs.image_postfix != '-push-ci'
|
||||
# runs-on:
|
||||
# group: aws-general-8-plus
|
||||
# steps:
|
||||
# -
|
||||
# name: Set up Docker Buildx
|
||||
# uses: docker/setup-buildx-action@v3
|
||||
# -
|
||||
# name: Check out code
|
||||
# uses: actions/checkout@v4
|
||||
# -
|
||||
# name: Login to DockerHub
|
||||
# uses: docker/login-action@v3
|
||||
# with:
|
||||
# username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
# password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
# -
|
||||
# name: Build and push
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-doc-builder
|
||||
# push: true
|
||||
# tags: huggingface/transformers-doc-builder
|
||||
|
||||
- name: Post to Slack
|
||||
if: always()
|
||||
uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
with:
|
||||
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
title: 🤗 Results of the huggingface/transformers-doc-builder docker build
|
||||
status: ${{ job.status }}
|
||||
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
# - name: Post to Slack
|
||||
# if: always()
|
||||
# uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
# with:
|
||||
# slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
# title: 🤗 Results of the huggingface/transformers-doc-builder docker build
|
||||
# status: ${{ job.status }}
|
||||
# slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
|
||||
latest-pytorch:
|
||||
name: "Latest PyTorch [dev]"
|
||||
# Push CI doesn't need this image
|
||||
if: inputs.image_postfix != '-push-ci'
|
||||
runs-on:
|
||||
group: aws-general-8-plus
|
||||
steps:
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-pytorch-gpu
|
||||
# latest-pytorch:
|
||||
# name: "Latest PyTorch [dev]"
|
||||
# # Push CI doesn't need this image
|
||||
# if: inputs.image_postfix != '-push-ci'
|
||||
# runs-on:
|
||||
# group: aws-general-8-plus
|
||||
# steps:
|
||||
# -
|
||||
# name: Set up Docker Buildx
|
||||
# uses: docker/setup-buildx-action@v3
|
||||
# -
|
||||
# name: Check out code
|
||||
# uses: actions/checkout@v4
|
||||
# -
|
||||
# name: Login to DockerHub
|
||||
# uses: docker/login-action@v3
|
||||
# with:
|
||||
# username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
# password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
# -
|
||||
# name: Build and push
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-pytorch-gpu
|
||||
# build-args: |
|
||||
# REF=main
|
||||
# push: true
|
||||
# tags: huggingface/transformers-pytorch-gpu
|
||||
|
||||
- name: Post to Slack
|
||||
if: always()
|
||||
uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
with:
|
||||
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
title: 🤗 Results of the huggingface/transformers-pytorch-gpudocker build
|
||||
status: ${{ job.status }}
|
||||
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
# - name: Post to Slack
|
||||
# if: always()
|
||||
# uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
# with:
|
||||
# slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
# title: 🤗 Results of the huggingface/transformers-pytorch-gpudocker build
|
||||
# status: ${{ job.status }}
|
||||
# slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
|
||||
latest-pytorch-amd:
|
||||
name: "Latest PyTorch (AMD) [dev]"
|
||||
runs-on:
|
||||
group: aws-general-8-plus
|
||||
steps:
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-amd-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
|
||||
# Push CI images still need to be re-built daily
|
||||
-
|
||||
name: Build and push (for Push CI) in a daily basis
|
||||
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
if: inputs.image_postfix != '-push-ci'
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-amd-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-pytorch-amd-gpu-push-ci
|
||||
# latest-pytorch-amd:
|
||||
# name: "Latest PyTorch (AMD) [dev]"
|
||||
# runs-on:
|
||||
# group: aws-general-8-plus
|
||||
# steps:
|
||||
# -
|
||||
# name: Set up Docker Buildx
|
||||
# uses: docker/setup-buildx-action@v3
|
||||
# -
|
||||
# name: Check out code
|
||||
# uses: actions/checkout@v4
|
||||
# -
|
||||
# name: Login to DockerHub
|
||||
# uses: docker/login-action@v3
|
||||
# with:
|
||||
# username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
# password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
# -
|
||||
# name: Build and push
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-pytorch-amd-gpu
|
||||
# build-args: |
|
||||
# REF=main
|
||||
# push: true
|
||||
# tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
|
||||
# # Push CI images still need to be re-built daily
|
||||
# -
|
||||
# name: Build and push (for Push CI) in a daily basis
|
||||
# # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# # The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
# if: inputs.image_postfix != '-push-ci'
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-pytorch-amd-gpu
|
||||
# build-args: |
|
||||
# REF=main
|
||||
# push: true
|
||||
# tags: huggingface/transformers-pytorch-amd-gpu-push-ci
|
||||
|
||||
- name: Post to Slack
|
||||
if: always()
|
||||
uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
with:
|
||||
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
title: 🤗 Results of the huggingface/transformers-pytorch-amd-gpu-push-ci build
|
||||
status: ${{ job.status }}
|
||||
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
# - name: Post to Slack
|
||||
# if: always()
|
||||
# uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
# with:
|
||||
# slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
# title: 🤗 Results of the huggingface/transformers-pytorch-amd-gpu-push-ci build
|
||||
# status: ${{ job.status }}
|
||||
# slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
|
||||
latest-tensorflow:
|
||||
name: "Latest TensorFlow [dev]"
|
||||
# Push CI doesn't need this image
|
||||
if: inputs.image_postfix != '-push-ci'
|
||||
runs-on:
|
||||
group: aws-general-8-plus
|
||||
steps:
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-tensorflow-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-tensorflow-gpu
|
||||
# latest-tensorflow:
|
||||
# name: "Latest TensorFlow [dev]"
|
||||
# # Push CI doesn't need this image
|
||||
# if: inputs.image_postfix != '-push-ci'
|
||||
# runs-on:
|
||||
# group: aws-general-8-plus
|
||||
# steps:
|
||||
# -
|
||||
# name: Set up Docker Buildx
|
||||
# uses: docker/setup-buildx-action@v3
|
||||
# -
|
||||
# name: Check out code
|
||||
# uses: actions/checkout@v4
|
||||
# -
|
||||
# name: Login to DockerHub
|
||||
# uses: docker/login-action@v3
|
||||
# with:
|
||||
# username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
# password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
# -
|
||||
# name: Build and push
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-tensorflow-gpu
|
||||
# build-args: |
|
||||
# REF=main
|
||||
# push: true
|
||||
# tags: huggingface/transformers-tensorflow-gpu
|
||||
|
||||
- name: Post to Slack
|
||||
if: always()
|
||||
uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
with:
|
||||
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
title: 🤗 Results of the huggingface/transformers-tensorflow-gpu build
|
||||
status: ${{ job.status }}
|
||||
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
# - name: Post to Slack
|
||||
# if: always()
|
||||
# uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
# with:
|
||||
# slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
# title: 🤗 Results of the huggingface/transformers-tensorflow-gpu build
|
||||
# status: ${{ job.status }}
|
||||
# slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
|
||||
latest-pytorch-deepspeed-amd:
|
||||
name: "PyTorch + DeepSpeed (AMD) [dev]"
|
||||
runs-on:
|
||||
group: aws-general-8-plus
|
||||
steps:
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-deepspeed-amd-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }}
|
||||
# Push CI images still need to be re-built daily
|
||||
-
|
||||
name: Build and push (for Push CI) in a daily basis
|
||||
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
if: inputs.image_postfix != '-push-ci'
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-deepspeed-amd-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci
|
||||
# latest-pytorch-deepspeed-amd:
|
||||
# name: "PyTorch + DeepSpeed (AMD) [dev]"
|
||||
# runs-on:
|
||||
# group: aws-general-8-plus
|
||||
# steps:
|
||||
# -
|
||||
# name: Set up Docker Buildx
|
||||
# uses: docker/setup-buildx-action@v3
|
||||
# -
|
||||
# name: Check out code
|
||||
# uses: actions/checkout@v4
|
||||
# -
|
||||
# name: Login to DockerHub
|
||||
# uses: docker/login-action@v3
|
||||
# with:
|
||||
# username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
# password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
# -
|
||||
# name: Build and push
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-pytorch-deepspeed-amd-gpu
|
||||
# build-args: |
|
||||
# REF=main
|
||||
# push: true
|
||||
# tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }}
|
||||
# # Push CI images still need to be re-built daily
|
||||
# -
|
||||
# name: Build and push (for Push CI) in a daily basis
|
||||
# # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# # The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
# if: inputs.image_postfix != '-push-ci'
|
||||
# uses: docker/build-push-action@v5
|
||||
# with:
|
||||
# context: ./docker/transformers-pytorch-deepspeed-amd-gpu
|
||||
# build-args: |
|
||||
# REF=main
|
||||
# push: true
|
||||
# tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci
|
||||
|
||||
- name: Post to Slack
|
||||
if: always()
|
||||
uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
with:
|
||||
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
title: 🤗 Results of the transformers-pytorch-deepspeed-amd-gpu build
|
||||
status: ${{ job.status }}
|
||||
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
# - name: Post to Slack
|
||||
# if: always()
|
||||
# uses: huggingface/hf-workflows/.github/actions/post-slack@main
|
||||
# with:
|
||||
# slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
|
||||
# title: 🤗 Results of the transformers-pytorch-deepspeed-amd-gpu build
|
||||
# status: ${{ job.status }}
|
||||
# slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
|
||||
latest-quantization-torch-docker:
|
||||
name: "Latest Pytorch + Quantization [dev]"
|
||||
|
||||
129
.github/workflows/check_failed_model_tests.yml
vendored
Normal file
@@ -0,0 +1,129 @@
|
||||
name: Process failed tests
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
docker:
|
||||
required: true
|
||||
type: string
|
||||
start_sha:
|
||||
required: true
|
||||
type: string
|
||||
|
||||
|
||||
env:
|
||||
HF_HOME: /mnt/cache
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
OMP_NUM_THREADS: 8
|
||||
MKL_NUM_THREADS: 8
|
||||
RUN_SLOW: yes
|
||||
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
|
||||
# This token is created under the bot `hf-transformers-bot`.
|
||||
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
|
||||
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
|
||||
TF_FORCE_GPU_ALLOW_GROWTH: true
|
||||
RUN_PT_TF_CROSS_TESTS: 1
|
||||
CUDA_VISIBLE_DEVICES: 0,1
|
||||
|
||||
|
||||
jobs:
|
||||
run_models_gpu:
|
||||
name: " "
|
||||
runs-on:
|
||||
group: aws-g4dn-2xlarge-cache
|
||||
container:
|
||||
image: ${{ inputs.docker }}
|
||||
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- uses: actions/download-artifact@v4
|
||||
with:
|
||||
name: ci_results_run_models_gpu
|
||||
path: /transformers/ci_results_run_models_gpu
|
||||
|
||||
- name: Update clone
|
||||
working-directory: /transformers
|
||||
run: git fetch && git checkout ${{ github.sha }}
|
||||
|
||||
- name: Get target commit
|
||||
working-directory: /transformers/utils
|
||||
run: |
|
||||
echo "END_SHA=$(TOKEN=${{ secrets.ACCESS_REPO_INFO_TOKEN }} python3 -c 'import os; from get_previous_daily_ci import get_last_daily_ci_run_commit; commit=get_last_daily_ci_run_commit(token=os.environ["TOKEN"]); print(commit)')" >> $GITHUB_ENV
|
||||
|
||||
- name: Checkout to `start_sha`
|
||||
working-directory: /transformers
|
||||
run: git fetch && git checkout ${{ inputs.start_sha }}
|
||||
|
||||
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
|
||||
working-directory: /transformers
|
||||
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Environment
|
||||
working-directory: /transformers
|
||||
run: |
|
||||
python3 utils/print_env.py
|
||||
|
||||
- name: Show installed libraries and their versions
|
||||
working-directory: /transformers
|
||||
run: pip freeze
|
||||
|
||||
- name: Check failed tests
|
||||
working-directory: /transformers
|
||||
run: python3 utils/check_bad_commit.py --start_commit ${{ inputs.start_sha }} --end_commit ${{ env.END_SHA }} --file ci_results_run_models_gpu/new_model_failures.json --output_file new_model_failures_with_bad_commit.json
|
||||
|
||||
- name: Show results
|
||||
working-directory: /transformers
|
||||
run: |
|
||||
ls -l new_model_failures_with_bad_commit.json
|
||||
cat new_model_failures_with_bad_commit.json
|
||||
|
||||
- name: Checkout back
|
||||
working-directory: /transformers
|
||||
run: |
|
||||
git checkout ${{ inputs.start_sha }}
|
||||
|
||||
- name: Process report
|
||||
shell: bash
|
||||
working-directory: /transformers
|
||||
env:
|
||||
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
|
||||
run: |
|
||||
python3 utils/process_bad_commit_report.py
|
||||
|
||||
- name: Process report
|
||||
shell: bash
|
||||
working-directory: /transformers
|
||||
env:
|
||||
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
|
||||
run: |
|
||||
{
|
||||
echo 'REPORT_TEXT<<EOF'
|
||||
python3 utils/process_bad_commit_report.py
|
||||
echo EOF
|
||||
} >> "$GITHUB_ENV"
|
||||
|
||||
- name: Send processed report
|
||||
if: ${{ !endsWith(env.REPORT_TEXT, '{}') }}
|
||||
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
|
||||
with:
|
||||
# Slack channel id, channel name, or user id to post message.
|
||||
# See also: https://api.slack.com/methods/chat.postMessage#channels
|
||||
channel-id: '#transformers-ci-feedback-tests'
|
||||
# For posting a rich message using Block Kit
|
||||
payload: |
|
||||
{
|
||||
"blocks": [
|
||||
{
|
||||
"type": "section",
|
||||
"text": {
|
||||
"type": "mrkdwn",
|
||||
"text": "${{ env.REPORT_TEXT }}"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
env:
|
||||
SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
|
||||
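The new `check_failed_model_tests.yml` workflow pairs the last daily-CI commit (obtained via `get_last_daily_ci_run_commit`) with the current `start_sha` and asks `utils/check_bad_commit.py` to attribute each new test failure to a specific commit in between. As a rough illustration of that idea only (not the actual implementation in `utils/check_bad_commit.py`), the search reduces to a git-bisect-style walk over the candidate commits, where `test_passes` is a hypothetical callback that reruns the failing test at a given SHA:

```python
from typing import Callable, List, Optional


def find_first_bad_commit(
    commits: List[str],                      # candidate SHAs, ordered oldest -> newest
    test_passes: Callable[[str], bool],      # hypothetical: rerun the failing test at a SHA
) -> Optional[str]:
    """Binary-search for the first commit at which the test starts failing.

    Assumes the test passes at the oldest commit in the range; returns None
    if the test never fails within the given commits.
    """
    lo, hi = 0, len(commits) - 1
    first_bad = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if test_passes(commits[mid]):
            lo = mid + 1          # failure was introduced later
        else:
            first_bad = commits[mid]
            hi = mid - 1          # an earlier commit may already be bad
    return first_bad
```

The second `Process report` step then captures the script's output as a multi-line `REPORT_TEXT` value using GitHub Actions' `NAME<<EOF ... EOF` syntax for `$GITHUB_ENV`, which is what allows the Slack step that follows to interpolate a report spanning several lines.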
10
.github/workflows/self-scheduled.yml
vendored
@@ -562,3 +562,13 @@ jobs:
|
||||
ci_event: ${{ inputs.ci_event }}
|
||||
|
||||
secrets: inherit
|
||||
|
||||
check_new_model_failures:
|
||||
if: ${{ always() && inputs.ci_event == 'Daily CI' && inputs.job == 'run_models_gpu' && needs.send_results.result == 'success' }}
|
||||
name: Check new model failures
|
||||
needs: send_results
|
||||
uses: ./.github/workflows/check_failed_model_tests.yml
|
||||
with:
|
||||
docker: ${{ inputs.docker }}
|
||||
start_sha: ${{ github.sha }}
|
||||
secrets: inherit
|
||||
@@ -132,7 +132,7 @@ You will need basic `git` proficiency to contribute to
|
||||
manual. Type `git --help` in a shell and enjoy! If you prefer books, [Pro
|
||||
Git](https://git-scm.com/book/en/v2) is a very good reference.
|
||||
|
||||
You'll need **[Python 3.8](https://github.com/huggingface/transformers/blob/main/setup.py#L449)** or above to contribute to 🤗 Transformers. Follow the steps below to start contributing:
|
||||
You'll need **[Python 3.9](https://github.com/huggingface/transformers/blob/main/setup.py#L449)** or above to contribute to 🤗 Transformers. Follow the steps below to start contributing:
|
||||
|
||||
1. Fork the [repository](https://github.com/huggingface/transformers) by
|
||||
clicking on the **[Fork](https://github.com/huggingface/transformers/fork)** button on the repository's page. This creates a copy of the code
|
||||
|
||||
@@ -128,10 +128,10 @@ incredible projects built in the vicinity of transformers.
|
||||
|
||||
If you own or use a project that you believe should be part of the list, please open a PR to add it!
|
||||
|
||||
## If you are looking for custom support from the Hugging Face team
|
||||
## Serious about AI in your organisation? Build faster with the Hugging Face Enterprise Hub.
|
||||
|
||||
<a target="_blank" href="https://huggingface.co/support">
|
||||
<img alt="HuggingFace Expert Acceleration Program" src="https://cdn-media.huggingface.co/marketing/transformers/new-support-improved.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
|
||||
<a target="_blank" href="https://huggingface.co/enterprise">
|
||||
<img alt="Hugging Face Enterprise Hub" src="https://github.com/user-attachments/assets/247fb16d-d251-4583-96c4-d3d76dda4925">
|
||||
</a><br>
|
||||
|
||||
## Quick tour
|
||||
@@ -249,7 +249,7 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta
|
||||
|
||||
### With pip
|
||||
|
||||
This repository is tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.11+, and TensorFlow 2.6+.
|
||||
This repository is tested on Python 3.9+, Flax 0.4.1+, PyTorch 1.11+, and TensorFlow 2.6+.
|
||||
|
||||
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
|
||||
|
||||
|
||||
File diff suppressed because it is too large
@@ -7,6 +7,10 @@ CREATE TABLE IF NOT EXISTS benchmarks (
|
||||
created_at timestamp without time zone NOT NULL DEFAULT (current_timestamp AT TIME ZONE 'UTC')
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS benchmarks_benchmark_id_idx ON benchmarks (benchmark_id);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS benchmarks_branch_idx ON benchmarks (branch);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS device_measurements (
|
||||
measurement_id SERIAL PRIMARY KEY,
|
||||
benchmark_id int REFERENCES benchmarks (benchmark_id),
|
||||
@@ -17,6 +21,8 @@ CREATE TABLE IF NOT EXISTS device_measurements (
|
||||
time timestamp without time zone NOT NULL DEFAULT (current_timestamp AT TIME ZONE 'UTC')
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS device_measurements_branch_idx ON device_measurements (benchmark_id);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS model_measurements (
|
||||
measurement_id SERIAL PRIMARY KEY,
|
||||
benchmark_id int REFERENCES benchmarks (benchmark_id),
|
||||
@@ -24,3 +30,4 @@ CREATE TABLE IF NOT EXISTS model_measurements (
|
||||
time timestamp without time zone NOT NULL DEFAULT (current_timestamp AT TIME ZONE 'UTC')
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS model_measurements_branch_idx ON model_measurements (benchmark_id);
|
||||
|
||||
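The new indexes cover the columns the benchmark jobs filter on: `benchmark_id` when joining measurements back to a run, and `branch` when pulling the history for `main` or a PR branch. A minimal sketch of such a lookup is shown below; it assumes a `psycopg2` connection built from the same `PGHOST`/`PGUSER`/`PGPASSWORD` values the benchmark workflow exports (the driver and the database name are assumptions, only the column names come from the schema above).

```python
import os

import psycopg2  # assumed driver; the workflow itself only installs libpq-dev and psql


def latest_benchmarks(branch: str, limit: int = 10):
    """Fetch the most recent benchmark runs for a branch (served by benchmarks_branch_idx)."""
    conn = psycopg2.connect(
        host=os.environ["PGHOST"],
        user=os.environ.get("PGUSER", "transformers_benchmarks"),
        password=os.environ["PGPASSWORD"],
        dbname=os.environ.get("PGDATABASE", "metrics"),  # hypothetical database name
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT benchmark_id, created_at
                FROM benchmarks
                WHERE branch = %s
                ORDER BY created_at DESC
                LIMIT %s
                """,
                (branch, limit),
            )
            return cur.fetchall()
    finally:
        conn.close()
```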
@@ -96,17 +96,21 @@ def run_benchmark(branch: str, commit_id: str, commit_msg: str, num_tokens_to_ge
|
||||
)
|
||||
conn.commit()
|
||||
benchmark_id = cur.fetchone()[0]
|
||||
logger.info(f"running benchmark #{benchmark_id} on {gpu_name}")
|
||||
metrics_thread = Thread(target=collect_metrics, args=[benchmark_id, continue_metric_collection])
|
||||
metrics_thread.start()
|
||||
logger.info("started background thread to fetch device metrics")
|
||||
|
||||
os.environ["TOKENIZERS_PARALLELISM"] = "false" # silence warnings when compiling
|
||||
|
||||
device = "cuda"
|
||||
ckpt = "meta-llama/Llama-2-7b-hf"
|
||||
|
||||
logger.info("downloading weights")
|
||||
# This is to avoid counting download in model load time measurement
|
||||
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16)
|
||||
gen_config = GenerationConfig(do_sample=False, top_p=1, temperature=1)
|
||||
logger.info("loading model")
|
||||
start = perf_counter()
|
||||
model = AutoModelForCausalLM.from_pretrained(
|
||||
ckpt, torch_dtype=torch.float16, generation_config=gen_config
|
||||
|
||||
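The hunk above loads the checkpoint once before starting the timer so that the subsequent, timed `from_pretrained` call measures model-load time rather than download time. A standalone sketch of that pattern is below; it swaps the gated `meta-llama/Llama-2-7b-hf` checkpoint for the small, public `gpt2` checkpoint so it runs without gated access, and omits the fp16/`GenerationConfig` details of the benchmark script.

```python
from time import perf_counter

from transformers import AutoModelForCausalLM

ckpt = "gpt2"  # small public checkpoint used in place of meta-llama/Llama-2-7b-hf

# First load: warms the local cache so the download is not counted below.
AutoModelForCausalLM.from_pretrained(ckpt)

# Second load: weights come from the local cache, so this measures load time only.
start = perf_counter()
model = AutoModelForCausalLM.from_pretrained(ckpt)
print(f"model load time: {perf_counter() - start:.2f}s")
```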
@@ -1,4 +1,4 @@
|
||||
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
|
||||
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
|
||||
LABEL maintainer="Hugging Face"
|
||||
|
||||
ARG DEBIAN_FRONTEND=noninteractive
|
||||
@@ -9,7 +9,7 @@ SHELL ["sh", "-lc"]
|
||||
# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant
|
||||
# to be used as arguments for docker build (so far).
|
||||
|
||||
ARG PYTORCH='2.4.0'
|
||||
ARG PYTORCH='2.5.1'
|
||||
# (not always a valid torch version)
|
||||
ARG INTEL_TORCH_EXT='2.3.0'
|
||||
# Example: `cu102`, `cu113`, etc.
|
||||
@@ -26,7 +26,7 @@ RUN git clone https://github.com/huggingface/transformers && cd transformers &&
|
||||
# 1. Put several commands in a single `RUN` to avoid image/layer exporting issue. Could be revised in the future.
|
||||
# 2. Regarding `torch` part, We might need to specify proper versions for `torchvision` and `torchaudio`.
|
||||
# Currently, let's not bother to specify their versions explicitly (so installed with their latest release versions).
|
||||
RUN python3 -m pip install --no-cache-dir -U tensorflow==2.13 protobuf==3.20.3 tensorflow_text tensorflow_probability && python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
|
||||
RUN python3 -m pip install --no-cache-dir -U tensorflow==2.13 protobuf==3.20.3 "tensorflow_text<2.16" "tensorflow_probability<0.22" && python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
|
||||
|
||||
RUN python3 -m pip uninstall -y flax jax
|
||||
|
||||
@@ -43,7 +43,7 @@ RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/pef
|
||||
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/optimum@main#egg=optimum
|
||||
|
||||
# For video model testing
|
||||
RUN python3 -m pip install --no-cache-dir decord av==9.2.0
|
||||
RUN python3 -m pip install --no-cache-dir av==9.2.0
|
||||
|
||||
# Some slow tests require bnb
|
||||
RUN python3 -m pip install --no-cache-dir bitsandbytes
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
|
||||
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
|
||||
LABEL maintainer="Hugging Face"
|
||||
|
||||
ARG DEBIAN_FRONTEND=noninteractive
|
||||
@@ -11,7 +11,7 @@ ARG REF=main
|
||||
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
|
||||
|
||||
# If set to nothing, will install the latest version
|
||||
ARG PYTORCH='2.4.0'
|
||||
ARG PYTORCH='2.5.1'
|
||||
ARG TORCH_VISION=''
|
||||
ARG TORCH_AUDIO=''
|
||||
# Example: `cu102`, `cu113`, etc.
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
|
||||
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
|
||||
LABEL maintainer="Hugging Face"
|
||||
|
||||
ARG DEBIAN_FRONTEND=noninteractive
|
||||
@@ -9,12 +9,12 @@ SHELL ["sh", "-lc"]
|
||||
# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant
|
||||
# to be used as arguments for docker build (so far).
|
||||
|
||||
ARG PYTORCH='2.2.1'
|
||||
ARG PYTORCH='2.4.1'
|
||||
# Example: `cu102`, `cu113`, etc.
|
||||
ARG CUDA='cu118'
|
||||
|
||||
RUN apt update
|
||||
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python python3-pip ffmpeg
|
||||
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
|
||||
RUN python3 -m pip install --no-cache-dir --upgrade pip
|
||||
|
||||
ARG REF=main
|
||||
@@ -53,7 +53,7 @@ RUN python3 -m pip install --no-cache-dir gguf
|
||||
|
||||
# Add autoawq for quantization testing
|
||||
# >=v0.2.3 needed for compatibility with torch 2.2.1
|
||||
RUN python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.3/autoawq-0.2.3+cu118-cp38-cp38-linux_x86_64.whl
|
||||
RUN python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.3/autoawq-0.2.3+cu118-cp310-cp310-linux_x86_64.whl
|
||||
|
||||
# Add quanto for quantization testing
|
||||
RUN python3 -m pip install --no-cache-dir optimum-quanto
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
|
||||
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
|
||||
LABEL maintainer="Hugging Face"
|
||||
|
||||
ARG DEBIAN_FRONTEND=noninteractive
|
||||
@@ -18,7 +18,7 @@ RUN [ ${#TENSORFLOW} -gt 0 ] && VERSION='tensorflow=='$TENSORFLOW'.*' || VERSIO
|
||||
RUN python3 -m pip uninstall -y torch flax
|
||||
RUN python3 -m pip install -U "itsdangerous<2.1.0"
|
||||
|
||||
RUN python3 -m pip install --no-cache-dir -U tensorflow_probability
|
||||
RUN python3 -m pip install --no-cache-dir -U "tensorflow_probability<0.22"
|
||||
|
||||
# When installing in editable mode, `transformers` is not recognized as a package.
|
||||
# this line must be added in order for python to be aware of transformers.
|
||||
|
||||
@@ -276,14 +276,14 @@ building the return.
|
||||
|
||||
Here's an example of a single value return:
|
||||
|
||||
```
|
||||
```python
|
||||
Returns:
|
||||
`List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
|
||||
```
|
||||
|
||||
Here's an example of a tuple return, comprising several objects:
|
||||
|
||||
```
|
||||
```python
|
||||
Returns:
|
||||
`tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
|
||||
- ** loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` --
|
||||
@@ -322,10 +322,9 @@ includes an example of how to transcribe speech to text in the
|
||||
|
||||
The syntax for Example docstrings can look as follows:
|
||||
|
||||
```
|
||||
```python
|
||||
Example:
|
||||
|
||||
```python
|
||||
>>> from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
|
||||
>>> from datasets import load_dataset
|
||||
>>> import torch
|
||||
@@ -347,7 +346,6 @@ The syntax for Example docstrings can look as follows:
|
||||
>>> transcription = processor.batch_decode(predicted_ids)
|
||||
>>> transcription[0]
|
||||
'MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL'
|
||||
```
|
||||
```
|
||||
|
||||
The docstring should give a minimal, clear example of how the respective model
|
||||
|
||||
@@ -1,57 +1,70 @@
|
||||
### Translating the Transformers documentation into your language
|
||||
# Translating the Transformers documentation into your language
|
||||
|
||||
As part of our mission to democratize machine learning, we'd love to make the Transformers library available in many more languages! Follow the steps below if you want to help translate the documentation into your language 🙏.
|
||||
As part of our mission to democratize machine learning, we aim to make the Transformers library available in many more languages! Follow the steps below to help translate the documentation into your language.
|
||||
|
||||
**🗞️ Open an issue**
|
||||
## Open an Issue
|
||||
|
||||
To get started, navigate to the [Issues](https://github.com/huggingface/transformers/issues) page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting the "Translation template" from the "New issue" button.
|
||||
1. Navigate to the Issues page of this repository.
|
||||
2. Check if anyone has already opened an issue for your language.
|
||||
3. If not, create a new issue by selecting the "Translation template" from the "New issue" button.
|
||||
4. Post a comment indicating which chapters you'd like to work on, and we'll add your name to the list.
|
||||
|
||||
Once an issue exists, post a comment to indicate which chapters you'd like to work on, and we'll add your name to the list.
|
||||
## Fork the Repository
|
||||
|
||||
1. First, fork the Transformers repo by clicking the Fork button in the top-right corner.
|
||||
2. Clone your fork to your local machine for editing with the following command:
|
||||
|
||||
**🍴 Fork the repository**
|
||||
```bash
|
||||
git clone https://github.com/YOUR-USERNAME/transformers.git
|
||||
```
|
||||
|
||||
Replace `YOUR-USERNAME` with your GitHub username.
|
||||
|
||||
First, you'll need to [fork the Transformers repo](https://docs.github.com/en/get-started/quickstart/fork-a-repo). You can do this by clicking on the **Fork** button on the top-right corner of this repo's page.
|
||||
## Copy-paste the English version with a new language code
|
||||
|
||||
Once you've forked the repo, you'll want to get the files on your local machine for editing. You can do that by cloning the fork with Git as follows:
|
||||
The documentation files are organized in the following directory:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/YOUR-USERNAME/transformers.git
|
||||
```
|
||||
- **docs/source**: This contains all documentation materials organized by language.
|
||||
|
||||
**📋 Copy-paste the English version with a new language code**
|
||||
To copy the English version to your new language directory:
|
||||
|
||||
The documentation files are in one leading directory:
|
||||
1. Navigate to your fork of the repository:
|
||||
|
||||
- [`docs/source`](https://github.com/huggingface/transformers/tree/main/docs/source): All the documentation materials are organized here by language.
|
||||
```bash
|
||||
cd ~/path/to/transformers/docs
|
||||
```
|
||||
|
||||
You'll only need to copy the files in the [`docs/source/en`](https://github.com/huggingface/transformers/tree/main/docs/source/en) directory, so first navigate to your fork of the repo and run the following:
|
||||
Replace `~/path/to` with your actual path.
|
||||
|
||||
```bash
|
||||
cd ~/path/to/transformers/docs
|
||||
cp -r source/en source/LANG-ID
|
||||
```
|
||||
2. Run the following command:
|
||||
|
||||
Here, `LANG-ID` should be one of the ISO 639-1 or ISO 639-2 language codes -- see [here](https://www.loc.gov/standards/iso639-2/php/code_list.php) for a handy table.
|
||||
```bash
|
||||
cp -r source/en source/LANG-ID
|
||||
```
|
||||
|
||||
**✍️ Start translating**
|
||||
Replace `LANG-ID` with the appropriate ISO 639-1 or ISO 639-2 language code (see [this table](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) for reference).
|
||||
|
||||
The fun part comes - translating the text!
|
||||
## Start translating
|
||||
|
||||
The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website.
|
||||
Begin translating the text!
|
||||
|
||||
> 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/LANG-ID/` directory!
|
||||
1. Start with the `_toctree.yml` file that corresponds to your documentation chapter. This file is essential for rendering the table of contents on the website.
|
||||
|
||||
The fields you should add are `local` (with the name of the file containing the translation; e.g. `autoclass_tutorial`), and `title` (with the title of the doc in your language; e.g. `Load pretrained instances with an AutoClass`) -- as a reference, here is the `_toctree.yml` for [English](https://github.com/huggingface/transformers/blob/main/docs/source/en/_toctree.yml):
|
||||
- If the `_toctree.yml` file doesn't exist for your language, create one by copying the English version and removing unrelated sections.
|
||||
- Ensure it is placed in the `docs/source/LANG-ID/` directory.
|
||||
|
||||
```yaml
|
||||
- sections:
|
||||
- local: pipeline_tutorial # Do not change this! Use the same name for your .md file
|
||||
title: Pipelines for inference # Translate this!
|
||||
...
|
||||
title: Tutorials # Translate this!
|
||||
```
|
||||
Here's an example structure for the `_toctree.yml` file:
|
||||
|
||||
Once you have translated the `_toctree.yml` file, you can start translating the [MDX](https://mdxjs.com/) files associated with your docs chapter.
|
||||
```yaml
|
||||
- sections:
|
||||
- local: pipeline_tutorial # Keep this name for your .md file
|
||||
title: Pipelines for Inference # Translate this
|
||||
...
|
||||
title: Tutorials # Translate this
|
||||
```
|
||||
|
||||
> 🙋 If you'd like others to help you with the translation, you should [open an issue](https://github.com/huggingface/transformers/issues) and tag @stevhliu.
|
||||
2. Once you've translated the `_toctree.yml`, move on to translating the associated MDX files.
|
||||
|
||||
## Collaborate and share
|
||||
|
||||
If you'd like assistance with your translation, open an issue and tag `@stevhliu`. Feel free to share resources or glossaries to ensure consistent terminology.
|
||||
|
||||
@@ -108,38 +108,38 @@
|
||||
# title: ╪п┘Д┘К┘Д ╪е╪▒╪┤╪з╪п┘К ┘Д┘Е╪н┘Б╪▓╪з╪к ╪з┘Д┘Ж┘Е╪з╪░╪м ╪з┘Д┘Д╪║┘И┘К╪й ╪з┘Д┘Г╪и┘К╪▒╪й
|
||||
# title: ╪з┘Д╪е╪▒╪┤╪з╪п
|
||||
# title: ╪г╪п┘Д╪й ╪з┘Д┘Е┘З╪з┘Е
|
||||
# - sections:
|
||||
# - local: fast_tokenizers
|
||||
# title: ╪з╪│╪к╪о╪п┘Е ╪и╪▒╪з┘Е╪м ╪з┘Д╪к╪м╪▓╪ж╪й ╪з┘Д╪│╪▒┘К╪╣╪й ┘Е┘Ж ЁЯдЧ Tokenizers
|
||||
# - local: multilingual
|
||||
# title: ╪к╪┤╪║┘К┘Д ╪з┘Д╪з╪│╪к┘Ж╪к╪з╪м ╪и╪з╪│╪к╪о╪п╪з┘Е ┘Ж┘Е╪з╪░╪м ┘Е╪к╪╣╪п╪п╪й ╪з┘Д┘Д╪║╪з╪к
|
||||
# - local: create_a_model
|
||||
# title: ╪з╪│╪к╪о╪п╪з┘Е ┘И╪з╪м┘З╪з╪к ╪и╪▒┘Е╪м╪й ╪з┘Д╪к╪╖╪и┘К┘В╪з╪к ╪з┘Д╪о╪з╪╡╪й ╪и╪з┘Д┘Ж┘Е┘И╪░╪м
|
||||
# - local: custom_models
|
||||
# title: ┘Е╪┤╪з╪▒┘Г╪й ┘Ж┘Е┘И╪░╪м ┘Е╪о╪╡╪╡
|
||||
# - local: chat_templating
|
||||
# title: ┘В┘И╪з┘Д╪и ┘Д┘Ж┘Е╪з╪░╪м ╪з┘Д╪п╪▒╪п╪┤╪й
|
||||
# - local: trainer
|
||||
# title: ╪з┘Д┘Е╪п╪▒╪и
|
||||
# - local: sagemaker
|
||||
# title: ╪к╪┤╪║┘К┘Д ╪з┘Д╪к╪п╪▒┘К╪и ╪╣┘Д┘Й Amazon SageMaker
|
||||
# - local: serialization
|
||||
# title: ╪з┘Д╪к╪╡╪п┘К╪▒ ╪е┘Д┘Й ONNX
|
||||
# - local: tflite
|
||||
# title: ╪з┘Д╪к╪╡╪п┘К╪▒ ╪е┘Д┘Й TFLite
|
||||
# - local: torchscript
|
||||
# title: ╪з┘Д╪к╪╡╪п┘К╪▒ ╪е┘Д┘Й TorchScript
|
||||
- sections:
|
||||
- local: fast_tokenizers
|
||||
title: ╪з╪│╪к╪о╪п┘Е ┘Е╪м╪▓╪ж┘К╪з╪к ╪з┘Д┘Ж╪╡┘И╪╡ ╪з┘Д╪│╪▒┘К╪╣╪й ┘Е┘Ж ЁЯдЧ Tokenizers
|
||||
- local: multilingual
|
||||
title: ╪з┘Д╪з╪│╪к╪п┘Д╪з┘Д ╪и╪з╪│╪к╪о╪п╪з┘Е ┘Ж┘Е╪з╪░╪м ┘Е╪к╪╣╪п╪п╪й ╪з┘Д┘Д╪║╪з╪к
|
||||
- local: create_a_model
|
||||
title: ╪з╪│╪к╪о╪п╪з┘Е ┘И╪з╪м┘З╪з╪к ╪и╪▒┘Е╪м╪й ╪з┘Д╪к╪╖╪и┘К┘В╪з╪к ╪з┘Д╪о╪з╪╡╪й ╪и╪з┘Д┘Ж┘Е┘И╪░╪м
|
||||
- local: custom_models
|
||||
title: ┘Е╪┤╪з╪▒┘Г╪й ┘Ж┘Е┘И╪░╪м ┘Е╪о╪╡╪╡
|
||||
- local: chat_templating
|
||||
title: ┘В┘И╪з┘Д╪и ┘Д┘Ж┘Е╪з╪░╪м ╪з┘Д╪п╪▒╪п╪┤╪й
|
||||
- local: trainer
|
||||
title: ╪з┘Д┘Е╪п╪▒╪и
|
||||
- local: sagemaker
|
||||
title: ╪к╪┤╪║┘К┘Д ╪з┘Д╪к╪п╪▒┘К╪и ╪╣┘Д┘Й Amazon SageMaker
|
||||
- local: serialization
|
||||
title: ╪з┘Д╪к╪╡╪п┘К╪▒ ╪е┘Д┘Й ONNX
|
||||
- local: tflite
|
||||
title: ╪з┘Д╪к╪╡╪п┘К╪▒ ╪е┘Д┘Й TFLite
|
||||
- local: torchscript
|
||||
title: ╪з┘Д╪к╪╡╪п┘К╪▒ ╪е┘Д┘Й TorchScript
|
||||
# - local: benchmarks
|
||||
# title: ╪з┘Д┘Е╪╣╪з┘К┘К╪▒
|
||||
# - local: notebooks
|
||||
# title: ╪п┘Б╪з╪к╪▒ ╪з┘Д┘Е┘Д╪з╪н╪╕╪з╪к ┘Е╪╣ ╪з┘Д╪г┘Е╪л┘Д╪й
|
||||
# - local: community
|
||||
# title: ┘Е┘И╪з╪▒╪п ╪з┘Д┘Е╪м╪к┘Е╪╣
|
||||
# - local: troubleshooting
|
||||
# title: ╪з╪│╪к┘Г╪┤╪з┘Б ╪з┘Д╪г╪о╪╖╪з╪б ┘И╪е╪╡┘Д╪з╪н┘З╪з
|
||||
# - local: gguf
|
||||
# title: ╪з┘Д╪к┘И╪з┘Б┘В ┘Е╪╣ ┘Е┘Д┘Б╪з╪к GGUF
|
||||
# title: ╪г╪п┘Д╪й ╪з┘Д┘Е╪╖┘И╪▒┘К┘Ж
|
||||
- local: troubleshooting
|
||||
title: ╪з╪│╪к┘Г╪┤╪з┘Б ╪з┘Д╪г╪о╪╖╪з╪б ┘И╪е╪╡┘Д╪з╪н┘З╪з
|
||||
- local: gguf
|
||||
title: ╪з┘Д╪к┘И╪з┘Б┘В ┘Е╪╣ ┘Е┘Д┘Б╪з╪к GGUF
|
||||
title: ╪г╪п┘Д╪й ╪з┘Д┘Е╪╖┘И╪▒┘К┘Ж
|
||||
# - sections:
|
||||
# - local: quantization/overview
|
||||
# title: ┘Ж╪╕╪▒╪й ╪╣╪з┘Е╪й
|
||||
|
||||
@@ -464,7 +464,7 @@ image = image_generator(prompt=improved_prompt)
|
||||
|
||||
Before finally generating the image:
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png" />
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit_spacesuit_flux.webp" />
|
||||
|
||||
> [!WARNING]
|
||||
> gradio-tools requires *textual* inputs and outputs even when working with different modalities such as image and audio objects. Image and audio inputs and outputs are currently incompatible.
|
||||
|
||||
docs/source/ar/chat_templating.md (new file, 835 lines)
@@ -0,0 +1,835 @@
|
||||
# Chat Templates

## Introduction

An increasingly common use case for LLMs is **chat**. In a chat context, rather than continuing a single string of text (as is the case with a standard language model), the model instead continues a conversation that consists of one or more **messages**, each of which includes a **role**, like "user" or "assistant", as well as the message text.

Much like tokenization, different models expect very different input formats for chat. This is the reason we added **chat templates** as a feature. Chat templates are part of the tokenizer: they specify how to convert a conversation, represented as a list of messages, into a single tokenizable string in the format that the model expects.

Let's make this concrete with a quick example using the `BlenderBot` model. BlenderBot has an extremely simple default template, which mostly just adds whitespace between rounds of dialogue:
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoTokenizer
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
|
||||
|
||||
>>> chat = [
|
||||
... {"role": "user", "content": "Hello, how are you?"},
|
||||
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
|
||||
... {"role": "user", "content": "I'd like to show off how chat templating works!"},
|
||||
... ]
|
||||
|
||||
>>> tokenizer.apply_chat_template(chat, tokenize=False)
|
||||
" Hello, how are you? I'm doing great. How can I help you today? I'd like to show off how chat templating works!</s>"
|
||||
```
|
||||
|
||||
Notice how the entire chat has been condensed into a single string. If we had used `tokenize=True`, which is the default setting, that string would also have been tokenized for us. To see a more complex template in action, though, let's use the `mistralai/Mistral-7B-Instruct-v0.1` model.
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoTokenizer
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
|
||||
|
||||
>>> chat = [
|
||||
... {"role": "user", "content": "Hello, how are you?"},
|
||||
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
|
||||
... {"role": "user", "content": "I'd like to show off how chat templating works!"},
|
||||
... ]
|
||||
|
||||
>>> tokenizer.apply_chat_template(chat, tokenize=False)
|
||||
"<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]</s>"
|
||||
```
|
||||
|
||||
Notice that this time, the tokenizer has added the control tokens `[INST]` and `[/INST]` to indicate the start and end of user messages (but not assistant messages!), and the whole conversation has been condensed into a single string. If we had used `tokenize=True`, which is the default setting, that string would also have been tokenized.

Now try the same code, but swap in the `HuggingFaceH4/zephyr-7b-beta` model instead, and you will get:
|
||||
```text
|
||||
<|user|>
|
||||
Hello, how are you?</s>
|
||||
<|assistant|>
|
||||
I'm doing great. How can I help you today?</s>
|
||||
<|user|>
|
||||
I'd like to show off how chat templating works!</s>
|
||||
```
|
||||
Both Zephyr and Mistral-Instruct were fine-tuned from the same base model, Mistral-7B-v0.1, yet they were trained with completely different chat formats. Without chat templates, you would have to write manual formatting code for each model, and it's very easy to make minor mistakes that hurt performance! Chat templates handle the formatting details for you, letting you write universal code that works with any model.

## How do I use chat templates?

As you can see in the example above, chat templates are easy to use. Simply build a list of messages, with `role` and `content` keys, and then pass it to [`~PreTrainedTokenizer.apply_chat_template`]. Once you do that, you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea to use `add_generation_prompt=True` to add a [generation prompt](#what-are-generation-prompts).

Here's an example of preparing the input for `model.generate()`, using Zephyr again:
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
checkpoint = "HuggingFaceH4/zephyr-7b-beta"
|
||||
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|
||||
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # You may want to use bfloat16 and/or move to GPU here
|
||||
|
||||
messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You are a friendly chatbot who always responds in the style of a pirate",
|
||||
},
|
||||
{"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
|
||||
]
|
||||
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
|
||||
print(tokenizer.decode(tokenized_chat[0]))
|
||||
```
|
||||
This will yield a string in the input format that Zephyr expects:
|
||||
|
||||
```text
|
||||
<|system|>
|
||||
You are a friendly chatbot who always responds in the style of a pirate</s>
|
||||
<|user|>
|
||||
How many helicopters can a human eat in one sitting?</s>
|
||||
<|assistant|>
|
||||
```
|
||||
|
||||
Now that our input is formatted correctly for Zephyr, we can use the model to generate a response to the user's question:
|
||||
|
||||
```python
|
||||
outputs = model.generate(tokenized_chat, max_new_tokens=128)
|
||||
print(tokenizer.decode(outputs[0]))
|
||||
```
|
||||
|
||||
This will yield:
|
||||
|
||||
```text
|
||||
<|system|>
|
||||
You are a friendly chatbot who always responds in the style of a pirate</s>
|
||||
<|user|>
|
||||
How many helicopters can a human eat in one sitting?</s>
|
||||
<|assistant|>
|
||||
Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all.
|
||||
```
|
||||
|
||||
That was easy after all!

## Is there an automated pipeline for chat?

Yes, there is! Our text generation pipelines support chat inputs, which makes it easy to use chat models. In the past, we used to use a dedicated "ConversationalPipeline" class, but this has now been deprecated and its functionality has been merged into the [`TextGenerationPipeline`]. Let's try the `Zephyr` example again, but this time using a pipeline:
|
||||
|
||||
```python
|
||||
from transformers import pipeline
|
||||
|
||||
pipe = pipeline("text-generation", "HuggingFaceH4/zephyr-7b-beta")
|
||||
messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You are a friendly chatbot who always responds in the style of a pirate",
|
||||
},
|
||||
{"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
|
||||
]
|
||||
print(pipe(messages, max_new_tokens=128)[0]['generated_text'][-1])  # Print the assistant's reply
|
||||
```
|
||||
|
||||
```text
|
||||
{'role': 'assistant', 'content': "Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all."}
|
||||
```
|
||||
|
||||
The pipeline will take care of all the details of tokenization and calling `apply_chat_template` for you - once the model has a chat template, all you need to do is initialize the pipeline and pass it the list of messages!

## What are "generation prompts"?

You may have noticed that the `apply_chat_template` method has an `add_generation_prompt` argument. This argument tells the template to add tokens that indicate the start of a bot response. For example, consider the following chat:
|
||||
|
||||
```python
|
||||
messages = [
|
||||
{"role": "user", "content": "Hi there!"},
|
||||
{"role": "assistant", "content": "Nice to meet you!"},
|
||||
{"role": "user", "content": "Can I ask a question?"}
|
||||
]
|
||||
```
|
||||
|
||||
Here's what this looks like without a generation prompt, for a model that uses the standard "ChatML" format:
|
||||
|
||||
```python
|
||||
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
|
||||
"""<|im_start|>user
|
||||
Hi there!<|im_end|>
|
||||
<|im_start|>assistant
|
||||
Nice to meet you!<|im_end|>
|
||||
<|im_start|>user
|
||||
Can I ask a question?<|im_end|>
|
||||
"""
|
||||
```
|
||||
|
||||
And here's what it looks like **with** a generation prompt:
|
||||
|
||||
```python
|
||||
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
||||
"""<|im_start|>user
|
||||
Hi there!<|im_end|>
|
||||
<|im_start|>assistant
|
||||
Nice to meet you!<|im_end|>
|
||||
<|im_start|>user
|
||||
Can I ask a question?<|im_end|>
|
||||
<|im_start|>assistant
|
||||
"""
|
||||
```
|
||||
|
||||
Note that this time, we've added the tokens that indicate the start of a bot response. This ensures that when the model generates text, it will write a bot response instead of doing something unexpected, like continuing the user's message. Remember, chat models are still just language models - they're trained to continue text, and chat is just a special kind of text to them! You need to guide them with appropriate control tokens, so they know what they're supposed to be doing.

Not all models require generation prompts. Some models, like LLaMA, don't have any special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact effect that `add_generation_prompt` has depends on the template being used.

## What does "continue_final_message" do?

When passing a list of messages to `apply_chat_template` or `TextGenerationPipeline`, you can choose to format the chat so the model will continue the final message in the chat instead of starting a new one. This is done by removing any end-of-sequence tokens that indicate the end of the final message, so that the model will simply extend the final message when it begins generating text. This is useful for "prefilling" the model's response.

Here's an example:
|
||||
```python
|
||||
chat = [
|
||||
{"role": "user", "content": "Can you format the answer in JSON?"},
|
||||
{"role": "assistant", "content": '{"name": "'},
|
||||
]
|
||||
|
||||
formatted_chat = tokenizer.apply_chat_template(chat, tokenize=True, return_dict=True, continue_final_message=True)
|
||||
model.generate(**formatted_chat)
|
||||
```
|
||||
The model will generate text that continues the JSON string, rather than starting a new message. This approach can be very useful for improving the accuracy of the model's instruction-following when you know how you want it to start its replies.

Because `add_generation_prompt` adds the tokens that start a new message, and `continue_final_message` removes any end-of-message tokens from the final message, it does not make sense to use them together. As a result, you'll get an error if you try!

The default behaviour of `TextGenerationPipeline` is to set `add_generation_prompt=True` so that it starts a new message. However, if the final message in the input chat has the "assistant" role, it will assume that this message is a prefill and switch to `continue_final_message=True` instead, because most models do not support multiple consecutive assistant messages. You can override this behaviour by explicitly passing the `continue_final_message` argument when calling the pipeline.
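A minimal sketch of overriding that default, reusing the Zephyr pipeline from earlier and assuming your installed version of `transformers` accepts `continue_final_message` on the pipeline call, as described above:

```python
from transformers import pipeline

pipe = pipeline("text-generation", "HuggingFaceH4/zephyr-7b-beta")

chat = [
    {"role": "user", "content": "Can you format the answer in JSON?"},
    {"role": "assistant", "content": '{"name": "'},  # prefilled start of the assistant reply
]

# Explicitly ask the pipeline to extend the prefilled assistant message instead of
# opening a new one. With a final assistant message the pipeline usually assumes this
# already; passing the flag makes the behaviour unambiguous.
out = pipe(chat, max_new_tokens=128, continue_final_message=True)
print(out[0]["generated_text"][-1])  # the continued assistant message
```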
|
||||
|
||||
|
||||
|
||||
## Can I use chat templates in training?

Yes! This is a good way to ensure that the chat template matches the tokens the model sees during training. We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you can simply continue like any other language model training task. When training, you should usually set `add_generation_prompt=False`, because the added tokens that prompt an assistant response will not be helpful during training. Let's see an example:
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer
|
||||
from datasets import Dataset
|
||||
|
||||
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
|
||||
|
||||
chat1 = [
|
||||
{"role": "user", "content": "Which is bigger, the moon or the sun?"},
|
||||
{"role": "assistant", "content": "The sun."}
|
||||
]
|
||||
chat2 = [
|
||||
{"role": "user", "content": "Which is bigger, a virus or a bacterium?"},
|
||||
{"role": "assistant", "content": "A bacterium."}
|
||||
]
|
||||
|
||||
dataset = Dataset.from_dict({"chat": [chat1, chat2]})
|
||||
dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)})
|
||||
print(dataset['formatted_chat'][0])
|
||||
```
|
||||
And we get:
|
||||
|
||||
```text
|
||||
<|user|>
|
||||
Which is bigger, the moon or the sun?</s>
|
||||
<|assistant|>
|
||||
The sun.</s>
|
||||
```
|
||||
|
||||
From here, just continue training like you would with a standard language modelling task, using the `formatted_chat` column.

<Tip>

By default, some tokenizers add special tokens like `<bos>` and `<eos>` to the text they tokenize. Chat templates should already include all the special tokens they need, so additional special tokens will often be incorrect or duplicated, which will hurt model performance.

Therefore, if you format text with `apply_chat_template(tokenize=False)`, you should set `add_special_tokens=False` when you tokenize that text later. If you use `apply_chat_template(tokenize=True)`, you don't need to worry about this!
|
||||
</Tip>
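As a quick illustration of that advice, here is a minimal sketch (reusing the `dataset` and `tokenizer` from the training example above) that tokenizes the pre-formatted column without adding duplicate special tokens:

```python
# The chat template already inserted every special token the model needs, so make sure
# the tokenizer does not add <bos>/<eos> a second time when tokenizing the text column.
tokenized_dataset = dataset.map(
    lambda x: tokenizer(x["formatted_chat"], add_special_tokens=False)
)
print(tokenized_dataset["input_ids"][0][:10])  # first few token ids of the first sample
```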
|
||||
|
||||
## Advanced: Extra inputs to chat templates

The only argument that `apply_chat_template` requires is `messages`. However, you can pass any keyword argument to `apply_chat_template` and it will be accessible inside the template. This gives you a lot of freedom to use chat templates for many things. There are no restrictions on the names or formats of these arguments - you can pass strings, lists, dicts or whatever else you want.

That said, there are some common use cases for these extra arguments, such as passing tools for function calling, or documents for retrieval-augmented generation. For these common cases, we have some opinionated recommendations about what the names and formats of these arguments should be, described in the sections below. We encourage model authors to make their chat templates compatible with this format, to make it easy to transfer tool-calling code between models.

## Advanced: Tool use / function calling

"Tool use" LLMs can choose to call functions as external tools before generating an answer. When passing tools to a tool-use model, you can simply pass a list of functions to the `tools` argument:
|
||||
|
||||
```python
|
||||
from datetime import datetime
|
||||
|
||||
def current_time():
|
||||
"""Get the current local time as a string."""
|
||||
return str(datetime.now())
|
||||
|
||||
def multiply(a: float, b: float):
|
||||
"""
|
||||
A function that multiplies two numbers
|
||||
|
||||
Args:
|
||||
a: The first number to multiply
|
||||
b: The second number to multiply
|
||||
"""
|
||||
return a * b
|
||||
|
||||
tools = [current_time, multiply]
|
||||
|
||||
model_input = tokenizer.apply_chat_template(
|
||||
messages,
|
||||
tools=tools
|
||||
)
|
||||
```
|
||||
|
||||
In order for this to work correctly, you should write your functions in the format above, so that they can be correctly parsed as tools. Specifically, you should follow these rules:

- The function should have a descriptive name.
- Every argument must have a type hint.
- The function must have a docstring in the standard Google style (in other words, an initial function description followed by an `Args:` block that describes the arguments, unless the function has no arguments).
- Do not include types in the `Args:` block. In other words, write `a: The first number to multiply`, not `a (int): The first number to multiply`. Type hints should go in the function header instead.
- The function can have a return type and a `Returns:` block in the docstring. However, these are optional because most tool-use models ignore them. A sketch that uses them follows this list.
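For example, a function that follows all of the rules above, including the optional return type and `Returns:` block, could look like this (a sketch for illustration; the hard-coded return value stands in for a real weather lookup):

```python
def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"

    Returns:
        The current temperature at the specified location, in degrees Celsius.
    """
    return 22.0  # A real tool would call a weather API here
```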
|
||||
|
||||
### Passing tool results to the model

The sample code above is enough to list the available tools for your model, but what happens if the model actually wants to use one of them? If that happens, you should:

1. Parse the model's output to get the tool name(s) and arguments.
2. Add the model's tool call(s) to the conversation.
3. Call the corresponding function(s) with those arguments.
4. Add the result(s) to the conversation.

### A complete tool use example

Let's walk through a tool use example, step by step. For this example, we will use an 8B `Hermes-2-Pro` model, as it is one of the highest-performing tool-use models in its size class at the time of writing. If you have the memory, you can consider a larger model instead, such as `Command-R` or `Mixtral-8x22B`, both of which also support tool use and offer even stronger performance.

First, let's load our model and tokenizer:
|
||||
|
||||
```python
|
||||
import torch
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"
|
||||
|
||||
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|
||||
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")
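
# The tool the model will be able to call. The `tools` list below is referenced further
# down, when the chat template is applied. This dummy implementation (it always returns
# 22.0) is an assumption reconstructed from the tool-call output shown later in this
# example; a real tool would query a weather API instead.
def get_current_temperature(location: str, unit: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])

    Returns:
        The current temperature at the specified location in the specified units, as a float.
    """
    return 22.0

tools = [get_current_temperature]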
|
||||
|
||||
```

Now, let's set up the conversation for our bot:

```python
|
||||
messages = [
|
||||
{"role": "system", "content": "You are a bot that responds to weather queries. You should reply with the unit used in the queried location."},
|
||||
{"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
|
||||
]
|
||||
```
|
||||
|
||||
Now, let's apply the chat template and generate a response:
|
||||
|
||||
```python
|
||||
inputs = tokenizer.apply_chat_template(messages, chat_template="tool_use", tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
|
||||
inputs = {k: v.to(model.device) for k, v in inputs.items()}
|
||||
out = model.generate(**inputs, max_new_tokens=128)
|
||||
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
|
||||
```
|
||||
|
||||
And we get:
|
||||
|
||||
```text
|
||||
<tool_call>
|
||||
{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}
|
||||
</tool_call><|im_end|>
|
||||
```
|
||||
|
||||
The model has called the function with valid arguments, in the format requested by the function docstring. It has inferred that we're most likely referring to the Paris in France, and it remembered that, as the home of SI units, the temperature in France should be displayed in Celsius.

Next, let's append the model's tool call to the conversation. Note that we generate a random tool call ID here. These are not used by all models, but they allow models to issue several tool calls at once and keep track of which response corresponds to which call. You can generate them any way you like, but they should be unique within each chat.
|
||||
|
||||
```python
|
||||
tool_call_id = "vAHdf3" # Random ID, should be unique for each tool call
|
||||
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
|
||||
messages.append({"role": "assistant", "tool_calls": [{"id": tool_call_id, "type": "function", "function": tool_call}]})
|
||||
```
|
||||
|
||||
Now that we've added the tool call to the conversation, we can call the function and append the result. Since we're just using a dummy function for this example that always returns 22.0, we can append that result directly. Note the tool call ID - it should match the ID used in the tool call above.
|
||||
|
||||
```python
|
||||
messages.append({"role": "tool", "tool_call_id": tool_call_id, "name": "get_current_temperature", "content": "22.0"})
|
||||
```
|
||||
|
||||
Finally, let's let the assistant read the function output and continue chatting with the user:
|
||||
|
||||
```python
|
||||
inputs = tokenizer.apply_chat_template(messages, chat_template="tool_use", tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
|
||||
inputs = {k: v.to(model.device) for k, v in inputs.items()}
|
||||
out = model.generate(**inputs, max_new_tokens=128)
|
||||
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
|
||||
```
|
||||
|
||||
And we get:
|
||||
|
||||
```text
|
||||
The current temperature in Paris, France is 22.0 ┬░ Celsius.<|im_end|>
|
||||
```
|
||||
|
||||
<Tip>
|
||||
Not all tool-use models use all of the tool-calling features shown above. Some use tool call IDs, others simply use the function name and match tool calls to results by their order, and several models use neither and only issue one tool call at a time to avoid confusion. If you want your code to be compatible with as many models as possible, we recommend structuring your tool calls as shown here, and returning tool results in the order in which the model issued them. The chat templates on each model should handle the rest.
|
||||
</Tip>
|
||||
|
||||
### Understanding tool schemas

Each function you pass to the `tools` argument of `apply_chat_template` is converted into a [JSON schema](https://json-schema.org/learn/getting-started-step-by-step). These schemas are then passed to the model's chat template. In other words, tool-use models never see your functions directly, and they never see the code inside them. What they care about is the function **definitions** and the **arguments** they need to pass to them - they care about what the tools do and how to use them, not how they work! It is up to you to read their outputs, detect whether they have requested to use a tool, pass the arguments to the tool function, and return the response in the chat.

Generating JSON schemas to pass to the template should be automatic and invisible as long as your functions follow the specification above, but if you run into problems, or you simply want more control over the conversion, you can handle it manually. Here is an example of a manual schema conversion:
|
||||
|
||||
```python
|
||||
from transformers.utils import get_json_schema
|
||||
|
||||
def multiply(a: float, b: float):
|
||||
"""
|
||||
A function that multiplies two numbers
|
||||
|
||||
Args:
|
||||
a: The first number to multiply
|
||||
b: The second number to multiply
|
||||
"""
|
||||
return a * b
|
||||
|
||||
schema = get_json_schema(multiply)
|
||||
print(schema)
|
||||
```
|
||||
|
||||
This will yield:
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "multiply",
|
||||
"description": "A function that multiplies two numbers",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"a": {
|
||||
"type": "number",
|
||||
"description": "The first number to multiply"
|
||||
},
|
||||
"b": {
|
||||
"type": "number",
|
||||
"description": "The second number to multiply"
|
||||
}
|
||||
},
|
||||
"required": ["a", "b"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
If you wish, you can edit these schemas, or even write them from scratch yourself without using `get_json_schema` at all. JSON schemas can be passed directly to the `tools` argument of `apply_chat_template` - this gives you a lot of power to define precise schemas for more complex functions. Be careful, though - the more complex your schemas, the more likely the model is to get confused when dealing with them! We recommend simple function signatures where possible, keeping arguments (and especially complex, nested arguments) to a minimum.

Here is an example of defining schemas by hand, and passing them directly to `apply_chat_template`:
|
||||
|
||||
```python
|
||||
# A simple function that takes no arguments
|
||||
current_time = {
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "current_time",
|
||||
"description": "Get the current local time as a string.",
|
||||
"parameters": {
|
||||
'type': 'object',
|
||||
'properties': {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# A more complete function that takes two numerical arguments
|
||||
multiply = {
|
||||
'type': 'function',
|
||||
'function': {
|
||||
'name': 'multiply',
|
||||
'description': 'A function that multiplies two numbers',
|
||||
'parameters': {
|
||||
'type': 'object',
|
||||
'properties': {
|
||||
'a': {
|
||||
'type': 'number',
|
||||
'description': 'The first number to multiply'
|
||||
},
|
||||
'b': {
|
||||
'type': 'number', 'description': 'The second number to multiply'
|
||||
}
|
||||
},
|
||||
'required': ['a', 'b']
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
model_input = tokenizer.apply_chat_template(
|
||||
messages,
|
||||
tools = [current_time, multiply]
|
||||
)
|
||||
```
|
||||
|
||||
## Advanced: Retrieval-augmented generation

"Retrieval-augmented generation" or "RAG" LLMs can search a corpus of documents for information before responding to a query. This allows models to vastly expand their knowledge base beyond their limited context size. Our recommendation for RAG models is that their template should accept a `documents` argument. This should be a list of documents, where each "document" is a single dict with `title` and `contents` keys, both of which are strings. Because this format is much simpler than the JSON schemas used for tools, no helper functions are necessary.

Here's an example of a RAG template in action:
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer, AutoModelForCausalLM
|
||||
|
||||
# Load the model and tokenizer
|
||||
model_id = "CohereForAI/c4ai-command-r-v01-4bit"
|
||||
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
||||
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
|
||||
device = model.device  # Get the device the model is loaded on
|
||||
|
||||
# Define the conversation input
|
||||
conversation = [
|
||||
{"role": "user", "content": "What has Man always dreamed of?"}
|
||||
]
|
||||
|
||||
# Define the documents for retrieval-augmented generation
|
||||
documents = [
|
||||
{
|
||||
"title": "The Moon: Our Age-Old Foe",
|
||||
"text": "Man has always dreamed of destroying the moon. In this essay, I shall..."
|
||||
},
|
||||
{
|
||||
"title": "The Sun: Our Age-Old Friend",
|
||||
"text": "Although often underappreciated, the sun provides several notable benefits..."
|
||||
}
|
||||
]
|
||||
# Format the conversation and documents with the RAG chat template, returning PyTorch tensors
|
||||
input_ids = tokenizer.apply_chat_template(
|
||||
conversation=conversation,
|
||||
documents=documents,
|
||||
chat_template="rag",
|
||||
tokenize=True,
|
||||
add_generation_prompt=True,
|
||||
return_tensors="pt").to(device)
|
||||
|
||||
# Generate a response
|
||||
gen_tokens = model.generate(
|
||||
input_ids,
|
||||
max_new_tokens=100,
|
||||
do_sample=True,
|
||||
temperature=0.3,
|
||||
)
|
||||
|
||||
# Decode and print the generated text
|
||||
gen_text = tokenizer.decode(gen_tokens[0])
|
||||
print(gen_text)
|
||||
```
|
||||
The `documents` input for retrieval-augmented generation is not widely supported, and many models have chat templates which simply ignore this input.

To verify whether a model supports the `documents` input, you can read its model card, or `print(tokenizer.chat_template)` to see whether the `documents` key is used anywhere.
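For example, a rough programmatic version of that check might look like this sketch (it only searches for the literal string `documents`, so treat the result as a hint rather than a guarantee):

```python
template = tokenizer.chat_template
if isinstance(template, dict):
    # Some tokenizers ship several named templates (e.g. "default", "rag");
    # join them so the whole set is searched at once.
    template = " ".join(template.values())
print("documents" in template)  # True if the template references the documents input
```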
|
||||
<Tip>
|
||||
One class of models that does support it, though, is Cohere's [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024) and [Command-R+](https://huggingface.co/CohereForAI/c4ai-command-r-pluse-08-2024), through their `rag` chat template. You can see additional examples of grounded generation using this feature in their model cards.
|
||||
</Tip>
|
||||
|
||||
## Advanced: How do chat templates work?

The chat template for a model is stored in the `tokenizer.chat_template` attribute. If no chat template is set, the default template for that model class is used instead. Let's take a look at the `Zephyr` chat template, though note that this one is slightly simplified from the real one!
|
||||
|
||||
```
|
||||
{%- for message in messages %}
|
||||
    {{- '<|' + message['role'] + '|>\n' }}
|
||||
{{- message['content'] + eos_token }}
|
||||
{%- endfor %}
|
||||
{%- if add_generation_prompt %}
|
||||
{{- '<|assistant|>\n' }}
|
||||
{%- endif %}
|
||||
```
|
||||
If you've never seen one of these before, this is a [Jinja template](https://jinja.palletsprojects.com/en/3.1.x/templates/). Jinja is a templating language that allows you to write simple code that generates text. In many ways, the code and syntax resemble Python. In pure Python, this template would look something like this:
|
||||
|
||||
```python
|
||||
for message in messages:
|
||||
print(f'<|{message["role"]}|>')
|
||||
print(message['content'] + eos_token)
|
||||
if add_generation_prompt:
|
||||
print('<|assistant|>')
|
||||
```
|
||||
Effectively, the template does three things:

- For each message, it prints the role enclosed in `<|` and `|>`, like `<|user|>` or `<|assistant|>`.
- Next, it prints the content of the message, followed by the end-of-sequence token `eos_token`.
- Finally, if `add_generation_prompt` is set, it prints the assistant token, so that the model knows to start generating an assistant reply.

This is a pretty simple template, but Jinja gives you a lot of flexibility to do more complex things! Let's look at a Jinja template that can format inputs similarly to the way LLaMA formats them (note that the real LLaMA template includes handling for default system messages and slightly different system message handling in general - don't use this one in your actual code!):
|
||||
```
|
||||
{%- for message in messages %}
|
||||
{%- if message['role'] == 'user' %}
|
||||
{{- bos_token + '[INST] ' + message['content'] + ' [/INST]' }}
|
||||
{%- elif message['role'] == 'system' %}
|
||||
{{- '<<SYS>>\\n' + message['content'] + '\\n<</SYS>>\\n\\n' }}
|
||||
{%- elif message['role'] == 'assistant' %}
|
||||
{{- ' ' + message['content'] + ' ' + eos_token }}
|
||||
{%- endif %}
|
||||
{%- endfor %}
|
||||
```
|
||||
Hopefully if you stare at this for a moment you can see what this template is doing - it adds specific tokens like `[INST]` and `[/INST]` based on the role of each message. User, assistant and system messages are clearly distinguishable to the model because of the tokens they're wrapped in.

## Advanced: Adding and editing chat templates

### How do I create a chat template?

Simple, just write a Jinja template and set `tokenizer.chat_template`. You may find it easier to start with an existing template from another model and simply edit it for your needs! For example, we could take the LLaMA template above and add `[ASST]` and `[/ASST]` to the assistant messages:
|
||||
|
||||
```
|
||||
{%- for message in messages %}
|
||||
{%- if message['role'] == 'user' %}
|
||||
{{- bos_token + '[INST] ' + message['content'].strip() + ' [/INST]' }}
|
||||
{%- elif message['role'] == 'system' %}
|
||||
{{- '<<SYS>>\\n' + message['content'].strip() + '\\n<</SYS>>\\n\\n' }}
|
||||
{%- elif message['role'] == 'assistant' %}
|
||||
{{- '[ASST] ' + message['content'] + ' [/ASST]' + eos_token }}
|
||||
{%- endif %}
|
||||
{%- endfor %}
|
||||
```
|
||||
|
||||
Now, simply set the `tokenizer.chat_template` attribute. The next time you use [`~PreTrainedTokenizer.apply_chat_template`], it will use your new template! This attribute will be saved in the `tokenizer_config.json` file, so you can use [`~utils.PushToHubMixin.push_to_hub`] to upload your new template to the Hub and make sure everyone uses the right template for your model!
|
||||
|
||||
```python
|
||||
template = tokenizer.chat_template
|
||||
template = template.replace("SYS", "SYSTEM")  # Change the system token
|
||||
tokenizer.chat_template = template  # Set the new template
|
||||
tokenizer.push_to_hub("model_name")  # Push the new template to the Hub!
|
||||
```
|
||||
|
||||
The [`~PreTrainedTokenizer.apply_chat_template`] method, which uses your chat template, is called by the [`TextGenerationPipeline`] class, so once you set the correct chat template, your model will automatically become compatible with [`TextGenerationPipeline`].
|
||||
|
||||
<Tip>
|
||||
If you're fine-tuning a model for chat, in addition to setting a chat template, you should probably add any new chat control tokens as special tokens in the tokenizer. Special tokens are never split, which ensures that your control tokens are always handled as single tokens rather than being tokenized into pieces. You should also set the tokenizer's `eos_token` attribute to the token that marks the end of assistant generations in your template. This ensures that text generation tools can correctly figure out when to stop generating text.
|
||||
</Tip>
|
||||
|
||||
### Why do some models have multiple templates?

Some models use different templates for different use cases. For example, they might use one template for normal chat and another for tool use or retrieval-augmented generation. In these cases, `tokenizer.chat_template` is a dictionary. This can cause some confusion, and where possible, we recommend using a single template for all use cases. You can use Jinja statements like `if tools is defined` and `{% macro %}` definitions to easily wrap multiple code paths in a single template.

When a tokenizer has multiple templates, `tokenizer.chat_template` will be a `dict`, where each key is the name of a template. The `apply_chat_template` method has special handling for certain template names: specifically, it will look for a template named `default` in most cases, and will raise an error if it can't find one. However, if a template named `tool_use` exists and the user has passed a `tools` argument, it will use that one instead. To access templates with other names, pass the name of the template you want to the `chat_template` argument of `apply_chat_template()`.
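As a quick sketch of what that looks like in practice (the template names shown are illustrative, and `messages`/`tools` are the objects from the tool-use example above; the available keys depend on the tokenizer):

```python
if isinstance(tokenizer.chat_template, dict):
    # See which named templates this tokenizer ships with
    print(list(tokenizer.chat_template.keys()))  # e.g. ["default", "tool_use", "rag"]

# Explicitly select a named template instead of relying on the default lookup
text = tokenizer.apply_chat_template(
    messages,
    chat_template="tool_use",
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
```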
|
||||
|
||||
We find that this can be a bit confusing for users, though - so if you're writing a template yourself, we recommend trying to put it all in a single template where possible!
|
||||
|
||||
## What template should I use?

When setting the template for a model that's already been trained for chat, you should make sure that the template exactly matches the message formatting the model saw during training, or else you will probably experience performance degradation. This is true even if you're training the model further - you will probably get the best performance if you keep the chat tokens constant. This is very analogous to tokenization - you generally get the best performance for inference or fine-tuning when you precisely match the tokenization used during training.

If you're training a model from scratch, or fine-tuning a base language model for chat, on the other hand, you have a lot of freedom to choose an appropriate template! LLMs are smart enough to learn to handle lots of different input formats. One popular choice is the standard "ChatML" format, and this is a good, flexible choice for many use cases. It looks like this:
|
||||
|
||||
```
|
||||
{%- for message in messages %}
|
||||
{{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
|
||||
{%- endfor %}
|
||||
```
|
||||
|
||||
If you like this one, here it is in one-liner form, ready to copy into your code. The one-liner also includes handy support for [generation prompts](#what-are-generation-prompts), but note that it doesn't add BOS or EOS tokens! If your model expects those, they won't be added automatically by `apply_chat_template` - in other words, the text will be tokenized with `add_special_tokens=False`. This is to avoid potential conflicts between the template and the `add_special_tokens` logic. If your model expects special tokens, make sure to add them to the template!
|
||||
|
||||
```python
|
||||
tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
|
||||
```
|
||||
|
||||
This template wraps each message in `<|im_start|>` and `<|im_end|>` tokens, and simply writes the role as a string, which allows for flexibility in the roles you train with. The output looks like this:
|
||||
|
||||
```text
|
||||
<|im_start|>system
|
||||
You are a helpful chatbot that will do its best not to say anything so stupid that people tweet about it.<|im_end|>
|
||||
<|im_start|>user
|
||||
How are you?<|im_end|>
|
||||
<|im_start|>assistant
|
||||
I'm doing great!<|im_end|>
|
||||
```
|
||||
|
||||
The "user", "system" and "assistant" roles are the standard roles for chat, and we recommend using them when it makes sense, particularly if you want your model to work well with [`TextGenerationPipeline`]. However, you are not limited to these roles - templating is extremely flexible, and any string can be a role.
|
||||
|
||||
|
||||
## I want to add some chat templates! How should I get started?

If you have any chat models, you should set their `tokenizer.chat_template` attribute and test it using [`~PreTrainedTokenizer.apply_chat_template`], then push the updated tokenizer to the Hub. This applies even if you're not the model owner - if you're using a model with an empty chat template, or one that's still using the default class template, please open a [pull request](https://huggingface.co/docs/hub/repositories-pull-requests-discussions) to the model repository so that this attribute can be set properly!

Once the attribute is set, that's it, you're done! `tokenizer.apply_chat_template` will now work correctly for that model, which means it is also automatically supported in places like `TextGenerationPipeline`!

By ensuring that models have this attribute, we can make sure that the whole community gets to use the full power of open-source models. Formatting mismatches have been haunting the field and silently harming performance for too long - it's time to put an end to them!
|
||||
|
||||
## Advanced: Template writing tips

<Tip>

The easiest way to get started with writing Jinja templates is to take a look at some existing ones. You can use `print(tokenizer.chat_template)` for any chat model to see what template it's using. In general, models that support tool use have much more complex templates than other models - so when you're just getting started, they're probably a bad example to learn from! You can also take a look at the [Jinja documentation](https://jinja.palletsprojects.com/en/3.1.x/templates/#synopsis) for details of general Jinja formatting and syntax.
|
||||
|
||||
</Tip>
|
||||
|
||||
Jinja templates in `transformers` are identical to Jinja templates elsewhere. The main thing to know is that the conversation history will be accessible inside your template as a variable called `messages`. You can access `messages` in your template just as you would in Python, which means you can loop over it with `{% for message in messages %}` or access individual messages with `{{ messages[0] }}`, for example.

You can also use the following tips to write clean, efficient Jinja templates:
|
||||
|
||||
### Trimming whitespace

By default, Jinja will print any whitespace that comes before or after a block. This can be a problem for chat templates, which generally want to be very precise with whitespace! To avoid this, we strongly recommend writing your templates like this:
|
||||
|
||||
```
|
||||
{%- for message in messages %}
|
||||
{{- message['role'] + message['content'] }}
|
||||
{%- endfor %}
|
||||
```
|
||||
|
||||
instead of like this:
|
||||
|
||||
```
|
||||
{% for message in messages %}
|
||||
{{ message['role'] + message['content'] }}
|
||||
{% endfor %}
|
||||
```
|
||||
|
||||
Adding `-` strips any whitespace that comes before the block. The second example looks innocent, but the newline and indentation may end up being included in the output, which is probably not what you want!
|
||||
|
||||
|
||||
### Special variables

Inside your template, you will have access to several special variables. The most important of these is `messages`, which contains the chat history as a list of message dicts. However, there are several others. Not every variable is used in every template. The most common other variables are:

- `tools` contains a list of tools in JSON schema format. Will be `None` or undefined if no tools are passed.
- `documents` contains a list of documents in the format `{"title": "Title", "contents": "Contents"}`, used for retrieval-augmented generation. Will be `None` or undefined if no documents are passed.
- `add_generation_prompt` is a bool that is `True` if the user has requested a generation prompt, and `False` otherwise. If this is set, your template should add the header for an assistant message to the end of the conversation. If your model doesn't have a specific header for assistant messages, you can ignore this flag.
- **Special tokens** like `bos_token` and `eos_token`. These are extracted from `tokenizer.special_tokens_map`. The exact tokens available inside each template will differ depending on the parent tokenizer.
|
||||
|
||||
|
||||
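As a quick illustration, here is a minimal sketch (the checkpoint name is a placeholder) of how these variables get populated when a user calls `apply_chat_template` - the values passed there are what your template sees:

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint - substitute any chat model whose template uses these variables.
tokenizer = AutoTokenizer.from_pretrained("your-chat-model-checkpoint")

messages = [{"role": "user", "content": "What's the weather like?"}]
documents = [{"title": "Weather report", "contents": "Sunny, 25°C, light breeze."}]

prompt = tokenizer.apply_chat_template(
    messages,                      # becomes the `messages` variable in the template
    documents=documents,           # becomes `documents` (only used by templates that support RAG)
    add_generation_prompt=True,    # becomes `add_generation_prompt`
    tokenize=False,
)
print(prompt)
```
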
<Tip>

You can actually pass any `kwarg` to `apply_chat_template`, and it will be accessible inside the template as a variable. In general, though, we recommend trying to stick to the core variables described above, as it will make your model harder to use if users have to write custom code to pass model-specific `kwargs`. That said, we're aware that this field moves quickly, so if you have a new use-case that doesn't fit the core API, feel free to use a new `kwarg` for it! If a new `kwarg` becomes common, we may promote it into the core API and create and document a standard format for it.

</Tip>

### Callable functions

There is also a short list of callable functions available to you inside your templates. These are:

- `raise_exception(msg)`: Raises a `TemplateException`. This is useful for debugging, and for telling users when they're doing something that your template doesn't support.
- `strftime_now(format_str)`: Equivalent to `datetime.now().strftime(format_str)` in Python. This is used for getting the current date/time in a specific format, which is sometimes included in system messages.

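To see these in action, here is a rough, hypothetical sketch of a toy template that stamps today's date into a header and rejects input it doesn't support (the checkpoint name is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-chat-model-checkpoint")  # hypothetical checkpoint

# A toy template: write a dated system header, then refuse any explicit system messages.
tokenizer.chat_template = (
    "{{- '<|system|>Today is ' + strftime_now('%d %b %Y') + '\\n' }}"
    "{%- for message in messages %}"
    "{%- if message['role'] == 'system' %}"
    "{{- raise_exception('This template writes its own system header') }}"
    "{%- endif %}"
    "{{- '<|' + message['role'] + '|>' + message['content'] + '\\n' }}"
    "{%- endfor %}"
)

print(tokenizer.apply_chat_template([{"role": "user", "content": "Hi!"}], tokenize=False))
```
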
### Compatibility with non-Python Jinja

There are multiple implementations of Jinja in various languages. They generally have the same syntax, but a key difference is that when you're writing a template in Python, you can use Python methods, such as `.lower()` on strings or `.items()` on dicts. This will break if someone tries to use your template on a non-Python implementation of Jinja. Non-Python implementations are particularly common in deployment environments, where JS and Rust are very popular.

Don't panic, though! There are a few easy changes you can make to your templates to ensure they're compatible across all implementations of Jinja:

- Replace Python methods with Jinja filters. These usually have the same name; for example, `string.lower()` becomes `string|lower`, and `dict.items()` becomes `dict|items`. One notable change is that `string.strip()` becomes `string|trim`. See the [list of built-in filters](https://jinja.palletsprojects.com/en/3.1.x/templates/#builtin-filters) in the Jinja documentation for more.
- Replace `True`, `False` and `None`, which are Python-specific, with `true`, `false` and `none`.
- Directly rendering a dict or a list may give different results in other implementations (for example, string entries might change from single-quotes `'` to double-quotes `"`). Adding the `tojson` filter can help to ensure consistency here.

## Writing generation prompts

We mentioned above that `add_generation_prompt` is a special variable that will be accessible inside your template, and is controlled by the user setting the `add_generation_prompt` flag. If your model expects a header for assistant messages, then your template must support adding that header when `add_generation_prompt` is set.

Here is an example of a template that formats messages ChatML-style, with generation prompt support:

```text
{{- bos_token }}
{%- for message in messages %}
    {{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}
```

The exact content of the assistant header will depend on your specific model, but it should always be the string that represents the start of an assistant message, so that if the user applies your template with `add_generation_prompt=True` and then generates text, the model will write an assistant response. Also note that some models do not need a generation prompt, because assistant messages always begin immediately after user messages. This is particularly common for LLaMA and Mistral models, where assistant messages begin immediately after the `[/INST]` token that ends user messages. In these cases, the template can ignore the `add_generation_prompt` flag.

Generation prompts are important! If your model requires a generation prompt but it is not set in the template, then model generations will likely be severely degraded, or the model may display unusual behavior like continuing the final user message!

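To make the effect concrete, here is a small sketch (placeholder checkpoint name) that prints exactly what `add_generation_prompt` adds to the rendered prompt:

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint - any chat model whose template honors add_generation_prompt.
tokenizer = AutoTokenizer.from_pretrained("your-chat-model-checkpoint")
messages = [{"role": "user", "content": "Hi there!"}]

without_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
with_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# With a ChatML-style template like the one above, the difference is the assistant header,
# e.g. "<|im_start|>assistant\n".
print(with_prompt[len(without_prompt):])
```
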
### Writing and debugging larger templates

When this feature was introduced, most templates were quite small, the Jinja equivalent of a "one-liner" script. However, with new models and features like tool use and RAG, some templates can be 100 lines long or more. When writing templates like these, it's a good idea to write them in a separate file, using a text editor. You can easily extract a chat template to a file:

```python
open("template.jinja", "w").write(tokenizer.chat_template)
```

Or load the edited template back into the tokenizer:

```python
tokenizer.chat_template = open("template.jinja").read()
```

As an added bonus, when you write a long, multi-line template in a separate file, line numbers in that file will exactly correspond to line numbers in template parsing or execution errors. This will make it much easier to identify the source of issues.

### Writing templates for tools

Although chat templates do not enforce a specific API for tools (or for anything, really), we recommend that template authors try to stick to a standard API where possible. The whole point of chat templates is to allow code to be transferable across models, so deviating from the standard tools API means users will have to write custom code to use tools with your model. Sometimes it's unavoidable, but often with clever templating you can make the standard API work!

Below, we'll list the elements of the standard API, and give tips on writing templates that will work well with it.

#### Tool definitions

Your template should expect that the variable `tools` will either be null (if no tools are passed), or a list of JSON schema dicts. Our chat template methods allow users to pass tools as either JSON schema or Python functions, but when functions are passed, we automatically generate JSON schema and pass that to your template. As a result, the `tools` variable that your template receives will always be a list of JSON schema. Here is a sample tool JSON schema:

```json
{
  "type": "function",
  "function": {
    "name": "multiply",
    "description": "A function that multiplies two numbers",
    "parameters": {
      "type": "object",
      "properties": {
        "a": {
          "type": "number",
          "description": "The first number to multiply"
        },
        "b": {
          "type": "number",
          "description": "The second number to multiply"
        }
      },
      "required": ["a", "b"]
    }
  }
}
```

And here is some example code for handling tools in your chat template. Remember, this is just an example for a specific format - your model will probably need different formatting!

```text
{%- if tools %}
    {%- for tool in tools %}
        {{- '<tool>' + tool['function']['name'] + '\n' }}
        {%- for argument in tool['function']['parameters']['properties'] %}
            {{- argument + ': ' + tool['function']['parameters']['properties'][argument]['description'] + '\n' }}
        {%- endfor %}
        {{- '\n</tool>' }}
    {%- endfor %}
{%- endif %}
```

The specific tokens and tool descriptions your template renders should of course be chosen to match the ones your model was trained with. There is no requirement that your model understands JSON schema input, only that your template can translate JSON schema into your model's format. For example, Command-R was trained with tools defined using Python function headers, but the Command-R tool template accepts JSON schema, converts the types internally and renders the input tools as Python headers. You can do a lot with templates!

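As noted above, users can also pass plain Python functions and `transformers` will generate the JSON schema for them. A minimal sketch (the checkpoint name is a placeholder):

```python
from transformers import AutoTokenizer
from transformers.utils import get_json_schema

def multiply(a: float, b: float) -> float:
    """
    A function that multiplies two numbers

    Args:
        a: The first number to multiply
        b: The second number to multiply
    """
    return a * b

# Inspect the schema that will be handed to your template...
print(get_json_schema(multiply))

# ...or just pass the function directly; apply_chat_template converts it for you.
tokenizer = AutoTokenizer.from_pretrained("your-tool-model-checkpoint")  # hypothetical checkpoint
messages = [{"role": "user", "content": "What is 6 times 7?"}]
prompt = tokenizer.apply_chat_template(messages, tools=[multiply], add_generation_prompt=True, tokenize=False)
```
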
#### Tool calls

Tool calls, if present, will be a list attached to a message with the `"assistant"` role. Note that `tool_calls` is always a list, even though most tool-calling models only support single tool calls at a time, which means the list will usually only have a single element. Here is a sample message dict containing a tool call:

```json
{
  "role": "assistant",
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "multiply",
        "arguments": {
          "a": 5,
          "b": 6
        }
      }
    }
  ]
}
```

And a common pattern for handling them would be something like this:

```text
{%- if message['role'] == 'assistant' and 'tool_calls' in message %}
    {%- for tool_call in message['tool_calls'] %}
        {{- '<tool_call>' + tool_call['function']['name'] + '\n' + tool_call['function']['arguments']|tojson + '\n</tool_call>' }}
    {%- endfor %}
{%- endif %}
```

Again, you should render the tool call with the formatting and special tokens that your model expects.

#### Tool responses

Tool responses have a simple format: they are a message dict with the `"tool"` role, a `"name"` key giving the name of the called function, and a `"content"` key containing the result of the tool call. Here is a sample tool response:

```json
{
  "role": "tool",
  "name": "multiply",
  "content": "30"
}
```

You don't need to use all of the keys in the tool response. For example, if your model doesn't expect the function name to be included in the tool response, then rendering it can be as simple as:

```text
{%- if message['role'] == 'tool' %}
    {{- "<tool_result>" + message['content'] + "</tool_result>" }}
{%- endif %}
```

Once again, remember that the actual formatting and special tokens are model-specific - you should take great care to ensure that tokens, whitespace and everything else exactly match the format your model was trained with!

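Putting the three message types together, here is a hedged sketch of the full round trip a tool-use template has to handle (the checkpoint name is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-tool-model-checkpoint")  # hypothetical checkpoint

messages = [
    {"role": "user", "content": "What is 5 times 6?"},
    {   # the assistant's tool call, in the standard format described above
        "role": "assistant",
        "tool_calls": [
            {"type": "function", "function": {"name": "multiply", "arguments": {"a": 5, "b": 6}}}
        ],
    },
    {"role": "tool", "name": "multiply", "content": "30"},  # the tool response
]

# The template is responsible for rendering each of these message types in the model's own format.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```
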
docs/source/ar/create_a_model.md (new file, 436 lines)
@@ -0,0 +1,436 @@

# Create a custom architecture

An [`AutoClass`](model_doc/auto) automatically infers the model architecture and downloads a pretrained configuration and weights. Generally, we recommend using an `AutoClass` to produce checkpoint-agnostic code. But users who want more control over specific model parameters can create a custom 🤗 Transformers model from just a few base classes. This could be particularly useful for anyone interested in studying, training or experimenting with a 🤗 Transformers model. In this guide, dive deeper into creating a custom model without an `AutoClass`. Learn how to:

- Load and customize a model configuration.
- Create a model architecture.
- Create a slow and fast tokenizer for text.
- Create an image processor for vision tasks.
- Create a feature extractor for audio tasks.
- Create a processor for multimodal tasks.

## Configuration

A [configuration](main_classes/configuration) refers to a model's specific attributes. Each model configuration has its own attributes; for instance, all NLP models share the `hidden_size`, `num_attention_heads`, `num_hidden_layers` and `vocab_size` attributes. These attributes specify things like the number of attention heads or hidden layers to build a model with.

Take a closer look at [DistilBERT](model_doc/distilbert) by accessing [`DistilBertConfig`] to inspect its attributes:

```py
>>> from transformers import DistilBertConfig

>>> config = DistilBertConfig()
>>> print(config)
DistilBertConfig {
  "activation": "gelu",
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "transformers_version": "4.16.2",
  "vocab_size": 30522
}
```

[`DistilBertConfig`] displays all the default attributes used to build a base [`DistilBertModel`]. All attributes are customizable, creating space for experimentation. For example, you can customize a default model to:

- Try a different activation function with the `activation` parameter.
- Use a higher dropout ratio for the attention probabilities with the `attention_dropout` parameter.

```py
>>> my_config = DistilBertConfig(activation="relu", attention_dropout=0.4)
>>> print(my_config)
DistilBertConfig {
  "activation": "relu",
  "attention_dropout": 0.4,
```

Pretrained model attributes can be modified with the [`~PretrainedConfig.from_pretrained`] method:

```py
>>> my_config = DistilBertConfig.from_pretrained("distilbert/distilbert-base-uncased", activation="relu", attention_dropout=0.4)
```

Once you are satisfied with your model configuration, you can save it with [`~PretrainedConfig.save_pretrained`]. Your configuration file is stored as a JSON file in the specified save directory:

```py
>>> my_config.save_pretrained(save_directory="./your_model_save_path")
```

To reuse the configuration file, load it with [`~PretrainedConfig.from_pretrained`]:

```py
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")
```

<Tip>

You can also save your configuration file as a dictionary, or even as just the difference between your custom configuration attributes and the default configuration attributes! See the [configuration](main_classes/configuration) documentation for more details.

</Tip>

## Model

The next step is to create a [model](main_classes/models). The model - also loosely referred to as the architecture - defines what each layer is doing and what operations are happening. Attributes like `num_hidden_layers` from the configuration are used to define the architecture. Every model shares the base class [`PreTrainedModel`] and a few common methods like resizing input embeddings and pruning self-attention heads. In addition, all models are also a subclass of either [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html), [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) or [`flax.linen.Module`](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/module.html). This means models are compatible with each of their respective framework's usage.

<frameworkcontent>
<pt>
Load your custom configuration attributes into the model:

```py
>>> from transformers import DistilBertModel

>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")
>>> model = DistilBertModel(my_config)
```

This creates a model with random values instead of pretrained weights. You won't be able to use this model for anything useful yet until you train it. Training is a costly and time-consuming process. It is generally better to use a pretrained model to obtain better results faster, while using only a fraction of the resources required for training.

Create a pretrained model with [`~PreTrainedModel.from_pretrained`]:

```py
>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```

When you load pretrained weights, the default model configuration is automatically loaded if the model is provided by 🤗 Transformers. However, you can still replace - some or all of - the default model configuration attributes with your own:

```py
>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased", config=my_config)
```
</pt>
<tf>
Load your custom configuration attributes into the model:

```py
>>> from transformers import TFDistilBertModel

>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/my_config.json")
>>> tf_model = TFDistilBertModel(my_config)
```

This creates a model with random values instead of pretrained weights. You won't be able to use this model for anything useful yet until you train it. Training is a costly and time-consuming process. It is generally better to use a pretrained model to obtain better results faster, while using only a fraction of the resources required for training.

Create a pretrained model with [`~TFPreTrainedModel.from_pretrained`]:

```py
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```

When you load pretrained weights, the default model configuration is automatically loaded if the model is provided by 🤗 Transformers. However, you can still replace - some or all of - the default model configuration attributes with your own:

```py
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased", config=my_config)
```
</tf>
</frameworkcontent>

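Both frameworks also inherit the shared utilities of the base model class mentioned above. As a small sketch (using the PyTorch model from this section), resizing the input embeddings after adding a token looks like this:

```python
from transformers import DistilBertModel, DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert/distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")

# Add a new token to the tokenizer, then grow the model's embedding matrix to match.
tokenizer.add_tokens(["<new_token>"])
model.resize_token_embeddings(len(tokenizer))
```
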
### Model heads

At this point, you have a base DistilBERT model which outputs the *hidden states*. The hidden states are passed as inputs to a model head to produce the final output. 🤗 Transformers provides a different model head for each task, as long as the model supports that task (i.e., you can't use DistilBERT for a sequence-to-sequence task like translation).

<frameworkcontent>
<pt>
For example, [`DistilBertForSequenceClassification`] is a base DistilBERT model with a sequence classification head. The sequence classification head is a linear layer on top of the pooled outputs.

```py
>>> from transformers import DistilBertForSequenceClassification

>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```

Easily reuse this checkpoint for another task by switching to a different model head. For a question answering task, you would use the [`DistilBertForQuestionAnswering`] model head. The question answering head is similar to the sequence classification head, except it is a linear layer on top of the hidden states output.

```py
>>> from transformers import DistilBertForQuestionAnswering

>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
</pt>
<tf>
For example, [`TFDistilBertForSequenceClassification`] is a base DistilBERT model with a sequence classification head. The sequence classification head is a linear layer on top of the pooled outputs.

```py
>>> from transformers import TFDistilBertForSequenceClassification

>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```

Easily reuse this checkpoint for another task by switching to a different model head. For a question answering task, you would use the [`TFDistilBertForQuestionAnswering`] model head. The question answering head is similar to the sequence classification head, except it is a linear layer on top of the hidden states output.

```py
>>> from transformers import TFDistilBertForQuestionAnswering

>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
</tf>
</frameworkcontent>

## Tokenizer

The last base class you need before using a model for textual data is a [tokenizer](main_classes/tokenizer) to convert raw text to tensors. There are two types of tokenizers you can use with 🤗 Transformers:

- [`PreTrainedTokenizer`]: a Python implementation of a tokenizer.
- [`PreTrainedTokenizerFast`]: a tokenizer from our Rust-based [🤗 Tokenizers](https://huggingface.co/docs/tokenizers/python/latest/) library. This tokenizer type is significantly faster - especially when processing batches of text - thanks to its Rust implementation. The fast tokenizer also offers additional methods like *offset mapping*, which maps tokens to their original words or characters.

Both tokenizer types support common methods such as encoding and decoding, adding new tokens, and managing special tokens.

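For instance, a minimal sketch of those shared methods with a pretrained DistilBERT tokenizer:

```python
from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert/distilbert-base-uncased")

ids = tokenizer.encode("Hello world!")   # encode: text -> token ids (with special tokens)
text = tokenizer.decode(ids)             # decode: token ids -> text
tokenizer.add_tokens(["<new_token>"])    # extend the vocabulary with a new token
print(ids, text)
```
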
<Tip warning={true}>

Not every model supports a fast tokenizer. Take a look at this [table](index#supported-frameworks) to check whether a model has fast tokenizer support.

</Tip>

If you trained your own tokenizer, you can create one from your *vocabulary* file:

```py
>>> from transformers import DistilBertTokenizer

>>> my_tokenizer = DistilBertTokenizer(vocab_file="my_vocab_file.txt", do_lower_case=False, padding_side="left")
```

It is important to remember that the vocabulary of a custom tokenizer will be different from the vocabulary generated by a pretrained model's tokenizer. You need to use a pretrained model's vocabulary if you are using a pretrained model, otherwise the inputs won't make sense. Create a tokenizer with a pretrained model's vocabulary with the [`DistilBertTokenizer`] class:

```py
>>> from transformers import DistilBertTokenizer

>>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```

Create a fast tokenizer with the [`DistilBertTokenizerFast`] class:

```py
>>> from transformers import DistilBertTokenizerFast

>>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert/distilbert-base-uncased")
```

<Tip>

By default, [`AutoTokenizer`] will try to load a fast tokenizer. You can disable this behavior by setting `use_fast=False` in `from_pretrained`.

</Tip>

## Image processor

An image processor processes vision inputs. It inherits from the base [`~image_processing_utils.ImageProcessingMixin`] class.

To use it, create an image processor associated with the model you're using. For example, create a default [`ViTImageProcessor`] if you are using [ViT](model_doc/vit) for image classification:

```py
>>> from transformers import ViTImageProcessor

>>> vit_extractor = ViTImageProcessor()
>>> print(vit_extractor)
ViTImageProcessor {
  "do_normalize": true,
  "do_resize": true,
  "image_processor_type": "ViTImageProcessor",
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": 2,
  "size": 224
}
```

<Tip>

If you aren't looking for any customization, just use the `from_pretrained` method to load a model's default image processor parameters.

</Tip>

Modify any of the [`ViTImageProcessor`] parameters to create your custom image processor:

```py
>>> from transformers import ViTImageProcessor

>>> my_vit_extractor = ViTImageProcessor(resample="PIL.Image.BOX", do_normalize=False, image_mean=[0.3, 0.3, 0.3])
>>> print(my_vit_extractor)
ViTImageProcessor {
  "do_normalize": false,
  "do_resize": true,
  "image_processor_type": "ViTImageProcessor",
  "image_mean": [
    0.3,
    0.3,
    0.3
  ],
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": "PIL.Image.BOX",
  "size": 224
}
```

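Once created, the image processor is applied to images to produce model-ready tensors. A minimal sketch (the image path is a placeholder, and PyTorch tensors are assumed):

```python
from PIL import Image
from transformers import ViTImageProcessor

image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
image = Image.open("my_image.png")  # hypothetical local file

inputs = image_processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # batch of resized, normalized pixel values
```
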
## Backbone

<div style="text-align: center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Backbone.png">
</div>

Computer vision models consist of a backbone, a neck, and a head. The backbone extracts features from an input image, the neck combines and enhances the extracted features, and the head is used for the main task (e.g., object detection). Start by initializing a backbone in the model config and specify whether you want to load pretrained weights or randomly initialized weights. Then you can pass the model config to the model head.

For example, to load a [ResNet](../model_doc/resnet) backbone into a [MaskFormer](../model_doc/maskformer) model with an instance segmentation head:

<hfoptions id="backbone">
<hfoption id="pretrained weights">

Set `use_pretrained_backbone=True` to load pretrained ResNet weights for the backbone.

```py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True)  # backbone and neck config
model = MaskFormerForInstanceSegmentation(config)  # head
```

</hfoption>
<hfoption id="random weights">

Set `use_pretrained_backbone=False` to randomly initialize the ResNet backbone.

```py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=False)  # backbone and neck config
model = MaskFormerForInstanceSegmentation(config)  # head
```

You could also load the backbone config separately and then pass it to the model config.

```py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, ResNetConfig

backbone_config = ResNetConfig()
config = MaskFormerConfig(backbone_config=backbone_config)
model = MaskFormerForInstanceSegmentation(config)
```

</hfoption>
<hfoption id="timm backbone">

[timm](https://hf.co/docs/timm/index) models are loaded within a model with `use_timm_backbone=True` or with [`TimmBackbone`] and [`TimmBackboneConfig`].

Use `use_timm_backbone=True` and `use_pretrained_backbone=True` to load pretrained timm weights for the backbone.

```python
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone="resnet50", use_pretrained_backbone=True, use_timm_backbone=True)  # backbone and neck config
model = MaskFormerForInstanceSegmentation(config)  # head
```

Set `use_timm_backbone=True` and `use_pretrained_backbone=False` to load a randomly initialized timm backbone.

```python
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone="resnet50", use_pretrained_backbone=False, use_timm_backbone=True)  # backbone and neck config
model = MaskFormerForInstanceSegmentation(config)  # head
```

You could also load the backbone config and use it to create a `TimmBackbone`, or pass it to the model config. timm backbones load pretrained weights by default; set `use_pretrained_backbone=False` to load randomly initialized weights instead.

```python
from transformers import TimmBackboneConfig, TimmBackbone

backbone_config = TimmBackboneConfig("resnet50", use_pretrained_backbone=False)

# Create a backbone instance
backbone = TimmBackbone(config=backbone_config)

# Create a model with a timm backbone
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone_config=backbone_config)
model = MaskFormerForInstanceSegmentation(config)
```

</hfoption>
</hfoptions>

## Feature extractor

A feature extractor processes audio inputs. It inherits from the base [`~feature_extraction_utils.FeatureExtractionMixin`] class, and may also inherit from the [`SequenceFeatureExtractor`] class for processing audio inputs.

To use it, create a feature extractor associated with the model you're using. For example, create a default [`Wav2Vec2FeatureExtractor`] if you are using [Wav2Vec2](model_doc/wav2vec2) for audio classification:

```py
>>> from transformers import Wav2Vec2FeatureExtractor

>>> w2v2_extractor = Wav2Vec2FeatureExtractor()
>>> print(w2v2_extractor)
Wav2Vec2FeatureExtractor {
  "do_normalize": true,
  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "return_attention_mask": false,
  "sampling_rate": 16000
}
```

<Tip>

If you aren't looking for any customization, just use the `from_pretrained` method to load a model's default feature extractor parameters.

</Tip>

Modify any of the [`Wav2Vec2FeatureExtractor`] parameters to create your custom feature extractor:

```py
>>> from transformers import Wav2Vec2FeatureExtractor

>>> w2v2_extractor = Wav2Vec2FeatureExtractor(sampling_rate=8000, do_normalize=False)
>>> print(w2v2_extractor)
Wav2Vec2FeatureExtractor {
  "do_normalize": false,
  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "return_attention_mask": false,
  "sampling_rate": 8000
}
```

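A feature extractor is then called on raw audio arrays to produce model inputs. A minimal sketch (one second of dummy 16 kHz audio assumed):

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
audio = np.zeros(16000, dtype=np.float32)  # one second of silence as stand-in audio

inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
print(inputs["input_values"].shape)
```
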
## Processor

For models that support multimodal tasks, 🤗 Transformers offers a processor class that conveniently wraps processing classes such as a feature extractor and a tokenizer into a single object. For example, let's use the [`Wav2Vec2Processor`] for an automatic speech recognition (ASR) task. ASR transcribes audio to text, so you will need a feature extractor and a tokenizer.

Create a feature extractor to handle the audio inputs:

```py
>>> from transformers import Wav2Vec2FeatureExtractor

>>> feature_extractor = Wav2Vec2FeatureExtractor(padding_value=1.0, do_normalize=True)
```

Create a tokenizer to handle the text inputs:

```py
>>> from transformers import Wav2Vec2CTCTokenizer

>>> tokenizer = Wav2Vec2CTCTokenizer(vocab_file="my_vocab_file.txt")
```

Combine the feature extractor and tokenizer in [`Wav2Vec2Processor`]:

```py
>>> from transformers import Wav2Vec2Processor

>>> processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)
```

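The combined processor can then stand in for either component - a rough sketch, assuming the objects created above and some dummy audio:

```python
import numpy as np

audio = np.zeros(16000, dtype=np.float32)  # stand-in audio

# Audio goes through the wrapped feature extractor...
audio_inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# ...while text (e.g. transcription labels) goes through the wrapped tokenizer.
text_inputs = processor(text="hello world")
```
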
With two basic classes - configuration and model - plus an additional preprocessing class (a tokenizer, image processor, feature extractor, or processor), you can create any of the models supported by 🤗 Transformers. Each of these base classes is configurable, allowing you to use the specific attributes you want. You can easily set up a model for training, or modify an existing pretrained model for fine-tuning.

docs/source/ar/custom_models.md (new file, 323 lines)
@@ -0,0 +1,323 @@

# Building custom models

The 🤗 Transformers library is designed to be easily extensible. Every model is fully coded in a given subfolder of the repository, with no abstraction, so you can easily copy a modeling file and tweak it to your needs.

If you are writing a brand new model, it might be easier to start from scratch. In this tutorial, we will show you how to write a custom model and its configuration so that it can be used inside Transformers, and how you can share it with the community (with the code it relies on) so that anyone can use it, even if it's not present in the 🤗 Transformers library. We'll see how to build upon transformers and extend the framework with your own hooks and custom code.

We will illustrate all of this on a ResNet model, by wrapping the ResNet class of the [timm library](https://github.com/rwightman/pytorch-image-models) into a [`PreTrainedModel`].

## Writing a custom configuration

Let's start by writing the model configuration. The configuration of a model is an object that holds all the information needed to build the model. As we will see in the next section, the model can only take a `config` object to be initialized, so we really need that object to be as complete as possible.

<Tip>

Models in the `transformers` library follow the convention of accepting a `config` object in their `__init__` method, and then passing the whole `config` to the sub-layers of the model, rather than breaking the config object into multiple arguments. Writing your model in this style results in simpler code, with a clear "source of truth" for any hyperparameters, and makes it easier to reuse code from other models in `transformers`.

</Tip>

In our example, we will tweak a few arguments of the ResNet class that we might want to adjust. Different configurations will then give us the different possible types of ResNets. We simply store those arguments, after checking the validity of a few of them.

```python
from transformers import PretrainedConfig
from typing import List


class ResnetConfig(PretrainedConfig):
    model_type = "resnet"

    def __init__(
        self,
        block_type="bottleneck",
        layers: List[int] = [3, 4, 6, 3],
        num_classes: int = 1000,
        input_channels: int = 3,
        cardinality: int = 1,
        base_width: int = 64,
        stem_width: int = 64,
        stem_type: str = "",
        avg_down: bool = False,
        **kwargs,
    ):
        if block_type not in ["basic", "bottleneck"]:
            raise ValueError(f"`block_type` must be 'basic' or 'bottleneck', got {block_type}.")
        if stem_type not in ["", "deep", "deep-tiered"]:
            raise ValueError(f"`stem_type` must be '', 'deep' or 'deep-tiered', got {stem_type}.")

        self.block_type = block_type
        self.layers = layers
        self.num_classes = num_classes
        self.input_channels = input_channels
        self.cardinality = cardinality
        self.base_width = base_width
        self.stem_width = stem_width
        self.stem_type = stem_type
        self.avg_down = avg_down
        super().__init__(**kwargs)
```

The three important things to remember when writing your own configuration are:

- you have to inherit from `PretrainedConfig`,
- the `__init__` of your `PretrainedConfig` must accept any kwargs,
- those kwargs need to be passed to the superclass `__init__`.

The inheritance is to make sure you get all the functionality from the 🤗 Transformers library, while the two other constraints come from the fact that a `PretrainedConfig` has more fields than the ones you are setting. When reloading a config with the `from_pretrained` method, those fields need to be accepted by your config and then forwarded to the superclass.

Defining a `model_type` for your configuration (here `model_type="resnet"`) is not mandatory, unless you want to register your model with the auto classes (see the last section).

With this done, you can easily create and save your configuration like you would with any other model config of the library. Here is how we can create a resnet50d config and save it:

```py
resnet50d_config = ResnetConfig(block_type="bottleneck", stem_width=32, stem_type="deep", avg_down=True)
resnet50d_config.save_pretrained("custom-resnet")
```

This will save a file named `config.json` inside the folder `custom-resnet`. You can then reload your config with the `from_pretrained` method:

```py
resnet50d_config = ResnetConfig.from_pretrained("custom-resnet")
```

You can also use any other method of the [`PretrainedConfig`] class, like [`~PretrainedConfig.push_to_hub`], to directly upload your config to the Hub.

## Writing a custom model

Now that we have our ResNet configuration, we can go on writing the models. We will actually write two: one that extracts the hidden features from a batch of images (like [`BertModel`]) and one that is suitable for image classification (like [`BertForSequenceClassification`]).

As mentioned before, we'll only write a loose wrapper around the model to keep it simple for this example. The only step we need before writing this class is a map between the block types and the actual block classes. Then the model is defined from the configuration by passing everything to the `ResNet` class:

```py
from transformers import PreTrainedModel
from timm.models.resnet import BasicBlock, Bottleneck, ResNet
from .configuration_resnet import ResnetConfig


BLOCK_MAPPING = {"basic": BasicBlock, "bottleneck": Bottleneck}


class ResnetModel(PreTrainedModel):
    config_class = ResnetConfig

    def __init__(self, config):
        super().__init__(config)
        block_layer = BLOCK_MAPPING[config.block_type]
        self.model = ResNet(
            block_layer,
            config.layers,
            num_classes=config.num_classes,
            in_chans=config.input_channels,
            cardinality=config.cardinality,
            base_width=config.base_width,
            stem_width=config.stem_width,
            stem_type=config.stem_type,
            avg_down=config.avg_down,
        )

    def forward(self, tensor):
        return self.model.forward_features(tensor)
```

For the model that will classify images, we just change the forward method:

```py
import torch


class ResnetModelForImageClassification(PreTrainedModel):
    config_class = ResnetConfig

    def __init__(self, config):
        super().__init__(config)
        block_layer = BLOCK_MAPPING[config.block_type]
        self.model = ResNet(
            block_layer,
            config.layers,
            num_classes=config.num_classes,
            in_chans=config.input_channels,
            cardinality=config.cardinality,
            base_width=config.base_width,
            stem_width=config.stem_width,
            stem_type=config.stem_type,
            avg_down=config.avg_down,
        )

    def forward(self, tensor, labels=None):
        logits = self.model(tensor)
        if labels is not None:
            loss = torch.nn.functional.cross_entropy(logits, labels)
            return {"loss": loss, "logits": logits}
        return {"logits": logits}
```

In both cases, notice how we inherit from `PreTrainedModel` and call the superclass initialization with the `config` (a bit like when you write a regular `torch.nn.Module`). The line that sets the `config_class` is not mandatory, unless you want to register your model with the auto classes (see the last section).

<Tip>

If your model is very similar to a model inside the library, you can re-use the same configuration as that model.

</Tip>

Your model can return anything you want, but returning a dictionary like we did for `ResnetModelForImageClassification`, with the loss included when labels are passed, will make your model directly usable inside the [`Trainer`] class. Using another output format is fine, as long as you are planning on using your own training loop or another library for training.

Now that we have our model class, let's create one:

```py
resnet50d = ResnetModelForImageClassification(resnet50d_config)
```

Again, you can use any of the methods of the [`PreTrainedModel`] class, like [`~PreTrainedModel.save_pretrained`] or [`~PreTrainedModel.push_to_hub`]. We will use the second in the next section, and see how to push the model weights together with the code of our model. But first, let's load some pretrained weights inside our model.

In your own use case, you will probably be training your custom model on your own data. To go fast for this tutorial, we will use the pretrained version of resnet50d. Since our model is just a wrapper around it, it's going to be easy to transfer those weights:

```py
import timm

pretrained_model = timm.create_model("resnet50d", pretrained=True)
resnet50d.model.load_state_dict(pretrained_model.state_dict())
```

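As a quick sanity check (a sketch, not part of the original recipe), the wrapped model can now run a forward pass on a dummy batch:

```python
import torch

dummy_images = torch.randn(1, 3, 224, 224)  # one fake 224x224 RGB image
with torch.no_grad():
    outputs = resnet50d(dummy_images)
print(outputs["logits"].shape)  # expected: torch.Size([1, 1000])
```
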
Now let's see how to make sure that when we do [`~PreTrainedModel.save_pretrained`] or [`~PreTrainedModel.push_to_hub`], the code of the model is saved.

## Registering a model with custom code to the auto classes

If you are writing a library that extends 🤗 Transformers, you may want to extend the auto classes to include your own model. This is different from pushing the code to the Hub, in the sense that users will need to import your library to get the custom models (contrarily to automatically downloading the model code from the Hub).

As long as your config has a `model_type` attribute that is different from existing model types, and your model classes have the right `config_class` attributes, you can just add them to the auto classes like this:

```py
from transformers import AutoConfig, AutoModel, AutoModelForImageClassification

AutoConfig.register("resnet", ResnetConfig)
AutoModel.register(ResnetConfig, ResnetModel)
AutoModelForImageClassification.register(ResnetConfig, ResnetModelForImageClassification)
```

Note that the first argument used when registering your custom config to [`AutoConfig`] needs to match the `model_type` of your custom config, and the first argument used when registering your custom models to any auto model class needs to match the `config_class` of those models.

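After registration, the auto classes resolve your custom types like any built-in model. A small sketch of what that enables:

```python
from transformers import AutoModelForImageClassification

# The registered config class can now be routed through the auto machinery.
resnet50d_config = ResnetConfig(block_type="bottleneck", stem_width=32, stem_type="deep", avg_down=True)
model = AutoModelForImageClassification.from_config(resnet50d_config)
```
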
## Sending the code to the Hub

<Tip warning={true}>

This API is experimental and may have some slight breaking changes in the next releases.

</Tip>

First, make sure your model is fully defined in a `.py` file. It can rely on relative imports to some other files, as long as all the files are in the same directory (we don't support submodules for this feature yet). For our example, we'll define a `modeling_resnet.py` file and a `configuration_resnet.py` file in a folder of the current working directory named `resnet_model`. The configuration file contains the code for `ResnetConfig` and the modeling file contains the code for `ResnetModel` and `ResnetModelForImageClassification`.

```
.
└── resnet_model
    ├── __init__.py
    ├── configuration_resnet.py
    └── modeling_resnet.py
```

┘К┘Е┘Г┘Ж ╪г┘Ж ┘К┘Г┘И┘Ж ┘Е┘Д┘Б `__init__.py` ┘Б╪з╪▒╪║┘Л╪з╪М ┘Б┘З┘И ┘Е┘И╪м┘И╪п ┘Б┘В╪╖ ╪н╪к┘Й ┘К╪к┘Е┘Г┘Ж Python ┘Е┘Ж ╪з┘Г╪к╪┤╪з┘Б ╪г┘Ж `resnet_model` ┘К┘Е┘Г┘Ж ╪з╪│╪к╪о╪п╪з┘Е┘З ┘Г┘Е┘И╪п┘К┘Д.

<Tip warning={true}>

If copying modeling files from the library, you will need to replace all the relative imports at the top of the file to import from the `transformers` package.

</Tip>

Note that you can re-use (or subclass) an existing configuration/model.

To share your model with the community, follow these steps: first, import the ResNet model and configuration from the newly created files:

```py
from resnet_model.configuration_resnet import ResnetConfig
from resnet_model.modeling_resnet import ResnetModel, ResnetModelForImageClassification
```

Then you have to tell the library that you want to copy the code files of those objects when using the `save_pretrained` method and register them properly with a given auto class (especially for models); just run:

```py
ResnetConfig.register_for_auto_class()
ResnetModel.register_for_auto_class("AutoModel")
ResnetModelForImageClassification.register_for_auto_class("AutoModelForImageClassification")
```

Note that there is no need to specify an auto class for the configuration (there is only one auto class for them, [`AutoConfig`]), but it is different for models. Your custom model could be suitable for many different tasks, so you have to specify which of the auto classes is the correct one for your model.

<Tip>

Use `register_for_auto_class()` if you want the code files to be copied. If you instead prefer to use code on the Hub from another repo, you don't need to call it. In cases where there is more than one auto class, you can modify the `config.json` directly using the following structure:

```json
"auto_map": {
    "AutoConfig": "<your-repo-name>--<config-name>",
    "AutoModel": "<your-repo-name>--<config-name>",
    "AutoModelFor<Task>": "<your-repo-name>--<config-name>",
},
```

</Tip>

Next, let's create the config and the models as we did before:

```py
resnet50d_config = ResnetConfig(block_type="bottleneck", stem_width=32, stem_type="deep", avg_down=True)
resnet50d = ResnetModelForImageClassification(resnet50d_config)

pretrained_model = timm.create_model("resnet50d", pretrained=True)
resnet50d.model.load_state_dict(pretrained_model.state_dict())
```

Now to send the model to the Hub, make sure you are logged in. Either run this in your terminal:

```bash
huggingface-cli login
```

or from a notebook:

```py
from huggingface_hub import notebook_login

notebook_login()
```

You can then push to your own namespace (or an organization you are a member of) like this:

```py
resnet50d.push_to_hub("custom-resnet50d")
```

On top of the modeling weights and the configuration in JSON format, this also copied the modeling and configuration `.py` files into the folder `custom-resnet50d` and uploaded the result to the Hub. You can check the result in this [model repo](https://huggingface.co/sgugger/custom-resnet50d).

See the [sharing tutorial](model_sharing) for more information on the push-to-Hub method.

### Using a model with custom code

You can use any configuration, model, or tokenizer with custom code files in its repository with the auto classes and the `from_pretrained` method. All files and code uploaded to the Hub are scanned for malware (refer to the [Hub security](https://huggingface.co/docs/hub/security#malware-scanning) documentation for more information), but you should still review the model code and its author to avoid executing malicious code on your machine. Set `trust_remote_code=True` to use a model with custom code:

```py
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained("sgugger/custom-resnet50d", trust_remote_code=True)
```

It is also strongly recommended to pass a commit hash as `revision` to make sure the author of the model did not update the code later with malicious new lines (unless you fully trust the authors of the model):

```py
commit_hash = "ed94a7c6247d8aedce4647f00f20de6875b5b292"
model = AutoModelForImageClassification.from_pretrained(
    "sgugger/custom-resnet50d", trust_remote_code=True, revision=commit_hash
)
```

Note that when browsing the commit history of the model repo on the Hugging Face Hub, there is a button to easily copy the commit hash of any commit.
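
As an aside, a minimal sketch (an addition to this guide, not part of the original text) of fetching that hash programmatically instead of copying it from the UI could rely on `huggingface_hub.model_info`; treat the exact attribute access as an assumption about your installed `huggingface_hub` version:

```py
# Illustrative sketch: pin from_pretrained to whatever commit the repo currently points to.
from huggingface_hub import model_info
from transformers import AutoModelForImageClassification

commit_hash = model_info("sgugger/custom-resnet50d").sha  # hash of the current revision
model = AutoModelForImageClassification.from_pretrained(
    "sgugger/custom-resnet50d", trust_remote_code=True, revision=commit_hash
)
```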

docs/source/ar/fast_tokenizers.md (new file, 51 lines)
@@ -0,0 +1,51 @@

# Use tokenizers from 🤗 Tokenizers

The [`PreTrainedTokenizerFast`] depends on the [🤗 Tokenizers](https://huggingface.co/docs/tokenizers) library. Tokenizers obtained from the 🤗 Tokenizers library can be loaded very simply into 🤗 Transformers.

Before getting into the specifics, let's first start by creating a dummy tokenizer in a few lines:

```python
>>> from tokenizers import Tokenizer
>>> from tokenizers.models import BPE
>>> from tokenizers.trainers import BpeTrainer
>>> from tokenizers.pre_tokenizers import Whitespace

>>> tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
>>> trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])

>>> tokenizer.pre_tokenizer = Whitespace()
>>> files = [...]
>>> tokenizer.train(files, trainer)
```

We now have a tokenizer trained on the files we defined. We can either keep using it in this runtime, or save it to a JSON file for future re-use.

## Loading directly from the tokenizer object

Let's see how to leverage this tokenizer object in the 🤗 Transformers library. The [`PreTrainedTokenizerFast`] class allows for easy instantiation by accepting the instantiated *tokenizer* object as an argument:

```python
>>> from transformers import PreTrainedTokenizerFast

>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
```

This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to [the tokenizer page](main_classes/tokenizer) for more information.

## Loading from a JSON file

In order to load a tokenizer from a JSON file, let's first start by saving our tokenizer:

```python
>>> tokenizer.save("tokenizer.json")
```

The path to which we saved this file can be passed to the [`PreTrainedTokenizerFast`] initialization method using the `tokenizer_file` parameter:

```python
>>> from transformers import PreTrainedTokenizerFast

>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
```

This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to [the tokenizer page](main_classes/tokenizer) for more information.

docs/source/ar/gguf.md (new file, 89 lines)
@@ -0,0 +1,89 @@

# GGUF and interaction with Transformers

The GGUF file format is used to store models for inference with [GGML](https://github.com/ggerganov/ggml) and other libraries that depend on it, like the very popular [llama.cpp](https://github.com/ggerganov/llama.cpp) or [whisper.cpp](https://github.com/ggerganov/whisper.cpp).

It is a file format [supported by the Hugging Face Hub](https://huggingface.co/docs/hub/en/gguf) with features allowing for quick inspection of the tensors and metadata within the file.

This file format is designed as a "single-file format" where a single file usually contains the configuration attributes, the tokenizer vocabulary and other attributes, as well as all the tensors to be loaded into the model. These files come in different formats according to the quantization type of the file. We take a brief look at some of them [here](https://huggingface.co/docs/hub/en/gguf#quantization-types).

## Support within Transformers

We have added the ability to load `gguf` files within `transformers`, in order to offer further training/fine-tuning capabilities to gguf models before converting those models back to `gguf` for use within the `ggml` ecosystem. When loading a model, we first dequantize it to fp32 before loading the weights to be used in PyTorch.

> [!NOTE]
> The support is still very exploratory and we welcome contributions in order to solidify it across quantization types and model architectures.

Below are the supported model architectures and quantization types:

### Supported quantization types

The initially supported quantization types were decided according to the popular quantization files that have been shared on the Hub.

- F32
- F16
- BF16
- Q4_0
- Q4_1
- Q5_0
- Q5_1
- Q8_0
- Q2_K
- Q3_K
- Q4_K
- Q5_K
- Q6_K
- IQ1_S
- IQ1_M
- IQ2_XXS
- IQ2_XS
- IQ2_S
- IQ3_XXS
- IQ3_S
- IQ4_XS
- IQ4_NL

> [!NOTE]
> To support gguf dequantization, installing `gguf>=0.10.0` is required.
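
For reference, a minimal install command matching that version constraint could look like this (pin it more tightly if your workflow needs it):

```bash
pip install "gguf>=0.10.0"
```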

### Supported model architectures

For now, the supported model architectures are the architectures that have been very popular on the Hub, namely:

- LLaMa
- Mistral
- Qwen2
- Qwen2Moe
- Phi3
- Bloom
- Falcon
- StableLM
- GPT2
- Starcoder2
- T5

## Example usage

To load `gguf` files in `transformers`, specify the `gguf_file` argument in the `from_pretrained` methods of both the tokenizer and the model. Here is how to load a tokenizer and a model, both from the exact same file:

```py
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
```

Now you have access to the full, unquantized version of the model in the PyTorch ecosystem, where you can combine it with a plethora of other tools.
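
For example, a quick sanity check along the lines of the sketch below (an illustrative addition, not part of the original text) reuses the `tokenizer` and `model` loaded above; the prompt and generation settings are arbitrary:

```py
# Illustrative sketch: generate a few tokens with the dequantized model loaded above.
inputs = tokenizer("The GGUF format is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```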

To convert back to a `gguf` file, we recommend using the [`convert-hf-to-gguf.py`](https://github.com/ggerganov/llama.cpp/blob/master/convert-hf-to-gguf.py) script from llama.cpp.

Here is how you would complete the script above to save the model and export it back to `gguf`:

```py
tokenizer.save_pretrained('directory')
model.save_pretrained('directory')

!python ${path_to_llama_cpp}/convert-hf-to-gguf.py ${directory}
```

@@ -28,7 +28,7 @@ picture-in-picture" allowfullscreen></iframe>

```py
>>> model = AutoModel.from_pretrained(
...     "julien-c/EsperBERTo-small", revision="v2.0.1"  # tag name, or branch name, or commit hash
...     "julien-c/EsperBERTo-small", revision="4c77982"  # tag name, or branch name, or commit hash
... )
```

docs/source/ar/multilingual.md (new file, 160 lines)
@@ -0,0 +1,160 @@

# Multilingual models for inference

There are several multilingual models in 🤗 Transformers, and their inference usage differs from monolingual models. Not *all* multilingual model usage is different though. Some models, like [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased), can be used just like a monolingual model. This guide will show you how to use multilingual models whose usage differs for inference.

## XLM

XLM has ten different checkpoints, only one of which is monolingual. The nine remaining model checkpoints can be split into two categories: the checkpoints that use language embeddings and those that don't.

### XLM with language embeddings

The following XLM models use language embeddings to specify the language used at inference:

- `FacebookAI/xlm-mlm-ende-1024` (Masked language modeling, English-German)
- `FacebookAI/xlm-mlm-enfr-1024` (Masked language modeling, English-French)
- `FacebookAI/xlm-mlm-enro-1024` (Masked language modeling, English-Romanian)
- `FacebookAI/xlm-mlm-xnli15-1024` (Masked language modeling, XNLI languages)
- `FacebookAI/xlm-mlm-tlm-xnli15-1024` (Masked language modeling + translation, XNLI languages)
- `FacebookAI/xlm-clm-enfr-1024` (Causal language modeling, English-French)
- `FacebookAI/xlm-clm-ende-1024` (Causal language modeling, English-German)

Language embeddings are represented as a tensor of the same shape as the `input_ids` passed to the model. The values in these tensors depend on the language used and are identified by the tokenizer's `lang2id` and `id2lang` attributes.

In this example, load the `FacebookAI/xlm-clm-enfr-1024` checkpoint (Causal language modeling, English-French):

```py
>>> import torch
>>> from transformers import XLMTokenizer, XLMWithLMHeadModel

>>> tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
>>> model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
```

The `lang2id` attribute of the tokenizer displays this model's languages and their ids:

```py
>>> print(tokenizer.lang2id)
{'en': 0, 'fr': 1}
```

Next, create an example input:

```py
>>> input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")])  # batch size of 1
```

Set the language id as `"en"` and use it to define the language embedding. The language embedding is a tensor filled with `0` since that is the language id for English. This tensor should be the same size as `input_ids`.

```py
>>> language_id = tokenizer.lang2id["en"]  # 0
>>> langs = torch.tensor([language_id] * input_ids.shape[1])  # torch.tensor([0, 0, 0, ..., 0])

>>> # We reshape it to be of size (batch_size, sequence_length)
>>> langs = langs.view(1, -1)  # is now of shape [1, sequence_length] (we have a batch size of 1)
```

Now you can pass the `input_ids` and language embedding to the model:

```py
>>> outputs = model(input_ids, langs=langs)
```

The [run_generation.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-generation/run_generation.py) script can generate text with language embeddings using the `xlm-clm` checkpoints.
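
A hedged sketch of such an invocation is shown below (an added illustration; the flag names are assumptions about the script's argument parser, so check `--help` for your checkout before relying on them):

```bash
# Assumed flags; verify with --help for the script version you are using.
python examples/pytorch/text-generation/run_generation.py \
    --model_type xlm \
    --model_name_or_path FacebookAI/xlm-clm-enfr-1024 \
    --prompt "Wikipedia was used to"
```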

### XLM without language embeddings

The following XLM models do not require language embeddings during inference:

- `FacebookAI/xlm-mlm-17-1280` (Masked language modeling, 17 languages)
- `FacebookAI/xlm-mlm-100-1280` (Masked language modeling, 100 languages)

These models are used for generic sentence representations, unlike the previous XLM checkpoints.

## BERT

The following BERT models can be used for multilingual tasks:

- `google-bert/bert-base-multilingual-uncased` (Masked language modeling + next sentence prediction, 102 languages)
- `google-bert/bert-base-multilingual-cased` (Masked language modeling + next sentence prediction, 104 languages)

These models do not require language embeddings during inference. They should identify the language from the context and infer accordingly.
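
To illustrate (this snippet is an added sketch, not part of the original guide), the multilingual BERT checkpoints are called exactly like their monolingual counterparts, with no language id passed at all:

```py
# Illustrative sketch: the model infers the language from the text itself.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="google-bert/bert-base-multilingual-cased")
print(fill_mask("Paris is the [MASK] of France.")[0]["token_str"])
```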

## XLM-RoBERTa

The following XLM-RoBERTa models can be used for multilingual tasks:

- `FacebookAI/xlm-roberta-base` (Masked language modeling, 100 languages)
- `FacebookAI/xlm-roberta-large` (Masked language modeling, 100 languages)

XLM-RoBERTa was trained on 2.5TB of newly created and cleaned CommonCrawl data in 100 languages. It provides strong gains over previously released multilingual models like mBERT or XLM on downstream tasks like classification, sequence labeling, and question answering.

## M2M100

The following M2M100 models can be used for multilingual translation:

- `facebook/m2m100_418M` (Translation)
- `facebook/m2m100_1.2B` (Translation)

In this example, load the `facebook/m2m100_418M` checkpoint to translate from Chinese to English. You can set the source language in the tokenizer:

```py
>>> from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

>>> en_text = "Do not meddle in the affairs of wizards, for they are subtle and quick to anger."
>>> chinese_text = "不要插手巫師的事務, 因為他們是微妙的, 很快就會發怒."

>>> tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M", src_lang="zh")
>>> model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
```

Tokenize the text:

```py
>>> encoded_zh = tokenizer(chinese_text, return_tensors="pt")
```

M2M100 forces the target language id as the first generated token in order to translate to the target language. Set `forced_bos_token_id` to `en` in the `generate` method to translate to English:

```py
>>> generated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id("en"))
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
'Do not interfere with the matters of the witches, because they are delicate and will soon be angry.'
```

## MBart

The following MBart models can be used for multilingual translation:

- `facebook/mbart-large-50-one-to-many-mmt` (One-to-many multilingual machine translation, 50 languages)
- `facebook/mbart-large-50-many-to-many-mmt` (Many-to-many multilingual machine translation, 50 languages)
- `facebook/mbart-large-50-many-to-one-mmt` (Many-to-one multilingual machine translation, 50 languages)
- `facebook/mbart-large-50` (Multilingual translation, 50 languages)
- `facebook/mbart-large-cc25`

In this example, load the `facebook/mbart-large-50-many-to-many-mmt` checkpoint to translate Finnish to English. You can set the source language in the tokenizer:

```py
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> en_text = "Do not meddle in the affairs of wizards, for they are subtle and quick to anger."
>>> fi_text = "Älä sekaannu velhojen asioihin, sillä ne ovat hienovaraisia ja nopeasti vihaisia."

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50-many-to-many-mmt", src_lang="fi_FI")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
```

Tokenize the text:

```py
>>> encoded_en = tokenizer(en_text, return_tensors="pt")
```

MBart forces the target language id as the first generated token in order to translate to the target language. Set `forced_bos_token_id` to `en` in the `generate` method to translate to English:

```py
>>> generated_tokens = model.generate(**encoded_en, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
"Don't interfere with the wizard's affairs, because they are subtle, will soon get angry."
```

If you are using the `facebook/mbart-large-50-many-to-one-mmt` checkpoint, you don't need to force the target language id as the first generated token; otherwise the usage is the same.
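
A brief sketch of that case (an added illustration reusing `fi_text` and the imports from the example above; not part of the original guide):

```py
# Illustrative sketch: with the many-to-one checkpoint, plain `generate` is enough.
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50-many-to-one-mmt", src_lang="fi_FI")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50-many-to-one-mmt")
generated_tokens = model.generate(**tokenizer(fi_text, return_tensors="pt"))
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
```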

docs/source/ar/sagemaker.md (new file, 8 lines)
@@ -0,0 +1,8 @@

# Run training on Amazon SageMaker

The documentation has been moved to [hf.co/docs/sagemaker](https://huggingface.co/docs/sagemaker). This page will be removed in Transformers 5.0.

### Table of Contents

- [Train Hugging Face models on Amazon SageMaker with the SageMaker Python SDK](https://huggingface.co/docs/sagemaker/train)
- [Deploy Hugging Face models on Amazon SageMaker with the SageMaker Python SDK](https://huggingface.co/docs/sagemaker/inference)

docs/source/ar/serialization.md (new file, 170 lines)
@@ -0,0 +1,170 @@

# Export to ONNX

Deploying 🤗 Transformers models in production environments often requires, or can benefit from, exporting the models into a serialized format that can be loaded and executed on specialized runtimes and hardware.

🤗 Optimum is an extension of Transformers that enables exporting models from PyTorch or TensorFlow to serialized formats such as ONNX and TFLite through its `exporters` module. 🤗 Optimum also provides a set of performance optimization tools to train and run models on targeted hardware with maximum efficiency.

This guide demonstrates how to export 🤗 Transformers models to ONNX with 🤗 Optimum; for the guide on exporting models to TFLite, please refer to the [Export to TFLite page](tflite).

## Export to ONNX

[ONNX (Open Neural Network eXchange)](http://onnx.ai) is an open standard that defines a common set of operators and a common file format to represent deep learning models in a wide variety of frameworks, including PyTorch and TensorFlow. When a model is exported to the ONNX format, these operators are used to construct a computational graph (often called an _intermediate representation_) which represents the flow of data through the neural network.

By exposing a graph with standardized operators and data types, ONNX makes it easy to switch between frameworks. For example, a model trained in PyTorch can be exported to the ONNX format and then imported in TensorFlow (and vice versa).

Once exported to the ONNX format, a model can be:

- optimized for inference via techniques such as [graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization) and [quantization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization).
- run with ONNX Runtime via the [`ORTModelForXXX` classes](https://huggingface.co/docs/optimum/onnxruntime/package_reference/modeling_ort), which follow the same `AutoModel` API as the one you are used to in 🤗 Transformers.
- run with [optimized inference pipelines](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines), which have the same API as the [`pipeline`] function in 🤗 Transformers.

🤗 Optimum provides support for the ONNX export by leveraging configuration objects. These configuration objects come ready-made for a number of model architectures, and are designed to be easily extendable to other architectures.

For the list of ready-made configurations, please refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/overview).

There are two ways to export a 🤗 Transformers model to ONNX; here we show both:

- export with 🤗 Optimum via the CLI.
- export with 🤗 Optimum with `optimum.onnxruntime`.

### Exporting a 🤗 Transformers model to ONNX with the CLI

To export a 🤗 Transformers model to ONNX, first install an extra dependency:

```bash
pip install optimum[exporters]
```

To check out all available arguments, refer to the [🤗 Optimum docs](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli), or view the help in the command line:

```bash
optimum-cli export onnx --help
```

To export a model's checkpoint from the 🤗 Hub, for example `distilbert/distilbert-base-uncased-distilled-squad`, run the following command:

```bash
optimum-cli export onnx --model distilbert/distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
```

You should see the logs indicating progress and showing where the resulting `model.onnx` is saved, like this:

```bash
Validating ONNX model distilbert_base_uncased_squad_onnx/model.onnx...
-[✓] ONNX model output names match reference model (start_logits, end_logits)
- Validating ONNX Model output "start_logits":
-[✓] (2, 16) matches (2, 16)
-[✓] all values close (atol: 0.0001)
- Validating ONNX Model output "end_logits":
-[✓] (2, 16) matches (2, 16)
-[✓] all values close (atol: 0.0001)
The ONNX export succeeded and the exported model was saved at: distilbert_base_uncased_squad_onnx
```

The example above illustrates exporting a checkpoint from the 🤗 Hub. When exporting a local model, first make sure that you saved both the model's weights and the tokenizer files in the same directory (`local_path`). When using the CLI, pass the `local_path` to the `model` argument instead of the checkpoint name on the 🤗 Hub and provide the `--task` argument. You can review the list of supported tasks in the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/task_manager). If the `task` argument is not provided, it will default to the model architecture without any task-specific head.

```bash
optimum-cli export onnx --model local_path --task question-answering distilbert_base_uncased_squad_onnx/
```

The resulting `model.onnx` file can then be run on one of the [many accelerators](https://onnx.ai/supported-tools.html#deployModel) that support the ONNX standard. For example, we can load and run the model with [ONNX Runtime](https://onnxruntime.ai/) as follows:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> model = ORTModelForQuestionAnswering.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> inputs = tokenizer("What am I using?", "Using DistilBERT with ONNX Runtime!", return_tensors="pt")
>>> outputs = model(**inputs)
```

The process is identical for TensorFlow checkpoints on the Hub. For example, here is how you would export a pure TensorFlow checkpoint from the [Keras organization](https://huggingface.co/keras-io):

```bash
optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_squad_onnx/
```

### Exporting a 🤗 Transformers model to ONNX with `optimum.onnxruntime`

As an alternative to the CLI, you can export a 🤗 Transformers model to ONNX programmatically like so:

```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from transformers import AutoTokenizer

>>> model_checkpoint = "distilbert_base_uncased_squad"
>>> save_directory = "onnx/"

>>> # Load a model from transformers and export it to ONNX
>>> ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

>>> # Save the onnx model and tokenizer
>>> ort_model.save_pretrained(save_directory)
>>> tokenizer.save_pretrained(save_directory)
```
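
As a follow-up (an illustrative sketch, not part of the original guide), the export saved in `save_directory` can be loaded back with the same ORT class and called like a regular Transformers model:

```python
>>> # Illustrative sketch: reload the exported ONNX model and run a forward pass.
>>> ort_model = ORTModelForSequenceClassification.from_pretrained(save_directory)
>>> inputs = tokenizer("ONNX Runtime makes inference fast!", return_tensors="pt")
>>> outputs = ort_model(**inputs)
```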

### Exporting a model for an unsupported architecture

If you wish to contribute by adding support for a model that cannot currently be exported, you should first check whether it is supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview), and if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute) directly.

### Exporting a model with `transformers.onnx`

<Tip warning={true}>

`transformers.onnx` is no longer maintained; please export models with 🤗 Optimum as described above. This section will be removed in future versions.

</Tip>

To export a 🤗 Transformers model to ONNX with `transformers.onnx`, install the extra dependencies:

```bash
pip install transformers[onnx]
```

Use the `transformers.onnx` package as a Python module to export a checkpoint using a ready-made configuration:

```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```

This exports an ONNX graph of the checkpoint defined by the `--model` argument. Pass any checkpoint on the 🤗 Hub or one that is stored locally.
The resulting `model.onnx` file can then be run on one of the many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows:

```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```

The required output names (like `["last_hidden_state"]`) can be obtained by taking a look at the ONNX configuration of each model. For example, for DistilBERT we have:

```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```

The process is identical for TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint like so:

```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```

To export a model that is stored locally, save the model's weights and tokenizer files in the same directory (e.g. `local-pt-checkpoint`), then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to the desired directory:

```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```

docs/source/ar/tflite.md (new file, 40 lines)
@@ -0,0 +1,40 @@

# Export to TFLite

[TensorFlow Lite](https://www.tensorflow.org/lite/guide) is a lightweight framework for deploying machine learning models on resource-constrained devices, such as mobile phones, embedded systems, and Internet of Things (IoT) devices. TFLite is designed to optimize and run models efficiently on these devices with limited compute power, memory, and power consumption.

A TensorFlow Lite model is represented in a special efficient portable format identified by the `.tflite` file extension.

🤗 Optimum offers functionality to export 🤗 Transformers models to TFLite through its `exporters.tflite` module. For the list of supported model architectures, please refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/tflite/overview).

To export a model to TFLite, install the required dependencies:

```bash
pip install optimum[exporters-tf]
```

To check out all available arguments, refer to the [🤗 Optimum docs](https://huggingface.co/docs/optimum/main/en/exporters/tflite/usage_guides/export_a_model), or view the help in the command line:

```bash
optimum-cli export tflite --help
```

To export a model's checkpoint from the 🤗 Hub, for example `google-bert/bert-base-uncased`, run the following command:

```bash
optimum-cli export tflite --model google-bert/bert-base-uncased --sequence_length 128 bert_tflite/
```

You should see the logs indicating progress and showing where the resulting `model.tflite` is saved, like this:

```bash
Validating TFLite model...
-[✓] TFLite model output names match reference model (logits)
- Validating TFLite Model output "logits":
-[✓] (1, 128, 30522) matches (1, 128, 30522)
-[x] values not close enough, max diff: 5.817413330078125e-05 (atol: 1e-05)
The TensorFlow Lite export succeeded with the warning: The maximum absolute difference between the output of the reference model and the TFLite exported model is not within the set tolerance 1e-05:
- logits: max diff = 5.817413330078125e-05.
The exported model was saved at: bert_tflite
```

The example above illustrates exporting a checkpoint from the 🤗 Hub. When exporting a local model, first make sure that you saved both the model's weights and the tokenizer files in the same directory (`local_path`). When using the CLI, pass the `local_path` to the `model` argument instead of the checkpoint name on the 🤗 Hub.
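
For instance, a local export could look like the sketch below (an added illustration; `local_path` stands for the placeholder directory mentioned above):

```bash
optimum-cli export tflite --model local_path --sequence_length 128 bert_tflite/
```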

docs/source/ar/torchscript.md (new file, 154 lines)
@@ -0,0 +1,154 @@

# Export to TorchScript

<Tip>

This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities with variable-input-size models. It is a focus of interest for us and we will deepen our analysis in upcoming releases, with more code examples, a more flexible implementation, and benchmarks comparing Python-based code with compiled TorchScript.

</Tip>

According to the [TorchScript documentation](https://pytorch.org/docs/stable/jit.html):

> TorchScript is a way to create serializable and optimizable models from PyTorch code.

There are two PyTorch modules, [JIT and TRACE](https://pytorch.org/docs/stable/jit.html), that allow developers to export their models to be reused in other programs, such as efficiency-oriented C++ programs.

We provide an interface that allows you to export 🤗 Transformers models to TorchScript so they can be reused in a different environment than PyTorch-based Python programs. Here, we explain how to export and use our models using TorchScript.

Exporting a model requires two things:

- model instantiation with the `torchscript` flag
- a forward pass with dummy inputs

These necessities imply several things developers should be careful about, as detailed below.

## TorchScript flag and tied weights

The `torchscript` flag is necessary because most of the 🤗 Transformers language models have tied weights between their `Embedding` layer and their `Decoding` layer. TorchScript does not allow you to export models that have tied weights, so it is necessary to untie and clone the weights beforehand.

Models instantiated with the `torchscript` flag have their `Embedding` layer and `Decoding` layer separated, which means that they should not be trained down the line. Training would desynchronize the two layers, leading to unexpected results.

This is not the case for models that do not have a language model head, as those do not have tied weights. These models can be safely exported without the `torchscript` flag.

## Dummy inputs and standard lengths

The dummy inputs are used for a forward pass through the model. While the inputs' values are propagated through the layers, PyTorch keeps track of the different operations executed on each tensor. These recorded operations are then used to create the *trace* of the model.

The trace is created relative to the inputs' dimensions. It is therefore constrained by the dimensions of the dummy input, and will not work for any other sequence length or batch size. When trying with a different size, the following error is raised:

```
`The expanded size of the tensor (3) must match the existing size (7) at non-singleton dimension 2`
```

We recommend tracing the model with a dummy input size at least as large as the largest input that will be fed to the model during inference. Padding can help fill in the missing values. However, since the model is traced with a larger input size, the dimensions of the matrices will also be large, resulting in more calculations.
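
As an illustration (an added sketch, not from the original guide), padding every input to the fixed length used for tracing can be done directly in the tokenizer; `max_length=14` below is an assumption matching the 14-token dummy input used in the saving example later in this guide:

```python
# Illustrative sketch: pad/truncate every input to the same length used when tracing,
# so the traced graph always sees tensors of a matching shape.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
encoded = tokenizer(
    "Who was Jim Henson?",
    padding="max_length",  # pad up to max_length
    truncation=True,       # cut anything longer than the traced length
    max_length=14,         # assumed traced length; must match the dummy input
    return_tensors="pt",
)
```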

Be careful of the total number of operations done on each input and follow the performance closely when exporting models with varying sequence lengths.

## Using TorchScript in Python

This section demonstrates how to save and load models, as well as how to use the trace for inference.

### Saving a model

To export a `BertModel` with TorchScript, instantiate `BertModel` from the `BertConfig` class and then save it to disk under the filename `traced_bert.pt`:

```python
from transformers import BertModel, BertTokenizer, BertConfig
import torch

enc = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")

# Tokenizing input text
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = enc.tokenize(text)

# Masking one of the input tokens
masked_index = 8
tokenized_text[masked_index] = "[MASK]"
indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Creating a dummy input
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
dummy_input = [tokens_tensor, segments_tensors]

# Initializing the model with the torchscript flag
# Flag set to True even though it is not necessary as this model does not have an LM Head.
config = BertConfig(
    vocab_size_or_config_json_file=32000,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    torchscript=True,
)

# Instantiating the model
model = BertModel(config)

# The model needs to be in evaluation mode
model.eval()

# If you are instantiating the model with *from_pretrained* you can also easily set the TorchScript flag
model = BertModel.from_pretrained("google-bert/bert-base-uncased", torchscript=True)

# Creating the trace
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
torch.jit.save(traced_model, "traced_bert.pt")
```

### Loading a model

Now you can load the previously saved `BertModel`, `traced_bert.pt`, from disk and use it on the previously initialized `dummy_input`:

```python
loaded_model = torch.jit.load("traced_bert.pt")
loaded_model.eval()

all_encoder_layers, pooled_output = loaded_model(*dummy_input)
```

### Using a traced model for inference

Use the traced model for inference with its `__call__` dunder method:

```python
traced_model(tokens_tensor, segments_tensors)
```

## Deploying Hugging Face TorchScript models to AWS with the Neuron SDK

AWS introduced the [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/) instance family for low cost, high performance machine learning inference in the cloud. The Inf1 instances are powered by the AWS Inferentia chip, a custom-built hardware accelerator specializing in deep learning inference workloads. [AWS Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/#) is the SDK for Inferentia that supports tracing and optimizing transformers models for deployment on Inf1. The Neuron SDK provides:

1. An easy-to-use API with a one-line code change to trace and optimize a TorchScript model for inference in the cloud.
2. Out-of-the-box performance optimizations for [improved cost-performance](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/benchmark/).
3. Support for Hugging Face transformers models built with either [PyTorch](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.html) or [TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/huggingface_bert/huggingface_bert.html).

### Implications

Transformers models based on the [BERT (Bidirectional Encoder Representations from Transformers)](https://huggingface.co/docs/transformers/main/model_doc/bert) architecture, or its variants such as [distilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert) and [roBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta), run best on Inf1 for non-generative tasks such as extractive question answering, sequence classification, and token classification. However, text generation tasks can still be adapted to run on Inf1 according to this [AWS Neuron MarianMT tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/transformers-marianmt.html). More information about models that can be converted out of the box on Inferentia can be found in the [Model Architecture Fit](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/models/models-inferentia.html#models-inferentia) section of the Neuron documentation.

### Dependencies

Using AWS Neuron to convert models requires a [Neuron SDK environment](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/index.html#installation-guide), which comes preconfigured on the [AWS Deep Learning AMI](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-launching.html).

### Converting a model for AWS Neuron

Convert a model for AWS NEURON using the same code from [Using TorchScript in Python](torchscript#using-torchscript-in-python) to trace a `BertModel`. Import the `torch.neuron` framework extension to access the components of the Neuron SDK through a Python API:

```python
from transformers import BertModel, BertTokenizer, BertConfig
import torch
import torch.neuron
```

You only need to modify the following line:

```diff
- torch.jit.trace(model, [tokens_tensor, segments_tensors])
+ torch.neuron.trace(model, [tokens_tensor, segments_tensors])
```

This enables the Neuron SDK to trace the model and optimize it for Inf1 instances.

To learn more about AWS Neuron SDK features, tools, example tutorials, and the latest updates, please see the [AWS NeuronSDK documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html).

docs/source/ar/trainer.md (new file, 720 lines)
@@ -0,0 +1,720 @@
# Trainer

The [`Trainer`] class provides a complete training and evaluation loop for PyTorch models implemented in the Transformers library. You only need to pass it the necessary pieces for training (model, tokenizer, dataset, evaluation function, training hyperparameters, etc.), and the [`Trainer`] class takes care of the rest. This makes it easier to start training faster without manually writing your own training loop. At the same time, [`Trainer`] is very customizable and offers a ton of training options so you can tailor it to your exact training needs.

<Tip>

In addition to the [`Trainer`] class, the Transformers library also provides a [`Seq2SeqTrainer`] class for sequence-to-sequence tasks like translation or summarization. There is also the [`~trl.SFTTrainer`] class from the [TRL](https://hf.co/docs/trl) library, which wraps the [`Trainer`] class and is optimized for training language models like Llama-2 and Mistral with autoregressive techniques. [`~trl.SFTTrainer`] also supports features like sequence packing, LoRA, quantization, and DeepSpeed for efficiently scaling to very large models.

<br>

Feel free to check out the [API reference](./main_classes/trainer) for these other [`Trainer`]-type classes to learn more about when to use which one. In general, [`Trainer`] is the most versatile option and is appropriate for a broad spectrum of tasks. [`Seq2SeqTrainer`] is designed for sequence-to-sequence tasks, and [`~trl.SFTTrainer`] is designed for training large language models.

</Tip>

Before you start, make sure [Accelerate](https://hf.co/docs/accelerate) is installed; it is a library that enables running PyTorch training in distributed environments.
|
||||
|
||||
```bash
|
||||
pip install accelerate
|
||||
|
||||
# upgrade
|
||||
pip install accelerate --upgrade
|
||||
```
|
||||
|
||||
┘К┘И┘Б╪▒ ┘З╪░╪з ╪з┘Д╪п┘Д┘К┘Д ┘Ж╪╕╪▒╪й ╪╣╪з┘Е╪й ╪╣┘Д┘Й ┘Б╪ж╪й [`Trainer`].
|
||||
|
||||
## ╪з┘Д╪з╪│╪к╪о╪п╪з┘Е ╪з┘Д╪г╪│╪з╪│┘К
|
||||
|
||||
┘К╪к╪╢┘Е┘Ж [`Trainer`] ╪м┘Е┘К╪╣ ╪з┘Д╪к╪╣┘Д┘К┘Е╪з╪к ╪з┘Д╪и╪▒┘Е╪м┘К╪й ╪з┘Д╪к┘К ╪│╪к╪м╪п┘З╪з ┘Б┘К ╪н┘Д┘В╪й ╪з┘Д╪к╪п╪▒┘К╪и ╪з┘Д╪г╪│╪з╪│┘К╪й:
|
||||
|
||||
1. ┘В┘Е ╪и╪к┘Ж┘Б┘К╪░ ╪о╪╖┘И╪й ╪к╪п╪▒┘К╪и ┘Д╪н╪│╪з╪и ╪з┘Д╪о╪│╪з╪▒╪й
|
||||
2. ╪з╪н╪│╪и ╪з┘Д┘Е╪┤╪к┘В╪з╪к ╪и╪з╪│╪к╪о╪п╪з┘Е ╪╖╪▒┘К┘В╪й [`~accelerate.Accelerator.backward`]
|
||||
3. ╪к╪н╪п┘К╪л ╪з┘Д╪г┘И╪▓╪з┘Ж ╪и┘Ж╪з╪б┘Л ╪╣┘Д┘Й ╪з┘Д┘Е╪┤╪к┘В╪з╪к
|
||||
4. ┘Г╪▒╪▒ ┘З╪░┘З ╪з┘Д╪╣┘Е┘Д┘К╪й ╪н╪к┘Й ╪к╪╡┘Д ╪е┘Д┘Й ╪╣╪п╪п ┘Е╪н╪п╪п ┘Е╪│╪и┘В┘Л╪з ┘Е┘Ж ╪з┘Д╪п┘И╪▒╪з╪к (epochs).
|
||||
|
||||
╪к┘П╪м╪▒╪п ┘Б╪ж╪й [`Trainer`] ┘Г┘Д ┘З╪░┘З ╪з┘Д╪к╪╣┘Д┘К┘Е╪з╪к ╪з┘Д╪и╪▒┘Е╪м┘К╪й ╪н╪к┘Й ┘Д╪з ╪к╪╢╪╖╪▒ ╪е┘Д┘Й ╪з┘Д┘В┘Д┘В ╪и╪┤╪г┘Ж ┘Г╪к╪з╪и╪й ╪н┘Д┘В╪й ╪к╪п╪▒┘К╪и ┘К╪п┘И┘К┘Л╪з ┘Б┘К ┘Г┘Д ┘Е╪▒╪й ╪г┘Е╪з ╪е╪░╪з ┘Г┘Ж╪к ╪и╪п╪г╪к ┘Д┘Д╪к┘И ┘Б┘К PyTorch ┘И╪з┘Д╪к╪п╪▒┘К╪и. ┘Г┘Д ┘Е╪з ╪╣┘Д┘К┘Г ┘Б╪╣┘Д┘З ┘З┘И ╪к┘И┘Б┘К╪▒ ╪з┘Д┘Е┘Г┘И┘Ж╪з╪к ╪з┘Д╪г╪│╪з╪│┘К╪й ╪з┘Д┘Д╪з╪▓┘Е╪й ┘Д┘Д╪к╪п╪▒┘К╪и╪М ┘Е╪л┘Д ╪з┘Д┘Ж┘Е┘И╪░╪м ┘И┘Е╪м┘Е┘И╪╣╪й ╪и┘К╪з┘Ж╪з╪к╪М ┘И╪к╪к╪╣╪з┘Е┘Д ┘Б╪ж╪й [`Trainer`] ┘Е╪╣ ┘Г┘Д ╪┤┘К╪б ╪в╪о╪▒.
|
||||
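To make the abstraction concrete, here is a minimal sketch of the kind of loop [`Trainer`] runs on your behalf; the `model`, `optimizer`, and `train_dataloader` names are assumed to already exist, and the real implementation additionally handles mixed precision, gradient accumulation, logging, checkpointing, and more:

```py
num_epochs = 2

for epoch in range(num_epochs):
    for batch in train_dataloader:
        outputs = model(**batch)   # forward pass computes the loss
        loss = outputs.loss
        loss.backward()            # calculate the gradients
        optimizer.step()           # update the weights
        optimizer.zero_grad()
```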
If you want to specify any training options or hyperparameters, you can find them in the [`TrainingArguments`] class. For example, let's define where to save the model with `output_dir` and push the model to the Hub after training with `push_to_hub=True`.

```py
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="your-model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
)
```

Pass `training_args` to the [`Trainer`] along with a model, a dataset, something to preprocess the dataset with (depending on your data type it could be a tokenizer, feature extractor, or image processor), a data collator, and a function to compute the metrics you want to track during training.
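For illustration, a metrics function for a classification task might look like the sketch below; it assumes the separate `evaluate` library is installed and that accuracy is the metric you care about:

```py
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) tuple collected by Trainer during evaluation
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```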
Finally, call [`~Trainer.train`] to start training!

```py
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()
```

### Checkpoints

The [`Trainer`] class saves model checkpoints to the directory specified in the `output_dir` parameter of [`TrainingArguments`]. You'll find the checkpoints saved in a `checkpoint-000` subfolder, where the numbers at the end correspond to the training step. Saving checkpoints is useful for resuming training later.

```py
# resume from the latest checkpoint
trainer.train(resume_from_checkpoint=True)

# resume from a specific checkpoint saved in the output directory
trainer.train(resume_from_checkpoint="your-model/checkpoint-1000")
```

You can save your checkpoints (the optimizer state is not saved by default) to the Hub by setting `push_to_hub=True` in [`TrainingArguments`] to commit and push them. Other options for deciding how checkpoints are saved are set with the [`hub_strategy`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.hub_strategy) parameter, as shown in the example after this list:

* `hub_strategy="checkpoint"` pushes the latest checkpoint to a subfolder named "last-checkpoint" from which you can resume training
* `hub_strategy="all_checkpoints"` pushes all checkpoints to the directory defined in `output_dir` (you'll see one checkpoint per folder in your model repository)
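For example, a configuration along these lines (the repository and folder names are only illustrative) pushes every saved checkpoint to the Hub:

```py
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="your-model",
    save_strategy="epoch",
    push_to_hub=True,
    hub_strategy="all_checkpoints",  # push every checkpoint saved in output_dir
)
```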
When you resume training from a checkpoint, the [`Trainer`] tries to keep the Python, NumPy, and PyTorch RNG states the same as they were when the checkpoint was saved. But because PyTorch has several non-deterministic default settings, the RNG states aren't guaranteed to be identical. If you want to enable full determinism, take a look at the [Controlling sources of randomness](https://pytorch.org/docs/stable/notes/randomness#controlling-sources-of-randomness) guide to learn what you can enable to make your training fully deterministic. Keep in mind, though, that by making certain settings deterministic, training may be slower.
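As a rough sketch of what that guide covers, PyTorch-level determinism is usually approached with something like the following; the exact set of flags you need depends on your hardware and PyTorch version:

```py
import random
import numpy as np
import torch

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

# opt in to deterministic algorithm implementations where they exist
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
```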
## Customize the Trainer

While the [`Trainer`] class is designed to be accessible and easy to use, it also offers a lot of customizability for more adventurous users. Many of the [`Trainer`]'s methods can be subclassed and overridden to support the functionality you want, without having to rewrite the entire training loop from scratch to accommodate it. These methods include:

* [`~Trainer.get_train_dataloader`] creates a training DataLoader
* [`~Trainer.get_eval_dataloader`] creates an evaluation DataLoader
* [`~Trainer.get_test_dataloader`] creates a test DataLoader
* [`~Trainer.log`] logs information about the various objects that watch training
* [`~Trainer.create_optimizer_and_scheduler`] creates an optimizer and learning rate scheduler if they weren't passed in the `__init__`; these can also be customized separately with [`~Trainer.create_optimizer`] and [`~Trainer.create_scheduler`] respectively
* [`~Trainer.compute_loss`] computes the loss on a batch of training inputs
* [`~Trainer.training_step`] performs the training step
* [`~Trainer.prediction_step`] performs the prediction and test step
* [`~Trainer.evaluate`] evaluates the model and returns the evaluation metrics
* [`~Trainer.predict`] makes predictions (with metrics if labels are available) on the test set

For example, if you want to customize the [`~Trainer.compute_loss`] method to use a weighted loss instead.

```py
import torch
from torch import nn
from transformers import Trainer

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss for 3 labels with different weights
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0], device=model.device))
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```
### Callbacks

Another option for customizing the [`Trainer`] is to use [callbacks](callbacks). Callbacks *don't change* anything in the training loop. They inspect the state of the training loop and then execute some action (early stopping, logging results, etc.) depending on that state. In other words, a callback can't be used to implement something like a custom loss function; for that, you need to subclass and override the [`~Trainer.compute_loss`] method instead.

For example, if you want to add an early stopping callback to the training loop after 10 steps.

```py
from transformers import TrainerCallback

class EarlyStoppingCallback(TrainerCallback):
    def __init__(self, num_steps=10):
        self.num_steps = num_steps

    def on_step_end(self, args, state, control, **kwargs):
        # stop training once the configured number of steps has been reached
        if state.global_step >= self.num_steps:
            control.should_training_stop = True
        return control
```

Then pass it to the [`Trainer`]'s `callbacks` parameter.

```py
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback()],
)
```
## Logging

<Tip>

Check out the [logging](./main_classes/logging) API reference for more information about the different logging levels.

</Tip>

The [`Trainer`] is set to `logging.INFO` by default, which reports errors, warnings, and other basic information. A [`Trainer`] replica - in distributed environments - is set to `logging.WARNING`, which only reports errors and warnings. You can change the logging level with the [`log_level`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.log_level) and [`log_level_replica`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.log_level_replica) parameters in [`TrainingArguments`].

To configure the log level setting for each node, use the [`log_on_each_node`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments.log_on_each_node) parameter to determine whether the log level should be applied on each node or only on the main node.

<Tip>

[`Trainer`] sets the log level separately for each node in the [`Trainer.__init__`] method, so you may want to consider setting this earlier if you're using other Transformers functionality before creating the [`Trainer`] object.

</Tip>

For example, to set your main code and modules to use the same log level according to each node:

```py
import logging
import sys

import datasets
import transformers

logger = logging.getLogger(__name__)

logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%m/%d/%Y %H:%M:%S",
    handlers=[logging.StreamHandler(sys.stdout)],
)

log_level = training_args.get_process_log_level()
logger.setLevel(log_level)
datasets.utils.logging.set_verbosity(log_level)
transformers.utils.logging.set_verbosity(log_level)

trainer = Trainer(...)
```

Use different combinations of `log_level` and `log_level_replica` to configure what gets logged on each of the nodes.

<hfoptions id="logging">
<hfoption id="single node">

```bash
my_app.py ... --log_level warning --log_level_replica error
```

</hfoption>
<hfoption id="multi-node">

Add the `log_on_each_node 0` parameter for multi-node environments.

```bash
my_app.py ... --log_level warning --log_level_replica error --log_on_each_node 0

# set to only report errors
my_app.py ... --log_level error --log_level_replica error --log_on_each_node 0
```

</hfoption>
</hfoptions>
## NEFTune

[NEFTune](https://hf.co/papers/2310.05914) is a technique that can improve performance by adding noise to the embedding vectors during training. To enable it in [`Trainer`], set the `neftune_noise_alpha` parameter in [`TrainingArguments`] to control how much noise is added.

```py
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(..., neftune_noise_alpha=0.1)
trainer = Trainer(..., args=training_args)
```

NEFTune is disabled after training to restore the original embedding layer and avoid any unexpected behavior.

## Liger Kernel

[Liger-Kernel](https://github.com/linkedin/Liger-Kernel) is a collection of Triton kernels developed by LinkedIn, designed specifically for training large language models (LLMs). It provides Hugging Face compatible implementations of RMSNorm, RoPE, SwiGLU, CrossEntropy, FusedLinearCrossEntropy, and more to come. It can increase multi-GPU training throughput by 20% and reduce memory usage by 60%. The kernels work out of the box with flash attention, PyTorch FSDP, and Microsoft DeepSpeed.

Gain a 20% increase in throughput and a 60% reduction in memory usage when training LLaMA 3-8B models. Achieve longer context lengths and larger batch sizes. The kernels are also useful if you want to scale up your model to multi-head training or large vocabulary sizes. Unleash multi-head training (medusa) and more. See details and examples in [Liger](https://github.com/linkedin/Liger-Kernel/tree/main/examples).

First make sure to install the official Liger repository:

```bash
pip install liger-kernel
```

You should pass `use_liger_kernel=True` to apply the Liger kernel to your model, for example:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="your-model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
    use_liger_kernel=True
)
```

The kernel supports the Llama, Gemma, Mistral, and Mixtral model architectures. The most up-to-date list of supported models can be found [here](https://github.com/linkedin/Liger-Kernel). When `use_liger_kernel` is set to `True`, the corresponding layers in the original model are patched with Liger's efficient implementation, so you don't need to do anything extra other than setting the argument value.
## Optimizers

You can choose a built-in optimizer for training using:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(..., optim="adamw_torch")
```

See [`OptimizerNames`](https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py) for the full list of supported choices. We include advanced examples in the sections below.

You can also use an arbitrary PyTorch optimizer via:

```python
import torch

optimizer_cls = torch.optim.AdamW
optimizer_kwargs = {
    "lr": 4e-3,
    "betas": (0.9, 0.999),
    "weight_decay": 0.05,
}

from transformers import Trainer

trainer = Trainer(..., optimizer_cls_and_kwargs=(optimizer_cls, optimizer_kwargs))
```
### GaLore

Gradient Low-Rank Projection (GaLore) is a memory-efficient low-rank training strategy that allows full-parameter learning while being more memory-efficient than common low-rank adaptation methods such as LoRA.

First, make sure to install the official GaLore repository:

```bash
pip install galore-torch
```

Then simply add one of `["galore_adamw", "galore_adafactor", "galore_adamw_8bit"]` to `optim`, together with `optim_target_modules`, which can be a list of strings, regex patterns, or full paths matching the names of the target modules you want to adapt. Below is an end-to-end example script (make sure to `pip install trl datasets`):

```python
import torch
import datasets
import trl

from transformers import TrainingArguments, AutoConfig, AutoTokenizer, AutoModelForCausalLM

train_dataset = datasets.load_dataset('imdb', split='train')

args = TrainingArguments(
    output_dir="./test-galore",
    max_steps=100,
    per_device_train_batch_size=2,
    optim="galore_adamw",
    optim_target_modules=[r".*.attn.*", r".*.mlp.*"]
)

model_id = "google/gemma-2b"

config = AutoConfig.from_pretrained(model_id)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_config(config).to(0)

trainer = trl.SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    dataset_text_field='text',
    max_seq_length=512,
)

trainer.train()
```

To pass extra arguments supported by GaLore, pass `optim_args` accordingly, for example:

```python
import torch
import datasets
import trl

from transformers import TrainingArguments, AutoConfig, AutoTokenizer, AutoModelForCausalLM

train_dataset = datasets.load_dataset('imdb', split='train')

args = TrainingArguments(
    output_dir="./test-galore",
    max_steps=100,
    per_device_train_batch_size=2,
    optim="galore_adamw",
    optim_target_modules=[r".*.attn.*", r".*.mlp.*"],
    optim_args="rank=64, update_proj_gap=100, scale=0.10",
)

model_id = "google/gemma-2b"

config = AutoConfig.from_pretrained(model_id)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_config(config).to(0)

trainer = trl.SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    dataset_text_field='text',
    max_seq_length=512,
)

trainer.train()
```

You can read more about the method in the [original repository](https://github.com/jiaweizzhao/GaLore) or the [paper](https://arxiv.org/abs/2403.03507).

Currently, you can only train linear layers that are considered GaLore layers; they are trained with low-rank decomposition, while the remaining layers are optimized in the conventional manner.

Note that it will take a bit of time before training starts (~3 minutes for a 2B model on an NVIDIA A100), but training should run smoothly afterwards.

You can also perform layer-wise optimization by appending `layerwise` to the optimizer name, as shown below:

```python
import torch
import datasets
import trl

from transformers import TrainingArguments, AutoConfig, AutoTokenizer, AutoModelForCausalLM

train_dataset = datasets.load_dataset('imdb', split='train')

args = TrainingArguments(
    output_dir="./test-galore",
    max_steps=100,
    per_device_train_batch_size=2,
    optim="galore_adamw_layerwise",
    optim_target_modules=[r".*.attn.*", r".*.mlp.*"]
)

model_id = "google/gemma-2b"

config = AutoConfig.from_pretrained(model_id)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_config(config).to(0)

trainer = trl.SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    dataset_text_field='text',
    max_seq_length=512,
)

trainer.train()
```

Note that layer-wise optimization is somewhat experimental and does not support DDP (Distributed Data Parallel), so you can only run the training script on a single GPU. Please see [this appropriate section](https://github.com/jiaweizzhao/GaLore?tab=readme-ov-file#train-7b-model-with-a-single-gpu-with-24gb-memory) for more details. Other features such as gradient clipping, DeepSpeed, etc. may not be supported out of the box. Please [raise an issue on GitHub](https://github.com/huggingface/transformers/issues) if you encounter such a problem.
### LOMO optimizer

The LOMO optimizers were introduced in [Full Parameter Fine-Tuning for Large Language Models with Limited Resources](https://hf.co/papers/2306.09782) and [AdaLomo: Low-memory Optimization with Adaptive Learning Rate](https://hf.co/papers/2310.10195).
They both consist of an efficient full-parameter fine-tuning method. The LOMO optimizers fuse the gradient computation and the parameter update into a single step to reduce memory usage. The supported LOMO optimizers are `"lomo"` and `"adalomo"`. First install LOMO from pypi with `pip install lomo-optim`, or install it from source with `pip install git+https://github.com/OpenLMLab/LOMO.git`.

<Tip>

According to the authors, it is recommended to use `AdaLomo` without `grad_norm` to get better performance and higher throughput.

</Tip>

Below is a simple script demonstrating how to fine-tune [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the IMDB dataset in full precision:

```python
import torch
import datasets
from transformers import TrainingArguments, AutoTokenizer, AutoModelForCausalLM
import trl

train_dataset = datasets.load_dataset('imdb', split='train')

args = TrainingArguments(
    output_dir="./test-lomo",
    max_steps=100,
    per_device_train_batch_size=4,
    optim="adalomo",
    gradient_checkpointing=True,
    logging_strategy="steps",
    logging_steps=1,
    learning_rate=2e-6,
    save_strategy="no",
    run_name="lomo-imdb",
)

model_id = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True).to(0)

trainer = trl.SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    dataset_text_field='text',
    max_seq_length=1024,
)

trainer.train()
```
### GrokAdamW optimizer

The GrokAdamW optimizer is designed to enhance training performance and stability, particularly for models that benefit from `grokking` signal functions. To use `GrokAdamW`, first install the optimizer package with `pip install grokadamw`.

<Tip>

GrokAdamW is especially useful for models that require advanced optimization techniques to achieve better performance and stability.

</Tip>

Below is a simple script demonstrating how to fine-tune [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the IMDB dataset with the GrokAdamW optimizer:

```python
import torch
import datasets
from transformers import TrainingArguments, AutoTokenizer, AutoModelForCausalLM, Trainer

# load the IMDB dataset
train_dataset = datasets.load_dataset('imdb', split='train')

# define the training arguments
args = TrainingArguments(
    output_dir="./test-grokadamw",
    max_steps=1000,
    per_device_train_batch_size=4,
    optim="grokadamw",
    logging_strategy="steps",
    logging_steps=1,
    learning_rate=2e-5,
    save_strategy="no",
    run_name="grokadamw-imdb",
)

# load the model and tokenizer
model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True).to(0)

# initialize the Trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
)

# train the model
trainer.train()
```

This script shows how to fine-tune the google/gemma-2b model on the IMDB dataset with the GrokAdamW optimizer. The `TrainingArguments` are configured to use GrokAdamW, and the dataset is passed to the `Trainer` for training.
### Schedule Free Optimizer (SFO)

The Schedule Free optimizers were introduced in [The Road Less Scheduled](https://hf.co/papers/2405.15682).
Schedule-Free learning replaces the momentum of the base optimizer with a combination of averaging and interpolation, completely removing the need to anneal the learning rate with a traditional schedule.
The supported SFO optimizers are `"schedule_free_adamw"` and `"schedule_free_sgd"`. First install `schedulefree` from pypi with `pip install schedulefree`.

Below is a simple script demonstrating how to fine-tune [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the IMDB dataset in full precision:

```python
import torch
import datasets
from transformers import TrainingArguments, AutoTokenizer, AutoModelForCausalLM
import trl

train_dataset = datasets.load_dataset('imdb', split='train')

args = TrainingArguments(
    output_dir="./test-schedulefree",
    max_steps=1000,
    per_device_train_batch_size=4,
    optim="schedule_free_adamw",
    gradient_checkpointing=True,
    logging_strategy="steps",
    logging_steps=1,
    learning_rate=2e-6,
    save_strategy="no",
    run_name="sfo-imdb",
)

model_id = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True).to(0)

trainer = trl.SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    dataset_text_field='text',
    max_seq_length=1024,
)

trainer.train()
```
## Accelerate and Trainer

The [`Trainer`] class is powered by [Accelerate](https://hf.co/docs/accelerate), a library for easily training PyTorch models in distributed environments with support for integrations such as [FullyShardedDataParallel (FSDP)](https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/) and [DeepSpeed](https://www.deepspeed.ai/).

<Tip>

Learn more about FSDP sharding strategies, CPU offloading, and more with the [`Trainer`] in the [Fully Sharded Data Parallel guide](fsdp).

</Tip>

To use Accelerate with [`Trainer`], run the [`accelerate.config`](https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-config) command to set up training for your training environment. This command creates a `config_file.yaml` that is used when you launch your training script. Some example configurations you can set up are:

<hfoptions id="config">
<hfoption id="DistributedDataParallel">

```yml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0 #change rank as per the node
main_process_ip: 192.168.20.1
main_process_port: 9898
main_training_function: main
mixed_precision: fp16
num_machines: 2
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

</hfoption>
<hfoption id="FSDP">

```yml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  fsdp_forward_prefetch: true
  fsdp_offload_params: false
  fsdp_sharding_strategy: 1
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_transformer_layer_cls_to_wrap: BertLayer
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

</hfoption>
<hfoption id="DeepSpeed">

```yml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_config_file: /home/user/configs/ds_zero3_config.json
  zero3_init_flag: true
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

</hfoption>
<hfoption id="DeepSpeed with Accelerate plugin">

```yml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 0.7
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

</hfoption>
</hfoptions>
The [`accelerate_launch`](https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-launch) command is the recommended way to launch your training script on a distributed system with Accelerate and [`Trainer`], using the parameters specified in `config_file.yaml`. This file is saved to the Accelerate cache folder and is automatically loaded when you run `accelerate_launch`.

For example, to run the [run_glue.py](https://github.com/huggingface/transformers/blob/f4db565b695582891e43a5e042e5d318e28f20b8/examples/pytorch/text-classification/run_glue.py#L4) training script with the FSDP configuration:

```bash
accelerate launch \
    ./examples/pytorch/text-classification/run_glue.py \
    --model_name_or_path google-bert/bert-base-cased \
    --task_name $TASK_NAME \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 16 \
    --learning_rate 5e-5 \
    --num_train_epochs 3 \
    --output_dir /tmp/$TASK_NAME/ \
    --overwrite_output_dir
```

You could also specify the parameters from the `config_file.yaml` file directly on the command line:

```bash
accelerate launch --num_processes=2 \
    --use_fsdp \
    --mixed_precision=bf16 \
    --fsdp_auto_wrap_policy=TRANSFORMER_BASED_WRAP \
    --fsdp_transformer_layer_cls_to_wrap="BertLayer" \
    --fsdp_sharding_strategy=1 \
    --fsdp_state_dict_type=FULL_STATE_DICT \
    ./examples/pytorch/text-classification/run_glue.py \
    --model_name_or_path google-bert/bert-base-cased \
    --task_name $TASK_NAME \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 16 \
    --learning_rate 5e-5 \
    --num_train_epochs 3 \
    --output_dir /tmp/$TASK_NAME/ \
    --overwrite_output_dir
```

Check out the [Launching your Accelerate scripts](https://huggingface.co/docs/accelerate/basic_tutorials/launch) tutorial to learn more about `accelerate_launch` and custom configurations.
docs/source/ar/troubleshooting.md (new file, 171 lines)
@ -0,0 +1,171 @@
# Troubleshoot

Sometimes errors occur, but we are here to help! This guide covers some of the most common issues we've seen and how you can resolve them. However, this guide isn't meant to be a comprehensive collection of every 🤗 Transformers issue. For more help with troubleshooting your issue, try:

<Youtube id="S2EEG3JIt2A"/>

1. Asking for help on the [forums](https://discuss.huggingface.co/). There are specific categories you can post your question to, like [Beginners](https://discuss.huggingface.co/c/beginners/5) or [🤗 Transformers](https://discuss.huggingface.co/c/transformers/9). Make sure you write a good, descriptive forum post with some reproducible code to maximize the likelihood that your problem is solved!

<Youtube id="_PAli-V4wj0"/>

2. Create an [Issue](https://github.com/huggingface/transformers/issues/new/choose) on the 🤗 Transformers repository if it is a bug related to the library. Try to include as much information describing the bug as possible to help us better figure out what's wrong and how to fix it.

3. Check the [Migration](migration) guide if you use an older version of 🤗 Transformers, since some important changes have been introduced between versions.

For more details about troubleshooting and getting help, take a look at [Chapter 8](https://huggingface.co/course/chapter8/1?fw=pt) of the Hugging Face course.

## Firewalled environments

Some GPU instances on cloud and intranet setups are firewalled against external connections, resulting in a connection error. When your script attempts to download model weights or datasets, the download will hang and then time out with the following message:

```
ValueError: Connection error, and we cannot find the requested files in the cached path.
Please try again or make sure your Internet connection is on.
```

In this case, you should try to run 🤗 Transformers in [offline mode](installation#offline-mode) to avoid the connection error.
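For example, one way to stay offline is to load everything from files that were already downloaded to the local cache; the checkpoint name below is only an example:

```python
import os

# tell the library not to attempt any network calls
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModel

# only look in the local cache instead of contacting the Hub
model = AutoModel.from_pretrained("google-bert/bert-base-uncased", local_files_only=True)
```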
## CUDA out of memory

Training large models with millions of parameters can be challenging without the appropriate hardware. A common error you may encounter when the GPU runs out of memory is:

```
CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 11.17 GiB total capacity; 9.70 GiB already allocated; 179.81 MiB free; 9.85 GiB reserved in total by PyTorch)
```

Here are some potential solutions you can try to reduce memory use (see the short example after this list):

- Reduce the [`per_device_train_batch_size`](main_classes/trainer#transformers.TrainingArguments.per_device_train_batch_size) value in [`TrainingArguments`].
- Try using [`gradient_accumulation_steps`](main_classes/trainer#transformers.TrainingArguments.gradient_accumulation_steps) in [`TrainingArguments`] to effectively increase the overall batch size.
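As a sketch, both options are plain [`TrainingArguments`] parameters, so trading a smaller per-device batch for more accumulation steps keeps the effective batch size while lowering peak memory:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="your-model",
    per_device_train_batch_size=4,   # smaller per-step batch to fit in memory
    gradient_accumulation_steps=4,   # effective batch size is still 4 * 4 = 16
)
```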
<Tip>

Refer to the [Performance](performance) guide for more details about memory-saving techniques.

</Tip>

## Unable to load a saved TensorFlow model

TensorFlow's [model.save](https://www.tensorflow.org/tutorials/keras/save_and_load#save_the_entire_model) method saves the entire model - architecture, weights, training configuration - in a single file. However, when you load the model file again, you may run into an error because 🤗 Transformers may not load all the TensorFlow-related objects in the model file. To avoid issues with saving and loading TensorFlow models, we recommend you:

- Save the model weights as an `h5` file with [`model.save_weights`](https://www.tensorflow.org/tutorials/keras/save_and_load#save_the_entire_model) and then reload the model with [`~TFPreTrainedModel.from_pretrained`]:

```python
>>> from transformers import TFPreTrainedModel
>>> from tensorflow import keras

>>> model.save_weights("some_folder/tf_model.h5")
>>> model = TFPreTrainedModel.from_pretrained("some_folder")
```

- Save the model with [`~TFPretrainedModel.save_pretrained`] and load it again with [`~TFPreTrainedModel.from_pretrained`]:

```python
>>> from transformers import TFPreTrainedModel

>>> model.save_pretrained("path_to/model")
>>> model = TFPreTrainedModel.from_pretrained("path_to/model")
```
## ImportError

Another common error you may encounter, especially if it is a newly released model, is `ImportError`:

```
ImportError: cannot import name 'ImageGPTImageProcessor' from 'transformers' (unknown location)
```

For these error types, check that you have the latest version of the Hugging Face Transformers library installed to access the most recent models:

```bash
pip install transformers --upgrade
```

## CUDA error: device-side assert triggered

Sometimes you may run into a generic CUDA error about an error in the device code.

```
RuntimeError: CUDA error: device-side assert triggered
```

You should try running the code on a CPU first to get a more descriptive error message. Add the following environment variable at the beginning of your code to switch to a CPU:

```python
>>> import os

>>> os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

The other option is to get a better traceback from the GPU. Add the following environment variable at the beginning of your code to get the traceback to point to the source of the error:

```python
>>> import os

>>> os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```
## Incorrect output when padding tokens aren't masked

In some cases, the output `hidden_state` may be incorrect if the `input_ids` include padding tokens. To demonstrate, load a model and tokenizer. You can access a model's `pad_token_id` to see its value. The `pad_token_id` may be `None` for some models, but you can always manually set it.

```python
>>> from transformers import AutoModelForSequenceClassification
>>> import torch

>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-uncased")
>>> model.config.pad_token_id
0
```

The following example shows the output without masking the padding tokens:

```python
>>> input_ids = torch.tensor([[7592, 2057, 2097, 2393, 9611, 2115], [7592, 0, 0, 0, 0, 0]])
>>> output = model(input_ids)
>>> print(output.logits)
tensor([[ 0.0082, -0.2307],
        [ 0.1317, -0.1683]], grad_fn=<AddmmBackward0>)
```

Here is the actual output of the second sequence:

```python
>>> input_ids = torch.tensor([[7592]])
>>> output = model(input_ids)
>>> print(output.logits)
tensor([[-0.1008, -0.4061]], grad_fn=<AddmmBackward0>)
```

Most of the time, you should provide an `attention_mask` to your model to ignore the padding tokens and avoid this silent error. Now the output of the second sequence matches its actual output:

<Tip>

By default, the tokenizer creates an `attention_mask` for you based on your specific tokenizer's defaults (see the short example at the end of this section).

</Tip>

```python
>>> attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0]])
>>> output = model(input_ids, attention_mask=attention_mask)
>>> print(output.logits)
tensor([[ 0.0082, -0.2307],
        [-0.1008, -0.4061]], grad_fn=<AddmmBackward0>)
```

🤗 Transformers doesn't automatically create an `attention_mask` to mask a padding token if it is provided because:

- Some models don't have a padding token.
- For some use cases, users want the model to attend to a padding token.
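In practice, the easiest way to get a matching `attention_mask` is to let the tokenizer build it while padding; a minimal sketch (the checkpoint and input strings are only examples):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

# padding=True pads the shorter sequence and returns a matching attention_mask
inputs = tokenizer(["hello we will help translate", "hello"], padding=True, return_tensors="pt")
print(inputs["attention_mask"])
```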
## ValueError: Unrecognized configuration class XYZ for this kind of AutoModel

In general, we recommend using the [`AutoModel`] class to load pretrained instances of models. This class can automatically infer and load the correct architecture from a given checkpoint based on its configuration. If you see this `ValueError` when loading a model from a checkpoint, it means the Auto class couldn't find a mapping from the configuration in the given checkpoint to the kind of model you're trying to load. Most commonly, this happens when a checkpoint doesn't support a given task.

For instance, you'll see this error in the following example because there is no GPT2 model for question answering:

```py
>>> from transformers import AutoProcessor, AutoModelForQuestionAnswering

>>> processor = AutoProcessor.from_pretrained("openai-community/gpt2-medium")
>>> model = AutoModelForQuestionAnswering.from_pretrained("openai-community/gpt2-medium")
ValueError: Unrecognized configuration class <class 'transformers.models.gpt2.configuration_gpt2.GPT2Config'> for this kind of AutoModel: AutoModelForQuestionAnswering.
Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, ...
```
@ -112,7 +112,7 @@ Before you write any code, we strongly recommend that you search the existing

You need basic `git` skills to contribute to 🤗 Transformers. While `git` is not the easiest tool to use, it has a very good manual. Type `git --help` in a shell and enjoy! If you prefer books, [Pro Git](https://git-scm.com/book/en/v2) is a good place to start.

You need **[Python 3.8](https://github.com/huggingface/transformers/blob/main/setup.py#L426)** or higher to contribute to 🤗 Transformers. Follow the steps below to start contributing:
You need **[Python 3.9](https://github.com/huggingface/transformers/blob/main/setup.py#L426)** or higher to contribute to 🤗 Transformers. Follow the steps below to start contributing:

1. Fork the [repository](https://github.com/huggingface/transformers) by clicking the **[Fork](https://github.com/huggingface/transformers/fork)** button on the repository's page. This creates a copy of the code on your GitHub account.

@ -43,7 +43,7 @@ Consequently, you can load a specific model version with the `revision` parameter

```py
>>> model = AutoModel.from_pretrained(
...     "julien-c/EsperBERTo-small", revision="v2.0.1"  # tag name, or branch name, or commit hash
...     "julien-c/EsperBERTo-small", revision="4c77982"  # tag name, or branch name, or commit hash
... )
```
@ -218,6 +218,8 @@
      title: CPU inference
    - local: perf_infer_gpu_one
      title: GPU inference
    - local: perf_infer_gpu_multi
      title: Multi-GPU inference
    title: Optimizing inference
  - local: big_models
    title: Instantiate a big model
@ -414,6 +416,8 @@
      title: Gemma
    - local: model_doc/gemma2
      title: Gemma2
    - local: model_doc/glm
      title: GLM
    - local: model_doc/openai-gpt
      title: GPT
    - local: model_doc/gpt_neo
@ -512,6 +516,8 @@
      title: Nyströmformer
    - local: model_doc/olmo
      title: OLMo
    - local: model_doc/olmo_1124
      title: OLMo November 2024
    - local: model_doc/olmoe
      title: OLMoE
    - local: model_doc/open-llama
@ -604,6 +610,8 @@
      title: XLNet
    - local: model_doc/yoso
      title: YOSO
    - local: model_doc/zamba
      title: Zamba
    title: Text models
  - isExpanded: false
    sections:
@ -713,8 +721,6 @@
      title: ViTMSN
    - local: model_doc/yolos
      title: YOLOS
    - local: model_doc/zamba
      title: Zamba
    - local: model_doc/zoedepth
      title: ZoeDepth
    title: Vision models
@ -740,6 +746,8 @@
      title: Mimi
    - local: model_doc/mms
      title: MMS
    - local: model_doc/moshi
      title: Moshi
    - local: model_doc/musicgen
      title: MusicGen
    - local: model_doc/musicgen_melody
@ -969,4 +977,4 @@
    - local: internal/time_series_utils
      title: Utilities for Time Series
    title: Internal Helpers
  title: API
  title: API
@ -332,7 +332,7 @@ This code can quickly be converted into a tool, just by wrapping it in a functio
|
||||
from transformers import tool
|
||||
|
||||
@tool
|
||||
def model_download_counter(task: str) -> str:
|
||||
def model_download_tool(task: str) -> str:
|
||||
"""
|
||||
This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub.
|
||||
It returns the name of the checkpoint.
|
||||
@ -345,7 +345,7 @@ def model_download_counter(task: str) -> str:
|
||||
```
|
||||
|
||||
The function needs:
|
||||
- A clear name. The name usually describes what the tool does. Since the code returns the model with the most downloads for a task, let's put `model_download_counter`.
|
||||
- A clear name. The name usually describes what the tool does. Since the code returns the model with the most downloads for a task, let's put `model_download_tool`.
|
||||
- Type hints on both inputs and output
|
||||
- A description, that includes an 'Args:' part where each argument is described (without a type indication this time, it will be pulled from the type hint).
|
||||
All these will be automatically baked into the agent's system prompt upon initialization: so strive to make them as clear as possible!
|
||||
@ -367,7 +367,7 @@ You get the following:
|
||||
======== New task ========
|
||||
Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?
|
||||
==== Agent is executing the code below:
|
||||
most_downloaded_model = model_download_counter(task="text-to-video")
|
||||
most_downloaded_model = model_download_tool(task="text-to-video")
|
||||
print(f"The most downloaded model for the 'text-to-video' task is {most_downloaded_model}.")
|
||||
====
|
||||
```
|
||||
|
||||
@ -66,10 +66,10 @@ manager_agent.run("Who is the CEO of Hugging Face?")
|
||||
|
||||
Let's take again the tool example from main documentation, for which we had implemented a `tool` decorator.
|
||||
|
||||
If you need to add variation, like custom attributes for your too, you can build your tool following the fine-grained method: building a class that inherits from the [`Tool`] superclass.
|
||||
If you need to add variation, like custom attributes for your tool, you can build your tool following the fine-grained method: building a class that inherits from the [`Tool`] superclass.
|
||||
|
||||
The custom tool needs:
|
||||
- An attribute `name`, which corresponds to the name of the tool itself. The name usually describes what the tool does. Since the code returns the model with the most downloads for a task, let's name is `model_download_counter`.
|
||||
- An attribute `name`, which corresponds to the name of the tool itself. The name usually describes what the tool does. Since the code returns the model with the most downloads for a task, let's name it `model_download_counter`.
|
||||
- An attribute `description` is used to populate the agent's system prompt.
|
||||
- An `inputs` attribute, which is a dictionary with keys `"type"` and `"description"`. It contains information that helps the Python interpreter make educated choices about the input.
|
||||
- An `output_type` attribute, which specifies the output type.
|
||||
@ -123,6 +123,54 @@ from transformers import load_tool, CodeAgent
|
||||
model_download_tool = load_tool("m-ric/hf-model-downloads")
|
||||
```
|
||||
|
||||
### Import a Space as a tool ЁЯЪА
|
||||
|
||||
You can directly import a Space from the Hub as a tool using the [`Tool.from_space`] method!
|
||||
|
||||
You only need to provide the id of the Space on the Hub, its name, and a description that will help you agent understand what the tool does. Under the hood, this will use [`gradio-client`](https://pypi.org/project/gradio-client/) library to call the Space.
|
||||
|
||||
For instance, let's import the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) Space from the Hub and use it to generate an image.
|
||||
|
||||
```
|
||||
from transformers import Tool
|
||||
|
||||
image_generation_tool = Tool.from_space(
|
||||
"black-forest-labs/FLUX.1-dev",
|
||||
name="image_generator",
|
||||
description="Generate an image from a prompt")
|
||||
|
||||
image_generation_tool("A sunny beach")
|
||||
```
|
||||
And voil├а, here's your image! ЁЯПЦя╕П
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/sunny_beach.webp">
|
||||
|
||||
Then you can use this tool just like any other tool. For example, let's improve the prompt `a rabbit wearing a space suit` and generate an image of it.
|
||||
|
||||
```python
|
||||
from transformers import ReactCodeAgent
|
||||
|
||||
agent = ReactCodeAgent(tools=[image_generation_tool])
|
||||
|
||||
agent.run(
|
||||
"Improve this prompt, then generate an image of it.", prompt='A rabbit wearing a space suit'
|
||||
)
|
||||
```
|
||||
|
||||
```text
|
||||
=== Agent thoughts:
|
||||
improved_prompt could be "A bright blue space suit wearing rabbit, on the surface of the moon, under a bright orange sunset, with the Earth visible in the background"
|
||||
|
||||
Now that I have improved the prompt, I can use the image generator tool to generate an image based on this prompt.
|
||||
>>> Agent is executing the code below:
|
||||
image = image_generator(prompt="A bright blue space suit wearing rabbit, on the surface of the moon, under a bright orange sunset, with the Earth visible in the background")
|
||||
final_answer(image)
|
||||
```
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit_spacesuit_flux.webp">
|
||||
|
||||
How cool is this? 🤩
|
||||
|
||||
### Use gradio-tools
|
||||
|
||||
[gradio-tools](https://github.com/freddyaboulton/gradio-tools) is a powerful library that allows using Hugging
|
||||
@ -140,36 +188,6 @@ gradio_prompt_generator_tool = StableDiffusionPromptGeneratorTool()
|
||||
prompt_generator_tool = Tool.from_gradio(gradio_prompt_generator_tool)
|
||||
```
|
||||
|
||||
Now you can use it just like any other tool. For example, let's improve the prompt `a rabbit wearing a space suit`.
|
||||
|
||||
```python
|
||||
image_generation_tool = load_tool('huggingface-tools/text-to-image')
|
||||
agent = CodeAgent(tools=[prompt_generator_tool, image_generation_tool], llm_engine=llm_engine)
|
||||
|
||||
agent.run(
|
||||
"Improve this prompt, then generate an image of it.", prompt='A rabbit wearing a space suit'
|
||||
)
|
||||
```
|
||||
|
||||
The model adequately leverages the tool:
|
||||
```text
|
||||
======== New task ========
|
||||
Improve this prompt, then generate an image of it.
|
||||
You have been provided with these initial arguments: {'prompt': 'A rabbit wearing a space suit'}.
|
||||
==== Agent is executing the code below:
|
||||
improved_prompt = StableDiffusionPromptGenerator(query=prompt)
|
||||
while improved_prompt == "QUEUE_FULL":
|
||||
improved_prompt = StableDiffusionPromptGenerator(query=prompt)
|
||||
print(f"The improved prompt is {improved_prompt}.")
|
||||
image = image_generator(prompt=improved_prompt)
|
||||
====
|
||||
```
|
||||
|
||||
Before finally generating the image:
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png">
|
||||
|
||||
|
||||
> [!WARNING]
|
||||
> gradio-tools requires *textual* inputs and outputs even when working with different modalities like image and audio objects. Image and audio inputs and outputs are currently incompatible.
|
||||
|
||||
@ -179,7 +197,7 @@ We love Langchain and think it has a very compelling suite of tools.
|
||||
To import a tool from LangChain, use the `from_langchain()` method.
|
||||
|
||||
Here is how you can use it to recreate the intro's search result using a LangChain web search tool.
|
||||
|
||||
This tool will need `pip install google-search-results` to work properly.
|
||||
```python
|
||||
from langchain.agents import load_tools
|
||||
from transformers import Tool, ReactCodeAgent
|
||||
@ -188,7 +206,7 @@ search_tool = Tool.from_langchain(load_tools(["serpapi"])[0])
|
||||
|
||||
agent = ReactCodeAgent(tools=[search_tool])
|
||||
|
||||
agent.run("How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?")
|
||||
agent.run("How many more blocks (also denoted as layers) are in BERT base encoder compared to the encoder from the architecture proposed in Attention is All You Need?")
|
||||
```
|
||||
|
||||
## Display your agent run in a cool Gradio interface
|
||||
@ -240,4 +258,4 @@ with gr.Blocks() as demo:
|
||||
|
||||
if __name__ == "__main__":
|
||||
demo.launch()
|
||||
```
|
||||
```
|
||||
|
||||
@ -943,6 +943,35 @@ all implementations of Jinja:
|
||||
- Directly rendering a dict or list may give different results in other implementations (for example, string entries
|
||||
might change from single-quoted to double-quoted). Adding the `tojson` filter can help to ensure consistency here.
|
||||
|
||||
### Writing generation prompts
|
||||
|
||||
We mentioned above that `add_generation_prompt` is a special variable that will be accessible inside your template,
|
||||
and is controlled by the user setting the `add_generation_prompt` flag. If your model expects a header for
|
||||
assistant messages, then your template must support adding the header when `add_generation_prompt` is set.
|
||||
|
||||
Here is an example of a template that formats messages ChatML-style, with generation prompt support:
|
||||
|
||||
```text
|
||||
{{- bos_token }}
|
||||
{%- for message in messages %}
|
||||
{{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
|
||||
{%- endfor %}
|
||||
{%- if add_generation_prompt %}
|
||||
{{- '<|im_start|>assistant\n' }}
|
||||
{%- endif %}
|
||||
```
|
||||
|
||||
The exact content of the assistant header will depend on your specific model, but it should always be **the string
|
||||
that represents the start of an assistant message**, so that if the user applies your template with
|
||||
`add_generation_prompt=True` and then generates text, the model will write an assistant response. Also note that some
|
||||
models do not need a generation prompt, because assistant messages always begin immediately after user messages.
|
||||
This is particularly common for LLaMA and Mistral models, where assistant messages begin immediately after the `[/INST]`
|
||||
token that ends user messages. In these cases, the template can ignore the `add_generation_prompt` flag.
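To see the effect of the flag, here is a small sketch; the checkpoint name is a placeholder for any model whose template follows the ChatML example above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("my-org/my-chatml-model")  # placeholder checkpoint
messages = [{"role": "user", "content": "Hi there!"}]

without_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
with_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# With the ChatML template above, the two strings are identical except that
# `with_prompt` ends with the assistant header "<|im_start|>assistant\n".
print(with_prompt[len(without_prompt):])
```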
|
||||
|
||||
Generation prompts are important! If your model requires a generation prompt but it is not set in the template, then
|
||||
model generations will likely be severely degraded, or the model may display unusual behaviour like continuing
|
||||
the final user message!
|
||||
|
||||
### Writing and debugging larger templates
|
||||
|
||||
When this feature was introduced, most templates were quite small, the Jinja equivalent of a "one-liner" script.
|
||||
|
||||
@ -403,7 +403,7 @@ culture, and they allow us to design the'
|
||||
|
||||
This guide illustrates the main parameters that enable various decoding strategies. More advanced parameters exist for the
|
||||
[`generate`] method, which gives you even further control over its behavior.
|
||||
For the complete list of the available parameters, refer to the [API documentation](./main_classes/text_generation.md).
|
||||
For the complete list of the available parameters, refer to the [API documentation](./main_classes/text_generation).
|
||||
|
||||
### Speculative Decoding
|
||||
|
||||
@ -416,16 +416,6 @@ Assisted decoding assumes the main and assistant models have the same tokenizer,
|
||||
Currently, only greedy search and sampling are supported with assisted decoding, and assisted decoding doesn't support batched inputs.
|
||||
To learn more about assisted decoding, check [this blog post](https://huggingface.co/blog/assisted-generation).
|
||||
|
||||
#### Universal Assisted Decoding
|
||||
|
||||
Universal Assisted Decoding (UAD) adds support for main and assistant models with different tokenizers.
|
||||
To use it, simply pass the tokenizers using the `tokenizer` and `assistant_tokenizer` arguments (see below).
|
||||
Internally, the main model input tokens are re-encoded into assistant model tokens, then candidate tokens are generated in the assistant encoding, which are
|
||||
in turn re-encoded into main model candidate tokens. Validation then proceeds as explained above.
|
||||
The re-encoding steps involve decoding token ids into text and then encoding the text using a different tokenizer.
|
||||
Since re-encoding the tokens may result in tokenization discrepancies, UAD finds the longest common subsequence between the source and target encodings,
|
||||
to ensure the new tokens include the correct prompt suffix.
|
||||
|
||||
To enable assisted decoding, set the `assistant_model` argument with a model.
|
||||
|
||||
```python
|
||||
@ -445,26 +435,6 @@ To enable assisted decoding, set the `assistant_model` argument with a model.
|
||||
['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a']
|
||||
```
|
||||
|
||||
If the main and assistant models have different tokenizers, use Universal Assisted Decoding.
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
>>> prompt = "Alice and Bob"
|
||||
>>> checkpoint = "google/gemma-2-9b"
|
||||
>>> assistant_checkpoint = "double7/vicuna-68m"
|
||||
|
||||
>>> assistant_tokenizer = AutoTokenizer.from_pretrained(assistant_checkpoint)
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|
||||
>>> inputs = tokenizer(prompt, return_tensors="pt")
|
||||
|
||||
>>> model = AutoModelForCausalLM.from_pretrained(checkpoint)
|
||||
>>> assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)
|
||||
>>> outputs = model.generate(**inputs, assistant_model=assistant_model, tokenizer=tokenizer, assistant_tokenizer=assistant_tokenizer)
|
||||
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
|
||||
['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a']
|
||||
```
|
||||
|
||||
When using assisted decoding with sampling methods, you can use the `temperature` argument to control the randomness,
|
||||
just like in multinomial sampling. However, in assisted decoding, reducing the temperature may help improve the latency.
|
||||
|
||||
@ -486,9 +456,63 @@ just like in multinomial sampling. However, in assisted decoding, reducing the t
|
||||
['Alice and Bob, a couple of friends of mine, who are both in the same office as']
|
||||
```
|
||||
|
||||
#### Universal Assisted Decoding
|
||||
|
||||
Universal Assisted Decoding (UAD) adds support for main and assistant models with different tokenizers.
|
||||
To use it, simply pass the tokenizers using the `tokenizer` and `assistant_tokenizer` arguments (see below).
|
||||
Internally, the main model input tokens are re-encoded into assistant model tokens, then candidate tokens are generated in the assistant encoding, which are
|
||||
in turn re-encoded into main model candidate tokens. Validation then proceeds as explained above.
|
||||
The re-encoding steps involve decoding token ids into text and then encoding the text using a different tokenizer.
|
||||
Since re-encoding the tokens may result in tokenization discrepancies, UAD finds the longest common subsequence between the source and target encodings,
|
||||
to ensure the new tokens include the correct prompt suffix.
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
>>> prompt = "Alice and Bob"
|
||||
>>> checkpoint = "google/gemma-2-9b"
|
||||
>>> assistant_checkpoint = "double7/vicuna-68m"
|
||||
|
||||
>>> assistant_tokenizer = AutoTokenizer.from_pretrained(assistant_checkpoint)
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|
||||
>>> inputs = tokenizer(prompt, return_tensors="pt")
|
||||
|
||||
>>> model = AutoModelForCausalLM.from_pretrained(checkpoint)
|
||||
>>> assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)
|
||||
>>> outputs = model.generate(**inputs, assistant_model=assistant_model, tokenizer=tokenizer, assistant_tokenizer=assistant_tokenizer)
|
||||
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
|
||||
['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a']
|
||||
```
|
||||
|
||||
#### Prompt Lookup
|
||||
|
||||
Alternatively, you can also set the `prompt_lookup_num_tokens` to trigger n-gram based assisted decoding, as opposed
|
||||
to model based assisted decoding. You can read more about it [here](https://twitter.com/joao_gante/status/1747322413006643259).
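Below is a minimal sketch of triggering prompt lookup decoding; it assumes the same `model`, `tokenizer`, and `inputs` as in the assisted decoding examples above.

```python
>>> # n-gram based assisted decoding: candidate tokens are looked up in the prompt itself
>>> outputs = model.generate(**inputs, prompt_lookup_num_tokens=3)
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
```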
|
||||
|
||||
#### Self-Speculative Decoding
|
||||
|
||||
An LLM can be trained to also use its language modeling head with earlier hidden states as input, effectively
|
||||
skipping layers to yield a lower-quality output -- a technique called early exiting.
|
||||
We use the lower-quality early exit output as an assistant output, and apply self-speculation to fix the output using the remaining layers. The final generation of that self-speculative solution is the same (or has the same distribution) as the original model's generation.
|
||||
If the model you're using was trained to do early exit, you can pass
|
||||
`assistant_early_exit` (integer). In this case, the assistant model will be the same model but exiting early, hence the
|
||||
"self-speculative" name. Because the assistant model is a portion of the target model, caches and weights can be shared, which results in lower memory requirements. As in other assisted generation methods, the final generated result has the same quality as if no assistant had been used.
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
>>> prompt = "Alice and Bob"
|
||||
>>> checkpoint = "facebook/layerskip-llama3.2-1B"
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|
||||
>>> inputs = tokenizer(prompt, return_tensors="pt")
|
||||
|
||||
>>> model = AutoModelForCausalLM.from_pretrained(checkpoint)
|
||||
>>> outputs = model.generate(**inputs, assistant_early_exit=4, do_sample=False, max_new_tokens=20)
|
||||
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
|
||||
['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a']
|
||||
```
|
||||
|
||||
### DoLa Decoding
|
||||
|
||||
**D**ecoding by C**o**ntrasting **La**yers (DoLa) is a contrastive decoding strategy to improve the factuality and reduce the
|
||||
@ -508,10 +532,11 @@ See the following examples for DoLa decoding with the 32-layer LLaMA-7B model.
|
||||
```python
|
||||
>>> from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
|
||||
>>> import torch
|
||||
>>> from accelerate.test_utils.testing import get_backend
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
|
||||
>>> model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b", torch_dtype=torch.float16)
|
||||
>>> device = 'cuda' if torch.cuda.is_available() else 'cpu'
|
||||
>>> device, _, _ = get_backend() # automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
|
||||
>>> model.to(device)
|
||||
>>> set_seed(42)
|
||||
|
||||
|
||||
@ -85,6 +85,9 @@ For now the supported model architectures are the architectures that have been v
|
||||
- StableLM
|
||||
- GPT2
|
||||
- Starcoder2
|
||||
- T5
|
||||
- Mamba
|
||||
- Nemotron
|
||||
|
||||
## Example usage
|
||||
|
||||
|
||||
@ -19,7 +19,7 @@ State-of-the-art Machine Learning for [PyTorch](https://pytorch.org/), [TensorFl
|
||||
|
||||
🤗 Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs, carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities, such as:
|
||||
|
||||
📝 **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.<br>
|
||||
📝 **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, code generation, summarization, translation, multiple choice, and text generation.<br>
|
||||
🖼️ **Computer Vision**: image classification, object detection, and segmentation.<br>
|
||||
🗣️ **Audio**: automatic speech recognition and audio classification.<br>
|
||||
🐙 **Multimodal**: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
|
||||
@ -150,6 +150,7 @@ Flax), PyTorch, and/or TensorFlow.
|
||||
| [Gemma](model_doc/gemma) | ✅ | ❌ | ✅ |
|
||||
| [Gemma2](model_doc/gemma2) | ✅ | ❌ | ❌ |
|
||||
| [GIT](model_doc/git) | ✅ | ❌ | ❌ |
|
||||
| [GLM](model_doc/glm) | ✅ | ❌ | ❌ |
|
||||
| [GLPN](model_doc/glpn) | ✅ | ❌ | ❌ |
|
||||
| [GPT Neo](model_doc/gpt_neo) | ✅ | ❌ | ✅ |
|
||||
| [GPT NeoX](model_doc/gpt_neox) | ✅ | ❌ | ❌ |
|
||||
@ -223,6 +224,7 @@ Flax), PyTorch, and/or TensorFlow.
|
||||
| [MobileNetV2](model_doc/mobilenet_v2) | ✅ | ❌ | ❌ |
|
||||
| [MobileViT](model_doc/mobilevit) | ✅ | ✅ | ❌ |
|
||||
| [MobileViTV2](model_doc/mobilevitv2) | ✅ | ❌ | ❌ |
|
||||
| [Moshi](model_doc/moshi) | ✅ | ❌ | ❌ |
|
||||
| [MPNet](model_doc/mpnet) | ✅ | ✅ | ❌ |
|
||||
| [MPT](model_doc/mpt) | ✅ | ❌ | ❌ |
|
||||
| [MRA](model_doc/mra) | ✅ | ❌ | ❌ |
|
||||
@ -238,6 +240,7 @@ Flax), PyTorch, and/or TensorFlow.
|
||||
| [Nougat](model_doc/nougat) | ✅ | ✅ | ✅ |
|
||||
| [Nyströmformer](model_doc/nystromformer) | ✅ | ❌ | ❌ |
|
||||
| [OLMo](model_doc/olmo) | ✅ | ❌ | ❌ |
|
||||
| [OLMo November 2024](model_doc/olmo_1124) | ✅ | ❌ | ❌ |
|
||||
| [OLMoE](model_doc/olmoe) | ✅ | ❌ | ❌ |
|
||||
| [OmDet-Turbo](model_doc/omdet-turbo) | ✅ | ❌ | ❌ |
|
||||
| [OneFormer](model_doc/oneformer) | ✅ | ❌ | ❌ |
|
||||
|
||||
@ -185,6 +185,9 @@ generation.
|
||||
[[autodoc]] SuppressTokensLogitsProcessor
|
||||
- __call__
|
||||
|
||||
[[autodoc]] SynthIDTextWatermarkLogitsProcessor
|
||||
- __call__
|
||||
|
||||
[[autodoc]] TemperatureLogitsWarper
|
||||
- __call__
|
||||
|
||||
@ -418,5 +421,18 @@ A [`Constraint`] can be used to force the generation to include specific tokens
|
||||
|
||||
## Watermark Utils
|
||||
|
||||
[[autodoc]] WatermarkingConfig
|
||||
- __call__
|
||||
|
||||
[[autodoc]] WatermarkDetector
|
||||
- __call__
|
||||
|
||||
[[autodoc]] BayesianDetectorConfig
|
||||
|
||||
[[autodoc]] BayesianDetectorModel
|
||||
- forward
|
||||
|
||||
[[autodoc]] SynthIDTextWatermarkingConfig
|
||||
|
||||
[[autodoc]] SynthIDTextWatermarkDetector
|
||||
- __call__
|
||||
|
||||
@ -348,6 +348,99 @@ model = AutoModelForCausalLM.from_pretrained(
|
||||
)
|
||||
```
|
||||
|
||||
### Fine-Tuning with torch.compile and Padding-Free Data Collation
|
||||
|
||||
In addition to optimizing inference, you can also enhance the training efficiency of large language models by leveraging torch.compile during fine-tuning and using a padding-free data collator. This approach can significantly speed up training and reduce computational overhead.
|
||||
|
||||
Here's how you can fine-tune a Llama model using SFTTrainer from the TRL library, with torch_compile enabled and a padding-free data collator:
|
||||
|
||||
```python
|
||||
#################### IMPORTS ###################
|
||||
|
||||
import math
|
||||
import datasets
|
||||
import dataclasses
|
||||
from transformers import (
|
||||
AutoModelForCausalLM,
|
||||
AutoTokenizer,
|
||||
TrainingArguments
|
||||
)
|
||||
from trl import SFTConfig, SFTTrainer, DataCollatorForCompletionOnlyLM
|
||||
|
||||
#################### MODEL LOADING WITH FLASH ATTENTION ###################
|
||||
|
||||
model_name = "meta-llama/Llama-3.2-1B"
|
||||
model = AutoModelForCausalLM.from_pretrained(
|
||||
model_name,
|
||||
attn_implementation="flash_attention_2" # Enables FlashAttention-2
|
||||
)
|
||||
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
|
||||
|
||||
#################### DATA PREPROCESSING (PADDING-FREE) ###################
|
||||
|
||||
response_template = "\n### Label:"
|
||||
response_template_ids = tokenizer.encode(
|
||||
response_template, add_special_tokens=False
|
||||
)[2:] # Exclude special tokens
|
||||
|
||||
data_collator = DataCollatorForCompletionOnlyLM(
|
||||
response_template_ids=response_template_ids,
|
||||
tokenizer=tokenizer,
|
||||
ignore_index=-100,
|
||||
padding_free=True # Enables padding-free collation
|
||||
)
|
||||
|
||||
def format_dataset(example):
|
||||
return {
|
||||
"output": example["output"] + tokenizer.eos_token
|
||||
}
|
||||
|
||||
data_files = {"train": "path/to/dataset"} # Replace with your dataset path
|
||||
json_dataset = datasets.load_dataset("json", data_files=data_files)
|
||||
formatted_train_dataset = json_dataset["train"].map(format_dataset)
|
||||
|
||||
################# TRAINING CONFIGURATION ############################
|
||||
|
||||
train_args = TrainingArguments(
|
||||
num_train_epochs=5,
|
||||
per_device_train_batch_size=4,
|
||||
per_device_eval_batch_size=4,
|
||||
gradient_accumulation_steps=4,
|
||||
learning_rate=1e-5,
|
||||
weight_decay=0.0,
|
||||
warmup_ratio=0.03,
|
||||
lr_scheduler_type="cosine",
|
||||
logging_steps=1,
|
||||
include_tokens_per_second=True,
|
||||
save_strategy="epoch",
|
||||
output_dir="output",
|
||||
torch_compile=True, # Enables torch.compile
|
||||
torch_compile_backend="inductor",
|
||||
torch_compile_mode="default"
|
||||
)
|
||||
|
||||
# Convert TrainingArguments to SFTConfig
|
||||
transformer_train_arg_fields = [x.name for x in dataclasses.fields(SFTConfig)]
|
||||
transformer_kwargs = {
|
||||
k: v
|
||||
for k, v in train_args.to_dict().items()
|
||||
if k in transformer_train_arg_fields
|
||||
}
|
||||
training_args = SFTConfig(**transformer_kwargs)
|
||||
|
||||
####################### FINE-TUNING #####################
|
||||
|
||||
trainer = SFTTrainer(
|
||||
model=model,
|
||||
tokenizer=tokenizer,
|
||||
train_dataset=formatted_train_dataset,
|
||||
data_collator=data_collator,
|
||||
dataset_text_field="output",
|
||||
args=training_args,
|
||||
)
|
||||
trainer.train()
|
||||
```
|
||||
|
||||
### PyTorch scaled dot product attention
|
||||
|
||||
Scaled dot product attention (SDPA) is automatically enabled in PyTorch 2.0 and it supports FlashAttention, xFormers, and PyTorch's C++ implementation. SDPA chooses the most performant attention algorithm if you're using a CUDA backend. For other backends, SDPA defaults to the PyTorch C++ implementation.
|
||||
|
||||
@ -18,6 +18,49 @@ rendered properly in your Markdown viewer.
|
||||
|
||||
An image processor is in charge of preparing input features for vision models and post processing their outputs. This includes transformations such as resizing, normalization, and conversion to PyTorch, TensorFlow, Flax and Numpy tensors. It may also include model specific post-processing such as converting logits to segmentation masks.
|
||||
|
||||
Fast image processors are available for a few models and more will be added in the future. They are based on the [torchvision](https://pytorch.org/vision/stable/index.html) library and provide a significant speed-up, especially when processing on GPU.
|
||||
They have the same API as the base image processors and can be used as drop-in replacements.
|
||||
To use a fast image processor, you need to install the `torchvision` library, and set the `use_fast` argument to `True` when instantiating the image processor:
|
||||
|
||||
```python
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)
|
||||
```
|
||||
|
||||
When using a fast image processor, you can also set the `device` argument to specify the device on which the processing should be done. By default, the processing is done on the same device as the inputs if the inputs are tensors, or on the CPU otherwise.
|
||||
|
||||
```python
|
||||
from torchvision.io import read_image
|
||||
from transformers import DetrImageProcessorFast
|
||||
|
||||
images = read_image("image.jpg")
|
||||
processor = DetrImageProcessorFast.from_pretrained("facebook/detr-resnet-50")
|
||||
images_processed = processor(images, return_tensors="pt", device="cuda")
|
||||
```
|
||||
|
||||
Here are some speed comparisons between the base and fast image processors for the `DETR` and `RT-DETR` models, and how they impact overall inference time:
|
||||
|
||||
<div class="flex">
|
||||
<div>
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/benchmark_results_full_pipeline_detr_fast_padded.png" />
|
||||
</div>
|
||||
<div>
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/benchmark_results_full_pipeline_detr_fast_batched_compiled.png" />
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="flex">
|
||||
<div>
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/benchmark_results_full_pipeline_rt_detr_fast_single.png" />
|
||||
</div>
|
||||
<div>
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/benchmark_results_full_pipeline_rt_detr_fast_batched.png" />
|
||||
</div>
|
||||
</div>
|
||||
|
||||
These benchmarks were run on an [AWS EC2 g5.2xlarge instance](https://aws.amazon.com/ec2/instance-types/g5/), utilizing an NVIDIA A10G Tensor Core GPU.
|
||||
|
||||
|
||||
## ImageProcessingMixin
|
||||
|
||||
|
||||
@ -478,6 +478,12 @@ Pipelines available for multimodal tasks include the following.
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### ImageTextToTextPipeline
|
||||
|
||||
[[autodoc]] ImageTextToTextPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### MaskGenerationPipeline
|
||||
|
||||
[[autodoc]] MaskGenerationPipeline
|
||||
|
||||
@ -41,8 +41,6 @@ like token streaming.
|
||||
- validate
|
||||
- get_generation_mode
|
||||
|
||||
[[autodoc]] generation.WatermarkingConfig
|
||||
|
||||
## GenerationMixin
|
||||
|
||||
[[autodoc]] GenerationMixin
|
||||
|
||||
@ -51,6 +51,25 @@ token space (e.g., getting the index of the token comprising a given character o
|
||||
to a given token).
|
||||
|
||||
|
||||
# Multimodal Tokenizer
|
||||
|
||||
Apart from that, each tokenizer can be a "multimodal" tokenizer, which means that the tokenizer will hold all relevant special tokens
|
||||
as part of tokenizer attributes for easier access. For example, if the tokenizer is loaded from a vision-language model like LLaVA, you will
|
||||
be able to access `tokenizer.image_token_id` to obtain the special image token used as a placeholder.
|
||||
|
||||
To enable extra special tokens for any type of tokenizer, you have to add the following lines and save the tokenizer. Extra special tokens do not
|
||||
have to be modality related and can be anything that the model often needs access to. In the code below, the tokenizer saved at `output_dir` will have direct access
|
||||
to three more special tokens.
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer

vision_tokenizer = AutoTokenizer.from_pretrained(
|
||||
"llava-hf/llava-1.5-7b-hf",
|
||||
extra_special_tokens={"image_token": "<image>", "boi_token": "<image_start>", "eoi_token": "<image_end>"}
|
||||
)
|
||||
print(vision_tokenizer.image_token, vision_tokenizer.image_token_id)
|
||||
("<image>", 32000)
|
||||
```
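To actually persist the extra special tokens as described above, save the tokenizer and reload it from the save directory (the path below is a placeholder):

```python
vision_tokenizer.save_pretrained("output_dir")

reloaded_tokenizer = AutoTokenizer.from_pretrained("output_dir")
print(reloaded_tokenizer.image_token, reloaded_tokenizer.boi_token, reloaded_tokenizer.eoi_token)
```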
|
||||
|
||||
## PreTrainedTokenizer
|
||||
|
||||
[[autodoc]] PreTrainedTokenizer
|
||||
|
||||
@ -40,6 +40,10 @@ The original code can be found [here](https://github.com/salesforce/LAVIS/tree/5
|
||||
- BLIP-2 can be used for conditional text generation given an image and an optional text prompt. At inference time, it's recommended to use the [`generate`] method.
|
||||
- One can use [`Blip2Processor`] to prepare images for the model, and decode the predicted token IDs back to text.
|
||||
|
||||
> [!NOTE]
|
||||
> BLIP models after release v4.46 will raise warnings about adding `processor.num_query_tokens = {{num_query_tokens}}` and expanding the model embeddings layer to add a special `<image>` token. It is strongly recommended to add the attributes to the processor if you own the model checkpoint, or open a PR if it is not owned by you. Adding these attributes means that BLIP will add the number of query tokens required per image and expand the text with as many `<image>` placeholders as there will be query tokens. Usually it is around 500 tokens per image, so make sure that the text is not truncated as otherwise there will be a failure when merging the embeddings.
|
||||
The attributes can be obtained from model config, as `model.config.num_query_tokens` and model embeddings expansion can be done by following [this link](https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042).
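A minimal sketch of what adding these attributes can look like is shown below; the checkpoint name and save path are only examples, and the full procedure (including re-uploading the checkpoint) is described in the linked gist.

```python
from transformers import Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

# Tell the processor how many query tokens each image expands to and register the placeholder token.
processor.num_query_tokens = model.config.num_query_tokens
processor.tokenizer.add_special_tokens({"additional_special_tokens": ["<image>"]})

# Grow the text embeddings so the new `<image>` token has an embedding row, then save both.
model.resize_token_embeddings(len(processor.tokenizer), pad_to_multiple_of=64)
processor.save_pretrained("blip2-with-image-token")
model.save_pretrained("blip2-with-image-token")
```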
|
||||
|
||||
## Resources
|
||||
|
||||
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BLIP-2.
|
||||
|
||||
@ -54,6 +54,12 @@ If you're interested in submitting a resource to be included here, please feel f
|
||||
- preprocess
|
||||
- post_process_object_detection
|
||||
|
||||
## DeformableDetrImageProcessorFast
|
||||
|
||||
[[autodoc]] DeformableDetrImageProcessorFast
|
||||
- preprocess
|
||||
- post_process_object_detection
|
||||
|
||||
## DeformableDetrFeatureExtractor
|
||||
|
||||
[[autodoc]] DeformableDetrFeatureExtractor
|
||||
|
||||
@ -84,27 +84,24 @@ If you want to do the pre- and postprocessing yourself, here's how to do that:
|
||||
|
||||
>>> with torch.no_grad():
|
||||
... outputs = model(**inputs)
|
||||
... predicted_depth = outputs.predicted_depth
|
||||
|
||||
>>> # interpolate to original size
|
||||
>>> prediction = torch.nn.functional.interpolate(
|
||||
... predicted_depth.unsqueeze(1),
|
||||
... size=image.size[::-1],
|
||||
... mode="bicubic",
|
||||
... align_corners=False,
|
||||
>>> # interpolate to original size and visualize the prediction
|
||||
>>> post_processed_output = image_processor.post_process_depth_estimation(
|
||||
... outputs,
|
||||
... target_sizes=[(image.height, image.width)],
|
||||
... )
|
||||
|
||||
>>> # visualize the prediction
|
||||
>>> output = prediction.squeeze().cpu().numpy()
|
||||
>>> formatted = (output * 255 / np.max(output)).astype("uint8")
|
||||
>>> depth = Image.fromarray(formatted)
|
||||
>>> predicted_depth = post_processed_output[0]["predicted_depth"]
|
||||
>>> depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
|
||||
>>> depth = depth.detach().cpu().numpy() * 255
|
||||
>>> depth = Image.fromarray(depth.astype("uint8"))
|
||||
```
|
||||
|
||||
## Resources
|
||||
|
||||
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Depth Anything.
|
||||
|
||||
- [Monocular depth estimation task guide](../tasks/depth_estimation)
|
||||
- [Monocular depth estimation task guide](../tasks/monocular_depth_estimation)
|
||||
- A notebook showcasing inference with [`DepthAnythingForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb). 🌎
|
||||
|
||||
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
|
||||
|
||||
@ -78,27 +78,24 @@ If you want to do the pre- and post-processing yourself, here's how to do that:
|
||||
|
||||
>>> with torch.no_grad():
|
||||
... outputs = model(**inputs)
|
||||
... predicted_depth = outputs.predicted_depth
|
||||
|
||||
>>> # interpolate to original size
|
||||
>>> prediction = torch.nn.functional.interpolate(
|
||||
... predicted_depth.unsqueeze(1),
|
||||
... size=image.size[::-1],
|
||||
... mode="bicubic",
|
||||
... align_corners=False,
|
||||
>>> # interpolate to original size and visualize the prediction
|
||||
>>> post_processed_output = image_processor.post_process_depth_estimation(
|
||||
... outputs,
|
||||
... target_sizes=[(image.height, image.width)],
|
||||
... )
|
||||
|
||||
>>> # visualize the prediction
|
||||
>>> output = prediction.squeeze().cpu().numpy()
|
||||
>>> formatted = (output * 255 / np.max(output)).astype("uint8")
|
||||
>>> depth = Image.fromarray(formatted)
|
||||
>>> predicted_depth = post_processed_output[0]["predicted_depth"]
|
||||
>>> depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
|
||||
>>> depth = depth.detach().cpu().numpy() * 255
|
||||
>>> depth = Image.fromarray(depth.astype("uint8"))
|
||||
```
|
||||
|
||||
## Resources
|
||||
|
||||
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Depth Anything.
|
||||
|
||||
- [Monocular depth estimation task guide](../tasks/depth_estimation)
|
||||
- [Monocular depth estimation task guide](../tasks/monocular_depth_estimation)
|
||||
- [Depth Anything V2 demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-V2).
|
||||
- A notebook showcasing inference with [`DepthAnythingForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb). 🌎
|
||||
- [Core ML conversion of the `small` variant for use on Apple Silicon](https://huggingface.co/apple/coreml-depth-anything-v2-small).
|
||||
|
||||
@ -181,6 +181,15 @@ If you're interested in submitting a resource to be included here, please feel f
|
||||
- post_process_instance_segmentation
|
||||
- post_process_panoptic_segmentation
|
||||
|
||||
## DetrImageProcessorFast
|
||||
|
||||
[[autodoc]] DetrImageProcessorFast
|
||||
- preprocess
|
||||
- post_process_object_detection
|
||||
- post_process_semantic_segmentation
|
||||
- post_process_instance_segmentation
|
||||
- post_process_panoptic_segmentation
|
||||
|
||||
## DetrFeatureExtractor
|
||||
|
||||
[[autodoc]] DetrFeatureExtractor
|
||||
|
||||
99
docs/source/en/model_doc/glm.md
Normal file
@ -0,0 +1,99 @@
|
||||
<!--Copyright 2024 The GLM & ZhipuAI team and The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# GLM
|
||||
|
||||
## Overview
|
||||
|
||||
The GLM Model was proposed
|
||||
in [ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools](https://arxiv.org/html/2406.12793v1)
|
||||
by GLM Team, THUDM & ZhipuAI.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report
|
||||
primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most
|
||||
capable models that are trained with all the insights and lessons gained from the preceding three generations of
|
||||
ChatGLM. To date, the GLM-4 models are pre-trained on ten trillions of tokens mostly in Chinese and English, along with
|
||||
a small set of corpus from 24 languages, and aligned primarily for Chinese and English usage. The high-quality alignment
|
||||
is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human
|
||||
feedback. Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 in terms of general metrics such as MMLU,
|
||||
GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3)
|
||||
matches GPT-4 Turbo (128K) and Claude 3 for long context tasks, and 4) outperforms GPT-4 in Chinese alignments as
|
||||
measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide
|
||||
when and which tool(s) to use—including web browser, Python interpreter, text-to-image model, and user-defined
|
||||
functions—to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All
|
||||
Tools in tasks like accessing online information via web browsing and solving math problems using Python interpreter.
|
||||
Over the course, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M),
|
||||
GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging face in the year 2023 alone.*
|
||||
|
||||
Tips:
|
||||
|
||||
- This model was contributed by [THUDM](https://huggingface.co/THUDM). The most recent code can be
|
||||
found [here](https://github.com/thudm/GLM-4).
|
||||
|
||||
|
||||
## Usage tips
|
||||
|
||||
`GLM-4` can be found on the [Hugging Face Hub](https://huggingface.co/collections/THUDM/glm-4-665fcf188c414b03c2f7e3b7)
|
||||
|
||||
In the following, we demonstrate how to use `glm-4-9b-chat` for inference. Note that we use the ChatML format for dialogue; in this demo we show how to leverage `apply_chat_template` for this purpose.
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
>>> device = "cuda" # the device to load the model onto
|
||||
|
||||
>>> model = AutoModelForCausalLM.from_pretrained("THUDM/glm-4-9b-chat", device_map="auto")
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat")
|
||||
|
||||
>>> prompt = "Give me a short introduction to large language model."
|
||||
|
||||
>>> messages = [{"role": "user", "content": prompt}]
|
||||
|
||||
>>> text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
||||
|
||||
>>> model_inputs = tokenizer([text], return_tensors="pt").to(device)
|
||||
|
||||
>>> generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True)
|
||||
|
||||
>>> generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
|
||||
|
||||
>>> response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
|
||||
```
|
||||
|
||||
## GlmConfig
|
||||
|
||||
[[autodoc]] GlmConfig
|
||||
|
||||
## GlmModel
|
||||
|
||||
[[autodoc]] GlmModel
|
||||
- forward
|
||||
|
||||
## GlmForCausalLM
|
||||
|
||||
[[autodoc]] GlmForCausalLM
|
||||
- forward
|
||||
|
||||
## GlmForSequenceClassification
|
||||
|
||||
[[autodoc]] GlmForSequenceClassification
|
||||
- forward
|
||||
|
||||
## GlmForTokenClassification
|
||||
|
||||
[[autodoc]] GlmForTokenClassification
|
||||
- forward
|
||||
@ -33,6 +33,10 @@ The original code can be found [here](https://github.com/salesforce/LAVIS/tree/m
|
||||
|
||||
InstructBLIP uses the same architecture as [BLIP-2](blip2) with a tiny but important difference: it also feeds the text prompt (instruction) to the Q-Former.
|
||||
|
||||
> [!NOTE]
|
||||
> BLIP models after release v4.46 will raise warnings about adding `processor.num_query_tokens = {{num_query_tokens}}` and expanding the model embeddings layer to add a special `<image>` token. It is strongly recommended to add the attributes to the processor if you own the model checkpoint, or open a PR if it is not owned by you. Adding these attributes means that BLIP will add the number of query tokens required per image and expand the text with as many `<image>` placeholders as there will be query tokens. Usually it is around 500 tokens per image, so make sure that the text is not truncated as otherwise there will be a failure when merging the embeddings.
|
||||
The attributes can be obtained from model config, as `model.config.num_query_tokens` and model embeddings expansion can be done by following [this link](https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042).
|
||||
|
||||
## InstructBlipConfig
|
||||
|
||||
[[autodoc]] InstructBlipConfig
|
||||
|
||||
@ -35,6 +35,10 @@ The original code can be found [here](https://github.com/salesforce/LAVIS/tree/m
|
||||
|
||||
- The model was trained by sampling 4 frames per video, so it's recommended to sample 4 frames
|
||||
|
||||
> [!NOTE]
|
||||
> BLIP models after release v4.46 will raise warnings about adding `processor.num_query_tokens = {{num_query_tokens}}` and expanding the model embeddings layer to add a special `<image>` token. It is strongly recommended to add the attributes to the processor if you own the model checkpoint, or open a PR if it is not owned by you. Adding these attributes means that BLIP will add the number of query tokens required per image and expand the text with as many `<image>` placeholders as there will be query tokens. Usually it is around 500 tokens per image, so make sure that the text is not truncated as otherwise there will be a failure when merging the embeddings.
|
||||
The attributes can be obtained from model config, as `model.config.num_query_tokens` and model embeddings expansion can be done by following [this link](https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042).
|
||||
|
||||
## InstructBlipVideoConfig
|
||||
|
||||
[[autodoc]] InstructBlipVideoConfig
|
||||
|
||||
@ -40,6 +40,13 @@ The original code can be found [here](https://github.com/haotian-liu/LLaVA/tree/
|
||||
|
||||
- Note the model has not been explicitly trained to process multiple images in the same prompt, although this is technically possible, you may experience inaccurate results.
|
||||
|
||||
|
||||
> [!NOTE]
|
||||
> LLaVA models after release v4.46 will raise warnings about adding `processor.patch_size = {{patch_size}}`, `processor.num_additional_image_tokens = {{num_additional_image_tokens}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. It is strongly recommended to add the attributes to the processor if you own the model checkpoint, or open a PR if it is not owned by you.
|
||||
Adding these attributes means that LLaVA will try to infer the number of image tokens required per image and expand the text with as many `<image>` placeholders as there will be tokens. Usually it is around 500 tokens per image, so make sure that the text is not truncated as otherwise there will be a failure when merging the embeddings.
|
||||
The attributes can be obtained from model config, as `model.config.vision_config.patch_size` or `model.config.vision_feature_select_strategy`. The `num_additional_image_tokens` should be `1` if the vision backbone adds a CLS token or `0` if nothing extra is added to the vision patches.
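A minimal sketch (an assumed workflow, not an exact recipe) of setting these attributes on the processor before saving it:

```python
processor.patch_size = model.config.vision_config.patch_size
processor.vision_feature_select_strategy = model.config.vision_feature_select_strategy
processor.num_additional_image_tokens = 1  # 1 if the vision backbone adds a CLS token, else 0

processor.save_pretrained("path/to/updated-processor")  # placeholder path
```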
|
||||
|
||||
|
||||
### Single image inference
|
||||
|
||||
For best results, we recommend users to use the processor's `apply_chat_template()` method to format your prompt correctly. For that you need to construct a conversation history, passing in a plain string will not format your prompt. Each message in the conversation history for chat templates is a dictionary with keys "role" and "content". The "content" should be a list of dictionaries, for "text" and "image" modalities, as follows:
|
||||
@ -85,10 +92,10 @@ LLaVa also supports batched inference. Here is how you can do it:
|
||||
import requests
|
||||
from PIL import Image
|
||||
import torch
|
||||
from transformers import AutoProcessor, LLavaForConditionalGeneration
|
||||
from transformers import AutoProcessor, LlavaForConditionalGeneration
|
||||
|
||||
# Load the model in half-precision
|
||||
model = LLavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16, device_map="auto")
|
||||
model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16, device_map="auto")
|
||||
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
|
||||
|
||||
# Get two different images
|
||||
|
||||
@ -53,6 +53,12 @@ The original code can be found [here](https://github.com/haotian-liu/LLaVA/tree/
|
||||
</Tip>
|
||||
|
||||
|
||||
> [!NOTE]
|
||||
> LLaVA models after release v4.46 will raise warnings about adding `processor.patch_size = {{patch_size}}`, `processor.num_additional_image_tokens = {{num_additional_image_tokens}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. It is strongly recommended to add the attributes to the processor if you own the model checkpoint, or open a PR if it is not owned by you.
|
||||
Adding these attributes means that LLaVA will try to infer the number of image tokens required per image and expand the text with as many `<image>` placeholders as there will be tokens. Usually it is around 500 tokens per image, so make sure that the text is not truncated as otherwise there will be a failure when merging the embeddings.
|
||||
The attributes can be obtained from model config, as `model.config.vision_config.patch_size` or `model.config.vision_feature_select_strategy`. The `num_additional_image_tokens` should be `1` if the vision backbone adds a CLS token or `0` if nothing extra is added to the vision patches.
|
||||
|
||||
|
||||
- Note that each checkpoint has been trained with a specific prompt format, depending on which large language model (LLM) was used. You can use the processor's `apply_chat_template` to format your prompts correctly. For that you have to construct a conversation history, passing a plain string will not format your prompt. Each message in the conversation history for chat templates is a dictionary with keys "role" and "content". The "content" should be a list of dictionaries, for "text" and "image" modalities. Below is an example of how to do that and the list of formats accepted by each checkpoint.
|
||||
|
||||
We will use [llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf) and a conversation history of text and image. Each content field has to be a list of dicts, as follows:
|
||||
|
||||
@ -50,6 +50,12 @@ The original code can be found [here](https://github.com/LLaVA-VL/LLaVA-NeXT/tre
|
||||
</Tip>
|
||||
|
||||
|
||||
> [!NOTE]
|
||||
> LLaVA models after release v4.46 will raise warnings about adding `processor.patch_size = {{patch_size}}`, `processor.num_additional_image_tokens = {{num_additional_image_tokens}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. It is strongly recommended to add the attributes to the processor if you own the model checkpoint, or open a PR if it is not owned by you.
|
||||
Adding these attributes means that LLaVA will try to infer the number of image tokens required per image and expand the text with as many `<image>` placeholders as there will be tokens. Usually it is around 500 tokens per image, so make sure that the text is not truncated as otherwise there will be a failure when merging the embeddings.
|
||||
The attributes can be obtained from model config, as `model.config.vision_config.patch_size` or `model.config.vision_feature_select_strategy`. The `num_additional_image_tokens` should be `1` if the vision backbone adds a CLS token or `0` if nothing extra is added to the vision patches.
|
||||
|
||||
|
||||
- Note that each checkpoint has been trained with a specific prompt format, depending on which large language model (LLM) was used. You can use tokenizer's `apply_chat_template` to format your prompts correctly. Below is an example of how to do that.
|
||||
|
||||
We will use [LLaVA-NeXT-Video-7B-hf](https://huggingface.co/llava-hf/LLaVA-NeXT-Video-7B-hf) and a conversation history of videos and images. Each content field has to be a list of dicts, as follows:
|
||||
|
||||
@ -66,4 +66,4 @@ The original code can be found [here](https://github.com/kyutai-labs/moshi).
|
||||
[[autodoc]] MimiModel
|
||||
- decode
|
||||
- encode
|
||||
- forward
|
||||
- forward
|
||||
@ -30,6 +30,25 @@ The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a
|
||||
- The text passed to the processor should have the `"<|image|>"` tokens where the images should be inserted.
|
||||
- The processor has its own `apply_chat_template` method to convert chat messages to text that can then be passed as text to the processor.
|
||||
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
Mllama has an extra token used as a placeholder for image positions in the text. It means that the input ids and the input embedding layer will have an extra token. But since the weights for input and output embeddings are not tied, the `lm_head` layer has one less token and will fail if you want to calculate loss on image tokens or apply some logit processors. In case you are training, make sure to mask out the special `"<|image|>"` tokens in the `labels`, as the model should not be trained on predicting them.
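For example, a minimal sketch of masking the image placeholder in the labels (assuming `processor` and the tokenized `input_ids` are already available):

```python
image_token_id = processor.tokenizer.convert_tokens_to_ids("<|image|>")

labels = input_ids.clone()
labels[labels == image_token_id] = -100  # ignored by the cross-entropy loss
```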
|
||||
|
||||
Otherwise, if you see CUDA-side index errors when generating, use the code below to expand the `lm_head` by one more token.
|
||||
|
||||
|
||||
```python
|
||||
old_embeddings = model.get_output_embeddings()
|
||||
|
||||
num_tokens = model.vocab_size + 1
|
||||
resized_embeddings = model._get_resized_lm_head(old_embeddings, new_num_tokens=num_tokens, mean_resizing=True)
|
||||
resized_embeddings.requires_grad_(old_embeddings.weight.requires_grad)
|
||||
model.set_output_embeddings(resized_embeddings)
|
||||
```
|
||||
</Tip>
|
||||
|
||||
|
||||
## Usage Example
|
||||
|
||||
#### Instruct model
|
||||
|
||||
183
docs/source/en/model_doc/moshi.md
Normal file
@ -0,0 +1,183 @@
|
||||
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Moshi
|
||||
|
||||
## Overview
|
||||
|
||||
The Moshi model was proposed in [Moshi: a speech-text foundation model for real-time dialogue](https://kyutai.org/Moshi.pdf) by Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave and Neil Zeghidour.
|
||||
|
||||
Moshi is a speech-text foundation model that casts spoken dialogue as speech-to-speech generation. Starting from a text language model backbone, Moshi generates speech as tokens from the residual quantizer of a neural audio codec, while modeling separately its own speech and that of the user into parallel streams. This allows for the removal of explicit speaker turns, and the modeling of arbitrary conversational dynamics. Moshi also predicts time-aligned text tokens as a prefix to audio tokens. This "Inner Monologue" method significantly improves the linguistic quality of generated speech and provides streaming speech recognition and text-to-speech. As a result, Moshi is the first real-time full-duplex spoken large language model, with a theoretical latency of 160ms, 200ms in practice.
|
||||
|
||||
<div style="text-align: center">
|
||||
<img src="https://huggingface.co/datasets/ylacombe/benchmark-comparison/resolve/main/moshi_architecture.png">
|
||||
</div>
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue framework. Current systems for spoken dialogue rely on pipelines of independent components, namely voice activity detection, speech recognition, textual dialogue and text-to-speech. Such frameworks cannot emulate the experience of real conversations. First, their complexity induces a latency of several seconds between interactions. Second, text being the intermediate modality for dialogue, non-linguistic information that modifies meaningтАФ such as emotion or non-speech soundsтАФ is lost in the interaction. Finally, they rely on a segmentation into speaker turns, which does not take into account overlapping speech, interruptions and interjections. Moshi solves these independent issues altogether by casting spoken dialogue as speech-to-speech generation. Starting from a text language model backbone, Moshi generates speech as tokens from the residual quantizer of a neural audio codec, while modeling separately its own speech and that of the user into parallel streams. This allows for the removal of explicit speaker turns, and the modeling of arbitrary conversational dynamics. We moreover extend the hierarchical semantic-to-acoustic token generation of previous work to first predict time-aligned text tokens as a prefix to audio tokens. Not only this тАЬInner MonologueтАЭ method significantly improves the linguistic quality of generated speech, but we also illustrate how it can provide streaming speech recognition and text-to-speech. Our resulting model is the first real-time full-duplex spoken large language model, with a theoretical latency of 160ms, 200ms in practice, and is available at github.com/kyutai-labs/moshi.*
|
||||
|
||||
Moshi deals with 3 streams of information:
|
||||
1. The user's audio
|
||||
2. Moshi's audio
|
||||
3. Moshi's textual output
|
||||
|
||||
Similarly to [`~MusicgenModel`], audio is represented with audio codebooks, which can be interpreted like tokens. The main difference between text tokens and audio codebooks is that audio codebooks introduce an additional dimension of information.
|
||||
Text tokens are typically of dim `(batch_size, sequence_length)` but audio tokens are of dim `(batch_size, num_codebooks, sequence_length)`.
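For instance, shapes only, with an assumed `num_codebooks=8`:

```python
import torch

batch_size, sequence_length, num_codebooks = 1, 125, 8

text_tokens = torch.zeros(batch_size, sequence_length, dtype=torch.long)
audio_codes = torch.zeros(batch_size, num_codebooks, sequence_length, dtype=torch.long)

print(text_tokens.shape)  # torch.Size([1, 125])
print(audio_codes.shape)  # torch.Size([1, 8, 125])
```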
|
||||
|
||||
Moshi is made of 3 components:
|
||||
|
||||
**1. The main decoder (Helium in the paper)**
|
||||
|
||||
It corresponds to [`MoshiForCausalLM`]. It is strictly a classic text LLM that uses an architecture similar to [`~GemmaForCausalLM`]. In other words, it takes text tokens, embeds them, and passes them through the decoder and a language head to get text logits.
|
||||
|
||||
**2. The depth decoder**
|
||||
|
||||
On its own, it's also a classic LLM, but this time, instead of generating over the time dimension, it generates over the codebook dimension.
|
||||
|
||||
It also means that its context length is `num_codebooks`, thus it can't generate more than `num_codebooks` tokens.
|
||||
|
||||
Note that each timestamp - i.e. each codebook - gets its own set of linear layers and embeddings.
|
||||
|
||||
**3. [`MimiModel`]**
|
||||
|
||||
It's the audio encoder from Kyutai, which has recently been integrated into transformers, and is used to "tokenize" audio. It plays the same role that [`~EncodecModel`] plays in [`~MusicgenModel`].
|
||||
|
||||
|
||||
## Tips:
|
||||
|
||||
The original checkpoints can be converted using the conversion script `src/transformers/models/moshi/convert_moshi_transformers.py`
|
||||
|
||||
|
||||
### How to use the model:
|
||||
|
||||
This implementation has two main aims:
|
||||
1. quickly test model generation by simplifying the original API
|
||||
2. simplify training. A training guide will come soon, but user contributions are welcomed!
|
||||
|
||||
<Tip>
|
||||
|
||||
It is designed for intermediate use. We strongly recommend using the original [implementation](https://github.com/kyutai-labs/moshi) to infer the model in real-time streaming.
|
||||
|
||||
</Tip>
|
||||
|
||||
**1. Model generation**
|
||||
|
||||
Moshi is a streaming auto-regressive model with two streams of audio. To put it differently, one audio stream corresponds to what the model said/will say and the other audio stream corresponds to what the user said/will say.
|
||||
|
||||
[`MoshiForConditionalGeneration.generate`] thus needs 3 inputs:
|
||||
1. `input_ids` - corresponding to the text token history
|
||||
2. `moshi_input_values` or `moshi_audio_codes` - corresponding to the model audio history
|
||||
3. `user_input_values` or `user_audio_codes` - corresponding to the user audio history
|
||||
|
||||
These three inputs must be synchronized. Meaning that their lengths must correspond to the same number of tokens.

You can dynamically use the 3 inputs depending on what you want to test:

1. Simply check the model response to a user prompt - in that case, `input_ids` can be filled with pad tokens and `user_input_values` can be a zero tensor of the same shape as the user prompt.
2. Test more complex behaviour - in that case, you must be careful about how the input tokens are synchronized with the audios.

<Tip>

The original model synchronizes text with audio by padding the text in between each token's enunciation.

To follow the example of the image below, `"Hello, I'm Moshi"` could be transformed to `"Hello,<pad><unk>I'm Moshi"`.

</Tip>

<div style="text-align: center">
<img src="https://huggingface.co/datasets/ylacombe/benchmark-comparison/resolve/main/moshi_text_sync.png">
</div>

[`MoshiForConditionalGeneration.generate`] then auto-regressively feeds its own audio stream back to itself, but since it doesn't have access to the user input stream while using `transformers`, it will **assume that the user is producing blank audio**.

```python
>>> from datasets import load_dataset, Audio
>>> import torch, math
>>> from transformers import MoshiForConditionalGeneration, AutoFeatureExtractor, AutoTokenizer

>>> librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

>>> # load the model, feature extractor and tokenizer (the repository id below is an example, use the checkpoint you converted)
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> dtype = torch.float32
>>> model = MoshiForConditionalGeneration.from_pretrained("kmhf/hf-moshiko", torch_dtype=dtype).to(device)
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("kmhf/hf-moshiko")
>>> tokenizer = AutoTokenizer.from_pretrained("kmhf/hf-moshiko")

>>> # ratio between the number of waveform samples and the number of audio tokens (the Mimi codec produces 12.5 frames per second of audio)
>>> waveform_to_token_ratio = 12.5 / feature_extractor.sampling_rate

>>> # prepare user input audio
>>> librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=feature_extractor.sampling_rate))
>>> audio_sample = librispeech_dummy[-1]["audio"]["array"]
>>> user_input_values = feature_extractor(raw_audio=audio_sample, sampling_rate=feature_extractor.sampling_rate, return_tensors="pt").to(device=device, dtype=dtype)

>>> # prepare moshi input values - we suppose moshi didn't say anything while the user spoke
>>> moshi_input_values = torch.zeros_like(user_input_values.input_values)

>>> # prepare moshi input ids - we suppose moshi didn't say anything while the user spoke
>>> num_tokens = math.ceil(moshi_input_values.shape[-1] * waveform_to_token_ratio)
>>> input_ids = torch.ones((1, num_tokens), device=device, dtype=torch.int64) * tokenizer.encode("<pad>")[0]

>>> # generate 25 new tokens (around 2s of audio)
>>> output = model.generate(input_ids=input_ids, user_input_values=user_input_values.input_values, moshi_input_values=moshi_input_values, max_new_tokens=25)

>>> text_tokens = output.sequences
>>> audio_waveforms = output.audio_sequences
```

**2. Model training**

Most of the work has to be done during data creation/pre-processing, because of the need to align/synchronize streams.

Once that is done, you can simply forward `text_labels` and `audio_labels` to [`MoshiForConditionalGeneration.forward`], alongside the usual inputs, to get the model loss.

A training guide will come soon, but user contributions are welcome!
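
For illustration, a training step could look like the following sketch. It assumes `model` and `tokenizer` from the generation example above and dummy streams that are already aligned; the `num_codebooks` config attribute and the exact argument names are assumptions based on the inputs described in this page:

```python
# a rough sketch only, not a full training loop
import torch

batch_size, seq_len = 1, 10                 # made-up sizes
num_codebooks = model.config.num_codebooks  # assumed config attribute
pad_id = tokenizer.encode("<pad>")[0]
device = model.device

input_ids = torch.full((batch_size, seq_len), pad_id, dtype=torch.int64, device=device)
moshi_audio_codes = torch.zeros((batch_size, num_codebooks, seq_len), dtype=torch.int64, device=device)
user_audio_codes = torch.zeros((batch_size, num_codebooks, seq_len), dtype=torch.int64, device=device)

outputs = model(
    input_ids=input_ids,
    moshi_audio_codes=moshi_audio_codes,
    user_audio_codes=user_audio_codes,
    text_labels=input_ids,           # text stream targets
    audio_labels=moshi_audio_codes,  # Moshi audio stream targets
)
outputs.loss.backward()
```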

### How does the model forward the inputs / generate:

1. The input streams are embedded and combined into `inputs_embeds`.

2. `inputs_embeds` is passed through the main decoder, which processes it like a normal LLM would.

3. The main decoder outputs `text logits` but also its `last hidden state`, which is called the `temporal context` in the paper.

4. The depth decoder switches the dimension on which we forward / generate (codebooks instead of time). It uses the token generated from `text logits` and the `temporal context` to auto-regressively generate audio codebooks.

This model was contributed by [Yoach Lacombe (ylacombe)](https://huggingface.co/ylacombe).

The original code can be found [here](https://github.com/kyutai-labs/moshi).


## MoshiConfig

[[autodoc]] MoshiConfig

## MoshiDepthConfig

[[autodoc]] MoshiDepthConfig

## MoshiModel

[[autodoc]] MoshiModel
    - forward

## MoshiForCausalLM

[[autodoc]] MoshiForCausalLM
    - forward

## MoshiForConditionalGeneration

[[autodoc]] MoshiForConditionalGeneration
    - forward
    - generate
    - get_unconditional_inputs
docs/source/en/model_doc/olmo_1124.md (new file, 46 lines)
@ -0,0 +1,46 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# OLMo November 2024

## Overview

The OLMo November 2024 model is a successor of the OLMo model, which was proposed in
[OLMo: Accelerating the Science of Language Models](https://arxiv.org/abs/2402.00838).

The architectural changes from the original OLMo model to this model are:

- RMSNorm is used instead of standard layer norm.
- Norm is applied to attention queries and keys.
- Norm is applied after attention/feedforward layers rather than before.

This model was contributed by [shanearora](https://huggingface.co/shanearora).
The original code can be found [here](https://github.com/allenai/OLMo/tree/main/olmo).
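
For a quick sanity check once a converted checkpoint is available, a generation sketch with the auto classes could look like this (the repository id below is a placeholder, not an official checkpoint name):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "allenai/OLMo-1124-7B-hf"  # placeholder repository id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```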

## Olmo1124Config

[[autodoc]] Olmo1124Config

## Olmo1124Model

[[autodoc]] Olmo1124Model
    - forward

## Olmo1124ForCausalLM

[[autodoc]] Olmo1124ForCausalLM
    - forward
@ -46,7 +46,7 @@ Initially, an image is processed using a pre-trained convolutional neural networ
|
||||
>>> from PIL import Image
|
||||
>>> from transformers import RTDetrForObjectDetection, RTDetrImageProcessor
|
||||
|
||||
>>> url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
|
||||
>>> url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
|
||||
>>> image = Image.open(requests.get(url, stream=True).raw)
|
||||
|
||||
>>> image_processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_r50vd")
|
||||
@ -57,7 +57,7 @@ Initially, an image is processed using a pre-trained convolutional neural networ
|
||||
>>> with torch.no_grad():
|
||||
... outputs = model(**inputs)
|
||||
|
||||
>>> results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3)
|
||||
>>> results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.3)
|
||||
|
||||
>>> for result in results:
|
||||
... for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
|
||||
@ -95,6 +95,12 @@ A list of official Hugging Face and community (indicated by ЁЯМО) resources to h
|
||||
- preprocess
|
||||
- post_process_object_detection
|
||||
|
||||
## RTDetrImageProcessorFast
|
||||
|
||||
[[autodoc]] RTDetrImageProcessorFast
|
||||
- preprocess
|
||||
- post_process_object_detection
|
||||
|
||||
## RTDetrModel
|
||||
|
||||
[[autodoc]] RTDetrModel
|
||||
|
||||
@ -86,24 +86,32 @@ model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/sup
|
||||
|
||||
inputs = processor(images, return_tensors="pt")
|
||||
outputs = model(**inputs)
|
||||
image_sizes = [(image.height, image.width) for image in images]
|
||||
outputs = processor.post_process_keypoint_detection(outputs, image_sizes)
|
||||
|
||||
for i in range(len(images)):
|
||||
image_mask = outputs.mask[i]
|
||||
image_indices = torch.nonzero(image_mask).squeeze()
|
||||
image_keypoints = outputs.keypoints[i][image_indices]
|
||||
image_scores = outputs.scores[i][image_indices]
|
||||
image_descriptors = outputs.descriptors[i][image_indices]
|
||||
for output in outputs:
|
||||
for keypoints, scores, descriptors in zip(output["keypoints"], output["scores"], output["descriptors"]):
|
||||
print(f"Keypoints: {keypoints}")
|
||||
print(f"Scores: {scores}")
|
||||
print(f"Descriptors: {descriptors}")
|
||||
```
|
||||
|
||||
You can then print the keypoints on the image to visualize the result :
|
||||
You can then print the keypoints on the image of your choice to visualize the result:
|
||||
```python
|
||||
import cv2
|
||||
for keypoint, score in zip(image_keypoints, image_scores):
|
||||
keypoint_x, keypoint_y = int(keypoint[0].item()), int(keypoint[1].item())
|
||||
color = tuple([score.item() * 255] * 3)
|
||||
image = cv2.circle(image, (keypoint_x, keypoint_y), 2, color)
|
||||
cv2.imwrite("output_image.png", image)
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
plt.axis("off")
|
||||
plt.imshow(image_1)
|
||||
plt.scatter(
|
||||
outputs[0]["keypoints"][:, 0],
|
||||
outputs[0]["keypoints"][:, 1],
|
||||
c=outputs[0]["scores"] * 100,
|
||||
s=outputs[0]["scores"] * 50,
|
||||
alpha=0.8
|
||||
)
|
||||
plt.savefig(f"output_image.png")
|
||||
```
|
||||

|
||||
|
||||
This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
|
||||
The original code can be found [here](https://github.com/magicleap/SuperPointPretrainedNetwork).
|
||||
@ -123,6 +131,7 @@ A list of official Hugging Face and community (indicated by ЁЯМО) resources to h
|
||||
[[autodoc]] SuperPointImageProcessor
|
||||
|
||||
- preprocess
|
||||
- post_process_keypoint_detection
|
||||
|
||||
## SuperPointForKeypointDetection
|
||||
|
||||
|
||||
@ -54,6 +54,12 @@ This model was contributed by [RaushanTurganbay](https://huggingface.co/RaushanT
|
||||
The original code can be found [here](https://github.com/PKU-YuanGroup/Video-LLaVA).
|
||||
|
||||
|
||||
> [!NOTE]
> LLaVA models after release v4.46 will raise warnings about adding `processor.patch_size = {{patch_size}}`, `processor.num_additional_image_tokens = {{num_additional_image_tokens}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. It is strongly recommended to add the attributes to the processor if you own the model checkpoint, or open a PR if it is not owned by you.
Adding these attributes means that LLaVA will try to infer the number of image tokens required per image and expand the text with as many `<image>` placeholders as there will be tokens. Usually it is around 500 tokens per image, so make sure that the text is not truncated as otherwise there will be a failure when merging the embeddings.
The attributes can be obtained from the model config, as `model.config.vision_config.patch_size` or `model.config.vision_feature_select_strategy`. The `num_additional_image_tokens` should be `1` if the vision backbone adds a CLS token or `0` if nothing extra is added to the vision patches. A processor setup sketch is shown below.
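
As a sketch of how these attributes can be added to a processor you own (the checkpoint id is only an example, and setting `num_additional_image_tokens = 1` assumes the vision backbone adds a CLS token):

```python
from transformers import AutoConfig, AutoProcessor

checkpoint = "LanguageBind/Video-LLaVA-7B-hf"  # example checkpoint
config = AutoConfig.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)

processor.patch_size = config.vision_config.patch_size
processor.vision_feature_select_strategy = config.vision_feature_select_strategy
processor.num_additional_image_tokens = 1  # 1 if the vision backbone adds a CLS token, else 0

# then save or push the updated processor so the warnings go away, e.g.:
# processor.push_to_hub(checkpoint)
```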
|
||||
|
||||
|
||||
## Usage example
|
||||
|
||||
### Single Media Mode
|
||||
|
||||
@ -39,6 +39,12 @@ This model was contributed by [Younes Belkada](https://huggingface.co/ybelkada)
|
||||
|
||||
- Note the model has not been explicitly trained to process multiple images in the same prompt, although this is technically possible, you may experience inaccurate results.
|
||||
|
||||
> [!NOTE]
|
||||
> LLaVA models after release v4.46 will raise warnings about adding `processor.patch_size = {{patch_size}}`, `processor.num_additional_image_tokens = {{num_additional_image_tokens}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. It is strongly recommended to add the attributes to the processor if you own the model checkpoint, or open a PR if it is not owned by you.
|
||||
Adding these attributes means that LLaVA will try to infer the number of image tokens required per image and expand the text with as many `<image>` placeholders as there will be tokens. Usually it is around 500 tokens per image, so make sure that the text is not truncated as otherwise there will be failure when merging the embeddings.
|
||||
The attributes can be obtained from model config, as `model.config.vision_config.patch_size` or `model.config.vision_feature_select_strategy`. The `num_additional_image_tokens` should be `1` if the vision backbone adds a CLS token or `0` if nothing extra is added to the vision patches.
|
||||
|
||||
|
||||
- For better results, we recommend users to use the processor's `apply_chat_template()` method to format your prompt correctly. For that you need to construct a conversation history, passing in a plain string will not format your prompt. Each message in the conversation history for chat templates is a dictionary with keys "role" and "content". The "content" should be a list of dictionaries, for "text" and "image" modalities, as follows:
|
||||
|
||||
```python
|
||||
|
||||
@ -23,6 +23,43 @@ The abstract from the paper is the following:
|
||||
|
||||
This model was contributed by [jegormeister](https://huggingface.co/jegormeister). The original code (written in JAX) can be found [here](https://github.com/google-research/scenic/tree/main/scenic/projects/vivit).
|
||||
|
||||
### Using Scaled Dot Product Attention (SDPA)
|
||||
|
||||
PyTorch includes a native scaled dot-product attention (SDPA) operator as part of `torch.nn.functional`. This function
|
||||
encompasses several implementations that can be applied depending on the inputs and the hardware in use. See the
|
||||
[official documentation](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html)
|
||||
or the [GPU Inference](https://huggingface.co/docs/transformers/main/en/perf_infer_gpu_one#pytorch-scaled-dot-product-attention)
|
||||
page for more information.
|
||||
|
||||
SDPA is used by default for `torch>=2.1.1` when an implementation is available, but you may also set
|
||||
`attn_implementation="sdpa"` in `from_pretrained()` to explicitly request SDPA to be used.
|
||||
|
||||
```
|
||||
from transformers import VivitModel
|
||||
model = VivitModel.from_pretrained("google/vivit-b-16x2-kinetics400", attn_implementation="sdpa", torch_dtype=torch.float16)
|
||||
...
|
||||
```
|
||||
|
||||
For the best speedups, we recommend loading the model in half-precision (e.g. `torch.float16` or `torch.bfloat16`).
|
||||
|
||||
On a local benchmark (A100-40GB, PyTorch 2.3.0, OS Ubuntu 22.04) with `float32` and `google/vivit-b-16x2-kinetics400` model, we saw the following speedups during inference.
|
||||
|
||||
### Training
|
||||
| num_training_steps | batch_size | is cuda | Speedup (%) | Eager peak mem (MB) | sdpa peak mem (MB) | Mem saving (%) |
|
||||
|---------------------:|-------------:|----------:|--------------:|----------------------:|---------------------:|-----------------:|
|
||||
| 100 | 1 | True | 7.122 | 2575.28 | 5932.54 | 130.364 |
|
||||
|
||||
|
||||
|
||||
### Inference
|
||||
| num_batches | batch_size | is cuda | is half | Speedup (%) | Mem eager (MB) | Mem BT (MB) | Mem saved (%) |
|
||||
|---------------|--------------|-----------|-----------|---------------|------------------|---------------|-----------------|
|
||||
| 20 | 1 | True | False | 15.422 | 715.807 | 317.079 | 125.75 |
|
||||
| 20 | 2 | True | False | 17.146 | 1234.75 | 447.175 | 176.122 |
|
||||
| 20 | 4 | True | False | 18.093 | 2275.82 | 709.864 | 220.6 |
|
||||
| 20 | 8 | True | False | 19.284 | 4358.19 | 1233.24 | 253.393 |
|
||||
|
||||
|
||||
## VivitConfig
|
||||
|
||||
[[autodoc]] VivitConfig
|
||||
|
||||
@ -39,54 +39,66 @@ The original code can be found [here](https://github.com/isl-org/ZoeDepth).
|
||||
The easiest to perform inference with ZoeDepth is by leveraging the [pipeline API](../main_classes/pipelines.md):
|
||||
|
||||
```python
|
||||
from transformers import pipeline
|
||||
from PIL import Image
|
||||
import requests
|
||||
>>> from transformers import pipeline
|
||||
>>> from PIL import Image
|
||||
>>> import requests
|
||||
|
||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
>>> image = Image.open(requests.get(url, stream=True).raw)
|
||||
|
||||
pipe = pipeline(task="depth-estimation", model="Intel/zoedepth-nyu-kitti")
|
||||
result = pipe(image)
|
||||
depth = result["depth"]
|
||||
>>> pipe = pipeline(task="depth-estimation", model="Intel/zoedepth-nyu-kitti")
|
||||
>>> result = pipe(image)
|
||||
>>> depth = result["depth"]
|
||||
```
|
||||
|
||||
Alternatively, one can also perform inference using the classes:
|
||||
|
||||
```python
|
||||
from transformers import AutoImageProcessor, ZoeDepthForDepthEstimation
|
||||
import torch
|
||||
import numpy as np
|
||||
from PIL import Image
|
||||
import requests
|
||||
>>> from transformers import AutoImageProcessor, ZoeDepthForDepthEstimation
|
||||
>>> import torch
|
||||
>>> import numpy as np
|
||||
>>> from PIL import Image
|
||||
>>> import requests
|
||||
|
||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
>>> image = Image.open(requests.get(url, stream=True).raw)
|
||||
|
||||
image_processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti")
|
||||
model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti")
|
||||
>>> image_processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti")
|
||||
>>> model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti")
|
||||
|
||||
# prepare image for the model
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
>>> # prepare image for the model
|
||||
>>> inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
with torch.no_grad():
|
||||
outputs = model(**inputs)
|
||||
predicted_depth = outputs.predicted_depth
|
||||
>>> with torch.no_grad():
|
||||
...     outputs = model(**inputs)
|
||||
|
||||
# interpolate to original size
|
||||
prediction = torch.nn.functional.interpolate(
|
||||
predicted_depth.unsqueeze(1),
|
||||
size=image.size[::-1],
|
||||
mode="bicubic",
|
||||
align_corners=False,
|
||||
)
|
||||
>>> # interpolate to original size and visualize the prediction
|
||||
>>> ## ZoeDepth dynamically pads the input image. Thus we pass the original image size as argument
|
||||
>>> ## to `post_process_depth_estimation` to remove the padding and resize to original dimensions.
|
||||
>>> post_processed_output = image_processor.post_process_depth_estimation(
|
||||
... outputs,
|
||||
... source_sizes=[(image.height, image.width)],
|
||||
... )
|
||||
|
||||
# visualize the prediction
|
||||
output = prediction.squeeze().cpu().numpy()
|
||||
formatted = (output * 255 / np.max(output)).astype("uint8")
|
||||
depth = Image.fromarray(formatted)
|
||||
>>> predicted_depth = post_processed_output[0]["predicted_depth"]
|
||||
>>> depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
|
||||
>>> depth = depth.detach().cpu().numpy() * 255
|
||||
>>> depth = Image.fromarray(depth.astype("uint8"))
|
||||
```
|
||||
|
||||
<Tip>
|
||||
<p>In the <a href="https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/models/depth_model.py#L131">original implementation</a> ZoeDepth model performs inference on both the original and flipped images and averages out the results. The <code>post_process_depth_estimation</code> function can handle this for us by passing the flipped outputs to the optional <code>outputs_flipped</code> argument:</p>
|
||||
<pre><code class="language-Python">>>> with torch.no_grad():
|
||||
...     outputs = model(**inputs)
|
||||
... outputs_flipped = model(pixel_values=torch.flip(inputs.pixel_values, dims=[3]))
|
||||
>>> post_processed_output = image_processor.post_process_depth_estimation(
|
||||
... outputs,
|
||||
... source_sizes=[(image.height, image.width)],
|
||||
... outputs_flipped=outputs_flipped,
|
||||
... )
|
||||
</code></pre>
|
||||
</Tip>
|
||||
|
||||
## Resources
|
||||
|
||||
A list of official Hugging Face and community (indicated by ЁЯМО) resources to help you get started with ZoeDepth.
|
||||
|
||||
@ -43,7 +43,7 @@ As a result, you can load a specific model version with the `revision` parameter
|
||||
|
||||
```py
|
||||
>>> model = AutoModel.from_pretrained(
|
||||
... "julien-c/EsperBERTo-small", revision="v2.0.1" # tag name, or branch name, or commit hash
|
||||
... "julien-c/EsperBERTo-small", revision="4c77982" # tag name, or branch name, or commit hash
|
||||
... )
|
||||
```
|
||||
|
||||
|
||||
docs/source/en/perf_infer_gpu_multi.md (new file, 68 lines)
@ -0,0 +1,68 @@
|
||||
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
|
||||
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Multi-GPU inference
|
||||
|
||||
Built-in Tensor Parallelism (TP) is now available with certain models using PyTorch. Tensor parallelism shards a model onto multiple GPUs, enabling larger model sizes, and parallelizes computations such as matrix multiplication.
|
||||
|
||||
To enable tensor parallel, pass the argument `tp_plan="auto"` to [`~AutoModelForCausalLM.from_pretrained`]:
|
||||
|
||||
```python
|
||||
import os
|
||||
import torch
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
|
||||
|
||||
# Initialize distributed
|
||||
rank = int(os.environ["RANK"])
|
||||
device = torch.device(f"cuda:{rank}")
|
||||
torch.distributed.init_process_group("nccl", device_id=device)
|
||||
|
||||
# Retrieve tensor parallel model
|
||||
model = AutoModelForCausalLM.from_pretrained(
|
||||
model_id,
|
||||
tp_plan="auto",
|
||||
)
|
||||
|
||||
# Prepare input tokens
|
||||
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
||||
prompt = "Can I help"
|
||||
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
|
||||
|
||||
# Distributed run
|
||||
outputs = model(inputs)
|
||||
```
|
||||
|
||||
You can use `torchrun` to launch the above script with multiple processes, each mapping to a GPU:
|
||||
|
||||
```
|
||||
torchrun --nproc-per-node 4 demo.py
|
||||
```
|
||||
|
||||
PyTorch tensor parallel is currently supported for the following models:
|
||||
* [Llama](https://huggingface.co/docs/transformers/model_doc/llama#transformers.LlamaModel)
|
||||
|
||||
You can request to add tensor parallel support for another model by opening a GitHub Issue or Pull Request.
|
||||
|
||||
### Expected speedups
|
||||
|
||||
You can benefit from considerable speedups for inference, especially for inputs with large batch size or long sequences.
|
||||
|
||||
For a single forward pass on [Llama](https://huggingface.co/docs/transformers/model_doc/llama#transformers.LlamaModel) with a sequence length of 512 and various batch sizes, the expected speedup is as follows:
|
||||
|
||||
<div style="text-align: center">
|
||||
<img src="huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Meta-Llama-3-8B-Instruct, seqlen = 512, python, w_ compile.png">
|
||||
</div>
|
||||
@ -42,6 +42,7 @@ FlashAttention-2 is currently supported for the following architectures:
|
||||
* [Chameleon](https://huggingface.co/docs/transformers/model_doc/chameleon#transformers.Chameleon)
|
||||
* [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPModel)
|
||||
* [Cohere](https://huggingface.co/docs/transformers/model_doc/cohere#transformers.CohereModel)
|
||||
* [GLM](https://huggingface.co/docs/transformers/model_doc/glm#transformers.GLMModel)
|
||||
* [Dbrx](https://huggingface.co/docs/transformers/model_doc/dbrx#transformers.DbrxModel)
|
||||
* [DistilBert](https://huggingface.co/docs/transformers/model_doc/distilbert#transformers.DistilBertModel)
|
||||
* [Gemma](https://huggingface.co/docs/transformers/model_doc/gemma#transformers.GemmaModel)
|
||||
@ -70,13 +71,16 @@ FlashAttention-2 is currently supported for the following architectures:
|
||||
* [MBart](https://huggingface.co/docs/transformers/model_doc/mbart#transformers.MBartModel)
|
||||
* [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral#transformers.MistralModel)
|
||||
* [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral#transformers.MixtralModel)
|
||||
* [Moshi](https://huggingface.co/docs/transformers/model_doc/moshi#transformers.MoshiModel)
|
||||
* [Musicgen](https://huggingface.co/docs/transformers/model_doc/musicgen#transformers.MusicgenModel)
|
||||
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
|
||||
* [Nemotron](https://huggingface.co/docs/transformers/model_doc/nemotron)
|
||||
* [NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)
|
||||
* [OLMo](https://huggingface.co/docs/transformers/model_doc/olmo#transformers.OlmoModel)
|
||||
* [OLMo November 2024](https://huggingface.co/docs/transformers/model_doc/olmo_1124#transformers.Olmo1124Model)
|
||||
* [OLMoE](https://huggingface.co/docs/transformers/model_doc/olmoe#transformers.OlmoeModel)
|
||||
* [OPT](https://huggingface.co/docs/transformers/model_doc/opt#transformers.OPTModel)
|
||||
* [PaliGemma](https://huggingface.co/docs/transformers/model_doc/paligemma#transformers.PaliGemmaForConditionalGeneration)
|
||||
* [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
|
||||
* [Phi3](https://huggingface.co/docs/transformers/model_doc/phi3#transformers.Phi3Model)
|
||||
* [PhiMoE](https://huggingface.co/docs/transformers/model_doc/phimoe#transformers.PhimoeModel)
|
||||
@ -86,6 +90,10 @@ FlashAttention-2 is currently supported for the following architectures:
|
||||
* [Qwen2Audio](https://huggingface.co/docs/transformers/model_doc/qwen2_audio#transformers.Qwen2AudioEncoder)
|
||||
* [Qwen2MoE](https://huggingface.co/docs/transformers/model_doc/qwen2_moe#transformers.Qwen2MoeModel)
|
||||
* [Qwen2VL](https://huggingface.co/docs/transformers/model_doc/qwen2_vl#transformers.Qwen2VLModel)
|
||||
* [RAG](https://huggingface.co/docs/transformers/model_doc/rag#transformers.RagModel)
|
||||
* [SpeechEncoderDecoder](https://huggingface.co/docs/transformers/model_doc/speech_encoder_decoder#transformers.SpeechEncoderDecoderModel)
|
||||
* [VisionEncoderDecoder](https://huggingface.co/docs/transformers/model_doc/vision_encoder_decoder#transformers.VisionEncoderDecoderModel)
|
||||
* [VisionTextDualEncoder](https://huggingface.co/docs/transformers/model_doc/vision_text_dual_encoder#transformers.VisionTextDualEncoderModel)
|
||||
* [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperModel)
|
||||
* [Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2#transformers.Wav2Vec2Model)
|
||||
* [Hubert](https://huggingface.co/docs/transformers/model_doc/hubert#transformers.HubertModel)
|
||||
@ -215,6 +223,7 @@ For now, Transformers supports SDPA inference and training for the following arc
|
||||
* [CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert#transformers.CamembertModel)
|
||||
* [Chameleon](https://huggingface.co/docs/transformers/model_doc/chameleon#transformers.Chameleon)
|
||||
* [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPModel)
|
||||
* [GLM](https://huggingface.co/docs/transformers/model_doc/glm#transformers.GLMModel)
|
||||
* [Cohere](https://huggingface.co/docs/transformers/model_doc/cohere#transformers.CohereModel)
|
||||
* [data2vec_audio](https://huggingface.co/docs/transformers/main/en/model_doc/data2vec#transformers.Data2VecAudioModel)
|
||||
* [Dbrx](https://huggingface.co/docs/transformers/model_doc/dbrx#transformers.DbrxModel)
|
||||
@ -222,6 +231,7 @@ For now, Transformers supports SDPA inference and training for the following arc
|
||||
* [Dinov2](https://huggingface.co/docs/transformers/en/model_doc/dinov2)
|
||||
* [DistilBert](https://huggingface.co/docs/transformers/model_doc/distilbert#transformers.DistilBertModel)
|
||||
* [Dpr](https://huggingface.co/docs/transformers/model_doc/dpr#transformers.DprReader)
|
||||
* [EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder_decoder#transformers.EncoderDecoderModel)
|
||||
* [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon#transformers.FalconModel)
|
||||
* [Gemma](https://huggingface.co/docs/transformers/model_doc/gemma#transformers.GemmaModel)
|
||||
* [Gemma2](https://huggingface.co/docs/transformers/model_doc/gemma2#transformers.Gemma2Model)
|
||||
@ -230,21 +240,28 @@ For now, Transformers supports SDPA inference and training for the following arc
|
||||
* [GPTNeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox#transformers.GPTNeoXModel)
|
||||
* [Hubert](https://huggingface.co/docs/transformers/model_doc/hubert#transformers.HubertModel)
|
||||
* [Idefics](https://huggingface.co/docs/transformers/model_doc/idefics#transformers.IdeficsModel)
|
||||
* [Idefics2](https://huggingface.co/docs/transformers/model_doc/idefics2#transformers.Idefics2Model)
|
||||
* [Idefics3](https://huggingface.co/docs/transformers/model_doc/idefics3#transformers.Idefics3Model)
|
||||
* [Granite](https://huggingface.co/docs/transformers/model_doc/granite#transformers.GraniteModel)
|
||||
* [GraniteMoe](https://huggingface.co/docs/transformers/model_doc/granitemoe#transformers.GraniteMoeModel)
|
||||
* [JetMoe](https://huggingface.co/docs/transformers/model_doc/jetmoe#transformers.JetMoeModel)
|
||||
* [Jamba](https://huggingface.co/docs/transformers/model_doc/jamba#transformers.JambaModel)
|
||||
* [Llama](https://huggingface.co/docs/transformers/model_doc/llama#transformers.LlamaModel)
|
||||
* [Llava](https://huggingface.co/docs/transformers/model_doc/llava)
|
||||
* [Llava-NeXT](https://huggingface.co/docs/transformers/model_doc/llava_next)
|
||||
* [Llava-NeXT-Video](https://huggingface.co/docs/transformers/model_doc/llava_next_video)
|
||||
* [LLaVA-Onevision](https://huggingface.co/docs/transformers/model_doc/llava_onevision)
|
||||
* [M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100#transformers.M2M100Model)
|
||||
* [Mimi](https://huggingface.co/docs/transformers/model_doc/mimi)
|
||||
* [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral#transformers.MistralModel)
|
||||
* [Mllama](https://huggingface.co/docs/transformers/model_doc/mllama#transformers.MllamaForConditionalGeneration)
|
||||
* [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral#transformers.MixtralModel)
|
||||
* [Moshi](https://huggingface.co/docs/transformers/model_doc/moshi#transformers.MoshiModel)
|
||||
* [Musicgen](https://huggingface.co/docs/transformers/model_doc/musicgen#transformers.MusicgenModel)
|
||||
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
|
||||
* [NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)
|
||||
* [OLMo](https://huggingface.co/docs/transformers/model_doc/olmo#transformers.OlmoModel)
|
||||
* [OLMo November 2024](https://huggingface.co/docs/transformers/model_doc/olmo_1124#transformers.Olmo1124Model)
|
||||
* [OLMoE](https://huggingface.co/docs/transformers/model_doc/olmoe#transformers.OlmoeModel)
|
||||
* [OPT](https://huggingface.co/docs/transformers/en/model_doc/opt)
|
||||
* [PaliGemma](https://huggingface.co/docs/transformers/model_doc/paligemma#transformers.PaliGemmaForConditionalGeneration)
|
||||
@ -273,11 +290,17 @@ For now, Transformers supports SDPA inference and training for the following arc
|
||||
* [Musicgen](https://huggingface.co/docs/transformers/model_doc/musicgen#transformers.MusicgenModel)
|
||||
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
|
||||
* [Nemotron](https://huggingface.co/docs/transformers/model_doc/nemotron)
|
||||
* [SpeechEncoderDecoder](https://huggingface.co/docs/transformers/model_doc/speech_encoder_decoder#transformers.SpeechEncoderDecoderModel)
|
||||
* [VideoLlava](https://huggingface.co/docs/transformers/model_doc/video_llava)
|
||||
* [VipLlava](https://huggingface.co/docs/transformers/model_doc/vipllava)
|
||||
* [VisionEncoderDecoder](https://huggingface.co/docs/transformers/model_doc/vision_encoder_decoder#transformers.VisionEncoderDecoderModel)
|
||||
* [ViT](https://huggingface.co/docs/transformers/model_doc/vit#transformers.ViTModel)
|
||||
* [ViTHybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid#transformers.ViTHybridModel)
|
||||
* [ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae#transformers.ViTMAEModel)
|
||||
* [ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn#transformers.ViTMSNModel)
|
||||
* [VisionTextDualEncoder](https://huggingface.co/docs/transformers/model_doc/vision_text_dual_encoder#transformers.VisionTextDualEncoderModel)
|
||||
* [VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae#transformers.VideoMAEModell)
|
||||
* [ViViT](https://huggingface.co/docs/transformers/model_doc/vivit#transformers.VivitModel)
|
||||
* [wav2vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2#transformers.Wav2Vec2Model)
|
||||
* [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperModel)
|
||||
* [XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta#transformers.XLMRobertaModel)
|
||||
|
||||
@ -18,11 +18,11 @@ rendered properly in your Markdown viewer.
|
||||
This guide focuses on training large models efficiently on CPU.
|
||||
|
||||
## Mixed precision with IPEX
|
||||
Mixed precision uses single (fp32) and half-precision (bf16/fp16) data types in a model to accelerate training or inference while still preserving much of the single-precision accuracy. Modern CPUs such as 3rd and 4th Gen Intel┬о Xeon┬о Scalable processors natively support bf16, so you should get more performance out of the box by enabling mixed precision training with bf16.
|
||||
Mixed precision uses single (fp32) and half-precision (bf16/fp16) data types in a model to accelerate training or inference while still preserving much of the single-precision accuracy. Modern CPUs such as 3rd, 4th, and 5th Gen Intel® Xeon® Scalable processors natively support bf16. 6th Gen Intel® Xeon® Scalable processors natively support bf16 and fp16. You should get more performance out of the box by enabling mixed precision training with bf16 or fp16.
|
||||
|
||||
To further maximize training performance, you can use Intel® Extension for PyTorch (IPEX), which is a library built on PyTorch and adds additional CPU instruction level architecture (ISA) level support such as Intel® Advanced Vector Extensions 512 Vector Neural Network Instructions (Intel® AVX512-VNNI), and Intel® Advanced Matrix Extensions (Intel® AMX) for an extra performance boost on Intel CPUs. However, CPUs with only AVX2 (e.g., AMD or older Intel CPUs) are not guaranteed to have better performance under IPEX.
|
||||
|
||||
Auto Mixed Precision (AMP) for CPU backends has been enabled since PyTorch 1.10. AMP support for bf16 on CPUs and bf16 operator optimization is also supported in IPEX and partially upstreamed to the main PyTorch branch. You can get better performance and user experience with IPEX AMP.
|
||||
Auto Mixed Precision (AMP) for CPU backends has been enabled since PyTorch 1.10. AMP support for bf16/fp16 on CPUs and bf16/fp16 operator optimization is also supported in IPEX and partially upstreamed to the main PyTorch branch. You can get better performance and user experience with IPEX AMP.
|
||||
|
||||
Check more detailed information for [Auto Mixed Precision](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/features/amp.html).
|
||||
|
||||
@ -32,10 +32,10 @@ IPEX release is following PyTorch, to install via pip:
|
||||
|
||||
| PyTorch Version | IPEX version |
|
||||
| :---------------: | :----------: |
|
||||
| 2.1.x | 2.1.100+cpu |
|
||||
| 2.0.x | 2.0.100+cpu |
|
||||
| 1.13 | 1.13.0+cpu |
|
||||
| 1.12 | 1.12.300+cpu |
|
||||
| 2.5.0 | 2.5.0+cpu |
|
||||
| 2.4.0 | 2.4.0+cpu |
|
||||
| 2.3.0 | 2.3.0+cpu |
|
||||
| 2.2.0 | 2.2.0+cpu |
|
||||
|
||||
Please run `pip list | grep torch` to get your `pytorch_version`, so you can get the `IPEX version_name`.
|
||||
```bash
|
||||
@ -46,7 +46,7 @@ You can check the latest versions in [ipex-whl-stable-cpu](https://developer.int
|
||||
Check more approaches for [IPEX installation](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/installation.html).
|
||||
|
||||
### Usage in Trainer
|
||||
To enable auto mixed precision with IPEX in Trainer, users should add `use_ipex`, `bf16` and `no_cuda` in training command arguments.
|
||||
To enable auto mixed precision with IPEX in Trainer, users should add `use_ipex`, `bf16` or `fp16`, and `no_cuda` in training command arguments.
|
||||
|
||||
Take an example of the use cases on [Transformers question-answering](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering)
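
For illustration, a command along the following lines enables IPEX with bf16 auto mixed precision on the question-answering example script; treat it as a sketch rather than the exact command from the example:

```bash
python run_qa.py \
  --model_name_or_path google-bert/bert-base-uncased \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --output_dir /tmp/debug_squad \
  --use_ipex \
  --bf16 \
  --no_cuda
```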
|
||||
|
||||
|
||||
@ -30,46 +30,32 @@ Check more detailed information for [oneccl_bind_pt](https://github.com/intel/to
|
||||
|
||||
Wheel files are available for the following Python versions:
|
||||
|
||||
| Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 |
|
||||
| :---------------: | :--------: | :--------: | :--------: | :--------: | :---------: |
|
||||
| 2.1.0 | | тИЪ | тИЪ | тИЪ | тИЪ |
|
||||
| 2.0.0 | | тИЪ | тИЪ | тИЪ | тИЪ |
|
||||
| 1.13.0 | | тИЪ | тИЪ | тИЪ | тИЪ |
|
||||
| 1.12.100 | | тИЪ | тИЪ | тИЪ | тИЪ |
|
||||
| 1.12.0 | | тИЪ | тИЪ | тИЪ | тИЪ |
|
||||
| Extension Version | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 | Python 3.11 |
| :---------------: | :--------: | :--------: | :--------: | :---------: | :---------: |
| 2.5.0 |  | √ | √ | √ | √ |
| 2.4.0 |  | √ | √ | √ | √ |
| 2.3.0 |  | √ | √ | √ | √ |
| 2.2.0 |  | √ | √ | √ | √ |
|
||||
|
||||
Please run `pip list | grep torch` to get your `pytorch_version`.
|
||||
```bash
|
||||
pip install oneccl_bind_pt=={pytorch_version} -f https://developer.intel.com/ipex-whl-stable-cpu
|
||||
```
|
||||
where `{pytorch_version}` should be your PyTorch version, for instance 2.1.0.
|
||||
where `{pytorch_version}` should be your PyTorch version, for instance 2.4.0.
|
||||
Check more approaches for [oneccl_bind_pt installation](https://github.com/intel/torch-ccl).
|
||||
Versions of oneCCL and PyTorch must match.
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
oneccl_bindings_for_pytorch 1.12.0 prebuilt wheel does not work with PyTorch 1.12.1 (it is for PyTorch 1.12.0)
|
||||
PyTorch 1.12.1 should work with oneccl_bindings_for_pytorch 1.12.100
|
||||
|
||||
</Tip>
|
||||
|
||||
## Intel┬о MPI library
|
||||
Use this standards-based MPI implementation to deliver flexible, efficient, scalable cluster messaging on Intel┬о architecture. This component is part of the Intel┬о oneAPI HPC Toolkit.
|
||||
|
||||
oneccl_bindings_for_pytorch is installed along with the MPI tool set. Need to source the environment before using it.
|
||||
|
||||
for Intel┬о oneCCL >= 1.12.0
|
||||
```bash
|
||||
oneccl_bindings_for_pytorch_path=$(python -c "from oneccl_bindings_for_pytorch import cwd; print(cwd)")
|
||||
source $oneccl_bindings_for_pytorch_path/env/setvars.sh
|
||||
```
|
||||
|
||||
for Intel┬о oneCCL whose version < 1.12.0
|
||||
```bash
|
||||
torch_ccl_path=$(python -c "import torch; import torch_ccl; import os; print(os.path.abspath(os.path.dirname(torch_ccl.__file__)))")
|
||||
source $torch_ccl_path/env/setvars.sh
|
||||
```
|
||||
|
||||
#### Intel┬о Extension for PyTorch installation
|
||||
|
||||
Intel Extension for PyTorch (IPEX) provides performance optimizations for CPU training with both Float32 and BFloat16 (refer to the [single CPU section](./perf_train_cpu) to learn more).
|
||||
@ -155,7 +141,7 @@ This example assumes that you have:
|
||||
The snippet below is an example of a Dockerfile that uses a base image that supports distributed CPU training and then
|
||||
extracts a Transformers release to the `/workspace` directory, so that the example scripts are included in the image:
|
||||
```dockerfile
|
||||
FROM intel/intel-optimized-pytorch:2.3.0-pip-multinode
|
||||
FROM intel/intel-optimized-pytorch:2.4.0-pip-multinode
|
||||
|
||||
RUN apt-get update -y && \
|
||||
apt-get install -y --no-install-recommends --fix-missing \
|
||||
@ -165,7 +151,7 @@ RUN apt-get update -y && \
|
||||
WORKDIR /workspace
|
||||
|
||||
# Download and extract the transformers code
|
||||
ARG HF_TRANSFORMERS_VER="4.44.0"
|
||||
ARG HF_TRANSFORMERS_VER="4.46.0"
|
||||
RUN pip install --no-cache-dir \
|
||||
transformers==${HF_TRANSFORMERS_VER} && \
|
||||
mkdir transformers && \
|
||||
@ -319,4 +305,4 @@ with the job, the PyTorchJob resource can be deleted from the cluster using `kub
|
||||
|
||||
This guide covered running distributed PyTorch training jobs using multiple CPUs on bare metal and on a Kubernetes
|
||||
cluster. Both cases utilize Intel Extension for PyTorch and Intel oneCCL Bindings for PyTorch for optimal training
|
||||
performance, and can be used as a template to run your own workload on multiple nodes.
|
||||
performance, and can be used as a template to run your own workload on multiple nodes.
|
||||
|
||||
@ -53,7 +53,7 @@ sections we go through the steps to run inference on CPU and single/multi-GPU se
|
||||
|
||||
* [Inference on a single CPU](perf_infer_cpu)
|
||||
* [Inference on a single GPU](perf_infer_gpu_one)
|
||||
* [Multi-GPU inference](perf_infer_gpu_one)
|
||||
* [Multi-GPU inference](perf_infer_gpu_multi)
|
||||
* [XLA Integration for TensorFlow Models](tf_xla)
|
||||
|
||||
|
||||
|
||||
@ -107,7 +107,8 @@ max_length = model.config.n_positions
|
||||
stride = 512
|
||||
seq_len = encodings.input_ids.size(1)
|
||||
|
||||
nlls = []
|
||||
nll_sum = 0.0
|
||||
n_tokens = 0
|
||||
prev_end_loc = 0
|
||||
for begin_loc in tqdm(range(0, seq_len, stride)):
|
||||
end_loc = min(begin_loc + max_length, seq_len)
|
||||
@ -124,13 +125,19 @@ for begin_loc in tqdm(range(0, seq_len, stride)):
|
||||
# to the left by 1.
|
||||
neg_log_likelihood = outputs.loss
|
||||
|
||||
nlls.append(neg_log_likelihood)
|
||||
# Accumulate the total negative log-likelihood and the total number of tokens
|
||||
num_valid_tokens = (target_ids != -100).sum().item() # number of valid tokens in target_ids
|
||||
batch_size = target_ids.size(0)
|
||||
num_loss_tokens = num_valid_tokens - batch_size # subtract batch_size due to internal label shift
|
||||
nll_sum += neg_log_likelihood * num_loss_tokens
|
||||
n_tokens += num_loss_tokens
|
||||
|
||||
prev_end_loc = end_loc
|
||||
if end_loc == seq_len:
|
||||
break
|
||||
|
||||
ppl = torch.exp(torch.stack(nlls).mean())
|
||||
avg_nll = nll_sum / n_tokens # average negative log-likelihood per token
|
||||
ppl = torch.exp(avg_nll)
|
||||
```
|
||||
|
||||
Running this with the stride length equal to the max input length is equivalent to the suboptimal, non-sliding-window
|
||||
@ -139,5 +146,5 @@ and the better the reported perplexity will typically be.
|
||||
|
||||
When we run the above with `stride = 1024`, i.e. no overlap, the resulting PPL is `19.44`, which is about the same
|
||||
as the `19.93` reported in the GPT-2 paper. By using `stride = 512` and thereby employing our striding window
|
||||
strategy, this jumps down to `16.45`. This is not only a more favorable score, but is calculated in a way that is
|
||||
strategy, this jumps down to `16.44`. This is not only a more favorable score, but is calculated in a way that is
|
||||
closer to the true autoregressive decomposition of a sequence likelihood.
|
||||
|
||||
@ -45,19 +45,19 @@ In short, supporting a wide range of quantization methods allows you to pick the
|
||||
|
||||
Use the table below to help you decide which quantization method to use.
|
||||
|
||||
| Quantization method | On the fly quantization | CPU | CUDA GPU | RoCm GPU (AMD) | Metal (Apple Silicon) | torch.compile() support | Number of bits | Supports fine-tuning (through PEFT) | Serializable with ЁЯдЧ transformers | ЁЯдЧ transformers support | Link to library |
|
||||
|-------------------------------------|-------------------------|-----|----------|----------------|-----------------------|-------------------------|----------------|-------------------------------------|--------------|------------------------|---------------------------------------------|
|
||||
| [AQLM](./aqlm) | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | ЁЯЯв | 1 / 2 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/Vahe1994/AQLM |
|
||||
| [AWQ](./awq) | ЁЯФ┤ | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ? | 4 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/casper-hansen/AutoAWQ |
|
||||
| [bitsandbytes](./bitsandbytes) | ЁЯЯв | ЁЯЯб * | ЁЯЯв | ЁЯЯб * | ЁЯФ┤ ** | ЁЯФ┤ (soon!) | 4 / 8 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/bitsandbytes-foundation/bitsandbytes |
|
||||
| [compressed-tensors](./compressed_tensors) | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | 1 - 8 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/neuralmagic/compressed-tensors |
|
||||
| [EETQ](./eetq) | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | ? | 8 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/NetEase-FuXi/EETQ |
|
||||
| GGUF / GGML (llama.cpp) | ЁЯЯв | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | ЁЯФ┤ | 1 - 8 | ЁЯФ┤ | [See GGUF section](../gguf) | [See GGUF section](../gguf) | https://github.com/ggerganov/llama.cpp |
|
||||
| [GPTQ](./gptq) | ЁЯФ┤ | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | 2 - 3 - 4 - 8 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/AutoGPTQ/AutoGPTQ |
|
||||
| [HQQ](./hqq) | ЁЯЯв | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | ЁЯЯв | 1 - 8 | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | https://github.com/mobiusml/hqq/ |
|
||||
| [Quanto](./quanto) | ЁЯЯв | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | 2 / 4 / 8 | ЁЯФ┤ | ЁЯФ┤ | ЁЯЯв | https://github.com/huggingface/quanto |
|
||||
| [FBGEMM_FP8](./fbgemm_fp8.md) | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | ЁЯФ┤ | 8 | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | https://github.com/pytorch/FBGEMM |
|
||||
| [torchao](./torchao.md) | ЁЯЯв | | ЁЯЯв | ЁЯФ┤ | partial support (int4 weight only) | | 4 / 8 | | ЁЯЯвЁЯФ┤ | ЁЯЯв | https://github.com/pytorch/ao |
|
||||
| Quantization method | On the fly quantization | CPU | CUDA GPU | RoCm GPU (AMD) | Metal (Apple Silicon) | Intel GPU | torch.compile() support | Number of bits | Supports fine-tuning (through PEFT) | Serializable with ЁЯдЧ transformers | ЁЯдЧ transformers support | Link to library |
|
||||
|-------------------------------------|-------------------------|-----|----------|----------------|-----------------------|-----------|-------------------------|----------------|-------------------------------------|--------------|------------------------|---------------------------------------------|
|
||||
| [AQLM](./aqlm) | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | ЁЯФ┤ | ЁЯЯв | 1 / 2 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/Vahe1994/AQLM |
|
||||
| [AWQ](./awq) | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | ? | 4 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/casper-hansen/AutoAWQ |
|
||||
| [bitsandbytes](./bitsandbytes) | ЁЯЯв | ЁЯЯб * | ЁЯЯв | ЁЯЯб * | ЁЯФ┤ ** | ЁЯЯб * | ЁЯФ┤ (soon!) | 4 / 8 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/bitsandbytes-foundation/bitsandbytes |
|
||||
| [compressed-tensors](./compressed_tensors) | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | ЁЯФ┤ | 1 - 8 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/neuralmagic/compressed-tensors |
|
||||
| [EETQ](./eetq) | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | ЁЯФ┤ | ? | 8 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/NetEase-FuXi/EETQ |
|
||||
| GGUF / GGML (llama.cpp) | ЁЯЯв | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | 1 - 8 | ЁЯФ┤ | [See GGUF section](../gguf) | [See GGUF section](../gguf) | https://github.com/ggerganov/llama.cpp |
|
||||
| [GPTQ](./gptq) | ЁЯФ┤ | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | ЁЯФ┤ | 2 - 3 - 4 - 8 | ЁЯЯв | ЁЯЯв | ЁЯЯв | https://github.com/AutoGPTQ/AutoGPTQ |
|
||||
| [HQQ](./hqq) | ЁЯЯв | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | ЁЯФ┤ | ЁЯЯв | 1 - 8 | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | https://github.com/mobiusml/hqq/ |
|
||||
| [optimum-quanto](./quanto) | ЁЯЯв | ЁЯЯв | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | 2 / 4 / 8 | ЁЯФ┤ | ЁЯФ┤ | ЁЯЯв | https://github.com/huggingface/optimum-quanto |
|
||||
| [FBGEMM_FP8](./fbgemm_fp8.md) | ЁЯЯв | ЁЯФ┤ | ЁЯЯв | ЁЯФ┤ | ЁЯФ┤ | ЁЯФ┤ | ЁЯФ┤ | 8 | ЁЯФ┤ | ЁЯЯв | ЁЯЯв | https://github.com/pytorch/FBGEMM |
|
||||
| [torchao](./torchao.md) | ЁЯЯв | | ЁЯЯв | ЁЯФ┤ | partial support (int4 weight only) | ЁЯФ┤ | | 4 / 8 | | ЁЯЯвЁЯФ┤ | ЁЯЯв | https://github.com/pytorch/ao |
|
||||
|
||||
<Tip>
|
||||
|
||||
|
||||
@ -14,21 +14,21 @@ rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Quanto
|
||||
# Optimum-quanto
|
||||
|
||||
<Tip>
|
||||
|
||||
Try Quanto + transformers with this [notebook](https://colab.research.google.com/drive/16CXfVmtdQvciSh9BopZUDYcmXCDpvgrT?usp=sharing)!
|
||||
Try optimum-quanto + transformers with this [notebook](https://colab.research.google.com/drive/16CXfVmtdQvciSh9BopZUDYcmXCDpvgrT?usp=sharing)!
|
||||
|
||||
</Tip>
|
||||
|
||||
|
||||
[ЁЯдЧ Quanto](https://github.com/huggingface/quanto) library is a versatile pytorch quantization toolkit. The quantization method used is the linear quantization. Quanto provides several unique features such as:
|
||||
[🤗 optimum-quanto](https://github.com/huggingface/optimum-quanto) library is a versatile pytorch quantization toolkit. The quantization method used is the linear quantization. Quanto provides several unique features such as:
|
||||
|
||||
- weights quantization (`float8`,`int8`,`int4`,`int2`)
|
||||
- activation quantization (`float8`,`int8`)
|
||||
- modality agnostic (e.g CV,LLM)
|
||||
- device agnostic (e.g CUDA,MPS,CPU)
|
||||
- device agnostic (e.g CUDA,XPU,MPS,CPU)
|
||||
- compatibility with `torch.compile`
|
||||
- easy to add custom kernel for specific device
|
||||
- supports quantization aware training
|
||||
@ -37,12 +37,12 @@ Try Quanto + transformers with this [notebook](https://colab.research.google.com
|
||||
Before you begin, make sure the following libraries are installed:
|
||||
|
||||
```bash
|
||||
pip install quanto accelerate transformers
|
||||
pip install optimum-quanto accelerate transformers
|
||||
```
|
||||
|
||||
Now you can quantize a model by passing [`QuantoConfig`] object in the [`~PreTrainedModel.from_pretrained`] method. This works for any model in any modality, as long as it contains `torch.nn.Linear` layers.
|
||||
|
||||
The integration with transformers only supports weights quantization. For the more complex use case such as activation quantization, calibration and quantization aware training, you should use [quanto](https://github.com/huggingface/quanto) library instead.
|
||||
The integration with transformers only supports weights quantization. For the more complex use case such as activation quantization, calibration and quantization aware training, you should use [optimum-quanto](https://github.com/huggingface/optimum-quanto) library instead.
|
||||
|
||||
```py
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer, QuantoConfig
|
||||
@ -55,7 +55,7 @@ quantized_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cud
|
||||
|
||||
Note that serialization is not supported yet with transformers but it is coming soon! If you want to save the model, you can use quanto library instead.
|
||||
|
||||
Quanto library uses linear quantization algorithm for quantization. Even though this is a basic quantization technique, we get very good results! Have a look at the following benchmark (llama-2-7b on perplexity metric). You can find more benchmarks [here](https://github.com/huggingface/quanto/tree/main/bench/generation)
|
||||
Optimum-quanto library uses linear quantization algorithm for quantization. Even though this is a basic quantization technique, we get very good results! Have a look at the following benchmark (llama-2-7b on perplexity metric). You can find more benchmarks [here](https://github.com/huggingface/optimum-quanto/tree/main/bench/generation)
|
||||
|
||||
<div class="flex gap-4">
|
||||
<div>
|
||||
|
||||
@ -360,8 +360,8 @@ One particularly cool ЁЯдЧ Transformers feature is the ability to save a model a
|
||||
```py
|
||||
>>> from transformers import AutoModel
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
|
||||
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
|
||||
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
|
||||
```
|
||||
</pt>
|
||||
<tf>
|
||||
@ -369,8 +369,8 @@ One particularly cool ЁЯдЧ Transformers feature is the ability to save a model a
|
||||
```py
|
||||
>>> from transformers import TFAutoModel
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
|
||||
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
|
||||
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
|
||||
```
|
||||
</tf>
|
||||
</frameworkcontent>
|
||||
|
||||
@ -386,9 +386,9 @@ The use and prompting for the conversational use is very similar to using the ba
|
||||
```py
|
||||
>>> import torch
|
||||
>>> from transformers import IdeficsForVisionText2Text, AutoProcessor
|
||||
>>> from accelerate.test_utils.testing import get_backend
|
||||
|
||||
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
|
||||
|
||||
>>> device, _, _ = get_backend() # automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
|
||||
>>> checkpoint = "HuggingFaceM4/idefics-9b-instruct"
|
||||
>>> model = IdeficsForVisionText2Text.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to(device)
|
||||
>>> processor = AutoProcessor.from_pretrained(checkpoint)
|
||||
|
||||
@ -256,8 +256,9 @@ image
|
||||
Prepare image for the model.
|
||||
|
||||
```python
|
||||
device = "cuda" if torch.cuda.is_available() else "cpu"
|
||||
|
||||
from accelerate.test_utils.testing import get_backend
|
||||
# automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
|
||||
device, _, _ = get_backend()
|
||||
inputs = processor(images=image, return_tensors="pt").to(device)
|
||||
pixel_values = inputs.pixel_values
|
||||
```
|
||||
|
||||
@ -26,7 +26,7 @@ after a natural disaster, monitoring crop health, or helping screen medical imag
|
||||
|
||||
This guide illustrates how to:
|
||||
|
||||
1. Fine-tune [ViT](model_doc/vit) on the [Food-101](https://huggingface.co/datasets/food101) dataset to classify a food item in an image.
|
||||
1. Fine-tune [ViT](../model_doc/vit) on the [Food-101](https://huggingface.co/datasets/food101) dataset to classify a food item in an image.
|
||||
2. Use your fine-tuned model for inference.
|
||||
|
||||
<Tip>
|
||||
|
||||
@@ -43,8 +43,9 @@ Let's see the pipeline in action. First, initialize the pipeline. If you don't p
```python
import torch
from transformers import pipeline

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
from accelerate.test_utils.testing import get_backend
# automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
DEVICE, _, _ = get_backend()
pipe = pipeline(task="image-feature-extraction", model_name="google/vit-base-patch16-384", device=DEVICE, pool=True)
```
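As a rough illustration of how the pooled features from this pipeline might be compared, here is a short sketch; the image file names are placeholders and the cosine-similarity step is an assumption for illustration, not part of the diff above.

```python
import numpy as np

# placeholder inputs; any two images (paths, URLs, or PIL images) work here
images = ["cat_1.png", "cat_2.png"]
outputs = pipe(images)  # with pool=True, one pooled embedding per image

emb1 = np.array(outputs[0]).ravel()
emb2 = np.array(outputs[1]).ravel()
cosine_similarity = float(emb1 @ emb2 / (np.linalg.norm(emb1) * np.linalg.norm(emb2)))
print(cosine_similarity)
```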
@@ -120,6 +120,46 @@ print(generated_texts)
## ['User: What do we see in this image? \nAssistant: In this image we can see two cats on the nets. \nUser: And how about this image? \nAssistant: In this image we can see flowers, plants and insect.']
```

## Pipeline

The fastest way to get started is to use the [`Pipeline`] API. Specify the `"image-text-to-text"` task and the model you want to use.

```python
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf")
```

The example below uses chat templates to format the text inputs.

```python
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "There's a pink flower"},
        ],
    },
]
```

Pass the chat template formatted text and image to [`Pipeline`] and set `return_full_text=False` to remove the input from the generated output.

```python
outputs = pipe(text=messages, max_new_tokens=20, return_full_text=False)
outputs[0]["generated_text"]
# with a yellow center in the foreground. The flower is surrounded by red and white flowers with green stems
```

## Streaming

We can use [text streaming](./generation_strategies#streaming) for a better generation experience. Transformers supports streaming with the [`TextStreamer`] or [`TextIteratorStreamer`] classes. We will use the [`TextIteratorStreamer`] with IDEFICS-8B.
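For reference, a minimal sketch of how streaming with [`TextIteratorStreamer`] typically looks; `model`, `processor`, and `inputs` are assumed to be prepared as in the surrounding guide, and the generation settings are illustrative only.

```python
from threading import Thread
from transformers import TextIteratorStreamer

# stream decoded text as it is generated instead of waiting for the full output
streamer = TextIteratorStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
generation_kwargs = dict(**inputs, streamer=streamer, max_new_tokens=128)

thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()
for new_text in streamer:
    print(new_text, end="")
thread.join()
```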
@@ -37,8 +37,9 @@ We can now initialize the pipeline with a [Swin2SR model](https://huggingface.co
```python
from transformers import pipeline
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
from accelerate.test_utils.testing import get_backend
# automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
device, _, _ = get_backend()
pipe = pipeline(task="image-to-image", model="caidas/swin2SR-lightweight-x2-64", device=device)
```
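A short sketch of calling the pipeline once it is initialized; the file names are placeholders, and the return type noted in the comment is the general image-to-image pipeline behaviour rather than anything specific to this diff.

```python
from PIL import Image

low_res = Image.open("low_res_input.jpg")  # placeholder local file
upscaled = pipe(low_res)                   # the pipeline returns an upscaled PIL image
upscaled.save("upscaled_output.jpg")
```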
@@ -58,7 +58,7 @@ from transformers import TrainingArguments, Trainer
import torch
import torch.nn as nn
import torch.nn.functional as F

from accelerate.test_utils.testing import get_backend

class ImageDistilTrainer(Trainer):
    def __init__(self, teacher_model=None, student_model=None, temperature=None, lambda_param=None, *args, **kwargs):
@@ -66,7 +66,7 @@ class ImageDistilTrainer(Trainer):
        self.teacher = teacher_model
        self.student = student_model
        self.loss_function = nn.KLDivLoss(reduction="batchmean")
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        device, _, _ = get_backend() # automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
        self.teacher.to(device)
        self.teacher.eval()
        self.temperature = temperature
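To make the distillation idea concrete, here is a rough sketch of the kind of `compute_loss` such a trainer defines; it assumes `self.lambda_param` is stored alongside `self.temperature` in `__init__`, and it is an illustration rather than the exact code from the guide.

```python
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        student_output = model(**inputs)
        with torch.no_grad():
            teacher_output = self.teacher(**inputs)

        # soften both distributions with the temperature, then compare them with KL divergence
        soft_teacher = F.softmax(teacher_output.logits / self.temperature, dim=-1)
        soft_student = F.log_softmax(student_output.logits / self.temperature, dim=-1)
        distillation_loss = self.loss_function(soft_student, soft_teacher) * (self.temperature ** 2)

        # blend the hard-label student loss with the distillation loss
        student_loss = student_output.loss
        loss = (1.0 - self.lambda_param) * student_loss + self.lambda_param * distillation_loss
        return (loss, student_output) if return_outputs else loss
```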
@@ -125,9 +125,9 @@ the processor.
```python
from transformers import SamModel, SamProcessor
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

from accelerate.test_utils.testing import get_backend
# automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
device, _, _ = get_backend()
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
```
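As a quick orientation, a rough sketch of prompting SAM with a single 2D point once the model and processor above are loaded; `image` is assumed to be a PIL image prepared earlier, and the coordinate is purely illustrative.

```python
input_points = [[[450, 600]]]  # one (x, y) point prompt, example value only
inputs = processor(image, input_points=input_points, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

# resize and binarize the predicted masks back to the original image size
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
scores = outputs.iou_scores
```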
@@ -53,8 +53,9 @@ Instantiate a pipeline from a [checkpoint on the Hugging Face Hub](https://huggi
```py
>>> from transformers import pipeline
>>> import torch

>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> from accelerate.test_utils.testing import get_backend
# automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
>>> device, _, _ = get_backend()
>>> checkpoint = "depth-anything/Depth-Anything-V2-base-hf"
>>> pipe = pipeline("depth-estimation", model=checkpoint, device=device)
```
@@ -126,97 +127,34 @@ Pass the prepared inputs through the model:
...     outputs = model(pixel_values)
```

Let's post-process and visualize the results.

We need to pad and then resize the outputs so that predicted depth map has the same dimension as the original image. After resizing we will remove the padded regions from the depth.
Let's post-process the results to remove any padding and resize the depth map to match the original image size. The `post_process_depth_estimation` outputs a list of dicts containing the `"predicted_depth"`.

```py
>>> import numpy as np
>>> import torch.nn.functional as F
>>> # ZoeDepth dynamically pads the input image. Thus we pass the original image size as argument
>>> # to `post_process_depth_estimation` to remove the padding and resize to original dimensions.
>>> post_processed_output = image_processor.post_process_depth_estimation(
...     outputs,
...     source_sizes=[(image.height, image.width)],
... )

>>> predicted_depth = outputs.predicted_depth.unsqueeze(dim=1)
>>> height, width = pixel_values.shape[2:]

>>> height_padding_factor = width_padding_factor = 3
>>> pad_h = int(np.sqrt(height/2) * height_padding_factor)
>>> pad_w = int(np.sqrt(width/2) * width_padding_factor)

>>> if predicted_depth.shape[-2:] != pixel_values.shape[-2:]:
>>>    predicted_depth = F.interpolate(predicted_depth, size= (height, width), mode='bicubic', align_corners=False)

>>> if pad_h > 0:
        predicted_depth = predicted_depth[:, :, pad_h:-pad_h,:]
>>> if pad_w > 0:
        predicted_depth = predicted_depth[:, :, :, pad_w:-pad_w]
>>> predicted_depth = post_processed_output[0]["predicted_depth"]
>>> depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
>>> depth = depth.detach().cpu().numpy() * 255
>>> depth = Image.fromarray(depth.astype("uint8"))
```

We can now visualize the results (the function below is taken from the [GaussianObject](https://github.com/GaussianObject/GaussianObject/blob/ad6629efadb57902d5f8bc0fa562258029a4bdf1/pred_monodepth.py#L11) framework).

```py
import matplotlib

def colorize(value, vmin=None, vmax=None, cmap='gray_r', invalid_val=-99, invalid_mask=None, background_color=(128, 128, 128, 255), gamma_corrected=False, value_transform=None):
    """Converts a depth map to a color image.

    Args:
        value (torch.Tensor, numpy.ndarray): Input depth map. Shape: (H, W) or (1, H, W) or (1, 1, H, W). All singular dimensions are squeezed
        vmin (float, optional): vmin-valued entries are mapped to start color of cmap. If None, value.min() is used. Defaults to None.
        vmax (float, optional): vmax-valued entries are mapped to end color of cmap. If None, value.max() is used. Defaults to None.
        cmap (str, optional): matplotlib colormap to use. Defaults to 'magma_r'.
        invalid_val (int, optional): Specifies value of invalid pixels that should be colored as 'background_color'. Defaults to -99.
        invalid_mask (numpy.ndarray, optional): Boolean mask for invalid regions. Defaults to None.
        background_color (tuple[int], optional): 4-tuple RGB color to give to invalid pixels. Defaults to (128, 128, 128, 255).
        gamma_corrected (bool, optional): Apply gamma correction to colored image. Defaults to False.
        value_transform (Callable, optional): Apply transform function to valid pixels before coloring. Defaults to None.

    Returns:
        numpy.ndarray, dtype - uint8: Colored depth map. Shape: (H, W, 4)
    """
    if isinstance(value, torch.Tensor):
        value = value.detach().cpu().numpy()

    value = value.squeeze()
    if invalid_mask is None:
        invalid_mask = value == invalid_val
    mask = np.logical_not(invalid_mask)

    # normalize
    vmin = np.percentile(value[mask],2) if vmin is None else vmin
    vmax = np.percentile(value[mask],85) if vmax is None else vmax
    if vmin != vmax:
        value = (value - vmin) / (vmax - vmin)  # vmin..vmax
    else:
        # Avoid 0-division
        value = value * 0.

    # squeeze last dim if it exists
    # grey out the invalid values

    value[invalid_mask] = np.nan
    cmapper = matplotlib.colormaps.get_cmap(cmap)
    if value_transform:
        value = value_transform(value)
    # value = value / value.max()
    value = cmapper(value, bytes=True)  # (nxmx4)

    # img = value[:, :, :]
    img = value[...]
    img[invalid_mask] = background_color

    # return img.transpose((2, 0, 1))
    if gamma_corrected:
        # gamma correction
        img = img / 255
        img = np.power(img, 2.2)
        img = img * 255
        img = img.astype(np.uint8)
    return img

>>> result = colorize(predicted_depth.cpu().squeeze().numpy())
>>> Image.fromarray(result)
```


<Tip>
<p>In the <a href="https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/models/depth_model.py#L131">original implementation</a> ZoeDepth model performs inference on both the original and flipped images and averages out the results. The <code>post_process_depth_estimation</code> function can handle this for us by passing the flipped outputs to the optional <code>outputs_flipped</code> argument:</p>
<pre><code class="language-Python">>>> with torch.no_grad():
...     outputs = model(pixel_values)
...     outputs_flipped = model(pixel_values=torch.flip(inputs.pixel_values, dims=[3]))
>>> post_processed_output = image_processor.post_process_depth_estimation(
...     outputs,
...     source_sizes=[(image.height, image.width)],
...     outputs_flipped=outputs_flipped,
... )
</code></pre>
</Tip>

<div class="flex justify-center">
   <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/depth-visualization-zoe.png" alt="Depth estimation visualization"/>
@@ -1488,7 +1488,9 @@ Now that you have finetuned a model, evaluated it, and uploaded it to the Huggin

Load model and image processor from the Hugging Face Hub (skip to use already trained in this session):
```py
>>> device = "cuda"
>>> from accelerate.test_utils.testing import get_backend
# automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
>>> device, _, _ = get_backend()
>>> model_repo = "qubvel-hf/detr_finetuned_cppe5"

>>> image_processor = AutoImageProcessor.from_pretrained(model_repo)
@@ -689,7 +689,9 @@ Reload the dataset and load an image for inference.
We will now see how to infer without a pipeline. Process the image with an image processor and place the `pixel_values` on a GPU:

```py
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # use GPU if available, otherwise use a CPU
>>> from accelerate.test_utils.testing import get_backend
# automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
>>> device, _, _ = get_backend()
>>> encoding = image_processor(image, return_tensors="pt")
>>> pixel_values = encoding.pixel_values.to(device)
```
@@ -282,10 +282,10 @@ containing the corresponding speaker embedding.
>>> import os
>>> import torch
>>> from speechbrain.inference.classifiers import EncoderClassifier
>>> from accelerate.test_utils.testing import get_backend

>>> spk_model_name = "speechbrain/spkrec-xvect-voxceleb"

>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> device, _, _ = get_backend() # automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
>>> speaker_model = EncoderClassifier.from_hparams(
...     source=spk_model_name,
...     run_opts={"device": device},
@@ -363,10 +363,11 @@ GPU, if available, which we didn't need to do earlier when training, as [`Traine
```py
>>> from transformers import AutoProcessor, Blip2ForConditionalGeneration
>>> import torch
>>> from accelerate.test_utils.testing import get_backend

>>> processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
>>> model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16)
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> device, _, _ = get_backend() # automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
>>> model.to(device)
```
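A short sketch of generating a caption once the model is on the device; `image` is assumed to be a PIL image from the dataset used earlier in the guide, and the token budget is illustrative.

```python
inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=20)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```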
@@ -182,7 +182,7 @@ There are three main components to Mask2Former:

The mask predictions are generated by combining the pixel-embeddings with the final decoder hidden states. The sigmoid cross-entropy and dice loss are calculated between the logits and the ground truth mask to find the most likely mask.

Ready to try your hand at object detection? Check out our complete [image segmentation guide](tasks/semantic_segmentation) to learn how to finetune SegFormer and use it for inference!
Ready to try your hand at image segmentation? Check out our complete [image segmentation guide](tasks/semantic_segmentation) to learn how to finetune SegFormer and use it for inference!

### Depth estimation
@@ -292,4 +292,4 @@ Ready to try your hand at translation? Check out our complete [translation guide

For more information about text generation, check out the [text generation strategies](generation_strategies) guide!

</Tip>
</Tip>
@@ -428,7 +428,7 @@ pytest --instafail

### To GPU or not to GPU

On a GPU-enabled setup, to test in CPU-only mode add `CUDA_VISIBLE_DEVICES=""`:
On a GPU-enabled setup, to test in CPU-only mode add `CUDA_VISIBLE_DEVICES=""` for CUDA GPUs:

```bash
CUDA_VISIBLE_DEVICES="" pytest tests/utils/test_logging.py
@@ -441,10 +441,12 @@ second gpu if you have gpus `0` and `1`, you can run:
CUDA_VISIBLE_DEVICES="1" pytest tests/utils/test_logging.py
```

For Intel GPUs, use `ZE_AFFINITY_MASK` instead of `CUDA_VISIBLE_DEVICES` in the above example.

This is handy when you want to run different tasks on different GPUs.

Some tests must be run on CPU-only, others on either CPU or GPU or TPU, yet others on multiple-GPUs. The following skip
decorators are used to set the requirements of tests CPU/GPU/TPU-wise:
decorators are used to set the requirements of tests CPU/GPU/XPU/TPU-wise:

- `require_torch` - this test will run only under torch
- `require_torch_gpu` - as `require_torch` plus requires at least 1 GPU
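As shown in the list above, these markers are applied as decorators. A minimal sketch, assuming a test file inside the Transformers test suite:

```python
from transformers.testing_utils import require_torch, require_torch_gpu


@require_torch
def test_works_wherever_torch_is_installed():
    ...  # runs on CPU or GPU


@require_torch_gpu
def test_needs_at_least_one_gpu():
    ...  # skipped automatically when no GPU is available
```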
@@ -174,7 +174,7 @@ trainer = Trainer(
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callback=[EarlyStoppingCallback()],
    callbacks=[EarlyStoppingCallback()],
)
```
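For context, early stopping only takes effect when the `Trainer` is configured to evaluate and track a best model. A minimal sketch of the kind of `TrainingArguments` that pairs with the callback above, with illustrative values that are not part of the diff:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="my-model",
    eval_strategy="epoch",           # evaluation must run so the monitored metric exists
    save_strategy="epoch",
    load_best_model_at_end=True,     # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
)
```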
@@ -252,7 +252,70 @@ trainer = Trainer(..., args=training_args)

NEFTune is disabled after training to restore the original embedding layer to avoid any unexpected behavior.

## GaLore
## Liger Kernel

[Liger-Kernel](https://github.com/linkedin/Liger-Kernel) is a collection of Triton kernels developed by LinkedIn designed specifically for LLM training. We have implemented Hugging Face compatible RMSNorm, RoPE, SwiGLU, CrossEntropy, FusedLinearCrossEntropy, and more to come. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%. The kernel works out of the box with flash attention, PyTorch FSDP, and Microsoft DeepSpeed.

<Tip>
Gain +20% throughput and reduce memory usage by 60% on LLaMA 3-8B model training. Achieve longer context lengths and larger batch sizes. It's also useful if you want to scale up your model to multi-head training or large vocabulary sizes. Unleash multi-head training (medusa) and more. See details and examples in [Liger](https://github.com/linkedin/Liger-Kernel/tree/main/examples).
</Tip>

First make sure to install Liger official repository:
```bash
pip install liger-kernel
```

You should pass `use_liger_kernel=True` to apply liger kernel on your model, for example:

```py
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="your-model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
    use_liger_kernel=True
)
```

The kernel supports the Llama, Gemma, Mistral, and Mixtral model architectures. The most up-to-date list of supported models can be found [here](https://github.com/linkedin/Liger-Kernel). When `use_liger_kernel` is set to `True`, the corresponding layers in the original model will be patched with Liger's efficient implementation, so you don't need to do anything extra other than setting the argument value.


## Optimizers

You can choose a built-in optimizer for training using:

```python
from transformers import TrainingArguments
training_args = TrainingArguments(..., optim="adamw_torch")
```

See [`OptimizerNames`](https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py) for a full list of choices. We include advanced examples in the sections below.

You can also use an arbitrary PyTorch optimizer via:

```python
import torch

optimizer_cls = torch.optim.AdamW
optimizer_kwargs = {
    "lr": 4e-3,
    "betas": (0.9, 0.999),
    "weight_decay": 0.05,
}

from transformers import Trainer
trainer = Trainer(..., optimizer_cls_and_kwargs=(optimizer_cls, optimizer_kwargs))
```

### GaLore

Gradient Low-Rank Projection (GaLore) is a memory-efficient low-rank training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods, such as LoRA.
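As a quick orientation, GaLore is enabled through the optimizer arguments of [`TrainingArguments`]; a minimal sketch with illustrative values and target-module patterns, not something prescribed by the diff above:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="galore-test",
    optim="galore_adamw",                  # other variants include "galore_adafactor"
    optim_target_modules=["attn", "mlp"],  # modules whose gradients get the low-rank projection
)
```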
@@ -382,42 +445,7 @@ trainer.train()

Note layerwise optimization is a bit experimental and does not support DDP (Distributed Data Parallel), thus you can run the training script only on a single GPU. Please see [this appropriate section](https://github.com/jiaweizzhao/GaLore?tab=readme-ov-file#train-7b-model-with-a-single-gpu-with-24gb-memory) for more details. Other features such as gradient clipping, DeepSpeed, etc might not be supported out of the box. Please [raise an issue on GitHub](https://github.com/huggingface/transformers/issues) if you encounter such an issue.

## Liger Kernel

[Liger-Kernel](https://github.com/linkedin/Liger-Kernel) is a collection of Triton kernels developed by LinkedIn designed specifically for LLM training. We have implemented Hugging Face compatible RMSNorm, RoPE, SwiGLU, CrossEntropy, FusedLinearCrossEntropy, and more to come. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%. The kernel works out of the box with flash attention, PyTorch FSDP, and Microsoft DeepSpeed.

<Tip>
Gain +20% throughput and reduce memory usage by 60% on LLaMA 3-8B model training. Achieve longer context lengths and larger batch sizes. It's also useful if you want to scale up your model to multi-head training or large vocabulary sizes. Unleash multi-head training (medusa) and more. See details and examples in [Liger](https://github.com/linkedin/Liger-Kernel/tree/main/examples).
</Tip>

First make sure to install Liger official repository:
```bash
pip install liger-kernel
```

You should pass `use_liger_kernel=True` to apply liger kernel on your model, for example:

```py
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="your-model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
    use_liger_kernel=True
)
```

The kernel supports the Llama, Gemma, Mistral, and Mixtral model architectures. The most up-to-date list of supported models can be found [here](https://github.com/linkedin/Liger-Kernel). When `use_liger_kernel` is set to `True`, the corresponding layers in the original model will be patched with Liger's efficient implementation, so you don't need to do anything extra other than setting the argument value.

## LOMO optimizer
### LOMO optimizer

The LOMO optimizers have been introduced in [Full Parameter Fine-Tuning for Large Language Models with Limited Resources](https://hf.co/papers/2306.09782) and [AdaLomo: Low-memory Optimization with Adaptive Learning Rate](https://hf.co/papers/2310.10195).
They both consist of an efficient full-parameter fine-tuning method. These optimizers fuse the gradient computation and the parameter update in one step to reduce memory usage. Supported optimizers for LOMO are `"lomo"` and `"adalomo"`. First either install LOMO from pypi `pip install lomo-optim` or install it from source with `pip install git+https://github.com/OpenLMLab/LOMO.git`.
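To show how these are selected, a minimal sketch of enabling AdaLomo through [`TrainingArguments`]; the values are illustrative only:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="lomo-test",
    optim="adalomo",      # or "lomo"
    learning_rate=1e-5,
)
```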
@@ -467,7 +495,7 @@ trainer = trl.SFTTrainer(
trainer.train()
```

## GrokAdamW optimizer
### GrokAdamW optimizer

The GrokAdamW optimizer is designed to enhance training performance and stability, particularly for models that benefit from grokking signal functions. To use GrokAdamW, first install the optimizer package with `pip install grokadamw`.

@@ -518,7 +546,7 @@ trainer.train()

This script demonstrates how to fine-tune the `google/gemma-2b` model on the IMDB dataset using the GrokAdamW optimizer. The `TrainingArguments` are configured to use GrokAdamW, and the dataset is passed to the `Trainer` for training.

## Schedule Free Optimizer
### Schedule Free Optimizer

The Schedule Free optimizers have been introduced in [The Road Less Scheduled](https://hf.co/papers/2405.15682).
Schedule-Free learning replaces the momentum of the base optimizer with a combination of averaging and interpolation, to completely remove the need to anneal the learning rate with a traditional schedule.
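Following the pattern of the other optimizer sections, a minimal sketch of selecting a schedule-free optimizer; the constant scheduler setting reflects that the method removes the usual learning-rate schedule, and the values are illustrative:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="schedule-free-test",
    optim="schedule_free_adamw",     # or "schedule_free_sgd"
    lr_scheduler_type="constant",    # no decay schedule is needed with schedule-free training
)
```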
@@ -287,9 +287,10 @@ model.fit(tf_dataset)
At this point, you may need to restart your notebook or execute the following code to free some memory:

```py
from accelerate.utils.memory import clear_device_cache
del model
del trainer
torch.cuda.empty_cache()
clear_device_cache()
```

Next, manually postprocess `tokenized_dataset` to prepare it for training.
@@ -364,8 +365,9 @@ Lastly, specify `device` to use a GPU if you have access to one. Otherwise, trai

```py
>>> import torch
>>> from accelerate.test_utils.testing import get_backend

>>> device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
>>> device, _, _ = get_backend() # automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
>>> model.to(device)
```
@@ -43,7 +43,7 @@ As a result, you can load a specific version of the model with the parameter:

```py
>>> model = AutoModel.from_pretrained(
...     "julien-c/EsperBERTo-small", revision="v2.0.1"  # tag name, or branch name, or commit hash
...     "julien-c/EsperBERTo-small", revision="4c77982"  # tag name, or branch name, or commit hash
... )
```
@@ -1,3 +1,7 @@
- sections:
  - local: pipeline_tutorial
    title: Run inference with pipelines
    title: Run inference with pipelines
  - local: accelerate
    title: Set up distributed training with 🤗 Accelerate
  - local: tflite
    title: Export to TFLite

docs/source/hi/accelerate.md (new file, 136 lines)
@@ -0,0 +1,136 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Distributed training with 🤗 Accelerate

As models get bigger, parallelism has emerged as a strategy for training larger models on limited hardware and accelerating training speed by several orders of magnitude. At Hugging Face, we created the [🤗 Accelerate](https://huggingface.co/docs/accelerate) library to help users easily train a 🤗 Transformers model on any type of distributed setup, whether it is multiple GPUs on one machine or multiple GPUs across several machines. In this tutorial, learn how to customize your native PyTorch training loop to enable training in a distributed environment.

## Setup

Get started by installing 🤗 Accelerate:

```bash
pip install accelerate
```

Then import and create an [`~accelerate.Accelerator`] object. The [`~accelerate.Accelerator`] will automatically detect your type of distributed setup and initialize all the necessary components for training. You don't need to explicitly place your model on a device.

```py
>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
```

## Prepare to accelerate

The next step is to pass all the relevant training objects to the [`~accelerate.Accelerator.prepare`] method. This includes your training and evaluation DataLoaders, a model and an optimizer:

```py
>>> train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(
...     train_dataloader, eval_dataloader, model, optimizer
... )
```

## Backward

The last addition is to replace the typical `loss.backward()` in your training loop with 🤗 Accelerate's [`~accelerate.Accelerator.backward`] method:

```py
>>> for epoch in range(num_epochs):
...     for batch in train_dataloader:
...         outputs = model(**batch)
...         loss = outputs.loss
...         accelerator.backward(loss)

...         optimizer.step()
...         lr_scheduler.step()
...         optimizer.zero_grad()
...         progress_bar.update(1)
```

As you can see in the following code, you only need to add four additional lines of code to your training loop to enable distributed training!

```diff
+ from accelerate import Accelerator
  from transformers import AdamW, AutoModelForSequenceClassification, get_scheduler

+ accelerator = Accelerator()

  model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
  optimizer = AdamW(model.parameters(), lr=3e-5)

- device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
- model.to(device)

+ train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(
+     train_dataloader, eval_dataloader, model, optimizer
+ )

  num_epochs = 3
  num_training_steps = num_epochs * len(train_dataloader)
  lr_scheduler = get_scheduler(
      "linear",
      optimizer=optimizer,
      num_warmup_steps=0,
      num_training_steps=num_training_steps
  )

  progress_bar = tqdm(range(num_training_steps))

  model.train()
  for epoch in range(num_epochs):
      for batch in train_dataloader:
-         batch = {k: v.to(device) for k, v in batch.items()}
          outputs = model(**batch)
          loss = outputs.loss
-         loss.backward()
+         accelerator.backward(loss)

          optimizer.step()
          lr_scheduler.step()
          optimizer.zero_grad()
          progress_bar.update(1)
```

## Train

Once you've added the relevant lines of code, launch your training in a script or a notebook like Colaboratory.

### Train with a script

If you are running your training from a script, run the following command to create and save a configuration file:

```bash
accelerate config
```

Then launch your training with:

```bash
accelerate launch train.py
```

### Train with a notebook

🤗 Accelerate can also run in a notebook if you're planning on using Colaboratory's TPUs. Wrap all the code responsible for training in a function, and pass it to [`~accelerate.notebook_launcher`]:

```py
>>> from accelerate import notebook_launcher

>>> notebook_launcher(training_function)
```

For more information about 🤗 Accelerate and its rich features, refer to the [documentation](https://huggingface.co/docs/accelerate).
docs/source/hi/tflite.md (new file, 55 lines)
@@ -0,0 +1,55 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Export to TFLite

[TensorFlow Lite](https://www.tensorflow.org/lite/guide) is a lightweight framework for deploying machine learning models on resource-constrained devices, such as mobile phones, embedded systems, and Internet of Things (IoT) devices. TFLite is designed to optimize and run models efficiently on these devices with limited computational power, memory, and power consumption. A TensorFlow Lite model is represented in a special efficient portable format identified by the `.tflite` file extension.

🤗 Optimum offers functionality to export 🤗 Transformers models to TFLite through its `exporters.tflite` module. For the list of supported model architectures, please refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/tflite/overview).

To export a model to TFLite, install the required dependencies:

```bash
pip install optimum[exporters-tf]
```

To check out all available arguments, refer to the [🤗 Optimum docs](https://huggingface.co/docs/optimum/main/en/exporters/tflite/usage_guides/export_a_model),
or view the help in the command line:

```bash
optimum-cli export tflite --help
```

To export a model's checkpoint from the 🤗 Hub, for example, `google-bert/bert-base-uncased`, run the following command:

```bash
optimum-cli export tflite --model google-bert/bert-base-uncased --sequence_length 128 bert_tflite/
```

You should see logs indicating the progress and showing where the resulting `model.tflite` is saved, like this:

```bash
Validating TFLite model...
	-[✓] TFLite model output names match reference model (logits)
	- Validating TFLite Model output "logits":
		-[✓] (1, 128, 30522) matches (1, 128, 30522)
		-[x] values not close enough, max diff: 5.817413330078125e-05 (atol: 1e-05)
The TensorFlow Lite export succeeded with the warning: The maximum absolute difference between the output of the reference model and the TFLite exported model is not within the set tolerance 1e-05:
- logits: max diff = 5.817413330078125e-05.
 The exported model was saved at: bert_tflite
```

The example above illustrates exporting a checkpoint from the 🤗 Hub. When exporting a local model, first make sure that you saved both the model's weights and tokenizer files in the same directory (`local_path`). When using the CLI, pass the `local_path` to the `model` argument instead of the checkpoint name.
@@ -43,7 +43,7 @@ As a result, you can load a specific version of a model with the parameter:

```py
>>> model = AutoModel.from_pretrained(
...     "julien-c/EsperBERTo-small", revision="v2.0.1"  # tag name, branch name, or commit hash
...     "julien-c/EsperBERTo-small", revision="4c77982"  # tag name, branch name, or commit hash
... )
```
Some files were not shown because too many files have changed in this diff.