pytorch/functorch/docs/source/notebooks/ensembling.ipynb
Klaus Zimmermann f16053f0c9 Switch to standard pep517 sdist generation (#152098)
Generate source tarball with PEP 517 conform build tools instead of the custom routine in place right now.

Closes #150461.

The current procedure for generating the source tarball consists of creating a source tree by manually copying and pruning source files.

This PR replaces that with a call to the standard [build tool](https://build.pypa.io/en/stable/), which works with the build backend to produce an sdist. For that to work correctly, the build backend also needs to be configured. In PyTorch's case, the backend is currently (the legacy version of) the setuptools backend, whose sdist contents are configured mostly via the `MANIFEST.in` file.
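
For illustration, a minimal sketch of what the new procedure boils down to (the helper name and output directory are placeholders, not code from this PR):

```python
# Minimal sketch of the new sdist path; equivalent to running
# `python -m build --sdist` at the repository root.
import subprocess
import sys

def build_sdist(source_dir: str = ".", out_dir: str = "dist") -> None:
    # The build frontend asks the configured backend (here: setuptools) to
    # assemble the source tree according to MANIFEST.in and pack it into
    # a tarball under `out_dir`.
    subprocess.run(
        [sys.executable, "-m", "build", "--sdist", "--outdir", out_dir, source_dir],
        check=True,
    )

if __name__ == "__main__":
    build_sdist()
```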

The resulting source distribution can be used to install directly from source with `pip install ./torch-{version}.tar.gz` or to build wheels directly from source with `pip wheel ./torch-{version}.tar.gz`; both should be considered experimental for now.

## Issues

### sdist name
According to PEP 517, the name of the source distribution file must coincide with the project name, or [more precisely](https://peps.python.org/pep-0517/#source-distributions), the source distribution of a project that generates `{NAME}-{...}.whl` wheels is required to be named `{NAME}-{...}.tar.gz`. Currently, the source tarball is called `pytorch-{...}.tar.gz`, but the generated wheels and Python package are called `torch-{...}`.

### Symbolic Links
The source tree currently contains a small number of symbolic links. This [has been seen as problematic](https://github.com/pypa/pip/issues/5919), largely because of the lack of support on Windows, but also because of [a problem in setuptools](https://github.com/pypa/setuptools/issues/4937). Particularly unfortunate is a circular symlink in the third-party `ittapi` module, which cannot be resolved by replacing it with a copy.

PEP 721 (now integrated in the [Source Distribution Format Specification](https://packaging.python.org/en/latest/specifications/source-distribution-format/#source-distribution-archive-features)) allows for symbolic links, but only if they don't point outside the destination directory and if they don't contain `../` in their target.
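
For illustration only (not part of this PR), a small check that mirrors these two rules could look like this:

```python
# Flag symlinks that the sdist archive rules would reject: links whose target
# contains ".." or whose resolved target escapes the source tree.
import os

def problematic_symlinks(root: str):
    root_real = os.path.realpath(root)
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            # Rule 1: the link target must not contain ".." components.
            has_dotdot = ".." in target.split(os.sep)
            # Rule 2: the fully resolved link must stay inside the tree.
            resolved = os.path.realpath(path)
            escapes = os.path.commonpath([root_real, resolved]) != root_real
            if has_dotdot or escapes:
                yield path, target

if __name__ == "__main__":
    for link, target in problematic_symlinks("."):
        print(f"{link} -> {target}")
```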

The current list of symbolic links is as follows:

<details>

|source|target|problem|solution|
|-|-|-|-|
| `.dockerignore` | `.gitignore` |  ok (individual file) ||
| `docs/requirements.txt` | `../.ci/docker/requirements-docs.txt` |`..` in target|swap source and target[^1]|
| `functorch/docs/source/notebooks` | `../../notebooks/` |`..` in target|swap source and target[^1]|
| `.github/ci_commit_pins/triton.txt` | `../../.ci/docker/ci_commit_pins/triton.txt` |  ok (omitted from sdist)||
| `third_party/flatbuffers/docs/source/CONTRIBUTING.md` | `../../CONTRIBUTING.md` |`..` in target|omit from sdist[^2]|
| `third_party/flatbuffers/java/src/test/java/DictionaryLookup` | `../../../../tests/DictionaryLookup` |`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/MyGame` | `../../../../tests/MyGame` |`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/NamespaceA` | `../../../../tests/namespace_test/NamespaceA` |`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/NamespaceC` | `../../../../tests/namespace_test/NamespaceC` |`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/optional_scalars` | `../../../../tests/optional_scalars` |`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/union_vector` | `../../../../tests/union_vector` |`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/kotlin/benchmark/src/jvmMain/java` | `../../../../java/src/main/java` |`..` in target|omit from sdist[^3]|
| `third_party/ittapi/rust/ittapi-sys/c-library` | `../../` |`..` in target|omit from sdist[^4]|
| `third_party/ittapi/rust/ittapi-sys/LICENSES` | `../../LICENSES` |`..` in target|omit from sdist[^4]|
| `third_party/opentelemetry-cpp/buildscripts/pre-merge-commit` | `./pre-commit` | ok (individual file)||
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-cmake/sample_client.cc` | `../../push/tests/integration/sample_client.cc` |`..` in target|omit from sdist[^5]|
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-cmake/sample_server.cc` | `../../pull/tests/integration/sample_server.cc` |`..` in target|omit from sdist[^5]|
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-pkgconfig/sample_client.cc` | `../../push/tests/integration/sample_client.cc` |`..` in target|omit from sdist[^5]|
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-pkgconfig/sample_server.cc` | `../../pull/tests/integration/sample_server.cc` |`..` in target|omit from sdist[^5]|
| `third_party/XNNPACK/tools/xngen` | `xngen.py` |  ok (individual file)||

</details>

Introducing symbolic links inside the `.ci/docker` folder creates a new problem, however, because Docker's `COPY` command cannot handle symlinks used this way. We work around that by using `tar ch` to dereference the symlinks before handing the build context over to `docker build`.
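
A rough sketch of that workaround in Python (the context path and image tag are placeholders; the actual CI scripts use `tar ch` directly):

```python
# `dereference=True` stores the files the symlinks point to rather than the
# links themselves, mirroring what `tar ch` does on the command line.
import subprocess
import tarfile

context_dir = ".ci/docker"  # hypothetical build-context directory
with tarfile.open("docker-context.tar", "w", dereference=True) as tar:
    tar.add(context_dir, arcname=".")

with open("docker-context.tar", "rb") as ctx:
    # Passing "-" as the context makes `docker build` read the tarball from stdin.
    subprocess.run(["docker", "build", "-t", "pytorch-ci", "-"], stdin=ctx, check=True)
```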

[^1]: These resources can naturally be considered part of the docs, so moving the actual files to where the symlinks currently live and symlinking back from the old locations (which is unproblematic) arguably improves the semantics as well.

[^2]: The flatbuffers docs already use the original file rather than the symlink, and in the most recent releases (starting with flatbuffers-25.1.21) the symlink has been replaced by the actual file thanks to a documentation overhaul.

[^3]: These resources are flatbuffers tests for Java and Kotlin and can be omitted from our sdist.

[^4]: We don't need to ship the Rust bindings for ittapi.

[^5]: These are demonstration examples of how to link against prometheus-cpp using CMake and can be omitted.

### NCCL
NCCL used to be included as a submodule. However, with #146073 (first released in v2.7.0-rc1), the submodule was removed and replaced with a build-time checkout procedure in `tools/build_pytorch_libs.py`, which checks out the required version of NCCL from the upstream repository based on a commit pin recorded in `.ci/docker/ci_commit_pins/nccl-cu{11,12}.txt`.
This means that a crucial third-party dependency is missing from the source distribution, and because the `.ci` folder is omitted from the sdist, the build-time download cannot be used either.
However, it *is* possible to use a system-provided NCCL via the `USE_SYSTEM_NCCL` environment variable, which is now also the default for the official PyTorch wheels.
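
For example, a build from the sdist against a system NCCL could look roughly like this (the tarball name is the same placeholder as above):

```python
# Illustrative only: install from the sdist against a system-provided NCCL.
import os
import subprocess
import sys

env = dict(os.environ, USE_SYSTEM_NCCL="1")  # use the NCCL found on the system
subprocess.run(
    [sys.executable, "-m", "pip", "install", "./torch-{version}.tar.gz"],
    env=env,
    check=True,
)
```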

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152098
Approved by: https://github.com/atalman
2025-06-30 19:07:34 +00:00

{
"cells": [
{
"cell_type": "markdown",
"id": "de1548fb-a313-4e9c-ae5d-8ec4c12ddd94",
"metadata": {
"id": "de1548fb-a313-4e9c-ae5d-8ec4c12ddd94"
},
"source": [
"# Model ensembling\n",
"\n",
"This example illustrates how to vectorize model ensembling using vmap.\n",
"\n",
"<a href=\"https://colab.research.google.com/github/pytorch/pytorch/blob/master/functorch/notebooks/ensembling.ipynb\">\n",
" <img style=\"width: auto\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>\n",
"\n",
"## What is model ensembling?\n",
"Model ensembling combines the predictions from multiple models together.\n",
"Traditionally this is done by running each model on some inputs separately\n",
"and then combining the predictions. However, if you're running models with\n",
"the same architecture, then it may be possible to combine them together\n",
"using `vmap`. `vmap` is a function transform that maps functions across\n",
"dimensions of the input tensors. One of its use cases is eliminating\n",
"for-loops and speeding them up through vectorization.\n",
"\n",
"Let's demonstrate how to do this using an ensemble of simple MLPs."
]
},
{
"cell_type": "code",
"source": [
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"from functools import partial\n",
"torch.manual_seed(0);"
],
"metadata": {
"id": "Gb-yt4VKUUuc"
},
"execution_count": null,
"outputs": [],
"id": "Gb-yt4VKUUuc"
},
{
"cell_type": "code",
"source": [
"# Here's a simple MLP\n",
"class SimpleMLP(nn.Module):\n",
" def __init__(self):\n",
" super().__init__()\n",
" self.fc1 = nn.Linear(784, 128)\n",
" self.fc2 = nn.Linear(128, 128)\n",
" self.fc3 = nn.Linear(128, 10)\n",
"\n",
" def forward(self, x):\n",
" x = x.flatten(1)\n",
" x = self.fc1(x)\n",
" x = F.relu(x)\n",
" x = self.fc2(x)\n",
" x = F.relu(x)\n",
" x = self.fc3(x)\n",
" return x\n"
],
"metadata": {
"id": "tf-HKHjUUbyY"
},
"execution_count": null,
"outputs": [],
"id": "tf-HKHjUUbyY"
},
{
"cell_type": "markdown",
"source": [
"Lets generate a batch of dummy data and pretend that were working with an MNIST dataset. Thus, the dummy images are 28 by 28, and we have a minibatch of size 64. Furthermore, lets say we want to combine the predictions from 10 different models. \n"
],
"metadata": {
"id": "VEDPe-EoU5Fa"
},
"id": "VEDPe-EoU5Fa"
},
{
"cell_type": "code",
"source": [
"device = 'cuda'\n",
"num_models = 10\n",
"\n",
"data = torch.randn(100, 64, 1, 28, 28, device=device)\n",
"targets = torch.randint(10, (6400,), device=device)\n",
"\n",
"models = [SimpleMLP().to(device) for _ in range(num_models)]"
],
"metadata": {
"id": "WB2Qe3AHUvPN"
},
"execution_count": null,
"outputs": [],
"id": "WB2Qe3AHUvPN"
},
{
"cell_type": "markdown",
"source": [
"We have a couple of options for generating predictions. Maybe we want to give each model a different randomized minibatch of data. Alternatively, maybe we want to run the same minibatch of data through each model (e.g. if we were testing the effect of different model initializations).\n",
"\n",
"\n",
"\n"
],
"metadata": {
"id": "GOGJ-OUxVcT5"
},
"id": "GOGJ-OUxVcT5"
},
{
"cell_type": "markdown",
"source": [
"Option 1: different minibatch for each model"
],
"metadata": {
"id": "CwJBb09MxCN3"
},
"id": "CwJBb09MxCN3"
},
{
"cell_type": "code",
"source": [
"minibatches = data[:num_models]\n",
"predictions_diff_minibatch_loop = [model(minibatch) for model, minibatch in zip(models, minibatches)]"
],
"metadata": {
"id": "WYjMx8QTUvRu"
},
"execution_count": null,
"outputs": [],
"id": "WYjMx8QTUvRu"
},
{
"cell_type": "markdown",
"source": [
"Option 2: Same minibatch"
],
"metadata": {
"id": "HNw4_IVzU5Pz"
},
"id": "HNw4_IVzU5Pz"
},
{
"cell_type": "code",
"source": [
"minibatch = data[0]\n",
"predictions2 = [model(minibatch) for model in models]"
],
"metadata": {
"id": "vUsb3VfexJrY"
},
"execution_count": null,
"outputs": [],
"id": "vUsb3VfexJrY"
},
{
"cell_type": "markdown",
"source": [
"## Using vmap to vectorize the ensemble\n",
"\n",
"\n",
"\n",
"\n"
],
"metadata": {
"id": "aNkX6lFIxzcm"
},
"id": "aNkX6lFIxzcm"
},
{
"cell_type": "markdown",
"source": [
"Lets use vmap to speed up the for-loop. We must first prepare the models for use with vmap.\n",
"\n",
"First, lets combine the states of the model together by stacking each parameter. For example, `model[i].fc1.weight` has shape `[784, 128]`; we are going to stack the .fc1.weight of each of the 10 models to produce a big weight of shape `[10, 784, 128]`.\n",
"\n",
"functorch offers the 'combine_state_for_ensemble' convenience function to do that. It returns a stateless version of the model (fmodel) and stacked parameters and buffers.\n",
"\n"
],
"metadata": {
"id": "-sFMojhryviM"
},
"id": "-sFMojhryviM"
},
{
"cell_type": "code",
"source": [
"from functorch import combine_state_for_ensemble\n",
"\n",
"fmodel, params, buffers = combine_state_for_ensemble(models)\n",
"[p.requires_grad_() for p in params];\n"
],
"metadata": {
"id": "C3a9_clvyPho"
},
"execution_count": null,
"outputs": [],
"id": "C3a9_clvyPho"
},
{
"cell_type": "markdown",
"source": [
"Option 1: get predictions using a different minibatch for each model. \n",
"\n",
"By default, vmap maps a function across the first dimension of all inputs to the passed-in function. After using the combine_state_for_ensemble, each of the params and buffers have an additional dimension of size 'num_models' at the front, and minibatches has a dimension of size 'num_models'.\n",
"\n",
"\n",
"\n",
"\n"
],
"metadata": {
"id": "mFJDWMM9yaYZ"
},
"id": "mFJDWMM9yaYZ"
},
{
"cell_type": "code",
"source": [
"print([p.size(0) for p in params]) # show the leading 'num_models' dimension\n",
"\n",
"assert minibatches.shape == (num_models, 64, 1, 28, 28) # verify minibatch has leading dimension of size 'num_models'"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ezuFQx1G1zLG",
"outputId": "ab260da3-77f2-4ff9-d843-e0d0f1e0a884"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[10, 10, 10, 10, 10, 10]\n"
]
}
],
"id": "ezuFQx1G1zLG"
},
{
"cell_type": "code",
"source": [
"from functorch import vmap\n",
"\n",
"predictions1_vmap = vmap(fmodel)(params, buffers, minibatches)\n",
"\n",
"# verify the vmap predictions match the \n",
"assert torch.allclose(predictions1_vmap, torch.stack(predictions_diff_minibatch_loop), atol=1e-3, rtol=1e-5)"
],
"metadata": {
"id": "VroLnfD82DDf"
},
"execution_count": null,
"outputs": [],
"id": "VroLnfD82DDf"
},
{
"cell_type": "markdown",
"source": [
"Option 2: get predictions using the same minibatch of data.\n",
"\n",
"vmap has an in_dims arg that specifies which dimensions to map over. By using `None`, we tell vmap we want the same minibatch to apply for all of the 10 models.\n",
"\n",
"\n"
],
"metadata": {
"id": "tlkmyQyfY6XU"
},
"id": "tlkmyQyfY6XU"
},
{
"cell_type": "code",
"source": [
"predictions2_vmap = vmap(fmodel, in_dims=(0, 0, None))(params, buffers, minibatch)\n",
"\n",
"assert torch.allclose(predictions2_vmap, torch.stack(predictions2), atol=1e-3, rtol=1e-5)"
],
"metadata": {
"id": "WiSMupvCyecd"
},
"execution_count": null,
"outputs": [],
"id": "WiSMupvCyecd"
},
{
"cell_type": "markdown",
"source": [
"A quick note: there are limitations around what types of functions can be transformed by vmap. The best functions to transform are ones that are pure functions: a function where the outputs are only determined by the inputs that have no side effects (e.g. mutation). vmap is unable to handle mutation of arbitrary Python data structures, but it is able to handle many in-place PyTorch operations."
],
"metadata": {
"id": "KrXQsUCIGLWm"
},
"id": "KrXQsUCIGLWm"
},
{
"cell_type": "markdown",
"source": [
"## Performance\n",
"\n",
"Curious about performance numbers? Here's how the numbers look on Google Colab."
],
"metadata": {
"id": "MCjBhMrVF5hH"
},
"id": "MCjBhMrVF5hH"
},
{
"cell_type": "code",
"source": [
"from torch.utils.benchmark import Timer\n",
"without_vmap = Timer(\n",
" stmt=\"[model(minibatch) for model, minibatch in zip(models, minibatches)]\",\n",
" globals=globals())\n",
"with_vmap = Timer(\n",
" stmt=\"vmap(fmodel)(params, buffers, minibatches)\",\n",
" globals=globals())\n",
"print(f'Predictions without vmap {without_vmap.timeit(100)}')\n",
"print(f'Predictions with vmap {with_vmap.timeit(100)}')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gJPrGdS0GBjz",
"outputId": "04e75950-b964-419c-fa9c-f1590e0081bb"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Predictions without vmap <torch.utils.benchmark.utils.common.Measurement object at 0x7fe22c58b3d0>\n",
"[model(minibatch) for model, minibatch in zip(models, minibatches)]\n",
" 3.25 ms\n",
" 1 measurement, 100 runs , 1 thread\n",
"Predictions with vmap <torch.utils.benchmark.utils.common.Measurement object at 0x7fe22c50c450>\n",
"vmap(fmodel)(params, buffers, minibatches)\n",
" 879.28 us\n",
" 1 measurement, 100 runs , 1 thread\n"
]
}
],
"id": "gJPrGdS0GBjz"
},
{
"cell_type": "markdown",
"source": [
"There's a large speedup using vmap! \n",
"\n",
"In general, vectorization with vmap should be faster than running a function in a for-loop and competitive with manual batching. There are some exceptions though, like if we havent implemented the vmap rule for a particular operation or if the underlying kernels werent optimized for older hardware (GPUs). If you see any of these cases, please let us know by opening an issue at our [GitHub](https://github.com/pytorch/functorch)!\n",
"\n"
],
"metadata": {
"id": "UI74G9JarQU8"
},
"id": "UI74G9JarQU8"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
},
"colab": {
"name": "ensembling.ipynb",
"provenance": []
}
},
"nbformat": 4,
"nbformat_minor": 5
}