Summary:
Support the special case where the data size is 0 for SegmentReduce.
Example code below:
```
import torch

# Zero-size data: 0 rows, with two zero-length segments.
x = torch.ones((0, 6)).cuda()
lengths = torch.tensor([0, 0]).cuda()
torch.segment_reduce(x, "sum", lengths=lengths, unsafe=False, initial=0)
```
Previously this raised an error: "Expected data.numel() > 0 to be true, but got false."
Now the call is expected to return zeros.
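For illustration, a hedged sketch of the expected result: the (2, 6) shape follows from the two segments in `lengths` and the six columns of `x`, per the op's semantics described above.
```
out = torch.segment_reduce(x, "sum", lengths=lengths, unsafe=False, initial=0)
print(out.shape)  # torch.Size([2, 6])
print(out)        # every segment is empty, so each row is filled with initial=0
```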
Test Plan: contbuild & OSS CI
Differential Revision: D45133827
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99733
Approved by: https://github.com/ngimel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61266
Same as title.
Mainly, this concludes the initially planned features for the op. The only missing functionality is reduction along an arbitrary axis (currently only axis 0 is supported).
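For reference, a minimal sketch of the axis-0 behavior (expected values computed by hand from the op's semantics, not taken from the test suite):
```
import torch

data = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
lengths = torch.tensor([2, 1])
# Reduction runs along axis 0: segment 0 = rows 0-1, segment 1 = row 2.
out = torch.segment_reduce(data, "sum", lengths=lengths)
# out: tensor([[4., 6.],
#              [5., 6.]])
```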
Test Plan: Updated unit test.
Reviewed By: ngimel
Differential Revision: D29552037
fbshipit-source-id: 023c7cbf750a0671f76082708f14c05739dda07a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61141
Currently only long is supported. This diff adds support for other index types.
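A minimal sketch of what this enables, assuming int32 is among the newly supported index dtypes:
```
import torch

data = torch.ones(4)
# lengths no longer has to be int64:
lengths = torch.tensor([2, 2], dtype=torch.int32)
out = torch.segment_reduce(data, "sum", lengths=lengths)
# out: tensor([2., 2.])
```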
Next Steps:
- Update default, refactor unit test and test non_initial value as well
- Cleanup (more tests, benchmark, update documentation)
Test Plan: Updated unit test; rely on CI.
Reviewed By: ngimel
Differential Revision: D29526308
fbshipit-source-id: b4043603483851ef7e0e93b0bb02ac7849c6449d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60379
This concludes support for all of the initially planned reduction types (min, max, mean, sum).
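A quick sketch exercising all four reductions (expected values computed by hand from the op's semantics):
```
import torch

data = torch.tensor([1., 2., 3., 4.])
lengths = torch.tensor([2, 2])  # segment 0 = [1, 2], segment 1 = [3, 4]
for reduce in ("min", "max", "mean", "sum"):
    print(reduce, torch.segment_reduce(data, reduce, lengths=lengths))
# min  tensor([1., 3.])
# max  tensor([2., 4.])
# mean tensor([1.5000, 3.5000])
# sum  tensor([3., 7.])
```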
Next Steps:
- Cleanups
- Update default values when length is 0 and initial is not given
- Templatize the code to avoid branching on every item (and other known improvements)
- More unit tests, verification
- Benchmarking
Test Plan: updated unit tests.
Reviewed By: ngimel
Differential Revision: D29268218
fbshipit-source-id: c77d91671e01dcf96c18c758fa3ea522b2e13db9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60018
Same as title. This diff finishes CUDA support for the currently implemented reductions and input parameters.
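A minimal sketch of the CUDA path (assumes a CUDA device is available; max is one of the reductions implemented at this point):
```
import torch

# Same call as on CPU, but with CUDA tensors:
data = torch.tensor([1., 5., 2., 3.], device="cuda")
lengths = torch.tensor([2, 2], device="cuda")
out = torch.segment_reduce(data, "max", lengths=lengths)
# out: tensor([5., 3.], device='cuda:0')
```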
Next Steps:
- Add support for sum/min
- More testing and benchmarking
- Cleanup
- Update default values when length is 0
- Use TensorIterator
- Update documentation
Test Plan: Unit test to cover the CUDA forward path.
Reviewed By: ngimel
Differential Revision: D29135373
fbshipit-source-id: d070727eeb660f56782e7ac8a5b0798be688480a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59951
Add support for multi-d input in the CPU forward/backward implementation.
Next step: adding CUDA support for multi-d input.
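A minimal sketch of the multi-d CPU path, forward and backward (mean is one of the reductions implemented at this point; gradient values follow from each element contributing 1/length to its segment's mean):
```
import torch

data = torch.tensor([[1., 2.], [3., 4.], [5., 6.]], requires_grad=True)
lengths = torch.tensor([2, 1])
out = torch.segment_reduce(data, "mean", lengths=lengths)
# out: tensor([[2., 3.],
#              [5., 6.]])
out.sum().backward()
# data.grad: rows in the length-2 segment get 0.5, the length-1 row gets 1.0
```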
Test Plan: Added unit tests.
Reviewed By: ngimel
Differential Revision: D29105457
fbshipit-source-id: a389ba4cc10f02434a336b8e7d36259f32552e11
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59543
Building on top of the previous PR: https://github.com/pytorch/pytorch/pull/59521
This diff adds support for mean reduction on CUDA (forward only for now).
The CUDA backward implementation will come in a subsequent PR.
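A minimal sketch of the new forward path (assumes a CUDA device; expected values computed by hand):
```
import torch

data = torch.tensor([1., 2., 3., 4.], device="cuda")
lengths = torch.tensor([2, 2], device="cuda")
out = torch.segment_reduce(data, "mean", lengths=lengths)
# out: tensor([1.5000, 3.5000], device='cuda:0')
```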
Next Steps:
- CUDA backward support for mean
- 2D data input support
- More testing
- Benchmarking
Test Plan: Updated unit test to cover this part as well.
Reviewed By: ngimel
Differential Revision: D28922838
fbshipit-source-id: 72b7e5e79db967116b96ad010f290c9f057232d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59521
This diff adds support for mean reduction on CPU (forward + backward).
The CUDA implementation will come in a subsequent PR. We are using "cub::DeviceSegmentedReduce" for the other aggregations and are looking into how to support mean with it; otherwise we will write a custom kernel.
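A minimal sketch of the CPU forward + backward for mean (gradient values follow from each element contributing 1/length to its segment's mean):
```
import torch

data = torch.tensor([1., 2., 3.], requires_grad=True)
lengths = torch.tensor([2, 1])
out = torch.segment_reduce(data, "mean", lengths=lengths)
# out: tensor([1.5000, 3.0000])
out.sum().backward()
# data.grad: tensor([0.5000, 0.5000, 1.0000])
```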
Next Steps:
- CUDA support for mean
- 2D data input support
- More testing
- Benchmarking
Test Plan: Updated unit test. Still relying on manual data for ease of debugging. Will add more tests that cover edge cases once major features are complete.
Reviewed By: ngimel
Differential Revision: D28922547
fbshipit-source-id: 2fad53bbad2cce714808ff95759cbdbd45bb4ce6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56923
Next Steps in order:
- Add backward support for CUDA
- Add support for more aggregation types
- Benchmarking (mainly for CUDA), more testing, documentation
- Support for multi-dimensional input
Test Plan: Updated unit test to include 0-length segments as well.
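For illustration, a hedged sketch of the 0-length-segment case. This is written against the op's eventual public signature (the `initial` parameter and the default for empty segments are revisited in later diffs); max is the reduction implemented at this point.
```
import torch

# A 0-length segment contributes no elements, so its output slot
# is filled with `initial`.
data = torch.tensor([1., 2., 3.])
lengths = torch.tensor([2, 0, 1])
out = torch.segment_reduce(data, "max", lengths=lengths, initial=0)
# out: tensor([2., 0., 3.])
```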
Reviewed By: ngimel
Differential Revision: D27992228
fbshipit-source-id: 28851811f8a784a63162721c511d69e617a93727
Summary:
This sets up boilerplate code for the backward pass and the CPU implementation.
Next Steps in order:
- Add backward support for CUDA
- Add support for more aggregation types
- Benchmarking (mainly for CUDA), more testing, documentation
- Support for multi-dimensional input
Test Plan:
Updated unit test to also check correctness of backward.
Wait for CI signal
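A hedged sketch of the kind of backward check such a test can perform, using torch.autograd.gradcheck on double-precision inputs (max is the reduction implemented at this point; the exact test is not shown here):
```
import torch

data = torch.randn(6, dtype=torch.double, requires_grad=True)
lengths = torch.tensor([2, 4])
# gradcheck compares analytical gradients against finite differences.
assert torch.autograd.gradcheck(
    lambda d: torch.segment_reduce(d, "max", lengths=lengths), (data,)
)
```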
Reviewed By: ngimel
Differential Revision: D27970340
fbshipit-source-id: 3e608c7fe3628b0a761dd8affc6aad8f65a6ef7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56704
This is a resubmit of PR: https://github.com/pytorch/pytorch/pull/54175
Main changes compared to the original PR:
- Switch to importing "<ATen/cuda/cub.cuh>"
- Use CUB_WRAPPER to reduce boilerplate code.
Test Plan:
Will check CI status to make sure all tests pass.
Added unit test.
Reviewed By: ngimel
Differential Revision: D27941257
fbshipit-source-id: 24a0e0c7f6c46126d2606fe42ed03dca15684415
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54175
Building on top of the previous PR, this PR adds CUDA support for 1D max reduction.
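A minimal sketch of the 1D CUDA max path (assumes a CUDA device; expected values computed by hand):
```
import torch

data = torch.tensor([1., 4., 2., 3.], device="cuda")
lengths = torch.tensor([2, 2], device="cuda")
out = torch.segment_reduce(data, "max", lengths=lengths)
# out: tensor([4., 3.], device='cuda:0')
```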
Next steps:
- Add support for other major reduction types (e.g. min, sum) for 1D tensors
- Documentation for the op
- Perf optimizations and benchmark util
- Backward support (not high priority)
- Support for multi-dimensional tensors (on data and lengths) (not high priority)
- Support for 'indices' (not high priority)
Test Plan: Added unit test
Reviewed By: ngimel
Differential Revision: D27121170
fbshipit-source-id: 1c2565f42e2903e6fc089d56983ce8857efbfa3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53727
This is the first diff to add native support for segment reduction in PyTorch. It provides functionality similar to torch.scatter or "numpy.ufunc.reduceat".
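For orientation, a rough side-by-side with numpy.ufunc.reduceat (reduceat takes segment start offsets, while this op takes segment lengths; expected values computed by hand):
```
import numpy as np
import torch

a = np.array([1, 2, 3, 4])
np.maximum.reduceat(a, [0, 2])  # array([2, 4]) -- segments start at offsets 0 and 2

t = torch.tensor([1., 2., 3., 4.])
torch.segment_reduce(t, "max", lengths=torch.tensor([2, 2]))  # tensor([2., 4.])
```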
This diff mainly focuses on the API layer, to make sure future improvements will not cause backward-compatibility issues. Once the API is settled, here are the next steps I am planning:
- Add support for other major reduction types (e.g. min, sum) for 1D tensor
- Add Cuda support
- Backward support
- Documentation for the op
- Perf optimizations and benchmark util
- Support for multi-dimensional tensors (on data and lengths) (not high priority)
- Support for 'indices' (not high priority)
Test Plan: Added unit test
Reviewed By: ngimel
Differential Revision: D26952075
fbshipit-source-id: 8040ec96def3013e7240cf675d499ee424437560