Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more of the codebase, and eventually all of it.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55886
We've imported LLVM's MathExtras header, but now that we also want to
include LLVM itself (which ships its own MathExtras), we need to guard the c10
version appropriately (or intertwine LLVM more deeply with our build than just
the CPU fuser, which I'm not super excited about doing just yet).
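As a minimal sketch of that kind of guarding (macro names here are illustrative, not necessarily the ones used in c10): skip the vendored copy whenever LLVM's own header has already been seen.

```cpp
// Sketch only: the C10_* guard name is hypothetical. LLVM's MathExtras.h
// uses LLVM_SUPPORT_MATHEXTRAS_H as its include guard, so its presence
// signals that the upstream copy is already in play.
#if !defined(LLVM_SUPPORT_MATHEXTRAS_H)
#ifndef C10_LLVM_MATH_EXTRAS_H
#define C10_LLVM_MATH_EXTRAS_H

// ... vendored MathExtras declarations would go here ...

#endif // C10_LLVM_MATH_EXTRAS_H
#endif // !defined(LLVM_SUPPORT_MATHEXTRAS_H)
```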
ghstack-source-id: 126375067
Test Plan: build
Reviewed By: ZolotukhinM
Differential Revision: D27731038
fbshipit-source-id: 7c136341d6b433b3876ee983820016df75c14dec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46833
Implicit integer conversions are causing compiler warnings. Since in this case the logs make it pretty clear that the `unsigned` types won't overflow despite 64-bit inputs, we fix the issue by making the downconversion explicit.
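For illustration, the fix pattern looks like the sketch below (the function and variable names are made up, not the actual call sites):

```cpp
#include <cstdint>

// Hypothetical example of the fix pattern described above.
void record(int64_t total) {
  // unsigned n = total;                       // warns: implicit narrowing
  unsigned n = static_cast<unsigned>(total);   // explicit, warning-free
  (void)n; // the value is known (from the logs) to fit in 32 bits
}
```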
Test Plan: Standard test rig.
Reviewed By: malfet
Differential Revision: D24481377
fbshipit-source-id: 4422538286d8ed2beb65065544016fd430394ff8
Summary:
# Overall Improvements
1. Switched from `unordered_set` to a sparse bitset.
2. Prevented some excessive memory allocations (thanks to resistor).
3. Took advantage of the sparse bitset's set operations (see the sketch after this list).
4. Switched to `flat_hash_map` instead of `unordered_map` in some places.
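To illustrate point 3: with a bitset representation, an overlap query like `mayAlias` reduces to word-wise AND with an early exit, instead of hashing each element of an `unordered_set`. The sketch below is a self-contained dense stand-in, not the LLVM `SparseBitVector` actually used here (which additionally skips all-zero chunks):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dense stand-in for the sparse bitset used in the PR; illustrative only.
struct BitSet {
  std::vector<uint64_t> words;
  void set(size_t i) {
    if (i / 64 >= words.size()) words.resize(i / 64 + 1, 0);
    words[i / 64] |= uint64_t(1) << (i % 64);
  }
  // A mayAlias-style query: do the two element sets overlap at all?
  bool intersects(const BitSet& other) const {
    size_t n = std::min(words.size(), other.words.size());
    for (size_t w = 0; w < n; ++w) {
      if (words[w] & other.words[w]) return true; // 64 elements per step
    }
    return false;
  }
};
```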
# Benchmarks (somewhat approximate, best of a couple of runs)
1. InceptionNet (load + one forward pass): 19.8 -> 13.3
2. GoogleNet (load + one forward pass): 10.0 -> 7.24
3. DenseNet (load only): 7.3 -> 5.3
I used the sparse bitset implementation from https://llvm.org/doxygen/SparseBitVector_8h_source.html. I had to make some modifications, e.g. calling `__builtin_popcountl` and similar compiler builtins directly instead of pulling in other transitive LLVM/Clang dependencies.
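A concrete (illustrative) example of that kind of substitution, with the caveat that the exact edits to the imported header may differ:

```cpp
// Before (as in upstream LLVM, pulls in llvm/Support/MathExtras.h):
//   count += llvm::countPopulation(word);
// After (GCC/Clang builtin, no extra LLVM headers needed):
inline unsigned popcountWord(unsigned long word) {
  return static_cast<unsigned>(__builtin_popcountl(word));
}
```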
## Some notes on our graph topologies
In general, our graphs are very sparse, and most of the components aren't connected. For GoogleNet, we have 200k nodes, we do 2k `mayAlias` queries, and the sum of the set sizes across all nodes is 500k (i.e., every node, on average, reaches 2.5 leaves).
PS: Holy crap, MacBooks throttle an insane amount with the default fan settings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20899
Differential Revision: D15564612
Pulled By: Chillee
fbshipit-source-id: 2a293a21a9be25f942ca888c8f225cab32bbfcd0