* Integrate accelerator abstraction interface into deepspeed/
* Fix error message in fp16/fused_optimizer.py
* Fix error message in fp16/unfused_optimizer.py
* Assign the result of get_accelerator().pin_memory() back to the input tensor's name
* Remove unnecessary checks for CUDA and for NVTX support
* Move the try-except into the innermost block
* Call Event() and Stream() through get_accelerator() where they are used as data types
* Make Stream and Event properties of the abstract interface so they can be used as data types in DeepSpeed
* Apply the op_builder backend API change from #2705 (by @jeffra)
* Fix tests that use a Builder's NAME
* Keep the original ...Builder.NAME interface instead of ...Builder().NAME
* Fix builder closure for installation
* Fix randomltd builder
* Add comments clarifying create_op_builder and get_op_builder
* Fix compatibility with pip install -e
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>