Error log:
=========================== short test summary info ============================
FAILED tests/unit/test_checkpointing.py::test_checkpoint_moe[4]
FAILED tests/unit/test_checkpointing.py::test_checkpoint_moe_and_zero[4-True]
FAILED tests/unit/test_checkpointing.py::test_checkpoint_moe_and_zero[2-True]
FAILED tests/unit/test_configurable_parallel.py::TestConfigurableMP::test_gpt2_basic
====== 4 failed, 581 passed, 58 skipped, 1 warning in 3850.22s (1:04:10) =======
Steps to reproduce:
Follow the steps in this PR to install PyTorch with hipify_torch as a submodule.
After building and installing PyTorch from source, clone DeepSpeed from upstream, do a JIT build, and run the unit tests:
git clone https://github.com/microsoft/DeepSpeed.git
- Before building, remove the line #include <THC/THCGeneral.h> from csrc/lamb/fused_lamb_cuda_kernel.cu
./install.sh (JIT build)
DEEPSPEED_TEST_WITH_ROCM=1 pytest --forked tests/unit/test_* 2>&1 | tee deepspeed_unit_test
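Because the full run takes over an hour, it may help to rerun only the failing tests. The following is a minimal sketch, not part of the original report: the helper name failed_tests is hypothetical, and it assumes the tee'd log (deepspeed_unit_test) contains pytest's "FAILED <test id>" summary lines as shown in the error log, with no spaces inside the test IDs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the test IDs that pytest reported as FAILED.
# Assumes each summary line has the form "FAILED <test id>[ - message]".
failed_tests() {
    local log_file="$1"
    # Keep only lines that start with "FAILED " and print the second field,
    # which is the test ID (any trailing " - message" is dropped).
    grep '^FAILED ' "$log_file" | awk '{print $2}'
}

# Usage (left commented out; it needs the DeepSpeed checkout and the log):
# failed_tests deepspeed_unit_test | DEEPSPEED_TEST_WITH_ROCM=1 xargs pytest --forked
```

Passing the IDs through xargs avoids shell globbing on the bracketed parameters such as [4-True].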