Hi, I tried to train the model with only LJ data and with only my own data, with fp16 and with fp32, with 1 GPU and with 3 GPUs, but every time I get this:

The loss is always NaN.
When I start from a pretrained checkpoint, your code returns this:

I solved that by changing def load_checkpoint, but the loss is still NaN :(
Do you have any ideas what I am doing wrong?