I tried to reproduce DDPM on CIFAR-10. Following the paper, I used a batch size of 128, the Adam optimizer, a learning rate of 2e-4, and the L2 loss. During training, the loss keeps fluctuating between 0.015 and 0.030. What could be causing this? Should I reduce the learning rate? Could you share the loss values from your training runs?
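For reference, the objective I'm computing is the simplified noise-prediction loss from the paper. Here is a minimal NumPy sketch of that loss (a dummy zero predictor stands in for the U-Net, so this only illustrates the computation, not my actual model):

```python
import numpy as np

# Linear beta schedule from the DDPM paper: beta_1 = 1e-4 to beta_T = 0.02, T = 1000
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def ddpm_loss(x0, eps_pred_fn, rng):
    """Simplified DDPM objective: MSE between the true and predicted noise."""
    t = rng.integers(0, T, size=x0.shape[0])              # random timestep per sample
    eps = rng.standard_normal(x0.shape)                   # true Gaussian noise
    ab = alphas_bar[t].reshape(-1, 1, 1, 1)               # broadcast over C, H, W
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps      # forward-noised input
    return np.mean((eps - eps_pred_fn(x_t, t)) ** 2)      # L2 loss on the noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((128, 3, 32, 32))                # a CIFAR-10-shaped batch of 128
# Dummy predictor: always predicts zero noise, so the loss is roughly E[eps^2] ~ 1.0
loss = ddpm_loss(x0, lambda x_t, t: np.zeros_like(x_t), rng)
print(loss)
```

With a trained network the per-batch loss varies a lot with the sampled timesteps `t` (small `t` means nearly clean inputs and an easy prediction), which is why I'm unsure whether the 0.015-0.030 fluctuation is normal or a learning-rate issue.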