Epoch: 24, Last loss :-9.88e-10, Best Accuracy: 97.78%: 100%|████████████████████████████████████████████| 60000/60000 [00:43<00:00, 1366.33 Iter/s]
Epoch: 25, Last loss :-9.92e-10, Best Accuracy: 97.80%: 100%|████████████████████████████████████████████| 60000/60000 [00:43<00:00, 1367.67 Iter/s]
Epoch: 26, Last loss :-9.94e-10, Best Accuracy: 97.82%: 100%|████████████████████████████████████████████| 60000/60000 [00:43<00:00, 1367.13 Iter/s]
Epoch: 27, Last loss :-9.96e-10, Best Accuracy: 97.82%: 100%|████████████████████████████████████████████| 60000/60000 [00:43<00:00, 1366.02 Iter/s]
Epoch: 28, Last loss :-9.97e-10, Best Accuracy: 97.82%: 100%|████████████████████████████████████████████| 60000/60000 [00:43<00:00, 1369.92 Iter/s]
Epoch: 29, Last loss :-9.97e-10, Best Accuracy: 97.84%: 100%|████████████████████████████████████████████| 60000/60000 [00:43<00:00, 1363.86 Iter/s]
Epoch: 30, Last loss :-9.98e-10, Best Accuracy: 97.84%: 100%|████████████████████████████████████████████| 60000/60000 [00:44<00:00, 1359.05 Iter/s]
Epoch: 31, Last loss :-9.98e-10, Best Accuracy: 97.84%: 100%|████████████████████████████████████████████| 60000/60000 [00:44<00:00, 1356.52 Iter/s]
Epoch: 32, Last loss :-9.99e-10, Best Accuracy: 97.84%: 100%|████████████████████████████████████████████| 60000/60000 [00:43<00:00, 1363.80 Iter/s]
Epoch: 33, Last loss :-9.99e-10, Best Accuracy: 97.84%: 100%|████████████████████████████████████████████| 60000/60000 [00:43<00:00, 1368.25 Iter/s]
Epoch: 34, Last loss :-9.99e-10, Best Accuracy: 97.84%: 100%|████████████████████████████████████████████| 60000/60000 [00:43<00:00, 1374.83 Iter/s]
In the later epochs, the model's recognition accuracy on the test dataset improved only slowly: the best accuracy crept from 97.78% at epoch 24 to 97.84% at epoch 29 and then stayed flat through epoch 34. A smaller learning rate was therefore considered.
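One common way to act on this kind of plateau is to cut the learning rate automatically once the monitored accuracy stops improving for a few epochs. The sketch below is an illustrative, framework-free version of that rule, applied to the "Best Accuracy" values from the log. The function `plateau_schedule`, the starting rate of 0.01, the factor of 0.1 and the patience of 3 epochs are all hypothetical choices; the original run does not state its learning rate or how it was lowered. PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` (with `mode="max"`) implements a similar, but not identical, rule.

```python
# Hypothetical plateau rule: multiply the learning rate by `factor` whenever
# the monitored accuracy has not improved by more than `min_delta` for
# `patience` consecutive epochs. All parameter values are illustrative
# assumptions, not settings from the original training run.

def plateau_schedule(accuracies, initial_lr, factor=0.1, patience=3, min_delta=0.0):
    """Return the learning rate in effect after each epoch."""
    lr = initial_lr
    best = float("-inf")
    epochs_without_improvement = 0
    schedule = []
    for acc in accuracies:
        if acc > best + min_delta:
            # New best accuracy: remember it and reset the plateau counter.
            best = acc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                lr *= factor
                # Restart the count so the next cut needs another full plateau.
                epochs_without_improvement = 0
        schedule.append(lr)
    return schedule


# "Best Accuracy" (%) reported at the end of epochs 24-34 in the log.
best_accuracy = [97.78, 97.80, 97.82, 97.82, 97.82, 97.84,
                 97.84, 97.84, 97.84, 97.84, 97.84]

schedule = plateau_schedule(best_accuracy, initial_lr=0.01)
for epoch, lr in zip(range(24, 35), schedule):
    print(f"epoch {epoch}: lr = {lr:g}")
```

With these assumed settings the rate is cut once, at epoch 32, three epochs after the last improvement at epoch 29. A larger `min_delta` treats the 0.02-point gains seen here as noise and triggers the cut earlier.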