Another author here. I'll add to Jared's comment above that for long-running experiments (like the ones our Dota team runs), it can be useful to track this statistic in real time to see whether it would be worthwhile to scale up the experiment.
What's a cheap and unobtrusive way to estimate the B_simple version of the noise scale in real time? Piggyback on Adam's moving mean and variance estimates? Edit: I see that Appendix A has a method for the multi-device training setting, but I'm thinking of single-device training.
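For concreteness, here is a rough sketch of one way the two-batch-size idea might carry over to a single device. It assumes the batch is already split into k >= 2 equal microbatches (as with gradient accumulation), so the squared gradient norm is available at two batch sizes: the mean squared norm of the per-microbatch gradients (small batch) and the squared norm of their average (big batch). It relies on the standard relation E[|G_B|^2] = |G|^2 + tr(Sigma)/B and takes B_simple = tr(Sigma)/|G|^2. The names `two_batch_estimates` and `NoiseScaleTracker` and the `decay` parameter are made up for illustration, and this is not the Adam-based approach asked about.

```python
def two_batch_estimates(g_small_sq, g_big_sq, b_small, b_big):
    """Solve E[|G_B|^2] = |G|^2 + tr(Sigma)/B at two batch sizes.

    Returns unbiased estimates of (|G|^2, tr(Sigma)).
    """
    if b_big <= b_small:
        raise ValueError("need b_big > b_small (at least two microbatches)")
    g_sq = (b_big * g_big_sq - b_small * g_small_sq) / (b_big - b_small)
    trace_sigma = (g_small_sq - g_big_sq) / (1.0 / b_small - 1.0 / b_big)
    return g_sq, trace_sigma


class NoiseScaleTracker:
    """Hypothetical helper: keeps moving averages of both estimates.

    The numerator and denominator are averaged separately and only then
    divided, because the per-step ratio is very noisy.
    """

    def __init__(self, decay=0.99):
        self.decay = decay
        self.g_sq = 0.0
        self.trace_sigma = 0.0

    def update(self, g_small_sq, g_big_sq, b_small, b_big):
        g_sq, trace_sigma = two_batch_estimates(
            g_small_sq, g_big_sq, b_small, b_big
        )
        d = self.decay
        self.g_sq = d * self.g_sq + (1.0 - d) * g_sq
        self.trace_sigma = d * self.trace_sigma + (1.0 - d) * trace_sigma

    @property
    def b_simple(self):
        # Both averages start at zero and share the same warm-up factor,
        # so it cancels in the ratio and no bias correction is needed.
        return self.trace_sigma / self.g_sq
```

Each training step would call `tracker.update(...)` with the two squared norms and read `tracker.b_simple` for logging. The extra cost is one norm per microbatch plus one norm of the averaged gradient. The estimate can be unstable or even negative early on, while the averaged |G|^2 is still close to zero.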