It's experiment time!
All the */5, */15 and */20 cronjobs lead to the ugly I/O access pattern shown below. Tonight we're adding a RANDOM_DELAY to each crontab to figure out whether that helps. This means your jobs might be delayed by a few minutes compared to their normal run times. Please bear with us! 🐻
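For reference, the change amounts to one extra line at the top of each crontab (a sketch; the job line is made up):

```
RANDOM_DELAY=3                  # cronie: delay job startups by up to 3 random minutes
*/5 * * * * /usr/bin/some-job   # hypothetical job line
```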
Aaand our tests concluded that the change improved the situation by exactly 0% 🎉 I don't understand why, though 😐 The playbook adds RANDOM_DELAY=3 and reloads crond (just to be sure), but the peaks did not change at all.
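For what it's worth, here's a toy simulation (made-up numbers, assuming one random 0-3 minute delay per crontab) of what we expected the delay to do to a fleet of aligned */5 jobs — the per-minute peaks should roughly quarter:

```python
import random

random.seed(42)  # reproducible toy run

N_CRONTABS = 1000   # hypothetical number of user crontabs, each with one */5 job
PERIOD = 5          # job fires every 5 minutes
MAX_DELAY = 3       # RANDOM_DELAY=3 -> random start delay of 0..3 minutes

def peak_height(delays):
    """Max number of jobs firing in any single minute of an hour."""
    counts = [0] * 60
    for d in delays:
        for start in range(0, 60, PERIOD):
            counts[(start + d) % 60] += 1
    return max(counts)

no_delay = peak_height([0] * N_CRONTABS)
with_delay = peak_height([random.randint(0, MAX_DELAY) for _ in range(N_CRONTABS)])

print(f"peak without delay: {no_delay}, with delay: {with_delay}")
```

In the simulation the worst minute drops from all jobs at once to roughly a quarter of them, which is why the flat 0% result above is so puzzling.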
Oh well, back to the drawing board!
@dev Oh noes! But I use cron jobs to do my time-critical measurement of the reactor temperature in my nuclear plant!!!1!
@oxi please note that according to §42(5) of our Allgemeine Geschäftsbedingungen (general terms and conditions) the following use cases are not permitted: fission reactors. However, fusion reactors may be controlled using an uberspace account, assuming the exact operational parameters as well as engineering drawings are provided via FAX in a timely manner. Thank you for your cooperation.
@dev damn, I meant fission reactor, uh... of course I meant fusion reactor 🙄
FAX is not possible due to security reasons. The messenger on horseback was sent a while ago... but on today's streets...
@dev then maybe a small number of jobs is responsible for all of the load? Seems unlikely at your scale, though. Or user cronjobs aren't responsible at all?
*Magic hand waving*
Multiple crond instances
*Magic hand wave end*
Or just check if you aren't doing this to yourself.
Are you gathering metrics via Prometheus?
If so, just set your own scrape intervals to 7 minutes to see if it changes the distance between the peaks.
Or check which intervals any internal tools use and offset them by ±1-2 minutes to see if it makes a change.
If you still have high peaks, you can then see which program is responsible.
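Assuming a stock prometheus.yml, that 7-minute idea would look something like this (job name and target are made up):

```
scrape_configs:
  - job_name: 'node'          # hypothetical job name
    scrape_interval: 7m       # co-prime with the 5/15/20-minute cron schedules
    static_configs:
      - targets: ['localhost:9100']
```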