> The big problem we encountered is that its unit of management (I believe it's called a "job" but I'm rusty on this) is too low-level. Our pipeline processes a lot of data and we have millions of jobs per day. Once Airflow has a (planned or unplanned) outage, tens of thousands of jobs start piling up, and it never recovers from that.

That sounds more like an architecture-at-scale problem than something that is Airflow's 'fault.' Airflow may never have been the right tool for the job, but it's getting all the blame.