If QA can't easily simulate it (and in this case infra had to get involved just to hang the service), it's hard to even test Rachel's suggestion, as she mentioned.
No dev team would add a /hang endpoint just sitting out there, either.
Every environment, from local to pre-production, may bind the port successfully. So it becomes an operations issue, almost an emergent one: it only appears when the service is one among many services in an ecosystem.
Ideally, the port binding retry gets added to an internal framework, so that all teams benefit. The first version might be a hack, but later improvements multiply every service's robustness.
That said, it seems like there's a threshold past which distributed port binding retries synchronize (?), and then nothing can rebind.
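
If the worry is retries lining up with each other, one common mitigation is to randomize the wait between attempts (exponential backoff with jitter). Here is a minimal sketch in Python, not anyone's actual framework code. The names `backoff_delays` and `bind_with_retry` and the default values are made up for illustration, and whether jitter fixes the specific threshold behavior described above is an assumption.

```python
import random
import socket
import time


def backoff_delays(count, base=0.5, cap=30.0, rng=random.random):
    """Yield `count` delays: exponential backoff with full jitter.

    Each delay is a random fraction of min(cap, base * 2**attempt), so
    services that failed at the same moment are unlikely to retry together.
    """
    for attempt in range(count):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def bind_with_retry(host, port, attempts=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Try to bind and listen on host:port, waiting a jittered delay between tries.

    Hypothetical helper: returns the listening socket, or re-raises the
    last OSError once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # One delay between each pair of attempts, none after the final one.
    delays = list(backoff_delays(attempts - 1, base, cap))
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen()
            return sock
        except OSError:
            sock.close()
            if attempt == attempts - 1:
                raise
            sleep(delays[attempt])


if __name__ == "__main__":
    # Port 0 asks the OS for any free port, so this demo binds on the first try.
    server = bind_with_retry("127.0.0.1", 0)
    print("bound to", server.getsockname())
    server.close()
```

The `sleep` and `rng` parameters exist so the retry path can be exercised without real waiting or randomness, which speaks to the "hard to test" point: holding the port with a second socket is enough to force the failure locally.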