What is your take on checking return values?

grantjpowell · on Sept 3, 2020

> They actually said that to me: it makes the code messy so they don't want to do it.

I'm going to give one possible interpretation (my own) of the sentiment I believe the engineers were trying to get across with the statement "it makes the code messy".

Functionality is an asset, and code is a liability. Minimizing the amount of code and making the code "less messy" is a way of reducing that liability. That doesn't excuse not checking for certain error conditions, but I think it's important to point out that non "messy code" has merit on its own.

On the topic of effective error handling, the number of possible invalid states in a program (especially dealing with the messy real world) tends to be combinatoric. Error checking is vital to building a successful application, but my experience is that in practice it quickly becomes impractical to have _total_ potential error coverage. At some point tradeoffs need to be made, and effective engineering is the art of evaluating those tradeoffs.

> "And yet, here we are, because it DID happen."

I'm not trying to imply the author may be suffering from hindsight bias, I don't have any information that could tell me that. I do know that in similar situations I've been in in the past I feel like I've suffered from hindsight bias significantly. Here's a great article on hindsight bias that I think about often. https://www.lesswrong.com/posts/fkM9XsNvXdYH6PPAx/hindsight-...

The pivotal section from the lesswrong 2.0 article for me

> Viewing history through the lens of hindsight, we vastly underestimate the cost of effective safety precautions. In 1986, the Challenger exploded for reasons traced to an O-ring losing flexibility at low temperature. There were warning signs of a problem with the O-rings. But preventing the Challenger disaster would have required, not attending to the problem with the O-rings, but attending to every warning sign which seemed as severe as the O-ring problem, without benefit of hindsight. It could have been done, but it would have required a general policy much more expensive than just fixing the O-Rings.

nickm12 · on Sept 3, 2020

Tony Hoare, famously, called null references a "billion dollar mistake"[0]. Much preferred to checking return values is to use a language which models null references in the type system doesn't compile code that would try to use a null value as if it weren't null.

Ok, fine, but for whatever reason, some of us don't have the option of working in such a language. What to do then? The author seems to think it's obvious that the answer is you check for null in any function that could possibly receive a null value. This means you have to check for null for every reference type in your system, which is madness.

I agree with the programmers who the author thinks "Do Not Get It". When you have a null reference bug, you fix the bug where the reference is generated, not where it's used. If you fix it at the place where it's used, you've barely fixed anything, because the place where it's generated will continue to send nulls to unsuspecting callers who will blow up when they assume it's non-null.

It sounds to me like the fix here is to write a validation routine for the configuration to ensure that it doesn't vend null values to the rest of the program when it shouldn't. The service can validate its config at load time and fail to start up if it's invalid. This will block your deployments, but your customers won't be impacted and your code won't be littered with null checks that fail to address the root cause.

[0] https://www.infoq.com/presentations/Null-References-The-Bill...

thdc · on Sept 3, 2020

Definitely agree that you should have validation if you can't guarantee the validity of your data, but I think there's a difference here.

Coincidentally, I had a discussion about a similar topic today. Basically how much validation is necessary and at what level should it be at?

  function add(a, b)
    if a is not a number
      throw error
    if b is not a number
      throw error
    result = a + b
    if result > max possible precise value
      throw error
    etc...

I would never have these checks in a function that just added numbers if I could guarantee that any values passed to it are correct.

The team in the article said that the config shouldn't return null so they didn't put checks before using that value. To me it seems like they missed necessary validation logic because they didn't the realize the full extend of the config.

NewEntryHN · on Sept 5, 2020

Why are we talking about checking return values (which are generated by a function according to its interface and should not be checked) when we should be talking about checking configuration values (which are generated by humans according to whatever they type on their keyboard and should be checked). Also why are we talking about checking stuff when we should be talking about one bug bringing the entire system down?

burfog · on Sept 3, 2020

Be sure that your error-handling code checks for errors.

  while(printf("Error!\n")+1 != sizeof "Error!\n")
    ;

Also, it should be bug-free.

burfog · on Sept 3, 2020

No, wait, like this:

  void got_error(void){
    char err[] = "Error!\n";
    int len = strlen(err);
    int ret = printf(err);
    if(len != ret)
      got_error();
  }

Obviously, if the error routine faces an error, it should call the error routine. That is how you handle errors.

Er, but maybe the error routine should also return an error flag in that case? The caller could then check it and call the error routine if needed.

technicalbard · on Sept 2, 2020

I'm an amateur developer and I write checking code in functions all the time because you can't be certain of the source of the data!!!

tjalfi · on Sept 3, 2020

RobustProgramming.ps[0] is a great presentation by Stratus Computers about error handling for writing reliable systems.

I posted it to HN[1] a few years ago.

[0] ftp://ftp.stratus.com/vos/doc/papers/RobustProgramming.ps

[1] https://news.ycombinator.com/item?id=13678251

rurban · on Sept 3, 2020

Could be the glibc authors. Why should one ever check for NULL or out of boundary conditions or overlaps or overflows when the spec forbids it?