There's an obvious extension here for lifetime inference - the example given doesn't need to be an error, it could compile correctly by increasing the object lifetime to the outer block. I don't know offhand whether there is a universally correct inference algorithm for that (if every other language feature was static then unification would solve it easily, but the other language features are not static and I don't know how it would interact with rust type inference).
In general I prefer the compiler to yell at me rather than to secretly help me behind the scenes. Otherwise, I don't learn anything and I get burned later on when there is a situation where the safeguard can't help me.
OCaml recently added a bunch of "help" -- for example, if you reference a field in a struct, but the name of the field could be from one of several structs, then it will guess which struct you mean.
TBH I don't find this to be that useful -- it covers up potential mistakes, and of course means that code breaks when compiled with the older version of OCaml which didn't do this.
Edit: I should note that this doesn't break type safety.
I like clang's general approach, where if it can guess what you mean to do, it says "did you mean ..." in the error message, and carries on trying to compile the rest of the code on an "assume we did make this change" basis, but fails overall.
If the object has a destructor with side effects, I believe that would massively complicate reasoning about the program's behavior for a human reading it.
It has some unexpected properties: it can sometimes produce dangling pointers because it knows the target data will never be used, even though a GC would treat it as live. And on the other hand it will often promote data to an excessively long-lived region (because region lifetimes have to be nested). So current versions of the MLKit use GC as well as region inference.
No. Currently a reference in a struct or enum has no inferred lifetime, and must be explicitly stated. Having it default to the lifetime of that containing struct or enum would simply mean you don't need to specify it. The danger is it could infer wrongly, leading to lifetime errors that might be cryptic.
But at the end of the day, the Rust compiler will never allow a reference to outlive the original object.
You've misinterpreted: it's not a question about reducing the annotation in a struct/enum definition, but about postponing the destruction of the String so that the later references are valid, i.e. currently we have
    fn test_parse_unsafe() {
        let v = {
            let text = "The cat".to_string();
            tokenize_string3(text.as_slice())
        }; // `text` destroyed here
        assert_eq!(vec![Word("The"), Other(" "), Word("cat")], v);
    }
but the suggestion/question is about changing this to
    fn test_parse_unsafe() {
        let v = {
            let text = "The cat".to_string();
            tokenize_string3(text.as_slice())
        };
        assert_eq!(vec![Word("The"), Other(" "), Word("cat")], v);
    } // `text` destroyed here
so that the references in `v` are valid.
This could lead to "memory leaks", where a destructor is implicitly postponed to a higher scope, but I don't think it would be much of a problem in practice (the promotion would only be through simple scopes, not through loops, and maybe not through `if`s). In fact, there's a yet-to-be-implemented accepted RFC covering this[1] (there's no guarantee that it will be implemented though, just that the idea is mostly sound).
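For comparison, the same effect can be had by hand by declaring the owner in the outer scope yourself. A rough sketch in current Rust: `Token`, `tokenize` and `make` are hypothetical stand-ins for `tokenize_string3` (its real definition isn't shown in the thread), and `as_str()` is used in place of the older `as_slice()`.

```rust
// Hypothetical token type, standing in for the thread's Word/Other.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Word(&'a str),
    Other(&'a str),
}

// Hypothetical helper: wrap a slice in the right variant.
fn make(is_word: bool, s: &str) -> Token<'_> {
    if is_word { Token::Word(s) } else { Token::Other(s) }
}

// Hypothetical tokenizer: splits into runs of alphabetic and
// non-alphabetic characters, borrowing from `text`.
fn tokenize(text: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut start = 0;
    let mut prev_is_word: Option<bool> = None;
    for (i, c) in text.char_indices() {
        let is_word = c.is_alphabetic();
        if let Some(prev) = prev_is_word {
            if prev != is_word {
                tokens.push(make(prev, &text[start..i]));
                start = i;
            }
        }
        prev_is_word = Some(is_word);
    }
    if let Some(prev) = prev_is_word {
        tokens.push(make(prev, &text[start..]));
    }
    tokens
}

// `text` is declared in the outer scope but initialised in the inner
// block, which is the lifetime the proposal would infer automatically.
fn tokens_with_hoisted_owner() -> usize {
    let text; // owner lives until the end of the function
    let v = {
        text = "The cat".to_string();
        tokenize(text.as_str())
    };
    assert_eq!(vec![Token::Word("The"), Token::Other(" "), Token::Word("cat")], v);
    v.len()
} // `text` destroyed here
```

The only difference from the proposal is that the programmer writes `let text;` explicitly, so the longer lifetime is visible in the source.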
I like this - you can make sure something isn't named in subsequent code while still allowing it to be used. Ideally there could be some mechanism to statically assert that it is destroyed by some particular point, though I'm not sure what that should look like (I guess that's just making the lifetime explicit?).
Yes, I agree that it should be safe, I've softened my original text. However, it would require dynamically tracking if the destructor needs to be run, and there's currently discussion[1] about Rust possibly moving to a static model, for the highest performance.
In a two-way if statement, a given storage location is set on zero, one, or both branches. If it is set on neither branch then the if statement is irrelevant and can be ignored. If it is set on one branch, then either it had an original value and hence can be treated as being set on both branches, or it must be destroyed within the branch of the if (no null pointers - think about it until it is clear that the type system guarantees this). Hence we are only interested in cases which are isomorphic with the location being set on both branches.
We can treat this as a phi node following the if: there is one output value, which has been created in one of two different ways. In this case we don't know statically which value has been constructed, but we do know statically how and when to destroy it regardless of which one we get, because both branches have the same type and storage location. We don't actually need to know where it came from.
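To make the "set on both branches" case concrete, here is a small sketch. `Tracked` and `merge` are made-up names for illustration; the destructor just appends to a log so the order of events is visible.

```rust
use std::cell::RefCell;

// Hypothetical type whose destructor has a visible side effect.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn merge(cond: bool) -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // Both branches initialise the same location with the same type,
        // so after the `if` there is exactly one live value.
        let value = if cond {
            Tracked { name: "left", log: &log }
        } else {
            Tracked { name: "right", log: &log }
        };
        log.borrow_mut().push(format!("use {}", value.name));
    } // `value` is destroyed here regardless of which branch produced it
    log.into_inner()
}
```

Either way the log ends with a single drop at the same program point, which is why no runtime bookkeeping is needed for this shape.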
I'm not sure that this particular example can ever use some_string outside the if without hitting a type error, but I see what you mean.
That seems like a reasonable case to raise a type error. That defines the cases quite neatly: if it's temporary on both branches then it can work, and if it has different lifetimes then the values can't be merged and should be rejected. If the programmer really meant for this to work then they need to copy the global, and copies should be written explicitly.
There's no type error at all; we're talking about delaying the destruction of some_string so that the `s` (which is a &str) is valid outside the if. The string literal is a &str with an infinite lifetime, and so can of course be safely restricted to have the same lifetime as the other branch (done implicitly).
However, it's easily possible to have the &str come from temporaries of different types in the two branches. This would restrict the static destruction case to only working through an `if` when the "parent" values have exactly the same types, which doesn't seem nearly as valuable and possibly not worth the effort.
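For reference, the `some_string` shape can be written out by hand with a declaration hoisted above the `if`. A minimal sketch in current Rust; `make_owned` and `pick` are hypothetical names, not anything from the original example.

```rust
// Hypothetical stand-in for whatever produces the temporary String.
fn make_owned() -> String {
    "from a String".to_string()
}

fn pick(cond: bool) -> usize {
    let some_string; // declared outside the `if`, initialised on one branch only
    let s: &str = if cond {
        some_string = make_owned();
        some_string.as_str()
    } else {
        "a literal" // &'static str, shortened to match the other branch
    };
    s.len()
} // `some_string` is destroyed here only if the first branch ran
```

Whether `some_string` needs destroying at the end of `pick` depends on `cond`, so this is exactly the case where the decision can't be made purely at compile time.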
Couldn't the compiler get around that by introducing a boolean variable that is set depending on the branch of the if statement taken, that it checks before running the destructor?
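Spelled out by hand, that boolean might look roughly like the following. This is only a sketch of the idea, not a claim about what the compiler actually emits; `Noisy`, `run` and `trace` are invented names, and `MaybeUninit` stands in for the hoisted storage.

```rust
use std::cell::RefCell;
use std::mem::MaybeUninit;

// Hypothetical type whose destructor has a visible side effect.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Hand-written version of the flag: `slot` is storage hoisted out of the
// `if`, and `needs_drop` records whether the branch initialised it.
fn run(cond: bool, log: &RefCell<Vec<&'static str>>) {
    let mut slot: MaybeUninit<Noisy<'_>> = MaybeUninit::uninit();
    let mut needs_drop = false;

    if cond {
        slot.write(Noisy { name: "temp", log });
        needs_drop = true;
    }

    log.borrow_mut().push("after if");

    // End of the enclosing scope: consult the flag before destroying.
    if needs_drop {
        // SAFETY: the flag is only set right after `slot` was written.
        unsafe { slot.assume_init_drop() };
    }
}

// Collects the order of events for one run.
fn trace(cond: bool) -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    run(cond, &log);
    log.into_inner()
}
```

The cost is one extra byte of state and a branch at scope exit per conditionally initialised value, which is the overhead the static model would avoid.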
Can you elaborate? I fail to see why this is dynamic - it still determines at compile time if/when to run destructors. I was under the impression that dynamic destruction was when you determine when to run destructors at runtime. Garbage collectors or reference counting, in other words.
It is dynamic because it is not known at compile time whether the destructor call will be executed. I'm using the terminology from RFC PR #210 that I linked above.
The obviously "safe" thing to do is to push it out as far as the containing function, and no further. That would be sufficient for this scenario and probably all the ones where you would ever want this to happen.