"Unsafe" is an escape hatch used for a number of reasons. There are good ones and bad ones. Bad ones include:
- "I'm so l33t I don't need the compiler to check me." (Don't hire those guys.)
- "Safe code is too slow". (File bugs on the compiler's optimizer.)
- "Porting this to safe code would require a redesign". (See the Rust port of DOOM.)
Most of the real needs for "Unsafe" in Rust come from:
- The need to interface with external code, including system calls.
- Forced type conversion ("casting").
- Memory allocation.
The first one is mostly a problem with expressive power in the foreign function interface. Can you express what "int read(int fd, char buf[], size_t len)" means in the foreign function definition syntax? Rust's foreign function syntax isn't expressive enough to do that.[1] You can't tell Rust that "len" is the length of "buf". Being able to do that would help reduce the need for unsafe code. Most of the POSIX/Linux API can be described with relatively simple syntax that allows you to associate size info with C arrays. (I once proposed this as an extension to C. It's technically possible but politically too difficult.)[2]
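To make the point concrete, here's a minimal sketch of the usual workaround today: a thin safe wrapper in which the slice carries its own length, so the "len is the length of buf" relationship is enforced in one audited place rather than at every call site. (This assumes a Unix-like system; the demo reads from /dev/zero.)

```rust
use std::io;
use std::os::raw::{c_int, c_void};

// The raw POSIX call; in real code this declaration comes from the libc crate.
extern "C" {
    fn read(fd: c_int, buf: *mut c_void, count: usize) -> isize;
}

/// Safe wrapper: `buf.len()` is passed as the length, so the caller
/// cannot get the buffer/length relationship wrong.
fn read_fd(fd: c_int, buf: &mut [u8]) -> io::Result<usize> {
    let n = unsafe { read(fd, buf.as_mut_ptr() as *mut c_void, buf.len()) };
    if n < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(n as usize)
    }
}

fn main() -> io::Result<()> {
    use std::os::unix::io::AsRawFd;
    let dev_zero = std::fs::File::open("/dev/zero")?;
    let mut buf = [0xAAu8; 8];
    let n = read_fd(dev_zero.as_raw_fd(), &mut buf)?;
    // /dev/zero fills the requested range with zero bytes.
    assert_eq!(&buf[..n], &[0u8; 8][..n]);
    Ok(())
}
```

The `unsafe` hasn't disappeared, but it's confined to one line whose precondition is locally checkable.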
If your external interface still requires unsafe code after that, you're probably talking to something that has elaborate foreign data structures visible to the caller. Those really are unsafe. They also usually need a rewrite anyway. (OpenSSL comes to mind.)
Forced type conversion, or casting, is traditionally a problem. Most of the trouble comes from C, where casts bypass all type checking. In practice, much casting is safe. If a type is fully mapped to the underlying bits (i.e. all possible bit values are valid for the type), then allowing a cast is safe. If you cast 4 bytes to a 32-bit unsigned integer, the result is always a valid 32-bit unsigned integer. Conversions like that should be explicit, but are not memory-unsafe. On the other hand, casting to a pointer is always unsafe. Again, with a bit more expressive power, the need for unsafe code can be reduced.
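For the 4-bytes-to-u32 case specifically, Rust's standard library already offers an explicit, safe, total conversion, for instance:

```rust
fn main() {
    let bytes: [u8; 4] = [0xDE, 0xAD, 0xBE, 0xEF];
    // Explicit, safe conversion: every 4-byte pattern is a valid u32.
    let n = u32::from_le_bytes(bytes);
    assert_eq!(n, 0xEFBE_ADDE);
    // Round-trips losslessly, since the mapping is total in both directions.
    assert_eq!(n.to_le_bytes(), bytes);
}
```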
Memory allocation is hard. However, more of it could be done in safe code. Suppose Rust had a type "space", which is simply an array of bytes, treated as write-only. Constructors take in an array of "space" of the desired type, create a valid local structure with the initialized values, and then perform an operation which copies the structure to the array of "space" and changes its type to the type of the structure. This is safe construction. As an optimization, the compiler can observe that if no reads are made from the local structure prior to converting the "space", the extra local copy is unnecessary.
"Space" would still have Rust scope and lifetime, so all that machinery remains hidden. But it's convenient to separate it from construction. Raw memory allocation is complex and unsafe, but separated from the type system, it's a coherent closed system that doesn't get modified much. It's a good candidate for formal proof of correctness - not too big, and critical to system operation.
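For what it's worth, the standard library's std::mem::MaybeUninit covers part of the "space" idea: a write-only chunk the right size and alignment for T, with one explicit step asserting that initialization is complete. A sketch (the Point type is illustrative):

```rust
use std::mem::MaybeUninit;

#[derive(Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

fn main() {
    // Write-only "space" the size of a Point, much like the proposal.
    let mut space: MaybeUninit<Point> = MaybeUninit::uninit();
    space.write(Point { x: 1.0, y: 2.0 });
    // The one unsafe step is the claim "this is now fully initialized";
    // it performs the type change from raw space to a typed value.
    let p = unsafe { space.assume_init() };
    assert_eq!(p, Point { x: 1.0, y: 2.0 });
}
```

The difference from the proposal is that the "is it initialized yet?" claim is still the programmer's unchecked assertion rather than something the compiler tracks.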
Operations such as expanding vectors seem to include unsafe code. That's worth a hard look. If you had the "space" concept, and the operation that moves a struct into a "space" array and converts the type, it should be possible to do operations such as growing an array without unsafe code.
For Rust 2, it's worth looking at how the need for unsafe code can be reduced. Ultimately, everything should be either memory safe or have a machine proof of memory correctness at the instruction level.
> The first one is mostly a problem with expressive power in the foreign function interface. Can you express what "int read(int fd, char buf[], size_t len)" means in the foreign function definition syntax? Rust's foreign function syntax isn't expressive enough to do that.[1] You can't tell Rust that "len" is the length of "buf". Being able to do that would help reduce the need for unsafe code. Most of the POSIX/Linux API can be described with relatively simple syntax that allows you to associate size info with C arrays. (I once proposed this as an extension to C. It's technically possible but politically too difficult.)[2]
There's still a need for `unsafe`, since it's possible for the relationship to be described incorrectly. It's fundamentally not something the compiler can check, and hence requires `unsafe` conceptually (if not in practice).
One can regard wrapping FFI functions in safe interfaces as specifying the relationships between parameters.
> If you cast 4 bytes to a 32-bit unsigned integer, the result is always a valid 32-bit unsigned integer. Conversions like that should be explicit, but are not memory-unsafe.
Only a very small subset of types have the property that any bit-pattern is safe, essentially only primitives. So this seems like a rather limited way to reduce unsafety (instead of just writing a short library function once).
No, only structs with no invariants. As soon as you have invariants, there are illegal bit patterns. Of course, these illegal bit patterns may not necessarily result in memory unsafety, but there's no way for the compiler to know this automatically^.
This functionality could be implemented something like
fn from_bytes<T: JustBits>(bytes: &[u8]) -> Option<&T> {
    // Must check alignment as well as length: producing an unaligned &T
    // would be undefined behavior even for a JustBits type.
    if bytes.len() >= std::mem::size_of::<T>()
        && bytes.as_ptr() as usize % std::mem::align_of::<T>() == 0
    {
        unsafe { Some(&*(bytes.as_ptr() as *const T)) }
    } else {
        None
    }
}
/// Values for which any bit pattern is valid.
pub unsafe trait JustBits {}
unsafe impl JustBits for u8 {}
unsafe impl JustBits for i8 {}
unsafe impl JustBits for u16 {}
unsafe impl JustBits for i16 {}
// ...
Some custom struct that can be any bit pattern can then do:
unsafe impl JustBits for MyStruct {}
Of course, there's `unsafe` there, but there has to be: it's asserting that "yes, I'm sure that anything works".
^Notably, there have been proposals for `unsafe` fields, which would make expressing "invariants exist" more focused and adjust the trade-offs here.
(I'll note that a TCP header has 3 reserved bits (100, 101, 102) which, I believe, should be set to zero, making some bit patterns theoretically illegal.)
I've run into two kinds of problems that require unsafe so far.
The first is similar to the C extension you described - building data structures with dynamically sized arrays, e.g. a bitfield followed by population_count(bitfield) entries. It would be great to have some way to express this without having to pay a whole usize for a DST, e.g.:
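For context on the usize cost: the closest thing expressible in Rust today is a slice DST, and a reference to one is a fat pointer carrying an element count that is redundant with bitfield.count_ones(). A sketch (Row and its fields are illustrative names):

```rust
// A header whose bitfield implies how many entries follow.
// Expressible today as a slice DST, but only at a cost:
#[repr(C)]
struct Row {
    bitfield: u64,
    entries: [u32],
}

fn main() {
    // A &Row is a fat pointer: data pointer plus an element count,
    // even though the count is recoverable from bitfield.count_ones().
    assert_eq!(
        std::mem::size_of::<&Row>(),
        2 * std::mem::size_of::<usize>()
    );
}
```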
The second is dealing with recursive data structures or algorithms. Even for tree-shaped stuff, if you are hanging onto state as you walk the tree, there are some kinds of patterns that the borrow checker just can't deal with, e.g.:
fn join_step<'a>(state: &mut Vec<&'a Value>, ..) {
    ...
    for values in primitive.eval_from_join(&arguments[..], &state[..]).into_iter() {
        // promise the borrow checker that we will pop values before we exit this scope
        let values = unsafe { ::std::mem::transmute::<&Vec<Value>, &'a Vec<Value>>(&values) };
        push_all(state, values);
        if join.constraints[ix].iter().all(|constraint| constraint.is_satisfied_by(&state[..])) {
            join_step(state, ...)
        }
        pop_all(state, values);
    }
}
It would be nice to have some finer-grained way of making this promise - transmute is overkill and leaves me open to all kinds of mistakes.
If you're just trying to demote a lifetime, you should just be able to specify the lifetime on the variable:
let values: &'a Vec<Value> = &values;
Lifetimes have variance, so you can always put a "bigger" lifetime in a place expecting a "smaller" one safely, and it will be treated as the smaller one from then on.
Although it can be a bit of a dangerous game trying to "set" lifetimes manually because lifetime variance is hell (at least to me).
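A minimal illustration of that variance (function and variable names made up): a &'static str flows into a parameter expecting a shorter borrow, and both sides unify at the shorter lifetime.

```rust
// Both parameters share one lifetime 'a; the caller may pass
// references with different actual lifetimes, and each coerces
// down to their common region.
fn shortest_of<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() <= b.len() { a } else { b }
}

fn main() {
    let owned = String::from("a long runtime string");
    // "fixed" is &'static str, demoted here to the lifetime
    // of `owned`'s borrow without any annotation.
    let result = shortest_of("fixed", owned.as_str());
    assert_eq!(result, "fixed");
}
```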
> It's a good candidate for formal proof of correctness - not too big, and critical to system operation.
Unfortunately, having actually looked for production-strength proofs of correctness for production allocators (as opposed to toys), I wasn't able to find any. Given how high-value a target they are for formal verification, we may be underestimating the complexity of doing so.
It's discouraging to me how little progress there's been in proof of correctness in the last 35 years. I used to work on that stuff. C set the field back by decades.
That was one of the "toy" implementations I mentioned. The algorithm they described is going to be nowhere near the performance of a modern multithreaded allocator like jemalloc (which is correspondingly far more complex). Which is not to say things like that are not encouraging, only that I was hoping someone would have proven something people actually use correct.
[1] https://doc.rust-lang.org/book/ffi.html
[2] http://www.animats.com/papers/languages/safearraysforc43.pdf