"Unsafe" is an escape hatch used for a number of reasons. There are good ones and bad ones. Bad ones include:
- "I'm so l33t I don't need the compiler to check me." (Don't hire those guys.)
- "Safe code is too slow". (File bugs on the compiler's optimizer.)
- "Porting this to safe code would require a redesign". (See the Rust port of DOOM.)
Most of the real needs for "Unsafe" in Rust come from:
- The need to interface with external code, including system calls.
- Forced type conversion ("casting").
- Memory allocation.
The first one is mostly a problem with expressive power in the foreign function interface. Can you express what "int read(int fd, char buf[], size_t len)" means in the foreign function definition syntax? Rust's foreign function syntax isn't expressive enough to do that.[1] You can't tell Rust that "len" is the length of "buf". Being able to do that would help reduce the need for unsafe code. Most of the POSIX/Linux API can be described with relatively simple syntax that allows you to associate size info with C arrays. (I once proposed this as an extension to C. It's technically possible but politically too difficult.)[2]
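To make the point concrete, here's a minimal sketch of the usual workaround today: a thin safe wrapper in which the slice carries its own length, so the "len is the length of buf" relationship is enforced in one audited place rather than at every call site. (This assumes a Unix-like system; the demo reads from /dev/zero.)

```rust
use std::io;
use std::os::raw::{c_int, c_void};

// The raw POSIX call; in real code this declaration comes from the libc crate.
extern "C" {
    fn read(fd: c_int, buf: *mut c_void, count: usize) -> isize;
}

/// Safe wrapper: `buf.len()` is passed as the length, so the caller
/// cannot get the buffer/length relationship wrong.
fn read_fd(fd: c_int, buf: &mut [u8]) -> io::Result<usize> {
    let n = unsafe { read(fd, buf.as_mut_ptr() as *mut c_void, buf.len()) };
    if n < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(n as usize)
    }
}

fn main() -> io::Result<()> {
    use std::os::unix::io::AsRawFd;
    let dev_zero = std::fs::File::open("/dev/zero")?;
    let mut buf = [0xAAu8; 8];
    let n = read_fd(dev_zero.as_raw_fd(), &mut buf)?;
    // /dev/zero fills the requested range with zero bytes.
    assert_eq!(&buf[..n], &[0u8; 8][..n]);
    Ok(())
}
```

The `unsafe` hasn't disappeared, but it's confined to one line whose precondition is locally checkable.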
If your external interface still requires unsafe code after that, you're probably talking to something that has elaborate foreign data structures visible to the caller. Those really are unsafe. They also usually need a rewrite anyway. (OpenSSL comes to mind.)
Forced type conversion, or casting, is traditionally a problem. Most of the trouble comes from C, where casts bypass all type checking. In practice, much casting is safe. If a type is fully mapped to the underlying bits (i.e. all possible bit values are valid for the type), then allowing a cast is safe. If you cast 4 bytes to a 32-bit unsigned integer, the result is always a valid 32-bit unsigned integer. Conversions like that should be explicit, but are not memory-unsafe. On the other hand, casting to a pointer is always unsafe. Again, with a bit more expressive power, the need for unsafe code can be reduced.
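For the 4-bytes-to-u32 case specifically, Rust's standard library already offers an explicit, safe, total conversion, for instance:

```rust
fn main() {
    let bytes: [u8; 4] = [0xDE, 0xAD, 0xBE, 0xEF];
    // Explicit, safe conversion: every 4-byte pattern is a valid u32.
    let n = u32::from_le_bytes(bytes);
    assert_eq!(n, 0xEFBE_ADDE);
    // Round-trips losslessly, since the mapping is total in both directions.
    assert_eq!(n.to_le_bytes(), bytes);
}
```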
Memory allocation is hard. However, more of it could be done in safe code. Suppose Rust had a type "space", which is simply an array of bytes, treated as write-only. Constructors take in an array of "space" of the desired type, create a valid local structure with the initialized values, and then perform an operation which copies the structure to the array of "space" and changes its type to the type of the structure. This is safe construction. As an optimization, the compiler can observe that if no reads are made from the local structure prior to converting the "space", the extra local copy is unnecessary.
"Space" would still have Rust scope and lifetime, so all that machinery remains hidden. But it's convenient to separate it from construction. Raw memory allocation is complex and unsafe, but separated from the type system, it's a coherent closed system that doesn't get modified much. It's a good candidate for formal proof of correctness - not too big, and critical to system operation.
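For what it's worth, the standard library's std::mem::MaybeUninit covers part of the "space" idea: a write-only chunk the right size and alignment for T, with one explicit step asserting that initialization is complete. A sketch (the Point type is illustrative):

```rust
use std::mem::MaybeUninit;

#[derive(Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

fn main() {
    // Write-only "space" the size of a Point, much like the proposal.
    let mut space: MaybeUninit<Point> = MaybeUninit::uninit();
    space.write(Point { x: 1.0, y: 2.0 });
    // The one unsafe step is the claim "this is now fully initialized";
    // it performs the type change from raw space to a typed value.
    let p = unsafe { space.assume_init() };
    assert_eq!(p, Point { x: 1.0, y: 2.0 });
}
```

The difference from the proposal is that the "is it initialized yet?" claim is still the programmer's unchecked assertion rather than something the compiler tracks.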
Operations such as expanding vectors seem to include unsafe code. That's worth a hard look. If you had the "space" concept, and the operation that moves a struct into a "space" array and converts the type, it should be possible to do operations such as growing an array without unsafe code.
For Rust 2, it's worth looking at how the need for unsafe code can be reduced. Ultimately, everything should be either memory safe or have a machine proof of memory correctness at the instruction level.
> The first one is mostly a problem with expressive power in the foreign function interface. Can you express what "int read(int fd, char buf[], size_t len)" means in the foreign function definition syntax? Rust's foreign function syntax isn't expressive enough to do that.[1] You can't tell Rust that "len" is the length of "buf". Being able to do that would help reduce the need for unsafe code. Most of the POSIX/Linux API can be described with relatively simple syntax that allows you to associate size info with C arrays. (I once proposed this as an extension to C. It's technically possible but politically too difficult.)[2]
There's still a need for `unsafe`, since it's possible for the relationship to be described incorrectly. It's fundamentally not something the compiler can check, and hence requires `unsafe` conceptually (if not in practice).
One can regard wrapping FFI functions in safe interfaces as specifying the relationships between parameters.
> If you cast 4 bytes to a 32-bit unsigned integer, the result is always a valid 32-bit unsigned integer. Conversions like that should be explicit, but are not memory-unsafe.
Only a very small subset of types have the property that any bit-pattern is safe, essentially only primitives. So this seems like a rather limited way to reduce unsafety (instead of just writing a short library function once).
No, only structs with no invariants. As soon as you have invariants, there are illegal bit patterns. Of course, these illegal bit patterns may not necessarily result in memory unsafety, but there's no way for the compiler to know this automatically^.
This functionality could be implemented something like
fn from_bytes<T: JustBits>(bytes: &[u8]) -> Option<&T> {
    // Must check alignment as well as length: producing an unaligned &T
    // would be undefined behavior even for a JustBits type.
    if bytes.len() >= std::mem::size_of::<T>()
        && bytes.as_ptr() as usize % std::mem::align_of::<T>() == 0
    {
        unsafe { Some(&*(bytes.as_ptr() as *const T)) }
    } else {
        None
    }
}
/// Values for which any bit pattern is valid.
pub unsafe trait JustBits {}
unsafe impl JustBits for u8 {}
unsafe impl JustBits for i8 {}
unsafe impl JustBits for u16 {}
unsafe impl JustBits for i16 {}
// ...
Some custom struct that can be any bit pattern can then do:
unsafe impl JustBits for MyStruct {}
Of course, there's `unsafe` there, but there has to be: it's asserting that "yes, I'm sure that anything works".
^Notably, there have been proposals for `unsafe` fields, which would make expressing "invariants exist" more focused and adjust the trade-offs here.
(I'll note that a TCP header has 3 reserved bits (100, 101, 102) which, I believe, should be set to zero, making some bit patterns theoretically illegal.)
I've run into two kinds of problems that require unsafe so far.
The first is similar to the C extension you described - building data structures with dynamically sized arrays, e.g. a bitfield followed by population_count(bitfield) entries. It would be great to have some way to express this without having to pay a whole usize for a DST, e.g.:
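For context on the usize cost: the closest thing expressible in Rust today is a slice DST, and a reference to one is a fat pointer carrying an element count that is redundant with bitfield.count_ones(). A sketch (Row and its fields are illustrative names):

```rust
// A header whose bitfield implies how many entries follow.
// Expressible today as a slice DST, but only at a cost:
#[repr(C)]
struct Row {
    bitfield: u64,
    entries: [u32],
}

fn main() {
    // A &Row is a fat pointer: data pointer plus an element count,
    // even though the count is recoverable from bitfield.count_ones().
    assert_eq!(
        std::mem::size_of::<&Row>(),
        2 * std::mem::size_of::<usize>()
    );
}
```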
The second is dealing with recursive data structures or algorithms. Even for tree-shaped stuff, if you are hanging onto state as you walk the tree, there are some kinds of patterns that the borrow checker just can't deal with, e.g.:
fn join_step<'a>(state: &mut Vec<&'a Value>, ..) {
    ...
    for values in primitive.eval_from_join(&arguments[..], &state[..]).into_iter() {
        // promise the borrow checker that we will pop values before we exit this scope
        let values = unsafe { ::std::mem::transmute::<&Vec<Value>, &'a Vec<Value>>(&values) };
        push_all(state, values);
        if join.constraints[ix].iter().all(|constraint| constraint.is_satisfied_by(&state[..])) {
            join_step(state, ...)
        }
        pop_all(state, values);
    }
}
It would be nice to have some finer-grained way of making this promise - transmute is overkill and leaves me open to all kinds of mistakes.
If you're just trying to demote a lifetime, you should just be able to specify the lifetime on the variable:
let values: &'a Vec<Value> = &values;
Lifetimes have variance, so you can always put a "bigger" lifetime in a place expecting a "smaller" one safely, and it will be treated as the smaller one from then on.
Although it can be a bit of a dangerous game trying to "set" lifetimes manually because lifetime variance is hell (at least to me).
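A minimal illustration of that variance (function and variable names made up): a &'static str flows into a parameter expecting a shorter borrow, and both sides unify at the shorter lifetime.

```rust
// Both parameters share one lifetime 'a; the caller may pass
// references with different actual lifetimes, and each coerces
// down to their common region.
fn shortest_of<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() <= b.len() { a } else { b }
}

fn main() {
    let owned = String::from("a long runtime string");
    // "fixed" is &'static str, demoted here to the lifetime
    // of `owned`'s borrow without any annotation.
    let result = shortest_of("fixed", owned.as_str());
    assert_eq!(result, "fixed");
}
```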
> It's a good candidate for formal proof of correctness - not too big, and critical to system operation.
Unfortunately, having actually looked for production-strength proofs of correctness for production allocators (as opposed to toys), I wasn't able to find any. Given how high-value a target they are for formal verification, we may be underestimating the complexity of doing so.
It's discouraging to me how little progress there's been in proof of correctness in the last 35 years. I used to work on that stuff. C set the field back by decades.
That was one of the "toy" implementations I mentioned. The algorithm they described is going to be nowhere near the performance of a modern multithreaded allocator like jemalloc (which is correspondingly far more complex). Which is not to say things like that are not encouraging, only that I was hoping someone would have proven something people actually use correct.
[1] https://doc.rust-lang.org/book/ffi.html
[2] http://www.animats.com/papers/languages/safearraysforc43.pdf