Unsafe Rust: An Intro and Open Questions

Gankro · on May 28, 2015

Author here. I'm currently interning at Mozilla with the goal of writing the "advanced" companion to the TRPL (The Rust Programming Language book): TURPL (The Unsafe Rust Programming Language).

In the process I'll need to wrangle various members of the community -- particularly core team members -- to determine the things that we actually intend to guarantee in safe code, and what unsafe code is allowed to "do".

This post was intended to kick off that effort by:

* Making it clear that stuff is unclear

* Asserting my beliefs on what things should be

* Getting the whole internet mad at me so that they can explain what it should actually be

So, please, single file: Get Mad On The Internet At This Guy

sanderjd · on May 28, 2015

Thanks for the great article!

I think the most important point you make is that unsafety confounds local reasoning, requiring reasoning about all the possible interactions a block of code might have with everything else. One of my favorite things about (safe) Rust (and functional languages in general) is exactly this local reasoning capability that you give up in unsafe code. It's still not clear to me whether or not it's harder to write correct-in-all-cases unsafe Rust code than correct-in-all-cases C/C++/etc. code in general.

As I was reading, I was thinking it would be nice for crates.io to include an indication of "level of unsafety" for each crate, but then you went and pointed out that the naive metrics for it would be dependent on stylistic choices. I wonder if you could transform to a "minimally unsafe representation", at either the code or AST level, and evaluate that.

Looking forward to TURPL, and especially interested in learning more about how unsafe code interacts with destructors, which seem particularly fraught.

jcranmer · on May 28, 2015

Since you mentioned you didn't understand what LLVM's "in bounds" meant:

In general, LLVM semantics are largely derived from C and C++ in terms of undefined behavior (although some aspects, like signed overflow, are merely optional). The rule for "in bounds" means that pointers can only be manipulated to point to some address within the bounds of their object, or possibly the address just after the object (which cannot be dereferenced). A concrete example:

struct foo { int a; char b; int c[32]; } obj;

char ptr = &obj.b; char ptr2 = ptr + kerfuffle;

In this example, it is undefined behavior if kerfuffle is not in the range of [-sizeof(int), sizeof(obj) - sizeof(int)]. (e.g., ptr2 == &obj.c[32] is valid, but ptr2 == &obj.c[33] is not).

Gankro · on May 28, 2015

It was more a jab at LLVM's documentation in general ;)

But since you're claiming to understand LLVM docs:

* How does LLVM identify that a region of memory is "allocated" per the usage in the GEP docs. In particular it may be useful to mark special addresses as "allocated" for special marker objects that don't actually exist.

* Does "in bounds" extend to arrays? e.g. can I offset even further from a ptr to foo if it's in an array of foo?

sunfish · on May 28, 2015

"Allocated" is something that arbitrary functions like "malloc" can define for themselves. LLVM specifies what can be done with "allocated memory", and then it's up to the API and implementation of malloc to provide something that works and is useful within LLVM's framework. Dereferencing "allocated" bytes through a proper pointer has to work. Dereferencing "unallocated" bytes is undefined behavior (regardless of whether the deference will succeed or fail in hardware).

LLVM's type system is mostly inert in its memory semantics. There is no difference between arrays or any other type of object with respect to what addresses can be computed and dereferenced. The important things are allocations which guarantee contiguous regions of memory.

kzrdude · on May 28, 2015

That's also this tricky issue with C that's often ignored. If you have a nested array and take a pointer &foo[0][0], the only in bounds pointers are the pointers into the 0th subarray.

sunfish · on May 28, 2015

That's true for C, but not for LLVM inbounds.

Animats · on May 28, 2015

"Unsafe" is an escape hatch used for a number of reasons. There are good ones and bad ones. Bad ones include:

- "I'm so l33t I don't need the compiler to check me." (Don't hire those guys.)

- "Safe code is too slow". (File bugs on the compiler's optimizer.)

- "Porting this to safe code would require a redesign". (See the Rust port of DOOM.)

Most of the real needs for "Unsafe" in Rust come from

- The need to interface with external code, including system calls.

- Forced type conversion ("casting")

- Memory allocation.

The first one is mostly a problem with expressive power in the foreign function interface. Can you express what "int read(int fd, char buf[], size_t len)" means in the foreign function definition syntax? Rust's foreign function syntax isn't expressive enough to do that.[1] You can't tell Rust that "len" is the length of "buf". Being able to do that would help reduce the need for unsafe code. Most of the POSIX/Linux API can be described with relatively simple syntax that allows you to associate size info with C arrays. (I once proposed this as an extension to C. It's technically possible but politically too difficult.)[2]

If your external interface still requires unsafe code after that, you're probably talking to something that has elaborate foreign data structures visible to the caller. Those really are unsafe. They also usually need a rewrite anyway. (OpenSSL comes to mind.)

Forced type conversion, or casting, is traditionally a problem. Most of the trouble comes from C, where casts bypass all type checking. In practice, much casting is safe. If a type is fully mapped to the underlying bits (i.e. all possible bit value are valid for the type), then allowing a cast is safe. If you cast 4 bytes to a 32-bit unsigned integer, the result is always a valid 32-bit unsigned integer. Conversions like that should be explicit, but are not memory-unsafe. On the other hand, casting to a pointer is always unsafe. Again, with a bit more expressive power, the need for unsafe code can be reduced.

Memory allocation is hard. However, more of it could be done in safe code. Suppose Rust had a type "space", which is simply an array of bytes, treated as write-only. Constructors take in an array of "space" of the desired type, create a valid local structure with the initialized values, and then perform an operation which copies the structure to the array of "space" and changes its type to the type of the structure. This is safe construction. As an optimization, the compiler can observe that if no reads are made from the local structure prior to converting the "space", the extra local copy is unnecessary.

"Space" would still have Rust scope and lifetime, so all that machinery remains hidden. But it's convenient to separate it from construction. Raw memory allocation is complex and unsafe, but separated from the type system, it's a coherent closed system that doesn't get modified much. It's a good candidate for formal proof of correctness - not too big, and critical to system operation.

Operations such as expanding vectors seem to include unsafe code. That's worth a hard look. If you had the "space" concept, and the operation that moves a struct into a "safe" array and converts the type, it should be possible to do operations such as growing an array without unsafe code.

For Rust 2, it's worth looking at how the need for unsafe code can be reduced. Ultimately, everything should be either memory safe or have a machine proof of memory correctness at the instruction level.

[1] https://doc.rust-lang.org/book/ffi.html [2] http://www.animats.com/papers/languages/safearraysforc43.pdf

dbaupp · on May 29, 2015

> The first one is mostly a problem with expressive power in the foreign function interface. Can you express what "int read(int fd, char buf[], size_t len)" means in the foreign function definition syntax? Rust's foreign function syntax isn't expressive enough to do that.[1] You can't tell Rust that "len" is the length of "buf". Being able to do that would help reduce the need for unsafe code. Most of the POSIX/Linux API can be described with relatively simple syntax that allows you to associate size info with C arrays. (I once proposed this as an extension to C. It's technically possible but politically too difficult.)[2]

There's still a need for `unsafe`, since it's possible for the relationship to be described incorrectly. It's fundamentally not something the compiler can check, and hence requires `unsafe` conceptually (if not in practice).

One can regard wrapping FFI functions in safe interfaces as specifying the relationships between parameters.

> If you cast 4 bytes to a 32-bit unsigned integer, the result is always a valid 32-bit unsigned integer. Conversions like that should be explicit, but are not memory-unsafe.

Only a very small subset of types have the property that any bit-pattern is safe, essentially only primitives. So this seems like a rather limited way to reduce unsafety (instead of just writing a short library function once).

Animats · on May 29, 2015

"Only a very small subset of types have the property that any bit-pattern is safe, essentially only primitives."

Structs which contain only primitives have that property. Consider a TCP header.

dbaupp · on May 29, 2015

No, only structs with no invariants. As soon as you have invariants, there are illegal bit patterns. Of course, these illegal bit pattern may not necessarily result in memory unsafety, but there's no way for the compiler to know this automatically^.

This functionality could be implemented something like

  fn from_bytes<T: JustBits>(bytes: &[u8]) -> Option<&T> {
      if bytes.len() >= std::mem::size_of::<T>() {
          unsafe {
              Some(&*(bytes.as_ptr() as *const T))
          }
      } else {
          None
      }
  }
  /// Values for which any bit pattern is valid.
  pub unsafe trait JustBits {}
  
  unsafe impl JustBits for u8 {}
  unsafe impl JustBits for i8 {}
  unsafe impl JustBits for u16 {}
  unsafe impl JustBits for i16 {}
  // ...

Some custom struct that can be any bit pattern can then do:

  struct CustomStruct { ... }

  unsafe impl JustBits for CustomStruct {}

Of course, there's `unsafe` there, but there has to be: it's asserting that "yes, I'm sure that anything works".

^Notably, there's been proposals for `unsafe` fields which will make expressing "invariants exist" more focused, and adjust the trade-offs here.

(I'll note that a TCP header has 3 reserved bits (100, 101, 102) which, I believe, should be set to zero, making some bit patterns theoretically illegal.)

jamii · on May 28, 2015

I've run into two kinds of problems that require unsafe so far.

The first is similar to the c extension you described - building data structures with dynamically-sized arrays eg a bitfield followed by population_count(bitfield) entries. It would be great to have some way to express this without having to pay a whole usize for DST eg:

    struct Node{
        keys: u8,
        vals: [ShortPointer<Node>; popcount(keys)],
    }

The second is dealing with recursive data-structures or algorithms. Even for tree-shaped stuff, if you are hanging onto state as you walk the tree there are some kinds of patterns that the borrow checker just can't deal with eg

    fn join_step<'a>(state: &mut Vec<&'a Value>, ..) {
        ...
        for values in primitive.eval_from_join(&arguments[..], &state[..]).into_iter() {
            // promise the borrow checker that we will pop values before we exit this scope
            let values = unsafe { ::std::mem::transmute::<&Vec<Value>, &'a Vec<Value>>(&values) };
            push_all(state, values);
            if join.constraints[ix].iter().all(|constraint| constraint.is_satisfied_by(&state[..])) {
                join_step(state, ...)
            }
            pop_all(state, values);
        }
    }

It would be nice to have some finer-grained way of making this promise - transmute is overkill and leaves me open to all kinds of mistakes.

Gankro · on May 28, 2015

If you're just trying to demote a lifetime, you should just be able to specify the lifetime on the variable:

    let values: &'a Vec<Value> = &values;

Lifetimes have variance so that you can always put a "bigger" lifetime in a place expecting a "smaller" one safely and it will treated as the smaller one forever.

Although it can be a bit of a dangerous game trying to "set" lifetimes manually because lifetime variance is hell (at least to me).

Jweb_Guru · on May 28, 2015

> It's a good candidate for formal proof of correctness - not too big, and critical to system operation.

Unfortunately, having actually looked for production strength proofs of correctness for production allocators (as opposed to toys), I wasn't able to find any. Given how high-value of a target they are for formal verification, we may be underestimating the complexity of doing so.

Animats · on May 29, 2015

Successes have been claimed.[1]

It's discouraging to me how little progress there's been in proof of correctness in the last 35 years. I used to work on that stuff. C set the field back by decades.

[1] https://books.google.com/books?id=yvJqCQAAQBAJ&pg=PA364&lpg=...

Jweb_Guru · on May 29, 2015

That was one of the "toy" implementations I mentioned. The algorithm they described is going to be nowhere near the performance of a modern multithreaded allocator like jemalloc (which is correspondingly far more complex). Which is not to say things like that are not encouraging, only that I was hoping someone would have proven something people actually use correct.

gregwtmtno · on May 28, 2015

This is important for anyone considering using Rust. I was mistaken about the meaning of Rust's safety guarantees before reading this.

Gankro · on May 28, 2015

Care to elaborate as to what you thought they were?

gregwtmtno · on May 28, 2015

Sure. First, thanks for the writeup. I had imagined that the Rust standard library used safe code all the way down. (Whatever that meant, I hadn't put all that much thought into it.) But as you state, "everything is built on top of unsafe."

So I guess my understanding after reading this is that I could, using only "safe" code, accidentally manipulate the Rust standard library to cause undefined behavior, it's just much more unlikely in Rust than in C++.

Gankro · on May 28, 2015

One important guarantee of Rust though: If you manage to do that, this is a bug in Rust and it's not your fault.

And really, that's true of any "safe" language, right? Java, Ruby, Javascript, Python, whatever -- implementation errors mean your program will do crazy bad stuff, and we all have them.

dschatz · on May 28, 2015

Just to add to this point, the difference is that Rust can only guarantee this for the standard library. I can similarly write a library with safe interfaces that can be (ab)used to cause UB and there's little that the Rust team can do. This is different from other "safe" languages.

This is why it's so important to establish what the responsibility and expectation is of library developers to uphold the safety guarantees that everyone else relies on. It only takes one bad library to destroy the safety guarantees everyone who is transitively using that library relies on.

eslaught · on May 28, 2015

This is not all that different from Java (or Python, etc.), where it is quite easy to hide a call to a native function behind a seemingly-safe interface. The real difference is that native methods in Java must be written in a different language (C), while Rust supports both modes in the same language. (Edit: Or, if you prefer, two different but very closely related languages.)

I would argue, at any rate, that this sort of safe/unsafe boundary is still useful for the purpose of auditing code. Conceptually, memory bugs are interactions between two points in the program: e.g. one location deallocates a pointer, then another tries to dereference it. With Rust's implementation of unsafe, you are guaranteed that any bad interactions must have at least one endpoint in an unsafe block. You still can't completely ignore the safe code, because unsafe code can reach arbitrarily far out of its box (so to speak), but in general this constraint does help significantly in limiting the amount of code that needs to be audited.

Manishearth · on May 29, 2015

Agreed. We had a segfault in Servo due to upgrading the compiler (and some internal representations changing). I wasn't able to track it myself (unfamiliarity with the code), but someone else was able to find its origin and fix it without much trouble because of `unsafe`. (That aside, we very rarely have segfaults in Servo, and Servo's huge)

rcxdude · on May 28, 2015

Indeed. I have frequently segfaulted python by using certain modules (not even particularly obscure or low-quality ones either).

gregwtmtno · on May 28, 2015

Right, but if you write a library using only Rust "safe" code, the guarantee is back on the Rust team.

To me, it would be better if libraries using unsafe code were marked.

sanderjd · on May 28, 2015

Here's[0] an interesting idea for forcing crates using unsafe code to be handled specially, while allowing some "blessed" crates through without the special handling.

[0]: https://github.com/rust-lang/cargo/issues/934#issuecomment-6...

krick · on May 28, 2015

That seems like absolutely great idea. I want it already. I wonder why it wasn't processed further, while being 6 month old already.

It is different notion of "blessed" than the original proposal and, IMHO, a much better one. Steve's proposal is quite dubious, I guess. The problem is low standard for the word "blessed" here: in his proposal this badge has no technical meaning, but big social impact. Crate doesn't have to be superior in a technical sense to get this badge, it just has to be "famous", and stuff gets "famous" for various reasons. That's really bad and will get worse when Rust/Cargo will become more popular.

But that "safe/unsafe" isn't a matter of opinion anymore. If a library is known to be unsafe through the "safe" interface: it's a bug, and a crate shouldn't have this badge from a moment the bug has been discovered and until the bug is fixed again (or even longer, if there have been 5 such bugs over the last 2 weeks, even though now they seem to be fixed). It somewhat serves the original purpose (I assume), because it still means that a crate that isn't used heavily enough ("was downloaded N times over the last month") cannot be "blessed" — we don't have all necessary information to mark it as "blessed" yet, so it will help to set "junk crates" aside.

It's worth nothing if a library is made by Github, but is known to be buggy and to leak memory. But it is worth something if a "github API" crate made by John Doe is used by hundreds of people and didn't have a single memory leak for quite a while.

steveklabnik · on May 28, 2015

> Steve's proposal is quite dubious, I guess.

I pretty much opened this issue so we could talk about it, not because I think it's a good idea.

krick · on May 28, 2015

Oh, I didn't mean any offense. I'm just commenting on the idea: the original wording of the proposal seems dangerous, but both reem's and yazaddaruvala's ideas definitely have potential.

So you did a good job by starting that discussion. I hope it will have results.

steveklabnik · on May 29, 2015

It's all good, none taken! :)

e12e · on May 29, 2015

I wonder if this shouldn't be on the "rust" side, rather than on the "crates.io"-side. Considering[1], I think I'd prefer either/or:

    extern unsafe crate phrases; // It's all UNSAFE!

Or: extern crate phrases; // It's mostly safe

    use unsafe phrases::english; // But not English

The idea being, that either phrases wouldn't be imported/give an error -- or everything in phrases except "english" would be imported -- and english would only be imported if qualified with "unsafe".

Either way... I can see this going the way of try/catch/throws in java -- where the usefulness diminishes as lazy programmers (we're all lazy) end up polluting everything with unsafe (just like "throws Exception e..").

[1] https://doc.rust-lang.org/book/crates-and-modules.html

TheLoneWolfling · on May 29, 2015

> with the caveat that main can't be unsafe

Why not? Is there a specific justification for this beyond just "you shouldn't do that"?