> I wrote a small tool called `crate-scraper` which downloads the source package for every source specified in our Cargo.toml file, and stores them locally so we can have a snapshot of the code used to build a Xous release.
> This cargo subcommand will vendor all crates.io and git dependencies for a project into the specified directory at <path>. After this command completes the vendor directory specified by <path> will contain all remote sources from dependencies specified.
Maybe he doesn't want to depend on Cargo. Fair enough, it's a big program.
The big thing I wanted was the summary of all the build.rs files concatenated together so I wasn't spending lots of time grepping and searching for them (and possibly missing one).
The script isn't that complicated... it actually uses an existing tool, cargo-download, to obtain the crates, and then a simple Python script searches for all the build.rs files and concatenates them into a builds.rs mega file.
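For anyone curious what that second step might look like, here is a minimal sketch, not the actual crate-scraper code. It assumes the crates have already been unpacked under one directory; the function name `concat_build_scripts` and the header format are made up for illustration. It also only matches files literally named `build.rs`, so a crate that points `build = "..."` in its Cargo.toml at a differently named script would be missed.

```python
from pathlib import Path


def concat_build_scripts(crates_dir: Path, out_file: Path) -> int:
    """Concatenate every build.rs found under crates_dir into out_file.

    Returns the number of build scripts found. (Hypothetical helper,
    not taken from crate-scraper itself.)
    """
    # Sort so the output is stable from run to run, which makes diffs
    # between two releases easier to read.
    scripts = sorted(p for p in crates_dir.rglob("build.rs") if p.is_file())

    with out_file.open("w", encoding="utf-8") as out:
        for script in scripts:
            # A header line records which crate each chunk came from.
            rel = script.relative_to(crates_dir).as_posix()
            out.write(f"// ===== {rel} =====\n")
            # errors="replace" keeps one oddly encoded file from
            # aborting the whole run.
            out.write(script.read_text(encoding="utf-8", errors="replace"))
            out.write("\n\n")

    return len(scripts)
```

Calling something like `concat_build_scripts(Path("crates"), Path("builds.rs"))` would then leave one file to read through instead of a pile of grep hits.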
The other reason to give the tool its own repo is crate-scraper actually commits the crates back into git so we have a publicly accessible log of all the crates used in a given release by the actual build machine (in case the attack involved swapping out a crate version, but only for certain build environments, as a highly targeted supply chain attack is less likely to be noticed right away).
It's more about leaving a public trail of breadcrumbs we can use to do forensics to try and pinpoint an attack in retrospect, and making it very public so that any attacker who cares about discretion or deniability has to deal with this in their counter-threat model.
I often wonder about what priorities lead to this kind of focus on the build system as a supply chain attack vector. It seems unusual to be in a position where you have a chunk of code you want to build and have to trust the system that builds it but not the code, especially in a situation where such concerns can't be adequately addressed by sandboxing the build system. Personally, if I were concerned about the supply chain, I would worry less about the 5.6k lines of Rust code running during the build and more about the >200k (extremely conservative estimate) lines running on the actual system. (Not that you can ignore the build system, since of course it can inject code into the build, just that it's such a small part of the workload of reviewing the dependencies that it shouldn't really be worth mentioning.)
I guess the major thing is opening up the code to review it in an editor of choice and then having an LSP server running the build scripts automatically without you realizing it.
Reviewing code that you don't trust seems to be a pretty logical thing, and most people probably wouldn't expect that opening the code up in their favorite editor could cause their system to be harmed!
I thought `cargo vendor` already did this?
https://doc.rust-lang.org/cargo/commands/cargo-vendor.html