Interesting though brief read. I'd earlier read how OCaml was their unfair advantage, but apparently it comes at a cost. I can see how thinking/working at a higher level makes it even harder to control runtime concerns. I wish it had gone into more detail on the GC issues. I can't tell if it's related to interfacing with code written in lower-level languages, or runtime native code generation, or something else.
I've heard about that from others. There are some cases where you want constant performance. Say you receive some input and want to react with e.g. a pricing change. Then you want to defer GC until you're done with processing (and not run out of memory).
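I don't know what their actual setup looks like, but with stock OCaml the rough shape of "defer the GC until the critical work is done" could be sketched like this (the pricing functions are made-up placeholders, not anything from the article):

    (* Sketch only: make the GC lazier up front, keep the critical path
       allocation-light, and pay for collection after responding. *)
    let () =
      Gc.set
        { (Gc.get ()) with
          Gc.minor_heap_size = 8 * 1024 * 1024; (* larger minor heap, in words *)
          Gc.space_overhead = 400 }             (* postpone major collection work *)

    (* Placeholder pricing logic, just to make the sketch self-contained. *)
    let compute_new_price input = input *. 1.0001
    let send_price p = Printf.printf "new price: %f\n%!" p

    let handle_update input =
      send_price (compute_new_price input);
      (* Critical path is done; now it's acceptable to let the GC catch up. *)
      Gc.full_major ()

    let () = handle_update 101.25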
Someone pointed out that this article reads like a (bad) AI-generated summary of the Jane Street Signals & Threads podcast episode. It would be better to listen to the podcast itself, or read its transcript, rather than this editorialized version: https://signalsandthreads.com/performance-engineering-on-har...
The problem is that after it's compiled and 'linked', laying out and routing the design efficiently on the FPGA takes a lot of compute as the requested occupancy and timing get closer to the edge of what's possible on the device.
On the plus side, you can compile and simulate your change quickly and iteratively to confirm it is logically correct before putting it on the FPGA.
> The reason for this is that the language prominently features garbage collection, a feature which deletes code not in use by the machine. Using this results in "uninitialized data [which] can be really problematic."
Does the OCaml GC really trash data that are still being used by the program?
Honestly, I am not sure what the initial sentence was before the editor distorted it. The next sentence makes me think the interviewee might have been discussing all the tricks Jane Street uses to reduce allocation to a minimum, but it is hard to tell from the final text.
From the transcript of the source podcast on Jane Street's website:
> A good example is we’re a garbage collected language, our garbage collector inspects values at runtime. Therefore, uninitialized data can be really problematic. And so we have to do stupid things in my brain like, oh, it’s really important to null out the pointers in this array and not just leave them behind or they’ll leak, or you can’t just have an uninitialized array that I promise I’ll get too soon. Because what happens if you GC in that range? And I do actually think this is meaningfully costly in some scenarios, but I’m willing to put up with it.
I'm still a little unsure exactly what this means though.
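My reading (and I could be wrong) is that it's about patterns like a preallocated pool or ring buffer: because the collector scans every slot of the array, a slot you are "done with" keeps its old element alive until you overwrite it, and you can't create the array uninitialized in the first place. A made-up OCaml sketch:

    type order = { id : int; qty : int }

    let dummy = { id = -1; qty = 0 }

    type ring = {
      slots : order array;  (* preallocated; every slot must hold a valid value *)
      mutable head : int;
      mutable len : int;
    }

    (* Array.make forces an initial value: there is no "uninitialized array
       that I promise I'll get to soon" in safe OCaml. *)
    let create n = { slots = Array.make n dummy; head = 0; len = 0 }

    let push r o =
      r.slots.((r.head + r.len) mod Array.length r.slots) <- o;
      r.len <- r.len + 1

    let pop r =
      let o = r.slots.(r.head) in
      (* Without this write, the popped order stays reachable from the slot
         and the GC keeps it alive: the "null out the pointers" point. *)
      r.slots.(r.head) <- dummy;
      r.head <- (r.head + 1) mod Array.length r.slots;
      r.len <- r.len - 1;
      o

    let () =
      let r = create 16 in
      push r { id = 1; qty = 100 };
      let o = pop r in
      Printf.printf "popped order %d (qty %d)\n" o.id o.qty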
The whole podcast is worth a listen; I had never heard about the feature on Intel CPUs that lets you replay the last 2 milliseconds of instructions on the chip.
Ok, now I understand: this part is describing how optimizing C bindings through the OCaml FFI often requires working around the GC:
With OCaml's uniform representation of data, the GC follows anything that looks like a pointer when building its reachable set of values. That is the right thing to do for valid OCaml values, where anything that looks like a pointer (outside of strings or numerical arrays) really is a pointer to a valid OCaml value.
However, if you are building OCaml values in the C FFI and the partially constructed value is somehow reachable by the GC, you have to make sure there is nothing that looks like a pointer in the uninitialized part of the data. Alternatively, you need to make sure the runtime will not enter a GC phase while you are building those values (the usual discipline from the manual: allocate with caml_alloc, which pre-initializes fields, or fill every field of a caml_alloc_small block before the next allocation can happen).
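To make the "anything that looks like a pointer" part concrete, here is a small illustrative snippet (mine, not from the article or podcast) that pokes at the representation with the Obj module; Obj is only for experimenting, not for real code:

    (* Integers are immediates (low bit set), so the GC can tell them apart
       from pointers; strings and flat float arrays carry no-scan tags, so
       their contents are never interpreted as pointers; ordinary blocks are
       scanned word by word, which is why every field the GC can see must
       always hold a valid value. *)
    let () =
      assert (Obj.is_int (Obj.repr 42));                    (* immediate *)
      assert (not (Obj.is_int (Obj.repr [1; 2; 3])));       (* heap block *)
      assert (Obj.tag (Obj.repr "bytes") = Obj.string_tag); (* not scanned *)
      (* flat float arrays are the default build configuration *)
      assert (Obj.tag (Obj.repr [| 1.0; 2.0 |]) = Obj.double_array_tag);
      assert (Obj.tag (Obj.repr (ref 0)) = 0)               (* scanned block *)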