I'm not sure you looked at what I posted. __builtin_ia32_psadbw is right there on the list of builtins. I've used __builtin_ia32_psadbw128 in GCC myself. It compiles directly to PSADBW instructions. Perhaps you confused what I was talking about with GCC's auto-vectorization?
edit: Just realized that you're the x264 guy and it's unlikely you misunderstood me. Still I think my point about psadbw stands.
Those aren't gcc vectors, those are intrinsics. Vectors use something like this:
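The snippet that followed isn't preserved here, so this is only a minimal sketch of what GCC's vector extensions generally look like, not the poster's actual example. The names `v4si`, `madd` and `sum_lanes` are made up for illustration; `vector_size` is the documented GCC attribute.

```c
#include <stdint.h>

/* GCC vector extension: a 16-byte vector holding four 32-bit ints.
   The name v4si is only a convention, not something GCC defines. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Ordinary C operators apply element-wise to vector types. The compiler
   chooses the instructions for the target (or falls back to scalar code). */
static v4si madd(v4si a, v4si b, v4si c)
{
    return a * b + c;
}

/* Hypothetical helper: add up the four lanes via vector subscripting. */
static int32_t sum_lanes(v4si v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

The contrast with something like __builtin_ia32_psadbw128 is that the builtin names one specific x86 instruction, while the vector types only give you generic operators (+, -, *, comparisons and so on). As far as I know there is no operator that expresses a sum of absolute differences, so for that you are back to the intrinsic.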