You can do that with ONNX: you can graft the preprocessing layers onto the actual model [1] and then serve that. Honestly, I thought ONNX (on CPU at least) was already low-level and heavily optimized.
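A minimal sketch of the grafting idea using `onnx.compose.merge_models`, which stitches one graph's outputs into another's inputs. The file names, tensor names, shapes, and normalization constants here are all assumptions for illustration, not anything from the article:

```python
# Sketch: prepend a normalization graph to an existing ONNX model.
# Assumes the trained model lives at "model.onnx" with an input named
# "input" of shape [1, 3, 224, 224] -- adjust to your actual model.
import numpy as np
import onnx
from onnx import TensorProto, compose, helper, numpy_helper

# ImageNet-style mean/std constants, baked in as initializers.
mean = numpy_helper.from_array(
    np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 3, 1, 1),
    name="mean")
std = numpy_helper.from_array(
    np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 3, 1, 1),
    name="std")

# Tiny preprocessing graph: normalized = (raw_input - mean) / std
pre_graph = helper.make_graph(
    nodes=[
        helper.make_node("Sub", ["raw_input", "mean"], ["centered"]),
        helper.make_node("Div", ["centered", "std"], ["normalized"]),
    ],
    name="preprocess",
    inputs=[helper.make_tensor_value_info(
        "raw_input", TensorProto.FLOAT, [1, 3, 224, 224])],
    outputs=[helper.make_tensor_value_info(
        "normalized", TensorProto.FLOAT, [1, 3, 224, 224])],
    initializer=[mean, std],
)
pre_model = helper.make_model(pre_graph)

# Graft preprocessing in front of the model by wiring the preprocess
# output to the model's input. Note: merge_models requires compatible
# opsets; run onnx.version_converter on one side first if they differ.
model = onnx.load("model.onnx")
merged = compose.merge_models(
    pre_model, model,
    io_map=[("normalized", "input")],
)
onnx.save(merged, "model_with_preprocess.onnx")
```

The merged file then serves as a single artifact, so callers feed raw tensors and the runtime handles normalization inside the graph.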
@Author - if you see this, is it possible to add comparisons (i.e. "vanilla" inference latencies vs. timber)?