You can do that with ONNX: you can graft the preprocessing layers onto the actual model [1] and then serve that. Honestly, I thought ONNX (on CPU at least) was already low-level and heavily optimized.
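A minimal sketch of the grafting idea using `onnx.compose.merge_models`, which stitches one graph's outputs into another's inputs. The file names, tensor names, shapes, and normalization constants here are all assumptions for illustration, not anything from the article:

```python
# Sketch: prepend a normalization graph to an existing ONNX model.
# Assumes the trained model lives at "model.onnx" with an input named
# "input" of shape [1, 3, 224, 224] -- adjust to your actual model.
import numpy as np
import onnx
from onnx import TensorProto, compose, helper, numpy_helper

# ImageNet-style mean/std constants, baked in as initializers.
mean = numpy_helper.from_array(
    np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 3, 1, 1),
    name="mean")
std = numpy_helper.from_array(
    np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 3, 1, 1),
    name="std")

# Tiny preprocessing graph: normalized = (raw_input - mean) / std
pre_graph = helper.make_graph(
    nodes=[
        helper.make_node("Sub", ["raw_input", "mean"], ["centered"]),
        helper.make_node("Div", ["centered", "std"], ["normalized"]),
    ],
    name="preprocess",
    inputs=[helper.make_tensor_value_info(
        "raw_input", TensorProto.FLOAT, [1, 3, 224, 224])],
    outputs=[helper.make_tensor_value_info(
        "normalized", TensorProto.FLOAT, [1, 3, 224, 224])],
    initializer=[mean, std],
)
pre_model = helper.make_model(pre_graph)

# Graft preprocessing in front of the model by wiring the preprocess
# output to the model's input. Note: merge_models requires compatible
# opsets; run onnx.version_converter on one side first if they differ.
model = onnx.load("model.onnx")
merged = compose.merge_models(
    pre_model, model,
    io_map=[("normalized", "input")],
)
onnx.save(merged, "model_with_preprocess.onnx")
```

The merged file then serves as a single artifact, so callers feed raw tensors and the runtime handles normalization inside the graph.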
@Author - if you see this, is it possible to add comparisons (i.e. "vanilla" inference latencies vs. timber)?