
I'm not associated with this project (or with LMQL), but one of the authors of LMQL, a similar project, answered this in a recent thread about LMQL.

https://news.ycombinator.com/item?id=35484673#35491123

        As a solution to this, we implement speculative execution, allowing us to
        lazily validate constraints against the generated output, while still
        failing early if necessary. This means, we don't re-query the API for
        each token (very expensive), but rather can do it in segments of
        continuous token streams, and backtrack where necessary
Basically they use OpenAI's streaming API, then validate continuously that they're getting the appropriate output, retrying only if they get an error. It's a really clever solution.
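To make the idea concrete, here is a minimal sketch of the stream-validate-backtrack loop as I understand it from the quote. This is not LMQL's actual code: `generate_with_validation`, `stream_fn`, `is_valid_prefix`, and `fake_stream` are hypothetical names, and the fake stream stands in for a real streaming API so the example runs offline.

```python
from typing import Callable, Iterator, List

def generate_with_validation(
    stream_fn: Callable[[str], Iterator[str]],   # hypothetical: prompt -> token stream
    is_valid_prefix: Callable[[str], bool],      # hypothetical constraint check
    prompt: str,
    max_retries: int = 3,
) -> str:
    """Consume a token stream, validating the output after every token.

    On the first violating token the stream is closed (fail early) and a new
    stream is requested that continues from the last valid prefix (backtrack).
    """
    accepted: List[str] = []
    for _ in range(max_retries + 1):
        stream = stream_fn(prompt + "".join(accepted))
        violated = False
        try:
            for token in stream:
                if not is_valid_prefix("".join(accepted) + token):
                    violated = True
                    break  # stop reading; do not keep the bad token
                accepted.append(token)
        finally:
            # Close the stream if it supports it (generators do).
            close = getattr(stream, "close", None)
            if close is not None:
                close()
        if not violated:
            return "".join(accepted)
    raise RuntimeError("constraint still violated after retries")

# Stand-in for a streaming API: the first call goes off the rails after two
# tokens, later calls behave. The prompts it receives are recorded in `calls`.
calls: List[str] = []

def fake_stream(prompt: str) -> Iterator[str]:
    calls.append(prompt)
    tokens = ["1", "2", "x", "3"] if len(calls) == 1 else ["3", "4"]
    yield from tokens
```

With a digits-only constraint, `generate_with_validation(fake_stream, str.isdigit, "n=")` makes two requests: the first is abandoned at the stray `x`, the second resumes from the prompt plus the accepted `12`. A real system would also need some way to steer the model away from the rejected token on the retry; this sketch does not attempt that.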


This is slick. It's not explicitly documented anywhere, but I hope OpenAI has the necessary callbacks to terminate generation when the API stream is killed, rather than continuing in the background until another termination condition is hit. I suppose one could check this by looking at API usage after a stream is killed early.


Yeah, I wrote a CLI tool for talking to ChatGPT. I'm pretty sure they stop generating when you kill the SSE stream, based on my anecdotal experience of keeping ChatGPT4 costs down by killing it as soon as I get the answer I'm looking for. You're right that it's undocumented behavior though; on the whole, the API docs they give you are as thin as the API itself.


I'm skeptical that the streaming API would really save that much cost. In my experience the vast majority of all tokens used are input tokens rather than completion tokens.
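A rough back-of-the-envelope sketch of that argument, assuming a request is billed as input tokens plus output tokens at separate per-token prices. The function name and all numbers below are made up for illustration; they are not OpenAI's actual prices.

```python
def early_stop_savings(
    input_tokens: int,
    full_output_tokens: int,
    stopped_output_tokens: int,
    price_in: float,   # hypothetical price per input token
    price_out: float,  # hypothetical price per output token
) -> float:
    """Fraction of a request's cost saved by killing the stream early.

    Assumes generation (and billing) really does stop when the stream is
    killed, which the thread notes is undocumented.
    """
    full_cost = input_tokens * price_in + full_output_tokens * price_out
    stopped_cost = input_tokens * price_in + stopped_output_tokens * price_out
    return (full_cost - stopped_cost) / full_cost

# Illustrative numbers only: a 2000-token prompt, a 500-token completion
# cut off after 100 tokens, output priced at twice the input rate.
saving = early_stop_savings(2000, 500, 100, price_in=1.0, price_out=2.0)
```

Only the output term shrinks when the stream is cut, so the larger the prompt is relative to the completion, the smaller the saving.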


Any new call to the API is considered fresh. I don't believe your session is saved.


We're talking about the streaming API which streams generated text token by token, not the normal one-shot API. I have no insider knowledge but would agree with your intuition on the normal API.



