Not associated with this project (or LMQL), but one of the authors of LMQL, a si...

newhouseb · on May 16, 2023

This is slick. It's not explicitly documented anywhere, but I hope OpenAI has the necessary callbacks to terminate generation when the API stream is killed, rather than continuing in the background until another termination condition is hit. I suppose one could check this by looking at API usage after killing a stream early.

tuchsen · on May 16, 2023

Yeah, I built a CLI tool for talking to ChatGPT. I'm pretty sure they stop generating when you kill the SSE stream, based on my anecdotal experience of keeping ChatGPT4 costs down by killing it as soon as I get the answer I'm looking for. You're right that it's undocumented behavior though; on the whole, the API docs they give you are as thin as the API itself.
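For anyone curious what "killing the stream as soon as I get the answer" looks like on the client side, here is a rough sketch. It assumes the chat streaming format as I understand it (`data: {...}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`). `fake_stream` and `read_until` are made-up names for illustration, and the stream is simulated so it runs without an API key. Whether the server actually stops generating once you disconnect is the undocumented part.

```python
import json
from typing import Iterable, Iterator


def iter_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE 'data:' lines (assumed chat streaming format)."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # server signalled the end of the stream
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]


def read_until(lines: Iterable[str], stop_text: str) -> str:
    """Consume the stream only until stop_text appears, then stop reading."""
    collected = ""
    for token in iter_sse_tokens(lines):
        collected += token
        if stop_text in collected:
            break  # a real client would close the HTTP connection here
    return collected


def fake_stream(fragments: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for the HTTP response body, for local testing."""
    for fragment in fragments:
        yield "data: " + json.dumps({"choices": [{"delta": {"content": fragment}}]})
    yield "data: [DONE]"


if __name__ == "__main__":
    stream = fake_stream(["The ", "answer ", "is ", "42", ". ", "Also, ", "more"])
    print(read_until(stream, "42"))
```

With a real HTTP client you would pass the response's line iterator instead of `fake_stream(...)` and close the response after the `break`.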

killthebuddha · on May 16, 2023

I'm skeptical that killing the stream early would really save that much cost. In my experience the vast majority of all tokens used are input (prompt) tokens rather than completion tokens.
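To put rough numbers on that, here is a back-of-the-envelope sketch. The token counts and per-1K prices are illustrative assumptions, not measurements (check current pricing), and `request_cost` and `savings_fraction` are made-up helper names.

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
    """Cost of one request when prompt and completion tokens are billed separately."""
    return ((prompt_tokens / 1000) * prompt_price_per_1k
            + (completion_tokens / 1000) * completion_price_per_1k)


def savings_fraction(prompt_tokens: int, full_completion: int, truncated_completion: int,
                     prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
    """Fraction of the full request cost saved by stopping the completion early."""
    full = request_cost(prompt_tokens, full_completion,
                        prompt_price_per_1k, completion_price_per_1k)
    cut = request_cost(prompt_tokens, truncated_completion,
                       prompt_price_per_1k, completion_price_per_1k)
    return (full - cut) / full


if __name__ == "__main__":
    # Assumed scenario: 3000 prompt tokens, completion cut from 500 to 200 tokens,
    # with illustrative prices of $0.03 and $0.06 per 1K tokens.
    fraction = savings_fraction(3000, 500, 200, 0.03, 0.06)
    print(f"saved {fraction:.0%} of the request cost")
```

The prompt is billed in full no matter when you disconnect, so the larger it is relative to the completion, the less an early kill can save.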

boywitharupee · on May 19, 2023

Any new call to the API is considered fresh. I don't believe your session is saved.

newhouseb · on May 19, 2023

We're talking about the streaming API, which streams generated text token by token, not the normal one-shot API. I have no insider knowledge, but I would agree with your intuition on the normal API.