Sonnet, GPT-5.2, Gemini Flash, in a set of 21 games, where conclusions are drawn from the LLMs' self-reported reasoning.
This is like writing a paper about kids in a literal sandbox fighting over ‘territory’.
The models used don’t represent the actual extent of machine reasoning, even as we currently understand it. They certainly don’t have the metacognition necessary to accurately understand their own reasoning. As we’ve seen with recent papers on how LLMs do math, there’s a complete disconnect between the actual and reported mechanisms.
LLMs have already been used to bomb school girls; chilling is absolutely the operative word to use here, especially since these delusional fools want to incorporate LLMs into everything.
Forgive my ignorance, but were LLMs involved in that decision? I don't remember hearing anything to that effect, but we're so bombarded by news these days that I guess I could just be forgetting.
Yes. Our government purportedly used technology to work up a list of targets in the Iran debacle as well, just not with an LLM, a distinction that to me just isn't that meaningful.
Is it really though? Now admittedly I’ve never worked with LexisNexis or Westlaw, but would it really be that difficult to have a tool just check if the citations actually exist?
You also have to read them and check whether they claim what the document claims. Just checking whether it goes somewhere is not enough.
This whole thing requires people to do exactly the thing they are bad at: checking the output of something else while staying passive and not losing attention.
IIRC, in terms of clients, mutt (& co.) will actually handle “@” in the local part correctly.
> But the real reason I do that is just because I just like to sit in anger whenever this breaks the user experience because of programming errors or inconsistencies.
Genuinely delighted by the fact that I’m not alone in that.
This has a serious case of in-housing the edge cases that I don’t think many would want to pay the price for.
The problem with DNS per the haiku isn’t that it’s difficult to understand, or even that running your own DNS server is particularly difficult. It’s that coordinating information and exchange at scale is a tricky problem with a lot of non-obvious edge cases and foot guns.
So trying to reduce complexity by sidestepping DNS really doesn’t do that, it just leaves you holding the bag on all the problems that DNS was quietly solving for you in the background.
My interpretation would be that he feels his wife is incredibly loving in a quantity he isn’t able to match (degree) and in a unique way he’s not able to match (kind). General life experience plus the fact that he wrote that tells me he’s probably wrong and his wife would probably say the same about him, but that’s just speculation.
I’ve had one experience with Claude Code so far that genuinely frustrated me, but it did it to such a degree that it wrapped back around to hilarity. It tried to run a command, got an error, realized it needed to cd to a different directory first, and then… didn’t do that.
It caught itself several times, going “oh, I didn’t actually cd, let me add that and try again”. I tried correcting it several times: “you MUST begin the command with `cd dir &&`”. There were a lot of variations back and forth to try to coax out the correct tool call, including backing up the conversation and trying from earlier in the context.
It refused. Every time. It simply would not include the cd. Genuinely unhinged behavior.