sohex's comments

I first read this in the jargon file when I must have been… 10? 12? And it’s been stuck in my head ever since.

Sonnet, GPT-5.2, Gemini Flash, in a set of 21 games, where conclusions are drawn from the LLMs’ self-reported reasoning.

This is like writing a paper about kids in a literal sandbox fighting over ‘territory’.

The models employed don’t indicate the actual extent of machine reasoning even as we currently recognize it. They certainly don’t have the metacognition necessary to accurately understand their own reasoning. As we’ve seen with recent papers on how LLMs do math, there’s a complete disconnect between actual and reported mechanism.

“Chilling” shouldn’t be the takeaway here.


So in the context you just laid out, you can apply that to this: "Artificial Intelligence Strategy for the Department of War" https://media.defense.gov/2026/Jan/12/2003855671/-1/-1/0/art...

Regardless of what the capabilities of the models are, they will be used in every situation possible.


> “Chilling” shouldn’t be the takeaway here.

It is when you consider the personality currently occupying the office of US SecDef.


LLMs have already been used to bomb schoolgirls; chilling is absolutely the operative word to use here. Especially since these delusional fools want to incorporate LLMs into everything.

Forgive my ignorance, but were LLMs involved in that decision? I don't remember hearing anything to that effect, but we're so bombarded by news these days I guess I could just be forgetting.

Perhaps not in that one, but in plenty more: https://www.972mag.com/lavender-ai-israeli-army-gaza/

Yes, our government purportedly used technology to work up a list of targets in the Iran debacle as well, just not with an LLM, a distinction that to me just isn't that meaningful.

https://www.theguardian.com/news/2026/mar/26/ai-got-the-blam...


I find it hard to imagine that no AI was involved. The people who know how it happened aren't interested in saying.

At the very least it comes from AI-like thinking. A human life has no value to an AI.


Is it really though? Now admittedly I’ve never worked with LexisNexis or Westlaw, but would it really be that difficult to have a tool just check if the citations actually exist?

You also have to read them and check whether they claim what the document claims. Just checking whether it goes somewhere is not enough.

This whole thing is requiring people to do exactly the thing they are bad at: controlling the output of something else while staying passive and not losing attention.


Westlaw is very expensive.

IIRC, in terms of clients, mutt (& co.) will actually handle “@” in the local part correctly.
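
For anyone wondering what "correctly" might look like, here's a rough sketch (not how mutt does it, just an illustration): split at the last “@” rather than the first, so a quoted local part containing “@” survives. The `split_address` name is made up for this example, and it assumes the domain is a plain hostname rather than a bracketed domain literal.

```python
def split_address(addr: str) -> tuple[str, str]:
    """Split an address into (local part, domain) at the LAST "@".

    Hypothetical helper for illustration only. It assumes the domain is a
    plain hostname, so any earlier "@" must belong to the local part.
    It does not validate quoting or any other address syntax.
    """
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a local@domain address: {addr!r}")
    return local, domain


# A quoted local part that itself contains "@".
local, domain = split_address('"user@home"@example.com')
print(local)   # "user@home"  (quotes included)
print(domain)  # example.com
```

A naive `addr.split("@")[0]` would cut the local part short at `"user`, which is roughly the class of bug being complained about.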

> But the real reason I do that is just because I just like to sit in anger whenever this breaks the user experience because of programming errors or inconsistencies.

Genuinely delighted by the fact that I’m not alone in that.


Namespace collision

Well I have a new favorite website. I don’t know the last time that I read something that was so thoroughly and multidimensionally my shit.

Not actually measuring crispness when you self-report having the perfect equipment to do so is a cruel, cruel tease though.


Hahaha, glad you're enjoying it as much as I did! I will have to add a crispness slider and then validate it with expensive audio equipment.

Shout out to the best named feminist group of all time, W.I.T.C.H. The Women’s International Terrorist Conspiracy from Hell.

This has a serious case of in-housing the edge cases that I don’t think many would want to pay the price for.

The problem with DNS per the haiku isn’t that it’s difficult to understand, or even that running your own DNS server is particularly difficult. It’s that coordinating information and exchange at scale is a tricky problem with a lot of non-obvious edge cases and footguns.

So trying to reduce complexity by sidestepping DNS really doesn’t do that, it just leaves you holding the bag on all the problems that DNS was quietly solving for you in the background.


My interpretation would be that he feels his wife is incredibly loving in a quantity he isn’t able to match (degree) and in a unique way he’s not able to match (kind). General life experience plus the fact that he wrote that tells me he’s probably wrong and his wife would probably say the same about him, but that’s just speculation.

I’ve had one experience with Claude Code so far that genuinely frustrated me, but it did it to such a degree that it wrapped back around to hilarity. It tried to run a command, got an error, realized it needed to cd to a different directory first, and then… didn’t do that.

It caught itself several times, going “oh, I didn’t actually cd, let me add that and try again”. I tried correcting it several times: “you MUST begin the command with `cd dir &&`”. There were a lot of variations back and forth to try to coax out the correct tool call, including backing up the conversation and trying from earlier in the context.

It refused. Every time. It simply would not include the cd. Genuinely unhinged behavior.

