> One thing I recommend for engineers new to on-call is to avoid rushing: take a few breaths before joining the call or before speaking, and in general try to “think in slow motion”.
The more difficult it becomes to remain silent, the more likely it is that silence is the correct action.
Ego is a hell of a drug. I've been on calls where things went sideways in prod infrastructure simply because of incessant yapping about things only tangentially related to the task at hand.
Being clever tends to be a lot easier than being the other things, especially now that we have ChatGPT and friends.
> I think the sound quality of modern-day bluetooth speakers is really good.
The sound quality out of the speakers of some Apple products seems borderline impossible to me. The MacBook in particular makes me feel like I missed an important DSP lecture at university.
If you are working with a web application, Playwright + a frontier LLM is incredibly capable. They recently added some features that make this sort of use case go a lot smoother:
If you set this up correctly, you can have a main agent issue natural-language testing instructions to this Playwright agent, which returns a natural-language summary of what it experiences. This is the sort of thing where I begin to get interested in the idea of agents working while I sleep.
Some of these plants are being retrofitted for dual firing. They can burn coal & natural gas at the same time. I wonder how that factors into these statistics.
The full dynamic range is nice if you actually want to experience it and have a system capable of reproducing it. A dedicated center channel with a few hundred watts of amplification behind it will cut through the ambient backdrop like a hot knife through butter. You can watch Transformers or MI3 at reference volume with crystal clear dialogue if you're willing to throw enough power at the problem.
What would really solve the movie issue is more standardised sound across streaming services. Every single one seems to have a different volume and compression setup.
That, and an industry-standard way (as a user setting) to crank the center channel when downmixing to 2.1.
I have a decent[1] system with a dedicated center channel. Everybody complains that the mix is too loud if we tune for audible dialog on anything made in the past decade or so (the MI3 Blu-ray is fine, and I suspect that Transformers would be too).
1: Powered by a Denon AVR, not separates, in case you want to "No true cinephile" me.
> users may experience more false positives as we refine these classifiers to respond to new threats. We are working to reduce these as fast as possible.
Getting a really strong capacity-issue vibe here. Reframing it as a safety issue could burn a lot of trust if this turns out to be another lie. I hope they've done their math on this one.
I have a really strong suspicion that there is something different about OpenAI (OAI) prepaid tokens in the API vs. elsewhere. I've been able to get away with spending less than $150/month on average while many peers are hitting 10x that.
I am curious how many on HN have manually configured their Copilot install with a custom OAI token for 5.4/5.5. In my experience, the performance difference over the built-in subscription models is immense. This setup tends to solve the problem so quickly and reliably that any desire to have it run while I'm asleep seems absolutely ridiculous. The performance is constant throughout the day and week.
I think what might be happening is that we are chasing the cost optimization rabbit a little too hard. Capability is a weird dimension to quantify. A weaker model is not weaker in a linear way. It's usually an incredibly tall brick wall of a discrete go/no-go. If the model can't do the task, it doesn't matter how cheap the tokens are. Something approaching the inverse is also largely true.
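A rough way to see the go/no-go point with arithmetic: if you assume (a simplification, not a measurement) that each attempt succeeds independently with the same probability p, the number of attempts until success is geometric with mean 1/p, so the expected spend per success is cost_per_attempt / p. The function name and all dollar and probability figures below are hypothetical, for illustration only.

```python
import math


def expected_cost_to_success(cost_per_attempt: float, p_success: float) -> float:
    """Expected spend until the first successful attempt.

    Assumes each attempt succeeds independently with the same probability
    p_success, so the attempt count is geometric with mean 1 / p_success.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be in [0, 1]")
    if p_success == 0.0:
        # The model simply cannot do the task: no token price fixes that.
        return math.inf
    return cost_per_attempt / p_success


# Hypothetical numbers, for illustration only.
strong = expected_cost_to_success(cost_per_attempt=2.00, p_success=0.9)
weak = expected_cost_to_success(cost_per_attempt=0.20, p_success=0.02)
hopeless = expected_cost_to_success(cost_per_attempt=0.01, p_success=0.0)
print(f"strong: ${strong:.2f}, weak: ${weak:.2f}, hopeless: {hopeless}")
```

With these made-up numbers the model that is 10x cheaper per attempt ends up roughly four to five times more expensive per success, and the one that can't do the task never gets there at any price. The model ignores wall-clock time and the cost of reviewing failed attempts, both of which make the weak option look even worse.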
Focus on the capability (is this giving my customer what they want) instead of the cost, and you will likely find that the cost never reaches a threshold where you even begin to worry about it. Starting from a position of cost optimization tends to spiral into a dark place.
The point I'm trying to make is that a lot of people resort to 24/7 Ralph loops because they're using weaker models that need an incredible number of attempts to make any progress. The Death Star has different game-theoretic implications. You probably don't need it lasering entire planets while you sleep, assuming the laser system actually works as advertised. I've never had a Copilot run that took so long that I had to get up from my PC. Maybe 10 minutes. What the hell can run for 24 hours and still converge in a meaningful way?
I run totally unprotected with gpt5.4/5. I've gone through thousands of dollars' worth of API tokens, both through Copilot and through custom harnesses that have local admin and arbitrary PowerShell access. I've never seen anything that could even remotely be construed as malicious.
I see a lot of people making a really big deal about safety and sandboxing while I'm busy getting shit done. If you can't handle your current source code checkout getting screwed up by a bad prompt, that's on you 1000%. Source control is the answer for anything involving information over time.
Unless you intentionally try to make a scene, these models aren't going to go fuck with your system shell or do anything you couldn't recover from in a few minutes. Connecting ChatGPT to the enterprise SQL Server as sysadmin is not what I'm advocating for. This is another example of "on you, not the AI". There's a tiny amount of nuance you can apply at the edges that makes it easy to allow broad access with negligible risk.
"I think they are lying to you"
https://youtu.be/zfYsSFY4l18