I think you might be confusing Backblaze reading files with how Dropbox/OneDrive/Nextcloud/etc. work. Nextcloud doesn't enable this by default (I don't think), but Windows calls it virtual file support. There is no avoiding filling the upload buffer, because Backblaze has zero control over how Dropbox downloads files. When Backblaze requests that a file be opened and read, Windows asks Dropbox (or whatever) to open and read the file for it. How that happens is up to whatever handles the virtual files. To Backblaze, your Dropbox folder is a normal directory with all that that entails, so Backblaze thinks it can just zip through the directory and read data from disk, even though that isn't really what's happening.

I had to exclude my Nextcloud directory from my Duplicati backups for precisely this reason -- my Nextcloud is hosted on my server, and Duplicati was sending it so many requests that my server started sending back error 500s.
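For what it's worth, a backup tool can detect these placeholders before touching them: on Windows, dehydrated files carry attributes like FILE_ATTRIBUTE_OFFLINE or FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS. A minimal Python sketch (the Dropbox path is just an example):

    import os
    import stat

    # Not exposed by Python's stat module; value from the Windows SDK (winnt.h).
    FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

    def is_placeholder(path):
        # st_file_attributes only exists on Windows.
        attrs = os.stat(path).st_file_attributes
        return bool(attrs & (stat.FILE_ATTRIBUTE_OFFLINE
                             | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS))

    # Skip dehydrated files during a backup walk.
    for root, _, files in os.walk(r"C:\Users\me\Dropbox"):
        for name in files:
            if is_placeholder(os.path.join(root, name)):
                continue  # reading it would trigger a cloud download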
And no, my server isn't behind Cloudflare, primarily because I don't have $200 to throw at them to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "Hey, only proxy this traffic but let me handle everything else" (assuming that's even possible, given that the usual flow is to put your entire domain behind them).
Dropbox and OneDrive can handle Backblaze zipping through and opening many files. The risk is downloading too many gigabytes at once, but that shouldn't happen because Backblaze should only open enough for the immediate upload. If it does happen, it's easily fixed.
If it overloads Nextcloud by hitting too many files too fast, that's a legitimate issue, but it's not what OP was worried about.
The issue you're missing is that the abstraction Dropbox/OneDrive/etc. provide is not that of a network file system. When an application triggers the download of a file, it hydrates the file to the local file system and keeps it there. So if Backblaze triggers the download of a TB of files, it will consume a TB of local disk space (which may not even exist).
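And you can estimate the blast radius up front: dehydrated placeholders still report their full logical size via st_size, so a quick walk shows how much data a naive full read would pull down. A rough Python sketch (path hypothetical):

    import os

    total = 0
    for root, _, files in os.walk(r"C:\Users\me\Dropbox"):  # hypothetical path
        for name in files:
            # st_size is the logical size even for dehydrated placeholders;
            # size-on-disk is what actually grows during hydration.
            total += os.stat(os.path.join(root, name)).st_size
    print(f"a full read would hydrate about {total / 1e9:.1f} GB")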
It does keep them permanently. Dropbox is not a NAS and does not pretend to be one.
> When you open an online-only file from the Dropbox folder on your computer, it will automatically download and become available offline. This means you’ll need to have enough hard drive space for the file to download before you can open it. You can change it back to online-only by following the instructions below.
Exact same behavior for OneDrive, though it apparently does have a Windows integration (Storage Sense) to eventually migrate unused files back to online-only, if enabled.
> When you open an online-only file, it downloads to your device and becomes a locally available file. You can open a locally available file anytime, even without Internet access. If you need more space, you can change the file back to online only. Just right-click the file and select "Free up space."
This "let's not back up .git folders" thing bit me too. I had reinstalled windows and thought "Eh, no big deal, I'll just restore my source code directory from Backblaze". But, of course, I'm that kind of SWE who tends to accumulate very large numbers of git repositories over time (think hundreds at least), some big, some small. Some are personal projects. Some are forks of others. But either way, I had no idea that Backblaze had decided, without my consent, to not back up .git directories. So, of course, imagine how shocked and dismayed I was when I discovered that I had a bunch of git repositories which had the files at the time they were backed up, but absolutely no actual git repo data, so I couldn't sync them. At all. After that, I permanently abandoned Backblaze and have migrated to IDrive E2 with Duplicati as the backup agent. Duplicati, at least, keeps everything except that which I tell it not to, and doesn't make arbitrary decisions on my behalf.
This is exactly why random corporations need to be kept out of government. Or copyright needs to be abolished, one of the two. No corporation (no matter how beloved) should ever have this kind of power. IMO, the more powerful an organization becomes, the deeper the scrutiny it should face.
This distinction is good in academic circles and similar (like on here). But the public -- ordinary people who don't regularly visit Hacker News, or even know it exists -- doesn't care. To them, AI == inequality and inequality accelerants, because it is funded and run by the richest, most powerful people on Earth. And those very people are making everything worse for everyone but themselves, not better. Nobody is going to care about academic distinctions in such circumstances.
It's because the consequences of AI are so direct, obvious, and fast, whereas the inequality and job losses from other tech advances are less direct.
That is, it's not hard to see why so many main streets in smaller towns have boarded-up retail stores when you can now get almost anything within a day from Amazon. But Amazon (and the other Internet giants) always paid at least semi-plausible lip service to being a boon to the little guy (see Amazon's FBA commercials, for example). Meanwhile, you've got folks like Altman and Amodei gleefully saying how AI will be able to do all the work of a huge portion of (mostly high-paying) jobs.
So it's not surprising that people are more up in arms about AI. And frankly, I don't think it really matters. Anger against "the tech elite" has been bubbling up for a long time, and AI just provides the most obvious target.
Does economics or political theory focus on centralization in a practical sense -- not as a normative claim, but on what its actual effects are? It feels like we're at a centralization of power of unprecedented scale, to the point where no previous theories or models can really apply (in order to make analytical progress, I mean -- sure, feudalism is honestly becoming a scarier and scarier analogy, but still, there are significant differences).
I'm pretty much only thinking about these kinds of problems at my job at this point, so this is important to me in that regard
I would happily switch to it in a heartbeat if it were much better documented and supported even half of what CMake does.
As an example of what I mean, say I want to link against the FMOD library (or any library I legally can't redistribute as an SDK), or I want automatic detection on Windows where I know the library/SDK ships as an installer package. My solution, in CMake, is to just ask the registry (see the sketch below). In XMake I still can't figure out how to pull this off. I know that's pretty niche, but still.
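For the curious, the CMake side looks something like the sketch below. The registry key, value name, and SDK paths are hypothetical (real installers vary), and the WINDOWS_REGISTRY query needs CMake >= 3.24:

    # Hypothetical key/value names; adjust to whatever the installer writes.
    cmake_host_system_information(RESULT FMOD_SDK_DIR
      QUERY WINDOWS_REGISTRY "HKLM/SOFTWARE/FMOD Studio API Windows"
      VALUE "InstallDir")

    if(FMOD_SDK_DIR)
      target_include_directories(mygame PRIVATE "${FMOD_SDK_DIR}/api/core/inc")
      target_link_libraries(mygame PRIVATE "${FMOD_SDK_DIR}/api/core/lib/x64/fmod_vc.lib")
    endif()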
The documentation gap is the biggest hurdle. A lot of the functions/ways of doing things are poorly documented, if they're documented at all -- including a CMake library that isn't in any of the package managers, for example. It also has some weird quirks: automatic/magic scoping (which is NOT a bonus) along with a hacky import() function instead of using native Lua require.
All of this said, it does work well when it does work. Especially with modules.
I prefer Godot over Unity, honestly. Not just because the engine feels better, but because it's accessible, which is what matters to me. Unity isn't and probably never will be, so meh. Sure, you can make accessible games in it, but the editor itself isn't accessible, which kind of defeats the point of being able to make accessible games in it in the first place. And don't even get me started on Unity's licensing model. Godot's superior C# support is, IMO, just a cherry on top.
IMO (not the GP), if Anthropic were my friends I would expect them to publish research that didn't just inflate the company itself and that was both reproducible and verifiable -- not just puff pieces describing how ethical they are. After all, if a company has to remind you in every PR piece that it is ethical and safety-focused, there is a decent probability that it is the exact opposite.
Honestly, I would be all for outright abolishing arbitration. I have yet to see anything actually good come out of arbitration other than a ruling that protects the entity that forced arbitration in the first place.
Arbitration does help with the problem of overwhelmed and expensive courts. What is needed is fair arbitration.
The outcome should approximate the outcome of the full court proceeding.
Make arbitration rulings appealable in court on the basis of factual errors, errors of law, corruption, and potential errors by omission (i.e., failure of discovery). Make the company responsible for the full costs of litigation if the arbiter's judgment is overturned. And punish the arbiter, perhaps with a two-year ban on accepting any case from that industry.
I'm sure more adjustments would be needed, but it should be possible to get both the arbiters and the companies to want arbitration to be a faster, cheaper route to the same outcome as the courts, rather than a steamroller that avoids all accountability for the company.
I want the court result for free and instantaneously.
Barring that, faster and cheaper is better.
Simply limiting discovery, counterbalanced by loosened rules of evidence, plus allowing specialist arbiters and avoiding the multi-year wait for a court proceeding, seems faster and cheaper. There is a small error introduced by allowing discovery of 1,000 pages of emails instead of 100,000, and by allowing hearsay or affidavits, but most disputes probably don't strictly depend on deposing a dozen people and interpreting the 23rd box of company documents.
Then fix the court system? Create more courts, hire more judges/clerks... I mean, I know it isn't "as simple" as that, but that's the proper solution, instead of creating a half-legal, half-favoritist system where a company can force you into arbitration where, more often than not, the arbitrator is paid by the company and therefore rules in its favor.
Or reduce the demand on the legal system? Just adding more expense to an outright broken thing isn't an actual fix; it's a half-measure patch at best. And no, I don't mean creating a workaround like arbitration.
Why do court cases take so long and suck up so many resources? Start with that. Perhaps reduce the amount of legislation/laws/etc. on the books, and write laws that limit the litigious society we find ourselves living in.
That is of course easier said than done, but we've chosen this path and can choose to unwind it if we have enough desire to.
I mean... You could? AI comes in all kinds of forms; it's been around practically since ELIZA. What is (not) here to stay are the techbros who think every problem can be solved with LLMs. I imagine that once the bubble bursts and the LLM hype is gone, AI will go back to roughly what it was before ChatGPT came along. After all, IMO it's quite true that the AIs nobody talks about are the ones actually doing good or interesting things. They've all been pushed to the back seat because LLMs have taken the driver's and passenger's seats, but the AIs working on cures for cancer (assuming we don't already have said cure and it just isn't profitable enough to talk about/market), for example, are still being advanced.
I agree on that part as well, but saying that AI will go back to what it was before ChatGPT came along is false. LLMs will still be a standalone product and will be taken for granted. People will (maybe? hopefully?) eventually learn to use them properly and not generate tons of slop for the sake of using AI. Many "AI companies" will disappear from the face of the Earth. But our reality has changed.
LLMs will not be just a standalone product. The models will continue to get embedded deep into software stacks, as they already are today. For example, if you're using a relatively modern smartphone, you have a bunch of transformer models powering local inference for things like image recognition and classification, segmentation, autocomplete, typing suggestions, search suggestions, etc. If you're using Firefox and opted into it, you have local models that, e.g., summarize the contents of a page when you long-click on a link. Etc.
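To make "embedded in the stack" concrete, here's a minimal Python sketch of running a small model in-process with Hugging Face's transformers pipeline API (the model choice is illustrative):

    # pip install transformers torch
    from transformers import pipeline

    # Small distilled summarizer; runs locally, no API calls.
    summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

    article = "Long page text extracted by the host application goes here..."
    result = summarizer(article, max_length=60, min_length=10, do_sample=False)
    print(result[0]["summary_text"])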
LLMs are "little people on a chip", a new kind of component, capable of general problem-solving. They can be tuned and trimmed to specialize in specific classes of problems, at great reduction of size and compute requirements. The big models will be around as part of user interface, but small models are going to be increasingly showing up everywhere in computational paths, as we test out and try new use cases. There's so many low-hanging fruits to pick, we're still going to be seeing massive transformations in our computing experience, even if new model R&D stalled today.
> Lacking Copyright (or similarily a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now.
Can someone explain this to me? I was under the impression that if a work of authorship was not copyrightable because it was AI-generated and not authored by a human, it was in the public domain, and you could therefore do whatever you wanted with it. Normal copyright restrictions would not apply here.
Content pirated or scraped from the web, along with "AI" users' own content, is used in model training sets, and once codified, the statistical salience of popular content in the model is significant.
For example, when an LLM does a vector search, there is a high probability of pirated-content bleed-through and isomorphic plagiarism in the high-dimensional vector-space results. Thus, when you type "name a cartoon mouse", there is a higher probability that Disney's "Mickey Mouse" will pop out in the output rather than "Mighty Mouse". Note that trademarks never expire as long as the fees are paid, and Disney can still technically sue anyone who messes with their mouse.
Much like with em dashes ("--"), telling the current set of models to stop using them inappropriately often fails. Also, activation capping is used to improve a model's behavioral vector, and has nothing to do with the Anthropic CEO developing political ethics.
LLMs are useful for context search, but can't function properly without constantly stealing from actual humans, and thus will often violate copyright, trademark, and patents. In a commercial context it is legally irrelevant how the output misappropriated the IP, and you can bet your wallet the lawyers won't care either. No, IP is not public domain for a long time (17 to 78 years), regardless of people's delusions, even if some kid in a place like India (no software patents) thinks it is.
This channel offers several simplified explanations of the work being done with models, and Anthropic posts detailed research papers on its website.
Many YC bots are poisoning discourse -- so this thread will likely get negative karma. Some LLM users seem to develop emotional or delusional relationships with the algorithms. The internet is already >52% generated nonsense and growing. =3
The quoted content said that "Lacking Copyright (or similarily a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now." I was explicitly asking how this meshes with my understanding of copyright, at least in the United States, which requires that a work of authorship be authored by a human and not by a machine; where a work is not authored by a human, copyright protection does not subsist, and the work is therefore in the public domain. And I was further asking for an explanation of how including an AI-generated (i.e., public domain) work makes "that aggregate body ... less free". Unless my understanding of copyright law and court precedent is massively off the mark, I am confused as to how less freedom is afforded in this instance.
The precedent case in the US formed a legal consensus that "AI" content can't be copyrighted, but such content may also contain unlicensed/pirated IP.
Thus, one should not contaminate GPL/LGPL-licensed source code with such content. The reason it causes problems is that the legal submarines may surface at a later date (or may not, if they settled out of court with Disney), as the lawsuits and DMCA strikes hit publishers.
That doesn't mean people won't test this US legal precedent, as most won't necessarily suffer personally if a foundation gets sued out of existence for its best intentions/slop-push. =3