Yes, people ought to do a side-by-side comparison of a new user learning K8s vs. AWS vs. GCP before claiming Kubernetes adds more complexity than it returns in benefits.
Remember the first time you saw the AWS console? And the last time?
Because it is hard to manage the configuration. It's why tools like terraform exist.
Anecdote: I worked for a small company that was later acquired. It turned out one of the long-time employees had set up the company's AWS account using his own Amazon account. Bad on its own. We built out the infra in AWS. A lot of it was "click-ops". There was no configuration management. Not even CloudFormation (which is not all that great in my opinion). Acquiring company realizes mistake after the fact. Asks employee to turn over account. Employee declines. Acquiring company bites the bullet and shells out a five figure sum to employee to "buy" his account. Could have been avoided with some form of config management.
> Acquiring company realizes mistake after the fact. Asks employee to turn over account. Employee declines. Acquiring company bites the bullet and shells out a five figure sum to employee to "buy" his account. Could have been avoided with some form of config management.
That is completely the wrong lesson from this anecdote.
1) The acquiring company didn't do proper due diligence. Sorry, this is diligence 101--where are the accounts and who has the keys?
2) Click-Ops is FINE. In a startup, you do what you need now and the future can go to hell because the company may be bankrupt tomorrow. You fix your infra when you need to in a startup.
3) Long-time employee seemed to have exactly the right amount of paranoia regarding his bosses. The fact that the buyout appears to have killed his job and paid so little that he was willing to torch his reputation and risk legal action for merely five figures says something.
Makes sense.
This is a classic example of bad operations management leading to an unexpected outcome. It could have been solved much more easily with proper configuration management.
Sounds like the exact nightmare a previous employer was living. AWS' (awful) web UI convinces the faint of heart to click through wizards for everything. If you're not using version control for _anything_ related to your infrastructure...you have my thoughts and prayers.
Pretty much. The lesson learned for me was to always have version control for the complete stack including the infra for the stack. I like terraform for this. Terragrunt at least solves the issue of modules for terraform reducing the verbosity. Assume things could go wrong and you will need to redeploy EVERYTHING. I've been there.
Migrating resources that are taking live traffic is not easy. Before moving the traffic over to a different endpoint, the time and engineering work to make sure the new endpoint works is real money and cost. There could also be stateful data in the original account, and doing a live migration of that data with new data coming in at X TPS is absolutely hard work.
> Besides, personally I find AWS console much easier to understand. I don't get why people hate it.
The console is fine as a learning tool for deployment/management, and for occasional experimentation, monitoring, and troubleshooting, but any IaC tool is vastly more manageable for non-toy deployments where you need repeatability and consistency and/or the ability to manage more than a very small number of resources.
Google can’t live-migrate if the underlying hardware fails quickly enough.
I don’t think AWS has talked about live migration, but given the stability of their VMs and the rarity of “we need to restart your instance” notices, it seems like they have something similar.
I've "experienced" a hard drive failure on both platforms.
On AWS, I was getting PagerDuty'd because the solo gateway box was down. We couldn't SSH in, so after an hour of no progress with debugging or support, we just hit the reset button and hoped for the best. Fortunately this worked, and later we were told there had been a disk failure.
On GCP, I didn't even know, and only discovered it in the logs when I was digging around for other audit reasons. Turns out your long-running Google VMs are being migrated all the time and you have no idea. They actually have a policy/SLA around it, basically saying they refresh their entire fleet of servers every 6 weeks, IIRC. Honestly, if AWS is not doing something like this, I'd have increased concerns about leaky security from neighbors (i.e. someone who has a VM running for multiple years without software updates; hopefully you're protected on shared servers, but it is software after all).
The console is supposed to just be a web UI for quickly making a change or exploring a feature. Any high-quality engineering team should avoid making changes through the console, because it's not testable or repeatable.
For code changes, use CodeDeploy, or containers through ECS. For configuration and infrastructure changes, CloudFormation should be the right tool.
How do you view all the VMs in a project across the globe at the same time?
Do you need to manage keys when SSH'ing into a VM?
Do you know what the purpose of all the products are? If you don't know one, are you able to at least have an idea what it's for without going to documentation?
They have also directly opposed many efforts around Kubernetes, even against their own customers, until they realized they couldn't win. Only then did they cave, and they are really doing the bare minimum. The most significant contribution to OSS they have made was a big middle finger to Elasticsearch...
Of course everyone's experience is different, but in my case...
> How do you view all the VMs in a project across the globe at the same time?
I'm not sure what it's got to do with k8s? I can't see jobs that belong to different k8s clusters at the same time, either.
> Do you need to manage keys when ssh'n into a VM?
Well, in k8s everybody who has access to the cluster can "ssh" into each pod as root and do whatever they want, or at least that's how I've seen it, but I'm not sure it's an improvement.
> Do you know what the purpose of all the products are? If you don't know one, are you able to at least have an idea what it's for without going to documentation?
Man, if I got a dime every time someone asked "Does anyone know who owns this Kubernetes job?", I'd have... hmm, maybe a dollar or two...
Of course k8s can be properly managed, but IMHO, whether it is properly managed is orthogonal to whether it's k8s or vanilla AWS.
If you're running pods as root, you're doing it wrong. That was a no-no with docker, and it's still a no-no for kubernetes. People still run non-containerized services as root too...
This is getting off-topic, but I didn't understand the rationale behind that. Processes running inside docker/k8s are already isolated, so unless you're running something potentially malicious, why would it matter if it's root or not?
(Of course, if you're running untrusted user code, then you'll need every protection you can muster, but I'm talking about running an internally developed application. If you can't trust that, you already have a bigger problem.)
If the container is running as root, and you escape the container, you are root on the host.
Containers share the kernel with the host, and are only as isolated as the uid the process in the container runs as and the privileges you grant that container.
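For context, this is what opting out of root looks like in a pod spec; a minimal sketch, with a made-up image name and an arbitrary unprivileged uid:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  securityContext:
    runAsNonRoot: true        # kubelet refuses to start a container running as root
    runAsUser: 10001          # arbitrary unprivileged uid (placeholder)
  containers:
    - name: app
      image: example/app:1.0  # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]       # drop every Linux capability the runtime grants by default
```

With this in place, even a container escape lands you as uid 10001 on the host, not root.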
I've spent, in total, a tenth as much time learning k8s and related systems as I have spent on AWS.
In most situations where I have a direct comparison, k8s takes less ops work, often thanks to Helm.
The AWS console is designed for lock-in. I could use configuration management for AWS too, but the time required to go through their way of doing X is just not worth it, unless I want to become an AWS solutions architect consultant.
Sure, the most asinine thing (borrowing from Rob Pike) is to have a system where invisible characters define the scope and semantics of what you are writing. Now Helm takes this one step further and starts using text interpolation, with helpers for managing indentation, in this invisibly scoped language (and I went one step beyond that before saying "no more" to myself and discovering https://cuelang.org). I hacked in imports, but realized, ok, I'm making this worse.
So there's this problem, and a number of experiments are going on. One camp has the idea of wrapping data/config in more code. These are your Pulumi- and Darklang-like systems. Then there is another camp that says you should wrap code in data and move away from programming, recursion, and Turing completeness. This seems like the right way to me, for a lot of reasons both technical and human-centric.
I've pivoted my company (https://github.com/hofstadter-io/Hof) to be built around and powered by Cue. Of that latter camp, it is by far going to be the best, and it comes from a very successful lineage. I'm blown away by it, like when I found Go and k8s.
Recently migrated my company's k8s product to Cue and it's pure bliss.
Configuration should be data, not code. Cue has just the right amount of expressivity - anything more complex shouldn't be done at the configuration layer, but in the application or a separate operator.
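A tiny illustration of that "just enough expressivity", with invented field names; in Cue, constraints, defaults, and data unify in one layer:

```cue
// A schema: types and constraints live next to the data.
#Service: {
	name:     string
	replicas: int & >=1 & <=20 | *3  // bounded, with a default of 3
	port:     int & >0 & <65536
}

// Concrete config is just data validated against the schema.
api: #Service & {
	name: "api"
	port: 8080
}
```

Anything needing loops or real logic is a sign it belongs in the application or an operator, not here.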
There is still configuration, there has to be; you've just wrapped it so much it's not visible anymore (which is even worse than Pulumi; at least they are using an existing language). You still have to express (and write) the same information...
Darklang is solidly in the Pulumi camp; that's where outsiders put it. (I have seen the insides without the beta / your demo; someone with a beta account showed me around a bit.)
The real problem with Darklang is they have their own custom language and IDE. What exactly are you trying to solve?
> Remember the first time you saw the AWS console? And the last time?
There was a time in between for me - that was Rightscale.
For me, the real thing that k8s bring is not hardware-infra - but reliable ops automation.
Rightscale was the first place where I encountered scripted ops steps and my current view on k8s is that it is a massively superior operational automation framework.
The SRE teams which used Rightscale at my last job used to have "buttons to press for things", which roughly translated to "If the primary node fails, first promote the secondary, then get a new EC2 box, format it, install software, setup certificates, assign an elastic IP, configure it to be exactly like the previous secondary, then tie together replication and notify the consistent hashing."
The value was in the automation of the steps in about 4 domains - monitoring, node allocation, package installation and configuration realignment.
The Nagios, Puppet, and Zookeeper combos for this were a complete pain, and the complexity of k8s is that it is a "second system" for that problem space. The complexity was always there, but now the complexity is in the reactive ops code, which is its final resting place (unless you make your architecture simpler).
> The SRE teams which used Rightscale at my last job used to have "buttons to press for things", which roughly translated to "If the primary node fails, first promote the secondary, then get a new EC2 box, format it, install software, setup certificates, assign an elastic IP, configure it to be exactly like the previous secondary, then tie together replication and notify the consistent hashing."
If I understand this correctly, all of these things could have been automated in AWS fairly easily.
"If the primary node fails" Health check from EC2 or ELB.
"get a new EC2 box" ASG will replace host if it fails health check.
"format it" The AMI should do it.
"install software, setup certificates" Userdata, or Cloud-init.
"assign an elastic IP, configure it to be exactly like the previous secondary, then tie together replication and notify the consistent hashing" This could be orchestrated by some kind of SWF workflow if it takes a long time or just some lambda function if it's within a few mins.
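As a rough sketch, most of the steps above collapse into a small CloudFormation template; resource names, AMI ID, and the install script are placeholders:

```yaml
Resources:
  AppGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "1"
      MaxSize: "2"
      HealthCheckType: ELB            # "if the primary node fails": ELB health check
      HealthCheckGracePeriod: 300
      LaunchConfigurationName: !Ref AppLaunchConfig
      AvailabilityZones: !GetAZs ""
  AppLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-12345678           # pre-baked AMI covers "format it"
      InstanceType: t3.small
      UserData: !Base64 |
        #!/bin/bash
        /opt/app/install.sh           # hypothetical "install software, setup certificates" step
```

The remaining replication/consistent-hashing orchestration would still need a Lambda or workflow, as noted.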
Ansible.
You can keep your YAML and deploy actual virtual servers on your cloud provider.
Kubernetes is an introvert, and this doesn't correspond to anything but its padded cell walls.
Ansible is an extrovert, an exoskeleton.
Turning Kubernetes inside out makes it right again.
Make sense?
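Concretely, the "keep your YAML, deploy actual virtual servers" idea looks something like this; module names are real Ansible builtins, while the host group and package are placeholders:

```yaml
# playbook.yml - declarative YAML, but targeting plain VMs instead of a cluster
- hosts: webservers        # hypothetical inventory group
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present
    - name: Ensure nginx is running and enabled at boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```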
No, not following. I understand how master-based provisioning management systems had their time and place, but we've largely moved beyond that to baking images, whether containers or VMs. Running a master-based system comes with a whole host of other issues. Ansible is now relegated to being a better-than-bash tool for installing packages and configuring the baked image. The time of booting a vanilla instance and then installing software when a scaling event happens is over.
By the way, what does ansible do to help with scaling applications?
Why do a comparison? K8S runs on AWS and GCP. They have managed services for setting up one. If you know K8S as a developer, then you simply consume the cloud K8S cluster.
I think the point is that there are people that claim that k8s adds a ton of complexity to your environment. But if you compare k8s alone with managing your infrastructure using (non-k8s) AWS or GCP primitives, you'll find that the complexity is similar.
The problem is that what the nodes in AWS are doing is fairly transparent. When my Kubernetes pod does not come up, it's always a hell of a pain figuring out why from just the events that Kubernetes gives me.
While that's true on the managing-instances side, you also need to actually deploy the infrastructure to manage them (if you're not using some PaaS offering). You don't need to do this for other IaaS.
Honestly the last time I looked at k8s was like 5 years ago, but back then it looked like a pretty big pita to admin.
The last 5 years have been transformative for both cloud native development and also open source software
It is a completely different world that stretches far beyond Kubernetes, though I attribute much of the change to what has happened from / around k8s -> cncf
It's so easy, I can launch production-level clusters in 15 minutes with four keystrokes, and make backups and restore to new ephemeral clusters with a few more simple commands.
> but back then it looked like a pretty big pita to admin
- well, it's also a pita to update services without downtime.
- and it sucks to update operating systems without downtime.
- and sometimes you reinvent the wheel when you add another service or even a new website.
however, with k8s, everything above is kinda the same: define a YAML file, apply it, it works.
and k8s itself can be managed via ansible/k3s/kops/GKE/kubeadm/etc...
it's way easier to create a cluster and manage it.
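For example, the "define a YAML file, apply it" loop for a zero-downtime service update looks roughly like this; names and image tag are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels: {app: web}
  strategy:
    type: RollingUpdate        # replace pods gradually: no downtime
    rollingUpdate:
      maxUnavailable: 0        # never take capacity below the replica count
      maxSurge: 1              # bring up one new pod at a time
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
        - name: web
          image: example/web:2.0  # bump the tag, `kubectl apply`, done
```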
That's exactly the point. You avoid lock-in to AWS or GCP by running on K8S instead. K8S becomes the "operating system": a standardized abstraction over different hardware.
Isn’t a “standardized abstraction over different hardware” just... an actual operating system? Isn’t “the operating system of the cloud” just... the actual operating system running on your cloud VMs? If you script a deployment environment atop vanilla Linux distros (e.g. ansible), you also avoid public cloud lock-in. (Side benefit: you also avoid container engine lock-in, and container layer complexity!)
Containers are a standard abstraction over the operating system, not over the hardware (or the VM, even). This has its use cases, but making it “the standard” for deployment of all apps and workloads is just bananas, in my view.
Kubernetes, when viewed as the OS for the data center, controls, manages, and allocates a pool of shared resources. When I install and run an application on my laptop, there are a ton of details I don't care about that just happen magically. Kubernetes maps this idea onto the resources and applications of the cloud.
Again, Kubernetes is far more than just deploying, running, and scaling an application. It allows many problems to be solved at the system level, outside of the application and the developer's awareness.
Take, for example, restricting base images at your organization. With Kubernetes, SecOps can install an application which scans all incoming jobs and either rejects them or, in more sophisticated setups, hot-swaps the base image.
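The usual mechanism for this is an admission webhook: the API server sends every incoming pod to a SecOps-controlled service, which can reject it (or, via a mutating webhook, rewrite the image). A sketch of the registration, with hypothetical names and namespace:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: base-image-policy
webhooks:
  - name: images.secops.example.com   # hypothetical webhook name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    rules:                            # intercept pod creation cluster-wide
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:                        # SecOps-run service that allows/denies each pod
        namespace: secops
        name: image-policy
        path: /validate
```

The application team never sees any of this; the policy is enforced entirely at the platform layer.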