"Don't do X, don't break the rule, you're doing it wrong"... that's why people don't write tests (well, maybe not the main reason, but that was my case at least).
Do whatever increases confidence in code and testability. If I want to make development easier for myself, I'm gonna do anything that lets me get shit done. Test takes 0.1 seconds instead of 0.001? Whatever. I can improve it later if needed, once I know more, or discard it completely.
When I stopped caring about "should I use a mock, inject a closure, use an interface..." I started writing more tests and producing better software. Because often the end goal is writing clearer code, even with shitty tests.
Gatekeeping unit testing is the last thing I aspire to do, but past me (and that’s my imaginary audience) would’ve loved this explanation before he got some projects into mocking hell.
I’m the first to say that if you have no tests and no idea how to get started, go with end-to-end tests first and take it from there. That doesn’t mean that guidelines that help you write less brittle and more idiomatic unit tests are bad per se, IMHO.
I have an idea and I know how to get started. Not doing end-to-end because I also do unit tests and they improve my API. It's just that they're not perfect, especially duration-wise.
My point is that I don't want to get discouraged by confusion over best practices. Most software suffers from a lack of any testing, and many juniors read articles like these. Seniors YOLO it, correctly or their own way.
Bit off-topic; not debunking the content of this post though!
I have personally worked on a team where velocity collapsed under the weight of the test suite.
There were lots of brittle mocks. I think we even did mock out the Ruby equivalent of the HTTP client.
Every change would inevitably break tests (not functionality!) and developers were spending more than half their time wrestling with the test suite.
All of us could have used this article.
Please always strive to improve your practice. Commit things that aren’t perfect, but try to internalize best practices over time. Gradually raise your own code quality bar as you get better.
Learning from posts like the OP is super important if you don’t have a good senior eng on your team. A group of smart juniors can easily code themselves into a hole if left to their own devices. I know this, because I did this.
FWIW I’m a former senior-less junior who programmed himself into a hole. My blog and conference talks are exactly the material I’d have loved to have had ~10 years ago.
Not sure if I’d have been smart enough to take my advice though. ;)
>Every change would inevitably break tests (not functionality!) and developers were spending more than half their time wrestling with the test suite.
I find that this invariably happens because the team tries to test implementation at a very low level rather than behavior at a high level.
I blame an overemphasis on unit tests over integration testing (e.g. J B Rainsberger's inane rant) and tutorials that imply that, for example, if you build a class you have to have some tests for that class.
At a previous company, someone established a "rule" that every "new function or method" needed its own test. Unless you lucked out with a lax reviewer (like me!), your code would not pass review if this wasn't followed. Meanwhile, there were basically zero integration or e2e tests. People spent more time adding and maintaining useless tests...
These sound like checksum tests. If you touch anything, they fail. That's all they do now: alert you that yes, you touched that code, now go figure out the new checksum for the next poor soul.
Agreed. One thing Juniors don't get (and some seniors) is when to test. I see them writing tests on business logic and have to correct them regularly. I've had to sit down and explain to 3rd party technical resources that they're foolishly asking for more technical debt by asking for certain tests.
The other thing Juniors don't get is prototyping. I regularly build something quickly to get the problem set into my head. If it's good enough, I'll leave it. If it's got issues, I'll throw it away and start fresh now that I know what I'm solving. Juniors don't work fast enough or effectively enough to do things like that without putting timelines at risk. Plus, they frequently want to do some cargo-cult methodology of the week like TDD.
The former. Business logic changes regularly. New phases of projects frequently require changes that break the unnecessary business logic test code.
Test code should fail when an important underlying mechanism regresses. If it always fails on the next phase of the project because some hapless idiot needed to cover every line, the signal to noise ratio gets weaker and the purpose of the test code is lost.
Thanks for the clarification -- that's what I suspected, but thought I must be misreading or misunderstanding, but I also don't know your situation and trust you're making the right set of trade-offs here. If your system is today a cuttlefish, yesterday a cow, and tomorrow a crab, cow tests don't do you much good. That sounds like a crazy environment, and I'd maybe have a care that those hapless idiots are sane people in an insane place.
Counterpoint, I've not seen seniority affect these outcomes at all. Most of the time I see some management type or CTO heavily push the "code coverage / test everything" narrative which then rolls down the hill. Strategy did not seem at all correlated with seniority.
Same goes for prototyping. I don't see much of a correlation between willingness to prototype and seniority, either.
I worked at a place that had an "increase test coverage" dogma. If you ran out of real work, which was common due to poor management, you'd wind up with an awful "increase test coverage to 80%" ticket to keep you busy. This resulted in at least one guy quitting because he said it was a useless waste of time.
I can assure you that it is possible and meaningful to have full code coverage in certain contexts (for instance, important projects that are only rarely touched), and it doesn’t inevitably lead to bad test suites.
Railing against code coverage is at this point just as dogma as insisting on it.
Not sure why you're highlighting this in a sub-thread regarding the importance of context. Surely this was already implied?
What I'm railing against is the idea that seniority is a prime indicator to the effective strategy parent comment insists on, which simply doesn't mirror my experiences. What I see is juniors picking up the habits of their superiors. They're learning this dogmatism from somewhere.
Depends on the industry and client. Some industries have a lot of consultants that don't write code but will tell you 80% is the minimum. Some contracts stipulate code coverage deliverables. Some platforms like Salesforce count lines of code covered and prevent deployments unless 75% and passing.
> Same goes for prototyping. I don't see much of a correlation between willingness to prototype and seniority, either.
Willingness is one thing, but getting yelled at for learning and doing your best is another. I never treat my Juniors like that, and encourage a 75/25 working/improving split. Still, several frequently get anxiety about how long it's taking them to clear tickets.
I've found that E2E tests make for a pretty bad case of hill-climbing where tests are concerned. To the point where I think that's the pathology of the testing ice cream cone.
It's conceptually easier to go 'up' the testing tree from unit tests to E2E tests than it is to come down. People do what they are most familiar with, and if the E2E tests come first, then those are what's most familiar. They are also glitchy as hell, and I see the disdain for that get applied to all tests, not just the human tragedy that Selenium turned out to be. Also, the ways in which unit test habits are considered 'bad' in functional tests are either less consequential or easier to walk away from. I'm not entirely sure which it is, or if it's both, but it's definitely less painful to relearn.
We also overvalue the E2E test versus human testing. I can't count how many times someone has come to tell me a login button is broken, 20 minutes before the E2E tests fail and tell me the same thing. Computers arbitrating code regressions are half the point of CI, so if your tests aren't arbitrating, they aren't doing their job, and should be fired.
E2E have higher capex and lower opex - hard to set up well, cheap to maintain if you do.
Low level tests are the reverse - easy to get started, hard to maintain.
If you've got a really well-oiled framework, E2E tests are spectacular, but getting to that point can vary between easy and "harder than the actual project itself", and if you half-ass it (which most people do) it ends up a useless flaky mess.
I've only seen this with projects in maintenance mode. Any major functionality changes tend to land teeth-first in the E2E tests, and after a few cycles of that you start dragging your feet a bit on major changes. E2E tests tend to lock in assumptions about the structure of your application, not just the minutia (which is not without its own set of problems).
It is more difficult to get people to write maintainable E2E tests than it is getting them to write maintainable application code, so I've retreated and retrenched.
Weird, I've always found the opposite: E2E tests tend to make the fewest assumptions about the way the application is built.
In most cases we could literally rewrite the whole app in a different language and barely change the tests.
E2E tests do require solid infrastructure, though. If you have tooling problems you can't fix (e.g. you hit Selenium's dark corners way too frequently because your web app is even a bit quirky), then maintenance cost can quite quickly exceed any benefit you derive.
I've also hit this problem of "I'd have to create my own equivalent of Selenium for interacting with this app's interface" a few times, and yeah, the required engineering effort to do it well explodes to the point where it's just not worth it.
> It's easier conceptually go to 'up' the testing tree from unit tests to E2E tests, than it is to come down.
It all depends on the language and the project, but my experience is usually the opposite. It's often much easier to write an end to end test. To be able to write a unit test you usually have to architect the code to make it easy to mock, etc - and that takes significant skill for most real world projects.
E2E tests have a lot of downsides, but being challenging to write is probably not one of them.
I took it as: if you only have E2E tests then retrofitting unit tests is hard.
My experience agrees with this. Adding unit tests to untestable code (think: a class that news up 10 dependencies, and that's just the first branch of a dependency graph to be untangled) requires refactors that just don’t get done amid the busyness of other priorities.
I’m aware of all the problems that e2e tests have, and yet I find it more useful to have 1 test that adds something to a basket and 1 test that removes it than 100 unit tests without contract tests (which I never got the hang of TBH).
(To be fair, my projects are low on JavaScript which somewhat changes the trade-off calculations, but I haven’t seen any assertions about project types above.)
Tests are code too. Like the code being tested, they can be shitty at times, but they are still code and should be treated with the same level of care.
Which is not to say "your tests should be as DRY as a desert". That's frequently a poor fit for test code's lifecycle. But it does mean you shouldn't do stuff that wouldn't pass code review elsewhere, just because "it's just a test". Someone later has to read, understand, and maintain that code - treat it as such.
James Coplien wrote a paper, in 2014 I think, titled "Why Most Unit Testing is Waste", which stirred up a storm among developers. I've seen many rebuttals, and they keep cropping up every year, but most seem to agree with his core premise: unit testing is valuable for testing algorithms -- stateless functions performing a computation -- and not much else.
As the author notes, you can break this rule. As with all rules in software, you learn the rule so you know that you can break the rule.
> "Because often the end goal is about writing clearer code even with shitty tests."
If you're worried about people being afraid to write tests, this really scares people away. Nothing makes me want to write a test for my changes less than a 2000 line test file with 50 mocks.
> Do whatever that will increase confidence in code and testability.
Maybe for some strict definition of "testability", but a lot of people in the Python world use patching to circumvent anything that might reasonably be called "testability" in the pursuit of "confidence" and some puritanical desire to avoid changing the code base to make it testable. This makes everything a complete mess for anyone who comes along later to extend the code or increase test coverage, as the tests now depend on the private internals of the code and the patching logic is magical and unintuitive.
If your "testability" remark precludes patching, then I'm more bought in, but I do worry that some will read this and use it to argue for patching.
That said, experience tells me that mocking delivers a lot less confidence than using the real deal, although there are some cases (e.g., network errors) that effectively require mocks _somewhere_.
Definitely this. I ran into this at a previous company (Python environment) where every test basically relied on mucking around with the internals through patches. Any sort of integration test that relied on calling an internal service was difficult or impossible, so it had to be mocked. External services were always mocked. The code base was a pain to work with.
At an earlier company, we had very little patching or mocking. We spun up all the internal services when running the tests. Internal calls were never mocked. External calls were sometimes mocked (if test accounts / API keys were not convenient.) We preferred testing the "real thing." Tests ran slower, but in general I had much more confidence the tests actually did something useful.
I like to distinguish between mocks and fakes. Mocks just return whatever you tell them to return for that test, but a fake is a simplified in-memory implementation (fakes aren't useful for testing error conditions, however, so you may need a few tests which actually do mock if you want to get more complete coverage).
Most of the time for third-party services, I'll just fake the service client for unit tests. For example, if I'm writing a BookStoreApp that has a BookStore dependency (where BookStore is an interface), the concrete implementation for which is a PostgresBookStore, then I'll write my PostgresBookStore client library with unit tests that include a real postgres connection but then my BookStoreApp tests will use a FakeBookStore (which is probably a thin wrapper around a dict, but which behaves like a BookStore should modulo persistence). Finally, I'll have considerably fewer API tests which actually do stand up the complete backend service (including a PostgresBookStore). This way I get a high degree of confidence and also speed of test execution--it's also nice not to have to stand up the entire world just to run one test (in particular, it's nice not to have to debug unrelated connection/auth issues).
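A minimal sketch of the fake-over-mock idea described above. The names (`Book`, `BookStore`, `FakeBookStore`) are illustrative, not from any real codebase; the point is that the fake is a real, if simplified, implementation rather than a bundle of per-test canned returns.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Book:
    isbn: str
    title: str


class BookStore(Protocol):
    """The interface the app depends on; a PostgresBookStore would satisfy it too."""

    def put(self, book: Book) -> None: ...
    def get(self, isbn: str) -> Optional[Book]: ...


class FakeBookStore:
    """Behaves like a BookStore modulo persistence: a thin wrapper around a dict."""

    def __init__(self) -> None:
        self._books = {}

    def put(self, book: Book) -> None:
        self._books[book.isbn] = book

    def get(self, isbn: str) -> Optional[Book]:
        return self._books.get(isbn)


def test_store_roundtrip() -> None:
    store = FakeBookStore()
    store.put(Book("978-0134757599", "Refactoring"))
    assert store.get("978-0134757599").title == "Refactoring"
    assert store.get("missing") is None
```

Because the fake obeys the same contract as the real store, every test that uses it exercises the app's actual logic instead of restating the test author's expectations.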
Part of the reason why I like FP so much, is that you DON'T need to mock anything most of the time. Just call the damn function and compare the expected with the actual output.
Useful if you have a reference implementation that is obviously correct but slow, and a fast one where it may not be correct.
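That reference-implementation pattern might look something like this sketch (the functions are invented for illustration): a slow version whose correctness is obvious, checked against a fast version whose correctness isn't, with no mocks anywhere.

```python
def sum_of_squares_reference(n: int) -> int:
    # Obviously correct, but O(n).
    return sum(i * i for i in range(1, n + 1))


def sum_of_squares_fast(n: int) -> int:
    # Closed-form version: correct, but less obviously so.
    return n * (n + 1) * (2 * n + 1) // 6


def test_fast_matches_reference() -> None:
    # Pure functions: just compare outputs over a range of inputs.
    for n in range(200):
        assert sum_of_squares_fast(n) == sum_of_squares_reference(n)
```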
> Do whatever that will increase confidence in code and testability.
Heh. I've long felt a need to write a blog post touching this very idea, but have been too lazy to.
I think a lot of people (including younger me) would dive right in to things like unit tests without too much thought. Now I focus on:
- What should we test? (Think about the feature you are adding and why - what aspects need to work?)
- Why? should we test it?
- Should we test it? (really the same as above). Common to see junior folks end up testing 3rd party libraries.
- How should we test it? Mocks? Unit tests? Integration tests? Something else?
- Is it even testable in an automated fashion? If not, ponder over what makes it hard to test. Does that need to be rectified?
- How can we be sure we're testing what we think we are? Extremely common to see people write tests that appear to pass, but are testing the wrong thing (buggy test). It's very rare that I see a colleague intentionally introduce a bug in the feature to see if the expected test will fail.
- What is the cost of this feature having a bug? If it's very costly, you may want to have multiple types of tests, and possibly even a human testing it each release. If the cost is almost zero, and you won't need to fix it quickly if a customer reports it, and it's hard to write an automated test for it - just skip it. If this mentality bothers you, know that this is recommended by the ISO standard for SW in cars - even if a bug can end up in loss of life, if you can show it's extremely unlikely, you are conformant if you don't test it.
You should first answer these questions, and they will then guide you into whether to write unit tests, or E2E tests, or something in between. I generally find that people who dive into testing without answering the above write crappy tests (buggy, pointless, etc). Answering these can lead to better mocks (or a decision not to rely on mocks).
In summary, step back and forget everything you've learned about testing before answering these questions. Pretend you don't know the difference between unit tests, integration tests, etc. You've written (or plan to write) a feature.[1] What would you like to test in an automated fashion, and how can you do it? Feel free to use any tool that works for your project.
[1] You're not writing code. You're writing a feature.
I'm curious, as I have never asked the question before: what is the canonical "unit test" [method|practice|technique|whatever] that one should know?
Is there a place where tests/unit-test-methodologies are discussed, or codified?
Ultimately the two goals are that changes which don't break the product should pass all the tests, and changes which do break the product should fail at least one test. If you remember those two goals, you will go far.
All rules should be in service of the above. I don't know that there is a "most people agree with this" set of rules, but if you read any source on how to test, think about how each rule affects the two goals. Senior developers tend to have developed an instinct for "things like this have failed one of the two rules" but it is hard to codify that instinct into rules.
As soon as you say "canonical", you invite dogma from those with rigid opinions. It is better to integrate advice you hear through deliberate practice. If you do not see, from your own experience, why it makes sense, then discount it. But keep an open mind to changing your opinion later. An opinion you hold because you understand why you believe it is a million times better than regurgitating slogans you don't.
The PRACTICAL rule is always have unit tests. Start with writing tests for bugs you actually encountered. Use whatever testing framework is popular in your current language, particularly if it ships with the language. Always try to test things in the most straightforward way possible. Add automation to run all tests regularly.
BUT keep in mind, test code is code. You're now writing code twice, once for the code, once for the test. Either or both can be wrong. In the future, both may have to change. Tests improve reliability, but are not free.
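For instance, the "start with writing tests for bugs you actually encountered" advice might look like this hypothetical pytest-style regression test (the `slugify` function and the bug it pins down are invented for illustration):

```python
def slugify(title: str) -> str:
    # The fix: split() collapses runs of whitespace, so joining with "-"
    # can no longer produce consecutive dashes.
    return "-".join(title.lower().split())


def test_consecutive_spaces_do_not_create_double_dashes() -> None:
    # Regression test for the (invented) bug report: "Hello   World"
    # used to become "hello---world" in URLs.
    assert slugify("Hello   World") == "hello-world"
    assert "--" not in slugify("a  b   c")
```

The test names the bug, so the next person who breaks it learns exactly what behavior they regressed.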
Thank you for the answer (albeit a condescending one)...
---
What are the PRACTICAL learnings of CREATING tests?
Meaning, how does one succeed at learning to employ the sentiment above, such that one does well from ground zero?
(Integrate tests into your design, sure, but what does one need to know in order to do that?) [ELINSW: Explain like I'm a newbie software engineer]
----
I understand the philos... I am talking about: IS THERE A LEARNING CHANNEL TO BECOME AN EXPERT AT DEVELOPING UNIT TESTS, or is it something that is literally learned on the job?
One of the reasons I wrote the linked article is that the wisdom around is quite fragmented and one (there’s always more than one) frustrating answer is good design makes software testable: “the deep synergy between testability and good design”: https://m.youtube.com/watch?v=4cVZvoFGJTU
Writing unit tests for software that is testable is easy. The hard part is to write testable software. But then there’s people who will argue that making software testable often leads to “test damage”.
Part of the reason why it is so fragmented is that testing was pushed heavily by people who were all for more testing, while a lot of practical knowledge was locked up in the heads of people who spent more time being pragmatic and less being dogmatic. To the extent that pragmatism didn't support testing dogma, therefore, it got left in isolated pieces and didn't become part of the dialog.
If you listen to Kent Beck, Ron Jeffries, Martin Fowler and so on, you could be pardoned for the impression that unit testing is an idea invented by Kent Beck in 1997 with JUnit, that has taken the world by storm since.
That's complete and utter BS. Perl 5.0, released in 1994, came with Test.pm, which made it easy to write tests. Better yet, when you installed from CPAN it would BY DEFAULT run the unit test suite to be sure that your environment worked as expected. (This is a critical idea that I wish had been adopted by other repositories...)
This was Larry Wall encouraging people to follow the practices that he did. Perl 1.0, released in 1987, came with a unit test suite that ran by default before you installed it. In fact he knew the technique because he'd adopted it with his best known previous program, rn. And he knew to do that because the practice was already accepted.
Of course the idea was not original to him. See http://secretsofconsulting.blogspot.com/2008/12/how-we-used-... for the reminiscences of someone who was doing unit testing back in the punchcard era. He should know. He was Manager of Operating Systems Development for Project Mercury (the first NASA program to put humans in space). And THEY were doing unit testing back in the late 1950s, early 1960s.
But, despite how widely known all of this was, Kent Beck reinvented it. And a dogma quickly emerged about this "new idea". Which included some good ideas and some terrible ones.
Extensive use of mocking I would consider a terrible idea. That it was terrible was obvious to me the first time I saw it. Replacing an external dependency with a stubbed-out implementation is often an excellent idea. But mocking is a horrible way to write that implementation. I'd far, far rather deal with tests that create a SQLite database and test against that, than something that does extensive mocking of what they think a database should do.
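A minimal sketch of the SQLite approach, using only the Python standard library (the table and query are invented for illustration):

```python
import sqlite3


def count_active_users(conn: sqlite3.Connection) -> int:
    # The query under test runs against a real SQL engine, not a
    # hand-rolled guess at what a database "should" return.
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE active = 1"
    ).fetchone()[0]


def test_count_active_users() -> None:
    # An in-memory database is created fresh per test: fast, isolated,
    # and no mock expectations to keep in sync with the schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, active INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", 1), ("bob", 0), ("carol", 1)],
    )
    assert count_active_users(conn) == 2
```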
That said, I would never trust a learning channel like the one you describe. I've learned the hard way that people who consider themselves experts on unit testing tend to believe that more tests are always better, and have no idea of the impact of excessive unit tests on a codebase. So a channel like the one you describe is going to attract people whose opinions I'm suspicious of.
Instead aim to be expert at developing software. Unit testing will prove to be an important component. But nothing about unit testing makes sense when separated from the development that you are attempting to accomplish.
I've seen plenty of scenarios where a test gave a false sense of security: it looks like it's testing something important, but it doesn't actually catch defects. This is a case where no tests would be better, since at least then everyone knows something's untested :/
There's a pretty significant problem here: with this approach, you lose test coverage of the code within your client wrapper. Namely, there's code like this which is not being tested: `.json()["repositories"]`
Now, one might argue, those calls are so trivial that it doesn't need coverage, right? But what if you were trying to get the results from a nested key as a list and also check for an error status etc. etc.? Or what if this is XML deserialization with certain assumptions baked into the deserialization code? You could easily add complexity that you think your tests are covering, but they aren't.
I prefer instead to use https://vcrpy.readthedocs.io/en/latest/ which helps you create a fixture of what you don't own, and hooks every possible requests/http/urllib system to ensure you're not leaking a hit to a production service on any but your first test run. Tests don't even need to worry about mocks in most cases; they just need to be wrapped in a simple annotation.
In a way, it's precisely the opposite of the OP's conclusion: mock what you don't own, at the lowest possible level to ensure all the code you wrote is covered. But use the right tool for the job!
I was going to make the same observation, and suggest the same solution, though I’d advocate for doing both things. You definitely do need at least one test getting full coverage for the facade client simply because someone could accidentally insert white space that completely breaks the logic.
Carving out a facade you own makes it really easy for all of the rest of your system to behave predictably in tests; for example what if you call an analytics API on every action? Probably easier to have UTs by default assemble your app with a MockAnalyticsClient with sensible noop mocks instead of actually wiring up the real mocks required for the library client.
I’d call VCR “integration-ish” tests. I argue it’s ok to check in a blob of API response data because (ideally) your remote API doesn’t change that often.
Ideally you do want to be able to run the integration tests periodically (nightly?) without the recorded response to make sure your assumptions about the remote api haven’t broken. That last bit is sometimes tricky to wire up.
It’s also worth noting that VCR has some nice bells and whistles to do substitution (to elide secrets) and can also store the test data in say a storage bucket, if you don’t want to set up git-LFS or put the response data in your repo. (SOAP gets expensive as you may need to store the whole WSDL for example.)
The key thing IMO is keeping the VCR out of all of your business logic tests, and confined just to integration tests targeted at the facade class. Biz logic tests should be as clean as possible and simply mocking the facade helps achieve that.
After reading further in the linked docs, I see that “verified fakes” is another way of getting coverage of the facade class.
The advantage is you never have to check in api results. The disadvantage is you can’t run them on every commit, so you learn about breaks to the facade class later. This might be a good trade-off for small clients that don’t change often. It might be a bad trade off for fairly thick/logic-heavy remote-API clients that change often or are under active development. (I’ve worked with one payments/forex API that would meet this description).
Since the example given in the article is an integration test, it does not prevent using an HTTP replay server to simulate making real calls to the service within end-to-end tests.
I think the author even linked to some of his favorite examples for using an HTTP test server to replay responses to requests.
We started employing this pattern in our code at work recently. Using the Facade pattern to create a simpler interface around a couple of things:
- Feature Flag service, wrapping LaunchDarkly
- Remote Storage service, wrapping S3
The application code uses the simplified services, with little knowledge of the underlying providers (e.g. S3 or LaunchDarkly). The tests use a test implementation of the simplified services.
This pattern simplifies both the application code and test code and is an all-around win.
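A rough sketch of what such a facade might look like (the names and the discount logic are invented; the real LaunchDarkly-backed implementation is omitted):

```python
from typing import Optional, Protocol, Set


class FeatureFlags(Protocol):
    """Our own tiny interface; a LaunchDarkly-backed class would satisfy it too."""

    def is_enabled(self, flag: str) -> bool: ...


class InMemoryFlags:
    """Test implementation: enabled flags are just a set of names."""

    def __init__(self, enabled: Optional[Set[str]] = None) -> None:
        self._enabled = enabled or set()

    def is_enabled(self, flag: str) -> bool:
        return flag in self._enabled


def checkout_total(price_cents: int, flags: FeatureFlags) -> int:
    # Application code only ever sees the simplified facade,
    # never the vendor SDK.
    if flags.is_enabled("holiday-discount"):
        return price_cents - price_cents // 10  # 10% off
    return price_cents


def test_discount_flag() -> None:
    assert checkout_total(1000, InMemoryFlags({"holiday-discount"})) == 900
    assert checkout_total(1000, InMemoryFlags()) == 1000
```

Swapping in `InMemoryFlags` means tests need no network, no SDK setup, and no per-test mock wiring.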
It's been a game changer personally, and made me excited about software architecture and coding.
Testing generally can be either a great or painful experience. The more mocking you have to do (or setup for your testing environment) the more painful it is.
I try and avoid mocking, or mock as low as possible. Ideally mocking only happens at the HTTP level for network requests and APIs. Having to mock internal functions (that don't make network requests) creates a frustrating experience.
I also find using typescript a great way of writing little unit tests. Making sure types line up, especially when types are highly relational, is exceptionally valuable.
Hey, I’m the author and happy to engage in constructive exchange with everyone who read the article and has thoughts or questions
(NB the full title is ‘“Don’t Mock What You Don’t Own” in 5 Minutes’; the principle is super old and this article is just an explanation of why it makes more sense than it sounds.)
This rule is also recommended by section 9.2.4. "Only mock types that you own" of 2020 book by Vladimir Khorikov: "Unit Testing Principles, Practices, and Patterns".
Worth noting is that book also says in 9.2.1: "Mocks are for integration tests only". I agree.
In my opinion this book is the best comprehensive take on testing I ever encountered.
As opposed to what, unit tests? How are you supposed to "unit test" that you properly increment a variable before inserting into the DB without mocking the DB? I disagree: mocks are for unit tests only. Mocks allow you to isolate the unit to be tested.
integration tests is when you start integrating dependencies together. so maybe you have a dummy database like an in-memory db, but you're certainly not mocking it.
system tests is where you would use the actual db.
> How are you supposed to "unit test" that you properly increment a variable before inserting into the DB without mocking the DB?
You put the logic that increments that variable into a pure function and unit test the input/output pair. Because it is now decoupled from the database, you don't have to deal with it.
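A sketch of that refactoring, with invented names: the increment rule becomes a pure function, and the database touch lives in a thin shell that integration tests would cover instead.

```python
def next_revision(current: int) -> int:
    # The "increment before insert" rule, now a pure function that a
    # plain unit test can cover with no database in sight.
    return current + 1


def save_document(db, doc_id: str, body: str, current_revision: int) -> None:
    # Thin imperative shell: no branching logic lives here, so it is
    # left to integration tests against a real database. The db object
    # and its insert() method are hypothetical.
    db.insert(doc_id, body, revision=next_revision(current_revision))


def test_next_revision() -> None:
    assert next_revision(0) == 1
    assert next_revision(41) == 42
```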
> mocks allow you to isolate the unit to be tested.
You achieve isolation instead by refactoring to a "pure functional core and mutable outer-shell" architecture. Above I gave one example of refactoring to functional core. In the cases where you still need to deal with external dependencies in unit tests, you use in-memory implementations (aka simulators) instead of mocks.
Basically there are two subtypes of unit tests - small focused unit tests, testing input/output pairs of purely functional code, and "bigger" unit tests, that check how bigger units of internal business logic collaborate together. They use the in-memory simulators to ensure the tests maintain all the properties of a good unit test: runs fast, doesn't interfere with other tests, requires zero setup, is not flaky and makes zero assumptions about the environment.
So, overall - no mocks for unit testing.
> mocks are for unit tests only.
I do think the opposite is true, with very few exceptions. Mocking in unit testing is an anti-pattern leading to brittle tests with negative value: the cost of maintaining them is way higher than any benefit they provide. Too many false positives, too few true positives, too much rework needed when code changed but no bug was introduced, too much unreadable code.
> integration tests is when you start integrating dependencies together. so maybe you have a dummy database like an in-memory db, but you're certainly not mocking it.
This is the only case mocking makes sense. In integration testing you want to test integration with one external endpoint. Hence you mock or simulate all the others. In unit testing there is no need for mocking as explained above. In system (end-to-end) testing there is no need for mocking because we test how everything integrates together.
FWIW I think the database is an integral part of one's service, and should often be included in unit tests (depending on the size of the unit of course).
E.g. if you have CRUDObject and want to test GET and PUT of it - and that is the unit you want to test - mocking away the DB is (IMO) not as good as including a real database with no data in it (they come in Docker images, and making a fresh DB namespace within a running SQL server for a single test can be cheap enough).
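As a sketch of the "fresh, empty database per test" idea - using stdlib sqlite3 here in place of the Docker-hosted SQL server the comment describes, and invented table/function names:

```python
import sqlite3

def fresh_db():
    # A brand-new empty database per test; with a real server you'd
    # create a throwaway schema/namespace instead (sqlite stands in here).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, body TEXT)")
    return db

def put_object(db, obj_id, body):
    db.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)", (obj_id, body))

def get_object(db, obj_id):
    row = db.execute(
        "SELECT body FROM objects WHERE id = ?", (obj_id,)
    ).fetchone()
    return row[0] if row else None

db = fresh_db()
put_object(db, 1, "hello")
assert get_object(db, 1) == "hello"
assert get_object(db, 2) is None
```

The point is that the test exercises real SQL against a real (if tiny) engine, instead of asserting that some mock's `insert` method was called.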
This seems like it has its place, but there is very high value in a unit test that can run with no dependencies outside of your compiler/interpreter. Especially in interpreted languages - to catch whatever the linters may miss.
The moment you need to access the network, you depend on everything that comes with the network (outages can happen);
Need a DB -- well then you need to start it, prime it; maybe you need a container/etc.
That's just a ton of overhead to validate that your steps are working, and that your code is clean.
Maybe for a small project this works, if you only have one layer of testing.
>We need three layers of mocks to verify that an empty repositories key leads to an empty dictionary. And if I didn’t use a lambda for the json function, it would be even four layers.
Yep, but those four or even five layers ought to be mostly reusable between one codebase that mocks an HTTP call and another - once they're abstracted away, this line of reasoning falls apart, because then it's simple.
I think the "what to mock?" argument suffers from an overabundance of dogma and a lack of trade-off based decision making. This article is one of many examples of this.
"Only mock what you own, because mocking others is rude."
I once referred to a codebase as "foreign" - meaning (to my knowledge) not originally developed by my team or division - when justifying not writing new unit test coverage for existing code without discussion. Only much later did I realize that the owners were actually abroad and may have been offended by that statement.
Heheh, in retrospect that does sound like a plausible thing someone might say, "foreign code" meaning "that code written by foreigners". I wonder if there's been studies about cultural influences on code, like how Japanese programmers write code, compared to other countries, etc.
Even within the U.S., there was the whole thing about the MIT approach (do the right thing) and the "New Jersey" style (worse is better).
"What do you expect from an operating system designed and implemented in New Jersey!"
> For example if an object already does have an idiomatic API, it’s probably not worth wrapping in an identical façade, just so it belongs to you.
Ooooh my gwahh... your third-party library had better have an idiomatic API - otherwise search for a better one, and please don't expect your average developer to come up with one.
I have seen this too often: countless unnecessary wrappers added around fine APIs because we need to "own it more", test it better, or replace it in some very eventual future... and what you get is almost always a less idiomatic API that transforms results badly (like swallowing errors), often requiring me to learn two APIs - a badly documented but owned one (or just read the source code!!!) and the real one.
Sure, I hear what the author says... but put this into practice with the average dev and it turns out differently in reality :(
What the blog completely passes over is that now you'd need to write tests for the DockerRegistryClient... which would be 90% of what he complains about in the first version!
I mean sure, if you break down a problem into three layers instead of one, then the top layer testing will be three times cleaner... but you'll write three times as much code and three times as many tests, one for each layer.
The end result might be better-layered code, but it takes time to write and time to test. And there's a not-so-little downside of "clean" code layers that is often never discussed: you now need to read and understand three layers of code to know what is really going on, instead of one.
It is worth it when your layers are getting large and heavy, but for three-lines-of-code thing like this... is it really worth it?
No, it doesn’t pass over that. It specifically points out that it’s a simplistic example for illustration that probably wouldn’t be worth it. And yet the effects are visible and so are the problems/solutions.
I mean what do you want here? 1,000 LoC in a blog post so it feels worth it?
If we're talking about mocking in general, mocking foreign services might be very useful and maybe even essential for good development practices. For example, imagine that you're integrating with some other system. You might receive some test credentials during development, but they'll get lost eventually or something will break. So you'll have a production integration but no way to properly develop your code. That's why I always write an emulator for foreign services. It might be as simple as canned responses, or it might contain its own little database and some primitive logic. But in the end you don't depend on someone else's test system: you can run your integration tests, your staging environment, and so on.
That might not be necessary if you're integrating with big players like Stripe or Facebook - they probably have enough resources to maintain test environments. But for some little company that doesn't really care about test environments, or whether they correspond to production, that kind of mocking might be necessary for sanity.
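A toy version of such an emulator - the payment API, token, and response shapes here are entirely invented:

```python
class PaymentEmulator:
    """Toy emulator for a third-party payment API: just enough state
    and logic to develop against when test credentials stop working."""
    def __init__(self):
        self._charges = {}   # the emulator's own little database
        self._next_id = 1

    def charge(self, amount_cents, token):
        if token != "tok_test":          # mimic the provider's auth failure
            return {"error": "invalid_token"}
        charge_id = f"ch_{self._next_id}"
        self._next_id += 1
        self._charges[charge_id] = amount_cents
        return {"id": charge_id, "amount": amount_cents, "status": "succeeded"}

    def get_charge(self, charge_id):
        amount = self._charges.get(charge_id)
        return None if amount is None else {"id": charge_id, "amount": amount}

api = PaymentEmulator()
resp = api.charge(500, "tok_test")
assert resp["status"] == "succeeded"
assert api.get_charge(resp["id"])["amount"] == 500
assert api.charge(500, "bad")["error"] == "invalid_token"
```

Because the emulator keeps state, the same instance can back an entire integration test run or a local staging environment.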
I prefer to mock below the level of the 3rd party client library. In rubyland we use VCR to stub out the lower level network libraries to record the requests/responses. Then during the test runs your network libraries don't make real connections to the APIs - they just play back the recorded responses.
Python has ports of VCR (betamax and VCR.py).
The benefits of this are numerous. You don't need to manually build and verify mocks. You don't need to build a facade for every API client (the API client is already a facade). You exercise as much of the 3rd-party library code as possible on each test - this is very helpful during library upgrades. And you can verify your recordings by switching your tests to "live" mode every now and then (just configure your CI env to do that every day or every week).
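The record/replay idea can be sketched in a few lines - a toy transport to illustrate the pattern, not VCR's actual API:

```python
class Cassette:
    """Toy record/replay transport. In live mode it calls the real
    fetcher and records responses; afterwards it only plays them back."""
    def __init__(self, recordings=None):
        self.recordings = recordings or {}

    def fetch(self, url, real_fetcher=None):
        if url in self.recordings:
            return self.recordings[url]          # replay: no network
        if real_fetcher is None:
            raise RuntimeError(f"no recording for {url} and live mode is off")
        response = real_fetcher(url)             # record once
        self.recordings[url] = response
        return response

# First run ("live" mode) records; later runs replay without the fetcher.
live = lambda url: {"status": 200, "body": "hello"}
tape = Cassette()
assert tape.fetch("https://api.example.com/x", live) == {"status": 200, "body": "hello"}
assert tape.fetch("https://api.example.com/x") == {"status": 200, "body": "hello"}
```

Real VCR does this at the HTTP-library level and persists recordings to cassette files, so test code doesn't change at all between modes.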
> You don't need to build a facade for every api client (the api client is already a facade).
I feel like this is covered by
> Every rule and principle can be broken once you’ve fully understood its purpose. For example if an object already does have an idiomatic API, it’s probably not worth wrapping in an identical façade, just so it belongs to you.
?
I personally prefer pytest-httpserver, which is a bit more low-level, but I still prefer idiomatic calls over HTTP requests sprinkled through business code.
Yes - it is covered by that. I do build facades for apis that don't have good client libraries, but for things like s3, or stripe - with solid libraries I don't.
Even when I build those facades I don't mock them out - I just rely on VCR.
I've never used a test server like pytest-httpserver. I don't think that would buy much vs stubbing your low-level network libraries like VCR does. The killer feature of VCR is that it records the real network calls on first use. If pytest-httpserver can do that in a proxy mode, it seems like a good way to go though.
I can imagine an alternative is "don't mock, period. Write functional code with tests, so the hack of mocking vanishes as a consequence of composition of functional code." but that's an earful.
And even if not, your CI tests take hours to run because of all the network calls.
Automated tests that do real calls to 3rd party services are essential too, but they should almost never be part of your standard CI scripts that get triggered by every push.
But I think the article was mostly talking about mixing network/http level mocking with business logic tests, which I'd admit isn't always ideal, but IME complex test cases often require some level of compromise.
The more I think about this, the more I like it. It cleanly splits the problem into two testable questions of "am I using this API correctly?" and "is my own logic correct?", without adding tedious complexity.
I'm generally a fan of simple wrappers for APIs when you find yourself writing a bunch of redundant code, and testability is a nice advantage.
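One way to picture that split, with hypothetical names: the wrapper owns "am I using this API correctly?" (verified separately by integration tests), so the logic tests only need a trivial fake.

```python
class RegistryFacade:
    """Owns the 'am I calling the API correctly?' question.
    Verified with integration tests against the real service."""
    def __init__(self, http_get):
        self._get = http_get

    def list_tags(self, repo):
        return self._get(f"/v2/{repo}/tags/list")["tags"]

def newest_tag(registry, repo):
    # 'Is my own logic correct?' - testable with a one-line fake.
    tags = registry.list_tags(repo)
    return max(tags) if tags else None

fake = RegistryFacade(lambda path: {"tags": ["1.0", "1.2", "1.1"]})
assert newest_tag(fake, "myimage") == "1.2"
assert newest_tag(RegistryFacade(lambda p: {"tags": []}), "empty") is None
```

Neither test layer needs a mocking framework: the facade test talks to the real thing, and the logic test injects plain data.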
I think mocking external services sometimes makes sense. As the author mentions, maybe you're adding a regression test for a fix made when a dependent service was misbehaving (500 errors, really really slow responses, excessive rate limiting), and sometimes setting up such a scenario requires changing very specific pieces.
So the supposed lesson is that generic third party APIs are hard to mock, so you need to write helper methods everywhere so you can mock those instead?
I know you’re being facetious but: the lesson is rather that mocking generic third-party APIs is so easy, that you can end up in mocking hell real fast. And that you should take the harder route for your future sanity.
How is writing tests scalable? How do you decide what to test? Three parameters, maybe three equivalence classes per parameter (-1, 0, +1)... you now have 3^3 = 27 test cases just for one function.
Tests do not have to guarantee that the behavior is correct for all possible inputs. If testing all possible inputs is easy, do it. Otherwise, you write the tests which you think are most likely to catch errors. Try to exercise different code paths to catch errors in different parts of the code. Try to use different logic in your test from the logic in the code under test.
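For example, when the input space really is that small, exhausting it against an independent oracle is cheap (the function and oracle below are illustrative):

```python
from itertools import product

def clamp(x, lo, hi):
    """Function under test."""
    return max(lo, min(x, hi))

# Three values per parameter is only 27 cases - small enough to just
# test exhaustively, using different logic than the code under test.
for x, lo, hi in product((-1, 0, 1), repeat=3):
    if lo <= hi:
        expected = sorted((x, lo, hi))[1]   # oracle: median of the three
        assert clamp(x, lo, hi) == expected
```

When the space isn't exhaustible, the same shape still works: pick representatives from each equivalence class and boundary instead of the full product.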
I wholeheartedly disagree with the author here. Not mocking 3rd-party dependencies is a bad practice; it's how you wind up with fragile and flaky tests that can't always be run locally.
Not mocking 3rd party is done in integration or orchestration testing. Not your unit tests.
So you haven’t made it past the second heading and already felt compelled to “disagree wholeheartedly”? The article is _precisely_ about the knee-jerk reaction you’re exhibiting, and it tries to explain why that decades-old principle makes more sense than it seems.
Please search for “If you’re anything like me, that makes no sense! What else am I supposed to mock?” and read with an open mind.
Assuming you care about clarity and not just style, there might be some feedback to take in when people don't get the gist of what you're saying until after the second heading.
Even if you want to make the writing a journey that arrives at a conclusion rather than a thesis-first argument, the presentation immediately being scattered across a bunch of embedded block quotes and long asides (your disclaimer) invites readers to lose the path, give up, and just make assumptions about the remainder.
And what always happens on the internet when people do that? Comments that miss the point and confuse everybody else about what you were saying. You're right that the other commenter could have been more responsible, but odds are that someone was going to post a comment like that even if it didn't happen to be them.
I put the disclaimer there because I was afraid people would stop reading and start arguing about mocks vs fakes vs etc. But you’re right: people skip the lede (it mentions “counter-intuitive”; I hoped that would set the mood) and lose patience, so I’ll try swapping the blocks and see if the comments improve.
There are three related terms: mocking, faking, and stubbing. They mean subtly different things, which the article points out. The author prefers the last one, and uses the middle one for complex cases, avoiding mocks altogether. But that doesn't mean he lets third-party networking libraries run wild in unit tests. In fact, the very example he builds in the article shows how to avoid that without having to create a mock. Maybe you need to read it again?
And his verified fakes are just a way to introduce bugs, because taken to the extreme you'd have to 100% recreate the 3rd-party dependency you are interacting with.
It depends on your language and testing framework. In C++-land, for instance, you can't mock a third party library if they don't define virtual methods, forcing you to make a wrapper (an experience with this is what taught me this mantra)
That's a fair point. I guess it comes in a bit differently in C++ land, because the "you shouldn't" comes into what you decide to allow to be injectable (e.g. if your constructor only accepts the third party object type, you're not gonna be able to mock it). As such, you gotta keep the principle in mind when designing your objects. That said, you're definitely right, it smells like that fallacy
I think that applies to other languages, too. There doesn't tend to be much (any?) magic mocking in Go but more "real" local HTTP servers to stub responses and duck typed fake implementations
On the other hand, in Ruby and Python it's pretty trivial to monkey patch and magic mock
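For instance, in Python the stdlib's `unittest.mock` can patch a call site with no seam prepared in advance (the function under test here is made up):

```python
from unittest import mock
import urllib.request

def fetch_status(url):
    # Code under test calls straight into the stdlib - no interface,
    # no injected dependency, nothing designed for testability.
    with urllib.request.urlopen(url) as resp:
        return resp.status

# Patching that call site takes one line - which is exactly why it's
# so easy to end up in mocking hell in dynamic languages.
with mock.patch("urllib.request.urlopen") as fake_urlopen:
    fake_urlopen.return_value.__enter__.return_value.status = 200
    assert fetch_status("https://example.com") == 200
```

That ease cuts both ways: nothing in the language nudges you toward the seams you'd be forced to design in Go or C++.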
Go is a good example, too. Unlike C++, though, Go makes the distinction explicit: something with dynamic dispatch is an interface, whereas in C++ it's a class with virtual methods (which may not all be virtual). That said, in your example, yeah: if you just pass a stock http.Client around, you lose any ability to mock it short of spinning up a dummy webserver.
If you are mocking the library in its entirety, you just define the same classes/functions that they do - virtual or not - and then link to your implementation instead of theirs for the test build.
Heh, that's an interesting strategy. My company primarily works with gmock, and I'm not certain you could do what you describe with gmock, but yeah that would work otherwise.
I think the sane half-way point is to hide external integrations behind a facade that you can mock for unit tests and then you verify those assumptions with proper integration tests.
Yeah, and you can still run a full integration test of the real implementation against the real backend (send a request to a web API) if you want, but limit those runs to only when the facade changes.