WIP - cutting this content and trimming to _just_ the pre-prod testing part

prod-fidelity
Jack Jackson 5 months ago
parent 5fd56d2117
commit ac3fba3b07
  1. blog/content/posts/ci-cd-cd, oh my.md (1)
  2. blog/content/posts/pre-pipeline-verification-and-the-push-and-pray-problem.md (63)
  3. blog/static/img/Promotion-Testing.drawio.png (BIN)

@@ -5,6 +5,7 @@ tags:
- homelab
- CI/CD
- meta
- SDLC
---
Since leaving Amazon ~4 months ago and dedicating more time to my own personal projects (and actually trying to ship things instead of getting distracted a few days in by the next shiny project!), I've learned a lot more about the Open Source tools that are available to software engineers; which, in turn, has highlighted a few areas of ignorance about CI/CD Pipelines. Emulating [Julia Evans](https://jvns.ca/), I'm writing this blog both to help lead others who might have similar questions, and to [rubber-duck](https://en.wikipedia.org/wiki/Rubber_duck_debugging) my own process of answering the questions.

@@ -0,0 +1,63 @@
---
title: "Pre-Pipeline Verification, and the Push-And-Pray Problem"
date: 2023-11-17T19:49:06-08:00
draft: true
tags:
- CI/CD
- SDLC
---
It's fairly uncontroversial that, for a good service-deployment pipeline, there should be:
* at least one pre-production stage
* automated tests running on that stage
* a promotion blocker if those tests fail
The purpose of this testing is clear: it asserts ("_verifies_") certain correctness properties of the service version being deployed, such that any version which lacks those properties - which "is incorrect" - should not be deployed to customers. This allows promotion to be automated, reducing human toil and freeing developers to focus their efforts on developing new features rather than on confirming the correctness of new deployments.
<!--more-->
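To make that gate concrete, here's a minimal sketch of what it might look like if expressed as a standalone script (in a real pipeline this would more likely live in pipeline configuration); the `PRE_PROD_URL`/`TARGET_BASE_URL` variable names and the `tests/deployed` path are invented for illustration:

```python
#!/usr/bin/env python3
"""Minimal sketch of a promotion gate: run the test suite against the
pre-production deployment, and exit non-zero (blocking promotion) if any
test fails. Variable names and paths are hypothetical."""
import os
import subprocess
import sys

# Hypothetical: the pipeline injects the pre-prod deployment's endpoint.
pre_prod_url = os.environ.get("PRE_PROD_URL", "https://pre-prod.example.internal")

# Run the tests against that endpoint; a non-zero exit code from the test
# runner propagates out of this script and blocks the promotion step.
result = subprocess.run(
    [sys.executable, "-m", "pytest", "tests/deployed"],
    env={**os.environ, "TARGET_BASE_URL": pre_prod_url},
)
sys.exit(result.returncode)
```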
There's plenty of interesting nuance in the design of in-pipeline testing stages - and although this article isn't _specifically_ about pipeline design, before moving on to the main premise I need to establish one related concept: that of Prod Fidelity.
## Definition of Prod Fidelity
We can think of the deployments[^twelve-factor-app] of an app as being characterized by selecting a set of values for some configuration variables - variables like "_fleet size_", "_which stages of dependency services are used_", and, most impactfully, "_what Load Balancer fronts this deployment (and is it one that serves traffic from paying, production customers)?_"[^deployed-image-should-not-be-gitops]. A deployment which perfectly mimics production _is_ production (and so is unsuitable for **pre-production** testing[^test-on-prod]); but, the more a deployment differs from production, the more likely it is to give misleading testing results. Some illustrative examples (with a configuration sketch after the list):
* Consider an overall system change C which is implemented by change A in service Alpha and by change B in service Beta, where Alpha depends on Beta. Assume that B is deployed to Beta's `pre-prod` stage, but not to Beta's `prod` stage. Consider a test (for behaviour implemented by C) which executes against a deployment of Alpha which a) has A deployed, and which b) hits Beta's `pre-prod` stage. This test will pass (the Alpha deployment has A, and the dependency deployment has B), but it would be incorrect to conclude from that passing test that "_it is safe to promote this version of Alpha to production_" - because Alpha's `prod` depends on Beta's `prod`, and the test made no assertion about whether B was deployed to Beta's `prod`. Thus, in general, the testing stage which makes the final "_is this version safe to promote to production?_" verification should depend on the production deployments of its dependencies.
![Diagram of Promotion Testing](/img/Promotion-Testing.drawio.png "In Situation 1, Alpha's Pre-Prod depends on Beta's Pre-Prod, so a test of functionality requiring System Change C will pass on Pre-Prod; but if Change A is promoted to Alpha's Prod (Situation 2), the behaviour will fail, because Change B is not on Beta's Prod. Conversely, if Alpha's Pre-Prod depends on Beta's Prod (as in Situation 3), then the same test on Alpha's Pre-Prod will **correctly** fail until B is promoted to Beta's Prod")
* Non-prod deployments which are solely intended for testing might disable or loosen authentication, load-shedding/throttling, or other "non-functional" aspects of the service. While this can be sensible and justified if it leads to simpler operations, it can lead to blind-spots in testing around those very same aspects.
* Load Testing results must be interpreted with caution where the configuration of the deployment _and that of its dependencies_ does not match the configuration of `prod`. Even assuming that a service can handle traffic that scales linearly in the compute-size of the service (a justifiable though often-incorrect assumption), scaling your `prod` by a factor of N compared with your load-testing deployment does not guarantee you can handle N-times the traffic if your dependencies are not similarly scaled!
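To illustrate the first example above a little more concretely, here's a sketch (wholly hypothetical - the names, endpoints, and fleet sizes are invented) of the configuration variables that might characterize Alpha's deployments; the interesting part is which stage of Beta each one depends on:

```python
# Hypothetical configuration variables characterizing three deployments of
# service Alpha. Names, endpoints, and fleet sizes are invented; the point is
# which stage of Beta each deployment depends on.
ALPHA_DEPLOYMENTS = {
    # Early testing stage: cheap, low Prod Fidelity, talks to Beta's pre-prod.
    # A test of System Change C can pass here even though Change B isn't on
    # Beta's prod yet - so a pass does NOT mean "safe to promote".
    "alpha-pre-prod-early": {
        "fleet_size": 1,
        "beta_endpoint": "https://beta.pre-prod.example.internal",
        "load_balancer": "internal-test-lb",
    },
    # Final verification stage: depends on Beta's *production* deployment, so
    # a passing test genuinely asserts "this version works against what it
    # will actually call after promotion".
    "alpha-pre-prod-final": {
        "fleet_size": 3,
        "beta_endpoint": "https://beta.prod.example.com",
        "load_balancer": "internal-test-lb",
    },
    # Production itself.
    "alpha-prod": {
        "fleet_size": 30,
        "beta_endpoint": "https://beta.prod.example.com",
        "load_balancer": "public-prod-lb",
    },
}
```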
You've probably already guessed, but, to be explicit - I define **Prod Fidelity** to mean "_the degree to which a deployment matches Prod's configuration_". This is not a universally objectively quantifiable value - I cannot tell you whether "_using the same AMIs as_ `prod`" is more or less impactful to Prod Fidelity for your service than "_having_ `DEBUG`_-level logging enabled_" - but, I suspect that _you_ have a decent idea of the relative importance of the particular variables for your service.
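If it helps to make that concrete, here's a toy sketch of ranking deployments by how much of prod's configuration they share - every variable name, value, and weight below is invented, and the only meaningful output is the ordering, not the score:

```python
# Toy sketch: Prod Fidelity compared (not absolutely quantified) by summing
# service-specific weights for every configuration variable whose value
# matches prod's. All names, values, and weights are invented.
PROD = {"beta_endpoint": "beta.prod", "load_balancer": "public-lb", "debug_logging": False}

DEPLOYMENTS = {
    "dev-box":        {"beta_endpoint": "beta.pre-prod", "load_balancer": "none",        "debug_logging": True},
    "pre-prod-early": {"beta_endpoint": "beta.pre-prod", "load_balancer": "internal-lb", "debug_logging": False},
    "pre-prod-final": {"beta_endpoint": "beta.prod",     "load_balancer": "internal-lb", "debug_logging": False},
}

# Which variables matter most is a judgement call only you can make for your
# own service - these weights are pure illustration.
WEIGHTS = {"beta_endpoint": 5, "load_balancer": 3, "debug_logging": 1}

def fidelity(deployment: dict) -> int:
    return sum(w for var, w in WEIGHTS.items() if deployment[var] == PROD[var])

for name in sorted(DEPLOYMENTS, key=lambda n: fidelity(DEPLOYMENTS[n])):
    print(name, fidelity(DEPLOYMENTS[name]))
# dev-box 0, pre-prod-early 1, pre-prod-final 6 - an ordering, not a metric.
```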
For the purposes of this article, it's not important to be able to give a number to Prod Fidelity - just to be able to compare it, to state that a given deployment has higher or lower Prod Fidelity than another. Generally speaking, as a software version progresses through the SDLC, it will be executed on deployments of increasing Prod Fidelity:
* Detecting logical errors (rather than errors in configuration, deployment, or infrastructure) _usually_ doesn't require high Prod Fidelity. High Prod Fidelity is generally more expensive - either in literal financial expense (running a deployment with the same volume of equally-powerful compute hardware as Prod costs more than running a small "_good-enough to run tests on_" fleet), or in operational complexity (a deployment which closely mimics Production in terms of functionality will require all the same functional maintenance - authentication providers, certificate management, and so on). _Ceteris paribus_, it's preferable if an error can be detected _before_ the change
TK....hmmm. Maybe I a) need to reconsider this point (is there really value in a pipeline beyond Alpha/Beta/Gamma/Load-Test/One-Box/Prod), and b) should just cut this out entirely. But preserve it - it's an interesting idea (and good writing, and especially a good diagram!), but maybe not necessary to _this_ post.
(consider load testing results, or tests which rely on incompletely-deployed behaviour in dependencies when the testing stages don't hit production dependencies). Given that tension, how closely should your testing stages mimic production? For stages which closely mimic production and which "_talk to_" production downstreams and datastores, how do you mark test traffic such that it doesn't distort those datasets or generate real financial transactions while still providing a high-fidelity test?
TK Prod Fidelity increases
## Definition of Deployed Testing
Categories of test are a fuzzy taxonomy - different developers will inevitably have different ideas of what differentiates a Component Test from an Integration Test, or an Acceptance Test from a Smoke Test, for instance - so, in the interests of clarity, I'm here using (coining?) the term "Deployed Test" to denote a test which can _only_ be meaningfully carried out when the service is deployed to hardware and an environment that resemble those on/in which it runs in production. These typically fall into two categories:
* Tests whose logic exercises the interaction of the service with other services - testing AuthN/AuthZ, network connectivity, API contracts, and so on.
* Tests that focus _on_ aspects of the deployed environment - service startup configuration, Dependency Injection, the provision of environment variables, nuances of the execution environment (e.g. Lambda's Cold Start behaviour), and so on.
Note that these tests don't have to _solely, specifically, or intentionally_ test characteristics of a prod-like environment to be Deployed Tests! Any test which _relies_ on them is a Deployed Test, even if that reliance is indirect. For instance, all Customer Journey Tests - which interact with a service "as if" a customer would, and which make a sequence of "real" calls to confirm that the end result is as-expected - are Deployed Tests (assuming they interact with an external database), even though the test author is thinking on a higher logical level than confirming database connectivity. The category of Deployed Tests is probably best understood by its negation - any test which uses mocked downstreams, and/or which can be simply executed from an IDE on a developer's workstation without any deployment framework, is most likely not a Deployed Test.
Note also that, by virtue of requiring a "full" deployment, Deployed Tests typically involve invoking the service via its externally-available API, rather than by directly invoking functions or methods as in Unit Tests.
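As a minimal sketch of what a Deployed Test might look like - the `TARGET_BASE_URL` variable, the `/orders` API, and the response shape are all hypothetical - here's a toy Customer Journey Test that only makes sense against a real deployment:

```python
# Toy Customer Journey Test: it invokes the service via its externally-available
# API (whatever deployment the pipeline points TARGET_BASE_URL at), so it
# implicitly relies on AuthN, networking, and the service's real datastore.
# The endpoint, environment variable, and payload shape are hypothetical.
import json
import os
import unittest
import urllib.request


class CreateOrderJourneyTest(unittest.TestCase):

    def setUp(self):
        # Injected by the pipeline stage running these tests (hypothetical name).
        self.base_url = os.environ["TARGET_BASE_URL"]

    def test_created_order_is_retrievable(self):
        # "Act as a customer would": create an order...
        create = urllib.request.Request(
            f"{self.base_url}/orders",
            data=json.dumps({"item": "widget", "quantity": 1}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(create) as response:
            order_id = json.load(response)["id"]

        # ...then confirm the end result is as-expected by fetching it back.
        with urllib.request.urlopen(f"{self.base_url}/orders/{order_id}") as response:
            self.assertEqual(json.load(response)["item"], "widget")


if __name__ == "__main__":
    unittest.main()
```

The same assertion written against an in-memory fake of the order store would exercise much of the same logic, but would no longer be a Deployed Test - which is exactly the negation described above.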
Typically, a change which proceeds through the SDLC will undergo testing of progressively higher Prod Fidelity as it approaches production.
On the spectrum of Prod Fidelity (see the footnote[^multiple-footnote-link] linked from the second paragraph), Deployed Testing falls more towards the high-fidelity end.
TK differentiate from Ephemeral Environments for acceptance
[^twelve-factor-app]: I'm here using the definition from the [Twelve-Factor App](https://12factor.net/), that a Deploy(ment) is "_a running instance of the app[...]typically a production site, and one or more staging sites._". Personally I don't _love_ this definition - the intuitive meaning of "Deployment" for me is "_the act of updating the executable binaries on a particular fleet/subset of execution hardware, to a newer version of those binaries_", and I'm generally loath to use a term whose term-of-art meaning significantly differs from (i.e. is not a sub/super-set of) the intuitive meaning unless there's clear value to doing so. In particular, I'm not aware of an alternative term for the process of "updating the binaries", leading to the confusing possible statement "_I'm making a deployment of version 3.2 to the_ `pre-prod` _deployment_". However, the Twelve-Factor definition appears to be widely-used, and my best alternative "_stage_" only really applies within a pipeline, so I'll attempt to use it in an unambiguous way[^not-environment].
[^not-environment]: "Environment" - perhaps the most overloaded term in all software engineering, even worse than "Map" - is not even in the running as an alternative.
[^deployed-image-should-not-be-gitops]: I remain convinced that "_what image is deployed to this deployment?_" is _not_ a configuration variable defining a deployment, but rather is an emergent runtime property of the deployment pipeline regarded as an operating software system; it should be considered as State, not Structure. See my [previous article]({{< ref "/posts/ci-cd-cd, oh my" >}}) for more exploration of this - though, since it's been over a year since I wrote that, and I've now had experience of using k8s/Argo professionally, I'm long-overdue for a follow-up (spoiler alert - I think I was right the first time ;) ).
[^test-on-prod]: Another interesting topic that this post doesn't touch on - should you test on Production? (TL;DR - yes, but carefully, and not solely :P )

Binary file not shown.
