TLDR: Feature flags are dangerous, and there are better ways of accomplishing what you’re trying to do.
Feature flags (AKA feature toggles) are frequently used in unhealthy ways far beyond the scope of their original design. They are highly flexible, but leveraging that flexibility comes with drastically increased complexity and risk:
- Lack of clarity on what code is actually running in production
- Code is hard to change due to added complexity
- “Land mines” of stale or dead code that is poorly understood
- Poor signaling within the code base of what has actual value
- Multiple code paths, making testing harder, and less likely we’ll do it well
- External source(s) of truth rather than explicitly traceable code
- Confusion around what toggles were changed when, why, and by whom
- Different behavior across environments with different feature flag values
- High risk of inconsistent behavior around local and distributed caches
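To make the "multiple code paths" point concrete, here is a minimal Python sketch. The flag names are hypothetical, and it assumes each flag is an independent boolean. It simply enumerates every on/off combination that could be live in some environment:

```python
from itertools import product


def flag_combinations(flag_names):
    """Return every possible on/off assignment for the given flags."""
    return [
        dict(zip(flag_names, values))
        for values in product([False, True], repeat=len(flag_names))
    ]


# Hypothetical flag names, for illustration only.
flags = ["new_checkout", "fast_search", "verbose_logging"]
combos = flag_combinations(flags)
print(len(combos))  # 8
```

Three flags already give eight configurations to reason about and test, and each additional flag doubles that number.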
Feature flags tend to lead to “bad behavior” such as:
- Testing in production rather than due diligence during development
- Creating monolithic releases rather than iterative development
- Leaving bad or outdated code around rather than cleaning it up right away
- Tolerating inefficient code rather than improving its performance
- Living with slow, unreliable deploy processes rather than fixing them
Unhealthy Patterns of Feature Flag Use
Feature flags are frequently abused as an “easy” means of accomplishing the following, all of which could be done in much better ways:
- Gradual Rollout
- Pre-launching (AKA Dark deployments)
- Kill Switches
- Code Sharing
Gradual Rollout
It’s tempting to gradually direct more and more traffic into a new feature due to low confidence in its stability. Doing so encourages development of large solutions rather than small, careful, iterative steps. Larger solutions are hard to change. When we inevitably find out that a large solution does not scale as expected, the cost to change is high, and sunk cost fallacy leads to poor decision making. We cling to bad implementations rather than throwing them out in favor of simpler/better designs we haven’t tried yet.
Instead of gradually throwing traffic at a feature until we hit full volume, gradually throw pieces of the feature at your full volume. Pick the smallest, simplest implementation of a feature. Spend time thinking about it carefully and discussing it with your peers. Implement it in development, review it carefully, test it thoroughly, and release it to production. If things go poorly, revert it! You haven’t invested much, so it’s easy to iterate on your design or toss it out and start over. Soon you’ll realize you got your whole feature out before you know it. It will seem easy. It will seem like you barely did any work at all. That’s a good thing and it should be celebrated!
Some projects are challenging, and you don’t always know what the iterations will be. That’s OK! Build them as prototypes, even if they’re massive, but DO NOT release your prototypes to production! Instead, once you’ve figured out the whole implementation, validate it by breaking it down into smaller releasable pieces. If you can’t do that, it’s a sign your implementation is bad. Go back to the drawing board, talk to your peers, and reach out for help! There are a lot of smart people out there. Don’t let the limited perspective of a few engineers lead to assuming there’s no better design!
Pre-launching (AKA Dark Deploys)
We often try to fully develop and pre-launch massive features before they are actually ready to be used in production. This is potentially a gross violation of YAGNI (“You Aren’t Gonna Need It”). By the time the business is ready for us to release a feature, chances are the requirements have changed. Hiding functionality behind disabled feature flags with the intention of turning them on later creates confusion about what’s actually live in production. It adds to cognitive load when trying to understand and maintain a code base. It increases the risk that the code in question will go stale in the face of constantly changing requirements. It improperly signals what’s important to our future selves and colleagues.
If a feature is not ready to go live in production, keep it on a branch rather than releasing it to production behind a feature flag. When you are ready to release the feature, don’t just merge your feature branch! Start by reloading all requirements into memory and revalidating them to make sure they still hold true. Break your release into the smallest possible steps. Test each step thoroughly in development. Release and monitor each step to production before moving on. Keep the latest business requirements in mind throughout the process! This ensures that all code on the main branch is actually used in production, eliminating confusion and reducing cognitive load. It also ensures that each piece of the feature supports the most up to date requirements.
We also tend to pre-launch or dark deploy large changes in which we have low confidence. See the “Gradual Rollout” section above for why this is bad and how to do things differently.
Kill Switches
We sometimes have “valuable” code that we cannot afford to run during periods of intense traffic, such as sales or peak holiday traffic. We wrap this code in feature toggles and turn it off in anticipation of intense server load, telling ourselves we’re reducing the risk of catastrophic failure. We tell ourselves “we need this!”, all while acknowledging that it does not scale. Kill switches create a false sense of security and a tolerance for inefficient code that should be improved rather than being allowed to risk taking down our systems. You cannot always predict heavy load.
If you need code that does not scale, it is a liability. One of the most common examples of the kill switch pattern is putting logging code behind a feature flag. We tell ourselves “this information helps us understand and improve our system, but we can’t afford to pay the price at scale when our system is under stress.” Having proper logging when your system is running fine doesn’t tell you much. You want that full visibility when everything goes to heck so you can figure out what went wrong. If your logging systems are capable of taking down production under stress, you’re doing it wrong. Rework your design so that it is more performant, or better yet, move the performance-heavy pieces of logging outside of your production system so that if logging crashes, your product doesn’t go down with it.
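As one possible first step in that direction, here is a minimal sketch using Python’s standard `logging.handlers.QueueHandler` and `QueueListener`. The helper `build_nonblocking_logger` and the `ListHandler` stand-in are hypothetical names for illustration. The idea is that the request path only enqueues a record, while the slow work (disk, network, shipping logs to another system) happens on a background thread:

```python
import logging
import logging.handlers
import queue


def build_nonblocking_logger(name, slow_handler):
    """Return (logger, listener). Log calls only enqueue a record;
    slow_handler runs on the listener's background thread."""
    log_queue = queue.Queue()
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from root handlers
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, slow_handler)
    listener.start()
    return logger, listener


class ListHandler(logging.Handler):
    """Stand-in for a slow sink (file, socket, log shipper)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger, listener = build_nonblocking_logger("checkout", handler)
logger.info("order placed")  # returns as soon as the record is queued
listener.stop()              # drains the queue and joins the thread
```

This only decouples logging from the request path within one process. Moving the heavy lifting fully outside the production system, as suggested above, would mean replacing the stand-in handler with one that hands records to a separate service.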
In other situations where you need to be able to flip or change code behavior quickly, you should be able to do so with a quick release. Having a fast, reliable continuous deployment pipeline allows you to do this. What’s more, proper automated testing in your pipeline means that your rapid change is automatically tested, rather than assuming everything will work after a switch is flipped in production.
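As a rough illustration, assume a hypothetical `shipping_cost` function whose behavior we might otherwise be tempted to put behind a toggle. Here the behavior lives in version-controlled code and a test pins it, so changing it is a commit, a pipeline run, and a deploy, not a switch flipped in production:

```python
# Hypothetical value; changed via commit + deploy, not a runtime flag.
FREE_SHIPPING_THRESHOLD = 50.0
FLAT_SHIPPING_FEE = 5.0


def shipping_cost(order_total):
    """Flat fee below the threshold, free at or above it."""
    if order_total >= FREE_SHIPPING_THRESHOLD:
        return 0.0
    return FLAT_SHIPPING_FEE


def test_shipping_cost():
    # Intended to run in the deployment pipeline before every release.
    assert shipping_cost(49.99) == 5.0
    assert shipping_cost(50.0) == 0.0


test_shipping_cost()
```

The history of the threshold is then just the history of the file: who changed it, when, and why are all answered by the commit log.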
Code Sharing
There are many times we want to share code with colleagues and collaborate without affecting what’s happening in production. This is often done by putting the code behind a feature flag and pushing it to the main branch. It happens especially in shops doing trunk-based development. We tell ourselves that we are socializing our code and increasing visibility into what we are doing, but we are actually adding to the confusion about what code is valuable and actually used in production. Code is either running in production and adding value, or it is not.
Git has a perfectly good mechanism for managing this problem: branching. Use it. There’s little harm if a branch goes stale in the face of changing priorities or requirements. In contrast, code that’s on the main branch behind a feature flag becomes a liability when it goes stale. We end up leaving landmines for our future selves and colleagues, just waiting to blow up in our faces if someone turns the flag on or takes the wrong lesson from what they’re seeing. We tend to assume (especially early in our careers) that if code is in the main branch there must be a reason for it, and it must have value. We risk drawing bad conclusions even though the code should never go live and was built on outdated assumptions. Keep prototype and work-in-progress code on branches! Trust in your ability to resolve conflicts if those branches become long-lived. Better yet, treat long-lived branches as prototype work and re-implement these features iteratively. See the “Gradual Rollout” section above for more reasoning on this.
It’s important to recognize that feature flags can be a divisive topic. There are always pros and cons to everything. The goal is to carefully consider each side of the argument, put it in our own context, and work as a team to move the needle in the right direction.
Before anyone feels the need to bring it up in comments, I see enormous value in feature flags for A/B feature comparison tests where you are trying to see which user experience in a UI converts better. That’s not what I’m tackling in this article.
The opinions expressed in this article are my own after 15 years of engineering and 5 years of leading teams in both trunk-based and more fluid branching development strategies. I’ve seen systems organically grow into massive monoliths, used feature flags, led automated testing efforts, and seen firsthand the benefits of removing flags and breaking things up into smaller composable parts.
Here are some similar opinions to consider: