Programming the Environment

TLDR: It’s often easier to tweak the environment than it is to change behavior directly, especially in the face of limited resources.

One of my favorite Chief Technology Officers once told me “First you program the code, then you program the people, then the organization.” There’s a lot of truth in that statement, but it’s not quite that simple.

We all have inputs and outputs, but the logic that drives our behavior is locked within the black box of our individual psychology. The same input can have very different outputs on any given day. People are complicated.

Changing behavior is hard. Changing an organization’s behavior is really hard. An approach that works with one person might not work with another, and it’s easy for individuals to fall back into bad habits. Faced with limited resources and tight deadlines, it’s often easier to tweak the environment than it is to change behavior directly.

Why does this keep happening?

I was chatting with a principal architect in my organization, and we lamented the amount of testing happening in production, the tight coupling of systems, and the volume of dead code in the release branch. They talked about creating examples of best practices in the hope that people would pick up good habits. Some engineers ran with the suggestions, but many were slow to change under the pressures of day-to-day work.

At the risk of sounding cynical, I’ve come to think of engineering teams as groups of brilliant cats. They all have their own personalities, quirks, and motivations. They don’t always get along. You have to work hard to earn their trust.

It’s hard to convince cats not to climb onto a high shelf if there’s a chair for them to jump off of. Things get knocked down and inevitably broken. Spraying the cats with water only causes more damage as they scramble to get out of the way. Adding new toys on the ground might interest some of the cats, but not all of them. Some will inevitably keep returning to that shelf, so you have to take the chair away.

Academic Purity is Unattainable

We have long suspected that the way feature flags are being used is a problem in our organization. That misuse is a symptom of a greater problem, not the root cause itself. People are very quick to say things like “anything can be dangerous if you don’t know how to use it”, and suggest that it’s bad to eliminate a thing rather than learn to use it properly. I agree in principle, but it’s not that easy in the real world. You have to recognize your organization’s limits.

Given unlimited time and resources, we’d clearly prefer to train our existing engineers or hire senior engineers with a breadth of external experience to draw from. Unfortunately, we rarely have that luxury in the face of deadlines and budgets. Companies often grow quickly and go on hiring sprees with limited budgets to keep up with engineering demand. Before you know it you’re profitable, but you might have inexperienced engineers and a sprawling code base that’s hard to test and maintain. Yes, this can all be avoided with careful planning and resourcing from the start, but I’m focusing on more common realities here.

Once a company is profitable, the train is already barreling down the track. You can’t slow it down to coach everyone at once because there are deadlines to meet and revenue streams to maintain. You can’t change all the wheels at once either. Swapping your engineering base out for more senior engineers leaves you with a lack of domain knowledge, and the train comes grinding to a halt. Nor can you just build a new train. Existing products have to be maintained while you constantly port things over to the new system and chase feature parity. You have to do it iteratively, but that takes time.

It’s often easier to tweak the environment in order to make “bad behavior” less likely given immediate, systemic problems. In the cat metaphor it’s a matter of removing the chair that’s allowing them to jump onto a dangerous shelf. In our case, it’s a matter of removing existing feature flags and potentially not using them at all. Yes, feature flags have value in some cases (e.g. feature comparison testing for conversion), but they can cause immense chaos when used improperly, especially when there are better ways of accomplishing what we’re trying to do.

The Power of Taking Things Away

If you’re using a lightsaber to cut bread in a crowded kitchen, someone is going to get their hand chopped off. Chances are, you’re not gonna need it. That bread could be sliced just as well with a simple bread knife. You might not be able to overthrow the empire with a bread knife, but the problem you’re tackling gets solved just the same, and it’s less likely someone loses a limb.

In a previous life I led a front-end team with an application written in Backbone.js. Seemingly benign changes on one side of the application often resulted in unrelated parts crashing down. Switching to the slightly more functional React for rendering and Redux for unidirectional state management helped a lot, but familiar problems kept happening.

JavaScript gives you all the rope you could ever need to hang yourself. Its flexibility enables all kinds of undesirable behavior like rampant mutation and side effects. The language itself is not to blame. It’s a lack of experience (or more often simple human error) that makes it dangerous. Even with an experienced team of thoughtful engineers, the risk of missing something on a bad day makes it inevitable that glitches creep in. We’re all human.
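To make the mutation hazard concrete, here is a minimal TypeScript sketch. The Cart type and both function names are hypothetical, made up for illustration rather than taken from the application I described. It relies only on the fact that the built-in Array sort works in place, so a function that looks like it only reads data can quietly reorder the caller’s array.

```typescript
// Hypothetical names throughout; this is an illustration, not code from a real application.
interface Cart {
  items: string[];
}

// Looks like a harmless read, but Array.prototype.sort reorders the caller's array in place.
function renderSortedItems(cart: Cart): string {
  return cart.items.sort().join(", ");
}

// A non-mutating version copies the array before sorting the copy.
function renderSortedItemsSafely(cart: Cart): string {
  return [...cart.items].sort().join(", ");
}

const cart: Cart = { items: ["pears", "apples", "milk"] };

// The safe version leaves the original order alone.
const safeLabel = renderSortedItemsSafely(cart);
const firstItemAfterSafeRender = cart.items[0]; // still "pears"

// The mutating version returns the same string, but the cart itself has changed.
const unsafeLabel = renderSortedItems(cart);
const firstItemAfterUnsafeRender = cart.items[0]; // now "apples"
```

Both functions return the same string, which is exactly why this kind of bug slips through review: nothing looks wrong until some other part of the application depends on the original order.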

We were using Clojure with great results on our back end. We loved that taking things away and having few ways of doing things made it hard to do the wrong thing. Data types in Clojure are immutable in most cases, and you have to work very hard to store mutable state. This encourages functional patterns with very few side effects, resulting in explicit, testable code with very little ambiguity.
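For readers who don’t know Clojure, the idea of taking mutation away can be roughly approximated in TypeScript. This is only a sketch under my own assumptions: the Settings type and the withTag function are hypothetical, and readonly here is a compile-time check, not the runtime immutability that Clojure’s data structures provide.

```typescript
// Hypothetical types; a compile-time approximation, not Clojure semantics.
interface Settings {
  readonly theme: string;
  readonly tags: readonly string[];
}

// An "update" builds and returns a new value instead of changing the old one.
function withTag(settings: Settings, tag: string): Settings {
  return { ...settings, tags: [...settings.tags, tag] };
}

const original: Settings = { theme: "dark", tags: ["beta"] };
const updated = withTag(original, "internal");

// The compiler rejects the mutating alternatives:
//   original.tags.push("internal");  // error: 'push' does not exist on 'readonly string[]'
//   original.theme = "light";        // error: 'theme' is a read-only property
```

With the mutating operations unavailable, the only path left is the one that returns a new value, so anything still holding the original sees exactly what it saw before.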

We were already heavily invested in the React front-end ecosystem, but wanted to get away from the flexibility of JavaScript. We discovered ClojureScript, which compiles to JavaScript, and re-frame, which closely mirrors React / Redux patterns. Making the switch took away the mutability and complexities of the more flexible JavaScript implementations, and improved our quality so much that we went from releasing once every other week to once every other day.

Software Engineering Manager and Senior Software Engineer with a passion for maintainable, testable code. https://www.linkedin.com/in/jeromedane/