Let’s start this with a story.
Back in June 2000, I was writing some software for the Private Placement Division of a merchant bank called Thomas Weisel Partners in San Francisco. JDK 1.3.1, if I remember right. I was quite proud of the stuff I built there, especially some third-party integrations with vendors like FedEx and Siebel (remember them?), all of this before web APIs were commonplace, let alone REST.
But one piece of code I was particularly proud of was a multipart file uploader, which I pretty much wrote from scratch over an all-nighter by reading and implementing the original RFC for it, because there was nothing already available to do it. Or rather, there was no Stack Overflow to check whether there was. It was probably not very elegant, but the damn thing worked, on the first try. And Private Placement moved a lot of docs around, so it was “mission-critical” and needed to be maintained.
A few months later I discovered Struts and realized I could have slept that night. ¯\_(ツ)_/¯ We never got around to moving that functionality over. It’s probably still being maintained by some poor schmuck in Nebraska. (I’m still proud of that bit of coding, though 🙂.)
Point of all this dorkage? All tech is debt. It’s only a matter of time.
“Tech Debt” is the eternal bugaboo of startup-land. Pegged as the ultimate velocity killer (more on “velocity” in a later post), it is the persistent itch in the build-and-ship pipeline that just can’t get enough scratch, the receiving end of most accusatory fingers when bugs happen, and the one thing standing in the way of continuous-delivery utopia. Techies invoke it to clamor for sprint space for their beloved refactors (I’ve done it), to rationalize missed deadlines (more on “deadlines” in a later post), and to warn of dire unintended consequences and collateral damage if it isn’t addressed now. Business is confounded by it, and rightly so.
The way I wrote that paragraph makes it all sound like resplendent baloney. It’s not. Tech debt is a real thing, and left unattended, it can cause all of the above. I have personally experienced this pain at least twice. I just believe there is more nuance to it than the blanket statements I have seen hurled at it.
Here are three things I have come to believe:
Tech debt is inevitable
There’s good tech debt and bad tech debt
Good debt can decay into bad ← this is where the shyte goes down
Tech debt is inevitable
This is probably a good place to offer my own definition of tech debt. In the broadest sense of the term:
Any piece of software that has to be maintained over time is, technically speaking, tech debt.
AKA:
All tech is debt
Software ages fast. And unlike many cheeses, mostly not well.
To be sure, there’s voluntarily incurred debt, ranging from shortcuts on one end to over-engineering on the other, with “organic debt” at the extreme (Appendix I). But even if you avoid all of these traps with your spectacular foresight and craft that perfect async service, the distributed framework you built it on may be EOL-ed (or worse, relicensed; true story), turning your gumdrop software elegance into a mass of debt with a ticking clock inside it. Or the use case you architected for is no longer relevant, and you’re stuck with 4 microservices, 3 Kafka topics, and 11 events to update a user profile. It felt right at the time, but a CRUD endpoint would do just fine now.
Ergo: tech debt is inevitable. That’s just the way it goes.
(Except maybe for that use-once-and-toss data migration script you whipped up in Rust, which no one will ever see or inherit.)
There’s good debt and bad debt
Tech debt gets a bad rap, because “debt”.
But if you’re not voluntarily incurring a little tech debt on your startup journey, you’re most likely not moving fast enough. Or, as the bromide goes, “perfect is the enemy of good.” If a little bit of hardcoding (with a plan to abstract it out in the next couple of sprints) gets you to market sooner, or lets you beat the competition to a launch, that’s the win. You don’t want to end up writing the best piece of software that no one uses.
There are, of course, empirically bad slabs of debt often incurred by startups in early stages. I mean, don’t build your entire platform in Clojure, for Pete’s sake. Where will you find the developers?
Still, a small, manageable piece of debt, voluntarily incurred to move the business forward faster, is a good thing. But it is only as good as the payment plan built in to work that debt down. If you’re smelling a credit card debt analogy here, it’s coming soon, as this segues well into the next point.
Good debt can decay into bad
Almost every bad case of tech debt I have seen, dealt with, or even created has been, in some form, “good debt” devolved into “bad debt”. This is the kind that causes a newly hired CTO to beg for “3 months to clean it all up”. And what early-stage startup has that kind of time?
Cue the credit card analogy. Most of us run up a balance on our cards every month, and using them responsibly helps build our FICO scores and credit history. The financially well-behaved among us pay the statement down in full every month. Left unpaid, though, the balance accrues interest and can rapidly become untenable.
This is what happens in most cases with startup tech debt: the paydown plan is not put in place as the debt is being created. Or it is created but not followed. Or attempts to follow it are shot down. Or all of the above. So more debt is created on top of old debt, and the debt-stack Jenga continues until the only solution is filing for bankruptcy, aka the dreaded “full rewrite”, which causes engineers to salivate and business to hyperventilate in equal measure, both for good reasons.
If there’s one thing I’ve learnt, the hard way, it’s this: Good debt without a paydown plan will inevitably go bad. Keep working the debt down every sprint, bit by bit, even as you accumulate new bits of debt.
Where do we go from here
I promised not to be very prescriptive in these ramblings, and I won’t be, partly because, historically, I have had limited success in managing this particular challenge. So what follows is really commentary on what I tried in order to contain it, or what I wish I had done sooner.
First off, avoid organically bad debt (Appendix I). A good tech leader or CTO will do this by default through stack selection, foundational architecture, and so on. I won’t spend much time on that here.
Most importantly, build a payment plan along with the debt. This can take several shapes, but it is mostly about habit creation and expectation management.
For every “tech-debt-y” ticket you create in a sprint (say, hardcoding something declaratively), create a companion ticket for Sprint N+1 (or N+2) to refactor it into something less crude.
Fold debt reduction into ongoing enhancements. It will always be “faster” in the short term to ship something by stacking on more debt. Instead, price in a couple of extra hours to work the old debt down. The best way to shave tech debt is in the process of building something new.
Planned obsolescence. Remember I said all tech is debt? A large maintenance surface area can cause serious drag on shipping speed and increase the chance of unintended consequences. Keep measuring what’s being used and what isn’t, and sunset old bits of code on a regular basis. A key prerequisite, of course, is measuring use and adoption of everything new you build, but good product orgs do this by default.
Despite all best efforts, some corner of your platform will more likely than not morph into a Golgothan over time that needs to be neutralized. Define a quantified framework (complete with OKRs) to justify taking on this larger tech debt work. For example, in the past I have formulated a “PBSD Framework” (Appendix II) for greenlighting and scoping large-format debt reduction work. It helped, at least in making the case for it.
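To make the companion-ticket habit concrete, here is a minimal sketch of the bookkeeping behind it. It assumes a generic backlog where each ticket has a key, a sprint number, and a flag for deliberate debt; `Ticket`, `companion_tickets`, and `unpaid_debt` are hypothetical names for illustration, not any real issue tracker’s API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    # Hypothetical ticket shape; real trackers model this differently.
    key: str
    title: str
    sprint: int
    tech_debt: bool = False          # deliberately "tech-debt-y" work
    pays_down: Optional[str] = None  # key of the ticket whose debt this repays


def companion_tickets(tickets: list[Ticket], lag: int = 1) -> list[Ticket]:
    """Create a refactor ticket in sprint N+lag for every tech-debt ticket in sprint N."""
    return [
        Ticket(
            key=f"{t.key}-PAYDOWN",
            title=f"Refactor: {t.title}",
            sprint=t.sprint + lag,
            pays_down=t.key,
        )
        for t in tickets
        if t.tech_debt
    ]


def unpaid_debt(tickets: list[Ticket]) -> list[Ticket]:
    """Return tech-debt tickets that have no companion paydown ticket yet."""
    repaid = {t.pays_down for t in tickets if t.pays_down is not None}
    return [t for t in tickets if t.tech_debt and t.key not in repaid]


# Hypothetical example backlog for one sprint.
backlog = [
    Ticket("APP-101", "Hardcode pricing tiers for launch", sprint=7, tech_debt=True),
    Ticket("APP-102", "Add password reset", sprint=7),
]
print([t.key for t in unpaid_debt(backlog)])  # ['APP-101']: debt with no payment plan
backlog += companion_tickets(backlog)
print([t.key for t in unpaid_debt(backlog)])  # []: every debt ticket has a paydown ticket
```

The point of `unpaid_debt` is the habit, not the tooling: any debt ticket it returns at sprint planning is debt being taken on without a payment plan.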
I could go on and on about this, but I should stop here. It’s already gotten too long. If there’s interest, maybe there’s a part deux here someday.
But TL;DR here: don’t knock tech debt. It’s not all bad, and it’s not your fault. Just keep the Golgothan away.
Appendix I: Organically bad debt
I don’t know what else to call it, but this refers to debt that’s bad from the get-go, unlike “good debt” gone bad over time.
Prototype Becomes Product. Very common, sadly. And has happened to me, twice. The little proto you built to demo to VCs for your seed round? That’s not your final product. Use the proceeds from the seed to build it for real.
This is an extreme case of “good debt gone bad”, and in the absence of good tech leadership from the get-go, it will almost certainly require a full rewrite.
Obscure Stack Debt. As mentioned before: don’t go Erlang or Clojure or (shudder) Scala. Stick to well-understood, well-supported, talent-available languages and stacks for your core platform, or you will have to rewrite it anyway. There’s always peripheral stuff you can build to try out the cool new tech.
Complexity Debt, aka over-engineering. Simple is beautiful; microservices aren’t all that. Start with the simplest flow and complexify from there if needed (in most cases, it isn’t). Making a complex distributed architecture simple is often harder than incrementally layering on complexity atop the right foundational architectural decisions.
I’m sure there are many more. These are the ones I ran into the most. Would love to hear about yours!
Appendix II: The PBSD Framework
I’ve used this in the past both to analyze and greenlight large-format tech debt initiatives, like major refactors or full rewrites, and to measure their impact afterward. This was the PBSD rubric:
Performance - is it slowing stuff down?
Bug frequency - is it causing too many bugs?
Scalability - is it causing bottlenecks under traffic?
Dev velocity - is it slowing down dev/ship velocity?
For each proposed tech debt initiative, make it mandatory to publish a PBSD memo for discussion:
Function:
<describe the function supported by this bit of software>
Cost of debt:
<some PBSD score, e.g. “PS”, to indicate the debt is impacting performance and scalability, along with a description of how>
Complexity:
<numerical score from 1 to 5, increasing with level of complexity>
Debt reduction path:
<a bulleted list of the proposed steps to work down the debt>
Success metrics:
<how we would measure success; OKRs, in effect>
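If you want the memo to be more than a wiki page, one option is to capture it as structured data so the PBSD letters and the 1 to 5 complexity score get validated. The sketch below is an illustration, not the original template’s tooling: `PBSDMemo`, `parse_pbsd_score`, and the example memo contents are all hypothetical.

```python
from dataclasses import dataclass, field

# The four PBSD dimensions, keyed by their letter code.
PBSD_DIMENSIONS = {
    "P": "Performance",
    "B": "Bug frequency",
    "S": "Scalability",
    "D": "Dev velocity",
}


def parse_pbsd_score(score: str) -> list[str]:
    """Expand a letter code such as "PS" into the dimension names it flags."""
    letters = score.upper()
    if not letters:
        raise ValueError("PBSD score needs at least one of P, B, S, D")
    unknown = [c for c in letters if c not in PBSD_DIMENSIONS]
    if unknown:
        raise ValueError(f"Unknown PBSD letter(s): {''.join(unknown)}")
    # Keep the canonical P, B, S, D order and drop duplicates.
    return [name for code, name in PBSD_DIMENSIONS.items() if code in letters]


@dataclass
class PBSDMemo:
    function: str
    cost_of_debt: str        # PBSD letter code, e.g. "PS"
    cost_description: str
    complexity: int          # 1 (simple) to 5 (very complex)
    reduction_path: list[str] = field(default_factory=list)
    success_metrics: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= self.complexity <= 5:
            raise ValueError("complexity must be between 1 and 5")
        parse_pbsd_score(self.cost_of_debt)  # raises on an invalid letter code

    def render(self) -> str:
        """Render the memo in the same section order as the template."""
        lines = [
            f"Function: {self.function}",
            f"Cost of debt: {self.cost_of_debt.upper()} "
            f"({', '.join(parse_pbsd_score(self.cost_of_debt))})",
            f"  {self.cost_description}",
            f"Complexity: {self.complexity}",
            "Debt reduction path:",
            *(f"  - {step}" for step in self.reduction_path),
            "Success metrics:",
            *(f"  - {metric}" for metric in self.success_metrics),
        ]
        return "\n".join(lines)


# Hypothetical example memo.
memo = PBSDMemo(
    function="Document upload service",
    cost_of_debt="ps",
    cost_description="Uploads slow down and time out under peak traffic.",
    complexity=3,
    reduction_path=["Move uploads to a maintained library", "Retire the hand-rolled parser"],
    success_metrics=["p95 upload time under an agreed target"],
)
print(memo.render())
```

Validating at construction time keeps every memo comparable: a memo with a made-up dimension or a complexity of 7 is rejected before it reaches the discussion.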