"Why are we hearing about issues only now?"
Meaning: "Why are we hearing that there are issues so late in the sprint/epic/project?"
Classic. If you have a few years in the tech industry, you've likely heard or said a variant of this one more than once.
Very often the first reflex, internalized or even externalized, is to blame it on the engineering team or at least question their seniority. If that's the case, your culture might already be on its way down the drain, and not because of the people being blamed. So put on your gloves and get ready to pull it out.
To illustrate the cause, let me start with a real-world requirement (sensitive details omitted): every time you open a team's page, the subpage you last selected via the dropdown should be loaded from cache (so that you're back on the first subpage if you haven't opened the team for a while).
My intuitive estimate for this before any deeper thought would have been 2-3h, including review time and deployment. For anyone non-technical, anything more than that for such a simple thing might seem ridiculous. OK, let's build it.
We'll just store the selected team ID in the browser storage. The browser limits how much storage a website can use, but we shouldn't need that much anyway. We should actually store a list of team IDs because users commonly have access to multiple teams of an organization.
Ah wait, I just said "organization". Some users have access to multiple organizations too, so we need to see how to account for that. OK, after some research and time spent, we notice that a team cannot be in multiple organizations, so although we've lost some extra time, we're at least sure this isn't relevant.
OK, so now we want to store our list of team IDs. After some time thinking about which of the browser storage types to use, we pick local storage.
OK, but if the user comes back tomorrow, we want them to start on the first subpage again (hence the cache). Or do we? The requirement only states there should be a cache, indicating some vague form of expiration mechanism. This makes it necessary to find a way of invalidating data in local storage, meaning some sort of cache management mechanism needs to be built, possibly including an additional library. But many details are still missing from the requirement:
- What happens when the user revisits the team page? Does the expiration period reset for that team?
- Does everything reset at midnight and all teams go to the first subpage, or does every visited team get a countdown of 24h?
- In the delete-at-midnight scenario, we need to make sure to track "midnight" in the user's timezone and not UTC.
(Please note that I'm not suggesting you should have such crazily-detailed requirements documented at all times, but ideally you should be able to make quick decisions on these with your PM or PO.)
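To make those open questions concrete, here is a minimal sketch of what such a cache could look like. Everything in it is an assumption made for illustration, not the implementation we ended up with: the names `SubpageCache`, `MemoryStore` and `nextLocalMidnight`, the key format, and the 24h default are all hypothetical. The "fixed" policy gives every team its own countdown from the moment it was stored, while "sliding" restarts the countdown on every revisit.

```typescript
// Minimal storage interface; in a browser, window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// In-memory stand-in so the sketch also runs outside a browser.
class MemoryStore implements KeyValueStore {
  private items = new Map<string, string>();
  getItem(key: string): string | null {
    return this.items.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
  removeItem(key: string): void {
    this.items.delete(key);
  }
}

// "fixed": countdown starts when the subpage is stored.
// "sliding": countdown restarts every time the team page is revisited.
type ExpiryPolicy = "fixed" | "sliding";

interface Entry {
  subpage: string;
  expiresAt: number; // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000; // assumed default, not part of the requirement

class SubpageCache {
  private store: KeyValueStore;
  private policy: ExpiryPolicy;
  private ttlMs: number;

  constructor(store: KeyValueStore, policy: ExpiryPolicy, ttlMs: number = DAY_MS) {
    this.store = store;
    this.policy = policy;
    this.ttlMs = ttlMs;
  }

  // Hypothetical key format: one entry per team.
  private key(teamId: string): string {
    return `team-subpage:${teamId}`;
  }

  remember(teamId: string, subpage: string, now: number = Date.now()): void {
    const entry: Entry = { subpage, expiresAt: now + this.ttlMs };
    this.store.setItem(this.key(teamId), JSON.stringify(entry));
  }

  // Returns the remembered subpage, or null if there is none or it has expired.
  recall(teamId: string, now: number = Date.now()): string | null {
    const raw = this.store.getItem(this.key(teamId));
    if (raw === null) return null;
    const entry = JSON.parse(raw) as Entry;
    if (now >= entry.expiresAt) {
      this.store.removeItem(this.key(teamId));
      return null;
    }
    if (this.policy === "sliding") {
      this.remember(teamId, entry.subpage, now); // revisit resets the countdown
    }
    return entry.subpage;
  }
}

// For the reset-at-midnight variant: the next midnight in the user's local
// timezone (not UTC), as epoch milliseconds.
function nextLocalMidnight(now: Date = new Date()): number {
  return new Date(now.getFullYear(), now.getMonth(), now.getDate() + 1).getTime();
}
```

In a browser you would pass `window.localStorage` instead of `MemoryStore`. Even this small sketch forces a decision on each of the questions above, which is exactly where the original 2-3h starts to evaporate.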
Maybe an expiring cookie would be a better option than local storage then? But some browsers limit the number of cookies per domain, so we probably don't want to pollute that storage option.
...
After a while, you might decide that it'd be simpler to have this on the backend, as you at least have an already running Redis instance that could handle it and clean it up there. But then you have to:
- Store and retrieve the IDs via endpoints, and also adjust the frontend for that.
- Associate team IDs with users and track all of these individual combinations, because two or more users might have access to the same team, but select different subpages. They might also be in different timezones, and/or have different expiration times.
- And so on and so on...
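A rough sketch of the bookkeeping this implies on the backend, under stated assumptions: `ExpiringStore` is a hypothetical in-memory stand-in for a key-value store with per-key expiry (the role Redis would play), and the key format and function names are made up for illustration. It is not a Redis client and skips the endpoints entirely.

```typescript
// Hypothetical in-memory stand-in for a store with per-key expiry.
// A real backend would delegate this to the existing Redis instance.
class ExpiringStore {
  private items = new Map<string, { value: string; expiresAt: number }>();

  set(key: string, value: string, ttlSeconds: number, nowMs: number): void {
    this.items.set(key, { value, expiresAt: nowMs + ttlSeconds * 1000 });
  }

  get(key: string, nowMs: number): string | null {
    const item = this.items.get(key);
    if (item === undefined) return null;
    if (nowMs >= item.expiresAt) {
      this.items.delete(key); // lazily drop expired entries
      return null;
    }
    return item.value;
  }
}

// One key per (user, team) combination, because two users of the same team
// may have selected different subpages. Assumes IDs never contain ":".
function subpageKey(userId: string, teamId: string): string {
  return `subpage:${userId}:${teamId}`;
}

function saveSubpage(
  store: ExpiringStore,
  userId: string,
  teamId: string,
  subpage: string,
  ttlSeconds: number, // may differ per user, e.g. seconds until their local midnight
  nowMs: number,
): void {
  store.set(subpageKey(userId, teamId), subpage, ttlSeconds, nowMs);
}

function loadSubpage(
  store: ExpiringStore,
  userId: string,
  teamId: string,
  nowMs: number,
): string | null {
  return store.get(subpageKey(userId, teamId), nowMs);
}
```

The number of keys grows with users times teams, and each key carries its own expiration, which is the "track all of these individual combinations" part of the list above.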
So, even if you're not a developer, by now you have surely noticed that the implementation would take a multiple of the original estimate. Some of these things are foreseeable after little thought, but most are not until you actually dedicate time to work on such features (meaning both think about and develop them). Also, the more delivery pressure you're under, the more of these considerations will slip past you at the beginning.
It's not uncommon for this to happen for multiple tickets within the same sprint or epic. For the tickets the team picked up early, you might even hear about issues "on time", raise them early enough with your superiors, and be on your merry way. But if this happens in later stages, and it definitely can, you can:
- Choose to escalate, with an immediate negative impact on team morale. If this is your default reaction to raised issues, you'll kill your culture with a thousand papercuts. You will also jeopardize your project further by moving focus from development to countless hours spent in retrospectives and other meetings trying to figure out "what went wrong".
- Work towards adjusting your expectations and process to be more resilient to such updates. "Resilient" doesn't mean "plan ahead better next time" (this is very delusional thinking), but rather have a clear path forward and lower the cost of adjustment every time this happens.
I am obviously in favour of the latter approach. So, how do you become more resilient then?
Here are a few ideas:
- Kill the notion of "being late", and generally put yourself in the mindset of anticipating surprises.
- If extending time is not an option (make sure you're in this situation rarely!), plan your scope in a way where you can cut away from it to still hit a significant date.
- Don't start asking repeated questions on how the perceived delay could have happened. It's very draining for the team, and you'll be hearing variations on the same answer most of the time anyway.
- Simplify your reaction to every surprise ("no drama"). You might be tempted to add a few meetings on top, ask for status updates more often, book retrospectives every sprint, etc., but don't. Instead, trust your team and try to get more value out of the status meetings you already have, and then see if even some of those could be cut to make more space for delivery.
- Don't promise concrete dates to customers! It's OK and even critical to keep them up-to-date on what you're working on and what you're planning down the line, but the wording needs to be careful, and it needs to be clear to them that the timelines might change. This can happen not just because of unforeseen issues, but also due to ever-changing business goals, so this strategy makes sense from multiple angles.
- Don't ever take estimates as commitments, because this way you'll breach commitments more often than not. Even more importantly, don't communicate them externally (see previous point). Why? Because estimates are usually made at the time when you know the least about the work ahead of you, so they will often be wrong if not adjusted.
- Apply modern software engineering practices such as trunk-based development, test-driven development, feature flags, continuous delivery, pair programming, etc. to make sure you can deliver your software in small chunks and often.
- Related to the previous point, strive for deploying to production multiple times a day. This way your whole organization will always use the current version of your software and have appropriate expectations.
- Find ways of monitoring progress and timeline projections continuously, and make them always accessible and up-to-date so you can react to changes early. This is one of our must-have requirements for Cadence.