How to Manage External Dependencies
Note: I would like to thank Chris Kipp (@ckipp01) and Martin Kok for proofreading this article. The image, "Epicurious HTML graph" by Noah Sussman, is licensed under CC BY 2.0.
Project setup
Our project is composed of multiple teams with different experience levels, goals, and users. Each team is in charge of several applications or repositories, although ownership is sometimes shared, and like any other project we develop internal libraries to guide or impose a certain culture or idiosyncrasy. It is important to understand that when teams maintain interconnected applications, a completely reasonable decision by one team, like adding a new library, can have an unacceptable impact on another, so special care must be taken to avoid hampering other teams. I think we should strive to make the best possible decisions, judging them not only by their technical impact but also by their context, present and future.
Guidelines
Different actions apply depending on what you’re trying to do, so let’s divide them into Adding, Updating, and Removing dependencies. The principle underlying all of them is simplicity: one of our main responsibilities as software developers is to manage complexity [1], and dependencies can either add to it or reduce it.
Adding dependencies
- Don’t, just don’t: You need to decode JSON, and after a quick search you find this amazing library, so you decide to add it to the project. Zip zap, done, next ticket please. Now wait a moment, are you sure your project doesn’t already have a library that can do that? Before jumping to add a new dependency we must find out whether our project is already capable of doing such a thing. Nowadays we can easily see what libraries are included in our project and what facilities they bring. The specific tool will depend on your particular situation (for us this is usually IntelliJ, Metals, or plain old grep). This guideline also applies if you can, with little to no effort, add the functionality on top of what’s already there.
- Rule of least power: [2] Now that you’ve searched throughout your project and found no easy way to decode JSON, you need to add a library. Open source is great, the world runs on open source. There is no denying the impact and growth that springs from it, but OSS also comes with a complicated part, namely paralysis by analysis [3]. There are sometimes just too many alternatives (I’m looking at you, JS). So how do you go about choosing one library over every other option? Let’s use the rule of least power, which in our case I think entails the following:
- It does what we need it to do and nothing else (or as little else as possible). The less surface an API has, the better: less mental overhead is needed to learn and use it, and there is probably less complexity involved.
- It brings a number of transitive dependencies proportional to the complexity of the task we need to solve. Did you ever install a calculator app on your phone that asked for permission to use your camera, contacts, and microphone? How did you feel? Uncomfortable, right? That’s exactly how I feel when the number of transitive dependencies is too big for the task I have to solve. All other things being equal, I like to keep the list as short as possible (you’ll learn the reason in the next section).
- Aligns with the application, team and project: Every part has its own culture. When adding a new library, make sure it aligns with that culture to keep pushback and friction to a minimum. If an application uses Scalaz, it would be a bit weird on my part to add Cats.
- Check for pulse: How alive does the dependency feel? How many maintainers does it have? When was the last release? How many filed issues does it have? How many of those are resolved? A live dependency is one that continues to evolve and improve, and you don’t want to get stuck in a dead end.
These aren’t the only guidelines you can follow. Just remember that adding a library is always easier than maintaining or removing it.
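As a complement to the editor and grep approach from the first guideline, here is a minimal sketch of how one could ask the JVM itself whether a capability is already on the classpath and where it comes from. `ClasspathProbe` and its methods are hypothetical names, and the candidate class names are only examples of well-known JSON entry points; replace them with whatever you are looking for.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical helper: reports whether a class is already on the classpath
// and, when the JVM exposes it, which jar or directory it was loaded from.
final class ClasspathProbe {

    // Looks the class up without running its static initializers.
    static Optional<Class<?>> find(String className) {
        try {
            ClassLoader loader = ClasspathProbe.class.getClassLoader();
            return Optional.of(Class.forName(className, false, loader));
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    // Describes where a class came from; JDK classes have no code source.
    static String origin(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "JDK runtime"
                : source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Example candidates only; adjust to the capability you need.
        String[] candidates = {
            "com.fasterxml.jackson.databind.ObjectMapper",
            "com.google.gson.Gson",
            "io.circe.Json"
        };
        for (String name : candidates) {
            String status = find(name)
                    .map(c -> "present, from " + origin(c))
                    .orElse("absent");
            System.out.println(name + ": " + status);
        }
    }
}
```

Run from inside the application (or a test), this also shows when a library arrives transitively through another dependency, which is easy to miss when reading the build definition alone.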
Updating dependencies
How does one go about updating dependencies, you may ask? If the application works without problems and nothing changes, there isn’t anything to do: if it ain’t broke, why fix it? But more often than not you’ll want to update your dependencies, like we do with Scala Steward. There are a few things to consider besides increasing a version number in your project definition:
- Continuous Integration: Good test coverage paired with continuous integration is a must on every serious project. For our topic of interest, it lets you test for regressions, which will help you find and fix the following issues.
- Breaking changes: The new version can introduce breaking changes that require modifying not only your code but also other applications that use yours. The decision here is whether to pass the breaking changes on or to absorb them without your clients noticing:
- Share the load: Here you fix the breaking changes in your codebase: maybe a parameter has been added, maybe a return type has changed, or maybe a deprecated class has been removed. Either way, the API you expose to your clients has to change, and they in turn have to change how they use your code. If this happens, please be sure to coordinate with the other teams using your API, communicate what’s changing and why, and if possible provide a migration guide to alleviate any headaches. There are also tools that help identify such conflicts, such as Decca or MiMa.
- Good Samaritan: Here you have to fix the breaking changes but you also have to figure out how to avoid breaking your API’s contract. It is not always possible, but it could mean the difference between happy and angry users.
- A parameter has been added? Maybe overload the method and provide a default value in the existing one.
- A return type changed? Maybe do a decoration.
- A class was removed? Maybe add it back in your application.
These are very simple examples that serve as a starting point; real cases are usually more complicated.
- Dependency conflicts: These happen when two incompatible versions of the same library are expected to coexist, mainly due to transitive dependencies. On the JVM this usually manifests itself as a ClassNotFoundException or a NoSuchMethodError, although the JVM is not the only affected environment. As noted in the previous section, the more transitive dependencies you have, the more likely you are to run into this kind of problem. Dependency conflicts have been studied extensively by Wang et al. [4], and some of their recommendations are summarized here:
- Finding a version that doesn’t conflict: More often than not you are not in control of which class gets loaded by the ClassLoader. In the best case you can exclude the transitive dependency shadowing the version you need and everything will still work. In the worst case you have to exclude both versions and find one that all libraries are happy with, which may also mean finding a version of each of your dependencies that happens to rely on a compatible version.
- Shading a version: Sorry, I lied: the worst case scenario is actually needing both incompatible versions to exist at the same time. This is where shading comes in: it renames the packages of one copy of the dependency so the class loader can pick up both versions.
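Returning to the Good Samaritan options above, the following is a minimal sketch of absorbing an added parameter and a changed return type without touching our own contract. Everything here is hypothetical: `UpstreamClient` stands in for a third-party class whose new version added a timeout parameter and started returning a `Response` instead of a `String`, and `Fetcher` stands in for the API we expose to other teams.

```java
import java.time.Duration;

// Stand-in for the upgraded third-party API (hypothetical): the new version
// added a timeout parameter and returns a Response instead of a String.
final class UpstreamClient {
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    Response get(String url, Duration timeout) {
        return new Response(200,
                "payload from " + url + " within " + timeout.getSeconds() + "s");
    }
}

// Our own API. Clients written against the old contract call fetch(url)
// and expect a String; that keeps working after the upgrade.
final class Fetcher {
    static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(30);

    private final UpstreamClient client = new UpstreamClient();

    // Existing signature, kept and delegating with a default value.
    String fetch(String url) {
        return fetch(url, DEFAULT_TIMEOUT);
    }

    // New overload for callers who want control over the added parameter.
    String fetch(String url, Duration timeout) {
        // Adapt the new return type back to the one we promised.
        return client.get(url, timeout).body;
    }
}

final class FetcherDemo {
    public static void main(String[] args) {
        System.out.println(new Fetcher().fetch("http://example.test"));
    }
}
```

The cost of this approach is that the adapter code becomes ours to maintain, so it is worth revisiting once clients have had a chance to move to the new overload.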
Removing dependencies
For some reason you have decided to remove a dependency. Maybe you want to replace it with a better suited alternative, maybe you want the library to be provided by the client application, or maybe you no longer need to provide the feature it was supposed to help implement. How much effort this takes depends on how pervasive the dependency has become in your codebase; libraries are easier to remove than frameworks (in my experience, removing a framework or a database technology is such a tremendous effort that it usually justifies an entire application rewrite). Just like in the section above, a good CI setup is paramount for doing this with the fewest errors:
- Baby steps: If we don’t need the library anymore, it’s easier to search for where it’s used and remove it one step at a time. Where to start will depend on your application. You simply can’t begin removing calls to the library code without thinking about how to make up for the missing result values and side effects. My recommendation is to take a bottom-up approach with baby steps, running all tests continuously.
- Removal by abstraction: If you want to replace the dependency or push its definition to another application or module, my recommendation is to abstract its API. You can see this in libraries such as SLF4J or Paul Dijou’s JWT Scala. We isolate the parts of the library’s API we use behind one or more abstractions so we can treat the dependency as an implementation, then we implement this contract with the library we want as a replacement. Mind that this approach only pays off if this kind of flexibility is needed, e.g. when the original library may still be needed by other teams using our code. Otherwise I would fall back to the previous baby steps approach, but instead of removing calls we would replace them with calls to the new library.
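Removal by abstraction can be sketched as follows. This is an illustration under stated assumptions, not a real migration: `LegacyJson` and `ModernJson` are stand-ins for two third-party libraries, and `JsonWriter`, the two adapters and `AuditLog` are hypothetical names. The stand-ins skip string escaping to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Stand-in for the library being phased out (hypothetical).
final class LegacyJson {
    static String serialize(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
        }
        return sb.append('}').toString();
    }
}

// Stand-in for the replacement library (hypothetical).
final class ModernJson {
    static String render(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }
}

// The contract: only the slice of the library's API that we actually use.
interface JsonWriter {
    String write(Map<String, String> fields);
}

// One adapter per library; each is the only place that library is referenced.
final class LegacyJsonWriter implements JsonWriter {
    public String write(Map<String, String> fields) {
        return LegacyJson.serialize(fields);
    }
}

final class ModernJsonWriter implements JsonWriter {
    public String write(Map<String, String> fields) {
        return ModernJson.render(fields);
    }
}

// Application code depends on the contract, never on a concrete library.
final class AuditLog {
    private final JsonWriter json;

    AuditLog(JsonWriter json) {
        this.json = json;
    }

    String entry(String user, String action) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user", user);
        fields.put("action", action);
        return json.write(fields);
    }
}

final class AuditLogDemo {
    public static void main(String[] args) {
        // Swapping the dependency is a one-line change at the wiring point.
        System.out.println(new AuditLog(new ModernJsonWriter()).entry("ana", "login"));
    }
}
```

Running the same tests against both adapters is a cheap way to confirm that the replacement honors the contract before the old adapter, and with it the old dependency, is deleted.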
There is another approach that I purposely left out: Mono Repos [5]. I don’t have enough experience to comment on them, but I do have some observations and hope to share them in the future.
Conclusion
Libraries are a great way to get things done quickly. They are one of the reasons we avoid reinventing the wheel, but just like any other piece of software, they can be flawed. Once added to our project, a library can affect it and even shape it. Great care must be taken so that dependencies don’t cause more headaches than they solve.
References
[1]: Law of conservation of complexity
[2]: W3 Least Power and Scala Principle of Least Power
[3]: Analysis paralysis
[4]: Do the Dependency Conflicts in My Project Matter? - Ying Wang et al.
[5]: Why Google Stores Billions of Lines of Code in a Single Repository