If you have a set of microservices using a common language, it is sometimes reasonable to extract commonly used behaviour into shared libraries. One consequence is a more complex dependency graph, so there are important decisions to make about how to manage the versions of transitive dependencies, both on your own libraries and on third-party ones. This post specifically talks about Java or Kotlin with Maven.
If your code is going to remain fixed, and it doesn't depend on any libraries that could conceivably present any security risk, then it doesn't matter too much how you manage dependencies. However, in the real world there are two reasons that versions of transitive dependencies change: vulnerability patching and changes to functionality (either upgrading to a later version of an existing library for its new functionality, or bringing in a new library which depends on a different version of one of your existing dependencies).
Our goal in both cases (but particularly the first) should be to make version changing as simple as possible, and with as little risk of unforeseen consequences as possible.
There are three broad approaches:
- Manage versions from the bottom (shared libraries enforce versions of transitive dependencies)
- Manage versions from the top (microservices enforce versions of transitive dependencies)
- Manage versions from outside (versions are enforced in a separate common definition - for example the spring-boot parent)
At first glance the first approach sounds compelling, especially if you have a large number of services: any vulnerability can be addressed in one place and will just propagate out to services from there. However, there is a fallacy here and a few serious dangers.
The fallacy is that managing a version in one place simplifies things for microservices. This does not hold, because the services still need to up-version their dependency on the shared library. Rather than simplifying the management, you have just added another level of indirection.
The dangers all relate to complexity. First, how can you be sure that one library is the only one managing that dependency? If two libraries use the same transitive dependency, which one should you choose to manage it? What if some microservices use one of those libraries and others use the other? You'd have to manage it in both libraries, and then what if they are out of step?
As soon as you have multiple shared libraries with the same transitive dependencies, addressing a security vulnerability starts with a time-consuming investigation to discover what is actually defining the version. With Maven 3, the verbose option of dependency:tree stopped working in the then-current versions of the dependency plugin (newer plugin releases appear to have restored it). Where it is unavailable, you can deliberately down-version the plugin with
- mvn org.apache.maven.plugins:maven-dependency-plugin:2.10:tree -Dverbose=true | less
but while this tells you that a library has been dependency managed, it isn't necessarily trivial to discover which POM is actually defining that version:
- 08:33:06,547 [INFO] | | +- com.github.bohnman:squiggly-filter-jackson:jar:1.3.6:compile
- 08:33:06,547 [INFO] | | | +- org.antlr:antlr4-runtime:jar:4.6:compile
- 08:33:06,547 [INFO] | | | +- (org.apache.commons:commons-lang3:jar:3.11:compile - version managed from 3.4; omitted for duplicate)
- 08:33:06,548 [INFO] | | | +- (com.google.guava:guava:jar:19.0:compile - omitted for conflict with 20.0)
- 08:33:06,548 [INFO] | | | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.11.4:compile - version managed from 2.6.4; omitted for duplicate)
Another serious problem is that vulnerability patching and functionality are now tied together. Suppose your microservice depends on version 1.0.0 of a shared library, which has since evolved to version 2.0.0 with a breaking change. Now a vulnerability is discovered in a dependency, and the shared library updates the managed dependency, releasing this as version 2.0.1. Your microservice can't up-version without addressing the backward compatibility issues, so the simple route is to dependency manage the vulnerable transitive dependency in your microservice POM: when there are multiple conflicting managed versions, Maven takes a 'closest wins' strategy. However, you are now starting to create a sprawling mess of conflicting version management. Next time you need to deal with a security vulnerability, your job will be twice as hard.
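As a rough sketch of that local override, a microservice POM might pin the transitive dependency as follows. The coordinates are borrowed from the dependency tree output above, and the version is purely illustrative rather than a recommendation of a patched release.

```xml
<!-- Fragment of a microservice pom.xml (sketch; the version is illustrative) -->
<dependencyManagement>
  <dependencies>
    <!-- Pins the version used wherever this artifact appears transitively -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.11.4</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

An entry like this only sets the version; it does not add the artifact to the build if nothing depends on it.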
The second approach is conceptually an improvement: microservices should be in control of their own versions. The reality is not so simple, of course. If multiple versions of a transitive dependency are used by classes from different shared libraries at runtime, then unless you are running with OSGi one version will win, and you have to make sure that the winning version is compatible with all usages. That is not something that can be solved by any particular strategy; it just has to be managed, but do you want to figure that out and manage it in every one of your microservices? In the absence of breaking changes, the highest version should be the right choice (and hence that is the default behaviour in Gradle), which may simplify the decision making but doesn't remove the overhead.
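One way to surface these conflicts at build time, rather than discover them at runtime, is a build check. The fragment below is a sketch that assumes the maven-enforcer-plugin and its requireUpperBoundDeps rule, which this post does not otherwise rely on; the plugin version shown is an assumption to adjust for your build.

```xml
<!-- Fragment of a microservice pom.xml (sketch; plugin version is an assumption) -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-enforcer-plugin</artifactId>
      <version>3.4.1</version>
      <executions>
        <execution>
          <id>enforce-upper-bound-deps</id>
          <goals>
            <goal>enforce</goal>
          </goals>
          <configuration>
            <rules>
              <!-- Fails the build when the resolved version of a dependency
                   is lower than a version requested elsewhere in the tree -->
              <requireUpperBoundDeps/>
            </rules>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

This does not decide which version is right, but it flags the cases where the winning version is older than something in the tree asked for.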
The downside of this approach is the sheer amount of dependency management that may be required. However, a pragmatic 'laissez faire' approach is just to let the Maven rules untangle dependencies and dependency management for all transitive dependencies until you detect a security vulnerability, and then dependency manage in the microservice at that point. The remaining downside is that, with a large number of microservices, that can be a lot of versions specified in a lot of places. It bites hardest where microservices are not in active development, as up-versioning various dependencies across many of them can be time consuming.
The third approach addresses the downsides of both the other approaches to an extent. The main advantage is that, rather than an ad-hoc wild-west set of versions being specified, there is some central control, which can be exercised by team members with the right level of expertise. Where microservices have an intelligent set of tests, this can reduce the risk of one up-version having unexpected results because of a usage that hadn't been fully considered. This is the approach taken by spring-boot, where a core of experts and a community of collaborators can take an opinionated view of which versions work together correctly in order to provide the services that Spring claims to provide. It is this combination of expertise and community collaboration that gives developers a high degree of confidence that up-versioning spring-boot will not create unexpected side effects in any of the code within Spring's jurisdiction. For dependencies outside spring-boot, a similar approach can be taken by a team or set of teams. If versions are managed centrally, and any changes are managed and tested, then microservices should have a degree of confidence that the shared libraries they depend on will provide the services they claim, without side effects.
Spring-boot projects will tend to use a spring-boot parent, so one option is to have a team parent POM that extends this. However, this means the team parent also determines the version of spring-boot, and any update will have a large blast radius. An approach that is easier to manage is to use a team BOM, or more than one if there are logical groupings of dependencies.
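A team BOM is just a POM with pom packaging and a dependencyManagement section. The sketch below assumes hypothetical coordinates (com.example.team:team-bom) and illustrative versions; the managed artifacts are borrowed from the dependency tree shown earlier.

```xml
<!-- team-bom/pom.xml (hypothetical coordinates, illustrative versions) -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example.team</groupId>
  <artifactId>team-bom</artifactId>
  <version>1.0.0</version>
  <packaging>pom</packaging>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>20.0</version>
      </dependency>
      <dependency>
        <groupId>com.github.bohnman</groupId>
        <artifactId>squiggly-filter-jackson</artifactId>
        <version>1.3.6</version>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
```

A microservice that keeps the spring-boot parent could then import the BOM in its own POM:

```xml
<!-- Fragment of a microservice pom.xml importing the hypothetical team BOM -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.example.team</groupId>
      <artifactId>team-bom</artifactId>
      <version>1.0.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Note that imported entries may not replace versions already managed by an inherited parent, so this is best suited to dependencies that the spring-boot parent does not manage itself.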
The 'closest wins' approach of Maven means that any microservice can use BOMs but still override such imposed versioning in the (hopefully) rare cases that this is required. Ultimately managing dependencies just IS complex, but making the right decisions and ensuring you have adequate controls in your CI and CD is what will keep your team as nimble as possible in this complex landscape.
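To illustrate such an override, a microservice could declare its own managed version alongside a BOM import. This sketch assumes a hypothetical team BOM at com.example.team:team-bom and an illustrative guava version; a version declared directly in the service's own dependencyManagement takes precedence over one that arrives through an import.

```xml
<!-- Fragment of a microservice pom.xml (hypothetical BOM, illustrative versions) -->
<dependencyManagement>
  <dependencies>
    <!-- Local exception: declared directly, so it wins over the BOM's entry -->
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>19.0</version>
    </dependency>
    <!-- Everything else still comes from the team BOM -->
    <dependency>
      <groupId>com.example.team</groupId>
      <artifactId>team-bom</artifactId>
      <version>1.0.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Keeping such exceptions next to the import makes them easy to spot and remove once the BOM catches up.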