foobarcat: git

Tuesday, 31 October 2017

On Using Multiple Source Control Repositories for Microservices

Microservices are small services with limited bounded contexts, which means that most functionality within the platform entails communication between services. The network overhead of this is largely mitigated by efficient, low-latency communication protocols such as gRPC. However, it also means that most functional changes require developers to change multiple services. It follows that the smaller we make our services, the more likely a change is to cross service boundaries.

A graph showing the relationship between service size and the typical number of services affected by a change

This overhead, along with others such as the frequent re-implementation of common functionality, is an accepted cost of using microservices, paid in exchange for services that are naturally more granular, composable and reusable.

There are two accepted approaches to version controlling microservices: the 'monorepo' and the repository-per-service 'manyrepo'. With the monorepo, all services are kept in the same source control repository; with the manyrepo, each service has its own. When studying microservices from a theoretical perspective it seems logical to add yet another layer of separation between services by adopting manyrepo, but there are some real issues and overheads associated with doing so, which your author has recently been experiencing first hand! Below I have attempted to set out the relative pros and cons of the 'manyrepo' approach.

Pros

The separation of repos, with explicit dependencies upon one another, makes the scope of any commit easy to reason about, that is, which services a commit affects and therefore which need to be deployed. With a single repo, 'the seams' between services aren't as well defined or bounded. This makes it harder to release services independently with confidence, hampering the continuous deployment of independent releases that is considered crucial to microservices-based architectures. That is to say, the monorepo to an extent encourages lock-step releases.

It scales better in large organisations with many developers, as it ensures that developers' version control workflows do not become congested.

It helps repos stay smaller, which keeps the inbound network traffic for pulls smaller and gives developers a faster workflow.

Cons

With multiple repos, changelogs become fragmented across multiple pull requests, making it harder to review, deploy and roll back a single feature. Indeed, deploying a feature can require deployments across multiple services, making it more difficult to track the status of features through environments. There is a lot of version control overhead here.

Common changes must be made to each service separately: making the same change to N services requires N git pushes.

It is harder to detect API breakage, as the tests in the consumer of the API will not run until a change to that consumer is pushed. Note that this can be addressed with 'contract testing', that is, testing the API of a service within the service itself rather than relying on the tests in the consumer, although such tests do not come for free.

It is more difficult to build a continuous integration system when not all dependencies are immediately available via the same repository.

This is a very interesting topic that definitely warrants further discussion and debate. Indeed, the choice of monorepo vs manyrepo has interesting implications for release strategy, and for whether supporting varied permutations of co-existing versions is worth the very real and visible overhead that it incurs.

Many of the workflow pros of the manyrepo don't really come into effect until you have a very large engineering staff, which most companies are unlikely ever to have. Also, Google have solved some of these problems for monorepos by doing fancy stuff with virtual filesystems.

Personally, I considered the manyrepo to be superior until I had to experience the increased developer overhead first hand. However, it remains to be seen whether this encouragement to think about services as separate, distinct entities is worth it in the long run.

Wednesday, 13 September 2017

Initialising String Literals at Compile Time in Go

Recently I was working on a service and realised that we had no established way of querying it for its version information. Previously, on an embedded device, I have written a file at build time to be read by the service at run time, with the location set by an env var. It would also be possible to set the version in an env var itself, but env vars can be overridden at run time.

However, a colleague suggested injecting the version string into the binary itself at compile time, so I decided to investigate this novel approach.

The usage output of go tool link shows that the -X argument allows us to define arbitrary string values:

$ go tool link
...
-X definition
add string value definition of the form importpath.name=value
...

go help build explains that there is an 'ldflags' option, which allows the user to specify a flag list that is passed through to go tool link:

$ go help build
...
-ldflags 'flag list'
arguments to pass on each go tool link invocation.
...

So we can pass a string definition through the go command (build, run etc)!
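
A minimal main.go consistent with the output shown below might look like this. It is a sketch rather than the exact listing: the variable name foo and the message format are taken from the commands and output in this post, while the message helper is an assumption added to keep the formatting in one place.

```go
package main

import "fmt"

// foo is deliberately left unset in the source. Because it is a package-level
// string variable, the linker can give it a value via -X main.foo=value.
var foo string

// message wraps the value in brackets so that an empty (non-injected) value
// is still visible in the output.
func message() string {
	return fmt.Sprintf("injected var is [%s]", foo)
}

func main() {
	fmt.Println(message())
}
```

Note that -X only applies to string variables; it has no effect on constants or on variables of other types.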

The example program, main.go, defines a string variable foo in package main.
The fully qualified name of this variable is therefore main.foo, and this is the name we pass to -X.

$ go run main.go
injected var is []

$ go run -ldflags '-s -X main.foo=bar' main.go
injected var is [bar]

This can be used in novel ways to inject the output of external commands, such as the build date, at compile time.

$ go run -ldflags "-X 'main.foo=$(date -u '+%Y-%m-%d %H:%M:%S')'" main.go
injected var is [2017-09-13 13:44:59]

This is a nice feature that I'm sure has applications beyond my reckoning at this time. In my use case it makes for a compact, neat solution. However, it does cause some indirection when reading the code, making it harder to follow, so I would argue it should be used sparingly. Unlike an env var, which could potentially be overwritten, the injected value is fixed in the binary at build time and cannot be changed from outside the process. Anyhow, it is a pretty cool linker feature!

Friday, 16 December 2016

Rebase as an Integration Strategy for Feature Branches

There are generally two reasons for using git rebase: 1) to tidy up or rearrange commits that aren't publicly in use, and 2) as a strategy for integrating branches. This post discusses the second use case. Rebase gets a lot of bad press, and I think that this is partly due to misunderstanding. It's like a dog that people kick: it bites someone, then it gets put down. So let's try to understand it and then play fetch with it or something instead.

Most people are familiar with merging as an integration strategy. The problem with merging is that it creates a non-linear history polluted with merge commits. The murky history manifests itself as ambiguity in tools like git log and increased difficulty in using git bisect; it generally makes archaeological spelunking and working with history harder. Merges do have some benefits, however: they naturally work well with pull requests and they preserve branch history (which may or may not be valuable to you). For a full discussion of pros and cons check out this excellent article.

Merge and Rebase Workflows

Starting steps: create a feature branch off the tip of an up-to-date upstream master.

 $ git checkout master
 $ git pull
 $ git checkout -b feature-branch

State of Play
      C          feature-branch
     /
A---B---D        master

A normal merge workflow

... do some work, stage it
 $ git commit -m "C"
 $ git checkout master
 $ git merge feature-branch
 $ git push origin master

Post-Merge
      C          feature-branch
     / \
A---B---D        master

A normal rebase workflow

... do some work, stage it
 $ git commit -m "C"
 $ git rebase master
 $ git push origin feature-branch:master

Post-Rebase
          C'     feature-branch
         /
A---B---D        master

The Rebase Workflow Analysed

The rebase command above takes master, forward-ports C on top of it and sets the result to be feature-branch; this can be thought of as 'rebasing' the feature commit onto the updated master. Through this process C is rewritten with a different SHA: it is now a different commit, C'. This can be a sticking point in understanding, but an important feature of a commit is that it is immutable, and its parent is part of its identity, so a commit cannot be moved onto a new parent, only recreated. Rebases rewrite commits whereas merges don't.

The git push command means: push the local ref before the colon to the ref after the colon on the remote origin. If your git push is rejected as a non-fast-forward, either you are doing something wrong or someone has pushed in the time since you pulled, the blighters. Pull master again and rebase onto it again.
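
Why replaying C onto D produces a new commit can be illustrated with a toy model in Go. This is a deliberately simplified sketch, not git's actual object format: commitID is a hypothetical function that hashes only a parent ID and a message, whereas real git also hashes the tree, author, committer and timestamps. It keeps the one property that matters here, which is that the parent's ID is part of what gets hashed.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// commitID is a simplified stand-in for how git names a commit: the ID is a
// hash over the commit's content, and that content includes the parent's ID.
func commitID(parent, message string) string {
	sum := sha1.Sum([]byte("parent " + parent + "\n\n" + message))
	return fmt.Sprintf("%x", sum)
}

func main() {
	a := commitID("", "A")
	b := commitID(a, "B")
	d := commitID(b, "D") // master has moved on to D

	c := commitID(b, "C")      // C as originally committed on top of B
	cPrime := commitID(d, "C") // the same change replayed on top of D

	// Print abbreviated IDs, in the style of a short SHA.
	fmt.Println("C  =", c[:8])
	fmt.Println("C' =", cPrime[:8])
	fmt.Println("same commit:", c == cPrime)
}
```

The message is identical in both cases, yet the IDs differ because the parent differs, which is why a rebased branch consists of new commits rather than moved ones.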

Consider Dry Run

If you are worried about what you are pushing, note that you can always see what you are about to do using the --dry-run argument to git push, which stops short of sending the actual update. You can then run git log on the SHA range it outputs.

 $ git push origin feature-branch:master --dry-run
 To git@github.com:aultimus/example-project
    d4f3294..6c53234 feature-branch -> master
 $ git log -U3 d4f3294..6c53234
 ...

On a personal note, I generally prefer rebase over merge for integrating feature branches. I am rather keen on a nice clean history; the importance of a well-kept history is a topic for a future post.