Back in July I published "Encountering some turbulence on Bitbucket's journey to a new platform," sharing with the public for the first time that Bitbucket Cloud is in the final stages of a migration from our data center onto Atlassian's cloud platform, the same internal platform underlying Jira Cloud, Confluence Cloud, Statuspage, and many other internal services.
I also shared that this platform move increased file system latency, which made certain operations slower. Specifically, rendering diffs and merging pull requests both saw a measurable decline in performance after moving out of our data center.
To optimize our diff and diffstat endpoints, our engineers implemented a solution where we would proactively generate diffs during pull request creation and cache them. Then, when a user views the pull request, the diff, diffstat, and conflict information are all retrieved from the cache, bypassing the file system and avoiding the increased latency.
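To make the idea concrete, here is a minimal sketch of that precompute-and-cache pattern. It is not Bitbucket's actual code: the `DiffCache` class, its method names, the commit-pair cache key, and the use of Python's `difflib` as a stand-in for the real (file-system-bound) diff are all assumptions made for illustration.

```python
import difflib


class DiffCache:
    """Hypothetical in-memory stand-in for a shared diff cache."""

    def __init__(self):
        self._store = {}
        # Counts how often the expensive path runs, for illustration only.
        self.slow_path_calls = 0

    def _compute_diff(self, old_text, new_text):
        # Stand-in for the expensive diff that would touch the file system.
        self.slow_path_calls += 1
        return "".join(
            difflib.unified_diff(
                old_text.splitlines(keepends=True),
                new_text.splitlines(keepends=True),
                fromfile="a",
                tofile="b",
            )
        )

    def on_pull_request_created(self, source_commit, dest_commit, old_text, new_text):
        # Generate the diff proactively, before anyone asks to see it.
        key = (source_commit, dest_commit)
        self._store[key] = self._compute_diff(old_text, new_text)

    def get_diff(self, source_commit, dest_commit, old_text, new_text):
        # Serve from the cache; fall back to the slow path only on a miss.
        key = (source_commit, dest_commit)
        if key not in self._store:
            self._store[key] = self._compute_diff(old_text, new_text)
        return self._store[key]


cache = DiffCache()
cache.on_pull_request_created("abc123", "def456", "a\nb\n", "a\nc\n")
diff = cache.get_diff("abc123", "def456", "a\nb\n", "a\nc\n")
print(diff)
```

Keying the cache on the pair of commits is one plausible choice: when either branch moves, the key changes and a stale diff is never served. A real deployment would also need eviction and a shared cache backend, which this sketch leaves out.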
For a web service at Bitbucket's scale, solving problems like this often doesn't go the way you expect. While our engineers worked on resolving these issues, they discovered an optimization opportunity that would allow us to perform much more of the work required to generate the diff in local memory, significantly reducing file system I/O. We rolled out this change on July 13 and saw average response times for our diff and diffstat APIs plummet by nearly 40%: