Thursday, August 23, 2018

Testing and modifying GitHub PRs without extra remotes, branches, or clones

As a popular open-source project, Ansible sees dozens of pull requests (PRs) each day from numerous members of our awesome community. Our CI system puts each one of those PRs through its paces on a litany of hosts and containers, but sometimes that's not enough. During the process of reviewing a PR, we may need to run it locally on a specialized test system, and sometimes we'll need to submit changes to it that should also be run through the CI gantlet before being merged. GitHub made this process a lot easier with the ability to commit changes to PR branches on forks, but most of the official documentation of the process either requires a whole new clone of the remote repo, or adding remotes or branches to your local repo. That's a lot of extra unnecessary work for ephemeral branches and forks I don't want to keep around.

It's possible to locally pull down and test a PR, as well as push changes back to the original fork/branch, without messing with any local clones/remotes/branches. This relies on a couple of oft-misunderstood git features: detached HEADs, and the FETCH_HEAD ref. Basically, the process involves fetching the PR branch directly from the remote fork via its URL, then checking out the resultant FETCH_HEAD ref as a disconnected head (so we don't have to create a local branch either). At that point, we have exactly the commits as they exist on the PR's source branch. This is important, because if we were to use a rebased tree, we can no longer just add commits to the original PR branch. With the original commits, we can make modifications, test things, whatever. Any commits we make are added to the disconnected head, which we can then push directly back to the PR fork's branch (again by URL), and GitHub will add the new commits to the end of the branch, just as if the original submitter had pushed new commits. All CI and checks on the PR will be triggered as usual, code reviews and comments can happen, etc., - we're still taking full advantage of GitHub's PR feature set (instead of direct-merging the changes back to the main Ansible repo and bypassing all the rest of GitHub and Ansible's pre-merge infrastructure).

So let's get to it already!

Let's assume you have an Ansible clone laying around in ~/projects/ansible, and that it's your current working directory...

Before we can fetch a PR branch, we'll first need to know the source fetch URL and branch. As of this writing, when viewing a PR, it can most easily be found just below the PR title, and looks like "(user) wants to merge (commits) into (target_fork:target_branch) from (source_user:source_branch)". That last part is what we need: the username or org where the source fork lives, and the source branch name it's coming from.

The source fork fetch URL should be "https://github.com/(user-or-org)/(repo_name).git" to fetch over HTTPS, or "git@github.com:(user-or-org)/(repo-name).git" for SSH. So if the submitter's username is "bob", the project repo is called "ansible", and the source branch name for the PR is "fix_frob", the HTTPS fetch URL would be "https://github.com/bob/ansible.git", and the SSH version would be "git@github.com:bob/ansible.git".

With these two pieces of information, we can now fetch the PR branch with a command like:

git fetch (source fork fetch URL) (source branch)

For our hypothetical example:

git fetch https://github.com/bob/ansible.git fix_frob

We now have the necessary objects from the remote sitting locally in a temporary ref called FETCH_HEAD (which is used internally by git for all fetch operations). In order to do something useful with them, we need to check them out into a working copy:

git checkout FETCH_HEAD

This gives us the contents of the temporary FETCH_HEAD ref into what's called a "detached HEAD"- it behaves just like a branch checkout in every way, but there's no named branch "handle" for us to use to refer to it, which means there's nothing we need to worry about cleaning up when we're done!

At this point, we can do whatever operations we like, just as if it were a normal working copy or branch checkout. If it was just a local test, and there's nothing we need to push back to the source branch, the next checkout of any branch will zap the state, and there's nothing for us to clean up. If we want to keep it around for some reason, it's easy to convert a detached HEAD into a normal branch.

But maybe there's a small change you want to add to the PR- say the submitter forgot a changelog stub and we just want to get it merged without waiting. GitHub's UI will allow you to make a change to an existing file in a PR as a new commit, but you can't add new files through the UI. No worries- we can use a similar process to push new commits back to the original source branch!

Make whatever changes are necessary and commit them as normal (as many commits as needed)...

For our hypothetical example:

echo "bugfixes: tweaked the frobdingnager to only frob once" > changelogs/fragments/fix_frob.yaml
git add changelogs/fragments/fix_frob.yaml
git commit -m "added missing changelog fragment"

We could just push our changes up, but remember, we're talking about pushing commits to someone else's repo. It's a neighborly thing to do to verify that we've only included the changes we expect, and that the submitter hasn't added anything more. To do that, we'll use the same command we did originally to refresh the FETCH_HEAD ref with the current contents of the source branch (which are hopefully unchanged):

git fetch (source fork fetch URL) (source branch)

so for our example:

git fetch https://github.com/bob/ansible.git fix_frob

and then we'll diff our detached HEAD contents that we want to push against the just-updated FETCH_HEAD:

git diff FETCH_HEAD HEAD

which should show us only our new commits. If anything else shows up, we've either accidentally committed some unrelated stuff, or new stuff has shown up in the original source branch, and it needs to be reconciled before we push (an exercise left for the reader).

Assuming all's well and we're ready to push, using the same source repo URL and branch we figured out above, push the changes back to the source repo with a command like:

git push (source fork fetch URL) HEAD:(source branch name)

For our example:

git push https://github.com/bob/ansible.git HEAD:fix_frob

If all's well, you should be prompted for credentials, then the new commits will be pushed. At this point, you can check out any other branch/ref and work on as normal, or repeat this process for other PRs- no cleanup necessary!

If you see an error about "failed to push some refs", it usually means the PR owner has changed something on the source branch, and you'll need to reconcile before you push. Force-pushing is almost never the right thing to do- you may potentially overwrite other commits!

A few other notes:
* Support was later added for SSH push, which makes life much easier if you're using 2FA (of course you are, right?). Pushing over HTTPS with 2FA enabled requires jumping through some extra hoops... You'll have to use a personal access token as your password, since GitHub's 2FA support doesn't support command-line authentication.
* Be very careful about merging or rebasing from other branches if you'll be pushing changes back. A rebase will prevent you from pushing altogether (without force-pushing, but don't do that), and a careless merge from your own target branch will add all the intermediate commits since the PR owner last rebased. At least for Ansible, that's a deal-breaker...

Testing and updating PRs without extra remotes, branches or clones using this process saves me a lot of hassle and cleanup- hope it's useful to you!