The fine art of balancing two contradicting forces: A DevOps story of security vs usability

As DevOps engineers, one of the hardest tasks we may face is juggling two roles.

On the one hand, we are the enablers who provide our teams with everything they need to do their work better. On the other hand, we are responsible for the integrity, resilience, and security of the infrastructure.

Implementing security and compliance in a small and agile company like UP42 is an acrobatic balancing act: we must be able to implement changes swiftly and keep our teams on board with infrastructure changes while also eliminating as many attack vectors as possible.

If you enforce too many limitations to protect the environment, you may lose the support of your teams and have a hard time implementing new security measures. Alternatively, if you choose to satisfy the teams at all costs, the environment's security will be more like a strainer than a protective wall.

This is a story of balancing two opposite forces.

Where We've Been

When I joined the UP42 SRE team a few months ago, I was granted full admin rights on our GitHub organization.

After a few months, I needed to perform a task that required me to remove my admin rights. To my surprise, I found that I had lost access to more than a few of the repositories I regularly work with.

I started to research how we manage the GitHub permissions and this is what I discovered:

- Although we maintained a well-structured set of GitHub teams, access was not granted exclusively through them: we also granted access to repositories on a per-user basis. As a result, users within the same team could see different sets of repositories (as I did).

- We had irregularities in the user roles. Some users were admins and others were members, without any logical structure.

- Repositories had inconsistent settings and protections in place.

I knew we didn't have a well-structured workflow to manage our GitHub organization, but the sheer variety of settings caught me by surprise. It was clear there was work to be done to make our GitHub more organized.

The only question left was: how? How could we unify the permissions and settings and add a layer of change auditing while keeping the developers satisfied?

Balancing a Pin on Its Head

At first, we brainstormed what would be our goal for the GitHub governance project.

We could choose between two extremes:

- Remove all admin rights from all users except the SRE and IT teams and make everyone come to us to change settings or access permissions. This approach would give us major control and security but comes with a huge drawback: it would limit the developers' ability to work in an agile and self-supporting way, which is the exact opposite of our company culture.
- Grant everyone admin access to the GitHub organization and remove the need to manage repository permissions altogether. That is not realistic, since we work hard to implement the principle of least privilege.

We had to find a way to balance a good security level and auditing while ensuring the development pace would not be impacted.

In the end, we settled for this:

- Only the SRE and IT teams should be GitHub admins;
- Permissions on the repositories would be team-based;
- The repository owner team may modify the repository's settings but not its permissions;
- All or most actions on GitHub should be audited.
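
As a rough illustration of the first two rules, the GitHub Terraform provider can express both organization roles and team-based repository access. The sketch below is not our actual configuration: the user names, the team name, and the repository name are hypothetical placeholders, and it assumes the `integrations/github` provider is configured with an organization owner's credentials.

```hcl
terraform {
  required_providers {
    github = {
      source = "integrations/github"
    }
  }
}

# Hypothetical users: only SRE and IT members get the organization "admin" role.
resource "github_membership" "sre_engineer" {
  username = "example-sre-user"
  role     = "admin"
}

resource "github_membership" "developer" {
  username = "example-dev-user"
  role     = "member"
}

# Hypothetical team that owns a repository.
resource "github_team" "my_team" {
  name    = "my-team"
  privacy = "closed"
}

resource "github_team_membership" "developer_in_my_team" {
  team_id  = github_team.my_team.id
  username = github_membership.developer.username
  role     = "member"
}

# Access is granted to the team, never to an individual user.
resource "github_team_repository" "my_team_on_my_repo" {
  team_id    = github_team.my_team.id
  repository = "my-repo"
  permission = "push"
}
```

Because every change to these resources goes through a pull request and a CI run, the version history doubles as an audit trail for membership and access changes.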

The King's Road

At UP42, we rely heavily on Terraform to manage our Infrastructure as Code. GitHub has a Terraform provider and our developers work with Terraform regularly, so choosing to implement the solution with Terraform was an easy decision.

All of our GCP infrastructure is terraformed and changes are rolled out via CI. That way, developers do not necessarily need admin permissions on the GCP infrastructure but can still do all the changes they require via the Terraform code.

We decided to create the same workflow for the GitHub "infrastructure" as well. After a few iterations, we implemented a Terraform module with our desired repository settings, imported all the users and a few repositories, and finished with a working PoC that achieved everything we wanted.
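
To give an idea of what such a module can wrap, here is a minimal sketch, not our actual module. The variable name and the chosen settings are assumptions for illustration; `github_repository` and `github_branch_protection` are resources of the GitHub Terraform provider.

```hcl
# Hypothetical module input.
variable "name" {
  type        = string
  description = "Repository name"
}

# Settings every repository should share (illustrative choices).
resource "github_repository" "this" {
  name                   = var.name
  visibility             = "private"
  has_issues             = true
  delete_branch_on_merge = true
  vulnerability_alerts   = true
}

# Baseline protection for the main branch (illustrative).
resource "github_branch_protection" "master" {
  repository_id = github_repository.this.node_id
  pattern       = "master"

  required_pull_request_reviews {
    required_approving_review_count = 1
  }
}
```

Keeping the defaults inside the module means a repository definition only has to state what differs from the baseline.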

We were confident we had built a good solution and that the teams would like it as much as we did. It turns out the reactions weren't quite as glowing as we had expected.

Great Solutions Are Not Enough. You Need Everyone's Consensus

When we showcased the project to the teams in the following guild meeting, the initial reaction was positive. They saw the benefits of easy repository setups, simple configuration (as little as eight lines of code for full configuration), and consistency across all the repositories. Yet, as the demo continued, questions and concerns around the permissions started to overshadow the many benefits.

A few meetings later, we managed to boil down all concerns to two major needs:

- The ability to create repositories outside the code, for PoCs and testing;
- The ability to bypass branch protection in special cases (i.e., they need to be repository admins).

Facing the Harsh Reality

What started as a project that we, the SRE team, thought covered all our concerns ended up not being quite so. We needed to make some security-related compromises to enable much-needed usability.

We agreed the repository "owner" team would have admin rights on the repository so it could use the admin "superpower" when necessary. We also agreed they would not change permissions manually (as I am writing this post, the GitHub Terraform provider does not offer authoritative permissions management). We also gave them the option to create repositories manually and later import them into Terraform using an import script we wrote for them.
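
Our import script is not shown here. For readers on Terraform 1.5 or later, a similar result can be sketched declaratively with an `import` block, which is a different technique from the script we used. The repository name and the resource label are hypothetical placeholders.

```hcl
# Adopt a repository that was created manually in the GitHub UI.
# For github_repository, the import ID is the repository name.
import {
  to = github_repository.poc
  id = "my-poc-repo"
}

resource "github_repository" "poc" {
  name       = "my-poc-repo"
  visibility = "private"
}
```

Running `terraform plan` then lists the pending import along with any differences between the code and the existing repository.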

However, the developers did understand the need for certain control measures.

# One repository, fully configured through the module.
module "my_repo" {
  source = "github.com/up42/terraform-modules//github_repository"

  name        = "my-repo"
  code_owners = ["* @my_org/my_team"]

  # Protect the master branch: require PR reviews and passing CI checks.
  branch_protection = [{
    pattern                       = "master"
    required_pull_request_reviews = {}
    required_status_checks = {
      contexts = ["ci/circleci: lint", "ci/circleci: plan"]
    }
  }]

  # Team-based access: the owner team is admin, the other team is read-only.
  permissions = [
    { team = local.all_teams["my_team"].id, role = "admin" },
    { team = local.all_teams["other_team"].id, role = "pull" },
  ]
}
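
Inside a module, a `permissions` list like the one above could be turned into one team grant per entry. This is a sketch of one possible approach, not the module's real code; the variable shapes and resource names are assumptions.

```hcl
variable "name" {
  type = string
}

variable "permissions" {
  type = list(object({
    team = string # team ID or slug
    role = string # pull, triage, push, maintain, or admin
  }))
  default = []
}

resource "github_team_repository" "this" {
  # Key by list position so the keys are known at plan time.
  for_each = { for idx, p in var.permissions : tostring(idx) => p }

  repository = var.name
  team_id    = each.value.team
  permission = each.value.role
}
```

A resource like this only manages the grants it declares, so access added by hand in the GitHub UI is not removed. That is why the agreement not to change permissions manually matters.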

Conclusion

As I hope this example emphasized, balancing security and usability is like pulling a too-short blanket. You can rarely implement all of the security and compliance measures without impacting the usability of a system.

As for the project itself, it's now been in use for some time, and the general feedback is relatively good. It may not be a perfect solution, and it still has some room for improvement, but it's a small step in the right direction.

There is a saying that Rome wasn't built in a day. The same holds for security.

Meanwhile, feel free to check out some of our open source tools on our GitHub repository.

Nir Tal

Site Reliability Engineer at UP42