Design a Fine-grained Authorization System with RBAC and ACL

2023-07-30 / modified at 2024-12-22 / 2.2k words / 13 mins

Implementing flexible, fine-grained permission management is an essential component of robust software design. This article will walk through various authorization system designs that aim to achieve that goal.

The design may be too complicated for start-up teams, because this article mainly targets enterprise applications that are subject to many compliance policies.

What is authorization management

Terminologies

An authorization management (AuthZ) system consists of subjects, namespaces, resources, and capabilities.

Subjects: Standalone users or groups. In an enterprise environment, users and groups are synchronized from managed IAM platforms, HR services, or LDAP servers.
- User: An individual user, such as a sAMAccountName in LDAP servers.
- Group: A set of individual users, such as a memberOf entry in LDAP servers. Nested groups may look complex at first glance, but they can be flattened with a recursive SQL query (CTE).
Namespaces: Organizational structures, such as nested departments.
- Metadata: Name, description, and other descriptive key-value pairs of the namespace.
- Resources: The digital assets inside the namespace.
  - Metadata: Name, description, and other descriptive key-value pairs of the asset.
  - Assets: Such as API entry points, row- and column-level filters, or virtual data lakes. You have to choose the granularity (row level, object level, or field level); each is a specialized domain.
  - Capabilities: Define what someone who is granted access may do with the assets. A capability is usually described with an immutable string.
    - Effect: read/write/list/full
    - Action: The field in the asset.
    - E.g., an AWS IAM policy statement pairs an Effect (Allow) with an Action such as secretsmanager:GetSecretValue to describe a read capability.
  - Assignments: Who is granted access to the resources. There are four strategies:
    - Share with all members inside the namespace, regardless of their assigned namespace capabilities.
    - Share with members who are granted certain capabilities inside the namespace.
    - Share directly with individual users or groups inside the namespace.
    - Share with anyone outside the namespace. (Not recommended)
  - Ownership: Who is responsible for maintaining the metadata and assets and for managing shares of the resource.
- Capabilities: Define the member capabilities in the organizational structure.
- Assignments: Define the membership inside the namespace, such as inviting user A as a namespace member with three capabilities granted.
- Ownership: Who is responsible for managing the namespace.
Rule Engine: The dynamic process of assessing whether someone is allowed to access a resource. This is what is known as ABAC, RBAC, or ACL.
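The note above says nested groups can be flattened with a recursive query. As a rough in-memory analogue of that recursive CTE, the sketch below walks a direct-membership table breadth-first. The class name GroupFlattener, the method effectiveGroups, and the shape of the memberOf map are all assumptions made for illustration; a real system would read the edges from LDAP or a database.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GroupFlattener {
    // memberOf is a hypothetical table: subject (user or group) -> groups it directly belongs to.
    static Set<String> effectiveGroups(String subject, Map<String, List<String>> memberOf) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(memberOf.getOrDefault(subject, List.of()));
        while (!queue.isEmpty()) {
            String group = queue.poll();
            // Only expand a group the first time it is seen; this also guards against cycles.
            if (seen.add(group)) {
                queue.addAll(memberOf.getOrDefault(group, List.of()));
            }
        }
        return seen;
    }
}
```

The returned set contains both direct and inherited groups, so later permission checks can treat every membership as if it were flat.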

Preparation

I’d recommend enumerating all capabilities and checkpoints before any development, because capabilities tend to become embedded in the code and are difficult to change over time.

Complete the following checklist before designing your permissions:

Capability and granularity: These are mostly specific to specialized domains, so the design can’t be reused across projects. Keep the design stateless so that developers can write unit tests independently.
Membership: Who can manage membership? Is your system design vulnerable to privilege escalation?
Ownership: This can be complex under GDPR, especially for personal data that is held, processed, and redistributed by companies, which requires extensive checklists. To manage data produced by a person, make sure each object can be managed at the lowest granularity. Above all, assign each object to exactly one person who has the legal right to it, because the owner might not process the data directly. Although the topic here is authorization, the design should also anticipate future uses such as big data or AI training. Grants must be managed carefully as well: never assign an object without the owner’s explicit consent, regardless of supervisory roles such as super admins.
Audit:
- Require an explicit selection dialog when granting access instead of relying on default settings.
- Define audit checkpoints for when operations are invoked, and manage them transparently.
Rules engine:
- Scope: Check whether your service requires multi-scope organizational structures, because priority evaluations can’t be merged directly across multiple scopes.
- Conflicts: Decide which takes priority when both an allow policy and a deny policy apply to the same request.
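For the last checklist item, one common answer is "deny overrides". The sketch below is only an illustration of that single choice, not a recommendation from this article; the names ConflictResolution, Effect, and isAllowed are hypothetical, and your compliance rules may call for a different priority.

```java
import java.util.List;

final class ConflictResolution {
    enum Effect { ALLOW, DENY }

    // Deny-overrides (an assumed strategy): any matching DENY wins;
    // otherwise at least one ALLOW is required.
    // An empty list falls through to "not allowed", i.e. default deny.
    static boolean isAllowed(List<Effect> matchedPolicies) {
        if (matchedPolicies.contains(Effect.DENY)) {
            return false;
        }
        return matchedPolicies.contains(Effect.ALLOW);
    }
}
```

Writing the rule down as a small pure function like this makes the priority explicit and easy to unit test before it spreads through the code base.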

RBAC

What is RBAC

Role-based access control (RBAC) is widely taught in computer science courses. It provides an indirect mechanism that assigns an attribute (such as a job title like “Accounting Manager”) to a person and then binds that attribute to capabilities in the managed scopes.

How roles are defined

Before the discussion, we first need to address three types of role definitions. In real-world development, I’ve seen users who couldn’t distinguish the following operations.

Invite User A into a project with the role ProjectAccountant, mimicking a real-world job function; the appropriate data permissions are then programmatically assigned to that role, granting User A access automatically.
Invite User A into a group called project_accountant_data, then assign the data permission to the group directly.
Invite User A into a project with the role AccountingReader, where the role is just a set of namespace and resource capabilities and is unrelated to the real-world project accountant.

Role definitions can take either an intensional or an extensional form. An intensional definition specifies a real-world role, while an extensional definition just enumerates the explicit members or a set of abstract capabilities.

The Intensional role definition - Conceptualization

The first definition uses declarative notation for roles, that is, a conceptualization of the role. In a natural-language statement, the role appears as a restrictive relative clause that stands in for a person’s name.

- [Kyon] can view sales data.
+ [People who are sales] can view sales data.

The intensional definition of a role can be an attribute.

// Each member carries role attributes; an object literal is used so the keys are valid.
var MarketingDept = {
  'John': {role: ['Manager', 'Sales']},
  'Lisa': {role: ['Sales', 'FAE']},
  'Tom' : {role: ['Sales']}
};

Because a computer never understands what “Manager” means, modeling a real-world title adds developer overhead. There are a few considerations to keep in mind:

Conceptualization is a highly developed skill that relies on industry experience and ontological thinking. A programmer who excels at LeetCode may not be able to build such abstractions in a short time.
Real-world roles may change frequently, and software engineers then have to rework the Excel sheets or databases that simulate the real world.
While some users argue that their roles should get more privileges, others oppose expanding access, resulting in contentious and protracted meetings that ultimately delay the development timeline.
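To make that overhead concrete, here is a minimal sketch of what an intensional role needs in code: a hand-maintained table from titles to capabilities. The table contents, the class name IntensionalRoles, and the capability strings are all invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IntensionalRoles {
    // Hypothetical title-to-capability table. Every entry is a human decision
    // that has to be revisited whenever the real-world job changes.
    static final Map<String, Set<String>> CAPS_BY_TITLE = Map.of(
        "Manager", Set.of("read:sales", "write:sales"),
        "Sales", Set.of("read:sales"),
        "FAE", Set.of("read:tickets"));

    // A user is allowed if any of their titles maps to the requested capability.
    static boolean can(List<String> titles, String capability) {
        return titles.stream()
            .anyMatch(t -> CAPS_BY_TITLE.getOrDefault(t, Set.of()).contains(capability));
    }
}
```

The lookup itself is trivial; the cost sits in agreeing on, and then maintaining, the contents of the table.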

The extensional role definition 1 - Membership enumeration

To reduce the complexity, the fundamental action is to remove attributes – a role can be merely considered a named subgroup. We can enumerate all roles without any conceptualization!

var MarketingManager = ['John'];
var MarketingSales = ['Lisa', 'Tom'];
var MarketFAE = ['Lisa']
var MarketingDept = [MarketingManager, MarketingSales, MarketFAE]

In my view, the enumerative approach is a better alternative because users can learn it intuitively.

It’s more flexible to externalize real-world roles as enumerated groups (nested groups are acceptable) in third-party systems (such as OpenLDAP or an HR service), so that our applications, which connect to these services, are decoupled from the underlying definition of departments.

Membership validation or synchronization can also be implemented via a centralized authorization layer such as Traefik’s ForwardAuth middleware or other aspect-oriented middleware.

The extensional role definition 2 - Capabilities enumeration

Another alternative is to create built-in roles from capabilities.

var S3Reader = ['read:s3'];
var EC2Reader = ['read:ec2instance', 'read:ec2image', 'read:ec2ebs'];
var CloudReader = [S3Reader, EC2Reader];
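A composite role like the last one above only becomes checkable once it is expanded into a flat capability set. The following sketch shows one way to do that expansion, assuming a hypothetical ROLES table in which an entry is either a capability string or the name of another role; the class and method names are illustrative only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RoleExpansion {
    // Role -> entries; an entry is either a capability string or the name of another role.
    static final Map<String, List<String>> ROLES = Map.of(
        "S3Reader", List.of("read:s3"),
        "EC2Reader", List.of("read:ec2instance", "read:ec2image", "read:ec2ebs"),
        "CloudReader", List.of("S3Reader", "EC2Reader"));

    static Set<String> capabilitiesOf(String role) {
        Set<String> caps = new LinkedHashSet<>();
        collect(role, caps, new LinkedHashSet<>());
        return caps;
    }

    private static void collect(String role, Set<String> caps, Set<String> visited) {
        if (!visited.add(role)) {
            return; // role already expanded; this also stops accidental cycles
        }
        for (String entry : ROLES.getOrDefault(role, List.of())) {
            if (ROLES.containsKey(entry)) {
                collect(entry, caps, visited); // nested role
            } else {
                caps.add(entry);               // leaf capability
            }
        }
    }
}
```

With this in place, a permission check reduces to capabilitiesOf(role).contains(capability).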

This is the most widely used form in cloud services, and we can also classify it as an ACL (access control list). However, ambiguous role names can lead to a steep learning curve, as exemplified by using “Owner”, “Manager”, and “Admin” together.

For example, GitLab, a well-known code management platform, uses a nested RBAC authorization design. When we wanted to assign read-only permission to someone, we found that the roles Reporter and Guest both suggest observation, which means they are not mutually exclusive in the real world, making it impossible to configure the roles correctly without an official cheat sheet. Additionally, when we tried to modify the source code, we found that the capability :admin_project was a mess (in both frontend and backend).

In contrast, SonarQube, a code scanning platform, manages roles concisely. SonarQube defines two roles: the project administrator and the project member. Anyone can infer that the administrator has higher permissions.

Built-in roles must be concise, such as admin, read-write, and read-only. If you insist on designing additional roles, their names must be derived from the capabilities inside the system rather than from the real world.

Designing an ACL can be challenging because database storage grows linearly with the number of objects and owners being managed.

How to use them together?

RBAC provides coarse-grained access control by attributes, while ACL applies fine-grained policies at the level of individual resources. Complications arise when permissions overlap; here are some hints.

To implement these requirements, here is the pseudocode:

// Check whether a user can access a resource
public boolean can(User current, Resource res, String effect, String action) {
  // Flatten all subjects the user belongs to, as exemplified in "Membership enumeration"
  List<Subject> subjects = getJointLDAPGroups(current);
  // Find all capabilities, as exemplified in "Capabilities enumeration"
  Map<Namespace, Map<String, String>> nsCaps = getCapsOfJointNamespaces(subjects);
  Map<String, String> resCaps = getCapsOfTheResource(subjects, res);

  // You have to implement your own priorities
  return evaluationEngine(subjects, nsCaps, resCaps, effect, action);
}
// Test
can(userA, dataA, "read", "accounting:margins");
can(userA, dataB, "list", "accounting:netprofiles");
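The pseudocode leaves evaluationEngine open on purpose. As one possible shape, the sketch below assumes that each capability map is keyed by action with the granted effect as its value, that "full" implies every other effect, and that a resource-level entry takes priority over namespace-level ones. None of these assumptions comes from the pseudocode itself; namespaces are reduced to plain string keys, and the names EvaluationSketch, grants, and evaluate are hypothetical.

```java
import java.util.Map;

final class EvaluationSketch {
    // Assumption: "full" covers every effect; otherwise the effects must match exactly.
    // A null grant (no entry for the action) never matches.
    static boolean grants(String grantedEffect, String requestedEffect) {
        return "full".equals(grantedEffect) || requestedEffect.equals(grantedEffect);
    }

    // Assumed priority: a resource-level entry for the action decides on its own;
    // namespace-level capabilities are consulted only when the resource says nothing.
    static boolean evaluate(Map<String, Map<String, String>> nsCaps,
                            Map<String, String> resCaps,
                            String effect, String action) {
        String onResource = resCaps.get(action);
        if (onResource != null) {
            return grants(onResource, effect);
        }
        return nsCaps.values().stream()
            .anyMatch(caps -> grants(caps.get(action), effect));
    }
}
```

Because the function depends only on its arguments, it stays stateless and can be unit tested without LDAP or a database.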

Inheritance & Segregation of duty (SoD)

It’s deemed high-risk to allow an admin to access tenant or subordinate data, as doing so violates isolation boundaries. Intersection and union calculations can be applied to constrain the admin role’s privileges, enabling admins to read item lists while denying access to details.

// Enable admins to read item lists while denying access to details.
// Requires java.util.function.BooleanSupplier.
BooleanSupplier systemScopeCheck = () -> can(userA, null, "manage", "accounting:margins");
BooleanSupplier objectScopeCheck = () -> can(userA, dataA, "read", "accounting:margins");
boolean canList       = systemScopeCheck.getAsBoolean() || objectScopeCheck.getAsBoolean();
boolean canReadDetail = systemScopeCheck.getAsBoolean() && objectScopeCheck.getAsBoolean();
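For readers who want to run the idea, here is a self-contained sketch of the same union and intersection checks. The grant table, its "user|resource|effect|action" encoding, the "*" marker for system scope, and the users admin, alice, and bob are all made up for the example.

```java
import java.util.Set;

final class SodSketch {
    // Hypothetical grant table: "user|resource|effect|action"; "*" stands for system scope.
    static final Set<String> GRANTS = Set.of(
        "admin|*|manage|accounting:margins",
        "alice|*|manage|accounting:margins",
        "alice|dataA|read|accounting:margins");

    static boolean can(String user, String resource, String effect, String action) {
        String scope = resource == null ? "*" : resource;
        return GRANTS.contains(String.join("|", user, scope, effect, action));
    }

    // Union: either the system-scope or the object-scope grant is enough to list.
    static boolean canList(String user, String resource) {
        return can(user, null, "manage", "accounting:margins")
            || can(user, resource, "read", "accounting:margins");
    }

    // Intersection: both grants are required to read details.
    static boolean canReadDetail(String user, String resource) {
        return can(user, null, "manage", "accounting:margins")
            && can(user, resource, "read", "accounting:margins");
    }
}
```

Here admin can list dataA but cannot read its details, while alice, who holds both grants, can do both. Note that under this strict intersection a user who holds only the object-level grant can list but cannot read details either; whether that is intended is a policy decision.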

Developing SoD is expensive and time-consuming: we inserted hundreds of checkpoints and audit logs into the source code. Don’t make it your first priority.

Appendix

Alternatives to notation-based role definitions

Recently, some frameworks have focused on low-code role definitions, claiming to be pluggable, declarative, and unified. However, these approaches don’t always deliver the flexibility they promise.

The following table compares the notations across various authorization frameworks.

Notation	Paradigm	Logic
Trie traversal: Open Policy Agent	Declarative	Combinational
Relation tuple: Ory Permissions	Declarative	Combinational
Relation tuple: Google Zanzibar	Declarative	Combinational
Spring Security Annotations	Declarative	Combinational
Casbin Go-like DSL	Declarative	Sequential
Rete algorithm: Drools	Mostly declarative	Combinational
JSON Schema	Mostly declarative	Sequential
Hard-coded programming	Imperative	Sequential

The key to classifying these frameworks is whether they support sequential logic during policy evaluation. For instance, if-else statements imply priorities during the calculation, while bitwise operations are timeless (order-independent) and require no state storage during evaluation.
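The difference between the two kinds of logic can be sketched in a few lines. The bitmask check below gives the same answer however it is evaluated, while the first-match rule list changes its answer when the rules are reordered. Both functions, the flag values, and the {prefix, decision} rule format are illustrative assumptions, not the API of any framework in the table.

```java
import java.util.List;

final class LogicStyles {
    static final int READ = 1, WRITE = 2, LIST = 4;

    // Combinational: a pure bitmask test; the result does not depend on evaluation order.
    static boolean allowedByMask(int granted, int requested) {
        return (granted & requested) == requested;
    }

    // Sequential: the first matching rule wins, so reordering the rules changes the answer.
    // Each rule is {actionPrefix, "allow" | "deny"}.
    static boolean allowedByFirstMatch(List<String[]> rules, String action) {
        for (String[] rule : rules) {
            if (action.startsWith(rule[0])) {
                return "allow".equals(rule[1]);
            }
        }
        return false; // default deny when nothing matches
    }
}
```

If your requirements look like the second function, with exceptions layered on top of general rules, that is a sign they need sequential logic.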

If you need if-else statements during policy evaluation, I’d recommend not torturing yourself with any low-code solution.

In recent work, I tried to use Open Policy Agent (OPA) so that I could modify the policies on the fly. Unexpectedly, the customers’ requirements were not as declarative as low-code tools assume; they included the following rules.

Nested and complicated if-then statements with hard-coded situations
Hard-coded role assignments to a specialized leader
Row-level filters that were tightly coupled to SQL queries

After trying OPA, I found that:

The Rego language is nothing like Go; I’d rather compare it to Verilog for its timeless nature. OPA is more like an EDA synthesizer.
Keeping the OPA instances highly available required additional maintenance hours.

After several rounds of rework, I gave up on OPA because our work was merely translating the “customer’s prompt” into “handmade code”, which is something an AI is capable of.

From my perspective, the most common system design mistake I’ve made is over-engineering. There is no unified way to notate the constraints. Instead, the only way to reduce the cost is to keep the code minimal at the engineering level.

Summary

In this article, we walked through RBAC, ACL, and low-code frameworks.

Make roles mutually exclusive when using RBAC; extensional definitions are preferred.
Ensure ownership and membership are defined before using ACL.
Ideally, provide rights-respecting policies and SoD when using RBAC and ACL together.
Low-code frameworks are not a universal solution; a hard-coded solution can be acceptable for small teams.

Ultimately, choosing the best solution depends on compliance requirements.