Design a Fine-grained Authorization System with Capability-first Architecture
2023-07-30 / modified at 2026-04-09 / 1.8k words / 10 mins

Implementing flexible, fine-grained permission management is an essential component of robust software design. This article will walk through various authorization system designs that aim to achieve that goal.

Authorization Fundamentals

Terminologies

An authorization (AuthZ) system consists of subjects, namespaces, objects, and capabilities.

[Diagram: the AuthZ model. Subjects (from AuthN) are users and external groups with capabilities. Namespaces carry metadata (name, description, ...), capabilities on the namespace (immutable strings), assignments, and ownership. Objects carry metadata (name, description, ...), data (the assets), capabilities on the object (immutable strings), and assignments.]

Subjects

Standalone principals (users, groups, even robot accounts). In an enterprise environment, users and groups are synchronized from managed IAM platforms, HR services, or LDAP servers.

  • User: An individual user, such as sAMAccountName in LDAP servers.
  • Externally defined group: A group is just a set of individual users, such as memberOf in LDAP servers, including nested groups.

Subjects are in the scope of the authentication (AuthN) system; all we need to do is synchronize them.
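Because external groups can be nested, a synchronized snapshot usually has to be flattened before evaluation. The following is a minimal sketch, assuming the snapshot is available as an in-memory map; `GroupExpansion`, `expand`, and the sample names are hypothetical and not part of any LDAP API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: flattens nested group membership from a synchronized snapshot.
final class GroupExpansion {
    // memberOf maps a subject (user or group) to the groups it directly belongs to.
    static Set<String> expand(String subject, Map<String, List<String>> memberOf) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(subject);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (!seen.add(current)) {
                continue; // already visited; this also guards against cyclic nesting
            }
            queue.addAll(memberOf.getOrDefault(current, List.of()));
        }
        return seen; // the subject itself plus every direct and indirect group
    }

    public static void main(String[] args) {
        Map<String, List<String>> memberOf = Map.of(
            "alice", List.of("qa-team"),
            "qa-team", List.of("rnd"),
            "rnd", List.of("qa-team")); // deliberate cycle in the sample data
        System.out.println(expand("alice", memberOf));
    }
}
```

The returned set contains the user as well, so later checks can treat "the user" and "the user's groups" uniformly.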

Capabilities

A capability defines the permission that a subject is granted. It is mostly described with

  • a combination of two immutable strings, like read:secrets to describe the read capability.
    • verb: read/write/list/full, which is similar to chmod.
    • noun: The field in the asset.
  • computed results, which exist only when evaluated.
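The verb:noun string form can be sketched as a small immutable value type. This is only an illustration: the `Capability` record, its `parse` method, and the rule that `full` implies the other verbs are assumptions, not something the design above prescribes.

```java
// Hypothetical sketch of an immutable "verb:noun" capability.
record Capability(String verb, String noun) {
    // Splits at the first colon, so "read:secrets" becomes verb=read, noun=secrets.
    static Capability parse(String text) {
        int colon = text.indexOf(':');
        if (colon <= 0 || colon == text.length() - 1) {
            throw new IllegalArgumentException("expected verb:noun, got " + text);
        }
        return new Capability(text.substring(0, colon), text.substring(colon + 1));
    }

    // Assumption: "full" implies every other verb on the same noun.
    boolean implies(Capability other) {
        return noun.equals(other.noun) && (verb.equals("full") || verb.equals(other.verb));
    }

    @Override
    public String toString() {
        return verb + ":" + noun;
    }

    public static void main(String[] args) {
        Capability read = Capability.parse("read:secrets");
        System.out.println(Capability.parse("full:secrets").implies(read));
    }
}
```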

Granularity

Granularity comes in different dimensions based on data classification. For instance, when designing a multi-tenant project:

Granularity Levels:
├── Tenant/Organization Level (Multi-tenancy isolation)
├── Object Level (Document #123)
├── Record/Row Level (Specific database rows)
└── Field/Column Level (Specific attributes)

Namespaces

Namespaces are virtual scoping containers defined inside our system, such as nested departments. The diagram shows the top-level organization and its departments.

Organization
└── Department1 (e.g., R&D)
    └── Department2 (e.g., QA Team)
        └── Objects (e.g., Documents, Records)

Inside each namespace (down to Department2), there are the following attributes

  • Namespace-level capabilities: Define the member capabilities in the organizational structure.
  • Ownership: Who is responsible for managing the namespace.

External groups are fetched from a third-party IAM, while namespaces are designed by ourselves.
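A nested namespace can be sketched as a node that points at its parent. The `Namespace` record, its field names, and the owner names below are hypothetical; the point is only that walking the parent chain yields the scopes that may contribute capabilities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a nested namespace with an owner.
record Namespace(String name, String owner, Namespace parent) {
    // Chain from this namespace up to the root, nearest first.
    List<Namespace> ancestors() {
        List<Namespace> chain = new ArrayList<>();
        for (Namespace n = this; n != null; n = n.parent()) {
            chain.add(n);
        }
        return chain;
    }

    // Slash-separated path from the root down to this namespace.
    String path() {
        return parent == null ? name : parent.path() + "/" + name;
    }

    public static void main(String[] args) {
        Namespace org = new Namespace("Organization", "alice", null);
        Namespace d1 = new Namespace("Department1", "bob", org);
        Namespace d2 = new Namespace("Department2", "carol", d1);
        System.out.println(d2.path());
    }
}
```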

Object

An object is a digital asset inside a namespace, which contains

  • Metadata: Name, description and other descriptive K-V pairs of that resource.
  • Data: Such as API entrypoints, row- and column-level filters, or virtual data lakes. You have to design the granularity at the row level, object level, or field level; these are specialized domains.
  • Resource-level capabilities
  • Ownership: Who is responsible for maintaining metadata and data, and for managing approvals of the resource.

Authorization Models

How assignments are defined

Instead of

subject → permission(RBAC/ABAC/ACL etc.) → object

My design is

subject + namespace + object + capabilities → assignment decision

The minimum number of combinations is 3 x 3 x 3 = 27, picking one of the three variants for each of the three relationships below:

1. subjects -> objects
1.1. subjects - (subject-intrinsic capabilities) -> objects
1.2. subjects - (object-side capabilities) -> objects
1.3. subjects - (subject-intrinsic capabilities) - (object-side capabilities) -> objects
2. subjects -> namespace
2.1. subjects - (subject-intrinsic capabilities) -> namespace
2.2. subjects - (namespace-side capabilities) -> namespace
2.3. subjects - (subject-intrinsic capabilities) - (namespace-side capabilities) -> namespace
3. namespace -> objects
3.1. namespace - (namespace-side capabilities) -> objects
3.2. namespace - (object-side capabilities) -> objects
3.3. namespace - (namespace-side capabilities) - (object-side capabilities) -> objects

Besides the fundamental combinations, there are also

  • Organizational inheritance, merge, and override
  • Contradiction: Which has higher priority when both an “allow policy” and a “denial policy” are evaluated together.

I call it a superset of the academic and NIST classifications.

The intrinsic capabilities on subjects

The subject-intrinsic, or “prior”, capabilities (inspired by Kant) exist externally, before the software system is designed. For instance

  • The static literal attributes conceptualization

    • name, mail, employee ID, job title - Invite user A into a project with the ProjectAccountant job title, mimicking real-world job functions; the appropriate permissions are then programmatically assigned to that title directly.
    • external department & group (from AuthN/IAM) - Invite user A into an externally managed LDAP group called project_accountant_data, then assign the data permission to the group directly in our system; we do nothing directly on user A.
  • The external computability: exists only when evaluated; it can be algorithmic or AI-based.

    • enrollment duration - Only allow drivers with 5 years of experience to apply via the form
    • accountability - Only those who have handled more than 50 SRE tickets can update the production database
    • security - The top 10 suspicious employees as ranked by AI will lose access to confidential data
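The difference between a static attribute and a computed one can be sketched with plain predicates. Everything here is illustrative: the `Subject` fields, the constant names, and the way the thresholds are encoded are assumptions based on the examples above.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.function.Predicate;

// Hypothetical sketch: subject-intrinsic capabilities expressed as predicates.
final class IntrinsicCapabilities {
    record Subject(String name, String jobTitle, LocalDate licensedSince, int handledSreTickets) {}

    // Static attribute: a literal value synchronized from IAM.
    static final Predicate<Subject> IS_PROJECT_ACCOUNTANT =
        s -> "ProjectAccountant".equals(s.jobTitle());

    // Computed attribute: only exists when evaluated against "today".
    static Predicate<Subject> licensedForYears(int years, LocalDate today) {
        return s -> Period.between(s.licensedSince(), today).getYears() >= years;
    }

    // Computed attribute: more than 50 handled SRE tickets.
    static final Predicate<Subject> CAN_UPDATE_PRODUCTION_DB =
        s -> s.handledSreTickets() > 50;

    public static void main(String[] args) {
        Subject a = new Subject("A", "ProjectAccountant", LocalDate.of(2018, 1, 1), 12);
        LocalDate today = LocalDate.of(2024, 1, 1);
        System.out.println(licensedForYears(5, today).test(a));
    }
}
```

Passing `today` in as a parameter keeps the predicate stateless, so it can be unit-tested with fixed dates.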

As conceptualization is a highly developed skill that relies on industry experience and iteration, there are a few considerations to keep in mind:

  • Minimize the number of types in the system design.
  • Make the predicates mutually exclusive.

Moreover, ambiguous definitions can lead to a steep learning curve, for example using “Owner”, “Manager”, and “Admin” together.

The namespace-side capabilities

The membership capability covers what a member can do inside the virtualized hierarchical container.

It can be described as

  • simple static records

    • a database schema like a project_member(p_id, project_id, user_id) table
    • replication - synchronization and mapping from LDAP groups into dedicated tenants.
  • accessing: like allow_external, allow_sign_up, allowed_ip

  • Parent-child hierarchical relationship

Membership validation or synchronization can also be implemented via a centralized authorization framework such as Traefik’s ForwardAuth or other aspect-oriented middleware.
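A membership check over static records and access settings might look like the sketch below. The record shapes mirror the project_member table and the allow_external / allowed_ip settings mentioned above; the class name and the rule that an empty IP list means "no IP restriction" are assumptions.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a namespace-side membership check.
final class NamespaceMembership {
    // Mirrors the project_member(p_id, project_id, user_id) table.
    record ProjectMember(long pId, long projectId, long userId) {}

    // Only allow_external and allowed_ip are modeled here.
    record AccessSettings(boolean allowExternal, Set<String> allowedIps) {}

    static boolean isMember(List<ProjectMember> rows, long projectId, long userId) {
        return rows.stream()
            .anyMatch(r -> r.projectId() == projectId && r.userId() == userId);
    }

    static boolean mayEnter(List<ProjectMember> rows, AccessSettings settings,
                            long projectId, long userId, String clientIp) {
        // Assumption: an empty allow-list means the IP is not restricted.
        boolean ipOk = settings.allowedIps().isEmpty()
            || settings.allowedIps().contains(clientIp);
        boolean memberOk = settings.allowExternal() || isMember(rows, projectId, userId);
        return ipOk && memberOk;
    }

    public static void main(String[] args) {
        List<ProjectMember> rows = List.of(new ProjectMember(1L, 10L, 100L));
        AccessSettings settings = new AccessSettings(false, Set.of("10.0.0.1"));
        System.out.println(mayEnter(rows, settings, 10L, 100L, "10.0.0.1"));
    }
}
```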

The object-side capabilities

Capabilities defined by the object. For instance, a document file can have

  • security classification: public, internal, confidential

  • accessing: reading, editing, sharing or execution

    • Unix File System: like chmod 600 on a non-shared file.
    • Cloud ACL: like read:ec2instance
  • schema: defines the scope and existence of an object

    • External schema: a URI like doc:///depart1/depart2/doc.txt, or a unique identifier like document#134
    • Internal schema: the granularity, rows, fields, even potential
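The chmod-style case can be sketched with bit masks. The class and constant names are hypothetical; the bit layout follows the usual Unix convention of read = 4, write = 2, execute = 1.

```java
// Hypothetical sketch of chmod-style object capabilities.
final class ObjectBits {
    static final int READ = 0b100;
    static final int WRITE = 0b010;
    static final int EXECUTE = 0b001;

    // True when every requested bit is present in the granted mask.
    static boolean allows(int granted, int requested) {
        return (granted & requested) == requested;
    }

    // Extracts the owner digit from a chmod-style mode such as 0600 (octal).
    static int ownerBits(int mode) {
        return (mode >> 6) & 0b111;
    }

    // Extracts the group digit from the same mode.
    static int groupBits(int mode) {
        return (mode >> 3) & 0b111;
    }

    public static void main(String[] args) {
        int mode = 0600; // octal literal: owner read/write, nothing for anyone else
        System.out.println(allows(ownerBits(mode), READ | WRITE));
        System.out.println(allows(groupBits(mode), READ));
    }
}
```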

Security & Compliance

In an enterprise environment, there are two considerations on permissions

  • Internal control and procedural justice => you may fail or be slow, but you must not do anything wrong.
  • Compliance policies: GDPR, SOC2, NIST

I’d recommend enumerating all capabilities and checkpoints before any development:

  • Capability and granularity, which are domain-specific.
  • Membership: Who can manage the membership.
    • Implement an explicit dialog for granting access instead of default settings. Never assign an object without the owner’s explicit consent.
  • GDPR accountability: if personal data is held, processed, and redistributed by companies, it’s better to assign each object to only one data controller who has the legal right, even though the owner might not process the data afterwards.
  • Auditability: administrative operations should be logged and audited
    • sensitive data: viewing, editing, deleting.
    • membership: granting, transferring, and revoking
    • but also remember to redact sensitive logs for users’ privacy.
  • Performance & Engineering
    • Permission storage, versioning, cache (TTL, granularity) and invalidation
    • Permission calculation like bitwise operations and pruning
    • Ensure the design is stateless so that developers can write unit tests and benchmarks independently.

And finally, check your code and design with AI SAST threat modeling: is your system design vulnerable to privilege escalation?

Permission Evaluation Examples

Evaluate with intersection and union calculations
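One way to read this: grants collected from several sources are merged with a union, and a scope that acts as a ceiling is applied with an intersection. The sketch below assumes capabilities are plain strings and that the namespace acts as the ceiling; `SetEvaluation` and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of set-based permission evaluation.
final class SetEvaluation {
    // Union: a subject gets everything granted to any of its groups.
    static Set<String> union(List<Set<String>> grants) {
        Set<String> result = new HashSet<>();
        grants.forEach(result::addAll);
        return result;
    }

    // Intersection: keep only what the other side also permits.
    static Set<String> intersect(Set<String> left, Set<String> right) {
        Set<String> result = new HashSet<>(left);
        result.retainAll(right);
        return result;
    }

    public static void main(String[] args) {
        Set<String> fromGroups = union(List.of(
            Set.of("read:doc"),
            Set.of("write:doc", "list:doc")));
        Set<String> namespaceCeiling = Set.of("read:doc", "list:doc");
        System.out.println(intersect(fromGroups, namespaceCeiling));
    }
}
```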

Inheritance

Here is an example for administrative roles

// Priority: Resource DENY > Resource ALLOW > Namespace DENY > Namespace ALLOW > Default DENY
boolean evaluateAccess(Subject user, Resource resource, String capability) {
    // Level 1: Resource-level explicit DENY (highest)
    if (hasExplicitDenyAtLevel(user, resource, capability, LEVEL_RESOURCE)) return false;

    // Level 2: Resource-level explicit ALLOW
    if (hasExplicitAllowAtLevel(user, resource, capability, LEVEL_RESOURCE)) return true;

    // Level 3: Namespace-level explicit DENY
    if (hasExplicitDenyAtLevel(user, resource, capability, LEVEL_NAMESPACE)) return false;

    // Level 4: Namespace-level explicit ALLOW
    if (hasExplicitAllowAtLevel(user, resource, capability, LEVEL_NAMESPACE)) return true;

    // Level 5: Default DENY
    return false;
}
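A self-contained version of the same priority chain, backed by an in-memory policy list, could look like this. The `Policy` record and its fields are hypothetical, and for simplicity namespace-level rows are keyed by the resource they cover rather than by a separate namespace lookup.

```java
import java.util.List;

// Hypothetical runnable sketch of the five-level priority chain.
final class LayeredEvaluation {
    enum Level { RESOURCE, NAMESPACE } // declaration order is the evaluation order
    enum Effect { ALLOW, DENY }

    record Policy(String user, String resource, String capability, Level level, Effect effect) {}

    static boolean matches(List<Policy> policies, String user, String resource,
                           String capability, Level level, Effect effect) {
        return policies.stream().anyMatch(p ->
            p.user().equals(user) && p.resource().equals(resource)
                && p.capability().equals(capability)
                && p.level() == level && p.effect() == effect);
    }

    static boolean evaluateAccess(List<Policy> policies, String user,
                                  String resource, String capability) {
        for (Level level : Level.values()) {
            // Within one level, DENY is checked before ALLOW.
            if (matches(policies, user, resource, capability, level, Effect.DENY)) return false;
            if (matches(policies, user, resource, capability, level, Effect.ALLOW)) return true;
        }
        return false; // default DENY
    }

    public static void main(String[] args) {
        List<Policy> policies = List.of(
            new Policy("alice", "doc", "read", Level.NAMESPACE, Effect.DENY),
            new Policy("alice", "doc", "read", Level.RESOURCE, Effect.ALLOW));
        System.out.println(evaluateAccess(policies, "alice", "doc", "read"));
    }
}
```

Note that with this ordering a resource-level ALLOW overrides a namespace-level DENY, which may or may not be what a compliance team expects.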

Segregation of duties (SoD)

SoD design requires matrices to evaluate risks.

  • Role-Role Matrix: Evaluate the risk of a person having multiple roles.
  • Task-Task Matrix: Evaluate the risk of the tasks being accomplished by one person.
    • Purchasing and accounting.
    • Constructing and verification.
  • Role-Task Matrix: Draw a unified diagram of permissions, especially for permission hierarchies.

Compliance is relatively expensive and time-consuming, because we insert checkpoints and audit logs at every entrypoint. Here is an SoD example.

First, create a matrix

┌─────────────────┬─────┬─────┬─────┐
│ Task \ Role     │  A  │  B  │  C  │
├─────────────────┼─────┼─────┼─────┤
│ Create Invoice  │  ✓  │  ✓  │  ✗  │
│ Approve Invoice │  ✗  │  ✗  │  ✓  │
│ View Invoice    │  ✓  │  ✓  │  ✓  │
└─────────────────┴─────┴─────┴─────┘

Then implement with checks

// e.g. you can't apply for an invoice and approve it yourself
var create = "/invoice/create";
var approve = "/invoice/approve";
// A user must not hold conflicting roles; implement checkSoD yourself, e.g. with retainAll
boolean hasConflictingRole = checkSoD(user, create, approve);
if (hasConflictingRole) {
    return "SoD violation";
}
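A possible `checkSoD` based on `retainAll` is sketched below. It assumes the role-task matrix is available as a map and that a user is represented by a set of role names; the `/invoice/view` path and the `SodCheck` class are made up for the example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an SoD check driven by the role-task matrix.
final class SodCheck {
    static final Map<String, Set<String>> ROLE_TASKS = Map.of(
        "A", Set.of("/invoice/create", "/invoice/view"),
        "B", Set.of("/invoice/create", "/invoice/view"),
        "C", Set.of("/invoice/approve", "/invoice/view"));

    // True when the combined roles grant both conflicting tasks.
    static boolean checkSoD(Set<String> roles, String first, String second) {
        Set<String> tasks = new HashSet<>();
        for (String role : roles) {
            tasks.addAll(ROLE_TASKS.getOrDefault(role, Set.of()));
        }
        Set<String> conflicting = new HashSet<>(List.of(first, second));
        conflicting.retainAll(tasks); // keep only the conflicting tasks actually held
        return conflicting.size() == 2;
    }

    public static void main(String[] args) {
        System.out.println(checkSoD(Set.of("A", "C"), "/invoice/create", "/invoice/approve"));
        System.out.println(checkSoD(Set.of("A", "B"), "/invoice/create", "/invoice/approve"));
    }
}
```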

RBAC+ACL combination

RBAC provides role-based access control, while ACL uses explicit policies at the individual resource level. Complications arise when permissions overlap. To implement the requirements, here is my pseudocode

// Check whether a user can perform a verb on a resource
public boolean can(Subject currentUser, Resource resource, String verb) {
    // Flatten all capable subjects (the user plus joined LDAP groups),
    // as exemplified in Membership enumeration
    List<Subject> subjects = accumulateJointLDAPGroups(currentUser);

    // RBAC path: role-based evaluation, inheritance may cascade
    boolean rbacAllowed = evaluateRoleBasedAccess(subjects, resource, verb);

    // ACL path: explicit assignment evaluation
    boolean aclAllowed = evaluateAccessControlList(subjects, resource, verb);

    // Priority resolution, implement case by case
    return resolvePriority(rbacAllowed, aclAllowed);
}

// test: dataA and dataB stand for /accounting/margins and /accounting/netprofiles
can(userA, dataA, "read");
can(userA, dataB, "list");
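`resolvePriority` is left open above because the rule is case by case. One possible rule is sketched here: an explicit ACL entry overrides the role-based result, and silence on both paths means deny. The three-valued `Decision` enum is an assumption introduced for this sketch, since a plain boolean cannot distinguish "denied" from "no opinion".

```java
// Hypothetical sketch of one priority rule for combining RBAC and ACL results.
final class PriorityResolution {
    enum Decision { ALLOW, DENY, ABSTAIN }

    // Assumed rule: an explicit ACL decision wins; otherwise fall back to RBAC;
    // if neither path allows, the default is DENY.
    static boolean resolvePriority(Decision rbac, Decision acl) {
        if (acl != Decision.ABSTAIN) {
            return acl == Decision.ALLOW;
        }
        return rbac == Decision.ALLOW;
    }

    public static void main(String[] args) {
        System.out.println(resolvePriority(Decision.ALLOW, Decision.DENY));
        System.out.println(resolvePriority(Decision.ALLOW, Decision.ABSTAIN));
    }
}
```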

Appendix

Alternatives to notation-based role definitions

Recently, some frameworks have focused on low-code role definitions, claiming to be pluggable, declarative, and unified. However, these methods don’t always guarantee flexibility.

The following table compares the notations across various authorization frameworks.

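┌────────────────────────────────────┬────────────────────┬───────────────┐
│ Notation                           │ Paradigm           │ Logic         │
├────────────────────────────────────┼────────────────────┼───────────────┤
│ Tries traversal: Open Policy Agent │ Declarative        │ Combinational │
│ Relation tuple: Ory Permissions    │ Declarative        │ Combinational │
│ Relation tuple: Google Zanzibar    │ Declarative        │ Combinational │
│ Spring Security Annotations        │ Declarative        │ Combinational │
│ Casbin Go-like DSL                 │ Declarative        │ Sequential    │
│ Rete algorithm: Drools             │ Mostly declarative │ Combinational │
│ JSON Schema                        │ Mostly declarative │ Sequential    │
│ Hard-coded programming             │ Imperative         │ Sequential    │
└────────────────────────────────────┴────────────────────┴───────────────┘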

The key to classifying the frameworks is to check whether they support sequential logic during policy evaluation. For instance, if-else statements imply priorities during the calculation, while bitwise operations are order-independent and do not require stored state during the evaluation.
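The distinction can be illustrated with two tiny evaluators: a first-match rule list, whose result depends on rule order, and a bitwise OR over masks, whose result does not. The names are hypothetical and the example is deliberately simplified.

```java
import java.util.List;

// Hypothetical sketch contrasting sequential and combinational evaluation.
final class LogicStyles {
    record Rule(String capability, boolean allow) {}

    // Sequential: the first matching rule wins, so reordering rules changes the result.
    static boolean firstMatch(List<Rule> rules, String capability) {
        for (Rule r : rules) {
            if (r.capability().equals(capability)) {
                return r.allow();
            }
        }
        return false; // nothing matched
    }

    // Combinational: OR-ing bit masks gives the same result in any order.
    static int combine(List<Integer> masks) {
        int result = 0;
        for (int m : masks) {
            result |= m;
        }
        return result;
    }

    public static void main(String[] args) {
        Rule allow = new Rule("read", true);
        Rule deny = new Rule("read", false);
        System.out.println(firstMatch(List.of(allow, deny), "read"));
        System.out.println(firstMatch(List.of(deny, allow), "read"));
        System.out.println(combine(List.of(4, 2)) == combine(List.of(2, 4)));
    }
}
```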

If you need if-else statements during policy evaluation, I’d recommend not torturing yourself with any low-code solution.

In recent work, I tried to use Open Policy Agent (OPA) so that I could modify the policies on the fly. Unexpectedly, my trial run revealed

  • Complexity in working with row-level SQL query filters, or nested and complicated if-then statements
  • Additional work hours for maintaining high availability of the OPA instances

After several rounds of rework, I gave up on OPA because our work was merely a translation between the “customer’s prompt” and “handcrafted code”.

The optimal design balances compliance needs, operational overhead, and team capacity; there is no unified way to notate the constraints. The only approach to reduce the expenditure is to minimize the code at the engineering level.