Managing Redis ACLs in a Multi-Tenant Environment: Ensuring Consistency Across Redis Sentinel Nodes

20205-05-20

Introduction

In a multi-tenant architecture, each tenant needs its own access controls so that services within that tenant only interact with that tenant's data. Redis Access Control Lists (ACLs) are a good way to manage these permissions. However, keeping ACLs consistent across a Redis Sentinel cluster is harder than it looks, especially when a failover promotes a slave to master.

The Initial Setup: Multi-Tenant Redis ACLs

We started with a Redis Sentinel cluster designed to provide high availability and automatic failover for our multi-tenant application. For each tenant, we created specific Redis ACLs to restrict access to keys and channels pertinent only to that tenant. This setup worked flawlessly for a while, ensuring robust security and isolation.
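
As an illustration of what a tenant-scoped ACL can look like, here is a minimal sketch. The `<tenant>:` key and channel prefix convention, the helper names `tenant_acl_rules` and `create_tenant_user`, and the choice of blocked command categories are assumptions made for this example, not our production rules. `client` is assumed to be a redis-py style connection that exposes `execute_command`, and the `&` channel patterns assume Redis 6.2 or newer.

```python
from typing import List


def tenant_acl_rules(tenant: str, password: str) -> List[str]:
    """Build ACL SETUSER rules that confine a user to one tenant's prefix."""
    prefix = f"{tenant}:"  # hypothetical naming convention: "<tenant>:<key>"
    return [
        "reset",          # start from an empty definition so re-runs are idempotent
        "on",             # enable the user
        f">{password}",   # add a password
        f"~{prefix}*",    # allow only keys under the tenant prefix
        "resetchannels",  # drop any default channel access before adding ours
        f"&{prefix}*",    # allow only pub/sub channels under the tenant prefix
        "+@all",          # allow all commands...
        "-@admin",        # ...except administrative ones
        "-@dangerous",    # ...and ones such as KEYS or FLUSHALL
    ]


def create_tenant_user(client, tenant: str, password: str) -> None:
    """Create or overwrite the tenant's user on the node `client` points at."""
    rules = tenant_acl_rules(tenant, password)
    # The user is named after the tenant here purely for illustration.
    client.execute_command("ACL", "SETUSER", tenant, *rules)
```

Note that this only changes the single node the client is connected to, which is exactly the limitation described in the next section.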

The Problem: Disappearing ACLs

Without warning, we hit a critical issue: all of our ACL users were gone, which immediately disrupted our services. While troubleshooting, we found a crucial detail in a Redis forum discussion: Redis does not automatically sync ACLs between master and slave nodes. A user created with ACL SETUSER exists only on the node where the command was run. As a result, whenever a master node failed and a slave was promoted, the newly promoted master had none of our user definitions.

The Cause: Lack of ACL Synchronization

The root of our problem was clear: our approach of creating ACLs only on the master node was insufficient. In a Redis Sentinel setup, the master can change due to failovers, and since ACLs were not synchronized to the slave nodes, the promoted master did not have the ACL configurations. This oversight meant that each failover resulted in a complete loss of user ACLs, breaking the tenant-specific access controls.
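
One way to catch this kind of drift before a failover exposes it is to compare the output of `ACL LIST` across nodes. The sketch below is an assumption-laden illustration, not part of our original tooling: the function names are hypothetical, and it expects each node's `ACL LIST` output as a list of decoded strings (for example from redis-py's `acl_list()` on a client created with `decode_responses=True`). It only compares user names, not the full rule sets.

```python
from typing import Dict, List, Set


def acl_usernames(acl_list_lines: List[str]) -> Set[str]:
    """Extract user names from ACL LIST lines such as 'user alice on ...'."""
    return {
        line.split()[1]
        for line in acl_list_lines
        if line.startswith("user ") and len(line.split()) > 1
    }


def missing_users(acl_by_node: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """For each node, report users that exist on some other node but not on it."""
    names = {node: acl_usernames(lines) for node, lines in acl_by_node.items()}
    expected: Set[str] = set().union(*names.values())
    return {
        node: expected - have
        for node, have in names.items()
        if expected - have
    }
```

An empty result means every node knows the same set of users. Any entry names a node that would lose those users' access if it were promoted to master.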

The Solution: Applying ACLs Across All Nodes

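To address this, we modified our approach to create and save ACLs on both the master and slave nodes. This ensures that all nodes in the cluster have consistent ACL configurations, regardless of which node is currently acting as the master. Saving matters too: a user created with ACL SETUSER lives only in memory until it is persisted (for example with ACL SAVE when an ACL file is configured), so an unsaved user also disappears when a node restarts.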

We implemented a Python script that connects to the Redis Sentinel, discovers all nodes, and applies the necessary ACLs to each one. This way, even during failovers, the newly promoted master will have the correct ACL configurations, preserving tenant isolation and access controls.

Key Steps in Our Solution

Connect to Redis Sentinel: The script connects to the Redis Sentinel to discover the current master and slave nodes.
Apply ACLs to All Nodes: It then applies the ACL configurations to both the master and all slave nodes, ensuring consistency.
Handle Failovers Gracefully: By having all nodes configured with the necessary ACLs, failovers no longer result in lost configurations.
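
The steps above can be sketched roughly as follows. This is a simplified illustration rather than our exact script: the function names, the `users` mapping of user name to rule list, and the `run` wiring are assumptions. It relies on redis-py's `Sentinel.discover_master` and `Sentinel.discover_slaves`, and it assumes every node has an ACL file configured, because `ACL SAVE` returns an error otherwise (pass `save=False` in that case).

```python
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, int]


def discover_nodes(sentinel, service_name: str) -> List[Node]:
    """Return the current master first, then the slaves Sentinel reports."""
    master = tuple(sentinel.discover_master(service_name))
    slaves = [tuple(s) for s in sentinel.discover_slaves(service_name)]
    return [master] + [s for s in slaves if s != master]


def apply_acls(
    nodes: List[Node],
    users: Dict[str, List[str]],
    connect: Callable[[Node], object],
    save: bool = True,
) -> Dict[Node, str]:
    """Run ACL SETUSER for every user on every node; return errors per node."""
    failures: Dict[Node, str] = {}
    for node in nodes:
        try:
            client = connect(node)
            for username, rules in users.items():
                client.execute_command("ACL", "SETUSER", username, *rules)
            if save:
                # Persist to the node's ACL file so a restart keeps the users.
                client.execute_command("ACL", "SAVE")
        except Exception as exc:  # keep going so one bad node does not block the rest
            failures[node] = str(exc)
    return failures


def run(sentinel_addrs, service_name, users, **conn_kwargs) -> Dict[Node, str]:
    """Wire the helpers to real connections (requires redis-py)."""
    # Imported here so the helpers above can be used and tested without redis-py.
    import redis
    from redis.sentinel import Sentinel

    sentinel = Sentinel(sentinel_addrs, socket_timeout=1.0)
    nodes = discover_nodes(sentinel, service_name)

    def connect(node: Node):
        host, port = node
        # conn_kwargs would carry the admin username/password, TLS options, etc.
        return redis.Redis(host=host, port=port, **conn_kwargs)

    return apply_acls(nodes, users, connect)
```

A call such as `run([("sentinel-host", 26379)], "mymaster", users, username="admin", password="...")` would use placeholder values you need to replace with your own. Because Sentinel may omit slaves it currently considers down, a script like this is best re-run periodically or whenever a node rejoins, and any non-empty result should be treated as a node that is not yet safe to promote.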

Conclusion

Managing ACLs in a Redis Sentinel cluster for a multi-tenant architecture requires ensuring that all nodes have the same ACL configurations. Our initial approach of setting ACLs only on the master node led to significant issues during failovers. By extending ACL application to all nodes, we now maintain consistent and reliable access controls, ensuring the security and isolation of tenant data.

This experience underscores the importance of understanding the intricacies of the systems we rely on and being prepared to adapt our strategies as we encounter new challenges. With the new script in place, we are confident in our ability to provide a robust and secure multi-tenant Redis environment.