Before we dive into the technical details, let’s first briefly discuss what Audit Trails are.
An Audit Trail (or Audit Log) is a chronological account of the operations that someone has performed. Each audit record captures at least this basic information:
Operation that was performed;
Identity of the user who performed the operation;
Resource on which the operation was performed;
Time at which the operation was performed.
As you can see, each audit record is essentially about who did what, and when.
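The four fields above map directly onto a simple data type. The sketch below uses illustrative class and field names, not Harness's actual schema:

```java
import java.time.Instant;

// Minimal audit record: who did what, on which resource, and when.
// Class and field names are illustrative, not Harness's actual schema.
class AuditRecord {
    final String userId;     // identity of the user
    final String action;     // operation performed, e.g. "CREATE"
    final String resourceId; // resource the operation acted on
    final Instant timestamp; // time of the operation

    AuditRecord(String userId, String action, String resourceId, Instant timestamp) {
        this.userId = userId;
        this.action = action;
        this.resourceId = resourceId;
        this.timestamp = timestamp;
    }
}
```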
Auditing helps in:
Regulatory compliance, as it’s a kind of documentary evidence that can be used to track changes.
Detecting security violations.
Audit Trails are an inbuilt feature of the Harness platform. Most of the actions that are performed on the Harness platform generate an audit record. See the below screenshot.
Note: The Audit Trails UI will be released in Q4 ending Jan 2022. Audit Trails can be accessed via API at present.
If you want more information on the basics of Audit Trails, please feel free to read our 101-level blog post. The rest of this post is a technical deep dive into the high-level design of how we implemented Audit Trails in Harness.
In Harness, we have a microservices-based architecture with multiple domain-specific microservices. For example, we have a CI microservice, a CD microservice, and so on, each of which needs to maintain this kind of audit trail. For this reason, Audit Trails is built as a platform-level feature inside Harness, since it's common functionality required by many microservices.
At a high level, the auditing system consists of a central dedicated microservice, which is responsible for everything related to audits. On the other side are clients (other domain-specific microservices, such as CI and CD in this case) that record the operations happening inside them as audits with the auditing service.
This microservice is primarily responsible for:
Providing query & filter capabilities on audits.
Deleting audits beyond the retention period.
Enforcing access control on audits.
Some of the salient features of this service are:
It’s a Dropwizard application, as are most of the other microservices in Harness.
It exposes all its functionality via REST APIs. No GraphQL.
All the APIs are access controlled to ensure only authorized users can read audits.
A background job runs inside this service to ensure that audits beyond the retention period are deleted periodically.
Multiple instances of this service run in production for high availability and scalability.
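The core of the retention cleanup mentioned above can be sketched as a pass that keeps only audits within the retention window. This is an in-memory illustration; the real job would issue a delete query against the data store, and the retention period used here is an assumed value:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the retention job's core logic: keep only audits whose
// timestamp falls within the retention window. In production this
// would be a delete query in the data store, not an in-memory filter.
class RetentionPolicy {
    static List<Instant> withinRetention(List<Instant> auditTimes, Duration retention, Instant now) {
        Instant cutoff = now.minus(retention);
        return auditTimes.stream()
                .filter(t -> !t.isBefore(cutoff))
                .collect(Collectors.toList());
    }
}
```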
Data storage is the most interesting aspect of this service. The Auditing service uses MongoDB as its storage backend. These are the few reasons for this choice:
Non-Relational Data: Audit records don't have relationships among them. One record doesn't refer to another, so foreign key relationships are not required.
ACID: There is no requirement to maintain consistency across multiple audit records; they are independent of each other. A data store that provides ACID properties at the individual record level is sufficient.
Operational Overhead: Harness has rich experience operating MongoDB. Most of the microservices in Harness use MongoDB Atlas as their data store, so a different data store would have added operational complexity.
Scalability: As with most NoSQL DBs, MongoDB provides sharding out of the box, which may be helpful as the volume of data increases. Relational DBs are difficult to scale once data grows beyond what fits on a single machine, because of features such as ACID transactions and foreign key constraints.
Schemaless: Audit records don't have a fixed schema. Many fields like Timestamp, Action, User, Resource ID, etc. are common to all audit records, but there can be many additional fields that depend on the kind of operation being audited.
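For example, an audit document might combine the common fields with operation-specific ones; the field names and values below are illustrative, not Harness's actual schema:

```json
{
  "timestamp": "2022-01-01T10:15:00Z",
  "action": "SECRET_UPDATED",
  "userId": "user-1",
  "resourceId": "secret-7",
  "secretManager": "vault"
}
```

A ProjectCreated audit in the same collection would share the first four fields but carry different additional ones, which is exactly what a schemaless store accommodates.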
As mentioned earlier, clients are other microservices that want to capture the operations being executed inside them as an audit trail.
Registering the audit is easy: just call the API of the auditing service. However, the most important technical consideration on the client side is how to capture the operation as an audit record in the first place.
Let's look at a few options:
Embed Audit Code in the Service Business Logic: Each service method can create an audit record and save it with the auditing service. In this approach, audit logging code and business logic get mixed up, which makes the code hard to maintain. Developers can also forget to write the audit logging code, making this approach error-prone.
Aspect-Oriented Programming (AOP): Use an AOP framework such as Spring AOP to intercept method calls and save audits. However, the AOP framework only has access to the method name and arguments, so it's difficult to determine the object on which the operation is being performed and to generate an enriched, business-oriented audit record.
Change Data Capture: Derive the operations that happened from the transaction log maintained by the database, and generate audit records from them. This approach is tightly coupled to the underlying DB technology. Frameworks like Debezium simplify change data capture by abstracting away DB-specific implementations, but the approach still carries high operational and maintenance overhead (code to translate DB row-level changes into high-level, business-oriented audit records), which we would like to avoid.
We would prefer a standard way to capture audits across microservices even though they might be built upon different tech stacks. Enter Domain Events.
Domain Events are a very important tool in Domain-Driven Design. They record business-significant occurrences: ProjectCreated, AccountDeleted, and SecretUpdated are examples of Domain Events. Other business workflows can be triggered based on these events.
A Domain Event contains all relevant information. For example, a ProjectCreated event would have accountId, id, name, and description properties. Other common properties such as userId, timestamp, etc. can be populated from the request context.
It's extremely important that changes and the corresponding Domain Events are saved together in the same DB transaction. Otherwise, there is a possibility that the changes are persisted, but the client crashes before the Domain Events can be persisted, and the events are lost.
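This "save both or neither" rule can be sketched with an in-memory stand-in for the DB transaction; the store and event names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// In-memory sketch of "save the change and its Domain Event atomically".
// A real implementation would wrap both writes in one DB transaction;
// the rollback here simulates the transaction aborting.
class ProjectStore {
    final List<String> projects = new ArrayList<>();
    final List<String> domainEvents = new ArrayList<>();

    void createProject(String name) {
        projects.add(name);
        try {
            domainEvents.add("ProjectCreated:" + name); // same "transaction"
        } catch (RuntimeException e) {
            projects.remove(name); // roll back so the two never diverge
            throw e;
        }
    }
}
```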
Once Domain Events are persisted, a background job notifies interested listeners about their occurrence so that they can act on them. Interested listeners can be within the same microservice or in other microservices. If another microservice is interested in an event, a dedicated listener can publish it to a topic on a message broker to which the other services subscribe.
Audit Records from Domain Events
Once you have Domain Events in place, it's very easy to generate audit records from them: have a listener that consumes the Domain Events, converts them to audit records, and saves them with the auditing service.
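Such a listener boils down to a mapping from event to audit payload plus a call to the auditing service's API. The interface, event encoding, and names below are hypothetical:

```java
// Hypothetical listener that converts a Domain Event into an audit
// record and registers it with the auditing service.
class AuditEventListener {
    interface AuditServiceClient {
        void save(String auditRecord); // a REST call in production
    }

    private final AuditServiceClient client;

    AuditEventListener(AuditServiceClient client) {
        this.client = client;
    }

    // Convert an event like "ProjectCreated:demo" into an audit
    // payload and save it with the auditing service.
    void onEvent(String event) {
        String[] parts = event.split(":", 2);
        String action = parts[0].equals("ProjectCreated") ? "CREATE" : parts[0];
        client.save("AUDIT|" + action + "|" + parts[1]);
    }
}
```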
Advantages of This Approach
Low Maintenance: Domain Events are part of the domain and are captured at the business layer, so it's very easy to translate them into audit records. Service-layer code is not polluted with audit logging code.
No Need for Middleware: Domain Events are persisted in the service's data store first, which acts as a persistent queue. An event continues to be dispatched to the listeners until all of them have processed it successfully. Hence, even if the auditing service is unavailable for some time, it's not an issue: the event keeps being retried until the audit is successfully registered!
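This at-least-once delivery can be sketched as a dispatcher that retries until the listener succeeds. It is a simplification: the real background job would also persist dispatch progress and back off between attempts.

```java
// Sketch of at-least-once dispatch: keep retrying a listener until it
// processes the event successfully. Returns the attempt that succeeded.
class RetryingDispatcher {
    interface Listener {
        void handle(String event) throws Exception;
    }

    static int dispatch(String event, Listener listener, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                listener.handle(event);
                return attempt; // processed successfully
            } catch (Exception e) {
                // in production: back off, then retry on the next run
            }
        }
        throw new RuntimeException("listener did not succeed within " + maxAttempts + " attempts");
    }
}
```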
We hope you enjoyed this technical Audit Trails walkthrough. Audit Trails are necessary in many industries and are simply a best practice in CI/CD. Again, feel free to read our entry-level post on Audit Trails to find out more about why you need them.
Interested in more Governance-focused content? Read our pieces on RBAC and Secrets!