AIR Responder Architecture: overview and performance analysis

What is an AIR responder?

The AIR responder is a 40MB standalone package that acts as a virtual incident responder, delivering SOC level 3-4 expertise to your assets for cyber resilience and readiness. It interfaces with the AIR Console to execute precise, user-defined tasks, providing wide-ranging coverage with minimal resource use and without the need for constant monitoring.

The AIR responder communicates regularly with the AIR Console using HTTP polling; we call each poll ‘a visit’. The visit interval is normally about 30 seconds for environments with fewer than 1000 assets. For larger environments, the interval is calculated using the following formula:

intervalSeconds = MANAGED_ENDPOINT_COUNT / 100

For instance, in a scenario with 5000 assets, the calculated visit interval would be 50 seconds.
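The interval rule above can be written as a small function. This is only a sketch of the rule as stated in this article, not the responder's actual code; the function name is hypothetical. The text does not say how the two rules meet for environments of between 1000 and 3000 assets (where the formula yields less than 30 seconds), so the sketch simply applies the formula literally from 1000 assets upward.

```python
def visit_interval_seconds(managed_endpoint_count: int) -> float:
    """Return the visit interval, in seconds, as described in the text above."""
    if managed_endpoint_count < 0:
        raise ValueError("asset count cannot be negative")
    if managed_endpoint_count < 1000:
        # "About 30 seconds" for environments with fewer than 1000 assets.
        return 30.0
    # intervalSeconds = MANAGED_ENDPOINT_COUNT / 100
    return managed_endpoint_count / 100


if __name__ == "__main__":
    for assets in (500, 5000, 20000):
        print(assets, "assets ->", visit_interval_seconds(assets), "seconds")
```

For 5000 assets this returns 50 seconds, matching the example above.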

The responder sends these visit requests to tell the AIR Console that it is online and ready to receive any task assignments that are waiting to be run.

If the responder does not make a visit at the required interval, it is shown as offline in the AIR Console.

If the responder does not make a visit for 30 days, it is marked as unreachable. This status clears as soon as the asset is back online.

If a task assignment is not collected by the responder within 30 days of its creation, it will expire and will not be actioned even when the asset reconnects and the responder visits next.
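The status and expiry rules above can be sketched as follows. This is an illustration under assumptions, not the AIR Console's logic: the function names are hypothetical, and the sketch treats a single missed interval as "offline", whereas a real console may allow some grace period.

```python
from datetime import datetime, timedelta

UNREACHABLE_AFTER = timedelta(days=30)  # no visit for 30 days
TASK_EXPIRY = timedelta(days=30)        # task not collected within 30 days


def responder_status(last_visit: datetime, now: datetime, visit_interval_seconds: float) -> str:
    """Classify a responder from the time elapsed since its last visit."""
    silence = now - last_visit
    if silence >= UNREACHABLE_AFTER:
        return "unreachable"
    if silence > timedelta(seconds=visit_interval_seconds):
        return "offline"
    return "online"


def task_expired(created_at: datetime, now: datetime) -> bool:
    """Return True if an uncollected task is past its 30-day lifetime."""
    return now - created_at >= TASK_EXPIRY


if __name__ == "__main__":
    now = datetime(2024, 1, 31, 12, 0, 0)
    print(responder_status(now - timedelta(minutes=5), now, 30))
    print(task_expired(now - timedelta(days=31), now))
```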

How does the AIR responder work?

Simply put, when the AIR responder collects a task assignment from the AIR Console, it carries out the task and reports back to the AIR Console upon completion. When the AIR responder is idle, it periodically (as discussed above) sends visit requests to the AIR Console to check whether any new tasks have been assigned to it. During these visit requests, the AIR responder only checks for task assignments and does not perform any other operations.
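The visit, execute, and report cycle described above can be sketched with an in-memory stand-in for the Console. Everything here is hypothetical and for illustration only: the class and function names are not part of AIR, no network calls are made, and the real HTTP endpoints are not described in this article.

```python
from collections import deque


class FakeConsole:
    """In-memory stand-in for the AIR Console (hypothetical, illustration only)."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.reports = []
        self.visits = 0

    def visit(self):
        """Record a visit and hand out the next pending task, if any."""
        self.visits += 1
        return self.pending.popleft() if self.pending else None

    def report(self, task, result):
        """Store the result the responder sends back after finishing a task."""
        self.reports.append((task, result))


def run_responder(console, execute, max_visits, interval_seconds=30, sleep=lambda seconds: None):
    """The responder always initiates contact; the console never pushes tasks."""
    for _ in range(max_visits):
        task = console.visit()          # an idle visit only checks for tasks
        if task is not None:
            console.report(task, execute(task))
        sleep(interval_seconds)         # no-op by default so the sketch runs instantly


if __name__ == "__main__":
    console = FakeConsole(["acquisition", "triage"])
    run_responder(console, lambda task: f"{task} done", max_visits=4)
    print(console.visits, console.reports)
```

Passing `time.sleep` as the `sleep` argument would make the loop wait for the real visit interval between polls.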

The AIR responder is capable of executing various tasks when assigned by the AIR Console. These tasks include:

  • Acquisition

  • Triage scanning (YARA, Sigma, osquery, MITRE ATT&CK)

  • Isolation

  • interACT sessions

  • Auto Tagging

  • Disk/Volume Imaging

  • Investigation (Timeline)

  • Baseline

  • Log Retrieval

  • Certificate Authority Update

  • Migration

  • Reboot

  • Shutdown

  • Update

  • Uninstall

Both the Acquisition and Disk/Volume Imaging tasks can upload collected evidence to external repositories such as Amazon S3, Azure Blob Storage, FTPS, SFTP, and SMB. This lets the AIR responder securely transfer acquired evidence or disk images to the designated repository for storage and further analysis.

Storing evidence in a central repository also makes the acquired data easier to access, share, and analyze in a secure and scalable manner.

NB: AIR v4.7 (and later) has an option for the Windows, macOS, and Linux AIR responders to transmit evidence collections directly to external evidence repositories, which minimizes the use of local disk space.

How is the AIR responder secured?

The AIR responder maintains robust security by implementing a range of measures including:

Encrypted Traffic: The traffic between the AIR responder and the AIR Console, as well as between the AIR responder and any evidence repositories, is encrypted with TLS 1.2, or with TLS 1.3 if it is available on the server. If neither of these TLS versions is available, the connection is not established. This ensures that data in transit is protected against interception and unauthorized access.

Communication: The AIR Console never initiates the sending of task assignments to the AIR responder. Instead, the AIR responder initiates every interaction by asking the AIR Console whether it has any task assignments ready for it to run. This approach significantly reduces the risk of various attacks, as it controls the communication flow and reduces the AIR responder's exposure to external threats.

Privileged Account Usage: On macOS and Linux, the AIR responder uses the root account, while on Windows, it uses the system account. This level of access control makes it difficult for other users to tamper with the application, thereby enhancing its security.

Regular Internal Penetration Testing: Before every release, our internal security team conducts thorough penetration testing. This proactive approach helps identify and mitigate potential vulnerabilities.

Secure Libraries and Third-Party Applications: We consistently use up-to-date libraries and third-party applications that are free of known vulnerabilities. Keeping software components current protects against known security vulnerabilities.

Supply Chain Attack Prevention: Measures are in place to protect against supply chain attacks, and these are continuously improved by our DevOps team. This is crucial to prevent threats that could compromise the software development and deployment process.

Continuous Source Code Scanning: The source code is regularly scanned by security tools. This constant monitoring helps to quickly identify and resolve any security issues that arise in the codebase.

Digital Signing: The use of digital signatures adds a layer of security, ensuring the authenticity and integrity of our software. This helps to prevent tampering and to verify that the software has not been altered after it was signed.

Blackbox Analysis: The AIR responder binary has undergone Blackbox analysis, a method of testing the software’s external behavior without examining its internal structure. It helps identify security vulnerabilities from an outsider’s perspective, providing a critical view of the system's external defenses.

Graybox Analysis: Graybox analysis has also been conducted on the AIR responder. This testing method combines internal and external examination of the software, providing a more comprehensive security overview.
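To illustrate the Encrypted Traffic measure above, the following sketch shows how a client can refuse anything older than TLS 1.2 using Python's standard ssl module. It is not the responder's own code, and the function name is hypothetical; it only demonstrates the "TLS 1.2 or newer, otherwise no connection" policy.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects TLS 1.1 and older."""
    context = ssl.create_default_context()            # verifies certificates and hostnames
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # handshake fails below TLS 1.2
    return context


if __name__ == "__main__":
    context = make_client_context()
    print(context.minimum_version)
```

With such a context, TLS 1.3 is negotiated when both sides support it, and a server offering only older protocol versions causes the handshake to fail.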

Are databases used by the AIR responder?

The AIR responder does not depend on a database server. Instead, it saves each report as an individual SQLite file, which is subsequently forwarded to the AIR Console. This approach simplifies data handling and enables efficient, secure storage and transfer of information.
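The idea of one SQLite file per report can be sketched with Python's built-in sqlite3 module. The table layout, file naming, and function names below are assumptions made for illustration; the responder's real report schema is not described in this article.

```python
import json
import sqlite3
from pathlib import Path


def write_report(directory, task_id, findings):
    """Write one self-contained SQLite file for a task (hypothetical schema)."""
    path = Path(directory) / f"{task_id}.db"
    connection = sqlite3.connect(path)
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "CREATE TABLE IF NOT EXISTS findings (name TEXT, payload TEXT)"
            )
            connection.executemany(
                "INSERT INTO findings (name, payload) VALUES (?, ?)",
                [(name, json.dumps(value)) for name, value in findings.items()],
            )
    finally:
        connection.close()
    return path


def read_report(path):
    """Load the findings back from a report file."""
    connection = sqlite3.connect(path)
    try:
        rows = connection.execute("SELECT name, payload FROM findings").fetchall()
    finally:
        connection.close()
    return {name: json.loads(payload) for name, payload in rows}
```

Because each report is a single file, it can be transferred or archived as a unit without a database server on the asset.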

What is the Binalyze design process?

We continuously improve our development process by following the Scrum methodology, complemented by unit and integration testing. Using both unit and integration testing is crucial for maintaining high quality standards and ensuring that each component of our product functions correctly, both individually and as part of the whole system.

Resource monitoring for the AIR responder

After the initial installation, it is normal to observe a small amount of memory being allocated, typically around 30MB to 40MB, with no significant CPU or disk usage during idle states. This behavior is expected and can be attributed to the necessary resources required for the AIR responder to function properly.

During idle states, the AIR responder remains in standby mode, pending its next visit to the AIR Console to collect any new task assignments. The allocated memory is used to maintain the AIR responder's core functionality and to ensure prompt responsiveness when new tasks are assigned.

When the AIR responder receives an acquisition task, the evidence collection process is carried out by a sub-process called Tactical (or Incident Response Evidence Collector on Windows). During the acquisition process, it is normal to observe increased CPU and memory usage as the Tactical sub-process actively collects and processes the evidence.

The increase in CPU and memory usage is a result of the intensive data gathering and analysis performed by the Tactical sub-process. It utilizes system resources to efficiently capture and process the required evidence, ensuring the integrity and completeness of the collected data.

The extent of CPU and memory usage during the acquisition task may vary depending on factors such as the size and complexity of the evidence being collected. Once the acquisition is completed, the CPU and memory usage will typically return to normal levels, reflecting the completion of the resource-intensive task.

To limit CPU usage during acquisition or triage tasks, you can set a CPU policy in the AIR Console before execution that restricts the maximum CPU usage to a specified percentage. Setting a lower percentage may extend task completion times.

A Triage task does not involve running the Tactical sub-process for evidence collection. Instead, the Triage task is executed within the AIR responder, utilizing its internal capabilities to analyze and evaluate the collected data.

While the CPU usage for a Triage task may typically be low, it is still possible to set a CPU policy for the Triage task.

The log file of the running AIR responder provides valuable information about CPU usage, memory usage, and other system resources. Here is an example of the log entries about system and service resources:

  INFO 2024-01-04 18:45:25+03:00 2.31.2 triage: resmon: SysStats{GoHeapAlloc: 2.3 MB, GoHeapSys: 12 MB, NumGoroutines: 27, NumCPU: 16} file:pkg/resmon/handlers.go:16 func:resmon.(*LoggingStatsHandler).HandleSysStats

  INFO 2024-01-04 18:45:26+03:00 2.31.2 triage: resmon: PidStats{PID: 9460, Name: AIR.exe, CPU: 14.7%, AvgCPU: 25.9%, Mem: 56 MB, NumFDs: 341, NumCPU: 16} file:pkg/resmon/handlers.go:21 func:resmon.(*LoggingStatsHandler).HandlePidStats

On Windows, the log file for the AIR responder can be found at the following location:

C:\Program Files (x86)\Binalyze\AIR\agent\AIR.log.txt

You can navigate to this path on your system to access the log file and view the CPU usage, memory usage, and other resource information logged by the AIR responder during its operation.
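If you want to extract the figures from these resource log entries programmatically, a small parser is enough. The sketch below is based solely on the two sample entries shown above; the function names are hypothetical, and the log format may differ between responder versions.

```python
import re

# Matches "SysStats{...}" or "PidStats{...}" and captures the text inside the braces.
STATS_PATTERN = re.compile(r"(?P<kind>SysStats|PidStats)\{(?P<body>[^}]*)\}")

SAMPLE = (
    "INFO 2024-01-04 18:45:26+03:00 2.31.2 triage: resmon: "
    "PidStats{PID: 9460, Name: AIR.exe, CPU: 14.7%, AvgCPU: 25.9%, "
    "Mem: 56 MB, NumFDs: 341, NumCPU: 16} file:pkg/resmon/handlers.go:21 "
    "func:resmon.(*LoggingStatsHandler).HandlePidStats"
)


def parse_resmon_line(line):
    """Return (kind, fields) for a resmon stats entry, or None for other lines."""
    match = STATS_PATTERN.search(line)
    if match is None:
        return None
    fields = {}
    for item in match.group("body").split(", "):
        key, _, value = item.partition(": ")
        fields[key] = value
    return match.group("kind"), fields


def percent(value):
    """Convert a value such as '14.7%' to the float 14.7."""
    return float(value.rstrip("%"))


if __name__ == "__main__":
    kind, fields = parse_resmon_line(SAMPLE)
    print(kind, fields["Name"], percent(fields["CPU"]), fields["Mem"])
```

Values are kept as strings because units vary between fields (percentages, MB, plain counts); convert only the fields you need.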

These usages were observed on a system equipped with an Intel Core i7-10875H running at 2.30GHz (with 16 processors) and 32GB of memory (Windows 10 Pro).

Similar scenarios can be observed on macOS with the built-in Activity Monitor application. To access detailed process information, simply click on the (i) button within the Activity Monitor.

On Linux, an alternative option for resource monitoring is to use htop instead of the built-in app top. The htop option offers enhanced capabilities and can be installed by following these steps:

  1. Open the terminal.

  2. Run the command: sudo apt-get install htop (for Ubuntu/Debian-based distributions) or sudo yum install htop (for CentOS/Fedora-based distributions).

  3. Once installed, type htop in the terminal and press Enter to launch the application.

Using htop provides a more comprehensive and user-friendly interface for monitoring system resources on Linux.

Resource Monitoring with resmon

There is also a CLI tool named resmon specifically developed for internal usage. It can be used to gather resource usage data related to the AIR responder and its subprocesses, storing them in a local database.

By default, resmon will monitor the AIR responder if no flags are given. However, you can monitor other processes by providing a PID flag or a process name flag. For more detailed information on its usage, a usage document for resmon can be provided upon request.

The information collected by resmon is stored in a local database, which includes numerous entries for the monitored process and its subprocesses. Due to the abundance of entries with comprehensive details, reading and interpreting the data can be challenging.

To address this, a script has been developed alongside resmon to visualize these outputs. It displays the CPU and memory usage of the processes (including subprocesses) monitored by resmon in a graphical format.

In the following sections, we share the resmon results recorded while the AIR responder executed various task assignments. Throughout each task, resmon continuously monitored the AIR responder and its subprocesses, storing the resource measurements in its local database.

For easy visualization, we use a feature of resmon that presents CPU and memory usage as graphs. These graphs show the resource utilization of the AIR responder and its subprocesses from the beginning to the end of each task assignment.

The following scenarios were observed on a system equipped with an Intel Core i7-10875H running at 2.30GHz (with 16 processors) and 32GB of memory (Windows 10 Pro).

Analysis of an Acquisition Task

Below, you will find two graphs illustrating the CPU and Memory usage of the AIR responder. These graphs represent the resource utilization from the moment an acquisition task is started through to its completion.

Duration | Report Size (Zipped) | Database Size | Event Record Count | Drone | Total Disk Space | Used Disk Space
06m29s | 199 KB | 38 KB | 10091 | Enabled | 512 GB | 176 GB

Analysis of an Acquisition Task (with CPU limit of 50%)

In this scenario, we will examine the CPU and Memory usage of the AIR responder while running tasks received from the AIR Console, with a specific condition: the CPU usage of the AIR responder is limited to 50%.

This limitation is possible due to the capability of the AIR responder to control and restrict its CPU utilization during task execution.

The visualized graphs provided below depict the resource utilization, specifically focusing on the CPU and Memory usage of the AIR responder. These graphs showcase the performance of the AIR responder, highlighting its ability to effectively manage the CPU allocation while carrying out tasks received from the AIR Console.

Because the visualization script aggregates the CPU usage of the process and its subprocesses, the graphs can occasionally show brief spikes above the configured CPU limit.

Duration | Report Size (Zipped) | Database Size | Event Record Count | Drone | Total Disk Space | Used Disk Space
06m48s | 200 KB | 39 KB | 10102 | Enabled | 512 GB | 176 GB

Analysis of a Triage Task

Let’s examine the resource usage of the AIR responder when a Triage task is received from the AIR Console.

Duration | Triage Rule Type | Total Disk Space | Used Disk Space | CPU Limit
19m33s | YARA | 512 GB | 176 GB | 100%

Analysis of a Triage Task (with CPU limit of 50%)

As with an acquisition task, a Triage task can also be run with a CPU limit applied to the AIR responder. The following graphs illustrate the resource usage of a Triage task running with a CPU limit of 50%.

Duration | Triage Rule Type | Total Disk Space | Used Disk Space | CPU Limit
27m09s | YARA | 512 GB | 176 GB | 50%
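To put the four reported durations side by side, the sketch below converts them to seconds and computes how much longer each task took with the 50% CPU limit. The helper names are hypothetical. Each figure comes from a single run on the test system described above, so treat the percentages as indicative only.

```python
import re


def parse_duration(text):
    """Convert a duration such as '06m29s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def slowdown_percent(unlimited, limited):
    """Return how much longer the CPU-limited run took, as a percentage."""
    base = parse_duration(unlimited)
    return (parse_duration(limited) - base) / base * 100


if __name__ == "__main__":
    print("Acquisition:", round(slowdown_percent("06m29s", "06m48s"), 1), "% longer")
    print("Triage:", round(slowdown_percent("19m33s", "27m09s"), 1), "% longer")
```

In these runs, the acquisition task took about 4.9% longer with the 50% limit, while the YARA triage task took about 38.9% longer.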
