Introduction
With threat actors growing more sophisticated and persistent, there is no excuse for organizations to be unprepared to respond quickly and strategically to the inevitable: cybersecurity incidents.
Among its many documented standards, the National Institute of Standards and Technology (NIST) provides structured guidelines that help organizations develop effective and consistent incident response functions within their team. These guidelines are written for analysts, responders, and other stakeholders in an effort to ensure consistency in incident handling.
This article explores the NIST guidelines for incident response and provides actionable insights for implementing best practices around incident handling.
NIST and SANS Incident Response Steps
First, it’s important to differentiate between the two most well-known incident response frameworks used in the field. The National Institute of Standards and Technology (NIST) is a U.S. federal agency that researches, develops, and maintains a number of measurement standards for science and technology to enable quality assurance and regulatory practices.
While NIST is not explicitly focused on developing cybersecurity standards (for example, it is also known for developing the national standards for units of measurement, materials science, and manufacturing), it is well known for its work helping organizations better understand, manage, and reduce their cybersecurity risk and respond to incidents.
Specifically, the NIST guidelines for incident response are detailed in the NIST Special Publication 800-61 Revision 2, titled “Computer Security Incident Handling Guide.”
On the other hand, the SANS Institute also offers a well-known and widely adopted incident response framework. While similar, SANS breaks down some of NIST’s phases into more granular steps to offer additional focus on certain areas. Despite their slight differences, both frameworks share the same overarching goal of preparing, improving, and standardizing the incident response process.
Now, we look at the NIST model and the practical steps organizations can follow to develop an incident response plan.
Preparation
Any good framework or model needs to start with a comprehensive planning and preparation stage. Within the NIST guidelines, the preparation phase is meant to map out key objectives and clear policies for an organization’s overall incident response strategy.
While an incident response team typically doesn’t handle the actual prevention of incidents, its planning and readiness determine the success of its response capabilities. Within the preparation phase, an organization ensures it has the resources and tools ready, such as communication mechanisms, contact information, incident tracking systems, and equipment to investigate and analyze artifacts.
Communication Planning
Incident response teams should have updated contact or on-call information for team members and external contacts that they may need to bring in and assign duties. In cases where an organization leverages incident response retainers, there needs to be a process in place to recruit personnel quickly and avoid bureaucratic blockers that could prevent getting the right people on the scene promptly.
Additionally, as a best practice, organizations should pre-establish primary and alternate methods of communication, including out-of-band channels in case the primary means of communication is lost or compromised. For example, if the company’s primary communication channel (such as Microsoft Teams or Slack) is found to be compromised by an inside attacker or otherwise rendered inaccessible, incident response teams may need to fall back to using secure conference bridges or encrypted chat messaging services like Telegram, WhatsApp, or Signal.
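As a rough illustration of pre-establishing primary and alternate channels, the sketch below records a communication plan as an ordered list and picks the first channel that is still usable. The `Channel` class, the `pick_channel` helper, and the ordering of the plan are hypothetical and not part of the NIST guidance; the channel names are taken from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Channel:
    name: str
    out_of_band: bool  # True if independent of corporate infrastructure


# Ordered by preference; this ordering is an illustrative assumption.
COMMS_PLAN = [
    Channel("Slack", out_of_band=False),
    Channel("Secure conference bridge", out_of_band=True),
    Channel("Signal", out_of_band=True),
]


def pick_channel(plan, unavailable, corporate_compromised=False) -> Optional[Channel]:
    """Return the first usable channel in the plan, or None if none remain."""
    for channel in plan:
        if channel.name in unavailable:
            continue  # channel is down or known to be compromised
        if corporate_compromised and not channel.out_of_band:
            continue  # skip anything that rides on corporate systems
        return channel
    return None


if __name__ == "__main__":
    choice = pick_channel(COMMS_PLAN, unavailable={"Slack"})
    print(choice.name if choice else "No channel available")
```

Keeping the plan as plain data means it can be printed and stored offline, which matters if the systems that normally host it are the ones affected.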
Equipment Planning
Digital forensics and incident response (DFIR) teams often maintain a “jump kit” or a “jump bag,” which is a preconfigured resource containing the hardware and software tools necessary to perform initial collection, analysis, and reporting during incident triage.
As a general best practice, an effective jump kit should include items like:
- Spare laptops or workstations
- Live response tools
- Memory capture tools
- Disk imaging tools
- Chain of custody forms
- Physical storage bags
- Electronic storage media
- Write blockers
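To connect the imaging tools and chain of custody forms in the list above, here is a minimal sketch of how a responder might hash an acquired image and log who handled it. It uses only Python's standard `hashlib` module; the `CustodyEntry` fields and the `record_custody` helper are hypothetical and would need to match an organization's own forms.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class CustodyEntry:
    # Field names are illustrative, not taken from any standard form.
    item: str
    sha256: str
    handler: str
    action: str
    timestamp_utc: str


def record_custody(path, handler, action):
    """Create one chain-of-custody entry for the evidence file at `path`."""
    return CustodyEntry(
        item=Path(path).name,
        sha256=sha256_of_file(path),
        handler=handler,
        action=action,
        timestamp_utc=datetime.now(timezone.utc).isoformat(),
    )
```

Recomputing the hash at each handoff and comparing it with the recorded value is one way to show that the evidence has not changed.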
Asset Management
Lastly, to effectively manage incidents and maintain visibility over the organization’s assets, response teams need access to an up-to-date asset management resource.
As a best practice, organizations need to maintain documentation of their endpoints, servers, networking equipment, and security appliances. It’s also important to maintain regular baseline images, activity benchmarks, and system backups, so that analysts and responders can comparatively identify anomalies and restore systems as necessary.
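One way to picture comparing a system against its baseline is a simple diff of file hashes. The sketch below assumes the baseline and the current state have each been captured as a mapping of file path to hash; the function name, paths, and hash values are made up for illustration.

```python
def diff_against_baseline(baseline, current):
    """Compare two {path: hash} mappings and report what changed."""
    baseline_paths = set(baseline)
    current_paths = set(current)
    return {
        "added": sorted(current_paths - baseline_paths),
        "removed": sorted(baseline_paths - current_paths),
        "modified": sorted(
            path
            for path in baseline_paths & current_paths
            if baseline[path] != current[path]
        ),
    }


if __name__ == "__main__":
    # Hypothetical data: shortened hash strings stand in for real digests.
    baseline = {"/usr/bin/ssh": "a1", "/etc/passwd": "b2", "/usr/bin/curl": "c3"}
    current = {"/usr/bin/ssh": "ff", "/etc/passwd": "b2", "/tmp/.x": "d4"}
    print(diff_against_baseline(baseline, current))
```

A modified system binary or an unexpected new file is not proof of compromise, but it gives analysts a concrete starting point.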
Detection and Analysis
How can we effectively respond to an incident if we don’t know if one has even occurred? During the Detection and Analysis phase, we rely on our previously deployed and maintained monitoring solutions, threat intelligence, detection engineering, and reporting mechanisms to alert the team on potential incidents as they occur. This phase not only allows us to determine the legitimacy, scope, and impact of incidents, but also minimizes their potential damage and spread.
Identification Sources and Analysis
NIST outlines a number of potential sources of information for incident detection and analysis, including network communication and session data collected between endpoints (network flows), alerts from various security appliances (Security Information and Event Management systems, firewalls, intrusion detection and prevention solutions, etc.), and even reports from people inside and outside of the organization.
While investing in the latest and greatest monitoring and detection tools is a solid strategy for identifying incidents, having a clear and communicated channel for staff to report anomalies and signs of incidents is equally important. This allows the incident response team to act quickly and investigate early indicators that could be catalysts for severe incidents.
Given the high volume of events and telemetry that SOC teams are flooded with, it can be challenging to discern which potential incidents require immediate investigation over the benign noise. When surveying SOC analysts on what frustrates them the most on the job, the concept of “alert fatigue” is often among the common answers.
Striking a balance between collecting enough information to support useful investigations and avoiding an excess of irrelevant data is known as tuning the signal-to-noise ratio. When scoping out detection and monitoring tools, a common but misguided reaction is to add more data by deploying more tools. SOCs should instead focus on getting the most functionality out of their current toolset and improving integration among their existing systems. This will allow for more event correlation opportunities and the ability to establish effective baselines, which can cut down on the noise and keep the focus on statistical and behavioral anomalies.
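As a simplified illustration of what a statistical anomaly against a baseline can mean, the sketch below scores an observed event count by how many standard deviations it sits from a historical baseline (a z-score). The threshold of 3.0 and the sample counts are assumptions chosen for the example; real detection tooling typically uses richer models.

```python
from statistics import mean, stdev


def zscore(baseline_counts, observed):
    """Standard deviations between `observed` and the baseline mean.

    `baseline_counts` needs at least two values for a sample standard deviation.
    """
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma


def is_anomalous(baseline_counts, observed, threshold=3.0):
    """Flag counts that deviate from the baseline by at least `threshold` sigmas."""
    return abs(zscore(baseline_counts, observed)) >= threshold


if __name__ == "__main__":
    # Hypothetical daily failed-login counts for one account.
    history = [10, 12, 11, 9, 13, 10, 12]
    print(is_anomalous(history, 12))
    print(is_anomalous(history, 40))
```

Alerting only on values that clear the threshold is one way to suppress routine variation while still surfacing sharp departures from normal behavior.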
Incident Prioritization and Notification
Just as it is important to be able to identify genuine incidents, organizations must also prioritize their response capacity. There should be structured triage procedures in place for analysts to assess the severity and potential impact of each incident to determine its urgency and the appropriate response. Clear escalation paths should also be documented to ensure that critical incidents receive immediate attention and resources.
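A structured triage procedure can be sketched as a small scoring function. The category labels below follow the functional impact, information impact, and recoverability categories described in NIST SP 800-61 Revision 2, but the numeric weights, the severity thresholds, and the escalation contacts are purely illustrative assumptions that each organization would define for itself.

```python
# Category labels follow NIST SP 800-61r2; the scores are assumed weights.
FUNCTIONAL = {"none": 0, "low": 1, "medium": 2, "high": 3}
INFORMATION = {
    "none": 0,
    "privacy breach": 2,
    "proprietary breach": 2,
    "integrity loss": 3,
}
RECOVERABILITY = {"regular": 1, "supplemented": 2, "extended": 3, "not recoverable": 3}

# Hypothetical escalation paths keyed by severity.
ESCALATION = {
    "critical": "page incident commander and notify executives",
    "high": "page on-call incident response lead",
    "medium": "assign to SOC analyst queue",
    "low": "log and review during business hours",
}


def triage(functional, information, recoverability):
    """Map the three impact ratings to a severity label."""
    score = (
        FUNCTIONAL[functional]
        + INFORMATION[information]
        + RECOVERABILITY[recoverability]
    )
    if score >= 7:
        return "critical"
    if score >= 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


if __name__ == "__main__":
    severity = triage("high", "integrity loss", "extended")
    print(severity, "->", ESCALATION[severity])
```

Writing the rules down this explicitly, in code or in a runbook, helps two analysts looking at the same incident reach the same priority.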
Containment, Eradication, and Recovery
As mentioned earlier, NIST combines containment, eradication, and recovery into a single cohesive step for recovering from and mitigating cybersecurity incidents. However, each phase is uniquely important for ensuring that systems and operations return to normalcy in a secure manner.
Containment
To put it simply, containment involves taking immediate steps to limit the scope and impact of an incident. The goal is to prevent further spread and damage while preserving evidence for later analysis. The specific containment strategy should be tailored to the scope of the incident and the nature of the asset or endpoint. Containment should be executed in a way that balances stopping the threat, preserving evidence, and maintaining business operations.
For example, network segmentation is often used to quarantine and isolate affected systems and prevent lateral movement of the threat. If an incident involves a compromised user account, access restrictions can be implemented and credentials can be reset to stop any further unauthorized access.
Eradication
Eradication involves removing the root cause of the incident – whether it be manually removing persistence mechanisms, restoring user accounts, or remediating any vulnerabilities that were exploited. While eradication might seem like the most obvious step in the incident response process, it needs to be well thought out and carefully executed to ensure that all traces of the threat are addressed.
As we’ve explored, eradication is not a standalone solution. It must be done alongside the other phases, such as containment and recovery, to ensure a comprehensive response. For example, failing to address underlying vulnerabilities or leaving breadcrumbs of an attack on a system can lead to recurring issues or even new incidents, even after restoring systems from a known-good backup. As such, the eradication process needs to be thoroughly documented and systematic. It should always be followed by careful validation and further monitoring to confirm that the threat has been fully removed.
Recovery
Recovery focuses on restoring systems to normal operation, confirming their functionality, and implementing measures to prevent future incidents.
One of the most common steps in the recovery process is restoring systems from known-good backups. This involves using clean, verified backups to return systems to their pre-incident state, so the organization can be sure that any malicious changes or persistence mechanisms are removed. For example, the best (and cheapest) way to recover from a ransomware incident is to have recent, working backups to restore from. Because of this, one of the best proactive measures for organizations is to regularly create and maintain backups. These backups must be tested before restoration to confirm their integrity and usability.
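Choosing a "known-good" backup can be expressed as a simple rule: take the most recent backup that passed verification and predates the earliest known sign of compromise. The sketch below assumes a hypothetical backup catalog; the `Backup` fields and the helper name are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Backup:
    name: str
    taken_at: datetime
    verified: bool  # True if a test restore or integrity check passed


def latest_known_good(backups, earliest_compromise) -> Optional[Backup]:
    """Newest verified backup taken before the earliest known compromise."""
    candidates = [
        b for b in backups if b.verified and b.taken_at < earliest_compromise
    ]
    return max(candidates, key=lambda b: b.taken_at, default=None)


if __name__ == "__main__":
    catalog = [
        Backup("nightly-03", datetime(2024, 3, 3), verified=True),
        Backup("nightly-04", datetime(2024, 3, 4), verified=False),
        Backup("nightly-05", datetime(2024, 3, 5), verified=True),
        Backup("nightly-06", datetime(2024, 3, 6), verified=True),
    ]
    # Assumed finding from the investigation: first attacker activity on March 5.
    chosen = latest_known_good(catalog, datetime(2024, 3, 5, 14, 0))
    print(chosen.name if chosen else "No safe backup found")
```

The cutoff depends on how well the investigation has established when the compromise began, so a backup chosen this way should still be scanned and monitored after restoration.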
Finally, the recovery phase should include proactive measures to prevent future, similar incidents, like applying patches and updates, hardening systems to a recommended benchmark, and enhancing monitoring around the affected endpoint(s).
Post-Incident Activity
Lastly, while incident management is often divided into distinct stages, it is more effective to visualize it as a continuous lifecycle. This is due to the importance of developing a feedback loop that supports an ever-evolving incident response function. Organizations need to incorporate lessons learned from each incident into their future response strategies. This includes conducting thorough post-incident reviews to analyze what happened, evaluate the effectiveness of the team’s response, and identify areas for improvement to prepare for future incidents.
A common method of fostering this improvement is by conducting “lessons learned” meetings, which should happen shortly after a major incident and possibly periodically for smaller ones. These meetings involve reviewing the incident, evaluating the effectiveness of the response, and discussing what could be improved. They should address the incident timeline, response effectiveness, and any obstacles that slowed recovery objectives.
Another important task is creating a detailed follow-up report, often referred to as a “post-mortem,” that includes a chronology of events, damage estimates, and other relevant data. This report serves as a reference for future incidents and may be important for legal proceedings or public relations.
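The chronology in a follow-up report can also feed simple response metrics. The sketch below assumes four milestone timestamps were recorded during the incident; the milestone names and the `summarize` helper are hypothetical, and real reports usually track many more events.

```python
from datetime import datetime


def summarize(milestones):
    """Build an ordered chronology and elapsed-time metrics from milestone timestamps.

    Expects the keys: initial_compromise, detected, contained, recovered.
    """

    def hours(start, end):
        return (milestones[end] - milestones[start]).total_seconds() / 3600

    chronology = sorted(milestones.items(), key=lambda item: item[1])
    return {
        "chronology": [name for name, _ in chronology],
        "hours_to_detect": hours("initial_compromise", "detected"),
        "hours_to_contain": hours("detected", "contained"),
        "hours_to_recover": hours("contained", "recovered"),
    }


if __name__ == "__main__":
    # Hypothetical timestamps for a single incident.
    report = summarize(
        {
            "detected": datetime(2024, 1, 1, 14, 30),
            "initial_compromise": datetime(2024, 1, 1, 8, 0),
            "recovered": datetime(2024, 1, 2, 18, 0),
            "contained": datetime(2024, 1, 1, 18, 0),
        }
    )
    print(report)
```

Tracking these durations across incidents gives lessons learned meetings a measurable way to tell whether the response process is improving.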
The conclusions and findings from these initiatives can justify additional budgets, improve security measures, and help in refining incident response capabilities. As the saying often attributed to Winston Churchill goes, “Never let a good crisis go to waste.”
Conclusion
The structured approach to incident management and response provided by NIST not only helps in identifying, prioritizing, and responding to incidents, but also ensures that organizations are prepared to handle crises of all sizes consistently. By focusing on preparation, detection, containment, eradication, and recovery, organizations can create a comprehensive incident response strategy that not only addresses immediate threats, but also improves their long-term security posture.
Additionally, the feedback loop that is naturally created when performing post-incident reviews and lessons learned activities ensures continuous improvement of the incident response process. This iterative and self-evolving approach allows organizations to refine their strategies, budget against their weaknesses, and stay on top of threats.
About the Author: Andrew Prince
Andrew is a seasoned and passionate security professional who brings a wealth of experience in areas such as security operations, incident response, threat hunting, vulnerability management, and cloud infrastructure security.
With a professional background in development and system administration, Andrew offers a well-rounded perspective on his security strategy. Andrew also navigates both offensive and defensive operations to provide a holistic approach to keeping people, processes, and technology secure. He is also active in developing various Capture the Flag challenges, creating security training, and sharing knowledge through content creation.