As cybersecurity professionals, we often find ourselves drowning in logs, unable to extract meaning from the sheer volume of data. This is where the trio of sed, awk, and grep comes in. Although these three command-line utilities may seem like relics from the past, they are the unsung heroes of log analysis.
Sed: The Stream Editor
Sed, short for Stream Editor, is a command-line tool for manipulating and processing text. It is like the Swiss Army knife of text manipulation. Because it edits text as a stream, one line at a time as the input arrives, it works well on log files, including live output piped in from another command. The following are some of the things you can do with sed:
- Filter out irrelevant log entries
- Extract specific fields or patterns
- Perform complex text substitutions
History of Sed
Sed was developed in the early days of Unix, when text processing was a crucial part of what a computer could do. Lee McMahon, a member of the Bell Labs team, created sed as a more flexible and efficient alternative to the existing editor. Over time, sed became a standard part of Unix systems, and various implementations and extensions followed.
Sed: Key Features
Sed’s core functionality revolves around text manipulation, offering a range of features, including:
Text filtering: Sed can select specific lines or patterns from input streams.
Text transformation: Sed can modify text by substituting, deleting, or inserting characters.
Regular expressions: Sed supports regular expressions for advanced pattern matching.
Scriptability: Sed allows users to write scripts to automate complex text processing tasks.
Here are a few examples of what you can do with sed:
Extract all log entries containing the word "error"
sed -n '/error/p' log.txt
Replace all occurrences of "oldstring" with "newstring"
sed 's/oldstring/newstring/g' log.txt
Or, using a colon instead of a slash as the delimiter:
sed 's:oldstring:newstring:g' log.txt
Delete all lines starting with "#" (comments)
sed '/^#/d' log.txt
Sed can also be used in incident response to detect SQL injection attacks. For example:
Extract all log entries containing the word "SELECT" to detect potential SQL injection attacks
sed -n '/SELECT/p' access.log
It can also be used by penetration testers to extract password hashes. For example:
Extract all log entries containing the word "password" to locate lines that may hold password hashes
sed -n '/password/p' auth.log
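To see the sed commands above in action without touching a real log, here is a minimal sketch that runs them against a throwaway file. The log contents and variable names are made up for illustration; only the sed invocations come from the examples above.

```shell
#!/bin/sh
# Hypothetical sample log; the contents are invented for this demo.
log=$(mktemp)
cat > "$log" <<'EOF'
# rotated 2024-01-01
info service started
error disk full
info user alice logged in
error connection refused
EOF

# -n suppresses automatic printing; p prints only lines matching /error/.
errors=$(sed -n '/error/p' "$log")

# Two -e expressions run in order on each line:
# first drop comment lines, then replace every "error" with "ERROR".
cleaned=$(sed -e '/^#/d' -e 's/error/ERROR/g' "$log")

printf '%s\n' "--- errors ---" "$errors" "--- cleaned ---" "$cleaned"
rm -f "$log"
```

Chaining expressions with -e keeps everything in a single pass over the file, which matters once logs get large.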
Awk: The Pattern Processing Powerhouse
Awk, named after its creators Alfred Aho, Peter Weinberger, and Brian Kernighan, is both a command-line utility and a programming language commonly used for data analysis and text processing. It lets you search, manipulate, and report on structured data, making it a powerful tool for log analysis, data mining, and file manipulation.
History of Awk
Awk’s development began in the 1970s at Bell Labs, where Aho, Weinberger, and Kernighan created the language to generalize the kind of pattern-driven text processing that tools like grep and sed offered. Awk’s design focused on simplicity, flexibility, and efficiency, allowing users to write concise and powerful text processing programs. Over time, Awk became a standard component of Unix systems, with various implementations and extensions emerging.
Awk: Key Features
Awk is a programming language in its own right, designed specifically for text processing. It’s like a supercharged version of sed, with added features like:
Pattern matching: Awk can search for specific patterns in input data.
Field manipulation: Awk can extract, manipulate, and reformat fields within structured data.
Conditional statements: Awk supports if-else statements and loops for conditional processing.
Functions: Awk allows users to define custom functions for reusable code.
Regular expressions: Awk supports regular expressions for advanced pattern matching.
Here are a few examples of what you can do using awk:
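Extract all log entries with a specific status code (404), assuming the status code is the fifth whitespace-separated field (in Apache's combined log format it is the ninth, so you would use $9)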
awk '$5 == "404"' access.log
Print the first and third fields of each line
awk '{print $1, $3}' log.txt
Calculate the sum of the second field across all lines
awk 'BEGIN {sum=0} {sum+=$2} END {print sum}' log.txt
Awk is a useful tool in incident response and can help identify brute force attacks. For example:
Extract all log entries with a 401 status code in the fifth field to identify potential brute force attacks
awk '$5 == "401"' access.log
Penetration testers can also use awk to extract sensitive data. Take the following example:
Extract the second field of each line, which contains sensitive data
awk '{print $2}' sensitive_data.log
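Field numbers only make sense relative to a known log layout, so here is a small sketch with a made-up six-field format (IP, user, method, path, status, bytes) in which the status code really is the fifth field. The format, file contents, and variable names are assumptions for illustration, not a real server's output.

```shell
#!/bin/sh
# Hypothetical log layout: ip user method path status bytes
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.10 alice GET /login 401 512
192.168.1.10 alice GET /login 401 512
10.0.0.5 bob GET /index.html 200 2048
192.168.1.10 alice GET /login 200 1024
10.0.0.7 carol GET /missing 404 256
EOF

# A pattern with no action prints every line whose fifth field is 401.
unauthorized=$(awk '$5 == "401"' "$log")

# Sum the sixth field (bytes) across all lines.
total_bytes=$(awk '{ sum += $6 } END { print sum }' "$log")

# Count 401 responses per client IP with an associative array.
per_ip=$(awk '$5 == "401" { hits[$1]++ } END { for (ip in hits) print ip, hits[ip] }' "$log")

echo "401 lines:"; echo "$unauthorized"
echo "total bytes: $total_bytes"
echo "401s per IP: $per_ip"
rm -f "$log"
```

The per-IP count is the piece that turns a list of 401 lines into something resembling brute force detection: one address with many failures stands out immediately.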
Grep: The Guardian of Patterns
Grep, which stands for Global Regular Expression Print, is a command-line utility for searching text files for patterns. Its primary purpose is to find and display lines that match a given pattern, which makes it one of the most important tools for log analysis, data mining, and file searching.
History of Grep
Ken Thompson, a member of the Unix development team at Bell Labs, created the first version of grep in the early 1970s. He was inspired by the regular expression search built into the earlier ed editor. Grep became a standard component of Unix systems, and a variety of implementations and extensions evolved over time.
Grep: Key Features
The core functionality of grep revolves around pattern matching and searching, offering a range of features, including:
Regular expressions: Grep supports regular expressions for advanced pattern matching.
Pattern searching: Grep can search for specific patterns in text files.
File searching: Grep can search for files containing specific patterns.
Line matching: Grep can display lines that contain a specific pattern.
Color highlighting: Grep can highlight matched patterns in color.
Some common examples of how grep is used include:
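Find all log entries containing a specific IP address (-F treats the pattern as a fixed string, so each dot matches only a literal dot rather than any character)
grep -F '192.168.1.100' log.txt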
Search for lines containing either "error" or "warning"
grep -E 'error|warning' log.txt
Recursively search for a pattern in all files within a directory
grep -R 'pattern' /path/to/directory
Grep is also useful in incident response and can help detect malware communication. For example:
Search for log entries containing suspicious patterns indicative of malware communication
grep -E 'command\.php|eval\(' access.log
In addition, penetration testers can use grep to find hidden backdoors in systems. For example:
Recursively search for files containing the string "bash.history" to find hidden backdoors
grep -R 'bash\.history' /home/user
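One detail worth checking for yourself is how grep treats dots in an IP address. The following sketch uses an invented four-line log to compare a plain pattern with a fixed-string search, and then runs the malware-communication pattern from above. All file contents and variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample log for this demo.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.100 GET /index.html
192.168.11100 GET /about.html
10.0.0.5 POST /command.php
10.0.0.6 GET /page?x=eval(1)
EOF

# As a regex, each unescaped dot matches any character,
# so the second line (192.168.11100) is counted as well.
loose=$(grep -c '192.168.1.100' "$log")

# -F disables regex interpretation; only the literal address matches.
strict=$(grep -cF '192.168.1.100' "$log")

# -E enables alternation with |; the dot and parenthesis are escaped.
suspicious=$(grep -E 'command\.php|eval\(' "$log")

echo "regex count: $loose, fixed-string count: $strict"
echo "$suspicious"
rm -f "$log"
```

In a real investigation the difference between the two counts is the difference between a clean result and a false positive, so -F is a sensible default for literal strings.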
RegEx: The Pattern Matching Powerhouse
Regular expressions, or RegEx, are sequences of characters that define a search pattern used to match and manipulate text. They help you find, validate, and extract data from text files, log files, and strings. Programmers, data analysts, and system administrators all need to express exactly what they are looking for, which makes regex an indispensable skill.
A pattern combines literal characters with metacharacters that carry special meaning. Note that grep and sed default to basic regular expressions, in which +, ?, and ( ) are treated as literal characters unless you enable extended syntax with -E.
Here are a few basic regex patterns to get you started:
. (dot) matches any single character
* (star) matches zero or more of the preceding element
+ (plus) matches one or more of the preceding element
? (question mark) matches zero or one of the preceding element
^ (caret) matches the start of a line
$ (dollar sign) matches the end of a line
[abc] (square brackets) matches any character within the brackets
(abc) (parentheses) groups elements and captures matches
Examples:
grep 'hello*' file.txt matches lines containing "hell" followed by zero or more "o" characters (use 'hello.*' to match "hello" followed by any characters)
grep '^hello' file.txt matches lines starting with "hello"
grep 'hello$' file.txt matches lines ending with "hello"
grep '[a-zA-Z]' file.txt matches lines containing any letter (uppercase or lowercase)
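Quantifiers and anchors are easy to misread, so it helps to count matches on a tiny file where you already know the answer. This is a sketch with made-up sample lines; the variable names are only for illustration.

```shell
#!/bin/sh
# Hypothetical sample lines for this demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
hell
hello
hellooo world
say hello
help
EOF

# The star applies to the preceding "o" only, so "hell" alone matches.
star=$(grep -c 'hello*' "$sample")

# ^ anchors the match to the start of the line.
starts=$(grep -c '^hello' "$sample")

# $ anchors the match to the end of the line.
ends=$(grep -c 'hello$' "$sample")

# With -E, + means one or more of the preceding "o",
# so at least one "o" is required and "hell" no longer matches.
plus=$(grep -cE 'hello+' "$sample")

echo "hello*: $star  ^hello: $starts  hello\$: $ends  hello+: $plus"
rm -f "$sample"
```

The gap between the star count and the plus count comes entirely from the line "hell", which is a handy reminder that a quantifier binds to the single element before it.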
RegEx in Action
RegEx is widely used in various programming languages, command-line utilities, and text editors. Some popular use cases include:
- Validating user input (e.g., email addresses, phone numbers)
- Extracting data from logs and text files
- Searching and replacing text in files and strings
- Parsing HTML and XML documents
Combining Sed, Awk, Grep and RegEx
Combining sed, awk, grep, and regex allows for powerful text processing and manipulation. By chaining these tools together, you can perform complex tasks such as data extraction, formatting, and filtering. For example, you can use grep to search for patterns in a file, then pipe the output to awk for further processing and formatting, and finally use sed to replace or delete text. RegEx can be used throughout the process to specify patterns and match text.
Here’s an example command that combines these tools:
grep -o '<pattern>' file.txt | awk '{print $2}' | sed 's/<search>/<replacement>/' | grep '<final_pattern>'
In this command, grep searches for a pattern in a file, awk extracts the second field, sed replaces text, and finally, grep searches for a final pattern. By combining these tools, you can perform complex text processing tasks with ease.
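To make the placeholder pipeline concrete, here is one possible instance of the same four stages. It assumes an sshd-style log with invented entries: grep pulls out the "from <address>" fragment, awk keeps the address, sed masks the last octet, and a final grep keeps one network. The log lines, patterns, and network prefix are all assumptions chosen for the demo.

```shell
#!/bin/sh
# Hypothetical auth log entries (addresses are from documentation ranges).
log=$(mktemp)
cat > "$log" <<'EOF'
Jan 10 sshd: Failed password for root from 203.0.113.5
Jan 10 sshd: Accepted password for alice from 198.51.100.7
Jan 11 sshd: Failed password for admin from 203.0.113.9
EOF

# 1. grep -o keeps only the matched text, e.g. "from 203.0.113.5".
# 2. awk prints the second field, the address itself.
# 3. sed replaces the final octet with 0 to group hosts by /24 network.
# 4. The last grep keeps only addresses in 203.0.113.x.
result=$(grep -o 'from [0-9.]*' "$log" \
    | awk '{print $2}' \
    | sed 's/\.[0-9]*$/.0/' \
    | grep '^203\.0\.113\.')

echo "$result"
rm -f "$log"
```

Each stage does one small job, which makes the pipeline easy to debug: remove stages from the right-hand end and inspect the intermediate output.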
Here are a few other examples of tasks you can perform by combining sed, awk, grep, and regex:
Replace a string and then search for a pattern
sed 's/oldstring/newstring/g' log.txt | grep 'pattern'
or
sed 's:oldstring:newstring:g' log.txt | grep 'pattern'
Print specific fields and then search for a pattern
awk '{print $1, $3}' log.txt | grep 'pattern'
Detecting SQL injection attacks
Identifying brute force attacks
grep -i "failed login" log.txt | grep -c "<IP_ADDRESS>"
Detecting malware communication
grep -i "tcp connection" log.txt | grep -i "unknownport"
Investigating incident response (group logs by timestamp)
awk '{count[$3]++} END {for (ts in count) print ts, count[ts]}' log.txt
Extracting sensitive data for compliance (search for "password" or "credential")
grep -oE '(password|credential)' file.txt
Monitoring system logs for suspicious activity (search logs for a specific username)
grep -i "<USER_NAME>" log.txt
Analyzing network logs for traffic patterns
awk '{count[$11]++} END {for (proto in count) print proto, count[proto]}' log.txt
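The counting idiom used in the awk examples above combines naturally with grep and sort. As a rough sketch of the brute force case, the following counts failed logins per source address and ranks them. The log format (address as the last field) and its contents are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical login log; the source address is the last field.
log=$(mktemp)
cat > "$log" <<'EOF'
Failed login from 203.0.113.5
Failed login from 203.0.113.5
failed login from 198.51.100.7
Successful login from 203.0.113.5
Failed login from 203.0.113.5
EOF

# grep -i keeps failed attempts regardless of case, awk tallies them
# by the last field ($NF), and sort -rn ranks the busiest address first.
per_ip=$(grep -i "failed login" "$log" \
    | awk '{ count[$NF]++ } END { for (ip in count) print count[ip], ip }' \
    | sort -rn)

# Count failures for a single known address, as in the example above.
single=$(grep -i "failed login" "$log" | grep -cF "203.0.113.5")

echo "$per_ip"
echo "203.0.113.5 failures: $single"
rm -f "$log"
```

Sorting matters because awk does not guarantee the order in which array keys come back from a for-in loop.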
These tools can also support incident response investigations by extracting log entries that contain a specific IP address, printing specific fields, and replacing sensitive information:
grep '192.168.1.100' access.log | awk '{print $1, $3}' | sed 's/oldstring/newstring/g'
There are also use cases for penetration testing. For example:
grep -oE "username=[^&]+" access.log | sed 's/username=//' | grep -v false
This command acts as a targeted search that pulls specific information out of a large log file called access.log.
Here’s what it does step by step:
grep -oE "username=[^&]+" access.log
This part searches for lines in the access.log file that contain the word “username” followed by an equals sign and some characters that aren’t an ampersand (&).
-o flag tells it to only show the part of the line that matches the search, rather than the whole line.
-E flag enables extended regular expressions, which is what lets + act as a "one or more" quantifier here.
The | (pipe character) sends the output of the grep command to the next command following the pipe character.
sed 's/username=//'
This takes the results from the previous search and removes the “username=” part from each line, leaving just the username itself.
The | (pipe character) sends the output of the sed command to the next command following the pipe character.
grep -v false
Finally, this filters out any lines that contain the word “false”. The -v flag inverts the search, so it shows everything except the lines with “false.”
In short, this command digs through a log file and lists every username value, except those containing the word “false.”
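Running the pipeline on a few invented request lines shows both what it does and one limitation: [^&]+ keeps matching until it hits an ampersand or the end of the line, so a username that is the last parameter drags the rest of the line along with it. The stricter character class in the second variant is a suggested tweak, not part of the original command.

```shell
#!/bin/sh
# Hypothetical request lines for this demo.
log=$(mktemp)
cat > "$log" <<'EOF'
GET /login?username=alice&password=x HTTP/1.1
GET /login?username=false&password=y HTTP/1.1
GET /login?username=bob HTTP/1.1
POST /search?q=test HTTP/1.1
EOF

# The pipeline exactly as described above.
usernames=$(grep -oE "username=[^&]+" "$log" | sed 's/username=//' | grep -v false)

# Variant (an assumption, not from the original): also stop at a space,
# so a trailing " HTTP/1.1" is not captured.
strict=$(grep -oE "username=[^& ]+" "$log" | sed 's/username=//' | grep -v false)

echo "original pipeline:"; echo "$usernames"
echo "stricter class:";    echo "$strict"
rm -f "$log"
```

Whether the stricter class is appropriate depends on how your application encodes usernames, so treat it as a starting point.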
Enhancing Automated Scripting
By combining these tools and techniques, you can create powerful automated scripts to process and analyze data. Here are some examples of how these tools can be used to enhance automated scripts:
Grep:
- Use regular expressions to match complex patterns
- Use -v option to invert matching
- Use -A and -B options to print surrounding lines
- Use -f option to read patterns from a file
Awk:
- Use conditional statements (if/else) to manipulate data
- Use loops (for/while) to process data
- Use arrays to store and manipulate data
- Use functions to reuse code
Sed:
- Use regular expressions to match and replace patterns
- Use -e option to execute multiple commands
- Use -f option to read commands from a file
- Use -i option to edit files in place
Combining tools:
- Use grep to filter data, awk to manipulate data, and sed to transform data
- Use pipes (|) to chain commands together
- Use redirection (>, >>, <) to read and write files
Scripting tips:
- Use variables to store and reuse values
- Use conditionals (if/else) to make decisions
- Use loops (for/while) to repeat tasks
- Use functions to reuse code
- Use comments (#) to document code
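Several of the scripting tips above fit into one short script. The sketch below uses a function, variables, a loop, a conditional, and comments to flag log levels that appear too often. The function name count_matches, the threshold value, and the log contents are all hypothetical.

```shell
#!/bin/sh
# count_matches PATTERN FILE
# Print how many lines of FILE match PATTERN (case-insensitive).
count_matches() {
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so the status is ignored to keep the function safe under set -e.
    grep -ciE "$1" "$2" || true
}

# Hypothetical sample log for this demo.
log=$(mktemp)
cat > "$log" <<'EOF'
2024-01-01 ERROR disk full
2024-01-01 WARNING high load
2024-01-02 error connection refused
2024-01-02 INFO backup complete
EOF

threshold=2   # assumed alert threshold
report=""
for level in error warning critical; do
    n=$(count_matches "$level" "$log")
    if [ "$n" -ge "$threshold" ]; then
        status="ALERT"
    else
        status="ok"
    fi
    report="${report}${level}=${n}:${status} "
done
report=${report% }   # trim the trailing space

echo "$report"
rm -f "$log"
```

Wrapping the grep call in a function keeps the loop readable and gives you one place to change the matching rules later.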
Conclusion
Sed, awk, and grep, together with solid regex skills, are among the most powerful tools a cybersecurity professional can have for extracting valuable insights from data and staying on top of potential security threats.
These utilities give you a solid foundation for log analysis, and with practice you can become an expert at it.
Educational resources for Sed, Awk, Grep and RegEx
Sed: https://www.gnu.org/software/sed/manual/sed.html
Awk: https://www.gnu.org/software/gawk/manual/gawk.html
Grep: https://www.gnu.org/software/grep/manual/grep.html
RegEx Online Cheatsheet: https://quickref.me/regex.html
RegEx Builder: https://regexbuildertool.com/
RegEx Tester: https://www.freeformatter.com/regex-tester.html
About TCM Security
TCM Security is a veteran-owned, cybersecurity services and education company founded in Charlotte, NC. Our services division has the mission of protecting people, sensitive data, and systems. With decades of combined experience, thousands of hours of practice, and core values from our time in service, we use our skill set to secure your environment. The TCM Security Academy is an educational platform dedicated to providing affordable, top-notch cybersecurity training to our individual students and corporate clients including both self-paced and instructor-led online courses as well as custom training solutions. We also provide several vendor-agnostic, practical hands-on certification exams to ensure proven job-ready skills to prospective employers.
Pentest Services: https://tcm-sec.com/our-services/
Contact Us: sales@tcm-sec.com