Text Searching and Manipulation
grep
In a nutshell, grep searches text files for the occurrence of a given regular expression and outputs any line containing a match to standard output, which is usually the terminal screen.
ls -la /usr/bin | grep -i "zip" | sort
Here we listed all the files in the /usr/bin directory with ls and piped the output into grep, which searches for any line containing the string "zip". Understanding the grep tool and when to use it can prove incredibly useful.
To search for more than one pattern at once, separate the alternatives with \|:
ls -la /usr/bin | grep -i "zip\|ZIP\|Zip\|br\|gzip"
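A couple of other grep invocations that come up constantly (an added illustration; the search path and pattern are placeholders, not from the example above):
# Same multi-pattern search using extended regular expressions (-E), so the \| escaping is not needed
ls -la /usr/bin | grep -Ei "zip|gzip|bzip2"
# Recursive, case-insensitive search with line numbers (the path is a placeholder)
grep -rin "password" /var/www/html/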
split
The split command in Unix/Linux is used to split a file into smaller pieces. This can be particularly useful for managing large files, distributing data, or simply breaking down files into more manageable parts. Here’s an overview of how to use the split command along with various options:
➜ OSCP for i in {1..1000}; do echo {$i}_Osec >> file; done
➜ split -l 100 file new_
Basic Syntax
split [OPTION] [INPUT [PREFIX]]
OPTION: Various options to specify how the file should be split.
INPUT: The input file to split. If not specified, split reads from standard input.
PREFIX: The prefix for the output files. The default prefix is "x".
Common Options
-l LINES: Split the file into pieces with LINES lines each.
-b SIZE: Split the file into pieces of SIZE bytes each. You can use suffixes like K, M, or G for kilobytes, megabytes, or gigabytes, respectively.
-C SIZE: Split the file at the specified size, but ensure that no individual line is split across pieces.
-d: Use numeric suffixes instead of alphabetic, for example x00, x01, etc.
-a SUFFIX_LENGTH: Use suffixes of length SUFFIX_LENGTH. The default is 2.
Examples
Split by Lines:
split -l 1000 largefile.txt
This command splits largefile.txt into files each containing 1000 lines. The output files will be named xaa, xab, xac, etc.
Split by Bytes:
split -b 1M largefile.txt
This splits largefile.txt into files each containing 1 megabyte of data. The output files will be named xaa, xab, xac, etc.
Split by Lines with Numeric Suffixes:
split -l 1000 -d largefile.txt
This splits largefile.txt into files each containing 1000 lines, with numeric suffixes (x00, x01, x02, etc.).
Split with Custom Prefix:
split -l 1000 largefile.txt part_
This splits largefile.txt into files each containing 1000 lines, with the prefix part_. The output files will be named part_aa, part_ab, part_ac, etc.
Split by Size While Keeping Lines Intact:
split -C 100M largefile.txt
This splits largefile.txt into files of around 100 megabytes each, but ensures that no line is split across files. The output files will be named xaa, xab, xac, etc.
Split by Bytes with Long Suffixes:
split -b 1M -a 3 largefile.txt
This splits largefile.txt into files each containing 1 megabyte of data, with suffixes of length 3 (xaaa, xaab, xaac, etc.).
Practical Use Case: Splitting a Log File
Assume you have a large log file server.log and you want to split it into smaller parts with each part containing 500 lines:
split -l 500 server.log log_part_
This command creates files named log_part_aa, log_part_ab, log_part_ac, etc., each containing 500 lines from server.log.
Combining Split Files
To combine the split files back into the original file, you can use the cat command:
cat log_part_* > combined_server.log
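To confirm that the reassembled file is identical to the original, compare checksums (a quick sanity check; this assumes md5sum is available, as it is on most Linux systems):
# Both hashes should be identical if the parts were concatenated in the correct order
md5sum server.log combined_server.log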
Conclusion
The split command is a versatile tool for dividing large files into smaller, more manageable pieces. By using various options, you can control the size, number of lines, and naming conventions of the output files to suit your needs.
sed
sed is a powerful stream editor. It is also very complex so we will only briefly scratch its surface here. At a very high level, sed performs text editing on a stream of text, either a set of specific files or standard output. Let’s look at an example:
kali@kali:~$ echo "I need to try hard" | sed 's/hard/harder/'
I need to try harder
➜ OSCP sed 's/pass/root/' txt.txt
root pass pass
rootwod pass
Pass root
Notice that only the first match on each line was replaced; without the g flag, the substitution is not global.
To replace every occurrence, add the g (global) flag:
➜ OSCP sed 's/pass/root/g' txt.txt
root root root
rootwod root
Pass root
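Two more sed idioms worth knowing (shown here as an illustration against the same sample file):
# Edit the file in place (-i) instead of printing to stdout; keep a backup copy with a .bak suffix
sed -i.bak 's/pass/root/g' txt.txt
# Delete every line that matches a pattern
sed '/root/d' txt.txt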
Cut
The cut command is a Unix/Linux utility used to extract sections from each line of input (usually files). It's commonly used to parse and retrieve specific columns of data from text files or command output. Here's an overview of how to use the cut command with various options:
echo "I hack binaries,web apps,mobile apps, and just about anything
else"| cut -f 2 -d ","
web appsBasic Syntax
cut OPTION [FILE...]
If no file is specified, cut reads from the standard input.
Common Options
-b LIST: Select only the bytes listed in LIST.
-c LIST: Select only the characters listed in LIST.
-d DELIM: Use DELIM instead of the tab character as the field delimiter.
-f LIST: Select only the fields listed in LIST.
--complement: Complement the selection. This option makes cut select all fields except the ones specified.
--output-delimiter=STRING: Use STRING as the output delimiter. The default is to use the input delimiter.
Examples
Cut by Bytes:
echo "Hello, World!" | cut -b 1-5Output:
HelloThis extracts the first 5 bytes from the input string.
Cut by Characters:
echo "Hello, World!" | cut -c 1-5Output:
HelloThis extracts the first 5 characters from the input string.
Cut by Fields:
echo "a:b:c:d" | cut -d ':' -f 2 ➜ OSCP echo "Hello : hi : World : ! : " | cut -f 2,3 -d ":"Output:
bThis extracts the second field from the input string, using
:as the delimiter.Multiple Fields:
echo "a:b:c:d" | cut -d ':' -f 1,3Output:
a:cThis extracts the first and third fields from the input string, using
:as the delimiter.Complement:
echo "a:b:c:d" | cut -d ':' --complement -f 2Output:
a:c:dThis extracts all fields except the second one from the input string, using
:as the delimiter.Change Output Delimiter:
echo "a:b:c:d" | cut -d ':' -f 1,3 --output-delimiter=','Output:
a,cThis extracts the first and third fields and changes the output delimiter to a comma.
Practical Use Case: Extracting Specific Columns from a CSV File
Assume you have a CSV file data.csv with the following content:
Name,Age,Gender,Location
Alice,30,Female,New York
Bob,25,Male,Los Angeles
Carol,28,Female,Chicago
To extract the Name and Location columns:
cut -d ',' -f 1,4 data.csv
Output:
Name,Location
Alice,New York
Bob,Los Angeles
Carol,Chicago
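cut is also handy against system files that use a fixed delimiter; for example, listing local account names and shells from /etc/passwd (an added illustration, not part of the CSV example above):
# /etc/passwd is colon-delimited; field 1 is the username and field 7 is the login shell
cut -d ':' -f 1,7 /etc/passwd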
Head
head prints the first lines of a file (10 lines by default):
head -n 5 file
tail
tail prints the last lines of a file:
tail -n 5 file
Output:
{996}_Osec
{997}_Osec
{998}_Osec
{999}_Osec
{1000}_Osec
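Beyond printing a fixed number of trailing lines, tail can also follow a file as it grows, which is handy for watching a log in real time (the path below is just an example, not tied to the file used above):
# Print new lines as they are appended to the log; press Ctrl+C to stop
tail -f /var/log/apache2/access.log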
awk
awk is a powerful programming language and command-line utility for text processing and data extraction in Unix/Linux environments. It is especially useful for working with structured text data, such as CSV files or log files. Here’s a detailed guide on how to use awk effectively:
Basic Syntax
awk 'pattern {action}' [file...]
pattern: Specifies the condition to match.
action: Specifies the commands to execute when the pattern matches.
file: Specifies the input file(s) to process. If no file is provided, awk reads from standard input.
Common Usage Patterns
Print Specific Columns:
awk '{print $1, $3}' file.txt
This command prints the first and third columns from file.txt.
Field Separator:
awk -F',' '{print $1, $2}' file.csv
This command sets the field separator to a comma and prints the first and second columns from file.csv.
Conditional Processing:
awk '$3 > 100 {print $1, $2}' file.txt
This command prints the first and second columns for rows where the third column is greater than 100.
Using Built-in Variables:
awk '{print NR, $0}' file.txt
This command prints the line number (NR) followed by the entire line ($0) from file.txt.
Pattern Matching:
awk '/error/ {print $0}' file.txt
This command prints all lines containing the word "error" from file.txt.
Advanced Examples
Summing a Column:
awk '{sum += $2} END {print sum}' file.txt
This command sums the values in the second column and prints the total after processing all lines.
Average Calculation:
awk '{sum += $2; count++} END {print sum/count}' file.txt
This command calculates and prints the average of the values in the second column.
Complex Field Separator:
awk -F '[: ]' '{print $1, $3}' file.txt
This command sets the field separator to either a colon or a space and prints the first and third columns.
Output Formatting:
awk '{printf "Name: %s, Age: %d\n", $1, $2}' file.txt
This command formats the output with specific text and formatting.
Multi-file Processing:
awk 'FNR == 1 {print FILENAME} {print $0}' file1.txt file2.txt
This command prints the filename before printing the contents of each file.
Practical Use Case: Parsing a Log File
Assume you have a log file access.log with lines in the format:
192.168.1.1 - - [01/Jan/2024:10:00:00] "GET /index.html HTTP/1.1" 200 1024
To extract and print the IP address, date, and status code:
awk '{print $1, $4, $8}' access.log
(With the full Apache combined log format, which includes a timezone field inside the brackets, the status code is typically $9 instead.)
Output:
192.168.1.1 [01/Jan/2024:10:00:00] 200
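awk's associative arrays make per-key aggregation easy. As an added illustration (not part of the original example), the following sums the response size, the last field in the sample line above, for each client IP:
# bytes[ip] accumulates the size field ($NF, the last field) for every request made by that IP
awk '{bytes[$1] += $NF} END {for (ip in bytes) print ip, bytes[ip]}' access.log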
Combining awk with Other Commands
awk can be combined with other commands using pipes. For example, to find the total number of unique IP addresses in the log file:
awk '{print $1}' access.log | sort | uniq | wc -l
You can also filter inside awk itself with an if statement:
cat file | awk '{if ($0 ~ /pass/) print}'
This prints every line containing "pass" (equivalent to awk '/pass/ {print}').
Another useful pipeline counts how often each client IP appears in an Apache access log:
➜ OSCP cat /var/log/apache2/access.log | cut -f 1 -d " " | sort | uniq -c | sort
Output
2 ::1
8 127.0.0.1
uniq
uniq removes or counts duplicate adjacent lines, which is why it is almost always paired with sort, as in the pipeline above (uniq -c prefixes each line with its count).
comm
The comm command in Unix/Linux is used to compare two sorted files line by line and produces three-column output: lines only in the first file, lines only in the second file, and lines common to both files. It is a useful tool for identifying differences and similarities between two datasets.
Basic Syntax
comm [OPTION]... FILE1 FILE2
FILE1: The first sorted file.
FILE2: The second sorted file.
Common Options
-1: Suppress the first column (lines unique to FILE1).
-2: Suppress the second column (lines unique to FILE2).
-3: Suppress the third column (lines common to both files).
--check-order: Check that the input files are sorted.
--nocheck-order: Do not check that the input files are sorted.
Examples
Basic Comparison:
comm file1.txt file2.txt
This compares file1.txt and file2.txt and outputs three columns:
Lines only in file1.txt.
Lines only in file2.txt.
Lines common to both files.
Suppress First Column:
comm -1 file1.txt file2.txt
This suppresses the first column and only shows lines unique to file2.txt and lines common to both files.
Suppress Second Column:
comm -2 file1.txt file2.txt
This suppresses the second column and only shows lines unique to file1.txt and lines common to both files.
Suppress Third Column:
comm -3 file1.txt file2.txt
This suppresses the third column and only shows lines unique to file1.txt and lines unique to file2.txt.
Find Common Lines Only:
comm -12 file1.txt file2.txt
This suppresses the first and second columns, showing only lines common to both files.
Find Unique Lines in Both Files:
comm -3 file1.txt file2.txt
This suppresses the third column, showing only lines unique to file1.txt and lines unique to file2.txt.
Practical Use Case: Comparing Lists
Assume you have two files, list1.txt and list2.txt, each containing a list of items.
list1.txt:
apple
banana
cherry
date
list2.txt:
banana
cherry
fig
grape
To find items that are in both lists:
comm -12 list1.txt list2.txt
Output:
banana
cherry
To find items that are only in list1.txt:
comm -23 list1.txt list2.txt
Output:
apple
date
To find items that are only in list2.txt:
comm -13 list1.txt list2.txt
Output:
fig
grape
Sorting Before Comparison
The comm command requires the input files to be sorted. If the files are not sorted, you can sort them before using comm:
sort file1.txt -o file1_sorted.txt
sort file2.txt -o file2_sorted.txt
comm file1_sorted.txt file2_sorted.txt
Alternatively, you can use pipes to sort and compare in one command:
comm <(sort file1.txt) <(sort file2.txt)
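The same process-substitution trick combines nicely with the column-suppression flags; for example, to list only the lines that appear in the second file (the filenames here are placeholders):
# -1 hides lines unique to the first file and -3 hides common lines, leaving lines that appear only in the second file
comm -13 <(sort old_users.txt) <(sort new_users.txt)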
Conclusion
The comm command is a simple yet powerful tool for comparing two sorted files line by line. By using various options, you can customize the output to focus on unique or common lines, making it easier to analyze differences and similarities between datasets.
diff
The diff command in Unix/Linux is used to compare the contents of two files line by line. It outputs the differences between the files in a format that can be used to create patches or to understand the changes between the two versions of a file. Here's a detailed guide on how to use the diff command along with various options:
Basic Syntax
diff [OPTION]... FILES
FILES: Two files to be compared.
Common Options
-u or --unified: Produces a unified format diff with a few lines of context. This is the most commonly used format.
-c or --context: Produces a context format diff with a few lines of context.
-i or --ignore-case: Ignores case differences in file contents.
-w or --ignore-all-space: Ignores all white space.
-B or --ignore-blank-lines: Ignores changes that just insert or delete blank lines.
-r or --recursive: Recursively compares any subdirectories found.
-q or --brief: Outputs only whether files differ, not the details of the differences.
Examples
Basic Comparison:
diff file1.txt file2.txt
This compares file1.txt and file2.txt and outputs the differences.
Unified Format:
diff -u file1.txt file2.txt
This compares the files and outputs the differences in the unified format, which shows a few lines of context around the changes.
Context Format:
diff -c file1.txt file2.txt
This compares the files and outputs the differences in the context format.
Ignore Case Differences:
diff -i file1.txt file2.txt
This ignores case differences when comparing the files.
Ignore All White Space:
diff -w file1.txt file2.txt
This ignores all white space when comparing the files.
Ignore Blank Lines:
diff -B file1.txt file2.txt
This ignores changes that only involve blank lines.
Recursive Comparison:
diff -r dir1 dir2
This recursively compares directories dir1 and dir2.
Brief Output:
diff -q file1.txt file2.txt
This outputs only whether the files differ, not the details of the differences.
Interpreting the Output
The default output format of diff can be a bit cryptic at first. Here's how to interpret it:
1,3c1,3
< line1
< line2
< line3
---
> line1
> line2 changed
> line3
1,3c1,3: Indicates that lines 1-3 in the first file are changed to lines 1-3 in the second file.
Lines prefixed with < are from the first file.
Lines prefixed with > are from the second file.
--- separates the lines from the two files.
Practical Use Case: Creating a Patch
You can use diff to create a patch file that contains the differences between two files. This patch file can then be applied to the original file to update it.
Creating a Patch:
diff -u original.txt modified.txt > patch.diff
Applying the Patch:
patch original.txt < patch.diff
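Two related operations worth knowing (shown as illustrations, with placeholder names): a patch can usually be backed out again with patch -R, and diff can generate a patch covering an entire directory tree.
# Undo a previously applied patch
patch -R original.txt < patch.diff
# Recursive unified diff of two directory trees, treating missing files as empty (-N)
diff -ruN project_old/ project_new/ > project.patch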
Conclusion
The diff command is a powerful tool for comparing files and directories. By understanding its options and output, you can effectively identify and manage differences between file versions. This is particularly useful in software development for tracking changes and creating patches.
vimdiff
vimdiff is a powerful tool that uses the Vim text editor to display the differences between two or more files side by side. It highlights the differences and allows you to interactively merge and edit files. Here's a detailed guide on how to use vimdiff:

Basic Usage
To compare two files:
vimdiff file1.txt file2.txt
To compare three files:
vimdiff file1.txt file2.txt file3.txt
To compare more files (up to four files), simply list them all in the command.
Navigating Differences
When you open files with vimdiff, each file is shown in a separate window. The differences between the files are highlighted. Here are some common commands for navigating and managing differences in vimdiff:
]c: Jump to the next change.
[c: Jump to the previous change.
do or :diffget: Get (copy) the changes from the other file into the current file.
dp or :diffput: Put (copy) the changes from the current file into the other file.
:diffupdate: Manually update the differences.
:diffoff: Turn off diff mode.
:diffthis: Turn on diff mode for the current window.
zo / zc: Open/close folded text.
Editing and Merging
In vimdiff, you can edit any of the files just like in Vim. The changes will be reflected immediately, and the differences will be updated. Here are some useful commands for editing and merging:
Copy Changes from One File to Another: Place the cursor on the change you want to copy, then use do to copy the change from the other file into the current file.
Copy Changes from Current File to Another: Place the cursor on the change you want to copy, then use dp to copy the change from the current file to the other file.
Customizing the Diff Display
You can customize how vimdiff displays differences by modifying your .vimrc configuration file. Here are some options:
Set the Number of Context Lines:
set diffopt+=context:3
This sets the number of context lines to 3.
Highlight Differences with Custom Colors:
highlight DiffAdd ctermfg=NONE ctermbg=LightBlue
highlight DiffChange ctermfg=NONE ctermbg=LightMagenta
highlight DiffDelete ctermfg=NONE ctermbg=LightCyan
This sets custom colors for added, changed, and deleted lines.
Using vimdiff with Git
vimdiff is particularly useful when used as a merge tool in version control systems like Git. To set vimdiff as the default diff tool in Git, you can use the following command:
git config --global diff.tool vimdiff
To set vimdiff as the default merge tool in Git, use:
git config --global merge.tool vimdiff
You can then use git difftool and git mergetool to resolve conflicts with vimdiff.
Practical Example
Suppose you have two files, file1.txt and file2.txt, with the following contents:
file1.txt:
apple
banana
cherry
date
file2.txt:
apple
banana
citrus
date
Running vimdiff file1.txt file2.txt will open Vim with both files side by side, highlighting "cherry" in file1.txt and "citrus" in file2.txt as the differing lines. You can then navigate to the differences and use the do or dp commands to merge the changes as needed.
Conclusion
vimdiff is a powerful and flexible tool for comparing and merging files. By leveraging Vim's extensive capabilities, you can efficiently navigate, edit, and resolve differences between files. Whether you are comparing simple text files or resolving complex merge conflicts in a version control system, vimdiff provides a robust solution for managing differences.