Beginning Linux: Thought Shift Needed

Posted by JD 02/15/2012 at 03:00

Updated: June 2021

Transitioning from Windows or OSX to Linux can be difficult because the OSes have different philosophies. The idea that there needs to be a single app that does x+y+z+a+b+c isn’t how Linux/UNIX apps have been built historically. But that is the way that MS-Windows apps are built, so confusion reigns for recent converts.

In UNIX, tools are generally built to do 1 thing really well, then you chain those tools together to accomplish your desired final goal. That requires a thought shift and a tiny bit of mental exercise.

There is another good article out there about how Linux is not Windows. Much to learn, and many thought shifts are needed.

Exercise A – Counting Lines and Words in a File

As an example, how would you count all the words or lines in a file?

Windows Way

In Windows, you’d probably open the file in a word processor and look in the footer for the line count and word count. If the data wasn’t there, you’d probably look through the “Tools” menu for an option to count things. Now you have the count for 1 file and need to store that somewhere. Because you can’t easily copy/paste the data (it is a graphic), you need to type it in somewhere else.
Or you might have a programmer’s editor on your PC that can show the counts.
Or you might google for a tool that does this specific task in Windows. I did and found a tool for US$60. Nice.
Or you might find the wc.exe tool from UNIX that has been ported to Windows.

Linux / UNIX Way

For a single file, we use ‘wc’ – word count. It is probably already installed.

$ wc -lw file.txt
18 70 file.txt

What if you wanted to count words and lines for every file in a directory?

$ wc -lw *.txt
44 440 file01.txt
28 188 file02.txt
19 42 file03.txt
66 263 file04.txt
15 276 file05.txt
194 1371 file06.txt
22 372 file07.txt
9 24 file08.txt
17 55 file09.txt
43 238 file20.txt
11 31 file41.txt
468 3300 total

BTW, this output was lined up nicely in the xterm. That could be important later.

Recursive Results

What if you wanted to do that for every file in every directory recursively down?

$ find /path/to/dir -type f -iname "*txt" -exec wc -lw {} \;

There are probably 5 other ways, some more efficient, to accomplish this same task. Perhaps using xargs would be better?
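For instance, here is a sketch of the xargs variant. It assumes GNU find/xargs for the -print0/-0 flags, and builds a small demo directory so the pipeline can run as-is; the path is just an example:

```shell
# Demo directory so the pipeline is runnable as-is:
mkdir -p /tmp/xargs-demo
printf 'one two\nthree\n' > /tmp/xargs-demo/a.txt

# xargs batches many filenames into a single wc run instead of one wc per file;
# -print0 / -0 keep filenames containing spaces or newlines intact.
find /tmp/xargs-demo -type f -iname "*txt" -print0 | xargs -0 wc -lw
```

Fewer wc processes means less overhead when the tree holds thousands of files.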

And if I want to look over the results later, I can redirect the output to a file easily. No “export” option in every program is required.
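For example, redirection works the same for any command; the file names here are just illustrative:

```shell
# A small file to count:
mkdir -p /tmp/redir-demo
printf 'hello world\n' > /tmp/redir-demo/a.txt

# Send stdout to a results file; 2>&1 folds any error messages in too.
wc -lw /tmp/redir-demo/*.txt > /tmp/wc-results.txt 2>&1
cat /tmp/wc-results.txt
```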

Sorting

What if I wanted to sort the output? I could use sort or awk or perl, or any other scripting language, and pick the column to be sorted. Or I could import the file into a spreadsheet and use the column sorting there, but doing that means I have to point-n-click every time.

For example:

$ sort -k2 file

Sort can be used as a filter, so

$ find /path/to/dir -type f -iname "*txt" -exec wc -lw {} \; | sort -k2

will sort on the 2nd column. Handy?
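One caveat: by default sort compares text, not numbers, so a word count of 100 sorts before 9. Adding -n fixes that. A tiny sketch with made-up wc-style lines:

```shell
# Lexical sort on column 2 puts 100 before 9 (the character '1' < '9'):
printf '5 100 b.txt\n2 9 a.txt\n' | sort -k2,2

# Numeric sort (-n) orders 9 before 100:
printf '5 100 b.txt\n2 9 a.txt\n' | sort -k2,2n
```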

Automatic and Emailed Results

What if I wanted to run the word count at 5am, weekdays only, sort the output, and email that to me AND someone else? Do you see where I’m going? This is pretty trivial to automate under Linux by chaining a few apps together.
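A sketch of that crontab entry; the search path and email addresses are placeholders, and it assumes a working local mail setup:

```shell
# m  h  dom mon dow   command  (1-5 in the dow field = weekdays only)
0  5  *   *   1-5   find /path/to/dir -type f -iname "*txt" -exec wc -lw {} \; | sort -k2,2n | mail -s "word counts" me@example.com other@example.com
```

Edit it in place with `crontab -e`; cron runs the pipeline and mail delivers the sorted output to both recipients.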

Exercise B – Knowing which files changed

Say we have seen some odd things happening on our system and would like to know which files have changed over the last few weeks.

Windows Way

In Windows, you’d probably look for a tool, perhaps with audit in the name. I’m not going to look, but there probably isn’t one for home users, so we’d need to get some “enterprise” audit software, which won’t be cheap, probably needs an MS-SQL server license, CALs for each client, and perhaps an MS-Windows Server to receive the data. That would be $2500 off the top of my head for a cheap solution, including hardware.

Linux / UNIX Way

Using the ‘find’ command, we can get a list of all the files that changed in the last 1, 2, 5, 10, 20, 45 days on our system. Deleted files are gone, but if we are consistent and capture the list of files daily, then we can see which are missing between the two lists.
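A sketch of that comparison using comm on two hypothetical daily captures; comm requires its inputs to be sorted:

```shell
# Two pretend daily file lists (already in sorted order, as comm requires):
printf '%s\n' /etc/a.conf /etc/b.conf /etc/c.conf > /tmp/files-mon
printf '%s\n' /etc/a.conf /etc/c.conf             > /tmp/files-tue

# -23 keeps lines unique to the first file: files that vanished between captures.
comm -23 /tmp/files-mon /tmp/files-tue
```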

I don’t really want all the files on the system in my testing. Certain directories are logical only, created by the OS as a way to access hardware and system information/settings. /proc/ and /sys/ and some files in /dev/ are like that.

To get started, let’s keep it simple:
$ sudo find $HOME /etc -type f -mtime -1 -ls | egrep -v mozilla | tee /tmp/changed-files-$(date "+%F")

That will find all the files located in /etc/ or under the userid’s HOME directory that changed in the last day. Then it strips out all files under mozilla, which are going to be browser cache files and browser tracking DB files, lists of URLs visited. It turned out for me that NOT excluding those made the list over 3600 files. With Mozilla-modified files excluded, only 30 changed. That was manageable.
The -ls option gets some detailed information about the files. Then we have the output go both to the screen AND to a log file named with the current date. The -mtime -1 says “within the last day”. Because find uses the system clock and the system works in whole seconds almost always, the “last day” really means the number of seconds that 24 hours equates to, not “today’s date”, i.e. all files since midnight, locally or GMT. Unix doesn’t work that way unless we go out of our way to make that happen.
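If “since local midnight” is actually what’s wanted, GNU find’s -newermt test can do it. A sketch, using a scratch directory so it can run as-is:

```shell
# A scratch file modified "today":
mkdir -p /tmp/mtime-demo
touch /tmp/mtime-demo/new.txt

# -newermt compares mtimes against a timestamp, here midnight of the
# current day, instead of -mtime's rolling 24-hour window.
find /tmp/mtime-demo -type f -newermt "$(date +%F)"
```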

Above we learned to use wc to count lines in a file. That’s how I quickly knew what these commands returned:

$ egrep -v .cache  /tmp/changed-files-2021-06-16 | wc -l
166
$ egrep -v mozilla /tmp/changed-files-2021-06-16 | wc -l
30

I can look over 30 filenames in a few seconds to see what makes sense and what is surprising. Over time, I’ll get better and better at recognizing what’s important and what isn’t. For example, I knew that all the cache and mozilla files, out of the 3600 files that changed in the last day, were unimportant and not useful to look at. I could add more filters to remove files that change all the time – like the $HOME/.config/dconf/user file. That file holds GNOME stuff about apps, window placement, and some config settings. It is a binary DB file, so I’m not going to use diff/sdiff or meld to see changes from the last version.
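Those filters can all be stacked into one egrep alternation instead of several piped egreps. A sketch against a pretend changed-file list (the real one would be the dated file from tee above):

```shell
# A pretend changed-file list, in place of the dated one from tee:
list=/tmp/changed-files-demo
printf '%s\n' "$HOME/.mozilla/places.sqlite" \
              "$HOME/.cache/thumbs/1.png" \
              "$HOME/.config/dconf/user" \
              "/etc/hosts" > "$list"

# One alternation drops all the known-noisy paths at once:
egrep -v 'mozilla|\.cache|dconf/user' "$list"
```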

There is a problem with this method.
I know that the files have been changed, but don’t have any way to see what actually changed. For that, versioned backups are necessary. Alas, that’s a different problem than just thinking a little differently to answer questions quickly.

Summary

As you can see, in Linux/UNIX taking a simple task and making it handle more and more becomes easy. This is a standard pattern, used over and over. Emailing the output of an automatic task happens all the time in Linux. I use this technique about 50 times a day. Ok, I review about 50 automatic emails every day – I don’t do anything besides delete each message after review. I should point out that for many of these tasks,

if nothing bad happens, then no email is sent. No news is good news, right?
If nothing bad is happening, don’t say anything. If there is a failure, fail loudly. These are also part of the Linux/UNIX philosophy. Silence is golden.
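A sketch of that pattern as a small wrapper function. run_quiet is a made-up name, not a standard tool; cron will only send email when the wrapper actually prints something:

```shell
run_quiet() {
    # Run a command and capture everything; speak only on failure.
    local log
    if ! log=$("$@" 2>&1); then
        echo "FAILED: $*"
        echo "$log"
        return 1
    fi
}

run_quiet true                       # success: prints nothing, cron stays silent
run_quiet ls /no/such/path || true   # failure: prints the details, cron emails them
```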

  1. James Boelter 02/15/2012 at 20:19

    This is probably the single best article I’ve read (or at least remember reading) on the philosophical differences between Windows and Linux. A real eye-opener.

    Many thanks for taking the time to post this!