Read posts about unix

August 7

A spoof on one of my Unix articles (Kilala.nl (Cailin Coilleach)) by Cailin Coilleach

Bwahaha, this is awesome! ^_^

A few months back I wrote an article about Unix shell variable scoping. I wrote the article after running into an odd situation where variables would lose their value, depending on which part of my script was running. All programming languages have these kinds of issues, which are commonly referred to as variable scope. In the end I figured out most of what was going on and I wrote about it so others could find my solution.

In the meantime dozens of people -have- found this article, but one reader in particular has done something rather amusing with it! Instead of applying it to a scripting problem he (or is it a she?) was having, my article was turned into the diary of a Linux nerd.

Here's a small excerpt.
I was playing around with her (for a) "while" and was pipeing some data into her "while" (construct). Everything was working fine while I was inside her, but the moment I got out, it was as though everything changed. She has no memory of me and all the things I had did inside her. Half an hour of work and no final effect and I actually thought she was enjoying it.

Dramatic, no?

Posted in: diary of a linux nerd , spoof , sysadmin , unix
June 19

Mac OS 10.5 is now an Open Brand UNIX 03 Registered Product (Put together quickly (Haligan)) by michaelb

Leopard on Intel-based Macs is now registered as an Open Brand UNIX 03 product, so you now have better compatibility for compiling and running existing UNIX code.

Posted in: mac os x , open brand , open group , unix
April 18

Cutting down on the use of pipes (Kilala.nl (Cailin Coilleach)) by Cailin Coilleach

One of the obvious downsides to using a scripting language like ksh as opposed to a "real" programming language like Perl or PHP (or C for that matter) is that, for each external command that you string together, you're forking off a new process.

This isn't much of a problem when your script isn't too convoluted or when your dataset isn't too large. However, when you start processing 40-50MB log files with multiple FOR loops containing a few IF statements for each line, then you start running into performance issues.

And as I'm running into just that, I'm trying to find ways to cut down on the forking, which means getting rid of as many IFs and pipes as possible. Here are a few examples of what has worked for me so far...

Instead of running:
[ expr1 ] && command1
[ expr2 ] && command1

Run:
[ expr1 -a expr2 ] && command1

Why? Because if test works the way I expect it to, it'll die if the first expression is untrue, meaning that it won't even try the second expression. (Inside a single [ ] the AND operator is written -a, not &&.) Mind that this runs command1 only once, and only when both expressions are true. If you have multiple commands that complement each other, then you ought to be able to fit them into a set of parentheses after the test, cutting down on more forks.
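Here's a minimal sketch of the difference between two separate tests and one compound test. Inside [ ] the AND operator is -a. The variables SIZE, NAME, hits and both are made up for this example and don't come from my actual script.

```shell
# Hypothetical values, purely for illustration.
SIZE=120
NAME="access.log"

hits=0

# Two separate tests, each guarding the same command:
# the command runs once for every test that succeeds.
[ "$SIZE" -gt 100 ] && hits=$((hits + 1))
[ "$NAME" = "access.log" ] && hits=$((hits + 1))

# One compound test: the command runs once, and only if both hold.
both="no"
[ "$SIZE" -gt 100 -a "$NAME" = "access.log" ] && both="yes"

echo "separate tests ran the command $hits times; compound test: $both"
```

So the two forms are only interchangeable when you really want "both conditions, one command".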

Instead of running:
if [ $(echo "$STRING" | grep "$QUERY" | wc -l) -gt 0 ]; then

Run:
if [ ! -z "$(echo "$STRING" | grep "$QUERY")" ]; then
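If all you need is a plain substring match, you can probably drop the pipe altogether and let the shell do the matching with a case statement. This is a different technique from the grep version above, and it only works as a sketch under one assumption: $QUERY is a literal string, not a regular expression. The values of STRING and QUERY are made up for the example.

```shell
# Hypothetical values, purely for illustration.
STRING="GET /index.html HTTP/1.1"
QUERY="index"

# Pattern matching is done by the shell itself: no echo, no grep, no pipe.
# Quoting $QUERY makes the shell treat it as literal text.
case "$STRING" in
    *"$QUERY"*) found="yes" ;;
    *)          found="no"  ;;
esac

echo "found: $found"
```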

More ideas to follow soon. Maybe I ought to start learning a "real" programming language? :D

EDIT:
OMG! I can't believe that I've just learnt this now, after eight years in the field! When using the Korn shell use [[ expr ]] for your tests as opposed to [ expr ].

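Why? Because [ expr ] is a throw-back to Bourne shell compatibility: it's parsed like an ordinary command (and on very old Bourne shells it really was the external test binary), whereas [[ expr ]] is part of the Korn shell's own syntax. In ksh both are handled inside the shell, so the speed-up may be modest, but [[ ]] also skips word splitting and filename expansion on its operands, which saves you a lot of quoting headaches!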
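Besides any speed difference, the two forms also treat unquoted variables differently. A small sketch, assuming a shell that supports [[ ]] (ksh or bash, not a plain POSIX sh); FILE, single and double are hypothetical names.

```shell
# Hypothetical value containing a space, purely for illustration.
FILE="my log.txt"

# With [ ] the variable must be quoted, or it splits into two words
# and the test fails with a syntax error.
if [ "$FILE" = "my log.txt" ]; then
    single="match"
fi

# With [[ ]] the operand is not word-split, so the quotes are optional.
if [[ $FILE = "my log.txt" ]]; then
    double="match"
fi

echo "[ ]: $single, [[ ]]: $double"
```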

Posted in: optimization , pipes , shell script , unix
March 13

Parallelization in shell scripts (Kilala.nl (Cailin Coilleach)) by Cailin Coilleach

Today I was working on a shell script that's supposed to process multiple text files in the exact same manner. Usually you can get through this by running a FOR-loop where the code inside the loop is repeated for each file in a sequential manner.

Since this would take a lot of time (going over 1e6 lines of text in multiple passes) I wondered whether it wouldn't be possible to run the contents of the FOR-loop in parallel. I rehashed my script into the following form:

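subroutine()
{
    # contents of old FOR-loop go here, using $FILE
    :
}

# "list of files" is a placeholder: put your real file names here
for file in "list of files"
do
    FILE="$file"
    # the trailing & runs each call as a background job
    subroutine &
done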

This will result in a new instance of your script for each file in the list. Got seven files to process? You'll end up with seven additional processes that are vying for the CPU's attention.

On average I've found that the performance of my shell script was improved by a factor of 2.5, going from ~40 lines per three seconds to ~100 lines. I was processing seven files in this case.

The only downside to this is that you're going to have to build in some additional code that prevents your shell script from running ahead, while the subroutines are running in the background. What this code needs to be fully depends on the stuff you're doing in the subroutine.
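In the simplest case that additional code can be a single wait, which blocks until all background jobs have finished. The sketch below assumes that's all you need. The line-counting subroutine, the sample files and the WORKDIR and DONE variables are made up for the example, and it relies on mktemp -d being available on your system.

```shell
# Hypothetical stand-in for the real per-file work.
subroutine()
{
    # Count the lines of $FILE and store the result next to it.
    wc -l < "$FILE" > "$FILE.count"
}

# Create three small sample files in a scratch directory.
WORKDIR=$(mktemp -d)
for n in 1 2 3
do
    printf 'line\n' > "$WORKDIR/file$n"
done

# Start one background job per file.
for file in "$WORKDIR"/file*
do
    FILE="$file"
    subroutine &
done

# Block here until every background job has finished.
wait

# Only now is it safe to use the results.
DONE=0
for result in "$WORKDIR"/*.count
do
    DONE=$((DONE + 1))
done
echo "finished jobs: $DONE"

rm -rf "$WORKDIR"
```

If the subroutines depend on each other's output, a plain wait won't be enough and you'll need something more specific.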

Posted in: ksh , parallellization , shell script , unix