Files
from Programming Perl... A filehandle is just a name you give to a file, device, socket, or pipe to help you remember which one you're talking about, and to hide some of the complexities of buffering and such... Filehandles make it easier for you to get input from and send output to many different places.
Find...
Find files with some characteristic
$path = "/path/to/dir/";
opendir DIR, $path or die "Can't open $path: $!";
@arr1 = readdir DIR;
@arr2 = grep{-T "$path$_"} @arr1; #text files only
@arr3 = grep{!-d "$path$_"} @arr1; #no directories
@arr4 = grep{-s "$path$_" < 1024} @arr1; #less than 1K
The file test operator "-T" returns true if the tested directory item looks like a text file. There are about two dozen other tests available for use against a directory item. Notice that the readdir function returns a list of the bare names of every item in the specified directory, including "." and "..". Since a file test needs a path it can resolve, not just a bare name, it is necessary to rebuild the path in the grep block by combining the path to the directory (stored in $path) and the name of the item (placed temporarily in $_).
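As a rough sketch of how a couple of the other file tests combine, the following sorts directory entries into plain files and subdirectories. The helper name classify_dir and the throwaway directory are assumptions made for illustration, and File::Spec is used here in place of string concatenation to build each path.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: sort the entries of a directory into files and subdirectories.
sub classify_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    # Skip the "." and ".." entries that readdir normally returns.
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my (@files, @dirs);
    for my $name (sort @names) {
        my $full = File::Spec->catfile($dir, $name);
        if    (-d $full) { push @dirs,  $name }   # -d: is a directory
        elsif (-f $full) { push @files, $name }   # -f: is a plain file
    }
    return (\@files, \@dirs);
}

# Demonstration on a throwaway directory.
my $demo = tempdir(CLEANUP => 1);
mkdir File::Spec->catdir($demo, 'sub') or die "Can't mkdir: $!";
open(my $out, '>', File::Spec->catfile($demo, 'note.txt')) or die "Can't create file: $!";
print {$out} "hello\n";
close $out;

my ($files, $dirs) = classify_dir($demo);
print "files: @$files\n";
print "dirs: @$dirs\n";
```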
Recursive Directory Searches
use File::Find;
find(\&handleFind, '/home/documents/code');
sub handleFind {
my $foundFile = $File::Find::name;
print "$foundFile\n" if ($foundFile =~ /\.html?$/i);
}
/home/documents/code/index.html
/home/documents/code/perl/example.HTM
Perl programmers who run their scripts on Unix systems have simple ways to use the Unix system tools for everyday tasks like recursive directory listings (i.e. listing every directory item, and all the items contained in any subdirectories within). One of the greatest features of Perl, however, is that it runs on so many different computer platforms. So if you happen to be on a non-Unix system, or if you just don't like writing scripts that use the system tools, there is another way to do it. This trick uses the File::Find module. When you use this module you gain access to a subroutine called "find", which expects a list of arguments: the first is a reference to a subroutine in your script that will be called for every item found (files and directories alike), followed by a list of directory paths to search in. "find" will chug away, calling your subroutine for every item it finds and going deeper into subdirectories as needed. Meanwhile, within my "handleFind" subroutine I retrieve the full path of the item that "find" found by referring to a special module variable called "$File::Find::name". I can then do any test or process I wish on the file -- in this case I just print the name if it ends in an HTML file extension.
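A variation on the same idea, sketched under the assumption that you want the matches collected in an array rather than printed as they are found. The temporary directory tree and its file names are made up for the demonstration; only find and $File::Find::name come from the module.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Build a small throwaway tree to search (names are illustrative only).
my $root = tempdir(CLEANUP => 1);
mkdir File::Spec->catdir($root, 'perl') or die "Can't mkdir: $!";
for my $rel ('index.html', 'notes.txt', 'perl/example.HTM') {
    open(my $fh, '>', "$root/$rel") or die "Can't create $rel: $!";
    close $fh;
}

# Collect matches in an array instead of printing them.
my @found;
find(
    sub {
        # $_ is the bare name of the current item;
        # $File::Find::name is its path including the start directory.
        push @found, $File::Find::name if -f $_ && /\.html?$/i;
    },
    $root,
);

print "$_\n" for sort @found;
```

The -f test skips directories whose names happen to end in ".html", which the name-only match would otherwise let through.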
Read...
Read the contents of an entire file at once
use Fcntl qw(:flock); # named constants instead of magic numbers
open FILE, '< ./anthem.txt' or die $!;
flock FILE, LOCK_SH or die $!; # wait for a shared (read) lock
seek(FILE, 0, 0); # move pointer to beginning
my $slurp = do{local $/; <FILE>};
flock FILE, LOCK_UN; # release the lock
close(FILE);
print $slurp;
If you're sure you really want to fill your computer's operating memory with the contents of a file, then this trick will accomplish that. The angle brackets "<>" work on a filehandle by returning either the next record, in scalar context, or a list of all the remaining records, in list context. It would be wise to format your data into manageable records, separated by some token characters. The special Perl variable $/ can be set to that record separator. By default $/ is a newline, but by undefining it you cause perl to consider the entire file to be one single record. Because $/ is a global variable, changing it in one place will have side effects elsewhere. For that reason we create a limited scope and localize our $/ redefinition inside it by wrapping the work in a do{...} block. Note also that we take a shared lock on the file for reading, which waits until any exclusive lock held by another process is released before we try to read. These locks are advisory: they only coordinate with other programs that also call flock.
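To illustrate the record-separator idea, here is a small sketch that sets $/ to a token instead of undefining it. The "%%" separator and the sample data are assumptions chosen for the example, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Stand-in data; an in-memory filehandle avoids needing a real file.
my $data = "alpha%%beta%%gamma%%";
open(my $fh, '<', \$data) or die "Can't open in-memory handle: $!";

my @records;
{
    local $/ = '%%';      # records end at "%%", inside this block only
    @records = <$fh>;     # list context: every record at once
    chomp @records;       # chomp removes the current $/, i.e. "%%"
}
close $fh;

print scalar(@records), " records: @records\n";
```

Outside the bare block $/ is back to its previous value, so other code that reads line by line is unaffected.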
Read the contents of a file more nicely
use IO::All;
my $file = io("./anthem.txt")->lock;
my $slurp = $file->slurp;
$file->close;
print $slurp;
The previous example looks awfully ugly considering Perl is supposed to be a "high level" language, doesn't it? So we install the IO::All module and things become much cleaner. The IO::All module brings together several powerful IO modules and offers a single unified interface to all of them.
Assign...
Assigning one filehandle to another
open(MYOUT, "> bottle.txt");
*STDOUT = *MYOUT;
print "message";
You have probably used perl's print function with a filehandle before, but did you know that even if you don't name a filehandle, perl assumes you mean a default one called "STDOUT"? C programmers will recognize that as the standard output -- usually the screen, or terminal window (or the browser, when writing CGIs). What we've done here is create our own filehandle, pointing to a file. Then we've done something a little sneaky -- we've used the star symbol-prefix to refer to STDOUT as a typeglob. Typeglobs allow us to create an "alias" of sorts, making everything that goes by one name (scalar, array, hash, subroutine and filehandle) point to the things that go by another name. The second line of this snippet basically says that the name STDOUT is now an alias for the name MYOUT. Once this is done, any prints to the default filehandle go to our own filehandle instead.
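The alias above lasts for the rest of the program. If you only want it for a while, one possible approach (a sketch, not the only way) is to localize the typeglob inside a block so the original STDOUT comes back when the block ends. An in-memory filehandle stands in for bottle.txt here, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $captured = '';
{
    # An in-memory filehandle stands in for bottle.txt.
    open(my $myout, '>', \$captured) or die "Can't open buffer: $!";
    local *STDOUT = $myout;   # alias STDOUT for this block only
    print "message";          # goes to the buffer, not the terminal
    close $myout;
}
# Outside the block, STDOUT is the original handle again.
print "captured: $captured\n";
```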
Modify...
Change the first, last or nth line of a file
use Tie::File;
my @lines = ();
tie @lines, 'Tie::File', 'log_file.txt'
or die "Can't tie file: $!";
unshift @lines, "New first line.\n";
print $lines[0];
Wouldn't it be nice to be able to treat the lines of a file with as much flexibility as you can an array? Want the 42nd line? Want to delete the last line, or add a new line to the beginning? These things are pretty simple to do with an array, but damned hard to do with a file. Mark Jason Dominus' Tie::File module gives you all the magic you need to do any of them. Just tie your file to an array variable, and from then on whatever you do to the array happens to the file. While this module does the trick, if you really need to do this sort of thing on a regular basis, you should probably ask yourself why your records aren't stored in a database.
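The snippet above only adds a first line. As a sketch of the other operations mentioned (changing the nth line and the last line), the following works on a throwaway file created just for the demonstration. The file contents are invented for the example.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Create a throwaway file with three lines to work on.
my ($tmp, $filename) = tempfile(UNLINK => 1);
print {$tmp} "first\nsecond\nthird\n";
close $tmp;

tie my @lines, 'Tie::File', $filename
    or die "Can't tie $filename: $!";

$lines[1] = 'SECOND';        # replace the 2nd line (index 1)
pop @lines;                  # delete the last line
push @lines, 'new last';     # append a new last line
my $count = scalar @lines;   # number of lines now in the file

untie @lines;                # finish with the file

# Read the file back normally to confirm what was written.
open(my $in, '<', $filename) or die "Can't read $filename: $!";
my $contents = do { local $/; <$in> };
close $in;
print $contents;
```

Note that the lines are stored without a trailing newline; Tie::File supplies the record separator itself when it writes them.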
Write...
Writing to two filehandles at the same time
use IO::Tee;
$tee = IO::Tee->new(">> debuglog.txt", \*STDOUT);
print $tee "an error occurred on ".scalar(localtime)."\n";
If, for whatever reason, you want to print the same string to two places at once, you are trying to do what the Unix utility "tee" does. This functionality is available in your perl script even if you aren't running it on Unix, via the IO::Tee module. IO::Tee is object-oriented, so you must first create a tee object using the new constructor. This sub takes a list of arguments, each of which can be either a string naming a file to open (with a mode prefix, as for open) or a reference to an open filehandle. In this case we use a string that opens a file called "debuglog.txt" for appending, and a reference to the built-in filehandle STDOUT. This built-in filehandle is created automatically, and it is in fact the default target that print points to (usually the terminal or, for a CGI, the browser). To get a reference to a filehandle we use the backslash operator on a typeglob, written with the star symbol. Typeglobs are a special way to refer to all variables of a given name at once (regardless of "type": array, hash, scalar etc.). It's necessary to use the star because filehandles have no prefix symbol of their own. new returns an instance of the class IO::Tee, and we assign this to the scalar $tee. Now whenever we print to the $tee object we are actually sending the string to two places at once!
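To show what is going on underneath, here is a hand-rolled stand-in for the tee idea. It is not IO::Tee and does not use its API: the tee_print helper is hypothetical, and two in-memory filehandles take the place of debuglog.txt and the terminal so the result can be inspected.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tee: print the same text to each handle in turn.
sub tee_print {
    my ($handles, @text) = @_;
    print {$_} @text for @$handles;
}

# Two in-memory handles stand in for debuglog.txt and the terminal.
my ($log, $screen) = ('', '');
open(my $log_fh,    '>>', \$log)    or die "Can't open log buffer: $!";
open(my $screen_fh, '>>', \$screen) or die "Can't open screen buffer: $!";

tee_print([$log_fh, $screen_fh], "an error occurred\n");
close $_ for $log_fh, $screen_fh;
```

To send one copy to the real terminal, pass \*STDOUT in the list in place of one of the buffers.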
Writing to a file, creating nested directories if necessary
use File::Path;
use File::Basename;
my $log_file = "logs/tuesday/am/traffic.log";
eval{ mkpath(dirname($log_file)) };
$@ and print "Couldn't create path: $@";
open LOG, ">> $log_file" or die $!;
print LOG "ok\n";
It's easy enough if your script is creating dynamically named files, but what if you want to create nested directories on the fly? You could write some complicated loop to find each intervening directory, test if it already exists and use mkdir to create it, but why bother when you can use the File::Path module? This gives you a very handy "mkpath" function, that works like Perl's own mkdir, but accepts and creates a whole path of directories. And if your path happens to have a filename on the end, you can pare it down to just the directory name using File::Basename's "dirname" function.
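Newer releases of File::Path also provide make_path, which can report failures through an error option instead of dying. A minimal sketch, assuming a File::Path recent enough to export make_path, and using a temporary base directory invented for the example so nothing is left behind:

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Basename qw(dirname);
use File::Temp qw(tempdir);

# Temporary base directory so the demonstration cleans up after itself.
my $base     = tempdir(CLEANUP => 1);
my $log_file = "$base/logs/tuesday/am/traffic.log";

# With the "error" option, make_path records failures instead of dying.
make_path(dirname($log_file), { error => \my $errors });
if ($errors && @$errors) {
    for my $diag (@$errors) {
        # Each diagnostic is a one-pair hash: path => message.
        my ($path, $message) = %$diag;
        die "Couldn't create $path: $message";
    }
}

open(my $log, '>>', $log_file) or die "Can't open $log_file: $!";
print {$log} "ok\n";
close $log;
```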
Printing to an array (or any buffer-like variable)
my @output;
BEGIN {
{ package Buffer;
sub TIEHANDLE { my ($class, $buffer) = @_; bless $buffer, $class; }
sub PRINT { my $buffer = shift; push @$buffer, $_ foreach @_; }
}
tie *STDOUT=>"Buffer", \@output;
}
print "I ", "like ";
print "bread ", ".";
END {
no warnings 'untie'; untie *STDOUT;
print reverse @output;
}
You can tie any filehandle to a package, even the built-in filehandle STDOUT, the default target of perl's print function. Doing this allows you to grab the strings passed to print and do whatever you like with them from within that package. This must be set up in a BEGIN block to ensure that all prints are caught. Then, in the END block, you can untie your filehandle to restore its usual behavior.
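The same mechanism can transform output rather than store it. In this sketch the class name Upcase is hypothetical, and an in-memory filehandle plays the part of the real destination; the tie is made and undone in ordinary code, so only the prints in between are affected.

```perl
use strict;
use warnings;

{
    package Upcase;   # hypothetical class name
    # Remember the real handle that output should finally reach.
    sub TIEHANDLE { my ($class, $target) = @_; return bless { target => $target }, $class }
    # Called for every print on the tied handle; upper-cases each argument.
    sub PRINT     { my $self = shift; print { $self->{target} } map { uc($_) } @_ }
}

my $result = '';
open(my $sink, '>', \$result) or die "Can't open buffer: $!";

tie *STDOUT, 'Upcase', $sink;
print "i like ", "bread.";    # routed through Upcase::PRINT
untie *STDOUT;                # STDOUT behaves normally again
close $sink;
```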
More...
Finding the basename of a file from its full path-name
use File::Basename;
$path = "/home/docs/trick.of.the.week.html";
$basename = basename($path, ".html");
print $basename;
Okay, we cheated. The problem was to find just the name of the file, without its preceding path, and minus the "dot extension". The File::Basename module makes it pretty easy to do this. We just pass its basename function the whole path and a string that specifies what extension we want removed, and presto, we're done. The "path" is a string of names representing all the directories (also called "folders") our file is enclosed in. Notice that each directory name is separated by a forward slash. This character is special, because it is reserved by the computer's operating system just for this purpose. You are never allowed to use your system's directory separator character in the name of a directory or file -- that would just be too confusing for everyone, especially the system. It is useful to know that the most popular operating systems use different characters for this purpose: Unixish systems use the forward slash, Windows uses the backslash (on Windows, by the way, you can use either forward or backward slashes in your scripts to represent path separators and Perl will know what you mean). File::Basename, of course, will properly find the file's basename whatever OS you run it on.
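If you don't know the extension in advance, File::Basename's fileparse function can split it off with a pattern. A short sketch, reusing the path from the snippet above; the variable names are chosen for the example.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my $path = "/home/docs/trick.of.the.week.html";

# The second argument is a regex here, so any final ".something" is split off.
my ($name, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);

print "dir: $dir\n";        # the directory part, with its trailing slash
print "name: $name\n";      # the file name without the suffix
print "suffix: $suffix\n";  # the suffix that matched the pattern
```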
Changing ownership of a file to a particular username
($uid, $gid) = (getpwnam($username))[2,3]
or die "$username not in passwd file";
chown ($uid, $gid, $file)
or warn "couldn't chown $file: $!";
You may have an occasion where you know a username and need to do something with it, like change the ownership of a file. Unfortunately, Perl's chown function doesn't take a username as an argument, but rather a pair of numbers: the user ID and the group ID. But Perl hasn't left us stranded with a string when only a pair of numbers will do. By passing the username to the getpwnam function, we get a list of information about that user from the passwd file. For chown you'll need the elements at indexes 2 and 3 of that list: the user ID and the user's primary group ID.
AUTHOR: Luke Melia
Rename every file in a directory
$ perl -e 'for(`ls -1 *.htm`) {chomp; $f=$_; tr/A-Z/a-z/; rename $f, $_};'
This will only work on a Unixish command line, but could probably be adapted to work on other systems. The interesting bit is the source of the for loop (for is a synonym for foreach here), which is a back-ticked expression. In list context, back-ticks return the output of the command as a list of lines, so you can use them where you would otherwise put an array. Each line still ends in a newline, which is why we chomp it first. We are translating filenames from upper to lowercase, but you could replace that with any transformation you like, such as s/\.htm$/.html/. These operators work on the $_ variable by default, which is just what for sets for you on each pass.
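One way this might be adapted to systems without ls is to read the directory from Perl itself. The sketch below does the same lowercasing with opendir and rename; the temporary directory and the file names in it are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Throwaway directory with two upper-case names to rename.
my $dir = tempdir(CLEANUP => 1);
for my $name ('INDEX.HTM', 'ABOUT.HTM') {
    open(my $fh, '>', File::Spec->catfile($dir, $name)) or die "Can't create $name: $!";
    close $fh;
}

# Read the names with readdir instead of shelling out to ls.
opendir(my $dh, $dir) or die "Can't open $dir: $!";
my @old_names = grep { /\.htm$/i } readdir $dh;
closedir $dh;

for my $old (@old_names) {
    my $new = lc $old;       # same effect as tr/A-Z/a-z/ on these names
    next if $new eq $old;    # already lowercase, nothing to do
    rename File::Spec->catfile($dir, $old), File::Spec->catfile($dir, $new)
        or warn "Couldn't rename $old: $!";
}
```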