chomp is often used when reading from a file. By default it trims the trailing newline character; for its full functionality refer to the perldocs.
Beware of the difference between characters and bytes: not all encodings use one byte per character, and UTF-8 in particular does not. While PerlIO handles this pretty much flawlessly, there is one potential pitfall of note:
- read uses characters for its length and offset parameters
- seek and tell always use bytes for positioning
So don't do arithmetic that mixes these values. Instead use e.g. Encode::encode('utf8', $value_by_read) to get the octets (bytes) from a read result, whose count you can then use with tell and seek.
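The character/byte distinction above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the sample string and the variable names are made up for the example, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample text (an assumption for this sketch): 7 characters, 10 bytes in UTF-8.
my $bytes = encode('UTF-8', "na\x{ef}ve \x{2603}");

# Character-mode handle over the in-memory byte string.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "Can't open in-memory file: $!";
my $chars_read = read($fh, my $chunk, 3);    # LENGTH counts characters
close $fh;

# Re-encode the chunk to learn how many bytes those characters occupied.
my $byte_count = length(encode('UTF-8', $chunk));

# A raw handle can now skip exactly that many bytes with seek.
open(my $raw, '<:raw', \$bytes) or die "Can't open in-memory file: $!";
seek($raw, $byte_count, 0) or die "seek failed: $!";
my $rest = do { local $/; <$raw> };
close $raw;

printf "%d characters occupied %d bytes; %d bytes remain\n",
    $chars_read, $byte_count, length($rest);
```

The count passed to seek comes from the encoded octets, never from the character count returned by read.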
my $filename = '/path/to/file';
open my $fh, '<', $filename or die "Failed to open file: $filename: $!";
# You can then either read the file one line at a time...
while (my $line = <$fh>) {
    chomp $line;    # chomp returns a count, so don't use it as the loop condition
    print $line . "\n";
}
# ...or read whole file into an array in one go
chomp(my @fileArray = <$fh>);
If you know that your input file is UTF-8, you can specify the encoding:
open my $fh, '<:encoding(UTF-8)', $filename or die "Failed to open file: $filename: $!";
Once you have finished reading from the file, close the filehandle:
close $fh or warn "close failed: $!";
See also: Reading a file into a variable
Another, often faster, way to read a whole file is the File::Slurper module from CPAN. This is useful if you work with many files.
use File::Slurper 'read_text'; # File::Slurper exports nothing by default
my $file = read_text("path/to/file"); # utf8 without CRLF transforms by default
print $file; # Contains the file body
See also: Reading a file with slurp
This code opens a file for writing, dies with an error message if the file couldn't be opened, and closes the file at the end.
#!/usr/bin/perl
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
# Open "output.txt" for writing (">") and from now on, refer to it as the variable $fh.
open(my $fh, ">", "output.txt")
# In case the action failed, print error message and quit.
or die "Can't open > output.txt: $!";
Now we have an open file ready for writing, which we access through $fh (this variable is called a filehandle). Next we can direct output to that file using the print operator:
# Print "Hello" to $fh ("output.txt").
print $fh "Hello";
# Don't forget to close the file once we're done!
close $fh or warn "Close failed: $!";
The open operator takes a scalar variable ($fh in this case) as its first parameter. Since it is defined in the open operator, it is treated as a filehandle. The second parameter, ">" (greater than), specifies that the file is opened for writing. The last parameter is the path of the file to write the data to.
To write data into the file, the print operator is used along with the filehandle. Notice that in the print operator there is no comma between the filehandle and the statement itself, just whitespace.
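As a hedged illustration of the print syntax described above, the sketch below writes two lines and reads them back. The temporary file and the variable names are assumptions made for the example; the braces in the second print are an alternative spelling that makes the filehandle argument explicit.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() returns an open handle and a path; only the path is used here.
my (undef, $path) = tempfile(UNLINK => 1);

open(my $fh, '>', $path) or die "Can't open > $path: $!";
print $fh "Hello\n";          # filehandle, whitespace, then the list to print
print {$fh} "World", "\n";    # braces mark the filehandle explicitly
close $fh or die "Close failed: $!";

# Read the file back to confirm what was written.
open(my $in, '<', $path) or die "Can't open < $path: $!";
my @lines = <$in>;
close $in;

print @lines;
```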
open my $filehandle, '<', $name_of_file or die "Can't open $name_of_file, $!";
This is the basic idiom for "default" file I/O. It makes $filehandle a readable input stream of bytes, filtered by a default system-specific decoder, which can be set locally with the open pragma.
Perl itself does not handle errors in file opening, so you have to handle them yourself by checking the return value of open. $! is populated with the error message that caused open to fail.
On Windows, the default decoder is a "CRLF" filter, which maps any "\r\n" sequences in the input to "\n".
open my $filehandle, '<:raw', $name_of_file or die "Can't open $name_of_file, $!";
This specifies that Perl should not perform a CRLF translation on Windows.
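To see the effect of the CRLF filter on any platform, the following sketch requests the :crlf layer explicitly and compares it with :raw. The temporary file and its contents are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);

# Write Windows-style line endings byte for byte.
open(my $out, '>:raw', $path) or die "Can't open > $path: $!";
print $out "one\r\ntwo\r\n";
close $out or die "Close failed: $!";

# :raw leaves the bytes alone, so the carriage return survives.
open(my $raw, '<:raw', $path) or die "Can't open < $path: $!";
my $raw_line = <$raw>;
close $raw;

# :crlf maps "\r\n" to "\n" on input.
open(my $text, '<:crlf', $path) or die "Can't open < $path: $!";
my $text_line = <$text>;
close $text;

printf "raw: %d bytes, crlf: %d bytes\n", length($raw_line), length($text_line);
```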
open my $filehandle, '<:raw:encoding(utf-8)', $name_of_file
    or die "Can't open $name_of_file, $!";
This specifies that Perl should both avoid CRLF translation and decode the resulting bytes into strings of characters (internally implemented as arrays of integers which can exceed 255), instead of strings of bytes.
Before reading and writing text files you should know what encoding to use. See the Perl Unicode Documentation for more details on encoding. Here we show how to set UTF-8 as the default encoding and decoding for the open function. This is done by using the open pragma near the top of your code (right after use strict; and use warnings; would be appropriate):
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std );
The open function creates a filehandle that is used for reading from and/or writing to a file. The open function has the signature
open(FILEHANDLE, MODE, FILEPATH)
and returns a false value if the operation fails. The error description is then stored in $!.
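A small sketch of the failure path described above: it tries to open a file that cannot exist and inspects the return value and $!. The directory and file name are hypothetical and exist only for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A path that cannot exist: a file inside a fresh, empty temporary directory.
my $dir     = tempdir(CLEANUP => 1);
my $missing = "$dir/no-such-file.txt";

my $opened = open(my $fh, '<', $missing);
my $error  = "$!";    # copy $! right away, later calls may overwrite it

if ($opened) {
    close $fh;
}
else {
    print "Could not open '$missing': $error\n";
}
```

Copying $! into a variable immediately is a defensive habit, since any later system call may change it.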
Reading
#!/usr/bin/perl
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
my $file_path = "/path/to/file";
open(my $file_handle, '<', $file_path) or die "Could not open file! $!";
while (my $row = <$file_handle>) {
    chomp $row;    # chomp modifies $row in place; its return value is a count
    print $row, "\n";
}
close $file_handle
    or warn "Close failed: $!";
Writing
#!/usr/bin/perl
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
my $file_path = "/path/to/file";
open(my $file_handle, '>', $file_path) or die "Could not open file! $!";
print $file_handle "Writing to a file";
close $file_handle
    or warn "Close failed: $!";
Reading chunks
Opening and reading big files can take time and resources. If only a small part of the content is required, it can be a good idea to read the content in chunks using the read function, which has the signature
read(FILEHANDLE, SCALAR, LENGTH, OFFSET)
FILEHANDLE must be an opened filehandle, and SCALAR will hold the read data after the operation. LENGTH specifies the number of characters to read, starting at the current position in the file. The optional OFFSET is a position within SCALAR, not within the file: the data is placed at that offset in the existing string instead of at its beginning. The function returns the number of characters read, 0 if the end of file was reached, and undef in case of an error.
read($file_handle, $data, 16, 0);
Reads up to 16 characters from the current position in the file (the beginning, for a freshly opened file) into $data, starting at offset 0 of $data.
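The return values of read lend themselves to a simple chunked loop. The sketch below is one possible shape, assuming a chunk size of 16 and an in-memory filehandle so that it is self-contained; a handle opened on a real path behaves the same way.

```perl
use strict;
use warnings;

# In-memory "file" (an assumption for this sketch): five short lines.
my $content = join '', map { "line $_\n" } 1 .. 5;
open(my $fh, '<', \$content) or die "Can't open in-memory file: $!";

my @chunks;
while (1) {
    my $count = read($fh, my $chunk, 16);
    die "read failed: $!" unless defined $count;    # undef signals an error
    last if $count == 0;                            # 0 signals end of file
    push @chunks, $chunk;
}
close $fh;

printf "%d chunks, last one holds %d characters\n",
    scalar(@chunks), length($chunks[-1]);
```

Each call continues from where the previous one stopped, so no explicit position bookkeeping is needed.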
autodie allows you to work with files without having to explicitly check for open/close failures. Since Perl 5.10.1, the autodie pragma has been available in core Perl. When used, Perl will automatically check for errors when opening and closing files.
Here is an example in which all of the lines of one file are read and then written to the end of a log file.
use 5.010; # 5.010 and later enable "say", which prints arguments, then a newline
use strict; # require declaring variables (avoid silent errors due to typos)
use warnings; # enable helpful syntax-related warnings
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
use autodie; # Automatically handle errors in opening and closing files
open(my $fh_in, '<', "input.txt"); # check for failure is automatic
# open a file for appending (i.e. using ">>")
open( my $fh_log, '>>', "output.log"); # check for failure is automatic
while (my $line = readline $fh_in) # also works: while (my $line = <$fh_in>)
{
# remove newline
chomp $line;
# write to log file
say $fh_log $line or die "failed to print '$line'"; # autodie doesn't check print
}
# Close the file handles (check for failure is automatic)
close $fh_in;
close $fh_log;
By the way, you should technically always check print statements. Many people don't, but perl (the Perl interpreter) doesn't do this automatically and neither does autodie.
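To make the "automatic check" concrete, the following sketch lets an open fail under autodie and catches the resulting exception with eval. The missing path is hypothetical; in a real script you would usually let the exception end the program.

```perl
use strict;
use warnings;
use autodie;
use File::Temp qw(tempdir);

# A file inside a fresh, empty temporary directory cannot exist.
my $dir     = tempdir(CLEANUP => 1);
my $missing = "$dir/no-such-file.txt";

# No "or die" here: autodie throws an exception when open fails.
my $ok = eval {
    open(my $fh, '<', $missing);
    close $fh;
    1;
};
my $error = $@;

if (!$ok) {
    # The exception is an object that stringifies to a readable message.
    print "autodie raised: $error\n";
}
```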
Sometimes it is necessary to backtrack after reading.
# identify current position in file, in case the first line isn't a comment
my $current_pos = tell $fh;
while (my $line = readline $fh)
{
if ($line =~ /$START_OF_COMMENT_LINE/)
{
push @names, get_name_from_comment($line);
}
else {
last; # break out of the while loop
}
$current_pos = tell $fh; # keep track of current position, in case we need to rewind the next line read
}
# Step back a line so that it can be processed later as the first data line
seek $fh, $current_pos, 0;
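A self-contained variant of the backtracking idea might look like this. The input format (comment lines starting with '#', followed by data lines) and the regular expression are assumptions chosen for the illustration, and an in-memory filehandle replaces the real file.

```perl
use strict;
use warnings;

# Hypothetical input: comment lines start with '#', data lines follow.
my $input = "# alice\n# bob\n1,2\n3,4\n";
open(my $fh, '<', \$input) or die "Can't open in-memory file: $!";

my @names;
my $current_pos = tell $fh;
while (my $line = readline $fh) {
    last unless $line =~ /^#\s*(\S+)/;    # first non-comment line ends the header
    push @names, $1;
    $current_pos = tell $fh;              # position just after the last comment
}

# Rewind to the start of the first data line and read the rest.
seek($fh, $current_pos, 0) or die "seek failed: $!";
my @data = <$fh>;
close $fh;

print "names: @names\n";
print "data:  ", scalar(@data), " lines\n";
```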
To write a gzipped file, use the module IO::Compress::Gzip and create a filehandle by creating a new instance of IO::Compress::Gzip for the desired output file:
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
use IO::Compress::Gzip;
my $fh_out = IO::Compress::Gzip->new("hello.txt.gz")
    or die "gzip failed: $IO::Compress::Gzip::GzipError";
print $fh_out "Hello World!\n";
close $fh_out;
To read from a gzipped file, use the module IO::Uncompress::Gunzip and then create a filehandle by creating a new instance of IO::Uncompress::Gunzip for the input file:
#!/usr/bin/env perl
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
use IO::Uncompress::Gunzip;
my $fh_in = IO::Uncompress::Gunzip->new("hello.txt.gz")
    or die "gunzip failed: $IO::Uncompress::Gunzip::GunzipError";
my $line = readline $fh_in;
print $line;
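The two gzip examples can be combined into a round trip. The sketch below is a hedged variant that compresses into an in-memory scalar instead of a file on disk (both constructors accept a scalar reference), checks the constructors for failure, and reads back every line with getline. The sample lines are made up for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw($GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

my $compressed;    # receives the gzip stream instead of a file on disk

my $gz_out = IO::Compress::Gzip->new(\$compressed)
    or die "gzip failed: $GzipError";
print $gz_out "first line\n", "second line\n";
$gz_out->close or die "close failed: $GzipError";

my $gz_in = IO::Uncompress::Gunzip->new(\$compressed)
    or die "gunzip failed: $GunzipError";
my @lines;
while (defined(my $line = $gz_in->getline)) {
    push @lines, $line;
}
$gz_in->close;

print @lines;
```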
# encode/decode UTF-8 for files and standard input/output
use open qw( :encoding(UTF-8) :std );
This pragma changes the default mode of reading and writing text (files, standard input, standard output, and standard error) to UTF-8, which is typically what you want when writing new applications.
ASCII is a subset of UTF-8, so this is not expected to cause any problems with legacy ASCII files, and it will help protect you from the accidental file corruption that can happen when treating UTF-8 files as ASCII.
However, it is important that you know the encoding of the files you are dealing with and handle them accordingly. (Reasons that we should not ignore Unicode.) For a more in-depth treatment of Unicode, please see the Perl Unicode topic.
| Mode | Explanation |
|---|---|
| > | Write (trunc). Will overwrite existing files. Creates a new file if no file was found. |
| >> | Write (append). Will not overwrite files but append new content at the end. Will also create the file if it does not exist. |
| < | Read. Opens the file in read-only mode. |
| +< | Read / Write. Will not create or truncate the file. |
| +> | Read / Write (trunc). Will create and truncate the file. |
| +>> | Read / Write (append). Will create but not truncate the file. |
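The difference between the truncating and appending write modes can be checked with a short sketch. The helper write_line and the temporary file are hypothetical names introduced only for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);

# Hypothetical helper: open $path in the given mode and write one line.
sub write_line {
    my ($mode, $text) = @_;
    open(my $fh, $mode, $path) or die "Can't open $mode $path: $!";
    print $fh "$text\n";
    close $fh or die "Close failed: $!";
}

write_line('>',  'first');     # creates or truncates the file
write_line('>',  'second');    # truncates again, so "first" is gone
write_line('>>', 'third');     # appends after "second"

open(my $in, '<', $path) or die "Can't open < $path: $!";
my @lines = <$in>;
close $in;

print @lines;
```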