Ask the Delphi Pro 10-Minute Solutions
Getting the Number of Records From a Fixed-Length ASCII File
By Brendan Delumpa
If you work with fixed-length ASCII files, you might want to know how many total lines there are in a file. Sure, you can open up the file in a text editor, but large files can take forever to load. You may be wondering if there is a better way. I've come up with three methods of handling this problem. The first method I'm going to show you may not be the best way, but it's reasonably fast, and exceptionally easy to use. It starts out with this premise: If you know the total number of bytes in the file and know the length of each record, you can divide the total bytes by the record length, and you should get the number of records in the file. Sounds reasonable, right?
For this example, I used a TFileStream object to open up my text file. I like using this particular object because it has some convenient methods and properties that you can use to get the information that you need, in particular the Size property and the Read and Seek methods. So how do you use them?
First, open a file stream on the text file and get its total byte size. Then move through the file one byte at a time, reading each byte into a single-character buffer until you reach a return character (#13). As you pass each byte, increment a counter variable that serves as both a file reference point and, later, the length of the record. When you get to the return character, break out of the loop and add 1 to the counter: it already includes the #13, so only the #10 of the CR/LF pair is still missing. Finally, return the result as the file size divided by the record length:
{======================================================================
 This function will give you the exact record count of a file. It uses
 a TFileStream and goes through it byte by byte until it encounters
 a #13. By then recLen already counts the #13, so it adds 1 for the #10
 of the CR/LF pair, then divides the byte size of the file by the true
 record length.
 Note that this will only work on text files.
 ======================================================================}
function GetTextFileRecords(FileName : String) : Integer;
var
  ts : TFileStream;
  fSize,
  recLen : Integer;
  buf : Char;
begin
  buf := #0;
  recLen := 0;
  //Open up a File Stream
  ts := TFileStream.Create(FileName, fmOpenRead);
  with ts do begin
    //Get the File Size
    fSize := Size;
    try
      //Move through the file a byte at a time
      while (buf <> #13) do begin
        Seek(recLen, soFromBeginning);
        if Read(buf, 1) <> 1 then Break; //End of file: no #13 was found
        Inc(recLen);
      end
    finally
      Free;
    end;
  end;
  recLen := recLen + 1; //recLen includes the #13; add the #10 of the CR/LF pair.
  Result := Round(fSize/recLen);
end;
This method may not be the most efficient, but it's safe, and it works. The second method, however, is a faster way of doing this. Open up the file as a regular untyped file, then read a bunch of bytes into a large buffer, let's say an Array of Char 8K in size:
function GetTextFileRecords(FileName : String) : Integer;
const
  BlockSize = 8192;
var
  F : File;
  fSize,
  amtXfer : Integer;
  buf : Array[0..BlockSize] of Char;
begin
  FillChar(buf, SizeOf(buf), 0); //Zero the buffer so StrPas always finds a null terminator
  AssignFile(F, FileName); //Open up the text file as an untyped file
  Reset(F, 1);
  fSize := FileSize(F); //Get the file size
  BlockRead(F, buf, BlockSize, amtXfer); //read in up to an 8K block
  CloseFile(F); //close the file, you're done
  Result := Round(fSize/(Pos(#13, StrPas(buf)) + 1));
end;
Scanning an array is much faster than moving through a file, but the disadvantage is that you run the risk of the buffer being too small: if the first line doesn't fit, there is no #13 in the buffer and Pos returns 0. I've seen some fixed-length ASCII files with line sizes up to 8K. Several things are different about this function compared with the first one I wrote. First of all, it involves a lot less code, because there's no object to construct; I open an untyped file, get its size, read a big block, then immediately close it. Notice too that I don't use a loop to find a #13. Instead, I use the StrPas function to convert the array of char into a string, which I pass to the Pos function to get the position of the return character. That position counts every byte up to and including the #13, so adding one to it accounts for the #10 of the CR/LF pair and gives the record length.
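For lines that might be longer than a single block, one option is to keep reading blocks until a #13 turns up. The following is only a minimal sketch of that idea, not part of the original three methods: FindRecordLength and the scratch file name are hypothetical, and the buffer is declared as an array of Byte on the assumption that the file holds single-byte ASCII.

```pascal
program LongLineSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: returns the record length (including CR/LF) by
// scanning successive blocks until the first #13, or 0 if none is found.
function FindRecordLength(const FileName : String) : Integer;
const
  BlockSize = 8192;
var
  F : File;
  buf : Array[0..BlockSize - 1] of Byte;
  amtXfer, i, scanned : Integer;
begin
  Result := 0;
  scanned := 0; // bytes examined in earlier blocks
  AssignFile(F, FileName);
  Reset(F, 1); // record size of 1, so counts are in bytes
  try
    repeat
      BlockRead(F, buf, BlockSize, amtXfer);
      for i := 0 to amtXfer - 1 do
        if buf[i] = 13 then
        begin
          // bytes before the CR, plus 2 for the CR/LF pair
          Result := scanned + i + 2;
          Exit;
        end;
      Inc(scanned, amtXfer);
    until amtXfer = 0;
  finally
    CloseFile(F);
  end;
end;

var
  F : File;
  rec : Array[0..10001] of Byte;
  i, recLen, fSize : Integer;
begin
  // Build one record: 10000 data bytes followed by CR/LF
  FillChar(rec, SizeOf(rec), Ord('A'));
  rec[10000] := 13;
  rec[10001] := 10;
  AssignFile(F, 'longline_sketch.tmp'); // hypothetical scratch file
  Rewrite(F, 1);
  for i := 1 to 3 do
    BlockWrite(F, rec, SizeOf(rec));
  fSize := FileSize(F);
  CloseFile(F);

  recLen := FindRecordLength('longline_sketch.tmp');
  WriteLn(recLen);           // 10002
  WriteLn(fSize div recLen); // 3
  Erase(F);
end.
```

The trade-off is a loop and a few more lines in exchange for not having to guess a block size that is larger than the longest line.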
Because I don't have to construct an object, this method is a lot faster than the earlier one, and it's not very complicated either. Where this type of operation can get tricky is with BlockRead. To use BlockRead successfully, you need to specify a record size in the call to Reset. That specification can be a bit confusing, so just remember this: for byte-oriented reads through a file, always use a record size of 1, so that every count is measured in bytes. Also, notice that I included a variable called amtXfer. BlockRead fills this variable with the actual number of bytes read. If you don't supply it and BlockRead reads fewer bytes than you asked for, which happens with any file smaller than the block, you'll raise an I/O exception. You could wrap the call in an exception-handling block, but why bother? Just supply the variable, and you don't have to worry about the exception.
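To see the record size and the count variable at work, here is a small sketch that reads a file much shorter than the block. The scratch file name and the SampleRec constant are made up for illustration, and the scan over the bytes actually read is a swapped-in alternative to the StrPas/Pos approach, not the method used above.

```pascal
program BlockReadSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  BlockSize = 8192;
  SampleName = 'blockread_sketch.tmp'; // hypothetical scratch file
  // One 10-byte record: 'ABCDEFGH' followed by CR/LF
  SampleRec : Array[0..9] of Byte = (65, 66, 67, 68, 69, 70, 71, 72, 13, 10);

var
  F : File;
  buf : Array[0..BlockSize - 1] of Byte;
  amtXfer, fSize, recLen, i : Integer;
begin
  // Write three records so the file (30 bytes) is far smaller than the block
  AssignFile(F, SampleName);
  Rewrite(F, 1);
  for i := 1 to 3 do
    BlockWrite(F, SampleRec, SizeOf(SampleRec));
  CloseFile(F);

  Reset(F, 1);                           // record size 1: counts are in bytes
  fSize := FileSize(F);
  BlockRead(F, buf, BlockSize, amtXfer); // the short read is reported in amtXfer
  CloseFile(F);

  // Scan only the bytes that were actually read
  recLen := 0;
  for i := 0 to amtXfer - 1 do
    if buf[i] = 13 then
    begin
      recLen := i + 2;                   // bytes before the CR, plus CR/LF
      Break;
    end;

  WriteLn(amtXfer);                      // 30
  if recLen > 0 then
    WriteLn(fSize div recLen);           // 3
  Erase(F);
end.
```

Limiting the scan to amtXfer bytes means stale data past the end of the read can never be mistaken for a return character.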
Is this the best way to get the record count of a fixed-length text file? Admittedly, it's one of the faster ways, short of using Assembler. But I wondered what a pure WinAPI version would look like. My curiosity got the best of me, because I wasn't satisfied with just the BlockRead method. I knew there had to be another way to do it with WinAPI calls:
function GetTextFileRecordsWinAPI(FileName : String) : Integer;
const
  BlockSize = 8192;
var
  F : THandle;
  amtXfer,
  fSize : DWORD;
  buf : Array[0..BlockSize] of Char;
begin
  FillChar(buf, SizeOf(buf), 0); //Zero the buffer so StrPas always finds a null terminator
  //Open up file. FILE_FLAG_NO_BUFFERING is left out because it requires
  //sector-aligned buffers and read sizes, which a local array can't guarantee.
  F := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  fSize := GetFileSize(F, nil); //Get the file's size
  ReadFile(F, buf, BlockSize, amtXfer, nil); //Read a block from the file
  CloseHandle(F);
  Result := Round(fSize/(Pos(#13, StrPas(buf)) + 1));
end;
This last method is almost exactly the same as the second method, but uses WinAPI calls to accomplish the same task. For simplicity's sake, I prefer the elegance of the second method; there's just a lot less coding involved. While the WinAPI method may require one less line of code, the CreateFile function is not the easiest thing to work with. I spent a bit of time Alt-Tabbing between the code editor and Windows help to get the syntax and constants right. Granted, it's easier now that I've done it, but it's not a method that I prefer. So I'll leave it up to you to decide which method you like best.
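Whichever method you pick, all of them rest on the premise that the file size is an exact multiple of the record length. As a quick sanity check on that premise, a helper along these lines could replace the rounding step. It is a sketch only: ExactRecordCount is a hypothetical name, and returning -1 for a mismatch is an arbitrary convention chosen for the example.

```pascal
program RecordCountCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper. RecLen is the full record length including CR/LF.
// Returns -1 when the file size is not an exact multiple of RecLen, which
// suggests the file is not truly fixed-length or lacks a final CR/LF.
function ExactRecordCount(FileBytes, RecLen : Integer) : Integer;
begin
  if (RecLen <= 0) or (FileBytes mod RecLen <> 0) then
    Result := -1
  else
    Result := FileBytes div RecLen;
end;

begin
  // 100 records of 78 data bytes plus CR/LF = 100 * 80 = 8000 bytes
  WriteLn(ExactRecordCount(8000, 80)); // 100
  // Two bytes short, for example a last line with no CR/LF
  WriteLn(ExactRecordCount(7998, 80)); // -1
end.
```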