PolyXarc
v2.0 9 April 1990
Public domain by Jeffrey J. Nonken
Amiga port by Steve Palm
OS/2 port by Bill Andrus
By giving this away, complete with source code and docs, I hope
to encourage others to do likewise. I think we can all gain in
the long run by sharing. Public Domain means that nobody owns it.
You can give it away, sell it, incorporate bits into other
programs, whatever you want. However, in the interests of program
management, I request that if you make changes that may benefit
others, you
1) Share them with us;
2) Send me a copy of source so I can make an 'official' release.
Warranty: none.
Address netmail to 1:273/715 in Fidonet. If you wish to log on,
call (215)279-9799 (300/1200/2400/9600/14400 HST/V.32), PCPable.
You can get the latest distribution version of PolyXarc by file
requesting POLYXARC from 1:273/715. If you wish to send anything
via U.S. Snail, write to:
Jeffrey J. Nonken
507 Ave. Presidio
San Clemente, Ca. 92672
NOTE: Version 2.0 has major changes in the configuration file
syntax. VERSION 1.x CONFIGURATION FILES WILL NOT WORK WITH 2.x! I
have made no attempts to detect old configuration files and
provide correction or warning; that would complicate and enlarge
PolyXarc unnecessarily. It is up to you to make sure your
configuration files have been replaced or updated.
WHAT IT IS
==========
"Poly" means "many", and "Xarc" means "archive extractor".
PolyXarc is a program that permits automatic extraction of most
known archive formats. By "Xarc" I do not necessarily refer to
SEA's ARC program or the file format supported by ARC (which I
will refer to as ".ARC format" herein), although PolyXarc
supports both. The purpose of this program is to allow Fidonet
BBS system operators to automatically extract bundles that have
been made with various archive programs. Traditionally sysops
have used one .ARC format program or another. But recent
improvements in compression techniques, along with a rapidly
climbing volume of echomail, have encouraged sysops to look at
other programs.
It is not my purpose to praise one archive format over another.
In fact, the whole purpose of this program is to support as wide
a variety as possible.
PolyXarc 2.0: multiple-format archive extractor executive
PolyXarc does not itself extract archive files; PolyXarc
identifies which format is being used and calls on the
appropriate archive extractor program to do the actual work. In
this lies its flexibility. Think of PolyXarc as an "Archive
Extractor Executive".
PolyXarc is a direct replacement for SPAZ v1.40 (developed by Dan
Thomson and Andrew Farmer), except that PolyXarc is much larger
(about 35k vs. 6k) and may not work in cases where memory is at a
premium. However, the SPAZ developers have not been able to keep
up with recent developments. Therefore I wrote PolyXarc to take
up the slack.
Aside from maintaining compatibility with SPAZ, I designed
PolyXarc with three major goals in mind:
- directly executable by Opus (replaces ARCE) or other mailers;
- stand-alone (can automatically extract all mail bundles);
- maximum flexibility.
PolyXarc supports one critical command-line switch that SPAZ 1.40
does not: the '/r' switch. Opus 1.0x passes a '/r' switch to ARCE
to force it to overwrite existing files, and SPAZ will exit with
an error if it sees that switch, making it unsuitable for use as
a direct ARCE replacement. PolyXarc will accept either '/r' or
SPAZ's '/o'.
PolyXarc also duplicates SPAZ's ability to find all mail bundles
in a specified directory and de-archive all of them. In addition,
PolyXarc will sort the files by their date/timestamps before
beginning extraction to assure that the packets are, as much as
possible, extracted in the correct order. PolyXarc also gives you
the choice of limiting the number of archives that it extracts in
a session so that you can process them a few at a time instead of
all at once.
I tried to write PolyXarc to be as flexible and expandable as I
could within reason. Unlike most multi-archive mail unbundlers,
except for some .ARC format recognition, PolyXarc does not have
any specific format recognition built in. Instead, PolyXarc has a
configuration file that contains signature strings and command
line templates, allowing you to expand PolyXarc's capabilities
without having to write any code. In addition, you can customize
PolyXarc's recognition of .ARC format compression flags, and tell
it which program to use for which flag. However, if all else
fails, and you come upon a format which PolyXarc is not equipped
to handle, you can alter the source to accommodate it, for I have
released the source code to the public domain. This gives
unlimited permission for anybody to alter it, providing a
permanent upgrade path.
The one thing SPAZ has that PolyXarc does not support is the -A
switch. That causes SPAZ to limit itself to using ARCE for level
8 .ARC format files. Since PolyXarc on one hand does not directly
support any specific program, and on the other hand is completely
configurable, it makes no sense for PolyXarc to support the -A
flag. If -A shows up on the command line, PolyXarc will ignore
it.
HOW TO USE IT
=============
PolyXarc's syntax is:
PolyXarc [switches] archive [name [name...]] [switches]
Recognized switches are:
-Cconfig    -I        -O
-D          -Maddr    -R
-F[n]       -N        -Q
archive
This is the name of the archive file you wish to extract from, or
if you specify the -F switch, the path where mail bundles are
kept. In the former case (not using -F) wildcards are supported,
and if you do not supply an extension, PolyXarc will assume an
extension of ".*". In the latter case (using -F), you must
specify a path ONLY, not a file name. Also, if you specify -F you
may leave the archive path out completely and it will default to
the current directory.
name...
This is one or more names of files within the archive. If you
list more than one name, they should be separated by spaces.
These are optional. If left off, most extractors will assume that
you want to extract all files. The only limitation is that
PolyXarc expects an archive or path name first, then zero or more
file names. PolyXarc considers any parameter that does not start
with a switch character ('-' or '/') to be a file name or archive
name.
-Cconfig
This is the file that contains signature information. If it is
not in your current subdirectory you must supply a path to it as
well. The file name must follow the switch with no intervening
spaces. If you do not specify -C on the command line, PolyXarc
will look first for POLYXARC.CFG in the current directory, then
in the directory of the .EXE file, and if not found will then
look for a PENGUIN file (described later).
-D
Specifying this will cause PolyXarc to delete any archive file it
has successfully extracted. Specifying the -F switch causes -D
to be enabled unconditionally.
-F[n]
-F will cause PolyXarc to extract all the mail bundles it can
find in the subdirectory specified by 'archive'. If you specify a
number, PolyXarc will extract up to that number of mail bundles.
'archive' must be a path, not a filename. Specifying -F also
forces -D and -O, and disables -N.
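The bundle selection that -F[n] performs (oldest bundles first,
at most n of them) can be sketched as follows. This is only an
illustration in Python, not PolyXarc's source; the function name
and the (name, timestamp) tuples are assumptions made for the
example.

```python
def select_bundles(files, limit=None):
    """Pick the bundles to extract in one session.

    files is an iterable of (name, timestamp) pairs; the result is
    ordered oldest first and cut off after `limit` entries (no cut-off
    when limit is None, which corresponds to a plain -F).
    """
    ordered = sorted(files, key=lambda item: item[1])
    return [name for name, _ in ordered][:limit]
```

With three bundles stamped 30, 10 and 20, a limit of 2 yields
the bundles stamped 10 and 20, in that order.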
-I
I intended this switch mainly for manual use. It causes PolyXarc
to ignore errors returned from the archivers.
-Maddr
This causes PolyXarc to calculate and display the net address of
the sender of each bundle. 'addr' is your netmail address, net
and node only. Example: -M273/715.
-N
Specifying this causes PolyXarc not to attempt to sort an .ARC
format file. Normal action is to sort all .ARC format files (the
files inside the archive) by the date and time stamps. Specifying
-F will override -N and cause PolyXarc to sort the files.
-O
This is the overwrite flag. If the archiver finds that a file it
is extracting already exists, this flag will cause it to
overwrite the old file without prompting. Specifying the -F
parameter causes -O to be enabled unconditionally.
-R
This is the same as -O. I included it to maintain ARCE syntax
compatibility.
-Q
Specifying -Q causes PolyXarc not to display most status and
configuration messages. Error messages will still be displayed.
NOTE: None of these switches is position dependent or
case sensitive. Any may be specified with a hyphen (-)
or a slash (/). If you specify a question mark (?) as
the first parameter, you will get a brief help message
and no processing will be done.
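The switch rules above can be summarized in a short Python
sketch. It is not taken from PolyXarc's source: the function
name and the option dictionary are hypothetical, and ignoring
every unknown switch (rather than only -A) is an assumption.

```python
def parse_args(argv):
    """Split a PolyXarc-style argument list into (options, archive, names)."""
    opts = {"config": None, "delete": False, "find": False, "limit": None,
            "ignore": False, "addr": None, "nosort": False,
            "overwrite": False, "quiet": False, "help": False}
    if argv and argv[0] == "?":
        opts["help"] = True          # brief help, no processing
        return opts, None, []
    names = []
    for arg in argv:
        # Anything not starting with '-' or '/' is an archive or file name.
        if arg[:1] not in ("-", "/") or len(arg) < 2:
            names.append(arg)
            continue
        key, value = arg[1].upper(), arg[2:]   # switches are not case sensitive
        if key == "C":
            opts["config"] = value
        elif key == "D":
            opts["delete"] = True
        elif key == "F":
            opts["find"] = True
            opts["limit"] = int(value) if value.isdigit() else None
        elif key == "I":
            opts["ignore"] = True
        elif key == "M":
            opts["addr"] = value
        elif key == "N":
            opts["nosort"] = True
        elif key in ("O", "R"):      # -R is the ARCE spelling of -O
            opts["overwrite"] = True
        elif key == "Q":
            opts["quiet"] = True
        # Assumption: unknown switches (for example SPAZ's -A) are ignored.
    if opts["find"]:
        # -F forces -D and -O and disables -N.
        opts["delete"] = opts["overwrite"] = True
        opts["nosort"] = False
    archive = names[0] if names else None
    return opts, archive, names[1:]
```

For example, parse_args(["/f3", "C:\\NET\\FILE", "-m273/715"])
turns on find, delete and overwrite, sets the limit to 3, and
reports C:\NET\FILE as the archive path.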
PolyXarc requires a single configuration file. This file gives
PolyXarc enough information to recognize various archive formats
by looking for specific 'signatures'. There are three ways to
specify the configuration file name:
- just have POLYXARC.CFG in the current directory, or in the
directory with POLYXARC.EXE;
- use the -C switch to specify it on the command line;
- use a PENGUIN file.
Doug Boone has proposed an idea that I rather like; rather than
proliferate configuration files, gather as many as possible into
one file. His implementation requires you to specify the file in
an environment variable called PENGUIN. PolyXarc may use the
PENGUIN technique if you have set the environment variable.
This is how PolyXarc decides which technique to use: if you
specify a file on the command line using the -C parameter,
PolyXarc will use that file. If you don't specify the -C
parameter, PolyXarc will look for POLYXARC.CFG in the current
directory. If it doesn't exist, PolyXarc will then look in the
directory that it was executed from. If it isn't there either,
PolyXarc will look for the PENGUIN environment variable. If it
exists, PolyXarc will use that file; otherwise, it will exit with
an error.
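The search order just described can be written down as a small
Python sketch. The function and parameter names are invented
for the illustration; the `exists` parameter is there only so
the logic can be tried without touching the disk.

```python
import os

def find_config(c_switch, cwd, exe_dir, environ, exists=os.path.isfile):
    """Return the configuration file to use, or None if there is none.

    Order: the -C switch, POLYXARC.CFG in the current directory,
    POLYXARC.CFG in the program's directory, then the PENGUIN variable.
    """
    if c_switch:
        return c_switch
    for folder in (cwd, exe_dir):
        candidate = os.path.join(folder, "POLYXARC.CFG")
        if exists(candidate):
            return candidate
    # None here corresponds to "exit with an error".
    return environ.get("PENGUIN")
```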
To set the environment variable, you would put a statement like
this into your AUTOEXEC.BAT file:
SET PENGUIN=D:\OPUS\PENGUIN.CFG
There are three keywords that PolyXarc recognizes in the
configuration file, plus BEGIN and END statements. PolyXarc will
not recognize any keywords until it sees a BEGIN POLYXARC
statement, and will ignore anything after an END POLYXARC
statement until it sees another BEGIN POLYXARC, if any. You may
have as many of these blocks as you want in the configuration
file, and they may overlap other programs' blocks if you wish.
PolyXarc will ignore anything it doesn't recognize. The keywords
are:
ARC
SIGNATURE
NOSORT
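As an illustration of the BEGIN/END block rule, here is a
minimal Python sketch that pulls the PolyXarc lines out of a
shared configuration file. It is not PolyXarc's own parser; it
assumes that keywords are matched without regard to case and
that lines starting with ';' are comments (as in the
distributed configuration file).

```python
def polyxarc_lines(text):
    """Yield (keyword, fields) for recognized lines inside POLYXARC blocks."""
    keywords = {"ARC", "SIGNATURE", "NOSORT"}
    active = False
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith(";"):
            continue                      # blank line or comment
        upper = [w.upper() for w in words[:2]]
        if upper == ["BEGIN", "POLYXARC"]:
            active = True
        elif upper == ["END", "POLYXARC"]:
            active = False
        elif active and upper[0] in keywords:
            yield upper[0], words[1:]
        # Anything else (other programs' lines, unknown words) is ignored.
```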
Before I describe the keyword syntaxes, let me explain the
command templates they use. A command template is a description
of the command line used to call a certain program. The general
syntax of a command line consists of the command, followed by one
or more parameters, the overwrite flag key, the extract flag key,
the filename key, and/or the extractname key. The parameters and
keys may be in different orders depending on the syntax of the
particular archive extractor you are using.
The command is the actual DOS command (such as ZOO or ARCE). The
parameters are any parameters needed to cause an extraction. The
extract flag key is actually an 'insert flag' that tells PolyXarc
where to put the extract switch if it is needed. The extract flag
key is "%1". The overwrite flag key, "%2", tells PolyXarc where
to put the overwrite switch if it is needed. The filename key,
"%3", is the name of the archive to extract from, and is also an
insert flag. The extractname key, "%4", is a list of names of
files to extract from the archive. Another way of looking at it
is that you take the general syntax for the particular archiver,
put a %1 in place of the extract switch (if any... most archivers
will not actually use this field), a %2 in place of the
overwrite switch, a %3 in place of the archive filename, and a %4
where the list of files would be if you wanted to specify which
files to extract. The actual order of these parameters depends on
what each particular archiver expects to see.
As mentioned above, most extractors will not need the extract
flag key. In most cases, the extractors have an extract command
(either explicit or implied) that is used whether you want
automatic overwrite or not; to overwrite, you use the overwrite
command in addition to the extract command. I have not seen any
archive extractor that uses a separate command for extract-
without-overwrite versus extract-with-overwrite; nevertheless, I
would rather include support for a redundant parameter now than
have to 'fix' it later. In any case, the extract flag key will
only be used if the overwrite option is not invoked on PolyXarc's
command line, and the overwrite flag key will only be used if the
overwrite option is invoked. If there is an extract command that
is used whether or not overwrite is invoked, then you should
include it as a permanent part of the template for that command.
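To make the key substitution concrete, here is a hedged Python
sketch of how a template such as "pak /e %2 %3 %4" could be
expanded. The function name and argument order are made up for
the example, and collapsing the leftover blanks is an
assumption about the final command line, not a documented
PolyXarc behavior.

```python
import re

def expand_template(template, overwrite_flag, extract_flag, archive,
                    names=(), overwrite=False):
    """Fill the %1..%4 keys of a command template."""
    values = {
        "%1": "" if overwrite else extract_flag,    # extract flag key
        "%2": overwrite_flag if overwrite else "",  # overwrite flag key
        "%3": archive,                              # filename key
        "%4": " ".join(names),                      # extractname key
    }
    # A function replacement keeps backslashes in DOS paths literal.
    expanded = re.sub(r"%[1-4]", lambda m: values[m.group(0)], template)
    # Assumption: squeeze out the gaps left by keys that expanded to nothing.
    return " ".join(expanded.split())
```

Called with the template "pak /e %2 %3 %4", overwrite flag
"/WA" and archive 0000FFFB.MO2, this gives
"pak /e /WA 0000FFFB.MO2" when overwrite is requested and
"pak /e 0000FFFB.MO2" when it is not.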
General syntax for the ARC keyword:
ARC level overwrite extract command_template
PolyXarc recognizes .ARC format files internally. The ARC command
spec defines a level (known as a 'compression type' by the .ARC
compatible programs) at which a certain program should be
executed. The lower numbers must be specified first, as a level
will actually specify its own level and all below it, and the
levels will be searched in definition order. For example, if you
specify:
ARC 8 /r ! arce %3 %4 %2
ARC 11 /WA ! pak /e %2 %3 %4
then 'arce' will be used for all archives that have files
compressed to levels 0 through 8. If an archive has anything
compressed with levels 9, 10, or 11, 'pak' will be used. If you
specify 'pak' first:
ARC 11 /WA ! pak /e %2 %3 %4
ARC 8 /r ! arce %3 %4 %2
then 'arce' will never be used, because PolyXarc uses the
first definition whose level is equal to or higher than the
highest level found in the .ARC file.
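The effect of definition order can be seen in a few lines of
Python. This is a sketch of the selection rule only; the
function name and the (level, command) pairs are illustrative.

```python
def pick_extractor(definitions, highest_level):
    """Return the command of the first ARC definition that covers the level.

    definitions is a list of (level, command) pairs in the order they
    appear in the configuration file.
    """
    for level, command in definitions:
        if level >= highest_level:
            return command
    return None   # no definition is high enough


good_order = [(8, "arce"), (11, "pak")]
bad_order = [(11, "pak"), (8, "arce")]
```

With good_order, level 8 selects 'arce' and level 10 selects
'pak'. With bad_order, every level from 0 through 11 selects
'pak', so 'arce' is never reached.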
Overwrite is the parameter this particular archive program needs
in order to automatically overwrite existing files while
extracting. The overwrite parameter must be 5 characters or less.
Extract is the parameter this particular archive program needs in
order to extract files. This is only used for programs which use
different commands for 'extract' and 'extract overwrite'. The
extract parameter must be 5 characters or less.
In most cases you will not need the extract parameter. However,
there must be at least one character defined for each field,
whether it is used or not. In my distributed configuration file I
simply put in a single exclamation mark (!) as a place holder.
The command template defines the command line (including the name
of the arc file) used to invoke the necessary de-archiver.
Here are some examples from the distributed configuration file:
;
; Make sure these are in order from low to high level!
;
ARC 8 /r ! arce %3 %4 %2
ARC 9 /r ! pkunpak %2 %3 %4
ARC 11 /WA ! pak /e %2 %3 %4
Notice that the level numbers go from low to high. Here is the
PAK command line, first with overwrite enabled then without,
after expansion:
pak /e /WA 0000FFFB.MO2
pak /e 0000FFFB.MO2
General syntax for the SIGNATURE keyword:
SIGNATURE offset signature overwrite extract command_template
This command allows PolyXarc to automatically recognize different
compressor archive formats without being locked into any
particular format or set of formats. The only limitation is that
the signatures must consist of non-whitespace printable
characters. .ARC files are not a problem because they are
recognized by PolyXarc internally. Note that you should not try
to define a SIGNATURE for .ARC format files. The different
signatures will be looked for in order of definition, with .ARC
files coming last.
Offset is the offset into the file of the signature. If offset is
preceded by a '-', PolyXarc will count backwards from the end of
the file.
Signature is the text of the signature. The signature parameter
must be 5 characters or less.
Overwrite is the parameter this particular archive program needs
in order to automatically overwrite existing files while
extracting. The overwrite parameter must be 5 characters or less.
Extract is the parameter this particular archive program needs in
order to extract files. This is only used for programs which use
different commands for 'extract' and 'extract overwrite'. The
extract parameter must be 5 characters or less.
In most cases you will not need the extract parameter. However,
there must be at least one character defined for each field,
whether it is used or not. In my distributed configuration file I
simply put in a single exclamation mark (!) as a place holder.
The command template defines the command line (including the name
of the arc file) used to invoke the necessary de-archiver.
Here are some examples from the distributed configuration file:
;
SIGNATURE -3 DWC w ! dwc e%2 %3 %4
SIGNATURE 2 -lh /mc ! lharc e %2 %3 %4
SIGNATURE 0 ZOO O ! zoo x%2 %3 %4
SIGNATURE 0 PK -o ! pkunzip %2 %3 %4
An expanded Zoo command line would look like this, first without,
then with overwrite:
zoo x 0000FFFB.MO2
zoo xO 0000FFFB.MO2
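Signature matching itself is simple enough to sketch in
Python. The code below is an illustration, not PolyXarc's
source: the function names are hypothetical, and treating a
negative offset as "that many bytes before the end of the
file" is an assumption based on the DWC example (-3 with a
three-character signature).

```python
def matches(data, offset, signature):
    """True if the signature text sits at the given offset in data."""
    sig = signature.encode("ascii")
    # Assumption: a negative offset counts back from the end of the file.
    start = offset if offset >= 0 else len(data) + offset
    return start >= 0 and data[start:start + len(sig)] == sig


def identify(data, signatures):
    """Return the template of the first matching signature, in definition order."""
    for offset, signature, template in signatures:
        if matches(data, offset, signature):
            return template
    return None   # fall through to the built-in .ARC check


SIGNATURES = [
    (-3, "DWC", "dwc e%2 %3 %4"),
    (2, "-lh", "lharc e %2 %3 %4"),
    (0, "ZOO", "zoo x%2 %3 %4"),
    (0, "PK", "pkunzip %2 %3 %4"),
]
```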
The NOSORT keyword syntax is:
NOSORT
If specified, NOSORT causes PolyXarc not to attempt to sort .ARC
format files. This is almost identical to the -N switch on the
command line. If neither is specified, PolyXarc will attempt to
sort the files in .ARC format archives by date and time in order
to keep mail packets in chronological order. This is in case the
sender used PKPAK or PKARC to pack his mail; these programs sort
the files in the archive alphabetically, which occasionally
causes messages to get out of order. Note that while the -F
switch will override the -N switch, the NOSORT keyword overrides
the -F switch, causing PolyXarc never to sort.
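The precedence among NOSORT, -F and -N reduces to a three-line
decision, shown here as a Python sketch (the function and
argument names are illustrative):

```python
def should_sort(nosort_keyword, f_switch, n_switch):
    """Decide whether the files in an .ARC format archive get sorted."""
    if nosort_keyword:      # NOSORT in the configuration file always wins
        return False
    if f_switch:            # -F overrides -N
        return True
    return not n_switch     # otherwise sort unless -N was given
```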
TECHNICAL SECTION
=================
PolyXarc figures out what format is being used by looking for
certain characteristics in the file. For example, each PKZIP
format file has "PK" as the first two bytes in the file. All .ARC
format files have a hex 1A value as the first byte. The SIGNATURE
lines in the configuration file describe what to look for, where
to look for it, and what command syntax to use if PolyXarc finds
it.
...except for .ARC files. Over the years .ARC files have added a
number of different formats. Each time one is added, it gets
assigned a level. The standard programs (ARC, ARCA, and ARCE)
support up to level 8. PKPAK added level 9, and PAK added levels
10 and 11. Due to program size and other considerations, people
often prefer to use the minimum program required for the job,
even though (so far) all the programs support all the levels up
to the ones they have defined. In other words, PKUNPAK will
extract arc files up to level 9, and PAK supports extraction of
all levels up to 11. So if you want, you can use PAK for
extraction all the time. Either way, the ARC lines in the
configuration file describe what to look for, and what command
syntax to use if PolyXarc finds it. Each ARC line has a level
number. That number is the highest level that program will
support, not just the only number. So you want to define the ARC
keywords in ascending order of level so that PolyXarc uses the
minimum program for the highest level found in the archive.
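For readers who want to see how the "highest level found in the
archive" might be determined, here is a rough Python sketch.
The member header layout it assumes (a 1A hex marker, a level
byte, a 13-byte name, a 4-byte little-endian compressed size,
then date, time, CRC and, except for level 1, a 4-byte original
size) comes from general descriptions of the .ARC format, not
from this document, so treat it as an assumption rather than a
description of PolyXarc's code.

```python
import struct

def highest_arc_level(data):
    """Walk the member headers of an .ARC image and return the highest level."""
    pos, highest = 0, 0
    while pos + 2 <= len(data) and data[pos] == 0x1A:
        level = data[pos + 1]
        if level == 0:                  # level 0 marks the end of the archive
            break
        header_len = 25 if level == 1 else 29   # assumed header sizes
        if pos + header_len > len(data):
            break                       # truncated header, stop scanning
        # Compressed size follows marker (1), level (1) and name (13).
        (packed_size,) = struct.unpack_from("<I", data, pos + 15)
        highest = max(highest, level)
        pos += header_len + packed_size
    return highest
```

The result would then be compared against the ARC lines of the
configuration file, lowest level first.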
EXAMPLES
========
I have my incoming and outgoing mail on two different drives. To
optimize space usage on my outgoing drive, I have PolyXarc
extract the archives into the incoming directory before I let
QMail loose on them. Therefore my batch file looks something like
this:
C:
CD \NET\FILE
PolyXarc -F -M273/715
E:
QM TOSS etc.
If you want to run PolyXarc directly from Opus on a barefoot Opus
system, simply rename POLYXARC.EXE to ARCE.EXE. If you already
have the real ARCE someplace on your system, make sure that the
renamed PolyXarc is either earlier in the path, or replaces ARCE,
or is in the directory you are running it from. Otherwise, it
will quite likely find the original ARCE first. Remember also, if
you have ARCE.COM and ARCE.EXE in the same directory, DOS will
always run ARCE.COM. Also remember to either remove the ARCE
control line from PolyXarc's configuration file, or else specify
ARCE's path in the control line; otherwise PolyXarc will end up
trying to execute itself and you will end up with what is known
in the business as 'a mess'.
To use PolyXarc from Confmail, add the -A parameter to the end of
Confmail's command line:
Confmail import blah blah blah -A PolyXarc -o -m273/715
RETURN CODES &c.
================
0 = No error.
1 = Can't open/read source (missing or invalid file).
2 = Can't create/write dest (maybe disk or directory full).
3 = Can't delete old source file (read-only).
4 = Can't rename temp file to source name.
5 = Out of memory.
6 = Cannot determine archive type.
7 = Configuration file syntax error.
8 = Error was returned from the archive extractor.
9 = File was zero-length.
10 = DOS error while trying to execute the extractor.
Note that if you use the -f switch, not all of these errors will
be returned. In the case of a zero-length file, PolyXarc -f will
simply give a warning message, delete the file, and go on. In the
cases of errors 3, 6 and 8, PolyXarc will rename the file to
BAD_ARC.??? (where ??? is an ascending decimal number starting
from 000) and go on.
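The BAD_ARC.??? naming can be sketched in Python as follows.
The function name is hypothetical, and choosing the lowest
unused number is an assumption about how "ascending" is
implemented.

```python
def next_bad_name(existing):
    """Return the first BAD_ARC.nnn name not already present, or None."""
    taken = {name.upper() for name in existing}   # DOS names ignore case
    for n in range(1000):
        candidate = "BAD_ARC.%03d" % n
        if candidate not in taken:
            return candidate
    return None   # all 1000 names are in use
```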
CAVEATS
=======
If a file has been transferred via XMODEM, the last record of
that file will be padded with nulls. In the case of DWC-
compressed files, this can be a problem. Since PolyXarc looks for
"DWC" at the end of the file, the null padding will cause
PolyXarc not to recognize a DWC file. I could add enough smarts
into PolyXarc to have it search backwards through the nulls.
However, DWC is not used by many sysops, and there are other
programs that work just as well, so it doesn't seem worthwhile.
This is a potential problem for any archiver whose signature is
at the end of the file, though at the moment I don't know of any
besides DWC.
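For anyone who does want to modify the source to cope with
XMODEM padding, the backward search amounts to very little. The
Python sketch below shows the idea only; it is not part of
PolyXarc, the function name is invented, and it says nothing
about whether the extractor itself will accept a padded file.

```python
def ends_with_signature(data, signature, pad=b"\x00"):
    """True if data ends with signature once trailing pad bytes are skipped."""
    return data.rstrip(pad).endswith(signature)
```

A file ending in "DWC" followed by any number of null bytes
would then still be recognized.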
ACKNOWLEDGEMENTS
================
Thanks to Mike Housky for the entire .ARC format search-and-sort
section. I shamelessly stole it straight out of his PAKSORT
program. Thanks also to Dan Thomson for coming up with the
original program, SPAZ, and saving me a lot of research and
planning. (There is probably no easier program to design than one
for which you have a working model!)
Many thanks to John Lull and Clay Tinsley, two of my beta
testers. They faithfully ran every revision I put out and let me
know whenever something went wrong. (Which was often.) Clay was
also responsible for convincing me to write PolyXarc. David Page,
another beta tester, also gave some helpful criticism.
Thanks to Steve Palm, who ported PolyXarc over to use on his
Amiga, and Bill Andrus, who ported PolyXarc over to OS/2. Between
us we came up with a set of source files and definitions that
allow us to compile the same source under our respective
operating systems.
Thanks to Thom Henderson, Phil Katz, Rahul Dhesi, David W.
Cooper, Nogate Consulting, Haruyasu Yoshizaki, Vernon Buerg,
Wayne Chin, and all the other archive program authors for all
their hard work. Keep it up, guys!
And as always, thanks to Ward Christensen for everything.
- Jeffrey J. Nonken, sysop of Ophiophile Opus (1:273/715 in
Fidonet) in Blue Bell, Pa., and an all-around great guy.