POLYXARC.DOC



                                    PolyXarc
                               v2.0  9 April 1990
                       Public domain by Jeffrey J. Nonken
                            Amiga port by Steve Palm
                            OS/2 port by Bill Andrus

        By  giving this away, complete with source code and docs, I  hope 
        to  encourage others to do likewise. I think we can all  gain  in 
        the long run by sharing. Public Domain means that nobody owns it. 
        You  can  give  it away, sell it,  incorporate  bits  into  other 
        programs, whatever you want. However, in the interests of program 
        management,  I request that if you make changes that may  benefit 
        others, you
         1) Share them with us;
         2) Send me a copy of source so I can make an 'official' release.

        Warranty: none.

        Address  netmail to 1:273/715 in Fidonet. If you wish to log  on, 
        call (215)279-9799 (300/1200/2400/9600/14400 HST/V.32),  PCPable. 
        You  can get the latest distribution version of PolyXarc by  file 
        requesting POLYXARC from 1:273/715. If you wish to send  anything 
        via U.S. Snail, write to:

                               Jeffrey J. Nonken
                               507 Ave. Presidio
                               San Clemente, Ca. 92672

        NOTE:  Version  2.0 has major changes in the  configuration  file 
        syntax. VERSION 1.x CONFIGURATION FILES WILL NOT WORK WITH 2.x! I 
        have  made  no  attempts to detect old  configuration  files  and 
        provide correction or warning; that would complicate and  enlarge 
        PolyXarc  unnecessarily.  It  is  up to you  to  make  sure  your 
        configuration files have been replaced or updated.

        WHAT IT IS
        ==========
        "Poly"  means  "many",  and  "Xarc"  means  "archive  extractor". 
        PolyXarc  is a program that permits automatic extraction of  most 
        known  archive formats. By "Xarc" I do not necessarily  refer  to 
        SEA's  ARC program or the file format supported by ARC  (which  I 
        will  refer  to  as  ".ARC  format"  herein),  although  PolyXarc 
        supports  both. The purpose of this program is to  allow  Fidonet 
        BBS  system operators to automatically extract bundles that  have 
        been  made  with various archive programs.  Traditionally  sysops 
        have used one .ARC format program or another. But recent
        improvements  in  compression techniques, along  with  a  rapidly 
        climbing  volume of echomail, have encouraged sysops to  look  at 
        other programs.

        It  is not my purpose to praise one archive format over  another. 
        In fact, the whole purpose of this program is to support as  wide 
        a variety as possible. 




           PolyXarc 2.0: multiple-format archive extractor executive


        PolyXarc   does  not  itself  extract  archive  files;   PolyXarc 
        identifies   which  format  is  being  used  and  calls  on   the 
        appropriate  archive extractor program to do the actual work.  In 
        this  lies  its  flexibility. Think of PolyXarc  as  an  "Archive 
        Extractor Executive".

        PolyXarc is a direct replacement for SPAZ v1.40 (developed by Dan 
        Thomson  and Andrew Farmer), except that PolyXarc is much  larger 
        (about 35k vs. 6k) and may not work in cases where memory is at a 
        premium. However, the SPAZ developers have not been able to  keep 
        up  with recent developments. Therefore I wrote PolyXarc to  take 
        up the slack.

        Aside  from  maintaining  compatibility  with  SPAZ,  I  designed 
        PolyXarc with three major goals in mind:
          - directly executable by Opus (replaces ARCE) or other mailers;
          - stand-alone (can automatically extract all mail bundles);
          - maximum flexibility.

        PolyXarc supports one critical command-line switch that SPAZ 1.40 
        does not: the '/r' switch. Opus 1.0x passes a '/r' switch to ARCE 
        to force it to overwrite existing files, and SPAZ will exit  with 
        an error if it sees that switch, making it unsuitable for use  as 
        a  direct ARCE replacement. PolyXarc will accept either  '/r'  or 
        SPAZ's '/o'.

        PolyXarc also duplicates SPAZ's ability to find all mail  bundles 
        in a specified directory and de-archive all of them. In addition, 
        PolyXarc  will  sort the files by  their  date/timestamps  before 
        beginning  extraction to assure that the packets are, as much  as 
        possible, extracted in the correct order. PolyXarc also gives you 
        the choice of limiting the number of archives that it extracts in 
        a session so that you can process them a few at a time instead of 
        all at once.

        I  tried to write PolyXarc to be as flexible and expandable as  I 
        could  within reason. Unlike most multi-archive mail  unbundlers, 
        except  for some .ARC format recognition, PolyXarc does not  have 
        any specific format recognition built in. Instead, PolyXarc has a 
        configuration  file that contains signature strings  and  command 
        line  templates, allowing you to expand  PolyXarc's  capabilities 
        without having to write any code. In addition, you can  customize 
        PolyXarc's recognition of .ARC format compression flags, and tell 
        it  which  program to use for which flag. However,  if  all  else 
        fails, and you come upon a format which PolyXarc is not  equipped 
        to handle, you can alter the source to accommodate it, for I have
        released  the  source  code  to the  public  domain.  This  gives 
        unlimited  permission  for  anybody  to  alter  it,  providing  a 
        permanent upgrade path.










        The  one thing SPAZ has that PolyXarc does not support is the  -A 
        switch. That causes SPAZ to limit itself to using ARCE for  level 
        8 .ARC format files. Since PolyXarc on one hand does not directly 
        support any specific program, and on the other hand is completely 
        configurable,  it makes no sense for PolyXarc to support  the  -A 
        flag.  If -A shows up on the command line, PolyXarc  will  ignore 
        it.

        HOW TO USE IT
        =============
        PolyXarc's syntax is:

          PolyXarc [switches] archive [name [name...]] [switches]

        Recognized switches are:
          -Cconfig               -I                     -O                     
          -D                     -Maddr                 -R                     
          -F[n]                  -N                     -Q                     


        archive

        This is the name of the archive file you wish to extract from, or 
        if  you  specify the -F switch, the path where mail  bundles  are 
        kept. In the former case (not using -F) wildcards are  supported, 
        and  if you do not supply an extension, PolyXarc will  assume  an 
        extension  of  ".*".  In the latter case  (using  -F),  you  must 
        specify a path ONLY, not a file name. Also, if you specify -F you 
        may leave the archive path out completely and it will default  to 
        the current directory.

        name...

        This  is  one or more names of files within the archive.  If  you 
        list  more  than one name, they should be  separated  by  spaces. 
        These are optional. If left off, most extractors will assume that 
        you  want  to  extract all files. The  only  limitation  is  that 
        PolyXarc expects an archive or path name first, then zero or more 
        file names. PolyXarc considers any parameter that does not  start 
        with a switch character ('-' or '/') to be a file name or archive 
        name.

        -Cconfig

        This  is the file that contains signature information. If  it  is 
        not in your current subdirectory you must supply a path to it  as 
        well.  The file name must follow the switch with  no  intervening 
        spaces.  If you do not specify -C on the command  line,  PolyXarc 
        will  look first for POLYXARC.CFG in the current directory,  then 
        in  the  directory of the .EXE file, and if not found  will  then 
        look for a PENGUIN file (described later).








        -D

        Specifying this will cause PolyXarc to delete any archive file it
        has successfully extracted. Specifying the -F switch causes -D
        to be enabled unconditionally.

        -F[n]

        -F  will  cause PolyXarc to extract all the mail bundles  it  can 
        find in the subdirectory specified by 'archive'. If you specify a 
        number, PolyXarc will extract up to that number of mail  bundles. 
        'archive'  must  be a path, not a filename.  Specifying  -F  also 
        forces -D and -O, and disables -N.
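
        The bundle selection that -F[n] performs (find the bundles, sort
        them by timestamp, stop after n) can be sketched in Python. This
        is an illustration only, not PolyXarc's source. The function name
        select_bundles and the pattern used to recognize a mail bundle
        (day-of-week extensions such as .MO2) are assumptions made for
        the example.

```python
import os
import re

# Assumption: mail bundles carry day-of-week extensions such as .MO2 or .FR0.
BUNDLE_RE = re.compile(r"\.(SU|MO|TU|WE|TH|FR|SA)[0-9A-Z]$", re.IGNORECASE)


def select_bundles(directory=".", limit=None):
    """Return bundle paths in `directory`, oldest first, at most `limit`."""
    paths = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if BUNDLE_RE.search(name)
    ]
    paths = [p for p in paths if os.path.isfile(p)]
    # Sort by timestamp; break ties by name so the order is repeatable.
    paths.sort(key=lambda p: (os.path.getmtime(p), os.path.basename(p)))
    return paths if limit is None else paths[:limit]
```

        Calling select_bundles(path, 3) corresponds roughly to -F3, and
        leaving the limit off corresponds to a bare -F.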

        -I

        I intended this mainly for use in manual modes. This switch will
        cause PolyXarc to ignore errors returned from the archivers.

        -Maddr

        This causes PolyXarc to calculate and display the net address  of 
        the  sender of each bundle. 'addr' is your netmail  address,  net 
        and node only. Example: -M273/715.

        -N

        Specifying  this causes PolyXarc not to attempt to  sort  an .ARC 
        format file. Normal action is to sort all .ARC format files  (the 
        files inside the archive) by the date and time stamps. Specifying 
        -F will override -N and cause PolyXarc to sort the files.

        -O

        This is the overwrite flag. If the archiver finds that a file  it 
        is  extracting  already  exists,  this  flag  will  cause  it  to 
        overwrite  the  old  file without prompting.  Specifying  the  -F 
        parameter causes -O to be enabled unconditionally.

        -R

        This  is  the same as -O. I included it to maintain  ARCE  syntax 
        compatibility.

        -Q

        Specifying  -Q  causes PolyXarc not to display  most  status  and 
        configuration messages. Error messages will still be displayed.

             NOTE:  None of these switches is position dependent  or 
             case sensitive. Any may be specified with a hyphen  (-) 
             or  a slash (/). If you specify a question mark (?)  as 
             the first parameter, you will get a brief help  message 
             and no processing will be done.
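
        The command-line rules above (switches start with '-' or '/',
        are not case sensitive, may appear anywhere, and everything else
        is the archive followed by file names) can be sketched in Python
        as follows. The function name parse_args and the dictionary it
        returns are inventions for this example; the real program may
        organize this quite differently.

```python
def parse_args(argv):
    """Split PolyXarc-style arguments into switches, archive, and names."""
    if argv and argv[0] == "?":
        return {"help": True}
    switches, positional = {}, []
    for arg in argv:
        if arg[:1] in ("-", "/") and len(arg) > 1:
            # Switch letter is case-insensitive; the rest is its value.
            switches[arg[1].upper()] = arg[2:]
        else:
            positional.append(arg)
    switches.pop("A", None)          # SPAZ's -A is accepted but ignored
    if "R" in switches:              # -R is the ARCE spelling of -O
        switches["O"] = switches.pop("R")
    if "F" in switches:              # -F forces -D and -O, disables -N
        switches.setdefault("D", "")
        switches.setdefault("O", "")
        switches.pop("N", None)
    archive = positional[0] if positional else None
    return {"help": False, "switches": switches,
            "archive": archive, "names": positional[1:]}
```

        For example, parse_args(["-f3", "/M273/715"]) yields the switches
        F, M, D and O with no archive, which is the case where the path
        defaults to the current directory.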





        PolyXarc  requires a single configuration file. This  file  gives 
        PolyXarc enough information to recognize various archive  formats 
        by  looking  for specific 'signatures'. There are three  ways  to 
        specify the configuration file name:
         - just  have  POLYXARC.CFG in the current directory, or  in  the 
           directory with POLYXARC.EXE;
         - use the -C switch to specify it on the command line;
         - use a PENGUIN file.

        Doug  Boone has proposed an idea that I rather like; rather  than 
        proliferate configuration files, gather as many as possible  into 
        one file. His implementation requires you to specify the file  in 
        an  environment  variable called PENGUIN. PolyXarc  may  use  the 
        PENGUIN technique if you have set the environment variable.

        This  is  how  PolyXarc decides which technique to  use:  if  you 
        specify  a  file  on the command line  using  the  -C  parameter, 
        PolyXarc  will  use  that  file. If  you  don't  specify  the  -C 
        parameter,  PolyXarc  will look for POLYXARC.CFG in  the  current 
        directory.  If it doesn't exist, PolyXarc will then look  in  the 
        directory  that it was executed from. If it isn't  there  either, 
        PolyXarc  will look for the PENGUIN environment variable.  If  it 
        exists, PolyXarc will use that file; otherwise, it will exit with 
        an error. 
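
        The search order just described can be written out as a short
        Python sketch. It is illustrative only. The function name
        find_config and its parameters are hypothetical, and returning
        None stands in for "exit with an error".

```python
import os


def find_config(cli_config=None, exe_dir=".", cwd=".", environ=None):
    """Return the configuration file path, or None if none can be found."""
    environ = os.environ if environ is None else environ
    if cli_config:                       # 1. -C on the command line
        return cli_config
    for directory in (cwd, exe_dir):     # 2. current dir, 3. program dir
        candidate = os.path.join(directory, "POLYXARC.CFG")
        if os.path.isfile(candidate):
            return candidate
    penguin = environ.get("PENGUIN")     # 4. PENGUIN environment variable
    if penguin:
        return penguin
    return None                          # caller should exit with an error
```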

        To  set the environment variable, you would put a statement  like 
        this into your AUTOEXEC.BAT file:

          SET PENGUIN=D:\OPUS\PENGUIN.CFG

        There  are  three  keywords  that  PolyXarc  recognizes  in   the 
        configuration file, plus BEGIN and END statements. PolyXarc  will 
        not  recognize  any  keywords  until it  sees  a  BEGIN  POLYXARC 
        statement,  and  will  ignore  anything  after  an  END  POLYXARC 
        statement  until it sees another BEGIN POLYXARC, if any. You  may 
        have  as  many of these blocks as you want in  the  configuration 
        file,  and they may overlap other programs' blocks if  you  wish. 
        PolyXarc will ignore anything it doesn't recognize. The  keywords 
        are:

          ARC
          SIGNATURE
          NOSORT
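
        The BEGIN/END block handling can be sketched in Python like
        this. It is a rough illustration rather than the actual parser.
        The function name polyxarc_lines is hypothetical, and treating
        keywords as case-insensitive is an assumption.

```python
def polyxarc_lines(text):
    """Yield (keyword, fields) for recognized lines inside POLYXARC blocks."""
    keywords = {"ARC", "SIGNATURE", "NOSORT"}
    active = False
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        upper = [w.upper() for w in words[:2]]
        if upper == ["BEGIN", "POLYXARC"]:
            active = True
        elif upper == ["END", "POLYXARC"]:
            active = False
        elif active and upper[0] in keywords:
            # Anything else (comments, other programs' lines) is skipped.
            yield upper[0], words[1:]
```

        Because unrecognized lines are skipped, another program's block
        can overlap a POLYXARC block without disturbing the result.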

        Before  I  describe  the keyword syntaxes,  let  me  explain  the 
        command  templates they use. A command template is a  description 
        of  the command line used to call a certain program. The  general 
        syntax of a command line consists of the command, followed by one 
        or more parameters, the overwrite flag key, the extract flag key, 
        the filename key, and/or the extractname key. The parameters  and 
        keys  may be in different orders depending on the syntax  of  the 
        particular archive extractor you are using.







        The command is the actual DOS command (such as ZOO or ARCE).  The 
        parameters are any parameters needed to cause an extraction.  The 
        extract flag key is actually an 'insert flag' that tells PolyXarc 
        where to put the extract switch if it is needed. The extract flag 
        key  is "%1". The overwrite flag key, "%2", tells PolyXarc  where 
        to  put the overwrite switch if it is needed. The  filename  key, 
        "%3", is the name of the archive to extract from, and is also  an 
        insert  flag.  The extractname key, "%4", is a list of  names  of 
        files  to extract from the archive. Another way of looking at  it 
        is that you take the general syntax for the particular  archiver, 
        put a %1 in place of the extract switch (if any... most archivers 
        will  not  actually  use  this field), a  %2   in  place  of  the 
        overwrite switch, a %3 in place of the archive filename, and a %4 
        where  the list of files would be if you wanted to specify  which 
        files to extract. The actual order of these parameters depends on 
        what each particular archiver expects to see.

        As  mentioned  above, most extractors will not need  the  extract 
        flag  key. In most cases, the extractors have an extract  command 
        (either  explicit  or  implied) that is  used  whether  you  want 
        automatic  overwrite or not; to overwrite, you use the  overwrite 
        command  in addition to the extract command. I have not seen  any 
        archive  extractor  that  uses a separate  command  for  extract-
        without-overwrite versus extract-with-overwrite; nevertheless,  I 
        would  rather include support for a redundant parameter now  than 
        have  to 'fix' it later. In any case, the extract flag  key  will 
        only be used if the overwrite option is not invoked on PolyXarc's
        command line, and the overwrite flag key will only be used if the 
        overwrite option is invoked. If there is an extract command  that 
        is  used  whether or not overwrite is invoked,  then  you  should 
        include it as a permanent part of the template for that command.
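
        Template expansion as described above can be sketched in Python.
        This is an illustration, not the program's code. The function
        name expand_template is hypothetical, and how the real program
        treats the '!' placeholder is not modeled: the sketch simply
        takes an empty string when no extract flag is needed.

```python
def expand_template(template, archive, names=(), overwrite_flag="",
                    extract_flag="", overwrite=False):
    """Fill in the %1..%4 keys of a command template."""
    substitutions = {
        # %1 is used only when overwrite is NOT requested,
        # %2 only when it is.
        "%1": "" if overwrite else extract_flag,
        "%2": overwrite_flag if overwrite else "",
        "%3": archive,
        "%4": " ".join(names),
    }
    for key, value in substitutions.items():
        template = template.replace(key, value)
    return template
```

        With the template "pak /e %2 %3 %4" and overwrite flag "/WA",
        this gives "pak /e /WA 0000FFFB.MO2 " when overwrite is set, and
        the same line with the flag left out when it is not.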

        General syntax for the ARC keyword:

          ARC level overwrite extract command_template

        PolyXarc recognizes .ARC format files internally. The ARC command 
        spec  defines a level (known as a 'compression type' by  the .ARC 
        compatible  programs)  at  which  a  certain  program  should  be 
        executed.  The lower numbers must be specified first, as a  level 
        will  actually  specify its own level and all below it,  and  the 
        levels will be searched in definition order. For example, if  you 
        specify:
           ARC        8    /r  !  arce %3 %4 %2
           ARC       11   /WA  !  pak /e %2 %3 %4
        then  'arce'  will  be  used for all  archives  that  have  files 
        compressed  to  levels 0 through 8. If an  archive  has  anything 
        compressed  with levels 9, 10, or 11, 'pak' will be used. If  you 
        specify 'pak' first:
           ARC       11   /WA  !  pak /e %2 %3 %4
           ARC        8    /r  !  arce %3 %4 %2
        then 'arce' will never be used, because PolyXarc uses the first
        definition whose level is equal to or above the highest level
        found in the .ARC file.
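
        The first-match rule is easy to see in a small Python sketch.
        The function name pick_arc_program and the list-of-pairs layout
        are assumptions made for the example.

```python
def pick_arc_program(definitions, highest_level):
    """Return the command of the first ARC definition that covers the level.

    `definitions` is a list of (level, command) pairs in the order they
    appear in the configuration file.
    """
    for level, command in definitions:
        if level >= highest_level:
            return command
    return None  # no definition covers this level


in_order = [(8, "arce"), (11, "pak")]
reversed_order = [(11, "pak"), (8, "arce")]
```

        With in_order, a level 5 archive selects 'arce' and a level 9
        archive selects 'pak'. With reversed_order, every archive up to
        level 11 selects 'pak' and 'arce' is never reached.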





        Overwrite is the parameter this particular archive program  needs 
        in   order  to  automatically  overwrite  existing  files   while 
        extracting. The overwrite parameter must be 5 characters or less.

        Extract is the parameter this particular archive program needs in 
        order to extract files. This is only used for programs which  use 
        different commands for 'extract' and 'extract overwrite'. The
        extract parameter must be 5 characters or less.

        In  most cases you will not need the extract parameter.  However, 
        there  must  be at least one character defined  for  each  field, 
        whether it is used or not. In my distributed configuration file I 
        simply put in a single exclamation mark (!) as a place holder.

        The command template defines the command line (including the name 
        of  the arc file) used to invoke the necessary de-archiver.  

        Here are some examples from the distributed configuration file:

        ;
        ; Make sure these are in order from low to high level!
        ;
           ARC        8    /r  !  arce %3 %4 %2
           ARC        9    /r  !  pkunpak %2 %3 %4
           ARC       11   /WA  !  pak /e %2 %3 %4

        Notice  that the level numbers go from low to high. Here  is  the 
        PAK  command  line, first with overwrite  enabled  then  without, 
        after expansion:

          pak /e /WA 0000FFFB.MO2 
          pak /e  0000FFFB.MO2 

        General syntax for the SIGNATURE keyword:

          SIGNATURE offset signature overwrite extract command_template

        This command allows PolyXarc to automatically recognize different 
        compressor   archive  formats  without  being  locked  into   any 
        particular format or set of formats. The only limitation is  that 
        the   signatures   must  consist  of   non-whitespace   printable 
        characters. .ARC  files  are  not  a  problem  because  they  are 
        recognized  by PolyXarc internally. Note that you should not  try 
        to  define  a  SIGNATURE for .ARC  format  files.  The  different 
        signatures  will be looked for in order of definition,  with .ARC 
        files coming last. 

        Offset is the offset into the file of the signature. If offset is 
        preceded by a '-', PolyXarc will count backwards from the end of
        the file. 

        Signature  is the text of the signature. The signature  parameter 
        must be 5 characters or less.






        Overwrite is the parameter this particular archive program  needs 
        in   order  to  automatically  overwrite  existing  files   while 
        extracting. The overwrite parameter must be 5 characters or less.

        Extract is the parameter this particular archive program needs in 
        order to extract files. This is only used for programs which  use 
        different commands for 'extract' and 'extract overwrite'. The
        extract parameter must be 5 characters or less.

        In  most cases you will not need the extract parameter.  However, 
        there  must  be at least one character defined  for  each  field, 
        whether it is used or not. In my distributed configuration file I 
        simply put in a single exclamation mark (!) as a place holder.

        The command template defines the command line (including the name 
        of the arc file) used to invoke the necessary de-archiver.

        Here are some examples from the distributed configuration file:
        ;
           SIGNATURE  -3   DWC      w    !    dwc e%2 %3 %4
           SIGNATURE   2  -lh     /mc    !    lharc e %2 %3 %4
           SIGNATURE   0   ZOO      O    !    zoo x%2 %3 %4
           SIGNATURE   0   PK      -o    !    pkunzip %2 %3 %4

        An expanded Zoo command line would look like this, first without, 
        then with overwrite:

          zoo x 0000FFFB.MO2 
          zoo xO 0000FFFB.MO2 
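
        The signature lookup described in this section can be sketched
        in Python as follows. This is only an illustration. The function
        names are hypothetical, and reading a negative offset of -n as
        "n bytes before the end of the file" is an assumption based on
        the DWC entry.

```python
def matches_signature(data, offset, signature):
    """True if `signature` appears in `data` at `offset` (negative = from end)."""
    sig = signature.encode("ascii")
    start = offset if offset >= 0 else len(data) + offset
    if start < 0:
        return False  # file is shorter than the backwards offset
    return data[start:start + len(sig)] == sig


def identify(data, signatures):
    """Return the template of the first matching (offset, signature, template)."""
    for offset, signature, template in signatures:
        if matches_signature(data, offset, signature):
            return template
    return None  # fall through to the internal .ARC check


SIGNATURES = [
    (-3, "DWC", "dwc e%2 %3 %4"),
    (2, "-lh", "lharc e %2 %3 %4"),
    (0, "ZOO", "zoo x%2 %3 %4"),
    (0, "PK", "pkunzip %2 %3 %4"),
]
```

        The entries are tried in definition order, so a file that
        matches none of them is left for the built-in .ARC recognition.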

        The NOSORT keyword syntax is:

          NOSORT

        If specified, NOSORT causes PolyXarc not to attempt to sort  .ARC 
        format  files. This is almost identical to the -N switch  on  the 
        command  line. If neither is specified, PolyXarc will attempt  to 
        sort the files in .ARC format archives by date and time in  order 
        to keep mail packets in chronological order. This is in case  the 
        sender used PKPAK or PKARC to pack his mail; these programs  sort 
        the  files  in  the archive  alphabetically,  which  occasionally 
        causes  messages  to  get out of order. Note that  while  the  -F 
        switch will override the -N switch, the NOSORT keyword  overrides 
        the -F switch, causing PolyXarc never to sort.
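
        The precedence among -N, -F and NOSORT reduces to a few lines of
        Python. The function name should_sort and its parameters are
        hypothetical.

```python
def should_sort(n_switch=False, f_switch=False, nosort_keyword=False):
    """Decide whether the files in an .ARC format archive get sorted."""
    if nosort_keyword:   # NOSORT in the configuration file always wins
        return False
    if f_switch:         # -F overrides -N
        return True
    return not n_switch  # the default is to sort
```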

        TECHNICAL SECTION
        =================
        PolyXarc  figures  out what format is being used by  looking  for 
        certain  characteristics  in the file. For  example,  each  PKZIP 
        format file has "PK" as the first two bytes in the file. All .ARC 
        format files have a hex 1A value as the first byte. The SIGNATURE 
        lines in the configuration file describe what to look for,  where 
        to look for it, and what command syntax to use if PolyXarc  finds 
        it.





         ...except for .ARC files. Over the years .ARC files have added a 
        number  of  different formats. Each time one is  added,  it  gets 
        assigned  a  level. The standard programs (ARC, ARCA,  and  ARCE) 
        support up to level 8. PKPAK added level 9, and PAK added  levels 
        10  and 11. Due to program size and other considerations,  people 
        often  prefer  to use the minimum program required for  the  job, 
        even  though (so far) all the programs support all the levels  up 
        to  the  ones  they have defined. In other  words,  PKUNPAK  will 
        extract  arc files up to level 9, and PAK supports extraction  of 
        all  levels  up  to  11. So if you want,  you  can  use  PAK  for 
        extraction  all  the  time.  Either way, the  ARC  lines  in  the 
        configuration  file describe what to look for, and  what  command 
        syntax  to  use if PolyXarc finds it. Each ARC line has  a  level 
        number.  That  number  is the highest  level  that  program  will 
        support, not just the only number. So you want to define the  ARC 
        keywords  in ascending order of level so that PolyXarc  uses  the 
        minimum program for the highest level found in the archive.

        EXAMPLES
        ========
        I have my incoming and outgoing mail on two different drives.  To 
        optimize  space  usage  on my outgoing  drive,  I  have  PolyXarc 
        extract  the  archives into the incoming directory before  I  let 
        QMail loose on them. Therefore my batch file looks something like 
        this:

           C:
           CD \NET\FILE
           PolyXarc -F -M273/715
           E:
           QM TOSS etc.

        If you want to run PolyXarc directly from Opus on a barefoot Opus 
        system,  simply rename POLYXARC.EXE to ARCE.EXE. If  you  already 
        have  the real ARCE someplace on your system, make sure that  the 
        renamed PolyXarc is either earlier in the path, or replaces ARCE, 
        or  is  in the directory you are running it from.  Otherwise,  it 
        will quite likely find the original ARCE first. Remember also, if 
        you  have ARCE.COM and ARCE.EXE in the same directory,  DOS  will 
        always  run  ARCE.COM. Also remember to either  remove  the  ARCE 
        control line from PolyXarc's configuration file, or else  specify 
        ARCE's  path in the control line; otherwise PolyXarc will end  up 
        trying  to execute itself and you will end up with what is  known 
        in the business as 'a mess'.

        To use PolyXarc from Confmail, add the -A parameter to the end of 
        Confmail's command line:

           Confmail import blah blah blah -A PolyXarc -o -m273/715










        RETURN CODES &c.
        ================
           0 = No error.
           1 = Can't open/read source (missing or invalid file).
           2 = Can't create/write dest (maybe disk or directory full).
           3 = Can't delete old source file (read-only).
           4 = Can't rename temp file to source name.
           5 = Out of memory.
           6 = Cannot determine archive type.
           7 = Configuration file syntax error.
           8 = Error was returned from the archive extractor.
           9 = File was zero-length.
          10 = DOS error while trying to execute the extractor.

        Note that if you use the -f switch, not all of these errors  will 
        be returned. In the case of a zero-length file, PolyXarc -f  will 
        simply give a warning message, delete the file, and go on. In the 
        cases  of  errors 3, 6 and 8, PolyXarc will rename  the  file  to 
        BAD_ARC.???  (where ??? is an ascending decimal  number  starting 
        from 000) and go on.
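
        Choosing the BAD_ARC.??? name might look like the following
        Python sketch. It assumes the lowest unused number is taken,
        which the text above does not actually state, and the function
        name next_bad_arc_name is hypothetical.

```python
import os


def next_bad_arc_name(directory="."):
    """Return the first unused BAD_ARC.nnn name in `directory`, or None."""
    for n in range(1000):
        name = "BAD_ARC.%03d" % n
        if not os.path.exists(os.path.join(directory, name)):
            return name
    return None  # every name from 000 to 999 is already taken
```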

        CAVEATS
        =======
        If  a  file has been transferred via XMODEM, the last  record  of 
        that  file  will  be  padded with nulls.  In  the  case  of  DWC-
        compressed files, this can be a problem. Since PolyXarc looks for 
        "DWC"  at  the  end  of the file, the  null  padding  will  cause 
        PolyXarc  not to recognize a DWC file. I could add enough  smarts 
        into  PolyXarc  to have it search backwards  through  the  nulls. 
        However,  DWC  is not used by many sysops, and  there  are  other 
        programs  that work just as well, so it doesn't seem  worthwhile. 
        This  is a potential problem for any archiver whose signature  is 
        at the end of the file, though at the moment I don't know of  any 
        besides DWC.
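
        For the curious, the backwards search through the padding that
        PolyXarc does not do could be sketched in Python roughly like
        this. It is not part of the program. The function name
        ends_with_signature is hypothetical, and the pad byte is assumed
        to be a null as described above.

```python
def ends_with_signature(data, signature, pad=b"\x00"):
    """True if `data` ends with `signature` once trailing padding is removed."""
    return data.rstrip(pad).endswith(signature)
```

        A file whose real contents end in null bytes before the
        signature is unaffected, since only the bytes after the
        signature are stripped.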

        ACKNOWLEDGEMENTS
        ================
        Thanks to Mike Housky for the entire .ARC format  search-and-sort 
        section.  I  shamelessly  stole it straight out  of  his  PAKSORT 
        program.  Thanks  also  to Dan Thomson for  coming  up  with  the 
        original  program,  SPAZ,  and saving me a lot  of  research  and 
        planning. (There is probably no easier program to design than one 
        for which you have a working model!)

        Many  thanks  to  John  Lull and Clay Tinsley,  two  of  my  beta 
        testers. They faithfully ran every revision I put out and let  me 
        know  whenever something went wrong. (Which was often.) Clay  was 
        also responsible for convincing me to write PolyXarc. David Page, 
        another beta tester, also gave some helpful criticism.

        Thanks  to  Steve Palm, who ported PolyXarc over to  use  on  his 
        Amiga, and Bill Andrus, who ported PolyXarc over to OS/2. Between 
        us  we  came up with a set of source files and  definitions  that 
        allow  us  to  compile  the  same  source  under  our  respective 
        operating systems.





        Thanks  to  Thom  Henderson, Phil Katz,  Rahul  Dhesi,  David  W. 
        Cooper,  Nogate  Consulting, Haruyasu  Yoshizaki,  Vernon  Buerg, 
        Wayne  Chin,  and all the other archive program authors  for  all 
        their hard work. Keep it up, guys!

        And as always, thanks to Ward Christensen for everything. 

        -  Jeffrey  J.  Nonken, sysop of Ophiophile  Opus  (1:273/715  in 
        Fidonet) in Blue Bell, Pa., and an all-around great guy.














































