NAME
    Set::Files - routines to work with files, each defining a single set

SYNOPSIS
      use Set::Files;
      $Version = $Set::Files::VERSION;

      $obj     = new Set::Files(OPT => VAL, OPT => VAL, ...);

      @set     = $obj->list_sets( [TYPE] );

      @uid     = $obj->owner;
      $uid     = $obj->owner(SET);

      @set     = $obj->owned_by(UID [,TYPE]);

      @ele     = $obj->members(SET);

      $flag    = $obj->is_member(SET, ELE);

      @type    = $obj->list_types( [SET] );

      @dir     = $obj->dir;
      $dir     = $obj->dir(SET);

      %opts    = $obj->opts(SET);
      $val     = $obj->opts(SET,VAR);

      $obj->cache;

      $num     = $obj->add   (SET, FORCE, COMMIT, ELE1,ELE2,...);
      $num     = $obj->remove(SET, FORCE, COMMIT, ELE1,ELE2,...);

      $obj->commit(SET1,SET2,...);

      $obj->delete(SET);
      $obj->delete(SET,1);

DESCRIPTION
    This is a module for working with simple sets of elements where each set
    is defined in a separate file (one file for each set to be defined).

    The advantages of putting each set in a separate file are:

    Set management can be delegated
        If all sets are defined in a single file, management of all sets
        must be done by a single user, or by using a suid program. By
        putting each set in a separate file, different files can be owned by
        different users so management of different sets can be delegated.

    Set files are a simple format
        Because a file consists of a single set only, there is no need to
        have a complex file format which has to be parsed to get information
        about the set. As a result, set files can easily be autogenerated or
        edited with any simple text editor, and errors are less likely to be
        introduced into the file.

    The disadvantages are:

    Permissions problems
        Some applications may need to read all of the data, but since the
        different set files may be owned by different people, permissions
        may get set such that not all set files are readable.

        Applications which actually gather all of the data will need to be
        run as root in order to be reliable. Alternately, some means of
        enforcing the appropriate permissions needs to be in place.

    No central data location
        Usually, when you want to define sets, the data ultimately needs to
        be stored in one central location (which might be a single file or
        database).

        To get around this, a wrapper must be written using this module to
        copy the data to the central location.

    Simple elements only
        Many types of sets have elements which have attributes (for example,
        a ranking within the set or some other attribute). When you start
        adding attributes, you need a more complex file structure in order
        to store this information, so that type of set is not addressed with
        this module. The only attribute that an element has is membership in
        the set.

    Slow data access
        Because the data is spread out over several files, each of which
        must be parsed, and any error checking done, accessing the data can
        be significantly slower than if the data were stored in a central
        location.

    Features of this module include:

    Data caching
        This module provides routines for caching the information from all
        the set files. This can be used to avoid the permissions problems
        (allowing user run applications access to all cached data) and
        decrease access time (no parsing is left, and error checking can be
        done prior to caching the information).

        This still requires that a privileged user or suid script be used to
        update the cache.

    Multiple types of sets
        Often, it is convenient to define different types of sets using a
        single set of files as there may be considerable overlap between the
        sets of different types.

        For example, it might be useful to create files containing sets of
        users who belong to different committees in a department. Also,
        there might be sets of users who belong to various departmental
        mailing lists. One solution is to have two different directories,
        one with set files with lists of users on the various committees;
        one with set files with lists of users on each mailing list. Since
        there might be overlap between these groups, it might be nice to
        have the two sets of files overlap. For example, some committees may
        want to have a mailing list associated with the group, others don't
        want a mailing list, and there may be mailing lists not associated
        with a committee.

        This allows you to have a single file for each set of users, but
        some sets will have mailing lists, some will be committees, and some
        will be both.

    Set ownership
        Since the different files may be owned by different people,
        operations based on set ownership can be done.

METHODS
    The following methods are available:

    VERSION
          use Set::Files;
          $Version=$Set::Files::VERSION;

        Check the module version.

    new
          $obj = new Set::Files(OPT => VAL, OPT => VAL, ...);

        This creates a new Set::Files object which reads the appropriate set
        files (or a cache of the information in set files). The
        initialization options available are described below.

    list_sets
          @set     = $obj->list_sets( [TYPE] );

        Returns a list of all defined sets or the sets of the specified
        type.

    owner
          @uid     = $obj->owner;
          $uid     = $obj->owner(SET);

        Lists all UIDs who own a set, or the owner of the specified set.

    owned_by
          @set     = $obj->owned_by(UID [,TYPE]);

        Lists all sets owned by the specified UID (or those of a specific
        type).

    members
          @ele     = $obj->members(SET);

        Lists all elements in the specified set.

    is_member
          $flag    = $obj->is_member(SET, ELE);

        Returns 1 if ELE is a member of SET.

    list_types
          @type    = $obj->list_types( [SET] );

        A list of all types defined, or the types that the specified set
        belongs to.

    dir
          @dir     = $obj->dir;
          $dir     = $obj->dir(SET);

        All directories containing set files, or the directory containing
        the file of the specified set.

    opts
          %opts    = $obj->opts(SET);
          $val     = $obj->opts(SET,VAR);

        Returns a hash of all options set for a set, or the value of a
        specific option. If the specific option is not set, 0 is returned.

    delete
          $obj->delete($set);
          $obj->delete($set,1);

        This removes the specified set file. By default, it renames the set
        file to .set_files.$set (such files are ignored when reading in set
        data). If the optional second argument is passed in, no backup is
        made (i.e. the set file is deleted completely).

        This method is only available to those who have write access to the
        directory containing the set file.

    cache
          $obj->cache;

        This dumps the current set information to a cache file. This method
        is only valid if the data was read in from files. If it was read in
        from the cache, this method will fail.

    add, remove
          $num = $obj->add   (SET, FORCE, COMMIT, ELE1,ELE2,...);
          $num = $obj->remove(SET, FORCE, COMMIT, ELE1,ELE2,...);

        These functions add/remove the specified elements to/from the set.

        When adding elements to a set, it is first checked to see if the
        element is already in the set, and if so, whether it is explicitly
        included in the set file, or comes from some other set file via an
        INCLUDE tag.

        If the element is not in the set, it is added. If the FORCE flag is
        true, the element will be added to the set file explicitly if it is
        already in the set, but only via an INCLUDE tag. In either case,
        any OMIT tag which removes this element will be removed from the
        list.

        When removing elements from a set, a similar set of tests is done.
        If the element is in the set, it is removed from the file (if it
        appears in the file) AND an OMIT tag is included. If the element does
        NOT appear in the set, the file is unmodified unless the FORCE flag
        is true, in which case an OMIT tag is added.

        The COMMIT flag is used to determine whether the file should be
        written out over the existing file. The file can only be written out
        if data was read from the files. If it was read in from the cache,
        this will fail.

        The return value is the number of changes made to the set.

    commit
          $obj->commit(SET1,SET2,...);

        Any changes that have been made with the add and remove methods can
        be written out to the set file(s) with this method. This method is
        only valid if the data was read in from files. If it was read in
        from the cache, this method will fail.

INIT OPTIONS
    The following options can be passed in to the new method:

    path
          path => DIR1:DIR2:...
          path => [ DIR1, DIR2, ... ]

        The set files may be stored in one or more different directories. By
        default, set files are assumed to be in the current directory, but
        using this option, the directory (or directories) can be explicitly
        set.

        Note that if multiple directories are used and a file of the same
        name exists in more than one of them, the first one found (in the
        order that the directories are included in the list) is used. A
        warning will be issued for files of the same name in other
        directories, but they will be ignored.

        Warnings will be issued for unreadable directories, or unreadable
        files within a directory.
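
        The first-found rule can be illustrated with a small stand-alone
        sketch. It does not use Set::Files itself; the helper names
        path_dirs and first_found are hypothetical, and the real module
        may handle details (such as hidden files) differently.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of Set::Files): accept either form of
# the path option and return a list of directories.
sub path_dirs {
    my ($path) = @_;
    return ref($path) eq 'ARRAY' ? @$path : split /:/, $path;
}

# Hypothetical helper: map each file name to the first directory that
# contains it, collecting a warning for every duplicate that is ignored.
sub first_found {
    my (@dirs) = @_;
    my (%where, @warnings);
    for my $dir (@dirs) {
        my $dh;
        unless (opendir($dh, $dir)) {
            push @warnings, "unreadable directory: $dir";
            next;
        }
        for my $file (sort grep { -f "$dir/$_" } readdir $dh) {
            if (exists $where{$file}) {
                push @warnings, "ignoring duplicate $dir/$file";
            } else {
                $where{$file} = $dir;
            }
        }
        closedir $dh;
    }
    return (\%where, \@warnings);
}

# Two temporary directories that both hold a file named "staff".
my ($d1, $d2) = (tempdir(CLEANUP => 1), tempdir(CLEANUP => 1));
for my $f ("$d1/staff", "$d2/staff", "$d2/guests") {
    open(my $fh, '>', $f) or die "cannot write $f: $!";
    close $fh;
}
my ($where, $warnings) = first_found(path_dirs([$d1, $d2]));
print "$_ => $where->{$_}\n" for sort keys %$where;
print "warning: $_\n" for @$warnings;
```

        Here "staff" resolves to the first directory and the copy in the
        second directory only produces a warning.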

    valid_file
          valid_file => REGEXP
          valid_file => !REGEXP
          valid_file => \&FUNCTION

        By default, all files in the directories are used. With this option,
        filenames are tested and only those that pass will be used. Others
        will be silently ignored.

        REGEXP is a regular expression. Only filenames which match the
        REGEXP will pass (or if !REGEXP is used, only filenames which do NOT
        match REGEXP will pass).

        If a reference to a function is passed in, the function
        &FUNCTION(dir,file) will be evaluated for each file. If it returns
        0, the file will be silently ignored. Otherwise it will be used.
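
        As a rough illustration of the three forms, the following
        stand-alone sketch applies such a test to a list of filenames.
        The helper file_passes is hypothetical and is not the module's
        own code; it assumes the REGEXP forms are passed as strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): apply a valid_file test.
# $test may be a regexp string, a "!"-prefixed regexp string, or a
# reference to a function taking (dir, file).
sub file_passes {
    my ($test, $dir, $file) = @_;
    return 1 unless defined $test;            # no test: every file is used
    if (ref($test) eq 'CODE') {
        return $test->($dir, $file) ? 1 : 0;
    }
    if ($test =~ /^!(.*)/s) {                 # negated form: must NOT match
        my $re = $1;
        return $file =~ /$re/ ? 0 : 1;
    }
    return $file =~ /$test/ ? 1 : 0;
}

my @files = qw(staff staff.bak admins README);

# Keep everything except editor backups.
my @kept = grep { file_passes('!\.bak$', '/sets', $_) } @files;
print "@kept\n";

# Keep only lowercase names, using a function reference.
my @lower = grep { file_passes(sub { $_[1] =~ /^[a-z]+$/ }, '/sets', $_) } @files;
print "@lower\n";
```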

    invalid_quiet
          invalid_quiet => 1

        By default, when a file is ignored due to failing a valid_file test,
        or when an element is ignored due to failing a valid_ele test, a
        warning is issued. With this option, no warning is issued.

    cache
          cache => DIR

        Data from the set files may be cached in order to speed up data
        access. If this option is used, you must specify the directory where
        the data will be cached. The directory may be the same as one of the
        directories containing the set files.

        The cache directory defaults to the first directory given in the
        path option (or the current directory if no path option is given).

    read
          read => "cache"
          read => "files"
          read => "file"

        When an application wants to use data from the set files, it can
        either read the data from the set files or from the cache.

        If the cache option was used, the default is to read from the cache
        if it exists, read from the files otherwise. If no cache option was
        used, the default is to read from the files. When data is read in
        from the cache, the commit and cache methods are disabled.

        If the file option is used, it reads a single set from a single file
        along with all dependency sets (i.e. sets that are included or
        excluded via the appropriate tags). This allows someone to make
        changes to a single set file that they own even if permissions are
        set so that they cannot read other set files. The commit method is
        available, but the cache method is disabled. The file option
        requires that the set option also be present.

        With the files option, all set files are read. Both the commit and
        cache methods are enabled.
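
        The default selection described above can be sketched as a small
        stand-alone function. The name read_source is hypothetical, and
        the sketch assumes the cache file is the .set_files.cache file
        listed in the FILES section; the module's own logic may differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of Set::Files): decide where data would
# be read from, given the read and cache options passed to new.
sub read_source {
    my (%opt) = @_;
    return $opt{read} if defined $opt{read};       # explicit choice wins
    return 'files' unless defined $opt{cache};     # no cache option given
    my $cached = -e "$opt{cache}/.set_files.cache";
    return $cached ? 'cache' : 'files';
}

my $dir    = tempdir(CLEANUP => 1);
my $before = read_source(cache => $dir);           # no cache file yet
open(my $fh, '>', "$dir/.set_files.cache") or die "cannot create cache: $!";
close $fh;
my $after  = read_source(cache => $dir);           # cache file now exists
print "before: $before, after: $after\n";
```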

    set
          set => SET

        This defines which set to read when the read => "file" option is
        used. This option is required when read => "file" and ignored for
        any other value for read.

    types
          types => TYPE
          types => [ TYPE1, TYPE2, ... ]

        Sets can be of one or more types (or they can belong to no type and
        be used solely in building other sets using the INCLUDE or EXCLUDE
        tags described in the FILE FORMAT section below).

        This option can be used to specify the names of the different types
        of sets defined by these files.

        If this option is not given, then there is only one type and by
        default, all sets belong to it.

    default_types
          default_types => [ TYPEa, TYPEb, ... ]
          default_types => "all"
          default_types => "none"
          default_types => TYPE

        Some types of sets may be more common than others, and you may or
        may not want to have to explicitly define which types a set belongs
        to.

        If a list of types is passed in, every type must be defined in the
        types option (warnings will be issued if any are not). If a value
        of "all" is passed in, sets belong to all types by default. If a
        value of "none" is passed in, sets don't belong to any type by
        default.

        By default, sets belong to all types available.

    comment
          comment => REGEXP

        This defines a regular expression used to recognize (and strip out)
        comments from a set file. The default expression is "#.*" which
        means that all characters from a pound sign to the end of the line
        are removed.

        If REGEXP is passed in as an empty string, there are no comments.
        All lines are either empty or contain an element.

    tagchars
          tagchars => STRING

        This defines a character (or a string) which marks a line of the set
        file as containing a tag. The default value is "@".

    valid_ele
          valid_ele => REGEXP
          valid_ele => !REGEXP
          valid_ele => \&FUNCTION

        By default, every non-blank line (after comments have been stripped
        out) is treated as an element. If this option is used, elements are
        tested, and only those that pass the test are treated as valid.
        Others are invalid and produce a warning.

        If a reference to a function is passed in, the function
        &FUNCTION(set,ele) will be evaluated for each element. If it returns
        0, the element will be silently ignored. Otherwise it will be
        included in the set.

    scratch
          scratch => DIR

        When automatically updating a set file, the directory where the
        files live may or may not be writable by a user who owns a set file.

        If the directory is writable by the user, there is no problem. In
        this case, when a new set file is written, the old one is backed up
        and the new one written in its place.

        If the directory is NOT writable by the user, the old copy is backed
        up to the scratch directory. This directory must be writable by the
        user. It defaults to /tmp.

FILE FORMAT
    A set file has a very simple format. It consists of blank lines, tags,
    and elements. Comments may be included as whole lines or part of one of
    the above lines.

    Each line is checked for comments and they are removed before any other
    processing is done. A comment is anything that matches a regular
    expression which can be set using the comment Init option. The default
    regular expression is "#.*" which means that comments start with a pound
    sign anywhere on the line and go to the end of the line.

    Tags are lines which begin with a special string (which can be set
    with the tagchars Init option). The default string is "@". Tag lines
    are of one of the formats:

      @TAG
      @TAG VAL1,VAL2,...

    All other lines are elements. Elements are any string (one per line).

    Leading/trailing spaces are ignored in all cases.
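
    The processing order described above (strip comments, trim spaces,
    then classify the line) can be sketched as follows. The helper
    parse_line is hypothetical and is not the module's parser; its comment
    and tagchars arguments simply mirror the Init options of the same
    names, and a line holding only the tag string is treated as an element
    purely as a simplification.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's parser): classify one line of a
# set file as blank, a tag, or an element.
sub parse_line {
    my ($line, %opt) = @_;
    my $comment  = exists $opt{comment}  ? $opt{comment}  : '#.*';
    my $tagchars = exists $opt{tagchars} ? $opt{tagchars} : '@';

    $line =~ s/$comment// if length $comment;   # strip comments first
    $line =~ s/^\s+|\s+$//g;                    # trim leading/trailing space
    return { kind => 'blank' } if $line eq '';

    if (index($line, $tagchars) == 0) {         # line starts with the tag string
        my $rest = substr($line, length($tagchars));
        my ($tag, $val) = $rest =~ /^(\S+)\s*(.*)$/;
        return { kind => 'element', value => $line } unless defined $tag;
        return { kind => 'tag', tag => uc $tag, value => $val };
    }
    return { kind => 'element', value => $line };
}

for my $l ('staff1   # a user', '@include A,B', '   ', '# only a comment') {
    my $p = parse_line($l);
    print join(' | ', map { "$_=$p->{$_}" } sort keys %$p), "\n";
}
```

    Tag names are upper-cased in the sketch because tags are case
    insensitive.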

    The set name is the name of the set file.

    The following TAGs are known:

    INCLUDE SET1,SET2,...
        This includes all members of one or more other sets in the current
        set.

    EXCLUDE SET1,SET2,...
        This excludes all members of one or more other sets from the current
        set. This overrides any members included from other sets, but does
        NOT exclude members explicitly included in the set file.

    OMIT ELE
        This excludes a specific element from the current set. This overrides
        any elements included via an INCLUDE tag, or any elements
        explicitly included in the set file.

        Each element must be specified separately since there is no
        guarantee that elements may not contain commas.

    TYPE TYPE1,TYPE2,...
        The default types that this set belongs to are determined by the
        types and default_types Init options.

        This tag explicitly puts this set in the specified types, even if
        it is not in those types by default.

    NOTYPE TYPE1,TYPE2,...
        Similar to the TYPE tag, but this tag explicitly removes the set
        from the specified types, even if it is in them by default.
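
        How the defaults and the TYPE and NOTYPE tags might combine can
        be sketched as below. The helper set_types and the type names
        "committee" and "mailing" are hypothetical; the sketch only
        follows the rules stated in this document.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): work out which types a
# set belongs to. $types is the list from the types option, $default is
# the default_types value, $add and $remove hold the values of any TYPE
# and NOTYPE tags found in the set file.
sub set_types {
    my ($types, $default, $add, $remove) = @_;
    my %in;
    if (ref($default) eq 'ARRAY') { %in = map { ($_ => 1) } @$default }
    elsif ($default eq 'all')     { %in = map { ($_ => 1) } @$types }
    elsif ($default ne 'none')    { %in = ($default => 1) }
    $in{$_} = 1     for @$add;       # TYPE tags add types
    delete $in{$_}  for @$remove;    # NOTYPE tags remove types
    return grep { $in{$_} } @$types; # keep the order of the types option
}

my @types = qw(committee mailing);
my @a = set_types(\@types, 'all',  [],          ['mailing']);
my @b = set_types(\@types, 'none', ['mailing'], []);
print "a: @a\n";
print "b: @b\n";
```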

    OPTION VARIABLE [= VALUE]
        Although there is no support for element specific attributes, there
        IS support for attributes which apply to the entire set (and which
        can be made available to applications using these sets).

        Each set may have a hash associated with it containing key/value
        pairs (if no value is included, it defaults to 1). These attributes
        are available using the opts method.
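
    The OPTION syntax, together with the lookup rule documented for the
    opts method (0 is returned for an option that is not set), can be
    sketched as follows. The helpers parse_option and option_value and the
    option names are hypothetical and are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: split the text after "@OPTION" into a variable
# name and a value, defaulting the value to 1 when none is given.
sub parse_option {
    my ($text) = @_;
    my ($var, $val) = split /\s*=\s*/, $text, 2;
    $var =~ s/^\s+|\s+$//g;
    $val = 1 unless defined $val;
    $val =~ s/\s+$//;
    return ($var, $val);
}

# Hypothetical helper: look up one option, returning 0 when it is unset.
sub option_value {
    my ($opts, $var) = @_;
    return exists $opts->{$var} ? $opts->{$var} : 0;
}

my %opts;
for my $text ('moderated', 'quota = 50') {
    my ($var, $val) = parse_option($text);
    $opts{$var} = $val;
}
print "$_ = ", option_value(\%opts, $_), "\n" for qw(moderated quota archive);
```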

    All tag lines can be repeated any number of times, so:

      @INCLUDE foo,bar

    is equivalent to

      @INCLUDE foo
      @INCLUDE bar

    All tags are case insensitive.

    When determining the members of a set which includes and excludes other
    sets, or omits specific elements from the set, all inclusions are
    evaluated first, followed by all exclusions (i.e. all exclusions override
    all inclusions). If there is a cyclic dependency (i.e. A depends on B
    depends on A where a dependency can either be an INCLUDE or EXCLUDE), an
    error is reported and the cyclic dependency is ignored.
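
    One common way to find such a cycle is a depth-first search over the
    dependency graph. The sketch below is an assumption about how this
    could be done, not a description of the module's implementation; the
    helper find_cycle and the sample graph are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: $deps maps a set name to the sets it names in
# INCLUDE or EXCLUDE tags. Returns the first cycle found as a list of
# set names (first name repeated at the end), or an empty list.
sub find_cycle {
    my ($deps) = @_;
    my (%state, @stack);            # state: 1 = in progress, 2 = finished
    my $visit;
    $visit = sub {
        my ($set) = @_;
        $state{$set} = 1;
        push @stack, $set;
        for my $dep (@{ $deps->{$set} || [] }) {
            my $s = $state{$dep} || 0;
            if ($s == 1) {          # $dep is already on the current path
                my ($i) = grep { $stack[$_] eq $dep } 0 .. $#stack;
                return (@stack[$i .. $#stack], $dep);
            }
            if ($s == 0) {
                my @cycle = $visit->($dep);
                return @cycle if @cycle;
            }
        }
        pop @stack;
        $state{$set} = 2;
        return;
    };
    for my $set (sort keys %$deps) {
        next if $state{$set};
        my @cycle = $visit->($set);
        return @cycle if @cycle;
    }
    return;
}

my @cycle = find_cycle({ A => ['B'], B => ['C'], C => ['A'], D => ['A'] });
print @cycle ? "cycle: @cycle\n" : "no cycle\n";
```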

    A few examples illustrate the use of INCLUDE, EXCLUDE, and OMIT tags. In
    the examples, the set file A contains the elements: E1, E2, E3. The set
    file B contains the elements: E3, E4, E5. The set file containing the
    following lines:

      @INCLUDE A
      @EXCLUDE B
      E5
      E6

    defines a set containing the elements: E1, E2, E5, E6. The first line
    includes E1, E2, E3. The second line excludes E3. It does NOT exclude E5
    since the EXCLUDE tag does not override elements explicitly included in
    the set file. Finally, the E5 and E6 elements are added.

    The set file containing the following lines:

      @INCLUDE A
      @EXCLUDE B
      @OMIT    E2
      @OMIT    E6
      E5
      E6

    defines a set containing the elements: E1, E5. This is similar to the
    above example, except that the OMIT tags override elements included via
    the INCLUDE tag AND elements explicitly included in the set file.
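
    Both examples can be reproduced with a short stand-alone sketch that
    applies the steps in the order described: included sets, then excluded
    sets, then elements listed in the file, then OMIT tags. The helper
    resolve and the in-memory layout are hypothetical and are not the
    module's own code.

```perl
use strict;
use warnings;

# Hypothetical in-memory form of set files A and B from the examples.
my %member_of = (
    A => [qw(E1 E2 E3)],
    B => [qw(E3 E4 E5)],
);

# Hypothetical helper (not the module's code): compute the members of a
# set from its INCLUDE, EXCLUDE and OMIT tags and its own elements.
sub resolve {
    my (%def) = @_;
    my %in;
    $in{$_} = 1    for map { @{ $member_of{$_} } } @{ $def{include} || [] };
    delete $in{$_} for map { @{ $member_of{$_} } } @{ $def{exclude} || [] };
    $in{$_} = 1    for @{ $def{elements} || [] };   # explicit elements
    delete $in{$_} for @{ $def{omit} || [] };       # OMIT overrides all
    return sort keys %in;
}

my @first  = resolve(include  => ['A'], exclude => ['B'],
                     elements => [qw(E5 E6)]);
my @second = resolve(include  => ['A'], exclude => ['B'],
                     elements => [qw(E5 E6)], omit => [qw(E2 E6)]);
print "@first\n";    # E1 E2 E5 E6
print "@second\n";   # E1 E5
```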

FILES
    Several files are used by the Set::Files module. They all live in the
    directory set by the cache Init Option except for set specific files
    which live in the same directory as the set file. Files are:

    .set_files.SET
        A backup of the given set. When a set file is updated, the original
        file is stored in this file. The file is stored either in the same
        directory as the set file (if it is writable) or in the directory
        specified by the scratch Init Option.

    .set_files.SET.new
        A temporary file where a new set file (or the update to an old one)
        is written. Once completed, this file is moved into place as the new
        set file. This file lives in the same directory as the set file or
        in the scratch directory.

    .set_files.cache
        The file containing the cache. This is created using the cache
        method.

    .set_files.template
        When creating a new set file (or updating an existing one), this
        file is used (if it exists) as a starting point and then all the
        data is appended to it. This is a good place to store comments
        describing how to edit the set files, etc., that set file maintainers
        can read for help.

KNOWN PROBLEMS
    None at this point.

LICENSE
    This script is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

AUTHOR
    Sullivan Beck (sbeck@cpan.org)