# $Id: README.TeraGrid,v 1.4 2007/04/25 02:59:06 kst Exp $
# $Source: /projects/globus/kst/CVS/public_html/gx-map/README.TeraGrid,v $
# [ Id: README.TeraGrid,v 1.23 2006/12/09 10:54:11 kst Exp $
# [ Source: /projects/globus/kst/CVS/tools/gx-map/README.TeraGrid,v [ 

This is an overview of TeraGrid-specific information for the new
0.5.3.2 release of gx-map.

If you have any questions, please feel free to contact me,
Keith Thompson, <kst@sdsc.edu>.

Installation:
=============

Instructions for installing the gx-map system are in the file INSTALL.
You need to make and edit a copy of the file "sample.conf" to specify
configuration parameters.  For TeraGrid sites, you should use

    GX_PROPAGATE    gx-propagate.in.teragrid
    EXTRAS          TGCDB

The GX_PROPAGATE variable specifies the version of the gx-propagate
command that propagates information from gx-map to the TGCDB.
The EXTRAS variable specifies the TGCDB subsystem, which receives
information from the TGCDB and forwards it into gx-map.

Make sure your system has all the prerequisite commands and modules
needed for gx-map, including the TeraGrid-specific ones (see the
INSTALL file for details), and that the system is able to talk to
the TGCDB.  The configure-gx-map script will check the prerequisites
and report any errors.

I recommend creating a special account "gxmap" to own and run the
gx-map commands.  If you prefer you can use the "globus" account for
this purpose.

At SDSC, a shared NFS-mounted filesystem, /misc/gx-map, is used for
data needed by the gx-map system.  This allows certain commands
(gx-check-requests, gx-db-request, and gx-db-check-requests) to
be run on a single system, with the information being used by all
SDSC systems.  If there is no such shared filesystem, you'll need to
run these commands separately on each system, with information being
propagated through the TGCDB.

There are legitimate concerns about the security of NFS in general.
Naturally, each site will have to make its own decision about any
tradeoffs between security and convenience.

It's most convenient if all systems at a site share the same namespace
(see the NAMESPACE configuration variable in sample.conf).  A namespace
corresponds to a common and consistent set of Unix usernames and
numeric UIDs.  It should also correspond to a resource names in
the TGCDB, but this isn't always the case; for example, there are
currently four distinct TGCDB resource names for SDSC, but only a
single namespace.  (See "TGCDB resource names vs. gx-map namespaces",
below.)

During installation, you will be instructed to edit the tgcdb.db-config
file.  This file contains information that enables access to the TGCDB.
Consult the TeraGrid accounting group for the specific values to use.

As of release 0.5.3.2, the resource_name in the tgcdb.db-config file
may be a wildcard.  The '%' character is a wildcard that matches zero
or more characters (like '*' in a Unix filename pattern).  This may
be used to retrieve mapping information for all resources at a site.
(The generated grid-mapfile on each system will still contain
entries only for users who actually have accounts on that system.)
A wildcard may be specified with a '%' character; for example, the
resource_name for SDSC is "%.sdsc.teragrid".

Use a wildcard *only* if you have consistent Unix user names across all
resources at your site.  If, for example, the Unix user name "fred"
might belong to Fred Smith on foo.yoursite.edu, and to Fred Jones on
bar.yoursite.edu, then you do not have a single namespace; you must
have distinct, disjoint installations of gx-map on those two systems,
and you should not specify a wildcard for the resource_name attribute.

Initial Deployment Plan:
========================

One obvious way to deploy gx-map 0.5.3.2 across TeraGrid sites would
be to install it, enable propagation via gx-propagate, and feed
the existing grid-mapfiles into the system via gx-ingest (see the
corresponding man pages for more information).  The problem with
this is that it would create a huge flurry of AMIE packets (at least
some of which need to be processed manually).

So the plan (at least at SDSC) is:

1. Install and deploy gx-map 0.5.3.2.  Use gx-convert-logs and
   gx-cleanup-log to create a new requests.log file (bypassing
   gx-check-requests and gx-propagate), so existing mappings
   are not propagated through the TGCDB.

2. Temporarily set up the cron job to invoke gx-check-requests with
   the "-nopropagate" option, so new mappings are not yet propagated
   either.

3. Generate the new grid-mapfiles and CA files in a new directory,
   *not* into /etc/grid-security.  Verify that the resulting
   grid-mapfiles are consistent with existing ones.

4. If everything seems to be working correctly, set up symlinks so that
   the new grid-mapfiles appear in /etc/grid-security on each system.

5. Meanwhile, test the propagation code.  Once that's reasonably well
   verified, remove the "-nopropagate" argument to gx-check-requests.
   From that point on, all new mappings should be propagated to
   all sites.

Teragrid sites other than SDSC can follow a similar procedure.

Note that the grid-mapfile subsystem is separate from gx-ca-update;
either can be deployed independently.  For example, if a site doesn't
choose to use gx-map to maintain its grid-mapfiles, it can still use
it to maintain its certificates directory, keeping CRLs up to date.

Cross-Site Propagation:
=======================

gx-map 0.5.3.2 includes support for cross-site propagation
of grid-mapfile entries, via an optional plug-in program
called "gx-propagate".  The TeraGrid-specific version is
"gx-propagate.in.teragrid", which can be specified in the installation
config file (see sample.conf).  By default, the gx-propagate command
is invoked by gx-check-requests for each validated request; use
"gx-check-requests -nopropagate" to disable this.  The gx-propagate
command submits information to the TGCDB (TeraGrid Central Database).

Information from the TGCDB is propagated into the gx-map system at
each site via gx-map's TGCDB subsystem.

TGCDB resource names vs. gx-map namespaces
==========================================
A gx-map namespace is an identifier for a set of consistent usernames
and numeric UIDs, typically for one or more systems at a single site.
For example, SDSC is a single namespace; a user will have the same
username and UID on all SDSC systems (but may or may not have an
account on all systems).

The TGCDB associates each user with a resource name.  For example,
the following TGCDB resource names are associated with SDSC:

    bluegene.sdsc.teragrid
    datastar.sdsc.teragrid
    datastar-p655.sdsc.teragrid
    dtf.sdsc.teragrid

The use of wildcards for the resource_name attribute, added in gx-map
0.5.3.2, addresses this issue.  Other sites can either do something
similar, or use separate a gx-map installation for each resource.

Cron Jobs:
==========

Previous releases of gx-map required multiple cron jobs for different
commands.  As of release 0.5.3, you can instead have a single cron
job that simply invokes the gx-cron-job command.  This requires you
to create the etc/gx-map/gx-cron-job.conf file (so in a sense the
complexity is merely pushed down a level).

Here's a sample cron job, running the gx-cron-job command once
every minute:

* * * * * /usr/local/apps/gx-map-0.5.3.2/sbin/gx-cron-job

And here's a sample gx-cron-job.conf file (this is similar to the
one for chester.sdsc.edu):
============================== CUT HERE ==============================
{
    name DB
    hostname chester.sdsc.edu
    interval 60
    command gx-db-request -full-query
    command gx-db-check-requests
}

{
    name Check Index
    hostname chester.sdsc.edu
    interval 5+3
    command gx-check-index -ca npaci -index /projects/security/PKI/ca-npaci/ca.db.index
    command gx-check-index -ca sdsc  -index /projects/security/PKI/ca-sdsc/ca.db.index
}

{
    name Check Requests
    interval 5+4
    command gx-check-requests
}

{
    name Generate mapfile (chester)
    hostname chester.sdsc.edu
    interval 5
    command gx-gen-mapfile -gt2-compat /usr/local/apps/grid-security-%VERSION%/%HOSTNAME%.grid-mapfile
    command gx-gen-mapfile -all        /usr/local/apps/grid-security-%VERSION%/ALL.grid-mapfile
}

{
    name CA update (chester)
    hostname chester.sdsc.edu
    interval 30+5
    command gx-ca-update -target-dir /usr/local/apps/grid-security-%VERSION%/certificates
}
============================== CUT HERE ==============================

The "DB" section runs once an hour and downloads information from
the TGCDB.  To avoid overloading the database server, other sites
might want to stagger their DB jobs ("interval 60+5", "interval
60+10", etc.).  The "hostname" line causes this to be done only on
chester.sdsc.edu, even if gx-cron-job is run on other machines.
As written, this does a full query of the database and looks for
changes.  Other approaches are possible; see the gx-db-request(8)
man page (also available on the gx-map web page) for more information.

The "Check Index" section runs ever 5 minutes.  This is specific to
SDSC; it processes updates in the index files for the NPACI and SDSC
CAs (Certificate Authorities).

The "Check Requests" section invokes gx-check-requests every 5 minutes.

The "Generate mapfile" section generates new grid-mapfiles as needed.
In this example, "%VERSION%" expands to the current version number
for gx-map, and "%HOSTNAME%" expands to the current hostname.

The "CA update" section invokes gx-ca-update to update the
certificates directory.  /etc/grid-security/grid-mapfile and
/etc/grid-security/certificates can be symbolic links into
/usr/local/apps (or whererer you prefer to put things), or you can
update the /etc/grid-security directory directly.

Since SDSC uses a shared NFS-mounted data directory, other systems
within SDSC can invoke gx-check-requests with the "-nodownload"
option, to extract files from the cache rather than having each system
re-download them.

By default, gx-ca-update will not install a certificate if there is
no current corresponding CRL.  If a CRL expires and a new one is not
available, the old one will be left in place, effectively disabling all
certificates issued by the CA.  This behavior is designed to prevent
a revoked user or host certificate from being recognized (which is
the whole point of being able to revoke a certificate), but at the
cost of being unable to use potentially valid certificates in some
circumstances.

There are several options available if you wish to sacrifice security
for convenience.  See the gx-ca-update(8) man page for details.

The gx-ca-update command writes log messages to the file
gx-map-data/gx-ca-update.log under the gx-map installation directory.
In a future release, it may optionally use the syslog facility as well.

To process information from the TGCDB (either by querying the database
directly or by processing AMIE packets), chester.sdsc.edu will run
cron jobs similar to the following:

# minute hour mday month wday command
0 * * * * /usr/local/apps/gx-map-0.5.3.2/sbin/gx-db-request -full-query ; \
	  /usr/local/apps/gx-map-0.5.3.2/sbin/gx-db-check-requests

This causes a full query of the TGCDB to be performed once an
hour, invoking the gx-request command as needed for any changes.
The interval on which this is done may have to be tuned to minimize
the impact on the database server.  I suggest staggering the timing
of these jobs across TeraGrid sites.

Other approaches are possible.  See the gx-db-request(8) man page
(also available on the gx-map web page) for more information.

Other TeraGrid sites will need to run similar cron jobs.

The gx-check-requests, gx-db-request and gx-db-check-requests commands
need to be run on at least one system per site.  (They will result in
calls to the gx-request command.)  If there is a shared filesystem,
they only need to be run on one system; otherwise, they need to be
run on each system that needs a grid-mapfile.

The gx-gen-mapfile and gx-ca-update commands need to be run on each
system that needs a grid-mapfile and a certificates directory.

                -- Keith Thompson <kst@sdsc.edu> Sat 2006-12-09