# $Id: README.TeraGrid,v 1.4 2007/04/25 02:59:06 kst Exp $ # $Source: /projects/globus/kst/CVS/public_html/gx-map/README.TeraGrid,v $ # [ Id: README.TeraGrid,v 1.23 2006/12/09 10:54:11 kst Exp $ # [ Source: /projects/globus/kst/CVS/tools/gx-map/README.TeraGrid,v [ This is an overview of TeraGrid-specific information for the new 0.5.3.2 release of gx-map. If you have any questions, please feel free to contact me, Keith Thompson, . Installation: ============= Instructions for installing the gx-map system are in the file INSTALL. You need to make and edit a copy of the file "sample.conf" to specify configuration parameters. For TeraGrid sites, you should use GX_PROPAGATE gx-propagate.in.teragrid EXTRAS TGCDB The GX_PROPAGATE variable specifies the version of the gx-propagate command that propagates information from gx-map to the TGCDB. The EXTRAS variable specifies the TGCDB subsystem, which receives information from the TGCDB and forwards it into gx-map. Make sure your system has all the prerequisite commands and modules needed for gx-map, including the TeraGrid-specific ones (see the INSTALL file for details), and that the system is able to talk to the TGCDB. The configure-gx-map script will check the prerequisites and report any errors. I recommend creating a special account "gxmap" to own and run the gx-map commands. If you prefer you can use the "globus" account for this purpose. At SDSC, a shared NFS-mounted filesystem, /misc/gx-map, is used for data needed by the gx-map system. This allows certain commands (gx-check-requests, gx-db-request, and gx-db-check-requests) to be run on a single system, with the information being used by all SDSC systems. If there is no such shared filesystem, you'll need to run these commands separately on each system, with information being propagated through the TGCDB. There are legitimate concerns about the security of NFS in general. Naturally, each site will have to make its own decision about any tradeoffs between security and convenience. It's most convenient if all systems at a site share the same namespace (see the NAMESPACE configuration variable in sample.conf). A namespace corresponds to a common and consistent set of Unix usernames and numeric UIDs. It should also correspond to a resource names in the TGCDB, but this isn't always the case; for example, there are currently four distinct TGCDB resource names for SDSC, but only a single namespace. (See "TGCDB resource names vs. gx-map namespaces", below.) During installation, you will be instructed to edit the tgcdb.db-config file. This file contains information that enables access to the TGCDB. Consult the TeraGrid accounting group for the specific values to use. As of release 0.5.3.2, the resource_name in the tgcdb.db-config file may be a wildcard. The '%' character is a wildcard that matches zero or more characters (like '*' in a Unix filename pattern). This may be used to retrieve mapping information for all resources at a site. (The generated grid-mapfile on each system will still contain entries only for users who actually have accounts on that system.) A wildcard may be specified with a '%' character; for example, the resource_name for SDSC is "%.sdsc.teragrid". Use a wildcard *only* if you have consistent Unix user names across all resources at your site. If, for example, the Unix user name "fred" might belong to Fred Smith on foo.yoursite.edu, and to Fred Jones on bar.yoursite.edu, then you do not have a single namespace; you must have distinct, disjoint installations of gx-map on those two systems, and you should not specify a wildcard for the resource_name attribute. Initial Deployment Plan: ======================== One obvious way to deploy gx-map 0.5.3.2 across TeraGrid sites would be to install it, enable propagation via gx-propagate, and feed the existing grid-mapfiles into the system via gx-ingest (see the corresponding man pages for more information). The problem with this is that it would create a huge flurry of AMIE packets (at least some of which need to be processed manually). So the plan (at least at SDSC) is: 1. Install and deploy gx-map 0.5.3.2. Use gx-convert-logs and gx-cleanup-log to create a new requests.log file (bypassing gx-check-requests and gx-propagate), so existing mappings are not propagated through the TGCDB. 2. Temporarily set up the cron job to invoke gx-check-requests with the "-nopropagate" option, so new mappings are not yet propagated either. 3. Generate the new grid-mapfiles and CA files in a new directory, *not* into /etc/grid-security. Verify that the resulting grid-mapfiles are consistent with existing ones. 4. If everything seems to be working correctly, set up symlinks so that the new grid-mapfiles appear in /etc/grid-security on each system. 5. Meanwhile, test the propagation code. Once that's reasonably well verified, remove the "-nopropagate" argument to gx-check-requests. From that point on, all new mappings should be propagated to all sites. Teragrid sites other than SDSC can follow a similar procedure. Note that the grid-mapfile subsystem is separate from gx-ca-update; either can be deployed independently. For example, if a site doesn't choose to use gx-map to maintain its grid-mapfiles, it can still use it to maintain its certificates directory, keeping CRLs up to date. Cross-Site Propagation: ======================= gx-map 0.5.3.2 includes support for cross-site propagation of grid-mapfile entries, via an optional plug-in program called "gx-propagate". The TeraGrid-specific version is "gx-propagate.in.teragrid", which can be specified in the installation config file (see sample.conf). By default, the gx-propagate command is invoked by gx-check-requests for each validated request; use "gx-check-requests -nopropagate" to disable this. The gx-propagate command submits information to the TGCDB (TeraGrid Central Database). Information from the TGCDB is propagated into the gx-map system at each site via gx-map's TGCDB subsystem. TGCDB resource names vs. gx-map namespaces ========================================== A gx-map namespace is an identifier for a set of consistent usernames and numeric UIDs, typically for one or more systems at a single site. For example, SDSC is a single namespace; a user will have the same username and UID on all SDSC systems (but may or may not have an account on all systems). The TGCDB associates each user with a resource name. For example, the following TGCDB resource names are associated with SDSC: bluegene.sdsc.teragrid datastar.sdsc.teragrid datastar-p655.sdsc.teragrid dtf.sdsc.teragrid The use of wildcards for the resource_name attribute, added in gx-map 0.5.3.2, addresses this issue. Other sites can either do something similar, or use separate a gx-map installation for each resource. Cron Jobs: ========== Previous releases of gx-map required multiple cron jobs for different commands. As of release 0.5.3, you can instead have a single cron job that simply invokes the gx-cron-job command. This requires you to create the etc/gx-map/gx-cron-job.conf file (so in a sense the complexity is merely pushed down a level). Here's a sample cron job, running the gx-cron-job command once every minute: * * * * * /usr/local/apps/gx-map-0.5.3.2/sbin/gx-cron-job And here's a sample gx-cron-job.conf file (this is similar to the one for chester.sdsc.edu): ============================== CUT HERE ============================== { name DB hostname chester.sdsc.edu interval 60 command gx-db-request -full-query command gx-db-check-requests } { name Check Index hostname chester.sdsc.edu interval 5+3 command gx-check-index -ca npaci -index /projects/security/PKI/ca-npaci/ca.db.index command gx-check-index -ca sdsc -index /projects/security/PKI/ca-sdsc/ca.db.index } { name Check Requests interval 5+4 command gx-check-requests } { name Generate mapfile (chester) hostname chester.sdsc.edu interval 5 command gx-gen-mapfile -gt2-compat /usr/local/apps/grid-security-%VERSION%/%HOSTNAME%.grid-mapfile command gx-gen-mapfile -all /usr/local/apps/grid-security-%VERSION%/ALL.grid-mapfile } { name CA update (chester) hostname chester.sdsc.edu interval 30+5 command gx-ca-update -target-dir /usr/local/apps/grid-security-%VERSION%/certificates } ============================== CUT HERE ============================== The "DB" section runs once an hour and downloads information from the TGCDB. To avoid overloading the database server, other sites might want to stagger their DB jobs ("interval 60+5", "interval 60+10", etc.). The "hostname" line causes this to be done only on chester.sdsc.edu, even if gx-cron-job is run on other machines. As written, this does a full query of the database and looks for changes. Other approaches are possible; see the gx-db-request(8) man page (also available on the gx-map web page) for more information. The "Check Index" section runs ever 5 minutes. This is specific to SDSC; it processes updates in the index files for the NPACI and SDSC CAs (Certificate Authorities). The "Check Requests" section invokes gx-check-requests every 5 minutes. The "Generate mapfile" section generates new grid-mapfiles as needed. In this example, "%VERSION%" expands to the current version number for gx-map, and "%HOSTNAME%" expands to the current hostname. The "CA update" section invokes gx-ca-update to update the certificates directory. /etc/grid-security/grid-mapfile and /etc/grid-security/certificates can be symbolic links into /usr/local/apps (or whererer you prefer to put things), or you can update the /etc/grid-security directory directly. Since SDSC uses a shared NFS-mounted data directory, other systems within SDSC can invoke gx-check-requests with the "-nodownload" option, to extract files from the cache rather than having each system re-download them. By default, gx-ca-update will not install a certificate if there is no current corresponding CRL. If a CRL expires and a new one is not available, the old one will be left in place, effectively disabling all certificates issued by the CA. This behavior is designed to prevent a revoked user or host certificate from being recognized (which is the whole point of being able to revoke a certificate), but at the cost of being unable to use potentially valid certificates in some circumstances. There are several options available if you wish to sacrifice security for convenience. See the gx-ca-update(8) man page for details. The gx-ca-update command writes log messages to the file gx-map-data/gx-ca-update.log under the gx-map installation directory. In a future release, it may optionally use the syslog facility as well. To process information from the TGCDB (either by querying the database directly or by processing AMIE packets), chester.sdsc.edu will run cron jobs similar to the following: # minute hour mday month wday command 0 * * * * /usr/local/apps/gx-map-0.5.3.2/sbin/gx-db-request -full-query ; \ /usr/local/apps/gx-map-0.5.3.2/sbin/gx-db-check-requests This causes a full query of the TGCDB to be performed once an hour, invoking the gx-request command as needed for any changes. The interval on which this is done may have to be tuned to minimize the impact on the database server. I suggest staggering the timing of these jobs across TeraGrid sites. Other approaches are possible. See the gx-db-request(8) man page (also available on the gx-map web page) for more information. Other TeraGrid sites will need to run similar cron jobs. The gx-check-requests, gx-db-request and gx-db-check-requests commands need to be run on at least one system per site. (They will result in calls to the gx-request command.) If there is a shared filesystem, they only need to be run on one system; otherwise, they need to be run on each system that needs a grid-mapfile. The gx-gen-mapfile and gx-ca-update commands need to be run on each system that needs a grid-mapfile and a certificates directory. -- Keith Thompson Sat 2006-12-09