GRACE Paradigm

From GridSiteWiki

GRACE (GridSite - Apache - CGI - Executables) is a method of using Unix pool accounts to provide partial "sandboxing" of services, which allows remote users to deploy services in the form of scripts and native executables, and so enables third-party service hosting, built on top of Apache/GridSite.

Jobs and pool accounts

In large production grids such as the LHC Computing Grid, there has been a focus on providing support for jobs written as scripts and native binary executables. This has partially reflected the heritage of the applications of these grids, such as High Energy Physics, with its large investment in Fortran/C/C++ analysis and simulation codes.

For this reason, effort has been put into providing native execution environments at remote sites on the grid. One issue this approach must deal with is the danger that careless or malicious jobs from one user will interfere with other users' files or programs. The pool account system, developed at Manchester HEP for EDG and adopted by LCG and EGEE, provides one solution.

Scripts and suexec

A similar problem has been faced in the mainstream web world, where web server administrators have needed to host CGI executables provided by multiple users (perhaps in commercial, third-party service hosting, where no trust relation exists beyond monthly credit card payments of hosting charges).

Apache provides a solution by allowing CGI scripts or executables to be run as different Unix users for each virtual host (each apparent website). This mechanism, named suexec after the wrapper program it relies on, leads to a very powerful separation of roles using Unix account privileges:

root
starts the server, binds to the privileged ports (80 and 443), has access to the X.509 certificate private key, creates log files
apache
processes requests, reads HTML and other files that have a URL
CGI users
run CGI programs via suexec when a response is created dynamically; their files are protected from other CGI users by Unix filesystem permissions

This mechanism is widely used, but in standard Apache the choice of user is tied to fixed configuration decisions made when the web server is started.
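Before switching to a CGI user, a wrapper of this kind has to refuse unsafe requests. The following is a minimal Python sketch of a few checks in that spirit, not the suexec source itself: the function name `refusal_reason`, the `FileInfo` record and the `MIN_UID` threshold are all hypothetical, chosen only for illustration.

```python
import stat
from collections import namedtuple

# Stand-in for the fields of os.stat_result that the checks need (hypothetical).
FileInfo = namedtuple("FileInfo", ["st_uid", "st_gid", "st_mode"])

# Hypothetical threshold: numeric IDs below this are treated as system accounts.
MIN_UID = 100


def refusal_reason(info, target_uid, target_gid, min_uid=MIN_UID):
    """Return why a CGI program must not run as target_uid, or None if it may."""
    if target_uid == 0:
        return "target user is root"
    if target_uid < min_uid:
        return "target user is a system account"
    if info.st_uid != target_uid or info.st_gid != target_gid:
        return "program is not owned by the target user and group"
    if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        return "program is writable by group or others"
    if info.st_mode & (stat.S_ISUID | stat.S_ISGID):
        return "program is setuid or setgid"
    return None
```

Only when every check passes would a real wrapper drop its privileges to the target user and group and execute the program. Keeping the checks in a pure function over file metadata makes them easy to exercise without root privileges.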

Combining pool accounts and suexec

One of our goals in the GridSite project has been to provide support for third-party hosting of Web Services for grids, even when the service is written as a script or native executable. This requires some form of sandboxing of users, to prevent them interfering with each other's files or programs, in the same way as must be prevented for remote batch jobs.

To do this, we have combined the pool account system with the Apache suexec mechanism (renamed gsexec). As well as providing legacy support compatible with Apache's default, this allows two new modes of operation, which can be configured separately for each directory on the GridSite server.

Two new modes of operation

First, a CGI web service can be executed as a Unix pool user associated with the authenticated identity of the client, that is, with their X.509 certificate or GSI proxy. The lock files used by the pool mechanism mean that the same client certificate is mapped to the same pool account on subsequent requests (until the account lease expires and the file space associated with the account is recycled). This allows services to keep internal session information in temporary files owned by pool users and protected from interference by the Unix file permissions system. (The pool account can also be used with other user-like permission systems, such as MySQL databases.)
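The lease behaviour described above can be sketched in Python. This is a simplified illustration rather than GridSite's implementation: it assumes a hypothetical mapping directory holding one empty file per pool account, and records a lease as a hard link named after the URL-encoded client DN. The function names and directory layout are assumptions, and the sketch does not handle lease expiry.

```python
import os
from urllib.parse import quote


def current_lease(map_dir, accounts, lock_path):
    """Return the account that the lock file is hard-linked to, if any."""
    try:
        inode = os.stat(lock_path).st_ino
    except FileNotFoundError:
        return None
    for account in accounts:
        if os.stat(os.path.join(map_dir, account)).st_ino == inode:
            return account
    return None


def lease_pool_account(map_dir, accounts, client_dn):
    """Return the pool account leased to client_dn, leasing a free one if needed."""
    lock_path = os.path.join(map_dir, quote(client_dn, safe=""))
    leased = current_lease(map_dir, accounts, lock_path)
    if leased is not None:
        return leased  # same certificate, same account as before
    for account in accounts:
        account_path = os.path.join(map_dir, account)
        if os.stat(account_path).st_nlink != 1:
            continue  # already leased to another client
        try:
            os.link(account_path, lock_path)
        except FileExistsError:
            # A concurrent request for the same client created the lock first.
            return current_lease(map_dir, accounts, lock_path)
        if os.stat(account_path).st_nlink == 2:
            return account
        # Another client linked this account at the same moment: back off.
        os.unlink(lock_path)
    raise RuntimeError("no free pool accounts")
```

Because the lease is stored in the filesystem rather than in server memory, repeated requests from the same client resolve to the same account even when they are handled by different server processes.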

In the second mode, a pool account is associated with the CGI web services stored in a particular directory. The same Unix account is then used for every remote client, so the CGI services are themselves responsible for keeping the sessions of different authenticated users separate.

This mode is intended to support third-party services, where user A is given write access to a directory capable of hosting CGI services. A deploys service scripts or executables simply by uploading them through GridSite's manual or programmatic interfaces, and the service can then handle requests from other users B1, B2, ... Because A's service runs as a dedicated pool account, another user C who can deploy services to their own directory still cannot interfere with A's files, since C's services run as a distinct pool account.
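The difference between the modes comes down to what a pool account is leased against. The Python sketch below makes that explicit under stated assumptions: the `ExecMode` labels, `lease_key` and the in-memory `PoolAccounts` class are hypothetical names for illustration and do not correspond to GridSite configuration values or code.

```python
from enum import Enum


class ExecMode(Enum):
    # Illustrative labels only; not GridSite configuration values.
    LEGACY = "fixed user from the server configuration"
    PER_CLIENT = "pool account leased to the client's identity"
    PER_DIRECTORY = "pool account leased to the service directory"


def lease_key(mode, client_dn, service_dir):
    """Return the key a pool account is leased against, or None in legacy mode."""
    if mode is ExecMode.PER_CLIENT:
        return ("client", client_dn)
    if mode is ExecMode.PER_DIRECTORY:
        return ("directory", service_dir)
    return None


class PoolAccounts:
    """Toy in-memory pool that hands out one account per distinct key."""

    def __init__(self, names):
        self.free = list(names)
        self.leases = {}

    def account_for(self, key):
        if key not in self.leases:
            if not self.free:
                raise RuntimeError("no free pool accounts")
            self.leases[key] = self.free.pop(0)
        return self.leases[key]
```

In the per-directory mode, clients B1 and B2 calling a service in A's directory produce the same key and therefore the same account, while a service in C's directory produces a different key and a different account.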

Without these mechanisms, either all services must run as the same “apache” or “nobody” Unix account, which permits conflicts between users' actions, or each user must be configured individually by the site administrator, which requires shutting down the server, stopping all sessions and restarting with the new configuration.

GRACE Paradigm

This combination of grid-facing access permissions managed through GridSite and local file access permissions enforced via pool accounts allows us to define a new execution model for Web Services in grid environments, which we refer to as GRACE ("GridSite - Apache - CGI - Executables").

GRACE offers an alternative to the reliance on Java for Web Services, and is especially attractive to applications which have a large investment in executable code, or have performance requirements which are not suited to current implementations of Java.

Furthermore, the ability to use standard scripting languages such as Perl, Python and even PHP to provide Web Services makes rapid prototyping of simple services possible, in languages which site administrators and scientists typically use for day-to-day automation tasks.
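As an illustration of how small such a prototype can be, here is a minimal Python sketch of a CGI-style service that keeps a request counter in a file owned by the account it runs as. It assumes the server exports the client's certificate subject in the `SSL_CLIENT_S_DN` environment variable (mod_ssl does this when its standard variables are enabled); the function name `handle_request` and the file name `visits.txt` are hypothetical.

```python
import os


def handle_request(environ, state_dir):
    """Build a plain-text CGI response and update a per-account request counter."""
    # Assumption: the web server exports the authenticated client's DN here.
    client = environ.get("SSL_CLIENT_S_DN", "unknown client")

    # The counter lives in space owned by the Unix account the script runs as,
    # so services running as other accounts cannot modify it.
    counter_path = os.path.join(state_dir, "visits.txt")
    try:
        with open(counter_path) as f:
            visits = int(f.read().strip() or 0)
    except FileNotFoundError:
        visits = 0
    visits += 1
    with open(counter_path, "w") as f:
        f.write(str(visits))

    body = f"Hello {client}, this is request {visits} handled by this account.\n"
    # A CGI response is a header block, a blank line, then the body.
    return "Content-Type: text/plain\r\n\r\n" + body
```

A script deployed as a CGI service would write the result of `handle_request(os.environ, os.path.expanduser("~"))` to standard output. When the service runs as a pool account leased to the client, the counter is automatically per client; when it runs as a pool account leased to the directory, the service would need to separate clients itself, for example by keying the file name on the DN.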