GridHTTP

From GridSiteWiki

GridHTTP is a protocol, implemented in GridSite from version 1.1.11 onwards, that supports bulk data transfers over unencrypted HTTP while performing authentication and authorization over HTTPS with the usual grid credentials.

Protocol

To initiate a GridHTTP transfer, a client sets an Upgrade: GridHTTP/1.0 header when making an HTTPS request for a file. This header tells the server that the client would prefer to retrieve the file by HTTP rather than HTTPS, if possible. Authentication and authorization are done over HTTPS (with X.509, VOMS, GACL etc. deciding whether the request is allowed), and the server may then redirect the client to an HTTP version of the file using a standard HTTP 302 redirect response giving the HTTP URL (which, in the general case, can be on a different server). For small files, the server can choose to return the file over HTTPS as the response body. A legacy server silently ignores the Upgrade header and returns the file via HTTPS as normal.

For redirection to plain HTTP transport, a standard HTTP Set-Cookie header is used to send the client a one-time passcode in the form of a cookie, GRIDHTTP_PASSCODE, which must be presented to obtain the file via HTTP. This one-time passcode only works for the file in question, and only works once: the current implementation stores it in a file and deletes the file when the passcode is used. (This mechanism is no worse than GridFTP for providing an unencrypted data channel: an attacker who can snoop on or intercept the connection can obtain a copy of the requested file, but cannot replay the passcode or use it to obtain other files.)
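The one-time behaviour described above can be illustrated with a minimal Python sketch. This is not mod_gridsite's code: the function names, the passcode format, and the file layout (one file per passcode, holding the path it unlocks) are all assumptions made for illustration.

```python
import os
import secrets


def issue_passcode(sessions_dir, file_path):
    """Create a passcode that is valid once, for `file_path` only."""
    passcode = secrets.token_hex(16)  # hypothetical format, not GridSite's
    # One record file per passcode; its content names the file it unlocks.
    fd = os.open(os.path.join(sessions_dir, passcode),
                 os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(file_path)
    return passcode


def redeem_passcode(sessions_dir, passcode, file_path):
    """Return True at most once per passcode, and only for the matching file."""
    if not passcode.isalnum():
        return False  # reject empty values and anything that could leave the directory
    record = os.path.join(sessions_dir, passcode)
    try:
        with open(record) as f:
            allowed = f.read()
    except FileNotFoundError:
        return False  # unknown passcode, or already used
    if allowed != file_path:
        return False  # issued for a different file
    try:
        os.remove(record)  # deleting the record is what makes it one-time
    except FileNotFoundError:
        return False  # a concurrent request redeemed it first
    return True
```

With a passcode issued for /1234.txt, the first matching call to redeem_passcode() succeeds and any repeat fails, which is the replay protection the text describes. Whether a mismatched request should also consume the passcode is a design choice this sketch leaves open.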

As you can see, GridHTTP is really a profile for using the HTTP/1.1 standard, rather than a new protocol or a set of extensions: no new headers or methods are involved.
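The whole client side of the exchange can be sketched in a few lines of Python. The fetch argument is a hypothetical caller-supplied function that performs a single GET without following redirects and returns (status, headers, body); the sketch also assumes the headers arrive in a dict keyed exactly "Location" and "Set-Cookie". It is an illustration of the profile, not code taken from GridSite.

```python
from http.cookies import SimpleCookie


def gridhttp_get(fetch, https_url):
    """Retrieve a file, moving the data transfer to plain HTTP if offered.

    `fetch(url, headers)` is a hypothetical helper returning
    (status, response_headers_dict, body_bytes) for one GET request.
    """
    status, headers, body = fetch(https_url, {"Upgrade": "GridHTTP/1.0"})
    if status == 200:
        # Legacy server, or the server chose to return the file over HTTPS.
        return body
    if status != 302 or "Location" not in headers:
        raise RuntimeError(f"unexpected response: {status}")

    # Extract the one-time passcode from the Set-Cookie header.
    cookie = SimpleCookie()
    cookie.load(headers.get("Set-Cookie", ""))
    if "GRIDHTTP_PASSCODE" not in cookie:
        raise RuntimeError("redirect carried no GRIDHTTP_PASSCODE cookie")
    passcode = cookie["GRIDHTTP_PASSCODE"].value

    # Second request: plain HTTP, authorized only by the passcode cookie.
    status, _, body = fetch(headers["Location"],
                            {"Cookie": f"GRIDHTTP_PASSCODE={passcode}"})
    if status != 200:
        raise RuntimeError(f"data request failed: {status}")
    return body
```

Keeping the transport behind fetch means the same logic can be exercised with a stub in tests and with a real HTTPS client in use.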

Two extensions are being added to the GridSite implementation: support for variable TCP window sizes, so that it can be used for a mix of long- and short-distance connections (currently the TCP window size has to be set in the Apache configuration file), and support for third-party transfers using the HTTP COPY method from WebDAV.

Advantages

A big advantage of redirecting to a pure HTTP GET transfer is not just that the server and client avoid spending CPU time encrypting and decrypting the data, but also that Apache can use the sendfile() system call to tell the kernel to copy the file directly from the filesystem to the network socket (the Linux kernel module HTTP server can be used to much the same effect). The data then never has to be copied through userspace (the so-called zero-copy mode).
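The same kernel facility is reachable from scripting languages. As a rough illustration (unrelated to how Apache itself is implemented), Python's socket.sendfile() uses os.sendfile() where the platform provides it and falls back to ordinary send() calls otherwise; the function name send_file is an assumption of this sketch.

```python
import socket


def send_file(sock, path):
    """Send the file at `path` over a connected stream socket.

    socket.sendfile() hands the copy to os.sendfile() where available,
    so the file data need not pass through this process's buffers.
    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)
```

This shortcut is not available on an encrypted connection in the same way, because the data has to be encrypted in userspace before it is written to the socket.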

As far as client-side APIs go, any client-side library that supports HTTP redirects and cookies and lets you add your own headers is sufficient. Even the curl command line tool lets you do this, with the -H and -c options, without any modification to its code.
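As one example of such a library, the Python standard library's urllib can be configured with a cookie jar, a client certificate and an extra header. The sketch below has not been tested against a real GridSite server; the function name and parameters are assumptions, and whether the default cookie policy returns the passcode cookie depends on the domain and path the server sets.

```python
import http.cookiejar
import ssl
import urllib.request


def build_gridhttp_opener(cert=None, key=None, capath=None):
    """Return (opener, cookie_jar) set up for GridHTTP-style requests."""
    context = ssl.create_default_context(capath=capath)
    if cert is not None:
        # Client certificate used for the HTTPS authentication step.
        context.load_cert_chain(certfile=cert, keyfile=key)

    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context),
        urllib.request.HTTPCookieProcessor(jar),  # stores and replays cookies
    )
    # build_opener() already installs a redirect handler; only the header is new.
    opener.addheaders.append(("Upgrade", "GridHTTP/1.0"))
    return opener, jar
```

A transfer would then be a single opener.open(url) call. Note that this opener sends the Upgrade header on every request it makes, including the redirected one.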

From GridSite version 1.1.11, htcp supports GridHTTP redirection, by using the --grid-http option.

Server side configuration

mod_gridsite adds two new Apache configuration file directives to enable GridHTTP support: GridSiteGridHTTP and GridSiteSessionsDir.

GridSiteGridHTTP should be specified in Directory or Location sections of the Apache httpd.conf to enable GridHTTP transfers for the files governed by those sections (i.e. GridHTTP can be selectively enabled at the level of individual directories if required). For HTTPS virtual servers, setting GridSiteGridHTTP on enables redirects to the file on the corresponding HTTP virtual host when a request is made to the HTTPS server with the header Upgrade: GridHTTP/1.0 present. For both HTTPS and HTTP virtual servers, GridSiteGridHTTP on also allows requests to be made with the GRIDHTTP_PASSCODE cookie: if the cookie value matches a valid one-time passcode created by an earlier request via HTTPS, the request is acted upon.

GridSiteSessionsDir is used to change the name of the directory holding the one-time passcodes (the default is /var/www/sessions). This directory must be writable by the Unix account which the httpd server runs as, and should only be readable by that account for security reasons. GridSiteSessionsDir may only appear once, as part of the main/default server configuration (i.e. not inside a virtual server). mod_gridsite will create this directory at startup, but suitable permissions and ownership can be set manually on variants of RedHat Linux with:

chown apache:apache /var/www/sessions
chmod 0700 /var/www/sessions
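A small script can confirm that the directory ended up with the intended permissions. This is a sketch under assumptions: the function name and messages are invented for illustration, and the check simply mirrors the 0700 mode used above.

```python
import os
import stat


def sessions_dir_problems(path, expected_uid=None):
    """Return a list of human-readable problems with the sessions directory."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist"]
    problems = []
    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path} is not a directory")
    mode = stat.S_IMODE(st.st_mode)
    if mode & 0o077:
        problems.append(f"mode {mode:04o} gives group or others access; expected 0700")
    if (mode & 0o700) != 0o700:
        problems.append(f"mode {mode:04o} does not give the owner full access")
    if expected_uid is not None and st.st_uid != expected_uid:
        problems.append(f"owned by uid {st.st_uid}, expected {expected_uid}")
    return problems
```

For the setup above one might call sessions_dir_problems("/var/www/sessions", pwd.getpwnam("apache").pw_uid), which returns an empty list when ownership and mode match.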

Client side examples

To copy a remote file to local disk using htcp, with your X.509 credentials in the standard location (.globus/usercert.pem and .globus/userkey.pem):

bash: htcp --verbose --grid-http \
 https://test.hep.man.ac.uk/1234.txt /tmp/1234.txt
htcp version 1.1.11
https://test.hep.man.ac.uk/1234.txt -> /tmp/1234.txt
Add  Upgrade: GridHTTP/1.0
Enter PEM pass phrase:
Received GridHTTP Auth Cookie: GRIDHTTP_PASSCODE=f269452c1c00b235ukH86g
Received Location: http://test.hep.man.ac.uk/1234.txt
... Found (302)
GridHTTP redirect to http://test.hep.man.ac.uk/1234.txt
... OK (200)

(Without the --verbose option, the command produces no output on success.)

You can achieve the same transfer using the curl command (and a lot more options!):

bash: curl -s --verbose --capath /etc/grid-security/certificates/ \
 --cert $HOME/.globus/usercert.pem --key $HOME/.globus/userkey.pem \
 --location --header 'Upgrade: GridHTTP/1.0' \
 https://test.hep.man.ac.uk/1234.txt > /tmp/1234.txt
* About to connect() to test.hep.man.ac.uk port 443
* Connected to test.hep.man.ac.uk port 443
Enter PEM pass phrase:
> GET /1234.txt HTTP/1.1
User-Agent: curl
Host: test.hep.man.ac.uk
Upgrade: GridHTTP/1.0
< HTTP/1.1 302 Found
< Date: Sat, 10 Sep 2005 20:11:09 GMT
< Server: Apache mod_ssl OpenSSL mod_gridsite
< Set-Cookie: GRIDHTTP_PASSCODE=f48e19e8a48e00719M2ZpV; 
 expires=Sat, 10 Sep 2005 20:16:09 GMT;  
 domain=test.hep.man.ac.uk; path=/1234.txt
< Location: http://test.hep.man.ac.uk/1234.txt
< Content-Length: 0
< Content-Type: text/plain
* Issue another request to this URL: 'http://test.hep.man.ac.uk/1234.txt'
* Connected to test.hep.man.ac.uk port 80
> GET /1234.txt HTTP/1.1
User-Agent: curl
Host: test.hep.man.ac.uk
Cookie: GRIDHTTP_PASSCODE=f48e19e8a48e00719M2ZpV
< HTTP/1.1 200 OK
< Date: Sat, 10 Sep 2005 20:16:10 GMT
< Server: Apache mod_ssl OpenSSL mod_gridsite
< Content-Length: 4
< Content-Type: text/plain
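The Set-Cookie header in the 302 response above shows how narrowly the passcode is scoped. The following sketch parses the header values copied from this trace with Python's standard library; it describes this one exchange only and makes no claim about GridSite's defaults.

```python
from email.utils import parsedate_to_datetime
from http.cookies import SimpleCookie

# Header values copied from the 302 response in the trace above.
date_header = "Sat, 10 Sep 2005 20:11:09 GMT"
set_cookie = ("GRIDHTTP_PASSCODE=f48e19e8a48e00719M2ZpV; "
              "expires=Sat, 10 Sep 2005 20:16:09 GMT; "
              "domain=test.hep.man.ac.uk; path=/1234.txt")

cookie = SimpleCookie()
cookie.load(set_cookie)
morsel = cookie["GRIDHTTP_PASSCODE"]

# Lifetime = cookie expiry minus the time the server issued the redirect.
lifetime = (parsedate_to_datetime(morsel["expires"])
            - parsedate_to_datetime(date_header))

print("passcode:", morsel.value)
print("scope:   ", morsel["domain"] + morsel["path"])
print("lifetime:", int(lifetime.total_seconds()), "seconds")
```

In this trace the cookie is limited to the single path /1234.txt on test.hep.man.ac.uk and expires 300 seconds after the redirect was issued.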

The GridSite Toolbar is a Firefox extension that allows the use of the GridHTTP protocol from within a web browser.