cl_sudo - execute a command with sudo on the cluster


Synopsis:

usage: cl_sudo [options] command
     [see basic options]
     -u user             sudo to given username and run command
     -v                  verbose output
     -V                  really verbose output


Description:
cl_sudo is used to execute a command with super-user (root) privileges. When executing a command with sudo, the user can (temporarily) kill other users' jobs, install programs, edit root-level files, etc.

In many cases, parallel jobs may crash and leave behind zombie processes or orphaned child processes. Such processes often retain their allocated memory even though the parent program is gone. While Unix/Linux will swap these unused memory blocks out of the way, it is still a good idea to kill off these leftover processes now and then.

cl_sudo is a Perl script that simply automates the process of rsh-ing to each machine and running sudo there with the given command.

As its name implies, cl_sudo runs on top of the regular sudo command. If you don't know what that means, try "man sudo" (it is beyond the scope of this page to explain how to configure sudo). In particular, cl_sudo ALWAYS requests a password and ALWAYS resets the user's sudo timestamp on the local machine. Additionally, as cl_sudo connects to remote machines, it will send the SAME password in order to reset the user's sudo timestamp on the remote machine. This implies that all machines should use the same /etc/sudoers file and same password information, whether by file, NIS, or LDAP (perhaps not a strict requirement, but you can easily get yourself into trouble here).


Options:
The main choice is whether to run the command as another user (with -u) or as root (very dangerous). Using the -u option allows you to kill off only a given user's jobs, instead of killing off all jobs from all users (as would happen if you were root).

Beyond -u and the basic options, the only other options to cl_sudo control the verbosity of the output: -v and -V.


Examples:

cl_sudo -N alpha killall my_big_prog
     kill all instances of my_big_prog, regardless of which user is running them

cl_sudo -u johnny -N bravo "rm -rf /scratch/*"
     remove all of user johnny's scratch files on all the bravo machines; note that the quotes are needed to protect the asterisk from expansion by the local shell (and are always a good idea anyway)

cl_sudo -m foo,bar,baz cp /etc/config.rev2 /etc/config
     copy the new revision of the config file into its proper place in /etc; presumably you cl_cp-ed the same file to the three machines beforehand
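
To see why the quotes matter in the second example, the following minimal Python sketch uses the standard shlex module to split both forms of the command line the way a POSIX shell would. It only illustrates word splitting; it does not run cl_sudo and says nothing about how cl_sudo itself parses its arguments.

```python
import shlex

# The second example above, with and without the protective quotes.
quoted = 'cl_sudo -u johnny -N bravo "rm -rf /scratch/*"'
unquoted = 'cl_sudo -u johnny -N bravo rm -rf /scratch/*'

# shlex.split performs shell-style word splitting only (no glob expansion).
quoted_argv = shlex.split(quoted)
unquoted_argv = shlex.split(unquoted)

# Quoted: the whole remote command is one argument, asterisk intact.
print(quoted_argv[-1])
# Unquoted: the command is split into separate words, and a real shell
# would additionally try to expand the bare /scratch/* against the
# LOCAL filesystem before cl_sudo ever sees it.
print(unquoted_argv[-3:])
```

With the quotes, the asterisk travels to the remote machines unchanged and is expanded there, which is what the example intends.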


How It Works:
Rather than trying to re-invent the sudo command (which would possibly introduce serious security bugs), we simply leverage the standard Linux sudo command. As such, you have to know what sudo does and how it works ... which is beyond the scope of this page. For more info, try "man sudo" and "man sudoers" and look at the /etc/sudoers file.

We use a multi-step procedure to perform a remote sudo.

  1. In serial mode, ask the user for their password
  2. Use the "sudo -S -l" command to verify that the password is valid and that the user really does have sudo privileges
  3. Loop across the list of remote machines
    1. Run "sudo -S -v" on the remote machine to validate and get a new sudo time-stamp for that machine
    2. Run the sudo command that the user specified
    3. Run "sudo -K" to clean up the remote sudo time-stamp

Note that we assume all machines have similar /etc/sudoers files, and that all machines have the same user/password information (probably through NIS, LDAP, etc.).

Note also that while we explicitly remove the sudo time-stamp from the remote machines, the time-stamp on the local machine is left valid.
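
The multi-step procedure above can be sketched as a dry run in Python. This is not the actual Perl implementation: the function name plan_cl_sudo, the "localhost" label, and the mapping of cl_sudo's -u option onto "sudo -u" are assumptions made for illustration. The sketch only builds the ordered list of sudo invocations; it does not prompt for a password, open any rsh connection, or execute anything.

```python
import shlex

def plan_cl_sudo(hosts, command, user=None):
    """Return the ordered (host, command line) steps for a remote sudo.

    Hypothetical helper: a dry-run plan only, nothing is executed.
    """
    # Step 2: check the password and sudo rights locally.
    steps = [("localhost", "sudo -S -l")]

    # Assumption: cl_sudo's -u option is passed through as "sudo -u <user>".
    run = "sudo "
    if user is not None:
        run += "-u " + shlex.quote(user) + " "
    run += command

    # Step 3: visit each remote machine in turn.
    for host in hosts:
        steps.append((host, "sudo -S -v"))  # 3.1 validate, get a time-stamp
        steps.append((host, run))           # 3.2 run the user's command
        steps.append((host, "sudo -K"))     # 3.3 remove the remote time-stamp
    return steps

if __name__ == "__main__":
    for host, cmd in plan_cl_sudo(["foo", "bar"], "killall my_big_prog",
                                  user="johnny"):
        print(host + ": " + cmd)
```

Note that the plan never issues "sudo -K" for the local machine, matching the behavior described above: only the remote time-stamps are removed.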


RCSID $Id: cl_sudo.html,v 1.2 2002/03/12 14:11:18 jpormann Exp $