SLURM Standalone on AWS

Launch Login Node

To set up a cluster, you will need to import a Flight Solo image.

  1. Go to the EC2 instance console.

  2. Click "Launch" to go to the EC2 instance setup page.

  3. Set the number of instances to 1 and give the instance a descriptive name.

  4. Confirm that the region (top right, next to the username) is correct.

  5. In the "Application and OS Images" section, choose the "My AMIs" tab and select your imported Flight Solo AMI.

  6. In the "Instance type" section, choose the required instance size.

  7. In the "Keypair" section, select a keypair to use. It is good practice to use the same keypair for the login and compute nodes.

  8. In the "Network settings" section, click the "Edit" button to set the network and subnet. Remember what these are, as they should be the same for any associated compute nodes.

  9. A security group must be associated with all nodes on the cluster. It is recommended to use a security group whose rules allow only the following traffic:

    • HTTP
    • HTTPS
    • SSH
    • Port 8888
    • Ports 5900 - 5903
    • All traffic from within the security group should be allowed. (This rule can only be added after creation)

    Note

    If you already have a security group which does this, use it here and make sure to use it again for the compute nodes. Otherwise, a security group can be made from the launch page, or through the security groups page.

    Describing exactly how to create a security group is out of scope for this documentation, but covered by the AWS documentation.

    However, here is an example security group that might be used for a Flight Solo cluster:

  10. After a security group has been made, click "Choose Existing" and select it from the drop-down menu.

  11. In the "Configure Storage" section, allocate as much storage as needed. 8GB is the minimum required for Flight Solo; compute nodes are unlikely to need much more than that, as the login node hosts most data.

  12. Finally, click "Launch Instance".
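
For readers who prefer the command line, the console steps above map roughly onto a single `aws ec2 run-instances` call. The following is only a sketch: the function name `login_node_launch_command` is hypothetical, every ID passed to it is a placeholder, and the root device name `/dev/sda1` and the instance name `login1` are assumptions that may not match your AMI or naming scheme. The function prints the command for review rather than running it.

```shell
#!/bin/bash
# Sketch only: prints an AWS CLI command for review instead of running it.
# All IDs are placeholders; /dev/sda1 and the Name tag "login1" are assumptions.

# Usage: login_node_launch_command <ami-id> <instance-type> <key-name> <subnet-id> <sg-id> [disk-gb]
login_node_launch_command() {
    local ami="$1" itype="$2" key="$3" subnet="$4" sg="$5"
    local disk_gb="${6:-8}"   # 8GB is the minimum stated for Flight Solo
    # echo joins its arguments with spaces, so the output is a single line.
    # The shorthand values are single-quoted so a shell will not expand the braces.
    echo "aws ec2 run-instances" \
        "--image-id $ami" \
        "--count 1" \
        "--instance-type $itype" \
        "--key-name $key" \
        "--subnet-id $subnet" \
        "--security-group-ids $sg" \
        "--block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=$disk_gb}'" \
        "--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=login1}]'"
}
```

Calling the function with your own AMI ID, instance type, key pair name, subnet ID and security group ID prints one command line, which can be checked against the console settings before it is run.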

Alternatively, the Flight Solo image can be launched from the AWS Marketplace:

  1. Find the Flight Solo image here or by searching the marketplace for "Flight Solo".

  2. Click "Continue to Subscribe"

  3. Read the terms and conditions, then click "Continue to Configuration"

  4. Configure region, software version (if unsure use the latest), and fulfillment option (if unsure use the default). Then click "Continue to Launch". Make sure the region is the same for all nodes to be used in a cluster.

  5. Click on "Usage Instructions" to see some instructions on how to get started, and a link to this documentation.

  6. Select the "Launch from Website" action.

  7. Choose an instance type to use.

  8. Choose VPC settings. Remember what VPC was used to create this instance, as it should also be used for any associated compute nodes.

  9. Choose a subnet. Remember what subnet was used to create this instance, as it should also be used for any associated compute nodes.

  10. A security group must be associated with all nodes on the cluster. It is recommended to use a security group whose rules allow only the following traffic:

    • HTTP
    • HTTPS
    • SSH
    • Port 8888
    • Ports 5900 - 5903
    • All traffic from within the security group should be allowed. (This rule can only be added after creation)

    Note

    If you already have a security group which does this, use it here and make sure to use it again for the compute nodes. Otherwise, a security group can be made from the launch page, or through the security groups page.

    Describing exactly how to create a security group is out of scope for this documentation, but covered by the AWS documentation.

    However, here is an example security group that might be used for a Flight Solo cluster:

    Tip

    The seller's settings (shown below) can be used as a reference for creating a security group.

  11. After a security group has been made, click "Select existing security group" and select it from the drop-down menu.

  12. Choose what key pair to use. It is good practice for this to be the same on all nodes in a cluster.

  13. Click "Launch".
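
The security group rules recommended in the launch steps above can also be expressed as AWS CLI calls. This is a minimal sketch, not the documented method: the function name `sg_ingress_commands` is hypothetical, the group ID and source CIDR are placeholders, and it assumes that the listed services use TCP on their standard ports (HTTP 80, HTTPS 443, SSH 22) and that port 8888 and ports 5900 - 5903 are TCP as well. The commands are printed for review rather than executed.

```shell
#!/bin/bash
# Sketch only: prints AWS CLI commands for review instead of running them.
# Assumes every listed port is TCP and that HTTP/HTTPS/SSH use 80/443/22.

# Usage: sg_ingress_commands <security-group-id> <source-cidr>
sg_ingress_commands() {
    local group_id="$1" cidr="$2" ports
    # HTTP, HTTPS, SSH, port 8888 and ports 5900-5903
    for ports in 80 443 22 8888 5900-5903; do
        echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port $ports --cidr $cidr"
    done
    # All traffic between members of the same group (added after creation)
    echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol -1 --source-group $group_id"
}
```

Running the function with an existing group ID and the address range you connect from prints six commands: five port rules plus the self-referencing rule. Because the last rule refers to the group itself, it can only be issued once the group exists, which matches the note in the list above.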

General Configuration

Create Node Inventory

  1. Parse your node(s) with the command flight hunter parse.

    1. This will display a list of hunted nodes, for example:

      [flight@login-node.novalocal ~]$ flight hunter parse
      Select nodes: (Scroll for more nodes)
         login-node.novalocal - 10.10.0.1
         compute-node-1.novalocal - 10.10.101.1
      

    2. Select the desired node to be parsed with Space, and you will be taken to the label editor

      Choose label: login-node.novalocal
      

    3. Here, you can edit the label like plain text

      Choose label: login1
      

      Tip

      You can clear the current node name by pressing Down in the label editor.

    4. When done editing, press Enter to save. The modified node label will appear next to the IP address and original node label.

      Select nodes: login-node.novalocal - 10.10.0.1 (login1) (Scroll for more nodes)
         login-node.novalocal - 10.10.0.1 (login1)
         compute-node-1.novalocal - 10.10.101.1
      

    5. From this point, you can either hit Enter to finish parsing and process the selected nodes, or continue changing nodes. Either way, you can return to this list by running flight hunter parse.

    6. Save the node inventory before moving on to the next step.

      Tip

      See flight hunter parse -h for more ways to parse nodes.

Add genders

  1. Optionally, you may add genders to the newly parsed node. For example, if the node should have the genders cluster and all, run the command:
    flight hunter modify-groups --add cluster,all login1
    

SLURM Standalone Configuration

  1. Configure profile

    flight profile configure
    
    1. This brings up a UI, where several options need to be set. Use the up and down arrow keys to scroll through options and Enter to move to the next option. Options in brackets coloured yellow are the defaults that will be applied if nothing is entered.
      • Cluster type: The type of cluster setup needed, in this case select Slurm Standalone.
      • Cluster name: The name of the cluster.
      • Default user: The user that you log in with.
      • Set user password: Set a password to be used for the chosen default user.
      • IP or FQDN for Web Access: As described here, this could be the public IP or public hostname.
  2. Apply an identity by running the command flight profile apply, for example:

    flight profile apply login1 all-in-one
    

    Tip

    You can check all available identities for the current profile with flight profile identities.

  3. Wait for the identity to finish applying. You can check the status of all nodes with flight profile list.

    Tip

    You can watch the progress of the application with flight profile view login1 --watch.

Success

Congratulations, you've now created a SLURM Standalone environment! Learn more about SLURM in the HPC Environment docs.

Verifying Functionality

  1. Create a file called simplejobscript.sh, and copy this into it:

    #!/bin/bash -l
    echo "Starting running on host $HOSTNAME"
    sleep 30
    echo "Finished running - goodbye from $HOSTNAME"
    

  2. Submit the script with sbatch simplejobscript.sh. To test all your nodes, queue up enough jobs that every node has to run at least one.

  3. In the directory that the job was submitted from there should be a file named slurm-X.out, where X is the job ID returned by the sbatch command. It will contain the echo messages from the script created in step 1.
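
Queuing several jobs at once, as suggested in step 2, can be done with a small loop. The sketch below is one way to do it; the function name `submit_jobs` and its default of four jobs are hypothetical, and it relies on sbatch's `--parsable` option so that only the job ID is printed for each submission.

```shell
#!/bin/bash
# Sketch only: submit several copies of the test job and print their job IDs.

# Usage: submit_jobs [count] [script]
submit_jobs() {
    local count="${1:-4}"                     # hypothetical default
    local script="${2:-simplejobscript.sh}"
    local i=0
    while [ "$i" -lt "$count" ]; do
        # --parsable makes sbatch print just the job ID
        sbatch --parsable "$script"
        i=$((i + 1))
    done
}
```

After running `submit_jobs 4` from the directory containing simplejobscript.sh, `squeue` shows which nodes the jobs were placed on, and each finished job leaves its own slurm-X.out file in that directory.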