Why use a Cluster?
- High Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.
- These other systems can be used to do work that would either be impossible or much slower on smaller systems.
- HPC resources are shared by multiple users.
- The standard method of interacting with such systems is via a command line interface.
Connecting to a remote HPC system
- An HPC system is a set of networked machines.
- HPC systems typically provide login nodes and a set of worker nodes.
- The resources found on independent (worker) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network mounted filesystems, etc.).
- Files saved to a shared filesystem on one node are available on all nodes.
Exploring Remote Resources
- An HPC system is a set of networked machines.
- HPC systems typically provide login nodes and a set of compute nodes.
- The resources found on independent (worker) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network mounted filesystems, etc.).
- Files saved on shared storage are available on all nodes.
- The login node is a shared machine: be considerate of other users.
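A quick way to see how nodes differ is to run the same few inspection commands on the login node and again inside a job on a compute node. The sketch below assumes a Linux system with `nproc` and `/proc/meminfo` available; the variable names are only illustrative.

```shell
# Name of the machine this shell is running on.
node_name=$(uname -n)

# Number of processing units available to this process (GNU coreutils).
cores=$(nproc)

echo "Node: $node_name"
echo "CPU cores: $cores"

# Total installed memory as reported by the Linux kernel.
grep MemTotal /proc/meminfo

# Size and usage of the filesystem holding the current directory;
# a network-mounted filesystem shows a remote source in the first column.
df -h .
```

Comparing the output from a login node with the output from a compute node shows which resources and filesystems the two have in common.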
EPCC version - Working on a remote HPC system
- An HPC system is a set of networked machines.
- HPC systems typically provide login nodes and a set of worker nodes.
- The resources found on independent (worker) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network mounted filesystems, etc.).
- Files saved to a shared filesystem on one node are available on all nodes.
Scheduler Fundamentals
- The scheduler handles how compute resources are shared between users.
- A job is just a shell script.
- Request slightly more resources than you will need.
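To illustrate that a job is just a shell script, here is a minimal sketch that assumes the scheduler is Slurm (the text does not name one). The file name `example-job.sh`, the job name, and the resource values are all hypothetical; the `#SBATCH` lines are ordinary comments to the shell and are only interpreted by the scheduler.

```shell
# Write a minimal job script to the current directory.
cat > example-job.sh <<'EOF'
#!/bin/bash
# Resource requests (Slurm assumed): ask for a little more time and
# memory than the job is expected to use.
#SBATCH --job-name=example-job
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00

echo "Running on $(uname -n)"
EOF

# Because it is a plain shell script, it can be run directly for a quick test.
bash example-job.sh

# On a Slurm system it would instead be submitted to the queue with:
# sbatch example-job.sh
```

Run directly, the script executes on the current machine; submitted with `sbatch`, the same script runs on whichever node the scheduler assigns.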
HPCC version - Scheduler Fundamentals
- The scheduler handles how compute resources are shared between users.
- A job is just a shell script.
- Request slightly more resources than you will need.
EPCC version - Working with the scheduler
- The scheduler handles how compute resources are shared between users.
- Everything you do should be run through the scheduler.
- A job is just a shell script.
- If in doubt, request more resources than you will need.
Environment Variables
- Shell variables are by default treated as strings.
- Variables are assigned using “=” and recalled using the variable’s name prefixed by “$”.
- Use “export” to make a variable available to other programs.
- The PATH variable defines the shell’s search path.
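The points above can be tried out in any shell session. This is a minimal sketch; the variable names (`count`, `greeting`, and so on) are made up for illustration.

```shell
# Assign with "=" (no spaces around it) and recall with "$".
count=3
greeting="hello"
echo "$greeting $count"

# Variables are strings by default, so "+" is not arithmetic here.
sum="$count+1"
echo "$sum"            # prints 3+1

# Arithmetic has to be requested explicitly.
total=$((count + 1))
echo "$total"          # prints 4

# Without export, a child process does not see the variable.
child_before=$(sh -c 'echo "${greeting:-unset}"')

# After export, child processes inherit it.
export greeting
child_after=$(sh -c 'echo "${greeting:-unset}"')
echo "before export: $child_before, after export: $child_after"

# PATH is a colon-separated list of directories searched for commands.
echo "$PATH" | tr ':' '\n'
```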
Accessing software via Modules
- Load software with module load softwareName.
- Unload software with module unload.
- The module system handles software versioning and package conflicts for you automatically.
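A typical load and unload cycle might look like the sketch below. It assumes the system provides a `module` command (Environment Modules or Lmod); `softwareName` is the placeholder from the text, not a real package, so replace it with a name the system actually offers.

```shell
# Placeholder module name; substitute one available on your system.
software="softwareName"

if command -v module >/dev/null 2>&1; then
    module_available=yes
    module load "$software"     # add the software to the environment
    module list                 # show what is currently loaded
    module unload "$software"   # remove it again
else
    module_available=no
    echo "No module command found in this shell."
fi
```

The guard makes the sketch safe to run on a machine without a module system, such as a laptop.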
Transferring files with remote computers
- wget and curl -O download a file from the internet.
- scp and rsync transfer files to and from your computer.
- You can use an SFTP client like FileZilla to transfer files through a GUI.
Running a parallel job
- Parallel programming allows applications to take advantage of parallel hardware.
- The queuing system facilitates executing parallel tasks.
- Performance improvements from parallel execution do not scale linearly.
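One common way to reason about why speedup is not linear is Amdahl's law, which the text does not name and is used here as an assumption: if a fraction p of the work can run in parallel on n cores, the best possible speedup is 1 / ((1 - p) + p / n). The helper `amdahl_speedup` below is a hypothetical function written for this sketch.

```shell
# amdahl_speedup P N
#   P: fraction of the work that can run in parallel (0 to 1)
#   N: number of cores
# Prints the theoretical maximum speedup to two decimal places.
amdahl_speedup() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

# Example: a program in which 90% of the work is parallel.
for cores in 1 2 4 8 16; do
    echo "$cores cores: $(amdahl_speedup 0.9 "$cores")x"
done
```

With 90% of the work parallel, 16 cores give a speedup of only about 6.4, because the serial 10% comes to dominate the run time.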
Using resources effectively
- Accurate job scripts help the queuing system efficiently allocate shared resources.
Using shared resources responsibly
- Be careful how you use the login node.
- Your data on the system is your responsibility.
- Plan and test large data transfers.
- It is often best to convert many files to a single archive file before transferring.
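As a sketch of the last point, the commands below pack a directory of small files into one compressed archive before transfer. The directory layout and file names are invented for the example, and the remote user and host in the commented transfer commands are hypothetical.

```shell
# Create a scratch area holding a few small result files.
workdir=$(mktemp -d)
mkdir "$workdir/results"
for i in 1 2 3 4 5; do
    echo "sample $i" > "$workdir/results/run_$i.txt"
done

# Pack the whole directory into a single compressed archive.
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# List the archive contents to check it before transferring.
tar -tzf "$workdir/results.tar.gz"

# Transfer the one archive instead of many files (hypothetical user and host):
# scp "$workdir/results.tar.gz" yourUsername@cluster.example.org:
# rsync -avP "$workdir/results.tar.gz" yourUsername@cluster.example.org:
```

Testing the archive step on a small sample like this first helps estimate how long a large transfer will take.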