Introduction to High-Performance Computing: Key Points

Introduction to High-Performance Computing

Why use a Cluster?

High-Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.
These other systems can be used to do work that would either be impossible or much slower on smaller systems.
HPC resources are shared by multiple users.
The standard method of interacting with such systems is via a command line interface.

Connecting to a remote HPC system

An HPC system is a set of networked machines.
HPC systems typically provide login nodes and a set of worker nodes.
The resources found on independent (worker) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network mounted filesystems, etc.).
Files saved to shared storage from one node are available on all nodes.

Exploring Remote Resources

An HPC system is a set of networked machines.
HPC systems typically provide login nodes and a set of compute nodes.
The resources found on independent (compute) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network mounted filesystems, etc.).
Files saved on shared storage are available on all nodes.
The login node is a shared machine: be considerate of other users.

EPCC version - Working on a remote HPC system

An HPC system is a set of networked machines.
HPC systems typically provide login nodes and a set of worker nodes.
The resources found on independent (worker) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network mounted filesystems, etc.).
Files saved to shared storage from one node are available on all nodes.

Scheduler Fundamentals

The scheduler handles how compute resources are shared between users.
A job is just a shell script.
Request slightly more resources than you will need.
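
As a minimal sketch of "a job is just a shell script", the example below is an ordinary bash script with resource requests written as comments. It assumes the scheduler is Slurm, which this page does not state; the job name and the requested memory and time are hypothetical values chosen for illustration. Because the directives are comments, the same file also runs unchanged as plain bash.

```shell
#!/bin/bash
# Resource requests for the scheduler (assumption: Slurm).
# Plain bash treats these lines as comments and ignores them.
#SBATCH --job-name=example-job
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00

# Record which node actually ran the script.
node_name="$(uname -n)"
echo "This job ran on ${node_name}"
```

Under the Slurm assumption, a script like this would be submitted with sbatch followed by the script's file name, and the scheduler would choose the node it runs on.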

HPCC version - Scheduler Fundamentals

The scheduler handles how compute resources are shared between users.
A job is just a shell script.
Request slightly more resources than you will need.

EPCC version - Working with the scheduler

The scheduler handles how compute resources are shared between users.
Everything you do should be run through the scheduler.
A job is just a shell script.
If in doubt, request more resources than you will need.

Environment Variables

Shell variables are treated as strings by default.
Variables are assigned using “=” and recalled using the variable’s name prefixed by “$”.
Use “export” to make a variable available to other programs.
The PATH variable defines the shell’s search path.
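
The points above can be tried out in a short bash session. This is a sketch rather than part of the lesson: the variable names (demo_count, joined, sum, child_before, child_after) are hypothetical, and it assumes demo_count is not already set in your environment.

```shell
# Assign with "=" (no spaces around it) and recall with "$".
demo_count=5
echo "demo_count is $demo_count"

# Values are strings by default: this joins characters, it does not add.
joined="$demo_count+1"
echo "$joined"

# Arithmetic has to be requested explicitly with $(( )).
sum=$((demo_count + 1))
echo "$sum"

# A plain shell variable is not passed to child programs...
child_before="$(sh -c 'echo "${demo_count:-unset}"')"

# ...until it has been exported.
export demo_count
child_after="$(sh -c 'echo "${demo_count:-unset}"')"
echo "before export: $child_before, after export: $child_after"

# PATH is a colon-separated list of directories searched for commands.
echo "$PATH" | tr ':' '\n' | head -n 3
```

The child shell started by sh -c stands in for any other program: it only sees demo_count once the variable has been exported.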

Accessing software via Modules

Load software with module load softwareName.
Unload software with module unload softwareName.
The module system handles software versioning and package conflicts for you automatically.
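
A typical load, inspect, unload cycle might look like the sketch below. The module name "python" is a hypothetical placeholder, since the software on offer differs between systems, and the guard is only there so the snippet does not fail on a machine without a module command.

```shell
# Hypothetical module name; "module avail" lists what a system really offers.
software_name="python"

if type module >/dev/null 2>&1; then
    module load "$software_name"     # add the software to the environment
    module list                      # show what is currently loaded
    module unload "$software_name"   # remove it again
    module_status="available"
else
    echo "No module command found in this shell."
    module_status="missing"
fi
```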

Transferring files with remote computers

wget and curl -O download a file from the internet.
scp and rsync transfer files to and from your computer.
You can use an SFTP client like FileZilla to transfer files through a GUI.

Running a parallel job

Parallel programming allows applications to take advantage of parallel hardware.
The queuing system facilitates executing parallel tasks.
Performance improvements from parallel execution do not scale linearly.
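
One common way to see why scaling is not linear is Amdahl's law, which this page does not name and is used here only as an illustrative model: if a fraction p of the runtime can be parallelised, the speedup on n cores is 1 / ((1 - p) + p / n). The sketch below assumes p = 0.9, a made-up value, and the function name amdahl_speedup is hypothetical.

```shell
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
#   p = fraction of the runtime that can run in parallel
#   n = number of cores
amdahl_speedup() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

# p = 0.9 is an assumed value for illustration only.
for cores in 1 2 4 8 16; do
    echo "$cores cores: $(amdahl_speedup 0.9 "$cores")x"
done
```

With these assumptions, 16 cores give a speedup of 6.40 rather than 16, because the serial 10% of the work takes up a growing share of the runtime as cores are added.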

Using resources effectively

Accurate job scripts help the queuing system efficiently allocate shared resources.

Using shared resources responsibly

Be careful how you use the login node.
Your data on the system is your responsibility.
Plan and test large data transfers.
It is often best to convert many files to a single archive file before transferring.
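
Bundling many files into one archive can be sketched as follows. The directory and file names are hypothetical, and the example works in a temporary directory so that it does not touch existing data.

```shell
# Work in a scratch directory so nothing existing is touched.
work_dir="$(mktemp -d)"
mkdir "$work_dir/results"

# Create several small files (hypothetical stand-ins for real output).
for i in 1 2 3 4 5; do
    echo "sample $i" > "$work_dir/results/output_$i.txt"
done

# Bundle the whole directory into a single compressed archive.
tar -czf "$work_dir/results.tar.gz" -C "$work_dir" results

# List the archive contents to check it before transferring.
file_count="$(tar -tzf "$work_dir/results.tar.gz" | grep -c '\.txt$')"
echo "archive holds $file_count files"
```

The single results.tar.gz file can then be moved with scp or rsync in one transfer instead of one transfer per file.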