High Performance Computing

Science faculty General Purpose Facility

Gemini

Gemini (Gemini.science.uu.nl) is a Linux server that is free to use, but the machine is shared by many users. It is a good solution for developing and testing small programs that do not require large amounts of dedicated computing power.
It is possible to add (buy) your own node to ensure computing capacity for exclusive use, as well as additional software (not free of charge). Depending on availability, we may be able to provide a separate node for free for a trial period, so you can try before you buy.

The general characteristics of the machine are:

- 4 generally accessible nodes (1x 32 cores, 3x 48 cores)
- 3 group-specific nodes (48 cores each)
- OS: Scientific Linux 7.9 (equivalent to Red Hat EL 7.9)
- short and long queues (and group-specific queues) for batch jobs
- a lot of standard software, easy to use through "environment modules"
- limited interactive jobs possible
- local scratch storage (2 TB) for fast I/O jobs
- connected to the main BFS storage, so the standard "Home", "Projects" and "WWWProjects" are accessible

Characteristics

(+) Easy to use and access.
(+) Efficient and effective for small interactive jobs or batch processing
(+) Fast local disk space
(+) direct mounting of BFS project storage
(+) Easy environment configuration by use of environment modules.
(+) Free
(+) possibly add your own node for exclusive use (not free of charge)
(-) Limited personal configuration possible (no root access, although gemini-admin installs libraries or modules on demand)
(-) No exclusive access
(-) Limited cluster jobs, as queues are limited to 1 node.
(-) No GPU cores


Contact: gemini-admin@science.uu.nl

Dedicated Virtual Machines

The Faculty has a VMware installation and can provide virtual machines (VMs) on demand on a pay-per-use basis. This option is a good solution if one needs exclusive access and special software to be installed. A VM can also be turned on or off on demand, which offers flexible cost management. This flexibility also means that one can easily switch between cluster and single-machine computation.

The memory and disk space of these machines depend on the specific requirements. Some indicative costs are the following:

Users can have full administrative rights and exclusive access on these machines. They are for CPU computation and their characteristics can be easily adapted according to the needs of the computation.

(+) Easy to use and full access.
(+) Good for either interactive or for batch jobs
(+) Machines can be configured from scratch as needed (installation of software etc.)
(+) Exclusive access
(+) Can be used for cluster-based/parallel jobs.
(-) Although they can be used for parallel jobs, they are not the best option, since, being virtual, there is little control over the usage of the different cores. Hence, precise timing measurements cannot be done at core level.
(-) Users need to configure these machines from scratch
(-) The disk is virtual, which means it is not ideal for measuring I/O operations
(-) Cost
(-) no GPU/visual computing support

Contact: science.ictbeheer@uu.nl / ITS-TopDesk?

Group Clusters

Dedicated clusters, configured for dedicated tasks. They are almost always owned and maintained by a specific research group. Because these machines were built for different needs, they differ widely in resources (number of nodes, memory, cores, architecture, interconnects).

(+) Fully customizable machines (by owners)
(+) Optimizable for GPU/visual computing
(+) Initial hardware installation by ICT-Beta, further maintenance on the basis of best effort
(-) Accessibility: Generally located in UU datacenter Almere, so only accessible by ICT-Beta employees
(+/-) Software setup by owner

List of HPC: managed clusters

Contact: science.ictbeheer@uu.nl

UBC Cluster

The UBC cluster is an HPC facility located in UMCU. It is part of the Utrecht Bioinformatics Center (UBC) and can be used by any researcher at the Utrecht Science Park. This solution is preferable for large-scale experiments with ready-made programs (in a pay-per-use mode), not for the actual development or testing of a program.
  • The cluster consists of a collection of compute servers (>2000 CPU cores, 30 GPUs, 20 TB RAM), 1.7 PB high-performance storage and 1.1 PB archive storage, running on CentOS Linux.
  • It is controlled by the SLURM batch-wise queueing system with a few head nodes and data transfer nodes.
  • Data storage is separated between PI groups.
  • They provide a central Guix software management system, notebook services and container support.
  • Two full time staff members are available to maintain, innovate and support this essential research resource.
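As an illustration of batch-wise job submission, the following minimal sketch renders the text of a SLURM batch script. The function name `render_sbatch`, the example command and the resource values are hypothetical; the `#SBATCH` options shown (`--job-name`, `--cpus-per-task`, `--mem`, `--time`) are standard SLURM options, but the UBC cluster may require additional site-specific settings that are not covered here.

```python
def render_sbatch(job_name: str, command: str, cpus: int = 1,
                  mem_gb: int = 4, time_limit: str = "01:00:00") -> str:
    """Return the text of a minimal SLURM batch script (illustrative only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
        "",
        command,  # the program to run on the compute node
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: the script name and resources are placeholders.
script = render_sbatch("demo", "python analyse.py", cpus=4, mem_gb=16)
print(script)
```

The resulting text would typically be saved to a file and submitted with `sbatch`.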

More information can be found here: https://ubc.uu.nl/infrastructure/high-performance-computing.

The UBC cluster is free for trial usage. For actual usage, a research group buys shares. Priority is given to researchers with more shares. Each group has its own “space” that facilitates sharing among members of the group. Packages can be installed (based on Guix for library management).

Indicative usage costs:

CPU: 1200 EUR for 50K CPU hours
GPU: 1150 EUR for 5K GPU hours (+5 cores, 64 GB RAM). One can use any available GPU.
Storage: 180 EUR per TB per year
Archive: 45 EUR per TB per year

(+) HPC facility, so suited to very fast computation
(-) Relatively expensive storage (compared to ITS machines)
(-) Optimized for "Life Sciences"
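A rough budget can be derived from the indicative costs above. The sketch below assumes that the prices scale linearly with usage, which is an assumption: shares may in practice only be sold in fixed blocks. The function and constant names are hypothetical.

```python
# Indicative UBC prices taken from the list above.
CPU_BLOCK_EUR, CPU_BLOCK_HOURS = 1200, 50_000
GPU_BLOCK_EUR, GPU_BLOCK_HOURS = 1150, 5_000
STORAGE_EUR_PER_TB_YEAR = 180
ARCHIVE_EUR_PER_TB_YEAR = 45


def estimate_ubc_cost(cpu_hours=0, gpu_hours=0,
                      storage_tb=0, archive_tb=0, years=1):
    """Estimate the cost in EUR, assuming linear (pro rata) pricing."""
    compute = (cpu_hours * CPU_BLOCK_EUR / CPU_BLOCK_HOURS
               + gpu_hours * GPU_BLOCK_EUR / GPU_BLOCK_HOURS)
    storage = (storage_tb * STORAGE_EUR_PER_TB_YEAR
               + archive_tb * ARCHIVE_EUR_PER_TB_YEAR) * years
    return compute + storage


# Example: 100K CPU hours, 2K GPU hours, 5 TB storage, 10 TB archive, one year.
total = estimate_ubc_cost(cpu_hours=100_000, gpu_hours=2_000,
                          storage_tb=5, archive_tb=10)
print(f"{total:.2f} EUR")  # 2400 + 460 + 900 + 450 = 4210.00 EUR
```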

SURFsara

SURFsara offers an integrated ICT research infrastructure and provides services in the areas of computing, data storage, visualization, networking, cloud and e-Science. It is the national computer center for scientific research.

SURFsara has two supercomputers (Cartesius and Lisa) and an HPC cloud (a platform for customizable virtual machines). Both supercomputers are production systems, not development systems. Cartesius is intended for extremely computation-intensive jobs that need a fast interconnect between compute nodes. Both supercomputers support CPU and GPU computations. Both are operated from a Linux terminal, and computational tasks are submitted as jobs (so there is no interactive output). Typically, a user has 200 GB of storage in the home file system, which is served over NFS. More storage can be requested if required. There is also an archive service, which is backed up and can hold data at the TB scale. Finally, scratch space on the compute node (~1 TB) can be used, but it is removed at the end of a job, so output should be synchronized to remote storage before the job ends.

The HPC cloud is a configurable HPC system that provides the user with a virtual machine with an operating system of choice (command-line as well as desktop versions) and the possibility to install software. The main advantages are self-service and the possibility to use the GUIs of Jupyter notebooks, Matlab, RStudio, etc. interactively.
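Because node-local scratch space disappears when a job ends, a job should copy its results to persistent storage before it finishes. The following is a minimal sketch of that pattern. The function `run_with_scratch` and the directory names are hypothetical, and the demonstration uses temporary directories instead of real scratch or home paths, which differ per system.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_scratch(work, scratch_root: Path, results_dir: Path) -> None:
    """Run work(scratch_dir), then copy its output to results_dir."""
    scratch_dir = Path(tempfile.mkdtemp(dir=scratch_root))
    try:
        work(scratch_dir)
    finally:
        # Copy whatever was produced, even if work() raised an exception,
        # then clean up the scratch directory.
        results_dir.mkdir(parents=True, exist_ok=True)
        shutil.copytree(scratch_dir, results_dir, dirs_exist_ok=True)
        shutil.rmtree(scratch_dir, ignore_errors=True)


def demo_work(scratch_dir: Path) -> None:
    # Stand-in for the real computation: write one output file.
    (scratch_dir / "output.txt").write_text("result\n")


# Demonstration with temporary directories standing in for scratch and home.
with tempfile.TemporaryDirectory() as tmp:
    scratch_root = Path(tmp) / "scratch"
    scratch_root.mkdir()
    results = Path(tmp) / "results"
    run_with_scratch(demo_work, scratch_root, results)
    copied = (results / "output.txt").read_text()

print(copied)
```

Doing the copy in a `finally` block means partial output is still saved when the computation fails, although it cannot help if the scheduler kills the job outright at its time limit.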

SURFsara uses a credit model. A starter budget (for small projects and tests) can be obtained through the UU Research IT department via Topdesk (see: https://intranet.uu.nl/en/surfsara-credits-temporary-free-computational-power). UU Research IT also offers HPC consultancy and support. For projects of up to 100K core hours per year, proposals can be submitted directly to SURFsara (see: https://www.surf.nl/en/apply-for-access-to-compute-services); processing time is in the order of weeks. For larger projects, a proposal has to be submitted to NWO.
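The application routes described above amount to a small decision rule, sketched below. The function name and the `small_test_project` flag are hypothetical; the text does not give a core-hour limit for the starter budget, so that case is modelled as a flag and not as a threshold.

```python
DIRECT_PROPOSAL_LIMIT = 100_000  # core hours per year, from the text above


def application_route(core_hours_per_year: int,
                      small_test_project: bool = False) -> str:
    """Return where to apply for compute time (illustrative only)."""
    if small_test_project:
        return "starter budget via UU Research IT (Topdesk)"
    if core_hours_per_year <= DIRECT_PROPOSAL_LIMIT:
        return "proposal directly to SURFsara"
    return "proposal to NWO"


for hours in (50_000, 100_000, 500_000):
    print(hours, "->", application_route(hours))
```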

(+) free (up to 10K core hours, or as long as the underlying project is approved)
(+) thousands of CPU cores
(+) GPU nodes
(+) professional support (remote)
(-) production machine only

Contact: the Research Data Management team (research.engineering@uu.nl) or SURFsara directly (helpdesk@surfsara.nl)

SURF Research Cloud

A Digital Research Environment provided by SURF. UU research groups can already pilot it.

https://servicedesk.surfsara.nl/wiki/display/WIKI/Research+Cloud+Documentation

Surf services and tariffs: https://www.surf.nl/files/2021-09/surf-diensten-en-tarieven-2022_versie-aug-2021-v3.pdf

UU contact: A.P.M.Smeele@uu.nl

External Commercial Providers

Several commercial providers offer computing services, e.g., Amazon, MS Azure, Baidu, and Google Colab. The Research Data Management department can provide guidance on how to obtain and set up cloud-computing services such as AWS, Azure, and Google Cloud. These facilities may be a good alternative if a collective solution is provided for the long term, e.g., in the way the university has chosen to use Microsoft or OneDrive services. The UU already has a contract and processing agreement for MS Azure Cloud services and is in the process of providing accounts for this purpose. Please contact: rdm-beta@uu.nl

Please keep in mind that handling privacy-sensitive data requires you to have an approved processing agreement with the provider involved.

There are no free credits, like those for SURFsara, to try the machines at the beginning. However, some providers offer partially free services, e.g. Google Colab. These mainly offer a Jupyter notebook service, while others are basic system-level offerings of machines (VMs) to be configured as desired, which of course requires admin knowledge.

Some indicative prices for the Azure service, for instance, can be found here: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/

  • 4 CPU cores with 16 GB: 86 euro per month
  • 16 CPU cores with 64 GB: 345 euro per month
  • 32 CPU cores with 128 GB: 690 euro per month
  • 100 GB local disk space: 16 euro per month (not for archive)

(-) not free (and maybe more expensive than local NL or local UU solutions)
(-) cannot be used as a partial solution
(-) no direct support from the UU technicians/services
(+) professional-level support (paid)
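The indicative Azure prices above can be turned into a quick monthly estimate. The sketch below picks the cheapest listed machine size that satisfies a requirement and adds disk space. It assumes that disk space is charged per whole 100 GB, which is an assumption, and the function and constant names are hypothetical. Actual prices should be checked on the Azure pricing page.

```python
import math

# (CPU cores, memory in GB, indicative price in EUR per month), from the list above.
VM_SIZES = [(4, 16, 86), (16, 64, 345), (32, 128, 690)]
DISK_EUR_PER_100_GB = 16  # assumed to be charged per whole 100 GB


def estimate_monthly_cost(cores: int, mem_gb: int, disk_gb: int = 0) -> int:
    """Return the monthly cost in EUR of the cheapest listed size that fits."""
    fitting = [price for c, m, price in VM_SIZES
               if c >= cores and m >= mem_gb]
    if not fitting:
        raise ValueError("no listed VM size is large enough")
    disk_blocks = math.ceil(disk_gb / 100)
    return min(fitting) + disk_blocks * DISK_EUR_PER_100_GB


# 8 cores and 32 GB need the 16-core size; 250 GB of disk rounds up to 3 blocks.
print(estimate_monthly_cost(8, 32, disk_gb=250))  # 345 + 3 * 16 = 393
```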

Contact: the Research Data Management team (https://www.uu.nl/rdm)

Personal machines

Researchers can always buy their own (personal) machines. These can be hosted at the UU datacenter and remain the exclusive property of the purchasing researcher.

A positive aspect is that most costs are paid upfront and not ‘as you go’. Administration can be done by the owner or can be outsourced to ITS at a basic cost. Such a machine can be completely autonomous or attached to the Gemini HPC mentioned above (with the restriction that the hardware configuration is in accordance with Gemini).

(+) Full administrative control over the machine (if NOT connected to other HPC)
(+) Full control over the CPU, GPU, and disk access
(+) Can support GPU/visual computing
(-) Potentially large upfront cost
(-) For long-term usage (5 to 10 years or even more), hardware upgrades are needed, which incur additional costs and may cause compatibility problems
(-) Administrative expertise required

Contact: science.ictbeheer@uu.nl

Additional UU documentation on HPC facilities

RDM support: https://www.uu.nl/en/research/research-data-management/tools-services/software-and-computing/high-performance-and-cloud-computing