Pleione & Titan maintenance
Both clusters are in a pre-maintenance state due to a Slurm upgrade.
MaxTime is set to 1 hour.
Shutdown will be initiated without any further notification.
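For reference, users can check the current limit themselves, and the scontrol line below is a sketch of how an administrator would lower it (the partition name 'normal' is only an example, not necessarily one of our partition names):

    # Show each partition and its time limit (TIMELIMIT column):
    sinfo -o "%P %l"

    # Admin side, example only: lower the limit to one hour.
    scontrol update PartitionName=normal MaxTime=01:00:00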
Added by Timo Eronen almost 8 years ago
Both clusters are ready to use now.
Added by Timo Eronen almost 8 years ago
Due to a new kernel, both clusters will be rebooted as soon as all queues have been drained.
The partition max runtime has been set to 1 hour to allow monitoring jobs to execute.
Depending on the max time settings of the currently running jobs, the reboot will happen at the latest one week from now.
The reboot will be initiated without any further notice!
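For those curious, a drain-and-reboot cycle like this is typically driven with scontrol along these lines (a sketch only; the node range and reason string are illustrative, not the exact commands used here):

    # No new jobs start on draining nodes; running jobs finish normally.
    scontrol update NodeName=ti[1-9] State=DRAIN Reason="kernel upgrade"

    # After the reboot, return the nodes to service.
    scontrol update NodeName=ti[1-9] State=RESUME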
Added by Timo Eronen almost 8 years ago
Both clusters now have a 'fake' compute node #99, which is dedicated to Grid jobs (including cluster monitoring).
This means that all 'normal' compute nodes (Titan nodes 1-9 and Pleione nodes 1-32) can be freely used for real work without disturbing monitoring jobs.
In other words, you can take over all the nodes in the normal, small, big, and all partitions (any partition except grid).
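In slurm.conf terms the setup looks roughly like the fragment below (illustrative names and options only, not the actual cluster configuration):

    # 'Fake' node 99 carries the grid partition; real nodes serve users.
    PartitionName=grid   Nodes=ti99    State=UP
    PartitionName=normal Nodes=ti[1-9] State=UP Default=YES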
Added by Timo Eronen almost 8 years ago
The IB cables for nodes pl17, pl22 and pl99 have been fixed.
The cluster is ready to use.
Added by Timo Eronen almost 8 years ago
The cluster is in a pre-reboot state and the max runtime has been decreased to 1 hour. Once all partitions have been drained, the cluster will be rebooted.
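You can estimate when the drain will finish by checking how much time the remaining jobs still have, e.g.:

    # %L prints the remaining time of each running job.
    squeue -t RUNNING -o "%i %u %L"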
Added by Timo Eronen almost 8 years ago
The Titan reboot scheduled for 23.1 has already been done.
The partitions' MaxTime has been restored to one week, so the cluster is ready to use.
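Restoring the limit is the mirror image of lowering it; in scontrol syntax a one-week limit looks like this (the partition name is again only an example):

    scontrol update PartitionName=normal MaxTime=7-00:00:00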
Added by Timo Eronen almost 8 years ago
The Pleione reboot scheduled for 23.1 has already been done.
The partitions' MaxTime has been restored to one week, so the cluster is ready to use.
Added by Timo Eronen almost 8 years ago
Due to a new kernel, both clusters need a reboot.
All partitions' max runtime is set to 1 hour until the reboot. So, don't panic if (when) your (new) job is put on hold until the reboot has happened and the max runtime has been restored.
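If your job is affected, squeue shows the reason it is pending; for example:

    # The reason column shows PartitionTimeLimit for held jobs.
    squeue -u $USER -o "%i %T %r"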
Added by Timo Eronen about 8 years ago
Titan is now configured according to the FGCI (Finnish Grid and Cloud Infrastructure) rules. The two most significant changes are:
- HyperThreading is disabled
- 20% of resources are reserved for Grid usage
NOTE: The number of logical cores is the same as the number of physical cores for all compute nodes:
- ti1 : 48 cores (four 12-core CPUs)
- ti2 - ti9 : 24 cores (two 12-core CPUs per node)
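In slurm.conf terms the node definitions above correspond to something like this (a sketch; the real configuration may differ):

    # ThreadsPerCore=1 reflects HyperThreading being disabled,
    # so logical cores == physical cores.
    NodeName=ti1     Sockets=4 CoresPerSocket=12 ThreadsPerCore=1  # 48 cores
    NodeName=ti[2-9] Sockets=2 CoresPerSocket=12 ThreadsPerCore=1  # 24 cores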