Reboot completed. Job max-time is restored and the clusters are now ready to use. Hyper-Threading is now enabled on both clusters, i.e. you can reserve and run two threads per CPU core if your workload benefits from it.
Dione and Titan will be rebooted again in a few days due to a kernel update. The clusters are in pre-reboot state. Job max-time is set to 1 hour.
Reboot completed. Job max-time is restored and the clusters are now ready to use.
1. Dione and Titan have been removed from the FGCI Grid. It is therefore no longer possible to submit jobs to Dione or Titan via the Grid infrastructure. Both clusters remain available for running local jobs, though.
2. Dione GPU nodes di39, di40, di41 and di42 have been removed from the cluster. Hence there are two GPU nodes (di37 and di38) left in the cluster.
3. The FGCI Grid infrastructure will be shut down at the end of 2021. Submitting Grid jobs to any FGCI cluster will not be possible from 1.1.2022 onwards.
Dione and Titan will be rebooted in a few days due to updates. The clusters are in pre-reboot state. Job max-time is set to 1 hour.
Titan reboot done, cluster is ready.
Dione reboot completed, GPU nodes should be OK.
Dione has been rebooted due to a kernel upgrade. The GPU nodes did not upgrade gracefully, and the NVIDIA driver might be broken.
Normal compute nodes should run OK.
Titan reboot done, cluster is ready.
Julia is currently broken due to a clib upgrade. :(
Compute node 2 will be fixed next week.
Dione has been back online for about two hours. It seems to be more stable than last week, but there is not much load yet.