Cluster Use


The cluster is assembled from multiple generations of nodes with a variety of configurations. Of the 24 nodes, 7 are equipped with GPUs. See the table below.

Node GPU configurations

| Node      | NVidia GPU model                        | RAM   | CUDA | Best for                                                                  |
|-----------|-----------------------------------------|-------|------|---------------------------------------------------------------------------|
| node7     | Tesla K40c 12GB + 2x Tesla C2050 2.5GB  | 128GB | 8.0  | Older applications that run on GPU                                        |
| node8     | 2x Tesla P100 16GB                      | 512GB | 10.2 | Machine learning, e.g. PyTorch                                            |
| node15-18 | Quadro P1000 4GB                        | 192GB | 10.2 | GPU-accelerated data processing via SGE                                   |
| node23    | Tesla P100 16GB                         | 192GB | 10.2 | Basic machine learning and data processing with relatively large matrices |

You log in to one of our three head nodes.

For test runs, programming, node-independent debugging, and interactive GUI data management and processing, please stay on your head node.

If your software or pipeline supports SGE, you may submit jobs from any node. It is best to run your submission script inside a screen session, so that you can reconnect later to check that everything is running well.
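The submit-inside-screen pattern above can be scripted. A minimal sketch in Python; the session name, and the job script `my_pipeline.sh`, are placeholders, and SGE options are assumed to live inside the job script itself:

```python
import shlex
import subprocess

def screened_qsub(job_script, session="sge-submit"):
    """Build a command that runs qsub inside a detached screen session,
    so the submission survives logout; re-attach later with
    `screen -r sge-submit` to check on progress."""
    # screen -dmS starts a detached, named session; qsub hands the
    # job script to SGE.
    return ["screen", "-dmS", session, "qsub", job_script]

cmd = screened_qsub("my_pipeline.sh")  # my_pipeline.sh is a placeholder
print(shlex.join(cmd))
# On the cluster you would then run: subprocess.run(cmd, check=True)
```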

For non-SGE applications that can run non-interactively, especially those that take a long time or consume a significant amount of CPU power, please avoid running them on the head nodes.

Node specifications and recommended applications

| Node          | OS         | Specification                                              | Best for                                                        |
|---------------|------------|------------------------------------------------------------|-----------------------------------------------------------------|
| master1       | CentOS 7.7 | Intel 32 core, 384GB                                       | Global login node, interactive and GUI                          |
| master2       | CentOS 8.1 | AMD 16 core, 128GB                                         | Management console, not for data processing use                 |
| node3         | CentOS 7.6 | Intel 40 core, 128GB                                       | Collaborator reserved                                           |
| node4-6,19-22 | CentOS 8.1 | Intel 40 core, 128GB                                       | Data processing                                                 |
| node7         | CentOS 7.4 | Intel 16 core, 128GB, Tesla K40c 12GB + 2x Tesla C2050 2.5GB | LCMODEL and obsolete GPU processing, MR facility connectivity |
| node8         | CentOS 7.6 | Intel 56 core, 512GB, 2x NVidia Tesla P100 16GB GPU        | Machine learning and GPU-accelerated simulation, Emory login node |
| node9,11-14   | CentOS 7.6 | Intel 8 core, 24GB                                         | To be retired/replaced                                          |
| node10        | CentOS 7.4 | Intel 40 core, 64GB                                        | Backup server                                                   |
| node15-18     | CentOS 8.1 | Intel 64 core, 192GB, NVidia Quadro P1000 GPU              | GPU-accelerated data processing                                 |
| node23        | CentOS 7.6 | Intel 56 core, 192GB, NVidia Tesla P100 16GB GPU           | GPU-accelerated data processing, Emory login node               |
| node24        | CentOS 7.5 | AMD 16 core, 64GB                                          | Old master1, DICOM server                                       |

Are the nodes different from each other? Yes and no.

Yes:

  • Different nodes are equipped with different generations of CPUs, with or without GPUs, and slightly different operating systems, so your code may not behave identically on each node, especially in processing speed.
  • Some software is limited by license to certain nodes; for example, lcmodel runs only on node7.
  • Due to the complexity of user requests and the architectural differences among nodes, it has not been possible to keep all OS and software package versions consistent across the cluster.
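Because only some nodes carry GPUs, portable code should probe for one before choosing a GPU code path. A minimal sketch, assuming `nvidia-smi` is on the PATH of the GPU nodes (node7, node8, node15-18, node23):

```python
import shutil
import subprocess

def gpu_available():
    """Best-effort check for a usable NVidia GPU on the current node.
    Returns False on CPU-only nodes, where nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # Exit code 0 means the driver answered and at least one GPU is up.
        return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0
    except OSError:
        return False

print(gpu_available())
```

Frameworks like PyTorch have their own checks (e.g. `torch.cuda.is_available()`); this version works without any GPU library installed.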

No:

  • The data file system and user profiles are mounted consistently across the cluster, as are locale and system profiles, so you should not have path or file accessibility issues across nodes.
  • Major managed data processing software packages are mirrored across the cluster and should produce the same results when run on the same input.

Baseline: in general, running your pipeline on a different node without changing your code should produce the same result, though it may finish in a very different amount of time. The pitfalls below are the exceptions.

Pitfalls:

  • If your software automatically decides whether to use a GPU, or optimizes for the number of cores and total RAM, you should expect different results on different nodes.
  • Your code may run on some nodes but crash on others if it was compiled on a newer OS version, consumes too much RAM, or was optimized for a specific hardware architecture.
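When results do differ, it helps to log the properties of the node next to each run so the difference can be traced. A sketch using only the Python standard library; the RAM lookup assumes a Linux node (as all cluster nodes are):

```python
import os
import platform

def node_fingerprint():
    """Collect the node properties most likely to explain result or
    runtime differences, for logging alongside pipeline outputs."""
    info = {
        "hostname": platform.node(),
        "os": platform.platform(),     # CentOS release differs per node
        "machine": platform.machine(), # CPU architecture
        "cpu_count": os.cpu_count(),   # cores visible on this node
    }
    # Total RAM via sysconf, available on Linux.
    if hasattr(os, "sysconf") and "SC_PHYS_PAGES" in os.sysconf_names:
        info["ram_gb"] = round(
            os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3, 1
        )
    return info

print(node_fingerprint())
```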