Cluster Use
| Node | NVidia GPU model | RAM | CUDA | Best for |
|------|------------------|-----|------|----------|
| node7 | Tesla K40c 12GB + 2x Tesla C2050 2.5GB | 128GB | 8.0 | Old applications that run on GPU |
| node8 | 2x Tesla P100 16GB | 512GB | 10.2 | Machine learning, e.g. PyTorch |
| node15-18 | Quadro P1000 4GB | 192GB | 10.2 | GPU-accelerated data processing via SGE |
| node23 | Tesla P100 16GB | 192GB | 10.2 | Basic machine learning and data processing with relatively large matrices |
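As a quick sanity check before launching a GPU job, a short PyTorch snippet (a minimal sketch, assuming PyTorch is installed in your environment on that node) can report which GPUs and CUDA build the node actually exposes:

```python
import torch

# Report whether this node exposes a usable GPU to PyTorch,
# and which devices and CUDA build it sees.
if torch.cuda.is_available():
    print("CUDA build:", torch.version.cuda)
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No usable GPU; PyTorch will fall back to CPU.")
```

On node8, for example, this should list the two Tesla P100 cards.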
You log in to one of our three head nodes (master1, node8, or node23).
For test runs, programming, non-node-specific debugging, and interactive GUI data management and processing, please stay on your head node.
If your software or pipeline supports SGE, you may submit jobs from any node. It is best to run your submission script inside a screen session, so that you can come back later and check that everything has been running well.
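As an illustration, here is a minimal SGE job script written in Python (a sketch only; the job name and interpreter path are hypothetical, while the `#$` lines are standard SGE directives):

```python
#!/usr/bin/env python
# Minimal SGE job script sketch; job name and interpreter path are hypothetical.
# The "#$" lines are SGE directives: -S sets the interpreter, -N names the job,
# -cwd runs in the submission directory, -j y merges stderr into stdout.
#$ -S /usr/bin/python
#$ -N demo_job
#$ -cwd
#$ -j y

import socket

# Print the execution host so you can see which node SGE chose.
print("Running on " + socket.gethostname())
```

Submit it with `qsub demo_job.py` from inside a screen session; detach with Ctrl-A D, then reattach later with `screen -r` and check the queue with `qstat`.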
For non-SGE applications that can run non-interactively, especially those that take a long time or consume a significant amount of CPU, please avoid running them on the head nodes.
| Node | OS | Specification | Best for |
|------|----|---------------|----------|
| master1 | CentOS 7.7 | Intel 32 core, 384GB | Global login node, interactive and GUI use |
| master2 | CentOS 8.1 | AMD 16 core, 128GB | Management console, not for data processing use |
| node3 | CentOS 7.6 | Intel 40 core, 128GB | Collaborator reserved |
| node4-6,19-22 | CentOS 8.1 | Intel 40 core, 128GB | Data processing |
| node7 | CentOS 7.4 | Intel 16 core, 128GB, Tesla K40c 12GB + 2x Tesla C2050 2.5GB | LCModel and legacy GPU processing, MR facility connectivity |
| node8 | CentOS 7.6 | Intel 56 core, 512GB, 2x NVidia Tesla P100 16GB GPU | Machine learning and GPU-accelerated simulation, Emory login node |
| node9,11-14 | CentOS 7.6 | Intel 8 core, 24GB | To be retired/replaced |
| node10 | CentOS 7.4 | Intel 40 core, 64GB | Backup server |
| node15-18 | CentOS 8.1 | Intel 64 core, 192GB, NVidia Quadro P1000 GPU | GPU-accelerated data processing |
| node23 | CentOS 7.6 | Intel 56 core, 192GB, NVidia Tesla P100 16GB GPU | GPU-accelerated data processing, Emory login node |
| node24 | CentOS 7.5 | AMD 16 core, 64GB | Former master1, DICOM server |
Are the nodes all the same? Yes and no.
Yes:
- Different nodes are equipped with different generations of CPUs, with or without GPUs, and slightly different operating systems, so your code may not behave identically on every node, especially in terms of processing speed.
- Some software is license-restricted to certain nodes; for example, LCModel is limited to node7.
- Due to the complexity of user requests and the architectural differences among nodes, it has been impossible to keep all OS and software package versions consistent across the cluster.
No:
- The data file systems and user profiles are mounted consistently across the cluster, as are the locale and system profiles, so you should not run into path or file accessibility issues when moving between nodes.
- Major managed data processing software is mirrored across the cluster and should produce the same results when run on the same input.
Baseline: in general, you should get the same result if you run your pipeline on a different node without changing your code, though it may finish in a very different amount of time. The exceptions are the following pitfalls.
Pitfalls:
- If your software automatically decides whether to use the GPU, or optimizes itself based on the number of cores and total RAM, you should expect different results on different nodes (see the sketch after this list);
- Your code may run on some nodes but crash on others if it was compiled on a higher-version OS, consumes too much RAM, or was optimized for a particular hardware architecture.
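As a sketch of the first pitfall (assuming a PyTorch-based pipeline; the snippet is illustrative, not part of any managed software), code like the following silently changes its execution path depending on the node it lands on:

```python
import torch

# Auto-selects the GPU when one is present, CPU otherwise, so the same
# script takes a different numeric path on node8 than on, say, node4.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(1024, 1024, device=device)
y = x @ x  # floating-point results can differ slightly between CPU and GPU
print(device, y.sum().item())
```

If reproducibility matters more than speed, pin the device (and any thread counts) explicitly instead of letting the code decide per node.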