Cluster Use
| Node | NVIDIA GPU model | RAM | CUDA | Best for |
| --- | --- | --- | --- | --- |
| master2 | 2x NVIDIA A40 48GB | 1TB | 13.1 | Testing and pipeline development |
| node3 | 1x NVIDIA RTX A2000 6GB | 4TB | 13.0 | GPU accelerated data processing that needs large RAM |
| node8 | 2x Tesla P100 16GB | 512GB | 12.3 | Older applications |
| node9 | 1x NVIDIA A40 48GB | 512GB | 13.1 | General GPU accelerated applications |
| node15-18 | Quadro P1000 4GB | 192GB | 12.8 | GPU accelerated data processing via SGE |
| node23 | Tesla P100 16GB | 192GB | 10.2 | Basic machine learning and data processing |
| node24 | 8x NVIDIA A40 48GB | 1TB | 13.0 | Machine learning and AI training |
| node27,28 | 4x NVIDIA A40 48GB | 1TB | 13.0 | Machine learning and AI training |
| node29 | 2x NVIDIA L40S 48GB | 1.5TB | 13.2 | Machine learning and AI training with large matrices |
You log in to one of our three head nodes.
For test runs, programming, non-node-specific debugging, and interactive GUI data management and processing, please stay on your head node.
If your software or pipeline supports SGE or SLURM, you may submit jobs from any node. It is best to wrap your submission script in a screen session, so that you can come back later and check that everything has been running well.
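A minimal sketch of wrapping a submission in screen. The session name `pipeline` and the script `submit_all.sh` (a hypothetical script containing your qsub or sbatch commands) are placeholders for your own names:

```shell
# Start a detached screen session named "pipeline" that runs the
# hypothetical submission script submit_all.sh
screen -dmS pipeline bash submit_all.sh

# List running sessions to confirm it started
screen -ls

# Reattach later (possibly from a new login) to check progress;
# detach again with Ctrl-a d
screen -r pipeline
```

Because the session survives logout, you can disconnect and reattach from a later login to verify that all jobs were submitted cleanly.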
For non-SGE applications that can run non-interactively, especially those that take a long time or occupy a significant amount of CPU power, please avoid running them on the head nodes.
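One way to do this is to launch the job directly on a compute node over SSH. This sketch assumes SSH access between nodes; `node11` and `run_pipeline.sh` are placeholders for a node you are allowed to use and your own script:

```shell
# Run a long, non-interactive job on a compute node instead of the head node.
# $PWD is expanded locally before ssh runs; since the data file systems are
# mounted consistently across the cluster, the same path exists on node11.
ssh node11 "cd $PWD && nohup ./run_pipeline.sh > pipeline.log 2>&1 &"
```

`nohup` keeps the job alive after the SSH session ends, and stdout/stderr land in `pipeline.log` for later inspection.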
| Node | OS | Specification | Best for |
| --- | --- | --- | --- |
| master1 | Rocky Linux 9.6 | 2x Intel 32 core 384GB | Dedicated file server, not for data processing use |
| master2 | Rocky Linux 9.7 | 2x AMD EPYC 28 core 1TB, 2x A40 GPU | Management console, head node, not for data processing use |
| node3 | Rocky Linux 10.0 | 2x Intel Xeon 32 core 4TB, A2000 GPU | Large RAM node, Collaborator reserved |
| node4-6 | N/A | Retired. Vacant for new nodes | N/A |
| node7 | Rocky Linux 9.6 | Intel 20 core 128GB | Gateway node for backup architecture, facility connection |
| node8 | Rocky Linux 9.6 | Intel 56 core 512GB, 2x NVIDIA Tesla P100 16GB GPU | GPU accelerated application, Emory login node |
| node9 | Rocky Linux 10.1 | 2x AMD EPYC 28 core 512GB, A40 GPU | GPU accelerated application |
| node10 | Rocky Linux 9.7 | 2x Intel Xeon 20 core 64GB | Backup server #2 |
| node11-14 | Rocky Linux 9.6 | 2x AMD EPYC 32 core 512GB | CPU intensive application |
| node15-18 | Rocky Linux 9.6 | 2x Intel 16 core 384GB, NVIDIA Quadro P1000 GPU | CPU intensive application that can take some GPU help |
| node19-22 | Rocky Linux 9.6 | 2x Intel 10 core 128GB | CPU intensive application |
| node23 | CentOS 7.6 | 2x Intel 14 core 192GB, NVIDIA Tesla P100 16GB GPU | Emory login node, Collaborator reserved |
| node24 | Rocky Linux 9.6 | 2x AMD EPYC 32 core 1TB, 8x A40 GPU | Machine Learning, AI |
| node25 | Rocky Linux 9.7 | 2x AMD EPYC 28 core 256GB | Backup server #1 |
| node26 | Rocky Linux 9.7 | 2x Intel Xeon 24 core 256GB | Backup server #3 |
| node27-28 | Rocky Linux 9.6 | 2x AMD EPYC 32 core 1TB, 4x A40 GPU | Machine Learning, AI, file servers, Emory login nodes |
| node29 | Rocky Linux 9.7 | 2x AMD EPYC 24 core 1.5TB, 2x L40S GPU | All applications |
Yes and no.
Yes:
- Different nodes are equipped with different generations of CPUs, with or without GPUs, and slightly different operating systems, so your code may not behave identically on every node, especially in terms of processing speed.
- Some software is restricted by license to certain nodes; for example, lcmodel is available only on node7.
- Due to the complexity of user requests and the architectural differences among nodes, it is not feasible to keep all OS and software package versions consistent across the cluster.
No:
- Data file systems and user profiles are mounted consistently across the cluster, as are locale and system profiles, so you should not run into path or file-accessibility issues when moving between nodes.
- Major managed data-processing software is mirrored across the cluster and should produce the same results when run on the same input.
Baseline: in general, you should get the same result running your pipeline on a different node without changing your code, though run times may differ widely. The exceptions are the pitfalls below.
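A simple way to confirm the baseline holds for your pipeline is to hash the output files from runs on two different nodes and compare the digests. This is a minimal sketch; the paths in the usage comment are hypothetical:

```python
# Sketch: compare outputs from two runs by hashing file contents.
import hashlib


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large outputs do not need to fit in RAM.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


# Usage (hypothetical paths from runs on two nodes):
# file_digest("run_node11/output.nii") == file_digest("run_node24/output.nii")
```

Identical digests mean byte-identical outputs; differing digests point you at one of the pitfalls below.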
Pitfalls:
- If your software automatically decides whether to use a GPU, or optimizes itself based on the number of cores and total RAM, you should expect different results on different nodes;
- Your code may run on some nodes but crash on others if it was compiled on a newer OS version, consumes too much RAM, or is optimized for a specific hardware architecture.
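If reproducibility across nodes matters more than speed, you can pin the knobs that most often cause node-to-node differences before your pipeline imports any numerical libraries. This is a sketch with illustrative values, not a recommended production configuration:

```python
# Sketch: pin common sources of node-dependent behavior.
import os

# Fix the OpenMP thread count so results do not depend on how many
# cores the node happens to have.
os.environ["OMP_NUM_THREADS"] = "1"

# Hide all GPUs so CUDA-aware software takes the same CPU code path
# on every node, GPU-equipped or not.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

Set these at the very top of your script (or in the job submission script), since many libraries read them only once at import time.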