Software


Where may I find certain software?

System commands are under

  • /usr/bin
  • /usr/local/bin

Cluster-wide software available to all users is installed under

  • /usr/local - for mainstream software and current versions of third-party software
  • /opt/local - for rarely used software and older versions of third-party software

Users can compile and install software for personal use. Such software will not be seen by other users.

Users can create and share scripts and pipelines.

Unlike Windows, macOS, and Android, Linux does not provide a self-populating, OS-level menu system that lets you reach every command with a few mouse clicks, even though it supports a Windows/macOS-style graphical user interface.  And, just as with Windows and macOS, there is no smart guide that tells you which command to use, and how, when you simply describe the job.  You may have to search the internet for a command that does what you want and then come back and check whether that command is already available on the system.

Also unlike Windows, macOS, and Android, which come as a single whole onto which applications are installed as packages, a Linux-type OS usually consists of several thousand to several tens of thousands of packages.  A few hundred of them make up a basic working system, and almost nobody wants to install everything, so every Linux computer may have a different subset of the OS installed.
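
Because every Linux installation carries a different subset of packages, a quick way to check whether commands you found on the internet are already installed is the Bash builtin "command -v".  The sketch below assumes Bash; the helper name check_cmds and the command names passed to it are only illustrative.

```bash
#!/usr/bin/env bash
# check_cmds (hypothetical helper): report whether each named command is available.
# "command -v" prints the path of an external command (or just the name of a
# shell builtin) and returns a non-zero status if the command is not found.
check_cmds() {
    local cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: available ($(command -v "$cmd"))"
        else
            echo "$cmd: not found; check /opt/local or ask the system administrator"
        fi
    done
}

# Example: replace these names with the commands you are looking for.
check_cmds rsync screen gedit
```

A command that is not found may still be installed outside your search path (for example under /opt/local), so a "not found" result is a reason to look further, not proof that the software is missing.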

As a beginning Linux user, once you are able to log in to the CSIC cluster by following the New User Information Document you received, you will need to know the following text-based commands to navigate around and start your work:

  1. Shell characters:
    1. "~": User home directory.
    2. ".": Current directory.
    3. "..": Parent directory.
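    4. "&": Run the command before "&" in the background, so the terminal stays free for other commands.
    5. ">": Write the output of the command before ">" into the file named after it (an existing file is overwritten).
    6. "<": Feed the content of the file named after "<" to the command as its input.
    7. "|": Pipe the output of one command into the input of the next.
    8. "$()": Replace "$(command)" with the output of the command inside the parentheses, as a string.
    9. "\": At the end of a line, continue the command on the next line.
    10. "?": In file names, stand for any single character.
    11. "*": In file names, stand for any string, including an empty one.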
  2. Informational commands:
    1. ls: Show the contents of the current directory.  Common options:
      1. ls /path/to/folder: List the contents of a folder;
      2. ls -l: List files with details (permissions, owner, size, modification time);
      3. ls -a: Also list hidden files.  In Linux, hidden files and folders are those whose names start with a dot.  They are usually created by the system or by programs to keep your login preferences or current state;
      4. ls -t: By default, ls lists files in alphabetical order; -t sorts them by modification time instead, newest first;
      5. ls -r: Reverse the sort order;
      6. ls -R: List recursively into all subdirectories;
      7. All options can be combined; for example, "ls -altrR" lists all files, including hidden ones, in the current folder and its subfolders in reverse time order (oldest first), with detailed file information.
    2. qhost: Show the status of all nodes in the cluster.  The output columns mean:
      Column    Meaning
      HOSTNAME  Name of a node in the cluster.  To log in to it, run "ssh hostname".
      ARCH      Architecture of the node.  All our nodes are lx-amd64.
      NCPU      Number of virtual threads that can run on the node without time sharing.
      NSOC      Number of CPU sockets (physical CPUs).  Always 2, because each of our nodes has two physical CPUs.
      NCOR      Number of physical cores on the node: the number of CPUs times the number of cores per CPU.  For example, a node with two 14-core CPUs has NCOR = 2 x 14 = 28.
      NTHR      Number of virtual threads; equal to NCPU.  It equals NCOR on CPUs without hyper-threading and twice NCOR on CPUs with hyper-threading.
      LOAD      Number of concurrently running threads on the node.  If LOAD exceeds NTHR, the node is overloaded and some threads run in time-sharing mode; your process will then run at no more than about NTHR/LOAD of its normal speed.
      MEMTOT    Total physical RAM installed on the node, in gigabytes.  Most of our nodes have 3-4 GB per thread; the high-RAM nodes have about 10 GB per thread.
      MEMUSE    Physical memory already allocated by system and user processes.  MEMUSE is always less than MEMTOT, but when it gets too close, the system automatically moves less active data to the swap file.  If your process still accesses that data, it slows down drastically.  Best practice: avoid nodes whose MEMUSE is already high.
      SWAPTO    Size of the swap file, similar to the pagefile in Windows.  A Linux system does not always survive when the requested RAM exceeds the available RAM; it may halt or partly halt when the demand for exchange between physical RAM and swap is high.  The system moves rarely accessed data from physical RAM to the swap file so that it stays responsive when the total allocated memory exceeds physical RAM.  However, RAM throughput can peak at 6.4 GB/s while even a fast SSD reaches only about 550 MB/s, so a process running on swap runs at no more than about 10% of its normal speed.  We set swap large (hundreds of GB) only to improve system stability; it does not mean you can routinely run programs that need more RAM than MEMTOT.
      SWAPUS    Swap space in use.  When this is high (more than a few GB), system performance drops drastically, for the reason explained under SWAPTO.

      Bottom line: avoid submitting new jobs to nodes with high MEMUSE (over 80% of MEMTOT) or high SWAPUS (over 10 GB).
    3. pwd: Show your current directory.
    4. man command: Show the manual of a command, if one is available.  Alternative: "info command".
    5. df: Show disk space usage of the file systems mounted on a node.
    6. w: Show who is logged in to this node and what they are running.
    7. ps: Show the processes running in your current terminal.  Use "ps aux" to show all processes running on this node.  The PID (process identification number) is the first column of plain "ps" output and the second column, after USER, of "ps aux" output.
    8. top: Show the processes that are using the most resources, updated continuously (press q to quit).
    9. uname -a: Show Linux kernel version information.
    10. hostname: Show the name of the current node.
    11. history: Show the commands you have recently run.
  3. Maneuvering commands
    1. ssh node: Move to another node (ssh hostname).  Pitfall: do not ssh to host1, then from host1 to host2, then from host2 to host3, and so on.  Each ssh session is nested inside the previous one, so if you hop through many layers, every bit of traffic travels back and forth through the whole chain and slows you down.  Always exit from one node before heading to another.
    2. cd destination: Change directory to destination.  Tip: "cd ~" (or just "cd") takes you back to your home directory.
    3. exit: Exit a terminal or ssh session.
  4. Environment commands
    1. xterm: Open another terminal window.  Better practice: use "xterm &" so the new terminal runs in the background and the current one stays usable.  Alternatives: konsole for KDE users, gnome-terminal for GNOME users, and xfce4-terminal for XFCE users.
    2. screen: Open a nested text-based session that keeps your processes running even when you are disconnected.  Press Ctrl-A and then D to detach from a screen, use "screen -ls" to list running screens with their PIDs, use "screen -r PID" to re-attach to a screen, and type "exit" inside a screen to quit it.  Pitfall: forgetting to exit screens after the process inside has finished.
    3. echo $VARIABLE: Display the value of an environment variable, for example "echo $PATH".
    4. . file: Source a file that modifies the environment of the current shell (same as "source file").
  5. File and directory operation commands
    1. touch filename: Create an empty file with the given name (for an existing file, update its timestamp).
    2. cp file1 dir/file2: Copy file1 to dir/file2.
    3. mv file1 dir/file2: Move file1 to dir/file2.  Within the same directory, this renames the file.
    4. rename expression replacement filename: Replace expression with replacement in the file name.
    5. ln -s file link: Create a symbolic link named link that points to file.
    6. rm filename: Remove a file.  Use "rm -f filename" to delete a file without interactive confirmation, and "rm -rf directory" to delete an entire directory without confirmation.  Pitfall: avoid having a script generate the paths for "rm -rf".  If the script fails to produce an existing folder name (for example, because a variable is empty), all your files may be deleted.  The system has some safeguards against this, but they do not always work.
    7. mkdir dir-name: Create a directory.
    8. rmdir dir-name: Remove an empty directory.  To remove a directory with all its contents, run "rm -rf dir-name", but be very careful: Linux has no reliable mechanism to recover deleted files, and backups lag behind by days.
    9. find dir -test expression: Find files under a directory that match a test.  Common use: find . -name "*key*" finds all files in the current directory tree whose names contain "key".
    10. chmod mode filename: Change the permissions of a file.  In a mode such as rwxrwxrwx, the three rwx groups are the Read, Write, and Execute permissions for the owner (user), the group, and everybody else.
    11. chown user:group filename: Change the owner and/or group of a file.
    12. tar: Archive a group of files into a tarball.  It can be combined with compression utilities such as gzip or bzip2 to make the archive smaller.
    13. sed: Edit text non-interactively, most often to replace strings in a text file, e.g. "sed -i 's/old/new/g' filename".
    14. rsync: Synchronize a file or folder to another location.  Example: "rsync -av location1 location2".  It also works over the network: "rsync -av local_folder host:remote_folder".
    15. scp: Securely copy a file from one host to another: "scp file host:location".
  6. File content commands
    1. cat filename: Show the content of a file.
    2. head -number filename: Show the first number lines of a file.
    3. tail -number filename: Show the last number lines of a file.
    4. more filename: Show file content page by page.
    5. less filename: Show file content in a scrollable view (press q to quit).
    6. diff file1 file2: Show the differences between two files.  Alternative: comm, which compares two sorted files.
    7. grep term file: Find the lines in file that contain term.
    8. display file: Show the content of a graphic file, such as an image or a PDF file, in a window.
  7. Display manipulation commands
    1. sort: Sort the output of a program, alphabetically by default.
    2. clear: Clear the current terminal window.
  8. Process management commands
    1. kill PID: Stop a running process and remove it from RAM.  Plain "kill PID" sends a termination request (SIGTERM); if that does not work, try "kill -1 PID" (SIGHUP) and, as a last resort, "kill -9 PID" (SIGKILL), which cannot be ignored.
    2. whereis command: Find the locations of a command's binary and manual page in the standard system directories.  It lists every copy it finds.
    3. locate filename: Find the paths of files with the given name using the indexed file database.  The index is updated periodically, so very new files may not be found yet.
    4. which command: Show which copy of a command runs by default, that is, the first one found in your PATH.  Useful when there are several commands with the same name.
    5. qsub: Submit a job to the SGE (Grid Engine) job scheduler.
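
The shell characters in item 1 can be tried safely with the sketch below.  It works inside a temporary directory created with mktemp (a standard Linux utility not listed above), so none of your existing files are touched; the file names are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Demonstrates the shell characters from item 1 in a throw-away directory.
set -eu

workdir=$(mktemp -d)              # $() : the output of mktemp becomes a string
cd "$workdir"                     # from now on "." is the temporary directory

# > : write the output of printf into a file (an existing file is overwritten)
printf 'banana\napple\ncherry\n' > fruits.txt

# < : feed fruits.txt to sort as input; > saves the sorted result
sort < fruits.txt > sorted.txt

# | : pipe the output of grep into wc; counts the lines containing "an"
n_an=$(grep an fruits.txt | wc -l)

# ? matches exactly one character, * matches any string
touch data1.csv data2.csv data10.csv
one_char=$(echo data?.csv)        # data1.csv and data2.csv only
all_csv=$(echo data*.csv)         # all three .csv files

# \ : the command continues on the next line
ls -l \
    sorted.txt

# & : run a command in the background; $! holds its PID, wait collects it
sleep 1 &
echo "background PID: $!"
wait

# .. and ~ : the parent directory and your home directory
echo "parent of $workdir is $(cd .. && pwd)"
echo "home directory is $(cd ~ && pwd)"
```

Glob matches are listed in the sort order of your locale, so the order of the data*.csv names may differ between systems.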
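
The qhost "bottom line" can be checked automatically.  The sketch below is a Bash function (the name flag_busy_nodes is hypothetical) that reads qhost output on standard input and prints the nodes whose MEMUSE exceeds 80% of MEMTOT or whose SWAPUS exceeds 10 GB.  It assumes the default qhost column order described under item 2 and that sizes carry K, M, G, or T suffixes; treating a bare number as bytes is an assumption about the output format.

```bash
#!/usr/bin/env bash
# flag_busy_nodes (hypothetical helper): read qhost output on standard input and
# print nodes with MEMUSE over 80% of MEMTOT or SWAPUS over 10 GB.
# Assumed column order:
# HOSTNAME ARCH NCPU NSOC NCOR NTHR LOAD MEMTOT MEMUSE SWAPTO SWAPUS
flag_busy_nodes() {
    awk '
    # Convert a size such as "251.8G", "512.0M" or "0.0" to gigabytes.
    # A value without a K/M/G/T suffix is assumed to be in bytes.
    function to_gb(v,    unit, num) {
        unit = substr(v, length(v), 1)
        num = v + 0
        if (unit == "T") return num * 1024
        if (unit == "G") return num
        if (unit == "M") return num / 1024
        if (unit == "K") return num / (1024 * 1024)
        return num / (1024 * 1024 * 1024)
    }
    # Skip the header, the dashed separator, the "global" line and
    # nodes that report "-" instead of values.
    $1 == "HOSTNAME" || $1 ~ /^-+$/ || NF < 11 || $8 == "-" { next }
    {
        memtot = to_gb($8); memuse = to_gb($9); swapus = to_gb($11)
        if (memtot <= 0) next
        if (memuse / memtot > 0.8 || swapus > 10)
            printf "%s MEMUSE=%.0f%% SWAPUS=%.1fG\n", $1, 100 * memuse / memtot, swapus
    }'
}

# Run only where qhost exists, that is, on the cluster.
if command -v qhost >/dev/null 2>&1; then
    qhost | flag_busy_nodes
fi
```

On the cluster, "qhost | flag_busy_nodes" lists the nodes to avoid; a node that does not appear is not necessarily idle, so its LOAD against NTHR is still worth a look.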
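
Several of the file operations in item 5 are combined in the sketch below, again inside a temporary directory.  The directory and file names are hypothetical, and "sed -i" (edit the file in place) is the GNU sed form found on Linux.

```bash
#!/usr/bin/env bash
# Sketch of the file operations from item 5, run in a temporary directory.
# All directory and file names below are made up for the demonstration.
set -eu

demo=$(mktemp -d)
mkdir -p "$demo/project/results"          # -p also creates missing parent directories
echo "threshold=0.05" > "$demo/project/config.txt"
touch "$demo/project/results/key_run1.log" "$demo/project/results/other.log"

# find: every file under project/ whose name contains "key"
find "$demo/project" -name "*key*"

# sed -i: replace a string inside the file itself (GNU sed syntax)
sed -i 's/threshold=0.05/threshold=0.01/' "$demo/project/config.txt"

# mv within the same directory renames the file
mv "$demo/project/config.txt" "$demo/project/settings.txt"

# tar: -c create, -z compress with gzip, -f archive name, -C change directory first
tar -czf "$demo/project.tar.gz" -C "$demo" project
tar -tzf "$demo/project.tar.gz"           # -t lists the archive contents

# Extract into a separate directory to check the round trip
mkdir "$demo/restore"
tar -xzf "$demo/project.tar.gz" -C "$demo/restore"
cat "$demo/restore/project/settings.txt"
```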
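
A job submitted with qsub (item 8) is usually a shell script whose "#$" lines carry SGE options.  The sketch below is a minimal job script; the script name myjob.sh, the job name, the log file name, and the parallel environment name "smp" are placeholders, because parallel environment names differ between sites ("qconf -spl" lists the ones configured here).  Outside SGE the "#$" lines are ordinary comments, so the script can also be test-run directly with bash.

```bash
#!/bin/bash
# myjob.sh (hypothetical name): minimal SGE job script. Submit with: qsub myjob.sh
# Lines starting with "#$" are read by qsub as options; bash ignores them.

# Name of the job, as shown by qstat
#$ -N myjob
# Run the job in the directory where qsub was called
#$ -cwd
# Use bash to interpret the script
#$ -S /bin/bash
# Merge standard error into standard output and write both to myjob.log
#$ -j y
#$ -o myjob.log
# To request 4 slots on one node, remove the first "#" from the next line.
# "smp" is a placeholder parallel environment name; list real ones with: qconf -spl
##$ -pe smp 4

# SGE sets JOB_ID and NSLOTS; the defaults apply when the script runs outside SGE.
job_id=${JOB_ID:-none}
slots=${NSLOTS:-1}
echo "Job $job_id on $HOSTNAME with $slots slot(s), started $(date)"

# Replace the line below with the commands your job should run.
echo "work goes here"
```

Before submitting, it is worth checking with qhost that the nodes have enough free memory for the job.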

The Linux system provides the following text editors:

Name     Description                       GUI  Best for
nano     Full-screen terminal text editor  No   General text editing
vi/vim   Full-screen terminal text editor  No   Old Unix style
gedit    GNOME text editor                 Yes  General text editing
emacs    Programmer's editor               Yes  Programming
kwrite   KDE text editor                   Yes  Similar to gedit
gvim     vim with a GUI                    Yes  Old Unix style
geany    IDE-like text editor using GTK+   Yes  Script editing
ooffice  LibreOffice office suite          Yes  MS Office alternative

Yes, in five ways:

  1. If you need to program a script pipeline, you can use either a shell script or a scripting language such as Perl, R, or PHP.  Bash 4, Perl 5, R 3.6, and PHP 7 are supported on all nodes.  Such a script can be run with "./filename" after you make it executable with "chmod u+x filename"; its first line (the shebang, for example "#!/bin/bash") names the interpreter that runs it.  Python and MATLAB scripts can be run the same way or edited in their native programming interfaces.  Different versions of Python and MATLAB can be found in /usr/local/MATLAB and /opt/rh.  Third-party script pipelines may need to be edited to fit our environment.
  2. Scripts in advanced scripting languages such as Perl, Python, R, and MATLAB may require extra packages to run.  If the language supports user-installed libraries, you may download the dependencies yourself and modify the script so that it finds these private packages.  If the packages have to be installed system-wide, please ask the system administrator.
  3. For compiled programming languages such as C++, if the compiler provided by the OS is not compatible with your source code, please look for more recent compilers under /opt/rh/devtoolset-x.  If your source code requires development packages supported by the operating system, please ask the system administrator to install them.
  4. For OS-supported functionality and large third-party software packages that are not currently available, please ask the system administrator to install them.  Some third-party software packages support installation by individual users for private use; we allow users to install and run such applications without notifying the system administrator.
  5. Some software may not be compatible with our existing shared environment.  We support virtualization such as Docker containers; users need to register with the system administrator to use them.
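
Item 1 can be illustrated with a small Bash sketch that writes a hypothetical script, count_lines.sh, makes it executable with "chmod u+x", and runs it as "./count_lines.sh".  A Perl, R, or Python script follows the same steps, with its own interpreter named on the shebang line.

```bash
#!/usr/bin/env bash
# Sketch: write a small script, make it executable and run it as ./filename.
# count_lines.sh is a hypothetical example; any script follows the same steps.
set -eu

scriptdir=$(mktemp -d)
cd "$scriptdir"

# The first line of the script (the shebang) names the interpreter that runs it.
cat > count_lines.sh <<'EOF'
#!/usr/bin/env bash
# count_lines.sh: print the number of lines of each file given as an argument
set -eu
for f in "$@"; do
    printf '%s\t%s\n' "$(wc -l < "$f")" "$f"
done
EOF

chmod u+x count_lines.sh          # give the owner (u) execute (x) permission
printf 'a\nb\nc\n' > sample.txt
./count_lines.sh sample.txt       # prints the line count (3) and the file name
```

Without the execute permission, "./count_lines.sh" fails with "Permission denied", although "bash count_lines.sh" would still run it.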

Bottom line: please avoid

  1. Leaving immature code running without monitoring.  If abnormal activity from user code is found, the running process will be killed without notice.  Such activities mainly include:
    • Abusing system resources: saturating CPU power, RAM, or storage bandwidth, creating large amounts of garbage files, and abnormal network activity;
    • Processes stuck in endless loops, and idle processes or screens;
  2. Providing services to the internet.  Such activity will be suspended, the corresponding code will be deleted, and the responsible users will be warned.
  3. Using this system to hack other computer systems, to control hacked computers, or to pirate copyright-protected resources.  Users who do this will be suspended, and we reserve the right to take legal action against them if any damage has been done.
  4. Using this system's resources for personal financial gain.  Users who do this will be suspended.