SBBS Compute FAQs

How do I get onto Apocrita?
I'm in, now what?
How do I launch my analysis?
Directory structure
Do you have any suggestions on how to work in this kind of structure?
How do I compress my files so they take up less space?
How do I create links/shortcuts between locations / folders / places on Apocrita?
How do I transfer files from my desktop to Apocrita?
Is there local scratch space available on the nodes?
I need a specific software package but it isn't installed.
I have more questions, help!

1. How do I get onto Apocrita?

First, request an account via the Apocrita account request form.
Your username on the system will be your college username, e.g. btw000.
Make sure you have a terminal window open (Linux and Mac OS) or an SSH client such as MobaXterm (Windows).
Then SSH into login.hpc.qmul.ac.uk,
e.g. ssh -X btw000@login.hpc.qmul.ac.uk

2. I'm in, now what?

Have a look around, remember this is a Linux command line.

The official Apocrita HPC wiki has a set of basic instructions.
The SBBS-Informatics site has SBBS-specific user information, such as the Hive archive and the Galaxy server. It also contains a short tutorial on Linux command line usage.
To get further into the Linux command line, there are quite a few tutorials at Linux.org.

Apocrita has a large number of tools and software packages installed. However, because many of these tools conflict with each other, you need to manually load the tools you want to use into your environment. This is done with the module command.

First, list all available modules with "module avail".
You can then load one of the modules; for example, let's load a specific version of R with "module load R/3.2.5".
If you put these module load commands in your ~/.bash_profile file, they will be loaded automatically when you log in.
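
As a rough sketch of what such lines in ~/.bash_profile might look like, the snippet below wraps the load in a small helper. The function name load_modules is hypothetical, and R/3.2.5 is simply the version used as an example above. The guard is an assumption about good practice: it skips the load quietly on machines where the module command does not exist.

```shell
# Hypothetical helper for ~/.bash_profile: load each named module,
# but do nothing on machines where the module command is missing.
load_modules() {
    if ! type module >/dev/null 2>&1; then
        return 0
    fi
    for m in "$@"; do
        module load "$m"
    done
}

# R/3.2.5 is the example version from this FAQ; check "module avail" first.
load_modules R/3.2.5
```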

3. How do I launch my analysis?

qsub my_submission_script.sh will submit a job to the grid
The first thing you should do is head to the documentation!
Frontend node ≠ execution node
- This means that you DO NOT run jobs on the node you have logged into.
- To run jobs you must submit to the queue (usually with a script) or use the qlogin interface.
qlogin gives you an interactive session assigned by the scheduler:
- qlogin - this gets you logged onto a node assigned by the queue.
- qlogin -l h_vmem=100G - this gets you onto one of the big memory machines. This can sometimes take a while to schedule, since the fat nodes are often fully loaded.
qlogin looks like a normal shell terminal and you can work directly in it, so you can, for example, test code or run a job that requires interaction.
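
To make the qsub step more concrete, here is a minimal sketch of writing and submitting a job script. The resource values, the script name my_analysis.R and the choice of directives are assumptions for illustration, not site policy; the official documentation has the settings that actually apply on Apocrita.

```shell
# Write a minimal job script. Lines starting with "#$" are read by the scheduler.
cat > my_submission_script.sh <<'EOF'
#!/bin/bash
# Run the job from the directory it was submitted from.
#$ -cwd
# Merge standard error into standard output.
#$ -j y
# Request one hour of runtime (assumed value).
#$ -l h_rt=1:0:0
# Request 1 GB of memory (assumed value).
#$ -l h_vmem=1G

# Load the tools the job needs; R/3.2.5 is the example version from this FAQ.
module load R/3.2.5
# my_analysis.R is a hypothetical script name.
Rscript my_analysis.R
EOF

# Check the script for shell syntax errors before submitting it.
bash -n my_submission_script.sh

# Submit only where the scheduler's qsub command is present.
if command -v qsub >/dev/null 2>&1; then
    qsub my_submission_script.sh
fi
```

Keeping the resource requests inside the script means the same file records both what was run and what was asked for.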

4. Directory structure

There are several places to put your data. Where you put it really depends on what kind of data it is, how large, what it's used for etc.
These folders are our preset folder structure and have implications for speed, storage space, and data persistence.
The table below gives an outline of the folders:

Directory	Type	Size	Backed up?
`/data/scratch/btw000`	Fast work space	1TB	No
`/data/home/btw000`	Small space for installations and important files	50GB	Yes
`/data/SBCS-BloggsLab`	Shared space for your research group	1TB per research group	Yes
`Hive`	Slow, long-term archive	2TB per research group	Yes, mirrored to identical storage box

5. Do you have any suggestions on how to work in this kind of structure?

Ultimately it depends on your style of working and the space requirements of the project. Here is one suggestion on what to keep where:

Home:
- Scripts
- Local installations
- Small reference datasets used very often
Lab space:
- Any data shared by you and your research group
- Collaborative projects
- Reference datasets/databases
Scratch:
- Your current working dataset
- Raw data for current analysis
- Intermediate files created in processing data
- Unpolished results
Hive:
- Finished research projects - results
- Research projects on hold - intermediate data
- Raw data in its original form
- Anything of considerable size which is not to be used for a few months or more

6. How do I compress my files so they take up less space?

For big files the best option is gzip. When it has finished, it appends the .gz extension and deletes the original file:
gzip data/lane1.fastq

Some programs can read gzipped files without uncompressing them first. There are also gzip-aware versions of less (to read a file), cat (to print it to the console or concatenate it with other files) and grep (to search a file), called zless, zcat and zgrep.
zless data/lane1.fastq.gz
zcat data/lane1.fastq.gz | fastx_clipper ...
zgrep "search-term" data/lane1.fastq.gz
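
If you would rather keep the original until you are sure the compressed copy is good, something like the following sketch may help. It works on a tiny made-up FASTQ file in a temporary folder; the file contents and folder are stand-ins, not real data.

```shell
# Work in a throwaway folder with a tiny made-up FASTQ record.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$workdir/lane1.fastq"

# -c writes the compressed data to standard output, so the original is kept.
gzip -c "$workdir/lane1.fastq" > "$workdir/lane1.fastq.gz"

# -t tests the integrity of the compressed file without unpacking it.
gzip -t "$workdir/lane1.fastq.gz" && echo "compressed file is intact"

# gzip -dc behaves like zcat: count the lines without writing an unpacked copy.
gzip -dc "$workdir/lane1.fastq.gz" | wc -l

# -l lists the compressed and uncompressed sizes.
gzip -l "$workdir/lane1.fastq.gz"
```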

To compress a directory of many small files it is better to use a tar archive instead of compressing all individual files one by one. It will create a new file which contains the directory and can automatically gzip the file if you supply the -z option to the tar program.
tar -czvf my_directory.tar.gz my_directory
You can also use zip, which may be more familiar to Windows users:
zip -r ~/archive/2012-02-21-my-analysis.zip ~/work-weekly/2012-02-21-my-analysis/
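
Before deleting anything, it can be worth checking that a tar archive really contains what you expect. The sketch below builds a small example directory in a temporary folder (the file names are made up), lists the archive, unpacks it elsewhere and compares the result with the original.

```shell
# Build a small example directory in a throwaway folder.
tardemo=$(mktemp -d)
mkdir -p "$tardemo/my_directory"
echo "sample one" > "$tardemo/my_directory/a.txt"
echo "sample two" > "$tardemo/my_directory/b.txt"

# -C changes into the given folder first, so the archive stores relative paths.
tar -czf "$tardemo/my_directory.tar.gz" -C "$tardemo" my_directory

# -t lists the contents without extracting anything.
tar -tzf "$tardemo/my_directory.tar.gz"

# Unpack into a separate folder and compare with the original.
mkdir -p "$tardemo/restored"
tar -xzf "$tardemo/my_directory.tar.gz" -C "$tardemo/restored"
diff -r "$tardemo/my_directory" "$tardemo/restored/my_directory" && echo "archive matches original"
```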

7. How do I create links/shortcuts between locations / folders / places on Apocrita?

You need to know your target (which folder or file you want your link to go to) and what name you wish to give the link.
The command is $ ln -s target_name link_name

e.g. $ ln -s /home/btw123/target_file link_file
This creates a link (or shortcut if you prefer) called link_file in the current directory that points to /home/btw123/target_file.
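
A short sketch of how a link behaves, using made-up file names in a temporary folder: the link can be read like the file it points to, and removing the link does not remove the target.

```shell
# Create a target file in a throwaway folder.
linkdemo=$(mktemp -d)
echo "hello" > "$linkdemo/target_file"

# Create the link, giving the target as a full path.
ln -s "$linkdemo/target_file" "$linkdemo/link_file"

# ls -l shows the link and where it points; readlink prints just the target.
ls -l "$linkdemo/link_file"
readlink "$linkdemo/link_file"

# Reading the link reads the target.
cat "$linkdemo/link_file"

# Removing the link leaves the target untouched.
rm "$linkdemo/link_file"
ls "$linkdemo"
```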

8. How do I transfer files from my desktop to Apocrita?

You can use "scp" to copy the files onto the server:
scp my.fastq.gz btw000@login.hpc.qmul.ac.uk:~/btw000/temp/my.fastq.gz

Alternatively, use a program with a graphical user interface such as CyberDuck or Filezilla.

Open CyberDuck
Input login.hpc.qmul.ac.uk as server
Input Apocrita username and password
Connect!

You should see your home directory on Apocrita, and you can now drag and drop files to and from your desktop.

9. Is there local scratch space available on the nodes?

Yes - about 100GB! You can access it using the variable $TMPDIR
So if you wanted to create a file in this directory you would use the path:
- $TMPDIR/my_file
The $TMPDIR directory is cleaned up after your job has finished.
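
Because $TMPDIR is cleaned up, anything worth keeping has to be copied out before the job ends. The following is a minimal sketch of that pattern. The results folder is a temporary stand-in for a folder in your scratch or lab space, the file names are made up, and the fallback to /tmp is only there so the snippet can be tried outside a job.

```shell
# Inside a job $TMPDIR is set for you; fall back to /tmp when trying this elsewhere.
job_tmp="${TMPDIR:-/tmp}"

# Stand-in for a folder in your scratch or lab space (hypothetical).
results_dir=$(mktemp -d)

# Write intermediate files to the node-local space.
seq 1 1000 > "$job_tmp/intermediate.$$.txt"
sort -rn "$job_tmp/intermediate.$$.txt" | head -n 5 > "$job_tmp/top5.$$.txt"

# Copy the results you want to keep before the job finishes.
cp "$job_tmp/top5.$$.txt" "$results_dir/top5.txt"
cat "$results_dir/top5.txt"

# Tidy up the intermediate files.
rm -f "$job_tmp/intermediate.$$.txt" "$job_tmp/top5.$$.txt"
```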

10. I need a specific software package but it isn't installed.

The easiest way is to request installation by using the form here.
You can also install it yourself in your home directory. You will then need to use the full path to access the executable, or add its directory to your $PATH in your .bashrc or .profile.
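
As an illustration of the $PATH approach, the sketch below creates a tiny stand-in program called mytool (a hypothetical name) and makes it callable by name. The temporary folder stands in for an install location such as a software folder under your home directory; only the export line is what would go in .bashrc or .profile, with your real install path in place of $prefix.

```shell
# Stand-in for an install location in your home directory (hypothetical).
prefix=$(mktemp -d)
mkdir -p "$prefix/bin"

# A tiny fake executable so there is something to run.
printf '#!/bin/sh\necho "mytool works"\n' > "$prefix/bin/mytool"
chmod +x "$prefix/bin/mytool"

# This is the kind of line to add to .bashrc or .profile.
export PATH="$prefix/bin:$PATH"

# The shell can now find the program by name.
command -v mytool
mytool
```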

11. I have more questions... help!

There is a more extensive guide to Apocrita which may answer questions or give you useful tips.
You are always welcome to email Adrian and ask away.
