SBCS Compute FAQs
- How do I get into Apocrita?
- I'm in, now what?
- How do I launch my analysis?
- Directory structure
- Do you have any suggestions on how to work in this kind of structure?
- How do I compress my files so they take up less space?
- How do I create links/shortcuts between locations / folders / places on Apocrita?
- How do I transfer files from my desktop to Apocrita?
- Is there scratch space available on the nodes?
- I need a specific software package but it isn't installed.
- I have more questions, help!
- Firstly request an account via the Apocrita account request form.
- Your username on the system will be your college username, e.g. btw000.
- Make sure you have a terminal window open (Linux and Mac OS) or an SSH client like MobaXterm (Windows).
- SSH into login.hpc.qmul.ac.uk:
ssh -X btw000@login.hpc.qmul.ac.uk
Have a look around, remember this is a Linux command line.
- The official Apocrita HPC wiki has a set of basic instructions.
- The SBCS-Informatics site for specific SBCS user information such as the Hive archive and the Galaxy server. This page also contains a short tutorial for Linux command line usage.
- To get further into the Linux command line there are quite a few tutorials at Linux.org.
Apocrita has a large number of tools and software packages installed. However, because many of these tools conflict with each other, you need to manually load the tools you want to use into your environment. This is done with the "module" command.
- First, list all the available modules with "module avail".
- You can then load one of the modules; for example, let's load a specific version of R: "module load R/3.2.5".
- If you put these "module load" commands in your ~/.bash_profile file they will be automatically loaded when you log in.
"qsub my_submission_script.sh" will submit a job to the grid.
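The script passed to qsub is an ordinary shell script whose leading "#$" lines are scheduler directives. As a sketch, a minimal submission script might look like this (the resource values and the R script name are illustrative, not Apocrita defaults; check the Apocrita documentation for current syntax):

```shell
#!/bin/bash
#$ -cwd              # run the job from the directory it was submitted from
#$ -pe smp 1         # request 1 core
#$ -l h_vmem=4G      # request 4GB of memory
#$ -l h_rt=1:0:0     # request 1 hour of runtime

module load R/3.2.5        # load the tools the job needs
Rscript my_analysis.R      # my_analysis.R is a placeholder for your own script
```

Submit it with "qsub my_submission_script.sh" and check its status with "qstat".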
- The first thing you should do is head to the documentation!
- Frontend node ≠ execution node
- This means that you DO NOT run jobs on the node you have logged into.
- To run jobs you must submit to the queue (usually with a script) or use the qlogin interface.
"qlogin" gives you an interactive session assigned by the scheduler:
"qlogin" - this gets you logged onto a node assigned by the queue.
"qlogin -l h_vmem=100G" - this gets you onto one of the big-memory machines. This can sometimes take a while to schedule, since the fat nodes are often fully loaded.
- qlogin appears like a normal shell terminal and you can work directly in it, allowing you to, for example, test code or run a job that requires interaction.
- There are several places to put your data. Where you put it really depends on what kind of data it is, how large, what it's used for etc.
- These folders are our preset folder structure and have implications for speed, storage space, and data persistence.
- The table below gives an outline of the folders:
| Description | Quota | Backed up? |
| --- | --- | --- |
| Fast work space | 1TB | No |
| Small space for installations and important files | 50GB | Yes |
| Shared space for your research group | 1TB per research group | Yes |
| Slow, long-term archive | 2TB per research group | Yes, mirrored to an identical storage box |
Ultimately it depends on your style of working and the space requirements of the project. Here is one suggestion on what to keep where:
- Home:
  - Local installations
  - Small reference datasets used very often
- Lab space:
  - Any data shared by you and your research group
  - Collaborative projects
  - Reference datasets/databases
- Scratch:
  - Your current working dataset
  - Raw data for current analysis
  - Intermediate files created in processing data
  - Unpolished results
- Archive:
  - Finished research projects - results
  - Research projects on hold - intermediate data
  - Raw data in its original form
  - Anything of considerable size which is not to be used for a few months or more
For big files the best option is gzip. When it has finished compressing a file it appends the .gz extension and deletes the original.
Some programs can use gzipped files without uncompressing them. There are also gzip-aware versions of common tools, for example less (to read a file), cat (to print it to the console or concatenate it with other files) and grep (to search a file): these are called zless, zcat and zgrep.
zcat data/lane1.fastq.gz | fastx_clipper ...
zgrep "search-term" data/lane1.fastq.gz
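The compression step itself is a single command, and gunzip reverses it. A small self-contained round trip (the file name and contents are illustrative):

```shell
# Create a tiny example file, compress it, then decompress it again.
# gzip replaces lane1.fastq with lane1.fastq.gz; gunzip reverses this.
printf '@read1\nACGT\n+\nIIII\n' > lane1.fastq
gzip lane1.fastq        # produces lane1.fastq.gz and removes lane1.fastq
gunzip lane1.fastq.gz   # restores lane1.fastq
```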
To compress a directory of many small files it is better to use a tar archive instead of compressing all the individual files one by one. tar creates a new file which contains the whole directory, and will gzip it automatically if you supply the -z option.
tar -czvf my_directory.tar.gz my_directory
You can also use zip, which may be more familiar to Windows users:
zip -r ~/archive/2012-02-21-my-analysis.zip ~/work-weekly/2012-02-21-my-analysis/
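To unpack such archives later, use tar with the -x (extract) option. A self-contained round trip, using an illustrative directory name:

```shell
# Build a small example directory, archive it, then unpack it again.
mkdir -p my_directory
echo "example" > my_directory/data.txt

tar -czf my_directory.tar.gz my_directory   # create the compressed archive
rm -r my_directory                          # remove the original directory
tar -xzf my_directory.tar.gz                # extract it back out
```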
You need to know your target (which folder or file you want your link to go to) and what name you wish to give the link.
The command is
$ ln -s target_name link_name
$ ln -s /home/btw123/target_file link_file
- This creates a link (or shortcut, if you prefer) called link_file in the current directory that points to /home/btw123/target_file.
You can use "scp" to copy the files onto the server:
scp my.fastq.gz btw000@login.hpc.qmul.ac.uk:~/temp/my.fastq.gz
Alternatively, use a program with a graphical user interface such as CyberDuck or Filezilla.
- Open CyberDuck
- Input login.hpc.qmul.ac.uk as server
- Input Apocrita username and password
You should see your home directory on Apocrita, and you can now drag and drop files to and from your desktop.
- Yes - about 100GB! You can access it using the $TMPDIR environment variable.
- So if you wanted to create a file in this directory you would use a path under $TMPDIR.
- The $TMPDIR directory is cleaned up after your job has finished.
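A common pattern is to work in $TMPDIR and copy the results somewhere permanent before the job ends. A minimal sketch (file names are illustrative; inside a real job $TMPDIR is set for you by the scheduler):

```shell
# Fall back to /tmp so the sketch also runs outside a job.
TMPDIR="${TMPDIR:-/tmp}"

echo "intermediate data" > "$TMPDIR/work.txt"   # fast node-local write
mkdir -p "$HOME/results"
cp "$TMPDIR/work.txt" "$HOME/results/"          # persist before cleanup
```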
- The easiest way is to request installation by using the form here.
- You can also install it yourself in your home directory. You will need to use the full path to access the executable, or add its location to your PATH.
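Adding the install location to PATH is what lets you run the program by name alone. A self-contained sketch with a stand-in executable ("mytool" and its output are invented for illustration); to make the change permanent, put the export line in your ~/.bash_profile:

```shell
# Simulate a locally installed program in $HOME/local/bin.
mkdir -p "$HOME/local/bin"
printf '#!/bin/sh\necho hello from my tool\n' > "$HOME/local/bin/mytool"
chmod +x "$HOME/local/bin/mytool"

# Prepend the directory to PATH so the shell can find the executable.
export PATH="$HOME/local/bin:$PATH"
mytool
```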