diff --git a/schedule.xlsx b/schedule.xlsx index 2943319c..5902c165 100644 Binary files a/schedule.xlsx and b/schedule.xlsx differ diff --git a/topics/hpc/intro/lab_uppmax_intro.qmd b/topics/hpc/intro/lab_hpc_intro.qmd similarity index 74% rename from topics/hpc/intro/lab_uppmax_intro.qmd rename to topics/hpc/intro/lab_hpc_intro.qmd index 92704139..c6500d27 100644 --- a/topics/hpc/intro/lab_uppmax_intro.qmd +++ b/topics/hpc/intro/lab_hpc_intro.qmd @@ -1,6 +1,6 @@ --- -title: 'Uppmax Introduction' -subtitle: "High-performance computing cluster: UPPMAX" +title: 'HPC Introduction' +subtitle: "High-performance computing cluster: Dardel" author: 'Martin Dahlö' format: html --- @@ -48,7 +48,7 @@ ssh -Y nid001009 If the list is empty you can run the allocation command again and it should be in the list: ```bash -salloc -A `r id_project` -t 04:00:00 -p shared -n 4 +salloc -A `r id_project` -t 04:00:00 -p shared -c 4 ``` # Copy files for lab @@ -56,7 +56,7 @@ salloc -A `r id_project` -t 04:00:00 -p shared -n 4 Now, you will need some files. To avoid all the course participants editing the same file all at once, undoing each other's edits, each participant will get their own copy of the needed files. The files are located in the folder ```bash -`r path_resources`/linux/uppmax_tutorial +`r path_resources`/linux/hpc_tutorial ``` Next, copy the lab files from this folder. `-r` means recursively, which means all the files including sub-folders of the source folder. Without it, only files directly in the source folder would be copied, **NOT** sub-folders and files in sub-folders. @@ -67,20 +67,20 @@ Next, copy the lab files from this folder. 
`-r` means recursively, which means a # syntax cp -r -cp -r `r path_resources`/linux/uppmax_tutorial `r path_workspace` +cp -r `r path_resources`/linux/hpc_tutorial `r path_workspace` ``` Have a look in the folder you just copied ```bash -user@login1 ~ $ cd `r path_workspace`/uppmax_tutorial +user@login1 ~ $ cd `r path_workspace`/hpc_tutorial -user@login1 uppmax_tutorial $ ll +user@login1 hpc_tutorial $ ll total 128K drwxrwxr-x 2 user user 2,0K May 18 16:21 . drwxrwxr-x 4 user user 2,0K May 18 15:34 .. -rwxrwxr-x 1 user user 1,2K May 18 16:21 data.bam -rw-rw-r-- 1 user user 232 May 18 16:21 job_template -user@login1 uppmax_tutorial $ +user@login1 hpc_tutorial $ ``` # Run programs @@ -96,20 +96,20 @@ less data.bam Not so pretty.. Luckily for us, there is a program called [samtools](http://www.htslib.org/) that is made for reading BAM files. To use it on PDC we must first load the module for `samtools`. Try starting samtools before loading the module. ```bash -user@login1 uppmax_tutorial $ samtools +user@login1 hpc_tutorial $ samtools -bash: samtools: command not found ``` That did not work, try it again after loading the module: ```bash -user@login1 uppmax_tutorial $ module load bioinfo-tools samtools/1.10 -Message: NOTE: The modules made available by loding this module are all UPPMAX legacy, please consider loading PDC installed modules when available, as they are optimized for working on Dardel +user@login1 hpc_tutorial $ module load bioinfo-tools samtools/1.20 +Message: NOTE: The modules made available by loding this module are all PDC legacy, please consider loading PDC installed modules when available, as they are optimized for working on Dardel -user@login1 uppmax_tutorial $ samtools +user@login1 hpc_tutorial $ samtools Program: samtools (Tools for alignments in the SAM format) -Version: 1.10 (using htslib 1.10) +Version: 1.20 (using htslib 1.20) Usage: samtools [options] @@ -127,10 +127,12 @@ Commands: targetcut cut fosmid regions (for fosmid pool only) 
addreplacerg adds or replaces RG tags markdup mark duplicates + ampliconclip clip oligos from the end of reads -- File operations collate shuffle and group alignments by name cat concatenate BAMs + consensus produce a consensus Pileup/FASTA/FASTQ merge merge sorted alignments mpileup multi-way pileup sort sort alignment file @@ -138,6 +140,9 @@ Commands: quickcheck quickly check if SAM/BAM/CRAM file appears intact fastq converts a BAM to a FASTQ fasta converts a BAM to a FASTA + import Converts FASTA or FASTQ files to SAM/BAM/CRAM + reference Generates a reference from aligned data + reset Reverts aligner changes in reads -- Statistics bedcov read depth per BED region @@ -145,23 +150,31 @@ Commands: depth compute the depth flagstat simple stats idxstats BAM index stats + cram-size list CRAM Content-ID and Data-Series sizes phase phase heterozygotes stats generate stats (former bamcheck) + ampliconstats generate amplicon specific stats -- Viewing flags explain BAM flags + head header viewer tview text alignment viewer view SAM<->BAM<->CRAM conversion depad convert padded BAM to unpadded BAM + samples list the samples in a set of SAM/BAM/CRAM files + + -- Misc + help [cmd] display this help message or help for [cmd] + version detailed version information ``` -{{< fa exclamation-circle >}} All modules are unloaded when you disconnect from UPPMAX, so you will have to load the modules again every time you log in. If you load a module in a terminal window, it will not affect the modules you have loaded in another terminal window, even if both terminals are connected to UPPMAX. Each terminal is independent of the others. +{{< fa exclamation-circle >}} All modules are unloaded when you disconnect from PDC, so you will have to load the modules again every time you log in. If you load a module in a terminal window, it will not affect the modules you have loaded in another terminal window, even if both terminals are connected to PDC. 
Each terminal is independent of the others. To use samtools to view a BAM file, use the following line. ```bash -user@login1 uppmax_tutorial $ samtools view -h data.bam +user@login1 hpc_tutorial $ samtools view -h data.bam @HD VN:1.0 SO:coordinate @SQ SN:chr1 LN:249250621 @@ -216,13 +229,26 @@ Try deleting the whole last line in the file, save it, and exit `nano`. To view which module you have loaded at the moment, type ```bash -user@login1 uppmax_tutorial $ module list +user@login1 hpc_tutorial $ module list Currently Loaded Modules: - 1) uppmax 2) bioinfo-tools 3) samtools/1.10 + 1) craype-x86-rome 11) PrgEnv-cray/8.5.0 + 2) libfabric/1.15.2.0 12) snic-env/1.0.0 + 3) craype-network-ofi 13) systemdefault/1.0.0 (S) + 4) perftools-base/23.12.0 14) bioinfo-tools + 5) xpmem/2.8.2-1.0_3.9__g84a27a5.shasta 15) ncurses/6.4 + 6) cce/17.0.0 16) bzip2/1.0.8 + 7) craype/2.7.30 17) xz/5.4.5 + 8) cray-dsmml/0.2.2 18) libdeflate/1.19 + 9) cray-mpich/8.1.28 19) htslib/1.20 + 10) cray-libsci/23.12.5 20) samtools/1.20 + + Where: + S: Module is Sticky, requires --force to unload or purge + ``` -Let's say that you want to make sure you are using the latest version samtools. Look at which version you have loaded at the moment (`samtools/1.10`). +Let's say that you want to make sure you are using the latest version of samtools. Look at which version you have loaded at the moment (`samtools/1.20`). Now type @@ -230,7 +256,7 @@ Now type module avail ``` -to see which programs are available at UPPMAX. Can you find samtools in the list? Which is the latest version of samtools available at UPPMAX? +to see which programs are available at PDC. Can you find samtools in the list? Which is the latest version of samtools available at PDC? To change which samtools module you have loaded, you have to unload the the module you have loaded and then load the other module. 
To unload a module, use @@ -239,19 +265,19 @@ module unload ``` -Look in the list from `module list` to see the name of the module you want to unload. When the old module is unloaded, load `samtools/0.1.19` (or try with the latest samtools module!). +Look in the list from `module list` to see the name of the module you want to unload. When the old module is unloaded, load `samtools/1.15.1` (or try with the latest samtools module!). # Submitting a job Not all jobs are as small as converting this tiny BAM file to a SAM file. Usually the BAM files are several gigabytes, and can take hours to convert to SAM files. You will not have reserved nodes waiting for you to do something either, so running programs is done by submitting a job to the queue system. What you submit to the queue system is a script file that will be executed as soon as it reaches the front of the queue. The scripting language used in these scripts is **bash**, which is the same language as you usually use in a terminal i.e. everything so far in the lecture and lab has been in the bash language (`cd`, `ls`, `cp`, `mv`, etc.). -Have a look at **job_template** in your **uppmax_tutorial** folder. +Have a look at **job_template** in your **hpc_tutorial** folder. ```bash less job_template #! /bin/bash -l -#SBATCH -A gXXXXXXX +#SBATCH -A XXXXXXX #SBATCH -p shared #SBATCH -J Template_script #SBATCH -t 01:00:00 @@ -260,7 +286,7 @@ less job_template module load bioinfo-tools # go to some directory -cd /proj/gXXXXXXX/nobackup/ +cd /cfs/klemming/projects/supr/naiss2099-99-999 # do something echo Hello world! @@ -278,36 +304,21 @@ sbatch job_template # Job queue -If you want to know how your jobs are doing in the queue, you can check their status with `$ squeue -u username` or `jobinfo -u username`. +If you want to know how your jobs are doing in the queue, you can check their status with `$ squeue -u username`. -Rewrite the previous sbatch file so that you book 3 days of time, and to use a node instead of a core. 
This will cause your job to stand in the queue for a bit longer, so that we can have a look at it while it is queuing. Submit it to the queue and run **`jobinfo`**. +Rewrite the previous sbatch file so that you book 3 days of time and use a node instead of a core. This will cause your job to stand in the queue for a bit longer, so that we can have a look at it while it is queuing. Submit it to the queue and run `squeue`. ```bash -jobinfo -u username - -CLUSTER: rackham -Running jobs: - JOBID PARTITION NAME USER ACCOUNT ST START_TIME TIME_LEFT NODES CPUS NODELIST(REASON) - 3134399 devcore user g20XXXXX R 2018-05-18T16:32:54 59:25 1 1 r483 - -Nodes in use: 462 -Nodes in devel, free to use: 2 -Nodes in other partitions, free to use: 4 -Nodes available, in total: 468 - -Nodes in test and repair: 13 -Nodes, otherwise out of service: 5 -Nodes, all in total: 486 - -Waiting jobs: - JOBID POS PARTITION NAME USER ACCOUNT ST START_TIME TIME_LEFT PRIORITY CPUS NODELIST(REASON) FEATURES DEPENDENCY - 3134401 221 core Template_script user g20XXXXX PD N/A 1:00:00 100000 1 (None) (null) +squeue -u username -Waiting bonus jobs: + JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) +5104668 shared Template username PD 0:00 1 (Priority) ``` -If you look under the heading **"Waiting jobs:"** you'll see a list of all the jobs you have in the queue that have not started yet. The most interesting column here is the **POS** column, which tells you which position in the queue you have (221 in my example). When you reach the first place, your job will start as soon as there are the resources you have asked for. +Here we can see in the status column (`ST`) that the job is pending (`PD`) and has not started yet. The job is waiting for a node to become available. When the job starts, the status will change to `R` (running). If you run `squeue` again, you will see that the job is running. + +```bash In our case, we are not really interested in running this job at all. 
Let's cancel it instead. This can be done with the command **`scancel`**. Syntax: @@ -321,6 +332,8 @@ You see the job id number in the output from `jobinfo` or `squeue`. scancel 3134401 ``` +If you have a lot of jobs running, you can cancel all of them by using the command `scancel -u username`. + # Interactive jobs Sometimes it is more convenient to work interactively on a node instead of submitting your work as a job. Since you will not have the reservations we have during the course, you will have to book a node using the **`interactive`** command. Syntax: @@ -345,9 +358,9 @@ Extra material if you finish too fast. ## The devel queue -If it is a really big job, it might be in the queue for a day or two before it starts, so it is important to know that the first thing it does is not crashing because you made a typo on line 7. One way to test this is to open a new connection to UPPMAX, and line by line try your code. Copy-paste (Ctrl+Shift+c and Ctrl+Shift+v in the terminal window) to make sure it's really the code in the script you are trying. +If it is a really big job, it might be in the queue for a day or two before it starts, so it is important to know that the first thing it does is not crashing because you made a typo on line 7. One way to test this is to open a new connection to PDC, and line by line try your code. Copy-paste (Ctrl+Shift+c and Ctrl+Shift+v in the terminal window) to make sure it's really the code in the script you are trying. -If your script is longer than a couple of lines, this approach can be tiring. There are 12 nodes at UPPMAX that are dedicated to do quick test runs, which have a separate queue called **devel**. They are available for use more or less all the time since not very many are using them. To avoid people abusing the free nodes for their analysis, there is a **1 hour time limit** for jobs on them. 
To submit jobs to this short testing queue, change `-p` to devel instead of node or core, and make sure `-t` is set to **maximum 01:00:00**. Try submitting the samtools sbatch file we used earlier to the devel queue and run it again. +If your script is longer than a couple of lines, this approach can be tiring. There are 12 nodes at PDC that are dedicated to do quick test runs, which have a separate queue called **devel**. They are available for use more or less all the time since not very many are using them. To avoid people abusing the free nodes for their analysis, there is a **1 hour time limit** for jobs on them. To submit jobs to this short testing queue, change `-p` to devel instead of node or core, and make sure `-t` is set to **maximum 01:00:00**. Try submitting the samtools sbatch file we used earlier to the devel queue and run it again. ## Info about finished jobs diff --git a/topics/hpc/intro/slide_uppmax_intro.pdf b/topics/hpc/intro/slide_hpc_intro.pdf similarity index 100% rename from topics/hpc/intro/slide_uppmax_intro.pdf rename to topics/hpc/intro/slide_hpc_intro.pdf diff --git a/topics/other/assets/thinlinc_03.png b/topics/other/assets/thinlinc_03.png index 84b3a75b..afb8297b 100644 Binary files a/topics/other/assets/thinlinc_03.png and b/topics/other/assets/thinlinc_03.png differ