Installing bcl2fastq from source without bonkers symlinking or copying libraries to other directories.

 

During the final year of my PhD (when I was also working as the IBERS and IMAPS HPC SysAdmin) my life consisted of installing software into a module-type system. HPC environments have multiple pieces of software installed, often the same software at different versions, and you fundamentally do not use a package manager for user software as it'll cause you a world of pain. One annoyance I always found was looking for help when a piece of software fails to compile and getting an answer that boils down to;

yum install PACKAGE 

or

apt-get install PACKAGE

And this happened today. bcl2fastq, the Illumina software, is all kinds of fun and games (USING MAKE FILES TO DEMULTIPLEX RAW DATA???? WHY?????). The install instructions are your usual ./configure followed by make, so you do your ./configure and all is well until you get the following error...

boost-1_44_0 installed successfully
-- Successfuly built boost 1.44.0 from the distribution package...
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- Looking for floorf
-- Looking for floorf - found
-- Looking for round
-- Looking for round - found
-- Looking for roundf
-- Looking for roundf - found
-- Looking for powf
-- Looking for powf - found
-- Looking for erf
-- Looking for erf - found
-- Looking for erf
-- Looking for erf - found
-- Looking for erfc
-- Looking for erfc - found
-- Looking for erfc
-- Looking for erfc - found
CMake Error at cmake/cxxConfigure.cmake:74 (message):
  No support for gzip compression
Call Stack (most recent call first):
  c++/CMakeLists.txt:33 (include)


-- Configuring incomplete, errors occurred!
Couldn't configure the project:

/software/testing/bcl2fastq/1.8.4/build/bootstrap/bin/cmake -H"/software/testing/bcl2fastq/1.8.4/src/bcl2fastq/src" -B"/software/testing/bcl2fastq/1.8.4/build" -G"Unix Makefiles"  -DCASAVA_PREFIX:PATH=/software/testing/bcl2fastq/1.8.4/x86_64 -DCASAVA_EXEC_PREFIX:PATH= -DCMAKE_INSTALL_PREFIX:PATH=/software/testing/bcl2fastq/1.8.4/x86_64 -DCASAVA_BINDIR:PATH= -DCASAVA_LIBDIR:PATH= -DCASAVA_LIBEXECDIR:PATH= -DCASAVA_INCLUDEDIR:PATH= -DCASAVA_DATADIR:PATH= -DCASAVA_DOCDIR:PATH= -DCASAVA_MANDIR:PATH= -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo

Moving CMakeCache.txt to CMakeCache.txt.removed

My first thought was to quickly look to see if we have the libz libraries installed on the HPC as a module, and we don't. Fair enough. I then wondered why there wasn't a libz library already installed by the OS. There is, but the version seems to differ between the software node (a node dedicated to installing software so as not to annoy folk who are on the login node) and the compute nodes. So pointing to /lib64 would probably not work (it might if bcl2fastq builds a static binary, I've not checked).

[[email protected]]$ locate libz
/lib64/libz.so.1
/lib64/libz.so.1.2.3
[[email protected]]$ locate libz
-bash: locate: command not found
[[email protected]]$ echo "grrr"
grrr
[[email protected]]$ ls -lath /lib64/libz*
lrwxrwxrwx 1 root root  13 Dec 13 14:09 /lib64/libz.so.1 -> libz.so.1.2.7
-rwxr-xr-x 1 root root 89K Nov  5 18:09 /lib64/libz.so.1.2.7

Okay so after a quick google I get the same old hacky responses;

https://biogist.wordpress.com/2012/10/23/casava-1-8-2-installation/

https://www.biostars.org/p/11202/

http://seqanswers.com/forums/showthread.php?t=11106

My next thought then was how about I just install libz from source. So,

[[email protected]]$ source git-1.8.1.2
[[email protected]]$ which git
[[email protected]]$ git clone https://github.com/madler/zlib.git

YES, EVEN GIT HAS VERSIONS!!! yum/apt isn't the answer to everything!

[[email protected] ]$ cd zlib/
[[email protected] ]$ ./configure
[[email protected] ]$ make -j4
[[email protected] ]$ ls -lath libz*s*
lrwxrwx--- 1 martin JIC_c1 14 Jan 27 16:11 libz.so.1 -> libz.so.1.2.11
lrwxrwx--- 1 martin JIC_c1 14 Jan 27 16:11 libz.so -> libz.so.1.2.11
-rwxrwx--x 1 martin JIC_c1 103K Jan 27 16:11 libz.so.1.2.11

And that's great. We now have our libraries compiled, just need to let my bash shell know where they are;

export LIBRARY_PATH=/software/testing/bcl2fastq/1.8.4/lib/zlib

and then back into my bcl2fastq build directory, rerun ./configure --prefix=/where/the/bins/go and it compiled.
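LIBRARY_PATH is what GCC searches at link time. If a build like this still can't find zlib.h, or the finished binary picks up the older system libz on the compute nodes, the header and runtime search paths can be set in the same way. A rough sketch, assuming the headers sit alongside the libraries in the zlib checkout; CPATH and LD_LIBRARY_PATH are extras that the build above did not need:

```shell
# Directory holding the freshly built libz (same path as the export above).
ZLIB_DIR=/software/testing/bcl2fastq/1.8.4/lib/zlib

# Link-time library search path used by GCC.
export LIBRARY_PATH="$ZLIB_DIR${LIBRARY_PATH:+:$LIBRARY_PATH}"

# Header search path used by GCC (assumption: zlib.h lives in the same directory).
export CPATH="$ZLIB_DIR${CPATH:+:$CPATH}"

# Run-time search path for the dynamic loader, in case the binary links
# against libz.so rather than building it in statically.
export LD_LIBRARY_PATH="$ZLIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The ${VAR:+:$VAR} bit keeps whatever was already in the variable, so anything set by modules loaded earlier carries on working.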

All done without a yummy apt....it's Friday and I need to go home.

Weird awk error when messing around with making a GFF from a TXT file

 

Strange error when trying to mess around with a text file created from STATA in Windows. When using awk to create the 9 column GFF file to use with SignalMap awk goes a little weird. The original data looks like this;

CHR1	101	.17989999
CHR1	151	.083400011
CHR1	301	-.125
CHR1	451	0
CHR1	501	.16670001
CHR1	601	.69999999
CHR1	651	.33329999
CHR1	751	.75
CHR1	801	0
CHR1	901	.25099999

And when you try to create a GFF file using awk, it goes weird like this;

[[email protected]]$ head Sample.txt |awk '{if($3>=0) print $1"\t.\tSAMPLE\t"$2"\t"$2+49"\t"$3"\t.\t.\t."}'
CHR1	.	.AMPLE	.01	150	.17989999
CHR1	.	.AMPLE	.51	200	.083400011
CHR1	.	.AMPLE	.51	500	0
CHR1	.	.AMPLE	.01	550	.16670001
CHR1	.	.AMPLE	.01	650	.69999999
CHR1	.	.AMPLE	.51	700	.33329999
CHR1	.	.AMPLE	.51	800	.75
CHR1	.	.AMPLE	.01	850	0
CHR1	.	.AMPLE	.01	950	.25099999

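After 10 mins of banging my head on the table I realised that it was probably something to do with Windows/Unix formatting. The file has Windows (CRLF) line endings, so $3 ends in a carriage return, which sends the cursor back to the start of the line, and the trailing columns get printed over the top of the first ones. So this solved it;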

[[email protected]]$ dos2unix -n Sample.txt sample_new.txt
[[email protected]]$ head sample_new.txt |awk '{if($3>=0) print $1"\t.\tSAMPLE\t"$2"\t"$2+49"\t"$3"\t.\t.\t."}'
CHR1	.	SAMPLE	101	150	.17989999	.	.	.
CHR1	.	SAMPLE	151	200	.083400011	.	.	.
CHR1	.	SAMPLE	451	500	0	.	.	.
CHR1	.	SAMPLE	501	550	.16670001	.	.	.
CHR1	.	SAMPLE	601	650	.69999999	.	.	.
CHR1	.	SAMPLE	651	700	.33329999	.	.	.
CHR1	.	SAMPLE	751	800	.75	.	.	.
CHR1	.	SAMPLE	801	850	0	.	.	.
CHR1	.	SAMPLE	901	950	.25099999	.	.	.
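If dos2unix isn't installed on a node, the carriage return can also be stripped inside awk itself. A minimal sketch, assuming the same three-column tab-separated input; the temporary demo file and the function name txt_to_gff are made up for illustration:

```shell
# Build a tiny CRLF-terminated file that mimics the Windows/STATA export.
demo=$(mktemp)
printf 'CHR1\t101\t.17989999\r\nCHR1\t301\t-.125\r\nCHR1\t451\t0\r\n' > "$demo"

# Count the lines containing a carriage return; anything above 0 means
# the file has Windows line endings.
cr=$(printf '\r')
grep -c "$cr" "$demo"

# Strip the trailing carriage return before touching the fields, then
# print a 9-column GFF line for every non-negative score.
txt_to_gff() {
    awk 'BEGIN { FS = OFS = "\t" }
         { sub(/\r$/, "") }
         $3 >= 0 { print $1, ".", "SAMPLE", $2, $2 + 49, $3, ".", ".", "." }' "$1"
}

txt_to_gff "$demo"
```

Setting OFS once in BEGIN also saves typing all those "\t" strings by hand, which is where a missing column tends to sneak in.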

Getting SignalMap GFF files into IGV

You can already load a GFF file in IGV

This will allow you to load your file into IGV, however it will be slow especially for many tracks. So you may wish to convert the GFF to a bedgraph file, and then from there create a TDF file.

Convert GFF into a bedgraph file

awk 'BEGIN{FS=OFS="\t"} {print $1, $4-1, $5, $6}' input.gff > output.bed

and then convert this bed file to a TDF file, which can be done using igvtools from within IGV.

In IGV, go to the 'Tools' -> 'Run igvtools' menu at the top and you will get the following box;

Ensure the 'Command' is set to "toTDF", set the 'Input File' to your bedfile, the 'Output File' will be filled in automagically (unless you want to change it) and then set the 'Zoom Levels' to 10.

This will create a new tdf file which is much smaller than the bedfile and contains the histogram you would have had in SignalMap.
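The same conversion can also be run from a shell. A rough sketch, assuming the igvtools launcher that ships with IGV is on your PATH and that its toTDF command takes -z for the maximum zoom level followed by the input file, output file and genome. The function names and the genome argument are placeholders for illustration:

```shell
# GFF (1-based start) -> 4-column bedgraph (0-based start).
# Comment lines starting with '#' are skipped.
gff_to_bedgraph() {
    awk 'BEGIN { FS = OFS = "\t" }
         !/^#/ { print $1, $4 - 1, $5, $6 }' "$1"
}

# Convert the bedgraph to TDF with 10 zoom levels, as in the GUI settings.
# $3 is the genome ID or .genome file that IGV is using (placeholder).
bedgraph_to_tdf() {
    if ! command -v igvtools >/dev/null 2>&1; then
        echo "igvtools not found on PATH" >&2
        return 1
    fi
    igvtools toTDF -z 10 "$1" "$2" "$3"
}

# Usage (hypothetical file and genome names):
#   gff_to_bedgraph input.gff > output.bed
#   bedgraph_to_tdf output.bed output.tdf my_genome
```

Wrapping the two steps in functions makes it easy to loop over a directory of GFF files when there are many tracks to convert.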

NOTE: Still to write up

  • Give visuals of signal map/IGV
  • Command-line method of doing this

 

Cheat Sheet

 

My ever-growing list of commands that I use often enough to need to write down but not often enough to remember. Maybe someone else will find it useful.

Get some data from SRA

##download the data using the stupid SRA toolkit

./prefetch -v SRR3311821

##Extract the fastq data. I always run with --split-files even if I know it's single-end reads

~/bin/sratoolkit.2.8.1-ubuntu64/bin/fastq-dump --split-files -v SRR3311821.sra

Sorting and Indexing bamfiles for IGV

source samtools-1.3.1

samtools sort -@ 4 accepted_hits.bam -o accepted_hits_sorted.bam

samtools index accepted_hits_sorted.bam

 

 

 
