Monday, November 30, 2015

5C BED file data format

5C and 3C are newer sequencing-based technologies from which chromatin interaction data can be obtained. If you are looking for such data and download it from the UCSC Genome Browser, it can be hard to find a description of the fields in the file. We asked the authors, and here is the explanation:

You may download the data from this site:

The BED file format description can be found at:

Here is some sample data for the GM12878 cell line (one record per line):

chr22   31998728        33247041        5C_301_ENm004_FOR_292.5C_301_ENm004_REV_32      1000    .       31998728        33247041        0       2       12744,4098,     0,1244215,
chr5    131346229       132145236       5C_299_ENm002_FOR_241.5C_299_ENm002_REV_33      1000    .       131346229       132145236       0       2       2609,2105,      0,796902,

col1: Chromosome name
col2: Chromosome start
col3: Chromosome end
col4: Name of the interacting sites (primer names)
col7: Chromosome start (same value as col2 in this data)
col8: Chromosome end (same value as col3 in this data)
col11: Block sizes, as a comma-separated list
col12: Block offsets, as a comma-separated list

Now I will explain what col11 and col12 mean, using the chr22 record above.

The first interacting site begins at the chromosome start, so its offset is 0.

So, the interacting site begins at 31998728 + 0 and the interacting block length is 12744.

The beginning position of interacting site 2 is: 31998728 + 1244215 = 33242943
The size of interacting block 2 is 4098. So, the end of interacting site 2 is 33242943 + 4098 = 33247041, which is the chromosome end in col3.
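
The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the explanation above, not the authors' own tool: the function name bed12_blocks is made up for this post, and it assumes a whitespace-separated record with block sizes in col11 and block offsets in col12.

```python
def bed12_blocks(line):
    """Return (chrom, start, end) for each interacting block of one record.

    Assumes whitespace-separated fields, with block sizes in col11 and
    block offsets (relative to col2) in col12, as described above.
    """
    fields = line.split()
    chrom = fields[0]
    chrom_start = int(fields[1])
    # The lists end with a trailing comma, so drop empty items.
    sizes = [int(s) for s in fields[10].split(",") if s]
    offsets = [int(s) for s in fields[11].split(",") if s]
    return [
        (chrom, chrom_start + offset, chrom_start + offset + size)
        for size, offset in zip(sizes, offsets)
    ]


if __name__ == "__main__":
    record = (
        "chr22 31998728 33247041 "
        "5C_301_ENm004_FOR_292.5C_301_ENm004_REV_32 "
        "1000 . 31998728 33247041 0 2 12744,4098, 0,1244215,"
    )
    for block in bed12_blocks(record):
        print(block)
```

For the chr22 record, the second block comes out as 33242943 to 33247041, matching the calculation done by hand above.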

Here is a diagrammatic representation:

Monday, November 23, 2015

Algal Biotechnology Workshop at IIT Mumbai on 21st Nov 2015

It was an insightful workshop on algal biotechnology, held at VMCC hall, IIT Mumbai, on 21st November 2015. The organizers managed to bring in world leaders in this area as speakers. The workshop started with the handing over of materials to the participants, followed by a welcome address by Dr. Wangikar from IIT Mumbai and then a talk by Dr. Santanu Dasgupta from Reliance Industries.

Summaries of some of the interesting talks follow:

Dr. Duu-long Jee, Department of Chemical Engineering, National Taiwan University:

Lutein, one of about 600 naturally occurring carotenoids, is abundant in marigold flowers as well as in microalgae. Dr. Jee presented an overview of the cost-effectiveness of lutein production from microalgae versus marigold flowers. Although microalgae produce about 3-4 times more lutein than marigold flowers, the energy required to extract it from microalgae makes this an expensive option. Marigold, on the other hand, needs fewer nutrients and less power but more space. So there is a need to engineer microalgae with enhanced lutein production and a lower energy requirement for extraction.

John Beardall, Monash University:

Extremophiles will play a major role in algal biotechnology, since they have altered metabolism. It is well known that CO2 sequestered by algae enhances their growth. But growth and lipid accumulation are at odds: they do not happen at the same time. His group has explored the growth medium as a way of determining what favors optimized fatty acid production. Their observations indicate that some microalgae grow very well in media with an altered carbon source (glycerol and xylose) and also produce fatty acids optimally. Mixotrophic growth is favored for higher fatty acid production.

Jo-Shu Chang, Department of Chemical Engineering, National Cheng Kung University, Tainan, Taiwan:

He talked about CO2 sequestration by microalgae and the production of economically important compounds. He discussed the major energy-related products from microalgae: butanol, ethanol, H2, diols, lactic acid and succinic acid. Effluent gas composed of 23.1% CO2, 85 ppm SOx and 75 ppm NOx at a temperature of 230C can be used for growing microalgae. Burkholderia (a proteobacterium) can be used for lipase production.

Min S. Park, Advanced Biomass R & D Center, BioEnergy Engineering and Research Laboratories, Dept. of Chemical and Biomolecular Engineering, Daejeon, Republic of Korea:

Nannochloropsis is the microalga of choice for studying bioenergy production. These organisms have lipid droplets in their chloroplast. The group has done a series of signalling studies in Nannochloropsis and concluded that a JNK-type MAPK is highly activated under osmotic stress. The pathway they described: NaCl induces osmotic stress -> acts on MAPKK -> acts on MAPK -> represses a transcription factor -> induces lipid production. They also observed that lipid production is inhibited by treatment with a MEK-specific inhibitor. The microbial culture community in the treatment plants mostly contained Scenedesmus, Golenkinia, Microspora, Micractinium etc.

Jong Moon Park, Department of Chemical Engineering, School of Environmental Science and Engineering, Division of Advanced Nuclear Engineering, POSTECH, Republic of Korea:

He presented two different aspects of bioenergy production: enhanced fatty acid production from microalgae, and ethanol production from cyanobacteria.
In cyanobacteria, his group has used several approaches to enhance ethanol production directly by manipulating a few enzymes. One is glucose-6-phosphate 1-dehydrogenase, encoded by zwf, and the other is Pdc. He noted that the ethanol made by these engineered bacteria is released out of the cell and hence is not dangerous to the organism itself.
His other notable work is on microalgae, where they have used food waste water and municipal sludge as one of the combinations for optimal growth. He also suggested that municipal waste water or food waste water can be diluted 20 times for growing microalgae in it.
Chlorella was used for biodiesel production.
Articles to look up for ethanol from cyanobacteria: Dexter and Fu, 2009; Li, C. 2015.

Apart from these there were many more interesting talks that I am not covering here. All in all, everyone is looking for a breakthrough in growing these organisms faster and producing fatty acids quickly.

Thursday, November 5, 2015

Installing R packages that use shared libraries on Linux

Many R packages use code written in other languages such as C or FORTRAN, provided through shared libraries. Normally the main sources (and their dependencies, such as header (.h) files) are kept in the src directory inside the package. When the package is installed from the source file (.gz) using R CMD INSTALL somepackage.tar.gz, the sources are compiled into shared objects in the local directory, which dynamically link to the shared library, generally found in /usr/local/lib. This linking happens through a configuration file (/etc/ ) and some environment variables (e.g. LD_LIBRARY_PATH). Often the conf file and the environment variable do not contain the path of the shared library (this mostly happens when users use their own shared library instead of the default), and so the installation fails with the error: "shared object not found.... no such file or directory".

One way to solve this problem is to run the ldconfig (or /sbin/ldconfig) command, preferably in verbose mode (-v). This program creates the required links and cache to the most recent shared libraries.
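
Before running ldconfig, it can help to check whether the library file sits in any directory listed in LD_LIBRARY_PATH. The Python snippet below is a minimal sketch of such a check. The function name dirs_containing and the library file name libfftw3.so.3 are assumptions made for illustration, and the check looks only at the environment variable, not at the directories known to the ldconfig cache.

```python
import os


def dirs_containing(lib_name, search_path):
    """Return the directories in a search path that contain lib_name.

    search_path uses the same separator as LD_LIBRARY_PATH
    (os.pathsep, which is ':' on Linux). Empty entries are skipped.
    """
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # "libfftw3.so.3" is an assumed file name; use the one from your error.
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print(dirs_containing("libfftw3.so.3", path))
```

An empty list here means only that LD_LIBRARY_PATH does not point to the file; the library may still be found through the system configuration once ldconfig has been run.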


I faced a similar error, "shared object not found.... no such file or directory", while installing the package fftwtools (R CMD INSTALL fftwtools.tar.gz). The steps I followed to fix the problem are:

1. Error observed: can not open .../fftwtools/src/ ...  no such file or directory.

2. Located the file using:  locate (to be sure that the file exists)



3. run /sbin/ldconfig -v

/sbin/ldconfig: Path `/lib64' given more than once

/sbin/ldconfig: Path `/usr/lib64' given more than once

/opt/bio/EMBOSS/lib: -> ->
/lib: -> ->
/lib64: -> ->
/usr/local/lib: -> ->
......... -> ->
4. Install the package:  R CMD INSTALL fftwtools.tar.gz

I hope these steps work for you. I will be happy to answer any queries regarding this issue. Thanks a lot for reading the post.