5C and 3C are newer sequencing-based technologies from which chromatin interaction data can be obtained. If you are looking for such data and download it from the UCSC Genome Browser, it can be hard to find a description of the fields in the file. We asked the authors, and here is the explanation:
The data can be downloaded from a site such as this one: https://www.encodeproject.org/experiments/ENCSR000CYD/
A description of the BED file format can be found at: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
Here is some sample data for the GM12878 cell line:
chr22 31998728 33247041 5C_301_ENm004_FOR_292.5C_301_ENm004_REV_32 1000 . 31998728 33247041 0 2 12744,4098, 0,1244215,
chr5 131346229 132145236 5C_299_ENm002_FOR_241.5C_299_ENm002_REV_33 1000 . 131346229 132145236 0 2 2609,2105, 0,796902,
col1: Chromosome name
col2: Chromosome start
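col3: Chromosome end
col4: Name of the interacting sites (primer names)
col5: Score (the standard BED score field; 1000 in the sample rows)
col7: Chromosome start (same value as col2 in the sample rows)
col8: Chromosome end (same value as col3 in the sample rows)
col11: Block sizes, as a comma-separated list
col12: Block offsets relative to the chromosome start (col2), as a comma-separated list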
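As a rough illustration of the column layout, here is a minimal Python sketch that splits one record into named fields. The field names are taken from the UCSC BED description linked above; the helper name parse_bed12_line is made up for this example, and the input is the first sample row with the line wraps removed.

```python
# Field names follow the UCSC BED format description (12-column BED).
BED12_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount",
    "blockSizes", "blockStarts",
]


def parse_bed12_line(line):
    """Split one whitespace-delimited 12-column BED record into a dict.

    Hypothetical helper for illustration; all values are kept as strings.
    """
    values = line.split()
    if len(values) != len(BED12_FIELDS):
        raise ValueError(f"expected 12 fields, got {len(values)}")
    return dict(zip(BED12_FIELDS, values))


# First sample row from this post, joined back onto one line.
record = parse_bed12_line(
    "chr22 31998728 33247041 5C_301_ENm004_FOR_292.5C_301_ENm004_REV_32 "
    "1000 . 31998728 33247041 0 2 12744,4098, 0,1244215,"
)
print(record["name"])         # col4: primer names of the interacting sites
print(record["blockSizes"])   # col11
print(record["blockStarts"])  # col12
```

Keeping the values as strings is deliberate here: the trailing commas in col11 and col12 need their own handling before converting to integers.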
Now I will explain what col11 and col12 mean, using the first sample row (chr22).
Offsets are measured from the chromosome start (col2), so the first interacting site has an offset of 0.
Interacting site 1 therefore begins at 31998728 + 0 = 31998728, and its block length is 12744, so it ends at 31998728 + 12744 = 32011472.
Interacting site 2 begins at 31998728 + 1244215 = 33242943.
The size of interacting block 2 is 4098, so interacting site 2 ends at 33242943 + 4098 = 33247041, which matches the chromosome end in col3.
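The same arithmetic can be sketched in Python. This is only an illustration: block_intervals is a hypothetical helper name, and it assumes the comma-separated lists may end with a trailing comma, as they do in the sample rows.

```python
def block_intervals(chrom_start, block_sizes, block_starts):
    """Return (start, end) pairs for each interacting block.

    chrom_start  -- col2 as an integer
    block_sizes  -- col11, e.g. "12744,4098,"
    block_starts -- col12, e.g. "0,1244215," (offsets relative to col2)
    """
    # Skip the empty item produced by a trailing comma.
    sizes = [int(x) for x in block_sizes.split(",") if x.strip()]
    offsets = [int(x) for x in block_starts.split(",") if x.strip()]
    if len(sizes) != len(offsets):
        raise ValueError("block sizes and block offsets differ in length")
    return [
        (chrom_start + offset, chrom_start + offset + size)
        for size, offset in zip(sizes, offsets)
    ]


# First sample row (chr22)
print(block_intervals(31998728, "12744,4098,", "0,1244215,"))
# [(31998728, 32011472), (33242943, 33247041)]

# Second sample row (chr5)
print(block_intervals(131346229, "2609,2105,", "0,796902,"))
# [(131346229, 131348838), (132143131, 132145236)]
```

In both rows the end of the last block equals the chromosome end in col3, which is a handy sanity check when reading these files.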
Here is a diagrammatic representation: