Monday, March 10, 2014

Installing EMBOSS GUI for your local data

EMBOSS is a great sequence analysis package that comes with a number of very useful programs. For those who are wary of using command-line options, there is good news: GUI applications are also available for EMBOSS. EMBOSS-GUI, developed by Luke McCarthy, is one of the most popular GUI applications for interfacing with EMBOSS programs. The recent version of this GUI, called EMBOSS-explorer, has some bugs and I could not get it working on my system. However, I have a copy of an earlier release that is stable. You can write to me for a copy if you want. First install it; the procedure is fairly simple, although you may need certain privileges to be able to install this GUI. After installation you will get a file in your cgi-bin directory that contains the paths for the following directories:

print "Content-type: text/html\n\n";
init('/dir1', '/dir2', '/dir3', '', '')

where dir1 is where you have installed the GUI package,
          dir2 is where the EMBOSS binaries are installed,
          dir3 is the document root.
          The last two arguments are the HTTP address of the web site and the cgi-bin address.
After all of this is in place, your GUI should work fine.
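
Before loading the GUI in a browser, it can help to confirm that the three directories passed to init() really exist. The following is only a sketch: the sub name check_init_paths is made up for illustration, '/dir1', '/dir2' and '/dir3' are the same placeholders used above, and the check assumes that the EMBOSS program wossname lives in the binaries directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of EMBOSS-GUI): return a list of
# human-readable problems with the three directory arguments of init().
sub check_init_paths {
    my ($gui_dir, $emboss_bin, $doc_root) = @_;
    my @problems;
    push @problems, "GUI directory not found: $gui_dir"   unless -d $gui_dir;
    push @problems, "document root not found: $doc_root"  unless -d $doc_root;
    # wossname is the EMBOSS program the GUI calls to list the programs.
    my $wossname = File::Spec->catfile($emboss_bin, 'wossname');
    push @problems, "wossname is not an executable file in $emboss_bin"
        unless -f $wossname && -x _;
    return @problems;
}

# Placeholders: replace with the paths from your own cgi-bin file.
my @problems = check_init_paths('/dir1', '/dir2', '/dir3');
print @problems ? join("\n", @problems) . "\n" : "All three paths look fine\n";
```

Run it as the same user the web server runs as, since permissions may differ from your own account.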

Here is a catch: for some reason, in my last installation, the path I specified for installation and the path from which the GUI package was actually being read were different. So, before making any change, first check which copy of the GUI is in the path by writing a small Perl script:

print $_, "\n" for @INC;
The copy of the GUI found in one of these paths is the one that is actually loaded, so that is the copy you are going to modify anyway.
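
Instead of scanning the printed list by eye, a slightly longer script can look for the module file in each directory of @INC. This is a sketch only: find_module is a name made up for illustration, and it assumes the module follows the usual Perl layout, so that EMBOSS::GUI lives in a file called EMBOSS/GUI.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every copy of a module found under the given
# directories, in search order. The first hit is the one Perl would load.
sub find_module {
    my ($module, @dirs) = @_;
    (my $rel = $module) =~ s{::}{/}g;    # EMBOSS::GUI -> EMBOSS/GUI
    $rel .= '.pm';
    return grep { -f $_ }
           map  { File::Spec->catfile($_, $rel) }
           grep { !ref $_ } @dirs;       # skip code-reference hooks in @INC
}

my @hits = find_module('EMBOSS::GUI', @INC);
if (@hits) {
    print "$_\n" for @hits;
}
else {
    print "EMBOSS::GUI not found in \@INC\n";
}
```

If more than one path is printed, you have several copies installed and only the first one is read. Note that @INC under the web server may differ from @INC in your shell.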

After installing it, I wanted my own data to be read without the user copy-pasting the sequences. In Figure-1, in the section labelled "To access a sequence from a database, enter the USA path here: (dbname:entry)", one can actually supply the name of the organism and a particular sequence in a fasta file to be worked on. This adds value if you are dealing with a lot of sequences. I did a bit of poking around and figured out that a file located under your dir1/ (the dir1 listed in the file above) has several functions that actually run the show. For instance, the list of programs that is loaded is nothing but a call to the EMBOSS program wossname. For the GUI, several programs should be disabled because they are computationally very intensive. There is an $exclude option where these programs can be named.
All the GUI options are derived from the acd files that are stored in the acd directory under /dir1. These acd files are named program_name.acd. Now, coming back to setting your files to be read internally, go to the line that reads:
 elsif ($item =~ /^[\w-]+:/)
 and change it so that the input string you want to provide is recognized. Set the path to your data. I replaced this line with:

 elsif ($item =~ /\//) {
     push @command, ("-$param", $item);
 }

Put your genome/sequence files under MYPATH.
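
To see what this change does, here is a small stand-alone sketch that runs both patterns against a few sample inputs. The two sub names and the sample strings (including the path /data/genomes/ecoli.fasta) are made up for illustration and are not part of the GUI code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Original test: a "dbname:entry" style USA, i.e. word characters or
# hyphens at the start of the string, followed by a colon.
sub looks_like_db_usa {
    my ($item) = @_;
    return $item =~ /^[\w-]+:/ ? 1 : 0;
}

# Replacement test: any string that contains a slash is treated as a path.
sub looks_like_path {
    my ($item) = @_;
    return $item =~ /\// ? 1 : 0;
}

# Illustrative inputs only.
for my $item ('embl:X65923', '/data/genomes/ecoli.fasta', 'ecoli.fasta') {
    printf "%-28s db_usa=%d path=%d\n",
        $item, looks_like_db_usa($item), looks_like_path($item);
}
```

A bare file name without a slash is matched by neither pattern, so with the replacement the user has to type a path that contains at least one slash.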
Figure-1: Screen shot of EMBOSS GUI.

If for some reason the installation script for the GUI does not work and you have the GUI package, you can do a manual installation in the following way:

1. In the html directory place the path of the correctly (Mostly it should be in the cgi-bin directory)
2. Inside the cgi-bin directory, check the paths given as init() parameters, as described above, and correct them if necessary.
3. Check that you are accessing the right EMBOSS::GUI module. For that you have to check @INC.
4. Inside the right EMBOSS directory, you have to place an acd directory, which is missing from this package. This acd directory can be obtained from the EMBOSS executable package. It should contain a number of .acd files. These files are read to build the forms for each of the EMBOSS programs.
5. Look for the exclude file, which should be inside GUI/data, and put it under the EMBOSS/GUI package.
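
For step 4, a quick way to confirm that the acd directory is populated is to count the .acd files in it. This is a sketch under assumptions: acd_programs is a name made up for illustration, and '/dir1/acd' is a placeholder default that you should replace with your own acd directory (or pass as the first command-line argument).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the sorted program names that have an
# ACD file (program_name.acd) in the given directory.
sub acd_programs {
    my ($acd_dir) = @_;
    opendir(my $dh, $acd_dir) or return;    # empty list if unreadable
    my @names = sort map { /^(.+)\.acd$/ ? $1 : () } readdir $dh;
    closedir $dh;
    return @names;
}

# Placeholder default; pass your own acd directory as the first argument.
my $acd_dir  = shift(@ARGV) // '/dir1/acd';
my @programs = acd_programs($acd_dir);
if (@programs) {
    printf "%d .acd files found in %s\n", scalar @programs, $acd_dir;
}
else {
    print "No .acd files found in $acd_dir\n";
}
```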

Now probably you have