SClassify
Supervised Protein Family Classification and New Family Construction
SClassify is a supervised protein family classification algorithm designed to overcome the problems of existing supervised and unsupervised algorithms and to achieve improved accuracy. It assigns proteins to existing families in databases and, by taking into account similarities among the unclassified proteins, can also assign them to new families.
Installation
The SClassify source code, which includes sample input and output files, can be compiled under Unix, Linux, or Windows (Cygwin). Detailed usage of SClassify is provided in a README file. The following steps create a directory called sclassify and build the program there:
- gunzip sclassify.tar.gz
- tar xvf sclassify.tar
- cd sclassify
- ./install
How to use
The program does not compute sequence similarities itself. It assumes that the e-values between each unclassified protein and each protein in existing families, and the e-values between each pair of unclassified proteins, have already been obtained with other software such as BLAST or SSEARCH. It takes the following input files:
- A file that lists the name of each protein in existing families along with the name of its family, in a two-column tab-separated format (example file: pfam.list).
- A file that lists the name of each unclassified protein in a one-column format, one name per line (example file: test.list).
- A file that lists the e-values between unclassified proteins and proteins in existing families, in a three-column tab-separated format: the name of an unclassified protein, the name of a protein in an existing family, and the e-value between them. Pairs for which no e-value is available can simply be omitted. This file is optional (example files: blast/test_pfam.score, ssearch/test_pfam.score).
- A file that lists the e-values between pairs of unclassified proteins, in a three-column tab-separated format: the names of two unclassified proteins and the e-value between them. Pairs for which no e-value is available can simply be omitted. This file is optional (example files: blast/test_test.score, ssearch/test_test.score).
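The three-column score files described above can be derived from ordinary search output. The following is a minimal Python sketch, not part of SClassify, that reduces BLAST tabular output (the default 12-column layout, where column 1 is the query, column 2 the subject, and column 11 the e-value) to name, name, e-value triples. The function names and the sample protein names are hypothetical. Keeping only the smallest e-value when a pair is reported more than once, and dropping self-hits, are assumptions about what is wanted rather than documented SClassify requirements.

```python
import csv
import io


def blast_tabular_to_scores(rows, skip_self=True):
    """Reduce BLAST tabular rows to (query, subject, e-value) triples.

    Assumes the default 12-column tabular layout: column 1 is the query,
    column 2 the subject, and column 11 the e-value.
    """
    best = {}
    for row in rows:
        if len(row) < 11:
            continue  # skip blank or malformed lines
        query, subject, evalue = row[0], row[1], row[10]
        if skip_self and query == subject:
            continue  # assumption: a protein's hit to itself is not needed
        key = (query, subject)
        # Assumption: keep only the smallest e-value reported for a pair.
        if key not in best or float(evalue) < float(best[key]):
            best[key] = evalue
    return [(q, s, e) for (q, s), e in best.items()]


def write_scores(triples, handle):
    """Write the triples as three tab-separated columns."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(triples)


if __name__ == "__main__":
    # Hypothetical sample: two hits for one pair, one hit for another.
    sample = (
        "protA\tfam1_x\t35.0\t100\t65\t0\t1\t100\t1\t100\t1e-20\t90.1\n"
        "protA\tfam1_x\t30.0\t50\t35\t0\t1\t50\t1\t50\t1e-05\t40.2\n"
        "protB\tfam2_y\t28.0\t80\t57\t1\t1\t80\t1\t80\t0.002\t33.5\n"
    )
    rows = csv.reader(io.StringIO(sample), delimiter="\t")
    out = io.StringIO()
    write_scores(blast_tabular_to_scores(rows), out)
    print(out.getvalue(), end="")
```

The same function can serve both score files: run it on a search of the unclassified proteins against the family proteins for the first file, and on an all-against-all search of the unclassified proteins for the second.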
USAGE
./sclassify -c infile1 -u infile2 -p infile3 -n infile4 -e cutoff -o outfile
where infile1 to infile4 are the input files described above, cutoff is the e-value cutoff, and outfile is the output file.
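For scripted runs, the command line above can be assembled programmatically. The sketch below is a hypothetical Python wrapper, not part of the SClassify distribution. It assumes that infile1 to infile4 correspond to the four input files in the order they are described, and that an optional score file is left out by omitting its flag; neither assumption is confirmed here, so check the README before relying on it. The cutoff value and output file name in the example are placeholders.

```python
import shlex
import subprocess


def build_sclassify_command(family_list, unclassified_list, cutoff, outfile,
                            family_scores=None, pair_scores=None,
                            executable="./sclassify"):
    """Build the argument list for one sclassify run.

    Assumption: -c takes the family list, -u the unclassified list,
    -p the unclassified-vs-family scores, -n the pairwise scores.
    """
    cmd = [executable, "-c", family_list, "-u", unclassified_list]
    if family_scores is not None:
        cmd += ["-p", family_scores]
    if pair_scores is not None:
        # Assumption: an optional file is skipped by omitting its flag.
        cmd += ["-n", pair_scores]
    cmd += ["-e", str(cutoff), "-o", outfile]
    return cmd


def run_sclassify(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_sclassify_command(
        "pfam.list", "test.list",
        cutoff=0.01,              # placeholder cutoff, not a recommendation
        outfile="result.out",     # hypothetical output file name
        family_scores="blast/test_pfam.score",
        pair_scores="blast/test_test.score",
    )
    # Print the command instead of running it, so the sketch works
    # even where sclassify has not been built yet.
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.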
EXAMPLES