/*
kvlsprot_id_acc_mapper
======================

This program requires one argument: The name of a file. The file
should have swissprot/UniProt format.

The program will read through the file and print the following
columns (separated by tabs) to standard output:
 - Uniprot-ID ("ID")
 - accession-number ("AC")
 - boolean column: 't' means that the accession number is
   the primary accession number
   (if the -n argument has been given, the values in the
   third column will be 1/0)

The output may be used to feed a relational database system, so that
the two UniProt identifier types may be mapped to each other.

Note that in UniProt, there may be more than one accession number
associated to an entry. In other words, an ID may have several
accession numbers. An accession number may be the primary accession
number for more than one ID(!).

Compilation on Linux (add "-O2" and "-march=pentiumpro", if you
want to try to tune it a bit):
  g++ -o kvlsprot_id_acc_mapper kvlsprot_id_acc_mapper.cc

Or, if you have trouble running the binary across Linux versions,
you may perform a static compilation:
  g++ -static -o kvlsprot_id_acc_mapper kvlsprot_id_acc_mapper.cc
*/

/*
Copyright (C) 2005 Troels Arvin.

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
*/

#include
#include
#include
#include
#include
#include
#include
#include
#include

using namespace std;

// Print a description of the program and its usage to standard error.
static void usage(char* arg0) {
    char* progname=basename(arg0);
    cerr << progname << ":" << endl << endl;
    cerr << "Reads through a swissprot/UniProt formatted file." << endl;
    cerr << "Outputs three columns (tab-delimited):" << endl;
    cerr << " - UniProt ID" << endl;
    cerr << " - Accession number" << endl;
    cerr << " - Boolean ('t'/'f'): Is the accession number the primary one?" << endl;
    cerr << "Note that there may be several lines per ID, because an ID" << endl;
    cerr << "may have more than one accession number. If an ID has more " << endl;
    cerr << "than one accession number, one of the accession numbers will" << endl;
    cerr << "be the 'primary' accession number. Unfortunately, it looks" << endl;
    cerr << "like an accession number may actually be the _primary_ accession" << endl;
    cerr << "number for more than one ID(!)." << endl << endl;
    // progname already holds the basename, so it is printed directly.
    cerr << "Usage:" << endl << progname << " [-n] " << endl << endl;
    cerr << "-n : use numeric booleans, i.e. print 0 instead of 'f', and " <