DisProt Help

Linking to DisProt
Every protein in the DisProt database can be viewed in HTML format directly via
http://www.disprot.org/protein.php?id=disprot_id.
The same information can be accessed in XML format via
http://www.disprot.org/protein.php?id=disprot_id&view=xml
and in FASTA format via
http://www.disprot.org/protein.php?id=disprot_id&view=fasta
A disprot_id is a 7-character string consisting of DP followed by the 5-digit protein number, for example DP00016.
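
As an illustration of the linking scheme above, the following minimal Python sketch builds the three URLs for a given disprot_id. The names BASE_URL, DISPROT_ID and disprot_url are hypothetical helpers introduced here for illustration and are not part of DisProt; the sketch only constructs the URL strings and does not contact the server.

```python
import re
from urllib.parse import urlencode

# Base address taken from the linking examples above.
BASE_URL = "http://www.disprot.org/protein.php"

# A disprot_id is "DP" followed by exactly five digits, e.g. DP00016.
DISPROT_ID = re.compile(r"DP[0-9]{5}")


def disprot_url(disprot_id, view=None):
    """Return the URL of a DisProt entry.

    view=None gives the HTML page; "xml" and "fasta" select the
    corresponding formats (hypothetical helper, for illustration only).
    """
    if not DISPROT_ID.fullmatch(disprot_id):
        raise ValueError(f"not a valid disprot_id: {disprot_id!r}")
    if view not in (None, "xml", "fasta"):
        raise ValueError(f"unsupported view: {view!r}")
    params = {"id": disprot_id}
    if view is not None:
        params["view"] = view
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for view in (None, "xml", "fasta"):
        print(disprot_url("DP00016", view))
```

Validating the identifier before building the URL catches common mistakes, such as a missing leading zero, early.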




FASTA Format
The FASTA format consists of a one-line header followed by the corresponding amino acid sequence, written in the single-letter amino acid code.

Example:

>DisProt|DP00036|sp|P22121|gi|123686|pir|S13365 #1-195:Protein-DNA binding:Transactivation (transcriptional activation) #217-222:Protein-DNA binding:Transactivation (transcriptional activation) #260-276:Protein-DNA binding:Transactivation (transcriptional activation) #268-271:Protein-DNA binding:Transactivation (transcriptional activation)
MGHNDSVETMDEISNPNNILLPHDGTGLDATGISGSQEPYGMVDVLNPDSLKDDSNVDEPLIEDIVNPSLDPEGVVSAEP
SNEVGTPLLQQPISLDHVITRPASAGGVYSIGNSSTSSAAKLSDGDLTNATDPLLNNAHGHGQPSSESQSHSNGYHKQGQ
SQQPLLSLNKRKLLAKAHVDKHHSKKKLSTTRARPAFVNKLWSMVNDKSNEKFIHWSTSGESIVVPNRERFVQEVLPKYF
KHSNFASFVRQLNMYGWHKVQDVKSGSMLSNNDSRWEFENENFKRGKEYLLENIVRQKSNTNILGGTTNAEVDIHILLNE
LETVKYNQLAIAEDLKRITKDNEMLWKENMMARERHQSQQQVLEKLLRFLSSVFGPNSAKTIGNGFQPDLIHELSDMQVN
HMSNNNHNNTGNINPNAYHNETDDPMANVFGPLTPTDQGKVPLQDYKLRPRLLLKNRSMSSSSSSNLNQRQSPQNRIVGQ
SPPPQQQQQQQQQQGQPQGQQFSYPIQGGNQMMNQLGSPIGTQVGSPVGSQYGNQYGNQYSNQFGNQLQQQTSRPALHHG
SNGEIRELTPSIVSSDSPDPAFFQDLQNNIDKQEESIQEIQDWITKLNPGPGEDGNTPIFPELNMPSYFANTGGSGQSEQ
PSDYGDSQIEELRNSRLHEPDRSFEEKNNGQKRRRAA

Disordered regions are denoted by the symbol "#" in the format:

#<starting residue>-<ending residue>

In the example above, residues 1 to 195, 217 to 222, 260 to 276, and 268 to 271 are disordered.
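
The "#" notation can be read mechanically. The sketch below is one possible way to extract the disordered regions from a header line in Python; the function name disordered_regions and the regular expression are assumptions made for illustration, and any annotation text after the ending residue is simply ignored.

```python
import re

# "#" followed by <starting residue>-<ending residue>.
# Text after the range (e.g. ":Protein-DNA binding") is not captured.
REGION = re.compile(r"#([0-9]+)-([0-9]+)")


def disordered_regions(header):
    """Return (start, end) residue pairs for every "#" region in a header."""
    return [(int(start), int(end)) for start, end in REGION.findall(header)]


if __name__ == "__main__":
    # Header of the DP00036 example, written as a single line.
    header = (
        ">DisProt|DP00036|sp|P22121|gi|123686|pir|S13365 "
        "#1-195:Protein-DNA binding:Transactivation (transcriptional activation) "
        "#217-222:Protein-DNA binding:Transactivation (transcriptional activation) "
        "#260-276:Protein-DNA binding:Transactivation (transcriptional activation) "
        "#268-271:Protein-DNA binding:Transactivation (transcriptional activation)"
    )
    print(disordered_regions(header))
    # [(1, 195), (217, 222), (260, 276), (268, 271)]
```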

Ordered (structurally determined) parts of proteins are denoted by the symbol "&" in the format:

&<starting residue>-<ending residue>

The functional class(es) and subclass(es), if known, of each structurally determined region follow
the residue range. Functional classes are introduced by the symbol "*" and functional subclasses
by the symbol ":". For example:

&<starting residue>-<ending residue>*Molecular recognition effectors:Protein-protein binding

All remaining residues are structurally undetermined (possibly containing very short disordered regions).

In the example above, residues 196 to 216, 223 to 259, and 277 to 677 are structurally undetermined.
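
Since the structurally undetermined residues are whatever the annotated regions leave uncovered, they can be derived as the complement of those regions within the sequence. The following Python sketch shows one way to do this; the function name undetermined_regions is a hypothetical helper, and it assumes 1-based, inclusive residue numbering and regions that may overlap.

```python
def undetermined_regions(length, annotated):
    """Return the (start, end) gaps not covered by any annotated region.

    length    -- number of residues in the sequence
    annotated -- iterable of (start, end) pairs, 1-based and inclusive;
                 disordered and ordered regions may be mixed and may overlap
    """
    gaps = []
    next_free = 1  # first residue not yet covered by an annotated region
    for start, end in sorted(annotated):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= length:
        gaps.append((next_free, length))
    return gaps


if __name__ == "__main__":
    # Toy input: a 20-residue sequence with two annotated regions.
    print(undetermined_regions(20, [(1, 5), (9, 12)]))
    # [(6, 8), (13, 20)]
```

Sorting the regions first and tracking the first uncovered residue keeps the logic correct when one region lies inside another.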


The Single-Letter Amino Acid Code
One-Letter Code  Amino Acid     Three-Letter Code
A                Alanine        Ala
C                Cysteine       Cys
D                Aspartic Acid  Asp
E                Glutamic Acid  Glu
F                Phenylalanine  Phe
G                Glycine        Gly
H                Histidine      His
I                Isoleucine     Ile
K                Lysine         Lys
L                Leucine        Leu
M                Methionine     Met
N                Asparagine     Asn
P                Proline        Pro
Q                Glutamine      Gln
R                Arginine       Arg
S                Serine         Ser
T                Threonine      Thr
V                Valine         Val
W                Tryptophan     Trp
Y                Tyrosine       Tyr
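
For programmatic use, the code table can be held in a simple lookup structure. The Python sketch below is illustrative only: the names THREE_LETTER and to_three_letter are hypothetical, and the mapping covers just the 20 standard amino acids listed above.

```python
# One-letter code -> three-letter code for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence):
    """Translate a one-letter sequence into hyphen-separated three-letter codes."""
    try:
        return "-".join(THREE_LETTER[residue] for residue in sequence)
    except KeyError as err:
        raise ValueError(f"unknown amino acid code: {err.args[0]!r}") from None


if __name__ == "__main__":
    # First five residues of the DP00036 example sequence.
    print(to_three_letter("MGHND"))
    # Met-Gly-His-Asn-Asp
```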



Homologues
Homologues are obtained using the CD-HIT clustering program with a 50% identity threshold.

References:
"Clustering of highly homologous sequences to reduce the size of large protein databases", Weizhong Li, Lukasz Jaroszewski & Adam Godzik, Bioinformatics (2001) 17:282-283.
"Tolerating some redundancy significantly speeds up clustering of large protein databases", Weizhong Li, Lukasz Jaroszewski & Adam Godzik, Bioinformatics (2002) 18:77-82.


