Molecular models of a protein’s structure can give detailed insight into the mechanisms underlying its function, especially when viewed in combination with sequence features.
In theory, 3D structural models are now available for many proteins; in practice, however, it is often difficult to find all appropriate models and view them together with sequence features. We therefore developed Aquaria, a new web resource that provides 49 million pre-calculated structural models derived by sequence-to-structure homology, 10 times more than currently available from other resources.
Using Aquaria, we surveyed not only the visible proteome but also the ‘unknown’ or ‘dark’ proteome, i.e., regions of proteins that remain stubbornly inaccessible to both experimental structure determination and modeling. Building upon a recent structural modeling study covering 546,000 proteins across many organisms, we found that 44–54% of the proteome in eukaryotes and viruses is dark, compared with only 14% for archaea and bacteria. Surprisingly, most dark proteins cannot be accounted for by the expected, conventional explanations; moreover, the dark proteome has unexpected features.
This work suggests several new directions for research in structural and computational biology, and it will help focus future efforts to shed light on the remaining dark proteome.