Biocuration Insights: UniProt

Biocurators organize biological literature and data into reusable databases and resources. These resources let researchers build on past findings, compare results across studies and species, and focus their time on the critical scientific questions that drive new research. In many ways, biocurators are the unsung heroes of scientific progress, so we’re kicking off a series to highlight their efforts.

Our first highlighted resource in this series is the Universal Protein Resource, better known as UniProt (https://www.uniprot.org).

UniProt is a global, freely accessible protein knowledge resource that underpins research across biology, medicine, and biotechnology by combining expert curation, computational methods, and community input to deliver accurate, current, and usable protein information. These biocuration efforts transform large-scale protein data into reliable biological knowledge by carefully selecting high-quality reference proteomes, rigorously extracting experimental evidence from the literature, and structuring that knowledge with interoperable vocabularies and ontologies. UniProt’s strong focus on usability (intuitive search, navigation, and integrated analysis tools) allows researchers to move seamlessly between curated knowledge and methods such as sequence searches, alignments, peptide analysis, and identifier mapping. By integrating community contributions and machine-learning-assisted workflows under expert oversight, UniProt shows biocuration to be a collaborative, evolving practice that is essential for understanding biology at scale.

The first paper we’re highlighting is “UniProt: the Universal Protein Knowledgebase in 2025”. It provides a foundational infrastructure update that is highly relevant to the ISB community:

  • Core Biocuration Contribution: systematic overhaul of UniProt pipelines, restricting content to high-quality reference proteomes; combines expert annotation, ORCID-tracked community submissions, and automatic annotation systems (rule-based UniRule, machine-learning-based ProtNLM) to expand functional data and QC.
  • Key Methods:
    • BUSCO‑driven QC & reference‑proteome selection
    • Expanded UniRule & PANTHER rule sets; LLM‑based ProtNLM function naming
    • New Genomics tab linking proteins to genome coordinates
    • Community curation via ORCID submissions
  • Resources Reused: UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, UniParc, UniRef, Gene Ontology, ChEBI, Rhea, GO-CAM, InterPro, ProtVista, Complex Portal
  • Impact/Applications: provides a trusted, FAIR backbone for AI/omics research, drug discovery and database interoperability; recognised as a Global Core Biodata Resource.
  • Strengths: Combines expert & ML curation; robust QC; large‑scale reach.
  • Caveats/Limitations: ML predictions need curator validation; initial drop in TrEMBL size.
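To make the BUSCO-driven QC step above more concrete, here is a minimal sketch of how a completeness filter could work. It parses the standard BUSCO one-line summary notation (C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing) and keeps proteomes that clear a cutoff. The thresholds, the proteome labels, and the function names are all hypothetical; the paper's actual selection criteria are not reproduced here.

```python
import re

# Standard BUSCO short-summary notation, e.g.
# "C:98.4%[S:97.2%,D:1.2%],F:0.8%,M:0.8%,n:255"
BUSCO_PATTERN = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,"
    r"n:(?P<total>\d+)"
)


def parse_busco(summary):
    """Turn a BUSCO summary string into a dict of numeric scores."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"Not a BUSCO summary: {summary!r}")
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["total"] = int(scores["total"])  # number of BUSCO groups searched
    return scores


def passes_qc(summary, min_complete=90.0, max_duplicated=10.0):
    """Apply illustrative cutoffs (hypothetical values, not UniProt's)."""
    scores = parse_busco(summary)
    return (scores["complete"] >= min_complete
            and scores["duplicated"] <= max_duplicated)


# Made-up example proteomes; the labels are placeholders, not real identifiers.
proteomes = {
    "UP_EXAMPLE_A": "C:98.4%[S:97.2%,D:1.2%],F:0.8%,M:0.8%,n:255",
    "UP_EXAMPLE_B": "C:62.0%[S:60.5%,D:1.5%],F:12.0%,M:26.0%,n:255",
}
selected = [name for name, summary in proteomes.items() if passes_qc(summary)]
print(selected)
```

With these made-up inputs, only the first proteome clears the illustrative cutoffs. A real pipeline would likely weigh additional signals beyond BUSCO completeness.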

The second paper we want to highlight is “Searching and navigating UniProt databases” (2023), by the UniProt Consortium:

  • Core Biocuration Contribution: Peer‑reviewed tutorial standardising discovery of curated UniProt knowledge; boosts accessibility and reproducibility.
  • Key Methods:
    • Basic & advanced search protocols with Boolean logic and field filters
    • Demonstrates integration with analysis tools
    • Emphasises FAIR API endpoints and query syntax
  • Resources Reused: UniProtKB, UniRef, UniParc, Proteomes dataset selector, BLAST, Align, ID‑Mapping, REST & SPARQL APIs
  • Impact/Applications: empowers users to retrieve accurate annotations, enabling reproducible data mining and training.
  • Strengths: Clear, screenshot-rich, modular; open access.
  • Caveats/Limitations: Instructional, with no new biological data; UI changes may date examples.
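As a small illustration of the Boolean, field-filtered searches the tutorial covers, the sketch below assembles a query string and a UniProt REST search URL without sending any request. The endpoint and the field names used here (gene, organism_id, reviewed, and the returned columns) reflect the public UniProt REST API as we understand it, so please check them against the current UniProt documentation; the helper function names are our own.

```python
from urllib.parse import urlencode

# Public UniProtKB search endpoint (verify against the current API docs).
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_query(filters, operator="AND"):
    """Join field:value filters with a Boolean operator."""
    parts = []
    for field, value in filters.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'  # quote multi-word values
        parts.append(f"{field}:{text}")
    return f" {operator} ".join(parts)


def build_search_url(filters,
                     fields=("accession", "gene_names", "protein_name"),
                     size=5):
    """Build a search URL that returns selected columns as JSON."""
    params = {
        "query": build_query(filters),
        "fields": ",".join(fields),
        "format": "json",
        "size": size,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Reviewed (Swiss-Prot) human entries for the gene BRCA1.
filters = {"gene": "BRCA1", "organism_id": 9606, "reviewed": "true"}
query = build_query(filters)
url = build_search_url(filters)
print(query)
print(url)
```

Pasting the printed URL into a browser, or fetching it with any HTTP client, should return matching entries as JSON. Saving the exact query string alongside your results is a simple way to keep a search reproducible.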

We hope you enjoyed this quick-and-dirty summary of two recent papers. Want ISB to highlight your work? Check out this form.
