Biocuration Insights: UniProt

December 18, 2025

Biocurators organize biological literature and data into reusable databases and resources that enable researchers to build on past findings, compare results across studies and species, focus their time on critical scientific questions, and drive new research. In many ways, biocurators are the unsung heroes of scientific progress. Therefore, we’re kicking off a series to highlight these efforts.

Our first highlighted resource in this series is the Universal Protein Resource, better known as UniProt (https://www.uniprot.org).

UniProt is a global, freely accessible protein knowledge resource that underpins research across biology, medicine, and biotechnology by combining expert curation, computational methods, and community input to deliver accurate, current, and usable protein information. These biocuration efforts transform large-scale protein data into reliable biological knowledge by carefully selecting high-quality reference proteomes, rigorously extracting experimental evidence from the literature, and structuring representation using interoperable vocabularies and ontologies. UniProt’s strong focus on usability, through intuitive search, navigation, and integrated analysis tools, allows researchers to move seamlessly between curated knowledge and methods such as sequence searches, alignments, peptide analysis, and identifier mapping. By integrating community contributions and machine-learning–assisted workflows under expert oversight, this work highlights biocuration as a collaborative, evolving practice essential for understanding biology at scale.

The first paper we’re highlighting is “UniProt: the Universal Protein Knowledgebase in 2025”, a foundational infrastructure update of high relevance to the ISB community:

Core Biocuration Contribution: systematic overhaul of UniProt pipelines, limiting to high‑quality reference proteomes; combines expert annotation, ORCID‑tracked community submissions, and machine‑learning frameworks (UniRule, ProtNLM) to expand functional data and QC.
Key Methods:
- BUSCO‑driven QC & reference‑proteome selection
- Expanded UniRule & PANTHER rule sets; LLM‑based ProtNLM function naming
- New Genomics tab linking proteins to genome coordinates
- Community curation via ORCID submissions
Resources Reused: UniProtKB/Swiss‑Prot, UniProtKB/TrEMBL, UniParc, UniRef, Gene Ontology, ChEBI, Rhea, GO‑CAM, InterPro, ProtVista, Complex Portal
Impact/Applications: provides a trusted, FAIR backbone for AI/omics research, drug discovery and database interoperability; recognised as a Global Core Biodata Resource.
Strengths: Combines expert & ML curation; robust QC; large‑scale reach.
Caveats/Limitations: ML predictions need curator validation; initial drop in TrEMBL size.

The second paper we want to highlight is The UniProt Consortium. Searching and navigating UniProt databases (2023):

Core Biocuration Contribution: Peer‑reviewed tutorial standardising discovery of curated UniProt knowledge; boosts accessibility and reproducibility.
Key Methods:
- Basic & advanced search protocols with Boolean logic and field filters
- Demonstrates integration with analysis tools
- Emphasises FAIR API endpoints and query syntax
Resources Reused: UniProtKB, UniRef, UniParc, Proteomes dataset selector, BLAST, Align, ID‑Mapping, REST & SPARQL APIs
Impact/Applications: empowers users to retrieve accurate annotations, enabling reproducible data mining and training.
Strengths: Clear, screenshot‑rich, modular; open access.
Caveats/Limitations: Instructional—no new biological data; UI changes may date examples.
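
As a rough illustration of the Boolean, field-filtered searches the tutorial covers, the sketch below assembles a query URL for UniProt's public REST search endpoint. The helper names (`build_uniprot_query`, `build_search_url`) and the example filters are made up for this post and do not come from the paper; the snippet only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Public UniProt REST search endpoint for UniProtKB entries.
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_uniprot_query(filters, operator="AND"):
    """Join (field, value) filters into one Boolean query string."""
    clauses = [f"({field}:{value})" for field, value in filters]
    return f" {operator} ".join(clauses)


def build_search_url(filters, fields, size=5):
    """Return a search URL; nothing is sent over the network here."""
    params = {
        "query": build_uniprot_query(filters),
        "fields": ",".join(fields),  # columns to return
        "format": "json",
        "size": size,                # number of results per page
    }
    return f"{UNIPROT_SEARCH_URL}?{urlencode(params)}"


# Illustrative filters (an assumption for this sketch): reviewed
# (Swiss-Prot) human entries for a single gene name.
example_filters = [
    ("gene", "BRCA1"),
    ("organism_id", "9606"),
    ("reviewed", "true"),
]
example_url = build_search_url(example_filters, ["accession", "protein_name"])
print(example_url)
```

The resulting URL can be pasted into a browser or passed to any HTTP client. Swapping `AND` for `OR`, or adding further field filters, follows the same pattern.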

We hope you enjoyed this quick-and-dirty summary of two recent papers. Want ISB to highlight your work? Check out this form.

Amos Bairoch, a biocurator at heart, passes away

December 15, 2025

It is with great sadness that we learned of the passing of Prof. Amos Bairoch on November 29th, 2025. Amos was a deeply valued and admired colleague, as well as a cherished friend to many within the biocuration community. His remarkable blend of enthusiasm, intellect, energy, creativity, humor, and rigor fueled the numerous initiatives he led. His unwavering work ethic consistently resulted in work of exceptional quality.

Amos can be considered the original professional biocurator, even though he came to it by accident, as with all great breakthroughs. During his Ph.D. at the University of Geneva, Switzerland, in the early 1980s, he was taken off the bench path by a faulty mass spectrometer. While waiting for the machine to be repaired, he started to work on a software package (PC/Gene) to analyze protein sequences. The software relied on the Protein Identification Resource (PIR) of the National Biomedical Research Foundation (NBRF), which had been developed by Margaret Dayhoff starting in 1965. This set the scene for Swiss-Prot to emerge in 1986, as Amos had broadened Dayhoff’s protein curation to produce a structured on-line resource. Protein annotation needed to abide by rules, and Amos set out to state those rules and share them with other biocurators, thereby initiating standards.

Many years before the development of biomedical ontologies, Amos spearheaded the development of controlled vocabularies and was aware of the need to channel those efforts within the life science community. He co-founded the Swiss Institute of Bioinformatics (SIB) in 1998 with Ron Appel, who had launched Expasy, one of the first web servers for molecular biology, at the time tailor-made for hosting Swiss-Prot.

Amos’ major input to biocuration was praised throughout his career. He was recognized with an Exceptional contribution to ISB award in 2021. Ten years earlier, his contribution to the expansion of proteomics was crowned by the HUPO Distinguished Achievement Award in Proteomic Sciences (2011). As recently as 2025, the International Society for Computational Biology (ISCB) acknowledged his commitment with a Senior Scientist Accomplishment award. Importantly for the biocuration community, the seed of ISB was planted in many minds and, with his impetus, it was established in Switzerland in 2009.

Amos was a relentless biocurator and probably one of the most productive in the array of curated databases. Whoever has seen him sitting in meetings will always remember his eyes and fingers stuck on a laptop. Whether he was gathering information on proteins for Swiss-Prot (1986-2009) or neXtprot (2009-2022), or on cell lines for Cellosaurus (2014-2025), it remained an obsessive task for him. Yet, as soon as he raised his head from his laptop, he would be keen to discuss and share the latest cool information he had found, or any topic anyone brought up.

We say goodbye to a great scientist, colleague, and friend, but his legacy will continue to inspire.

Executive Committee Candidates 2025

September 20, 2025 (updated September 23, 2025)

The election of three members of the International Society for Biocuration Executive Committee (ISB EC) will be held from September 22nd to October 3rd, 2025.

Emails will be sent to current members on September 25th. Only current members, as of September 21st, 2025, who receive this email will be allowed to vote. If you are an ISB member and do not receive the email, please contact us at isb@biocurator.org.

We thank all of the following five candidates for agreeing to stand for election to the Executive Committee (EC). Information about each candidate is available below:

TBK Reddy

Position: Genomic Standards Group Lead

Affiliation: DOE Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, USA

Biosketch: Dr. T.B.K. Reddy has devoted more than 25 years to advancing the field of biocuration through leadership in the development and stewardship of internationally recognized biological databases. He began his career at The Jackson Laboratory, where he contributed to the Mouse Genome Database, one of the earliest and most influential model organism resources. He later directed curation efforts at the Tuberculosis Database (TBDB), integrating genomic and functional data to accelerate research on a critical global health challenge.

Since 2011, Dr. Reddy has led the Genomic Standards Group at the U.S. Department of Energy Joint Genome Institute (JGI), Lawrence Berkeley National Laboratory. In this role, he oversees the Genomes Online Database (GOLD), a flagship repository that provides curated metadata for genomes, metagenomes, and related projects worldwide. Under his guidance, GOLD has become a cornerstone for microbial genomics and microbiome research, setting benchmarks for data standards and interoperability.

Throughout his career, Dr. Reddy has championed the use of controlled vocabularies, metadata standards, and FAIR principles to ensure that curated data are accurate, discoverable, and reusable. He has co-authored numerous peer-reviewed papers, trained students and curators, and played an active role in the International Society for Biocuration community.

Motivation: I am motivated to serve on the ISB Executive Committee because I believe the future of biocuration depends on our ability to adapt quickly and stay relevant in an AI-centric scientific world. Over the past 25 years, I have led curation efforts at the Mouse Genome Database, the Tuberculosis Database, and now the Genomes Online Database (GOLD) at the DOE Joint Genome Institute. Across these projects, I have seen how curated metadata is not just infrastructure—it is the foundation that drives biological discovery. Today, AI and machine learning offer powerful opportunities, but their success depends on high-quality, standardized, and comprehensive metadata. If we continue with “business as usual,” we risk being left behind. I see ISB playing a critical role in preparing our community to respond to rapidly changing needs, from curating massive new datasets to adopting AI-assisted workflows that augment curator expertise.

On the Executive Committee, I will work to:

1) Position ISB at the forefront of AI-ready curation standards.

2) Expand community training that integrates both best practices and new tools.

3) Foster agility in ISB activities so we can meet emerging challenges and continue to accelerate discovery.

Kalpana Panneerselvam

Position: IntAct Team lead/Senior curator

Affiliation: EMBL-EBI, Hinxton, Cambridgeshire, UK

Biosketch: I began my biocuration career in 2009, working on projects involving the bioindexing of key terms from scientific publications. I then served as a curator for Ingenuity Variant Analysis (QIAGEN), where I focused on clinical variants and their associations with phenotypes, therapeutic interventions, population studies, and biomarkers relevant to clinical conditions and therapies.

Currently, I contribute to the IntAct database, specializing in the curation of molecular interaction data. My work emphasizes building contextual interactomes, including clinical contexts and tissue-specific interactions, as well as studying how clinical variants affect interactomes. I also curate detailed features such as binding domains involved in interactions, interaction kinetics, and the roles of inhibitors, agonists, and antagonists. In addition, I have been involved in mapping tissues and cell lines where interactions are detected into ontologies at their simplest context, with mappings shared to EFO for public access. I also contributed to the proposal for upgrading the XML-maker into a more user-friendly tool for generating IMEx-compliant high-throughput interaction data, ready for import into the curation pipeline.

Beyond curation, I have actively engaged in community outreach and training activities, particularly in the areas of molecular interaction network biology and data analysis.

Motivation: I see serving on the Executive Committee as an opportunity to give back to the community that has shaped my career, while working with colleagues to ensure that biocuration continues to grow as a recognized and valued discipline. I would be particularly interested in contributing to the committee’s efforts in coordinating micro-grant and fellowship submissions that support training and innovation in biocuration, as well as in preparing calls for hosting future Biocuration meetings to ensure global representation and inclusivity.

Susan Bello

Position: Senior Scientific Curator

Affiliation: Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA

Biosketch: I have been a curator for Mouse Genome Informatics for over 20 years. While my research background was in oceanography and toxicology, I translated that experience into mouse phenotypes. I began by concentrating on curation of phenotypes and alleles, developing an understanding of nomenclature standards and use of ontologies. Over time, I moved into work on ontology development for the Mammalian Phenotype, Human Disease, Vertebrate Trait, and UPheno ontologies. I’ve also been involved with website development, including the creation of the Human – Mouse: Disease Connection portal at MGI. With the advent of the Alliance of Genome Resources, I work as part of a team on the integration and harmonization of allele, phenotype, and disease curation across species. This project has included developing LinkML models to support curation needs across a wide range of species.

Motivation: I have been a member of the ISB since 2009 and have been on the ISB EC for the past 3 years, acting as chair of the EC for the past 2 years. On the EC, I have served on the Awards and Training & Outreach subcommittees throughout my tenure. On the Training & Outreach subcommittee, I’ve helped to identify additional groups of curators that could be brought into the ISB to expand our membership. I was part of the organizing committee for the 2025 Biocuration conference. I help to keep the ISB active on social media, especially on our Bluesky account. As chair I have worked to keep the many tasks of the EC progressing forward and to update procedures for the EC. I’ve also been involved with our interactions with the Global Biodata Coalition, providing insights and feedback for their interactions with funders. In a second term on the EC, I hope to continue to increase the visibility of the work of biocurators, expand the membership of the ISB, and improve the broader community’s understanding of the importance of and value added by the work of biocurators to hopefully increase support for biocuration.

Ranjana Kishore

Position: Biocurator, WormBase and Alliance of Genome Resources

Affiliation: Biology and Biological Engineering (BBE), California Institute of Technology, Pasadena, CA 91125

Biosketch: I am a biocuration scientist and life sciences professional with over 20 years of experience in the fields of biomedical curation and data management, with in-depth expertise in modeling, integration and display of biomedical data and automated methods of text summarization. I have contributed extensively and worked with widely used biomedical resources such as the Gene Ontology Consortium, WormBase, and groups such as EMBL-EBI and most recently with the Alliance of Genome Resources (Alliance). I have led several projects from conception to completion: biocuration of human models of disease for WormBase and automated text summarization of gene data that led to thousands of gene summaries for C. elegans and nine other nematode and parasitic species. More recently I led a similar effort at the Alliance which has resulted in thousands of gene summaries for nine major model organism species, now integrated into resources such as NCBI RefSeq, directly serving the broader scientific community. I will bring my experience working across diverse biocuration groups and multidisciplinary teams to the ISB EC to better achieve its goals.

Motivation: I am a biocuration scientist with over 20 years of experience in biomedical curation, data management, and integration of biological knowledge. What motivates me is the central role that biocuration plays in enabling discovery across the life sciences. I envision ISB as a hub for advancing both the science and the visibility of biocuration, particularly in this era of dwindling monetary funding and resources. I believe ISB can play a stronger role in fostering collaborations among diverse biocuration groups, promoting and sharing innovative methods and tools, and ensuring the sustainability and visibility of biocuration. Equally important, I will work to strengthen global partnerships, build relationships with user and publishing communities, improve documentation on standards and expand training and mentoring opportunities for early-career biocurators, with a strong commitment to equality, diversity, and inclusion. If elected to the ISB Executive Committee, I will work to increase interaction between biocurators by providing new forums for the exchange of ideas and experiences. I will bring to the EC not only my deep expertise and longstanding connections within the community, but also my strong communication skills—both written and spoken, extensive experience working in teams, and readiness to try new and bold ideas in order to fulfill and even enhance the goals of the ISB.

Biocuration 2026 Travel Awards

September 19, 2025

The International Society for Biocuration (ISB) is offering 5 in-person travel grants for the Biocuration 2026 Conference in Cape Town, South Africa.

In-person awards cover up to 2,700 CHF in expenses, paid as reimbursements following the conference, meaning that awardees must pay their expenses and send receipts (including any for currency exchange costs) to the ISB following the conference.

In-person awardees are also required to submit a recent photograph and a written report (minimum 100 words) about the outcomes of their attendance. These will be posted on the ISB website, newsletter, ISB mailing list, and promoted on social media (Bluesky, LinkedIn, Mastodon).

All awardees are expected to present either a talk or poster at the conference.

If you are a current ISB member, please provide the email address associated with your ISB membership.

Deadline for applications: October 31st, 2025

Apply here

Annual General Meeting

September 17, 2025

October 27th, 2025

The International Society for Biocuration (ISB) will hold its Annual General Meeting (AGM) on Monday, October 27th, 2025 along with presentations by our two biocurator career award winners, Tiago Lubiana and Kimberly Van Auken.

Time: 4:00–6:00 pm CET / 3:00–5:00 pm GMT / 11:00 am–1:00 pm EDT / 8:00 am–10:00 am PDT

Note that daylight saving time ends in Europe/UK on October 26th and in the USA on November 2nd, so there’s a slightly different offset than usual. All canonical times for this event are based on European time!
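
For anyone who wants to double-check their local start time, here is a small sketch using Python's standard `zoneinfo` module. The IANA zone names, including `Europe/Zurich` as a stand-in for "European time", are assumptions chosen for illustration. The helper `hours_between` is likewise hypothetical.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Canonical start: 4:00 pm European time on October 27th, 2025.
# "Europe/Zurich" is an assumed stand-in for Central European Time.
start = datetime(2025, 10, 27, 16, 0, tzinfo=ZoneInfo("Europe/Zurich"))

# Zones to convert into (labels and zone choices are illustrative).
zones = {
    "UK": "Europe/London",
    "US East": "America/New_York",
    "US West": "America/Los_Angeles",
}

# Convert the single canonical instant into each local wall-clock time.
local = {label: start.astimezone(ZoneInfo(name)) for label, name in zones.items()}
for label, when in local.items():
    print(f"{label}: {when:%H:%M %Z}")


def hours_between(naive_when, zone_a, zone_b):
    """Hours that zone_a is ahead of zone_b at the given local clock time."""
    offset_a = naive_when.replace(tzinfo=ZoneInfo(zone_a)).utcoffset()
    offset_b = naive_when.replace(tzinfo=ZoneInfo(zone_b)).utcoffset()
    return (offset_a - offset_b).total_seconds() / 3600


# Gap on the meeting date, while only Europe has changed its clocks ...
gap_meeting = hours_between(datetime(2025, 10, 27, 16, 0),
                            "Europe/Zurich", "America/New_York")
# ... versus the gap once the USA has changed its clocks too.
gap_usual = hours_between(datetime(2025, 11, 10, 16, 0),
                          "Europe/Zurich", "America/New_York")
print(gap_meeting, gap_usual)
```

Anchoring everything to one timezone-aware start time and converting outward avoids adding or subtracting fixed offsets by hand, which is where the one-week mismatch between European and US clock changes tends to cause mistakes.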

Please fill out this form to register to attend by Sunday, October 26th, 2025 and receive the meeting link.

This meeting will be recorded; by attending, you agree to be recorded. The recording will be available on the ISB website after the meeting.

Schedule (in CET):

4:00pm Sue Bello: ISB Annual General Meeting
4:30pm Open for questions and suggestions from attendees
5:45pm Tiago Lubiana, winner of the Early Career Award
5:10pm Kimberly Van Auken, winner of the Advanced Career Award

Equity, Diversity, Inclusion and Accessibility Officer

February 12, 2025

The International Society for Biocuration (ISB) is committed to working to build an inclusive and diverse network of biocurators, ontologists, data stewards and others who work to improve the quality of data wherever they may work. The EDI subcommittee has worked hard to establish a set of guidelines to promote equity, diversity, inclusion, and accessibility for the society. With these guidelines in place, and given the difficulty in maintaining an active committee in the past year, the executive committee has decided to establish an Equity, Diversity, Inclusion and Accessibility Officer.

This officer will be charged with:

Acting as a point person for ISB members to communicate EDIA concerns.
Reviewing applications for Biocuration conference organizers for any EDIA concerns.
Working with the Biocuration conference liaison to ensure the annual conference is following EDIA guidelines.
Acting as a point person to think ahead for any potential EDIA blindspots.

The past few years have seen the first Biocuration conference in India (2024), the first fully hybrid Biocuration conference (2025), and plans for the first Biocuration conference in Africa (2026). We fund travel fellowships to enable curators from low-income countries to attend Biocuration conferences. We have increased the number of microgrants and inclusivity grants available to members this year to two of each type. We have also revised and updated our guidelines for conference organizers.

We thank Mary Ann Tuli and the members of the EDI committee for their tireless work over the years to guide the society policies to where they are now.

We thank Luana Licata for volunteering to be the inaugural EDIA Officer!

Archived Data Sets

February 10, 2025 (updated February 12, 2025)

Last week saw a flurry of messages about how to find archived data sets. This is the list of resources and links from those messages. The bulk of this list came from the Data Rescue Project (@datarescue2025.bsky.social) and was shared by Melissa Haendel. Please check the Data Rescue Project page for new updates. The Data Rescue Project now has a homepage: https://www.datarescueproject.org/about-data-rescue-project/

Larger and Established Data / Website Efforts

End of Term Crawl

The main coordinated effort to archive websites
Datasets have been more of a challenge, especially data embedded in databases.

EDGI

They have been focused on environmental data and a good organization to follow for updates.
They work with Public Environmental Data Project (see below)

Public Environmental Data Project

A coalition committed to preserving and providing public access to federal environmental data.
January 31, 2025 – CDC’s Social Vulnerability Index and Environmental Justice Index
January 24, 2025 – Council on Environmental Quality EJScorecard
January 24, 2025 – Climate and Economic Justice Screening Tool

Harvard’s Library Innovation Lab Team

They have been focusing on data.gov and released their data on Feb 6, 2025: https://lil.law.harvard.edu/blog/2025/02/06/announcing-data-gov-archive/
- #SafeguardingResearch is in contact with them to mirror data on servers not in US-jurisdiction

ICPSR

Overview of ICPSR’s data rescue activities to date:
- Downloaded ~2800 files from various sources requested by researchers; all the files ICPSR collected will soon be available via a dropbox link.
- Examining CDC data dump from archive.org to assess what might be missing.
  - Ideally will also be a resource for those looking for data to see what is/isn’t available.
- ICPSR staff and allies are generating metadata for each of the datasets we have so that we can make them available through an existing archive at ICPSR (DataLumos, openICPSR, or the Resource Center for Minority Data, depending on our timeline and some technical issues we’re working out)
ICPSR DataLumos – They have the older version of a lot of major data, including a recent addition from the CDC.

IPUMS

They have data and have been working on cataloging efforts
Notification went out yesterday that they will share more soon.

Dryad

Generalist repository available to help with data publication, storage, and preservation.

Synapse

Generalist biology and biomedical data repository available to help with data publication, storage, and preservation.

Silencing Science Tracker

Joint initiative of the Sabin Center for Climate Change Law and the Climate Science Legal Defense Fund.
Tracks government attempts to restrict or prohibit scientific research, education or discussion, or the publication or use of scientific information.

OSF

Generalist repository for archiving, sharing, and storing all types of research outputs, not limited to preprints or only data.
OSF is available as an option for pre-prints of articles if, for some reason, they cannot be posted on official sources.
Many universities also have institutional repositories where research (articles, data, dissertations, etc) from that institution can be posted. They also have preservation mandates. An example is Penn’s ScholarlyCommons.

The Climate Mirror Project

Has NOAA data pulled during the 2017 data rescue.

Open Energy Data Initiative

A volunteer has pointed out that “key equity data” is missing from the Dept of Energy. Says they were able to find it on this site. Includes additional data from DOE.

Wayback Machine

The Wayback Machine is an initiative of the Internet Archive, a 501(c)(3) non-profit, building a digital library of Internet sites and other cultural artifacts in digital form. Other projects include Open Library & archive-it.org.
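
To look up whether a page has an archived copy, the Internet Archive offers a small availability endpoint that returns the snapshot closest to a requested date. The sketch below builds such a lookup URL and reads the `closest` snapshot out of a response. The function names and the sample response are hand-written for illustration, and nothing here touches the network.

```python
import json
from urllib.parse import urlencode

# Wayback Machine availability endpoint (returns JSON).
WAYBACK_AVAILABILITY_URL = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build a lookup URL; timestamp is YYYYMMDD or YYYYMMDDhhmmss."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_AVAILABILITY_URL}?{urlencode(params)}"


def closest_snapshot(response_text):
    """Return the archived URL from a response body, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hand-written sample in the shape the endpoint returns. It is NOT a real
# lookup result; the snapshot URL and timestamp are placeholders.
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20250128000000/http://example.org/",
            "timestamp": "20250128000000",
            "status": "200",
        }
    }
})

print(availability_url("example.org", "20250128"))
print(closest_snapshot(SAMPLE_RESPONSE))
```

Fetching the built URL with any HTTP client and passing the body to `closest_snapshot` gives a quick way to check a list of dataset landing pages for existing captures before starting a new crawl.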

Data Rescue Events

University of Washington-based Data Rescue
- Hosted by the University of Washington Center for Advances in Libraries, Museums, and Archives (CALMA), this series of data rescues followed the model from 2017. The spreadsheet of data reviewed at the events is available: Data Tracking List – Data Rescue 2025 (Responses).xlsx
- It is unclear if they are hosting more.
Healthy Regions Policy Lab at UIUC
- https://emails.illinois.edu/newsletter/02/615978402.html
- Includes CDC, EPA, and HRSA Data
Stanford’s Big Local News
- They are running a federal data collection collaborative

Smaller/Ad Hoc Rescue Efforts/ Data Archiving Activists

UCSB LSIT Data Mirroring
- Mirrored and archived public data on locally hosted git server
- Includes retrieved data sets from CDC, NIH, and NOAA
CDC Page on Internet Archive
- A special archive created on IA of all CDC datasets publicly available as of January 28, 2025
- uploaded by DataHoarders (we think)
Datasets in Dataverse
- Data uploaded by the Climate Change and Health Research Coordinating Center (CAFE)
  - CAFE is looking for a potential non-US-based location to duplicate the contents of their collection
- Includes CDC’s Social Vulnerability Index data.
- Most of what’s being placed here is data focusing on health and the environment.
- DataRefuge from 2017 DataRefuge initiative can be opened for more deposits
Safeguarding Research
- Organizer is Henrik Schönemann; https://fedihum.org/@lavaeolus
- There is a forum: https://safeguarding-research.discourse.group/ (admin = Henrik)
  - Based in EU, USA and global; update: got access to 1-2 PB of storage (and more on the way) and people willing to seed
  - Currently, we’ve got around 1TB of data backed up
    - Including >100.000 PDFs from academia.edu (“transgender”, “Queer Studies”, “intersex”, “nonbinary” etc. – see the forum for the full list)
    - 350GB web archive of CDC, including all 30.000 files from archive.cdc.gov, and much more
    - “We’re working on providing a central index of archives, with metadata about who archived what, when, to be disseminated widely alongside torrent files and act as both a central point of coordination for archivers to assess what new work is needed, and a mass distribution channel.”
  - Possible contact to CERN, will update asap
Data Hoarder
- A reddit community that is coordinating efforts to rescue data.
Data Hoarding
- index of resources and archives related to data hoarding, web archival and self hosting.
ArchiveTeam Warriors
- They run a distributed crawler. Anyone can install it to help contribute.
- US Federal Data page
- Data is uploaded to Archive.org by volunteers
Data Liberation Project
- Note: It looks like the project may have stalled in September 2024. Send info if you know more about them.
- Run by BigLocalNews and MuckRock, which are good groups to follow.

Tools for Data Rescues

DCN Curating Data for Data Rescues
- Provides key insights for curating data and the types of questions that need to be asked.
Data Management Checklist For Data Rescues (from MIT)
- Checklist to assist with curating data rescue efforts.
#RStats package from @ropensci.org
- gitcellar downloads and archives all repos, issues, and PRs from a GitHub organization in one shot: docs.ropensci.org/gitcellar/
WebRecorder.net
- According to an email: has archived 8TB+ of government sites, some from the End-of-Term-Archive seed list, some from EDGI Slack requests, and many sites independently
ArchiveBox.io
- According to an email: has also archived government datasets from data.gov, CIBP, USCIS, NOAA, NASA, NSIDC, and more
Awesome-datahoarding
- Provides a list of tools for web harvesting, etc.
Awesome Web Archiving
- Another curated list of web archiving tools
DataRescue Workflow
- This is the workflow from the original data rescue/DataRefuge project in 2017.
- Many of the tools are no longer working, but the workflow is still useful. UW used this to create their workflow above.
- The challenge with the original project was where to store and how to make discoverable the large amounts of data captured.
- Part of this effort is also housed in the Harvard Dataverse Repository and can be opened for more data deposits
- There is a CKAN instance with some of the 2017 data.
https://govdiff.com/
- Tool created by Jerome Paulos to show side-by-side changes in government websites.
How You Can Help Archive U.S. Government Data Right Now: Install Archive Team Warrior
- This is a reddit post, but it lists instructions for how to archive and the tools needed to be able to contribute. Figured it would best be categorized here.

Library Guides to Data Rescues

American Univ: https://subjectguides.library.american.edu/data_rescue (Now shared through Springshare)
Univ of MN: https://libguides.umn.edu/govpubs/admin
Salem State: https://libguides.salemstate.edu/datapreservation
Butler: https://libguides.butler.edu/archiveddatasources
Hamilton: https://libguides.hamilton.edu/c.php?g=132443&p=10779226
Albany: https://libguides.library.albany.edu/c.php?g=1450281&p=10779581
GODORT: https://godort.libguides.com/c.php?g=1450475&p=10780944

Articles on current efforts

Call to arms: What government information librarians can do to help save critical federal information from being lost – Blogpost from FGI (Free Government Information)
Why EDGI is Archiving Public Environmental Data – blog post from EDGI
Preserving federal health data – by The Journalist’s Resource out of the Harvard Kennedy School
- As the US government removes health websites and data, here’s a list of non-government data alternatives and archives – by The Journalist’s Resource
Archivists Work to Identify the Thousands of Datasets Disappearing from Data.gov – by 404 Media; interviews with EOT and James Jacobs
The scramble to back up CDC.gov – by Garbage Day; mentions some coordinating efforts by Health Professionals and Journalists to gather the CDC data
Lending a hand with EOT Crawl – blog post from the PEGI Project.
As the Trump admin deletes online data, scientists and digital librarians rush to save it – Salon Magazine. Talks about EOT.
Three Efforts to Preserve Government Data as a New Trump Administration Approaches – Union of Concerned Scientists
What’s at Stake if the Data at Federal Agencies Disappears? – Union of Concerned Scientists
Researchers rush to preserve federal health databases before they disappear from government websites from The Journalist’s Resource

Articles for context

CDC Site Restores Some Purged Files from NYT
Thousands of U.S. Government Web Pages Have Been Taken Down Since Friday by Ethan Singer
The Government Information Crisis Is Bigger Than You Think It Is blog post by Free Government Information
CDC removes gender, equity references in public health material from WaPo
BREAKING NEWS: CDC orders mass retraction and revision of submitted research across all science and medicine journals from Inside Medicine
A Look at Federal Health Data Taken Offline from KFF
As Data Goes Off-Line Under Trump, Environmental Researchers Are Uploading Backups from Inside Higher Ed
The mad dash to protect environmental data from Donald Trump from The Verge
Some federal health websites restored, others still down, after data purge from VPM
Trump orders USDA to take down websites referencing climate crisis from The Guardian

Existing Alternative Data Sources

Thanks to Brianne Dosch for suggesting the section and some of the bullets.

PolicyMap – offers a free tier that can be used to view basic information down to the tract level, but more detailed data and functionality require a subscription; available at some universities
- Purged Federal Agency Data Available
FRED – They have some demographic data as well; free and open source
Census Reporter – a free, open-source platform focused on making American Community Survey (ACS) data more accessible, including the recent upload of the 2022 1-Year ACS data
Esri – for mapping users, the GIS vendor publishes several U.S. Census Bureau data sets, including the ACS, through its ArcGIS Online Platform
IPUMS – Even when the government operates normally, many analysts turn to Minnesota Population Center products to access ACS, Current Population Survey microdata and Decennial Census data
Social Explorer – historical Census data and more; available at some universities
SimplyAnalytics – has internally processed American Community Surveys; available at some universities
American College of Obstetricians and Gynecologists – Hosting copies of immunization schedules and contraceptive use guidance from the CDC
https://www.ebi.ac.uk/ena/browser/home – The European Nucleotide Archive (ENA) provides a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. Mirrors SRA public data

Economic Indicators

National League of Cities: Federal Grant Navigation Equity Dashboard
- This tool aggregated data from many sources – it seems to still be able to categorize disadvantaged communities (by environmental and economic standards), as well as other critical data denotations that are increasingly hard to access
ALICE Economic Vitality Dashboard and Report (2022 w/ 2024 update)
- This resource specifically provides data on work, housing, and community resources for households below the ALICE threshold (Asset Limited, Income Constrained, Employed). The data is provided by the U.S. Census Bureau’s Public Use Microdata Sample (PUMS, 202!)
National Equity Atlas Dashboards
- A data and policy tool that provides a detailed report card on racial and economic equity – this tool can provide a holistic Racial Equity Index snapshot of communities. The Atlas draws its data from a unique regional equity indicators database developed and maintained by two private institutions: PolicyLink and the USC Equity Research Institute (ERI).

Public Health

County Health Rankings & Roadmaps (CHR&R)
- A program of the University of Wisconsin Population Health Institute, this data tool aims to highlight the symbiotic nature of health and equity by factoring physical environment, social and economic indicators, clinical care, and health behaviors into health outcomes.
  - They also recommend these additional health data platforms:
    - America’s Health Rankings report is a health assessment tool based on state-level health indicators.
    - Congressional District Health Dashboard pulls together local data on the health and well-being of each congressional district.
City Health Dashboard
- From NYU Langone Health, this platform provides 40+ measures of health and factors affecting health across five areas (Health Behaviors, Social and Economic Factors, Physical Environment, Health Outcomes, and Clinical Care) for 970+ cities across the U.S.

Biocuration 2025 Preliminary Schedule of Talks

January 21, 2025 (updated January 24, 2025)

Schedule of talks for April 7-9

DAY 1

Keynote: Tanya Berger-Wolf

Director of the Translational Data Analytics Institute, Director of Imageomics Institute, PI of AI and Biodiversity Change (ABC) Global Climate Center, Ohio State University.

Day 1, Session 1: Data Standards & Ontologies

Encouraging authors to use and cite data in public repositories; a publisher perspective.
- Bastien Molcrette
- Data Publications, Data Standards, Fair Data Principles, Public Data Resources
DO Spanish: enhancing DEI via a standardized workflow for translating ontology and website content
- Lynn Schriml
- Curation, Data Sharing, Disease, Ontologies
Harnessing Community Power for Long-Term Success of the Mondo Disease Ontology
- Sabrina Toro
- Curation, Data Standards, Disease, Ontologies
The Earth Metabolome Initiative Ontology
- Tarcisio Mendes de Farias
- Data Modeling, Knowledge Graphs, Omics Data, Ontologies

Keynote: Nirav Merchant

Director of the Data Science Institute at University of Arizona, PI CyVerse

Day 1, Session 2: Artificial Intelligence

From Lab Bench to Web: A Strategy for Making Biomedical Data Findable and Accessible
- Christina Parry
- Data Standards, Fair Data Principles, Graph Databases, Repositories
Extending Ontology for Biomarkers of Aging using OLIVE
- Hande Kucuk McGinty
- Artificial Intelligence, Knowledge Graphs, Large Language Models, Ontologies
Building the Lighthouse: Guiding LLM-Powered Biocuration with Domain Knowledge and Context
- Harry Caufield
- Generative Artificial Intelligence, Large Language Models, Literature Mining, Ontologies
AI Curation Methods for NASA Scientific Data
- Walter Alvarado
- Artificial Intelligence, Curation, Large Language Models, Metadata
Plant Reactome: A plant pathways Knowledgebase and discovery platform
- Sushma Naithani
- Artificial Intelligence, Curation, Functional Gene Annotations, Knowledge Graphs

Day 1, Session 3: Data Sharing, Databases & Knowledgebases

Single-cell comparative transcriptomics for hundreds of species?
- Frederic Bastian
- Comparative Data, Curation, Data Standards, Gene Expression
Epitope-Driven Annotations in Protein Resources
- Randi Vita
- Database, Ontology, Protein, Epitope
Towards FAIR Phenome: Indian Crop Phenome Database at Indian Biological Data Centre (IBDC)
- Sonia Balyan
- Data Sharing, Data Standards, Databases, Phenotypes
Making Rare Disease Data Available in the Rare Disease Cures Accelerator-Data and Analytics Platform
- Nicole Vasilevsky
- Curation, Data Sharing, Disease, Fair Data Principles
Project ‘Shail’: Curating a mountain
- Saurabh Raghuvanshi
- Curation, Databases, Drug Discovery, Genomics
Import of Human GWAS Data and Mapping of EFO to multiple ontologies at the Rat Genome Database
- Stan Laulederkind
- Curation, Disease, Genomics, Ontologies

DAY 2

Keynote: Paul Thomas

Director, Division of Bioinformatics, Director of the Gene Sequence, Function, and Health Laboratory Initiative, University of Southern California, PI Gene Ontology, PI PANTHER

Day 2, Session 1: Gene/Protein Functional Prediction

DisProt: The Manually Curated Resource for Intrinsically Disordered Proteins
- M. Victoria Nugnes
- Curation, Databases, Ontologies, Proteins
A Large Scale Crowdsourcing of the Fifth Critical Assessment of Protein Function Annotation
- Iddo Friedberg
- Annotations, Artificial Intelligence, Functional Protein Annotations, Public Data Resources
New Synteny visualizations on Xenbase
- Malcolm Fisher
- Annotations, Comparative Data, Genomes, Synteny

Day 2, Session 2: Gene/Protein Functional Prediction

Cross-species quantification of function annotations provides insights into disease-associated uncharacterized human genes
- Parnal Joshi
- Annotations, Comparative Analysis, Data Analysis, Functional Protein Annotations
Leveraging the AlphaFold Database for enhanced protein function annotation
- Paulyna Magaña
- Annotations, Functional Protein Annotations, Protein Structure Prediction, Proteins
Leveraging Large Language Models for Gene Summary Generation at the Alliance of Genome Resources
- Valerio Arnaboldi
- Large Language Models, Literature Mining, Automated Gene Summaries, Text Summarization
Life Cycle Events for Protein Family Models: Birth, Maturation, Cloning, Retirement
- Daniel Haft
- Bacteria, Data Sharing, Functional Protein Annotations

Keynote: Andy Hickl

Chief Technology Officer, Allen Institute

Day 2, Session 3: Natural Language Processing

Semi-automated curation of post-translational modification relationships using automated knowledge extraction and assembly
- Benjamin Gyori
- Artificial Intelligence, Curation, Databases, Literature Mining
Characterization and automated classification of sentences in the biomedical literature: a case study for biocuration of gene expression and protein kinase activity
- Daniela Raciti
- Curation, Machine Learning, Community Curation, Sentence Classification
Enhancing the SIB Literature Services with annotations to support biocuration
- Deborah Caucheteur
- Annotations, Curation, Data Analysis, Literature Mining
Protein structure enrichment through text mining
- Melanie Vollmar
- Annotations, Literature Mining, Natural Language Processing, Protein Structures
Enhancing data annotation in ChEMBL for robust analyses
- Sybilla Corbett
- Annotations, Curation, Fair Data Principles, Natural Language Processing

Day 2, Session 4: Glycans

Glycan Archetypes: definitions, implementations and applications for standardizing glycan structure data
- Kiyoko Aoki-Kinoshita
- Data Standards, Databases, Glycans, Ontologies
BiomarkerKB: Biomarker-centric data modeling and knowledge integration for translational research
- Raja Mazumder
- Curation, Databases, Glycans, Knowledge Graphs
Inferring Tissue and Cell-type Glycosyltransferase Specificity from Single-Cell Gene Expression Data
- Nathan Edwards
- Annotations, Glycans, Machine Learning

DAY 3

Keynote: Shannon Farrell

Data Curation Network/Univ. Minnesota

Day 3, Session 1: Data Curation

It’s Now or Never: Delays in Biocuration Disproportionately Affect Understudied Proteins
- An Phan
- Curation, Data Analysis, Functional Gene Annotations, Literature Mining
How have standards in genomics evolved since the first microbial genome was published 3 decades ago?
- Chris Hunter
- Data Standards, Metadata, Ontologies, Repositories

Keynote: Sandra Orchard

ISB 2023 Exceptional Contribution to Biocuration Awardee, EMBL-European Bioinformatics Institute – UK

Day 3, Session 2: Data Curation

The global biodata infrastructure: how, where, who, and what?
- Chuck Cook
- Databases, Infrastructure, Literature Mining, Public Data Resources

Announcement for the 2024 Annual General Meeting (AGM)

September 11, 2024 (updated November 2, 2024)

The International Society for Biocuration (ISB) will hold its Annual General Meeting (AGM) on Tuesday, October 29th, 2024, along with presentations by our two biocurator career award winners, Sushma Naithani and Maria Victoria Nugnes.

Time:

3:00–5:00 pm CET (Central European)
2:00–4:00 pm GMT (British)
10:00 am–12:00 pm EDT (Eastern)
7:00–9:00 am PDT (Pacific)

Note that daylight saving time ends in Europe/UK on October 27th, 2024 but not until November 3rd, 2024 in the USA, so there’s a slightly different offset than usual. All canonical times for this event are based on European time!
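
Because the offset between Europe and the USA is unusual that week, it can help to double-check your own local time against the canonical European time. The following is a minimal Python sketch, not an official ISB tool. It assumes that "Europe/Paris" is an acceptable stand-in for Central European time, and the helper names (`get_zone`, `local_times`) and the fallback offset table are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# UTC offsets (in hours) in effect on 29 October 2024. Used only as a
# fallback when no IANA time zone database is installed on the machine.
FALLBACK_UTC_OFFSETS = {
    "Europe/Paris": 1,          # assumed stand-in for Central European time
    "Europe/London": 0,
    "America/New_York": -4,
    "America/Los_Angeles": -7,
}


def get_zone(name):
    """Return a tzinfo for `name`, preferring the IANA database."""
    try:
        return ZoneInfo(name)
    except ZoneInfoNotFoundError:
        return timezone(timedelta(hours=FALLBACK_UTC_OFFSETS[name]))


def local_times(start, end, zone_names):
    """Convert an aware start/end pair into each named time zone."""
    return {
        name: (start.astimezone(get_zone(name)), end.astimezone(get_zone(name)))
        for name in zone_names
    }


# Canonical meeting time: 3:00-5:00 pm Central European time.
AGM_START = datetime(2024, 10, 29, 15, 0, tzinfo=get_zone("Europe/Paris"))
AGM_END = datetime(2024, 10, 29, 17, 0, tzinfo=get_zone("Europe/Paris"))

if __name__ == "__main__":
    zones = ["Europe/London", "America/New_York", "America/Los_Angeles"]
    for name, (start, end) in local_times(AGM_START, AGM_END, zones).items():
        # %z prints the UTC offset that applies on that date.
        print(f"{name}: {start:%H:%M}-{end:%H:%M} (UTC{start:%z})")
```

To check another location, add its IANA zone name (for example "Asia/Kolkata") to the `zones` list. This needs the time zone database to be available, since the fallback table only covers the four zones above.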