The Future of Biocuration: Panel discussion from the Biocuration2021 virtual conference

By: Nicole Vasilevsky and Jane Lomax

Like all in-person gatherings in this past year, the annual International Society for Biocuration conference went virtual in 2021. At the inaugural session on April 13, 2021, a group of panelists discussed ‘the future of biocuration’. The panel was moderated by Rama Balakrishnan, who has served on the ISB Executive Committee since 2017, and is the co-chair (along with Susan Bello from the Jackson Laboratory) of the Biocuration2021 conference. Rama was joined by four panelists from various roles in academia and industry to discuss what is in store for our community. The recording is available here.

What is curation: Distilling knowledge from information

Rama initiated the discussion with the fundamental and relevant question, ‘what does the word curation mean to you?’ Many curators can probably relate to this question, which is frequently asked by people outside the field. The role of a curator at a museum, for example, may be more familiar, but biocuration is a less well-understood field. Rama, who has held varying roles as a curator (academic and industry), tried to get at how the actual task of curation may differ amongst us. Sandra Orchard from EBI shared a classical definition of ‘turning unstructured data into structured searchable data’, but recognized that this is not always true: whilst some curation tasks involve making data more structured, text-minable and machine-readable, data curation does not always result in completely structured data. Carol Bult from MGI defined curation as “applying semantic standards to ensure data findability and aggregation.”

Coming from the industry perspective, both Kambiz Karimi (Myriad Women’s Health) and James Malone (SciBite) agreed: curation involves meaning-based capture and structuring of content using controlled vocabularies. Data curation can also include data cleaning, which is often a pre-curation task. Curation can help improve and enrich data interpretability and ultimately add value. It allows for enhanced search, querying, semantic integration and meta-analysis.

How can we ensure quality?

Given that the panelists all agreed on a high-level definition of curation, Rama then asked about ensuring data quality. What does good quality mean and what are metrics to assess quality? Different quality control (QC) and quality assurance (QA) processes apply, depending on the type of curation that is being done, whether you are curating tax forms (as James did in a summer job long ago) or curating the mouse biology literature. Some processes that were discussed by Carol and others included intercurator checks, crowdsourcing feedback from downstream users, practices to ensure collaboration, and regression testing to ensure continuity and consistency across datasets. Sandra pointed out that curators cannot be all things to everything, and stressed the importance of specialist databases with curators who are domain experts, who can take the first pass at the curation and build re-processing pipelines or scoring mechanisms to export high-quality subsets to other data resources.
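The panel did not name a specific metric for intercurator checks. As one illustration, agreement between two curators who annotated the same items is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal example under that assumption; the function name and the example labels are hypothetical and are not taken from any panelist's workflow.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two curators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both curators must label the same non-empty set of items")
    n = len(labels_a)
    # Fraction of items on which the two curators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each curator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both curators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six items, invented for illustration only.
curator_1 = ["pathogenic", "benign", "benign", "uncertain", "pathogenic", "benign"]
curator_2 = ["pathogenic", "benign", "uncertain", "uncertain", "pathogenic", "benign"]
kappa = cohens_kappa(curator_1, curator_2)
print(f"kappa = {kappa:.2f}")
```

A value near 1 suggests the curators are applying the guidelines consistently, while a low value may point to ambiguous guidelines rather than to curator error.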

James and Rama noted how detecting outliers can assist with quality checks. However, it may not always be easy to detect the outliers without expert knowledge in a specific area. For example, Rama curates patient data at Genentech, and once came across a record reporting that a patient had a 100℃ fever (rather than 100℉), which was easy to spot as an error. However, in a more complicated clinical use case, erroneous data points may not be so obvious, and detecting them may require more specialized knowledge.
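The fever example is the kind of error a simple plausibility check can catch automatically. The sketch below is only an illustration: the record layout, the function name and the temperature ranges are assumptions made for this example, not clinical reference values and not a description of any panelist's pipeline.

```python
# Illustrative plausibility ranges per unit (assumed for this sketch, not clinical guidance).
PLAUSIBLE_RANGE = {"C": (30.0, 45.0), "F": (86.0, 113.0)}


def flag_implausible_temperatures(records):
    """Return the records whose value falls outside the plausible range for its unit."""
    flagged = []
    for record in records:
        low, high = PLAUSIBLE_RANGE[record["unit"]]
        if not low <= record["value"] <= high:
            flagged.append(record)
    return flagged


# Hypothetical records; P1 mirrors the 100 degrees Celsius entry described above.
records = [
    {"patient_id": "P1", "value": 100.0, "unit": "C"},
    {"patient_id": "P2", "value": 100.0, "unit": "F"},
    {"patient_id": "P3", "value": 37.2, "unit": "C"},
]
flagged = flag_implausible_temperatures(records)
print([record["patient_id"] for record in flagged])
```

A range check like this only catches values that are impossible on their face. The subtler errors the panel described still need a curator with domain knowledge to review them.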

Kambiz shared that Myriad has several QC approaches, including a peer review process, a spot checking program to have curators spot check each other’s work and a quality check process that compares their classification to previous classifications from the community. 

Sandra also noted the importance of researchers collaborating with curators prior to publication. She shared an anecdote where an author published a paper with an erroneous dataset, a simple mistake where a row in a spreadsheet had been accidentally deleted, causing nonsensical results. The curator picked this up and contacted the author, who was able to correct it, but this speaks to the importance of pre-submitting data to the database before publication and the important role a curator can play with the research community. 

Opportunities with Machine Learning and Automation 

While a lot of biocuration is done manually, more and more processes and workflows are being automated, with text mining, machine learning (ML), natural language processing (NLP) and AI. The panel was asked for their opinion on how AI and ML will affect the work of biocurators. Sandra assured us that machine learning will enhance our work, and she is not concerned that it will replace human curation. Data is too messy, the literature is too unstructured, and human review and curation is going to be needed for the foreseeable future. James echoed her sentiments in saying, “[Machine Learning] will become an assistant, it will not replace subject matter experts who are biologists, scientists, curators. It will play a role in helping us.” James sees it as an opportunity for biocuration, where we should work to exploit advances in deep learning, noting that the importance of biocuration is more pronounced now than ever. We can train AI to aid in biocuration and we can work together. In addition, quality Machine Learning/AI requires training sets that have been human-curated, and the advances of these technologies will require more curators; this is a new opportunity for this community. Carol agreed, but brought up the point that there may be a perception that these technologies are advanced to the point where curators can be replaced. This is causing challenges with funding for biocuration, due to the notion that machine learning can do all or most of what human curators do. While machine learning can assist with making biocuration scalable, we need to do better as a community at communicating how these things interrelate and feed off each other.

“Biocuration has never been more valuable than it is now and yet underappreciated.” The Society can help us tackle this perception and articulate how manual and machine learning biocuration can go hand in hand. – Carol Bult

Approaching authors

An audience member inquired whether database curators approached authors for clarification about their published data, and whether authors were responsive. Kambiz shared that they did approach authors when there was ambiguity with the content or data in an article. Sandra concurred, and alluded to the challenge of time dependence: if a paper was recently published (1 year – 18 months ago), they frequently got a response. If a paper was over 3 years old, they were generally less likely to get a reply, as the first author may have moved on and the PI may be unfamiliar with the details of the data.

This may speak to an opportunity to better train researchers in becoming familiar with curation methods and standards, to allow for unambiguous reporting in their publications. Requirements to share data at the time of publication will also help address this need.

Getting the journals involved

This led to the next question about working with the journals to publish data in a more structured way. Carol has had some experience working with journals in the mouse community, who are careful about publishing mouse names with the accepted terminology and nomenclature. She did mention that sometimes there is pushback as to whether the recommended standard is the accepted standard, and whether this is going to evolve or change in the future. We all may be familiar with the situation below.

Source: https://xkcd.com/927/

This is an opportunity for a systematic community approach: the ISB should promote standards adoption to the journals.

Sandra pointed out that a challenge with approaching journals to use our standards is the sheer number of journals. A more targeted approach may be more appropriate. For example, the proteomics community was successful in getting a restricted number of journals in their field to require data sharing to ProteomeXchange (http://www.proteomexchange.org/) prior to publication.

Sandra also recommended that we first talk amongst ourselves as a community and define our needs, and what standards to adopt and promote, and then approach the journals.

The elephant in the room: Funding

In recent years, NIH funding to various databases has decreased. How do we sustain our own careers, and train the next generation of curators?

Kambiz felt it is easier to justify the need for curation due to the regulatory aspect of his industry. Even if there are NLP-based processes to extract gene-to-disease relationships, manual review will always be needed. He foresees that automated processes will assist with manual curation going forward.

Carol emphasized that we need to promote how important curation is to data science. Data science is recognized as an important field, therefore we should frame curation within its role in data science. We have to be better about explaining the return on investment in curation: what can we do when data is curated that we wouldn’t be able to do if it wasn’t? She pointed out the reality that biocuration is considered infrastructure, which is largely ignored until it is broken. As a Society, can we demonstrate the impact that biocuration has on advancing data science?

Sandra reiterated that we need to make ourselves more visible; we need people outside the community to understand what we do. We need to work together efficiently as a community so that we do not duplicate efforts. We need to align on standards, use specialist databases for initial analysis and data cleaning, use baseline resources like accession numbers, and show good examples of good curation.

Continue the conversation on Slack.

Do you have topics you’d like to discuss in a future panel, or suggested speakers? Please let us know (intsocbio@gmail.com).

EBI Training: A guide to molecular interactions

During this webinar, we will give you an introduction to molecular interactions and how to find these through the molecular interaction database IntAct. We will show you examples of how you can search for interaction data, how to create molecular interaction networks using our network viewer based on Cytoscape.js and how to download this data for further analysis.

We will also have a quick look at two other resources, PSICQUIC and IMEx, that integrate molecular interactions from several sources.

Who is this course for?

This webinar is aimed at students or early researchers beginning to use bioinformatics resources in their studies/research who wish to learn more about molecular interactions and IntAct. No prior knowledge of bioinformatics is required, but undergraduate level knowledge of biology would be useful.

Outcomes

By the end of the webinar you will be able to:

Explain what molecular interactions are
Describe what IntAct can be used for
Search for interaction data

26 May 2021

15:30 – 16:30 ( BST )

Online and Free

2021 Biocuration Awards Nominations

The International Society for Biocuration is happy to announce the 2021 Biocuration Awards.

In 2021, the ISB will give two different awards to people who have made a significant impact in the field of biocuration. We welcome your nominations!

Description of the awards:

1) Award for Exceptional Contributions to Biocuration
ISB’s Exceptional Contributions Award recognizes a person who is a leader or a pioneer in the field of biocuration, and whose work has been fundamental to the advancement of biocuration.

2) Biocuration Career Award
The Biocuration Career Award recognizes biocurators in non-leadership positions who have made sustained contributions to the field of biocuration. Those who hold Principal Investigator or Group Leader positions are not eligible for the Biocuration Career Award.

Each award recipient will be invited to present a talk at the 2021 International Biocuration Conference, which will be held virtually this year (the dates and details are to be determined).

Nomination process:
Nominations will be reviewed by the 2021 ISB Awards Committee, comprised of one member of the ISB’s Executive Committee (ISB-EC) and six (6) additional members from the wider research community; these members were nominated by the ISB-EC based on diversity in area of expertise, organization type, role, and geographic location.

Who can nominate and/or be nominated?

  • Any currently active ISB member may nominate anyone in the field of biocuration, whether the potential nominee is a member of ISB or not.
  • Members of the ISB can make no more than 1 nomination per award.
  • Current members of the Executive Committee or the ISB Award Committee are not eligible for the awards.
  • Self-nominations will not be considered.

How to submit a nomination:

Nominations should be sent via email to the awards committee at intsocbio@gmail.com with the subject line “Biocuration Awards Nominations”.

The nomination email should contain all the following fields:

  • Nominator details (name, e-mail and affiliation, member of ISB);
  • Nominee details (name, e-mail and affiliation);
  • Type of award nomination (either Exceptional Contributions to Biocuration or Biocuration Career Award);
  • Short list of scholarly contributions (a maximum of 50 words);
  • Brief description of why you are recommending this person (a maximum of 350 words).

Deadline for submitting nominations:  Friday, February 26, 2021

Please welcome the new 2020-2021 ISB Executive Committee

We welcome Robin Haw as our newest member to the ISB EC. Nicole Vasilevsky and Rama Balakrishnan are returning for their second term.

Our new Chair/Secretary/Treasurer are as follows:

Thanks to Sylvain Poux, our outgoing EC member and Treasurer (EC member 2014-2020, Treasurer 2018-2020), for your years of service. Thanks to Sandra Orchard, our outgoing chair (Chair 2018-2020; Sandra will continue on the EC for another year).

Please click here for the composition of the subcommittees. Please note that the Equity, Diversity and Inclusion subcommittee is open to all members; if you would like to join, please reply to this email.

2020 has been quite a year with COVID, quarantines, the Black Lives Matter movement, the US election and more. We feel optimistic about the year to come and we want to serve our community as best we can.

Biocuration 2020 online workshops

As part of the Biocuration 2020 conference, we received excellent workshop proposals from several groups. Since the cancellation of the meeting, we have been working with interested workshop organizers to bring this part of the conference online. We are excited to announce that we now have 3 workshops scheduled for the fall:

  • September 24, 9am PT, 12pm ET, 5pm CET – Biocompute Objects: Methods for communicating provenance of data and analysis 
    • Organizers: Charles Hadley King, Raja Mazumder, Jonathon Keeney; George Washington University
    • Register here
    • Recording here
  • October 29, 12pm PT, 3pm ET, 8pm CET – Gene Wiki: how to synchronize and curate primary sources with and in Wikidata 
    • Organizers: Andra Waagmeester, Lynn Schriml and Sabeh Ul-Hasan, Gene Wiki
    • Register here
    • Recording here
  • Dec 04, 8am PT, 11am ET, 4pm UK, 5pm CET – Biolink Model – A community driven data model for life sciences
    • Organizers: Deepak Unni, Chris Mungall, Lawrence Berkeley National Laboratory
    • Register here
    • Recording here

All workshops will be free to all participants.

There is a Slack workspace set up to facilitate communication between organizers and participants. If you are interested in attending any of these workshops please email biocuration2020 @ gmail.com and we will send you an invite to the Slack workspace.

EXECUTIVE COMMITTEE ELECTION 2020

The election of the new International Society for Biocuration Executive Committee (ISB EC) will be held from September 27 – October 04, 2020.

The list of 7 candidates for 2020 can be viewed here.

The Executive Committee is composed of nine (9) members, each with a 3-year term. Being a member of the Executive Committee is a great way to become directly involved with the work of our society, and contribute to the decisions that are taken on behalf of the biocuration community. We would like to encourage all members interested in running for election to get involved in the process.

Serving on the ISB EC minimally involves attending monthly (1 hour)  teleconference meetings, following up on any action items from meetings, and  promoting the ISB’s activity to members and non-members. Examples of activities performed by EC members include reviewing micro-grant submissions, preparing call for participation for hosting Biocuration meetings, preparing materials for the ISB election, monitoring ISB mail and maintaining the website. There are specific positions such as Chair, Secretary and Treasurer that will require a larger time commitment, as they will be in charge of leading the steps of the EC and by extension the membership.

3 positions on the Executive Committee are up for election in 2020/2021. These positions are currently held by Nicole Vasilevsky, Rama Balakrishnan and Sylvain Poux. Nicole and Rama can re-stand for election. (The current ISB EC members are here.)

2020 Electoral Process

A) The Nominating Committee:

A Nominating Committee (NC) has been formed to oversee the electoral process, to review applications, and establish the final list of candidates. We are very grateful for their assistance with the execution of this election. The members of the 2020 Nominating Committee are TBD.

B) Instructions to Candidates: 

  1. If you would like to run for a position on the Executive Committee, you must first register your intent with the NC by emailing intsocbio@simplelists.com.
  2. Please fill out this form by 28 August 2020, which includes a ‘statement of intent’, a brief biographical sketch, and a ‘conflict of interests’ statement describing any activities, memberships of other associations, editorial positions on journals, etc. (Please email us at intsocbio@simplelists.com if you are unable to access this form.)

C) Timeline:

  • Nominations will be received until 28 August 2020.
  • The NC will review all candidacies and share their selections with the ISB Executive Committee by 14 September 2020.
  • Candidates must be announced to the membership and on the website (with letters of intent) by 21 September 2020.
  • Voting will take place online over the course of one week from 27 September – 04 October 2020. (Further details about the voting process will be shared soon). Sue Bello will act as election officer.
  • Only paying members* with registration fees cleared on or before 21 September 2020 will be allowed to vote. If you pay your registration via bank transfer, please allow at least 2-3 working days for the payment to be processed.

*Note – please contact us at intsocbio@simplelists.com if you have issues with registering or renewing your membership. Known issues exist with our membership payment system.

The Nominating Committee is looking forward to receiving your applications!

ISB statement from the Equity, Inclusion and Diversity committee

This statement is made in response to recent events related to racism and injustice, most recently in the United States but still present everywhere in the world, and in support of the Black Lives Matter movement. The ISB is a society devoted to fostering diversity, equity, and inclusion in the scientific enterprise and in society as a whole. We unequivocally sustain and believe that Black Lives Matter.

Like all STEM organisations, we cannot ignore both the legacy of racism, especially in the life sciences, and the reality of discrimination in our institutions and workplaces. In the specific case of our membership, we are aware that discrimination is present in our field, and it is often reflected in a lack of recognition, defined career-tracks, or opportunities for advancement for people invested in biocuration.

What can we do? The ISB EDI committee was created in response to a necessary discussion about sexism, and we were aware from the start of the necessity of tackling other forms of discrimination and intersectionality. We must accelerate our current efforts, and go beyond them.

Current efforts:

New initiatives:

  • We are currently working on inviting champions of equity, diversity, and inclusion from a variety of institutions to present webinars on the subject of EDI; to share with us more on how the subject is addressed at their institutions and to advise ISB members on what we can do to make a difference
  • We will continue to review and revise, where necessary, our Code of Ethics and Professional Conduct to ensure that we represent the values and beliefs of all of our community

We welcome you to get in touch with any questions, concerns or suggestions. All members of our community are welcome to participate in the EDI Committee. Please email us at: intsocbio@gmail.com.

Annual General Meeting June 30, 2020

You are invited to a virtual Annual General Meeting of the International Society for Biocuration on Tuesday, 30 June, 8am PST, 11am EST, 4:00 PM UK, 5pm western Europe.

https://us02web.zoom.us/j/88975996534

This will take the form of a 30 minute presentation on the Society’s activities followed by a Q&A to the EC.

We look forward to seeing you there.

The ISB Executive Committee


Call in information below:

Topic: ISB AGM
Time: Jun 30, 2020 04:00 PM London

Join Zoom Meeting
https://us02web.zoom.us/j/88975996534
Meeting ID: 889 7599 6534

One tap mobile:
+13126266799,,88975996534# US (Chicago)
+13462487799,,88975996534# US (Houston)

Dial by your location:
+1 312 626 6799 US (Chicago)
+1 346 248 7799 US (Houston)
+1 646 558 8656 US (New York)
+1 669 900 9128 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Germantown)

Meeting ID: 889 7599 6534

Find your local number:

https://us02web.zoom.us/u/k6yJQrJn7–