
Reimplement conflation to allow on-the-fly conflation #224

@gaurav


One thing that has really helped with identifying the correct clique is to (hugely) prefer exact matches on the preferred name over non-exact matches. One problem with conflating identifiers before loading them into NameRes is that, unless we've chosen exactly the right name for the overall clique, it can make it harder to find that clique, and harder for other, non-exact matches to override it.
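A minimal sketch of why pre-conflation interacts badly with that preference, assuming a toy in-memory index. The `Clique` class, the scoring weights, and the `HYPOTHETICAL:acetaminophen` identifier are all illustrative; this is not how NameRes actually scores results.

```python
from dataclasses import dataclass, field

# Hypothetical weights; NameRes's real scoring is not modelled here.
EXACT_PREFERRED_NAME_BOOST = 100.0
EXACT_SYNONYM_SCORE = 10.0
PARTIAL_MATCH_SCORE = 1.0


@dataclass
class Clique:
    curie: str  # identifier of the clique leader
    preferred_name: str
    synonyms: list[str] = field(default_factory=list)


def score(query: str, clique: Clique) -> float:
    """Score a clique, hugely preferring an exact preferred-name match."""
    q = query.casefold()
    names = [clique.preferred_name, *clique.synonyms]
    total = 0.0
    if clique.preferred_name.casefold() == q:
        total += EXACT_PREFERRED_NAME_BOOST
    if any(name.casefold() == q for name in names):
        total += EXACT_SYNONYM_SCORE
    elif any(q in name.casefold() for name in names):
        total += PARTIAL_MATCH_SCORE
    return total


def search(query: str, cliques: list[Clique]) -> list[Clique]:
    """Return matching cliques, best first (non-matches are dropped)."""
    scored = [(score(query, c), c) for c in cliques]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


# Toy data: "HYPOTHETICAL:acetaminophen" is a placeholder identifier.
tylenol = Clique("UMLS:C0699142", "Tylenol")
acetaminophen = Clique("HYPOTHETICAL:acetaminophen", "Acetaminophen", ["Paracetamol"])

# Pre-conflated: both cliques merged under the acetaminophen name before indexing,
# so "Tylenol" survives only as a synonym.
merged = Clique(
    acetaminophen.curie,
    acetaminophen.preferred_name,
    [*acetaminophen.synonyms, tylenol.preferred_name],
)
```

In the unconflated index, `score("tylenol", tylenol)` is 110.0; once Tylenol has been folded into the merged clique, the same query only earns the 10.0 synonym score, so any other clique whose preferred name happened to match exactly would outrank it.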

One way of fixing that would be to revert to our previous plan for handling conflations, which is to load the conflation files into memory and apply conflation on the fly, as follows:

  1. You can search NameRes without conflation turned on to get individual clique entries. This is often what you want anyway, although you may then need to conflate the results manually afterwards.
  2. You can search NameRes with conflation turned on: we search for the best matches and, if a match has a conflated identifier, we expand it on the fly to include all the other cliques in that conflation. We also add some metadata to indicate what is being conflated.
  3. If you want to look up a CURIE, we return the unconflated clique, but can (optionally?) also include the list of other CURIEs that we would conflate it with under a particular conflation.
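The three modes above can be sketched as follows, assuming each conflation file has been read into memory as one JSON array of CURIEs per line, with the first CURIE acting as the conflation leader. That file format, the function names, and the response fields (`matched`, `conflated`, `would_conflate_with`) are all hypothetical. Mode 1 is simply the ranked list returned unchanged; one `conflations` dictionary would be built per conflation.

```python
import json
from collections.abc import Iterable


def load_conflations(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map every CURIE to the full conflation group it belongs to.

    Assumes one JSON array of CURIEs per line, first CURIE = leader
    (an assumption, not a documented file format).
    """
    index: dict[str, list[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group = json.loads(line)
        for curie in group:
            index[curie] = group
    return index


def expand_results(ranked: list[str], conflations: dict[str, list[str]]) -> list[dict]:
    """Mode 2: expand already-ranked matches into their conflations.

    Ranking happens first, so an exact match on an individual clique
    still decides where its conflated group appears in the results.
    """
    seen: set[str] = set()
    expanded = []
    for curie in ranked:
        if curie in seen:
            continue  # already covered by a higher-ranked conflation
        group = conflations.get(curie, [curie])
        seen.update(group)
        expanded.append({
            "curie": group[0],    # conflation leader
            "matched": curie,     # the clique that actually matched the query
            "conflated": group,   # metadata: which cliques were conflated
        })
    return expanded


def lookup(curie: str, conflations: dict[str, list[str]],
           show_conflation: bool = False) -> dict:
    """Mode 3: return the unconflated clique, optionally listing its partners."""
    result: dict = {"curie": curie}
    if show_conflation:
        group = conflations.get(curie, [curie])
        result["would_conflate_with"] = [c for c in group if c != curie]
    return result
```

With a conflation line such as `["HYPOTHETICAL:acetaminophen", "UMLS:C0699142"]`, a search whose top hit is `UMLS:C0699142` would return a single expanded entry led by the placeholder acetaminophen CURIE, while `matched` still records that Tylenol was the clique that matched.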

This would at least fix Tylenol, which has an exact match in UMLS:C0699142 but would then be combined into acetaminophen when conflation is applied. It might help with other cases, too.

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests