Fix Funcotator transcript override functionality #9214

Merged: kockan merged 2 commits into master from kockan_funcotator_transcript_fix on Jun 24, 2025

kockan commented Jun 23, 2025

Some of the newer GENCODE GTFs (at the very least GRCh37 liftovers of versions >v19) contain transcript IDs whose version numbers include a secondary number separated with an underscore, e.g. "ENST00000xxxxxx.yy_zz". The currently used regex only considers a single dot character followed by one or more digits and therefore fails to properly remove the version number. This PR should fix that.
Personal note: I don't like regexes in production code anyway, so instead of replacing it with one that works for now, I opted for a cleaner and more maintainable (in my opinion) alternative.
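
To make the failure mode concrete, here is a minimal sketch, not the PR's actual code. The class name `TranscriptIds`, both method names, and the pattern `\.\d+$` are assumptions for illustration; the pattern only approximates the "single dot followed by one or more digits" regex described above. The regex-free variant simply drops everything from the first dot onward, which is one possible alternative and may differ from what was merged.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of GATK.
final class TranscriptIds {

    // Assumed stand-in for the old pattern: one dot followed by digits at the end of the ID.
    private static final Pattern DOT_DIGITS_AT_END = Pattern.compile("\\.\\d+$");

    private TranscriptIds() {}

    // Regex approach: strips ".<digits>" only when nothing follows the digits.
    static String stripVersionWithRegex(final String transcriptId) {
        return DOT_DIGITS_AT_END.matcher(transcriptId).replaceAll("");
    }

    // Regex-free approach: keep only the part before the first '.', if there is one.
    static String stripVersion(final String transcriptId) {
        final int dot = transcriptId.indexOf('.');
        return dot < 0 ? transcriptId : transcriptId.substring(0, dot);
    }

    public static void main(final String[] args) {
        // Made-up IDs in the shapes mentioned above.
        final String plain = "ENST00000123456.7";
        final String liftover = "ENST00000123456.7_2";

        System.out.println(stripVersionWithRegex(plain));     // version removed
        System.out.println(stripVersionWithRegex(liftover));  // left unchanged: "_2" blocks the match
        System.out.println(stripVersion(plain));              // version removed
        System.out.println(stripVersion(liftover));           // version and "_2" suffix removed
    }
}
```

Cutting at the first dot assumes the base transcript ID itself never contains a dot, which holds for the "ENST00000xxxxxx" form shown above.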

In addition, the recently introduced --prefer-mane-transcripts requires the file to be strictly of the GENCODE GTF format, but the hg19 data source with GENCODE v43 (backmapped to GRCh37) contains a GTF in the ENSEMBL GTF format (actually GFF3), making this option unusable. This PR should also fix that.

- Correctly remove the version suffix from transcripts that include an underscore character

- Allow ENSEMBL GTF files to use the --prefer-mane-transcripts option

kockan requested a review from kachulis on June 23, 2025 18:44

gatk-bot commented Jun 23, 2025

Github actions tests reported job failures from actions build 15832447462
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
|---|---|---|---|
| integration | 17.0.6+10 | 15832447462.11 | logs |
| integration | 17.0.6+10 | 15832447462.0 | logs |

kachulis left a comment

Thanks @kockan! just a couple very small comments, and a failing test, then should be good to go

kockan requested a review from kachulis on June 24, 2025 17:39

@@ -886,19 +886,25 @@ private List<GencodeFuncotation> createFuncotationsHelper(final VariantContext v
         List<GencodeGtfTranscriptFeature> transcriptList;

         // Only get basic transcripts if we're using data from Gencode:
-        if ( gtfFeature.getGtfSourceFileType().equals(GencodeGtfCodec.GTF_FILE_TYPE_STRING) ||
-             gtfFeature.getGtfSourceFileType().equals(EnsemblGtfCodec.GTF_FILE_TYPE_STRING)) {
+        if ( gtfFeature.getGtfSourceFileType().equals(GencodeGtfCodec.GTF_FILE_TYPE_STRING) ) {

why split this up like this instead of keeping in the same if block?

ohhh i see, nvm

kachulis left a comment

looks good, thanks @kockan!

kockan merged commit 2c08b8e into master on Jun 24, 2025
21 checks passed
kockan deleted the kockan_funcotator_transcript_fix branch on June 24, 2025 18:04