Skip to content

Conversation

@kookster
Copy link
Member

@kookster kookster commented Oct 18, 2025

implements #174

"-show_streams",
"-show_format",
"-show_entries",
"format_tags",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what actually causes ffprobe to detect and return the id3 tags

* @property {string} Type
* @property {string} [Type]
* @property {boolean} [EBUR128]
* @property {string} [MatchTags]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a Task attribute to pass in a regex that can be used to select and return only certain tags


// use regex to extract only the matching tags
Object.keys(tags).forEach((key) => {
if (regex.test(key) || regex.test(tags[key])) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checks if there is a matching tag by key or value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts about the API for this feature:

  1. I think it would be good to allow the job definition to explicitly decide if it's matching on tag key and/or tag value

  2. I could possibly imagine wanting to provide other comparison methods down the road besides regex, so for future proofing it may be worth being more verbose about that in the API, so we don't end up with a single parameter that paints us into a corner

  3. Right now we're only operating on tags coming back from FFprobe, and we don't run all files that are inspectable through FFmpeg. So for example, if a job inspects a JPEG, which can have tag-like things, this wouldn't do anything, even if they job asks for tags that are present on the file. We should make sure the docs are very clear about what types of metadata this will work with, and, if necessary, for which file formats. Right now this wouldn't even return tags for video files, which we do inspect with FFprobe, so there's a bit of ambiguity in how this feature works at the moment.

    I think it'd be best to avoid making this something like MatchId3Tags at the API level, and that's not even really what it's doing already, but if we do want to make the choice that this isn't a a general-purpose "find metadata key/values" feature, I'd want to make sure the API reflects that.

    My preference would be to have reasonable support for standard types of KV metadata on audio, video, and images. Audio and video I think we can rely on the FFprobe stuff already in here, and the sharp lib we're already using for images has solid support for Xmp/EXIF, so it's probably not hard to expand this over to that. I think those two tools normalize access to tags/metadata across a lot of different formats to a pretty reasonable degree to call that a good starting point.

I'm imagining something like this for the API (structure, not naming)

Task: {
    Type: "Inspect",
    ReturnMetadata: {
        Keys: {
            StringLike: "AIS_AD_BREAK_"
        },
        Values: {
            StringLike: "AIS_AD_BREAK_"
        }
    }
}

Copy link
Member Author

@kookster kookster Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with most of this; I'm not sure about expanding the scope of this ticket to include how we handle metadata on anything that might go through Porter, but I agree current limits could be documented, and issues added for building out the image or other support.

Adobe XMP is also included in the id3 of many of the files I looked at, but ffprobe on its own doesn't do a great job with it - additional processing or a different tool might (e.g. the xmp is returned hex encoded, ffprobe doesn't handle that, or parsing the xml)

);
expect(result.Task).toEqual("Inspect");
expect(result.Inspection.Audio.Tags).toEqual({
comment: "AIS_AD_BREAK_1=2000,0;",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns the comment that matches

The project includes a `Makefile` to centralize various commands used in development and operation. When new commands are required, be sure to add them to the `Makefile`, even if it calls out to something else, like `npm run-script`.

Besides language runtimes, package managers, and command line programs, tooling should be written to assume that libraries are installed within the package, not globally available on the system. I.e, do not asusme the `eslint` or `prettier` are installed globally.
Besides language runtimes, package managers, and command line programs, tooling should be written to assume that libraries are installed within the package, not globally available on the system. I.e, do not assume the `eslint` or `prettier` are installed globally.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some spell check changes on here

{
"name": "porter",
"version": "1.0.0",
"type": "module",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what got me past import errors!

Copy link
Member

@farski farski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional thinking around how best to handle this feature for the majority of files that get inspected.

Also, need to update the README for this new functionality.

If additional tests are going to added, I would do them in a branch off this branch, so this PR doesn't get too overloaded.


// use regex to extract only the matching tags
Object.keys(tags).forEach((key) => {
if (regex.test(key) || regex.test(tags[key])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts about the API for this feature:

  1. I think it would be good to allow the job definition to explicitly decide if it's matching on tag key and/or tag value

  2. I could possibly imagine wanting to provide other comparison methods down the road besides regex, so for future proofing it may be worth being more verbose about that in the API, so we don't end up with a single parameter that paints us into a corner

  3. Right now we're only operating on tags coming back from FFprobe, and we don't run all files that are inspectable through FFmpeg. So for example, if a job inspects a JPEG, which can have tag-like things, this wouldn't do anything, even if they job asks for tags that are present on the file. We should make sure the docs are very clear about what types of metadata this will work with, and, if necessary, for which file formats. Right now this wouldn't even return tags for video files, which we do inspect with FFprobe, so there's a bit of ambiguity in how this feature works at the moment.

    I think it'd be best to avoid making this something like MatchId3Tags at the API level, and that's not even really what it's doing already, but if we do want to make the choice that this isn't a a general-purpose "find metadata key/values" feature, I'd want to make sure the API reflects that.

    My preference would be to have reasonable support for standard types of KV metadata on audio, video, and images. Audio and video I think we can rely on the FFprobe stuff already in here, and the sharp lib we're already using for images has solid support for Xmp/EXIF, so it's probably not hard to expand this over to that. I think those two tools normalize access to tags/metadata across a lot of different formats to a pretty reasonable degree to call that a good starting point.

I'm imagining something like this for the API (structure, not naming)

Task: {
    Type: "Inspect",
    ReturnMetadata: {
        Keys: {
            StringLike: "AIS_AD_BREAK_"
        },
        Values: {
            StringLike: "AIS_AD_BREAK_"
        }
    }
}

@kookster kookster changed the title Extract id3 tags by regex WIP: Extract id3 tags by regex Oct 21, 2025
@kookster kookster changed the title WIP: Extract id3 tags by regex Extract id3 tags by regex Oct 31, 2025
@kookster kookster requested a review from farski October 31, 2025 19:09
Copy link
Member

@farski farski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is getting pretty close

/** @typedef {import('./index.js').InspectTask} InspectTask */

/**
* @typedef {{ [key: string]: string }} Tags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes sense to elevate metadata to the top level of the Inspection object? I can't think of any cases where individual streams with in a file would have different metadata like what we're dealing with here (e.g., tags). I guess technically everything inspect is returning is metadata in a way, but we don't currently call things like bit depth or video resolution that.

So for example, once we support tag matching on other file types, does it make sense that those tags end up in different places in the results, depending on if the file is video/audio/image?

It could be useful to say that we treat this type of returnable metadata as a file-level aspect, and we normalize things so that no matter where in the file or which part of the inspection process it comes from, things like tags would always end up under Inspection.Metadata.Tags.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a strong opinion. I think almost everything we return is a kind of metadata, so I would perhaps just call these "tags" as they are arbitrary data added to the file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue I see with pulling this metadata up out of the Audio is that it messes up how data is returned now from each of the different inspect tasks, that index.js would have to get reworked.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to put it is that a single inspect task can include multiple return values, like when an audio file could include an image in the id3 data, so it makes sense to have the metadata for the audio and image separated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to resolve this unless you feel strongly, and have a way to handle combined metadata from each inspect type?

import { inspect as ebur128 } from "./ebu-r-128.js";
import { inspect as ffprobe } from "./ffprobe.js";
import { inspect as mpck } from "./mpck.js";
import { ffprobeTags } from "./tags.js";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the tags logic in a shared util file, so the types and methods could be reused

* @property {number} [LoudnessTruePeak]
* @property {number} [LoudnessRange]
* @property {number} [UnidentifiedBytes]
* @property {Tag[]} [Tags]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to be a hash, but is now a list of objects, as the same key may be used for more than one tag.

@kookster
Copy link
Member Author

kookster commented Nov 3, 2025

A few changes here -

  • Added support for video to audio tags, extracted common ffmpeg tag extraction
  • Added support for sharp extracted png comments, which operate like tags
  • Because comments and possible id3 tags can have repeated labels, use an array of objects instead of a map for them
  • StringMatches instead of StringIncludes to match regex functionality
    @farski I think we're even closer?

@kookster kookster requested a review from farski November 3, 2025 14:51
@kookster kookster merged commit 6313dab into main Nov 4, 2025
4 checks passed
@kookster kookster deleted the feat/id3_breaks branch November 4, 2025 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants