-
Notifications
You must be signed in to change notification settings - Fork 3
Extract id3 tags by regex #175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| "-show_streams", | ||
| "-show_format", | ||
| "-show_entries", | ||
| "format_tags", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what actually causes ffprobe to detect and return the id3 tags
src/lambdas/inspect/index.js
Outdated
| * @property {string} Type | ||
| * @property {string} [Type] | ||
| * @property {boolean} [EBUR128] | ||
| * @property {string} [MatchTags] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a Task attribute to pass in a regex that can be used to select and return only certain tags
src/lambdas/inspect/audio.js
Outdated
|
|
||
| // use regex to extract only the matching tags | ||
| Object.keys(tags).forEach((key) => { | ||
| if (regex.test(key) || regex.test(tags[key])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checks if there is a matching tag by key or value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts about the API for this feature:
-
I think it would be good to allow the job definition to explicitly decide if it's matching on tag key and/or tag value
-
I could possibly imagine wanting to provide other comparison methods down the road besides regex, so for future proofing it may be worth being more verbose about that in the API, so we don't end up with a single parameter that paints us into a corner
-
Right now we're only operating on tags coming back from FFprobe, and we don't run all files that are inspectable through FFmpeg. So for example, if a job inspects a JPEG, which can have tag-like things, this wouldn't do anything, even if they job asks for tags that are present on the file. We should make sure the docs are very clear about what types of metadata this will work with, and, if necessary, for which file formats. Right now this wouldn't even return tags for video files, which we do inspect with FFprobe, so there's a bit of ambiguity in how this feature works at the moment.
I think it'd be best to avoid making this something like
MatchId3Tagsat the API level, and that's not even really what it's doing already, but if we do want to make the choice that this isn't a a general-purpose "find metadata key/values" feature, I'd want to make sure the API reflects that.My preference would be to have reasonable support for standard types of KV metadata on audio, video, and images. Audio and video I think we can rely on the FFprobe stuff already in here, and the
sharplib we're already using for images has solid support for Xmp/EXIF, so it's probably not hard to expand this over to that. I think those two tools normalize access to tags/metadata across a lot of different formats to a pretty reasonable degree to call that a good starting point.
I'm imagining something like this for the API (structure, not naming)
Task: {
Type: "Inspect",
ReturnMetadata: {
Keys: {
StringLike: "AIS_AD_BREAK_"
},
Values: {
StringLike: "AIS_AD_BREAK_"
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with most of this; I'm not sure about expanding the scope of this ticket to include how we handle metadata on anything that might go through Porter, but I agree current limits could be documented, and issues added for building out the image or other support.
Adobe XMP is also included in the id3 of many of the files I looked at, but ffprobe on its own doesn't do a great job with it - additional processing or a different tool might (e.g. the xmp is returned hex encoded, ffprobe doesn't handle that, or parsing the xml)
| ); | ||
| expect(result.Task).toEqual("Inspect"); | ||
| expect(result.Inspection.Audio.Tags).toEqual({ | ||
| comment: "AIS_AD_BREAK_1=2000,0;", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returns the comment that matches
| The project includes a `Makefile` to centralize various commands used in development and operation. When new commands are required, be sure to add them to the `Makefile`, even if it calls out to something else, like `npm run-script`. | ||
|
|
||
| Besides language runtimes, package managers, and command line programs, tooling should be written to assume that libraries are installed within the package, not globally available on the system. I.e, do not asusme the `eslint` or `prettier` are installed globally. | ||
| Besides language runtimes, package managers, and command line programs, tooling should be written to assume that libraries are installed within the package, not globally available on the system. I.e, do not assume the `eslint` or `prettier` are installed globally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some spell check changes on here
| { | ||
| "name": "porter", | ||
| "version": "1.0.0", | ||
| "type": "module", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what got me past import errors!
farski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional thinking around how best to handle this feature for the majority of files that get inspected.
Also, need to update the README for this new functionality.
If additional tests are going to added, I would do them in a branch off this branch, so this PR doesn't get too overloaded.
src/lambdas/inspect/audio.js
Outdated
|
|
||
| // use regex to extract only the matching tags | ||
| Object.keys(tags).forEach((key) => { | ||
| if (regex.test(key) || regex.test(tags[key])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts about the API for this feature:
-
I think it would be good to allow the job definition to explicitly decide if it's matching on tag key and/or tag value
-
I could possibly imagine wanting to provide other comparison methods down the road besides regex, so for future proofing it may be worth being more verbose about that in the API, so we don't end up with a single parameter that paints us into a corner
-
Right now we're only operating on tags coming back from FFprobe, and we don't run all files that are inspectable through FFmpeg. So for example, if a job inspects a JPEG, which can have tag-like things, this wouldn't do anything, even if they job asks for tags that are present on the file. We should make sure the docs are very clear about what types of metadata this will work with, and, if necessary, for which file formats. Right now this wouldn't even return tags for video files, which we do inspect with FFprobe, so there's a bit of ambiguity in how this feature works at the moment.
I think it'd be best to avoid making this something like
MatchId3Tagsat the API level, and that's not even really what it's doing already, but if we do want to make the choice that this isn't a a general-purpose "find metadata key/values" feature, I'd want to make sure the API reflects that.My preference would be to have reasonable support for standard types of KV metadata on audio, video, and images. Audio and video I think we can rely on the FFprobe stuff already in here, and the
sharplib we're already using for images has solid support for Xmp/EXIF, so it's probably not hard to expand this over to that. I think those two tools normalize access to tags/metadata across a lot of different formats to a pretty reasonable degree to call that a good starting point.
I'm imagining something like this for the API (structure, not naming)
Task: {
Type: "Inspect",
ReturnMetadata: {
Keys: {
StringLike: "AIS_AD_BREAK_"
},
Values: {
StringLike: "AIS_AD_BREAK_"
}
}
}
farski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is getting pretty close
src/lambdas/inspect/audio.js
Outdated
| /** @typedef {import('./index.js').InspectTask} InspectTask */ | ||
|
|
||
| /** | ||
| * @typedef {{ [key: string]: string }} Tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it makes sense to elevate metadata to the top level of the Inspection object? I can't think of any cases where individual streams with in a file would have different metadata like what we're dealing with here (e.g., tags). I guess technically everything inspect is returning is metadata in a way, but we don't currently call things like bit depth or video resolution that.
So for example, once we support tag matching on other file types, does it make sense that those tags end up in different places in the results, depending on if the file is video/audio/image?
It could be useful to say that we treat this type of returnable metadata as a file-level aspect, and we normalize things so that no matter where in the file or which part of the inspection process it comes from, things like tags would always end up under Inspection.Metadata.Tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have a strong opinion. I think almost everything we return is a kind of metadata, so I would perhaps just call these "tags" as they are arbitrary data added to the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the issue I see with pulling this metadata up out of the Audio is that it messes up how data is returned now from each of the different inspect tasks, that index.js would have to get reworked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to put it is that a single inspect task can include multiple return values, like when an audio file could include an image in the id3 data, so it makes sense to have the metadata for the audio and image separated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to resolve this unless you feel strongly, and have a way to handle combined metadata from each inspect type?
| import { inspect as ebur128 } from "./ebu-r-128.js"; | ||
| import { inspect as ffprobe } from "./ffprobe.js"; | ||
| import { inspect as mpck } from "./mpck.js"; | ||
| import { ffprobeTags } from "./tags.js"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the tags logic in a shared util file, so the types and methods could be reused
| * @property {number} [LoudnessTruePeak] | ||
| * @property {number} [LoudnessRange] | ||
| * @property {number} [UnidentifiedBytes] | ||
| * @property {Tag[]} [Tags] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This used to be a hash, but is now a list of objects, as the same key may be used for more than one tag.
|
A few changes here -
|
implements #174