Replies: 1 comment
The minerU plugin fails on large files that take longer than ~250–300 seconds because its polling timeout is hardcoded in the plugin code: it polls for the result up to 50 times at a 5-second interval (about 250 seconds of waiting in total), then raises a timeout error if the result isn't ready. These timeout and retry values are not configurable via environment variables or pipeline parameters, which is why you can't find them in your env file.
Task IDs (batch_id) are generated by the backend parsing server for each run and always change, so cache reuse is not possible: the plugin doesn't control or reuse task IDs, nor does it implement any caching mechanism.
There are currently no official configuration options or plugin updates that address these limitations. If this is a blocker, consider requesting a feature or discussing it with the maintainers. For reference, see the minerU plugin's README and recent PRs.
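To make the polling behavior described above concrete, here is a minimal Python sketch of a retry loop with the same defaults (50 attempts, 5 seconds apart). It is not the plugin's actual code: `poll_for_result`, `fetch`, `max_retries`, `interval_s`, and `sleep` are hypothetical names chosen for illustration, and the real plugin may structure its loop differently.

```python
import time
from typing import Callable, Optional


def poll_for_result(
    fetch: Callable[[], Optional[dict]],
    max_retries: int = 50,      # assumed default, mirrors the "50 times" above
    interval_s: float = 5.0,    # assumed default, mirrors the 5-second interval
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until it returns a result or the retry budget runs out.

    fetch is a hypothetical callable that returns None while the backend
    is still working and a dict once the parsed result is ready.
    """
    for _ in range(max_retries):
        result = fetch()
        if result is not None:
            return result
        # Result not ready yet: wait one interval before the next attempt.
        sleep(interval_s)
    raise TimeoutError(
        f"result not ready after {max_retries} attempts "
        f"at {interval_s}s intervals"
    )
```

With the defaults, the loop gives up after roughly 50 × 5 = 250 seconds of waiting, which matches the failure window reported in the question. If the limits were exposed as parameters like this, a call such as `poll_for_result(fetch, max_retries=240)` would allow about 20 minutes. Until something like that exists upstream, the equivalent change would have to be made in a local copy of the plugin.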
Content
When using the minerU plugin, small files are processed without any problems; however, large files fail if the processing time exceeds 300 seconds. Additionally, the relevant configuration options cannot be found in the env file. What should be done? Moreover, the task ID changes with each run, making it impossible to reuse the cache from a previous processing session.
pipeline