-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
Issue: Problems with tar extraction logic in _repack_model.py
File:
sagemaker-mlops/src/sagemaker/mlops/workflow/_repack_model.py
There are a few issues in the tar extraction code that cause the “safe member” checks to not work correctly. This affects _get_safe_members(), _is_bad_path(), and custom_extractall_tarfile().
1. _get_safe_members() may iterate over the wrong object
The code currently does:
tar.extractall(path=extract_path, members=_get_safe_members(tar))Iterating over tar (a TarFile object) does not produce TarInfo entries.
It produces file-like objects, so file_info.name, file_info.issym(), etc. fail or behave incorrectly.
Fix:
tar.extractall(path=extract_path, members=_get_safe_members(tar.getmembers()))2. _get_safe_members() uses _get_resolved_path("") as its base directory
This line:
base = _get_resolved_path("")resolves to the current working directory, which is not the directory where the archive content is being extracted. This means the checks are applied against the wrong path.
Recommended fix: pass the actual extract directory as the base.
3. _is_bad_path() uses startswith() to validate paths
The current check:
resolved.startswith(base)is unsafe because paths like /tmp/x2 still “start with” /tmp/x.
This can allow files to escape the intended directory.
Safer fix:
os.path.commonpath([resolved, base]) == baseWhy this matters
Because the code is meant to prevent unsafe paths or symlinks from being extracted, these issues mean the current safety checks don’t reliably work. In some cases they might allow files outside the expected directory, or mistakenly block valid files.