@slymattz slymattz commented Jan 23, 2026

Testing the changes

  • I tested the changes in this PR: briefly

How I tested the Python bindings (pypdfium2):

import pypdfium2 as pdfium

pdf = pdfium.PdfDocument.new()
pdf.new_page(width=595, height=842)

output_filename = "test_output.pdf"
pdf.save(output_filename)

print(f"Saved successfully to: {output_filename}")

pdf.close()

import math
import ctypes
import os.path
import PIL.Image
import pypdfium2.raw as pdfium_c

# Load the document
filepath = os.path.abspath("./MOC-61793_Lenarex_tux_linux.pdf")
pdf = pdfium_c.FPDF_LoadDocument((filepath+"\x00").encode("utf-8"), None)

# Check page count to make sure it was loaded correctly
page_count = pdfium_c.FPDF_GetPageCount(pdf)
assert page_count >= 1

# Load the first page and get its dimensions
page = pdfium_c.FPDF_LoadPage(pdf, 0)
width  = math.ceil(pdfium_c.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium_c.FPDF_GetPageHeightF(page))

# Create a bitmap
# (Note, pdfium is faster at rendering transparency if we use BGRA rather than BGRx)
use_alpha = pdfium_c.FPDFPage_HasTransparency(page)
bitmap = pdfium_c.FPDFBitmap_Create(width, height, int(use_alpha))
# Fill the whole bitmap with a white background
# The color is given as a 32-bit integer in ARGB format (8 bits per channel)
pdfium_c.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

# Store common rendering arguments
render_args = (
    bitmap,  # the bitmap
    page,    # the page
    # positions and sizes are to be given in pixels and may exceed the bitmap
    0,       # left start position
    0,       # top start position
    width,   # horizontal size
    height,  # vertical size
    0,       # rotation (as constant, not in degrees!)
    pdfium_c.FPDF_LCD_TEXT | pdfium_c.FPDF_ANNOT,  # rendering flags, combined with binary or
)

# Render the page
pdfium_c.FPDF_RenderPageBitmap(*render_args)

# Get the value of a pointer to the first item of the buffer
buffer_ptrval = pdfium_c.FPDFBitmap_GetBuffer(bitmap)
assert buffer_ptrval, "buffer pointer value must be non-null"
# Cast the pointer value to an actual pointer object so we can access .contents
buffer_ptr = ctypes.cast(buffer_ptrval, ctypes.POINTER(ctypes.c_ubyte))
# Re-interpret as array
buffer = (ctypes.c_ubyte * (width * height * 4)).from_address(ctypes.addressof(buffer_ptr.contents))

# Create a PIL image from the buffer contents
img = PIL.Image.frombuffer("RGBA", (width, height), buffer, "raw", "BGRA", 0, 1)
# Save it as file
img.save("out.png")

# Free resources
pdfium_c.FPDFBitmap_Destroy(bitmap)
pdfium_c.FPDF_ClosePage(page)
pdfium_c.FPDF_CloseDocument(pdf) 
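
The fill color and the buffer length in the raw-API script both follow from the 4-bytes-per-pixel layout. A minimal sketch of that arithmetic, assuming (as the script does) that each row is exactly `width * 4` bytes with no padding. `pack_argb` and `buffer_size` are hypothetical helper names for illustration, not part of pypdfium2:

```python
def pack_argb(alpha: int, red: int, green: int, blue: int) -> int:
    """Pack four 8-bit channels into one 32-bit ARGB integer."""
    for channel in (alpha, red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel value out of range: {channel}")
    # Alpha occupies the highest byte, blue the lowest.
    return (alpha << 24) | (red << 16) | (green << 8) | blue


def buffer_size(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Byte length of a tightly packed bitmap (assumes no row padding)."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    # Opaque white, the value passed to FPDFBitmap_FillRect in the script.
    print(hex(pack_argb(255, 255, 255, 255)))  # 0xffffffff
    # Buffer length for a 595 x 842 pixel bitmap.
    print(buffer_size(595, 842))  # 2003960
```

If the bitmap's stride ever differed from `width * 4`, the array length in the script would need to use the stride instead.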

https://pypi.org/project/pypdfium2/
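
For the first script, the success message only shows that `save()` returned without raising. A cheap extra check, sketched here with the standard library only, is to confirm that the output begins with the `%PDF-` header. The page size 595 x 842 points matches A4 (210 mm x 297 mm) rounded to whole points, which the second helper reproduces. `looks_like_pdf` and `mm_to_points` are hypothetical names, and the header check assumes the file starts directly with the header bytes:

```python
from pathlib import Path


def looks_like_pdf(path) -> bool:
    """Return True if the file starts with the PDF header bytes."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"


def mm_to_points(millimetres: float) -> float:
    """Convert millimetres to PDF points (72 points per inch, 25.4 mm per inch)."""
    return millimetres / 25.4 * 72


if __name__ == "__main__":
    candidate = Path("test_output.pdf")  # file name used by the first script
    if candidate.exists():
        print(f"{candidate} has a PDF header: {looks_like_pdf(candidate)}")
    # A4 in whole points.
    print(round(mm_to_points(210)), round(mm_to_points(297)))  # 595 842
```

This is only a smoke test: it does not validate the document structure, so reopening the file with pypdfium2 remains the stronger check.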

Local build testing

  • I built this PR locally for my native architecture: x86_64-glibc
  • I built this PR locally for these architectures (if supported. mark crossbuilds):
    • aarch64-musl (crossbuild)
    • armv6l-musl (crossbuild)

@slymattz slymattz marked this pull request as draft January 23, 2026 16:54
@slymattz slymattz force-pushed the ocrmypdf_major branch 3 times, most recently from 401cbdd to cce7b7e January 23, 2026 17:39
@slymattz slymattz marked this pull request as ready for review January 23, 2026 18:17
@slymattz slymattz changed the title from "[WIP] python3-ocrmypdf: update to 17.0.0b1" to "python3-ocrmypdf: update to 17.0.0b1" Jan 23, 2026