PDF->SVG/PNG: fusing Poppler’s battle-tested PDF engine with Cairo’s SVG output surface, then wraps the native glue in a tidy C# API via [DllImport]
Building a Cross-Platform PDF-to-SVG Bridge for .NET
The business problem
A few weeks ago a customer asked for PDF pages to appear inside a generated DOCX report.
Converting each page to raster images (PNG/JPEG) ballooned the file size and slowed down the pipeline.
Embedding raw PDF was out - the Word rendering engine would not display it reliably.
Goal: “If a page can be expressed as scalable vector graphics, embed it as SVG; otherwise fall back to a single PNG bitmap.”
That single sentence drove the rest of the journey.
Designing the native engine
Choosing the tools
Poppler - mature PDF parser / renderer.
Cairo - can paint to many back-ends, including an SVG and PNG surfaces.
Together they let us walk the display list that a PDF page contains and stream the final drawing instructions into memory - no temp files required.
A minimal C API
The C++ core is wrapped in five plain-C entry points so any language with FFI can bind to it:
/* pdf2svg.h - simplified */
void* pdf_open_doc(const unsigned char* bytes, int len, int* out_page_count);
void pdf_close_doc(void* doc);
unsigned char* pdf_get_page_data(
void* doc, int page_num,
bool force_to_png, /* fallback switch */
int dpi, /* raster DPI for PNG pages */
int* out_len,
bool* out_is_svg /* tells caller PNG vs SVG */
);
void pdf_release_buffer(unsigned char* buffer);
Under the hood pdf_get_page_data
looks at Poppler’s text mapping to decide whether the page is vector-friendly. If it’s an “image only” page (e.g., a scanned sheet), Cairo renders it onto a PNG surface at the requested DPI.
Implementation of the - pdf_get_page_data
unsigned char*
pdf_get_page_data(void* doc_handle,
int page_num,
bool force_to_png,
int dpi,
int* out_len,
bool* out_is_svg)
{
auto* ctx = static_cast<PdfDoc*>(doc_handle);
PopplerPage* page = poppler_document_get_page(ctx->doc, page_num);
if (!page) {
*out_len = 0;
*out_is_svg = false;
return nullptr;
}
double w, h;
poppler_page_get_size(page, &w, &h);
bool has_text = false;
bool image_only = false;
if (!force_to_png) {
char* txt = poppler_page_get_text(page);
has_text = (txt && *txt != '\0');
g_free(txt);
GList* images = poppler_page_get_image_mapping(page);
const double tol = 0.5;
for (GList* iter = images; iter; iter = iter->next) {
auto* m = static_cast<PopplerImageMapping*>(iter->data);
if (fabs(m->area.x1) < tol &&
fabs(m->area.y1) < tol &&
fabs(m->area.x2 - w) < tol &&
fabs(m->area.y2 - h) < tol)
{
image_only = true;
break;
}
}
poppler_page_free_image_mapping(images);
} else {
image_only = true;
}
std::vector<unsigned char> buf;
if (image_only && !has_text) {
if (dpi <= 0)
dpi = 72;
double scale = dpi / 72.0;
int width_px = static_cast<int>(std::ceil(w * scale));
int height_px = static_cast<int>(std::ceil(h * scale));
// Rasterize to PNG
cairo_surface_t* img_surf = cairo_image_surface_create(
CAIRO_FORMAT_ARGB32,
width_px,
height_px);
cairo_t* img_cr = cairo_create(img_surf);
cairo_scale(img_cr, scale, scale);
poppler_page_render(page, img_cr);
cairo_destroy(img_cr);
cairo_surface_write_to_png_stream(img_surf, write_cb, &buf);
cairo_surface_destroy(img_surf);
*out_is_svg = false;
} else {
// Emit SVG
cairo_surface_t* svg_surf = cairo_svg_surface_create_for_stream(
write_cb, &buf, w, h);
cairo_t* svg_cr = cairo_create(svg_surf);
poppler_page_render_for_printing(page, svg_cr);
cairo_destroy(svg_cr);
cairo_surface_destroy(svg_surf);
*out_is_svg = true;
}
g_object_unref(page);
*out_len = static_cast<int>(buf.size());
if (*out_len == 0) {
*out_len = 0;
*out_is_svg = false;
return nullptr;
}
unsigned char* out = static_cast<unsigned char*>(std::malloc(buf.size()));
if (!out) {
*out_len = 0;
*out_is_svg = false;
return nullptr;
}
std::memcpy(out, buf.data(), buf.size());
return out;
}
Key decisions
Single marshalled buffer. The native side allocates one contiguous block so any .NET caller - Mono, CoreCLR, AOT - can copy and free safely.
ByteSink
is a trivialstd::vector<uint8_t>
wrapperno libpng or gzip is pulled in - Cairo writes PNG/SVG bytes directly.
Tolerance for "mixed" pages. If any vector/text survives, keep SVG
otherwise fall back to PNG. That rule kept our Word files tiny yet visually identical.
Building the shared object / DLL
The repository is set up with CMake + vcpkg:
git clone https://github.com/Forevka/pdf2svgwrapper
cd pdf2svgwrapper
git submodule update --init --recursive # pulls vcpkg
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .
This produces pdf2svgwrapper.dll
+ bunch dependencies (Windows) or libpdf2svgwrapper.so
(Linux).
Shipping it to .NET
P/Invoke bindings
Because the native API is pure C, the C# side is a handful of [DllImport]
declarations:
static class NativeMethods
{
[DllImport("native-svg2pdf/pdf2svgwrapper", CallingConvention = CallingConvention.Cdecl)]
public static extern IntPtr pdf_open_doc(
IntPtr pdfData,
int pdfLen,
out int pageCount
);
[DllImport("native-svg2pdf/pdf2svgwrapper", CallingConvention = CallingConvention.Cdecl)]
public static extern IntPtr pdf_get_page_data(
IntPtr docHandle,
int pageNum,
bool isForcePng,
int dpi,
out int dataLen,
out bool isSvg
);
[DllImport("native-svg2pdf/pdf2svgwrapper", CallingConvention = CallingConvention.Cdecl)]
public static extern void pdf_close_doc(IntPtr docHandle);
[DllImport("native-svg2pdf/pdf2svgwrapper", CallingConvention = CallingConvention.Cdecl)]
public static extern void pdf_release_buffer(IntPtr ptr);
}
A thin Pdf2SvgInterop
class converts those raw pointers into managed byte[]
or MemoryStream
.
Explaining Pdf2SvgInterop.ConvertPdfPages
public static IEnumerable<PdfPageData> ConvertPdfPages(byte[] pdfBytes, bool isForceToPng, int dpi = 300)
{
// pin the managed array
var handle = GCHandle.Alloc(pdfBytes, GCHandleType.Pinned);
try
{
IntPtr ptr = NativeMethods.pdf_open_doc(handle.AddrOfPinnedObject(), pdfBytes.Length, out int pageCount);
if (ptr == IntPtr.Zero)
throw new PopplerCairoConvertationException("Failed to open PDF.");
try
{
for (int i = 0; i < pageCount; i++)
{
IntPtr dataBuf = NativeMethods.pdf_get_page_data(ptr, i, isForceToPng, dpi, out int dataLen, out
var isSvg);
if (dataBuf == IntPtr.Zero) throw new PopplerCairoConvertationException($ "Page {i} conversion failed.");
try
{
var dataBytes = new byte[dataLen];
Marshal.Copy(dataBuf, dataBytes, 0, dataLen);
yield return new PdfPageData
{
Data = new MemoryStream(dataBytes, writable: false),
IsSvg = isSvg,
};
}
finally
{
NativeMethods.pdf_release_buffer(dataBuf);
}
}
}
finally
{
NativeMethods.pdf_close_doc(ptr);
}
}
finally
{
handle.Free();
}
}
Why this shape?
Concern | How the method addresses it |
---|---|
GC pinning | We never hand managed memory to C++; instead C++ allocates, C# copies. |
Large PDFs | Because we use generators: |
Thread safety | Each call operates on its own document handle - no global state. |
AOT friendliness | Pure |
Back-pressure | The caller decides when to dispose or stream each |
No SafeHandle
close-over a try/finally
instead - this keeps allocations to a minimum in the hot loop.
Cross-platform lookup is handled by the NuGet’s runtimes/*/native/
layout, so pdf2svgwrapper
resolves to pdf2svgwrapper.dll
, .so
, or .dylib
automatically.
NuGet Access
This package can be added to your project via this command
dotnet add package PDF2SVG.PopplerCairo.Bindings
That does three things automatically:
What happens | |
---|---|
Managed DLL is referenced ( | Gives you the |
Native binaries are copied to | The correct |
| You can |
Example of usage
The sample program shows basic usage:
byte[] pdfBytes = File.ReadAllBytes("./input-2.pdf");
var pageData = Pdf2SvgInterop.ConvertPdfPages(pdfBytes, true); // optional dpi can be provided
var index = 0;
foreach (var pdfPageData in pageData)
{
if (pdfPageData.IsSvg)
File.WriteAllBytes($"./output-{index}.svg", pdfPageData.Data.ToArray());
else
{
File.WriteAllBytes($"./output-{index}.png", pdfPageData.Data.ToArray());
}
Console.WriteLine($"Pdf page {index} processed");
index++;
}
Console.WriteLine("Pdf processed");
This will read input-2.pdf
from the project directory and create output-{index}.svg
or output-{index}.png
for each page.
Where to Grab the Code
Repository | ||
---|---|---|
Native C++ engine - Poppler + Cairo wrapper | CMake project, minimal C API, vcpkg manifests, ready to cross-compile Windows / Linux | |
.NET bindings & convenience helpers |
| |
Published NuGet package | https://www.nuget.org/packages/PDF2SVG.PopplerCairo.Bindings | Pre-built managed DLL plus platform-specific native binaries |
End-to-end demo |
| Opens a PDF, streams each page to SVG/PNG in-memory, and drops them into a disk |
Conclusion
In under a week I went from "Word reports with blurry screenshots" to a drop-in .NET
helper that streams crisp SVG pages straight into Open XML. The recipe was simple:
Leverage battle-tested C libraries (Poppler + Cairo).
Hide them behind a five-function C facade.
Use P/Invoke to light them up in C#.
Pack the native bits so downstream teams just reference a NuGet.
The result feels almost unfair: a few hundred lines of C++, a few dozen lines of C#, and suddenly every .NET app gets a PDF-to-SVG super-power.
So next time you bump into a "native only" library, remember - bridging C/C++ to C# is usually the easy part. The real magic is deciding exactly the problem you want that native code to solve.