phphi

phphi's blog


Day27 - Working with PDF Files in Python

2026-4-24

PDF File Operations

Reading and manipulating PDFs with PyPDF2

pip install PyPDF2

Extracting text

import PyPDF2

reader = PyPDF2.PdfReader('test.pdf')
for page in reader.pages:
    # Pages that are only scanned images yield little or no text
    text = page.extract_text()
    print(text)

Rotating pages

reader = PyPDF2.PdfReader('test.pdf')
writer = PyPDF2.PdfWriter()

for page in reader.pages:
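    rotated = page.rotate(90)  # rotate 90° clockwise (angle must be a multiple of 90)
    # rotated = page.rotate(270)  # 90° counter-clockwise; the old rotateCounterClockwise() is deprecated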
    writer.add_page(rotated)

with open('rotated.pdf', 'wb') as f:
    writer.write(f)

Encrypting a PDF

writer = PyPDF2.PdfWriter()
for page in PyPDF2.PdfReader('test.pdf').pages:
    writer.add_page(page)
writer.encrypt('password123')
with open('encrypted.pdf', 'wb') as f:
    writer.write(f)

Merging pages (watermark)

reader1 = PyPDF2.PdfReader('test.pdf')
reader2 = PyPDF2.PdfReader('watermark.pdf')  # PDF whose first page is the watermark
writer = PyPDF2.PdfWriter()
watermark = reader2.pages[0]

for page in reader1.pages:
    page.merge_page(watermark)  # overlay the watermark on top of the page
    writer.add_page(page)

with open('watermarked.pdf', 'wb') as f:
    writer.write(f)

Generating PDFs with reportlab

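pip install reportlab

from reportlab.lib.pagesizes import A4
from reportlab.lib.utils import ImageReader
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfgen import canvas

c = canvas.Canvas('demo.pdf', pagesize=A4)
w, h = A4  # page size in points (1/72 inch); the origin is the bottom-left corner

# Draw an image: x, y of its bottom-left corner, then width and height
img = ImageReader('photo.jpg')
c.drawImage(img, 20, h - 200, 150, 180)

# Finish this page and start a new one
c.showPage()

# Register a TrueType font with Chinese glyphs (the built-in fonts have none)
pdfmetrics.registerFont(TTFont('MyFont', 'chinese_font.ttf'))

# Write text
c.setFont('MyFont', 40)
c.setFillColorRGB(0.9, 0.5, 0.3)
c.drawString(w // 2 - 80, h // 2, '你好,PDF!')  # "Hello, PDF!"

c.save()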

Other tools

Tool            Purpose
pdfminer.six    Text extraction (command line: pdf2txt.py file.pdf)
pdfplumber      Table extraction
pypdf           The actively maintained successor to PyPDF2; use this one

Summary

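  • PyPDF2/pypdf: extract text, rotate pages, encrypt, merge/watermark
  • reportlab: generate PDFs from scratch, with drawing and text
  • Generating Chinese text requires registering a font that contains Chinese glyphs
  • Text extraction only works when the PDF contains real text; scanned documents are images, so their text cannot be extracted without OCR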