June 9, 2026

How to Extract Transactions from a PDF Bank Statement Automatically

Manually copying bank statement data into spreadsheets is one of the most time-consuming tasks in accounting. Here's how to automate it completely.

The Core Problem

Bank PDFs come in dozens of layouts. Some use tables; others use line-based formats. Dates appear in many patterns. Depending on the country, amounts may use a comma as the thousands separator and a period as the decimal mark, or the reverse. Currency symbols and conventions vary widely.
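
To make the separator problem concrete, here is a minimal sketch of the normalization a hand-rolled parser would need. The function name `parse_amount` and the heuristic (whichever of `,` or `.` appears last is treated as the decimal mark) are illustrative assumptions, not part of the API described in this post. An input with a single separator, such as `1,234`, is genuinely ambiguous, and this sketch reads it as a decimal.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Normalize strings like '1.234,56' or '-1,234.56' to a Decimal.

    Assumption: the separator that appears last is the decimal mark.
    """
    # Keep only digits, separators, and the minus sign (drops currency symbols)
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch in ",.-")
    if cleaned.rfind(",") > cleaned.rfind("."):
        # '.' groups thousands and ',' is the decimal mark
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        # ',' groups thousands and '.' is the decimal mark
        cleaned = cleaned.replace(",", "")
    return Decimal(cleaned)


print(parse_amount("1.234,56"))
print(parse_amount("-1,234.56"))
```

Even this small function has to guess, which is why per-bank rules tend to pile up quickly.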

Template-based extraction breaks whenever a bank changes its layout. AI-powered extraction learns the document's structure and handles variations automatically.

Method 1: REST API (Fastest)

The simplest approach: upload a PDF and get structured JSON back.

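import requests

# Upload the PDF; the context manager closes the file handle afterwards
with open("statement.pdf", "rb") as pdf:
    response = requests.post(
        "https://api.bank-statement-parser.clkr.work/extract",
        headers={"X-Api-Key": "pex_your_api_key"},
        files={"file": pdf},
        timeout=60,  # arbitrary limit so a stalled upload does not hang forever
    )
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body

data = response.json()
print(f"Format: {data['bankKey']}")
print(f"Transactions: {len(data['transactions'])}")

# One line per transaction: date, description (cut to 40 chars), amount
for txn in data['transactions']:
    print(f"{txn['date']} | {txn['description'][:40]:40} | {txn['amount']:>10.2f}")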

Example output:

Format: bank_v1
Transactions: 47
2026-05-01 | MARKET PAYMENT                           |    -245.90
2026-05-02 | SALARY CREDIT                            |    8500.00
2026-05-03 | ELECTRICITY BILL                         |    -189.00

Method 2: Export to Excel/CSV

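import requests
import pandas as pd

with open("statement.pdf", "rb") as pdf:
    resp = requests.post(
        "https://api.bank-statement-parser.clkr.work/extract",
        headers={"X-Api-Key": "pex_your_api_key"},
        files={"file": pdf},
        timeout=60,
    )
resp.raise_for_status()

# Load the transactions into a DataFrame and sort them chronologically
df = pd.DataFrame(resp.json()["transactions"])
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')

df.to_excel("transactions.xlsx", index=False)  # .xlsx output needs openpyxl installed
df.to_csv("transactions.csv", index=False)
print(f"Exported {len(df)} transactions")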

Method 3: Batch Processing

Process an entire folder of bank statements in one go:

import requests
import pandas as pd
from pathlib import Path

API_KEY = "pex_your_api_key"
all_transactions = []
file_count = 0  # files processed successfully

for pdf_path in sorted(Path("./statements").glob("*.pdf")):
    with open(pdf_path, "rb") as pdf:
        resp = requests.post(
            "https://api.bank-statement-parser.clkr.work/extract",
            headers={"X-Api-Key": API_KEY},
            files={"file": pdf},
            timeout=60,
        )
    if not resp.ok:
        # Skip failed files instead of aborting the whole batch
        print(f"✗ {pdf_path.name}: HTTP {resp.status_code}")
        continue
    result = resp.json()
    transactions = result.get("transactions", [])
    for txn in transactions:
        # Tag each row with its source file and detected format
        txn["source"] = pdf_path.name
        txn["format"] = result.get("bankKey", "unknown")
        all_transactions.append(txn)
    file_count += 1
    print(f"✓ {pdf_path.name}: {len(transactions)} transactions")

df = pd.DataFrame(all_transactions)
df.to_excel("all_transactions.xlsx", index=False)
print(f"\nTotal: {len(df)} transactions from {file_count} files")

Method 4: Webhook for Async Processing

For large files or high-volume workflows, use webhooks:

import requests

# Submit the file together with the URL that should receive the result
with open("large_statement.pdf", "rb") as pdf:
    resp = requests.post(
        "https://api.bank-statement-parser.clkr.work/extract",
        headers={
            "X-Api-Key": "pex_your_api_key",
            "X-Webhook-Url": "https://your-app.com/bank-callback"
        },
        files={"file": pdf},
        timeout=60,
    )
resp.raise_for_status()
print("Queued:", resp.json())
# Your webhook receives the result when processing completes
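
On the receiving side, a callback endpoint could look roughly like the sketch below, built on Python's standard `http.server`. The callback format is not documented here, so this assumes the service sends a POST whose JSON body has the same shape as the synchronous response. The names `summarize_callback`, `BankCallbackHandler`, and `serve`, and port 8000, are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(body: bytes) -> str:
    """Summarize a callback body in one line.

    Assumption: the body is a JSON object shaped like the synchronous response.
    """
    payload = json.loads(body)
    transactions = payload.get("transactions", [])
    return f"{payload.get('bankKey', 'unknown')}: {len(transactions)} transactions"


class BankCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly as many bytes as the sender declared
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            print(summarize_callback(body))
            self.send_response(200)
        except (ValueError, AttributeError):
            # Malformed JSON, or JSON that is not an object
            self.send_response(400)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block and handle callbacks until interrupted."""
    HTTPServer(("0.0.0.0", port), BankCallbackHandler).serve_forever()
```

Calling `serve()` starts the listener. In production you would normally put it behind HTTPS and check that each request really comes from the extraction service.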

JSON Response Schema

{
  "bankKey": "bank_v1",
  "pageCount": 3,
  "transactions": [
    {
      "date": "2026-05-01",
      "description": "MARKET PAYMENT",
      "amount": -245.90,
      "balance": 1254.10
    }
  ]
}
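
If you want typed records instead of raw dictionaries, a response like the one above can be loaded as in the sketch below. It parses numbers as `Decimal` to avoid floating-point rounding and flags rows whose running balance does not add up. The names `Transaction`, `load_transactions`, and `find_balance_gaps` are hypothetical. The check also assumes that transactions arrive in chronological order and that `balance` is the balance after each transaction, which the example response does not confirm.

```python
import datetime as dt
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Transaction:
    date: dt.date
    description: str
    amount: Decimal
    balance: Decimal


def load_transactions(raw: str) -> list[Transaction]:
    """Parse a response body into typed records, keeping amounts exact."""
    payload = json.loads(raw, parse_float=Decimal, parse_int=Decimal)
    return [
        Transaction(
            date=dt.date.fromisoformat(t["date"]),
            description=t["description"],
            amount=t["amount"],
            balance=t["balance"],
        )
        for t in payload["transactions"]
    ]


def find_balance_gaps(txns: list[Transaction]) -> list[int]:
    """Return indexes where balance != previous balance + amount."""
    return [
        i
        for i in range(1, len(txns))
        if txns[i - 1].balance + txns[i].amount != txns[i].balance
    ]
```

An empty list from `find_balance_gaps` suggests no rows were dropped or misread. Any returned index points at a transaction worth checking against the original PDF.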

Supported Formats

Text-based PDFs from any bank in any country are supported. When the system encounters an unknown format, it automatically analyzes the PDF structure and builds a parsing profile that is ready for the next request, with no manual work.

Free Tier

100 pages/month free. No credit card. [Sign up →](https://bank-statement-parser.clkr.work/en/register)
