How to Extract Transactions from a PDF Bank Statement Automatically
Manually copying bank statement data into spreadsheets is one of the most time-consuming tasks in accounting. Here's how to automate it completely.
The Core Problem
Bank PDFs come in dozens of different layouts. Some use tables, others use line-based formats. Dates appear as "01 Jan 2026", "2026-01-01", or "01/01/26". Depending on the country, amounts may use a comma as the thousands separator and a period as the decimal mark ("1,234.56"), or the reverse ("1.234,56").
Template-based extraction breaks whenever a bank updates its layout. AI-powered extraction learns each document's structure instead, so it handles these variations without a hand-written template per bank.
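To make the format variation concrete, here is a minimal sketch of what normalizing these values by hand involves. The helper names `parse_date` and `parse_amount` are hypothetical and are not part of any API described in this article. The amount heuristic assumes that whichever of `,` or `.` appears last is the decimal mark, which is wrong for inputs such as "1,234" with no decimals, and a date like "01/02/26" stays ambiguous between day-first and month-first.

```python
from datetime import date, datetime
from decimal import Decimal

# The three layouts quoted above; illustrative, not exhaustive.
# "%b" matches English month abbreviations under the default C locale.
DATE_FORMATS = ("%d %b %Y", "%Y-%m-%d", "%d/%m/%y")


def parse_date(text: str) -> date:
    """Try each known layout in turn and return the first that matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def parse_amount(text: str) -> Decimal:
    """Guess the decimal mark: whichever of ',' or '.' appears last."""
    s = text.strip().replace(" ", "")
    if s.rfind(",") > s.rfind("."):
        # Comma is last, e.g. 1.234,56: dots are thousands separators
        s = s.replace(".", "").replace(",", ".")
    else:
        # Dot is last (or no separator), e.g. 1,234.56
        s = s.replace(",", "")
    return Decimal(s)
```

Every new layout means another entry in `DATE_FORMATS` or another special case in `parse_amount`, which is the maintenance burden the rest of this article avoids.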
Method 1: REST API (Fastest)
The simplest approach: upload a PDF and get structured JSON back.
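import requests

# Upload the PDF; the with-block closes the file once the request completes
with open("statement.pdf", "rb") as pdf:
    response = requests.post(
        "https://api.bank-statement-parser.clkr.work/extract",
        headers={"X-Api-Key": "pex_your_api_key"},
        files={"file": pdf},
    )
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
data = response.json()

print(f"Bank: {data['bankKey']}")
print(f"Transactions: {len(data['transactions'])}")
for txn in data['transactions']:
    # Description is cut and padded to 40 characters, amount right-aligned
    print(f"{txn['date']} | {txn['description'][:40]:40} | {txn['amount']:>10.2f}")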
Example output:
Bank: garantibbva_v1
Transactions: 47
2026-05-01 | MARKET PAYMENT                           |    -245.90
2026-05-02 | SALARY CREDIT                            |    8500.00
2026-05-03 | ELECTRICITY BILL                         |    -189.00
Method 2: Export to Excel/CSV
import requests
import pandas as pd

with open("statement.pdf", "rb") as pdf:
    resp = requests.post(
        "https://api.bank-statement-parser.clkr.work/extract",
        headers={"X-Api-Key": "pex_your_api_key"},
        files={"file": pdf},
    )
resp.raise_for_status()

# One row per transaction, sorted chronologically
df = pd.DataFrame(resp.json()["transactions"])
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')

# Writing .xlsx needs an Excel engine such as openpyxl installed
df.to_excel("transactions.xlsx", index=False)
df.to_csv("transactions.csv", index=False)
print(f"Exported {len(df)} transactions")
Method 3: Batch Processing
Process an entire folder of bank statements in one go:
from pathlib import Path

import pandas as pd
import requests

API_KEY = "pex_your_api_key"
all_transactions = []
file_count = 0

for pdf_path in sorted(Path("./statements").glob("*.pdf")):
    with open(pdf_path, "rb") as pdf:
        resp = requests.post(
            "https://api.bank-statement-parser.clkr.work/extract",
            headers={"X-Api-Key": API_KEY},
            files={"file": pdf},
        )
    result = resp.json()
    transactions = result.get("transactions", [])
    # Tag each row with its source file and the detected bank
    for txn in transactions:
        txn["source"] = pdf_path.name
        txn["bank"] = result.get("bankKey", "unknown")
        all_transactions.append(txn)
    file_count += 1
    print(f"✓ {pdf_path.name}: {len(transactions)} transactions")

df = pd.DataFrame(all_transactions)
df.to_excel("all_transactions.xlsx", index=False)
# Report the number of files, not the number of rows, in the second figure
print(f"\nTotal: {len(df)} transactions from {file_count} files")
Method 4: Webhook for Async Processing
For large files or high-volume workflows, use webhooks:
import requests

# Submit with a webhook URL; the result is delivered there later
with open("large_statement.pdf", "rb") as pdf:
    resp = requests.post(
        "https://api.bank-statement-parser.clkr.work/extract",
        headers={
            "X-Api-Key": "pex_your_api_key",
            "X-Webhook-Url": "https://your-app.com/bank-callback",
        },
        files={"file": pdf},
    )
print("Queued:", resp.json())
# Your webhook receives the result when processing completes
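The receiving side is not shown above, so here is a rough sketch of what a callback endpoint could look like using only the standard library. It rests on assumptions that this article does not confirm: that the callback arrives as an HTTP POST, and that its body is the same JSON object the synchronous endpoint returns. The names `summarize_callback`, `BankCallbackHandler`, and `serve`, and port 8000, are hypothetical. Check the API documentation for the real delivery format before relying on any of it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(body: bytes) -> dict:
    """Decode a callback body and return a small summary.

    ASSUMPTION: the body has the same shape as the synchronous response
    (bankKey plus a list of transactions with numeric amounts).
    """
    payload = json.loads(body.decode("utf-8"))
    transactions = payload.get("transactions", [])
    return {
        "bank": payload.get("bankKey", "unknown"),
        "count": len(transactions),
        "net": round(sum(t["amount"] for t in transactions), 2),
    }


class BankCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Path taken from the example webhook URL above
        if self.path != "/bank-callback":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            summary = summarize_callback(self.rfile.read(length))
        except (ValueError, KeyError, TypeError, AttributeError):
            self.send_error(400, "invalid payload")
            return
        print("Received:", summary)
        self.send_response(204)  # acknowledge with an empty response
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (hypothetical entry point)."""
    HTTPServer(("", port), BankCallbackHandler).serve_forever()
```

Keeping the parsing in `summarize_callback` separate from the HTTP handler means the payload logic can be tested without starting a server. A production endpoint would also need HTTPS and some way to verify that the request really came from the API.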
JSON Response Schema
{
  "bankKey": "garantibbva_v1",
  "pageCount": 3,
  "transactions": [
    {
      "date": "2026-05-01",
      "description": "MARKET PAYMENT - ISTANBUL",
      "amount": -245.90,
      "balance": 12450.10
    }
  ]
}
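One way to consume this schema is to load it into typed records and sanity-check the result. The sketch below is illustrative only: `Transaction`, `parse_transactions`, and `balance_breaks` are hypothetical names, and the check assumes that `balance` is the running balance after each transaction and that transactions arrive in statement order. Neither assumption is stated by the schema itself.

```python
import datetime as dt
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    date: dt.date
    description: str
    amount: Decimal
    balance: Decimal


def parse_transactions(payload: dict) -> list[Transaction]:
    """Convert the JSON response into typed records.

    Amounts go through str() so the Decimal holds the printed value
    (e.g. -245.9) rather than the float's binary expansion.
    """
    return [
        Transaction(
            date=dt.date.fromisoformat(t["date"]),
            description=t["description"],
            amount=Decimal(str(t["amount"])),
            balance=Decimal(str(t["balance"])),
        )
        for t in payload["transactions"]
    ]


def balance_breaks(transactions: list[Transaction]) -> list[int]:
    """Return indexes where balance != previous balance + amount.

    ASSUMPTION: balance is the running balance after each transaction.
    """
    return [
        i
        for i in range(1, len(transactions))
        if transactions[i - 1].balance + transactions[i].amount
        != transactions[i].balance
    ]
```

An empty list from `balance_breaks` suggests no rows were dropped or misread between consecutive entries. A non-empty list points at the rows worth checking against the original PDF.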
Supported Banks
100+ banks including all major Turkish banks (Garanti BBVA, İş Bankası, Yapı Kredi, Akbank, Ziraat, Denizbank, Halkbank, QNB Finansbank, TEB, Kuveyt Türk) and international banks from Germany, the UK, the US, UAE, and more.
Free Tier
3,000 pages/month free. No credit card. [Sign up →](https://bank-statement-parser.clkr.work/en/register)