June 9, 2026

How to Extract Transactions from a PDF Bank Statement Automatically

Manually copying bank statement data into spreadsheets is one of the most time-consuming tasks in accounting. Here's how to automate it completely.

The Core Problem

Bank PDFs come in dozens of different layouts. Some use tables, others use line-based formats. Dates appear as "01 Jan 2026", "2026-01-01", or "01/01/26". Number formatting varies by country too: some statements use a comma for thousands and a period for decimals (1,234.56), while others do the reverse (1.234,56).
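To make the variation concrete, here is a minimal sketch of normalizing the date and amount formats listed above by hand. The function names and the separator heuristic are illustrative assumptions, not part of any API, and `%b` expects English month abbreviations under the default locale.

```python
from datetime import date, datetime
from decimal import Decimal

# The three date layouts mentioned above; extend this tuple for other banks.
DATE_FORMATS = ("%d %b %Y", "%Y-%m-%d", "%d/%m/%y")


def parse_date(text: str) -> date:
    """Try each known layout in turn and return the first match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")


def parse_amount(text: str) -> Decimal:
    """Convert '1,234.56' or '1.234,56' to a Decimal.

    Heuristic (an assumption): whichever of ',' or '.' appears last
    is treated as the decimal separator.
    """
    cleaned = text.strip().replace(" ", "")
    if cleaned.rfind(",") > cleaned.rfind("."):
        # Comma comes last: dots group thousands, comma marks decimals.
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        # Period comes last (or no separator): commas group thousands.
        cleaned = cleaned.replace(",", "")
    return Decimal(cleaned)
```

The heuristic is ambiguous for values such as "1,234" with no decimal part, which is one reason hand-written rules tend to break across banks.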

Template-based extraction breaks whenever a bank changes its layout. AI-powered extraction instead infers each document's structure, so it handles layout variations without per-bank templates.

Method 1: REST API (Fastest)

The simplest approach: upload a PDF and get structured JSON back.

    import requests

    # Upload the PDF; the with-block closes the file once the request finishes.
    with open("statement.pdf", "rb") as pdf:
        response = requests.post(
            "https://api.bank-statement-parser.clkr.work/extract",
            headers={"X-Api-Key": "pex_your_api_key"},
            files={"file": pdf},
        )
    response.raise_for_status()  # stop here on an HTTP error
    data = response.json()

    print(f"Bank: {data['bankKey']}")
    print(f"Transactions: {len(data['transactions'])}")
    for txn in data["transactions"]:
        print(f"{txn['date']} | {txn['description'][:40]:40} | {txn['amount']:>10.2f}")

Example output:

    Bank: garantibbva_v1
    Transactions: 47
    2026-05-01 | MARKET PAYMENT                           |    -245.90
    2026-05-02 | SALARY CREDIT                            |    8500.00
    2026-05-03 | ELECTRICITY BILL                         |    -189.00

Method 2: Export to Excel/CSV

    import pandas as pd
    import requests

    with open("statement.pdf", "rb") as pdf:
        resp = requests.post(
            "https://api.bank-statement-parser.clkr.work/extract",
            headers={"X-Api-Key": "pex_your_api_key"},
            files={"file": pdf},
        )
    resp.raise_for_status()

    # Load the transactions into a DataFrame and sort them chronologically.
    df = pd.DataFrame(resp.json()["transactions"])
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date")

    df.to_excel("transactions.xlsx", index=False)  # needs openpyxl installed
    df.to_csv("transactions.csv", index=False)
    print(f"Exported {len(df)} transactions")
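If pandas is not available, the CSV half of the export can be sketched with the standard library alone. The helper name `write_transactions_csv` is hypothetical, and the field list assumes the keys shown in the JSON response schema in this post.

```python
import csv

# Field names follow the JSON response schema shown in this post.
FIELDS = ["date", "description", "amount", "balance"]


def write_transactions_csv(transactions, path):
    """Write a list of transaction dicts to a CSV file, sorted by date.

    ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    Keys outside FIELDS are ignored; missing keys become empty cells.
    """
    rows = sorted(transactions, key=lambda txn: txn["date"])
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `write_transactions_csv(resp.json()["transactions"], "transactions.csv")` would then produce a file that opens directly in Excel or any other spreadsheet tool.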

Method 3: Batch Processing

Process an entire folder of bank statements in one go:

    from pathlib import Path

    import pandas as pd
    import requests

    API_KEY = "pex_your_api_key"
    all_transactions = []
    file_count = 0

    for pdf_path in Path("./statements").glob("*.pdf"):
        with open(pdf_path, "rb") as pdf:
            resp = requests.post(
                "https://api.bank-statement-parser.clkr.work/extract",
                headers={"X-Api-Key": API_KEY},
                files={"file": pdf},
            )
        result = resp.json()
        transactions = result.get("transactions", [])
        # Tag each row with its source file and detected bank.
        for txn in transactions:
            txn["source"] = pdf_path.name
            txn["bank"] = result.get("bankKey", "unknown")
            all_transactions.append(txn)
        file_count += 1
        print(f"✓ {pdf_path.name}: {len(transactions)} transactions")

    df = pd.DataFrame(all_transactions)
    df.to_excel("all_transactions.xlsx", index=False)
    print(f"\nTotal: {len(df)} transactions from {file_count} files")
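In a large folder, one unreadable PDF or a dropped connection should not abort the whole run. The sketch below shows one way to isolate failures per file. `process_folder` and its `extract` parameter are hypothetical: `extract` stands for any callable that takes a path and returns the parsed JSON response as a dict, such as a wrapper around the `requests.post` call above.

```python
from pathlib import Path


def process_folder(folder, extract):
    """Run `extract` on every PDF in `folder`, collecting rows and failures.

    Returns (rows, failures), where failures maps file name to error text.
    """
    rows, failures = [], {}
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        try:
            result = extract(pdf_path)
        except Exception as exc:  # keep going; report the failure afterwards
            failures[pdf_path.name] = str(exc)
            continue
        for txn in result.get("transactions", []):
            # Copy each row and tag it with its source file and bank.
            rows.append({**txn, "source": pdf_path.name,
                         "bank": result.get("bankKey", "unknown")})
    return rows, failures
```

The returned `failures` dict can be printed or logged at the end so the failed files can be retried separately.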

Method 4: Webhook for Async Processing

For large files or high-volume workflows, use webhooks:

    import requests

    # Submit the file along with a webhook URL.
    with open("large_statement.pdf", "rb") as pdf:
        resp = requests.post(
            "https://api.bank-statement-parser.clkr.work/extract",
            headers={
                "X-Api-Key": "pex_your_api_key",
                "X-Webhook-Url": "https://your-app.com/bank-callback",
            },
            files={"file": pdf},
        )
    print("Queued:", resp.json())

    # Your webhook receives the result when processing completes.
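On the receiving side, the callback endpoint can be sketched with the standard library. This assumes the service sends an HTTP POST whose body is JSON in the same shape as the response schema in this post; the payload format, the `summarize` helper, and the port are all assumptions for illustration, so check the service's documentation before relying on them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(body: bytes) -> str:
    """Parse a callback body and return a one-line summary."""
    payload = json.loads(body)
    count = len(payload.get("transactions", []))
    return f"{payload.get('bankKey', 'unknown')}: {count} transactions"


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            print(summarize(self.rfile.read(length)))
        except (ValueError, AttributeError):
            self.send_response(400)  # not JSON, or not a JSON object
        else:
            self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen for callbacks until interrupted; the port is arbitrary."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the listener. A production deployment would sit behind HTTPS and verify that each request really comes from the parser service.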

JSON Response Schema

    {
      "bankKey": "garantibbva_v1",
      "pageCount": 3,
      "transactions": [
        {
          "date": "2026-05-01",
          "description": "MARKET PAYMENT - ISTANBUL",
          "amount": -245.90,
          "balance": 12450.10
        }
      ]
    }
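Because every transaction carries both an `amount` and a `balance`, the two fields can be cross-checked after extraction. The sketch below assumes transactions arrive in statement order and that `balance` is the running balance after each transaction; neither is guaranteed by the schema above, and `find_balance_gaps` is a hypothetical helper.

```python
from decimal import Decimal


def find_balance_gaps(transactions):
    """Return indexes where balance != previous balance + amount."""
    gaps = []
    for i in range(1, len(transactions)):
        prev, cur = transactions[i - 1], transactions[i]
        # Go through str() so binary float noise does not cause false alarms.
        expected = Decimal(str(prev["balance"])) + Decimal(str(cur["amount"]))
        if expected != Decimal(str(cur["balance"])):
            gaps.append(i)
    return gaps
```

An empty result suggests no rows were dropped or misread. A non-empty one points at the rows worth checking against the original PDF.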

Supported Banks

100+ banks including all major Turkish banks (Garanti BBVA, İş Bankası, Yapı Kredi, Akbank, Ziraat, Denizbank, Halkbank, QNB Finansbank, TEB, Kuveyt Türk) and international banks from Germany, the UK, the US, UAE, and more.

Free Tier

3,000 pages/month free. No credit card. [Sign up →](https://bank-statement-parser.clkr.work/en/register)
