Mirror Cambridge TCG's card catalog locally

One request, ~12k cards, CC0.

If you're building a meta-product (price aggregator, deck builder, search engine), you'll want a local mirror of the catalog so your users don't hit our API for every card view. This guide gets you from zero to a refreshable local copy in one request, plus a polite refresh discipline.

10 min · 3 steps · last verified 2026-05-14

Prerequisites

• About 6 MB of disk for the JSONL file
• A daily cron or scheduled task

Steps

1
Fetch the bulk catalog
One request returns the entire catalog as streaming JSONL. The first line is a manifest header (`count_expected`, `retrieved_at`, `license`); the last line is a footer (`complete`, `count_emitted`); every line in between is a card in canonical universal-mirror sparse form. Each card carries an `@content_hash` for change detection.
Run this
```
# --compressed requests a compressed response and decompresses it before writing
curl --fail --compressed \
  https://cambridgetcg.com/data/catalog.jsonl \
  > catalog.jsonl
```
Expected response shape
```
Line 1: { "@kind": "catalog_manifest", "count_expected": 12000, "license": "CC0-1.0", ... }
Line 2-N: { "@kind": "card", "@content_hash": "sha256:...", "sku": "...", "price": {...}, ... }
Line N+1: { "@kind": "catalog_footer", "complete": true, "count_emitted": 11984 }
```
What to do with it
Parse line by line. Store the manifest header; its `retrieved_at` is your cache key. Index cards by `sku`. On the next refresh, compare each card's `@content_hash` against your stored copy and re-index only the rows that changed. The footer's `complete: true` signals that you received the full stream; `truncated: true` means you hit the 50k-row cap (unlikely today; cursor pagination is future work).
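The parsing steps above can be sketched in Python as follows. This is a minimal sketch, not an official client: the function name `load_catalog` is hypothetical, and it assumes every card line has a `sku` field and that `count_emitted` counts card lines only.

```python
import json
from typing import Iterable


def load_catalog(lines: Iterable[str]) -> tuple[dict, dict, dict]:
    """Split a JSONL stream into (manifest, cards indexed by sku, footer)."""
    manifest: dict = {}
    footer: dict = {}
    cards: dict = {}
    card_count = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        record = json.loads(raw)
        kind = record.get("@kind")
        if kind == "catalog_manifest":
            manifest = record
        elif kind == "catalog_footer":
            footer = record
        elif kind == "card":
            cards[record["sku"]] = record
            card_count += 1
    # No footer, or a footer without complete: true, means the stream was cut short.
    if not footer.get("complete"):
        raise ValueError("no complete footer; keep the previous mirror")
    # Assumption: count_emitted counts card lines only.
    if footer.get("count_emitted") != card_count:
        raise ValueError("count_emitted does not match the cards read")
    return manifest, cards, footer
```

Call it with an open file, for example `load_catalog(open("catalog.jsonl", encoding="utf-8"))`, and replace your stored mirror only if it returns without raising.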
2
Schedule a daily refresh
The catalog freshness budget is `catalog` (24 hours). Pulling once a day at an off-peak hour (e.g. 04:00 UTC) is the polite cadence. Don't pull more often than every 6 hours: the catalog doesn't change that fast, so extra pulls only waste bandwidth.
Run this
```
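# crontab entry: refresh daily at 04:00 (assumes the host clock is UTC); replace the old copy only if the download succeeds
0 4 * * * curl --fail --silent --compressed -o catalog.jsonl.tmp https://cambridgetcg.com/data/catalog.jsonl && mv catalog.jsonl.tmp catalog.jsonl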
```
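If refreshes can also be triggered outside cron (deploys, retries, manual runs), a small guard helps enforce the 6-hour floor. The sketch below is illustrative only: `refresh_allowed` is a hypothetical helper, and it assumes the manifest's `retrieved_at` is an ISO 8601 timestamp, which this guide does not specify.

```python
from datetime import datetime, timedelta, timezone

MIN_INTERVAL = timedelta(hours=6)  # the "no more often than every 6 hours" floor


def refresh_allowed(retrieved_at: str, now: datetime) -> bool:
    """Return True if at least MIN_INTERVAL has passed since the last pull."""
    # Assumption: retrieved_at is ISO 8601; a trailing "Z" means UTC.
    last = datetime.fromisoformat(retrieved_at.replace("Z", "+00:00"))
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    return now - last >= MIN_INTERVAL
```

Pass `datetime.now(timezone.utc)` as `now` in production; taking it as a parameter keeps the function easy to test.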
What to do with it
After each refresh, diff the new set of `@content_hash` values against the previous one to find changed, added, or removed rows. Cards are never hard-deleted, but a card's `@content_hash` changes whenever its latest captured price changes.
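One way to compute that diff, sketched in Python. The function name `diff_hashes` is hypothetical; each argument is assumed to be a mapping from `sku` to `@content_hash` built from one snapshot.

```python
def diff_hashes(previous: dict, current: dict) -> dict:
    """Compare two sku -> @content_hash maps and classify each sku."""
    prev_skus, curr_skus = set(previous), set(current)
    return {
        "added": curr_skus - prev_skus,
        "removed": prev_skus - curr_skus,
        # Present in both snapshots, but the hash differs.
        "changed": {s for s in prev_skus & curr_skus if previous[s] != current[s]},
    }
```

Only the skus in `added` and `changed` need re-indexing. Since cards are never hard-deleted, a non-empty `removed` set is worth investigating before you delete anything locally.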
3
Cite Cambridge TCG honestly
The data is CC0 — you owe no attribution legally. But *substrate-honest* attribution is encouraged: in your UI, name where the data came from, and link back. Reciprocal kindness.
What to do with it
Recommended attribution: 'Catalog data from Cambridge TCG (https://cambridgetcg.com) — CC0-1.0.' Or in machine-readable form, attach `provenance: { source: "cambridge-tcg", license: "CC0-1.0", retrieved_at: "..." }` to each row in your downstream product.
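Attaching the machine-readable form could look like the sketch below. `with_provenance` is a hypothetical helper name; the `provenance` fields are the ones recommended above.

```python
def with_provenance(card: dict, retrieved_at: str) -> dict:
    """Return a copy of the card with a provenance block attached."""
    return {
        **card,
        "provenance": {
            "source": "cambridge-tcg",
            "license": "CC0-1.0",
            "retrieved_at": retrieved_at,  # taken from the manifest header
        },
    }
```

Returning a copy leaves the mirrored row untouched, so the stored record still corresponds to its `@content_hash`.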

Common gotchas

The price chain may include CardRush JP retail
GBP prices are Cambridge TCG's own retail offers (CC0). But the underlying price observation pipeline at our wholesale layer reads from CardRush JP (license: internal-only). The bulk export only carries derived GBP — not raw JPY — so you're fine. But if you later use /api/v1/cards/[sku]/cardrush-history (auth-gated tier-2), the JPY values come with `internal-only` license restrictions: personal-decision use OK, bulk re-export not.
JSONL parsing — one object per line
Don't parse the whole response as a single JSON document. Read line by line. Each line is a complete JSON object.
Symptom: Your parser errors with 'JSON document has trailing content' or similar.
Fix: In Node: `body.split('\n').filter(Boolean).map(JSON.parse)`. In Python: `[json.loads(line) for line in response.iter_lines() if line]`.
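To see the symptom and the fix side by side, here is a small Python sketch using only the standard library. The two-line `body` is made-up sample data, not real catalog output.

```python
import json

# Made-up two-record JSONL body for illustration.
body = '{"@kind": "card", "sku": "A-1"}\n{"@kind": "card", "sku": "B-2"}\n'

# Parsing the whole body as one document fails at the second object.
try:
    json.loads(body)
    whole_document_error = None
except json.JSONDecodeError as exc:
    whole_document_error = exc.msg  # Python reports "Extra data"

# Parsing one line at a time works.
records = [json.loads(line) for line in body.splitlines() if line.strip()]
```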
The catalog has a 50k-row cap today
The current catalog is ~12k rows. The bulk endpoint caps at 50k rows per request, well above today's size. If the catalog grows past that, we'll add cursor pagination via `?since_sku=`. Until then, a footer with `truncated: true` is the signal that you hit the cap.
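A mirror can check for this on every pull. The sketch below is a hypothetical helper (`footer_status`) that classifies the parsed footer. It assumes `truncated` appears in the footer object itself, and it checks that field first because this guide does not say whether `complete` is also set on a truncated stream.

```python
from typing import Optional


def footer_status(footer: Optional[dict]) -> str:
    """Classify a pull as 'complete', 'truncated', or 'interrupted'."""
    if footer is None:
        return "interrupted"  # the stream ended before any footer line
    if footer.get("truncated"):
        return "truncated"  # hit the 50k-row cap; the mirror is partial
    if footer.get("complete"):
        return "complete"
    return "interrupted"
```

Anything other than `"complete"` is a reason to keep the previous mirror rather than overwrite it.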

Next guide

Track one card's price over time →

Polling discipline + change-detection.
