AI in Finance
FROM PRACTICE, NOT THEORY

Issue №01 - 03 June 2026

The Data Conversation Nobody Wants to Have

THREE PRACTITIONER INSIGHTS

We spent 6 weeks on the model. We spent 6 months on the data pipeline. This is normal.

I used to present this ratio apologetically, like it was a failure of planning. Now I present it as a law of nature. In regulated banking, data access involves security reviews, privacy impact assessments, data-sharing agreements between departments, and often a philosophical debate about data ownership. The model is the easy part. The data pipeline is the work. If your project plan allocates equal time to both, multiply the data side by four and you'll be closer to reality. I'm not exaggerating.

I changed my mind about synthetic data. Here's why.

Eighteen months ago, I was skeptical. Synthetic data felt like a shortcut — a way to avoid the hard work of getting real data access. I was wrong. In our regulatory environment, getting approval to use real client data for model experimentation can take 8–12 weeks per dataset. Synthetic data — statistically representative datasets generated without any real PII — lets us prototype in days instead of months. We now use it for all initial experimentation. Real data comes in for final validation and production training. This isn't cutting corners. It's intelligent sequencing. The privacy team actually prefers it because we're not requesting sensitive data access for models that might not work.
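For readers who want to see the idea in miniature, here is a rough sketch of the simplest form of synthetic data: summarise each column on its own, then sample new rows from those summaries. This is not the tooling we use; the column names, `fit_profile`, and `sample_synthetic` are all hypothetical. It samples every column independently, so it does not preserve correlations between columns and it carries no formal privacy guarantee. Treat it as a prototyping aid that still needs a privacy review.

```python
import random
import statistics
from collections import Counter

# Hypothetical "real" rows; customer_id stands in for any direct identifier.
real_rows = [
    {"customer_id": "C001", "balance": 1200.0, "segment": "retail"},
    {"customer_id": "C002", "balance": 5400.0, "segment": "sme"},
    {"customer_id": "C003", "balance": 800.0, "segment": "retail"},
    {"customer_id": "C004", "balance": 3100.0, "segment": "retail"},
]


def fit_profile(rows, numeric_cols, categorical_cols):
    """Summarise the named columns only; identifiers are never profiled."""
    profile = {}
    for col in numeric_cols:
        values = [row[col] for row in rows]
        profile[col] = {
            "kind": "numeric",
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),  # needs at least two rows
        }
    for col in categorical_cols:
        counts = Counter(row[col] for row in rows)
        profile[col] = {
            "kind": "categorical",
            "values": list(counts.keys()),
            "weights": list(counts.values()),
        }
    return profile


def sample_synthetic(profile, n, seed=0):
    """Draw n rows, sampling each column independently from its summary."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, spec in profile.items():
            if spec["kind"] == "numeric":
                row[col] = rng.gauss(spec["mean"], spec["stdev"])
            else:
                row[col] = rng.choices(spec["values"], weights=spec["weights"], k=1)[0]
        rows.append(row)
    return rows


profile = fit_profile(real_rows, numeric_cols=["balance"], categorical_cols=["segment"])
synthetic_rows = sample_synthetic(profile, n=100, seed=42)
```

The design choice worth noticing is that only the profile leaves the secure environment, never the rows. Whether a profile built from a small dataset is itself sensitive is a question for your privacy team, not for this sketch.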

I lost an argument about data lineage last month. I was right to lose it.

We had a credit risk model nearly ready for pilot: six weeks of work, linking three years of historical default data to customer behavioural data across two source systems. Solid model. The data protection team flagged the linkage and asked a question I couldn't answer convincingly: "If a customer exercises their Article 15 right tomorrow, can you produce the exact training data that influenced a decision about them, within 72 hours?" (Article 15 here is the GDPR right of access, not the AI Act.) I said yes in theory. They asked me to prove it. I couldn't, not within 72 hours, not reliably, not at scale. We paused the project for four weeks to build proper lineage tracking before resuming. I was frustrated. I was also wrong to be frustrated. The infrastructure we built during those four weeks is now the foundation under three other projects in the pipeline. The pause wasn't a delay. It was a tax I'd been deferring, and the bill came due with interest. If your data scientists can't reproduce the exact training data behind a deployed model, you are not ready for the GDPR Article 15 access request that is coming. And it is coming.
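To make "reproduce the exact training data" concrete, here is a minimal sketch of one way to record lineage at training time: store a manifest per model version that lists which customer rows went in, with a hash of each row. It is an illustration under assumptions, not what we built. The function names, the field names, and the model version string are hypothetical, and a production system would also need snapshot storage, retention rules, and access control.

```python
import hashlib
import json


def record_fingerprint(record):
    """SHA-256 over a canonical JSON form, so the same row always hashes the same."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(model_version, records, source_systems):
    """Capture, at training time, which rows fed which model version."""
    return {
        "model_version": model_version,
        "source_systems": sorted(source_systems),
        "entries": [
            {"customer_id": r["customer_id"], "sha256": record_fingerprint(r)}
            for r in records
        ],
    }


def entries_for_customer(manifest, customer_id):
    """Answer the access-request question: which training rows concern this customer?"""
    return [e for e in manifest["entries"] if e["customer_id"] == customer_id]


def matches_manifest(record, entry):
    """Check that a row pulled from the archive is the one that was trained on."""
    return record_fingerprint(record) == entry["sha256"]


# Hypothetical training rows linked across two source systems.
training_rows = [
    {"customer_id": "C001", "defaulted": False, "months_on_book": 36},
    {"customer_id": "C002", "defaulted": True, "months_on_book": 14},
]
manifest = build_manifest("credit-risk-v1", training_rows, ["crm", "core_banking"])
hits = entries_for_customer(manifest, "C001")
```

The hash is what turns "yes in theory" into something you can prove: if the archived row no longer matches its fingerprint, you know the data changed after training.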

Two Use Cases

→ WIN: KYC onboarding review, 55% faster with zero compliance friction

A compliance team was drowning in KYC document verification — identity documents, proof-of-address letters, corporate registrations — all manually cross-referenced against application forms. They deployed a document AI pipeline that extracts, cross-references, and flags discrepancies. Key design decision: the model never approves a client. Ever. It pre-fills fields and highlights mismatches. Every final decision is human. That single design principle — "the model flags, the human decides" — made the compliance team comfortable and the regulatory approval straightforward. Nobody argued about model risk because the model doesn't make risk decisions. Sometimes the smartest AI strategy is deliberately limiting what the AI does.
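The "model flags, the human decides" principle is easy to express in code, which is part of why it is easy to defend. Below is a rough sketch, assuming the extraction step has already turned a document into a dictionary of fields. The field names, `cross_reference`, and `review_packet` are hypothetical. The point is structural: nothing in this code can set an approval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    name: str                # which field disagrees
    application_value: str   # what the client wrote on the form
    document_value: str      # what was extracted from the document


def normalise(value):
    """Ignore case and spacing differences so only real mismatches are flagged."""
    return " ".join(str(value).strip().lower().split())


def cross_reference(application, extracted):
    """Compare form fields with extracted fields and return the discrepancies."""
    flags = []
    for name, app_value in application.items():
        doc_value = extracted.get(name)
        if doc_value is None:
            flags.append(Flag(name, str(app_value), "<not found in document>"))
        elif normalise(app_value) != normalise(doc_value):
            flags.append(Flag(name, str(app_value), str(doc_value)))
    return flags


def review_packet(application, extracted):
    """Pre-fill and flag. The decision stays empty until a human records one."""
    return {
        "prefilled": dict(extracted),
        "flags": cross_reference(application, extracted),
        "decision": None,
    }


# Hypothetical application form and extraction output.
application = {
    "full_name": "Anna  Muller",
    "address": "12 Rue de la Gare",
    "date_of_birth": "1985-04-12",
}
extracted = {"full_name": "anna muller", "address": "14 Rue de la Gare"}
packet = review_packet(application, extracted)
```

In this example the reviewer would see two flags, an address mismatch and a date of birth the extraction could not find, and an empty decision field waiting for them.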

→ LESSON: The €2.3 million data lake that nobody queried

I know a bank (not mine) that spent 18 months and €2.3 million building a centralized data lake to "enable AI at scale." They hired consultants. They bought licenses. They built connectors to 40+ source systems. The result: a massive repository with inconsistent metadata, unclear ownership, and data freshness ranging from real-time to six months old. The AI team bypassed it entirely and went directly to source systems because it was faster and more reliable. The data lake became a monument to infrastructure-first thinking. The lesson is simple but hard to accept: don't build a cathedral. Build a tool shed next to the first thing you're actually constructing. Expand the shed when you need to build more.

One myth I'd retire

"Our data isn't clean enough for AI."

I hear this at every conference and in every boardroom. It's the most socially acceptable way to say "we're not doing AI" without taking any blame. Here's my uncomfortable response: your data doesn't need to be perfect. It needs to be good enough for a specific, scoped task. A document extraction model doesn't care about your data warehouse quality — it needs a reliable pipeline of PDFs and a validation layer. A meeting summarizer needs access to transcripts, not a golden record. Stop using "data quality" as a shield. Every month you delay, the gap between you and the banks that started imperfectly grows wider. They're learning. You're waiting. Waiting is not a strategy.

◉ THE REGULATORY SIGNAL

[Written May 29th.] On 7 May, EU negotiators reached a provisional agreement on the Digital Omnibus on AI. The headline: high-risk obligations under Annex III — including creditworthiness assessment and credit scoring — are postponed from 2 August 2026 to 2 December 2027. Sixteen months. Final adoption is expected in June, publication in July. I've already had three people in my network ask me whether we're slowing down. We are not. Here is why that would be the wrong call. The documentation work — your AI inventory, Annex III classification matrix, data quality evidence under Article 10, risk management process under Article 9, the audit trail infrastructure under Article 12 — IS the operational value, not the regulatory deliverable. Banks that pause will start panicking in mid-2027 when they realise a small number of conformity assessors are servicing the entire European market, and assessor availability tightens as enforcement approaches. Banks that keep moving will have evidence accumulated over eighteen months instead of three. What to do this week: tell your steering committee the date moved, then tell them nothing else changes. The conversation you do not want to have in October 2027 is "why didn't we use the extra time we were given."

🎁 FREE THIS ISSUE: Data Readiness Self-Assessment

Before you start any AI project, you need to know where your data actually stands — not where your architecture diagram says it stands. I built a 10-question self-assessment that takes 5 minutes. It covers: access speed, ownership clarity, freshness, documentation, cross-system linkage, PII handling, quality monitoring, lineage traceability, synthetic data capability, and governance process. Score yourself honestly. Anything below 6/10 means your first project should include a data workstream, not just a model workstream.
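If you would rather score it in a notebook than on paper, the logic fits in a few lines. This sketch assumes one point per dimension answered "yes", which is my simplification for illustration; the function names are hypothetical and the actual assessment may word or weight questions differently.

```python
# The ten dimensions the self-assessment covers.
DIMENSIONS = (
    "access speed",
    "ownership clarity",
    "freshness",
    "documentation",
    "cross-system linkage",
    "PII handling",
    "quality monitoring",
    "lineage traceability",
    "synthetic data capability",
    "governance process",
)


def readiness_score(answers):
    """One point per dimension answered True (assumed scoring, out of 10)."""
    missing = [d for d in DIMENSIONS if d not in answers]
    if missing:
        raise ValueError(f"unanswered dimensions: {missing}")
    return sum(1 for d in DIMENSIONS if answers[d])


def needs_data_workstream(score, threshold=6):
    """Below 6/10, the first project should include a data workstream."""
    return score < threshold


# Hypothetical answers: strong on the first five dimensions, weak on the rest.
answers = {d: i < 5 for i, d in enumerate(DIMENSIONS)}
score = readiness_score(answers)
```

Raising an error on unanswered dimensions is deliberate: a skipped question is usually the one nobody wants to answer honestly.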

Next issue changes tone. We're moving from "where to start" to "how to build the machine." First up: governance. I'm sharing the exact one-page framework that governs every AI system in my bank. Three questions, three tiers, fits on a laminated card. I'm also sharing the story of how I accidentally created a shadow AI problem, and the uncomfortable fix. June 17th.

If this was useful, forward it to one finance leader who'd want it.
That's how this newsletter grows.


Christophe Atten
