Tankada is an IAM gateway that sits between LLM agents and SQL databases. It intercepts every query, evaluates it against per-table scope policies and behavioural session signals, and either forwards it to the database or denies it with a structured reason the agent can act on.
The threat class. A goal-directed ReAct agent under a task prompt does not stop on a generic deny. It rewrites the query: different columns, different predicates, adjacent tables, tautologies. Each variant looks like a fresh request to a per-call IAM system. Across a single task execution an agent can issue a dozen reformulations of the same forbidden access; in standard deployments none of them trigger a cross-query signal.
What Tankada adds:
- Per-table scope enforcement with a single policy map that the analyzer, gateway and database all consult.
- Session-scoped behavioural risk scoring that accumulates deny counts per table within a task and blocks the session when reformulation patterns emerge.
- Structured
deny_categories API that tells a compliant agent whether to abort (semantic deny), rewrite (syntactic deny), or retry (transient).
Open source. The per-table scope engine and the deny_categories taxonomy are at github.com/saluc28/tankada. The session-scoring extension that powers this live demo is a proprietary module on top of the open-source core.
Methodology disclosure. Demo scenarios are split into two kinds, marked on each preset card:
policy: the dashboard sends a pre-crafted SQL query to the gateway. These scenarios deterministically exercise a specific policy rule (tautology, row limit, schema enumeration, pagination). They illustrate what the gateway does given a known input; they do not demonstrate agent behaviour.
agent: the dashboard runs a real LLM agent on a natural-language task. These scenarios show actual agent behaviour, including the agent's freedom to rewrite queries, and may produce different outcomes run-to-run.
The empirical claim of the paper, that LLM agents autonomously reformulate denied queries, is not proven by this demo. It is measured by an experiment over 20 task types, 3 LLM models and 3 conditions, reported in §6 Evaluation. Full transcripts are released alongside the paper.
Paper. The arXiv pre-print is in submission. Once published, the link in the header will resolve.