## evals

A small, hand-curated benchmark of questions about my CV. Click run and the agent loop runs entirely in your browser via Chrome's on-device Gemini Nano. Each answer is graded against a list of expected keywords plus a hallucination-guard list of terms that must not appear. The judge is deliberately dumb (just substring checks), so the verdict is reproducible and free.
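A minimal sketch of what a substring judge like this could look like. The `EvalCase` shape, the `judge` function, and the sample keywords are all hypothetical, made up for illustration rather than taken from this site's code. It also assumes matching is case-insensitive and that every expected keyword must appear, neither of which is stated above.

```typescript
// Hypothetical shape of one benchmark case; field names are assumptions.
interface EvalCase {
  id: string;
  question: string;
  expected: string[];  // keywords the answer should contain
  forbidden: string[]; // hallucination guard: terms that must not appear
}

type Verdict = "pass" | "fail";

// Plain substring checks, no model call, so the same answer
// always gets the same verdict.
function judge(answer: string, c: EvalCase): Verdict {
  const text = answer.toLowerCase();
  // Assumption: all expected keywords are required, not just one.
  const hasAllExpected = c.expected.every((k) => text.includes(k.toLowerCase()));
  const hasForbidden = c.forbidden.some((k) => text.includes(k.toLowerCase()));
  return hasAllExpected && !hasForbidden ? "pass" : "fail";
}

// Illustrative case only: the keyword lists here are invented.
const sampleCase: EvalCase = {
  id: "kubernetes",
  question: "How have you used Kubernetes in production?",
  expected: ["kubernetes"],
  forbidden: ["google"],
};

const ok = judge("I ran Kubernetes clusters in production.", sampleCase);
const guarded = judge("I used Kubernetes at Google.", sampleCase);
const missing = judge("I mostly worked with containers.", sampleCase);
```

Under these assumptions `ok` is a pass, while `guarded` fails on the forbidden term and `missing` fails on the absent keyword. The trade-off is the usual one for substring judges: short keywords can match inside unrelated words, so the lists need to be picked with care.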

why it exists: because shipping an agentic feature without an eval harness is how you find out it's broken from a recruiter's tweet.

pass: 0 · fail: 0 · skip: 0 · total: 10

[pending] wbg-role
q: What's your current role?
[pending] agentic
q: Tell me about your agentic AI work.
[pending] ifc-malena
q: What did you build at IFC?
[pending] kubernetes
q: How have you used Kubernetes in production?
[pending] publications
q: Have you published research?
[pending] speaker
q: Have you spoken at conferences?
[pending] aidevex
q: What is AIDevEx?
[pending] boa
q: Tell me about your time at Bank of America.
[pending] tcs
q: Did you work on Java early in your career?
[pending] hallucination-guard
q: Have you worked at Google?