The Pymetrics test explained: what the games measure and how employers use them
Pymetrics swaps personality questionnaires for short behavioral games that measure attention, memory, planning, and how you handle risk. Here's how employers use the results, what playing feels like, and the questions worth asking.
AI summary
- Pymetrics replaces personality questionnaires with about a dozen short behavioral games measuring attention, memory, planning, and risk approach. It was founded by neuroscientist Frida Polli and is now part of Harver.
- Employers compare your game-derived profile against a model built from people already succeeding in the role, usually at the very top of high-volume funnels.
- Observed behavior is harder to fake than self-description, but game scores still aren't job skills. Ask what the model was trained on, and pair it with role-specific evidence before anyone gets cut.
The Pymetrics test is a series of short online games that build a behavioral profile of how you pay attention, remember, plan, and take risks. Pymetrics (the company styled its name in lowercase) was founded in the early 2010s by neuroscientist Frida Polli and is now part of Harver, the hiring-assessment company that acquired it in 2022. Employers, mostly large ones running high-volume or early-career funnels, compare each candidate’s game-derived profile against a model of what success looks like in the role.
If you’ve been invited to “play a few games,” keep two things in mind. First, there are no right answers in the usual sense, and you can’t study for it. Second, the friendly framing doesn’t change what it is: an assessment whose output influences whether you advance. If you’re evaluating it as a buyer, Pymetrics is the most serious attempt yet to replace self-report questionnaires with observed behavior. It’s also the most opaque tool in the category, and that opacity cuts in both directions.
How the Pymetrics games work
Instead of asking what you’re like, Pymetrics watches what you do. You play roughly a dozen short exercises, typically around half an hour in total. The specifics vary, but the publicly described examples are well known. You pump a digital balloon that’s worth more money the bigger it gets and bursts if you push too far, which speaks to how you handle risk. You recall growing strings of digits, which tests working memory. You decide how to split money with an anonymous partner, which gets at fairness and trust. You hit a key when certain shapes flash and hold back for others, which measures attention and impulse control. Other tasks touch planning, learning from feedback, and reading emotion in faces.
Most of these are adaptations of decades-old cognitive science lab tasks. The product’s claim was never that the games are novel. It’s that running them at scale, then comparing your behavioral fingerprint against people already thriving in a role, produces a useful hiring signal.
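To make the balloon example concrete: in the academic version of this task, the Balloon Analogue Risk Task, a common risk score is the average number of pumps on balloons the player cashed out before they burst. The Python sketch below computes that lab-style score. Treat it as an illustration under stated assumptions, not Pymetrics’ scoring: `BalloonTrial` and `adjusted_average_pumps` are hypothetical names, and the vendor’s actual features are not public.

```python
from dataclasses import dataclass


@dataclass
class BalloonTrial:
    pumps: int      # how many times the player pumped this balloon
    exploded: bool  # True if the balloon burst before the player cashed out


def adjusted_average_pumps(trials: list[BalloonTrial]) -> float:
    """Mean pumps on balloons that did NOT burst (the classic lab-style score).

    Burst balloons are excluded because their pump count is capped by the
    random burst point rather than by the player's own decision to stop.
    """
    kept = [t.pumps for t in trials if not t.exploded]
    if not kept:
        return 0.0  # hypothetical convention for a session where every balloon burst
    return sum(kept) / len(kept)


# Hypothetical session for one player.
session = [
    BalloonTrial(pumps=12, exploded=False),
    BalloonTrial(pumps=30, exploded=True),
    BalloonTrial(pumps=18, exploded=False),
    BalloonTrial(pumps=9, exploded=False),
]
print(adjusted_average_pumps(session))  # (12 + 18 + 9) / 3 = 13.0
```

A higher score suggests a player who pushes further before cashing out. On its own it is one narrow lab measure, which is why the product aggregates many of them.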
How employers use it
The core mechanic is the success model. Employees in the role, often the stronger performers, play the same games. Modeling identifies the behavioral patterns that distinguish them from a broader comparison group, and applicants are scored by how closely they resemble that pattern. The resulting recommendation then feeds the funnel, usually at the very top, before a human has looked at anything.
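One simple way to picture “scored by how closely they resemble that pattern” is a distance from the incumbent average. The Python sketch below standardizes each game feature against incumbents and turns the distance into a resemblance score between 0 and 1. It is an illustration under stated assumptions, not Pymetrics’ model: the feature names, `fit_benchmark`, and `resemblance` are invented for the example, and a real system would more likely fit a trained model than compare against a plain average.

```python
import math
import statistics

# Hypothetical game-derived features; real feature sets are not public.
FEATURES = ["risk", "memory_span", "attention", "fairness"]


def fit_benchmark(incumbents: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Mean and standard deviation of each feature across incumbent employees."""
    benchmark = {}
    for feature in FEATURES:
        values = [person[feature] for person in incumbents]
        sd = statistics.stdev(values) or 1.0  # avoid dividing by zero if everyone ties
        benchmark[feature] = (statistics.mean(values), sd)
    return benchmark


def resemblance(applicant: dict[str, float], benchmark: dict[str, tuple[float, float]]) -> float:
    """1.0 at the incumbent average, falling toward 0 as the applicant moves away."""
    z_scores = [(applicant[f] - mean) / sd for f, (mean, sd) in benchmark.items()]
    distance = math.sqrt(sum(z * z for z in z_scores))  # Euclidean distance in z-units
    return 1.0 / (1.0 + distance)


# Hypothetical incumbent profiles for one role.
incumbents = [
    {"risk": 10, "memory_span": 7, "attention": 0.90, "fairness": 0.5},
    {"risk": 14, "memory_span": 8, "attention": 0.85, "fairness": 0.4},
    {"risk": 12, "memory_span": 6, "attention": 0.95, "fairness": 0.6},
]
benchmark = fit_benchmark(incumbents)

close = {"risk": 12, "memory_span": 7, "attention": 0.90, "fairness": 0.5}
far = {"risk": 30, "memory_span": 4, "attention": 0.60, "fairness": 0.9}
print(resemblance(close, benchmark) > resemblance(far, benchmark))  # True
```

The sketch also shows why the output is hard to interrogate: the score says how far someone sits from the incumbents, not why being close should predict doing the job well.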
That placement is the point. The pitch was built for employers drowning in early-career applications, where reading every resume was never going to happen. The era’s most cited example was Unilever, which publicly described rebuilding its graduate hiring around gamified assessments followed by recorded video interviews.
Pymetrics also leaned hard into algorithmic fairness, publicly auditing its models for adverse impact and open-sourcing a bias-testing tool. Since the acquisition, the games have been sold within Harver’s broader assessment platform.
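Adverse-impact auditing usually starts with selection-rate ratios. Under the four-fifths guideline in the US Uniform Guidelines on Employee Selection Procedures, a group whose selection rate falls below 80% of the highest group’s rate is generally treated as evidence of adverse impact. The Python sketch below computes those ratios for one screening stage. It is a simplified illustration, not the tool Pymetrics open-sourced: the group labels and counts are made up, and a real audit would typically add statistical significance tests.

```python
def selection_rates(applied: dict[str, int], advanced: dict[str, int]) -> dict[str, float]:
    """Share of each group's applicants who advanced past the screening stage."""
    return {group: advanced[group] / applied[group] for group in applied}


def impact_ratios(rates: dict[str, float]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    top = max(rates.values())
    if top == 0:
        raise ValueError("no applicants advanced, so ratios are undefined")
    return {group: rate / top for group, rate in rates.items()}


def four_fifths_flags(ratios: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Groups whose impact ratio falls below the four-fifths (0.8) guideline."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical counts for one screening stage.
applied = {"group_a": 1000, "group_b": 800}
advanced = {"group_a": 300, "group_b": 160}

ratios = impact_ratios(selection_rates(applied, advanced))
print(ratios)                     # group_b's ratio is 0.2 / 0.3, about 0.667
print(four_fifths_flags(ratios))  # ['group_b']
```

A ratio under 0.8 is a prompt to investigate, not a legal conclusion on its own.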
What candidates experience
It feels different from any questionnaire. Nothing asks you to describe yourself. You just do the tasks, usually on whatever device you have, and when it ends you have no idea how you did. That last part is the strangest piece of the experience. With a personality questionnaire you can at least guess what your answers implied. With a balloon you popped six times, you can’t.
The practical advice is short. Read each game’s instructions carefully, because the tasks switch fast. Play in one quiet sitting. And don’t try to outsmart the games by guessing what the balloon is really for. You don’t know what the model rewards for this role, and your guess is as likely to hurt as help.
If the opacity bothers you and you’re curious what a transparent trait profile looks like instead, our free work style profile is a Big Five questionnaire that shows you exactly what’s measured and hands you the results.
What Pymetrics gets right
The mechanism is genuinely different. Watching behavior sidesteps the central weakness of questionnaires, which is that people describe an edited version of themselves. You can’t present an idealized self through a memory task.
It respects the funnel. Half an hour of games at the top of a graduate pipeline beats an hour-plus questionnaire on completion rates, and most candidates find games less miserable than 180 forced-choice items.
And it moved the industry on fairness. Whatever you conclude about game-based hiring, Pymetrics made adverse-impact auditing something vendors are expected to answer for, and it published tooling rather than just claims.
The limitations and open questions
Opacity is the big one, and it runs both ways. Candidates can’t sanity-check what a game measured or contest a result they never see. Employers get model output built on features no hiring manager can intuitively inspect. “The model says this applicant resembles your top performers” is a hard sentence to interrogate in a calibration meeting, and “I got rejected by a balloon game” is a hard sentence to hear from a candidate you wanted.
Lab constructs aren’t job skills. Risk tolerance, memory span, and generosity in a money split are real, measurable things. None of them is evidence that someone can reconcile the ledger, clear a ticket queue, or run a sprint. The leap from game behavior to job outcome is a modeling claim, and you’re being asked to take it largely on trust.
Models learn from incumbents. If the people currently succeeding in a role share traits for reasons unrelated to the work, a model fit to them can absorb that. Adverse-impact audits catch demographic skew. They don’t tell you whether the model measures what actually matters for the job.
And the regulatory surface is growing. Algorithmic screening tools now face bias-audit and disclosure requirements in some jurisdictions, with New York City’s automated employment decision tool law (Local Law 144) the most prominent example. That’s vendor homework you inherit the day you deploy.
If you’re evaluating it for screening
Questions worth putting to Harver, or to any vendor selling game-based screening:
- What was the success model for this specific role trained on, and how large and recent is that data?
- How often are models revalidated, and what happens when the role changes?
- What adverse-impact testing has been run, and can we see results for a population like ours?
- What does a rejected candidate see, and what can we defensibly tell them?
- Where does it sit in our funnel, and what human review follows it?
- What’s the accommodation path for candidates who can’t complete game-based tasks?
- What does pricing look like at our volume? It’s quote-based, so make them model your actual applicant numbers.
The thread running through all of these: you’re accountable for decisions the tool influences, so only buy what you can explain.
Where skills-based screening fits
Games tell you how someone behaves on abstract tasks. They still don’t show whether the person can do the work you’re hiring for. That takes role-specific evidence: work samples, structured one-way interviews, and skills assessments built around the position. If you’re standardizing the interview side too, our free interview question generator is a quick place to start.
That’s the layer Truffle covers. Truffle is a candidate screening platform that combines talent assessments with resume screening and one-way video interviews. AI transcribes, summarizes, and scores responses against your criteria, then puts the evidence in front of you in one view. The decisions stay yours, and every score traces back to something you can watch or read.
For a graduate program with fifty thousand applications, behavioral games solve a volume problem most teams will never have. For a company hiring year-round across normal-sized funnels, evidence of the actual work is easier to defend to hiring managers, easier to explain to candidates, and harder to argue with.
Frequently asked questions
How much does Pymetrics cost?
There’s no public price list. Pymetrics was always sold as an enterprise product, and since the acquisition it’s quoted through Harver’s platform, typically scoped by hiring volume and modules. If you’re evaluating it, ask for pricing modeled on your actual applicant numbers.
Can you fail the Pymetrics test?
There’s no pass or fail score. The games produce a behavioral profile that’s compared against a model for the role, so the same profile can fit one position and not another. Not advancing means the model read your profile as a weaker match for that role’s benchmark, nothing more.
How long does the Pymetrics test take?
Plan for roughly half an hour for the core games, done in one sitting. Individual games are short, a couple of minutes each, but they switch quickly and the instructions matter.
Can you practice or retake the Pymetrics games?
You can’t meaningfully practice, because the tasks measure baseline behavior rather than knowledge. Retakes have historically not been available on demand, and your results could carry over across employers using the platform. Policies can change under Harver, so follow the instructions in your invitation and ask the recruiter if anything is unclear.
What do the Pymetrics games actually measure?
Behavioral attributes like attention, working memory, planning, learning from feedback, risk approach, and fairness in money-exchange tasks. Individually they’re narrow lab measures. The hiring signal comes from aggregating them into a profile and comparing it against people already succeeding in the role.