AI hiring bias is not only about gender, age, ethnicity, or bad training data.
There is a stranger problem now: the AI may prefer candidates who sound like the AI.
That sounds niche until you look at how hiring already works. Candidates use ChatGPT or Copilot to polish resumes. Recruiters use AI tools to summarize, filter, rank, or compare those same resumes. The same kind of model can end up on both sides of the table.
A new paper calls this AI self-preferencing. In plain English: when a large language model (LLM) evaluates resumes, it can prefer text that looks like something it would have written, even when the underlying candidate information is held constant.1
That is the useful part of the story.
Not “AI is bad.” Not “never use AI in hiring.” Not “ISO 42001 magically solves the EU AI Act.”
The point is sharper: if AI can influence who reaches a shortlist, you need governance before the shortlist exists.
The paper found a bias most audits would miss
The researchers tested a pattern that is becoming normal in hiring.
Applicants use AI to write or improve resume summaries. Employers use AI to judge them. The evaluator model then sees text that may carry the style, structure, and phrasing of a model like itself.
The paper describes the risk like this:
Self-preferencing bias is inherently interactional: it arises when LLMs are asked to judge content that may share stylistic or linguistic patterns with their own generative outputs.
Source: Xu, Li, Jiang, “AI Self-preferencing in Algorithmic Hiring”, Section 1.1
That word matters: interactional.
This is not the old story where biased historical data gets encoded into a model. That can still happen. But this is different. The bias comes from the setup itself: AI-generated applications meet AI-based evaluation.
In the study, the researchers compared resume summaries while keeping the candidate information fixed. They found strong LLM-vs-human self-preference across major models. In simulated hiring pipelines across 24 occupations, candidates using the same LLM as the evaluator were 23 percent to 60 percent more likely to be shortlisted than equally qualified applicants who submitted human-written resumes.1
The candidate did not become better. The signal changed.
The shortlist is where the damage happens
Hiring teams often talk as if AI screening is just preparation. The final decision is still human, so the risk feels smaller.
That is too comfortable.
The shortlist is not admin. It is where hiring becomes real. A candidate who never gets shortlisted does not get to impress the hiring manager later. A false negative at the screening stage usually dies quietly.
That is why this paper is useful even though it is not field evidence from live applicant tracking systems. It studies controlled resume summaries, not full hiring processes, and it does not prove that every employer using AI screening already has this problem.1
But it shows a failure mode worth taking seriously. A tool meant to find better candidates can end up rewarding candidates who wrote in the model’s preferred style.
That is not a small wording issue. It changes who gets human attention.
The EU AI Act treats this as high stakes
The EU AI Act does not treat recruitment AI as ordinary workplace software.
Annex III point 4(a) covers AI systems intended for recruitment or selection, especially systems used to place targeted job ads, analyze and filter job applications, or evaluate candidates. Article 6 says Annex III systems are high-risk, unless a specific exception applies because the system does not pose a significant risk, including by not materially influencing decision-making.2
That exception matters. A narrow tool that only formats interview notes is not the same thing as a tool that filters candidates.
But if the tool analyzes applications, ranks candidates, recommends who to interview, or materially shapes the shortlist, do not hand-wave it away as “just a productivity feature.” The Act puts employment and worker management in a sensitive category for a reason.
Annex III point 4(b) extends the same logic beyond recruitment. It covers AI used for decisions affecting work-related relationships, promotion, termination, task allocation based on personal traits or behavior, and monitoring or evaluating worker performance and behavior.3
So the boundary is not “before employment” and “after employment.”
The boundary is whether AI can affect a person’s access to work, conditions of work, progression, monitoring, or evaluation.
Most high-risk obligations apply from 2 August 2026, with some AI Act chapters already phased in earlier.4 That gives companies time, but not much comfort. The legal category can change when the same generic AI feature moves from drafting text to influencing people decisions.
What a normal employer has to control
Most employers are not building hiring models. They are buying tools or turning on features inside existing systems.
Under the AI Act, that usually makes them deployers: organizations using an AI system under their authority.5
That role is easy to underestimate. “We only use the vendor tool” does not remove the employer’s part of the job.
For high-risk systems, Article 26 expects deployers to use the system according to its instructions, assign human oversight to people with the competence, training, authority, and support to exercise it, monitor operation, keep the logs that are under their control, inform workers and workers’ representatives before workplace use, and inform natural persons when an Annex III high-risk system makes or assists decisions about them.6
Article 4 adds a broader AI literacy duty. Providers and deployers must take measures, to their best extent, to ensure that staff and other people using AI systems on their behalf have enough AI literacy for the context and the people affected.7
Put that into plain hiring language.
If a recruiter uses an AI screening feature, the company needs to know what the tool is for, what it must not be used for, who checks its output, what evidence is retained, what candidates or workers are told, and how the company reacts if the tool starts behaving badly.
That is not a legal department side quest. It is the operating model.
ISO 42001 is the disciplined part. That is why it helps
ISO/IEC 42001:2023 does not give automatic AI Act compliance. It is not a magic certificate shield.
It is useful for a better reason: it forces the company to answer the dull questions that AI adoption skips.
Who owns the use case?
What is the intended use?
Who is affected?
Which supplier is involved?
What logs exist?
What risks are acceptable?
What happens when the risk changes after rollout?
Clause 4 requires the organization to determine its context, its role in relation to AI systems, relevant interested parties, and scope.8 Clause 6 requires AI risk assessment, AI risk treatment, and AI system impact assessment.9
Annex A turns that into practical control areas: AI policy, roles and responsibilities, concern reporting, impact assessment, lifecycle controls, data, information for interested parties, responsible use, intended use, and supplier responsibilities.10
This maps well to the hiring problem.
AI self-preference is not only a model behavior. It is a process risk. It depends on who uses the tool, for what decision, with what input data, under what instructions, with what oversight, and with what review loop.
That is management system territory.
A better pre-launch test for hiring AI
Before an AI tool influences a shortlist, run a test that matches the actual risk. Not a demo. Not a vendor slide deck. A real governance review.
- Define the decision line.
State exactly where AI may assist and where it may not decide. Summarizing applications is one use. Filtering candidates out is another.
- Classify the use case.
Check whether the system analyzes applications, filters candidates, evaluates candidates, or affects work-related decisions. If yes, treat Annex III as directly relevant unless you have a documented reason why the high-risk exception applies.2
- Review the vendor’s intended use.
Do not buy “AI for HR” as a category. Ask what the system is intended to do, what data it uses, what logs exist, what human oversight the provider expects, and what the vendor says not to do.
- Test beyond demographic bias.
Demographic bias still matters. Test it. But also test for style preference, preference for a specific model’s wording, resume-polish preference, and cases where the same candidate information gets different scores because the wording changed.1 A minimal test sketch follows this list.
- Give human oversight real authority.
Human oversight is not a rubber stamp after the model has sorted everyone. The reviewer needs competence, time, authority, and a way to challenge the output.
- Keep enough evidence to learn.
Keep the logs and records needed to understand what happened. Which model version. Which input. Which shortlist. Which override. Which concern. Which corrective action. A minimal record structure is also sketched after the list.
- Review after rollout.
The risk does not end when procurement signs. AI systems change. Candidate behavior changes. Hiring managers learn to trust or distrust the tool. Review the process like something that can drift, because it can.
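To make the wording test concrete, here is a minimal sketch in Python. It assumes your screening tool can be called as a single scoring function; the score_resume stub below is a deliberately naive stand-in (it rewards length and buzzwords so the script runs end to end) and would be replaced with a call to the real tool. The example summaries are illustrative, not taken from the paper.

```python
# Minimal sketch of a wording-sensitivity check for an AI screening tool.
# Assumption: the tool can be called as a function that scores one resume
# summary. The stub below is a naive stand-in so the script runs end to end;
# replace it with a call to your actual screening tool.

from statistics import mean


def score_resume(summary: str) -> float:
    """Placeholder scorer. Swap in the real screening call here."""
    buzzwords = ("results-driven", "structured", "delivering", "leveraging")
    return len(summary) / 100 + sum(word in summary.lower() for word in buzzwords)


# Each pair holds the SAME candidate facts in two wordings:
# index 0 is the human-written version, index 1 is an LLM-polished version.
paired_summaries = [
    (
        "Five years as warehouse team lead. Cut picking errors by 12 percent. "
        "Trained new staff on safety routines.",
        "Results-driven warehouse team lead with five years of experience, "
        "delivering a 12 percent reduction in picking errors and structured "
        "safety onboarding for new staff.",
    ),
    # Add more pairs built from real, anonymized applications.
]


def wording_gap(pairs) -> float:
    """Average score difference (polished minus human-written) across pairs.

    The candidate information is identical within each pair, so a gap that
    stays consistently positive means the tool is rewarding wording and
    style, not qualifications.
    """
    return mean(score_resume(polished) - score_resume(human) for human, polished in pairs)


if __name__ == "__main__":
    print(f"Average wording gap (polished minus human): {wording_gap(paired_summaries):+.2f}")
```

If the gap stays clearly positive across many pairs, the tool is tracking style rather than candidate quality, which is the same failure mode the paper describes, and it needs mitigation before it touches a shortlist.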
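For the evidence step, here is a minimal sketch of what one screening record could hold, again in Python. The field names are assumptions for illustration, not a prescribed schema; the point is that every AI-assisted screening decision should leave a trace that can be reconstructed later.

```python
# Minimal sketch of a per-candidate screening record. Field names are
# illustrative assumptions, not a prescribed schema; adapt them to your
# applicant tracking system and retention rules.

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ScreeningRecord:
    candidate_id: str          # pseudonymized reference, not free-text personal data
    requisition_id: str        # which role and hiring round
    model_version: str         # exact tool and model version used
    input_summary: str         # the text the tool actually scored
    score: float               # the tool's output
    shortlisted: bool          # what the tool recommended
    human_override: bool       # did a recruiter change the outcome?
    override_reason: str = ""  # why, in the reviewer's own words
    concern_raised: str = ""   # link to any reported concern or corrective action
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
```

With records like this, questions such as “which model version produced that shortlist” or “how often did reviewers override the tool” become queries instead of archaeology.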
That list is not glamorous. Good governance rarely is.
But it is much better than discovering the problem after three hiring rounds and no way to reconstruct what happened.
The article in one sentence
Hiring AI can create a new kind of bias: candidates may win because their resume fits the evaluator model, not because they fit the job.
The EU AI Act matters because recruitment and worker management sit in a high-stakes category. ISO/IEC 42001:2023 matters because it gives the organization a way to govern AI use before it becomes invisible daily practice.
If you already work with management systems, this should feel familiar. The answer is not a heroic one-time review. It is policy, ownership, supplier control, risk assessment, impact assessment, monitoring, corrective action, and management review.
The same management discipline that keeps quality, environment, information security, and work environment under control now has a new job.
AI has entered the hiring process. The management system has to follow it there.
If you want to see what that looks like in practice, AmpliFlow’s ISO 42001 setup brings the pieces into one management system: AI policy, risk assessment, impact assessment, supplier control, actions, and follow-up.
Footnotes
1. Jiannan Xu, Gujie Li, and Jane Yi Jiang, “AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights,” arXiv, 2025. The paper reports LLM-vs-human self-preference in a controlled resume-screening setup and simulated shortlist effects of 23 percent to 60 percent. Source: arXiv.
2. Regulation (EU) 2024/1689, Article 6 and Annex III point 4(a). Annex III covers AI systems used for recruitment or selection, including systems used to place targeted job ads, analyze and filter job applications, and evaluate candidates. Article 6 includes the high-risk classification rule and the Article 6(3) exception for Annex III systems that do not pose a significant risk, including by not materially influencing decision-making. Source: EUR-Lex.
3. Regulation (EU) 2024/1689, Annex III point 4(b). The Act also covers AI systems used for decisions affecting work-related relationships, promotion, termination, task allocation based on personal traits or behavior, and monitoring or evaluation of worker performance and behavior. Source: EUR-Lex.
4. Regulation (EU) 2024/1689, Article 113. The regulation applies from 2 August 2026, with Chapters I and II applying from 2 February 2025, and Chapter III Section 4, Chapter V, Chapter VII, Chapter XII, and Article 78 applying from 2 August 2025, except Article 101. Source: EUR-Lex.
5. Regulation (EU) 2024/1689, Article 3(4). A deployer is a natural or legal person, public authority, agency, or other body using an AI system under its authority, except for purely personal non-professional activity. Source: EUR-Lex.
6. Regulation (EU) 2024/1689, Article 26. Deployer duties for high-risk AI systems include use according to instructions, competent human oversight, monitoring, log keeping when logs are under the deployer’s control, workplace information duties, and informing natural persons when Annex III high-risk systems make or assist decisions about them. Source: EUR-Lex.
7. Regulation (EU) 2024/1689, Article 4. Providers and deployers must take measures to ensure, to their best extent, a sufficient level of AI literacy among staff and other persons dealing with the operation and use of AI systems on their behalf. Source: EUR-Lex.
8. ISO/IEC 42001:2023, clauses 4.1 to 4.4. These clauses cover organizational context, the organization’s role in relation to AI systems, interested parties, scope, and establishing the AI management system.
9. ISO/IEC 42001:2023, clauses 6.1.2, 6.1.3, and 6.1.4. These clauses require an AI risk assessment process, an AI risk treatment process, and an AI system impact assessment process.
10. ISO/IEC 42001:2023, Annex A. Relevant controls include A.2.2 AI policy, A.3.2 roles and responsibilities, A.3.3 reporting concerns, A.5.2 to A.5.4 impact assessment, A.6.2 lifecycle controls, A.7 data controls, A.8 information for interested parties, A.9 responsible use and intended use, and A.10 third-party relationships.