Every year, over 1.5 million papers are published in biomedicine and life sciences. Tens of thousands more sit on preprint servers, reviewed by nobody, invisible to most, waiting for a journal decision that might take two years to arrive. Meanwhile, the scientific community needs to know what is worth reading. Researchers need to know what to build on. Funders need to know what to back.
The tool most people reach for is journal rank. Where was it published? Nature? Cell? PLOS ONE? It is a fast heuristic, and everyone understands it. The problem is that it measures the prestige of the venue, not the quality of the paper. A weak paper in a top journal outscores a strong paper in a mid-tier one, every time. And journal rank does not exist at all for preprints, which are the fastest-growing part of the scientific literature.
There is a better way. We created it. It’s called QED Score.
What is QED Score?
QED Score is an AI-based quality metric that reads a manuscript and evaluates it on two dimensions: originality (how far the finding advances what the field already knows) and validity (how well the evidence actually supports the conclusions drawn).
Before scoring begins, every manuscript is fully anonymized. Author names, institutional affiliations, and any identifying information are automatically stripped. The system does not know whether the paper comes from Harvard or from a university it has never heard of. It reads the science.
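To make the anonymization step concrete, here is a minimal sketch of one way it could work, assuming a manuscript arrives as body text plus a metadata dictionary. The `Manuscript` class, the `anonymize` function and the field names in `IDENTIFYING_FIELDS` are hypothetical illustrations, not QED's actual pipeline.

```python
import re
from dataclasses import dataclass, field

# Hypothetical: metadata keys treated as identifying (an assumption, not QED's schema).
IDENTIFYING_FIELDS = {"authors", "affiliations", "emails", "acknowledgements", "funding"}

@dataclass
class Manuscript:
    text: str
    metadata: dict = field(default_factory=dict)

def anonymize(ms: Manuscript, token: str = "[REDACTED]") -> Manuscript:
    """Drop identifying metadata and redact those same strings wherever they recur in the text."""
    identifiers = []
    for key in ms.metadata.keys() & IDENTIFYING_FIELDS:
        value = ms.metadata[key]
        identifiers.extend(value if isinstance(value, list) else [value])
    text = ms.text
    # Redact longer strings first, so "Harvard Medical School" is removed whole rather than piecemeal.
    usable = {i for i in identifiers if isinstance(i, str) and i}
    for ident in sorted(usable, key=len, reverse=True):
        # Word boundaries stop a short name such as "Li" from matching inside "Linear".
        text = re.sub(r"\b" + re.escape(ident) + r"\b", token, text, flags=re.IGNORECASE)
    kept = {k: v for k, v in ms.metadata.items() if k not in IDENTIFYING_FIELDS}
    return Manuscript(text=text, metadata=kept)

ms = Manuscript(
    text="Work done at Harvard Medical School by Jane Doe.",
    metadata={"authors": ["Jane Doe"], "affiliations": ["Harvard Medical School"], "title": "X"},
)
print(anonymize(ms).text)  # Work done at [REDACTED] by [REDACTED].
```

Exact-string matching like this only illustrates the principle; self-citations, grant numbers and prose acknowledgements would need more than a lookup of known names.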
Under the hood, QED Score uses a multi-agent AI architecture. The manuscript is decomposed into its constituent claims and the evidence behind each one. Specialized agents examine the work in parallel, checking for inconsistencies across figures, evaluating statistical rigor, identifying contradictions with the existing literature, and assessing whether the data supports the conclusions drawn. Their findings pass through a verification layer before being synthesized into a single calibrated score. That score is expressed as a percentile rank; a paper scoring 80 ranks above 80% of the reference corpus. This makes QED Score directly comparable across fields and over time.
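The percentile conversion is simple arithmetic. A minimal sketch, assuming the system first produces some raw score and holds the raw scores of a reference corpus: the percentile is the share of reference scores strictly below the paper's raw score. The function name, the raw score scale and the strict-below tie convention are assumptions; other percentile definitions count ties as half.

```python
from bisect import bisect_left

def percentile_rank(raw_score: float, reference_scores: list[float]) -> float:
    """Share (0-100) of reference-corpus scores strictly below raw_score."""
    if not reference_scores:
        raise ValueError("reference corpus is empty")
    ordered = sorted(reference_scores)
    # bisect_left returns the insertion point before any equal values,
    # i.e. the number of reference scores strictly less than raw_score.
    below = bisect_left(ordered, raw_score)
    return 100.0 * below / len(ordered)

# Hypothetical reference corpus of 100 raw scores: 0.0, 1.0, ..., 99.0
reference = [float(s) for s in range(100)]
print(percentile_rank(80.0, reference))  # 80.0: ranks above 80% of the corpus
```

Because the output depends only on position within the reference distribution, the same number means the same thing whatever the raw scale, which is what makes percentiles comparable across fields.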
Study 1: Does QED Score agree with human experts?
To validate the score, we started with a straightforward question: does QED Score match the judgements of people who actually know the field?
We obtained a professionally labelled corpus of 925 published papers by 185 authors. A panel of domain experts had assigned each paper to one of three quality tiers (Limited, Satisfactory, or Strong) using explicit criteria for originality and validity. The scoring pipeline never saw these labels.
QED Score separated Limited papers from the rest with an AUC of 0.867. To put that in context, an AUC of 0.50 is random chance and 1.0 is perfect. For comparison, we ran the same test with SJR journal rank as the predictor on the 795 papers that had both a QED Score and an SJR value. On that subset, QED Score outperformed journal rank on both comparisons: 0.863 vs 0.804 for identifying Limited papers, and 0.782 vs 0.774 for identifying Strong ones.
Until now, journal rank has been the best available proxy for paper quality. This study shows that QED Score rivals it.
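AUC has a direct interpretation that makes these numbers easier to read: it is the probability that a randomly chosen paper from one group outscores a randomly chosen paper from the other, with ties counting as half. A minimal pairwise sketch, assuming higher scores mean higher quality, so that in the Limited-vs-rest test the non-Limited papers play the "positive" role. The function name and the example scores are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """P(random positive outscores random negative), counting ties as 0.5 (Mann-Whitney form)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical scores: Limited papers are expected to score lower than the rest.
limited = [10, 20, 35]
rest = [30, 50, 70, 90]
print(auc(rest, limited))  # 11 of 12 pairs correctly ordered
```

The same function applies unchanged when SJR values replace QED Scores. The double loop is O(n·m), which is fine for a few hundred papers; larger sets would use a rank-based formula or a library routine such as scikit-learn's `roc_auc_score`.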
Study 2: Can QED Score predict where a paper will be published, before review?
The second study was more ambitious. We scored 4,953 bioRxiv preprints posted in April 2025 with QED Score before peer review, then tracked them to see where they were eventually published.
One methodological constraint was critical: the language models behind QED Score have knowledge cut-offs that predate the April 2025 corpus, so none of these papers could have been seen by the underlying models. The resulting correlation therefore cannot be explained by the system having encountered the papers before.
Of the 4,953 preprints, 2,879 were subsequently matched to a published version with an associated SJR value. The correlation between the preprint QED Score and the eventual journal's SJR was Spearman ρ = 0.63, which is substantial agreement between an AI assessment of unreviewed work and the outcome of formal peer review months later. The correlation held across all 21 life science disciplines analysed, ranging from ρ = 0.78 in Genetics to ρ = 0.39 in Systems Biology.
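Spearman's ρ is the Pearson correlation computed on ranks rather than raw values, so it measures whether higher QED Scores go with higher-ranked journals without assuming a linear relationship. A dependency-free sketch, using average ranks for ties; in practice `scipy.stats.spearmanr` computes the same quantity. The paired inputs in the example are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank vectors (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)

# Hypothetical pairs: preprint QED Score vs SJR of the eventual journal.
qed = [12, 35, 48, 70, 91]
sjr = [1.1, 0.9, 2.4, 2.0, 6.3]
print(spearman_rho(qed, sjr))
```

Working on ranks also makes ρ insensitive to the heavy right skew of SJR values, where a few top journals sit far above the rest.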
In short? QED Score, computed on a preprint, can anticipate the rank of the journal in which it will eventually appear.
Study 3: When QED Score and journal rank disagree, who is right?
Finally, we looked at what happens when QED Score and journal rank diverge. We constructed 100 paper pairs specifically because the two signals pointed in opposite directions: the paper QED Score rated more highly had been published in the lower-ranked journal. Blinded domain-expert professors then judged which paper in each pair was stronger, without knowing where either had been published or what QED Score had said.
Experts rendered 70 confident judgements. Setting aside ties left 60 decisive ones, and in 45 of those (75%) the experts sided with QED Score (95% Wilson CI 63%–84%, p < 0.001). In other words, they preferred the QED-favoured paper three times as often as the journal-rank-favoured paper.
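The pair construction can be sketched as follows, assuming each paper carries a QED Score and the SJR of its journal. A pair is discordant when the two signals order the papers in opposite directions, that is, when the score difference and the SJR difference have opposite signs. How the actual 100 pairs were sampled or matched (for example, within a field) is not stated here, so the `discordant_pairs` and `sample_pairs` helpers and the fixed-seed sampling are assumptions.

```python
import random
from itertools import combinations

def discordant_pairs(papers):
    """Return (qed_favoured_id, rank_favoured_id) for every pair where QED Score and SJR disagree."""
    pairs = []
    for a, b in combinations(papers, 2):
        # Opposite signs: the higher-QED paper sits in the lower-SJR journal.
        if (a["qed"] - b["qed"]) * (a["sjr"] - b["sjr"]) < 0:
            qed_fav, rank_fav = (a, b) if a["qed"] > b["qed"] else (b, a)
            pairs.append((qed_fav["id"], rank_fav["id"]))
    return pairs

def sample_pairs(pairs, k, seed=0):
    """Hypothetical: draw k pairs reproducibly; real selection criteria may differ."""
    rng = random.Random(seed)
    return rng.sample(pairs, min(k, len(pairs)))

papers = [
    {"id": "A", "qed": 90, "sjr": 1.0},
    {"id": "B", "qed": 40, "sjr": 5.0},
    {"id": "C", "qed": 60, "sjr": 3.0},
    {"id": "D", "qed": 95, "sjr": 9.0},  # agrees with SJR against every other paper
]
print(discordant_pairs(papers))  # [('A', 'B'), ('A', 'C'), ('C', 'B')]
```

Enumerating all pairs is quadratic in the number of papers, acceptable for a sketch but something a real pipeline would restrict, for instance to papers within the same discipline.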
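The reported interval and p-value can be checked from the counts alone: 75% of 60 decisive judgements is 45 for QED Score and 15 for journal rank, hence the 3:1 ratio. A minimal sketch using the Wilson score interval at 95% confidence (z = 1.96), which reproduces the stated 63%–84%, and an exact two-sided binomial (sign) test against a 50% null. That the study used exactly this test for its p-value is an assumption; it is one standard choice for this design.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives 95% coverage)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

def binomial_two_sided_p(successes, n):
    """Exact two-sided sign test against p = 0.5: double the larger-count tail, capped at 1."""
    k = max(successes, n - successes)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)

lo, hi = wilson_interval(45, 60)
print(f"{lo:.1%} to {hi:.1%}")        # 62.8% to 84.2%
print(binomial_two_sided_p(45, 60))   # well below 0.001
```

Doubling one tail is exact here because the null distribution at p = 0.5 is symmetric.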
When it comes to QED Score vs journal rank, experts side with the AI.
Introducing The 1%
Having validated QED Score across three independent studies, we applied it at scale. Between May 2025 and April 2026, we scored 57,455 bioRxiv preprints, a near-complete census of bioRxiv's life science output over a full year. Every paper was anonymized before scoring and evaluated on the same two criteria: originality and validity.
The papers scoring in the top 1% of that distribution constitute The 1%: the most original and valid contributions to life science preprint output of the year, identified before peer review, blind to identity, ranked on the science alone.
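Selecting The 1% amounts to taking the highest-scoring 1% of the year's distribution. A minimal sketch, assuming scores are held in a dictionary keyed by a paper identifier; rounding the cut-off up with `math.ceil` and breaking ties by identifier are assumptions, since neither is specified. Under that convention, 1% of 57,455 preprints rounds up to 575 papers.

```python
import math

def top_fraction(scores: dict[str, float], fraction: float = 0.01) -> list[str]:
    """IDs of the highest-scoring `fraction` of papers; ties broken by ID so the cut is deterministic."""
    k = math.ceil(len(scores) * fraction)  # hypothetical rounding rule for the cut-off
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return ranked[:k]

# Hypothetical corpus of 200 papers scored 0..199: the top 1% is the best 2.
scores = {f"p{i}": float(i) for i in range(200)}
print(top_fraction(scores))  # ['p199', 'p198']
```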
At 1 in 100, The 1% is more selective than any major life science journal. Nature accepts roughly 8% of submissions. Science accepts around 6%. The 1% is not a journal acceptance. It is an assessment of the science itself, made by a system that does not know and does not care who wrote it. If your paper is in The 1%, it earned its place on merit.
What QED Science has built is unprecedented. For the first time, scientific quality can be measured at scale: blind to identity and zip code, validated against expert judgement, and available the moment a paper exists. The 1% is one application of that capability, but the technology is not limited to a single use case.
Any decision that rests on the quality of scientific evidence (in drug development, clinical research, hiring, funding, and beyond) can now be informed by a signal that reflects the science rather than the system around it.
Read the full methodology and validation data in our white paper, and discover The 1% for yourself.

