Writing · 9 min · June 9, 2026

How to Write a Methods Section That Survives Peer Review

Reviewers reject papers they cannot reproduce. Here's how to write a methods section detailed enough to rebuild, ordered so a stranger could follow, and defensible against the questions reviewers actually ask.

Jin Park
Founder & Editorial Lead

1. What the Methods Section Is Actually For

The methods section has exactly two jobs: let a competent stranger reproduce your work, and convince a skeptical reviewer that your results mean what you claim. Everything else (elegant prose, motivation, history) belongs in the introduction. If a sentence does not serve reproduction or credibility, cut it.

Reviewers reach for "reject" when they cannot tell what you did. A vague methods section reads as either carelessness or concealment, and reviewers assume the less charitable option. The bar is concrete: could someone in your field, with your section and nothing else, rebuild the experiment and expect the same outcome? If the honest answer is no, you have rewriting to do.

Two Tests Every Methods Section Must Pass

  • Reproduction: a peer could rebuild it from your text alone.
  • Credibility: a skeptic cannot find an obvious confound you ignored.

If either test fails, the strongest results section will not save the paper.

2. Order It So a Stranger Could Follow

Write the methods in the order someone would execute them, not the order in which you discovered them. A reliable structure: data or materials first, then preprocessing, then the model or procedure, then the experimental setup (hardware, hyperparameters, splits), then evaluation metrics. Each subsection should hand the reader cleanly to the next.

Use subsection headings generously. A reviewer skims first and reads second, so clear headings let them find the one detail they want to check without rereading the whole section. If your field has a "Materials and Methods" convention, follow it exactly; reviewers notice when you deviate from what they expect, and it costs you goodwill before they reach your results.

3. Justify Choices, Don't Just Describe Them

The weakest methods sections are pure description: "We used a learning rate of 0.001 and trained for 100 epochs." The reviewer's immediate question is "why?", and an unanswered why is an opening for rejection. Strong methods sections describe and justify in the same breath: "We used a learning rate of 0.001, selected by grid search on the validation set (Appendix B)."
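The justified version of that sentence implies a small, checkable procedure. Here is a minimal sketch of it, assuming a hypothetical `validation_score` function that stands in for "train with this learning rate, then score on the validation set". The toy scoring function and the candidate grid are illustrative only and do not come from any real experiment.

```python
import math
from typing import Callable, Dict, Iterable, Tuple


def grid_search(
    candidates: Iterable[float],
    validation_score: Callable[[float], float],
) -> Tuple[float, Dict[float, float]]:
    """Score every candidate on the validation set and return the best one.

    The test set is never touched here; it is reserved for the final report.
    """
    scores = {lr: validation_score(lr) for lr in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def toy_validation_score(lr: float) -> float:
    """Hypothetical stand-in for "train with lr, evaluate on validation".

    Constructed to peak at lr = 1e-3 so the example has a known answer.
    """
    return -abs(math.log10(lr) + 3.0)


best_lr, all_scores = grid_search([1e-1, 1e-2, 1e-3, 1e-4], toy_validation_score)
print(f"selected learning rate: {best_lr}")
for lr, score in sorted(all_scores.items()):
    print(f"  lr={lr:g}  validation score={score:.3f}")
```

Reporting the whole grid in an appendix, not only the winner, lets a reviewer see how sensitive the result is to the choice.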

You do not need a citation or experiment for every choice, but you do need to signal that the choice was deliberate. "Following standard practice in [domain]" with a citation handles the conventional decisions. Reserve real justification for the choices a reviewer could plausibly attack: your baseline selection, your evaluation metric, your data split, anything non-standard.

Choices Reviewers Will Question

  • Why this baseline and not the obvious stronger one?
  • Why this metric — does it actually measure what you claim?
  • How were the train/validation/test sets split, and could there be leakage?
  • Were hyperparameters tuned on the test set? (If yes, the paper is dead.)
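The split and leakage questions are the easiest ones to answer with something a reviewer can rerun. The following is a sketch under stated assumptions: examples are addressed by integer index, the split is a simple random one, and the function names, fractions, and seed are hypothetical choices made for illustration.

```python
import random


def split_indices(n, seed, val_frac=0.1, test_frac=0.1):
    """Shuffle indices 0..n-1 with a fixed seed and cut them into three parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so the global state is untouched
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def assert_no_index_overlap(train, val, test):
    """Fail loudly if any example index appears in more than one split."""
    a, b, c = set(train), set(val), set(test)
    if a & b or a & c or b & c:
        raise ValueError("train/validation/test splits share examples")


train, val, test = split_indices(1000, seed=13)
assert_no_index_overlap(train, val, test)
print(len(train), len(val), len(test))
```

An index check like this only rules out the same row landing in two splits. It does not catch duplicated records stored under different indices, or examples that share a source (the same patient, document, or user), which usually call for splitting by group instead.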

4. Report Enough to Reproduce — The Checklist

Reproducibility is not a virtue you mention; it is a list of specifics you provide. The single most common reason a result cannot be reproduced is a missing number the authors thought was obvious. Assume nothing is obvious. Many top venues now ship a reproducibility checklist with the submission form. Read it before you write, not after, because it tells you exactly what reviewers were told to look for.

Put exhaustive detail in an appendix or supplementary file so the main text stays readable, then reference it explicitly. "Full hyperparameters are in Appendix C" is far stronger than burying forty numbers in a paragraph nobody can parse. Release code and data when you can: a working repository answers more reviewer questions than any amount of prose.

Reproducibility Minimums

  • Data: source, version, size, and exact preprocessing steps.
  • Splits: how train/val/test were created, with seeds if random.
  • Model: architecture, all hyperparameters, and how they were chosen.
  • Compute: hardware, runtime, and number of runs averaged.
  • Code: a link, or a clear statement of why it is unavailable.
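One way to make sure none of these minimums gets dropped is to record them in a single machine-readable manifest that ships with the supplementary material. The sketch below is illustrative: the `RunManifest` class, its field names, and every value in the example are hypothetical placeholders, not a standard format or real experimental details.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunManifest:
    # Data
    data_source: str
    data_version: str
    n_examples: int
    preprocessing: list
    # Splits
    split_seed: int
    split_fractions: dict
    # Model
    architecture: str
    hyperparameters: dict
    hyperparameter_selection: str
    # Compute
    hardware: str
    runtime_hours: float
    n_runs: int
    # Code
    code_url: str


# Placeholder values only; replace each one with what was actually used.
manifest = RunManifest(
    data_source="example-corpus",
    data_version="v1.0",
    n_examples=1000,
    preprocessing=["lowercase", "deduplicate", "drop empty rows"],
    split_seed=13,
    split_fractions={"train": 0.8, "val": 0.1, "test": 0.1},
    architecture="2-layer MLP",
    hyperparameters={"learning_rate": 0.001, "epochs": 100},
    hyperparameter_selection="grid search on the validation set",
    hardware="1x GPU (model stated in appendix)",
    runtime_hours=1.5,
    n_runs=5,
    code_url="https://example.org/placeholder-repo",
)

manifest_json = json.dumps(asdict(manifest), indent=2, sort_keys=True)
print(manifest_json)
```

Because every field is required, constructing the manifest fails with a `TypeError` when an item is missing, which turns a forgotten detail into an error before submission and not a reviewer question after it.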

5. Statistics and Baselines: The Reviewer's Attack Surface

Two things draw reviewer fire more than anything else: weak baselines and missing statistics. Compare against the strongest method a reviewer would expect, not a straw man. If you skip the obvious strong baseline, the first reviewer will name it and ask why it is absent, and you will spend your rebuttal defending the omission instead of arguing your strengths. When you cannot beat a baseline, say so and explain the tradeoff; honesty reads better than a suspicious omission.

Report variance, not just point estimates. A single run is an anecdote. Average over multiple seeds, report standard deviation or confidence intervals, and state how many runs you used. If you claim method A beats method B, a reviewer will ask whether the difference survives the noise. Have the answer in the paper, ideally with a significance test appropriate to your data.
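To make "survives the noise" concrete, here is a minimal sketch that summarizes per-seed scores and compares two methods run on the same seeds. The scores are made-up illustrative numbers, the function names are hypothetical, and the paired sign-flip permutation test is just one reasonable choice for paired runs; the right test depends on your data.

```python
import math
import statistics
from itertools import product


def summarize(scores, t_crit):
    """Return mean, sample standard deviation, and a t-based confidence interval."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    half_width = t_crit * sd / math.sqrt(len(scores))
    return mean, sd, (mean - half_width, mean + half_width)


def sign_flip_p_value(diffs):
    """Exact two-sided paired sign-flip permutation test on per-seed differences."""
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / total


# Illustrative numbers only: five seeds, the same seeds for both methods.
method_a = [0.81, 0.83, 0.82, 0.84, 0.80]
method_b = [0.79, 0.80, 0.80, 0.81, 0.79]

T_CRIT_95_DF4 = 2.776  # two-sided 95% critical value of Student's t, 4 degrees of freedom

mean_a, sd_a, ci_a = summarize(method_a, T_CRIT_95_DF4)
diffs = [a - b for a, b in zip(method_a, method_b)]
p_value = sign_flip_p_value(diffs)

print(f"A: mean={mean_a:.3f} sd={sd_a:.4f} 95% CI=({ci_a[0]:.3f}, {ci_a[1]:.3f}) over {len(method_a)} runs")
print(f"paired sign-flip p-value: {p_value}")
```

In this example method A wins on every seed, yet the p-value is 2/32 = 0.0625, the smallest value this test can produce with five paired runs. That is worth knowing before you choose the number of seeds: too few runs can make a real difference impossible to demonstrate at the threshold reviewers expect.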

6. Common Mistakes and a Final Self-Check

The recurring failures are predictable: tuning on the test set, reporting one lucky run, describing without justifying, and omitting the detail that turns out to be load-bearing. Each is easy to fix before submission and expensive to fix during rebuttal, when a reviewer has already formed a negative impression.

Before you submit, hand the methods section to a labmate who has not seen the project and ask them to list everything they would need to reproduce it. The gaps they find are the gaps a reviewer will find. Close them now, while it costs you an afternoon instead of a resubmission cycle.

Methods Section Self-Check

  • Could a stranger reproduce this from the text plus appendix alone?
  • Is every non-standard choice justified, not just stated?
  • Did you compare against the strongest expected baseline?
  • Do results report variance over multiple runs, not a single number?
  • Are you certain that nothing was tuned on the test set?
About the author
Jin Park
Founder & Editorial Lead

PhD graduate who spent years tracking conference deadlines across computer science and engineering. Built ScholarDue after missing a submission window in the final year of candidacy and realizing no single tool tracked CFPs, extensions, and notification dates in one place.
