Official code for GENE-BENCH, a benchmark that operationalizes research-idea evolution as genome-centric lineage understanding and generation. GENE-BENCH contains two components:
- GENE-Exam: 42 task types and 1,029 closed-form instances evaluating four dimensions of lineage competence — genome abstraction, inheritance mapping, evolutionary reasoning, and lineage validation.
- GENE-Arena: 30 domain tasks evaluating lineage-grounded idea generation via Population Evolving Score (PES).
Install dependencies and configure API access:

```bash
pip install -r requirements.txt

export BASE_URL="https://api.openai.com/v1"
export API_KEY="sk-your-key-here"
export MODEL_NAME="gpt-4o"
```
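To verify the configuration before a run, a minimal sanity check (a sketch, not part of the repo; it assumes the `openai` Python package is installed and the endpoint is OpenAI-compatible):

```python
import os

from openai import OpenAI

# Read the same variables the benchmark commands rely on.
client = OpenAI(
    base_url=os.environ["BASE_URL"],
    api_key=os.environ["API_KEY"],
)
model = os.environ["MODEL_NAME"]

# One tiny completion to confirm the credentials and model name work.
reply = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=4,
)
print(f"{model} responded: {reply.choices[0].message.content!r}")
```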
```bash
# Smoke test (single task type, 2 instances)
python -m gene_exam.evaluators.eval_benchmark \
  --provider openai \
  --model gpt-4.1-mini \
  --task-type T1-01_contribution_type \
  --max-per-task 2 \
  --output gene_exam/results/smoke.json
```
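To take a quick look at the output (a minimal sketch; the result file's schema is not documented here, so this assumes only that it is valid JSON):

```python
import json

# Load the smoke-test output and show its top-level structure.
with open("gene_exam/results/smoke.json") as f:
    results = json.load(f)

# Print top-level keys (dict) or record count (list) without
# assuming anything else about the schema.
if isinstance(results, dict):
    print("top-level keys:", sorted(results))
else:
    print("records:", len(results))
```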
```bash
# Full 42-task benchmark
python -m gene_exam.evaluators.eval_benchmark \
  --provider openai \
  --model gpt-4.1-mini \
  --concurrency 8 \
  --output gene_exam/results/eval_full.json
```

Place generated proposals at:

```
gene_arena/results/<arena-id>/ideas/<task_id>/<participant_id>_<setting>.json
```
Each file should contain:
```json
{
  "content": "... generated proposal text or JSON ..."
}
```
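For example, to stage a single proposal for the PES run below (a minimal sketch; `smoke`, `cs_AgentFramework`, `openai-default`, and `Question` match the identifiers used in the next command, and the proposal text is a placeholder):

```python
import json
from pathlib import Path

# Target: gene_arena/results/<arena-id>/ideas/<task_id>/<participant_id>_<setting>.json
idea_path = Path(
    "gene_arena/results/smoke/ideas/cs_AgentFramework/openai-default_Question.json"
)
idea_path.parent.mkdir(parents=True, exist_ok=True)

# Each file holds a single "content" field with the generated proposal.
idea_path.write_text(json.dumps({"content": "... generated proposal text ..."}))
```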
Then score with PES:

```bash
python gene_arena/run_arena.py pes \
  --arena-id smoke \
  --tasks cs_AgentFramework \
  --participants openai-default \
  --settings Question \
  --judge-models judge-gpt4o judge-gpt4o-mini judge-gpt4.1-mini
```

Results are written to `gene_arena/results/<arena-id>/`.
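To see what a run produced (a minimal sketch; it assumes only that outputs land under `gene_arena/results/<arena-id>/` as stated above):

```python
from pathlib import Path

arena_id = "smoke"  # the --arena-id used in the PES run
results_dir = Path("gene_arena/results") / arena_id

# List every file the run wrote, relative to the arena directory.
for path in sorted(results_dir.rglob("*")):
    if path.is_file():
        print(path.relative_to(results_dir))
```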
Repository layout:

```
IdeasHaveGenomes/
├── gene_exam/
│   ├── Questions/                      # 42 task types × 1,029 instances
│   └── evaluators/                     # Exact-match evaluator
├── gene_arena/
│   ├── task/                           # 30 domain tasks (10 domains × 3)
│   ├── run_arena.py                    # PES runner
│   ├── dynamics_eval.py                # Evolutionary dynamics inference
│   ├── genome_differ.py                # Gene alignment & diff
│   └── population_evolving_score.py    # PES scoring
├── config.py
└── requirements.txt
```
TBD