add script to analyse dataset

2026-04-03 15:45:54 +02:00
parent 49f2e0b008
commit 8e733dfe39
3 changed files with 268 additions and 0 deletions
+18
@@ -133,6 +133,24 @@ just curate-dataset append=true
just curate-dataset append=true archive=true archive_dir=data/dataset/archive
```
Analyze dataset quality overall and per day (the report includes the best game overall and for each day):
```sh
python -m server.DatasetStats --input "good_moves-*.jsonl"
python -m server.DatasetStats --input data/dataset --output data/dataset/stats-report.json
```
The stats report now includes both:
- `best_game` (focused on survival and game length)
- `best_pressure_game` (focused on quality under pressure: fewer safe options combined with strong survival)
Or with `just`:
```sh
just analyze-dataset
just analyze-dataset input=data/dataset output=data/dataset/stats-report.json
```
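As a rough sketch of how the two report entries could be read back after an analysis run: the key names `best_game` and `best_pressure_game` come from the text above, but their placement at the top level of the JSON report, and the helper names `load_report` and `summarize_report`, are assumptions made for illustration and not part of `server.DatasetStats`.

```python
import json
from pathlib import Path

# Key names are taken from the README text; assuming they sit at the
# top level of the report is a guess about the file layout.
REPORT_KEYS = ("best_game", "best_pressure_game")


def load_report(path: str) -> dict:
    """Read a stats report previously written via --output."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_report(report: dict) -> dict:
    """Return the two 'best game' entries, with None where a key is absent."""
    return {key: report.get(key) for key in REPORT_KEYS}


if __name__ == "__main__":
    report_path = Path("data/dataset/stats-report.json")
    if report_path.exists():
        summary = summarize_report(load_report(str(report_path)))
        for key, value in summary.items():
            print(f"{key}: {json.dumps(value, indent=2)}")
    else:
        print(f"{report_path} not found; run the analysis first")
```

Using `dict.get` keeps the sketch tolerant of reports generated before `best_pressure_game` was added, since a missing key simply yields `None`.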
To store compact dataset-only records (JSONL) and skip full per-game JSON files:
```sh