C1

Chess puzzle SFT/RL on Qwen3.

Environment Setup

Quick setup (one command):

bash setup_env.sh

Or manual setup:

# Create conda environment (Python 3.12)
conda create -n c1 python=3.12 -y
conda activate c1

# Install verl and LLaMA-Factory from git
pip install --no-deps --no-build-isolation \
    "verl @ git+https://github.com/volcengine/verl.git@facd9fb50193522f87983b89f886afe8c0810acc" \
    "llamafactory @ git+https://github.com/hiyouga/LLaMA-Factory.git@a711bce664faade03b540ad30c41707ba8c928ad"

# Install other dependencies (adjust the CUDA wheel index URL if your CUDA version differs)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install vllm flash-attn transformers "datasets>=2.16.0,<=4.0.0" python-chess

Key packages installed:

  • torch 2.8.0 + cu128
  • vllm 0.11.0
  • flash-attn 2.8.1
  • transformers 4.57.0
  • datasets >=2.16.0,<=4.0.0
  • python-chess 1.999 (chess 1.11.2)
  • verl (git: facd9fb)
  • llamafactory (git: a711bce)

API Keys

Create api_keys.json in the project root:

{
  "openrouter": {
    "api_key": "your-openrouter-key"
  },
  "wandb": {
    "api_key": "your-wandb-key",
    "entity": "your-wandb-entity"
  },
  "huggingface": {
    "token": "your-hf-token"
  }
}

⚠️ Never commit api_keys.json. It is already listed in .gitignore.
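
As a minimal sketch of how a script might consume this file, the helper below reads api_keys.json and copies each non-empty value into environment variables. The function name `load_api_keys` and the `ENV_MAP` table are hypothetical; the repository's own scripts may load the keys differently.

```python
import json
import os
from pathlib import Path

# Hypothetical mapping from api_keys.json fields to environment variables.
# WANDB_API_KEY, WANDB_ENTITY and HF_TOKEN are the names those tools read;
# OPENROUTER_API_KEY is only a common convention, not confirmed by this repo.
ENV_MAP = {
    ("openrouter", "api_key"): "OPENROUTER_API_KEY",
    ("wandb", "api_key"): "WANDB_API_KEY",
    ("wandb", "entity"): "WANDB_ENTITY",
    ("huggingface", "token"): "HF_TOKEN",
}


def load_api_keys(path="api_keys.json", environ=None):
    """Read api_keys.json and copy each non-empty value into `environ`.

    Returns the list of environment variable names that were set.
    """
    environ = os.environ if environ is None else environ
    keys = json.loads(Path(path).read_text(encoding="utf-8"))
    exported = []
    for (section, field), env_name in ENV_MAP.items():
        value = keys.get(section, {}).get(field)
        if value:  # skip missing or empty entries
            environ[env_name] = value
            exported.append(env_name)
    return exported
```

Passing a plain dict as `environ` makes it possible to check the result without touching the real process environment.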


Data Generation

Training data is generated from chess puzzles using the Gemini API:

cd code
python 1_cot_generation.py --cfg configs/gemini3_flash.yaml    # Gemini 3 Flash data
python 1_cot_generation.py --cfg configs/gemini3.5_flash.yaml   # Gemini 3.5 Flash data

Then convert to LLaMA-Factory format:

python 2_format_matching.py --register

Data files (relative to the project root):

  • data/train_sft_gemini-3-flash.json - Generated with Gemini 3 Flash
  • data/train_sft_gemini-3.5-flash.json - Generated with Gemini 3.5 Flash
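
The conversion step can be pictured roughly as follows. This is a sketch under assumptions, not the contents of 2_format_matching.py: it assumes the Alpaca-style layout (`instruction`, `input`, `output`) that LLaMA-Factory accepts by default, and the prompt wording, the helper names and the idea that `--register` writes an entry into a dataset_info.json file are all guesses made for illustration.

```python
import json
from pathlib import Path


def to_alpaca_record(fen, reasoning, best_move):
    """Build one Alpaca-style SFT example (prompt wording is an assumption)."""
    return {
        "instruction": "Find the best move in this chess position.",
        "input": fen,
        "output": f"{reasoning}\n\nBest move: {best_move}",
    }


def register_dataset(info_path, name, file_name):
    """Add or update one entry in a LLaMA-Factory style dataset_info.json."""
    info_path = Path(info_path)
    if info_path.exists():
        info = json.loads(info_path.read_text(encoding="utf-8"))
    else:
        info = {}
    info[name] = {"file_name": file_name}
    info_path.write_text(json.dumps(info, indent=2), encoding="utf-8")
    return info
```

With this layout, a training config would refer to the dataset by the registered name rather than by file path.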

SFT Training

Always use scripts/sft.sh to run training (never call llamafactory-cli directly):

conda activate c1

# 0.6B models
bash scripts/sft.sh configs/qwen3-0.6b-gemini3-flash.yaml    # GPU 0-3
bash scripts/sft.sh configs/qwen3-0.6b-gemini3.5-flash.yaml # GPU 4-7

# 4B models
bash scripts/sft.sh configs/qwen3-4b-gemini3-flash.yaml      # GPU 0-3
bash scripts/sft.sh configs/qwen3-4b-gemini3.5-flash.yaml   # GPU 4-7

GPU Allocation:

  • Training jobs: 4 GPUs each (DDP)
  • Run parallel jobs on disjoint GPU sets (e.g., GPU 0-3 and GPU 4-7)
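
One way to launch two jobs side by side is sketched below. It assumes scripts/sft.sh respects a CUDA_VISIBLE_DEVICES value inherited from the environment, which this README implies but does not state outright; `job_env`, `assert_disjoint` and `launch_parallel` are hypothetical helper names.

```python
import os
import subprocess


def job_env(gpu_ids, base_env=None):
    """Copy the environment and pin the job to the given GPU ids."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return env


def assert_disjoint(gpu_sets):
    """Raise ValueError if any GPU id appears in more than one set."""
    seen = set()
    for gpus in gpu_sets:
        overlap = seen.intersection(gpus)
        if overlap:
            raise ValueError(f"GPU(s) {sorted(overlap)} assigned to more than one job")
        seen.update(gpus)


def launch_parallel(jobs):
    """Start one scripts/sft.sh process per (config_path, gpu_ids) pair.

    Returns the exit codes in the same order as `jobs`.
    """
    assert_disjoint([gpus for _, gpus in jobs])
    procs = [
        subprocess.Popen(["bash", "scripts/sft.sh", cfg], env=job_env(gpus))
        for cfg, gpus in jobs
    ]
    return [p.wait() for p in procs]


# Example (run from the project root):
# launch_parallel([
#     ("configs/qwen3-4b-gemini3-flash.yaml", [0, 1, 2, 3]),
#     ("configs/qwen3-4b-gemini3.5-flash.yaml", [4, 5, 6, 7]),
# ])
```

Checking for overlap before starting anything avoids two DDP jobs competing for the same device.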

Outputs:

  • Checkpoints: /data1/C1/qwen3-{size}/sft_{dataset}/checkpoint-*
  • Logs: logs/sft_train_{name}_{timestamp}.log (under the project root)
  • WandB: lilvjosephtang-university-of-toronto/c1_sft
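
To pick a checkpoint from a run directory, a small helper like the one below can be used. It is a sketch that assumes the `checkpoint-*` suffix is an integer step count, as in the Hugging Face Trainer convention; the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches directory names such as "checkpoint-320" and captures the step.
_STEP = re.compile(r"checkpoint-(\d+)$")


def list_checkpoints(run_dir):
    """Return (step, path) pairs for checkpoint-<step> subdirectories, sorted by step."""
    found = []
    for child in Path(run_dir).iterdir():
        match = _STEP.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    return sorted(found, key=lambda pair: pair[0])


def latest_checkpoint(run_dir):
    """Return the path of the highest-step checkpoint, or None if there is none."""
    checkpoints = list_checkpoints(run_dir)
    return checkpoints[-1][1] if checkpoints else None
```

Sorting by the parsed integer matters because a plain string sort would place checkpoint-80 after checkpoint-320.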

Training configs:

  • Base models: /data1/models/Qwen/Qwen3-0.6B, /data1/models/Qwen/Qwen3-4B
  • LoRA rank: 32
  • Training steps: 320

SFT Evaluation

Evaluate trained models on the test set:

conda activate c1

# Evaluate gemini3-flash model
CUDA_VISIBLE_DEVICES=0,1,2,3 python code/sft_eval.py \
    --base_model_path /data1/models/Qwen/Qwen3-4B \
    --lora_dir /data1/C1/qwen3-4b/sft_gemini3_flash \
    --output_dir ../outputs/val_sft_gemini3_flash

# Evaluate gemini3.5-flash model
CUDA_VISIBLE_DEVICES=4,5,6,7 python code/sft_eval.py \
    --base_model_path /data1/models/Qwen/Qwen3-4B \
    --lora_dir /data1/C1/qwen3-4b/sft_gemini3.5_flash \
    --output_dir ../outputs/val_sft_gemini3.5_flash

Parameters:

  • --base_model_path: Path to base Qwen3 model
  • --lora_dir: Path to trained LoRA adapters
  • --output_dir: Where to save predictions
  • --test_data_path: Test data path (default: ../data/test.parquet)
  • --tensor_parallel_size: Number of GPUs for vLLM (default: 4)
  • --temperature: Sampling temperature (default: 0.0, i.e. greedy, deterministic decoding)
  • --top_p: Nucleus sampling threshold (default: 1.0)
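
The flags and defaults listed above map onto an argument parser along these lines. This is an illustrative sketch, not the parser in code/sft_eval.py; in particular, treating the first three flags as required is an assumption.

```python
import argparse


def build_parser():
    """Argument parser mirroring the documented sft_eval.py flags (sketch)."""
    parser = argparse.ArgumentParser(description="SFT evaluation flags (sketch)")
    # Assumed to be required, since the README gives no defaults for them.
    parser.add_argument("--base_model_path", required=True, help="Path to base Qwen3 model")
    parser.add_argument("--lora_dir", required=True, help="Path to trained LoRA adapters")
    parser.add_argument("--output_dir", required=True, help="Where to save predictions")
    # Defaults copied from the parameter list above.
    parser.add_argument("--test_data_path", default="../data/test.parquet")
    parser.add_argument("--tensor_parallel_size", type=int, default=4)
    parser.add_argument("--temperature", type=float, default=0.0)
    parser.add_argument("--top_p", type=float, default=1.0)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Note that `--tensor_parallel_size` should match the number of GPUs exposed through CUDA_VISIBLE_DEVICES.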

Outputs:

  • Predictions: {output_dir}/checkpoint-*.jsonl
  • Logs: logs/sft_eval_{name}_{timestamp}.log (under the project root)
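
A per-checkpoint accuracy table can be computed from the prediction files with something like the sketch below. The JSONL field names `prediction` and `answer` are hypothetical, since the README does not document the output schema; adjust them to whatever sft_eval.py actually writes.

```python
import json
from pathlib import Path


def accuracy(jsonl_path, pred_key="prediction", gold_key="answer"):
    """Exact-match accuracy over one predictions file (field names are assumptions)."""
    total = correct = 0
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            total += 1
            pred = str(row.get(pred_key, "")).strip()
            gold = str(row.get(gold_key, "")).strip()
            # Case is preserved: in SAN, "Bxc4" and "bxc4" are different moves.
            correct += int(gold != "" and pred == gold)
    return correct / total if total else 0.0


def score_run(output_dir):
    """Map each checkpoint-*.jsonl file name in output_dir to its accuracy."""
    files = sorted(Path(output_dir).glob("checkpoint-*.jsonl"))
    return {path.name: accuracy(path) for path in files}
```

Exact string match is the simplest scoring rule; a stricter evaluation might first normalize both moves to a single notation.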

RL Training (GRPO/DAPO)

Make sure the conda environment is activated:

conda activate c1

Set C1_MODEL_PATH to the SFT output that matches the model size you are training, then run the corresponding script:

export C1_MODEL_PATH=/data1/C1/qwen3-0.6b/sft_gemini3_flash

bash scripts/grpo_qwen-0.6b.sh    # 0.6B model
bash scripts/grpo_qwen-4b.sh      # 4B model
bash scripts/dapo_qwen-4b.sh      # DAPO variant

WandB Projects

  • SFT: c1_sft (entity: lilvjosephtang-university-of-toronto)
  • RL (GRPO/DAPO): c1_rl

Override with the WANDB_PROJECT environment variable if needed.
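
The override behaves like a simple fallback lookup, sketched below. The defaults come from the list above; how the training scripts actually read the variable is not shown in this README, so `resolve_wandb_project` is a hypothetical illustration only.

```python
import os

# Default project per training stage, taken from the list above.
DEFAULT_PROJECTS = {"sft": "c1_sft", "rl": "c1_rl"}


def resolve_wandb_project(stage, environ=None):
    """Return WANDB_PROJECT if it is set and non-empty, else the stage default."""
    environ = os.environ if environ is None else environ
    override = environ.get("WANDB_PROJECT", "").strip()
    return override or DEFAULT_PROJECTS[stage]
```

For example, running with `WANDB_PROJECT=c1_debug` in the environment would send both SFT and RL runs to `c1_debug` under this scheme.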


Project Structure

C1/
├── api_keys.json          # API keys (not in git)
├── code/
│   ├── 1_cot_generation.py # Data generation from Gemini API
│   ├── 2_format_matching.py # Convert to LLaMA-Factory format
│   ├── sft_eval.py         # SFT evaluation script
│   └── configs/           # Config files for data generation
├── configs/
│   ├── qwen3-0.6b-gemini3-flash.yaml
│   ├── qwen3-0.6b-gemini3.5-flash.yaml
│   ├── qwen3-4b-gemini3-flash.yaml
│   └── qwen3-4b-gemini3.5-flash.yaml
├── data/
│   ├── test.parquet        # Test set
│   └── train_sft_*.json    # Training data
├── logs/                   # Training and evaluation logs
├── outputs/                # Evaluation predictions
├── scripts/
│   ├── sft.sh             # SFT training wrapper
│   ├── grpo_qwen-*.sh     # GRPO training
│   └── dapo_qwen-*.sh     # DAPO training
└── setup_env.sh           # Environment setup script

Notes

  • Always use scripts/sft.sh for training; it sets up the WandB configuration
  • GPU allocation: set CUDA_VISIBLE_DEVICES before running commands
  • vLLM 0.11.0 is required for Qwen3 support
  • Use temperature=0.0 for deterministic evaluation
