Hydra Configuration Reference

This document describes the Hydra configuration structure and parameters used for model training and management.

Configuration Overview

The configuration is organized into several sections controlling different aspects of the training pipeline:

defaults:
  - _self_
  - override hydra/job_logging: disabled
  - override hydra/hydra_logging: disabled

model:
  # Model architecture and training parameters
  # ...

training:
  # Training hyperparameters
  # ...

paths:
  # Directory paths and system locations
  # ...

Main Configuration Sections

Defaults Configuration

Defaults Overrides
Key	Description
`_self_`	Includes current config in composition hierarchy
`override hydra/job_logging`	Disables Hydra’s default job logging
`override hydra/hydra_logging`	Disables Hydra’s internal system logging

Model Configuration

Model Parameters
Parameter	Description	Default
model_name	Base model identifier from Hugging Face Hub	“Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it”
lora_r	LoRA rank dimension	16
lora_alpha	LoRA alpha scaling factor	32
qtype	Quantization type for GGUF conversion	“q4_1”
torch_dtype	Base model dtype (float16/float32)	“float16”

Training Configuration

Training Parameters (Key Items)
Parameter	Description	Default
per_device_train_batch_size	Batch size per GPU	1
gradient_accumulation_steps	Number of update steps before backward pass	4
learning_rate	Initial learning rate	2e-5
max_seq_length	Maximum input sequence length	2048
gradient_checkpointing	Enable memory-efficient training	true

Paths Configuration

Path Directories
Parameter	Description	Example
data_dir	Input dataset directory	“data”
output_dir	Trained model output directory	“models”
llama_cpp_dir	Path to llama.cpp installation	“../llama.cpp”
quantized_path	llama.cpp quantizer executable path	“build/bin/llama-quantize”

Training Pipeline Workflow

The complete training process follows these stages:

Initialization
- Configure logging and environment
- Load base model with 4-bit quantization
- Prepare tokenizer with custom padding
Data Preparation
- Load dataset from JSON files
- Generate chat-formatted prompts
- Tokenize with sequence length truncation
Model Training
- Apply LoRA configuration to base model
- Train using either SFTTrainer or GRPO
- Merge adapter weights with base model
Model Conversion
- Convert merged model to GGUF format
- Quantize using llama.cpp tools
- Save final weights to output directory

# Simplified pipeline flow
def train_pipeline(cfg):
    steps = train(cfg)
    with TemporaryDirectory() as tmp_dir:
        model_merge_for_converting(cfg, steps, tmp_dir)
        convert_to_gguf(tmp_dir, ...)
        quantize_model(...)
        copy_final_weights(...)

Important Implementation Notes

LoRA Configuration

The model uses Low-Rank Adaptation with these key settings:

LoRA Parameters
Module	Target Layers	Parameters
peft.LoraConfig	proj layers (q_proj, v_proj, etc)	r=16, alpha=32
Modules to Save	lm_head

Quantization Setup

The system supports two-stage quantization:

Training Quantization
- 4-bit NFQuant via BitsAndBytes
- Compatible dtype: float16
Post-Training Quantization
- GGUF conversion with llama.cpp
- Supported types: q4_0, q4_1, etc

Note

For optimal performance, ensure llama.cpp is compiled with CUDA support when quantizing on GPU systems.

Logging Configuration

Custom logging setup includes:

Hydra logging disabled for cleaner outputs
W&B integration for experiment tracking
Custom logging levels via logging_config.py

Warning

The hf_token field must be updated with a valid Hugging Face token when using private models or datasets.

Environment Requirements

The system requires these key dependencies:

Python 3.8+
PyTorch 2.0+
Transformers 4.30+
PEFT 0.4+
Hydra 1.3+
llama.cpp (latest version)

Full configuration schema available in conf/config.yaml