testing_model package

Submodules

testing_model.deepeval_func module

Tests for the model, built on the DeepEval framework.

Classes:

`CustomLocalModel`([model, url])	A custom local model implementation for DeepEval testing.
`CustomMistralModel`(api_key[, model, temperature])	A custom Mistral model implementation for DeepEval testing with rate limiting.

Functions:

`set_local_model_via_cli`([model_name, base_url])	Set the local model via CLI using deepeval command.
`test_from_dataset`([test_dataset, test_file])	Test the model using a dataset of prompts.
`test_mention_number_of_values`(user_input, output)	Check if the model mentions the number of values inappropriately.

class testing_model.deepeval_func.CustomLocalModel(model: str = 'vikhr-yandexgpt-5-lite-8b-it_gguf', url: str = 'http://localhost:1234/v1/', *args: Any, **kwargs: Any)

Bases: DeepEvalBaseLLM

A custom local model implementation for DeepEval testing.

model

The underlying language model.

Type: ChatOpenAI

model_name

Name of the model.

Type: str

Methods:

`__init__`([model, url])	Initialize the custom local model.
`a_generate`(prompt)	Asynchronously generate a response for the given prompt.
`generate`(prompt)	Generate a response for the given prompt.
`get_model_name`()	Get the name of the model.
`load_model`()	Load and return the model.

__init__(model: str = 'vikhr-yandexgpt-5-lite-8b-it_gguf', url: str = 'http://localhost:1234/v1/', *args: Any, **kwargs: Any)

Initialize the custom local model.

Parameters:

model (str, optional) – Name of the model. Defaults to “vikhr-yandexgpt-5-lite-8b-it_gguf”.
url (str, optional) – Base URL for the model. Defaults to “http://localhost:1234/v1/”.

async a_generate(prompt: str) → str

Asynchronously generate a response for the given prompt.

Parameters: prompt (str) – Input prompt for the model.
Returns: Generated model response.
Return type: str

generate(prompt: str) → str

Generate a response for the given prompt.

Parameters: prompt (str) – Input prompt for the model.
Returns: Generated model response.
Return type: str

get_model_name() → str

Get the name of the model.

Returns: Model name.
Return type: str

load_model() → ChatOpenAI

Load and return the model.

Returns: The loaded language model.
Return type: ChatOpenAI

class testing_model.deepeval_func.CustomMistralModel(api_key: str, model: str = 'mistral-small-latest', temperature: float = 0.1, *args: Any, **kwargs: Any)

Bases: DeepEvalBaseLLM

A custom Mistral model implementation for DeepEval testing with rate limiting.

client

Mistral API client.

Type: Mistral

model_name

Name of the model.

Type: str

temperature

Sampling temperature.

Type: float

last_request_time

Timestamp of the last API request.

Type: Optional[float]

rate_limit_delay

Minimum delay between requests.

Type: float

Methods:

`__init__`(api_key[, model, temperature])	Initialize the custom Mistral model.
`a_generate`(prompt)	Asynchronously generate a response for the given prompt.
`generate`(prompt)	Generate a response for the given prompt.
`get_model_name`()	Get the name of the model.
`load_model`()	Load and return the Mistral client.

__init__(api_key: str, model: str = 'mistral-small-latest', temperature: float = 0.1, *args: Any, **kwargs: Any)

Initialize the custom Mistral model.

Parameters:

api_key (str) – API key for Mistral service.
model (str, optional) – Name of the model. Defaults to “mistral-small-latest”.
temperature (float, optional) – Sampling temperature. Defaults to 0.1.

async a_generate(prompt: str) → str

Asynchronously generate a response for the given prompt.

Parameters: prompt (str) – Input prompt for the model.
Returns: Generated model response.
Return type: str

generate(prompt: str) → str

Generate a response for the given prompt.

Parameters: prompt (str) – Input prompt for the model.
Returns: Generated model response.
Return type: str

get_model_name() → str

Get the name of the model.

Returns: Model name.
Return type: str

load_model() → Mistral

Load and return the Mistral client.

Returns: The Mistral API client.
Return type: Mistral
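
The last_request_time and rate_limit_delay attributes of CustomMistralModel suggest a minimum-interval throttle between API calls. The following is a minimal sketch of that idea, assuming the delay is enforced by sleeping before each request. The class name MinIntervalLimiter and its injectable clock and sleep arguments are hypothetical and are not part of the package.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Hypothetical helper: enforce a minimum delay between requests."""

    def __init__(
        self,
        rate_limit_delay: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate_limit_delay = rate_limit_delay
        # None until the first request has been made.
        self.last_request_time: Optional[float] = None
        self._clock = clock
        self._sleep = sleep

    def wait(self) -> float:
        """Block until the delay has elapsed; return the seconds slept."""
        slept = 0.0
        if self.last_request_time is not None:
            elapsed = self._clock() - self.last_request_time
            remaining = self.rate_limit_delay - elapsed
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        # Record the moment the next request is allowed to start.
        self.last_request_time = self._clock()
        return slept
```

A generate-style method could call wait() immediately before each API request. Passing the clock and sleep functions in makes the throttle testable without real delays.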

testing_model.deepeval_func.set_local_model_via_cli(model_name: str = 'vikhr-yandexgpt-5-lite-8b-it_gguf', base_url: str = 'http://localhost:1234/v1') → None

Set the local model via CLI using deepeval command.

Parameters:

model_name (str, optional) – Name of the model. Defaults to “vikhr-yandexgpt-5-lite-8b-it_gguf”.
base_url (str, optional) – Base URL for the model. Defaults to “http://localhost:1234/v1”.

Prints:

Success message with command output
Error message if command fails
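
As a rough illustration of what set_local_model_via_cli might do, the sketch below builds and runs a DeepEval CLI call. It assumes the function shells out to the deepeval set-local-model command with the --model-name and --base-url options; whether the package uses subprocess, and the helper names build_set_local_model_command and run_command, are assumptions made for this example.

```python
import subprocess
from typing import List


def build_set_local_model_command(
    model_name: str = "vikhr-yandexgpt-5-lite-8b-it_gguf",
    base_url: str = "http://localhost:1234/v1",
) -> List[str]:
    """Hypothetical helper: assemble the CLI call as an argument list."""
    return [
        "deepeval",
        "set-local-model",
        f"--model-name={model_name}",
        f"--base-url={base_url}",
    ]


def run_command(command: List[str]) -> bool:
    """Run the command; print a success or error message, return success."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, check=True
        )
    except FileNotFoundError:
        print(f"Error: executable not found: {command[0]}")
        return False
    except subprocess.CalledProcessError as exc:
        print(f"Error: command failed: {exc.stderr}")
        return False
    print(f"Success: {result.stdout}")
    return True
```

Calling run_command(build_set_local_model_command()) would then mirror the documented behaviour of printing either the command output or an error message.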

testing_model.deepeval_func.test_from_dataset(test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json') → None

Test the model using a dataset of prompts.

Parameters:

test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to the processed test file. Defaults to “test.json”.

Logs:

Errors for failed tests
Final test metrics

testing_model.deepeval_func.test_mention_number_of_values(user_input: str, output: str) → bool

Check if the model mentions the number of values inappropriately.

Parameters:

user_input (str) – The original user input.
output (str) – The model’s generated output.

Returns:

Result of the DeepEval test.

Return type:

bool

Raises:

AssertionError – If the test fails based on the defined criteria.

testing_model.test module

Module for testing the LLM.

Classes:

`Content`(*, Action)	Inner element of the pydantic schema used to test the model.
`MainModel`(*, MessageText, Content)	Pydantic schema used to test the model.

Functions:

`call_llm`(prompt, model, client)	Sends a prompt to the LLM and returns its response as a dictionary.
`dataset_to_json_for_test`(dataset, filename)	Convert a dataset to a JSON file for testing purposes.
`llamacpp_execute_test`(llm, system_prompt, ...)	Runs the test for a single request.
`ollama_generate`(client, model_name, prompt, ...)	Wrapper function to generate a response using Ollama's structured outputs.
`run_tests`(cfg, client[, test_dataset_path, ...])	Runs tests by comparing the LLM responses with expected answers from a dataset.
`test_llm`(cfg[, path_test_dataset, ...])	Test the LLM via LM Studio by comparing model responses with expected answers.
`test_via_llamacpp`(model_path[, ...])	Tests a GGUF model through llama.cpp using the supplied test function.

class testing_model.test.Content(*, Action: str)

Bases: BaseModel

Inner element of the pydantic schema used to test the model.

Attributes:

`Action`
`model_config`	Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Action: str

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class testing_model.test.MainModel(*, MessageText: str, Content: Content)

Bases: BaseModel

Pydantic schema used to test the model.

Attributes:

`Content`
`MessageText`
`model_config`	Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Content: Content

MessageText: str

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
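
The two schemas nest: a MainModel payload carries a MessageText string and a Content object holding a single Action string, and the 'extra': 'forbid' setting rejects any other keys. The sketch below illustrates that shape with the standard library only. It is a stand-in for the pydantic validation, not the package's code; the function name check_main_model and the sample values are invented for the example.

```python
import json
from typing import Any, Dict


def check_main_model(payload: Dict[str, Any]) -> None:
    """Raise ValueError unless payload matches the MainModel shape."""
    # 'extra': 'forbid' means no keys beyond the declared fields.
    if set(payload) != {"MessageText", "Content"}:
        raise ValueError(f"unexpected top-level keys: {sorted(payload)}")
    if not isinstance(payload["MessageText"], str):
        raise ValueError("MessageText must be a string")
    content = payload["Content"]
    if not isinstance(content, dict) or set(content) != {"Action"}:
        raise ValueError("Content must be an object with exactly one key, Action")
    if not isinstance(content["Action"], str):
        raise ValueError("Content.Action must be a string")


# Invented sample response in the expected shape.
raw = '{"MessageText": "Turning the light on.", "Content": {"Action": "light_on"}}'
check_main_model(json.loads(raw))
```

With pydantic itself, the equivalent check would be a call such as MainModel.model_validate on the parsed dictionary.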

testing_model.test.call_llm(prompt: str, model: str, client: OpenAI) → str

Sends a prompt to the LLM and returns its response as a dictionary.

Parameters:

prompt (str) – The user prompt.
model (str) – The model identifier.
client (OpenAI) – OpenAI client used to reach the LLM.

Returns:

Parsed LLM response.

Return type:

dict

testing_model.test.dataset_to_json_for_test(dataset: Dict[str, Any], filename: str) → None

Convert a dataset to a JSON file for testing purposes.

Parameters:

dataset (Dict[str, Any]) – The dataset containing system and example information.
filename (str) – The path to the output JSON file.

Returns:

None

testing_model.test.llamacpp_execute_test(llm, system_prompt: str, prompt: str, expected_answer: str, max_tokens: int, temperature: float) → tuple[dict, bool]

Runs the test for a single request.

Parameters:

llm – Model used to generate responses.
system_prompt (str) – System prompt with instructions.
prompt (str) – User request.
expected_answer (str) – Expected result.
max_tokens (int) – Maximum number of tokens to generate.
temperature (float) – Sampling temperature for generation.

Returns:

A tuple containing a dictionary with the test results and a boolean (True if the test passed).

Return type:

tuple

testing_model.test.ollama_generate(client: Client, model_name: str | bytes, prompt: str, schema: Dict) → Dict

Wrapper function to generate a response using Ollama’s structured outputs.

Parameters:

client (ollama.Client) – The Ollama client instance.
model_name (str) – The name of the model in Ollama.
prompt (str) – The formatted prompt to send to the model.
schema (Dict) – The JSON schema for the expected response.

Returns:

The parsed JSON response conforming to the schema.

Return type:

Dict

testing_model.test.run_tests(cfg: DictConfig, client: OpenAI | Client, test_dataset_path: str = 'data/test_ru.json', test_file: str = 'test.json', test_func: callable = None, use_ollama: bool = False) → None

Runs tests by comparing the LLM responses with expected answers from a dataset.

Parameters:

cfg (DictConfig) – Configuration with model settings.
client (OpenAI | ollama.Client) – OpenAI or Ollama client for LLM interaction.
test_dataset_path (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.
test_func (callable, optional) – Additional test function to execute on each result. This function should accept the user prompt, LLM’s message text and the correct answer.
use_ollama (bool) – Flag to indicate if Ollama should be used for testing.

Returns:

None
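
The test_func parameter of run_tests is described as a callable that accepts the user prompt, the LLM's message text, and the correct answer. A minimal sketch of such a callback follows. The function name and the boolean return convention are assumptions, since the documentation does not state what the callback should return.

```python
def message_text_is_reasonable(
    user_prompt: str, message_text: str, correct_answer: str
) -> bool:
    """Hypothetical extra check: the reply is non-empty and not an echo.

    correct_answer is accepted to match the documented call shape;
    this particular check does not need it.
    """
    text = message_text.strip()
    if not text:
        return False
    # Echoing the prompt back verbatim is treated as a failure.
    if text.casefold() == user_prompt.strip().casefold():
        return False
    return True
```

Such a function could then be passed as test_func=message_text_is_reasonable when calling run_tests.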

testing_model.test.test_llm(cfg: DictConfig, path_test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', test_func: List[Callable] | None = None, llm_url: str | None = 'http://localhost:1234/v1/', use_ollama: bool = False, ollama_client: Client | None = None) → None

Test the LLM via LM Studio by comparing model responses with expected answers.

Parameters:

cfg (DictConfig) – Configuration dictionary containing model settings.
path_test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.
test_func (Optional[List[Callable]]) – List of additional test functions to execute on each result.
llm_url (str, optional) – URL of the LLM service. Defaults to “http://localhost:1234/v1/”.
use_ollama (bool) – Flag to indicate if Ollama should be used for testing.
ollama_client (Optional[ollama.Client]) – Ollama client used for the connection.

Returns:

None

Logs:

Errors for failed tests
Accuracy metrics (printed)

testing_model.test.test_via_llamacpp(model_path: str | bytes, test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', n_gpu_layers: int = -1, n_ctx: int = 2048, temperature: float = 0.7, max_tokens: int = 2048, test_func: ~typing.Callable = <function llamacpp_execute_test>, system_prompt: str | None = None) → float

Tests a GGUF model through llama.cpp using the supplied test function.

Parameters:

model_path (str | bytes) – Path to the GGUF model file.
test_dataset (str, optional) – Path to the JSON file with test data. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.
n_gpu_layers (int, optional) – Number of layers to compute on the GPU. Defaults to -1 (all layers).
n_ctx (int, optional) – Context window size. Defaults to 2048.
temperature (float, optional) – Sampling temperature. Defaults to 0.7.
max_tokens (int, optional) – Maximum number of tokens to generate. Defaults to 2048.
test_func (Callable) – Function that implements the testing procedure.
system_prompt (Optional[str], optional) – System prompt for the model.

Returns:

Accuracy value.

Return type:

float
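
Since test_func returns a (results, passed) tuple per request and test_via_llamacpp returns an accuracy value, the aggregation can be pictured as follows. This is a sketch under stated assumptions: accuracy is taken to be the share of passed cases, the dataset keys "prompt" and "answer" are hypothetical, and the llm argument is a plain callable stub rather than a real llama.cpp model, which is invoked differently.

```python
from typing import Any, Callable, Dict, List, Tuple


def exact_match_test(
    llm: Callable[[str, str], str],
    system_prompt: str,
    prompt: str,
    expected_answer: str,
    max_tokens: int,
    temperature: float,
) -> Tuple[Dict[str, Any], bool]:
    """Stand-in with the same parameters as llamacpp_execute_test."""
    # The stub ignores max_tokens and temperature.
    answer = llm(system_prompt, prompt)
    passed = answer.strip() == expected_answer.strip()
    return {"prompt": prompt, "answer": answer, "expected": expected_answer}, passed


def accuracy(
    llm: Callable[[str, str], str],
    cases: List[Dict[str, str]],
    test_func: Callable[..., Tuple[Dict[str, Any], bool]] = exact_match_test,
    system_prompt: str = "",
    max_tokens: int = 2048,
    temperature: float = 0.7,
) -> float:
    """Return the fraction of cases for which test_func reports a pass."""
    if not cases:
        return 0.0
    passed_count = 0
    for case in cases:
        _, passed = test_func(
            llm, system_prompt, case["prompt"], case["answer"],
            max_tokens, temperature,
        )
        passed_count += int(passed)
    return passed_count / len(cases)


# Stub model that upper-cases the prompt; one of two cases matches.
cases = [{"prompt": "a", "answer": "A"}, {"prompt": "b", "answer": "x"}]
score = accuracy(lambda system, prompt: prompt.upper(), cases)
```

Swapping in a different test_func changes the pass criterion without touching the aggregation loop, which is the design the test_func parameter points to.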

Module contents

Functions:

`test_from_dataset`([test_dataset, test_file])	Test the model using a dataset of prompts.
`test_llm`(cfg[, path_test_dataset, ...])	Test the LLM via LM Studio by comparing model responses with expected answers.
`test_mention_number_of_values`(user_input, output)	Check if the model mentions the number of values inappropriately.
`test_via_llamacpp`(model_path[, ...])	Tests a GGUF model through llama.cpp using the supplied test function.

testing_model.test_from_dataset(test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json') → None

Test the model using a dataset of prompts.

Parameters:

test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to the processed test file. Defaults to “test.json”.

Logs:

Errors for failed tests
Final test metrics

testing_model.test_llm(cfg: DictConfig, path_test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', test_func: List[Callable] | None = None, llm_url: str | None = 'http://localhost:1234/v1/', use_ollama: bool = False, ollama_client: Client | None = None) → None

Test the LLM via LM Studio by comparing model responses with expected answers.

Parameters:

cfg (DictConfig) – Configuration dictionary containing model settings.
path_test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.
test_func (Optional[List[Callable]]) – List of additional test functions to execute on each result.
llm_url (str, optional) – URL of the LLM service. Defaults to “http://localhost:1234/v1/”.
use_ollama (bool) – Flag to indicate if Ollama should be used for testing.
ollama_client (Optional[ollama.Client]) – Ollama client used for the connection.

Returns:

None

Logs:

Errors for failed tests
Accuracy metrics (printed)

testing_model.test_mention_number_of_values(user_input: str, output: str) → bool

Check if the model mentions the number of values inappropriately.

Parameters:

user_input (str) – The original user input.
output (str) – The model’s generated output.

Returns:

Result of the DeepEval test.

Return type:

bool

Raises:

AssertionError – If the test fails based on the defined criteria.

testing_model.test_via_llamacpp(model_path: str | bytes, test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', n_gpu_layers: int = -1, n_ctx: int = 2048, temperature: float = 0.7, max_tokens: int = 2048, test_func: ~typing.Callable = <function llamacpp_execute_test>, system_prompt: str | None = None) → float

Tests a GGUF model through llama.cpp using the supplied test function.

Parameters:

model_path (str | bytes) – Path to the GGUF model file.
test_dataset (str, optional) – Path to the JSON file with test data. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.
n_gpu_layers (int, optional) – Number of layers to compute on the GPU. Defaults to -1 (all layers).
n_ctx (int, optional) – Context window size. Defaults to 2048.
temperature (float, optional) – Sampling temperature. Defaults to 0.7.
max_tokens (int, optional) – Maximum number of tokens to generate. Defaults to 2048.
test_func (Callable) – Function that implements the testing procedure.
system_prompt (Optional[str], optional) – System prompt for the model.

Returns:

Accuracy value.

Return type:

float