testing_model package

Submodules

testing_model.deepeval module

File for the testing model using deepeval framework

Classes:

CustomLocalModel([model, url])

A custom local model implementation for DeepEval testing.

CustomMistralModel(api_key[, model, temperature])

A custom Mistral model implementation for DeepEval testing with rate limiting.

Functions:

set_local_model_via_cli([model_name, base_url])

Set the local model via CLI using deepeval command.

test_from_dataset([test_dataset, test_file])

Test the model using a dataset of prompts.

test_mention_number_of_values(user_input, output)

Check if the model mentions the number of values inappropriately.

class testing_model.deepeval_func.CustomLocalModel(model: str = 'vikhr-yandexgpt-5-lite-8b-it_gguf', url: str = 'http://localhost:1234/v1/', *args: Any, **kwargs: Any)[source]

Bases: DeepEvalBaseLLM

A custom local model implementation for DeepEval testing.

model

The underlying language model.

Type:

ChatOpenAI

model_name

Name of the model.

Type:

str

Methods:

__init__([model, url])

Initialize the custom local model.

a_generate(prompt)

Asynchronously generate a response for the given prompt.

generate(prompt)

Generate a response for the given prompt.

get_model_name()

Get the name of the model.

load_model()

Load and return the model.

__init__(model: str = 'vikhr-yandexgpt-5-lite-8b-it_gguf', url: str = 'http://localhost:1234/v1/', *args: Any, **kwargs: Any)[source]

Initialize the custom local model.

Parameters:
  • model (str, optional) – Name of the model. Defaults to “vikhr-yandexgpt-5-lite-8b-it_gguf”.

  • url (str, optional) – Base URL for the model. Defaults to “http://localhost:1234/v1/”.

async a_generate(prompt: str) str[source]

Asynchronously generate a response for the given prompt.

Parameters:

prompt (str) – Input prompt for the model.

Returns:

Generated model response.

Return type:

str

generate(prompt: str) str[source]

Generate a response for the given prompt.

Parameters:

prompt (str) – Input prompt for the model.

Returns:

Generated model response.

Return type:

str

get_model_name() str[source]

Get the name of the model.

Returns:

Model name.

Return type:

str

load_model() ChatOpenAI[source]

Load and return the model.

Returns:

The loaded language model.

Return type:

ChatOpenAI

class testing_model.deepeval_func.CustomMistralModel(api_key: str, model: str = 'mistral-small-latest', temperature: float = 0.1, *args: Any, **kwargs: Any)[source]

Bases: DeepEvalBaseLLM

A custom Mistral model implementation for DeepEval testing with rate limiting.

client

Mistral API client.

Type:

Mistral

model_name

Name of the model.

Type:

str

temperature

Sampling temperature.

Type:

float

last_request_time

Timestamp of last API request.

Type:

Optional[float]

rate_limit_delay

Minimum delay between requests.

Type:

float

Methods:

__init__(api_key[, model, temperature])

Initialize the custom Mistral model.

a_generate(prompt)

Asynchronously generate a response for the given prompt.

generate(prompt)

Generate a response for the given prompt.

get_model_name()

Get the name of the model.

load_model()

Load and return the Mistral client.

__init__(api_key: str, model: str = 'mistral-small-latest', temperature: float = 0.1, *args: Any, **kwargs: Any)[source]

Initialize the custom Mistral model.

Parameters:
  • api_key (str) – API key for Mistral service.

  • model (str, optional) – Name of the model. Defaults to “mistral-small-latest”.

  • temperature (float, optional) – Sampling temperature. Defaults to 0.1.

async a_generate(prompt: str) str[source]

Asynchronously generate a response for the given prompt.

Parameters:

prompt (str) – Input prompt for the model.

Returns:

Generated model response.

Return type:

str

generate(prompt: str) str[source]

Generate a response for the given prompt.

Parameters:

prompt (str) – Input prompt for the model.

Returns:

Generated model response.

Return type:

str

get_model_name() str[source]

Get the name of the model.

Returns:

Model name.

Return type:

str

load_model() Mistral[source]

Load and return the Mistral client.

Returns:

The Mistral API client.

Return type:

Mistral

testing_model.deepeval_func.set_local_model_via_cli(model_name: str = 'vikhr-yandexgpt-5-lite-8b-it_gguf', base_url: str = 'http://localhost:1234/v1') None[source]

Set the local model via CLI using deepeval command.

Parameters:
  • model_name (str, optional) – Name of the model. Defaults to “vikhr-yandexgpt-5-lite-8b-it_gguf”.

  • base_url (str, optional) – Base URL for the model. Defaults to “http://localhost:1234/v1”.

Prints:
  • Success message with command output

  • Error message if command fails

testing_model.deepeval_func.test_from_dataset(test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json') None[source]

Test the model using a dataset of prompts.

Parameters:
  • test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.

  • test_file (str, optional) – Path to the processed test file. Defaults to “test.json”.

Logs:
  • Errors for failed tests

  • Final test metrics

testing_model.deepeval_func.test_mention_number_of_values(user_input: str, output: str) bool[source]

Check if the model mentions the number of values inappropriately.

Parameters:
  • user_input (str) – The original user input.

  • output (str) – The model’s generated output.

Returns:

Result of the DeepEval test.

Return type:

bool

Raises:

AssertionError – If the test fails based on the defined criteria.

testing_model.test module

File for testing llm model

Classes:

Content(*, Action)

The inner element of the pydantic schema for testing model

MainModel(*, MessageText, Content)

Pydantic schema for testing model

Functions:

call_llm(prompt, model, client)

Sends a prompt to the LLM and returns its response as a dictionary.

dataset_to_json_for_test(dataset, filename)

Convert a dataset to a JSON file for testing purposes.

llamacpp_execute_test(llm, system_prompt, ...)

Выполняет тест для одного запроса.

ollama_generate(client, model_name, prompt, ...)

Wrapper function to generate a response using Ollama's structured outputs.

run_tests(cfg, client[, test_dataset_path, ...])

Runs tests by comparing the LLM responses with expected answers from a dataset.

test_llm(cfg[, path_test_dataset, ...])

Test the LLM via LM Studio by comparing model responses with expected answers.

test_via_llamacpp(model_path[, ...])

Тестирование GGUF модели через llama.cpp с использованием передаваемой функции тестирования.

class testing_model.test.Content(*, Action: str)[source]

Bases: BaseModel

The inner element of the pydantic schema for testing model

Attributes:

Action

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Action: str
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class testing_model.test.MainModel(*, MessageText: str, Content: Content)[source]

Bases: BaseModel

Pydantic schema for testing model

Attributes:

Content

MessageText

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Content: Content
MessageText: str
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

testing_model.test.call_llm(prompt: str, model: str, client: OpenAI) str[source]

Sends a prompt to the LLM and returns its response as a dictionary.

Parameters:
  • prompt (str) – The user prompt.

  • model (str) – The model identifier.

  • client (OpenAI) – openai client for the llm.

Returns:

Parsed LLM response.

Return type:

dict

testing_model.test.dataset_to_json_for_test(dataset: Dict[str, Any], filename: str) None[source]

Convert a dataset to a JSON file for testing purposes.

Parameters:
  • dataset (Dict[str, Any]) – The dataset containing system and example information.

  • filename (str) – The path to the output JSON file.

Returns:

None

testing_model.test.llamacpp_execute_test(llm, system_prompt: str, prompt: str, expected_answer: str, max_tokens: int, temperature: float) tuple[dict, bool][source]

Выполняет тест для одного запроса.

Parameters:
  • llm – Модель для генерации ответов.

  • system_prompt (str) – Системный промпт с инструкциями.

  • prompt (str) – Пользовательский запрос.

  • expected_answer (str) – Ожидаемый результат.

  • max_tokens (int) – Максимальное число генерируемых токенов.

  • temperature (float) – Параметр температуры для генерации.

Returns:

Кортеж, содержащий словарь с результатами теста и булевое значение (True, если тест пройден).

Return type:

tuple

testing_model.test.ollama_generate(client: Client, model_name: str | bytes, prompt: str, schema: Dict) Dict[source]

Wrapper function to generate a response using Ollama’s structured outputs.

Parameters:
  • client (ollama.Client) – The Ollama client instance.

  • model_name (str) – The name of the model in Ollama.

  • prompt (str) – The formatted prompt to send to the model.

  • schema (Dict) – The JSON schema for the expected response.

Returns:

The parsed JSON response conforming to the schema.

Return type:

Dict

testing_model.test.run_tests(cfg: DictConfig, client: OpenAI | Client, test_dataset_path: str = 'data/test_ru.json', test_file: str = 'test.json', test_func: callable = None, use_ollama: bool = False) None[source]

Runs tests by comparing the LLM responses with expected answers from a dataset.

Parameters:
  • cfg (DictConfig) – Configuration with model settings.

  • client (OpenAI | ollama.client) – OpenAI or ollama client for LLM interaction.

  • test_dataset_path (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.

  • test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.

  • test_func (callable, optional) – Additional test function to execute on each result. This function should accept the user prompt, LLM’s message text and the correct answer.

  • use_ollama (bool) – Flag to indicate if Ollama should be used for testing.

Returns:

None

testing_model.test.test_llm(cfg: DictConfig, path_test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', test_func: List[Callable] | None = None, llm_url: str | None = 'http://localhost:1234/v1/', use_ollama: bool = False, ollama_client: Client | None = None) None[source]

Test the LLM via LM Studio by comparing model responses with expected answers.

Parameters:
  • cfg (DictConfig) – Configuration dictionary containing model settings.

  • path_test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.

  • test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.

  • test_func (Optional[List[Callable]]) – List of additional test functions to execute on each result.

  • llm_url (str, optional) – URL of the LLM service. Defaults to “http://localhost:1234/v1/”.

  • use_ollama (bool) – Flag to indicate if Ollama should be used for testing.

  • ollama_client (Optional[ollama.Client]) – Ollama client for connection

Returns:

None

Raises:

Logs errors for failed tests and prints accuracy metrics.

testing_model.test.test_via_llamacpp(model_path: str | bytes, test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', n_gpu_layers: int = -1, n_ctx: int = 2048, temperature: float = 0.7, max_tokens: int = 2048, test_func: ~typing.Callable = <function llamacpp_execute_test>, system_prompt: str | None = None) float[source]

Тестирование GGUF модели через llama.cpp с использованием передаваемой функции тестирования.

Parameters:
  • model_path (str | bytes) – Путь к файлу модели GGUF.

  • test_dataset (str, optional) – Путь к JSON файлу с тестовыми данными. По умолчанию “data/test_ru.json”.

  • test_file (str, optional) – Путь для сохранения обработанного тестового файла. По умолчанию “test.json”.

  • n_gpu_layers (int, optional) – Количество слоёв для вычислений на GPU. По умолчанию -1 (все слои).

  • n_ctx (int, optional) – Размер окна контекста. По умолчанию 2048.

  • temperature (float, optional) – Температура сэмплинга. По умолчанию 0.7.

  • max_tokens (int, optional) – Максимальное количество генерируемых токенов. По умолчанию 2048.

  • test_func (Callable) – Функция, реализующая принцип тестирования.

  • system_prompt (Optional[str], optional) – Системный промпт для модели.

Returns:

Значение точности (accuracy).

Return type:

float

Module contents

Functions:

test_from_dataset([test_dataset, test_file])

Test the model using a dataset of prompts.

test_llm(cfg[, path_test_dataset, ...])

Test the LLM via LM Studio by comparing model responses with expected answers.

test_mention_number_of_values(user_input, output)

Check if the model mentions the number of values inappropriately.

test_via_llamacpp(model_path[, ...])

Тестирование GGUF модели через llama.cpp с использованием передаваемой функции тестирования.

testing_model.test_from_dataset(test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json') None[source]

Test the model using a dataset of prompts.

Parameters:
  • test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.

  • test_file (str, optional) – Path to the processed test file. Defaults to “test.json”.

Logs:
  • Errors for failed tests

  • Final test metrics

testing_model.test_llm(cfg: DictConfig, path_test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', test_func: List[Callable] | None = None, llm_url: str | None = 'http://localhost:1234/v1/', use_ollama: bool = False, ollama_client: Client | None = None) None[source]

Test the LLM via LM Studio by comparing model responses with expected answers.

Parameters:
  • cfg (DictConfig) – Configuration dictionary containing model settings.

  • path_test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.

  • test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.

  • test_func (Optional[List[Callable]]) – List of additional test functions to execute on each result.

  • llm_url (str, optional) – URL of the LLM service. Defaults to “http://localhost:1234/v1/”.

  • use_ollama (bool) – Flag to indicate if Ollama should be used for testing.

  • ollama_client (Optional[ollama.Client]) – Ollama client for connection

Returns:

None

Raises:

Logs errors for failed tests and prints accuracy metrics.

testing_model.test_mention_number_of_values(user_input: str, output: str) bool[source]

Check if the model mentions the number of values inappropriately.

Parameters:
  • user_input (str) – The original user input.

  • output (str) – The model’s generated output.

Returns:

Result of the DeepEval test.

Return type:

bool

Raises:

AssertionError – If the test fails based on the defined criteria.

testing_model.test_via_llamacpp(model_path: str | bytes, test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', n_gpu_layers: int = -1, n_ctx: int = 2048, temperature: float = 0.7, max_tokens: int = 2048, test_func: ~typing.Callable = <function llamacpp_execute_test>, system_prompt: str | None = None) float[source]

Тестирование GGUF модели через llama.cpp с использованием передаваемой функции тестирования.

Parameters:
  • model_path (str | bytes) – Путь к файлу модели GGUF.

  • test_dataset (str, optional) – Путь к JSON файлу с тестовыми данными. По умолчанию “data/test_ru.json”.

  • test_file (str, optional) – Путь для сохранения обработанного тестового файла. По умолчанию “test.json”.

  • n_gpu_layers (int, optional) – Количество слоёв для вычислений на GPU. По умолчанию -1 (все слои).

  • n_ctx (int, optional) – Размер окна контекста. По умолчанию 2048.

  • temperature (float, optional) – Температура сэмплинга. По умолчанию 0.7.

  • max_tokens (int, optional) – Максимальное количество генерируемых токенов. По умолчанию 2048.

  • test_func (Callable) – Функция, реализующая принцип тестирования.

  • system_prompt (Optional[str], optional) – Системный промпт для модели.

Returns:

Значение точности (accuracy).

Return type:

float