testing_model package
Submodules
testing_model.deepeval module
Model testing utilities built on the DeepEval framework.
Classes:
CustomLocalModel – A custom local model implementation for DeepEval testing.
CustomMistralModel – A custom Mistral model implementation for DeepEval testing with rate limiting.
Functions:
set_local_model_via_cli – Set the local model via CLI using the deepeval command.
test_from_dataset – Test the model using a dataset of prompts.
test_mention_number_of_values – Check if the model mentions the number of values inappropriately.
- class testing_model.deepeval_func.CustomLocalModel(model: str = 'vikhr-yandexgpt-5-lite-8b-it_gguf', url: str = 'http://localhost:1234/v1/', *args: Any, **kwargs: Any)
Bases: DeepEvalBaseLLM
A custom local model implementation for DeepEval testing.
- model
The underlying language model.
- Type:
ChatOpenAI
- model_name
Name of the model.
- Type:
str
Methods:
__init__([model, url]) – Initialize the custom local model.
a_generate(prompt) – Asynchronously generate a response for the given prompt.
generate(prompt) – Generate a response for the given prompt.
get_model_name() – Get the name of the model.
load_model() – Load and return the model.
- __init__(model: str = 'vikhr-yandexgpt-5-lite-8b-it_gguf', url: str = 'http://localhost:1234/v1/', *args: Any, **kwargs: Any)
Initialize the custom local model.
- Parameters:
model (str, optional) – Name of the model. Defaults to “vikhr-yandexgpt-5-lite-8b-it_gguf”.
url (str, optional) – Base URL for the model. Defaults to “http://localhost:1234/v1/”.
- async a_generate(prompt: str) → str
Asynchronously generate a response for the given prompt.
- Parameters:
prompt (str) – Input prompt for the model.
- Returns:
Generated model response.
- Return type:
str
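The class above exposes a blocking generate and an asynchronous a_generate. The following is a minimal sketch of one way to pair the two, assuming the asynchronous variant simply offloads the blocking call to a worker thread. EchoModel is a hypothetical stand-in: it does not subclass DeepEvalBaseLLM and does not contact any server, and CustomLocalModel may implement a_generate differently.

```python
import asyncio


class EchoModel:
    """Hypothetical stand-in; not the package's CustomLocalModel."""

    def __init__(self, model: str = "demo-model") -> None:
        self.model_name = model

    def load_model(self) -> "EchoModel":
        # A real implementation would return the underlying chat client.
        return self

    def generate(self, prompt: str) -> str:
        # Placeholder for a blocking request to the local server.
        return f"[{self.model_name}] {prompt}"

    async def a_generate(self, prompt: str) -> str:
        # Assumption: run the blocking call in a worker thread so the
        # event loop stays free while the request is in flight.
        return await asyncio.to_thread(self.generate, prompt)

    def get_model_name(self) -> str:
        return self.model_name


if __name__ == "__main__":
    print(asyncio.run(EchoModel().a_generate("ping")))
```

asyncio.to_thread requires Python 3.9 or later.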
- class testing_model.deepeval_func.CustomMistralModel(api_key: str, model: str = 'mistral-small-latest', temperature: float = 0.1, *args: Any, **kwargs: Any)
Bases: DeepEvalBaseLLM
A custom Mistral model implementation for DeepEval testing with rate limiting.
- client
Mistral API client.
- Type:
Mistral
- model_name
Name of the model.
- Type:
str
- temperature
Sampling temperature.
- Type:
float
- last_request_time
Timestamp of last API request.
- Type:
Optional[float]
- rate_limit_delay
Minimum delay between requests.
- Type:
float
Methods:
__init__(api_key[, model, temperature]) – Initialize the custom Mistral model.
a_generate(prompt) – Asynchronously generate a response for the given prompt.
generate(prompt) – Generate a response for the given prompt.
get_model_name() – Get the name of the model.
load_model() – Load and return the Mistral client.
- __init__(api_key: str, model: str = 'mistral-small-latest', temperature: float = 0.1, *args: Any, **kwargs: Any)
Initialize the custom Mistral model.
- Parameters:
api_key (str) – API key for Mistral service.
model (str, optional) – Name of the model. Defaults to “mistral-small-latest”.
temperature (float, optional) – Sampling temperature. Defaults to 0.1.
- async a_generate(prompt: str) → str
Asynchronously generate a response for the given prompt.
- Parameters:
prompt (str) – Input prompt for the model.
- Returns:
Generated model response.
- Return type:
str
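The last_request_time and rate_limit_delay attributes suggest a simple minimum-gap rate limit between API requests. The sketch below shows that idea in isolation; MinDelayLimiter and its injectable clock and sleep parameters are hypothetical names introduced for illustration, and the actual logic inside CustomMistralModel is not shown on this page.

```python
import time
from typing import Callable, Optional


class MinDelayLimiter:
    """Hypothetical helper: enforce a minimum gap between calls."""

    def __init__(
        self,
        rate_limit_delay: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate_limit_delay = rate_limit_delay
        self.last_request_time: Optional[float] = None
        self._clock = clock
        self._sleep = sleep

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        slept = 0.0
        now = self._clock()
        if self.last_request_time is not None:
            remaining = self.rate_limit_delay - (now - self.last_request_time)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        # Record the moment the request is released, after any sleep.
        self.last_request_time = self._clock()
        return slept
```

Calling wait() immediately before each request keeps consecutive requests at least rate_limit_delay seconds apart. Passing a fake clock and sleep makes the behaviour testable without real delays.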
- testing_model.deepeval_func.set_local_model_via_cli(model_name: str = 'vikhr-yandexgpt-5-lite-8b-it_gguf', base_url: str = 'http://localhost:1234/v1') → None
Set the local model via CLI using the deepeval command.
- Parameters:
model_name (str, optional) – Name of the model. Defaults to “vikhr-yandexgpt-5-lite-8b-it_gguf”.
base_url (str, optional) – Base URL for the model. Defaults to “http://localhost:1234/v1”.
- Prints:
Success message with command output
Error message if command fails
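The reporting behaviour described above (print the command output on success, print an error on failure) can be sketched with the standard library. run_and_report is a hypothetical helper, and the stand-in command below is not the deepeval invocation: the exact CLI arguments used by set_local_model_via_cli are not given on this page, so they are not reproduced here.

```python
import subprocess
import sys
from typing import Sequence


def run_and_report(argv: Sequence[str]) -> bool:
    """Run a command, print its output on success or the error on failure."""
    try:
        result = subprocess.run(
            list(argv), capture_output=True, text=True, check=True
        )
    except subprocess.CalledProcessError as exc:
        # Non-zero exit code: report it together with captured stderr.
        print(f"Command failed with exit code {exc.returncode}: {exc.stderr.strip()}")
        return False
    except FileNotFoundError as exc:
        # The executable itself could not be found.
        print(f"Command not found: {exc}")
        return False
    print(f"Command succeeded: {result.stdout.strip()}")
    return True


if __name__ == "__main__":
    # Stand-in command; the real function would invoke the deepeval CLI.
    run_and_report([sys.executable, "-c", "print('model set')"])
```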
- testing_model.deepeval_func.test_from_dataset(test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json') → None
Test the model using a dataset of prompts.
- Parameters:
test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to the processed test file. Defaults to “test.json”.
- Logs:
Errors for failed tests
Final test metrics
- testing_model.deepeval_func.test_mention_number_of_values(user_input: str, output: str) → bool
Check if the model mentions the number of values inappropriately.
- Parameters:
user_input (str) – The original user input.
output (str) – The model’s generated output.
- Returns:
Result of the DeepEval test.
- Return type:
bool
- Raises:
AssertionError – If the test fails based on the defined criteria.
testing_model.test module
Utilities for testing the LLM.
Classes:
Content – The inner element of the Pydantic schema for the testing model.
MainModel – Pydantic schema for the testing model.
Functions:
call_llm – Sends a prompt to the LLM and returns its response as a dictionary.
dataset_to_json_for_test – Convert a dataset to a JSON file for testing purposes.
llamacpp_execute_test – Run the test for a single request.
ollama_generate – Wrapper function to generate a response using Ollama's structured outputs.
run_tests – Runs tests by comparing the LLM responses with expected answers from a dataset.
test_llm – Test the LLM via LM Studio by comparing model responses with expected answers.
test_via_llamacpp – Test a GGUF model via llama.cpp using the supplied test function.
- class testing_model.test.Content(*, Action: str)
Bases: BaseModel
The inner element of the Pydantic schema for the testing model.
Attributes:
model_config – Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- Action: str
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class testing_model.test.MainModel(*, MessageText: str, Content: Content)
Bases: BaseModel
Pydantic schema for the testing model.
Attributes:
model_config – Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- MessageText: str
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
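Taken together, the two schemas describe a JSON object with a string MessageText and a nested Content object holding a string Action, with unknown keys rejected ('extra': 'forbid'). The sketch below checks that shape using only the standard library. parse_main_model is a hypothetical helper written for illustration; it is a hand-rolled approximation, not Pydantic validation and not part of the package.

```python
import json
from typing import Any, Dict


def parse_main_model(raw: str) -> Dict[str, Any]:
    """Parse JSON and check it has exactly the MainModel/Content fields."""
    data = json.loads(raw)
    # Mirrors extra='forbid': no missing keys and no additional keys.
    if not isinstance(data, dict) or set(data) != {"MessageText", "Content"}:
        raise ValueError("expected exactly the keys MessageText and Content")
    content = data["Content"]
    if not isinstance(content, dict) or set(content) != {"Action"}:
        raise ValueError("Content must contain exactly the key Action")
    if not isinstance(data["MessageText"], str) or not isinstance(content["Action"], str):
        raise ValueError("MessageText and Action must be strings")
    return data


if __name__ == "__main__":
    # Illustrative payload; the field values are made up.
    print(parse_main_model('{"MessageText": "Done.", "Content": {"Action": "none"}}'))
```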
- testing_model.test.call_llm(prompt: str, model: str, client: OpenAI) → str
Sends a prompt to the LLM and returns its response as a dictionary.
- Parameters:
prompt (str) – The user prompt.
model (str) – The model identifier.
client (OpenAI) – OpenAI client used to reach the LLM.
- Returns:
Parsed LLM response.
- Return type:
dict
- testing_model.test.dataset_to_json_for_test(dataset: Dict[str, Any], filename: str) → None
Convert a dataset to a JSON file for testing purposes.
- Parameters:
dataset (Dict[str, Any]) – The dataset containing system and example information.
filename (str) – The path to the output JSON file.
- Returns:
None
- testing_model.test.llamacpp_execute_test(llm, system_prompt: str, prompt: str, expected_answer: str, max_tokens: int, temperature: float) → tuple[dict, bool]
Run the test for a single request.
- Parameters:
llm – Model used to generate responses.
system_prompt (str) – System prompt with instructions.
prompt (str) – User request.
expected_answer (str) – Expected result.
max_tokens (int) – Maximum number of tokens to generate.
temperature (float) – Temperature parameter for generation.
- Returns:
A tuple containing a dictionary with the test results and a boolean (True if the test passed).
- Return type:
tuple
- testing_model.test.ollama_generate(client: Client, model_name: str | bytes, prompt: str, schema: Dict) → Dict
Wrapper function to generate a response using Ollama’s structured outputs.
- Parameters:
client (ollama.Client) – The Ollama client instance.
model_name (str) – The name of the model in Ollama.
prompt (str) – The formatted prompt to send to the model.
schema (Dict) – The JSON schema for the expected response.
- Returns:
The parsed JSON response conforming to the schema.
- Return type:
Dict
- testing_model.test.run_tests(cfg: DictConfig, client: OpenAI | Client, test_dataset_path: str = 'data/test_ru.json', test_file: str = 'test.json', test_func: callable = None, use_ollama: bool = False) → None
Runs tests by comparing the LLM responses with expected answers from a dataset.
- Parameters:
cfg (DictConfig) – Configuration with model settings.
client (OpenAI | ollama.Client) – OpenAI or Ollama client for LLM interaction.
test_dataset_path (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.
test_func (callable, optional) – Additional test function to execute on each result. This function should accept the user prompt, the LLM’s message text, and the correct answer.
use_ollama (bool) – Flag to indicate if Ollama should be used for testing.
- Returns:
None
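The test_func parameter is described as a callable that accepts the user prompt, the LLM's message text, and the correct answer. Below is a minimal sketch of such a callable. The name answer_is_mentioned, the parameter names, and the boolean return value are assumptions: the page does not state what run_tests does with the result, and a function that raises AssertionError on failure may be expected instead.

```python
def answer_is_mentioned(user_prompt: str, message_text: str, correct_answer: str) -> bool:
    """Hypothetical check: the expected answer appears in the model's text."""
    # user_prompt is accepted to match the documented call shape but is
    # not needed for this particular check.
    normalized_text = " ".join(message_text.lower().split())
    normalized_answer = " ".join(correct_answer.lower().split())
    # An empty expected answer never counts as a match.
    return bool(normalized_answer) and normalized_answer in normalized_text
```

Under these assumptions it would be passed as run_tests(cfg, client, test_func=answer_is_mentioned).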
- testing_model.test.test_llm(cfg: DictConfig, path_test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', test_func: List[Callable] | None = None, llm_url: str | None = 'http://localhost:1234/v1/', use_ollama: bool = False, ollama_client: Client | None = None) → None
Test the LLM via LM Studio by comparing model responses with expected answers.
- Parameters:
cfg (DictConfig) – Configuration dictionary containing model settings.
path_test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.
test_func (Optional[List[Callable]]) – List of additional test functions to execute on each result.
llm_url (str, optional) – URL of the LLM service. Defaults to “http://localhost:1234/v1/”.
use_ollama (bool) – Flag to indicate if Ollama should be used for testing.
ollama_client (Optional[ollama.Client]) – Ollama client for connection
- Returns:
None
- Logs:
Errors for failed tests
- Prints:
Accuracy metrics
- testing_model.test.test_via_llamacpp(model_path: str | bytes, test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', n_gpu_layers: int = -1, n_ctx: int = 2048, temperature: float = 0.7, max_tokens: int = 2048, test_func: Callable = llamacpp_execute_test, system_prompt: str | None = None) → float
Test a GGUF model via llama.cpp using the supplied test function.
- Parameters:
model_path (str | bytes) – Path to the GGUF model file.
test_dataset (str, optional) – Path to the JSON file with test data. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.
n_gpu_layers (int, optional) – Number of layers to offload to the GPU. Defaults to -1 (all layers).
n_ctx (int, optional) – Context window size. Defaults to 2048.
temperature (float, optional) – Sampling temperature. Defaults to 0.7.
max_tokens (int, optional) – Maximum number of tokens to generate. Defaults to 2048.
test_func (Callable) – Function implementing the testing logic.
system_prompt (Optional[str], optional) – System prompt for the model.
- Returns:
Accuracy value.
- Return type:
float
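Since the per-request test function returns a (dict, bool) tuple and test_via_llamacpp returns an accuracy, one plausible aggregation is the fraction of requests whose flag is True. The sketch below assumes exactly that; the accuracy helper is hypothetical, and the package may compute or round the value differently.

```python
from typing import Dict, Iterable, Tuple


def accuracy(results: Iterable[Tuple[Dict, bool]]) -> float:
    """Fraction of results whose pass flag is True (0.0 for no results)."""
    flags = [passed for _, passed in results]
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    # Illustrative results in the (details, passed) shape described above.
    sample = [({"id": 1}, True), ({"id": 2}, False), ({"id": 3}, True), ({"id": 4}, True)]
    print(accuracy(sample))
```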
Module contents
Functions:
test_from_dataset – Test the model using a dataset of prompts.
test_llm – Test the LLM via LM Studio by comparing model responses with expected answers.
test_mention_number_of_values – Check if the model mentions the number of values inappropriately.
test_via_llamacpp – Test a GGUF model via llama.cpp using the supplied test function.
- testing_model.test_from_dataset(test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json') → None
Test the model using a dataset of prompts.
- Parameters:
test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to the processed test file. Defaults to “test.json”.
- Logs:
Errors for failed tests
Final test metrics
- testing_model.test_llm(cfg: DictConfig, path_test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', test_func: List[Callable] | None = None, llm_url: str | None = 'http://localhost:1234/v1/', use_ollama: bool = False, ollama_client: Client | None = None) → None
Test the LLM via LM Studio by comparing model responses with expected answers.
- Parameters:
cfg (DictConfig) – Configuration dictionary containing model settings.
path_test_dataset (str, optional) – Path to the test dataset JSON file. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.
test_func (Optional[List[Callable]]) – List of additional test functions to execute on each result.
llm_url (str, optional) – URL of the LLM service. Defaults to “http://localhost:1234/v1/”.
use_ollama (bool) – Flag to indicate if Ollama should be used for testing.
ollama_client (Optional[ollama.Client]) – Ollama client for connection
- Returns:
None
- Logs:
Errors for failed tests
- Prints:
Accuracy metrics
- testing_model.test_mention_number_of_values(user_input: str, output: str) → bool
Check if the model mentions the number of values inappropriately.
- Parameters:
user_input (str) – The original user input.
output (str) – The model’s generated output.
- Returns:
Result of the DeepEval test.
- Return type:
bool
- Raises:
AssertionError – If the test fails based on the defined criteria.
- testing_model.test_via_llamacpp(model_path: str | bytes, test_dataset: str = 'data/test_ru.json', test_file: str = 'test.json', n_gpu_layers: int = -1, n_ctx: int = 2048, temperature: float = 0.7, max_tokens: int = 2048, test_func: Callable = llamacpp_execute_test, system_prompt: str | None = None) → float
Test a GGUF model via llama.cpp using the supplied test function.
- Parameters:
model_path (str | bytes) – Path to the GGUF model file.
test_dataset (str, optional) – Path to the JSON file with test data. Defaults to “data/test_ru.json”.
test_file (str, optional) – Path to save the processed test file. Defaults to “test.json”.
n_gpu_layers (int, optional) – Number of layers to offload to the GPU. Defaults to -1 (all layers).
n_ctx (int, optional) – Context window size. Defaults to 2048.
temperature (float, optional) – Sampling temperature. Defaults to 0.7.
max_tokens (int, optional) – Maximum number of tokens to generate. Defaults to 2048.
test_func (Callable) – Function implementing the testing logic.
system_prompt (Optional[str], optional) – System prompt for the model.
- Returns:
Accuracy value.
- Return type:
float