Custom Benchmark
You can customize the behavior of the Evaluator in ChEF to fit your requirements.
Evaluator
In ChEF, all evaluation pipelines are managed by the Evaluator class (src/ChEF/evaluator.py). This class serves as the control center for evaluation tasks and incorporates several components: a scenario, an instruction, an inferencer, and a metric. These components are defined through recipe configurations.
Key Components
- Scenario: The scenario represents the evaluation dataset and task-specific details.
- Instruction: Responsible for processing samples and generating queries.
- Inferencer: Performs model inference on the dataset.
- Metric: Evaluates model performance using defined metrics.
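A recipe configuration wires these four components together. The sketch below is hypothetical: the field names (scenario_cfg, eval_cfg, and the nested *_cfg keys) are illustrative assumptions, so check the recipes shipped under src/config/ChEF/ for the authoritative format.

```yaml
# Hypothetical recipe sketch -- field names are illustrative, not ChEF's exact schema
scenario_cfg:
  dataset_name: CIFAR10
  base_data_path: ../data/ChEF/cifar10
eval_cfg:
  instruction_cfg:
    query_type: standard_query
  inferencer_cfg:
    inferencer_type: Direct
    batch_size: 16
  metric_cfg:
    metric_type: Classification
```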
Evaluation Workflow
The evaluation process in ChEF follows a structured workflow:
1. Model and Data Loading: First, the model and the evaluation dataset (scenario) are loaded.
2. Evaluator.evaluate Method: The evaluation is initiated by calling the evaluate method of the Evaluator class.
3. Inference with inferencer.inference: The inferencer performs model inference. During dataset traversal, the InstructionHandler processes each sample, generating queries that serve as inputs to the model.
4. Results Saving: The output of the inference is saved in the specified results_path.
5. Metric Evaluation: Finally, the metric evaluates the results file, calculating the performance metrics specific to the evaluation task.
6. Output Evaluation Results: The final evaluation results are provided as output, allowing you to assess the model's performance.
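The workflow above can be sketched end to end with stub components. This is a minimal, self-contained illustration, not ChEF's API: the real Evaluator lives in src/ChEF/evaluator.py, and the class and method names below (DummyInferencer, AccuracyMetric, the dict-based samples) are assumptions made for the example.

```python
# Minimal sketch of the evaluate workflow with stub components (not ChEF's API).
import json
import os
import tempfile

class DummyInferencer:
    def inference(self, model, dataset):
        # Step 3: generate a prediction for every sample
        return [{"answer": model(s["query"]), "gt_answers": s["gt"]} for s in dataset]

class AccuracyMetric:
    def metric_func(self, answers):
        # Step 5: score the saved predictions
        correct = sum(a["answer"] == a["gt_answers"] for a in answers)
        return {"ACC": correct / len(answers)}

class Evaluator:
    def __init__(self, scenario, inferencer, metric, results_path):
        self.scenario, self.inferencer = scenario, inferencer
        self.metric, self.results_path = metric, results_path

    def evaluate(self, model):
        answers = self.inferencer.inference(model, self.scenario)  # steps 2-3
        with open(self.results_path, "w") as f:                    # step 4
            json.dump(answers, f)
        return self.metric.metric_func(answers)                    # steps 5-6

scenario = [{"query": "1+1=?", "gt": "2"}, {"query": "2+2=?", "gt": "4"}]
model = lambda q: "2"  # a trivial stand-in "model" that always answers "2"
path = os.path.join(tempfile.gettempdir(), "results.json")
print(Evaluator(scenario, DummyInferencer(), AccuracyMetric(), path).evaluate(model))
```

Running the sketch prints an accuracy dict, mirroring how the real pipeline hands the saved results file to the metric.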
Employ Your Model
In ChEF, you can employ your own custom models by following these steps. This guide will walk you through the process of integrating your model into ChEF.
Step 1: Prepare Your Model Files
1.1. Navigate to the src/ChEF/models/ folder in ChEF.
1.2. Paste all the necessary files for your custom model into this folder.
Step 2: Write the Test Model
2.1. Create a new Python file in the models folder and name it something like test_your_model.py.
2.2. In this file, inherit from the TestBase class defined in src/ChEF/models/test_base.py. The TestBase class provides a set of interfaces that you should implement for testing your model.
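A test-model file might look roughly like the sketch below. The real interface is defined in src/ChEF/models/test_base.py; here StubBase stands in for TestBase so the example is self-contained, and the method name generate and the model_path parameter are illustrative assumptions, not the actual signatures.

```python
# Sketch of a test_your_model.py; StubBase stands in for ChEF's TestBase.
class StubBase:
    """Stand-in for ChEF's TestBase, for illustration only."""
    def __init__(self, **kwargs):
        self.kwargs = kwargs

class TestYourModel(StubBase):
    def __init__(self, model_path=None, **kwargs):
        super().__init__(**kwargs)
        # load your checkpoint here, e.g. self.model = load(model_path)
        self.model_path = model_path

    def generate(self, prompts):
        # run your model on a batch of queries and return one string per query
        return [f"echo: {p}" for p in prompts]

print(TestYourModel(model_path="dummy.pt").generate(["hello"]))
```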
Step 3: Test Your Model
3.1. Add your model in src/ChEF/models/__init__.py.
3.2. Prepare your model configuration in src/config/ChEF/models/. For example, the config for KOSMOS-2 (src/config/ChEF/models/kosmos2.yaml):

```yaml
model_name: Kosmos2
model_path: ../model_zoo/kosmos/kosmos-2.pt
if_grounding: False # set True for detection and grounding evaluation
```
The config for KOSMOS-2 when evaluating detection tasks:

```yaml
model_name: Kosmos2
model_path: ../model_zoo/kosmos/kosmos-2.pt
if_grounding: True
```
3.3. Use the provided recipes for evaluation:

```shell
python tools/eval.py --model_cfg configs/ChEF/models/your_model.yaml --recipe_cfg recipe_cfg
```
Instruction
In ChEF, the InstructionHandler class (src/ChEF/instruction/__init__.py) plays a central role in managing instructions: as the inferencer iterates through the dataset, it generates the queries that are then used as inputs to the model for various tasks.
ChEF supports three main query types: standard query, query pool, and multiturn query. For each query type, various query statements are defined based on the dataset's task type.
- Standard Query: uses the first query defined in the query pool.
- Query Pool: selects a query from the pool by the assigned id defined in the configuration.
- Multiturn Query: returns different queries depending on the turn id; these are also defined in the query pool.
For more details, refer to src/ChEF/instruction/query.py.
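The three query types can be illustrated with a toy pool. This is a self-contained sketch under the assumption that a pool is simply a list of query strings per task; the real pool in src/ChEF/instruction/query.py is organized differently, and the task names and strings here are invented for the example.

```python
# Toy query pool; the real one in src/ChEF/instruction/query.py differs.
QUERY_POOL = {
    "classification": [
        "What is the category of the object in the image?",  # id 0
        "Classify the main object in this image.",           # id 1
    ],
    "vqa_multiturn": [
        "Describe the image.",           # turn 0
        "Now answer the question: {q}",  # turn 1
    ],
}

def standard_query(task):
    # Standard Query: always the first query in the pool
    return QUERY_POOL[task][0]

def pool_query(task, assigned_id):
    # Query Pool: pick a query by the id assigned in the recipe configuration
    return QUERY_POOL[task][assigned_id]

def multiturn_query(task, turn_id):
    # Multiturn Query: the query depends on the turn id
    return QUERY_POOL[task][turn_id]

print(standard_query("classification"))
print(pool_query("classification", 1))
print(multiturn_query("vqa_multiturn", 1))
```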
InstructionHandler also supports generating in-context examples for a query, using an ice_retriever (src/ChEF/instruction/ice_retriever/). ChEF supports four types of ice_retrievers: random, fixed, topk_text, and topk_img. The generate_ices function in the InstructionHandler class outputs several in-context examples for an input query.
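The idea behind the random retriever can be sketched in a few lines. This is a conceptual illustration only: the real retrievers live in src/ChEF/instruction/ice_retriever/ with a different interface, and the function signature below (train_set, query_index, ice_num) is an assumption made for the example.

```python
# Conceptual sketch of a "random" ice_retriever (not ChEF's actual interface).
import random

def generate_ices(train_set, query_index, ice_num, seed=0):
    """Return ice_num in-context examples, never including the query sample itself."""
    rng = random.Random(seed)
    candidates = [i for i in range(len(train_set)) if i != query_index]
    return [train_set[i] for i in rng.sample(candidates, ice_num)]

train_set = [{"query": f"Q{i}", "answer": f"A{i}"} for i in range(10)]
ices = generate_ices(train_set, query_index=3, ice_num=2)
print(ices)
```

The topk_text and topk_img variants would replace the random sampling with a nearest-neighbor search over text or image embeddings, keeping the same input/output shape.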
Employ Your Instruction
You can add special queries to the Query Pool and define their assigned ids in the recipe configuration to use the new queries. You can also define a new type of query by defining the query in src/ChEF/instruction/query.py and adding a new function to InstructionHandler.
Inferencer
In ChEF, the Inferencer component is a crucial part of the system, responsible for model inference. ChEF offers a variety of pre-defined inferencers to cater to different needs. You can easily choose the appropriate inferencer by specifying the inferencer category and the necessary settings in the recipe configuration. Additionally, users have the flexibility to define their custom inferencers.
Pre-Defined Inferencers
ChEF provides eight different inferencers that cover a range of use cases. You can effortlessly use the desired inferencer by specifying its category and required settings in the recipe configuration.
Custom Inferencers
For advanced users and specific requirements, ChEF offers the option to create custom inferencers. The basic structure of an inferencer is defined in the src/ChEF/inferencer/Direct.py file (Direct_inferencer). You can extend this structure to implement your custom inferencer logic.
```python
from torch.utils.data import DataLoader
from tqdm import tqdm

class Your_inferencer(Direct_inferencer):
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)

    def inference(self, model, dataset):
        predictions = []
        # Step 1: build the dataloader, collating a list of dicts into a dict of lists
        dataloader = DataLoader(
            dataset,
            batch_size=self.batch_size,
            collate_fn=lambda batch: {key: [sample[key] for sample in batch] for key in batch[0]},
        )
        for batch in tqdm(dataloader, desc="Running inference"):
            # Step 2: get the input queries
            prompts = self.instruction_handler.generate(batch)
            # Step 3: run the model
            outputs = model.generate(prompts)
            # Step 4: collect the results
            predictions = predictions + outputs
        # Step 5: write the output file
        self._after_inference_step(predictions)
```
Metric
In ChEF, the Metric component plays a crucial role in evaluating and measuring the performance of models across various scenarios and protocols. ChEF offers a wide range of pre-defined metrics, each tailored to different evaluation needs. Detailed information about these metrics can be found in the src/ChEF/metric/__init__.py file.
Custom Metrics
ChEF also allows users to define their custom metrics. The basic structure of a metric is defined in the src/ChEF/metric/utils.py file (Base_Metric). You can extend this structure to implement your custom metric logic.
```python
class Your_metric(Base_Metric):
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)

    def metric_func(self, answers):
        '''
        answers: List[sample], each sample is a dict
        sample: {
            'answer' : str,
            'gt_answers' : str,
        }
        '''
        # Evaluation: e.g., exact-match accuracy (illustrative)
        correct = sum(item['answer'] == item['gt_answers'] for item in answers)
        return dict(ACC=correct / len(answers))
```