vision_unlearning.metrics.image_and_text

Classes

MetricImageTextSimilarity

Module Contents

class vision_unlearning.metrics.image_and_text.MetricImageTextSimilarity(/, **data: Any)

Bases: vision_unlearning.metrics.base.Metric

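A base class for creating Pydantic models. (This description and the dunder attributes listed below are inherited from pydantic.BaseModel; they are not specific to this metric.)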

__class_vars__: The names of the class variables defined on the model.

__private_attributes__: Metadata about the private attributes of the model.

__signature__: The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__: Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__: The core schema of the model.

__pydantic_custom_init__: Whether the model has a custom __init__ function.

__pydantic_decorators__: Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__: A dictionary containing metadata about generic Pydantic models. The origin and args items map to the __origin__ and __args__ attributes of generic aliases (types.GenericAlias), and the parameter item maps to the __parameter__ attribute of generic classes.

__pydantic_parent_namespace__: Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__: The name of the post-init method for the model, if defined.

__pydantic_root_model__: Whether the model is a RootModel (pydantic.root_model.RootModel).

__pydantic_serializer__: The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__: The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_fields__: A dictionary of field names and their corresponding FieldInfo (pydantic.fields.FieldInfo) objects.

__pydantic_computed_fields__: A dictionary of computed field names and their corresponding ComputedFieldInfo (pydantic.fields.ComputedFieldInfo) objects.

__pydantic_extra__: A dictionary containing extra values, if extra (pydantic.config.ConfigDict.extra) is set to 'allow'.

__pydantic_fields_set__: The names of fields explicitly set during instantiation.

__pydantic_private__: Values of private attributes set on the model instance.

metrics: List[Literal['clip']]

_clip_metric: torchmetrics.multimodal.clip_score.CLIPScore | None = None

model_post_init(__context: dict | None = None) → None: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

_load_image(image: PIL.Image.Image | numpy.ndarray | str) → torch.Tensor

score(image: PIL.Image.Image | numpy.ndarray | str, text: str) → Dict[str, float]

score_batch(images: List[PIL.Image.Image | numpy.ndarray | str], texts: List[str]) → List[Dict[str, float]]: Warning: this function does not improve performance; the underlying libraries still process the pairs serially. Returns per-pair results in the same order as the inputs.
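The serial behaviour described for score_batch can be pictured with a minimal sketch. This is not the library's implementation: score_batch_serial and score_fn are hypothetical names, score_fn stands in for a per-pair scorer such as score(), and the length check is an assumption about how mismatched inputs might be handled.

```python
from typing import Any, Callable, Dict, List


def score_batch_serial(
    images: List[Any],
    texts: List[str],
    score_fn: Callable[[Any, str], Dict[str, float]],
) -> List[Dict[str, float]]:
    """Score (image, text) pairs one at a time, preserving input order.

    Hypothetical helper: `score_fn` plays the role of a per-pair scorer.
    """
    # Assumption: mismatched lengths are treated as a caller error.
    if len(images) != len(texts):
        raise ValueError("images and texts must have the same length")
    # One scorer call per pair, so the cost grows linearly with the batch.
    return [score_fn(image, text) for image, text in zip(images, texts)]
```

Because each pair triggers its own call, a batch of N pairs costs roughly the same as N separate calls; the only gain is a single call site.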

score_batch_same_text(images: List[PIL.Image.Image | numpy.ndarray | str], text: str) → List[Dict[str, float]]

Batch CLIP scoring when all images share the same text prompt.

This is meaningfully faster than calling score() N times because the CLIP text encoder runs once for the shared text. Images are processed individually through the CLIP image processor (as in the serial path) but the text encoder forward pass is done only once.
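The encode-once idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: encode_image and encode_text are hypothetical stand-ins for the CLIP encoders, embeddings are plain lists of floats, and the score follows the commonly cited CLIPScore definition max(100 * cos(image, text), 0).

```python
import math
from typing import Any, Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_same_text_once(
    images: List[Any],
    text: str,
    encode_image: Callable[[Any], Sequence[float]],
    encode_text: Callable[[str], Sequence[float]],
) -> List[Dict[str, float]]:
    """Score every image against one caption, encoding the caption once."""
    text_emb = encode_text(text)  # single text-encoder pass, reused below
    results = []
    for image in images:
        image_emb = encode_image(image)  # images are still encoded one by one
        score = max(100.0 * cosine(image_emb, text_emb), 0.0)
        results.append({"clip": score})
    return results
```

With N images, this makes N image-encoder calls but only one text-encoder call, whereas a loop over a per-pair scorer would make N of each.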

Uses _clip_score_update from torchmetrics (private API, tested against torchmetrics 1.x) which returns per-pair scores as a 1-D tensor. The result is numerically equivalent to calling score() N times (max diff < 2e-5 on 512x512 SD1.4 images).

NOTE: _clip_score_update is a private torchmetrics symbol — if a future torchmetrics version removes it, fall back to the serial score() loop.
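One way to guard against the private symbol disappearing is to look it up defensively and fall back to a serial loop. The sketch below is an assumption, not the library's code: load_optional_symbol and score_with_fallback are hypothetical helpers, and the module path shown in the comment is where the symbol is believed to live in torchmetrics 1.x.

```python
import importlib
from typing import Any, Callable, Dict, List, Optional


def load_optional_symbol(module_name: str, symbol: str) -> Optional[Any]:
    """Return `module_name.symbol`, or None if the module or symbol is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, symbol, None)


def score_with_fallback(
    images: List[Any],
    text: str,
    fast_path: Optional[Callable[[List[Any], str], List[Dict[str, float]]]],
    serial_score: Callable[[Any, str], Dict[str, float]],
) -> List[Dict[str, float]]:
    """Use the batched fast path when available, else score one by one."""
    if fast_path is not None:
        return fast_path(images, text)
    return [serial_score(image, text) for image in images]


# Assumed location of the private helper (verify against your installed version):
# update_fn = load_optional_symbol(
#     "torchmetrics.functional.multimodal.clip_score", "_clip_score_update"
# )
```

In this sketch fast_path would be a wrapper built around the private helper when the lookup succeeds, and None otherwise, so callers see the same return shape on either path.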

Parameters:

images – List of N images (PIL Image, np.ndarray, or file path).
text – Single text caption applied to all images.

Returns:

List of N dicts {'clip': float}, one per image in input order.

Return type:

List[Dict[str, float]]