vision_unlearning.benchmarks.I_care

I-CARE benchmark package.

This package contains all code for the I-CARE unlearning benchmark:

- configuration.py: domain constants, type aliases, GUI_TO_BACKEND mapping
- metadata.py: interference-per-pair / interference-per-entity helpers, InterferencePerEntity
- metrics.py: per-entity interference metric functions
- result_templates.py: ResultTemplate base class + all subclasses + registries
- utils.py: image encoding/decoding helpers, SHAP serialization, error classes

All public symbols are re-exported here so callers can do:

    from vision_unlearning.benchmarks.I_care import rt_name_to_class
    from vision_unlearning.benchmarks.I_care import InterferencePerEntity  # etc.

Optional dependencies (seaborn, scikit-learn, shap) are used by the result templates and the SHAP utilities. They are not required for basic testbed / dataset work. Install them via: pip install vision-unlearning[testbed]
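Before using the result templates or the SHAP utilities, it can be handy to check which optional dependencies are importable. The following is a minimal sketch, not part of the package: the helper name is hypothetical, and the import names (seaborn, sklearn, shap) are assumed from the distribution names listed above.

```python
import importlib.util
from typing import List, Sequence


def missing_optional_dependencies(
    modules: Sequence[str] = ("seaborn", "sklearn", "shap"),
) -> List[str]:
    """Return the top-level modules from `modules` that cannot be imported."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_optional_dependencies()
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
```

If the list is non-empty, installing the testbed extra mentioned above should provide the missing packages.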

Submodules

Attributes

domain_unlearning_algorithm

domain_task

domain_attribute

domain_entity

domain_model

domain_mp

domain_me

domain_s

domain_l

type_task

type_unlearning_algorithm

type_model

type_mp

type_me

type_s

type_l

GUI_TO_BACKEND

unlearning_algorithm_to_epochs

s_to_direction

EMBEDDING_MODEL

EMBEDDING_DIM

rt_name_to_class

rt_name_to_params

Exceptions

InvalidAttributeTypeError

Inappropriate argument value (of correct type).

InsufficientSamplesError

Inappropriate argument value (of correct type).

Classes

InterferencePerEntity

!!! abstract "Usage Documentation"

ResultTemplate

!!! abstract "Usage Documentation"

ResultTemplateMetricMetricAlignment

Measures how strongly two MetricInterferencePerEntity metrics are correlated.

ResultTemplateMetricSimilarityAlignment

To what degree similar entities interfere more with each other.

ResultTemplateMetricSimilarityAlignmentMulti

Multi-input Single-output Regression Generalization of ResultTemplateMetricSimilarityAlignment (see also Appendix E, adapted from the multi-output setting).

ResultTemplateSignificantRelationshipNumerical

Measures whether two numerical attributes are significantly correlated.

ResultTemplateSignificantRelationshipCategorical

Statistical significance of the average MetricInterferencePerEntity across all

ResultTemplateCountSignificantRelationship

Number of significant relationships across all combinations of attributes and

ResultTemplateImplicitAssociationTest

Measures how the strength of automatic associations B between two pairs of

ResultTemplateMinimumCutInterference

Interprets a task as a directed weighted graph and computes the minimum cut separating two entities

ResultTemplateUnlearningVisualSummary

!!! abstract "Usage Documentation"

ResultTemplateInterferenceVisualSummary

Compares generated images for 9 identities: target, 4 worst (excluding target), 4 best

ResultTemplateMatrix

!!! abstract "Usage Documentation"

ResultTemplateInterferenceMatrix

MetricInterferencePerEntityPair between each possible combination of two entities

ResultTemplateSimilarityMatrix

Similarities between each possible combination of two entities within a task.

ResultTemplateMethodComparisonByMetricEntity

Compares the distribution of one MetricInterferencePerEntity across multiple

Functions

convert_params_from_gui_to_backend(→ Dict[str, Any])

Convert GUI values to backend literal values.

get_interference_per_pair_path(→ str)

get_interference_per_pair(→ Dict[str, Dict[str, float]])

exists_interference_per_pair(→ bool)

save_interference_per_pair(→ None)

get_interference_per_pair_inverse(→ Dict[str, ...)

get_interference_per_entity_path(→ str)

get_interference_per_entity(→ List[Dict[str, Any]])

save_interference_per_entity(→ None)

choose_metric_column_interference_per_entity(→ str)

The columns of the interference per entity file are not named in a way that is easy to generate given unlearning_algorithm and interference_entity, so we need to search for the right one.

get_metadata_filtered(→ List[Dict[str, Any]])

get_metadata_filtered_path(→ str)

get_target_overwrite(→ Tuple[str, str])

Returns the preprocessed target and target_overwrite.

get_generated_dataset_file(→ str)

find_worst_interfered(→ Tuple[str, float])

metric_of_worst_interfered(→ float)

is_worst_interfered_target(→ bool)

number_of_interfered_worse_than_target(→ int)

number_of_interfered_worse_than_threshold(→ int)

average_metric(→ float)

_encode_image_file(→ str)

Downsample / reduce resolution to limit size before encoding

_decode_image(→ io.BytesIO)

explanation_to_dict(→ Dict[str, Any])

Serialize a shap.Explanation to a plain dict (JSON-serializable).

dict_to_explanation(→ Any)

Deserialize a plain dict back to a shap.Explanation.

load_dino_model(→ Tuple[Any, Any, str])

Load DINOv2 model, transform pipeline, and device.

embed_image_with_dino(→ List[float])

Embed a single image using a pre-loaded DINOv2 model.

embed_forgetting_session(→ List[Dict[str, Any]])

Embed all images from one forgetting session (entity or baseline).

embed_forgetting_session_batched(→ List[Dict[str, Any]])

Embed all images for one forgetting session using batched GPU inference.

huggingface_dataset_file_exists(→ bool)

Checks if a specific file exists in a Hugging Face dataset repository.

huggingface_dataset_file_download(→ None)

Download a single file from a dataset in Hugging Face Hub.

huggingface_dataset_upload(folder_datasets, ...)

Assumes that a folder dataset_config exists in folder_datasets, and that it contains the dataset files

huggingface_dataset_file_upload(file_path, ...)

Upload a single file to a specific dataset config in Hugging Face Hub.

huggingface_dataset_download(folder_datasets, ...[, ...])

clean: If True, the folder will be deleted before downloading

jacc_metric_score(→ float)

Jaccard similarity between two entities, based on their attributes.

display_interesting_interferences(→ None)

Compares generated images for 9 identities: target, 4 worst (excluding target), 4 best

analyze_relationship_regression(→ bool)

Test linear relationship between two numerical variables with significance test

analyze_relationship_category(→ bool)

analyze_relationship_numerical(→ bool)

Analyzes the relationship between a numerical attribute and a numerical metric

analyze_relationship_categorical(→ bool)

Analyzes the relationship between a categorical attribute and a numerical metric

analyze_correlation_between_pairwise_metrics(→ bool)

df1 and df2 are square DataFrames that share the same labels: within each frame the index equals the columns, and both frames use the same index and columns

check_eval_results(→ float)

Check if the metric satisfies the EXPECTED threshold

Package Contents

vision_unlearning.benchmarks.I_care.domain_unlearning_algorithm = ['FADE', 'Munba', 'UCE']
vision_unlearning.benchmarks.I_care.domain_task = ['Breeds', 'Scenes', 'People']
vision_unlearning.benchmarks.I_care.domain_attribute
vision_unlearning.benchmarks.I_care.domain_entity
vision_unlearning.benchmarks.I_care.domain_model = ['Stable Diffusion 1.4']
vision_unlearning.benchmarks.I_care.domain_mp = ['Delta Clip', 'Delta Brisque', 'RMSE', 'SSIM']
vision_unlearning.benchmarks.I_care.domain_me = ['Emitter worst interfered brisque diff', 'Emitter worst interfered clip diff', 'Emitter worst...
vision_unlearning.benchmarks.I_care.domain_s = ['Clip Cosine Similarity', 'Jacc Similarity']
vision_unlearning.benchmarks.I_care.domain_l = ['Clip Embedding']
vision_unlearning.benchmarks.I_care.type_task
vision_unlearning.benchmarks.I_care.type_unlearning_algorithm
vision_unlearning.benchmarks.I_care.type_model
vision_unlearning.benchmarks.I_care.type_mp
vision_unlearning.benchmarks.I_care.type_me
vision_unlearning.benchmarks.I_care.type_s
vision_unlearning.benchmarks.I_care.type_l
vision_unlearning.benchmarks.I_care.GUI_TO_BACKEND
vision_unlearning.benchmarks.I_care.unlearning_algorithm_to_epochs
vision_unlearning.benchmarks.I_care.convert_params_from_gui_to_backend(params: Dict[str, Any]) Dict[str, Any][source]

Convert GUI values to backend literal values. Unknown keys are passed through unchanged. None stays None.
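The pass-through behaviour described above can be illustrated with a small sketch. It is not the package's implementation: the mapping table below is hypothetical (the real GUI_TO_BACKEND contents are not shown on this page), and the per-key layout of the mapping is an assumption.

```python
from typing import Any, Dict

# Hypothetical mapping for illustration only; the real GUI_TO_BACKEND may differ.
HYPOTHETICAL_GUI_TO_BACKEND: Dict[str, Dict[str, str]] = {
    "task": {"People": "people", "Breeds": "breeds", "Scenes": "scenes"},
    "model": {"Stable Diffusion 1.4": "sd1.4"},
}


def convert_params_sketch(params: Dict[str, Any]) -> Dict[str, Any]:
    """Translate GUI values to backend literals, leaving everything else as is."""
    converted: Dict[str, Any] = {}
    for key, value in params.items():
        mapping = HYPOTHETICAL_GUI_TO_BACKEND.get(key)
        if mapping is None or value is None:
            # Unknown keys pass through unchanged; None stays None.
            converted[key] = value
        else:
            # Values without a mapping entry are also kept unchanged.
            converted[key] = mapping.get(value, value)
    return converted


print(convert_params_sketch({"task": "People", "seed": 3, "model": None}))
```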

vision_unlearning.benchmarks.I_care.s_to_direction: Dict[type_s, type_direction]
vision_unlearning.benchmarks.I_care.get_interference_per_pair_path(task: Literal['scenes', 'objects', 'breeds', 'people'], index: int, method: Literal['munba', 'uce', 'distil'], num_train_epochs: int, base_folder: str = 'assets') str[source]
vision_unlearning.benchmarks.I_care.get_interference_per_pair(task: Literal['scenes', 'objects', 'breeds', 'people'], index: int, method: Literal['munba', 'uce', 'distil'], num_train_epochs: int, max_identities: int = 100, base_folder: str = 'assets') Dict[str, Dict[str, float]][source]
vision_unlearning.benchmarks.I_care.exists_interference_per_pair(task: Literal['scenes', 'objects', 'breeds', 'people'], index: int, method: Literal['munba', 'uce', 'distil'], num_train_epochs: int, base_folder: str = 'assets') bool[source]
vision_unlearning.benchmarks.I_care.save_interference_per_pair(interference_per_pair: Dict[str, Dict[str, float]], task: Literal['scenes', 'objects', 'breeds', 'people'], index: int, method: Literal['munba', 'uce', 'distil'], num_train_epochs: int, base_folder: str = 'assets') None[source]
vision_unlearning.benchmarks.I_care.get_interference_per_pair_inverse(task: Literal['scenes', 'objects', 'breeds', 'people'], index: int, method: Literal['munba', 'uce', 'distil'], num_train_epochs: int, index_start: int = 0, max_identities: int = 100) Dict[str, Dict[str, float]][source]
vision_unlearning.benchmarks.I_care.get_interference_per_entity_path(task: Literal['scenes', 'objects', 'breeds', 'people'], base_folder: str = 'assets') str[source]
vision_unlearning.benchmarks.I_care.get_interference_per_entity(task: Literal['scenes', 'objects', 'breeds', 'people'], max_identities: int = 100, base_folder: str = 'assets') List[Dict[str, Any]][source]
vision_unlearning.benchmarks.I_care.save_interference_per_entity(task: Literal['scenes', 'objects', 'breeds', 'people'], metadata_filtered: List[Dict[str, Any]], base_folder: str = 'assets') None[source]
class vision_unlearning.benchmarks.I_care.InterferencePerEntity(/, **data: Any)[source]

Bases: pydantic.BaseModel

!!! abstract “Usage Documentation”

[Models](../concepts/models.md)

A base class for creating Pydantic models.

__class_vars__

The names of the class variables defined on the model.

__private_attributes__

Metadata about the private attributes of the model.

__signature__

The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__

Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__

The core schema of the model.

__pydantic_custom_init__

Whether the model has a custom __init__ function.

__pydantic_decorators__

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__

A dictionary containing metadata about generic Pydantic models. The origin and args items map to the [__origin__][genericalias.__origin__] and [__args__][genericalias.__args__] attributes of [generic aliases][types-genericalias], and the parameter item maps to the __parameter__ attribute of generic classes.

__pydantic_parent_namespace__

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__

The name of the post-init method for the model, if defined.

__pydantic_root_model__

Whether the model is a [RootModel][pydantic.root_model.RootModel].

__pydantic_serializer__

The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__

The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_fields__

A dictionary of field names and their corresponding [FieldInfo][pydantic.fields.FieldInfo] objects.

__pydantic_computed_fields__

A dictionary of computed field names and their corresponding [ComputedFieldInfo][pydantic.fields.ComputedFieldInfo] objects.

__pydantic_extra__

A dictionary containing extra values, if [extra][pydantic.config.ConfigDict.extra] is set to ‘allow’.

__pydantic_fields_set__

The names of fields explicitly set during instantiation.

__pydantic_private__

Values of private attributes set on the model instance.

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
base_folder: str = 'assets'
remote_repository_name: str = 'LeonardoBenitez/VisionUnlearningEvaluationTestbeds'
save_outputs: bool = True
recompute_if_exists: bool = False
_get_data_path_remote() str[source]
_get_data_path_local() str[source]
compute() List[Dict[str, Any]][source]
vision_unlearning.benchmarks.I_care.choose_metric_column_interference_per_entity(unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm, interference_entity: vision_unlearning.benchmarks.I_care.configuration.type_me, metric_cols: List[str]) str[source]

The columns of the interference per entity file are not named in a way that is easy to generate given unlearning_algorithm and interference_entity, so we need to search for the right one. We assume there is only one match, and we assert it. If there are no matches or more than one match, we raise an error.

The names look like this:

    'metric_distil_400_emitter_minus_receiver_worst_interfered_ssim (↓)',
    'metric_distil_400_emitter_minus_receiver_number_of_interfered_worse_than_target_brisque_diff (↓)',
    'metric_distil_400_emitter_minus_receiver_number_of_interfered_worse_than_target_clip_diff (↓)',
    'metric_distil_400_emitter_minus_receiver_number_of_interfered_worse_than_target_rmse (↓)',
    'metric_distil_400_emitter_minus_receiver_number_of_interfered_worse_than_target_ssim (↓)',
    'metric_distil_400_emitter_minus_receiver_number_of_interfered_worse_than_zero_clip_diff (↓)',
    'metric_distil_400_emitter_minus_receiver_average_brisque_diff (↓)',
    'metric_distil_400_emitter_minus_receiver_average_clip_diff (↑)',
    'metric_uce_000_emitter_minus_receiver_average_rmse (↓)',
    'metric_munba_100_emitter_minus_receiver_average_ssim (↑)',

TODO: these names are defined in 4. Compute interference per entity.ipynb. There should be a central way of defining them.
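The "search and require exactly one match" behaviour can be sketched as follows. This is only an illustration under assumptions: how the real function derives its search tokens from unlearning_algorithm and interference_entity is not documented here, so the sketch takes a backend-style algorithm token and a metric suffix directly, and the function name is hypothetical.

```python
from typing import List


def choose_column_sketch(algorithm: str, metric_suffix: str, metric_cols: List[str]) -> str:
    """Return the single column matching both tokens, or raise ValueError."""
    matches = [
        col
        for col in metric_cols
        if f"metric_{algorithm}_" in col and f"_{metric_suffix} (" in col
    ]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one column for ({algorithm}, {metric_suffix}), "
            f"found {len(matches)}: {matches}"
        )
    return matches[0]


cols = [
    "metric_distil_400_emitter_minus_receiver_average_brisque_diff (↓)",
    "metric_distil_400_emitter_minus_receiver_average_clip_diff (↑)",
    "metric_uce_000_emitter_minus_receiver_average_rmse (↓)",
]
print(choose_column_sketch("distil", "average_clip_diff", cols))
```

An ambiguous token such as "diff" matches two of the columns above and therefore raises, which mirrors the documented single-match requirement.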

vision_unlearning.benchmarks.I_care.get_metadata_filtered(task: Literal['scenes', 'objects', 'breeds', 'people'], base_folder: str = 'assets') List[Dict[str, Any]][source]
vision_unlearning.benchmarks.I_care.get_metadata_filtered_path(task: Literal['scenes', 'objects', 'breeds', 'people'], base_folder: str = 'assets') str[source]
vision_unlearning.benchmarks.I_care.get_target_overwrite(task: Literal['scenes', 'objects', 'breeds', 'people'], method: Literal['munba', 'uce', 'distil'], target: str) Tuple[str, str][source]

Returns:

Tuple of (preprocessed target, target_overwrite).

vision_unlearning.benchmarks.I_care.get_generated_dataset_file(lora_state: Literal['on', 'off'], seed: int, prompt: str) str[source]
vision_unlearning.benchmarks.I_care.find_worst_interfered(interference_per_pair: dict, metric: str, is_worst_biggest: bool) Tuple[str, float][source]
vision_unlearning.benchmarks.I_care.metric_of_worst_interfered(interference_per_pair: dict, metric: str, is_worst_biggest: bool) float[source]
vision_unlearning.benchmarks.I_care.is_worst_interfered_target(interference_per_pair: dict, metric: str, is_worst_biggest: bool, target: str) bool[source]
vision_unlearning.benchmarks.I_care.number_of_interfered_worse_than_target(interference_per_pair: dict, metric: str, is_worst_biggest: bool, target: str) int[source]
vision_unlearning.benchmarks.I_care.number_of_interfered_worse_than_threshold(interference_per_pair: dict, metric: str, is_worst_biggest: bool, threshold: float) int[source]
vision_unlearning.benchmarks.I_care.average_metric(interference_per_pair: dict, metric: str) float[source]
vision_unlearning.benchmarks.I_care._encode_image_file(img_path: str, max_dim: int = 1024) str[source]

Downsample / reduce resolution to limit size before encoding

vision_unlearning.benchmarks.I_care._decode_image(image_data: str) io.BytesIO[source]
vision_unlearning.benchmarks.I_care.explanation_to_dict(expl: Any) Dict[str, Any][source]

Serialize a shap.Explanation to a plain dict (JSON-serializable).

vision_unlearning.benchmarks.I_care.dict_to_explanation(d: Dict[str, Any]) Any[source]

Deserialize a plain dict back to a shap.Explanation.

Requires ‘shap’ package (optional dependency — install with pip install shap or pip install vision-unlearning[testbed]).

exception vision_unlearning.benchmarks.I_care.InvalidAttributeTypeError[source]

Bases: ValueError

Inappropriate argument value (of correct type).

exception vision_unlearning.benchmarks.I_care.InsufficientSamplesError[source]

Bases: ValueError

Inappropriate argument value (of correct type).

vision_unlearning.benchmarks.I_care.EMBEDDING_MODEL = 'dinov2_vits14'
vision_unlearning.benchmarks.I_care.EMBEDDING_DIM = 384
vision_unlearning.benchmarks.I_care.load_dino_model(model_name: str = EMBEDDING_MODEL, force_device: str | None = None) Tuple[Any, Any, str][source]

Load DINOv2 model, transform pipeline, and device.

Heavy imports (torch, torchvision) happen here, not at module load.

Parameters:
  • model_name – DINOv2 model variant (default: ‘dinov2_vits14’ → 384-dim CLS).

  • force_device – If set, use this device string instead of auto-detecting.

Returns:

(model, transform, device) tuple. model: DINOv2 PyTorch model in eval mode, on device. transform: torchvision.transforms pipeline (resize → crop → normalize). device: device string (‘cuda’ or ‘cpu’).

vision_unlearning.benchmarks.I_care.embed_image_with_dino(image_path: str, model: Any, transform: Any, device: str) List[float][source]

Embed a single image using a pre-loaded DINOv2 model.

Parameters:
  • image_path – Path to a PNG/JPEG image on disk.

  • model – DINOv2 model (from load_dino_model()).

  • transform – torchvision transform (from load_dino_model()).

  • device – device string (‘cuda’ or ‘cpu’).

Returns:

384-dim CLS embedding as a plain Python list of floats. TODO: refactor into batched DataLoader for throughput (currently single-image).

vision_unlearning.benchmarks.I_care.embed_forgetting_session(dataset_folder: str, seeds: List[int], prompts: List[str], metadata_filtered: List[Dict[str, Any]], lora_state: Literal['on', 'off'], task: str, embed_image_fn: Callable[[str], List[float]] | None = None) List[Dict[str, Any]][source]

Embed all images from one forgetting session (entity or baseline).

Iterates over all (seed, prompt) combinations and embeds each matching image. Images that do not exist on disk are skipped with a warning.

Parameters:
  • dataset_folder – Local directory containing the generated images.

  • seeds – List of generation seeds (e.g. [0, 1, 2, 3]).

  • prompts – Full prompt strings (e.g. “An image of Colin Powell”).

  • metadata_filtered – Metadata list used to map prompt index → entity name. metadata_filtered[i][‘name’] corresponds to prompts[i].

  • lora_state – ‘on’ for unlearned model images, ‘off’ for baseline images.

  • task – Task name, passed to get_target_preprocessed().

  • embed_image_fn – Injectable embedding function (image_path → [float]). Required — there is no default. Pass embed_image_with_dino (partially applied) or a test stub.

Returns:

List of records, each shaped as:

    [
        {
            'prompted_entity': str,    # entity name (preprocessed)
            'seed': int,
            'prompt': str,
            'embedding': List[float],  # 384-dim CLS embedding
        }
    ]
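The injectable embed_image_fn and the skip-missing-files behaviour can be illustrated with a stub. This is a rough sketch under assumptions, not the package's code: the file naming scheme is hypothetical (the real one comes from get_generated_dataset_file()), the entity name is used without preprocessing, and the stub returns zeros instead of running DINOv2.

```python
import os
import tempfile
from functools import partial
from typing import Any, Callable, Dict, List


def stub_embed(image_path: str, dim: int) -> List[float]:
    """Test stub standing in for a partially applied embed_image_with_dino."""
    return [0.0] * dim


def embed_session_sketch(
    dataset_folder: str,
    seeds: List[int],
    prompts: List[str],
    metadata_filtered: List[Dict[str, Any]],
    embed_image_fn: Callable[[str], List[float]],
) -> List[Dict[str, Any]]:
    records: List[Dict[str, Any]] = []
    for seed in seeds:
        for i, prompt in enumerate(prompts):
            # Hypothetical naming scheme, for illustration only.
            path = os.path.join(dataset_folder, f"{seed}_{i}.png")
            if not os.path.exists(path):
                continue  # the documented function skips these with a warning
            records.append(
                {
                    "prompted_entity": metadata_filtered[i]["name"],
                    "seed": seed,
                    "prompt": prompt,
                    "embedding": embed_image_fn(path),
                }
            )
    return records


with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "0_0.png"), "wb").close()  # only seed 0 exists
    records = embed_session_sketch(
        folder,
        seeds=[0, 1],
        prompts=["An image of Colin Powell"],
        metadata_filtered=[{"name": "Colin Powell"}],
        embed_image_fn=partial(stub_embed, dim=384),
    )
print(len(records))  # 1 record: the image for seed 1 is missing and skipped
```

With the real functions, the same partial-application pattern would bind model, transform, and device from load_dino_model() to embed_image_with_dino.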

vision_unlearning.benchmarks.I_care.embed_forgetting_session_batched(dataset_folder: str, seeds: List[int], prompts: List[str], metadata_filtered: List[Dict[str, Any]], lora_state: Literal['on', 'off'], task: str, model: Any, transform: Any, device: str, batch_size: int = 32) List[Dict[str, Any]][source]

Embed all images for one forgetting session using batched GPU inference.

More efficient than embed_forgetting_session() for large image sets. Collects all (path, metadata) pairs first, then processes in batches via a simple loop, amortising Python overhead and maximising GPU utilisation.

Parameters:
  • dataset_folder – Local directory containing the generated images.

  • seeds – List of generation seeds used.

  • prompts – Full prompt strings.

  • metadata_filtered – Metadata list: metadata_filtered[i][‘name’] → prompts[i].

  • lora_state – ‘on’ for unlearned model, ‘off’ for baseline.

  • task – Task name, passed to get_target_preprocessed().

  • model – DINOv2 model (from load_dino_model()), on device, in eval mode.

  • transform – torchvision transform pipeline (from load_dino_model()).

  • device – Torch device string (‘cuda’ or ‘cpu’).

  • batch_size – Number of images per GPU forward pass (default 32). TODO: tune based on VRAM; 32 images × 224×224 ≈ 220MB VRAM.

Returns:

Same structure as embed_forgetting_session().
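The "collect first, then process in batches via a simple loop" approach described above might look roughly like this. The helper name is hypothetical, and the sketch only covers the chunking step; the actual GPU forward pass is left out.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 70 collected image paths are processed as batches of 32, 32 and 6.
paths = [f"image_{i}.png" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(paths, batch_size=32)]
print(batch_sizes)  # [32, 32, 6]
```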

vision_unlearning.benchmarks.I_care.huggingface_dataset_file_exists(dataset_repository: str, dataset_path: str, token: str | None) bool[source]

Checks if a specific file exists in a Hugging Face dataset repository.

Parameters:
  • dataset_repository – e.g. “username/dataset_name”

  • dataset_path – full path in repo (e.g. “config/file.jsonl”)

  • token – HF token (can be None for public repos)

Returns:

True if file exists, False otherwise

Checks whether a file exists in a Hugging Face dataset repo without listing the entire repository. This could be done more efficiently with a newer version of the library; see https://chatgpt.com/share/69edd525-d008-832d-8a0c-ec4560a4fe3b

vision_unlearning.benchmarks.I_care.huggingface_dataset_file_download(folder_datasets: str, dataset_repository: str, file_path: str, token: str, folder_cache: str = '/tmp/huggingface_cache') None[source]

Download a single file from a dataset in Hugging Face Hub.

Parameters:
  • folder_datasets – Local directory where datasets are stored.

  • dataset_repository – Hugging Face dataset repository ID

  • file_path – Full path of the file within the repository (e.g., “config/data.jsonl”)

  • token – Hugging Face authentication token

  • folder_cache – Cache directory for downloads

The file will be saved at os.path.join(folder_datasets, file_path)

vision_unlearning.benchmarks.I_care.huggingface_dataset_upload(folder_datasets: str, dataset_repository: str, dataset_config: str, token: str)[source]

Assumes that a folder dataset_config exists in folder_datasets, and that it contains the dataset files.

vision_unlearning.benchmarks.I_care.huggingface_dataset_file_upload(file_path: str, dataset_repository: str, dataset_path: str, token: str)[source]

Upload a single file to a specific dataset config in Hugging Face Hub.

Parameters:
  • dataset_path – Full name of the file in the repository, including the config folder (e.g., “my_config/my_file.jsonl”).

vision_unlearning.benchmarks.I_care.huggingface_dataset_download(folder_datasets: str, dataset_repository: str, dataset_config: str, token: str, clean: bool = False, folder_cache: str = '/tmp/huggingface_cache', clean_cache: bool = False)[source]

Parameters:
  • clean – If True, the folder will be deleted before downloading.

class vision_unlearning.benchmarks.I_care.ResultTemplate(/, **data: Any)[source]

Bases: pydantic.BaseModel

recompute_if_exists: bool = False
save_outputs: bool = True
base_folder: str = 'assets'
remote_repository_name: str = 'LeonardoBenitez/VisionUnlearningEvaluationTestbeds'
abstract _serialize_parameters() str[source]
_get_data_path_remote() str[source]
_get_data_path_local() str[source]
classmethod _fig_to_bytes(fig: matplotlib.figure.Figure) bytes[source]
abstract _compute_from_scratch() dict | list[source]
compute() dict[source]
class vision_unlearning.benchmarks.I_care.ResultTemplateMetricMetricAlignment(/, **data: Any)[source]

Bases: ResultTemplate

Measures how strongly two MetricInterferencePerEntity metrics are correlated.

Arguments: m, t, u, m_e1, m_e2.

Result: Pearson p-value, Spearman p-value, Pearson correlation, scatter plot.

Interpretation: quantitative; the higher the correlation, the lower the need to calculate both metrics for this specific choice of m, t, and u.

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm
interference_entity_1: vision_unlearning.benchmarks.I_care.configuration.type_me
interference_entity_2: vision_unlearning.benchmarks.I_care.configuration.type_me
class vision_unlearning.benchmarks.I_care.ResultTemplateMetricSimilarityAlignment(/, **data: Any)[source]

Bases: ResultTemplate

To what degree similar entities interfere more with each other.

Formalized in ap:prediction, which also proposes its natural expansion to a multivariable and non-linear predictive regression.

Arguments: m, t, u, m_p, s.

Result: Pearson p-value, Spearman p-value, Pearson correlation, scatter plot.

Interpretation: quantitative; if this value is high, interference between two entities can be approximated by similarity (which is cheaper to compute for any new entity). Equivalently, the amount of “transmission wires” can be summarized by this single similarity function.
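Before similarity and interference can be correlated, the two pairwise matrices have to be turned into aligned samples. The sketch below shows one way to do this while skipping self-pairs, in the spirit of the exclude_diagonal option; the nested-dict layout (emitter → receiver → value) and the function name are assumptions.

```python
from typing import Dict, List, Tuple

PairwiseMatrix = Dict[str, Dict[str, float]]


def paired_samples(
    similarity: PairwiseMatrix,
    interference: PairwiseMatrix,
    exclude_diagonal: bool = True,
) -> Tuple[List[float], List[float]]:
    """Return aligned (similarity, interference) lists over entity pairs."""
    xs: List[float] = []
    ys: List[float] = []
    for emitter, row in interference.items():
        for receiver, value in row.items():
            if exclude_diagonal and emitter == receiver:
                continue  # an entity trivially resembles itself
            xs.append(similarity[emitter][receiver])
            ys.append(value)
    return xs, ys


# Synthetic values, for illustration only.
similarity = {"a": {"a": 1.0, "b": 0.8}, "b": {"a": 0.6, "b": 1.0}}
interference = {"a": {"a": 0.9, "b": 0.3}, "b": {"a": 0.2, "b": 0.7}}
xs, ys = paired_samples(similarity, interference)
print(xs, ys)  # [0.8, 0.6] [0.3, 0.2]
```

The resulting lists can then be passed to any correlation routine, for example a Pearson or Spearman test.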

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm
interference_pair: vision_unlearning.benchmarks.I_care.configuration.type_mp
similarity_metric: vision_unlearning.benchmarks.I_care.configuration.type_s
significance_threshold: float = 0.05
_serialize_parameters() str[source]
classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 5), return_fig: bool = False) Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None[source]
_compute_from_scratch(exclude_diagonal: bool = True) dict[source]
class vision_unlearning.benchmarks.I_care.ResultTemplateMetricSimilarityAlignmentMulti(/, **data: Any)[source]

Bases: ResultTemplate

Multi-input Single-output Regression Generalization of ResultTemplateMetricSimilarityAlignment (see also Appendix E, adapted from the multi-output setting). Also, the interpretability and feature engineering aspects are improved.

We consider a fixed model (m), task (t), and unlearning method (u), which are omitted for brevity.

The objective is to quantify whether interference between entities is aligned with their similarity, i.e., to what degree similar entities interfere more with each other.

For every ordered pair of distinct entities \(e_i, e_j \in t\) with \(i \neq j\), we observe several SimilarityBetweenEntities measures, indexed by superscripts \(\ell = 1, 2, \dots, |S|\), and a single MetricInterferencePerEntityPair target \(m_p(e_i, e_j)\).

Each ordered pair \((e_i, e_j)\) is therefore treated as one data point with feature vector

$$ \mathbf{X}_{ij} = \big( s^{(1)}(e_i, e_j), \dots, s^{(|S|)}(e_i, e_j) \big) $$

and scalar target

$$ Y_{ij} = m_p(e_i, e_j). $$

The resulting dataset is

$$ \mathcal{D} = \{ (\mathbf{X}_{ij}, Y_{ij}) \mid e_i, e_j \in t,\ i \neq j \}. $$

From this dataset, a regression model can be estimated using standard regression procedures with appropriate validation.

In the linear case,

$$ Y_{ij} = \beta_0 + \sum_{\ell=1}^{|S|} \beta_{\ell} X^{(\ell)}_{ij} + \varepsilon_{ij}. $$

Given a specific entity \(e_i\) whose removal is considered, similarities

$$ X^{(\ell)}_{ij} = s^{(\ell)}(e_i, e_j) $$

can be computed for all remaining entities \(e_j \in t\). The fitted model then yields predictions

$$ \hat{Y}_{ij} = f(\mathbf{X}_{ij}), $$

which approximate the expected interference on each receiver entity.

Furthermore, the concept of similarity may also encode several forms of practical data engineering. For example, one may define:

- a distinct similarity function for each attribute, or
- a similarity function based only on the attributes of the emitter entity.
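The linear case described above can be sketched in plain Python: build one data point per ordered pair of distinct entities, then fit an intercept and one coefficient per similarity function by ordinary least squares. This is an illustrative sketch on synthetic data, not the class's implementation; the function names and the nested-dict layout are assumptions, and the class itself may use a different regression backend and a train/test split.

```python
from typing import Dict, List, Tuple

PairwiseMatrix = Dict[str, Dict[str, float]]


def build_dataset(
    similarities: List[PairwiseMatrix], interference: PairwiseMatrix
) -> Tuple[List[List[float]], List[float]]:
    """One row per ordered pair (e_i, e_j) with i != j."""
    X: List[List[float]] = []
    Y: List[float] = []
    for e_i, row in interference.items():
        for e_j, y in row.items():
            if e_i == e_j:
                continue
            X.append([s[e_i][e_j] for s in similarities])
            Y.append(y)
    return X, Y


def fit_linear(X: List[List[float]], Y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [beta_0, beta_1, ...]."""
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    n = len(A[0])
    # Augmented matrix [A^T A | A^T Y].
    M = [
        [sum(a[r] * a[c] for a in A) for c in range(n)]
        + [sum(a[r] * y for a, y in zip(A, Y))]
        for r in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


# Synthetic example: interference = 0.5 + 2.0 * s1 - 1.0 * s2 (no noise).
s1 = {"a": {"b": 0.1, "c": 0.4}, "b": {"a": 0.2, "c": 0.9}, "c": {"a": 0.3, "b": 0.7}}
s2 = {"a": {"b": 0.5, "c": 0.1}, "b": {"a": 0.8, "c": 0.3}, "c": {"a": 0.6, "b": 0.2}}
interference = {
    e_i: {e_j: 0.5 + 2.0 * s1[e_i][e_j] - 1.0 * s2[e_i][e_j] for e_j in s1[e_i]}
    for e_i in s1
}
X, Y = build_dataset([s1, s2], interference)
beta = fit_linear(X, Y)
print([round(b, 6) for b in beta])
```

Because the synthetic targets are noise-free, the fitted coefficients recover the generating values up to floating-point error; real interference data would of course leave a residual.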

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm
interference_pair: vision_unlearning.benchmarks.I_care.configuration.type_mp
similarity_metric_list: List[vision_unlearning.benchmarks.I_care.configuration.type_s]
significance_threshold: float = 0.05
include_attribute_diff_similarity: bool = True
include_attribute_value_similarity: bool = True
regression_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_regression_algorithm = 'linear_regression'
random_state: int = 42
test_size: float = 0.3
_serialize_parameters() str[source]
_get_partial_path_local()[source]
classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 15), return_fig: bool = False) Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None[source]
_compute_from_scratch(exclude_diagonal: bool = True, entity_col: str = 'name') dict[source]
class vision_unlearning.benchmarks.I_care.ResultTemplateSignificantRelationshipNumerical(/, **data: Any)[source]

Bases: ResultTemplate

Measures whether two numerical attributes are significantly correlated.

Formalized in ap:rt_relationship.

Arguments: m, t, u, m_e, a.

Result: Pearson p-value, Spearman p-value, Pearson correlation, scatter plot.

Interpretation: qualitative; the researcher should decide if it is ethical or desirable that this attribute propagates interferences.

Pearson test

Use when you want to measure a linear relationship.

Assumptions:

  • Both variables are continuous

  • Relationship is linear

  • Bivariate normality (both jointly Gaussian)

  • Homoscedasticity (constant variance)

  • No strong outliers (very sensitive)

Detects: linear correlation only.

Fails when: relationship is monotonic but non-linear, or heavy outliers exist.

Spearman test

Use when you want to measure a monotonic relationship (not necessarily linear) or data is non-Gaussian. Assumptions:

  • Variables are at least ordinal

  • Relationship is monotonic (increasing or decreasing)

  • No distributional assumptions

  • Robust to outliers

Detects: any monotonic trend (linear or curved).

Fails when: the relationship is non-monotonic (e.g., U-shaped).
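The difference between the two tests can be illustrated with a small, dependency-free sketch. It computes only the correlation coefficients (not the p-values reported by this template), and the helper names `pearson`, `average_ranks`, and `spearman` are hypothetical, not part of the package. The data is a made-up monotonic but non-linear relationship.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: the metric grows monotonically but not linearly.
attribute = [float(v) for v in range(1, 11)]
metric = [v ** 3 for v in attribute]

r_pearson = pearson(attribute, metric)
r_spearman = spearman(attribute, metric)
print(f"Pearson: {r_pearson:.3f}, Spearman: {r_spearman:.3f}")
```

Because the relationship is perfectly monotonic, the rank-based coefficient reaches its maximum while the linear coefficient does not, which is why the two tests can disagree on the same attribute.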

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm
interference_entity: vision_unlearning.benchmarks.I_care.configuration.type_me
attribute: str
significance_threshold: float = 0.05
_get_data_path_remote() str[source]
classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 5), return_fig: bool = False) Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None[source]
_compute_from_scratch() dict[source]
class vision_unlearning.benchmarks.I_care.ResultTemplateSignificantRelationshipCategorical(/, **data: Any)[source]

Bases: ResultTemplate

Statistical significance of the differences in average MetricInterferencePerEntity across all entities, when the entities are grouped by each value of a categorical attribute.

Formalized in ap:rt_relationship.

Arguments: m, t, u, m_e, a, optional filterAttributeValue. Result: ANOVA p-value, Kruskal-Wallis p-value, average value of m_e grouped by each value of a, grouped boxplot. Interpretation: qualitative; similar to SignificantRelationshipNumerical. The optional argument filterAttributeValue restricts which emitter entities are included, allowing the analysis of interference flow distribution, such as whether politicians cause more interference to other politicians than artists cause to other artists.

ANOVA

Use when you want to test if group means differ across 3+ independent groups under parametric assumptions. Assumptions:

  • Dependent variable is continuous

  • Groups are independent

  • Normality within each group

  • Homoscedasticity (equal variances)

  • No strong outliers

Hypothesis:
  • H₀: all group means are equal

  • H₁: at least one mean differs

Detects: differences in means.

Fails when: heavy skew, unequal variances, or small n with non-Gaussian data.

Kruskal-Wallis

Use when you want to test if group distributions differ without parametric assumptions. Assumptions:

  • Dependent variable is ordinal or continuous

  • Groups are independent

  • Same shaped distributions (only medians should differ for clean interpretation)

  • No normality or equal-variance requirement

Hypothesis:
  • H₀: all group distributions are equal

  • H₁: at least one group differs

Detects: differences in medians / distributions.

Fails when: distributions differ in shape (the result is then ambiguous).
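As a rough illustration of why both tests are reported, the sketch below computes the two test statistics by hand (not their p-values) on made-up groups. The function names `anova_f` and `kruskal_h` are hypothetical, and the Kruskal-Wallis version assumes all values are distinct, so it applies no tie correction.

```python
from typing import List


def anova_f(groups: List[List[float]]) -> float:
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kruskal_h(groups: List[List[float]]) -> float:
    """Kruskal-Wallis H statistic (assumes distinct values, no tie correction)."""
    ordered = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(ordered)}
    n = len(ordered)
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical metric values for three categories of one attribute.
clean = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
# Same data, but the largest value is replaced by a strong outlier.
with_outlier = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 900.0]]

f_clean, h_clean = anova_f(clean), kruskal_h(clean)
f_outlier, h_outlier = anova_f(with_outlier), kruskal_h(with_outlier)
print(f"ANOVA F:          clean={f_clean:.2f}, outlier={f_outlier:.2f}")
print(f"Kruskal-Wallis H: clean={h_clean:.2f}, outlier={h_outlier:.2f}")
```

The outlier inflates the within-group variance and collapses the F statistic, while H depends only on ranks and is unchanged. This matches the "no strong outliers" assumption listed for ANOVA.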

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm
interference_entity: vision_unlearning.benchmarks.I_care.configuration.type_me
attribute: str
attribute_value: str | int | None = None
min_samples_per_category: int = 5
significance_threshold: float = 0.05
_get_data_path_remote() str[source]
classmethod plot(data: dict, extra_title: str = '', figsize: Tuple[int, int] = (6, 5), return_fig: bool = False) Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None[source]
_compute_from_scratch() dict[source]
class vision_unlearning.benchmarks.I_care.ResultTemplateCountSignificantRelationship(/, **data: Any)[source]

Bases: ResultTemplate

Number of significant relationships across all combinations of attributes and MetricInterferencePerEntity.

Arguments: m, t, u, list of m_e, list of a. Result: integer, list of significances. Interpretation: quantitative; the lower the better. Since the attributes for which it is ethical to propagate interference are constant across all models and methods, a higher value directly implies a higher number of ethical violations, that is, a larger number of “transmission wires” in a given task effectively used by this method and model.

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm_list: List[vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm]
interference_entity_list: List[vision_unlearning.benchmarks.I_care.configuration.type_me]
attribute_list: List[str]
top_n: int = 10
_serialize_parameters() str[source]
classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 5), return_fig: bool = False) Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None[source]
_compute_from_scratch() dict[source]
class vision_unlearning.benchmarks.I_care.ResultTemplateImplicitAssociationTest(/, **data: Any)[source]

Bases: ResultTemplate

Measures how the strength of automatic associations B between two pairs of entities changes after unlearning.

Arguments: m, t, u, a_1, a_2, l. Result: |a| x |a| real-valued tensor ΔB. Interpretation: qualitative; a human should decide whether it is ethical or desirable for the unlearning process to cause this change in implicit association between the chosen attributes.

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm
attribute_1: str
attribute_2: str
latent_embedding: vision_unlearning.benchmarks.I_care.configuration.type_l
class vision_unlearning.benchmarks.I_care.ResultTemplateMinimumCutInterference(/, **data: Any)[source]

Bases: ResultTemplate

Interprets a task as a directed weighted graph and computes the minimum cut separating two entities. By the max-flow min-cut theorem, the minimum cut is the smallest total influence whose removal eliminates every directed influence path from $e_1$ to $e_2$. Based on this, we conjecture that if we need to unlearn $e_1$ while minimizing harm to $e_2$, then the ideal intervention in the unlearning process is to increase the preservation of the emitter-side nodes. More intuitively, this intervention can be thought of as “blocking the interference path,” as is done in electrical circuits to protect sensitive components (for example, through ground partitioning or shielding traces).

Arguments: $m$, $t$, $u$, $e_1$, $e_2$, $m_p$. Result: list of entities (corresponding to the emitter-side nodes). Interpretation: qualitative; a small set of nodes through which most of the interference from $e_1$ propagates to $e_2$.
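The minimum-cut idea can be sketched on a toy graph. The example below uses a plain Edmonds-Karp max-flow followed by a residual-graph search; this is a generic textbook technique chosen for illustration, not necessarily what this template does internally. The function name `min_cut`, the entity names, and the edge weights are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]


def min_cut(weights: Graph, source: str, sink: str) -> Tuple[float, Set[str], List[Tuple[str, str]]]:
    """Return (cut value, nodes on the source side, edges crossing the cut)."""
    # Residual capacities, with a zero-capacity reverse edge for every edge.
    residual: Graph = {source: {}, sink: {}}
    for u, neighbours in weights.items():
        for v, w in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0.0) + w
            residual[v].setdefault(u, 0.0)

    def reachable_from_source() -> Dict[str, str]:
        # Breadth-first search over edges that still have residual capacity.
        parent = {source: source}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, capacity in residual[u].items():
                if capacity > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0.0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        # Walk back from the sink to recover the augmenting path.
        path, v = [], sink
        while v != source:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= bottleneck
            residual[b][a] += bottleneck
        flow += bottleneck

    source_side = set(parent)
    cut_edges = sorted(
        (u, v)
        for u in source_side
        for v, w in weights.get(u, {}).items()
        if v not in source_side and w > 0
    )
    return flow, source_side, cut_edges


# Hypothetical pairwise interference: weights[emitter][receiver].
interference = {
    "e1": {"a": 0.9, "b": 0.8},
    "a": {"e2": 0.2, "b": 0.1},
    "b": {"e2": 0.3},
}
cut_value, source_side, cut_edges = min_cut(interference, "e1", "e2")
emitter_side_nodes = sorted({u for u, _ in cut_edges})
print(cut_value, emitter_side_nodes, cut_edges)
```

In this toy graph the cheapest way to disconnect `e1` from `e2` is to cut the two edges entering `e2`, so the emitter-side nodes of the cut are the intermediate entities, which is where the conjecture above would suggest increasing preservation.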

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm
interference_pair: vision_unlearning.benchmarks.I_care.configuration.type_mp
entity_1: str
entity_2: str
class vision_unlearning.benchmarks.I_care.ResultTemplateUnlearningVisualSummary(/, **data: Any)[source]

Bases: ResultTemplate

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm
class vision_unlearning.benchmarks.I_care.ResultTemplateInterferenceVisualSummary(/, **data: Any)[source]

Bases: ResultTemplate

Compares generated images for 9 identities: the target, the 4 worst (excluding the target), and the 4 best.

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm
interference_pair: vision_unlearning.benchmarks.I_care.configuration.type_mp
entity: str | None = None
entity_index: int | None = None
seed: int = 42
images_max_dim: int = 124
_resolve_entity()[source]

Ensures both entity and entity_index are filled. Modifies the instance in place. At the end, both are set and consistent with each other.

_serialize_parameters() str[source]
classmethod plot(data: dict, figsize: Tuple[int, int] | None = (18, 4), return_fig: bool = False) Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None[source]
_compute_from_scratch()[source]
class vision_unlearning.benchmarks.I_care.ResultTemplateMatrix(/, **data: Any)[source]

Bases: ResultTemplate

metric_key_name: str
classmethod plot_make_title(data: dict) str[source]
Abstractmethod:

classmethod plot(data: dict, figsize: Tuple[int, int] | None = None, cmap: str = 'viridis', title: str = '', xlabel: str = 'Receiver entity', ylabel: str = 'Emitter entity', return_fig: bool = False) Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None[source]
class vision_unlearning.benchmarks.I_care.ResultTemplateInterferenceMatrix(/, **data: Any)[source]

Bases: ResultTemplateMatrix

MetricInterferencePerEntityPair between each possible combination of two entities within a task.

Arguments: m, t, u, m_p. Result: |t| x |t| real-valued tensor. Interpretation: qualitative; visual patterns may be spotted, especially when rearranging indices in a meaningful manner (for example, grouping professions together). Further quantitative values may be derived, such as the average value or the ratio between the diagonal-average value and the non-diagonal-average value.
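The derived quantities mentioned above (overall average and diagonal-to-off-diagonal ratio) can be sketched as follows. The helper name `diagonal_ratio` and the 3 x 3 matrix are hypothetical; rows are assumed to be emitter entities and columns receiver entities.

```python
from typing import List


def diagonal_ratio(matrix: List[List[float]]) -> float:
    """Ratio between the diagonal average and the off-diagonal average."""
    n = len(matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    off_diagonal = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return (sum(diagonal) / len(diagonal)) / (sum(off_diagonal) / len(off_diagonal))


# Hypothetical interference matrix for a task with three entities.
interference_matrix = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.2, 0.3, 0.7],
]

overall_average = sum(sum(row) for row in interference_matrix) / 9
ratio = diagonal_ratio(interference_matrix)
print(f"average={overall_average:.2f}, diagonal/off-diagonal ratio={ratio:.2f}")
```

A ratio well above 1 would indicate that the effect on the unlearned entity itself is much larger than the effect on other entities.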

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm
interference_pair: vision_unlearning.benchmarks.I_care.configuration.type_mp
metric_key_name: str = 'interference_pair'
_serialize_parameters() str[source]
classmethod plot_make_title(data: dict) str[source]
_compute_from_scratch()[source]
class vision_unlearning.benchmarks.I_care.ResultTemplateSimilarityMatrix(/, **data: Any)[source]

Bases: ResultTemplateMatrix

Similarities between each possible combination of two entities within a task.

  • Arguments: $m, t, s$

  • Result: $|t| \times |t|$ real-valued tensor

  • Interpretation: qualitative; visual patterns may be spotted, similarly to InterferenceMatrix.

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'scenes'
similarity_metric: vision_unlearning.benchmarks.I_care.configuration.type_s = 'clip'
metric_key_name: str = 'similarity_metric'
_serialize_parameters() str[source]
_get_partial_path_local()[source]
classmethod plot_make_title(data: dict) str[source]
_compute_from_scratch() dict[source]
class vision_unlearning.benchmarks.I_care.ResultTemplateMethodComparisonByMetricEntity(/, **data: Any)[source]

Bases: ResultTemplate

Compares the distribution of one MetricInterferencePerEntity across multiple unlearning methods.

  • Arguments: m, t, m_e, list of u

  • Result: per-method mean, median, std, n, values; box plot

  • Interpretation: whether lower or higher is better depends on the direction of m_e. Use to rank methods by a single interference-per-entity metric.

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'
task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'
interference_entity: vision_unlearning.benchmarks.I_care.configuration.type_me
unlearning_algorithm_list: List[vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm]
_serialize_parameters() str[source]
classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 5), return_fig: bool = False) Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None[source]
_compute_from_scratch() dict[source]
vision_unlearning.benchmarks.I_care.rt_name_to_class
vision_unlearning.benchmarks.I_care.rt_name_to_params
vision_unlearning.benchmarks.I_care.jacc_metric_score(entity_1: str, entity_2: str, metadata_filtered: List[Dict[str, Any]], entity_col: str = 'name') float[source]

Jaccard similarity between two entities, based on their attributes. Each attribute (column) contributes between 0 and 1 to the similarity. The types and ranges of the attributes are not known beforehand. For each attribute, both values for the two entities must be non-NaN and of the same type; otherwise that attribute is ignored (contribution 0). The contribution of each attribute is calculated as follows:

  • If the attribute is categorical (str or bool), the contribution is 1 if the two entities have the same value for that attribute, and 0 otherwise.

  • If the attribute is numerical, and both values are between 0 and 1, the contribution is 1 - abs(value_1 - value_2).

  • If the attribute is numerical, and both values are between 1 and 100, the contribution is 1 - abs(value_1 - value_2) / 100.

  • Otherwise, the contribution is 0 (it is not known how to handle the attribute, so it is ignored).
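The per-attribute rules can be sketched as below. This is an illustration under stated assumptions, not the package implementation: the names `attribute_contribution` and `similarity_sketch` are hypothetical, None is treated like NaN, and the contributions are combined by a plain average over the shared attributes, which the description above does not specify.

```python
import math
from typing import Any, Dict, List


def attribute_contribution(value_1: Any, value_2: Any) -> float:
    """Contribution of a single attribute, in [0, 1], following the rules above."""
    for value in (value_1, value_2):
        # Assumption: None is treated as a missing value, like NaN.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return 0.0
    if type(value_1) is not type(value_2):
        return 0.0
    # bool is checked here, before the numerical branch, since bool is an int subtype.
    if isinstance(value_1, (str, bool)):
        return 1.0 if value_1 == value_2 else 0.0
    if isinstance(value_1, (int, float)):
        if 0 <= value_1 <= 1 and 0 <= value_2 <= 1:
            return 1.0 - abs(value_1 - value_2)
        if 1 <= value_1 <= 100 and 1 <= value_2 <= 100:
            return 1.0 - abs(value_1 - value_2) / 100
    return 0.0


def similarity_sketch(
    entity_1: str,
    entity_2: str,
    metadata: List[Dict[str, Any]],
    entity_col: str = "name",
) -> float:
    """Average contribution over shared attributes (the averaging is an assumption)."""
    rows = {row[entity_col]: row for row in metadata}
    row_1, row_2 = rows[entity_1], rows[entity_2]
    attributes = [key for key in row_1 if key != entity_col and key in row_2]
    if not attributes:
        return 0.0
    total = sum(attribute_contribution(row_1[key], row_2[key]) for key in attributes)
    return total / len(attributes)


# Hypothetical metadata rows for two entities.
metadata = [
    {"name": "alice", "profession": "artist", "smiling": True, "age": 30, "fame": 0.75},
    {"name": "bob", "profession": "artist", "smiling": False, "age": 50, "fame": 0.25},
]
score = similarity_sketch("alice", "bob", metadata)
print(f"similarity: {score:.3f}")
```

Here the profession matches (1), the boolean differs (0), the ages contribute 0.8, and the fame scores contribute 0.5, so the averaged score reflects partial agreement.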

vision_unlearning.benchmarks.I_care.display_interesting_interferences(metadata_filtered: List[Dict[str, Any]], interference_per_pair: Dict[str, Dict[str, float]], index: int, task: Literal['scenes', 'objects', 'breeds', 'people'], method: Literal['munba', 'uce', 'distil'], num_train_epochs: int, metric: str, is_worst_biggest: bool, seed: int = 42, save_path: str | None = None) None[source]

Compares generated images for 9 identities: the target, the 4 worst (excluding the target), and the 4 best.

@param metadata_filtered: should be appropriate for this task (this is not verified inside the function)

@param interference_per_pair: should be appropriate for this task+index+method+num_train_epochs (this is not verified inside the function)

@param index: identifies the target

The combination of task+index+method+num_train_epochs identifies a unique unlearned model

vision_unlearning.benchmarks.I_care.analyze_relationship_regression(df: pandas.DataFrame, x: str, y: str, expected_positive: bool = True, plot: bool = True) bool[source]

Test linear relationship between two numerical variables with significance test and direction check.

Returns True only if:
  1. the slope is statistically significant (p < 0.05)

  2. the slope sign matches expectation.

vision_unlearning.benchmarks.I_care.analyze_relationship_category(df, metric: str, category: str, plot: bool = True) bool[source]
vision_unlearning.benchmarks.I_care.analyze_relationship_numerical(df: pandas.DataFrame, attribute: str, metric: str, plot: bool = False, plot_only_significant: bool = False) bool[source]

Analyzes the relationship between a numerical attribute and a numerical metric.

@param df: interference_per_entity; assumes df[attribute] and df[metric] are numerical

@param plot: whether to plot the results

@param plot_only_significant: whether to plot only significant relationships; only applies if plot=True

@return: whether any significant relationship was found

Pearson test

Use when you want to measure a linear relationship.

Assumptions:

  • Both variables are continuous

  • Relationship is linear

  • Bivariate normality (both jointly Gaussian)

  • Homoscedasticity (constant variance)

  • No strong outliers (very sensitive)

Detects: linear correlation only.

Fails when: the relationship is monotonic but non-linear, or heavy outliers exist.


Spearman test

Use when you want to measure a monotonic relationship (not necessarily linear) or data is non-Gaussian.

Assumptions:

  • Variables are at least ordinal

  • Relationship is monotonic (increasing or decreasing)

  • No distributional assumptions

  • Robust to outliers

Detects: any monotonic trend (linear or curved).

Fails when: the relationship is non-monotonic (e.g., U-shaped).

vision_unlearning.benchmarks.I_care.analyze_relationship_categorical(df: pandas.DataFrame, attribute: str, metric: str, plot: bool = False, plot_only_significant: bool = False, show_axhline: float | None = None, min_samples_per_category: int = 5, extra_title: str = '') bool[source]

Analyzes the relationship between a categorical attribute and a numerical metric.

@param df: interference_per_entity; assumes df[attribute] is categorical and df[metric] is numerical

@param plot: whether to plot the results

@param plot_only_significant: whether to plot only significant relationships; only applies if plot=True

@param show_axhline: if provided, shows a horizontal line at this y-value; only applies if plot=True

@return: whether any significant relationship was found


ANOVA (f_oneway)

Use when you want to test if group means differ across 3+ independent groups under parametric assumptions.

Assumptions:

  • Dependent variable is continuous

  • Groups are independent

  • Normality within each group

  • Homoscedasticity (equal variances)

  • No strong outliers

Hypothesis:

  • H₀: all group means are equal

  • H₁: at least one mean differs

Detects: differences in means.

Fails when: heavy skew, unequal variances, or small n with non-Gaussian data.


Kruskal-Wallis (kruskal)

Use when you want to test if group distributions differ without parametric assumptions.

Assumptions:

  • Dependent variable is ordinal or continuous

  • Groups are independent

  • Same shaped distributions (only medians should differ for clean interpretation)

  • No normality or equal-variance requirement

Hypothesis:

  • H₀: all group distributions are equal

  • H₁: at least one group differs

Detects: differences in medians / distributions.

Fails when: distributions differ in shape (the result is then ambiguous).

vision_unlearning.benchmarks.I_care.analyze_correlation_between_pairwise_metrics(df1: pandas.DataFrame, df2: pandas.DataFrame, metric1_name: str, metric2_name: str, exclude_diagonal: bool = True, plot=True, plot_only_significant=True) bool[source]

df1 and df2 are square DataFrames. Within each DataFrame the index and the columns contain the same labels, and both DataFrames share those labels.

vision_unlearning.benchmarks.I_care.check_eval_results(eval_results, name, threshold: float, operator: Literal['gt', 'lt']) float[source]

Checks whether the metric satisfies the expected threshold.