vision_unlearning.benchmarks.I_care.result_templates

Attributes

`shap`
`logger`
`rt_name_to_class`
`rt_name_to_params`

Classes

`ResultTemplate`	Abstract base class shared by all result templates in this module.
`ResultTemplateMetricMetricAlignment`	Measures how strongly two MetricInterferencePerEntity metrics are correlated.
`ResultTemplateMetricSimilarityAlignment`	To what degree similar entities interfere more with each other.
`ResultTemplateMetricSimilarityAlignmentMulti`	Multi-input Single-output Regression Generalization of ResultTemplateMetricSimilarityAlignment (see also Appendix E, adapted from the multi-output setting).
`ResultTemplateSignificantRelationshipNumerical`	Measures whether two numerical attributes are significantly correlated.
`ResultTemplateSignificantRelationshipCategorical`	Statistical significance of the average MetricInterferencePerEntity across all entities, when grouped by each value of the attribute.
`ResultTemplateCountSignificantRelationship`	Number of significant relationships across all combinations of attributes and MetricInterferencePerEntity.
`ResultTemplateImplicitAssociationTest`	Measures how the strength of automatic associations B between two pairs of entities changes after unlearning.
`ResultTemplateMinimumCutInterference`	Interprets a task as a directed weighted graph and computes the minimum cut separating two entities
`ResultTemplateUnlearningVisualSummary`
`ResultTemplateInterferenceVisualSummary`	Compares generated images for 9 identities: the target, the 4 worst (excluding the target), and the 4 best.
`ResultTemplateMatrix`	Abstract base class for matrix-valued result templates.
`ResultTemplateInterferenceMatrix`	MetricInterferencePerEntityPair between each possible combination of two entities within a task.
`ResultTemplateSimilarityMatrix`	Similarities between each possible combination of two entities within a task.
`ResultTemplateMethodComparisonByMetricEntity`	Compares the distribution of one MetricInterferencePerEntity across multiple unlearning methods.
`ResultTemplateEmbeddingUnlearningProfile`	Embedding-space profile of one unlearning event (task, method, entity).
`ResultTemplateEmbeddingForgettingEfficiency`	Embedding-space forgetting efficiency distribution for one (task, method).

Functions

`jacc_metric_score`(→ float)	Jaccard similarity between two entities, based on their attributes.
`display_interesting_interferences`(→ None)	Compares generated images for 9 identities: the target, the 4 worst (excluding the target), and the 4 best.
`analyze_relationship_regression`(→ bool)	Test linear relationship between two numerical variables with significance test
`analyze_relationship_category`(→ bool)
`analyze_relationship_numerical`(→ bool)	Analyzes the relationship between a numerical attribute and a numerical metric
`analyze_relationship_categorical`(→ bool)	Analyzes the relationship between a categorical attribute and a numerical metric
`analyze_correlation_between_pairwise_metrics`(→ bool)	df1 and df2 are square DataFrames; index and cols are the same within both and among both
`check_eval_results`(→ float)	Checks whether the metric satisfies the EXPECTED threshold.

Module Contents

vision_unlearning.benchmarks.I_care.result_templates.shap = None

vision_unlearning.benchmarks.I_care.result_templates.logger

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplate(/, **data: Any)

Bases: pydantic.BaseModel

!!! abstract “Usage Documentation”: [Models](../concepts/models.md)

A base class for creating Pydantic models.

__class_vars__: The names of the class variables defined on the model.

__private_attributes__: Metadata about the private attributes of the model.

__signature__: The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__: Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__: The core schema of the model.

__pydantic_custom_init__: Whether the model has a custom __init__ function.

__pydantic_decorators__: Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__: A dictionary containing metadata about generic Pydantic models. The origin and args items map to the [__origin__][genericalias.__origin__] and [__args__][genericalias.__args__] attributes of [generic aliases][types-genericalias], and the parameter item maps to the __parameter__ attribute of generic classes.

__pydantic_parent_namespace__: Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__: The name of the post-init method for the model, if defined.

__pydantic_root_model__: Whether the model is a [RootModel][pydantic.root_model.RootModel].

__pydantic_serializer__: The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__: The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_fields__: A dictionary of field names and their corresponding [FieldInfo][pydantic.fields.FieldInfo] objects.

__pydantic_computed_fields__: A dictionary of computed field names and their corresponding [ComputedFieldInfo][pydantic.fields.ComputedFieldInfo] objects.

__pydantic_extra__: A dictionary containing extra values, if [extra][pydantic.config.ConfigDict.extra] is set to ‘allow’.

__pydantic_fields_set__: The names of fields explicitly set during instantiation.

__pydantic_private__: Values of private attributes set on the model instance.

recompute_if_exists: bool = False

save_outputs: bool = True

upload_if_recomputed: bool = False

base_folder: str = 'assets'

remote_repository_name: str = 'LeonardoBenitez/VisionUnlearningEvaluationTestbeds'

abstract _serialize_parameters() → str

_get_data_path_remote() → str

_get_data_path_local() → str

classmethod _fig_to_bytes(fig: matplotlib.figure.Figure) → bytes

abstract _compute_from_scratch() → dict | list

compute() → dict

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateMetricMetricAlignment(/, **data: Any)

Bases: ResultTemplate

Measures how strongly two MetricInterferencePerEntity metrics are correlated.

Arguments: m, t, u, m_e1, m_e2.
Result: Pearson p-value, Spearman p-value, Pearson correlation, scatter plot.
Interpretation: quantitative; the higher the correlation, the lower the need to calculate both metrics for this specific choice of m, t, and u.

Extended use: Passing interference_entity_1="Forget clip diff" and interference_entity_2="Retain average clip diff" produces a forget/retain tradeoff scatter. The class method plot_multi_method() overlays results for several methods on one axes, enabling visual comparison of method operating regions (e.g. equalization verification and Pareto-style analysis).

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

interference_entity_1: vision_unlearning.benchmarks.I_care.configuration.type_me

interference_entity_2: vision_unlearning.benchmarks.I_care.configuration.type_me

significance_threshold: float = 0.05

_serialize_parameters() → str

classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 5), return_fig: bool = False, annotate_top_n: int = 5) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

Single-method scatter with regression line.

Top-N outliers (by absolute residual from the regression) are labelled with the entity name.
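A minimal sketch of how such outliers could be selected, assuming an ordinary least-squares line through the scatter. The function name `top_n_outliers` and the input layout (parallel lists of entity names, x values, and y values) are hypothetical; the source only states that outliers are ranked by absolute residual from the regression.

```python
from statistics import mean


def top_n_outliers(names, xs, ys, n=5):
    """Return the n entity names with the largest absolute regression residual."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Ordinary least-squares slope and intercept for y = slope * x + intercept.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Absolute vertical distance of each point from the fitted line.
    residuals = {
        name: abs(y - (slope * x + intercept))
        for name, x, y in zip(names, xs, ys)
    }
    return sorted(residuals, key=residuals.get, reverse=True)[:n]


# Hypothetical data: entity "c" sits far above the trend of the others.
names = ["a", "b", "c", "d", "e"]
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 9.0, 4.0, 5.0]
print(top_n_outliers(names, xs, ys, n=1))
```

Ranking by residual rather than by raw value keeps the labels on points that deviate from the trend, not on points that are merely large.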

classmethod plot_multi_method(method_data: Dict[str, dict], figsize: Tuple[int, int] = (7, 6), return_fig: bool = False, show_means: bool = True, annotate_top_n: int = 3) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

Overlay scatter for multiple methods on one plot.

Useful for visualising method operating regions (e.g. equalization verification, Pareto-style analysis).

Parameters:

method_data – Mapping from method name to the dict returned by compute().
show_means – If True, draw a diamond marker at the per-method centroid.
annotate_top_n – Number of per-method outliers (farthest from centroid) to annotate.
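The centroid and the "farthest from centroid" ranking described for show_means and annotate_top_n can be sketched as follows. This is an illustration only: the helper name `centroid_and_outliers` and the `{entity: (x, y)}` layout are assumptions, not the structure returned by compute().

```python
from math import dist
from statistics import mean


def centroid_and_outliers(points, n=3):
    """points maps entity name -> (x, y); returns (centroid, n farthest names)."""
    cx = mean(p[0] for p in points.values())
    cy = mean(p[1] for p in points.values())
    # Rank entities by Euclidean distance from the per-method centroid.
    ranked = sorted(
        points,
        key=lambda name: dist(points[name], (cx, cy)),
        reverse=True,
    )
    return (cx, cy), ranked[:n]


# Hypothetical per-entity (forget, retain) coordinates for one method.
points = {
    "a": (0.0, 0.0),
    "b": (2.0, 0.0),
    "c": (0.0, 2.0),
    "d": (2.0, 2.0),
    "e": (6.0, 6.0),
}
centroid, outliers = centroid_and_outliers(points, n=1)
print(centroid, outliers)
```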

_compute_from_scratch() → dict

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateMetricSimilarityAlignment(/, **data: Any)

Bases: ResultTemplate

To what degree similar entities interfere more with each other.

Formalized in ap:prediction, which also proposes its natural expansion to a multivariable and non-linear predictive regression.

Arguments: m, t, u, m_p, s.
Result: Pearson p-value, Spearman p-value, Pearson correlation, scatter plot.
Interpretation: quantitative; if this value is high, interference between two entities can be approximated by similarity (which is cheaper to compute for any new entity). Equivalently, the amount of “transmission wires” can be summarized by this single similarity function.
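As a rough illustration of the alignment measure, the sketch below flattens the off-diagonal entries of a similarity matrix and an interference matrix and computes their Pearson correlation. The matrices are invented, and the helper names are hypothetical. Skipping the diagonal mirrors the exclude_diagonal flag of _compute_from_scratch, but the actual implementation may differ.

```python
from math import sqrt


def off_diagonal(matrix):
    """Flatten a square matrix row by row, skipping entries with i == j."""
    return [v for i, row in enumerate(matrix) for j, v in enumerate(row) if i != j]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical 3-entity task: rows are emitters, columns are receivers.
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
interference = [
    [9.0, 0.7, 0.0],
    [0.6, 9.0, 0.2],
    [0.1, 0.2, 9.0],
]
r = pearson(off_diagonal(similarity), off_diagonal(interference))
print(round(r, 3))
```

The diagonal is dropped because self-interference (an entity forgetting itself) is typically much larger than cross-entity interference and would dominate the correlation.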

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

interference_pair: vision_unlearning.benchmarks.I_care.configuration.type_mp

similarity_metric: vision_unlearning.benchmarks.I_care.configuration.type_s

significance_threshold: float = 0.05

_serialize_parameters() → str

classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 5), return_fig: bool = False) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

_compute_from_scratch(exclude_diagonal: bool = True) → dict

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateMetricSimilarityAlignmentMulti(/, **data: Any)

Bases: ResultTemplate

Multi-input, single-output regression generalization of ResultTemplateMetricSimilarityAlignment (see also Appendix E, adapted from the multi-output setting), with improved interpretability and feature engineering.

—

We consider a fixed model (m), task (t), and unlearning method (u), which are omitted for brevity.

The objective is to quantify whether interference between entities is aligned with their similarity, i.e., to what degree similar entities interfere more with each other.

For every ordered pair of distinct entities $e_i, e_j \in t$ with $i \neq j$, we observe several SimilarityBetweenEntities measures, indexed by superscripts $\ell = 1, 2, \dots, |S|$, and a single MetricInterferencePerEntityPair target $m_p(e_i, e_j)$.

Each ordered pair $(e_i, e_j)$ is therefore treated as one data point with feature vector

$$ \mathbf{X}_{ij} = \big( s^{(1)}(e_i, e_j), \dots, s^{(|S|)}(e_i, e_j) \big) $$

and scalar target

$$ Y_{ij} = m_p(e_i, e_j). $$

The resulting dataset is

$$ \mathcal{D} = \{ (\mathbf{X}_{ij}, Y_{ij}) \mid e_i, e_j \in t,\ i \neq j \}. $$

From this dataset, a regression model can be estimated using standard regression procedures with appropriate validation.

In the linear case,

$$ Y_{ij} = \beta_0 + \sum_{\ell=1}^{|S|} \beta_{\ell} X^{(\ell)}_{ij} + \varepsilon_{ij}. $$

Given a specific entity $e_i$ whose removal is considered, similarities

$$ X^{(\ell)}_{ij} = s^{(\ell)}(e_i, e_j) $$

can be computed for all remaining entities $e_j \in t$. The fitted model then yields predictions

$$ \hat{Y}_{ij} = f(\mathbf{X}_{ij}), $$

which approximate the expected interference on each receiver entity.

Furthermore, the concept of similarity may also encode several forms of practical data engineering. For example, one may define:

- a distinct similarity function for each attribute, or
- a similarity function based only on the attributes of the emitter entity.
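The dataset construction and the linear case can be sketched in pure Python as below. This is a toy illustration under stated assumptions, not the class's implementation. The entities, both similarity functions, and the synthetic target are invented, and the helpers `build_dataset`, `fit_linear`, and `predict` are hypothetical names. A real run would more likely use a regression library with a train/test split, as the random_state and test_size fields suggest.

```python
def build_dataset(entities, similarity_fns, interference_fn):
    """Build (X, Y) with one row per ordered pair (e_i, e_j), i != j."""
    X, Y = [], []
    for e_i in entities:
        for e_j in entities:
            if e_i == e_j:
                continue  # the diagonal is excluded
            X.append([s(e_i, e_j) for s in similarity_fns])
            Y.append(interference_fn(e_i, e_j))
    return X, Y


def fit_linear(X, Y):
    """Least-squares coefficients [beta_0, beta_1, ...] via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # leading 1.0 is the intercept column
    k = len(A[0])
    # Augmented system [A^T A | A^T Y].
    M = [
        [sum(a[r] * a[c] for a in A) for c in range(k)]
        + [sum(a[r] * y for a, y in zip(A, Y))]
        for r in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][k] / M[r][r] for r in range(k)]


def predict(beta, x):
    """Predicted interference for one feature vector x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def pair_similarity(a, b):
    return 1.0 / (1.0 + abs(a - b))  # made-up pairwise similarity


def emitter_only(a, b):
    return a / 10.0  # made-up feature that depends on the emitter only


def synthetic_interference(a, b):
    # Synthetic target that is exactly linear in the two features.
    return 0.5 + 2.0 * pair_similarity(a, b) - 1.0 * emitter_only(a, b)


entities = [0, 1, 2, 3]
X, Y = build_dataset(entities, [pair_similarity, emitter_only], synthetic_interference)
beta = fit_linear(X, Y)
print(len(X), [round(b, 6) for b in beta])
```

Because the target is constructed to be exactly linear, the fitted coefficients recover the generating values, which makes the sketch easy to check by hand.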

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

interference_pair: vision_unlearning.benchmarks.I_care.configuration.type_mp

similarity_metric_list: List[vision_unlearning.benchmarks.I_care.configuration.type_s]

significance_threshold: float = 0.05

include_attribute_diff_similarity: bool = True

include_attribute_value_similarity: bool = True

regression_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_regression_algorithm = 'linear_regression'

random_state: int = 42

test_size: float = 0.3

_serialize_parameters() → str

_get_partial_path_local()

classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 15), return_fig: bool = False) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

_compute_from_scratch(exclude_diagonal: bool = True, entity_col: str = 'name') → dict

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateSignificantRelationshipNumerical(/, **data: Any)

Bases: ResultTemplate

Measures whether two numerical attributes are significantly correlated.

Formalized in ap:rt_relationship.

Arguments: m, t, u, m_e, a.
Result: Pearson p-value, Spearman p-value, Pearson correlation, scatter plot.
Interpretation: qualitative; the researcher should decide if it is ethical or desirable that this attribute propagates interferences.

Pearson test

Use when you want to measure a linear relationship. Assumptions:

Both variables are continuous

Relationship is linear

Bivariate normality (both jointly Gaussian)

Homoscedasticity (constant variance)

No strong outliers (very sensitive)

Detects: linear correlation only.
Fails when: the relationship is monotonic but non-linear, or heavy outliers exist.

Spearman test

Use when you want to measure a monotonic relationship (not necessarily linear) or data is non-Gaussian. Assumptions:

Variables are at least ordinal

Relationship is monotonic (increasing or decreasing)

No distributional assumptions

Robust to outliers

Detects: any monotonic trend (linear or curved).
Fails when: the relationship is non-monotonic (e.g., U-shaped).
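The difference between the two tests can be seen on a monotonic but strongly non-linear relationship. The sketch below computes both coefficients by hand on invented data (Spearman as the Pearson correlation of ranks, assuming no ties). In practice a statistics library such as SciPy (`scipy.stats.pearsonr`, `scipy.stats.spearmanr`) would also supply the p-values. Whether this module uses those functions is not stated in the source.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; assumes there are no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Monotonic but strongly non-linear relationship.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 5 for x in xs]
print(round(pearson(xs, ys), 3), round(spearman(xs, ys), 3))
```

Spearman sees a perfect monotonic trend, while Pearson is pulled below 1 by the curvature.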

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

interference_entity: vision_unlearning.benchmarks.I_care.configuration.type_me

attribute: str

significance_threshold: float = 0.05

_get_data_path_remote() → str

classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 5), return_fig: bool = False) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

_compute_from_scratch() → dict

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateSignificantRelationshipCategorical(/, **data: Any)

Bases: ResultTemplate

Statistical significance of the average MetricInterferencePerEntity across all entities, when entities are grouped by each value of the attribute.

Formalized in ap:rt_relationship.

Arguments: m, t, u, m_e, a, optional filterAttributeValue.
Result: ANOVA p-value, Kruskal-Wallis p-value, average value of m_e grouped by each value of a, grouped boxplot.
Interpretation: qualitative; similar to SignificantRelationshipNumerical.

The optional argument filterAttributeValue restricts which emitter entities are included, allowing the analysis of interference flow distribution, such as whether politicians cause more interference to other politicians than artists cause to other artists.

ANOVA

Use when you want to test if group means differ across 3+ independent groups under parametric assumptions. Assumptions:

Dependent variable is continuous

Groups are independent

Normality within each group

Homoscedasticity (equal variances)

No strong outliers

Hypothesis:

H₀: all group means are equal
H₁: at least one mean differs

Detects: differences in means.
Fails when: heavy skew, unequal variances, small n with non-Gaussian data.
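For reference, the one-way ANOVA F statistic behind this test is the ratio of between-group to within-group mean squares. The sketch below computes only the statistic on invented groups. The helper name `anova_f` is hypothetical, and turning F into a p-value needs the F distribution (for example via `scipy.stats.f_oneway`, which may or may not be what this module calls).

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (lists of floats)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical per-entity metric values for three attribute categories.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = anova_f(groups)
print(f_stat)
```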

Kruskal-Wallis

Use when you want to test if group distributions differ without parametric assumptions. Assumptions:

Dependent variable is ordinal or continuous

Groups are independent

Same shaped distributions (only medians should differ for clean interpretation)

No normality or equal-variance requirement

Hypothesis:

H₀: all group distributions are equal
H₁: at least one group differs

Detects: differences in medians / distributions.
Fails when: distributions differ in shape (then the result is ambiguous).
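The Kruskal-Wallis H statistic replaces the raw values with their pooled ranks. A minimal sketch, assuming average ranks for ties and omitting the tie-correction factor that library implementations (for example `scipy.stats.kruskal`) apply; the helper names and the data are invented for illustration.

```python
def average_ranks(values):
    """1-based ranks of the pooled values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical, fully separated groups.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_stat = kruskal_h(groups)
print(round(h_stat, 3))
```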

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

interference_entity: vision_unlearning.benchmarks.I_care.configuration.type_me

attribute: str

attribute_value: str | int | None = None

min_samples_per_category: int = 5

significance_threshold: float = 0.05

_get_data_path_remote() → str

classmethod plot(data: dict, extra_title: str = '', figsize: Tuple[int, int] = (6, 5), return_fig: bool = False) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

_compute_from_scratch() → dict

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateCountSignificantRelationship(/, **data: Any)

Bases: ResultTemplate

Number of significant relationships across all combinations of attributes and MetricInterferencePerEntity.

Arguments: m, t, u, list of m_e, list of a.
Result: integer, list of significances.
Interpretation: quantitative; the lower the better. Since the attributes for which it is ethical to propagate interference are constant across all models and methods, a higher value directly implies a higher number of ethical violations, that is, a larger number of “transmission wires” in a given task effectively used by this method and model.
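Counting across all combinations amounts to a Cartesian product over methods, metrics, and attributes. The sketch below assumes the p-values have already been computed and stored in a dictionary keyed by `(method, metric, attribute)`. That layout, the 0.05 threshold, and every name in the example are hypothetical.

```python
from itertools import product


def count_significant(p_values, methods, metrics, attributes, threshold=0.05):
    """Return (count, list of significant (method, metric, attribute) triples)."""
    significant = [
        combo
        for combo in product(methods, metrics, attributes)
        if p_values[combo] < threshold
    ]
    return len(significant), significant


# Hypothetical p-values for 1 method x 2 metrics x 2 attributes.
p_values = {
    ("method_a", "metric_1", "age"): 0.01,
    ("method_a", "metric_1", "profession"): 0.40,
    ("method_a", "metric_2", "age"): 0.03,
    ("method_a", "metric_2", "profession"): 0.20,
}
count, hits = count_significant(
    p_values, ["method_a"], ["metric_1", "metric_2"], ["age", "profession"]
)
print(count, hits)
```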

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm_list: List[vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm]

interference_entity_list: List[vision_unlearning.benchmarks.I_care.configuration.type_me]

attribute_list: List[str]

top_n: int = 10

_serialize_parameters() → str

classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 5), return_fig: bool = False) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

_compute_from_scratch() → dict

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateImplicitAssociationTest(/, **data: Any)

Bases: ResultTemplate

Measures how the strength of automatic associations B between two pairs of entities changes after unlearning.

Arguments: m, t, u, a_1, a_2, l.
Result: |a| x |a| real-valued tensor ΔB.
Interpretation: qualitative; a human should decide whether it is ethical or desirable for the unlearning process to cause this change in implicit association between the chosen attributes.

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

attribute_1: str

attribute_2: str

latent_embedding: vision_unlearning.benchmarks.I_care.configuration.type_l

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateMinimumCutInterference(/, **data: Any)

Bases: ResultTemplate

Interprets a task as a directed weighted graph and computes the minimum cut separating two entities.

As a consequence of the max-flow min-cut theorem, the minimum cut is the smallest total influence whose removal eliminates every directed influence path from $e_1$ to $e_2$. Based on this, we conjecture that if we need to unlearn $e_1$ while minimizing harm to $e_2$, then the ideal intervention in the unlearning process is to increase the preservation of the emitter-side nodes. More intuitively, we can think of this intervention as “blocking the interference path,” as is done in electrical circuits to protect sensitive components (through ground partitioning and shielding traces, among other techniques).

Arguments: $m$, $t$, $u$, $e_1$, $e_2$, $m_p$.
Result: list of entities (corresponding to the emitter-side nodes).
Interpretation: qualitative; a small set of nodes through which most of the interference from $e_1$ propagates to $e_2$.
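A minimal sketch of the underlying graph computation, using the Edmonds-Karp max-flow algorithm on a small invented graph. The function name, the dict-of-dicts capacity layout, and the edge weights are assumptions for illustration; the source does not say which algorithm or library the template uses. The emitter-side nodes are those still reachable from the source in the residual graph once no augmenting path remains.

```python
from collections import deque


def min_cut_source_side(capacity, source, sink):
    """Return (max_flow, nodes on the source side of a minimum cut)."""
    # Residual capacities, including zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)

    def reachable_with_parents():
        # Breadth-first search over edges with remaining capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0.0
    while True:
        parents = reachable_with_parents()
        if sink not in parents:
            return flow, set(parents)
        # Walk back from the sink to collect the augmenting path.
        path, v = [], sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


# Hypothetical interference graph: weights are pairwise interference values.
capacity = {
    "e1": {"a": 3.0, "b": 2.0},
    "a": {"e2": 1.0, "b": 1.0},
    "b": {"e2": 4.0},
}
flow, emitter_side = min_cut_source_side(capacity, "e1", "e2")
print(flow, sorted(emitter_side))
```

In this toy graph the cut edges are a→e2, a→b, and e1→b, whose weights sum to the maximum flow, as the max-flow min-cut theorem requires.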

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

interference_pair: vision_unlearning.benchmarks.I_care.configuration.type_mp

entity_1: str

entity_2: str

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateUnlearningVisualSummary(/, **data: Any)

Bases: ResultTemplate

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateInterferenceVisualSummary(/, **data: Any)

Bases: ResultTemplate

Compares generated images for 9 identities: the target, the 4 worst (excluding the target), and the 4 best.

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

interference_pair: vision_unlearning.benchmarks.I_care.configuration.type_mp

entity: str | None = None

entity_index: int | None = None

seed: int = 42

images_max_dim: int = 124

_resolve_entity(): Ensures both entity andentity_index are filled. Modifies in place At the end, both are set and consistent with each other

_serialize_parameters() → str

classmethod plot(data: dict, figsize: Tuple[int, int] | None = (18, 4), return_fig: bool = False) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

_compute_from_scratch()

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateMatrix(/, **data: Any)

Bases: ResultTemplate

metric_key_name: str

abstract classmethod plot_make_title(data: dict) → str

classmethod plot(data: dict, figsize: Tuple[float, float] | None = None, cmap: str = 'viridis', title: str = '', xlabel: str = 'Receiver entity', ylabel: str = 'Emitter entity', return_fig: bool = False) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateInterferenceMatrix(/, **data: Any)

Bases: ResultTemplateMatrix

MetricInterferencePerEntityPair between each possible combination of two entities within a task.

Arguments: m, t, u, m_p.
Result: |t| x |t| real-valued tensor.
Interpretation: qualitative; visual patterns may be spotted, especially when rearranging indices in a meaningful manner (for example, grouping professions together). Further quantitative values may be derived, such as the average value or the ratio between the diagonal-average value and the non-diagonal-average value.
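The derived ratio mentioned above can be computed directly from the matrix. A small sketch on an invented 3 x 3 matrix; the helper name `diagonal_ratio` is hypothetical, and rows are assumed to be emitters and columns receivers.

```python
def diagonal_ratio(matrix):
    """Ratio of the diagonal average to the off-diagonal average of a square matrix."""
    n = len(matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    off_diagonal = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return (sum(diagonal) / len(diagonal)) / (sum(off_diagonal) / len(off_diagonal))


# Hypothetical interference matrix with strong self-interference.
matrix = [
    [4.0, 1.0, 1.0],
    [1.0, 4.0, 1.0],
    [1.0, 1.0, 4.0],
]
ratio = diagonal_ratio(matrix)
print(ratio)
```

A large ratio indicates that unlearning affects the targeted entity far more than the others; a ratio near 1 indicates that interference is spread evenly.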

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

interference_pair: vision_unlearning.benchmarks.I_care.configuration.type_mp

metric_key_name: str = 'interference_pair'

_serialize_parameters() → str

classmethod plot_make_title(data: dict) → str

_compute_from_scratch()

vision_unlearning.benchmarks.I_care.result_templates.jacc_metric_score(entity_1: str, entity_2: str, metadata_filtered: List[Dict[str, Any]], entity_col: str = 'name') → float

Jaccard similarity between two entities, based on their attributes.

Each attribute (column) contributes between 0 and 1 to the similarity. The types and ranges of the attributes are not known beforehand. For each attribute, both entities' values must be non-NaN and of the same type; otherwise the attribute is ignored (contribution 0). The contribution of each attribute is calculated as follows:

* If the attribute is categorical (str or bool), the contribution is 1 if the two entities have the same value for that attribute, and 0 otherwise.
* If the attribute is numerical and both values are between 0 and 1, the contribution is 1 - abs(value_1 - value_2).
* If the attribute is numerical and both values are between 1 and 100, the contribution is 1 - abs(value_1 - value_2) / 100.
* Otherwise, the contribution is 0 (the attribute cannot be handled, so it is ignored).
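The per-attribute rules can be sketched as below. This is an illustration, not the library's code: the helper names and the metadata are invented, and averaging the contributions over all attribute columns is an assumption, since the docstring does not state how contributions are aggregated.

```python
import math


def attribute_contribution(v1, v2):
    """Contribution of one attribute to the similarity, in [0, 1]."""
    # Missing values or mismatched types: the attribute is ignored.
    if v1 is None or v2 is None or type(v1) is not type(v2):
        return 0.0
    if isinstance(v1, float) and (math.isnan(v1) or math.isnan(v2)):
        return 0.0
    # Categorical attributes. bool is checked before the numeric branch
    # because bool is a subclass of int in Python.
    if isinstance(v1, (str, bool)):
        return 1.0 if v1 == v2 else 0.0
    if isinstance(v1, (int, float)):
        if 0 <= v1 <= 1 and 0 <= v2 <= 1:
            return 1.0 - abs(v1 - v2)
        if 1 <= v1 <= 100 and 1 <= v2 <= 100:
            return 1.0 - abs(v1 - v2) / 100
    return 0.0  # unknown type or range


def similarity_sketch(entity_1, entity_2, metadata, entity_col="name"):
    """ASSUMPTION: contributions are averaged over all attribute columns."""
    rows = {row[entity_col]: row for row in metadata}
    a, b = rows[entity_1], rows[entity_2]
    attributes = [key for key in a if key != entity_col]
    total = sum(attribute_contribution(a[key], b.get(key)) for key in attributes)
    return total / len(attributes)


# Hypothetical metadata for two entities.
metadata = [
    {"name": "alice", "profession": "artist", "age": 30, "smiling": True},
    {"name": "bob", "profession": "artist", "age": 50, "smiling": False},
]
score = similarity_sketch("alice", "bob", metadata)
print(round(score, 3))
```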

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateSimilarityMatrix(/, **data: Any)

Bases: ResultTemplateMatrix

Similarities between each possible combination of two entities within a task.

* Arguments: $m, t, s$
* Result: $|t| \times |t|$ real-valued tensor
* Interpretation: qualitative; visual patterns may be spotted, similarly to InterferenceMatrix.

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'scenes'

similarity_metric: vision_unlearning.benchmarks.I_care.configuration.type_s = 'clip'

metric_key_name: str = 'similarity_metric'

_serialize_parameters() → str

_get_partial_path_local()

classmethod plot_make_title(data: dict) → str

_compute_from_scratch() → dict

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateMethodComparisonByMetricEntity(/, **data: Any)

Bases: ResultTemplate

Compares the distribution of one MetricInterferencePerEntity across multiple unlearning methods.

Arguments: m, t, me, list of u
Result: per-method mean, median, std, n, values; box plot
Interpretation: lower or higher depending on me direction. Use to rank methods by a single interference-per-entity metric.
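The per-method summary can be sketched with the standard library. The method names and values below are invented, and using the sample standard deviation is an assumption; the source does not say which variant is reported.

```python
from statistics import mean, median, stdev


def summarize_by_method(values_by_method):
    """Per-method mean, median, std, n, and raw values of one per-entity metric."""
    summary = {}
    for method, values in values_by_method.items():
        summary[method] = {
            "mean": mean(values),
            "median": median(values),
            "std": stdev(values),  # ASSUMPTION: sample standard deviation
            "n": len(values),
            "values": list(values),
        }
    return summary


# Hypothetical per-entity metric values for two unlearning methods.
values_by_method = {
    "method_a": [0.10, 0.20, 0.30, 0.40],
    "method_b": [0.50, 0.55, 0.60, 0.65],
}
summary = summarize_by_method(values_by_method)
for method, stats in summary.items():
    print(method, round(stats["mean"], 3), round(stats["median"], 3), stats["n"])
```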

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

interference_entity: vision_unlearning.benchmarks.I_care.configuration.type_me

unlearning_algorithm_list: List[vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm]

_serialize_parameters() → str

classmethod plot(data: dict, figsize: Tuple[int, int] = (6, 5), return_fig: bool = False) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

_compute_from_scratch() → dict

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateEmbeddingUnlearningProfile(/, **data: Any)

Bases: ResultTemplate

Embedding-space profile of one unlearning event (task, method, entity).

For the specified forgotten entity, shows how all 100 entity embeddings shift between the baseline model (LoRA-OFF) and the model that forgot this entity (LoRA-ON). Quantifies whether the forgetting was targeted or diffuse in embedding space.

Arguments: model, task, unlearning_algorithm, entity.

Result:

- PCA scatter (2-D) of all 100 entity mean embeddings. Baseline positions are shown as open circles; unlearned positions as filled dots. The forgotten entity is highlighted with a star; an arrow marks its displacement. Points are coloured by the entity’s self-interference (clip_diff) so that collateral damage is immediately visible.
- Numeric summary: self-displacement magnitude (L2 norm), mean retained displacement, embedding_specificity_ratio (directional specificity, cosine-distance of self-displacement vs mean retained-entity displacement; same metric stored in the InterferencePerEntity (Me) for this task).

Metric note (directional vs. magnitude): The embedding_specificity_ratio uses cosine distance and therefore captures the direction of embedding change, not its magnitude. A ratio > 1 means the forgotten entity’s embedding shifts in a more novel direction than the average retained entity — this is directional specificity. This is distinct from an L2-based magnitude specificity (which would ask whether the shift is larger in absolute terms). The displacement bars on the right plot use L2 norm; the specificity ratio shown in the title uses cosine distance.
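The difference between the directional ratio and an L2 magnitude can be made concrete with a small sketch. The formula below is an assumption inferred from the description above (self cosine distance divided by mean retained cosine distance); the helper names, entity keys, and toy vectors are hypothetical.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def specificity_ratio(baseline, unlearned, target):
    """Assumed definition: self cosine distance / mean retained cosine distance."""
    self_dist = cosine_distance(baseline[target], unlearned[target])
    retained = [
        cosine_distance(baseline[e], unlearned[e]) for e in baseline if e != target
    ]
    return self_dist / float(np.mean(retained))

# Toy 2-D mean embeddings before (baseline) and after (unlearned) forgetting "target".
baseline = {
    "target": np.array([1.0, 0.0]),
    "retained_1": np.array([1.0, 0.0]),
    "retained_2": np.array([0.0, 1.0]),
}
unlearned = {
    "target": np.array([0.0, 1.0]),      # rotated by 90 degrees
    "retained_1": np.array([1.0, 1.0]),  # rotated by 45 degrees
    "retained_2": np.array([1.0, 1.0]),  # rotated by 45 degrees
}

ratio = specificity_ratio(baseline, unlearned, "target")
l2_self = float(np.linalg.norm(unlearned["target"] - baseline["target"]))
print(ratio, l2_self)
```

In this toy case the ratio is above 1 because the target rotates further than the retained entities, while the L2 value only reports how far the target moved in absolute terms.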

Provenance field: each result includes ratio_source (“ipe” when the ratio was read from the InterferencePerEntity (Me) for this task, “inline” when it was computed from the embedding files directly because the IPE column was absent). “ipe” is the canonical value; “inline” is a transitional fallback.

Interpretation:
- Specificity ratio >> 1 and large self-displacement → targeted forgetting.
- Specificity ratio ~ 1 or low self-displacement → the method caused broad embedding drift without isolating the forgotten entity.
- Compare with the image-level clip_diff in the scatter colours to detect the concealment pattern (embedding moves, image stays similar).

Relationship to other RTs:
- embedding_specificity_ratio belongs to type_me / domain_me, so it can be passed to MetricMetricAlignment and MethodComparisonByMetricEntity like any other per-entity metric.
- For cross-entity summaries, see ResultTemplateEmbeddingForgettingEfficiency.
- The “pinpoint-ness” concept aligns with the Holistic Unlearning Benchmark (ICCV 2025) definition of targeted forgetting.

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

entity: str

n_pca_components: int = 2

_serialize_parameters() → str

_resolve_hf_entity() → str: Return the HF-compatible entity name used in embedding file names.

_get_baseline_embedding_path() → str

_get_entity_embedding_path() → str

static _mean_embeddings(raw: dict) → Dict[str, np.ndarray]: Group embedding records by prompted_entity and compute mean per entity.
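Grouping embedding records by entity and averaging them might look like the sketch below. The layout of `raw` is not documented here, so the structure used (a dict of records, each with `prompted_entity` and `embedding` keys) is purely an assumption, as are the entity names.

```python
from collections import defaultdict

import numpy as np

def mean_embeddings(raw):
    """Average all embedding vectors that share the same prompted_entity."""
    groups = defaultdict(list)
    for record in raw.values():
        vector = np.asarray(record["embedding"], dtype=float)
        groups[record["prompted_entity"]].append(vector)
    return {entity: np.mean(vectors, axis=0) for entity, vectors in groups.items()}

# Hypothetical records: two images prompted with "alice", one with "bob".
raw = {
    "img_0": {"prompted_entity": "alice", "embedding": [1.0, 0.0]},
    "img_1": {"prompted_entity": "alice", "embedding": [3.0, 2.0]},
    "img_2": {"prompted_entity": "bob", "embedding": [0.0, 4.0]},
}
means = mean_embeddings(raw)
```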

static _cosine_distance(a: numpy.ndarray, b: numpy.ndarray) → float

classmethod plot(data: dict, figsize: Tuple[int, int] = (12, 5), return_fig: bool = False) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

_compute_from_scratch() → dict

class vision_unlearning.benchmarks.I_care.result_templates.ResultTemplateEmbeddingForgettingEfficiency(/, **data: Any)

Bases: ResultTemplate

Embedding-space forgetting efficiency distribution for one (task, method).

Reads embedding_specificity_ratio (cosine-distance self-displacement vs. mean retained-entity displacement) from the InterferencePerEntity (Me) for this task. This RT aggregates that pre-computed metric across all entities in the task and correlates it with the image-level forgetting signal (clip_diff).

Arguments: model, task, unlearning_algorithm.

Prerequisites: The InterferencePerEntity (Me) must exist and must contain the embedding_specificity_ratio column for the requested method. Run “4. Compute interference per entity.py” first if it is missing.

Result:
- Bar chart of embedding_specificity_ratio per entity, sorted descending; dashed line at ratio = 1 (no specificity).
- Scatter of embedding_specificity_ratio vs. self-clip_diff per entity, with Spearman correlation and a permutation test (n_permutations resamples; parametric t-tests are invalid here because embedding vectors from the same model are correlated by architecture and data).
- Numeric summary: n_total (all entities in task), n_valid (entities with non-NaN ratio, typically those for which interference_per_pair files were available), mean/std of ratio, fraction of entities with ratio > 1 among valid entities, Spearman r between ratio and self-clip_diff, permutation p-value.
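A Spearman correlation with a permutation p-value can be sketched as below. This is one common way to build such a test and is not necessarily how this template implements it; the function name, the two-sided criterion, and the synthetic data are assumptions.

```python
import numpy as np
from scipy.stats import rankdata

def spearman_permutation_test(x, y, n_permutations=10000, seed=0):
    """Spearman r plus a two-sided permutation p-value."""
    rx, ry = rankdata(x), rankdata(y)
    observed = np.corrcoef(rx, ry)[0, 1]  # Spearman r = Pearson r of the ranks
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the ranks of y breaks any real association with x.
        permuted = np.corrcoef(rx, rng.permutation(ry))[0, 1]
        if abs(permuted) >= abs(observed) - 1e-12:
            hits += 1
    # The +1 terms count the observed arrangement, so p is never exactly 0.
    p_value = (hits + 1) / (n_permutations + 1)
    return float(observed), p_value

# Synthetic stand-ins for per-entity ratio and self-clip_diff (19 entities).
ratio = np.linspace(0.5, 2.5, 19)
clip_diff = ratio ** 3  # monotonic by construction
rho, p_value = spearman_permutation_test(ratio, clip_diff, n_permutations=999)
```

Because the null distribution is built by shuffling the observed values, the test makes no normality assumption about either variable.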

Metric note (directional vs. magnitude): embedding_specificity_ratio uses cosine distance (directional specificity). A ratio > 1 means the forgotten entity shifts in a more novel direction than the average retained entity. This is distinct from an L2-based magnitude ratio. Both numerator (self cosine distance) and denominator (mean retained cosine distance) are stored separately so a reader can distinguish “ratio is low because target barely moves” from “ratio is low because retained entities move MORE”.

Important caveat on n_valid: n_valid is typically far smaller than n_total because embedding_specificity_ratio requires interference_per_pair files for each entity. Results from a small n_valid (e.g. 19/100) are underpowered and should be treated as preliminary. The permutation test p-values are reported with n_valid in the title for transparency.

Interpretation:
- A method with most ratios >> 1 surgically targets each forgotten entity in embedding space without disturbing retained embeddings.
- A high Spearman r (ratio vs. clip_diff) means embedding-space specificity and image-level forgetting agree: the method is consistently targeted at both levels. For UCE our data show r ≈ -0.14 (not significant) whereas for distil r ≈ -0.12 (not significant at n_valid=19): the two signals decouple for UCE, consistent with the concealment hypothesis (Sharma et al., arXiv 2409.05668).

Relationship to other RTs:
- For per-entity detail, see ResultTemplateEmbeddingUnlearningProfile.
- embedding_specificity_ratio belongs to type_me and domain_me, so it can be passed to MetricMetricAlignment and MethodComparisonByMetricEntity like any other per-entity metric.

References:
- concealment: “Sharma et al., arXiv 2409.05668”
- pinpoint: “Holistic Unlearning Benchmark (ICCV 2025)”

model: vision_unlearning.benchmarks.I_care.configuration.type_model = 'sd1.4'

task: vision_unlearning.benchmarks.I_care.configuration.type_task = 'people'

unlearning_algorithm: vision_unlearning.benchmarks.I_care.configuration.type_unlearning_algorithm

n_permutations: int = 10000

significance_threshold: float = 0.05

_serialize_parameters() → str

classmethod plot(data: dict, figsize: Tuple[int, int] = (14, 5), return_fig: bool = False) → Tuple[matplotlib.figure.Figure, matplotlib.pyplot.Axes] | None

_compute_from_scratch() → dict

vision_unlearning.benchmarks.I_care.result_templates.rt_name_to_class

vision_unlearning.benchmarks.I_care.result_templates.rt_name_to_params

vision_unlearning.benchmarks.I_care.result_templates.display_interesting_interferences(metadata_filtered: List[Dict[str, Any]], interference_per_pair: Dict[str, Dict[str, float]], index: int, task: Literal['scenes', 'objects', 'breeds', 'people'], method: Literal['munba', 'uce', 'distil'], num_train_epochs: int, metric: str, is_worst_biggest: bool, seed: int = 42, save_path: str | None = None) → None

Compares generated images for 9 identities: target, 4 worst (excluding target), 4 best.
@param metadata_filtered: should be appropriate for this task (this is not verified inside the function)
@param interference_per_pair: should be appropriate for this task+index+method+num_train_epochs (this is not verified inside the function)
@param index: identifies the target

The combination of task+index+method+num_train_epochs identifies a unique unlearned model.

vision_unlearning.benchmarks.I_care.result_templates.analyze_relationship_regression(df: pandas.DataFrame, x: str, y: str, expected_positive: bool = True, plot: bool = True) → bool

Test linear relationship between two numerical variables with significance test and direction check.

Returns True only if:

- the slope is statistically significant (p < 0.05), and
- the slope sign matches expectation.
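The two conditions above can be sketched with `scipy.stats.linregress`. Whether the function actually uses `linregress` is an assumption; the helper name, the `alpha` parameter, and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import linregress

def slope_is_significant_and_expected(x, y, expected_positive=True, alpha=0.05):
    """True only if the fitted slope is significant and has the expected sign."""
    result = linregress(x, y)
    significant = result.pvalue < alpha
    sign_ok = result.slope > 0 if expected_positive else result.slope < 0
    return bool(significant and sign_ok)

x = np.arange(20, dtype=float)
y = 2.0 * x + np.sin(x)  # clearly increasing, with a small deterministic wobble

ok_positive = slope_is_significant_and_expected(x, y, expected_positive=True)
ok_negative = slope_is_significant_and_expected(x, y, expected_positive=False)
```

A significant slope with the wrong sign therefore returns False, just like a slope that is not significant.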

vision_unlearning.benchmarks.I_care.result_templates.analyze_relationship_category(df, metric: str, category: str, plot: bool = True) → bool

vision_unlearning.benchmarks.I_care.result_templates.analyze_relationship_numerical(df: pandas.DataFrame, attribute: str, metric: str, plot: bool = False, plot_only_significant: bool = False) → bool

Analyzes the relationship between a numerical attribute and a numerical metric.
@param df: interference_per_entity; assumes df[attribute] and df[metric] are numerical
@param plot: whether to plot the results
@param plot_only_significant: whether to plot only significant relationships; only applies if plot=True
@return: whether any significant relationship was found

—

Pearson test

Use when you want to measure a linear relationship.

Assumptions:
* Both variables are continuous
* Relationship is linear
* Bivariate normality (both jointly Gaussian)
* Homoscedasticity (constant variance)
* No strong outliers (very sensitive)

Detects: linear correlation only
Fails when: relationship is monotonic but non-linear, or heavy outliers exist

Spearman test

Use when you want to measure a monotonic relationship (not necessarily linear) or data is non-Gaussian.

Assumptions:
* Variables are at least ordinal
* Relationship is monotonic (increasing or decreasing)
* No distributional assumptions
* Robust to outliers

Detects: any monotonic trend (linear or curved)
Fails when: relationship is non-monotonic (e.g., U-shaped)
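The contrast between the two tests can be illustrated on synthetic data. The arrays below are made up for the example; only `scipy.stats.pearsonr` and `scipy.stats.spearmanr` are real library calls.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 21, dtype=float)

# Strictly increasing but strongly non-linear: Spearman sees a perfect trend,
# Pearson reports a weaker linear correlation.
y_curved = np.exp(x / 2.0)
pearson_curved, _ = pearsonr(x, y_curved)
spearman_curved, _ = spearmanr(x, y_curved)

# U-shaped and symmetric around the middle of x: neither test detects it.
y_u = (x - 10.5) ** 2
pearson_u, _ = pearsonr(x, y_u)
spearman_u, _ = spearmanr(x, y_u)

print(pearson_curved, spearman_curved, pearson_u, spearman_u)
```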

vision_unlearning.benchmarks.I_care.result_templates.analyze_relationship_categorical(df: pandas.DataFrame, attribute: str, metric: str, plot: bool = False, plot_only_significant: bool = False, show_axhline: float | None = None, min_samples_per_category: int = 5, extra_title: str = '') → bool

Analyzes the relationship between a categorical attribute and a numerical metric.
@param df: interference_per_entity; assumes df[attribute] is categorical and df[metric] is numerical
@param plot: whether to plot the results
@param plot_only_significant: whether to plot only significant relationships; only applies if plot=True
@param show_axhline: if provided, shows a horizontal line at this y-value; only applies if plot=True
@return: whether any significant relationship was found

ANOVA (f_oneway)

Use when you want to test if group means differ across 3+ independent groups under parametric assumptions.

Assumptions:
* Dependent variable is continuous
* Groups are independent
* Normality within each group
* Homoscedasticity (equal variances)
* No strong outliers

Hypothesis:
* H₀: all group means are equal
* H₁: at least one mean differs

Detects: differences in means
Fails when: heavy skew, unequal variances, small n with non-Gaussian data

Kruskal-Wallis (kruskal)

Use when you want to test if group distributions differ without parametric assumptions.

Assumptions:
* Dependent variable is ordinal or continuous
* Groups are independent
* Same shaped distributions (only medians should differ for clean interpretation)
* No normality or equal-variance requirement

Hypothesis:
* H₀: all group distributions are equal
* H₁: at least one group differs

Detects: differences in medians / distributions
Fails when: distributions differ in shape (then result is ambiguous)
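Both tests can be run side by side on the same groups. The three groups below are invented values for illustration (five samples each, matching the default min_samples_per_category); `f_oneway` and `kruskal` are the SciPy functions named in the headings above.

```python
from scipy.stats import f_oneway, kruskal

# Hypothetical metric values for three categories of one attribute.
group_a = [0.10, 0.12, 0.11, 0.13, 0.09]
group_b = [0.20, 0.22, 0.21, 0.19, 0.23]
group_c = [0.30, 0.31, 0.29, 0.33, 0.32]

f_stat, anova_p = f_oneway(group_a, group_b, group_c)    # compares group means
h_stat, kruskal_p = kruskal(group_a, group_b, group_c)   # compares rank distributions

alpha = 0.05
print("ANOVA significant:", anova_p < alpha)
print("Kruskal-Wallis significant:", kruskal_p < alpha)
```

With groups this well separated both tests agree; they tend to diverge when the data are skewed or contain outliers, which is where the rank-based test is the safer choice.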

vision_unlearning.benchmarks.I_care.result_templates.analyze_correlation_between_pairwise_metrics(df1: pandas.DataFrame, df2: pandas.DataFrame, metric1_name: str, metric2_name: str, exclude_diagonal: bool = True, plot=True, plot_only_significant=True) → bool: df1 and df2 are square DataFrames whose index and columns are identical, both within each DataFrame and across the two.
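Correlating two square entity-by-entity matrices while excluding the diagonal can be sketched as follows. The choice of Spearman correlation, the helper names, and the toy matrices are assumptions made for illustration; the actual function may use a different statistic.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def off_diagonal_values(df):
    """Flatten a square DataFrame row by row, dropping the diagonal."""
    mask = ~np.eye(len(df), dtype=bool)
    return df.to_numpy()[mask]

def pairwise_correlation(df1, df2, exclude_diagonal=True):
    """Spearman correlation between matching cells of two square DataFrames."""
    df2 = df2.loc[df1.index, df1.columns]  # align rows and columns with df1
    if exclude_diagonal:
        a, b = off_diagonal_values(df1), off_diagonal_values(df2)
    else:
        a, b = df1.to_numpy().ravel(), df2.to_numpy().ravel()
    return spearmanr(a, b)

entities = ["a", "b", "c"]  # hypothetical entity names
similarity = pd.DataFrame(
    [[1.0, 0.8, 0.1], [0.7, 1.0, 0.3], [0.2, 0.4, 1.0]],
    index=entities, columns=entities,
)
interference = pd.DataFrame(
    [[9.0, 0.5, 0.05], [0.45, 9.0, 0.2], [0.1, 0.25, 9.0]],
    index=entities, columns=entities,
)

rho, p_value = pairwise_correlation(similarity, interference)
```

Dropping the diagonal matters because self-pairs (an entity compared with itself) usually take extreme values that would otherwise dominate the correlation.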

vision_unlearning.benchmarks.I_care.result_templates.check_eval_results(eval_results, name, threshold: float, operator: Literal['gt', 'lt']) → float: Checks whether the metric satisfies the expected threshold.