vision_unlearning.unlearner.fade

Implementation of FADE (in its three variants: non-sparse, sparse-per-module, sparse-per-weight). Please cite the following paper if you use this code:

@misc{kelsch2026fadeselectiveforgettingsparse,
    title={FADE: Selective Forgetting via Sparse LoRA and Self-Distillation},
    author={Carolina R. Kelsch and Leonardo S. B. Pereira and Natnael Mola and Luis H. Arribas and Juan C. S. M. Avedillo},
    year={2026},
    eprint={2602.07058},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2602.07058},
}

Classes

UnlearnerLoraDistillation

SparsePEFT is not active in this trainer

UnlearnerLoraDistillationSparsePerModule

SparsePEFT is not active in this trainer

SharedR

ElasticLoRALinear

Applies an affine linear transformation to the incoming data: \(y = xA^T + b\).

UnlearnerLoraDistillationSparsePerWeight

SparsePEFT is not active in this trainer

UnlearnerLoraDistillationSparse

SparsePEFT is not active in this trainer

Functions

check_lora_sparsity(→ float)

is_valid_elastic_adapter_config(config)

is_valid_custom_adapter_config(config)

load_custom_adapter_config(custom_adapter_config_file)

get_rank_value(space, strategy)

extract_sub_adapter(adapter_model, ...[, adapter_version])

make_lora_elastic(model, r_values, target_modules[, ...])

find_module_name_by_weights(→ str)

Find the module name in model that has the exact same parameters as target_module

calculate_sparsity_lora(lora_path, filename[, device, ...])

Calculate the percentage of zero values in the product A*B, per target module.

Module Contents

class vision_unlearning.unlearner.fade.UnlearnerLoraDistillation(/, **data: Any)

Bases: vision_unlearning.unlearner.UnlearnerLora

SparsePEFT is not active in this trainer

overwritting_concept: str | None = None
overwrite_column: str = 'overwrite'
json_metafile: str | None = None
is_lora_negated: bool = None
{
  "forget": [
    {
      "file_name": "images/Architectures-Warm_Smear-19.jpg",
      "text": "An image of Architectures in Warm Smear style.",
      "overwrite": "An image of Architectures in Photo style."
    },
    {
      "file_name": "images/Architectures-Warm_Smear-8.jpg",
      "text": "An image of Architectures in Warm Smear style.",
      "overwrite": "An image of Architectures in Photo style."
    },
    {
      "file_name": "images/Architectures-Warm_Smear-12.jpg",
      "text": "An image of Architectures in Warm Smear style.",
      "overwrite": "An image of Architectures in Photo style."
    }
  ],
  "retain": [
    {
      "file_name": "images/Architectures-Abstractionism-6.jpg",
      "text": "An image of Architectures in Abstractionism style.",
      "overwrite": ""
    },
    {
      "file_name": "images/Architectures-Abstractionism-4.jpg",
      "text": "An image of Architectures in Abstractionism style.",
      "overwrite": ""
    },
    {
      "file_name": "images/Architectures-Abstractionism-2.jpg",
      "text": "An image of Architectures in Abstractionism style.",
      "overwrite": ""
    }
  ]
}
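
A metafile with the layout above can be sanity-checked before training. The following is a minimal sketch, not the library's own loader: the helper name validate_metafile and its error messages are hypothetical. It assumes only what the example shows, namely top-level "forget" and "retain" lists whose entries carry "file_name", "text", and an overwrite column (named by overwrite_column, 'overwrite' by default). It further assumes that forget entries need a non-empty overwrite caption, as in the example.

```python
import json

REQUIRED_KEYS = ("file_name", "text")


def validate_metafile(meta: dict, overwrite_column: str = "overwrite") -> dict:
    """Return the number of entries per split; raise ValueError on malformed input."""
    counts = {}
    for split in ("forget", "retain"):
        entries = meta.get(split)
        if not isinstance(entries, list):
            raise ValueError(f"missing or non-list split: {split!r}")
        for i, entry in enumerate(entries):
            for key in REQUIRED_KEYS + (overwrite_column,):
                if key not in entry:
                    raise ValueError(f"{split}[{i}] lacks key {key!r}")
            # Assumption: every forget sample needs a replacement caption.
            if split == "forget" and not entry[overwrite_column]:
                raise ValueError(f"forget[{i}] has an empty {overwrite_column!r}")
        counts[split] = len(entries)
    return counts


# A one-entry-per-split metafile in the same shape as the example above.
example = json.loads("""
{
  "forget": [
    {"file_name": "images/Architectures-Warm_Smear-19.jpg",
     "text": "An image of Architectures in Warm Smear style.",
     "overwrite": "An image of Architectures in Photo style."}
  ],
  "retain": [
    {"file_name": "images/Architectures-Abstractionism-6.jpg",
     "text": "An image of Architectures in Abstractionism style.",
     "overwrite": ""}
  ]
}
""")

counts = validate_metafile(example)
print(counts)  # {'forget': 1, 'retain': 1}
```

To check a file on disk, pass the result of json.load on the opened file instead of the inline string.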

_pre_checks() None
_prepare_dataloaders() Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader]

Get the datasets: you can either provide your own training and evaluation files or specify a Dataset from the hub (the dataset will be downloaded automatically from the datasets Hub).

When downloading and loading a dataset from the hub in distributed training, the load_dataset function guarantees that only one local process downloads the dataset at a time.

Characteristics of the returned dataloaders:
  • Batch size and number of workers are set according to the training arguments.

  • Batches are shuffled.

  • Collate behavior: batches are created by stacking the per-example tensors (pixel_values are stacked into a contiguous FloatTensor).

  • Fields:

    • pixel_values: preprocessed images, ready to be fed to the VAE (i.e. resized, cropped, normalized…). Shape=[batch size, 3, resolution, resolution]

    • input_ids: tokenized captions, ready to be fed to the text encoder. Shape=[batch size, sequence length]

    • forget_ids: tokenized overwrite captions, ready to be fed to the text encoder; returned only by the forget dataloader. Shape=[batch size, sequence length]

_train_one_batch(batch_forget, batch_retain)
class vision_unlearning.unlearner.fade.UnlearnerLoraDistillationSparsePerModule(/, **data: Any)

Bases: UnlearnerLoraDistillation

SparsePEFT is not active in this trainer

sparsity_inclusiveness: float = None
parameter_attribution_method: vision_unlearning.utils.parameter_attribution.ParameterAttributionMethod
attribution_overwrite_if_exists: bool = False
_attribution_path: str | None = None
_get_lora_config() peft.LoraConfig
vision_unlearning.unlearner.fade.check_lora_sparsity(adapter_path: str, config_file_name: str = 'adapter_config.json', model_file_name: str = 'adapter_model.bin') float
vision_unlearning.unlearner.fade.is_valid_elastic_adapter_config(config)
vision_unlearning.unlearner.fade.is_valid_custom_adapter_config(config)
vision_unlearning.unlearner.fade.load_custom_adapter_config(custom_adapter_config_file)
vision_unlearning.unlearner.fade.get_rank_value(space, strategy)
vision_unlearning.unlearner.fade.extract_sub_adapter(adapter_model: str, elastic_adapter_config_file: str, output_dir: str, adapter_version: Literal['maximal', 'heuristic', 'minimal'] = 'heuristic')
class vision_unlearning.unlearner.fade.SharedR(r_values)
r_values
current_r
__call__()
update_r()
class vision_unlearning.unlearner.fade.ElasticLoRALinear(in_features, out_features, shared_r, is_lora_A, bias=False, is_group_head=False)

Bases: torch.nn.Linear

Applies an affine linear transformation to the incoming data: \(y = xA^T + b\).

This module supports TensorFloat32.

On certain ROCm devices, when using float16 inputs this module will use different precision for backward.

Parameters:
  • in_features – size of each input sample

  • out_features – size of each output sample

  • bias – If set to False, the layer will not learn an additive bias. Default: False

Shape:
  • Input: \((*, H_\text{in})\) where \(*\) means any number of dimensions including none and \(H_\text{in} = \text{in\_features}\).

  • Output: \((*, H_\text{out})\) where all but the last dimension are the same shape as the input and \(H_\text{out} = \text{out\_features}\).

weight

the learnable weights of the module of shape \((\text{out\_features}, \text{in\_features})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in\_features}}\)

bias

the learnable bias of the module of shape \((\text{out\_features})\). If bias is True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{in\_features}}\)

Examples:

>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 30])
shared_r
is_lora_A
is_group_head = False
_weight
_bias
init_weights(orig: torch.nn.Linear)
active_sub_adapter()
property masked_weight
property weight
property bias
state_dict(*args, destination=None, prefix='', keep_vars=False)

Return a dictionary containing references to the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.

Note

The returned object is a shallow copy. It contains references to the module’s parameters and buffers.

Warning

Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.

Warning

Please avoid the use of argument destination as it is not designed for end-users.

Parameters:
  • destination (dict, optional) – If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.

  • prefix (str, optional) – a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.

  • keep_vars (bool, optional) – by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.

Returns:

a dictionary containing a whole state of the module

Return type:

dict

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']
load_state_dict(state_dict, strict=True, assign=False)

Copy parameters and buffers from state_dict into this module and its descendants.

If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Warning

If assign is True the optimizer must be created after the call to load_state_dict unless get_swap_module_params_on_conversion() is True.

Parameters:
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

  • assign (bool, optional) – When set to False, the properties of the tensors in the current module are preserved whereas setting it to True preserves properties of the Tensors in the state dict. The only exception is the requires_grad field of Parameter for which the value from the module is preserved. Default: False

Returns:

  • missing_keys is a list of str containing any keys that are expected by this module but missing from the provided state_dict.

  • unexpected_keys is a list of str containing the keys that are not expected by this module but present in the provided state_dict.

Return type:

NamedTuple with missing_keys and unexpected_keys fields

Note

If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.

vision_unlearning.unlearner.fade.make_lora_elastic(model, r_values, target_modules, share_rank_within_layer=True, config_save_dir=None)
vision_unlearning.unlearner.fade.find_module_name_by_weights(model: torch.nn.Module, target_module: torch.nn.Module) str

Find the module name in model that has the exact same parameters as target_module. Here, "same" means matched by memory address.
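
Matching by memory address can be illustrated with Python object identity. This is a sketch of the idea rather than the function's actual body, which is not shown here. The name find_name_by_identity is hypothetical, and comparing only each module's directly owned parameters (recurse=False) is an assumption made to avoid a container matching its own child. StandInModule mimics the two torch.nn.Module methods the sketch calls, so the example runs without PyTorch; a real model can be passed in its place.

```python
class StandInModule:
    """Mimics torch.nn.Module.parameters and named_modules for this demo."""

    def __init__(self, params=(), **children):
        self._params = list(params)
        self._children = children

    def parameters(self, recurse=True):
        yield from self._params
        if recurse:
            for child in self._children.values():
                yield from child.parameters()

    def named_modules(self, prefix=""):
        yield prefix, self
        for name, child in self._children.items():
            child_prefix = f"{prefix}.{name}" if prefix else name
            yield from child.named_modules(child_prefix)


def find_name_by_identity(model, target_module):
    """Return the dotted name of the submodule owning the same parameter objects."""
    target = list(target_module.parameters(recurse=False))
    if not target:
        raise ValueError("target_module owns no parameters to match on")
    for name, module in model.named_modules():
        own = list(module.parameters(recurse=False))
        # `is` compares object identity (the same memory), never values.
        if len(own) == len(target) and all(p is q for p, q in zip(own, target)):
            return name
    raise LookupError("no submodule shares parameters with target_module")


# Plain objects stand in for parameter tensors.
w_q, w_k = object(), object()
attn = StandInModule(to_q=StandInModule([w_q]), to_k=StandInModule([w_k]))
model = StandInModule(attn=attn)

# A different wrapper around the very same parameter object as attn.to_k.
alias = StandInModule([w_k])
found = find_name_by_identity(model, alias)
print(found)  # attn.to_k
```

Because the comparison uses identity, two layers that merely hold equal values do not match; only a module that references the same underlying parameter objects does.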

vision_unlearning.unlearner.fade.calculate_sparsity_lora(lora_path: str, filename: str, device: str = 'cuda', threshold: float = 1e-06)

Calculate the percentage of zero values in the product A*B, per target module. Loads from HuggingFace (TODO: implement for local files).

Parameters:
  • threshold – tolerance for “effectively zero”
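
The measurement described above can be sketched with plain Python lists standing in for tensors. This is an illustration under stated assumptions, not the function's implementation: the helper names matmul and zero_percentage, the module names "to_q" and "to_v", and the factor shapes are hypothetical, and counting an entry as zero when its magnitude is at most threshold (rather than strictly below it) is also an assumption.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def zero_percentage(a, b, threshold=1e-6):
    """Percentage of entries in the product a*b with magnitude <= threshold."""
    product = matmul(a, b)
    total = sum(len(row) for row in product)
    zeros = sum(1 for row in product for value in row if abs(value) <= threshold)
    return 100.0 * zeros / total


# Hypothetical rank-1 LoRA factors for two target modules.
adapters = {
    "to_q": ([[1.0], [0.0]], [[2.0, 1e-9]]),  # product: [[2.0, 1e-9], [0.0, 0.0]]
    "to_v": ([[1.0], [1.0]], [[1.0, 1.0]]),   # product: [[1.0, 1.0], [1.0, 1.0]]
}

sparsity = {name: zero_percentage(a, b) for name, (a, b) in adapters.items()}
print(sparsity)  # {'to_q': 75.0, 'to_v': 0.0}
```

Note that the entry 1e-9 counts as zero only because it falls under the tolerance; with threshold=0.0 the "to_q" result would drop to 50.0.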

class vision_unlearning.unlearner.fade.UnlearnerLoraDistillationSparsePerWeight(/, **data: Any)

Bases: UnlearnerLoraDistillation

SparsePEFT is not active in this trainer

nls: bool = None
nls_target_modules: List[str] = None
search_space: List[int] = None
share_rank_within_layer: bool = None
quantization_aware: bool = None
parameter_attribution_method: vision_unlearning.utils.parameter_attribution.ParameterAttributionMethod
attribution_overwrite_if_exists: bool = False
sparsity_inclusiveness: float = None
_output_dir_super: str | None = None
_output_dir_sub: str | None = None
_attribution_path: str | None = None
_mask_dict: Dict[str, torch.Tensor]
model_post_init(__context: dict | None = None) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

_get_lora_config() peft.LoraConfig
_hook_after_lora_init()

Side-effect: modifies self._unet in-place

_get_accelerator()
_save_lora_layers()

Side-effects: modifies self._unet in-place (casts to float32), saves two directories self._output_dir_super and self._output_dir_sub

class vision_unlearning.unlearner.fade.UnlearnerLoraDistillationSparse(/, **data: Any)

Bases: UnlearnerLoraDistillationSparsePerWeight

SparsePEFT is not active in this trainer