vision_unlearning.utils.data_generation

Functions

generate_dataset(→ List[Dict[str, str]])

Generate images for the given prompts and save them to output_path.

Module Contents

vision_unlearning.utils.data_generation.generate_dataset(model_base_name: str | None, lora_name: str | None, prompts: List[str], output_path: str, filenames: List[str] | None = None, batch_size: int = 4, device: int | str | torch.device = 'cuda', lora_requires_inversion: bool = False, model_pipeline: diffusers.AutoPipelineForText2Image | None = None, seeds: List[int] | None = None) → List[Dict[str, str]]

Generate images for the given prompts and save them to output_path.

When seeds is provided (recommended for reproducibility):

For each seed, the function sets the global torch, numpy, and random state and passes a seeded torch.Generator to the pipeline call. Running with the same model weights and the same seed is then expected to produce pixel-identical images, provided the hardware and library versions are also unchanged.
filenames may optionally be provided. When provided, the caller must supply exactly len(seeds) * len(prompts) filenames in seed-major order: [seed0_prompt0, seed0_prompt1, ..., seed1_prompt0, seed1_prompt1, ...].
When seeds is provided but filenames is None: filenames are auto-generated as {seed}_{prompt}.png (no prefix) for each (seed, prompt) pair.
metadata.jsonl is written once after all seeds are processed.
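The seed-major ordering described above can be sketched as follows. This is only an illustration of the ordering the caller must produce. The seeds, prompts, and the slug helper are hypothetical, and the naming scheme is the caller's choice, not necessarily what the function uses when it auto-generates names.

```python
from itertools import product
from typing import List

# Hypothetical example inputs (not taken from the library).
seeds: List[int] = [0, 1]
prompts: List[str] = ["a cat", "a dog"]


def slug(prompt: str) -> str:
    """Hypothetical helper: make a prompt safe to use in a filename."""
    return prompt.replace(" ", "_")


# product(seeds, prompts) varies the prompt fastest, which yields
# seed-major order: every prompt for seed 0, then every prompt for seed 1.
filenames = [f"{seed}_{slug(prompt)}.png" for seed, prompt in product(seeds, prompts)]

print(filenames)
# ['0_a_cat.png', '0_a_dog.png', '1_a_cat.png', '1_a_dog.png']
```

A list built this way has exactly len(seeds) * len(prompts) entries and can be passed as filenames together with the same seeds and prompts.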

When seeds is None (legacy mode):

filenames may be provided explicitly (one per prompt).
The pipeline is called once per batch without seeding, so the output is non-deterministic.
This path is kept for backward compatibility only.
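The filename-count rule for the two modes can be checked before calling the function. The sketch below is a caller-side pre-check under the rules stated above. Both helper names are hypothetical and are not part of vision_unlearning.

```python
from typing import List, Optional


def expected_filename_count(prompts: List[str], seeds: Optional[List[int]]) -> int:
    """Hypothetical helper: number of filenames the caller must supply."""
    if seeds is None:
        # Legacy mode: one filename per prompt.
        return len(prompts)
    # Seeded mode: one filename per (seed, prompt) pair.
    return len(seeds) * len(prompts)


def check_filenames(
    filenames: Optional[List[str]],
    prompts: List[str],
    seeds: Optional[List[int]],
) -> None:
    """Hypothetical helper: raise ValueError if the filename count is wrong."""
    if filenames is None:
        # Nothing to check when no explicit filenames are given.
        return
    expected = expected_filename_count(prompts, seeds)
    if len(filenames) != expected:
        raise ValueError(f"expected {expected} filenames, got {len(filenames)}")


# Legacy mode: two prompts need two filenames.
check_filenames(["a.png", "b.png"], ["a cat", "a dog"], None)

# Seeded mode: three seeds and two prompts need six filenames.
print(expected_filename_count(["a cat", "a dog"], [1, 2, 3]))
# 6
```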

@param model_base_name: HF model name or local path. Ignored if model_pipeline given.
@param lora_name: LoRA adapter path. If set, model_base_name is also required.
@param prompts: Text prompts to generate images for.
@param output_path: Directory where images and metadata.jsonl are saved.
@param filenames: Explicit filenames (optional).

Legacy mode (seeds=None): one filename per prompt.

Seeded mode (seeds provided): len(seeds) * len(prompts) filenames in seed-major order. If None, filenames are auto-generated as {seed}_{prompt}.png.

@param batch_size: Number of prompts per pipeline call.
@param device: Torch device.
@param lora_requires_inversion: Passed to unlearn_lora if lora_name is set.
@param model_pipeline: Pre-loaded pipeline (skips loading if provided).
@param seeds: List of integer seeds. When provided the generation loop is seeded.