vision_unlearning.utils.data_generation
=======================================

.. py:module:: vision_unlearning.utils.data_generation


Functions
---------

.. autoapisummary::

   vision_unlearning.utils.data_generation.generate_dataset


Module Contents
---------------

.. py:function:: generate_dataset(model_base_name: Optional[str], lora_name: Optional[str], prompts: List[str], output_path: str, filenames: Optional[List[str]] = None, batch_size: int = 4, device: Union[int, str, torch.device] = 'cuda', lora_requires_inversion: bool = False, model_pipeline: Optional[diffusers.AutoPipelineForText2Image] = None, seeds: Optional[List[int]] = None) -> List[Dict[str, str]]

   Generate one image per prompt (per seed, when ``seeds`` is given) and save the images to ``output_path``.

   When ``seeds`` is provided (recommended for reproducibility):
     - For each seed, the function sets the global ``torch``, ``numpy``, and ``random``
       state and passes a seeded ``torch.Generator`` to the pipeline call.  Running with
       the same model weights and the same seed is then expected to produce
       pixel-identical images, provided the hardware and library versions are unchanged.
     - ``filenames`` is optional.  When provided, it must contain exactly
       ``len(seeds) * len(prompts)`` entries in seed-major order:
       ``[seed0_prompt0, seed0_prompt1, ..., seed1_prompt0, seed1_prompt1, ...]``.
     - When ``filenames`` is None, filenames are auto-generated as
       ``{seed}_{prompt}.png`` (no prefix) for each (seed, prompt) pair.
     - ``metadata.jsonl`` is written once, after all seeds have been processed.
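
   The seed-major ordering described above can be sketched as follows.  This is a
   minimal illustration, not the library's implementation: ``seed_major_filenames``
   is a hypothetical helper, and it assumes the prompt text is used verbatim in the
   filename (any sanitisation the real function applies is not shown here).

   .. code-block:: python

      from itertools import product
      from typing import List


      def seed_major_filenames(seeds: List[int], prompts: List[str]) -> List[str]:
          """Hypothetical helper: one filename per (seed, prompt) pair, seed-major."""
          # product() varies its last argument fastest, so every prompt for
          # seeds[0] comes first, then every prompt for seeds[1], and so on.
          return [f"{seed}_{prompt}.png" for seed, prompt in product(seeds, prompts)]


      names = seed_major_filenames([0, 1], ["cat", "dog"])
      print(names)  # ['0_cat.png', '0_dog.png', '1_cat.png', '1_dog.png']

   A list built this way always has ``len(seeds) * len(prompts)`` entries, which is
   the length the seeded mode expects when ``filenames`` is passed explicitly.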

   When ``seeds`` is None (legacy mode):
     - ``filenames`` may be provided explicitly (one per prompt).
     - The pipeline is called once per batch without seeding, so the output is
       non-deterministic.
     - This path is kept for backward compatibility only.
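
   To make "once per batch" concrete, the sketch below shows one plausible way to
   split the prompts into groups of at most ``batch_size``.  ``iter_batches`` is a
   hypothetical helper written for this illustration; the actual batching code
   inside ``generate_dataset`` may differ.

   .. code-block:: python

      from typing import Iterator, List


      def iter_batches(prompts: List[str], batch_size: int = 4) -> Iterator[List[str]]:
          """Hypothetical helper: yield consecutive slices of at most batch_size prompts."""
          if batch_size < 1:
              raise ValueError("batch_size must be at least 1")
          for start in range(0, len(prompts), batch_size):
              yield prompts[start:start + batch_size]


      batches = list(iter_batches(["a", "b", "c", "d", "e"], batch_size=2))
      print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]

   Under this assumption, five prompts with ``batch_size=2`` result in three
   pipeline calls, the last one with a single prompt.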

   :param model_base_name: Hugging Face model name or local path.  Ignored if ``model_pipeline`` is given.
   :param lora_name: LoRA adapter path.  If set, ``model_base_name`` is also required.
   :param prompts: Text prompts to generate images for.
   :param output_path: Directory where images and ``metadata.jsonl`` are saved.
   :param filenames: Explicit filenames (optional).

       - Legacy mode (``seeds=None``): one filename per prompt.
       - Seeded mode (``seeds`` provided): ``len(seeds) * len(prompts)`` filenames in
         seed-major order.  If None, filenames are auto-generated as ``{seed}_{prompt}.png``.

   :param batch_size: Number of prompts per pipeline call.
   :param device: Torch device.
   :param lora_requires_inversion: Passed to ``unlearn_lora`` if ``lora_name`` is set.
   :param model_pipeline: Pre-loaded pipeline (skips loading if provided).
   :param seeds: List of integer seeds.  When provided, the generation loop is seeded.
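
   A seeded call might look like the sketch below.  The model identifier, prompts,
   and output directory are placeholders chosen for illustration, and
   ``run_seeded_generation`` is a hypothetical wrapper; only the keyword names are
   taken from the signature above.

   .. code-block:: python

      from typing import Dict, List


      def run_seeded_generation() -> List[Dict[str, str]]:
          """Hypothetical wrapper around a seeded generate_dataset call."""
          # Imported lazily so that defining this wrapper does not require the
          # package (or a GPU) to be available.
          from vision_unlearning.utils.data_generation import generate_dataset

          return generate_dataset(
              model_base_name="your-org/your-text2image-model",  # placeholder, not a real model
              lora_name=None,  # no LoRA adapter in this sketch
              prompts=["a photo of a cat", "a photo of a dog"],
              output_path="generated/",  # placeholder directory
              batch_size=2,
              device="cuda",
              seeds=[0, 1],  # seeded mode: 2 seeds x 2 prompts = 4 images
          )

   Because ``filenames`` is left as None here, the files would be named following
   the ``{seed}_{prompt}.png`` pattern described above.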