vision_unlearning.utils.data_generation
=======================================

.. py:module:: vision_unlearning.utils.data_generation


Functions
---------

.. autoapisummary::

   vision_unlearning.utils.data_generation.generate_dataset


Module Contents
---------------

.. py:function:: generate_dataset(model_base_name: Optional[str], lora_name: Optional[str], prompts: List[str], output_path: str, filenames: Optional[List[str]] = None, batch_size: int = 4, device: Union[int, str, torch.device] = 'cuda', lora_requires_inversion: bool = False, model_pipeline: Optional[diffusers.AutoPipelineForText2Image] = None, seeds: Optional[List[int]] = None) -> List[Dict[str, str]]

   Generate images for the given prompts and save them to output_path.

   When seeds is provided (recommended for reproducibility):

   - For each seed, the function sets torch/numpy/random global state and
     passes a seeded torch.Generator to the pipeline call. This guarantees
     that running with the same model weights and the same seed produces
     pixel-identical images.
   - filenames may optionally be provided. When provided, the caller must
     supply exactly ``len(seeds) * len(prompts)`` filenames in seed-major
     order: ``[seed0_prompt0, seed0_prompt1, ..., seed1_prompt0, seed1_prompt1, ...]``.
   - When seeds is provided but filenames is None, filenames are
     auto-generated as ``{seed}_{prompt}.png`` (no prefix) for each
     (seed, prompt) pair.
   - metadata.jsonl is written once after all seeds are processed.

   When seeds is None (legacy mode):

   - filenames may be provided explicitly (one per prompt).
   - The pipeline is called once per batch without seeding, so the output
     is non-deterministic.
   - This path is kept for backward compatibility only.

   @param model_base_name: HF model name or local path. Ignored if model_pipeline is given.

   @param lora_name: LoRA adapter path. If set, model_base_name is also required.

   @param prompts: Text prompts to generate images for.

   @param output_path: Directory where images and metadata.jsonl are saved.

   @param filenames: Explicit filenames (optional).

   - Legacy mode (seeds=None): one filename per prompt.
   - Seeded mode (seeds provided): ``len(seeds) * len(prompts)`` filenames
     in seed-major order. If None, filenames are auto-generated as
     ``{seed}_{prompt}.png``.

   @param batch_size: Number of prompts per pipeline call.

   @param device: Torch device.

   @param lora_requires_inversion: Passed to unlearn_lora if lora_name is set.

   @param model_pipeline: Pre-loaded pipeline (skips loading if provided).

   @param seeds: List of integer seeds. When provided, the generation loop is seeded.
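
The seed-major ordering required for ``filenames`` is easy to get wrong, so the following is a minimal sketch of how a caller might build and validate such a list before calling ``generate_dataset``. The helpers ``seed_major_filenames`` and ``expected_filename_count`` are hypothetical names introduced here for illustration and are not part of this module. The sketch also assumes the prompt text is inserted into the filename verbatim; the real function may sanitize or truncate prompts when it auto-generates names.

```python
from itertools import product
from typing import List, Optional


def seed_major_filenames(seeds: List[int], prompts: List[str]) -> List[str]:
    """Hypothetical helper: one filename per (seed, prompt) pair, seed-major.

    Assumption: the prompt is used verbatim in the ``{seed}_{prompt}.png``
    pattern described in the docstring above.
    """
    # product() varies its last argument fastest, so every prompt for
    # seeds[0] comes first, then every prompt for seeds[1], and so on.
    return [f"{seed}_{prompt}.png" for seed, prompt in product(seeds, prompts)]


def expected_filename_count(prompts: List[str], seeds: Optional[List[int]]) -> int:
    """Hypothetical helper: number of filenames the documented contract expects."""
    if seeds is None:
        # Legacy mode: one filename per prompt.
        return len(prompts)
    # Seeded mode: one filename per (seed, prompt) pair.
    return len(seeds) * len(prompts)


seeds = [0, 1]
prompts = ["cat", "dog"]
filenames = seed_major_filenames(seeds, prompts)

# Check the length against the documented contract before generating anything.
if len(filenames) != expected_filename_count(prompts, seeds):
    raise ValueError("filenames must contain len(seeds) * len(prompts) entries")

print(filenames)
```

A list built this way could then be passed as ``filenames`` together with the same ``seeds`` and ``prompts``. Checking the length up front is a cheap guard, since a mismatch would otherwise only surface once the pipeline has been loaded.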