vision_unlearning.integrations.huggingface
==========================================

.. py:module:: vision_unlearning.integrations.huggingface


Attributes
----------

.. autoapisummary::

   vision_unlearning.integrations.huggingface.logger


Functions
---------

.. autoapisummary::

   vision_unlearning.integrations.huggingface.huggingface_model_upload
   vision_unlearning.integrations.huggingface.huggingface_model_download
   vision_unlearning.integrations.huggingface.huggingface_dataset_exists
   vision_unlearning.integrations.huggingface.huggingface_dataset_file_exists
   vision_unlearning.integrations.huggingface.huggingface_dataset_file_upload
   vision_unlearning.integrations.huggingface.huggingface_dataset_upload
   vision_unlearning.integrations.huggingface.huggingface_dataset_download
   vision_unlearning.integrations.huggingface.huggingface_dataset_file_download
   vision_unlearning.integrations.huggingface.huggingface_get_model_metrics
   vision_unlearning.integrations.huggingface.huggingface_get_model_images
   vision_unlearning.integrations.huggingface._huggingface_download_one_file
   vision_unlearning.integrations.huggingface.huggingface_dataset_download_parallel


Module Contents
---------------

.. py:data:: logger

.. py:function:: huggingface_model_upload(folder_models: str, model_repository: str, model_config: Optional[str] = None, token: Optional[str] = None) -> None

   Upload an entire folder, or a specific model config, in a single commit.

   When `model_config` is None, the entire contents of `folder_models` are uploaded.

   Assumes that the folder exists in `folder_models` and that it contains the model files.


.. py:function:: huggingface_model_download(folder_models: str, model_repository: str, model_config: Optional[str] = None, token: Optional[str] = None, clean: bool = False) -> None

   Download a model or specific model config from Hugging Face Hub.

   :param folder_models: Local directory to save the model
   :param model_repository: Hugging Face repository ID
   :param model_config: Specific model config to download (None for entire repository)
   :param token: Hugging Face authentication token
   :param clean: If True, the folder will be deleted before downloading


.. py:function:: huggingface_dataset_exists(dataset_repository: str, dataset_config: str, token: Optional[str]) -> bool

   Checks whether a folder exists in a Hugging Face dataset repository.
   Works without listing the whole repository.

   .. rubric:: Example

   ``dataset_repository="username/my_dataset"``, ``dataset_config="configs/en"``


.. py:function:: huggingface_dataset_file_exists(dataset_repository: str, dataset_path: str, token: Optional[str]) -> bool

   Checks whether a specific file exists in a Hugging Face dataset repository, without listing the entire repository.

   Could be done more efficiently with a newer version of the library; see https://chatgpt.com/share/69edd525-d008-832d-8a0c-ec4560a4fe3b

   :param dataset_repository: e.g. "username/dataset_name"
   :param dataset_path: Full path in the repository (e.g. "config/file.jsonl")
   :param token: HF token (can be None for public repos)
   :return: True if the file exists, False otherwise


.. py:function:: huggingface_dataset_file_upload(file_path: str, dataset_repository: str, dataset_path: str, token: str)

   Upload a single file to a specific dataset config in Hugging Face Hub.

   :param dataset_path: Full name of the file in the repository, including the config folder (e.g., "my_config/my_file.jsonl")


.. py:function:: huggingface_dataset_upload(folder_datasets: str, dataset_repository: str, dataset_config: str, token: str)

   Assumes that a folder `dataset_config` exists in `folder_datasets` and that it contains the dataset files.


.. py:function:: huggingface_dataset_download(folder_datasets: str, dataset_repository: str, dataset_config: str, token: str, clean: bool = False, folder_cache: str = '/tmp/huggingface_cache', clean_cache: bool = False)

   :param clean: If True, the folder will be deleted before downloading


.. py:function:: huggingface_dataset_file_download(folder_datasets: str, dataset_repository: str, file_path: str, token: str, folder_cache: str = '/tmp/huggingface_cache') -> None

   Download a single file from a dataset in Hugging Face Hub.

   The file will be saved at ``os.path.join(folder_datasets, file_path)``.

   :param folder_datasets: Local directory where datasets are stored
   :param dataset_repository: Hugging Face dataset repository ID
   :param file_path: Full path of the file within the repository (e.g., "config/data.jsonl")
   :param token: Hugging Face authentication token
   :param folder_cache: Cache directory for downloads


.. py:function:: huggingface_get_model_metrics(model_id: str) -> Dict[str, float | int | bool]

   Assumes that the credentials are properly configured.


.. py:function:: huggingface_get_model_images(model_id, prefix: str = '') -> List[PIL.ImageFile.ImageFile]

   Searches only entries whose name starts with `prefix`.


.. py:function:: _huggingface_download_one_file(entry: dict, folder_dataset: str, dataset_repository: str, headers: dict) -> bool

   Download a single file from HF via HTTP. Returns True on success.


.. py:function:: huggingface_dataset_download_parallel(folder_datasets: str, dataset_repository: str, dataset_config: str, token: str, clean: bool = False, folder_cache: str = 'C:/tmp/huggingface_cache', hf_prefix: str = 'datasets', max_workers: int = 12) -> None

   Download a dataset config folder from HF using parallel HTTP requests.

   Faster alternative to huggingface_dataset_download() for large folders.
   Uses ThreadPoolExecutor(max_workers) for concurrent file downloads; reduces per-entity download time from ~6 minutes (sequential snapshot_download) to ~35 s at max_workers=12 (measured on HF for 801 PNG files, 2026-05-20).

   :param folder_datasets: Local parent directory (e.g. "assets/datasets").
   :param dataset_repository: HF dataset repo ID.
   :param dataset_config: Folder name within folder_datasets AND within hf_prefix on HF (e.g. "generated_people_George W Bush_uce_000").
   :param token: HF auth token.
   :param clean: If True, delete the local folder before downloading.
   :param folder_cache: Unused; kept for signature compatibility with huggingface_dataset_download().
   :param hf_prefix: Prefix path within the HF repo (default "datasets").
   :param max_workers: Thread pool size for concurrent HTTP downloads. Benchmark (2026-05-20, 801 files), workers=time: 1=349 s, 4=91 s, 8=48 s, 12=35 s. 12 is the recommended default; do not exceed 16 (HF rate limits).
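
As a rough illustration of the thread-pool pattern described for huggingface_dataset_download_parallel(), the sketch below fans a list of file entries out over ThreadPoolExecutor and records which ones succeeded. It is not the module's implementation: `download_all`, `fake_download`, and the `"path"` key of each entry are hypothetical names chosen for this example, and the stand-in downloader performs no HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def download_all(
    entries: Iterable[dict],
    download_one: Callable[[dict], bool],
    max_workers: int = 12,
) -> Dict[str, bool]:
    """Call download_one(entry) concurrently and return {path: success}."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the (assumed) "path" key of its entry.
        futures = {pool.submit(download_one, e): e["path"] for e in entries}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = bool(future.result())
            except Exception:
                # A raised exception counts as a failed download.
                results[path] = False
    return results


def fake_download(entry: dict) -> bool:
    """Hypothetical stand-in for a per-file HTTP download."""
    return not entry["path"].endswith(".bad")


entries = [{"path": f"datasets/cfg/img_{i:03d}.png"} for i in range(5)]
entries.append({"path": "datasets/cfg/broken.bad"})

results = download_all(entries, fake_download, max_workers=4)
failed = sorted(path for path, ok in results.items() if not ok)
```

Passing the per-file function in as an argument keeps the pool logic independent of the network layer, so a real downloader that returns True on success could be substituted for `fake_download` without changing `download_all`.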