vision_unlearning.integrations.huggingface
Attributes
- logger
Functions
- huggingface_model_upload: Upload an entire folder or a specific model config in a single commit.
- huggingface_model_download: Download a model or specific model config from Hugging Face Hub.
- huggingface_dataset_exists: Checks whether a folder exists in a Hugging Face dataset repository.
- huggingface_dataset_file_exists: Checks if a specific file exists in a Hugging Face dataset repository.
- huggingface_dataset_file_upload: Upload a single file to a specific dataset config in Hugging Face Hub.
- huggingface_dataset_upload: Assumes that a folder dataset_config exists in folder_datasets and that it contains the dataset files.
- huggingface_dataset_download: clean: if True, the folder will be deleted before downloading.
- huggingface_dataset_file_download: Download a single file from a dataset in Hugging Face Hub.
- huggingface_get_model_metrics: Assumes that the credentials are properly configured.
- huggingface_get_model_images: Searches in anything starting with prefix.
- _huggingface_download_one_file: Download a single file from HF via HTTP. Returns True on success.
- huggingface_dataset_download_parallel: Download a dataset config folder from HF using parallel HTTP requests.
Module Contents
- vision_unlearning.integrations.huggingface.logger
- vision_unlearning.integrations.huggingface.huggingface_model_upload(folder_models: str, model_repository: str, model_config: str | None = None, token: str | None = None) -> None
Upload an entire folder or a specific model config in a single commit. When model_config is None, the entire contents of folder_models are uploaded. Assumes that the folder exists in folder_models and that it contains the model files.
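The choice between uploading everything and uploading one config can be sketched as follows. This is a minimal illustration, not the module's implementation: build_upload_kwargs is a hypothetical helper, and the layout it assumes (each config is a subfolder of folder_models and keeps its name in the repository) is not confirmed by this page. The resulting keys match the keyword arguments of HfApi.upload_folder in huggingface_hub, which commits a folder in one operation.

```python
import os
from typing import Optional


def build_upload_kwargs(folder_models: str, model_repository: str,
                        model_config: Optional[str] = None) -> dict:
    """Hypothetical helper: map the documented arguments to upload_folder kwargs."""
    if model_config is None:
        # No config given: upload everything under folder_models to the repo root.
        return {"repo_id": model_repository, "folder_path": folder_models}
    # Assumption: each config is a subfolder of folder_models and keeps its
    # name inside the repository.
    return {
        "repo_id": model_repository,
        "folder_path": os.path.join(folder_models, model_config),
        "path_in_repo": model_config,
    }


whole = build_upload_kwargs("assets/models", "username/my_model")
single = build_upload_kwargs("assets/models", "username/my_model", "config_a")

# With huggingface_hub installed and a valid token, one call yields one commit:
# from huggingface_hub import HfApi
# HfApi(token=token).upload_folder(**single)
```

The repository and folder names above are placeholders.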
- vision_unlearning.integrations.huggingface.huggingface_model_download(folder_models: str, model_repository: str, model_config: str | None = None, token: str | None = None, clean: bool = False) -> None
Download a model or specific model config from Hugging Face Hub.
- Parameters:
folder_models – Local directory to save the model
model_repository – Hugging Face repository ID
model_config – Specific model config to download (None for entire repository)
token – Hugging Face authentication token
clean – If True, the folder will be deleted before downloading
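The effect of clean can be illustrated with a small sketch. prepare_target_folder is a hypothetical helper, and which folder the library actually deletes (folder_models itself or a config subfolder) is not stated here, so the example takes the target path as an explicit argument.

```python
import os
import shutil
import tempfile


def prepare_target_folder(target: str, clean: bool = False) -> None:
    """Hypothetical helper: optionally wipe target, then make sure it exists."""
    if clean and os.path.isdir(target):
        shutil.rmtree(target)  # drop stale files left by an earlier download
    os.makedirs(target, exist_ok=True)


# Demonstration in a throwaway directory.
root = tempfile.mkdtemp()
target = os.path.join(root, "my_model")
os.makedirs(target)
stale = os.path.join(target, "stale.bin")
open(stale, "w").close()

prepare_target_folder(target, clean=False)
kept_without_clean = os.path.exists(stale)  # stale file survives

prepare_target_folder(target, clean=True)
kept_with_clean = os.path.exists(stale)  # stale file is gone, folder recreated
```

With clean=False, files from a previous download remain next to the new ones; with clean=True, the folder starts empty.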
- vision_unlearning.integrations.huggingface.huggingface_dataset_exists(dataset_repository: str, dataset_config: str, token: str | None) -> bool
Checks whether a folder exists in a Hugging Face dataset repository.
Example
dataset_repository="username/my_dataset", dataset_config="configs/en"
Works without listing the whole repository.
- vision_unlearning.integrations.huggingface.huggingface_dataset_file_exists(dataset_repository: str, dataset_path: str, token: str | None) -> bool
Checks if a specific file exists in a Hugging Face dataset repository.
- Parameters:
dataset_repository – e.g. “username/dataset_name”
dataset_path – full path in repo (e.g. “config/file.jsonl”)
token – HF token (can be None for public repos)
- Returns:
True if file exists, False otherwise
The check does not list the entire repository. A newer version of the library may allow a more efficient implementation; see https://chatgpt.com/share/69edd525-d008-832d-8a0c-ec4560a4fe3b
- vision_unlearning.integrations.huggingface.huggingface_dataset_file_upload(file_path: str, dataset_repository: str, dataset_path: str, token: str)
Upload a single file to a specific dataset config in Hugging Face Hub.
- Parameters:
dataset_path – Full path of the file in the repository, including the config folder (e.g., “my_config/my_file.jsonl”)
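Since dataset_path must include the config folder, it can help to derive it from the local file name. The sketch below uses a hypothetical helper, make_dataset_path, and assumes the file should keep its local name in the repository.

```python
import os
import posixpath


def make_dataset_path(dataset_config: str, file_path: str) -> str:
    """Hypothetical helper: '<config folder>/<local file name>'."""
    # posixpath keeps forward slashes regardless of the local operating system.
    return posixpath.join(dataset_config, os.path.basename(file_path))


dataset_path = make_dataset_path("my_config", "/data/exports/my_file.jsonl")
```

The resulting string is what would be passed as dataset_path, alongside the local file_path.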
- vision_unlearning.integrations.huggingface.huggingface_dataset_upload(folder_datasets: str, dataset_repository: str, dataset_config: str, token: str)
Upload a dataset config folder to Hugging Face Hub. Assumes that a folder named dataset_config exists inside folder_datasets and that it contains the dataset files.
- vision_unlearning.integrations.huggingface.huggingface_dataset_download(folder_datasets: str, dataset_repository: str, dataset_config: str, token: str, clean: bool = False, folder_cache: str = '/tmp/huggingface_cache', clean_cache: bool = False)
Download a dataset config folder from Hugging Face Hub.
- Parameters:
clean – If True, the folder will be deleted before downloading
- vision_unlearning.integrations.huggingface.huggingface_dataset_file_download(folder_datasets: str, dataset_repository: str, file_path: str, token: str, folder_cache: str = '/tmp/huggingface_cache') -> None
Download a single file from a dataset in Hugging Face Hub.
- Parameters:
folder_datasets – Local directory where datasets are stored.
dataset_repository – Hugging Face dataset repository ID
file_path – Full path of the file within the repository (e.g., “config/data.jsonl”)
token – Hugging Face authentication token
folder_cache – Cache directory for downloads
The file will be saved at os.path.join(folder_datasets, file_path)
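As a small worked example of the destination rule above, with placeholder folder and file names:

```python
import os

folder_datasets = "assets/datasets"
file_path = "config/data.jsonl"

# The repository path is appended to the local datasets folder unchanged,
# so the config folder from the repository reappears on disk.
local_path = os.path.join(folder_datasets, file_path)
parent_folder = os.path.dirname(local_path)
```

On a POSIX system local_path is assets/datasets/config/data.jsonl, and parent_folder is the config directory that contains the downloaded file.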
- vision_unlearning.integrations.huggingface.huggingface_get_model_metrics(model_id: str) -> Dict[str, float | int | bool]
Assumes that Hugging Face credentials are already configured.
- vision_unlearning.integrations.huggingface.huggingface_get_model_images(model_id, prefix: str = '') -> List[PIL.ImageFile.ImageFile]
Searches only entries whose names start with prefix.
- vision_unlearning.integrations.huggingface._huggingface_download_one_file(entry: dict, folder_dataset: str, dataset_repository: str, headers: dict) -> bool
Download a single file from HF via HTTP. Returns True on success.
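A direct HTTP download needs a file URL and, for private repositories, an authorization header. The sketch below shows one plausible way to build both. It assumes the Hub's public resolve endpoint for datasets and bearer-token authentication. build_resolve_url and build_auth_headers are hypothetical names, and this page does not confirm that the module uses this endpoint or what the entry dict contains.

```python
from typing import Dict, Optional
from urllib.parse import quote


def build_resolve_url(dataset_repository: str, path_in_repo: str,
                      revision: str = "main") -> str:
    """Hypothetical helper: URL of one file in a dataset repository."""
    # quote() percent-encodes spaces and similar characters but keeps "/".
    encoded = quote(path_in_repo)
    return (
        f"https://huggingface.co/datasets/{dataset_repository}"
        f"/resolve/{revision}/{encoded}"
    )


def build_auth_headers(token: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: bearer header, or no header for public repos."""
    return {"Authorization": f"Bearer {token}"} if token else {}


url = build_resolve_url("username/my_dataset", "datasets/my config/img 001.png")
headers = build_auth_headers("hf_example_token")
```

Percent-encoding matters when folder or file names contain spaces.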
- vision_unlearning.integrations.huggingface.huggingface_dataset_download_parallel(folder_datasets: str, dataset_repository: str, dataset_config: str, token: str, clean: bool = False, folder_cache: str = 'C:/tmp/huggingface_cache', hf_prefix: str = 'datasets', max_workers: int = 12) -> None
Download a dataset config folder from HF using parallel HTTP requests.
Faster alternative to huggingface_dataset_download() for large folders. Uses ThreadPoolExecutor(max_workers) for concurrent file downloads; reduces per-entity download time from ~6 minutes (sequential snapshot_download) to ~35 s at max_workers=12 (measured on HF for 801 PNG files, 2026-05-20).
- Parameters:
folder_datasets – Local parent directory (e.g. “assets/datasets”).
dataset_repository – HF dataset repo ID.
dataset_config – Folder name within folder_datasets AND within hf_prefix on HF (e.g. “generated_people_George W Bush_uce_000”).
token – HF auth token.
clean – If True, delete local folder before downloading.
folder_cache – Unused — kept for signature compatibility with huggingface_dataset_download().
hf_prefix – Prefix path within the HF repo (default “datasets”).
max_workers – Thread pool size for concurrent HTTP downloads. Benchmark (2026-05-20, 801 files): 1=349s, 4=91s, 8=48s, 12=35s. 12 is the recommended default; do not exceed 16 (HF rate limits).
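The concurrency pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's code: download_all and fake_download are hypothetical, and the per-file download is injected as a callable so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple


def download_all(paths: Iterable[str], download_one: Callable[[str], bool],
                 max_workers: int = 12) -> Tuple[int, int]:
    """Run download_one over paths concurrently; return (succeeded, failed)."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and blocks until every task has finished.
        results = list(pool.map(download_one, paths))
    succeeded = sum(1 for ok in results if ok)
    return succeeded, len(results) - succeeded


def fake_download(path: str) -> bool:
    # Stand-in for a real HTTP download: pretend only .png files succeed.
    return path.endswith(".png")


counts = download_all(["a.png", "b.png", "notes.txt"], fake_download, max_workers=4)
```

Threads suit this workload because each task spends most of its time waiting on the network. Relative to the single-worker time of 349 s, the benchmark figures above correspond to speedups of about 3.8x at 4 workers, 7.3x at 8, and 10.0x at 12, so the gain per added worker shrinks as the pool grows.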