| Title: | Models, Datasets and Transformations for Images |
|---|---|
| Description: | Provides access to datasets, models and preprocessing facilities for deep learning with images. Integrates seamlessly with the 'torch' package and its API borrows heavily from the 'PyTorch' vision package. |
| Authors: | Tomasz Kalinowski [ctb, cre], Daniel Falbel [aut, cph], Christophe Regouby [ctb], Akanksha Koshti [ctb], Derrick Richard [ctb], ANAMASGARD [ctb], Chandraveer Singh [ctb], Posit Software, PBC [cph, fnd] (ROR: <https://ror.org/03wc8by49>) |
| Maintainer: | Tomasz Kalinowski <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.9.0.9000 |
| Built: | 2026-05-27 07:27:50 UTC |
| Source: | https://github.com/mlverse/torchvision |
Loads an image using jpeg, png or tiff packages depending on the
file extension.
base_loader(path)
path |
path or URL to load the image from. |
A channel-last array of image values with dimensions Height x Width x 3.
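Some datasets in this package return images in channels-first (C, H, W) layout, so a channel-last array may need its axes permuted before the two are mixed. A minimal base R sketch, using a small synthetic array as a stand-in for a loaded image (the sizes and values are made up for illustration):

```r
# Synthetic stand-in for a loaded image: Height = 4, Width = 5, 3 channels
hwc <- array(seq_len(4 * 5 * 3), dim = c(4, 5, 3))

# Move the channel axis to the front: (H, W, C) -> (C, H, W)
chw <- aperm(hwc, c(3, 1, 2))

dim(chw)
```

Element `chw[c, h, w]` holds the same value as `hwc[h, w, c]`; only the axis order changes.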
Performs non-maximum suppression in a batched fashion. Each index value corresponds to a category, and NMS will not be applied between elements of different categories.
batched_nms(boxes, scores, idxs, iou_threshold)
boxes |
(Tensor[N, 4]): boxes where NMS will be performed. They are expected to be in (x1, y1, x2, y2) format. |
scores |
(Tensor[N]): scores for each one of the boxes |
idxs |
(Tensor[N]): indices of the categories for each one of the boxes. |
iou_threshold |
(float): discards all overlapping boxes with IoU > iou_threshold. |
keep (Tensor): int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores
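The per-category behaviour can be illustrated with plain R matrices. The sketch below is not the package's implementation: it runs a simple greedy NMS separately for each category and assumes boxes in (x1, y1, x2, y2) format. All helper names (`iou_one_to_many`, `greedy_nms`, `batched_nms_sketch`) are hypothetical.

```r
# IoU between one box and each row of a matrix of boxes, (x1, y1, x2, y2) format
iou_one_to_many <- function(box, boxes) {
  area <- function(b) (b[, 3] - b[, 1]) * (b[, 4] - b[, 2])
  inter_w <- pmax(pmin(box[3], boxes[, 3]) - pmax(box[1], boxes[, 1]), 0)
  inter_h <- pmax(pmin(box[4], boxes[, 4]) - pmax(box[2], boxes[, 2]), 0)
  inter <- inter_w * inter_h
  inter / (area(matrix(box, nrow = 1)) + area(boxes) - inter)
}

# Greedy NMS: keep the best-scoring box, drop the boxes overlapping it too much
greedy_nms <- function(boxes, scores, iou_threshold) {
  remaining <- order(scores, decreasing = TRUE)
  keep <- integer(0)
  while (length(remaining) > 0) {
    best <- remaining[1]
    keep <- c(keep, best)
    rest <- remaining[-1]
    if (length(rest) == 0) break
    ious <- iou_one_to_many(boxes[best, ], boxes[rest, , drop = FALSE])
    remaining <- rest[ious <= iou_threshold]
  }
  keep
}

# Apply NMS independently within each category, then sort by decreasing score
batched_nms_sketch <- function(boxes, scores, idxs, iou_threshold) {
  keep <- integer(0)
  for (category in unique(idxs)) {
    members <- which(idxs == category)
    kept <- greedy_nms(boxes[members, , drop = FALSE], scores[members], iou_threshold)
    keep <- c(keep, members[kept])
  }
  keep[order(scores[keep], decreasing = TRUE)]
}

boxes <- rbind(c(0, 0, 10, 10), c(1, 1, 11, 11), c(0, 0, 10, 10), c(50, 50, 60, 60))
scores <- c(0.9, 0.8, 0.7, 0.6)
idxs <- c(1, 1, 2, 2)
kept <- batched_nms_sketch(boxes, scores, idxs, iou_threshold = 0.5)
```

Box 2 is suppressed by box 1 (same category, IoU of about 0.68), while box 3 survives even though it coincides with box 1, because it belongs to a different category.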
Computes the area of a set of bounding boxes, which are specified by their
coordinates.
box_area(boxes)
boxes |
(Tensor[N, 4]): boxes for which the area will be computed. They are expected to be in (x1, y1, x2, y2) format. |
area (Tensor[N]): area for each box
Converts boxes from given in_fmt to out_fmt.
box_convert(boxes, in_fmt, out_fmt)
boxes |
(Tensor[N, 4]): boxes which will be converted. |
in_fmt |
(str): Input format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh']. |
out_fmt |
(str): Output format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'] |
Supported in_fmt and out_fmt are:
'xyxy': boxes are represented via corners, (x1, y1) being top left and (x2, y2) being bottom right.
'xywh': boxes are represented via corner, width and height, (x1, y1) being top left, w, h being width and height.
'cxcywh': boxes are represented via center, width and height, (cx, cy) being center of box, w, h being width and height.
boxes (Tensor[N, 4]): boxes in the converted format.
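The arithmetic behind the three formats can be sketched in base R on plain matrices. This is an illustration under the format definitions above, not the package's code; `to_xyxy`, `from_xyxy` and `box_convert_sketch` are hypothetical names, and every conversion is routed through 'xyxy' for simplicity.

```r
# Convert an N x 4 matrix from any supported format to corner format
to_xyxy <- function(boxes, in_fmt) {
  switch(in_fmt,
    xyxy = boxes,
    xywh = cbind(boxes[, 1], boxes[, 2],
                 boxes[, 1] + boxes[, 3], boxes[, 2] + boxes[, 4]),
    cxcywh = cbind(boxes[, 1] - boxes[, 3] / 2, boxes[, 2] - boxes[, 4] / 2,
                   boxes[, 1] + boxes[, 3] / 2, boxes[, 2] + boxes[, 4] / 2),
    stop("unsupported format: ", in_fmt)
  )
}

# Convert an N x 4 matrix from corner format to any supported format
from_xyxy <- function(boxes, out_fmt) {
  w <- boxes[, 3] - boxes[, 1]
  h <- boxes[, 4] - boxes[, 2]
  switch(out_fmt,
    xyxy = boxes,
    xywh = cbind(boxes[, 1], boxes[, 2], w, h),
    cxcywh = cbind(boxes[, 1] + w / 2, boxes[, 2] + h / 2, w, h),
    stop("unsupported format: ", out_fmt)
  )
}

box_convert_sketch <- function(boxes, in_fmt, out_fmt) {
  from_xyxy(to_xyxy(boxes, in_fmt), out_fmt)
}

corners <- rbind(c(10, 20, 50, 80))
as_xywh <- box_convert_sketch(corners, "xyxy", "xywh")
as_cxcywh <- box_convert_sketch(corners, "xyxy", "cxcywh")
round_trip <- box_convert_sketch(as_cxcywh, "cxcywh", "xyxy")
```

For the box with corners (10, 20) and (50, 80), the width is 40 and the height is 60, so the 'xywh' form is (10, 20, 40, 60) and the 'cxcywh' form is (30, 50, 40, 60).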
Converts bounding boxes from (cx, cy, w, h) format to (x1, y1, x2, y2) format.
(cx, cy) refers to the center of the bounding box.
(w, h) are the width and height of the bounding box.
box_cxcywh_to_xyxy(boxes)
boxes |
(Tensor[N, 4]): boxes in (cx, cy, w, h) format which will be converted. |
boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format.
Return intersection-over-union (Jaccard index) of boxes.
Both sets of boxes are expected to be in (x1, y1, x2, y2) format with
0 <= x1 < x2 and 0 <= y1 < y2.
box_iou(boxes1, boxes2)
boxes1 |
(Tensor[N, 4]) |
boxes2 |
(Tensor[M, 4]) |
iou (Tensor[N, M]): the NxM matrix containing the pairwise IoU values for every element in boxes1 and boxes2
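A base R sketch of the pairwise computation, assuming plain N x 4 and M x 4 matrices in (x1, y1, x2, y2) format. The function names are hypothetical and this is not the package's tensor implementation.

```r
# Area of each box (rows of an N x 4 matrix)
box_area_sketch <- function(boxes) {
  (boxes[, 3] - boxes[, 1]) * (boxes[, 4] - boxes[, 2])
}

# N x M matrix of intersection-over-union values
box_iou_sketch <- function(boxes1, boxes2) {
  # Corners of the intersection rectangle for every pair of boxes
  left   <- outer(boxes1[, 1], boxes2[, 1], pmax)
  top    <- outer(boxes1[, 2], boxes2[, 2], pmax)
  right  <- outer(boxes1[, 3], boxes2[, 3], pmin)
  bottom <- outer(boxes1[, 4], boxes2[, 4], pmin)
  # Negative extents mean the boxes do not overlap
  inter <- pmax(right - left, 0) * pmax(bottom - top, 0)
  union <- outer(box_area_sketch(boxes1), box_area_sketch(boxes2), "+") - inter
  inter / union
}

boxes1 <- rbind(c(0, 0, 10, 10))
boxes2 <- rbind(c(0, 0, 10, 10), c(5, 5, 15, 15), c(20, 20, 30, 30))
iou <- box_iou_sketch(boxes1, boxes2)
```

Here the first pair is identical (IoU 1), the second overlaps in a 5 x 5 square (25 / 175), and the third is disjoint (IoU 0).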
Converts bounding boxes from (x, y, w, h) format to (x1, y1, x2, y2) format.
(x, y) refers to the top left of the bounding box.
(w, h) refers to the width and height of the box.
box_xywh_to_xyxy(boxes)
boxes |
(Tensor[N, 4]): boxes in (x, y, w, h) format which will be converted. |
boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format.
Converts bounding boxes from (x1, y1, x2, y2) format to (cx, cy, w, h) format.
(x1, y1) refer to top left of bounding box
(x2, y2) refer to bottom right of bounding box
box_xyxy_to_cxcywh(boxes)
boxes |
(Tensor[N, 4]): boxes in (x1, y1, x2, y2) format which will be converted. |
boxes (Tensor[N, 4]): boxes in (cx, cy, w, h) format.
Converts bounding boxes from (x1, y1, x2, y2) format to (x, y, w, h) format.
(x1, y1) refer to top left of bounding box
(x2, y2) refer to bottom right of bounding box
box_xyxy_to_xywh(boxes)
boxes |
(Tensor[N, 4]): boxes in (x1, y1, x2, y2) format which will be converted. |
boxes (Tensor[N, 4]): boxes in (x, y, w, h) format.
Utilities for resolving Caltech class identifiers to their corresponding human readable labels.
caltech_classes(class_id = 1:257)
class_id |
Integer vector of 1-based class identifiers. |
A character vector with 257 entries representing the Caltech 257 class labels.
Other class_resolution:
coco_classes(),
imagenet_classes(),
pascal_voc_classes()
Caltech Datasets
Loads the Caltech-256 Object Category Dataset for image classification. It consists of 30,607 images across 256 distinct object categories. Each category has at least 80 images, with variability in image size.
caltech101_dataset(root = tempdir(), transform = NULL, target_transform = NULL, download = FALSE)
caltech256_dataset(root = tempdir(), transform = NULL, target_transform = NULL, download = FALSE)
root |
Character. Root directory for dataset storage. The dataset will be stored under |
transform |
Optional function to transform input images after loading. Default is NULL. |
target_transform |
Optional function to transform labels. Default is NULL. |
download |
Logical. Whether to download the dataset if not found locally. Default is FALSE. |
The Caltech-101 and Caltech-256 collections are classification datasets made of color images with varying sizes. They cover 101 and 256 object categories respectively and are commonly used for evaluating visual recognition models.
The Caltech-101 dataset contains around 9,000 images spread over 101 object categories plus a background class. Images have varying sizes.
Caltech-256 extends this to about 30,000 images across 256 categories.
An object of class caltech101_dataset, which behaves like a torch dataset.
Each element is a named list with:
x: A H x W x 3 integer array representing an RGB image.
y: An Integer representing the label.
An object of class caltech256_dataset, which behaves like a torch dataset.
Each element is a named list with:
x: A H x W x 3 integer array representing an RGB image.
y: An Integer representing the label.
Other classification_dataset:
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
## Not run:
caltech101 <- caltech101_dataset(download = TRUE)
first_item <- caltech101[1]
first_item$x # Image array
first_item$y # Integer label
## End(Not run)
The CIFAR datasets are benchmark classification datasets composed of 60,000 RGB thumbnail images of size 32x32 pixels. The CIFAR10 variant contains 10 classes while CIFAR100 provides 100 classes. Images are split into 50,000 training samples and 10,000 test samples.
Downloads and prepares the CIFAR100 dataset.
cifar10_dataset(root = tempdir(), train = TRUE, transform = NULL, target_transform = NULL, download = FALSE)
cifar100_dataset(root = tempdir(), train = TRUE, transform = NULL, target_transform = NULL, download = FALSE)
root |
(string): Root directory of dataset where directory
|
train |
Logical. If TRUE, use the training set; otherwise, use the test set. Not applicable to all datasets. |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
Downloads and prepares the CIFAR archives.
A torch::dataset object. Each item is a list with:
x: a 32x32x3 integer array
y: the class label
Other classification_dataset:
caltech_dataset,
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
## Not run:
ds <- cifar10_dataset(root = tempdir(), download = TRUE)
item <- ds[1]
item$x
item$y
## End(Not run)
Clip boxes so that they lie inside an image of size size.
clip_boxes_to_image(boxes, size)
boxes |
(Tensor[N, 4]): boxes in (x1, y1, x2, y2) format. |
size |
(Tuple[height, width]): size of the image |
clipped_boxes (Tensor[N, 4])
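Clipping amounts to clamping x coordinates to the image width and y coordinates to the image height. A base R sketch on a plain matrix, assuming boxes in (x1, y1, x2, y2) format and `size` given as c(height, width); `clip_boxes_sketch` is a hypothetical name and not the package's implementation.

```r
clip_boxes_sketch <- function(boxes, size) {
  height <- size[1]
  width <- size[2]
  # Columns 1 and 3 are x coordinates, columns 2 and 4 are y coordinates
  boxes[, c(1, 3)] <- pmin(pmax(boxes[, c(1, 3)], 0), width)
  boxes[, c(2, 4)] <- pmin(pmax(boxes[, c(2, 4)], 0), height)
  boxes
}

boxes <- rbind(c(-5, 10, 120, 90), c(10, -3, 50, 70))
clipped <- clip_boxes_sketch(boxes, size = c(80, 100))
```

With an 80 x 100 (height x width) image, the first box becomes (0, 10, 100, 80) and the second becomes (10, 0, 50, 70).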
Loads the MS COCO dataset for image captioning.
coco_caption_dataset(root = tempdir(), train = TRUE, year = c("2014"), download = FALSE, transform = NULL, target_transform = NULL)
root |
Root directory where the dataset is stored or will be downloaded to. |
train |
Logical. If TRUE, loads the training split; otherwise, loads the validation split. |
year |
Character. Dataset version year. One of |
download |
Logical. If TRUE, downloads the dataset if it's not already present in the |
transform |
Optional transform function applied to the image. |
target_transform |
Optional transform function applied to the target (labels, boxes, etc.). |
An object of class coco_caption_dataset. Each item is a list:
x: an (H, W, C) numeric array containing the RGB image.
y: a character string with the image caption.
Other caption_dataset:
flickr_caption_dataset
## Not run:
ds <- coco_caption_dataset(train = FALSE, download = TRUE)
example <- ds[1]

# Access image and caption
x <- example$x
y <- example$y

# Prepare image for plotting
image_array <- as.numeric(x)
dim(image_array) <- dim(x)
plot(as.raster(image_array))
title(main = y, col.main = "black")
## End(Not run)
Utilities for resolving COCO 90 class identifiers to their corresponding human readable labels. The labels are retrieved from pytorch/vision source to be compliant with torchvision pretrained models.
coco_classes(class_id = 1:90)
class_id |
Integer vector of 1-based class identifiers. |
A character vector with the COCO class names
Other class_resolution:
caltech_classes(),
imagenet_classes(),
pascal_voc_classes()
Loads the MS COCO dataset for object detection tasks only.
coco_detection_dataset(root = tempdir(), train = TRUE, year = c("2017", "2014"), download = FALSE, transform = NULL, target_transform = NULL)
root |
Root directory where the dataset is stored or will be downloaded to. |
train |
Logical. If TRUE, loads the training split; otherwise, loads the validation split. |
year |
Character. Dataset version year. One of |
download |
Logical. If TRUE, downloads the dataset if it's not already present in the |
transform |
Optional transform function applied to the image. |
target_transform |
Optional transform function applied to the target (labels, boxes, etc.). |
The returned image x is in CHW format (channels, height, width), matching the torch convention.
The dataset y offers object detection annotations such as bounding boxes, labels,
areas, and crowd indicators from the official COCO annotations.
Files are downloaded to a coco subdirectory in the torch cache directory for better organization.
An object of class coco_detection_dataset. Each item is a list:
x: a (C, H, W) array representing the image.
y$boxes: a (N, 4) torch_tensor of bounding boxes in the format .
y$labels: an integer torch_tensor with the class label for each object.
y$area: a float torch_tensor indicating the area of each object.
y$iscrowd: a boolean torch_tensor, where TRUE marks the object as part of a crowd.
The returned object has S3 class "image_with_bounding_box"
to enable automatic dispatch by visualization functions such as draw_bounding_boxes().
For instance segmentation tasks, use coco_segmentation_dataset instead.
coco_segmentation_dataset for instance segmentation tasks
Other detection_dataset:
pascal_voc_datasets,
rf100_biology_collection(),
rf100_damage_collection(),
rf100_document_collection(),
rf100_infrared_collection(),
rf100_medical_collection(),
rf100_underwater_collection()
## Not run:
# Load dataset for object detection
ds <- coco_detection_dataset(train = FALSE, year = "2017", download = TRUE)
item <- ds[1]

# Visualize bounding boxes
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)
## End(Not run)
Loads the MS COCO dataset for instance segmentation tasks.
coco_segmentation_dataset(root = tempdir(), train = TRUE, year = c("2017", "2014"), download = FALSE, transform = NULL, target_transform = NULL)
root |
Root directory where the dataset is stored or will be downloaded to. |
train |
Logical. If TRUE, loads the training split; otherwise, loads the validation split. |
year |
Character. Dataset version year. One of |
download |
Logical. If TRUE, downloads the dataset if it's not already present in the |
transform |
Optional transform function applied to the image. |
target_transform |
Optional transform function applied to the target.
Use |
The returned image x is in CHW format (channels, height, width), matching the torch convention.
The dataset y offers instance segmentation annotations including labels,
crowd indicators, and segmentation masks from the official COCO annotations.
Files are downloaded to a coco subdirectory in the torch cache directory for better organization.
An object of class coco_segmentation_dataset. Each item is a list:
x: a (C, H, W) array representing the image.
y$labels: an integer torch_tensor with the class label for each object.
y$iscrowd: a boolean torch_tensor, where TRUE marks the object as part of a crowd.
y$segmentation: a list of segmentation polygons for each object.
y$masks: a (N, H, W) boolean torch_tensor containing binary segmentation masks (when using target_transform_coco_masks).
The returned object has S3 class "image_with_segmentation_mask"
to enable automatic dispatch by visualization functions such as draw_segmentation_masks().
For object detection tasks without segmentation, use coco_detection_dataset instead.
coco_detection_dataset for object detection tasks
Other segmentation_dataset:
oxfordiiitpet_segmentation_dataset(),
pascal_voc_datasets,
rf100_peixos_segmentation_dataset()
## Not run:
# Load dataset for instance segmentation
ds <- coco_segmentation_dataset(
  train = FALSE,
  year = "2017",
  download = TRUE,
  target_transform = target_transform_coco_masks
)
item <- ds[1]

# Visualize segmentation masks
masked <- draw_segmentation_masks(item)
tensor_image_browse(masked)
## End(Not run)
A catalog of all RF100 (RoboFlow 100) collections and EMNIST datasets available in torchvision. This data frame contains metadata about each dataset, including descriptions, sizes, available splits, and collection information.
collection_catalog
A data frame with datasets as rows and 17 columns:
Collection name (biology, medical, infrared, damage, underwater, document, mnist)
Dataset identifier used in collection functions
Brief description of the dataset and its purpose
Machine learning task type (currently all "object_detection")
Number of different object classes
Total images across all splits
Typical image width in pixels
Typical image height in pixels
Size of training split in megabytes
Size of test split in megabytes
Size of validation split in megabytes
Total size across all splits in megabytes
Is training split available
Is test split available
Is validation split available
R function name to load this dataset's collection
URL to the collection on RoboFlow Universe
search_collection(), get_collection_catalog()
## Not run:
# View the complete catalog
data(collection_catalog)
View(collection_catalog)

# See all biology datasets
subset(collection_catalog, collection == "biology")

# Find large datasets (> 100 MB)
subset(collection_catalog, total_size_mb > 100)
## End(Not run)
Draws bounding boxes on top of one image tensor.
draw_bounding_boxes(x, ...)

## Default S3 method:
draw_bounding_boxes(x, ...)

## S3 method for class 'torch_tensor'
draw_bounding_boxes(x, boxes, labels = NULL, colors = NULL, color = NULL, fill = FALSE, width = 1, font = c("serif", "plain"), font_size = 10, ...)

## S3 method for class 'image_with_bounding_box'
draw_bounding_boxes(x, ...)
x |
Tensor of shape (C x H x W) and dtype |
... |
Additional arguments passed to methods. |
boxes |
Tensor of size (N, 4) containing N bounding boxes in
c( |
labels |
character vector containing the labels of bounding boxes. |
colors |
character vector containing the colors of the boxes or single color for all boxes. The color can be represented as strings e.g. "red" or "#FF00FF". By default, viridis colors are generated for boxes. |
color |
Deprecated alias for |
fill |
If |
width |
Width of text shift to the bounding box. |
font |
NULL for the current font family, or a character vector of length 2 for Hershey vector fonts. |
font_size |
The requested font size in points. |
torch_tensor of size (C, H, W) of dtype uint8: Image Tensor with bounding boxes plotted.
Other image display:
draw_keypoints(),
draw_segmentation_masks(),
tensor_image_browse(),
tensor_image_display(),
vision_make_grid()
if (torch::torch_is_installed()) {
## Not run:
image_tensor <- torch::torch_randint(170, 250, size = c(3, 360, 360))$to(torch::torch_uint8())
x <- torch::torch_randint(low = 1, high = 160, size = c(12, 1))
y <- torch::torch_randint(low = 1, high = 260, size = c(12, 1))
boxes <- torch::torch_cat(c(x, y, x + 20, y + 10), dim = 2)
bboxed <- draw_bounding_boxes(image_tensor, boxes, colors = "black", fill = TRUE)
tensor_image_browse(bboxed)
## End(Not run)
}
Draws keypoints, objects describing a body part (like rightArm or leftShoulder), on a given RGB tensor image.
draw_keypoints(image, keypoints, connectivity = NULL, colors = NULL, radius = 2, width = 3)
image |
Tensor of shape (3 x H x W) and dtype |
keypoints |
Tensor of shape (N, K, 2) the K keypoints location for each of the N detected poses instance, |
connectivity |
List of integer pairs |
colors |
character vector containing the colors of the keypoints or single color for all keypoints. The color can be represented as strings e.g. "red" or "#FF00FF". By default, rainbow colors are generated for keypoints |
radius |
radius of the plotted keypoint. |
width |
width of line connecting keypoints. |
Image Tensor of dtype uint8 with keypoints drawn.
Other image display:
draw_bounding_boxes(),
draw_segmentation_masks(),
tensor_image_browse(),
tensor_image_display(),
vision_make_grid()
if (torch::torch_is_installed()) {
## Not run:
image <- torch::torch_randint(190, 255, size = c(3, 360, 360))$to(torch::torch_uint8())
keypoints <- torch::torch_randint(low = 60, high = 300, size = c(4, 5, 2))
keypoint_image <- draw_keypoints(image, keypoints)
tensor_image_browse(keypoint_image)
## End(Not run)
}
Draws segmentation masks with their respective colors on top of a given RGB tensor image.
draw_segmentation_masks(x, ...)

## Default S3 method:
draw_segmentation_masks(x, ...)

## S3 method for class 'torch_tensor'
draw_segmentation_masks(x, masks, alpha = 0.8, colors = NULL, ...)

## S3 method for class 'image_with_segmentation_mask'
draw_segmentation_masks(x, alpha = 0.5, colors = NULL, ...)
x |
Tensor of shape (C x H x W) and dtype |
... |
Additional arguments passed to methods. |
masks |
torch_tensor of shape (num_masks, H, W) or (H, W) and dtype bool. |
alpha |
number between 0 and 1 denoting the transparency of the masks. |
colors |
character vector containing the colors of the masks or single color for all masks. The color can be represented as strings e.g. "red" or "#FF00FF". By default, viridis colors are generated for masks. |
torch_tensor of shape (3, H, W) and dtype uint8 of the image with segmentation masks drawn on top.
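One common way to overlay a mask is to blend the mask color into the image only where the mask is TRUE. The base R sketch below assumes the blend `(1 - alpha) * pixel + alpha * color`; that formula is an assumption made for illustration, not something this documentation states, and `blend_mask` is a hypothetical helper working on plain arrays rather than tensors.

```r
# Blend a single boolean mask into a (C, H, W) numeric array
blend_mask <- function(image, mask, color, alpha) {
  out <- image
  for (ch in seq_len(dim(image)[1])) {
    channel <- image[ch, , ]
    # Only pixels selected by the mask are changed
    channel[mask] <- (1 - alpha) * channel[mask] + alpha * color[ch]
    out[ch, , ] <- channel
  }
  out
}

image <- array(100, dim = c(3, 2, 2))               # uniform grey 2 x 2 image
mask <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
blended <- blend_mask(image, mask, color = c(255, 0, 0), alpha = 0.5)
```

Under this assumed formula, masked pixels move halfway towards red while unmasked pixels keep their original value of 100.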
Other image display:
draw_bounding_boxes(),
draw_keypoints(),
tensor_image_browse(),
tensor_image_display(),
vision_make_grid()
image_tensor <- torch::torch_randint(170, 250, size = c(3, 360, 360))$to(torch::torch_uint8())
mask <- torch::torch_tril(torch::torch_ones(c(360, 360)))$to(torch::torch_bool())
masked_image <- draw_segmentation_masks(image_tensor, mask, alpha = 0.2)
tensor_image_browse(masked_image)
A collection of Sentinel-2 satellite images for land-use classification. The standard version contains 27,000 RGB thumbnails (64x64) across 10 classes. Variants include the full 13 spectral bands and a small 100-image subset useful for demos.
Downloads and prepares the EuroSAT dataset with 13 spectral bands.
A subset of 100 images with 13 spectral bands useful for workshops and demos.
eurosat_dataset(root = tempdir(), split = "val", download = FALSE, transform = NULL, target_transform = NULL)
eurosat_all_bands_dataset(root = tempdir(), split = "val", download = FALSE, transform = NULL, target_transform = NULL)
eurosat100_dataset(root = tempdir(), split = "val", download = FALSE, transform = NULL, target_transform = NULL)
root |
(Optional) Character. The root directory where the dataset will be stored.
if empty, will use the default |
split |
One of |
download |
Logical. If TRUE, downloads the dataset to |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
eurosat_dataset() provides a total of 27,000 RGB labeled images.
eurosat_all_bands_dataset() provides a total of 27,000 labeled images with 13 spectral channel bands.
eurosat100_dataset() provides a subset of 100 labeled images with 13 spectral channel bands.
A torch::dataset object. Each item is a list with:
x: a 64x64 image tensor with 3 (RGB) or 13 (all bands) channels
y: the class label
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
## Not run:
# Initialize the dataset
ds <- eurosat100_dataset(split = "train", download = TRUE)

# Access the first item (named first_item to avoid masking utils::head)
first_item <- ds[1]
print(first_item$x) # Image
print(first_item$y) # Label
## End(Not run)
Loads the FER-2013 dataset for facial expression recognition. The dataset contains grayscale images
(48x48) of human faces, each labeled with one of seven emotion categories:
"Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", and "Neutral".
fer_dataset(root = tempdir(), train = TRUE, transform = NULL, target_transform = NULL, download = FALSE)
root |
(string, optional): Root directory for dataset storage,
the dataset will be stored under |
train |
Logical. If TRUE, use the training set; otherwise, use the test set. Not applicable to all datasets. |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
The dataset is split into:
"Train": training images labeled as "Training" in the original CSV.
"Test": includes both "PublicTest" and "PrivateTest" entries.
A torch dataset of class fer_dataset.
Each element is a named list:
x: a 48x48 grayscale array
y: an integer from 1 to 7 indicating the class index
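The split rule and the 1-based class index can be illustrated with plain R vectors. In this sketch the `usage` vector is a made-up stand-in for the split labels of the original CSV, and the class order is taken from the list of emotion categories given above.

```r
# Emotion categories in the order listed in the description
fer_classes <- c("Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral")

# Hypothetical split labels as they might appear in the original CSV
usage <- c("Training", "PublicTest", "PrivateTest", "Training")

# "Training" rows form the train split; both test variants form the test split
split <- ifelse(usage == "Training", "Train", "Test")

# Resolving a 1-based class index to its label
y <- 4L
label <- fer_classes[y]
```

With this ordering an index of 4 resolves to "Happy", which matches the comment in the example for this dataset.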
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
## Not run:
fer <- fer_dataset(train = TRUE, download = TRUE)
first_item <- fer[1]
first_item$x # 48x48 grayscale array
first_item$y # 4
fer$classes[first_item$y] # "Happy"
## End(Not run)
The FGVC-Aircraft dataset supports the following official splits:
"train": training subset with labels.
"val": validation subset with labels.
"trainval": combined training and validation set with labels.
"test": test set with labels (used for evaluation).
fgvc_aircraft_dataset(root = tempdir(), split = "train", annotation_level = "variant", transform = NULL, target_transform = NULL, download = FALSE)
root |
Character. Root directory for dataset storage. The dataset will be stored under |
split |
Character. One of |
annotation_level |
Character. Level of annotation to use for classification. Default is "variant". |
transform |
Optional function to transform input images after loading. Default is NULL. |
target_transform |
Optional function to transform labels. Default is NULL. |
download |
Logical. Whether to download the dataset if not found locally. Default is FALSE. |
The annotation_level determines the granularity of labels used for classification and supports four values:
"variant": the most fine-grained level, e.g., "Boeing 737-700". There are 100 visually distinguishable variants.
"family": a mid-level grouping, e.g., "Boeing 737", which includes multiple variants. There are 70 distinct families.
"manufacturer": the coarsest level, e.g., "Boeing", grouping multiple families under a single manufacturer. There are 30 manufacturers.
"all": multi-label format that returns all three levels as a vector of class indices c(manufacturer_idx, family_idx, variant_idx).
These levels form a strict hierarchy: each "manufacturer" consists of multiple "families", and each "family" contains several "variants".
Not all combinations of levels are valid — for example, a "variant" always belongs to exactly one "family", and a "family" to exactly one "manufacturer".
When annotation_level = "all" is used, the $classes field is a named list with three components:
classes$manufacturer: a character vector of manufacturer names
classes$family: a character vector of family names
classes$variant: a character vector of variant names
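Resolving a multi-level label against such a named list can be sketched in base R. The `classes` list below is a small hypothetical stand-in with made-up entries, not the dataset's real class lists, and `resolve_levels` is a hypothetical helper.

```r
# Hypothetical stand-in for the $classes field with annotation_level = "all"
classes <- list(
  manufacturer = c("Airbus", "Boeing"),
  family = c("A320", "Boeing 707", "Boeing 737"),
  variant = c("707-320", "737-700", "A320")
)

# A label vector in the order c(manufacturer_idx, family_idx, variant_idx)
y <- c(2L, 2L, 1L)

# Look up each index in the class list of its own level
resolve_levels <- function(y, classes) {
  levels <- c("manufacturer", "family", "variant")
  labels <- vapply(
    seq_along(levels),
    function(i) classes[[levels[i]]][y[i]],
    character(1)
  )
  names(labels) <- levels
  labels
}

resolved <- resolve_levels(y, classes)
```

Each index is only meaningful within its own level, so the lookup always pairs position i of the label vector with the i-th level's class list.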
An object of class fgvc_aircraft_dataset, which behaves like a torch-style dataset.
Each element is a named list with:
x: an array of shape (H, W, C) with pixel values in the range (0, 255). Please note that images have varying sizes.
y: for single-level annotation ("variant", "family", "manufacturer"): an integer class label.
for multi-level annotation ("all"): a vector of three integers c(manufacturer_idx, family_idx, variant_idx).
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
## Not run:
# Single-label classification
fgvc <- fgvc_aircraft_dataset(transform = transform_to_tensor, download = TRUE)

# Create a custom collate function to resize images and prepare batches
resize_collate_fn <- function(batch) {
  xs <- lapply(batch, function(item) {
    torchvision::transform_resize(item$x, c(768, 1024))
  })
  xs <- torch::torch_stack(xs)
  ys <- torch::torch_tensor(sapply(batch, function(item) item$y), dtype = torch::torch_long())
  list(x = xs, y = ys)
}
dl <- torch::dataloader(dataset = fgvc, batch_size = 2, collate_fn = resize_collate_fn)
batch <- torch::dataloader_next(torch::dataloader_make_iter(dl))
batch$x # batched image tensors with shape (2, 3, 768, 1024)
batch$y # class labels as integer tensor of shape 2

# Multi-label classification
fgvc <- fgvc_aircraft_dataset(split = "test", annotation_level = "all")
item <- fgvc[1]
item$x # a double vector representing the image
item$y # an integer vector of length 3: manufacturer, family, and variant indices
fgvc$classes$manufacturer[item$y[1]] # e.g., "Boeing"
fgvc$classes$family[item$y[2]] # e.g., "Boeing 707"
fgvc$classes$variant[item$y[3]] # e.g., "707-320"
## End(Not run)
Flickr8k Dataset
flickr8k_caption_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

flickr30k_caption_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)
root |
Character. Root directory where the dataset will be stored under |
train |
: If |
transform |
Optional function to transform input images after loading. Default is |
target_transform |
Optional function to transform labels. Default is |
download |
Logical. Whether to download the dataset if not found locally. Default is |
The Flickr8k and Flickr30k collections are image captioning datasets composed of 8,000 and 30,000 color images respectively, each paired with five human-annotated captions. The images are in RGB format with varying spatial resolutions, and these datasets are widely used for training and evaluating vision-language models.
A torch dataset of class flickr8k_caption_dataset.
Each element is a named list:
x: a H x W x 3 integer array representing an RGB image.
y: a character vector containing all five captions associated with the image.
A torch dataset of class flickr30k_caption_dataset.
Each element is a named list:
x: a H x W x 3 integer array representing an RGB image.
y: a character vector containing all five captions associated with the image.
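Because `y` holds all five captions, a training loop usually needs a single caption per image. The following is a minimal base-R sketch, assuming that `target_transform` receives the full caption vector; `pick_caption()` is a hypothetical helper and the caption strings are made up for illustration.

```r
# Hypothetical helper (not part of torchvision): choose one caption at random.
pick_caption <- function(y) {
  y[sample.int(length(y), 1)]
}

# Made-up captions standing in for the `y` element of a dataset item.
captions <- c(
  "A dog runs across a field.",
  "A brown dog is running on grass.",
  "The dog plays outside.",
  "A puppy sprints through a meadow.",
  "An animal runs in a park."
)

set.seed(1)
chosen <- pick_caption(captions)
```

Under that assumption, such a function could be passed as `target_transform = pick_caption` so that each item yields one caption string.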
Other caption_dataset:
coco_caption_dataset()
## Not run: 
# Load the Flickr8k caption dataset
flickr8k <- flickr8k_caption_dataset(download = TRUE)

# Access the first item
first_item <- flickr8k[1]
first_item$x  # image array with shape H x W x 3
first_item$y  # character vector containing five captions.

# Load the Flickr30k caption dataset
flickr30k <- flickr30k_caption_dataset(download = TRUE)

# Access the first item
first_item <- flickr30k[1]
first_item$x  # image array with shape H x W x 3
first_item$y  # character vector containing five captions.
## End(Not run)
Loads the Oxford 102 Category Flower Dataset. This dataset consists of 102 flower categories, with between 40 and 258 images per class. Images in this dataset are of variable sizes.
flowers102_dataset(
  root = tempdir(),
  split = "train",
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)
root |
Root directory for dataset storage. The dataset will be stored under |
split |
One of |
transform |
Optional function to transform input images after loading. Default is |
target_transform |
Optional function to transform labels. Default is |
download |
Logical. Whether to download the dataset if not found locally. Default is |
This is a classification dataset where the goal is to assign each image to one of the 102 flower categories.
The dataset is split into:
"train": training subset with labels.
"val": validation subset with labels.
"test": test subset with labels (used for evaluation).
An object of class flowers102_dataset, which behaves like a torch dataset.
Each element is a named list:
x: an H x W x 3 numeric array representing an RGB image.
y: an integer label indicating the class index.
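The raw `x` is a channel-last array, whereas the batches in the example further down are channel-first, with shape (4, 3, 224, 224). The layout change can be illustrated in base R with `aperm()`. This is only a sketch of the axis reordering on a random stand-in array, not the package's implementation of `transform_to_tensor()`.

```r
# Stand-in for item$x: a small RGB image stored channel-last (rows x columns x 3).
img_hwc <- array(runif(4 * 6 * 3), dim = c(4, 6, 3))

# Move the channel axis to the front: (rows, columns, channel) -> (channel, rows, columns).
img_chw <- aperm(img_hwc, c(3, 1, 2))

# The pixel at row 1, column 5, channel 2 is now addressed as [2, 1, 5].
same_pixel <- img_chw[2, 1, 5] == img_hwc[1, 5, 2]
```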
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
## Not run: 
# Load the dataset with inline transforms
flowers <- flowers102_dataset(
  split = "train",
  download = TRUE,
  transform = . %>% transform_to_tensor() %>% transform_resize(c(224, 224))
)

# Create a dataloader
dl <- dataloader(
  dataset = flowers,
  batch_size = 4
)

# Access a batch
batch <- dataloader_next(dataloader_make_iter(dl))
batch$x  # Tensor of shape (4, 3, 224, 224)
batch$y  # Tensor of shape (4,) with numeric class labels
## End(Not run)
Return generalized intersection-over-union (Jaccard index) of boxes.
Both sets of boxes are expected to be in (x_min, y_min, x_max, y_max) format with
0 <= x_min < x_max and 0 <= y_min < y_max.
generalized_box_iou(boxes1, boxes2)
boxes1 |
(Tensor[N, 4]) |
boxes2 |
(Tensor[M, 4]) |
Implementation adapted from https://github.com/facebookresearch/detr/blob/master/util/box_ops.py
generalized_iou (Tensor[N, M]): the NxM matrix containing the pairwise generalized_IoU values for every element in boxes1 and boxes2
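To make the returned quantity concrete, here is a base-R sketch of pairwise generalized IoU on plain matrices. It uses the standard definition GIoU = IoU - (|C| - |A ∪ B|) / |C|, where C is the smallest axis-aligned box enclosing both inputs. The name `giou_matrix()` is hypothetical, boxes are assumed to be rows of corner coordinates (x_min, y_min, x_max, y_max), and the loop-based code is for illustration only, not the tensor implementation used by the package.

```r
# Pairwise generalized IoU for boxes stored as rows of (x_min, y_min, x_max, y_max).
giou_matrix <- function(boxes1, boxes2) {
  area <- function(b) (b[, 3] - b[, 1]) * (b[, 4] - b[, 2])
  a1 <- area(boxes1)
  a2 <- area(boxes2)
  out <- matrix(0, nrow(boxes1), nrow(boxes2))
  for (i in seq_len(nrow(boxes1))) {
    for (j in seq_len(nrow(boxes2))) {
      p <- boxes1[i, ]
      q <- boxes2[j, ]
      # Intersection rectangle (zero area when the boxes do not overlap).
      iw <- max(0, min(p[3], q[3]) - max(p[1], q[1]))
      ih <- max(0, min(p[4], q[4]) - max(p[2], q[2]))
      inter <- iw * ih
      uni <- a1[i] + a2[j] - inter
      # Smallest axis-aligned box enclosing both inputs.
      enclose <- (max(p[3], q[3]) - min(p[1], q[1])) *
        (max(p[4], q[4]) - min(p[2], q[2]))
      out[i, j] <- inter / uni - (enclose - uni) / enclose
    }
  }
  out
}

b1 <- rbind(c(0, 0, 2, 2))
b2 <- rbind(c(0, 0, 2, 2), c(1, 1, 3, 3), c(3, 3, 4, 4))
g <- giou_matrix(b1, b2)  # 1 x 3 matrix: identical, overlapping, disjoint
```

Unlike plain IoU, the value is negative for the disjoint pair, which is what makes the measure usable as a loss signal when boxes do not overlap.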
Returns the complete catalog of datasets in collections with their metadata. This is a convenience function that loads and returns the collection_catalog data.
get_collection_catalog()
A data frame with all datasets and their metadata.
search_collection(), collection_catalog
## Not run: 
# Get complete catalog
catalog <- get_collection_catalog()

# View in RStudio
View(catalog)

# Summary statistics
summary(catalog$total_size_mb)
table(catalog$collection)

# Find smallest dataset
catalog[which.min(catalog$total_size_mb), ]

# Find largest dataset
catalog[which.max(catalog$total_size_mb), ]
## End(Not run)
A generic data loader for images stored in folders.
See Details for more information.
image_folder_dataset(
  root,
  transform = NULL,
  target_transform = NULL,
  loader = NULL,
  is_valid_file = NULL
)
root |
Root directory path. |
transform |
A function/transform that takes in an image (as returned by loader) and returns
a transformed version. E.g., |
target_transform |
A function/transform that takes in the target and transforms it. |
loader |
A function to load an image given its path. |
is_valid_file |
A function that takes the path of an image file and checks whether the file is valid (used to detect corrupt files). |
This function assumes that the images for each class are contained
in subdirectories of root. The names of these subdirectories are stored
in the classes attribute of the returned object.
An example folder structure might look as follows:
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
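The folder convention can be tried out without any real images. The sketch below builds a throwaway tree in base R and derives class names and labels from the subdirectory names. It assumes alphabetical ordering of classes and 1-based labels, which is an assumption about the indexing rather than something stated here. The files are empty placeholders, so they could not actually be loaded as images.

```r
# Build a throwaway directory tree that mimics the layout above.
root <- file.path(tempdir(), "image_folder_demo")
unlink(root, recursive = TRUE)
files <- c("dog/xxx.png", "dog/xxy.png", "cat/123.png")
for (f in files) {
  dir.create(file.path(root, dirname(f)), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(root, f))  # empty placeholder, not a valid image
}

# Class names are the immediate subdirectories of root (sorted here by assumption).
classes <- sort(list.dirs(root, full.names = FALSE, recursive = FALSE))

# Each file's label is the position of its parent folder in `classes`.
paths <- list.files(root, recursive = TRUE)
labels <- match(dirname(paths), classes)
```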
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
Utilities for resolving ImageNet-1k class identifiers to their corresponding human-readable labels. The labels are retrieved from the same source used by PyTorch's reference implementation.
imagenet_classes(class_id = 1:1000)

imagenet_1k_classes(class_id = 1:1000)

imagenet_21k_df(class_id = 1:21843)

imagenet_21k_classes(class_id)
class_id |
Integer vector of 1-based class identifiers. |
A character vector with 1000 entries representing the ImageNet-1k class labels.
A data.frame containing columns id and label representing
the ImageNet-21k class identifiers and labels. By default, returns all 21.8K rows.
A character vector with the labels associated with class_id.
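Since `class_id` is 1-based, indices exported from a 0-based environment such as Python need to be shifted by one before lookup. A small sketch follows, with made-up indices and a placeholder label vector standing in for the real labels.

```r
# Hypothetical 0-based class indices, e.g. exported from a Python pipeline.
py_idx <- c(0L, 281L, 999L)

# Shift to the 1-based convention expected by class_id.
class_id <- py_idx + 1L

# Placeholder labels; the real ones would come from imagenet_classes().
placeholder_labels <- paste0("class_", 1:1000)
picked <- placeholder_labels[class_id]
```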
Other class_resolution:
caltech_classes(),
coco_classes(),
pascal_voc_classes()
Labelled Faces in the Wild (LFW) Datasets
lfw_people_dataset(
  root = tempdir(),
  transform = NULL,
  split = "original",
  target_transform = NULL,
  download = FALSE
)

lfw_pairs_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  split = "original",
  target_transform = NULL,
  download = FALSE
)
root |
Root directory for dataset storage. The dataset will be stored under |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
split |
Which version of the dataset to use. One of |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
train |
For |
The LFW dataset collection provides facial images for evaluating face recognition systems. It includes two variants:
lfw_people_dataset: A multi-class classification dataset where each image is labelled by person identity.
lfw_pairs_dataset: A face verification dataset containing image pairs with binary labels (same or different person).
This R implementation of the LFW dataset is based on the fetch_lfw_people() and fetch_lfw_pairs() functions from the scikit-learn library,
but deviates in a few key aspects due to dataset availability and R API conventions:
The color and resize arguments from Python are not directly exposed. Instead, all images are RGB with a fixed size of 250x250.
The split argument in Python (e.g., train, test, 10fold) is simplified to a train boolean flag in R.
The 10fold split is not supported, as the original protocol files are unavailable or incompatible with clean separation of image-label pairs.
The split parameter in R controls which version of the dataset to use: "original" (unaligned) or "funneled" (aligned using funneling).
The funneled version contains geometrically normalized face images, offering better alignment and typically improved performance for face recognition models.
The dataset is downloaded from Figshare,
which hosts the same files referenced in scikit-learn's dataset utilities.
lfw_people_dataset: 13,233 images across multiple identities (using either "original" or "funneled" splits)
lfw_pairs_dataset:
Training split (train = TRUE): 2,200 image pairs
Test split (train = FALSE): 1,000 image pairs
A torch dataset object lfw_people_dataset or lfw_pairs_dataset.
Each element is a named list with:
x:
For lfw_people_dataset: a H x W x 3 numeric array representing a single RGB image.
For lfw_pairs_dataset: a list of two H x W x 3 numeric arrays representing a pair of RGB images.
y:
For lfw_people_dataset: an integer index from 1 to the number of identities in the dataset.
For lfw_pairs_dataset: 1 if the pair shows the same person, 2 if different people.
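Verification losses often expect a 0/1 target rather than the 1/2 coding described above. A minimal sketch of the recoding in base R follows. `pairs_to_binary()` is a hypothetical helper, and passing it as `target_transform` assumes that argument receives the integer label unchanged.

```r
# Hypothetical helper: map 1 (same person) -> 1 and 2 (different people) -> 0.
pairs_to_binary <- function(y) as.integer(y == 1L)

# Made-up labels in the dataset's 1/2 coding.
y <- c(1L, 2L, 2L, 1L)
y_binary <- pairs_to_binary(y)
```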
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
## Not run: 
# Load data for LFW People Dataset
lfw <- lfw_people_dataset(download = TRUE)
first_item <- lfw[1]
first_item$x  # RGB image
first_item$y  # Label index
lfw$classes[first_item$y]  # person's name (e.g., "Aaron_Eckhart")

# Load training data for LFW Pairs Dataset
lfw <- lfw_pairs_dataset(download = TRUE, train = TRUE)
first_item <- lfw[1]
first_item$x       # List of 2 RGB Images
first_item$x[[1]]  # RGB Image
first_item$x[[2]]  # RGB Image
first_item$y       # Label index
lfw$classes[first_item$y]  # Class Name (e.g., "Same" or "Different")

# Load test data for LFW Pairs Dataset
lfw <- lfw_pairs_dataset(download = TRUE, train = FALSE)
first_item <- lfw[1]
first_item$x       # List of 2 RGB Images
first_item$x[[1]]  # RGB Image
first_item$x[[2]]  # RGB Image
first_item$y       # Label index
lfw$classes[first_item$y]  # Class Name (e.g., "Same" or "Different")
## End(Not run)
List all available datasets within a specific RF100 collection.
list_collection_datasets(collection)
collection |
Collection name. One of: "biology", "medical", "infrared", "damage", "underwater", "document". |
Character vector of dataset names in the collection.
search_collection(), get_collection_catalog()
## Not run: 
# List all biology datasets
list_collection_datasets("biology")

# List all medical datasets
list_collection_datasets("medical")
## End(Not run)
Load an image located at path using the {magick} package.
magick_loader(path)
path |
path or URL to load the image from. |
a magick-image object, as returned by image_read()
Prepares various MNIST-style image classification datasets and optionally downloads them. Images are 28 x 28 pixel grayscale thumbnails with values encoded as integers.
mnist_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

kmnist_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

qmnist_dataset(
  root = tempdir(),
  split = "train",
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

fashion_mnist_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

emnist_collection(
  root = tempdir(),
  split = "test",
  dataset = "balanced",
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

emnist_dataset(kind, ...)
root |
Root directory for dataset storage. The dataset will be stored under |
train |
Logical. If TRUE, use the training set; otherwise, use the test set. Not applicable to all datasets. |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
split |
Character. Used in |
dataset |
to select within |
kind |
the |
... |
the other |
MNIST: Original handwritten digit dataset.
Fashion-MNIST: Clothing item images for classification.
Kuzushiji-MNIST: Japanese cursive character dataset.
QMNIST: Extended MNIST with high-precision NIST data.
EMNIST: A collection of letters and digits with multiple datasets and splits.
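The EMNIST variants differ in their number of classes, as listed under the datasets for emnist_collection(), so the size of a classifier's output layer depends on the chosen `dataset`. A small lookup sketch in base R follows. The counts are copied from that list, with 10 assumed for "mnist" since it holds the standard digit classes. `emnist_num_classes` is a hypothetical name.

```r
# Class counts per EMNIST variant (hypothetical lookup table, values from the list).
emnist_num_classes <- c(
  byclass  = 62L,
  bymerge  = 47L,
  balanced = 47L,
  letters  = 26L,
  digits   = 10L,
  mnist    = 10L   # assumed: the ten standard MNIST digits
)

# Output size a classifier head would need for the "balanced" variant.
n_out <- emnist_num_classes[["balanced"]]
```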
A torch dataset object, where each item is a list of x (image) and y (label).
kmnist_dataset(): Kuzushiji-MNIST cursive Japanese character dataset.
qmnist_dataset(): Extended MNIST dataset with high-precision test data (QMNIST).
fashion_mnist_dataset(): Fashion-MNIST clothing image dataset.
emnist_collection(): EMNIST collection with digits and letters arranged in multiple datasets.
emnist_dataset(): Deprecated. Please use emnist_collection.
datasets for emnist_collection()
"byclass": 62 classes (digits + uppercase + lowercase)
"bymerge": 47 classes (merged uppercase and lowercase)
"balanced": 47 classes, balanced digits and letters
"letters": 26 uppercase letters
"digits": 10 digit classes
"mnist": Standard MNIST digit classes
splits for qmnist_dataset()
"train": 60,000 training samples (MNIST-compatible)
"test": Extended test set
"nist": Full NIST digit set
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
## Not run: 
ds <- mnist_dataset(download = TRUE)
item <- ds[1]
item$x  # image
item$y  # label

qmnist <- qmnist_dataset(split = "train", download = TRUE)
item <- qmnist[1]
item$x
item$y

emnist <- emnist_collection(dataset = "balanced", split = "test", download = TRUE)
item <- emnist[1]
item$x
item$y

kmnist <- kmnist_dataset(download = TRUE, train = FALSE)
fmnist <- fashion_mnist_dataset(download = TRUE, train = TRUE)
## End(Not run)
AlexNet model architecture from the One weird trick... paper.
model_alexnet(pretrained = FALSE, progress = TRUE, ...)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet. |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
... |
Other parameters passed to the model initializer. Currently only
|
Other classification_model:
model_convnext,
model_efficientnet,
model_efficientnet_v2,
model_facenet,
model_inception_v3(),
model_maxvit(),
model_mobilenet_v2(),
model_mobilenet_v3,
model_resnet,
model_vgg,
model_vit
Implements the ConvNeXt architecture from ConvNeXt: A ConvNet for the 2020s
model_convnext_tiny_1k(
  pretrained = FALSE,
  progress = TRUE,
  channels = 3,
  num_classes = 1000,
  ...
)

model_convnext_tiny_22k(
  pretrained = FALSE,
  progress = TRUE,
  channels = 3,
  num_classes = 21841,
  ...
)

model_convnext_small_22k(
  pretrained = FALSE,
  progress = TRUE,
  channels = 3,
  num_classes = 21841,
  ...
)

model_convnext_small_22k1k(
  pretrained = FALSE,
  progress = TRUE,
  channels = 3,
  num_classes = 21841,
  ...
)

model_convnext_base_1k(
  pretrained = FALSE,
  progress = TRUE,
  channels = 3,
  num_classes = 1000,
  ...
)

model_convnext_base_22k(
  pretrained = FALSE,
  progress = TRUE,
  channels = 3,
  num_classes = 21841,
  ...
)

model_convnext_large_1k(
  pretrained = FALSE,
  progress = TRUE,
  channels = 3,
  num_classes = 1000,
  ...
)

model_convnext_large_22k(
  pretrained = FALSE,
  progress = TRUE,
  channels = 3,
  num_classes = 21841,
  ...
)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet. |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
channels |
The number of channels in the input image. Default: 3. |
num_classes |
number of output classes (default: 1000). |
... |
Other parameters passed to the model implementation. |
model_convnext_tiny_1k(): ConvNeXt Tiny model trained on Imagenet 1k.
model_convnext_tiny_22k(): ConvNeXt Tiny model trained on Imagenet 22k.
model_convnext_small_22k(): ConvNeXt Small model trained on Imagenet 22k.
model_convnext_small_22k1k(): ConvNeXt Small model pretrained on Imagenet 1k
and fine-tuned on Imagenet 22k classes.
model_convnext_base_1k(): ConvNeXt Base model trained on Imagenet 1k.
model_convnext_base_22k(): ConvNeXt Base model trained on Imagenet 22k.
model_convnext_large_1k(): ConvNeXt Large model trained on Imagenet 1k.
model_convnext_large_22k(): ConvNeXt Large model trained on Imagenet 22k.
| Model                | Top-1 Acc | Params | GFLOPS | File Size | `num_classes` | image size |
|----------------------|-----------|--------|--------|-----------|---------------|------------|
| convnext_tiny_1k     | 82.1%     | 28M    | 4.5    | 109 MB    | 1000          | 224 x 224  |
| convnext_tiny_22k    | 82.9%     | 29M    | 4.5    | 170 MB    | 21841         | 224 x 224  |
| convnext_small_22k   | 84.6%     | 50M    | 8.7    | 252 MB    | 21841         | 224 x 224  |
| convnext_small_22k1k | 84.6%     | 50M    | 8.7    | 252 MB    | 21841         | 224 x 224  |
| convnext_base_1k     | 85.1%     | 89M    | 15.4   | 338 MB    | 1000          | 224 x 224  |
| convnext_base_22k    | 85.8%     | 89M    | 15.4   | 420 MB    | 21841         | 224 x 224  |
| convnext_large_1k    | 84.3%     | 198M   | 34.4   | 750 MB    | 1000          | 224 x 224  |
| convnext_large_22k   | 86.6%     | 198M   | 34.4   | 880 MB    | 21841         | 224 x 224  |
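The file sizes of the 1k models are roughly what the parameter counts predict. The sketch below is a back-of-the-envelope check under two assumptions that are not stated in the table: weights are stored as 32-bit floats (4 bytes each), and "MB" is read as 2^20 bytes. The 22k variants are left out because their larger classification head is not reflected in the rounded parameter counts.

```r
# Parameter counts in millions and listed file sizes, copied from the table.
params_m  <- c(convnext_tiny_1k = 28,  convnext_base_1k = 89,  convnext_large_1k = 198)
listed_mb <- c(convnext_tiny_1k = 109, convnext_base_1k = 338, convnext_large_1k = 750)

# Assumed: 4 bytes per float32 weight, converted to units of 2^20 bytes.
estimated_mb <- params_m * 1e6 * 4 / 2^20

# Relative gap between the estimate and the listed size.
rel_gap <- abs(estimated_mb - listed_mb) / listed_mb
```

Under these assumptions the estimates stay within a few percent of the listed sizes, which is consistent with rounding in the parameter column.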
Other classification_model:
model_alexnet(),
model_efficientnet,
model_efficientnet_v2,
model_facenet,
model_inception_v3(),
model_maxvit(),
model_mobilenet_v2(),
model_mobilenet_v3,
model_resnet,
model_vgg,
model_vit
## Not run: 
library(magrittr)

# 1. Download sample image (cat)
norm_mean <- c(0.485, 0.456, 0.406) # ImageNet normalization constants, see
# https://pytorch.org/vision/stable/models.html
norm_std <- c(0.229, 0.224, 0.225)
img_url <- "https://en.wikipedia.org/wiki/Special:FilePath/Felis_catus-cat_on_snow.jpg"
img <- base_loader(img_url)

# 2. Convert to tensor (RGB only), resize and normalize
input <- img %>%
  transform_to_tensor() %>%
  transform_resize(c(224, 224)) %>%
  transform_normalize(norm_mean, norm_std)
batch <- input$unsqueeze(1)

# 3. Load pretrained model
model_small <- model_convnext_tiny_1k(pretrained = TRUE, root = tempdir())
model_small$eval()

# 4. Forward pass
output_s <- model_small(batch)

# 5. Show Top-5 predictions
topk <- output_s$topk(k = 5, dim = 2)
indices <- as.integer(topk[[2]][1, ])
scores <- as.numeric(topk[[1]][1, ])
glue::glue("{seq_along(indices)}. {imagenet_classes(indices)} ({round(scores, 2)}%)")
## End(Not run)
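The example prints the top-5 scores followed by a percent sign. If the model returns unnormalized logits, which is assumed here and not confirmed by this page, those values are not percentages until a softmax has been applied over the full output vector. A base-R sketch of that conversion on made-up logits for a hypothetical 5-class model:

```r
# Numerically stable softmax: subtracting the maximum avoids overflow in exp().
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Made-up logits for a hypothetical 5-class model.
logits <- c(8.1, 5.3, 4.9, 2.2, 1.0)
probs <- softmax(logits)

# Probabilities expressed as percentages, rounded for display.
pct <- round(100 * probs, 2)
```

With a torch tensor the same normalization would have to run across all classes before taking the top-k, otherwise the percentages would only be relative to the five selected classes.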
Object detection models combining a ConvNeXt backbone with a Feature Pyramid
Network (FPN) and the Faster R-CNN detection head. The architecture mirrors
model_fasterrcnn_resnet50_fpn(), with the ResNet backbone replaced by
ConvNeXt variants. The design follows the paper
A ConvNet for the 2020s.
model_convnext_tiny_detection()
model_convnext_small_detection()
model_convnext_base_detection()
Accuracy metrics reflect backbone classification performance only. Detection head weights are randomly initialized and must be fine-tuned on task-specific labelled data before meaningful predictions are produced.
| Model                          | Top-1 Acc | Top-5 Acc | Params | GFLOPS | File Size | Backbone Weights             | Notes                    |
|--------------------------------|-----------|-----------|--------|--------|-----------|------------------------------|--------------------------|
| model_convnext_tiny_detection  | 82.5%     | 96.1%     | 28.6M  | 4.46   | 109 MB    | IMAGENET1K_V1                | Tiny backbone, FPN head  |
| model_convnext_small_detection | 83.6%     | 96.7%     | 50.2M  | 8.68   | 192 MB    | IMAGENET1K_V1 (22k pretrain) | Small backbone, FPN head |
| model_convnext_base_detection  | 84.1%     | 96.9%     | 88.6M  | 15.36  | 338 MB    | IMAGENET1K_V1                | Base backbone, FPN head  |
Each ConvNeXt variant produces four feature maps (C2–C5) fed into the FPN. Channel widths differ between Tiny/Small and Base:
| Variant | FPN in_channels        | FPN out_channels |
|---------|------------------------|------------------|
| Tiny    | c(96, 192, 384, 768)   | 256              |
| Small   | c(96, 192, 384, 768)   | 256              |
| Base    | c(128, 256, 512, 1024) | 256              |
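The channel widths in the table follow a simple pattern: each of the four stages doubles the width of the previous one, starting from 96 (Tiny, Small) or 128 (Base). A one-line base-R sketch reproduces the `in_channels` vectors; `stage_widths()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: widths of the four backbone stages, doubling at each stage.
stage_widths <- function(base_width) base_width * 2^(0:3)

tiny_small <- stage_widths(96)   # Tiny and Small variants
base       <- stage_widths(128)  # Base variant
```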
All variants use IMAGENET1K_V1 backbone weights by default (supervised ImageNet-1k).
The Small variant backbone (model_convnext_small_22k) was additionally
pretrained on ImageNet-22k prior to fine-tuning on ImageNet-1k.
Detection head weights are randomly initialized — bounding-box predictions are meaningless without fine-tuning on labelled detection data.
Set pretrained_backbone = TRUE to load ImageNet backbone weights.
model_convnext_tiny_detection(
  num_classes = 91,
  pretrained_backbone = FALSE,
  ...
)

model_convnext_small_detection(
  num_classes = 91,
  pretrained_backbone = FALSE,
  ...
)

model_convnext_base_detection(
  num_classes = 91,
  pretrained_backbone = FALSE,
  ...
)
num_classes |
Number of output classes (default: 91 for COCO, i.e. 90 label ids plus background). |
pretrained_backbone |
Logical. If |
... |
Other arguments (unused). |
model_convnext_tiny_detection(): ConvNeXt Tiny with FPN detection head
model_convnext_small_detection(): ConvNeXt Small with FPN detection head
model_convnext_base_detection(): ConvNeXt Base with FPN detection head
Detection head weights are randomly initialized. Predicted bounding boxes
will be arbitrary until the detection head is trained on labelled data.
Only the backbone benefits from pretrained_backbone = TRUE.
Other object_detection_model:
model_facenet,
model_fasterrcnn,
model_maskrcnn
## Not run: 
library(magrittr)
norm_mean <- c(0.485, 0.456, 0.406) # ImageNet normalization constants
norm_std <- c(0.229, 0.224, 0.225)
url <- paste0("https://upload.wikimedia.org/wikipedia/commons/thumb/",
              "e/ea/Morsan_Normande_vache.jpg/120px-Morsan_Normande_vache.jpg")
img <- base_loader(url) %>%
  transform_to_tensor() %>%
  transform_resize(c(520, 520))
input <- img %>%
  transform_normalize(norm_mean, norm_std)
batch <- input$unsqueeze(1) # Add batch dimension: (1, 3, H, W)

# ConvNeXt Tiny detection
model <- model_convnext_tiny_detection(pretrained_backbone = TRUE)
model$eval()
# Please wait 2 mins + on CPU
pred <- model(batch)$detections[[1]]
num_boxes <- as.integer(pred$boxes$size()[1])
topk <- pred$scores$topk(k = 5)[[2]]
boxes <- pred$boxes[topk, ]
labels <- imagenet_classes(as.integer(pred$labels[topk]))

# `draw_bounding_boxes()` may fail if bbox values are not consistent.
if (num_boxes > 0) {
  boxed <- draw_bounding_boxes(img, boxes, labels = labels)
  tensor_image_browse(boxed)
}
## End(Not run)
Semantic segmentation models that use a ConvNeXt backbone with either an FCN (Fully Convolutional Network) head or a UPerNet (Unified Perceptual Parsing Network) head.
These models follow the architecture patterns from mmsegmentation and can be used for semantic segmentation tasks.
model_convnext_tiny_fcn(
  num_classes = 21,
  aux_loss = FALSE,
  pretrained_backbone = FALSE,
  ...
)

model_convnext_small_fcn(
  num_classes = 21,
  aux_loss = FALSE,
  pretrained_backbone = FALSE,
  ...
)

model_convnext_base_fcn(
  num_classes = 21,
  aux_loss = FALSE,
  pretrained_backbone = FALSE,
  ...
)

model_convnext_tiny_upernet(
  num_classes = 21,
  aux_loss = FALSE,
  pretrained = FALSE,
  pretrained_backbone = FALSE,
  pool_scales = c(1, 2, 3, 6),
  ...
)

model_convnext_small_upernet(
  num_classes = 21,
  aux_loss = FALSE,
  pretrained = FALSE,
  pretrained_backbone = FALSE,
  pool_scales = c(1, 2, 3, 6),
  ...
)

model_convnext_base_upernet(
  num_classes = 21,
  aux_loss = FALSE,
  pretrained = FALSE,
  pretrained_backbone = FALSE,
  pool_scales = c(1, 2, 3, 6),
  ...
)
num_classes |
Number of output segmentation classes. Default: 21 (PASCAL VOC). |
aux_loss |
If TRUE, includes an auxiliary classifier branch. Default: FALSE. |
pretrained_backbone |
If TRUE, loads ImageNet pretrained weights for the ConvNeXt backbone. Default: FALSE. |
... |
Additional arguments passed to the backbone. |
pretrained |
If TRUE, loads convnext pretrained weights of backbone and segmentation heads. |
pool_scales |
Numeric vector. Pooling scales used in the Pyramid Pooling Module for UPerNet models. Default: c(1, 2, 3, 6). |
An nn_module representing the segmentation model.
model_convnext_tiny_fcn(): ConvNeXt Tiny with FCN head
model_convnext_small_fcn(): ConvNeXt Small with FCN head
model_convnext_base_fcn(): ConvNeXt Base with FCN head
model_convnext_tiny_upernet(): ConvNeXt Tiny with UPerNet head
model_convnext_small_upernet(): ConvNeXt Small with UPerNet head
model_convnext_base_upernet(): ConvNeXt Base with UPerNet head
model_convnext_tiny_fcn()
model_convnext_small_fcn()
model_convnext_base_fcn()
model_convnext_tiny_upernet()
model_convnext_small_upernet()
model_convnext_base_upernet()
Other semantic_segmentation_model:
model_deeplabv3,
model_fcn_resnet
## Not run:
library(magrittr)
norm_mean <- c(0.485, 0.456, 0.406) # ImageNet normalization constants
norm_std <- c(0.229, 0.224, 0.225)

# Use a publicly available image
wmc <- "https://upload.wikimedia.org/wikipedia/commons/thumb/"
url <- "e/ea/Morsan_Normande_vache.jpg/120px-Morsan_Normande_vache.jpg"
img <- base_loader(paste0(wmc, url))

input <- img %>%
  transform_to_tensor() %>%
  transform_resize(c(520, 520)) %>%
  transform_normalize(norm_mean, norm_std)
batch <- input$unsqueeze(1)

# ConvNeXt Tiny FCN segmentation
model <- model_convnext_tiny_fcn(num_classes = 21, pretrained_backbone = TRUE)
model$eval()
output <- model(batch)

# Visualize result
segmented <- draw_segmentation_masks(input, output$out$squeeze(1))
tensor_image_display(segmented)

# ConvNeXt Tiny UPerNet segmentation
model <- model_convnext_tiny_upernet(num_classes = 21, pretrained_backbone = TRUE)
model$eval()
output <- model(batch)

# Visualize result
segmented <- draw_segmentation_masks(input, output$out$squeeze(1))
tensor_image_display(segmented)
## End(Not run)
Semantic segmentation models implementing the DeepLabV3 architecture from Rethinking Atrous Convolution for Semantic Image Segmentation. These models use Atrous Spatial Pyramid Pooling (ASPP) to capture multi-scale context, and are available with ResNet-50 and ResNet-101 backbones.
model_deeplabv3_resnet50()
model_deeplabv3_resnet101()
All models are trained on a 20-class subset of COCO that corresponds to Pascal VOC categories, plus background (21 classes total).
| Model | mIoU | Pixel Acc | Params | GFLOPS | File Size | Weights Used |
|---------------------------|-------|-----------|--------|--------|-----------|---------------------------|
| model_deeplabv3_resnet50 | 66.4% | 92.4% | 42.0M | 178.72 | 161 MB | COCO_WITH_VOC_LABELS_V1 |
| model_deeplabv3_resnet101 | 67.4% | 92.4% | 61.0M | 258.74 | 233 MB | COCO_WITH_VOC_LABELS_V1 |
All models use COCO_WITH_VOC_LABELS_V1 weights, trained on COCO with the
20 Pascal VOC categories (+ background = 21 classes).
Backbone weights default to IMAGENET1K_V1 (supervised ImageNet-1k) when
pretrained = FALSE and pretrained_backbone = TRUE.
When pretrained = TRUE, backbone weights are overridden by the full
segmentation model weights and pretrained_backbone is ignored.
The auxiliary classifier branch (aux_loss) is automatically enabled when
loading pretrained weights; set explicitly when training from scratch.
Models expect input tensors of shape (batch_size, 3, H, W), normalized
with ImageNet mean c(0.485, 0.456, 0.406) and std c(0.229, 0.224, 0.225).
Training resolution is 520x520.
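The normalization described above amounts to a per-channel shift and scale. The following is a minimal base R sketch of that arithmetic, not the package's implementation: `normalize_chw()` is a hypothetical helper, and the input is assumed to be a plain numeric array of shape (3, H, W) with values in [0, 1].

```r
# Illustrative per-channel normalization in base R (no torch required).
norm_mean <- c(0.485, 0.456, 0.406)
norm_std <- c(0.229, 0.224, 0.225)

# Hypothetical helper: `img` is a numeric array of dim (3, H, W) in [0, 1].
normalize_chw <- function(img, mean, std) {
  stopifnot(length(dim(img)) == 3, dim(img)[1] == length(mean))
  # sweep() applies one statistic per channel (margin 1).
  centered <- sweep(img, 1, mean, "-")
  sweep(centered, 1, std, "/")
}

img <- array(0.5, dim = c(3, 4, 4)) # a flat grey 4x4 image
out <- normalize_chw(img, norm_mean, norm_std)
out[, 1, 1] # one normalized pixel, one value per channel
```

In practice, `transform_normalize(norm_mean, norm_std)` performs this step on tensors, as the examples in this manual show.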
Returns a named list with:
$out — main segmentation logits, shape (batch, num_classes, H, W)
$aux — auxiliary logits from an intermediate backbone layer (only when aux_loss = TRUE)
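To make the shape of `$out` concrete, here is a small base R sketch of how logits of shape (num_classes, H, W) for one image can be reduced to a class map and a majority class. The random array only stands in for real model output (for example after `as.array()` on one batch element), and treating class id 1 as background follows the convention used in the examples of this page.

```r
set.seed(1)
num_classes <- 21
H <- 4
W <- 5

# Stand-in for one image's logits: (num_classes, H, W).
logits <- array(rnorm(num_classes * H * W), dim = c(num_classes, H, W))

# Per-pixel class id (1-based): position of the largest logit along the class axis.
class_map <- apply(logits, c(2, 3), which.max)

# Pixel counts per class; zero out background (class id 1) before picking a winner.
counts <- tabulate(class_map, nbins = num_classes)
counts[1] <- 0L
top_class <- which.max(counts)
```

With real model output, the same reduction can be done on tensors with `$argmax(dim = 2)`, as in the example at the end of this page.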
model_deeplabv3_resnet50(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 21,
  aux_loss = NULL,
  pretrained_backbone = FALSE,
  ...
)

model_deeplabv3_resnet101(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 21,
  aux_loss = NULL,
  pretrained_backbone = FALSE,
  ...
)
pretrained |
(bool): If TRUE, returns a model with pretrained segmentation weights (COCO_WITH_VOC_LABELS_V1). |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
num_classes |
Integer. Number of output segmentation classes including background. Default: 21. |
aux_loss |
Logical or |
pretrained_backbone |
Logical. If |
... |
Other parameters passed to the resnet model. |
model_deeplabv3_resnet50(): DeepLabV3 with ResNet-50 backbone
model_deeplabv3_resnet101(): DeepLabV3 with ResNet-101 backbone
Other semantic_segmentation_model:
model_convnext_segmentation,
model_fcn_resnet
## Not run:
library(magrittr)
norm_mean <- c(0.485, 0.456, 0.406)
norm_std <- c(0.229, 0.224, 0.225)

url <- paste0("https://upload.wikimedia.org/wikipedia/commons/thumb/",
              "e/ea/Morsan_Normande_vache.jpg/120px-Morsan_Normande_vache.jpg")
img <- base_loader(url)

input <- img %>%
  transform_to_tensor() %>%
  transform_resize(c(520, 520)) %>%
  transform_normalize(norm_mean, norm_std)
batch <- input$unsqueeze(1) # Add batch dimension: (1, 3, H, W)

# --- ResNet-50 backbone ---
model <- model_deeplabv3_resnet50(pretrained = TRUE)
model$eval()
output <- model(batch)

segmented <- draw_segmentation_masks(input, output$out$squeeze(1))
tensor_image_browse(segmented)

# Show most frequent class
mask_id <- output$out$argmax(dim = 2) # (1, H, W)
class_contingency_with_background <- mask_id$view(-1)$bincount()
class_contingency_with_background[1] <- 0L # reset the counter for background (class id 1)
top_class_index <- class_contingency_with_background$argmax()$item()
cli::cli_inform("Majority class {.pkg ResNet-50}: {.emph {pascal_voc_classes(top_class_index)}}")

# --- ResNet-101 backbone ---
model <- model_deeplabv3_resnet101(pretrained = TRUE)
model$eval()
output <- model(batch)

segmented <- draw_segmentation_masks(input, output$out$squeeze(1))
tensor_image_browse(segmented)

# Show most frequent class
mask_id <- output$out$argmax(dim = 2) # (1, H, W)
class_contingency_with_background <- mask_id$view(-1)$bincount()
class_contingency_with_background[1] <- 0L # reset the counter for background (class id 1)
top_class_index <- class_contingency_with_background$argmax()$item()
cli::cli_inform("Majority class {.pkg ResNet-101}: {.emph {pascal_voc_classes(top_class_index)}}")
## End(Not run)
Constructs EfficientNet model architectures as described in EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. These models are designed for image classification tasks and provide a balance between accuracy and computational efficiency through compound scaling.
model_efficientnet_b0(pretrained = FALSE, progress = TRUE, ...)

model_efficientnet_b1(pretrained = FALSE, progress = TRUE, ...)

model_efficientnet_b2(pretrained = FALSE, progress = TRUE, ...)

model_efficientnet_b3(pretrained = FALSE, progress = TRUE, ...)

model_efficientnet_b4(pretrained = FALSE, progress = TRUE, ...)

model_efficientnet_b5(pretrained = FALSE, progress = TRUE, ...)

model_efficientnet_b6(pretrained = FALSE, progress = TRUE, ...)

model_efficientnet_b7(pretrained = FALSE, progress = TRUE, ...)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet. |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
... |
Other parameters passed to the model implementation, such as
|
model_efficientnet_b0(): EfficientNet B0 model
model_efficientnet_b1(): EfficientNet B1 model
model_efficientnet_b2(): EfficientNet B2 model
model_efficientnet_b3(): EfficientNet B3 model
model_efficientnet_b4(): EfficientNet B4 model
model_efficientnet_b5(): EfficientNet B5 model
model_efficientnet_b6(): EfficientNet B6 model
model_efficientnet_b7(): EfficientNet B7 model
Image classification with 1000 output classes by default (ImageNet).
The models expect input tensors of shape (batch_size, 3, H, W), where H and W
should typically be 224 for B0 and scaled versions for B1–B7 (e.g., B7 uses 600x600).
| Model | Width | Depth | Resolution | Params (M) | GFLOPs | Top-1 Acc. |
|-------|-------|-------|------------|------------|--------|------------|
| B0 | 1.0 | 1.0 | 224 | 5.3 | 0.39 | 77.1 |
| B1 | 1.0 | 1.1 | 240 | 7.8 | 0.70 | 79.1 |
| B2 | 1.1 | 1.2 | 260 | 9.2 | 1.00 | 80.1 |
| B3 | 1.2 | 1.4 | 300 | 12.0 | 1.80 | 81.6 |
| B4 | 1.4 | 1.8 | 380 | 19.0 | 4.20 | 82.9 |
| B5 | 1.6 | 2.2 | 456 | 30.0 | 9.90 | 83.6 |
| B6 | 1.8 | 2.6 | 528 | 43.0 | 19.0 | 84.0 |
| B7 | 2.0 | 3.1 | 600 | 66.0 | 37.0 | 84.3 |
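As a convenience, the resolution column of the table can be kept as a lookup so that each variant is fed its native input size. This is only a sketch: `efficientnet_resolution` and `efficientnet_input_shape()` are hypothetical names and are not part of the package.

```r
# Native input resolution per variant, copied from the table above.
efficientnet_resolution <- c(
  b0 = 224, b1 = 240, b2 = 260, b3 = 300,
  b4 = 380, b5 = 456, b6 = 528, b7 = 600
)

# Hypothetical helper: expected input shape (batch_size, 3, H, W) for a variant.
efficientnet_input_shape <- function(variant, batch_size = 1) {
  res <- efficientnet_resolution[[variant]] # errors on an unknown variant name
  c(batch_size, 3, res, res)
}

efficientnet_input_shape("b5")
```

The resulting vector can be used to build a test batch, for instance with `torch::torch_randn()`, or as the target size for `transform_resize()`.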
Other classification_model:
model_alexnet(),
model_convnext,
model_efficientnet_v2,
model_facenet,
model_inception_v3(),
model_maxvit(),
model_mobilenet_v2(),
model_mobilenet_v3,
model_resnet,
model_vgg,
model_vit
## Not run:
model <- model_efficientnet_b0()
image_batch <- torch::torch_randn(1, 3, 224, 224)
output <- model(image_batch)
imagenet_classes(which.max(as.numeric(output)))
## End(Not run)

## Not run:
# Example of using EfficientNet-B5 with its native image size
model <- model_efficientnet_b5()
image_batch <- torch::torch_randn(1, 3, 456, 456)
output <- model(image_batch)
imagenet_classes(which.max(as.numeric(output)))
## End(Not run)
Constructs EfficientNetV2 model architectures as described in EfficientNetV2: Smaller Models and Faster Training.
model_efficientnet_v2_s(pretrained = FALSE, progress = TRUE, ...)

model_efficientnet_v2_m(pretrained = FALSE, progress = TRUE, ...)

model_efficientnet_v2_l(pretrained = FALSE, progress = TRUE, ...)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet. |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
... |
Other parameters passed to the model implementation, such as
|
model_efficientnet_v2_s(): EfficientNetV2-S model
model_efficientnet_v2_m(): EfficientNetV2-M model
model_efficientnet_v2_l(): EfficientNetV2-L model
Image classification with 1000 output classes by default (ImageNet).
The models expect input tensors of shape (batch_size, 3, H, W).
Typical values for H and W are 384 for V2-S, 480 for V2-M,
and 512 for V2-L.
| Model | Resolution | Params (M) | GFLOPs | Top-1 Acc. |
|-------|------------|------------|--------|------------|
| V2-S | 384 | 24 | 8.4 | 83.9 |
| V2-M | 480 | 55 | 24 | 85.1 |
| V2-L | 512 | 119 | 55 | 85.7 |
Other classification_model:
model_alexnet(),
model_convnext,
model_efficientnet,
model_facenet,
model_inception_v3(),
model_maxvit(),
model_mobilenet_v2(),
model_mobilenet_v3,
model_resnet,
model_vgg,
model_vit
## Not run:
model <- model_efficientnet_v2_s()
input <- torch::torch_randn(1, 3, 224, 224)
output <- model(input)

# Show Top-5 predictions
topk <- output$topk(k = 5, dim = 2)
indices <- as.integer(topk[[2]][1, ])
scores <- as.numeric(topk[[1]][1, ])
glue::glue("{seq_along(indices)}. {imagenet_classes(indices)} ({round(scores, 2)}%)")
## End(Not run)
These models implement the three-stage Multi-task Cascaded Convolutional Networks (MTCNN) architecture from the paper Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks.
model_facenet_pnet(pretrained = TRUE, progress = FALSE, ...)

model_facenet_rnet(pretrained = TRUE, progress = FALSE, ...)

model_facenet_onet(pretrained = TRUE, progress = FALSE, ...)

model_mtcnn(pretrained = TRUE, progress = TRUE, ...)

model_facenet_inception_resnet_v1(
  pretrained = NULL,
  classify = FALSE,
  num_classes = 10,
  dropout_prob = 0.6,
  ...
)
pretrained |
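(bool or character): If TRUE, returns a model with pretrained weights. For model_facenet_inception_resnet_v1(), the examples pass "vggface2" or "casia-webface" to select the weight set. |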
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
... |
Other parameters passed to the model implementation. |
classify |
Logical, whether to include the classification head. Default is FALSE. |
num_classes |
Integer, number of output classes for classification. Default is 10. |
dropout_prob |
Numeric, dropout probability applied before classification. Default is 0.6. |
MTCNN detects faces and facial landmarks in an image through a coarse-to-fine pipeline:
PNet (Proposal Network): Generates candidate face bounding boxes at multiple scales.
RNet (Refine Network): Refines candidate boxes, rejecting false positives.
ONet (Output Network): Produces final bounding boxes and 5-point facial landmarks.
| Model | Input Size | Parameters | File Size | Outputs | Notes |
|-------|----------------|------------|-----------|-------------------------------|-----------------------------------|
| PNet | ~12×12+ | ~3k | 30 kB | 2-class face prob + bbox reg | Fully conv, sliding window stage |
| RNet | 24×24 | ~30k | 400 kB | 2-class face prob + bbox reg | Dense layers, higher recall |
| ONet | 48×48 | ~100k | 2 MB | 2-class prob + bbox + 5-point | Landmark detection stage |
Inception-ResNet-v1 is a convolutional neural network architecture combining Inception modules with residual connections, designed for face recognition tasks. The model achieves high accuracy on standard face verification benchmarks such as LFW (Labeled Faces in the Wild).
| Weights | LFW Accuracy | File Size |
|----------------|--------------|-----------|
| CASIA-Webface | 99.05% | 111 MB |
| VGGFace2 | 99.65% | 107 MB |
The CASIA-Webface pretrained weights provide strong baseline accuracy.
The VGGFace2 pretrained weights achieve higher accuracy, benefiting from a larger, more diverse dataset.
model_mtcnn() returns a named list with three elements:
boxes: A tensor of shape (N, 4) with bounding box coordinates [x1, y1, x2, y2].
landmarks: A tensor of shape (N, 10) with (x, y) coordinates of 5 facial landmarks:
left eye, right eye, nose, left mouth corner, right mouth corner.
cls: A tensor of shape (N, 2) with face classification probabilities
(face / non-face). The cls head has two classes:
1: Non-face probability (background)
2: Face probability — use this value for thresholding detections
(Here, N is the number of detected faces in the input image.)
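The note on thresholding can be illustrated with plain matrices standing in for the `boxes` and `cls` tensors. The values and the 0.6 cut-off below are made up for illustration; only the column convention (column 2 holds the face probability) comes from the description above.

```r
# Stand-in values for three candidate detections (N = 3).
boxes <- matrix(c(10, 10, 50, 60,
                  20, 15, 40, 45,
                  100, 80, 160, 150),
                ncol = 4, byrow = TRUE,
                dimnames = list(NULL, c("x1", "y1", "x2", "y2")))
cls <- matrix(c(0.05, 0.95,
                0.70, 0.30,
                0.20, 0.80),
              ncol = 2, byrow = TRUE)

face_prob <- cls[, 2] # column 2 holds the face probability
threshold <- 0.6      # hypothetical cut-off, tune for your data
keep <- which(face_prob > threshold)
faces <- boxes[keep, , drop = FALSE]
```

Using `drop = FALSE` keeps `faces` a matrix even when a single detection survives the threshold.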
model_facenet_inception_resnet_v1() returns a tensor output depending on the classify argument:
When classify = FALSE (default):
A tensor of shape (N, 512), where each row is a normalized embedding
vector (L2 norm = 1).
These 512-dimensional FaceNet embeddings can be compared using cosine
similarity or Euclidean distance for face verification and clustering.
When classify = TRUE:
A tensor of shape (N, num_classes) containing class logits.
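For embeddings returned with classify = FALSE, the two comparison measures mentioned above are closely related because the vectors have unit length. The base R sketch below uses random unit vectors in place of real embeddings; `l2_normalize()` is a hypothetical helper.

```r
# Hypothetical helper: scale a vector to unit L2 norm.
l2_normalize <- function(v) v / sqrt(sum(v^2))

set.seed(42)
emb_a <- l2_normalize(rnorm(512)) # stand-in for one 512-d embedding
emb_b <- l2_normalize(rnorm(512)) # stand-in for another

# Cosine similarity of unit vectors is just their dot product.
cosine_sim <- sum(emb_a * emb_b)
euclid <- sqrt(sum((emb_a - emb_b)^2))

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
all.equal(euclid^2, 2 - 2 * cosine_sim)
```

Because of this identity, ranking pairs by cosine similarity or by Euclidean distance gives the same ordering on normalized embeddings; any decision threshold for verification has to be chosen on your own data.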
model_facenet_pnet(): PNet (Proposal Network) — small fully-convolutional network for candidate face box generation.
model_facenet_rnet(): RNet (Refine Network) — medium CNN with dense layers for refining and rejecting false positives.
model_facenet_onet(): ONet (Output Network) — deeper CNN that outputs final bounding boxes and 5 facial landmark points.
model_mtcnn(): MTCNN (Multi-task Cascaded Convolutional Networks) — face detection and alignment using a cascade of three neural networks
model_facenet_inception_resnet_v1(): Inception-ResNet-v1 — high-accuracy face recognition model combining Inception modules with residual connections, pretrained on VGGFace2 and CASIA-Webface datasets
Other object_detection_model:
model_convnext_detection,
model_fasterrcnn,
model_maskrcnn
Other classification_model:
model_alexnet(),
model_convnext,
model_efficientnet,
model_efficientnet_v2,
model_inception_v3(),
model_maxvit(),
model_mobilenet_v2(),
model_mobilenet_v3,
model_resnet,
model_vgg,
model_vit
## Not run:
# Example usage of PNet
model_pnet <- model_facenet_pnet(pretrained = TRUE)
model_pnet$eval()
input_pnet <- torch::torch_randn(1, 3, 224, 224)
output_pnet <- model_pnet(input_pnet)
output_pnet

# Example usage of RNet
model_rnet <- model_facenet_rnet(pretrained = TRUE)
model_rnet$eval()
input_rnet <- torch::torch_randn(1, 3, 24, 24)
output_rnet <- model_rnet(input_rnet)
output_rnet

# Example usage of ONet
model_onet <- model_facenet_onet(pretrained = TRUE)
model_onet$eval()
input_onet <- torch::torch_randn(1, 3, 48, 48)
output_onet <- model_onet(input_onet)
output_onet

# Example usage of MTCNN
mtcnn <- model_mtcnn(pretrained = TRUE)
mtcnn$eval()
image_tensor <- torch::torch_randn(c(1, 3, 224, 224))
out <- mtcnn(image_tensor)
out

# Load an image from the web
url <- paste0("https://upload.wikimedia.org/wikipedia/commons",
              "/b/b4/Catherine_Bell_200101233d_hr_%28cropped%29.jpg")
tmp_file <- tempfile(fileext = ".jpg")
download.file(url, tmp_file, mode = "wb")
img <- jpeg::readJPEG(tmp_file)

# Convert to torch tensor [C, H, W] normalized
input <- transform_to_tensor(img) # [C, H, W]
batch <- input$unsqueeze(1) # [1, C, H, W]

# Example usage of Inception-ResNet-v1 with VGGFace2 weights
model <- model_facenet_inception_resnet_v1(pretrained = "vggface2")
model$eval()
output <- model(batch)
output

# Example usage of Inception-ResNet-v1 with CASIA-Webface weights
model <- model_facenet_inception_resnet_v1(pretrained = "casia-webface")
model$eval()
output <- model(batch)
output
## End(Not run)
Constructs Faster R-CNN model variants for object detection tasks.
model_fasterrcnn_resnet50_fpn(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 90,
  score_thresh = 0.05,
  nms_thresh = 0.5,
  detections_per_img = 100,
  ...
)

model_fasterrcnn_resnet50_fpn_v2(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 90,
  score_thresh = 0.05,
  nms_thresh = 0.5,
  detections_per_img = 100,
  ...
)

model_fasterrcnn_mobilenet_v3_large_fpn(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 90,
  score_thresh = 0.05,
  nms_thresh = 0.5,
  detections_per_img = 100,
  ...
)

model_fasterrcnn_mobilenet_v3_large_320_fpn(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 90,
  score_thresh = 0.05,
  nms_thresh = 0.5,
  detections_per_img = 100,
  ...
)
pretrained |
Logical. If TRUE, loads pretrained weights from local file. |
progress |
Logical. Show progress bar during download (unused). |
num_classes |
Number of output classes excluding background (default: 90 for COCO). |
score_thresh |
Numeric. Minimum score threshold for detections (default: 0.05). |
nms_thresh |
Numeric. Non-Maximum Suppression (NMS) IoU threshold for removing overlapping boxes (default: 0.5). |
detections_per_img |
Integer. Maximum number of detections per image (default: 100). |
... |
Other arguments (unused). |
A fasterrcnn_model nn_module.
model_fasterrcnn_resnet50_fpn(): Faster R-CNN with ResNet-50 FPN
model_fasterrcnn_resnet50_fpn_v2(): Faster R-CNN with ResNet-50 FPN V2
model_fasterrcnn_mobilenet_v3_large_fpn(): Faster R-CNN with MobileNet V3 Large FPN
model_fasterrcnn_mobilenet_v3_large_320_fpn(): Faster R-CNN with MobileNet V3 Large 320 FPN
Object detection over images with bounding boxes and class labels.
Input images should be torch_tensors of shape
(batch_size, 3, H, W) where H and W are typically around 800.
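The interplay of the score_thresh, nms_thresh and detections_per_img arguments can be sketched in base R as a score filter followed by greedy non-maximum suppression. This is a simplified, class-agnostic illustration under stated assumptions, not the package's implementation (which may, for example, apply NMS per class); `iou_one_to_many()` and `filter_detections()` are hypothetical helpers.

```r
# IoU between one box and a matrix of boxes, all in (x1, y1, x2, y2) format.
iou_one_to_many <- function(box, boxes) {
  iw <- pmax(0, pmin(box[3], boxes[, 3]) - pmax(box[1], boxes[, 1]))
  ih <- pmax(0, pmin(box[4], boxes[, 4]) - pmax(box[2], boxes[, 2]))
  inter <- iw * ih
  area <- function(b) (b[, 3] - b[, 1]) * (b[, 4] - b[, 2])
  union <- area(matrix(box, nrow = 1)) + area(boxes) - inter
  inter / union
}

# Hypothetical post-processing: score filter, greedy NMS, then a cap on detections.
filter_detections <- function(boxes, scores, score_thresh = 0.05,
                              nms_thresh = 0.5, detections_per_img = 100) {
  candidates <- which(scores > score_thresh)
  candidates <- candidates[order(scores[candidates], decreasing = TRUE)]
  keep <- integer(0)
  while (length(candidates) > 0 && length(keep) < detections_per_img) {
    best <- candidates[1]
    keep <- c(keep, best)
    rest <- candidates[-1]
    if (length(rest) == 0) break
    overlap <- iou_one_to_many(boxes[best, ], boxes[rest, , drop = FALSE])
    candidates <- rest[overlap <= nms_thresh] # drop boxes overlapping the winner
  }
  keep
}

boxes <- matrix(c(0, 0, 10, 10,
                  1, 1, 11, 11,
                  20, 20, 30, 30,
                  0, 0, 5, 5),
                ncol = 4, byrow = TRUE)
scores <- c(0.9, 0.8, 0.7, 0.01)
kept <- filter_detections(boxes, scores)
```

In this toy input, the fourth box falls below score_thresh and the second overlaps the first with an IoU of about 0.68, so only boxes 1 and 3 remain. Raising nms_thresh keeps more overlapping boxes; lowering detections_per_img truncates the list after sorting by score.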
model_fasterrcnn_resnet50_fpn()
model_fasterrcnn_resnet50_fpn_v2()
model_fasterrcnn_mobilenet_v3_large_fpn()
model_fasterrcnn_mobilenet_v3_large_320_fpn()
Other object_detection_model:
model_convnext_detection,
model_facenet,
model_maskrcnn
## Not run:
library(magrittr)
# ImageNet normalization constants, see https://pytorch.org/vision/stable/models.html
norm_mean <- c(0.485, 0.456, 0.406)
norm_std <- c(0.229, 0.224, 0.225)

# Use a publicly available image of an animal
url <- paste0("https://upload.wikimedia.org/wikipedia/commons/thumb/",
              "e/ea/Morsan_Normande_vache.jpg/120px-Morsan_Normande_vache.jpg")
image <- magick_loader(url) %>%
  transform_to_tensor() %>%
  transform_resize(c(520, 520))

# ResNet backbone requires image normalization
input <- image %>%
  transform_normalize(norm_mean, norm_std)
batch_normalized <- input$unsqueeze(1) # Add batch dimension (1, 3, H, W)

# ResNet-50 FPN V2
model <- model_fasterrcnn_resnet50_fpn_v2(pretrained = TRUE, detections_per_img = 5)
model$eval()
torch::with_no_grad({pred <- model(batch_normalized)$detections[[1]]})

# Visualize boxes
labels <- coco_classes(as.integer(pred$labels))
boxed <- draw_bounding_boxes(image, pred$boxes, labels = labels)
tensor_image_browse(boxed)

# MobileNet V3 Large 320 FPN
batch <- image$unsqueeze(1) # Add batch dimension (1, 3, H, W)
model <- model_fasterrcnn_mobilenet_v3_large_320_fpn(
  pretrained = TRUE,
  score_thresh = 0.02,
  nms_thresh = 0.8,
  detections_per_img = 5
)
model$eval()
torch::with_no_grad({pred <- model(batch)$detections[[1]]})

# Visualize boxes
labels <- coco_classes(as.integer(pred$labels))
boxed <- draw_bounding_boxes(image, pred$boxes, labels = labels)
tensor_image_browse(boxed)
## End(Not run)
Constructs an FCN (Fully Convolutional Network) model for semantic image segmentation, based on a ResNet backbone as described in Fully Convolutional Networks for Semantic Segmentation.
model_fcn_resnet50(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 21,
  aux_loss = NULL,
  pretrained_backbone = TRUE,
  ...
)

model_fcn_resnet101(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 21,
  aux_loss = NULL,
  pretrained_backbone = TRUE,
  ...
)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet. |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
num_classes |
Number of output classes. Default: 21. |
aux_loss |
If TRUE, includes the auxiliary classifier. If NULL, defaults to TRUE when |
pretrained_backbone |
If TRUE, uses a backbone pre-trained on ImageNet. |
... |
Additional arguments passed to the backbone implementation. |
The 21 output classes follow the PASCAL VOC convention:
background, aeroplane, bicycle, bird, boat,
bottle, bus, car, cat, chair,
cow, dining table, dog, horse, motorbike,
person, potted plant, sheep, sofa, train,
tv/monitor.
Pretrained weights require num_classes = 21.
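A class map produced by the model can be translated into names with a plain lookup vector. The sketch below assumes that class ids are 1-based and follow the order listed above, with background as id 1; `voc_labels` is a hypothetical name, and the package's own `pascal_voc_classes()` helper, used in other examples of this manual, is the authoritative mapping.

```r
# The 21 PASCAL VOC labels in the order listed above (assumed 1-based ids).
voc_labels <- c(
  "background", "aeroplane", "bicycle", "bird", "boat",
  "bottle", "bus", "car", "cat", "chair",
  "cow", "dining table", "dog", "horse", "motorbike",
  "person", "potted plant", "sheep", "sofa", "train",
  "tv/monitor"
)

class_ids <- c(1, 9, 13) # example ids, e.g. taken from an argmax over the class axis
voc_labels[class_ids]
```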
An nn_module representing the FCN model.
Other semantic_segmentation_model:
model_convnext_segmentation,
model_deeplabv3
## Not run:
library(magrittr)
norm_mean <- c(0.485, 0.456, 0.406) # ImageNet normalization constants, see
# https://pytorch.org/vision/stable/models.html
norm_std <- c(0.229, 0.224, 0.225)

img_url <- "https://en.wikipedia.org/wiki/Special:FilePath/Felis_catus-cat_on_snow.jpg"
img <- base_loader(img_url)

input <- img %>%
  transform_to_tensor() %>%
  transform_resize(c(520, 520)) %>%
  transform_normalize(norm_mean, norm_std)
batch <- input$unsqueeze(1)

model <- model_fcn_resnet50(pretrained = TRUE)
model$eval()
output <- model(batch)

# Visualize the result
# `draw_segmentation_masks()` turns the torch_float output into a boolean mask internally:
segmented <- draw_segmentation_masks(input, output$out$squeeze(1))
tensor_image_display(segmented)

model <- model_fcn_resnet101(pretrained = TRUE)
model$eval()
output <- model(batch)

# Visualize the result
segmented <- draw_segmentation_masks(input, output$out$squeeze(1))
tensor_image_display(segmented)
## End(Not run)
Architecture from Rethinking the Inception Architecture for Computer Vision. The required minimum input size of the model is 75x75.
model_inception_v3(pretrained = FALSE, progress = TRUE, ...)
pretrained |
(bool): If |
progress |
(bool): If |
... |
Used to pass keyword arguments to the Inception module:
|
Important: In contrast to the other models, inception_v3 expects tensors of size N x 3 x 299 x 299, so ensure your images are sized accordingly.
Other classification_model:
model_alexnet(),
model_convnext,
model_efficientnet,
model_efficientnet_v2,
model_facenet,
model_maxvit(),
model_mobilenet_v2(),
model_mobilenet_v3,
model_resnet,
model_vgg,
model_vit
Constructs Mask R-CNN model variants for the instance segmentation task. Mask R-CNN extends Faster R-CNN by adding a mask prediction branch that outputs a segmentation mask for each detected object.
model_maskrcnn_resnet50_fpn(pretrained = FALSE, progress = TRUE, num_classes = 90, score_thresh = 0.05, nms_thresh = 0.5, detections_per_img = 100, ...)
model_maskrcnn_resnet50_fpn_v2(pretrained = FALSE, progress = TRUE, num_classes = 90, score_thresh = 0.05, nms_thresh = 0.5, detections_per_img = 100, ...)
pretrained |
Logical. If TRUE, loads pretrained weights from local file. |
progress |
Logical. Show progress bar during download (unused). |
num_classes |
Number of output classes excluding background (default: 90 for COCO). |
score_thresh |
Numeric. Minimum score threshold for detections (default: 0.05). |
nms_thresh |
Numeric. Non-Maximum Suppression (NMS) IoU threshold for removing overlapping boxes (default: 0.5). |
detections_per_img |
Integer. Maximum number of detections per image (default: 100). |
... |
Other arguments (unused). |
A maskrcnn_model nn_module.
model_maskrcnn_resnet50_fpn(): Mask R-CNN with ResNet-50 FPN
model_maskrcnn_resnet50_fpn_v2(): Mask R-CNN with ResNet-50 FPN V2
Instance segmentation over images with bounding boxes, class labels, and segmentation masks.
Input images should be torch_tensors of shape
(batch_size, 3, H, W) where H and W are typically around 800.
Returns a list with:
features: Feature maps from the backbone
detections: List containing:
boxes: Bounding boxes (N, 4)
labels: Class labels (N)
scores: Confidence scores (N)
masks: Segmentation masks (N, 28, 28)
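As a rough illustration of how score_thresh and detections_per_img shape the returned detections, the base R sketch below filters a mock detection list. The `pred` list, its values, and the `filter_detections()` helper are hypothetical; they use plain R vectors and matrices rather than torch tensors, and the model applies its own filtering internally.

```r
# Hypothetical helper: keep detections scoring at least `score_thresh`,
# then retain at most `detections_per_img` of them, highest score first.
filter_detections <- function(pred, score_thresh = 0.05, detections_per_img = 100) {
  keep <- which(pred$scores >= score_thresh)
  keep <- keep[order(pred$scores[keep], decreasing = TRUE)]
  keep <- head(keep, detections_per_img)
  list(
    boxes  = pred$boxes[keep, , drop = FALSE],
    labels = pred$labels[keep],
    scores = pred$scores[keep]
  )
}

# Mock detections with made-up values (one row of `boxes` per detection)
pred <- list(
  boxes  = rbind(c(10, 10, 50, 50), c(12, 11, 52, 49), c(100, 80, 160, 140), c(0, 0, 5, 5)),
  labels = c(21L, 21L, 18L, 1L),
  scores = c(0.92, 0.40, 0.75, 0.01)
)

filtered <- filter_detections(pred, score_thresh = 0.05, detections_per_img = 2)
filtered$labels
filtered$scores
```

Raising score_thresh trades recall for precision, while detections_per_img only caps how many of the surviving detections are returned.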
model_maskrcnn_resnet50_fpn()
model_maskrcnn_resnet50_fpn_v2()
Other object_detection_model:
model_convnext_detection,
model_facenet,
model_fasterrcnn
    ## Not run:
    library(magrittr)
    # ImageNet normalization constants, see https://pytorch.org/vision/stable/models.html
    norm_mean <- c(0.485, 0.456, 0.406)
    norm_std <- c(0.229, 0.224, 0.225)
    # Load an image
    url <- paste0("https://upload.wikimedia.org/wikipedia/commons/thumb/",
                  "e/ea/Morsan_Normande_vache.jpg/120px-Morsan_Normande_vache.jpg")
    img <- base_loader(url)
    input <- img %>%
      transform_to_tensor() %>%
      transform_resize(c(800, 800)) %>%
      transform_normalize(norm_mean, norm_std)
    batch <- input$unsqueeze(1)
    # Mask R-CNN ResNet-50 FPN
    model <- model_maskrcnn_resnet50_fpn(pretrained = TRUE, detections_per_img = 5)
    model$eval()
    torch::with_no_grad({
      pred <- model(batch)$detections[[1]]
    })
    # Visualize boxes
    labels <- coco_classes(as.integer(pred$labels))
    boxed <- draw_bounding_boxes(input, pred$boxes, labels = labels)
    tensor_image_browse(boxed)
    ## End(Not run)
Implementation of the MaxViT architecture described in MaxViT: Multi-Axis Vision Transformer. The model performs image classification and by default returns logits for 1000 ImageNet classes.
model_maxvit(pretrained = FALSE, progress = TRUE, num_classes = 1000, ...)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet. |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
num_classes |
(integer) Number of output classes. |
... |
Additional parameters passed to the model initializer. |
Other classification_model:
model_alexnet(),
model_convnext,
model_efficientnet,
model_efficientnet_v2,
model_facenet,
model_inception_v3(),
model_mobilenet_v2(),
model_mobilenet_v3,
model_resnet,
model_vgg,
model_vit
    ## Not run:
    library(magrittr)
    # 1. Load the basketball image
    img_url <- "https://upload.wikimedia.org/wikipedia/commons/7/7a/Basketball.png"
    img <- base_loader(img_url)
    # 2. Define normalization (ImageNet)
    norm_mean <- c(0.485, 0.456, 0.406)
    norm_std <- c(0.229, 0.224, 0.225)
    # 3. Preprocess: convert to tensor, resize, normalize
    input <- img %>%
      transform_to_tensor() %>%
      transform_resize(c(400, 400)) %>%
      transform_normalize(norm_mean, norm_std)
    batch <- input$unsqueeze(1) # Add batch dimension (1, 3, H, W)
    # 4. Display the preprocessed image
    tensor_image_browse(input)
    # 5. Load MaxViT model
    model <- model_maxvit(pretrained = TRUE)
    model$eval()
    # 6. Run inference
    output <- model(batch)
    topk <- output$topk(k = 5, dim = 2)
    indices <- as.integer(topk[[2]][1, ])
    scores <- as.numeric(topk[[1]][1, ])
    # 7. Show Top-5 predictions
    glue::glue("{seq_along(indices)}. {imagenet_classes(indices)} ({round(scores, 2)}%)")
    ## End(Not run)
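Because the model returns logits, the top-k scores are unnormalized values rather than probabilities. A minimal base R sketch of converting logits into percentages with a softmax is shown here; the class names and logit values are made up for illustration and do not come from the model.

```r
# Numerically stable softmax over a vector of logits
softmax <- function(logits) {
  shifted <- logits - max(logits)
  exp(shifted) / sum(exp(shifted))
}

# Made-up logits for five hypothetical classes
logits <- c(cat = 2.0, dog = 1.0, ball = 4.0, car = 0.5, tree = -1.0)
probs <- softmax(logits)

# Top-3 classes with probabilities expressed as percentages
top <- sort(probs, decreasing = TRUE)[1:3]
sprintf("%d. %s (%.2f%%)", seq_along(top), names(top), 100 * top)
```

Subtracting the maximum logit before exponentiating leaves the result unchanged but avoids overflow for large logits.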
Constructs a MobileNetV2 architecture from MobileNetV2: Inverted Residuals and Linear Bottlenecks.
model_mobilenet_v2(pretrained = FALSE, progress = TRUE, ...)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet. |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
... |
Other parameters passed to the model implementation. |
Other classification_model:
model_alexnet(),
model_convnext,
model_efficientnet,
model_efficientnet_v2,
model_facenet,
model_inception_v3(),
model_maxvit(),
model_mobilenet_v3,
model_resnet,
model_vgg,
model_vit
MobileNetV3 is a state-of-the-art lightweight convolutional neural network architecture designed for mobile and embedded vision applications. This implementation follows the design and optimizations presented in the original paper: Searching for MobileNetV3.
model_mobilenet_v3_large_quantized() mirrors torchvision::quantization::mobilenet_v3_large and
loads quantized weights when pretrained is TRUE.
model_mobilenet_v3_large(pretrained = FALSE, progress = TRUE, num_classes = 1000, width_mult = 1)
model_mobilenet_v3_small(pretrained = FALSE, progress = TRUE, num_classes = 1000, width_mult = 1)
model_mobilenet_v3_large_quantized(pretrained = FALSE, progress = TRUE, ...)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet. |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
num_classes |
number of output classes (default: 1000). |
width_mult |
width multiplier for model scaling (default: 1.0). |
... |
Other parameters passed to the model implementation. |
The model includes two variants:
model_mobilenet_v3_large()
model_mobilenet_v3_small()
Both variants utilize efficient blocks such as inverted residuals, squeeze-and-excitation (SE) modules, and hard-swish activations for improved accuracy and efficiency.
| Model | Top-1 Acc | Top-5 Acc | Params | GFLOPS | File Size | Notes |
|---|---|---|---|---|---|---|
| MobileNetV3 Large | 74.04% | 91.34% | 5.48M | 0.22 | 21.1 MB | Trained from scratch, simple recipe |
| MobileNetV3 Small | 67.67% | 87.40% | 2.54M | 0.06 | 9.8 MB | Improved recipe over original paper |
model_mobilenet_v3_large(): MobileNetV3 Large model with about 5.5 million parameters.
model_mobilenet_v3_small(): MobileNetV3 Small model with about 2.5 million parameters.
Other classification_model:
model_alexnet(),
model_convnext,
model_efficientnet,
model_efficientnet_v2,
model_facenet,
model_inception_v3(),
model_maxvit(),
model_mobilenet_v2(),
model_resnet,
model_vgg,
model_vit
    ## Not run:
    library(magrittr)
    # 1. Download sample image (cat)
    # ImageNet normalization constants, see
    # https://pytorch.org/vision/stable/models.html
    norm_mean <- c(0.485, 0.456, 0.406)
    norm_std <- c(0.229, 0.224, 0.225)
    img_url <- "https://en.wikipedia.org/wiki/Special:FilePath/Felis_catus-cat_on_snow.jpg"
    img <- base_loader(img_url)
    # 2. Convert to tensor (RGB only), resize and normalize
    input <- img %>%
      transform_to_tensor() %>%
      transform_resize(c(224, 224)) %>%
      transform_normalize(norm_mean, norm_std)
    batch <- input$unsqueeze(1)
    # 3. Load pretrained models
    model_small <- model_mobilenet_v3_small(pretrained = TRUE)
    model_small$eval()
    # 4. Forward pass
    output_s <- model_small(batch)
    # 5. Top-5 printing helper
    topk <- output_s$topk(k = 5, dim = 2)
    indices <- as.integer(topk[[2]][1, ])
    scores <- as.numeric(topk[[1]][1, ])
    # 6. Show Top-5 predictions
    glue::glue("{seq_along(indices)}. {imagenet_classes(indices)} ({round(scores, 2)}%)")
    # 7. Same with large model (pass the batched tensor, not the single image)
    model_large <- model_mobilenet_v3_large(pretrained = TRUE)
    model_large$eval()
    output_l <- model_large(batch)
    topk <- output_l$topk(k = 5, dim = 2)
    indices <- as.integer(topk[[2]][1, ])
    scores <- as.numeric(topk[[1]][1, ])
    glue::glue("{seq_along(indices)}. {imagenet_classes(indices)} ({round(scores, 2)}%)")
    ## End(Not run)
ResNet model implementations from Deep Residual Learning for Image Recognition and later related papers (see Functions).
model_resnet18(pretrained = FALSE, progress = TRUE, ...)
model_resnet34(pretrained = FALSE, progress = TRUE, ...)
model_resnet50(pretrained = FALSE, progress = TRUE, ...)
model_resnet101(pretrained = FALSE, progress = TRUE, ...)
model_resnet152(pretrained = FALSE, progress = TRUE, ...)
model_resnext50_32x4d(pretrained = FALSE, progress = TRUE, ...)
model_resnext101_32x8d(pretrained = FALSE, progress = TRUE, ...)
model_wide_resnet50_2(pretrained = FALSE, progress = TRUE, ...)
model_wide_resnet101_2(pretrained = FALSE, progress = TRUE, ...)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet. |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
... |
Other parameters passed to the resnet model. |
model_resnet18(): ResNet 18-layer model
model_resnet34(): ResNet 34-layer model
model_resnet50(): ResNet 50-layer model
model_resnet101(): ResNet 101-layer model
model_resnet152(): ResNet 152-layer model
model_resnext50_32x4d(): ResNeXt-50 32x4d model from "Aggregated Residual Transformations for Deep Neural Networks"
with 32 groups, each having a width of 4.
model_resnext101_32x8d(): ResNeXt-101 32x8d model from "Aggregated Residual Transformations for Deep Neural Networks"
with 32 groups, each having a width of 8.
model_wide_resnet50_2(): Wide ResNet-50-2 model from "Wide Residual Networks"
with width per group of 128.
model_wide_resnet101_2(): Wide ResNet-101-2 model from "Wide Residual Networks"
with width per group of 128.
Other classification_model:
model_alexnet(),
model_convnext,
model_efficientnet,
model_efficientnet_v2,
model_facenet,
model_inception_v3(),
model_maxvit(),
model_mobilenet_v2(),
model_mobilenet_v3,
model_vgg,
model_vit
VGG model implementations based on Very Deep Convolutional Networks For Large-Scale Image Recognition.
model_vgg11(pretrained = FALSE, progress = TRUE, ...)
model_vgg11_bn(pretrained = FALSE, progress = TRUE, ...)
model_vgg13(pretrained = FALSE, progress = TRUE, ...)
model_vgg13_bn(pretrained = FALSE, progress = TRUE, ...)
model_vgg16(pretrained = FALSE, progress = TRUE, ...)
model_vgg16_bn(pretrained = FALSE, progress = TRUE, ...)
model_vgg19(pretrained = FALSE, progress = TRUE, ...)
model_vgg19_bn(pretrained = FALSE, progress = TRUE, ...)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr |
... |
other parameters passed to the VGG model implementation. |
model_vgg11(): VGG 11-layer model (configuration "A")
model_vgg11_bn(): VGG 11-layer model (configuration "A") with batch normalization
model_vgg13(): VGG 13-layer model (configuration "B")
model_vgg13_bn(): VGG 13-layer model (configuration "B") with batch normalization
model_vgg16(): VGG 16-layer model (configuration "D")
model_vgg16_bn(): VGG 16-layer model (configuration "D") with batch normalization
model_vgg19(): VGG 19-layer model (configuration "E")
model_vgg19_bn(): VGG 19-layer model (configuration "E") with batch normalization
Other classification_model:
model_alexnet(),
model_convnext,
model_efficientnet,
model_efficientnet_v2,
model_facenet,
model_inception_v3(),
model_maxvit(),
model_mobilenet_v2(),
model_mobilenet_v3,
model_resnet,
model_vit
Vision Transformer (ViT) models implement the architecture proposed in the paper An Image is Worth 16x16 Words. These models are designed for image classification tasks and operate by treating image patches as tokens in a Transformer model.
model_vit_b_16(pretrained = FALSE, progress = TRUE, ...)
model_vit_b_32(pretrained = FALSE, progress = TRUE, ...)
model_vit_l_16(pretrained = FALSE, progress = TRUE, ...)
model_vit_l_32(pretrained = FALSE, progress = TRUE, ...)
model_vit_h_14(pretrained = FALSE, progress = TRUE, ...)
pretrained |
(bool): If TRUE, returns a model pre-trained on ImageNet. |
progress |
(bool): If TRUE, displays a progress bar of the download to stderr. |
... |
Other parameters passed to the model implementation. |
| Model | Top-1 Acc | Top-5 Acc | Params | GFLOPS | File Size | Weights Used | Notes |
|---|---|---|---|---|---|---|---|
| vit_b_16 | 81.1% | 95.3% | 86.6M | 17.56 | 346 MB | IMAGENET1K_V1 | Base, 16x16 patches |
| vit_b_32 | 75.9% | 92.5% | 88.2M | 4.41 | 353 MB | IMAGENET1K_V1 | Base, 32x32 patches |
| vit_l_16 | 79.7% | 94.6% | 304.3M | 61.55 | 1.22 GB | IMAGENET1K_V1 | Large, 16x16 patches |
| vit_l_32 | 77.0% | 93.1% | 306.5M | 15.38 | 1.23 GB | IMAGENET1K_V1 | Large, 32x32 patches |
| vit_h_14 | 88.6% | 98.7% | 633.5M | 1016.7 | 2.53 GB | IMAGENET1K_SWAG_E2E_V1 | Huge, 14x14 patches |
TorchVision Recipe: https://github.com/pytorch/vision/tree/main/references/classification
SWAG Recipe: https://github.com/facebookresearch/SWAG
Weights Selection:
All models except vit_h_14 use the default IMAGENET1K_V1 weights for consistency, stability, and official support from TorchVision.
These are supervised weights trained on ImageNet-1k.
For vit_h_14, the default weight is IMAGENET1K_SWAG_E2E_V1, pretrained on SWAG and fine-tuned on ImageNet.
model_vit_b_16(): ViT-B/16 model (Base, 16×16 patch size)
model_vit_b_32(): ViT-B/32 model (Base, 32×32 patch size)
model_vit_l_16(): ViT-L/16 model (Large, 16×16 patch size)
model_vit_l_32(): ViT-L/32 model (Large, 32×32 patch size)
model_vit_h_14(): ViT-H/14 model (Huge, 14×14 patch size)
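To make the patch sizes concrete, the following base R sketch counts how many patch tokens each variant would produce for a square input. The 224-pixel input size is an assumption chosen for illustration (pretrained weights may expect a different resolution), `n_patches()` is a hypothetical helper, and the count ignores any extra tokens such as a class token.

```r
# Number of patch tokens for a square image split into square patches.
# `image_size` is an illustrative assumption; it must be divisible by `patch_size`.
n_patches <- function(image_size, patch_size) {
  stopifnot(image_size %% patch_size == 0)
  (image_size %/% patch_size)^2
}

# Patch sizes taken from the variant names listed above
patch_sizes <- c(vit_b_16 = 16, vit_b_32 = 32, vit_l_16 = 16, vit_l_32 = 32, vit_h_14 = 14)
tokens <- vapply(patch_sizes, n_patches, numeric(1), image_size = 224)
tokens
```

Smaller patches give more tokens for the same image, which is consistent with the higher GFLOPS reported for the 16x16 and 14x14 variants in the table.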
Other classification_model:
model_alexnet(),
model_convnext,
model_efficientnet,
model_efficientnet_v2,
model_facenet,
model_inception_v3(),
model_maxvit(),
model_mobilenet_v2(),
model_mobilenet_v3,
model_resnet,
model_vgg
Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU). NMS iteratively removes lower scoring boxes which have an IoU greater than iou_threshold with another (higher scoring) box.
nms(boxes, scores, iou_threshold)
boxes |
(Tensor[N, 4]): boxes to perform NMS on. They are
expected to be in
|
scores |
(Tensor[N]): scores for each one of the boxes |
iou_threshold |
(float): discards all overlapping boxes with IoU > iou_threshold |
If multiple boxes have the exact same score and satisfy the IoU criterion with respect to a reference box, the selected box is not guaranteed to be the same between CPU and GPU. This is similar to the behavior of argsort in torch when repeated values are present.
The current algorithm has a time complexity of O(n^2) and runs in native R. It may be improved in the future by an Rcpp implementation or an alternative algorithm.
keep (Tensor): int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores.
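The greedy procedure described above can be sketched in base R as follows. This is an illustrative sketch on plain matrices, not the package's implementation: `box_iou_one()` and `nms_sketch()` are hypothetical names, and the boxes are assumed to be in (xmin, ymin, xmax, ymax) corner format.

```r
# IoU between one box and each row of a matrix of boxes.
# Assumption: boxes are (xmin, ymin, xmax, ymax) with xmin < xmax and ymin < ymax.
box_iou_one <- function(box, boxes) {
  iw <- pmax(0, pmin(box[3], boxes[, 3]) - pmax(box[1], boxes[, 1]))
  ih <- pmax(0, pmin(box[4], boxes[, 4]) - pmax(box[2], boxes[, 2]))
  inter <- iw * ih
  area_box <- (box[3] - box[1]) * (box[4] - box[2])
  areas <- (boxes[, 3] - boxes[, 1]) * (boxes[, 4] - boxes[, 2])
  inter / (area_box + areas - inter)
}

# Greedy NMS: visit boxes by decreasing score and drop any remaining box
# whose IoU with the current box exceeds `iou_threshold`.
nms_sketch <- function(boxes, scores, iou_threshold) {
  candidates <- order(scores, decreasing = TRUE)
  keep <- integer(0)
  while (length(candidates) > 0) {
    current <- candidates[1]
    keep <- c(keep, current)
    rest <- candidates[-1]
    if (length(rest) == 0) break
    iou <- box_iou_one(boxes[current, ], boxes[rest, , drop = FALSE])
    candidates <- rest[iou <= iou_threshold]
  }
  keep
}

boxes <- rbind(
  c(0, 0, 10, 10),
  c(1, 1, 11, 11),   # overlaps heavily with the first box
  c(20, 20, 30, 30)  # disjoint from the others
)
scores <- c(0.9, 0.8, 0.7)
nms_sketch(boxes, scores, iou_threshold = 0.5)
```

Each kept box is compared against all remaining candidates, which is where the quadratic worst-case cost comes from. The returned indices are ordered by decreasing score, matching the description of keep.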
Oxford-IIIT Pet Datasets
oxfordiiitpet_dataset(root = tempdir(), train = TRUE, transform = NULL, target_transform = NULL, download = FALSE)
oxfordiiitpet_binary_dataset(root = tempdir(), train = TRUE, transform = NULL, target_transform = NULL, download = FALSE)
root |
Character. Root directory where the dataset is stored or will be downloaded to. Files are placed under |
train |
Logical. If TRUE, use the training set; otherwise, use the test set. Not applicable to all datasets. |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
The Oxford-IIIT Pet collection is a classification dataset consisting of high-quality images of 37 cat and dog breeds. It includes two variants:
oxfordiiitpet_dataset: Multi-class classification across 37 pet breeds.
oxfordiiitpet_binary_dataset: Binary classification distinguishing cats vs dogs.
The Oxford-IIIT Pet dataset contains over 7,000 images across 37 categories, with roughly 200 images per class. Each image is labeled with its breed and species (cat/dog).
A torch dataset object oxfordiiitpet_dataset or oxfordiiitpet_binary_dataset.
Each element is a named list with:
x: A H x W x 3 integer array representing an RGB image.
y: An integer label:
For oxfordiiitpet_dataset: a value from 1–37 representing the breed.
For oxfordiiitpet_binary_dataset: 1 for Cat, 2 for Dog.
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
    ## Not run:
    # Multi-class version
    oxford <- oxfordiiitpet_dataset(download = TRUE)
    first_item <- oxford[1]
    first_item$x # RGB image
    first_item$y # Label in 1–37
    oxford$classes[first_item$y] # Breed name
    # Binary version
    oxford_bin <- oxfordiiitpet_binary_dataset(download = TRUE)
    first_item <- oxford_bin[1]
    first_item$x # RGB image
    first_item$y # 1 for Cat, 2 for Dog
    oxford_bin$classes[first_item$y] # "Cat" or "Dog"
    ## End(Not run)
The Oxford-IIIT Pet Dataset is a segmentation dataset consisting of color images of 37 pet breeds (cats and dogs). Each image is annotated with a pixel-level trimap segmentation mask, identifying pet, background, and outline regions. It is commonly used for evaluating models on object segmentation tasks.
oxfordiiitpet_segmentation_dataset(root = tempdir(), train = TRUE, target_type = "category", transform = NULL, target_transform = NULL, download = FALSE)
root |
Character. Root directory where the dataset is stored or will be downloaded to. Files are placed under |
train |
Logical. If TRUE, use the training set; otherwise, use the test set. Not applicable to all datasets. |
target_type |
Character. One of |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
A torch dataset object oxfordiiitpet_dataset. Each item is a named list:
x: a H x W x 3 integer array representing an RGB image.
y$masks: a boolean tensor of shape (3, H, W), representing the segmentation trimap as one-hot masks.
y$label: an integer representing the class label, depending on the target_type:
"category": an integer in 1–37 indicating the pet breed.
"binary-category": 1 for Cat, 2 for Dog.
Other segmentation_dataset:
coco_segmentation_dataset(),
pascal_voc_datasets,
rf100_peixos_segmentation_dataset()
    ## Not run:
    # Load the Oxford-IIIT Pet dataset with basic tensor transform
    oxfordiiitpet <- oxfordiiitpet_segmentation_dataset(
      transform = transform_to_tensor,
      download = TRUE
    )
    # Retrieve the image tensor and label (trimap in raw format)
    first_item <- oxfordiiitpet[1]
    first_item$x # RGB image tensor of shape (3, H, W)
    first_item$y$trimap # (H, W) integer tensor: 1=pet, 2=background, 3=outline
    first_item$y$label # Integer label (1–37 or 1–2 depending on target_type)
    oxfordiiitpet$classes[first_item$y$label] # Class name of the label
    # Load dataset with explicit segmentation mask transformation
    oxfordiiitpet_masked <- oxfordiiitpet_segmentation_dataset(
      transform = transform_to_tensor,
      target_transform = target_transform_trimap_masks,
      download = TRUE
    )
    masked_item <- oxfordiiitpet_masked[1]
    masked_item$y$masks # (3, H, W) bool tensor: pet, background, outline
    # Visualize segmentation masks
    overlay <- draw_segmentation_masks(masked_item)
    tensor_image_browse(overlay)
    ## End(Not run)
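The relationship between the raw trimap and the one-hot masks can be sketched in base R. The `trimap_to_masks()` helper below is hypothetical and works on a plain integer matrix rather than a torch tensor; it is meant only to illustrate the idea of the conversion, not to reproduce target_transform_trimap_masks.

```r
# Hypothetical helper: turn an H x W integer trimap into a 3 x H x W logical array,
# one channel per trimap code (1 = pet, 2 = background, 3 = outline).
trimap_to_masks <- function(trimap) {
  masks <- array(FALSE, dim = c(3, nrow(trimap), ncol(trimap)))
  for (code in 1:3) {
    masks[code, , ] <- trimap == code
  }
  masks
}

# Tiny made-up 2 x 3 trimap
trimap <- matrix(c(1L, 3L, 2L,
                   1L, 1L, 2L), nrow = 2, byrow = TRUE)
masks <- trimap_to_masks(trimap)
dim(masks)
```

Every pixel belongs to exactly one channel, so summing the masks across the first dimension gives 1 everywhere.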
The Pascal Visual Object Classes (VOC) dataset is a widely used benchmark for object detection and semantic segmentation tasks in computer vision.
pascal_voc_classes(class_id = 1:21)
class_id |
Integer vector of 1-based class identifiers. Must be within [1, 21]. |
This dataset provides RGB images along with per-pixel class segmentation masks for 20 object categories, plus a background class. Each pixel in the mask is labeled with a class index corresponding to one of the predefined semantic categories.
The VOC dataset was released in yearly editions (2007 to 2012), with slight variations in data splits and annotation formats.
Notably, only the 2007 edition includes a separate test split; all other years (2008–2012) provide only the train, val, and trainval splits.
The dataset defines 21 semantic classes: "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair",
"cow", "dining table", "dog", "horse", "motorbike", "person", "potted plant", "sheep", "sofa", "train", and "tv/monitor".
They are available through the classes variable of the dataset object.
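A 1-based lookup over these class names can be sketched in base R. The `voc_lookup()` helper is a hypothetical stand-in, and the sketch assumes that the order in which the classes are listed above matches their index order.

```r
# The 21 class names listed above (assumed to be in index order)
voc_class_names <- c(
  "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
  "car", "cat", "chair", "cow", "dining table", "dog", "horse", "motorbike",
  "person", "potted plant", "sheep", "sofa", "train", "tv/monitor"
)

# Hypothetical stand-in for a 1-based class lookup restricted to [1, 21]
voc_lookup <- function(class_id = 1:21) {
  stopifnot(all(class_id >= 1), all(class_id <= 21))
  voc_class_names[class_id]
}

voc_lookup(c(1, 9, 16))
```

Because the identifiers are 1-based, the background class sits at index 1 rather than 0, which differs from the usual Python convention.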
Other class_resolution:
caltech_classes(),
coco_classes(),
imagenet_classes()
This dataset is frequently used for training and evaluating semantic segmentation models, and supports tasks requiring dense, per-pixel annotations.
pascal_segmentation_dataset(root = tempdir(), year = "2012", split = "train", transform = NULL, target_transform = NULL, download = FALSE)
pascal_detection_dataset(root = tempdir(), year = "2012", split = "train", transform = NULL, target_transform = NULL, download = FALSE)
root |
Character. Root directory where the dataset will be stored under |
year |
Character. VOC dataset version to use. One of |
split |
Character. One of |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
A torch dataset of class pascal_segmentation_dataset.
The returned list inherits class image_with_segmentation_mask, which allows generic visualization
utilities to be applied.
Each element is a named list with the following structure:
x: a H x W x 3 array representing the RGB image.
y: A named list containing:
masks: A torch_tensor of dtype bool and shape (21, H, W), representing a multi-channel segmentation mask.
Each of the 21 channels corresponds to one Pascal VOC class.
labels: An integer vector indicating the indices of the classes present in the mask.
A torch dataset of class pascal_detection_dataset.
The returned list inherits class image_with_bounding_box, which allows generic visualization
utilities to be applied.
Each element is a named list:
x: a H x W x 3 array representing the RGB image.
y: a list with:
labels: a character vector with object class names.
boxes: a tensor of shape (N, 4) with bounding box coordinates in (xmin, ymin, xmax, ymax) format.
Other segmentation_dataset:
coco_segmentation_dataset(),
oxfordiiitpet_segmentation_dataset(),
rf100_peixos_segmentation_dataset()
Other detection_dataset:
coco_detection_dataset(),
rf100_biology_collection(),
rf100_damage_collection(),
rf100_document_collection(),
rf100_infrared_collection(),
rf100_medical_collection(),
rf100_underwater_collection()
    ## Not run:
    # Load Pascal VOC segmentation dataset (2007 train split)
    pascal_seg <- pascal_segmentation_dataset(
      transform = transform_to_tensor,
      download = TRUE,
      year = "2007"
    )
    # Access the first image and its mask
    first_item <- pascal_seg[1]
    first_item$x # Image
    first_item$y$masks # Segmentation mask
    first_item$y$labels # Unique class labels in the mask
    pascal_voc_classes(first_item$y$labels) # Class names
    # Visualise the first image and its mask
    masked_img <- draw_segmentation_masks(first_item)
    tensor_image_browse(masked_img)
    # Load Pascal VOC detection dataset (2007 train split)
    pascal_det <- pascal_detection_dataset(
      transform = transform_to_tensor,
      download = TRUE,
      year = "2007"
    )
    # Access the first image and its bounding boxes
    first_item <- pascal_det[1]
    first_item$x # Image
    first_item$y$labels # Object labels
    first_item$y$boxes # Bounding box tensor
    # Visualise the first image with bounding boxes
    boxed_img <- draw_bounding_boxes(first_item)
    tensor_image_browse(boxed_img)
    ## End(Not run)
Loads the MIT Places365 dataset for scene classification.
places365_dataset(
  root = tempdir(),
  split = c("train", "val", "test"),
  transform = NULL,
  target_transform = NULL,
  download = FALSE,
  loader = magick_loader
)

places365_dataset_large(
  root = tempdir(),
  split = c("train", "val", "test"),
  transform = NULL,
  target_transform = NULL,
  download = FALSE,
  loader = magick_loader
)
root |
Root directory for dataset storage. The dataset will be stored under |
split |
One of |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
loader |
A function to load an image given its path. Defaults to
|
The dataset provides three splits: "train", "val", and "test".
Folder structure and image layout on disk are handled internally by the loader.
This function downloads and prepares the smaller 256x256 image version (~30 GB).
For the high-resolution variant (~160 GB), use places365_dataset_large().
Note that images in the large version come in varying sizes, so resizing may be
needed before batching.
The test split corresponds to the private evaluation set used in the
Places365 challenge. Annotation files are not publicly released, so only the
images are provided.
A torch dataset of class places365_dataset. Each element is a named
list with:
x: the image as loaded (or transformed if transform is set).
y: the integer class label. For the test split, no labels are available
and y will always be NA.
places365_dataset_large(): High resolution variant (~160 GB).
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
## Not run: 
ds <- places365_dataset(
  split = "val",
  download = TRUE,
  transform = transform_to_tensor
)
item <- ds[1]
tensor_image_browse(item$x)

# Show class index and label
label_idx <- item$y
label_name <- ds$classes[label_idx]
label_idx  # Class Index
label_name  # Name of the Label

dl <- dataloader(ds, batch_size = 2)
batch <- dataloader_next(dataloader_make_iter(dl))
batch$x

ds_large <- places365_dataset_large(
  split = "val",
  download = TRUE,
  transform = . %>% transform_to_tensor() %>% transform_resize(c(256, 256))
)
dl <- torch::dataloader(dataset = ds_large, batch_size = 2)
batch <- dataloader_next(dataloader_make_iter(dl))
batch$x
## End(Not run)
Removes boxes that have at least one side smaller than min_size.
remove_small_boxes(boxes, min_size)
boxes |
(Tensor[N, 4]): boxes in
|
min_size |
(float): minimum size |
keep (Tensor[K]): indices of the boxes that have both sides larger than min_size
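The filtering rule can be sketched in base R, without torch. This is an illustration only, not the package's implementation: it assumes each box is stored as an (x1, y1, x2, y2) row, which the text above does not confirm, and it treats a side exactly equal to min_size as large enough. The helper name keep_large_boxes is hypothetical.

```r
# Hypothetical helper: returns the row indices of boxes whose width and
# height are both at least `min_size`.
# Assumption: each row of `boxes` is (x1, y1, x2, y2).
keep_large_boxes <- function(boxes, min_size) {
  widths  <- boxes[, 3] - boxes[, 1]
  heights <- boxes[, 4] - boxes[, 2]
  which(widths >= min_size & heights >= min_size)
}

boxes <- rbind(
  c(0, 0, 10, 10),  # 10 x 10: both sides pass
  c(0, 0,  2, 10),  # width 2: too narrow
  c(5, 5, 20,  6)   # height 1: too short
)
keep <- keep_large_boxes(boxes, min_size = 3)
```

Indexing the original boxes with the result, as in boxes[keep, , drop = FALSE], yields the surviving boxes.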
Loads one of the RoboFlow 100 Biology datasets with bounding box annotations for object-detection tasks.
rf100_biology_collection(
  dataset,
  split = c("train", "test", "valid"),
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)
dataset |
Dataset to select within |
split |
the subset of the dataset to choose between |
transform |
Optional transform function applied to the image. |
target_transform |
Optional transform function applied to the target. |
download |
Logical. If TRUE, downloads the dataset if not present at |
A torch dataset. Each element is a named list with:
x: H x W x 3 array representing the image, auto-oriented and stretched to 640 x 640.
y: a list containing the target with:
image_id: numeric identifier of the x image.
labels: numeric identifier of the N bounding-box object class.
boxes: a torch_tensor of shape (N, 4) with bounding boxes, each in format.
The returned item inherits the class image_with_bounding_box so it can be
visualised with helper functions such as draw_bounding_boxes().
Other detection_dataset:
coco_detection_dataset(),
pascal_voc_datasets,
rf100_damage_collection(),
rf100_document_collection(),
rf100_infrared_collection(),
rf100_medical_collection(),
rf100_underwater_collection()
## Not run: 
ds <- rf100_biology_collection(
  dataset = "stomata_cell",
  split = "test",
  transform = transform_to_tensor,
  download = TRUE
)
item <- ds[1]
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)
## End(Not run)
Loads one of the RoboFlow 100 Damage & Risk assessment datasets with bounding box annotations for object-detection tasks.
rf100_damage_collection(
  dataset,
  split = c("train", "test", "valid"),
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)
dataset |
Dataset to select within |
split |
the subset of the dataset to choose between |
transform |
Optional transform function applied to the image. |
target_transform |
Optional transform function applied to the target. |
download |
Logical. If TRUE, downloads the dataset if not present at |
A torch dataset. Each element is a named list with:
x: H x W x 3 array representing the image, auto-oriented and stretched to 640 x 640.
y: a list containing the target with:
image_id: numeric identifier of the x image.
labels: numeric identifier of the N bounding-box object class.
boxes: a torch_tensor of shape (N, 4) with bounding boxes, each in format.
The returned item inherits the class image_with_bounding_box so it can be
visualised with helper functions such as draw_bounding_boxes().
Other detection_dataset:
coco_detection_dataset(),
pascal_voc_datasets,
rf100_biology_collection(),
rf100_document_collection(),
rf100_infrared_collection(),
rf100_medical_collection(),
rf100_underwater_collection()
## Not run: 
ds <- rf100_damage_collection(
  dataset = "solar_panel",
  split = "test",
  transform = transform_to_tensor,
  download = TRUE
)
item <- ds[1]
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)
## End(Not run)
RoboFlow 100 Document dataset Collection
rf100_document_collection(
  dataset,
  split = c("train", "test", "valid"),
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)
dataset |
Dataset to select within |
split |
the subset of the dataset to choose between |
transform |
Optional transform function applied to the image. |
target_transform |
Optional transform function applied to the target. |
download |
Logical. If TRUE, downloads the dataset if not present at |
Loads one of the RoboFlow 100 Document datasets with bounding box annotations for object-detection tasks.
A torch dataset. Each element is a named list with:
x: H x W x 3 array representing the image, auto-oriented and stretched to 640 x 640.
y: a list containing the target with:
image_id: numeric identifier of the x image.
labels: numeric identifier of the N bounding-box object class.
boxes: a torch_tensor of shape (N, 4) with bounding boxes, each in format.
The returned item inherits the class image_with_bounding_box so it can be
visualised with helper functions such as draw_bounding_boxes().
Other detection_dataset:
coco_detection_dataset(),
pascal_voc_datasets,
rf100_biology_collection(),
rf100_damage_collection(),
rf100_infrared_collection(),
rf100_medical_collection(),
rf100_underwater_collection()
## Not run: 
ds <- rf100_document_collection(
  dataset = "tweeter_post",
  split = "train",
  transform = transform_to_tensor,
  download = TRUE
)

# Retrieve a sample and inspect annotations
item <- ds[1]
item$y$labels
item$y$boxes

# Draw bounding boxes and display the image
boxed_img <- draw_bounding_boxes(item)
tensor_image_browse(boxed_img)
## End(Not run)
Loads one of the RoboFlow 100 Infrared datasets with per-dataset folders and train/valid/test splits.
rf100_infrared_collection(
  dataset,
  split = c("train", "test", "valid"),
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)
dataset |
Dataset to select within |
split |
the subset of the dataset to choose between |
transform |
Optional transform function applied to the image. |
target_transform |
Optional transform function applied to the target. |
download |
Logical. If TRUE, downloads the dataset if not present at |
A torch dataset. Each element is a named list with:
x: H x W x 3 array representing the image, auto-oriented and stretched to 640 x 640.
y: a list containing the target with:
image_id: numeric identifier of the x image.
labels: numeric identifier of the N bounding-box object class.
boxes: a torch_tensor of shape (N, 4) with bounding boxes, each in format.
The returned item inherits the class image_with_bounding_box so it can be
visualised with helper functions such as draw_bounding_boxes().
Other detection_dataset:
coco_detection_dataset(),
pascal_voc_datasets,
rf100_biology_collection(),
rf100_damage_collection(),
rf100_document_collection(),
rf100_medical_collection(),
rf100_underwater_collection()
## Not run: 
ds <- rf100_infrared_collection(
  dataset = "thermal_dog_and_people",
  split = "test",
  transform = transform_to_tensor,
  download = TRUE
)
item <- ds[1]
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)
## End(Not run)
Loads one of the RoboFlow 100 Medical datasets with per-dataset folders and train/valid/test splits.
rf100_medical_collection(
  dataset,
  split = c("train", "test", "valid"),
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)
dataset |
Dataset to select within |
split |
the subset of the dataset to choose between |
transform |
Optional transform function applied to the image. |
target_transform |
Optional transform function applied to the target. |
download |
Logical. If TRUE, downloads the dataset if not present at |
A torch dataset. Each element is a named list with:
x: H x W x 3 array representing the image, auto-oriented and stretched to 640 x 640.
y: a list containing the target with:
image_id: numeric identifier of the x image.
labels: numeric identifier of the N bounding-box object class.
boxes: a torch_tensor of shape (N, 4) with bounding boxes, each in format.
The returned item inherits the class image_with_bounding_box so it can be
visualised with helper functions such as draw_bounding_boxes().
Other detection_dataset:
coco_detection_dataset(),
pascal_voc_datasets,
rf100_biology_collection(),
rf100_damage_collection(),
rf100_document_collection(),
rf100_infrared_collection(),
rf100_underwater_collection()
## Not run: 
ds <- rf100_medical_collection(
  dataset = "rheumatology",
  split = "test",
  transform = transform_to_tensor,
  download = TRUE
)
item <- ds[1]
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)
## End(Not run)
Loads the Roboflow 100 "peixos" dataset for semantic segmentation. "peixos" contains 3 splits of respectively 821 / 118 / 251 color images of size 640 x 640. Segmentation masks are generated on-the-fly from polygon annotations of the unique "fish" category.
rf100_peixos_segmentation_dataset(
  split = c("train", "test", "valid"),
  root = tempdir(),
  download = FALSE,
  transform = NULL,
  target_transform = NULL
)
split |
the subset of the dataset to choose between |
root |
directory path to download the dataset. |
download |
Logical. If TRUE, downloads the dataset if not present at |
transform |
Optional transform function applied to the image. |
target_transform |
Optional transform function applied to the target. |
A torch dataset. Each element is a named list with:
x: H × W × 3 array (use transform_to_tensor() in transform to get
C × H × W tensor).
y: a list with:
masks: boolean tensor of shape (1, H, W).
labels: integer vector with the class index (always 1 for "fish").
The returned item is given class image_with_segmentation_mask so it can be
visualised with helpers like draw_segmentation_masks().
Other segmentation_dataset:
coco_segmentation_dataset(),
oxfordiiitpet_segmentation_dataset(),
pascal_voc_datasets
## Not run: 
ds <- rf100_peixos_segmentation_dataset(
  split = "train",
  transform = transform_to_tensor,
  download = TRUE
)
item <- ds[1]
overlay <- draw_segmentation_masks(item)
tensor_image_browse(overlay)
## End(Not run)
Loads one of the underwater-related RoboFlow 100 Environmental datasets: "pipes", "aquarium", "objects", or "coral". Images are provided with bounding box annotations for object-detection tasks.
rf100_underwater_collection(
  dataset,
  split = c("train", "test", "valid"),
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)
dataset |
Dataset to select within |
split |
the subset of the dataset to choose between |
transform |
Optional transform function applied to the image. |
target_transform |
Optional transform function applied to the target. |
download |
Logical. If TRUE, downloads the dataset if not present at |
A torch dataset. Each element is a named list with:
x: H x W x 3 array representing the image, auto-oriented and stretched to 640 x 640.
y: a list containing the target with:
image_id: numeric identifier of the x image.
labels: numeric identifier of the N bounding-box object class.
boxes: a torch_tensor of shape (N, 4) with bounding boxes, each in format.
The returned item inherits the class image_with_bounding_box so it can be
visualised with helper functions such as draw_bounding_boxes().
Other detection_dataset:
coco_detection_dataset(),
pascal_voc_datasets,
rf100_biology_collection(),
rf100_damage_collection(),
rf100_document_collection(),
rf100_infrared_collection(),
rf100_medical_collection()
## Not run: 
ds <- rf100_underwater_collection(
  dataset = "aquarium",
  split = "train",
  transform = transform_to_tensor,
  download = TRUE
)
item <- ds[24]
# map label ids into their class names
item$y$labels <- ds$classes[item$y$labels]
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)
## End(Not run)
Search through all Collection datasets by keywords in name or description, or filter by collection. This makes it easy to discover datasets relevant to your task without browsing each collection individually.
search_collection(keyword = NULL, collection = NULL)
keyword |
Character string to search for (case-insensitive). Searches in both dataset names and descriptions. If NULL, returns all datasets (optionally filtered by collection). |
collection |
Filter by collection name. One of: "biology", "medical", "infrared", "damage", "underwater", "document". If NULL, searches all collections. |
A data frame with matching datasets and their metadata. Returns NULL invisibly if no matches are found.
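The matching behaviour described above (case-insensitive keyword search over names and descriptions, an optional collection filter, and an invisible NULL when nothing matches) can be sketched in base R. Everything here is hypothetical: the column names, the descriptions, and the function search_catalog_sketch are invented for illustration and are not the package's catalog or code.

```r
# Hypothetical miniature catalog; the real catalog's columns and
# descriptions may differ.
catalog <- data.frame(
  collection  = c("biology", "medical", "damage"),
  dataset     = c("stomata_cell", "rheumatology", "solar_panel"),
  description = c("placeholder: leaf cells",
                  "placeholder: hand x-rays",
                  "placeholder: photovoltaic defects"),
  stringsAsFactors = FALSE
)

search_catalog_sketch <- function(catalog, keyword = NULL, collection = NULL) {
  hits <- rep(TRUE, nrow(catalog))
  if (!is.null(collection)) {
    hits <- hits & catalog$collection == collection
  }
  if (!is.null(keyword)) {
    # Lower-case both sides so the literal match ignores case.
    needle  <- tolower(keyword)
    in_name <- grepl(needle, tolower(catalog$dataset), fixed = TRUE)
    in_desc <- grepl(needle, tolower(catalog$description), fixed = TRUE)
    hits <- hits & (in_name | in_desc)
  }
  result <- catalog[hits, , drop = FALSE]
  if (nrow(result) == 0) {
    return(invisible(NULL))
  }
  result
}

cell_hits  <- search_catalog_sketch(catalog, "CELL")
solar_hits <- search_catalog_sketch(catalog, "photovoltaic", collection = "damage")
no_hits    <- search_catalog_sketch(catalog, "zebra")
```

Matching on a lower-cased literal string, rather than a regular expression, keeps keywords containing characters such as "." or "(" from being interpreted as patterns.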
get_collection_catalog(), collection_catalog
## Not run: 
# Find all medical datasets
search_collection(collection = "medical")

# Find datasets about cells
search_collection("cell")

# Find photovoltaic/solar datasets
search_collection("solar")
search_collection("photovoltaic")

# Find all biology datasets with "cell" in name/description
search_collection("cell", collection = "biology")

# List all available datasets
search_collection()
## End(Not run)
Converts COCO-style polygon segmentation annotations from the target $segmentation variable
into boolean mask tensors stored in the target $masks variable, to ease later visualisation
via draw_segmentation_masks().
Use as target_transform in coco_detection_dataset().
target_transform_coco_masks(y)
y |
list being COCO dataset target variable, with names |
Modified y list with added masks field (N, H, W) boolean tensor, N being the number of
classes.
Other target_transforms:
target_transform_trimap_masks()
## Not run: 
ds <- coco_detection_dataset(
  root = "data",
  target_transform = target_transform_coco_masks
)
item <- ds[1]
draw_segmentation_masks(item)
## End(Not run)
Converts the Oxford-IIIT Pet dataset target $trimap variable (values 1, 2, 3) into a
3-channel boolean mask tensor stored in the target $masks variable, to ease later visualisation
via draw_segmentation_masks().
Use as target_transform in oxfordiiitpet_segmentation_dataset().
target_transform_trimap_masks(y)
y |
List containing |
Creates three mutually exclusive masks:
Channel 1: Pet pixels (trimap == 1)
Channel 2: Background pixels (trimap == 2)
Channel 3: Outline pixels (trimap == 3)
Modified y list with added masks field (3, H, W) boolean tensor
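The three-channel split can be illustrated with a plain matrix in base R. This is a sketch of the idea only: the real transform works on tensors, and the helper name trimap_to_masks and the tiny trimap below are hypothetical.

```r
# Small hypothetical trimap: 1 = pet, 2 = background, 3 = outline.
trimap <- matrix(c(1, 3, 2,
                   1, 3, 2), nrow = 2, byrow = TRUE)

# Stack one logical channel per trimap value into a (3, H, W) array.
trimap_to_masks <- function(trimap) {
  masks <- array(FALSE, dim = c(3, nrow(trimap), ncol(trimap)))
  for (k in 1:3) {
    masks[k, , ] <- trimap == k
  }
  masks
}

masks <- trimap_to_masks(trimap)

# Mutual exclusivity: every pixel belongs to exactly one channel.
pixels_per_channel_sum <- apply(masks, c(2, 3), sum)
```

Because each trimap value maps to exactly one channel, summing the channels at any pixel gives 1.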
Other target_transforms:
target_transform_coco_masks()
## Not run: 
ds <- oxfordiiitpet_segmentation_dataset(
  root = "data",
  target_transform = target_transform_trimap_masks
)
item <- ds[1]
draw_segmentation_masks(item)
## End(Not run)
Display an image tensor in the browser
tensor_image_browse(image, browser = getOption("browser"))
image |
|
browser |
argument passed to browseURL |
Other image display:
draw_bounding_boxes(),
draw_keypoints(),
draw_segmentation_masks(),
tensor_image_display(),
vision_make_grid()
Display an image tensor on the X11 device
tensor_image_display(image, animate = TRUE)
image |
|
animate |
support animations in the X11 display |
Other image display:
draw_bounding_boxes(),
draw_keypoints(),
draw_segmentation_masks(),
tensor_image_browse(),
vision_make_grid()
Prepares the Tiny ImageNet dataset and optionally downloads it.
tiny_imagenet_dataset(root, split = "train", download = FALSE, ...)
root |
directory path to download the dataset. |
split |
dataset split, |
download |
whether to download the dataset. |
... |
other arguments passed to |
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
vggface2_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
Adjust the brightness of an image
transform_adjust_brightness(img, brightness_factor)
img |
A |
brightness_factor |
(float): How much to adjust the brightness. Can be any non-negative number. 0 gives a black image, 1 gives the original image, and 2 increases the brightness by a factor of 2. |
Other unitary_transforms:
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Adjust the contrast of an image
transform_adjust_contrast(img, contrast_factor)
img |
A |
contrast_factor |
(float): How much to adjust the contrast. Can be any non-negative number. 0 gives a solid gray image, 1 gives the original image, and 2 increases the contrast by a factor of 2. |
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Also known as Power Law Transform. Intensities in RGB mode are adjusted based on the following equation:
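The equation itself did not survive in this copy. In the PyTorch vision documentation, whose API this package borrows from, the power-law transform is written as below. Treat it as a reference form under that assumption rather than a statement about this package's exact implementation; the constant 255 applies to 8-bit intensities.

```latex
I_{\text{out}} = 255 \times \text{gain} \times \left( \frac{I_{\text{in}}}{255} \right)^{\gamma}
```

With gain = 1, a gamma larger than 1 darkens the shadows and a gamma smaller than 1 brightens them, while intensities of 0 and 255 are left unchanged.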
transform_adjust_gamma(img, gamma, gain = 1)
img |
A |
gamma |
(float): Non negative real number, same as |
gain |
(float): The constant multiplier. |
Search for Gamma Correction for more details.
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
The image hue is adjusted by converting the image to HSV and cyclically shifting the intensities in the hue channel (H). The image is then converted back to original image mode.
transform_adjust_hue(img, hue_factor)
img |
A |
hue_factor |
(float): How much to shift the hue channel. Should be in
|
hue_factor is the amount of shift in H channel and must be in the
interval [-0.5, 0.5].
Search for Hue for more details.
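The cyclic shift of the hue channel can be sketched with the colour helpers in grDevices, which ships with R. This is an illustration of the idea on a handful of 8-bit RGB pixels, not the package's tensor implementation; the helper name shift_hue is hypothetical.

```r
# Hypothetical helper: shifts the hue of RGB pixels given as a 3 x N matrix
# of values in 0..255.
shift_hue <- function(rgb_pixels, hue_factor) {
  stopifnot(hue_factor >= -0.5, hue_factor <= 0.5)
  hsv_pixels <- grDevices::rgb2hsv(rgb_pixels, maxColorValue = 255)
  # Hue lives on a circle, so wrap around with modulo 1.
  shifted_h <- (hsv_pixels["h", ] + hue_factor) %% 1
  colours <- grDevices::hsv(shifted_h, hsv_pixels["s", ], hsv_pixels["v", ])
  # Convert back to a 3 x N matrix of 8-bit RGB values.
  grDevices::col2rgb(colours)
}

red <- matrix(c(255, 0, 0), nrow = 3)
shifted <- shift_hue(red, hue_factor = 1 / 3)
```

Shifting pure red by a third of the hue circle lands on pure green; saturation and value are untouched by the shift.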
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Adjust the color saturation of an image
transform_adjust_saturation(img, saturation_factor)
img |
A |
saturation_factor |
(float): How much to adjust the saturation. 0 will give a black and white image, 1 will give the original image while 2 will enhance the saturation by a factor of 2. |
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Apply affine transformation on an image keeping image center invariant
transform_affine(
  img,
  angle,
  translate,
  scale,
  shear,
  interpolation = 0,
  fill = NULL,
  resample,
  fillcolor,
  center = NULL
)
img |
A |
angle |
(float or int): rotation angle value in degrees, counter-clockwise. |
translate |
(sequence of int) – horizontal and vertical translations (post-rotation translation) |
scale |
(float) – overall scale |
shear |
(float or sequence) – shear angle value in degrees between -180 to 180, clockwise direction. If a sequence is specified, the first value corresponds to a shear parallel to the x-axis, while the second value corresponds to a shear parallel to the y-axis. |
interpolation |
(int or character): Interpolation mode. Supported values are 0 / "nearest" and 2 / "bilinear". Default is 0. |
fill |
Fill color for area outside the transform. Default is NULL. |
resample |
Deprecated. Use interpolation instead. |
fillcolor |
Deprecated. Use fill instead. |
center |
Optional center of rotation, c(x, y). Default is image center. |
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
The image can be a Magick Image or a torch Tensor, in which case it is
expected to have [..., H, W] shape, where ... means an arbitrary number
of leading dimensions.
transform_center_crop(img, size)
img |
A |
size |
(sequence or int): Desired output size of the crop. If size is
an int instead of sequence like c(h, w), a square crop (size, size) is
made. If provided a tuple or list of length 1, it will be interpreted as
|
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Randomly change the brightness, contrast, saturation and hue of an image
transform_color_jitter(
  img,
  brightness = 0,
  contrast = 0,
  saturation = 0,
  hue = 0
)
img |
A |
brightness |
(float or tuple of float (min, max)): How much to jitter
brightness. |
contrast |
(float or tuple of float (min, max)): How much to jitter
contrast. |
saturation |
(float or tuple of float (min, max)): How much to jitter
saturation. |
hue |
(float or tuple of float (min, max)): How much to jitter hue.
|
Other random_transforms:
transform_random_affine(),
transform_random_crop(),
transform_random_erasing(),
transform_random_grayscale(),
transform_random_horizontal_flip(),
transform_random_perspective(),
transform_random_resized_crop(),
transform_random_rotation(),
transform_random_vertical_flip()
Convert a tensor image to the given dtype and scale the values accordingly
transform_convert_image_dtype(img, dtype = torch::torch_float())
img |
A |
dtype |
(torch.dtype): Desired data type of the output. |
When converting from a smaller to a larger integer dtype the maximum
values are not mapped exactly. If converted back and forth, this
mismatch has no effect.
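The note on integer conversion can be made concrete with a small base R sketch. It assumes the scaling scheme used by the PyTorch vision package (multiply by (out_max + 1) / (in_max + 1) when widening, integer-divide by the same factor when narrowing); whether this package follows that scheme exactly is an assumption, and the helpers widen and narrow are hypothetical.

```r
# Assumed scheme: uint8 -> int16 multiplies by 128, int16 -> uint8
# integer-divides by 128.
uint8_max <- 255
int16_max <- 32767
scale_factor <- (int16_max + 1) / (uint8_max + 1)  # 128

widen  <- function(x) x * scale_factor    # uint8 -> int16
narrow <- function(x) x %/% scale_factor  # int16 -> uint8

pixels     <- c(0, 128, 255)
widened    <- widen(pixels)
round_trip <- narrow(widened)
```

Under this scheme the uint8 maximum 255 maps to 32640 rather than 32767, which is the mismatch the note refers to, and narrowing again recovers the original values.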
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Crop the given image at specified location and output size
transform_crop(img, top, left, height, width)
img |
A |
top |
(int): Vertical component of the top left corner of the crop box. |
left |
(int): Horizontal component of the top left corner of the crop box. |
height |
(int): Height of the crop box. |
width |
(int): Width of the crop box. |
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Crop the given image into four corners and the central crop. This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns.
transform_five_crop(img, size)
img |
A |
size |
(sequence or int): Desired output size. If size is a sequence like c(h, w), the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size). |
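To make the "four corners and the central crop" layout concrete, here is a minimal base R sketch that computes the five crop boxes for a given image size. The helper `five_crop_boxes` is hypothetical, the offsets are 0-based, and the rounding of the central offset is an assumption rather than the package's documented behaviour.

```r
# Hypothetical helper, not part of torchvision: top-left offsets (0-based)
# and sizes of the four corner crops plus the central crop.
five_crop_boxes <- function(img_h, img_w, crop_h, crop_w) {
  stopifnot(crop_h <= img_h, crop_w <= img_w)
  data.frame(
    crop   = c("top_left", "top_right", "bottom_left", "bottom_right", "center"),
    top    = c(0, 0, img_h - crop_h, img_h - crop_h, round((img_h - crop_h) / 2)),
    left   = c(0, img_w - crop_w, 0, img_w - crop_w, round((img_w - crop_w) / 2)),
    height = crop_h,
    width  = crop_w
  )
}

boxes <- five_crop_boxes(img_h = 32, img_w = 48, crop_h = 16, crop_w = 16)
print(boxes)
```

Because one input yields five outputs, a dataset that returns a single target per image needs to repeat or otherwise align that target with the crops.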
Other combining_transforms:
transform_random_apply(),
transform_random_choice(),
transform_random_order(),
transform_resized_crop(),
transform_ten_crop()
Convert image to grayscale
transform_grayscale(img, num_output_channels)
img |
A |
num_output_channels |
(int): (1 or 3) number of channels desired for output image |
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Horizontally flip a Magick Image or Tensor
transform_hflip(img)
img |
A |
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Given transformation_matrix and mean_vector, will flatten the
torch_tensor and subtract mean_vector from it which is then followed by
computing the dot product with the transformation matrix and then reshaping
the tensor to its original shape.
transform_linear_transformation(img, transformation_matrix, mean_vector)
img |
A |
transformation_matrix |
(Tensor): tensor |
mean_vector |
(Tensor): tensor D, D = C x H x W. |
Whitening transformation: suppose X is a matrix of zero-centered data with one flattened sample per row.
Then compute the [D x D] data covariance matrix with torch_mm(X$t(), X),
perform SVD on this matrix and pass the result as transformation_matrix.
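The whitening recipe can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy data, the `mixing` matrix and the choice of ZCA whitening (one common way to turn the SVD into a transformation matrix) are not taken from the package, and the covariance is scaled by `n - 1`, which the text above leaves out.

```r
set.seed(1)
n <- 500  # number of samples
d <- 4    # D = C x H x W for a tiny flattened "image"

# Correlated toy data: one flattened sample per row.
mixing <- matrix(c(2, 1, 0, 0,
                   0, 1, 1, 0,
                   0, 0, 1, 1,
                   0, 0, 0, 0.5), nrow = d)
raw <- matrix(rnorm(n * d), nrow = n) %*% mixing

mean_vector <- colMeans(raw)
X <- sweep(raw, 2, mean_vector)      # zero-centered data
cov_mat <- crossprod(X) / (n - 1)    # D x D covariance, t(X) %*% X up to scaling

# ZCA whitening matrix built from the SVD of the covariance (an assumed choice).
decomp <- svd(cov_mat)
transformation_matrix <- decomp$u %*% diag(1 / sqrt(decomp$d)) %*% t(decomp$u)

# What the transform does to one flattened sample: subtract the mean, then multiply.
whitened_sample <- as.numeric((raw[1, ] - mean_vector) %*% transformation_matrix)

# After whitening, the covariance of the data is close to the identity.
whitened <- X %*% transformation_matrix
print(round(cov(whitened), 3))
```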
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Given mean: (mean[1],...,mean[n]) and std: (std[1],...,std[n]) for n
channels, this transform will normalize each channel of the input
torch_tensor, i.e.,
output[channel] = (input[channel] - mean[channel]) / std[channel]
transform_normalize(img, mean, std, inplace = FALSE)
img |
A |
mean |
(sequence): Sequence of means for each channel. |
std |
(sequence): Sequence of standard deviations for each channel. |
inplace |
(bool,optional): Bool to make this operation in-place. |
By default (inplace = FALSE), this transform acts out of place, i.e., it does not mutate the input tensor.
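The per-channel formula can be checked by hand with a small base R sketch. It operates on a plain C x H x W array rather than a torch tensor, and the `channel_mean` and `channel_std` values are the commonly used ImageNet statistics, chosen here only as an example.

```r
set.seed(42)
img <- array(runif(3 * 2 * 2), dim = c(3, 2, 2))  # C x H x W, values in [0, 1]

# Example statistics (commonly used ImageNet values), one entry per channel.
channel_mean <- c(0.485, 0.456, 0.406)
channel_std  <- c(0.229, 0.224, 0.225)

# output[channel] = (input[channel] - mean[channel]) / std[channel]
centered   <- sweep(img, 1, channel_mean, "-")
normalized <- sweep(centered, 1, channel_std, "/")

print(dim(normalized))
print(normalized[2, 1, 1])
print((img[2, 1, 1] - channel_mean[2]) / channel_std[2])
```

The input array `img` is left untouched, which mirrors the out-of-place behaviour described above.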
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
The image can be a Magick Image or a torch Tensor, in which case it is
expected to have [..., H, W] shape, where ... means an arbitrary number
of leading dimensions.
transform_pad(img, padding, fill = 0, padding_mode = "constant")
img |
A |
padding |
(int or tuple or list): Padding on each border. If a single int is provided this is used to pad all borders. If tuple of length 2 is provided this is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided this is the padding for the left, right, top and bottom borders respectively. |
fill |
(int or str or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant. Only int value is supported for Tensors. |
padding_mode |
Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant. Mode symmetric is not yet supported for Tensor inputs.
|
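The different lengths accepted by `padding` can be illustrated with a base R sketch of constant padding on a single H x W matrix. The helper `pad_constant` is hypothetical and follows the border order given in the argument description above (left, right, top, bottom); it is not the package's implementation.

```r
# Hypothetical helper, not part of torchvision: constant padding of an H x W matrix.
pad_constant <- function(mat, padding, fill = 0) {
  # Expand `padding` to c(left, right, top, bottom).
  sides <- switch(
    as.character(length(padding)),
    "1" = rep(padding, 4),
    "2" = c(padding[1], padding[1], padding[2], padding[2]),
    "4" = padding,
    stop("`padding` must have length 1, 2 or 4")
  )
  left <- sides[1]; right <- sides[2]; top <- sides[3]; bottom <- sides[4]

  # Start from a canvas filled with `fill`, then copy the image into place.
  out <- matrix(fill, nrow = nrow(mat) + top + bottom, ncol = ncol(mat) + left + right)
  out[top + seq_len(nrow(mat)), left + seq_len(ncol(mat))] <- mat
  out
}

img <- matrix(1, nrow = 2, ncol = 3)
padded <- pad_constant(img, padding = c(1, 2))  # 1 px left/right, 2 px top/bottom
print(dim(padded))
print(padded)
```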
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Perspective transformation of an image
transform_perspective(img, startpoints, endpoints, interpolation = 2, fill = NULL)
img |
A |
startpoints |
(list of list of ints): List containing four lists of two
integers corresponding to four corners
|
endpoints |
(list of list of ints): List containing four lists of two
integers corresponding to four corners
|
interpolation |
(int, optional) Desired interpolation. An integer
|
fill |
(int or str or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant. Only int value is supported for Tensors. |
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Random affine transformation of the image keeping center invariant
transform_random_affine(img, degrees, translate = NULL, scale = NULL, shear = NULL, interpolation = 0, fill = 0, resample, fillcolor)
img |
A |
degrees |
(sequence or float or int): Range of degrees to select from. If degrees is a number instead of sequence like c(min, max), the range of degrees will be (-degrees, +degrees). |
translate |
(tuple, optional): tuple of maximum absolute fraction for
horizontal and vertical translations. For example |
scale |
(tuple, optional): scaling factor interval, e.g c(a, b), then scale is randomly sampled from the range a <= scale <= b. Will keep original scale by default. |
shear |
(sequence or float or int, optional): Range of degrees to select
from. If shear is a number, a shear parallel to the x axis in the range
(-shear, +shear) will be applied. Else if shear is a tuple or list of 2
values a shear parallel to the x axis in the range |
interpolation |
(int or character, optional): Interpolation mode. Supported values are 0 / "nearest" and 2 / "bilinear". Default is 0. |
fill |
(tuple or int): Fill color for the area outside the transform. Default is 0. This option is not supported for Tensor input. |
resample |
Deprecated. Use interpolation instead. |
fillcolor |
Deprecated. Use fill instead. |
Other random_transforms:
transform_color_jitter(),
transform_random_crop(),
transform_random_erasing(),
transform_random_grayscale(),
transform_random_horizontal_flip(),
transform_random_perspective(),
transform_random_resized_crop(),
transform_random_rotation(),
transform_random_vertical_flip()
Apply a list of transformations randomly with a given probability
transform_random_apply(img, transforms, p = 0.5)
img |
A |
transforms |
(list or tuple): list of transformations. |
p |
(float): probability. |
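A minimal base R sketch of the idea, assuming the whole list is either applied in order or skipped as a unit with probability `p`. The function `random_apply_sketch` and the two toy transforms are hypothetical and only stand in for real image transformations.

```r
# Hypothetical stand-in for the behaviour described above, not the package code.
random_apply_sketch <- function(x, transforms, p = 0.5) {
  # Skip the whole list with probability 1 - p.
  if (runif(1) > p) {
    return(x)
  }
  # Otherwise apply every transform in order.
  for (f in transforms) {
    x <- f(x)
  }
  x
}

flip_columns  <- function(m) m[, ncol(m):1, drop = FALSE]  # toy "horizontal flip"
double_values <- function(m) m * 2                         # toy intensity change

img <- matrix(1:6, nrow = 2)
always <- random_apply_sketch(img, list(flip_columns, double_values), p = 1)
never  <- random_apply_sketch(img, list(flip_columns, double_values), p = 0)
print(always)
print(never)
```

Setting `p = 1` or `p = 0` makes the outcome deterministic, which is convenient when debugging an augmentation pipeline.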
Other combining_transforms:
transform_five_crop(),
transform_random_choice(),
transform_random_order(),
transform_resized_crop(),
transform_ten_crop()
Apply single transformation randomly picked from a list
transform_random_choice(img, transforms)
img |
A |
transforms |
(list or tuple): list of transformations. |
Other combining_transforms:
transform_five_crop(),
transform_random_apply(),
transform_random_order(),
transform_resized_crop(),
transform_ten_crop()
The image can be a Magick Image or a Tensor, in which case it is expected
to have [..., H, W] shape, where ... means an arbitrary number of leading
dimensions.
transform_random_crop(img, size, padding = NULL, pad_if_needed = FALSE, fill = 0, padding_mode = "constant")
img |
A |
size |
(sequence or int): Desired output size. If size is a sequence like c(h, w), the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size). |
padding |
(int or tuple or list): Padding on each border. If a single int is provided this is used to pad all borders. If tuple of length 2 is provided this is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided this is the padding for the left, right, top and bottom borders respectively. |
pad_if_needed |
(boolean): It will pad the image if smaller than the desired size to avoid raising an exception. Since cropping is done after padding, the padding seems to be done at a random offset. |
fill |
(int or str or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant. Only int value is supported for Tensors. |
padding_mode |
Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant. Mode symmetric is not yet supported for Tensor inputs.
|
Other random_transforms:
transform_color_jitter(),
transform_random_affine(),
transform_random_erasing(),
transform_random_grayscale(),
transform_random_horizontal_flip(),
transform_random_perspective(),
transform_random_resized_crop(),
transform_random_rotation(),
transform_random_vertical_flip()
Randomly selects a rectangular region in an image and erases its pixels, following 'Random Erasing Data Augmentation' by Zhong et al. See https://arxiv.org/pdf/1708.04896
transform_random_erasing(img, p = 0.5, scale = c(0.02, 0.33), ratio = c(0.3, 3.3), value = 0, inplace = FALSE)
img |
A |
p |
probability that the random erasing operation will be performed. |
scale |
range of proportion of erased area against input image. |
ratio |
range of aspect ratio of erased area. |
value |
erasing value. Default is 0. If a single int, it is used to erase all pixels. If a tuple of length 3, it is used to erase R, G, B channels respectively. If a str of 'random', erasing each pixel with random values. |
inplace |
boolean to make this transform inplace. Default set to FALSE. |
Other random_transforms:
transform_color_jitter(),
transform_random_affine(),
transform_random_crop(),
transform_random_grayscale(),
transform_random_horizontal_flip(),
transform_random_perspective(),
transform_random_resized_crop(),
transform_random_rotation(),
transform_random_vertical_flip()
Convert image to grayscale with a probability of p.
transform_random_grayscale(img, p = 0.1)
img |
A |
p |
(float): probability that image should be converted to grayscale (default 0.1). |
Other random_transforms:
transform_color_jitter(),
transform_random_affine(),
transform_random_crop(),
transform_random_erasing(),
transform_random_horizontal_flip(),
transform_random_perspective(),
transform_random_resized_crop(),
transform_random_rotation(),
transform_random_vertical_flip()
Horizontally flip an image randomly with a given probability. The image can
be a Magick Image or a torch Tensor, in which case it is expected to have
[..., H, W] shape, where ... means an arbitrary number of leading
dimensions.
transform_random_horizontal_flip(img, p = 0.5)
img |
A |
p |
(float): probability of the image being flipped. Default value is 0.5 |
Other random_transforms:
transform_color_jitter(),
transform_random_affine(),
transform_random_crop(),
transform_random_erasing(),
transform_random_grayscale(),
transform_random_perspective(),
transform_random_resized_crop(),
transform_random_rotation(),
transform_random_vertical_flip()
Apply a list of transformations in a random order
transform_random_order(img, transforms)
img |
A |
transforms |
(list or tuple): list of transformations. |
Other combining_transforms:
transform_five_crop(),
transform_random_apply(),
transform_random_choice(),
transform_resized_crop(),
transform_ten_crop()
Performs a random perspective transformation of the given image with a given probability
transform_random_perspective(img, distortion_scale = 0.5, p = 0.5, interpolation = 2, fill = 0)
img |
A |
distortion_scale |
(float): argument to control the degree of distortion and ranges from 0 to 1. Default is 0.5. |
p |
(float): probability of the image being transformed. Default is 0.5. |
interpolation |
(int, optional) Desired interpolation. An integer
|
fill |
(int or str or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant. Only int value is supported for Tensors. |
Other random_transforms:
transform_color_jitter(),
transform_random_affine(),
transform_random_crop(),
transform_random_erasing(),
transform_random_grayscale(),
transform_random_horizontal_flip(),
transform_random_resized_crop(),
transform_random_rotation(),
transform_random_vertical_flip()
Crop the given image to a random size and aspect ratio. The image can be a
Magick Image or a Tensor, in which case it is expected to have
[..., H, W] shape, where ... means an arbitrary number of leading
dimensions.
transform_random_resized_crop(img, size, scale = c(0.08, 1), ratio = c(3/4, 4/3), interpolation = 2)
img |
A |
size |
(sequence or int): Desired output size. If size is a sequence like c(h, w), the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size). |
scale |
(tuple of float): range of the crop area, as a fraction of the original image area. |
ratio |
(tuple of float): range of the aspect ratio of the crop, before resizing. |
interpolation |
(int, optional) Desired interpolation. An integer
|
A crop covering a random fraction of the original image area (default: 0.08 to 1.0) with a random aspect ratio (default: 3/4 to 4/3) is made. This crop is finally resized to the given size. This is popularly used to train the Inception networks.
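The sampling step can be sketched in base R as below. This is an approximation under stated assumptions: the helper `sample_crop_params` is hypothetical, the aspect ratio is drawn log-uniformly, offsets are 0-based, and the fallback after `max_tries` failed attempts simply returns the whole image, which may differ from what the package does.

```r
# Hypothetical helper, not part of torchvision: draw a crop box whose area and
# aspect ratio fall within `scale` and `ratio`.
sample_crop_params <- function(img_h, img_w,
                               scale = c(0.08, 1), ratio = c(3 / 4, 4 / 3),
                               max_tries = 10) {
  area <- img_h * img_w
  for (i in seq_len(max_tries)) {
    target_area <- area * runif(1, scale[1], scale[2])
    aspect <- exp(runif(1, log(ratio[1]), log(ratio[2])))  # log-uniform draw
    w <- round(sqrt(target_area * aspect))
    h <- round(sqrt(target_area / aspect))
    if (w >= 1 && h >= 1 && w <= img_w && h <= img_h) {
      top  <- sample.int(img_h - h + 1, 1) - 1   # 0-based offset
      left <- sample.int(img_w - w + 1, 1) - 1
      return(list(top = top, left = left, height = h, width = w))
    }
  }
  # Assumed fallback: use the whole image.
  list(top = 0, left = 0, height = img_h, width = img_w)
}

set.seed(123)
params <- sample_crop_params(img_h = 224, img_w = 320)
str(params)
```

The resulting box would then be cropped and resized to `size`, which is what makes the output shape independent of the sampled box.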
Other random_transforms:
transform_color_jitter(),
transform_random_affine(),
transform_random_crop(),
transform_random_erasing(),
transform_random_grayscale(),
transform_random_horizontal_flip(),
transform_random_perspective(),
transform_random_rotation(),
transform_random_vertical_flip()
Rotate the image by an angle sampled at random from the range given by degrees.
transform_random_rotation(img, degrees, interpolation = 0, expand = FALSE, center = NULL, fill = NULL, resample)
img |
A |
degrees |
(sequence or float or int): Range of degrees to select from. If degrees is a number instead of sequence like c(min, max), the range of degrees will be (-degrees, +degrees). |
interpolation |
(int, optional): Interpolation mode. 0 for nearest, 2 for bilinear. Default is 0 (nearest). |
expand |
(bool, optional): Optional expansion flag. If true, expands the output to make it large enough to hold the entire rotated image. If false or omitted, make the output image the same size as the input image. Note that the expand flag assumes rotation around the center and no translation. |
center |
(list or tuple, optional): Optional center of rotation, c(x, y). Origin is the upper left corner. Default is the center of the image. |
fill |
(n-tuple or int or float): Pixel fill value for area outside the rotated image. If int or float, the value is used for all bands respectively. Defaults to 0 for all bands. This option is only available for Pillow>=5.2.0. This option is not supported for Tensor input. Fill value for the area outside the transform in the output image is always 0. |
resample |
Deprecated. Use interpolation instead. |
Other random_transforms:
transform_color_jitter(),
transform_random_affine(),
transform_random_crop(),
transform_random_erasing(),
transform_random_grayscale(),
transform_random_horizontal_flip(),
transform_random_perspective(),
transform_random_resized_crop(),
transform_random_vertical_flip()
Vertically flip an image randomly with a given probability. The image can
be a Magick Image or a torch Tensor, in which case it is expected to have
[..., H, W] shape, where ... means an arbitrary number of leading
dimensions.
transform_random_vertical_flip(img, p = 0.5)
img |
A |
p |
(float): probability of the image being flipped. Default value is 0.5 |
Other random_transforms:
transform_color_jitter(),
transform_random_affine(),
transform_random_crop(),
transform_random_erasing(),
transform_random_grayscale(),
transform_random_horizontal_flip(),
transform_random_perspective(),
transform_random_resized_crop(),
transform_random_rotation()
The image can be a Magick Image or a torch Tensor, in which case it is
expected to have [..., H, W] shape, where ... means an arbitrary number
of leading dimensions.
transform_resize(img, size, interpolation = 2)
img |
A |
size |
(sequence or int): Desired output size. If size is a sequence like c(h, w), the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size). |
interpolation |
(int, optional) Desired interpolation. An integer
|
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Crop an image and resize it to a desired size
transform_resized_crop(img, top, left, height, width, size, interpolation = 2)
img |
A |
top |
(int): Vertical component of the top left corner of the crop box. |
left |
(int): Horizontal component of the top left corner of the crop box. |
height |
(int): Height of the crop box. |
width |
(int): Width of the crop box. |
size |
(sequence or int): Desired output size. If size is a sequence like c(h, w), the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size). |
interpolation |
(int, optional) Desired interpolation. An integer
|
Other combining_transforms:
transform_five_crop(),
transform_random_apply(),
transform_random_choice(),
transform_random_order(),
transform_ten_crop()
For RGB to grayscale conversion, the ITU-R 601-2 luma transform is performed: L = R * 0.2989 + G * 0.5870 + B * 0.1140
transform_rgb_to_grayscale(img)
img |
A |
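The luma formula can be verified on a few known colours with base R. The sketch works on a plain C x H x W array (not a torch tensor) and only reproduces the weighted sum from the description; the variable names are illustrative.

```r
# Four pixels in a 1 x 4 image, stored as C x H x W: red, green, blue, white.
rgb <- array(c(1, 0, 0,
               0, 1, 0,
               0, 0, 1,
               1, 1, 1), dim = c(3, 1, 4))

# Weights from L = R * 0.2989 + G * 0.5870 + B * 0.1140
luma_weights <- c(0.2989, 0.5870, 0.1140)

# Weighted sum over the channel dimension gives an H x W grayscale image.
gray <- apply(rgb, c(2, 3), function(px) sum(px * luma_weights))
print(gray)
```

Green contributes most to the result and blue least, which reflects the relative perceived brightness of the three primaries.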
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rotate(),
transform_to_tensor(),
transform_vflip()
Angular rotation of an image
transform_rotate(img, angle, interpolation = 0, expand = FALSE, center = NULL, fill = NULL, resample)
img |
A |
angle |
(float or int): rotation angle value in degrees, counter-clockwise. |
interpolation |
(int, optional): Interpolation mode. 0 for nearest, 2 for bilinear. Default is 0 (nearest). |
expand |
(bool, optional): Optional expansion flag. If true, expands the output to make it large enough to hold the entire rotated image. If false or omitted, make the output image the same size as the input image. Note that the expand flag assumes rotation around the center and no translation. |
center |
(list or tuple, optional): Optional center of rotation, c(x, y). Origin is the upper left corner. Default is the center of the image. |
fill |
(n-tuple or int or float): Pixel fill value for area outside the rotated image. If int or float, the value is used for all bands respectively. Defaults to 0 for all bands. This option is only available for Pillow>=5.2.0. This option is not supported for Tensor input. Fill value for the area outside the transform in the output image is always 0. |
resample |
Deprecated. Use interpolation instead. |
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_to_tensor(),
transform_vflip()
Crop the given image into four corners and the central crop, plus the flipped version of these (horizontal flipping is used by default). This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns.
transform_ten_crop(img, size, vertical_flip = FALSE)
img |
A |
size |
(sequence or int): Desired output size. If size is a sequence like c(h, w), the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size). |
vertical_flip |
(bool): Use vertical flipping instead of horizontal |
Other combining_transforms:
transform_five_crop(),
transform_random_apply(),
transform_random_choice(),
transform_random_order(),
transform_resized_crop()
Converts a Magick Image or array (H x W x C) in the range [0, 255] to a
torch_tensor of shape (C x H x W) in the range [0.0, 1.0]. In the
other cases, tensors are returned without scaling.
transform_to_tensor(img)
img |
A |
Because the input image is scaled to [0.0, 1.0], this transformation
should not be used when transforming target image masks.
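The layout and range change can be pictured with a base R sketch on a plain array. It only mimics the two effects named in the description (channel-last to channel-first, [0, 255] to [0, 1]) and does not create a torch tensor; the variable names are illustrative.

```r
set.seed(7)
# Channel-last image: H x W x C with integer values in [0, 255].
hwc <- array(sample(0:255, 4 * 5 * 3, replace = TRUE), dim = c(4, 5, 3))

# Move channels first (C x H x W) and rescale to [0, 1].
chw <- aperm(hwc, c(3, 1, 2)) / 255

print(dim(hwc))
print(dim(chw))
print(range(chw))
```

The division by 255 is what makes this unsuitable for label masks: class ids such as 1, 2, 3 would be turned into small fractions.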
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_vflip()
Vertically flip a Magick Image or Tensor
transform_vflip(img)
img |
A |
Other unitary_transforms:
transform_adjust_brightness(),
transform_adjust_contrast(),
transform_adjust_gamma(),
transform_adjust_hue(),
transform_adjust_saturation(),
transform_affine(),
transform_center_crop(),
transform_convert_image_dtype(),
transform_crop(),
transform_grayscale(),
transform_hflip(),
transform_linear_transformation(),
transform_normalize(),
transform_pad(),
transform_perspective(),
transform_resize(),
transform_rgb_to_grayscale(),
transform_rotate(),
transform_to_tensor()
The VGGFace2 dataset is a large-scale face recognition dataset containing images of celebrities from a wide range of ethnicities, professions, and ages. Each identity has multiple images with variations in context, pose, age, and illumination.
vggface2_dataset(root = tempdir(), split = "val", transform = NULL, target_transform = NULL, download = FALSE)
root |
Character. Root directory where the dataset will be stored under |
split |
One of |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
A torch dataset object vggface2_dataset:
x: RGB image array.
y: Integer label (1…N) for the identity.
ds$classes is a named list mapping integer labels to a list with:
name: Character name of the person.
gender: "Male" or "Female".
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
whoi_plankton_dataset(),
whoi_small_coralnet_dataset()
## Not run:
# Load the default split
ds <- vggface2_dataset(download = TRUE)
item <- ds[1]
item$x # image array RGB
item$y # integer label
ds$classes[item$y] # list(name=..., gender=...)

# Load the test set
ds <- vggface2_dataset(download = TRUE, split = "test")
item <- ds[1]
item$x # image array RGB
item$y # integer label
ds$classes[item$y] # list(name=..., gender=...)
## End(Not run)
Arranges a batch B of (image) tensors in a grid, with optional padding between images. Expects a 4d mini-batch tensor of shape (B x C x H x W).
vision_make_grid(tensor, scale = TRUE, num_rows = 8, padding = 2, pad_value = 0)
tensor |
tensor of shape (B x C x H x W) to arrange in grid. |
scale |
whether to normalize (min-max-scale) the input tensor. |
num_rows |
number of rows making up the grid (default 8). |
padding |
amount of padding between batch images (default 2). |
pad_value |
pixel value to use for padding. |
A 3d torch_tensor containing all images arranged in a grid.
Other image display:
draw_bounding_boxes(),
draw_keypoints(),
draw_segmentation_masks(),
tensor_image_browse(),
tensor_image_display()
WHOI-Plankton Dataset
whoi_small_plankton_dataset(split = "val", transform = NULL, target_transform = NULL, download = FALSE)
whoi_plankton_dataset(split = "val", transform = NULL, target_transform = NULL, download = FALSE)
split |
One of |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
The WHOI-Plankton and WHOI-Plankton small datasets are image classification datasets of microscopic marine plankton from the Woods Hole Oceanographic Institution (WHOI). See https://hdl.handle.net/10.1575/1912/7341
Images were collected in situ by automated submersible imaging-in-flow cytometry with an instrument called Imaging FlowCytobot (IFCB). They are small grayscale images of varying size. Images are classified into 100 classes, with an overview available in the project Wiki page. Dataset size is 957k and 58k respectively, and each provides a train / val / test split.
A torch dataset with a classes attribute providing the vector of class names.
Each element is a named list:
x: an H x W x 1 integer array representing a grayscale image.
y: the class id of the image.
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_small_coralnet_dataset()
## Not run:
# Load the small plankton dataset and turn images into tensor images
plankton <- whoi_small_plankton_dataset(download = TRUE, transform = transform_to_tensor)

# Access the first item
first_item <- plankton[1]
first_item$x # a tensor grayscale image with shape {1, H, W}
first_item$y # id of the plankton class.
plankton$classes[first_item$y] # name of the plankton class

# Load the full plankton dataset
plankton <- whoi_plankton_dataset(download = TRUE)

# Access the first item
first_item <- plankton[1]
first_item$x # grayscale image array with shape {H, W}
first_item$y # id of the plankton class.
## End(Not run)
The Small Coralnet dataset is an image classification dataset of very large submarine coral reef images, annotated into 3 classes and produced by CoralNet, a resource for benthic image classification.
whoi_small_coralnet_dataset(split = "val", transform = NULL, target_transform = NULL, download = FALSE)
split |
One of |
transform |
Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping). |
target_transform |
Optional. A function that transforms the label. |
download |
Logical. If TRUE, downloads the dataset to |
Other classification_dataset:
caltech_dataset,
cifar10_dataset(),
eurosat_dataset(),
fer_dataset(),
fgvc_aircraft_dataset(),
flowers102_dataset(),
image_folder_dataset(),
lfw_dataset,
mnist_dataset(),
oxfordiiitpet_dataset(),
places365_dataset(),
tiny_imagenet_dataset(),
vggface2_dataset(),
whoi_plankton_dataset()