Title: | Tensors and Neural Networks with 'GPU' Acceleration |
---|---|
Description: | Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration. |
Authors: | Daniel Falbel [aut, cre, cph], Javier Luraschi [aut], Dmitriy Selivanov [ctb], Athos Damiani [ctb], Christophe Regouby [ctb], Krzysztof Joachimiak [ctb], Hamada S. Badr [ctb], Sebastian Fischer [ctb], Maximilian Pichler [ctb], RStudio [cph] |
Maintainer: | Daniel Falbel <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.13.0.9001 |
Built: | 2025-01-17 14:25:47 UTC |
Source: | https://github.com/mlverse/torch |
Converts to array
as_array(x)
x |
object to be converted into an array |
The graph is differentiated using the chain rule. If any of the tensors are
non-scalar (i.e. their data has more than one element) and require gradient,
then the Jacobian-vector product is computed; in this case the function
additionally requires specifying grad_tensors. It should be a sequence of
matching length that contains the “vector” in the Jacobian-vector product,
usually the gradient of the differentiated function w.r.t. the corresponding
tensors (NULL is an acceptable value for all tensors that don’t need gradient
tensors).
autograd_backward( tensors, grad_tensors = NULL, retain_graph = create_graph, create_graph = FALSE )
tensors |
(list of Tensor) – Tensors of which the derivative will be computed. |
grad_tensors |
(list of (Tensor or |
retain_graph |
(bool, optional) – If |
create_graph |
(bool, optional) – If |
This function accumulates gradients in the leaves - you might need to zero them before calling it.
if (torch_is_installed()) {
  x <- torch_tensor(1, requires_grad = TRUE)
  y <- 2 * x
  a <- torch_tensor(1, requires_grad = TRUE)
  b <- 3 * a
  autograd_backward(list(y, b))
}
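Because gradients accumulate in the leaves, running the backward pass twice without a reset adds the second gradient to the first. The following is a minimal sketch of that behaviour. It assumes the in-place zero_() tensor method is used to reset the gradient, and the names first, second and cleared are illustrative only.

```r
library(torch)

x <- torch_tensor(1, requires_grad = TRUE)

# First backward pass: d(2 * x)/dx = 2 is stored in x$grad.
y <- 2 * x
autograd_backward(list(y))
first <- as.numeric(as_array(x$grad))

# Second backward pass on a fresh graph: the new gradient is added to the old one.
y <- 2 * x
autograd_backward(list(y))
second <- as.numeric(as_array(x$grad))

# Reset the accumulated gradient in place before the next pass.
x$grad$zero_()
cleared <- as.numeric(as_array(x$grad))
```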
Every operation performed on Tensors creates a new function object that
performs the computation and records that it happened. The history is
retained in the form of a DAG of functions, with edges denoting data
dependencies (input <- output). Then, when backward is called, the graph is
processed in topological order by calling the backward() method of each
Function object and passing the returned gradients on to the next Functions.
autograd_function(forward, backward)
forward |
Performs the operation. It must accept a context |
backward |
Defines a formula for differentiating the operation. It must accept
a context |
if (torch_is_installed()) {
  exp2 <- autograd_function(
    forward = function(ctx, i) {
      result <- i$exp()
      # Keep the forward result: d/dx exp(x) = exp(x).
      ctx$save_for_backward(result = result)
      result
    },
    backward = function(ctx, grad_output) {
      # Saved objects are exposed through the saved_variables field.
      list(i = grad_output * ctx$saved_variables$result)
    }
  )
}
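A custom function defined this way can be called like any other operation, and its backward() is used when gradients are requested. The sketch below is an illustration rather than a reference implementation: it re-defines exp2 so that it is self-contained, reads the saved tensor through the saved_variables field, and checks the gradient against the known derivative of exp.

```r
library(torch)

exp2 <- autograd_function(
  forward = function(ctx, i) {
    result <- i$exp()
    ctx$save_for_backward(result = result)
    result
  },
  backward = function(ctx, grad_output) {
    # d/dx exp(x) = exp(x), which is the saved forward result.
    list(i = grad_output * ctx$saved_variables$result)
  }
)

x <- torch_tensor(c(0, 1), requires_grad = TRUE)
y <- exp2(x)

# Reduce to a scalar so that no explicit gradient vector is needed.
y$sum()$backward()
grad <- as.numeric(as_array(x$grad))
```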
grad_outputs should be a list of length matching outputs, containing the “vector”
in the Jacobian-vector product, usually the pre-computed gradients w.r.t. each of
the outputs. If an output doesn’t require_grad, then the gradient can be NULL.
autograd_grad( outputs, inputs, grad_outputs = NULL, retain_graph = create_graph, create_graph = FALSE, allow_unused = FALSE )
outputs |
(sequence of Tensor) – outputs of the differentiated function. |
inputs |
(sequence of Tensor) – Inputs w.r.t. which the gradient will be returned (and not accumulated into .grad). |
grad_outputs |
(sequence of Tensor) – The “vector” in the Jacobian-vector
product. Usually gradients w.r.t. each output. None values can be specified for
scalar Tensors or ones that don’t require grad. If a None value would be acceptable
for all |
retain_graph |
(bool, optional) – If |
create_graph |
(bool, optional) – If |
allow_unused |
(bool, optional) – If |
If only_inputs is TRUE, the function will only return a list of gradients w.r.t.
the specified inputs. If it’s FALSE, then gradient w.r.t. all remaining leaves
will still be computed, and will be accumulated into their .grad attribute.
if (torch_is_installed()) {
  w <- torch_tensor(0.5, requires_grad = TRUE)
  b <- torch_tensor(0.9, requires_grad = TRUE)
  x <- torch_tensor(runif(100))
  y <- 2 * x + 1
  loss <- (y - (w * x + b))^2
  loss <- loss$mean()
  o <- autograd_grad(loss, list(w, b))
  o
}
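The gradients returned for a mean squared error loss can be checked against their closed form. The sketch below uses a small fixed input instead of random data so that the comparison is reproducible. The analytic expressions and the names x_r, resid, g_w and g_b are illustrative additions, not part of the package.

```r
library(torch)

x_r <- c(0, 0.25, 0.5, 0.75, 1)
w <- torch_tensor(0.5, requires_grad = TRUE)
b <- torch_tensor(0.9, requires_grad = TRUE)
x <- torch_tensor(x_r)
y <- 2 * x + 1

loss <- ((y - (w * x + b))^2)$mean()
grads <- autograd_grad(loss, list(w, b))
grad_w <- as.numeric(as_array(grads[[1]]))
grad_b <- as.numeric(as_array(grads[[2]]))

# Closed form for loss = mean(resid^2) with resid = y - (w * x + b):
# dloss/dw = mean(-2 * x * resid), dloss/db = mean(-2 * resid).
resid <- (2 * x_r + 1) - (0.5 * x_r + 0.9)
g_w <- mean(-2 * x_r * resid)
g_b <- mean(-2 * resid)
```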
Sets or disables gradient history.
autograd_set_grad_mode(enabled)
enabled |
bool whether to enable or disable the gradient recording. |
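As a rough illustration of the effect, a result computed while recording is disabled does not track gradients, whereas the same computation after re-enabling does. The sketch assumes the requires_grad field of a tensor is used to inspect this.

```r
library(torch)

x <- torch_tensor(1, requires_grad = TRUE)

# Recording disabled: the result is detached from the graph.
autograd_set_grad_mode(FALSE)
y <- 2 * x

# Recording re-enabled: the result tracks gradients again.
autograd_set_grad_mode(TRUE)
z <- 2 * x

y_tracks <- y$requires_grad
z_tracks <- z$requires_grad
```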
Class representing the context.
ptr
(Dev related) pointer to the context c++ object.
needs_input_grad
boolean listing arguments of forward and whether they require_grad.
saved_variables
list of objects that were saved for backward via save_for_backward.
new()
(Dev related) Initializes the context. Not user related.
AutogradContext$new( ptr, env, argument_names = NULL, argument_needs_grad = NULL )
ptr
pointer to the c++ object
env
environment that encloses both forward and backward
argument_names
names of forward arguments
argument_needs_grad
whether each argument in forward needs grad.
save_for_backward()
Saves given objects for a future call to backward().
This should be called at most once, and only from inside the forward() method.
Later, saved objects can be accessed through the saved_variables attribute.
Before returning them to the user, a check is made to ensure they weren’t used
in any in-place operation that modified their content.
Arguments can also be any kind of R object.
AutogradContext$save_for_backward(...)
...
any kind of R object that will be saved for the backward pass. It's common to pass named arguments.
mark_non_differentiable()
Marks outputs as non-differentiable.
This should be called at most once, only from inside the forward() method,
and all arguments should be outputs.
This will mark outputs as not requiring gradients, increasing the efficiency
of backward computation. You still need to accept a gradient for each output
in backward(), but it’s always going to be a zero tensor with the same
shape as the corresponding output.
This is used e.g. for indices returned from a max Function.
AutogradContext$mark_non_differentiable(...)
...
non-differentiable outputs.
mark_dirty()
Marks given tensors as modified in an in-place operation.
This should be called at most once, only from inside the forward() method,
and all arguments should be inputs.
Every tensor that’s been modified in-place in a call to forward() should
be given to this function, to ensure correctness of our checks. It doesn’t
matter whether the function is called before or after modification.
AutogradContext$mark_dirty(...)
...
tensors that are modified in-place.
clone()
The objects of this class are cloneable with this method.
AutogradContext$clone(deep = FALSE)
deep
Whether to make a deep clone.
CuDNN is available
backends_cudnn_is_available()
MKL is available
backends_mkl_is_available()
Returns whether LibTorch is built with MKL support.
MKLDNN is available
backends_mkldnn_is_available()
Returns whether LibTorch is built with MKL-DNN support.
MPS is available
backends_mps_is_available()
Returns whether LibTorch is built with MPS support.
OpenMP is available
backends_openmp_is_available()
Returns whether LibTorch is built with OpenMP support.
Raises value_error: if any of the values is not a numeric instance,
a torch.*Tensor instance, or an instance implementing torch_function.
TODO: add has_torch_function((v,))
See: https://github.com/pytorch/pytorch/blob/master/torch/distributions/utils.py
broadcast_all(values)
values |
List of:
|
Clones a module.
clone_module(module, deep = FALSE, ..., replace_values = TRUE)
module |
( |
deep |
( |
... |
(any) |
replace_values |
( |
if (torch_is_installed()) {
  clone_module(nn_linear(1, 1), deep = TRUE)
  # is the same as
  nn_linear(1, 1)$clone(deep = TRUE)
}
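One way to see what a deep clone buys is to modify the copy and confirm the original is untouched. The sketch below assumes that a deep clone owns its own parameter storage. The names original, copied, before and after are illustrative, and with_no_grad() is used because parameters require gradients.

```r
library(torch)

original <- nn_linear(1, 1)
copied <- clone_module(original, deep = TRUE)

before <- as_array(original$weight$detach())

# Overwrite the copy's weight in place, outside of gradient recording.
with_no_grad({
  copied$weight$zero_()
})

after <- as_array(original$weight$detach())
copied_weight <- as_array(copied$weight$detach())
```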
Abstract base class for constraints.
A constraint object represents a region over which a variable is valid, e.g. within which a variable can be optimized.
check()
Returns a byte tensor of sample_shape + batch_shape indicating
whether each event in value satisfies this constraint.
Constraint$check(value)
value
each event in value will be checked.
print()
Defines the print method for constraints.
Constraint$print()
clone()
The objects of this class are cloneable with this method.
Constraint$clone(deep = FALSE)
deep
Whether to make a deep clone.
Based on the implementation from Rotated_IoU
contrib_sort_vertices(vertices, mask, num_valid)
vertices |
A Tensor with the vertices. |
mask |
A tensor containing the masks. |
num_valid |
An integer tensor. |
All tensors should be on a CUDA device so this function can be used.
This function is not part of the official torch API.
if (torch_is_installed()) {
  if (cuda_is_available()) {
    v <- torch_randn(8, 1024, 24, 2)$cuda()
    mean <- torch_mean(v, dim = 2, keepdim = TRUE)
    v <- v - mean
    m <- (torch_rand(8, 1024, 24) > 0.8)$cuda()
    nv <- torch_sum(m$to(dtype = torch_int()), dim = -1)$to(dtype = torch_int())$cuda()
    result <- contrib_sort_vertices(v, m, nv)
  }
}
A gradient scaler instance is used to perform dynamic gradient scaling to avoid gradient underflow when training with mixed precision.
cuda_amp_grad_scaler( init_scale = 2^16, growth_factor = 2, backoff_factor = 0.5, growth_interval = 2000, enabled = TRUE )
init_scale |
a numeric value indicating the initial scale factor. |
growth_factor |
a numeric value indicating the growth factor. |
backoff_factor |
a numeric value indicating the backoff factor. |
growth_interval |
a numeric value indicating the growth interval. |
enabled |
a logical value indicating whether the gradient scaler should be enabled. |
A gradient scaler object.
Returns the index of a currently selected device.
cuda_current_device()
Returns the number of GPUs available.
cuda_device_count()
Releases all unoccupied cached memory currently held by the caching allocator
so that it can be used by other GPU applications and is visible in nvidia-smi.
cuda_empty_cache()
cuda_empty_cache() doesn’t increase the amount of GPU memory available
for torch. However, it may help reduce fragmentation of GPU memory in certain
cases. See the Memory management article for more details about GPU memory management.
Returns the major and minor CUDA capability of device.
cuda_get_device_capability(device = cuda_current_device())
device |
Integer value of the CUDA device to return capabilities of. |
Returns a bool indicating if CUDA is currently available.
cuda_is_available()
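A common use of this check is to choose a device once and create tensors on it, falling back to the CPU when CUDA is not present. This is a minimal sketch of that pattern. The variable names are illustrative.

```r
library(torch)

# Use the GPU when one is available, otherwise stay on the CPU.
device <- if (cuda_is_available()) torch_device("cuda") else torch_device("cpu")

x <- torch_ones(2, 2, device = device)

# Move back to the CPU before converting to an R array.
x_r <- as_array(x$cpu())
```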
The return value of this function is a dictionary of statistics, each of which is a non-negative integer.
cuda_memory_stats(device = cuda_current_device())
cuda_memory_summary(device = cuda_current_device())
device |
Integer value of the CUDA device to return capabilities of. |
"allocated.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of allocation requests received by the memory allocator.
"allocated_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of allocated memory.
"segment.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of reserved segments from cudaMalloc().
"reserved_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of reserved memory.
"active.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of active memory blocks.
"active_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of active memory.
"inactive_split.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of inactive, non-releasable memory blocks.
"inactive_split_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of inactive, non-releasable memory.
For these core statistics, values are broken down as follows.
Pool type:
all: combined statistics across all memory pools.
large_pool: statistics for the large allocation pool (as of October 2019, for size >= 1MB allocations).
small_pool: statistics for the small allocation pool (as of October 2019, for size < 1MB allocations).
Metric type:
current: current value of this metric.
peak: maximum value of this metric.
allocated: historical total increase in this metric.
freed: historical total decrease in this metric.
"num_alloc_retries": number of failed cudaMalloc calls that result in a cache flush and retry.
"num_ooms": number of out-of-memory errors thrown.
Returns the CUDA runtime version
cuda_runtime_version()
Waits for all kernels in all streams on a CUDA device to complete.
cuda_synchronize(device = NULL)
device |
device for which to synchronize. It uses the current device
given by |
Data loader. Combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset.
dataloader( dataset, batch_size = 1, shuffle = FALSE, sampler = NULL, batch_sampler = NULL, num_workers = 0, collate_fn = NULL, pin_memory = FALSE, drop_last = FALSE, timeout = -1, worker_init_fn = NULL, worker_globals = NULL, worker_packages = NULL )
dataset |
(Dataset): dataset from which to load the data. |
batch_size |
(int, optional): how many samples per batch to load
(default: |
shuffle |
(bool, optional): set to |
sampler |
(Sampler, optional): defines the strategy to draw samples from
the dataset. If specified, |
batch_sampler |
(Sampler, optional): like sampler, but returns a batch of
indices at a time. Mutually exclusive with |
num_workers |
(int, optional): how many subprocesses to use for data
loading. 0 means that the data will be loaded in the main process.
(default: |
collate_fn |
(callable, optional): merges a list of samples to form a mini-batch. |
pin_memory |
(bool, optional): If |
drop_last |
(bool, optional): set to |
timeout |
(numeric, optional): if positive, the timeout value for collecting a batch
from workers. -1 means no timeout. (default: |
worker_init_fn |
(callable, optional): If not |
worker_globals |
(list or character vector, optional) only used when
|
worker_packages |
(character vector, optional) Only used if |
When using num_workers > 0, data loading will happen in parallel for each
worker. Note that batches are taken in parallel and not observations.
The worker initialization process happens in the following order:
num_workers R sessions are initialized.
Then in each worker we perform the following actions:
the torch library is loaded.
a random seed is set both using set.seed() and using torch_manual_seed.
packages passed to the worker_packages argument are loaded.
objects passed through the worker_globals parameter are copied into the global environment.
the worker_init_fn function is run with an id argument.
the dataset fetcher is copied to the worker.
Creates an iterator from a DataLoader
dataloader_make_iter(dataloader)
dataloader |
a dataloader object. |
Get the next element of a dataloader iterator
dataloader_next(iter, completed = NULL)
iter |
a DataLoader iter created with dataloader_make_iter. |
completed |
the returned value when the iterator is exhausted. |
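The two functions above can be combined into a manual iteration loop that stops when the iterator returns the completed value. The sketch below assumes the default completed = NULL and uses tensor_dataset() to build a small in-memory dataset. The counter n_batches is illustrative.

```r
library(torch)

# 10 observations with batch_size = 4 give batches of 4, 4 and 2.
ds <- tensor_dataset(torch_randn(10, 3))
dl <- dataloader(ds, batch_size = 4)

it <- dataloader_make_iter(dl)
n_batches <- 0
repeat {
  batch <- dataloader_next(it)
  # NULL (the default `completed` value) signals that the iterator is exhausted.
  if (is.null(batch)) break
  n_batches <- n_batches + 1
}
```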
dataset
All datasets that represent a map from keys to data samples should subclass this
class. All subclasses should overwrite the .getitem() method, which supports
fetching a data sample for a given key. Subclasses could also optionally
overwrite .length(), which is expected to return the size of the dataset
(e.g. number of samples) used by many sampler implementations
and the default options of dataloader().
dataset( name = NULL, inherit = Dataset, ..., private = NULL, active = NULL, parent_env = parent.frame() )
name |
a name for the dataset. It's also used as the class for it. |
inherit |
you can optionally inherit from a dataset when creating a new dataset. |
... |
public methods for the dataset class |
private |
passed to |
active |
passed to |
parent_env |
An environment to use as the parent of newly-created objects. |
The output is a function f with class dataset_generator. Calling f()
creates a new instance of the R6 class dataset. The R6 class is stored in the
enclosing environment of f and can also be accessed through f's attribute
Dataset.
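To make the generator pattern concrete, here is a minimal sketch of a map-style dataset that implements .getitem() and .length() and is then consumed through a dataloader. The name toy_dataset and its single field x are hypothetical and exist only for this illustration.

```r
library(torch)

toy_dataset <- dataset(
  name = "toy_dataset",
  initialize = function(n) {
    # An n x 1 tensor holding the values 1..n.
    self$x <- torch_tensor(matrix(as.numeric(seq_len(n)), ncol = 1))
  },
  .getitem = function(i) {
    # One observation, without a batch dimension.
    list(x = self$x[i, ])
  },
  .length = function() {
    self$x$size(1)
  }
)

# Calling the generator creates an instance of the dataset.
ds <- toy_dataset(10)
dl <- dataloader(ds, batch_size = 4)

batch <- dataloader_next(dataloader_make_iter(dl))
batch_shape <- as.integer(batch$x$shape)
batch_values <- as.numeric(as_array(batch$x))
```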
By default datasets are iterated by returning each observation/item individually.
Often it's possible to have an optimized implementation to take a batch
of observations (e.g., subsetting a tensor by multiple indexes at once is faster than
subsetting once for each index). In this case you can implement a .getbatch method
that will be used instead of .getitem when getting a batch of observations within
the dataloader. .getbatch must work for batches of size larger than or equal to 1, and
care must be taken so it doesn't drop the batch dimension when it's queried with
a length 1 batch index, for instance by using drop=FALSE. .getitem() is expected
to not include the batch dimension as it's added by the dataloader.
For more on this see vignette("loading-data").
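The batched variant can be sketched as follows. This is an illustration under the assumption that the dataloader passes an integer vector of indexes to .getbatch and uses the returned tensors as the batch. The name batched_dataset is hypothetical. Note the drop = FALSE, which keeps the batch dimension even for a single index.

```r
library(torch)

batched_dataset <- dataset(
  name = "batched_dataset",
  initialize = function(n) {
    self$x <- torch_tensor(matrix(as.numeric(seq_len(n)), ncol = 1))
  },
  .getbatch = function(indexes) {
    # Subset all requested rows at once and keep the batch dimension.
    list(x = self$x[indexes, , drop = FALSE])
  },
  .length = function() {
    self$x$size(1)
  }
)

ds <- batched_dataset(10)
dl <- dataloader(ds, batch_size = 4)

batch <- dataloader_next(dataloader_make_iter(dl))
batch_shape <- as.integer(batch$x$shape)
```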
dataloader() by default constructs an index
sampler that yields integral indices. To make it work with a map-style
dataset with non-integral indices/keys, a custom sampler must be provided.
Subset of a dataset at specified indices.
dataset_subset(dataset, indices)
dataset |
(Dataset): The whole Dataset |
indices |
(sequence): Indices in the whole set selected for subset |
Creates a Bernoulli distribution parameterized by probs or logits (but not both).
Samples are binary (0 or 1). They take the value 1 with probability p
and 0 with probability 1 - p.
distr_bernoulli(probs = NULL, logits = NULL, validate_args = NULL)
probs |
(numeric or torch_tensor): the probability of sampling |
logits |
(numeric or torch_tensor): the log-odds of sampling |
validate_args |
whether to validate arguments or not. |
Distribution for details on the available methods.
Other distributions: distr_chi2(), distr_gamma(), distr_multivariate_normal(), distr_normal(), distr_poisson()
if (torch_is_installed()) {
  m <- distr_bernoulli(0.3)
  m$sample() # 30% chance 1; 70% chance 0
}
Creates a categorical distribution parameterized by either probs or
logits (but not both).
distr_categorical(probs = NULL, logits = NULL, validate_args = NULL)
probs |
(Tensor): event probabilities |
logits |
(Tensor): event log probabilities (unnormalized) |
validate_args |
Additional arguments |
It is equivalent to the distribution that torch_multinomial() samples from.
Samples are integers indexing the K classes, where K is probs$size(-1).
If probs is 1-dimensional with length K, each element is the relative probability
of sampling the class at that index.
If probs is N-dimensional, the first N-1 dimensions are treated as a batch of
relative probability vectors.
The probs argument must be non-negative, finite and have a non-zero sum,
and it will be normalized to sum to 1 along the last dimension. probs
will return this normalized value.
The logits argument will be interpreted as unnormalized log probabilities
and can therefore be any real number. It will likewise be normalized so that
the resulting probabilities sum to 1 along the last dimension. logits
will return this normalized value.
See also: torch_multinomial()
if (torch_is_installed()) {
  m <- distr_categorical(torch_tensor(c(0.25, 0.25, 0.25, 0.25)))
  m$sample() # equal probability of 1,2,3,4
}
Creates a Chi2 distribution parameterized by shape parameter df.
This is exactly equivalent to distr_gamma(concentration = 0.5 * df, rate = 0.5).
distr_chi2(df, validate_args = NULL)
df |
(float or torch_tensor): shape parameter of the distribution |
validate_args |
whether to validate arguments or not. |
Distribution for details on the available methods.
Other distributions: distr_bernoulli(), distr_gamma(), distr_multivariate_normal(), distr_normal(), distr_poisson()
if (torch_is_installed()) {
  m <- distr_chi2(torch_tensor(1.0))
  m$sample() # Chi2 distributed with shape df=1
  torch_tensor(0.1046)
}
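The stated equivalence with the Gamma distribution can be checked numerically by comparing log densities. The sketch below assumes log_prob() is available on both distributions, as described for the Distribution base class, and also compares against base R's dchisq(). The values df = 3 and value = 2 are arbitrary.

```r
library(torch)

df <- 3
value <- torch_tensor(2)

chi2 <- distr_chi2(torch_tensor(df))
# Same distribution written as a Gamma with concentration df / 2 and rate 1 / 2.
gam <- distr_gamma(torch_tensor(0.5 * df), torch_tensor(0.5))

lp_chi2 <- as.numeric(as_array(chi2$log_prob(value)))
lp_gamma <- as.numeric(as_array(gam$log_prob(value)))
lp_ref <- dchisq(2, df = df, log = TRUE)
```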
Creates a Gamma distribution parameterized by shape concentration and rate.
distr_gamma(concentration, rate, validate_args = NULL)
concentration |
(float or Tensor): shape parameter of the distribution (often referred to as alpha) |
rate |
(float or Tensor): rate = 1 / scale of the distribution (often referred to as beta) |
validate_args |
whether to validate arguments or not. |
Distribution for details on the available methods.
Other distributions: distr_bernoulli(), distr_chi2(), distr_multivariate_normal(), distr_normal(), distr_poisson()
if (torch_is_installed()) {
  m <- distr_gamma(torch_tensor(1.0), torch_tensor(1.0))
  m$sample() # Gamma distributed with concentration=1 and rate=1
}
The MixtureSameFamily distribution implements a (batch of) mixture
distribution where all components are from different parameterizations of
the same distribution type. It is parameterized by a Categorical
"selecting distribution" (over k components) and a component
distribution, i.e., a Distribution with a rightmost batch shape
(equal to [k]) which indexes each (batch of) component.
distr_mixture_same_family( mixture_distribution, component_distribution, validate_args = NULL )
mixture_distribution |
|
component_distribution |
|
validate_args |
Additional arguments |
if (torch_is_installed()) {
  # Construct Gaussian Mixture Model in 1D consisting of 5 equally
  # weighted normal distributions
  mix <- distr_categorical(torch_ones(5))
  comp <- distr_normal(torch_randn(5), torch_rand(5))
  gmm <- distr_mixture_same_family(mix, comp)
}
Creates a multivariate normal (also called Gaussian) distribution parameterized by a mean vector and a covariance matrix.
distr_multivariate_normal( loc, covariance_matrix = NULL, precision_matrix = NULL, scale_tril = NULL, validate_args = NULL )
loc |
(Tensor): mean of the distribution |
covariance_matrix |
(Tensor): positive-definite covariance matrix |
precision_matrix |
(Tensor): positive-definite precision matrix |
scale_tril |
(Tensor): lower-triangular factor of covariance, with positive-valued diagonal |
validate_args |
Bool whether to validate the arguments or not. |
The multivariate normal distribution can be parameterized either
in terms of a positive definite covariance matrix $\mathbf{\Sigma}$
or a positive definite precision matrix $\mathbf{\Sigma}^{-1}$
or a lower-triangular matrix $\mathbf{L}$ with positive-valued
diagonal entries, such that $\mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top$.
This triangular matrix can be obtained via e.g. Cholesky decomposition of the covariance.
Only one of covariance_matrix or precision_matrix or scale_tril can be specified.
Using scale_tril will be more efficient: all computations internally
are based on scale_tril. If covariance_matrix or precision_matrix
is passed instead, it is only used to compute
the corresponding lower triangular matrices using a Cholesky decomposition.
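The relationship between the covariance and its lower-triangular factor can be sketched as follows. The example assumes linalg_cholesky() is used to obtain the factor, and it compares the log density from the two parameterizations. The covariance values and variable names are arbitrary illustrations.

```r
library(torch)

loc <- torch_zeros(2)
cov <- torch_tensor(matrix(c(2, 0.5, 0.5, 1), nrow = 2))

# Lower-triangular factor L with cov = L %*% t(L).
L <- linalg_cholesky(cov)
recon <- as_array(torch_matmul(L, L$t()))

# The same distribution, specified in two ways.
m_cov <- distr_multivariate_normal(loc, covariance_matrix = cov)
m_tril <- distr_multivariate_normal(loc, scale_tril = L)

value <- torch_tensor(c(0.3, -0.2))
lp_cov <- as.numeric(as_array(m_cov$log_prob(value)))
lp_tril <- as.numeric(as_array(m_tril$log_prob(value)))
```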
Distribution for details on the available methods.
Other distributions: distr_bernoulli(), distr_chi2(), distr_gamma(), distr_normal(), distr_poisson()
if (torch_is_installed()) {
  m <- distr_multivariate_normal(torch_zeros(2), torch_eye(2))
  m$sample() # normally distributed with mean=`[0,0]` and covariance_matrix=`I`
}
Creates a normal (also called Gaussian) distribution parameterized by
loc and scale.
distr_normal(loc, scale, validate_args = NULL)
loc |
(float or Tensor): mean of the distribution (often referred to as mu) |
scale |
(float or Tensor): standard deviation of the distribution (often referred to as sigma) |
validate_args |
Additional arguments |
Object of torch_Normal class
Distribution for details on the available methods.
Other distributions: distr_bernoulli(), distr_chi2(), distr_gamma(), distr_multivariate_normal(), distr_poisson()
if (torch_is_installed()) {
  m <- distr_normal(loc = 0, scale = 1)
  m$sample() # normally distributed with loc=0 and scale=1
}
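Beyond sampling, the distribution object can evaluate log densities. As a small sanity check, the sketch below compares log_prob() with base R's dnorm() at an arbitrary point. The variable names are illustrative.

```r
library(torch)

m <- distr_normal(loc = 0, scale = 1)

# Log density of the standard normal at 0.5.
lp <- as.numeric(as_array(m$log_prob(torch_tensor(0.5))))
lp_ref <- dnorm(0.5, mean = 0, sd = 1, log = TRUE)
```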
Creates a Poisson distribution parameterized by rate, the rate parameter.
Samples are nonnegative integers, with a pmf given by $\mathrm{rate}^k \frac{e^{-\mathrm{rate}}}{k!}$
distr_poisson(rate, validate_args = NULL)
rate |
(numeric, torch_tensor): the rate parameter |
validate_args |
whether to validate arguments or not. |
Distribution for details on the available methods.
Other distributions: distr_bernoulli(), distr_chi2(), distr_gamma(), distr_multivariate_normal(), distr_normal()
if (torch_is_installed()) {
  m <- distr_poisson(torch_tensor(4))
  m$sample()
}
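The Poisson probability mass function can be checked against base R. The sketch below evaluates log_prob() at k = 2 for rate = 4 and compares it with dpois(). Both numbers are arbitrary choices for illustration.

```r
library(torch)

m <- distr_poisson(torch_tensor(4))

# log pmf at k = 2, i.e. log(4^2 * exp(-4) / 2!).
lp <- as.numeric(as_array(m$log_prob(torch_tensor(2))))
lp_ref <- dpois(2, lambda = 4, log = TRUE)
```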
Distribution is the abstract base class for probability distributions. Note: in Python, adding torch.Size objects works as concatenation. Try for example: torch.Size((2, 1)) + torch.Size((1,))
.validate_args
whether to validate arguments
has_rsample
whether has an rsample
has_enumerate_support
whether has enumerate support
batch_shape
Returns the shape over which parameters are batched.
event_shape
Returns the shape of a single sample (without batching).
Returns a dictionary from argument names to torch_Constraint objects that
should be satisfied by each argument of this distribution. Args that
are not tensors need not appear in this dict.
support
Returns a torch_Constraint object representing this distribution's support.
mean
Returns the mean of the distribution
variance
Returns the variance of the distribution
stddev
Returns the standard deviation of the distribution TODO: consider different message
new()
Initializes a distribution class.
Distribution$new(batch_shape = NULL, event_shape = NULL, validate_args = NULL)
batch_shape
the shape over which parameters are batched.
event_shape
the shape of a single sample (without batching).
validate_args
whether to validate the arguments or not. Validation can be time consuming so you might want to disable it.
expand()
Returns a new distribution instance (or populates an existing instance
provided by a derived class) with batch dimensions expanded to batch_shape.
This method calls expand on the distribution’s parameters. As such, this
does not allocate new memory for the expanded distribution instance.
Additionally, this does not repeat any args checking or parameter
broadcasting in initialize, when an instance is first created.
Distribution$expand(batch_shape, .instance = NULL)
batch_shape
the desired expanded size.
.instance
new instance provided by subclasses that need to override expand.
sample()
Generates a sample_shape shaped sample or sample_shape shaped batch of
samples if the distribution parameters are batched.
Distribution$sample(sample_shape = NULL)
sample_shape
the shape you want to sample.
rsample()
Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.
Distribution$rsample(sample_shape = NULL)
sample_shape
the shape you want to sample.
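A reparameterized sample stays connected to the distribution's parameters, so gradients can flow back to them. The sketch below illustrates this with a normal distribution, assuming its rsample() draws loc + eps * scale, so that the derivative of the sample with respect to loc is 1. The variable names are illustrative.

```r
library(torch)

loc <- torch_tensor(0.5, requires_grad = TRUE)
scale <- torch_tensor(2)
d <- distr_normal(loc, scale)

# The sample depends on loc through the reparameterization.
s <- d$rsample()
s$backward()

loc_grad <- as.numeric(as_array(loc$grad))
```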
log_prob()
Returns the log of the probability density/mass function evaluated at value.
Distribution$log_prob(value)
value
values to evaluate the density on.
cdf()
Returns the cumulative density/mass function evaluated at value.
Distribution$cdf(value)
value
values to evaluate the density on.
icdf()
Returns the inverse cumulative density/mass function evaluated at value.
Distribution$icdf(value)
value
values to evaluate the density on.
enumerate_support()
Returns tensor containing all values supported by a discrete
distribution. The result will enumerate over dimension 0, so the shape
of the result will be (cardinality,) + batch_shape + event_shape (where
event_shape = () for univariate distributions). Note that this enumerates
over all batched tensors in lock-step: list(c(0, 0), c(1, 1), ...). With
expand=FALSE, enumeration happens along dim 0, but with the remaining batch
dimensions being singleton dimensions: list(c(0), c(1), ...).
Distribution$enumerate_support(expand = TRUE)
expand
(bool): whether to expand the support over the
batch dims to match the distribution's batch_shape.
Tensor iterating over dimension 0.
entropy()
Returns entropy of distribution, batched over batch_shape.
Distribution$entropy()
Tensor of shape batch_shape.
perplexity()
Returns perplexity of distribution, batched over batch_shape.
Distribution$perplexity()
Tensor of shape batch_shape.
.extended_shape()
Returns the size of the sample returned by the distribution, given a sample_shape. Note that the batch and event shapes of a distribution instance are fixed at the time of construction. If this is empty, the returned shape is upcast to (1,).
Distribution$.extended_shape(sample_shape = NULL)
sample_shape
(torch_Size): the size of the sample to be drawn.
.validate_sample()
Argument validation for distribution methods such as log_prob, cdf and icdf. The rightmost dimensions of a value to be scored via these methods must agree with the distribution's batch and event shapes.
Distribution$.validate_sample(value)
value
(Tensor): the tensor whose log probability is to be computed by the log_prob method.
print()
Prints the distribution instance.
Distribution$print()
clone()
The objects of this class are cloneable with this method.
Distribution$clone(deep = FALSE)
deep
Whether to make a deep clone.
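As a minimal sketch of the methods documented above, the following uses distr_normal(), one concrete distribution shipped with the package. The choice of a standard normal and of the evaluation point 0 are assumptions made purely for illustration.

```r
library(torch)

# A standard normal distribution (loc = 0, scale = 1).
d <- distr_normal(loc = 0, scale = 1)

# Draw a single sample with the default sample_shape.
s <- d$sample()

# Evaluate the log density and the cumulative distribution at 0.
lp <- d$log_prob(torch_tensor(0))
p <- d$cdf(torch_tensor(0))
```

For a standard normal, the log density at 0 is -0.5 * log(2 * pi) and the cumulative distribution at 0 is 0.5, which gives a quick sanity check on both methods.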
Enumerate an iterator
enumerate(x, ...)
x |
the generator to enumerate. |
... |
passed to specific methods. |
Enumerate an iterator
## S3 method for class 'dataloader'
enumerate(x, max_len = 1e+06, ...)
x |
the generator to enumerate. |
max_len |
maximum number of iterations. |
... |
passed to specific methods. |
Lists the Torch and Lantern library URLs to download as local files in order to proceed with install_torch_from_file().
Installs Torch and its dependencies from files.
get_install_libs_url(version = NA, type = NA)
install_torch_from_file(version = NA, type = NA, libtorch, liblantern, ...)
version |
Not used |
type |
Not used. This function is deprecated. |
libtorch |
The installation archive file to use for Torch. Shall be a |
liblantern |
The installation archive file to use for Lantern. Shall be a |
... |
other parameters to be passed to |
When a download initiated by install_torch() is not possible, but the installation archive files are present on the local filesystem, install_torch_from_file() can be used as a workaround for the installation issue.
libtorch is the archive containing all torch modules, and liblantern is the C interface to libtorch that is used by the R package. The two are highly interdependent, and the files to use should be checked through get_install_libs_url().
if (torch_is_installed()) {
  ## Not run:
  # on a linux CPU platform
  get_install_libs_url()
  # then after making both files available into /tmp/
  Sys.setenv(TORCH_URL = "/tmp/libtorch-v1.13.1.zip")
  Sys.setenv(LANTERN_URL = "/tmp/lantern-0.9.1.9001+cpu+arm64-Darwin.zip")
  torch::install_torch()
  ## End(Not run)
}
Installs Torch and its dependencies.
install_torch(reinstall = FALSE, ..., .inform_restart = TRUE)
reinstall |
Re-install Torch even if it's already installed? |
... |
Currently unused. |
.inform_restart |
if |
This function is mainly controlled by environment variables that can be used to override the defaults:
TORCH_HOME: the installation path. By default dependencies are installed within the package directory, e.g. what's given by system.file(package="torch").
TORCH_URL: A URL, path to a ZIP file or a directory containing a LibTorch version. Files will be installed/copied to the TORCH_HOME directory.
LANTERN_URL: Same as TORCH_URL but for the Lantern library.
TORCH_INSTALL_DEBUG: Setting it to 1 shows debug log messages during installation.
PRECXX11ABI: Setting it to 1 will trigger the installation of a Pre-cxx11 ABI build of LibTorch. This can be useful in environments with older versions of GLIBC like CentOS7 and older Debian/Ubuntu versions.
LANTERN_BASE_URL: The base URL for lantern files. This allows passing a directory where lantern binaries are located. The filename is then constructed as usual.
TORCH_COMMIT_SHA: torch repository commit sha to be used when querying lantern uploads. Set it to 'none' to avoid looking for a build for that commit and use the latest build for the branch.
CUDA: We try to automatically detect the CUDA version installed in your system, but you might want to manually set it here. You can also disable CUDA installation by setting it to 'cpu'.
TORCH_R_VERSION: The R torch version. It's unlikely that you need to change it, but it can be useful if you don't have the R package installed, but want to install the dependencies.
The TORCH_INSTALL environment variable can be set to 0 to prevent auto-installing torch, and TORCH_LOAD can be set to 0 to avoid loading dependencies automatically. These environment variables are meant for advanced use cases and troubleshooting only.
When a timeout error occurs during library archive download, or the length of the downloaded files differs from the reported length, an increase of the timeout value should help.
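The environment variables above are read at installation time, so they have to be set before install_torch() is called. The following is a minimal sketch under an assumed scenario (a CPU-only install with debug logging); the values chosen are illustrative only.

```r
# Assumed scenario: disable the CUDA build and turn on installer debug logging.
Sys.setenv(CUDA = "cpu", TORCH_INSTALL_DEBUG = "1")

# Check what the installer will see.
env_seen <- Sys.getenv(c("CUDA", "TORCH_INSTALL_DEBUG"))

# Then trigger the installation (commented out here because it downloads
# large archives).
# torch::install_torch()
```

Variables set with Sys.setenv() only last for the current R session; a persistent setting would have to go in a startup file such as .Renviron.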
Checks if the object is a dataloader
is_dataloader(x)
x |
object to check |
Checks if the object is an nn_buffer
is_nn_buffer(x)
x |
object to check |
Checks if the object is an nn_module
is_nn_module(x)
x |
object to check |
Checks if an object is an nn_parameter
is_nn_parameter(x)
x |
the object to check |
Checks if the object is a torch optimizer
is_optimizer(x)
x |
object to check |
Checks if object is a device
is_torch_device(x)
x |
object to check |
Check if object is a torch data type
is_torch_dtype(x)
x |
object to check. |
Check if an object is a torch layout.
is_torch_layout(x)
x |
object to check |
Check if an object is a memory format
is_torch_memory_format(x)
x |
object to check |
Checks if an object is a QScheme
is_torch_qscheme(x)
x |
object to check |
Checks if a tensor is undefined
is_undefined_tensor(x)
x |
tensor to check |
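The is_*() predicates above all follow the same pattern: they take any object and return TRUE or FALSE. A short sketch, using objects constructed here only for illustration:

```r
library(torch)

lin <- nn_linear(2, 1)
opt <- optim_sgd(lin$parameters, lr = 0.1)

checks <- c(
  module    = is_nn_module(lin),
  parameter = is_nn_parameter(nn_parameter(torch_randn(2))),
  buffer    = is_nn_buffer(nn_buffer(torch_randn(2))),
  optimizer = is_optimizer(opt),
  device    = is_torch_device(torch_device("cpu")),
  dtype     = is_torch_dtype(torch_float()),
  # A plain character string is not a torch dtype object.
  string    = is_torch_dtype("float")
)
checks
```

Such checks are mostly useful for validating arguments at the top of a function before any tensor work starts.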
Creates an iterable dataset
iterable_dataset( name, inherit = IterableDataset, ..., private = NULL, active = NULL, parent_env = parent.frame() )
name |
a name for the dataset. It's also used as the class for it. |
inherit |
you can optionally inherit from a dataset when creating a new dataset. |
... |
public methods for the dataset class |
private |
passed to |
active |
passed to |
parent_env |
An environment to use as the parent of newly-created objects. |
if (torch_is_installed()) {
  ids <- iterable_dataset(
    name = "hello",
    initialize = function(n = 5) {
      self$n <- n
      self$i <- 0
    },
    .iter = function() {
      i <- 0
      function() {
        i <<- i + 1
        if (i > self$n) {
          coro::exhausted()
        } else {
          i
        }
      }
    }
  )
  coro::collect(ids()$.iter())
}
See the TorchScript language reference for documentation on how to write TorchScript code.
jit_compile(source)
source |
valid TorchScript source code. |
if (torch_is_installed()) {
  comp <- jit_compile("
def fn (x):
  return torch.abs(x)

def foo (x):
  return torch.sum(x)
")
  comp$fn(torch_tensor(-1))
  comp$foo(torch_randn(10))
}
Loads a script_function or script_module previously saved with jit_save
jit_load(path, ...)
path |
a path to a |
... |
currently unused. |
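A minimal round-trip sketch: a traced function is written with jit_save() and read back with jit_load(). The traced function (a ReLU) and the temporary file path are assumptions chosen for illustration.

```r
library(torch)

fn <- function(x) torch_relu(x)
tr_fn <- jit_trace(fn, torch_tensor(c(-1, 0, 1)))

# Save to a temporary file and load it back.
path <- tempfile(fileext = ".pt")
jit_save(tr_fn, path)
loaded <- jit_load(path)

# The loaded function should behave like the original R function.
x <- torch_tensor(c(-2, 0.5, 3))
round_trip_ok <- torch_equal(loaded(x), fn(x))
```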
Call JIT operators directly from R, keeping the familiar argument types and argument order. Note, however, that:
all arguments are required (no defaults)
axis numbering (as well as position numbers overall) starts from 0
scalars have to be wrapped in jit_scalar()
jit_ops
An object of class torch_ops of length 0.
if (torch_is_installed()) {
  t1 <- torch::torch_rand(4, 5)
  t2 <- torch::torch_ones(5, 4)
  # same as torch::torch_matmul(t1, t2)
  jit_ops$aten$matmul(t1, t2)

  # same as torch_split(torch::torch_arange(0, 3), 2, 1)
  jit_ops$aten$split(torch::torch_arange(0, 3), torch::jit_scalar(2L), torch::jit_scalar(0L))
}
Saves a script_function to a path
jit_save(obj, path, ...)
obj |
An |
path |
The path to save the serialized function. |
... |
currently unused |
if (torch_is_installed()) {
  fn <- function(x) {
    torch_relu(x)
  }

  input <- torch_tensor(c(-1, 0, 1))
  tr_fn <- jit_trace(fn, input)

  tmp <- tempfile("tst", fileext = "pt")
  jit_save(tr_fn, tmp)
}
Saves a script_function or script_module in bytecode form, to be loaded on a mobile device
jit_save_for_mobile(obj, path, ...)
obj |
An |
path |
The path to save the serialized function. |
... |
currently unused |
if (torch_is_installed()) {
  fn <- function(x) {
    torch_relu(x)
  }

  input <- torch_tensor(c(-1, 0, 1))
  tr_fn <- jit_trace(fn, input)

  tmp <- tempfile("tst", fileext = "pt")
  jit_save_for_mobile(tr_fn, tmp)
}
Allows disambiguating length 1 vectors from scalars when passing them to the jit.
jit_scalar(x)
x |
a length 1 R vector. |
Using jit_trace(), you can turn an existing R function into a TorchScript script_function. You must provide example inputs, and we run the function, recording the operations performed on all the tensors.
jit_trace(func, ..., strict = TRUE)
func |
An R function that will be run with |
... |
example inputs that will be passed to the function while
tracing. The resulting trace can be run with inputs of different types and
shapes assuming the traced operations support those types and shapes.
|
strict |
run the tracer in a strict mode or not (default: |
The resulting recording of a standalone function produces a script_function.
A script_function if func is a function and a script_module if func is an nn_module().
Tracing only correctly records functions and modules which are not data dependent
(e.g., do not have conditionals on data in tensors) and do not have any untracked
external dependencies (e.g., perform input/output or access global variables).
Tracing only records operations done when the given function is run on the given
tensors. Therefore, the returned script_function
will always run the same traced
graph on any input. This has some important implications when your module is
expected to run different sets of operations, depending on the input and/or the
module state. For example,
Tracing will not record any control-flow like if-statements or loops. When this control-flow is constant across your module, this is fine and it often inlines the control-flow decisions. But sometimes the control-flow is actually part of the model itself. For instance, a recurrent network is a loop over the (possibly dynamic) length of an input sequence.
In the returned script_function, operations that have different behaviors in training and eval modes will always behave as if they were in the mode used during tracing, no matter which mode the script_function is in.
In cases like these, tracing would not be appropriate and scripting is a better choice. If you trace such models, you may silently get incorrect results on subsequent invocations of the model. The tracer will try to emit warnings when doing something that may cause an incorrect trace to be produced.
Scripting is not yet supported in R.
if (torch_is_installed()) {
  fn <- function(x) {
    torch_relu(x)
  }
  input <- torch_tensor(c(-1, 0, 1))
  tr_fn <- jit_trace(fn, input)
  tr_fn(input)
}
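The warning about data-dependent control flow can be made concrete with a small sketch. The function below is a hypothetical example written for this purpose: it branches on the sign of the sum of its input, so tracing it with one example input records only the branch that input happened to take.

```r
library(torch)

# Data-dependent branch: which arm runs depends on the values stored in x.
fn <- function(x) {
  if (as.numeric(x$sum()) > 0) x * 2 else x * -1
}

pos <- torch_tensor(c(1, 2))
neg <- torch_tensor(c(-1, -2))

# Tracing with a positive input records only the `x * 2` arm.
# Warnings are suppressed because the tracer may complain about the branch.
traced <- suppressWarnings(jit_trace(fn, pos))

eager_out <- fn(neg)       # the R function takes the else arm
traced_out <- traced(neg)  # the trace replays the recorded arm

results_agree <- torch_equal(eager_out, traced_out)
```

Here results_agree is expected to be FALSE: the traced function silently keeps doubling its input, which is exactly the kind of incorrect result described above.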
Trace a module and return an executable ScriptModule that will be optimized using just-in-time compilation. When a module is passed to jit_trace(), only the forward method is run and traced. With jit_trace_module(), you can specify a named list of method names to example inputs to trace (see the ... argument below).
jit_trace_module(mod, ..., strict = TRUE)
mod |
A torch |
... |
A named list containing sample inputs indexed by method names
in mod. The inputs will be passed to methods whose names correspond to inputs
keys while tracing. |
strict |
run the tracer in a strict mode or not (default: |
See jit_trace for more information on tracing.
if (torch_is_installed()) {
  linear <- nn_linear(10, 1)
  tr_linear <- jit_trace_module(linear, forward = list(torch_randn(10, 10)))

  x <- torch_randn(10, 10)
  torch_allclose(linear(x), tr_linear(x))
}
Allows specifying that an output or input must be considered a jit tuple instead of a list or dictionary when tracing.
jit_tuple(x)
x |
the list object that will be converted to a tuple. |
Letting $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$, the Cholesky decomposition of a complex Hermitian or real symmetric positive-definite matrix $A \in \mathbb{K}^{n \times n}$ is defined as
$$A = LL^{H} \qquad L \in \mathbb{K}^{n \times n}$$
where $L$ is a lower triangular matrix and $L^{H}$ is the conjugate transpose when $L$ is complex, and the transpose when $L$ is real-valued.
linalg_cholesky(A)
A |
(Tensor): tensor of shape |
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
linalg_cholesky_ex() for a version of this operation that skips the (slow) error checking by default and instead returns the debug information. This makes it a faster way to check if a matrix is positive-definite.
linalg_eigh() for a different decomposition of a Hermitian matrix. The eigenvalue decomposition gives more information about the matrix but is slower to compute than the Cholesky decomposition.
Other linalg: linalg_cholesky_ex(), linalg_det(), linalg_eig(), linalg_eigh(), linalg_eigvals(), linalg_eigvalsh(), linalg_householder_product(), linalg_inv(), linalg_inv_ex(), linalg_lstsq(), linalg_matrix_norm(), linalg_matrix_power(), linalg_matrix_rank(), linalg_multi_dot(), linalg_norm(), linalg_pinv(), linalg_qr(), linalg_slogdet(), linalg_solve(), linalg_solve_triangular(), linalg_svd(), linalg_svdvals(), linalg_tensorinv(), linalg_tensorsolve(), linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_eye(10)
  linalg_cholesky(a)
}
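The identity matrix is a trivial case. As a slightly richer sketch, the following builds a symmetric positive-definite matrix (the construction B B^T + I and the seed are assumptions for illustration) and checks the two defining properties of the factor: it is lower triangular, and multiplying it by its transpose recovers the input.

```r
library(torch)

torch_manual_seed(1)
B <- torch_randn(3, 3)
# B %*% t(B) + I is symmetric positive-definite by construction.
A <- torch_matmul(B, B$t()) + torch_eye(3)

L <- linalg_cholesky(A)

# The strictly upper triangular part of L should be zero.
upper_is_zero <- torch_equal(torch_triu(L, diagonal = 1), torch_zeros(3, 3))

# L %*% t(L) should reproduce A up to floating point error.
reconstructs <- torch_allclose(torch_matmul(L, L$t()), A, atol = 1e-4)
```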
This function skips the (slow) error checking and error message construction of linalg_cholesky(), instead directly returning the LAPACK error codes as part of a named tuple (L, info). This makes this function a faster way to check if a matrix is positive-definite, and it provides an opportunity to handle decomposition errors more gracefully or performantly than linalg_cholesky() does.
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
If A is not a Hermitian positive-definite matrix, or if it's a batch of matrices and one or more of them is not a Hermitian positive-definite matrix, then info stores a positive integer for the corresponding matrix. The positive integer indicates the order of the leading minor that is not positive-definite, and the decomposition could not be completed. info filled with zeros indicates that the decomposition was successful. If check_errors=TRUE and info contains positive integers, then a RuntimeError is thrown.
linalg_cholesky_ex(A, check_errors = FALSE)
A |
(Tensor): the Hermitian |
check_errors |
(bool, optional): controls whether to check the content of |
If A is on a CUDA device, this function may synchronize that device with the CPU.
This function is "experimental" and it may change in a future PyTorch release.
linalg_cholesky() is a NumPy compatible variant that always checks for errors.
Other linalg: linalg_cholesky(), linalg_det(), linalg_eig(), linalg_eigh(), linalg_eigvals(), linalg_eigvalsh(), linalg_householder_product(), linalg_inv(), linalg_inv_ex(), linalg_lstsq(), linalg_matrix_norm(), linalg_matrix_power(), linalg_matrix_rank(), linalg_multi_dot(), linalg_norm(), linalg_pinv(), linalg_qr(), linalg_slogdet(), linalg_solve(), linalg_solve_triangular(), linalg_svd(), linalg_svdvals(), linalg_tensorinv(), linalg_tensorsolve(), linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_randn(2, 2)
  out <- linalg_cholesky_ex(A)
  out
}
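To illustrate how info can serve as a positive-definiteness test, here is a small sketch with two hand-picked matrices (both are assumptions for illustration). The second element of the result is read by position, so the sketch does not rely on how the returned list is named.

```r
library(torch)

pd <- torch_eye(2)                               # positive-definite
not_pd <- torch_tensor(rbind(c(1, 2), c(2, 1)))  # symmetric, eigenvalues 3 and -1

# info is the second element of the returned (L, info) pair.
info_pd <- as.numeric(linalg_cholesky_ex(pd)[[2]])
info_not_pd <- as.numeric(linalg_cholesky_ex(not_pd)[[2]])

# info == 0 means the decomposition succeeded.
is_pd <- c(pd = info_pd == 0, not_pd = info_not_pd == 0)
is_pd
```

No error is raised for the second matrix because check_errors defaults to FALSE; the failure is only reported through info.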
Letting $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$, the condition number $\kappa$ of a matrix $A \in \mathbb{K}^{n \times n}$ is defined as
$$\kappa(A) = \|A\|_p \|A^{-1}\|_p$$
linalg_cond(A, p = NULL)
A |
(Tensor): tensor of shape |
p |
(int, inf, -inf, 'fro', 'nuc', optional):
the type of the matrix norm to use in the computations (see above). Default: |
The condition number of A
measures the numerical stability of the linear system AX = B
with respect to a matrix norm.
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
p defines the matrix norm that is computed. See the table in 'Details' to find the supported norms.
If p is one of ('fro', 'nuc', inf, -inf, 1, -1), this function uses linalg_norm() and linalg_inv(). As such, in this case, the matrix (or every matrix in the batch) A has to be square and invertible.
For p in (2, -2), this function can be computed in terms of the singular values $\sigma_1 \geq \ldots \geq \sigma_n$:
$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n} \qquad \kappa_{-2}(A) = \frac{\sigma_n}{\sigma_1}$$
In these cases, it is computed using linalg_svd(). For these norms, the matrix (or every matrix in the batch) A may have any shape.
p |
matrix norm |
NULL |
2-norm (largest singular value) |
'fro' |
Frobenius norm |
'nuc' |
nuclear norm |
Inf |
max(sum(abs(x), dim=2)) |
-Inf |
min(sum(abs(x), dim=2)) |
1 |
max(sum(abs(x), dim=1)) |
-1 |
min(sum(abs(x), dim=1)) |
2 |
largest singular value |
-2 |
smallest singular value |
A real-valued tensor, even when A is complex.
When inputs are on a CUDA device, this function synchronizes that device with the CPU if p is one of ('fro', 'nuc', inf, -inf, 1, -1).
if (torch_is_installed()) {
  a <- torch_tensor(rbind(c(1., 0, -1), c(0, 1, 0), c(1, 0, 1)))
  linalg_cond(a)
  linalg_cond(a, "fro")
}
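For a matrix whose singular values are known by inspection, the definitions above can be checked by hand. The diagonal matrix below is an assumption chosen for illustration: its singular values are 4 and 1, so the 2-norm condition number is 4, and the Frobenius version is sqrt(17) * sqrt(17 / 16) = 4.25.

```r
library(torch)

A <- torch_diag(torch_tensor(c(1, 4)))

# Default (p = NULL): ratio of largest to smallest singular value.
k2 <- as.numeric(linalg_cond(A))

# The same quantity computed directly from the singular values.
s <- linalg_svdvals(A)
k2_svd <- as.numeric(s$max() / s$min())

# Frobenius norm version: ||A||_F * ||A^{-1}||_F.
k_fro <- as.numeric(linalg_cond(A, "fro"))
```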
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
linalg_det(A)
A |
(Tensor): tensor of shape |
Other linalg: linalg_cholesky(), linalg_cholesky_ex(), linalg_eig(), linalg_eigh(), linalg_eigvals(), linalg_eigvalsh(), linalg_householder_product(), linalg_inv(), linalg_inv_ex(), linalg_lstsq(), linalg_matrix_norm(), linalg_matrix_power(), linalg_matrix_rank(), linalg_multi_dot(), linalg_norm(), linalg_pinv(), linalg_qr(), linalg_slogdet(), linalg_solve(), linalg_solve_triangular(), linalg_svd(), linalg_svdvals(), linalg_tensorinv(), linalg_tensorsolve(), linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_randn(3, 3)
  linalg_det(a)

  a <- torch_randn(3, 3, 3)
  linalg_det(a)
}
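With random inputs the result cannot be checked by eye. As a complementary sketch, the following uses a fixed 2 x 2 matrix (an assumption for illustration) whose determinant is 1 * 4 - 2 * 3 = -2, and stacks it with a scaled copy to show the batched behaviour.

```r
library(torch)

A <- torch_tensor(rbind(c(1, 2), c(3, 4)))
d <- as.numeric(linalg_det(A))

# A batch of two matrices with shape (2, 2, 2): A and 2 * A.
batch <- torch_stack(list(A, 2 * A))
d_batch <- as.numeric(linalg_det(batch))
```

Scaling a 2 x 2 matrix by 2 multiplies its determinant by 4, so the batch result should be approximately (-2, -8), one determinant per matrix.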
Letting $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$, the eigenvalue decomposition of a square matrix $A \in \mathbb{K}^{n \times n}$ (if it exists) is defined as
$$A = V \operatorname{diag}(\Lambda) V^{-1} \qquad V \in \mathbb{C}^{n \times n}, \Lambda \in \mathbb{C}^{n}$$
This decomposition exists if and only if $A$ is diagonalizable. This is the case when all its eigenvalues are different.
linalg_eig(A)
A |
(Tensor): tensor of shape |
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
A list (eigenvalues, eigenvectors) which corresponds to $\Lambda$ and $V$ above.
eigenvalues and eigenvectors will always be complex-valued, even when A is real. The eigenvectors will be given by the columns of eigenvectors.
This function assumes that A is diagonalizable (for example, when all the eigenvalues are different). If it is not diagonalizable, the returned eigenvalues will be correct but $A \neq V \operatorname{diag}(\Lambda) V^{-1}$.
The eigenvectors of a matrix are not unique, nor are they continuous with respect to A. Due to this lack of uniqueness, different hardware and software may compute different eigenvectors.
This non-uniqueness is caused by the fact that multiplying an eigenvector by a non-zero number produces another set of valid eigenvectors of the matrix. In this implementation, the returned eigenvectors are normalized to have norm 1 and largest real component.
Gradients computed using V will only be finite when A does not have repeated eigenvalues. Furthermore, if the distance between any two eigenvalues is close to zero, the gradient will be numerically unstable, as it depends on the eigenvalues $\lambda_i$ through the computation of $\frac{1}{\min_{i \neq j} \lambda_i - \lambda_j}$.
The eigenvalues and eigenvectors of a real matrix may be complex.
linalg_eigvals() computes only the eigenvalues. Unlike linalg_eig(), the gradients of linalg_eigvals() are always numerically stable.
linalg_eigh() for a (faster) function that computes the eigenvalue decomposition for Hermitian and symmetric matrices.
linalg_svd() for a function that computes another type of spectral decomposition that works on matrices of any shape.
linalg_qr() for another (much faster) decomposition that works on matrices of any shape.
Other linalg: linalg_cholesky(), linalg_cholesky_ex(), linalg_det(), linalg_eigh(), linalg_eigvals(), linalg_eigvalsh(), linalg_householder_product(), linalg_inv(), linalg_inv_ex(), linalg_lstsq(), linalg_matrix_norm(), linalg_matrix_power(), linalg_matrix_rank(), linalg_multi_dot(), linalg_norm(), linalg_pinv(), linalg_qr(), linalg_slogdet(), linalg_solve(), linalg_solve_triangular(), linalg_svd(), linalg_svdvals(), linalg_tensorinv(), linalg_tensorsolve(), linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_randn(2, 2)
  wv <- linalg_eig(a)
}
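The statement that a real matrix can have complex eigenvalues is easy to see on a 90-degree rotation matrix, whose eigenvalues are +i and -i. The matrix below is an assumption chosen for illustration; the real and imaginary parts are extracted with torch_real() and torch_imag() before converting to R numbers.

```r
library(torch)

# Rotation by 90 degrees: a real matrix with purely imaginary eigenvalues.
A <- torch_tensor(rbind(c(0, -1), c(1, 0)))

wv <- linalg_eig(A)
w <- wv[[1]]  # eigenvalues (complex tensor)

re <- as.numeric(torch_real(w))
im <- sort(as.numeric(torch_imag(w)))
```

The order in which the two eigenvalues are returned is not relied upon here, which is why the imaginary parts are sorted before being inspected.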
Letting $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$, the eigenvalue decomposition of a complex Hermitian or real symmetric matrix $A \in \mathbb{K}^{n \times n}$ is defined as
$$A = Q \operatorname{diag}(\Lambda) Q^{H} \qquad Q \in \mathbb{K}^{n \times n}, \Lambda \in \mathbb{R}^{n}$$
where $Q^{H}$ is the conjugate transpose when $Q$ is complex, and the transpose when $Q$ is real-valued. $Q$ is orthogonal in the real case and unitary in the complex case.
linalg_eigh(A, UPLO = "L")
A |
(Tensor): tensor of shape |
UPLO |
('L', 'U', optional): controls whether to use the upper or lower triangular part
of |
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
A is assumed to be Hermitian (resp. symmetric), but this is not checked internally. Instead:
If UPLO = 'L' (default), only the lower triangular part of the matrix is used in the computation.
If UPLO = 'U', only the upper triangular part of the matrix is used.
The eigenvalues are returned in ascending order.
A list (eigenvalues, eigenvectors) which corresponds to $\Lambda$ and $Q$ above.
eigenvalues will always be real-valued, even when A is complex. It will also be ordered in ascending order.
eigenvectors will have the same dtype as A and will contain the eigenvectors as its columns.
The eigenvectors of a symmetric matrix are not unique, nor are they continuous with respect to A. Due to this lack of uniqueness, different hardware and software may compute different eigenvectors.
This non-uniqueness is caused by the fact that multiplying an eigenvector by -1 in the real case or by $e^{i \phi}, \phi \in \mathbb{R}$ in the complex case produces another set of valid eigenvectors of the matrix. This non-uniqueness problem is even worse when the matrix has repeated eigenvalues. In this case, one may multiply the associated eigenvectors spanning the subspace by a rotation matrix and the resulting eigenvectors will be valid eigenvectors.
Gradients computed using the eigenvectors tensor will only be finite when A has unique eigenvalues. Furthermore, if the distance between any two eigenvalues is close to zero, the gradient will be numerically unstable, as it depends on the eigenvalues $\lambda_i$ through the computation of $\frac{1}{\min_{i \neq j} \lambda_i - \lambda_j}$.
The eigenvalues of real symmetric or complex Hermitian matrices are always real.
linalg_eigvalsh() computes only the eigenvalues of a Hermitian matrix. Unlike linalg_eigh(), the gradients of linalg_eigvalsh() are always numerically stable.
linalg_cholesky() for a different decomposition of a Hermitian matrix. The Cholesky decomposition gives less information about the matrix but is much faster to compute than the eigenvalue decomposition.
linalg_eig() for a (slower) function that computes the eigenvalue decomposition of a not necessarily Hermitian square matrix.
linalg_svd() for a (slower) function that computes the more general SVD decomposition of matrices of any shape.
linalg_qr() for another (much faster) decomposition that works on general matrices.
Other linalg: linalg_cholesky(), linalg_cholesky_ex(), linalg_det(), linalg_eig(), linalg_eigvals(), linalg_eigvalsh(), linalg_householder_product(), linalg_inv(), linalg_inv_ex(), linalg_lstsq(), linalg_matrix_norm(), linalg_matrix_power(), linalg_matrix_rank(), linalg_multi_dot(), linalg_norm(), linalg_pinv(), linalg_qr(), linalg_slogdet(), linalg_solve(), linalg_solve_triangular(), linalg_svd(), linalg_svdvals(), linalg_tensorinv(), linalg_tensorsolve(), linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_randn(2, 2)
  linalg_eigh(a)
}
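As a sketch of the decomposition on a genuinely symmetric input, the following uses a fixed 2 x 2 matrix (an assumption for illustration) with eigenvalues 1 and 3, and rebuilds the matrix from the returned factors. The outputs are read by position rather than by name.

```r
library(torch)

A <- torch_tensor(rbind(c(2, 1), c(1, 2)))

out <- linalg_eigh(A)
vals <- out[[1]]  # eigenvalues, ascending
Q <- out[[2]]     # eigenvectors as columns

# Rebuild A as Q diag(vals) Q^T.
recon <- torch_matmul(torch_matmul(Q, torch_diag(vals)), Q$t())
reconstructs <- torch_allclose(recon, A, atol = 1e-5)
```

Because eigenvectors are only defined up to sign, the check is done on the reconstructed matrix rather than on the entries of Q.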
Letting $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$, the eigenvalues of a square matrix $A \in \mathbb{K}^{n \times n}$ are defined as the roots (counted with multiplicity) of the polynomial $p$ of degree $n$ given by
$$p(\lambda) = \operatorname{det}(A - \lambda \mathrm{I}_n) \qquad \lambda \in \mathbb{C}$$
where $\mathrm{I}_n$ is the n-dimensional identity matrix.
linalg_eigvals(A)
A |
(Tensor): tensor of shape |
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
The eigenvalues of a real matrix may be complex, as the roots of a real polynomial may be complex. The eigenvalues of a matrix are always well-defined, even when the matrix is not diagonalizable.
linalg_eig() computes the full eigenvalue decomposition.
Other linalg: linalg_cholesky(), linalg_cholesky_ex(), linalg_det(), linalg_eig(), linalg_eigh(), linalg_eigvalsh(), linalg_householder_product(), linalg_inv(), linalg_inv_ex(), linalg_lstsq(), linalg_matrix_norm(), linalg_matrix_power(), linalg_matrix_rank(), linalg_multi_dot(), linalg_norm(), linalg_pinv(), linalg_qr(), linalg_slogdet(), linalg_solve(), linalg_solve_triangular(), linalg_svd(), linalg_svdvals(), linalg_tensorinv(), linalg_tensorsolve(), linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_randn(2, 2)
  w <- linalg_eigvals(a)
}
Letting $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$, the eigenvalues of a complex Hermitian or real symmetric matrix $A \in \mathbb{K}^{n \times n}$ are defined as the roots (counted with multiplicity) of the polynomial $p$ of degree $n$ given by
$$p(\lambda) = \operatorname{det}(A - \lambda \mathrm{I}_n) \qquad \lambda \in \mathbb{R}$$
where $\mathrm{I}_n$ is the n-dimensional identity matrix.
linalg_eigvalsh(A, UPLO = "L")
A |
(Tensor): tensor of shape |
UPLO |
('L', 'U', optional): controls whether to use the upper or lower triangular part
of |
The eigenvalues of a real symmetric or complex Hermitian matrix are always real.
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
The eigenvalues are returned in ascending order.
A is assumed to be Hermitian (resp. symmetric), but this is not checked internally. Instead:
If UPLO = 'L' (default), only the lower triangular part of the matrix is used in the computation.
If UPLO = 'U', only the upper triangular part of the matrix is used.
A real-valued tensor containing the eigenvalues even when A is complex. The eigenvalues are returned in ascending order.
linalg_eigh() computes the full eigenvalue decomposition.
Other linalg: linalg_cholesky(), linalg_cholesky_ex(), linalg_det(), linalg_eig(), linalg_eigh(), linalg_eigvals(), linalg_householder_product(), linalg_inv(), linalg_inv_ex(), linalg_lstsq(), linalg_matrix_norm(), linalg_matrix_power(), linalg_matrix_rank(), linalg_multi_dot(), linalg_norm(), linalg_pinv(), linalg_qr(), linalg_slogdet(), linalg_solve(), linalg_solve_triangular(), linalg_svd(), linalg_svdvals(), linalg_tensorinv(), linalg_tensorsolve(), linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_randn(2, 2)
  linalg_eigvalsh(a)
}
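Because symmetry is not checked, the UPLO argument decides which symmetric matrix is actually analysed when the input is not symmetric. The sketch below uses a deliberately non-symmetric matrix (an assumption for illustration): its lower triangle describes [[2, 1], [1, 2]] with eigenvalues 1 and 3, while its upper triangle describes [[2, 5], [5, 2]] with eigenvalues -3 and 7.

```r
library(torch)

# Deliberately non-symmetric: the off-diagonal entries differ.
M <- torch_tensor(rbind(c(2, 5), c(1, 2)))

# Only the lower triangle is read: equivalent to [[2, 1], [1, 2]].
low <- as.numeric(linalg_eigvalsh(M, UPLO = "L"))

# Only the upper triangle is read: equivalent to [[2, 5], [5, 2]].
up <- as.numeric(linalg_eigvalsh(M, UPLO = "U"))
```

If the two results differ, as they do here, the input was not symmetric, which can be a cheap diagnostic when results look surprising.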
Computes the first n columns of a product of Householder matrices.
Letting $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$, for a matrix $V \in \mathbb{K}^{m \times n}$ with columns $v_i \in \mathbb{K}^{m}$ with $m \geq n$ and a vector $\tau \in \mathbb{K}^{k}$ with $k \leq n$, this function computes the first $n$ columns of the matrix
$$H_1 H_2 \cdots H_k \qquad \text{with} \qquad H_i = \mathrm{I}_m - \tau_i v_i v_i^{H}$$
where $\mathrm{I}_m$ is the m-dimensional identity matrix and $v^{H}$ is the conjugate transpose when $v$ is complex, and the transpose when $v$ is real-valued.
linalg_householder_product(A, tau)
A |
(Tensor): tensor of shape |
tau |
(Tensor): tensor of shape |
See Representation of Orthogonal or Unitary Matrices for
further details.
Supports inputs of float, double, cfloat and cdouble dtypes. Also supports batches of matrices, and if the inputs are batches of matrices then the output has the same batch dimensions.
This function only uses the values strictly below the main diagonal of A. The other values are ignored.
torch_geqrf() can be used together with this function to form the Q from the linalg_qr() decomposition.
torch_ormqr() is a related function that computes the matrix multiplication of a product of Householder matrices with another matrix. However, that function is not supported by autograd.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_randn(2, 2)
  h_tau <- torch_geqrf(A)
  Q <- linalg_householder_product(h_tau[[1]], h_tau[[2]])
  torch_allclose(Q, linalg_qr(A)[[1]])
}
Throws a runtime_error
if the matrix is not invertible.
linalg_inv(A)
A |
(Tensor): tensor of shape |
Letting \(\mathbb{K}\) be \(\mathbb{R}\) or \(\mathbb{C}\), for a matrix \(A \in \mathbb{K}^{n \times n}\), its inverse matrix \(A^{-1} \in \mathbb{K}^{n \times n}\) (if it exists) is defined as
\[ A^{-1}A = AA^{-1} = \mathrm{I}_n \]
where \(\mathrm{I}_n\) is the n-dimensional identity matrix.
The inverse matrix exists if and only if \(A\) is invertible. In this case,
the inverse is unique.
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if
A
is a batch of matrices
then the output has the same batch dimensions.
Consider using linalg_solve()
if possible for multiplying a matrix on the left by
the inverse, as linalg_solve(A, B) == A$inv() %*% B.
It is always preferred to use linalg_solve()
when possible, as it is faster and more
numerically stable than computing the inverse explicitly.
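The equivalence stated above can be checked on a small system. This is a minimal sketch, assuming the torch package is installed; the matrices are arbitrary illustrative values, and the explicit inverse is formed with linalg_inv() followed by torch_matmul().

```r
library(torch)

# A small, well-conditioned system chosen only for illustration.
A <- torch_tensor(rbind(c(4, 1, 0), c(1, 3, 1), c(0, 1, 2)))
B <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))

# Preferred: solve AX = B directly.
X_solve <- linalg_solve(A, B)

# Same result, but slower and less stable: invert first, then multiply.
X_inv <- torch_matmul(linalg_inv(A), B)

same <- torch_allclose(X_solve, X_inv, atol = 1e-5)
```

Both routes agree up to floating point error here; the difference matters mainly for large or ill-conditioned matrices.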
linalg_pinv()
computes the pseudoinverse (Moore-Penrose inverse) of matrices
of any shape.
linalg_solve()
computes A$inv() %*% B
with a
numerically stable algorithm.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_randn(4, 4)
  linalg_inv(A)
}
Returns a namedtuple (inverse, info)
. inverse
contains the result of
inverting A
and info
stores the LAPACK error codes.
If A
is not an invertible matrix, or if it's a batch of matrices
and one or more of them is not an invertible matrix,
then info
stores a positive integer for the corresponding matrix.
The positive integer indicates the diagonal element of the LU decomposition of
the input matrix that is exactly zero.
info
filled with zeros indicates that the inversion was successful.
If check_errors=TRUE
and info
contains positive integers, then a RuntimeError is thrown.
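A hedged sketch of how info might be inspected, assuming the torch package is installed and that the result is a list whose second element is info (the order given above). The singular matrix is an illustrative choice.

```r
library(torch)

# A singular matrix: the second row is twice the first.
A <- torch_tensor(rbind(c(1, 2), c(2, 4)))

# With check_errors = FALSE (the default) no error is raised;
# the status is reported through info instead.
out <- linalg_inv_ex(A)
info <- out[[2]]$item()

# info == 0 means success; a positive value flags a zero pivot in the LU factors.
inversion_ok <- info == 0
```

Checking info by hand avoids the error path, which can be useful when a batch contains only a few non-invertible matrices.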
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
linalg_inv_ex(A, check_errors = FALSE)
A |
(Tensor): tensor of shape |
check_errors |
(bool, optional): controls whether to check the content of |
If A
is on a CUDA device then this function may synchronize
that device with the CPU.
This function is "experimental" and it may change in a future PyTorch release.
linalg_inv()
is a NumPy compatible variant that always checks for errors.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_randn(3, 3)
  out <- linalg_inv_ex(A)
}
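Letting \(\mathbb{K}\) be \(\mathbb{R}\) or \(\mathbb{C}\), the least squares problem for a linear system \(AX = B\) with \(A \in \mathbb{K}^{m \times n}, B \in \mathbb{K}^{m \times k}\) is defined as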
linalg_lstsq(A, B, rcond = NULL, ..., driver = NULL)
A |
(Tensor): lhs tensor of shape |
B |
(Tensor): rhs tensor of shape |
rcond |
(float, optional): used to determine the effective rank of |
... |
currently unused. |
driver |
(str, optional): name of the LAPACK/MAGMA method to be used.
If |
\[ \min_{X \in \mathbb{K}^{n \times k}} \|AX - B\|_F \]
where \(\|\cdot\|_F\) denotes the Frobenius norm.
Supports inputs of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if the inputs are batches of matrices then
the output has the same batch dimensions.
driver chooses the LAPACK/MAGMA function that will be used.
For CPU inputs the valid values are 'gels', 'gelsy', 'gelsd', 'gelss'.
For CUDA input, the only valid driver is 'gels', which assumes that A is full-rank.
To choose the best driver on CPU consider:
- If A is well-conditioned (its condition number is not too large), or you do not mind some precision loss:
  - For a general matrix: 'gelsy' (QR with pivoting) (default)
  - If A is full-rank: 'gels' (QR)
- If A is not well-conditioned:
  - 'gelsd' (tridiagonal reduction and SVD)
  - But if you run into memory issues: 'gelss' (full SVD).
See also the full description of these drivers.
rcond
is used to determine the effective rank of the matrices in A
when driver
is one of ('gelsy'
, 'gelsd'
, 'gelss'
).
In this case, if \(\sigma_i\) are the singular values of A in decreasing order,
\(\sigma_i\) will be rounded down to zero if \(\sigma_i \leq \mathrm{rcond} \cdot \sigma_1\).
If
rcond = NULL
(default), rcond
is set to the machine precision of the dtype of A
.
This function returns the solution to the problem and some extra information in a list of
four tensors (solution, residuals, rank, singular_values)
. For inputs A
, B
of shape (*, m, n)
, (*, m, k)
respectively, it contains
solution
: the least squares solution. It has shape (*, n, k)
.
residuals
: the squared residuals of the solutions, that is, \(\|AX - B\|_F^2\).
It has shape equal to the batch dimensions of
A
.
It is computed when m > n
and every matrix in A
is full-rank,
otherwise, it is an empty tensor.
If A
is a batch of matrices and any matrix in the batch is not full rank,
then an empty tensor is returned. This behavior may change in a future PyTorch release.
rank
: tensor of ranks of the matrices in A
.
It has shape equal to the batch dimensions of A
.
It is computed when driver
is one of ('gelsy'
, 'gelsd'
, 'gelss'
),
otherwise it is an empty tensor.
singular_values
: tensor of singular values of the matrices in A
.
It has shape (*, min(m, n))
.
It is computed when driver
is one of ('gelsd'
, 'gelss'
),
otherwise it is an empty tensor.
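The returned fields can be inspected on a small overdetermined system. This is a sketch under stated assumptions: the torch package is installed, the data (four points on the line y = 1 + 2t) are made up for illustration, and rank and singular_values are read by position in the order listed above.

```r
library(torch)

# Fit y = c0 + c1 * t to four points that lie exactly on y = 1 + 2 * t.
A <- torch_tensor(cbind(1, c(0, 1, 2, 3)))           # shape (4, 2)
B <- torch_tensor(matrix(c(1, 3, 5, 7), ncol = 1))   # shape (4, 1)

# 'gelsd' is one of the drivers that also reports rank and singular values.
res <- linalg_lstsq(A, B, driver = "gelsd")

solution <- res$solution        # shape (2, 1): intercept and slope
rank <- res[[3]]$item()         # effective rank of A
singular_values <- res[[4]]     # shape (2)
```

With the default 'gelsy' driver the singular_values field would be an empty tensor, as described above.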
A list (solution, residuals, rank, singular_values)
.
The default value of rcond
may change in a future PyTorch release.
It is therefore recommended to use a fixed value to avoid potential
breaking changes.
This function computes X = A$pinverse() %*% B
in a faster and
more numerically stable way than performing the computations separately.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_tensor(rbind(c(10, 2, 3), c(3, 10, 5), c(5, 6, 12)))$unsqueeze(1) # shape (1, 3, 3)
  B <- torch_stack(list(
    rbind(c(2, 5, 1), c(3, 2, 1), c(5, 1, 9)),
    rbind(c(4, 2, 9), c(2, 0, 3), c(2, 5, 3))
  ), dim = 1) # shape (2, 3, 3)
  X <- linalg_lstsq(A, B)$solution # A is broadcasted to shape (2, 3, 3)
}
If A
is complex valued, it computes the norm of A$abs().
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices: the norm will be computed over the
dimensions specified by the 2-tuple dim
and the other dimensions will
be treated as batch dimensions. The output will have the same batch dimensions.
linalg_matrix_norm( A, ord = "fro", dim = c(-2, -1), keepdim = FALSE, dtype = NULL )
A |
(Tensor): tensor with two or more dimensions. By default its
shape is interpreted as |
ord |
(int, inf, -inf, 'fro', 'nuc', optional): order of norm. Default: |
dim |
(int, Tuple[int], optional): dimensions over which to compute
the vector or matrix norm. See above for the behavior when |
keepdim |
(bool, optional): If set to |
dtype |
dtype ( |
ord
defines the norm that is computed. The following norms are
supported:
ord | norm for matrices | norm for vectors |
---|---|---|
NULL (default) | Frobenius norm | 2-norm (see below) |
"fro" | Frobenius norm | – not supported – |
"nuc" | nuclear norm | – not supported – |
Inf | max(sum(abs(x), dim=2)) | max(abs(x)) |
-Inf | min(sum(abs(x), dim=2)) | min(abs(x)) |
0 | – not supported – | sum(x != 0) |
1 | max(sum(abs(x), dim=1)) | as below |
-1 | min(sum(abs(x), dim=1)) | as below |
2 | largest singular value | as below |
-2 | smallest singular value | as below |
other int or float | – not supported – | sum(abs(x)^{ord})^{(1 / ord)} |
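A few rows of the table can be checked by hand on a 2 x 2 matrix. This is a minimal sketch, assuming the torch package is installed; the matrix values are illustrative.

```r
library(torch)

A <- torch_tensor(rbind(c(1, -2), c(3, 4)))

# Frobenius norm: sqrt(1 + 4 + 9 + 16) = sqrt(30).
norm_fro <- linalg_matrix_norm(A)$item()

# ord = 1: largest column sum of absolute values, max(1 + 3, 2 + 4).
norm_one <- linalg_matrix_norm(A, ord = 1)$item()

# ord = Inf: largest row sum of absolute values, max(1 + 2, 3 + 4).
norm_inf <- linalg_matrix_norm(A, ord = Inf)$item()
```

Note that dim in the table is 1-based, so sum(abs(x), dim=1) sums down the rows and yields one value per column.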
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_arange(0, 8, dtype = torch_float())$reshape(c(3, 3))
  linalg_matrix_norm(a)
  linalg_matrix_norm(a, ord = -1)
  b <- a$expand(c(2, -1, -1))
  linalg_matrix_norm(b)
  linalg_matrix_norm(b, dim = c(1, 3))
}
Computes the n-th power of a square matrix for an integer n.
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
linalg_matrix_power(A, n)
A |
(Tensor): tensor of shape |
n |
(int): the exponent. |
If n=0
, it returns the identity matrix (or batch) of the same shape
as A
. If n
is negative, it returns the inverse of each matrix
(if invertible) raised to the power of abs(n)
.
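The two special cases can be illustrated on a diagonal matrix, where the powers are easy to verify by hand. A minimal sketch, assuming the torch package is installed; the matrix is an illustrative choice.

```r
library(torch)

A <- torch_tensor(rbind(c(2, 0), c(0, 4)))

# n = 0 gives the identity matrix of the same shape as A.
P0 <- linalg_matrix_power(A, 0)

# n = -2 gives the inverse of A raised to the power 2: diag(1/4, 1/16).
Pm2 <- linalg_matrix_power(A, -2)
ref <- linalg_matrix_power(linalg_inv(A), 2)

identity_ok <- torch_allclose(P0, torch_eye(2))
negative_ok <- torch_allclose(Pm2, ref)
```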
linalg_solve()
computes A$inverse() %*% B
with a
numerically stable algorithm.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_randn(3, 3)
  linalg_matrix_power(A, 0)
}
The matrix rank is computed as the number of singular values
(or eigenvalues in absolute value when hermitian = TRUE
)
that are greater than the specified tol
threshold.
linalg_matrix_rank( A, ..., atol = NULL, rtol = NULL, tol = NULL, hermitian = FALSE )
A |
(Tensor): tensor of shape |
... |
Not currently used. |
atol |
the absolute tolerance value. When |
rtol |
the relative tolerance value. See above for the value it takes when |
tol |
(float, Tensor, optional): the tolerance value. See above for
the value it takes when |
hermitian |
(bool, optional): indicates whether |
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
If hermitian = TRUE
, A
is assumed to be Hermitian if complex or
symmetric if real, but this is not checked internally. Instead, just the lower
triangular part of the matrix is used in the computations.
If tol
is not specified and A
is a matrix of dimensions (m, n)
,
the tolerance is set to be
\[ \mathrm{tol} = \sigma_1 \max(m, n) \varepsilon \]
where \(\sigma_1\) is the largest singular value
(or eigenvalue in absolute value when hermitian = TRUE), and
\(\varepsilon\) is the epsilon value for the dtype of A (see torch_finfo()).
If A
is a batch of matrices, tol
is computed this way for every element of
the batch.
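The effect of the default tolerance can be seen on a matrix with linearly dependent rows. This is a sketch assuming the torch package is installed; the matrices are illustrative, and only the default tolerance is exercised.

```r
library(torch)

# The second row is twice the first, so only one singular value is
# meaningfully above the default tolerance.
A <- torch_tensor(rbind(c(1, 2), c(2, 4)))
rank_deficient <- linalg_matrix_rank(A)$item()

# The identity matrix has full rank.
rank_full <- linalg_matrix_rank(torch_eye(3))$item()
```

In floating point the small singular value of A is rarely exactly zero, which is why a tolerance scaled by the largest singular value is used rather than a comparison with zero.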
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_eye(10)
  linalg_matrix_rank(a)
}
Efficiently multiplies two or more matrices by reordering the multiplications so that the fewest arithmetic operations are performed.
linalg_multi_dot(tensors)
tensors |
( |
Supports inputs of float
, double
, cfloat
and cdouble
dtypes.
This function does not support batched inputs.
Every tensor in tensors
must be 2D, except for the first and last which
may be 1D. If the first tensor is a 1D vector of shape (n,)
it is treated as a row vector
of shape (1, n)
, similarly if the last tensor is a 1D vector of shape (n,)
it is treated
as a column vector of shape (n, 1)
.
If the first and last tensors are matrices, the output will be a matrix. However, if either is a 1D vector, then the output will be a 1D vector.
This function is implemented by chaining torch_mm()
calls after
computing the optimal matrix multiplication order.
The cost of multiplying two matrices with shapes (a, b)
and (b, c)
is
a * b * c
. Given matrices A
, B
, C
with shapes (10, 100)
,
(100, 5)
, (5, 50)
respectively, we can calculate the cost of different
multiplication orders as follows:
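\[ \operatorname{cost}((AB)C) = 10 \times 100 \times 5 + 10 \times 5 \times 50 = 5000 + 2500 = 7500 \]
\[ \operatorname{cost}(A(BC)) = 10 \times 100 \times 50 + 100 \times 5 \times 50 = 50000 + 25000 = 75000 \]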
In this case, multiplying A
and B
first followed by C
is 10 times faster.
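The arithmetic behind this comparison can be reproduced in a few lines of base R. The helper matmul_cost() is a hypothetical name introduced here for illustration; it encodes only the a * b * c cost model stated above and is not part of the package.

```r
# Cost model from the text: multiplying (a, b) by (b, c) costs a * b * c.
matmul_cost <- function(a, b, c) a * b * c

# Shapes: A is (10, 100), B is (100, 5), C is (5, 50).
# (AB)C: A %*% B gives (10, 5), which is then multiplied by C.
cost_AB_C <- matmul_cost(10, 100, 5) + matmul_cost(10, 5, 50)

# A(BC): B %*% C gives (100, 50), which A is then multiplied by.
cost_A_BC <- matmul_cost(100, 5, 50) + matmul_cost(10, 100, 50)

speedup <- cost_A_BC / cost_AB_C
```

Here cost_AB_C is 7500 and cost_A_BC is 75000, which gives the factor of 10 quoted above.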
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  linalg_multi_dot(list(torch_tensor(c(1, 2)), torch_tensor(c(2, 3))))
}
If A
is complex valued, it computes the norm of A$abs().
Supports input of float, double, cfloat and cdouble dtypes.
Whether this function computes a vector or matrix norm is determined as follows:
linalg_norm(A, ord = NULL, dim = NULL, keepdim = FALSE, dtype = NULL)
A |
(Tensor): tensor of shape |
ord |
(int, float, inf, -inf, 'fro', 'nuc', optional): order of norm. Default: |
dim |
(int, Tuple[int], optional): dimensions over which to compute
the vector or matrix norm. See above for the behavior when |
keepdim |
(bool, optional): If set to |
dtype |
dtype ( |
If dim
is an int, the vector norm will be computed.
If dim
is a 2-tuple, the matrix norm will be computed.
If dim=NULL
and ord=NULL
, A will be flattened to 1D and the 2-norm of the resulting vector will be computed.
If dim=NULL
and ord!=NULL
, A must be 1D or 2D.
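The rules above can be illustrated with one matrix and three choices of dim. This is a hedged sketch, assuming the torch package is installed and that dim is 1-based as elsewhere in the package; the matrix values are illustrative.

```r
library(torch)

A <- torch_tensor(rbind(c(3, 4), c(6, 8)))

# dim = NULL and ord = NULL: A is flattened and the 2-norm is taken.
norm_flat <- linalg_norm(A)$item()                 # sqrt(9 + 16 + 36 + 64)

# dim is a single int: vector 2-norm along that dimension, one value per row.
norm_rows <- as_array(linalg_norm(A, dim = 2))     # 5 and 10

# dim is a 2-tuple: matrix norm (Frobenius by default).
norm_mat <- linalg_norm(A, dim = c(1, 2))$item()
```

For this input the flattened 2-norm and the Frobenius norm coincide, since both are the square root of the sum of squared entries.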
ord
defines the norm that is computed. The following norms are
supported:
ord | norm for matrices | norm for vectors |
---|---|---|
NULL (default) | Frobenius norm | 2-norm (see below) |
"fro" | Frobenius norm | – not supported – |
"nuc" | nuclear norm | – not supported – |
Inf | max(sum(abs(x), dim=2)) | max(abs(x)) |
-Inf | min(sum(abs(x), dim=2)) | min(abs(x)) |
0 | – not supported – | sum(x != 0) |
1 | max(sum(abs(x), dim=1)) | as below |
-1 | min(sum(abs(x), dim=1)) | as below |
2 | largest singular value | as below |
-2 | smallest singular value | as below |
other int or float | – not supported – | sum(abs(x)^{ord})^{(1 / ord)} |
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_arange(0, 8, dtype = torch_float()) - 4
  a
  b <- a$reshape(c(3, 3))
  b
  linalg_norm(a)
  linalg_norm(b)
}
The pseudoinverse may be defined algebraically,
but it is more computationally convenient to understand it through the SVD.
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
linalg_pinv(A, rcond = NULL, hermitian = FALSE, atol = NULL, rtol = NULL)
A |
(Tensor): tensor of shape |
rcond |
(float or Tensor, optional): the tolerance value used to decide when a singular value is treated as zero.
If it is a |
hermitian |
(bool, optional): indicates whether |
atol |
the absolute tolerance value. When |
rtol |
the relative tolerance value. See above for the value it takes when |
If hermitian= TRUE
, A
is assumed to be Hermitian if complex or
symmetric if real, but this is not checked internally. Instead, just the lower
triangular part of the matrix is used in the computations.
The singular values (or the norm of the eigenvalues when hermitian= TRUE
)
that are below the specified rcond
threshold are treated as zero and discarded
in the computation.
This function uses linalg_svd()
if hermitian= FALSE
and
linalg_eigh()
if hermitian= TRUE
.
For CUDA inputs, this function synchronizes that device with the CPU.
Consider using linalg_lstsq()
if possible for multiplying a matrix on the left by
the pseudoinverse, as linalg_lstsq(A, B)$solution == A$pinv() %*% B.
It is always preferred to use linalg_lstsq()
when possible, as it is faster and more
numerically stable than computing the pseudoinverse explicitly.
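The relationship between the two routes can be checked on a small full-rank system. This is a sketch assuming the torch package is installed; the data are illustrative, and the explicit pseudoinverse is formed with linalg_pinv() followed by torch_matmul().

```r
library(torch)

# A tall matrix with full column rank and one right-hand side.
A <- torch_tensor(cbind(1, c(0, 1, 2, 3)))           # shape (4, 2)
B <- torch_tensor(matrix(c(1, 3, 5, 7), ncol = 1))   # shape (4, 1)

# Preferred: solve the least squares problem directly.
X_lstsq <- linalg_lstsq(A, B)$solution

# Same result here, but formed through the explicit pseudoinverse.
X_pinv <- torch_matmul(linalg_pinv(A), B)

same <- torch_allclose(X_lstsq, X_pinv, atol = 1e-4)
```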
linalg_inv()
computes the inverse of a square matrix.
linalg_lstsq()
computes A$pinv() %*% B
with a
numerically stable algorithm.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_randn(3, 5)
  linalg_pinv(A)
}
Letting \(\mathbb{K}\) be \(\mathbb{R}\) or \(\mathbb{C}\), the full QR decomposition of a matrix \(A \in \mathbb{K}^{m \times n}\) is defined as
linalg_qr(A, mode = "reduced")
A |
(Tensor): tensor of shape |
mode |
(str, optional): one of |
\[ A = QR, \qquad Q \in \mathbb{K}^{m \times m},\ R \in \mathbb{K}^{m \times n} \]
where \(Q\) is orthogonal in the real case and unitary in the complex case, and
\(R\) is upper triangular.
When
m > n
(tall matrix), as R
is upper triangular, its last m - n
rows are zero.
In this case, we can drop the last m - n
columns of Q
to form the
reduced QR decomposition:
\[ A = QR, \qquad Q \in \mathbb{K}^{m \times n},\ R \in \mathbb{K}^{n \times n} \]
The reduced QR decomposition agrees with the full QR decomposition when n >= m
(wide matrix).
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
The parameter mode
chooses between the full and reduced QR decomposition.
If A
has shape (*, m, n)
, denoting k = min(m, n)
mode = 'reduced'
(default): Returns (Q, R)
of shapes (*, m, k)
, (*, k, n)
respectively.
mode = 'complete'
: Returns (Q, R)
of shapes (*, m, m)
, (*, m, n)
respectively.
mode = 'r'
: Computes only the reduced R
. Returns (Q, R)
with Q
empty and R
of shape (*, k, n)
.
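The shapes listed above can be confirmed on a tall matrix. A minimal sketch, assuming the torch package is installed and that the result is a list with Q first and R second, as the description states; the input is random and only its shape matters.

```r
library(torch)

A <- torch_randn(5, 3)   # m = 5, n = 3, so k = min(m, n) = 3

reduced <- linalg_qr(A)                        # mode = "reduced" is the default
complete <- linalg_qr(A, mode = "complete")

shape_Q_reduced <- reduced[[1]]$shape     # (m, k)
shape_R_reduced <- reduced[[2]]$shape     # (k, n)
shape_Q_complete <- complete[[1]]$shape   # (m, m)
shape_R_complete <- complete[[2]]$shape   # (m, n)

# Either mode reconstructs A as Q %*% R.
rebuilt <- torch_matmul(reduced[[1]], reduced[[2]])
reconstructs <- torch_allclose(rebuilt, A, atol = 1e-4)
```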
A list (Q, R)
.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_tensor(rbind(c(12., -51, 4), c(6, 167, -68), c(-4, 24, -41)))
  qr <- linalg_qr(a)
  torch_mm(qr[[1]], qr[[2]])$round()
  torch_mm(qr[[1]]$t(), qr[[1]])$round()
}
For complex A
, it returns the angle and the natural logarithm of the modulus of the
determinant, that is, a logarithmic polar decomposition of the determinant.
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
linalg_slogdet(A)
A |
(Tensor): tensor of shape |
A list (sign, logabsdet)
.
logabsdet
will always be real-valued, even when A
is complex.
sign
will have the same dtype as A
.
The determinant can be recovered as sign * exp(logabsdet)
.
When a matrix has a determinant of zero, it returns (0, -Inf)
.
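Both statements can be checked on matrices whose determinants are known by hand. This is a sketch assuming the torch package is installed and that the list holds sign first and logabsdet second, as listed above; the matrices are illustrative.

```r
library(torch)

# Diagonal matrix with determinant 2 * (-3) = -6.
A <- torch_tensor(rbind(c(2, 0), c(0, -3)))
res <- linalg_slogdet(A)
sgn <- res[[1]]
logabsdet <- res[[2]]

# Recover the determinant as sign * exp(logabsdet).
det_recovered <- (sgn * torch_exp(logabsdet))$item()

# A singular matrix: the second row is twice the first.
singular <- linalg_slogdet(torch_tensor(rbind(c(1, 2), c(2, 4))))
singular_sign <- singular[[1]]$item()
singular_logabsdet <- singular[[2]]$item()
```

Working with sign and logabsdet separately avoids overflow or underflow when the determinant itself is very large or very small.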
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_randn(3, 3)
  linalg_slogdet(a)
}
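Letting \(\mathbb{K}\) be \(\mathbb{R}\) or \(\mathbb{C}\), this function computes the solution \(X \in \mathbb{K}^{n \times k}\) of the linear system associated to \(A \in \mathbb{K}^{n \times n}, B \in \mathbb{K}^{n \times k}\), which is defined as
\[ AX = B \]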
linalg_solve(A, B)
A |
(Tensor): tensor of shape |
B |
(Tensor): right-hand side tensor of shape |
This system of linear equations has one solution if and only if \(A\) is invertible.
This function assumes that \(A\) is invertible.
Supports inputs of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if the inputs are batches of matrices then
the output has the same batch dimensions.
Letting *
be zero or more batch dimensions,
If A
has shape (*, n, n)
and B
has shape (*, n)
(a batch of vectors) or shape
(*, n, k)
(a batch of matrices or "multiple right-hand sides"), this function returns X
of shape
(*, n)
or (*, n, k)
respectively.
Otherwise, if A
has shape (*, n, n)
and B
has shape (n,)
or (n, k)
, B
is broadcasted to have shape (*, n)
or (*, n, k)
respectively.
This function then returns the solution of the resulting batch of systems of linear equations.
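The broadcasting case can be sketched with a batch of two diagonal systems and a single right-hand side matrix. This assumes the torch package is installed; the shapes and values are illustrative, chosen so that the expected solutions are easy to verify.

```r
library(torch)

# A batch of two 3 x 3 systems: 2 * I and 4 * I, shape (2, 3, 3).
A <- torch_stack(list(torch_eye(3) * 2, torch_eye(3) * 4), dim = 1)

# One right-hand side of shape (3, 4), shared by both systems.
B <- torch_ones(3, 4)

# B is broadcast to shape (2, 3, 4) and each system is solved separately.
X <- linalg_solve(A, B)

# Check A X = B for every element of the batch.
solves <- torch_allclose(torch_matmul(A, X), B$expand(c(2, 3, 4)))
```

The first slice of X is filled with 0.5 and the second with 0.25, matching the two diagonal scalings.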
This function computes X = A$inverse() %*% B
in a faster and
more numerically stable way than performing the computations separately.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_randn(3, 3)
  b <- torch_randn(3)
  x <- linalg_solve(A, b)
  torch_allclose(torch_matmul(A, x), b)
}
Triangular solve
linalg_solve_triangular(A, B, ..., upper, left = TRUE, unitriangular = FALSE)
A |
tensor of shape |
B |
right-hand side tensor of shape |
... |
Currently ignored. |
upper |
whether A is an upper or lower triangular matrix. |
left |
whether to solve the system AX=B or XA=B |
unitriangular |
if |
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
Letting \(\mathbb{K}\) be \(\mathbb{R}\) or \(\mathbb{C}\), the full SVD of a matrix \(A \in \mathbb{K}^{m \times n}\), if k = min(m,n), is defined as
linalg_svd(A, full_matrices = TRUE)
A |
(Tensor): tensor of shape |
full_matrices |
(bool, optional): controls whether to compute the full or reduced
SVD, and consequently, the shape of the returned tensors |
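\[ A = U \operatorname{diag}(S) V^{H}, \qquad U \in \mathbb{K}^{m \times m},\ S \in \mathbb{R}^{k},\ V \in \mathbb{K}^{n \times n} \]
where \(\operatorname{diag}(S) \in \mathbb{K}^{m \times n}\), and \(V^{H}\) is the conjugate transpose when \(V\) is complex, and the transpose when \(V\) is real-valued.
The matrices \(U\), \(V\) (and thus \(V^{H}\)) are orthogonal in the real case, and unitary in the complex case.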
When
m > n
(resp. m < n
) we can drop the last m - n
(resp. n - m
) columns of U
(resp. V
) to form the reduced SVD:
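\[ A = U \operatorname{diag}(S) V^{H}, \qquad U \in \mathbb{K}^{m \times k},\ S \in \mathbb{R}^{k},\ V \in \mathbb{K}^{n \times k} \]
where \(\operatorname{diag}(S) \in \mathbb{K}^{k \times k}\).
In this case, \(U\) and \(V\) also have orthonormal columns.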
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
The returned decomposition is a named tuple (U, S, V)
which corresponds to \(U\), \(S\), \(V^{H}\) above.
The singular values are returned in descending order.
The parameter full_matrices
chooses between the full (default) and reduced SVD.
A list (U, S, V)
which corresponds to \(U\), \(S\), \(V^{H}\) above.
S
will always be real-valued, even when A
is complex.
It will also be ordered in descending order.
U
and V
will have the same dtype as A
. The left / right singular vectors will be given by
the columns of U
and the rows of V
respectively.
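The decomposition can be verified by rebuilding the input from its factors. This is a hedged sketch, assuming the torch package is installed and that the third returned element already holds the conjugate transpose (which the remark about the rows of V suggests), so no further transpose is applied.

```r
library(torch)

A <- torch_randn(5, 3)

# Reduced SVD: U is (5, 3), S has length 3, and the third factor is (3, 3).
usv <- linalg_svd(A, full_matrices = FALSE)
U <- usv[[1]]
S <- usv[[2]]
Vh <- usv[[3]]

# Rebuild A as U %*% diag(S) %*% Vh.
A_rebuilt <- torch_matmul(U, torch_matmul(torch_diag(S), Vh))
reconstructs <- torch_allclose(A_rebuilt, A, atol = 1e-4)

# Singular values come back in descending order.
s <- as_array(S)
descending <- all(diff(s) <= 0)
```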
The returned tensors U
and V
are not unique, nor are they continuous with
respect to A
.
Due to this lack of uniqueness, different hardware and software may compute
different singular vectors.
This non-uniqueness is caused by the fact that multiplying any pair of singular
vectors \(u_k, v_k\) by -1 in the real case or by \(e^{i \phi}, \phi \in \mathbb{R}\) in the complex case produces another two
valid singular vectors of the matrix.
This non-uniqueness problem is even worse when the matrix has repeated singular values.
In this case, one may multiply the associated singular vectors of
U
and V
spanning
the subspace by a rotation matrix and the resulting vectors will span the same subspace.
Gradients computed using U
or V
will only be finite when
A
does not have zero as a singular value or repeated singular values.
Furthermore, if the distance between any two singular values is close to zero,
the gradient will be numerically unstable, as it depends on the singular values
\(\sigma_i\) through the computation of \(\frac{1}{\min_{i \neq j} \sigma_i^2 - \sigma_j^2}\).
The gradient will also be numerically unstable when A has small singular
values, as it also depends on the computation of \(\frac{1}{\sigma_i}\).
When full_matrices=TRUE
, the gradients with respect to U[..., :, min(m, n):]
and Vh[..., min(m, n):, :]
will be ignored, as those vectors can be arbitrary bases
of the corresponding subspaces.
linalg_svdvals()
computes only the singular values.
Unlike linalg_svd()
, the gradients of linalg_svdvals()
are always
numerically stable.
linalg_eig()
for a function that computes another type of spectral
decomposition of a matrix. The eigendecomposition works only on square matrices.
linalg_eigh()
for a (faster) function that computes the eigenvalue decomposition
for Hermitian and symmetric matrices.
linalg_qr()
for another (much faster) decomposition that works on general
matrices.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  a <- torch_randn(5, 3)
  linalg_svd(a, full_matrices = FALSE)
}
Supports input of float, double, cfloat and cdouble dtypes.
Also supports batches of matrices, and if A
is a batch of matrices then
the output has the same batch dimensions.
The singular values are returned in descending order.
linalg_svdvals(A)
A |
(Tensor): tensor of shape |
A real-valued tensor, even when A
is complex.
linalg_svd()
computes the full singular value decomposition.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_tensorinv()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_randn(5, 3)
  S <- linalg_svdvals(A)
  S
}
Computes the multiplicative inverse of torch_tensordot().
If m
is the product of the first ind
dimensions of A
and n
is the product of
the rest of the dimensions, this function expects m
and n
to be equal.
If this is the case, it computes a tensor X
such that
tensordot(A, X, ind)
is the identity matrix in dimension m
.
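The shape requirement can be checked before calling the function. The following is a minimal base-R sketch, not part of the package: `can_tensorinv` is a hypothetical helper, and it assumes (as the package example with shape (4, 6, 8, 3) and `ind = 3` suggests) that `ind` is 1-based in R, so the leading block consists of the dimensions in front of position `ind`.

```r
# Hypothetical helper: TRUE when a tensor of the given shape can be split in
# front of position `ind` (1-based) into two blocks of equal total size.
can_tensorinv <- function(shape, ind = 3L) {
  m <- prod(shape[seq_len(ind - 1L)])   # product of the leading dimensions
  n <- prod(shape[ind:length(shape)])   # product of the remaining dimensions
  m == n
}

ok_example  <- can_tensorinv(c(4, 6, 8, 3), ind = 3L)  # 4 * 6 == 8 * 3
bad_example <- can_tensorinv(c(4, 6, 8, 2), ind = 3L)  # 24 != 16
```

A shape that fails this check cannot be inverted, because the flattened matrix would not be square.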
linalg_tensorinv(A, ind = 3L)
A |
(Tensor): tensor to invert. |
ind |
(int): index at which to compute the inverse of |
Supports input of float, double, cfloat and cdouble dtypes.
Consider using linalg_tensorsolve() if possible for multiplying a tensor on the left
by the tensor inverse, as linalg_tensorsolve(A, B) == torch_tensordot(linalg_tensorinv(A), B).
It is always preferred to use linalg_tensorsolve() when possible, as it is faster and more
numerically stable than computing the inverse explicitly.
linalg_tensorsolve()
computes torch_tensordot(linalg_tensorinv(A), B).
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorsolve()
,
linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_eye(4 * 6)$reshape(c(4, 6, 8, 3))
  Ainv <- linalg_tensorinv(A, ind = 3)
  Ainv$shape
  B <- torch_randn(4, 6)
  torch_allclose(torch_tensordot(Ainv, B), linalg_tensorsolve(A, B))

  A <- torch_randn(4, 4)
  Atensorinv <- linalg_tensorinv(A, 2)
  Ainv <- linalg_inv(A)
  torch_allclose(Atensorinv, Ainv)
}
Computes the solution X to the system torch_tensordot(A, X) = B.
If m is the product of the first B$ndim dimensions of A and
n is the product of the rest of the dimensions, this function expects m and n to be equal.
The returned tensor x
satisfies
tensordot(A, x, dims=x$ndim) == B
.
linalg_tensorsolve(A, B, dims = NULL)
A |
(Tensor): tensor to solve for. |
B |
(Tensor): the solution |
dims |
(Tuple[int], optional): dimensions of |
If dims
is specified, A
will be reshaped as
A = movedim(A, dims, seq(len(dims) - A$ndim + 1, 0))
Supports inputs of float, double, cfloat and cdouble dtypes.
linalg_tensorinv()
computes the multiplicative inverse of
torch_tensordot()
.
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_vector_norm()
if (torch_is_installed()) {
  A <- torch_eye(2 * 3 * 4)$reshape(c(2 * 3, 4, 2, 3, 4))
  B <- torch_randn(2 * 3, 4)
  X <- linalg_tensorsolve(A, B)
  X$shape
  torch_allclose(torch_tensordot(A, X, dims = X$ndim), B)

  A <- torch_randn(6, 4, 4, 3, 2)
  B <- torch_randn(4, 3, 2)
  X <- linalg_tensorsolve(A, B, dims = c(1, 3))
  A <- A$permute(c(2, 4, 5, 1, 3))
  torch_allclose(torch_tensordot(A, X, dims = X$ndim), B, atol = 1e-6)
}
If A
is complex valued, it computes the norm of A$abs()
Supports input of float, double, cfloat and cdouble dtypes.
This function does not necessarily treat multidimensional A
as a batch of
vectors, instead:
linalg_vector_norm(A, ord = 2, dim = NULL, keepdim = FALSE, dtype = NULL)
A |
(Tensor): tensor, flattened by default, but this behavior can be
controlled using |
ord |
(int, float, inf, -inf, 'fro', 'nuc', optional): order of norm. Default: |
dim |
(int, Tuple[int], optional): dimensions over which to compute
the vector or matrix norm. See above for the behavior when |
keepdim |
(bool, optional): If set to |
dtype |
dtype ( |
If dim=NULL
, A
will be flattened before the norm is computed.
If dim
is an int
or a tuple
, the norm will be computed over these dimensions
and the other dimensions will be treated as batch dimensions.
This behavior is for consistency with linalg_norm()
.
ord
defines the norm that is computed. The following norms are
supported:
ord | norm for matrices | norm for vectors |
---|---|---|
NULL (default) | Frobenius norm | 2-norm (see below) |
"fro" | Frobenius norm | – not supported – |
"nuc" | nuclear norm | – not supported – |
Inf | max(sum(abs(x), dim=2)) | max(abs(x)) |
-Inf | min(sum(abs(x), dim=2)) | min(abs(x)) |
0 | – not supported – | sum(x != 0) |
1 | max(sum(abs(x), dim=1)) | as below |
-1 | min(sum(abs(x), dim=1)) | as below |
2 | largest singular value | as below |
-2 | smallest singular value | as below |
other int or float | – not supported – | sum(abs(x)^{ord})^{(1 / ord)} |
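The vector column of the table can be reproduced with a few lines of base R. This is only an illustrative sketch on a plain numeric vector: `vector_norm_ref` is a hypothetical name, and it does not use torch tensors or handle the `dim`, `keepdim` or `dtype` arguments.

```r
# Reference computation of the vector norms listed in the table.
vector_norm_ref <- function(x, ord = 2) {
  if (ord == Inf) return(max(abs(x)))    # largest absolute value
  if (ord == -Inf) return(min(abs(x)))   # smallest absolute value
  if (ord == 0) return(sum(x != 0))      # number of non-zero entries
  sum(abs(x)^ord)^(1 / ord)              # general case
}

x <- c(3, -4, 0)
norm_2   <- vector_norm_ref(x)             # Euclidean norm
norm_1   <- vector_norm_ref(x, ord = 1)    # sum of absolute values
norm_inf <- vector_norm_ref(x, ord = Inf)
norm_0   <- vector_norm_ref(x, ord = 0)
```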
Other linalg:
linalg_cholesky()
,
linalg_cholesky_ex()
,
linalg_det()
,
linalg_eig()
,
linalg_eigh()
,
linalg_eigvals()
,
linalg_eigvalsh()
,
linalg_householder_product()
,
linalg_inv()
,
linalg_inv_ex()
,
linalg_lstsq()
,
linalg_matrix_norm()
,
linalg_matrix_power()
,
linalg_matrix_rank()
,
linalg_multi_dot()
,
linalg_norm()
,
linalg_pinv()
,
linalg_qr()
,
linalg_slogdet()
,
linalg_solve()
,
linalg_solve_triangular()
,
linalg_svd()
,
linalg_svdvals()
,
linalg_tensorinv()
,
linalg_tensorsolve()
if (torch_is_installed()) {
  a <- torch_arange(0, 8, dtype = torch_float()) - 4
  a
  b <- a$reshape(c(3, 3))
  b

  linalg_vector_norm(a, ord = 3.5)
  linalg_vector_norm(b, ord = 3.5)
}
This function should only be used to load models saved in Python.
For it to work correctly you need to use torch.save
with the flag:
_use_new_zipfile_serialization=True
and also remove all nn.Parameter
classes from the tensors in the dict.
load_state_dict(path, ..., legacy_stream = FALSE)
path |
to the state dict file |
... |
additional arguments that are currently not used. |
legacy_stream |
if |
The above might change with development of this in PyTorch's C++ API.
a named list of tensors.
Allow regions of your code to run in mixed precision. In these regions, ops run in an op-specific dtype chosen by autocast to improve performance while maintaining accuracy.
local_autocast(
  device_type,
  dtype = NULL,
  enabled = TRUE,
  cache_enabled = NULL,
  ...,
  .env = parent.frame()
)

with_autocast(
  code,
  ...,
  device_type,
  dtype = NULL,
  enabled = TRUE,
  cache_enabled = NULL
)

set_autocast(device_type, dtype = NULL, enabled = TRUE, cache_enabled = NULL)

unset_autocast(context)
device_type |
a character string indicating whether to use 'cuda' or 'cpu' device |
dtype |
a torch data type indicating whether to use |
enabled |
a logical value indicating whether autocasting should be enabled in the region. Default: TRUE |
cache_enabled |
a logical value indicating whether the weight cache inside autocast should be enabled. |
... |
currently unused. |
.env |
The environment to use for scoping. |
code |
code to be executed with autocast applied. |
context |
Returned by |
When entering an autocast-enabled region, Tensors may be any type.
You should not call half()
or bfloat16()
on your model(s) or inputs
when using autocasting.
autocast
should only be enabled during the forward pass(es) of your network,
including the loss computation(s). Backward passes under autocast are not
recommended. Backward ops run in the same type that autocast used for
corresponding forward ops.
with_autocast()
: A with context for automatic mixed precision.
set_autocast()
: Set the autocast context. For advanced users only.
unset_autocast()
: Unset the autocast context.
cuda_amp_grad_scaler()
to perform dynamic gradient scaling.
if (torch_is_installed()) {
  x <- torch_randn(5, 5, dtype = torch_float32())
  y <- torch_randn(5, 5, dtype = torch_float32())

  foo <- function(x, y) {
    # autocast stays enabled until foo() returns
    local_autocast(device_type = "cpu")
    z <- torch_mm(x, y)
    w <- torch_mm(z, x)
    w
  }

  out <- foo(x, y)
}
Device contexts
local_device(device, ..., .env = parent.frame())

with_device(code, ..., device)
device |
A torch device to be used by default when creating new tensors. |
... |
currently unused. |
.env |
The environment to use for scoping. |
code |
The code to be evaluated in the modified environment. |
with_device()
: Modifies the default device for the selected context.
Set the learning rate of each parameter group using a cosine annealing schedule
lr_cosine_annealing( optimizer, T_max, eta_min = 0, last_epoch = -1, verbose = FALSE )
optimizer |
(Optimizer): Wrapped optimizer. |
T_max |
Maximum number of iterations |
eta_min |
Minimum learning rate. Default: 0. |
last_epoch |
The index of the last epoch |
verbose |
(bool): If |
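To make the schedule concrete, here is a base-R sketch of the closed-form cosine annealing rule, eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T_max)) / 2. It assumes this closed form (as given for cosine annealing in the SGDR paper) with the initial learning rate as eta_max; `cosine_annealing_lr` is a hypothetical helper, and the scheduler itself may compute the rate recursively rather than through this expression.

```r
# Closed-form cosine annealing: starts at base_lr and decays to eta_min
# after T_max epochs.
cosine_annealing_lr <- function(epoch, base_lr, T_max, eta_min = 0) {
  eta_min + (base_lr - eta_min) * (1 + cos(pi * epoch / T_max)) / 2
}

lrs <- cosine_annealing_lr(0:10, base_lr = 0.1, T_max = 10)
```

The rate starts at `base_lr`, passes through the midpoint between `base_lr` and `eta_min` at `T_max / 2`, and reaches `eta_min` at `T_max`.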
Sets the learning rate of each parameter group to the initial lr times a given function. When last_epoch=-1, sets initial lr as lr.
lr_lambda(optimizer, lr_lambda, last_epoch = -1, verbose = FALSE)
optimizer |
(Optimizer): Wrapped optimizer. |
lr_lambda |
(function or list): A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups. |
last_epoch |
(int): The index of last epoch. Default: -1. |
verbose |
(bool): If |
if (torch_is_installed()) {
  # Assuming optimizer has two groups.
  lambda1 <- function(epoch) epoch %/% 30
  lambda2 <- function(epoch) 0.95^epoch
  ## Not run:
  scheduler <- lr_lambda(optimizer, lr_lambda = list(lambda1, lambda2))
  for (epoch in 1:100) {
    train(...)
    validate(...)
    scheduler$step()
  }
  ## End(Not run)
}
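The effect of the two functions can be worked out by hand, since the rate of each group is its initial rate times the factor for the current epoch. The sketch below uses base R only; the initial rates in `initial_lr` are made-up values for illustration and `lr_at` is a hypothetical helper.

```r
initial_lr <- c(0.1, 0.01)                 # assumed initial rates of the two groups
lambda1 <- function(epoch) epoch %/% 30    # integer factor: 0, then 1, 2, ...
lambda2 <- function(epoch) 0.95^epoch      # exponential decay

# Learning rate of each group at a given epoch: initial lr times the factor.
lr_at <- function(epoch) initial_lr * c(lambda1(epoch), lambda2(epoch))

lr_epoch0  <- lr_at(0)
lr_epoch30 <- lr_at(30)
```

Note that `lambda1` returns 0 for the first 30 epochs, so under this rule the first group would not be updated until epoch 30.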
Multiply the learning rate of each parameter group by the factor given in the specified function. When last_epoch=-1, sets initial lr as lr.
lr_multiplicative(optimizer, lr_lambda, last_epoch = -1, verbose = FALSE)
optimizer |
(Optimizer): Wrapped optimizer. |
lr_lambda |
(function or list): A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups. |
last_epoch |
(int): The index of last epoch. Default: -1. |
verbose |
(bool): If |
if (torch_is_installed()) {
  ## Not run:
  lmbda <- function(epoch) 0.95
  scheduler <- lr_multiplicative(optimizer, lr_lambda = lmbda)
  for (epoch in 1:100) {
    train(...)
    validate(...)
    scheduler$step()
  }
  ## End(Not run)
}
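The difference from lr_lambda() is that the factor is applied to the current rate rather than to the initial one, so the factors accumulate. A base-R sketch of that arithmetic follows, assuming an initial rate of 0.1 (a made-up value) and that the factor is applied from epoch 1 onwards:

```r
initial_lr <- 0.1
lmbda <- function(epoch) 0.95
epochs <- 1:5
factors <- vapply(epochs, lmbda, numeric(1))

# Multiplicative rule: each epoch multiplies the previous rate by the factor.
lr_multiplicative_path <- initial_lr * cumprod(factors)

# For comparison, the lr_lambda() rule scales the initial rate every time.
lr_lambda_path <- initial_lr * factors
```

With a constant factor of 0.95 the multiplicative path decays geometrically, while the lr_lambda() path stays at 0.095.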
Sets the learning rate of each parameter group according to the 1cycle learning rate policy. The 1cycle policy anneals the learning rate from an initial learning rate to some maximum learning rate and then from that maximum learning rate to some minimum learning rate much lower than the initial learning rate.
lr_one_cycle( optimizer, max_lr, total_steps = NULL, epochs = NULL, steps_per_epoch = NULL, pct_start = 0.3, anneal_strategy = "cos", cycle_momentum = TRUE, base_momentum = 0.85, max_momentum = 0.95, div_factor = 25, final_div_factor = 10000, last_epoch = -1, verbose = FALSE )
optimizer |
(Optimizer): Wrapped optimizer. |
max_lr |
(float or list): Upper learning rate boundaries in the cycle for each parameter group. |
total_steps |
(int): The total number of steps in the cycle. Note that if a value is not provided here, then it must be inferred by providing a value for epochs and steps_per_epoch. Default: NULL |
epochs |
(int): The number of epochs to train for. This is used along with steps_per_epoch in order to infer the total number of steps in the cycle if a value for total_steps is not provided. Default: NULL |
steps_per_epoch |
(int): The number of steps per epoch to train for. This is used along with epochs in order to infer the total number of steps in the cycle if a value for total_steps is not provided. Default: NULL |
pct_start |
(float): The percentage of the cycle (in number of steps) spent increasing the learning rate. Default: 0.3 |
anneal_strategy |
(str): {'cos', 'linear'} Specifies the annealing strategy: "cos" for cosine annealing, "linear" for linear annealing. Default: 'cos' |
cycle_momentum |
(bool): If |
base_momentum |
(float or list): Lower momentum boundaries in the cycle for each parameter group. Note that momentum is cycled inversely to learning rate; at the peak of a cycle, momentum is 'base_momentum' and learning rate is 'max_lr'. Default: 0.85 |
max_momentum |
(float or list): Upper momentum boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_momentum - base_momentum). Note that momentum is cycled inversely to learning rate; at the start of a cycle, momentum is 'max_momentum' and learning rate is 'base_lr' Default: 0.95 |
div_factor |
(float): Determines the initial learning rate via initial_lr = max_lr/div_factor Default: 25 |
final_div_factor |
(float): Determines the minimum learning rate via min_lr = initial_lr/final_div_factor Default: 1e4 |
last_epoch |
(int): The index of the last batch. This parameter is used when
resuming a training job. Since |
verbose |
(bool): If |
This policy was initially described in the paper Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates.
The 1cycle learning rate policy changes the learning rate after every batch.
step
should be called after a batch has been used for training.
This scheduler is not chainable.
Note also that the total number of steps in the cycle can be determined in one of two ways (listed in order of precedence):
A value for total_steps is explicitly provided.
A number of epochs (epochs) and a number of steps per epoch (steps_per_epoch) are provided.
In this case, the number of total steps is inferred by total_steps = epochs * steps_per_epoch
You must either provide a value for total_steps or provide a value for both epochs and steps_per_epoch.
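The derived quantities follow directly from the argument descriptions above. A small base-R worked example, using the default `div_factor`, `final_div_factor` and `pct_start`, and a made-up `steps_per_epoch` of 100:

```r
max_lr <- 0.01
div_factor <- 25
final_div_factor <- 1e4
pct_start <- 0.3
epochs <- 10
steps_per_epoch <- 100   # assumed number of batches per epoch

initial_lr  <- max_lr / div_factor              # starting learning rate
min_lr      <- initial_lr / final_div_factor    # final learning rate
total_steps <- epochs * steps_per_epoch         # inferred length of the cycle

# Approximate number of steps spent increasing the learning rate.
increasing_steps <- pct_start * total_steps
```

So the rate climbs from 4e-4 to 0.01 over roughly the first 300 steps and then anneals down to 4e-8 over the remaining steps.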
if (torch_is_installed()) {
  ## Not run:
  epochs <- 10
  data_loader <- dataloader(...)
  optimizer <- optim_sgd(model$parameters, lr = 0.1, momentum = 0.9)
  scheduler <- lr_one_cycle(optimizer,
    max_lr = 0.01, steps_per_epoch = length(data_loader),
    epochs = epochs
  )

  for (i in 1:epochs) {
    coro::loop(for (batch in data_loader) {
      train_batch(...)
      scheduler$step()
    })
  }
  ## End(Not run)
}
Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metrics quantity and if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced.
lr_reduce_on_plateau( optimizer, mode = "min", factor = 0.1, patience = 10, threshold = 1e-04, threshold_mode = "rel", cooldown = 0, min_lr = 0, eps = 1e-08, verbose = FALSE )
optimizer |
(Optimizer): Wrapped optimizer. |
mode |
(str): One of |
factor |
(float): Factor by which the learning rate will be reduced. new_lr <- lr * factor. Default: 0.1. |
patience |
(int): Number of epochs with no improvement after which
learning rate will be reduced. For example, if |
threshold |
(float): Threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4. |
threshold_mode |
(str): One of |
cooldown |
(int): Number of epochs to wait before resuming normal operation after lr has been reduced. Default: 0. |
min_lr |
(float or list): A scalar or a list of scalars. A lower bound on the learning rate of all param groups or each group respectively. Default: 0. |
eps |
(float): Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is ignored. Default: 1e-8. |
verbose |
(bool): If |
if (torch_is_installed()) {
  ## Not run:
  optimizer <- optim_sgd(model$parameters, lr = 0.1, momentum = 0.9)
  scheduler <- lr_reduce_on_plateau(optimizer, "min")
  for (epoch in 1:10) {
    train(...)
    val_loss <- validate(...)
    # note that step should be called after validate
    scheduler$step(val_loss)
  }
  ## End(Not run)
}
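The plateau rule can be traced on a short loss sequence. The function below is a simplified base-R sketch and not the package implementation: it assumes `mode = "min"` and `threshold_mode = "rel"`, ignores `cooldown`, and uses `patience = 2` (smaller than the default) so the effect shows up quickly. `simulate_plateau` is a hypothetical name.

```r
# Simplified plateau rule: reduce the rate once the loss has failed to
# improve by a relative `threshold` for more than `patience` epochs.
simulate_plateau <- function(losses, lr = 0.1, factor = 0.1, patience = 2,
                             threshold = 1e-4, min_lr = 0, eps = 1e-8) {
  best <- Inf
  bad_epochs <- 0
  lrs <- numeric(length(losses))
  for (i in seq_along(losses)) {
    if (losses[i] < best * (1 - threshold)) {
      best <- losses[i]        # significant improvement: reset the counter
      bad_epochs <- 0
    } else {
      bad_epochs <- bad_epochs + 1
    }
    if (bad_epochs > patience) {
      new_lr <- max(lr * factor, min_lr)
      if (lr - new_lr > eps) lr <- new_lr   # skip negligible updates
      bad_epochs <- 0
    }
    lrs[i] <- lr
  }
  lrs
}

losses <- c(1.0, 0.8, 0.8, 0.8, 0.8, 0.7)
lrs <- simulate_plateau(losses)
```

Here the loss stalls at 0.8 for three epochs after the last improvement, so the rate drops from 0.1 to 0.01 at the fifth epoch.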
Creates learning rate schedulers
lr_scheduler( classname = NULL, inherit = LRScheduler, ..., parent_env = parent.frame() )
classname |
optional name for the learning rate scheduler |
inherit |
an optional learning rate scheduler to inherit from |
... |
named list of methods. You must implement the |
parent_env |
passed to |
Decays the learning rate of each parameter group by gamma every step_size epochs. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr.
lr_step(optimizer, step_size, gamma = 0.1, last_epoch = -1)
optimizer |
(Optimizer): Wrapped optimizer. |
step_size |
(int): Period of learning rate decay. |
gamma |
(float): Multiplicative factor of learning rate decay. Default: 0.1. |
last_epoch |
(int): The index of last epoch. Default: -1. |
if (torch_is_installed()) {
  ## Not run:
  # Assuming optimizer uses lr = 0.05 for all groups
  # lr = 0.05     if epoch < 30
  # lr = 0.005    if 30 <= epoch < 60
  # lr = 0.0005   if 60 <= epoch < 90
  # ...
  scheduler <- lr_step(optimizer, step_size = 30, gamma = 0.1)
  for (epoch in 1:100) {
    train(...)
    validate(...)
    scheduler$step()
  }
  ## End(Not run)
}
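The values listed in the example comments follow from lr = base_lr * gamma^(epoch %/% step_size), provided nothing else modifies the rate. A base-R check of that arithmetic, with `step_lr` as a hypothetical helper:

```r
# Step decay: multiply the rate by gamma once every step_size epochs.
step_lr <- function(epoch, base_lr = 0.05, step_size = 30, gamma = 0.1) {
  base_lr * gamma^(epoch %/% step_size)
}

lrs <- step_lr(c(0, 29, 30, 59, 60, 89))
```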
The output size is H, for any input size. The number of output features is equal to the number of input planes.
nn_adaptive_avg_pool1d(output_size)
output_size |
the target output size H |
if (torch_is_installed()) {
  # target output size of 5
  m <- nn_adaptive_avg_pool1d(5)
  input <- torch_randn(1, 64, 8)
  output <- m(input)
}
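How an input of length 8 is mapped to 5 outputs can be sketched in base R. The binning rule used here (bin i covers elements floor((i - 1) * L / H) + 1 through ceiling(i * L / H), in 1-based indexing) is an assumption about the underlying implementation, and `adaptive_avg_pool1d_ref` is a hypothetical helper operating on a single plain vector rather than a batched tensor.

```r
# Average over adaptively sized, possibly overlapping bins so that the
# output always has `output_size` elements.
adaptive_avg_pool1d_ref <- function(x, output_size) {
  n <- length(x)
  vapply(seq_len(output_size), function(i) {
    start <- floor((i - 1) * n / output_size) + 1   # first element of bin i
    end   <- ceiling(i * n / output_size)           # last element of bin i
    mean(x[start:end])
  }, numeric(1))
}

pooled <- adaptive_avg_pool1d_ref(1:8, output_size = 5)
```

Under this rule neighbouring bins may share elements when the input length is not a multiple of the output size.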
The output is of size H x W, for any input size. The number of output features is equal to the number of input planes.
nn_adaptive_avg_pool2d(output_size)
output_size |
the target output size of the image of the form H x W.
Can be a tuple (H, W) or a single H for a square image H x H.
H and W can be either a |
if (torch_is_installed()) {
  # target output size of 5x7
  m <- nn_adaptive_avg_pool2d(c(5, 7))
  input <- torch_randn(1, 64, 8, 9)
  output <- m(input)
  # target output size of 7x7 (square)
  m <- nn_adaptive_avg_pool2d(7)
  input <- torch_randn(1, 64, 10, 9)
  output <- m(input)
}
The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes.
nn_adaptive_avg_pool3d(output_size)
output_size |
the target output size of the form D x H x W.
Can be a tuple (D, H, W) or a single number D for a cube D x D x D.
D, H and W can be either a |
if (torch_is_installed()) {
  # target output size of 5x7x9
  m <- nn_adaptive_avg_pool3d(c(5, 7, 9))
  input <- torch_randn(1, 64, 8, 9, 10)
  output <- m(input)
  # target output size of 7x7x7 (cube)
  m <- nn_adaptive_avg_pool3d(7)
  input <- torch_randn(1, 64, 10, 9, 8)
  output <- m(input)
}
Efficient softmax approximation as described in Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou
nn_adaptive_log_softmax_with_loss( in_features, n_classes, cutoffs, div_value = 4, head_bias = FALSE )
in_features |
(int): Number of features in the input tensor |
n_classes |
(int): Number of classes in the dataset |
cutoffs |
(Sequence): Cutoffs used to assign targets to their buckets |
div_value |
(float, optional): value used as an exponent to compute sizes of the clusters. Default: 4.0 |
head_bias |
(bool, optional): If |
Adaptive softmax is an approximate strategy for training models with large output spaces. It is most effective when the label distribution is highly imbalanced, for example in natural language modelling, where the word frequency distribution approximately follows Zipf's law.
Adaptive softmax partitions the labels into several clusters, according to their frequency. These clusters may contain different numbers of targets each.
Additionally, clusters containing less frequent labels assign lower dimensional embeddings to those labels, which speeds up the computation. For each minibatch, only clusters for which at least one target is present are evaluated.
The idea is that the clusters which are accessed frequently (like the first one, containing most frequent labels), should also be cheap to compute – that is, contain a small number of assigned labels. We highly recommend taking a look at the original paper for more details.
cutoffs
should be a sequence of integers sorted in increasing order.
It controls the number of clusters and the partitioning of targets into
clusters. For example, setting cutoffs = c(10, 100, 1000)
means that the first 10
targets will be assigned
to the 'head' of the adaptive softmax, targets 11, 12, ..., 100
will be
assigned to the first cluster, and targets 101, 102, ..., 1000
will be
assigned to the second cluster, while targets
1001, 1002, ..., n_classes - 1
will be assigned
to the last, third cluster.
div_value
is used to compute the size of each additional cluster,
which is given as
$\left\lfloor \frac{\text{in\_features}}{\text{div\_value}^{idx}} \right\rfloor$,
where $idx$ is the cluster index (with clusters
for less frequent words having larger indices,
and indices starting from $1$).
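Both rules can be worked through in base R. The sketch below assumes the cluster size formula floor(in_features / div_value^idx) with indices starting from 1, the `cutoffs = c(10, 100, 1000)` example from the text with targets counted from 1, and a made-up `in_features` of 64.

```r
in_features <- 64             # assumed input feature size
div_value <- 4
cutoffs <- c(10, 100, 1000)

# Size of each additional cluster: floor(in_features / div_value^idx).
idx <- seq_along(cutoffs)
cluster_dims <- floor(in_features / div_value^idx)

# Cluster of each target: 0 is the head, 1 to 3 are the additional clusters.
# A target belongs to cluster k when it lies above exactly k cutoffs.
targets <- c(1, 10, 11, 100, 101, 1000, 1001)
cluster_of_target <- rowSums(outer(targets, cutoffs, ">"))
```

The rarer the labels in a cluster, the smaller its size becomes (16, 4 and 1 here), which is where the computational saving comes from.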
head_bias
if set to TRUE, adds a bias term to the 'head' of the
adaptive softmax. See paper for details. Set to FALSE in the official
implementation.
NamedTuple
with output
and loss
fields:
output is a Tensor of size N
containing computed target
log probabilities for each example
loss is a Scalar representing the computed negative log likelihood loss
Labels passed as inputs to this module should be sorted according to
their frequency. This means that the most frequent label should be
represented by the index 0
, and the least frequent
label should be represented by the index n_classes - 1
.
input: $(N, \text{in\_features})$
target: $(N)$ where each value satisfies $0 \le \text{target}[i] \le \text{n\_classes}$
output1: $(N)$
output2: Scalar
This module returns a NamedTuple
with output
and loss
fields. See further documentation for details.
To compute log-probabilities for all classes, the log_prob
method can be used.
The output size is H, for any input size. The number of output features is equal to the number of input planes.
nn_adaptive_max_pool1d(output_size, return_indices = FALSE)
output_size |
the target output size H |
return_indices |
if |
if (torch_is_installed()) {
  # target output size of 5
  m <- nn_adaptive_max_pool1d(5)
  input <- torch_randn(1, 64, 8)
  output <- m(input)
}
The output is of size H x W, for any input size. The number of output features is equal to the number of input planes.
nn_adaptive_max_pool2d(output_size, return_indices = FALSE)
output_size |
the target output size of the image of the form H x W.
Can be a tuple |
return_indices |
if |
if (torch_is_installed()) {
  # target output size of 5x7
  m <- nn_adaptive_max_pool2d(c(5, 7))
  input <- torch_randn(1, 64, 8, 9)
  output <- m(input)
  # target output size of 7x7 (square)
  m <- nn_adaptive_max_pool2d(7)
  input <- torch_randn(1, 64, 10, 9)
  output <- m(input)
}
The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes.
nn_adaptive_max_pool3d(output_size, return_indices = FALSE)
output_size |
the target output size of the image of the form D x H x W.
Can be a tuple (D, H, W) or a single D for a cube D x D x D.
D, H and W can be either a |
return_indices |
if |
if (torch_is_installed()) {
  # target output size of 5x7x9
  m <- nn_adaptive_max_pool3d(c(5, 7, 9))
  input <- torch_randn(1, 64, 8, 9, 10)
  output <- m(input)
  # target output size of 7x7x7 (cube)
  m <- nn_adaptive_max_pool3d(7)
  input <- torch_randn(1, 64, 10, 9, 8)
  output <- m(input)
}
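In the simplest case, the output value of the layer with input size $(N, C, L)$,
output $(N, C, L_{out})$ and kernel_size $k$
can be precisely described as:
$$\text{out}(N_i, C_j, l) = \frac{1}{k} \sum_{m=0}^{k-1} \text{input}(N_i, C_j, \text{stride} \times l + m)$$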
nn_avg_pool1d( kernel_size, stride = NULL, padding = 0, ceil_mode = FALSE, count_include_pad = TRUE )
kernel_size |
the size of the window |
stride |
the stride of the window. Default value is |
padding |
implicit zero padding to be added on both sides |
ceil_mode |
when TRUE, will use |
count_include_pad |
when TRUE, will include the zero-padding in the averaging calculation |
If padding
is non-zero, then the input is implicitly zero-padded on both sides
for padding
number of points.
The parameters kernel_size
, stride
, padding
can each be
an int
or a one-element tuple.
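Input: $(N, C, L_{in})$
Output: $(N, C, L_{out})$, where
$$L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{kernel\_size}}{\text{stride}} + 1 \right\rfloor$$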
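The averaging and the output length can be sketched for a single plain vector in base R. This is an illustration under stated assumptions rather than the package implementation: it uses the output length floor((L_in + 2 * padding - kernel_size) / stride + 1), which corresponds to `ceil_mode = FALSE`, and it counts padded zeros in the average as with `count_include_pad = TRUE`. `avg_pool1d_ref` is a hypothetical name.

```r
# 1D average pooling over one channel of one sample.
avg_pool1d_ref <- function(x, kernel_size, stride = kernel_size, padding = 0) {
  x <- c(rep(0, padding), x, rep(0, padding))   # implicit zero padding
  l_out <- floor((length(x) - kernel_size) / stride + 1)
  vapply(seq_len(l_out), function(l) {
    start <- (l - 1) * stride + 1               # first element of window l
    mean(x[start:(start + kernel_size - 1)])
  }, numeric(1))
}

# Window of size 3 and stride 2 on an input of length 8.
pooled <- avg_pool1d_ref(1:8, kernel_size = 3, stride = 2)
```

With these settings the windows cover elements 1 to 3, 3 to 5 and 5 to 7, and the last element is dropped because a fourth window would not fit.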
if (torch_is_installed()) {
  # pool with window of size=3, stride=2
  m <- nn_avg_pool1d(3, stride = 2)
  m(torch_randn(1, 1, 8))
}
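In the simplest case, the output value of the layer with input size $(N, C, H, W)$,
output $(N, C, H_{out}, W_{out})$ and kernel_size $(kH, kW)$
can be precisely described as:
$$\text{out}(N_i, C_j, h, w) = \frac{1}{kH \times kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} \text{input}(N_i, C_j, \text{stride}[0] \times h + m, \text{stride}[1] \times w + n)$$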
nn_avg_pool2d( kernel_size, stride = NULL, padding = 0, ceil_mode = FALSE, count_include_pad = TRUE, divisor_override = NULL )
kernel_size |
the size of the window |
stride |
the stride of the window. Default value is |
padding |
implicit zero padding to be added on both sides |
ceil_mode |
when TRUE, will use |
count_include_pad |
when TRUE, will include the zero-padding in the averaging calculation |
divisor_override |
if specified, it will be used as divisor, otherwise |
If padding
is non-zero, then the input is implicitly zero-padded on both sides
for padding
number of points.
The parameters kernel_size
, stride
, padding
can either be:
a single int
– in which case the same value is used for the height and width dimension
a tuple
of two ints – in which case, the first int
is used for the height dimension,
and the second int
for the width dimension
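Input: $(N, C, H_{in}, W_{in})$
Output: $(N, C, H_{out}, W_{out})$, where
$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} + 1 \right\rfloor$$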
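The spatial output size can be computed per dimension before building a network. A base-R sketch follows, assuming the size rule floor((in + 2 * padding - kernel_size) / stride + 1) that applies when `ceil_mode = FALSE`; `pool_output_size` is a hypothetical helper that is vectorised over dimensions.

```r
# Output size of a pooling layer along each spatial dimension.
pool_output_size <- function(input_size, kernel_size, stride = kernel_size,
                             padding = 0) {
  floor((input_size + 2 * padding - kernel_size) / stride + 1)
}

# A 50 x 32 input with a 3 x 2 window and stride (2, 1).
out_hw <- pool_output_size(c(50, 32), kernel_size = c(3, 2), stride = c(2, 1))
```

Under this rule an input of shape (20, 16, 50, 32) would give an output of shape (20, 16, 24, 31), since the batch and channel dimensions are unchanged.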
if (torch_is_installed()) {
  # pool of square window of size=3, stride=2
  m <- nn_avg_pool2d(3, stride = 2)
  # pool of non-square window
  m <- nn_avg_pool2d(c(3, 2), stride = c(2, 1))
  input <- torch_randn(20, 16, 50, 32)
  output <- m(input)
}
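In the simplest case, the output value of the layer with input size $(N, C, D, H, W)$,
output $(N, C, D_{out}, H_{out}, W_{out})$ and kernel_size $(kD, kH, kW)$
can be precisely described as:
$$\text{out}(N_i, C_j, d, h, w) = \sum_{k=0}^{kD-1} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} \frac{\text{input}(N_i, C_j, \text{stride}[0] \times d + k, \text{stride}[1] \times h + m, \text{stride}[2] \times w + n)}{kD \times kH \times kW}$$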
nn_avg_pool3d( kernel_size, stride = NULL, padding = 0, ceil_mode = FALSE, count_include_pad = TRUE, divisor_override = NULL )
kernel_size |
the size of the window |
stride |
the stride of the window. Default value is kernel_size |
padding |
implicit zero padding to be added on all three sides |
ceil_mode |
when TRUE, will use ceil instead of floor to compute the output shape |
count_include_pad |
when TRUE, will include the zero-padding in the averaging calculation |
divisor_override |
if specified, it will be used as divisor, otherwise kernel_size will be used |
If padding is non-zero, then the input is implicitly zero-padded on all three sides for padding number of points.
The parameters kernel_size, stride can either be:
a single int – in which case the same value is used for the depth, height and width dimension
a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
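Input: $(N, C, D_{in}, H_{in}, W_{in})$
Output: $(N, C, D_{out}, H_{out}, W_{out})$, where
$$D_{out} = \left\lfloor \frac{D_{in} + 2 \times padding[0] - kernel\_size[0]}{stride[0]} + 1 \right\rfloor$$
$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times padding[1] - kernel\_size[1]}{stride[1]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times padding[2] - kernel\_size[2]}{stride[2]} + 1 \right\rfloor$$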
if (torch_is_installed()) {
  # pool of square window of size=3, stride=2
  m <- nn_avg_pool3d(3, stride = 2)
  # pool of non-square window
  m <- nn_avg_pool3d(c(3, 2, 2), stride = c(2, 1, 2))
  input <- torch_randn(20, 16, 50, 44, 31)
  output <- m(input)
}
Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
nn_batch_norm1d( num_features, eps = 1e-05, momentum = 0.1, affine = TRUE, track_running_stats = TRUE )
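num_features |
$C$ from an expected input of size $(N, C, L)$ or $L$ from input of size $(N, L)$ |
eps |
a value added to the denominator for numerical stability. Default: 1e-5 |
momentum |
the value used for the running_mean and running_var computation. Can be set to NULL for cumulative moving average (i.e. simple average). Default: 0.1 |
affine |
a boolean value that when set to TRUE, this module has learnable affine parameters. Default: TRUE |
track_running_stats |
a boolean value that when set to TRUE, this module tracks the running mean and variance, and when set to FALSE, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: TRUE |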
$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$
The mean and standard-deviation are calculated per-dimension over the mini-batches and $\gamma$ and $\beta$ are learnable parameter vectors of size C (where C is the input size). By default, the elements of $\gamma$ are set to 1 and the elements of $\beta$ are set to 0.
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to FALSE, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well.
This momentum argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is $\hat{x}_{new} = (1 - momentum) \times \hat{x} + momentum \times x_t$, where $\hat{x}$ is the estimated statistic and $x_t$ is the new observed value.
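To make the running-statistics rule concrete, here is a minimal base R sketch of the update, assuming the rule is new = (1 - momentum) * old + momentum * observed. The helper `update_running_stat()`, the starting value of 0 and the batch means are made up for illustration and are not part of torch.

```r
# Running-statistic update (base R, no torch needed):
# x_hat_new = (1 - momentum) * x_hat + momentum * x_t
update_running_stat <- function(x_hat, x_t, momentum = 0.1) {
  (1 - momentum) * x_hat + momentum * x_t
}

running_mean <- 0          # initial estimate (assumed to start at 0)
batch_means <- c(2, 2, 2)  # made-up per-batch means for illustration
for (x_t in batch_means) {
  running_mean <- update_running_stat(running_mean, x_t)
}
print(running_mean)
```

With a small momentum the estimate moves slowly towards the observed value, which is the opposite of how momentum behaves in optimizers, where a larger value retains more of the past.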
Because the Batch Normalization is done over the C dimension, computing statistics on (N, L) slices, it's common terminology to call this Temporal Batch Normalization.
Input: $(N, C)$ or $(N, C, L)$
Output: $(N, C)$ or $(N, C, L)$ (same shape as input)
if (torch_is_installed()) {
  # With Learnable Parameters
  m <- nn_batch_norm1d(100)
  # Without Learnable Parameters
  m <- nn_batch_norm1d(100, affine = FALSE)
  input <- torch_randn(20, 100)
  output <- m(input)
}
Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
nn_batch_norm2d( num_features, eps = 1e-05, momentum = 0.1, affine = TRUE, track_running_stats = TRUE )
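num_features |
$C$ from an expected input of size $(N, C, H, W)$ |
eps |
a value added to the denominator for numerical stability. Default: 1e-5 |
momentum |
the value used for the running_mean and running_var computation. Can be set to NULL for cumulative moving average (i.e. simple average). Default: 0.1 |
affine |
a boolean value that when set to TRUE, this module has learnable affine parameters. Default: TRUE |
track_running_stats |
a boolean value that when set to TRUE, this module tracks the running mean and variance, and when set to FALSE, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: TRUE |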
$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$
The mean and standard-deviation are calculated per-dimension over the mini-batches and $\gamma$ and $\beta$ are learnable parameter vectors of size C (where C is the input size). By default, the elements of $\gamma$ are set to 1 and the elements of $\beta$ are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to torch_var(input, unbiased = FALSE).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to FALSE, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well.
Input: $(N, C, H, W)$
Output: $(N, C, H, W)$ (same shape as input)
This momentum argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is $\hat{x}_{new} = (1 - momentum) \times \hat{x} + momentum \times x_t$, where $\hat{x}$ is the estimated statistic and $x_t$ is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it's common terminology to call this Spatial Batch Normalization.
if (torch_is_installed()) {
  # With Learnable Parameters
  m <- nn_batch_norm2d(100)
  # Without Learnable Parameters
  m <- nn_batch_norm2d(100, affine = FALSE)
  input <- torch_randn(20, 100, 35, 45)
  output <- m(input)
}
Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
nn_batch_norm3d( num_features, eps = 1e-05, momentum = 0.1, affine = TRUE, track_running_stats = TRUE )
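num_features |
$C$ from an expected input of size $(N, C, D, H, W)$ |
eps |
a value added to the denominator for numerical stability. Default: 1e-5 |
momentum |
the value used for the running_mean and running_var computation. Can be set to NULL for cumulative moving average (i.e. simple average). Default: 0.1 |
affine |
a boolean value that when set to TRUE, this module has learnable affine parameters. Default: TRUE |
track_running_stats |
a boolean value that when set to TRUE, this module tracks the running mean and variance, and when set to FALSE, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: TRUE |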
$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$
The mean and standard-deviation are calculated per-dimension over the mini-batches and $\gamma$ and $\beta$ are learnable parameter vectors of size C (where C is the input size). By default, the elements of $\gamma$ are set to 1 and the elements of $\beta$ are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to torch_var(input, unbiased = FALSE).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to FALSE, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well.
Input: $(N, C, D, H, W)$
Output: $(N, C, D, H, W)$ (same shape as input)
This momentum argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is $\hat{x}_{new} = (1 - momentum) \times \hat{x} + momentum \times x_t$, where $\hat{x}$ is the estimated statistic and $x_t$ is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, D, H, W) slices, it's common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization.
if (torch_is_installed()) {
  # With Learnable Parameters
  m <- nn_batch_norm3d(100)
  # Without Learnable Parameters
  m <- nn_batch_norm3d(100, affine = FALSE)
  input <- torch_randn(20, 100, 35, 45, 55)
  output <- m(input)
}
Creates a criterion that measures the Binary Cross Entropy between the target and the output:
nn_bce_loss(weight = NULL, reduction = "mean")
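weight |
(Tensor, optional): a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size nbatch. |
reduction |
(string, optional): Specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. |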
The unreduced (i.e. with reduction set to 'none') loss can be described as:
$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = -w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n) \right]$$
where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then
$$\ell(x, y) = \begin{cases} \mathrm{mean}(L), & \text{if reduction} = \text{'mean'} \\ \mathrm{sum}(L), & \text{if reduction} = \text{'sum'} \end{cases}$$
This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets $y$ should be numbers between 0 and 1.
Notice that if $x_n$ is either 0 or 1, one of the log terms would be mathematically undefined in the above loss equation. PyTorch chooses to set $\log(0) = -\infty$, since $\lim_{x \to 0} \log(x) = -\infty$.
However, an infinite term in the loss equation is not desirable for several reasons. For one, if either $y_n = 0$ or $(1 - y_n) = 0$, then we would be multiplying 0 with infinity. Secondly, if we have an infinite loss value, then we would also have an infinite term in our gradient, since $\lim_{x \to 0} \frac{d}{dx} \log(x) = \infty$.
This would make BCELoss's backward method nonlinear with respect to $x_n$, and using it for things like linear regression would not be straightforward.
Our solution is that BCELoss clamps its log function outputs to be greater than or equal to -100. This way, we can always have a finite loss value and a linear backward method.
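The effect of the clamp can be illustrated without torch. The sketch below is a base R approximation of the unreduced loss, assuming the per-element formula -w * (y * log(x) + (1 - y) * log(1 - x)) with both log terms clamped at -100; `bce_clamped()` is a hypothetical helper, not the library implementation.

```r
# Base R sketch of the unreduced BCE loss with log outputs clamped at -100.
# bce_clamped() is a hypothetical helper, not a torch function.
bce_clamped <- function(x, y, w = 1) {
  log_x   <- pmax(log(x), -100)      # log(0) = -Inf is clamped to -100
  log_1mx <- pmax(log(1 - x), -100)
  -w * (y * log_x + (1 - y) * log_1mx)
}

x <- c(0, 0.5, 1)  # predicted probabilities, including the edge cases 0 and 1
y <- c(1, 1, 1)    # targets
losses <- bce_clamped(x, y)
print(losses)        # all finite, even where log(0) occurs
print(mean(losses))  # corresponds to reduction = "mean"
```

Without the clamp, the third element would evaluate `0 * -Inf`, which is `NaN` in R; with the clamp it contributes 0 to the loss.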
Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar. If reduction is 'none', then $(N, *)$, same shape as input.
if (torch_is_installed()) {
  m <- nn_sigmoid()
  loss <- nn_bce_loss()
  input <- torch_randn(3, requires_grad = TRUE)
  target <- torch_rand(3)
  output <- loss(m(input), target)
  output$backward()
}
This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
nn_bce_with_logits_loss(weight = NULL, reduction = "mean", pos_weight = NULL)
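weight |
(Tensor, optional): a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size nbatch. |
reduction |
(string, optional): Specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. |
pos_weight |
(Tensor, optional): a weight of positive examples. Must be a vector with length equal to the number of classes. |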
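The unreduced (i.e. with reduction set to 'none') loss can be described as:
$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = -w_n \left[ y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log(1 - \sigma(x_n)) \right]$$
where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then
$$\ell(x, y) = \begin{cases} \mathrm{mean}(L), & \text{if reduction} = \text{'mean'} \\ \mathrm{sum}(L), & \text{if reduction} = \text{'sum'} \end{cases}$$
This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets t[i] should be numbers between 0 and 1.
It's possible to trade off recall and precision by adding weights to positive examples. In the case of multi-label classification the loss can be described as:
$$\ell_c(x, y) = L_c = \{l_{1,c}, \dots, l_{N,c}\}^\top, \quad l_{n,c} = -w_{n,c} \left[ p_c y_{n,c} \cdot \log \sigma(x_{n,c}) + (1 - y_{n,c}) \cdot \log(1 - \sigma(x_{n,c})) \right]$$
where $c$ is the class number ($c > 1$ for multi-label binary classification, $c = 1$ for single-label binary classification), $n$ is the number of the sample in the batch and $p_c$ is the weight of the positive answer for the class $c$.
$p_c > 1$ increases the recall, $p_c < 1$ increases the precision.
For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to $\frac{300}{100} = 3$. The loss would act as if the dataset contains $3 \times 100 = 300$ positive examples.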
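The role of the positive-class weight can be seen in a small base R sketch. It assumes the per-element formula -w * (p * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))) and evaluates it directly, without the log-sum-exp stabilisation the module itself relies on; `sigmoid()` and `bce_logits_pos_weight()` are hypothetical helpers, not torch functions.

```r
# Base R sketch of the per-element loss with a positive-class weight p.
# Direct evaluation for illustration only; not numerically stabilised.
sigmoid <- function(x) 1 / (1 + exp(-x))

bce_logits_pos_weight <- function(x, y, p = 1, w = 1) {
  -w * (p * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x)))
}

x <- 1.5                                               # a logit
loss_pos_p1 <- bce_logits_pos_weight(x, y = 1)         # -log(sigmoid(1.5))
loss_pos_p3 <- bce_logits_pos_weight(x, y = 1, p = 3)  # positive term scaled by 3
loss_neg_p3 <- bce_logits_pos_weight(x, y = 0, p = 3)  # negatives are unaffected by p
print(c(loss_pos_p1, loss_pos_p3, loss_neg_p3))
```

Only the term for positive targets is scaled, so raising p makes missed positives more expensive while leaving the cost of false positives unchanged.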
Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar. If reduction is 'none', then $(N, *)$, same shape as input.
if (torch_is_installed()) {
  loss <- nn_bce_with_logits_loss()
  input <- torch_randn(3, requires_grad = TRUE)
  target <- torch_empty(3)$random_(1, 2)
  output <- loss(input, target)
  output$backward()

  target <- torch_ones(10, 64, dtype = torch_float32()) # 64 classes, batch size = 10
  output <- torch_full(c(10, 64), 1.5) # A prediction (logit)
  pos_weight <- torch_ones(64) # All weights are equal to 1
  criterion <- nn_bce_with_logits_loss(pos_weight = pos_weight)
  criterion(output, target) # -log(sigmoid(1.5))
}
Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$
nn_bilinear(in1_features, in2_features, out_features, bias = TRUE)
in1_features |
size of each first input sample |
in2_features |
size of each second input sample |
out_features |
size of each output sample |
bias |
If set to FALSE, the layer will not learn an additive bias. Default: TRUE |
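Input1: $(N, *, H_{in1})$, where $H_{in1} = \text{in1\_features}$ and $*$ means any number of additional dimensions. All but the last dimension of the inputs should be the same.
Input2: $(N, *, H_{in2})$ where $H_{in2} = \text{in2\_features}$.
Output: $(N, *, H_{out})$ where $H_{out} = \text{out\_features}$ and all but the last dimension are the same shape as the input.
weight: the learnable weights of the module of shape $(\text{out\_features}, \text{in1\_features}, \text{in2\_features})$. The values are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k = \frac{1}{\text{in1\_features}}$
bias: the learnable bias of the module of shape $(\text{out\_features})$. If bias is TRUE, the values are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k = \frac{1}{\text{in1\_features}}$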
if (torch_is_installed()) {
  m <- nn_bilinear(20, 30, 50)
  input1 <- torch_randn(128, 20)
  input2 <- torch_randn(128, 30)
  output <- m(input1, input2)
  print(output$size())
}
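For intuition about what a bilinear layer computes, the following base R sketch evaluates the form y = t(x1) A x2 + b for a single sample. It assumes a weight array of shape (out_features, in1_features, in2_features); the sizes and random values are made up for illustration and none of this is the torch implementation.

```r
# Base R sketch of the bilinear form for one sample (no batch dimension).
# A is assumed to have shape (out_features, in1_features, in2_features).
set.seed(1)
in1 <- 2; in2 <- 3; out <- 4
A  <- array(rnorm(out * in1 * in2), dim = c(out, in1, in2))
b  <- rnorm(out)
x1 <- rnorm(in1)
x2 <- rnorm(in2)

# y[k] = t(x1) %*% A[k, , ] %*% x2 + b[k]
y <- vapply(
  seq_len(out),
  function(k) drop(t(x1) %*% A[k, , ] %*% x2) + b[k],
  numeric(1)
)
print(length(y))  # one value per output feature
```

Each output feature has its own in1 x in2 matrix, which is why the weight carries three dimensions rather than two as in a linear layer.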
Indicates that a tensor is a buffer in a nn_module
nn_buffer(x, persistent = TRUE)
x |
the tensor that will be converted to nn_buffer |
persistent |
whether the buffer should be persistent or not. |
Applies the element-wise function: $\text{CELU}(x) = \max(0, x) + \min(0, \alpha * (\exp(x / \alpha) - 1))$
nn_celu(alpha = 1, inplace = FALSE)
alpha |
the $\alpha$ value for the CELU formulation. Default: 1.0 |
inplace |
can optionally do the operation in-place. Default: FALSE |
More details can be found in the paper Continuously Differentiable Exponential Linear Units.
Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input
if (torch_is_installed()) {
  m <- nn_celu()
  input <- torch_randn(2)
  output <- m(input)
}
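As a quick reference for the shape of the activation, here is a base R sketch of the element-wise rule, assuming CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)). The local function `celu()` is an illustration only, not the torch implementation.

```r
# Base R sketch of the element-wise CELU formula (local illustration only).
celu <- function(x, alpha = 1) {
  pmax(0, x) + pmin(0, alpha * (exp(x / alpha) - 1))
}

x <- c(-2, 0, 3)
print(celu(x))             # negative inputs saturate towards -alpha
print(celu(x, alpha = 2))
```

Positive inputs pass through unchanged, while negative inputs are bounded below by -alpha.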
Sparsemax activation module.
nn_contrib_sparsemax(dim = -1)
dim |
The dimension over which to apply the sparsemax function. (-1) |
The SparseMax activation is described in 'From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification'. The implementation is based on aced125/sparsemax.
Applies a 1D transposed convolution operator over an input image composed of several input planes.
nn_conv_transpose1d( in_channels, out_channels, kernel_size, stride = 1, padding = 0, output_padding = 0, groups = 1, bias = TRUE, dilation = 1, padding_mode = "zeros" )
in_channels |
(int): Number of channels in the input image |
out_channels |
(int): Number of channels produced by the convolution |
kernel_size |
(int or tuple): Size of the convolving kernel |
stride |
(int or tuple, optional): Stride of the convolution. Default: 1 |
padding |
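(int or tuple, optional): dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of the input. Default: 0 |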
output_padding |
(int or tuple, optional): Additional size added to one side of the output shape. Default: 0 |
groups |
(int, optional): Number of blocked connections from input channels to output channels. Default: 1 |
bias |
(bool, optional): If TRUE, adds a learnable bias to the output. Default: TRUE |
dilation |
(int or tuple, optional): Spacing between kernel elements. Default: 1 |
padding_mode |
(string, optional): |
This module can be seen as the gradient of Conv1d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).
stride controls the stride for the cross-correlation.
padding controls the amount of implicit zero-paddings on both sides for dilation * (kernel_size - 1) - padding number of points. See note below for details.
output_padding controls the additional size added to one side of the output shape. See note below for details.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups= in_channels, each input channel is convolved with its own set of filters (of size $\left\lfloor \frac{out\_channels}{in\_channels} \right\rfloor$).
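Input: $(N, C_{in}, L_{in})$
Output: $(N, C_{out}, L_{out})$ where
$$L_{out} = (L_{in} - 1) \times stride - 2 \times padding + dilation \times (kernel\_size - 1) + output\_padding + 1$$
weight (Tensor): the learnable weights of the module of shape $(in\_channels, \frac{out\_channels}{groups}, kernel\_size)$. The values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{groups}{C_{out} \times kernel\_size}$
bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is TRUE, then the values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{groups}{C_{out} \times kernel\_size}$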
Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a nn_conv1d and a nn_conv_transpose1d are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, nn_conv1d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find output shape, but does not actually add zero-padding to output.
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = TRUE.
if (torch_is_installed()) {
  m <- nn_conv_transpose1d(32, 16, 2)
  input <- torch_randn(10, 32, 2)
  output <- m(input)
}
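The output length of this example can be worked out by hand. The base R sketch below assumes the shape rule L_out = (L_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1; `conv_transpose_out_len()` is a hypothetical helper, not part of torch.

```r
# Hypothetical helper (not part of torch): output length of a 1D transposed
# convolution under the shape rule stated in the lead-in.
conv_transpose_out_len <- function(l_in, kernel_size, stride = 1, padding = 0,
                                   output_padding = 0, dilation = 1) {
  (l_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) +
    output_padding + 1
}

# Arguments mirror nn_conv_transpose1d(32, 16, 2) applied to a length-2 input.
l_out <- conv_transpose_out_len(l_in = 2, kernel_size = 2)
print(l_out)
```

Under this rule the `(10, 32, 2)` input would produce an output of shape `(10, 16, 3)`: the channel count comes from out_channels and the length grows by kernel_size - 1.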
Applies a 2D transposed convolution operator over an input image composed of several input planes.
nn_conv_transpose2d( in_channels, out_channels, kernel_size, stride = 1, padding = 0, output_padding = 0, groups = 1, bias = TRUE, dilation = 1, padding_mode = "zeros" )
in_channels |
(int): Number of channels in the input image |
out_channels |
(int): Number of channels produced by the convolution |
kernel_size |
(int or tuple): Size of the convolving kernel |
stride |
(int or tuple, optional): Stride of the convolution. Default: 1 |
padding |
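(int or tuple, optional): dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0 |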
output_padding |
(int or tuple, optional): Additional size added to one side of each dimension in the output shape. Default: 0 |
groups |
(int, optional): Number of blocked connections from input channels to output channels. Default: 1 |
bias |
(bool, optional): If TRUE, adds a learnable bias to the output. Default: TRUE |
dilation |
(int or tuple, optional): Spacing between kernel elements. Default: 1 |
padding_mode |
(string, optional): |
This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).
stride controls the stride for the cross-correlation.
padding controls the amount of implicit zero-paddings on both sides for dilation * (kernel_size - 1) - padding number of points. See note below for details.
output_padding controls the additional size added to one side of the output shape. See note below for details.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups= in_channels, each input channel is convolved with its own set of filters (of size $\left\lfloor \frac{out\_channels}{in\_channels} \right\rfloor$).
The parameters kernel_size, stride, padding, output_padding can either be:
a single int – in which case the same value is used for the height and width dimensions
a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
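Input: $(N, C_{in}, H_{in}, W_{in})$
Output: $(N, C_{out}, H_{out}, W_{out})$ where
$$H_{out} = (H_{in} - 1) \times stride[0] - 2 \times padding[0] + dilation[0] \times (kernel\_size[0] - 1) + output\_padding[0] + 1$$
$$W_{out} = (W_{in} - 1) \times stride[1] - 2 \times padding[1] + dilation[1] \times (kernel\_size[1] - 1) + output\_padding[1] + 1$$
weight (Tensor): the learnable weights of the module of shape $(in\_channels, \frac{out\_channels}{groups}, kernel\_size[0], kernel\_size[1])$. The values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{groups}{C_{out} \times \prod_{i=0}^{1} kernel\_size[i]}$
bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is TRUE, then the values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{groups}{C_{out} \times \prod_{i=0}^{1} kernel\_size[i]}$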
Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a nn_conv2d and a nn_conv_transpose2d are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, nn_conv2d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find output shape, but does not actually add zero-padding to output.
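The shape ambiguity described in this note can be reproduced with a few lines of arithmetic. The base R sketch below assumes the standard per-dimension shape rules for a convolution and its transpose; `conv_out()` and `conv_transpose_out()` are hypothetical helpers, not torch functions.

```r
# Base R sketch (hypothetical helpers) of why output_padding is needed
# when stride > 1.
conv_out <- function(n_in, k, s = 1, p = 0, d = 1) {
  floor((n_in + 2 * p - d * (k - 1) - 1) / s + 1)
}
conv_transpose_out <- function(n_in, k, s = 1, p = 0, op = 0, d = 1) {
  (n_in - 1) * s - 2 * p + d * (k - 1) + op + 1
}

# With kernel 3, stride 2, padding 1, inputs of size 11 and 12 both map to 6.
sizes_down <- conv_out(c(11, 12), k = 3, s = 2, p = 1)
print(sizes_down)

# Going back up from 6, output_padding selects which input size is recovered.
size_up_op0 <- conv_transpose_out(6, k = 3, s = 2, p = 1, op = 0)
size_up_op1 <- conv_transpose_out(6, k = 3, s = 2, p = 1, op = 1)
print(c(size_up_op0, size_up_op1))
```

Because two input sizes collapse to the same downsampled size, the transposed layer cannot know which one to restore unless it is told, either through output_padding or through an explicit target size at call time.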
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = TRUE.
if (torch_is_installed()) {
  # With square kernels and equal stride
  m <- nn_conv_transpose2d(16, 33, 3, stride = 2)
  # non-square kernels and unequal stride and with padding
  m <- nn_conv_transpose2d(16, 33, c(3, 5), stride = c(2, 1), padding = c(4, 2))
  input <- torch_randn(20, 16, 50, 100)
  output <- m(input)
  # exact output size can be also specified as an argument
  input <- torch_randn(1, 16, 12, 12)
  downsample <- nn_conv2d(16, 16, 3, stride = 2, padding = 1)
  upsample <- nn_conv_transpose2d(16, 16, 3, stride = 2, padding = 1)
  h <- downsample(input)
  h$size()
  output <- upsample(h, output_size = input$size())
  output$size()
}
Applies a 3D transposed convolution operator over an input image composed of several input planes.
nn_conv_transpose3d( in_channels, out_channels, kernel_size, stride = 1, padding = 0, output_padding = 0, groups = 1, bias = TRUE, dilation = 1, padding_mode = "zeros" )
in_channels |
(int): Number of channels in the input image |
out_channels |
(int): Number of channels produced by the convolution |
kernel_size |
(int or tuple): Size of the convolving kernel |
stride |
(int or tuple, optional): Stride of the convolution. Default: 1 |
padding |
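(int or tuple, optional): dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0 |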
output_padding |
(int or tuple, optional): Additional size added to one side of each dimension in the output shape. Default: 0 |
groups |
(int, optional): Number of blocked connections from input channels to output channels. Default: 1 |
bias |
(bool, optional): If TRUE, adds a learnable bias to the output. Default: TRUE |
dilation |
(int or tuple, optional): Spacing between kernel elements. Default: 1 |
padding_mode |
(string, optional): |
The transposed convolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes.
This module can be seen as the gradient of Conv3d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).
stride controls the stride for the cross-correlation.
padding controls the amount of implicit zero-paddings on both sides for dilation * (kernel_size - 1) - padding number of points. See note below for details.
output_padding controls the additional size added to one side of the output shape. See note below for details.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups= in_channels, each input channel is convolved with its own set of filters (of size $\left\lfloor \frac{out\_channels}{in\_channels} \right\rfloor$).
The parameters kernel_size, stride, padding, output_padding can either be:
a single int – in which case the same value is used for the depth, height and width dimensions
a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
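Input: $(N, C_{in}, D_{in}, H_{in}, W_{in})$
Output: $(N, C_{out}, D_{out}, H_{out}, W_{out})$ where
$$D_{out} = (D_{in} - 1) \times stride[0] - 2 \times padding[0] + dilation[0] \times (kernel\_size[0] - 1) + output\_padding[0] + 1$$
$$H_{out} = (H_{in} - 1) \times stride[1] - 2 \times padding[1] + dilation[1] \times (kernel\_size[1] - 1) + output\_padding[1] + 1$$
$$W_{out} = (W_{in} - 1) \times stride[2] - 2 \times padding[2] + dilation[2] \times (kernel\_size[2] - 1) + output\_padding[2] + 1$$
weight (Tensor): the learnable weights of the module of shape $(in\_channels, \frac{out\_channels}{groups}, kernel\_size[0], kernel\_size[1], kernel\_size[2])$. The values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{groups}{C_{out} \times \prod_{i=0}^{2} kernel\_size[i]}$
bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is TRUE, then the values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{groups}{C_{out} \times \prod_{i=0}^{2} kernel\_size[i]}$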
Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a nn_conv3d and a nn_conv_transpose3d are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, nn_conv3d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find output shape, but does not actually add zero-padding to output.
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = TRUE.
if (torch_is_installed()) {
  ## Not run:
  # With square kernels and equal stride
  m <- nn_conv_transpose3d(16, 33, 3, stride = 2)
  # non-square kernels and unequal stride and with padding
  m <- nn_conv_transpose3d(16, 33, c(3, 5, 2), stride = c(2, 1, 1), padding = c(0, 4, 2))
  input <- torch_randn(20, 16, 10, 50, 100)
  output <- m(input)
  ## End(Not run)
}
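Applies a 1D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size $(N, C_{in}, L)$ and output $(N, C_{out}, L_{out})$ can be precisely described as:
$$out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k=0}^{C_{in}-1} weight(C_{out_j}, k) \star input(N_i, k)$$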
nn_conv1d( in_channels, out_channels, kernel_size, stride = 1, padding = 0, dilation = 1, groups = 1, bias = TRUE, padding_mode = "zeros" )
in_channels |
(int): Number of channels in the input image |
out_channels |
(int): Number of channels produced by the convolution |
kernel_size |
(int or tuple): Size of the convolving kernel |
stride |
(int or tuple, optional): Stride of the convolution. Default: 1 |
padding |
(int, tuple or str, optional) – Padding added to both sides of the input. Default: 0 |
dilation |
(int or tuple, optional): Spacing between kernel elements. Default: 1 |
groups |
(int, optional): Number of blocked connections from input channels to output channels. Default: 1 |
bias |
(bool, optional): If TRUE, adds a learnable bias to the output. Default: TRUE |
padding_mode |
(string, optional): 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros' |
where $\star$ is the valid cross-correlation operator, $N$ is a batch size, $C$ denotes a number of channels, $L$ is a length of signal sequence.
stride controls the stride for the cross-correlation, a single number or a one-element tuple.
padding controls the amount of implicit zero-paddings on both sides for padding number of points.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups= in_channels, each input channel is convolved with its own set of filters, of size $\left\lfloor \frac{out\_channels}{in\_channels} \right\rfloor$.
Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.
When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also termed in literature as depthwise convolution.
In other words, for an input of size $(N, C_{in}, L_{in})$, a depthwise convolution with a depthwise multiplier K can be constructed by arguments $(C_{in} = C_{in}, C_{out} = C_{in} \times K, \ldots, groups = C_{in})$.
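Input: \((N, C_{in}, L_{in})\)
Output: \((N, C_{out}, L_{out})\) where
\[
L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor
\]
weight (Tensor): the learnable weights of the module of shape \((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{\text{groups}}{C_{in} \times \text{kernel\_size}}\)
bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is TRUE, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{\text{groups}}{C_{in} \times \text{kernel\_size}}\)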
if (torch_is_installed()) {
  m <- nn_conv1d(16, 33, 3, stride = 2)
  input <- torch_randn(20, 16, 50)
  output <- m(input)
}
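As a quick check on shapes, the output length of a 1D convolution can be worked out by hand. The sketch below is plain base R and does not call torch. The helper name conv_out_length is hypothetical, and it assumes the usual floor formula L_out = floor((L_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1).

```r
# Hypothetical helper (not part of torch): output length of a 1D convolution,
# assuming the usual floor formula for a valid cross-correlation.
conv_out_length <- function(l_in, kernel_size, stride = 1, padding = 0, dilation = 1) {
  floor((l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)
}

# Settings from the example above: kernel_size = 3, stride = 2, input length 50
l_out <- conv_out_length(50, kernel_size = 3, stride = 2)
print(l_out)  # 24
```

Under that assumption, the output of the example above would have shape (20, 33, 24).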
Applies a 2D convolution over an input signal composed of several input planes.
nn_conv2d( in_channels, out_channels, kernel_size, stride = 1, padding = 0, dilation = 1, groups = 1, bias = TRUE, padding_mode = "zeros" )
in_channels |
(int): Number of channels in the input image |
out_channels |
(int): Number of channels produced by the convolution |
kernel_size |
(int or tuple): Size of the convolving kernel |
stride |
(int or tuple, optional): Stride of the convolution. Default: 1 |
padding |
(int or tuple or string, optional): Zero-padding added to both sides of
the input. controls the amount of padding applied to the input. It
can be either a string |
dilation |
(int or tuple, optional): Spacing between kernel elements. Default: 1 |
groups |
(int, optional): Number of blocked connections from input channels to output channels. Default: 1 |
bias |
(bool, optional): If TRUE, adds a learnable bias to the output. Default: TRUE |
padding_mode |
(string, optional): |
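In the simplest case, the output value of the layer with input size \((N, C_{in}, H, W)\) and output \((N, C_{out}, H_{out}, W_{out})\) can be precisely described as:
\[
\text{out}(N_i, C_{out_j}) = \text{bias}(C_{out_j}) + \sum_{k = 0}^{C_{in} - 1} \text{weight}(C_{out_j}, k) \star \text{input}(N_i, k)
\]
where \(\star\) is the valid 2D cross-correlation operator, \(N\) is a batch size, \(C\) denotes a number of channels, \(H\) is a height of input planes in pixels, and \(W\) is width in pixels.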
stride controls the stride for the cross-correlation, a single number or a tuple.
padding controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups=in_channels, each input channel is convolved with its own set of filters, of size \(\left\lfloor \frac{\text{out\_channels}}{\text{in\_channels}} \right\rfloor\).
The parameters kernel_size, stride, padding, dilation can either be:
a single int – in which case the same value is used for the height and width dimension
a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.
When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also termed in literature as depthwise convolution.
In other words, for an input of size \((N, C_{in}, H_{in}, W_{in})\), a depthwise convolution with a depthwise multiplier K can be constructed by arguments \((\text{in\_channels} = C_{in}, \text{out\_channels} = C_{in} \times K, \ldots, \text{groups} = C_{in})\).
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting backends_cudnn_deterministic = TRUE.
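Input: \((N, C_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, H_{out}, W_{out})\) where
\[
H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor
\]
\[
W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor
\]
weight (Tensor): the learnable weights of the module of shape \((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1])\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{\text{groups}}{C_{in} \times \prod_{i=0}^{1} \text{kernel\_size}[i]}\)
bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is TRUE, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{\text{groups}}{C_{in} \times \prod_{i=0}^{1} \text{kernel\_size}[i]}\)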
if (torch_is_installed()) {
  # With square kernels and equal stride
  m <- nn_conv2d(16, 33, 3, stride = 2)
  # non-square kernels and unequal stride and with padding
  m <- nn_conv2d(16, 33, c(3, 5), stride = c(2, 1), padding = c(4, 2))
  # non-square kernels and unequal stride and with padding and dilation
  m <- nn_conv2d(16, 33, c(3, 5), stride = c(2, 1), padding = c(4, 2), dilation = c(3, 1))
  input <- torch_randn(20, 16, 50, 100)
  output <- m(input)
}
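When kernel_size, stride, padding and dilation are given per dimension, the output height and width can be computed independently. The sketch below is plain base R (no torch); conv_out_size is a hypothetical helper that assumes the usual floor formula, applied element-wise to c(height, width).

```r
# Hypothetical helper (not part of torch): spatial output size of a convolution,
# vectorised over dimensions so that c(H, W) is handled in one call.
conv_out_size <- function(size_in, kernel_size, stride = 1, padding = 0, dilation = 1) {
  floor((size_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)
}

# Settings of the last module in the example above, applied to a 50 x 100 input
hw_out <- conv_out_size(
  c(50, 100),
  kernel_size = c(3, 5), stride = c(2, 1), padding = c(4, 2), dilation = c(3, 1)
)
print(hw_out)  # 26 100
```

Under that assumption, the output of the example above would have shape (20, 33, 26, 100).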
Applies a 3D convolution over an input signal composed of several input
planes.
In the simplest case, the output value of the layer with input size \((N, C_{in}, D, H, W)\) and output \((N, C_{out}, D_{out}, H_{out}, W_{out})\) can be precisely described as:
nn_conv3d( in_channels, out_channels, kernel_size, stride = 1, padding = 0, dilation = 1, groups = 1, bias = TRUE, padding_mode = "zeros" )
in_channels |
(int): Number of channels in the input image |
out_channels |
(int): Number of channels produced by the convolution |
kernel_size |
(int or tuple): Size of the convolving kernel |
stride |
(int or tuple, optional): Stride of the convolution. Default: 1 |
padding |
(int, tuple or str, optional): padding added to all six sides of the input. Default: 0 |
dilation |
(int or tuple, optional): Spacing between kernel elements. Default: 1 |
groups |
(int, optional): Number of blocked connections from input channels to output channels. Default: 1 |
bias |
(bool, optional): If TRUE, adds a learnable bias to the output. Default: TRUE |
padding_mode |
(string, optional): |
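\[
\text{out}(N_i, C_{out_j}) = \text{bias}(C_{out_j}) + \sum_{k = 0}^{C_{in} - 1} \text{weight}(C_{out_j}, k) \star \text{input}(N_i, k)
\]
where \(\star\) is the valid 3D cross-correlation operator.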
stride controls the stride for the cross-correlation.
padding controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups=in_channels, each input channel is convolved with its own set of filters, of size \(\left\lfloor \frac{\text{out\_channels}}{\text{in\_channels}} \right\rfloor\).
The parameters kernel_size, stride, padding, dilation can either be:
a single int – in which case the same value is used for the depth, height and width dimension
a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
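Input: \((N, C_{in}, D_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, D_{out}, H_{out}, W_{out})\) where
\[
D_{out} = \left\lfloor \frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor
\]
\[
H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor
\]
\[
W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1 \right\rfloor
\]
weight (Tensor): the learnable weights of the module of shape \((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1], \text{kernel\_size}[2])\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{\text{groups}}{C_{in} \times \prod_{i=0}^{2} \text{kernel\_size}[i]}\)
bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is TRUE, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{\text{groups}}{C_{in} \times \prod_{i=0}^{2} \text{kernel\_size}[i]}\)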
Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.
When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also termed in literature as depthwise convolution.
In other words, for an input of size \((N, C_{in}, D_{in}, H_{in}, W_{in})\), a depthwise convolution with a depthwise multiplier K can be constructed by arguments \((\text{in\_channels} = C_{in}, \text{out\_channels} = C_{in} \times K, \ldots, \text{groups} = C_{in})\).
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = TRUE.
Please see the notes on randomness for background.
if (torch_is_installed()) {
  # With square kernels and equal stride
  m <- nn_conv3d(16, 33, 3, stride = 2)
  # non-square kernels and unequal stride and with padding
  m <- nn_conv3d(16, 33, c(3, 5, 2), stride = c(2, 1, 1), padding = c(4, 2, 0))
  input <- torch_randn(20, 16, 10, 50, 100)
  output <- m(input)
}
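The effect of groups on the size of the weight tensor can be made concrete with a little arithmetic. The sketch below is plain base R (no torch); conv_weight_shape is a hypothetical helper that assumes the weight has shape (out_channels, in_channels / groups, kernel dimensions), as is conventional for grouped convolutions.

```r
# Hypothetical helper (not part of torch): expected weight shape of a
# grouped convolution, assuming (out_channels, in_channels / groups, kernel...).
conv_weight_shape <- function(in_channels, out_channels, kernel_size, groups = 1) {
  stopifnot(in_channels %% groups == 0, out_channels %% groups == 0)
  c(out_channels, in_channels / groups, kernel_size)
}

# Second module of the example above: 16 -> 33 channels, 3 x 5 x 2 kernel
shape <- conv_weight_shape(16, 33, c(3, 5, 2))
n_params <- prod(shape) + 33          # weights plus one bias per output channel
print(shape)                          # 33 16 3 5 2
print(n_params)                       # 15873

# Depthwise case with multiplier K = 2: groups == in_channels
depthwise <- conv_weight_shape(16, 16 * 2, c(3, 3, 3), groups = 16)
print(depthwise)                      # 32 1 3 3 3
```

In the depthwise case each output channel sees a single input channel, which is why the second dimension collapses to 1.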
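Creates a criterion that measures the loss given input tensors \(x_1\), \(x_2\) and a Tensor label \(y\) with values 1 or -1.
This is used for measuring whether two inputs are similar or dissimilar, using the cosine distance, and is typically used for learning nonlinear embeddings or semi-supervised learning.
The loss function for each sample is:
\[
\text{loss}(x, y) =
\begin{cases}
1 - \cos(x_1, x_2), & \text{if } y = 1 \\
\max(0, \cos(x_1, x_2) - \text{margin}), & \text{if } y = -1
\end{cases}
\]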
nn_cosine_embedding_loss(margin = 0, reduction = "mean")
margin |
(float, optional): Should be a number from |
reduction |
(string, optional): Specifies the reduction to apply to the output:
|
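To make the per-sample loss concrete, here is a minimal base R sketch for a single pair of vectors. It does not call torch; the function name cosine_embedding_loss_one is hypothetical, and it assumes the standard definition: 1 - cos(x1, x2) for y = 1, and max(0, cos(x1, x2) - margin) for y = -1.

```r
# Hypothetical helper (not part of torch): cosine embedding loss for one pair,
# assuming the standard piecewise definition described in the lead-in.
cosine_embedding_loss_one <- function(x1, x2, y, margin = 0) {
  stopifnot(y %in% c(1, -1))
  cos_sim <- sum(x1 * x2) / (sqrt(sum(x1^2)) * sqrt(sum(x2^2)))
  if (y == 1) 1 - cos_sim else max(0, cos_sim - margin)
}

same  <- cosine_embedding_loss_one(c(1, 0), c(1, 0), y = 1)   # identical, labelled similar
ortho <- cosine_embedding_loss_one(c(1, 0), c(0, 1), y = 1)   # orthogonal, labelled similar
clash <- cosine_embedding_loss_one(c(1, 0), c(1, 0), y = -1)  # identical, labelled dissimilar
print(c(same, ortho, clash))  # 0 1 1
```

Pairs labelled dissimilar only contribute once their cosine similarity exceeds margin.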
This criterion combines nn_log_softmax() and nn_nll_loss() in one single class.
It is useful when training a classification problem with C classes.
nn_cross_entropy_loss(weight = NULL, ignore_index = -100, reduction = "mean")
weight |
(Tensor, optional): a manual rescaling weight given to each class.
If given, has to be a Tensor of size |
ignore_index |
(int, optional): Specifies a target value that is ignored
and does not contribute to the input gradient. When |
reduction |
(string, optional): Specifies the reduction to apply to the output:
|
If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes.
This is particularly useful when you have an unbalanced training set.
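The input is expected to contain raw, unnormalized scores for each class.
input has to be a Tensor of size either \((minibatch, C)\) or \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) for the K-dimensional case (described later).
This criterion expects a class index in the range \([1, C]\) as the target for each value of a 1D tensor of size minibatch; if ignore_index is specified, this criterion also accepts this class index (this index may not necessarily be in the class range).
The loss can be described as:
\[
\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right)
\]
or in the case of the weight argument being specified:
\[
\text{loss}(x, class) = weight[class] \left(-x[class] + \log\left(\sum_j \exp(x[j])\right)\right)
\]
The losses are averaged across observations for each minibatch.
Can also be used for higher dimension inputs, such as 2D images, by providing an input of size \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\), where \(K\) is the number of dimensions, and a target of appropriate shape (see below).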
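Input: \((N, C)\) where C = number of classes, or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Target: \((N)\) where each value is \(1 \leq \text{targets}[i] \leq C\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Output: scalar. If reduction is 'none', then the same size as the target: \((N)\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.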
if (torch_is_installed()) {
  loss <- nn_cross_entropy_loss()
  input <- torch_randn(3, 5, requires_grad = TRUE)
  target <- torch_randint(low = 1, high = 5, size = 3, dtype = torch_long())
  output <- loss(input, target)
  output$backward()
}
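The combination of log-softmax and negative log-likelihood can be written out directly for a single observation. The sketch below is plain base R (no torch); cross_entropy_one is a hypothetical name, class indices are assumed to be 1-based as in R indexing, and the loss is taken to be -x[class] + log(sum(exp(x))), optionally multiplied by weight[class].

```r
# Hypothetical helper (not part of torch): cross-entropy for one observation.
# x      : raw, unnormalized scores, one per class
# class  : target class index (assumed 1-based)
# weight : optional per-class rescaling weights
cross_entropy_one <- function(x, class, weight = NULL) {
  m <- max(x)                          # shift by the max for numerical stability
  log_sum_exp <- m + log(sum(exp(x - m)))
  loss <- -x[class] + log_sum_exp
  if (!is.null(weight)) loss <- weight[class] * loss
  loss
}

scores <- c(2.0, 0.5, -1.0)
ce <- cross_entropy_one(scores, class = 1)

# Same value via an explicit log-softmax followed by picking the target entry
log_softmax <- scores - log(sum(exp(scores)))
nll <- -log_softmax[1]
print(all.equal(ce, nll))  # TRUE
```

With equal scores for all C classes the loss reduces to log(C), which is a handy sanity check for an untrained classifier.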
Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be "many-to-one", which limits the length of the target sequence such that it must be \(\leq\) the input length.
nn_ctc_loss(blank = 0, reduction = "mean", zero_infinity = FALSE)
blank |
(int, optional): blank label. Default |
reduction |
(string, optional): Specifies the reduction to apply to the output:
|
zero_infinity |
(bool, optional):
Whether to zero infinite losses and the associated gradients.
Default: |
Log_probs: Tensor of size \((T, N, C)\), where \(T = \text{input length}\), \(N = \text{batch size}\), and \(C = \text{number of classes (including blank)}\). The logarithmized probabilities of the outputs (e.g. obtained with nnf_log_softmax()).
Targets: Tensor of size \((N, S)\) or \((\text{sum}(\text{target\_lengths}))\), where \(N = \text{batch size}\) and \(S = \text{max target length, if shape is } (N, S)\). It represents the target sequences. Each element in the target sequence is a class index, and the target index cannot be blank (default=0). In the \((N, S)\) form, targets are padded to the length of the longest sequence, and stacked. In the \((\text{sum}(\text{target\_lengths}))\) form, the targets are assumed to be un-padded and concatenated within 1 dimension.
Input_lengths: Tuple or tensor of size \((N)\), where \(N = \text{batch size}\). It represents the lengths of the inputs (must each be \(\leq T\)). The lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.
Target_lengths: Tuple or tensor of size \((N)\), where \(N = \text{batch size}\). It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If target shape is \((N, S)\), target_lengths are effectively the stop index \(s_n\) for each target sequence, such that target_n = targets[n,0:s_n] for each target in a batch. Lengths must each be \(\leq S\). If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.
Output: scalar. If reduction is 'none', then \((N)\), where \(N = \text{batch size}\).
In order to use CuDNN, the following must be satisfied: targets must be in concatenated format, all input_lengths must be T, \(blank = 0\), target_lengths \(\leq 256\), and the integer arguments must be of dtype torch_int32.
The regular implementation uses the (more common in PyTorch) torch_long dtype.
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = TRUE.
A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cs.toronto.edu/~graves/icml_2006.pdf
if (torch_is_installed()) {
  # Targets are to be padded
  T <- 50 # Input sequence length
  C <- 20 # Number of classes (including blank)
  N <- 16 # Batch size
  S <- 30 # Target sequence length of longest target in batch (padding length)
  S_min <- 10 # Minimum target length, for demonstration purposes

  # Initialize random batch of input vectors, for *size = (T,N,C)
  # (dim = 3 is the class dimension, since dimensions are 1-based in R)
  input <- torch_randn(T, N, C)$log_softmax(dim = 3)$detach()$requires_grad_()

  # Initialize random batch of targets (0 = blank, 1:C = classes)
  target <- torch_randint(low = 1, high = C, size = c(N, S), dtype = torch_long())

  # Every input sequence uses the full length T
  input_lengths <- torch_full(size = c(N), fill_value = T, dtype = torch_long())
  target_lengths <- torch_randint(low = S_min, high = S, size = c(N), dtype = torch_long())
  ctc_loss <- nn_ctc_loss()
  loss <- ctc_loss(input, target, input_lengths, target_lengths)
  loss$backward()

  # Targets are to be un-padded
  T <- 50 # Input sequence length
  C <- 20 # Number of classes (including blank)
  N <- 16 # Batch size

  # Initialize random batch of input vectors, for *size = (T,N,C)
  input <- torch_randn(T, N, C)$log_softmax(dim = 3)$detach()$requires_grad_()
  input_lengths <- torch_full(size = c(N), fill_value = T, dtype = torch_long())

  # Initialize random batch of targets (0 = blank, 1:C = classes)
  target_lengths <- torch_randint(low = 1, high = T, size = c(N), dtype = torch_long())
  target <- torch_randint(
    low = 1, high = C, size = as.integer(sum(target_lengths)), dtype = torch_long()
  )
  ctc_loss <- nn_ctc_loss()
  loss <- ctc_loss(input, target, input_lengths, target_lengths)
  loss$backward()
}
During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call.
nn_dropout(p = 0.5, inplace = FALSE)
p |
probability of an element to be zeroed. Default: 0.5 |
inplace |
If set to |
This has proven to be an effective technique for regularization and preventing the co-adaptation of neurons as described in the paper Improving neural networks by preventing co-adaptation of feature detectors.
Furthermore, the outputs are scaled by a factor of \(\frac{1}{1-p}\) during training. This means that during evaluation the module simply computes an identity function.
Input: \((*)\). Input can be of any shape
Output: \((*)\). Output is of the same shape as input
if (torch_is_installed()) {
  m <- nn_dropout(p = 0.2)
  input <- torch_randn(20, 16)
  output <- m(input)
}
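The rescaling by 1 / (1 - p) is what keeps the expected activation unchanged between training and evaluation. The sketch below illustrates this with plain base R (no torch); dropout_sketch is a hypothetical function that draws a Bernoulli keep-mask and is only meant to mirror the behaviour described above, not the library's implementation.

```r
# Hypothetical illustration (not part of torch): inverted dropout on an R array.
dropout_sketch <- function(x, p = 0.5, training = TRUE) {
  if (!training || p == 0) return(x)                 # evaluation: identity
  keep <- rbinom(length(x), size = 1, prob = 1 - p)  # 1 = keep, 0 = zero out
  x * keep / (1 - p)                                 # rescale the survivors
}

set.seed(1)
x <- matrix(1, nrow = 20, ncol = 16)
y_train <- dropout_sketch(x, p = 0.2)
y_eval <- dropout_sketch(x, p = 0.2, training = FALSE)

# Every training output is either 0 or the input scaled by 1 / (1 - p)
print(all(y_train == 0 | y_train == 1 / (1 - 0.2)))  # TRUE
print(identical(y_eval, x))                          # TRUE
```

Averaged over many draws, mean(y_train) stays close to mean(x), so no extra scaling is needed at evaluation time.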
Randomly zero out entire channels (a channel is a 2D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 2D tensor \(\text{input}[i, j]\)).
nn_dropout2d(p = 0.5, inplace = FALSE)
p |
(float, optional): probability of an element to be zero-ed. |
inplace |
(bool, optional): If set to |
Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
Usually the input comes from nn_conv2d modules.
As described in the paper Efficient Object Localization Using Convolutional Networks , if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, nn_dropout2d will help promote independence between feature maps and should be used instead.
Input: \((N, C, H, W)\)
Output: \((N, C, H, W)\) (same shape as input)
if (torch_is_installed()) {
  m <- nn_dropout2d(p = 0.2)
  input <- torch_randn(20, 16, 32, 32)
  output <- m(input)
}
Randomly zero out entire channels (a channel is a 3D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 3D tensor \(\text{input}[i, j]\)).
nn_dropout3d(p = 0.5, inplace = FALSE)
p |
(float, optional): probability of an element to be zeroed. |
inplace |
(bool, optional): If set to |
Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
Usually the input comes from nn_conv3d modules.
As described in the paper Efficient Object Localization Using Convolutional Networks, if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, nn_dropout3d will help promote independence between feature maps and should be used instead.
Input: \((N, C, D, H, W)\)
Output: \((N, C, D, H, W)\) (same shape as input)
if (torch_is_installed()) {
  m <- nn_dropout3d(p = 0.2)
  input <- torch_randn(20, 16, 4, 32, 32)
  output <- m(input)
}
Applies the element-wise function:
\[
\text{ELU}(x) = \max(0, x) + \min(0, \alpha \times (\exp(x) - 1))
\]
nn_elu(alpha = 1, inplace = FALSE)
alpha |
the |
inplace |
can optionally do the operation in-place. Default: |
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
if (torch_is_installed()) {
  m <- nn_elu()
  input <- torch_randn(2)
  output <- m(input)
}
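For intuition, the ELU activation can be written as a one-liner on ordinary R vectors. The sketch below is plain base R (no torch); elu_sketch is a hypothetical name and assumes the standard definition max(0, x) + min(0, alpha * (exp(x) - 1)).

```r
# Hypothetical helper (not part of torch): element-wise ELU on an R vector.
elu_sketch <- function(x, alpha = 1) {
  pmax(0, x) + pmin(0, alpha * (exp(x) - 1))
}

x <- c(-1, 0, 2)
y <- elu_sketch(x)
print(y)  # negative inputs saturate towards -alpha, positive inputs pass through
```

Positive inputs are returned unchanged, while negative inputs are mapped smoothly into the interval (-alpha, 0).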
A simple lookup table that stores embeddings of a fixed dictionary and size. This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
nn_embedding( num_embeddings, embedding_dim, padding_idx = NULL, max_norm = NULL, norm_type = 2, scale_grad_by_freq = FALSE, sparse = FALSE, .weight = NULL )
num_embeddings |
(int): size of the dictionary of embeddings |
embedding_dim |
(int): the size of each embedding vector |
padding_idx |
(int, optional): If given, pads the output with the embedding vector at |
max_norm |
(float, optional): If given, each embedding vector with norm larger than |
norm_type |
(float, optional): The p of the p-norm to compute for the |
scale_grad_by_freq |
(boolean, optional): If given, this will scale gradients by the inverse of frequency of
the words in the mini-batch. Default |
sparse |
(bool, optional): If |
.weight |
(Tensor) embeddings weights (in case you want to set it manually) See Notes for more details regarding sparse gradients. |
weight (Tensor): the learnable weights of the module of shape (num_embeddings, embedding_dim) initialized from \(\mathcal{N}(0, 1)\)
Input: \((*)\), LongTensor of arbitrary shape containing the indices to extract
Output: \((*, H)\), where * is the input shape and \(H = \text{embedding\_dim}\)
Keep in mind that only a limited number of optimizers support sparse gradients: currently it's optim.SGD (CUDA and CPU), optim.SparseAdam (CUDA and CPU) and optim.Adagrad (CPU)
With padding_idx set, the embedding vector at padding_idx is initialized to all zeros. However, note that this vector can be modified afterwards, e.g., using a customized initialization method, and thus changing the vector used to pad the output. The gradient for this vector from nn_embedding is always zero.
if (torch_is_installed()) {
  # an Embedding module containing 10 tensors of size 3
  embedding <- nn_embedding(10, 3)
  # a batch of 2 samples of 4 indices each
  input <- torch_tensor(rbind(c(1, 2, 4, 5), c(4, 3, 2, 9)), dtype = torch_long())
  embedding(input)
  # example with padding_idx
  embedding <- nn_embedding(10, 3, padding_idx = 1)
  input <- torch_tensor(matrix(c(1, 3, 1, 6), nrow = 1), dtype = torch_long())
  embedding(input)
}
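Conceptually, an embedding lookup is just row indexing into a weight matrix. The base R sketch below (no torch) mimics this with an ordinary matrix; the variable names are illustrative, and the weights are drawn from a standard normal only to have something to index.

```r
set.seed(42)
num_embeddings <- 10
embedding_dim <- 3

# Lookup table: one row per dictionary entry
weight <- matrix(rnorm(num_embeddings * embedding_dim), nrow = num_embeddings)

# A batch of 2 samples of 4 (1-based) indices each, as in the example above
idx <- rbind(c(1, 2, 4, 5), c(4, 3, 2, 9))

# Gather the rows, then reshape to (batch, sequence, embedding_dim)
out <- array(weight[as.vector(idx), ], dim = c(dim(idx), embedding_dim))

print(dim(out))                              # 2 4 3
print(identical(out[2, 4, ], weight[9, ]))   # TRUE: sample 2, position 4 is entry 9
```

The output shape is the input shape with embedding_dim appended, which matches the Output description above.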
Computes sums, means or maxes of bags of embeddings, without instantiating the intermediate embeddings.
nn_embedding_bag( num_embeddings, embedding_dim, max_norm = NULL, norm_type = 2, scale_grad_by_freq = FALSE, mode = "mean", sparse = FALSE, include_last_offset = FALSE, padding_idx = NULL, .weight = NULL )
num_embeddings |
(int): size of the dictionary of embeddings |
embedding_dim |
(int): the size of each embedding vector |
max_norm |
(float, optional): If given, each embedding vector with norm larger than |
norm_type |
(float, optional): The p of the p-norm to compute for the |
scale_grad_by_freq |
(boolean, optional): If given, this will scale gradients by the inverse of frequency of
the words in the mini-batch. Default |
mode |
(string, optional): |
sparse |
(bool, optional): If |
include_last_offset |
(bool, optional): if |
padding_idx |
(int, optional): If given, pads the output with the embedding vector at |
.weight |
(Tensor, optional) embeddings weights (in case you want to set it manually) |
weight (Tensor): the learnable weights of the module of shape (num_embeddings, embedding_dim) initialized from \(\mathcal{N}(0, 1)\)
if (torch_is_installed()) {
  # an EmbeddingBag module containing 10 tensors of size 3
  embedding_sum <- nn_embedding_bag(10, 3, mode = 'sum')
  # a batch of 2 samples of 4 indices each
  input <- torch_tensor(c(1, 2, 4, 5, 4, 3, 2, 9), dtype = torch_long())
  offsets <- torch_tensor(c(0, 4), dtype = torch_long())
  embedding_sum(input, offsets)
  # example with padding_idx
  embedding_sum <- nn_embedding_bag(10, 3, mode = 'sum', padding_idx = 1)
  input <- torch_tensor(c(2, 2, 2, 2, 4, 3, 2, 9), dtype = torch_long())
  offsets <- torch_tensor(c(0, 4), dtype = torch_long())
  embedding_sum(input, offsets)
  # An EmbeddingBag can be loaded from an Embedding like so
  embedding <- nn_embedding(10, 3, padding_idx = 2)
  embedding_sum <- nn_embedding_bag$from_pretrained(
    embedding$weight,
    padding_idx = embedding$padding_idx,
    mode = 'sum'
  )
}
For use with nn_sequential.
nn_flatten(start_dim = 2, end_dim = -1)
start_dim |
first dim to flatten (default = 2). |
end_dim |
last dim to flatten (default = -1). |
Input: (*, S_start,..., S_i, ..., S_end, *), where S_i is the size at dimension i and * means any number of dimensions including none.
Output: (*, S_start*...*S_i*...S_end, *).
if (torch_is_installed()) {
  input <- torch_randn(32, 1, 5, 5)
  m <- nn_flatten()
  m(input)
}
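The shape rule above can be checked without building any tensors. The sketch below is plain base R (no torch); flatten_shape is a hypothetical helper that assumes 1-based dimensions, with a negative end_dim counting from the last dimension (so -1 means the last one). It only computes the resulting shape and does not reorder any data.

```r
# Hypothetical helper (not part of torch): shape after flattening
# dimensions start_dim..end_dim into a single dimension.
flatten_shape <- function(shape, start_dim = 2, end_dim = -1) {
  n <- length(shape)
  if (end_dim < 0) end_dim <- n + 1 + end_dim   # -1 refers to the last dimension
  stopifnot(start_dim >= 1, start_dim <= end_dim, end_dim <= n)
  c(
    shape[seq_len(start_dim - 1)],              # leading dimensions, untouched
    prod(shape[start_dim:end_dim]),             # flattened block
    shape[seq_len(n - end_dim) + end_dim]       # trailing dimensions, untouched
  )
}

print(flatten_shape(c(32, 1, 5, 5)))               # 32 25
print(flatten_shape(c(32, 1, 5, 5), end_dim = 3))  # 32 5 5
```

With the defaults, only the first (batch) dimension is preserved, which is why the module fits naturally between convolutional and linear layers.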
Fractional MaxPooling is described in detail in the paper Fractional MaxPooling by Ben Graham
nn_fractional_max_pool2d( kernel_size, output_size = NULL, output_ratio = NULL, return_indices = FALSE )
kernel_size |
the size of the window to take a max over.
Can be a single number k (for a square kernel of k x k) or a tuple |
output_size |
the target output size of the image of the form |
output_ratio |
If one wants to have an output size as a ratio of the input size, this option can be given. This has to be a number or tuple in the range (0, 1) |
return_indices |
if |
The max-pooling operation is applied in \(kH \times kW\) regions by a stochastic step size determined by the target output size. The number of output features is equal to the number of input planes.
if (torch_is_installed()) {
  # pool of square window of size=3, and target output size 13x12
  m <- nn_fractional_max_pool2d(3, output_size = c(13, 12))
  # pool of square window and target output size being half of input image size
  m <- nn_fractional_max_pool2d(3, output_ratio = c(0.5, 0.5))
  input <- torch_randn(20, 16, 50, 32)
  output <- m(input)
}
Fractional MaxPooling is described in detail in the paper Fractional MaxPooling by Ben Graham
nn_fractional_max_pool3d( kernel_size, output_size = NULL, output_ratio = NULL, return_indices = FALSE )
kernel_size |
the size of the window to take a max over.
Can be a single number k (for a square kernel of k x k x k) or a tuple |
output_size |
the target output size of the image of the form |
output_ratio |
If one wants to have an output size as a ratio of the input size, this option can be given. This has to be a number or tuple in the range (0, 1) |
return_indices |
if |
The max-pooling operation is applied in \(kT \times kH \times kW\) regions by a stochastic step size determined by the target output size. The number of output features is equal to the number of input planes.
if (torch_is_installed()) {
  # pool of cubic window of size=3, and target output size 13x12x11
  m <- nn_fractional_max_pool3d(3, output_size = c(13, 12, 11))
  # pool of cubic window and target output size being half of input size
  m <- nn_fractional_max_pool3d(3, output_ratio = c(0.5, 0.5, 0.5))
  input <- torch_randn(20, 16, 50, 32, 16)
  output <- m(input)
}
Applies the Gaussian Error Linear Units function:
\[
\text{GELU}(x) = x \times \Phi(x)
\]
nn_gelu(approximate = "none")
approximate |
the gelu approximation algorithm to use: |
where \(\Phi(x)\) is the Cumulative Distribution Function for Gaussian Distribution.
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
if (torch_is_installed()) {
  m <- nn_gelu()
  input <- torch_randn(2)
  output <- m(input)
}
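The GELU definition can be checked numerically without torch. The following is a minimal base-R sketch, not the library's implementation: `gelu()` and `gelu_tanh()` are hypothetical helper names, and the tanh form is the commonly cited approximation, assumed here rather than taken from the text above.

```r
# Exact GELU: x * Phi(x), with Phi the standard normal CDF (stats::pnorm).
gelu <- function(x) x * pnorm(x)

# Commonly cited tanh approximation (an assumption of this sketch).
gelu_tanh <- function(x) {
  0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
}

x <- c(-2, -1, 0, 1, 2)
exact <- gelu(x)
approx <- gelu_tanh(x)
print(exact)
print(max(abs(exact - approx)))  # size of the approximation error on x
```

The two forms agree closely on this range, which is why the approximation is often used when speed matters more than exactness.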
Applies the gated linear unit function $\text{GLU}(a, b) = a \otimes \sigma(b)$ where $a$ is the first half of the input matrices and $b$ is the second half.
nn_glu(dim = -1)
dim |
(int): the dimension on which to split the input. Default: -1 |
Input: $(\ast_1, N, \ast_2)$ where * means any number of additional dimensions
Output: $(\ast_1, M, \ast_2)$ where $M = N/2$
if (torch_is_installed()) {
  m <- nn_glu()
  input <- torch_randn(4, 2)
  output <- m(input)
}
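To make the split explicit, here is a base-R sketch of the gated linear unit on a matrix, splitting along the last dimension (the `dim = -1` default). The helper `glu()` is hypothetical and only illustrates the formula; `plogis()` is the logistic sigmoid from the stats package.

```r
# GLU along the last dimension of a matrix: first half gated by sigmoid(second half).
glu <- function(x) {
  n <- ncol(x)
  stopifnot(n %% 2 == 0)                         # the split dimension must be even
  half <- n / 2
  a <- x[, seq_len(half), drop = FALSE]          # first half
  b <- x[, half + seq_len(half), drop = FALSE]   # second half
  a * plogis(b)                                  # a * sigmoid(b), element-wise
}

x <- matrix(c(1, 2, 0, 0), nrow = 2)  # columns: a = (1, 2), b = (0, 0)
out <- glu(x)
print(dim(out))  # the split dimension is halved
print(out)
```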
Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization.
$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$
nn_group_norm(num_groups, num_channels, eps = 1e-05, affine = TRUE)
num_groups |
(int): number of groups to separate the channels into |
num_channels |
(int): number of channels expected in input |
eps |
a value added to the denominator for numerical stability. Default: 1e-5 |
affine |
a boolean value that when set to |
The input channels are separated into num_groups groups, each containing num_channels / num_groups channels. The mean and standard-deviation are calculated separately over each group. $\gamma$ and $\beta$ are learnable per-channel affine transform parameter vectors of size num_channels if affine is TRUE.
The standard-deviation is calculated via the biased estimator, equivalent to torch_var(input, unbiased=FALSE).
Input: $(N, C, *)$ where $C = \text{num\_channels}$
Output: $(N, C, *)$ (same shape as input)
This layer uses statistics computed from input data in both training and evaluation modes.
if (torch_is_installed()) {
  input <- torch_randn(20, 6, 10, 10)
  # Separate 6 channels into 3 groups
  m <- nn_group_norm(3, 6)
  # Separate 6 channels into 6 groups (equivalent to instance normalization)
  m <- nn_group_norm(6, 6)
  # Put all 6 channels into a single group (equivalent to [nn_layer_norm])
  m <- nn_group_norm(1, 6)
  # Activating the module
  output <- m(input)
}
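For each element in the input sequence, each layer computes the following function:
$$
\begin{array}{ll}
r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\
n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\
h_t = (1 - z_t) * n_t + z_t * h_{(t-1)}
\end{array}
$$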
nn_gru( input_size, hidden_size, num_layers = 1, bias = TRUE, batch_first = FALSE, dropout = 0, bidirectional = FALSE, ... )
input_size |
The number of expected features in the input x |
hidden_size |
The number of features in the hidden state h |
num_layers |
Number of recurrent layers. E.g., setting |
bias |
If |
batch_first |
If |
dropout |
If non-zero, introduces a |
bidirectional |
If |
... |
currently unused. |
where $h_t$ is the hidden state at time t, $x_t$ is the input at time t, $h_{(t-1)}$ is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0, and $r_t$, $z_t$, $n_t$ are the reset, update, and new gates, respectively. $\sigma$ is the sigmoid function.
Inputs: input, h_0
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See nn_utils_rnn_pack_padded_sequence() for details.
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided.
Outputs: output, h_n
output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features h_t from the last layer of the GRU, for each t. If a PackedSequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)).
weight_ih_l[k]: the learnable input-hidden weights of the k-th layer (W_ir|W_iz|W_in), of shape (3*hidden_size x input_size)
weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer (W_hr|W_hz|W_hn), of shape (3*hidden_size x hidden_size)
bias_ih_l[k]: the learnable input-hidden bias of the k-th layer (b_ir|b_iz|b_in), of shape (3*hidden_size)
bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer (b_hr|b_hz|b_hn), of shape (3*hidden_size)
All the weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{hidden\_size}}$
if (torch_is_installed()) {
  rnn <- nn_gru(10, 20, 2)
  input <- torch_randn(5, 3, 10)
  h0 <- torch_randn(2, 3, 20)
  output <- rnn(input, h0)
}
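A single GRU time step can be sketched in base R to show how the stacked weights (W_ir|W_iz|W_in) and (W_hr|W_hz|W_hn) are used. This is an illustrative sketch for one sample and one layer, assuming the standard GRU gate equations; `gru_cell()` is a hypothetical helper, not part of the package.

```r
# One GRU step for a single sample. Rows of w_ih / w_hh are stacked as (r | z | n).
gru_cell <- function(x, h_prev, w_ih, w_hh, b_ih, b_hh) {
  hs <- length(h_prev)
  gi <- as.vector(w_ih %*% x) + b_ih       # input-hidden contributions
  gh <- as.vector(w_hh %*% h_prev) + b_hh  # hidden-hidden contributions
  idx_r <- seq_len(hs)
  idx_z <- hs + seq_len(hs)
  idx_n <- 2 * hs + seq_len(hs)
  r <- plogis(gi[idx_r] + gh[idx_r])       # reset gate
  z <- plogis(gi[idx_z] + gh[idx_z])       # update gate
  n <- tanh(gi[idx_n] + r * gh[idx_n])     # new gate
  (1 - z) * n + z * h_prev                 # next hidden state
}

hidden_size <- 2
input_size <- 3
h1 <- gru_cell(
  x = c(1, 2, 3), h_prev = c(2, -4),
  w_ih = matrix(0, 3 * hidden_size, input_size),
  w_hh = matrix(0, 3 * hidden_size, hidden_size),
  b_ih = rep(0, 3 * hidden_size), b_hh = rep(0, 3 * hidden_size)
)
print(h1)
```

With all weights and biases at zero, the update gate is 0.5 and the new gate is 0, so the state is simply halved; this makes the interpolation between the candidate and the previous state easy to see.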
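Applies the hard shrinkage function element-wise:
$$\text{HardShrink}(x) = \begin{cases} x, & \text{if } x > \lambda \\ x, & \text{if } x < -\lambda \\ 0, & \text{otherwise} \end{cases}$$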
nn_hardshrink(lambd = 0.5)
lambd |
the |
Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input
if (torch_is_installed()) {
  m <- nn_hardshrink()
  input <- torch_randn(2)
  output <- m(input)
}
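Applies the element-wise function:
$$\text{Hardsigmoid}(x) = \begin{cases} 0 & \text{if } x \le -3, \\ 1 & \text{if } x \ge +3, \\ x / 6 + 1 / 2 & \text{otherwise} \end{cases}$$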
nn_hardsigmoid()
Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input
if (torch_is_installed()) {
  m <- nn_hardsigmoid()
  input <- torch_randn(2)
  output <- m(input)
}
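Applies the hardswish function, element-wise, as described in the paper: Searching for MobileNetV3
$$\text{Hardswish}(x) = \begin{cases} 0 & \text{if } x \le -3, \\ x & \text{if } x \ge +3, \\ x \cdot (x + 3) / 6 & \text{otherwise} \end{cases}$$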
nn_hardswish()
Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input
if (torch_is_installed()) {
  ## Not run: 
  m <- nn_hardswish()
  input <- torch_randn(2)
  output <- m(input)
  ## End(Not run)
}
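The hard sigmoid and hardswish functions are closely related: hardswish is the input multiplied by its hard sigmoid. A base-R sketch follows, assuming the usual piecewise definitions (clamp of x/6 + 1/2 to [0, 1]); the helper names are hypothetical and this is not the library code.

```r
# Hard sigmoid: x / 6 + 1 / 2 clamped to [0, 1].
hardsigmoid <- function(x) pmin(pmax(x / 6 + 0.5, 0), 1)

# Hardswish: x * hardsigmoid(x), i.e. 0 below -3, x above 3, x * (x + 3) / 6 between.
hardswish <- function(x) x * hardsigmoid(x)

x <- c(-4, -3, 0, 1.5, 3, 4)
hs <- hardsigmoid(x)
hw <- hardswish(x)
print(hs)
print(hw)
```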
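Applies the HardTanh function element-wise. HardTanh is defined as:
$$\text{HardTanh}(x) = \begin{cases} 1 & \text{if } x > 1 \\ -1 & \text{if } x < -1 \\ x & \text{otherwise} \end{cases}$$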
nn_hardtanh(min_val = -1, max_val = 1, inplace = FALSE)
min_val |
minimum value of the linear region range. Default: -1 |
max_val |
maximum value of the linear region range. Default: 1 |
inplace |
can optionally do the operation in-place. Default: |
The range of the linear region $[-1, 1]$ can be adjusted using min_val and max_val.
Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input
if (torch_is_installed()) {
  m <- nn_hardtanh(-2, 2)
  input <- torch_randn(2)
  output <- m(input)
}
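The effect of min_val and max_val can be illustrated with a clamp in base R. This is a sketch of the formula only, with a hypothetical helper name, not the library's implementation.

```r
# HardTanh: identity inside [min_val, max_val], clamped outside.
hardtanh <- function(x, min_val = -1, max_val = 1) {
  stopifnot(min_val < max_val)
  pmin(pmax(x, min_val), max_val)
}

x <- c(-3, -1, 0.5, 2.5)
default_range <- hardtanh(x)        # linear region [-1, 1]
wider_range <- hardtanh(x, -2, 2)   # linear region [-2, 2]
print(default_range)
print(wider_range)
```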
Measures the loss given an input tensor $x$ and a labels tensor $y$ (containing 1 or -1).
nn_hinge_embedding_loss(margin = 1, reduction = "mean")
margin |
(float, optional): Has a default value of 1. |
reduction |
(string, optional): Specifies the reduction to apply to the output:
|
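This is usually used for measuring whether two inputs are similar or dissimilar, e.g. using the L1 pairwise distance as $x$, and is typically used for learning nonlinear embeddings or semi-supervised learning.
The loss function for the $n$-th sample in the mini-batch is
$$l_n = \begin{cases} x_n, & \text{if } y_n = 1, \\ \max \{0, \Delta - x_n\}, & \text{if } y_n = -1, \end{cases}$$
where $\Delta$ is the margin, and the total loss function is
$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'}; \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'}. \end{cases}$$
where $L = \{l_1, \dots, l_N\}^\top$.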
Input: $(*)$ where $*$ means any number of dimensions. The sum operation operates over all the elements.
Target: $(*)$, same shape as the input
Output: scalar. If reduction is 'none', then same shape as the input
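The per-sample rule and the reductions can be sketched in base R as follows. This assumes the usual hinge embedding definition (the input itself for label 1, max(0, margin - input) for label -1); `hinge_embedding_loss()` is a hypothetical helper, not the package function.

```r
# Hinge embedding loss on plain numeric vectors.
hinge_embedding_loss <- function(x, y, margin = 1, reduction = "mean") {
  stopifnot(all(y %in% c(-1, 1)), length(x) == length(y))
  l <- ifelse(y == 1, x, pmax(0, margin - x))  # per-sample loss
  switch(reduction,
    none = l,
    mean = mean(l),
    sum = sum(l),
    stop("unknown reduction: ", reduction)
  )
}

x <- c(0.2, 1.5, 0.4)  # e.g. pairwise distances
y <- c(1, -1, -1)      # 1 = similar pair, -1 = dissimilar pair
per_sample <- hinge_embedding_loss(x, y, reduction = "none")
total <- hinge_embedding_loss(x, y, reduction = "sum")
print(per_sample)
print(total)
```

Dissimilar pairs that are already further apart than the margin (the second sample) contribute nothing to the loss.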
A placeholder identity operator that is argument-insensitive.
nn_identity(...)
... |
any arguments (unused) |
if (torch_is_installed()) {
  m <- nn_identity(54, unused_argument1 = 0.1, unused_argument2 = FALSE)
  input <- torch_randn(128, 20)
  output <- m(input)
  print(output$size())
}
Return the recommended gain value for the given nonlinearity function.
nn_init_calculate_gain(nonlinearity, param = NULL)
nonlinearity |
the non-linear function |
param |
optional parameter for the non-linear function |
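For orientation, the sketch below tabulates gain values in base R. The values follow the convention documented for PyTorch's `calculate_gain` (1 for linear and sigmoid, 5/3 for tanh, sqrt(2) for relu, sqrt(2 / (1 + slope^2)) for leaky_relu with a default slope of 0.01); that they carry over unchanged to this package is an assumption, and `recommended_gain()` is a hypothetical helper.

```r
# Gain lookup following the PyTorch convention (assumed, not read from this package).
recommended_gain <- function(nonlinearity, param = NULL) {
  switch(nonlinearity,
    linear = 1,
    sigmoid = 1,
    tanh = 5 / 3,
    relu = sqrt(2),
    leaky_relu = {
      slope <- if (is.null(param)) 0.01 else param  # param = negative slope
      sqrt(2 / (1 + slope^2))
    },
    stop("unsupported nonlinearity: ", nonlinearity)
  )
}

print(recommended_gain("relu"))
print(recommended_gain("leaky_relu", param = 0.2))
```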
Fills the input Tensor with the value val.
nn_init_constant_(tensor, val)
tensor |
an n-dimensional Tensor |
val |
the value to fill the tensor with |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_constant_(w, 0.3)
}
Fills the {3, 4, 5}-dimensional input Tensor with the Dirac delta function. Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible. In case of groups>1, each group of channels preserves identity.
nn_init_dirac_(tensor, groups = 1)
tensor |
a {3, 4, 5}-dimensional Tensor |
groups |
(optional) number of groups in the conv layer (default: 1) |
if (torch_is_installed()) {
  ## Not run: 
  w <- torch_empty(3, 16, 5, 5)
  nn_init_dirac_(w)
  ## End(Not run)
}
Fills the 2-dimensional input Tensor with the identity matrix. Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.
nn_init_eye_(tensor)
tensor |
a 2-dimensional torch tensor. |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_eye_(w)
}
Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a normal distribution.
nn_init_kaiming_normal_( tensor, a = 0, mode = "fan_in", nonlinearity = "leaky_relu" )
tensor |
an n-dimensional Tensor |
a |
the negative slope of the rectifier used after this layer (only used with 'leaky_relu') |
mode |
either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. |
nonlinearity |
the non-linear function. recommended to use only with 'relu' or 'leaky_relu' (default). |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_kaiming_normal_(w, mode = "fan_in", nonlinearity = "leaky_relu")
}
Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a uniform distribution.
nn_init_kaiming_uniform_( tensor, a = 0, mode = "fan_in", nonlinearity = "leaky_relu" )
tensor |
an n-dimensional Tensor |
a |
the negative slope of the rectifier used after this layer (only used with 'leaky_relu') |
mode |
either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. |
nonlinearity |
the non-linear function. recommended to use only with 'relu' or 'leaky_relu' (default). |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_kaiming_uniform_(w, mode = "fan_in", nonlinearity = "leaky_relu")
}
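The scale used by the He (Kaiming) method depends only on the tensor shape, the mode, and the negative slope a. The base-R sketch below computes it under the usual formulation (std = gain / sqrt(fan), uniform bound = sqrt(3) * std, gain = sqrt(2 / (1 + a^2))); the helpers `fans()` and `kaiming_scale()` are hypothetical and this is not the package's implementation.

```r
# fan_in / fan_out for a weight of shape (out, in, ...); trailing dims form the receptive field.
fans <- function(shape) {
  receptive <- prod(shape[-(1:2)])  # prod(numeric(0)) is 1 for 2-D weights
  c(fan_in = shape[2] * receptive, fan_out = shape[1] * receptive)
}

# Standard deviation (normal variant) and bound (uniform variant).
kaiming_scale <- function(shape, a = 0, mode = "fan_in") {
  gain <- sqrt(2 / (1 + a^2))  # leaky_relu gain for negative slope a
  std <- gain / sqrt(fans(shape)[[mode]])
  c(std = std, bound = sqrt(3) * std)
}

s <- kaiming_scale(c(3, 5))
print(s)

set.seed(42)
w_normal <- matrix(rnorm(15, sd = s[["std"]]), nrow = 3)
w_uniform <- matrix(runif(15, -s[["bound"]], s[["bound"]]), nrow = 3)
```

With mode = "fan_in" the scale is tied to the number of inputs per unit, which is what keeps the forward-pass variance stable as described for the mode argument.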
Fills the input Tensor with values drawn from the normal distribution
nn_init_normal_(tensor, mean = 0, std = 1)
tensor |
an n-dimensional Tensor |
mean |
the mean of the normal distribution |
std |
the standard deviation of the normal distribution |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_normal_(w)
}
Fills the input Tensor with the scalar value 1
nn_init_ones_(tensor)
tensor |
an n-dimensional Tensor |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_ones_(w)
}
Fills the input Tensor with a (semi) orthogonal matrix, as described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened.
nn_init_orthogonal_(tensor, gain = 1)
tensor |
an n-dimensional Tensor |
gain |
optional scaling factor |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_orthogonal_(w)
}
Fills the 2D input Tensor as a sparse matrix, where the non-zero elements will be drawn from the normal distribution as described in Deep learning via Hessian-free optimization - Martens, J. (2010).
nn_init_sparse_(tensor, sparsity, std = 0.01)
tensor |
an n-dimensional Tensor |
sparsity |
The fraction of elements in each column to be set to zero |
std |
the standard deviation of the normal distribution used to generate the non-zero values |
if (torch_is_installed()) {
  ## Not run: 
  w <- torch_empty(3, 5)
  nn_init_sparse_(w, sparsity = 0.1)
  ## End(Not run)
}
Fills the input Tensor with values drawn from a truncated normal distribution.
nn_init_trunc_normal_(tensor, mean = 0, std = 1, a = -2, b = 2)
tensor |
an n-dimensional Tensor |
mean |
the mean of the normal distribution |
std |
the standard deviation of the normal distribution |
a |
the minimum cutoff value |
b |
the maximum cutoff value |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_trunc_normal_(w)
}
Fills the input Tensor with values drawn from the uniform distribution
nn_init_uniform_(tensor, a = 0, b = 1)
tensor |
an n-dimensional Tensor |
a |
the lower bound of the uniform distribution |
b |
the upper bound of the uniform distribution |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_uniform_(w)
}
Fills the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a normal distribution.
nn_init_xavier_normal_(tensor, gain = 1)
tensor |
an n-dimensional Tensor |
gain |
an optional scaling factor |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_xavier_normal_(w)
}
Fills the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a uniform distribution.
nn_init_xavier_uniform_(tensor, gain = 1)
tensor |
an n-dimensional Tensor |
gain |
an optional scaling factor |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_xavier_uniform_(w)
}
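The Glorot (Xavier) scale uses both fan_in and fan_out. A base-R sketch under the usual formulation (std = gain * sqrt(2 / (fan_in + fan_out)) for the normal variant, bound = gain * sqrt(6 / (fan_in + fan_out)) for the uniform variant) is shown below; `xavier_scale()` is a hypothetical helper, not the package's code.

```r
# Xavier/Glorot scale for a weight of shape (out, in, ...).
xavier_scale <- function(shape, gain = 1) {
  receptive <- prod(shape[-(1:2)])  # 1 for 2-D weights
  fan_in <- shape[2] * receptive
  fan_out <- shape[1] * receptive
  c(
    std = gain * sqrt(2 / (fan_in + fan_out)),   # normal variant
    bound = gain * sqrt(6 / (fan_in + fan_out))  # uniform variant
  )
}

s <- xavier_scale(c(3, 5))
print(s)

set.seed(1)
w <- matrix(runif(15, -s[["bound"]], s[["bound"]]), nrow = 3)
print(range(w))
```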
Fills the input Tensor with the scalar value 0
nn_init_zeros_(tensor)
tensor |
an n-dimensional tensor |
if (torch_is_installed()) {
  w <- torch_empty(3, 5)
  nn_init_zeros_(w)
}
The Kullback-Leibler divergence loss measure. Kullback-Leibler divergence is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.
nn_kl_div_loss(reduction = "mean")
reduction |
(string, optional): Specifies the reduction to apply to the output:
|
As with nn_nll_loss(), the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor.
The targets are interpreted as probabilities by default, but could be considered as log-probabilities with log_target set to TRUE.
This criterion expects a target Tensor of the same size as the input Tensor.
The unreduced (i.e. with reduction set to 'none') loss can be described as:
$$l(x, y) = L = \{ l_1, \dots, l_N \}, \quad l_n = y_n \cdot \left( \log y_n - x_n \right)$$
where the index $N$ spans all dimensions of input and $L$ has the same shape as input. If reduction is not 'none' (default 'mean'), then:
$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'}; \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'}. \end{cases}$$
In default reduction mode 'mean', the losses are averaged for each minibatch over observations as well as over dimensions. 'batchmean' mode gives the correct KL divergence where losses are averaged over batch dimension only. 'mean' mode's behavior will be changed to the same as 'batchmean' in the next major release.
Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar by default. If reduction is 'none', then $(N, *)$, the same shape as the input
reduction = 'mean' doesn't return the true KL divergence value, please use reduction = 'batchmean' which aligns with the mathematical definition of KL divergence. In the next major release, 'mean' will be changed to be the same as 'batchmean'.
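The difference between 'mean' and 'batchmean' is easiest to see numerically. The base-R sketch below assumes the pointwise form target * (log(target) - input) with probability targets and log-probability inputs; `kl_div()` is a hypothetical helper operating on plain matrices (rows = batch), not the package function.

```r
# KL divergence loss on matrices: input holds log-probabilities, target holds probabilities.
kl_div <- function(input, target, reduction = "mean") {
  # Terms with target == 0 contribute 0 (the limit of y * log(y) as y -> 0).
  pointwise <- ifelse(target > 0, target * (log(target) - input), 0)
  switch(reduction,
    none = pointwise,
    sum = sum(pointwise),
    mean = mean(pointwise),                         # averages over batch AND dimensions
    batchmean = sum(pointwise) / nrow(pointwise),   # averages over the batch only
    stop("unknown reduction: ", reduction)
  )
}

target <- matrix(c(0.5, 0.5, 0.25, 0.75), nrow = 2, byrow = TRUE)
input <- log(matrix(0.5, nrow = 2, ncol = 2))  # uniform predictions
loss_mean <- kl_div(input, target, "mean")
loss_batchmean <- kl_div(input, target, "batchmean")
print(c(mean = loss_mean, batchmean = loss_batchmean))
```

Here 'batchmean' is larger than 'mean' by exactly the number of columns, which is the extra division that makes 'mean' differ from the mathematical KL divergence.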
Creates a criterion that measures the mean absolute error (MAE) between each element in the input $x$ and target $y$.
nn_l1_loss(reduction = "mean")
reduction |
(string, optional): Specifies the reduction to apply to the output:
|
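The unreduced (i.e. with reduction set to 'none') loss can be described as:
$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = \left| x_n - y_n \right|,$$
where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then:
$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'}; \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'}. \end{cases}$$
$x$ and $y$ are tensors of arbitrary shapes with a total of $n$ elements each.
The sum operation still operates over all the elements, and divides by $n$. The division by $n$ can be avoided if one sets reduction = 'sum'.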
Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar. If reduction is 'none', then $(N, *)$, same shape as the input
if (torch_is_installed()) {
  loss <- nn_l1_loss()
  input <- torch_randn(3, 5, requires_grad = TRUE)
  target <- torch_randn(3, 5)
  output <- loss(input, target)
  output$backward()
}
Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization.
$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$
nn_layer_norm(normalized_shape, eps = 1e-05, elementwise_affine = TRUE)
normalized_shape |
(int or list): input shape from an expected input
of size
|
eps |
a value added to the denominator for numerical stability. Default: 1e-5 |
elementwise_affine |
a boolean value that when set to |
The mean and standard-deviation are calculated separately over the last certain number dimensions which have to be of the shape specified by normalized_shape.
$\gamma$ and $\beta$ are learnable affine transform parameters of normalized_shape if elementwise_affine is TRUE.
The standard-deviation is calculated via the biased estimator, equivalent to torch_var(input, unbiased=FALSE).
Input: $(N, *)$
Output: $(N, *)$ (same shape as input)
Unlike Batch Normalization and Instance Normalization, which apply scalar scale and bias for each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.
This layer uses statistics computed from input data in both training and evaluation modes.
if (torch_is_installed()) {
  input <- torch_randn(20, 5, 10, 10)
  # With Learnable Parameters
  m <- nn_layer_norm(input$size()[-1])
  # Without Learnable Parameters
  m <- nn_layer_norm(input$size()[-1], elementwise_affine = FALSE)
  # Normalize over last two dimensions
  m <- nn_layer_norm(c(10, 10))
  # Normalize over last dimension of size 10
  m <- nn_layer_norm(10)
  # Activating the module
  output <- m(input)
}
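The normalization itself, with the biased variance estimator mentioned above, can be sketched in base R for the simplest case of normalizing over the last dimension of a matrix. `layer_norm_last()` is a hypothetical helper; the scalar gamma and beta defaults stand in for the learnable per-element parameters and are an assumption of this sketch.

```r
# Layer normalization over the last dimension of a matrix (one row per sample).
layer_norm_last <- function(x, eps = 1e-5, gamma = 1, beta = 0) {
  normalize_row <- function(row) {
    mu <- mean(row)
    var_biased <- mean((row - mu)^2)  # biased estimator (divides by n, not n - 1)
    (row - mu) / sqrt(var_biased + eps) * gamma + beta
  }
  t(apply(x, 1, normalize_row))       # apply() returns results column-wise, so transpose
}

x <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 2, byrow = TRUE)
y <- layer_norm_last(x)
print(round(y, 4))
print(rowMeans(y))  # each row is centred at zero
```

Both rows normalize to nearly the same values even though their scales differ by a factor of ten, which is the point of computing the statistics per sample rather than per batch.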
Applies the element-wise function:
$$\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$$
nn_leaky_relu(negative_slope = 0.01, inplace = FALSE)
negative_slope |
Controls the angle of the negative slope. Default: 1e-2 |
inplace |
can optionally do the operation in-place. Default: |
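or
$$\text{LeakyReLU}(x) = \begin{cases} x, & \text{if } x \geq 0 \\ \text{negative\_slope} \times x, & \text{otherwise} \end{cases}$$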
Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input
if (torch_is_installed()) {
  m <- nn_leaky_relu(0.1)
  input <- torch_randn(2)
  output <- m(input)
}
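The two ways of writing the leaky ReLU (a max/min sum, or a piecewise rule) give the same result. A base-R sketch with hypothetical helper names, assuming the standard definitions:

```r
# Form 1: max(0, x) + negative_slope * min(0, x)
leaky_relu_sum <- function(x, negative_slope = 0.01) {
  pmax(0, x) + negative_slope * pmin(0, x)
}

# Form 2: x for x >= 0, negative_slope * x otherwise
leaky_relu_piecewise <- function(x, negative_slope = 0.01) {
  ifelse(x >= 0, x, negative_slope * x)
}

x <- c(-2, -0.5, 0, 0.5, 2)
a <- leaky_relu_sum(x, 0.1)
b <- leaky_relu_piecewise(x, 0.1)
print(a)
```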
Applies a linear transformation to the incoming data: y = xA^T + b
nn_linear(in_features, out_features, bias = TRUE)
in_features |
size of each input sample |
out_features |
size of each output sample |
bias |
If set to |
Input: (N, *, H_in) where * means any number of additional dimensions and H_in = in_features.
Output: (N, *, H_out) where all but the last dimension are the same shape as the input and H_out = out_features.
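weight: the learnable weights of the module of shape (out_features, in_features). The values are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k = \frac{1}{\text{in\_features}}$
bias: the learnable bias of the module of shape (out_features). If bias is TRUE, the values are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{in\_features}}$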
if (torch_is_installed()) {
  m <- nn_linear(20, 30)
  input <- torch_randn(128, 20)
  output <- m(input)
  print(output$size())
}
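The transformation y = xA^T + b and the shapes involved can be reproduced with plain matrices. The sketch below is base R only and mirrors the sizes of the example above; the uniform range with k = 1 / in_features is the usual scheme for this layer and is assumed here, and the variable names are illustrative.

```r
set.seed(1)
in_features <- 20
out_features <- 30
k <- 1 / in_features

# weight has shape (out_features, in_features); bias has length out_features.
A <- matrix(runif(out_features * in_features, -sqrt(k), sqrt(k)), nrow = out_features)
b <- runif(out_features, -sqrt(k), sqrt(k))

x <- matrix(rnorm(128 * in_features), nrow = 128)  # batch of 128 samples
y <- sweep(x %*% t(A), 2, b, "+")                  # y = x A^T + b, bias added per column
print(dim(y))
```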
Applies the element-wise function:
$$\text{LogSigmoid}(x) = \log\left(\frac{1}{1 + \exp(-x)}\right)$$
nn_log_sigmoid()
Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input
if (torch_is_installed()) {
  m <- nn_log_sigmoid()
  input <- torch_randn(2)
  output <- m(input)
}
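Applies the $\log(\text{Softmax}(x))$ function to an n-dimensional input Tensor. The LogSoftmax formulation can be simplified as:
$$\text{LogSoftmax}(x_{i}) = \log\left(\frac{\exp(x_i)}{\sum_j \exp(x_j)}\right)$$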
nn_log_softmax(dim)
dim |
(int): A dimension along which LogSoftmax will be computed. |
a Tensor of the same dimension and shape as the input with values in the range [-inf, 0)
Input: $(*)$ where * means any number of additional dimensions
Output: $(*)$, same shape as the input
if (torch_is_installed()) {
  m <- nn_log_softmax(1)
  input <- torch_randn(2, 3)
  output <- m(input)
}
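A base-R sketch of log-softmax over a vector is given below. Subtracting the maximum before exponentiating is a standard numerical-stability step added by this sketch (it does not change the result mathematically); whether the library does the same internally is not stated in the text. `log_softmax()` is a hypothetical helper.

```r
# log(softmax(x)) computed as (x - max) - log(sum(exp(x - max))).
log_softmax <- function(x) {
  shifted <- x - max(x)  # guards against overflow in exp()
  shifted - log(sum(exp(shifted)))
}

x <- c(1, 2, 3)
lsm <- log_softmax(x)
print(lsm)
print(sum(exp(lsm)))  # the exponentiated outputs sum to one

big <- log_softmax(c(1000, 1001))  # stays finite thanks to the shift
print(big)
```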
On each window, the function computed is:
$$f(X) = \sqrt[p]{\sum_{x \in X} x^{p}}$$
nn_lp_pool1d(norm_type, kernel_size, stride = NULL, ceil_mode = FALSE)
norm_type |
if inf then one gets max pooling; if 1 one gets sum pooling (proportional to average pooling) |
kernel_size |
a single int, the size of the window |
stride |
a single int, the stride of the window. Default value is |
ceil_mode |
when TRUE, will use |
At p = $\infty$, one gets Max Pooling
At p = 1, one gets Sum Pooling (which is proportional to Average Pooling)
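Input: $(N, C, L_{in})$
Output: $(N, C, L_{out})$, where
$$L_{out} = \left\lfloor \frac{L_{in} - \text{kernel\_size}}{\text{stride}} + 1 \right\rfloor$$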
If the sum to the power of p is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case.
if (torch_is_installed()) {
  # power-2 pool of window of length 3, with stride 2.
  m <- nn_lp_pool1d(2, 3, stride = 2)
  input <- torch_randn(20, 16, 50)
  output <- m(input)
}
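The windowed p-norm can be sketched on a single numeric vector in base R. This assumes the window function is the p-th root of the sum of p-th powers with floor-mode output length; `lp_pool1d()` is a hypothetical helper, and ceil_mode is not handled.

```r
# Power-average pooling over a 1-D signal (single channel, floor mode only).
lp_pool1d <- function(x, p, kernel_size, stride = kernel_size) {
  l_out <- floor((length(x) - kernel_size) / stride + 1)
  starts <- (seq_len(l_out) - 1) * stride + 1
  vapply(starts, function(s) {
    window <- x[s:(s + kernel_size - 1)]
    sum(window^p)^(1 / p)
  }, numeric(1))
}

x <- c(3, 4, 0, 0, 5, 12)
p2 <- lp_pool1d(x, p = 2, kernel_size = 2)  # Euclidean norm of each window
p1 <- lp_pool1d(x, p = 1, kernel_size = 2)  # sum pooling
print(p2)
print(p1)
print(length(lp_pool1d(rnorm(50)^2, p = 2, kernel_size = 3, stride = 2)))
```

With p = 1 the result is the window sum, i.e. average pooling multiplied by the window length.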
On each window, the function computed is:
$$f(X) = \sqrt[p]{\sum_{x \in X} x^{p}}$$
nn_lp_pool2d(norm_type, kernel_size, stride = NULL, ceil_mode = FALSE)
norm_type |
if inf then one gets max pooling; if 1 one gets sum pooling (proportional to average pooling) |
kernel_size |
the size of the window |
stride |
the stride of the window. Default value is |
ceil_mode |
when TRUE, will use |
At p = $\infty$, one gets Max Pooling
At p = 1, one gets Sum Pooling (which is proportional to average pooling)
The parameters kernel_size, stride can either be:
a single int – in which case the same value is used for the height and width dimension
a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
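Input: $(N, C, H_{in}, W_{in})$
Output: $(N, C, H_{out}, W_{out})$, where
$$H_{out} = \left\lfloor \frac{H_{in} - \text{kernel\_size}[0]}{\text{stride}[0]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} - \text{kernel\_size}[1]}{\text{stride}[1]} + 1 \right\rfloor$$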
If the sum to the power of p is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case.
if (torch_is_installed()) {
  # power-2 pool of square window of size=3, stride=2
  m <- nn_lp_pool2d(2, 3, stride = 2)
  # pool of non-square window of power 1.2
  m <- nn_lp_pool2d(1.2, c(3, 2), stride = c(2, 1))
  input <- torch_randn(20, 16, 50, 32)
  output <- m(input)
}
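For each element in the input sequence, each layer computes the following function:
$$
\begin{array}{ll}
i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
c_t = f_t * c_{(t-1)} + i_t * g_t \\
h_t = o_t * \tanh(c_t)
\end{array}
$$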
nn_lstm( input_size, hidden_size, num_layers = 1, bias = TRUE, batch_first = FALSE, dropout = 0, bidirectional = FALSE, ... )
input_size |
The number of expected features in the input x |
hidden_size |
The number of features in the hidden state h |
num_layers |
Number of recurrent layers. E.g., setting |
bias |
If |
batch_first |
If |
dropout |
If non-zero, introduces a |
bidirectional |
If |
... |
currently unused. |
where $h_t$ is the hidden state at time t, $c_t$ is the cell state at time t, $x_t$ is the input at time t, $h_{(t-1)}$ is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0, and $i_t$, $f_t$, $g_t$, $o_t$ are the input, forget, cell, and output gates, respectively. $\sigma$ is the sigmoid function.
Inputs: input, (h_0, c_0)
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See nn_utils_rnn_pack_padded_sequence() or nn_utils_rnn_pack_sequence() for details.
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.
If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
Outputs: output, (h_n, c_n)
output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a PackedSequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)) and similarly for c_n.
c_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len
weight_ih_l[k]: the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size x input_size)
weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size x hidden_size)
bias_ih_l[k]: the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
All the weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{hidden\_size}}$
if (torch_is_installed()) {
  rnn <- nn_lstm(10, 20, 2)
  input <- torch_randn(5, 3, 10)
  h0 <- torch_randn(2, 3, 20)
  c0 <- torch_randn(2, 3, 20)
  output <- rnn(input, list(h0, c0))
}
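One LSTM time step for a single sample can be written out in base R to show how the stacked gate weights (W_ii|W_if|W_ig|W_io) and (W_hi|W_hf|W_hg|W_ho) are consumed. This is an illustrative sketch assuming the standard LSTM gate equations; `lstm_cell()` is a hypothetical helper, not the package's implementation.

```r
# One LSTM step. Rows of w_ih / w_hh are stacked in the order (i | f | g | o).
lstm_cell <- function(x, h_prev, c_prev, w_ih, w_hh, b_ih, b_hh) {
  hs <- length(h_prev)
  gates <- as.vector(w_ih %*% x) + b_ih + as.vector(w_hh %*% h_prev) + b_hh
  i <- plogis(gates[seq_len(hs)])           # input gate
  f <- plogis(gates[hs + seq_len(hs)])      # forget gate
  g <- tanh(gates[2 * hs + seq_len(hs)])    # cell candidate
  o <- plogis(gates[3 * hs + seq_len(hs)])  # output gate
  c_t <- f * c_prev + i * g                 # new cell state
  h_t <- o * tanh(c_t)                      # new hidden state
  list(h = h_t, c = c_t)
}

hidden_size <- 2
input_size <- 3
step <- lstm_cell(
  x = c(1, 2, 3), h_prev = c(0, 0), c_prev = c(2, -2),
  w_ih = matrix(0, 4 * hidden_size, input_size),
  w_hh = matrix(0, 4 * hidden_size, hidden_size),
  b_ih = rep(0, 4 * hidden_size), b_hh = rep(0, 4 * hidden_size)
)
print(step$c)
print(step$h)
```

With zero weights every sigmoid gate equals 0.5 and the candidate is 0, so the cell state is halved and the hidden state is 0.5 * tanh of the new cell state.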
Creates a criterion that measures the loss given inputs $x1$, $x2$, two 1D mini-batch Tensors, and a label 1D mini-batch tensor $y$ (containing 1 or -1).
If $y = 1$ then it is assumed the first input should be ranked higher (have a larger value) than the second input, and vice-versa for $y = -1$.
nn_margin_ranking_loss(margin = 0, reduction = "mean")
margin |
(float, optional): Has a default value of 0. |
reduction |
(string, optional): Specifies the reduction to apply to the output:
|
The loss function for each pair of samples in the mini-batch is:
$$\text{loss}(x1, x2, y) = \max(0, -y * (x1 - x2) + \text{margin})$$
Input1: $(N)$ where N is the batch size.
Input2: $(N)$, same shape as the Input1.
Target: $(N)$, same shape as the inputs.
Output: scalar. If reduction is 'none', then $(N)$.
if (torch_is_installed()) {
  loss <- nn_margin_ranking_loss()
  input1 <- torch_randn(3, requires_grad = TRUE)
  input2 <- torch_randn(3, requires_grad = TRUE)
  target <- torch_randn(3)$sign()
  output <- loss(input1, input2, target)
  output$backward()
}
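The ranking rule can be checked by hand with a base-R sketch. It assumes the usual per-pair form max(0, -y * (x1 - x2) + margin); `margin_ranking_loss()` is a hypothetical helper on plain vectors, not the package function.

```r
# Margin ranking loss: zero when the pair is ordered as the label demands (by at least margin).
margin_ranking_loss <- function(x1, x2, y, margin = 0, reduction = "mean") {
  stopifnot(all(y %in% c(-1, 1)))
  l <- pmax(0, -y * (x1 - x2) + margin)
  switch(reduction,
    none = l,
    mean = mean(l),
    sum = sum(l),
    stop("unknown reduction: ", reduction)
  )
}

x1 <- c(0.8, 0.1, 0.5)
x2 <- c(0.3, 0.4, 0.9)
y <- c(1, 1, -1)  # 1: x1 should rank higher; -1: x2 should rank higher
per_pair <- margin_ranking_loss(x1, x2, y, reduction = "none")
print(per_pair)
print(margin_ranking_loss(x1, x2, y))
```

Only the second pair is penalized: its label asks for x1 to rank higher, but x1 is smaller than x2 by 0.3.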
Applies a 1D max pooling over an input signal composed of several input planes.
nn_max_pool1d( kernel_size, stride = NULL, padding = 0, dilation = 1, return_indices = FALSE, ceil_mode = FALSE )
kernel_size |
the size of the window to take a max over |
stride |
the stride of the window. Default value is |
padding |
implicit zero padding to be added on both sides |
dilation |
a parameter that controls the stride of elements in the window |
return_indices |
if |
ceil_mode |
when |
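In the simplest case, the output value of the layer with input size $(N, C, L)$ and output $(N, C, L_{out})$ can be precisely described as:
$$out(N_i, C_j, k) = \max_{m=0, \ldots, \text{kernel\_size} - 1} input(N_i, C_j, stride \times k + m)$$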
If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what dilation does.
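Input: $(N, C, L_{in})$
Output: $(N, C, L_{out})$, where
$$L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor$$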
if (torch_is_installed()) {
  # pool of size=3, stride=2
  m <- nn_max_pool1d(3, stride = 2)
  input <- torch_randn(20, 16, 50)
  output <- m(input)
}
Applies a 2D max pooling over an input signal composed of several input planes.
nn_max_pool2d( kernel_size, stride = NULL, padding = 0, dilation = 1, return_indices = FALSE, ceil_mode = FALSE )
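kernel_size | the size of the window to take a max over |
stride | the stride of the window. Default value is kernel_size |
padding | implicit zero padding to be added on both sides |
dilation | a parameter that controls the stride of elements in the window |
return_indices | if TRUE, will return the max indices along with the outputs. Useful for nn_max_unpool2d() later. |
ceil_mode | when TRUE, will use ceil instead of floor to compute the output shape |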
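In the simplest case, the output value of the layer with input size $(N, C, H, W)$, output $(N, C, H_{out}, W_{out})$ and kernel_size $(kH, kW)$ can be precisely described as:
$$out(N_i, C_j, h, w) = \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} input(N_i, C_j, \text{stride}[0] \times h + m, \text{stride}[1] \times w + n)$$
If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what dilation does.
The parameters kernel_size, stride, padding, dilation can either be:
a single int, in which case the same value is used for the height and width dimension
a tuple of two ints, in which case the first int is used for the height dimension, and the second int for the width dimension
Input: $(N, C, H_{in}, W_{in})$
Output: $(N, C, H_{out}, W_{out})$, where
$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$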
if (torch_is_installed()) {
  # pool of square window of size=3, stride=2
  m <- nn_max_pool2d(3, stride = 2)
  # pool of non-square window
  m <- nn_max_pool2d(c(3, 2), stride = c(2, 1))
  input <- torch_randn(20, 16, 50, 32)
  output <- m(input)
}
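As a rough check on the shapes involved, the sketch below computes the pooled size per spatial dimension in plain R. It assumes the usual floor-based rule floor((in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1) and ignores ceil_mode; `pool_out_size` is a hypothetical helper name, not a function of the package.

```r
# Output size of a max-pooling layer along each spatial dimension
# (illustrative helper; floor mode only, ceil_mode is not modelled).
pool_out_size <- function(in_size, kernel_size, stride = kernel_size,
                          padding = 0, dilation = 1) {
  floor((in_size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)
}

# Non-square window: input H x W = 50 x 32, kernel (3, 2), stride (2, 1)
out_hw <- pool_out_size(c(50, 32), kernel_size = c(3, 2), stride = c(2, 1))

# A single value is recycled over both dimensions, like a single int argument
out_square <- pool_out_size(c(50, 32), kernel_size = 3, stride = 2)
```

Under these assumptions a 50 x 32 input gives 24 x 31 with the non-square window and 24 x 15 with the square one.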
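In the simplest case, the output value of the layer with input size $(N, C, D, H, W)$, output $(N, C, D_{out}, H_{out}, W_{out})$ and kernel_size $(kD, kH, kW)$ can be precisely described as:
$$out(N_i, C_j, d, h, w) = \max_{k=0, \ldots, kD-1} \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} input(N_i, C_j, \text{stride}[0] \times d + k, \text{stride}[1] \times h + m, \text{stride}[2] \times w + n)$$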
nn_max_pool3d( kernel_size, stride = NULL, padding = 0, dilation = 1, return_indices = FALSE, ceil_mode = FALSE )
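kernel_size | the size of the window to take a max over |
stride | the stride of the window. Default value is kernel_size |
padding | implicit zero padding to be added on all three sides |
dilation | a parameter that controls the stride of elements in the window |
return_indices | if TRUE, will return the max indices along with the outputs. Useful for nn_max_unpool3d() later. |
ceil_mode | when TRUE, will use ceil instead of floor to compute the output shape |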
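If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what dilation does.
The parameters kernel_size, stride, padding, dilation can either be:
a single int, in which case the same value is used for the depth, height and width dimension
a tuple of three ints, in which case the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
Input: $(N, C, D_{in}, H_{in}, W_{in})$
Output: $(N, C, D_{out}, H_{out}, W_{out})$, where
$$D_{out} = \left\lfloor \frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$
$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1 \right\rfloor$$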
if (torch_is_installed()) {
  # pool of square window of size=3, stride=2
  m <- nn_max_pool3d(3, stride = 2)
  # pool of non-square window
  m <- nn_max_pool3d(c(3, 2, 2), stride = c(2, 1, 2))
  input <- torch_randn(20, 16, 50, 44, 31)
  output <- m(input)
}
Computes a partial inverse of MaxPool1d. MaxPool1d is not fully invertible, since the non-maximal values are lost. MaxUnpool1d takes in as input the output of MaxPool1d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero.
nn_max_unpool1d(kernel_size, stride = NULL, padding = 0)
kernel_size | (int or tuple): Size of the max pooling window. |
stride | (int or tuple): Stride of the max pooling window. It is set to kernel_size by default. |
padding | (int or tuple): Padding that was added to the input |
input: the input Tensor to invert
indices: the indices given out by nn_max_pool1d()
output_size (optional): the targeted output size
Input: $(N, C, H_{in})$
Output: $(N, C, H_{out})$, where
$$H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]$$
or as given by output_size in the call operator
MaxPool1d can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument output_size in the forward call. See the Inputs and Example below.
if (torch_is_installed()) {
  pool <- nn_max_pool1d(2, stride = 2, return_indices = TRUE)
  unpool <- nn_max_unpool1d(2, stride = 2)
  input <- torch_tensor(array(1:8 / 1, dim = c(1, 1, 8)))
  out <- pool(input)
  unpool(out[[1]], out[[2]])

  # Example showcasing the use of output_size
  input <- torch_tensor(array(1:8 / 1, dim = c(1, 1, 8)))
  out <- pool(input)
  unpool(out[[1]], out[[2]], output_size = input$size())
  unpool(out[[1]], out[[2]])
}
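The ambiguity mentioned above can be seen with a little arithmetic. The sketch below assumes the floor-based pooling size rule (with dilation 1) and the default unpooling size (in - 1) * stride - 2 * padding + kernel_size; `pool_len` and `unpool_len` are hypothetical helper names, not package functions.

```r
# Sizes only, no tensors involved (illustrative helpers).
pool_len <- function(l_in, kernel_size, stride = kernel_size, padding = 0) {
  floor((l_in + 2 * padding - (kernel_size - 1) - 1) / stride + 1)
}
unpool_len <- function(l_in, kernel_size, stride = kernel_size, padding = 0) {
  (l_in - 1) * stride - 2 * padding + kernel_size
}

# Inputs of length 8 and 9 pool to the same length ...
pooled <- pool_len(c(8, 9), kernel_size = 2, stride = 2)
# ... so the default unpooled length cannot tell them apart
default_unpooled <- unpool_len(pooled, kernel_size = 2, stride = 2)
```

Both inputs pool to length 4 and unpool to length 8 by default, which is why an input of length 9 would need an explicit output_size to be restored to its original length.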
Computes a partial inverse of MaxPool2d. MaxPool2d is not fully invertible, since the non-maximal values are lost. MaxUnpool2d takes in as input the output of MaxPool2d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero.
nn_max_unpool2d(kernel_size, stride = NULL, padding = 0)
kernel_size | (int or tuple): Size of the max pooling window. |
stride | (int or tuple): Stride of the max pooling window. It is set to kernel_size by default. |
padding | (int or tuple): Padding that was added to the input |
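input: the input Tensor to invert
indices: the indices given out by nn_max_pool2d()
output_size (optional): the targeted output size
Input: $(N, C, H_{in}, W_{in})$
Output: $(N, C, H_{out}, W_{out})$, where
$$H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]$$
$$W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1]$$
or as given by output_size in the call operator
MaxPool2d can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument output_size in the forward call. See the Inputs and Example below.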
if (torch_is_installed()) {
  pool <- nn_max_pool2d(2, stride = 2, return_indices = TRUE)
  unpool <- nn_max_unpool2d(2, stride = 2)
  input <- torch_randn(1, 1, 4, 4)
  out <- pool(input)
  unpool(out[[1]], out[[2]])

  # specify a different output size than input size
  unpool(out[[1]], out[[2]], output_size = c(1, 1, 5, 5))
}
Computes a partial inverse of MaxPool3d. MaxPool3d is not fully invertible, since the non-maximal values are lost. MaxUnpool3d takes in as input the output of MaxPool3d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero.
nn_max_unpool3d(kernel_size, stride = NULL, padding = 0)
kernel_size | (int or tuple): Size of the max pooling window. |
stride | (int or tuple): Stride of the max pooling window. It is set to kernel_size by default. |
padding | (int or tuple): Padding that was added to the input |
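input: the input Tensor to invert
indices: the indices given out by nn_max_pool3d()
output_size (optional): the targeted output size
Input: $(N, C, D_{in}, H_{in}, W_{in})$
Output: $(N, C, D_{out}, H_{out}, W_{out})$, where
$$D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]$$
$$H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1]$$
$$W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{kernel\_size}[2]$$
or as given by output_size in the call operator
MaxPool3d can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument output_size in the forward call. See the Inputs section below.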
if (torch_is_installed()) {
  # pool of square window of size=3, stride=2
  pool <- nn_max_pool3d(3, stride = 2, return_indices = TRUE)
  unpool <- nn_max_unpool3d(3, stride = 2)
  out <- pool(torch_randn(20, 16, 51, 33, 15))
  unpooled_output <- unpool(out[[1]], out[[2]])
  unpooled_output$size()
}
Your models should also subclass this class.
nn_module( classname = NULL, inherit = nn_Module, ..., private = NULL, active = NULL, parent_env = parent.frame() )
classname | an optional name for the module |
inherit | an optional module to inherit from |
... | methods implementation |
private | passed to R6::R6Class(). |
active | passed to R6::R6Class(). |
parent_env | passed to R6::R6Class(). |
Modules can also contain other Modules, allowing you to nest them in a tree structure. You can assign the submodules as regular attributes.
You are expected to implement the initialize and the forward methods to create a new nn_module.
The initialize function will be called whenever a new instance of the nn_module is created. We use the initialize function to define submodules and parameters of the module. For example:
initialize = function(input_size, output_size) {
  self$conv1 <- nn_conv2d(input_size, output_size, 5)
  self$conv2 <- nn_conv2d(output_size, output_size, 5)
}
The initialize function can have any number of parameters. All objects assigned to self$ will be available for other methods that you implement. Tensors wrapped with nn_parameter() or nn_buffer() and submodules are automatically tracked when assigned to self$.
The initialize function is optional if the module you are defining doesn't have weights, submodules or buffers.
The forward method is called whenever an instance of nn_module is called. This is usually used to implement the computation that the module does with the weights and submodules defined in the initialize function.
For example:
forward = function(input) {
  input <- self$conv1(input)
  input <- nnf_relu(input)
  input <- self$conv2(input)
  input <- nnf_relu(input)
  input
}
The forward function can use the self$training attribute to make different computations depending on whether the model is training or not, for example if you were implementing the dropout module.
To finalize the cloning of a module, you can define a private finalize_deep_clone() method. This method is called on the cloned object when deep-cloning a module, after all the modules, parameters and buffers were already cloned.
if (torch_is_installed()) {
  model <- nn_module(
    initialize = function() {
      self$conv1 <- nn_conv2d(1, 20, 5)
      self$conv2 <- nn_conv2d(20, 20, 5)
    },
    forward = function(input) {
      input <- self$conv1(input)
      input <- nnf_relu(input)
      input <- self$conv2(input)
      input <- nnf_relu(input)
      input
    }
  )
}
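The call to nn_module() returns a generator; calling the generator runs initialize, and calling the resulting instance runs forward. The sketch below walks through that life cycle and uses self$training to switch behaviour. The module `tiny_net` and its arguments are invented for illustration, and the block is skipped when torch is not available.

```r
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  library(torch)

  # Hypothetical module: one linear layer, with dropout applied only while training
  tiny_net <- nn_module(
    classname = "tiny_net",
    initialize = function(in_features, out_features, p = 0.5) {
      self$fc <- nn_linear(in_features, out_features)  # submodule, tracked automatically
      self$p <- p                                      # plain attribute
    },
    forward = function(x) {
      x <- self$fc(x)
      if (self$training) nnf_dropout(x, p = self$p) else x
    }
  )

  model <- tiny_net(in_features = 4, out_features = 2)  # runs initialize()
  model$eval()                                          # sets self$training to FALSE
  out <- model(torch_randn(3, 4))                       # runs forward()
  out_shape <- out$shape
  n_params <- length(model$parameters)                  # weight and bias of self$fc
}
```

Calling model$train() would switch self$training back to TRUE, so the dropout branch becomes active again.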
Container that allows named values
nn_module_dict(dict)
dict | A named list of submodules that will be saved in that module. |
if (torch_is_installed()) {
  # named my_module so that the nn_module() function itself is not masked
  my_module <- nn_module(
    initialize = function() {
      self$dict <- nn_module_dict(list(
        l1 = nn_linear(10, 20),
        l2 = nn_linear(20, 10)
      ))
    },
    forward = function(x) {
      x <- self$dict$l1(x)
      self$dict$l2(x)
    }
  )
}
nn_module_list can be indexed like a regular R list, but the modules it contains are properly registered, and will be visible by all nn_module methods.
nn_module_list(modules = list())
modules | a list of modules to add |
if (torch_is_installed()) {
  my_module <- nn_module(
    initialize = function() {
      self$linears <- nn_module_list(lapply(1:10, function(x) nn_linear(10, 10)))
    },
    forward = function(x) {
      for (i in 1:length(self$linears)) {
        x <- self$linears[[i]](x)
      }
      x
    }
  )
}
Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input $x$ and target $y$. The unreduced (i.e. with reduction set to 'none') loss can be described as:
$$\ell(x, y) = L = \{l_1, \ldots, l_N\}^\top, \quad l_n = (x_n - y_n)^2,$$
nn_mse_loss(reduction = "mean")
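reduction | (string, optional): Specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. |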
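where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then:
$$\ell(x, y) = \begin{cases} \text{mean}(L), & \text{if reduction} = \text{'mean'} \\ \text{sum}(L), & \text{if reduction} = \text{'sum'} \end{cases}$$
$x$ and $y$ are tensors of arbitrary shapes with a total of $n$ elements each.
The mean operation still operates over all the elements, and divides by $n$. The division by $n$ can be avoided if one sets reduction = 'sum'.
Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input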
if (torch_is_installed()) {
  loss <- nn_mse_loss()
  input <- torch_randn(3, 5, requires_grad = TRUE)
  target <- torch_randn(3, 5)
  output <- loss(input, target)
  output$backward()
}
Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (which is a 1D tensor of target class indices, $0 \leq y \leq \text{x.size}(1) - 1$):
nn_multi_margin_loss(p = 1, margin = 1, weight = NULL, reduction = "mean")
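p | (int, optional): Has a default value of 1. 1 and 2 are the only supported values. |
margin | (float, optional): Has a default value of 1. |
weight | (Tensor, optional): a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones. |
reduction | (string, optional): Specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. |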
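For each mini-batch sample, the loss in terms of the 1D input $x$ and scalar output $y$ is:
$$\text{loss}(x, y) = \frac{\sum_i \max(0, \text{margin} - x[y] + x[i])^p}{\text{x.size}(0)}$$
where $i \in \{0, \ldots, \text{x.size}(0) - 1\}$ and $i \neq y$.
Optionally, you can give non-equal weighting on the classes by passing a 1D weight tensor into the constructor.
The loss function then becomes:
$$\text{loss}(x, y) = \frac{\sum_i \max(0, w[y] \times (\text{margin} - x[y] + x[i]))^p}{\text{x.size}(0)}$$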
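A worked example may help here. The plain-R sketch below evaluates the loss for one sample, assuming the per-sample loss is the sum over all classes i other than the target y of max(0, w[y] * (margin - x[y] + x[i]))^p, divided by the number of classes. The helper `multi_margin_ref` is a made-up name, and the class index is 1-based as is usual in R.

```r
# Multi-class margin loss for a single sample (illustrative helper, not a torch function).
multi_margin_ref <- function(x, y, p = 1, margin = 1, weight = NULL) {
  w <- if (is.null(weight)) 1 else weight[y]  # weight of the target class
  others <- seq_along(x) != y                 # every class except the target
  sum(pmax(0, w * (margin - x[y] + x[others]))^p) / length(x)
}

x <- c(0.1, 0.2, 0.4, 0.8)  # scores for 4 classes
loss_value <- multi_margin_ref(x, y = 4)
```

With the target on the highest score, the three other classes still fall inside the margin of 1 and contribute 0.3, 0.4 and 0.6, giving 1.3 / 4 = 0.325.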
Allows the model to jointly attend to information from different representation subspaces. See reference: Attention Is All You Need
nn_multihead_attention( embed_dim, num_heads, dropout = 0, bias = TRUE, add_bias_kv = FALSE, add_zero_attn = FALSE, kdim = NULL, vdim = NULL, batch_first = FALSE )
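embed_dim | total dimension of the model. |
num_heads | parallel attention heads. Note that embed_dim will be split across num_heads (i.e. each head will have dimension embed_dim %/% num_heads). |
dropout | a Dropout layer on attn_output_weights. Default: 0.0. |
bias | add bias as module parameter. Default: TRUE. |
add_bias_kv | add bias to the key and value sequences at dim=0. |
add_zero_attn | add a new batch of zeros to the key and value sequences at dim=1. |
kdim | total number of features in key. Default: NULL |
vdim | total number of features in value. Default: NULL |
batch_first | if TRUE then the input and output tensors are $(N, S, E)$ instead of $(S, N, E)$, where N is the batch size, S is the sequence length, and E is the embedding dimension. |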
Inputs:
query: $(L, N, E)$ where L is the target sequence length, N is the batch size, E is the embedding dimension (but see the batch_first argument).
key: $(S, N, E)$, where S is the source sequence length, N is the batch size, E is the embedding dimension (but see the batch_first argument).
value: $(S, N, E)$ where S is the source sequence length, N is the batch size, E is the embedding dimension (but see the batch_first argument).
key_padding_mask: $(N, S)$ where N is the batch size, S is the source sequence length. If a ByteTensor is provided, the non-zero positions will be ignored while the zero positions will be unchanged. If a BoolTensor is provided, the positions with the value of TRUE will be ignored while the positions with the value of FALSE will be unchanged.
attn_mask: 2D mask $(L, S)$ where L is the target sequence length, S is the source sequence length. 3D mask $(N \times \text{num\_heads}, L, S)$ where N is the batch size, L is the target sequence length, S is the source sequence length. attn_mask ensures that position i is allowed to attend the unmasked positions. If a ByteTensor is provided, the non-zero positions are not allowed to attend while the zero positions will be unchanged. If a BoolTensor is provided, positions with TRUE are not allowed to attend while FALSE values will be unchanged. If a FloatTensor is provided, it will be added to the attention weight.
Outputs:
attn_output: $(L, N, E)$ where L is the target sequence length, N is the batch size, E is the embedding dimension (but see the batch_first argument).
attn_output_weights:
if avg_weights is TRUE (the default), the output attention weights are averaged over the attention heads, giving a tensor of shape $(N, L, S)$ where N is the batch size, L is the target sequence length, S is the source sequence length.
if avg_weights is FALSE, the attention weight tensor is output as-is, with shape $(N, H, L, S)$, where H is the number of attention heads.
if (torch_is_installed()) {
  ## Not run:
  multihead_attn <- nn_multihead_attention(embed_dim, num_heads)
  out <- multihead_attn(query, key, value)
  attn_output <- out[[1]]
  attn_output_weights <- out[[2]]
  ## End(Not run)
}
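Since the example above leaves embed_dim, num_heads and the input tensors undefined, here is a small self-contained sketch with concrete sizes. The sizes are arbitrary choices for illustration, the default batch_first = FALSE layout is assumed, and the block is skipped when torch is not available.

```r
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  library(torch)

  # Arbitrary sizes chosen for illustration; embed_dim must be divisible by num_heads
  embed_dim <- 8
  num_heads <- 2
  tgt_len <- 5   # L: target sequence length
  src_len <- 7   # S: source sequence length
  batch <- 3     # N: batch size

  attn <- nn_multihead_attention(embed_dim, num_heads)
  query <- torch_randn(tgt_len, batch, embed_dim)  # (L, N, E)
  key <- torch_randn(src_len, batch, embed_dim)    # (S, N, E)
  value <- torch_randn(src_len, batch, embed_dim)  # (S, N, E)

  res <- attn(query, key, value)
  output_shape <- res[[1]]$shape   # attention output
  weights_shape <- res[[2]]$shape  # attention weights averaged over heads
}
```

With these sizes the attention output should have shape (5, 3, 8) and the averaged weights shape (3, 5, 7), matching the (L, N, E) and (N, L, S) layouts described above.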
Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (which is a 2D Tensor of target class indices).
For each sample in the mini-batch:
$$\text{loss}(x, y) = \sum_{ij} \frac{\max(0, 1 - (x[y[j]] - x[i]))}{\text{x.size}(0)}$$
nn_multilabel_margin_loss(reduction = "mean")
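reduction | (string, optional): Specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. |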
where $i \in \{0, \ldots, \text{x.size}(0) - 1\}$, $j \in \{0, \ldots, \text{y.size}(0) - 1\}$, $0 \leq y[j] \leq \text{x.size}(0) - 1$, and $i \neq y[j]$ for all $i$ and $j$.
$y$ and $x$ must have the same size.
The criterion only considers a contiguous block of non-negative targets that starts at the front. This allows for different samples to have variable amounts of target classes.
Input: $(C)$ or $(N, C)$ where N is the batch size and C is the number of classes.
Target: $(C)$ or $(N, C)$, label targets padded by -1 ensuring same shape as the input.
Output: scalar. If reduction is 'none', then $(N)$.
if (torch_is_installed()) {
  loss <- nn_multilabel_margin_loss()
  x <- torch_tensor(c(0.1, 0.2, 0.4, 0.8))$view(c(1, 4))
  # for target y, only consider labels 4 and 1, not after label -1
  y <- torch_tensor(c(4, 1, -1, 2), dtype = torch_long())$view(c(1, 4))
  loss(x, y)
}
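The same numbers can be checked by hand. This plain-R sketch assumes the per-sample loss sums max(0, 1 - (x[y[j]] - x[i])) over every target class y[j] and every non-target class i, then divides by the number of classes, and that only targets before the first -1 count. `multilabel_margin_ref` is a hypothetical helper, with 1-based class labels as in the example above.

```r
# Multi-label margin loss for a single sample (illustrative helper, not a torch function).
multilabel_margin_ref <- function(x, y) {
  # keep only the leading block of targets, stopping at the first -1
  stop_at <- match(-1, y, nomatch = length(y) + 1)
  targets <- y[seq_len(stop_at - 1)]
  non_targets <- setdiff(seq_along(x), targets)
  total <- 0
  for (j in targets) {
    total <- total + sum(pmax(0, 1 - (x[j] - x[non_targets])))
  }
  total / length(x)
}

loss_value <- multilabel_margin_ref(c(0.1, 0.2, 0.4, 0.8), c(4, 1, -1, 2))
```

Targets 4 and 1 are compared against non-targets 2 and 3, which gives (0.4 + 0.6 + 1.1 + 1.3) / 4 = 0.85; the label 2 after the -1 is ignored.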
Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input $x$ and target $y$ of size $(N, C)$.
nn_multilabel_soft_margin_loss(weight = NULL, reduction = "mean")
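weight | (Tensor, optional): a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones. |
reduction | (string, optional): Specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. |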
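For each sample in the minibatch:
$$\text{loss}(x, y) = -\frac{1}{C} \sum_i \left[ y[i] \times \log\left((1 + \exp(-x[i]))^{-1}\right) + (1 - y[i]) \times \log\left(\frac{\exp(-x[i])}{1 + \exp(-x[i])}\right) \right]$$
where $i \in \{0, \ldots, \text{x.nElement}() - 1\}$, $y[i] \in \{0, 1\}$.
Input: $(N, C)$ where N is the batch size and C is the number of classes.
Target: $(N, C)$, label targets padded by -1 ensuring same shape as the input.
Output: scalar. If reduction is 'none', then $(N)$.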
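For a single sample, this loss amounts to the average binary cross-entropy over the C classes, computed on sigmoid-transformed scores. The plain-R sketch below assumes exactly that form and no class weights; `multilabel_soft_margin_ref` is a made-up helper name.

```r
# Multi-label soft margin loss for one sample (illustrative helper, not a torch function).
multilabel_soft_margin_ref <- function(x, y) {
  sigma <- 1 / (1 + exp(-x))  # per-class probability
  # log(1 - sigma) equals log(exp(-x) / (1 + exp(-x)))
  -mean(y * log(sigma) + (1 - y) * log(1 - sigma))
}

# Scores of 0 mean "undecided", so every class costs log(2)
undecided <- multilabel_soft_margin_ref(x = c(0, 0), y = c(1, 0))
# Confident and correct scores give a small loss
confident <- multilabel_soft_margin_ref(x = c(5, -5), y = c(1, 0))
```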
The negative log likelihood loss. It is useful to train a classification problem with C classes.
nn_nll_loss(weight = NULL, ignore_index = -100, reduction = "mean")
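weight | (Tensor, optional): a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones. |
ignore_index | (int, optional): Specifies a target value that is ignored and does not contribute to the input gradient. |
reduction | (string, optional): Specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction will be applied, 'mean': the weighted mean of the output is taken, 'sum': the output will be summed. |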
If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.
The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either $(minibatch, C)$ or $(minibatch, C, d_1, d_2, \ldots, d_K)$ with $K \geq 1$ for the K-dimensional case (described later).
Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer in the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.
The target that this loss expects should be a class index in the range $[1, C]$ (class indices are 1-based in R) where C = number of classes; if ignore_index is specified, this loss also accepts this class index (this index may not necessarily be in the class range).
The unreduced (i.e. with reduction set to 'none') loss can be described as:
$$\ell(x, y) = L = \{l_1, \ldots, l_N\}^\top, \quad l_n = -w_{y_n} x_{n, y_n}, \quad w_c = \text{weight}[c] \cdot 1\{c \neq \text{ignore\_index}\},$$
where $x$ is the input, $y$ is the target, $w$ is the weight, and $N$ is the batch size. If reduction is not 'none' (default 'mean'), then
$$\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n}} l_n, & \text{if reduction} = \text{'mean'} \\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{'sum'} \end{cases}$$
Can also be used for higher dimension inputs, such as 2D images, by providing an input of size $(minibatch, C, d_1, d_2, \ldots, d_K)$ with $K \geq 1$, where $K$ is the number of dimensions, and a target of appropriate shape (see below). In the case of images, it computes NLL loss per-pixel.
Input: $(N, C)$ where C = number of classes, or $(N, C, d_1, d_2, \ldots, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
Target: $(N)$ where each value is $1 \leq \text{targets}[i] \leq C$, or $(N, d_1, d_2, \ldots, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
Output: scalar. If reduction is 'none', then the same size as the target: $(N)$, or $(N, d_1, d_2, \ldots, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
if (torch_is_installed()) {
  m <- nn_log_softmax(dim = 2)
  loss <- nn_nll_loss()
  # input is of size N x C = 3 x 5
  input <- torch_randn(3, 5, requires_grad = TRUE)
  # each element in target has to satisfy 1 <= value <= C
  target <- torch_tensor(c(2, 1, 5), dtype = torch_long())
  output <- loss(m(input), target)
  output$backward()

  # 2D loss example (used, for example, with image inputs)
  N <- 5
  C <- 4
  loss <- nn_nll_loss()
  # input is of size N x C x height x width
  data <- torch_randn(N, 16, 10, 10)
  conv <- nn_conv2d(16, C, c(3, 3))
  # the class scores are in the 2nd dimension (dimensions are 1-based in R)
  m <- nn_log_softmax(dim = 2)
  # each element in target has to satisfy 1 <= value <= C
  target <- torch_empty(N, 8, 8, dtype = torch_long())$random_(1, C)
  output <- loss(m(conv(data)), target)
  output$backward()
}
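To see what the 'mean' reduction does with class weights, the plain-R sketch below recomputes the loss from a matrix of log-probabilities. It assumes the per-sample loss is minus the weighted log-probability of the target class and that the mean divides by the sum of the weights used. `nll_ref` is a hypothetical helper, targets are 1-based, and ignore_index is not modelled.

```r
# Negative log likelihood with reduction = "mean" (illustrative helper, not a torch function).
nll_ref <- function(log_probs, target, weight = rep(1, ncol(log_probs))) {
  picked <- log_probs[cbind(seq_along(target), target)]  # log-probability of each target class
  w <- weight[target]                                    # weight of each target class
  sum(-w * picked) / sum(w)
}

logits <- matrix(c(1, 2, 3,
                   1, 2, 3), nrow = 2, byrow = TRUE)
log_probs <- logits - log(rowSums(exp(logits)))  # row-wise log-softmax
target <- c(3, 1)

loss_value <- nll_ref(log_probs, target)
weighted_value <- nll_ref(log_probs, target, weight = c(3, 1, 1))
```

Giving class 1 a weight of 3 pulls the weighted mean towards the second sample, whose target class 1 has the lowest log-probability.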
Computes the batchwise pairwise distance between vectors $v_1$, $v_2$ using the p-norm:
$$\Vert x \Vert_p = \left( \sum_{i=1}^n \vert x_i \vert^p \right)^{1/p}.$$
nn_pairwise_distance(p = 2, eps = 1e-06, keepdim = FALSE)
p | (real): the norm degree. Default: 2 |
eps | (float, optional): Small value to avoid division by zero. Default: 1e-6 |
keepdim | (bool, optional): Determines whether or not to keep the vector dimension. Default: FALSE |
Input1: $(N, D)$ where D = vector dimension
Input2: $(N, D)$, same shape as the Input1
Output: $(N)$. If keepdim is TRUE, then $(N, 1)$.
if (torch_is_installed()) {
  pdist <- nn_pairwise_distance(p = 2)
  input1 <- torch_randn(100, 128)
  input2 <- torch_randn(100, 128)
  output <- pdist(input1, input2)
}
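The row-wise nature of the computation is easy to reproduce in plain R. The sketch assumes that eps is added to the element-wise difference before the p-norm is taken, which is how the underlying library is understood to apply it; `pairwise_distance_ref` is a made-up helper name.

```r
# Row-wise p-norm distance between two N x D matrices (illustrative helper).
pairwise_distance_ref <- function(x1, x2, p = 2, eps = 1e-6) {
  rowSums(abs(x1 - x2 + eps)^p)^(1 / p)
}

a <- matrix(c(0, 0,
              3, 4), nrow = 2, byrow = TRUE)
b <- matrix(0, nrow = 2, ncol = 2)

d <- pairwise_distance_ref(a, b)         # one distance per row
d1 <- pairwise_distance_ref(a, b, p = 1) # Manhattan distance per row
```

The second row is the classic 3-4-5 triangle, so its Euclidean distance is 5 up to the small eps offset.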
nn_parameter
Indicates to nn_module that x is a parameter
nn_parameter(x, requires_grad = TRUE)
x | the tensor that you want to indicate as parameter |
requires_grad | whether this parameter should have requires_grad = TRUE |
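Negative log likelihood loss with Poisson distribution of target. The loss can be described as:
$$\text{target} \sim \text{Poisson}(\text{input})$$
$$\text{loss}(\text{input}, \text{target}) = \text{input} - \text{target} \times \log(\text{input}) + \log(\text{target}!)$$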
nn_poisson_nll_loss( log_input = TRUE, full = FALSE, eps = 1e-08, reduction = "mean" )
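log_input | (bool, optional): if TRUE the loss is computed as $\exp(\text{input}) - \text{target} \times \text{input}$, if FALSE the loss is $\text{input} - \text{target} \times \log(\text{input} + \text{eps})$. |
full | (bool, optional): whether to compute full loss, i.e. to add the Stirling approximation term $\text{target} \times \log(\text{target}) - \text{target} + 0.5 \times \log(2\pi \times \text{target})$. |
eps | (float, optional): Small value to avoid evaluation of $\log(0)$ when log_input = FALSE. Default: 1e-8 |
reduction | (string, optional): Specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. |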
The last term can be omitted or approximated with Stirling's formula. The approximation is used for target values greater than 1. For targets less than or equal to 1, zeros are added to the loss.
Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar by default. If reduction is 'none', then $(N, *)$, the same shape as the input
if (torch_is_installed()) {
  loss <- nn_poisson_nll_loss()
  log_input <- torch_randn(5, 2, requires_grad = TRUE)
  target <- torch_randn(5, 2)
  output <- loss(log_input, target)
  output$backward()
}
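The effect of log_input and full can be traced in plain R. The sketch below assumes the two loss forms exp(input) - target * input (log_input = TRUE) and input - target * log(input + eps) (log_input = FALSE), plus the Stirling term target * log(target) - target + 0.5 * log(2 * pi * target) for targets above 1. `poisson_nll_ref` is a hypothetical helper name.

```r
# Poisson negative log likelihood with reduction = "mean" (illustrative helper).
poisson_nll_ref <- function(input, target, log_input = TRUE, full = FALSE, eps = 1e-8) {
  if (log_input) {
    loss <- exp(input) - target * input        # input holds log-rates
  } else {
    loss <- input - target * log(input + eps)  # input holds rates
  }
  if (full) {
    stirling <- target * log(target) - target + 0.5 * log(2 * pi * target)
    loss <- loss + ifelse(target > 1, stirling, 0)  # only applied where target > 1
  }
  mean(loss)
}

rate <- c(1, 2)
target <- c(1, 2)
from_log <- poisson_nll_ref(log(rate), target)               # log-rates as input
from_rate <- poisson_nll_ref(rate, target, log_input = FALSE)
with_stirling <- poisson_nll_ref(log(rate), target, full = TRUE)
```

Both input conventions describe the same rates here, so the two losses agree up to the effect of eps.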
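Applies the element-wise function:
$$\text{PReLU}(x) = \max(0, x) + a \times \min(0, x)$$
or
$$\text{PReLU}(x) = \begin{cases} x, & \text{if } x \geq 0 \\ ax, & \text{otherwise} \end{cases}$$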
nn_prelu(num_parameters = 1, init = 0.25)
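num_parameters | (int): number of $a$ to learn. Although it takes an int as input, only two values are legitimate: 1, or the number of channels at input. Default: 1 |
init | (float): the initial value of $a$. Default: 0.25 |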
Here $a$ is a learnable parameter. When called without arguments, nn_prelu() uses a single parameter $a$ across all input channels. If called with nn_prelu(nChannels), a separate $a$ is used for each input channel.
Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input
weight (Tensor): the learnable weights of shape (num_parameters).
Weight decay should not be used when learning $a$ for good performance.
Channel dim is the 2nd dim of input. When input has dims < 2, then there is no channel dim and the number of channels = 1.
if (torch_is_installed()) {
  m <- nn_prelu()
  input <- torch_randn(2)
  output <- m(input)
}
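The difference between a shared slope and one slope per channel can be sketched in plain R, assuming PReLU(x) = max(0, x) + a * min(0, x) and treating the columns of a matrix as channels (the 2nd dimension). `prelu_ref` is a made-up helper, and the slopes are fixed numbers here rather than learned parameters.

```r
# PReLU on an N x C matrix (illustrative helper, not a torch function).
# `a` is either a single slope or one slope per column (channel).
prelu_ref <- function(x, a = 0.25) {
  pmax(x, 0) + sweep(pmin(x, 0), 2, a, "*")
}

x <- matrix(c(-1, -1,
               2, -4), nrow = 2, byrow = TRUE)

shared <- prelu_ref(x)                        # same slope for every channel
per_channel <- prelu_ref(x, a = c(0.1, 0.5))  # one slope per channel
```

Positive entries pass through unchanged in both cases; only the negative entries are scaled by the slope of their channel.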
Prune head_size last layers of a nn_module in order to replace them by your own head, or in order to use the pruned module as a sequential embedding module.
nn_prune_head(x, head_size)
x | nn_network to prune |
head_size | number of nn_layers to prune |
a nn_sequential network with the top nn_layer removed
if (torch_is_installed()) {
  x <- nn_sequential(
    nn_relu(),
    nn_tanh(),
    nn_relu6(),
    nn_relu(),
    nn_linear(2, 10),
    nn_batch_norm1d(10),
    nn_tanh(),
    nn_linear(10, 3)
  )
  prune <- nn_prune_head(x, 3)
  prune
}
Applies the rectified linear unit function element-wise:
$$\text{ReLU}(x) = (x)^+ = \max(0, x)$$
nn_relu(inplace = FALSE)
inplace | can optionally do the operation in-place. Default: FALSE |
Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input
if (torch_is_installed()) {
  m <- nn_relu()
  input <- torch_randn(2)
  m(input)
}
Applies the element-wise function:
$$\text{ReLU6}(x) = \min(\max(0, x), 6)$$
nn_relu6(inplace = FALSE)
inplace | can optionally do the operation in-place. Default: FALSE |
Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input
if (torch_is_installed()) {
  m <- nn_relu6()
  input <- torch_randn(2)
  output <- m(input)
}
Applies a multi-layer Elman RNN with $\tanh$ or $\text{ReLU}$ non-linearity to an input sequence.
nn_rnn( input_size, hidden_size, num_layers = 1, nonlinearity = NULL, bias = TRUE, batch_first = FALSE, dropout = 0, bidirectional = FALSE, ... )
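input_size | The number of expected features in the input x |
hidden_size | The number of features in the hidden state h |
num_layers | Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1 |
nonlinearity | The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh' |
bias | If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE |
batch_first | If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE |
dropout | If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0 |
bidirectional | If TRUE, becomes a bidirectional RNN. Default: FALSE |
... | other arguments that can be passed to the super class. |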
For each element in the input sequence, each layer computes the following function:
$$h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})$$
where $h_t$ is the hidden state at time t, $x_t$ is the input at time t, and $h_{(t-1)}$ is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then $\text{ReLU}$ is used instead of $\tanh$.
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a packed sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
Input1: $(L, N, H_{in})$ tensor containing input features where $H_{in} = \text{input\_size}$ and L represents a sequence length.
Input2: $(S, N, H_{out})$ tensor containing the initial hidden state for each element in the batch, where $H_{out} = \text{hidden\_size}$ and $S = \text{num\_layers} \times \text{num\_directions}$. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Output1: $(L, N, H_{all})$ where $H_{all} = \text{num\_directions} \times \text{hidden\_size}$
Output2: $(S, N, H_{out})$ tensor containing the next hidden state for each element in the batch
weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
All the weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{hidden\_size}}$
if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, 2)
  input <- torch_randn(5, 3, 10)
  h0 <- torch_randn(2, 3, 20)
  rnn(input, h0)
}
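The recurrence itself is small enough to write out in plain R. The sketch below assumes a single layer, a single direction and one sequence, with the update h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh) and weights drawn from a uniform distribution on (-sqrt(k), sqrt(k)) with k = 1 / hidden_size. All names (`elman_step`, `init_uniform`, `n_steps`) are invented for illustration.

```r
# One step of an Elman RNN cell (illustrative helper, not a torch function).
elman_step <- function(x_t, h_prev, w_ih, w_hh, b_ih, b_hh, nonlinearity = tanh) {
  as.vector(nonlinearity(w_ih %*% x_t + b_ih + w_hh %*% h_prev + b_hh))
}

set.seed(1)
input_size <- 3
hidden_size <- 2
n_steps <- 4

# Uniform initialisation on (-sqrt(k), sqrt(k)) with k = 1 / hidden_size
k <- 1 / hidden_size
init_uniform <- function(n) runif(n, -sqrt(k), sqrt(k))
w_ih <- matrix(init_uniform(hidden_size * input_size), hidden_size, input_size)
w_hh <- matrix(init_uniform(hidden_size * hidden_size), hidden_size, hidden_size)
b_ih <- init_uniform(hidden_size)
b_hh <- init_uniform(hidden_size)

x <- matrix(rnorm(n_steps * input_size), n_steps, input_size)  # one row per time step
h <- rep(0, hidden_size)                                       # initial hidden state
outputs <- matrix(NA_real_, n_steps, hidden_size)
for (t in seq_len(n_steps)) {
  h <- elman_step(x[t, ], h, w_ih, w_hh, b_ih, b_hh)
  outputs[t, ] <- h                                            # hidden state at time t
}
```

The rows of `outputs` play the role of the output tensor for one sequence, and the final `h` corresponds to h_n.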
Applies the randomized leaky rectified linear unit function, element-wise, as described in the paper:
nn_rrelu(lower = 1/8, upper = 1/3, inplace = FALSE)
lower |
lower bound of the uniform distribution. Default: 1/8 |
upper |
upper bound of the uniform distribution. Default: 1/3 |
inplace |
can optionally do the operation in-place. Default: FALSE |
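Empirical Evaluation of Rectified Activations in Convolutional Network.
The function is defined as:
\deqn{\mbox{RReLU}(x) = \left\{ \begin{array}{ll} x & \mbox{if } x \geq 0 \\ ax & \mbox{otherwise} \end{array} \right.}
where \eqn{a} is randomly sampled from the uniform distribution \eqn{\mathcal{U}(\mbox{lower}, \mbox{upper})}.
See: https://arxiv.org/pdf/1505.00853.pdf
Input: \eqn{(N, *)} where * means any number of additional dimensions
Output: \eqn{(N, *)}, same shape as the input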
if (torch_is_installed()) { m <- nn_rrelu(0.1, 0.3) input <- torch_randn(2) m(input) }
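Applied element-wise, as:
\deqn{\mbox{SELU}(x) = \mbox{scale} * (\max(0, x) + \min(0, \alpha * (\exp(x) - 1)))}
nn_selu(inplace = FALSE)
inplace |
(bool, optional): can optionally do the operation in-place. Default: FALSE |
with \eqn{\alpha = 1.6732632423543772848170429916717} and \eqn{\mbox{scale} = 1.0507009873554804934193349852946}.
More details can be found in the paper Self-Normalizing Neural Networks.
Input: \eqn{(N, *)} where * means any number of additional dimensions
Output: \eqn{(N, *)}, same shape as the input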
if (torch_is_installed()) { m <- nn_selu() input <- torch_randn(2) output <- m(input) }
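As a sanity check, the module output can be compared against a base R reference. This is a sketch only: `selu_ref` is a hypothetical helper written for this illustration, and the constants are the values of alpha and scale from the Self-Normalizing Neural Networks paper, truncated to double precision.

```r
library(torch)

# Constants from the Self-Normalizing Neural Networks paper.
alpha <- 1.6732632423543772
scale <- 1.0507009873554805

# Hypothetical reference implementation in base R.
selu_ref <- function(x) scale * (pmax(0, x) + pmin(0, alpha * (exp(x) - 1)))

x <- c(-2, -0.5, 0, 0.5, 2)
expected <- selu_ref(x)
actual <- as.numeric(as_array(nn_selu()(torch_tensor(x))))

# Largest absolute difference; expected to be tiny (float32 precision).
max(abs(expected - actual))
```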
A sequential container. Modules will be added to it in the order they are passed in the constructor. See examples.
nn_sequential(...)
... |
sequence of modules to be added |
if (torch_is_installed()) { model <- nn_sequential( nn_conv2d(1, 20, 5), nn_relu(), nn_conv2d(20, 64, 5), nn_relu() ) input <- torch_randn(32, 1, 28, 28) output <- model(input) }
Applies the element-wise function:
\deqn{\mbox{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + \exp(-x)}}
nn_sigmoid()
Input: \eqn{(N, *)} where * means any number of additional dimensions
Output: \eqn{(N, *)}, same shape as the input
if (torch_is_installed()) { m <- nn_sigmoid() input <- torch_randn(2) output <- m(input) }
Applies the Sigmoid Linear Unit (SiLU) function, element-wise. The SiLU function is also known as the swish function.
nn_silu(inplace = FALSE)
inplace |
can optionally do the operation in-place. Default: FALSE |
See Gaussian Error Linear Units (GELUs) where the SiLU (Sigmoid Linear Unit) was originally coined, and see Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning and Swish: a Self-Gated Activation Function where the SiLU was experimented with later.
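The text above does not spell out the formula. Assuming the usual definition silu(x) = x * sigmoid(x), as given in the cited papers, the module can be compared against a manual computation. This is a sketch rather than a statement about the internal implementation.

```r
library(torch)

x <- torch_randn(5)

y_module <- nn_silu()(x)
# Assumed definition: the input multiplied by its logistic sigmoid.
y_manual <- x * torch_sigmoid(x)

torch_allclose(y_module, y_manual, atol = 1e-6)
```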
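Creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise. It is less sensitive to outliers than the MSELoss and in some cases prevents exploding gradients (e.g. see the Fast R-CNN paper by Ross Girshick). Also known as the Huber loss:
\deqn{\mbox{loss}(x, y) = \frac{1}{n} \sum_{i} z_{i}}
nn_smooth_l1_loss(reduction = "mean")
reduction |
(string, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
where \eqn{z_{i}} is given by:
\deqn{z_{i} = \left\{ \begin{array}{ll} 0.5 (x_i - y_i)^2, & \mbox{if } |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5, & \mbox{otherwise} \end{array} \right.}
\eqn{x} and \eqn{y} can have arbitrary shapes with a total of \eqn{n} elements each. The sum operation still operates over all the elements, and divides by \eqn{n}. The division by \eqn{n} can be avoided by setting reduction = 'sum'.
Input: \eqn{(N, *)} where * means any number of additional dimensions
Target: \eqn{(N, *)}, same shape as the input
Output: scalar. If reduction is 'none', then \eqn{(N, *)}, same shape as the input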
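The piecewise definition can be reproduced in base R and compared with the criterion. A minimal sketch, assuming the default threshold of 1 described above; the input values are arbitrary.

```r
library(torch)

x <- c(0.0, 0.5, 3.0, -2.0)
y <- c(0.2, 0.0, 0.0, 0.0)

loss <- nn_smooth_l1_loss()(torch_tensor(x), torch_tensor(y))

# Reference: squared term below 1, L1 term (shifted by 0.5) otherwise.
d <- abs(x - y)
z <- ifelse(d < 1, 0.5 * d^2, d - 0.5)
expected <- mean(z)

c(torch = loss$item(), reference = expected)
```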
Creates a criterion that optimizes a two-class classification logistic loss between input tensor \eqn{x} and target tensor \eqn{y} (containing 1 or -1).
nn_soft_margin_loss(reduction = "mean")
reduction |
(string, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Input: \eqn{(*)} where * means any number of additional dimensions
Target: \eqn{(*)}, same shape as the input
Output: scalar. If reduction is 'none', then same shape as the input
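The formula is not shown above. Assuming the standard logistic form, in which each element contributes log(1 + exp(-y * x)) and the default reduction averages over elements, a quick comparison with base R looks like this. The values are illustrative.

```r
library(torch)

x <- c(0.8, -1.2, 0.3)   # raw scores
y <- c(1, -1, -1)        # targets must be 1 or -1

loss <- nn_soft_margin_loss()(torch_tensor(x), torch_tensor(y))

# Assumed definition: mean of log(1 + exp(-y * x)).
expected <- mean(log(1 + exp(-y * x)))

c(torch = loss$item(), reference = expected)
```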
Applies the Softmax function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1. Softmax is defined as:
\deqn{\mbox{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}}
nn_softmax(dim)
dim |
(int): A dimension along which Softmax will be computed (so every slice along dim will sum to 1). |
When the input Tensor is a sparse tensor then the unspecified values are treated as -Inf.
a Tensor of the same dimension and shape as the input with values in the range [0, 1]
Input: \eqn{(*)} where * means any number of additional dimensions
Output: \eqn{(*)}, same shape as the input
This module doesn't work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use LogSoftmax instead (it's faster and has better numerical properties).
if (torch_is_installed()) { m <- nn_softmax(1) input <- torch_randn(2, 3) output <- m(input) }
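The following sketch illustrates both points: slices along dim sum to 1, and log-probabilities for a negative log-likelihood loss are better taken from nn_log_softmax() directly. It assumes 1-based dimension and class indices, as used elsewhere in torch for R; the target values are made up.

```r
library(torch)

input <- torch_randn(2, 3)

# dim = 2 normalises across the 3 columns, so each row sums to 1.
output <- nn_softmax(dim = 2)(input)
row_sums <- output$sum(dim = 2)

# Log-probabilities computed in one step, then passed to the NLL loss.
log_probs <- nn_log_softmax(dim = 2)(input)
target <- torch_tensor(c(1, 3), dtype = torch_long())  # 1-based class indices
loss <- nn_nll_loss()(log_probs, target)

row_sums
loss
```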
Applies SoftMax over features to each spatial location. When given an image of Channels x Height x Width, it will apply Softmax to each location \eqn{(Channels, h_i, w_j)}
nn_softmax2d()
a Tensor of the same dimension and shape as the input with values in the range [0, 1]
Input: \eqn{(N, C, H, W)}
Output: \eqn{(N, C, H, W)} (same shape as input)
if (torch_is_installed()) { m <- nn_softmax2d() input <- torch_randn(2, 3, 12, 13) output <- m(input) }
Applies the Softmin function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1. Softmin is defined as:
\deqn{\mbox{Softmin}(x_{i}) = \frac{\exp(-x_i)}{\sum_j \exp(-x_j)}}
nn_softmin(dim)
dim |
(int): A dimension along which Softmin will be computed (so every slice along dim will sum to 1). |
a Tensor of the same dimension and shape as the input, with values in the range [0, 1].
Input: \eqn{(*)} where * means any number of additional dimensions
Output: \eqn{(*)}, same shape as the input
if (torch_is_installed()) { m <- nn_softmin(dim = 1) input <- torch_randn(2, 2) output <- m(input) }
Applies the element-wise function:
\deqn{\mbox{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))}
nn_softplus(beta = 1, threshold = 20)
beta |
the \eqn{\beta} value for the Softplus formulation. Default: 1 |
threshold |
values above this revert to a linear function. Default: 20 |
SoftPlus is a smooth approximation to the ReLU function and can be used to constrain the output of a machine to always be positive. For numerical stability the implementation reverts to the linear function when \eqn{input * \beta > threshold}.
Input: \eqn{(N, *)} where * means any number of additional dimensions
Output: \eqn{(N, *)}, same shape as the input
if (torch_is_installed()) { m <- nn_softplus() input <- torch_randn(2) output <- m(input) }
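Applies the soft shrinkage function elementwise:
\deqn{\mbox{SoftShrinkage}(x) = \left\{ \begin{array}{ll} x - \lambda, & \mbox{if } x > \lambda \\ x + \lambda, & \mbox{if } x < -\lambda \\ 0, & \mbox{otherwise} \end{array} \right.}
nn_softshrink(lambd = 0.5)
lambd |
the \eqn{\lambda} (must be no less than zero) value for the Softshrink formulation. Default: 0.5 |
Input: \eqn{(N, *)} where * means any number of additional dimensions
Output: \eqn{(N, *)}, same shape as the input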
if (torch_is_installed()) { m <- nn_softshrink() input <- torch_randn(2) output <- m(input) }
Applies the element-wise function:
\deqn{\mbox{SoftSign}(x) = \frac{x}{1 + |x|}}
nn_softsign()
Input: \eqn{(N, *)} where * means any number of additional dimensions
Output: \eqn{(N, *)}, same shape as the input
if (torch_is_installed()) { m <- nn_softsign() input <- torch_randn(2) output <- m(input) }
Applies the element-wise function:
\deqn{\mbox{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}}
nn_tanh()
Input: \eqn{(N, *)} where * means any number of additional dimensions
Output: \eqn{(N, *)}, same shape as the input
if (torch_is_installed()) { m <- nn_tanh() input <- torch_randn(2) output <- m(input) }
Applies the element-wise function:
\deqn{\mbox{Tanhshrink}(x) = x - \tanh(x)}
nn_tanhshrink()
Input: \eqn{(N, *)} where * means any number of additional dimensions
Output: \eqn{(N, *)}, same shape as the input
if (torch_is_installed()) { m <- nn_tanhshrink() input <- torch_randn(2) output <- m(input) }
Thresholds each element of the input Tensor.
nn_threshold(threshold, value, inplace = FALSE)
threshold |
The value to threshold at |
value |
The value to replace with |
inplace |
can optionally do the operation in-place. Default: FALSE |
Threshold is defined as:
\deqn{y = \left\{ \begin{array}{ll} x, & \mbox{if } x > \mbox{threshold} \\ \mbox{value}, & \mbox{otherwise} \end{array} \right.}
Input: \eqn{(N, *)} where * means any number of additional dimensions
Output: \eqn{(N, *)}, same shape as the input
if (torch_is_installed()) { m <- nn_threshold(0.1, 20) input <- torch_randn(2) output <- m(input) }
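With a fixed input the effect is easier to see than with random values. A small sketch, assuming the usual rule that elements strictly greater than threshold pass through and all others are replaced by value:

```r
library(torch)

m <- nn_threshold(threshold = 0.1, value = 20)
input <- torch_tensor(c(-1, 0.05, 0.5, 2))

output <- as.numeric(as_array(m(input)))
output
# The first two elements are not above 0.1 and become 20;
# 0.5 and 2 are kept unchanged.
```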
Creates a criterion that measures the triplet loss given input tensors \eqn{x1}, \eqn{x2}, \eqn{x3} and a margin with a value greater than \eqn{0}. This is used for measuring a relative similarity between samples. A triplet is composed of a, p and n (i.e., anchor, positive examples and negative examples respectively). The shapes of all input tensors should be \eqn{(N, D)}.
nn_triplet_margin_loss( margin = 1, p = 2, eps = 1e-06, swap = FALSE, reduction = "mean" )
margin |
(float, optional): Default: 1 |
p |
(int, optional): The norm degree for pairwise distance. Default: 2 |
eps |
constant to avoid NaN's |
swap |
(bool, optional): The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. Default: FALSE |
reduction |
(string, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al.
The loss function for each sample in the mini-batch is:
\deqn{L(a, p, n) = \max \{d(a_i, p_i) - d(a_i, n_i) + \mbox{margin}, 0\}}
where
\deqn{d(x_i, y_i) = \left\lVert x_i - y_i \right\rVert_p}
See also nn_triplet_margin_with_distance_loss(), which computes the triplet margin loss for input tensors using a custom distance function.
Input: \eqn{(N, D)} where \eqn{D} is the vector dimension.
Output: A Tensor of shape \eqn{(N)} if reduction is 'none', or a scalar otherwise.
if (torch_is_installed()) { triplet_loss <- nn_triplet_margin_loss(margin = 1, p = 2) anchor <- torch_randn(100, 128, requires_grad = TRUE) positive <- torch_randn(100, 128, requires_grad = TRUE) negative <- torch_randn(100, 128, requires_grad = TRUE) output <- triplet_loss(anchor, positive, negative) output$backward() }
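The per-sample loss can also be rebuilt from pairwise distances. The sketch below assumes that nnf_pairwise_distance() with its default eps matches the distance used inside the criterion; the tensor sizes are arbitrary.

```r
library(torch)

anchor <- torch_randn(4, 8)
positive <- torch_randn(4, 8)
negative <- torch_randn(4, 8)

# reduction = "none" returns one loss value per sample.
criterion <- nn_triplet_margin_loss(margin = 1, p = 2, reduction = "none")
loss <- criterion(anchor, positive, negative)

# Manual version: max(d(a, p) - d(a, n) + margin, 0) for each row.
d_ap <- nnf_pairwise_distance(anchor, positive, p = 2)
d_an <- nnf_pairwise_distance(anchor, negative, p = 2)
manual <- torch_clamp(d_ap - d_an + 1, min = 0)

torch_allclose(loss, manual, atol = 1e-5)
```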
Creates a criterion that measures the triplet loss given input tensors \eqn{a}, \eqn{p}, and \eqn{n} (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function ("distance function") used to compute the relationship between the anchor and positive example ("positive distance") and the anchor and negative example ("negative distance").
nn_triplet_margin_with_distance_loss( distance_function = NULL, margin = 1, swap = FALSE, reduction = "mean" )
distance_function |
(callable, optional): A nonnegative, real-valued function that quantifies the closeness of two tensors. If not specified, nn_pairwise_distance() will be used. Default: NULL |
margin |
(float, optional): A non-negative margin representing the minimum difference between the positive and negative distances required for the loss to be 0. Larger margins penalize cases where the negative examples are not distant enough from the anchors, relative to the positives. Default: 1 |
swap |
(bool, optional): Whether to use the distance swap described in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. If TRUE, and if the positive example is closer to the negative example than the anchor is, swaps the positive example and the anchor in the loss computation. Default: FALSE |
reduction |
(string, optional): Specifies the (optional) reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
The unreduced loss (i.e., with reduction set to 'none') can be described as:
\deqn{\ell(a, p, n) = L = \{l_1, \dots, l_N\}^\top, \quad l_i = \max \{d(a_i, p_i) - d(a_i, n_i) + \mbox{margin}, 0\}}
where \eqn{N} is the batch size; \eqn{d} is a nonnegative, real-valued function quantifying the closeness of two tensors, referred to as the distance_function; and \eqn{\mbox{margin}} is a non-negative margin representing the minimum difference between the positive and negative distances that is required for the loss to be 0. The input tensors have \eqn{N} elements each and can be of any shape that the distance function can handle.
If reduction is not 'none' (default 'mean'), then:
\deqn{\ell(x, y) = \left\{ \begin{array}{ll} \mbox{mean}(L), & \mbox{if reduction} = \mbox{'mean'} \\ \mbox{sum}(L), & \mbox{if reduction} = \mbox{'sum'} \end{array} \right.}
See also nn_triplet_margin_loss(), which computes the triplet loss for input tensors using the \eqn{l_p} distance as the distance function.
Input: \eqn{(N, *)} where \eqn{*} represents any number of additional dimensions as supported by the distance function.
Output: A Tensor of shape \eqn{(N)} if reduction is 'none', or a scalar otherwise.
if (torch_is_installed()) { # Initialize embeddings embedding <- nn_embedding(1000, 128) anchor_ids <- torch_randint(1, 1000, 1, dtype = torch_long()) positive_ids <- torch_randint(1, 1000, 1, dtype = torch_long()) negative_ids <- torch_randint(1, 1000, 1, dtype = torch_long()) anchor <- embedding(anchor_ids) positive <- embedding(positive_ids) negative <- embedding(negative_ids) # Built-in Distance Function triplet_loss <- nn_triplet_margin_with_distance_loss( distance_function = nn_pairwise_distance() ) output <- triplet_loss(anchor, positive, negative) # Custom Distance Function l_infinity <- function(x1, x2) { torch_max(torch_abs(x1 - x2), dim = 1)[[1]] } triplet_loss <- nn_triplet_margin_with_distance_loss( distance_function = l_infinity, margin = 1.5 ) output <- triplet_loss(anchor, positive, negative) # Custom Distance Function (Lambda) triplet_loss <- nn_triplet_margin_with_distance_loss( distance_function = function(x, y) { 1 - nnf_cosine_similarity(x, y) } ) output <- triplet_loss(anchor, positive, negative) }
Unflattens a tensor dim, expanding it to a desired shape. For use with nn_sequential.
nn_unflatten(dim, unflattened_size)
dim |
Dimension to be unflattened |
unflattened_size |
New shape of the unflattened dimension |
if (torch_is_installed()) { input <- torch_randn(2, 50) m <- nn_sequential( nn_linear(50, 50), nn_unflatten(2, c(2, 5, 5)) ) output <- m(input) output$size() }
Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data. The input data is assumed to be of the form minibatch x channels x [optional depth] x [optional height] x width. Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor.
nn_upsample( size = NULL, scale_factor = NULL, mode = "nearest", align_corners = NULL )
size |
(int or |
scale_factor |
(float or |
mode |
(str, optional): the upsampling algorithm: one of 'nearest', 'linear', 'bilinear', 'bicubic' and 'trilinear'. Default: 'nearest' |
align_corners |
(bool, optional): if |
The algorithms available for upsampling are nearest neighbor and linear, bilinear, bicubic and trilinear for 3D, 4D and 5D input Tensor, respectively.
One can either give a scale_factor or the target output size to calculate the output size. (You cannot give both, as it is ambiguous)
if (torch_is_installed()) { input <- torch_arange(start = 1, end = 4, dtype = torch_float())$view(c(1, 1, 2, 2)) nn_upsample(scale_factor = c(2), mode = "nearest")(input) nn_upsample(scale_factor = c(2, 2), mode = "nearest")(input) }
The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.
nn_utils_clip_grad_norm_(parameters, max_norm, norm_type = 2)
parameters |
(Iterable(Tensor) or Tensor): an iterable of Tensors or a single Tensor that will have gradients normalized |
max_norm |
(float or int): max norm of the gradients |
norm_type |
(float or int): type of the used p-norm. Can be Inf for infinity norm. |
Total norm of the parameters (viewed as a single vector).
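A typical use is to clip right after the backward pass and before the optimizer step. The sketch below uses a toy linear model and a hypothetical helper `grad_norm()` to show the combined gradient norm before and after clipping; the scaling by 100 only serves to produce large gradients.

```r
library(torch)

model <- nn_linear(10, 1)
x <- torch_randn(16, 10)
y <- torch_randn(16, 1)

# Scale the prediction to provoke large gradients.
loss <- nnf_mse_loss(model(x) * 100, y)
loss$backward()

# Hypothetical helper: 2-norm over all gradients, viewed as one vector.
grad_norm <- function(params) {
  sqrt(sum(sapply(params, function(p) p$grad$pow(2)$sum()$item())))
}

before <- grad_norm(model$parameters)
nn_utils_clip_grad_norm_(model$parameters, max_norm = 1)
after <- grad_norm(model$parameters)

c(before = before, after = after)
```

After clipping, the combined norm should not exceed max_norm (up to floating point error), while the direction of the gradient is preserved.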
Gradients are modified in-place.
nn_utils_clip_grad_value_(parameters, clip_value)
parameters |
(Iterable(Tensor) or Tensor): an iterable of Tensors or a single Tensor that will have gradients normalized |
clip_value |
(float or int): maximum allowed value of the gradients. |
The gradients are clipped in the range \eqn{\left[\mbox{-clip\_value}, \mbox{clip\_value}\right]}
input can be of size T x B x * where T is the length of the longest sequence (equal to lengths[1]), B is the batch size, and * is any number of dimensions (including 0). If batch_first is TRUE, B x T x * input is expected.
nn_utils_rnn_pack_padded_sequence( input, lengths, batch_first = FALSE, enforce_sorted = TRUE )
input |
(Tensor): padded batch of variable length sequences. |
lengths |
(Tensor): list of sequence lengths of each batch element. |
batch_first |
(bool, optional): if TRUE, the input is expected in B x T x * format. |
enforce_sorted |
(bool, optional): if TRUE, the input is expected to contain sequences sorted by length in a decreasing order. Default: TRUE. |
For unsorted sequences, use enforce_sorted = FALSE. If enforce_sorted is TRUE, the sequences should be sorted by length in a decreasing order, i.e. input[,1] should be the longest sequence, and input[,B] the shortest one. enforce_sorted = TRUE is only necessary for ONNX export.
a PackedSequence object
This function accepts any input that has at least two dimensions. You can apply it to pack the labels, and use the output of the RNN with them to compute the loss directly. A Tensor can be retrieved from a PackedSequence object by accessing its .data attribute.
sequences should be a list of Tensors of size L x *, where L is the length of a sequence and * is any number of trailing dimensions, including zero.
nn_utils_rnn_pack_sequence(sequences, enforce_sorted = TRUE)
sequences |
(list of Tensor): a list of sequences of decreasing length. |
enforce_sorted |
(bool, optional): if TRUE, checks that the input contains sequences sorted by length in a decreasing order. Default: TRUE. |
For unsorted sequences, use enforce_sorted = FALSE. If enforce_sorted is TRUE, the sequences should be sorted in the order of decreasing length. enforce_sorted = TRUE is only necessary for ONNX export.
a PackedSequence object
if (torch_is_installed()) { x <- torch_tensor(c(1, 2, 3), dtype = torch_long()) y <- torch_tensor(c(4, 5), dtype = torch_long()) z <- torch_tensor(c(6), dtype = torch_long()) p <- nn_utils_rnn_pack_sequence(list(x, y, z)) }
It is an inverse operation to nn_utils_rnn_pack_padded_sequence().
nn_utils_rnn_pad_packed_sequence( sequence, batch_first = FALSE, padding_value = 0, total_length = NULL )
sequence |
(PackedSequence): batch to pad |
batch_first |
(bool, optional): if TRUE, the output will be in B x T x * format. |
padding_value |
(float, optional): values for padded elements. |
total_length |
(int, optional): if not NULL, the output will be padded to have length total_length. |
The returned Tensor's data will be of size T x B x *, where T is the length of the longest sequence and B is the batch size. If batch_first is TRUE, the data will be transposed into B x T x * format.
Tuple of Tensor containing the padded sequence, and a Tensor containing the list of lengths of each sequence in the batch.
Batch elements will be re-ordered as they were ordered originally when the batch was passed to nn_utils_rnn_pack_padded_sequence() or nn_utils_rnn_pack_sequence().
total_length is useful to implement the pack sequence -> recurrent network -> unpack sequence pattern in a nn_module wrapped in torch.nn.DataParallel.
if (torch_is_installed()) { seq <- torch_tensor(rbind(c(1, 2, 0), c(3, 0, 0), c(4, 5, 6))) lens <- c(2, 1, 3) packed <- nn_utils_rnn_pack_padded_sequence(seq, lens, batch_first = TRUE, enforce_sorted = FALSE ) packed nn_utils_rnn_pad_packed_sequence(packed, batch_first = TRUE) }
Pads a list of variable length Tensors with padding_value.
pad_sequence stacks a list of Tensors along a new dimension, and pads them to equal length. For example, if the input is a list of sequences with size L x *, the output is of size T x B x * if batch_first is FALSE, and B x T x * otherwise.
nn_utils_rnn_pad_sequence(sequences, batch_first = FALSE, padding_value = 0)
sequences |
(list of Tensor): list of variable length sequences. |
batch_first |
(bool, optional): output will be in B x T x * if TRUE, or in T x B x * otherwise |
padding_value |
(float, optional): value for padded elements. Default: 0. |
B is batch size. It is equal to the number of elements in sequences.
T is length of the longest sequence.
L is length of the sequence.
* is any number of trailing dimensions, including none.
Tensor of size T x B x * if batch_first is FALSE. Tensor of size B x T x * otherwise
This function returns a Tensor of size T x B x * or B x T x * where T is the length of the longest sequence. This function assumes trailing dimensions and type of all the Tensors in sequences are same.
if (torch_is_installed()) { a <- torch_ones(25, 300) b <- torch_ones(22, 300) c <- torch_ones(15, 300) nn_utils_rnn_pad_sequence(list(a, b, c))$size() }
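To see the batch-first layout and the padding value in action, here is a small sketch with short sequences. The names `s1`, `s2`, `s3` and the padding value of -1 are arbitrary choices for illustration.

```r
library(torch)

s1 <- torch_ones(4, 2)   # L = 4
s2 <- torch_ones(2, 2)   # L = 2
s3 <- torch_ones(1, 2)   # L = 1

padded <- nn_utils_rnn_pad_sequence(
  list(s1, s2, s3),
  batch_first = TRUE,
  padding_value = -1
)

padded$size()            # B x T x * = 3 x 4 x 2
padded[2, , ]            # second sequence: two real steps, two padded steps
```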
Applies weight normalization to a parameter in the given module.
\eqn{\mathbf{w} = g \dfrac{\mathbf{v}}{\|\mathbf{v}\|}}
Weight normalization is a reparameterization that decouples the magnitude of a weight tensor from its direction. This replaces the parameter specified by name (e.g. 'weight') with two parameters: one specifying the magnitude (e.g. 'weight_g') and one specifying the direction (e.g. 'weight_v').
The original module with the weight_v and weight_g parameters.
new()
nn_utils_weight_norm$new(name, dim)
name
(str, optional): name of weight parameter
dim
(int, optional): dimension over which to compute the norm
compute_weight()
nn_utils_weight_norm$compute_weight(module, name = NULL, dim = NULL)
module
(Module): containing module
name
(str, optional): name of weight parameter
dim
(int, optional): dimension over which to compute the norm
apply()
nn_utils_weight_norm$apply(module, name = NULL, dim = NULL)
module
(Module): containing module
name
(str, optional): name of weight parameter
dim
(int, optional): dimension over which to compute the norm
call()
nn_utils_weight_norm$call(module)
module
(Module): containing module
recompute()
nn_utils_weight_norm$recompute(module)
module
(Module): containing module
remove()
nn_utils_weight_norm$remove(module, name = NULL)
module
(Module): containing module
name
(str, optional): name of weight parameter
clone()
The objects of this class are cloneable with this method.
nn_utils_weight_norm$clone(deep = FALSE)
deep
Whether to make a deep clone.
The PyTorch weight normalization is implemented via a hook that recomputes the weight tensor from the magnitude and direction before every forward() call. Since torch for R still does not support hooks, the weight recomputation needs to be done explicitly inside the forward() definition through a call of the recompute() method. See examples.
By default, with dim = 0, the norm is computed independently per output channel/plane. To compute a norm over the entire weight tensor, use dim = NULL.
References: https://arxiv.org/abs/1602.07868
if (torch_is_installed()) { x = nn_linear(in_features = 20, out_features = 40) weight_norm = nn_utils_weight_norm$new(name = 'weight', dim = 2) weight_norm$apply(x) x$weight_g$size() x$weight_v$size() x$weight # the recompute() method recomputes the weight using g and v. It must be called # explicitly inside `forward()`. weight_norm$recompute(x) }
Applies a 1D adaptive average pooling over an input signal composed of several input planes.
nnf_adaptive_avg_pool1d(input, output_size)
input |
input tensor of shape (minibatch , in_channels , iW) |
output_size |
the target output size (single integer) |
Applies a 2D adaptive average pooling over an input signal composed of several input planes.
nnf_adaptive_avg_pool2d(input, output_size)
input |
input tensor (minibatch, in_channels , iH , iW) |
output_size |
the target output size (single integer or double-integer tuple) |
Applies a 3D adaptive average pooling over an input signal composed of several input planes.
nnf_adaptive_avg_pool3d(input, output_size)
input |
input tensor (minibatch, in_channels , iT , iH , iW) |
output_size |
the target output size (single integer or triple-integer tuple) |
Applies a 1D adaptive max pooling over an input signal composed of several input planes.
nnf_adaptive_max_pool1d(input, output_size, return_indices = FALSE)
input |
input tensor of shape (minibatch , in_channels , iW) |
output_size |
the target output size (single integer) |
return_indices |
whether to return pooling indices. Default: FALSE |
Applies a 2D adaptive max pooling over an input signal composed of several input planes.
nnf_adaptive_max_pool2d(input, output_size, return_indices = FALSE)
input |
input tensor (minibatch, in_channels , iH , iW) |
output_size |
the target output size (single integer or double-integer tuple) |
return_indices |
whether to return pooling indices. Default: FALSE |
Applies a 3D adaptive max pooling over an input signal composed of several input planes.
nnf_adaptive_max_pool3d(input, output_size, return_indices = FALSE)
input |
input tensor (minibatch, in_channels , iT , iH , iW) |
output_size |
the target output size (single integer or triple-integer tuple) |
return_indices |
whether to return pooling indices. Default: FALSE |
Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta.
nnf_affine_grid(theta, size, align_corners = FALSE)
theta |
(Tensor) input batch of affine matrices with shape (N x 2 x 3) for 2D or (N x 3 x 4) for 3D |
size |
(torch.Size) the target output image size. (N x C x H x W for 2D or N x C x D x H x W for 3D) |
align_corners |
(bool, optional) if TRUE, consider -1 and 1 to refer to the centers of the corner pixels rather than the image corners. Default: FALSE |
This function is often used in conjunction with nnf_grid_sample() to build Spatial Transformer Networks.
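A quick way to check the interplay with nnf_grid_sample() is the identity transform, which should reproduce the input. This is a sketch under the assumption that size can be passed as a plain numeric vector and that both functions are called with the same align_corners setting.

```r
library(torch)

input <- torch_randn(1, 1, 4, 4)

# Identity affine matrix with shape (N, 2, 3) = (1, 2, 3).
theta <- torch_tensor(
  matrix(c(1, 0, 0,
           0, 1, 0), nrow = 2, byrow = TRUE)
)$unsqueeze(1)

grid <- nnf_affine_grid(theta, size = c(1, 1, 4, 4), align_corners = FALSE)
output <- nnf_grid_sample(input, grid, align_corners = FALSE)

grid$size()                                  # (N, H, W, 2)
torch_allclose(input, output, atol = 1e-5)   # identity warp keeps the image
```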
Applies alpha dropout to the input.
nnf_alpha_dropout(input, p = 0.5, training = FALSE, inplace = FALSE)
input |
the input tensor |
p |
probability of an element to be zeroed. Default: 0.5 |
training |
apply dropout if TRUE. Default: FALSE |
inplace |
If set to TRUE, will do this operation in-place. Default: FALSE |
Applies a 1D average pooling over an input signal composed of several input planes.
nnf_avg_pool1d( input, kernel_size, stride = NULL, padding = 0, ceil_mode = FALSE, count_include_pad = TRUE )
input |
input tensor of shape (minibatch , in_channels , iW) |
kernel_size |
the size of the window. Can be a single number or a tuple (kW,). |
stride |
the stride of the window. Can be a single number or a tuple (sW,). Default: kernel_size |
padding |
implicit zero paddings on both sides of the input. Can be a single number or a tuple (padW,). Default: 0 |
ceil_mode |
when TRUE, will use ceil instead of floor to compute the output shape. Default: FALSE |
count_include_pad |
when TRUE, will include the zero-padding in the averaging calculation. Default: TRUE |
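A small worked case makes the window and stride arguments concrete. With the values 1 to 7, a window of 3 and a stride of 2, the windows are (1, 2, 3), (3, 4, 5) and (5, 6, 7):

```r
library(torch)

# Shape (minibatch, in_channels, iW) = (1, 1, 7).
input <- torch_tensor(array(as.numeric(1:7), dim = c(1, 1, 7)))

output <- nnf_avg_pool1d(input, kernel_size = 3, stride = 2)

as.numeric(as_array(output))
# Window means: 2 4 6
```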
Applies 2D average-pooling operation in \eqn{kH * kW} regions by step size \eqn{sH * sW} steps. The number of output features is equal to the number of input planes.
nnf_avg_pool2d( input, kernel_size, stride = NULL, padding = 0, ceil_mode = FALSE, count_include_pad = TRUE, divisor_override = NULL )
input |
input tensor (minibatch, in_channels , iH , iW) |
kernel_size |
size of the pooling region. Can be a single number or a tuple (kH, kW) |
stride |
stride of the pooling operation. Can be a single number or a tuple (sH, sW). Default: kernel_size |
padding |
implicit zero paddings on both sides of the input. Can be a single number or a tuple (padH, padW). Default: 0 |
ceil_mode |
when TRUE, will use ceil instead of floor in the formula to compute the output shape. Default: FALSE |
count_include_pad |
when TRUE, will include the zero-padding in the averaging calculation. Default: TRUE |
divisor_override |
if specified, it will be used as divisor, otherwise size of the pooling region will be used. Default: NULL |
Applies 3D average-pooling operation in \eqn{kT * kH * kW} regions by step size \eqn{sT * sH * sW} steps. The number of output features is equal to \eqn{\lfloor \frac{\mbox{input planes}}{sT} \rfloor}.
nnf_avg_pool3d( input, kernel_size, stride = NULL, padding = 0, ceil_mode = FALSE, count_include_pad = TRUE, divisor_override = NULL )
input |
input tensor (minibatch, in_channels , iT , iH , iW) |
kernel_size |
size of the pooling region. Can be a single number or a tuple (kT, kH, kW) |
stride |
stride of the pooling operation. Can be a single number or a tuple (sT, sH, sW). Default: kernel_size |
padding |
implicit zero paddings on both sides of the input. Can be a single number or a tuple (padT, padH, padW). Default: 0 |
ceil_mode |
when TRUE, will use ceil instead of floor in the formula to compute the output shape |
count_include_pad |
when TRUE, will include the zero-padding in the averaging calculation |
divisor_override |
if specified, it will be used as divisor, otherwise size of the pooling region will be used. Default: NULL |
Applies Batch Normalization for each channel across a batch of data.
nnf_batch_norm( input, running_mean, running_var, weight = NULL, bias = NULL, training = FALSE, momentum = 0.1, eps = 1e-05 )
nnf_batch_norm( input, running_mean, running_var, weight = NULL, bias = NULL, training = FALSE, momentum = 0.1, eps = 1e-05 )
input |
input tensor |
running_mean |
the running_mean tensor |
running_var |
the running_var tensor |
weight |
the weight tensor |
bias |
the bias tensor |
training |
(bool) whether the function runs in training mode. Default: FALSE |
momentum |
the value used for the running_mean and running_var computation. Default: 0.1 |
eps |
a value added to the denominator for numerical stability. Default: 1e-5 |
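The following sketch shows one way to call nnf_batch_norm() in training mode. The batch size, channel count and variable names are illustrative assumptions; the running buffers are expected to be updated in place by the call.

```r
library(torch)

if (torch_is_installed()) {
  # Batch of 8 samples with 3 channels, shifted and scaled away from N(0, 1)
  x <- torch_randn(8, 3) * 5 + 2
  running_mean <- torch_zeros(3)
  running_var <- torch_ones(3)

  # training = TRUE normalizes with the statistics of the current batch
  y <- nnf_batch_norm(x, running_mean, running_var,
                      training = TRUE, momentum = 0.1)

  # Per-channel means of the normalized output should be close to zero
  col_means <- as_array(y$mean(dim = 1))
  col_means
}
```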
Applies a bilinear transformation to the incoming data: y = x1^T A x2 + b
nnf_bilinear(input1, input2, weight, bias = NULL)
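input1 |
(N, *, H_in1) where H_in1 = in1_features and * means any number of additional dimensions. All but the last dimension of the inputs should be the same. |
input2 |
(N, *, H_in2) where H_in2 = in2_features |
weight |
(out_features, in1_features, in2_features) |
bias |
(out_features) |
output (N, *, H_out) where H_out = out_features
and all but the last dimension are the same shape as the input.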
Function that measures the Binary Cross Entropy between the target and the output.
nnf_binary_cross_entropy( input, target, weight = NULL, reduction = c("mean", "sum", "none") )
input |
tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*) , same shape as the input |
weight |
(tensor) weight for each value. |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Function that measures Binary Cross Entropy between target and output logits.
nnf_binary_cross_entropy_with_logits( input, target, weight = NULL, reduction = c("mean", "sum", "none"), pos_weight = NULL )
input |
Tensor of arbitrary shape |
target |
Tensor of the same shape as input |
weight |
(Tensor, optional) a manual rescaling weight. If provided, it is repeated to match the input tensor shape. |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
pos_weight |
(Tensor, optional) a weight of positive examples. Must be a vector with length equal to the number of classes. |
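A minimal sketch, with made-up logits and targets, of how the logits variant relates to nnf_binary_cross_entropy(): applying a sigmoid first and then the plain binary cross entropy should give the same loss up to floating point error.

```r
library(torch)

if (torch_is_installed()) {
  logits <- torch_tensor(c(-2, 0, 3))   # raw scores, not probabilities
  target <- torch_tensor(c(0, 1, 1))

  # Loss computed directly from logits
  from_logits <- nnf_binary_cross_entropy_with_logits(logits, target)

  # Same loss computed in two steps: sigmoid, then binary cross entropy
  from_probs <- nnf_binary_cross_entropy(torch_sigmoid(logits), target)

  bce_match <- torch_allclose(from_logits, from_probs, atol = 1e-6)
  bce_match
}
```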
Applies element-wise, CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)).
nnf_celu(input, alpha = 1, inplace = FALSE) nnf_celu_(input, alpha = 1)
input |
(N,*) tensor, where * means, any number of additional dimensions |
alpha |
the alpha value for the CELU formulation. Default: 1.0 |
inplace |
can optionally do the operation in-place. Default: FALSE |
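To make the role of alpha concrete, the sketch below compares nnf_celu() with a manual evaluation of the usual CELU definition, max(0, x) + min(0, alpha * (exp(x / alpha) - 1)). The input values and the choice alpha = 2 are arbitrary.

```r
library(torch)

if (torch_is_installed()) {
  x <- torch_tensor(c(-3, -1, 0, 2))
  alpha <- 2

  y <- nnf_celu(x, alpha = alpha)

  # Manual evaluation: positive part plus the clamped exponential branch
  manual <- torch_clamp(x, min = 0) +
    torch_clamp(alpha * (torch_exp(x / alpha) - 1), max = 0)

  celu_match <- torch_allclose(y, manual, atol = 1e-6)
  celu_match
}
```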
Applies the SparseMax activation.
nnf_contrib_sparsemax(input, dim = -1)
input |
the input tensor |
dim |
The dimension over which to apply the sparsemax function. Default: -1 |
The SparseMax activation is described in 'From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification'. The implementation is based on aced125/sparsemax.
Applies a 1-dimensional sequence convolution over an input sequence. Input and output dimensions are (Time, Batch, Channels) - hence TBC.
nnf_conv_tbc(input, weight, bias, pad = 0)
input |
input tensor of shape (sequence length × batch × in_channels) |
weight |
filter of shape (kernel width × in_channels × out_channels) |
bias |
bias of shape (out_channels) |
pad |
number of timesteps to pad. Default: 0 |
Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution".
nnf_conv_transpose1d( input, weight, bias = NULL, stride = 1, padding = 0, output_padding = 0, groups = 1, dilation = 1 )
input |
input tensor of shape (minibatch, in_channels , iW) |
weight |
filters of shape (out_channels, in_channels/groups , kW) |
bias |
optional bias of shape (out_channels). Default: NULL |
stride |
the stride of the convolving kernel. Can be a single number or
a one-element tuple (sW). Default: 1 |
padding |
implicit paddings on both sides of the input. Can be a
single number or a one-element tuple (padW). Default: 0 |
output_padding |
padding applied to the output |
groups |
split input into groups, in_channels should be divisible by the
number of groups. Default: 1 |
dilation |
the spacing between kernel elements. Can be a single number or
a one-element tuple (dW). Default: 1 |
Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".
nnf_conv_transpose2d( input, weight, bias = NULL, stride = 1, padding = 0, output_padding = 0, groups = 1, dilation = 1 )
input |
input tensor of shape (minibatch, in_channels, iH , iW) |
weight |
filters of shape (out_channels , in_channels/groups, kH , kW) |
bias |
optional bias tensor of shape (out_channels). Default: NULL |
stride |
the stride of the convolving kernel. Can be a single number or a
tuple (sH, sW). Default: 1 |
padding |
implicit paddings on both sides of the input. Can be a
single number or a tuple (padH, padW). Default: 0 |
output_padding |
padding applied to the output |
groups |
split input into groups, in_channels should be divisible by the
number of groups. Default: 1 |
dilation |
the spacing between kernel elements. Can be a single number or
a tuple (dH, dW). Default: 1 |
Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".
nnf_conv_transpose3d( input, weight, bias = NULL, stride = 1, padding = 0, output_padding = 0, groups = 1, dilation = 1 )
input |
input tensor of shape (minibatch, in_channels , iT , iH , iW) |
weight |
filters of shape (out_channels , in_channels/groups, kT , kH , kW) |
bias |
optional bias tensor of shape (out_channels). Default: NULL |
stride |
the stride of the convolving kernel. Can be a single number or a
tuple (sT, sH, sW). Default: 1 |
padding |
implicit paddings on both sides of the input. Can be a
single number or a tuple (padT, padH, padW). Default: 0 |
output_padding |
padding applied to the output |
groups |
split input into groups, in_channels should be divisible by the
number of groups. Default: 1 |
dilation |
the spacing between kernel elements. Can be a single number or
a tuple (dT, dH, dW). Default: 1 |
Applies a 1D convolution over an input signal composed of several input planes.
nnf_conv1d( input, weight, bias = NULL, stride = 1, padding = 0, dilation = 1, groups = 1 )
input |
input tensor of shape (minibatch, in_channels , iW) |
weight |
filters of shape (out_channels, in_channels/groups , kW) |
bias |
optional bias of shape (out_channels). Default: NULL |
stride |
the stride of the convolving kernel. Can be a single number or
a one-element tuple (sW). Default: 1 |
padding |
implicit paddings on both sides of the input. Can be a
single number or a one-element tuple (padW). Default: 0 |
dilation |
the spacing between kernel elements. Can be a single number or
a one-element tuple (dW). Default: 1 |
groups |
split input into groups, in_channels should be divisible by the
number of groups. Default: 1 |
Applies a 2D convolution over an input image composed of several input planes.
nnf_conv2d( input, weight, bias = NULL, stride = 1, padding = 0, dilation = 1, groups = 1 )
input |
input tensor of shape (minibatch, in_channels, iH , iW) |
weight |
filters of shape (out_channels , in_channels/groups, kH , kW) |
bias |
optional bias tensor of shape (out_channels). Default: NULL |
stride |
the stride of the convolving kernel. Can be a single number or a
tuple (sH, sW). Default: 1 |
padding |
implicit paddings on both sides of the input. Can be a
single number or a tuple (padH, padW). Default: 0 |
dilation |
the spacing between kernel elements. Can be a single number or
a tuple (dH, dW). Default: 1 |
groups |
split input into groups, in_channels should be divisible by the
number of groups. Default: 1 |
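The shape conventions for input and weight can be checked with a short sketch such as the one below. The tensor sizes are arbitrary and the weights are random, so only the output shapes are meaningful.

```r
library(torch)

if (torch_is_installed()) {
  x <- torch_randn(1, 3, 8, 8)   # (minibatch, in_channels, iH, iW)
  w <- torch_randn(4, 3, 3, 3)   # (out_channels, in_channels / groups, kH, kW)

  # A 3x3 kernel with padding 1 keeps the 8x8 spatial size
  same_size <- nnf_conv2d(x, w, padding = 1)

  # Adding stride 2 halves each spatial dimension
  strided <- nnf_conv2d(x, w, stride = 2, padding = 1)

  dim(same_size)  # 1 4 8 8
  dim(strided)    # 1 4 4 4
}
```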
Applies a 3D convolution over an input image composed of several input planes.
nnf_conv3d( input, weight, bias = NULL, stride = 1, padding = 0, dilation = 1, groups = 1 )
input |
input tensor of shape (minibatch, in_channels , iT , iH , iW) |
weight |
filters of shape (out_channels , in_channels/groups, kT , kH , kW) |
bias |
optional bias tensor of shape (out_channels). Default: NULL |
stride |
the stride of the convolving kernel. Can be a single number or a
tuple (sT, sH, sW). Default: 1 |
padding |
implicit paddings on both sides of the input. Can be a
single number or a tuple (padT, padH, padW). Default: 0 |
dilation |
the spacing between kernel elements. Can be a single number or
a tuple (dT, dH, dW). Default: 1 |
groups |
split input into groups, in_channels should be divisible by the
number of groups. Default: 1 |
Creates a criterion that measures the loss given input tensors x_1, x_2 and a Tensor label y with values 1 or -1. This is used for measuring whether two inputs are similar or dissimilar, using the cosine distance, and is typically used for learning nonlinear embeddings or semi-supervised learning.
nnf_cosine_embedding_loss( input1, input2, target, margin = 0, reduction = c("mean", "sum", "none") )
input1 |
the input x_1 tensor |
input2 |
the input x_2 tensor |
target |
the target tensor |
margin |
Should be a number from -1 to 1; 0 to 0.5 is suggested. If margin is missing, the default value is 0. |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Returns cosine similarity between x1 and x2, computed along dim.
nnf_cosine_similarity(x1, x2, dim = 2, eps = 1e-08)
x1 |
(Tensor) First input. |
x2 |
(Tensor) Second input (of size matching x1). |
dim |
(int, optional) Dimension of vectors. Default: 2 |
eps |
(float, optional) Small value to avoid division by zero. Default: 1e-8 |
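A brief sketch of the computation along dim = 2, using two hand-picked pairs of row vectors (one orthogonal pair and one parallel pair). The values are illustrative only.

```r
library(torch)

if (torch_is_installed()) {
  # Each row is one vector; similarity is computed row by row along dim = 2
  x1 <- torch_tensor(rbind(c(1, 0), c(1, 1)))
  x2 <- torch_tensor(rbind(c(0, 1), c(2, 2)))

  sim <- nnf_cosine_similarity(x1, x2, dim = 2)

  # Row 1 is orthogonal (similarity 0), row 2 is parallel (similarity 1)
  sim_values <- as.numeric(as_array(sim))
  sim_values
}
```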
This criterion combines log_softmax and nll_loss in a single function.
nnf_cross_entropy( input, target, weight = NULL, ignore_index = -100, reduction = c("mean", "sum", "none") )
input |
(Tensor) |
target |
(Tensor) |
weight |
(Tensor, optional) a manual rescaling weight given to each class. If
given, has to be a Tensor of size C (the number of classes) |
ignore_index |
(int, optional) Specifies a target value that is ignored and does not contribute to the input gradient. Default: -100 |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
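The relationship to log_softmax and nll_loss can be checked with a sketch like the following. The logits are made up, and the class labels are written with 1-based indices as an assumption about how the R package encodes targets.

```r
library(torch)

if (torch_is_installed()) {
  # Two samples, three classes
  logits <- torch_tensor(rbind(c(2, 0.5, -1), c(0, 1, 3)))
  # Assumed 1-based class labels: sample 1 is class 1, sample 2 is class 3
  target <- torch_tensor(c(1L, 3L), dtype = torch_long())

  combined <- nnf_cross_entropy(logits, target)

  # Same loss in two explicit steps
  two_step <- nnf_nll_loss(nnf_log_softmax(logits, dim = 2), target)

  ce_match <- torch_allclose(combined, two_step, atol = 1e-6)
  ce_match
}
```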
The Connectionist Temporal Classification loss.
nnf_ctc_loss( log_probs, targets, input_lengths, target_lengths, blank = 0, reduction = c("mean", "sum", "none"), zero_infinity = FALSE )
log_probs |
|
targets |
|
input_lengths |
|
target_lengths |
|
blank |
(int, optional) Blank label. Default: 0 |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
zero_infinity |
(bool, optional) Whether to zero infinite losses and the
associated gradients. Default: FALSE |
During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.
nnf_dropout(input, p = 0.5, training = TRUE, inplace = FALSE)
input |
the input tensor |
p |
probability of an element to be zeroed. Default: 0.5 |
training |
apply dropout if TRUE. Default: TRUE |
inplace |
If set to TRUE, will do this operation in-place. Default: FALSE |
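The effect of the training flag can be seen in a short sketch. The input of ones and the vector length are arbitrary; in training mode the kept elements are rescaled by 1 / (1 - p), so with p = 0.5 every output value is either 0 or 2.

```r
library(torch)

if (torch_is_installed()) {
  x <- torch_ones(1000)

  # Evaluation mode: the input passes through unchanged
  eval_out <- nnf_dropout(x, p = 0.5, training = FALSE)
  eval_unchanged <- torch_equal(eval_out, x)

  # Training mode: elements are zeroed at random, survivors are scaled up
  train_out <- nnf_dropout(x, p = 0.5, training = TRUE)
  train_values <- as.numeric(as_array(train_out))

  eval_unchanged
  sort(unique(train_values))
}
```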
Randomly zero out entire channels (a channel is a 2D feature map, e.g., the j-th channel of the i-th sample in the batched input is a 2D tensor input[i, j]) of the input tensor. Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
nnf_dropout2d(input, p = 0.5, training = TRUE, inplace = FALSE)
input |
the input tensor |
p |
probability of a channel to be zeroed. Default: 0.5 |
training |
apply dropout if TRUE. Default: TRUE |
inplace |
If set to TRUE, will do this operation in-place. Default: FALSE |
Randomly zero out entire channels (a channel is a 3D feature map, e.g., the j-th channel of the i-th sample in the batched input is a 3D tensor input[i, j]) of the input tensor. Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
nnf_dropout3d(input, p = 0.5, training = TRUE, inplace = FALSE)
input |
the input tensor |
p |
probability of a channel to be zeroed. Default: 0.5 |
training |
apply dropout if TRUE. Default: TRUE |
inplace |
If set to TRUE, will do this operation in-place. Default: FALSE |
Applies element-wise, ELU(x) = max(0, x) + min(0, alpha * (exp(x) - 1)).
nnf_elu(input, alpha = 1, inplace = FALSE) nnf_elu_(input, alpha = 1)
input |
(N,*) tensor, where * means, any number of additional dimensions |
alpha |
the alpha value for the ELU formulation. Default: 1.0 |
inplace |
can optionally do the operation in-place. Default: FALSE |
if (torch_is_installed()) { x <- torch_randn(2, 2) y <- nnf_elu(x, alpha = 1) nnf_elu_(x, alpha = 1) torch_equal(x, y) }
A simple lookup table that looks up embeddings in a fixed dictionary and size.
nnf_embedding( input, weight, padding_idx = NULL, max_norm = NULL, norm_type = 2, scale_grad_by_freq = FALSE, sparse = FALSE )
input |
(LongTensor) Tensor containing indices into the embedding matrix |
weight |
(Tensor) The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size |
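padding_idx |
(int, optional) If given, pads the output with the embedding
vector at padding_idx (initialized to zeros) whenever it encounters the index. |
max_norm |
(float, optional) If given, each embedding vector with norm larger
than max_norm is renormalized to have norm max_norm. Note: this will modify weight in-place. |
norm_type |
(float, optional) The p of the p-norm to compute for the max_norm option. Default: 2 |
scale_grad_by_freq |
(boolean, optional) If given, this will scale gradients
by the inverse of frequency of the words in the mini-batch. Default: FALSE |
sparse |
(bool, optional) If TRUE, gradient w.r.t. weight will be a sparse tensor. Default: FALSE |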
This module is often used to retrieve word embeddings using indices. The input to the module is a list of indices, and the embedding matrix, and the output is the corresponding word embeddings.
Computes sums, means or maxes of bags of embeddings, without instantiating the intermediate embeddings.
nnf_embedding_bag( input, weight, offsets = NULL, max_norm = NULL, norm_type = 2, scale_grad_by_freq = FALSE, mode = "mean", sparse = FALSE, per_sample_weights = NULL, include_last_offset = FALSE, padding_idx = NULL )
input |
(LongTensor) Tensor containing bags of indices into the embedding matrix |
weight |
(Tensor) The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size |
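offsets |
(LongTensor, optional) Only used when input is 1D. offsets determines the starting index position of each bag (sequence) in input. |
max_norm |
(float, optional) If given, each embedding vector with norm
larger than max_norm is renormalized to have norm max_norm. Note: this will modify weight in-place. |
norm_type |
(float, optional) The p in the p-norm to compute for the max_norm option. Default: 2 |
scale_grad_by_freq |
(boolean, optional) if given, this will scale gradients
by the inverse of frequency of the words in the mini-batch. Default: FALSE |
mode |
(string, optional) "sum", "mean" or "max". Specifies the way to reduce the bag. Default: "mean" |
sparse |
(bool, optional) if TRUE, gradient w.r.t. weight will be a sparse tensor. Default: FALSE |
per_sample_weights |
(Tensor, optional) a tensor of float / double weights,
or NULL to indicate all weights should be taken to be 1. If specified,
per_sample_weights must have exactly the same shape as input and is treated as having the same offsets, if those are not NULL. |
include_last_offset |
(bool, optional) if TRUE, the size of offsets is equal to the number of bags + 1. The last element is the size of the input, or the ending index position of the last bag (sequence). Default: FALSE |
padding_idx |
(int, optional) If given, pads the output with the embedding
vector at padding_idx (initialized to zeros) whenever it encounters the index. |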
Combines an array of sliding local blocks into a large containing tensor.
nnf_fold( input, output_size, kernel_size, dilation = 1, padding = 0, stride = 1 )
input |
the input tensor |
output_size |
the shape of the spatial dimensions of the output (i.e.,
the output dimensions that follow the batch and channel dimensions) |
kernel_size |
the size of the sliding blocks |
dilation |
a parameter that controls the stride of elements within the neighborhood. Default: 1 |
padding |
implicit zero padding to be added on both sides of input. Default: 0 |
stride |
the stride of the sliding blocks in the input spatial dimensions. Default: 1 |
Currently, only 4-D output tensors (batched image-like tensors) are supported.
Applies 2D fractional max pooling over an input signal composed of several input planes.
nnf_fractional_max_pool2d( input, kernel_size, output_size = NULL, output_ratio = NULL, return_indices = FALSE, random_samples = NULL )
input |
the input tensor |
kernel_size |
the size of the window to take a max over. Can be a
single number k (for a square kernel of k * k) or a tuple (kH, kW) |
output_size |
the target output size of the image of the form oH * oW. Can be a tuple (oH, oW) or a single number oH for a square image oH * oH |
output_ratio |
If one wants to have an output size as a ratio of the input size, this option can be given. This has to be a number or tuple in the range (0, 1) |
return_indices |
if TRUE, will return the indices along with the outputs. Default: FALSE |
random_samples |
optional random samples. |
Fractional MaxPooling is described in detail in the paper Fractional MaxPooling by Ben Graham.
The max-pooling operation is applied in kH * kW regions by a stochastic
step size determined by the target output size.
The number of output features is equal to the number of input planes.
Applies 3D fractional max pooling over an input signal composed of several input planes.
nnf_fractional_max_pool3d( input, kernel_size, output_size = NULL, output_ratio = NULL, return_indices = FALSE, random_samples = NULL )
input |
the input tensor |
kernel_size |
the size of the window to take a max over. Can be a single number k (for a cubic kernel of k * k * k) or a tuple (kT, kH, kW) |
output_size |
the target output size of the form oT * oH * oW. Can be a tuple (oT, oH, oW) or a single number oH for a cubic output oH * oH * oH |
output_ratio |
If one wants to have an output size as a ratio of the input size, this option can be given. This has to be a number or tuple in the range (0, 1) |
return_indices |
if TRUE, will return the indices along with the outputs. Default: FALSE |
random_samples |
undocumented argument. |
Fractional MaxPooling is described in detail in the paper Fractional MaxPooling by Ben Graham.
The max-pooling operation is applied in kT * kH * kW regions by a stochastic
step size determined by the target output size.
The number of output features is equal to the number of input planes.
Gelu
nnf_gelu(input, approximate = "none")
input |
(N,*) tensor, where * means, any number of additional dimensions |
approximate |
Defaults to 'none', which applies x * pnorm(x) element-wise. If 'tanh', GELU is computed with a tanh approximation. See GELU for more info. |
Applies element-wise the function GELU(x) = x * Φ(x), where Φ(x) is the Cumulative Distribution Function for the Gaussian Distribution.
See Gaussian Error Linear Units (GELUs).
The gated linear unit. Computes: GLU(a, b) = a ⊗ σ(b)
nnf_glu(input, dim = -1)
input |
(Tensor) input tensor |
dim |
(int) dimension on which to split the input. Default: -1 |
where input is split in half along dim to form a and b, σ is the sigmoid function and ⊗ is the element-wise product between matrices.
See Language Modeling with Gated Convolutional Networks.
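The split into two halves can be reproduced by hand, as in the sketch below. The four input values are arbitrary; the first half plays the role of a and the second half the role of b.

```r
library(torch)

if (torch_is_installed()) {
  x <- torch_tensor(c(1, 2, 3, 4))

  y <- nnf_glu(x, dim = -1)

  # Manual version: first half gated by the sigmoid of the second half
  a <- x[1:2]
  b <- x[3:4]
  manual <- a * torch_sigmoid(b)

  glu_match <- torch_allclose(y, manual, atol = 1e-6)
  glu_match
}
```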
Given an input and a flow-field grid, computes the output using input values and pixel locations from grid.
nnf_grid_sample( input, grid, mode = c("bilinear", "nearest"), padding_mode = c("zeros", "border", "reflection"), align_corners = FALSE )
input |
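(Tensor) input of shape (N, C, H_in, W_in) (4-D case) or (N, C, D_in, H_in, W_in) (5-D case) |
grid |
(Tensor) flow-field of shape (N, H_out, W_out, 2) (4-D case) or (N, D_out, H_out, W_out, 3) (5-D case) |
mode |
(str) interpolation mode to calculate output values: 'bilinear' | 'nearest'. Default: 'bilinear' |
padding_mode |
(str) padding mode for outside grid values: 'zeros' | 'border' | 'reflection'. Default: 'zeros' |
align_corners |
(bool, optional) Geometrically, we consider the pixels of the
input as squares rather than points. If set to TRUE, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If set to FALSE, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. Default: FALSE |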
Currently, only spatial (4-D) and volumetric (5-D) input are supported.
In the spatial (4-D) case, for input with shape (N, C, H_in, W_in) and grid with shape (N, H_out, W_out, 2), the output will have shape (N, C, H_out, W_out).
For each output location output[n, :, h, w], the size-2 vector grid[n, h, w] specifies input pixel locations x and y, which are used to interpolate the output value output[n, :, h, w]. In the case of 5D inputs, grid[n, d, h, w] specifies the x, y, z pixel locations for interpolating output[n, :, d, h, w]. The mode argument specifies the nearest or bilinear interpolation method used to sample the input pixels.
grid specifies the sampling pixel locations normalized by the input spatial dimensions. Therefore, it should have most values in the range of [-1, 1]. For example, values x = -1, y = -1 correspond to the left-top pixel of input, and values x = 1, y = 1 correspond to the right-bottom pixel of input.
If grid has values outside the range of [-1, 1], the corresponding outputs are handled as defined by padding_mode. Options are:
padding_mode="zeros": use 0 for out-of-bound grid locations,
padding_mode="border": use border values for out-of-bound grid locations,
padding_mode="reflection": use values at locations reflected by the border for out-of-bound grid locations. A location far away from the border keeps being reflected until it is in bounds, e.g., (normalized) pixel location x = -3.5 reflects by border -1 and becomes x' = 1.5, then reflects by border 1 and becomes x'' = -0.5.
This function is often used in conjunction with nnf_affine_grid() to build Spatial Transformer Networks.
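As a sanity check of the normalized coordinate convention, the sketch below builds an identity sampling grid with nnf_affine_grid() and resamples a random image with it. The image size and the identity transform are choices made for this example; with matching align_corners settings, the output is expected to reproduce the input up to floating point error.

```r
library(torch)

if (torch_is_installed()) {
  x <- torch_randn(1, 1, 4, 4)   # (N, C, H_in, W_in)

  # Identity affine transform, one 2x3 matrix per batch element
  theta <- torch_tensor(rbind(c(1, 0, 0), c(0, 1, 0)))$unsqueeze(1)

  # Sampling grid of shape (N, H_out, W_out, 2) in normalized [-1, 1] coordinates
  grid <- nnf_affine_grid(theta, size = c(1, 1, 4, 4), align_corners = FALSE)

  out <- nnf_grid_sample(x, grid, mode = "bilinear", padding_mode = "zeros",
                         align_corners = FALSE)

  grid_match <- torch_allclose(out, x, atol = 1e-5)
  grid_match
}
```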
Applies Group Normalization for last certain number of dimensions.
nnf_group_norm(input, num_groups, weight = NULL, bias = NULL, eps = 1e-05)
input |
the input tensor |
num_groups |
number of groups to separate the channels into |
weight |
the weight tensor |
bias |
the bias tensor |
eps |
a value added to the denominator for numerical stability. Default: 1e-5 |
Samples from the Gumbel-Softmax distribution and optionally discretizes.
nnf_gumbel_softmax(logits, tau = 1, hard = FALSE, dim = -1)
logits |
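[..., num_features] unnormalized log probabilities |
tau |
non-negative scalar temperature |
hard |
if TRUE, the returned samples will be discretized as one-hot vectors, but will be differentiated as if they were the soft samples in autograd |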
dim |
(int) A dimension along which softmax will be computed. Default: -1. |
Applies the hard shrinkage function element-wise.
nnf_hardshrink(input, lambd = 0.5)
input |
(N,*) tensor, where * means, any number of additional dimensions |
lambd |
the lambda value for the Hardshrink formulation. Default: 0.5 |
Applies the element-wise function Hardsigmoid(x) = ReLU6(x + 3) / 6.
nnf_hardsigmoid(input, inplace = FALSE)
input |
(N,*) tensor, where * means, any number of additional dimensions |
inplace |
If set to TRUE, will do this operation in-place. Default: FALSE |
Applies the hardswish function, element-wise, as described in the paper: Searching for MobileNetV3.
nnf_hardswish(input, inplace = FALSE)
input |
(N,*) tensor, where * means, any number of additional dimensions |
inplace |
can optionally do the operation in-place. Default: FALSE |
Applies the HardTanh function element-wise.
nnf_hardtanh(input, min_val = -1, max_val = 1, inplace = FALSE) nnf_hardtanh_(input, min_val = -1, max_val = 1)
input |
(N,*) tensor, where * means, any number of additional dimensions |
min_val |
minimum value of the linear region range. Default: -1 |
max_val |
maximum value of the linear region range. Default: 1 |
inplace |
can optionally do the operation in-place. Default: FALSE |
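A minimal sketch of the clamping behaviour with the default range. The input values are chosen so that two of them fall outside [-1, 1].

```r
library(torch)

if (torch_is_installed()) {
  x <- torch_tensor(c(-2, -0.5, 0.5, 2))

  # Values below min_val or above max_val are clamped; the rest pass through
  y <- nnf_hardtanh(x, min_val = -1, max_val = 1)

  hardtanh_values <- as.numeric(as_array(y))
  hardtanh_values  # -1.0 -0.5 0.5 1.0
}
```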
Measures the loss given an input tensor x and a labels tensor y (containing 1 or -1). This is usually used for measuring whether two inputs are similar or dissimilar, e.g. using the L1 pairwise distance as x, and is typically used for learning nonlinear embeddings or semi-supervised learning.
nnf_hinge_embedding_loss(input, target, margin = 1, reduction = "mean")
input |
tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*) , same shape as the input |
margin |
Has a default value of 1. |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Applies Instance Normalization for each channel in each data sample in a batch.
nnf_instance_norm( input, running_mean = NULL, running_var = NULL, weight = NULL, bias = NULL, use_input_stats = TRUE, momentum = 0.1, eps = 1e-05 )
input |
the input tensor |
running_mean |
the running_mean tensor |
running_var |
the running var tensor |
weight |
the weight tensor |
bias |
the bias tensor |
use_input_stats |
whether to use input stats |
momentum |
a double for the momentum |
eps |
an eps double for numerical stability |
Down/up samples the input to either the given size or the given scale_factor.
nnf_interpolate( input, size = NULL, scale_factor = NULL, mode = "nearest", align_corners = FALSE, recompute_scale_factor = NULL )
input |
(Tensor) the input tensor |
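size |
(int or tuple of 1, 2 or 3 ints) output spatial size. |
scale_factor |
(float or tuple of floats) multiplier for spatial size. Has to match input size if it is a tuple. |
mode |
(str) algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear' | 'area'. Default: 'nearest' |
align_corners |
(bool, optional) Geometrically, we consider the pixels
of the input and output as squares rather than points. If set to TRUE,
the input and output tensors are aligned by the center points of their corner
pixels, preserving the values at the corner pixels. If set to FALSE, the
input and output tensors are aligned by the corner points of their corner pixels,
and the interpolation uses edge value padding for out-of-boundary values,
making this operation independent of input size when scale_factor is kept the same. This only has an effect when mode is 'linear', 'bilinear', 'bicubic' or 'trilinear'. Default: FALSE |
recompute_scale_factor |
(bool, optional) recompute the scale_factor
for use in the interpolation calculation. When scale_factor is passed as a parameter, it is used to compute the output size. If recompute_scale_factor is FALSE or not specified, the passed-in scale_factor is used in the interpolation computation. Otherwise, a new scale_factor is computed from the output and input sizes. |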
The algorithm used for interpolation is determined by mode.
Currently temporal, spatial and volumetric sampling are supported, i.e. expected inputs are 3-D, 4-D or 5-D in shape.
The input dimensions are interpreted in the form: mini-batch x channels x [optional depth] x [optional height] x width.
The modes available for resizing are: nearest, linear (3D-only), bilinear, bicubic (4D-only), trilinear (5D-only), area.
The Kullback-Leibler divergence Loss.
nnf_kl_div(input, target, reduction = "mean")
input |
tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*) , same shape as the input |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Function that takes the mean element-wise absolute value difference.
nnf_l1_loss(input, target, reduction = "mean")
input |
tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*) , same shape as the input |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Applies Layer Normalization for last certain number of dimensions.
nnf_layer_norm( input, normalized_shape, weight = NULL, bias = NULL, eps = 1e-05 )
input |
the input tensor |
normalized_shape |
input shape from an expected input of size. If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension which is expected to be of that specific size. |
weight |
the weight tensor |
bias |
the bias tensor |
eps |
a value added to the denominator for numerical stability. Default: 1e-5 |
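The sketch below normalizes over the last dimension of a small random matrix. The shape 4 x 5 is arbitrary, and normalized_shape = 5 is chosen to match the size of that last dimension; no weight or bias is applied.

```r
library(torch)

if (torch_is_installed()) {
  x <- torch_randn(4, 5) * 3 + 1

  # Normalize each row over its 5 features
  y <- nnf_layer_norm(x, normalized_shape = 5)

  # Each row of the output should have a mean close to zero
  row_means <- as.numeric(as_array(y$mean(dim = 2)))
  row_means
}
```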
Applies element-wise, LeakyReLU(x) = max(0, x) + negative_slope * min(0, x).
nnf_leaky_relu(input, negative_slope = 0.01, inplace = FALSE)
input |
(N,*) tensor, where * means, any number of additional dimensions |
negative_slope |
Controls the angle of the negative slope. Default: 1e-2 |
inplace |
can optionally do the operation in-place. Default: FALSE |
Applies a linear transformation to the incoming data: y = x A^T + b.
nnf_linear(input, weight, bias = NULL)
input |
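(N, *, in_features) where * means any number of additional dimensions |
weight |
(out_features, in_features) the weights tensor |
bias |
optional tensor of shape (out_features) |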
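A short sketch of the transformation with hand-picked numbers, assuming weight is laid out as (out_features, in_features) so that the result equals the input multiplied by the transposed weight, plus the bias.

```r
library(torch)

if (torch_is_installed()) {
  x <- torch_tensor(rbind(c(1, 2)))                     # one sample, 2 features
  w <- torch_tensor(rbind(c(1, 0), c(0, 1), c(1, 1)))   # 3 outputs, 2 inputs
  b <- torch_tensor(c(0.5, 0.5, 0.5))

  y <- nnf_linear(x, w, b)

  # Manual version of the same product
  manual <- torch_matmul(x, w$t()) + b

  linear_values <- as.numeric(as_array(y))
  linear_match <- torch_allclose(y, manual, atol = 1e-6)
  linear_values  # 1.5 2.5 3.5
}
```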
Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. Applies normalization across channels.
nnf_local_response_norm(input, size, alpha = 1e-04, beta = 0.75, k = 1)
input |
the input tensor |
size |
amount of neighbouring channels used for normalization |
alpha |
multiplicative factor. Default: 0.0001 |
beta |
exponent. Default: 0.75 |
k |
additive factor. Default: 1 |
Applies a softmax followed by a logarithm.
nnf_log_softmax(input, dim = NULL, dtype = NULL)
input |
(Tensor) input |
dim |
(int) A dimension along which log_softmax will be computed. |
dtype |
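(torch_dtype, optional) the desired data type of the returned tensor. If specified, the input tensor is cast to dtype before the operation is performed. This is useful for preventing data type overflows. Default: NULL |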
While mathematically equivalent to log(softmax(x)), doing these two operations separately is slower, and numerically unstable. This function uses an alternative formulation to compute the output and gradient correctly.
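The stability difference can be illustrated with two widely separated inputs. In the sketch below, the values 0 and 1000 are chosen so that the softmax probability of the first element underflows to zero in single precision; taking its logarithm afterwards gives -Inf, whereas nnf_log_softmax() returns a finite value.

```r
library(torch)

if (torch_is_installed()) {
  x <- torch_tensor(c(0, 1000))

  # Two separate steps: the first probability underflows to 0, so its log is -Inf
  naive <- as.numeric(as_array(torch_log(nnf_softmax(x, dim = 1))))

  # Fused version: stays finite
  stable <- as.numeric(as_array(nnf_log_softmax(x, dim = 1)))

  naive
  stable
}
```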
Applies element-wise LogSigmoid(x) = log(1 / (1 + exp(-x))).
nnf_logsigmoid(input)
input |
(N,*) tensor, where * means, any number of additional dimensions |
Applies a 1D power-average pooling over an input signal composed of several input planes. If the sum of all inputs to the power of p is zero, the gradient is set to zero as well.
nnf_lp_pool1d(input, norm_type, kernel_size, stride = NULL, ceil_mode = FALSE)
input |
the input tensor |
norm_type |
the exponent p of the power average. If Inf, one gets max pooling; if 1, one gets sum pooling (which is proportional to average pooling) |
kernel_size |
a single int, the size of the window |
stride |
a single int, the stride of the window. Default value is kernel_size |
ceil_mode |
when TRUE, will use ceil instead of floor to compute the output shape. Default: FALSE |
Applies a 2D power-average pooling over an input signal composed of several input planes. If the sum of all inputs to the power of p is zero, the gradient is set to zero as well.
nnf_lp_pool2d(input, norm_type, kernel_size, stride = NULL, ceil_mode = FALSE)
input |
the input tensor |
norm_type |
the exponent p of the power average. If Inf, one gets max pooling; if 1, one gets sum pooling (which is proportional to average pooling) |
kernel_size |
a single int, the size of the window |
stride |
a single int, the stride of the window. Default value is kernel_size |
ceil_mode |
when True, will use ceil instead of floor to compute the output shape |
Creates a criterion that measures the loss given inputs x1, x2, two 1D mini-batch tensors, and a label 1D mini-batch tensor y (containing 1 or -1).
nnf_margin_ranking_loss(input1, input2, target, margin = 0, reduction = "mean")
input1 |
the first tensor |
input2 |
the second input tensor |
target |
the target tensor |
margin |
Has a default value of 0. |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Applies a 1D max pooling over an input signal composed of several input planes.
nnf_max_pool1d( input, kernel_size, stride = NULL, padding = 0, dilation = 1, ceil_mode = FALSE, return_indices = FALSE )
input |
input tensor of shape (minibatch, in_channels, iW) |
kernel_size |
the size of the window. Can be a single number or a tuple |
stride |
the stride of the window. Can be a single number or a tuple |
padding |
implicit zero paddings on both sides of the input. Can be a single number or a tuple |
dilation |
controls the spacing between the kernel points; also known as the à trous algorithm. |
ceil_mode |
when TRUE, will use ceil instead of floor to compute the output shape |
return_indices |
whether to return the indices where the max occurs. |
Applies a 2D max pooling over an input signal composed of several input planes.
nnf_max_pool2d( input, kernel_size, stride = kernel_size, padding = 0, dilation = 1, ceil_mode = FALSE, return_indices = FALSE )
input |
input tensor (minibatch, in_channels, iH, iW) |
kernel_size |
size of the pooling region. Can be a single number or a tuple |
stride |
stride of the pooling operation. Can be a single number or a tuple |
padding |
implicit zero paddings on both sides of the input. Can be a single number or a tuple |
dilation |
controls the spacing between the kernel points; also known as the à trous algorithm. |
ceil_mode |
when TRUE, will use ceil instead of floor to compute the output shape |
return_indices |
whether to return the indices where the max occurs. |
Applies a 3D max pooling over an input signal composed of several input planes.
nnf_max_pool3d( input, kernel_size, stride = NULL, padding = 0, dilation = 1, ceil_mode = FALSE, return_indices = FALSE )
input |
input tensor (minibatch, in_channels, iT, iH, iW) |
kernel_size |
size of the pooling region. Can be a single number or a tuple |
stride |
stride of the pooling operation. Can be a single number or a tuple |
padding |
implicit zero paddings on both sides of the input. Can be a single number or a tuple |
dilation |
controls the spacing between the kernel points; also known as the à trous algorithm. |
ceil_mode |
when TRUE, will use ceil instead of floor to compute the output shape |
return_indices |
whether to return the indices where the max occurs. |
Computes a partial inverse of MaxPool1d.
nnf_max_unpool1d( input, indices, kernel_size, stride = NULL, padding = 0, output_size = NULL )
input |
the input Tensor to invert |
indices |
the indices given out by max pool |
kernel_size |
Size of the max pooling window. |
stride |
Stride of the max pooling window. It is set to kernel_size by default. |
padding |
Padding that was added to the input |
output_size |
the targeted output size |
Computes a partial inverse of MaxPool2d.
nnf_max_unpool2d( input, indices, kernel_size, stride = NULL, padding = 0, output_size = NULL )
input |
the input Tensor to invert |
indices |
the indices given out by max pool |
kernel_size |
Size of the max pooling window. |
stride |
Stride of the max pooling window. It is set to kernel_size by default. |
padding |
Padding that was added to the input |
output_size |
the targeted output size |
Computes a partial inverse of MaxPool3d.
nnf_max_unpool3d( input, indices, kernel_size, stride = NULL, padding = 0, output_size = NULL )
input |
the input Tensor to invert |
indices |
the indices given out by max pool |
kernel_size |
Size of the max pooling window. |
stride |
Stride of the max pooling window. It is set to kernel_size by default. |
padding |
Padding that was added to the input |
output_size |
the targeted output size |
Measures the element-wise mean squared error.
nnf_mse_loss(input, target, reduction = "mean")
input |
tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*), same shape as the input |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
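The three `reduction` modes can be compared on a small example. This is a sketch assuming torch is installed; the tensors are illustrative values.

```r
library(torch)

input <- torch_tensor(c(1, 2, 3))
target <- torch_tensor(c(1, 4, 7))

# "none": one squared error per element.
per_element <- nnf_mse_loss(input, target, reduction = "none")

# "sum": squared errors added up; "mean": that sum divided by the element count.
loss_sum <- nnf_mse_loss(input, target, reduction = "sum")
loss_mean <- nnf_mse_loss(input, target, reduction = "mean")

print(per_element)
print(loss_sum$item())
print(loss_mean$item())
```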
Allows the model to jointly attend to information from different representation subspaces. See reference: Attention Is All You Need
nnf_multi_head_attention_forward( query, key, value, embed_dim_to_check, num_heads, in_proj_weight, in_proj_bias, bias_k, bias_v, add_zero_attn, dropout_p, out_proj_weight, out_proj_bias, training = TRUE, key_padding_mask = NULL, need_weights = TRUE, attn_mask = NULL, avg_weights = TRUE, use_separate_proj_weight = FALSE, q_proj_weight = NULL, k_proj_weight = NULL, v_proj_weight = NULL, static_k = NULL, static_v = NULL, batch_first = FALSE )
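query |
(L, N, E) where L is the target sequence length, N is the batch size, and E is the embedding dimension (with batch_first = FALSE). |
key |
(S, N, E) where S is the source sequence length, N is the batch size, and E is the embedding dimension (with batch_first = FALSE). |
value |
(S, N, E) where S is the source sequence length, N is the batch size, and E is the embedding dimension (with batch_first = FALSE). |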
embed_dim_to_check |
total dimension of the model. |
num_heads |
parallel attention heads. |
in_proj_weight |
input projection weight. |
in_proj_bias |
input projection bias. |
bias_k |
bias of the key and value sequences to be added at dim=0. |
bias_v |
currently undocumented. |
add_zero_attn |
add a new batch of zeros to the key and value sequences at dim=1. |
dropout_p |
probability of an element to be zeroed. |
out_proj_weight |
the output projection weight. |
out_proj_bias |
output projection bias. |
training |
apply dropout if TRUE. |
key_padding_mask |
|
need_weights |
output attn_output_weights. |
attn_mask |
2D mask |
avg_weights |
Logical; whether to average attn_output_weights over the attention heads before outputting them. This doesn't change the returned value of attn_output; it only affects the returned attention weight matrix. |
use_separate_proj_weight |
whether the function accepts the projection weights for query, key, and value in separate forms. If FALSE, in_proj_weight will be used, which is a combination of q_proj_weight, k_proj_weight, v_proj_weight. |
q_proj_weight |
input projection weight and bias. |
k_proj_weight |
currently undocumented. |
v_proj_weight |
currently undocumented. |
static_k |
static key and value used for attention operators. |
static_v |
currently undocumented. |
batch_first |
Logical; whether to expect query, key, and value to have batch as their first parameter, and to return output with batch first. |
Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (which is a 1D tensor of target class indices, 0 <= y <= x$size(2) - 1).
nnf_multi_margin_loss( input, target, p = 1, margin = 1, weight = NULL, reduction = "mean" )
input |
tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*), same shape as the input |
p |
Has a default value of 1. 1 and 2 are the only supported values. |
margin |
Has a default value of 1. |
weight |
a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones. |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (which is a 2D Tensor of target class indices).
nnf_multilabel_margin_loss(input, target, reduction = "mean")
input |
tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*), same shape as the input |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input x and target y of size (N, C).
nnf_multilabel_soft_margin_loss( input, target, weight = NULL, reduction = "mean" )
input |
tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*), same shape as the input |
weight |
weight tensor to apply on the loss. |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
It takes a one hot encoded target vector as input.
The negative log likelihood loss.
nnf_nll_loss( input, target, weight = NULL, ignore_index = -100, reduction = "mean" )
input |
|
target |
|
weight |
(Tensor, optional) a manual rescaling weight given to each class. If given, has to be a Tensor of size C. |
ignore_index |
(int, optional) Specifies a target value that is ignored and does not contribute to the input gradient. |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Performs normalization of inputs over specified dimension.
nnf_normalize(input, p = 2, dim = 2, eps = 1e-12, out = NULL)
input |
input tensor of any shape |
p |
(float) the exponent value in the norm formulation. Default: 2 |
dim |
(int) the dimension to reduce. Default: 2 |
eps |
(float) small value to avoid division by zero. Default: 1e-12 |
out |
(Tensor, optional) the output tensor. If out is used, this operation won't be differentiable. |
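For a tensor input of sizes $(n_0, \ldots, n_{dim}, \ldots, n_k)$, each $n_{dim}$-element vector $v$ along dimension dim is transformed as

$$v = \frac{v}{\max(\lVert v \rVert_p, \epsilon)}.$$

With the default arguments it uses the Euclidean norm over vectors along the default dimension (dim = 2) for normalization.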
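As a sketch of the transformation above (assuming torch is installed), the snippet normalizes the rows of a small matrix and recomputes the same result in base R. The matrix values are illustrative.

```r
library(torch)

x <- torch_tensor(matrix(c(3, 4, 0, 5), nrow = 2, byrow = TRUE))

# Divide each row by its Euclidean norm (dim = 2 indexes the columns within a row).
unit_rows <- nnf_normalize(x, p = 2, dim = 2)

# Same computation in base R: v / max(||v||_2, eps), row by row.
m <- as_array(x)
manual <- m / pmax(sqrt(rowSums(m^2)), 1e-12)

print(unit_rows)
print(manual)
```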
Takes a LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes) that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1.
nnf_one_hot(tensor, num_classes = -1)
tensor |
(LongTensor) class values of any shape. |
num_classes |
(int) Total number of classes. If set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor. |
One-hot on Wikipedia: https://en.wikipedia.org/wiki/One-hot
Pads tensor.
nnf_pad(input, pad, mode = "constant", value = NULL)
input |
(Tensor) N-dimensional tensor |
pad |
(tuple) m-elements tuple, where $\frac{m}{2} \leq$ input dimensions and $m$ is even. |
mode |
'constant', 'reflect', 'replicate' or 'circular'. Default: 'constant' |
value |
fill value for 'constant' padding. Default: 0. |
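The padding sizes by which to pad some dimensions of input are described starting from the last dimension and moving forward. $\left\lfloor \frac{\text{len(pad)}}{2} \right\rfloor$ dimensions of input will be padded. For example, to pad only the last dimension of the input tensor, pad has the form (padding_left, padding_right); to pad the last 2 dimensions of the input tensor, use (padding_left, padding_right, padding_top, padding_bottom); to pad the last 3 dimensions, use (padding_left, padding_right, padding_top, padding_bottom, padding_front, padding_back).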
See nn_constant_pad_2d, nn_reflection_pad_2d, and nn_replication_pad_2d for concrete examples on how each of the padding modes works. Constant padding is implemented for arbitrary dimensions. Replicate padding is implemented for padding the last 3 dimensions of 5D input tensor, or the last 2 dimensions of 4D input tensor, or the last dimension of 3D input tensor. Reflect padding is only implemented for padding the last 2 dimensions of 4D input tensor, or the last dimension of 3D input tensor.
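The ordering of the `pad` vector can be checked with a short sketch, assuming torch is installed. The sizes are illustrative, and `value = 0` is passed explicitly rather than relying on the default.

```r
library(torch)

x <- torch_ones(2, 3)

# Pad only the last dimension: 1 element on the left, 2 on the right.
last_dim <- nnf_pad(x, pad = c(1, 2), mode = "constant", value = 0)

# Pad the last two dimensions: (left, right, top, bottom).
last_two <- nnf_pad(x, pad = c(1, 2, 1, 0), mode = "constant", value = 0)

print(last_dim$shape)  # last dimension grows from 3 to 3 + 1 + 2
print(last_two$shape)  # first dimension additionally grows from 2 to 2 + 1 + 0
```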
Computes the batchwise pairwise distance between vectors using the p-norm.
nnf_pairwise_distance(x1, x2, p = 2, eps = 1e-06, keepdim = FALSE)
x1 |
(Tensor) First input. |
x2 |
(Tensor) Second input (of size matching x1). |
p |
the norm degree. Default: 2 |
eps |
(float, optional) Small value to avoid division by zero. Default: 1e-6 |
keepdim |
Determines whether or not to keep the vector dimension. Default: FALSE |
Computes the p-norm distance between every pair of row vectors in the input. This is identical to the upper triangular portion, excluding the diagonal, of torch_norm(input[:, None] - input, dim=2, p=p). This function will be faster if the rows are contiguous.
nnf_pdist(input, p = 2)
input |
input tensor of shape $N \times M$. |
p |
p value for the p-norm distance to calculate between each vector pair, $p \in [0, \infty]$. |
If input has shape $N \times M$ then the output will have shape $\frac{1}{2} N (N - 1)$.
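A small sketch of the output layout, assuming torch is installed. The three rows are illustrative points in the plane.

```r
library(torch)

# Three points: (0, 0), (3, 4), (6, 8).
x <- torch_tensor(matrix(c(0, 0, 3, 4, 6, 8), ncol = 2, byrow = TRUE))

# Pairs are returned in upper-triangular order: (1,2), (1,3), (2,3).
d <- nnf_pdist(x, p = 2)

print(d)        # Euclidean distances 5, 10, 5
print(d$shape)  # N * (N - 1) / 2 = 3 entries for N = 3
```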
Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where $r$ is the upscale factor.
nnf_pixel_shuffle(input, upscale_factor)
input |
(Tensor) the input tensor |
upscale_factor |
(int) factor to increase spatial resolution by |
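The shape change can be sketched as follows, assuming torch is installed. The sizes are illustrative: 8 channels with an upscale factor of 2 give 8 / 2^2 = 2 output channels.

```r
library(torch)

# Batch of 1, 8 channels, 2 x 2 spatial grid.
x <- torch_randn(1, 8, 2, 2)

# Channels are traded for spatial resolution.
y <- nnf_pixel_shuffle(x, upscale_factor = 2)

print(x$shape)
print(y$shape)
```

The total number of elements is unchanged; only their arrangement differs.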
Poisson negative log likelihood loss.
nnf_poisson_nll_loss( input, target, log_input = TRUE, full = FALSE, eps = 1e-08, reduction = "mean" )
input |
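tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*), same shape as the input |
log_input |
if TRUE the loss is computed as $\exp(\text{input}) - \text{target} \cdot \text{input}$; if FALSE the loss is $\text{input} - \text{target} \cdot \log(\text{input} + \text{eps})$. Default: TRUE. |
full |
whether to compute the full loss, i.e. to add the Stirling approximation term. Default: FALSE. |
eps |
(float, optional) Small value to avoid evaluation of $\log(0)$ when log_input = FALSE. Default: 1e-8 |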
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Applies element-wise the function $\text{PReLU}(x) = \max(0, x) + \text{weight} \cdot \min(0, x)$, where weight is a learnable parameter.
nnf_prelu(input, weight)
input |
(N,*) tensor, where * means, any number of additional dimensions |
weight |
(Tensor) the learnable weights |
Applies the rectified linear unit function element-wise.
nnf_relu(input, inplace = FALSE) nnf_relu_(input)
input |
(N,*) tensor, where * means, any number of additional dimensions |
inplace |
can optionally do the operation in-place. Default: FALSE |
Applies the element-wise function $\text{ReLU6}(x) = \min(\max(0, x), 6)$.
nnf_relu6(input, inplace = FALSE)
input |
(N,*) tensor, where * means, any number of additional dimensions |
inplace |
can optionally do the operation in-place. Default: FALSE |
Randomized leaky ReLU.
nnf_rrelu(input, lower = 1/8, upper = 1/3, training = FALSE, inplace = FALSE) nnf_rrelu_(input, lower = 1/8, upper = 1/3, training = FALSE)
input |
(N,*) tensor, where * means, any number of additional dimensions |
lower |
lower bound of the uniform distribution. Default: 1/8 |
upper |
upper bound of the uniform distribution. Default: 1/3 |
training |
bool, whether it's a training pass. Default: FALSE |
inplace |
can optionally do the operation in-place. Default: FALSE |
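Applies element-wise, $\text{SELU}(x) = \text{scale} \cdot (\max(0, x) + \min(0, \alpha \cdot (\exp(x) - 1)))$, with $\alpha = 1.6732632423543772848170429916717$ and $\text{scale} = 1.0507009873554804934193349852946$.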
nnf_selu(input, inplace = FALSE) nnf_selu_(input)
input |
(N,*) tensor, where * means, any number of additional dimensions |
inplace |
can optionally do the operation in-place. Default: FALSE |
if (torch_is_installed()) {
  x <- torch_randn(2, 2)
  # Out-of-place version returns a new tensor
  y <- nnf_selu(x)
  # In-place version overwrites x
  nnf_selu_(x)
  # Both versions should agree
  torch_equal(x, y)
}
Applies element-wise $\text{Sigmoid}(x_i) = \frac{1}{1 + \exp(-x_i)}$.
nnf_sigmoid(input)
input |
(N,*) tensor, where * means, any number of additional dimensions |
Applies the Sigmoid Linear Unit (SiLU) function, element-wise. See nn_silu() for more information.
nnf_silu(input, inplace = FALSE)
input |
(N,*) tensor, where * means, any number of additional dimensions |
inplace |
can optionally do the operation in-place. Default: FALSE |
Function that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise.
nnf_smooth_l1_loss(input, target, reduction = "mean")
input |
tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*), same shape as the input |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Creates a criterion that optimizes a two-class classification logistic loss between input tensor x and target tensor y (containing 1 or -1).
nnf_soft_margin_loss(input, target, reduction = "mean")
input |
tensor (N,*) where * means any number of additional dimensions |
target |
tensor (N,*), same shape as the input |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Applies a softmax function.
nnf_softmax(input, dim, dtype = NULL)
input |
(Tensor) input |
dim |
(int) A dimension along which softmax will be computed. |
dtype |
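(torch.dtype, optional) the desired data type of the returned tensor. If specified, the input tensor is cast to dtype before the operation is performed. This is useful for preventing data type overflows. Default: NULL. |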
Softmax is defined as:

$$\text{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

It is applied to all slices along dim, and will re-scale them so that the elements lie in the range [0, 1] and sum to 1.
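A minimal sketch of the definition above, assuming torch is installed. The matrix is illustrative, and the base R lines recompute the same formula for comparison.

```r
library(torch)

x <- torch_tensor(matrix(c(1, 2, 3, 1, 1, 1), nrow = 2, byrow = TRUE))

# Softmax over dim = 2, so every row becomes a probability vector.
probs <- nnf_softmax(x, dim = 2)

# Same formula in base R: exp(x_i) / sum_j exp(x_j), row by row.
m <- as_array(x)
manual <- exp(m) / rowSums(exp(m))

print(probs)
print(rowSums(as_array(probs)))
```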
Applies a softmin function.
nnf_softmin(input, dim, dtype = NULL)
input |
(Tensor) input |
dim |
(int) A dimension along which softmin will be computed (so every slice along dim will sum to 1). |
dtype |
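(torch.dtype, optional) the desired data type of the returned tensor. If specified, the input tensor is cast to dtype before the operation is performed. This is useful for preventing data type overflows. Default: NULL. |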
Note that $\text{Softmin}(x) = \text{Softmax}(-x)$.
See nnf_softmax definition for mathematical formula.
Applies element-wise the function $\text{Softplus}(x) = \frac{1}{\beta} \log(1 + \exp(\beta x))$.
nnf_softplus(input, beta = 1, threshold = 20)
input |
(N,*) tensor, where * means, any number of additional dimensions |
beta |
the beta value for the Softplus formulation. Default: 1 |
threshold |
values above this revert to a linear function. Default: 20 |
For numerical stability the implementation reverts to the linear function when $\text{input} \times \beta > \text{threshold}$.
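The formula and the linear fallback can be compared in a short sketch, assuming torch is installed. The inputs and `beta = 2` are illustrative; the last input (15) satisfies 15 * 2 > 20 and therefore takes the linear branch.

```r
library(torch)

xr <- c(-1, 0, 1, 15)
beta <- 2

y <- nnf_softplus(torch_tensor(xr), beta = beta, threshold = 20)

# Reference values from the formula, computed in base R (double precision).
reference <- log1p(exp(beta * xr)) / beta

print(as.numeric(as_array(y)))
print(reference)
```

At the threshold the formula and the identity already agree to within float32 precision, which is why the fallback is safe.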
Applies the soft shrinkage function elementwise
nnf_softshrink(input, lambd = 0.5)
input |
(N,*) tensor, where * means, any number of additional dimensions |
lambd |
the lambda (must be no less than zero) value for the Softshrink formulation. Default: 0.5 |
Applies element-wise the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$.
nnf_softsign(input)
input |
(N,*) tensor, where * means, any number of additional dimensions |
Applies element-wise $\text{Tanhshrink}(x) = x - \tanh(x)$.
nnf_tanhshrink(input)
input |
(N,*) tensor, where * means, any number of additional dimensions |
Thresholds each element of the input Tensor.
nnf_threshold(input, threshold, value, inplace = FALSE) nnf_threshold_(input, threshold, value)
input |
(N,*) tensor, where * means, any number of additional dimensions |
threshold |
The value to threshold at |
value |
The value to replace with |
inplace |
can optionally do the operation in-place. Default: FALSE |
Creates a criterion that measures the triplet loss given input tensors x1, x2, x3 and a margin with a value greater than 0. This is used for measuring a relative similarity between samples. A triplet is composed of a, p and n (i.e., anchor, positive example and negative example, respectively). The shapes of all input tensors should be (N, D).
nnf_triplet_margin_loss( anchor, positive, negative, margin = 1, p = 2, eps = 1e-06, swap = FALSE, reduction = "mean" )
anchor |
the anchor input tensor |
positive |
the positive input tensor |
negative |
the negative input tensor |
margin |
Default: 1. |
p |
The norm degree for pairwise distance. Default: 2. |
eps |
(float, optional) Small value to avoid division by zero. |
swap |
The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. Default: FALSE. |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
See nn_triplet_margin_with_distance_loss()
nnf_triplet_margin_with_distance_loss( anchor, positive, negative, distance_function = NULL, margin = 1, swap = FALSE, reduction = "mean" )
anchor |
the anchor input tensor |
positive |
the positive input tensor |
negative |
the negative input tensor |
distance_function |
(callable, optional): A nonnegative, real-valued function that quantifies the closeness of two tensors. If not specified, nn_pairwise_distance() will be used. Default: NULL. |
margin |
Default: 1. |
swap |
The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. Default: FALSE. |
reduction |
(string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean' |
Extracts sliding local blocks from a batched input tensor.
nnf_unfold(input, kernel_size, dilation = 1, padding = 0, stride = 1)
input |
the input tensor |
kernel_size |
the size of the sliding blocks |
dilation |
a parameter that controls the stride of elements within the neighborhood. Default: 1 |
padding |
implicit zero padding to be added on both sides of input. Default: 0 |
stride |
the stride of the sliding blocks in the input spatial dimensions. Default: 1 |
More than one element of the unfolded tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensor, please clone it first.
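A sketch of the output layout, assuming torch is installed. With the illustrative sizes below, each 2 x 2 block over 3 channels is flattened into a column of 3 * 2 * 2 = 12 values, and a 4 x 4 image yields 3 * 3 = 9 block positions.

```r
library(torch)

# Batch of 1, 3 channels, 4 x 4 image.
x <- torch_randn(1, 3, 4, 4)

# Default stride 1, no padding, no dilation.
blocks <- nnf_unfold(x, kernel_size = c(2, 2))

print(blocks$shape)

# Clone before writing in place, since unfolded elements may share memory.
writable <- blocks$clone()
```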
It has been proposed in ADADELTA: An Adaptive Learning Rate Method
optim_adadelta(params, lr = 1, rho = 0.9, eps = 1e-06, weight_decay = 0)
params |
(iterable): list of parameters to optimize or list defining parameter groups |
lr |
(float, optional): learning rate (default: 1) |
rho |
(float, optional): coefficient used for computing a running average of squared gradients (default: 0.9) |
eps |
(float, optional): term added to the denominator to improve numerical stability (default: 1e-6) |
weight_decay |
(float, optional): weight decay (L2 penalty) (default: 0) |
If you need to move a model to GPU via $cuda(), please do so before constructing optimizers for it. Parameters of a model after $cuda() will be different objects from those before the call. In general, you should make sure that the objects pointed to by model parameters subject to optimization remain the same over the whole lifecycle of optimizer creation and usage.
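According to the original paper, the decaying average of the squared gradients is computed as follows:

$$E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2$$

RMS of previous squared gradients up to time t:

$$RMS[g]_t = \sqrt{E[g^2]_t + \epsilon}$$

Adadelta update rule:

$$\Delta\theta_t = -\frac{RMS[\Delta\theta]_{t-1}}{RMS[g]_t} g_t, \qquad \theta_{t+1} = \theta_t + \Delta\theta_t$$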
if (torch_is_installed()) {
  ## Not run: 
  # `model`, `input`, `target` and `loss_fn` must be defined by the caller
  optimizer <- optim_adadelta(model$parameters, lr = 0.1)
  optimizer$zero_grad()
  loss_fn(model(input), target)$backward()
  optimizer$step()
  ## End(Not run)
}
Proposed in Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
optim_adagrad( params, lr = 0.01, lr_decay = 0, weight_decay = 0, initial_accumulator_value = 0, eps = 1e-10 )
params |
(iterable): list of parameters to optimize or list defining parameter groups |
lr |
(float, optional): learning rate (default: 1e-2) |
lr_decay |
(float, optional): learning rate decay (default: 0) |
weight_decay |
(float, optional): weight decay (L2 penalty) (default: 0) |
initial_accumulator_value |
the initial value for the accumulator (default: 0). Adagrad is an especially good optimizer for sparse data. It adapts the learning rate of every single parameter individually, dividing the original learning rate by the square root of the sum of the squared gradients. As a result, rarely occurring features get greater learning rates. The main downside of this method is that the learning rate may become small too fast, so that at some point the model cannot learn anymore. |
eps |
(float, optional): term added to the denominator to improve numerical stability (default: 1e-10) |
If you need to move a model to GPU via $cuda(), please do so before constructing optimizers for it. Parameters of a model after $cuda() will be different objects from those before the call. In general, you should make sure that the objects pointed to by model parameters subject to optimization remain the same over the whole lifecycle of optimizer creation and usage.
Update rule:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t + \epsilon}} \odot g_t$$

The equation above and some of the remarks are quoted from An overview of gradient descent optimization algorithms by Sebastian Ruder.
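The Adagrad update can be illustrated in plain R without torch. This is only a sketch of the textbook rule from Ruder's overview: `adagrad_step` is a hypothetical helper, not part of the package, and the package's own implementation may place epsilon differently.

```r
# Hypothetical helper: one Adagrad step on a plain numeric parameter vector.
adagrad_step <- function(theta, grad, G, lr = 0.01, eps = 1e-10) {
  G <- G + grad^2                            # accumulate squared gradients
  theta <- theta - lr / sqrt(G + eps) * grad # per-parameter scaled step
  list(theta = theta, G = G)
}

# Minimise f(theta) = sum(theta^2), whose gradient is 2 * theta.
theta <- c(1, -2)
G <- c(0, 0)
for (i in 1:100) {
  state <- adagrad_step(theta, 2 * theta, G, lr = 0.5)
  theta <- state$theta
  G <- state$G
}
print(theta)
print(G)
```

Because the accumulator `G` only grows, the effective step size shrinks over time, which is the downside mentioned in the argument description.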
It has been proposed in Adam: A Method for Stochastic Optimization.
optim_adam( params, lr = 0.001, betas = c(0.9, 0.999), eps = 1e-08, weight_decay = 0, amsgrad = FALSE )
params |
(iterable): iterable of parameters to optimize or dicts defining parameter groups |
lr |
(float, optional): learning rate (default: 1e-3) |
betas |
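(Tuple[float, float], optional): coefficients used for computing running averages of the gradient and its square (default: (0.9, 0.999)) |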
eps |
(float, optional): term added to the denominator to improve numerical stability (default: 1e-8) |
weight_decay |
(float, optional): weight decay (L2 penalty) (default: 0) |
amsgrad |
(boolean, optional): whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond (default: FALSE) |
If you need to move a model to GPU via $cuda(), please do so before constructing optimizers for it. Parameters of a model after $cuda() will be different objects from those before the call. In general, you should make sure that the objects pointed to by model parameters subject to optimization remain the same over the whole lifecycle of optimizer creation and usage.
if (torch_is_installed()) {
  ## Not run: 
  # `model`, `input`, `target` and `loss_fn` must be defined by the caller
  optimizer <- optim_adam(model$parameters, lr = 0.1)
  optimizer$zero_grad()
  loss_fn(model(input), target)$backward()
  optimizer$step()
  ## End(Not run)
}
For further details regarding the algorithm we refer to Decoupled Weight Decay Regularization
optim_adamw( params, lr = 0.001, betas = c(0.9, 0.999), eps = 1e-08, weight_decay = 0.01, amsgrad = FALSE )
params |
(iterable): iterable of parameters to optimize or dicts defining parameter groups |
lr |
(float, optional): learning rate (default: 1e-3) |
betas |
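(Tuple[float, float], optional): coefficients used for computing running averages of the gradient and its square (default: (0.9, 0.999)) |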
eps |
(float, optional): term added to the denominator to improve numerical stability (default: 1e-8) |
weight_decay |
(float, optional): weight decay coefficient (default: 1e-2) |
amsgrad |
(boolean, optional): whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond (default: FALSE) |
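The difference between decoupled weight decay and an L2 penalty can be sketched with a single step on a parameter whose loss gradient is zero (assuming torch is installed). `one_step` is a hypothetical helper introduced only for this illustration, and the hyperparameters are arbitrary.

```r
library(torch)

# Hypothetical helper: run one optimizer step on a scalar parameter that
# starts at 1 and receives a zero gradient from the loss.
one_step <- function(make_optimizer) {
  p <- torch_tensor(1, requires_grad = TRUE)
  opt <- make_optimizer(list(p))
  opt$zero_grad()
  loss <- (p * 0)$sum()  # gradient with respect to p is exactly 0
  loss$backward()
  opt$step()
  p$detach()$item()
}

# Adam folds weight decay into the gradient; AdamW shrinks the weight directly.
after_adam <- one_step(function(params) optim_adam(params, lr = 0.1, weight_decay = 0.01))
after_adamw <- one_step(function(params) optim_adamw(params, lr = 0.1, weight_decay = 0.01))

print(after_adam)
print(after_adamw)
```

With Adam, the decay term passes through the adaptive rescaling and so produces a comparatively large step. With AdamW, the parameter is expected to shrink only by a factor close to 1 - lr * weight_decay.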
Proposed in Acceleration of stochastic approximation by averaging
optim_asgd( params, lr = 0.01, lambda = 1e-04, alpha = 0.75, t0 = 1e+06, weight_decay = 0 )
params |
(iterable): iterable of parameters to optimize or lists defining parameter groups |
lr |
(float, optional): learning rate (default: 1e-2) |
lambda |
(float, optional): decay term (default: 1e-4) |
alpha |
(float, optional): power for eta update (default: 0.75) |
t0 |
(float, optional): point at which to start averaging (default: 1e6) |
weight_decay |
(float, optional): weight decay (L2 penalty) (default: 0) |
If you need to move a model to GPU via $cuda(), please do so before constructing optimizers for it. Parameters of a model after $cuda() will be different objects from those before the call. In general, you should make sure that the objects pointed to by model parameters subject to optimization remain the same over the whole lifecycle of optimizer creation and usage.
if (torch_is_installed()) {
  ## Not run: 
  # `model`, `input`, `target` and `loss_fn` must be defined by the caller
  optimizer <- optim_asgd(model$parameters, lr = 0.1)
  optimizer$zero_grad()
  loss_fn(model(input), target)$backward()
  optimizer$step()
  ## End(Not run)
}
Proposed in Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
optim_ignite_adagrad( params, lr = 0.01, lr_decay = 0, weight_decay = 0, initial_accumulator_value = 0, eps = 1e-10 )
params |
(iterable): list of parameters to optimize or list of parameter groups |
lr |
(float, optional): learning rate (default: 1e-2) |
lr_decay |
(float, optional): learning rate decay (default: 0) |
weight_decay |
(float, optional): weight decay (L2 penalty) (default: 0) |
initial_accumulator_value |
the initial value for the accumulator (default: 0). Adagrad is especially well suited to sparse data. It adapts the learning rate of each parameter individually, dividing the base learning rate by the square root of the accumulated sum of squared gradients, so rarely occurring features receive larger learning rates. The main downside is that the learning rate can shrink too quickly, so that at some point the model stops learning. |
eps |
(float, optional): term added to the denominator to improve numerical stability (default: 1e-10) |
See OptimizerIgnite.
if (torch_is_installed()) {
  ## Not run:
  optimizer <- optim_ignite_adagrad(model$parameters, lr = 0.1)
  optimizer$zero_grad()
  loss_fn(model(input), target)$backward()
  optimizer$step()
  ## End(Not run)
}
It has been proposed in Adam: A Method for Stochastic Optimization.
optim_ignite_adam( params, lr = 0.001, betas = c(0.9, 0.999), eps = 1e-08, weight_decay = 0, amsgrad = FALSE )
params |
(iterable): iterable of parameters to optimize or lists defining parameter groups |
lr |
(float, optional): learning rate (default: 1e-3) |
betas |
( |
eps |
(float, optional): term added to the denominator to improve numerical stability (default: 1e-8) |
weight_decay |
(float, optional): weight decay (L2 penalty) (default: 0) |
amsgrad |
(boolean, optional): whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond (default: FALSE) |
See OptimizerIgnite.
if (torch_is_installed()) {
  ## Not run:
  optimizer <- optim_ignite_adam(model$parameters, lr = 0.1)
  optimizer$zero_grad()
  loss_fn(model(input), target)$backward()
  optimizer$step()
  ## End(Not run)
}
For further details regarding the algorithm we refer to Decoupled Weight Decay Regularization
optim_ignite_adamw( params, lr = 0.001, betas = c(0.9, 0.999), eps = 1e-08, weight_decay = 0.01, amsgrad = FALSE )
params |
(iterable): iterable of parameters to optimize or lists defining parameter groups |
lr |
(float, optional): learning rate (default: 1e-3) |
betas |
( |
eps |
(float, optional): term added to the denominator to improve numerical stability (default: 1e-8) |
weight_decay |
(float, optional): weight decay coefficient (default: 0.01) |
amsgrad |
(boolean, optional): whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond (default: FALSE) |
See OptimizerIgnite.
if (torch_is_installed()) {
  ## Not run:
  optimizer <- optim_ignite_adamw(model$parameters, lr = 0.1)
  optimizer$zero_grad()
  loss_fn(model(input), target)$backward()
  optimizer$step()
  ## End(Not run)
}
Proposed by G. Hinton in his course.
optim_ignite_rmsprop( params, lr = 0.01, alpha = 0.99, eps = 1e-08, weight_decay = 0, momentum = 0, centered = FALSE )
params |
(iterable): iterable of parameters to optimize or list defining parameter groups |
lr |
(float, optional): learning rate (default: 1e-2) |
alpha |
(float, optional): smoothing constant (default: 0.99) |
eps |
(float, optional): term added to the denominator to improve numerical stability (default: 1e-8) |
weight_decay |
optional weight decay penalty. (default: 0) |
momentum |
(float, optional): momentum factor (default: 0) |
centered |
(bool, optional) : if |
See OptimizerIgnite.
if (torch_is_installed()) {
  ## Not run:
  optimizer <- optim_ignite_rmsprop(model$parameters, lr = 0.1)
  optimizer$zero_grad()
  loss_fn(model(input), target)$backward()
  optimizer$step()
  ## End(Not run)
}
Implements stochastic gradient descent (optionally with momentum). Nesterov momentum is based on the formula from On the importance of initialization and momentum in deep learning.
optim_ignite_sgd( params, lr = optim_required(), momentum = 0, dampening = 0, weight_decay = 0, nesterov = FALSE )
params |
(iterable): iterable of parameters to optimize or lists defining parameter groups |
lr |
(float): learning rate |
momentum |
(float, optional): momentum factor (default: 0) |
dampening |
(float, optional): dampening for momentum (default: 0) |
weight_decay |
(float, optional): weight decay (L2 penalty) (default: 0) |
nesterov |
(bool, optional): enables Nesterov momentum (default: FALSE) |
See OptimizerIgnite.
if (torch_is_installed()) {
  ## Not run:
  optimizer <- optim_ignite_sgd(model$parameters, lr = 0.1)
  optimizer$zero_grad()
  loss_fn(model(input), target)$backward()
  optimizer$step()
  ## End(Not run)
}
Implements L-BFGS algorithm, heavily inspired by minFunc
optim_lbfgs( params, lr = 1, max_iter = 20, max_eval = NULL, tolerance_grad = 1e-07, tolerance_change = 1e-09, history_size = 100, line_search_fn = NULL )
params |
(iterable): iterable of parameters to optimize or lists defining parameter groups |
lr |
(float): learning rate (default: 1) |
max_iter |
(int): maximal number of iterations per optimization step (default: 20) |
max_eval |
(int): maximal number of function evaluations per optimization step (default: max_iter * 1.25). |
tolerance_grad |
(float): termination tolerance on first order optimality (default: 1e-7). |
tolerance_change |
(float): termination tolerance on function value/parameter changes (default: 1e-9). |
history_size |
(int): update history size (default: 100). |
line_search_fn |
(str): either 'strong_wolfe' or NULL (default: NULL). |
This optimizer is different from the others in that optimizer$step() needs to be passed a closure that (1) calculates the loss, (2) calls backward() on it, and (3) returns it. See example below.
This optimizer doesn't support per-parameter options and parameter groups (there can be only one).
Right now all parameters have to be on a single device. This will be improved in the future.
If you need to move a model to GPU via $cuda(), please do so before constructing optimizers for it. Parameters of a model after $cuda() will be different objects from those before the call. In general, you should make sure that the objects pointed to by model parameters subject to optimization remain the same over the whole lifecycle of optimizer creation and usage.
This is a very memory intensive optimizer (it requires additional param_bytes * (history_size + 1) bytes). If it doesn't fit in memory, try reducing the history size or use a different algorithm.
if (torch_is_installed()) {
  a <- 1
  b <- 5
  rosenbrock <- function(x) {
    x1 <- x[1]
    x2 <- x[2]
    (a - x1)^2 + b * (x2 - x1^2)^2
  }
  x <- torch_tensor(c(-1, 1), requires_grad = TRUE)
  optimizer <- optim_lbfgs(x)
  calc_loss <- function() {
    optimizer$zero_grad()
    value <- rosenbrock(x)
    value$backward()
    value
  }
  num_iterations <- 2
  for (i in 1:num_iterations) {
    optimizer$step(calc_loss)
  }
  rosenbrock(x)
}
Dummy value indicating a required value.
optim_required()
Proposed by G. Hinton in his course.
optim_rmsprop( params, lr = 0.01, alpha = 0.99, eps = 1e-08, weight_decay = 0, momentum = 0, centered = FALSE )
params |
(iterable): iterable of parameters to optimize or list defining parameter groups |
lr |
(float, optional): learning rate (default: 1e-2) |
alpha |
(float, optional): smoothing constant (default: 0.99) |
eps |
(float, optional): term added to the denominator to improve numerical stability (default: 1e-8) |
weight_decay |
optional weight decay penalty. (default: 0) |
momentum |
(float, optional): momentum factor (default: 0) |
centered |
(bool, optional) : if |
If you need to move a model to GPU via $cuda(), please do so before constructing optimizers for it. Parameters of a model after $cuda() will be different objects from those before the call. In general, you should make sure that the objects pointed to by model parameters subject to optimization remain the same over the whole lifecycle of optimizer creation and usage.
The centered version first appears in Generating Sequences With Recurrent Neural Networks. The implementation here takes the square root of the gradient average before adding epsilon (note that TensorFlow interchanges these two operations). The effective learning rate is thus $\alpha / (\sqrt{v} + \epsilon)$, where $\alpha$ is the scheduled learning rate and $v$ is the weighted moving average of the squared gradient.
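The effect of the order of the two operations can be checked with plain arithmetic. The following is a sketch in base R with made-up example values for the learning rate, epsilon, and the moving averages; it does not call torch and is not the optimizer's implementation.

```r
# Reference arithmetic only; this does not call torch.
alpha <- 0.01               # scheduled learning rate (example value)
eps   <- 1e-8               # stability term (example value)
v     <- c(1e-12, 1e-4, 1)  # example moving averages of squared gradients

# Square root first, then add epsilon (the order described above).
lr_sqrt_then_eps <- alpha / (sqrt(v) + eps)

# Epsilon added first, then the square root (the interchanged order).
lr_eps_then_sqrt <- alpha / sqrt(v + eps)

data.frame(v, lr_sqrt_then_eps, lr_eps_then_sqrt)
```

The two orders agree closely when the moving average is large relative to epsilon and diverge when it is small.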
Update rule:
Proposed first in RPROP - A Fast Adaptive Learning Algorithm
optim_rprop(params, lr = 0.01, etas = c(0.5, 1.2), step_sizes = c(1e-06, 50))
params |
(iterable): iterable of parameters to optimize or lists defining parameter groups |
lr |
(float, optional): learning rate (default: 1e-2) |
etas |
(Tuple(float, float), optional): pair of (etaminus, etaplus), the multiplicative decrease and increase factors (default: (0.5, 1.2)) |
step_sizes |
(vector(float, float), optional): a pair of minimal and maximal allowed step sizes (default: (1e-6, 50)) |
If you need to move a model to GPU via $cuda(), please do so before constructing optimizers for it. Parameters of a model after $cuda() will be different objects from those before the call. In general, you should make sure that the objects pointed to by model parameters subject to optimization remain the same over the whole lifecycle of optimizer creation and usage.
if (torch_is_installed()) {
  ## Not run:
  optimizer <- optim_rprop(model$parameters, lr = 0.1)
  optimizer$zero_grad()
  loss_fn(model(input), target)$backward()
  optimizer$step()
  ## End(Not run)
}
Implements stochastic gradient descent (optionally with momentum). Nesterov momentum is based on the formula from On the importance of initialization and momentum in deep learning.
optim_sgd( params, lr = optim_required(), momentum = 0, dampening = 0, weight_decay = 0, nesterov = FALSE )
params |
(iterable): iterable of parameters to optimize or lists defining parameter groups |
lr |
(float): learning rate |
momentum |
(float, optional): momentum factor (default: 0) |
dampening |
(float, optional): dampening for momentum (default: 0) |
weight_decay |
(float, optional): weight decay (L2 penalty) (default: 0) |
nesterov |
(bool, optional): enables Nesterov momentum (default: FALSE) |
The implementation of SGD with Momentum-Nesterov subtly differs from Sutskever et al. and implementations in some other frameworks.
Considering the specific case of Momentum, the update can be written as
$$v_{t+1} = \mu \, v_t + g_{t+1}, \qquad p_{t+1} = p_t - \mathrm{lr} \cdot v_{t+1}$$
where $p$, $g$, $v$, and $\mu$ denote the parameters, gradient, velocity, and momentum respectively.
This is in contrast to Sutskever et al. and other frameworks, which employ an update of the form
$$v_{t+1} = \mu \, v_t + \mathrm{lr} \cdot g_{t+1}, \qquad p_{t+1} = p_t - v_{t+1}$$
The Nesterov version is analogously modified.
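The two momentum formulations can be compared with a few lines of arithmetic. This is a sketch in base R under assumed example values (a toy objective f(p) = p^2, lr = 0.1, momentum = 0.9); it does not call torch and is not the optimizer's implementation.

```r
# Reference arithmetic in base R; this does not call torch.
grad <- function(p) 2 * p   # gradient of the toy objective f(p) = p^2 (assumed)
lr <- 0.1                   # learning rate (example value)
mu <- 0.9                   # momentum (example value)

p_a <- 1; v_a <- 0          # learning rate applied to the velocity at update time
p_b <- 1; v_b <- 0          # learning rate folded into the velocity

for (t in 1:5) {
  # Formulation with lr multiplying the whole velocity.
  v_a <- mu * v_a + grad(p_a)
  p_a <- p_a - lr * v_a

  # Formulation with lr multiplying only the new gradient.
  v_b <- mu * v_b + lr * grad(p_b)
  p_b <- p_b - v_b
}

c(p_a = p_a, p_b = p_b)
```

With a constant learning rate the two trajectories coincide, because one velocity is just the other scaled by lr. They differ once the learning rate changes during training, since the first form rescales the accumulated velocity and the second does not.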
If you need to move a model to GPU via $cuda(), please do so before constructing optimizers for it. Parameters of a model after $cuda() will be different objects from those before the call. In general, you should make sure that the objects pointed to by model parameters subject to optimization remain the same over the whole lifecycle of optimizer creation and usage.
if (torch_is_installed()) {
  ## Not run:
  optimizer <- optim_sgd(model$parameters, lr = 0.1, momentum = 0.9)
  optimizer$zero_grad()
  loss_fn(model(input), target)$backward()
  optimizer$step()
  ## End(Not run)
}
When implementing custom optimizers you will usually need to implement the initialize and step methods. See the example section below for a full example.
optimizer( name = NULL, inherit = Optimizer, ..., private = NULL, active = NULL, parent_env = parent.frame() )
name |
(optional) name of the optimizer |
inherit |
(optional) you can inherit from other optimizers to re-use some methods. |
... |
Pass any number of fields or methods. You should at least define
the |
private |
(optional) a list of private methods for the optimizer. |
active |
(optional) a list of active methods for the optimizer. |
parent_env |
used to capture the right environment to define the class. The default is fine for most situations. |
If you need to move a model to GPU via $cuda(), please do so before constructing optimizers for it. Parameters of a model after $cuda() will be different objects from those before the call. In general, you should make sure that the objects pointed to by model parameters subject to optimization remain the same over the whole lifecycle of optimizer creation and usage.
if (torch_is_installed()) {
  # In this example we will create a custom optimizer
  # that's just a simplified version of the `optim_sgd` function.
  optim_sgd2 <- optimizer(
    initialize = function(params, learning_rate) {
      defaults <- list(
        learning_rate = learning_rate
      )
      super$initialize(params, defaults)
    },
    step = function() {
      with_no_grad({
        for (g in seq_along(self$param_groups)) {
          group <- self$param_groups[[g]]
          for (p in seq_along(group$params)) {
            param <- group$params[[p]]
            if (is.null(param$grad) || is_undefined_tensor(param$grad)) {
              next
            }
            param$add_(param$grad, alpha = -group$learning_rate)
          }
        }
      })
    }
  )
  x <- torch_randn(1, requires_grad = TRUE)
  opt <- optim_sgd2(x, learning_rate = 0.1)
  for (i in 1:100) {
    opt$zero_grad()
    y <- x^2
    y$backward()
    opt$step()
  }
  all.equal(x$item(), 0, tolerance = 1e-9)
}
Abstract base class for wrapping LibTorch C++ optimizers.
optimizer_ignite( name = NULL, ..., private = NULL, active = NULL, parent_env = parent.frame() )
name |
(optional) name of the optimizer |
... |
Pass any number of fields or methods. You should at least define
the |
private |
(optional) a list of private methods for the optimizer. |
active |
(optional) a list of active methods for the optimizer. |
parent_env |
used to capture the right environment to define the class. The default is fine for most situations. |
Abstract base class for wrapping LibTorch C++ optimizers.
torch::torch_optimizer -> OptimizerIgnite
new()
Initializes the optimizer with the specified parameters and defaults.
OptimizerIgnite$new(params, defaults)
params (list()): Either a list of tensors or a list of parameter groups, each containing the params to optimize as well as the optimizer options such as the learning rate, weight decay, etc.
defaults (list()): A list of default optimizer options.
state_dict()
Returns the state dictionary containing the current state of the optimizer. The returned list() contains two lists:
param_groups: The parameter groups of the optimizer (lr, ...) as well as to which parameters they are applied (params, integer indices).
state: The states of the optimizer. The names are the indices of the parameters to which they belong, converted to character.
OptimizerIgnite$state_dict()
Returns: (list())
load_state_dict()
Loads the state dictionary into the optimizer.
OptimizerIgnite$load_state_dict(state_dict)
state_dict (list()): The state dictionary to load into the optimizer.
step()
Performs a single optimization step.
OptimizerIgnite$step(closure = NULL)
closure (function()): A closure that conducts the forward pass and returns the loss.
Returns: (numeric()) The loss.
zero_grad()
Zeros out the gradients of the parameters.
OptimizerIgnite$zero_grad()
add_param_group()
Adds a new parameter group to the optimizer.
OptimizerIgnite$add_param_group(param_group)
param_group (list()): A parameter group to add to the optimizer. This should contain the params to optimize as well as the optimizer options. For all options that are not specified, the defaults are used.
clone()
The objects of this class are cloneable with this method.
OptimizerIgnite$clone(deep = FALSE)
deep: Whether to make a deep clone.
Samplers can be used with dataloader() when creating batches from a torch dataset().
sampler( name = NULL, inherit = Sampler, ..., private = NULL, active = NULL, parent_env = parent.frame() )
name |
(optional) name of the sampler |
inherit |
(optional) you can inherit from other samplers to re-use some methods. |
... |
Pass any number of fields or methods. You should at least define
the |
private |
(optional) a list of private methods for the sampler |
active |
(optional) a list of active methods for the sampler. |
parent_env |
used to capture the right environment to define the class. The default is fine for most situations. |
A sampler must implement the .iter and .length() methods.
initialize takes in a data_source. In general this is a dataset().
.iter returns a function that returns a dataset index every time it's called.
.length returns the maximum number of samples that can be retrieved from that sampler.
Each sample will be retrieved by indexing tensors along the first dimension.
tensor_dataset(...)
... |
tensors that have the same size of the first dimension. |
Get and set the number of threads used by torch computations.
torch_set_num_threads(num_threads) torch_set_num_interop_threads(num_threads) torch_get_num_interop_threads() torch_get_num_threads()
num_threads |
number of threads to set. |
For details see the CPU threading article in the PyTorch documentation.
Setting the number of threads does not work on macOS, where it must be 1.
Abs
torch_abs(self)
self |
(Tensor) the input tensor. |
Computes the element-wise absolute value of the given input tensor.
if (torch_is_installed()) {
  torch_abs(torch_tensor(c(-1, -2, 3)))
}
Absolute
torch_absolute(self)
self |
(Tensor) the input tensor. |
Alias for torch_abs()
Acos
torch_acos(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the arccosine of the elements of input.
if (torch_is_installed()) {
  a <- torch_randn(c(4))
  a
  torch_acos(a)
}
Acosh
torch_acosh(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the inverse hyperbolic cosine of the elements of input.
The domain of the inverse hyperbolic cosine is [1, inf) and values outside this range will be mapped to NaN, except for +INF for which the output is mapped to +INF.
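The domain rule can be illustrated with base R's acosh(), which behaves the same way on these inputs. This is only a reference sketch with made-up example values; it does not call torch.

```r
# Base R's acosh() is used here only as a reference for the domain rule;
# it is not the torch implementation.
x <- c(0.5, 1, 2, Inf)

# 0.5 lies outside [1, Inf), so base R warns and returns NaN for it.
res <- suppressWarnings(acosh(x))
res
```

The first element is NaN, the value at 1 is 0, and the infinite input maps to Inf.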
if (torch_is_installed()) {
  a <- torch_randn(c(4))$uniform_(1, 2)
  a
  torch_acosh(a)
}
Adaptive_avg_pool1d
torch_adaptive_avg_pool1d(self, output_size)
self |
the input tensor |
output_size |
the target output size (single integer) |
Applies a 1D adaptive average pooling over an input signal composed of several input planes.
See nn_adaptive_avg_pool1d() for details and output shape.
Add
torch_add(self, other, alpha = 1L)
self |
(Tensor) the input tensor. |
other |
(Tensor/Number) the second input tensor/number. |
alpha |
(Number) the scalar multiplier for |
Adds the scalar other to each element of the tensor input and returns a new resulting tensor.
If input is of type FloatTensor or DoubleTensor, other must be a real number, otherwise it should be an integer.
Each element of the tensor other is multiplied by the scalar alpha and added to each element of the tensor input. The resulting tensor is returned.
The shapes of input and other must be broadcastable.
If other is of type FloatTensor or DoubleTensor, alpha must be a real number, otherwise it should be an integer.
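The two modes described above amount to input + other and input + alpha * other. The following base R sketch uses made-up example values to show the arithmetic and the broadcast shape; it is a reference calculation and does not call torch.

```r
# Reference arithmetic in base R; not the torch implementation.
input <- c(1, 2, 3, 4)

# Scalar `other`: added to every element.
input + 20

# Tensor `other` with `alpha`: `other` is scaled first, then added.
other <- c(10, 20, 30, 40)
alpha <- 0.5
out <- input + alpha * other

# Broadcasting a length-4 vector against a 4 x 1 column gives a 4 x 4
# result, with entry [i, j] equal to col[i] + input[j].
col <- c(100, 200, 300, 400)
bcast <- outer(col, input, "+")
dim(bcast)
```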
if (torch_is_installed()) {
  a <- torch_randn(c(4))
  a
  torch_add(a, 20)
  a <- torch_randn(c(4))
  a
  b <- torch_randn(c(4, 1))
  b
  torch_add(a, b)
}
Addbmm
torch_addbmm(self, batch1, batch2, beta = 1L, alpha = 1L)
self |
(Tensor) matrix to be added |
batch1 |
(Tensor) the first batch of matrices to be multiplied |
batch2 |
(Tensor) the second batch of matrices to be multiplied |
beta |
(Number, optional) multiplier for |
alpha |
(Number, optional) multiplier for |
Performs a batch matrix-matrix product of matrices stored in batch1 and batch2, with a reduced add step (all matrix multiplications get accumulated along the first dimension). input is added to the final result.
batch1 and batch2 must be 3-D tensors each containing the same number of matrices.
If batch1 is a $(b \times n \times m)$ tensor and batch2 is a $(b \times m \times p)$ tensor, input must be broadcastable with a $(n \times p)$ tensor and out will be a $(n \times p)$ tensor.
For inputs of type FloatTensor or DoubleTensor, arguments beta and alpha must be real numbers, otherwise they should be integers.
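The accumulation can be written out explicitly as beta * input + alpha * (sum over the batch of batch1[i] %*% batch2[i]). The sketch below does this in base R with random example arrays whose shapes mirror the documented example; the variable names are illustrative and torch is not called.

```r
# Reference computation in base R; not the torch implementation.
set.seed(1)
b <- 10; n <- 3; m <- 4; p <- 5
input  <- matrix(rnorm(n * p), n, p)
batch1 <- array(rnorm(b * n * m), dim = c(b, n, m))
batch2 <- array(rnorm(b * m * p), dim = c(b, m, p))
beta <- 1
alpha <- 1

# Accumulate the b matrix products into a single n x p matrix.
acc <- matrix(0, n, p)
for (i in seq_len(b)) {
  acc <- acc + batch1[i, , ] %*% batch2[i, , ]
}

out <- beta * input + alpha * acc
dim(out)
```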
if (torch_is_installed()) {
  M <- torch_randn(c(3, 5))
  batch1 <- torch_randn(c(10, 3, 4))
  batch2 <- torch_randn(c(10, 4, 5))
  torch_addbmm(M, batch1, batch2)
}
Addcdiv
torch_addcdiv(self, tensor1, tensor2, value = 1L)
self |
(Tensor) the tensor to be added |
tensor1 |
(Tensor) the numerator tensor |
tensor2 |
(Tensor) the denominator tensor |
value |
(Number, optional) multiplier for |
Performs the element-wise division of tensor1 by tensor2, multiplies the result by the scalar value, and adds it to input.
Integer division with addcdiv is deprecated, and in a future release addcdiv will perform a true division of tensor1 and tensor2. The current addcdiv behavior can be replicated using torch_floor_divide() for integral inputs (input + value * tensor1 // tensor2) and torch_div() for float inputs (input + value * tensor1 / tensor2). The new addcdiv behavior can be implemented with torch_true_divide() (input + value * torch_true_divide(tensor1, tensor2)).
The shapes of input, tensor1, and tensor2 must be broadcastable.
For inputs of type FloatTensor or DoubleTensor, value must be a real number, otherwise an integer.
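For floating-point inputs the result is input + value * tensor1 / tensor2 with broadcasting. The base R sketch below reproduces that arithmetic for the shapes used in the documented example (1 x 3, 3 x 1, 1 x 3); the values are made up and torch is not called.

```r
# Reference arithmetic in base R; not the torch implementation.
input   <- c(1, 2, 3)      # plays the role of a 1 x 3 tensor
tensor1 <- c(10, 20, 30)   # plays the role of a 3 x 1 tensor
tensor2 <- c(2, 4, 5)      # plays the role of a 1 x 3 tensor
value   <- 0.1

# Broadcast division: ratio[i, j] = tensor1[i] / tensor2[j].
ratio <- outer(tensor1, tensor2, "/")

# Add input along the columns: out[i, j] = input[j] + value * ratio[i, j].
out <- sweep(value * ratio, 2, input, "+")
out
```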
if (torch_is_installed()) {
  t <- torch_randn(c(1, 3))
  t1 <- torch_randn(c(3, 1))
  t2 <- torch_randn(c(1, 3))
  torch_addcdiv(t, t1, t2, 0.1)
}
Addcmul
torch_addcmul(self, tensor1, tensor2, value = 1L)
self |
(Tensor) the tensor to be added |
tensor1 |
(Tensor) the tensor to be multiplied |
tensor2 |
(Tensor) the tensor to be multiplied |
value |
(Number, optional) multiplier for |
Performs the element-wise multiplication of tensor1 by tensor2, multiplies the result by the scalar value, and adds it to input.
The shapes of input, tensor1, and tensor2 must be broadcastable.
For inputs of type FloatTensor or DoubleTensor, value must be a real number, otherwise an integer.
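The result is input + value * tensor1 * tensor2 with broadcasting. A base R sketch of that arithmetic, using made-up values with the shapes of the documented example (1 x 3, 3 x 1, 1 x 3) and without calling torch:

```r
# Reference arithmetic in base R; not the torch implementation.
input   <- c(1, 2, 3)      # plays the role of a 1 x 3 tensor
tensor1 <- c(10, 20, 30)   # plays the role of a 3 x 1 tensor
tensor2 <- c(2, 4, 5)      # plays the role of a 1 x 3 tensor
value   <- 0.1

# Broadcast product: prod_mat[i, j] = tensor1[i] * tensor2[j].
prod_mat <- outer(tensor1, tensor2, "*")

# Add input along the columns: out[i, j] = input[j] + value * prod_mat[i, j].
out <- sweep(value * prod_mat, 2, input, "+")
out
```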
if (torch_is_installed()) {
  t <- torch_randn(c(1, 3))
  t1 <- torch_randn(c(3, 1))
  t2 <- torch_randn(c(1, 3))
  torch_addcmul(t, t1, t2, 0.1)
}
Addmm
torch_addmm(self, mat1, mat2, beta = 1L, alpha = 1L)
self |
(Tensor) matrix to be added |
mat1 |
(Tensor) the first matrix to be multiplied |
mat2 |
(Tensor) the second matrix to be multiplied |
beta |
(Number, optional) multiplier for |
alpha |
(Number, optional) multiplier for |
Performs a matrix multiplication of the matrices mat1 and mat2. The matrix input is added to the final result.
If mat1 is a $(n \times m)$ tensor and mat2 is a $(m \times p)$ tensor, then input must be broadcastable with a $(n \times p)$ tensor and out will be a $(n \times p)$ tensor.
alpha and beta are scaling factors on the matrix product between mat1 and mat2 and the added matrix input respectively.
For inputs of type FloatTensor or DoubleTensor, arguments beta and alpha must be real numbers, otherwise they should be integers.
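In formula form the result is beta * input + alpha * (mat1 %*% mat2). The base R sketch below wraps this in a hypothetical helper, addmm_ref(), which is not part of the package and does not call torch; the shapes follow the documented example.

```r
# Reference computation in base R; addmm_ref() is a hypothetical helper,
# not a torch function.
addmm_ref <- function(input, mat1, mat2, beta = 1, alpha = 1) {
  beta * input + alpha * (mat1 %*% mat2)
}

set.seed(1)
input <- matrix(rnorm(6), 2, 3)
mat1  <- matrix(rnorm(6), 2, 3)
mat2  <- matrix(rnorm(9), 3, 3)

out <- addmm_ref(input, mat1, mat2)
dim(out)

# With beta = 0 only the scaled matrix product remains.
only_product <- addmm_ref(input, mat1, mat2, beta = 0, alpha = 2)
```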
if (torch_is_installed()) {
  M <- torch_randn(c(2, 3))
  mat1 <- torch_randn(c(2, 3))
  mat2 <- torch_randn(c(3, 3))
  torch_addmm(M, mat1, mat2)
}
Addmv
torch_addmv(self, mat, vec, beta = 1L, alpha = 1L)
self |
(Tensor) vector to be added |
mat |
(Tensor) matrix to be multiplied |
vec |
(Tensor) vector to be multiplied |
beta |
(Number, optional) multiplier for |
alpha |
(Number, optional) multiplier for |
Performs a matrix-vector product of the matrix mat and the vector vec. The vector input is added to the final result.
If mat is a $(n \times m)$ tensor and vec is a 1-D tensor of size m, then input must be broadcastable with a 1-D tensor of size n and out will be a 1-D tensor of size n.
alpha and beta are scaling factors on the matrix-vector product between mat and vec and the added tensor input respectively.
For inputs of type FloatTensor or DoubleTensor, arguments beta and alpha must be real numbers, otherwise they should be integers.
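The result is beta * input + alpha * (mat %*% vec). A base R sketch with made-up values and the shapes of the documented example (input of size 2, mat 2 x 3, vec of size 3); it does not call torch.

```r
# Reference arithmetic in base R; not the torch implementation.
input <- c(1, 2)                                # size n = 2
mat   <- matrix(c(1, 2, 3,
                  4, 5, 6), nrow = 2, byrow = TRUE)  # n x m = 2 x 3
vec   <- c(1, 0, -1)                            # size m = 3
beta  <- 1
alpha <- 1

# Drop the matrix-vector product to a plain vector before adding.
out <- beta * input + alpha * as.vector(mat %*% vec)
out
```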
if (torch_is_installed()) {
  M <- torch_randn(c(2))
  mat <- torch_randn(c(2, 3))
  vec <- torch_randn(c(3))
  torch_addmv(M, mat, vec)
}
Addr
torch_addr(self, vec1, vec2, beta = 1L, alpha = 1L)
self |
(Tensor) matrix to be added |
vec1 |
(Tensor) the first vector of the outer product |
vec2 |
(Tensor) the second vector of the outer product |
beta |
(Number, optional) multiplier for |
alpha |
(Number, optional) multiplier for |
Performs the outer-product of vectors vec1 and vec2 and adds it to the matrix input.
Optional values alpha and beta are scaling factors on the outer product between vec1 and vec2 and the added matrix input respectively.
If vec1 is a vector of size n and vec2 is a vector of size m, then input must be broadcastable with a matrix of size $(n \times m)$ and out will be a matrix of size $(n \times m)$.
For inputs of type FloatTensor or DoubleTensor, arguments beta and alpha must be real numbers, otherwise they should be integers.
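The result is beta * input + alpha * (vec1 outer vec2). The base R sketch below uses the same values as the documented example (vectors 1..3 and 1..2, a 3 x 2 matrix of zeros) and does not call torch.

```r
# Reference arithmetic in base R; not the torch implementation.
vec1  <- c(1, 2, 3)          # size n = 3
vec2  <- c(1, 2)             # size m = 2
input <- matrix(0, 3, 2)     # n x m matrix to be added
beta  <- 1
alpha <- 1

# outer() builds the n x m matrix with entries vec1[i] * vec2[j].
out <- beta * input + alpha * outer(vec1, vec2)
out
```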
if (torch_is_installed()) {
  vec1 <- torch_arange(1, 3)
  vec2 <- torch_arange(1, 2)
  M <- torch_zeros(c(3, 2))
  torch_addr(M, vec1, vec2)
}
Allclose
torch_allclose(self, other, rtol = 1e-05, atol = 1e-08, equal_nan = FALSE)
self |
(Tensor) first tensor to compare |
other |
(Tensor) second tensor to compare |
rtol |
(float, optional) relative tolerance. Default: 1e-05 |
atol |
(float, optional) absolute tolerance. Default: 1e-08 |
equal_nan |
(bool, optional) if |
This function checks if all input and other satisfy the condition
$$|\text{input} - \text{other}| \leq \text{atol} + \text{rtol} \times |\text{other}|$$
elementwise, for all elements of input and other. The behaviour of this function is analogous to numpy.allclose (https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html).
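The elementwise tolerance check can be sketched in base R as |x - y| <= atol + rtol * |y|. allclose_ref() below is a hypothetical helper written for illustration, not a torch function; it works in double precision and does not treat infinities specially.

```r
# Reference sketch in base R; allclose_ref() is a hypothetical helper.
allclose_ref <- function(x, y, rtol = 1e-05, atol = 1e-08, equal_nan = FALSE) {
  both_nan <- is.nan(x) & is.nan(y)
  close <- abs(x - y) <= atol + rtol * abs(y)
  # Comparisons involving NaN yield NA; treat them as "not close".
  close[is.na(close)] <- FALSE
  # Optionally count two NaNs in the same position as equal.
  if (equal_nan) close[both_nan] <- TRUE
  all(close)
}

r1 <- allclose_ref(c(10000, 1e-07), c(10000.1, 1e-08))
r2 <- allclose_ref(c(10000, 1e-08), c(10000.1, 1e-09))
r3 <- allclose_ref(c(1.0, NaN), c(1.0, NaN))
r4 <- allclose_ref(c(1.0, NaN), c(1.0, NaN), equal_nan = TRUE)
c(r1, r2, r3, r4)
```

The four calls use the same inputs as the documented example. In the first, the second pair differs by 9e-08, which exceeds the tolerance of roughly 1e-08, so the check fails.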
if (torch_is_installed()) {
  torch_allclose(torch_tensor(c(10000., 1e-07)), torch_tensor(c(10000.1, 1e-08)))
  torch_allclose(torch_tensor(c(10000., 1e-08)), torch_tensor(c(10000.1, 1e-09)))
  torch_allclose(torch_tensor(c(1.0, NaN)), torch_tensor(c(1.0, NaN)))
  torch_allclose(torch_tensor(c(1.0, NaN)), torch_tensor(c(1.0, NaN)), equal_nan = TRUE)
}
Amax
torch_amax(self, dim = list(), keepdim = FALSE)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints) the dimension or dimensions to reduce. |
keepdim |
(bool) whether the output tensor has |
Returns the maximum value of each slice of the input
tensor in the given
dimension(s) dim
.
The difference between max/min and amax/amin is:
amax/amin supports reducing on multiple dimensions,
amax/amin does not return indices,
amax/amin evenly distributes gradient between equal values, while max(dim)/min(dim) propagates gradient only to a single index in the source tensor.
If keepdim is TRUE, the output tensors are of the same size as input except in the dimension(s) dim where they are of size 1. Otherwise, dims are squeezed (see torch_squeeze()), resulting in the output tensors having fewer dimensions than input.
if (torch_is_installed()) {
  a <- torch_randn(c(4, 4))
  a
  torch_amax(a, 1)
}
Amin
torch_amin(self, dim = list(), keepdim = FALSE)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints) the dimension or dimensions to reduce. |
keepdim |
(bool) whether the output tensor has |
Returns the minimum value of each slice of the input
tensor in the given
dimension(s) dim
.
The difference between max/min and amax/amin is:
amax/amin supports reducing on multiple dimensions,
amax/amin does not return indices,
amax/amin evenly distributes gradient between equal values, while max(dim)/min(dim) propagates gradient only to a single index in the source tensor.
If keepdim
is TRUE
, the output tensors are of the same size as
input
except in the dimension(s) dim
where they are of size 1.
Otherwise, dim
s are squeezed (see torch_squeeze()
), resulting in
the output tensors having fewer dimensions than input
.
if (torch_is_installed()) {
  a <- torch_randn(c(4, 4))
  a
  torch_amin(a, 1)
}
Angle
torch_angle(self)
self |
(Tensor) the input tensor. |
Computes the element-wise angle (in radians) of the given input
tensor.
if (torch_is_installed()) {
  ## Not run: 
  # convert the angles from radians to degrees
  torch_angle(torch_tensor(c(-1 + 1i, -2 + 2i, 3 - 3i))) * 180 / pi
  ## End(Not run)
}
Arange
torch_arange( start, end, step = 1L, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
start |
(Number) the starting value for the set of points. Default: |
end |
(Number) the ending value for the set of points |
step |
(Number) the gap between each pair of adjacent points. Default: |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Returns a 1-D tensor of size $\left\lceil \frac{\text{end} - \text{start}}{\text{step}} \right\rceil$ with values from the interval [start, end) taken with common difference step beginning from start.
Note that non-integer step
is subject to floating point rounding errors when
comparing against end
; to avoid inconsistency, we advise adding a small epsilon to end
in such cases.
if (torch_is_installed()) {
  torch_arange(start = 0, end = 5)
  torch_arange(1, 4)
  torch_arange(1, 2.5, 0.5)
}
Arccos
torch_arccos(self)
self |
(Tensor) the input tensor. |
Alias for torch_acos()
.
Arccosh
torch_arccosh(self)
self |
(Tensor) the input tensor. |
Alias for torch_acosh()
.
Arcsin
torch_arcsin(self)
self |
(Tensor) the input tensor. |
Alias for torch_asin()
.
Arcsinh
torch_arcsinh(self)
self |
(Tensor) the input tensor. |
Alias for torch_asinh()
.
Arctan
torch_arctan(self)
self |
(Tensor) the input tensor. |
Alias for torch_atan()
.
Arctanh
torch_arctanh(self)
self |
(Tensor) the input tensor. |
Alias for torch_atanh()
.
Argmax
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to reduce. If |
keepdim |
(bool) whether the output tensor has |
Returns the indices of the maximum value of all elements in the input
tensor.
This is the second value returned by torch_max
. See its
documentation for the exact semantics of this method.
Returns the indices of the maximum values of a tensor across a dimension.
This is the second value returned by torch_max
. See its
documentation for the exact semantics of this method.
if (torch_is_installed()) {
  ## Not run: 
  a = torch_randn(c(4, 4))
  a
  torch_argmax(a)
  ## End(Not run)
  a = torch_randn(c(4, 4))
  a
  torch_argmax(a, dim=1)
}
Argmin
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to reduce. If |
keepdim |
(bool) whether the output tensor has |
Returns the indices of the minimum value of all elements in the input
tensor.
This is the second value returned by torch_min
. See its
documentation for the exact semantics of this method.
Returns the indices of the minimum values of a tensor across a dimension.
This is the second value returned by torch_min
. See its
documentation for the exact semantics of this method.
if (torch_is_installed()) {
  a = torch_randn(c(4, 4))
  a
  torch_argmin(a)
  a = torch_randn(c(4, 4))
  a
  torch_argmin(a, dim=1)
}
Argsort
torch_argsort(self, dim = -1L, descending = FALSE)
self |
(Tensor) the input tensor. |
dim |
(int, optional) the dimension to sort along |
descending |
(bool, optional) controls the sorting order (ascending or descending) |
Returns the indices that sort a tensor along a given dimension in ascending order by value.
This is the second value returned by torch_sort
. See its documentation
for the exact semantics of this method.
if (torch_is_installed()) {
  a = torch_randn(c(4, 4))
  a
  torch_argsort(a, dim=1)
}
As_strided
torch_as_strided(self, size, stride, storage_offset = NULL)
self |
(Tensor) the input tensor. |
size |
(tuple or ints) the shape of the output tensor |
stride |
(tuple or ints) the stride of the output tensor |
storage_offset |
(int, optional) the offset in the underlying storage of the output tensor |
Create a view of an existing torch_Tensor
input
with specified
size
, stride
and storage_offset
.
More than one element of a created tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first.
Many PyTorch functions, which return a view of a tensor, are internally implemented with this function. Those functions, like `torch_Tensor.expand`, are easier to read and are therefore more advisable to use.
if (torch_is_installed()) {
  x = torch_randn(c(3, 3))
  x
  t = torch_as_strided(x, list(2, 2), list(1, 2))
  t
  t = torch_as_strided(x, list(2, 2), list(1, 2), 1)
  t
}
Asin
torch_asin(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the arcsine of the elements of input
.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_asin(a)
}
Asinh
torch_asinh(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the inverse hyperbolic sine of the elements of input
.
if (torch_is_installed()) {
  a <- torch_randn(c(4))
  a
  torch_asinh(a)
}
Atan
torch_atan(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the arctangent of the elements of input
.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_atan(a)
}
Atan2
torch_atan2(self, other)
self |
(Tensor) the first input tensor |
other |
(Tensor) the second input tensor |
Element-wise arctangent of $\text{input}_i / \text{other}_i$ with consideration of the quadrant. Returns a new tensor with the signed angles in radians between vector $(\text{other}_i, \text{input}_i)$ and vector $(1, 0)$. (Note that $\text{other}_i$, the second parameter, is the x-coordinate, while $\text{input}_i$, the first parameter, is the y-coordinate.)
The shapes of input and other must be broadcastable.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_atan2(a, torch_randn(c(4)))
}
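The argument order (y first, x second) and the quadrant handling can be illustrated with base R's own `atan2()`, which follows the same convention. This sketch does not use torch; the vectors `y` and `x` are made-up inputs with one point per quadrant.

```r
# Base R's atan2(y, x) takes the y-coordinate first and the x-coordinate second.
y <- c(1, 1, -1, -1)
x <- c(1, -1, -1, 1)

# Signed angle of each point (x, y) against the positive x-axis, in degrees.
angles <- atan2(y, x) * 180 / pi
print(angles)

# The plain ratio y / x cannot distinguish opposite quadrants.
ratio_angles <- atan(y / x) * 180 / pi
print(ratio_angles)
```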
Atanh
torch_atanh(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the inverse hyperbolic tangent of the elements of input
.
The domain of the inverse hyperbolic tangent is (-1, 1)
and values outside this range
will be mapped to NaN
, except for the values 1
and -1
for which the output is
mapped to +/-INF
respectively.
if (torch_is_installed()) {
  a = torch_randn(c(4))$uniform_(-1, 1)
  a
  torch_atanh(a)
}
Returns a 1-dimensional view of each input tensor with zero dimensions. Input tensors with one or more dimensions are returned as-is.
torch_atleast_1d(self)
self |
(Tensor or list of Tensors) |
if (torch_is_installed()) {
  x <- torch_randn(c(2))
  x
  torch_atleast_1d(x)
  x <- torch_tensor(1.)
  x
  torch_atleast_1d(x)
  x <- torch_tensor(0.5)
  y <- torch_tensor(1.)
  torch_atleast_1d(list(x,y))
}
Returns a 2-dimensional view of each input tensor with zero dimensions. Input tensors with two or more dimensions are returned as-is.
torch_atleast_2d(self)
self |
(Tensor or list of Tensors) |
if (torch_is_installed()) {
  x <- torch_tensor(1.)
  x
  torch_atleast_2d(x)
  x <- torch_randn(c(2,2))
  x
  torch_atleast_2d(x)
  x <- torch_tensor(0.5)
  y <- torch_tensor(1.)
  torch_atleast_2d(list(x,y))
}
Returns a 3-dimensional view of each input tensor with zero dimensions. Input tensors with three or more dimensions are returned as-is.
torch_atleast_3d(self)
self |
(Tensor or list of Tensors) |
Avg_pool1d
torch_avg_pool1d( self, kernel_size, stride = list(), padding = 0L, ceil_mode = FALSE, count_include_pad = TRUE )
self |
input tensor of shape |
kernel_size |
the size of the window. Can be a single number or a tuple |
stride |
the stride of the window. Can be a single number or a tuple |
padding |
implicit zero paddings on both sides of the input. Can be a single number or a tuple |
ceil_mode |
when |
count_include_pad |
when |
Applies a 1D average pooling over an input signal composed of several input planes.
See nn_avg_pool1d()
for details and output shape.
Baddbmm
torch_baddbmm(self, batch1, batch2, beta = 1L, alpha = 1L)
self |
(Tensor) the tensor to be added |
batch1 |
(Tensor) the first batch of matrices to be multiplied |
batch2 |
(Tensor) the second batch of matrices to be multiplied |
beta |
(Number, optional) multiplier for |
alpha |
(Number, optional) multiplier for |
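Performs a batch matrix-matrix product of matrices in batch1 and batch2. input is added to the final result.
batch1 and batch2 must be 3-D tensors each containing the same number of matrices.
If batch1 is a $(b \times n \times m)$ tensor and batch2 is a $(b \times m \times p)$ tensor, then input must be broadcastable with a $(b \times n \times p)$ tensor and out will be a $(b \times n \times p)$ tensor:
$$\text{out}_i = \beta\, \text{input}_i + \alpha\, (\text{batch1}_i \mathbin{@} \text{batch2}_i)$$
Both alpha and beta mean the same as the scaling factors used in torch_addbmm.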
For inputs of type FloatTensor
or DoubleTensor
, arguments beta
and
alpha
must be real numbers, otherwise they should be integers.
if (torch_is_installed()) {
  M = torch_randn(c(10, 3, 5))
  batch1 = torch_randn(c(10, 3, 4))
  batch2 = torch_randn(c(10, 4, 5))
  torch_baddbmm(M, batch1, batch2)
}
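As a rough reference for the semantics, the per-batch computation out_i = beta * input_i + alpha * (batch1_i %*% batch2_i) can be written in base R on plain 3-D arrays indexed as [batch, row, column]. `baddbmm_sketch` and `slice` are hypothetical helpers for illustration; they do not broadcast input and are not how torch computes the result.

```r
# Hypothetical base-R reference for the batched formula (no torch, no broadcasting).
slice <- function(a, i) {
  # Extract the i-th matrix of a [batch, row, col] array, keeping it 2-D.
  matrix(a[i, , ], nrow = dim(a)[2])
}

baddbmm_sketch <- function(input, batch1, batch2, beta = 1, alpha = 1) {
  b <- dim(batch1)[1]
  out <- array(0, dim = c(b, dim(batch1)[2], dim(batch2)[3]))
  for (i in seq_len(b)) {
    out[i, , ] <- beta * slice(input, i) + alpha * (slice(batch1, i) %*% slice(batch2, i))
  }
  out
}

set.seed(1)
M  <- array(rnorm(10 * 3 * 5), dim = c(10, 3, 5))
b1 <- array(rnorm(10 * 3 * 4), dim = c(10, 3, 4))
b2 <- array(rnorm(10 * 4 * 5), dim = c(10, 4, 5))
res <- baddbmm_sketch(M, b1, b2)
print(dim(res))
```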
Bartlett_window
torch_bartlett_window( window_length, periodic = TRUE, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
window_length |
(int) the size of returned window |
periodic |
(bool, optional) If TRUE, returns a window to be used as periodic function. If FALSE, returns a symmetric window. |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Bartlett window function.
$$w[n] = 1 - \left| \frac{2n}{N-1} - 1 \right| = \begin{cases} \frac{2n}{N-1} & \text{if } 0 \leq n \leq \frac{N-1}{2} \\ 2 - \frac{2n}{N-1} & \text{if } \frac{N-1}{2} < n < N \end{cases}$$
where $N$ is the full window size.
The input window_length is a positive integer controlling the returned window size. The periodic flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like torch_stft. Therefore, if periodic is TRUE, the $N$ in the above formula is in fact window_length + 1. Also, torch_bartlett_window(L, periodic = TRUE) is always equal to torch_bartlett_window(L + 1, periodic = FALSE) with its last element dropped.
If window_length = 1, the returned window contains a single value 1.
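The relationship between the periodic and symmetric windows can be checked with a small base-R sketch of the triangular formula w[n] = 1 - |2n / (N - 1) - 1|. `bartlett_sketch` is a hypothetical helper written for this illustration and does not call torch.

```r
# Hypothetical base-R version of the Bartlett window (for illustration only).
bartlett_sketch <- function(window_length, periodic = TRUE) {
  if (window_length == 1) return(1)
  # A periodic window is a symmetric window of length + 1 without its last value.
  N <- if (periodic) window_length + 1 else window_length
  n <- 0:(N - 1)
  w <- 1 - abs(2 * n / (N - 1) - 1)
  if (periodic) w[-N] else w
}

sym <- bartlett_sketch(5, periodic = FALSE)
per <- bartlett_sketch(4, periodic = TRUE)
print(sym)
print(per)
```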
Bernoulli
torch_bernoulli(self, p, generator = NULL)
self |
(Tensor) the input tensor of probability values for the Bernoulli distribution |
p |
(Number) a probability value. If |
generator |
( |
Draws binary random numbers (0 or 1) from a Bernoulli distribution.
The input tensor should be a tensor containing probabilities to be used for drawing the binary random number. Hence, all values in input have to be in the range $0 \leq \text{input}_i \leq 1$.
The $i$-th element of the output tensor will draw a value 1 according to the $i$-th probability value given in input:
$$\text{out}_i \sim \mathrm{Bernoulli}(p = \text{input}_i)$$
The returned out tensor only has values 0 or 1 and is of the same shape as input.
out can have integral dtype, but input must have floating point dtype.
if (torch_is_installed()) {
  a = torch_empty(c(3, 3))$uniform_(0, 1) # generate a uniform random matrix with range c(0, 1)
  a
  torch_bernoulli(a)
  a = torch_ones(c(3, 3)) # probability of drawing "1" is 1
  torch_bernoulli(a)
  a = torch_zeros(c(3, 3)) # probability of drawing "1" is 0
  torch_bernoulli(a)
}
Bincount
self |
(Tensor) 1-d int tensor |
weights |
(Tensor) optional, weight for each value in the input tensor. Should be of same size as input tensor. |
minlength |
(int) optional, minimum number of bins. Should be non-negative. |
Count the frequency of each value in an array of non-negative ints.
The number of bins (size 1) is one larger than the largest value in
input
unless input
is empty, in which case the result is a
tensor of size 0. If minlength
is specified, the number of bins is at least
minlength
and if input
is empty, then the result is tensor of size
minlength
filled with zeros. If n
is the value at position i
,
out[n] += weights[i]
if weights
is specified else
out[n] += 1
.
if (torch_is_installed()) {
  input = torch_randint(1, 8, list(5), dtype=torch_int64())
  weights = torch_linspace(0, 1, steps=5)
  input
  weights
  torch_bincount(input, weights)
  input$bincount(weights)
}
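The accumulation rule (out[n] += weights[i], or += 1 without weights) can be sketched in base R. `bincount_sketch` is a hypothetical helper; it stores the bin for value n at position n + 1 because R vectors are 1-based, and it is not the package's implementation.

```r
# Hypothetical base-R illustration of the counting rule (not torch code).
bincount_sketch <- function(x, weights = NULL, minlength = 0) {
  if (is.null(weights)) weights <- rep(1, length(x))
  # One bin per value 0..max(x); at least minlength bins; empty input gives minlength bins.
  n_bins <- if (length(x) == 0) minlength else max(max(x) + 1, minlength)
  out <- numeric(n_bins)
  for (i in seq_along(x)) {
    out[x[i] + 1] <- out[x[i] + 1] + weights[i]
  }
  out
}

counts   <- bincount_sketch(c(1, 3, 3, 0))
weighted <- bincount_sketch(c(1, 3, 3, 0), weights = c(0.5, 1, 1, 2))
padded   <- bincount_sketch(numeric(0), minlength = 3)
print(counts)
print(weighted)
print(padded)
```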
Bitwise_and
torch_bitwise_and(self, other)
self |
the first input tensor |
other |
the second input tensor |
Computes the bitwise AND of input
and other
. The input tensor must be of
integral or Boolean types. For bool tensors, it computes the logical AND.
Bitwise_not
torch_bitwise_not(self)
self |
(Tensor) the input tensor. |
Computes the bitwise NOT of the given input tensor. The input tensor must be of integral or Boolean types. For bool tensors, it computes the logical NOT.
Bitwise_or
torch_bitwise_or(self, other)
self |
the first input tensor |
other |
the second input tensor |
Computes the bitwise OR of input
and other
. The input tensor must be of
integral or Boolean types. For bool tensors, it computes the logical OR.
Bitwise_xor
torch_bitwise_xor(self, other)
self |
the first input tensor |
other |
the second input tensor |
Computes the bitwise XOR of input
and other
. The input tensor must be of
integral or Boolean types. For bool tensors, it computes the logical XOR.
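For intuition about the four bitwise operations, base R offers scalar analogues (`bitwAnd()`, `bitwOr()`, `bitwXor()`, `bitwNot()`) that act on integers. The sketch below uses them on two made-up integers and does not involve torch tensors.

```r
# Base-R analogues of the bitwise operations on two example integers.
a <- 12L  # binary 1100
b <- 10L  # binary 1010

and_ab <- bitwAnd(a, b)  # bits set in both
or_ab  <- bitwOr(a, b)   # bits set in either
xor_ab <- bitwXor(a, b)  # bits set in exactly one
not_a  <- bitwNot(a)     # all bits flipped (two's complement)
print(c(and_ab, or_ab, xor_ab, not_a))

# On logical values the same operations reduce to logical AND, OR, XOR and NOT.
logical_ops <- c(TRUE & FALSE, TRUE | FALSE, xor(TRUE, FALSE), !TRUE)
print(logical_ops)
```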
Blackman_window
torch_blackman_window( window_length, periodic = TRUE, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
window_length |
(int) the size of returned window |
periodic |
(bool, optional) If TRUE, returns a window to be used as periodic function. If FALSE, returns a symmetric window. |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Blackman window function.
$$w[n] = 0.42 - 0.5 \cos\left(\frac{2 \pi n}{N - 1}\right) + 0.08 \cos\left(\frac{4 \pi n}{N - 1}\right)$$
where $N$ is the full window size.
The input window_length is a positive integer controlling the returned window size. The periodic flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like torch_stft. Therefore, if periodic is TRUE, the $N$ in the above formula is in fact window_length + 1. Also, torch_blackman_window(L, periodic = TRUE) is always equal to torch_blackman_window(L + 1, periodic = FALSE) with its last element dropped.
If window_length = 1, the returned window contains a single value 1.
Create a block diagonal matrix from provided tensors.
torch_block_diag(tensors)
tensors |
(list of tensors) One or more tensors with 0, 1, or 2 dimensions. |
if (torch_is_installed()) {
  A <- torch_tensor(rbind(c(0, 1), c(1, 0)))
  B <- torch_tensor(rbind(c(3, 4, 5), c(6, 7, 8)))
  C <- torch_tensor(7)
  D <- torch_tensor(c(1, 2, 3))
  E <- torch_tensor(rbind(4, 5, 6))
  torch_block_diag(list(A, B, C, D, E))
}
Bmm
torch_bmm(self, mat2)
self |
(Tensor) the first batch of matrices to be multiplied |
mat2 |
(Tensor) the second batch of matrices to be multiplied |
Performs a batch matrix-matrix product of matrices stored in input and mat2.
input and mat2 must be 3-D tensors each containing the same number of matrices.
If input is a $(b \times n \times m)$ tensor and mat2 is a $(b \times m \times p)$ tensor, out will be a $(b \times n \times p)$ tensor:
$$\text{out}_i = \text{input}_i \mathbin{@} \text{mat2}_i$$
This function does not broadcast. For broadcasting matrix products, see torch_matmul.
if (torch_is_installed()) {
  input = torch_randn(c(10, 3, 4))
  mat2 = torch_randn(c(10, 4, 5))
  res = torch_bmm(input, mat2)
  res
}
Broadcast_tensors
torch_broadcast_tensors(tensors)
tensors |
a list containing any number of tensors of the same type |
Broadcasts the given tensors according to broadcasting-semantics.
if (torch_is_installed()) {
  x = torch_arange(0, 3)$view(c(1, 4))
  y = torch_arange(0, 2)$view(c(3, 1))
  out = torch_broadcast_tensors(list(x, y))
  out[[1]]
}
Bucketize
torch_bucketize(self, boundaries, out_int32 = FALSE, right = FALSE)
self |
(Tensor or Scalar) N-D tensor or a Scalar containing the search value(s). |
boundaries |
(Tensor) 1-D tensor, must contain a monotonically increasing sequence. |
out_int32 |
(bool, optional) – indicate the output data type. |
right |
(bool, optional) – if FALSE, return the first suitable location that is found. If TRUE, return the last such index. If no suitable index is found, return 0 for non-numerical values (e.g. nan, inf) or the size of boundaries (one past the last index). In other words, if FALSE, gets the lower bound index for each value in input from boundaries. If TRUE, gets the upper bound index instead. Default value is FALSE. |
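Returns the indices of the buckets to which each value in the input belongs, where the boundaries of the buckets are set by boundaries. Returns a new tensor with the same size as input. If right is FALSE (default), the lower bound index is used, so the left boundary of each bucket is open and the right boundary is closed: a value equal to a boundary falls in the bucket that ends at that boundary.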
if (torch_is_installed()) {
  boundaries <- torch_tensor(c(1, 3, 5, 7, 9))
  boundaries
  v <- torch_tensor(rbind(c(3, 6, 9), c(3, 6, 9)))
  v
  torch_bucketize(v, boundaries)
  torch_bucketize(v, boundaries, right=TRUE)
}
Can_cast
torch_can_cast(from_, to)
from_ |
(dtype) The original |
to |
(dtype) The target |
Determines if a type conversion is allowed under PyTorch casting rules described in the type promotion documentation.
if (torch_is_installed()) {
  torch_can_cast(torch_double(), torch_float())
  torch_can_cast(torch_float(), torch_int())
}
Computes the Cartesian product of the given sequence of tensors.
torch_cartesian_prod(tensors)
tensors |
a list containing any number of 1 dimensional tensors. |
if (torch_is_installed()) {
  a = c(1, 2, 3)
  b = c(4, 5)
  tensor_a = torch_tensor(a)
  tensor_b = torch_tensor(b)
  torch_cartesian_prod(list(tensor_a, tensor_b))
}
Cat
torch_cat(tensors, dim = 1L)
tensors |
(list of Tensors) a list of tensors of the same type. Non-empty tensors provided must have the same shape, except in the cat dimension. |
dim |
(int, optional) the dimension over which the tensors are concatenated |
Concatenates the given sequence of tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty.
torch_cat can be seen as an inverse operation for torch_split() and torch_chunk.
torch_cat can be best understood via examples.
if (torch_is_installed()) {
  x = torch_randn(c(2, 3))
  x
  torch_cat(list(x, x, x), 1)
  torch_cat(list(x, x, x), 2)
}
Cdist
torch_cdist(x1, x2, p = 2L, compute_mode = NULL)
x1 |
(Tensor) input tensor of shape |
x2 |
(Tensor) input tensor of shape |
p |
p value for the p-norm distance to calculate between each vector pair |
compute_mode |
'use_mm_for_euclid_dist_if_necessary' uses the matrix multiplication approach to calculate the Euclidean distance (p = 2) if P > 25 or R > 25; 'use_mm_for_euclid_dist' always uses the matrix multiplication approach to calculate the Euclidean distance (p = 2); 'donot_use_mm_for_euclid_dist' never uses the matrix multiplication approach to calculate the Euclidean distance (p = 2). Default: use_mm_for_euclid_dist_if_necessary. |
Computes the batched p-norm distance between each pair of the two collections of row vectors.
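For a single (non-batched) pair of matrices, the pairwise p-norm distance between row vectors can be sketched in base R as follows. `cdist_sketch` is a hypothetical helper that uses explicit loops for clarity; it ignores compute_mode and is not the package's implementation.

```r
# Hypothetical base-R illustration of pairwise p-norm distances between rows.
cdist_sketch <- function(x1, x2, p = 2) {
  out <- matrix(0, nrow = nrow(x1), ncol = nrow(x2))
  for (i in seq_len(nrow(x1))) {
    for (j in seq_len(nrow(x2))) {
      # p-norm of the difference between row i of x1 and row j of x2
      out[i, j] <- sum(abs(x1[i, ] - x2[j, ])^p)^(1 / p)
    }
  }
  out
}

x1 <- rbind(c(0, 0), c(3, 4))
x2 <- rbind(c(0, 0), c(6, 8), c(3, 0))
d2 <- cdist_sketch(x1, x2)         # Euclidean distances
d1 <- cdist_sketch(x1, x2, p = 1)  # Manhattan distances
print(d2)
print(d1)
```

The result has one row per row of x1 and one column per row of x2.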
Ceil
torch_ceil(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the ceil of the elements of input
,
the smallest integer greater than or equal to each element.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_ceil(a)
}
Celu
torch_celu(self, alpha = 1L)
self |
the input tensor |
alpha |
the alpha value for the CELU formulation. Default: 1.0 |
See nnf_celu()
for more info.
Celu_
torch_celu_(self, alpha = 1L)
self |
the input tensor |
alpha |
the alpha value for the CELU formulation. Default: 1.0 |
In-place version of torch_celu()
.
Chain_matmul
torch_chain_matmul(matrices)
matrices |
(Tensors...) a sequence of 2 or more 2-D tensors whose product is to be determined. |
Returns the matrix product of the $N$ 2-D tensors. This product is efficiently computed using the matrix chain order algorithm, which selects the order that incurs the lowest cost in terms of arithmetic operations ([CLRS]). Note that since this is a function to compute the product, $N$ needs to be greater than or equal to 2; if equal to 2 then a trivial matrix-matrix product is returned. If $N$ is 1, then this is a no-op - the original matrix is returned as is.
if (torch_is_installed()) {
  a = torch_randn(c(3, 4))
  b = torch_randn(c(4, 5))
  c = torch_randn(c(5, 6))
  d = torch_randn(c(6, 7))
  torch_chain_matmul(list(a, b, c, d))
}
Channel_shuffle
torch_channel_shuffle(self, groups)
self |
(Tensor) the input tensor |
groups |
(int) number of groups to divide channels in and rearrange. |
Divide the channels in a tensor of shape $(*, C, H, W)$ into g groups and rearrange them as $(*, C / g, g, H, W)$, while keeping the original tensor shape.
if (torch_is_installed()) {
  input <- torch_randn(c(1, 4, 2, 2))
  print(input)
  output <- torch_channel_shuffle(input, 2)
  print(output)
}
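The rearrangement can be pictured on channel indices alone. The base-R sketch below assumes the usual view-transpose-flatten description of channel shuffle (channels split into g consecutive groups, then interleaved); `channel_order` is a hypothetical helper that returns the resulting 1-based channel order and does not touch any tensor.

```r
# Hypothetical base-R illustration: which input channel ends up at each output position.
channel_order <- function(n_channels, groups) {
  stopifnot(n_channels %% groups == 0)
  # Each column of m holds the consecutive channels of one group.
  m <- matrix(seq_len(n_channels), nrow = n_channels / groups)
  # Reading m row by row interleaves the groups.
  as.vector(t(m))
}

order_4_2 <- channel_order(4, 2)
order_6_3 <- channel_order(6, 3)
print(order_4_2)
print(order_6_3)
```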
Cholesky
torch_cholesky(self, upper = FALSE)
self |
(Tensor) the input tensor |
upper |
(bool, optional) flag that indicates whether to return an
upper or lower triangular matrix. Default: |
Computes the Cholesky decomposition of a symmetric positive-definite matrix $A$ or for batches of symmetric positive-definite matrices.
If upper is TRUE, the returned matrix U is upper-triangular, and the decomposition has the form:
$$A = U^T U$$
If upper is FALSE, the returned matrix L is lower-triangular, and the decomposition has the form:
$$A = L L^T$$
If upper is TRUE, and $A$ is a batch of symmetric positive-definite matrices, then the returned tensor will be composed of upper-triangular Cholesky factors of each of the individual matrices. Similarly, when upper is FALSE, the returned tensor will be composed of lower-triangular Cholesky factors of each of the individual matrices.
if (torch_is_installed()) {
  a = torch_randn(c(3, 3))
  a = torch_mm(a, a$t()) # make symmetric positive-definite
  l = torch_cholesky(a)
  a
  l
  torch_mm(l, l$t())
  a = torch_randn(c(3, 2, 2))
  ## Not run: 
  a = torch_matmul(a, a$transpose(-1, -2)) + 1e-03 # make symmetric positive-definite
  l = torch_cholesky(a)
  z = torch_matmul(l, l$transpose(-1, -2))
  torch_max(torch_abs(z - a)) # Max non-zero
  ## End(Not run)
}
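The two forms of the decomposition can be compared with base R's `chol()`, which returns the upper-triangular factor. This sketch uses a small hand-picked symmetric positive-definite matrix (an assumption for illustration) and no torch calls.

```r
# Base-R check of both factorisation forms on a small SPD matrix.
a <- rbind(c(4, 2, 0),
           c(2, 5, 1),
           c(0, 1, 3))

u <- chol(a)  # upper-triangular factor: a = t(u) %*% u
l <- t(u)     # lower-triangular factor: a = l %*% t(l)

err_upper <- max(abs(t(u) %*% u - a))
err_lower <- max(abs(l %*% t(l) - a))
print(c(err_upper, err_lower))
```

The lower factor is simply the transpose of the upper one, which corresponds to switching the upper flag.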
Cholesky_inverse
torch_cholesky_inverse(self, upper = FALSE)
self |
(Tensor) the input 2-D tensor |
upper |
(bool, optional) whether to return a lower (default) or upper triangular matrix |
Computes the inverse of a symmetric positive-definite matrix $A$ using its Cholesky factor $u$: returns matrix inv. The inverse is computed using LAPACK routines dpotri and spotri (and the corresponding MAGMA routines).
If upper is FALSE (the default), $u$ is lower triangular such that the returned tensor is
$$\text{inv} = (u u^T)^{-1}$$
If upper is TRUE, $u$ is upper triangular such that the returned tensor is
$$\text{inv} = (u^T u)^{-1}$$
if (torch_is_installed()) {
  ## Not run: 
  a = torch_randn(c(3, 3))
  a = torch_mm(a, a$t()) + 1e-05 * torch_eye(3) # make symmetric positive definite
  u = torch_cholesky(a)
  a
  torch_cholesky_inverse(u)
  a$inverse()
  ## End(Not run)
}
Cholesky_solve
torch_cholesky_solve(self, input2, upper = FALSE)
self |
(Tensor) input matrix |
input2 |
(Tensor) input matrix |
upper |
(bool, optional) whether to consider the Cholesky factor as a lower or upper triangular matrix. Default: |
Solves a linear system of equations with a positive semidefinite matrix to be inverted given its Cholesky factor matrix $u$.
If upper is FALSE (the default), $u$ is lower triangular and c is returned such that:
$$c = (u u^T)^{-1} b$$
If upper is TRUE, $u$ is upper triangular and c is returned such that:
$$c = (u^T u)^{-1} b$$
torch_cholesky_solve(b, u) can take in 2D inputs b, u or inputs that are batches of 2D matrices. If the inputs are batches, then returns batched outputs c.
if (torch_is_installed()) {
  a = torch_randn(c(3, 3))
  a = torch_mm(a, a$t()) # make symmetric positive definite
  u = torch_cholesky(a)
  a
  b = torch_randn(c(3, 2))
  b
  torch_cholesky_solve(b, u)
  torch_mm(a$inverse(), b)
}
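Why a Cholesky factor is enough to solve the system can be seen from two triangular solves. The base-R sketch below works with the upper factor from `chol()` on a hand-picked matrix (an assumption for illustration); it mirrors the idea, not the library's LAPACK-based implementation.

```r
# Base-R illustration: solve a %*% c_sol = b using only the Cholesky factor of a.
a <- rbind(c(4, 2, 0),
           c(2, 5, 1),
           c(0, 1, 3))
b <- cbind(c(1, 2, 3), c(0, 1, 0))

u <- chol(a)                  # a = t(u) %*% u, with u upper triangular
y <- forwardsolve(t(u), b)    # first solve t(u) %*% y = b
c_sol <- backsolve(u, y)      # then solve u %*% c_sol = y

err <- max(abs(c_sol - solve(a, b)))
print(err)
```

Each triangular solve costs far less than a general inverse, which is the reason to reuse the factor when it is already available.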
Chunk
torch_chunk(self, chunks, dim = 1L)
self |
(Tensor) the tensor to split |
chunks |
(int) number of chunks to return |
dim |
(int) dimension along which to split the tensor |
Splits a tensor into a specific number of chunks. Each chunk is a view of the input tensor.
Last chunk will be smaller if the tensor size along the given dimension
dim
is not divisible by chunks
.
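How the chunk sizes come out can be sketched on a plain vector in base R. The sketch assumes, as in PyTorch, that every chunk except possibly the last holds ceiling(n / chunks) elements; `chunk_sketch` is a hypothetical helper and, unlike the tensor function, returns copies rather than views.

```r
# Hypothetical base-R illustration of chunk sizes (assumes ceiling(n / chunks) per chunk).
chunk_sketch <- function(x, chunks) {
  size <- ceiling(length(x) / chunks)
  # Assign consecutive elements to groups of `size` elements each.
  split(x, ceiling(seq_along(x) / size))
}

sizes_5_2 <- unname(lengths(chunk_sketch(1:5, 2)))
sizes_6_4 <- unname(lengths(chunk_sketch(1:6, 4)))
print(sizes_5_2)
print(sizes_6_4)
```

Under this assumption, asking for 4 chunks of 6 elements yields only 3 chunks of size 2, so the number of chunks returned can be smaller than requested.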
Clamp
torch_clamp(self, min = NULL, max = NULL)
self |
(Tensor) the input tensor. |
min |
(Number) lower-bound of the range to be clamped to |
max |
(Number) upper-bound of the range to be clamped to |
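Clamp all elements in input into the range [min, max] and return a resulting tensor:
$$y_i = \begin{cases} \text{min} & \text{if } x_i < \text{min} \\ x_i & \text{if } \text{min} \leq x_i \leq \text{max} \\ \text{max} & \text{if } x_i > \text{max} \end{cases}$$
If input is of type FloatTensor or DoubleTensor, args min and max must be real numbers, otherwise they should be integers.
If only min is given, clamps all elements in input to be greater than or equal to min:
$$y_i = \max(x_i, \text{min})$$
If input is of type FloatTensor or DoubleTensor, min should be a real number, otherwise it should be an integer.
If only max is given, clamps all elements in input to be less than or equal to max:
$$y_i = \min(x_i, \text{max})$$
If input is of type FloatTensor or DoubleTensor, max should be a real number, otherwise it should be an integer.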
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_clamp(a, min=-0.5, max=0.5)
  a = torch_randn(c(4))
  a
  torch_clamp(a, min=0.5)
  a = torch_randn(c(4))
  a
  torch_clamp(a, max=0.5)
}
Clip
torch_clip(self, min = NULL, max = NULL)
self |
(Tensor) the input tensor. |
min |
(Number) lower-bound of the range to be clamped to |
max |
(Number) upper-bound of the range to be clamped to |
Alias for torch_clamp().
Clone
torch_clone(self, memory_format = NULL)
self |
(Tensor) the input tensor. |
memory_format |
a torch memory format. see |
Returns a copy of input.
This function is differentiable, so gradients will flow back from the result of this operation to input. To create a tensor without an autograd relationship to input see Tensor$detach.
Combinations
torch_combinations(self, r = 2L, with_replacement = FALSE)
self |
(Tensor) 1D vector. |
r |
(int, optional) number of elements to combine |
with_replacement |
(boolean, optional) whether to allow duplication in combination |
Computes combinations of length r of the given tensor. The behavior is similar to Python's itertools.combinations when with_replacement is set to FALSE, and itertools.combinations_with_replacement when with_replacement is set to TRUE.
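For readers more familiar with base R than with Python's itertools, the same enumeration can be sketched with combn() and expand.grid(). This is only an analogy on plain vectors under the assumption that rows are listed in lexicographic index order; it does not call torch.

```r
a <- c(1, 2, 3)

# Without replacement (r = 2): combn() returns one combination per column,
# so transpose to get one combination per row.
pairs <- t(combn(a, 2))
print(pairs)

# With replacement (r = 2): keep all index pairs with i <= j.
idx <- expand.grid(j = seq_along(a), i = seq_along(a))
idx <- idx[idx$i <= idx$j, c("i", "j")]
pairs_rep <- cbind(a[idx$i], a[idx$j])
print(pairs_rep)
```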
if (torch_is_installed()) {
  a = c(1, 2, 3)
  tensor_a = torch_tensor(a)
  torch_combinations(tensor_a)
  torch_combinations(tensor_a, r=3)
  torch_combinations(tensor_a, with_replacement=TRUE)
}
Complex
torch_complex(real, imag)
real |
(Tensor) The real part of the complex tensor. Must be float or double. |
imag |
(Tensor) The imaginary part of the complex tensor. Must be same dtype
as |
Constructs a complex tensor with its real part equal to real and its imaginary part equal to imag.
if (torch_is_installed()) {
  real <- torch_tensor(c(1, 2), dtype=torch_float32())
  imag <- torch_tensor(c(3, 4), dtype=torch_float32())
  z <- torch_complex(real, imag)
  z
  z$dtype
}
Conj
torch_conj(self)
self |
(Tensor) the input tensor. |
Computes the element-wise conjugate of the given input tensor.
if (torch_is_installed()) {
  ## Not run:
  torch_conj(torch_tensor(c(-1 + 1i, -2 + 2i, 3 - 3i)))
  ## End(Not run)
}
Conv_tbc
torch_conv_tbc(self, weight, bias, pad = 0L)
self |
NA input tensor of shape |
weight |
NA filter of shape ( |
bias |
NA bias of shape ( |
pad |
NA number of timesteps to pad. Default: 0 |
Applies a 1-dimensional sequence convolution over an input sequence. Input and output dimensions are (Time, Batch, Channels) - hence TBC.
Conv_transpose1d
torch_conv_transpose1d( input, weight, bias = list(), stride = 1L, padding = 0L, output_padding = 0L, groups = 1L, dilation = 1L )
input |
input tensor of shape |
weight |
filters of shape |
bias |
optional bias of shape |
stride |
the stride of the convolving kernel. Can be a single number or a tuple |
padding |
|
output_padding |
additional size added to one side of each dimension in the output shape. Can be a single number or a tuple |
groups |
split input into groups, |
dilation |
the spacing between kernel elements. Can be a single number or a tuple |
Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution".
See nn_conv_transpose1d() for details and output shape.
if (torch_is_installed()) {
  inputs = torch_randn(c(20, 16, 50))
  weights = torch_randn(c(16, 33, 5))
  nnf_conv_transpose1d(inputs, weights)
}
Conv_transpose2d
torch_conv_transpose2d( input, weight, bias = list(), stride = 1L, padding = 0L, output_padding = 0L, groups = 1L, dilation = 1L )
input |
input tensor of shape |
weight |
filters of shape |
bias |
optional bias of shape |
stride |
the stride of the convolving kernel. Can be a single number or a tuple |
padding |
|
output_padding |
additional size added to one side of each dimension in the output shape. Can be a single number or a tuple |
groups |
split input into groups, |
dilation |
the spacing between kernel elements. Can be a single number or a tuple |
Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".
See nn_conv_transpose2d() for details and output shape.
if (torch_is_installed()) {
  # With square kernels and equal stride
  inputs = torch_randn(c(1, 4, 5, 5))
  weights = torch_randn(c(4, 8, 3, 3))
  nnf_conv_transpose2d(inputs, weights, padding=1)
}
Conv_transpose3d
torch_conv_transpose3d( input, weight, bias = list(), stride = 1L, padding = 0L, output_padding = 0L, groups = 1L, dilation = 1L )
input |
input tensor of shape |
weight |
filters of shape |
bias |
optional bias of shape |
stride |
the stride of the convolving kernel. Can be a single number or a tuple |
padding |
|
output_padding |
additional size added to one side of each dimension in the output shape. Can be a single number or a tuple |
groups |
split input into groups, |
dilation |
the spacing between kernel elements. Can be a single number or a tuple |
Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".
See nn_conv_transpose3d() for details and output shape.
if (torch_is_installed()) {
  ## Not run:
  inputs = torch_randn(c(20, 16, 50, 10, 20))
  weights = torch_randn(c(16, 33, 3, 3, 3))
  nnf_conv_transpose3d(inputs, weights)
  ## End(Not run)
}
Conv1d
torch_conv1d( input, weight, bias = list(), stride = 1L, padding = 0L, dilation = 1L, groups = 1L )
input |
input tensor of shape |
weight |
filters of shape |
bias |
optional bias of shape |
stride |
the stride of the convolving kernel. Can be a single number or a one-element tuple |
padding |
implicit paddings on both sides of the input. Can be a single number or a one-element tuple |
dilation |
the spacing between kernel elements. Can be a single number or a one-element tuple |
groups |
split input into groups, |
Applies a 1D convolution over an input signal composed of several input planes.
See nn_conv1d() for details and output shape.
if (torch_is_installed()) {
  filters = torch_randn(c(33, 16, 3))
  inputs = torch_randn(c(20, 16, 50))
  nnf_conv1d(inputs, filters)
}
Conv2d
torch_conv2d( input, weight, bias = list(), stride = 1L, padding = 0L, dilation = 1L, groups = 1L )
input |
input tensor of shape |
weight |
filters of shape |
bias |
optional bias tensor of shape |
stride |
the stride of the convolving kernel. Can be a single number or a tuple |
padding |
implicit paddings on both sides of the input. Can be a single number or a tuple |
dilation |
the spacing between kernel elements. Can be a single number or a tuple |
groups |
split input into groups, |
Applies a 2D convolution over an input image composed of several input planes.
See nn_conv2d() for details and output shape.
if (torch_is_installed()) {
  # With square kernels and equal stride
  filters = torch_randn(c(8,4,3,3))
  inputs = torch_randn(c(1,4,5,5))
  nnf_conv2d(inputs, filters, padding=1)
}
Conv3d
torch_conv3d( input, weight, bias = list(), stride = 1L, padding = 0L, dilation = 1L, groups = 1L )
input |
input tensor of shape |
weight |
filters of shape |
bias |
optional bias tensor of shape |
stride |
the stride of the convolving kernel. Can be a single number or a tuple |
padding |
implicit paddings on both sides of the input. Can be a single number or a tuple |
dilation |
the spacing between kernel elements. Can be a single number or a tuple |
groups |
split input into groups, |
Applies a 3D convolution over an input image composed of several input planes.
See nn_conv3d() for details and output shape.
if (torch_is_installed()) {
  # filters = torch_randn(c(33, 16, 3, 3, 3))
  # inputs = torch_randn(c(20, 16, 50, 10, 20))
  # nnf_conv3d(inputs, filters)
}
Cos
torch_cos(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the cosine of the elements of input.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_cos(a)
}
Cosh
torch_cosh(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the hyperbolic cosine of the elements of input.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_cosh(a)
}
Cosine_similarity
torch_cosine_similarity(x1, x2, dim = 2L, eps = 1e-08)
x1 |
(Tensor) First input. |
x2 |
(Tensor) Second input (of size matching x1). |
dim |
(int, optional) Dimension of vectors. Default: 2 |
eps |
(float, optional) Small value to avoid division by zero. Default: 1e-8 |
Returns cosine similarity between x1 and x2, computed along dim.
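As a rough illustration of what is computed for a single pair of vectors, here is a base-R sketch. The helper cosine_ref() is hypothetical, and the way eps enters the denominator (as a lower bound on the product of the norms) is an assumption; the exact placement may differ between libtorch versions.

```r
# Hypothetical base-R reference for one pair of vectors.
# Assumption: eps is a lower bound on the product of the two norms.
cosine_ref <- function(x1, x2, eps = 1e-8) {
  sum(x1 * x2) / max(sqrt(sum(x1^2)) * sqrt(sum(x2^2)), eps)
}

orthogonal <- cosine_ref(c(1, 0), c(0, 1))       # vectors at a right angle
parallel <- cosine_ref(c(1, 2, 3), c(2, 4, 6))   # same direction
opposite <- cosine_ref(c(1, 2, 3), -c(1, 2, 3))  # opposite direction
print(c(orthogonal, parallel, opposite))
```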
if (torch_is_installed()) {
  input1 = torch_randn(c(100, 128))
  input2 = torch_randn(c(100, 128))
  output = torch_cosine_similarity(input1, input2)
  output
}
Count_nonzero
torch_count_nonzero(self, dim = NULL)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints, optional) Dim or tuple of dims along which to count non-zeros. |
Counts the number of non-zero values in the tensor input along the given dim.
If no dim is specified then all non-zeros in the tensor are counted.
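The reduction can be pictured with a plain R matrix. In this sketch, which assumes the 1-based dimension numbering used throughout this package, counting along dim 1 collapses the rows (one count per column) and counting along dim 2 collapses the columns (one count per row).

```r
x <- matrix(c(0, 1, 0,
              2, 0, 3), nrow = 2, byrow = TRUE)

total <- sum(x != 0)            # no dim: count every non-zero entry
per_column <- colSums(x != 0)   # analogue of dim = 1
per_row <- rowSums(x != 0)      # analogue of dim = 2
print(list(total = total, per_column = per_column, per_row = per_row))
```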
if (torch_is_installed()) {
  x <- torch_zeros(3,3)
  x[torch_randn(3,3) > 0.5] = 1
  x
  torch_count_nonzero(x)
  torch_count_nonzero(x, dim=1)
}
Cross
torch_cross(self, other, dim = NULL)
self |
(Tensor) the input tensor. |
other |
(Tensor) the second input tensor |
dim |
(int, optional) the dimension to take the cross-product in. |
Returns the cross product of vectors in dimension dim of input and other.
input and other must have the same size, and the size of their dim dimension should be 3.
If dim is not given, it defaults to the first dimension found with the size 3.
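For a single pair of 3-element vectors, the cross product can be written out in base R as below. The helper cross_ref() is hypothetical and only illustrates the per-vector arithmetic that is applied along dim.

```r
# Hypothetical base-R reference for one pair of 3-element vectors.
cross_ref <- function(a, b) {
  c(a[2] * b[3] - a[3] * b[2],
    a[3] * b[1] - a[1] * b[3],
    a[1] * b[2] - a[2] * b[1])
}

z <- cross_ref(c(1, 0, 0), c(0, 1, 0))   # unit x cross unit y gives unit z
w <- cross_ref(c(1, 2, 3), c(4, 5, 6))   # -3 6 -3
print(z)
print(w)
```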
if (torch_is_installed()) {
  a = torch_randn(c(4, 3))
  a
  b = torch_randn(c(4, 3))
  b
  torch_cross(a, b, dim=2)
  torch_cross(a, b)
}
Cummax
torch_cummax(self, dim)
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to do the operation over |
Returns a namedtuple (values, indices) where values is the cumulative maximum of elements of input in the dimension dim, and indices is the index location of each maximum value found in the dimension dim.
if (torch_is_installed()) {
  a = torch_randn(c(10))
  a
  torch_cummax(a, dim=1)
}
Cummin
torch_cummin(self, dim)
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to do the operation over |
Returns a namedtuple (values, indices) where values is the cumulative minimum of elements of input in the dimension dim, and indices is the index location of each minimum value found in the dimension dim.
if (torch_is_installed()) {
  a = torch_randn(c(10))
  a
  torch_cummin(a, dim=1)
}
Cumprod
torch_cumprod(self, dim, dtype = NULL)
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to do the operation over |
dtype |
( |
Returns the cumulative product of elements of input in the dimension dim.
For example, if input is a vector of size N, the result will also be a vector of size N, with elements
$$y_i = x_1 \times x_2 \times x_3 \times \dots \times x_i$$
if (torch_is_installed()) {
  a = torch_randn(c(10))
  a
  torch_cumprod(a, dim=1)
}
Cumsum
torch_cumsum(self, dim, dtype = NULL)
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to do the operation over |
dtype |
( |
Returns the cumulative sum of elements of input in the dimension dim.
For example, if input is a vector of size N, the result will also be a vector of size N, with elements
$$y_i = x_1 + x_2 + x_3 + \dots + x_i$$
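Base R's cumsum() and cumprod() compute the same running totals on plain vectors, which gives a quick way to sanity-check a result. The sketch below uses a fixed vector rather than random data, and the torch call is guarded so the block also runs without libtorch.

```r
x <- c(1, 2, 3, 4)

running_sum <- cumsum(x)    # 1 3 6 10
running_prod <- cumprod(x)  # 1 2 6 24
print(running_sum)
print(running_prod)

if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  t_sum <- torch::torch_cumsum(torch::torch_tensor(x), dim = 1)
  print(torch::as_array(t_sum))
}
```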
if (torch_is_installed()) {
  a = torch_randn(c(10))
  a
  torch_cumsum(a, dim=1)
}
Deg2rad
torch_deg2rad(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with each of the elements of input converted from angles in degrees to radians.
if (torch_is_installed()) {
  a <- torch_tensor(rbind(c(180.0, -180.0), c(360.0, -360.0), c(90.0, -90.0)))
  torch_deg2rad(a)
}
Dequantize
torch_dequantize(tensor)
tensor |
(Tensor) A quantized Tensor or a list of quantized tensors |
Returns an fp32 Tensor by dequantizing a quantized Tensor.
Given a list of quantized Tensors, dequantizes them and returns a list of fp32 Tensors.
Det
torch_det(self)
self |
(Tensor) the input tensor of size |
Calculates determinant of a square matrix or batches of square matrices.
Backward through `det` internally uses SVD results when `input` is not invertible. In this case, double backward through `det` will be unstable when `input` doesn't have distinct singular values. See `torch_svd()` for details.
if (torch_is_installed()) {
  A = torch_randn(c(3, 3))
  torch_det(A)
  A = torch_randn(c(3, 2, 2))
  A
  A$det()
}
A torch_device is an object representing the device on which a torch_tensor is or will be allocated.
torch_device(type, index = NULL)
type |
(character) a device type |
index |
(integer) optional device ordinal for the device type. If the device ordinal
is not present, this object will always represent the current device for the device
type, even after A |
if (torch_is_installed()) {
  # Via string
  torch_device("cuda:1")
  torch_device("cpu")
  torch_device("cuda") # current cuda device
  # Via string and device ordinal
  torch_device("cuda", 0)
  torch_device("cpu", 0)
}
Diag
torch_diag(self, diagonal = 0L)
self |
(Tensor) the input tensor. |
diagonal |
(int, optional) the diagonal to consider |
If input is a vector (1-D tensor), then returns a 2-D square tensor with the elements of input as the diagonal.
If input is a matrix (2-D tensor), then returns a 1-D tensor with the diagonal elements of input.
The argument diagonal controls which diagonal to consider:
If diagonal = 0, it is the main diagonal.
If diagonal > 0, it is above the main diagonal.
If diagonal < 0, it is below the main diagonal.
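This entry has no example of its own, so the following sketch may help. It shows the vector-to-matrix and matrix-to-vector behavior with base R's diag(), picks the off-diagonals by index arithmetic, and then makes the analogous torch_diag() calls under a guard. The base-R indexing is only an illustration of which elements a positive or negative diagonal value refers to.

```r
v <- c(1, 2, 3)
m <- diag(v)        # vector -> 3 x 3 matrix with v on the main diagonal
main <- diag(m)     # matrix -> its main diagonal

full <- matrix(as.numeric(1:9), nrow = 3)
above <- full[col(full) - row(full) == 1]   # elements one step above the main diagonal
below <- full[row(full) - col(full) == 1]   # elements one step below the main diagonal
print(above)
print(below)

if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  t_full <- torch::torch_tensor(full)
  print(torch::torch_diag(torch::torch_tensor(v)))
  print(torch::torch_diag(t_full, diagonal = 1))
  print(torch::torch_diag(t_full, diagonal = -1))
}
```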
Diag_embed
torch_diag_embed(self, offset = 0L, dim1 = -2L, dim2 = -1L)
self |
(Tensor) the input tensor. Must be at least 1-dimensional. |
offset |
(int, optional) which diagonal to consider. Default: 0 (main diagonal). |
dim1 |
(int, optional) first dimension with respect to which to take diagonal. Default: -2. |
dim2 |
(int, optional) second dimension with respect to which to take diagonal. Default: -1. |
Creates a tensor whose diagonals of certain 2D planes (specified by dim1 and dim2) are filled by input.
To facilitate creating batched diagonal matrices, the 2D planes formed by the last two dimensions of the returned tensor are chosen by default.
The argument offset controls which diagonal to consider:
If offset = 0, it is the main diagonal.
If offset > 0, it is above the main diagonal.
If offset < 0, it is below the main diagonal.
The size of the new matrix will be calculated to make the specified diagonal of the size of the last input dimension.
Note that for offset other than 0, the order of dim1 and dim2 matters. Exchanging them is equivalent to changing the sign of offset.
Applying torch_diagonal to the output of this function with the same arguments yields a matrix identical to input. However, torch_diagonal has different default dimensions, so those need to be explicitly specified.
if (torch_is_installed()) {
  a = torch_randn(c(2, 3))
  torch_diag_embed(a)
  torch_diag_embed(a, offset=1, dim1=1, dim2=3)
}
Diagflat
torch_diagflat(self, offset = 0L)
self |
(Tensor) the input tensor. |
offset |
(int, optional) the diagonal to consider. Default: 0 (main diagonal). |
If input is a vector (1-D tensor), then returns a 2-D square tensor with the elements of input as the diagonal.
If input is a tensor with more than one dimension, then returns a 2-D tensor with diagonal elements equal to a flattened input.
The argument offset controls which diagonal to consider:
If offset = 0, it is the main diagonal.
If offset > 0, it is above the main diagonal.
If offset < 0, it is below the main diagonal.
if (torch_is_installed()) {
  a = torch_randn(c(3))
  a
  torch_diagflat(a)
  torch_diagflat(a, 1)
  a = torch_randn(c(2, 2))
  a
  torch_diagflat(a)
}
Diagonal
torch_diagonal(self, outdim, dim1 = 1L, dim2 = 2L, offset = 0L)
self |
(Tensor) the input tensor. Must be at least 2-dimensional. |
outdim |
dimension name if |
dim1 |
(int, optional) first dimension with respect to which to take diagonal. Default: 1. |
dim2 |
(int, optional) second dimension with respect to which to take diagonal. Default: 2. |
offset |
(int, optional) which diagonal to consider. Default: 0 (main diagonal). |
Returns a partial view of input with its diagonal elements with respect to dim1 and dim2 appended as a dimension at the end of the shape.
The argument offset controls which diagonal to consider:
If offset = 0, it is the main diagonal.
If offset > 0, it is above the main diagonal.
If offset < 0, it is below the main diagonal.
Applying torch_diag_embed to the output of this function with the same arguments yields a diagonal matrix with the diagonal entries of the input. However, torch_diag_embed has different default dimensions, so those need to be explicitly specified.
if (torch_is_installed()) {
  a = torch_randn(c(3, 3))
  a
  torch_diagonal(a, offset = 0)
  torch_diagonal(a, offset = 1)
  x = torch_randn(c(2, 5, 4, 2))
  torch_diagonal(x, offset=-1, dim1=1, dim2=2)
}
The first-order differences are given by out[i] = input[i + 1] - input[i].
Higher-order differences are calculated by using torch_diff() recursively.
torch_diff(self, n = 1L, dim = -1L, prepend = list(), append = list())
self |
the tensor to compute the differences on |
n |
the number of times to recursively compute the difference |
dim |
the dimension to compute the difference along. Default is the last dimension. |
prepend |
values to prepend to input along dim before computing the difference. Their dimensions must be equivalent to that of input, and their shapes must match input’s shape except on dim. |
append |
values to append to input along dim before computing the difference. Their dimensions must be equivalent to that of input, and their shapes must match input’s shape except on dim. |
Only n = 1 is currently supported
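The first-order difference and the effect of append can be checked against base R's diff() on plain vectors and matrices. This is a sketch of the semantics only, assuming that append simply concatenates along dim before differencing, as the argument description states.

```r
a <- c(1, 2, 3)
b <- c(4, 5)

first_diff <- diff(a)           # out[i] = a[i + 1] - a[i]
with_append <- diff(c(a, b))    # append b, then difference

m <- rbind(c(1, 2, 3), c(3, 4, 5))
along_dim1 <- diff(m)           # differences between consecutive rows
along_dim2 <- t(diff(t(m)))     # differences between consecutive columns

print(first_diff)
print(with_append)
print(along_dim1)
print(along_dim2)
```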
if (torch_is_installed()) {
  a <- torch_tensor(c(1,2,3))
  torch_diff(a)
  b <- torch_tensor(c(4, 5))
  torch_diff(a, append = b)
  c <- torch_tensor(rbind(c(1,2,3), c(3,4,5)))
  torch_diff(c, dim = 1)
  torch_diff(c, dim = 2)
}
Digamma
torch_digamma(self)
self |
(Tensor) the tensor to compute the digamma function on |
Computes the logarithmic derivative of the gamma function on input.
if (torch_is_installed()) {
  a = torch_tensor(c(1, 0.5))
  torch_digamma(a)
}
Dist
torch_dist(self, other, p = 2L)
self |
(Tensor) the input tensor. |
other |
(Tensor) the Right-hand-side input tensor |
p |
(float, optional) the norm to be computed |
Returns the p-norm of (input - other).
The shapes of input and other must be broadcastable.
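To show what the different values of p mean, here is a base-R sketch for two equally sized vectors. The helper dist_ref() is hypothetical; it assumes the usual conventions that p = 0 counts the non-zero differences and p = Inf takes the largest absolute difference.

```r
# Hypothetical base-R reference for the p-norm of (x - y).
dist_ref <- function(x, y, p = 2) {
  d <- abs(x - y)
  if (p == 0) return(sum(d != 0))      # number of non-zero differences
  if (is.infinite(p)) return(max(d))   # largest absolute difference
  sum(d^p)^(1 / p)
}

x <- c(1, 2, 3)
y <- c(1, 0, 7)
d2 <- dist_ref(x, y)          # Euclidean distance, sqrt(20)
d1 <- dist_ref(x, y, p = 1)   # sum of absolute differences
d0 <- dist_ref(x, y, p = 0)   # count of differing positions
dinf <- dist_ref(x, y, p = Inf)
print(c(d2, d1, d0, dinf))
```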
if (torch_is_installed()) {
  x = torch_randn(c(4))
  x
  y = torch_randn(c(4))
  y
  torch_dist(x, y, 3.5)
  torch_dist(x, y, 3)
  torch_dist(x, y, 0)
  torch_dist(x, y, 1)
}
Div
torch_div(self, other, rounding_mode)
self |
(Tensor) the input tensor. |
other |
(Number) the number to be divided to each element of |
rounding_mode |
(str, optional) – Type of rounding applied to the result:
|
Divides each element of input by the scalar other and returns a new resulting tensor.
When other is a tensor, each element of the tensor input is divided by the corresponding element of the tensor other. The resulting tensor is returned.
The shapes of input and other must be broadcastable. If the torch_dtype of input and other differ, the torch_dtype of the result tensor is determined following rules described in the type promotion documentation. If out is specified, the result must be castable to the torch_dtype of the specified output tensor. Integral division by zero leads to undefined behavior.
Integer division using div is deprecated, and in a future release div will perform true division like torch_true_divide(). Use torch_floor_divide() to perform integer division, instead.
if (torch_is_installed()) {
  a = torch_randn(c(5))
  a
  torch_div(a, 0.5)
  a = torch_randn(c(4, 4))
  a
  b = torch_randn(c(4))
  b
  torch_div(a, b)
}
Divide
torch_divide(self, other, rounding_mode)
self |
(Tensor) the input tensor. |
other |
(Number) the number to be divided to each element of |
rounding_mode |
(str, optional) – Type of rounding applied to the result:
|
Alias for torch_div().
Dot
torch_dot(self, tensor)
self |
the input tensor |
tensor |
the other input tensor |
Computes the dot product (inner product) of two tensors.
This function does not broadcast.
if (torch_is_installed()) {
  torch_dot(torch_tensor(c(2, 3)), torch_tensor(c(2, 1)))
}
Dstack
torch_dstack(tensors)
tensors |
(sequence of Tensors) sequence of tensors to concatenate |
Stack tensors in sequence depthwise (along third axis).
This is equivalent to concatenation along the third axis after 1-D and 2-D tensors have been reshaped by torch_atleast_3d().
if (torch_is_installed()) {
  a <- torch_tensor(c(1, 2, 3))
  b <- torch_tensor(c(4, 5, 6))
  torch_dstack(list(a,b))
  a <- torch_tensor(rbind(1,2,3))
  b <- torch_tensor(rbind(4,5,6))
  torch_dstack(list(a,b))
}
Returns the corresponding data type.
torch_float32()
torch_float()
torch_float64()
torch_double()
torch_cfloat32()
torch_chalf()
torch_cfloat()
torch_cfloat64()
torch_cdouble()
torch_cfloat128()
torch_float16()
torch_half()
torch_uint8()
torch_int8()
torch_int16()
torch_short()
torch_int32()
torch_int()
torch_int64()
torch_long()
torch_bool()
torch_quint8()
torch_qint8()
torch_qint32()
Eig
self |
(Tensor) the square matrix of shape |
eigenvectors |
(bool) |
Computes the eigenvalues and eigenvectors of a real square matrix.
Einsum
torch_einsum(equation, tensors, path = NULL)
equation |
(string) The equation is given in terms of lower case letters (indices) to be associated with each dimension of the operands and result. The left hand side lists the operands dimensions, separated by commas. There should be one index letter per tensor dimension. The right hand side follows after |
tensors |
(Tensor) The operands to compute the Einstein sum of. |
path |
(int) This function uses opt_einsum to
speed up computation or to consume less memory by optimizing contraction order. This optimization
occurs when there are at least three inputs, since the order does not matter otherwise.
Note that finding the optimal path is an NP-hard problem, thus, |
This function provides a way of computing multilinear expressions (i.e. sums of products) using the Einstein summation convention.
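A few common equations have direct base-R counterparts, which can make the notation easier to read. The sketch below spells out 'ij,jk->ik' as an explicit sum of products and compares it with %*%; the guarded torch_einsum() call uses the same equation string. The matrices are arbitrary illustrative values.

```r
A <- matrix(as.numeric(1:6), nrow = 2)    # 2 x 3
B <- matrix(as.numeric(1:12), nrow = 3)   # 3 x 4
x <- c(1, 2)
y <- c(3, 4, 5)
S <- matrix(as.numeric(1:9), nrow = 3)

outer_xy <- outer(x, y)   # 'i,j->ij'
matmul <- A %*% B         # 'ij,jk->ik'
transposed <- t(A)        # 'ij->ji'
diag_S <- diag(S)         # 'ii->i'

# 'ij,jk->ik' written as the sum over the repeated index j
manual <- matrix(0, nrow = 2, ncol = 4)
for (i in 1:2) {
  for (k in 1:4) {
    manual[i, k] <- sum(A[i, ] * B[, k])
  }
}
print(manual)

if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  t_out <- torch::torch_einsum(
    "ij,jk->ik",
    list(torch::torch_tensor(A), torch::torch_tensor(B))
  )
  print(torch::as_array(t_out))
}
```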
if (torch_is_installed()) {
  x = torch_randn(c(5))
  y = torch_randn(c(4))
  torch_einsum('i,j->ij', list(x, y)) # outer product
  A = torch_randn(c(3,5,4))
  l = torch_randn(c(2,5))
  r = torch_randn(c(2,4))
  torch_einsum('bn,anm,bm->ba', list(l, A, r)) # compare torch_nn$functional$bilinear
  As = torch_randn(c(3,2,5))
  Bs = torch_randn(c(3,5,4))
  torch_einsum('bij,bjk->bik', list(As, Bs)) # batch matrix multiplication
  A = torch_randn(c(3, 3))
  torch_einsum('ii->i', list(A)) # diagonal
  A = torch_randn(c(4, 3, 3))
  torch_einsum('...ii->...i', list(A)) # batch diagonal
  A = torch_randn(c(2, 3, 4, 5))
  torch_einsum('...ij->...ji', list(A))$shape # batch permute
}
Empty
torch_empty( ..., names = NULL, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
... |
a sequence of integers defining the shape of the output tensor. |
names |
optional character vector naming each dimension. |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Returns a tensor filled with uninitialized data. The shape of the tensor is defined by the variable argument size.
if (torch_is_installed()) {
  torch_empty(c(2, 3))
}
Empty_like
torch_empty_like( input, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE, memory_format = torch_preserve_format() )
input |
(Tensor) the size of |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
memory_format |
( |
Returns an uninitialized tensor with the same size as input.
torch_empty_like(input) is equivalent to torch_empty(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).
if (torch_is_installed()) {
  torch_empty(list(2,3), dtype = torch_int64())
}
Empty_strided
torch_empty_strided( size, stride, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE, pin_memory = FALSE )
size |
(tuple of ints) the shape of the output tensor |
stride |
(tuple of ints) the strides of the output tensor |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
pin_memory |
(bool, optional) If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: |
Returns a tensor filled with uninitialized data. The shape and strides of the tensor are defined by the arguments size and stride respectively.
torch_empty_strided(size, stride) is equivalent to torch_empty(size).as_strided(size, stride).
More than one element of the created tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first.
if (torch_is_installed()) {
  a <- torch_empty_strided(list(2, 3), list(1, 2))
  a
  a$stride(1)
  a$size(1)
}
Eq
torch_eq(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare
Must be a |
Computes element-wise equality
The second argument can be a number or a tensor whose shape is broadcastable with the first argument.
if (torch_is_installed()) { torch_eq(torch_tensor(c(1,2,3,4)), torch_tensor(c(1, 3, 2, 4))) }
Equal
torch_equal(self, other)
self |
the input tensor |
other |
the other input tensor |
`TRUE` if two tensors have the same size and elements, `FALSE` otherwise.
if (torch_is_installed()) { torch_equal(torch_tensor(c(1, 2)), torch_tensor(c(1, 2))) }
Erf
torch_erf(self)
self |
(Tensor) the input tensor. |
Computes the error function of each element. The error function is defined as follows:
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} \, dt$$
if (torch_is_installed()) { torch_erf(torch_tensor(c(0, -1., 10.))) }
Erfc
torch_erfc(self)
self |
(Tensor) the input tensor. |
Computes the complementary error function of each element of `input`.
The complementary error function is defined as follows:
$$\mathrm{erfc}(x) = 1 - \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} \, dt$$
if (torch_is_installed()) { torch_erfc(torch_tensor(c(0, -1., 10.))) }
Erfinv
torch_erfinv(self)
self |
(Tensor) the input tensor. |
Computes the inverse error function of each element of `input`.
The inverse error function is defined in the range $(-1, 1)$ as:
$$\mathrm{erfinv}(\mathrm{erf}(x)) = x$$
if (torch_is_installed()) { torch_erfinv(torch_tensor(c(0, 0.5, -1.))) }
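The three error-function entries above are related by simple identities. The following is a minimal base R sketch, not the libtorch implementation. It assumes the standard identity erf(x) = 2 * pnorm(x * sqrt(2)) - 1, and the `*_sketch` helper names are hypothetical.

```r
# Base R stand-ins built on the normal distribution functions (assumed identities).
erf_sketch    <- function(x) 2 * pnorm(x * sqrt(2)) - 1
erfc_sketch   <- function(x) 2 * pnorm(x * sqrt(2), lower.tail = FALSE)
erfinv_sketch <- function(y) qnorm((1 + y) / 2) / sqrt(2)

x <- c(0, -1, 0.5)
erf_sketch(x)
erf_sketch(x) + erfc_sketch(x)  # erf and erfc sum to one
erfinv_sketch(erf_sketch(x))    # erfinv undoes erf
```

These helpers are useful for checking expectations on plain numeric vectors; on tensors, use `torch_erf()`, `torch_erfc()` and `torch_erfinv()`.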
Exp
torch_exp(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the exponential of the elements
of the input tensor `input`.
$$y_i = e^{x_i}$$
if (torch_is_installed()) { torch_exp(torch_tensor(c(0, log(2)))) }
Exp2
torch_exp2(self)
self |
(Tensor) the input tensor. |
Computes the base two exponential function of `input`.
$$y_i = 2^{x_i}$$
if (torch_is_installed()) { torch_exp2(torch_tensor(c(0, log2(2.), 3, 4))) }
Expm1
torch_expm1(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the exponential of the elements minus 1
of `input`.
$$y_i = e^{x_i} - 1$$
if (torch_is_installed()) { torch_expm1(torch_tensor(c(0, log(2)))) }
Eye
torch_eye( n, m = n, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
n |
(int) the number of rows |
m |
(int, optional) the number of columns with default being `n` |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: FALSE. |
Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.
if (torch_is_installed()) { torch_eye(3) }
Computes the one dimensional discrete Fourier transform of input.
torch_fft_fft(self, n = NULL, dim = -1L, norm = NULL)
self |
(Tensor) the input tensor |
n |
(int) Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the FFT. |
dim |
(int, optional) The dimension along which to take the one dimensional FFT. |
norm |
(str, optional) Normalization mode. For the forward transform, these correspond to:
|
The Fourier domain representation of any real signal satisfies the Hermitian
property: X[i] = conj(X[-i]).
This function always returns both the positive
and negative frequency terms even though, for real inputs, the negative
frequencies are redundant. torch_fft_rfft() returns the more compact one-sided representation
where only the positive frequencies are returned.
if (torch_is_installed()) {
  t <- torch_arange(start = 0, end = 3)
  t
  torch_fft_fft(t, norm = "backward")
}
Computes the discrete Fourier transform sample frequencies for a signal of size `n`.
torch_fft_fftfreq( n, d = 1, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
n |
(integer) – the FFT length |
d |
(float, optional) – the sampling length scale. The spacing between individual samples of the FFT input. The default assumes unit spacing, dividing that result by the actual spacing gives the result in physical frequency units. |
dtype |
(default: |
layout |
(default: |
device |
(default: |
requires_grad |
(default: |
By convention, `torch_fft_fft()` returns positive frequency terms first, followed by the negative
frequencies in reverse order, so that `f[-i]` for all `0 < i <= n/2`
gives the negative frequency terms. For an FFT of length `n` and with inputs spaced
in length unit `d`, the frequencies are:
f = [0, 1, ..., (n - 1) // 2, -(n // 2), ..., -1] / (d * n)
For even lengths, the Nyquist frequency at `f[n/2]` can be thought of as either negative
or positive. `torch_fft_fftfreq()` follows NumPy’s convention of taking it to be negative.
if (torch_is_installed()) {
  torch_fft_fftfreq(5) # Nyquist frequency at f[3] is positive
  torch_fft_fftfreq(4) # Nyquist frequency at f[3] is given as negative
}
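The frequency layout f = [0, 1, ..., (n - 1) // 2, -(n // 2), ..., -1] / (d * n) can be reproduced in a few lines of base R. This is only a sketch of the formula, not the package's implementation, and `fftfreq_sketch` is a hypothetical helper name.

```r
# Sample frequencies for an FFT of length n with sample spacing d.
fftfreq_sketch <- function(n, d = 1) {
  pos <- 0:((n - 1) %/% 2)                               # 0, 1, ..., (n - 1) // 2
  neg <- if (n >= 2) seq(-(n %/% 2), -1) else integer(0) # -(n // 2), ..., -1
  c(pos, neg) / (d * n)
}

fftfreq_sketch(5)          # odd length: no Nyquist bin
fftfreq_sketch(4)          # even length: the Nyquist bin is reported as negative
fftfreq_sketch(4, d = 0.5) # a finer sample spacing scales the frequencies up
```

Writing the positive and negative halves separately makes the even-length convention explicit: the bin at n/2 always lands in the negative half.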
Computes the one dimensional inverse discrete Fourier transform of input.
torch_fft_ifft(self, n = NULL, dim = -1L, norm = NULL)
self |
(Tensor) the input tensor |
n |
(int, optional) – Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the IFFT. |
dim |
(int, optional) – The dimension along which to take the one dimensional IFFT. |
norm |
(str, optional) – Normalization mode. For the backward transform, these correspond to:
|
if (torch_is_installed()) {
  t <- torch_arange(start = 0, end = 3)
  t
  x <- torch_fft_fft(t, norm = "backward")
  torch_fft_ifft(x)
}
Computes the inverse of `torch_fft_rfft()`.
Input is interpreted as a one-sided Hermitian signal in the Fourier domain,
as produced by `torch_fft_rfft()`. By the Hermitian property, the output will
be real-valued.
torch_fft_irfft(self, n = NULL, dim = -1L, norm = NULL)
self |
(Tensor) the input tensor representing a half-Hermitian signal |
n |
(int) Output signal length. This determines the length of the output
signal. If given, the input will either be zero-padded or trimmed to this
length before computing the real IFFT. Defaults to even output: |
dim |
(int, optional) – The dimension along which to take the one dimensional real IFFT. |
norm |
(str, optional) – Normalization mode. For the backward transform, these correspond to:
|
Some input frequencies must be real-valued to satisfy the Hermitian property. In these cases the imaginary component will be ignored. For example, any imaginary component in the zero-frequency term cannot be represented in a real output and so will always be ignored.
The correct interpretation of the Hermitian input depends on the length of the original data, as given by n. This is because each input shape could correspond to either an odd or even length signal. By default, the signal is assumed to be even length and odd signals will not round-trip properly. So, it is recommended to always pass the signal length n.
if (torch_is_installed()) {
  t <- torch_arange(start = 0, end = 4)
  x <- torch_fft_rfft(t)
  torch_fft_irfft(x)
  torch_fft_irfft(x, n = t$numel())
}
Computes the one dimensional Fourier transform of real-valued input.
torch_fft_rfft(self, n = NULL, dim = -1L, norm = NULL)
self |
(Tensor) the real input tensor |
n |
(int) Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the real FFT. |
dim |
(int, optional) – The dimension along which to take the one dimensional real FFT. |
norm |
(str, optional) – Normalization mode. For the forward transform, these correspond to:
|
The FFT of a real signal is Hermitian-symmetric, X[i] = conj(X[-i]), so the
output contains only the positive frequencies below the Nyquist frequency.
To compute the full output, use `torch_fft_fft()`.
if (torch_is_installed()) {
  t <- torch_arange(start = 0, end = 3)
  torch_fft_rfft(t)
}
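The Hermitian property X[i] = conj(X[-i]) can be checked on a small real signal with base R's `fft()`. This is a sketch that uses base R as a stand-in for the torch functions; the variable names are illustrative only.

```r
# Full DFT of a real signal using base R (stand-in for torch_fft_fft()).
x <- c(0, 1, 2, 3)
X <- fft(x)
n <- length(x)

# Bin k (0-based) is stored at X[k + 1]; its mirror, bin n - k, is at X[n - k + 1].
k <- 1:(n - 1)
hermitian <- all.equal(X[k + 1], Conj(X[n - k + 1]))

# The one-sided representation keeps only the first n %/% 2 + 1 bins.
one_sided <- X[1:(n %/% 2 + 1)]
hermitian
length(one_sided)
```

Because signals of length 4 and 5 both give three one-sided bins, the inverse transform cannot infer the original length by itself, which is why passing `n` to `torch_fft_irfft()` is recommended.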
A list that represents the numerical properties of a floating point torch.dtype
torch_finfo(dtype)
dtype |
dtype to check information |
Fix
torch_fix(self)
self |
(Tensor) the input tensor. |
Alias for torch_trunc()
Flatten
torch_flatten(self, dims, start_dim = 1L, end_dim = -1L, out_dim)
self |
(Tensor) the input tensor. |
dims |
if tensor is named you can pass the name of the dimensions to flatten |
start_dim |
(int) the first dim to flatten |
end_dim |
(int) the last dim to flatten |
out_dim |
the name of the resulting dimension if a named tensor. |
Flattens a contiguous range of dims in a tensor.
if (torch_is_installed()) {
  t <- torch_tensor(matrix(c(1, 2), ncol = 2))
  torch_flatten(t)
  torch_flatten(t, start_dim = 2)
}
Flip
torch_flip(self, dims)
self |
(Tensor) the input tensor. |
dims |
(a list or tuple) axis to flip on |
Reverses the order of an n-D tensor along the given axes in `dims`.
if (torch_is_installed()) {
  x <- torch_arange(1, 8)$view(c(2, 2, 2))
  x
  torch_flip(x, c(1, 2))
}
Fliplr
torch_fliplr(self)
self |
(Tensor) Must be at least 2-dimensional. |
Flip array in the left/right direction, returning a new tensor.
Flip the entries in each row in the left/right direction. Columns are preserved, but appear in a different order than before.
Equivalent to `torch_flip(input, 2)`. Requires the array to be at least 2-D.
if (torch_is_installed()) {
  x <- torch_arange(start = 1, end = 4)$view(c(2, 2))
  x
  torch_fliplr(x)
}
Flipud
torch_flipud(self)
self |
(Tensor) Must be at least 1-dimensional. |
Flip array in the up/down direction, returning a new tensor.
Flip the entries in each column in the up/down direction. Rows are preserved, but appear in a different order than before.
Equivalent to `torch_flip(input, 1)`. Requires the array to be at least 1-D.
if (torch_is_installed()) {
  x <- torch_arange(start = 1, end = 4)$view(c(2, 2))
  x
  torch_flipud(x)
}
Floor
torch_floor(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the floor of the elements of `input`,
the largest integer less than or equal to each element.
if (torch_is_installed()) {
  a <- torch_randn(c(4))
  a
  torch_floor(a)
}
Floor_divide
torch_floor_divide(self, other)
self |
(Tensor) the numerator tensor |
other |
(Tensor or Scalar) the denominator |
Returns the division of the inputs rounded down to the nearest integer. See `torch_div`
for type promotion and broadcasting rules.
if (torch_is_installed()) {
  a <- torch_tensor(c(4.0, 3.0))
  b <- torch_tensor(c(2.0, 2.0))
  torch_floor_divide(a, b)
  torch_floor_divide(a, 1.4)
}
Fmod
torch_fmod(self, other)
self |
(Tensor) the dividend |
other |
(Tensor or float) the divisor, which may be either a number or a tensor of the same shape as the dividend |
Computes the element-wise remainder of division.
The dividend and divisor may contain both integer and floating point
numbers. The remainder has the same sign as the dividend `input`.
When `other` is a tensor, the shapes of `input` and `other` must be broadcastable.
if (torch_is_installed()) {
  torch_fmod(torch_tensor(c(-3., -2, -1, 1, 2, 3)), 2)
  torch_fmod(torch_tensor(c(1., 2, 3, 4, 5)), 1.5)
}
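The sign rule matters because R's own `%%` operator follows the sign of the divisor instead. The base R sketch below illustrates the difference; it assumes the usual truncated-division definition of `fmod`, and `fmod_sketch` is a hypothetical helper, not part of the package.

```r
# Remainder with the sign of the dividend (truncated division), as described above.
fmod_sketch <- function(x, y) x - y * trunc(x / y)

x <- c(-3, -2, -1, 1, 2, 3)
fmod_sketch(x, 2) # -1  0 -1  1  0  1  (sign follows the dividend)
x %% 2            #  1  0  1  1  0  1  (R's %% follows the divisor)
```

The two agree whenever dividend and divisor share a sign, so the difference only shows up for negative inputs.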
Frac
torch_frac(self)
self |
the input tensor. |
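Computes the fractional portion of each element in `input`.
$$\text{out}_i = \text{input}_i - \lfloor |\text{input}_i| \rfloor \cdot \mathrm{sgn}(\text{input}_i)$$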
if (torch_is_installed()) { torch_frac(torch_tensor(c(1, 2.5, -3.2))) }
Full
torch_full( size, fill_value, names = NULL, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
size |
(int...) a list, tuple, or |
fill_value |
the number to fill the output tensor with. |
names |
optional names of the dimensions |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: FALSE. |
Returns a tensor of size `size` filled with `fill_value`.
In PyTorch 1.5 a bool or integral `fill_value` will produce a warning if
`dtype` or `out` are not set.
In a future PyTorch release, when `dtype` and `out` are not set,
a bool `fill_value` will return a tensor of torch.bool dtype,
and an integral `fill_value` will return a tensor of torch.long dtype.
if (torch_is_installed()) { torch_full(list(2, 3), 3.141592) }
Full_like
torch_full_like( input, fill_value, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE, memory_format = torch_preserve_format() )
input |
(Tensor) the size of |
fill_value |
the number to fill the output tensor with. |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: FALSE. |
memory_format |
( |
Returns a tensor with the same size as `input` filled with `fill_value`.
`torch_full_like(input, fill_value)` is equivalent to
`torch_full(input.size(), fill_value, dtype=input.dtype, layout=input.layout, device=input.device)`.
Gather
torch_gather(self, dim, index, sparse_grad = FALSE)
self |
(Tensor) the source tensor |
dim |
(int) the axis along which to index |
index |
(LongTensor) the indices of elements to gather |
sparse_grad |
(bool,optional) If |
Gathers values along an axis specified by `dim`.
For a 3-D tensor the output is specified by (in PyTorch's 0-based notation; in R, `dim` and `index` are 1-based):
out[i][j][k] = input[index[i][j][k]][j][k] # if dim == 0
out[i][j][k] = input[i][index[i][j][k]][k] # if dim == 1
out[i][j][k] = input[i][j][index[i][j][k]] # if dim == 2
If `input` is an n-dimensional tensor with size
$(x_0, x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_{n-1})$ and `dim = i`, then `index`
must be an $n$-dimensional tensor with size
$(x_0, x_1, \ldots, x_{i-1}, y, x_{i+1}, \ldots, x_{n-1})$ where $y \geq 1$,
and `out` will have the same size as `index`.
if (torch_is_installed()) {
  t <- torch_tensor(matrix(c(1, 2, 3, 4), ncol = 2, byrow = TRUE))
  torch_gather(t, 2, torch_tensor(matrix(c(1, 1, 2, 1), ncol = 2, byrow = TRUE), dtype = torch_int64()))
}
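To make the indexing rule concrete, here is a base R sketch of gathering along the second dimension of a matrix, using the same data as the example above. It illustrates the rule out[i, j] = input[i, index[i, j]] with 1-based indices; `gather_dim2` is a hypothetical helper, not how the package implements `torch_gather()`.

```r
# Gather along dimension 2 of a matrix: out[i, j] = input[i, index[i, j]].
gather_dim2 <- function(input, index) {
  out <- matrix(NA_real_, nrow(index), ncol(index))
  for (i in seq_len(nrow(index))) {
    for (j in seq_len(ncol(index))) {
      out[i, j] <- input[i, index[i, j]]
    }
  }
  out
}

src <- matrix(c(1, 2, 3, 4), ncol = 2, byrow = TRUE)
idx <- matrix(c(1, 1, 2, 1), ncol = 2, byrow = TRUE)
gather_dim2(src, idx) # row 1: 1 1; row 2: 4 3
```

The output always takes the shape of `idx`, which is why the index tensor may be smaller or larger than the source along the gathered dimension.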
Gcd
torch_gcd(self, other)
self |
(Tensor) the input tensor. |
other |
(Tensor) the second input tensor |
Computes the element-wise greatest common divisor (GCD) of `input` and `other`.
Both `input` and `other` must have integer types.
This defines $\gcd(0, 0) = 0$.
if (torch_is_installed()) {
  if (torch::cuda_is_available()) {
    a <- torch_tensor(c(5, 10, 15), dtype = torch_long(), device = "cuda")
    b <- torch_tensor(c(3, 4, 5), dtype = torch_long(), device = "cuda")
    torch_gcd(a, b)
    c <- torch_tensor(c(3L), device = "cuda")
    torch_gcd(a, c)
  }
}
Ge
torch_ge(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare |
Computes $\text{input} \geq \text{other}$ element-wise.
The second argument can be a number or a tensor whose shape is broadcastable with the first argument.
if (torch_is_installed()) { torch_ge(torch_tensor(matrix(1:4, ncol = 2, byrow=TRUE)), torch_tensor(matrix(c(1,1,4,4), ncol = 2, byrow=TRUE))) }
A `torch_generator` is an object which manages the state of the algorithm
that produces pseudo random numbers. Used as a keyword argument in many
in-place random sampling functions.
torch_generator()
if (torch_is_installed()) {
  generator <- torch_generator()
  generator$current_seed()
  generator$set_current_seed(1234567L)
  generator$current_seed()
}
Geqrf
torch_geqrf(self)
self |
(Tensor) the input matrix |
This is a low-level function for calling LAPACK directly. This function
returns a namedtuple (a, tau) as defined in the LAPACK documentation for geqrf.
You'll generally want to use `torch_qr` instead.
Computes a QR decomposition of `input`, but without constructing
$Q$ and $R$ as explicit separate matrices.
Rather, this directly calls the underlying LAPACK function `?geqrf`
which produces a sequence of 'elementary reflectors'.
See the LAPACK documentation for geqrf for further details.
Ger
torch_ger(self, vec2)
self |
(Tensor) 1-D input vector |
vec2 |
(Tensor) 1-D input vector |
Outer product of `input` and `vec2`.
If `input` is a vector of size $n$ and `vec2` is a vector of
size $m$, then `out` must be a matrix of size $(n \times m)$.
This function does not broadcast.
if (torch_is_installed()) {
  v1 <- torch_arange(1., 5.)
  v2 <- torch_arange(1., 4.)
  torch_ger(v1, v2)
}
Low-level functionality to set and change the RNG state.
It's recommended to use `torch_manual_seed()` for most cases.
torch_get_rng_state()
torch_set_rng_state(state)
cuda_get_rng_state(device = NULL)
cuda_set_rng_state(state, device = NULL)
state |
A tensor with the current state or a list containing the state for each device - (for CUDA). |
device |
The cuda device index to get or set the state. If |
torch_set_rng_state(): Sets the RNG state for the CPU.
cuda_get_rng_state(): Gets the RNG state for CUDA.
cuda_set_rng_state(): Sets the RNG state for CUDA.
Greater
torch_greater(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare |
Alias for `torch_gt()`.
Greater_equal
torch_greater_equal(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare |
Alias for `torch_ge()`.
Gt
torch_gt(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare |
Computes $\text{input} > \text{other}$ element-wise.
The second argument can be a number or a tensor whose shape is broadcastable with the first argument.
if (torch_is_installed()) { torch_gt(torch_tensor(matrix(1:4, ncol = 2, byrow=TRUE)), torch_tensor(matrix(c(1,1,4,4), ncol = 2, byrow=TRUE))) }
Hamming_window
torch_hamming_window( window_length, periodic = TRUE, alpha = 0.54, beta = 0.46, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
window_length |
(int) the size of returned window |
periodic |
(bool, optional) If TRUE, returns a window to be used as a periodic function. If FALSE, returns a symmetric window. |
alpha |
(float, optional) The coefficient |
beta |
(float, optional) The coefficient |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: FALSE. |
Hamming window function.
$$w[n] = \alpha - \beta \cos\left(\frac{2 \pi n}{N - 1}\right),$$
where $N$ is the full window size.
The input `window_length` is a positive integer controlling the
returned window size. The `periodic` flag determines whether the returned
window trims off the last duplicate value from the symmetric window and is
ready to be used as a periodic window with functions like `torch_stft`.
Therefore, if `periodic` is true, the $N$ in the above formula is in fact
$\text{window\_length} + 1$. Also, we always have
`torch_hamming_window(L, periodic=TRUE)` equal to
`torch_hamming_window(L + 1, periodic=FALSE)` with its last element removed.
If `window_length` $= 1$, the returned window contains a single value 1.
This is a generalized version of `torch_hann_window`.
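The relationship between the periodic and symmetric windows can be checked numerically. The base R sketch below assumes the standard Hamming formula w[n] = alpha - beta * cos(2 * pi * n / (N - 1)); `hamming_sketch` is a hypothetical helper and not the package's implementation.

```r
# Hamming window; a periodic window of length L is the symmetric window
# of length L + 1 with its last value dropped.
hamming_sketch <- function(window_length, periodic = TRUE,
                           alpha = 0.54, beta = 0.46) {
  if (window_length == 1) return(1)
  N <- if (periodic) window_length + 1 else window_length
  n <- 0:(window_length - 1)
  alpha - beta * cos(2 * pi * n / (N - 1))
}

hamming_sketch(4, periodic = TRUE)
head(hamming_sketch(5, periodic = FALSE), -1) # same four values
hamming_sketch(4, alpha = 0.5, beta = 0.5)    # reduces to a Hann window
```

Setting `alpha = beta = 0.5` recovers the Hann window, which is the sense in which the Hamming window generalizes it.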
Hann_window
torch_hann_window( window_length, periodic = TRUE, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
window_length |
(int) the size of returned window |
periodic |
(bool, optional) If TRUE, returns a window to be used as a periodic function. If FALSE, returns a symmetric window. |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: FALSE. |
Hann window function.
$$w[n] = \frac{1}{2} \left[1 - \cos\left(\frac{2 \pi n}{N - 1}\right)\right] = \sin^2\left(\frac{\pi n}{N - 1}\right),$$
where $N$ is the full window size.
The input `window_length` is a positive integer controlling the
returned window size. The `periodic` flag determines whether the returned
window trims off the last duplicate value from the symmetric window and is
ready to be used as a periodic window with functions like `torch_stft`.
Therefore, if `periodic` is true, the $N$ in the above formula is in fact
$\text{window\_length} + 1$. Also, we always have
`torch_hann_window(L, periodic=TRUE)` equal to
`torch_hann_window(L + 1, periodic=FALSE)` with its last element removed.
If `window_length` $= 1$, the returned window contains a single value 1.
Heaviside
torch_heaviside(self, values)
self |
(Tensor) the input tensor. |
values |
(Tensor) The values to use where `input` is zero. |
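Computes the Heaviside step function for each element in `input`.
The Heaviside step function is defined as:
$$\text{heaviside}(\text{input}, \text{values}) = \begin{cases} 0, & \text{if input} < 0 \\ \text{values}, & \text{if input} = 0 \\ 1, & \text{if input} > 0 \end{cases}$$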
if (torch_is_installed()) {
  input <- torch_tensor(c(-1.5, 0, 2.0))
  values <- torch_tensor(c(0.5))
  torch_heaviside(input, values)
  values <- torch_tensor(c(1.2, -2.0, 3.5))
  torch_heaviside(input, values)
}
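The role of `values` is easiest to see on plain numeric vectors. The base R sketch below assumes the usual piecewise definition (0 for negative inputs, 1 for positive inputs, `values` at exactly zero); `heaviside_sketch` is a hypothetical helper, not the package's implementation.

```r
# Heaviside step: `values` is only consulted where the input is exactly zero.
heaviside_sketch <- function(input, values) {
  values <- rep_len(values, length(input)) # broadcast a single value
  ifelse(input < 0, 0, ifelse(input > 0, 1, values))
}

heaviside_sketch(c(-1.5, 0, 2.0), 0.5)                # 0.0 0.5 1.0
heaviside_sketch(c(-1.5, 0, 2.0), c(1.2, -2.0, 3.5))  # 0 -2 1
```

In the second call only the middle entry of `values` is used, because that is the only position where the input is zero.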
Histc
torch_histc(self, bins = 100L, min = 0L, max = 0L)
self |
(Tensor) the input tensor. |
bins |
(int) number of histogram bins |
min |
(int) lower end of the range (inclusive) |
max |
(int) upper end of the range (inclusive) |
Computes the histogram of a tensor.
The elements are sorted into equal width bins between `min` and `max`.
If `min` and `max` are both zero, the minimum and
maximum values of the data are used.
if (torch_is_installed()) { torch_histc(torch_tensor(c(1., 2, 1)), bins=4, min=0, max=3) }
Hstack
torch_hstack(tensors)
tensors |
(sequence of Tensors) sequence of tensors to concatenate |
Stack tensors in sequence horizontally (column wise).
This is equivalent to concatenation along the first axis for 1-D tensors, and along the second axis for all other tensors.
if (torch_is_installed()) {
  a <- torch_tensor(c(1, 2, 3))
  b <- torch_tensor(c(4, 5, 6))
  torch_hstack(list(a, b))
  a <- torch_tensor(rbind(1, 2, 3))
  b <- torch_tensor(rbind(4, 5, 6))
  torch_hstack(list(a, b))
}
Hypot
torch_hypot(self, other)
self |
(Tensor) the first input tensor |
other |
(Tensor) the second input tensor |
Given the legs of a right triangle, return its hypotenuse.
$$\text{out}_i = \sqrt{\text{input}_i^2 + \text{other}_i^2}$$
The shapes of `input` and `other` must be broadcastable.
if (torch_is_installed()) { torch_hypot(torch_tensor(c(4.0)), torch_tensor(c(3.0, 4.0, 5.0))) }
I0
torch_i0(self)
self |
(Tensor) the input tensor |
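Computes the zeroth order modified Bessel function of the first kind for each element of `input`.
$$\text{out}_i = I_0(\text{input}_i) = \sum_{k=0}^{\infty} \frac{(\text{input}_i^2 / 4)^k}{(k!)^2}$$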
if (torch_is_installed()) { torch_i0(torch_arange(start = 0, end = 5, dtype=torch_float32())) }
A list that represents the numerical properties of an integer type.
torch_iinfo(dtype)
dtype |
dtype to get information from. |
Imag
torch_imag(self)
self |
(Tensor) the input tensor. |
Returns the imaginary part of the `input` tensor.
Not yet implemented.
if (torch_is_installed()) {
  ## Not run:
  torch_imag(torch_tensor(c(-1 + 1i, -2 + 2i, 3 - 3i)))
  ## End(Not run)
}
Helper functions to index tensors.
torch_index(self, indices)
self |
(Tensor) Tensor that will be indexed. |
indices |
( |
Modify values selected by `indices`.
torch_index_put(self, indices, values, accumulate = FALSE)
self |
(Tensor) Tensor that will be indexed. |
indices |
( |
values |
(Tensor) values that will be placed at the indexed locations. Used
for |
accumulate |
(bool) Whether instead of replacing the current values with |
In-place version of `torch_index_put`.
torch_index_put_(self, indices, values, accumulate = FALSE)
self |
(Tensor) Tensor that will be indexed. |
indices |
( |
values |
(Tensor) values that will be placed at the indexed locations. Used
for |
accumulate |
(bool) Whether instead of replacing the current values with |
Index_select
torch_index_select(self, dim, index)
self |
(Tensor) the input tensor. |
dim |
(int) the dimension in which we index |
index |
(LongTensor) the 1-D tensor containing the indices to index |
Returns a new tensor which indexes the `input` tensor along dimension
`dim` using the entries in `index`, which is a `LongTensor`.
The returned tensor has the same number of dimensions as the original tensor
(`input`). The `dim`-th dimension has the same size as the length
of `index`; other dimensions have the same size as in the original tensor.
The returned tensor does not use the same storage as the original
tensor. If `out` has a different shape than expected, we
silently change it to the correct shape, reallocating the underlying
storage if necessary.
if (torch_is_installed()) {
  x <- torch_randn(c(3, 4))
  x
  indices <- torch_tensor(c(1, 3), dtype = torch_int64())
  torch_index_select(x, 1, indices)
  torch_index_select(x, 2, indices)
}
A simple exported version of install_path. Returns the torch installation path.
torch_install_path()
Inverse
torch_inverse(self)
self |
(Tensor) the input tensor of size |
Takes the inverse of the square matrix `input`. `input` can be batches
of 2D square tensors, in which case this function would return a tensor composed of
individual inverses.
Irrespective of the original strides, the returned tensors will be transposed, i.e. with strides like `input.contiguous().transpose(-2, -1).stride()`.
if (torch_is_installed()) {
  ## Not run:
  x <- torch_rand(c(4, 4))
  y <- torch_inverse(x)
  z <- torch_mm(x, y)
  z
  torch_max(torch_abs(z - torch_eye(4))) # Max non-zero
  # Batched inverse example
  x <- torch_randn(c(2, 3, 4, 4))
  y <- torch_inverse(x)
  z <- torch_matmul(x, y)
  torch_max(torch_abs(z - torch_eye(4)$expand_as(x))) # Max non-zero
  ## End(Not run)
}
Is_complex
torch_is_complex(self)
self |
(Tensor) the PyTorch tensor to test |
Returns TRUE if the data type of `input` is a complex data type, i.e.,
one of `torch_complex64` and `torch.complex128`.
Is_floating_point
torch_is_floating_point(self)
self |
(Tensor) the PyTorch tensor to test |
Returns TRUE if the data type of `input` is a floating point data type, i.e.,
one of `torch_float64`, `torch.float32` and `torch.float16`.
Verifies if torch is installed
torch_is_installed()
Is_nonzero
torch_is_nonzero(self)
self |
(Tensor) the input tensor. |
Returns TRUE if the `input` is a single element tensor which is not equal to zero
after type conversions,
i.e. not equal to `torch_tensor(c(0.))` or `torch_tensor(c(0L))` or
`torch_tensor(c(FALSE))`.
Throws an error if `torch_numel() != 1` (even in case
of sparse tensors).
if (torch_is_installed()) {
  torch_is_nonzero(torch_tensor(c(0.)))
  torch_is_nonzero(torch_tensor(c(1.5)))
  torch_is_nonzero(torch_tensor(c(FALSE)))
  torch_is_nonzero(torch_tensor(c(3)))
  if (FALSE) {
    torch_is_nonzero(torch_tensor(c(1, 3, 5)))
    torch_is_nonzero(torch_tensor(c()))
  }
}
Isclose
torch_isclose(self, other, rtol = 1e-05, atol = 1e-08, equal_nan = FALSE)
self |
(Tensor) first tensor to compare |
other |
(Tensor) second tensor to compare |
rtol |
(float, optional) relative tolerance. Default: 1e-05 |
atol |
(float, optional) absolute tolerance. Default: 1e-08 |
equal_nan |
(bool, optional) if |
Returns a new tensor with boolean elements representing if each element of input is "close" to the corresponding element of other.
Closeness is defined as:
$$\lvert \text{input} - \text{other} \rvert \leq \text{atol} + \text{rtol} \times \lvert \text{other} \rvert$$
where input and other are finite. Where input and/or other are nonfinite, they are close if and only if they are equal, with NaNs being considered equal to each other when equal_nan is TRUE.
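The closeness rule can be checked by hand. The following base-R sketch is not the libtorch implementation: `is_close_ref` is a hypothetical helper name, it works in double precision, and it assumes two numeric vectors of equal length.

```r
# Reference sketch of the closeness rule in base R.
# `is_close_ref` is a hypothetical helper, not part of the torch package.
is_close_ref <- function(x, y, rtol = 1e-05, atol = 1e-08, equal_nan = FALSE) {
  stopifnot(length(x) == length(y))
  finite <- is.finite(x) & is.finite(y)
  out <- logical(length(x))
  # Finite pairs: |x - y| <= atol + rtol * |y|
  out[finite] <- abs(x[finite] - y[finite]) <= atol + rtol * abs(y[finite])
  # Non-finite pairs are close only if they are equal (e.g. Inf and Inf)
  nonfinite <- !finite
  out[nonfinite] <- !is.na(x[nonfinite]) & !is.na(y[nonfinite]) &
    x[nonfinite] == y[nonfinite]
  # NaN pairs count as close only when equal_nan is TRUE
  if (equal_nan) out[is.nan(x) & is.nan(y)] <- TRUE
  out
}

is_close_ref(c(1, 2, 3), c(1 + 1e-10, 3, 4))
is_close_ref(c(Inf, 4), c(Inf, 6), rtol = 0.5)
is_close_ref(c(NaN, 1), c(NaN, 1), equal_nan = TRUE)
```

Note that the rule is not symmetric: the relative tolerance scales with other, not with input.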
if (torch_is_installed()) {
  torch_isclose(torch_tensor(c(1., 2, 3)), torch_tensor(c(1 + 1e-10, 3, 4)))
  torch_isclose(torch_tensor(c(Inf, 4)), torch_tensor(c(Inf, 6)), rtol=.5)
}
Isfinite
torch_isfinite(self)
self |
(Tensor) A tensor to check |
Returns a new tensor with boolean elements representing if each element is finite or not.
if (torch_is_installed()) {
  torch_isfinite(torch_tensor(c(1, Inf, 2, -Inf, NaN)))
}
Isinf
torch_isinf(self)
self |
(Tensor) A tensor to check |
Returns a new tensor with boolean elements representing if each element is positive or negative infinity or not.
if (torch_is_installed()) {
  torch_isinf(torch_tensor(c(1, Inf, 2, -Inf, NaN)))
}
Isnan
torch_isnan(self)
self |
(Tensor) A tensor to check |
Returns a new tensor with boolean elements representing if each element is NaN
or not.
if (torch_is_installed()) {
  torch_isnan(torch_tensor(c(1, NaN, 2)))
}
Isneginf
torch_isneginf(self)
self |
(Tensor) the input tensor. |
Tests if each element of input is negative infinity or not.
if (torch_is_installed()) {
  a <- torch_tensor(c(-Inf, Inf, 1.2))
  torch_isneginf(a)
}
Isposinf
torch_isposinf(self)
self |
(Tensor) the input tensor. |
Tests if each element of input is positive infinity or not.
if (torch_is_installed()) {
  a <- torch_tensor(c(-Inf, Inf, 1.2))
  torch_isposinf(a)
}
Isreal
torch_isreal(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with boolean elements representing if each element of input is real-valued or not.
All real-valued types are considered real. Complex values are considered real when their imaginary part is 0.
if (torch_is_installed()) {
  if (FALSE) {
    torch_isreal(torch_tensor(c(1, 1+1i, 2+0i)))
  }
}
Inverse short time Fourier Transform. This is expected to be the inverse of torch_stft().
torch_istft( self, n_fft, hop_length = NULL, win_length = NULL, window = list(), center = TRUE, normalized = FALSE, onesided = NULL, length = NULL, return_complex = FALSE )
self |
(Tensor) The input tensor. Expected to be output of |
n_fft |
(int) Size of Fourier transform |
hop_length |
(Optional |
win_length |
(Optional |
window |
(Optional(torch.Tensor)) The optional window function.
(Default: |
center |
(bool) Whether |
normalized |
(bool) Whether the STFT was normalized. (Default: |
onesided |
(Optional(bool)) Whether the STFT was onesided.
(Default: |
length |
(Optional(int)) The amount to trim the signal by (i.e. the original signal length). (Default: whole signal) |
return_complex |
(Optional(bool)) Whether the output should be complex,
or if the input should be assumed to derive from a real signal and window.
Note that this is incompatible with |
It has the same parameters as torch_stft() (plus an additional optional parameter, length) and it should return the least squares estimation of the original signal. The algorithm will check using the NOLA condition (nonzero overlap).
An important consideration is that the parameters window and center must be chosen so that the envelope created by the summation of all the windows is never zero at any point in time. Specifically,
$$\sum_{t=-\infty}^{\infty} \lvert w \rvert^2 (n - t \times \text{hop\_length}) \neq 0$$
Since torch_stft() discards elements at the end of the signal if they do not fit in a frame, istft may return a shorter signal than the original signal (this can occur if center is FALSE, since the signal isn't padded).
If center is TRUE, then there will be padding, e.g. 'constant', 'reflect', etc. Left padding can be trimmed off exactly because it can be calculated, but right padding cannot be calculated without additional information.
Example: Suppose the last window is c(17, 18, 0, 0, 0) vs c(18, 0, 0, 0, 0). The n_fft, hop_length, and win_length are all the same, which prevents the calculation of right padding. These additional values could be zeros or a reflection of the signal, so providing length could be useful. If length is NULL, then padding will be aggressively removed (some loss of signal).
D. W. Griffin and J. S. Lim, "Signal estimation from modified short-time Fourier transform," IEEE Trans. ASSP, vol.32, no.2, pp.236-243, Apr. 1984.
Kaiser_window
torch_kaiser_window( window_length, periodic, beta, dtype = NULL, layout = NULL, device = NULL, requires_grad = NULL )
window_length |
(int) length of the window. |
periodic |
(bool, optional) If TRUE, returns a periodic window suitable for use in spectral analysis. If FALSE, returns a symmetric window suitable for use in filter design. |
beta |
(float, optional) shape parameter for the window. |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Computes the Kaiser window with window length window_length and shape parameter beta.
Let I_0 be the zeroth order modified Bessel function of the first kind (see torch_i0()) and N = L - 1 if periodic is FALSE and L if periodic is TRUE, where L is the window_length. This function computes:
$$\text{out}_i = I_0\left(\beta \sqrt{1 - \left(\frac{i - N/2}{N/2}\right)^2}\right) / I_0(\beta)$$
Calling torch_kaiser_window(L, B, periodic=TRUE) is equivalent to calling torch_kaiser_window(L + 1, B, periodic=FALSE) and dropping the last element.
The periodic argument is intended as a helpful shorthand to produce a periodic window as input to functions like torch_stft().
If window_length is one, then the returned window is a single element tensor containing a one.
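The Kaiser window formula can be reproduced in base R with besselI(), which evaluates the modified Bessel function of the first kind. This is a sketch under the definitions in this entry, not the libtorch implementation; `kaiser_ref` is a hypothetical helper name.

```r
# Base-R reference sketch of the Kaiser window.
# `kaiser_ref` is a hypothetical helper, not part of the torch package.
kaiser_ref <- function(window_length, periodic, beta) {
  if (window_length == 1) return(1)
  N <- if (periodic) window_length else window_length - 1
  i <- seq_len(window_length) - 1        # i = 0, ..., L - 1
  ratio <- (i - N / 2) / (N / 2)
  # pmax() guards against tiny negative values caused by rounding
  besselI(beta * sqrt(pmax(0, 1 - ratio^2)), nu = 0) / besselI(beta, nu = 0)
}

w_sym <- kaiser_ref(5, periodic = FALSE, beta = 12)
w_per <- kaiser_ref(5, periodic = TRUE, beta = 12)
w_sym
w_per
```

The symmetric window peaks at exactly 1 in the middle, and the periodic window of length 5 matches the first five points of the symmetric window of length 6.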
Computes the Kronecker product of self and other.
torch_kron(self, other)
self |
( |
other |
( |
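As an illustration of what a Kronecker product looks like, the sketch below uses base R's kronecker() on plain matrices rather than torch tensors. The matrices A and B are made-up inputs chosen for the example.

```r
# Kronecker product of two 2 x 2 matrices using base R.
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- diag(2)
K <- kronecker(A, B)   # 4 x 4 result made of 2 x 2 blocks A[i, j] * B
K
K[1:2, 3:4]            # block (1, 2), equal to A[1, 2] * B
```

Each block of the result is one entry of the first matrix multiplied by the whole second matrix.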
Kthvalue
torch_kthvalue(self, k, dim = -1L, keepdim = FALSE)
self |
(Tensor) the input tensor. |
k |
(int) k for the k-th smallest element |
dim |
(int, optional) the dimension to find the kth value along |
keepdim |
(bool) whether the output tensor has |
Returns a namedtuple (values, indices) where values is the k-th smallest element of each row of the input tensor in the given dimension dim, and indices is the index location of each element found.
If dim is not given, the last dimension of the input is chosen.
If keepdim is TRUE, both the values and indices tensors are the same size as input, except in the dimension dim where they are of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in both the values and indices tensors having 1 fewer dimension than the input tensor.
if (torch_is_installed()) {
  x <- torch_arange(1, 6)
  x
  torch_kthvalue(x, 4)
  x <- torch_arange(1,6)$resize_(c(2,3))
  x
  torch_kthvalue(x, 2, 1, TRUE)
}
Creates the corresponding layout
torch_strided() torch_sparse_coo()
Lcm
torch_lcm(self, other)
self |
(Tensor) the input tensor. |
other |
(Tensor) the second input tensor |
Computes the element-wise least common multiple (LCM) of input and other.
Both input and other must have integer types.
This defines lcm(0, 0) = 0 and lcm(0, a) = 0.
if (torch_is_installed()) {
  if (torch::cuda_is_available()) {
    a <- torch_tensor(c(5, 10, 15), dtype = torch_long(), device = "cuda")
    b <- torch_tensor(c(3, 4, 5), dtype = torch_long(), device = "cuda")
    torch_lcm(a, b)
    c <- torch_tensor(c(3L), device = "cuda")
    torch_lcm(a, c)
  }
}
Le
torch_le(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare |
Computes $\text{input} \leq \text{other}$ element-wise.
The second argument can be a number or a tensor whose shape is broadcastable with the first argument.
if (torch_is_installed()) {
  torch_le(torch_tensor(matrix(1:4, ncol = 2, byrow=TRUE)),
           torch_tensor(matrix(c(1,1,4,4), ncol = 2, byrow=TRUE)))
}
Lerp
torch_lerp(self, end, weight)
self |
(Tensor) the tensor with the starting points |
end |
(Tensor) the tensor with the ending points |
weight |
(float or tensor) the weight for the interpolation formula |
Does a linear interpolation of two tensors start (given by input) and end based on a scalar or tensor weight and returns the resulting out tensor:
$$\text{out}_i = \text{start}_i + \text{weight}_i \times (\text{end}_i - \text{start}_i)$$
The shapes of start and end must be broadcastable. If weight is a tensor, then the shapes of weight, start, and end must be broadcastable.
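The interpolation rule, out = start + weight * (end - start), can be sketched in base R on plain numeric vectors. `lerp_ref` is a hypothetical helper name used only for illustration; R's recycling stands in for tensor broadcasting here.

```r
# Base-R reference sketch of linear interpolation.
# `lerp_ref` is a hypothetical helper, not part of the torch package.
lerp_ref <- function(start, end, weight) start + weight * (end - start)

start <- c(1, 2, 3, 4)
end <- rep(10, 4)
lerp_ref(start, end, 0.5)                  # scalar weight
lerp_ref(start, end, c(0, 0.25, 0.5, 1))   # one weight per element
```

A weight of 0 returns start, a weight of 1 returns end, and values in between move proportionally along the segment.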
if (torch_is_installed()) {
  start = torch_arange(1, 4)
  end = torch_empty(4)$fill_(10)
  start
  end
  torch_lerp(start, end, 0.5)
  torch_lerp(start, end, torch_full_like(start, 0.5))
}
Less
torch_less(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare |
Alias for torch_lt().
Less_equal
torch_less_equal(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare |
Alias for torch_le().
Lgamma
torch_lgamma(self)
self |
(Tensor) the input tensor. |
Computes the logarithm of the gamma function on input.
if (torch_is_installed()) {
  a = torch_arange(0.5, 2, 0.5)
  torch_lgamma(a)
}
Linspace
torch_linspace( start, end, steps = 100, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
start |
(float) the starting value for the set of points |
end |
(float) the ending value for the set of points |
steps |
(int) number of points to sample between |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Returns a one-dimensional tensor of steps equally spaced points between start and end.
The output tensor is 1-D of size steps.
if (torch_is_installed()) {
  torch_linspace(3, 10, steps=5)
  torch_linspace(-10, 10, steps=5)
  torch_linspace(start=-10, end=10, steps=5)
  torch_linspace(start=-10, end=10, steps=1)
}
Loads a saved object
torch_load(path, device = "cpu")
path |
a path to the saved object |
device |
a device to load tensors to. By default we load to the |
Other torch_save: torch_save(), torch_serialize()
Log
torch_log(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the natural logarithm of the elements of input.
if (torch_is_installed()) {
  a = torch_randn(c(5))
  a
  torch_log(a)
}
Log10
torch_log10(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the logarithm to the base 10 of the elements of input.
if (torch_is_installed()) {
  a = torch_rand(5)
  a
  torch_log10(a)
}
Log1p
torch_log1p(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the natural logarithm of (1 + input).
This function is more accurate than torch_log for small values of input.
if (torch_is_installed()) {
  a = torch_randn(c(5))
  a
  torch_log1p(a)
}
Log2
torch_log2(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the logarithm to the base 2 of the elements of input.
if (torch_is_installed()) {
  a = torch_rand(5)
  a
  torch_log2(a)
}
Logaddexp
torch_logaddexp(self, other)
self |
(Tensor) the input tensor. |
other |
(Tensor) the second input tensor |
Logarithm of the sum of exponentiations of the inputs.
Calculates pointwise $\log(e^x + e^y)$. This function is useful in statistics where the calculated probabilities of events may be so small as to exceed the range of normal floating point numbers. In such cases the logarithm of the calculated probability is stored. This function allows adding probabilities stored in such a fashion.
This op should be disambiguated with torch_logsumexp(), which performs a reduction on a single tensor.
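To see why a dedicated function helps with very small probabilities, the base-R sketch below compares the naive formula with one common stable rewrite, max(x, y) + log1p(exp(-|x - y|)). `logaddexp_ref` is a hypothetical helper name, and the rewrite is an assumption for illustration; the source does not say how libtorch evaluates the expression. Inputs are assumed finite.

```r
# Base-R sketch of a numerically stable log(exp(x) + exp(y)).
# `logaddexp_ref` is a hypothetical helper, not part of the torch package.
logaddexp_ref <- function(x, y) {
  m <- pmax(x, y)
  m + log1p(exp(-abs(x - y)))
}

naive <- log(exp(-1000) + exp(-1000))   # exp() underflows to 0, so this is -Inf
stable <- logaddexp_ref(-1000, -1000)   # -1000 + log(2)
naive
stable
```

Factoring out the larger input keeps the exponential in the range (0, 1], so the result stays finite even when both log-probabilities are far below the underflow threshold.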
if (torch_is_installed()) {
  torch_logaddexp(torch_tensor(c(-1.0)), torch_tensor(c(-1.0, -2, -3)))
  torch_logaddexp(torch_tensor(c(-100.0, -200, -300)), torch_tensor(c(-1.0, -2, -3)))
  torch_logaddexp(torch_tensor(c(1.0, 2000, 30000)), torch_tensor(c(-1.0, -2, -3)))
}
Logaddexp2
torch_logaddexp2(self, other)
self |
(Tensor) the input tensor. |
other |
(Tensor) the second input tensor |
Logarithm of the sum of exponentiations of the inputs in base-2.
Calculates pointwise $\log_2(2^x + 2^y)$. See torch_logaddexp() for more details.
Logcumsumexp
torch_logcumsumexp(self, dim)
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to do the operation over |
Returns the logarithm of the cumulative summation of the exponentiation of elements of input in the dimension dim.
For summation index $j$ given by dim and other indices $i$, the result is
$$\text{logcumsumexp}(x)_{ij} = \log \sum_{k=1}^{j} \exp(x_{ik})$$
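For a single vector, the cumulative log-sum-exp can be sketched in base R as follows. `logcumsumexp_ref` is a hypothetical helper name. It shifts by the global maximum, which is a simplification: leading elements far below the maximum can still underflow, so this is an illustration of the definition rather than a fully stable implementation.

```r
# Base-R sketch of the cumulative log-sum-exp along a vector.
# `logcumsumexp_ref` is a hypothetical helper, not part of the torch package.
logcumsumexp_ref <- function(x) {
  m <- max(x)                      # shift to avoid overflow in exp()
  m + log(cumsum(exp(x - m)))
}

logcumsumexp_ref(c(0, 0, 0))       # log(1), log(2), log(3)
logcumsumexp_ref(c(0.5, -1, 2))
```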
if (torch_is_installed()) {
  a <- torch_randn(c(10))
  torch_logcumsumexp(a, dim=1)
}
Logdet
torch_logdet(self)
self |
(Tensor) the input tensor of size |
Calculates log determinant of a square matrix or batches of square matrices.
Result is -Inf if input has zero determinant, and is NaN if input has negative determinant.
Backward through logdet internally uses SVD results when input is not invertible. In this case, double backward through logdet will be unstable when input doesn't have distinct singular values. See torch_svd() for details.
if (torch_is_installed()) {
  A = torch_randn(c(3, 3))
  torch_det(A)
  torch_logdet(A)
  A
  A$det()
  A$det()$log()
}
Logical_and
torch_logical_and(self, other)
self |
(Tensor) the input tensor. |
other |
(Tensor) the tensor to compute AND with |
Computes the element-wise logical AND of the given input tensors. Zeros are treated as FALSE and nonzeros are treated as TRUE.
if (torch_is_installed()) {
  torch_logical_and(torch_tensor(c(TRUE, FALSE, TRUE)), torch_tensor(c(TRUE, FALSE, FALSE)))
  a = torch_tensor(c(0, 1, 10, 0), dtype=torch_int8())
  b = torch_tensor(c(4, 0, 1, 0), dtype=torch_int8())
  torch_logical_and(a, b)
  ## Not run:
  torch_logical_and(a, b, out=torch_empty(4, dtype=torch_bool()))
  ## End(Not run)
}
Logical_not
self |
(Tensor) the input tensor. |
Computes the element-wise logical NOT of the given input tensor. If not specified, the output tensor will have the bool dtype. If the input tensor is not a bool tensor, zeros are treated as FALSE and non-zeros are treated as TRUE.
if (torch_is_installed()) {
  torch_logical_not(torch_tensor(c(TRUE, FALSE)))
  torch_logical_not(torch_tensor(c(0, 1, -10), dtype=torch_int8()))
  torch_logical_not(torch_tensor(c(0., 1.5, -10.), dtype=torch_double()))
}
Logical_or
torch_logical_or(self, other)
self |
(Tensor) the input tensor. |
other |
(Tensor) the tensor to compute OR with |
Computes the element-wise logical OR of the given input tensors. Zeros are treated as FALSE and nonzeros are treated as TRUE.
if (torch_is_installed()) {
  torch_logical_or(torch_tensor(c(TRUE, FALSE, TRUE)), torch_tensor(c(TRUE, FALSE, FALSE)))
  a = torch_tensor(c(0, 1, 10, 0), dtype=torch_int8())
  b = torch_tensor(c(4, 0, 1, 0), dtype=torch_int8())
  torch_logical_or(a, b)
  ## Not run:
  torch_logical_or(a$double(), b$double())
  torch_logical_or(a$double(), b)
  torch_logical_or(a, b, out=torch_empty(4, dtype=torch_bool()))
  ## End(Not run)
}
Logical_xor
torch_logical_xor(self, other)
self |
(Tensor) the input tensor. |
other |
(Tensor) the tensor to compute XOR with |
Computes the element-wise logical XOR of the given input tensors. Zeros are treated as FALSE and nonzeros are treated as TRUE.
if (torch_is_installed()) {
  torch_logical_xor(torch_tensor(c(TRUE, FALSE, TRUE)), torch_tensor(c(TRUE, FALSE, FALSE)))
  a = torch_tensor(c(0, 1, 10, 0), dtype=torch_int8())
  b = torch_tensor(c(4, 0, 1, 0), dtype=torch_int8())
  torch_logical_xor(a, b)
  torch_logical_xor(a$to(dtype=torch_double()), b$to(dtype=torch_double()))
  torch_logical_xor(a$to(dtype=torch_double()), b)
}
Logit
torch_logit(self, eps = NULL)
self |
(Tensor) the input tensor. |
eps |
(float, optional) the epsilon for input clamp bound. Default: |
Returns a new tensor with the logit of the elements of input.
input is clamped to [eps, 1 - eps] when eps is not NULL.
When eps is NULL and input < 0 or input > 1, the function yields NaN.
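The clamping behaviour can be illustrated with a base-R sketch of the logit, log(p / (1 - p)). `logit_ref` is a hypothetical helper name; the clamp with pmin()/pmax() mirrors the [eps, 1 - eps] rule described in this entry.

```r
# Base-R reference sketch of logit with an optional clamp.
# `logit_ref` is a hypothetical helper, not part of the torch package.
logit_ref <- function(p, eps = NULL) {
  if (!is.null(eps)) p <- pmin(pmax(p, eps), 1 - eps)
  # log() of a negative ratio yields NaN (with a warning, suppressed here)
  suppressWarnings(log(p / (1 - p)))
}

logit_ref(0.5)              # midpoint of the unit interval
logit_ref(0, eps = 1e-6)    # clamped, so the result is finite
logit_ref(2)                # outside [0, 1] without eps
```

Without eps, inputs of exactly 0 or 1 give -Inf or Inf, which is why a small eps is often passed when the input comes from a sigmoid-like source.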
if (torch_is_installed()) {
  a <- torch_rand(5)
  a
  torch_logit(a, eps=1e-6)
}
Logspace
torch_logspace( start, end, steps = 100, base = 10, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
start |
(float) the starting value for the set of points |
end |
(float) the ending value for the set of points |
steps |
(int) number of points to sample between |
base |
(float) base of the logarithm function. Default: |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Returns a one-dimensional tensor of steps points logarithmically spaced with base base between $\text{base}^{\text{start}}$ and $\text{base}^{\text{end}}$.
The output tensor is 1-D of size steps.
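Logarithmic spacing means the exponents are equally spaced, not the values themselves. A base-R sketch of that idea follows; `logspace_ref` is a hypothetical helper name.

```r
# Base-R reference sketch: raise `base` to equally spaced exponents.
# `logspace_ref` is a hypothetical helper, not part of the torch package.
logspace_ref <- function(start, end, steps, base = 10) {
  base^seq(start, end, length.out = steps)
}

logspace_ref(-10, 10, steps = 5)
logspace_ref(2, 2, steps = 1, base = 2)
```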
if (torch_is_installed()) {
  torch_logspace(start=-10, end=10, steps=5)
  torch_logspace(start=0.1, end=1.0, steps=5)
  torch_logspace(start=0.1, end=1.0, steps=1)
  torch_logspace(start=2, end=2, steps=1, base=2)
}
Logsumexp
torch_logsumexp(self, dim, keepdim = FALSE)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints) the dimension or dimensions to reduce. |
keepdim |
(bool) whether the output tensor has |
Returns the log of summed exponentials of each row of the input tensor in the given dimension dim. The computation is numerically stabilized.
For summation index $j$ given by dim and other indices $i$, the result is
$$\text{logsumexp}(x)_i = \log \sum_j \exp(x_{ij})$$
If keepdim is TRUE, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in the output tensor having 1 (or length(dim)) fewer dimension(s).
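One common way to stabilize a log-sum-exp is to subtract the maximum before exponentiating. The base-R sketch below shows that trick on the rows of a small matrix; `logsumexp_ref` is a hypothetical helper name, and whether libtorch uses exactly this scheme is an assumption.

```r
# Base-R sketch of a max-shifted log-sum-exp.
# `logsumexp_ref` is a hypothetical helper, not part of the torch package.
logsumexp_ref <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

a <- matrix(c(1000, 1000,
              0,    1), nrow = 2, byrow = TRUE)
row_lse <- apply(a, 1, logsumexp_ref)   # reduce within each row
row_lse
log(sum(exp(a[1, ])))                   # naive version overflows to Inf
```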
if (torch_is_installed()) {
  a = torch_randn(c(3, 3))
  torch_logsumexp(a, 1)
}
Lstsq
self |
(Tensor) the matrix |
A |
(Tensor) the |
Computes the solution to the least squares and least norm problems for a full rank matrix $A$ of size $(m \times n)$ and a matrix $B$ of size $(m \times k)$.
If $m \geq n$, torch_lstsq() solves the least-squares problem:
$$\min_X \lVert AX - B \rVert_2$$
If $m < n$, torch_lstsq() solves the least-norm problem:
$$\min_X \lVert X \rVert_2 \quad \text{subject to} \quad AX = B$$
Returned tensor $X$ has shape $(\max(m, n) \times k)$. The first $n$ rows of $X$ contain the solution. If $m \geq n$, the residual sum of squares for the solution in each column is given by the sum of squares of elements in the remaining $m - n$ rows of that column.
The case when $m < n$ is not supported on the GPU.
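The least-squares case can be illustrated in base R with qr.solve(), which returns a least-squares fit for an overdetermined system. This sketch uses made-up data and does not reproduce the packed output layout of torch_lstsq(); it only shows the solution and the residual sum of squares that the layout encodes.

```r
# Base-R sketch of the least-squares case (m >= n).
A <- cbind(1, c(1, 2, 3, 4))            # 4 x 2 matrix (m = 4, n = 2)
B <- matrix(c(6, 5, 7, 10), ncol = 1)   # 4 x 1 right-hand side (k = 1)

X <- qr.solve(A, B)                     # minimizes ||A X - B||_2 via QR
rss <- sum((A %*% X - B)^2)             # residual sum of squares
X
rss
```

Here X holds the intercept and slope of the best-fitting line through the four points, and rss is the quantity that the remaining m - n rows of the torch result summarize per column.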
Lt
torch_lt(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare |
Computes $\text{input} < \text{other}$ element-wise.
The second argument can be a number or a tensor whose shape is broadcastable with the first argument.
if (torch_is_installed()) {
  torch_lt(torch_tensor(matrix(1:4, ncol = 2, byrow=TRUE)),
           torch_tensor(matrix(c(1,1,4,4), ncol = 2, byrow=TRUE)))
}
Computes the LU factorization of a matrix or batches of matrices A. Returns a tuple containing the LU factorization and pivots of A. Pivoting is done if pivot is set to TRUE.
torch_lu(A, pivot = TRUE, get_infos = FALSE, out = NULL)
A |
(Tensor) the tensor to factor of size (*, m, n) |
pivot |
(bool, optional) – controls whether pivoting is done. Default: TRUE |
get_infos |
(bool, optional) – if set to TRUE, returns an info IntTensor. Default: FALSE |
out |
(tuple, optional) – optional output tuple. If get_infos is TRUE, then the elements in the tuple are Tensor, IntTensor, and IntTensor. If get_infos is FALSE, then the elements in the tuple are Tensor, IntTensor. Default: NULL |
if (torch_is_installed()) {
  A <- torch_randn(c(2, 3, 3))
  torch_lu(A)
}
Lu_solve
torch_lu_solve(self, LU_data, LU_pivots)
self |
(Tensor) the RHS tensor of size |
LU_data |
(Tensor) the pivoted LU factorization of A from |
LU_pivots |
(IntTensor) the pivots of the LU factorization from |
Returns the LU solve of the linear system $Ax = b$ using the partially pivoted LU factorization of A from torch_lu.
if (torch_is_installed()) {
  A = torch_randn(c(2, 3, 3))
  b = torch_randn(c(2, 3, 1))
  out = torch_lu(A)
  x = torch_lu_solve(b, out[[1]], out[[2]])
  torch_norm(torch_bmm(A, x) - b)
}
Lu_unpack
torch_lu_unpack(LU_data, LU_pivots, unpack_data = TRUE, unpack_pivots = TRUE)
LU_data |
(Tensor) – the packed LU factorization data |
LU_pivots |
(Tensor) – the packed LU factorization pivots |
unpack_data |
(logical) – flag indicating if the data should be unpacked. If FALSE, then the returned L and U are NULL. Default: TRUE |
unpack_pivots |
(logical) – flag indicating if the pivots should be unpacked into a permutation matrix P. If FALSE, then the returned P is NULL. Default: TRUE |
Unpacks the data and pivots from an LU factorization of a tensor into tensors L and U and a permutation tensor P such that LU_data_and_pivots <- torch_lu(P$matmul(L)$matmul(U)).
Returns a list of tensors as list(the P tensor (permutation matrix), the L tensor, the U tensor).
Sets the seed for generating random numbers.
torch_manual_seed(seed) local_torch_manual_seed(seed, .env = parent.frame()) with_torch_manual_seed(code, ..., seed)
seed |
integer seed. |
.env |
environment that will take the modifications from manual_seed. |
code |
expression to run in the context of the seed |
... |
unused currently. |
local_torch_manual_seed()
: Modifies the torch seed in the environment scope.
with_torch_manual_seed()
: A with context to change the seed during the function execution.
Currently local_torch_manual_seed and with_torch_manual_seed won't work with Tensors on the MPS device. You can sample the tensors on CPU and move them to MPS if reproducibility is required.
Masked_select
torch_masked_select(self, mask)
self |
(Tensor) the input tensor. |
mask |
(BoolTensor) the tensor containing the binary mask to index with |
Returns a new 1-D tensor which indexes the input tensor according to the boolean mask mask, which is a BoolTensor.
The shapes of the mask tensor and the input tensor don't need to match, but they must be broadcastable.
The returned tensor does not use the same storage as the original tensor.
if (torch_is_installed()) {
  x = torch_randn(c(3, 4))
  x
  mask = x$ge(0.5)
  mask
  torch_masked_select(x, mask)
}
Matmul
torch_matmul(self, other)
self |
(Tensor) the first tensor to be multiplied |
other |
(Tensor) the second tensor to be multiplied |
Matrix product of two tensors.
The behavior depends on the dimensionality of the tensors as follows:
If both tensors are 1-dimensional, the dot product (scalar) is returned.
If both arguments are 2-dimensional, the matrix-matrix product is returned.
If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed.
If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned.
If both arguments are at least 1-dimensional and at least one argument is N-dimensional (where N > 2), then a batched matrix multiply is returned. If the first argument is 1-dimensional, a 1 is prepended to its dimension for the purpose of the batched matrix multiply and removed after. If the second argument is 1-dimensional, a 1 is appended to its dimension for the purpose of the batched matrix multiply and removed after. The non-matrix (i.e. batch) dimensions are broadcasted (and thus must be broadcastable). For example, if input is a $(j \times 1 \times n \times m)$ tensor and other is a $(k \times m \times p)$ tensor, out will be a $(j \times k \times n \times p)$ tensor.
The 1-dimensional dot product version of this function does not support an out parameter.
if (torch_is_installed()) {
  # vector x vector
  tensor1 = torch_randn(c(3))
  tensor2 = torch_randn(c(3))
  torch_matmul(tensor1, tensor2)
  # matrix x vector
  tensor1 = torch_randn(c(3, 4))
  tensor2 = torch_randn(c(4))
  torch_matmul(tensor1, tensor2)
  # batched matrix x broadcasted vector
  tensor1 = torch_randn(c(10, 3, 4))
  tensor2 = torch_randn(c(4))
  torch_matmul(tensor1, tensor2)
  # batched matrix x batched matrix
  tensor1 = torch_randn(c(10, 3, 4))
  tensor2 = torch_randn(c(10, 4, 5))
  torch_matmul(tensor1, tensor2)
  # batched matrix x broadcasted matrix
  tensor1 = torch_randn(c(10, 3, 4))
  tensor2 = torch_randn(c(4, 5))
  torch_matmul(tensor1, tensor2)
}
Matrix_exp
torch_matrix_exp(self)
self |
(Tensor) the input tensor. |
Returns the matrix exponential. Supports batched input.
For a matrix A, the matrix exponential is defined as
$$\exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!}$$
The implementation is based on: Bader, P.; Blanes, S.; Casas, F. Computing the Matrix Exponential with an Optimized Taylor Polynomial Approximation. Mathematics 2019, 7, 1174.
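The series definition of the matrix exponential can be checked with a plain truncated Taylor sum in base R. This is deliberately the naive series, not the optimized polynomial scheme of the cited paper; `expm_taylor` and its `terms` argument are hypothetical names, and the approach is only reasonable for matrices of small norm.

```r
# Base-R sketch: truncated Taylor series for exp(A).
# `expm_taylor` is a hypothetical helper, not part of the torch package.
expm_taylor <- function(A, terms = 30) {
  result <- diag(nrow(A))      # k = 0 term: identity
  term <- diag(nrow(A))
  for (k in seq_len(terms)) {
    term <- term %*% A / k     # builds A^k / k! incrementally
    result <- result + term
  }
  result
}

theta <- pi / 3
x <- rbind(c(0, theta), c(-theta, 0))
expm_taylor(x)   # a rotation: [[cos, sin], [-sin, cos]] of theta
```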
if (torch_is_installed()) {
  a <- torch_randn(c(2, 2, 2))
  a[1, , ] <- torch_eye(2, 2)
  a[2, , ] <- 2 * torch_eye(2, 2)
  a
  torch_matrix_exp(a)
  x <- torch_tensor(rbind(c(0, pi/3), c(-pi/3, 0)))
  x$matrix_exp() # should be [[cos(pi/3), sin(pi/3)], [-sin(pi/3), cos(pi/3)]]
}
Matrix_power
torch_matrix_power(self, n)
self |
(Tensor) the input tensor. |
n |
(int) the power to raise the matrix to |
Returns the matrix raised to the power n for square matrices. For a batch of matrices, each individual matrix is raised to the power n.
If n is negative, then the inverse of the matrix (if invertible) is raised to the power |n|. For a batch of matrices, the batched inverse (if invertible) is raised to the power |n|. If n is 0, then an identity matrix is returned.
if (torch_is_installed()) {
  a = torch_randn(c(2, 2, 2))
  a
  torch_matrix_power(a, 3)
}
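The zero and negative cases described above can be checked with a short sketch. The 2 x 2 matrix is an arbitrary invertible example chosen for illustration, and the comparison against torch_inverse() and torch_eye() is only a sanity check, not a statement about how the function is implemented.

```r
library(torch)

if (torch_is_installed()) {
  # Arbitrary invertible matrix, chosen only for illustration.
  a <- torch_tensor(rbind(c(2, 1), c(1, 3)))

  # n = 0: the result should be the identity matrix.
  p0 <- torch_matrix_power(a, 0)

  # n = -2: the result should match the inverse multiplied by itself.
  a_inv <- torch_inverse(a)
  pm2 <- torch_matrix_power(a, -2)

  print(torch_allclose(p0, torch_eye(2)))
  print(torch_allclose(pm2, torch_matmul(a_inv, a_inv), atol = 1e-6))
}
```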
Matrix_rank
self |
(Tensor) the input 2-D tensor |
tol |
(float, optional) the tolerance value. Default: |
symmetric |
(bool, optional) indicates whether |
Returns the numerical rank of a 2-D tensor. The method to compute the matrix rank is done using SVD by default. If symmetric is TRUE, then input is assumed to be symmetric, and the computation of the rank is done by obtaining the eigenvalues.
tol is the threshold below which the singular values (or the eigenvalues when symmetric is TRUE) are considered to be 0. If tol is not specified, tol is set to S.max() * max(S.size()) * eps where S is the singular values (or the eigenvalues when symmetric is TRUE), and eps is the epsilon value for the datatype of input.
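The default-tolerance rule can be sketched in base R, independently of the package. This only illustrates the rule as described and is not the library's implementation: numerical_rank is a hypothetical helper, eps is taken to be R's double-precision .Machine$double.eps, and max(S.size()) is assumed here to mean the larger of the two matrix dimensions.

```r
# Hypothetical helper (not part of the package): numerical rank from singular values.
numerical_rank <- function(m, tol = NULL) {
  s <- svd(m)$d
  if (is.null(tol)) {
    # Default threshold: largest singular value * larger dimension * machine epsilon.
    tol <- max(s) * max(dim(m)) * .Machine$double.eps
  }
  # Singular values at or below the threshold are treated as zero.
  sum(s > tol)
}

full <- diag(3)                        # identity matrix, full rank
deficient <- rbind(c(1, 2), c(2, 4))   # second row is twice the first

numerical_rank(full)
numerical_rank(deficient)
```

Passing an explicit tol replaces the default threshold, which mirrors the role of the tol argument described above.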
Max
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to reduce. |
keepdim |
(bool) whether the output tensor has |
out |
(tuple, optional) the result tuple of two output tensors (max, max_indices) |
other |
(Tensor) the second input tensor |
Called with input alone, returns the maximum value of all elements in the input tensor.
Called with dim, returns a namedtuple (values, indices) where values is the maximum value of each row of the input tensor in the given dimension dim, and indices is the index location of each maximum value found (argmax).
indices does not necessarily contain the first occurrence of each maximal value found, unless it is unique. The exact implementation details are device-specific. Do not expect the same result when run on CPU and GPU in general.
If keepdim is TRUE, the output tensors are of the same size as input except in the dimension dim where they are of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in the output tensors having 1 fewer dimension than input.
Called with other, each element of the tensor input is compared with the corresponding element of the tensor other and an element-wise maximum is taken.
The shapes of input and other don't need to match, but they must be broadcastable. When the shapes do not match, the shape of the returned output tensor follows the broadcasting rules.
if (torch_is_installed()) {
  a = torch_randn(c(1, 3))
  a
  torch_max(a)

  a = torch_randn(c(4, 4))
  a
  torch_max(a, dim = 1)

  a = torch_randn(c(4))
  a
  b = torch_randn(c(4))
  b
  torch_max(a, other = b)
}
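The effect of keepdim on the reduced form can be seen with a small fixed tensor. This is a sketch with illustrative values; it assumes that the (values, indices) pair is returned as an R list and that keepdim can be passed by name alongside dim.

```r
library(torch)

if (torch_is_installed()) {
  # Fixed 2 x 3 input so the row maxima are easy to read off.
  a <- torch_tensor(rbind(c(1, 5, 2), c(7, 3, 4)))

  # Reduce over the second dimension but keep it as a dimension of size 1.
  res <- torch_max(a, dim = 2, keepdim = TRUE)

  print(res[[1]])        # values
  print(res[[2]])        # indices
  print(res[[1]]$shape)  # dimension 2 is retained with size 1
}
```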
Maximum
torch_maximum(self, other)
self |
(Tensor) the input tensor. |
other |
(Tensor) the second input tensor |
Computes the element-wise maximum of input and other.
If one of the elements being compared is a NaN, then that element is returned. torch_maximum() is not supported for tensors with complex dtypes.
if (torch_is_installed()) {
  a <- torch_tensor(c(1, 2, -1))
  b <- torch_tensor(c(3, 0, 4))
  torch_maximum(a, b)
}
Mean
torch_mean(self, dim, keepdim = FALSE, dtype = NULL)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints) the dimension or dimensions to reduce. |
keepdim |
(bool) whether the output tensor has |
dtype |
the resulting data type. |
Called with input alone, returns the mean value of all elements in the input tensor.
Called with dim, returns the mean value of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduce over all of them.
If keepdim is TRUE, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).
if (torch_is_installed()) {
  a = torch_randn(c(1, 3))
  a
  torch_mean(a)

  a = torch_randn(c(4, 4))
  a
  torch_mean(a, 1)
  torch_mean(a, 1, TRUE)
}
Median
torch_median(self, dim, keepdim = FALSE)
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to reduce. |
keepdim |
(bool) whether the output tensor has |
Called with input alone, returns the median value of all elements in the input tensor.
Called with dim, returns a namedtuple (values, indices) where values is the median value of each row of the input tensor in the given dimension dim, and indices is the index location of each median value found.
By default, dim is the last dimension of the input tensor.
If keepdim is TRUE, the output tensors are of the same size as input except in the dimension dim where they are of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in the output tensors having 1 fewer dimension than input.
if (torch_is_installed()) {
  a = torch_randn(c(1, 3))
  a
  torch_median(a)

  a = torch_randn(c(4, 5))
  a
  torch_median(a, 1)
}
Returns the correspondent memory format.
torch_contiguous_format() torch_preserve_format() torch_channels_last_format()
Take $N$ tensors, each of which can be either a scalar or a 1-dimensional vector, and create $N$ N-dimensional grids, where the $i$-th grid is defined by expanding the $i$-th input over dimensions defined by other inputs.
torch_meshgrid(tensors, indexing)
tensors |
(list of Tensor) list of scalars or 1 dimensional tensors. Scalars will be treated as tensors of size (1,). |
indexing |
(str, optional): the indexing mode, either “xy” or “ij”, defaults to “ij”. See warning for future changes. If “xy” is selected, the first dimension corresponds to the cardinality of the second input and the second dimension corresponds to the cardinality of the first input. If “ij” is selected, the dimensions are in the same order as the cardinality of the inputs. |
In the future torch_meshgrid will transition to indexing=’xy’ as the default. An upstream issue tracks this change, with the goal of migrating to NumPy’s behavior.
if (torch_is_installed()) {
  x = torch_tensor(c(1, 2, 3))
  y = torch_tensor(c(4, 5, 6))
  out = torch_meshgrid(list(x, y))
  out
}
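The difference between the two indexing modes is easiest to see when the inputs have different lengths. The sketch below uses inputs of length 3 and 2 (values chosen only for illustration) and passes indexing explicitly, so it does not depend on the current default.

```r
library(torch)

if (torch_is_installed()) {
  x <- torch_tensor(c(1, 2, 3))  # 3 elements
  y <- torch_tensor(c(4, 5))     # 2 elements

  ij <- torch_meshgrid(list(x, y), indexing = "ij")
  xy <- torch_meshgrid(list(x, y), indexing = "xy")

  print(ij[[1]]$shape)  # dimensions follow the order of the inputs
  print(xy[[1]]$shape)  # first two dimensions are swapped
}
```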
Min
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to reduce. |
keepdim |
(bool) whether the output tensor has |
out |
(tuple, optional) the tuple of two output tensors (min, min_indices) |
other |
(Tensor) the second input tensor |
Called with input alone, returns the minimum value of all elements in the input tensor.
Called with dim, returns a namedtuple (values, indices) where values is the minimum value of each row of the input tensor in the given dimension dim, and indices is the index location of each minimum value found (argmin).
indices does not necessarily contain the first occurrence of each minimal value found, unless it is unique. The exact implementation details are device-specific. Do not expect the same result when run on CPU and GPU in general.
If keepdim is TRUE, the output tensors are of the same size as input except in the dimension dim where they are of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in the output tensors having 1 fewer dimension than input.
Called with other, each element of the tensor input is compared with the corresponding element of the tensor other and an element-wise minimum is taken. The resulting tensor is returned.
The shapes of input and other don't need to match, but they must be broadcastable. When the shapes do not match, the shape of the returned output tensor follows the broadcasting rules.
if (torch_is_installed()) {
  a = torch_randn(c(1, 3))
  a
  torch_min(a)

  a = torch_randn(c(4, 4))
  a
  torch_min(a, dim = 1)

  a = torch_randn(c(4))
  a
  b = torch_randn(c(4))
  b
  torch_min(a, other = b)
}
Minimum
torch_minimum(self, other)
self |
(Tensor) the input tensor. |
other |
(Tensor) the second input tensor |
Computes the element-wise minimum of input and other.
If one of the elements being compared is a NaN, then that element is returned. torch_minimum() is not supported for tensors with complex dtypes.
if (torch_is_installed()) {
  a <- torch_tensor(c(1, 2, -1))
  b <- torch_tensor(c(3, 0, 4))
  torch_minimum(a, b)
}
Mm
torch_mm(self, mat2)
self |
(Tensor) the first matrix to be multiplied |
mat2 |
(Tensor) the second matrix to be multiplied |
Performs a matrix multiplication of the matrices input and mat2.
If input is a $(n \times m)$ tensor, mat2 is a $(m \times p)$ tensor, out will be a $(n \times p)$ tensor.
This function does not broadcast. For broadcasting matrix products, see torch_matmul.
if (torch_is_installed()) {
  mat1 = torch_randn(c(2, 3))
  mat2 = torch_randn(c(3, 3))
  torch_mm(mat1, mat2)
}
Mode
torch_mode(self, dim = -1L, keepdim = FALSE)
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to reduce. |
keepdim |
(bool) whether the output tensor has |
Returns a namedtuple (values, indices) where values is the mode value of each row of the input tensor in the given dimension dim, i.e. a value which appears most often in that row, and indices is the index location of each mode value found.
By default, dim is the last dimension of the input tensor.
If keepdim is TRUE, the output tensors are of the same size as input except in the dimension dim where they are of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in the output tensors having 1 fewer dimension than input.
This function is not defined for torch_cuda.Tensor yet.
if (torch_is_installed()) {
  a = torch_randint(0, 50, size = list(5))
  a
  torch_mode(a, 1)
}
Movedim
torch_movedim(self, source, destination)
self |
(Tensor) the input tensor. |
source |
(int or tuple of ints) Original positions of the dims to move. These must be unique. |
destination |
(int or tuple of ints) Destination positions for each of the original dims. These must also be unique. |
Moves the dimension(s) of input at the position(s) in source to the position(s) in destination.
Other dimensions of input that are not explicitly moved remain in their original order and appear at the positions not specified in destination.
if (torch_is_installed()) {
  t <- torch_randn(c(3,2,1))
  t
  torch_movedim(t, 2, 1)$shape
  torch_movedim(t, 2, 1)
  torch_movedim(t, c(2, 3), c(1, 2))$shape
  torch_movedim(t, c(2, 3), c(1, 2))
}
Mul
torch_mul(self, other)
self |
(Tensor) the first multiplicand tensor |
other |
(Tensor) the second multiplicand tensor |
When other is a scalar, multiplies each element of input by the scalar other and returns a new resulting tensor. If input is of type FloatTensor or DoubleTensor, other should be a real number, otherwise it should be an integer.
When other is a tensor, each element of the tensor input is multiplied by the corresponding element of the tensor other. The resulting tensor is returned. The shapes of input and other must be broadcastable.
if (torch_is_installed()) {
  a = torch_randn(c(3))
  a
  torch_mul(a, 100)

  a = torch_randn(c(4, 1))
  a
  b = torch_randn(c(1, 4))
  b
  torch_mul(a, b)
}
Multinomial
torch_multinomial(self, num_samples, replacement = FALSE, generator = NULL)
self |
(Tensor) the input tensor containing probabilities |
num_samples |
(int) number of samples to draw |
replacement |
(bool, optional) whether to draw with replacement or not |
generator |
( |
Returns a tensor where each row contains num_samples indices sampled from the multinomial probability distribution located in the corresponding row of tensor input.
The rows of input do not need to sum to one (in which case the values are used as weights), but must be non-negative, finite and have a non-zero sum.
Indices are ordered from left to right according to when each was sampled (first samples are placed in the first column).
If input is a vector, out is a vector of size num_samples.
If input is a matrix with m rows, out is a matrix of shape $(m \times num\_samples)$.
If replacement is TRUE, samples are drawn with replacement. If not, they are drawn without replacement, which means that when a sample index is drawn for a row, it cannot be drawn again for that row.
When drawn without replacement, num_samples must not exceed the number of non-zero elements in input (or the minimum number of non-zero elements in each row of input if it is a matrix).
if (torch_is_installed()) {
  weights = torch_tensor(c(0, 10, 3, 0), dtype=torch_float()) # create a tensor of weights
  torch_multinomial(weights, 2)
  torch_multinomial(weights, 4, replacement=TRUE)
}
Multiply
torch_multiply(self, other)
self |
(Tensor) the first multiplicand tensor |
other |
(Tensor) the second multiplicand tensor |
Alias for torch_mul().
Mv
torch_mv(self, vec)
self |
(Tensor) matrix to be multiplied |
vec |
(Tensor) vector to be multiplied |
Performs a matrix-vector product of the matrix input and the vector vec.
If input is a $(n \times m)$ tensor, vec is a 1-D tensor of size $m$, out will be 1-D of size $n$.
This function does not broadcast.
if (torch_is_installed()) {
  mat = torch_randn(c(2, 3))
  vec = torch_randn(c(3))
  torch_mv(mat, vec)
}
Mvlgamma
torch_mvlgamma(self, p)
self |
(Tensor) the tensor to compute the multivariate log-gamma function |
p |
(int) the number of dimensions |
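Computes the multivariate log-gamma function (https://en.wikipedia.org/wiki/Multivariate_gamma_function) with dimension $p$ element-wise, given by
$$\log(\Gamma_p(a)) = C + \sum_{i=1}^{p} \log\left(\Gamma\left(a - \frac{i - 1}{2}\right)\right)$$
where $C = \log(\pi) \times \frac{p (p - 1)}{4}$ and $\Gamma(\cdot)$ is the Gamma function.
All elements must be greater than $\frac{p - 1}{2}$, otherwise an error would be thrown.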
if (torch_is_installed()) {
  a = torch_empty(c(2, 3))$uniform_(1, 2)
  a
  torch_mvlgamma(a, 2)
}
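The standard definition of the multivariate log-gamma function can be evaluated directly with base R's lgamma(), which gives a package-independent reference for small inputs. mvlgamma_ref is a hypothetical helper written for this sketch, not part of the package; it applies the constant term p(p - 1)/4 * log(pi) plus the sum of lgamma(a - (i - 1)/2) for i = 1..p.

```r
# Hypothetical reference helper (not part of the package).
mvlgamma_ref <- function(a, p) {
  # Every element must exceed (p - 1) / 2 for the function to be defined.
  stopifnot(all(a > (p - 1) / 2))
  i <- seq_len(p)
  const <- p * (p - 1) / 4 * log(pi)
  # For each element of a, sum lgamma(a - (i - 1) / 2) over i = 1..p.
  const + vapply(a, function(v) sum(lgamma(v - (i - 1) / 2)), numeric(1))
}

mvlgamma_ref(c(1.5, 2), p = 2)
mvlgamma_ref(3.7, p = 1)  # with p = 1 this reduces to lgamma(3.7)
```

With p = 1 the constant term vanishes and the sum has a single term, so the result coincides with the ordinary log-gamma function.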
Nanquantile
torch_nanquantile( self, q, dim = NULL, keepdim = FALSE, interpolation = "linear" )
self |
(Tensor) the input tensor. |
q |
(float or Tensor) a scalar or 1D tensor of quantile values in the range |
dim |
(int) the dimension to reduce. |
keepdim |
(bool) whether the output tensor has |
interpolation |
The interpolation method. |
This is a variant of torch_quantile() that "ignores" NaN values, computing the quantiles q as if NaN values in input did not exist. If all values in a reduced row are NaN then the quantiles for that reduction will be NaN. See the documentation for torch_quantile().
if (torch_is_installed()) {
  t <- torch_tensor(c(NaN, 1, 2))
  t$quantile(0.5)
  t$nanquantile(0.5)

  t <- torch_tensor(rbind(c(NaN, NaN), c(1, 2)))
  t
  t$nanquantile(0.5, dim=1)
  t$nanquantile(0.5, dim=2)
  torch_nanquantile(t, 0.5, dim = 1)
  torch_nanquantile(t, 0.5, dim = 2)
}
Nansum
torch_nansum(self, dim = NULL, keepdim = FALSE, dtype = NULL)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints) the dimension or dimensions to reduce. |
keepdim |
(bool) whether the output tensor has |
dtype |
the desired data type of returned tensor. If specified, the
input tensor is cast to dtype before the operation is performed. This is
useful for preventing data type overflows. Default: |
Called with input alone, returns the sum of all elements, treating NaN (Not a Number) values as zero.
Called with dim, returns the sum of each row of the input tensor in the given dimension dim, treating NaN values as zero. If dim is a list of dimensions, reduce over all of them.
If keepdim is TRUE, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).
if (torch_is_installed()) {
  a <- torch_tensor(c(1., 2., NaN, 4.))
  torch_nansum(a)
  torch_nansum(torch_tensor(c(1., NaN)))

  a <- torch_tensor(rbind(c(1, 2), c(3., NaN)))
  torch_nansum(a)
  torch_nansum(a, dim=1)
  torch_nansum(a, dim=2)
}
Narrow
torch_narrow(self, dim, start, length)
self |
(Tensor) the tensor to narrow |
dim |
(int) the dimension along which to narrow |
start |
(int) the starting dimension |
length |
(int) the distance to the ending dimension |
Returns a new tensor that is a narrowed version of the input tensor. Along the dimension dim, the result keeps length elements of input starting at index start. The returned tensor and the input tensor share the same underlying storage.
if (torch_is_installed()) {
  x = torch_tensor(matrix(c(1:9), ncol = 3, byrow= TRUE))
  torch_narrow(x, 1, 1, 2)
  torch_narrow(x, 2, 2, 2)
}
Ne
torch_ne(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare |
Computes $input \neq other$ element-wise.
The second argument can be a number or a tensor whose shape is broadcastable with the first argument.
if (torch_is_installed()) {
  torch_ne(
    torch_tensor(matrix(1:4, ncol = 2, byrow=TRUE)),
    torch_tensor(matrix(rep(c(1,4), each = 2), ncol = 2, byrow=TRUE))
  )
}
Neg
torch_neg(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the negative of the elements of input.
if (torch_is_installed()) {
  a = torch_randn(c(5))
  a
  torch_neg(a)
}
Negative
torch_negative(self)
self |
(Tensor) the input tensor. |
Alias for torch_neg()
Nextafter
torch_nextafter(self, other)
self |
(Tensor) the first input tensor |
other |
(Tensor) the second input tensor |
Returns the next floating-point value after input towards other, elementwise.
The shapes of input and other must be broadcastable.
if (torch_is_installed()) {
  eps <- torch_finfo(torch_float32())$eps
  torch_nextafter(torch_tensor(c(1, 2)), torch_tensor(c(2, 1))) == torch_tensor(c(eps + 1, 2 - eps))
}
Nonzero elements of tensors.
torch_nonzero(self, as_list = FALSE)
self |
(Tensor) the input tensor. |
as_list |
If FALSE (the default), returns a tensor containing the indices of all non-zero elements of input. If TRUE, returns a tuple of 1-D tensors, one for each dimension in input. |
if (torch_is_installed()) {
  torch_nonzero(torch_tensor(c(1, 1, 1, 0, 1)))
}
Norm
torch_norm(self, p = 2L, dim, keepdim = FALSE, dtype)
self |
(Tensor) the input tensor |
p |
(int, float, inf, -inf, 'fro', 'nuc', optional) the order of norm. Default: |
dim |
(int, 2-tuple of ints, 2-list of ints, optional) If it is an int, vector norm will be calculated, if it is 2-tuple of ints, matrix norm will be calculated. If the value is NULL, matrix norm will be calculated when the input tensor only has two dimensions, vector norm will be calculated when the input tensor only has one dimension. If the input tensor has more than two dimensions, the vector norm will be applied to last dimension. |
keepdim |
(bool, optional) whether the output tensors have |
dtype |
( |
Returns the matrix norm or vector norm of a given tensor.
if (torch_is_installed()) {
  a <- torch_arange(1, 9, dtype = torch_float())
  b <- a$reshape(list(3, 3))
  torch_norm(a)
  torch_norm(b)
  torch_norm(a, Inf)
  torch_norm(b, Inf)
}
Normal
Normal distributed
torch_normal(mean, std, size = NULL, generator = NULL, ...)
mean |
(tensor or scalar double) Mean of the normal distribution.
If this is a |
std |
(tensor or scalar double) The standard deviation of the normal
distribution. If this is a |
size |
(integers, optional) only used if both |
generator |
a random number generator created with |
... |
Tensor option parameters like |
Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given.
The mean is a tensor with the mean of each output element's normal distribution.
The std is a tensor with the standard deviation of each output element's normal distribution.
The shapes of mean and std don't need to match, but the total number of elements in each tensor needs to be the same. When the shapes do not match, the shape of mean is used as the shape for the returned output tensor.
When mean is a scalar and std is a tensor, the mean is shared among all drawn elements.
When mean is a tensor and std is a scalar, the standard deviation is shared among all drawn elements.
When both mean and std are scalars, the mean and standard deviation are shared among all drawn elements. The resulting tensor has size given by size.
if (torch_is_installed()) {
  torch_normal(mean=0, std=torch_arange(1, 0, -0.1) + 1e-6)
  torch_normal(mean=0.5, std=torch_arange(1., 6.))
  torch_normal(mean=torch_arange(1., 6.))
  torch_normal(2, 3, size=c(1, 4))
}
Not_equal
torch_not_equal(self, other)
self |
(Tensor) the tensor to compare |
other |
(Tensor or float) the tensor or value to compare |
Alias for torch_ne().
Ones
torch_ones( ..., names = NULL, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
... |
(int...) a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. |
names |
optional names for the dimensions |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument size.
if (torch_is_installed()) {
  torch_ones(c(2, 3))
  torch_ones(c(5))
}
Ones_like
torch_ones_like( input, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE, memory_format = torch_preserve_format() )
input |
(Tensor) the size of |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
memory_format |
( |
Returns a tensor filled with the scalar value 1, with the same size as input. torch_ones_like(input) is equivalent to torch_ones(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).
As of 0.4, this function does not support an out keyword. As an alternative, the old torch_ones_like(input, out=output) is equivalent to torch_ones(input.size(), out=output).
if (torch_is_installed()) {
  input = torch_empty(c(2, 3))
  torch_ones_like(input)
}
Orgqr
torch_orgqr(self, input2)
self |
(Tensor) the |
input2 |
(Tensor) the |
Computes the orthogonal matrix Q of a QR factorization, from the (input, input2) tuple returned by torch_geqrf.
This directly calls the underlying LAPACK function ?orgqr. See the LAPACK documentation for orgqr for further details.
Ormqr
torch_ormqr(self, input2, input3, left = TRUE, transpose = FALSE)
self |
(Tensor) the |
input2 |
(Tensor) the |
input3 |
(Tensor) the matrix to be multiplied. |
left |
see LAPACK documentation |
transpose |
see LAPACK documentation |
Multiplies mat (given by input3) by the orthogonal Q matrix of the QR factorization formed by torch_geqrf() that is represented by (a, tau) (given by (input, input2)).
This directly calls the underlying LAPACK function ?ormqr.
Outer
torch_outer(self, vec2)
self |
(Tensor) 1-D input vector |
vec2 |
(Tensor) 1-D input vector |
Outer product of input and vec2.
If input is a vector of size $n$ and vec2 is a vector of size $m$, then out must be a matrix of size $(n \times m)$.
This function does not broadcast.
if (torch_is_installed()) {
  v1 <- torch_arange(1., 5.)
  v2 <- torch_arange(1., 4.)
  torch_outer(v1, v2)
}
Pdist
torch_pdist(self, p = 2L)
self |
input tensor of shape $N \times M$. |
p |
p value for the p-norm distance to calculate between each vector pair |
Computes the p-norm distance between every pair of row vectors in the input. This is identical to the upper triangular portion, excluding the diagonal, of torch_norm(input[:, NULL] - input, dim=2, p=p). This function will be faster if the rows are contiguous.
If input has shape $N \times M$ then the output will have shape $\frac{1}{2} N (N - 1)$.
This function is equivalent to scipy.spatial.distance.pdist(input, 'minkowski', p=p) if $p \in (0, \infty)$. When $p = 0$ it is equivalent to scipy.spatial.distance.pdist(input, 'hamming') * M. When $p = \infty$, the closest scipy function is scipy.spatial.distance.pdist(xn, lambda x, y: np.abs(x - y).max()).
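For finite positive p, the same pairwise distances can be produced in base R with stats::dist(), which gives a package-independent way to see the output length and ordering. This is a sketch for comparison only, not the library's implementation; the 3 x 2 input is chosen so the distances are whole numbers.

```r
# Three row vectors in two dimensions (N = 3, M = 2), chosen for illustration.
x <- rbind(c(0, 0), c(3, 4), c(6, 8))

# stats::dist() stores the pairs (1,2), (1,3), (2,3) in that order, which is
# the upper triangle of the distance matrix read row by row.
d <- as.vector(stats::dist(x, method = "minkowski", p = 2))
d

# The number of distances is N * (N - 1) / 2.
length(d) == nrow(x) * (nrow(x) - 1) / 2
```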
Pinverse
torch_pinverse(self, rcond = 1e-15)
self |
(Tensor) The input tensor of size |
rcond |
(float) A floating point value to determine the cutoff for small singular values. Default: 1e-15 |
Calculates the pseudo-inverse (also known as the Moore-Penrose inverse) of a 2D tensor. See Moore-Penrose inverse for more details.
This method is implemented using the Singular Value Decomposition.
The pseudo-inverse is not necessarily a continuous function in the elements of the matrix [1]. Therefore, derivatives do not always exist, and exist for a constant rank only [2]. However, this method is backprop-able because it is implemented using SVD results, and it could be unstable. Double-backward will also be unstable due to the usage of SVD internally. See torch_svd for more details.
if (torch_is_installed()) {
  input = torch_randn(c(3, 5))
  input
  torch_pinverse(input)

  # Batched pinverse example
  a = torch_randn(c(2,6,3))
  b = torch_pinverse(a)
  torch_matmul(b, a)
}
Pixel_shuffle
torch_pixel_shuffle(self, upscale_factor)
self |
(Tensor) the input tensor |
upscale_factor |
(int) factor to increase spatial resolution by |
Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where $r$ is the upscale_factor.
See torch.nn.PixelShuffle for details.
if (torch_is_installed()) {
  input = torch_randn(c(1, 9, 4, 4))
  output = nnf_pixel_shuffle(input, 3)
  print(output$size())
}
Poisson
torch_poisson(self, generator = NULL)
self |
(Tensor) the input tensor containing the rates of the Poisson distribution |
generator |
( |
Returns a tensor of the same size as input with each element sampled from a Poisson distribution with rate parameter given by the corresponding element in input, i.e.,
$$out_i \sim \mathrm{Poisson}(input_i)$$
if (torch_is_installed()) {
  rates = torch_rand(c(4, 4)) * 5 # rate parameter between 0 and 5
  torch_poisson(rates)
}
Polar
torch_polar(abs, angle)
abs |
(Tensor) The absolute value of the complex tensor. Must be float or double. |
angle |
(Tensor) The angle of the complex tensor. Must be same dtype as
|
Constructs a complex tensor whose elements are Cartesian coordinates corresponding to the polar coordinates with absolute value abs and angle angle.
if (torch_is_installed()) {
  abs <- torch_tensor(c(1, 2), dtype=torch_float64())
  angle <- torch_tensor(c(pi / 2, 5 * pi / 4), dtype=torch_float64())
  z <- torch_polar(abs, angle)
  z
}
Polygamma
torch_polygamma(n, input)
n |
(int) the order of the polygamma function |
input |
(Tensor) the input tensor. |
Computes the $n$-th derivative of the digamma function on input. $n \geq 0$ is called the order of the polygamma function.
$$\psi^{(n)}(x) = \frac{d^{(n)}}{dx^{(n)}} \psi(x)$$
This function is not implemented for $n \geq 2$.
if (torch_is_installed()) {
  ## Not run:
  a = torch_tensor(c(1, 0.5))
  torch_polygamma(1, a)
  ## End(Not run)
}
Pow
torch_pow(self, exponent)
self |
(float) the scalar base value for the power operation |
exponent |
(float or tensor) the exponent value |
Takes the power of each element in input with exponent and returns a tensor with the result.
exponent can be either a single float number or a Tensor with the same number of elements as input.
When exponent is a scalar value, the operation applied is:
$$out_i = x_i^{exponent}$$
When exponent is a tensor, the operation applied is:
$$out_i = x_i^{exponent_i}$$
When exponent is a tensor, the shapes of input and exponent must be broadcastable.
Alternatively, self is a scalar float value, and exponent is a tensor. The returned tensor out is of the same shape as exponent. The operation applied is:
$$out_i = self^{exponent_i}$$
if (torch_is_installed()) { a = torch_randn(c(4)) a torch_pow(a, 2) exp <- torch_arange(1, 5) a <- torch_arange(1, 5) a exp torch_pow(a, exp) exp <- torch_arange(1, 5) base <- 2 torch_pow(base, exp) }
if (torch_is_installed()) { a = torch_randn(c(4)) a torch_pow(a, 2) exp <- torch_arange(1, 5) a <- torch_arange(1, 5) a exp torch_pow(a, exp) exp <- torch_arange(1, 5) base <- 2 torch_pow(base, exp) }
Prod
torch_prod(self, dim, keepdim = FALSE, dtype = NULL)
self |
(Tensor) the input tensor. |
dim |
(int) the dimension to reduce. |
keepdim |
(bool) whether the output tensor has dim retained or not. |
dtype |
( |
Returns the product of all elements in the input tensor.
Returns the product of each row of the input tensor in the given dimension dim.
If keepdim is TRUE, the output tensor is of the same size as input except in the dimension dim where it is of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in the output tensor having 1 fewer dimension than input.
if (torch_is_installed()) {
  a = torch_randn(c(1, 3))
  a
  torch_prod(a)
  a = torch_randn(c(4, 2))
  a
  torch_prod(a, 1)
}
Promote_types
torch_promote_types(type1, type2)
type1 |
( |
type2 |
( |
Returns the torch_dtype with the smallest size and scalar kind that is not smaller nor of lower kind than either type1 or type2. See type promotion documentation for more information on the type promotion logic.
if (torch_is_installed()) {
  torch_promote_types(torch_int32(), torch_float32())
  torch_promote_types(torch_uint8(), torch_long())
}
Qr
torch_qr(self, some = TRUE)
self |
(Tensor) the input tensor of size |
some |
(bool, optional) Set to |
Computes the QR decomposition of a matrix or a batch of matrices input, and returns a namedtuple (Q, R) of tensors such that
$$\text{input} = Q R$$
with $Q$ being an orthogonal matrix or batch of orthogonal matrices and $R$ being an upper triangular matrix or batch of upper triangular matrices.
If some is TRUE, then this function returns the thin (reduced) QR factorization. Otherwise, if some is FALSE, this function returns the complete QR factorization.
Precision may be lost if the magnitudes of the elements of input are large.
While it should always give you a valid decomposition, it may not give you the same one across platforms - it will depend on your LAPACK implementation.
if (torch_is_installed()) {
  a = torch_tensor(matrix(c(12., -51, 4, 6, 167, -68, -4, 24, -41), ncol = 3, byrow = TRUE))
  out = torch_qr(a)
  q = out[[1]]
  r = out[[2]]
  torch_mm(q, r)$round()
  torch_mm(q$t(), q)$round()
}
Creates the corresponding Scheme object
torch_per_channel_affine()
torch_per_tensor_affine()
torch_per_channel_symmetric()
torch_per_tensor_symmetric()
Quantile
torch_quantile(self, q, dim = NULL, keepdim = FALSE, interpolation = "linear")
self |
(Tensor) the input tensor. |
q |
(float or Tensor) a scalar or 1D tensor of quantile values in the range [0, 1]. |
dim |
(int) the dimension to reduce. |
keepdim |
(bool) whether the output tensor has dim retained or not. |
interpolation |
The interpolation method. |
Returns the q-th quantiles of all elements in the input tensor, doing a linear interpolation when the q-th quantile lies between two data points.
Returns the q-th quantiles of each row of the input tensor along the dimension dim, doing a linear interpolation when the q-th quantile lies between two data points. By default, dim is NULL, resulting in the input tensor being flattened before computation.
If keepdim is TRUE, the output dimensions are of the same size as input except in the dimensions being reduced (dim or all if dim is NULL) where they have size 1. Otherwise, the dimensions being reduced are squeezed (see torch_squeeze).
If q is a 1D tensor, an extra dimension is prepended to the output tensor with the same size as q which represents the quantiles.
if (torch_is_installed()) {
  a <- torch_randn(c(1, 3))
  a
  q <- torch_tensor(c(0, 0.5, 1))
  torch_quantile(a, q)
  a <- torch_randn(c(2, 3))
  a
  q <- torch_tensor(c(0.25, 0.5, 0.75))
  torch_quantile(a, q, dim=1, keepdim=TRUE)
  torch_quantile(a, q, dim=1, keepdim=TRUE)$shape
}
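The linear interpolation mentioned above can be worked through on a small vector. The sketch below is plain R and does not call torch; linear_quantile is a hypothetical helper written for illustration, and it assumes the common convention that the q-th quantile sits at fractional position q * (n - 1) in the sorted data (the rule that stats::quantile uses with type = 7).

```r
# Hypothetical helper (not part of torch): quantile by linear interpolation.
linear_quantile <- function(x, q) {
  x <- sort(x)
  pos <- q * (length(x) - 1)   # fractional 0-based position in the sorted data
  lo <- floor(pos)
  hi <- ceiling(pos)
  frac <- pos - lo
  # R vectors are 1-based, hence the + 1 when indexing
  x[lo + 1] + frac * (x[hi + 1] - x[lo + 1])
}

x <- c(4, 1, 3, 2)
q <- c(0, 0.25, 0.5, 1)

manual <- linear_quantile(x, q)
print(manual)

# stats::quantile with type = 7 applies the same linear rule
builtin <- unname(quantile(x, probs = q, type = 7))
print(all.equal(manual, builtin))
```

For q = 0.25 the position is 0.75, which lies between the sorted values 1 and 2, so the result is 1 + 0.75 * (2 - 1) = 1.75.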
Quantize_per_channel
torch_quantize_per_channel(self, scales, zero_points, axis, dtype)
self |
(Tensor) float tensor to quantize |
scales |
(Tensor) float 1D tensor of scales to use, size should match |
zero_points |
(int) integer 1D tensor of offset to use, size should match |
axis |
(int) dimension on which apply per-channel quantization |
dtype |
( |
Converts a float tensor to per-channel quantized tensor with given scales and zero points.
if (torch_is_installed()) {
  x = torch_tensor(matrix(c(-1.0, 0.0, 1.0, 2.0), ncol = 2, byrow = TRUE))
  torch_quantize_per_channel(x, torch_tensor(c(0.1, 0.01)), torch_tensor(c(10L, 0L)), 0, torch_quint8())
  torch_quantize_per_channel(x, torch_tensor(c(0.1, 0.01)), torch_tensor(c(10L, 0L)), 0, torch_quint8())$int_repr()
}
Quantize_per_tensor
torch_quantize_per_tensor(self, scale, zero_point, dtype)
self |
(Tensor) float tensor to quantize |
scale |
(float) scale to apply in quantization formula |
zero_point |
(int) offset in integer value that maps to float zero |
dtype |
( |
Converts a float tensor to quantized tensor with given scale and zero point.
if (torch_is_installed()) {
  torch_quantize_per_tensor(torch_tensor(c(-1.0, 0.0, 1.0, 2.0)), 0.1, 10, torch_quint8())
  torch_quantize_per_tensor(torch_tensor(c(-1.0, 0.0, 1.0, 2.0)), 0.1, 10, torch_quint8())$int_repr()
}
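To see what the scale and zero_point arguments do, the arithmetic can be reproduced in plain R. This sketch does not call torch; it assumes the usual affine quantization rule q = clamp(round(x / scale) + zero_point, 0, 255) for an unsigned 8-bit target, which is an assumption about the scheme rather than something stated on this page.

```r
# Plain-R sketch of affine quantization to an unsigned 8-bit range.
x <- c(-1.0, 0.0, 1.0, 2.0)
scale <- 0.1
zero_point <- 10

# Quantize: scale, round, shift by the zero point, clamp to [0, 255]
q <- pmin(pmax(round(x / scale) + zero_point, 0), 255)
print(q)

# Dequantize: undo the shift and the scaling
x_hat <- (q - zero_point) * scale
print(x_hat)
```

With these inputs the float value 0 maps to the integer 10, which is what "offset in integer value that maps to float zero" describes.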
Rad2deg
torch_rad2deg(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with each of the elements of input converted from angles in radians to degrees.
if (torch_is_installed()) {
  a <- torch_tensor(rbind(c(3.142, -3.142), c(6.283, -6.283), c(1.570, -1.570)))
  torch_rad2deg(a)
}
Rand
torch_rand( ..., names = NULL, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
... |
(int...) a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. |
names |
optional dimension names |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Returns a tensor filled with random numbers from a uniform distribution on the interval $[0, 1)$.
The shape of the tensor is defined by the variable argument size.
if (torch_is_installed()) {
  torch_rand(4)
  torch_rand(c(2, 3))
}
Rand_like
torch_rand_like( input, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE, memory_format = torch_preserve_format() )
input |
(Tensor) the size of |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
memory_format |
( |
Returns a tensor with the same size as input that is filled with random numbers from a uniform distribution on the interval $[0, 1)$.
torch_rand_like(input) is equivalent to torch_rand(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).
Randint
torch_randint( low, high, size, generator = NULL, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE, memory_format = torch_preserve_format() )
low |
(int, optional) Lowest integer to be drawn from the distribution. Default: 0. |
high |
(int) One above the highest integer to be drawn from the distribution. |
size |
(tuple) a tuple defining the shape of the output tensor. |
generator |
( |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
memory_format |
memory format for the resulting tensor. |
Returns a tensor filled with random integers generated uniformly between low (inclusive) and high (exclusive).
The shape of the tensor is defined by the variable argument size.
Note: With the global dtype default (torch_float32), this function returns a tensor with dtype torch_int64.
if (torch_is_installed()) {
  torch_randint(3, 5, list(3))
  torch_randint(0, 10, size = list(2, 2))
  torch_randint(3, 10, list(2, 2))
}
Randint_like
torch_randint_like( input, low, high, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
input |
(Tensor) the size of |
low |
(int, optional) Lowest integer to be drawn from the distribution. Default: 0. |
high |
(int) One above the highest integer to be drawn from the distribution. |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Returns a tensor with the same shape as Tensor input filled with random integers generated uniformly between low (inclusive) and high (exclusive).
Note: With the global dtype default (torch_float32), this function returns a tensor with dtype torch_int64.
Randn
torch_randn( ..., names = NULL, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
... |
(int...) a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. |
names |
optional names for the dimensions |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (also called the standard normal distribution).
The shape of the tensor is defined by the variable argument size.
if (torch_is_installed()) {
  torch_randn(c(4))
  torch_randn(c(2, 3))
}
Randn_like
torch_randn_like( input, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE, memory_format = torch_preserve_format() )
input |
(Tensor) the size of |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
memory_format |
( |
Returns a tensor with the same size as input that is filled with random numbers from a normal distribution with mean 0 and variance 1.
torch_randn_like(input) is equivalent to torch_randn(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).
Randperm
torch_randperm( n, dtype = torch_int64(), layout = NULL, device = NULL, requires_grad = FALSE )
n |
(int) the upper bound (exclusive) |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Returns a random permutation of integers from 0 to n - 1.
if (torch_is_installed()) {
  torch_randperm(4)
}
Range
torch_range( start, end, step = 1, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE )
start |
(float) the starting value for the set of points. Default: |
end |
(float) the ending value for the set of points |
step |
(float) the gap between each pair of adjacent points. Default: |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Returns a 1-D tensor of size $\left\lfloor \frac{\text{end} - \text{start}}{\text{step}} \right\rfloor + 1$ with values from start to end with step step. Step is the gap between two values in the tensor.
This function is deprecated in favor of torch_arange.
if (torch_is_installed()) {
  torch_range(1, 4)
  torch_range(1, 4, 0.5)
}
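The size of the result can be checked with a quick calculation. This is a plain-R sketch, not a torch call; it assumes the output length is floor((end - start) / step) + 1 with an inclusive end point, and compares that count against base R's seq(), which also includes the end point when it is reachable.

```r
# Plain-R check of the expected length for an inclusive range.
start <- 1
end <- 4
step <- 0.5

expected_size <- floor((end - start) / step) + 1
print(expected_size)

# seq() builds the same inclusive sequence in base R
values <- seq(start, end, by = step)
print(values)
print(length(values) == expected_size)
```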
Real
torch_real(self)
self |
(Tensor) the input tensor. |
Returns the real part of the input tensor. If input is a real (non-complex) tensor, this function just returns it.
Not yet implemented for complex tensors.
if (torch_is_installed()) {
  ## Not run:
  torch_real(torch_tensor(c(-1 + 1i, -2 + 2i, 3 - 3i)))
  ## End(Not run)
}
Reciprocal
torch_reciprocal(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the reciprocal of the elements of input.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_reciprocal(a)
}
Creates the reduction object
torch_reduction_sum()
torch_reduction_mean()
torch_reduction_none()
Relu
torch_relu(self)
self |
the input tensor |
Computes the relu transformation.
Relu_
torch_relu_(self)
self |
the input tensor |
In-place version of torch_relu().
Remainder
torch_remainder(self, other)
self |
(Tensor) the dividend |
other |
(Tensor or float) the divisor that may be either a number or a Tensor of the same shape as the dividend |
Computes the element-wise remainder of division.
The divisor and dividend may contain both integer and floating point numbers. The remainder has the same sign as the divisor.
When other is a tensor, the shapes of input and other must be broadcastable.
if (torch_is_installed()) {
  torch_remainder(torch_tensor(c(-3., -2, -1, 1, 2, 3)), 2)
  torch_remainder(torch_tensor(c(1., 2, 3, 4, 5)), 1.5)
}
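The statement that the remainder takes the sign of the divisor corresponds to floored division. A small plain-R sketch (no torch call) makes this concrete; it assumes the definition r = x - floor(x / d) * d, which is also what base R's %% operator computes.

```r
# Plain-R illustration of a remainder that follows the divisor's sign.
dividend <- c(-3, -2, -1, 1, 2, 3)
divisor <- 2

# Floored-division definition of the remainder
r_formula <- dividend - floor(dividend / divisor) * divisor
print(r_formula)

# Base R's %% uses the same convention
r_operator <- dividend %% divisor
print(all(r_formula == r_operator))

# With a negative divisor every remainder is zero or negative
r_negative <- dividend %% -2
print(r_negative)
```

Note that -3 gives a remainder of 1 rather than -1, because floor(-1.5) is -2.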
Renorm
torch_renorm(self, p, dim, maxnorm)
self |
(Tensor) the input tensor. |
p |
(float) the power for the norm computation |
dim |
(int) the dimension to slice over to get the sub-tensors |
maxnorm |
(float) the maximum norm to keep each sub-tensor under |
Returns a tensor where each sub-tensor of input along dimension dim is normalized such that the p-norm of the sub-tensor is lower than the value maxnorm.
If the norm of a row is lower than maxnorm, the row is unchanged.
if (torch_is_installed()) {
  x = torch_ones(c(3, 3))
  x[2,]$fill_(2)
  x[3,]$fill_(3)
  x
  torch_renorm(x, 1, 1, 5)
}
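The renormalization rule can be traced by hand on the same 3 x 3 matrix used in the example. The sketch below is plain R and does not call torch; it assumes each row is treated as one sub-tensor and that rows whose p-norm exceeds maxnorm are rescaled by maxnorm / norm, while the remaining rows are left alone.

```r
# Plain-R sketch of row-wise renormalization with p = 1 and maxnorm = 5.
x <- rbind(c(1, 1, 1), c(2, 2, 2), c(3, 3, 3))
p <- 1
maxnorm <- 5

# p-norm of every row: 3, 6 and 9 for this matrix
row_norms <- rowSums(abs(x)^p)^(1 / p)
print(row_norms)

# Rows above the limit are shrunk; rows below it keep a factor of 1
scale <- ifelse(row_norms > maxnorm, maxnorm / row_norms, 1)

# A length-3 vector recycles down the columns, so scale[i] multiplies row i
renormed <- x * scale
print(renormed)
print(rowSums(abs(renormed)))
```

The first row already has norm 3 and is unchanged; the second and third rows are scaled so that their 1-norm equals 5.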
Repeat_interleave
torch_repeat_interleave(self, repeats, dim = NULL, output_size = NULL)
self |
(Tensor) the input tensor. |
repeats |
(Tensor or int) The number of repetitions for each element. repeats is broadcasted to fit the shape of the given axis. |
dim |
(int, optional) The dimension along which to repeat values. By default, use the flattened input array, and return a flat output array. |
output_size |
(int, optional) – Total output size for the given axis (e.g. sum of repeats). If given, it will avoid the stream synchronization needed to calculate the output shape of the tensor. |
Repeat elements of a tensor.
This is different from `torch_Tensor.repeat` but similar to `numpy.repeat`.
If the repeats is tensor([n1, n2, n3, ...]), then the output will be tensor([0, 0, ..., 1, 1, ..., 2, 2, ..., ...]) where 0 appears n1 times, 1 appears n2 times, 2 appears n3 times, etc.
if (torch_is_installed()) {
  ## Not run:
  x = torch_tensor(c(1, 2, 3))
  x$repeat_interleave(2)
  y = torch_tensor(matrix(c(1, 2, 3, 4), ncol = 2, byrow=TRUE))
  torch_repeat_interleave(y, 2)
  torch_repeat_interleave(y, 3, dim=1)
  torch_repeat_interleave(y, torch_tensor(c(1, 2)), dim=1)
  ## End(Not run)
}
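For readers who know base R better than numpy.repeat, the interleaving pattern is close to what rep() does on a plain vector. This is an analogy in plain R, not a torch call: a single repeat count behaves like the each argument, and a vector of counts behaves like the times argument.

```r
# Plain-R analogy for interleaved repetition on a vector.
x <- c(1, 2, 3)

# One repeat count for all elements: each element is repeated in place
each_twice <- rep(x, each = 2)
print(each_twice)

# One repeat count per element: element i appears counts[i] times
counts <- c(1, 2, 3)
per_element <- rep(x, times = counts)
print(per_element)

# The output length equals the sum of the repeat counts
print(length(per_element) == sum(counts))
```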
Reshape
torch_reshape(self, shape)
self |
(Tensor) the tensor to be reshaped |
shape |
(tuple of ints) the new shape |
Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.
See torch_Tensor.view on when it is possible to return a view.
A single dimension may be -1, in which case it's inferred from the remaining dimensions and the number of elements in input.
if (torch_is_installed()) {
  a <- torch_arange(0, 3)
  torch_reshape(a, list(2, 2))
  b <- torch_tensor(matrix(c(0, 1, 2, 3), ncol = 2, byrow=TRUE))
  torch_reshape(b, list(-1))
}
Result_type
torch_result_type(tensor1, tensor2)
tensor1 |
(Tensor or Number) an input tensor or number |
tensor2 |
(Tensor or Number) an input tensor or number |
Returns the torch_dtype that would result from performing an arithmetic operation on the provided input tensors. See type promotion documentation for more information on the type promotion logic.
if (torch_is_installed()) {
  torch_result_type(tensor1 = torch_tensor(c(1, 2), dtype=torch_int()), tensor2 = 1)
}
Roll
torch_roll(self, shifts, dims = list())
self |
(Tensor) the input tensor. |
shifts |
(int or tuple of ints) The number of places by which the elements of the tensor are shifted. If shifts is a tuple, dims must be a tuple of the same size, and each dimension will be rolled by the corresponding value |
dims |
(int or tuple of ints) Axis along which to roll |
Roll the tensor along the given dimension(s). Elements that are shifted beyond the last position are re-introduced at the first position. If a dimension is not specified, the tensor will be flattened before rolling and then restored to the original shape.
if (torch_is_installed()) {
  x = torch_tensor(c(1, 2, 3, 4, 5, 6, 7, 8))$view(c(4, 2))
  x
  torch_roll(x, 1, 1)
  torch_roll(x, -1, 1)
  torch_roll(x, shifts=list(2, 1), dims=list(1, 2))
}
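The wrap-around behaviour of a roll is easiest to see on a plain vector. The sketch below is plain R and does not call torch; roll_vec is a hypothetical helper written for illustration, implementing a cyclic shift with modular indexing.

```r
# Hypothetical helper (not part of torch): cyclic shift of a vector.
roll_vec <- function(v, shift) {
  n <- length(v)
  # Position i of the result takes the element that sat `shift` places earlier,
  # wrapping around with modular arithmetic (indices are 1-based in R)
  v[((seq_len(n) - 1 - shift) %% n) + 1]
}

v <- 1:5

# Positive shift: the last element re-enters at the front
rolled_forward <- roll_vec(v, 1)
print(rolled_forward)

# Negative shift: the first element moves to the end
rolled_backward <- roll_vec(v, -1)
print(rolled_backward)
```

Rolling by the full length returns the original order, which is a quick sanity check for any shift implementation.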
Rot90
torch_rot90(self, k = 1L, dims = c(0, 1))
self |
(Tensor) the input tensor. |
k |
(int) number of times to rotate |
dims |
(a list or tuple) axis to rotate |
Rotate an n-D tensor by 90 degrees in the plane specified by the dims axes. Rotation direction is from the first towards the second axis if k > 0, and from the second towards the first for k < 0.
if (torch_is_installed()) {
  x <- torch_arange(1, 4)$view(c(2, 2))
  x
  torch_rot90(x, 1, c(1, 2))
  x <- torch_arange(1, 8)$view(c(2, 2, 2))
  x
  torch_rot90(x, 1, c(1, 2))
}
Round
torch_round(self, decimals)
self |
(Tensor) the input tensor. |
decimals |
Number of decimal places to round to (default: 0). If decimals is negative, it specifies the number of positions to the left of the decimal point. |
Returns a new tensor with each of the elements of input rounded to the closest integer.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_round(a)
}
Rrelu_
torch_rrelu_( self, lower = 0.125, upper = 0.333333333333333, training = FALSE, generator = NULL )
self |
the input tensor |
lower |
lower bound of the uniform distribution. Default: 1/8 |
upper |
upper bound of the uniform distribution. Default: 1/3 |
training |
bool whether it's a training pass. Default: FALSE |
generator |
random number generator |
In-place version of torch_rrelu.
Rsqrt
torch_rsqrt(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the reciprocal of the square-root of each of the elements of input.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_rsqrt(a)
}
This function is experimental, don't use for long term storage.
torch_save(obj, path, ..., compress = TRUE)
obj |
the saved object |
path |
a connection or the name of the file to save. |
... |
not currently used. |
compress |
a logical specifying whether saving to a named file is to use "gzip" compression, or one of "gzip", "bzip2" or "xz" to indicate the type of compression to be used. Ignored if file is a connection. |
Other torch_save: torch_load(), torch_serialize()
Creates a singleton dimension tensor.
torch_scalar_tensor(value, dtype = NULL, device = NULL, requires_grad = FALSE)
value |
the value you want to use |
dtype |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Searchsorted
torch_searchsorted( sorted_sequence, self, out_int32 = FALSE, right = FALSE, side = NULL, sorter = list() )
sorted_sequence |
(Tensor) N-D or 1-D tensor, containing monotonically increasing sequence on the innermost dimension. |
self |
(Tensor or Scalar) N-D tensor or a Scalar containing the search value(s). |
out_int32 |
(bool, optional) – indicate the output data type. |
right |
(bool, optional) – if FALSE, return the first suitable location that is found. If TRUE, return the last such index. If no suitable index is found, return 0 for non-numerical values (e.g. nan, inf) or the size of boundaries (one past the last index). In other words, if FALSE, gets the lower bound index for each value in input from boundaries. If TRUE, gets the upper bound index instead. Default value is FALSE. |
side |
the same as right but preferred. “left” corresponds to |
sorter |
if provided, a tensor matching the shape of the unsorted |
Find the indices from the innermost dimension of sorted_sequence such that, if the corresponding values in values were inserted before the indices, the order of the corresponding innermost dimension within sorted_sequence would be preserved. Return a new tensor with the same size as values. If right is FALSE (default), then the left boundary of sorted_sequence is closed.
if (torch_is_installed()) {
  sorted_sequence <- torch_tensor(rbind(c(1, 3, 5, 7, 9), c(2, 4, 6, 8, 10)))
  sorted_sequence
  values <- torch_tensor(rbind(c(3, 6, 9), c(3, 6, 9)))
  values
  torch_searchsorted(sorted_sequence, values)
  torch_searchsorted(sorted_sequence, values, right=TRUE)
  sorted_sequence_1d <- torch_tensor(c(1, 3, 5, 7, 9))
  sorted_sequence_1d
  torch_searchsorted(sorted_sequence_1d, values)
}
Selu
torch_selu(self)
self |
the input tensor |
Computes the selu transformation.
Selu_
torch_selu_(self)
self |
the input tensor |
In-place version of torch_selu().
It's just a wrapper around torch_save().
torch_serialize(obj, ...)
obj |
the saved object |
... |
Additional arguments passed to |
A raw vector containing the serialized object. Can be reloaded using torch_load().
Other torch_save: torch_load(), torch_save()
Gets and sets the default floating point dtype.
torch_set_default_dtype(d)
torch_get_default_dtype()
d |
The default floating point dtype to set. Initially set to
|
Sgn
torch_sgn(self)
self |
(Tensor) the input tensor. |
For complex tensors, this function returns a new tensor whose elements have the same angle as that of the elements of input and absolute value 1. For a non-complex tensor, this function returns the signs of the elements of input (see torch_sign).
$$\text{out}_i = \begin{cases} 0 & \text{if } |\text{input}_i| = 0 \\ \dfrac{\text{input}_i}{|\text{input}_i|} & \text{otherwise} \end{cases}$$
if (torch_is_installed()) {
  if (FALSE) {
    x <- torch_tensor(c(3+4i, 7-24i, 0, 1+2i))
    x$sgn()
    torch_sgn(x)
  }
}
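The complex sign described above (same angle, absolute value 1, and 0 for a zero input) can be computed directly on base R complex numbers. This is a plain-R sketch, not a torch call, using the same four values as the example; sgn_manual is an illustrative name.

```r
# Plain-R sketch of the complex sign: z / |z|, with 0 mapped to 0.
z <- c(3+4i, 7-24i, 0+0i, 1+2i)

# Mod() is the absolute value of a complex number; guard the zero entry
sgn_manual <- ifelse(Mod(z) == 0, 0+0i, z / Mod(z))
print(sgn_manual)

# Every non-zero result lies on the unit circle
print(Mod(sgn_manual))

# The angle is preserved for non-zero inputs
nonzero <- Mod(z) > 0
print(all.equal(Arg(sgn_manual[nonzero]), Arg(z[nonzero])))
```

For 3+4i the modulus is 5, so the result is 0.6+0.8i.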
Sigmoid
torch_sigmoid(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the sigmoid of the elements of input.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_sigmoid(a)
}
Sign
torch_sign(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the signs of the elements of input.
if (torch_is_installed()) {
  a = torch_tensor(c(0.7, -1.2, 0., 2.3))
  a
  torch_sign(a)
}
Signbit
torch_signbit(self)
self |
(Tensor) the input tensor. |
Tests if each element of input has its sign bit set (is less than zero) or not.
if (torch_is_installed()) {
  a <- torch_tensor(c(0.7, -1.2, 0., 2.3))
  torch_signbit(a)
}
Sin
torch_sin(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the sine of the elements of input.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_sin(a)
}
Sinh
torch_sinh(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the hyperbolic sine of the elements of input.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_sinh(a)
}
Slogdet
torch_slogdet(self)
self |
(Tensor) the input tensor of size |
Calculates the sign and log absolute value of the determinant(s) of a square matrix or batches of square matrices.
If `input` has zero determinant, this returns `(0, -inf)`.
Backward through `slogdet` internally uses SVD results when `input` is not invertible. In this case, double backward through `slogdet` will be unstable when `input` doesn't have distinct singular values. See `~torch.svd` for details.
if (torch_is_installed()) {
  A = torch_randn(c(3, 3))
  A
  torch_det(A)
  torch_logdet(A)
  torch_slogdet(A)
}
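The relationship between the determinant and its sign and log-absolute-value pair can be verified with base R, whose determinant() function returns the same two pieces. This is a plain-R sketch and does not call torch; the matrix is chosen so the determinant is negative and easy to compute by hand.

```r
# Plain-R check: det(A) = sign * exp(log|det(A)|).
A <- matrix(c(2, 0, 0, -3), nrow = 2)

d <- determinant(A, logarithm = TRUE)
sign_A <- d$sign                     # -1 for this matrix
logabs_A <- as.numeric(d$modulus)    # log(6)
print(sign_A)
print(logabs_A)

# Recombining both parts recovers the ordinary determinant
recombined <- sign_A * exp(logabs_A)
print(all.equal(recombined, det(A)))
```

Working with the log of the absolute value avoids overflow or underflow for matrices whose determinant is extremely large or small.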
Sort
self |
(Tensor) the input tensor. |
dim |
(int, optional) the dimension to sort along |
descending |
(bool, optional) controls the sorting order (ascending or descending) |
stable |
(bool, optional) – makes the sorting routine stable, which guarantees that the order of equivalent elements is preserved. |
Sorts the elements of the input tensor along a given dimension in ascending order by value.
If dim is not given, the last dimension of the input is chosen.
If descending is TRUE then the elements are sorted in descending order by value.
A namedtuple of (values, indices) is returned, where the values are the sorted values and indices are the indices of the elements in the original input tensor.
if (torch_is_installed()) {
  x = torch_randn(c(3, 4))
  out = torch_sort(x)
  out
  out = torch_sort(x, 1)
  out
}
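The (values, indices) pair has a direct counterpart in base R, where sort() gives the values and order() gives the positions in the original vector. The sketch below is plain R on a vector, not a torch call, and is meant only to show how the two outputs relate.

```r
# Plain-R analogy for the (values, indices) pair returned by a sort.
x <- c(0.3, -1.2, 2.5, 0.0)

sorted_values <- sort(x)     # ascending values
sorted_indices <- order(x)   # positions of those values in x
print(sorted_values)
print(sorted_indices)

# Indexing the input with the indices reproduces the sorted values
print(identical(x[sorted_indices], sorted_values))

# Descending order, as with descending = TRUE
descending_values <- sort(x, decreasing = TRUE)
print(descending_values)
```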
Sparse_coo_tensor
torch_sparse_coo_tensor( indices, values, size = NULL, dtype = NULL, device = NULL, requires_grad = FALSE )
indices |
(array_like) Initial data for the tensor. Can be a list, tuple, NumPy |
values |
(array_like) Initial values for the tensor. Can be a list, tuple, NumPy |
size |
(list, tuple, or |
dtype |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: |
Constructs a sparse tensors in COO(rdinate) format with non-zero elements at the given indices
with the given values
. A sparse tensor can be uncoalesced
, in that case, there are duplicate
coordinates in the indices, and the value at that index is the sum of all duplicate value entries:
torch_sparse
_.
if (torch_is_installed()) {
  i = torch_tensor(matrix(c(1, 2, 2, 3, 1, 3), ncol = 3, byrow = TRUE), dtype = torch_int64())
  v = torch_tensor(c(3, 4, 5), dtype = torch_float32())
  torch_sparse_coo_tensor(i, v)
  torch_sparse_coo_tensor(i, v, c(2, 4))

  # create empty sparse tensors
  S = torch_sparse_coo_tensor(
    torch_empty(c(1, 0), dtype = torch_int64()),
    torch_tensor(numeric(), dtype = torch_float32()),
    c(1)
  )
  S = torch_sparse_coo_tensor(
    torch_empty(c(1, 0), dtype = torch_int64()),
    torch_empty(c(0, 2)),
    c(1, 2)
  )
}
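To illustrate how duplicate coordinates in an uncoalesced sparse tensor add up, the following base R sketch densifies a small COO description by hand. It does not call torch; the names `coords`, `vals`, and `dense` are illustrative only, and coordinates are assumed to be 1-based as in the example above.

```r
# Coordinates of three stored entries: one column per entry,
# first row = row index, second row = column index (1-based).
coords <- rbind(c(1, 1, 2),
                c(2, 2, 3))
vals <- c(3, 4, 5)

# Accumulate into a dense 2 x 3 matrix; repeated coordinates are summed.
dense <- matrix(0, nrow = 2, ncol = 3)
for (k in seq_along(vals)) {
  r <- coords[1, k]
  cc <- coords[2, k]
  dense[r, cc] <- dense[r, cc] + vals[k]
}
dense
```

The first two entries share the coordinate (1, 2), so the dense value there is 3 + 4.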
Splits the tensor into chunks. Each chunk is a view of the original tensor.
torch_split(self, split_size, dim = 1L)
self |
(Tensor) tensor to split. |
split_size |
(int) size of a single chunk or list of sizes for each chunk |
dim |
(int) dimension along which to split the tensor. |
If `split_size` is an integer, then `self` will be split into equally sized chunks (if possible). The last chunk will be smaller if the tensor size along the given dimension `dim` is not divisible by `split_size`.
If `split_size` is a list, then `self` will be split into `length(split_size)` chunks with sizes in `dim` according to `split_size`.
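The chunk sizes described above can be sketched in base R. `chunk_sizes()` is a hypothetical helper written for this illustration, not part of torch; the guarded torch calls at the end assume the package and its backend are installed and that a vector of sizes is accepted as described above.

```r
# Hypothetical helper: sizes of the chunks obtained when a dimension of
# length `n` is split with an integer `split_size`.
chunk_sizes <- function(n, split_size) {
  full <- n %/% split_size   # number of complete chunks
  rest <- n %% split_size    # leftover elements form a smaller last chunk
  c(rep(split_size, full), if (rest > 0) rest)
}

chunk_sizes(10, 4)
chunk_sizes(8, 4)

# Optional comparison with torch (skipped when torch is unavailable).
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  x <- torch::torch_arange(1, 10)
  even_parts <- torch::torch_split(x, 4)          # integer split size
  sized_parts <- torch::torch_split(x, c(3, 7))   # explicit chunk sizes
  print(sapply(even_parts, function(p) p$size(1)))
  print(sapply(sized_parts, function(p) p$size(1)))
}
```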
Sqrt
torch_sqrt(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the square-root of the elements of `input`.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_sqrt(a)
}
Square
torch_square(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the square of the elements of `input`.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_square(a)
}
Squeeze
torch_squeeze(self, dim)
self |
(Tensor) the input tensor. |
dim |
(int, optional) if given, the input will be squeezed only in this dimension |
Returns a tensor with all the dimensions of `input` of size 1 removed.
For example, if `input` is of shape \eqn{(A \times 1 \times B \times C \times 1 \times D)} then the `out` tensor will be of shape \eqn{(A \times B \times C \times D)}.
When `dim` is given, a squeeze operation is done only in the given dimension. If `input` is of shape \eqn{(A \times 1 \times B)}, `torch_squeeze(input, 1)` leaves the tensor unchanged, but `torch_squeeze(input, 2)` will squeeze the tensor to the shape \eqn{(A \times B)}.
The returned tensor shares the storage with the input tensor, so changing the contents of one will change the contents of the other.
if (torch_is_installed()) {
  x = torch_zeros(c(2, 1, 2, 1, 2))
  x
  y = torch_squeeze(x)
  y
  y = torch_squeeze(x, 1)
  y
  y = torch_squeeze(x, 2)
  y
}
Stack
torch_stack(tensors, dim = 1L)
tensors |
(sequence of Tensors) sequence of tensors to concatenate |
dim |
(int) dimension to insert. Dimensions are 1-based in R, so it has to be between 1 and the number of dimensions of the concatenated tensors plus 1 (inclusive) |
Concatenates sequence of tensors along a new dimension.
All tensors need to be of the same size.
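As a rough picture of what "along a new dimension" means, the base R sketch below stacks two equally sized matrices into a 3-D array. The base R part only mirrors the shape logic; the guarded torch call assumes the package and backend are installed. All variable names are illustrative.

```r
a <- matrix(1:6, nrow = 2)   # 2 x 3
b <- a * 10L                 # same size, as stacking requires

# Stacking along a new last dimension: the inputs become slices [, , 1] and [, , 2].
stacked_last <- array(c(a, b), dim = c(dim(a), 2))

# Stacking along a new first dimension: move the new axis to the front.
stacked_first <- aperm(stacked_last, c(3, 1, 2))
dim(stacked_first)

# Optional comparison with torch (skipped when torch is unavailable).
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  s <- torch::torch_stack(list(torch::torch_tensor(a), torch::torch_tensor(b)), dim = 1)
  print(s$shape)
}
```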
Std
torch_std(self, dim, unbiased = TRUE, keepdim = FALSE)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints) the dimension or dimensions to reduce. |
unbiased |
(bool) whether to use the unbiased estimation or not |
keepdim |
(bool) whether the output tensor has |
When `dim` is not given, returns the standard-deviation of all elements in the `input` tensor.
When `dim` is given, returns the standard-deviation of each row of the `input` tensor in the dimension `dim`. If `dim` is a list of dimensions, reduce over all of them.
If `keepdim` is `TRUE`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see `torch_squeeze`), resulting in the output tensor having 1 (or `length(dim)`) fewer dimension(s).
If `unbiased` is `FALSE`, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel's correction will be used.
if (torch_is_installed()) {
  a = torch_randn(c(1, 3))
  a
  torch_std(a)
  a = torch_randn(c(4, 4))
  a
  torch_std(a, dim=1)
}
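The difference between the biased estimator and Bessel's correction is only the divisor (n versus n - 1). A minimal base R sketch, independent of torch, with illustrative variable names:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
ss <- sum((x - mean(x))^2)        # sum of squared deviations from the mean

sd_biased <- sqrt(ss / n)         # divisor n, as with unbiased = FALSE
sd_unbiased <- sqrt(ss / (n - 1)) # divisor n - 1 (Bessel's correction)

c(biased = sd_biased, unbiased = sd_unbiased)
```

Base R's `sd()` applies Bessel's correction, so it agrees with `sd_unbiased` here.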
Std_mean
torch_std_mean(self, dim, unbiased = TRUE, keepdim = FALSE)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints) the dimension or dimensions to reduce. |
unbiased |
(bool) whether to use the unbiased estimation or not |
keepdim |
(bool) whether the output tensor has |
When `dim` is not given, returns the standard-deviation and mean of all elements in the `input` tensor.
When `dim` is given, returns the standard-deviation and mean of each row of the `input` tensor in the dimension `dim`. If `dim` is a list of dimensions, reduce over all of them.
If `keepdim` is `TRUE`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see `torch_squeeze`), resulting in the output tensor having 1 (or `length(dim)`) fewer dimension(s).
If `unbiased` is `FALSE`, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel's correction will be used.
if (torch_is_installed()) {
  a = torch_randn(c(1, 3))
  a
  torch_std_mean(a)
  a = torch_randn(c(4, 4))
  a
  torch_std_mean(a, 1)
}
Stft
torch_stft( input, n_fft, hop_length = NULL, win_length = NULL, window = NULL, center = TRUE, pad_mode = "reflect", normalized = FALSE, onesided = NULL, return_complex = NULL )
input |
(Tensor) the input tensor |
n_fft |
(int) size of Fourier transform |
hop_length |
(int, optional) the distance between neighboring sliding window
frames. Default: |
win_length |
(int, optional) the size of window frame and STFT filter.
Default: |
window |
(Tensor, optional) the optional window function.
Default: |
center |
(bool, optional) whether to pad |
pad_mode |
(string, optional) controls the padding method used when
|
normalized |
(bool, optional) controls whether to return the normalized
STFT results Default: |
onesided |
(bool, optional) controls whether to return half of results to
avoid redundancy Default: |
return_complex |
(bool, optional) controls whether to return complex tensors or not. |
Short-time Fourier transform (STFT).
Ignoring the optional batch dimension, this method computes the following expression:
\deqn{X[m, \omega] = \sum_{k = 0}^{\mbox{win\_length} - 1} \mbox{window}[k]\ \mbox{input}[m \times \mbox{hop\_length} + k]\ \exp\left(-j \frac{2 \pi \cdot \omega k}{\mbox{win\_length}}\right),}
where \eqn{m} is the index of the sliding window, and \eqn{\omega} is the frequency that \eqn{0 \leq \omega < \mbox{n\_fft}}.
* `input` must be either a 1-D time sequence or a 2-D batch of time sequences.
* If `hop_length` is `NULL` (default), it is treated as equal to `floor(n_fft / 4)`.
* If `win_length` is `NULL` (default), it is treated as equal to `n_fft`.
* `window` can be a 1-D tensor of size `win_length`, e.g., from `torch_hann_window`. If `window` is `NULL` (default), it is treated as if having \eqn{1} everywhere in the window. If \eqn{\mbox{win\_length} < \mbox{n\_fft}}, `window` will be padded on both sides to length `n_fft` before being applied.
* If `center` is `TRUE` (default), `input` will be padded on both sides so that the \eqn{t}-th frame is centered at time \eqn{t \times \mbox{hop\_length}}. Otherwise, the \eqn{t}-th frame begins at time \eqn{t \times \mbox{hop\_length}}.
* `pad_mode` determines the padding method used on `input` when `center` is `TRUE`. See `nnf_pad` for all available options. Default is `"reflect"`.
* If `onesided` is `TRUE` (default), only values for \eqn{\omega} in \eqn{\left[0, 1, 2, \dots, \left\lfloor \frac{\mbox{n\_fft}}{2} \right\rfloor + 1\right]} are returned because the real-to-complex Fourier transform satisfies the conjugate symmetry, i.e., \eqn{X[m, \omega] = X[m, \mbox{n\_fft} - \omega]^*}.
* If `normalized` is `TRUE` (default is `FALSE`), the function returns the normalized STFT results, i.e., multiplied by \eqn{(\mbox{frame\_length})^{-0.5}}.
Returns the real and the imaginary parts together as one tensor of size \eqn{(* \times N \times T \times 2)}, where \eqn{*} is the optional batch size of `input`, \eqn{N} is the number of frequencies where STFT is applied, \eqn{T} is the total number of frames used, and each pair in the last dimension represents a complex number as the real part and the imaginary part.
This function changed signature at version 0.4.1. Calling with the previous signature may cause an error or return an incorrect result.
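For intuition, each STFT frame is a windowed discrete Fourier sum over `win_length` samples starting at `m * hop_length`. The base R sketch below evaluates that sum directly for one frame and checks it against `fft()`. It assumes no centering or padding (as with `center = FALSE`), a rectangular window (as when `window` is `NULL`), and `win_length` equal to `n_fft`; it does not call `torch_stft()`, and all names are illustrative.

```r
n_fft <- 8
hop_length <- 2
x <- sin(2 * pi * (0:15) / 8)   # toy signal: one cycle every 8 samples
window <- rep(1, n_fft)         # rectangular window

# Direct evaluation of the windowed Fourier sum for frame index m (0-based).
stft_frame <- function(m) {
  k <- 0:(n_fft - 1)
  frame <- window * x[m * hop_length + k + 1]   # +1 because R vectors are 1-based
  sapply(0:(n_fft - 1), function(omega) {
    sum(frame * exp(-2i * pi * omega * k / n_fft))
  })
}

X1 <- stft_frame(1)             # frame covering samples x[3:10]
reference <- fft(x[3:10])       # same frame via the fast Fourier transform

# One-sided result: keep only the first floor(n_fft / 2) + 1 frequencies.
X1_onesided <- X1[seq_len(n_fft %/% 2 + 1)]
Mod(X1_onesided)
```

The magnitude peaks at the second entry (frequency index 1), matching the one-cycle-per-frame signal.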
Sub
torch_sub(self, other, alpha = 1L)
self |
(Tensor) the input tensor. |
other |
(Tensor or Scalar) the tensor or scalar to subtract from |
alpha |
the scalar multiplier for other |
Subtracts `other`, scaled by `alpha`, from `input`.
\deqn{\mbox{out}_i = \mbox{input}_i - \mbox{alpha} \times \mbox{other}_i}
Supports broadcasting to a common shape, type promotion, and integer, float, and complex inputs.
if (torch_is_installed()) {
  a <- torch_tensor(c(1, 2))
  b <- torch_tensor(c(0, 1))
  torch_sub(a, b, alpha=2)
}
Subtract
torch_subtract(self, other, alpha = 1L)
self |
(Tensor) the input tensor. |
other |
(Tensor or Scalar) the tensor or scalar to subtract from |
alpha |
the scalar multiplier for other |
Alias for `torch_sub()`.
Sum
torch_sum(self, dim, keepdim = FALSE, dtype = NULL)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints) the dimension or dimensions to reduce. |
keepdim |
(bool) whether the output tensor has |
dtype |
( |
When `dim` is not given, returns the sum of all elements in the `input` tensor.
When `dim` is given, returns the sum of each row of the `input` tensor in the given dimension `dim`. If `dim` is a list of dimensions, reduce over all of them.
If `keepdim` is `TRUE`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see `torch_squeeze`), resulting in the output tensor having 1 (or `length(dim)`) fewer dimension(s).
if (torch_is_installed()) {
  a = torch_randn(c(1, 3))
  a
  torch_sum(a)
  a <- torch_randn(c(4, 4))
  a
  torch_sum(a, 1)
  b <- torch_arange(1, 4 * 5 * 6)$view(c(4, 5, 6))
  torch_sum(b, list(2, 1))
}
Svd
torch_svd(self, some = TRUE, compute_uv = TRUE)
self |
(Tensor) the input tensor of size |
some |
(bool, optional) controls the shape of returned |
compute_uv |
(bool, optional) option whether to compute |
This function returns a list `(U, S, V)` which is the singular value decomposition of an input real matrix or batches of real matrices `input` such that \eqn{input = U \times diag(S) \times V^T}.
If `some` is `TRUE` (default), the method returns the reduced singular value decomposition i.e., if the last two dimensions of `input` are `m` and `n`, then the returned `U` and `V` matrices will contain only \eqn{min(n, m)} orthonormal columns.
If `compute_uv` is `FALSE`, the returned `U` and `V` matrices will be zero matrices of shape \eqn{(m \times m)} and \eqn{(n \times n)} respectively. `some` will be ignored here.
The singular values are returned in descending order. If `input` is a batch of matrices, then the singular values of each matrix in the batch are returned in descending order.
The implementation of SVD on CPU uses the LAPACK routine `?gesdd` (a divide-and-conquer algorithm) instead of `?gesvd` for speed. Analogously, the SVD on GPU uses the MAGMA routine `gesdd` as well.
Irrespective of the original strides, the returned matrix `U` will be transposed, i.e. with strides `U.contiguous().transpose(-2, -1).stride()`.
Extra care needs to be taken when backpropagating through the `U` and `V` outputs. Such an operation is really only stable when `input` is full rank with all distinct singular values. Otherwise, `NaN` can appear as the gradients are not properly defined. Also, notice that double backward will usually do an additional backward through `U` and `V` even if the original backward is only on `S`.
When `some` = `FALSE`, the gradients on `U[..., :, min(m, n):]` and `V[..., :, min(m, n):]` will be ignored in backward as those vectors can be arbitrary bases of the subspaces.
When `compute_uv` = `FALSE`, backward cannot be performed since `U` and `V` from the forward pass are required for the backward operation.
if (torch_is_installed()) {
  a = torch_randn(c(5, 3))
  a
  out = torch_svd(a)
  u = out[[1]]
  s = out[[2]]
  v = out[[3]]
  torch_dist(a, torch_mm(torch_mm(u, torch_diag(s)), v$t()))
  a_big = torch_randn(c(7, 5, 3))
  out = torch_svd(a_big)
  u = out[[1]]
  s = out[[2]]
  v = out[[3]]
  torch_dist(a_big, torch_matmul(torch_matmul(u, torch_diag_embed(s)), v$transpose(-2, -1)))
}
T
torch_t(self)
self |
(Tensor) the input tensor. |
Expects `input` to be <= 2-D tensor and transposes dimensions 1 and 2.
0-D and 1-D tensors are returned as is. When input is a 2-D tensor this is equivalent to `torch_transpose(input, 1, 2)`.
if (torch_is_installed()) {
  x = torch_randn(c(2,3))
  x
  torch_t(x)
  x = torch_randn(c(3))
  x
  torch_t(x)
  x = torch_randn(c(2, 3))
  x
  torch_t(x)
}
Take
torch_take(self, index)
self |
(Tensor) the input tensor. |
index |
(LongTensor) the indices into tensor |
Returns a new tensor with the elements of `input` at the given indices. The input tensor is treated as if it were viewed as a 1-D tensor. The result takes the same shape as the indices.
if (torch_is_installed()) {
  src = torch_tensor(matrix(c(4,3,5,6,7,8), ncol = 3, byrow = TRUE))
  torch_take(src, torch_tensor(c(1, 2, 5), dtype = torch_int64()))
}
Tan
torch_tan(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the tangent of the elements of `input`.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_tan(a)
}
Tanh
torch_tanh(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the hyperbolic tangent of the elements of `input`.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_tanh(a)
}
Converts R objects to a torch tensor
torch_tensor( data, dtype = NULL, device = NULL, requires_grad = FALSE, pin_memory = FALSE )
data |
an R atomic vector, matrix or array |
dtype |
a torch_dtype instance |
device |
a device created with |
requires_grad |
if autograd should record operations on the returned tensor. |
pin_memory |
If set, returned tensor would be allocated in the pinned memory. |
if (torch_is_installed()) {
  torch_tensor(c(1, 2, 3, 4))
  torch_tensor(c(1, 2, 3, 4), dtype = torch_int())
}
It creates a tensor without taking ownership of the memory it points to. You must call `clone` if you want to copy the memory over a new tensor.
torch_tensor_from_buffer(buffer, shape, dtype = "float") buffer_from_torch_tensor(tensor)
buffer |
An R atomic object containing the data in a contiguous array. |
shape |
The shape of the resulting tensor. |
dtype |
A torch data type for the resulting tensor. |
tensor |
Tensor object that will be converted into a buffer. |
`buffer_from_torch_tensor()`: Creates a raw vector containing the tensor data. Causes a data copy.
Returns a contraction of a and b over multiple dimensions. `tensordot` implements a generalized matrix product.
torch_tensordot(a, b, dims = 2)
a |
(Tensor) Left tensor to contract |
b |
(Tensor) Right tensor to contract |
dims |
(int or tuple of two lists of integers) number of dimensions to contract or explicit lists of dimensions for |
if (torch_is_installed()) {
  a <- torch_arange(start = 1, end = 60)$reshape(c(3, 4, 5))
  b <- torch_arange(start = 1, end = 24)$reshape(c(4, 3, 2))
  torch_tensordot(a, b, dims = list(c(2, 1), c(1, 2)))
  ## Not run:
  a = torch_randn(3, 4, 5, device='cuda')
  b = torch_randn(4, 5, 6, device='cuda')
  c = torch_tensordot(a, b, dims=2)$cpu()
  ## End(Not run)
}
Threshold_
torch_threshold_(self, threshold, value)
self |
input tensor |
threshold |
The value to threshold at |
value |
The value to replace with |
In-place version of `torch_threshold`.
Topk
torch_topk(self, k, dim = -1L, largest = TRUE, sorted = TRUE)
self |
(Tensor) the input tensor. |
k |
(int) the k in "top-k" |
dim |
(int, optional) the dimension to sort along |
largest |
(bool, optional) controls whether to return largest or smallest elements |
sorted |
(bool, optional) controls whether to return the elements in sorted order |
Returns the `k` largest elements of the given `input` tensor along a given dimension.
If `dim` is not given, the last dimension of the `input` is chosen.
If `largest` is `FALSE` then the `k` smallest elements are returned.
A list of `(values, indices)` is returned, where the `indices` are the indices of the elements in the original `input` tensor.
The boolean option `sorted`, if `TRUE`, will make sure that the returned `k` elements are themselves sorted.
if (torch_is_installed()) {
  x = torch_arange(1., 6.)
  x
  torch_topk(x, 3)
}
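The values-plus-indices result can be mimicked in base R with `order()`. This is a sketch of the selection logic for a plain vector only, not of torch's implementation; the variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6)
k <- 3

# Positions of the k largest elements, largest first (1-based positions).
idx_largest <- order(x, decreasing = TRUE)[seq_len(k)]
values_largest <- x[idx_largest]

# Positions of the k smallest elements, as with largest = FALSE.
idx_smallest <- order(x)[seq_len(k)]
values_smallest <- x[idx_smallest]

list(values = values_largest, indices = idx_largest)
```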
Trace
torch_trace(self)
self |
the input tensor |
Returns the sum of the elements of the diagonal of the input 2-D matrix.
if (torch_is_installed()) {
  x <- torch_arange(1, 9)$view(c(3, 3))
  x
  torch_trace(x)
}
Transpose
torch_transpose(self, dim0, dim1)
self |
(Tensor) the input tensor. |
dim0 |
(int) the first dimension to be transposed |
dim1 |
(int) the second dimension to be transposed |
Returns a tensor that is a transposed version of `input`. The given dimensions `dim0` and `dim1` are swapped.
The resulting `out` tensor shares its underlying storage with the `input` tensor, so changing the content of one would change the content of the other.
if (torch_is_installed()) {
  x = torch_randn(c(2, 3))
  x
  torch_transpose(x, 1, 2)
}
Trapz
torch_trapz(y, dx = 1L, x, dim = -1L)
y |
(Tensor) The values of the function to integrate |
dx |
(float) The distance between points at which |
x |
(Tensor) The points at which the function |
dim |
(int) The dimension along which to integrate. By default, use the last dimension. |
Estimate \eqn{\int y\,dx} along `dim`, using the trapezoid rule.
When `x` is given, it supplies the sample points. Otherwise, the sample points are spaced uniformly at a distance of `dx`.
if (torch_is_installed()) {
  y = torch_randn(list(2, 3))
  y
  x = torch_tensor(matrix(c(1, 3, 4, 1, 2, 3), ncol = 3, byrow=TRUE))
  torch_trapz(y, x = x)
}
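The trapezoid rule itself is short enough to write out. The base R sketch below handles a single vector with either explicit sample points or a uniform spacing; `trapz_sketch()` is a hypothetical helper for illustration and is not how torch implements `torch_trapz()`.

```r
# Trapezoid rule for one vector of function values `y`.
# If `x` is NULL the points are assumed to be `dx` apart.
trapz_sketch <- function(y, x = NULL, dx = 1) {
  n <- length(y)
  widths <- if (is.null(x)) rep(dx, n - 1) else diff(x)
  # Each interval contributes width * average of its two end values.
  sum(widths * (y[-1] + y[-n]) / 2)
}

uniform <- trapz_sketch(c(1, 2, 3))                     # spacing dx = 1
nonuniform <- trapz_sketch(c(1, 2, 3), x = c(1, 3, 4))  # interval widths 2 and 1
c(uniform = uniform, nonuniform = nonuniform)
```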
Triangular_solve
torch_triangular_solve( self, A, upper = TRUE, transpose = FALSE, unitriangular = FALSE )
self |
(Tensor) multiple right-hand sides of size |
A |
(Tensor) the input triangular coefficient matrix of size |
upper |
(bool, optional) whether to solve the upper-triangular system of equations (default) or the lower-triangular system of equations. Default: |
transpose |
(bool, optional) whether |
unitriangular |
(bool, optional) whether |
Solves a system of equations with a triangular coefficient matrix \eqn{A} and multiple right-hand sides \eqn{b}.
In particular, solves \eqn{AX = b} and assumes \eqn{A} is upper-triangular with the default keyword arguments.
`torch_triangular_solve(b, A)` can take in 2D inputs `b, A` or inputs that are batches of 2D matrices. If the inputs are batches, then returns batched outputs `X`.
if (torch_is_installed()) {
  A = torch_randn(c(2, 2))$triu()
  A
  b = torch_randn(c(2, 3))
  b
  torch_triangular_solve(b, A)
}
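The same kind of system can be solved in base R with `backsolve()`, which may help when checking results by hand. This sketch is only an analogue for a single 2-D system; it does not cover batching or the `unitriangular` option, and the matrices are made up for illustration.

```r
# Upper-triangular coefficient matrix [[2, 1], [0, 3]] (filled column by column).
A <- matrix(c(2, 0, 1, 3), nrow = 2)
# Two right-hand sides, one per column.
b <- matrix(c(3, 3, 4, 6), nrow = 2)

X <- backsolve(A, b)                     # solves A %*% X = b
Xt <- backsolve(A, b, transpose = TRUE)  # solves t(A) %*% Xt = b

X
```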
Tril
torch_tril(self, diagonal = 0L)
self |
(Tensor) the input tensor. |
diagonal |
(int, optional) the diagonal to consider |
Returns the lower triangular part of the matrix (2-D tensor) or batch of matrices `input`; the other elements of the result tensor `out` are set to 0.
The lower triangular part of the matrix is defined as the elements on and below the diagonal.
The argument `diagonal` controls which diagonal to consider. If `diagonal` = 0, all elements on and below the main diagonal are retained. A positive value includes just as many diagonals above the main diagonal, and similarly a negative value excludes just as many diagonals below the main diagonal. The main diagonal is the set of indices \eqn{\lbrace (i, i) \rbrace} for \eqn{i \in [0, \min\{d_{1}, d_{2}\} - 1]} where \eqn{d_{1}, d_{2}} are the dimensions of the matrix.
if (torch_is_installed()) {
  a = torch_randn(c(3, 3))
  a
  torch_tril(a)
  b = torch_randn(c(4, 6))
  b
  torch_tril(b, diagonal=1)
  torch_tril(b, diagonal=-1)
}
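The effect of `diagonal` can be reproduced on an ordinary R matrix by comparing column and row positions. `tril_sketch()` and `triu_sketch()` are hypothetical helpers for illustration; they mirror the rule described above rather than torch's implementation.

```r
# Keep entries whose column offset from the main diagonal is at most `diagonal`.
tril_sketch <- function(m, diagonal = 0) {
  m[col(m) - row(m) > diagonal] <- 0
  m
}

# Mirror image: keep entries whose column offset is at least `diagonal`.
triu_sketch <- function(m, diagonal = 0) {
  m[col(m) - row(m) < diagonal] <- 0
  m
}

m <- matrix(1:12, nrow = 3, byrow = TRUE)
tril_sketch(m)                 # on and below the main diagonal
tril_sketch(m, diagonal = 1)   # also keeps the first diagonal above
tril_sketch(m, diagonal = -1)  # drops the main diagonal as well
triu_sketch(m)                 # on and above the main diagonal
```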
Tril_indices
torch_tril_indices( row, col, offset = 0, dtype = NULL, device = NULL, layout = NULL )
row |
( |
col |
( |
offset |
( |
dtype |
( |
device |
( |
layout |
( |
Returns the indices of the lower triangular part of a `row`-by-`col` matrix in a 2-by-N Tensor, where the first row contains row coordinates of all indices and the second row contains column coordinates. Indices are ordered based on rows and then columns.
The lower triangular part of the matrix is defined as the elements on and below the diagonal.
The argument `offset` controls which diagonal to consider. If `offset` = 0, all elements on and below the main diagonal are retained. A positive value includes just as many diagonals above the main diagonal, and similarly a negative value excludes just as many diagonals below the main diagonal. The main diagonal is the set of indices \eqn{\lbrace (i, i) \rbrace} for \eqn{i \in [0, \min\{d_{1}, d_{2}\} - 1]} where \eqn{d_{1}, d_{2}} are the dimensions of the matrix.
When running on CUDA, `row * col` must be less than \eqn{2^{59}} to prevent overflow during calculation.
if (torch_is_installed()) {
  ## Not run:
  a = torch_tril_indices(3, 3)
  a
  a = torch_tril_indices(4, 3, -1)
  a
  a = torch_tril_indices(4, 3, 1)
  a
  ## End(Not run)
}
Triu
torch_triu(self, diagonal = 0L)
self |
(Tensor) the input tensor. |
diagonal |
(int, optional) the diagonal to consider |
Returns the upper triangular part of a matrix (2-D tensor) or batch of matrices `input`; the other elements of the result tensor `out` are set to 0.
The upper triangular part of the matrix is defined as the elements on and above the diagonal.
The argument `diagonal` controls which diagonal to consider. If `diagonal` = 0, all elements on and above the main diagonal are retained. A positive value excludes just as many diagonals above the main diagonal, and similarly a negative value includes just as many diagonals below the main diagonal. The main diagonal is the set of indices \eqn{\lbrace (i, i) \rbrace} for \eqn{i \in [0, \min\{d_{1}, d_{2}\} - 1]} where \eqn{d_{1}, d_{2}} are the dimensions of the matrix.
if (torch_is_installed()) {
  a = torch_randn(c(3, 3))
  a
  torch_triu(a)
  torch_triu(a, diagonal=1)
  torch_triu(a, diagonal=-1)
  b = torch_randn(c(4, 6))
  b
  torch_triu(b, diagonal=1)
  torch_triu(b, diagonal=-1)
}
Triu_indices
torch_triu_indices( row, col, offset = 0, dtype = NULL, device = NULL, layout = NULL )
row |
( |
col |
( |
offset |
( |
dtype |
( |
device |
( |
layout |
( |
Returns the indices of the upper triangular part of a `row`-by-`col` matrix in a 2-by-N Tensor, where the first row contains row coordinates of all indices and the second row contains column coordinates. Indices are ordered based on rows and then columns.
The upper triangular part of the matrix is defined as the elements on and above the diagonal.
The argument `offset` controls which diagonal to consider. If `offset` = 0, all elements on and above the main diagonal are retained. A positive value excludes just as many diagonals above the main diagonal, and similarly a negative value includes just as many diagonals below the main diagonal. The main diagonal is the set of indices \eqn{\lbrace (i, i) \rbrace} for \eqn{i \in [0, \min\{d_{1}, d_{2}\} - 1]} where \eqn{d_{1}, d_{2}} are the dimensions of the matrix.
When running on CUDA, `row * col` must be less than \eqn{2^{59}} to prevent overflow during calculation.
if (torch_is_installed()) {
  ## Not run:
  a = torch_triu_indices(3, 3)
  a
  a = torch_triu_indices(4, 3, -1)
  a
  a = torch_triu_indices(4, 3, 1)
  a
  ## End(Not run)
}
True_divide
torch_true_divide(self, other)
self |
(Tensor) the dividend |
other |
(Tensor or Scalar) the divisor |
Performs "true division" that always computes the division in floating point. Analogous to division in Python 3 and equivalent to `torch_div` except when both inputs have bool or integer scalar types, in which case they are cast to the default (floating) scalar type before the division.
if (torch_is_installed()) {
  dividend = torch_tensor(c(5, 3), dtype=torch_int())
  divisor = torch_tensor(c(3, 2), dtype=torch_int())
  torch_true_divide(dividend, divisor)
  torch_true_divide(dividend, 2)
}
Trunc
torch_trunc(self)
self |
(Tensor) the input tensor. |
Returns a new tensor with the truncated integer values of the elements of `input`.
if (torch_is_installed()) {
  a = torch_randn(c(4))
  a
  torch_trunc(a)
}
Unbind
torch_unbind(self, dim = 1L)
self |
(Tensor) the tensor to unbind |
dim |
(int) dimension to remove |
Removes a tensor dimension.
Returns a tuple of all slices along a given dimension, already without it.
if (torch_is_installed()) {
  torch_unbind(torch_tensor(matrix(1:9, ncol = 3, byrow=TRUE)))
}
Unique_consecutive
torch_unique_consecutive( self, return_inverse = FALSE, return_counts = FALSE, dim = NULL )
self |
(Tensor) the input tensor |
return_inverse |
(bool) Whether to also return the indices for where elements in the original input ended up in the returned unique list. |
return_counts |
(bool) Whether to also return the counts for each unique element. |
dim |
(int) the dimension to apply unique. If |
Eliminates all but the first element from every consecutive group of equivalent elements.
This function is different from `torch_unique` in the sense that this function only eliminates consecutive duplicate values. This semantics is similar to `std::unique` in C++.
if (torch_is_installed()) {
  x = torch_tensor(c(1, 1, 2, 2, 3, 1, 1, 2))
  output = torch_unique_consecutive(x)
  output
  torch_unique_consecutive(x, return_inverse=TRUE)
  torch_unique_consecutive(x, return_counts=TRUE)
}
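Base R's run-length encoding gives a close analogue of the values, counts, and inverse mapping for a plain vector, which can help when reasoning about the outputs. This is a sketch only: the inverse positions here are 1-based R positions, and whether torch numbers them the same way is not shown in this section.

```r
x <- c(1, 1, 2, 2, 3, 1, 1, 2)

runs <- rle(x)
runs$values    # first element of every consecutive group
runs$lengths   # size of each group, comparable to return_counts = TRUE

# For each original element, the position of its group in `runs$values`.
inverse <- rep(seq_along(runs$values), runs$lengths)

# Contrast with global de-duplication, which ignores adjacency.
unique(x)
```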
Unsafe_chunk
torch_unsafe_chunk(self, chunks, dim = 1L)
self |
(Tensor) the tensor to split |
chunks |
(int) number of chunks to return |
dim |
(int) dimension along which to split the tensor |
Works like `torch_chunk()` but without enforcing the autograd restrictions on inplace modification of the outputs.
This function is safe to use as long as only the input, or only the outputs are modified inplace after calling this function. It is the user's responsibility to ensure that is the case. If both the input and one or more of the outputs are modified inplace, gradients computed by autograd will be silently incorrect.
Unsafe_split
torch_unsafe_split(self, split_size, dim = 1L)
self |
(Tensor) tensor to split. |
split_size |
(int) size of a single chunk or list of sizes for each chunk |
dim |
(int) dimension along which to split the tensor. |
Works like `torch_split()` but without enforcing the autograd restrictions on inplace modification of the outputs.
This function is safe to use as long as only the input, or only the outputs are modified inplace after calling this function. It is the user's responsibility to ensure that is the case. If both the input and one or more of the outputs are modified inplace, gradients computed by autograd will be silently incorrect.
Unsqueeze
torch_unsqueeze(self, dim)
self |
(Tensor) the input tensor. |
dim |
(int) the index at which to insert the singleton dimension |
Returns a new tensor with a dimension of size one inserted at the specified position.
The returned tensor shares the same underlying data with this tensor.
A `dim` value within the range `[-input.dim() - 1, input.dim() + 1)` can be used. Negative `dim` will correspond to `unsqueeze` applied at `dim` = `dim + input.dim() + 1`.
if (torch_is_installed()) {
  x = torch_tensor(c(1, 2, 3, 4))
  torch_unsqueeze(x, 1)
  torch_unsqueeze(x, 2)
}
Vander
torch_vander(x, N = NULL, increasing = FALSE)
x |
(Tensor) 1-D input tensor. |
N |
(int, optional) Number of columns in the output. If N is not specified,
a square array is returned |
increasing |
(bool, optional) Order of the powers of the columns. If TRUE, the powers increase from left to right, if FALSE (the default) they are reversed. |
Generates a Vandermonde matrix.
The columns of the output matrix are elementwise powers of the input vector $x^{(N-1)}, x^{(N-2)}, \dots, x^{0}$. If increasing is TRUE, the order of the columns is reversed $x^{0}, x^{1}, \dots, x^{(N-1)}$. Such a matrix with a geometric progression in each row is named for Alexandre-Theophile Vandermonde.
if (torch_is_installed()) {
  x <- torch_tensor(c(1, 2, 3, 5))
  torch_vander(x)
  torch_vander(x, N = 3)
  torch_vander(x, N = 3, increasing = TRUE)
}
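The column ordering can be checked by hand with a small base-R sketch. It builds the same matrices with outer() rather than torch, so it only illustrates the arithmetic; the names vander_dec and vander_inc are made up for this example. The final torch call is an optional cross-check and is skipped when torch is unavailable.

```r
x <- c(1, 2, 3, 5)
N <- 3

# Default ordering: powers decrease from left to right (x^2, x^1, x^0).
vander_dec <- outer(x, (N - 1):0, `^`)

# increasing = TRUE: powers increase from left to right (x^0, x^1, x^2).
vander_inc <- outer(x, 0:(N - 1), `^`)

print(vander_dec)
print(vander_inc)

# Optional cross-check against torch (skipped when torch is unavailable).
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  print(torch::torch_vander(torch::torch_tensor(x), N = N))
}
```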
Var
torch_var(self, dim, unbiased = TRUE, keepdim = FALSE)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints) the dimension or dimensions to reduce. |
unbiased |
(bool) whether to use the unbiased estimation or not |
keepdim |
(bool) whether the output tensor has dim retained or not. |
When dim is not given, returns the variance of all elements in the input tensor.
When dim is given, returns the variance of each row of the input tensor in the given dimension dim.
If keepdim is TRUE, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).
In both cases, if unbiased is FALSE, the variance will be calculated via the biased estimator. Otherwise, Bessel's correction will be used.
if (torch_is_installed()) {
  a = torch_randn(c(1, 3))
  a
  torch_var(a)
  a = torch_randn(c(4, 4))
  a
  torch_var(a, 1)
}
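The difference between the two estimators is easiest to see on a tiny vector. The following base-R sketch works the arithmetic by hand (dividing by n - 1 for Bessel's correction, by n for the biased estimator); the variable names are illustrative. The torch calls at the end are an optional cross-check that assumes torch is installed, and they are skipped otherwise.

```r
x <- c(1, 2, 3, 4)
n <- length(x)
ss <- sum((x - mean(x))^2)   # sum of squared deviations: 5

var_unbiased <- ss / (n - 1) # Bessel's correction, matches stats::var(x)
var_biased   <- ss / n       # biased estimator

print(c(unbiased = var_unbiased, biased = var_biased))

# Optional cross-check against torch (skipped when torch is unavailable).
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  tx <- torch::torch_tensor(x)
  print(torch::torch_var(tx, dim = 1, unbiased = TRUE))
  print(torch::torch_var(tx, dim = 1, unbiased = FALSE))
}
```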
Var_mean
torch_var_mean(self, dim, unbiased = TRUE, keepdim = FALSE)
self |
(Tensor) the input tensor. |
dim |
(int or tuple of ints) the dimension or dimensions to reduce. |
unbiased |
(bool) whether to use the unbiased estimation or not |
keepdim |
(bool) whether the output tensor has dim retained or not. |
When dim is not given, returns the variance and mean of all elements in the input tensor.
When dim is given, returns the variance and mean of each row of the input tensor in the given dimension dim.
If keepdim is TRUE, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch_squeeze), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).
In both cases, if unbiased is FALSE, the variance will be calculated via the biased estimator. Otherwise, Bessel's correction will be used.
if (torch_is_installed()) {
  a = torch_randn(c(1, 3))
  a
  torch_var_mean(a)
  a = torch_randn(c(4, 4))
  a
  torch_var_mean(a, 1)
}
Vdot
torch_vdot(self, other)
self |
(Tensor) first tensor in the dot product. Its conjugate is used if it's complex. |
other |
(Tensor) second tensor in the dot product. |
Computes the dot product (inner product) of two tensors. The vdot(a, b) function handles complex numbers differently than dot(a, b). If the first argument is complex, the complex conjugate of the first argument is used for the calculation of the dot product.
This function does not broadcast.
if (torch_is_installed()) {
  torch_vdot(torch_tensor(c(2, 3)), torch_tensor(c(2, 1)))
  if (FALSE) {
    a <- torch_tensor(list(1 + 2i, 3 - 1i))
    b <- torch_tensor(list(2 + 1i, 4 - 0i))
    torch_vdot(a, b)
    torch_vdot(b, a)
  }
}
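Because the complex part of the example is not run, the effect of conjugating the first argument can instead be worked through in base R. This is a sketch of the arithmetic only: the helper vdot() below is a hypothetical stand-in defined for this example, not the torch function.

```r
a <- c(1 + 2i, 3 - 1i)
b <- c(2 + 1i, 4 + 0i)

# Hypothetical helper: conjugate the first argument, then sum the products.
vdot <- function(u, v) sum(Conj(u) * v)

ab <- vdot(a, b)           # (1-2i)(2+1i) + (3+1i)(4) = 16+1i
ba <- vdot(b, a)           # the complex conjugate of ab: 16-1i
plain_dot <- sum(a * b)    # no conjugation: 12+1i

print(c(ab, ba, plain_dot))

# For real inputs the conjugate has no effect, as in the first example line.
real_case <- vdot(c(2, 3), c(2, 1))  # 2*2 + 3*1 = 7
print(real_case)
```

Swapping the arguments conjugates the result, which is why the example calls the function in both orders.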
View_as_complex
torch_view_as_complex(self)
self |
(Tensor) the input tensor. |
Returns a view of input as a complex tensor. For an input tensor of size $m_1, m_2, \dots, m_i, 2$, this function returns a new complex tensor of size $m_1, m_2, \dots, m_i$, where the last dimension of the input tensor is expected to represent the real and imaginary components of complex numbers.
torch_view_as_complex is only supported for tensors with torch_dtype torch_float64() and torch_float32(). The input is expected to have the last dimension of size 2. In addition, the tensor must have a stride of 1 for its last dimension. The strides of all other dimensions must be even numbers.
if (torch_is_installed()) {
  if (FALSE) {
    x = torch_randn(c(4, 2))
    x
    torch_view_as_complex(x)
  }
}
View_as_real
torch_view_as_real(self)
self |
(Tensor) the input tensor. |
Returns a view of input as a real tensor. For an input complex tensor of size $m_1, m_2, \dots, m_i$, this function returns a new real tensor of size $m_1, m_2, \dots, m_i, 2$, where the last dimension of size 2 represents the real and imaginary components of complex numbers.
torch_view_as_real() is only supported for tensors with complex dtypes.
if (torch_is_installed()) {
  if (FALSE) {
    x <- torch_randn(4, dtype = torch_cfloat())
    x
    torch_view_as_real(x)
  }
}
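The layout convention, with a trailing dimension of size 2 holding the real and imaginary parts, can be illustrated in base R. This is only a sketch of the data layout under made-up values: base R copies the data, whereas the torch functions are documented to return views.

```r
# Four (real, imaginary) pairs stored as a 4 x 2 real matrix.
pairs <- matrix(c(1, 2, 3, 4,      # real parts (first column)
                  0.5, -1, 0, 2),  # imaginary parts (second column)
                ncol = 2)

# Real pairs -> complex vector of length 4 (the view_as_complex direction).
z <- complex(real = pairs[, 1], imaginary = pairs[, 2])

# Complex vector -> 4 x 2 real matrix (the view_as_real direction).
back <- cbind(Re(z), Im(z))

print(z)
print(back)
```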
Vstack
torch_vstack(tensors)
tensors |
(sequence of Tensors) sequence of tensors to concatenate |
Stack tensors in sequence vertically (row wise).
This is equivalent to concatenation along the first axis after all 1-D tensors have been reshaped by torch_atleast_2d().
if (torch_is_installed()) {
  a <- torch_tensor(c(1, 2, 3))
  b <- torch_tensor(c(4, 5, 6))
  torch_vstack(list(a, b))
  a <- torch_tensor(rbind(1, 2, 3))
  b <- torch_tensor(rbind(4, 5, 6))
  torch_vstack(list(a, b))
}
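The two cases in the example produce different shapes, which a base-R sketch with rbind() makes visible. rbind() is used here as a rough analogue of vertical stacking, not as a description of how torch implements it. The torch lines are an optional cross-check that reads sizes through `$shape`, and they are skipped when torch is unavailable.

```r
# Case 1: two length-3 vectors become the two rows of a 2 x 3 matrix.
a <- c(1, 2, 3)
b <- c(4, 5, 6)
stacked_rows <- unname(rbind(a, b))

# Case 2: two 3 x 1 column matrices are concatenated into a 6 x 1 matrix.
a_col <- rbind(1, 2, 3)
b_col <- rbind(4, 5, 6)
stacked_cols <- rbind(a_col, b_col)

print(dim(stacked_rows))
print(dim(stacked_cols))

# Optional cross-check against torch (skipped when torch is unavailable).
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  ta <- torch::torch_tensor(a)
  tb <- torch::torch_tensor(b)
  print(torch::torch_vstack(list(ta, tb))$shape)
}
```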
Where
torch_where(condition, self = NULL, other = NULL)
condition |
(BoolTensor) When TRUE (nonzero), yield x (the self argument), otherwise yield y (the other argument) |
self |
(Tensor) values selected at indices where condition is TRUE |
other |
(Tensor) values selected at indices where condition is FALSE |
Return a tensor of elements selected from either x or y, depending on condition.
The operation is defined as:
\[
\text{out}_i = \begin{cases} \text{x}_i & \text{if } \text{condition}_i \\ \text{y}_i & \text{otherwise} \end{cases}
\]
torch_where(condition) is identical to torch_nonzero(condition, as_tuple=TRUE).
The tensors condition, x, y must be broadcastable.
See also torch_nonzero().
if (torch_is_installed()) {
  ## Not run:
  x = torch_randn(c(3, 2))
  y = torch_ones(c(3, 2))
  x
  torch_where(x > 0, x, y)
  ## End(Not run)
}
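Since the example uses random inputs, a deterministic base-R sketch may help show the elementwise selection. ifelse() plays the role of the three-argument form and which() is a loose counterpart of the one-argument form; both are base-R analogues chosen for illustration, with made-up input values. The torch call is an optional cross-check and is skipped when torch is unavailable.

```r
x <- c(-1.5, 2, -0.3, 4)
y <- rep(1, 4)
condition <- x > 0

# Take x where the condition holds, y elsewhere.
selected <- ifelse(condition, x, y)

# Positions where the condition holds (1-based, as usual in R).
true_idx <- which(condition)

print(selected)
print(true_idx)

# Optional cross-check against torch (skipped when torch is unavailable).
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  tx <- torch::torch_tensor(x)
  ty <- torch::torch_tensor(y)
  print(torch::torch_where(tx > 0, tx, ty))
}
```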
Zeros
torch_zeros(..., names = NULL, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE)
... |
a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. |
names |
optional dimension names |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: FALSE. |
Returns a tensor filled with the scalar value 0, with the shape defined by the integers passed in the ... argument.
if (torch_is_installed()) {
  torch_zeros(c(2, 3))
  torch_zeros(c(5))
}
Zeros_like
torch_zeros_like(input, dtype = NULL, layout = NULL, device = NULL, requires_grad = FALSE, memory_format = torch_preserve_format())
input |
(Tensor) the size of input will determine the size of the output tensor. |
dtype |
( |
layout |
( |
device |
( |
requires_grad |
(bool, optional) If autograd should record operations on the returned tensor. Default: FALSE. |
memory_format |
( |
Returns a tensor filled with the scalar value 0, with the same size as input. torch_zeros_like(input) is equivalent to torch_zeros(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).
As of 0.4, this function does not support an out keyword. As an alternative, the old torch_zeros_like(input, out=output) is equivalent to torch_zeros(input.size(), out=output).
if (torch_is_installed()) {
  input = torch_empty(c(2, 3))
  torch_zeros_like(input)
}
This does two things:
with_detect_anomaly(code)
code |
Code that will be executed in the detect anomaly context. |
Running the forward pass with detection enabled will allow the backward pass to print the traceback of the forward operation that created the failing backward function.
Any backward computation that generates a "nan" value will raise an error.
This mode should be enabled only for debugging as the different tests will slow down your program execution.
if (torch_is_installed()) {
  x <- torch_randn(2, requires_grad = TRUE)
  y <- torch_randn(1)
  b <- (x^y)$sum()
  y$add_(1)
  try({
    b$backward()
    with_detect_anomaly({
      b$backward()
    })
  })
}
Context-manager that enables gradient calculation. Enables gradient calculation, if it has been disabled via with_no_grad.
with_enable_grad(code)
local_enable_grad(.env = parent.frame())
code |
code to be executed with gradient recording. |
.env |
The environment to use for scoping. |
This context manager is thread local; it will not affect computation in other threads.
local_enable_grad(): Locally enable gradient computations.
if (torch_is_installed()) {
  x <- torch_tensor(1, requires_grad = TRUE)
  with_no_grad({
    with_enable_grad({
      y <- x * 2
    })
  })
  y$backward()
  x$grad
}
Temporarily modify gradient recording.
with_no_grad(code)
local_no_grad(.env = parent.frame())
code |
code to be executed with no gradient recording. |
.env |
The environment to use for scoping. |
local_no_grad(): Disable autograd until it goes out of scope.
if (torch_is_installed()) {
  x <- torch_tensor(runif(5), requires_grad = TRUE)
  with_no_grad({
    x$sub_(torch_tensor(as.numeric(1:5)))
  })
  x
  x$grad
}
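The example above covers with_no_grad() only. As a minimal sketch of the scoped variant, the snippet below calls local_no_grad() inside a function, so gradient recording should be off only until that function returns. The helper double_without_grad() is a hypothetical name invented for this illustration, and the result is inspected through the tensor's `$requires_grad` field. Everything is skipped when torch is unavailable, in which case the two flags stay NA.

```r
inside <- NA   # requires_grad of a result computed under local_no_grad()
outside <- NA  # requires_grad of a result computed after the scope ended

if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  library(torch)

  # Hypothetical helper: recording stays off until this function returns.
  double_without_grad <- function(t) {
    local_no_grad()
    t * 2
  }

  x <- torch_tensor(1, requires_grad = TRUE)
  inside <- double_without_grad(x)$requires_grad
  outside <- (x * 2)$requires_grad
}

print(c(inside = inside, outside = outside))
```

Scoping the switch to the function body avoids wrapping the whole body in a with_no_grad({ ... }) block.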