Certain types of objects are tied to a given R session. Such objects cannot be saved to file by one R process and then be reloaded in another R process and be expected to work correctly. If attempted, we will often get an informative error, but not always. For the same reason, these types of objects cannot be exported to another R process(*) for parallel processing, regardless of which parallelization framework we use. We refer to these types of objects as “non-exportable objects”.
(*) The exception might be when forked processes are used, i.e. plan(multicore). However, such attempts to work around the underlying problem, which is non-exportable objects, should be avoided and considered unstable. Moreover, such code will fail to parallelize when using other future backends.
An example of a non-exportable object is a connection, e.g. a file connection. For instance, if you create a file connection,
con <- file("output.log", open = "wb")
cat("hello ", file = con)
flush(con)
readLines("output.log", warn = FALSE)
## [1] "hello "
it will not work when used in another R process. If we try, the result is “unknown”, e.g.
library(future)
plan(multisession)
f <- future({ cat("world!", file = con); flush(con) })
value(f)
## NULL
close(con)
readLines("output.log", warn = FALSE)
## [1] "hello "
In other words, the output "world!" written by the R worker is completely lost.
The culprit here is that the connection uses a so-called external pointer:
str(con)
## Classes 'file', 'connection'  atomic [1:1] 3
##   ..- attr(*, "conn_id")=<externalptr> 
which is bound to the main R process and makes no sense to the worker. Ideally, the R process of the worker would detect this and produce an informative error message, but as seen here, that does not always occur.
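Sending a global to a worker means serializing it, so the loss of the pointer can be reproduced without any workers. The following base-R sketch mimics that step with serialize() and unserialize() in the same session; the variable names tf, con_copy, ptr, and ptr_copy are only for illustration.

```r
## Create a file connection in this R session
tf <- tempfile(fileext = ".log")
con <- file(tf, open = "wb")

## Mimic exporting 'con' to a worker: serialize it to a raw vector
## and unserialize it again
con_copy <- unserialize(serialize(con, connection = NULL))

## The connection number, a plain integer, survives the round trip ...
as.integer(con_copy) == as.integer(con)
## [1] TRUE

## ... but the external pointer does not; unserialize() resets the
## address of the copied pointer to NULL
ptr <- attr(con, "conn_id")
ptr_copy <- attr(con_copy, "conn_id")
typeof(ptr_copy)
## [1] "externalptr"
identical(ptr, ptr_copy)
## [1] FALSE

close(con)
```

In a worker, the surviving connection number presumably refers to whatever connection, if any, has that number in the worker's own session, which would explain why the outcome is unpredictable rather than a clean error.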
To help avoid exporting non-exportable objects by mistake, which typically happens because a global variable is non-exportable, the future framework provides a mechanism for automatically detecting such objects. To enable it, do:
options(future.globals.onReference = "error")
f <- future({ cat("world!", file = con); flush(con) })
## Error: Detected a non-exportable reference ('externalptr') in one of the globals
## ('con' of class 'file') used in the future expression
Comment: The future.globals.onReference option is set to "ignore" by default because of the extra overhead that "error" introduces, which can be significant for very large nested objects. Furthermore, some subclasses of external pointers can be exported without causing problems.
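Given that overhead, one option is to turn the check on only around the code being debugged. A minimal base-R sketch; the helper name with_reference_check() is hypothetical and not part of the future package:

```r
## Hypothetical helper: evaluate 'expr' with future's reference check
## set to "error", then restore the previous setting on exit
with_reference_check <- function(expr) {
  old <- options(future.globals.onReference = "error")
  on.exit(options(old), add = TRUE)
  expr  ## 'expr' is lazily evaluated here, i.e. after the option is set
}

## The option is "error" only while 'expr' is being evaluated
with_reference_check(getOption("future.globals.onReference"))
## [1] "error"
```

Since globals are inspected when the future is created, wrapping a call such as `f <- future(...)` in with_reference_check() should apply the check to that future only; the assignment still happens in the calling environment because the argument is evaluated there.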
The table and sections below provide a few examples of non-exportable R objects that you may run into when trying to parallelize your code, or simply when trying to reload objects saved in a previous R session. If you identify other cases, please consider reporting them so they can be documented here and possibly even be fixed.
| Package | Examples of non-exportable types or classes |
|---|---|
| base | connection (externalptr) |
| DBI | DBIConnection (externalptr) |
| keras | keras.engine.sequential.Sequential (externalptr) |
| magick | magick-image (externalptr) |
| ncdf4 | ncdf4 (custom reference; non-detectable) |
| parallel | cluster and cluster nodes (connection) |
| raster | RasterLayer (externalptr; not all) |
| Rcpp | NativeSymbol (externalptr) |
| reticulate | python.builtin.function (externalptr), python.builtin.module (externalptr) |
| rJava | jclassName (externalptr) |
| rstan | stanmodel (externalptr) |
| ShortRead | FastqFile, FastqStreamer, FastqStreamerList (connection) |
| sparklyr | tbl_spark (externalptr) |
| terra | SpatRaster, SpatVector (externalptr) |
| udpipe | udpipe_model (externalptr) |
| xgboost | xgb.DMatrix (externalptr) |
| xml2 | xml_document (externalptr) |
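These are illustrated in the sections 'Packages that rely on external pointers' and 'Packages with other types of non-exportable objects' below. As a first example, a cluster object created by the parallel package holds connections to its workers, so it cannot be used inside a future:
library(future)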
plan(multisession, workers = 2)
library(parallel)
cl <- makeCluster(2L)
y <- parSapply(cl, X = 2:3, FUN = sqrt)
y
## [1] 1.414214 1.732051
y %<-% parSapply(cl, X = 2:3, FUN = sqrt)
y
## Error in summary.connection(connection) : invalid connection
If we turn on options(future.globals.onReference = "error"), we catch this as soon as we create the future:
y %<-% parSapply(cl, X = 2:3, FUN = sqrt)
## Error: Detected a non-exportable reference ('externalptr') in one of the globals
## ('cl' of class 'SOCKcluster') used in the future expression
If an object carries an external pointer, it is likely that it can only be used in the R session where it was created. If it is exported to and used in a parallel process, it will likely cause an error there. As shown above and in the examples below, setting option future.globals.onReference to "error" makes future scan the globals for external pointers before launching the future on a parallel worker, and throw an error if one is detected.
However, some objects with external pointers can be exported, e.g. data.table objects of the data.table package. In other words, the presence of an external pointer only suggests that an object is non-exportable; it is not a sufficient condition.
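One way to scan an object for external pointers in base R is the refhook argument of serialize(), which is called for reference objects, such as environments and external pointers, that serialization encounters. The sketch below uses it to list the types of such objects. It only illustrates the idea; the helper names find_refs() and has_extptr() are hypothetical, and this is not necessarily how the future package implements its check.

```r
## Hypothetical helper: return the types of the reference objects that
## serialize() encounters while walking 'x', including its attributes
find_refs <- function(x) {
  types <- character(0)
  hook <- function(ref) {
    types <<- c(types, typeof(ref))
    NULL  ## NULL tells serialize() to serialize 'ref' as usual
  }
  serialize(x, connection = NULL, refhook = hook)
  types
}

has_extptr <- function(x) "externalptr" %in% find_refs(x)

con <- file(tempfile(), open = "w")
has_extptr(con)                     ## pointer in the 'conn_id' attribute
## [1] TRUE
has_extptr(list(a = 1:3, b = con))  ## pointer nested inside a list
## [1] TRUE
has_extptr(data.frame(a = 1:3))
## [1] FALSE
close(con)
```

Because such a scan has to walk every element of every global, its cost grows with the size of the objects, and, as noted above, a TRUE result is a hint rather than proof that an object cannot be exported.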
Below are some examples of packages that produce non-exportable objects with external pointers.
DBI provides a unified database interface for communication between R and various database engines. Analogously to regular connections in R, DBIConnection objects cannot safely be exported to another R process, e.g.
library(future)
options(future.globals.onReference = "error")
plan(multisession)
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dummy %<-% print(con)
## Error: Detected a non-exportable reference ('externalptr') in one of the globals
## ('con' of class 'SQLiteConnection') used in the future expression
The keras package provides an R interface to Keras, which “is a high-level neural networks API developed with a focus on enabling fast experimentation”. The R implementation accesses the Keras Python API via reticulate. However, Keras model instances in R make use of R connections and external pointers, which prevents them from being exported to external R processes. For example, if we attempt to use a Keras model in a multisession worker, the worker produces a run-time error:
library(keras)
library(future)
plan(multisession)
model <- keras_model_sequential()
f <- future(model %>% 
       layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>% 
       layer_dropout(rate = 0.4) %>% 
       layer_dense(units = 128, activation = 'relu') %>%
       layer_dropout(rate = 0.3) %>%
       layer_dense(units = 10, activation = 'softmax'))
model2 <- value(f)
## Error in object$add(layer) : attempt to apply non-function
If we turn on options(future.globals.onReference = "error"), we catch this as soon as we create the future:
## Error: Detected a non-exportable reference ('externalptr') in one of the
## globals ('model' of class 'keras.engine.sequential.Sequential') used in
## the future expression
The magick package provides an R-level API for ImageMagick to work with images. When working with this API, the images are represented internally as external pointers of class 'magick-image' that cannot be exported to another R process, e.g.
library(future)
plan(multisession)
library(magick)
frink <- magick::image_read("https://jeroen.github.io/images/frink.png")
f <- future(image_fill(frink, "orange", "+100+200", 20))
v <- value(f)
## Error: Image pointer is dead. You cannot save or cache image objects
## between R sessions.
If we set:
options(future.globals.onReference = "error")
we'll see that this is caught even before attempting to run this in parallel:
f <- future(image_fill(frink, "orange", "+100+200", 20))
## Error: Detected a non-exportable reference ('externalptr' of class
## 'magick-image') in one of the globals ('frink' of class 'magick-image')
## used in the future expression
The raster package provides methods for working with spatial data, which are held in 'RasterLayer' objects. Some, but not all, of these objects use an external pointer. For example,
library(future)
plan(multisession)
options(future.globals.onReference = "error")
library(raster)
r <- raster(system.file("external/test.grd", package = "raster"))
tf <- tempfile(fileext = ".grd")
s <- writeStart(r, filename = tf,  overwrite = TRUE)
f <- future({
  print(dim(r))
  print(dim(s))
})
## Error: Detected a non-exportable reference ('externalptr') in one of the
## globals ('s' of class 'RasterLayer') used in the future expression
Note that it is only the RasterLayer object s that carries an external pointer and cannot be passed on to an external worker.  In contrast, RasterLayer object r does not have this problem and would be fine to pass on to a worker.
Another example is Rcpp, which allows us to easily create R functions that are implemented in C++, e.g.
Rcpp::sourceCpp(code = "
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int my_length(NumericVector x) {
    return x.size();
}
")
so that:
x <- 1:10
my_length(x)
## [1] 10
However, since this function uses an external pointer internally, we cannot pass it to another R process:
library(future)
plan(multisession)
n %<-% my_length(x)
n
## Error in .Call(<pointer: (nil)>, x) : NULL value passed as symbol address
We can detect / protect against this using:
options(future.globals.onReference = "error")
n %<-% my_length(x)
## Error: Detected a non-exportable reference ('externalptr' of class
## 'NativeSymbol') in one of the globals ('my_length' of class 'function')
## used in the future expression
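A function can be non-exportable even when its arguments are fine. In the Rcpp case, the pointer sits inside the function itself; a closure can similarly carry one in its enclosing environment. The sketch below applies a serialize()-based scan to a closure that captures a file connection; the helper names has_extptr() and make_logger() are hypothetical.

```r
## Hypothetical helper: TRUE if serialize() meets an external pointer
has_extptr <- function(x) {
  found <- FALSE
  serialize(x, connection = NULL, refhook = function(ref) {
    if (typeof(ref) == "externalptr") found <<- TRUE
    NULL  ## serialize 'ref' as usual
  })
  found
}

## Hypothetical logger: the returned function captures 'con' in its
## enclosing environment
make_logger <- function(path) {
  con <- file(path, open = "wb")
  function(msg) invisible(cat(msg, file = con))
}

log_msg <- make_logger(tempfile(fileext = ".log"))
log_msg("hello ")
has_extptr(log_msg)
## [1] TRUE

close(get("con", envir = environment(log_msg)))
```

Used as a global in a future, log_msg would presumably fail in a worker for the same reason as con in the first example.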
The reticulate package provides methods for creating and calling Python code from within R. If we attempt to use Python-binding objects from this package in a future, we get errors like:
library(future)
plan(multisession)
library(reticulate)
os <- import("os")
pwd %<-% os$getcwd()
pwd
## Error in eval(quote(os$getcwd()), new.env()) : 
##   attempt to apply non-function
and by telling the future package to validate globals further, we get:
options(future.globals.onReference = "error")
pwd %<-% os$getcwd()
## Error: Detected a non-exportable reference ('externalptr') in one of the
## globals ('os' of class 'python.builtin.module') used in the future expression
Another reticulate example is when we try to use a Python function that we create ourselves as in:
cat("def twice(x):\n    return 2*x\n", file = "twice.py")
source_python("twice.py")
twice(1.2)
## [1] 2.4
y %<-% twice(1.2)
y
## Error in unserialize(node$con) : 
##   Failed to retrieve the value of MultisessionFuture from cluster node #1
##   (on 'localhost').  The reason reported was 'error reading from connection'
which, again, is because:
options(future.globals.onReference = "error")
y %<-% twice(1.2)
## Error: Detected a non-exportable reference ('externalptr') in one of the globals
## ('twice' of class 'python.builtin.function') used in the future expression
Here is an example that shows how rJava objects cannot be exported to external R processes.
library(future)
plan(multisession)
library(rJava)
.jinit() ## Initialize Java VM on master
Double <- J("java.lang.Double")
d0 <- new(Double, "3.14")
d0
## [1] "Java-Object{3.14}"
f <- future({
  .jinit() ## Initialize Java VM on worker
  new(Double, "3.14")
})
d1 <- value(f)
d1
## [1] "Java-Object<null>"
Although no error is produced, we see that the value d1 is a Java NULL Object.  As before, we can catch this by using:
options(future.globals.onReference = "error")
f <- future({
  .jinit() ## Initialize Java VM on worker
  new(Double, "3.14")
})
## Error: Detected a non-exportable reference ('externalptr') in one of the
## globals ('Double' of class 'jclassName') used in the future expression
The ShortRead package from Bioconductor implements efficient methods for sampling, iterating, and reading FASTQ files. Some of its helper objects cannot be saved to file or exported to a parallel worker, because they contain connections and other non-exportable objects.
Here is an example that illustrates how an attempt to use a 'FastqStreamer' object created in the main R session fails when used in a parallel worker:
library(future)
plan(multisession)
## Adapted from example("FastqStreamer", package = "ShortRead")
library(ShortRead)
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")
fs <- FastqStreamer(fl, 50)
reads %<-% yield(fs)
reads
## Error in status(update = TRUE) : invalid FastqStreamer
To catch this earlier, and to get a more informative error message, we do as before:
options(future.globals.onReference = "error")
reads %<-% yield(fs)
## Error: Detected a non-exportable reference ('externalptr') in one of the
## globals ('fs' of class 'FastqStreamer') used in the future expression
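The sparklyr package provides an R interface to Apache Spark. Its 'tbl_spark' objects, which refer to data held via a Spark connection, cannot be exported to another R process either, e.g.
library(future)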
plan(multisession)
library(sparklyr)
sc <- spark_connect(master = "local")
file <- system.file("misc", "exDIF.csv", package = "utils")
data <- spark_read_csv(sc, "exDIF", file)
d %<-% dim(data)
d
## Error in unserialize(node$con) : 
##   Failed to retrieve the value of MultisessionFuture (<none>) from cluster
## SOCKnode #1 (PID 29864 on localhost 'localhost'). The reason reported was
## 'unknown input format'. Post-mortem diagnostic: A process with this PID
## exists, which suggests that the localhost worker is still alive.
To catch this as soon as possible, use:
options(future.globals.onReference = "error")
d %<-% dim(data)
## Error: Detected a non-exportable reference ('externalptr') in one of
## the globals ('data' of class 'tbl_spark') used in the future expression
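The terra package provides methods for working with spatial raster ('SpatRaster') and vector ('SpatVector') data. These objects hold external pointers and cannot be exported to another R process, e.g.
library(future)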
plan(multisession)
library(terra)
file <- system.file("ex/lux.shp", package = "terra")
v <- vect(file)
dv %<-% dim(v)
dv
## Error in x@ptr$nrow() : external pointer is not valid
file <- system.file("ex/elev.tif", package="terra")
r <- rast(file)
dr %<-% dim(r)
dr
## Error in .External(list(name = "CppMethod__invoke_notvoid", address = <pointer: (nil)>,  : 
##  NULL value passed as symbol address
To catch this as soon as possible, use:
options(future.globals.onReference = "error")
dv %<-% dim(v)
## Error: Detected a non-exportable reference ('externalptr' of class
## 'RegisteredNativeSymbol') in one of the globals ('v' of class
## 'SpatVector') used in the future expression
dr %<-% dim(r)
## Error: Detected a non-exportable reference ('externalptr' of class
## 'RegisteredNativeSymbol') in one of the globals ('r' of class
## 'SpatRaster') used in the future expression
For some workarounds, see help("wrap", package = "terra").
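The udpipe package provides tokenization, part-of-speech tagging, lemmatization, and dependency parsing based on the UDPipe toolkit. A loaded 'udpipe_model' object holds an external pointer, e.g.
library(future)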
plan(multisession)
library(udpipe)
udmodel <- udpipe_download_model(language = "dutch")
udmodel <- udpipe_load_model(file = udmodel$file_model)
x %<-% udpipe_annotate(udmodel, x = "Ik ging op reis en ik nam mee.")
x
## Error in udp_tokenise_tag_parse(object$model, x, doc_id, tokenizer, tagger,  : 
##   external pointer is not valid
To catch this as soon as possible, use:
options(future.globals.onReference = "error")
x %<-% udpipe_annotate(udmodel, x = "Ik ging op reis en ik nam mee.")
## Error: Detected a non-exportable reference ('externalptr') in one of the
## globals ('udmodel' of class 'udpipe_model') used in the future expression
Now, it is indeed possible to parallelize udpipe calls. For details on how to do this, see the 'UDPipe Natural Language Processing - Parallel' vignette that comes with the udpipe package.
The xgboost package provides fast gradient-boosting methods. Some of its data structures use external pointers. For example,
library(future)
plan(multisession)
library(xgboost)
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
class(dtrain)
## [1] "xgb.DMatrix"
d <- dim(dtrain)
d
## [1] 6513  126
This works fine, but if we attempt to pass on the 'xgb.DMatrix' object dtrain to an external worker, we silently get an incorrect value:
f <- future(dim(dtrain))
d <- value(f)
d
## NULL
This is unfortunate, but we can at least detect this by:
options(future.globals.onReference = "error")
f <- future(dim(dtrain))
## Error: Detected a non-exportable reference ('externalptr' of class 'xgb.DMatrix')
## in one of the globals ('dtrain' of class 'xgb.DMatrix') used in the future expression
Another example is XML objects of the xml2 package, which may produce evaluation errors (or just invalid results depending on how they are used), e.g.
library(future)
plan(multisession)
library(xml2)
doc <- read_xml("<body></body>")
f <- future(xml_children(doc))
value(f)
## Error: external pointer is not valid
The future framework can help detect this before sending off the future to the worker;
options(future.globals.onReference = "error")
f <- future(xml_children(doc))
## Error: Detected a non-exportable reference ('externalptr') in one of the
## globals ('doc' of class 'xml_document') used in the future expression
One workaround when dealing with non-exportable objects is to look for ways to encode the object such that it can be exported, and then decoded on the receiving end. With xml2, we can use xml2::xml_serialize() and xml2::xml_unserialize() to do this. Here is how we can rewrite the above example such that we can pass xml2 objects back and forth between the main R session and R workers:
## Encode the 'xml_document' object 'doc' as a 'raw' object
doc_raw <- xml_serialize(doc, connection = NULL)
f <- future({
  ## In the future, reconstruct the 'xml_document' object
  ## from the 'raw' object
  doc <- xml_unserialize(doc_raw)
  ## Continue as usual
  children <- xml_children(doc)
  ## Send back a 'raw' representation of the 'xml_nodeset'
  ## object 'children'
  xml_serialize(children, connection = NULL)
})
## Reconstruct the 'xml_nodeset' object in the main R session
children <- xml_unserialize(value(f))
Package ncdf4 provides an R API to work with data that live in netCDF files. For example, we can create a simple netCDF file that holds a variable 'x':
library(ncdf4)
x <- ncvar_def("x", units="count", dim=list())
file <- nc_create("example.nc", x)
ncvar_put(file, x, 42)
nc_close(file)
We can now use this netCDF file next time we start R, e.g.
library(ncdf4)
file <- nc_open("example.nc")
y <- ncvar_get(file)
y
## [1] 42
However, if we attempt to use file, which is an object of class 'ncdf4', in a parallel worker, we get an error:
library(future)
plan(multisession)
library(ncdf4)
file <- nc_open("example.nc")
f <- future(ncvar_get(file))
y <- value(f)
## Error in R_nc4_inq_varndims: NetCDF: Not a valid ID
## Error in ncvar_ndims(ncid, varid) : error returned from C call
This is because ncdf4 objects make use of internal references that are unique to the R session where they were created. However, these are not formal external pointers, meaning the future framework cannot detect them. That is, using options(future.globals.onReference = "error") is of no help here.
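Base R has an analogous case that may help explain why such references slip through: a connection's number is a plain integer that refers to a connection in the current session, yet nothing about it looks like a reference. This is only an analogy; it does not show ncdf4's internals.

```r
tf <- tempfile(fileext = ".txt")
con <- file(tf, open = "w")

## Keep only the connection number: a plain integer without attributes
id <- as.integer(con)
attributes(id)
## NULL

## Nothing marks 'id' as session-specific, yet it still refers to a
## resource that exists only in this R session
con2 <- getConnection(id)
writeLines("hello", con2)
close(con2)  ## closes the same connection as 'con'
readLines(tf)
## [1] "hello"
```

An integer like id contains no pointer, so a scan for external pointers has nothing to find, while in a worker it would refer to whatever connection, if any, has that number there.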
A workaround is to open the netCDF file in each worker, e.g.
library(future)
plan(multisession)
library(ncdf4)
f <- future({
  file <- nc_open("example.nc")
  value <- ncvar_get(file)
  nc_close(file)
  value
})
y <- value(f)
y
## [1] 42
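The same open-where-used approach should work for the file connection from the first example: export the file name rather than the connection, and open and close the connection inside the code that runs on the worker. A minimal base-R sketch; the helper name append_text() is hypothetical:

```r
## Hypothetical helper: the connection is opened and closed inside the
## function, so only the file name, a plain string, has to be exported
append_text <- function(path, text) {
  con <- file(path, open = "ab")  ## append; creates the file if needed
  on.exit(close(con))
  cat(text, file = con)
  invisible(path)
}

log_file <- tempfile(fileext = ".log")
append_text(log_file, "hello ")
append_text(log_file, "world!")  ## e.g. future(append_text(log_file, "world!"))
readLines(log_file, warn = FALSE)
## [1] "hello world!"
```

If several workers append to the same file at the same time, their writes may interleave, so writing one file per worker may be the safer choice.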