rtape

An R package for storing large collections of arbitrary R objects as tape-like files and manipulating them in a memory-efficient way.

Package page on CRAN
Package source on GitHub
Full documentation in pdf

Quick tour

Adding objects to the tape

To add an object, say x, to a tape file, say tape.tape, just call rtapeAdd("tape.tape",x). If the tape file does not exist, it will be created. Subsequent calls to rtapeAdd append new objects to the end of the tape.

Note that, unlike save, rtape does not store object names; objects are identified only by their order (or by their contents). This way you can directly save the result of a calculation, for instance rtapeAdd("tape.tape",3+4).
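The calls described above can be sketched as follows. This assumes the rtape package is installed; the temporary file path and the example object are illustrative only.

```r
library(rtape)

# Illustrative tape file in a temporary directory (not part of the original text)
tape <- tempfile(fileext = ".tape")

x <- list(id = 1, values = c(2.5, 3.5))
rtapeAdd(tape, x)      # the tape does not exist yet, so it is created
rtapeAdd(tape, 3 + 4)  # appends the result of a calculation, no name needed

file.exists(tape)
```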

Adding from parallel processes

When two or more processes try to add to a single tape at once, there is a risk of simultaneous writes that would corrupt the tape. To prevent this, add safe="retry" to every rtapeAdd call that may run concurrently.

This argument works via a periodically checked dirlock. One can get more control over this process by using the safe="try" mode and writing a custom retry procedure.
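A custom retry procedure might look like the sketch below. The helper addWithBackoff and its parameters are hypothetical, and the sketch assumes that rtapeAdd called with safe="try" returns FALSE when the lock is already held; the package documentation should be checked for the exact return value.

```r
library(rtape)

# Hypothetical helper (not part of rtape): retry with exponential backoff
# and give up after maxTries attempts.
# ASSUMPTION: with safe = "try", rtapeAdd returns FALSE if the lock is taken.
addWithBackoff <- function(fName, what, maxTries = 10, wait = 0.05) {
  for (i in seq_len(maxTries)) {
    res <- rtapeAdd(fName, what, safe = "try")
    if (!identical(res, FALSE)) return(invisible(TRUE))
    Sys.sleep(wait)
    wait <- wait * 2  # wait twice as long before the next attempt
  }
  stop("could not lock the tape after ", maxTries, " attempts")
}

tape <- tempfile(fileext = ".tape")
rtapeAdd(tape, 1:10, safe = "retry")  # simple mode: wait until the lock is free
addWithBackoff(tape, "result")        # custom retry procedure
```

Unlike safe="retry", which waits indefinitely, this variant fails with an error when the lock never becomes available, so a stale lock does not hang the process.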

Reading a tape

Once one has written something to the tape, it is time to retrieve the stored data. The rtapeAsList function reads the whole tape and returns its contents as a list of consecutive objects.

This function loads all the data on the tape into memory, which is obviously contrary to what one wants when dealing with big data. In that case, function mapping should be used instead.
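For a tape small enough to fit in memory, reading it back might look like this sketch (the tape path and its contents are made up for illustration):

```r
library(rtape)

tape <- tempfile(fileext = ".tape")
for (i in 1:3) rtapeAdd(tape, i^2)

contents <- rtapeAsList(tape)  # whole tape loaded at once
length(contents)               # number of objects on the tape
contents[[2]]                  # objects come back in the order they were added
```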

Mapping a function over tapes

rtapeLapply applies a given function to each object on the tape and returns all the results as a list.

Mapping is done in such a way that only the currently processed object resides in memory, so one can easily process tapes larger than the available RAM.
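As an illustration, the sketch below stores several vectors and reduces each one to a small summary. The data are synthetic, and the positional call assumes the tape name comes first and the function second.

```r
library(rtape)

tape <- tempfile(fileext = ".tape")
for (i in 1:5) rtapeAdd(tape, rnorm(1000, mean = i))

# Each vector is read, summarised and dropped in turn;
# only the small summaries are collected in the result list.
sizes <- rtapeLapply(tape, length)
means <- unlist(rtapeLapply(tape, mean))
```

Memory use stays low only when the mapped function returns something small; returning the objects unchanged would rebuild the whole tape in memory.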

Rerecording a tape

The results of function mapping can also be written directly to another tape; the rtapeRerecord function serves this purpose.

This function is smart enough to perform rerecording in place, i.e., when given the same source and target tape. In this case, it creates a temporary tape, writes the results there and finally replaces the source tape with it. This way no data is lost in case of an error.
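Both modes can be sketched as follows. The argument order used here (function, source tape, target tape, with the target defaulting to the source) is an assumption to verify against the package documentation; the tape paths are illustrative.

```r
library(rtape)

src <- tempfile(fileext = ".tape")
dst <- tempfile(fileext = ".tape")
for (i in 1:3) rtapeAdd(src, i)

# Rerecord to a second tape; the source tape is left untouched
rtapeRerecord(function(x) x * 2, src, dst)

# In-place rerecording: no target given, so the source tape is replaced
rtapeRerecord(function(x) x + 1, src)

unlist(rtapeAsList(dst))
unlist(rtapeAsList(src))
```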