Client Core Readahead Cache

The readahead cache buffers read data for all OrangeFS files. Any read smaller than about half the buffer size is translated into a block read. Note that the kernel module prevents requests larger than the I/O buffer size from being passed to the client core. Subsequent reads to the same block are satisfied from the buffer, avoiding an I/O request to the server. A write to a file flushes all buffers holding data from that file, so all writes go directly to the server. When the kernel frees a "struct file" (generally when all processes on the node have closed the file), it initiates a "release", which also flushes all buffers holding data from that file. Otherwise, read buffers are kept until all buffers are in use, at which point buffers are selected for reuse with an LRU (least recently used) algorithm.

The readahead cache can break sequential consistency when multiple nodes access a file and at least one of them writes to it, because a node may keep serving data from its buffers after another node has updated the file. A call to fsync() flushes all buffers holding data from a file, so an application can call fsync() to ensure that updates made to that file by other nodes become visible.
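Because fsync() is the way to drop a file's buffers on the local node, a script can force fresh data before reading a file that other nodes may have updated. The sketch below assumes GNU coreutils 8.24 or later, where the command "sync FILE" opens the named file and calls fsync() on it. read_fresh is a hypothetical helper, not an OrangeFS tool, and the mount path in the comment is only illustrative.

```shell
#!/bin/sh
# Sketch only: read_fresh is a hypothetical helper, not part of OrangeFS.
# Assumes GNU coreutils >= 8.24, where "sync FILE" calls fsync() on FILE
# instead of syncing every file system.
read_fresh() {
    # fsync() the file; on an OrangeFS mount this flushes every readahead
    # buffer on this node that holds data from the file
    sync -- "$1" || return 1
    # the following read is therefore fetched from the server again
    cat -- "$1"
}

# Example (hypothetical path):
#   read_fresh /mnt/orangefs/shared/results.dat
```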

To use the readahead cache, OrangeFS must be configured and built with the --enable-racache option. This compiles in support for the cache and sets a default of 32 buffers of 2 MB each. Three /proc files control the readahead cache:

  • /proc/sys/pvfs2/readahead-count reports/sets the number of buffers
  • /proc/sys/pvfs2/readahead-size reports/sets the size of the buffers in bytes
  • /proc/sys/pvfs2/readahead-count-size reports/sets both the number and size of buffers at the same time
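A minimal sketch that prints all three settings in one pass. The show_readahead function and the PVFS2_PROC_DIR override are hypothetical conveniences (the override only exists so the function can be tried without the kernel module loaded); the function prints whatever each file returns, without assuming a particular format.

```shell
#!/bin/sh
# Sketch only: show_readahead and PVFS2_PROC_DIR are hypothetical names.
# /proc/sys/pvfs2 exists only while the OrangeFS kernel module is loaded.
show_readahead() {
    dir=${PVFS2_PROC_DIR:-/proc/sys/pvfs2}
    for name in readahead-count readahead-size readahead-count-size; do
        # stop with an error if a file is missing or unreadable
        value=$(cat "$dir/$name") || return 1
        printf '%s: %s\n' "$name" "$value"
    done
}
```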

You can cat each file to report the current value of its parameter(s), or echo a string of decimal numbers into it to set them. Each time the parameters are set, the client core flushes the cache, frees the buffers, and builds a new set of buffers based on the new parameter(s). For example:

echo "32 2000000" > /proc/sys/pvfs2/readahead-count-size

This creates 32 buffers of 2 million bytes each. Note that the buffer size should generally be a power of 2; 2 million is not, so a value such as 2097152 (2^21) is preferable. Setting the count and size to 0 disables the readahead cache.
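A sketch of a helper that applies the advice above: it rejects non-numeric input and sizes that are not a power of 2 (so the 2000000 from the example would be refused in favor of, say, 2097152), reports how much memory the buffers will take, and then writes both values to readahead-count-size in a single write. set_readahead and PVFS2_PROC_DIR are hypothetical names, and the power-of-2 check is this sketch's own policy, not something the kernel module is known to enforce.

```shell
#!/bin/sh
# Sketch only: set_readahead and PVFS2_PROC_DIR are hypothetical names.
# Writing the real /proc file normally requires root.

# Succeeds if $1 is a decimal number with no leading zeros (or exactly "0").
# Leading zeros are rejected because $(( )) would treat them as octal.
is_decimal() {
    case $1 in
        ''|*[!0-9]*|0?*) return 1 ;;
        *) return 0 ;;
    esac
}

# set_readahead COUNT SIZE
set_readahead() {
    count=$1 size=$2
    if ! is_decimal "$count" || ! is_decimal "$size"; then
        echo "usage: set_readahead COUNT SIZE (decimal numbers)" >&2
        return 2
    fi
    # A power of 2 has exactly one bit set, so size & (size - 1) is 0.
    # This also lets 0 through, which (with a count of 0) disables the cache.
    if [ $(( size & (size - 1) )) -ne 0 ]; then
        echo "set_readahead: size $size is not a power of 2" >&2
        return 1
    fi
    echo "readahead memory: $(( count * size )) bytes"
    # One write to the combined file, so the buffers are rebuilt only once
    printf '%s %s\n' "$count" "$size" \
        > "${PVFS2_PROC_DIR:-/proc/sys/pvfs2}/readahead-count-size"
}
```

For example, set_readahead 32 2097152 prints "readahead memory: 67108864 bytes" (64 MiB) before applying the setting. Writing the combined file, rather than readahead-count and readahead-size one after the other, means the client core flushes and rebuilds its buffers once instead of twice.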