When working with files, programs often read the entire file content into memory. Writing that code takes little time, but what happens behind the scenes is more involved: the data is copied back and forth between kernel-space and user-space buffers through system calls, which is inefficient. When reading a large file, a memory-mapped file (mmap) is often the better choice in terms of performance.
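For contrast, here is roughly what the conventional approach described above looks like: the whole file is copied from the kernel into a freshly allocated user-space buffer in a single call. The file name 1.txt is just a placeholder that matches the mmap example further down.

package main

import (
	"fmt"
	"os"
)

func main() {
	// os.ReadFile copies the entire file content from the kernel
	// into a newly allocated user-space byte slice.
	data, err := os.ReadFile("1.txt")
	if err != nil {
		panic(err)
	}
	fmt.Print(string(data))
}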
What is mmap and how does it work?
Memory mapping is a technique that lets developers map the contents of a file onto a range of virtual memory addresses.
Once a file is mapped, its content can be read through an ordinary memory pointer. The actual reads from disk happen lazily, only when the pointer accesses a given location. On most operating systems the mapped pages are backed by the kernel's page cache, so the same physical pages are shared between the kernel and the process. The benefits of this mechanism are quite clear (a low-level sketch follows the list below):
- Pointers allow developers to seek and change data flexibly.
- On-demand loading saves memory space, allowing the program to handle larger files.
- Reading and writing mapped data does not involve extra copies between kernel space and user space.
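To make the on-demand behaviour concrete, here is a minimal sketch of what an mmap wrapper does under the hood, using the standard syscall package directly. It assumes a Unix-like platform (syscall.Mmap is not available on Windows) and a non-empty file named 1.txt.

package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.Open("1.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// Map the whole file read-only into this process's virtual address space.
	// The returned byte slice is backed by the kernel's page cache; pages are
	// faulted in from disk only when they are first touched.
	data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(data)

	fmt.Print(string(data))
}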
Example in Golang for reading a file using mmap
package main

import (
	"fmt"
	"os"

	"github.com/edsrzf/mmap-go"
)

func main() {
	f, err := os.OpenFile("1.txt", os.O_RDWR, 0755) // open the file for reading and writing
	if err != nil {
		panic(err)
	}
	defer f.Close() // close the file object afterwards

	m, err := mmap.Map(f, mmap.RDWR, 0) // map the whole file into memory
	if err != nil {
		panic(err)
	}
	// m, err := mmap.MapRegion(f, 100, mmap.RDWR, 0, 0) // create a regional mapping
	// in the regional mapping:
	//   the second parameter (100) is the mapping length in bytes
	//   the last parameter is the offset, which must be a multiple of os.Getpagesize()
	defer m.Unmap() // unmap the file afterwards

	for _, v := range m { // iterate over the mapped file content
		fmt.Print(string(v))
	}

	m[0] = 65 // replace the first byte with 'A' (ASCII 65)
	if err := m.Flush(); err != nil { // synchronise the memory content back to the file
		panic(err)
	}
}
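Because the MapRegion offset must be a multiple of the page size, mapping a slice from the middle of a large file usually means rounding the desired position down to a page boundary and skipping ahead inside the mapping. The sketch below shows one way to do this; the target position (5000) is a made-up value and the code assumes 1.txt is long enough to cover it.

package main

import (
	"fmt"
	"os"

	"github.com/edsrzf/mmap-go"
)

func main() {
	f, err := os.OpenFile("1.txt", os.O_RDWR, 0755)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	pageSize := int64(os.Getpagesize())
	target := int64(5000)                    // hypothetical position we want to read from
	offset := (target / pageSize) * pageSize // round down to a page boundary
	skip := int(target - offset)             // bytes between the page boundary and the target

	// Map 100 bytes starting at the target position; the mapping itself must
	// begin at the page-aligned offset, so we map a little extra and skip ahead.
	// Assumes the file is at least target+100 bytes long.
	m, err := mmap.MapRegion(f, skip+100, mmap.RDWR, 0, offset)
	if err != nil {
		panic(err)
	}
	defer m.Unmap()

	fmt.Print(string(m[skip:])) // the 100 bytes starting at the target position
}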