my question:

how do they leverage the characteristics of NVM, e.g. non-volatility and byte-addressability?

  1. for write operations, it allocates storage at the granularity of the requested operation instead of in fixed-size blocks.
the architecture of assise

🤗 Assise leverages the high read and write speed of NVM by integrating it into a distributed file system, which improves I/O performance.

Assise allocates storage in NVM at a dynamic granularity instead of using block allocation like traditional block storage devices.

For write operations, there are 2 stages.

  1. libfs directly writes to a process-local cache in NVM
  2. the update is then replicated in order to remote NVM replicas over RDMA
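The two-stage write path can be sketched as follows; this is a toy model with hypothetical names (`LibFS`, `write`), not Assise's real API, and the RDMA replication step is modeled as plain list appends:

```python
# Sketch of the two-stage write path (hypothetical names).
# Stage 1: libfs appends the write to a process-local cache/log in NVM.
# Stage 2: the entry is replicated in order to remote NVM replicas
# (via RDMA in the real system; modeled here as Python lists).

class LibFS:
    def __init__(self, replicas):
        self.local_log = []        # process-local NVM update log
        self.replicas = replicas   # remote NVM replicas

    def write(self, path, data):
        entry = (path, data)
        self.local_log.append(entry)      # stage 1: local NVM append
        for replica in self.replicas:     # stage 2: in-order replication
            replica.append(entry)

replica_a, replica_b = [], []
fs = LibFS([replica_a, replica_b])
fs.write("/tmp/f", b"hello")
assert replica_a == replica_b == [("/tmp/f", b"hello")]
```

Because the write lands in local NVM first, it is durable before any network round trip completes.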

For read operations,

  1. libfs first checks its local cache
  2. on a miss, it reads from a remote node; data read from remote nodes is cached in local DRAM
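A minimal sketch of this read path, assuming a hypothetical `ReadCache` where the remote node is stood in by a dictionary:

```python
# Sketch of the read path (hypothetical names): check the local cache first;
# on a miss, fetch from the remote node and cache the result in local DRAM
# so repeated reads stay local.

class ReadCache:
    def __init__(self, remote):
        self.dram_cache = {}   # local DRAM cache
        self.remote = remote   # remote node's data (stand-in for RDMA reads)
        self.remote_reads = 0

    def read(self, path):
        if path in self.dram_cache:      # 1. check local cache
            return self.dram_cache[path]
        data = self.remote[path]         # 2. miss: read from remote node
        self.remote_reads += 1
        self.dram_cache[path] = data     # cache remotely-read data in DRAM
        return data

cache = ReadCache({"/a": b"x"})
assert cache.read("/a") == b"x" and cache.remote_reads == 1
assert cache.read("/a") == b"x" and cache.remote_reads == 1  # served locally
```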

How to maintain cache coherence?

cc-nvm tracks write order via an update log in process-local NVM. Each POSIX call that updates state is recorded in the log. cc-nvm leverages RDMA's ordering guarantee to replicate the log to replicas in order.
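A sketch of such an ordered update log (hypothetical names; sequential list extension stands in for RDMA's in-order delivery):

```python
# Sketch of an ordered update log: every state-updating POSIX call is
# appended as one sequence-numbered record, and replication copies records
# strictly in order, mirroring RDMA's ordering guarantee.

import itertools

class UpdateLog:
    def __init__(self):
        self._seq = itertools.count()
        self.records = []

    def record(self, op, *args):
        self.records.append((next(self._seq), op, args))

    def replicate(self, remote_log):
        # copy only records the replica has not seen yet, in log order
        remote_log.extend(self.records[len(remote_log):])

log, replica = UpdateLog(), []
log.record("write", "/f", b"a")
log.record("rename", "/f", "/g")
log.replicate(replica)
assert [r[1] for r in replica] == ["write", "rename"]
```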

cc-nvm serializes concurrent access to shared state by untrusted libfses, and recovers the same serialization after a crash, via leases (leases act much like reader-writer locks).
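The reader-writer-lock analogy can be made concrete with a toy expiring lease (hypothetical model, not Assise's implementation; the 5-second timeout echoes the recycling interval mentioned in these notes):

```python
# Sketch of a lease as an expiring reader-writer lock: many concurrent
# readers or one writer, and every grant times out so a crashed holder
# cannot block others forever.

import time

class Lease:
    TIMEOUT = 5.0  # seconds (assumed, echoing the 5s recycling interval)

    def __init__(self):
        self.readers = {}     # holder -> expiry time
        self.writer = None    # (holder, expiry) or None

    def _expire(self, now):
        self.readers = {h: t for h, t in self.readers.items() if t > now}
        if self.writer and self.writer[1] <= now:
            self.writer = None

    def acquire_read(self, holder, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        if self.writer:
            return False
        self.readers[holder] = now + self.TIMEOUT
        return True

    def acquire_write(self, holder, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        if self.writer or self.readers:
            return False
        self.writer = (holder, now + self.TIMEOUT)
        return True

lease = Lease()
assert lease.acquire_write("libfs1", now=0.0)
assert not lease.acquire_read("libfs2", now=1.0)   # writer holds the lease
assert lease.acquire_read("libfs2", now=6.0)       # writer's lease expired
```

Expiry is what lets a crashed libfs's leases be reclaimed without its cooperation.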

hierarchical coherence

to localize coherence enforcement, leases are delegated hierarchically. A libfs requests leases from its local sharedfs, and the sharedfs may forward the request to the root cluster manager. Leases are recycled or expired by the cluster manager every 5 seconds.

This allows cc-nvm to migrate lease management to the sharedfs that is local to the libfses requesting them.

the hierarchical structure allows cc-nvm to minimize network communication and lease delegation overhead.
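The delegation idea above can be sketched as follows (hypothetical names; the point is that only the first request for a lease travels to the root, and later local requests are served by the node-local sharedfs):

```python
# Sketch of hierarchical lease delegation: a libfs asks its node-local
# sharedfs; the sharedfs forwards to the root cluster manager only if it
# does not already hold the delegated lease.

class ClusterManager:
    def __init__(self):
        self.delegated = {}   # inode -> owning sharedfs
        self.requests = 0     # network round trips to the root

    def delegate(self, inode, sharedfs):
        self.requests += 1
        self.delegated[inode] = sharedfs

class SharedFS:
    def __init__(self, manager):
        self.manager = manager
        self.held = set()     # leases delegated to this node

    def acquire(self, inode):
        if inode not in self.held:        # forward to the root only once
            self.manager.delegate(inode, self)
            self.held.add(inode)
        return True                       # grant locally from here on

cm = ClusterManager()
local = SharedFS(cm)
for _ in range(3):
    local.acquire(42)         # three libfs requests for the same inode
assert cm.requests == 1       # only the first reached the cluster manager
```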

crash recovery and fail-over

libfs recovery

the local sharedfs evicts the dead libfs's update log, recovering all completed writes, and then expires its leases.

sharedfs recovery

On an OS crash, Assise uses NVM to dramatically accelerate OS reboot by storing a checkpoint of a freshly booted OS. By examining the sharedfs log stored in NVM, it can initiate recovery for all previously running libfs instances.

cache replica fail-over

To avoid waiting for node recovery after a power failure, Assise immediately fails over to a cache replica. Writes to the file system can invalidate cached data on the failed node. To track these writes, the cluster manager maintains an epoch number, which it increments on node failure and recovery. All sharedfs instances share a per-epoch bitmap in a sparse file indicating which inodes have been written during each epoch.
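A toy model of this epoch-based tracking (hypothetical names; a Python set stands in for the per-epoch bitmap in the sparse file):

```python
# Sketch of epoch-based write tracking: the epoch number is bumped on node
# failure/recovery, and each written inode is marked in the current epoch's
# bitmap so a recovering node can later tell what changed while it was down.

class EpochTracker:
    def __init__(self):
        self.epoch = 0
        self.bitmaps = {0: set()}   # epoch -> set of written inodes

    def bump_epoch(self):           # on node failure or recovery
        self.epoch += 1
        self.bitmaps[self.epoch] = set()

    def mark_write(self, inode):
        self.bitmaps[self.epoch].add(inode)

    def written_since(self, old_epoch):
        return set().union(*(self.bitmaps[e]
                             for e in range(old_epoch, self.epoch + 1)))

t = EpochTracker()
t.mark_write(1)
t.bump_epoch()        # a node fails; the epoch advances
t.mark_write(2)
assert t.written_since(1) == {2}
```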

node recovery

When a node crashes, the cluster manager makes sure that all of the node's leases expire before the node can rejoin. A recovering sharedfs contacts an online sharedfs to collect the relevant epoch bitmaps, then invalidates every block of every file that has been written since its crash.
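The rejoin step can be sketched like this (hypothetical names; `peer_bitmaps` plays the role of the epoch bitmaps fetched from an online sharedfs):

```python
# Sketch of cache invalidation on rejoin: union the epoch bitmaps covering
# the downtime, then drop every cached inode that was written in that window.

def recover(local_cache, crash_epoch, peer_bitmaps, current_epoch):
    """Invalidate stale entries; local_cache maps inode -> cached data."""
    stale = set()
    for epoch in range(crash_epoch, current_epoch + 1):
        stale |= peer_bitmaps.get(epoch, set())
    for inode in stale:
        local_cache.pop(inode, None)   # invalidate data written since crash
    return local_cache

cache = {1: b"old", 2: b"old", 3: b"old"}
bitmaps = {4: {1}, 5: {3}}             # inodes written during epochs 4-5
recover(cache, crash_epoch=4, peer_bitmaps=bitmaps, current_epoch=5)
assert cache == {2: b"old"}
```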

strength

  1. uses hierarchical lease management to localize lease acquisition, thus reducing network overhead

weakness

  1. the cluster manager may crash, making it a single point of failure.

BilyZ

master's student at SYSU, doing research on computer systems software