Hadoop Tutorials.CO.IN
Big Data - Hadoop - Hadoop Ecosystem - NoSQL - Spark

Internals of HDFS File Read Operations

by Rahul Hargude

Since HDFS plays with large sets of data it is interesting to know how it manages File I/O operations over it when requested by client. In the following section we are going to understand the flow of HDFS file read operation. Let's have a look at pictorial representation of the data read operation and components involved in it.

HDFS File Read Operations

Before going to understand read operation let's take a walkthrough of few important components.

HDFS Client: HDFS client interacts with Namenode and Datanode on behalf of user to fulfil user request. User establishes communication with HDFS through File System API and normal I/O operations, processing of user request and providing response over it is carried out by File System API processes.

Namenode: Namenode is the masternode of HDFS cluster. It stores metadata information and edit log in it. Metadata information contains addresses of block locations of Datanodes, this information is used for file read and write operation to access the blocks in a HDFS cluster.

Datanode: Datanodes also known as slave nodes holds the actual data. Datanode only stores block, a block is what is used to store and process the data. Data resides within blocks of Datanode. Datanode gives periodic heartbeat signals to Masternode to indicate that it is alive and can be used to store and retrieve data

Packet: A Packet is a small chunk of data which is used during transmission; packet is a subset of block. The default size of a block is around 64 MB or 128 MB, it will create a huge network overload if we transfer data of the size of blocks, hence client API transfers this block in small chunks known as packets.

Next we will go through data flow in HDFS file read operation.

HDFS File Read workflow

1. To start the file read operation, client opens the required file by calling open() on Filesystem object which is an instance of DistributedFileSystem. Open method initiate HDFS client for the read request.

2. DistributedFileSystem interacts with Namenode to get the block locations of file to be read. Block locations are stored in metadata of namenode. For each block,Namenode returns the sorted address of Datanode that holds the copy of that block.Here sorting is done based on the proximity of Datanode with respect to Namenode, picking up the nearest Datanode first.

3. DistributedFileSystem returns an FSDataInputStream, which is an input stream to support file seeks to the client. FSDataInputStream uses a wrapper DFSInputStream to manage I/O operations over Namenode and Datanode. Following steps are performed in read operation.

a) Client calls read() on DFSInputStream. DFSInputStream holds the list of address of block locations on Datanode for the first few blocks of the file. It then locates the first block on closest Datanode and connects to it.

b) Block reader gets initialized on target Block/Datanode along with below information:

  • Block ID.
  • Data start offset to read from.
  • Length of data to read.
  • Client name.

c) Data is streamed from the Datanode back to the client in form of packets, this data is copied directly to input buffer provided by client.DFS client is reading and performing checksum operation and updating the client buffer

d) Read () is called repeatedly on stream till the end of block is reached. When end of block is reached DFSInputStream will close the connection to Datanode and search next closest Datanode to read the block from it.

4. Blocks are read in order, once DFSInputStream done through reading of the first few blocks, it calls the Namenode to retrieve Datanode locations for the next batch of blocks.

5. When client has finished reading it will call Close() on FSDataInputStream to close the connection.

6. If Datanode is down during reading or DFSInputStream encounters an error during communication, DFSInputStream will switch to next available Datanode where replica can be found. DFSInputStream remembers the Datanode which encountered an error so that it does not retry them for later blocks.

As you can see that client with the help of Namenode gets the list of best Datanode for each block and communicates directly with Datanode to retrieve the data. Here Namenode serves the address of block location on Datanode rather than serving data itself which could become the bottleneck as the number of clients grows. This design allows HDFS to scale up to a large numbers of clients since the data traffic is spread across all the Datanodes of clusters.


Follow us on Twitter

Recommended for you