Explain Hadoop Distributed File System (HDFS).

Mumbai University > Computer Science > Sem 8 > Parallel And Distributed System

1 Answer

The Hadoop file system was developed using a distributed file system design. It runs on commodity hardware.

Unlike other distributed systems, HDFS is highly fault tolerant and is designed using low-cost hardware.

HDFS holds a very large amount of data and provides easier access. To store such huge data, the files are stored across multiple machines. These files are stored in a redundant fashion to rescue the system from possible data loss in case of failure.

Features of HDFS

- It is suitable for distributed storage and processing.
- Hadoop provides a command interface to interact with HDFS (a short API sketch follows this list).
- The built-in servers of NameNode and DataNode help users to easily check the status of the cluster.
- Streaming access to file system data.
- HDFS provides file permissions and authentication.
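The feature list above mentions a command interface for working with HDFS. As a minimal sketch (not the prescribed example from the syllabus), the same write/read operations can also be issued programmatically through Hadoop's Java FileSystem API; the NameNode address and file path below are placeholder assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster; this NameNode URI is an assumption
        // and must match your own fs.defaultFS setting.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // Write a small file into HDFS (path chosen for illustration).
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read the same file back through the same API.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```

The equivalent shell commands would be `hdfs dfs -put` to upload the file and `hdfs dfs -cat` to print it.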

HDFS Architecture

HDFS follows the master-slave architecture and it has the following elements.

NameNode

The NameNode is the commodity hardware that contains the GNU/Linux operating system and the NameNode software. It is software that can be run on commodity hardware. The system having the NameNode acts as the master server and it does the following tasks:

- Manages the file system namespace.
- Regulates clients' access to files.
- Executes file system operations such as renaming, closing, and opening files and directories (illustrated in the sketch below).
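As a hedged illustration of the namespace operations listed above, each call below is resolved by the NameNode as a metadata operation; the directory and file names are assumptions made for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOps {
    public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS is already configured (e.g. via core-site.xml).
        FileSystem fs = FileSystem.get(new Configuration());

        // Metadata operations handled by the NameNode:
        fs.mkdirs(new Path("/user/demo/reports"));            // create a directory
        fs.rename(new Path("/user/demo/hello.txt"),
                  new Path("/user/demo/reports/hello.txt"));  // rename/move a file
        boolean present = fs.exists(new Path("/user/demo/reports/hello.txt"));
        System.out.println("Entry present in the namespace: " + present);
    }
}
```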

DataNode

The DataNode is commodity hardware having the GNU/Linux operating system and DataNode software. For every node (commodity hardware/system) in a cluster, there will be a DataNode. These nodes manage the data storage of their system.

- DataNodes perform read-write operations on the file systems, as per client request.
- They also perform operations such as block creation, deletion, and replication according to the instructions of the NameNode (see the block-location sketch below).
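To see which DataNodes sit behind a file, a client can ask for its block locations. The sketch below uses the standard getFileBlockLocations call; the file path is an assumption.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/big.dat");  // illustrative path

        FileStatus status = fs.getFileStatus(file);
        // The NameNode answers with the DataNodes that hold each block.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + String.join(",", b.getHosts()));
        }
    }
}
```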

Block

Generally, the user data is stored in the files of HDFS. The file in a file system is divided into one or more segments and/or stored in individual DataNodes. These file segments are called blocks. In other words, the minimum amount of data that HDFS can read or write is called a block. The default block size is 64 MB, but it can be increased as per the need by changing the HDFS configuration.
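The block size is configurable. As a sketch, assuming a recent Hadoop release where the property is dfs.blocksize, it can be set cluster-wide in hdfs-site.xml or supplied per file when the file is created; the path below is illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CustomBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Client-side override of the default block size (128 MB here).
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);
        FileSystem fs = FileSystem.get(conf);

        // The block size can also be passed explicitly for a single file:
        // create(path, overwrite, bufferSize, replication, blockSize)
        try (FSDataOutputStream out = fs.create(
                new Path("/user/demo/large.dat"),  // illustrative path
                true,                              // overwrite
                4096,                              // buffer size in bytes
                (short) 3,                         // replication factor
                128L * 1024 * 1024)) {             // block size in bytes
            out.writeUTF("this file is split into 128 MB blocks");
        }
    }
}
```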

Goals of HDFS

Fault detection and recovery: Since HDFS includes a large number of commodity hardware components, failure of components is frequent. Therefore, HDFS should have mechanisms for quick and automatic fault detection and recovery (a small replication sketch follows these goals).

Huge datasets: HDFS should have hundreds of nodes per cluster to manage the applications having huge datasets.

Hardware at data: A requested task can be done efficiently when the computation takes place near the data. Especially where huge datasets are involved, this reduces the network traffic and increases the throughput.
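The fault detection and recovery goal rests on block replication: each block is kept on several DataNodes, so losing one node costs only one replica. A minimal sketch, assuming the standard dfs.replication property and an illustrative file path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationForRecovery {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Number of copies kept for each block of newly written files.
        conf.setInt("dfs.replication", 3);
        FileSystem fs = FileSystem.get(conf);

        // Replication can also be raised for an existing file.
        Path file = new Path("/user/demo/reports/hello.txt");  // illustrative path
        fs.setReplication(file, (short) 3);
        System.out.println("Replication factor: "
                + fs.getFileStatus(file).getReplication());
    }
}
```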

