Operating HDFS with the Java API: Detailed Examples

Updated: 2022-08-24 11:35:49 Author: bitcarmanlee

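This article walks through detailed examples of operating HDFS through the Java API, starting with how to list all files and directories under a path with the listStatus method. Each operation is illustrated with example code (written in Scala, calling the Hadoop Java API from a Spark job).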

1. Listing all files and directories under a directory

The listStatus method covers this requirement.
Its signature is as follows:

  /**
   * List the statuses of the files/directories in the given path if the path is
   * a directory.
   * 
   * @param f given path
   * @return the statuses of the files/directories in the given patch
   * @throws FileNotFoundException when the path does not exist;
   *         IOException see specific implementation
   */
  public abstract FileStatus[] listStatus(Path f) throws FileNotFoundException, 
                                                         IOException;

As the signature shows, listStatus takes only a Path argument and returns an array of FileStatus.
A FileStatus carries the following information:

/** Interface that represents the client side information for a file.
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class FileStatus implements Writable, Comparable {

  private Path path;
  private long length;
  private boolean isdir;
  private short block_replication;
  private long blocksize;
  private long modification_time;
  private long access_time;
  private FsPermission permission;
  private String owner;
  private String group;
  private Path symlink;
  ....

As the fields show, a FileStatus holds the file path, the length, whether the entry is a directory, block_replication, blocksize, and so on.

import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.spark.SparkContext
import org.slf4j.LoggerFactory

object HdfsOperation {
	
	val logger = LoggerFactory.getLogger(this.getClass)
	
	def tree(sc: SparkContext, path: String) : Unit = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		val fsPath = new Path(path)
		val status = fs.listStatus(fsPath)
		for(filestatus:FileStatus <- status) {
			logger.error("getPermission is: {}", filestatus.getPermission)
			logger.error("getOwner is: {}", filestatus.getOwner)
			logger.error("getGroup is: {}", filestatus.getGroup)
			logger.error("getLen is: {}", filestatus.getLen)
			logger.error("getModificationTime is: {}", filestatus.getModificationTime)
			logger.error("getReplication is: {}", filestatus.getReplication)
			logger.error("getBlockSize is: {}", filestatus.getBlockSize)
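			if (filestatus.isDirectory) {
				// Directory: log its full path, then descend into it.
				val dirpath = filestatus.getPath.toString
				logger.error("directory path is: {}", dirpath)
				tree(sc, dirpath)
			} else {
				// File: log the fully qualified path and the bare file name.
				val fullname = filestatus.getPath.toString
				val filename = filestatus.getPath.getName
				logger.error("full file path is: {}", fullname)
				logger.error("file name is: {}", filename)
			}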
		}
	}
}

When a FileStatus turns out to be a directory, tree calls itself recursively on it, so the whole subtree is traversed.
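
The same recursion can also return its results instead of logging them. The following is a minimal sketch, not part of the original article: the object name HdfsTreeSketch and the method collectTree are made up for illustration, and the method takes a FileSystem directly so that it can be tried without a SparkContext.

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

// Hypothetical helper, named here only for illustration.
object HdfsTreeSketch {

  /** Recursively collects the status of every entry under `root`.
    * A directory is emitted before its contents (pre-order).
    */
  def collectTree(fs: FileSystem, root: Path): Seq[FileStatus] =
    fs.listStatus(root).toSeq.flatMap { status =>
      if (status.isDirectory) status +: collectTree(fs, status.getPath)
      else Seq(status)
    }
}
```

Inside a Spark job it could be called as HdfsTreeSketch.collectTree(FileSystem.get(sc.hadoopConfiguration), new Path("/tmp/demo")), where /tmp/demo is a placeholder path.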

2. Listing all files

The method above visits both files and directories. To visit only files, use the listFiles method.

	def findFiles(sc: SparkContext, path: String) = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		val fsPath = new Path(path)
		val files = fs.listFiles(fsPath, true)
		while(files.hasNext) {
			val filestatus = files.next()
			val fullname = filestatus.getPath.toString
			val filename = filestatus.getPath.getName
			logger.error("full file path is: {}", fullname)
			logger.error("file name is: {}", filename)
			logger.error("file size is: {}", filestatus.getLen)
		}
	}

  /**
   * List the statuses and block locations of the files in the given path.
   * 
   * If the path is a directory, 
   *   if recursive is false, returns files in the directory;
   *   if recursive is true, return files in the subtree rooted at the path.
   * If the path is a file, return the file's status and block locations.
   * 
   * @param f is the path
   * @param recursive if the subdirectories need to be traversed recursively
   *
   * @return an iterator that traverses statuses of the files
   *
   * @throws FileNotFoundException when the path does not exist;
   *         IOException see specific implementation
   */
  public RemoteIterator<LocatedFileStatus> listFiles(
      final Path f, final boolean recursive)
  throws FileNotFoundException, IOException {
  ...

As the source shows, listFiles returns an iterator, RemoteIterator<LocatedFileStatus>, whereas listStatus returns an array. In addition, every entry returned by listFiles is a file.
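
RemoteIterator is Hadoop's own interface rather than a java.util.Iterator, so it cannot be used directly with Scala collection methods. The sketch below, which is not part of the original article, wraps it in a Scala Iterator and uses that to sum file sizes. The names RemoteIteratorSketch, asScala and totalBytes are hypothetical.

```scala
import org.apache.hadoop.fs.{FileSystem, Path, RemoteIterator}

// Hypothetical helper, named here only for illustration.
object RemoteIteratorSketch {

  /** Adapts Hadoop's RemoteIterator to a Scala Iterator. */
  def asScala[T](remote: RemoteIterator[T]): Iterator[T] = new Iterator[T] {
    override def hasNext: Boolean = remote.hasNext()
    override def next(): T = remote.next()
  }

  /** Total size in bytes of all files under `root`, including subdirectories. */
  def totalBytes(fs: FileSystem, root: Path): Long =
    asScala(fs.listFiles(root, true)).map(_.getLen).sum
}
```

Because the wrapper is lazy, entries are still fetched on demand as the iterator is consumed, which is the main reason to prefer listFiles over an array for large trees.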

3. Creating a directory

	def mkdirToHdfs(sc: SparkContext, path: String) = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		val result = fs.mkdirs(new Path(path))
		if (result) {
			logger.error("mkdirs succeeded")
		} else {
			logger.error("mkdirs failed")
		}
	}
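
mkdirs behaves roughly like Unix mkdir -p: missing parent directories are created, and an already existing directory is not treated as an error. The sketch below is not part of the original article. It adds a check that the path really is a directory afterwards. The names MkdirSketch and ensureDirectory are hypothetical.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, named here only for illustration.
object MkdirSketch {

  /** Creates `dir` (and any missing parents) and verifies the result.
    * Returns true only if `dir` exists as a directory afterwards.
    */
  def ensureDirectory(fs: FileSystem, dir: Path): Boolean = {
    val created = fs.mkdirs(dir)
    created && fs.getFileStatus(dir).isDirectory
  }
}
```

Calling it twice on the same path should return true both times, which makes it suitable for setup code that may run repeatedly.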

4. Deleting a directory

	def deleteOnHdfs(sc: SparkContext, path: String) = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		val result = fs.delete(new Path(path), true)
		if (result) {
			logger.error("delete succeeded")
		} else {
			logger.error("delete failed")
		}
	}
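
A false return from delete does not say whether the path was missing or the deletion itself failed. The sketch below is not part of the original article. It separates the two cases by checking exists first. The names DeleteSketch, DeleteResult and deleteIfExists are hypothetical, and the check is not atomic: another client could remove the path between the two calls.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, named here only for illustration.
object DeleteSketch {

  sealed trait DeleteResult
  case object Deleted extends DeleteResult
  case object NotFound extends DeleteResult
  case object Failed extends DeleteResult

  /** Deletes `target`; `recursive` must be true to remove a non-empty directory. */
  def deleteIfExists(fs: FileSystem, target: Path, recursive: Boolean): DeleteResult =
    if (!fs.exists(target)) NotFound
    else if (fs.delete(target, recursive)) Deleted
    else Failed
}
```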

5. Uploading a file

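	def uploadToHdfs(sc: SparkContext, localPath: String, hdfsPath: String): Unit = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		fs.copyFromLocalFile(new Path(localPath), new Path(hdfsPath))
		// Not closed here: FileSystem.get returns a cached, shared instance that
		// Spark and the other methods in this object keep using.
	}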

6. Downloading a file

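	def downloadFromHdfs(sc: SparkContext, localPath: String, hdfsPath: String): Unit = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		fs.copyToLocalFile(new Path(hdfsPath), new Path(localPath))
		// Not closed here: FileSystem.get returns a cached, shared instance that
		// Spark and the other methods in this object keep using.
	}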
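
FileSystem.get normally hands out a cached instance that is shared by everything using the same configuration, so closing it can break later calls with a "Filesystem closed" error. If the handle should be closed after each copy, one option is a private instance from FileSystem.newInstance. The sketch below is not part of the original article. The names CopySketch, withFileSystem, upload and download are hypothetical, and it takes a Configuration (for example sc.hadoopConfiguration) rather than a SparkContext.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, named here only for illustration.
object CopySketch {

  /** Runs `body` with a private FileSystem instance and closes it afterwards. */
  def withFileSystem[A](conf: Configuration)(body: FileSystem => A): A = {
    val fs = FileSystem.newInstance(conf) // not the shared cached instance
    try body(fs) finally fs.close()
  }

  /** Uploads a local file, replacing the destination if it already exists. */
  def upload(conf: Configuration, localPath: String, hdfsPath: String): Unit =
    withFileSystem(conf) { fs =>
      // delSrc = false keeps the local file; overwrite = true replaces the target.
      fs.copyFromLocalFile(false, true, new Path(localPath), new Path(hdfsPath))
    }

  /** Downloads a file to local disk without writing a .crc checksum file. */
  def download(conf: Configuration, hdfsPath: String, localPath: String): Unit =
    withFileSystem(conf) { fs =>
      // delSrc = false keeps the remote file; useRawLocalFileSystem = true skips the .crc file.
      fs.copyToLocalFile(false, new Path(hdfsPath), new Path(localPath), true)
    }
}
```

The four-argument overloads make the overwrite and checksum behaviour explicit, which the two-argument calls leave to their defaults.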
