Storing data to SequenceFile from Apache Pig

user376536 | hadoop | 2022-5-7 16:27

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:

REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;

DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

log = LOAD '/data/logs' USING SequenceFileLoader AS (...);

Is there also a library out there that would allow writing to Hadoop sequence files from Pig?

Answer

It's just a matter of implementing a StoreFunc to do so.

This is possible now, although it will become a fair bit easier once Pig 0.7 comes out, as it includes a complete redesign of the Load/Store interfaces.
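Since the answer leaves the StoreFunc itself to the reader, here is a minimal sketch of one written against the redesigned Load/Store API (Pig 0.7 and later). It assumes each tuple's first two fields should become a Text key and a Text value, and it delegates the actual file writing to Hadoop's new-API SequenceFileOutputFormat. The class name TextSequenceFileStorer and the Text/Text mapping are hypothetical choices for illustration, not an existing PiggyBank class, and the code needs the Pig and Hadoop jars on the classpath to compile.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

/**
 * Hypothetical StoreFunc (sketch): writes the first two fields of each
 * tuple as a Text key / Text value pair into a Hadoop SequenceFile.
 */
public class TextSequenceFileStorer extends StoreFunc {

    private RecordWriter<Text, Text> writer;
    // Reused Writable instances, so no allocation happens per record.
    private final Text key = new Text();
    private final Text value = new Text();

    @Override
    @SuppressWarnings("rawtypes")
    public OutputFormat getOutputFormat() throws IOException {
        // Hadoop's SequenceFile output format handles the on-disk format.
        return new SequenceFileOutputFormat<Text, Text>();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        // SequenceFileOutputFormat reads these when creating the file header.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(location));
    }

    @Override
    @SuppressWarnings({"rawtypes", "unchecked"})
    public void prepareToWrite(RecordWriter writer) throws IOException {
        // Pig passes in the RecordWriter built from getOutputFormat().
        this.writer = writer;
    }

    @Override
    public void putNext(Tuple t) throws IOException {
        if (t.size() < 2) {
            throw new IOException("Expected at least 2 fields, got " + t.size());
        }
        key.set(toStr(t.get(0)));
        value.set(toStr(t.get(1)));
        try {
            writer.write(key, value);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
    }

    // Null fields are written as empty strings; everything else via toString().
    private static String toStr(Object field) {
        return field == null ? "" : field.toString();
    }
}
```

Packaged into a jar, it would be used the same way as the loader above: REGISTER the jar, then STORE pairs INTO '/data/out' USING TextSequenceFileStorer(); (the jar, alias and output path are placeholders). Delegating to SequenceFileOutputFormat keeps the StoreFunc thin: Pig only converts tuples into Writables, while Hadoop takes care of the file format and any output compression configured on the job.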

The "Hadoop expansion pack" that Twitter open-sourced on GitHub includes code for generating Load and Store funcs based on Google Protocol Buffers (building on Input/Output formats for the same; you already have those for sequence files, obviously). Check it out if you need examples of how to do some of the less trivial stuff. It should be fairly straightforward, though.