Автор: Kevin Sitto, Marshall Presser
Издательство: O'Reilly Media
Год: 2015
ISBN: 9781491947937
Формат: pdf
Страниц: 132
Размер: 6,4 mb
Язык: English
You’ll quickly understand how Hadoop’s projects, subprojects, and related technologies work together.
Each chapter introduces a different topic—such as core technologies or data transfer—and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you’ll have a good grasp of the playing field.
Topics include:
– Core technologies—Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark
– Database and data management—Cassandra, HBase, MongoDB, and Hive
– Serialization—Avro, JSON, and Parquet
– Management and monitoring—Puppet, Chef, Zookeeper, and Oozie
– Analytic helpers—Pig, Mahout, and MLLib
– Data transfer—Scoop, Flume, distcp, and Storm
– Security, access control, auditing—Sentry, Kerberos, and Knox
– Cloud computing and virtualization—Serengeti, Docker, and Whirr