Author: Deepak Vohra
Publisher: Apress
Year: 2016
Pages: 429
Language: English
Format: pdf (true), epub
Size: 32.2 MB
This book is a practical guide to using the Apache Hadoop projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout and Apache Solr. From setting up the environment to running sample applications, each chapter in this book is a practical tutorial on using an Apache Hadoop ecosystem project.
While several books on Apache Hadoop are available, most are based on the main projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects and how they all work together as a cohesive big data development platform.
I want to welcome you to the world of Hadoop. If you are a novice or an expert looking to expand your knowledge of the technology, then you have arrived at the right place. This book contains a wealth of knowledge that can help the former become the latter. Even most experts in a technology focus on particular aspects. This book will broaden your horizons and add valuable tools to your toolbox of knowledge.
Hadoop is a large-scale distributed processing system designed for clusters of hundreds or thousands of nodes, each with multi-core CPUs. It distributes and processes large quantities of data across the nodes in the cluster. Hadoop requires no special hardware and is designed to run on commodity hardware. It may be used with any type of data, structured or unstructured. Hadoop is not a database, nor does it replace traditional database systems. It is designed for storing and processing large-scale data, a scenario that most traditional systems do not support or do not support efficiently. Hadoop can join and aggregate data from many different sources and deliver results to other enterprise systems for further analysis. It is designed to process web-scale data on the order of hundreds of GB to hundreds of TB, and even several PB.
What You Will Learn:
Set up the environment in Linux for Hadoop projects using Cloudera Hadoop Distribution CDH 5
Run a MapReduce job
Store data with Apache Hive and Apache HBase
Index data in HDFS with Apache Solr
Develop a Kafka messaging system
Stream logs to HDFS with Apache Flume
Transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
Create a Hive table over Apache Solr
Develop a Mahout User Recommender System
Who This Book Is For:
Apache Hadoop developers. Prerequisite knowledge of Linux and some knowledge of Hadoop are required.
Download Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools