Mastering Data Engineering and Analytics with Databricks: A Hands-on Guide to Build Scalable Pipelines Using Databricks

Posted by: literator on 5-10-2024, 13:47, Comments: 0

Category: BOOKS » OS AND DATABASES

Title: Mastering Data Engineering and Analytics with Databricks: A Hands-on Guide to Build Scalable Pipelines Using Databricks, Delta Lake, and MLflow
Author: Manoj Kumar
Publisher: Orange Education Pvt Ltd, AVA
Year: 2024
Pages: 533
Language: English
Format: epub (true)
Size: 111.4 MB

Master Databricks to Transform Data into Strategic Insights for Tomorrow’s Business Challenges. In today’s data-driven world, mastering data engineering is crucial for driving innovation and delivering real business impact. Databricks is one of the most powerful platforms available, unifying the data, analytics, and AI requirements of numerous organizations worldwide. Mastering Data Engineering and Analytics with Databricks goes beyond the basics, offering a hands-on, practical approach tailored for professionals eager to excel in the evolving landscape of data engineering and analytics. This book blends foundational knowledge with advanced applications, equipping readers with the expertise to build, optimize, and scale data pipelines that meet real-world business needs. With a focus on actionable learning, it delves into complex workflows, including real-time data processing, advanced optimization with Delta Lake, and seamless ML integration with MLflow, all skills critical for today’s data professionals. Drawing from real-world case studies in the FMCG and CPG industries, this book not only teaches you how to implement Databricks solutions but also provides strategic insights into tackling industry-specific challenges. From setting up your environment to deploying CI/CD pipelines, you'll gain a competitive edge by mastering techniques that are directly applicable to your organization’s data strategy. By the end, you’ll not just understand Databricks; you’ll command it, positioning yourself as a leader in the data engineering space. This book is designed for data engineering students, aspiring data engineers, experienced data professionals, cloud data architects, data scientists, and analysts. A basic understanding of data engineering concepts, familiarity with data analytics, and some experience with cloud computing or programming languages such as Python or SQL will help readers fully benefit from the book’s content.
 

PostgreSQL 17 QuickStart Pro: Add expertise around WAL processing, JSON table, IO performance, logical replication and index vacuuming

Posted by: literator on 5-10-2024, 01:53, Comments: 0

Category: BOOKS » OS AND DATABASES

Title: PostgreSQL 17 QuickStart Pro: Add expertise around WAL processing, JSON table, IO performance, logical replication and index vacuuming
Author: Tessa Vorin
Publisher: GitforGits
Year: 2024
Pages: 257
Language: English
Format: pdf, azw3, epub, mobi
Size: 10.1 MB

PostgreSQL 17 QuickStart Pro is the definitive hands-on, practical book for professionals at every level, from entry-level administrators to seasoned experts. It provides rapid learning and practical implementation of PostgreSQL 17, focusing on the latest features and best practices to effectively manage, configure, and optimize PostgreSQL databases. The book begins by using the Titanic dataset to illustrate practical examples of upgrade strategies, post-upgrade validation, and database configuration. It then covers cluster administration, configuration settings, and performance tracking. You will master the management of permissions and roles through intricate role hierarchies, authentication methods, and security settings. From there, you'll optimize server performance, plan queries, and manage resources based on real performance data. The following section dives deep into complex data types, bulk data operations, advanced indexing methods, and the creation of triggers and functions, all with an emphasis on effective data management. After that, you will learn about table partitioning strategies, performing physical and logical backups, database restoration, and process automation using BART. The book then moves on to streaming replication, showing how to configure, administer, and monitor replication to ensure optimal uptime. Finally, it explores point-in-time recovery (PITR), which restores a database to a specific point in time by replaying the write-ahead log (WAL). In short, this book will equip database administrators with the knowledge and skills to confidently handle PostgreSQL 17 databases.
 

Learn SQL in less than 20 hours: Understand the Language and Do It Yourself

Posted by: literator on 3-10-2024, 16:08, Comments: 0

Category: BOOKS » OS AND DATABASES

Title: Learn SQL in less than 20 hours: Understand the Language and Do It Yourself
Author: Juan Pablo Romero Aguirre
Publisher: Independently published
Year: 2024
Pages: 213
Language: English
Format: pdf, epub
Size: 10.1 MB

Learn SQL in Less Than 20 Hours is your essential guide to mastering SQL and database management. Designed for beginners, this book takes you step by step through the fundamentals and basic commands of SQL, with a practical approach that allows you to start applying what you learn from day one. In the information age, data is the most valuable resource. Whether you're interested in programming, data analysis, or simply want to acquire a new skill, this book provides you with a solid foundation in SQL, an indispensable language for working with databases and extracting value from information. This book is structured as a practical course divided into five days, combining theory and exercises that help you consolidate your knowledge. Each chapter is designed to simulate the time you would spend in a real training environment, ensuring effective and results-oriented learning. At the end of the book, you will find an appendix with practical exercises to solidify your knowledge and become proficient in SQL. Additionally, you have access to a GitHub repository where exercises and code examples are regularly updated, ensuring you always have up-to-date material to practice with.
 

ScyllaDB in Action (Final Release)

Posted by: literator on 2-10-2024, 19:33, Comments: 0

Category: BOOKS » OS AND DATABASES

Title: ScyllaDB in Action (Final Release)
Author: Bo Ingram
Publisher: Manning Publications
Year: 2025
Pages: 394
Language: English
Format: pdf (true)
Size: 10.9 MB

Build, maintain, and run databases that are easy to scale and quick to query—all with ScyllaDB. ScyllaDB in Action is your guide to everything you need to know about ScyllaDB, from your very first queries to running it in a production environment. It starts you with the basics of creating, reading, and deleting data and expands your knowledge from there. You'll soon have mastered everything you need to build, maintain, and run an effective and efficient database. This book teaches you ScyllaDB the best way—through hands-on examples. Dive into the node-based architecture of ScyllaDB to understand how its distributed systems work, how you can troubleshoot problems, and how you can constantly improve performance. ScyllaDB is a versatile NoSQL database that can move large volumes of data fast. Very, very, very fast. This drop-in replacement for Cassandra takes full advantage of modern multi-core hardware and scales to handle large real-time data workloads with incredibly low latency. It features built-in monitoring and management tools, and its efficient use of computing resources can save a lot of money on high-volume applications. ScyllaDB in Action is written for anyone looking to learn ScyllaDB or work with it. To get the best out of it, you should have some basic familiarity with SQL. You’ve probably written a SELECT statement before, and that knowledge will assist you throughout the book as you learn about Scylla. If you’re a database expert, that’s okay too! You’ll get to break some habits and pick up some new ones to effectively use ScyllaDB. You should also have some experience with a programming language—preferably Python, as you’ll use it to build the sample application to learn about the database driver and its client-side features.
 

Mastering Microsoft Access 2024

Posted by: literator on 29-09-2024, 20:22, Comments: 0

Category: BOOKS » OS AND DATABASES

Title: Mastering Microsoft Access 2024: A Comprehensive Guide to Designing, Managing, and Optimizing Your Databases with Step-by-Step Tutorials and Expert Tips - From Novice to Database
Author: Thaddeus Locke
Publisher: Independently published
Year: 2024
Pages: 718
Language: English
Format: epub
Size: 28.0 MB

Welcome to the ultimate guide to mastering Microsoft Access 2024! In today's data-driven world, organizing and analyzing information is paramount, and Microsoft Access remains one of the most powerful tools for managing databases efficiently. Whether you're a seasoned professional seeking to enhance your skills or a newcomer eager to harness the potential of Access, this book is your comprehensive companion on the journey to becoming an Access expert. In Chapter 1, you will learn the vocabulary used in Microsoft Access databases: databases, tables, records, fields, and values, as well as other relational databases. The objects in an Access database, which include tables, queries, forms for data entry and display, reports, macros, and VBA, are also discussed in detail in this chapter. Lastly, you will learn about the step-by-step design method to follow when planning database objects.
 

Windows 11 fur Einsteiger - September 2024

Posted by: magnum on 28-09-2024, 16:45, Comments: 0

Category: BOOKS » OS AND DATABASES

Title: Windows 11 fur Einsteiger - September 2024
Author: Papercut Limited
Publisher: Papercut Limited
Year: 2024
Pages: 76
Format: PDF
Language: German
Size: 38.6 MB

"Windows 11 for Beginners" is a complete guide for new PC and laptop users, as well as for anyone who wants to learn everything needed to get started with Microsoft's operating system. This independent guide is packed with useful tips and fully illustrated step-by-step instructions written in plain, clear language. Its pages teach you everything you need to know about your computer and its applications by first making sense of the operating system they run on. With this unofficial user guide, no problem goes unsolved and no question goes unanswered as you learn, explore, and improve your Windows 11 experience.
 

Business Intelligence, Analytics, Data Science, and AI: A Managerial Perspective, 5th Edition

Posted by: literator on 27-09-2024, 13:50, Comments: 0

Category: BOOKS » OS AND DATABASES

Title: Business Intelligence, Analytics, Data Science, and AI: A Managerial Perspective, 5th Edition
Author: Ramesh Sharda, Dursun Delen, Efraim Turban
Publisher: Pearson Education, Inc.
Year: 2024
Pages: 727
Language: English
Format: pdf (true)
Size: 39.2 MB

Business Intelligence, Analytics, Data Science, and AI is your guide to the business-related impact of Artificial Intelligence (AI), Data Science and analytics, designed to prepare you for a managerial role. The text's vignettes and cases feature modern companies and non-profit organizations and illustrate capabilities, costs and justifications of BI across various business units. With coverage of many data science/AI applications, you'll explore tools, then learn from various organizations' experiences employing such applications. Ample hands-on practice is provided, can be completed with a range of software, and will help you use analytics as a future manager. The 5th Edition integrates the fully updated content of Analytics, Data Science, and Artificial Intelligence, 11/e and Business Intelligence, Analytics, and Data Science, 4/e into one textbook, strengthened by 4 new chapters that will equip you for today's analytics and AI tech, such as ChatGPT. Examples explore analytics in sports, gaming, agriculture and “data for good.” Hadoop is an open-source framework for processing, storing, and analyzing massive amounts of distributed, unstructured data. A related new style of database called NoSQL (Not Only SQL) has emerged to, like Hadoop, process large volumes of multistructured data. Evolving out of the traditional artificial neural networks (ANN), Deep Learning is changing the very foundation of how Machine Learning works. Thanks to large collections of data and improved computational resources, Deep Learning is making a profound impact on how computers can discover complex patterns using the self-extracted features from the data (as opposed to a data scientist providing the feature vector to the learning algorithm).
 

Working with Network Data: A Data Science Perspective

Posted by: literator on 26-09-2024, 15:46, Comments: 0

Category: BOOKS » OS AND DATABASES

Title: Working with Network Data: A Data Science Perspective
Author: James Bagrow, Yong‐Yeol Ahn
Publisher: Cambridge University Press
Year: 2024
Pages: 554
Language: English
Format: pdf
Size: 15.1 MB

Drawing examples from real-world networks, this essential book traces the methods behind network analysis and explains how network data is first gathered, then processed and interpreted. The text will equip you with a toolbox of diverse methods and data modelling approaches, allowing you to quickly start making your own calculations on a huge variety of networked systems. This book sets you up to succeed, addressing the questions of what you need to know and what to do with it, when beginning to work with network data. The hands-on approach adopted throughout means that beginners quickly become capable practitioners, guided by a wealth of interesting examples that demonstrate key concepts. Exercises using real-world data extend and deepen your understanding, and develop effective working patterns in network calculations and analysis. There are great textbooks on network science. We complement these with a focus on the practical side of network science—working with network data. The purpose of this book is to provide a more practical guide for data scientists to use network science. Machine Learning, also called statistical learning, is the science and technology of building predictive models that generalize from data. Machine Learning has not only disrupted many industries, it’s revolutionizing how science is done. Because of its ubiquity, network data are also heavily leveraged by Machine Learning. Thus, scientists working with network data (and all forms of data) can benefit from the tools and techniques of Machine Learning. There are several types of Machine Learning methods: supervised learning and unsupervised learning are the primary categories and then there are self-supervised learning, reinforcement learning, and more. Suitable for both graduate students and researchers across a range of disciplines, this novel text provides a fast-track to network data expertise.
 

Apache Hudi: The Definitive Guide (Early Release)

Posted by: literator on 26-09-2024, 02:28, Comments: 0

Category: BOOKS » OS AND DATABASES

Title: Apache Hudi: The Definitive Guide: Building Robust, Open, and High-Performing Data Lakehouses (Early Release)
Author: Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, Rebecca Bilbro
Publisher: O’Reilly Media, Inc.
Year: 2024-09-24
Language: English
Format: pdf, epub, mobi
Size: 10.1 MB

Overcome challenges in building transactional guarantees on rapidly changing data by using Apache Hudi. With this practical guide, data engineers, data architects, and software architects will discover how to seamlessly build an interoperable lakehouse from disparate data sources and deliver faster insights using their query engine of choice. Authors Shiyan Xu, Prashant Wason, Sudha Saktheeswaran, and Rebecca Bilbro provide practical examples and insights to help you unlock the full potential of data lakehouses for different levels of analytics, from batch to interactive to streaming. You'll also learn how to evaluate storage choices and leverage built-in automated table optimizations to build, maintain, and operate production data applications. The data platform layer can then become a limiting factor for innovation, straining to provide data fresh enough for analytics, and slowing down use cases of Machine Learning and AI. Herein lies a key advantage of using Hudi to empower analytics for this next generation of data-intensive applications. Hudi is designed to provide native support for near real-time analytics as well as time travel, and this is most evident in the different ways in which data can be read from Hudi.
 

Streaming Databases: Unifying Batch and Stream Processing

Posted by: literator on 25-09-2024, 14:28, Comments: 0

Category: BOOKS » OS AND DATABASES

Title: Streaming Databases: Unifying Batch and Stream Processing (Final Release)
Author: Hubert Dulay, Ralph M. Debusmann
Publisher: O’Reilly Media, Inc.
Year: 2024
Pages: 260
Language: English
Format: True PDF, True EPUB (Retail Copy)
Size: 15.4 MB

Real-time applications are becoming the norm today. But building a model that works properly requires real-time data from the source, in-flight stream processing, and low latency serving of its analytics. With this practical book, data engineers, data architects, and data analysts will learn how to use streaming databases to build real-time solutions. Authors Hubert Dulay and Ralph M. Debusmann take you through streaming database fundamentals, including how these databases reduce infrastructure for real-time solutions. You'll learn the difference between streaming databases, stream processing, and real-time online analytical processing (OLAP) databases. And you'll discover when to use push queries versus pull queries, and how to serve synchronous and asynchronous data emanating from streaming databases. So what is a streaming database? Database systems have many different flavors, from traditional relational databases to XML, graph, object, vector, and NoSQL databases. Many of these are well known and have been established for many decades. Streaming, or stream processing, is much less established, although it has seen a steep adoption rate in the industry over the past decade or so, led by the rise of Apache Kafka as the de facto streaming platform. Whether you’re a seasoned database engineer or a novice developer, this book guides you to unlocking the full potential of streaming databases and embracing the future of data processing.