Author: Jerald Savin
Publisher: Routledge
Year: 2023
Pages: 243
Language: English
Format: PDF (true)
Size: 10.2 MB
Pulling aside the curtain of ‘Big Data’ buzz, this book introduces C-suite and other non-technical senior leaders to the essentials of obtaining and maintaining accurate, reliable data, especially for decision-making purposes. Bad data begets bad decisions, and an understanding of data fundamentals (how data is generated, organized, stored, evaluated, and maintained) has never been more important when solving problems such as the pandemic-related supply chain crisis. This book addresses the data-related challenges that businesses face, answering questions such as:
What are the characteristics of high-quality data?
How do you get from bad data to good data?
What procedures and practices ensure high-quality data?
How do you know whether your data supports the decisions you need to make?
This book is designed to highlight the issues that regularly arise when working with Data. It is intended to be a reasonably comprehensive enumeration of these issues, their impact on decision-making, and the ‘good practices’ that address them effectively. The goal is to provide a common language with which to discuss the factors and forces that affect Data, Data Quality, and hence the Quality of our Decisions.
What is Data? The question is both simple and complicated. The simple answer is that data is the stuff stored in databases. It was previously captured by computer applications, by websites, or by sensors and placed in databases for operational use and for future analytical use. A more nuanced definition of data considers factors such as where the data came from, where it is stored, how it will be used, and how it might predict the future. This is a long way from data as a bunch of zeros (0’s) and ones (1’s).
Data Management is perhaps the broadest, most encompassing term in the field: a widely used, overarching label that covers all the domains, disciplines, techniques, and tools used to manage data. In other words, Data Management includes everything involved in managing, manipulating, storing, and using data.
Is Big Data a big deal? Maybe yes and maybe no, depending upon the circumstances. The common claim that ‘Big Data is a big deal’ is not necessarily true. Big Data is not necessarily an issue for all businesses; it could represent big opportunities. And newer technologies are making Big Data more accessible to businesses regardless of size or resources. What distinguishes Big Data? This question is usually answered in terms of V’s. Originally there were three V’s, but over time the number has grown to eight or nine. The original three V’s were:
- Variety
- Velocity
- Volume
Big Data is big business. Data Brokers are the companies that acquire, process, and resell Big Data. They are in the business of Big Data. The most recognizable Data Brokers are probably the credit reporting agencies. They collect, aggregate, organize, cleanse, and distribute financial information about consumer and business financial activities, but they are not alone. There are many other companies in the Big Data Pipeline. The originators of Big Data are very familiar. They collect and/or generate and resell business and personal data. We certainly recognize Google, Facebook, Instagram, TikTok, and Amazon, as well as other big players in the search, social media, and retail spaces. We also recognize the banks, the supermarkets, the big box stores, and other retail establishments with which we interact daily. Their business activities are sources of Big Data.
Databases are the technical applications designed specifically to hold data. SQL has long been the standard for transaction processing applications and their data; hence, much of the material in this book refers to SQL. But SQL is not alone. NoSQL was developed to compensate for some of SQL’s limitations. NoSQL can also be read as ‘not only SQL,’ which in effect confirms the dominance of SQL in the database marketplace. But today SQL must share the spotlight with four basic NoSQL models. They are:
- Key-Value Data Stores
- Document-based Data Stores
- Columnar-based Data Stores
- Graph-based Data Stores
Each of these models has its own strengths. In general, they have been described as schema-less. Schema-agnostic is probably a better description. Either way, they have flexibility not found in SQL. SQL works great for structured transaction data, but what happens when databases are confronted with unstructured data, such as social media posts and other text or narrative content, and the datasets get really large? Enter the NoSQL alternatives.
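To make the four NoSQL models a little more concrete, here is a minimal sketch in Python. It is not taken from the book: the sample order, the field names, and the helper `items_bought_by` are all hypothetical, and plain dictionaries and lists stand in for real database products. It only shows how the same record might be shaped under each model.

```python
import json

# Hypothetical sample record; all names and values are illustrative only.
order = {"order_id": 1001, "customer": "Ada", "items": ["pen", "ink"], "total": 12.5}

# Key-value store: an opaque value looked up by a single key.
key_value_store = {f"order:{order['order_id']}": json.dumps(order)}

# Document store: the record kept whole as a nested, self-describing document.
document_store = {"orders": [order]}

# Columnar store: values grouped by column rather than by row.
columnar_store = {
    "order_id": [order["order_id"]],
    "customer": [order["customer"]],
    "total": [order["total"]],
}

# Graph store: entities as nodes, relationships as labelled edges.
graph_store = {
    "nodes": {"customer:Ada", "order:1001", "item:pen", "item:ink"},
    "edges": [
        ("customer:Ada", "PLACED", "order:1001"),
        ("order:1001", "CONTAINS", "item:pen"),
        ("order:1001", "CONTAINS", "item:ink"),
    ],
}


def items_bought_by(customer_node, graph):
    """Follow PLACED, then CONTAINS edges to list a customer's items."""
    orders = {
        dst for src, rel, dst in graph["edges"]
        if src == customer_node and rel == "PLACED"
    }
    return sorted(
        dst for src, rel, dst in graph["edges"]
        if src in orders and rel == "CONTAINS"
    )
```

In this sketch, `items_bought_by("customer:Ada", graph_store)` walks the relationships directly, which is the kind of question a graph model is suited to, while the key-value form can only hand back the whole stored value for a known key.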
Download The Discipline of Data: What Non-Technical Executives Don't Know About Data and Why It's Urgent They Find Out