
Author: Kelly P. Vincent
Publisher: Apress
Year: 2025
Pages: 901
Language: English
Format: pdf (true), epub
Size: 31.0 MB
Curious about Data Science but not sure where to start? This book is a beginner-friendly guide to what Data Science is and how people use it. It walks you through the essential topics—what data analysis involves, which skills are useful, and how terms like “data analytics” and “machine learning” connect—without getting too technical too fast.
Data Science isn’t just about crunching numbers, pulling data from a database, or running fancy algorithms. It’s about asking the right questions, understanding the process from start to finish, and knowing what’s possible (and what’s not). This book teaches you all of that, while also introducing important topics like ethics, privacy, and security—because working with data means thinking about people, too.
Whether you're a student exploring new skills, a professional navigating data-driven decisions, or someone considering a career change, this book is your friendly gateway into the world of Data Science, one of today’s most exciting fields. No coding or programming experience? No problem. You'll build a solid foundation and gain the confidence to engage with Data Science concepts, just as AI and data become increasingly central to everyday life.
Learning a programming language is a required step for most data scientist positions. Data scientists generally don’t do all types of software development; most won’t be doing front-end work (like web development). Instead, they mostly work with single-file scripts or notebooks. If you have to do your own productionizing, you’ll probably also sometimes design multi-file software to carry out your modeling, and you may use other tools to take your code into production. Python and R are the two most common languages used by data scientists. This chapter introduces the basics of programming in languages like these and looks briefly at how the two differ.
Programming languages and other computing tools have been used in Data Science for a long time, but the dominance of the primary languages used today, Python and R, has come about more recently. R was originally developed in the 1990s to help teach statistics, as an implementation of the older S language, and much code written for S runs in R without modification. It is still a great language for learning statistics and programming, but it quickly spread beyond teaching into the general statistics world. It’s still heavily used by statisticians and now by data scientists.
Python came into Data Science a little later, but it’s starting to eclipse R in popularity among data scientists. Python started as a general-purpose language with particular strengths in text processing and concise syntax (doing more in fewer lines). It wasn’t really used for Data Science until the Pandas package (an add-on for the core language) was released in 2009 and Scikit-learn (the main Machine Learning package) in 2010. Pandas made Python programming more R-like by adding data frames, the primary data structure in R. Data Science itself was starting to take off around then, and programmers who were already using Python naturally picked up Pandas and Scikit-learn as they started doing Data Science, so Python’s popularity in the field took off as well.
Download A Friendly Guide to Data Science: Everything You Should Know About the Hottest Field in Tech
