Python for Data Engineers

Senior data engineer with extensive experience leading data engineering, data architecture, big data, cloud data services, and software development projects in industries such as software, semiconductors, SaaS, web, e-commerce, retail, banking, and media and entertainment. Very enthusiastic about sharing knowledge, teaching, and volunteering.
Python is a fairly large language, but the knowledge a Data Engineer role typically requires is limited to a handful of topics. The ideal approach is to stay focused on the minimum set of features and concepts a Data Engineer needs, and to grow that skill as the need arises. This way, we can shorten the time it takes to learn the language.
I split the material in two parts: first, getting familiar with Python itself, its structure and syntax; then, the specific knowledge a Data Engineer needs. I also leave topics such as Pandas and PySpark out of this article; because of their importance, I will cover them in separate articles.
Basic Concepts
Syntax and Data Structures:
Understanding Python syntax, variables, basic data types (integers, floats, strings, booleans), and fundamental data structures like lists, tuples, dictionaries, and sets.
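As a rough illustration of the data structures listed above, here is a minimal sketch that uses a list of dictionaries, a set, a dictionary, and a tuple together. The order records and field names are hypothetical and exist only for this example.

```python
# Hypothetical order records, used only for illustration.
orders = [
    {"order_id": 1, "customer": "ana", "amount": 120.50, "paid": True},
    {"order_id": 2, "customer": "luis", "amount": 75.00, "paid": False},
    {"order_id": 3, "customer": "ana", "amount": 30.25, "paid": True},
]

# Set: keeps each customer name only once.
customers = {order["customer"] for order in orders}

# Dictionary: total paid amount per customer.
paid_totals = {}
for order in orders:
    if order["paid"]:
        name = order["customer"]
        paid_totals[name] = paid_totals.get(name, 0.0) + order["amount"]

# Tuple: a small immutable record describing the largest order.
largest = max(orders, key=lambda o: o["amount"])
largest_summary = (largest["order_id"], largest["amount"])

print(sorted(customers))
print(paid_totals)
print(largest_summary)
```

Lists of dictionaries like this one are a common in-memory shape for rows read from a file or an API, which is why they are worth getting comfortable with early.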
Control Flow:
Proficiency in using conditional statements (if/elif/else), loops (for, while), and error handling with try-except blocks.
Functions:
Defining and calling functions, understanding arguments, return values, and scope.
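The control flow and function concepts above can be sketched together in one small example. The function names and the threshold of 100 are assumptions made up for illustration, not part of any library.

```python
def parse_amount(raw, default=0.0):
    """Convert a raw value to float, returning `default` when it cannot be parsed."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        # float(None) raises TypeError, float("abc") raises ValueError.
        return default


def classify(amount):
    """Label an amount using if/elif/else (the threshold is arbitrary)."""
    if amount >= 100:
        return "large"
    elif amount > 0:
        return "small"
    else:
        return "invalid"


def summarize(raw_values):
    """Count how many values fall into each label."""
    # `counts` is local to this function; callers only see the returned dict.
    counts = {"large": 0, "small": 0, "invalid": 0}
    for raw in raw_values:
        counts[classify(parse_amount(raw))] += 1
    return counts


result = summarize(["120.5", "abc", None, "15", "-3"])
print(result)
```

Catching only the specific exceptions you expect, rather than a bare except, keeps unrelated bugs visible instead of silently turning them into default values.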
File Handling:
Reading from and writing to various file formats, including CSV, JSON, and potentially more advanced formats like Parquet or Avro.
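For CSV and JSON, the standard library is usually enough. The sketch below writes a CSV file, reads it back, and stores the same rows as JSON; the file names and rows are hypothetical, and a temporary directory is used so nothing is left on disk. Parquet and Avro are not covered here because they typically need a third-party library such as pyarrow or fastavro.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical rows; the csv module reads every value back as a string.
rows = [
    {"id": "1", "city": "Guadalajara"},
    {"id": "2", "city": "Monterrey"},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "cities.csv"
    json_path = Path(tmp) / "cities.json"

    # Write the rows as CSV with a header line.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "city"])
        writer.writeheader()
        writer.writerows(rows)

    # Read the CSV back into a list of dictionaries.
    with csv_path.open(newline="", encoding="utf-8") as f:
        loaded = [dict(row) for row in csv.DictReader(f)]

    # Store the same data as JSON and load it again.
    json_path.write_text(json.dumps(loaded, indent=2), encoding="utf-8")
    round_trip = json.loads(json_path.read_text(encoding="utf-8"))

print(round_trip)
```

The with statement closes each file even if an error occurs in the middle of the block, which matters in long-running pipelines.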
Object-Oriented Programming (OOP) Basics:
Grasping concepts like classes, objects, inheritance, and encapsulation for building modular and reusable code.
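A minimal sketch of these OOP ideas follows. The class names Source and FilteredSource are invented for this example; they only show how a subclass can reuse and specialize behavior while keeping its internal state behind methods.

```python
class Source:
    """Hypothetical base class for an in-memory data source."""

    def __init__(self, name):
        self.name = name
        # Leading underscore: internal state by convention (encapsulation).
        self._records = []

    def load(self, records):
        self._records = list(records)

    def count(self):
        return len(self._records)

    def describe(self):
        return f"{self.name}: {self.count()} records"


class FilteredSource(Source):
    """Subclass that keeps only the records accepted by a predicate."""

    def __init__(self, name, predicate):
        super().__init__(name)
        self._predicate = predicate

    def load(self, records):
        # Reuse the parent's load after filtering (inheritance + override).
        super().load(r for r in records if self._predicate(r))


src = FilteredSource("positive_numbers", lambda x: x > 0)
src.load([3, -1, 7, 0])
summary = src.describe()
print(summary)
```

Only load is overridden; count and describe are inherited unchanged, which is the kind of reuse that makes this style useful for families of similar extractors or loaders.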
Data Engineering-Specific Concepts and Libraries
Pandas:
Essential for data manipulation and analysis, including reading and writing data, filtering, sorting, merging, and aggregating dataframes.
NumPy:
Important for numerical operations and working with arrays, especially when dealing with large datasets or mathematical computations.
Database Connectivity:
Utilizing libraries like SQLAlchemy, psycopg2 (for PostgreSQL), or pymysql (for MySQL) to connect to and interact with databases.
API Interaction:
Using the requests library to interact with RESTful APIs for data extraction or integration.
Datetime Handling:
Working with datetime objects, managing timezones, and converting between different date and time formats.
Automation and Orchestration:
Understanding how Python can be used to automate tasks, potentially including basic scripting for cron jobs or interacting with orchestration tools like Apache Airflow.
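To tie the database connectivity and datetime handling items above to a small automatable script, here is a sketch that normalizes timestamps to UTC and loads them into a database. It uses the standard library's sqlite3 module with an in-memory database as a stand-in for PostgreSQL or MySQL drivers, and fixed UTC offsets instead of named time zones (the standard zoneinfo module handles those). The event data, table name, and to_utc helper are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical events: ISO 8601 timestamps with explicit UTC offsets.
raw_events = [
    ("login", "2024-03-01T08:30:00-06:00"),
    ("purchase", "2024-03-01T15:45:00+01:00"),
]


def to_utc(iso_text):
    """Parse an offset-aware ISO 8601 string and convert it to UTC."""
    return datetime.fromisoformat(iso_text).astimezone(timezone.utc)


# In-memory SQLite database, standing in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, occurred_at_utc TEXT)")

# Parameterized inserts (the ? placeholders) avoid building SQL by hand.
conn.executemany(
    "INSERT INTO events (name, occurred_at_utc) VALUES (?, ?)",
    [(name, to_utc(ts).isoformat()) for name, ts in raw_events],
)
conn.commit()

rows = conn.execute(
    "SELECT name, occurred_at_utc FROM events ORDER BY occurred_at_utc"
).fetchall()
conn.close()

# Converting to another text format with strftime.
display = to_utc(raw_events[0][1]).strftime("%Y-%m-%d %H:%M")
print(rows)
print(display)
```

A script shaped like this, with no interactive input, is the kind of unit that could be scheduled with cron or wrapped as a task in an orchestrator such as Apache Airflow.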
Testing and Debugging:
Writing unit tests for code and effectively debugging Python scripts to identify and resolve issues.
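As one possible starting point for unit testing, the sketch below uses the standard unittest module to test a small cleaning function. The normalize_email function is hypothetical and written only for this example.

```python
import unittest


def normalize_email(value):
    """Hypothetical cleaning step: trim whitespace and lowercase an email."""
    if not isinstance(value, str) or "@" not in value:
        raise ValueError(f"not an email: {value!r}")
    return value.strip().lower()


class NormalizeEmailTest(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_email("  Ana@Example.COM "), "ana@example.com")

    def test_rejects_invalid_input(self):
        with self.assertRaises(ValueError):
            normalize_email("no-at-sign")
        with self.assertRaises(ValueError):
            normalize_email(None)


# Run the tests programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeEmailTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

In a real project the tests would normally live in their own file and be run with python -m unittest. For debugging, the built-in breakpoint() function pauses execution and opens the debugger at the line where it is called.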
Interesting Courses and YouTube Topics
Courses
Free:
Paid:
Extras
Work through the Python exercises and challenges on sites such as HackerRank to speed up the learning process:
I’ll keep updating the article; sorry for the Spanish-English mix.



