Seattle Data Guy
Going From Data Engineer To Head Of Data - How To Run A Data Team Successfully
You join a 1000 person company as the head of data. What should you do?
I would invest a lot of time up front to understand the business (especially if you haven't worked in the industry).
I was just talking to someone at the Snowflake Summit who told me that when they were recently put in charge of a data team, they made the mistake of responding with "Great, what tools can I use?"
If you can't answer the following questions within the first month or two, you're probably going in the wrong direction.
- What are the main drivers of the business?
- What are the main pain points of the customers?
- What does the business flow look like?
- Who is our customer?
- (What other questions would you add?)
If you're a Head of Data, you more than likely have the technical chops to excel in the role.
But if you don't have a strong understanding of the business, you'll either end up becoming a task taker who struggles to get out of ad-hoc requests or you'll build a bunch of fancy infrastructure that delivers nothing for the business.
If you enjoyed this video, check out some of my other top videos.
Top Courses To Become A Data Engineer In 2022
ua-cam.com/video/kW8_l57w74g/v-deo.html
What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
ua-cam.com/video/-ClWgwC0Sbw/v-deo.html
If you would like to learn more about data engineering, then check out Google's GCP certificate
bit.ly/3NQVn7V
If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
seattledataguy.substack.com/
Or check out my blog
www.theseattledataguy.com/
And if you want to support the channel, then you can become a paid member of my newsletter
seattledataguy.substack.com/subscribe
Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
_____________________________________________________________
Subscribe: ua-cam.com/channels/mLGJ3VYBcfRaWbP6JLJcpA.html
_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.
*I do participate in affiliate programs, if a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
Views: 2,013

Videos

Apache Spark Vs Apache Flink - Looking Through How Different Companies Approach Spark And Flink
3.3K views, 14 days ago
As data increased in volume, velocity, and variety, so, in turn, did the need for tools that could help process and manage those larger data sets coming at us at ever faster speeds. As a result, frameworks such as Apache Spark and Apache Flink became popular due to their abilities to handle big data processing in a fast, efficient, and scalable manner. But we often find that sometimes it can be...
Intro To Databricks SQL AI Functions - 5 SQL AI Functions Databricks Has And How To Use Them
2.3K views, 21 days ago
Databricks and Snowflake have been releasing various forms of AI SQL functionality. So I asked Josue Bogran if he'd walk through using some of Databricks SQL AI functions that they just put out! If you'd like to learn more about Databricks you should follow Josue here- www.linkedin.com/in/josuebogran/ If you enjoyed this video, check out some of my other top videos. Top Courses To Become A Data...
If I could give advice to myself when starting as a data engineer
4.7K views, 1 month ago
We all get stuck in our careers. Whether it's because of the team we're on, the point of life we're in, etc. So I wanted to talk about some tips I have for people looking to supercharge their data engineering career. If you enjoyed this video, check out some of my other top videos. Top Courses To Become A Data Engineer ua-cam.com/video/kW8_l57w74g/v-deo.html What Is The Modern Data Stack - Intr...
Data Modeling Where Theory Meets Reality - How Different Companies I Worked At Modeled Their Data
10K views, 1 month ago
Data modeling varies at different companies. At Facebook we had plenty of storage and often treated historical data modeling very differently compared to when I worked at an enterprise. The concept of slowly changing dimensions wasn't as prevalent and instead we simply stored snapshots of data every day. So let's talk about modeling historical data and how it varied. If you enjoyed this video, ...
How To Escape The Rat Race - 6 Tips I Wish I Had Before I Became An Independent Consultant
3.3K views, 1 month ago
It feels like everyone is trying to quit their 9-5. Ok, that could just be some level of bias, but there are plenty of people looking to escape the rat race. The question becomes how and what are the pitfalls along the way. In this video I'll provide the advice I wish I had when I started my journey. I hope it helps! If you enjoyed this video, check out some of my other top videos. The Ultima...
What Is S3 And How Can You Query It With AWS Athena - AWS Data Engineering 101
3.1K views, 2 months ago
S3 is a commonly used AWS solution for data lakes and staging areas. Data engineers need AWS and it also supports so many other solutions like Snowflake when hosted on AWS. So what is S3 and how can data engineers use it? How can data engineers use it to read from AWS Athena? Also, I reference a video that shows how to set up an S3 Snowpipe integration, here is the link from @mastering_snowflak...
What Tools Should Data Engineers Know In 2024 - 100 Days Of Data Engineering
29K views, 2 months ago
What tools should a data engineer know? Honestly this video is more of a list of tools that goes far beyond what most data engineers know but I wanted to create a video that shared a list of data engineering tools for the 100 days of data engineering video. So here it is! Also, if you're looking to check out some tools that I am advising for you can look at Estuary for data ingestion bit.ly/3Ed...
Using AWS Lambda As A Data Engineering - Automating An API Extract With AWS Lambda And Eventbridge
4.4K views, 3 months ago
I recently posted the first video in a series about AWS and data engineering. This is the second video where we will dive into how you can use AWS Lambda to perform automations to scrape data from an API. You can find the basic code here - gist.github.com/bAcheron/8945c0c0ecd59df5e02397a35ba445e6 Also if you'd like to see the prior video you can find it here: AWS Services YoU Need To Know As A ...
Best AWS Services You Need To Know As A Data Engineer - How To Become A Data Engineer
6K views, 3 months ago
Optimizing Your Data Infrastructure - How To Become A Better Data Engineer
6K views, 3 months ago
Data Modeling - Walking Through How To Data Model As A Data Engineer - Dimensional Modeling 101
23K views, 4 months ago
How And Why Data Engineers Need To Care About Data Quality Now - And How To Implement It
9K views, 4 months ago
Fastest way to Start Your Data Engineer Journey in 2024 - 100 Days Of Data Engineering Crash Course
69K views, 5 months ago
The Ultimate Guide To Starting An Independent Consulting Company In 2024 | Data Consulting 101
12K views, 5 months ago
Data Modeling - Why Data Engineers Need To Understand It - An Introduction To Data Engineering
30K views, 6 months ago
What Is Apache Druid And Why Do Companies Like Netflix And Reddit Use It?
7K views, 7 months ago
The Realities Of Airflow - The Mistakes New Data Engineers Make Using Apache Airflow
13K views, 8 months ago
Data Architects Vs Data Engineers - Is There A Difference?
11K views, 9 months ago
What Is Docker - Docker Intro And Tutorial On Setting Up Airflow | High Paying Data Engineer Skills
7K views, 9 months ago
How To Fast Track Your Data Engineering Career - Translating Business Requirements Into Value
7K views, 9 months ago
Everyone's Data Infrastructure Is A Mess - The Truth About Working As A Data Engineer
7K views, 10 months ago
Data Modeling Challenges - The Issues Data Engineers & Architects Face When Implementing Data Models
24K views, 11 months ago
Why I Left Data Science - And Picked Data Engineering Instead
16K views, 11 months ago
What Is Change Data Capture - Understanding Data Engineering 101
9K views, 1 year ago
How I'd Become A Data Engineer (If I had to start over as a data analyst in 2023)
60K views, 1 year ago
A Decade In Data Engineering - Has Anything Actually Changed?
9K views, 1 year ago
Data Engineering Vs Machine Learning Pipelines - What Is The Difference
8K views, 1 year ago
Will Data Engineering Exist In 5 Years - Is Data Engineering A Good Career Choice?
53K views, 1 year ago
Can AI Code A Data Engineering Project - Using ChatGPT To Code A Python Project
6K views, 1 year ago

COMMENTS

  • @Kira-ji5pr
    @Kira-ji5pr 16 hours ago

    I’m thinking of switching from full stack to data engineering . Any advice ??

  • @William-B
    @William-B 1 day ago

    We’re a young data team for a large organization. Biggest roadblocks for us are issues with data governance (“you can’t have or report on our data”), budget for tooling (“prove the value of the tool, then we can purchase it”), and cloud concerns (“all my data is on-prem. You can’t just put it in the cloud”)

  • @smrtysam
    @smrtysam 1 day ago

    This has happened to me. Now I’m leading a team of data scientists, engineers, analysts and migration specialists. I’ve had to learn so much so quick about strategy and people management. I’ve had to coach the people on my team to really empower and own their own tasks. At the beginning of being head of data I was taking on way too many “low level tasks”. Now I’m delegating and empowering. I still have a lot to learn though.

  • @crisithink9509
    @crisithink9509 1 day ago

    I wonder how much Data God has in the Aether/Astral realm 🤔

  • @SeattleDataGuy
    @SeattleDataGuy 2 days ago

    If you're looking for help setting up your data team and strategy, then feel free to set-up a free consultation here - calendly.com/ben-rogojan/consultation

  • @Ian-vh2vv
    @Ian-vh2vv 2 days ago

    Just went thru this process with my company the past year. Great video. With us it went something like:
    - Where is all of our data
    - How are we doing reporting now
    - What are the shortcomings of existing reporting solutions
    - Do we need a warehouse (yes)
    - What warehouse do we pick
    - What ETL stack makes sense for our use case
    - What do we integrate in what order to maximize value and get adoption rolling
    Also, having someone on the exec level champion the BI effort and really push it forward was huge for the thing to actually materialize.

    • @SeattleDataGuy
      @SeattleDataGuy 2 days ago

      Thanks for sharing! I really appreciate it when people add more context and their own experiences. Were there any gotchas you ran into while going through this process?

    • @baw5xc333
      @baw5xc333 22 hours ago

      How long did this rollout take?

    • @Ian-vh2vv
      @Ian-vh2vv 20 hours ago

      @@baw5xc333 about 6 months from step 1 until I started development (first snowflake table and started integrating our first source system)

  • @sirus312
    @sirus312 2 days ago

    I keep hearing from top CEOs that with Palantir we don't need teams anymore

    • @SeattleDataGuy
      @SeattleDataGuy 2 days ago

      I'd love to believe this! I guess the reason I have a hard time believing it is because I know there are lots of consultants that work in the space of setting up Palantir, which suggests that it still requires technical skills to set up and work with (also based on a few conversations I have had with people working with Palantir). But always happy to be wrong.

  • @hakeem1340
    @hakeem1340 2 days ago

    Thank you for sharing

  • @hantt
    @hantt 2 days ago

    the de role should not exist, it should just be sde who also own data as a product. kind of like front end, backend, there will be a data focused engineer, that we can call data engineer. oh wait

  • @nathannguyen2041
    @nathannguyen2041 3 days ago

    Hm. Makes me think that I should DM the data engineer that I vaguely know and have communicated with once or twice on Slack about what kind of work he does and if I would be able to work on low priority projects. Any recommended ice breakers?

  • @crypt_hodl
    @crypt_hodl 3 days ago

    Interested! can you please have special pricing for people in Africa. 50% reduction is good but our earnings are way too low probably 20x less than those in US or Europe. It becomes difficult for us to participate in this type of good courses. Any help! Thanks.

  • @madihenry7861
    @madihenry7861 4 days ago

    Hi! can you please share the full screen for what you have typed under the config_file?

  • @data-dynamo-guy
    @data-dynamo-guy 4 days ago

    I also find myself building stuff rather than analyzing business problems @@

    • @SeattleDataGuy
      @SeattleDataGuy 4 days ago

      It's always interesting how we all come to the same conclusion, thanks for watching!

  • @Aristocle
    @Aristocle 5 days ago

    Is there a service or scripting language that allows me to write relationships between tables/databases in a modern material design style?

  • @serk-s
    @serk-s 5 days ago

    Man, you really need to stop pitching your voice higher at the end of your sentences :(

    • @SeattleDataGuy
      @SeattleDataGuy 4 days ago

      fair enough, on the flip side i have picked up a vocal fry trying to do that lol

  • @richardmartin6605
    @richardmartin6605 6 days ago

    Would love to see article reviews!

  • @initialb811
    @initialb811 6 days ago

    This is really awesome. Would love to see more of this!

  • @TJInTech10
    @TJInTech10 6 days ago

    thx for breaking it down

    • @SeattleDataGuy
      @SeattleDataGuy 4 days ago

      glad you found it helpful!

    • @TJInTech10
      @TJInTech10 4 days ago

      @@SeattleDataGuy yes, thx , I'm trying to understand how Knowledge graph/Vector DB's will integrate into this too, is it safe to assume both will be essential pieces of the enterprise ai layer/stack now being invested in heavily, or do you see one being more relevant in next 2-5 yrs?

  • @AnalyticsEngineer-hg3to
    @AnalyticsEngineer-hg3to 11 days ago

    Don’t just be a task taker, be a strategic player.

    • @SeattleDataGuy
      @SeattleDataGuy 4 days ago

      thanks for reading my articles and watching my videos!

  • @B-gaming930-fl5qr
    @B-gaming930-fl5qr 11 days ago

    E5 is where it's at 750 Million 😂

  • @osoucy
    @osoucy 12 days ago

    To me, one of the main benefits of Spark Structured Streaming is that you can easily switch between near real-time (micro batches) and scheduled batch processing without having to rewrite a single line of code. This is a very effective way of scaling up and down and balancing costs vs latency.
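
The switch described in this comment mostly comes down to the trigger set on the streaming write. Below is a minimal sketch, assuming PySpark 3.3 or later (where the `availableNow` trigger exists). The helper names `trigger_kwargs` and `start_query`, the Parquet sink, and the path arguments are hypothetical and chosen only for illustration; none of them come from the video.

```python
def trigger_kwargs(mode, interval="1 minute"):
    """Map a run mode to keyword arguments for DataStreamWriter.trigger().

    "streaming": keep running and process a micro-batch every `interval`.
    "batch": process everything currently available, then stop, so the
             same job can be launched on a schedule.
    """
    if mode == "streaming":
        return {"processingTime": interval}
    if mode == "batch":
        return {"availableNow": True}
    raise ValueError(f"unknown mode: {mode!r}")


def start_query(stream_df, output_path, checkpoint_path, mode):
    """Start the same streaming write in either mode (hypothetical helper).

    `stream_df` is assumed to be a streaming DataFrame built elsewhere with
    spark.readStream; the transformation code does not change between modes.
    """
    return (
        stream_df.writeStream
        .format("parquet")
        .option("checkpointLocation", checkpoint_path)
        .trigger(**trigger_kwargs(mode))
        .start(output_path)
    )
```

In this sketch only the `mode` argument changes between a near-real-time deployment and a scheduled one; the checkpoint location is what lets a scheduled run pick up where the previous run stopped.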

  • @cestlachance7575
    @cestlachance7575 13 days ago

    Is this really a good video? i feel like he just namedrops every techs

  • @moussaelaqqaoui
    @moussaelaqqaoui 14 days ago

    Hello ben, can we have a discussion please !

  • @DataPains
    @DataPains 14 days ago

    Great video! Thank you for sharing!

  • @danhorus
    @danhorus 15 days ago

    13:03 in Spark, we avoid Python UDFs like the plague because they're much slower than native Spark code. I wonder if the same is true for Flink, given that it also runs on JVMs. A quick Google search indicates that vectorized UDFs are a thing in Flink too, so I assume the same limitations apply

    • @SeattleDataGuy
      @SeattleDataGuy 15 days ago

      Thanks for the added context! It's much appreciated. I am now wondering if I have ever had a good experience with a UDF 🤣. I always remember touting them, but even in the one case where I do recall trying it out on SQL Server, we found it slow.

    • @danhorus
      @danhorus 15 days ago

      @@SeattleDataGuy With Spark, there are several ways to write transformations. By far, the best option is to use native Spark functions, as they compile to highly optimized and parallelized Java byte code. The second best option is to write UDFs in Scala or Java, as everything still runs in the same JVM. The third best option, in case you want/need to use Python, is to write a vectorized UDF (also known as Pandas UDF), which leverages Apache Arrow to move data between the JVM and the Python interpreter in batches. Finally, as a last resort, you can use regular Python UDFs, however they're a lot slower because they basically compute results row by row rather than in big batches. If you have slow Spark jobs using Python UDFs, refactoring them is usually a good way to gain some performance. About this blog post, I'm not sure the author is aware of this limitation, but if they need this code to run very very fast, they should probably avoid Python UDFs too

    • @danhorus
      @danhorus 15 days ago

      @@SeattleDataGuy I wrote a long comment about the different types of UDFs in Spark, but apparently YouTube decided to delete it. Maybe you'll find it marked as spam, lol

    • @SeattleDataGuy
      @SeattleDataGuy 15 days ago

      @@danhorus Did you put a url in it? That seems to be the main reason I have seen youtube define things as spam. I'll look

    • @danhorus
      @danhorus 15 days ago

      Not really, but let's try again, haha. In Spark, there are many ways to apply data transformations. By far the best option is to use native Spark functions, as they compile to highly optimized/parallelized Java byte code. The second best option to maximize performance is to use Scala or Java UDFs, as they run inside the JVM with a minor performance hit. The third option, if you want/need to use Python, is to write a vectorized UDF (also known as Pandas UDF), which leverages Apache Arrow to transfer big batches of records to the Python interpreter and back to the JVM after processing. Finally, the last option you should consider is the regular Python UDF, as it basically transforms row by row and has much worse performance as a result. If you have a slow Spark job, refactoring Python UDFs can make it a lot faster. I'm not sure the authors of the blog post are aware of this, but they can probably make their code faster too
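
The ordering described in this thread can be sketched in PySpark. This is a minimal illustration rather than code from the video or the blog post under discussion: the column name `code` and the helpers `clean_code` and `add_clean_columns` are hypothetical, and the Scala/Java UDF option is left out because it cannot be shown in Python. It assumes PySpark 3.x with pandas and PyArrow installed.

```python
def clean_code(value):
    """Plain Python logic for one value: trim and upper-case (None stays None)."""
    return value.strip().upper() if value is not None else None


def add_clean_columns(df, col_name="code"):
    """Apply roughly the same cleanup three ways, fastest option first."""
    # Imported lazily so clean_code can be used and tested without Spark installed.
    import pandas as pd
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Native Spark functions: the expression is evaluated inside the JVM.
    native = F.upper(F.trim(F.col(col_name)))

    # Vectorized (pandas) UDF: data moves to Python in Arrow batches.
    @F.pandas_udf("string")
    def clean_vectorized(values: pd.Series) -> pd.Series:
        return values.str.strip().str.upper()

    # Regular Python UDF: clean_code is called once per row.
    clean_row_by_row = F.udf(clean_code, StringType())

    return (
        df.withColumn("native", native)
        .withColumn("vectorized", clean_vectorized(F.col(col_name)))
        .withColumn("row_by_row", clean_row_by_row(F.col(col_name)))
    )
```

Calling `add_clean_columns(df)` on a DataFrame with a string column named `code` adds one column per approach, which makes it straightforward to time each variant on your own data before refactoring a slow job.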

  • @jace743
    @jace743 15 days ago

    I’d watch if you did live article reviews!

    • @SeattleDataGuy
      @SeattleDataGuy 15 days ago

      Yeah! I think watching other creators do it, I really gotta slow down to do it well

  • @ankittjindal
    @ankittjindal 15 days ago

    Recommend me some books as I only have an idea of python and sql so..which book best for me as a beginner in data engineering field

  • @damien__j
    @damien__j 15 days ago

    Great video thanks!

  • @knkootbaoat6759
    @knkootbaoat6759 16 days ago

    gotta make things complex otherwise we wouldnt get paid as much. i half joke. we dont make it complex it's just situations are inherently complex

  • @AyushMandloi
    @AyushMandloi 16 days ago

    Sound of transition is very loud

  • @prico3358
    @prico3358 16 days ago

    Better crossover than a batman & Iron man movie.

  • @tommynelson4795
    @tommynelson4795 18 days ago

    Minor tip. I’d recommend removing the very high pitch transitions from your videos. I thought my tinnitus was acting up haha. Other than that great vid!

  • @user-ux4iu7us7p
    @user-ux4iu7us7p 20 days ago

    What are your thoughts on the new AWS Data Engineering Certification?

  • @elcoxeroni8273
    @elcoxeroni8273 21 days ago

    Thank you for this really great content! Which is the book you are referring to in your video? I like the structure much and am considering buying it. Thanks in advance!

  • @mrgenetics4063
    @mrgenetics4063 21 days ago

    I want to become a data scientist or engineer….my biology degree has never brought me financial security and I hope to be rich one day

  • @otavioattuy5394
    @otavioattuy5394 23 days ago

    Where do I find the theory behind the "types" of dimension tables?

  • @glstnlev
    @glstnlev 24 days ago

    Interesting use case about SCD2 but how in practice do we create these tables? I understand the importance and how useful is it to have a new row for each change but can’t get how to model it to make it work

  • @abrahamgomez653
    @abrahamgomez653 24 days ago

    I love learning about data engineering and overall cloud computing. Cloud is the future.

  • @DerekGatlin
    @DerekGatlin 25 days ago

    Thank you guys so much for your transparency- it is refreshing and I am more interested in working with you in the future as a result.

  • @septic7
    @septic7 25 days ago

    Are these salaries adjusted for 2024 tranches ? 😅🥲

  • @maxonthetrack
    @maxonthetrack 25 days ago

    awesome! I enjoy learning about these AI concepts in this hands-on practical way

  • @NoahPitts713
    @NoahPitts713 25 days ago

    Josue is the man! thank you both for the great conversation

  • @poorbadger
    @poorbadger 25 days ago

    Re: SQL Serverless…. Databricks now has job/workflow serverless which works with notebooks - a few limitations but most functionality is supported. I still use SQL all the time but that’s made the cluster start up penalty w notebooks way better

  • @saadoa4969
    @saadoa4969 25 days ago

    disappointing to know that you don't answer your viewers' emails. Solid content though

    • @SeattleDataGuy
      @SeattleDataGuy 25 days ago

      I do my best! I am always playing catch up, but thank you for the support!

  • @SreejaThumma
    @SreejaThumma 26 days ago

    Can you also make a video on the difference between DataBricks, Snowflake and Solix technologies

  • @Syed-A-Rizvi
    @Syed-A-Rizvi 26 days ago

    so how much sql do I need? I know data science folks need expert level sql

  • @richardduncan3403
    @richardduncan3403 26 days ago

    Real talk :)

  • @CalSticks
    @CalSticks 27 days ago

    I really like these videos - the guests have all been fantastic and it's great to hear their views on the wider data space. Thanks for continuing to put them together. p.s. looks like you're trying to get better at not trailing off when finishing a thought - but I can tell it's hard! (I have the same problem)

  • @user-bc6bk7bg9i
    @user-bc6bk7bg9i 27 days ago

    Hey Ben, what are your thoughts on MS Fabric as a data engineer? Is it just another tool in the bucket, or does it actually solve the issue it claims to solve?

  • @norbinn
    @norbinn 27 days ago

    In terms of data consulting, do you find more clients in need of Snowflake or Databricks expertise? Is there any correlation with the size / price point of the project?