this post was submitted on 17 Jun 2024
10 points (91.7% liked)
Data Engineering
388 readers
A community for discussion about data engineering
Icon base by Delapouite under CC BY 3.0 with modifications to add a gradient
founded 2 years ago
I stepped into a cluster of a situation: an analyst-driven product consisting of around 100 manually triggered, sequential SQL steps, with human-in-the-loop QA, loop-backs, edits, and reruns, completed every month. This ran after a bunch of data science Spark routines (all off one of the DS's laptops, and it took a day! The machine was unavailable for his other work while this happened! Lol)
Then the product of all that work was loaded into an Excel file where further QA occurred. That Excel file was shipped to clients and represented the deliverable. Folks were paying a LOT for this analysis, every month.
The Excel file took about 30 minutes to load, and god help you if you tried to move anything or conduct your own analysis.
The eng team built a proper ingest pipeline and computation/model platform, and by luck of the draw I got the task of unraveling the pile of analyst SQL into a DBT workflow. Then we pivoted the deliverable to Looker, where the only SQL was specific to the display and final organization of the data.
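To give a flavor of what that unraveling looks like: each manually run SQL step becomes a DBT model that declares its upstream dependencies with `ref()`, so DBT can build the whole DAG in the right order instead of a human clicking "run" 100 times. A minimal sketch, with made-up table and column names for illustration:

```sql
-- models/staging/stg_orders.sql
-- One of the old manual steps: clean raw orders.
select
    order_id,
    customer_id,
    cast(order_date as date) as order_date,
    amount
from {{ source('raw', 'orders') }}
where order_id is not null
```

```sql
-- models/marts/monthly_revenue.sql
-- A downstream step: ref() tells DBT this depends on stg_orders,
-- so it is always built after it.
select
    date_trunc('month', order_date) as revenue_month,
    sum(amount) as total_revenue
from {{ ref('stg_orders') }}
group by 1
```

Much of the manual QA loop can then move into declared tests (not_null, unique, etc.) in a schema.yml, so a single `dbt build` runs the whole chain plus the checks in one shot.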
If you find yourself in a similar situation, and your stakeholder is a squad of highly intelligent analysts with deep domain knowledge but shallow eng knowledge, DBT can be a godsend.
In the right space, I can't recommend it enough