this post was submitted on 01 Feb 2024
Data Engineering
A community for discussion about data engineering
The article requires registration with Google or by email. Without registration there is a paywall.
Strange. I was able to view the article without either. I also tried in a private browser on both mobile and desktop.
I'm getting this paywall, and I can't find any way to skip it.
I don't know what's bypassing it on my setups, but it's on the Wayback Machine.
https://web.archive.org/web/2/https://www.analyticsvidhya.com/blog/2023/12/spark-vs-presto-a-comprehensive-comparison/
Thank you! The conclusion is quite good (use Spark for ETL and Presto (Trino) for analytical queries), but the article looks very outdated.
Spark is not about RDDs. Today most Spark usage goes through the DataFrame API. And it is not just syntax. Catalyst itself provides a lot of performance optimizations, like predicate pushdown at the level of ORC/Parquet reading, automatic skew join detection, pruning, etc.
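To make the predicate pushdown point concrete, here is a rough toy model in plain Python. It is not Spark code and does not use any Spark API: `RowGroup`, `scan_without_pushdown`, and `scan_with_pushdown` are hypothetical names made up for illustration. It only assumes what columnar formats like Parquet and ORC actually do, which is store min/max statistics per chunk of rows, so a reader that knows the filter can skip chunks that cannot match. An opaque RDD lambda gives the reader no such information.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """A chunk of rows plus the min/max stats a columnar file keeps for it (toy model)."""
    rows: list
    min_year: int
    max_year: int


def make_row_group(rows):
    # Compute the statistics once, at "write time".
    years = [r["year"] for r in rows]
    return RowGroup(rows=rows, min_year=min(years), max_year=max(years))


def scan_without_pushdown(row_groups, year):
    """RDD-style: read every row, then filter in user code."""
    rows_read, result = 0, []
    for group in row_groups:
        for row in group.rows:
            rows_read += 1
            if row["year"] == year:
                result.append(row)
    return result, rows_read


def scan_with_pushdown(row_groups, year):
    """Optimizer-style: the reader sees the predicate and skips row groups
    whose min/max stats rule out a match."""
    rows_read, result = 0, []
    for group in row_groups:
        if year < group.min_year or year > group.max_year:
            continue  # the whole chunk is skipped without reading its rows
        for row in group.rows:
            rows_read += 1
            if row["year"] == year:
                result.append(row)
    return result, rows_read


row_groups = [
    make_row_group([{"year": 2019, "amount": 10}, {"year": 2020, "amount": 20}]),
    make_row_group([{"year": 2021, "amount": 30}, {"year": 2022, "amount": 40}]),
    make_row_group([{"year": 2023, "amount": 50}, {"year": 2024, "amount": 60}]),
]

plain_result, plain_reads = scan_without_pushdown(row_groups, 2022)
pushed_result, pushed_reads = scan_with_pushdown(row_groups, 2022)

print(plain_result == pushed_result)  # True: same answer either way
print(plain_reads, pushed_reads)      # 6 2: pushdown touched one chunk only
```

Both scans return the same row, but the pushdown version reads 2 rows instead of 6. On real data with many large row groups, that kind of skipping is where much of the gap between hand-written RDD filters and DataFrame queries comes from.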
Also, Presto in this case should be called Trino, because there was a rebranding in 2020.
I was questioning the quality of the source. Thanks for confirming that it's not a top-quality article.