In our second article, we introduced Athena and its serverless querying capabilities. As we commented, Athena is great for relatively simple ad hoc queries in S3 data lakes, even when the data is large, but there are situations (complex queries, heavy usage of reporting tools, concurrency) in which it is important to consider alternative approaches, such as data warehousing technologies.

In this article, we are going to learn about Amazon Redshift, an AWS data warehouse that, in some situations, might be better suited to your analytical workloads than Athena. We are not going to make a thorough comparison between Athena and Redshift, but if you are interested in how these two technologies compare and which situations suit one or the other, you can find interesting articles online such as this one. In fact, the combination of S3, Athena and Redshift is what AWS proposes as a data lakehouse.

We will go through the basic concepts of Redshift and discuss some of the technical aspects that allow data stored in Redshift to be optimised for querying. We will also examine a Redshift feature called Spectrum, which allows querying data in S3, and then walk through a hands-on example to see how Redshift is used. In this example we will use the same dataset and queries as in our previous blogs, i.e. from TPC-H, and we will see how Spectrum performs compared to using data stored in Redshift.

Partitioned Parquets: 32.5 GB. The largest tables, which are partitioned, are lineitem with 21.5 GB and orders with 5 GB, with one partition per day; each partition has one file and there are around 2,000 partitions per table. The rest of the tables are left unpartitioned.
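As a rough sanity check on the partition layout described above, the average Parquet file size per partition can be estimated from the quoted figures. This is only a back-of-the-envelope sketch: the helper name `avg_file_size_mb` is hypothetical, the partition count is the approximate "around 2,000" mentioned in the text, and binary units (1 GB = 1024 MB) are an assumption.

```python
# Back-of-the-envelope estimate of the average file size per partition.
# Table sizes and partition counts are the approximate figures quoted in the text.
GB_IN_MB = 1024  # assumption: binary units


def avg_file_size_mb(size_gb: float, partitions: int, files_per_partition: int = 1) -> float:
    """Average size in MB of one file, given total table size and partition layout."""
    return size_gb * GB_IN_MB / (partitions * files_per_partition)


tables = {
    "lineitem": {"size_gb": 21.5, "partitions": 2000},
    "orders": {"size_gb": 5.0, "partitions": 2000},
}

for name, t in tables.items():
    size = avg_file_size_mb(t["size_gb"], t["partitions"])
    print(f"{name}: ~{size:.1f} MB per file")
```

Under these assumptions, each daily partition holds a file of roughly 11 MB for lineitem and under 3 MB for orders.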