With Redshift, Amazon Web Services (AWS), the market leader in cloud technology, offers a managed cloud data warehousing service at significantly lower cost and with greater flexibility than an on-site solution in your own data center. It is also an ideal opportunity for companies to become acquainted with the cloud without having to relocate critical operational systems. managetopia has an experienced AWS-certified solution architect to assist and support clients with the AWS Cloud.
Our client has increased their focus on cloud solutions since the opening of the first AWS data center on German soil. The advantages are apparent: no investments or fixed costs for purchasing expensive servers, no multi-year commitment to a server technology vendor, the flexibility to scale server capacity up or down, savings on IT staff and on securing their own data center, and transparent cost structures.
When several dozen gigabytes of transaction data had to be analyzed for a large retail company, it quickly became apparent that AWS was the only way forward, as the necessary resources could be provisioned in a timely manner. There was not enough capacity on site, and purchasing a new server would have been too costly and taken too long. The first decision was how to transfer the data to the cloud. In principle, there were three options to choose from:
- Transfer using the standard ISP connection
- Transfer using a dedicated direct connection to the AWS backbone via one of the many AWS Direct Connect locations
- Ship the data on an AWS Snowball appliance (*)
The latter is essentially a multi-layer-secured SSD storage appliance about the size of an Amazon package, handled by dedicated security personnel and protected with strong encryption. After shipping it to AWS, the data can be imported quickly on the AWS side. Since our client had access to a dedicated AWS connection and could transfer all the data within a couple of hours, Snowball was ruled out as an option: with postal delivery times of 3 to 4 days, it would only have paid off at data volumes in the terabyte range.
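The trade-off above can be sketched with a rough back-of-the-envelope calculation. All figures below (link bandwidths, shipping time) are illustrative assumptions, not the client's actual numbers:

```python
# Rough comparison: network transfer vs. shipping a Snowball appliance.
# Bandwidths and shipping times are illustrative assumptions only.

def network_transfer_hours(data_gb: float, bandwidth_mbit_s: float) -> float:
    """Hours needed to push data_gb gigabytes over a given link."""
    megabits = data_gb * 8 * 1000          # GB -> megabits (decimal units)
    return megabits / bandwidth_mbit_s / 3600

def snowball_hours(shipping_days: float = 3.5) -> float:
    """Snowball turnaround is dominated by the 3-4 day postal delivery."""
    return shipping_days * 24

# A few dozen GB over a dedicated 1 Gbit/s connection: well under an hour.
net = network_transfer_hours(50, 1000)
ship = snowball_hours()
print(f"network: {net:.2f} h, snowball: {ship:.0f} h")

# At terabyte scale on a slower link, the picture flips:
net_tb = network_transfer_hours(50_000, 100)   # 50 TB over 100 Mbit/s
print(f"50 TB over 100 Mbit/s: {net_tb / 24:.1f} days")
```

This is why Snowball only makes sense at terabyte scale or on slow links: the fixed shipping time dwarfs a direct transfer of a few dozen gigabytes.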
In parallel with the data transfer, the Redshift cluster had to be designed and set up. The number of compute nodes is crucial for both performance and cost: more servers mean faster queries, and each server is billed by the hour. With around 300 million records in the main fact table of the data warehouse (DWH), most queries ran in seconds on a single node, although some complex SQL queries took a few minutes. So we asked ourselves whether adding more compute nodes would improve performance enough to be worthwhile. Since a single analyst, rather than a team, was handling the analysis, we decided to accept the longer run times and keep costs down; besides, the cluster can be resized at any time. The initial setup of the DWH cluster itself took only a few clicks, and it was ready to use within minutes.
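Because each node is billed by the hour, the sizing decision above comes down to simple linear arithmetic. A minimal sketch, using a placeholder hourly price (actual Redshift prices depend on node type and region):

```python
# Back-of-the-envelope Redshift cost estimate. The hourly node price
# is a placeholder assumption, not an actual AWS price.

def cluster_cost(nodes: int, hours: float, price_per_node_hour: float = 1.0) -> float:
    """Redshift bills each node by the hour, so cost scales linearly."""
    return nodes * hours * price_per_node_hour

project_hours = 4 * 7 * 24          # a four-week project, cluster always on
single = cluster_cost(nodes=1, hours=project_hours)
quad = cluster_cost(nodes=4, hours=project_hours)
print(f"1 node: {single:.0f} units, 4 nodes: {quad:.0f} units")
```

Quadrupling the node count quadruples the bill; unless several analysts are waiting on query results, accepting a few minutes of run time on one node is usually the cheaper choice.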
As in all data-intensive projects, data cleansing was necessary, and as so often it was the main driver of the effort involved. With large data volumes the approach needs to be structured: individual steps take several minutes, sometimes hours. After several days of data cleansing, the data warehouse could be populated, which took almost no time at all: over the AWS backbone, transferring several gigabytes is as fast as on a local network.
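A minimal sketch of one such structured cleansing step, deduplication plus type validation, on hypothetical transaction records (the field names are illustrative, not the client's actual schema):

```python
# One structured cleansing step on hypothetical transaction records:
# drop duplicates, drop rows with an unparseable amount, convert types.
# Field names are illustrative, not the client's actual schema.

raw = [
    {"tx_id": "1", "amount": "19.99", "store": "Berlin"},
    {"tx_id": "1", "amount": "19.99", "store": "Berlin"},   # duplicate
    {"tx_id": "2", "amount": "",      "store": "Hamburg"},  # missing amount
    {"tx_id": "3", "amount": "7.50",  "store": "Munich"},
]

def cleanse(records):
    """Keep the first occurrence of each tx_id with a valid numeric amount."""
    seen, clean = set(), []
    for r in records:
        if r["tx_id"] in seen:
            continue                     # duplicate transaction
        try:
            amount = float(r["amount"])
        except ValueError:
            continue                     # unparseable amount, drop the row
        seen.add(r["tx_id"])
        clean.append({"tx_id": r["tx_id"], "amount": amount, "store": r["store"]})
    return clean

print(cleanse(raw))   # transactions 1 and 3 survive
```

In the real project each such step ran against the full data set, which is why even simple rules took minutes to hours per pass.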
After the Redshift DWH had been set up and populated, it was time to analyze the data. Redshift behaves like a local DWH: it is based on PostgreSQL and accepts standard SQL, so analysts can connect with standard SQL clients, Tableau, Alteryx and other BI tools. After several weeks the client was presented with the results of the analysis. On project completion, the data was archived and the Redshift cluster was shut down, which, like the setup, took only a matter of minutes; from then on, no further costs were incurred. Within just a few weeks a complex analysis project on a large data volume was completed, clearly demonstrating the potential of the cloud, in particular the flexibility and cost savings that can be achieved.
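Because Redshift speaks the PostgreSQL wire protocol, connecting from an analyst's tooling looks exactly like connecting to any PostgreSQL database. A minimal sketch; the endpoint, database name, user and table are placeholders, not real project values:

```python
# Redshift is reached like any PostgreSQL server; its default port is 5439.
# Endpoint, database, user and table names below are placeholders.

def redshift_dsn(host: str, port: int, dbname: str, user: str) -> str:
    """Build a libpq-style connection string for a Redshift cluster."""
    return f"host={host} port={port} dbname={dbname} user={user}"

dsn = redshift_dsn(
    "example-cluster.abc123.eu-central-1.redshift.amazonaws.com",
    5439, "dwh", "analyst",
)
print(dsn)

# With a PostgreSQL driver such as psycopg2, the connection would then be:
#   conn = psycopg2.connect(dsn + " password=...")
#   cur = conn.cursor()
#   cur.execute("SELECT store, SUM(amount) FROM fact_sales GROUP BY store;")
```

The same connection string works in standard SQL clients, and BI tools such as Tableau ship their own Redshift/PostgreSQL connectors that take the identical host, port and database parameters.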
(*) With the launch of Amazon Snowmobile at the end of 2016, a further data transfer service was added: for extremely large amounts of data (exabyte-scale data sets), a truck collects the data and moves it to AWS (https://aws.amazon.com/snowmobile/)
This service from AWS is called Direct Connect (https://aws.amazon.com/directconnect/)
AWS Snowball (https://aws.amazon.com/snowball/)