While many companies remain hesitant about the public cloud, it offers most of them the fastest go-to-market path in today’s fast-changing world. In a service-based market where managed environments and products are hailed as the best fit for most companies, the Amazon cloud sits atop the pyramid.
The Client was using Actian Matrix as its columnar discovery database and Actian Vector as its in-memory data warehouse for near-real-time analytics. Actian discontinued the Matrix product, and the Client would lose support for its production system in about 6 months. While cloud migration was already on the Client’s roadmap, this news helped fast-track the cloud migration project.
Analyzing all systems that the data warehouse and its dependent systems interacted with, and building a migration roadmap while projects were still in flight.
Selecting a replacement database that would fit current and future growth demands.
Getting buy-in from the InfoSec group to deploy company-critical data to the cloud.
Fugetron ran a quick POC comparing Teradata, MemSQL, Snowflake and Redshift against the features and workloads of the current enterprise environment, then prepared and presented the findings to the Client’s executive team and won approval to migrate to Amazon Redshift.
A team of experienced data engineers set up Informatica on EC2 instances and prepared the integration environment.
Facilitated the data migration strategy using Python scripts while implementing a data staging environment in Amazon S3.
Built a data lake in Amazon S3 following implementation best practices, with a raw, cleaned and prepared file structure.
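As an illustration of the raw, cleaned and prepared structure, the sketch below builds S3 object keys for each zone. The key layout (source system, table name, `load_date=` partition) and the function names `lake_key` and `promote` are assumptions made for this example, not the layout used on the project.

```python
from datetime import date

# Zone names follow the raw / cleaned / prepared split described above.
ZONES = ("raw", "cleaned", "prepared")


def lake_key(zone: str, source: str, table: str, load_date: date, filename: str) -> str:
    """Build an S3 object key for one file in a zone (hypothetical layout)."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return f"{zone}/{source}/{table}/load_date={load_date.isoformat()}/{filename}"


def promote(key: str, target_zone: str) -> str:
    """Return the key the same file would have in a later zone."""
    zone, _, rest = key.partition("/")
    if zone not in ZONES or target_zone not in ZONES:
        raise ValueError("key or target is not in a known zone")
    if ZONES.index(target_zone) <= ZONES.index(zone):
        raise ValueError("data only moves forward: raw -> cleaned -> prepared")
    return f"{target_zone}/{rest}"


# Example values are invented for illustration.
raw_key = lake_key("raw", "crm", "orders", date(2020, 1, 15), "part-0001.csv")
print(raw_key)
print(promote(raw_key, "cleaned"))
```

Keeping the path below the zone prefix identical across zones makes it easy to trace a prepared file back to the raw file it came from.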
Used Attunity Replicate to write real-time changes directly to the data lake in S3.
Built a Python-based script framework to process the real-time data into staging tables within Redshift.
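One common way to move files from S3 into a Redshift staging table is the Redshift `COPY` command. The sketch below only builds the statement text; it assumes CSV files with a header row, and the table name, bucket path and IAM role ARN are placeholders rather than details from the project. The helper name `build_copy_sql` is likewise hypothetical.

```python
import re

# Accepts "table" or "schema.table"; anything else is rejected before it reaches SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_copy_sql(staging_table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads CSV files under s3_prefix."""
    if not _IDENTIFIER.match(staging_table):
        raise ValueError(f"invalid table name: {staging_table!r}")
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with s3://")
    if "'" in s3_prefix or "'" in iam_role_arn:
        raise ValueError("quotes are not allowed in the S3 path or role ARN")
    return (
        f"COPY {staging_table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"      # assumption: the change files are CSV
        "IGNOREHEADER 1;"      # assumption: each file has one header row
    )


# Placeholder values for illustration only.
print(build_copy_sql(
    "staging.orders",
    "s3://example-data-lake/cleaned/crm/orders/",
    "arn:aws:iam::123456789012:role/example-redshift-copy",
))
```

The resulting statement would be run through whichever Redshift client the framework uses; that connection step is left out here.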
Used Informatica to push processing down to the Redshift database layer.
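To show what running the work inside the database looks like, the sketch below builds a set-based delete-and-insert merge from a staging table into a target table, a pattern often used on Redshift. It is not the SQL that Informatica generates; the table names, the single key column and the helper `build_merge_sql` are assumptions for illustration.

```python
import re

# Accepts "name" or "schema.name" for tables, and a bare name for the key column.
_TABLE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_merge_sql(target: str, staging: str, key_column: str) -> str:
    """Return SQL that replaces target rows with matching staging rows, then adds new ones."""
    for name in (target, staging):
        if not _TABLE.match(name):
            raise ValueError(f"invalid table name: {name!r}")
    if not _COLUMN.match(key_column):
        raise ValueError(f"invalid column name: {key_column!r}")
    return "\n".join([
        "BEGIN;",
        # Remove target rows that have a newer version in staging.
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{key_column} = {staging}.{key_column};",
        # Insert every staging row (assumes both tables share the same columns).
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "COMMIT;",
    ])


# Placeholder names for illustration only.
print(build_merge_sql("analytics.orders", "staging.orders", "order_id"))
```

Because both statements run inside Redshift, no rows leave the cluster during the merge, which is the main point of pushing processing down to the database.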
Redshift allowed the Client to size its cluster to current needs while still being able to scale on demand.
Migrated all data products in 3 phases, ensuring no impact to the business while still meeting the licensing and support agreement with the old vendor.
Technologies used – Python, Informatica, AWS S3, AWS EC2, AWS Redshift, Teradata, Snowflake, MemSQL.
Team size – 1 Architect, 4 Developers, 2 QA Engineers.
Project duration – 6 months
Project Governance – Agile delivery governed by Joint Steering Committee, Daily Scrum, Weekly Status Reports
Delivery model – Hybrid [Onsite/Offshore]
Results and Benefits:
- Minimizing and eventually eliminating the DBA role and on-prem file storage costs.
- A custom, scalable environment for self-serve analytics.
- A faster go-to-market strategy for new projects and data products.