Introduction
In 2022, the customer, in collaboration with Xebia, successfully built a Data Transfer solution on the customer's AWS platform. Since its implementation, the customer has identified several new requirements and areas for expansion, and asked Xebia's experts to design enhancements and additional features for the next version of the Data Transfer solution.
Objective
The main goal is to extend the functionality of the current Data Transfer solution so that data security and classification are enforced automatically. These enhancements must be deployed within the same AWS account as the existing solution and adhere to all current agreements. The solution's design should be modular and reusable, so that other teams within the customer's organization can deploy it using Infrastructure as Code (IaC). Any additional deployments beyond the current scope will be addressed separately.
Inbound Solution
The Inbound Solution combines several AWS components to provide security, efficiency, and scalability. AWS Lambda functions handle backend processing tasks, including fetching data from third-party vendors and SaaS products and converting inbound files from CSV to Parquet. These functions are orchestrated with AWS Step Functions so that the data pipeline runs end to end without manual intervention. Incoming files are stored in Amazon S3, where they are scanned for security threats by Trend Micro Cloud One File Storage Security. Amazon WorkSpaces provides virtual desktops for end-user access. The solution mandates AWS IAM Identity Center (formerly AWS SSO) for user access via IAM roles, which keeps access secure and controlled. An Amazon S3 lifecycle policy ensures that files are removed in alignment with the credential expiration policy. AWS Lake Formation is used for data governance, ensuring proper data classification and security, while AWS Glue is used for data cataloging and ETL processes. Additionally, Amazon Macie is used to discover and protect PII data.
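As an illustration of the lifecycle policy described above, the following is a minimal sketch of an expiration rule expressed as the configuration that the S3 API accepts. The prefix, the retention period, and the function names are hypothetical; the actual retention period would follow the customer's credential expiration policy, and the actual solution defines such rules through IaC rather than an ad hoc script.

```python
from typing import Any, Dict


def build_expiration_rule(prefix: str, retention_days: int) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    `retention_days` is an assumed value; align it with the credential
    expiration policy that applies to the transferred files.
    """
    if retention_days < 1:
        raise ValueError("retention_days must be a positive number of days")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{retention_days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # S3 deletes current object versions this many days after creation.
                "Expiration": {"Days": retention_days},
            }
        ]
    }


def apply_lifecycle(bucket: str, config: Dict[str, Any]) -> None:
    """Apply the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    # Hypothetical prefix and retention period, for illustration only.
    print(build_expiration_rule("inbound/", 30))
```

Keeping the rule builder separate from the API call makes the rule easy to review and unit test before it is applied to a bucket.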
Within the customer's Data Platform, Amazon CloudWatch alarms are used extensively to keep processes robust and reliable. Alarms monitor the AWS Lambda functions and send immediate notifications when a function fails, so that the team can respond promptly. Alarms are also set on AWS Glue crawler runs to alert on failures, which keeps the data catalog up to date and accurate. Additionally, CloudWatch alarms monitor the number of messages in Amazon SQS queues. When the message count exceeds a specified threshold, these alarms trigger AWS Step Functions or Lambda functions to handle the increased load, maintaining the throughput of the data pipeline. Together, these alarms provide proactive monitoring and automatic scaling of the infrastructure, supporting the availability and reliability of the customer's services.
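The queue-depth alarm described above could be sketched as follows. This is an illustrative example, not the customer's configuration: the queue name, threshold, evaluation settings, and action ARN are assumptions. The parameters match those accepted by the CloudWatch `put_metric_alarm` API, and the metric is the standard SQS metric `ApproximateNumberOfMessagesVisible`.

```python
from typing import Any, Dict


def build_queue_depth_alarm(
    queue_name: str, threshold: int, action_arn: str
) -> Dict[str, Any]:
    """Build put_metric_alarm parameters for an SQS backlog alarm.

    `action_arn` is the target notified when the alarm fires, for example an
    SNS topic that starts further processing (hypothetical in this sketch).
    """
    return {
        "AlarmName": f"{queue_name}-queue-depth",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,  # seconds per evaluation window (assumed)
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # An idle queue may report no data points; do not alarm on that.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [action_arn],
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Create or update the alarm (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Hypothetical queue, threshold, and SNS topic, for illustration only.
    print(
        build_queue_depth_alarm(
            "inbound-files",
            100,
            "arn:aws:sns:eu-west-1:111122223333:example-topic",
        )
    )
```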
Furthermore, various file types, such as CSV, JSON, and Parquet, are stored in Amazon S3 buckets that are encrypted with customer-managed AWS KMS (Key Management Service) keys. Access to these buckets is tightly controlled through IAM (Identity and Access Management) policies and KMS key policies, ensuring that only authorized users and services can access the encrypted data. KMS is also used to encrypt data held by other AWS services within the platform, including SNS (Simple Notification Service), SQS (Simple Queue Service), Lambda environment variables, Secrets Manager, and DynamoDB. This consistent use of KMS across the customer's services protects sensitive data at rest. Moreover, AWS Security Hub, Amazon GuardDuty, and AWS Config are employed to strengthen the security posture, AWS Control Tower Service Control Policies (SCPs) govern which services and regions can be used, and solutions are deployed across multiple AWS accounts to maintain a secure and compliant multi-account environment.
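To make the bucket encryption setup concrete, the sketch below builds a default-encryption configuration that points a bucket at a customer-managed KMS key, in the shape accepted by the S3 `put_bucket_encryption` API. The key ARN, bucket name, and function names are hypothetical, and enabling S3 Bucket Keys is an assumption made here to reduce KMS request volume, not something stated for the customer's environment.

```python
from typing import Any, Dict


def build_kms_encryption_config(kms_key_arn: str) -> Dict[str, Any]:
    """Build a default-encryption configuration using a customer-managed key."""
    if not kms_key_arn.startswith("arn:"):
        raise ValueError("expected a KMS key ARN")
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # Assumed setting: Bucket Keys reduce the number of KMS calls.
                "BucketKeyEnabled": True,
            }
        ]
    }


def apply_encryption(bucket: str, config: Dict[str, Any]) -> None:
    """Apply the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("s3").put_bucket_encryption(
        Bucket=bucket, ServerSideEncryptionConfiguration=config
    )


if __name__ == "__main__":
    # Hypothetical key ARN, for illustration only.
    print(
        build_kms_encryption_config(
            "arn:aws:kms:eu-west-1:111122223333:key/example-key-id"
        )
    )
```

Default encryption alone does not restrict who can read the data; as the paragraph above notes, access still depends on the IAM policies and the KMS key policy.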
Business Impact
The enhanced Data Transfer solution significantly improves the customer's ability to manage and share data securely and efficiently. The implementation of MFA and lifecycle policies strengthens security and supports compliance with AWS data privacy best practices and regulatory requirements. The modular design and use of Infrastructure as Code (IaC) make the solution scalable, allowing other teams within the customer's organization to deploy similar solutions with minimal effort. These enhancements lead to better data governance, increased operational efficiency, and robust security, ultimately supporting the customer's growth and innovation.
By leveraging AWS's comprehensive suite of services, the customer can now handle inbound data transfers with greater confidence, meeting both current needs and future expansion plans.