AWS Database Migration Service
Understand what AWS Database Migration Service is and how it works.
AWS Database Migration Service (DMS) allows users to migrate relational databases, non-relational databases, and data warehouses securely and efficiently. The source and the target do not both have to be on AWS. The migration can run between any combination of cloud and on-premises data stores, as long as at least one of them is on an AWS service.
Traditionally, the migration process was time-consuming. It required estimating the capacity of the target data store, performing specific installations, and then monitoring, testing, and debugging the migrated resources. AWS DMS removes much of this overhead with a single managed service. It automatically handles the deployment and management of the infrastructure needed for the migration.
How AWS DMS works
AWS DMS works by replicating data from the source database to the target database. The source and target data stores are called endpoints. To run a migration, the user first creates a replication instance. The replication instance is responsible for connecting to the source data store, reading the data, formatting it so that it is compatible with the target endpoint, and performing the actual migration.
Next, a source endpoint and a target endpoint must be created to connect to the source and target data stores. Once the connection succeeds for both endpoints, the user can create a data migration task to migrate the data.
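The setup order described above can be sketched as plain request parameters. This is a minimal sketch, not a definitive implementation: the key names follow the boto3 DMS client's create_replication_instance and create_endpoint calls, while every identifier, host name, port, and instance class below is a hypothetical placeholder. The helper function names are illustrative and not part of any AWS library.

```python
# Request parameters for the resources created before a migration task.
# Key names follow the boto3 DMS client; all values are placeholders.


def replication_instance_params(identifier, instance_class="dms.t3.medium",
                                storage_gb=50, multi_az=False):
    """Parameters for the instance that reads, formats, and loads the data."""
    return {
        "ReplicationInstanceIdentifier": identifier,
        "ReplicationInstanceClass": instance_class,
        "AllocatedStorage": storage_gb,  # in GB
        "MultiAZ": multi_az,
    }


def endpoint_params(identifier, endpoint_type, engine, host, port, database):
    """Parameters for one endpoint; endpoint_type is 'source' or 'target'."""
    if endpoint_type not in ("source", "target"):
        raise ValueError("endpoint_type must be 'source' or 'target'")
    return {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,
        "EngineName": engine,
        "ServerName": host,
        "Port": port,
        "DatabaseName": database,
        # Credentials (Username, Password) are intentionally omitted here.
    }


# Hypothetical example: one on-premises source, one AWS-hosted target.
instance = replication_instance_params("example-instance")
source = endpoint_params("example-source", "source", "oracle",
                         "onprem-db.example.internal", 1521, "ORCL")
target = endpoint_params("example-target", "target", "aurora-postgresql",
                         "aurora.example.internal", 5432, "appdb")

for name, params in (("instance", instance), ("source", source), ("target", target)):
    print(name, params)
```

With boto3 installed and credentials configured, each dictionary could be passed as keyword arguments, for example client.create_replication_instance(**instance) and client.create_endpoint(**source), after adding the credentials that the sketch leaves out.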
The data migration task has three types:
Migrate existing data: This is the simplest option for replicating data, where DMS creates a copy of our source data. This is useful when we want a copy of the original data, for example, to test against it without touching the source database. This is a one-time operation.
Migrate existing data and replicate ongoing changes: In this type, DMS copies our existing data, then keeps capturing changes made on the source and applies them to the target.
Replicate data changes only: In this type, DMS skips the initial copy and replicates only the ongoing changes. This option is usually used for homogeneous migrations, where migration happens from the self-managed source to the equivalent database on the AWS-managed service and the existing data has already been copied by other means, such as the database engine's native tools.
We can select the migration type at the time of creating the data migration task.
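As a rough illustration, the three options above correspond to the MigrationType values accepted by the boto3 DMS client's create_replication_task call: full-load, full-load-and-cdc, and cdc. The sketch below only builds the request parameters. The ARNs and the task identifier are hypothetical placeholders, and the helper name build_task_params is an assumption for this example.

```python
import json

# Console-style option names mapped to the API's MigrationType values.
MIGRATION_TYPES = {
    "Migrate existing data": "full-load",
    "Migrate existing data and replicate ongoing changes": "full-load-and-cdc",
    "Replicate data changes only": "cdc",
}


def build_task_params(task_id, option, source_arn, target_arn, instance_arn):
    """Build keyword arguments for a DMS create_replication_task request."""
    # Selection rule that includes every table in every schema.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-all",
                "object-locator": {"schema-name": "%", "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": MIGRATION_TYPES[option],
        "TableMappings": json.dumps(table_mappings),  # the API expects a JSON string
    }


# Placeholder ARNs; real ones are returned when the resources are created.
params = build_task_params(
    "example-task",
    "Migrate existing data and replicate ongoing changes",
    "arn:aws:dms:example:source",
    "arn:aws:dms:example:target",
    "arn:aws:dms:example:instance",
)
print(params["MigrationType"])
```

An unknown option name raises a KeyError, which keeps a typo from silently selecting the wrong migration type.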
Improving AWS DMS performance
Multiple factors affect the performance of AWS DMS. It is important to keep these factors in mind while performing homogeneous or heterogeneous migrations. Below are some measures that can help increase AWS DMS performance:
For homogeneous migrations, each source data type is first converted to its equivalent AWS DMS data type, and each AWS DMS data type is then converted to the target data type. These conversions need CPU on the replication instance, so if the CPU is not available, the migration process may slow down.
Larger instances should be used when the volume of data being migrated is large or when multiple replication tasks run on the same instance.
By default, a replication server comes with 50 GB or 100 GB of storage, depending on the instance class, which is usually enough for database migration. However, for large transactions or in the case of multiple replication tasks, we should increase the storage. Choosing a Multi-AZ instance, especially during the ongoing replication phase, can protect against storage failures.
The number of tables being loaded in parallel can affect the performance as well. By default, eight tables are loaded in parallel by AWS DMS. Increasing this number while using a very large replication server can increase the performance. However, for smaller instances, reducing this number is better for performance.
Indexes, triggers, and referential integrity constraints can impact the performance and success of the migration process. For full load tasks, dropping primary key indexes, secondary indexes, and triggers, and turning off referential integrity constraints is recommended. For tasks that combine a full load with Change Data Capture (CDC), secondary indexes should be added before the CDC phase to optimize performance.
Transactional consistency is maintained within a task, so tables that do not participate in common transactions can be divided into multiple replication tasks, which can increase overall performance. By default, AWS DMS applies changes in transactional mode, which preserves transactional integrity. If temporary lapses in transactional integrity are acceptable, the batch-optimized apply option can be used instead. It groups transactions and applies them in batches, increasing efficiency, but it can violate referential integrity constraints, so these constraints should be turned off during the migration.
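Two of the measures above, the number of tables loaded in parallel and the batch-optimized apply option, are controlled through the task settings JSON. The following is a minimal sketch that assumes the MaxFullLoadSubTasks and BatchApplyEnabled setting names from the DMS task settings document. The helper name build_task_settings and the chosen values are illustrative only, and the upper bound of 49 reflects the documented maximum as far as I know.

```python
import json

DEFAULT_PARALLEL_TABLES = 8  # default number of tables loaded in parallel


def build_task_settings(parallel_tables=DEFAULT_PARALLEL_TABLES, batch_apply=False):
    """Return a partial DMS task settings document as a dictionary."""
    # Assumed valid range for MaxFullLoadSubTasks; check the current docs.
    if not 1 <= parallel_tables <= 49:
        raise ValueError("parallel_tables must be between 1 and 49")
    return {
        "FullLoadSettings": {"MaxFullLoadSubTasks": parallel_tables},
        "TargetMetadata": {"BatchApplyEnabled": batch_apply},
    }


# Smaller instance: load fewer tables at once.
small = build_task_settings(parallel_tables=4)

# Very large instance: load more tables at once and apply changes in batches.
large = build_task_settings(parallel_tables=16, batch_apply=True)

print(json.dumps(large, indent=2))
```

The resulting dictionary, serialized with json.dumps, could be supplied as the ReplicationTaskSettings string when creating or modifying a task.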
Use case: AWS Schema Conversion Tool
Let’s assume we have a critical business application that relies on an Oracle database hosted on-premises. We wish to migrate this application and its data to Amazon Aurora PostgreSQL, a managed PostgreSQL-compatible service offered by AWS. However, there are some key differences between Oracle and PostgreSQL that can cause challenges during migration.
Heterogeneous environment: Migrating from Oracle (source) to Aurora PostgreSQL (target) is a heterogeneous migration, requiring additional steps to handle database schema and code incompatibilities.
Schema differences: Data types, functions, stored procedures, and views might need conversion to ensure they work correctly in the target database.
Manual conversion effort: Traditionally, manually modifying the schema and code to be compatible with Aurora PostgreSQL can be time-consuming and error-prone.
We can use AWS DMS along with the AWS Schema Conversion Tool to address these challenges.
We connect the Schema Conversion Tool to the on-premises Oracle database. The Schema Conversion Tool analyzes the Oracle schema and generates conversion scripts, which are applied to the target so that the converted schema exists in Aurora PostgreSQL. We then configure the AWS DMS task with the source and target endpoints. The DMS replication instance establishes connections to the source and target databases and extracts data from the Oracle database based on the configured task. During data migration, DMS applies any necessary transformations to ensure compatibility with Aurora PostgreSQL. The transformed data is loaded into the target Aurora PostgreSQL cluster.
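The text does not say which transformations DMS applies in this scenario. As one hedged example, Oracle object names are typically uppercase while PostgreSQL folds unquoted names to lowercase, so a DMS task could carry table-mapping rules that convert schema and table names to lowercase. The sketch below builds such a mapping. The schema name HR and the helper name oracle_to_postgres_mappings are hypothetical.

```python
import json


def oracle_to_postgres_mappings(schema_name):
    """Table mappings: include one schema and lowercase its object names."""
    return {
        "rules": [
            {
                # Select every table in the given source schema.
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                "object-locator": {"schema-name": schema_name, "table-name": "%"},
                "rule-action": "include",
            },
            {
                # Rename the schema to lowercase on the target.
                "rule-type": "transformation",
                "rule-id": "2",
                "rule-name": "lowercase-schema",
                "rule-target": "schema",
                "object-locator": {"schema-name": schema_name},
                "rule-action": "convert-lowercase",
            },
            {
                # Rename every table in the schema to lowercase on the target.
                "rule-type": "transformation",
                "rule-id": "3",
                "rule-name": "lowercase-tables",
                "rule-target": "table",
                "object-locator": {"schema-name": schema_name, "table-name": "%"},
                "rule-action": "convert-lowercase",
            },
        ]
    }


mappings = oracle_to_postgres_mappings("HR")  # "HR" is a placeholder schema
print(json.dumps(mappings, indent=2))
```

Schema objects such as stored procedures and views are outside the scope of these rules and remain the Schema Conversion Tool's job.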