Second article in a four-part series

Read the first part ->

In our previous article, we had briefly described the principles of migrating an IMS database towards a relational database (SQL). In this new article, we are going to describe how the structure of the legacy database can be reverse engineered and how the new database can be designed.

The data migration is broken down into three phases :

  • reverse engineering of the structure of the legacy database

  • conception of the new database

  • migration of the data themselves

We will describe the 2 first phases in detail here below :

Reverse engineering

The reverse engineering of the legacy database is the implemented process that will document the database structure in the most precise and complete way possible. This documentation must be sufficient to allow a developer to develop or maintain an application using this database. Which means that it has to contain at least for each record the list of all fields (including length and type), the access keys, the identifiers, the sort keys; and also the links existing between the different records. 

The issue with the reverse engineering of an IMS database is that these informations are scattered across several places. The DBD (DataBase description) contains the name, the length of the segments and the « father - son » relationship between the segments. Only the fields used as access keys must be declared (just their name and length). Declaration of other fields is optional. A more precise decomposition of records is defined by the programs using the database. A same record can have more than one possible decompositions.

Rever has developed different tools facilitating the reverse engineering of IMS databases :

  • DBD extractor : analyzes one or more DBDs in order to model records inside the repository of DB-MAIN

migration schema 1

  • COBOL « copy » extractor : analyzes a COBOL copy in order to model the declared variable(s) inside the repository of DB-MAIN
  • PL1 « include » file extractor : analyzes a PL1 include in order to model the declared variable(s) inside the repository of DB-MAIN

migration schema 2

  • Schema integration : the DBDs and Copy / Includes are different sources of information that describe the same database. This tool allows you to integrate those different models into one single model (the same segment is defined in a DBD and in a copy or an include)

migration schema 3

The use of these different tools allows you to discover a full and precise model of the database.


Conception of the new database

In this phase, the new database (SQL) is going to be derived from the model discovered during the previous phase.

The model of the IMS database contains structures that would be impossible to represent in SQLthey must be transformed into equivalent constructions. The father-son links must be transformed in referential constraints (foreign key). The multi-valued fields (value arrays) must be transformed in a series of columns or as a sub-table, etc.

DB-MAIN offers a set of transformations in order to go from the IMS model to a relational model. It is also possible to partially automate this transformation. A full automation is impossible since some of the choices we make during the transformation are going to influence the readability and performances of the new database. For instance, if the IMS database contains a multi-valued field (an array), there are three possible transformations :

  • Creation of a table to represent the elements of the array : produces a standardized database, easily readable, easily available and evolutive (you can easily modify the number of elements in the array). By contrast, the performances will be degraded because a joint between both tables will be necessary to rebuild the IMS record.
  • Creation of a column to represent each element of the array : produces an efficient database but its structure is less clear (specially if the table contains a lot of elements).
  • Creation of a big column in which the different elements are concatenated : produces an efficient database but less easy to reach (the column has to be decomposed in order to find the different elements).

As shown in this example, the conception of the database is a very important phase because it will determine the performance and readability of the new database.


In the third part of this article, we will address in detail the migration of data themselves.


Abbreviations :

  • IMS = Information Management System
  • SQL = Structured Query Language
  • Cobol = Common Business-Oriented Language
  • PL1 = Programming Language number 1
  • DBD = Database Description
  • DBMS = Database Management System

Find out more ? DB-Main is a data-modeling and data-architecture tool. Historically linked to the database engineering lab of the UNamurRever continues to improve the software in order to help developers and analysts in most data engineering processes.



Jean Henrard