The following article discusses some of the data quality issues that can be addressed by manual scripts run against a copy of the database being analysed. This approach allows quick wins to be achieved for relatively little cost and effort by a good database administrator.
This can be supported by more detailed data profiling using tools such as Evoke and Trillium (formerly Avallino), and a short article from Intelligent Enterprise magazine lists some justifications for data profiling.
Nulls and Not Nulls
The first step is to establish all the columns that should be NOT NULL. This is simply achieved by running the following script:
set pagesize 0
set linesize 256
set feedback off
set heading off
column tt format a255
spool make_not_null.sql
select 'prompt Trying to modify '||table_name||'.'||column_name||chr(10)||
       'alter table '||table_name||' modify '||column_name||' not null;' tt
from user_tab_columns;
spool off
set heading on
set feedback on
This script reads the database metadata, finds all columns, and generates a statement for each that tries to convert it to NOT NULL. When the generated script is run, a statement will fail and execution will move on to the next statement if the column is already NOT NULL or contains nulls.
The result of this is a set of columns that the application regards as mandatory, regardless of how the columns are defined in the schema.
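To apply the changes, run the spooled file (make_not_null.sql is an assumed name chosen in the script above); by default SQL*Plus continues past a failed statement, so the errors simply mark the columns that could not be converted. For a hypothetical EMP table the generated file would contain statements of the form:
prompt Trying to modify EMP.ENAME
alter table EMP modify ENAME not null;
prompt Trying to modify EMP.COMM
alter table EMP modify COMM not null;
Run it with @make_not_null.sql and review the ORA- errors to see which columns could not be made mandatory and why.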
Primary Keys
The next step is to carry out a simple process of analysing each table that does not have a primary key defined and ensuring one is created. The tables that do not have a primary key can be found with the following:
select table_name from user_tables
minus
select table_name from user_constraints
where constraint_type = 'P';
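Once a suitable key has been identified for each of these tables, the primary key can be added. As a minimal sketch, assuming a hypothetical ORDERS table whose ORDER_ID column is unique and not null:
alter table orders
  add constraint orders_pk primary key (order_id);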
Foreign Keys
The next step is to identify all the foreign keys that are used in the system.
This is done from domain knowledge and local expertise; it may also be possible to identify some where a sensible naming convention has been used to define the tables and columns.
Once identified, the appropriate foreign key constraints should be applied to the database, as sketched below.
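As a minimal sketch, assuming the convention that a referencing column shares its name with the referenced table's primary key column, candidate foreign key columns can be listed from the data dictionary and each confirmed relationship then applied (the ORDERS/CUSTOMERS names here are hypothetical):
-- list columns in other tables that share a name with a primary key column
select c.table_name, c.column_name, p.table_name referenced_table
from   user_tab_columns c, user_constraints p, user_cons_columns pc
where  p.constraint_type = 'P'
and    pc.constraint_name = p.constraint_name
and    c.column_name = pc.column_name
and    c.table_name <> p.table_name;

-- apply a confirmed relationship
alter table orders
  add constraint orders_customers_fk
  foreign key (customer_id) references customers (customer_id);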