Basic Data Quality Checks

This article looks at basic data quality audit that can be done within a database. Examples are given using Oracle syntax however the techniques can also be applied to other databases

 

Introduction

The following article discusses some of the data quality issues that can be addressed by manual scripts on a copy of the database being analysed. This process allows quick wins to be taken for relatively little cost and effort by a good database administrator.

This can be supported by more detailed data profiling using tools such as Evoke and Trillium (formerly Avallino) and a short article from Intelligent Enterprise magazine lists some justifications for data profiling

Nulls and Not Nulls

The first step is to establish all the columns that should be NOT NULL. This is simply achieved by running the following script

set pagesize 0
set linesize 256
set feedback off
set heading off

column tt format a255

spool notnull.sql

select ‘prompt Trying to modify ‘||table_name||’.’||column_name tt
‘alter table ‘||table_name||’ modify ‘||column_name||’ not null;’ tt
from user_tab_columns
/

spool off

set heading on
set feedback on

spool notnull.lis

start notnull.sql

spool off

This script will read the database metadata and find all columns, it will then try and convert them to NOT NULL. The script will fail and move onto the next statement if the column is already NOT NULL or contains nulls.

The result of this is a set of columns that the application regards as mandatory regardless of it’s definition.

Primary Keys

The next step is to carry out a simple process of analysing each table that does not have a primary key defined and ensuring one is created. The tables that do not have a primary key defined can be found with the following:

select table_name from user_tables
minus
select table_name from user_constraints
where constraint_type = ‘P’
/

Foreign Keys

The next step is to identify all the foreign keys that are used in the system.
This is done from domain knowledge and local expertise, it may also be possible to identify some from where a sensible naming convention has been used to define the tables.

Once identified the appropriate foreign keys should be applied to the database. 

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.