Data Profiling
Data profiling is the process of examining the data in an existing data source (e.g., databases or files) and collecting statistics and information about that data.
Data profiling is also referred to as data discovery.
This method is widely used in enterprise data warehousing.
Data profiling uses descriptive statistics such as mean, minimum, maximum, percentiles, and frequencies, along with other aggregates such as count and sum.
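As a rough illustration, the statistics listed above can be computed for a single numeric column with Python's standard library. This is a minimal sketch, not any profiling tool's API. The function names `percentile` and `profile_numeric`, the treatment of `None` as a missing value, and the linear-interpolation percentile method are all assumptions made for the example.

```python
import math
import statistics
from collections import Counter


def percentile(sorted_vals, p):
    """Percentile p (0-100) by linear interpolation between closest ranks (assumed method)."""
    k = (len(sorted_vals) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def profile_numeric(values):
    """Descriptive statistics for one numeric column; None is treated as missing."""
    present = [v for v in values if v is not None]
    if not present:
        return {"count": 0}
    ordered = sorted(present)
    return {
        "count": len(present),
        "sum": sum(present),
        "mean": statistics.mean(present),
        "min": ordered[0],
        "max": ordered[-1],
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "frequency": Counter(present),  # value -> number of occurrences
    }


stats = profile_numeric([34, 28, None, 45, 28])
# stats["count"] == 4, stats["mean"] == 33.75, stats["p50"] == 31.0
```

Interpolating between ranks keeps percentiles defined even for very small columns. A real profiler would also need to decide how to handle non-numeric values, which this sketch simply assumes are absent.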
Additional metadata gathered during profiling includes data type, length, discrete values, uniqueness, and abstract type recognition.
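The metadata items above can be sketched for a column of raw string values as follows. This is a hedged illustration only: the regular expressions in `ABSTRACT_TYPES`, the labels `email`, `iso_date`, and `integer`, and the helper names `infer_data_type` and `profile_metadata` are hypothetical. Real profilers use much richer rule sets for abstract (semantic) type recognition.

```python
import re

# Hypothetical rules for recognising a few abstract (semantic) types.
# Checked in insertion order; the first rule that matches every value wins.
ABSTRACT_TYPES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "integer": re.compile(r"[+-]?\d+"),
}


def infer_data_type(text):
    """Return the narrowest of 'int', 'float', 'str' that can parse text."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(text)
            return name
        except ValueError:
            pass
    return "str"


def profile_metadata(values):
    """Collect column metadata from raw string values (None = missing)."""
    present = [v for v in values if v is not None]
    types = {infer_data_type(v) for v in present}
    if not types:
        data_type = "unknown"
    elif len(types) == 1:
        data_type = types.pop()
    else:
        data_type = "mixed"
    abstract = None
    for label, pattern in ABSTRACT_TYPES.items():
        if present and all(pattern.fullmatch(v) for v in present):
            abstract = label
            break
    distinct = set(present)
    return {
        "data_type": data_type,
        "min_length": min(map(len, present), default=0),
        "max_length": max(map(len, present), default=0),
        "distinct_values": distinct,
        "is_unique": len(distinct) == len(present),  # no duplicates among non-null values
        "abstract_type": abstract,
    }


emails = profile_metadata(["a@x.com", "b@y.org", "a@x.com"])
ids = profile_metadata(["101", "102", "103"])
```

For the sample data, `emails` is recognised as type `str` with abstract type `email` and is not unique, while `ids` parses as `int`, matches the `integer` rule, and is unique.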
Types of Analysis Performed
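- Completeness Analysis – How many records have a value (rather than a null or blank) for a given attribute?
- Uniqueness Analysis – How many distinct values does an attribute have, and are there duplicates where values should be unique?
- Values Distribution Analysis – What is the distribution of records across different values for a given attribute?
- Range Analysis – What are the minimum and maximum values of an attribute, and do all values fall within the expected bounds?
- Pattern Analysis – Which formats (e.g., the character layout of phone numbers or postal codes) do the values of an attribute follow?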
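The five analyses can be sketched over a small list of records (dictionaries) in plain Python. This is a minimal sketch under stated assumptions: all function names and the sample `customers` data are hypothetical, `None` or a missing key counts as an empty value, and the pattern analysis uses one common approach (letters become `A`, digits become `9`) rather than any particular tool's rules.

```python
from collections import Counter


def completeness(records, attr):
    """Share of records where attr is present and not None."""
    filled = sum(1 for r in records if r.get(attr) is not None)
    return filled / len(records) if records else 0.0


def uniqueness(records, attr):
    """Distinct non-null values divided by non-null count (1.0 means no duplicates)."""
    vals = [r[attr] for r in records if r.get(attr) is not None]
    return len(set(vals)) / len(vals) if vals else 0.0


def value_distribution(records, attr):
    """Number of records carrying each value of attr (None included)."""
    return Counter(r.get(attr) for r in records)


def value_range(records, attr):
    """(min, max) over non-null values of attr, or None if there are none."""
    vals = [r[attr] for r in records if r.get(attr) is not None]
    return (min(vals), max(vals)) if vals else None


def shape(text):
    """Reduce a string to a pattern: letters -> 'A', digits -> '9', others kept."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in text)


def pattern_counts(records, attr):
    """Frequency of each value shape; rare shapes often point to malformed entries."""
    return Counter(shape(str(r[attr])) for r in records if r.get(attr) is not None)


customers = [
    {"id": 1, "zip": "10001", "country": "US"},
    {"id": 2, "zip": "1000A", "country": "US"},
    {"id": 3, "zip": None, "country": "CA"},
    {"id": 3, "zip": "10001", "country": "US"},
]
# completeness(customers, "zip") == 0.75  (one zip is missing)
# uniqueness(customers, "id") == 0.75     (id 3 appears twice)
# value_range(customers, "id") == (1, 3)
# pattern_counts(customers, "zip") == Counter({"99999": 2, "9999A": 1})
```

In this sample the duplicate `id` and the `9999A` zip shape are exactly the kind of anomalies these analyses are meant to surface.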
Benefits
The main benefit of data profiling is improved data quality.
Data profiling clarifies the structure, relationships, content, and derivation rules of data, which helps in understanding anomalies within the metadata.