Data Profiling

Senthil Nayagan
Apr 28, 2018

Data profiling is the process of examining the data available in an existing data source (e.g. databases or files) and collecting statistics and information about that data.

Data profiling is also referred to as data discovery.

This method is widely used in enterprise data warehousing.

Data profiling uses descriptive statistics such as mean, minimum, maximum, percentiles, and frequencies, along with other aggregates such as count and sum.
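As a rough illustration of these statistics, here is a minimal sketch in Python using only the standard library. The function name `profile_numeric` and the sample `ages` column are made up for this example, and the function assumes a list with at least two numeric values.

```python
import statistics
from collections import Counter

def profile_numeric(values):
    """Return basic descriptive statistics for a list of numbers (hypothetical helper)."""
    ordered = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        "sum": sum(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "p25": q1,
        "p50": median,
        "p75": q3,
        # The three most common values with their occurrence counts.
        "frequency": Counter(ordered).most_common(3),
    }

# Illustrative sample column, not real data.
ages = [23, 35, 35, 41, 52, 35, 29]
stats = profile_numeric(ages)
print(stats)
```

In practice a profiling tool or a SQL query would usually compute these per column; the sketch only shows what each number means.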

Additional metadata obtained during profiling includes data type, length, discrete values, uniqueness, and abstract type recognition.
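The metadata side can be sketched in a similar way. The example below assumes the column arrives as a non-empty list of raw strings; `infer_type` and `profile_metadata` are hypothetical helpers, and the type inference is deliberately simplistic (integer, float, or string). Abstract type recognition, such as spotting that a column holds postal codes, is left out.

```python
def infer_type(value):
    """Guess a simple data type for one raw string value."""
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "string"

def profile_metadata(values):
    """Collect type, length, distinct-value, and uniqueness metadata for a column."""
    types = {infer_type(v) for v in values}
    lengths = [len(v) for v in values]
    distinct = set(values)
    return {
        # Report "mixed" when the values do not all share one inferred type.
        "inferred_type": types.pop() if len(types) == 1 else "mixed",
        "min_length": min(lengths),
        "max_length": max(lengths),
        "distinct_values": len(distinct),
        "is_unique": len(distinct) == len(values),
    }

# Illustrative sample column, not real data.
zip_codes = ["10001", "94105", "60601", "10001"]
meta = profile_metadata(zip_codes)
print(meta)
```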

Types of Analysis Performed

  • Completeness Analysis
  • Uniqueness Analysis
  • Values Distribution Analysis – What is the distribution of records across different values for a given attribute?
  • Range Analysis
  • Pattern Analysis
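Each of the analyses listed above can be sketched as a small function. This is one possible interpretation, not a standard definition: the function names, the choice to treat `None` and empty strings as missing, and the sample columns are all assumptions made for illustration.

```python
import re
from collections import Counter

def is_present(value):
    """Treat None and the empty string as missing (an assumption of this sketch)."""
    return value is not None and value != ""

def completeness(values):
    """Fraction of values that are present."""
    return sum(1 for v in values if is_present(v)) / len(values)

def uniqueness(values):
    """Fraction of present values that are distinct."""
    present = [v for v in values if is_present(v)]
    return len(set(present)) / len(present)

def value_distribution(values):
    """Number of records per distinct value."""
    return Counter(values)

def value_range(values):
    """Smallest and largest present value."""
    present = [v for v in values if is_present(v)]
    return min(present), max(present)

def pattern_match_rate(values, pattern):
    """Fraction of present values that fully match a regular expression."""
    regex = re.compile(pattern)
    present = [v for v in values if is_present(v)]
    return sum(1 for v in present if regex.fullmatch(v)) / len(present)

# Illustrative sample columns, not real data.
countries = ["US", "IN", "US", None, "DE"]
ages = [34, 27, None, 45, 27]
phones = ["555-0101", "555-0102", "5550103", "", "555-0104"]

print(completeness(countries))        # completeness analysis
print(uniqueness(countries))          # uniqueness analysis
print(value_distribution(countries))  # values distribution analysis
print(value_range(ages))              # range analysis
print(pattern_match_rate(phones, r"\d{3}-\d{4}"))  # pattern analysis
```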

Benefits

The main benefit of data profiling is improved data quality.

Data profiling clarifies the structure, relationships, content, and derivation rules of data, which helps in understanding anomalies within metadata.
