The pandas method DataFrame.describe() generates a statistical summary of the columns in a DataFrame. By default, it summarises only the numerical columns and reports key statistical metrics: count, mean, standard deviation, minimum, maximum and a set of percentiles.

What is the syntax for pandas’ describe() function?

The basic syntax of describe() for DataFrames is simple. It looks like this:

```python
DataFrame.describe(percentiles=None, include=None, exclude=None)
```

Important parameters for pandas’ DataFrame.describe()

Using the following parameters, you can adjust the output of describe():

  • percentiles: List-like of numbers between 0 and 1 giving the percentiles to include in the summary. Default: [.25, .5, .75]
  • include: Data types to include in the summary: 'all' (every column), a list-like of dtypes such as [np.number], or None. Default: None, which summarises all numeric columns (or, if the DataFrame has none, its object and categorical columns)
  • exclude: Data types to leave out of the summary, given as a list-like of dtypes in the same way as include, or None. Default: None (nothing is excluded)
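To see how include and exclude change the result, the following sketch builds a small DataFrame with one text column and one numeric column (the data is made up for illustration). Text columns are summarised with count, unique, top (most frequent value) and freq (its frequency) instead of mean and percentiles.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data, for illustration only
df = pd.DataFrame({
    "Product": ["A", "B", "A", "C"],
    "Quantity": [10, 20, 15, 5],
})

# Default (include=None): only the numeric column is summarised
default_summary = df.describe()
print(default_summary)

# include="all": every column; NaN where a statistic does not apply
all_summary = df.describe(include="all")
print(all_summary)

# exclude=[np.number]: drop numeric columns, leaving the text column
text_summary = df.describe(exclude=[np.number])
print(text_summary)
```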

Examples of how to use pandas describe()

If you need a quick overview of the key statistical metrics of a dataset, the pandas DataFrame.describe() function is extremely useful.

Example 1: Statistical summary of numerical data

The following example uses a DataFrame df containing sales data for five products: the product name, the quantity sold, the unit price and the resulting revenue (quantity × price).

```python
import pandas as pd
import numpy as np
# Example DataFrame with sales data
data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
    'Revenue': [1000, 3000, 3000, 400, 3600]
}
df = pd.DataFrame(data)
print(df)
```

Calling describe() on df returns a statistical summary of its numerical columns:

```python
summary = df.describe()
print(summary)
```

This produces the following output. The Product column does not appear because it contains text, and describe() summarises only numeric columns by default:

```
        Quantity       Price      Revenue
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
25%    10.000000  100.000000  1000.000000
50%    15.000000  120.000000  3000.000000
75%    20.000000  150.000000  3000.000000
max    30.000000  200.000000  3600.000000
```
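Because describe() returns an ordinary DataFrame whose row labels are count, mean, std and so on, you can read a single statistic with .loc. Transposing with .T gives one row per original column, which is often easier to read when a DataFrame has many columns. A minimal sketch reusing the sales data above:

```python
import pandas as pd

data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
    'Revenue': [1000, 3000, 3000, 400, 3600]
}
df = pd.DataFrame(data)
summary = df.describe()

# Look up a single statistic by row label and column name
mean_revenue = summary.loc['mean', 'Revenue']
print(mean_revenue)

# Transpose: one row per original column, one column per statistic
summary_t = summary.T
print(summary_t[['mean', 'std', 'max']])
```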

The key metrics shown in the output are:

  • count: Number of non-NaN (Not a Number) entries
  • mean: Average of the values (also accessible via DataFrame.mean())
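  • std: Sample standard deviation of the values (normalised by n - 1)
  • min, 25%, 50%, 75%, max: Minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum values; the percentiles are computed with linear interpolation between data points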
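Each row of the summary corresponds to a standalone pandas method, which is useful when you only need one statistic. The sketch below compares them for the Quantity column from the example; std uses the sample formula (ddof=1) and the percentiles come from quantile() with its default linear interpolation.

```python
import pandas as pd

df = pd.DataFrame({'Quantity': [10, 20, 15, 5, 30]})
summary = df.describe()['Quantity']
quantity = df['Quantity']

# Each describe() row matches a standalone Series method
checks = {
    'count': quantity.count(),
    'mean': quantity.mean(),
    'std': quantity.std(),           # sample standard deviation (ddof=1)
    'min': quantity.min(),
    '25%': quantity.quantile(0.25),  # linear interpolation by default
    '50%': quantity.median(),
    '75%': quantity.quantile(0.75),
    'max': quantity.max(),
}
for label, value in checks.items():
    print(f"{label:>5}: describe={summary[label]:.6f}  method={value:.6f}")
```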

Example 2: Customising percentiles

You can customise the percentiles in the pandas DataFrame.describe() output with the percentiles parameter:

```python
# Statistical summary with custom percentiles
custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(custom_summary)
```

This function call provides the following output:

```
        Quantity       Price      Revenue
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
10%     7.000000   88.000000   640.000000
50%    15.000000  120.000000  3000.000000
90%    26.000000  180.000000  3360.000000
max    30.000000  200.000000  3600.000000
```

The output now shows the 10th, 50th and 90th percentiles in place of the default 25th, 50th and 75th percentiles from the previous example.
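The custom percentiles match what DataFrame.quantile() returns with its default linear interpolation: for n sorted values, the p-th percentile sits at position p × (n − 1), interpolating between the neighbouring values. For Quantity (sorted: 5, 10, 15, 20, 30), the 10th percentile lies at position 0.4, so it is 5 + 0.4 × (10 − 5) = 7. The sketch below checks this arithmetic and compares describe() with quantile():

```python
import pandas as pd

df = pd.DataFrame({
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
    'Revenue': [1000, 3000, 3000, 400, 3600],
})
percentiles = [0.1, 0.5, 0.9]

described = df.describe(percentiles=percentiles)
quantiles = df.quantile(percentiles)  # rows indexed 0.1, 0.5, 0.9

# Manual linear interpolation for the 10th percentile of Quantity
values = sorted(df['Quantity'])
pos = 0.1 * (len(values) - 1)          # position 0.4 in the sorted list
lower = int(pos)
manual_p10 = values[lower] + (pos - lower) * (values[lower + 1] - values[lower])

print(manual_p10)
print(described.loc['10%', 'Quantity'], quantiles.loc[0.1, 'Quantity'])
```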
