What is the pandas DataFrame describe() method?
The pandas method DataFrame.describe() generates a statistical summary of the columns in a DataFrame. By default, it summarises only the numerical columns and reports the count, mean, standard deviation, minimum, maximum and several percentiles for each of them.
What is the syntax for pandas’ describe() function?
The basic syntax of describe() for DataFrames is simple. It looks like this:
```python
DataFrame.describe(percentiles=None, include=None, exclude=None)
```
Important parameters for pandas’ DataFrame.describe()
Using the following parameters, you can adjust the output of describe():
| Parameter | Description | Default value |
|---|---|---|
| percentiles | List of percentiles (numbers between 0 and 1) to include in the summary | None (gives the 25th, 50th and 75th percentiles, i.e. [.25, .5, .75]) |
| include | Specifies which data types to include in the summary; accepts a list of data types such as np.number or object, the string 'all', or None (numeric columns only) | None |
| exclude | Specifies which data types to exclude from the summary; accepts a list of data types in the same way as include | None |
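To make the include and exclude parameters more concrete, here is a minimal sketch that builds the same sales DataFrame used in the examples below and summarises it in several ways. The variable names are illustrative only; the calls are standard describe() usage.

```python
import numpy as np
import pandas as pd

# Same sales data as in the examples below; 'Product' is a text column
df = pd.DataFrame({
    "Product": ["A", "B", "C", "D", "E"],
    "Quantity": [10, 20, 15, 5, 30],
    "Price": [100, 150, 200, 80, 120],
    "Revenue": [1000, 3000, 3000, 400, 3600],
})

# include=None (default): only the numeric columns are summarised
default_summary = df.describe()

# include=[np.number] selects the same numeric columns explicitly
numeric_summary = df.describe(include=[np.number])

# exclude=[np.number]: only the non-numeric column 'Product' remains;
# text columns get count, unique, top and freq instead of mean/std
text_summary = df.describe(exclude=[np.number])

# include="all": every column; statistics that don't apply are shown as NaN
full_summary = df.describe(include="all")

print(text_summary)
print(full_summary)
```

Excluding np.number rather than including a specific text dtype keeps the sketch independent of whether your pandas version stores the Product column as object or as a dedicated string dtype.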
Examples of how to use pandas describe()
If you need a quick overview of the key statistical metrics of a dataset, the pandas DataFrame.describe() function is extremely useful.
Example 1: Statistical summary of numerical data
In the following example, we take a look at the DataFrame df, which contains sales data (quantity, price and revenue) for five products.
```python
import pandas as pd
import numpy as np

# Example DataFrame with sales data
data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
    'Revenue': [1000, 3000, 3000, 400, 3600]
}
df = pd.DataFrame(data)
print(df)
```
Now, you can use pandas describe() to get a statistical summary of the numerical data in the columns:
```python
summary = df.describe()
print(summary)
```
The output of the pandas DataFrame.describe() function is as follows. The Product column is left out because it contains text rather than numbers:
```
        Quantity       Price      Revenue
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
25%    10.000000  100.000000  1000.000000
50%    15.000000  120.000000  3000.000000
75%    20.000000  150.000000  3000.000000
max    30.000000  200.000000  3600.000000
```
The key metrics shown in the output are:
- count: Number of non-NaN (Not a Number) entries
- mean: Average of the values (also accessible via DataFrame.mean())
- std: Sample standard deviation of the values (normalised by N - 1)
- min, 25%, 50%, 75%, max: Minimum, 25th percentile, median (50th percentile), 75th percentile and maximum values; the percentiles are computed with linear interpolation between data points
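Because describe() returns an ordinary DataFrame with the statistics as row labels, you can pick out individual values with .loc. The following sketch (variable names are illustrative) reads a row and a single cell from the summary and cross-checks the mean and std rows against DataFrame.mean() and DataFrame.std():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C", "D", "E"],
    "Quantity": [10, 20, 15, 5, 30],
    "Price": [100, 150, 200, 80, 120],
    "Revenue": [1000, 3000, 3000, 400, 3600],
})
summary = df.describe()

# Select a whole row (one statistic for every column) ...
mean_row = summary.loc["mean"]
# ... or a single cell (one statistic for one column)
quantity_median = summary.loc["50%", "Quantity"]

# numeric_only=True skips the text column 'Product'; std uses ddof=1
# (sample standard deviation) by default, just like describe()
direct_mean = df.mean(numeric_only=True)
direct_std = df.std(numeric_only=True)

print(quantity_median)
print(np.allclose(mean_row.to_numpy(), direct_mean.to_numpy()))
print(np.allclose(summary.loc["std"].to_numpy(), direct_std.to_numpy()))
```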
Example 2: Customising percentiles
You can customise the percentiles in the pandas DataFrame.describe() output with the percentiles parameter:
```python
# Statistical summary with custom percentiles
custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(custom_summary)
```
This function call provides the following output:
```
        Quantity       Price      Revenue
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
10%     7.000000   88.000000   640.000000
50%    15.000000  120.000000  3000.000000
90%    26.000000  180.000000  3360.000000
max    30.000000  200.000000  3600.000000
```
The output now contains the 10th, 50th and 90th percentiles instead of the default 25th, 50th and 75th percentiles shown in the previous example.
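The percentile values come from linear interpolation, the same default used by DataFrame.quantile(). For example, with the sorted quantities 5, 10, 15, 20 and 30, the 10th percentile lies at position 0.1 × 4 = 0.4, i.e. 40% of the way from 5 to 10, which gives 7. As a quick cross-check (a sketch with illustrative variable names), you can compute the same percentiles directly and compare them with the describe() rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C", "D", "E"],
    "Quantity": [10, 20, 15, 5, 30],
    "Price": [100, 150, 200, 80, 120],
    "Revenue": [1000, 3000, 3000, 400, 3600],
})

custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])

# describe() labels the requested percentiles "10%", "50%" and "90%"
pct_rows = custom_summary.loc[["10%", "50%", "90%"]]

# DataFrame.quantile() takes the same fractions; numeric_only skips 'Product'
direct = df.quantile([0.1, 0.5, 0.9], numeric_only=True)

print(direct)
print(np.allclose(pct_rows.to_numpy(), direct.to_numpy()))
```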