What is the pandas DataFrame describe() method?

The pandas method DataFrame.describe() generates a statistical summary of a DataFrame. By default, it covers only the numerical columns and reports key metrics such as the count, mean, standard deviation, minimum, maximum and selected percentiles.

What is the syntax for pandas’ `describe()` function?

The basic syntax of describe() for DataFrames is simple. It looks like this:

DataFrame.describe(percentiles=None, include=None, exclude=None)

Important parameters for pandas’ `DataFrame.describe()`

Using the following parameters, you can adjust the output of describe():

Parameter	Description	Default value
`percentiles`	Lists the percentiles to include in the summary; each value must lie between 0 and 1	`[.25, .5, .75]`
`include`	Specifies which data types to include in the description; accepts `'all'`, a list-like of dtypes (for example `numpy.number` or `object`) or `None`, which limits the summary to numeric columns	`None`
`exclude`	Specifies which data types to exclude from the description; accepts a list-like of dtypes or `None`	`None`
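As a brief illustration of `include` and `exclude`, the following sketch runs both on a small sales table. The data and variable names are only illustrative, and the sketch assumes that the `Product` column holds plain strings, which pandas treats as non-numeric.

```python
import numpy as np
import pandas as pd

# Illustrative data: one text column and two numeric columns
df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
})

# include='all' summarises every column, numeric or not
full_summary = df.describe(include='all')
print(full_summary)

# exclude=[np.number] drops the numeric columns, leaving only 'Product'
text_summary = df.describe(exclude=[np.number])
print(text_summary)
```

For non-numeric columns, the summary reports `count`, `unique`, `top` and `freq` instead of the mean and percentiles. In the combined `include='all'` output, statistics that do not apply to a column appear as `NaN`.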

Examples of how to use pandas `describe()`

If you need a quick overview of the key statistical metrics of a dataset, the pandas DataFrame.describe() function is extremely useful.

Example 1: Statistical summary of numerical data

In the following example, we take a look at the DataFrame df, which contains different types of sales data.

import pandas as pd

# Example DataFrame with sales data
data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
    'Revenue': [1000, 3000, 3000, 400, 3600]
}
df = pd.DataFrame(data)
print(df)

Now, you can use pandas describe() to get a statistical summary of the numerical data in the columns:

summary = df.describe()
print(summary)

The output of the pandas DataFrame.describe() function is as follows:

        Quantity       Price      Revenue
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
25%    10.000000  100.000000  1000.000000
50%    15.000000  120.000000  3000.000000
75%    20.000000  150.000000  3000.000000
max    30.000000  200.000000  3600.000000

The key metrics shown in the output are:

count: Number of non-NaN (Not a Number) entries
mean: Average of the values (also accessible via DataFrame.mean())
std: Sample standard deviation of the values (normalised by N-1, matching DataFrame.std())
min, 25%, 50%, 75%, max: Minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum values
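To make the meaning of each row concrete, the sketch below recomputes the `Quantity` statistics with the corresponding standalone Series methods and prints them next to the values from `describe()`. The `checks` dictionary is just a helper for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
})
summary = df.describe()
quantity = df['Quantity']

# Each row of the summary corresponds to a standalone Series method
checks = {
    'count': quantity.count(),
    'mean': quantity.mean(),
    'std': quantity.std(),  # sample standard deviation (ddof=1)
    'min': quantity.min(),
    '25%': quantity.quantile(0.25),
    '50%': quantity.median(),
    '75%': quantity.quantile(0.75),
    'max': quantity.max(),
}

# Print the describe() value and the recomputed value side by side
for label, value in checks.items():
    print(label, summary.loc[label, 'Quantity'], value)
```

Both values in each printed line should agree, which shows that `describe()` bundles these individual calculations into a single table.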

Example 2: Customising percentiles

You can customise the percentiles in the pandas DataFrame.describe() output with the percentiles parameter:

# Statistical summary with custom percentiles
custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(custom_summary)

This function call provides the following output:

        Quantity       Price      Revenue
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
10%     7.000000   88.000000   640.000000
50%    15.000000  120.000000  3000.000000
90%    26.000000  180.000000  3360.000000
max    30.000000  200.000000  3600.000000

The output now contains the 10%, 50% and 90% rows instead of the default 25%, 50% and 75% rows from the previous example.
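Since the result of `describe()` is itself a DataFrame, individual values can be read from it by row and column label. The sketch below, which uses only the `Revenue` column for brevity, picks out the 90th percentile and compares it with `Series.quantile()`. The variable name `p90_revenue` is chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Revenue': [1000, 3000, 3000, 400, 3600]})
custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])

# Row labels are percentage strings, so single values are read with .loc
p90_revenue = custom_summary.loc['90%', 'Revenue']
print(p90_revenue)

# The same value comes from quantile(), which interpolates linearly by default
print(df['Revenue'].quantile(0.9))
```

Both calls should print the same number, because the percentile rows are computed with the same linear interpolation between neighbouring data points.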
