Published
- 5 min read
Exploring CDF vs PPF in SciPy: Understanding Probability Functions
Photo by Robert Stump on Unsplash
Introduction
In probability theory, the Probability Point Function (PPF) and Cumulative Distribution Function (CDF) serve as fundamental tools in understanding and quantifying uncertainty within random variables. In this blog, we delve into the significance, applications, and practical implementation of PPF and CDF by analyzing the daily return of Apple’s stock price in 2023. Meanwhile, we will introduce SciPy, an open-source Python library designed for scientific and technical computing.
Cumulative Distribution Function (CDF)
Definition
The Cumulative Distribution Function (CDF) is a function that describes the probability distribution of a random variable by specifying the probability that the variable will be less than or equal to a certain value. In simpler terms, it gives the probability of a random variable taking on a value less than or equal to a specified number.
Mathematically, for a random variable X, the CDF is denoted as F(x) and is expressed as:
, for all 。
Probability Point Function (PPF) or Inverse Cumulative Distribution Function (CDF)
Definition
The Probability Point Function (PPF), also known as the inverse cumulative distribution function, operates inversely to the CDF. It takes a probability value as input and returns the corresponding value of the random variable for which the CDF equals that probability.
Mathematically, if F(x) is the CDF of a random variable X, then the PPF is denoted as and is expressed as:
, such that
The PPF is particularly useful in statistics for determining values associated with specific probabilities, such as percentiles or critical values.
Contrasting PPF and CDF
-
Functionality
CDF provides the probability of the variable being less than or equal to x;
PPF helps to find the value associated with a given probability.
-
Usage
CDF is fundamental for understanding the overall behavior of a probability distribution.
PPF mainly used for determining critical values or thresholds.
-
Representation
CDF: Graphically represents the cumulative probability distribution, usually starting from 0 to 1, illustrating how probabilities accumulate as values increase.
PPF: Graphically represents the inverse of the cumulative distribution function, plotting the values of the random variable against probabilities, showing the values associated with specific probabilities.
Practical Example in Scipy
In this example, we’ll demonstrate the concepts of PPF and CDF within SciPy. We’ll use the daily returns of Apple (AAPL) in 2023 as our dataset.
Get Stock Return Data Using yfinance
# For getting historical financial data
import yfinance as yf
# For scientific computing in statistics
from scipy.stats import norm
# For plotting and visualization
import matplotlib.pyplot as plt
# Fetch AAPL data
stock_data = yf.download('AAPL', start='2023-01-01', end='2023-12-31')
# Get a concise summary of our DataFrame
stock_data.info()
"""
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 250 entries, 2023-01-03 to 2023-12-29
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Open 250 non-null float64
1 High 250 non-null float64
2 Low 250 non-null float64
3 Close 250 non-null float64
4 Adj Close 250 non-null float64
5 Volume 250 non-null int64
dtypes: float64(5), int64(1)
memory usage: 13.7 KB
"""
# Get the daiily return using `pct_change()`, remove missing values
stock_returns = stock_data['Adj Close'].pct_change().dropna()
Inspect the distribution by plotting histogram of daily returns
# Inspect the distribution by plotting histogram of daily returns
plt.figure(figsize=(8, 6))
plt.hist(stock_returns, bins=50, density=True, alpha=0.7, color='skyblue', edgecolor='black')
plt.title('Histogram of Apple Inc. (AAPL) Daily Returns in 2023')
plt.xlabel('Daily Returns(%)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
Construct the norm distribution in SciPy
# Calculate mean and standard deviation of daily stock returns
mean_return = stock_returns.mean()
std_deviation = stock_returns.std()
# Create a normal distribution based on the calculated mean and standard deviation
appl_daily_return_distribution = norm(loc=mean_return, scale=std_deviation)
Calculating CDF and PPF in SciPy
The .cdf()
and .ppf()
methods in SciPy are essential functionalities within the scipy.stats
module that handle Cumulative Distribution Function (CDF) and Probability Point Function (PPF), respectively, for various probability distributions.
# CDF
appl_daily_return_distribution.cdf(0.01)
# 0.7420136860623278
# 74.20% of the daily returns of Apple is less or equal to 1%
appl_daily_return_distribution.cdf(0.005)
# 0.5994001904523659
# 59.94% of the daily returns of Apple is less or equal to 0.5%
appl_daily_return_distribution.cdf(0.01) - appl_daily_return_distribution.cdf(0.005)
# 0.14261349560996184
# 14.26% of the daily returns of Apple is greater than 0.5% and less or equal to 1%
# PPF
appl_daily_return_distribution.ppf(0.9)
# 0.017944086843352632
# The value of which 90% of all the daily returns are less or equal to is 1.79%
appl_daily_return_distribution.ppf(1 - 0.9)
# -0.014274232117790355
# The value of which 90% of all the daily returns are less or equal to is -1.42%
# In other words, the value of which 90% of all the daily returns are greater to is -1.42%.
Visualize the Distribution
CDF
# Calculate the CDF for a range of values
x_values = sorted(stock_returns)
y_cdf = appl_daily_return_distribution.cdf(x_values)
# Plotting the Cumulative Distribution Function (CDF)
plt.figure(figsize=(8, 6))
plt.plot(x_values, y_cdf, label='CDF')
plt.title("Cumulative Distribution Function (CDF) of Apple's Stock Daily Returns in 2023")
plt.xlabel('Daily Returns(%)')
plt.ylabel('Cumulative Probability')
plt.legend()
plt.grid(True)
plt.show()
PPF
# Calculate the PPF for a range of probabilities
probabilities = [0.05, 0.25, 0.5, 0.75, 0.95]
x_ppf = appl_daily_return_distribution.ppf(probabilities)
# Plotting the Probability Point Function (PPF)
plt.figure(figsize=(8, 6))
plt.plot(probabilities, x_ppf, marker='o', linestyle='None', label='PPF')
plt.title("Probability Point Function (PPF) of of Apple's Stock Daily Returns in 2023")
plt.xlabel('Probability')
plt.ylabel('Stock Daily Returns(%)')
plt.legend()
plt.grid(True)
plt.show()
Conclusion
In summary, while the CDF gives the probability that a random variable is less than or equal to a particular value, the PPF helps in finding the value of the random variable for a given probability. They are complementary functions often used together in statistical analysis and probability calculations.