
Python for Finance: Chapter 10 Inferential Statistics, Notes 1

Date: 2019-10-20 20:23:02

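Chapter 10 Inferential Statistics
- 10.1 Random Numbers
- 10.2 Simulation
  - 10.2.1 Random Variables
  - 10.2.2 Stochastic Processes
    - Geometric Brownian motion
    - Square-root diffusion
    - Stochastic volatility
    - Jump diffusion
  - 10.2.3 Variance Reduction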

Chapter 10 Inferential Statistics

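Python for Finance: Chapter 10 Inferential Statistics, Notes 1

Python for Finance: Chapter 10 Inferential Statistics, Notes 2

Python for Finance: Chapter 10 Inferential Statistics, Notes 3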

10.1 Random Numbers

```python
import numpy as np
import numpy.random as random
import matplotlib.pyplot as plt

# rand returns random numbers from the half-open interval [0, 1);
# the arguments give the shape of the output array
random.rand(10)
# array([ 0.43812444, 0.34411735, 0.23250114, 0.76714005, 0.20759936,
#         0.70901672, 0.95422155, 0.08556689, 0.59033963, 0.84513443])

random.rand(5, 5)
# array([[ 0.95942423, 0.91671855, 0.33619313, 0.37931534, 0.59388659],
#        [ 0.84503838, 0.92572621, 0.57089753, 0.84832724, 0.6923007 ],
#        [ 0.0257402 , 0.73027026, 0.07831274, 0.85126426, 0.43927961],
#        [ 0.31733426, 0.0367936 , 0.26154412, 0.68299204, 0.06117947],
#        [ 0.3355343 , 0.72317741, 0.95397264, 0.91341195, 0.8424168 ]])

# To get random numbers from the interval [5, 10), transform the output of rand
a = 5.
b = 10.
random.rand(10) * (b - a) + a
# array([ 7.72031281, 9.49373699, 7.26951207, 5.08434385, 7.07330462,
#         5.5169059 , 7.93266969, 9.59174389, 7.55476132, 9.07963314])

# Thanks to NumPy broadcasting, this also works for multidimensional arrays
random.rand(5, 5) * (b - a) + a
# array([[ 6.56262146, 6.58686089, 9.25527619, 7.36295298, 8.10034672],
#        [ 9.51719011, 8.79297476, 7.32629772, 8.85443737, 6.95337673],
#        [ 9.87850678, 8.87835651, 5.55394611, 9.09984161, 7.46512384],
#        [ 8.54888728, 8.34351926, 7.95810147, 6.20483389, 8.86515313],
#        [ 9.37562883, 5.81284007, 8.34719867, 6.14204529, 7.31620939]])
```
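The rand-based transformation above is an affine map of [0, 1) onto [a, b). As a hedged sketch, assuming NumPy 1.17 or later, the newer Generator API (`np.random.default_rng`) supports both the same transformation and a direct `uniform(low, high, size)` call; the seed value 1000 is an arbitrary choice for reproducibility.

```python
import numpy as np

# Assumption: NumPy >= 1.17, which provides the Generator API via default_rng
rng = np.random.default_rng(seed=1000)   # arbitrary seed, for reproducible draws

a, b = 5.0, 10.0
# Affine transform of [0, 1) draws, exactly as in the text above
x1 = rng.random((5, 5)) * (b - a) + a
# The same distribution drawn directly: uniform on [a, b)
x2 = rng.uniform(a, b, size=(5, 5))
print(x1.round(3))
print(x2.round(3))
```

Both arrays follow the same uniform distribution on [5, 10); `uniform` simply hides the scaling and shifting.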

Simple random number generation functions

```python
import numpy.random as random
import matplotlib.pyplot as plt

sample_size = 500
rn1 = random.rand(sample_size, 3)          # uniform floats in [0, 1), shape (500, 3)
rn2 = random.randint(0, 10, sample_size)   # integers 0..9 (the upper bound is exclusive)
rn3 = random.sample(size=sample_size)      # uniform floats in [0, 1)
a = [0, 25, 50, 75, 100]
rn4 = random.choice(a, size=sample_size)   # random picks from the list a

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(7, 7))
ax1.hist(rn1, bins=25, stacked=True)
ax1.set_title('rand')
ax1.set_ylabel('frequency')
ax1.grid(True)
ax2.hist(rn2, bins=25)
ax2.set_title('randint')
ax2.grid(True)
ax3.hist(rn3, bins=25)
ax3.set_title('sample')
ax3.set_ylabel('frequency')
ax3.grid(True)
ax4.hist(rn4, bins=25)
ax4.set_title('choice')
ax4.grid(True)
plt.show()
```
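To see the value ranges of these four functions without plotting, a small sketch can use a seeded `np.random.RandomState`, which exposes the same legacy methods as `numpy.random` (`rand`, `randint`, `random_sample`, `choice`). The seed 1000 is arbitrary; `random_sample` is used here because `numpy.random.sample` is an alias for it.

```python
import numpy as np

# A seeded RandomState gives reproducible draws with the legacy numpy.random API
rs = np.random.RandomState(1000)
sample_size = 500

rn1 = rs.rand(sample_size, 3)                 # floats in [0, 1), shape (500, 3)
rn2 = rs.randint(0, 10, sample_size)          # integers 0..9; 10 itself is excluded
rn3 = rs.random_sample(size=sample_size)      # floats in [0, 1)
rn4 = rs.choice([0, 25, 50, 75, 100], size=sample_size)  # equally likely picks

# Count how often each distinct value occurs in the discrete samples
values2, counts2 = np.unique(rn2, return_counts=True)
values4, counts4 = np.unique(rn4, return_counts=True)
print(dict(zip(values2.tolist(), counts2.tolist())))
print(dict(zip(values4.tolist(), counts4.tolist())))
```

The counts make it easy to confirm that `randint` never returns its upper bound and that `choice` only returns elements of the list.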

Functions for generating random numbers from different distributions

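Although the use of the (standard) normal distribution in finance has drawn a lot of criticism, it remains an indispensable tool and is still the most widely used distribution in both analytical and numerical applications. One reason is that many financial models rely directly on a normal or lognormal distribution. Another is that many financial models that do not rely directly on a (log)normal assumption can be discretized and then simulated approximately using the normal distribution.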

```python
import numpy.random as random
import matplotlib.pyplot as plt

sample_size = 500
rn1 = random.standard_normal(sample_size)          # standard normal: mean 0, std 1
rn2 = random.normal(100, 20, sample_size)          # normal with mean 100, std 20
rn3 = random.chisquare(df=0.5, size=sample_size)   # chi-square with 0.5 degrees of freedom
rn4 = random.poisson(lam=1.0, size=sample_size)    # Poisson with lambda = 1

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(7, 7))
ax1.hist(rn1, bins=25)
ax1.set_title('standard normal')
ax1.set_ylabel('frequency')
ax1.grid(True)
ax2.hist(rn2, bins=25)
ax2.set_title('normal(100, 20)')
ax2.grid(True)
ax3.hist(rn3, bins=25)
ax3.set_title('chi square')
ax3.set_ylabel('frequency')
ax3.grid(True)
ax4.hist(rn4, bins=25)
ax4.set_title('Poisson')
ax4.grid(True)
plt.show()
```
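A quick numerical check, sketched under the assumption that a large sample (one million draws, arbitrary seed) is enough for the sample moments to settle: compare each sample's mean and variance with the known theoretical values. A chi-square variable with k degrees of freedom has mean k and variance 2k, and a Poisson variable has mean and variance both equal to λ.

```python
import numpy as np

rs = np.random.RandomState(1000)   # arbitrary seed, for reproducibility
n = 1_000_000                      # large sample so the sample moments are close to theory

checks = {
    # name: (sample, theoretical mean, theoretical variance)
    'standard normal': (rs.standard_normal(n), 0.0, 1.0),
    'normal(100, 20)': (rs.normal(100, 20, n), 100.0, 20.0 ** 2),
    'chi-square(0.5)': (rs.chisquare(df=0.5, size=n), 0.5, 2 * 0.5),  # mean k, variance 2k
    'Poisson(1)':      (rs.poisson(lam=1.0, size=n), 1.0, 1.0),        # mean = variance = lambda
}
for name, (x, mean, var) in checks.items():
    print('%16s  mean %8.4f (theory %8.4f)  var %9.4f (theory %9.4f)'
          % (name, x.mean(), mean, x.var(), var))
```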

10.2 Simulation

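Monte Carlo simulation (MCS) is among the most important numerical techniques in finance, perhaps the most important in terms of significance and breadth of use. This is mainly because it is the most flexible method for evaluating mathematical expressions (such as integrals), which makes it especially well suited to valuing financial derivatives. This flexibility comes at the price of a relatively heavy computational burden: estimating a single value may require hundreds of thousands or even millions of complex computations.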
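To illustrate how MCS evaluates an integral, here is a minimal sketch (the helper name `mc_integral` is hypothetical, not from the book): the integral of f over [a, b] equals (b - a) times the expected value of f(X) for X uniform on [a, b], so averaging (b - a) f(x) over many uniform draws estimates it. The standard error shrinks only like 1/sqrt(n), which is why MCS needs so many draws.

```python
import numpy as np

def mc_integral(f, a, b, n, seed=None):
    """Estimate the integral of f over [a, b] by averaging f at uniform draws."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, n)                     # n uniform samples on [a, b)
    values = (b - a) * f(x)                      # each term is an unbiased estimate
    estimate = values.mean()
    std_error = values.std(ddof=1) / np.sqrt(n)  # decreases like 1 / sqrt(n)
    return estimate, std_error

# Example: the integral of x**2 over [0, 1] is exactly 1/3
est, se = mc_integral(lambda x: x ** 2, 0.0, 1.0, 100_000, seed=42)
print(est, se)
```

Quadrupling n only halves the standard error, which is the computational burden the paragraph above refers to.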

10.2.1 Random Variables

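Consider the Black-Scholes-Merton setup used for option pricing. In this setup, given today's stock index level $S_0$, the index level $S_T$ at a future date $T$ is given by the following formula.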

Formula: simulating future index levels in the Black-Scholes-Merton setup

$$S_T = S_0 \exp\left(\left(r - \frac{1}{2}\sigma^2\right)T + \sigma\sqrt{T}\,z\right)$$

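- $S_T$: index level at date $T$
- $r$: constant riskless short rate
- $\sigma$: constant volatility of $S$ (= standard deviation of returns)
- $z$: standard normally distributed random variable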
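A short worked step shows why the drift contains the term $-\frac{1}{2}\sigma^2$. For a standard normal $z$ and a constant $a$, $\mathbb{E}[e^{a z}] = e^{a^2/2}$; with $a = \sigma\sqrt{T}$ the correction cancels, so the expected index level grows at exactly the riskless rate $r$:

$$\mathbb{E}[S_T] = S_0 e^{(r - \frac{1}{2}\sigma^2)T}\,\mathbb{E}\left[e^{\sigma\sqrt{T}z}\right] = S_0 e^{(r - \frac{1}{2}\sigma^2)T}\, e^{\frac{1}{2}\sigma^2 T} = S_0 e^{rT}$$

With $S_0 = 100$, $r = 0.05$ and $T = 2$, this gives $100\,e^{0.1} \approx 110.52$.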

```python
import numpy as np
import numpy.random as random
import matplotlib.pyplot as plt

S0 = 100       # initial value
r = 0.05       # constant short rate
sigma = 0.25   # constant volatility
T = 2.0        # in years
I = 10000      # number of random draws
ST1 = S0 * np.exp((r - 0.5 * sigma ** 2) * T
                  + sigma * np.sqrt(T) * random.standard_normal(I))
plt.hist(ST1, bins=50)
plt.xlabel('index level')
plt.ylabel('frequency')
plt.grid(True)
```

Figure: simulated geometric Brownian motion (via standard_normal)

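The random variable is lognormally distributed, so we can also draw its values directly with the lognormal function. In this case we must pass the function a mean and a standard deviation; these refer to the underlying normal distribution, that is, to log(S_T / S_0), not to S_T itself: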

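```python
# Mean and standard deviation of the underlying normal distribution of log(ST / S0)
ST2 = S0 * random.lognormal((r - 0.5 * sigma ** 2) * T,
                            sigma * np.sqrt(T), size=I)
plt.hist(ST2, bins=50)
plt.xlabel('index level')
plt.ylabel('frequency')
plt.grid(True)
```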

Figure: simulated geometric Brownian motion (via lognormal)
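NumPy's `lognormal(mean, sigma, size)` is parameterized by the mean and standard deviation of the underlying normal distribution. A small sketch to confirm this: take logs of the draws and compare them with $(r - \frac{1}{2}\sigma^2)T$ and $\sigma\sqrt{T}$. The seed and the larger sample size are arbitrary choices for illustration.

```python
import numpy as np

S0, r, sigma, T, I = 100, 0.05, 0.25, 2.0, 100_000   # same parameters as above, larger I
rs = np.random.RandomState(1000)                     # arbitrary seed

mu = (r - 0.5 * sigma ** 2) * T   # mean of log(ST / S0), here 0.0375
s = sigma * np.sqrt(T)            # standard deviation of log(ST / S0)
ST2 = S0 * rs.lognormal(mu, s, size=I)

log_returns = np.log(ST2 / S0)    # should be normal with mean mu and std s
print(log_returns.mean(), mu)
print(log_returns.std(), s)
```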

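Use the scipy.stats subpackage and the helper function print_statistics defined below to compare the distributional properties of the two simulation results: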

```python
import numpy as np
import scipy.stats as scs

def print_statistics(a1, a2):
    """Print selected statistics of two data sets side by side."""
    sta1 = scs.describe(a1)   # (nobs, (min, max), mean, variance, skewness, kurtosis)
    sta2 = scs.describe(a2)
    print('%14s %14s %14s' % ('statistic', 'data set 1', 'data set 2'))
    print(45 * '-')
    print('%14s %14.3f %14.3f' % ('size', sta1[0], sta2[0]))
    print('%14s %14.3f %14.3f' % ('min', sta1[1][0], sta2[1][0]))
    print('%14s %14.3f %14.3f' % ('max', sta1[1][1], sta2[1][1]))
    print('%14s %14.3f %14.3f' % ('mean', sta1[2], sta2[2]))
    print('%14s %14.3f %14.3f' % ('std', np.sqrt(sta1[3]), np.sqrt(sta2[3])))
    print('%14s %14.3f %14.3f' % ('skew', sta1[4], sta2[4]))
    # describe reports excess (Fisher) kurtosis by default
    print('%14s %14.3f %14.3f' % ('kurtosis', sta1[5], sta2[5]))

print_statistics(ST1, ST2)
#      statistic     data set 1     data set 2
# ---------------------------------------------
#           size      10000.000      10000.000
#            min         28.691         28.718
#            max        497.050        438.493
#           mean        110.298        111.023
#            std         40.380         40.577
#           skew          1.145          1.156
#       kurtosis          2.668          2.428
```

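The two simulation results have very similar characteristics; the differences are mainly due to so-called sampling error. Discretizing a continuous stochastic process for simulation normally introduces discretization error as well, but it plays no role here: both approaches draw $S_T$ directly from its exact distribution, without stepping through time.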
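Because $S_T$ is exactly lognormal, its population moments have closed forms, which gives a benchmark for the sampling error in the table above. A minimal sketch, using the standard lognormal formulas with $w = e^{\sigma^2 T}$ (mean $S_0 e^{rT}$, standard deviation $S_0 e^{rT}\sqrt{w - 1}$, skewness $(w + 2)\sqrt{w - 1}$, excess kurtosis $w^4 + 2w^3 + 3w^2 - 6$):

```python
import numpy as np

S0, r, sigma, T = 100, 0.05, 0.25, 2.0   # parameters used in the simulation above

w = np.exp(sigma ** 2 * T)                       # exp of the variance of log(ST / S0)
mean = S0 * np.exp(r * T)                        # E[ST] = S0 * exp(rT)
std = mean * np.sqrt(w - 1)                      # lognormal standard deviation
skew = (w + 2) * np.sqrt(w - 1)                  # lognormal skewness
ex_kurt = w ** 4 + 2 * w ** 3 + 3 * w ** 2 - 6   # lognormal excess kurtosis

for name, value in [('mean', mean), ('std', std), ('skew', skew), ('kurtosis', ex_kurt)]:
    print('%14s %14.3f' % (name, value))
```

This yields a mean of about 110.52, a standard deviation of about 40.33, skewness of about 1.14 and excess kurtosis of about 2.41, in line with both simulated columns; kurtosis, as a fourth moment, is the statistic most affected by sampling error.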

10.2.2 Stochastic Processes

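Roughly speaking, a stochastic process is a sequence of random variables. In that sense, we should expect simulating a process to resemble repeatedly simulating a single random variable. This is broadly true, except that the draws are generally not independent: they depend on the results of the previous draws. However, the stochastic processes used in finance usually exhibit the Markov property, which essentially means that tomorrow's value of the process depends only on today's state, not on any other "historical" state, let alone the whole path history. Such processes are therefore also called "memoryless".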
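As a hedged illustration of the Markov property, geometric Brownian motion (the process behind the $S_T$ formula above) can be simulated step by step. Each new level is computed from the current level and a fresh, independent standard normal draw, and from nothing earlier on the path. The function name `simulate_gbm_paths` and its parameters are hypothetical, not taken from the book; the step uses the exact lognormal transition over an interval of length dt.

```python
import numpy as np

def simulate_gbm_paths(S0, r, sigma, T, M, I, seed=None):
    """Simulate I paths of geometric Brownian motion on M equal time steps.

    Each step uses only the current level S[t - 1] and a fresh standard
    normal draw: the Markov (memoryless) property in code.
    """
    rng = np.random.default_rng(seed)
    dt = T / M
    S = np.zeros((M + 1, I))
    S[0] = S0
    for t in range(1, M + 1):
        z = rng.standard_normal(I)   # independent shocks for step t
        S[t] = S[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                                 + sigma * np.sqrt(dt) * z)
    return S

paths = simulate_gbm_paths(S0=100, r=0.05, sigma=0.25, T=2.0, M=50, I=10000, seed=1000)
print(paths.shape)        # (51, 10000)
print(paths[-1].mean())   # should be close to 100 * np.exp(0.1)
```

Although the draws are independent across steps, the levels along a path are not, because each level is built from the previous one.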