Changelog:
2020/05/07 - revised the formatting, code, and references

Preface:
The ESN is a variant of the RNN whose main advantage is training speed: hundreds of times faster than an ordinary RNN, because the output matrix is solved with a single regression. These notes mainly draw on the references below; if you have never seen this network before, please read [1] and [2] first. I touched on ESNs briefly during my master's, and work has recently forced me to revisit them, so I am writing this up as notes; when I have time I will turn them into figures and add those to this post. Two code implementations follow, but note that they realize slightly different ESN structures, so be sure to check the papers listed in the references.
I went through two implementations (one by a student of the ESN's originator [6], one by a Medium user [7]); the latter is something of a variant. Below are my modified versions. They produce the same results as the originals; I only restructured them slightly and added my own comments:
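Before the full scripts, the core recipe both implementations share (a fixed random reservoir, states harvested over time, and only a ridge readout actually trained) can be sketched on a toy signal. Everything below, including the sine-wave task, the sizes, and the regularization strength, is my own illustrative choice, not taken from either reference:

```python
import numpy as np

rng = np.random.default_rng(42)

# toy task (my own example): one-step-ahead prediction of a sine wave
u = np.sin(0.1 * np.arange(1200)).reshape(-1, 1)

res_size, a, rho = 100, 0.3, 0.9
Win = rng.random((res_size, 1)) - 0.5              # input weights: random and fixed
W = rng.random((res_size, res_size)) - 0.5         # reservoir weights: random and fixed
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))    # rescale to spectral radius rho < 1

# harvest the reservoir states over the training input (drop a 100-step washout)
x = np.zeros((res_size, 1))
states = []
for t in range(1000):
    x = (1 - a) * x + a * np.tanh(Win * u[t, 0] + W @ x)
    states.append(x.ravel().copy())
X = np.array(states)[100:]                         # (900, res_size)
y = u[101:1001]                                    # next-step targets, (900, 1)

# the readout is the only trained part: one ridge solve, no backpropagation
Wout = np.linalg.solve(X.T @ X + 1e-8 * np.eye(res_size), X.T @ y)
mse = float(np.mean((X @ Wout - y) ** 2))
print(mse)
```

This is why ESN training is so fast: the recurrent weights are never updated, so the whole "training" is one closed-form linear solve.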
Simple ESN [6]
```python
# -*- coding: utf-8 -*-
"""
Original code: Mantas Lukoševičius (a student of Jaeger, the founding father of ESNs), http://mantas.info
The dataset can be downloaded from Lukoševičius's site.
"""
import matplotlib.pyplot as plt
import scipy
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge, RidgeCV, ElasticNet, ElasticNetCV, Lasso, LassoCV
from sklearn.metrics import mean_squared_error

np.random.seed(42)

"""Step 1. Load the data (MackeyGlass_t17), shape (10000, 1)"""
data = pd.read_table('MackeyGlass_t17.txt', header=None)

"""Step 2. Define the training length len_train, test length len_test,
transient (washout) length len_transient, and error-evaluation length len_error"""
len_train = 2000
len_test = 2000
len_transient = min(int(len_train / 10), 100)
len_error = 500

"""Step 3. Plot the first 1000 samples of the data"""
plt.figure('A sample of data').clear()
plt.plot(data[0:1000])
plt.title('A sample of data')

"""Step 4. Define the ESN architecture
Input units inSize:
 - equal to the number of input features
Output units outSize:
 - equal to the number of target features
Reservoir units resSize:
 - at least 100 is recommended, increased in steps of 50
 - I have never seen more than 1000
Leak rate a:
 - strengthens short-term memory; range [0, 1]
 - a=1 gives a plain ESN, a<1 a leaky ESN (p. 7)
Input scaling factor input_scaling:
 - range [0, 1]
Sparsity:
 - 0.01 (1%) to 0.05 (5%) is recommended; some suggest 0.05 (5%) to 0.2 (20%)
Target spectral radius rho:
 - range [0, 1)
"""
inSize = 1
outSize = 1
resSize = 100
a = 0.3
input_scaling = 1.0
sparsity = 1.00
rho = 1.25

"""Step 5. Initialize the input weights Win and the reservoir weights W
Input weights Win:
 - with a bias term the shape is (resSize, 1+inSize), otherwise (resSize, inSize)
 - common range [-1, 1] or [-0.5, 0.5]
 - a normal distribution is recommended
Initial reservoir weights W_0:
 - shape (resSize, resSize)
 - common initial range [-1, 1] or [-0.5, 0.5]
 - an alternative, borrowed from Madalina Ciortan, that is truly sparse:
   W_0 = scipy.sparse.rand(resSize, resSize, density=sparsity).todense() - 0.5
 - scipy.sparse.rand(n_rows, n_cols, density=sparsity).todense()
 - e.g. with density=0.1 (10%) and a 10x10 matrix, only 10 entries in total are nonzero,
   scattered irregularly across the matrix
Spectral radius rhoW of the initial reservoir weights:
 - max( abs( eig(W_0) ) )
 - the reservoir only has the echo state property when the effective spectral radius is < 1
Reservoir weights W:
 - W = W_0 * rho/rhoW
"""
Win = (np.random.rand(resSize, 1+inSize) - 0.5) * input_scaling
W_0 = np.random.rand(resSize, resSize) - 0.5
# W_0 = scipy.sparse.rand(resSize, resSize, density=sparsity).todense()
# W_0[np.where(W_0 > 0)] = W_0[np.where(W_0 > 0)] - 0.5
rhoW = np.max(np.abs(np.linalg.eig(W_0)[0]))
W = W_0 * rho/rhoW

"""Step 6. Harvest: compute the state x(t) at every time step and collect it in X
 - at t=0, x(0) is a zero matrix of shape (resSize, 1) and y(0) a zero matrix of shape (1, outSize)
 - with bias:    x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) )
   shapes: (resSize,1) = scalar*(resSize,1) + scalar*( (resSize,1+inSize)*(1+inSize,1) + (resSize,resSize)*(resSize,1) )
 - without bias: x(t) = (1-a)*x(t-1) + a*tanh( Win*u(t) + W*x(t-1) )
 - remember to drop the first len_transient entries of X
"""
X = []
x = np.zeros([resSize, 1])
for t in range(len_train):
    u = data.loc[t]
    x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack((1, u))) + np.dot(W, x) )
    X.append(np.vstack([1, u, x]))
X = X[len_transient:]
X = np.array(X).reshape(-1, 1+inSize+resSize)

"""Step 7. Train: solve for the output weights Wout from all states X and all targets Yt via ridge regression
 - Wout has shape (outSize, 1+inSize+resSize)
 - following Madalina Ciortan, I replaced the explicit Wout solve with Ridge
 - L1-regularized regressors (Lasso, ElasticNet) all perform poorly here,
   presumably because they prune away too many weights
 - y(t) = Wout*[1;u(t);x(t)]
   shapes: (outSize,1) = (outSize,1+inSize+resSize)*(1+inSize+resSize,1)
"""
Yt = data.loc[len_transient+1:len_train]
ridge = Ridge(alpha=1e-8)
ridge.fit(X, Yt)

"""Step 8. Test
 - the first test input is u(t-1) = data[2000]
 - the first state x(t-1) carries over from Step 6
 - Y_pred records the model output at every test step
 - the author offers two output modes, recursive multi-step forecasting and one-step-ahead
   forecasting; pick one
 - with bias:    x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) ),  y(t) = Wout*[1;u(t);x(t)]
 - without bias: x(t) = (1-a)*x(t-1) + a*tanh( Win*u(t) + W*x(t-1) ),      y(t) = Wout*[u(t);x(t)]
"""
Y_pred = []
u = data.loc[len_train]
for t in range(len_test):
    x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack([1, u])) + np.dot(W, x) )
    y = ridge.predict(np.vstack((1, u, x)).T)
    Y_pred.append(y)
    # recursive multi-step forecasting: the current output becomes the next input
    u = y
    # one-step-ahead forecasting: the next input comes from measurements or offline data
    # u = data.loc[len_train+t+1]
Y_pred = np.array(Y_pred)
Y_pred = Y_pred.reshape(-1, outSize)

"""Step 9. Score the first len_error test samples with the MSE
 - only len_error samples are scored because Mantas used recursive multi-step forecasting,
   whose accuracy decays over time (clearly visible in this script's output),
   which I suspect is why he scored only 500 samples
"""
mse = mean_squared_error(data.loc[len_train+1:len_train+len_error], Y_pred[:len_error])
print('MSE = ' + str(mse))

"""Plot the test-set target signal against the prediction"""
plt.figure('Target and generated signals').clear()
plt.plot(data.loc[len_train+1:len_train+len_test].to_numpy(), 'g')
plt.plot(Y_pred, 'b')
plt.title('Target and generated signals $y(n)$ starting at $n=0$')
plt.legend(['Target signal', 'Free-running predicted signal'])

"""Plot 20 reservoir activations for t=100..300"""
plt.figure('Some reservoir activations').clear()
plt.plot(X[0:200, 0:20])  # looks like t=0..200, but with the washout len_transient=100 it is really t=100..300
plt.title('Some reservoir activations $\mathbf{x}(n)$')

"""Plot the output weight values"""
plt.figure('Output weights').clear()
plt.bar(range(1+inSize+resSize), ridge.coef_.ravel())
plt.title('Output weights $\mathbf{W}^{out}$')
plt.show()
```
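The truly sparse initialization mentioned in Step 5 (the commented-out scipy.sparse.rand variant borrowed from Madalina Ciortan) can be verified in isolation. This sketch, with sizes and density of my own choosing, checks both the sparsity and the rescaled spectral radius:

```python
import numpy as np
from scipy import sparse

np.random.seed(0)

res_size, density, rho = 200, 0.05, 0.9    # my own example values

# truly sparse initial weights: only ~density * res_size**2 entries are nonzero
W0 = np.asarray(sparse.rand(res_size, res_size, density=density).todense())
W0[W0 > 0] -= 0.5                          # shift the nonzeros from (0, 1) into (-0.5, 0.5)

# rescale so the spectral radius becomes exactly rho
rhoW0 = np.max(np.abs(np.linalg.eigvals(W0)))
W = W0 * rho / rhoW0

nnz = int(np.count_nonzero(W))
rhoW = float(np.max(np.abs(np.linalg.eigvals(W))))
print(nnz, rhoW)  # roughly 2000 nonzeros out of 40000; spectral radius rho
```

Note that `np.random.rand(...) - 0.5`, as used in the uncommented code above, produces a fully dense matrix; only the `scipy.sparse.rand` route actually leaves most entries at zero.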
Recursive multi-step forecasting result. Note how it diverges from roughly sample 1000 on, which is why Lukoševičius computes the error over only 500 samples.
One-step-ahead forecasting result
Medium user's version [7]:
```python
# -*- coding: utf-8 -*-
"""
Key references:
1. Reservoir computing approaches for representation and classification of multivariate time series
2. https://towardsdatascience.com/gentle-introduction-to-echo-state-networks-af99e5373c68
"""
import numpy as np
import scipy.io
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.metrics import accuracy_score, f1_score


"""Reservoir implementation"""
class Reservoir(object):
    def __init__(self, n_internal_units=100, spectral_radius=0.99, leak=None,
                 connectivity=0.3, input_scaling=0.2, noise_level=0.01, circle=False):
        """
        Step 1. Set the parameters
         - n_internal_units = reservoir size; one source says around 100~300, but every journal
           paper I have read starts at 100
         - input_scaling = scaling factor for Win
         - noise_level = scaling factor for the white noise
         - leak = leak rate, in [0, 1]
        """
        self._n_internal_units = n_internal_units
        self._input_scaling = input_scaling
        self._noise_level = noise_level
        self._leak = leak
        """
        Step 2. The input weights Win are initialized to None
         - _input_weights = input weights
        """
        self._input_weights = None
        """
        Step 3. Build the reservoir weights W according to whether the ring topology (circle) is enabled
         - circle = ring-topology switch
         - _internal_weights = reservoir weights, size=[n_internal_units, n_internal_units]
        """
        if circle:
            self._internal_weights = self._initialize_internal_weights_Circ(
                n_internal_units, spectral_radius)
        else:
            self._internal_weights = self._initialize_internal_weights(
                n_internal_units, connectivity, spectral_radius)

    def _initialize_internal_weights_Circ(self, n_internal_units, spectral_radius):
        # ring topology: each unit feeds the next, the last feeds the first
        internal_weights = np.zeros((n_internal_units, n_internal_units))
        internal_weights[0, -1] = spectral_radius
        for i in range(n_internal_units-1):
            internal_weights[i+1, i] = spectral_radius
        return internal_weights

    def _initialize_internal_weights(self, n_internal_units, connectivity, spectral_radius):
        """
        Step 4. Generate reservoir weights W with sparsity `connectivity`
         - internal_weights = reservoir weights W, size=[n_internal_units, n_internal_units]
         - connectivity = sparsity; some journals suggest 1%~5%. E.g. in a 10x10 matrix with
           10% sparsity, only 10 entries are nonzero.
         - spectral_radius = spectral radius, must be < 1
        """
        internal_weights = sparse.rand(n_internal_units, n_internal_units,
                                       density=connectivity).todense()
        """
        Step 5. Shift all nonzero entries of W from (0, +1) into (-0.5, +0.5)
        """
        internal_weights[np.where(internal_weights > 0)] -= 0.5
        """
        Step 6. Compute the eigenvalues E of W, then rescale W using e_max = max(abs(E))
        and spectral_radius
         - W = (spectral_radius*W)/e_max
         - spectral_radius here is really a coefficient for adjusting the spectral radius
         - the actual spectral radius of the initial W is e_max
        """
        E, _ = np.linalg.eig(internal_weights)
        e_max = np.max(np.abs(E))
        internal_weights /= np.abs(e_max)/spectral_radius
        return internal_weights

    def _compute_state_matrix(self, X, n_drop=0):
        """
        Step 10. Initialize the previous state x(t-1)
         - previous_state = previous states, size=[N, _n_internal_units]
        """
        N, T, _ = X.shape
        previous_state = np.zeros((N, self._n_internal_units), dtype=float)
        """
        Step 11. Build the state matrix
         - state_matrix, size=[N, T-n_drop, _n_internal_units]
        """
        state_matrix = np.empty((N, T - n_drop, self._n_internal_units), dtype=float)
        for t in range(T):
            # current input u(t)
            current_input = X[:, t, :]
            # pre-activation state: state_before_tanh = W*x(t-1) + Win*u(t)
            state_before_tanh = self._internal_weights.dot(previous_state.T) \
                + self._input_weights.dot(current_input.T)
            # add white noise: state_before_tanh += random * _noise_level
            state_before_tanh += np.random.rand(self._n_internal_units, N)*self._noise_level
            # apply the tanh() nonlinearity and the leak rate _leak;
            # both branches can be summarized as
            # x(t) = (1-a)*x(t-1) + tanh(W*x(t-1) + Win*u(t) + noise), with a=1 giving branch 1
            if self._leak is None:
                # branch 1: a=1, plain ESN
                previous_state = np.tanh(state_before_tanh).T
            else:
                # branch 2: a in [0, 1), leaky ESN
                # (arguably this should read x(t) = (1-a)*x(t-1) + a*tanh(...))
                previous_state = (1.0 - self._leak)*previous_state + np.tanh(state_before_tanh).T
            # once past the washout period, store x(t) in the state matrix
            if (t > n_drop - 1):
                state_matrix[:, t - n_drop, :] = previous_state
        return state_matrix

    def get_states(self, X, n_drop=0, bidir=True):
        """
        Step 8. Read the three data dimensions (N: rows/samples; T: time steps;
        V: columns/features) and build Win if it does not exist yet
         - _input_weights = input weights, size=[_n_internal_units, V]
         - np.random.binomial(1, 0.5, [R, V]): RxV entries of 0 or 1
         - 2.0*np.random.binomial(1, 0.5, [R, V]) - 1.0: RxV entries of -1 or +1
         - (...)*self._input_scaling: RxV entries of -_input_scaling or +_input_scaling
        """
        N, T, V = X.shape
        if self._input_weights is None:
            self._input_weights = (2.0*np.random.binomial(1, 0.5, [self._n_internal_units, V])
                                   - 1.0)*self._input_scaling
        """
        Step 9. Compute the forward state matrix
         - states/state_matrix, size=[N, T-n_drop, _n_internal_units]
        """
        states = self._compute_state_matrix(X, n_drop)
        """
        Step 12. If the bidirectional option bidir is on, reverse the data along time,
        compute the state matrix again, and concatenate the forward and backward states
        """
        if bidir is True:
            # reverse the data along the time axis
            X_r = X[:, ::-1, :]
            # compute the backward state matrix states_r
            states_r = self._compute_state_matrix(X_r, n_drop)
            # concatenate the forward states with the backward states_r
            states = np.concatenate((states, states_r), axis=2)
        return states

    def getReservoirEmbedding(self, X, pca, ridge_embedding, n_drop=5, bidir=True, test=False):
        """
        Step 7. Get the raw states res_states
         - res_states = the states at every time step (except the first n_drop)
         - n_drop = number of initial steps not recorded in the state matrix
        """
        # (the original hardcoded n_drop=5, bidir=True here; pass the arguments through instead)
        res_states = self.get_states(X, n_drop=n_drop, bidir=bidir)
        """
        Step 13. For dimensionality reduction, reshape res_states from
        [N, T-n_drop, _n_internal_units] to [N*(T-n_drop), _n_internal_units]
        """
        N_samples = res_states.shape[0]
        res_states = res_states.reshape(-1, res_states.shape[2])
        """
        Step 14. Depending on the mode, PCA either fits or only transforms res_states
        """
        if test:
            # test mode: transform only
            red_states = pca.transform(res_states)
        else:
            # training mode: fit and transform
            red_states = pca.fit_transform(res_states)
        # reshape red_states back to [N, T-n_drop, n_components]
        red_states = red_states.reshape(N_samples, -1, red_states.shape[1])
        """
        Step 15. Encode each sequence via ridge regression into the representation input_repr
        """
        coeff_tr = []
        biases_tr = []
        for i in range(X.shape[0]):
            ridge_embedding.fit(red_states[i, 0:-1, :], red_states[i, 1:, :])
            coeff_tr.append(ridge_embedding.coef_.ravel())
            biases_tr.append(ridge_embedding.intercept_.ravel())
        print(np.array(coeff_tr).shape, np.array(biases_tr).shape)
        input_repr = np.concatenate((np.vstack(coeff_tr), np.vstack(biases_tr)), axis=1)
        return input_repr


"""Load the data"""
data = scipy.io.loadmat('dataset/JpVow.mat')
X = data['X']
Y = data['Y']
Xte = data['Xte']
Yte = data['Yte']

"""One-hot encode the targets"""
# on scikit-learn >= 1.2, use sparse_output=False instead of sparse=False
onehot_encoder = OneHotEncoder(sparse=False, categories='auto')
Y = onehot_encoder.fit_transform(Y)
Yte = onehot_encoder.transform(Yte)

"""Build the dimensionality reducer, the encoder, and the decoder (readout)"""
pca = PCA(n_components=75)
ridge_embedding = Ridge(alpha=10, fit_intercept=True)
readout = Ridge(alpha=5)

"""Build the ESN"""
res = Reservoir(n_internal_units=450, spectral_radius=0.6, leak=0.6,
                connectivity=0.25, input_scaling=0.1, noise_level=0.01, circle=False)

"""Fit the reducer and the encoder on the training data"""
input_repr = res.getReservoirEmbedding(X, pca, ridge_embedding, n_drop=5, bidir=True, test=False)

"""Feed the test data through the ESN and encode it"""
input_repr_te = res.getReservoirEmbedding(Xte, pca, ridge_embedding, n_drop=5, bidir=True, test=True)

"""Step 16. Train the decoder (readout)"""
readout.fit(input_repr, Y)

"""Feed the test representations to the decoder"""
logits = readout.predict(input_repr_te)

"""Classify"""
pred_class = np.argmax(logits, axis=1)

"""Show the results"""
true_class = np.argmax(Yte, axis=1)
print(accuracy_score(true_class, pred_class))
print(f1_score(true_class, pred_class, average='weighted'))
```
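Step 15 above is the easiest part to misread: each sequence is summarized by the coefficients of a ridge model fitted to predict the sequence's own next state, so every sequence maps to a fixed-length vector regardless of its length. A standalone sketch on random stand-in data (all sizes here are my own example values):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# stand-in for the PCA-reduced states: N sequences, T time steps, D components (my own sizes)
N, T, D = 4, 30, 5
red_states = rng.standard_normal((N, T, D))

ridge_embedding = Ridge(alpha=10, fit_intercept=True)

reprs = []
for i in range(N):
    # per sequence, fit a linear one-step-ahead model: state(t) -> state(t+1)
    ridge_embedding.fit(red_states[i, :-1, :], red_states[i, 1:, :])
    # the flattened coefficients and intercept are the sequence's fixed-length embedding
    reprs.append(np.concatenate([ridge_embedding.coef_.ravel(),
                                 ridge_embedding.intercept_.ravel()]))
input_repr = np.vstack(reprs)
print(input_repr.shape)  # (N, D*D + D), independent of T
```

This "model-space" representation is what the final readout classifier is trained on.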
Using [6] as the framework and adding the Wfb, noise, and Win definitions from [7], the error drops to 9.58e-10
```python
# -*- coding: utf-8 -*-
"""
Based on the original code by Mantas Lukoševičius (a student of Jaeger, the founding father
of ESNs), http://mantas.info
The dataset can be downloaded from Lukoševičius's site.
"""
import matplotlib.pyplot as plt
import scipy
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge, RidgeCV, ElasticNet, ElasticNetCV, Lasso, LassoCV
from sklearn.metrics import mean_squared_error

np.random.seed(42)

"""Step 1. Load the data (MackeyGlass_t17), shape (10000, 1)"""
data = pd.read_table('MackeyGlass_t17.txt', header=None).to_numpy()

"""Step 2. Define the training length len_train, test length len_test,
transient (washout) length len_transient, and error-evaluation length len_error"""
len_train = 2000
len_test = 2000
len_transient = min(int(len_train / 10), 100)
len_error = 500

"""Step 3. Plot the first 1000 samples of the data"""
plt.figure('A sample of data').clear()
plt.plot(data[0:1000])
plt.title('A sample of data')

"""Step 4. Define the ESN architecture
Input units Nu:
 - equal to the number of input features
Output units Ny:
 - equal to the number of target features
Reservoir units Nx:
 - at least 100 is recommended, increased in steps of 50
 - I have never seen more than 1000
Leak rate a:
 - strengthens short-term memory; range [0, 1]
 - a=1 gives a plain ESN, a<1 a leaky ESN (p. 7)
Input scaling factor input_scaling:
 - range [0, 1]
Sparsity:
 - 0.01 (1%) to 0.05 (5%) is recommended; some suggest 0.05 (5%) to 0.2 (20%)
Target spectral radius rho:
 - range [0, 1)
Bias switch use_bias
Output-feedback (Wfb) switch use_fb
"""
Nu = 1
Ny = 1
Nx = 500
a = 0.3
input_scaling = 0.01
sparsity = 0.05
rho = 0.99
use_bias = True
use_fb = True
noise_level = 0.00

"""Step 5. Initialize the input weights Win, feedback weights Wfb, and reservoir weights W
Input weights Win:
 - with a bias term the shape is (Nx, 1+Nu), otherwise (Nx, Nu)
 - common range [-1, 1] or [-0.5, 0.5]
 - a normal distribution is recommended
Output-feedback weights Wfb:
 - shape (Nx, Ny)
 - common range [-1, 1] or [-0.5, 0.5]
 - a normal distribution is recommended
Initial reservoir weights W_0:
 - shape (Nx, Nx)
 - common initial range [-1, 1] or [-0.5, 0.5]
 - an alternative, borrowed from Madalina Ciortan, that is truly sparse:
   W_0 = scipy.sparse.rand(Nx, Nx, density=sparsity).todense() - 0.5
 - scipy.sparse.rand(n_rows, n_cols, density=sparsity).todense()
 - e.g. with density=0.1 (10%) and a 10x10 matrix, only 10 entries in total are nonzero,
   scattered irregularly across the matrix
Spectral radius rhoW of the initial reservoir weights:
 - max( abs( eig(W_0) ) )
 - the reservoir only has the echo state property when the effective spectral radius is < 1
Reservoir weights W:
 - W = W_0 * rho/rhoW
"""
if use_bias == True:
    # Win = (np.random.rand(Nx, 1+Nu)-0.5) * input_scaling
    Win = (2.0*np.random.binomial(1, 0.5, [Nx, 1+Nu]) - 1.0) * input_scaling
else:
    # Win = (np.random.rand(Nx, Nu)-0.5) * input_scaling
    Win = (2.0*np.random.binomial(1, 0.5, [Nx, Nu]) - 1.0) * input_scaling
if use_fb == True:
    # Wfb = np.random.rand(Nx, Ny)-0.5
    Wfb = (2.0*np.random.binomial(1, 0.5, [Nx, Ny]) - 1.0)
else:
    Wfb = np.zeros([Nx, Ny])
# W_0 = np.random.rand(Nx, Nx)-0.5
W_0 = scipy.sparse.rand(Nx, Nx, density=sparsity).todense()
W_0[np.where(W_0 > 0)] = W_0[np.where(W_0 > 0)] - 0.5
rhoW = np.max(np.abs(np.linalg.eig(W_0)[0]))
W = W_0 * rho/rhoW

"""Step 6. Harvest: compute the state x(t) at every time step and collect it in X
 - at t=0, x(0) is a zero matrix of shape (Nx, 1) and y(0) a zero matrix of shape (Ny, 1)
 - no bias, no feedback: x(t) = (1-a)*x(t-1) + a*tanh( Win*u(t)     + W*x(t-1) );              X shape = (samples, Nu+Nx)
 - no bias, feedback:    x(t) = (1-a)*x(t-1) + a*tanh( Win*u(t)     + W*x(t-1) + Wfb*y(t-1) ); X shape = (samples, Nu+Nx+Ny)
 - bias, no feedback:    x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) );              X shape = (samples, 1+Nu+Nx)
 - bias, feedback:       x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) + Wfb*y(t-1) ); X shape = (samples, 1+Nu+Nx+Ny)
 - remember to drop the first len_transient entries of X before training Wout
"""
X = []
x = np.zeros([Nx, 1])
y = np.zeros([Ny, 1])
for t in range(len_train):
    u = data[t:t+1]
    if use_bias == True:
        x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack([1, u])) + np.dot(W, x) + np.dot(Wfb, y)
                                 + np.random.rand(Nx, 1)*noise_level )
    else:
        x = (1-a)*x + a*np.tanh( np.dot(Win, u) + np.dot(W, x) + np.dot(Wfb, y)
                                 + np.random.rand(Nx, 1)*noise_level )
    if use_bias == False and use_fb == False:
        X.append(np.vstack([u, x]))
    elif use_bias == False and use_fb == True:
        X.append(np.vstack([u, x, y]))
    elif use_bias == True and use_fb == False:
        X.append(np.vstack([1, u, x]))
    elif use_bias == True and use_fb == True:
        X.append(np.vstack([1, u, x, y]))
    y = data[t:t+1]  # teacher forcing: feed the true output back during training
X = X[len_transient:]
X = np.array(X).reshape(len_train-len_transient, -1)

"""Step 7. Train: solve for the output weights Wout from all states X and all targets via
ridge regression
 - following Madalina Ciortan, I replaced the explicit Wout solve with Ridge
 - L1-regularized regressors (Lasso, ElasticNet) all perform poorly here,
   presumably because they prune away too many weights
 - no bias, no feedback: Wout shape = (Ny, Nu+Nx)
 - no bias, feedback:    Wout shape = (Ny, Nu+Nx+Ny)
 - bias, no feedback:    Wout shape = (Ny, 1+Nu+Nx)
 - bias, feedback:       Wout shape = (Ny, 1+Nu+Nx+Ny)
"""
y_train = data[len_transient+1:len_train+1]
ridge = Ridge(alpha=1e-8)
ridge.fit(X, y_train)

"""Step 8. Test
 - the first test input is u(t-1) = data[2000]
 - the first state x(t-1) carries over from Step 6
 - y_pred records the model output at every test step
 - two output modes, recursive multi-step forecasting and one-step-ahead forecasting; pick one
 - no bias, no feedback: y(t) = Wout*[ u(t);x(t)]
 - no bias, feedback:    y(t) = Wout*[ u(t);x(t);y(t-1)]
 - bias, no feedback:    y(t) = Wout*[1;u(t);x(t)]
 - bias, feedback:       y(t) = Wout*[1;u(t);x(t);y(t-1)]
"""
y_pred = []
u = data[len_train:len_train+1]
for t in range(len_test):
    if use_bias == True:
        x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack([1, u])) + np.dot(W, x) + np.dot(Wfb, y)
                                 + np.random.rand(Nx, 1)*noise_level )
    else:
        x = (1-a)*x + a*np.tanh( np.dot(Win, u) + np.dot(W, x) + np.dot(Wfb, y)
                                 + np.random.rand(Nx, 1)*noise_level )
    if use_bias == False and use_fb == False:
        y = ridge.predict(np.vstack([u, x]).T)
    elif use_bias == False and use_fb == True:
        y = ridge.predict(np.vstack([u, x, y]).T)
    elif use_bias == True and use_fb == False:
        y = ridge.predict(np.vstack([1, u, x]).T)
    elif use_bias == True and use_fb == True:
        y = ridge.predict(np.vstack([1, u, x, y]).T)
    y_pred.append(y)
    # recursive multi-step forecasting: the current output becomes the next input
    # u = y
    # one-step-ahead forecasting: the next input comes from measurements or offline data
    u = data[len_train+t+1:len_train+t+2]
    y = data[len_train+t:len_train+t+1]
y_pred = np.array(y_pred)
y_pred = y_pred.reshape(-1, Ny)

"""Step 9. Score the first len_error test samples with the MSE
 - only len_error samples are scored because Mantas used recursive multi-step forecasting,
   whose accuracy decays over time (clearly visible in this script's output),
   which I suspect is why he scored only 500 samples
"""
mse = mean_squared_error(data[len_train+1:len_train+1+len_error], y_pred[:len_error])
print('MSE = ' + str(mse))

"""Plot the test-set target signal against the prediction"""
plt.figure('Target and generated signals').clear()
plt.plot(data[len_train+1:len_train+1+len_test], 'g')
plt.plot(y_pred, 'b')
plt.title('Target and generated signals $y(n)$ starting at $n=0$')
plt.legend(['Target signal', 'Free-running predicted signal'])

"""Plot 20 reservoir activations for t=100..300"""
plt.figure('Some reservoir activations').clear()
plt.plot(X[0:200, 0:20])  # looks like t=0..200, but with the washout len_transient=100 it is really t=100..300
plt.title('Some reservoir activations $\mathbf{x}(n)$')

"""Plot the output weight values"""
plt.figure('Output weights').clear()
if use_bias == False and use_fb == False:
    plt.bar(range(Nu+Nx), ridge.coef_.ravel())
elif use_bias == False and use_fb == True:
    plt.bar(range(Nu+Nx+Ny), ridge.coef_.ravel())
elif use_bias == True and use_fb == False:
    plt.bar(range(1+Nu+Nx), ridge.coef_.ravel())
elif use_bias == True and use_fb == True:
    plt.bar(range(1+Nu+Nx+Ny), ridge.coef_.ravel())
plt.title('Output weights $\mathbf{W}^{out}$')
plt.show()
```
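The comments in both scripts insist on an effective spectral radius below 1 so that the reservoir has the echo state property. A quick numerical check of what that buys: drive the same reservoir from two different initial states with one shared input sequence and watch the state gap vanish. The sizes, leak rate, and input here are my own choices, not taken from either reference:

```python
import numpy as np

rng = np.random.default_rng(0)

res_size, a, rho = 100, 0.3, 0.5
Win = rng.random((res_size, 1)) - 0.5
W = rng.random((res_size, res_size)) - 0.5
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # effective spectral radius rho < 1

u = rng.standard_normal(500)                      # one shared input sequence

def step(x, u_t):
    # leaky ESN update without bias/feedback: x(t) = (1-a)x(t-1) + a*tanh(Win u(t) + W x(t-1))
    return (1 - a) * x + a * np.tanh(Win * u_t + W @ x)

x1 = np.zeros((res_size, 1))                      # two different initial conditions
x2 = rng.standard_normal((res_size, 1))
gap0 = float(np.linalg.norm(x1 - x2))
for t in range(500):
    x1, x2 = step(x1, u[t]), step(x2, u[t])
gap = float(np.linalg.norm(x1 - x2))
print(gap0, gap)  # the gap contracts toward 0: the state forgets its initial condition
```

With rho well above 1 (such as the 1.25 in the first script before leak-rate effects are accounted for), this contraction is no longer guaranteed, which is exactly the point [5] makes about Lukoševičius's choice of 1.25.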
References
[1] An Introduction to the Echo State Network and its Applications in Power System: introductory paper (with applications); if ESNs are new to you, read this one first
[2] Adaptive Nonlinear System Identification with Echo State Networks: by Jaeger, the founding father; also introductory; read it first if ESNs are new to you
[3] A Practical Guide to Applying Echo State Networks: by Lukoševičius (Jaeger's master's and doctoral student); the key ideas are on pp. 2-3
[4] Predictive Modeling with Echo State Networks: adds Wfb and reports experiments on each parameter; Section 2.1 is the key part
[5] Echo State Networks for Adaptive Filtering: the key idea is Algorithm 1 on p. 18; it finally explains the 1.25 in Lukoševičius's original code
[6] Simple Echo State Network implementations: Lukoševičius's minimal code, matching [2]
[7] Gentle introduction to Echo State Networks: code by Ciortan (the Medium user); it can be traced back to [8]
[8] Reservoir computing approaches for representation and classification of multivariate time series: time-series classification, adding a bidirectional mechanism, dimensionality reduction, and an encoder/decoder
[9] An Introduction to: Reservoir Computing and Echo State Networks: introductory slides