Notes on the Echo State Network (ESN)

Update log:

2020/05/07 - updated formatting, code, and references

Preface:

The ESN is a variant of the RNN. Its main advantage is that training is hundreds of times faster than for an ordinary RNN, because the output weights are solved with a regression instead of backpropagation. These notes mainly follow the references listed at the end; if you have never met this network before, please read [1] and [2] first. I touched ESNs briefly during my master's and have recently had to revisit them for work, so I am writing this up as notes and will add diagrams when I have time. Two code implementations follow below; note that the ESN structures they implement differ slightly, so please also look at the papers mentioned in the references.
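For orientation, here is a compact summary of the equations both implementations below revolve around, written with the bias term and using the same notation as the code comments (a is the leak rate, W^in the input weights, W the reservoir weights, W^out the output weights):

$$x(t) = (1-a)\,x(t-1) + a\,\tanh\!\big(W^{in}\,[1;\,u(t)] + W\,x(t-1)\big)$$
$$y(t) = W^{out}\,[1;\,u(t);\,x(t)]$$
$$W^{out} = Y^{target} X^{\top}\big(X X^{\top} + \beta I\big)^{-1}$$

where X stacks the extended states [1; u(t); x(t)] over time, Y^target stacks the targets, and beta is the ridge regularization strength. This closed-form ridge (Tikhonov) solve is what sklearn's Ridge computes in the code below, and it is the reason ESN training is so much faster than backpropagation through time.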

I went through two implementations in total: one by the ESN founder's student [6] and one by a Medium author [7]; the latter is a variant. Below I provide my own modified versions. They produce the same results as the originals; I only reorganized the structure slightly and added my own comments:

Simple ESN [6]

# -*- coding: utf-8 -*-
"""
Original code by Mantas Lukoševičius (a student of ESN pioneer Jaeger), http://mantas.info
The dataset can be downloaded from Lukoševičius's site.
"""
import matplotlib.pyplot as plt
import scipy
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge, RidgeCV, ElasticNet, ElasticNetCV, Lasso, LassoCV
from sklearn.metrics import mean_squared_error

np.random.seed(42)

"""
Step 1. Load the data (MackeyGlass_t17); shape is (10000, 1)
"""
data = pd.read_table('MackeyGlass_t17.txt', header=None)

"""
Step 2. Define the training length len_train, test length len_test,
        washout length len_transient, and error-evaluation length len_error
"""
len_train = 2000
len_test = 2000
len_transient = min(int(len_train / 10), 100)
len_error = 500

"""
Step 3. Plot the first 1000 samples of the data
"""
plt.figure('A sample of data').clear()
plt.plot(data[0:1000])
plt.title('A sample of data')

"""
Step 4. Define the ESN architecture
Number of input neurons inSize:
    - equals the number of input features
Number of output neurons outSize:
    - equals the number of target features
Number of reservoir neurons resSize:
    - at least 100 is recommended, increased in steps of 50
    - I have never seen more than 1000
Leak rate a:
    - strengthens short-term memory; range [0, 1]
    - a=1 gives a plain ESN, a<1 a leaky ESN (p. 7)
Input scaling factor input_scaling:
    - range [0, 1]
Sparsity sparsity:
    - 0.01 (1%) to 0.05 (5%) is recommended; some suggest 0.05 (5%) to 0.2 (20%)
Target spectral radius rho:
    - range [0, 1)
"""
inSize = 1
outSize = 1
resSize = 100
a = 0.3
input_scaling = 1.0
sparsity = 1.00
rho = 1.25

"""
Step 5. Initialize the input weights Win and the reservoir weights W
Input weights Win:
    - with a bias term the shape is (resSize, 1+inSize)
    - otherwise (resSize, inSize)
    - common ranges are [-1, 1] or [-0.5, 0.5]
    - a normal distribution is recommended
Initial reservoir weights W_0:
    - shape (resSize, resSize)
    - common initial ranges are [-1, 1] or [-0.5, 0.5]
    - an alternative, borrowed from Madalina Ciortan, that actually produces a sparse matrix:
      W_0 = scipy.sparse.rand(resSize, resSize, density=sparsity).todense() - 0.5
      scipy.sparse.rand(n_rows, n_cols, density=sparsity).todense()
    - e.g. with density=0.1 (10%) in a 10x10 matrix, only 10 elements in total are nonzero,
      scattered irregularly over the matrix
Spectral radius rhoW of the initial reservoir weights:
    - the largest absolute eigenvalue of W_0
    - the reservoir only has the echo state property if the effective spectral radius is < 1
Reservoir weights W:
    - W = W_0 * rho/rhoW
"""
Win = (np.random.rand(resSize, 1+inSize) - 0.5) * input_scaling
W_0 = np.random.rand(resSize, resSize) - 0.5
# W_0 = scipy.sparse.rand(resSize, resSize, density=sparsity).todense()
# W_0[np.where(W_0 > 0)] = W_0[np.where(W_0 > 0)] - 0.5
rhoW = np.max(np.abs(np.linalg.eig(W_0)[0]))
W = W_0 * rho/rhoW

"""
Step 6. Harvest: compute the state x(t) at every time step and collect all of them in X
    - at t=0, x(0) is a zero matrix of shape (resSize, 1) and y(0) a zero matrix of shape (1, outSize)
    - the reservoir state x(t=0) has shape (resSize, 1)
    - with bias:    x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) )
      shape check:  (resSize,1) = scalar*(resSize,1)
                    + scalar*tanh( (resSize,1+inSize)*(1+inSize,1) + (resSize,resSize)*(resSize,1) )
    - without bias: x(t) = (1-a)*x(t-1) + a*tanh( Win*u(t) + W*x(t-1) )
    - remember to drop the first len_transient entries of X
"""
X = []
x = np.zeros([resSize, 1])
for t in range(len_train):
    u = data.loc[t]
    x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack((1, u))) + np.dot(W, x) )
    X.append(np.vstack([1, u, x]))
X = X[len_transient:]
X = np.array(X).reshape(-1, 1+inSize+resSize)

"""
Step 7. Training: solve for the output weights Wout by ridge regression on the collected
        states X and the targets Yt
    - Wout has shape (outSize, 1+inSize+resSize)
    - following Madalina Ciortan, I simply use sklearn's Ridge instead of an explicit Wout
    - L1-regularized regressors (Lasso, ElasticNet) did not work well; my guess is that too
      many weights get zeroed out
    - y(t) = Wout*[1;u(t);x(t)]
      (outSize,1) = (outSize, 1+inSize+resSize)*(1+inSize+resSize, 1)
"""
Yt = data.loc[len_transient+1:len_train]
ridge = Ridge(alpha=1e-8)
ridge.fit(X, Yt)

"""
Step 8. Testing
    - the first test input is u(t-1) = data[2000]
    - the first state x(t-1) carries over from Step 6
    - Y_pred records the model output at every test step
    - the author provides two output modes, recursive multi-step forecasting and one-step
      forecasting; pick one
    - with bias:    x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) )
                    y(t) = Wout*[1;u(t);x(t)]
    - without bias: x(t) = (1-a)*x(t-1) + a*tanh( Win*u(t) + W*x(t-1) )
                    y(t) = Wout*[u(t);x(t)]
"""
Y_pred = []
u = data.loc[len_train]
for t in range(len_test):
    x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack([1, u])) + np.dot(W, x) )
    y = ridge.predict(np.vstack((1, u, x)).T)
    Y_pred.append(y)
    # recursive multi-step forecasting: the current output becomes the next input
    u = y
    # one-step forecasting: the next input comes from measurements or offline data
    # u = data.loc[len_train+t+1]
Y_pred = np.array(Y_pred)
Y_pred = Y_pred.reshape(-1, outSize)

"""
Step 9. Compute the MSE over the first len_error test samples
    - why only len_error samples: Mantas used recursive multi-step forecasting as the output mode
    - recursive multi-step forecasting (Recursive Multi-Step Forecasting) degrades over time,
      which is clearly visible in the results of this script
    - so my guess is that this is why he only evaluates 500 samples
"""
mse = mean_squared_error(data.loc[len_train+1:len_train+len_error], Y_pred[:len_error])
print('MSE = ' + str(mse))

"""Plot the test target and the generated signal"""
plt.figure('Target and generated signals').clear()
plt.plot(data.loc[len_train+1:len_train+len_test].to_numpy(), 'g')
plt.plot(Y_pred, 'b')
plt.title('Target and generated signals $y(n)$ starting at $n=0$')
plt.legend(['Target signal', 'Free-running predicted signal'])

"""Plot the responses of 20 reservoir units for t=100~300"""
plt.figure('Some reservoir activations').clear()
plt.plot(X[0:200, 0:20])
# looks like t=0~200, but adding the washout len_transient=100 this is actually t=100~300
plt.title('Some reservoir activations $\mathbf{x}(n)$')

"""Plot the output weight values"""
plt.figure('Output weights').clear()
plt.bar(range(1+inSize+resSize), ridge.coef_.ravel())
plt.title('Output weights $\mathbf{W}^{out}$')
plt.show()

Recursive multi-step forecasting result. Note that it starts to diverge around sample 1000, which is why Lukoševičius only uses 500 samples to compute the error.
http://img2.58codes.com/2024/201247666GZswxCOWZ.png
One-step forecasting result
http://img2.58codes.com/2024/20124766slv9Uq7Koe.png

Medium author's version [7]:

# -*- coding: utf-8 -*-
"""
Key references:
1. Reservoir computing approaches for representation and classification of multivariate time series
2. https://towardsdatascience.com/gentle-introduction-to-echo-state-networks-af99e5373c68
"""
import numpy as np
import scipy.io
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.metrics import accuracy_score, f1_score

"""
Reservoir implementation
"""
class Reservoir(object):

    def __init__(self, n_internal_units=100, spectral_radius=0.99, leak=None,
                 connectivity=0.3, input_scaling=0.2, noise_level=0.01, circle=False):
        """
        Step 1. Store the hyperparameters
        - n_internal_units = number of reservoir neurons; one reference suggests roughly 100~300,
          and every journal paper I have seen uses at least 100
        - input_scaling = scaling factor of Win
        - noise_level = scaling factor of the white noise
        - leak = leak rate, in [0, 1]
        """
        self._n_internal_units = n_internal_units
        self._input_scaling = input_scaling
        self._noise_level = noise_level
        self._leak = leak

        """
        Step 2. The input weights Win are initialized lazily (to None)
        - _input_weights = input weights
        """
        self._input_weights = None

        """
        Step 3. Build the reservoir weights W either as a ring topology (circle=True)
                or as a random sparse matrix
        - circle = ring/circular topology flag
        - _internal_weights = reservoir weights, size=[n_internal_units, n_internal_units]
        """
        if circle:
            self._internal_weights = self._initialize_internal_weights_Circ(n_internal_units,
                                                                            spectral_radius)
        else:
            self._internal_weights = self._initialize_internal_weights(n_internal_units,
                                                                       connectivity,
                                                                       spectral_radius)

    def _initialize_internal_weights_Circ(self, n_internal_units, spectral_radius):
        """
        Build a ring-shaped reservoir: each unit feeds the next one, and every nonzero
        weight equals spectral_radius.
        """
        internal_weights = np.zeros((n_internal_units, n_internal_units))
        internal_weights[0, -1] = spectral_radius
        for i in range(n_internal_units-1):
            internal_weights[i+1, i] = spectral_radius
        return internal_weights

    def _initialize_internal_weights(self, n_internal_units,
                                     connectivity, spectral_radius):
        """
        Step 4. Generate the reservoir weights W as a random sparse matrix with density connectivity
        - internal_weights = reservoir weights W, size=[n_internal_units, n_internal_units]
        - connectivity = sparsity; some journals recommend 1%~5%. For example, in a 10x10 matrix
          with 10% sparsity only 10 elements are nonzero.
        - spectral_radius = spectral radius, must be < 1
        """
        internal_weights = sparse.rand(n_internal_units,
                                       n_internal_units,
                                       density=connectivity).todense()

        """
        Step 5. Shift all nonzero entries of W from (0, +1] to (-0.5, +0.5]
        """
        internal_weights[np.where(internal_weights > 0)] -= 0.5

        """
        Step 6. Compute the eigenvalues E of W, then rescale W using e_max = max(abs(E))
                and the target spectral radius spectral_radius
        - W = (spectral_radius*W)/e_max
        - here spectral_radius is just the coefficient used to adjust the spectral radius
        - the actual spectral radius of the raw W is indeed e_max
        """
        E, _ = np.linalg.eig(internal_weights)
        e_max = np.max(np.abs(E))
        internal_weights /= np.abs(e_max)/spectral_radius
        return internal_weights

    def _compute_state_matrix(self, X, n_drop=0):
        """
        Step 10. Initialize the previous state
        previous_state = previous state x(t-1), size=[N, _n_internal_units]
        """
        N, T, _ = X.shape
        previous_state = np.zeros((N, self._n_internal_units), dtype=float)

        """
        Step 11. Build the state matrix
        state_matrix = state matrix, size=[N, T-n_drop, _n_internal_units]
        """
        state_matrix = np.empty((N, T - n_drop, self._n_internal_units), dtype=float)
        for t in range(T):
            # current input u(t)
            current_input = X[:, t, :]
            # pre-activation state: state_before_tanh = W*x(t-1) + Win*u(t)
            state_before_tanh = self._internal_weights.dot(previous_state.T) \
                + self._input_weights.dot(current_input.T)
            # add white noise: state_before_tanh = W*x(t-1) + Win*u(t) + random*_noise_level
            state_before_tanh += np.random.rand(self._n_internal_units, N)*self._noise_level
            # apply the tanh() nonlinearity and the leak rate _leak
            # although the original author lists two cases, both can be summarized as
            # x(t) = (1-a)*x(t-1) + tanh(W*x(t-1) + Win*u(t) + random*_noise_level),
            # where a=1 gives case 1 and a<1 gives case 2
            if self._leak is None:
                # case 1: a=1, plain ESN
                previous_state = np.tanh(state_before_tanh).T
            else:
                # case 2: a in [0, 1), leaky ESN
                # strictly speaking this should be x(t) = (1-a)*x(t-1) + a*tanh(...)
                previous_state = (1.0 - self._leak)*previous_state + np.tanh(state_before_tanh).T
            # once past the washout, store x(t) in the state matrix
            if (t > n_drop - 1):
                state_matrix[:, t - n_drop, :] = previous_state
        return state_matrix

    def get_states(self, X, n_drop=0, bidir=True):
        """
        Step 8. Build the input weights Win from the three data dimensions
                (N: number of samples; T: number of time steps; V: number of features);
                Win is only created if it does not exist yet
        - _input_weights = input weights, size=[_n_internal_units, V]
        - np.random.binomial(1, 0.5, [self._n_internal_units, V]): RxV values in {0, 1}
        - 2.0*np.random.binomial(...) - 1.0: RxV values in {-1, +1}
        - (...)*self._input_scaling: RxV values in {-_input_scaling, +_input_scaling}
        """
        N, T, V = X.shape
        if self._input_weights is None:
            self._input_weights = (2.0*np.random.binomial(1, 0.5, [self._n_internal_units, V]) - 1.0)*self._input_scaling

        """
        Step 9. Compute the forward state matrix
        - states/state_matrix = state matrix, size=[N, T-n_drop, _n_internal_units]
        """
        states = self._compute_state_matrix(X, n_drop)

        """
        Step 12. If the bidirectional mechanism bidir is enabled, reverse the data along the time
                 axis, compute the state matrix again, and concatenate the forward and backward
                 state matrices
        """
        if bidir is True:
            # reverse the data along the time axis
            X_r = X[:, ::-1, :]
            # compute the backward state matrix states_r
            states_r = self._compute_state_matrix(X_r, n_drop)
            # concatenate the forward states with the backward states_r
            states = np.concatenate((states, states_r), axis=2)
        return states

    def getReservoirEmbedding(self, X, pca, ridge_embedding, n_drop=5, bidir=True, test=False):
        """
        Step 7. Get the raw reservoir states res_states
        - res_states = the states at every time step (except the first n_drop)
        - n_drop = number of initial steps not recorded in the state matrix
        """
        res_states = self.get_states(X, n_drop=5, bidir=True)

        """
        Step 13. For dimensionality reduction, reshape res_states from
                 [N, T-n_drop, _n_internal_units] to [N*(T-n_drop), _n_internal_units]
        - N_samples = number of samples
        """
        N_samples = res_states.shape[0]
        res_states = res_states.reshape(-1, res_states.shape[2])

        """
        Step 14. Depending on the current mode, either fit the PCA on res_states or only
                 transform them
        - red_states = res_states after dimensionality reduction
        """
        # in test mode, pca only transforms
        if test:
            red_states = pca.transform(res_states)
        # in training mode, pca is fitted
        else:
            red_states = pca.fit_transform(res_states)
        # reshape red_states back from [N*(T-n_drop), n_components] to [N, T-n_drop, n_components]
        red_states = red_states.reshape(N_samples, -1, red_states.shape[1])

        """
        Step 15. Encode each sequence with a ridge regression
        - input_repr = encoded representation
        """
        coeff_tr = []
        biases_tr = []
        for i in range(X.shape[0]):
            ridge_embedding.fit(red_states[i, 0:-1, :], red_states[i, 1:, :])
            coeff_tr.append(ridge_embedding.coef_.ravel())
            biases_tr.append(ridge_embedding.intercept_.ravel())
        print(np.array(coeff_tr).shape, np.array(biases_tr).shape)
        input_repr = np.concatenate((np.vstack(coeff_tr), np.vstack(biases_tr)), axis=1)
        return input_repr


"""Load the data"""
data = scipy.io.loadmat('dataset/JpVow.mat')
X = data['X']
Y = data['Y']
Xte = data['Xte']
Yte = data['Yte']

"""One-hot encode the targets"""
# note: on scikit-learn >= 1.2 this argument is called sparse_output instead of sparse
onehot_encoder = OneHotEncoder(sparse=False, categories='auto')
Y = onehot_encoder.fit_transform(Y)
Yte = onehot_encoder.transform(Yte)

"""Build the dimensionality reducer, the encoder, and the decoder (readout)"""
pca = PCA(n_components=75)
ridge_embedding = Ridge(alpha=10, fit_intercept=True)
readout = Ridge(alpha=5)

"""Build the ESN reservoir"""
res = Reservoir(n_internal_units=450, spectral_radius=0.6, leak=0.6,
                connectivity=0.25, input_scaling=0.1, noise_level=0.01, circle=False)

"""Fit the dimensionality reducer and the encoder on the training data"""
input_repr = res.getReservoirEmbedding(X, pca, ridge_embedding, n_drop=5, bidir=True, test=False)

"""Feed the test data through the ESN and encode it"""
input_repr_te = res.getReservoirEmbedding(Xte, pca, ridge_embedding, n_drop=5, bidir=True, test=True)

"""Step 16. Train the decoder (readout)"""
readout.fit(input_repr, Y)

"""Feed the test representations into the decoder"""
logits = readout.predict(input_repr_te)

"""Classification"""
pred_class = np.argmax(logits, axis=1)

"""Report the results"""
true_class = np.argmax(Yte, axis=1)
print(accuracy_score(true_class, pred_class))
print(f1_score(true_class, pred_class, average='weighted'))

Using [6] as the framework and adding the Wfb, noise, and Win definitions from [7], the error drops to 9.58e-10. The state update this hybrid implements is summarized below, followed by the full script:
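With both the bias and the output feedback enabled, the update and readout used in the script below are

$$x(t) = (1-a)\,x(t-1) + a\,\tanh\!\big(W^{in}[1;\,u(t)] + W\,x(t-1) + W^{fb}\,y(t-1) + \nu(t)\big)$$
$$y(t) = W^{out}\,[1;\,u(t);\,x(t);\,y(t-1)]$$

where ν(t) is uniform noise scaled by noise_level (set to 0 here) and y(t-1) is teacher-forced from the data during training and taken from measurements in the one-step test mode.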

# -*- coding: utf-8 -*-
"""
Modified from the original code by Mantas Lukoševičius (a student of ESN pioneer Jaeger),
http://mantas.info. The dataset can be downloaded from Lukoševičius's site.
"""
import matplotlib.pyplot as plt
import scipy
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge, RidgeCV, ElasticNet, ElasticNetCV, Lasso, LassoCV
from sklearn.metrics import mean_squared_error

np.random.seed(42)

"""
Step 1. Load the data (MackeyGlass_t17); shape is (10000, 1)
"""
data = pd.read_table('MackeyGlass_t17.txt', header=None).to_numpy()

"""
Step 2. Define the training length len_train, test length len_test,
        washout length len_transient, and error-evaluation length len_error
"""
len_train = 2000
len_test = 2000
len_transient = min(int(len_train / 10), 100)
len_error = 500

"""
Step 3. Plot the first 1000 samples of the data
"""
plt.figure('A sample of data').clear()
plt.plot(data[0:1000])
plt.title('A sample of data')

"""
Step 4. Define the ESN architecture
Number of input neurons Nu:
    - equals the number of input features
Number of output neurons Ny:
    - equals the number of target features
Number of reservoir neurons Nx:
    - at least 100 is recommended, increased in steps of 50
    - I have never seen more than 1000
Leak rate a:
    - strengthens short-term memory; range [0, 1]
    - a=1 gives a plain ESN, a<1 a leaky ESN (p. 7)
Input scaling factor input_scaling:
    - range [0, 1]
Sparsity sparsity:
    - 0.01 (1%) to 0.05 (5%) is recommended; some suggest 0.05 (5%) to 0.2 (20%)
Target spectral radius rho:
    - range [0, 1)
Use a bias term: use_bias
Use output-feedback weights (Wfb): use_fb
"""
Nu = 1
Ny = 1
Nx = 500
a = 0.3
input_scaling = 0.01
sparsity = 0.05
rho = 0.99
use_bias = True
use_fb = True
noise_level = 0.00

"""
Step 5. Initialize the input weights Win, the feedback weights Wfb, and the reservoir weights W
Input weights Win:
    - with a bias term the shape is (Nx, 1+Nu)
    - otherwise (Nx, Nu)
    - common ranges are [-1, 1] or [-0.5, 0.5]
    - a normal distribution is recommended
Output-feedback weights Wfb:
    - shape (Nx, Ny)
    - common ranges are [-1, 1] or [-0.5, 0.5]
    - a normal distribution is recommended
Initial reservoir weights W_0:
    - shape (Nx, Nx)
    - common initial ranges are [-1, 1] or [-0.5, 0.5]
    - this construction, borrowed from Madalina Ciortan, actually produces a sparse matrix:
      W_0 = scipy.sparse.rand(Nx, Nx, density=sparsity).todense() - 0.5
      scipy.sparse.rand(n_rows, n_cols, density=sparsity).todense()
    - e.g. with density=0.1 (10%) in a 10x10 matrix, only 10 elements in total are nonzero,
      scattered irregularly over the matrix
Spectral radius rhoW of the initial reservoir weights:
    - the largest absolute eigenvalue of W_0
    - the reservoir only has the echo state property if the effective spectral radius is < 1
Reservoir weights W:
    - W = W_0 * rho/rhoW
"""
if use_bias == True:
    # Win = (np.random.rand(Nx, 1+Nu)-0.5) * input_scaling
    Win = (2.0*np.random.binomial(1, 0.5, [Nx, 1+Nu]) - 1.0) * input_scaling
else:
    # Win = (np.random.rand(Nx, Nu)-0.5) * input_scaling
    Win = (2.0*np.random.binomial(1, 0.5, [Nx, Nu]) - 1.0) * input_scaling
if use_fb == True:
    # Wfb = np.random.rand(Nx, Ny)-0.5
    Wfb = (2.0*np.random.binomial(1, 0.5, [Nx, Ny]) - 1.0)
else:
    Wfb = np.zeros([Nx, Ny])
# W_0 = np.random.rand(Nx, Nx)-0.5
W_0 = scipy.sparse.rand(Nx, Nx, density=sparsity).todense()
W_0[np.where(W_0 > 0)] = W_0[np.where(W_0 > 0)] - 0.5
rhoW = np.max(np.abs(np.linalg.eig(W_0)[0]))
W = W_0 * rho/rhoW

"""
Step 6. Harvest: compute the state x(t) at every time step and collect all of them in X
    - at t=0, x(0) is a zero matrix of shape (Nx, 1) and y(0) a zero matrix of shape (Ny, 1)
    - no bias, no feedback:   x(t) = (1-a)*x(t-1) + a*tanh( Win*   u(t)  + W*x(t-1) + 0          ); X has shape (samples,   Nu+Nx   )
    - no bias, with feedback: x(t) = (1-a)*x(t-1) + a*tanh( Win*   u(t)  + W*x(t-1) + Wfb*y(t-1) ); X has shape (samples,   Nu+Nx+Ny)
    - bias, no feedback:      x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) + 0          ); X has shape (samples, 1+Nu+Nx   )
    - bias, with feedback:    x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) + Wfb*y(t-1) ); X has shape (samples, 1+Nu+Nx+Ny)
    - remember to drop the first len_transient entries of X before training Wout
"""
X = []
x = np.zeros([Nx, 1])
y = np.zeros([Ny, 1])
for t in range(len_train):
    u = data[t:t+1]
    if use_bias == True:
        x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack([1, u])) + np.dot(W, x) + np.dot(Wfb, y) + np.random.rand(Nx, 1)*noise_level )
    else:
        x = (1-a)*x + a*np.tanh( np.dot(Win, u) + np.dot(W, x) + np.dot(Wfb, y) + np.random.rand(Nx, 1)*noise_level )
    if use_bias == False and use_fb == False:
        X.append(np.vstack([u, x]))
    elif use_bias == False and use_fb == True:
        X.append(np.vstack([u, x, y]))
    elif use_bias == True and use_fb == False:
        X.append(np.vstack([1, u, x]))
    elif use_bias == True and use_fb == True:
        X.append(np.vstack([1, u, x, y]))
    # teacher forcing: the feedback for the next step is taken from the data
    y = data[t:t+1]
X = X[len_transient:]
X = np.array(X).reshape(len_train-len_transient, -1)

"""
Step 7. Training: solve for the output weights Wout by ridge regression on the collected
        states X and the targets y_train
    - following Madalina Ciortan, I simply use sklearn's Ridge instead of an explicit Wout
    - L1-regularized regressors (Lasso, ElasticNet) did not work well; my guess is that too
      many weights get zeroed out
    - no bias, no feedback:   Wout has shape (Ny,   Nu+Nx   )
    - no bias, with feedback: Wout has shape (Ny,   Nu+Nx+Ny)
    - bias, no feedback:      Wout has shape (Ny, 1+Nu+Nx   )
    - bias, with feedback:    Wout has shape (Ny, 1+Nu+Nx+Ny)
"""
y_train = data[len_transient+1:len_train+1]
ridge = Ridge(alpha=1e-8)
ridge.fit(X, y_train)

"""
Step 8. Testing
    - the first test input is u(t-1) = data[2000]
    - the first state x(t-1) carries over from Step 6
    - y_pred records the model output at every test step
    - two output modes are provided, recursive multi-step forecasting and one-step forecasting;
      pick one
    - no bias, no feedback:   y(t) = Wout*[  u(t);x(t)]
    - no bias, with feedback: y(t) = Wout*[  u(t);x(t);y(t-1)]
    - bias, no feedback:      y(t) = Wout*[1;u(t);x(t)]
    - bias, with feedback:    y(t) = Wout*[1;u(t);x(t);y(t-1)]
"""
y_pred = []
u = data[len_train:len_train+1]
for t in range(len_test):
    if use_bias == True:
        x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack([1, u])) + np.dot(W, x) + np.dot(Wfb, y) + np.random.rand(Nx, 1)*noise_level )
    else:
        x = (1-a)*x + a*np.tanh( np.dot(Win, u) + np.dot(W, x) + np.dot(Wfb, y) + np.random.rand(Nx, 1)*noise_level )
    if use_bias == False and use_fb == False:
        y = ridge.predict(np.vstack([u, x]).T)
    elif use_bias == False and use_fb == True:
        y = ridge.predict(np.vstack([u, x, y]).T)
    elif use_bias == True and use_fb == False:
        y = ridge.predict(np.vstack([1, u, x]).T)
    elif use_bias == True and use_fb == True:
        y = ridge.predict(np.vstack([1, u, x, y]).T)
    y_pred.append(y)
    # recursive multi-step forecasting: the current output becomes the next input
    # u = y
    # one-step forecasting: the next input and feedback come from measurements or offline data
    u = data[len_train+t+1:len_train+t+2]
    y = data[len_train+t:len_train+t+1]
y_pred = np.array(y_pred)
y_pred = y_pred.reshape(-1, Ny)

"""
Step 9. Compute the MSE over the first len_error test samples
    - why only len_error samples: Mantas used recursive multi-step forecasting as the output mode
    - recursive multi-step forecasting (Recursive Multi-Step Forecasting) degrades over time,
      which is clearly visible in the results of his script
    - so my guess is that this is why he only evaluates 500 samples
"""
mse = mean_squared_error(data[len_train+1:len_train+1+len_error], y_pred[:len_error])
print('MSE = ' + str(mse))

"""Plot the test target and the generated signal"""
plt.figure('Target and generated signals').clear()
plt.plot(data[len_train+1:len_train+1+len_test], 'g')
plt.plot(y_pred, 'b')
plt.title('Target and generated signals $y(n)$ starting at $n=0$')
plt.legend(['Target signal', 'Free-running predicted signal'])

"""Plot the responses of 20 reservoir units for t=100~300"""
plt.figure('Some reservoir activations').clear()
plt.plot(X[0:200, 0:20])
# looks like t=0~200, but adding the washout len_transient=100 this is actually t=100~300
plt.title('Some reservoir activations $\mathbf{x}(n)$')

"""Plot the output weight values"""
plt.figure('Output weights').clear()
if use_bias == False and use_fb == False:
    plt.bar(range(Nu+Nx), ridge.coef_.ravel())
elif use_bias == False and use_fb == True:
    plt.bar(range(Nu+Nx+Ny), ridge.coef_.ravel())
elif use_bias == True and use_fb == False:
    plt.bar(range(1+Nu+Nx), ridge.coef_.ravel())
elif use_bias == True and use_fb == True:
    plt.bar(range(1+Nu+Nx+Ny), ridge.coef_.ravel())
plt.title('Output weights $\mathbf{W}^{out}$')
plt.show()

References

[1] An Introduction to the Echo State Network and its Applications in Power System: an introductory paper (with applications); if you are new to ESNs, read this one first.
[2] Adaptive Nonlinear System Identification with Echo State Networks: by Jaeger, the originator of the ESN; also introductory and a must-read if you are new to ESNs.
[3] A Practical Guide to Applying Echo State Networks: by Lukoševičius (Jaeger's master's and PhD student); the key concepts are on pp. 2-3.
[4] Predictive Modeling with Echo State Networks: adds Wfb and reports experiments on each parameter; Section 2.1 is the key part.
[5] Echo State Networks for Adaptive Filtering: the key concept is Algorithm 1 on p. 18; it finally made clear to me what the 1.25 in Lukoševičius's original code means.
[6] Simple Echo State Network implementations: simple code provided by Lukoševičius, companion to [2].
[7] Gentle introduction to Echo State Networks: code by Ciortan (the Medium author); it can be traced back to [8].
[8] Reservoir computing approaches for representation and classification of multivariate time series: time-series classification with a bidirectional reservoir, dimensionality reduction, and an encoder-decoder readout.
[9] An Introduction to: Reservoir Computing and Echo State Networks: introductory slides.

