Preparing Sample Data
'''
Hyperparameters: These values define the architecture and behavior of the transformer model:

src_vocab_size, tgt_vocab_size: Vocabulary sizes for source and target sequences, both set to 5000.
d_model: Dimensionality of the model's embeddings, set to 512.
num_heads: Number of attention heads in the multi-head attention mechanism, set to 8.
num_layers: Number of layers for both the encoder and the decoder, set to 6.
d_ff: Dimensionality of the inner layer in the feed-forward network, set to 2048.
max_seq_length: Maximum sequence length for positional encoding, set to 100.
dropout: Dropout rate for regularization, set to 0.1.
'''
src_vocab_size = 5000
tgt_vocab_size = 5000
d_model = 512
num_heads = 8
num_layers = 6
d_ff = 2048
max_seq_length = 100
dropout = 0.1

'''
This line creates an instance of the Transformer class, initializing it with the given
hyperparameters. The instance will have the architecture and behavior defined by these hyperparameters.
'''
transformer = Transformer(src_vocab_size, tgt_vocab_size, d_model, num_heads, num_layers, d_ff, max_seq_length, dropout)

'''
Generate random sample data.

src_data: Random integers between 1 and src_vocab_size, representing a batch of source sequences with shape (64, max_seq_length).
tgt_data: Random integers between 1 and tgt_vocab_size, representing a batch of target sequences with shape (64, max_seq_length).

These random sequences can be used as inputs to the transformer model, simulating a batch of data with 64 examples and sequences of length 100.
'''
src_data = torch.randint(1, src_vocab_size, (64, max_seq_length))  # (batch_size, seq_length)
tgt_data = torch.randint(1, tgt_vocab_size, (64, max_seq_length))  # (batch_size, seq_length)
This code snippet demonstrates how to initialize a Transformer model and generate random source and target sequences that can be fed into it. The chosen hyperparameters determine the exact structure and properties of the transformer. This setup could be part of a larger script in which the model is trained and evaluated on a real sequence-to-sequence task, such as machine translation or text summarization.
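As a quick sanity check (not part of the original snippet), you can run a single forward pass and confirm that the model produces one set of logits over the target vocabulary for every target position. This sketch assumes the Transformer class defined earlier returns a tensor of shape (batch_size, tgt_seq_length, tgt_vocab_size):

# Sketch of a forward-pass sanity check. It assumes the Transformer defined above
# returns logits of shape (batch_size, tgt_seq_length, tgt_vocab_size).
with torch.no_grad():
    logits = transformer(src_data, tgt_data[:, :-1])  # target shifted right by one, as in training

print(logits.shape)  # expected: torch.Size([64, 99, 5000])
assert logits.shape == (64, max_seq_length - 1, tgt_vocab_size)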
Training the Model
'''
criterion = nn.CrossEntropyLoss(ignore_index=0): Defines the loss function as cross-entropy loss. The ignore_index argument is set to 0, meaning the loss will not consider targets with an index of 0 (typically reserved for padding tokens).

optimizer = optim.Adam(...): Defines the optimizer as Adam with a learning rate of 0.0001 and specific beta values.
'''
criterion = nn.CrossEntropyLoss(ignore_index=0)
optimizer = optim.Adam(transformer.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9)

'''
transformer.train(): Sets the transformer model to training mode, enabling behaviors like dropout that only apply during training.
'''
transformer.train()

'''
The code snippet trains the model for 100 epochs using a typical training loop:

for epoch in range(100): Iterates over 100 training epochs.
optimizer.zero_grad(): Clears the gradients from the previous iteration.
output = transformer(src_data, tgt_data[:, :-1]): Passes the source data and the target data (excluding the last token in each sequence) through the transformer. This is common in sequence-to-sequence tasks where the target is shifted by one token.
loss = criterion(...): Computes the loss between the model's predictions and the target data (excluding the first token in each sequence). The loss is calculated by reshaping the data into one-dimensional tensors and using the cross-entropy loss function.
loss.backward(): Computes the gradients of the loss with respect to the model's parameters.
optimizer.step(): Updates the model's parameters using the computed gradients.
print(f"Epoch: {epoch+1}, Loss: {loss.item()}"): Prints the current epoch number and the loss value for that epoch.
'''
for epoch in range(100):
    optimizer.zero_grad()
    output = transformer(src_data, tgt_data[:, :-1])
    loss = criterion(output.contiguous().view(-1, tgt_vocab_size), tgt_data[:, 1:].contiguous().view(-1))
    loss.backward()
    optimizer.step()
    print(f"Epoch: {epoch+1}, Loss: {loss.item()}")
This code snippet trains the Transformer model on the randomly generated source and target sequences for 100 epochs. It uses the Adam optimizer and the cross-entropy loss function. The loss is printed for each epoch, so you can monitor training progress. In a real scenario, you would replace the random source and target sequences with actual data from your task, such as machine translation.
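In practice you would also iterate over mini-batches of tokenized data rather than a single fixed batch. The following is a minimal sketch using torch.utils.data.TensorDataset and DataLoader; it reuses the random tensors above purely as stand-ins for real tokenized sentence pairs, and the batch size of 16 and 10 epochs are arbitrary illustrative choices:

from torch.utils.data import TensorDataset, DataLoader

# Minimal mini-batch training sketch: wrap (source, target) pairs in a DataLoader.
# The random tensors stand in for real tokenized sentence pairs.
dataset = TensorDataset(src_data, tgt_data)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

transformer.train()
for epoch in range(10):  # fewer epochs, shown only for illustration
    epoch_loss = 0.0
    for src_batch, tgt_batch in loader:
        optimizer.zero_grad()
        output = transformer(src_batch, tgt_batch[:, :-1])
        loss = criterion(output.contiguous().view(-1, tgt_vocab_size),
                         tgt_batch[:, 1:].contiguous().view(-1))
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f"Epoch: {epoch+1}, Avg Loss: {epoch_loss / len(loader)}")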
Evaluating Model Performance
'''
transformer.eval(): Puts the transformer model in evaluation mode. This is important because
it turns off certain behaviors like dropout that are only used during training.
'''
transformer.eval()

'''
Generate random sample validation data.

val_src_data: Random integers between 1 and src_vocab_size, representing a batch of validation source sequences with shape (64, max_seq_length).
val_tgt_data: Random integers between 1 and tgt_vocab_size, representing a batch of validation target sequences with shape (64, max_seq_length).
'''
val_src_data = torch.randint(1, src_vocab_size, (64, max_seq_length))  # (batch_size, seq_length)
val_tgt_data = torch.randint(1, tgt_vocab_size, (64, max_seq_length))  # (batch_size, seq_length)

'''
Validation Loop:

with torch.no_grad(): Disables gradient computation, as we don't need to compute gradients during validation. This can reduce memory consumption and speed up computations.
val_output = transformer(val_src_data, val_tgt_data[:, :-1]): Passes the validation source data and the validation target data (excluding the last token in each sequence) through the transformer.
val_loss = criterion(...): Computes the loss between the model's predictions and the validation target data (excluding the first token in each sequence). The loss is calculated by reshaping the data into one-dimensional tensors and using the previously defined cross-entropy loss function.
print(f"Validation Loss: {val_loss.item()}"): Prints the validation loss value.
'''
with torch.no_grad():
    val_output = transformer(val_src_data, val_tgt_data[:, :-1])
    val_loss = criterion(val_output.contiguous().view(-1, tgt_vocab_size), val_tgt_data[:, 1:].contiguous().view(-1))
    print(f"Validation Loss: {val_loss.item()}")
This code snippet evaluates the Transformer model on a randomly generated validation dataset, computes the validation loss, and prints it. In a real scenario, the random validation data would be replaced with actual validation data from the task you are working on. The validation loss gives you an indication of how well the model performs on unseen data, which is a key measure of its ability to generalize.
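Beyond the validation loss, a trained sequence-to-sequence model is normally used by decoding one token at a time. The snippet below is a minimal greedy-decoding sketch, not part of the original code: the start_token id and max_len are hypothetical placeholders (real data would reserve a specific start-of-sequence id; the random data above defines none), and the model's own predictions are fed back in position by position.

def greedy_decode(model, src, start_token=1, max_len=20):
    """Minimal greedy-decoding sketch. `start_token` is a hypothetical
    start-of-sequence id; the random data in this tutorial does not define one."""
    model.eval()
    # Begin every target sequence with the start token.
    tgt = torch.full((src.size(0), 1), start_token, dtype=torch.long)
    with torch.no_grad():
        for _ in range(max_len - 1):
            logits = model(src, tgt)                   # (batch, current_len, tgt_vocab_size)
            next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            tgt = torch.cat([tgt, next_token], dim=1)  # append the most likely next token
    return tgt

generated = greedy_decode(transformer, val_src_data[:4])
print(generated.shape)  # torch.Size([4, 20])

A real decoder would also stop early at an end-of-sequence token and map the generated ids back to text using the tokenizer's vocabulary.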