# mindtext.modules.encoder.roberta

> _class_ __mindtext.modules.encoder.roberta.RobertaConfig__ _(seq_length: int = 128,
                 vocab_size: int = 32000,
                 hidden_size: int = 768,
                 num_hidden_layers: int = 12,
                 num_attention_heads: int = 12,
                 intermediate_size=3072,
                 hidden_act: str = "gelu",
                 hidden_dropout_prob: float = 0.1,
                 attention_probs_dropout_prob: float = 0.1,
                 max_position_embeddings: int = 512,
                 type_vocab_size: int = 16,
                 initializer_range: float = 0.02,
                 dtype: mstype = mstype.float32,
                 compute_type: mstype = mstype.float32)_

Configuration for the RoBERTa model.

> __init__ (_seq_length: int = 128,
                 vocab_size: int = 32000,
                 hidden_size: int = 768,
                 num_hidden_layers: int = 12,
                 num_attention_heads: int = 12,
                 intermediate_size=3072,
                 hidden_act: str = "gelu",
                 hidden_dropout_prob: float = 0.1,
                 attention_probs_dropout_prob: float = 0.1,
                 max_position_embeddings: int = 512,
                 type_vocab_size: int = 16,
                 initializer_range: float = 0.02,
                 dtype: mstype = mstype.float32,
                 compute_type: mstype = mstype.float32_)

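Parameters
* __vocab_size__ (_int_): Size of the vocabulary, i.e. the number of rows in the embedding table. Default: 32000.
* __seq_length__ (_int_): Length of the input sequence. Default: 128.
* __hidden_size__ (_int_): Hidden size of the BERT encoder. Default: 768.
* __num_hidden_layers__ (_int_): Number of hidden layers. Default: 12.
* __num_attention_heads__ (_int_): Number of attention heads in each attention layer of the BertTransformer encoder. Default: 12.
* __intermediate_size__ (_int_): Hidden dimension of the intermediate layer. Default: 3072.
* __hidden_act__ (_str_): Activation function used by BERT. Default: "gelu".
* __hidden_dropout_prob__ (_float_): Dropout probability for the hidden layers. Default: 0.1.
* __attention_probs_dropout_prob__ (_float_): Dropout probability of BertAttention. Default: 0.1.
* __max_position_embeddings__ (_int_): Maximum sequence length the model can handle. Default: 512.
* __type_vocab_size__ (_int_): Size of the token type vocabulary. Default: 16.
* __initializer_range__ (_float_): Initialization value of the truncated normal distribution used for the weights. Default: 0.02.
* __dtype__ (_mindspore.dtype_): Data type. Default: mstype.float32.
* __compute_type__ (_mindspore.dtype_): Compute data type of the BERT model. Default: mstype.float32.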
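
To make the relationships between these settings concrete, here is a minimal standalone sketch. `SketchRobertaConfig` is a hypothetical stand-in that only mirrors the documented defaults; it is not the mindtext class, and the two consistency checks and the `size_per_head` property are assumptions borrowed from common BERT-style implementations.

```python
from dataclasses import dataclass


@dataclass
class SketchRobertaConfig:
    """Illustrative stand-in mirroring the documented defaults (not the mindtext class)."""
    seq_length: int = 128
    vocab_size: int = 32000
    hidden_size: int = 768
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    intermediate_size: int = 3072
    hidden_act: str = "gelu"
    hidden_dropout_prob: float = 0.1
    attention_probs_dropout_prob: float = 0.1
    max_position_embeddings: int = 512
    type_vocab_size: int = 16
    initializer_range: float = 0.02

    def __post_init__(self):
        # Assumption: every attention head receives an equal slice of hidden_size.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        # Assumption: an input cannot be longer than the position embedding table.
        if self.seq_length > self.max_position_embeddings:
            raise ValueError("seq_length must not exceed max_position_embeddings")

    @property
    def size_per_head(self) -> int:
        # Width of one attention head under the equal-split assumption.
        return self.hidden_size // self.num_attention_heads


config = SketchRobertaConfig(seq_length=256)
print(config.size_per_head)
```

With the real class, the same keyword arguments would presumably be passed to `RobertaConfig(...)`; whether it performs similar validation is not stated in this reference.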

> _class_ __mindtext.modules.encoder.roberta.EmbeddingLookup__ _(vocab_size: int, embedding_size: int, embedding_shape: int, 
> use_one_hot_embeddings: bool = False, initializer_range: float = 0.02)_

An embedding lookup table.

> __init__ (_vocab_size: int, embedding_size: int, embedding_shape: int, use_one_hot_embeddings: bool = False, initializer_range: float = 0.02_)

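Parameters
* __vocab_size__ (_int_): Size of the embedding vocabulary.
* __embedding_size__ (_int_): Size of each embedding vector.
* __embedding_shape__ (_list_): Shape of the embedding output, [batch_size, seq_length, embedding_size].
* __use_one_hot_embeddings__ (_bool_): Whether to use the one-hot encoding form. Default: False.
* __initializer_range__ (_float_): Initialization value of the truncated normal distribution used for the weights. Default: 0.02.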

> __construct__ (_input_ids: Tensor_)

Returns the embedding output together with the embedding lookup table.

Parameters
* __input_ids__ (_mindspore.Tensor_): Tensor holding the ids that correspond to the input tokens.

Returns
* __output__ (_mindspore.Tensor_): High-dimensional word embeddings obtained from input_ids.
* __embedding_table__ (_Tensor matrix_): The updated lookup table.
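
The two lookup modes selected by `use_one_hot_embeddings` can be illustrated with plain Python lists. This is only a sketch of the general technique (row gathering versus a one-hot matrix product); the helper names are hypothetical and the actual mindtext implementation operates on MindSpore tensors.

```python
def lookup_by_gather(table, input_ids):
    """Pick one row of the table per token id."""
    return [[table[token_id] for token_id in row] for row in input_ids]


def lookup_by_one_hot(table, input_ids):
    """Same lookup expressed as one-hot vectors multiplied with the table."""
    vocab_size, embedding_size = len(table), len(table[0])
    output = []
    for row in input_ids:
        out_row = []
        for token_id in row:
            one_hot = [1.0 if v == token_id else 0.0 for v in range(vocab_size)]
            out_row.append([
                sum(one_hot[v] * table[v][d] for v in range(vocab_size))
                for d in range(embedding_size)
            ])
        output.append(out_row)
    return output


# Toy table with vocab_size = 4 and embedding_size = 2.
table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
input_ids = [[1, 3, 0]]  # batch_size = 1, seq_length = 3

gathered = lookup_by_gather(table, input_ids)
one_hot_result = lookup_by_one_hot(table, input_ids)
print(gathered)
```

Both functions produce an output of shape [batch_size, seq_length, embedding_size]; the one-hot form trades extra arithmetic for a pure matrix product, which some hardware handles more efficiently.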

> _class_ __mindtext.modules.encoder.roberta.RobertaEmbedding__ _(embedding_size: int, embedding_shape: list, use_token_type: bool = False, 
> token_type_vocab_size: int = 16, use_one_hot_embeddings: bool = False, max_position_embeddings: int = 512, dropout_prob: float = 0.1)_

Similar to BertEmbedding, with a small adjustment to the position encoding.

> __init__ (_embedding_size: int, embedding_shape: list, use_token_type: bool = False, token_type_vocab_size: int = 16, 
> use_one_hot_embeddings: bool = False, max_position_embeddings: int = 512, dropout_prob: float = 0.1_)

Parameters
* __embedding_size__ (_int_): Size of each embedding vector.
* __embedding_shape__ (_list_): Shape of the embedding output, [batch_size, seq_length, embedding_size].
* __use_token_type__ (_bool_): Whether to use token type embeddings. Default: False.
* __token_type_vocab_size__ (_int_): Size of the token type vocabulary. Default: 16.
* __use_one_hot_embeddings__ (_bool_): Whether to use the one-hot encoding form. Default: False.
* __max_position_embeddings__ (_int_): Maximum sequence length used by the model. Default: 512.
* __dropout_prob__ (_float_): Dropout probability. Default: 0.1.

> __construct__ (_input_ids: Tensor, token_type_ids: Tensor, word_embeddings: Tensor_)

Parameters
* __input_ids__ (_mindspore.Tensor_): Input tensor.
* __token_type_ids__ (_mindspore.Tensor_): Tensor of segment ids.
* __word_embeddings__ (_mindspore.Tensor_): Word embedding tensor.

Returns
* __output__ (_mindspore.Tensor_): Tensor that combines the word embeddings with the position and segment embeddings.

> _class_ __mindtext.modules.encoder.luke.EncoderOutput__ _(in_channels: int, out_channels: int, initializer_range: float = 0.02, 
> dropout_prob: float = 0.1, compute_type: mstype = mstype.float32)_

Post-processing that applies the position and token type embeddings to the word embeddings.

> __init__ (_in_channels: int, out_channels: int, initializer_range: float = 0.02, dropout_prob: float = 0.1, compute_type: mstype = mstype.float32_)

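Parameters
* __in_channels__ (_int_): Number of input channels.
* __out_channels__ (_int_): Number of output channels.
* __initializer_range__ (_float_): Initialization value of the truncated normal distribution used for the weights. Default: 0.02.
* __dropout_prob__ (_float_): Dropout probability. Default: 0.1.
* __compute_type__ (_mindspore.dtype_): Compute data type of the BERT model. Default: mstype.float32.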

> __construct__ (_hidden_status: Tensor, input_tensor: Tensor_)

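Parameters
* __hidden_status__ (_mindspore.Tensor_): Hidden states.
* __input_tensor__ (_mindspore.Tensor_): Input for the residual connection.

Returns
* __output__ (_mindspore.Tensor_): Result of the linear transformation of the input combined with the residual connection.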
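
The linear-plus-residual computation can be sketched for a single position as follows. This is an illustration under assumptions, not the mindtext code: dropout is omitted (as at inference time), and the final layer normalization is assumed from typical BERT-style output blocks rather than stated in this reference. All function names are hypothetical.

```python
import math


def dense(x, weight, bias):
    """Linear layer: x has in_channels entries, weight is [out_channels][in_channels]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def layer_norm(x, eps=1e-12):
    """Normalize a vector to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def encoder_output(hidden_status, input_tensor, weight, bias):
    projected = dense(hidden_status, weight, bias)
    # Dropout would be applied to `projected` here during training.
    residual = [p + r for p, r in zip(projected, input_tensor)]
    # Assumption: a layer normalization follows the residual addition.
    return layer_norm(residual)


hidden_status = [1.0, 2.0, 3.0, 4.0]   # in_channels = 4
input_tensor = [0.5, -0.5]             # out_channels = 2
weight = [[0.1, 0.0, 0.0, 0.0],
          [0.0, 0.1, 0.0, 0.0]]
bias = [0.0, 0.0]

output = encoder_output(hidden_status, input_tensor, weight, bias)
print(output)
```

Note that `input_tensor` must already have `out_channels` entries so that it can be added to the projected hidden states.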

> _class_ __mindtext.modules.encoder.luke.AttentionLayer__ _(from_tensor_width: int,
                 to_tensor_width: int,
                 num_attention_heads: int = 1,
                 size_per_head: int = 512,
                 has_attention_mask: bool = True,
                 query_act: str = None,
                 key_act: str = None,
                 value_act: str = None,
                 attention_probs_dropout_prob: float = 0.0,
                 initializer_range: float = 0.02,
                 do_return_2d_tensor: bool = False,
                 batch_size: int = None,
                 from_seq_length: int = None,
                 to_seq_length: int = None,
                 compute_type: mstype = mstype.float32)_

Applies the multi-head attention mechanism.

> __init__ (_from_tensor_width: int,
                 to_tensor_width: int,
                 num_attention_heads: int = 1,
                 size_per_head: int = 512,
                 has_attention_mask: bool = True,
                 query_act: str = None,
                 key_act: str = None,
                 value_act: str = None,
                 attention_probs_dropout_prob: float = 0.0,
                 initializer_range: float = 0.02,
                 do_return_2d_tensor: bool = False,
                 batch_size: int = None,
                 from_seq_length: int = None,
                 to_seq_length: int = None,
                 compute_type: mstype = mstype.float32_)

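Parameters
* __from_tensor_width__ (_int_): Size of the last dimension of from_tensor.
* __to_tensor_width__ (_int_): Size of the last dimension of to_tensor.
* __from_seq_length__ (_int_): Sequence length of from_tensor.
* __to_seq_length__ (_int_): Sequence length of to_tensor.
* __num_attention_heads__ (_int_): Number of attention heads. Default: 1.
* __size_per_head__ (_int_): Dimension of each attention head. Default: 512.
* __query_act__ (_str_): Activation function for the query transform. Default: None.
* __key_act__ (_str_): Activation function for the key transform. Default: None.
* __value_act__ (_str_): Activation function for the value transform. Default: None.
* __has_attention_mask__ (_bool_): Whether to use an attention mask. Default: True.
* __attention_probs_dropout_prob__ (_float_): Dropout probability of the self-attention. Default: 0.0.
* __initializer_range__ (_float_): Initialization value of the truncated normal distribution used for the weights. Default: 0.02.
* __do_return_2d_tensor__ (_bool_): True returns a 2D tensor, False returns a 3D tensor. Default: False.
* __batch_size__ (_int_): Batch size of the dataset.
* __compute_type__ (_mindspore.dtype_): Compute data type of the attention. Default: mstype.float32.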

> __construct__ (_from_tensor: Tensor, to_tensor: Tensor, attention_mask: Tensor = None_)

Parameters
* __from_tensor__ (_mindspore.Tensor_): Usually the query (Q) of the attention, with shape (batch_size, from_seq_len, dim).
* __to_tensor__ (_mindspore.Tensor_): Usually the key and value of the attention (K = V), with shape (batch_size, to_seq_len, dim).
* __attention_mask__ (_Optional[mindspore.Tensor]_): Attention mask matrix (2D or 3D) whose values are 0/1 or True/False, with shape (from_seq_len, to_seq_len) or (batch_size, from_seq_len, to_seq_len). Default: None.

Returns
* __context_layer__ (_mindspore.Tensor_): Output of the attention layer.
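
As a rough illustration of how from_tensor, to_tensor and attention_mask interact, the following sketch computes scaled dot-product attention for one head and one batch item. It deliberately leaves out the learned query/key/value projections, the split into multiple heads, and dropout; the additive penalty of 10000 for masked positions is an assumption taken from common BERT-style code, not from this reference.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(from_tensor, to_tensor, attention_mask=None):
    """from_tensor: [from_seq_len][dim], to_tensor: [to_seq_len][dim]."""
    dim = len(from_tensor[0])
    context = []
    for i, query in enumerate(from_tensor):
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in to_tensor
        ]
        if attention_mask is not None:
            # Positions with mask value 0 get a large negative score,
            # so their softmax weight becomes (numerically) zero.
            scores = [
                s if attention_mask[i][j] else s - 10000.0
                for j, s in enumerate(scores)
            ]
        weights = softmax(scores)
        context.append([
            sum(w * value[d] for w, value in zip(weights, to_tensor))
            for d in range(dim)
        ])
    return context


from_tensor = [[1.0, 0.0], [0.0, 1.0]]            # from_seq_len = 2, dim = 2
to_tensor = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # to_seq_len = 3, K = V
attention_mask = [[1, 1, 0], [1, 1, 0]]           # third position is masked out

context_layer = single_head_attention(from_tensor, to_tensor, attention_mask)
print(context_layer)
```

Because the third position is masked, the large vector `[5.0, 5.0]` does not contribute, and each output row is a convex combination of the first two value vectors.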

> _class_ __mindtext.modules.encoder.roberta.BertEncoderCell__ _(batch_size: int,
                 hidden_size: int = 768,
                 seq_length: int = 512,
                 num_attention_heads: int = 12,
                 intermediate_size: int = 3072,
                 attention_probs_dropout_prob: float = 0.02,
                 initializer_range: float = 0.02,
                 hidden_dropout_prob: float = 0.1,
                 hidden_act: str = "gelu",
                 compute_type: mstype = mstype.float32)_


> __init__ (_batch_size: int,
                 hidden_size: int = 768,
                 seq_length: int = 512,
                 num_attention_heads: int = 12,
                 intermediate_size: int = 3072,
                 attention_probs_dropout_prob: float = 0.02,
                 initializer_range: float = 0.02,
                 hidden_dropout_prob: float = 0.1,
                 hidden_act: str = "gelu",
                 compute_type: mstype = mstype.float32_)

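Parameters
* __batch_size__ (_int_): Batch size of the dataset.
* __hidden_size__ (_int_): Hidden size of the BERT encoder layer. Default: 768.
* __seq_length__ (_int_): Length of the input. Default: 512.
* __num_attention_heads__ (_int_): Number of attention heads. Default: 12.
* __intermediate_size__ (_int_): Hidden dimension of the intermediate layer. Default: 3072.
* __hidden_act__ (_str_): Activation function used by BERT. Default: "gelu".
* __hidden_dropout_prob__ (_float_): Dropout probability for the hidden layers. Default: 0.1.
* __attention_probs_dropout_prob__ (_float_): Dropout probability of BertAttention. Default: 0.02.
* __initializer_range__ (_float_): Initialization value of the truncated normal distribution used for the weights. Default: 0.02.
* __compute_type__ (_mindspore.dtype_): Compute data type of the BERT model. Default: mstype.float32.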

> __construct__ (_input_tensor: Tensor, attention_mask: Tensor_)

Parameters
* __input_tensor__ (_mindspore.Tensor_): Input to the BertEncoderCell.
* __attention_mask__ (_mindspore.Tensor_): The attention mask.

Returns
* __output__ (_mindspore.Tensor_): Output of the encoder.

> _class_ __mindtext.modules.encoder.roberta.BertTransformer__ _(batch_size: int,
                 hidden_size: int,
                 seq_length: int,
                 num_hidden_layers: int,
                 num_attention_heads: int = 12,
                 intermediate_size: int = 3072,
                 attention_probs_dropout_prob: float = 0.1,
                 initializer_range: float = 0.02,
                 hidden_dropout_prob: float = 0.1,
                 hidden_act: str = "gelu",
                 compute_type: mstype = mstype.float32,
                 return_all_encoders: bool = False)_


> __init__ (_batch_size: int,
                 hidden_size: int,
                 seq_length: int,
                 num_hidden_layers: int,
                 num_attention_heads: int = 12,
                 intermediate_size: int = 3072,
                 attention_probs_dropout_prob: float = 0.1,
                 initializer_range: float = 0.02,
                 hidden_dropout_prob: float = 0.1,
                 hidden_act: str = "gelu",
                 compute_type: mstype = mstype.float32,
                 return_all_encoders: bool = False_)

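Parameters
* __batch_size__ (_int_): Batch size of the dataset.
* __hidden_size__ (_int_): Hidden size of the BERT encoder layers.
* __seq_length__ (_int_): Length of the input.
* __num_hidden_layers__ (_int_): Number of hidden layers in the encoder.
* __num_attention_heads__ (_int_): Number of attention heads. Default: 12.
* __intermediate_size__ (_int_): Hidden dimension of the intermediate layer. Default: 3072.
* __hidden_act__ (_str_): Activation function used by BERT. Default: "gelu".
* __hidden_dropout_prob__ (_float_): Dropout probability for the hidden layers. Default: 0.1.
* __attention_probs_dropout_prob__ (_float_): Dropout probability of BertAttention. Default: 0.1.
* __initializer_range__ (_float_): Initialization value of the truncated normal distribution used for the weights. Default: 0.02.
* __compute_type__ (_mindspore.dtype_): Compute data type of the BERT model. Default: mstype.float32.
* __return_all_encoders__ (_bool_): Whether to return the output of every encoder layer. Default: False.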

> __construct__ (_input_tensor: Tensor, attention_mask: Tensor_)

Parameters
* __input_tensor__ (_mindspore.Tensor_): Input to the BertEncoderCell layers.
* __attention_mask__ (_mindspore.Tensor_): The attention mask.

Returns
* __all_encoder_layers__ (_mindspore.Tensor_): Output of the encoder.
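
The effect of `return_all_encoders` can be sketched with a generic layer stack. The function below is a hypothetical illustration, not the mindtext implementation: it assumes that the layers are applied in sequence, that the result is a collection holding either every layer's output or only the last one, and it ignores the attention mask for brevity.

```python
def run_encoder_stack(input_tensor, layers, return_all_encoders=False):
    """Apply layers in order; collect every output or only the final one."""
    all_encoder_layers = []
    prev_output = input_tensor
    for layer in layers:
        prev_output = layer(prev_output)
        if return_all_encoders:
            all_encoder_layers.append(prev_output)
    if not return_all_encoders:
        all_encoder_layers.append(prev_output)
    return all_encoder_layers


# Toy "layers" that add a constant; real encoder cells would go here.
toy_layers = [lambda x, k=k: [v + k for v in x] for k in (1, 2, 3)]

last_only = run_encoder_stack([0.0], toy_layers)
every_layer = run_encoder_stack([0.0], toy_layers, return_all_encoders=True)
print(len(last_only), len(every_layer))
```

Under this assumption a caller that only needs the final hidden states would take the last element of the returned collection in either mode.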

> _class_ __mindtext.modules.encoder.roberta.SecurityCast__ _(src_type: str)_

Converts a string to an mstype.

> __init__ (_src_type: str_) 

Parameters
* __src_type__ (_str_): String name of the mstype.

Returns
* __desc_type__ (_mstype_): The converted output.

> _class_ __mindtext.modules.encoder.roberta.numbtpye2mstype__ _(dst_type: mstype = mstype.float32)_

Provides a safe cast.

> __init__ (_dst_type: mstype = mstype.float32_) 

Parameters
* __dst_type__ (_mstype_): Type of the output elements. Default: mstype.float32.

> __construct__ (_x: Tensor_)

Parameters
* __x__ (_mindspore.Tensor_): Input data.

Returns
* __output__ (_mindspore.Tensor_): The converted data.
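
This reference does not define what makes the cast "safe". One plausible reading, assumed here purely for illustration, is a saturating cast: values are clamped to the representable range of the destination type before conversion so that they do not overflow. The sketch below shows that idea for a float16 destination using only the standard library; the function names are hypothetical.

```python
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def saturate_to_float16_range(values):
    """Clamp each value into [-FLOAT16_MAX, FLOAT16_MAX]."""
    return [min(max(v, -FLOAT16_MAX), FLOAT16_MAX) for v in values]


def to_float16(value):
    """Round-trip a Python float through half precision ('e' struct format)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


raw = [1.0e10, -3.0e9, 0.5]
clamped = saturate_to_float16_range(raw)
as_half = [to_float16(v) for v in clamped]
print(as_half)
```

Without the clamping step, the first two values lie outside the float16 range and could not be converted to finite half-precision numbers.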

> _class_ __mindtext.modules.encoder.roberta.CreateAttentionMaskFromInputMask__ _(config: RobertaConfig)_

Creates an attention mask from the input mask.

> __init__ (_config: RobertaConfig_) 

Parameters
* __config__ (_RobertaConfig_): The RobertaConfig instance.

> __construct__ (_input_mask: Tensor_)

Parameters
* __input_mask__ (_mindspore.Tensor_): The input mask.

Returns
* __attention_mask__ (_mindspore.Tensor_): The created attention mask.
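
One common way to derive an attention mask from a padding mask is to repeat the per-token mask for every query position. The sketch below assumes an input mask of shape (batch_size, seq_length) and produces a (batch_size, seq_length, seq_length) mask; the exact output shape used by mindtext is not stated in this reference, so treat both the shape and the helper name as assumptions.

```python
def create_attention_mask(input_mask):
    """input_mask: [batch_size][seq_length] of 0/1 values.

    Returns [batch_size][seq_length][seq_length], where entry [b][i][j]
    tells whether query position i may attend to key position j.
    """
    return [[list(row) for _ in row] for row in input_mask]


input_mask = [[1, 1, 0]]  # one sequence whose last token is padding
attention_mask = create_attention_mask(input_mask)
print(attention_mask)
```

Every query row is identical here because padding only restricts which key positions are visible.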

> _class_ __mindtext.modules.encoder.roberta.RobertaModel__ _(config: BertConfig, is_training: bool, use_one_hot_embeddings: bool = False)_

> __init__ (_config: BertConfig, is_training: bool, use_one_hot_embeddings: bool = False_) 

Parameters
* __config__ (_RobertaConfig_): The RobertaConfig instance.
* __is_training__ (_bool_): True for training, False for evaluation.
* __use_one_hot_embeddings__ (_bool_): Whether to use the one-hot encoding form. Default: False.

> __construct__ (_input_ids: Tensor, token_type_ids: Tensor, input_mask: Tensor_)

Parameters
* __input_ids__ (_mindspore.Tensor_): Tensor holding the ids of the input tokens.
* __token_type_ids__ (_mindspore.Tensor_): Tensor holding the segment ids.
* __input_mask__ (_mindspore.Tensor_): The input mask.

Returns
* __sequence_output__ (_mindspore.Tensor_): The sequence output.
* __pooled_output__ (_mindspore.Tensor_): Output for the first token (CLS).
* __embedding_table__ (_mindspore.Tensor_): The updated lookup table.