# mindtext.modules.tokenlizer.tokenization_transformer

> __convert_to_printable__ (_text_)

Converts `text` to a printable format.

> __convert_to_unicode__ (_text_)

Converts `text` to Unicode.
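The conversion can be pictured with a minimal sketch. It is not the library's code: the name `convert_to_unicode_sketch` is hypothetical, and UTF-8 decoding of `bytes` input is an assumption.

```python
def convert_to_unicode_sketch(text):
    """Return `text` as a str, decoding bytes as UTF-8 (assumed encoding)."""
    if isinstance(text, str):
        return text
    if isinstance(text, bytes):
        # Undecodable bytes are dropped rather than raising an error.
        return text.decode("utf-8", "ignore")
    raise ValueError(f"Unsupported input type: {type(text)}")


already_text = convert_to_unicode_sketch("hello")
decoded_text = convert_to_unicode_sketch("héllo".encode("utf-8"))
```

Dropping undecodable bytes is one possible policy; a stricter variant would let the decode error propagate.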

> __load_vocab_file__ (_vocab_file_)

Loads a vocabulary file and converts it into a `{token: id}` dictionary.
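A minimal sketch of such a loader follows. The file layout is an assumption (one token per line, with the zero-based line index used as the id), and `load_vocab_file_sketch` is a hypothetical name rather than the function documented here.

```python
import collections
import os
import tempfile


def load_vocab_file_sketch(vocab_file):
    """Read one token per line and map each token to its line index."""
    vocab = collections.OrderedDict()
    with open(vocab_file, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            # Assumption: surrounding whitespace is not part of the token.
            vocab[line.strip()] = index
    return vocab


# Usage: write a tiny vocabulary to a temporary file and load it back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as handle:
    handle.write("[PAD]\n[UNK]\nhello\nworld\n")
    vocab_path = handle.name

vocab = load_vocab_file_sketch(vocab_path)
os.remove(vocab_path)
```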

> __convert_by_vocab_dict__ (_vocab_file_)

Converts a sequence of [tokens|ids] according to the vocabulary dictionary.
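The lookup works in either direction, depending on which dictionary is supplied. The sketch below assumes a two-argument form (a mapping plus the sequence to convert); the documented signature lists only `vocab_file`, so the parameter names and the name `convert_by_vocab_dict_sketch` are hypothetical.

```python
def convert_by_vocab_dict_sketch(vocab_dict, items):
    """Look up every item of `items` in `vocab_dict`, preserving order."""
    return [vocab_dict[item] for item in items]


token_to_id = {"[UNK]": 0, "hello": 1, "world": 2}
# Inverting the mapping gives the id -> token direction.
id_to_token = {index: token for token, index in token_to_id.items()}

ids = convert_by_vocab_dict_sketch(token_to_id, ["hello", "world"])
tokens = convert_by_vocab_dict_sketch(id_to_token, ids)
```

In this sketch an item missing from the dictionary raises `KeyError`; how the actual function treats unknown items is not stated here.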

> _class_ __mindtext.modules.tokenlizer.tokenization_roberta.WhiteSpaceTokenizer__ _(vocab_file)_

> __init__ (_vocab_file_)

Parameters
* __vocab_file__ (_path_): Path to the vocabulary file

> __tokenize__ (_text_)

Tokenizes the text.

> __convert_tokens_to_ids__ (_text_)

Converts tokens to their corresponding indices.

> __convert_ids_to_tokens__ (_text_)

Converts indices back to tokens.
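The three methods fit together as a round trip from text to ids and back. The class below is a hypothetical stand-in, not the documented `WhiteSpaceTokenizer`: it takes an in-memory dictionary instead of `vocab_file` so that it runs on its own, and it assumes tokenization is a plain split on whitespace.

```python
class WhiteSpaceTokenizerSketch:
    """Hypothetical whitespace tokenizer backed by a {token: id} dict."""

    def __init__(self, vocab):
        self.vocab = dict(vocab)
        # Reverse mapping used to turn ids back into tokens.
        self.inv_vocab = {index: token for token, index in self.vocab.items()}

    def tokenize(self, text):
        # str.split() with no arguments splits on any run of whitespace.
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        return [self.vocab[token] for token in tokens]

    def convert_ids_to_tokens(self, ids):
        return [self.inv_vocab[index] for index in ids]


tokenizer = WhiteSpaceTokenizerSketch({"hello": 0, "world": 1})
tokens = tokenizer.tokenize("hello   world")
ids = tokenizer.convert_tokens_to_ids(tokens)
round_trip = tokenizer.convert_ids_to_tokens(ids)
```

Handling of out-of-vocabulary tokens is left out of the sketch because the behavior of the real class is not described here.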