Fishery standard table information extraction method based on rule matching and deep learning AbTransformer

SUN Zhetao, YU Hong*, SONG Qishu, LI Guangyu, SHAO Liming, YANG Huining, ZHANG Sijia, SUN Hua

Journal of Dalian Ocean University ›› 2023, Vol. 38 ›› Issue (1): 140-148.

DOI: 10.16535/j.cnki.dlhyxb.2022-305

Abstract

To address the poor extraction performance caused by diverse table structures and unfixed header positions in fishery standard texts, a table information extraction method combining rule-based matching (RBM) with the Absolute Transformer (AbTransformer) deep learning model is proposed. For regular tables, information is extracted with rule templates and a BERT-BiLSTM-CRF model. For irregular tables, an improved Transformer is used: row position encoding is introduced into the position encoding module and concatenated with the feature vector to capture the row and column positions of table cells. Combining the two completes the extraction of standard table information. The results showed that the AUC of the proposed AbTransformer model was 1.46% higher than that of the machine learning MLP model and 1.18% higher than that of the TabTransformer model. Compared with the AbTransformer model alone, the RBM-AbTransformer model improved accuracy, recall, and F1 score by 7.78%, 4.19%, and 5.27%, respectively. The findings indicate that combining RBM with AbTransformer effectively handles diverse table structures and unfixed header positions and improves the overall performance of information extraction from fishery standard tables.
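The idea of concatenating position encodings with cell features can be illustrated with a minimal sketch. The abstract does not specify the exact encoding, so this example assumes the standard sinusoidal encoding from the original Transformer, applied separately to the row and column index of each cell; the function names `sinusoidal_encoding`, `encode_cell`, and `encode_table` and the `pos_dim` parameter are hypothetical and are not taken from the paper.

```python
import math


def sinusoidal_encoding(index, dim):
    """Sinusoidal encoding of one absolute index (assumed scheme; dim must be even)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    enc = []
    for i in range(0, dim, 2):
        # Same frequency schedule as the original Transformer: 10000^(i/dim)
        angle = index / (10000 ** (i / dim))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc


def encode_cell(features, row, col, pos_dim=4):
    """Concatenate a cell's feature vector with its row and column encodings."""
    return (
        list(features)
        + sinusoidal_encoding(row, pos_dim)
        + sinusoidal_encoding(col, pos_dim)
    )


def encode_table(table_features, pos_dim=4):
    """Apply encode_cell to every cell of a table given as rows of feature vectors."""
    return [
        [encode_cell(cell, r, c, pos_dim) for c, cell in enumerate(row)]
        for r, row in enumerate(table_features)
    ]


# Toy 2x2 table in which every cell has the same 2-dimensional feature vector.
table = [[[0.5, 0.1], [0.5, 0.1]], [[0.5, 0.1], [0.5, 0.1]]]
encoded = encode_table(table)
```

Because the position encodings are concatenated rather than added, cells with identical content but different rows or columns receive different input vectors, which is one plausible way for a model to locate headers that do not sit in a fixed position.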

Keywords

fishery standard / entity recognition / table extraction / deep learning / Transformer model

Cite this article

SUN Zhetao, YU Hong, SONG Qishu, LI Guangyu, SHAO Liming, YANG Huining, ZHANG Sijia, SUN Hua. Fishery standard table information extraction method based on rule matching and deep learning AbTransformer[J]. Journal of Dalian Fisheries University, 2023, 38(1): 140-148 https://doi.org/10.16535/j.cnki.dlhyxb.2022-305
CLC number: S 932.2; TP 391

Funding

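Open Project of the Key Laboratory of Environment Controlled Aquaculture (Dalian Ocean University), Ministry of Education (2021-MOEKLECA-KF-05); National Natural Science Foundation of China (61802046)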
