Fishery standard table information extraction method based on rule matching and deep learning AbTransformer
SUN Zhetao, YU Hong*, SONG Qishu, LI Guangyu, SHAO Liming, YANG Huining, ZHANG Sijia, SUN Hua
1. Key Laboratory of Marine Information Technology of Liaoning Province, College of Information Engineering, Dalian Ocean University, Dalian 116023, China; 2. Key Laboratory of Environment Controlled Aquaculture (Dalian Ocean University), Ministry of Education, Dalian 116023, China
Abstract: Table structures in fishery standard texts are diverse and header positions are not fixed, which degrades extraction quality. To address this, a table information extraction method is proposed that combines rule-based matching (RBM) with an Absolute Transformer (AbTransformer). Rule templates and a BERT-BiLSTM-CRF model are used to extract information from regular tables. For irregular tables, the Transformer is improved by introducing row position encoding into the position encoding module and concatenating it with the feature vector, so that both the row and column positions of each table cell are available to the model. Combining the two components completes the extraction of standard table information. The results show that the proposed AbTransformer model achieves an AUC 1.46% higher than that of the MLP model and 1.18% higher than that of the TabTransformer model. Compared with AbTransformer alone, the RBM-AbTransformer method achieves 7.78% higher accuracy, 4.19% higher recall, and a 5.27% higher F1 score. These findings indicate that combining RBM with AbTransformer effectively handles diverse table structures and unfixed header positions and improves the overall quality of information extraction from fishery standard tables.
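The idea of attaching row position information to each cell's feature vector can be illustrated with a minimal sketch. This is not the authors' implementation: the sinusoidal form of the encoding, the function names `sinusoidal_encoding` and `encode_table`, the parameter `pos_dim`, and the choice to append a column encoding alongside the row encoding are all assumptions made for illustration.

```python
import math


def sinusoidal_encoding(position, dim):
    """Sinusoidal absolute encoding of one integer position (assumed form)."""
    enc = []
    for i in range(dim):
        # Each sin/cos pair shares one frequency, as in the original Transformer.
        angle = position / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def encode_table(cell_features, pos_dim=4):
    """Concatenate row and column position encodings onto each cell's features.

    cell_features[r][c] is the feature vector (list of floats) of the cell
    at row r, column c. Returns one token vector per cell, in row-major order:
    features + row encoding + column encoding.
    """
    tokens = []
    for r, row in enumerate(cell_features):
        for c, feats in enumerate(row):
            row_enc = sinusoidal_encoding(r, pos_dim)  # hypothetical row position encoding
            col_enc = sinusoidal_encoding(c, pos_dim)  # hypothetical column position encoding
            tokens.append(list(feats) + row_enc + col_enc)
    return tokens


# Usage: a 2 x 2 table whose cells each carry a 2-dimensional feature vector.
table = [[[1.0, 2.0], [3.0, 4.0]],
         [[5.0, 6.0], [7.0, 8.0]]]
tokens = encode_table(table, pos_dim=4)
```

With this layout, cells in the same row share the row segment of their token vectors and cells in the same column share the column segment, so a downstream Transformer can, in principle, tell a header cell in the first column from one in the first row.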
SUN Zhetao, YU Hong, SONG Qishu, LI Guangyu, SHAO Liming, YANG Huining, ZHANG Sijia, SUN Hua. Fishery standard table information extraction method based on rule matching and deep learning AbTransformer. Journal of Dalian Ocean University, 2023, 38(1): 140-148.