Journal of Dalian Ocean University 2023, Vol. 38 Issue (1): 140-148 DOI: 10.16535/j.cnki.dlhyxb.2022-305
Fishery standard table information extraction method based on rule matching and deep learning AbTransformer
SUN Zhetao, YU Hong*, SONG Qishu, LI Guangyu, SHAO Liming, YANG Huining, ZHANG Sijia, SUN Hua
1. Key Laboratory of Marine Information Technology of Liaoning Province, College of Information Engineering, Dalian Ocean University, Dalian 116023, China; 2. Key Laboratory of Environment Controlled Aquaculture (Dalian Ocean University), Ministry of Education, Dalian 116023, China
Abstract To address the poor extraction performance caused by diverse table structures and unfixed header positions in fishery standard texts, a table information extraction method is proposed that combines rule-based matching (RBM) with the Absolute Transformer (AbTransformer). Rule templates and a BERT-BiLSTM-CRF model are used to extract information from regular tables. For irregular tables, the Transformer is improved by introducing row position encoding into the positional encoding module and concatenating it with the feature vector, so that the row and column positions in the table are available to the model. Combining the two approaches completes the extraction of information from standard tables. The results showed that the AUC of the proposed AbTransformer model was 1.46% higher than that of the machine learning MLP model and 1.18% higher than that of the TabTransformer model. Compared with the AbTransformer method alone, the RBM-AbTransformer method achieved 7.78% higher accuracy, 4.19% higher recall and 5.27% higher F1 score. These findings indicate that combining RBM and AbTransformer effectively addresses the problems of diverse table structures and unfixed header positions, and improves the overall performance of information extraction from fishery standard tables.
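The idea of concatenating row and column position encodings with a cell's feature vector can be illustrated with a minimal sketch. The abstract does not specify the encoding function or its dimensions, so the sinusoidal form, the `pos_dim` value and the function names `sinusoidal_encoding` and `encode_cell` below are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def sinusoidal_encoding(position, dim):
    """Sinusoidal encoding of one integer position (assumed form; dim must be even)."""
    enc = []
    for i in range(0, dim, 2):
        angle = position / (10000 ** (i / dim))
        enc.append(math.sin(angle))  # even index: sine
        enc.append(math.cos(angle))  # odd index: cosine
    return enc


def encode_cell(features, row, col, pos_dim=4):
    """Concatenate a cell's feature vector with its row and column encodings.

    Hypothetical helper: the result is features + row encoding + column encoding.
    """
    return (
        list(features)
        + sinusoidal_encoding(row, pos_dim)
        + sinusoidal_encoding(col, pos_dim)
    )


# Two cells with identical content features but different table positions
# receive different input vectors.
header_cell = encode_cell([0.5, 0.1], row=0, col=2)
body_cell = encode_cell([0.5, 0.1], row=3, col=2)
print(len(header_cell), header_cell != body_cell)
```

Under these assumptions, a header that appears in an unexpected row still carries an explicit row signal, which is the kind of positional information the abstract attributes to AbTransformer.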
Published: 02 March 2023