ANALYSIS OF ENERGY SAVING AND EMISSION REDUCTION OF SECONDARY FIBER MILL BASED ON DATA MINING

• Apply the association rule algorithm to the field of pulping.
• Optimize the process parameters of the whole waste paper pulping process.
• The qualified rate of finished pulp was increased by 6.93%.
• Reduce costs and carbon emissions of secondary fiber plants.

Abstract. Waste paper recycling is an important way to achieve environmentally friendly development of the papermaking industry. The quality of the pulp affects the pulp sales of secondary fiber paper mills. Waste paper pulp quality can be adjusted by controlling the working conditions of the pulping process, but these working conditions involve many parameters that are coupled with each other, which makes them difficult to control. In order to find the best working conditions and improve the quality of the pulp, this study uses the association rules algorithm to optimize the parameters of the waste paper pulping process. These parameters are power of refiner, waste paper concentration of refiner, the volume of slurry that enters the deinking process, deinking agent amount, deinking time, deinking temperature, bleaching agent amount, bleaching time, and bleaching temperature. The test results show that the qualified rate of the pulp produced under the improved working conditions is 92.56%, an increase of 6.93%, and the average electricity consumption per ton of pulp is reduced by 5.76 kWh/t. In addition to potential economic benefits, this method can reduce carbon emissions.


Introduction
China has been the world's largest producer and consumer of paper and board. In 2017, China's paper and board output reached 111.3 million tons, of which 88.7 million tons were recycled paper. At the same time, during 2008-2017, the consumption of recycled pulp increased from 44.39 million tons to 63.02 million tons, accounting for about 65% of the total pulp consumption (China Paper Association [CPA], 2018). As an important papermaking raw material, waste paper is playing an increasingly important role in the papermaking industry (Li et al., 2020). However, due to the wide range of waste paper sources, many types, and uneven quality, the raw material properties of different batches of waste paper vary. In the production process, different types of waste paper are mixed to make deinked pulp (DIP). DIP properties, especially the brightness, are particularly valued by companies. Some researchers have used process optimization to improve the brightness of DIP. Veluchamy and Kalamdhad (2017) used hot air to pretreat pulp to improve deinking efficiency. Vashisth et al. (2011) improved the flotation process. Saini et al. (2020) used biological enzyme technology to improve the brightness of pulp. In addition to the deinking process, the bleaching process is also a key process for improving the brightness of the pulp. The development of new bleaching chemicals is the mainstream approach to optimizing bleaching processes, such as ozone (Kaur et al., 2019), peroxyacetic acid and chlorine dioxide (Axegård, 2019). The above studies have focused on process optimization and the creation of new processes, all of which require mills to invest a lot of money. Mills hope instead to improve the quality of pulp by adjusting existing pulping parameters.
DIP is a non-uniform substance. Many factors can affect the DIP properties, such as the types of fiber raw material, pulping methods, printing technology, number of fiber recycling cycles, storage duration and preservation conditions (Okwonna, 2013). There is a nonlinear relationship between the above factors and DIP properties (Hossein et al., 2015). Therefore, in a secondary fiber paper mill, the adjustment of the pulping process mainly depends on manual experience. This leads to inappropriate consumption of raw materials and energy (Chakraborty et al., 2019). Some researchers have used pulping process parameters to predict pulp quality through machine learning methods. For example, Iglesias et al. (2017) established a Kappa value prediction model, Zhou et al. (2016) proposed a DOF (Degrees of freedom) prediction model, and Li et al. (2017) proposed a pulp quality prediction model. The above prediction models indicate that there is a nonlinear relationship between pulp quality and process parameters, so it is possible to control the quality of the pulp by adjusting existing pulping parameters. Liu et al. (2019) improved pulp quality by improving the ratio of waste paper pulp. Tsatsis et al. (2019) increased the time, temperature and concentration during pulping to improve the brightness of the pulp. Danielewicz and Surma-Ślusarska (2019) studied the effect of the beating process on fiber morphology. Kumar et al. (2019) optimized the pretreatment time and temperature of the biological enzyme to improve the pulp brightness. The above parameter optimization research focuses on a single process stage. Pulping is a continuous process, yet optimization of multiple parameters across the whole process is lacking. In addition, these studies only optimized the quality of the pulp, without analyzing the cost of the production process.
This study optimizes multiple parameters throughout the pulping process. This problem can be solved using data mining algorithms. Association rule mining is a common data mining technique, and the Apriori algorithm and the FP-growth algorithm are the two main association rule algorithms. In order to improve the efficiency and interpretability of rule mining, Wang et al. (2018) constructed a frequent itemset tree in parallel to mine frequent itemsets without generating candidate sets. Han et al. (2000) proposed the classic FP-growth algorithm for frequent pattern detection and association rule mining. The production process of the pulp industry has accumulated a lot of historical data, and the association rules algorithm can mine the relationship between pulp quality and production parameters from this large amount of historical data.
Based on the previous research and the actual situation in pulp mills, the association rules algorithm is used in this work. This work applies the association rule algorithm to the pulp industry for the first time. Through the mining of historical production data, a more scientific parameter optimization scheme is obtained. The pulp quality, pulp cost and carbon emissions after parameter optimization are then analyzed and discussed.

Case studies and data collection
In the production process, different types of waste paper are mixed to make deinked pulp (DIP), and the DIP properties (such as brightness, beating degree, and tensile index) mainly depend on the pulping process working condition.
The field data, comprising the working condition data of the pulping process and the DIP property data for the year 2018, were acquired from a secondary fiber paper mill. The flow diagram of the mill's DIP pulping process is shown in Figure 1.
Figure 1. Flow diagram of DIP pulping process in the paper mill
According to the testing frequency in Table 1, DIP brightness was the property of greatest concern in the paper mill. There are many parameters in the pulping process, and because they are coupled with each other, they are difficult to control. For parameter selection, the variables that affect DIP brightness were mainly considered. This study uses correlation analysis to select the parameters that have a greater impact on brightness. The correlation coefficient is calculated as in Eq. (1):

$$r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X s_Y} \tag{1}$$

where cov(X, Y) is the covariance between the variables X and Y, and s_X and s_Y are the standard deviations of X and Y. Parameters whose correlation coefficient with brightness has an absolute value greater than 0.6 were selected. In the end, the parameters we need to optimize are power of refiner (PR), waste paper concentration of refiner (WPCR), the volume of slurry that enters the deinking process (VSED), deinking agent amount (DAA), deinking time (DT), deinking temperature (DTemp), bleaching agent amount (BAA), bleaching time (BT), and bleaching temperature (BTemp).
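The selection step can be illustrated with a minimal sketch. It assumes the sample (n - 1) form of the covariance and standard deviation; the helper names `pearson_r` and `select_parameters` and the toy data are hypothetical and are not taken from the mill data set.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation: cov(X, Y) / (s_X * s_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)

def select_parameters(columns, brightness, threshold=0.6):
    """Keep parameters whose |r| with brightness exceeds the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson_r(values, brightness)) > threshold]

# Hypothetical toy data, not mill measurements.
brightness = [60.0, 61.0, 62.0, 63.0, 64.0]
columns = {
    "PR": [230.0, 233.0, 238.0, 241.0, 247.0],  # rises with brightness
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],         # only weakly related
}
selected = select_parameters(columns, brightness)
```

With this toy data only the column that tracks brightness closely passes the 0.6 threshold.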
According to the standards established by the pulp mill, this study marked the qualified pulp as 0, and marked the unqualified pulp as -1.

Methodology
In this paper, the DIP process is controlled so that the brightness of the recycled waste paper pulp meets the requirement. The workflow of this study is shown in Figure 2.
In this study, all the applied algorithms are coded in Python, and the optimization of parameters and specific implementation methods are explained.

Association rules
Assume that I is the set of all items in a data set. Each item set (T) is a subset of I, and the set of all item sets is the input data set (D). For the data set D, N represents the total number of transactions that D contains. The number of items contained in an item set T is called the dimension of the transaction. If an item set contains k items, it is called a k-item set.
Two indicators, support and confidence, determine whether an association rule is valid. The support is the proportion of transactions in the data set D in which X and Y occur together, and its expression is support (X⇒Y) = P (X∪Y). Frequent item sets can be obtained through the support. The confidence describes the probability of Y appearing given that X appears, which measures the credibility of an association rule. Its expression is confidence (X⇒Y) = P (Y | X). In order to filter out rules with a certain degree of support and confidence among many association rules, the user needs to set a minimum support (minsupp) and a minimum confidence (minconf) in advance. An item set X⊆I is considered a frequent item set if support (X) is greater than or equal to the minimum support. Association rules that satisfy both the minimum support and the minimum confidence are called strong association rules. In practical applications, minsupp and minconf need to be set according to the specific problem.
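As an illustration of the two indicators, the following minimal sketch computes support and confidence over a handful of transactions. The function names and the item labels (such as "PR=high") are hypothetical and only mimic what discretized working-condition records might look like.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule antecedent => consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical discretized records; each transaction is a set of items.
transactions = [
    {"PR=high", "BT=low", "qualified"},
    {"PR=high", "BT=low", "qualified"},
    {"PR=high", "BT=high", "unqualified"},
    {"PR=low", "BT=low", "unqualified"},
]

rule_support = support(transactions, {"PR=high", "qualified"})
rule_confidence = confidence(transactions, {"PR=high"}, {"qualified"})
```

Here the rule "PR=high ⇒ qualified" holds in 2 of 4 transactions, and in 2 of the 3 transactions that contain "PR=high".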

FP-growth algorithm
Assume that I is the set of all items in a data set. Each transaction (also called an item set) T is a subset of I, and the set of all transactions is the input data set (D) (take the data set in Table S1 in the Supporting Information as an example).
The algorithm flow is as follows:
1. The min support is set to 3. The first scan of the data set yields the frequent 1-item sets and their support counts; these are arranged in descending order of support count to obtain Table L.
2. Create an FP-tree: create a root node and mark it as "null". Scan the database a second time, sort the items of each transaction in the order of Table L, and create a path starting from the null root node. Taking transaction TID 1 as an example, a path of five nodes is created, with item indices 3, 4, 5, 1, 2 in that order and each node count set to 1.
3. Get the paths of all transactions. When a path being created passes through an existing node, the count of that node is incremented by 1 (as shown in Figure 3).
4. Mine the FP-tree recursively from the bottom up according to the set minimum support, linking the suffixes to generate the final frequent item sets.
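The four steps above can be sketched in a compact form. This is an illustrative implementation under stated assumptions, not the code used in the study: the class and function names are hypothetical, ties in the support count are broken alphabetically, and the toy transactions stand in for Table S1, which is not reproduced here.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_support):
    """Steps 1-3: count items, order them (Table L), insert each path."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    ordered = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(ordered)}
    root = Node(None, None)            # the "null" root
    header = defaultdict(list)         # item -> all nodes carrying that item
    for items, weight in weighted_transactions:
        node = root
        for item in sorted((i for i in set(items) if i in rank), key=rank.get):
            if item not in node.children:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += weight       # shared prefix: increment existing node
    return header, frequent

def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Step 4: mine recursively, growing the suffix one item at a time."""
    header, frequent = build_tree(weighted_transactions, min_support)
    result = {}
    for item, nodes in header.items():
        new_suffix = suffix | {item}
        result[new_suffix] = frequent[item]
        conditional = []               # conditional pattern base of this item
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        result.update(fp_growth(conditional, min_support, new_suffix))
    return result

# Hypothetical toy transactions, each with weight 1.
toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
frequent_sets = fp_growth([(t, 1) for t in toy], min_support=3)
```

The result maps each frequent item set (a `frozenset`) to its support count; no candidate sets are generated along the way.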

Apriori algorithm
The process by which the Apriori algorithm obtains frequent item sets is as follows:
(1) Scan the data set RH and add each item that appears for the first time to the candidate set CP_1, setting its number of occurrences to 1. If the item appears again later, increase its number of occurrences by 1. This yields the candidate 1-item set CP_1. Deleting the items in CP_1 that occur fewer than min support times gives the frequent 1-item set FP_1.
(2) Connection step: to find the frequent 2-item set FP_2, a self-connection operation is performed on FP_1. Assume that there are n frequent items in FP_1; the self-connection then produces the candidate 2-item set CP_2 with n(n - 1)/2 candidates.
(3) Pruning step: first filter out the transactions that cannot generate frequent candidate sets, then scan the transaction database RH. Count the number of occurrences of each item set in the candidate 2-item set CP_2 and determine whether it is not less than min support. If so, the item set joins FP_2; if not, it is deleted. When all item sets in CP_2 have been processed, the frequent 2-item set FP_2 is obtained.
(4) The frequent 2-item set FP_2 is self-connected to generate the candidate 3-item set CP_3, and a pruning operation is then performed to generate the frequent 3-item set FP_3. In general, FP_{k-1} is self-connected to obtain the candidate k-item set CP_k, and the frequent k-item set FP_k is obtained through pruning. If CP_k is empty, the algorithm terminates, and the union of FP_1 to FP_k is taken as the final frequent item sets RL.
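The connection and pruning loop can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function name and toy transactions are hypothetical, and the pruning here uses the standard subset check (every (k-1)-subset of a candidate must itself be frequent) before counting.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent item set: support count} for an absolute min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Step (1): count single items, keep those reaching min_support (FP_1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Connection step: self-join FP_{k-1} to build candidates CP_k.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        # Pruning step: drop candidates with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        # Scan the data set to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy transactions.
toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
frequent_sets = apriori(toy, min_support=3)
```

Each pass over the loop rescans the whole data set, which is the main reason Apriori tends to be slower than a tree-based method on large histories.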

Data discretization
In the industrial production process, there are a large number of continuous variables, such as environmental temperature and humidity, pressure, flow rate, liquid level, etc. There are also a large number of discrete variables, such as the start and stop status of the equipment and the switching status of the valve.
The working condition data in the papermaking industry is generally continuous, but the association rule algorithm can only mine discrete data. Therefore, continuous data need to be discretized before association rule mining. The discretization of continuous data is a qualitative analysis method. A continuous variable fluctuates within a certain range. Based on an evaluation method, several division points can be set, which divide the range of the continuous data into different sub-intervals. Finally, the sub-intervals are described with specific symbols or integer values, which completes the discretization of the continuous data.
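In this paper, a continuous data discretization method based on the 3σ principle is used to discretize the selected research variables. The continuous data is classified into sub-intervals bounded by the mean u and multiples of the standard deviation s, such as (u - s, u], (u, u + s], and (u + s, u + 2s]. Partial data before and after discretization are shown in Tables S2 and S3.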
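A minimal sketch of such a discretization is given below. It assumes six right-closed bins of width s between u - 3s and u + 3s, with values outside that range folded into the outermost bins; the exact binning used in the study is not stated, so the bin count, the clipping rule, and the function names are assumptions.

```python
import math
import statistics

def bin_index(value, u, s):
    """Return k such that value lies in (u + (k - 1) * s, u + k * s].

    k is clipped to -2..3, so values beyond u - 3s or u + 3s fall
    into the outermost bins (an assumption of this sketch).
    """
    k = math.ceil((value - u) / s)
    return max(-2, min(3, k))

def bin_label(k):
    """Readable interval label for bin k, e.g. k = 2 -> '(u+1s, u+2s]'."""
    return f"(u{k - 1:+d}s, u{k:+d}s]"

def discretize(values):
    """Discretize a non-constant series using its own mean and sample std."""
    u = statistics.mean(values)
    s = statistics.stdev(values)
    return [bin_index(v, u, s) for v in values]

# Example with u = 230 and s = 17, the values implied for refiner power
# by the reported interval (u, u + s] = 230~247 kW.
pr_bin = bin_index(240.0, 230.0, 17.0)   # falls in (u, u + s]
toy_bins = discretize([1, 2, 3, 4, 5])   # hypothetical toy series
```

Integer bin indices like these can then be turned into items (for example "PR=1") before being passed to an association rule algorithm.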

Best frequent set
After data discretization, the data can be input into the association rules algorithm to find potential patterns. Different minimum support settings result in different frequent sets. The relationship between the min support setting and the number of frequent sets is shown in Figure 4. The goal of our research is to find a 10-item frequent set. The greater the minimum support, the more often the corresponding working condition occurs in the historical data.
When the minimum support is set to 1400, the frequent sets that appear are related to unqualified working conditions. Therefore, in order to also capture qualified working conditions, the minimum support was set to 1200 in this study. The two 10-item frequent sets obtained are shown in Table 2(a) and (b).
As shown in Table 2(a) and (b), the two frequent sets correspond to unqualified and qualified working conditions, respectively, and they differ clearly. Here u and s denote the mean and standard deviation of the corresponding parameter. To ensure that the pulp brightness is qualified, the following production conditions should be controlled: PR within (u, u + s], that is, 230~247 kW; WPCR within (u, u + s], that is, 4.98~5.05%; VSED within (u, u + s], that is, 66.38~71.37 m³/h; DAA within (u, u + s], that is, 3.34~3.53 t/h; DT within (u, u + s], that is, 1084.62~1124.97 s; DTemp within (u, u + s], that is, 56.8~61.3 °C; BAA within (u + s, u + 2s], that is, 1.42~1.54 t/h; BT within (u - s, u], that is, 888.41~928.22 s; and BTemp within (u, u + s], that is, 35.8~37.3 °C.
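For illustration, the qualified ranges listed above can be encoded as a lookup table and used to check a set of operating set-points. The function name `out_of_range` and the example set-points are hypothetical; only the interval bounds come from the text.

```python
# Qualified operating ranges; each interval is left-open and
# right-closed, i.e. (low, high].
QUALIFIED_RANGES = {
    "PR": (230.0, 247.0),       # kW
    "WPCR": (4.98, 5.05),       # %
    "VSED": (66.38, 71.37),     # m3/h
    "DAA": (3.34, 3.53),        # t/h
    "DT": (1084.62, 1124.97),   # s
    "DTemp": (56.8, 61.3),      # deg C
    "BAA": (1.42, 1.54),        # t/h
    "BT": (888.41, 928.22),     # s
    "BTemp": (35.8, 37.3),      # deg C
}

def out_of_range(setpoints):
    """Return the names of parameters that fall outside their qualified range."""
    return [name for name, (low, high) in QUALIFIED_RANGES.items()
            if not (low < setpoints[name] <= high)]

# Hypothetical set-points chosen inside every range.
example = {
    "PR": 240.0, "WPCR": 5.0, "VSED": 70.0, "DAA": 3.4, "DT": 1100.0,
    "DTemp": 60.0, "BAA": 1.5, "BT": 900.0, "BTemp": 36.0,
}
violations = out_of_range(example)
```

An empty list means every parameter sits inside the range associated with qualified pulp.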
The running time of the algorithm also matters. The running times of the Apriori algorithm and the FP-growth algorithm under different min support settings are compared in Figure 5. The two algorithms produce the same results, but the running time of the FP-growth algorithm is significantly lower than that of the Apriori algorithm. In terms of efficiency, the FP-growth algorithm therefore has an obvious advantage and is more suitable for data mining in secondary fiber mills.

Parameter optimization results
In addition to the DIP properties, secondary fiber plants also focus on the cost of the pulping process. In the pulping process, the main energy consumption is electricity. Figure 6 shows that the electricity consumption per ton of pulp fluctuates greatly and that there is still potential for cost savings.
In section 4.1, we have found the best parameter range for producing qualified pulp. Within the above range, this study further optimized the parameters to reduce the power consumption. This section used correlation analysis to find the relationship between the parameters and the electricity cost per ton of pulp.
As shown in Figure 7, the three variables most highly correlated with electricity consumption are PR, DT, and BT. In order to reduce the electricity consumption per ton of pulp, PR, DT and BT should be set as low as possible within the allowable range. According to the working condition ranges mined by FP-growth, the refiner power should be set to 230 kW, the deinking time to 1084.62 s, and the bleaching time to 888.41 s. The remaining parameters should be controlled within the ranges in Table 2.

Results and discussion
After parameter optimization, the pulping mill ran production for 3 days. The brightness data and electricity data were collected, and a total of 4320 sets of sample data were obtained. Figure 8 shows that before parameter optimization, the product qualification rate was 85.63%. For this pulp mill, this qualification rate can meet the normal production requirements. However, in order to prevent unqualified pulp from flowing into the papermaking process, it is necessary to further improve the pulp qualification rate. After parameter optimization, the qualified rate of the pulp is 92.88%, an increase of 7.25 percentage points.

Economic performance
In this section, the economic performance before and after parameter optimization is compared to assess whether the optimized pulping process is practical.
In 2018, the average electricity consumption per ton of pulp for this pulp mill was 586.43 kWh/t. After parameter optimization, the average electricity consumption per ton of pulp is 580.67 kWh/t. The industrial electricity price in the area where this pulp mill is located is 1.0765 yuan/kWh, so the pulp mill can save 6.200 yuan for each ton of pulp produced. The designed production capacity of the pulp mill is 300,000 tons per year, so the annual cost saving is 1.86 million yuan.
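The arithmetic behind these figures can be reproduced with a short calculation. The variable names are illustrative; all input values are those quoted in the paragraph above.

```python
# Inputs quoted in the text.
consumption_before = 586.43   # kWh per ton of pulp, 2018 average
consumption_after = 580.67    # kWh per ton of pulp, after optimization
electricity_price = 1.0765    # yuan per kWh
design_capacity = 300_000     # tons of pulp per year

# Electricity saved per ton, and the resulting cost savings.
saving_kwh_per_ton = consumption_before - consumption_after
saving_yuan_per_ton = saving_kwh_per_ton * electricity_price
annual_saving_yuan = saving_yuan_per_ton * design_capacity

print(f"{saving_kwh_per_ton:.2f} kWh/t, "
      f"{saving_yuan_per_ton:.3f} yuan/t, "
      f"{annual_saving_yuan / 1e6:.2f} million yuan/year")
```

The calculation assumes the mill runs at its full design capacity and buys all of its electricity at the quoted price.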
Most pulp mills have their own power stations, and the coal commonly used in these stations is lignite. The energy saved is calculated from the electricity consumption F_e (kWh/t), whose unit needs to be converted from kWh/t to GJ/t (1 kWh = 0.0036 GJ). With a design capacity of 300,000 t/year, the potential energy saving is 6220.8 GJ/year.
The combustion heat of lignite is 27 MJ/kg, and the efficiency of the power generation boiler is 35%. A total of 658.28 tons of lignite can therefore be saved per year. The average price of lignite is 145 yuan/t, so after parameter optimization the pulp mill can reduce costs by 95450.6 yuan per year.
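The energy and lignite figures can be checked with the following sketch. It assumes the electricity saving of 5.76 kWh/t (586.43 minus 580.67 kWh/t) and interprets the 35% efficiency as the fraction of the lignite combustion heat that is converted into electricity; the variable names are illustrative.

```python
GJ_PER_KWH = 0.0036           # 1 kWh = 3.6 MJ = 0.0036 GJ

saving_kwh_per_ton = 5.76     # kWh/t, 586.43 - 580.67
design_capacity = 300_000     # tons of pulp per year
lignite_heat_gj_per_ton = 27  # 27 MJ/kg is the same as 27 GJ/t
boiler_efficiency = 0.35      # assumed: share of heat turned into electricity
lignite_price = 145           # yuan per ton

# Electrical energy saved per year, converted from kWh to GJ.
energy_saved_gj = saving_kwh_per_ton * GJ_PER_KWH * design_capacity

# Lignite needed to generate that much electricity at 35% efficiency.
lignite_saved_t = energy_saved_gj / (lignite_heat_gj_per_ton * boiler_efficiency)
lignite_cost_saved = lignite_saved_t * lignite_price

print(f"{energy_saved_gj:.1f} GJ/year, "
      f"{lignite_saved_t:.2f} t lignite, "
      f"{lignite_cost_saved:.1f} yuan")
```

Without intermediate rounding the lignite saving comes out at about 658.29 t and the cost saving at about 95451 yuan, within one yuan of the reported 95450.6 yuan, which corresponds to the lignite figure truncated to 658.28 t.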

Carbon emission
To achieve cleaner production, pulp mills usually spend a lot on exhaust gas treatment. Generally, nitrogen oxides, sulfur dioxide, mercury and ash can be removed completely by a chemical scrubber or an electrostatic precipitator, but carbon dioxide is more difficult to capture.
According to the experimental results (Suuberg et al., 1978), the product composition of lignite after combustion is shown in Table 3. The results of Section 5.2 show that after parameter optimization, a total of 658.28 tons of lignite can be saved. The gas emissions that can be reduced are shown in Figure 9. The results show that, compared with the situation before parameter optimization, carbon emissions can be reduced by 1.60 t.
Compared with cogeneration technologies (Shabbir & Mirzaeian, 2017) and carbon dioxide absorption technology (Nwaoha & Tontiwachwuthikul, 2019), the method proposed in this paper does not require the pulp mill to invest new funds in process improvement. The CO2 intensity dropped, which reflects an improvement of energy efficiency in the pulp and paper industry (Zhou et al., 2016).

Conclusions
Pulp making is a carbon-emission-intensive industry, and improving production efficiency is important for reducing carbon emissions. To this end, a method for parameter optimization of the pulping process based on data mining is proposed in this work. The DIP qualification rate is investigated, and carbon emissions and economic performance are also analyzed to provide suggestions for further development. Compared with the situation before parameter optimization, the qualified rate of the pulp increased by 7.25 percentage points. The potential economic benefit for a secondary fiber paper mill that purchases electricity is 1.86 million yuan. The potential economic benefit for a secondary fiber paper mill with its own power station is 95450.6 yuan. At the same time, carbon emissions can be reduced by 1.6 t.