Decision making of textile production in H company: customer management and production tasks distribution
File data/H_company.db
is an SQLite database that includes records of textile (chemical fiber) production in H company.
customers
The quality of the customers is measured by H company.
Name | Type | Description |
---|---|---|
customer_id | text | customer's identifier |
parts | text | the production that the customer buys; one company can have multiple records if buying multiple parts from H company |
score | integer | the score of the customer, ranges 1~5 |
factories
The factories are managed by H company.
Each factory manages multiple batches. Each batch has the same production settings. Production tasks assigned to the same batch number must produce the same type of textile. I don't know the relationship between batch, factory, parts, and specification in advance, but look up from orders. (See the view "batches".)
Name | Type | Description |
---|---|---|
factory_id | text | Factory's identifier |
orders
The orders placed by customers.
When an order is placed, H company will allocate a batch according to the specification. Then, H company checks whether there is stocked production. If there is, it will ship; if not, it will start producing. The batches distributed to 5 of the factories recorded efficiency. Among these, 40 batches in 2 factories record the efficiency of spinning, and 218 batches in 3 factories record the efficiency of texturing. The records of efficiency in spinning and texturing don't have common batches or factories. Other batches don't record efficiency.
Name | Type | Description |
---|---|---|
order_id | integer | the order's identifier |
created_time | text | date time when the order is received |
customer_id | text | the customer's identifier who placed the order |
batch_id | text | the batch's identifier |
factory_id | text | the factory's identifier who works on the order |
weight | real | the weight of ordered textile (KG) |
parts | text | the production type of ordered textile |
specification | text | the specification of ordered textile |
price | real | the price of 1KG ordered production (CNY 2020) |
voucher | real | the discount amount of this order (CNY 2020) |
quality | real | the quality of ordered textile, ranged 1~5 |
spinning
Spinning produces FDY and POY from melted polyester and winds the fibers on the bobbin.
This table tracks 40 batches in orders. I have summarized the production and scrap rate if one shift has multiple records. In this dataset, it's guaranteed (batch_id, date, night)
is unique.
Name | Type | Description |
---|---|---|
batch_id | text | the batch's identifier |
date | text | the date of shift |
night | integer | whether the night shift |
production | real | the mass of total production (KG) |
scrap_rate | real | the mass of abandoned bobbins over the total mass of bobbins, in percentage ranged 0~100, efficiency is 100 / (100 + scrap_rate) |
texturing
Texturing produces DTY from the output of spinning.
This table tracks 218 batches in orders. The efficiency of the same batch may change in different weather. I have manually adjusted with concerns to correct the mistakes of workers recording the time range (from start_time
to end_time
). In this dataset, it's guaranteed the time ranges for each batch_id
don't overlap.
Name | Type | Description |
---|---|---|
batch_id | text | the batch's identifier |
start_time | text | date time when the record starts |
end_time | text | date time when the record ends (machine isn't always open in the time range) |
efficiency | real | efficiency coefficient on the machine, ranged 0~1 |
The script 1_date_distribution.py shows the distribution of the dates of the orders. I split orders in the first 80% of dates to the training set, where created_time <= '2020-09-06'
, and orders in the last 20% of dates to the testing set, where created_time > '2020-09-06'
.
I use 3 metrics to represent the benefits of each order and 3 other metrics to represent the cost of each order.
Benefits:
- Amount of money earned from the order.
- Customer score. (If the customer has a higher order, prioritizing this order is good for maintaining valuable customer relationships.)
- Logarithmic of weight. (Larger orders are preferred to dilute each order's fixed costs, such as selling fee and warehouse rental fee.)
Costs:
- Weight of raw materials. (Assume equal to the weight of the production.)
- Quality of raw materials. (Higher-quality products need more costs of quality control.)
- Rarity of raw materials. (Rare specification adds more difficulty. Rarity is calculated based on the weight of the production historically ordered.)
Assumption: the values of benefits and costs of all orders should be positive. (Required by DEA)
To find the formula for how I calculate the values, read these queries, which retrieve the values of benefits and costs of each order:
- The training set orders_benefit_cost_train.sql
- The testing set orders_benefit_cost_test.sql
Name | Type | Description |
---|---|---|
order_id | integer | the order's identifier |
b_payment | real | benefit: amount of money |
b_customer | integer | benefit: customer score |
b_scale | real | benefit: logarithmic of weight |
c_material | real | cost: weight of raw materials |
c_quality | integer | cost: quality of raw materials |
c_rarity | real | cost: rarity of raw materials |
We use data enveloping analysis to measure the efficiency of each order.
Assume the values of benefits of the current order are
The efficiency of the current order is
Because
We do linearization using this property. Thus LP(1) is transformed into LP(2).
In LP(2),
The results of orders' efficiency (in both training and testing sets) are saved in the DEA table in the database.
Name | Type | Description |
---|---|---|
order_id | integer | the order's identifier |
efficiency | real | DEA efficiency coefficient of this order, ranged 0~1 |
In ground truth, some orders are allocated to batches that I lose track of; other orders are allocated to batches whose production is tracked in the spinning and texturing table. The following analysis is focused only on those whose batches are tracked. To filter these orders, I provide the following queries:
- Tracked in spinning table, training set spinning_orders_train.sql
- Tracked in spinning table, testing set spinning_orders_test.sql
- Tracked in texturing table, training set texturing_orders_train.sql
- Tracked in texturing table, testing set texturing_orders_test.sql
On the one hand, we know the efficiency of orders by DEA, which represents the benefits of this order. The results have the following structure.
Name | Type | Description |
---|---|---|
order_id | integer | the order's identifier |
efficiency | real | DEA efficiency coefficient of this order, ranged 0~1 |
weight | integer | the weight of the ordered product, rounded to integer (KG) |
batch_id | text | the ground truth: which batch the order is allocated to |
On the other hand, we can read the efficiency of each batch via its historical production records in spinning and texturing tables. The queries are written in texturing_batch_efficiency.sql and spinning_batch_efficiency.sql
Name | Type | Description |
---|---|---|
batch_id | text | the batch's identifier |
scrap_rate | efficiency | real | see the meaning in the "Database" section: "spinning/scrap_rate" and "texturing/efficiency" |
capacity | real | (not used) the total weight of production that the batch can handle in a specific period; simulated by the total weight the batch has produced in history |
first_appear | text | (not used) the date when the efficiency of the batch is first recorded |
In the following context, I use a min-cost flow algorithm to solve this assignment problem. Formally, I should consider the capacity of batches. However, I don't know the real capacity of these factories, and simulating the value by the total weight the batch has produced in history will make the assignment problem infeasible. So I assume the capacity of each batch is infinity, which means each batch can handle the production tasks of all allocated orders.
The ground truth provides a production plan adopted by the real H company and allocates the filtered orders to the trackable batches. I aim to find an optimal production plan to allocate these orders to batches, achieving the maximum overall benefits under the criteria I already find: DEA efficiency of orders and batches.
Denote the efficiency of each order
The capacity constraint is turned off in this case.
According to this article, such a weighted assignment problem can be solved by a min-cost flow algorithm.
The result
The results are saved in tables min_cost_flow_spinning
and min_cost_flow_texturing
.
Name | Type | Description |
---|---|---|
order_id | integer | the order's identifier |
order_weight | integer | weight of product in this order |
allocate | integer | weight of product in this order allocated to batch_id |
batch_id | text | the batch's identifier |
efficiency | real | unit cost coefficient |
We perform the boostrapping method to draw orders from the ground truth and generate a null distribution of the production plan. Then, we find the quantile of the min-cost flow method to test whether it is significantly better. The statistics are the average efficiency weighted by allocated weight.