
Code Corruption

Ryu Xin edited this page Oct 5, 2022 · 4 revisions

In early 2015, during my first half-year at a new company, I observed that its platform could not support the business in an agile way: manpower was insufficient and overtime was heavy; scheduling lead times were long and delays were frequent; the co-construction mechanism made coordination difficult; and there were other issues. This article is set in that context and records the problems I observed, along with some initial thoughts. PS: The sample code in this article has nothing to do with the actual production code.

E-commerce businesses are diverse, and each department positions its business differently. Some businesses serve specific "vertical" industries, such as clothing, electrical appliances, automobiles, vacations, and telecommunications. Others are positioned as platform businesses that support all industries, such as group buying and shopping guides. In the early days, every business shared the same processing logic, but as each business developed, its needs diverged and the processing logic had to be differentiated.

As the number of businesses grows, the code starts to corrode locally; eventually the system becomes overwhelmed and unsustainable.

Code corruption

Take the "inventory reduction strategy" as an example. In the platform's early inventory reduction logic, the strategy was "reduce inventory before payment". However, for goods with relatively small inventory and relatively high prices, such as air conditioners, refrigerators, and other major appliances, reducing inventory before payment invites malicious orders from competing merchants: they place orders but never pay. The appliance seller then has nothing left to sell, while the actual goods remain backlogged in the warehouse, tying up a large amount of capital and warehouse space. The appliance business therefore wanted to customize its inventory reduction strategy as "reduce inventory after payment". At this point, the original code, which had shared a single processing logic, evolved into the following form:

public ReduceTypeEnum getProductInventoryReducePolicy(OrderLine orderLine){
    if (orderLine.getProduct().hasTag(9527) ){ 
        //If the item ordered is appliance
        return ReduceTypeEnum.AFTER_PAYMENT;
    }
    return ReduceTypeEnum.BEFORE_PAYMENT;
}
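
To make the trade-off concrete, here is a minimal sketch of one SKU's sellable stock under the two strategies. SellableStock, placeOrder and pay are hypothetical names introduced for illustration, not the platform's real API, and the sketch assumes that an order which is placed but never paid simply never reaches pay().

```java
// Sketch: how the two strategies treat an order that is placed but never paid.
// SellableStock is a hypothetical model of a single SKU, not the platform's code.
class StrategyDemo {
    public static void main(String[] args) {
        for (ReduceTypeEnum policy : new ReduceTypeEnum[] {
                ReduceTypeEnum.BEFORE_PAYMENT, ReduceTypeEnum.AFTER_PAYMENT }) {
            SellableStock stock = new SellableStock(10, policy);
            stock.placeOrder(10);                      // competitor orders everything, never pays
            boolean honestBuyerCanOrder = stock.placeOrder(1);
            System.out.println(policy + ": available=" + stock.available()
                    + ", honest buyer can order=" + honestBuyerCanOrder);
        }
        // BEFORE_PAYMENT: available=0, honest buyer can order=false
        // AFTER_PAYMENT: available=10, honest buyer can order=true
    }
}

enum ReduceTypeEnum { BEFORE_PAYMENT, AFTER_PAYMENT, NO_REDUCATION }

final class SellableStock {
    private int available;
    private final ReduceTypeEnum policy;

    SellableStock(int available, ReduceTypeEnum policy) {
        this.available = available;
        this.policy = policy;
    }

    // Called when a buyer places an order, before any payment
    boolean placeOrder(int qty) {
        if (available < qty) return false;             // nothing left to sell
        if (policy == ReduceTypeEnum.BEFORE_PAYMENT) available -= qty;
        return true;
    }

    // Called when the buyer actually pays for the order
    boolean pay(int qty) {
        if (policy == ReduceTypeEnum.AFTER_PAYMENT) {
            if (available < qty) return false;         // others paid first and took the stock
            available -= qty;
        }
        return true;
    }

    int available() { return available; }
}
```

Under BEFORE_PAYMENT the unpaid order locks up all ten units; under AFTER_PAYMENT stock is only deducted at payment time. The flip side, visible in pay(), is that in this sketch a buyer who has already ordered may fail at payment if other buyers paid first.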

The business developed so quickly that this code never had a chance to stabilize. Then virtual goods appeared: some of them have no inventory limit and need no inventory deduction after a buyer purchases them. The code then became:

public ReduceTypeEnum getProductInventoryReducePolicy(OrderLine orderLine){

     if ( orderLine.getProduct().isVirtual() ){ //If it is a virtual commodity
         return ReduceTypeEnum.NO_REDUCATION;
     }
     if ( orderLine.getProduct().hasTag(9527) ){ //If the item ordered is appliance
         return ReduceTypeEnum.AFTER_PAYMENT;
     }
     return ReduceTypeEnum.BEFORE_PAYMENT;
}

The virtual goods team added this logic, verified it, and released it without problems. A few months later, however, the major appliance business added a "store pickup" delivery method: for some home appliances, consumers want to choose a satisfactory, flawless unit in the store themselves. For store pickup orders, a virtual pickup code is issued instead of shipping through physical logistics, so these goods are also treated as virtual goods. Yet to guarantee that the goods can actually be picked up once the buyer receives the code, the inventory corresponding to the pickup code must still be deducted. The modified code does not handle this scenario, because the virtual goods check, which skips inventory reduction, always runs before the major appliance check. After analysis by the major appliance team, the code was adjusted further:

public ReduceTypeEnum getProductInventoryReducePolicy(OrderLine orderLine){

     if ( orderLine.getProduct().isVirtual() ){ //If it is a virtual commodity
         if ( !orderLine.getProduct().hasTag(9527) ){
             return ReduceTypeEnum.NO_REDUCATION;
         }
     }
     if ( orderLine.getProduct().hasTag(9527) ){ //If the item ordered is appliance
         return ReduceTypeEnum.AFTER_PAYMENT;
     }
     return ReduceTypeEnum.BEFORE_PAYMENT;
}
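
The ordering problem can be reproduced in a few lines. The sketch below passes a hypothetical minimal Product (just a virtual flag and a tag set) instead of an OrderLine, and writes the third version's nested if as a single && condition, which is equivalent. A store-pickup refrigerator is both virtual and tagged 9527:

```java
import java.util.Set;

class PickupOrderingDemo {
    static final int APPLIANCE_TAG = 9527; // tag value taken from the article's sample code

    // Second version: the virtual goods check always runs first
    static ReduceTypeEnum v2(Product p) {
        if (p.isVirtual()) return ReduceTypeEnum.NO_REDUCATION;
        if (p.hasTag(APPLIANCE_TAG)) return ReduceTypeEnum.AFTER_PAYMENT;
        return ReduceTypeEnum.BEFORE_PAYMENT;
    }

    // Third version: virtual goods skip reduction only if they are not appliances
    static ReduceTypeEnum v3(Product p) {
        if (p.isVirtual() && !p.hasTag(APPLIANCE_TAG)) return ReduceTypeEnum.NO_REDUCATION;
        if (p.hasTag(APPLIANCE_TAG)) return ReduceTypeEnum.AFTER_PAYMENT;
        return ReduceTypeEnum.BEFORE_PAYMENT;
    }

    public static void main(String[] args) {
        // store pickup: virtual pickup code, but physical stock must be held
        Product pickupFridge = new Product(true, Set.of(APPLIANCE_TAG));
        System.out.println("v2: " + v2(pickupFridge)); // v2: NO_REDUCATION
        System.out.println("v3: " + v3(pickupFridge)); // v3: AFTER_PAYMENT
    }
}

enum ReduceTypeEnum { BEFORE_PAYMENT, AFTER_PAYMENT, NO_REDUCATION }

// Hypothetical minimal product model: only the two attributes the branches read
record Product(boolean virtual, Set<Integer> tags) {
    boolean isVirtual() { return virtual; }
    boolean hasTag(int tag) { return tags.contains(tag); }
}
```

The second version returns NO_REDUCATION for the pickup order, so its stock would never be held; the third version routes it to the appliance branch.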

As the appliance business offered more categories and more products, applying "reduce inventory after payment" uniformly across the home appliance business no longer met demand. Its strategy became: "if the product's unit price is greater than 5,000, reduce inventory after payment; otherwise, reduce inventory before payment". The code corroded further into the following:

public ReduceTypeEnum getProductInventoryReducePolicy(OrderLine orderLine){

     if ( orderLine.getProduct().isVirtual() ){
         if ( !orderLine.getProduct().hasTag(9527) ){
             return ReduceTypeEnum.NO_REDUCATION;
         }
     }
     if ( orderLine.getProduct().hasTag(9527) ){
         if( orderLine.getProduct().getPrice().getCent() > 500000L ) {
             return ReduceTypeEnum.AFTER_PAYMENT;
         }
         return ReduceTypeEnum.BEFORE_PAYMENT;
     }
     return ReduceTypeEnum.BEFORE_PAYMENT;
}
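
One detail in this version is easy to miss: judging by the method name and the 5,000 threshold, getPrice().getCent() returns the price in cents, so 500000L means 5,000.00, and the comparison is strict. The sketch below re-expresses this version with a hypothetical flat Product record (priceCent stands in for getPrice().getCent()) and checks the boundary.

```java
import java.util.Set;

class PriceThresholdDemo {
    static final int APPLIANCE_TAG = 9527;
    // 5,000 expressed in cents, matching getPrice().getCent() > 500000L above
    static final long AFTER_PAYMENT_THRESHOLD_CENT = 500_000L;

    static ReduceTypeEnum v4(Product p) {
        if (p.isVirtual() && !p.hasTag(APPLIANCE_TAG)) return ReduceTypeEnum.NO_REDUCATION;
        if (p.hasTag(APPLIANCE_TAG)) {
            // strictly greater: a price of exactly 5,000.00 still reduces before payment
            return p.priceCent() > AFTER_PAYMENT_THRESHOLD_CENT
                    ? ReduceTypeEnum.AFTER_PAYMENT
                    : ReduceTypeEnum.BEFORE_PAYMENT;
        }
        return ReduceTypeEnum.BEFORE_PAYMENT;
    }

    public static void main(String[] args) {
        Set<Integer> appliance = Set.of(APPLIANCE_TAG);
        System.out.println(v4(new Product(false, appliance, 500_000L))); // BEFORE_PAYMENT
        System.out.println(v4(new Product(false, appliance, 500_001L))); // AFTER_PAYMENT
        System.out.println(v4(new Product(false, Set.of(), 900_000L)));  // BEFORE_PAYMENT
    }
}

enum ReduceTypeEnum { BEFORE_PAYMENT, AFTER_PAYMENT, NO_REDUCATION }

// Hypothetical flat product model for this sketch
record Product(boolean virtual, Set<Integer> tags, long priceCent) {
    boolean isVirtual() { return virtual; }
    boolean hasTag(int tag) { return tags.contains(tag); }
}
```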

This corruption did not happen in a day; it accumulated over many years. And this is still a very simple customization of a single business's inventory reduction strategy. The most extreme method I have seen was thousands of lines long, filled with branching logic and very deeply nested indentation. Faced with a new deduction scenario, every programmer carefully looks for a seemingly suitable spot, adds their own if statement, and prays that it will not affect other, unrelated businesses.

Introducing design patterns, but still no silver bullet

As customized business logic kept growing, ambitious programmers began to consider design patterns as a cure for code corruption. Take the customized inventory reduction strategy above as an example: an experienced programmer would define an SPI (Service Provider Interface) for this customization point, as follows:

public interface GetCustomInventoryReducePolicySpi {

    boolean filter( ReducePolicySettingReq request );

    ReduceTypeEnum execute( ReducePolicySettingReq request );
}

Because multiple classes will implement the SPI interface, the interface defines two methods so that the correct implementation class is executed:

  • filter: determines, from the request parameters of the current context, whether this SPI implementation class applies
  • execute: when this SPI implementation class applies, the platform calls its execute method to obtain the custom inventory reduction strategy

In this way, each if statement in the corrupted code can become an SPI implementation class, and the platform is responsible for traversing these implementation classes and executing the first one whose filter method returns true, as follows:

public class VirtualProductReduceInventoryPolicyImpl 
     implements GetCustomInventoryReducePolicySpi {

    public boolean filter( ReducePolicySettingReq request ){
        return request.getOrderLine().getProduct().isVirtual();
    }

    public ReduceTypeEnum execute( ReducePolicySettingReq request ){
        return ReduceTypeEnum.NO_REDUCATION;
    }
}

public class BigSellerAppliancesReduceInventoryPolicyImpl 
     implements GetCustomInventoryReducePolicySpi {

    public boolean filter( ReducePolicySettingReq request ){
        return request.getOrderLine().getProduct().hasTag(9527);
    }

    public ReduceTypeEnum execute( ReducePolicySettingReq request ){
        OrderLine orderLine = request.getOrderLine();
        if( orderLine.getProduct().getPrice().getCent() > 500000L ) {
            return ReduceTypeEnum.AFTER_PAYMENT;
        }
        return ReduceTypeEnum.BEFORE_PAYMENT;
    }
}

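import java.util.List;
import com.google.common.collect.Lists;

public class InventoryReduceProcessor extends SpiProcessor{
    // Implementations are checked in registration order; the first whose filter matches wins
    private List<GetCustomInventoryReducePolicySpi> reducePolicySpis =
        Lists.newArrayList(
            new BigSellerAppliancesReduceInventoryPolicyImpl(),
            new VirtualProductReduceInventoryPolicyImpl()
            // ......
        );

    public ReduceTypeEnum getProductInventoryReducePolicy(OrderLine orderLine){
        // Assumes ReducePolicySettingReq can be built from the order line
        ReducePolicySettingReq request = new ReducePolicySettingReq(orderLine);
        return reducePolicySpis.stream().filter( p -> p.filter(request))
             .map( p -> p.execute(request))
             .findFirst()
             .orElse(ReduceTypeEnum.BEFORE_PAYMENT); // platform default: reduce before payment
    }
}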
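
For readers who want to run the mechanism end to end, the following self-contained sketch wires the two implementations above into a first-match processor. OrderLine, Product and the ReducePolicySettingReq constructor are hypothetical stand-ins, since the article does not show those types, and SpiProcessor and Guava's Lists are left out so the file compiles with the JDK alone.

```java
import java.util.List;
import java.util.Set;

class SpiDemo {
    public static void main(String[] args) {
        InventoryReduceProcessor processor = new InventoryReduceProcessor();
        // virtual goods that are not appliances
        System.out.println(processor.getProductInventoryReducePolicy(
                new OrderLine(new Product(true, Set.of(), 0L))));           // NO_REDUCATION
        // store-pickup appliance priced at 8,000.00
        System.out.println(processor.getProductInventoryReducePolicy(
                new OrderLine(new Product(true, Set.of(9527), 800_000L)))); // AFTER_PAYMENT
        // ordinary physical goods: no SPI matches, so the default applies
        System.out.println(processor.getProductInventoryReducePolicy(
                new OrderLine(new Product(false, Set.of(), 100L))));        // BEFORE_PAYMENT
    }
}

enum ReduceTypeEnum { BEFORE_PAYMENT, AFTER_PAYMENT, NO_REDUCATION }

// Hypothetical minimal models; the article does not show these types
record Product(boolean virtual, Set<Integer> tags, long priceCent) {
    boolean isVirtual() { return virtual; }
    boolean hasTag(int tag) { return tags.contains(tag); }
}
record OrderLine(Product product) {}
record ReducePolicySettingReq(OrderLine orderLine) {}

interface GetCustomInventoryReducePolicySpi {
    boolean filter(ReducePolicySettingReq request);         // does this implementation apply?
    ReduceTypeEnum execute(ReducePolicySettingReq request); // the customized strategy
}

class VirtualProductReduceInventoryPolicyImpl implements GetCustomInventoryReducePolicySpi {
    public boolean filter(ReducePolicySettingReq request) {
        return request.orderLine().product().isVirtual();
    }
    public ReduceTypeEnum execute(ReducePolicySettingReq request) {
        return ReduceTypeEnum.NO_REDUCATION;
    }
}

class BigSellerAppliancesReduceInventoryPolicyImpl implements GetCustomInventoryReducePolicySpi {
    public boolean filter(ReducePolicySettingReq request) {
        return request.orderLine().product().hasTag(9527);
    }
    public ReduceTypeEnum execute(ReducePolicySettingReq request) {
        return request.orderLine().product().priceCent() > 500_000L
                ? ReduceTypeEnum.AFTER_PAYMENT
                : ReduceTypeEnum.BEFORE_PAYMENT;
    }
}

class InventoryReduceProcessor {
    // Traversal stops at the first implementation whose filter matches,
    // so an implementation's position in this list is part of its behavior
    private final List<GetCustomInventoryReducePolicySpi> reducePolicySpis = List.of(
            new BigSellerAppliancesReduceInventoryPolicyImpl(),
            new VirtualProductReduceInventoryPolicyImpl());

    ReduceTypeEnum getProductInventoryReducePolicy(OrderLine orderLine) {
        ReducePolicySettingReq request = new ReducePolicySettingReq(orderLine);
        return reducePolicySpis.stream()
                .filter(spi -> spi.filter(request))
                .map(spi -> spi.execute(request))
                .findFirst()
                .orElse(ReduceTypeEnum.BEFORE_PAYMENT); // platform default
    }
}
```

The appliance implementation is registered ahead of the virtual goods implementation, mirroring the third if-else version in which store-pickup appliances bypass the virtual goods rule.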

With this approach, the platform's core processing code sheds most of its if...else branches, and everyone rejoices. Practice has also shown that when an SPI interface has only a few implementation classes, this approach works well and can greatly improve the system's extensibility. However, the e-commerce platform supports many industries, so the list of registered SPI implementation classes keeps growing. From then on, whenever a requirement involves customizing the inventory reduction strategy, the team must carefully analyze and evaluate how to modify the SPI registration tree, for example:

  • Do I need to modify an existing SPI implementation, or register a new SPI instance?
  • If a new SPI instance is needed, in what order should it be registered?
  • How do I assess the impact on other SPIs of adding a new SPI or changing an existing one?
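
The ordering question in the second bullet is not academic. In the minimal sketch below, each SPI is reduced to a hypothetical name/filter/execute triple, and the same store-pickup appliance resolves differently depending only on registration order:

```java
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

class RegistrationOrderDemo {
    // Hypothetical lightweight stand-in for one registered SPI implementation
    record Spi(String name, Predicate<Product> filter, Function<Product, ReduceTypeEnum> execute) {}

    static final Spi APPLIANCE = new Spi("appliance", p -> p.hasTag(9527),
            p -> p.priceCent() > 500_000L ? ReduceTypeEnum.AFTER_PAYMENT : ReduceTypeEnum.BEFORE_PAYMENT);
    static final Spi VIRTUAL = new Spi("virtual", Product::isVirtual, p -> ReduceTypeEnum.NO_REDUCATION);

    // First-match traversal, the same shape as InventoryReduceProcessor above
    static ReduceTypeEnum resolve(List<Spi> registry, Product p) {
        return registry.stream()
                .filter(s -> s.filter().test(p))
                .map(s -> s.execute().apply(p))
                .findFirst()
                .orElse(ReduceTypeEnum.BEFORE_PAYMENT);
    }

    public static void main(String[] args) {
        Product pickupFridge = new Product(true, Set.of(9527), 800_000L);
        System.out.println(resolve(List.of(APPLIANCE, VIRTUAL), pickupFridge)); // AFTER_PAYMENT
        System.out.println(resolve(List.of(VIRTUAL, APPLIANCE), pickupFridge)); // NO_REDUCATION
    }
}

enum ReduceTypeEnum { BEFORE_PAYMENT, AFTER_PAYMENT, NO_REDUCATION }

// Hypothetical flat product model for this sketch
record Product(boolean virtual, Set<Integer> tags, long priceCent) {
    boolean isVirtual() { return virtual; }
    boolean hasTag(int tag) { return tags.contains(tag); }
}
```

Neither implementation changed; swapping two list entries silently moves the refrigerator from "after payment" back to "no reduction", the same bug the if-else version hit.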

As SPI implementation classes multiply, adding one without affecting the existing implementations becomes a huge challenge. Engineers must read the code carefully to see how the filter method of each registered SPI instance is written, so that the effective condition of a new SPI instance is neither so broad that it affects other SPI instances nor so narrow that the new instance never takes effect.
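
One partial aid is to replay a set of sample orders through the registry and count, for each SPI, how often its filter matches versus how often it actually wins under first-match traversal. The sketch below is an illustrative assumption rather than a tool from the article, and the virtualAppliance filter is a hypothetical later addition; it only covers the samples someone thought to write, so it cannot prove that a new filter is safe.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class FilterCoverageDemo {
    // Hypothetical stand-in for a registered SPI: only its name and filter matter here
    record NamedFilter(String name, Predicate<Product> filter) {}

    static final List<NamedFilter> REGISTRY = List.of(
            new NamedFilter("appliance", p -> p.hasTag(9527)),
            new NamedFilter("virtual", Product::isVirtual),
            // added later for store pickup, but shadowed by "appliance" registered ahead of it
            new NamedFilter("virtualAppliance", p -> p.isVirtual() && p.hasTag(9527)));

    static final List<Product> SAMPLES = List.of(
            new Product(true, Set.of(9527)),   // store-pickup appliance
            new Product(true, Set.of()),       // virtual goods, not an appliance
            new Product(false, Set.of(9527)),  // shipped appliance
            new Product(false, Set.of()));     // ordinary goods

    // For each filter, count samples it matches and samples it wins (first match)
    static Map<String, int[]> coverage(List<NamedFilter> registry, List<Product> samples) {
        Map<String, int[]> stats = new LinkedHashMap<>();
        for (NamedFilter f : registry) stats.put(f.name(), new int[2]); // {matched, won}
        for (Product p : samples) {
            boolean claimed = false;
            for (NamedFilter f : registry) {
                if (!f.filter().test(p)) continue;
                int[] s = stats.get(f.name());
                s[0]++;
                if (!claimed) { s[1]++; claimed = true; }
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        coverage(REGISTRY, SAMPLES).forEach((name, s) ->
                System.out.println(name + " matched=" + s[0] + " won=" + s[1]));
    }
}

// Hypothetical minimal product model for this sketch
record Product(boolean virtual, Set<Integer> tags) {
    boolean isVirtual() { return virtual; }
    boolean hasTag(int tag) { return tags.contains(tag); }
}
```

On these samples it reports appliance matched=2 won=2, virtual matched=2 won=1, and virtualAppliance matched=1 won=0: the new filter does match an order, but a broader filter registered ahead of it always claims that order first.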

Our engineers are very smart: they use more advanced techniques, such as rules engines and remote RPC calls, to write filter methods. At that point, it becomes almost impossible to tell, by reading the code, when these SPI instances take effect.
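
The sketch below shows why. RuleService is a hypothetical interface standing in for a rules engine or remote RPC client (it is not a real library API), the rule id is made up, and the SPI request is simplified to a map of facts. The filter body is identical in both instances; whether it fires depends entirely on a rule stored and evaluated elsewhere.

```java
import java.util.Map;

class OpaqueFilterDemo {
    public static void main(String[] args) {
        Map<String, Object> facts = Map.of("category", "appliance", "priceCent", 800_000L);
        // Same Java code, two different rule sources: behavior is decided outside the code
        GetCustomInventoryReducePolicySpi a = new RuleDrivenPolicyImpl((id, f) -> true);
        GetCustomInventoryReducePolicySpi b = new RuleDrivenPolicyImpl((id, f) -> false);
        System.out.println(a.filter(facts)); // true
        System.out.println(b.filter(facts)); // false
    }
}

enum ReduceTypeEnum { BEFORE_PAYMENT, AFTER_PAYMENT, NO_REDUCATION }

// Hypothetical client for a rules engine or remote RPC service
interface RuleService {
    boolean evaluate(String ruleId, Map<String, Object> facts);
}

// Simplified SPI for this sketch: the request is reduced to a map of facts
interface GetCustomInventoryReducePolicySpi {
    boolean filter(Map<String, Object> facts);
    ReduceTypeEnum execute(Map<String, Object> facts);
}

class RuleDrivenPolicyImpl implements GetCustomInventoryReducePolicySpi {
    private final RuleService ruleService;

    RuleDrivenPolicyImpl(RuleService ruleService) { this.ruleService = ruleService; }

    // Nothing here tells a reader which orders match: the condition lives in the
    // hypothetical rule "reduce-policy-rule", stored and evaluated outside this class
    public boolean filter(Map<String, Object> facts) {
        return ruleService.evaluate("reduce-policy-rule", facts);
    }

    public ReduceTypeEnum execute(Map<String, Object> facts) {
        return ReduceTypeEnum.AFTER_PAYMENT;
    }
}
```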

In a complex business system there are nearly a thousand SPI interfaces like this, each with dozens of registered implementation classes. Whether registered explicitly in code or dynamically through a configuration file, these implementation classes are traversed at runtime to find and execute the matching one. This mechanism couples different businesses tightly together: every addition or modification of an SPI carries great uncertainty about whether it will affect other businesses. As the business keeps growing, this SPI registration and management mechanism starts to fail and becomes a key obstacle to rapid business iteration and development.
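
The article does not say which configuration mechanism was used. As one concrete illustration, the JDK's own java.util.ServiceLoader discovers implementations listed in META-INF/services files on the classpath; the sketch below assumes that mechanism, with a simplified hypothetical request type. The code shows the traversal, but neither which providers exist nor their order, since both come from files on the classpath rather than from this source.

```java
import java.util.ServiceLoader;

class ServiceLoaderDemo {
    // Walks every provider listed in META-INF/services/GetCustomInventoryReducePolicySpi
    // (the file is named after the interface's binary name; default package assumed here).
    // Neither the set of providers nor their order is visible in this source file.
    static ReduceTypeEnum resolve(ReducePolicySettingReq request) {
        for (GetCustomInventoryReducePolicySpi spi :
                ServiceLoader.load(GetCustomInventoryReducePolicySpi.class)) {
            if (spi.filter(request)) {
                return spi.execute(request); // first match wins, as with the hand-written list
            }
        }
        return ReduceTypeEnum.BEFORE_PAYMENT; // platform default when nothing matches
    }

    public static void main(String[] args) {
        // With no provider file on the classpath, nothing is found and the default applies
        System.out.println(resolve(new ReducePolicySettingReq(true, 0L))); // BEFORE_PAYMENT
    }
}

enum ReduceTypeEnum { BEFORE_PAYMENT, AFTER_PAYMENT, NO_REDUCATION }

// Hypothetical simplified request for this sketch
record ReducePolicySettingReq(boolean virtualProduct, long priceCent) {}

interface GetCustomInventoryReducePolicySpi {
    boolean filter(ReducePolicySettingReq request);
    ReduceTypeEnum execute(ReducePolicySettingReq request);
}
```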


Chinese version: https://www.ryu.xin/2021/07/20/bad-code/