A Hierarchical Softmax Framework for PyTorch.
hierarchicalsoftmax can be installed from PyPI:
pip install hierarchicalsoftmax
Alternatively, hierarchicalsoftmax can be installed using pip from the git repository:
pip install git+https://github.com/rbturnbull/hierarchicalsoftmax.git
Build up a hierarchy tree for your categories using the SoftmaxNode instances:
from hierarchicalsoftmax import SoftmaxNode
root = SoftmaxNode("root")
a = SoftmaxNode("a", parent=root)
aa = SoftmaxNode("aa", parent=a)
ab = SoftmaxNode("ab", parent=a)
b = SoftmaxNode("b", parent=root)
ba = SoftmaxNode("ba", parent=b)
bb = SoftmaxNode("bb", parent=b)
The SoftmaxNode class inherits from the anytree Node class, which means you can use methods from that library to build and interact with your hierarchy tree.
The tree can be rendered as a string with the render method:
root.render(print=True)
This results in a text representation of the tree:
root
├── a
│   ├── aa
│   └── ab
└── b
    ├── ba
    └── bb
The tree can also be rendered to a file using graphviz if it is installed:
root.render(filepath="tree.svg")
Next, add a final layer to your network with the correct number of output features for the softmax layers. You can do this manually by setting the number of output features to root.layer_size. Alternatively, you can use the HierarchicalSoftmaxLinear or HierarchicalSoftmaxLazyLinear classes:
from torch import nn
from hierarchicalsoftmax import HierarchicalSoftmaxLinear
model = nn.Sequential(
nn.Linear(in_features=20, out_features=100),
nn.ReLU(),
HierarchicalSoftmaxLinear(in_features=100, root=root)
)
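In the standard hierarchical-softmax layout, each parent node holds its own small softmax over its children, so the final layer needs one logit per child of every internal node. For the example tree above (root, a, and b each have two children) that is 6 outputs. A minimal pure-Python sketch of this counting, using a nested-dict stand-in for the tree (the dict layout is illustrative, not the library's data structure):

```python
# Stand-in for the example hierarchy: each key maps a parent to its children.
tree = {
    "root": ["a", "b"],
    "a": ["aa", "ab"],
    "b": ["ba", "bb"],
}

def total_layer_size(tree):
    """Total output slots: one logit per child of every internal node."""
    return sum(len(children) for children in tree.values())

print(total_layer_size(tree))  # 6 for the example hierarchy
```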
Once you have the hierarchy tree, you can use the HierarchicalSoftmaxLoss module:
from hierarchicalsoftmax import HierarchicalSoftmaxLoss
loss = HierarchicalSoftmaxLoss(root=root)
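Conceptually, a hierarchical softmax loss sums a cross-entropy term at each internal node along the path from the root to the target leaf: at every step, the node's softmax over its children is penalised for the probability it assigns to the correct child. A rough pure-Python sketch of that decomposition (illustrative only, not the module's implementation; the logit values are made up):

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def node_cross_entropy(logits, target_index):
    """Cross-entropy of one node's softmax over its children."""
    return -math.log(softmax(logits)[target_index])

# Path root -> a -> aa: pick child 0 at the root, then child 0 under "a".
loss = node_cross_entropy([2.0, 0.5], 0) + node_cross_entropy([1.0, 1.0], 0)
```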
Metric functions are provided to show accuracy and the F1 score:
from hierarchicalsoftmax import greedy_accuracy, greedy_f1_score
accuracy = greedy_accuracy(predictions, targets, root=root)
f1 = greedy_f1_score(predictions, targets, root=root)
The nodes predicted by the final layer of the model can be inferred using the greedy_predictions function, which returns a list of the predicted nodes:
from hierarchicalsoftmax import greedy_predictions
outputs = model(inputs)
inferred_nodes = greedy_predictions(outputs, root=root)
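Greedy inference walks the tree from the root, taking the highest-scoring child at each node until it reaches a leaf. A simplified pure-Python sketch of that walk (the nested-dict tree and per-node score lists are illustrative stand-ins, not the library's data structures):

```python
tree = {
    "root": ["a", "b"],
    "a": ["aa", "ab"],
    "b": ["ba", "bb"],
}

def greedy_predict(tree, scores_per_node):
    """Follow the argmax child from the root until a leaf is reached."""
    node = "root"
    while node in tree:  # internal nodes have a children entry
        children = tree[node]
        scores = scores_per_node[node]
        node = children[max(range(len(children)), key=scores.__getitem__)]
    return node

# Scores favour "b" at the root and then "ba" under "b".
scores = {"root": [0.1, 0.9], "a": [0.5, 0.5], "b": [0.7, 0.3]}
print(greedy_predict(tree, scores))  # ba
```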
The loss for each node can be weighted relative to the others by setting the alpha value of its parent node. By default, a node's alpha value is 1. In the following example, alpha for the root node is set to 2.0, so the loss for the first level of classification (under the root node) contributes twice as much to the total loss as the classification under the a or b nodes:
from hierarchicalsoftmax import SoftmaxNode
root = SoftmaxNode("root", alpha=2.0)
a = SoftmaxNode("a", parent=root)
aa = SoftmaxNode("aa", parent=a)
ab = SoftmaxNode("ab", parent=a)
b = SoftmaxNode("b", parent=root)
ba = SoftmaxNode("ba", parent=b)
bb = SoftmaxNode("bb", parent=b)
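With alpha set, each node's contribution to the total loss is scaled by that node's alpha. A small arithmetic sketch (pure Python, not the library's code; the per-level cross-entropy values are made up) showing how the root-level term counts double when alpha=2.0 at the root:

```python
# Hypothetical per-node cross-entropy terms along the path root -> a -> aa.
root_level_ce = 0.4  # softmax over root's children (a vs b)
a_level_ce = 0.3     # softmax over a's children (aa vs ab)

# alpha weights: 2.0 on the root, the default 1.0 elsewhere.
total = 2.0 * root_level_ce + 1.0 * a_level_ce
print(total)  # 1.1
```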
You can add label smoothing to the loss by setting the label_smoothing parameter on any of the nodes.
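Label smoothing replaces the hard one-hot target at a node with a slightly softened distribution, spreading a small amount of probability mass across the other children. A minimal sketch of the standard formulation (illustrative, not the library's internals):

```python
def smooth_targets(num_children, target_index, smoothing):
    """Spread `smoothing` mass uniformly; keep the rest on the true child."""
    uniform = smoothing / num_children
    return [
        uniform + (1.0 - smoothing) if i == target_index else uniform
        for i in range(num_children)
    ]

# Two children, true child at index 0, smoothing of 0.1 -> roughly [0.95, 0.05]
print(smooth_targets(2, 0, 0.1))
```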
You can use focal loss instead of the basic cross-entropy loss at any node by setting that node's gamma parameter.
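Focal loss down-weights well-classified examples by scaling the cross-entropy with (1 - p)^gamma, where p is the probability assigned to the true class; gamma = 0 recovers plain cross-entropy. A minimal sketch of the standard formula (illustrative, not the library's implementation):

```python
import math

def focal_loss(p_true, gamma):
    """Focal loss for one example: -(1 - p)^gamma * log(p)."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

# With gamma = 0 this is ordinary cross-entropy; a higher gamma shrinks
# the loss for confident (high p_true) predictions.
print(focal_loss(0.9, 0.0))  # ~0.105 (plain cross-entropy)
print(focal_loss(0.9, 2.0))  # ~0.00105 (down-weighted by (1 - 0.9)^2)
```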
- Robert Turnbull <[email protected]>