Releases: carefree0910/carefree-learn
carefree-learn 0.4.1
Release Notes
We're happy to announce that `carefree-learn` released `v0.4.x`, which is much clearer, much more unified, and also much more lightweight!
Main Changes
In the `v0.3.x` era, `Pipeline`s (e.g., `MLPipeline`, `CVPipeline`) were implemented in a tricky and unpleasant way - they depended too heavily on inheritance, resulting in hundreds of thousands of lines of duplicated / unneeded / workaround code. The same problems also occurred in the `data` module: `MLData` was powerful but nobody could maintain it, and `CVData` utilized third party libraries pretty well but nobody knew how to use it.
In `v0.4.x`, we abstracted out the true `Pipeline` structure: it should simply be a series of `Block`s. When the pipeline runs, each `Block` does its own job; when saving/loading, each `Block` saves/loads its own assets. Under this design, we were able to refactor the original `Pipeline`s into a unified one. What's even more exciting is that the `data` module can also be refactored into the new `Pipeline` structure, like this.
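To make the design concrete, here is a minimal sketch of the idea (the class and method names below are simplified assumptions for illustration, not the actual `carefree-learn` source):

```python
from typing import Any, List


class Block:
    """A self-contained unit: it runs its own job and manages its own assets."""

    def run(self, bundle: Any) -> Any:
        return bundle

    def save(self, folder: str) -> None:
        pass  # each Block saves its own assets

    def load(self, folder: str) -> None:
        pass  # each Block loads its own assets


class Pipeline:
    """A Pipeline is just a series of Blocks."""

    def __init__(self, blocks: List[Block]):
        self.blocks = blocks

    def run(self, bundle: Any) -> Any:
        for block in self.blocks:
            bundle = block.run(bundle)  # each Block does its own job
        return bundle

    def save(self, folder: str) -> None:
        for block in self.blocks:
            block.save(folder)

    def load(self, folder: str) -> None:
        for block in self.blocks:
            block.load(folder)
```

Under this abstraction, data processing steps can be expressed as `Block`s too, which is why the `data` module fits into the same `Pipeline` structure.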
Documentation
`v0.4.x` is still under heavy development, so the documentation is currently out of date. I'll try to update it ASAP; until then, the examples are a good place to start, as they cover quite a few use cases!
carefree-learn 0.3.2
Miscellaneous fixes and updates.
carefree-learn 0.3.1
Miscellaneous fixes and updates.
carefree-learn 0.3.0
We're happy to announce that `carefree-learn` released `v0.3.x`, which made it much more lightweight!
carefree-learn 0.2.5
Miscellaneous fixes and updates.
carefree-learn 0.2.4
Miscellaneous fixes and updates.
carefree-learn 0.2.3
Miscellaneous fixes and updates.
carefree-learn 0.2.2
Miscellaneous fixes and updates.
carefree-learn 0.2.1
Release Notes
We're happy to announce that `carefree-learn` released `v0.2.x`, which made it capable of solving not only tabular tasks, but also other general deep learning tasks!
Introduction
Deep Learning with PyTorch made easy 🚀!
Like many similar projects, `carefree-learn` can be treated as a high-level library that helps with training neural networks in PyTorch. However, `carefree-learn` does more than that.
- `carefree-learn` is highly customizable for developers. We have already wrapped (almost) every single functionality / process into a single module (a Python class), and these modules can be replaced or enhanced either directly from the source code or from local code with the help of some pre-defined functions provided by `carefree-learn` (see Register Mechanism).
- `carefree-learn` supports easy-to-use saving and loading. By default, everything will be wrapped into a `.zip` file, and the `onnx` format is natively supported (see the sketch after this list)!
- `carefree-learn` supports Distributed Training.
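For instance, saving and loading is meant to be a one-liner. Below is a minimal sketch of the intended workflow; the exact entry-point names (`m.save`, `cflearn.api.load`) are assumptions here, so please check the documentation for the precise API:

```python
import cflearn
import numpy as np

x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.api.fit_ml(x, y, carefree=True)

# hypothetical entry points: by design, everything is bundled
# into a single `.zip` file
m.save("model")                 # -> model.zip
m2 = cflearn.api.load("model")  # restore everything from model.zip
```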
Apart from these, `carefree-learn` also has quite a few specific advantages in each area:
Machine Learning 📈
`carefree-learn` provides an end-to-end pipeline for tabular tasks, which AUTOMATICALLY deals with (this part is mainly handled by `carefree-data`, though):

- Detection of redundant feature columns which can be excluded (all SAME, all DIFFERENT, etc.).
- Detection of feature column types (whether a feature column is a string column / numerical column / categorical column).
- Imputation of missing values.
- Encoding of string columns and categorical columns (Embedding or One Hot Encoding).
- Pre-processing of numerical columns (Normalize, Min Max, etc.).
- And much more...
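As a quick illustration of what this buys you, the snippet below feeds a deliberately dirty dataset to `fit_ml` (the dataset itself is made up for illustration; with `carefree=True`, the cleaning is delegated to `carefree-data`):

```python
import cflearn
import numpy as np

x = np.random.random([100, 3])
x[:, 0] = 1.0                              # an all-SAME column -> redundant, can be excluded
x[np.random.rand(100, 3) > 0.9] = np.nan   # missing values -> to be imputed
y = np.random.random([100, 1])

# carefree=True delegates detection / imputation / pre-processing to carefree-data
m = cflearn.api.fit_ml(x, y, carefree=True)
```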
`carefree-learn` can help you deal with almost ANY kind of tabular dataset, no matter how dirty and messy it is. It can be trained directly with some numpy arrays, or indirectly with some files located on your machine. This makes `carefree-learn` stand out from similar projects.

When we say ANY, it means that `carefree-learn` can even train on one single sample.
For example
```python
import cflearn

toy = cflearn.ml.make_toy_model()
data = toy.data.cf_data.converted
print(f"x={data.x}, y={data.y}")  # x=[[0.]], y=[[1.]]
```
This is especially useful when we need to write unit tests or to verify whether our custom modules (e.g. custom pre-processes) are correctly integrated into `carefree-learn`.
For example
```python
import cflearn
import numpy as np

# here we implement a custom processor
@cflearn.register_processor("plus_one")
class PlusOne(cflearn.Processor):
    @property
    def input_dim(self) -> int:
        return 1

    @property
    def output_dim(self) -> int:
        return 1

    def fit(self, columns: np.ndarray) -> cflearn.Processor:
        return self

    def _process(self, columns: np.ndarray) -> np.ndarray:
        return columns + 1

    def _recover(self, processed_columns: np.ndarray) -> np.ndarray:
        return processed_columns - 1

# we need to specify that we use the custom process method to process our labels
toy = cflearn.ml.make_toy_model(cf_data_config={"label_process_method": "plus_one"})
data = toy.data.cf_data
y = data.converted.y
processed_y = data.processed.y
print(f"y={y}, new_y={processed_y}")  # y=[[1.]], new_y=[[2.]]
```
There is one more thing we'd like to mention: `carefree-learn` is Pandas-free. The reasons why we excluded Pandas are listed in `carefree-data`.
Computer Vision 🖼️
`carefree-learn` also provides an end-to-end pipeline for computer vision tasks, and:

- Supports native `torchvision` datasets.

  ```python
  data = cflearn.cv.MNISTData(transform="to_tensor")
  ```

  Currently only `mnist` is supported, but we will add more in the future (if needed)!
- Focuses on the `ImageFolderDataset` for customization, which:
  - Automatically splits the dataset into train & valid.
  - Supports generating labels in parallel, which is very useful when calculating labels is time consuming.

  See the IFD introduction for more details.
- Supports various kinds of `Callback`s, which can be used for saving intermediate visualizations / results (see the sketch after this list).
  - For instance, `carefree-learn` implements an `ArtifactCallback`, which can dump artifacts to disk elaborately during training.
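As a sketch of how a custom callback might look (the `register_callback` decorator, `TrainerCallback` base class, and `after_step` hook below are assumed names for illustration; refer to the source of `ArtifactCallback` for the real interface):

```python
import cflearn

# hypothetical names: the real decorator / base class / hook may differ
@cflearn.register_callback("dump_snapshots")
class DumpSnapshotsCallback(cflearn.TrainerCallback):
    def after_step(self, step: int, trainer) -> None:
        if step % 100 == 0:
            # dump intermediate visualizations / results to disk here
            ...
```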
Examples
Machine Learning 📈:

```python
import cflearn
import numpy as np

x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.api.fit_ml(x, y, carefree=True)
```

Computer Vision 🖼️:

```python
import cflearn

data = cflearn.cv.MNISTData(batch_size=16, transform="to_tensor")
m = cflearn.api.resnet18_gray(10).fit(data)
```
Please refer to Quick Start and Developer Guides for detailed information.
Migration Guide
From `v0.1.x` to `v0.2.x`, the design principles of `carefree-learn` changed in two aspects:

- The `DataLayer` in `v0.1.x` has changed to the more general `DataModule` in `v0.2.x`.
- The `Model` in `v0.1.x`, which was constructed by `pipe`s, has changed to a general `Model`.

These changes were made because we want `carefree-learn` to be compatible with general deep learning tasks (e.g. computer vision tasks).
Data Module
Internally, the `Pipeline` will train & predict on a `DataModule` in `v0.2.x`, but `carefree-learn` also provides useful APIs to keep the user experience as close to `v0.1.x` as possible:
Train
`v0.1.x`:

```python
import cflearn
import numpy as np

x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.make().fit(x, y)
```

`v0.2.x`:

```python
import cflearn
import numpy as np

x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.api.fit_ml(x, y, carefree=True)
```
Predict
| `v0.1.x` | `v0.2.x` |
| --- | --- |
| `predictions = m.predict(x)` | `predictions = m.predict(cflearn.MLInferenceData(x))` |
Evaluate
| `v0.1.x` | `v0.2.x` |
| --- | --- |
| `cflearn.evaluate(x, y, metrics=["mae", "mse"], pipelines=m)` | `cflearn.ml.evaluate(cflearn.MLInferenceData(x, y), metrics=["mae", "mse"], pipelines=m)` |
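Putting the three pieces together, a typical `v0.2.x` workflow (composed entirely from the snippets above) looks like this:

```python
import cflearn
import numpy as np

x = np.random.random([1000, 10])
y = np.random.random([1000, 1])

# train
m = cflearn.api.fit_ml(x, y, carefree=True)

# predict: raw arrays are wrapped into the new data module
predictions = m.predict(cflearn.MLInferenceData(x))

# evaluate
cflearn.ml.evaluate(cflearn.MLInferenceData(x, y), metrics=["mae", "mse"], pipelines=m)
```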
Model
It's not very straightforward to migrate models from `v0.1.x` to `v0.2.x`, so if you require such a migration, feel free to submit an issue and we will analyze the problems case by case!