Skip to content

Commit

Permalink
[docs] updates and improvements for documentation (#940)
Browse files Browse the repository at this point in the history
* added python version badge

* fixed typos

* fixed links

* readthedocs doesn't support links with anchor out of box

* fixed table rendering at ReadTheDocs #776#issuecomment-319851551

* fixed table rendering at ReadTheDocs

* added link to Key-Events page

* fixed links

* hotfix

* fixed markdown
  • Loading branch information
StrikerRUS authored and guolinke committed Sep 29, 2017
1 parent 8aef4bf commit d292512
Show file tree
Hide file tree
Showing 16 changed files with 584 additions and 502 deletions.
6 changes: 4 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@ LightGBM, Light Gradient Boosting Machine
[![Documentation Status](https://readthedocs.org/projects/lightgbm/badge/?version=latest)](https://lightgbm.readthedocs.io/)
[![GitHub Issues](https://img.shields.io/github/issues/Microsoft/LightGBM.svg)](https://github.com/Microsoft/LightGBM/issues)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/Microsoft/LightGBM/blob/master/LICENSE)
[![Python Versions](https://img.shields.io/pypi/pyversions/lightgbm.svg)](https://pypi.python.org/pypi/lightgbm)
[![PyPI Version](https://badge.fury.io/py/lightgbm.svg)](https://badge.fury.io/py/lightgbm)
<!--- # Uncomment after updating PyPI [![Python Versions](https://img.shields.io/pypi/pyversions/lightgbm.svg)](https://pypi.python.org/pypi/lightgbm) -->

LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages:

Expand All @@ -35,7 +35,7 @@ News

05/03/2017 : LightGBM v2 stable release.

04/10/2017 : LightGBM supports GPU-accelerated tree learning now. Please read our [GPU Tutorial](./docs/GPU-Tutorial.md) and [Performance Comparison](./docs/GPU-Performance.md).
04/10/2017 : LightGBM supports GPU-accelerated tree learning now. Please read our [GPU Tutorial](./docs/GPU-Tutorial.md) and [Performance Comparison](./docs/GPU-Performance.rst).

02/20/2017 : Update to LightGBM v2.

Expand All @@ -47,6 +47,8 @@ News

12/02/2016 : Release [**python-package**](https://github.com/Microsoft/LightGBM/tree/master/python-package) beta version, welcome to have a try and provide feedback.

More detailed update logs : [Key Events](https://github.com/Microsoft/LightGBM/blob/master/docs/Key-Events.md).


External (unofficial) Repositories
----------------------------------
Expand Down
30 changes: 15 additions & 15 deletions docs/Advanced-Topic.md
Original file line number Diff line number Diff line change
@@ -1,34 +1,34 @@
# Advanced Topics

## Missing value handle
## Missing Value Handle

* LightGBM enables the missing value handle by default, you can disable it by set ```use_missing=false```.
* LightGBM uses NA (NAN) to represent the missing value by default, you can change it to use zero by set ```zero_as_missing=true```.
* When ```zero_as_missing=false``` (default), the unshown value in sparse matrices (and LightSVM) is treated as zeros.
* When ```zero_as_missing=true```, NA and zeros (including unshown value in sparse matrices (and LightSVM)) are treated as missing.
* When ```zero_as_missing=false``` (default), the unshown value in sparse matrices (and LightSVM) is treated as zeros.
* When ```zero_as_missing=true```, NA and zeros (including unshown value in sparse matrices (and LightSVM)) are treated as missing.

## Categorical feature support
## Categorical Feature Support

* LightGBM can offer a good accuracy when using native categorical features. Not like simply one-hot coding, LightGBM can find the optimal split of categorical features. Such a optimal split can provide the much better accuracy than one-hot coding solution.
* Use `categorical_feature` to specific the categorical features. Refer to the parameter `categorical_feature` in [Parameters](./Parameters.md).
* Need to convert to `int` type first, and only support non-negative numbers. It is better to convert into continues ranges.
* LightGBM can offer a good accuracy when using native categorical features. Not like simply one-hot coding, LightGBM can find the optimal split of categorical features. Such an optimal split can provide the much better accuracy than one-hot coding solution.
* Use `categorical_feature` to specify the categorical features. Refer to the parameter `categorical_feature` in [Parameters](./Parameters.md).
* Converting to `int` type is needed first, and there is support for non-negative numbers only. It is better to convert into continues ranges.
* Use `max_cat_group`, `cat_smooth_ratio` to deal with over-fitting (when #data is small or #category is large).
* For categocal features with high cardinality (#categoriy is large), it is better to convert it to numerical features.
* For categorical features with high cardinality (#category is large), it is better to convert it to numerical features.

## LambdaRank
## LambdaRank

* The label should be `int` type, and larger number represent the higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect).
* The label should be `int` type, and larger numbers represent the higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect).
* Use `label_gain` to set the gain(weight) of `int` label.
* Use `max_position` to set the NDCG optimization position.

## Parameters Tuning

* Refer to [Parameters tuning](./Parameters-tuning.md).
* Refer to [Parameters Tuning](./Parameters-tuning.md).

## GPU support
## GPU Support

* Refer to [GPU Tutorial](./GPU-Tutorial.md) and [GPU Targets](./GPU-Targets.md).
* Refer to [GPU Tutorial](./GPU-Tutorial.md) and [GPU Targets](./GPU-Targets.rst).

## Parallel Learning
## Parallel Learning

* Refer to https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide
* Refer to [Parallel Learning Guide](https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide).
31 changes: 21 additions & 10 deletions docs/FAQ.md
Original file line number Diff line number Diff line change
@@ -1,20 +1,20 @@
LightGBM FAQ
=======================
============

### Catalog

- [Critical](FAQ.md#Critical)
- [LightGBM](FAQ.md#LightGBM)
- [R-package](FAQ.md#R-package)
- [Python-package](FAQ.md#python-package)
- [Critical](#critical)
- [LightGBM](#lightgbm)
- [R-package](#r-package)
- [Python-package](#python-package)

---

### Critical

You encountered a critical issue when using LightGBM (crash, prediction error, non sense outputs...). Who should you contact?

If your issue is not critical, just post an issue [Microsoft/LightGBM repository](https://github.com/Microsoft/LightGBM/issues).
If your issue is not critical, just post an issue in [Microsoft/LightGBM repository](https://github.com/Microsoft/LightGBM/issues).

If it is a critical issue, identify first what error you have:

Expand All @@ -40,7 +40,7 @@ Remember this is a free/open community support. We may not be available 24/7 to

- **Question 1**: Where do I find more details about LightGBM parameters?

- **Solution 1**: Look at [Parameters.md](Parameters.md) and [Laurae++/Parameters](https://sites.google.com/view/lauraepp/parameters) website
- **Solution 1**: Look at [Parameters](./Parameters.md) and [Laurae++/Parameters](https://sites.google.com/view/lauraepp/parameters) website.

---

Expand All @@ -52,7 +52,7 @@ Remember this is a free/open community support. We may not be available 24/7 to

- **Question 3**: When running LightGBM on a large dataset, my computer runs out of RAM.

- **Solution 3**: Multiple solutions: set `histogram_pool_size` parameter to the MB you want to use for LightGBM (histogram_pool_size + dataset size = approximately RAM used), lower `num_leaves` or lower `max_bin` (see [issue #562](https://github.com/Microsoft/LightGBM/issues/562)).
- **Solution 3**: Multiple solutions: set `histogram_pool_size` parameter to the MB you want to use for LightGBM (histogram_pool_size + dataset size = approximately RAM used), lower `num_leaves` or lower `max_bin` (see [Microsoft/LightGBM#562](https://github.com/Microsoft/LightGBM/issues/562)).

---

Expand All @@ -64,7 +64,7 @@ Remember this is a free/open community support. We may not be available 24/7 to

- **Question 5**: When using LightGBM GPU, I cannot reproduce results over several runs.

- **Solution 5**: It is a normal issue, there is nothing we/you can do about, you may try to use `gpu_use_dp = true` for reproducibility (see [issue #560](https://github.com/Microsoft/LightGBM/pull/560#issuecomment-304561654)). You may also use CPU version.
- **Solution 5**: It is a normal issue, there is nothing we/you can do about, you may try to use `gpu_use_dp = true` for reproducibility (see [Microsoft/LightGBM#560](https://github.com/Microsoft/LightGBM/pull/560#issuecomment-304561654)). You may also use CPU version.

---

Expand Down Expand Up @@ -115,7 +115,18 @@ Remember this is a free/open community support. We may not be available 24/7 to
---
- **Question 2**: I see error messages like `Cannot get/set label/weight/init_score/group/num_data/num_feature before construct dataset`, but I already construct dataset by some code like `train = lightgbm.Dataset(X_train, y_train)`, or error messages like `Cannot set predictor/reference/categorical feature after freed raw data, set free_raw_data=False when construct Dataset to avoid this.`.
- **Question 2**: I see error messages like
```
Cannot get/set label/weight/init_score/group/num_data/num_feature before construct dataset
```
but I already construct dataset by some code like
```
train = lightgbm.Dataset(X_train, y_train)
```
or error messages like
```
Cannot set predictor/reference/categorical feature after freed raw data, set free_raw_data=False when construct Dataset to avoid this.
```
- **Solution 2**: Because LightGBM constructs bin mappers to build trees, and train and valid Datasets within one Booster share the same bin mappers, categorical features and feature names etc., the Dataset objects are constructed when construct a Booster. And if you set `free_raw_data=True` (default), the raw data (with python data struct) will be freed. So, if you want to:
Expand Down
183 changes: 0 additions & 183 deletions docs/GPU-Performance.md

This file was deleted.

Loading

0 comments on commit d292512

Please sign in to comment.