feat: add cmdstan-based optimizer for augurs-prophet #121

sd2k · 2024-10-09T18:36:25Z

This commit adds an implementation of Optimizer to the augurs-prophet crate that uses cmdstan to optimize or sample from the Prophet model, similar to the Prophet Python package.

The cmdstan feature is disabled by default since it requires external dependencies and may not work well on all platforms.

The compile-cmdstan feature can be enabled to build the Prophet model at build time. This requires a working installation of stan and setting the STAN_PATH environment variable to the path to the stan installation. This will embed the Prophet model binary and the libtbb dynamic library into the final executable, which will increase the size of the final executable by about 2MB, but the final binary won't require any additional dependencies.

There's an example in the examples directory which fits and predicts using cmdstan. It can be run with something like:

$ STAN_PATH=/home/ben/.conda/envs/stan cargo run --example prophet_cmdstan

(replacing the STAN_PATH value with the path to your Stan installation), or by running

$ cargo run --bin download-stan-model --features download
$ download-stan-model
$ cargo run --example prophet_cmdstan

Summary by CodeRabbit

Release Notes

New Features
- Introduced a command-line interface for downloading and managing the Prophet model.
- Added functionalities for optimizing or sampling from the Prophet model using CmdStan.
- Enhanced documentation with an example for utilizing the Prophet model with CmdStan.
- Added new features for the augurs package to support Prophet functionalities.
- Expanded capabilities of the augurs-prophet crate with new dependencies and features.
- New commands added for downloading the Prophet Stan model and running all tests.
Bug Fixes
- Corrected author email format in project metadata.
Documentation
- Expanded README with usage examples for downloading and using the Prophet Stan model.
- Updated examples to include new features and usage instructions.
Chores
- Updated testing commands to improve clarity and functionality.

This commit adds an implementation of `Optimizer` to the `augurs-prophet` crate that uses `cmdstan` to optimize or sample from the Prophet model, similar to the Prophet Python package. The `cmdstan` feature is disabled by default since it requires external dependencies and may not work well on all platforms. The `compile-cmdstan` feature can be enabled to build the Prophet model at build time. This requires a working installation of `stan` and setting the `STAN_PATH` environment variable to the path to the `stan` installation. This will embed the Prophet model binary and the libtbb dynamic library into the final executable, which will increase the size of the final executable by about 2MB, but the final binary won't require any additional dependencies. There's an example in the `examples` directory which fits and predicts using cmdstan. It can be runwith something like: ```bash STAN_PATH=/home/ben/.conda/envs/stan cargo run --example prophet_cmdstan ``` (replacing the `STAN_PATH` value with the path to your Stan installation).

coderabbitai · 2024-10-10T15:13:25Z

Walkthrough

The pull request includes modifications to several files, primarily focusing on enhancing the Rust workflow, adding dependencies, and improving documentation for the augurs-prophet crate. Key changes involve the addition of a step to download the Prophet Stan model in the CI workflow, updates to the Cargo.toml files to include new dependencies and features, and the introduction of new files for model compilation and command-line interface functionality. Documentation has also been improved with examples and clarifications regarding the usage of the Prophet model.

Changes

File Path	Change Summary
`.github/workflows/rust.yml`	Added step to download Prophet Stan model; updated test command from `just test` to `just test-all`.
`.gitignore`	Added entries for `prophet_stan_model` and `/build`.
`Cargo.toml`	Added dependency `serde_json` and corrected author email; updated `augurs-outlier` dependency.
`crates/augurs-prophet/Cargo.toml`	Added multiple new optional dependencies and features; defined new binary target for model download.
`crates/augurs-prophet/README.md`	Added "Example" section with usage instructions for downloading and using the Prophet Stan model.
`crates/augurs-prophet/build.rs`	Introduced `compile_model` function for compiling the Prophet model with CmdStan.
`crates/augurs-prophet/prophet.stan`	Added functions for modeling time series data with changepoints and various trend types.
`crates/augurs-prophet/src/bin/main.rs`	Implemented CLI for downloading and managing the Prophet model from a PyPI wheel.
`crates/augurs-prophet/src/cmdstan.rs`	Introduced functionality for optimizing/sampling with CmdStan; added error handling and command execution logic.
`crates/augurs-prophet/src/lib.rs`	Added new public module `cmdstan` for accessing CmdStan functionalities.
`crates/augurs-prophet/src/optimizer.rs`	Enhanced serialization capabilities for `Data` and `TrendIndicator`; added test module for serialization.
`crates/augurs-prophet/src/positive_float.rs`	Enabled serialization for `PositiveFloat` struct when "serde" feature is active.
`crates/augurs/Cargo.toml`	Added new features `prophet-cmdstan` and `prophet-compile-cmdstan`.
`examples/forecasting/Cargo.toml`	Updated `augurs` dependency features to include `prophet` and `prophet-cmdstan`.
`examples/forecasting/examples/prophet_cmdstan.rs`	Created example demonstrating usage of Prophet model with CmdStan optimizer.
`justfile`	Updated testing commands and added command for downloading the Prophet Stan model.

Possibly related PRs

feat: add augurs-dtw crate with dynamic time warping implementation #98: Modifications to the Rust workflow and the addition of a step to download the Prophet Stan model, relevant for time series analysis.
feat: add augurs-clustering crate with DBSCAN algorithm #100: Introduction of the augurs-clustering crate, possibly related to enhancements in testing workflows for time series data.
feat: add 'augurs' convenience crate, re-exporting other crates #117: Introduction of the augurs convenience crate, aligning with the main PR's modifications to streamline user experience.
ci: run Rust CI workflow on PRs to non-main bases #122: Changes to the Rust CI workflow, directly related to modifications in the .github/workflows/rust.yml file.

🐰 In the meadow, where models bloom,
A Stan model's now in the room!
With code and tests all set to play,
We hop along the Rusty way.
Dependencies added, examples bright,
Let’s forecast the future, with all our might! 🌼

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 18

🧹 Outside diff range and nitpick comments (19)

crates/augurs-prophet/src/lib.rs (1)
20-21: LGTM! Consider adding documentation for the new module.

The addition of the cmdstan module is well-placed and correctly feature-gated, aligning with the PR objectives. Making it public allows users to access the new functionality when the feature is enabled.

Consider adding a brief doc comment above the module declaration to explain its purpose and when it's available. For example:
/// Provides cmdstan-based optimization for the Prophet model.
/// This module is only available when the `cmdstan` feature is enabled.
#[cfg(feature = "cmdstan")]
pub mod cmdstan;
justfile (2)

11-15: Improved test command structure and exclusions.

The changes to the test command are well-structured and align with the PR objectives. The use of cargo nextest and the exclusion of augurs-js and pyaugurs packages are good improvements.

Consider adding a comment explaining why augurs-js and pyaugurs are excluded to improve maintainability for future contributors.

40-42: New command added for downloading Prophet Stan model.

The addition of the download-prophet-stan-model command is a good implementation that aligns with the PR objectives. It provides a convenient way to download the required Stan model for the new cmdstan-based optimization feature.

Consider adding a brief comment explaining the purpose and any prerequisites for using this command, similar to the helpful comments provided for other commands in this file.
crates/augurs-prophet/src/positive_float.rs (1)
5-5: LGTM! Consider adding documentation for the new serde feature.

The addition of the serde feature for serialization and deserialization is a good improvement. It allows users to opt-in to this functionality when needed, which is consistent with Rust's philosophy of zero-cost abstractions.

Consider adding a brief documentation comment above the struct to mention the available features:
/// A positive-only, 64 bit precision floating point number.
///
/// # Features
///
/// - When the "bytemuck" feature is enabled, this type implements `bytemuck::Pod` and `bytemuck::Zeroable`.
/// - When the "serde" feature is enabled, this type can be serialized and deserialized using serde.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
#[cfg_attr(feature = "bytemuck", derive(bytemuck::Pod, bytemuck::Zeroable))]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct PositiveFloat(f64);
This documentation will help users understand the available features and their effects on the PositiveFloat type.
.github/workflows/rust.yml (1)
37-40: Approve the new step with suggestions for improvement.

The addition of the step to download the Prophet Stan model is a good improvement to ensure the necessary dependencies are available for testing. However, consider the following enhancements:

Provide more specific information in the comment about which example requires this model.

Add error handling or a verification step to ensure the download was successful.

Consider caching the downloaded model to speed up future workflow runs.

Here's a suggested improvement:
      - name: Download Prophet Stan model
        # Download the Prophet Stan model required for the cmdstan optimizer example
        run: |
          just download-prophet-stan-model
          if [ $? -ne 0 ]; then
            echo "Failed to download Prophet Stan model"
            exit 1
          fi
        
      - name: Cache Prophet Stan model
        uses: actions/cache@v2
        with:
          path: path/to/prophet/stan/model
          key: ${{ runner.os }}-prophet-stan-model
Replace path/to/prophet/stan/model with the actual path where the model is downloaded.
crates/augurs-prophet/Cargo.toml (1)

36-44: New features are well-defined and align with PR objectives.

The addition of cmdstan, compile-cmdstan, download, and serde features with their respective dependencies is appropriate and provides flexibility for users.

The internal-ignore-cmdstan-failure feature is well-documented, but its potential for misuse is concerning.

Consider renaming internal-ignore-cmdstan-failure to internal-ignore-cmdstan-failure-dev-only to further emphasize its intended use case and potential risks.
Cargo.toml (1)
48-48: Consider using a more flexible version specifier for serde_json.

The addition of serde_json is appropriate and aligns with the PR objectives. However, the current version specifier "1.0.128" is very specific. In Rust, it's common to use more flexible version specifiers to allow for compatible updates.

Consider changing the version specifier to "1.0" or "^1.0.128". This will allow for minor version updates that maintain backwards compatibility:
-serde_json = "1.0.128"
+serde_json = "1.0"
or
-serde_json = "1.0.128"
+serde_json = "^1.0.128"
examples/forecasting/examples/prophet_cmdstan.rs (4)
1-11: Enhance documentation with additional context and feature information.

The documentation provides clear instructions for running the example. However, consider adding the following improvements:

Brief explanation of the example's purpose (demonstrating Prophet model with CmdStan optimizer).

Mention the compile-cmdstan feature and its significance.

Note about the increase in executable size when using this feature.

This additional information will provide users with a more comprehensive understanding of the example and its implications.

12-39: LGTM! Consider extracting data preparation into a separate function.

The imports and data preparation look good. The TrainingData is correctly initialized with timestamps, values, and additional regressors.

To improve readability, consider extracting the data preparation logic into a separate function. This would make the main function cleaner and easier to understand at a glance.

Here's a suggested refactor:
fn prepare_training_data() -> Result<TrainingData, Box<dyn std::error::Error>> {
    let ds = vec![
        1704067200, 1704871384, 1705675569, 1706479753, 1707283938, 1708088123, 1708892307,
        1709696492, 1710500676, 1711304861, 1712109046, 1712913230, 1713717415,
    ];
    let y = vec![
        1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0,
    ];
    Ok(TrainingData::new(ds, y)?
        .with_regressors(HashMap::from([
            (
                "foo".to_string(),
                vec![1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
            ),
            (
                "bar".to_string(),
                vec![4.0, 5.0, 6.0, 4.0, 5.0, 6.0, 4.0, 5.0, 6.0, 4.0, 5.0, 6.0, 4.0],
            ),
        ]))?)
}

// In main():
let data = prepare_training_data()?;
41-43: Enhance comments for CmdstanOptimizer initialization options.

The initialization of CmdstanOptimizer looks good. To improve clarity, consider expanding the comment about the embedded version to explain when to use each approach. This will help users understand the trade-offs between the two options.

Here's a suggested improvement for the comments:
// Initialize CmdstanOptimizer with a path to the precompiled Prophet model binary.
// Use this approach when you want to manage the Stan model separately or when you need more control over the model version.
let cmdstan = CmdstanOptimizer::with_prophet_path("prophet_stan_model/prophet_model.bin")?;

// Alternatively, use the embedded version when you want to include the Stan model in your binary.
// This increases the binary size but simplifies deployment. Requires the `compile-cmdstan` feature to be enabled.
// let cmdstan = CmdstanOptimizer::new_embedded();
45-49: LGTM! Consider adding error handling for regressor addition.

The Prophet model setup and fitting look good. The regressors are added correctly, and the fit method is called with default options.

To improve robustness, consider adding error handling when adding regressors. Although it's unlikely to fail in this context, it's a good practice to handle potential errors, especially in example code.

Here's a suggested improvement:
let mut prophet = Prophet::new(Default::default(), cmdstan);
prophet.add_regressor("foo".to_string(), Regressor::additive()).map_err(|e| format!("Failed to add 'foo' regressor: {}", e))?;
prophet.add_regressor("bar".to_string(), Regressor::additive()).map_err(|e| format!("Failed to add 'bar' regressor: {}", e))?;

prophet.fit(data, Default::default())?;
crates/augurs-prophet/README.md (4)
6-44: Great addition of the "Example" section!

The new example provides clear instructions and a comprehensive code snippet that demonstrates key functionalities of the augurs-prophet crate. It's well-structured and follows Rust best practices.

Consider adding a brief explanation of the output format after line 41. This would help users understand what to expect from the println! statement.

🧰 Tools

🪛 Markdownlint

11-11: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

12-12: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

58-63: Improved clarity in the "Optimizers" section

The updates to this section provide more detailed information about the Optimizer and Sampler traits, as well as clear explanations of the cmdstan module and its features. This information is crucial for users to understand how to use the library effectively.

Consider adding a brief note about the performance implications or trade-offs of using the compile-cmdstan feature. This could help users make an informed decision about whether to enable this feature.

Also applies to: 78-85

11-13: Consider addressing Markdownlint warnings

The Markdownlint tool has flagged lines 11 and 12 for using dollar signs before commands without showing output (MD014). While not critical, addressing these could improve the README's adherence to Markdown standards.

You can address this by either:

Removing the $ signs from the beginning of the commands, or

Adding example output after each command.

For example:
```sh
cargo install --bin download-stan-model --features download augurs-prophet
download-stan-model
<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 Markdownlint</summary><blockquote>

11-11: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

---

12-12: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

</blockquote></details>

</details>

---

Line range hint `1-85`: **Excellent improvements to the README**

The changes to this README significantly enhance the documentation of the `augurs-prophet` crate. The addition of the example section and the clarifications in the Optimizers section provide users with a much better understanding of how to use the crate and its features.



Consider adding a "Quick Start" section at the beginning of the README, just after the initial description. This section could include:
1. A brief description of what the crate does
2. Installation instructions (including how to enable the `cmdstan` feature)
3. A minimal example of how to use the crate

This would provide an even quicker entry point for users who want to get started immediately.

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 Markdownlint</summary><blockquote>

11-11: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

---

12-12: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

</blockquote></details>

</details>

</blockquote></details>
<details>
<summary>crates/augurs-prophet/build.rs (1)</summary><blockquote>

`75-77`: **Clarify the Usage of `internal-ignore-cmdstan-failure` Feature**

The comment suggests that no one should use this feature, yet it affects the build behavior.



Consider rephrasing the comment to clarify its intended purpose and usage:

```diff
-// This is a complete hack but lets us get away with still using
-// the `--all-features` flag of Cargo without everything failing
-// if there isn't a Stan installation.
-// Basically, if have this feature enabled, skip any failures in
-// the build process and just create some empty files.
-// This will cause things to fail at runtime if there isn't a Stan
-// installation, but that's okay because no-one should ever use this
-// feature.
+// The `internal-ignore-cmdstan-failure` feature is used to bypass build failures
+// when Stan is not installed, allowing us to use `--all-features` during development.
+// It creates placeholder files to prevent build errors, but will cause runtime
+// failures if Stan is required. This feature should only be used for internal
+// development purposes and not in production.
This provides clearer guidance on when and why this feature should be utilized.
crates/augurs-prophet/prophet.stan (1)

117-126: Ensure trend_indicator covers all possible cases

The transformed parameters block assigns trend based on trend_indicator. To prevent unintended behavior, consider adding an else clause or validate trend_indicator to ensure it only takes values 0, 1, or 2.

crates/augurs-prophet/src/optimizer.rs (1)

357-442: Consider adding negative tests for serialization failures

To strengthen the test coverage, consider adding tests that verify serialization failure cases, such as when mandatory fields are missing or contain invalid data.
crates/augurs-prophet/src/cmdstan.rs (1)
396-435: Avoid panics by returning errors in new_embedded

In the new_embedded method, the code uses expect and could panic if creating the temporary directory or writing the Prophet model binary fails. Relying on panics for error handling is not recommended.

Consider returning a Result<Self, Error> instead of panicking. This allows the calling code to handle errors gracefully.
-    pub fn new_embedded() -> Self {
+    pub fn new_embedded() -> Result<Self, Error> {
         static PROPHET_INSTALLATION: std::sync::LazyLock<ProphetInstallation> =
             std::sync::LazyLock::new(|| {
-                let dir = tempfile::tempdir().expect("could not create temporary directory");
+                let dir = match tempfile::tempdir() {
+                    Ok(dir) => dir,
+                    Err(e) => return Err(Error::CreateTempDir(e)),
+                };
                 // Remaining code...
             });
-        Self {
+        Ok(Self {
             prophet_installation: PROPHET_INSTALLATION.clone(),
-        }
+        })
     }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d9f79a7 and 59b5547.

📒 Files selected for processing (18)

.github/workflows/rust.yml (1 hunks)
.gitignore (1 hunks)
Cargo.toml (1 hunks)
crates/augurs-outlier/Cargo.toml (1 hunks)
crates/augurs-prophet/.gitignore (1 hunks)
crates/augurs-prophet/Cargo.toml (1 hunks)
crates/augurs-prophet/README.md (3 hunks)
crates/augurs-prophet/build.rs (1 hunks)
crates/augurs-prophet/prophet.stan (1 hunks)
crates/augurs-prophet/src/bin/main.rs (1 hunks)
crates/augurs-prophet/src/cmdstan.rs (1 hunks)
crates/augurs-prophet/src/lib.rs (1 hunks)
crates/augurs-prophet/src/optimizer.rs (6 hunks)
crates/augurs-prophet/src/positive_float.rs (1 hunks)
crates/augurs/Cargo.toml (1 hunks)
examples/forecasting/Cargo.toml (1 hunks)
examples/forecasting/examples/prophet_cmdstan.rs (1 hunks)
justfile (2 hunks)

✅ Files skipped from review due to trivial changes (2)

.gitignore
crates/augurs-prophet/.gitignore

🧰 Additional context used

🪛 Markdownlint

crates/augurs-prophet/README.md

11-11: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

12-12: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

🔇 Additional comments (24)

examples/forecasting/Cargo.toml (1)
14-14: LGTM! Consider adding explanatory comments.

The addition of "prophet" and "prophet-cmdstan" features to the augurs dependency aligns with the PR objectives. This change enables the use of the new cmdstan-based optimizer for augurs-prophet in this example.

However, please consider the following points:

Adding these features might increase compilation time and binary size, especially the "prophet-cmdstan" feature which likely involves external dependencies.

It would be helpful to add a comment explaining the purpose of these new features and any requirements (like setting STAN_PATH) for users who might want to run this example.

Could you confirm if this example is specifically intended to demonstrate the cmdstan functionality? If not, you might want to create a separate example for cmdstan or make it optional in this one.

To verify the impact of these changes, you can run the following script:

✅ Verification successful

Verification Successful

The prophet_cmdstan.rs example in the examples/forecasting/examples/ directory confirms that this example specifically demonstrates the cmdstan functionality. The review comment is verified.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check if there's a new example demonstrating cmdstan functionality

# Test: Search for new prophet or cmdstan related examples
rg --type rust 'prophet.*cmdstan' examples
Length of output: 382
crates/augurs-outlier/Cargo.toml (1)

29-29: LGTM! Good practice for workspace dependency management.

The change from serde_json = "1.0.128" to serde_json.workspace = true in the [dev-dependencies] section is a positive improvement. This modification:

Centralizes version management in the workspace root.

Ensures consistency of the serde_json version across all crates in the workspace.

Simplifies future updates of this dependency.

This change aligns well with Rust's workspace best practices and will make maintenance easier in the long run.

justfile (2)

17-27: Well-structured comprehensive test command added.

The new test-all command is a valuable addition, providing a way to run a comprehensive suite of tests while appropriately excluding certain packages and features. The comment explaining the exclusion of iai tests is particularly helpful.

Line range hint 29-36: Consistent updates to documentation commands.

The modifications to the doctest and doc commands, excluding augurs-js and pyaugurs, maintain consistency with the testing commands. The added comment provides valuable context for these exclusions.
crates/augurs/Cargo.toml (1)
38-39: New features for Prophet cmdstan integration look good.

The addition of prophet-cmdstan and prophet-compile-cmdstan features aligns well with the PR objectives. These features correctly depend on the corresponding features in the augurs-prophet crate, allowing users to opt-in to the cmdstan-based optimizer functionality.

A few points to consider:

The feature names are clear and follow the existing naming convention in the file.

The dependencies are correctly specified, pointing to the augurs-prophet crate.

These optional features provide flexibility for users who need the cmdstan functionality without affecting those who don't.

To ensure these new features are properly integrated, let's verify the augurs-prophet crate's Cargo.toml file:

✅ Verification successful

Verified the presence of cmdstan and compile-cmdstan features in crates/augurs-prophet/Cargo.toml.

Both features are correctly defined and align with the modifications in crates/augurs/Cargo.toml.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the presence of cmdstan and compile-cmdstan features in augurs-prophet crate

# Test: Check for cmdstan and compile-cmdstan features in augurs-prophet/Cargo.toml
# Expect: Both features should be present
rg --type toml 'cmdstan|compile-cmdstan' crates/augurs-prophet/Cargo.toml
Length of output: 262
.github/workflows/rust.yml (1)

42-42: Approve the test command change with a request for clarification.

The change from just test to just test-all suggests a more comprehensive testing approach, which is good for ensuring thorough coverage. This aligns with the PR objectives mentioning the use of --all-features.

Could you please clarify:

Does just test-all internally use the --all-features flag?

How does this change affect the build time and test coverage?

Are there any specific reasons why we're moving to test all features by default in CI?

To verify the changes in the testing command, you can run the following script:

This script will help us understand the differences between the test and test-all commands and verify if --all-features is being used.

crates/augurs-prophet/Cargo.toml (3)

18-20: New dependencies look good and align with PR objectives.

The addition of serde, serde_json, tempfile, ureq, and zip as optional dependencies is appropriate for the cmdstan-based optimization feature. Marking them as optional keeps the base package lightweight, which is a good practice.

Also applies to: 22-23

26-26: Dev-dependency addition is appropriate.

Adding the augurs crate as a dev-dependency is a good choice for testing and examples. Using the workspace version ensures consistency across the project.

31-32: Build-dependency addition is appropriate.

Adding tempfile as an optional build dependency aligns with the PR objectives, likely needed for compiling the cmdstan model during the build process. The explicit version specification is good for reproducibility.
Cargo.toml (2)
Line range hint 1-1: Summary of changes in Cargo.toml

The changes to this file are minor but important:

Addition of serde_json dependency, which aligns with the PR objectives.

Correction of an author's email address for proper attribution.

These changes improve dependency management and maintain accurate project metadata. The rest of the file structure and configurations appear to be standard for a Rust workspace.

Line range hint 1-1: Verify the author's email address correction.

The correction of the author's email address is a good catch. However, this change is not visible in the provided code snippet.

Please run the following script to verify the email address correction:

✅ Verification successful

Please run the following script to further verify the author's email address correction:

Author's email address has been correctly updated.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the author's email address correction in Cargo.toml

# Test: Check if the corrected email address is present
grep -n '[email protected]>' Cargo.toml

# Expected output: A line number followed by the corrected email address
# If no output is produced, the correction may not have been applied
Length of output: 45

Script:
# Description: Verify the author's email address correction in Cargo.toml without trailing '>'

# Test: Check if the corrected email address is present without '>'
grep -n '[email protected]' Cargo.toml

# Expected output: A line number followed by the corrected email address
# If no output is produced, the correction may not have been applied or is formatted differently
Length of output: 83
examples/forecasting/examples/prophet_cmdstan.rs (1)

1-56: Overall, good example with room for minor improvements.

This example effectively demonstrates the usage of the Prophet model with the CmdStan optimizer. The code is well-structured and covers the essential steps: data preparation, model initialization, fitting, and prediction.

Key points for improvement:

Enhance documentation with more context about the example's purpose and feature implications.

Extract data preparation into a separate function for better readability.

Improve comments around CmdstanOptimizer initialization options.

Add error handling for regressor addition and prediction steps.

Provide more detailed and informative output of predictions.

Implementing these suggestions will make the example more robust, informative, and easier to understand for users of the augurs-prophet crate.

crates/augurs-prophet/prophet.stan (6)

7-27: Implementation of get_changepoint_matrix appears correct

The get_changepoint_matrix function correctly constructs the changepoint indicator matrix A. The implementation properly handles the indexing and updates A as intended.

31-46: logistic_gamma function is correctly calculating adjusted offsets

The logistic_gamma function accurately computes the adjusted offsets for piecewise continuity in the logistic trend model. The use of append_row and cumulative_sum is appropriate for generating k_s.

48-62: logistic_trend function is correctly implemented

The logistic_trend function effectively calculates the logistic trend using the parameters and the changepoint matrix A. The dimensionality of operations aligns correctly.

66-75: linear_trend function correctly computes the linear trend

The linear_trend function correctly models the linear trend component by incorporating trend adjustments through delta and the changepoint matrix A.

79-84: flat_trend function is correctly generating a constant trend

The flat_trend function appropriately returns a constant vector m of length T, representing a flat trend.

87-101: Data block variables are properly defined

The variables in the data block are correctly declared with appropriate types and constraints. This sets a solid foundation for the model.

crates/augurs-prophet/src/optimizer.rs (4)

30-31: Conditional derivation of serde::Serialize and serde::Deserialize

The use of #[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))] for InitialParams is correct and ensures that serialization traits are only derived when the serde feature is enabled.

152-167: Ensure all fields are serialized for Data struct

The custom serialization for Data correctly serializes all fields, including the custom X field. Ensure that any new fields added to Data in the future are also included in the serialize_struct call to maintain consistency.

182-191: Correct implementation of fmt::Display for Algorithm

The fmt::Display implementation for Algorithm accurately maps each variant to its lowercase string representation, which enhances usability when converting the enum to a string.

366-370: Clarify construction of sigmas with PositiveFloat

In the test, the sigmas vector is constructed using try_into().unwrap(). While acceptable in tests, using unwrap() can cause panics if the conversion fails. Ensure that the values are guaranteed to be positive to prevent panics.

crates/augurs-prophet/src/cmdstan.rs (2)

417-423: Set executable permissions on non-Unix platforms

Currently, the code sets executable permissions for the Prophet binary only on Unix-like systems using std::os::unix::fs::PermissionsExt. This means that on Windows, the binary might lack the necessary executable permissions.

Check if the Prophet binary requires executable permissions to run on Windows and adjust the code accordingly. You might need to handle permissions using std::os::windows::fs::PermissionsExt on Windows platforms.

271-276: Handle potential parsing errors when reading output

In the parse_output method, if the data row contains unexpected values or formatting issues, parsing with val.parse() might fail, leading to a runtime error.

Ensure that the code gracefully handles parsing errors. You could collect parsing results and report any issues with specific parameters.

crates/augurs-prophet/Cargo.toml

examples/forecasting/examples/prophet_cmdstan.rs

crates/augurs-prophet/build.rs

coderabbitai · 2024-10-10T15:22:45Z

crates/augurs-prophet/build.rs

+    let libtbb_path = stan_path.join("lib/libtbb.so.12");
+    let dest_libtbb_path = build_dir.join("libtbb.so.12");
+    std::fs::copy(&libtbb_path, &dest_libtbb_path)?;


⚠️ Potential issue

Handle Dynamic Library Version Flexibly

Hardcoding the library name libtbb.so.12 may lead to compatibility issues if the version changes.

Consider modifying the code to dynamically find the libtbb library:

Option 1: Use glob patterns to match any version of the shared library.

use glob::glob; let libtbb_pattern = stan_path.join("lib/libtbb.so.*"); let libtbb_path = glob(&libtbb_pattern.to_string_lossy())? .filter_map(Result::ok) .next() .ok_or("libtbb.so not found")?;

Option 2: Allow the library name to be configurable or document the required version explicitly.

This ensures that your build script adapts to different versions of the libtbb library.

crates/augurs-prophet/build.rs

crates/augurs-prophet/src/optimizer.rs

coderabbitai · 2024-10-10T15:22:47Z

crates/augurs-prophet/src/cmdstan.rs

+        if let Ok(ld_library_path) = std::env::var("LD_LIBRARY_PATH") {
+            let path = format!("{}:{}", ld_library_path, self.lib_dir.display());
+            cmd.env("LD_LIBRARY_PATH", path);
+        }
+        // Otherwise, just use our lib_dir.
+        else {
+            cmd.env("LD_LIBRARY_PATH", &self.lib_dir);
+        }


⚠️ Potential issue

Handle LD_LIBRARY_PATH in a cross-platform manner

The current implementation sets the LD_LIBRARY_PATH environment variable, which is specific to Unix-like systems. This might cause issues on non-Unix platforms like Windows, where the equivalent is the PATH environment variable.

Consider using the dylib crate or conditional compilation to handle library paths in a platform-independent way. This ensures that the library directories are correctly set regardless of the operating system.

crates/augurs-prophet/src/cmdstan.rs

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)

crates/augurs-prophet/src/optimizer.rs (2)
127-173: LGTM: Custom serialization for Data struct

The custom serialization implementation for the Data struct is well-implemented and correctly handles the serialization of the X field as a nested array. The XSerializer is a good approach to flatten the X vector into the required format.

One minor suggestion:
Consider extracting the XSerializer into a separate struct outside of the serialize method for better code organization and potential reuse.

365-446: LGTM with minor suggestion: serialize_data test

The serialize_data test is well-implemented and thoroughly verifies the serialization of the Data struct, including the correct nested array serialization of the X field.

Minor suggestion:
There's a small inconsistency in the test data. S is set to 2, but t_change has 3 elements. Consider adjusting one of these to match:
S: 3,
t_change: vec![0.0, 0.0, 0.0],
Or:
S: 2,
t_change: vec![0.0, 0.0],
This will ensure that the test data is internally consistent.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 59b5547 and 5682c3b.

📒 Files selected for processing (1)

crates/augurs-prophet/src/optimizer.rs (6 hunks)

🧰 Additional context used

🔇 Additional comments (3)

crates/augurs-prophet/src/optimizer.rs (3)
30-30: LGTM: Conditional serde derive for InitialParams

The addition of conditional serde derive for InitialParams is a good practice. It allows the struct to be serialized when the "serde" feature is enabled, providing flexibility for users who need serialization capabilities.

186-195: LGTM: Display implementation for Algorithm enum

The Display implementation for the Algorithm enum is well-implemented. It provides a lowercase string representation of the algorithm names, which is useful for generating human-readable output or interfacing with systems that expect lowercase algorithm names.

55-84: 🛠️ Refactor suggestion

Consider using serde_repr or string serialization for TrendIndicator

While the current implementation is functional, it can be error-prone if new variants are added or the order changes. As suggested in a previous review, consider using the serde_repr crate for more maintainable serialization:
use serde_repr::*;

#[derive(Serialize_repr, Deserialize_repr)]
#[repr(u8)]
pub enum TrendIndicator {
    Linear = 0,
    Logistic = 1,
    Flat = 2,
}
Alternatively, you could serialize the enum as strings for better readability:
#[derive(Serialize, Deserialize)]
pub enum TrendIndicator {
    #[serde(rename = "linear")]
    Linear,
    #[serde(rename = "logistic")]
    Logistic,
    #[serde(rename = "flat")]
    Flat,
}
These approaches would be more robust and easier to maintain.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)

crates/augurs-prophet/src/optimizer.rs (2)
127-175: LGTM: Custom serialization for Data struct

The custom serialization for the Data struct, including the XSerializer, is well-implemented and correctly handles the specific format requirements. The division by zero check in XSerializer is a good safety measure.

One minor suggestion:

Consider using const for the number of fields in the Data struct:
const DATA_FIELD_COUNT: usize = 13;
let mut s = serializer.serialize_struct("Data", DATA_FIELD_COUNT)?;
This would make it easier to maintain if the number of fields changes in the future.

366-448: LGTM: Test for Data serialization

The serialize_data test is well-structured and correctly verifies the serialization of the Data struct. The use of pretty_assertions is a good choice for clear error messages.

One suggestion to improve the test:

Consider parameterizing the test with different Data instances to cover more cases, especially edge cases. For example:
#[test]
fn serialize_data_parameterized() {
    let test_cases = vec![
        (Data { /* ... */ }, r#"{ /* expected JSON */ }"#),
        // Add more test cases here
    ];

    for (data, expected_json) in test_cases {
        let serialized = serde_json::to_string_pretty(&data).unwrap();
        pretty_assertions::assert_eq!(serialized, expected_json);
    }
}
This approach would make it easier to add more test cases in the future and ensure comprehensive coverage of the serialization logic.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 5682c3b and fd0c5a3.

📒 Files selected for processing (1)

crates/augurs-prophet/src/optimizer.rs (6 hunks)

🧰 Additional context used

🔇 Additional comments (4)

crates/augurs-prophet/src/optimizer.rs (4)
30-30: LGTM: Conditional serde derive for InitialParams

The addition of conditional serde derive for InitialParams is correct and follows best practices for optional feature support.

188-197: LGTM: Display implementation for Algorithm enum

The Display implementation for the Algorithm enum is correct and provides a useful lowercase string representation of each variant.

Line range hint 1-448: Overall assessment: Well-implemented serialization support

The changes to crates/augurs-prophet/src/optimizer.rs successfully add serialization support to key structures and improve the overall code organization. The custom serialization for the Data struct is particularly well-implemented, addressing the specific requirements for the X field.

Key points:

Conditional serde derive for InitialParams is correct.

Custom serialization for TrendIndicator works but could be simplified using serde_repr.

Data struct serialization is well-implemented with proper handling of the X field.

Display implementation for Algorithm enum is correct and useful.

The new test for Data serialization is comprehensive but could be expanded for more coverage.

These changes significantly enhance the interoperability of the augurs-prophet crate with external systems that expect specific JSON formats.

55-84: 🛠️ Refactor suggestion

Consider using serde_repr for enum serialization

The custom serialization for TrendIndicator is correct, but it can be simplified using the serde_repr crate. This would reduce boilerplate and make the code more maintainable.

Consider replacing the custom implementations with:
use serde_repr::{Serialize_repr, Deserialize_repr};

#[cfg_attr(feature = "serde", derive(Serialize_repr, Deserialize_repr))]
#[repr(u8)]
pub enum TrendIndicator {
    Linear = 0,
    Logistic = 1,
    Flat = 2,
}
Don't forget to add serde_repr to your dependencies in Cargo.toml.

sd2k · 2024-10-10T16:17:15Z

This almost certainly won't work as-is on Windows but I have no way of testing that right now, and don't plan on using it on Windows any time soon. Contributions to fix that would be welcome!

sd2k force-pushed the prophet-cmdstan branch 4 times, most recently from 320e955 to 2dfa468 Compare October 10, 2024 14:22

Base automatically changed from prophet to main October 10, 2024 14:30

sd2k added 7 commits October 10, 2024 15:32

Add download bin, rework features to be less annoying

528856c

Create empty files if compile step fails in build script

564f478

Run examples and benchmarks (except iai ones) in CI

df699b6

Handle lib dir more robustly and document better

a80e9bc

Add example to Prophet README

032b508

Correctly serialize X (regressors) as nested sequences

59b5547

sd2k force-pushed the prophet-cmdstan branch from 2dfa468 to 59b5547 Compare October 10, 2024 15:13

coderabbitai bot reviewed Oct 10, 2024

View reviewed changes

Handle division by zero in X serialization

fd0c5a3

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

sd2k force-pushed the prophet-cmdstan branch from 5682c3b to fd0c5a3 Compare October 10, 2024 15:59

Use correct env name in rerun-if-env-changed

02c3b17

coderabbitai bot reviewed Oct 10, 2024

View reviewed changes

Use 'OUT_DIR' instead of 'build'

3396905

sd2k merged commit 2d994bd into main Oct 10, 2024
20 checks passed

sd2k deleted the prophet-cmdstan branch October 10, 2024 16:17

sd2k mentioned this pull request Oct 10, 2024

chore: release v0.4.0 #123

Closed

This was referenced Oct 11, 2024

feat: capture stdout/stderr from cmdstan and emit tracing events #124

Merged

feat: add prophet-wasmstan component and JS package stubs #126

Merged

This was referenced Oct 16, 2024

chore: release v0.4.0 #128

Merged

chore: release v0.4.1 #132

Merged

chore: release v0.4.2 #133

Merged

sd2k mentioned this pull request Oct 16, 2024

chore: release v0.4.3 #134

Merged

This was referenced Oct 25, 2024

chore: add benchmark for Prophet #140

Merged

fix: correctly pass cap_scaled to piecewise_logistic #142

Merged

fix(wasmstan): use Stan's json_data var context #145

Merged

This was referenced Nov 7, 2024

feat: add wasmtime based optimizer #146

Merged

Add Prophet WASM example #152

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add cmdstan-based optimizer for augurs-prophet #121

feat: add cmdstan-based optimizer for augurs-prophet #121

sd2k commented Oct 9, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 10, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Oct 10, 2024

coderabbitai bot Oct 10, 2024

coderabbitai bot left a comment

coderabbitai bot left a comment

sd2k commented Oct 10, 2024

feat: add cmdstan-based optimizer for augurs-prophet #121

feat: add cmdstan-based optimizer for augurs-prophet #121

Conversation

sd2k commented Oct 9, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

Release Notes

coderabbitai bot commented Oct 10, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 10, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 10, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

sd2k commented Oct 10, 2024

sd2k commented Oct 9, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 10, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)