
Fake data Issue #354 #360

Open · wants to merge 19 commits into main

Conversation

VikramsDataScience

Pull Request

Issue #354

Description

  • Changed the .pre-commit-config.yaml file to use Python 3.11.5.
  • I also noticed that requirements.txt didn't have the pre-commit library listed, so I modified it to include this library. From what I can tell it conforms to their best practice (https://pre-commit.com/#install). I hope that's okay? Please let me know if you want me to revert, though :).
  • Made changes to all 6 routes of the API to incorporate an is_fake() environment check in the gsp.py module. I've attempted to mimic the existing style of the module. @peterdudfield, could you let me know if this is along the lines of what you're after, please? Based on your feedback, I can make the necessary changes in follow-up commits :).
  • For the fake forecast generation, I've used the make_fake_forecast() function from the nowcasting_datamodel.fake module. There's also the 'plural' make_fake_forecasts() function available, which seems to call the same make_fake_forecast() function but for multiple grid supply point IDs. Could you also advise which is correct, please?
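For reference, the is_fake() environment check described above could look something like this minimal sketch. The variable name FAKE and the accepted truthy values are my assumptions, not necessarily what gsp.py uses:

```python
import os


def is_fake() -> bool:
    """Return True when the API should serve fake data.

    Assumes a FAKE environment variable enables fake mode;
    the actual variable name used in gsp.py may differ.
    """
    return os.getenv("FAKE", "0").lower() in ("1", "true", "t", "yes")
```

Routes could then branch on is_fake() and call make_fake_forecast() instead of querying the database.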

Fixes #

How Has This Been Tested?

  • For testing purposes I've slightly modified the relative imports in auth_utils, cache.py, database.py, pydantic_models, and utils.py modules to allow testing the fake data in all the API routes of the gsp.py module. If all goes well and you're happy with the changes in the PR, I can revert the changes in the relative imports of the other modules prior to approval/merge.
  • I've also set the sys path so that the nowcasting_datamodel repo can be used for relative imports
  • From what I can tell, the changes run on my local machine, with the exception of the following warnings. These seem unrelated to the fake data changes and are more likely due to updates needed for the pydantic library, though I'm not sure:
    [screenshot: pydantic deprecation warnings]
  • Yes
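The sys.path change mentioned above can be sketched as follows. The assumption that the nowcasting_datamodel repo is checked out as a sibling directory is mine, not something stated in the PR:

```python
import sys
from pathlib import Path

# Assumed layout: the nowcasting_datamodel checkout sits next to this repo
datamodel_path = Path.cwd().parent / "nowcasting_datamodel"
if str(datamodel_path) not in sys.path:
    sys.path.insert(0, str(datamodel_path))
```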

If your changes affect data processing, have you plotted any changes? i.e. have you done a quick sanity check?
I haven't plotted any of the changes, other than running the test shown in the screenshot above. But please advise if there's anything specific you'd like to see?

  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation. Not sure if any documentation changes are needed for this change, but I've added a docstring to the is_fake() function.
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@peterdudfield
Collaborator

Thanks so much @VikramsDataScience for this. Really appreciate you doing this.

I wonder if there's a few more things you could do?

  1. Could you check that when you run things locally, the routes you modified work? And that they return the expected object? You might find some are empty?

  2. Would you be able to add a test or two, to check they work? This means in the future we can be really confident about this working.

  3. Bonus: If you fancy running the front end using https://github.com/openclimatefix/quartz-frontend/tree/main/apps/nowcasting-app and seeing if the app looks like it would in live. You can access the live service at app.quartz.solar

Thanks once again for doing all this work, really appreciate that you're doing this. No pressure to do this extra work, just say.

@VikramsDataScience
Author

Hey @peterdudfield. Sure thing. I'm happy to do it :)!

If it's cool with you, I'll look to get this done sometime next week? This is only because it's a public holiday in Melbourne tomorrow, and I fancied having a 3-day weekend :).

@peterdudfield
Collaborator

Thanks

No worries, do this when it suits you

@VikramsDataScience
Author

Hey @peterdudfield. I just wanted to provide a quick update:

  • I'm just now looking to test the changes. I've initially had quite a lot of trouble with Docker on my local machine: WSL2 is having issues setting up Docker Desktop, and it seems it cannot start the Linux engine locally for some reason, so I haven't been able to test this with Docker. It took me quite a few hours of messing around (including playing with my BIOS settings) to realise it wouldn't work!
  • So, I've instead opted to set up the Postgres DB locally, for which you've already created data models in nowcasting_datamodel. I've just been able to get it running locally, and I'll try to get some tests running. Please bear with me in the meantime! I'll come back with more questions if I hit any blockers :).

@peterdudfield
Collaborator

No problem

Please do share any learnings, just so we can help others too

@VikramsDataScience
Author

Hey @peterdudfield. So sorry it's taken me so long to update this PR since last week! It's been a bit of a distracting and chaotic period lately.

I've tried to write a test case to test the changes for the fake forecast, and it may be generating forecasts, but I'm not really sure and wanted to get your thoughts, please?

I've made a number of changes to various modules, in my local environment, so I didn't want to include them as commits, as they're only done to test the changes, and won't be part of the eventual merge into production. Instead I wanted to just provide some snippets of the changes, and which modules they affect, and you could let me know if I'm on the right track? And, once we're settled on what is correct, I can then include everything into a commit?

  • With the test cases, I've written one possible test case to begin with that attempts to mimic some of the similar test cases in the test_gsp.py module. This one is intended to test the get_forecasts_for_a_specific_gsp() function (the /{gsp_id}/forecast route) from the gsp.py module, just to kick things off before looking at the other routes/endpoints:
```python
from datetime import datetime, timezone

from freezegun import freeze_time
from nowcasting_datamodel.fake import make_fake_forecast
from nowcasting_datamodel.models import ForecastSQL, ForecastValueSQL

# Import paths below may differ in this repo
from database import get_session
from main import app


@freeze_time("2024-01-01")
def test_forecasts_for_specific_gsps(db_session, api_client, gsp_id=1):
    """Test if route /v0/solar/GB/gsp/{gsp_id}/forecast works"""

    # Using some dummy values
    forecast_value = [
        ForecastValueSQL(
            target_time=datetime(2024, 2, 1, tzinfo=timezone.utc),
            expected_power_generation_megawatts=50.0,
            adjust_mw=0.2,
            properties={},
            forecast_id=1,
            created_utc=datetime(2024, 2, 1, tzinfo=timezone.utc),
        )
    ]

    forecast = make_fake_forecast(
        gsp_id=gsp_id,
        session=db_session,
        forecast_values=forecast_value,
        add_latest=True,
        model_name="fake_model",
    )
    db_session.add(forecast)
    db_session.commit()

    forecast_from_db = db_session.query(ForecastSQL).filter_by(id=forecast.id).first()

    # Print the values stored in the DB
    print(f"Stored Forecast ID: {forecast_from_db.id}")
    print(f"Forecast creation time: {forecast_from_db.forecast_creation_time}")
    for value in forecast_from_db.forecast_values:
        print(
            f"Stored target time: {value.target_time},\n"
            f"Stored expected power: {value.expected_power_generation_megawatts}"
        )

    # Serve the test DB session to the app under test
    app.dependency_overrides[get_session] = lambda: db_session

    response = api_client.get(f"/v0/solar/GB/gsp/{gsp_id}/forecast")

    assert response.status_code == 200
    # Assert on the body rather than returning it:
    # pytest warns on tests that return non-None values
    assert len(response.json()) > 0
```

This test case produces the following output. It looks like I'm getting a connection to the DB for the route/endpoint, which allows the assert in the test case to pass, and I'm seeing something populating the 'expected_power_generation_megawatts' value (highlighted in the screenshot). The other outputs seem to be print statements coming from the get_forecasts_for_a_specific_gsp() function in the gsp.py module. I'm really not sure if this is correct, though. Can you advise, please?
[screenshot: test output with expected_power_generation_megawatts highlighted]

@VikramsDataScience
Author

Hey @peterdudfield. With regards to the changes I was referencing in yesterday's lengthy comment, I've put them into a commit and tested it. Could you kindly review, and let me know, please?

@VikramsDataScience
Author

  • Hey @peterdudfield. I've made another push to the PR. When you return, could you review the changes and let me know, please?
  • Updated the gsp.py module to recursively generate "gsp_ids" for the get_all_available_forecasts() and get_truths_for_all_gsps() functions. Both these functions make their respective calls via get_forecasts_from_database() and get_truth_values_for_all_gsps_from_database(), which return 0 forecasts. Please correct me if I'm wrong, but I think this should be expected, since the make_fake_forecasts() call in the test case I've written should take over that duty and override the need to connect to the DB?
  • From what I can understand of the DB data model, it seems that the forecasts are stored in forecast_values.expected_power_generation_megawatts? Could you confirm this, as I'm not 100% sure that's correct, and I can change the test cases if necessary?

@peterdudfield
Collaborator

Thanks for all this work getting is_fake working.
I think the test cases duplicate other tests? Sorry I took a while to get back to you.

@VikramsDataScience
Author

Hey @peterdudfield. No worries at all! I think you were on leave?

Apologies for the duplicated test cases. When you were asking about writing some tests, I didn't think to look at the existing test cases that were already there! I've now removed the duplicated test cases, and pushed the changes to the PR.

Please let me know if there's anything else?

@peterdudfield
Collaborator

> Hey @peterdudfield. No worries at all! I think you were on leave?
>
> Apologies for the duplicated test cases. When you were asking about writing some tests, I didn't think to look at the existing test cases that were already there! I've now removed the duplicated test cases, and pushed the changes to the PR.
>
> Please let me know if there's anything else?

Yea, I was away for a week.

Sorry, I should have said this yesterday: yea, tests would be good, but ones that test is_fake would be really useful.

@VikramsDataScience
Author

All good. Sorry for the delayed response, but I've had my week hijacked with some other important priorities. Could I look into developing some new test cases sometime next week, please?

@peterdudfield
Collaborator

Of course, no problem. Thank you for your help.

@VikramsDataScience
Author

Of course! A pleasure, as always, @peterdudfield!

@VikramsDataScience
Author

Hey @peterdudfield. So sorry for my tardy updates with this PR! It's been a very distracting period for me, and has been a bit difficult to manage some of my competing priorities.

I've written a new test case for the fake environment that has passed the checks on my local machine, and is generating forecasts for one of the DB routes/endpoints (I thought I'd start with the specific GSP ID test to begin). It's generating quite a few 0.0 forecasts, but happily no negative values:
[screenshot: passing test output showing generated forecast values]
Could you review the test case and let me know if this is okay with you, please? And, if all is good with you, I can proceed to write very similar test cases for the remaining routes/endpoints :).

@peterdudfield
Collaborator

No worries. This looks great.

Yea a few similar tests would be great

@VikramsDataScience
Author

VikramsDataScience commented Nov 7, 2024

I've finished writing the test cases using the NationalForecastValue, ForecastValue, and the ManyForecasts as the return objects for the test_gsp and test_national modules. I've run all the test cases locally and I'm happy to report that they're all passing! If you'd like, I can provide screenshots, as well?

Otherwise, I'm hopeful that the CI workflow will pass now!

@VikramsDataScience
Author

VikramsDataScience commented Nov 8, 2024

I've gone through the error log generated by the runner, and it looks like several of the pre-existing test cases are failing. So far, though, the error log hasn't flagged any of the new is_fake test cases as failing:
[screenshot: CI error log]
Going through all the test cases that the runner flagged, I'm able to recreate the failures locally except for one: test_merged_routes.py::test_read_forecast_values_gsp, which passes locally:
[screenshot: local test run]

It appears that the new is_fake test cases are running but the 8 other test cases in the test_gsp, and test_national modules are failing.

I'm not sure how you'd like to approach resolving these issues, given that they're out of scope for this PR? Maybe raise some new issues that the community could look into to address these failed test cases? Perhaps these test cases' logic is out of date with changes to the data model, so they're returning a different number of forecasts than when the tests were first written, or something along those lines, do you think?

Also, if this means raising new issues with the community, how would you like to approach merging all the changes made in this PR? They appear to be working as expected.

@peterdudfield
Collaborator

They were all passing on 'main'. Locally do you run all the tests?
It might be that the ENV VAR is still set?

@VikramsDataScience
Author

> They were all passing on 'main'. Locally do you run all the tests? It might be that the ENV VAR is still set?

Hmm, I'm sceptical that running all the tests will yield a different result. But I'm very open to any suggestions you might have, as I'm starting to run out of ideas given how long we've been working on this PR. And, of course, you know this codebase far better than I do!

Do you mean that is_fake is being set whilst these other test cases are run?

@peterdudfield
Collaborator

> They were all passing on 'main'. Locally do you run all the tests? It might be that the ENV VAR is still set?
>
> Hmm, I'm sceptical that running all the tests will yield a different result. I'm very open to any suggestions you might have, as I'm starting to run out of ideas, given how long we've been working on this PR. And, of course, you know this codebase far better than myself!
>
> Do you mean that is_fake is being set whilst these other test cases are run?

I think it's worth running all the tests, to see if it's the same results as CI.

Yea, my guess is that is_fake is still set during the other cases, and in some way needs to be un-set.

Yea, totally understand you've been working hard on this. Another option is we merge to a non-main branch, and I, or someone else, fixes the tests or other things. It's totally up to you.
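One way to stop an is_fake-style environment variable leaking between tests is to unset it around each case. This sketch uses a plain context manager (pytest's monkeypatch.delenv achieves the same thing per-test); the variable name FAKE is an assumption:

```python
import os
from contextlib import contextmanager


@contextmanager
def env_unset(name: str):
    """Temporarily remove an environment variable, restoring it afterwards."""
    saved = os.environ.pop(name, None)
    try:
        yield
    finally:
        # Restore the original value so other tests see the prior state
        if saved is not None:
            os.environ[name] = saved
```

The pre-existing tests could then run inside `with env_unset("FAKE"):` so the fake-data mode never applies to them.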

@VikramsDataScience
Author

All good. I'm very happy to help!

I'll keep trying over next week and see how I fare ☺️

@VikramsDataScience
Author

Sorry, Peter, once again a few other priorities have come up this week that need my attention.

I did have a play around yesterday with the test cases, and I have a couple of theories as to why they're failing, but haven't experimented to see if my theories pan out! Once I get these other priorities out of the way, I'll return to the PR :).

@VikramsDataScience
Author

VikramsDataScience commented Nov 18, 2024

Hey Peter. So with these test cases, I've just started looking into them and, as you correctly suggested, running them all together rather than individually. Locally, I'm able to reproduce 8 of the 9 failures from the CI workflow (on my machine, test_read_forecast_values_gsp() passes).

Looking at the individual test cases, I was able to 'fix' two of them, test_read_latest_all_gsp() and test_read_latest_all_gsp_normalized(), by adding one extra 'gsp_id' to their DB call (changing to gsp_ids=list(range(0, 11))), since the failure was caused by the first assert statement, which tests for the presence of 10 forecasts in the return object when only 9 were returning.

The other test cases have different failure conditions: they seem to be caused by duplicate values in the DB that persist from the previous test cases that added those values:
[screenshot: duplicate-value test failures]

The question (and permission, really) I wanted to ask is: can I alter the conditions of these cases to stop them from failing our new merge? My fear, based on your previous comment that they currently pass on the main branch, is that if I alter them to pass here, it would damage the integrity of the existing logic and they might begin failing afterward, or the new logic might end up testing the wrong thing! If you're cool with me altering the code, I'm happy to keep working on it until they pass; I just don't want to damage the integrity of the logic! Please let me know how you'd like me to proceed :).

@VikramsDataScience
Author

Small update and a bit of good news on the previous comment:
[screenshot: three test cases now passing]
I've managed to fix 3 of the test cases by applying a pytest.fixture() to the setup_gsp_yield_data() helper function, and then yielding the db_session in the test cases that call it: test_read_truths_for_all_gsp(), test_read_truths_for_all_gsp(), and test_read_truths_for_all_gsp_filter_gsp(). The good news is that this was done without changing any of the underlying logic in those test cases. But, please do let me know how you'd like me to proceed with the points in the previous comment.
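The fixture approach described above follows pytest's generator shape: seed data, yield the session, then clean up. A minimal self-contained sketch, where FakeSession is a stand-in for the real SQLAlchemy session and the seeded rows are placeholders:

```python
class FakeSession:
    """Tiny stand-in for a SQLAlchemy session, for illustration only."""

    def __init__(self):
        self.rows = []

    def add_all(self, rows):
        self.rows.extend(rows)

    def rollback(self):
        self.rows.clear()


def setup_gsp_yield_data(session):
    """Generator with pytest-fixture shape: decorate with @pytest.fixture()
    and each test gets freshly seeded data with automatic cleanup."""
    session.add_all(["gsp_yield_row_1", "gsp_yield_row_2"])
    yield session
    # Teardown runs after the test body, so seeded rows don't leak
    # into later tests and cause duplicate-value failures
    session.rollback()
```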

@peterdudfield
Collaborator

> Small update and a bit of good news on the previous comment: I've managed to fix 3 of the test cases by applying a pytest.fixture() to the setup_gsp_yield_data() helper function, and then yielding the db_session in the test cases that call it. This was done without changing any of the underlying logic in those test cases.

That's great, good to do that. Can you push that? Then we can see what other tests are failing.

@VikramsDataScience
Author

I've made the changes from the above comments, and I'm happy to say we've now made some good progress in solving our issues, so we should have fewer errors in the next workflow run. These are some of the changes:

  • Explicitly disabled is_fake() in the new test cases to prevent any issues that the is_fake ENV VAR may cause.
  • Fixed the test_gsp issues from the previous comments by individually yielding the stored values in the db_session to prevent the duplication errors we've been seeing.
  • Applied the same fix to the test_national test_read_truth_national_gsp() test case, as the error in the CI workflow seems to come from the same source.
  • Rolled back the change I made to address the assertion issue in test_gsp.py::test_read_latest_all_gsp and test_gsp.py::test_read_latest_all_gsp_normalized, so we'll continue to get those errors in the CI workflow. Maybe these two test cases are not including the national GSP (gsp_id==0) for some reason? From what I could see when I ran an assert on the forecasts object, there were 10 forecasts in that object, but when these forecasts are inserted into the DB, gsp_id==0 gets omitted, I think. Keen to hear any thoughts you might have on how we can solve this:
    [screenshot: forecasts object assertion output]
  • I'm not able to locally reproduce the error from test_merged_routes.py::test_read_forecast_values_gsp that the workflow is generating (it passes on my machine), so I'm not sure how to repair that error when it comes up again. Very open to any suggestions for that one as well.

If my local tests work as expected in the workflow, I'd reckon that we will probably get 4 errors, rather than 9, in the next run.

@VikramsDataScience
Author

Hey Peter,

I may have a fix for the test_read_latest_all_gsp_normalized() and test_read_latest_all_gsp() test cases. It took me quite some time to debug this issue! I had to write and test a variety of statements to see where gsp_id==0 was dropping off. After quite a bit of testing and experimentation, I tried using 'gsp_ids' as a query parameter in the DB endpoint, and now test_gsp.py passes all the tests on my local machine:
[screenshot: test_gsp.py all tests passing locally]
This should now mean 2 remaining failed cases:

  • test_merged_routes.py::test_read_forecast_values_gsp
  • test_national.py::test_read_truth_national_gsp
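The gsp_ids-as-query-parameter fix can be sketched like this. The comma-separated format and the route path in the comment are assumptions about the API, so check them against the real endpoint:

```python
def gsp_ids_query(gsp_ids):
    """Serialise GSP ids (including the national gsp_id 0) for a query string."""
    return ",".join(str(i) for i in gsp_ids)


# Hypothetical usage in a test, path and parameter name assumed:
# api_client.get("/v0/solar/GB/gsp/forecast/all/",
#                params={"gsp_ids": gsp_ids_query(range(0, 11))})
```

Passing the ids explicitly means the national GSP (gsp_id=0) cannot be silently dropped by a default filter.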

@VikramsDataScience
Author

  • I think that our remaining errors may be due to incorrect DB endpoints.
  • On that note, I'm trying an experiment to address the test_read_truth_national_gsp() and test_read_forecast_values_gsp() failures. Unfortunately, I can't reproduce these two failures on my local machine for some weird reason, so I'll only be able to test any changes by pushing to the PR and examining the error logs generated by the runners. You may therefore get a few pushes to the PR as I try to work through this!

@VikramsDataScience
Author

Hey Peter,

Are you able to trigger the workflow to see my newest changes have any effect?

@peterdudfield
Collaborator

peterdudfield commented Nov 22, 2024 via email

I think you can close and reopen the PR, and that triggers them.

@VikramsDataScience
Author

No problems. I wasn't sure if I had sufficient permissions, but I can do that from now on :)

@VikramsDataScience
Author

Whoa! I think it may have worked!

@peterdudfield
Collaborator

That's great, they are now working. I realised something when reviewing this: for the routes that get the truth, we need to not use make_fake_forecast but instead make fake GSP yields. Would you be able to change this over? I suspect we might then be able to roll back some changes and simplify this PR.

Again, thanks so much for this work; it's very nearly over the line. I'd be very happy if you do this bit, but I totally understand if you want me or someone else to do it.

@VikramsDataScience
Author

That's cool. I should be able to take care of it and update the PR if all is well, and otherwise comment if I have any challenges :)!
