
Implementation for stationary sources of pollution #3945

Open

wants to merge 2 commits into staging

Conversation

@pronapro pronapro commented Nov 28, 2024

Description

Implementation of API endpoints for stationary sources of pollution.

Changes Made

  • Added API endpoints for data insertion and retrieval.
  • The data-insertion endpoint accepts an image, the duration of interest, and the radius of the area of interest, and returns the detected geometry, confidence score, location, and environment profiles (see the example request below).
  • The retrieval endpoints return data saved in the database, filtered by parameters such as confidence score.
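
For illustration, a hypothetical call to the upload endpoint using the Python requests library; the base URL and form-field names are assumptions based on the description above, not the confirmed API contract:

# Hypothetical client call for the /upload-image endpoint described above.
# The base URL and the field names ("file", "months", "radius") are assumptions.
import requests

BASE_URL = "http://localhost:5000"  # assumed local development server

with open("sample_area.tif", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/upload-image",
        files={"file": f},                   # the satellite image to analyse
        data={"months": 3, "radius": 500},   # duration of interest and radius of the area of interest
    )

print(response.status_code)
print(response.json())  # expected to contain geometry, confidence score, and location/environment profiles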

Testing

  • Tested locally
  • Tested against staging environment
  • Relevant tests passed: [List test names]

Affected Services

  • Which services were modified:
    • Spatial microservice

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new API endpoints for uploading images and retrieving pollutant data based on confidence levels.
    • Added functionality for processing and predicting pollutant presence using uploaded images.
    • Enhanced configuration options for MongoDB connection settings.
    • Comprehensive methods for managing pollutant data through image uploads and MongoDB interactions.
  • Bug Fixes

    • Improved error handling for data insertion and retrieval processes.
  • Documentation

    • Updated requirements to reflect new dependencies and specific versioning for better stability.

These updates enhance the application's ability to process geospatial data and improve user interaction with pollutant data.

Copy link
Contributor

coderabbitai bot commented Nov 28, 2024

📝 Walkthrough

This pull request introduces significant updates to the application, including the addition of new environment variables MONGO_URI and DB_NAME for MongoDB configuration, new API routes for image uploads and data retrieval in the Flask controller, and the creation of a new class PredictionAndProcessing for handling geospatial data related to pollutant identification. Additionally, the requirements.txt file has been extensively revised to include a wide range of dependencies, ensuring the application is equipped with the necessary libraries for its enhanced functionalities.

Changes

  • src/spatial/configure.py: Added new environment variables MONGO_URI and DB_NAME to the Config class, with default values if not set.
  • src/spatial/controllers/controllers.py: Introduced three new routes: /upload-image (POST), /get-data-by-confidence (GET), and /get-all-data (GET) for API functionality.
  • src/spatial/models/pollutant_identification.py: Added the PredictionAndProcessing class with multiple static methods for processing images and predicting pollutant presence.
  • src/spatial/requirements.txt: Overhauled dependencies with numerous additions, updates, and removals, including specific version specifications for many libraries.
  • src/spatial/views/pollutant_views.py: Created new API methods in the PollutantApis class for image upload and MongoDB interactions, including error handling for data processing.
  • src/spatial/Dockerfile: Modified to include additional system dependencies for GDAL and added environment variables for include paths.

Possibly related PRs

  • setup backend code to make satellite predictions #3613: The changes in this PR also involve modifications to the src/spatial/configure.py file, specifically adding new variables to the Config class, which is relevant to the main PR's addition of MONGO_URI and DB_NAME.
  • auto reporting #3846: This PR introduces new environment variables in the Config class of src/spatial/configure.py, similar to the main PR's addition of MONGO_URI and DB_NAME, indicating a focus on enhancing configuration options.

Suggested labels

work in progress

Suggested reviewers

  • Baalmart
  • NicholasTurner23
  • Codebmk
  • MartinKalema
  • AnthonyByansi

🌟 In the code's embrace, new paths we pave,
With Mongo's URI, our data we save.
Images upload, predictions align,
Pollutants detected, in records they shine.
Dependencies flourish, our project will thrive,
In this vibrant code, we truly come alive! 🌿




codecov bot commented Nov 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 11.60%. Comparing base (34ba34d) to head (9756e54).
Report is 12 commits behind head on staging.

Additional details and impacted files


@@             Coverage Diff             @@
##           staging    #3945      +/-   ##
===========================================
- Coverage    11.73%   11.60%   -0.13%     
===========================================
  Files          114      115       +1     
  Lines        15318    15680     +362     
  Branches       318      318              
===========================================
+ Hits          1797     1820      +23     
- Misses       13521    13860     +339     

see 2 files with indirect coverage changes

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 20

🧹 Outside diff range and nitpick comments (11)
src/spatial/controllers/controllers.py (2)

84-94: Consider adding request validation middleware

Given that these endpoints are part of a critical pollution monitoring system, consider adding request validation middleware to ensure:

  1. Valid image formats and size limits for /upload-image
  2. Proper confidence score range validation for /get-data-by-confidence

Would you like me to provide an example implementation of the validation middleware?
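
For example, one possible shape for that middleware, written as Flask decorators; the routes match this PR, while the extension whitelist, size limit, and parameter names are assumptions rather than the project's agreed rules:

# Sketch only: request-validation decorators for the two endpoints described above.
# ALLOWED_EXTENSIONS and MAX_UPLOAD_BYTES are illustrative assumptions.
from functools import wraps
from flask import request, jsonify

ALLOWED_EXTENSIONS = {"tif", "tiff"}     # assumed accepted image formats
MAX_UPLOAD_BYTES = 32 * 1024 * 1024      # assumed 32 MB upload limit

def validate_image_upload(view):
    @wraps(view)
    def wrapper(*args, **kwargs):
        file = request.files.get("file")
        if file is None or not file.filename:
            return jsonify({"error": "No image file provided"}), 400
        ext = file.filename.rsplit(".", 1)[-1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            return jsonify({"error": f"Unsupported file type: .{ext}"}), 400
        if request.content_length and request.content_length > MAX_UPLOAD_BYTES:
            return jsonify({"error": "Image exceeds the maximum upload size"}), 413
        return view(*args, **kwargs)
    return wrapper

def validate_confidence(view):
    @wraps(view)
    def wrapper(*args, **kwargs):
        raw = request.args.get("min_confidence", "0")
        try:
            value = float(raw)
        except ValueError:
            return jsonify({"error": "min_confidence must be a number"}), 400
        if not 0.0 <= value <= 1.0:
            return jsonify({"error": "min_confidence must be between 0 and 1"}), 400
        return view(*args, **kwargs)
    return wrapper

# Usage (sketch): place the decorator between the route and the view function, e.g.
# @controller_bp.route('/upload-image', methods=['POST'])
# @validate_image_upload
# def upload_image_for_prediction(): ...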


84-94: Document API endpoints

Consider adding OpenAPI/Swagger documentation for these new endpoints to help API consumers understand:

  • Expected request/response formats
  • Valid parameter ranges
  • Error responses

Example documentation format:

@controller_bp.route('/upload-image', methods=['POST'])
def upload_image_for_prediction():
    """
    Upload an image for pollutant prediction.
    ---
    parameters:
      - name: image
        in: formData
        type: file
        required: true
        description: Image file to analyze
      - name: duration
        in: query
        type: integer
        description: Duration of interest in minutes
      - name: radius
        in: query
        type: number
        description: Radius of area of interest in meters
    responses:
        description: Successful prediction
        schema:
          properties:
            geometry: object
            confidence_score: number
            location_profile: object
    """
    return PollutantApis.upload_image()
src/spatial/requirements.txt (1)

1-156: Consider organizing dependencies by category

The requirements file contains a comprehensive set of dependencies that align well with the PR objectives for handling stationary sources of pollution. However, to improve maintainability, consider organizing the dependencies into logical categories with comments:

# Web Framework
Flask~=3.0.3
Flask-Cors~=5.0.0

# Machine Learning
tensorflow==<version>
torch==<version>
scikit-learn~=1.5.2

# Geospatial Processing
geopandas==0.13.2
shapely~=2.0.6

# etc...
src/spatial/configure.py (1)

31-31: Consider integrating satellite data for pollution source validation

Given the extensive satellite data collections available in this configuration (particularly the SO2, NO2, and CO data from Copernicus), consider leveraging this data to validate or enrich the stationary pollution source detections.

You could enhance the pollution source detection by:

  1. Cross-referencing detected locations with satellite data
  2. Using satellite measurements to validate pollution levels
  3. Incorporating atmospheric conditions from the satellite data

Would you like me to provide a code example demonstrating this integration?
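
As a starting point, a rough sketch of such a cross-reference using the Sentinel-5P NO2 collection already referenced elsewhere in this PR; it assumes Earth Engine has been initialised, and the look-back window, radius, and any flagging threshold are illustrative assumptions:

# Sketch: mean tropospheric NO2 around a detected source location, for enrichment/validation.
# Assumes ee.Initialize() has already run (as in PredictionAndProcessing.initialize_earth_engine).
import datetime
import ee

def no2_mean_around(latitude, longitude, radius_m=500, days=30):
    today = datetime.date.today()
    end_date = ee.Date.fromYMD(today.year, today.month, today.day)
    start_date = end_date.advance(-days, "day")
    point = ee.Geometry.Point(longitude, latitude)
    collection = (
        ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_NO2")
        .filterDate(start_date, end_date)
        .filterBounds(point)
        .select("tropospheric_NO2_column_number_density")
    )
    value = collection.mean().reduceRegion(
        ee.Reducer.mean(), point.buffer(radius_m), scale=1000
    ).get("tropospheric_NO2_column_number_density")
    return value.getInfo() if value else None

# A detection could then be enriched before insertion, e.g.
# properties["mean_NO2"] = no2_mean_around(lat, lon), and flagged when the value is unusually high.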

src/spatial/views/pollutant_views.py (2)

113-113: Use Logging Instead of Print Statements

Using print statements is not recommended for production code. It's better to use a logging framework for outputting messages.

Replace print statements with logging:

-    print(f"Querying with min_confidence: {min_confidence}")
+    app.logger.info(f"Querying with min_confidence: {min_confidence}")

For the commented-out print statement:

-    # print(f"Inserted documents with IDs: {insert_result.inserted_ids}")
+    app.logger.info(f"Inserted documents with IDs: {insert_result.inserted_ids}")

Also applies to: 94-94


124-135: Refactor to Reduce Code Duplication

The process of converting retrieved documents to a GeoJSON FeatureCollection is repeated in both get_data_by_confidence and get_all_data. Refactoring this into a helper function will enhance maintainability.

Consider creating a helper function:

def create_geojson_feature_collection(data):
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": doc["geometry"],
                "properties": doc["properties"]
            } for doc in data
        ]
    }

Then, use this function in both methods:

geojson_data = create_geojson_feature_collection(data)

Also applies to: 155-166

src/spatial/models/pollutant_identification.py (5)

19-19: Eliminate unused import mapping from shapely.geometry

The function mapping imported on line 19 is not used in the code.

🧰 Tools
🪛 Ruff (0.8.0)

19-19: shapely.geometry.mapping imported but unused

Remove unused import: shapely.geometry.mapping

(F401)


24-24: Discard unused import geemap

The module geemap imported on line 24 is not utilized.

🧰 Tools
🪛 Ruff (0.8.0)

24-24: geemap imported but unused

Remove unused import: geemap

(F401)


31-31: Remove unused import from matplotlib.patches

The Polygon class imported as pltPolygon on line 31 is not used in the code.

🧰 Tools
🪛 Ruff (0.8.0)

31-31: matplotlib.patches.Polygon imported but unused

Remove unused import: matplotlib.patches.Polygon

(F401)


36-37: Eliminate unused MongoDB imports

The following imports are not used and can be removed:

  • Line 36: from pymongo import MongoClient
  • Line 37: from bson import json_util
🧰 Tools
🪛 Ruff (0.8.0)

36-36: pymongo.MongoClient imported but unused

Remove unused import: pymongo.MongoClient

(F401)


37-37: bson.json_util imported but unused

Remove unused import: bson.json_util

(F401)


185-188: Review exception handling and variable usage

Within the exception block starting at line 185:

  • The exception e is assigned but not used.
  • The variable building_types is assigned a value but not utilized elsewhere.

Consider utilizing the exception e for logging or debugging purposes, or remove it if unnecessary. Also, if building_types is not needed, you can remove its assignment to streamline the code.
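
As a sketch of the first option, the fallback values can be kept while the exception is logged; fetch_buildings here is a hypothetical stand-in for whatever OSM query the original block performs:

# Sketch only: log the exception instead of discarding it, and drop building_types
# if nothing downstream reads it.
import logging

logger = logging.getLogger(__name__)

def building_profile(fetch_buildings):
    try:
        buildings = fetch_buildings()
        number_of_buildings = len(buildings)
        building_density = number_of_buildings  # density calculation elided in this sketch
    except Exception:
        logger.exception("Failed to compute building statistics")
        number_of_buildings = "Error"
        building_density = "Error"
    return number_of_buildings, building_density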

🧰 Tools
🪛 Ruff (0.8.0)

185-185: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


188-188: Local variable building_types is assigned to but never used

Remove assignment to unused variable building_types

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 34ba34d and c59556e.

📒 Files selected for processing (5)
  • src/spatial/configure.py (1 hunks)
  • src/spatial/controllers/controllers.py (2 hunks)
  • src/spatial/models/pollutant_identification.py (1 hunks)
  • src/spatial/requirements.txt (1 hunks)
  • src/spatial/views/pollutant_views.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
src/spatial/models/pollutant_identification.py

2-2: os imported but unused

Remove unused import: os

(F401)


3-3: time imported but unused

Remove unused import: time

(F401)


8-8: json imported but unused

Remove unused import: json

(F401)


12-12: pandas imported but unused

Remove unused import: pandas

(F401)


13-13: geopandas imported but unused

Remove unused import: geopandas

(F401)


19-19: shapely.geometry.mapping imported but unused

Remove unused import: shapely.geometry.mapping

(F401)


24-24: geemap imported but unused

Remove unused import: geemap

(F401)


31-31: matplotlib.patches.Polygon imported but unused

Remove unused import: matplotlib.patches.Polygon

(F401)


36-36: pymongo.MongoClient imported but unused

Remove unused import: pymongo.MongoClient

(F401)


37-37: bson.json_util imported but unused

Remove unused import: bson.json_util

(F401)


48-48: Undefined name get_trained_model_from_gcs

(F821)


49-49: Undefined name Config

(F821)


50-50: Undefined name Config

(F821)


62-62: Undefined name service_account

(F821)


63-63: Undefined name Config

(F821)


65-65: Undefined name Config

(F821)


83-83: Undefined name load_tiff

(F821)


88-88: Undefined name normalize_image

(F821)


117-117: Undefined name gp

(F821)


175-175: Undefined name flatten_highway

(F821)


185-185: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


188-188: Local variable building_types is assigned to but never used

Remove assignment to unused variable building_types

(F841)

src/spatial/views/pollutant_views.py

108-108: SyntaxError: unindent does not match any outer indentation level


109-109: SyntaxError: Expected class, function definition or async function definition after decorator


145-145: SyntaxError: unindent does not match any outer indentation level


146-146: SyntaxError: Expected class, function definition or async function definition after decorator

🔇 Additional comments (3)
src/spatial/controllers/controllers.py (1)

16-16: LGTM: Clean import statement

The import of PollutantApis is appropriately placed with other view imports.

src/spatial/requirements.txt (1)

51-51: ⚠️ Potential issue

Add version specifications for all dependencies

Several critical dependencies are missing version specifications, which could lead to breaking changes and reproducibility issues:

google-cloud-storage
google-generativeai
huggingface_hub
openai
tensorflow
transformers
torch
gunicorn

For machine learning projects, it's particularly important to pin versions of ML libraries to ensure consistent behavior.

Also applies to: 53-53, 60-60, 87-87, 132-132, 146-146, 147-147, 151-151

✅ Verification successful

Add version specifications to ensure dependency stability

Based on the latest stable versions from PyPI, here are the recommended version specifications:

  • google-cloud-storage==2.18.2
  • google-generativeai==0.8.3
  • huggingface_hub==0.26.3
  • openai==1.55.2
  • tensorflow==2.18.0
  • transformers==4.46.3
  • torch==2.5.1
  • gunicorn==23.0.0

Using these pinned versions will ensure reproducible builds and prevent unexpected breaking changes from dependency updates.

Also applies to: 53-53, 60-60, 87-87, 132-132, 146-146, 147-147, 151-151

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Get the latest stable versions of these packages from PyPI
for pkg in google-cloud-storage google-generativeai huggingface_hub openai tensorflow transformers torch gunicorn
do
  echo "Package: $pkg"
  curl -s "https://pypi.org/pypi/$pkg/json" | jq -r '.info.version'
done

Length of output: 1978

src/spatial/configure.py (1)

31-31: Verify architectural alignment with existing systems

The introduction of MongoDB for storing pollutant data represents a significant architectural decision. Given that this PR is focused on stationary sources of pollution, we should ensure this aligns with the broader system architecture.

Let's verify the existing data storage patterns:

✅ Verification successful

MongoDB is already well-established in the system architecture

Based on the codebase analysis, MongoDB is extensively used across multiple services including:

  • Device Registry (src/device-registry)
  • Device Monitoring (src/device-monitoring)
  • Analytics (src/analytics)
  • Data Management (src/data-mgt)
  • Predictions (src/predict)
  • Authentication (src/auth-service)

The introduction of MongoDB for storing pollutant data in the spatial service aligns perfectly with the existing architectural patterns, as MongoDB is already the primary database across the platform's microservices architecture.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for existing database connections and patterns
rg -g '!node_modules' -g '!*.pyc' -A 3 'mongodb://' .
rg -g '!node_modules' -g '!*.pyc' -A 3 'MONGO_' .

# Look for related configuration patterns
rg -g '!node_modules' -g '!*.pyc' -A 3 'database.*config' .

Length of output: 50533

Comment on lines 88 to 91
def get_data_by_confidence():
    return PollutantApis.get_data_by_confidencee()


⚠️ Potential issue

Fix typo in method name

There's a typo in the method name get_data_by_confidencee() (extra 'e' at the end).

Apply this fix:

-    return PollutantApis.get_data_by_confidencee()
+    return PollutantApis.get_data_by_confidence()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 @controller_bp.route('/get-data-by-confidence', methods=['GET'])
 def get_data_by_confidence():
-    return PollutantApis.get_data_by_confidencee()
+    return PollutantApis.get_data_by_confidence()

Comment on lines +84 to +87
@controller_bp.route('/upload-image', methods=['POST'])
def upload_image_for_prediction():
    return PollutantApis.upload_image()


💡 Codebase verification

Error handling improvements needed for image upload endpoint

The current implementation has basic error handling but lacks several critical safeguards for file uploads:

  • No file type validation (ALLOWED_EXTENSIONS not defined)
  • No file size limits (MAX_CONTENT_LENGTH not configured)
  • No error handling for image processing failures (preprocess_image and predict_and_get_centroids don't have try-catch blocks)
  • No cleanup of temporary files in case of processing errors

Recommended improvements:

  • Add file type validation for accepted image formats
  • Configure maximum file size limit
  • Wrap image processing operations in try-catch blocks
  • Implement proper cleanup of uploaded files
  • Add error handling for model prediction failures
🔗 Analysis chain

Verify error handling for image upload endpoint

The implementation looks clean, but given that this endpoint handles file uploads, we should ensure proper error handling is in place.

Let's check the implementation of the upload handler:


The previous pattern didn't yield results, so I'm:

  1. Searching for the class implementation with the method
  2. Using ripgrep as a backup to find any occurrences of the method name with context

This will help us verify the error handling implementation in the upload endpoint.


I can see the beginning of the error handling in the implementation, but let's get the complete method implementation to verify all error cases are properly handled, including file type validation, size limits, and processing errors.


I can see basic error handling for file presence and empty filename, but I need to verify:

  1. The complete method implementation to see how it handles processing errors
  2. If there are file type restrictions defined and enforced

I can see error handling for:

  1. Missing file and empty filename
  2. Invalid form parameters
  3. Need to verify error handling in the preprocessing and prediction methods
  4. Check if there's any file size limit configured

This will help ensure all potential error cases are covered.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for the upload_image implementation to verify error handling
ast-grep --pattern 'def upload_image():
  $$$
}'

Length of output: 50


Script:

#!/bin/bash
# Let's try searching for the class and method implementation
ast-grep --pattern 'class PollutantApis:
  $$$
  def upload_image($$$):
    $$$'

# Also search for any similar method names
rg "upload_image" -A 5

Length of output: 1051


Script:

#!/bin/bash
# Let's examine the complete implementation in pollutant_views.py
rg -A 15 "def upload_image" src/spatial/views/pollutant_views.py

Length of output: 661


Script:

#!/bin/bash
# Let's get more context to see the complete method implementation
rg -A 30 "def upload_image" src/spatial/views/pollutant_views.py

# Also check for file type validation constants
rg "ALLOWED_EXTENSIONS|UPLOAD_FOLDER" src/spatial/views/pollutant_views.py

Length of output: 1615


Script:

#!/bin/bash
# Let's check the preprocess_image and predict_and_get_centroids methods for error handling
rg -A 10 "def preprocess_image|def predict_and_get_centroids" 

# Also check if there's any file size limit configuration
rg "MAX_CONTENT_LENGTH|app\.config\['MAX_CONTENT_LENGTH'\]"

Length of output: 2006

google-cloud-bigquery[pandas]
gunicorn
flask-cors~=5.0.0
Flask-Cors==5.0.0

⚠️ Potential issue

Remove duplicate Flask-Cors entry

The package Flask-Cors appears twice with different version specifiers:

  • Line 1: Flask-Cors==5.0.0
  • Line 39: flask-cors~=5.0.0

This could lead to installation conflicts. Keep only one entry, preferably using the compatible release operator ~=.

-Flask-Cors==5.0.0

Also applies to: 39-39

@@ -17,18 +1,156 @@
Flask-Cors==5.0.0
Flask==3.0.3

⚠️ Potential issue

Remove duplicate Flask entry

The package Flask appears twice with different cases and version specifiers:

  • Line 2: Flask==3.0.3
  • Line 156: flask~=3.0.3

Maintain consistency in package naming and avoid duplicates.

-flask~=3.0.3

Also applies to: 156-156

opt_einsum==3.4.0
osmnx==1.9.3
packaging==24.2
pandas-gbq

🛠️ Refactor suggestion

Consolidate BigQuery dependencies

There are multiple overlapping BigQuery-related packages:

pandas-gbq
google-cloud-bigquery
google-cloud-bigquery-storage
google-cloud-bigquery[pandas]

The google-cloud-bigquery[pandas] extras already include pandas integration, making pandas-gbq redundant.

-pandas-gbq

Also applies to: 152-152, 153-153, 154-154

Comment on lines +62 to +65
ee.Initialize(credentials=service_account.Credentials.from_service_account_file(
    Config.CREDENTIALS,
    scopes=['https://www.googleapis.com/auth/earthengine']
), project=Config.GOOGLE_CLOUD_PROJECT_ID)

⚠️ Potential issue

Ensure service_account and Config are defined for Earth Engine initialization

In the initialize_earth_engine method, service_account and Config are not defined:

  • Line 62: service_account
  • Lines 63 and 65: Config

To resolve this issue, import service_account from the appropriate module (e.g., from google.oauth2 import service_account) and define or import Config with the necessary credentials and project information.

🧰 Tools
🪛 Ruff (0.8.0)

62-62: Undefined name service_account

(F821)


63-63: Undefined name Config

(F821)


65-65: Undefined name Config

(F821)

Comment on lines +211 to +226
def get_environment_profile(latitude, longitude, months, radius):
    today = datetime.date.today()
    start_date = ee.Date.fromYMD(today.year, today.month, today.day).advance(-months, 'month')
    end_date = ee.Date.fromYMD(today.year, today.month, today.day)
    point = ee.Geometry.Point(longitude, latitude)

    def get_mean(collection, band):
        collection = collection.filterDate(start_date, end_date).filterBounds(point).select(band)
        mean_value = collection.mean().reduceRegion(ee.Reducer.mean(), point.buffer(radius), scale=1000).get(band)
        return mean_value.getInfo() if mean_value else None

    return {
        'mean_AOD': get_mean(ee.ImageCollection('MODIS/061/MCD19A2_GRANULES'), 'Optical_Depth_047'),
        'mean_CO': get_mean(ee.ImageCollection('COPERNICUS/S5P/OFFL/L3_CO'), 'CO_column_number_density'),
        'mean_NO2': get_mean(ee.ImageCollection('COPERNICUS/S5P/OFFL/L3_NO2'), 'tropospheric_NO2_column_number_density')
    }

🛠️ Refactor suggestion

Add error handling for Earth Engine data retrieval

In the get_environment_profile method, external API calls are made without error handling. Failures in these calls could lead to unhandled exceptions.

Consider wrapping the Earth Engine API calls with try-except blocks to handle potential errors gracefully. This will enhance the robustness of your application.
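
A minimal sketch of that wrapping, adapted from the get_mean helper above; it assumes ee.EEException is the relevant exception type raised by the earthengine-api client:

# Sketch only: per-band error handling so one failing collection does not abort the whole profile.
import logging

import ee

logger = logging.getLogger(__name__)

def safe_get_mean(collection, band, start_date, end_date, point, radius):
    try:
        filtered = collection.filterDate(start_date, end_date).filterBounds(point).select(band)
        mean_value = filtered.mean().reduceRegion(
            ee.Reducer.mean(), point.buffer(radius), scale=1000
        ).get(band)
        return mean_value.getInfo() if mean_value else None
    except ee.EEException:
        logger.exception("Earth Engine query failed for band %s", band)
        return None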

road_type_lengths = defaultdict(float)
# Calculate the length for each road type
for _, edge in edges.iterrows():
    for road_type in flatten_highway(edge['highway']):

⚠️ Potential issue

Correct method call to flatten_highway

The method flatten_highway is called without specifying the class name:

  • Line 175: for road_type in flatten_highway(edge['highway']):

Since flatten_highway is a static method of the PredictionAndProcessing class, it should be called using the class name:

- for road_type in flatten_highway(edge['highway']):
+ for road_type in PredictionAndProcessing.flatten_highway(edge['highway']):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-for road_type in flatten_highway(edge['highway']):
+for road_type in PredictionAndProcessing.flatten_highway(edge['highway']):
🧰 Tools
🪛 Ruff (0.8.0)

175-175: Undefined name flatten_highway

(F821)

geom['properties']['confidence_score'] = mean_confidence

# Create a GeoDataFrame from the geometries
gpd_polygonized_raster = gp.GeoDataFrame.from_features(geoms, crs='epsg:3857')

⚠️ Potential issue

Fix typo: Undefined name gp should be gpd

On line 117, gp.GeoDataFrame.from_features is used, but gp is not defined. This appears to be a typo.

Replace gp with gpd to use the GeoPandas library correctly:

- gpd_polygonized_raster = gp.GeoDataFrame.from_features(geoms, crs='epsg:3857')
+ gpd_polygonized_raster = gpd.GeoDataFrame.from_features(geoms, crs='epsg:3857')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-gpd_polygonized_raster = gp.GeoDataFrame.from_features(geoms, crs='epsg:3857')
+gpd_polygonized_raster = gpd.GeoDataFrame.from_features(geoms, crs='epsg:3857')
🧰 Tools
🪛 Ruff (0.8.0)

117-117: Undefined name gp

(F821)

Comment on lines +68 to +90
@staticmethod
def load_tiff(file_path):
    with rasterio.open(file_path) as src:
        image = src.read()
        profile = src.profile
    return reshape_as_image(image), profile

# Function to normalize the image
@staticmethod
def normalize_image(image):
    return image / np.max(image)

# Function to preprocess the image for prediction
@staticmethod
def preprocess_image(image_path):
    image, image_profile = load_tiff(image_path)
    # Check if the image has 4 channels (e.g., RGBA)
    if image.shape[-1] == 4:
        # Discard the alpha channel (keep only the first three channels: RGB)
        image = image[:, :, :3]
    image = normalize_image(image)
    return image, image_profile
@staticmethod

⚠️ Potential issue

Correct static method calls within the class

Static methods are being called without specifying the class name:

  • Line 83: image, image_profile = load_tiff(image_path)
  • Line 88: image = normalize_image(image)

Since these are static methods within the PredictionAndProcessing class, they should be called using the class name:

- image, image_profile = load_tiff(image_path)
+ image, image_profile = PredictionAndProcessing.load_tiff(image_path)
- image = normalize_image(image)
+ image = PredictionAndProcessing.normalize_image(image)

Alternatively, if you prefer cleaner code, consider changing these methods to class methods or instance methods.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 @staticmethod
 def load_tiff(file_path):
     with rasterio.open(file_path) as src:
         image = src.read()
         profile = src.profile
     return reshape_as_image(image), profile
 # Function to normalize the image
 @staticmethod
 def normalize_image(image):
     return image / np.max(image)
 # Function to preprocess the image for prediction
 @staticmethod
 def preprocess_image(image_path):
-    image, image_profile = load_tiff(image_path)
+    image, image_profile = PredictionAndProcessing.load_tiff(image_path)
     # Check if the image has 4 channels (e.g., RGBA)
     if image.shape[-1] == 4:
         # Discard the alpha channel (keep only the first three channels: RGB)
         image = image[:, :, :3]
-    image = normalize_image(image)
+    image = PredictionAndProcessing.normalize_image(image)
     return image, image_profile
 @staticmethod
🧰 Tools
🪛 Ruff (0.8.0)

83-83: Undefined name load_tiff

(F821)


88-88: Undefined name normalize_image

(F821)

@Baalmart Baalmart left a comment

Thanks, @pronapro, for this. The future is indeed bright.

  • I can see that the deploy preview for the PR is failing. Please investigate this.
  • Can you please reach out (individually) to @wabinyai and @Mnoble-19 to thoroughly review this PR?

Thanks again.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🧹 Outside diff range and nitpick comments (7)
src/spatial/Dockerfile (3)

4-12: Consider enhancing the system dependencies installation

The GDAL dependencies installation looks good, but could be improved for better reliability and efficiency.

Consider these improvements:

 RUN apt-get update && apt-get install -y --no-install-recommends \
     gcc \
     build-essential \
     libgomp1 \
     gdal-bin \
     libgdal-dev \
     libproj-dev \
     proj-bin \
+    python3-dev \
+    && apt-get clean \
     && rm -rf /var/lib/apt/lists/*

Additionally, consider pinning specific versions of critical packages for reproducible builds.


17-19: Enhance GDAL configuration with additional environment variables

While the basic GDAL include paths are set, consider adding more configuration for optimal GDAL operation.

Consider adding these additional environment variables:

 ENV CPLUS_INCLUDE_PATH=/usr/include/gdal
 ENV C_INCLUDE_PATH=/usr/include/gdal
+ENV GDAL_VERSION=$(gdal-config --version)
+ENV PROJ_LIB=/usr/share/proj

Line range hint 1-43: Consider optimizing for geospatial processing workloads

Given that this service handles geospatial data processing:

  1. Consider using a more specialized base image like gdal/gdal for better performance
  2. Memory and CPU requirements might need adjustment for processing large datasets
  3. Consider adding health checks for GDAL functionality

Example health check addition:

 FROM base as production
 ENV FLASK_ENV=production
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+    CMD gdal-config --version || exit 1
 CMD gunicorn --bind=0.0.0.0:5000 app:app
src/spatial/requirements.txt (1)

37-37: Clarify web framework strategy and fix versions

The presence of both FastAPI and Flask could indicate:

  1. A transition in progress
  2. Unnecessary complexity maintaining two web frameworks

Consider:

  1. Choosing one framework (FastAPI is more modern and async-first)
  2. If both are needed, document the reason
  3. Specify versions for stability:
-fastapi==0.115.0  # Specify if needed, otherwise can be removed.
-flask-cors 
-flask    
+fastapi==0.115.0
+flask-cors==5.0.0
+flask==3.0.3

Also applies to: 39-39, 153-153

src/spatial/controllers/controllers.py (2)

84-87: Enhance function name and add API documentation

The function name upload_image_for_prediction could be more specific to its purpose with stationary pollution sources. Additionally, add docstring documentation for API consumers.

 @controller_bp.route('/upload-image', methods=['POST'])
-def upload_image_for_prediction():
+def upload_stationary_pollution_source():
+    """Upload and analyze an image of a stationary pollution source.
+    
+    Returns:
+        JSON response containing geometry, confidence score, and environmental profiles
+    """
     return PollutantApis.upload_image()

92-94: Add pagination for get-all-data endpoint

The endpoint retrieving all data should implement pagination to prevent performance issues with large datasets.

 @controller_bp.route('/get-all-data', methods=['GET'])
 def get_all_data():
+    """Retrieve all pollution source data with pagination.
+    
+    Query Parameters:
+        page (int): Page number (default: 1)
+        per_page (int): Items per page (default: 20)
+    
+    Returns:
+        JSON response containing paginated pollution source data
+    """
+    page = request.args.get('page', 1, type=int)
+    per_page = request.args.get('per_page', 20, type=int)
+    if page < 1 or per_page < 1:
+        return jsonify({'error': 'Invalid pagination parameters'}), 400
     return PollutantApis.get_all_data()
src/spatial/views/pollutant_views.py (1)

6-11: Enhance MongoDB connection handling

The MongoDB connection setup could be more robust with connection error handling and connection pooling.

Consider enhancing the connection setup:

 def connect_mongo():
+    try:
+        # Enable connection pooling
+        client = MongoClient(Config.MONGO_URI, maxPoolSize=50)
-        client = MongoClient(Config.MONGO_URI)
         db = client[Config.DB_NAME]
+        # Verify connection
+        client.server_info()
         return db
+    except Exception as e:
+        raise Exception(f"Failed to connect to MongoDB: {e}")
 
 db = connect_mongo()
 collection = db["pollutant_data"]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c59556e and 9756e54.

📒 Files selected for processing (6)
  • src/spatial/Dockerfile (1 hunks)
  • src/spatial/configure.py (1 hunks)
  • src/spatial/controllers/controllers.py (3 hunks)
  • src/spatial/models/pollutant_identification.py (1 hunks)
  • src/spatial/requirements.txt (1 hunks)
  • src/spatial/views/pollutant_views.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spatial/configure.py
🧰 Additional context used
🪛 Ruff (0.8.0)
src/spatial/models/pollutant_identification.py

2-2: os imported but unused

Remove unused import: os

(F401)


3-3: time imported but unused

Remove unused import: time

(F401)


8-8: json imported but unused

Remove unused import: json

(F401)


12-12: pandas imported but unused

Remove unused import: pandas

(F401)


19-19: shapely.geometry.mapping imported but unused

Remove unused import: shapely.geometry.mapping

(F401)


24-24: geemap imported but unused

Remove unused import: geemap

(F401)


31-31: matplotlib.patches.Polygon imported but unused

Remove unused import: matplotlib.patches.Polygon

(F401)


36-36: pymongo.MongoClient imported but unused

Remove unused import: pymongo.MongoClient

(F401)


37-37: bson.json_util imported but unused

Remove unused import: bson.json_util

(F401)


62-62: Undefined name service_account

(F821)


83-83: Undefined name load_tiff

(F821)


88-88: Undefined name normalize_image

(F821)


185-185: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


188-188: Local variable building_types is assigned to but never used

Remove assignment to unused variable building_types

(F841)

src/spatial/views/pollutant_views.py

2-2: configure.get_trained_model_from_gcs imported but unused

Remove unused import: configure.get_trained_model_from_gcs

(F401)


16-16: Undefined name request

(F821)


17-17: Undefined name jsonify

(F821)


20-20: Undefined name request

(F821)


22-22: Undefined name jsonify

(F821)


25-25: Undefined name secure_filename

(F821)


26-26: Undefined name UPLOAD_FOLDER

(F821)


31-31: Undefined name request

(F821)


32-32: Undefined name request

(F821)


34-34: Undefined name jsonify

(F821)


37-37: Undefined name time

(F821)


44-44: Undefined name time

(F821)


48-48: Undefined name jsonify

(F821)


52-52: Undefined name datetime

(F821)


65-65: Undefined name time

(F821)


67-67: Undefined name time

(F821)


71-71: Undefined name time

(F821)


73-73: Undefined name time

(F821)


76-76: Undefined name jsonify

(F821)


81-81: Undefined name mapping

(F821)


98-98: Undefined name jsonify

(F821)


101-101: Undefined name jsonify

(F821)


114-114: Undefined name request

(F821)


124-124: Undefined name jsonify

(F821)


139-139: Undefined name Response

(F821)


140-140: Undefined name json

(F821)


140-140: Undefined name json_util

(F821)


145-145: Undefined name jsonify

(F821)


155-155: Undefined name jsonify

(F821)


170-170: Undefined name Response

(F821)


171-171: Undefined name json

(F821)


171-171: Undefined name json_util

(F821)


176-176: Undefined name jsonify

(F821)

🔇 Additional comments (9)
src/spatial/requirements.txt (1)

91-91: Consolidate Google Cloud dependencies

The BigQuery-related packages have overlapping functionality:

  • pandas-gbq
  • google-cloud-bigquery
  • google-cloud-bigquery-storage

Keep only the necessary ones to avoid conflicts.

Also applies to: 150-151

src/spatial/controllers/controllers.py (3)

16-16: LGTM: Clean import statement

The import follows Python conventions and is logically placed with other view imports.


84-87: ⚠️ Potential issue

Add request validation for image upload endpoint

The current implementation delegates all validation to the PollutantApis class. Consider adding basic request validation at the controller level.

 @controller_bp.route('/upload-image', methods=['POST'])
 def upload_image_for_prediction():
+    if 'image' not in request.files:
+        return jsonify({'error': 'No image file provided'}), 400
+    if not request.form.get('duration'):
+        return jsonify({'error': 'Duration parameter is required'}), 400
+    if not request.form.get('radius'):
+        return jsonify({'error': 'Radius parameter is required'}), 400
     return PollutantApis.upload_image()

84-94: Verify error handling implementation in PollutantApis

Let's verify the error handling implementation in the PollutantApis class methods.

✅ Verification successful

Error handling is well-implemented in PollutantApis

The verification shows comprehensive error handling across all endpoints:

  • Input validation with proper 400 responses for invalid requests
  • Not found conditions returning 404 status
  • Server errors properly caught with 500 status codes
  • Detailed error messages in JSON format for all error cases

Specific error cases handled:

  • File upload validation
  • Parameter validation for months and radius
  • MongoDB operations
  • Data processing for centroids
  • Empty result sets
  • Generic exception handling
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check error handling in PollutantApis implementation

# Search for try-except blocks in the implementation
echo "Checking error handling in PollutantApis methods..."
rg -A 10 "try:|except:|finally:" src/spatial/views/pollutant_views.py

# Check for response error handling
echo "Checking response formatting..."
rg "jsonify.*error|abort.*[45][0-9]{2}" src/spatial/views/pollutant_views.py

Length of output: 4241

src/spatial/views/pollutant_views.py (2)

75-77: 🛠️ Refactor suggestion

Improve error handling granularity

The current error handling is too broad and exposes internal error details.

Implement more specific error handling:

-            except Exception as e:
-                return jsonify({"error": f"Error processing data for centroid: {e}"}), 500
+            except ValueError as e:
+                return jsonify({"error": "Invalid coordinates provided"}), 400
+            except TimeoutError:
+                return jsonify({"error": "Processing timeout occurred"}), 504
+            except Exception as e:
+                # Log the actual error for debugging
+                current_app.logger.error(f"Error processing centroid: {e}")
+                return jsonify({"error": "An error occurred while processing the data"}), 500

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.0)

76-76: Undefined name jsonify

(F821)


16-27: ⚠️ Potential issue

Enhance file upload validation and security

The current file validation is basic and could be more secure.

Add comprehensive file validation:

+ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg', 'gif'}
+def allowed_file(filename):
+    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

 if 'file' not in request.files:
     return jsonify({"error": "No file part in the request"}), 400

 file = request.files['file']
 if file.filename == '':
     return jsonify({"error": "No selected file"}), 400

+if not allowed_file(file.filename):
+    return jsonify({"error": "Invalid file type. Allowed types: png, jpg, jpeg, gif"}), 400

+# Generate unique filename to prevent overwrites
+timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
+filename = f"{timestamp}_{secure_filename(file.filename)}"
-filename = secure_filename(file.filename)
 file_path = os.path.join(UPLOAD_FOLDER, filename)
+
+# Ensure upload directory exists
+os.makedirs(UPLOAD_FOLDER, exist_ok=True)
 file.save(file_path)

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.0)

16-16: Undefined name request

(F821)


17-17: Undefined name jsonify

(F821)


20-20: Undefined name request

(F821)


22-22: Undefined name jsonify

(F821)


25-25: Undefined name secure_filename

(F821)


26-26: Undefined name UPLOAD_FOLDER

(F821)

src/spatial/models/pollutant_identification.py (3)

39-41: Ensure get_trained_model_from_gcs and Config are correctly imported

While get_trained_model_from_gcs and Config are imported from configure, please verify that:

  • The configure module exists and is accessible.
  • Config contains the required attributes (GOOGLE_CLOUD_PROJECT_ID, PROJECT_BUCKET).
  • get_trained_model_from_gcs functions as expected.

Would you like assistance in verifying these imports and configurations?

Also applies to: 48-51


62-62: ⚠️ Potential issue

Import the service_account module

The service_account module is used but not imported, leading to a potential NameError. Please add the following import statement:

+ from google.oauth2 import service_account

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.0)

62-62: Undefined name service_account

(F821)


60-65: ⚠️ Potential issue

Move initialize_earth_engine method inside the class

The @staticmethod decorator is applied to initialize_earth_engine, but it's defined outside of the PredictionAndProcessing class. To maintain proper structure, move this method inside the class:

- @staticmethod
- def initialize_earth_engine():
+ class PredictionAndProcessing:
+     @staticmethod
+     def initialize_earth_engine():

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.0)

62-62: Undefined name service_account

(F821)

Comment on lines +21 to +22
#install tensorflow
#RUN pip install tensorflow==2.11.1 --no-deps

💡 Codebase verification

TensorFlow dependency is actively used and should not be removed

The TensorFlow installation in the Dockerfile should not be commented out as it's actively used in src/spatial/models/pollutant_identification.py for:

  • Loading a pre-trained model using tf.keras.models.load_model
  • Making predictions with the loaded model
  • Processing image data for pollutant identification

The code shows critical dependencies on TensorFlow that are essential for the application's machine learning functionality. Please uncomment and restore the TensorFlow installation in the Dockerfile.

🔗 Analysis chain

Verify the removal of TensorFlow dependency

The TensorFlow installation has been commented out without explanation. This could impact the application's functionality, especially for machine learning operations.

Let's check if TensorFlow is still required:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if TensorFlow is still referenced in the codebase
echo "Checking for TensorFlow imports and usage..."
rg -l "import tensorflow" || echo "No direct TensorFlow imports found"
rg -l "from tensorflow" || echo "No direct TensorFlow imports found"

# Check if it's in requirements.txt
echo "Checking requirements.txt for TensorFlow..."
grep -i "tensorflow" requirements.txt || echo "TensorFlow not found in requirements.txt"

Length of output: 554


Script:

#!/bin/bash
# Check the content of the file that imports tensorflow
echo "Content of pollutant_identification.py:"
cat src/spatial/models/pollutant_identification.py

# Check if there are any other potential ML/DL frameworks being used
echo -e "\nChecking for other ML frameworks:"
rg -l "import torch|from torch|import keras|from keras"

# Check for any dependency files
echo -e "\nLooking for dependency files:"
fd -g "*requirements*.txt" -g "setup.py" -g "pyproject.toml"

Length of output: 9325

google-cloud-bigquery[pandas]
gunicorn
flask-cors~=5.0.0
tensorflow==2.15.1

⚠️ Potential issue

Specify versions for core ML dependencies

The ML stack requires careful version management to ensure compatibility:

  1. TensorFlow 2.15.1 requires specific numpy versions
  2. torch and transformers should have pinned versions for reproducibility

Apply these changes:

tensorflow==2.15.1
-#keras==2.9.0    # Ensure compatibility with TensorFlow.
-numpy
+numpy>=1.26.0,<2.1.0  # Required by TensorFlow 2.15.1
-transformers
-torch
+transformers==4.36.0
+torch==2.1.1

Also applies to: 75-75, 85-85, 144-145

contourpy
cycler
decorator
dnspython
earthengine-api~=1.1.1

🛠️ Refactor suggestion

Standardize geospatial dependency versions

The geospatial stack shows inconsistent version specifications:

  • Some use ~= (earthengine-api, esda)
  • Others use == (fiona, geopandas)

Standardize version specifications:

-earthengine-api~=1.1.1
-esda~=2.6.0
-fiona==1.10.1
-geopandas==0.13.2
-pyproj==3.5.0
-shapely~=2.0.6
+earthengine-api==1.1.1
+esda==2.6.0
+fiona==1.10.1
+geopandas==0.13.2
+pyproj==3.5.0
+shapely==2.0.6

Also applies to: 34-34, 38-38, 46-47, 108-108, 125-125

@@ -17,18 +1,153 @@
tensorflow==2.15.1
wrapt>=1.11.0,<1.15
Jinja2

⚠️ Potential issue

Pin versions for reproducible builds

Many dependencies lack version specifications, which could lead to unexpected breaks when new versions are released.

At minimum, pin versions for core dependencies:

-Jinja2
-Markdown
-MarkupSafe
-Werkzeug
-anyio
+Jinja2==3.1.3
+Markdown==3.5.1
+MarkupSafe==2.1.3
+Werkzeug==3.0.1
+anyio==4.2.0

Also applies to: 5-9, 13-31

Comment on lines +88 to +91
def get_data_by_confidence():
    return PollutantApis.get_data_by_confidence()


🛠️ Refactor suggestion

Add parameter validation and documentation for confidence endpoint

The confidence score endpoint needs input validation and documentation for API consumers.

 @controller_bp.route('/get-data-by-confidence', methods=['GET'])
 def get_data_by_confidence():
+    """Retrieve pollution source data filtered by confidence score.
+    
+    Query Parameters:
+        min_confidence (float): Minimum confidence score (0.0 to 1.0)
+    
+    Returns:
+        JSON response containing filtered pollution source data
+    """
+    min_confidence = request.args.get('min_confidence', type=float)
+    if min_confidence is None or not 0 <= min_confidence <= 1:
+        return jsonify({'error': 'Invalid min_confidence parameter'}), 400
     return PollutantApis.get_data_by_confidence()

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +111 to +124
def get_data_by_confidence():
    try:
        # Parse the minimum confidence score from query parameters
        min_confidence = float(request.args.get('min_confidence', 0))
        print(f"Querying with min_confidence: {min_confidence}")

        # Query documents where confidence_score is greater than or equal to min_confidence
        data = list(collection.find({
            "properties.confidence_score": {"$gte": min_confidence}
        }))

        # If no documents match, return an error message
        if not data:
            return jsonify({"error": "No data found for the given confidence score."}), 404

🛠️ Refactor suggestion

Add pagination and remove debug print

The endpoint could benefit from pagination for large datasets, and contains a debug print statement.

Implement pagination and clean up debugging:

     def get_data_by_confidence():
         try:
             # Parse the minimum confidence score from query parameters
             min_confidence = float(request.args.get('min_confidence', 0))
-            print(f"Querying with min_confidence: {min_confidence}")
+            
+            # Add pagination parameters
+            page = int(request.args.get('page', 1))
+            per_page = int(request.args.get('per_page', 10))
+            skip = (page - 1) * per_page
 
             # Query documents where confidence_score is greater than or equal to min_confidence
             data = list(collection.find({
                 "properties.confidence_score": {"$gte": min_confidence}
-            }))
+            }).skip(skip).limit(per_page))

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.0)

114-114: Undefined name request

(F821)


124-124: Undefined name jsonify

(F821)

Comment on lines +148 to +176
def get_all_data():
    try:
        # Retrieve all documents from the collection
        data = list(collection.find({}))

        # If no documents are found, return an error message
        if not data:
            return jsonify({"error": "No data found in the database."}), 404

        # Convert the retrieved documents to a GeoJSON FeatureCollection
        geojson_data = {
            "type": "FeatureCollection",
            "features": [
                {
                    "type": "Feature",
                    "geometry": doc["geometry"],
                    "properties": doc["properties"]
                } for doc in data
            ]
        }

        # Return the GeoJSON data
        return Response(
            json.dumps(geojson_data, default=json_util.default),
            mimetype='application/geo+json'
        )

    except Exception as e:
        return jsonify({"error": f"Failed to fetch data: {e}"}), 500

🛠️ Refactor suggestion

Optimize performance for large datasets

The current implementation could face performance issues with large datasets.

Consider implementing these optimizations:

     @staticmethod
     def get_all_data():
         try:
+            # Add pagination parameters
+            page = int(request.args.get('page', 1))
+            per_page = int(request.args.get('per_page', 10))
+            skip = (page - 1) * per_page
+
+            # Get total count for pagination metadata
+            total_count = collection.count_documents({})
+
             # Retrieve all documents from the collection
-            data = list(collection.find({}))
+            data = list(collection.find({}).skip(skip).limit(per_page))
 
             # If no documents are found, return an error message
             if not data:
                 return jsonify({"error": "No data found in the database."}), 404
 
             # Convert the retrieved documents to a GeoJSON FeatureCollection
             geojson_data = {
                 "type": "FeatureCollection",
-                "features": [
+                "features": [
                     {
                         "type": "Feature",
                         "geometry": doc["geometry"],
                         "properties": doc["properties"]
                     } for doc in data
-                ]
+                ],
+                "metadata": {
+                    "page": page,
+                    "per_page": per_page,
+                    "total_count": total_count,
+                    "total_pages": (total_count + per_page - 1) // per_page
+                }
             }

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.0)

155-155: Undefined name jsonify

(F821)


170-170: Undefined name Response

(F821)


171-171: Undefined name json

(F821)


171-171: Undefined name json_util

(F821)


176-176: Undefined name jsonify

(F821)

Comment on lines +2 to +3
import os
import time

🛠️ Refactor suggestion

Remove unused imports to enhance code clarity

The following imports are unused and can be safely removed:

  • Line 2: import os
  • Line 3: import time
  • Line 8: import json
  • Line 12: import pandas as pd
  • Line 19: from shapely.geometry import mapping
  • Line 24: import geemap
  • Line 31: from matplotlib.patches import Polygon as pltPolygon
  • Line 36: from pymongo import MongoClient
  • Line 37: from bson import json_util

Cleaning up unused imports helps maintain readable and efficient code.

Also applies to: 8-8, 12-12, 19-19, 24-24, 31-31, 36-37

🧰 Tools
🪛 Ruff (0.8.0)

2-2: os imported but unused

Remove unused import: os

(F401)


3-3: time imported but unused

Remove unused import: time

(F401)

# Function to preprocess the image for prediction
@staticmethod
def preprocess_image(image_path):
    image, image_profile = load_tiff(image_path)

⚠️ Potential issue

Correct static method calls within the class

Static methods should be called using the class name or by fully qualifying the method. Update the method calls as follows:

  • Line 83: Replace load_tiff(image_path) with PredictionAndProcessing.load_tiff(image_path)
  • Line 88: Replace normalize_image(image) with PredictionAndProcessing.normalize_image(image)

This ensures clarity and prevents potential errors.

Also applies to: 88-88

🧰 Tools
🪛 Ruff (0.8.0)

83-83: Undefined name load_tiff

(F821)

Comment on lines +185 to +188
except Exception as e:
    number_of_buildings = 'Error'
    building_density = 'Error'
    building_types = 'Error'

🛠️ Refactor suggestion

Handle unused variables e and building_types

The variables e and building_types are assigned but never used:

  • Line 185: Exception variable e is assigned but not utilized.
  • Line 188: building_types is assigned but not used afterward.

Consider removing these variables or using them appropriately to avoid confusion.

🧰 Tools
🪛 Ruff (0.8.0)

185-185: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


188-188: Local variable building_types is assigned to but never used

Remove assignment to unused variable building_types

(F841)
