We talked about a project that has set up a small camera network in an area where physical sampling of seeds and saplings is also taking place. The aim is to observe "phenophases" of deposited material and to complement lab-based sampling with visual sampling.

* Common ground with other projects: image preprocessing, managing cloud object storage, exploring suitable computer vision techniques
* Researcher support needs: setting up Python virtual environments, some pair programming to consolidate the Python skills needed to work with ML/CV toolkits, getting into a Git-based workflow
* Some high-level consultancy on ML approaches from the University of Edinburgh, via https://homepages.inf.ed.ac.uk/omacaod/
BioCLIP: A Vision Foundation Model for the Tree of Life came up in an email thread (via @mattfry-ceh) and is worth linking here. It's a fine-tuned CLIP/ViT: "We curate and release TreeOfLife-10M (the largest and most diverse available dataset of biology images), train BioCLIP, rigorously benchmark our approach on diverse fine-grained biology classification tasks".
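For anyone new to CLIP-style models like BioCLIP: zero-shot classification works by embedding an image and a set of text prompts (e.g. "a photo of a seed") into a shared space, then picking the prompt whose embedding is most similar to the image's. Below is a minimal stdlib sketch of just that scoring step, assuming the embeddings have already been produced by the model (via whatever loading code its own docs recommend). The toy 3-d vectors, the labels, the `zero_shot_scores` name and the `logit_scale` value are all illustrative assumptions, not BioCLIP's API.

```python
import math
from typing import Dict, List, Sequence


def _normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_scores(
    image_emb: Sequence[float],
    text_embs: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumed temperature; CLIP-style models learn this value
) -> Dict[str, float]:
    """Return a probability per label via cosine similarity then softmax (hypothetical helper)."""
    img = _normalise(image_emb)
    logits = {
        label: logit_scale * sum(a * b for a, b in zip(img, _normalise(t)))
        for label, t in text_embs.items()
    }
    # Subtract the max logit before exponentiating, for numerical stability.
    m = max(logits.values())
    exps = {label: math.exp(z - m) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


if __name__ == "__main__":
    # Toy embeddings standing in for real image and text encoder outputs.
    image = [0.9, 0.1, 0.0]
    prompts = {
        "a photo of a seed": [1.0, 0.0, 0.0],
        "a photo of a sapling": [0.0, 1.0, 0.0],
        "a photo of bare soil": [0.0, 0.0, 1.0],
    }
    scores = zero_shot_scores(image, prompts)
    print(max(scores, key=scores.get))
```

The appeal for this project is that new classes only need new prompts, not retraining; whether that holds up on camera-network images is an open question.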
https://github.com/agentmorris/MegaDetector (MILA / MegaDetector), via @KatrionaGoldmann at the Turing Institute: "...helping conservation biologists spend less time doing boring things with camera trap images." It's an object detection model that originally came out of AI for Earth / Planetary Computer. It could be a shoo-in for prototyping; I'd be really interested to explore how the group in Biodiversity is reusing it.
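MegaDetector's batch mode writes its results as JSON, and a common first use is triaging out empty frames before anyone looks at them. Here is a minimal sketch of that triage step, assuming the output shape described in the MegaDetector repo: a `detection_categories` map such as `"1": "animal"`, and per-image `detections` with `category`, `conf` and a normalised `bbox`. Please check those field names against the current docs before relying on this. `split_by_detection` is a hypothetical helper, and the 0.2 default threshold is an assumption to tune on your own cameras.

```python
import json
from typing import Dict, List, Tuple


def split_by_detection(
    results: dict,
    wanted: str = "animal",
    threshold: float = 0.2,  # assumed confidence cut-off; tune on your own images
) -> Tuple[List[str], List[str]]:
    """Split image paths into (likely contains `wanted`, probably not)."""
    # Map category IDs ("1", "2", ...) to names ("animal", "person", ...).
    categories: Dict[str, str] = results.get("detection_categories", {})
    hits, misses = [], []
    for image in results.get("images", []):
        # Images that failed to process may have no (or null) detections.
        detections = image.get("detections") or []
        if any(
            categories.get(d["category"]) == wanted and d["conf"] >= threshold
            for d in detections
        ):
            hits.append(image["file"])
        else:
            misses.append(image["file"])
    return hits, misses


if __name__ == "__main__":
    # Hypothetical results, shaped like MegaDetector's batch output JSON.
    sample = json.loads("""
    {
      "detection_categories": {"1": "animal", "2": "person", "3": "vehicle"},
      "images": [
        {"file": "cam01/0001.jpg",
         "detections": [{"category": "1", "conf": 0.91, "bbox": [0.1, 0.2, 0.3, 0.4]}]},
        {"file": "cam01/0002.jpg", "detections": []},
        {"file": "cam02/0001.jpg",
         "detections": [{"category": "2", "conf": 0.88, "bbox": [0.5, 0.5, 0.2, 0.2]}]}
      ]
    }
    """)
    keep, skip = split_by_detection(sample)
    print(keep, skip)
```

The "keep" list is then what would go on to any finer-grained classifier.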
The researcher who's coding on this project, @albags and I ran a 2-hour mob programming session yesterday, which was interesting. Some of this needs to be split out into other places (additions to the diagrams of requests coming to the RSE group and how we route them, #19; and documentation PRs for problems we've helped solve more than once). A quick summary:
Follow-ups here (for me, anyway) are:
A quick update note on how this is going! @albags and I are holding mob programming sessions with @elirai about once every three weeks; in practice these are more code walkthrough and consultancy than hands-on work. The work's progressing really fast (almost completely without our involvement!) and is currently at the stage of trialling [NNI](https://nni.readthedocs.io/en/stable/) (new to me) for hyperparameter sweeping. It's all running on JASMIN now, because of performance limitations on the available on-prem VM.
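What NNI automates is a loop: sample hyperparameters from a declared search space, run a trial, keep the best result. To make that concrete without NNI installed, here is a minimal stdlib sketch of plain random search over a search space written in the `{"_type": ..., "_value": ...}` shape that NNI's search space files use (only `choice`, `uniform` and `loguniform` are handled). The parameter names and ranges are illustrative, and `toy_objective` is an entirely hypothetical stand-in for a real training run. NNI's own tuners (TPE, for example) are smarter than random sampling.

```python
import math
import random
from typing import Any, Callable, Dict, Tuple

# Search space in the NNI-style {"_type": ..., "_value": ...} layout (values are illustrative).
SEARCH_SPACE: Dict[str, Dict[str, Any]] = {
    "lr": {"_type": "loguniform", "_value": [1e-5, 1e-1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
    "dropout": {"_type": "uniform", "_value": [0.0, 0.5]},
}


def sample(space: Dict[str, Dict[str, Any]], rng: random.Random) -> Dict[str, Any]:
    """Draw one configuration from the search space."""
    params = {}
    for name, spec in space.items():
        kind, value = spec["_type"], spec["_value"]
        if kind == "choice":
            params[name] = rng.choice(value)
        elif kind == "uniform":
            params[name] = rng.uniform(value[0], value[1])
        elif kind == "loguniform":
            # Sample uniformly in log space so small learning rates get fair coverage.
            params[name] = math.exp(rng.uniform(math.log(value[0]), math.log(value[1])))
        else:
            raise ValueError(f"unsupported _type: {kind}")
    return params


def random_search(
    objective: Callable[[Dict[str, Any]], float],
    space: Dict[str, Dict[str, Any]],
    n_trials: int = 20,
    seed: int = 0,
) -> Tuple[Dict[str, Any], float]:
    """Run n_trials random trials and return the best (params, score); higher is better."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = sample(space, rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params: Dict[str, Any]) -> float:
    """Hypothetical stand-in for a training run: best at lr=1e-3, dropout=0.2."""
    return -abs(math.log10(params["lr"]) + 3) - abs(params["dropout"] - 0.2)


if __name__ == "__main__":
    best, score = random_search(toy_objective, SEARCH_SPACE, n_trials=50)
    print(best, round(score, 3))
```

In a real sweep each trial is an expensive training run, which is where NNI's scheduling and smarter tuners earn their keep.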
From: Jo Walsh
Sent: Wednesday, August 28, 2024 5:02:40 AM
To: NERC-CEH/rse_group
Cc: Alba Gomez Segura
Subject: Re: [NERC-CEH/rse_group] Computer vision model reuse / DataLabs (Discussion #12)
* Data and metadata management for image collections: ease of getting data into cloud storage and reusing it in experiments, and ease of linking the processes applied to intermediate forms of data ("provenance"). The [object store API](https://github.com/NERC-CEH/object_store_api/) is intended to address the first part of this. [DVC](https://dvc.org/) could be a shoo-in for the latter part; @matthewcoole has been evaluating it recently for LLM fine-tuning workflows. DVC has [S3 storage integration](https://dvc.org/doc/user-guide/data-management/remote-storage/amazon-s3), which could work with the [S3 storage on JASMIN](https://s3-portal.jasmin.ac.uk/login) or the planned in-house one. The trick here is introducing this in a way that has the right kind of affordance and doesn't feel like too much overhead for a researcher in the early stages of an experiment.
* Experiment tracking for ML model training. I've happily used [MLflow](https://mlflow.org/) for this, both standalone and as it's integrated behind Databricks / Azure Machine Learning. DVC has its own moral equivalent, and [Weights and Biases](https://wandb.ai/site) could work well in a single-developer, local-install setting. There are too many options; this is a matter both of making a strong recommendation and of producing example-driven documentation. We'd all benefit from a hands-on deep dive into DVC, post-RSECon.
* Python packaging for reuse. There's a project repository with a naturally emerging practice of breaking down experimental notebooks into sets of functions. Thanks @lewis-chambers for making the [python-template](https://github.com/NERC-CEH/python-template/) repo used in the bigger Digital Research Infrastructure projects publicly available!
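As a concrete picture of what experiment trackers like these record, here is a toy, stdlib-only sketch: one folder per run holding the parameters and an append-only log of metrics, plus a helper that compares runs. `RunLogger`, `best_run`, the experiment name and the file layout are all hypothetical and are not MLflow's, DVC's or W&B's formats; the real tools add a UI, artefact storage and remote backends on top of the same basic idea.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path
from typing import Any, Dict, List


class RunLogger:
    """Hypothetical minimal tracker: one folder per run holding params and metrics."""

    def __init__(self, root: Path, experiment: str):
        self.run_id = uuid.uuid4().hex[:8]
        self.dir = Path(root) / experiment / self.run_id
        self.dir.mkdir(parents=True, exist_ok=False)

    def log_params(self, params: Dict[str, Any]) -> None:
        (self.dir / "params.json").write_text(json.dumps(params, indent=2))

    def log_metric(self, key: str, value: float, step: int) -> None:
        # Append-only JSON Lines, so a crashed run keeps the metrics logged so far.
        record = {"key": key, "value": value, "step": step, "time": time.time()}
        with open(self.dir / "metrics.jsonl", "a") as f:
            f.write(json.dumps(record) + "\n")


def best_run(root: Path, experiment: str, metric: str) -> Dict[str, Any]:
    """Compare runs by the last logged value of `metric` (higher is better)."""
    summaries: List[Dict[str, Any]] = []
    for run_dir in (Path(root) / experiment).iterdir():
        records = [json.loads(line) for line in (run_dir / "metrics.jsonl").read_text().splitlines()]
        final = [r["value"] for r in records if r["key"] == metric][-1]
        params = json.loads((run_dir / "params.json").read_text())
        summaries.append({"run_id": run_dir.name, "params": params, metric: final})
    return max(summaries, key=lambda s: s[metric])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for lr in (1e-2, 1e-3):
            run = RunLogger(Path(tmp), "seedling-classifier")
            run.log_params({"lr": lr, "epochs": 3})
            for epoch in range(3):
                # Made-up accuracy curve standing in for real validation results.
                run.log_metric("val_acc", 0.5 + epoch * (0.1 if lr == 1e-3 else 0.05), step=epoch)
        print(best_run(Path(tmp), "seedling-classifier", "val_acc")["params"])
```

Seeing how little is needed for the basics is part of the argument for picking one real tool and documenting it well, rather than each project growing its own.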
We spent about 1.5 hours today looking at how to fit DVC into this project to help with data management, experiment tracking for model training, or both via pipelines. Notes to catch you up, @albags, and thanks to @matthewcoole for popping in to share notes on DVC. It would be wonderful to get an automated workflow setup going with

The training data is included in the repository; the growing image collection used for change detection and classification lives in what looks like the filesystem of a JASMIN Group Workspace.
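The idea DVC builds on is content addressing: a data file is hashed, its bytes go into a cache keyed by that hash, and Git only tracks a small pointer file recording the hash. That's what would let an experiment say exactly which version of the image collection it used. Below is a toy stdlib sketch of the idea, assuming MD5 as the hash (as DVC uses); the `track` and `restore` functions, the `.ptr.json` pointer format and the cache layout (which only loosely mirrors DVC's two-character prefix folders) are hypothetical simplifications, not DVC's actual file formats or CLI.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images don't need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def track(path: Path, cache: Path) -> Path:
    """Copy `path` into a content-addressed cache and write a small pointer file next to it."""
    digest = file_md5(path)
    # Two-character prefix folders keep any single cache directory from growing huge.
    target = cache / digest[:2] / digest[2:]
    if not target.exists():  # identical content is only stored once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    pointer = path.with_name(path.name + ".ptr.json")
    pointer.write_text(json.dumps({"path": path.name, "md5": digest, "size": path.stat().st_size}))
    return pointer


def restore(pointer: Path, cache: Path) -> Path:
    """Recreate the tracked file from the cache using the hash recorded in its pointer."""
    meta = json.loads(pointer.read_text())
    source = cache / meta["md5"][:2] / meta["md5"][2:]
    dest = pointer.with_name(meta["path"])
    shutil.copy2(source, dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        img = root / "frame_0001.jpg"
        img.write_bytes(b"not really a jpeg")
        ptr = track(img, root / "cache")
        img.unlink()  # e.g. a fresh checkout where only the pointer is in Git
        print(restore(ptr, root / "cache").read_bytes())
```

In DVC proper, the cache can be pushed to a remote such as S3, which is where the JASMIN object store question comes back in.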
We touched on JASMIN's object storage and the ways different projects use it. It adds complexity for an early-stage project, and the reaction to it is useful for planning how to make it easier to get data to and fro; we still haven't picked out a pattern for doing this happily with DVC. We also touched on the choice of an off-the-shelf model for classifying the areas that the initial model picks out as containing changes. This probably wants to be a chain or an ensemble; LeafNet could serve for an initial prototype. This part of the project isn't ours to get into, though; we're here for the wiring and tooling...
Asian Hornet Database: labelling/annotation; a good test case, with a gold-ish standard. The project may soon become unfunded.
@cmtso referred to me a researcher who has compound issues working with computer vision models within DataLabs. We have an intro chat later to find out more about the use case, the data, etc. I'm not sure what to expect, but I anticipate something like
I'll update this with any notes; @albags is in the invite.
See also https://github.com/NERC-CEH/datalab