Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[instructlab] Add new plugin for Instructlab #3804

Closed

Conversation

jcastill
Copy link
Member

This change adds a new plugin that captures information from an installation of Instructlab. The capture works in two ways:

  • It goes through all containers whose names contain Instructlab, and
  • If specified, it will gather information from a user home directory.

In the first case, we gather different outputs from the command 'ilab', as well as directories containing training data, configuration files, chatlogs, taxonomies, and other data.
In the second case, we capture only certain directories. Users can also specify to gather .cache directory, where models and OCI directories can be found. This option is disabled by default because, while the information can be very useful, the directories are very big.

Related: RH: PLMCORE-10599, RHEL-54137, RHEL-58173


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname [email protected]?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?
  • Are all passwords or private data gathered by this PR obfuscated?

Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3804
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@jcastill
Copy link
Member Author

Looking forward to suggestions about cleaning up this a bit.
The last section, when a user can be specified, will eventually have the same commands as with the containers, but I had some issues running them through both 'runas' and 'su -'. I want to investigate these issues further (mainly, timeouts), but I want to have this plugin ready asap so our support guys can start using it.

Copy link
Member

@arif-ali arif-ali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial comments

sos/report/plugins/instructlab.py Outdated Show resolved Hide resolved
sos/report/plugins/instructlab.py Outdated Show resolved Hide resolved
sos/report/plugins/instructlab.py Outdated Show resolved Hide resolved
@jcastill jcastill force-pushed the jcastillo-add-new-instructlab-plugin branch from 8816821 to 0e4cd28 Compare October 11, 2024 21:12
Comment on lines 80 to 83
_containers = self.get_containers()

for _con in _containers:
if _con[1].startswith('instructlab'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the Plugin() container methods for returning only containers of a certain name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

in_container = True
container_names.append(_con[1])

if in_container:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ever run outside a container? If not, we can drop the conditional. If yes, then why do we not collect these when instructlab is running on the host directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in version 1.1 of RHEL AI, instructlab can run both in a container and outside a container, in the same installation. This may change for 1.2 and later

Comment on lines 89 to 140
self.add_copy_spec(
[f'{cont_share_conf_path}rhel_ai_config.yaml',
f'{cont_config_path}/config.yaml'],
container=cont)
self.add_copy_spec(
[f"{cont_local_path}/{data_dir}"
for data_dir in data_dirs],
container=cont
)
self.add_cmd_output(
[f"ilab {sub}" for sub in subcmds],
container=cont
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long standing sos style is to open and close the list bracket and method parens on the same line, with the contents indented.

self.add_copy_spec([
     f"{thing}",
     f"{thing2}"
])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

Comment on lines 112 to 115
ilab_user = self.get_option("ilab_user") if \
self.get_option("ilab_user") else ''
if ilab_user:
Copy link
Member

@TurboTurtle TurboTurtle Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conditional is redundant as ilab_user defaults to ''.

This would also be cleaner with a walrus for the second conditional, though I guess we have yet to formally enact #3533. @arif-ali thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TurboTurtle we are more than happy to go ahead with py38 as agreed for future developments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated in the #3533, ok for us as well.

ilab_user = self.get_option("ilab_user") if \
self.get_option("ilab_user") else ''
if ilab_user:
ilab_dir = f'/home/{ilab_user}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not guaranteed for this to be the home dir of the given user. The plugin should determine the correct home dir rather than hardcoding an assumption.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of RHEL AI it is... the default user will live in /home always.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't one override the home directory (of one user or all)? Well, the bootc's kickstart will create the home directory there, but the KS template can be customized (and the docs even assumes the /home/<user> is not hardcoded, per sentence "The default location for the InstructLab data is in the home/ directory."). So I would vote for kind-of ~{ilab_user} approach than the hardcoded path.

Comment on lines 116 to 129
if self.get_option("ilab_conf_dir"):
ilab_dir = f'{ilab_dir}{self.get_option("ilab_conf_dir")}'
data_dirs_base = f'{ilab_dir}{local_share_dir}'

self.add_copy_spec(f"{ilab_dir}{config_dir}")
self.add_copy_spec([
f"{data_dirs_base}/{data_dir}" for data_dir in data_dirs
])
self.add_dir_listing(f"{ilab_dir}/{cache_dir}",
recursive=True)

if self.get_option("get-cache"):
self.add_copy_spec(
f'{ilab_dir}/{cache_dir}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: be consistent with the use of single or double quotes for f-strings. I personally use double-quotes for any f-string substitutions though I'm not of the mind to impose that as a project style, but it is helpful to be consistent within a plugin.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I totally missed to change this while reviewing, sorry

Comment on lines 90 to 94
[f'{cont_share_conf_path}rhel_ai_config.yaml',
f'{cont_config_path}/config.yaml'],
container=cont)
self.add_copy_spec(
[f"{cont_local_path}/{data_dir}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is harder to follow what paths we're working with when some of the variables include the trailing / in dirs and others do not. This should be consistent within a given plugin even if we don't have a formal style recommendation for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, missed arranging some consistency while reviewing. Will fix this as well in t he next push

#
# See the LICENSE file in the source distribution for further information.

from sos.report.plugins import (Plugin, IndependentPlugin, PluginOpt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous parenthesis when imports don't span multiple lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

Comment on lines 14 to 25
class Instructlab(Plugin, IndependentPlugin):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A docstring explaining what the plugin is for and a high-level explanation of collections and options should be added for use with sos help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@jcastill jcastill force-pushed the jcastillo-add-new-instructlab-plugin branch from 0e4cd28 to c824e89 Compare October 14, 2024 06:48
Comment on lines 56 to 58
cont_cache_path = f"{cont_opt_path}{cache_dir}"
cont_config_path = f"{cont_opt_path}{config_dir}"
cont_local_path = f"{cont_opt_path}{local_share_dir}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer using os.path.join method to joining.. well.. paths :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@jcastill jcastill force-pushed the jcastillo-add-new-instructlab-plugin branch from c824e89 to 5b94214 Compare October 14, 2024 08:28
Comment on lines 116 to 121
ilab_dir = os.path.join("~/", ilab_user)
if self.get_option("ilab_conf_dir"):
ilab_dir = os.path.join(
ilab_dir, self.get_option('ilab_conf_dir')
)
data_dirs_base = os.path.join(ilab_dir, local_share_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plugin.path_join() accounts for non-/ sysroot.

Use pwd to source the homedir for the given user instead of relying on shell translation.

Comment on lines 123 to 132
self.add_copy_spec(f"{ilab_dir}{config_dir}")
self.add_copy_spec([
f"{data_dirs_base}/{data_dir}" for data_dir in data_dirs
])
self.add_dir_listing(f"{ilab_dir}/{cache_dir}",
recursive=True)

if self.get_option("get-cache"):
self.add_copy_spec(
f'{ilab_dir}/{cache_dir}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plugin.path_join() here as well. Be consistent.

Comment on lines 88 to 92
ilab_con = None
for con in self.containers:
if self.get_container_by_name(con):
ilab_con = con
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be significantly more efficient and readable as a one-liner using the regex supporting container method.

Comment on lines 94 to 107
self.add_copy_spec(
[f"{cont_share_conf_path}/rhel_ai_config.yaml",
f"{cont_config_path}/config.yaml"],
container=ilab_con)
self.add_copy_spec(
[f"{cont_local_path}/{data_dir}"
for data_dir in data_dirs],
container=ilab_con)
self.add_cmd_output(
[f"ilab {sub}" for sub in subcmds],
container=ilab_con)
self.add_dir_listing(cont_cache_path,
recursive=True,
container=ilab_con)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the other paths, be consistent and use Plugin.path_join().

sos-style still needs addressing.

commands = ('ilab',)

option_list = [
PluginOpt('ilab_user', default='', val_type=str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there not a default user to collect for?

PluginOpt('ilab_user', default='', val_type=str,
desc='user that runs instructlab'),
PluginOpt('ilab_conf_dir', default='', val_type=str,
desc='instructlab data directory'),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this option and ilab_user to ilab-conf-dir and ilab-user in the next push

@jcastill jcastill force-pushed the jcastillo-add-new-instructlab-plugin branch from 5b94214 to 20dee44 Compare October 15, 2024 08:10
This change adds a new plugin that captures information from
an installation of Instructlab. The capture works in two ways:
- It goes through all containers whose names contain Instructlab, and
- If specified, it will gather information from a user home directory.

In the first case, we gather different outputs from the command 'ilab',
as well as directories containing training data, configuration files,
chatlogs, taxonomies, and other data.
In the second case, we capture only certain directories.
Users can also specify to gather .cache directory, where models and
OCI directories can be found. This option is disabled by default
because, while the information can be very useful, the directories
are very big.

Related: RH: PLMCORE-10599, RHEL-54137, RHEL-58173

Signed-off-by: Jose Castillo <[email protected]>
@jcastill jcastill force-pushed the jcastillo-add-new-instructlab-plugin branch from 20dee44 to 98aff96 Compare October 15, 2024 11:43
jcastill added a commit to jcastill/sos that referenced this pull request Oct 25, 2024
Add a new preset for RHEL AI, that captures
the output of Instructlab plugin, as well as
nvidia and bootc.

Related: RHEL-58169
Depends-on: sosreport#3804

Signed-off-by: Jose Castillo <[email protected]>
jcastill added a commit to jcastill/sos that referenced this pull request Oct 25, 2024
Add a new preset for RHEL AI, that captures
the output of Instructlab plugin, as well as
nvidia and bootc.

Related: RHEL-58169
Depends-on: sosreport#3804

Signed-off-by: Jose Castillo <[email protected]>
jcastill added a commit to jcastill/sos that referenced this pull request Oct 25, 2024
Add a new preset for RHEL AI, that captures
the output of Instructlab plugin, as well as
nvidia and bootc.

Related: RHEL-58169
Depends-on: sosreport#3804

Signed-off-by: Jose Castillo <[email protected]>
@pmoravec pmoravec closed this in c5068dc Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants