Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Tableau Reading Support #10733

Merged
merged 36 commits into from
Aug 7, 2024
Merged

Initial Tableau Reading Support #10733

merged 36 commits into from
Aug 7, 2024

Conversation

jdunkerley
Copy link
Member

@jdunkerley jdunkerley commented Aug 1, 2024

Pull Request Description

  • Adds Hyper_File allowing reading a Tableau hyper file.
  • Can read the schema and table list.
  • Can read the structure of a table.
  • Can read data into an Enso Table.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.

jdunkerley and others added 12 commits July 31, 2024 15:02
The main problem was that Classloader used during HyperAPI was incorrect
We should be using the one that loaded the class, which was a rather
straightforward change. Additionally we must set correct permissions on
`hyperd`.
sbt will attempt to download necessary dependenciess to
`unmanagedClasspath` if jars are missing.
@jdunkerley jdunkerley added the CI: Clean build required CI runners will be cleaned before and after this PR is built. label Aug 1, 2024
jdunkerley and others added 11 commits August 1, 2024 13:23
build.sbt Outdated
fetchZipToUnmanaged := {
val unmanagedDirectory = (Compile / unmanagedBase).value
if (IO.listFiles(unmanagedDirectory).size < 2) { // Heuristic, should have at least hyperapi jar and os-specific one.
System.out.println(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use ManagedLogger?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I didn't see the need for any logging. Somehow CI doesn't like my solution and we can't reproduce locally

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you prefer. I think it's useful to see a log for this (at least on debug level) to be able to tell what's happening. I'd especially appreciate a message right before we start a download of the 200Mb file - for fiber connections that's moments, but when you are developing on a train, getting a message explaining to you why the build process just started to block for 5 minutes is actually precious information.

I think it's better to do this via ManagedLogger though, as it allows users to manage log levels and gathers logs in one place - print is going to be completely separate from it.

'jna-platform', licensed under the Apache-2.0, is distributed with the Tableau.
The license file can be found at `licenses/APACHE2.0`.
Copyright notices related to this dependency can be found in the directory `net.java.dev.jna.jna-platform-5.14.0`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to say but I think ideally we should add an entry about HyperAPI here. Can be added by modifying notice-header.

But current is probably OK too since we do have the COPYRIGHT-HYPERAPI file, so feel free to ignore.

Comment on lines +35 to +39
## ICON metadata
Returns the name of the current schema.
`*` represents all schemas.
schema : Text
schema self = self.internal_schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean if f.schema returns "*"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means accessing all tables - schema isn't used much inside Hyper but it is in theory supported.

Comment on lines 58 to 79
// Check if any files in the hyper directory, otherwise download them.
try (var files = Files.list(HYPER_PATH)) {
if (files.findAny().isEmpty()) {
switch (OSPlatform.CurrentPlatform) {
case WINDOWS -> downloadHyper(
"https://enso-data-samples.s3.us-west-1.amazonaws.com/tableau/hyperd.exe",
"hyperd.exe",
false);
case MAC_ARM64 -> downloadHyper(
"https://enso-data-samples.s3.us-west-1.amazonaws.com/tableau/macos-arm64/hyperd",
"hyperd",
true);
case MAX_X64 -> throw new IOException(
"Unsupported platform: Only ARM64 Mac is supported.");
case LINUX -> downloadHyper(
"https://enso-data-samples.s3.us-west-1.amazonaws.com/tableau/linux/hyperd",
"hyperd",
true);
case OTHER -> throw new IOException(
"Unsupported platform: " + OSPlatform.CurrentPlatform);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cannot this executable be distributed together with the library?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's massive (>200Mb) so making it fetch it when used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, fine for now but I think for longer term we should prefer to keep this as part of the library and try using our existing infrastructure to fetch the whole library on demand.

Also that's why I'd like to put this into a cache directory - if we ever switch to distributing with library, a cache dir will get more likely to be cleaned up, while this may stay on user's system for longer.

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall

@jdunkerley jdunkerley marked this pull request as ready for review August 2, 2024 11:13
jdunkerley and others added 4 commits August 2, 2024 13:15
Previously the task was reporting an empty sequence despite fetching the
required jars.
build.sbt Outdated Show resolved Hide resolved
@schema (hyper -> make_schema_selector hyper True)
set_schema : Text -> Hyper_File
set_schema self schema =
if schema == self.schema then self else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is schema a collection of tables, or is it table metadata?

Copy link
Collaborator

@hubertp hubertp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving my own sbt changes because I'm always right

image

jdunkerley and others added 4 commits August 2, 2024 16:56
Since our CI builds things multiple times and it might happen
`unmanagedClasspath` was not triggered before `unmanagedJars`, we could
have a situation when necessary jars were not copied to the
distribution.
This change ensures that is no longer the case.
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Aug 6, 2024
@mergify mergify bot merged commit b8c036c into develop Aug 7, 2024
42 checks passed
@mergify mergify bot deleted the wip/jd/tableau branch August 7, 2024 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Clean build required CI runners will be cleaned before and after this PR is built. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants