Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use arrays for storing source files in projects #9259

Closed
wants to merge 3 commits into from

Conversation

auduchinok
Copy link
Member

@auduchinok auduchinok commented May 21, 2020

I've started with trying to avoid converting between source files lists and arrays when creating parsing options for a project. Changing it to always use arrays required updating some other places and changing them added even more places needing fixes. I've managed to build FCS solution and removed some extra list conversions and mapping along the way.

I'm not sure if we want such changes but some of the previous discussions lead to idea that using arrays where possible would probably be good in some places in the compiler. I'd like to get some feedback on this PR, in particular if this approach should be used. :)

PR also changes how isLastCompiland and isExe are passed around. Would be good to simplify it even more if using arrays for source files paths is approved.

@dsyme
Copy link
Contributor

dsyme commented May 22, 2020

We should bite the bullet and adopt an immutable array type in the compiler.

The issues are discussed here from the perspective of the whole language experience: fsharp/fslang-suggestions#619. Adopting this in the compiler is a way to move forward on resolving this issue.

It's annoying that System.Collections.Immutable is not part of the default reference set for .netstandard 2.1 - the extra dependency for FSharp.Core just feels so awkward.

@cartermp
Copy link
Contributor

Note that we'd be unlikely to move to NS 2.1 for FSharp.Core any time soon, since that would break compat with .NET Core 2.1, which is an LTS release. Maybe in August of 2021 :)

Why the concern with the additional dependency?

@auduchinok
Copy link
Member Author

auduchinok commented Jun 2, 2020

@dsyme Should I continue fixing the tests and usages outside FCS solution? If you're OK with this approach I'll be glad to finish it so it gets merged and doesn't get stale.

@dsyme
Copy link
Contributor

dsyme commented Jun 4, 2020

@TIHan What do you think about this change and this whole issue?

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is doing two things, a refactor of isLastCompiland and changing of source file storage to use arrays instead of lists. We should have two separate PRs for them.

I like the changes for isLastCompiland and my only comment about it is we should rename isLastCompiland to isLastFile. Edit: The name isLastCompiland is fine, we don't need to change it.

As for the list to array change, why would we do it? What does it improve? Arrays are faster at iterating than lists, but is the list iteration/transformation time on source files a performance problem? Arrays being mutable are also a concern which is why we should start using ImmutableArrays. The compiler already depends on Collections.Immutable AFAIK, so using it internally with our own wrapper shouldn't be a problem.

Deciding to use a FSharpList or ImmutableArray is tricky. At the moment, we should default to using FSharpList because of the pattern match and initialization advantages and List. API; it's also just pretty common. When we spot a performance problem or we know that it's used in a performance critical area, changing it to an ImmutableArray might be desired.

If we figure out how to expand ImmutableArray to take advantage of pattern matching and initialization - then it could become the new default immutable collection to use, but we are not there yet.

@cartermp
Copy link
Contributor

cartermp commented Jun 4, 2020

I did a little bit of experimentation with completion lists and trying to move things over to arrays instead of lists. Unfortunately, switching over to arrays must be viral. Unless this is a viral change, the following two things must happen:

  • Convert lists into arrays
  • Convert arrays back into lists

I found that this ends up being more expensive, even if the converting back/forth is done "at the fringes". Unfortunately, this also makes things kind of hard to test and assess. You effectively need to make the change in the core of the compiler and thread everything back out through tools to avoid these conversion costs.

@auduchinok
Copy link
Member Author

As for the list to array change, why would we do it?

I'm trying to avoid needless conversion between these lists and arrays which is happenning multiple times.
As previous measurements by @cartermp and others showed in many cases it's significantly faster to work with arrays than with linked lists (and it's not only about iteration but also about being able to allocate less, to get a collection length instantly, and to use more efficient collection processing functions).
If we're to choose a single data structure for now, I propose we go with arrays (like it's done in this PR) until we have a better option (like ImmutableArray when it's proven to be as performant as arrays), and I'd be happy to propose more similar changes. If you're not happy with using a more performant data structure across the changed code then, well, I'll probably have to leave it as is.

TIHan requested changes

I'm sorry, I don't think I can see a precise request in the message, what kind of changes do you expect here, apart from splitting out the isLastCompiland change?

@TIHan
Copy link
Contributor

TIHan commented Jun 4, 2020

@cartermp did show measurements for his problem. Do you have measurements for yours? Can you provide evidence that changing lists to arrays in this case increases performance? If it does, then I will absolutely be inclined to approve the PR - though I would use ImmutableArray instead. ImmutableArray is fast, just like array, do not worry. Rolsyn/C# use it heavily; very battle tested.

Increasing performance really requires evidence. You just really can't change a significant amount of code saying it increases performance without showing evidence that it does.

I'm trying to avoid needless conversion between these lists and arrays which is happenning multiple times.

I know that Parsing and Project options use arrays on their fields. You could make them lists and it would stop the needless conversion that happens internally, but not externally since existing code use arrays.

If we're to choose a single data structure for now, I propose we go with arrays

I disagree. If you really want to use an array, we should just use ImmutableArray even if we don't have the perfect module API for it. Otherwise, a list is fine.

what kind of changes do you expect

For isLastCompiland changes: I was suggesting the places where the value named, isLastCompiland, be renamed to isLastFile since we are getting rid of that tuple; but, after thinking about it, Compiland is just as descriptive as File - so it's fine to leave the name as is.

@auduchinok
Copy link
Member Author

TIHan requested changes

What particular change do you request in the arrays part?

@brettfo brettfo changed the base branch from master to main August 19, 2020 20:02
@KevinRansom
Copy link
Member

@auduchinok , thanks for this contribution, it has been quite a while since there was any work on it, and so I am going to close it.

Please reactivate, when you have some time to revisit it.

Thanks

Kevin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants