Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TcSymbolUseData[] accounting for 19.1 MB on the LOH #6084

Closed
cartermp opened this issue Jan 12, 2019 · 13 comments
Closed

TcSymbolUseData[] accounting for 19.1 MB on the LOH #6084

cartermp opened this issue Jan 12, 2019 · 13 comments

Comments

@cartermp
Copy link
Contributor

cartermp commented Jan 12, 2019

This was after opening VS (dev16.0, with built-in VSIX) for about three minutes, slowly implementing a function in FSharp.Editor that uses FCS and Roslyn types for about 3 minutes. I say slowly because I code slowly 🙂

image

This is the TcSymbolUses ctor getting called 49 times, which is ~390k on the LOH each time. This is called by TypeCheckTask in the incremental builder:

https://github.com/Microsoft/visualfsharp/blob/631401a1cc7c487a6b27486242ac7df3e77bd495/src/fsharp/service/IncrementalBuild.fs#L1378

Since in my scenario the large majority of these symbols are the same (in fact they would probably be the mostly the same in nearly every scenario), I wonder if there's an opportunity to cache them.

@cartermp cartermp added this to the 16.0 milestone Jan 12, 2019
@dsyme dsyme changed the title TcSymbolUse[] accounting for 19.1 MB on the LOH TcSymbolUseData[] accounting for 19.1 MB on the LOH Jan 12, 2019
@dsyme
Copy link
Contributor

dsyme commented Jan 12, 2019

The way things are set up, we have to save some information after checking a file - and the information we save is a dump of all the symbol resolutions. For a medium-sized file I'd imagine this can be 400K no problem. The information is not particularly long lived - the next check of the file will make it irrelevant and the information will be collected. I think this architecture is OK - if we were trying to reduce the amount of re-computed information we would do that by making all of checking more incremental from parsing through to type-checking.

SO it's not a priori wrong to allocate this information, even in one linear array (note that pre-indexing the information or attempting to compress it is a waste of time since it is very often discarded). It does however feel odd if the CLR is somehow assuming it is long-lived.

Is the CLR applying the (in this case false) heuristic that all LOH objects are long-lived? If so I suppose we could artificially chunk the array though it feels, well, artificial. What's the minimum size on the LOH?

[ Aside: TBH much of this memory work would feel easier if the CLR just had a intrinsic allocation method that said "this object is big but transient", or magically worked that out. From a GC perspective there should be no real problem with allocating short-lived big objects apart from the data-copying costs involved, as the data should just evaporate immediately on next collection. ]

@dsyme
Copy link
Contributor

dsyme commented Jan 12, 2019

BTW this is the definition of TcSymbolUseData:

[<Struct>]
type TcSymbolUseData = 
   { Item: Item
     ItemOccurence: ItemOccurence
     DisplayEnv: DisplayEnv
     Range: range }

Looking at this

  1. someone might want to turn ItemOccurence into a struct.

For the others:

  • DisplayEnv is generally a pointer to a shared object, not much to do there without too much work (well, ok, you could make it a uint16 indexing into an array of DisplayEnv)
  • Item is a pointer to more information about the item that is a big discriminated union that's going to be hard to reduce
  • Range is a struct already, and has recently got a tad bigger, not much we can do about that

@dsyme
Copy link
Contributor

dsyme commented Jan 12, 2019

To answer my question:

'Big' objects go here – as the size at which an object may end up on this heap is 85,000 bytes, this usually means arrays with more than about 20,000 entries. https://www.red-gate.com/simple-talk/dotnet/net-framework/the-dangers-of-the-large-object-heap/

So we should chunk this thing I guess. Ugh how artificial. Fighting the memory manager is a PITA.

@dsyme
Copy link
Contributor

dsyme commented Jan 12, 2019

If someone wants to look at this, it should be fairly easy to chunk the array allocated here based on sizeof<TcSymbolUseData> and a heuristic about LOH objects, it's pretty well encapsulated. The member AllUsesOfAllSymbols is only ever iterated, so you simply need to change consumers to iterate an array-of-arrays

type TcSymbolUses(g, capturedNameResolutions : ResizeArray<CapturedNameResolution>, formatSpecifierLocations: (range * int)[]) = 
    
    // Make sure we only capture the information we really need to report symbol uses
    let allUsesOfSymbols = [| for cnr in capturedNameResolutions -> { Item=cnr.Item; ItemOccurence=cnr.ItemOccurence; DisplayEnv=cnr.DisplayEnv; Range=cnr.Range } |]
    let capturedNameResolutions = () 
    do ignore capturedNameResolutions // don't capture this!

    member this.GetUsesOfSymbol(item) = 
        [| for symbolUse in allUsesOfSymbols do
               if protectAssemblyExploration false (fun () -> ItemsAreEffectivelyEqual g item symbolUse.Item) then
                  yield symbolUse |]

    member this.AllUsesOfSymbols = allUsesOfSymbols

    member this.GetFormatSpecifierLocationsAndArity() = formatSpecifierLocations

@dsyme
Copy link
Contributor

dsyme commented Jan 12, 2019

(I must add: it is great to see this LOH-analysis work beginning to hone in on the actual data we need to save, rather than data that should never have been allocated/copied in the first place :) )

@baronfel
Copy link
Member

I might be off-base here, but the TcSymbolUseData[] is created from the capturedNameResolutions parameter, which is itself backed by an array at least capturedNameResolutions.Length long, right? If so, this means that the backing array for the list should be on the LOH as well, so we may need to go a step further back and chunk the way those name resolutions are generated in addition.

@dsyme
Copy link
Contributor

dsyme commented Jan 12, 2019

@baronfel Yes, that's correct (except that CapturedNameResolution is not a large struct so there will be less on the LOH).

@baronfel
Copy link
Member

Ah, I see the caveat there now. Because CapturedNameResolution contains less data we can have more array elements before we hit the 85,000 byte limit, so we may not hit the LOH for that structure at all. Thanks for clarifying.

@cartermp
Copy link
Contributor Author

FYI there is excellent information in our docs about this: https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap

@cartermp
Copy link
Contributor Author

SO it's not a priori wrong to allocate this information, even in one linear array (note that pre-indexing the information or attempting to compress it is a waste of time since it is very often discarded).

This feels like a bit of a design smell to me. If we need to allocate ~390k of data very often (in this case, every 3-4 seconds), then it implies something needs to be cached. I was only working with two files and typing in one, and only adding a single function. I don't see why that requires

@cartermp
Copy link
Contributor Author

But relatively speaking this is far less pressing than other issues filed

@dsyme
Copy link
Contributor

dsyme commented Jan 14, 2019

This feels like a bit of a design smell to me. If we need to allocate ~390k of data very often (in this case, every 3-4 seconds), then it implies something needs to be cached. I was only working with two files and typing in one, and only adding a single function. I don't see why that requires

Yes. The smell is really just a whiff of the much bigger stench of re-checking entire files very frequently even on small changes.

We could in theory pool the arrays, to avoid the reallocation. However that must be done with real care as the arrays contain references. And, putting aside the LOH, I'm certain that the costs of actually doing the re-checking are much higher than the cost of allocating and filling this specific data structure.

@dsyme
Copy link
Contributor

dsyme commented Jan 14, 2019

FYI there is excellent information in our docs about this: https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap

Thanks, yes, that's great up-to-date info

TIHan pushed a commit that referenced this issue Jan 17, 2019
* chunkify TcSymbolUseData

* move LOH size out to a constant

* do chunking and mapping together to reduce allocations

* clarify comment around GC impacts

* add comment informing others of the potential for LOH allocations
auduchinok added a commit to auduchinok/fsharp that referenced this issue May 14, 2019
auduchinok added a commit to auduchinok/fsharp that referenced this issue May 23, 2019
auduchinok added a commit to auduchinok/fsharp that referenced this issue Sep 28, 2019
auduchinok added a commit to auduchinok/fsharp that referenced this issue Nov 22, 2019
auduchinok added a commit to auduchinok/fsharp that referenced this issue Feb 27, 2020
auduchinok added a commit to auduchinok/fsharp that referenced this issue Mar 3, 2020
DedSec256 pushed a commit to DedSec256/fsharp that referenced this issue Apr 3, 2020
auduchinok added a commit to auduchinok/fsharp that referenced this issue Apr 9, 2020
auduchinok added a commit to auduchinok/fsharp that referenced this issue Apr 27, 2020
auduchinok added a commit to auduchinok/fsharp that referenced this issue Jun 9, 2020
auduchinok added a commit to auduchinok/fsharp that referenced this issue Jul 20, 2020
auduchinok added a commit to auduchinok/fsharp that referenced this issue Jul 27, 2020
auduchinok added a commit to auduchinok/fsharp that referenced this issue Oct 22, 2020
auduchinok added a commit to auduchinok/fsharp that referenced this issue Oct 28, 2020
auduchinok added a commit to auduchinok/fsharp that referenced this issue Dec 11, 2020
auduchinok added a commit to auduchinok/fsharp that referenced this issue Mar 5, 2021
En3Tho pushed a commit to En3Tho/visualfsharp that referenced this issue Mar 5, 2021
En3Tho pushed a commit to En3Tho/visualfsharp that referenced this issue Mar 5, 2021
auduchinok added a commit to auduchinok/fsharp that referenced this issue Apr 2, 2021
auduchinok added a commit to auduchinok/fsharp that referenced this issue May 17, 2021
auduchinok added a commit to auduchinok/fsharp that referenced this issue May 17, 2021
auduchinok added a commit to auduchinok/fsharp that referenced this issue Jun 22, 2021
auduchinok added a commit to auduchinok/fsharp that referenced this issue Jul 12, 2021
auduchinok added a commit to auduchinok/fsharp that referenced this issue Jul 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants