Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track for loops #47

Closed
Phlya opened this issue Oct 26, 2018 · 9 comments · Fixed by #149
Closed

Track for loops #47

Phlya opened this issue Oct 26, 2018 · 9 comments · Fixed by #149

Comments

@Phlya
Copy link
Contributor

Phlya commented Oct 26, 2018

Hi, would be great to have a new track type for showing loops on Hi-C data. For example, a bedpe file could be used to plot rectangles (and this could include plotting TADs, if the coordinate pairs are the same). Something like this:
image

http://higlass.io/app/?config=AJQIPRFlREeqK2a1pciz_g

@joachimwolff
Copy link
Collaborator

This feature is already in development and part of the branch 'longrangecontacts' of HiCExplorer. It is implemented as a --loop option for hicPlotMatrix. However, it is just a red circle and not a nice square.
This branch is changing a lot, therefore I don't recommend it to use it. Please be a bit more patient, I hope to finish it this year and than I will bring it to pyGenomeTracks too.

@Phlya
Copy link
Contributor Author

Phlya commented Oct 30, 2018

Of course, no worries, thank you! I was just thinking of adding a new option for the links track myself, actually, but I suppose I'll just wait for when you add it then.

@Phlya
Copy link
Contributor Author

Phlya commented Nov 1, 2018

BTW a little rough around the edges, but I actually made a custom track for now, and it seems to work fine.

class LoopTrack(tracksClass.GenomeTrack):
    SUPPORTED_ENDINGS = ['.bedpe']  # this is used by make_tracks_file to guess the type of track based on file name
    TRACK_TYPE = 'loops'
    OPTIONS_TXT = """
height = 3
title = Loops
color = blue
line width = 1
"""
    def __init__(self, *args, **kwarg):
        super(LoopTrack, self).__init__(*args, **kwarg)
        if 'line width' not in self.properties:
            self.properties['line width'] = 1
        if 'line style' not in self.properties:
            self.properties['line style'] = 'solid'
        self.max_height = None
        valid_intervals = 0
        intervals = []
        line_number = 0
        with open(self.properties['file'], 'r') as file_h:
            for line in file_h.readlines():
                line_number += 1
                if line.startswith(('browser', 'track', '#', 'chrom1')):
                    continue
                try:
                    chrom1, start1, end1, chrom2, start2, end2 = line.strip().split('\t')[:6]
                except Exception as detail:
                    self.log.error('File not valid. The format is chrom1 start1, end1, '
                                   'chrom2, start2, end2\nError: {}\n in line\n {}'.format(detail, line))
                    exit(1)

                try:
                    start1 = int(start1)
                    end1 = int(end1)
                    start2 = int(start2)
                    end2 = int(end2)
                except ValueError as detail:
                    self.log.error("Error reading line: {}. One of the fields is not "
                                   "an integer.\nError message: {}".format(line_number, detail))
                    exit(1)

                assert start1 <= end1, "Error in line #{}, end1 smaller than start1 in {}".format(line_number, line)
                assert start2 <= end2, "Error in line #{}, end2 smaller than start2 in {}".format(line_number, line)

                if chrom1 != chrom2:
                    self.log.warn("Only loops in same chromosome are used. Skipping line\n{}\n".format(line))
                    continue

                if start2 < start1:
                    start1, start2 = start2, start1
                    end1, end2 = end2, end1

                # each interval spans from the smallest start to the largest end
                intervals.append([chrom1, start1, end1, start2, end2])
                valid_intervals += 1

        if valid_intervals == 0:
            self.log.warn("No valid intervals were found in file {}".format(self.properties['file']))
        intervals = pd.DataFrame(intervals, columns=['chrom', 'start1', 'end1', 'start2', 'end2'])
        file_h.close()
        self.loops = intervals

        if 'color' not in self.properties:
            self.properties['color'] = 'blue'

        if 'alpha' not in self.properties:
            self.properties['alpha'] = 0.8
    
    def plot_y_axis(self, ax, plot_ax):
        pass
    
    def plot_loops(self, ax, loop):
        from matplotlib.patches import Polygon
        width1 = loop[1] - loop[0]
        width2 = loop[3] - loop[2]
        x0 = (loop[1]+loop[2])/2
        y0 = loop[2] - loop[1]
        
        x1 = x0 + width2/2
        y1 = y0 + width2#/2
        
        x2 = (loop[1]+loop[2])/2
        y2 = loop[3] - loop[0]
        
        x3 = x0 - width1/2
        y3 = y0 + width1#/2

        rectangle = Polygon(np.array([[x0, y0], [x1, y1], [x2, y2], [x3, y3]]),
                           facecolor='none', edgecolor=self.properties['color'],
                           linewidth=self.line_width,
                           ls=self.properties['line style'])
        ax.add_artist(rectangle)

    def plot(self, ax, chrom_region, region_start, region_end):
        """
        Makes a rectangle highlighting an interaction between two regions on a
        linear scale representing interactions between Hi-C bins.
        :param ax: matplotlib axis
        :param label_ax: matplotlib axis for labels
        """
        self.max_height = 0
        count = 0
        
        loops_in_region = self.loops.query("(chrom=='%s')&(end1>%s)&(end1<%s)&(start2>%s)&(start2<%s)" %\
                                           (chrom_region, region_start, region_end, region_start, region_end))
        loops_in_region = loops_in_region[['start1', 'end1', 'start2', 'end2']]
        
        for idx, interval in loops_in_region.iterrows():
            if 'line width' in self.properties:
                self.line_width = float(self.properties['line width'])
            else:
                self.line_width = 0.5 * np.sqrt(interval.data)
            
            self.plot_loops(ax, interval)
            count += 1

        self.log.debug("{} loops were plotted".format(count))
        self.log.debug('title is {}'.format(self.properties['title']))

@rikrdo89
Copy link

rikrdo89 commented Oct 2, 2019

Hi, I was also trying to load loop tracks overlaid onto HIC maps... has support for bedpe files been implemented in the latest release of pyGenomeTracks?

@lldelisle
Copy link
Collaborator

Hi,
Unfortunately, not for the moment, but I am preparing a new release in which I had planned to integrate the loops.
@Phlya , would you mind, adding it to one of your branch so I can pull it and give you credit?
For the bedpe, you would like which type of plot (arrow for each interval, linked?).
Thanks,

@Phlya
Copy link
Contributor Author

Phlya commented Oct 3, 2019

Hi, I am happy to submit a PR, but haven't used it for ages... And as it turns out, pyGenomeTracks now requires python>=3.6 and I am running 3.5, so would have to solve that to test it. So if you want to check that it actually works yourself and fix any potential problems, I am happy to contribute.

This implementations draws rectangles corresponding to each loop.

@lldelisle
Copy link
Collaborator

@Phlya , no problem. It is just that I prefer you appear as a contributor. Please, just copy paste it in a file in the tracks folder in a branch of your repo and I will pull it and make it work.
Thanks

@Phlya
Copy link
Contributor Author

Phlya commented Oct 3, 2019

It's in my master branch, if that's OK.

Phlya@1d166a0

@lldelisle
Copy link
Collaborator

Pefect I am working on it.

@lldelisle lldelisle mentioned this issue Oct 4, 2019
@lldelisle lldelisle mentioned this issue Oct 24, 2019
bgruening added a commit that referenced this issue Dec 7, 2019
deprecations in preperation for a new 4.0 release.

During the years we have introduced several ways to enable/disable settings.
We have used on/off, yes/no, 0/1, true/false. From now on we recommend
to only use true/false so that we can unify our config files and make it
more intuitive for all pgt-users.

We also removed all 2 word items and concatenate them now by `_` (#132).
For example `line width` is now `line_width`. Thanks @LeilyR!

Features:
* Every config is now checked for syntax errors before anything is executed (@lldelisle)
* Added the possibility to merge transcripts into one single gene representation when using gtf (@lldelisle)
* Added the possibility to rasterize bedgraph plots (@lldelisle)
* Added the possibility to use summary functions on bedgraphs (@lldelisle)
* Generate an empty track if a requested region is not in the given track-files. Fixed #120 (@LeilyR)
* Generate an empty track if a chromosom is missing in bedgraph files. Fixed #120
* Improved UCSC style for intron arrows (@lldelisle)
* Flybase style now supports color_utr and height_uts (@lldelisle)
* A new tracktype `hlines` was added (@lldelisle)
* Allow to plot the arcs of the links with a color scale based on scores as proposed in #30 (@lldelisle)
* Allow to plot rectangle on a heatmap for loops. Fixed #47 (@Phlya, @lldelisle)
* A lot of HiC Matrix improvements (@lldelisle)
* The `alpha` property can now be used nearly everywhere (@lldelisle)

Also checkout our updated [readme]().

Merry Xmas and a happy new year!
The PGT team!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants