Current state and future of Graphite (let's discuss) #2418

Open
deniszh opened this issue Jan 27, 2019 · 30 comments

Comments

@deniszh
Member

deniszh commented Jan 27, 2019

Hello!
I want to discuss the current state of the Graphite project, how we can proceed further, and how we can coordinate our efforts. Please note that Graphite is an open-source project driven by the community, and I'm only one of its formal co-maintainers, so I'm not a BDFL here and am just giving you my opinion. I'm calling on other maintainers, contributors and companies to provide their own views, contribute, and discuss ideas here.

The current state of Graphite and its ecosystem.

Because the review of the projects is so long, I put it aside in a separate post; please check it out if interested. I'll put only my conclusion here: as I have mentioned many times before, IMO Graphite is currently not just a single project, but rather a whole ecosystem of projects, developed at different times by different developers for different purposes. Not all of these projects are compatible with all features of the original project, but a user can (and should) pick one implementation or another based on their own use case and requirements.

Graphite 2.0

I think that Graphite has already reached the status of a mature project, and theoretically we can leave it as is and not do much. But if we want to push it further - what should/can we do in Graphite 2.0?
IMO we should:

  • Target small to medium installations (big installs can be covered by Metrictank, Biggraphite, and Clickhouse).
  • Keep using whisper as storage - it has its own upsides and downsides, but it still has a huge installation base and should be supported.
  • Officially deprecate the Python carbon daemon and replace it with go-carbon. Please note that go-carbon is a separate project with its own maintainers; I'm not sure whether we should use it as is and contribute to its development, or fork it.
  • Still use graphite-web for rendering, but:
    • Get rid of Django - it was (and still is) a constant source of incompatibilities and installation issues. (OTOH - will we have the same issues with e.g. Flask then? Or maybe we should just strictly use LTS versions?)
    • Get rid of state in graphite-web - or make it optional, at least (i.e. separate rendering and dashboards/tree-view, as was done in graphite-api).

But the plan above is not that smooth, though:

  • What should we do with the relay? Officially adopt carbon-c-relay or carbon-relay-ng? Again, use them as separate components or fork? Please also note that go-carbon has no support for blacklisting either.
  • What should we do with aggregators? IMO the lack of flexible aggregators is one of the main downsides of the whole Graphite ecosystem.
    OTOH: if it has not been developed for such a long time, maybe it's not needed that much and the limited support in carbon-c-relay or carbon-relay-ng is enough?
  • Should we also deprecate the Graphite clustering protocol and switch to carbonserver completely?

About decision making

I'm not sure that we have any formal structure or committee behind Graphite. It's quite a small project, with many developers and maintainers who come and go, implement some specific feature, are interested only in some part of the project, etc.
IMO we do not even need to have a formal vote on this discussion. If we have no dedicated person who will implement something, our votes mean nothing. I think we're a small group and we'll be able to reach some consensus on a plan - but a proper implementation plan is much more important IMO.

Please treat the text above only as my personal viewpoint, and share your own ideas.

/cc @DanCech @piotr1212 @iksaif @cbowman0 @Dieterbe @obfuscurity @cdavis @mleinart @brutasse @tmm1 @gwaldo @esc @SEJeff @jssjr @bitprophet

@iksaif
Member

iksaif commented Jan 27, 2019

  • Target small to medium installations (big installs can be covered by Metrictank, Biggraphite, and Clickhouse).

I think it would be nice to have something that works easily for small installs but can still be extended for big installs. Ideally with the same components.

  • Officially deprecate the Python carbon daemon and replace it with go-carbon. Please note that go-carbon is a separate project with its own maintainers; I'm not sure whether we should use it as is and contribute to its development, or fork it.

Which is why I'm a bit afraid about that... Plus it makes it harder to share code between graphite and carbon. But considering the workforce that we have I do not have better suggestions.

  • Get rid of Django - it was (and still is) a constant source of incompatibilities and installation issues. (OTOH - will we have the same issues with e.g. Flask then? Or maybe we should just strictly use LTS versions?)
  • Get rid of state in graphite-web - or make it optional, at least (i.e. separate rendering and dashboards/tree-view, as was done in graphite-api).

I'm all in on simplifying graphite-web. It's nice to keep a minimalistic console view to debug it, though, but I don't think dashboards are super useful these days.

@obfuscurity
Member

obfuscurity commented Jan 27, 2019 via email

@deniszh
Member Author

deniszh commented Jan 27, 2019

@obfuscurity : I'm not sure that we need to go that path either, just thinking out loud here.
Maybe you can share your thoughts, Jason, pretty please?

@deniszh
Member Author

deniszh commented Jan 27, 2019

@iksaif

I think it would be nice to have something that works easily for small installs but can still be extended for big installs. Ideally with the same components.

Ok, let me rephrase it. "Target small to medium installations, as we are doing now", because for big installations whisper is not the best option IMO. The current implementation (or go-graphite) can be scaled to thousands of servers, Booking.com is doing that - but if you are planning something big right now, I do not think you should choose whisper.

Which is why I'm a bit afraid about that...

Indeed. I do not like losing control over our own code. But we can always fork.

Plus it makes it harder to share code between graphite and carbon.

IMO sharing nothing can be good in that case. We have quite a large amount of code duplication in the current implementation.

@obfuscurity
Member

obfuscurity commented Jan 27, 2019 via email

@piotr1212
Member

I've got a lot of ideas but it all depends on people having time to implement them, which I don't think anyone has, therefore I don't think it makes much sense to write them all down.

This brings me to the point of how to make Graphite more appealing for new developers to contribute to. I think making it easier to install would help. I still sometimes struggle to get a dev environment working on a clean Linux install, and I've been using this for over 6 years. I can imagine it turns potential contributors off. Another thing is that there is a lot of old, ugly code...

I'm not in favour of any big revolutionary changes, Graphite's strongest point imo is that "it just works" and has a large user base.

Whisper has its downsides, but it is simple and it is still the only database backend which I have seen work stably in large installations. The only feature I was missing was automatic rebuilding of failed cluster nodes (but that would probably make it complex and less stable).

@obfuscurity
Member

obfuscurity commented Jan 27, 2019 via email

@deniszh
Member Author

deniszh commented Jan 27, 2019

@obfuscurity :

Let's just say that I feel pretty strongly that one of Graphite's biggest
weaknesses is around metrics naming discipline and the lack of
auth{entication,authorization} around metrics ingestion.

Well, agreed. But please note that even in its current state Graphite is technically ready for OpenMetrics support. OTOH, OpenMetrics doesn't have anything about the auth part either...

@obfuscurity
Member

obfuscurity commented Jan 27, 2019 via email

@deniszh
Member Author

deniszh commented Jan 27, 2019

Yep, I just mean OpenMetrics will help with metrics naming discipline - not as a standard, but as a different way of naming metrics.

@DanCech
Member

DanCech commented Jan 28, 2019

I've been thinking along the same lines @deniszh and had some good discussions with @Dieterbe; I just wish I had more time to dedicate to the project. The biggest thing I'd add to your list would be focusing on ease of installation and sensible defaults. I know @piotr1212 has been doing some work in that direction, and ditching Django would also help in that area.

@lomik I'd love to hear your thoughts/suggestions/reaction to the idea of adopting go-carbon as the official daemon

@obfuscurity
Member

obfuscurity commented Jan 28, 2019 via email

@6d6178

6d6178 commented Jan 28, 2019

Just want to give my opinion as a happy-ish user of the graphite/carbon stack. I started using Graphite about 1.5 years ago and it is one of the first projects that I got in-depth experience with.
What bothers me the most about the current situation is the installation, as most people seem to agree on. Not being able to install graphite without errors on the 6th go is "unacceptable". (My install script sits at 30 lines for installation from plain CentOS/pip and it still failed last time because of Django)
Which brings me to Django. Honestly I expect most people to use Grafana in front of Graphite. Maintaining the Dashboard is probably not worth a lot of time and dependency problems. So finding the easiest way to open the API should probably be the focus. (Personal opinion, may be wrong)
As for the components, I am mostly happy with performance. Haven't had issues with the performance of the relay. Whisper can be slow, but it is okay most of the time.
Whisper seems to me like a sane way to store metrics. In my experience, if you size the whisper storage properly, the files are also not that big. And I like the fact that the databases are files and therefore easily accessible and editable. But again, maybe a personal thing.
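To make the point about sizing whisper storage concrete, here is a rough sketch of how the on-disk size of a single whisper file follows from its retention schema. It assumes the usual whisper layout (a 16-byte metadata header, a 12-byte header per archive, and 12 bytes per data point); the function name and the example schema are made up for illustration.

```python
# Sizes assumed from the whisper file layout: one metadata header,
# one header per archive, and fixed-size (timestamp, value) points.
METADATA_SIZE = 16      # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_SIZE = 12  # offset, seconds per point, number of points
POINT_SIZE = 12         # 4-byte timestamp + 8-byte double value


def whisper_file_size(archives):
    """Return the size in bytes of one whisper file.

    `archives` is a list of (seconds_per_point, points) tuples.
    """
    header = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)
    data = sum(points for _, points in archives) * POINT_SIZE
    return header + data


# Hypothetical schema: 10s for 6 hours, 1m for 7 days, 10m for 5 years.
schema = [
    (10, 6 * 3600 // 10),
    (60, 7 * 86400 // 60),
    (600, 5 * 365 * 86400 // 600),
]
size = whisper_file_size(schema)
print(f"{size} bytes (~{size / 1024 / 1024:.1f} MiB) per metric")
```

Since the file is preallocated, the size depends only on the number of points in the schema, not on how many values were actually written, so trimming a high-resolution archive is usually the cheapest way to shrink storage.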
I do like the idea of having the whole stack in one language. Graphite will probably have applications on every processor architecture. But I would also accept (not sure about prefer) that a different solution is better and therefore ditch a self-made solution. Then again, for example, should it be carbon-relay-ng or carbon-c-relay (or maybe keep carbon-relay)?
Overall I really like the project and what it spawned with different implementations in different languages.
If I was a better programmer, I probably would have done some work on the project. But I am really not that great. I'll keep an eye on documentation and finding bugs though.

@lomik

lomik commented Jan 28, 2019

@lomik I'd love to hear your thoughts/suggestions/reaction to the idea of adopting go-carbon as the official daemon

I don't mind. It is hard for me to support it since I already migrated all my own installations to the clickhouse stack.

But you need to do something with the built-in carbonserver component:

  • It is incompatible with graphite-web
  • IMHO, it is a bad idea to merge the store and fetch layers in a single daemon. I like the separation in the classic graphite stack.

@Dieterbe
Contributor

Dieterbe commented Jan 28, 2019

I think I'm a relative outsider since I never had commit access, but my thoughts:

on "many different implementations"

I suspect that a decent number of people who have found out about graphite get overwhelmed because the ecosystem is so complicated: so many projects with similar names (BTW I have contributed to this problem with graphite-ng). I suspect it scares them off. (It would be nice to get more insight into whether this is merely a perceived concern or a real one, but I'm not sure how. Inquiries suffer from survivor bias.) This notion that "the official graphite stuff is merely a reference implementation":

  1. is rarely explained, if at all, adding much to the confusion (even the frontpage of https://graphiteapp.org/ claims booking.com uses "graphite" and then goes on to list "graphite's components", which simply lists the stock python stuff).
  2. is imho needlessly complicated. That's why I would seek to simplify the ecosystem to the extent possible. In particular, it seems go-carbon should be merged into the official project (other ideas: get rid of graphite-ng, merge graphite-api and graphite-web, rename carbonapi to something more appropriate).

I do agree it makes sense to keep evolving the python+whisper stack - a stack optimized for small/medium scale - in a non-breaking way. And it also makes sense for a few projects to exist targeting large scale (e.g. metrictank, clickhouse, etc), though code reuse is always a worthwhile pursuit.

go-carbon part of official stack?

re "go-carbon is a separate project with own maintainers". Who maintains it now, if lomik left it? Do the goals of the project align with the goals of the official graphite-project? In the pursuit of simplifying stuff, just bringing the project under the graphite umbrella seems to make more sense.
Is it drop-in backwards compatible? (e.g. config syntax and feature completeness). If it contains "more new stuff" such as carbonserver that we're not sure we want to support, we can mark it as extra/experimental/unsupported, but I'm more concerned with stuff breaking if we were to switch to it.

go vs python

I liked your blog post, Denis, and you point out Go scales easier vertically than python (basically goroutines vs threads), but let's also be clear that Go seems to generally perform better (do more with fewer resources). This has been shown in charts in various projects (go-carbon, carbonapi, etc), and it seems to be a very useful property with installations pushing the limits of what a "medium size installation" is.
While I wouldn't suggest spending significant efforts transforming large python code bases to go, I do think that for those pieces where the work has already been mostly done (and has been battle tested) it should be a fairly low-effort way to bring in good improvements.
Everything else about go vs python gets personal/subjective quickly (do we want to have 2 languages in the code bases, static typing vs dynamic typing, deployment model etc). Personally I think go is a great fit and a reasonable long term language choice, but this is very debatable and I think this decision should be largely made by those willing to put in the work.

committee / governance

I liked the idea of a very informal governance committee. I see it as a good way to be able to resolve difficult project decisions, such as the ones being discussed now. (and let's be frank, some of this stuff has been discussed for years)
If candidates stand up / get nominated and are voted into an odd-numbered group, at least binary decisions can be made with a simple quorum vote. We can make rules such as "a company can only be represented by one member", etc. It would bring some clarity to the decision making process.
Then again, if the current maintainers don't think there's a problem with the decision making process, then there simply is no need for a committee.

statsd

statsd suffers from an even worse form of the ecosystem chaos problem I mentioned above for graphite, in addition to being unmaintained. But almost always, statsd goes hand in hand with graphite.
I explained my thoughts in more detail at https://gist.github.com/Dieterbe/c94d5ea9e747f89e34801894a39aa68f and concluded that I think the graphite project should either adopt a statsd implementation (and it should become "the reference implementation") or even merge it into a carbon relay.

other new feature idea to make graphite more relevant again

first class support for units (track units along with timeseries, change unit strings when processed with functions, provide units in render responses so dashboards like grafana can automatically put right unit on axis labels etc)
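As a purely hypothetical sketch of what "change unit strings when processed with functions" could mean in practice: none of the names below exist in graphite-web; the `Series` class, `per_second` and `to_render_json` are invented here for illustration, and the payload is simplified (real render output pairs each value with a timestamp).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Series:
    """Hypothetical series type that carries a unit next to its data."""
    name: str
    values: List[Optional[float]]
    step: int   # seconds between points
    unit: str   # e.g. "bytes", "requests"


def per_second(series: Series) -> Series:
    """Turn a counter into a rate and rewrite the unit to match."""
    rates: List[Optional[float]] = [None]  # no rate for the first point
    for prev, cur in zip(series.values, series.values[1:]):
        if prev is None or cur is None or cur < prev:
            rates.append(None)  # gap in the data or counter reset
        else:
            rates.append((cur - prev) / series.step)
    return Series(
        name=f"perSecond({series.name})",
        values=rates,
        step=series.step,
        unit=f"{series.unit}/s",
    )


def to_render_json(series: Series) -> dict:
    """Hypothetical, simplified render payload with an extra "unit" field."""
    return {"target": series.name, "unit": series.unit, "datapoints": series.values}


counter = Series("net.eth0.rx", [0, 600, 1800, None, 3000], step=60, unit="bytes")
rate = per_second(counter)
print(to_render_json(rate))
```

A dashboard receiving such a payload could label the axis "bytes/s" without the user configuring anything, which is the behaviour described above.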

other

  • @deniszh what did you mean with "IMO lack of flexible aggregators" ?
  • what are the downsides to carbonserver compared to our current cluster method? do we lose any functionality or desirable characteristics?

@deniszh
Member Author

deniszh commented Jan 30, 2019

Hello @Dieterbe,
sorry, I was a little busy lately. I will try to answer your questions.

This notion that "the official graphite stuff is merely a reference implementation": is rarely explained, if at all, adding much to the confusion (even the frontpage of https://graphiteapp.org/ claims booking.com uses "graphite" and then goes on to list "graphite's components", which simply lists the stock python stuff).

Well, that's true - but why should we care about and promote 3rd party implementations if they are not officially supported and not part of the project, right? Booking was using the python stuff some time ago; they completely migrated to the Go stack not that long ago.

re "go-carbon is a separate project with own maintainers". Who maintains it now, if lomik left it?

He didn't leave - just not using it in prod anymore. He's still accepting patches and making releases.

Do the goals of the project align with the goals of the official graphite-project? In the pursuit of simplifying stuff, just bringing the project under the graphite umbrella seems to make more sense.

There's no official goal stated on the go-carbon page. But I can assume it has the same goal as the "go-graphite" project - reimplement Graphite in Go. And that's why I'm still not convinced whether we should accept "go-carbon" in Python Graphite.
The go-graphite project does exist; there has not been much movement there lately, but its goal is clear.
Maybe we could officially approve "go-graphite" as a project and participate there, fixing documentation on how to deploy it - and then keep Python Graphite as is, or even officially deprecate it if/when we reach enough compatibility.
I don't know. That's why I decided to ask the community here.
Also, carbonapi just got tag support, but TagDB is still implemented only in the python stack.

Is it drop-in backwards compatible? (e.g. config syntax and feature completeness).

That's another source of my hesitation. It can read whisper files and is compatible with schema and aggregation files, but:

  • config is not drop-in compatible (i.e. completely different)
  • relaying is not supported
  • no blacklisting / whitelisting

Also, it has some additional features:

  • Metric ingestion through Kafka and/or HTTP
  • Carbonlink-like GRPC api (aka "carbonserver")

So, if we want to adopt it we should do something about relaying - i.e. officially accept carbon-relay-ng. But I still think that's another reason to do that in "go-graphite" and not in the official project.

I think the graphite project should either adopt a statsd implementation (and it should become "the reference implementation") or even merge it into a carbon relay.

Good idea

first class support for units (track units along with timeseries, change unit strings when processed with functions, provide units in render responses so dashboards like grafana can automatically put right unit on axis labels etc)

Also agreed. We can just add support for TYPE from the OpenMetrics format.

@deniszh what did you mean with "IMO lack of flexible aggregators" ?

Well, exactly that. IMO all existing implementations are buggy and/or slow - python, carbon-c-relay and carbon-relay-ng (I didn't try aggregations in NG personally, that's just word of mouth).

what are the downsides to carbonserver compared to our current cluster method? do we lose any functionality or desirable characteristics?

I'm not aware of any downsides, but TBH I didn't check its implementation thoroughly. This is the old repo, from before the code was merged into go-carbon - https://github.com/grobian/carbonserver

@lomik

lomik commented Jan 30, 2019

There's no official goal stated on the go-carbon page. But I can assume it has the same goal as the "go-graphite" project - reimplement Graphite in Go

Reimplementing in Go is not a goal, but a means to achieve one. This is why I did not transfer go-carbon to the go-graphite project.

The main goals were:

  • maximum compatibility with graphite-web and graphite-carbon. But for some features (such as relay and blacklist) I always used external aggregators and never planned to implement them
  • performance
  • easy installation (single binary instead of large number of libraries)
  • easy maintenance (single process instead of relay and multi process sharding)

At the moment I think that all these goals have been achieved. And I have no new goals.

@azhiltsov

azhiltsov commented Jan 30, 2019

what are the downsides to carbonserver compared to our current cluster method? do we lose any functionality or desirable characteristics?

Carbonserver has nothing to do with clusters. It just returns metrics based on a query.
Clustering is the responsibility of carbonzipper on the render side and carbon-c-relay on the ingest side.

@azhiltsov

Most statsd implementations can't provide redundant distributed aggregation; the only one I know of that makes such a promise is https://github.com/avito-tech/bioyino

Another pain point is the lack of delivery acknowledgement in the relay protocol. I would rather replace it with GRPC/HTTP/whatever and provide backward compatibility on the client-facing side.
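To illustrate the missing acknowledgement: the carbon plaintext protocol is a one-way stream of `path value timestamp` lines, so a successful send only tells the client that the bytes left the process. A minimal sketch follows; the host, the port (2003 is the conventional plaintext port) and the helper names are assumptions for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of the carbon plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="127.0.0.1", port=2003, timeout=2.0):
    """Send a single metric over TCP.

    Returning without an exception only means the bytes were handed
    to the kernel. The protocol defines no reply, so the sender cannot
    tell whether the relay stored, forwarded, or dropped the point.
    """
    line = format_metric(path, value).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line)


# Only formatting is exercised here; send_metric needs a listening carbon or relay.
print(format_metric("servers.web01.load", 0.42, 1548633600), end="")
```

Any acknowledged transport (GRPC, HTTP, or something else) would have to add a response path that this line format simply does not have, which is why keeping the plaintext listener on the client-facing side and changing only the relay-to-relay hop is the less disruptive option.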

@deniszh
Member Author

deniszh commented Jan 30, 2019

Bioyino is a great project, but not all users are at Booking.com or Avito scale, i.e. have distributed aggregation as a hard requirement, IMHO.

@Civil

Civil commented Feb 1, 2019

what are the downsides to carbonserver compared to our current cluster method? do we lose any functionality or desirable characteristics?

carbonserver appeared as an attempt to implement a subset of graphite-web that's enough to serve as a CLUSTER_SERVER for graphite-web. With time it evolved and now also has a binary protobuf-based protocol to return those metrics. However, it lacks TagDB support, although there were some attempts to implement it here: https://github.com/go-graphite/go-carbon/tree/carbonserver-tags and here go-graphite/go-carbon#1 (maybe it's worth reviving and finishing them).

Carbonlink-like GRPC api (aka "carbonserver")

There is Protobuf + GRPC support for CLUSTER_SERVER-style communication inside carbonserver (but it doesn't support tags as of now).

There is also a true carbonlink-style API with the same purpose:
https://github.com/lomik/go-carbon/blob/master/helper/carbonpb/carbon.proto

@stale

stale bot commented Apr 13, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 13, 2020
@deniszh
Member Author

deniszh commented Apr 13, 2020

BTW, I still have some proposals about Graphite's future, so let's un-stale this issue for now.

@stale stale bot removed the stale label Apr 13, 2020
@stale

stale bot commented Jun 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 12, 2020
@stale stale bot closed this as completed Jun 19, 2020
@deniszh
Member Author

deniszh commented Jun 21, 2020

OK, I just released docker image v1.1.7-3 with experimental support for go-carbon. Theoretically, go-carbon has almost reached feature parity with the python version (except e.g. aggregation and filtering), but the absence of a good alternative relay worries me more.

@deniszh deniszh reopened this Jun 21, 2020
@stale stale bot removed the stale label Jun 21, 2020
@Civil

Civil commented Jun 21, 2020

As far as I understand, go-carbon with carbonserver enabled still does not have any support for tags (you can write tagged metrics, but you can't read them through carbonserver, only directly + carboncache).

@azhiltsov

azhiltsov commented Jun 21, 2020

but the absence of a good alternative relay worries me more.

@deniszh, we are currently working on a new relay, check it out: https://github.com/bookingcom/nanotube

@deniszh
Member Author

deniszh commented Jun 22, 2020

As far as I understand, go-carbon with carbonserver enabled still does not have any support for tags (you can write tagged metrics, but you can't read them through carbonserver, only directly + carboncache).

That's true, indeed. But I'm still not sure whether we should use carbonserver and ditch graphite clustering, at least for now.

@deniszh
Member Author

deniszh commented Jun 22, 2020

but the absence of a good alternative relay worries me more.

@deniszh, we are currently working on a new relay, check it out: https://github.com/bookingcom/nanotube

Thanks, I'm aware of nanotube, pretty good project!

@stale

stale bot commented Aug 21, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


10 participants