
Recommendation: three-tier architecture #41

Open
jar398 opened this issue Dec 13, 2016 · 0 comments


jar398 commented Dec 13, 2016

Writeup as requested by @kcranston

In designing a possible integration of phylesystem-api and otindex, I recommend following the multitier idea (https://en.wikipedia.org/wiki/Multitier_architecture), other things being equal. I used to think this was just computer industry BS, but I have come to see the logic behind it.

"a client–server architecture in which presentation, application processing, and data management functions are physically separated"

For Open Tree, the data management functions are:

  1. the github repo clone access and update functions that are managed by peyotl
  2. the supplementary uploaded file set currently managed by the webapp

Application processing includes all of our cache-like databases and the web services on top of them: OTI (or otindex), taxomachine, treemachine, parts of phylesystem-api (?), and the conflict service. Note that, as currently imagined, otindex does not do data management; it is just a cache.

Presentation of course is the webapp.

Data management is characterized by being centralized and uncached. (Of course there is such a thing as a distributed database, but we are nowhere close to making use of such a thing.) It is responsible for updating the data, not just reading it. The physical instantiation of the data management tier is unreplicated. It wants to be as lean as possible because it is going to be hit a lot and there are limited opportunities for making it faster.

The application processing (API) can be replicated, since it is not in the business of taking care of the 'truth' of the data (update, consistency, and so on). It has caches of the data - but that's a completely different story. Caching is not data management, it is data use (application processing).
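A minimal sketch of this split, under stated assumptions: `DataStore`, `AppServer`, and the key `study_1` are hypothetical names made up for illustration, not the actual peyotl, phylesystem-api, or otindex interfaces. It shows one unreplicated store that owns reads and writes, and several application servers that each hold only a cache of it.

```python
class DataStore:
    """Data management tier: the single, unreplicated source of truth (hypothetical)."""

    def __init__(self):
        self._records = {}
        self.reads = 0  # counts how often the data tier is actually hit

    def read(self, key):
        self.reads += 1
        return self._records[key]

    def write(self, key, value):
        # Updates happen only here, never in the application tier.
        self._records[key] = value


class AppServer:
    """Application processing tier: replicable, holds only a cache (hypothetical)."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def get(self, key):
        # Serve from the local cache; fall back to the data tier on a miss.
        if key not in self._cache:
            self._cache[key] = self._store.read(key)
        return self._cache[key]

    def invalidate(self, key):
        # Drop a stale entry, for example after the data tier reports an update.
        self._cache.pop(key, None)


store = DataStore()
store.write("study_1", "tree v1")

# Two replicas of the application tier share one data tier.
replicas = [AppServer(store), AppServer(store)]
answers = [r.get("study_1") for r in replicas]
answers_again = [r.get("study_1") for r in replicas]
```

In this sketch the second round of requests never reaches the store, which is the sense in which caching is data use rather than data management.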

The key phrase here is "physically separated". 3-tier does not require the code to be in separate repos. It just says the three functions should be physically separate, once deployed.

The advantages are not just in performance (replication) but in robustness - you can crash, reboot, test etc. application processing servers without threatening the data management server (and therefore the data itself).
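The robustness point can be sketched in the same way, again with hypothetical names (`DataStore`, `AppServer`, `warm`) and made-up records rather than any real Open Tree code. The assumption is that an application server's only state is a cache it can rebuild from the data tier, so losing that server loses nothing.

```python
class DataStore:
    """Data management tier: owns the data; never restarted in this sketch."""

    def __init__(self, records):
        self._records = dict(records)

    def keys(self):
        return sorted(self._records)

    def read(self, key):
        return self._records[key]


class AppServer:
    """Application processing tier: its only state is a rebuildable cache."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def warm(self):
        # Rebuild the whole cache from the data tier.
        for key in self._store.keys():
            self._cache[key] = self._store.read(key)

    def cached_keys(self):
        return sorted(self._cache)


store = DataStore({"study_1": "tree A", "study_2": "tree B"})
server = AppServer(store)
server.warm()

# Simulate a crash and reboot: the old process and its cache are simply gone.
server = AppServer(store)
keys_after_crash = server.cached_keys()

# The replacement server recovers everything it needs from the data tier.
server.warm()
keys_after_rewarm = server.cached_keys()
```

If the two tiers shared a process, the same crash would take the store down with it.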

I think it would be nice if 1 and 2 were eventually on the same server, although our file upload set is write-once, so it is considerably less sensitive than the phylesystem. (I'll make a tandem opentree issue.)

This is just a recommendation. If we decide not to do it, nothing much will change; all we do is make it harder, down the road, to replicate the application processing logic, should we ever want to or need to. That is, by intermixing data management and application processing in new code, we miss an opportunity to clean up the architecture, and we run up technical debt.

I'm not saying replication should be implemented. I'm just saying that it might be wise to avoid design decisions in new work that make 3-tier harder in the future.
