Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collaborative cloud computing architecture (backend) #273

Closed
StefanKarpinski opened this issue Dec 2, 2011 · 12 comments
Closed

collaborative cloud computing architecture (backend) #273

StefanKarpinski opened this issue Dec 2, 2011 · 12 comments
Labels
speculative Whether the change will be implemented is speculative

Comments

@StefanKarpinski
Copy link
Member

Here's a possible architecture that's simple and will scale well:

The top half of the diagram is a pretty standard scalable web stack design, with shared-nothing stateless web servers fronted by load balancers that just map HTTP requests randomly to the web servers. The databases and memcached instances are for persistent and transient shared non-computational state: user names, what sessions to map users to, which julia session servers are hosting those sessions, what the state of a session is in case someone wants to join midway though, including input and output history. Chat and other non-computational add-on services are also implemented entirely in the upper half of the diagram. The strict separation between the stateless part, the traditional non-computational state, and the computational state is the key feature of this architecture.

Terminology

  • A session is a single distributed Julia computation in which multiple users may participate.
  • A user is a single browser/person connected to one or many ongoing computation sessions.

Note that the relationship between sessions and users is many-to-many: a computation session may have many users participating in it and a user may be participating in multiple computation sessions at once (through multiple browser windows).

Architecture Details

Users talk to a different, random web server every time — it doesn't matter which one. The user and session metadata in memcache and the databases is used to map them to the same julia session server every time. This design means that if anything goes wrong with a web server or a load balancer, you just depool it and carry on. If a julia session server goes down, we're kind of screwed, but we need to come up with a fault tolerance story on that end anyway. At least with this design, if something in the web stack goes wrong, it doesn't affect anything else and is simple to fix (depool, reboot, spawn new web servers and/or load balancers).

The load balancers and the web servers talk HTTP/HTTPS to the outside world and talk memcache protocol and appropriate database protocols to the user and session state servers. There should probably be a very simple query response protocol between the web servers and the julia session servers. The julia session servers talk to the julia compute nodes using julia serialization and communication protocols that already exist and work. We should push everything but actual julia computation itself into the webs: if someone joins late and needs to know what's going on in the session, that should be cached in the databases and memcached servers and be served to them by the web stack — that kind of thing never hits the julia system. Only when someone actually requests that a new computation be done do we have to go to the julia session servers.

The bottom half is where the computation happens. The julia session servers have one process per session — and multiple users can get mapped to the same session — that's how the collaboration happens. Each server can host multiple sessions by having multiple processes, each corresponding to a single session. The session servers in turn farm data and work out to the compute nodes. Compute node processes belong to at most one session: if a compute node is going to do work for more than one session, which is entirely possible, then it will have at least one process for each of them; it may be beneficial to have multiple processes working for a single session on the same server in order to better utilize multicore machines.

Keep in mind that initially we'll have one/zero of a lot of things in this diagram: load balancers (can be zero initially), web servers (can be one), database nodes (one) and memcached nodes (one). Moreover, several of these things can live on the same machine easily. However, this design would let us easily scale up to serve arbitrary traffic and it's not much harder to build this design than something else that won't scale well. I don't much care what the web servers run, but something standard like ruby on rails + apache seems like a reasonable way to go (please no PHP). The web app itself should be fairly simple. The database should probably be mysql since that's kind of the industry standard and it supports simple, reliable master-master replication so you can get fault tolerance just by having two database nodes that mirror each other. No need to worry about anything like sharding at this point — it's way too early for any of that. For the memcahce nodes, writes should go in parallel to all of them and reads come from one at random. That's a simple scheme and works well in practice.

@ghost ghost assigned stepchowfun Dec 2, 2011
@StefanKarpinski
Copy link
Member Author

Assigned to @boyers to be cheeky. Obviously, this is a big thing that we'll all have to work on.

@stepchowfun
Copy link

Sweet, I have a ticket!

@JeffBezanson
Copy link
Member

It's like an entire business plan inside an issue :P

@StefanKarpinski
Copy link
Member Author

Hey! This isn't at all like a business plan — it has actual meaningful content!

@ViralBShah
Copy link
Member

We can treat this as an umbrella issue. A few months from now, an epic checkin will close it!

@tautologico
Copy link
Contributor

The image link is broken.

@StefanKarpinski
Copy link
Member Author

@tautologico: fixed. Thanks for the heads up.

@xianyi
Copy link

xianyi commented Aug 24, 2012

Hi,

What's the status of this feature? I'm very interesting in Julia and cloud computing.

Thanks

Xianyi

@StefanKarpinski
Copy link
Member Author

Non-existent. This was pretty much a design document.

@xianyi
Copy link

xianyi commented Aug 24, 2012

Hi @StefanKarpinski ,

We may obtain the funding to do some works in cloud computing. If we get the funding, we will improve Julia about this feature.

Xianyi

@StefanKarpinski
Copy link
Member Author

Excellent, let's keep in touch about that. We're working on / discussion similar things.

@ViralBShah
Copy link
Member

The ipython work addresses this, and the scope of this issue is too wide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
speculative Whether the change will be implemented is speculative
Projects
None yet
Development

No branches or pull requests

6 participants