Archive for the ‘Computing’ Category

Interesting distributed computing problem

July 22, 2010

Routing items from seller to buyer for an eBay-like company. If one item has to be sent from one seller to one buyer, the shortest (cheapest) path can be found easily. But routing 10,000 items from 1,000 sellers to 5,000 buyers with the same algorithm, one item at a time, is obviously inefficient. Sending all the items to a central location and then distributing them is also non-optimal. In theory, solving the problem exactly is NP-hard. We were more interested in building a real-time distributed system which does the task using limited knowledge: given 100 distribution centers, how do they communicate among themselves to route a huge number of items per day between different buyers and sellers? Servers can fail, and two servers can give conflicting information.

The system should support fault tolerance and state machine replication. Realize that making all servers agree on one routing path for x items is a Distributed Consensus problem.

Cluster Computing Magazines

November 29, 2009

Obviously there are people who know far more about cluster computing than me. I subscribe to Linux Magazine, and through it I get High Performance Computing resources (an HPC Weekly newsletter) and something called Virtualization (I have yet to figure out what that is). I don’t claim that this is the best source for cluster computing, but it seems to be sponsored by IBM. It also contains far too many impressive words and acronyms that I don’t know.

I also subscribe to other things and read other sites; soon you will be asking for my entire bookmarks list, which I will share online as soon as I can figure out how to attach a file to this blog. WordPress seems to think that any file should contain an image of Katrina Kaif to get uploaded. Junta, any ideas on how to share a file through WordPress? Anyway, it’s Google Bookmarks; there should be some way of sharing it that I can find by googling.

OpenMP, MPI and pthreads

November 24, 2009

OpenMP and MPI are complementary. OpenMP is used to optimize the performance of a program running on a single machine with multiple CPUs, i.e., an SMP. MPI uses message passing to divide a big problem into smaller sub-problems. Each sub-problem runs on a separate machine, in parallel with the others. The results are then collected, MapReduce-style, to solve the big problem. The advantages of the MapReduce paradigm will be explained separately in another post.

To combine them to solve a big problem, first use MPI to divide the problem into parts, each of which can be solved on a single machine. Then, to solve each single-machine part efficiently, use OpenMP to divide the work among the multiple CPUs on that machine. The result is a multithreaded program running on each node.

Why should we use OpenMP over pthreads? Both of them use threads to divide the work, right? Well, OpenMP lets you specify a program at a higher level of abstraction than pthreads: you need not explicitly write the synchronization code, which is the trickiest part of multithreaded programming.

All three of OpenMP, MPI, and pthreads are portable, while internally taking advantage of the underlying OS and hardware architecture.

Of course, there are cons here. MPI is difficult to learn and code, and involves many low-level implementation details. And you may not be able to entirely substitute pthreads with OpenMP: OpenMP is best at parallelizing loops, and a producer-consumer kind of situation still has to be dealt with explicitly using pthreads.