The University of Arizona’s iPlant Collaborative, a community of scientists, developers and educators, has received a $3.8 million National Science Foundation grant to fund the development of a general-purpose data storage platform. The platform can be used to store and manage the huge amounts of data generated by projects such as sequencing the whole human genome. Nirav Merchant of the iPlant Collaborative and Arizona Research Laboratories will tell us more.
TED SIMONS: Tonight's edition of Arizona Technology and Innovation looks at new ways to store and maintain huge amounts of data. To that end, the University of Arizona is working to develop a general-purpose data storage platform. Here to explain is Nirav Merchant. He's co-principal investigator of the U of A's iPlant Collaborative. Good to see you again. Thanks for joining us.
NIRAV MERCHANT: Thank you for having me.
TED SIMONS: You bet. The iPlant Collaborative, what is that?
NIRAV MERCHANT: A platform for the next generation of large-scale analytics. It's a global platform used by many disciplines. Users who cannot operate on their own laptops or desktops, whose science has scaled beyond what they can manage, come to iPlant to solve that problem.
TED SIMONS: A $3.8 million grant from the National Science Foundation for what?
NIRAV MERCHANT: It's a grant along with my colleague who couldn't be here, and John -- Take Netflix: the experience on Netflix is pleasant because the movie starts right away. With science data, when you want to analyze it, you have to wait for a while. And you don't change a Netflix movie, edit it and send it back to Netflix. In our case, you edit and change the data and send it back to us. How do you manage that many movies coming back to you?
TED SIMONS: Indeed, it's one thing to download Citizen Kane or something like that, but what you're talking about, the goal posts are moving, to mix metaphors a little bit.
NIRAV MERCHANT: Absolutely, we are talking of scales beyond a Blu-ray DVD, so these are massive data sets and it takes a long, long time and people don't want to wait.
TED SIMONS: When we're talking about huge amounts of data, how huge?
NIRAV MERCHANT: So we've already approached what is called the petascale, so the average user now works with 10 to 20 terabytes in our space, and that is about a few hundred movies at a time.
TED SIMONS: How big is the hardware for something like this?
NIRAV MERCHANT: That's actually a very good question. The hardware is large, but it is highly distributed. It is spread across six institutions, so you have pieces of hardware that are highly distributed. These are larger, but they are all over the U.S. and even outside of the U.S.
TED SIMONS: My goodness. I'm guessing the cloud is king here, correct?
NIRAV MERCHANT: Yes, absolutely.
TED SIMONS: How does that work? Do the different spots around the country basically communicate through the cloud?
NIRAV MERCHANT: When you reach us, it finds out where you're coming from, and then positions your data closest to you. If you are coming from Texas, it realizes you are coming in from Texas and your data shows up there before you want a piece of it.
TED SIMONS: When you are doing the human genome, something that takes incredible amounts of data and changes all of the time, you have to be on top of it, manage it, store it. How do you make sure it is secure?
NIRAV MERCHANT: So that's where we use different layers of encryption, to make sure that when data is moving around, it is in a form that nobody else can snoop on. We also do that because we want to be sure that when you get a copy of it, it is the full copy, because when you work with that amount of data, sometimes a few bytes and bits get dropped off. That is not a good thing. You ensure that the data is complete at the same time it is encrypted.
TED SIMONS: What happens if we come back and my genome is missing a gene, something fell off? How do you find that data?
NIRAV MERCHANT: A good question. It's a complicated process. If something has been dropped off, we reprocess it and ask for the data again. It recompares: yes, something was missing, and now you have the full copy.
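[Editor's note: The check-and-re-request cycle described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the choice of SHA-256, and the simulated flaky source are all assumptions made for the example.]

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return a fingerprint of the data; any dropped byte changes it."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(fetch, expected_digest, max_attempts=3):
    """Call the hypothetical fetch() until the copy matches the digest.

    Returns the verified data and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        data = fetch()
        if sha256_hex(data) == expected_digest:
            return data, attempt
    raise IOError("no complete copy received after %d attempts" % max_attempts)


# Simulated source (an assumption for the demo): the first transfer
# loses its last three bytes, the second one arrives intact.
original = b"ACGT" * 1000
expected = sha256_hex(original)
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    return original[:-3] if calls["n"] == 1 else original


data, attempts = fetch_verified(flaky_fetch, expected)
print("complete copy received after", attempts, "attempts")
```

[The first transfer fails the comparison because its fingerprint differs from the expected one, so the data is requested again.]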
TED SIMONS: This is called Syndicate?
NIRAV MERCHANT: Yes.
TED SIMONS: How does Syndicate change things from the pre-Syndicate storage and management days?
NIRAV MERCHANT: If you just take a step back and look at how we get data today, it is from point A to point B. There is no other place in between that is buffering it or keeping it for you. So, when people wanted that data, they always came to the source. Now you have multiple copies of that data throughout the cloud, so you get the closest copy, and the moment you ask for more, we start sending the other pieces that we anticipate you will need.
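[Editor's note: The idea of serving the closest copy and sending ahead the pieces a user is likely to need can be sketched as below. Everything here is hypothetical: the site names and coordinates, the use of geographic distance rather than measured network latency, and the simple sequential look-ahead are assumptions for illustration, not details given in the interview.]

```python
import math

# Hypothetical replica sites as (latitude, longitude) in degrees.
REPLICA_SITES = {
    "tucson": (32.22, -110.97),
    "austin": (30.27, -97.74),
    "princeton": (40.35, -74.66),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_site(user_location):
    """Pick the replica site nearest to the user."""
    return min(REPLICA_SITES,
               key=lambda name: haversine_km(user_location, REPLICA_SITES[name]))


def chunks_to_send(requested, total_chunks, lookahead=2):
    """Return the requested chunk plus the next few, assuming sequential reads."""
    last = min(requested + lookahead, total_chunks - 1)
    return list(range(requested, last + 1))


site = closest_site((29.76, -95.37))  # a user connecting from Houston, Texas
plan = chunks_to_send(requested=3, total_chunks=5)
print(site, plan)
```

[A user in Texas is served from the Texas site in this sketch, and a request for one piece of a data set also queues the pieces that follow it.]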
TED SIMONS: Interesting. Again, not only you, but that person over there and that person over there can ask the same questions and get the same kind of results.
NIRAV MERCHANT: Absolutely. And make changes on the machine and send them back and let each other know what happened.
TED SIMONS: You will know the changes as they're happening.
NIRAV MERCHANT: Absolutely. As close to real time as we can get.
TED SIMONS: Again, we're talking about massive data projects, human genome is one thing, but there are a variety of things that take up a lot of space, correct?
NIRAV MERCHANT: Absolutely. We talk of the human genome, but the plant genome is even bigger and more complex.
TED SIMONS: I bet it is. All right, once you're done with that, can you erase this data?
NIRAV MERCHANT: Absolutely, you know where the source is and it -- you are not required to keep a full copy because there are already multiple copies of the data throughout the infrastructure. You can ask for it back. Much like watching a movie on Netflix, pick it up where you left it.
TED SIMONS: The content distribution network, is that what that is called?
NIRAV MERCHANT: Yes, but as I said before, content distribution networks are good at pushing data to you one way. We are doing it both ways. You make the change, it comes back to us and everybody else. We redistribute that.
TED SIMONS: So is this up and operational now? What timeline are we looking at?
NIRAV MERCHANT: We're looking to go live with it in the next six months. We just got started this month.
TED SIMONS: Still working on it but pretty sure we will see this soon.
NIRAV MERCHANT: Absolutely, and I hope to be back to talk more about it.
TED SIMONS: We will try to get you back. Thank you for joining us. We appreciate it. Thursday on "Arizona Horizon," we'll look at whether or not it's time to increase the gas tax. And it's Constitution Day. We'll mark the occasion by discussing the history of the Bill of Rights. That's at 5:30 and 10:00 on the next "Arizona Horizon." That is it for now. I'm Ted Simons. Thank you for joining us. You have a great evening.
VIDEO: "Arizona Horizon" is made possible by contributions from the Friends of Eight, members of your Arizona PBS station. Thank you.
Nirav Merchant: Member of the iPlant Collaborative and Arizona Research Laboratories