ESUG SummerTalk - Fuel, binary object serializer

Hi folks,

I am really happy to announce that ESUG is sponsoring me for Fuel development through the ESUG SummerTalk. I am Martin Dias, a student in Buenos Aires, Argentina. The idea behind this SummerTalk is to implement Fuel, a fast, binary, general-purpose object graph serializer in Pharo <http://www.pharo-project.org/>. It is based on ideas from VisualWorks' Parcels.

The project actually started several months ago. Tristan Bourgois and I began working on it during an internship with RMoD, INRIA <http://rmod.lille.inria.fr/web/pier>. A couple of months ago, Mariano Martinez Peck <http://marianopeck.wordpress.com/> joined the team, and he is now the official mentor in the SummerTalk.

ESUG website for SummerTalk: http://www.esug.org/wiki/pier/Promotion/SummerTalk/SummerTalk2011

The website with all the necessary information is here: http://rmod.lille.inria.fr/web/pier/software/Fuel
It even includes slides explaining the algorithm. In addition, a paper is in progress.

For the moment, Fuel already provides the following features:
- Fast pickle format. It is much faster to materialize than to serialize.
- Correctly support class reshape (when the class of serialized objects has changed).
- Serialize ANY kind of object. For the moment there is no object to our knowledge that we cannot serialize and materialize.
- Be able to completely serialize classes and traits (not just a global name).
- Support cycles and avoid duplicates in the graph.
- Integration with Moose <http://www.moosetechnology.org/>, with an extension to export and import its models.
- Detection of globals: for example, if you serialize Transcript, it is not duplicated and is instead managed as a global reference.
- Solve common problems like Set rehash.
- Buffered writing: we use a buffered write stream for the serialization part (thanks Sven!).
- No need for special support from the VM.
- Try to have a good object-oriented design.
- Well tested (about 120 tests, for the moment).
- Large set of benchmarks (even benchmarks for the Moose extension).

And of course, there are a lot of features planned for the future. You can see some of them on the website and some in the issue tracker: http://code.google.com/p/fuel/issues/list

We really appreciate all kinds of feedback and comments. If you want to try it, check the website for instructions. It is extremely easy.

Once again, I want to thank ESUG a lot for sponsoring the project. I plan to create a "news" section on the website with an RSS feed. I will keep you informed.

Best regards,
Martin
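
To make the "support cycles and avoid duplicates in the graph" item above concrete, here is a minimal sketch in Python. It is not Fuel's algorithm or pickle format, and it is not Pharo code: the Node class and the serialize/materialize functions are hypothetical names chosen for illustration. The only idea shown is that each object gets an index the first time it is visited, so shared objects are written once and cycles terminate.

```python
class Node:
    """Hypothetical sample object: a name plus references to other nodes."""
    def __init__(self, name):
        self.name = name
        self.links = []


def serialize(root):
    """Flatten the graph reachable from root into a list of records.

    Each node receives an index the first time it is seen, so a shared
    node is emitted once and a cycle does not cause endless traversal.
    """
    index_of = {}   # id(node) -> position in the output
    order = []      # nodes in discovery order
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in index_of:
            continue            # already visited: shared node or cycle
        index_of[id(node)] = len(order)
        order.append(node)
        stack.extend(node.links)
    # References are stored as indexes into the record list.
    return [(n.name, [index_of[id(t)] for t in n.links]) for n in order]


def materialize(records):
    """Rebuild the graph: create every node first, then wire references."""
    nodes = [Node(name) for name, _ in records]
    for node, (_, link_indexes) in zip(nodes, records):
        node.links = [nodes[i] for i in link_indexes]
    return nodes[0]             # the root is always record 0


a, b, c = Node("a"), Node("b"), Node("c")
a.links = [b, c]
b.links = [c]       # c is shared by a and b
c.links = [a]       # cycle back to a

records = serialize(a)
copy = materialize(records)
```

In the rebuilt graph, the node reached through "b" and the node reached directly from "a" are the same object, and following the link from "c" leads back to the root.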

2011/5/24 Martin Dias <tinchodias@gmail.com>:
Hi folks. I am really happy to announce that ESUG is sponsoring me for Fuel development through the ESUG SummerTalk. I am Martin Dias, a student at Buenos Aires, Argentina. The idea behind this SummerTalk is to implement Fuel, a binary, fast and general-purpose object graph serializer in Pharo. It is based on VisualWorks' Parcels ideas.
Please excuse me if this is the wrong forum to discuss things.
The project actually started several months ago. Tristan Bourgois and I began working on it during an internship with RMoD, INRIA. A couple of months ago, Mariano Martinez Peck joined the team, and he is now the official mentor in the SummerTalk.
ESUG website for SummerTalk: http://www.esug.org/wiki/pier/Promotion/SummerTalk/SummerTalk2011
The website with all the necessary information is here: http://rmod.lille.inria.fr/web/pier/software/Fuel It even includes slides explaining the algorithm. In addition, a paper is in progress.
Could you make the slides available in some other format than Flash?
For the moment, Fuel already provides the following features:
- Fast pickle format. It is much faster to materialize than to serialize.
What has led to the conclusion that materialization is more important than serialization? I can imagine scenarios where there is a one-to-one relationship and scenarios where serialization is more important (e.g. session replication).
- Correctly support class reshape (when the class of serialized objects has changed).
So what do you do when an instance variable was added? Set it to nil and hope that everything will continue to work?
- Serialize ANY kind of object. For the moment there is no object to our knowledge that we cannot serialize and materialize.
Really? You serialize Socket, Process, FileStream and something meaningful happens?
- Be able to completely serialize classes and traits (not just a global name).
- Support cycles and avoid duplicates in the graph.
- Integration with Moose, with an extension to export and import its models.
- Detection of globals: for example, if you serialize Transcript, it is not duplicated and is instead managed as a global reference.
- Solve common problems like Set rehash.
- Buffered writing: we use a buffered write stream for the serialization part (thanks Sven!).
- No need for special support from the VM.
- Try to have a good object-oriented design.
- Well tested (about 120 tests, for the moment).
- Large set of benchmarks (even benchmarks for the Moose extension).
And of course, there are a lot of features planned for the future. You can see some of them on the website and some in the issue tracker: http://code.google.com/p/fuel/issues/list
We really appreciate all kinds of feedback and comments. If you want to try it, check the website for instructions. It is extremely easy.
Once again, I want to thank ESUG a lot for sponsoring the project. I plan to create a "news" section on the website with an RSS feed. I will keep you informed.
Cheers Philippe

On Tue, May 24, 2011 at 11:39 PM, Philippe Marschall <philippe.marschall@gmail.com> wrote:
Could you make the slides available in some other format than Flash?
I second that. ...
- Fast pickle format. It is much faster to materialize than to serialize.
What has led to the conclusion that materialization is more important than serialization?
I don't think Martin made a claim about importance, just current implementation behavior. One horse will be faster than another. Currently, the materialization horse is faster than the serialization horse.
- Correctly support class reshape (when the class of serialized objects has changed).
So what do you do when an instance variable was added? Set it to nil and hope that everything will continue to work?
Of course. One might hope that a developer who added an instance variable to a system that had existing instantiated objects would provide suitable lazy initialization as needed. There is no magic.
- Serialize ANY kind of object. For the moment there is no object to our knowledge that we cannot serialize and materialize.
Really? You serialize Socket, Process, FileStream and something meaningful happens?
Again, no magic. One would hope an interface object that became detached from its service would know how to reattach.
ttfn, Steve
--
Steve Cline cline@acm.org http://www.clines.org http://www.linkedin.com/in/stevecline
"Do what's right, and try to get along with people, in that order" - Ezra Taft Benson

- Correctly support class reshape (when the class of serialized objects has changed).
So what do you do when an instance variable was added? Set it to nil and hope that everything will continue to work?
Of course. One might hope that a developer who added an instance variable to a system that had existing instantiated objects would provide suitable lazy initialization as needed. There is no magic.
Exactly. Read the answer I have just sent.
- Serialize ANY kind of object. For the moment there is no object to our knowledge that we cannot serialize and materialize.
Really? You serialize Socket, Process, FileStream and something meaningful happens?
Again, no magic. One would hope an interface object that became detached from its service would know how to reattach.
Exactly. Ideally, a class will be able to implement #prepareToBeSerializer and #postMaterializationAction or something like that. So what we want to provide is the infrastructure, the hooks. Then we should identify the base classes that MUST have something like that, such as the cases Philippe pointed out. The same happens with Sets and Dictionaries, for example: they have to be rehashed once they are materialized.
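
The hook idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Fuel's API: the Connection class, the before_serialization and after_materialization methods, and the serialize/materialize helpers are all hypothetical, and JSON stands in for the binary format. The serializer asks the object which state is worth keeping, and after rebuilding the object it gives the object a chance to reattach to its external resource.

```python
import json


class Connection:
    """Hypothetical object holding an external resource."""

    def __init__(self, host):
        self.host = host
        self.socket = object()   # stand-in for a live OS socket

    def before_serialization(self):
        # Hypothetical pre-serialization hook: keep only the state that
        # is meaningful outside this process; the live socket is dropped.
        return {"host": self.host}

    def after_materialization(self):
        # Hypothetical post-materialization hook: reattach to the service.
        # A real implementation would open a new socket here.
        self.socket = object()


def serialize(obj):
    """Store the state the object chose to expose through its hook."""
    return json.dumps(obj.before_serialization())


def materialize(cls, data):
    """Rebuild the object without running __init__, then call its hook."""
    obj = cls.__new__(cls)
    obj.__dict__.update(json.loads(data))
    obj.after_materialization()
    return obj


conn = Connection("example.org")
data = serialize(conn)
restored = materialize(Connection, data)
```

The serializer itself knows nothing about sockets; the class decides what to store and how to recover. A hash-based collection could use the same post-materialization hook to rehash its elements.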
-- Mariano http://marianopeck.wordpress.com

Hi Philippe. I don't know why but your emails always go to spam in my account :(
The project actually started several months ago. Tristan Bourgois and I began working on it during an internship with RMoD, INRIA. A couple of months ago, Mariano Martinez Peck joined the team, and he is now the official mentor in the SummerTalk.
ESUG website for SummerTalk: http://www.esug.org/wiki/pier/Promotion/SummerTalk/SummerTalk2011
The website with all the necessary information is here: http://rmod.lille.inria.fr/web/pier/software/Fuel It even includes slides explaining the algorithm. In addition, a paper is in progress.
Could you make the slides available in some other format than Flash?
Ok, Martin will do it and upload them directly to the website.
For the moment, Fuel already provides the following features:
- Fast pickle format. It is much faster to materialize than to serialize.
What has led to the conclusion that materialization is more important than serialization?
One of the most important uses we have in mind for Fuel (in the future) is Monticello (to replace mcz). In addition, the idea is to be able to bootstrap a really small Pharo image (hetzel) and load code into it without needing a compiler. In all those cases we assume that you serialize only once or a few times and materialize many more times. But of course, that does not apply to all cases, as you said about session replication.
I can imagine scenarios where there is a one-to-one relationship and scenarios where serialization is more important (e.g. session replication).
- Correctly support class reshape (when the class of serialized objects has changed).
So what do you do when an instance variable was added? Set it to nil and hope that everything will continue to work?
In that example, yes. But what we meant is that right now the instance variable names are also being encoded, so at materialization time we will be able to deal with them. Not all scenarios are developed right now, but the base is there. At some point, the only solution has to come from the user. For example, you should implement #updateFrom: aVersion to: anotherVersion object: anObject or something like that. So we meant that we store the instVar names and we support some types of changes. In fact, we have reified FLInstanceVariablesMapping, so maybe we can do something with that in the future.
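
As a rough illustration of why storing the instance variable names helps, here is a minimal Python sketch. It is not Fuel's implementation and does not use FLInstanceVariablesMapping: the PointV1 and PointV2 classes, the fields attribute, and the defaults argument are assumptions made for the example. Values are matched to the current class shape by name, removed variables are ignored, and added variables are left empty (None, the analogue of nil) unless the user supplies a value.

```python
class PointV1:
    """Hypothetical class shape at serialization time."""
    fields = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointV2:
    """Hypothetical class shape at materialization time."""
    fields = ("x", "y", "label")   # 'label' was added later


def serialize(obj):
    """Store the instance variable names next to their values."""
    names = list(type(obj).fields)
    return {"names": names, "values": [getattr(obj, n) for n in names]}


def materialize(cls, record, defaults=None):
    """Rebuild an instance of the current class shape, matching by name.

    Variables missing from the stored record get the user-supplied
    default if there is one, otherwise None. Stored variables that the
    class no longer declares are dropped.
    """
    defaults = defaults or {}
    stored = dict(zip(record["names"], record["values"]))
    obj = cls.__new__(cls)         # skip __init__, as a materializer would
    for name in cls.fields:
        if name in stored:
            setattr(obj, name, stored[name])
        else:
            setattr(obj, name, defaults.get(name))
    return obj


record = serialize(PointV1(1, 2))
upgraded = materialize(PointV2, record)
labelled = materialize(PointV2, record, defaults={"label": "origin"})
```

Matching by position instead of by name would silently assign values to the wrong variables as soon as a variable is inserted or reordered, which is why the names have to travel with the data.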
- Serialize ANY kind of object. For the moment there is no object to our knowledge that we cannot serialize and materialize.
Really? You serialize Socket, Process, FileStream and something meaningful happens?
Of course there are classes whose instances don't make sense once you load them back into the image. Our comment was meant literally: we can (ok, we should be able to) serialize and materialize them without problems. Now, whether those instances are correct or still valid and meaningful in the current image is another problem ;) One of the future features will be the ability to implement #postMaterializationAction, or something like that, to be executed after materialization. Sockets could try to get a new socket from the OS; Processes could be restarted and rescheduled? I have no idea. If you have one, please let us know. Thanks for the excellent questions :)
-- Mariano http://marianopeck.wordpress.com

At Wed, 25 May 2011 15:28:00 +0200, Mariano Martinez Peck wrote:
One of the most important uses we have in mind for Fuel (in the future) is Monticello (to replace mcz). In addition, the idea is to be able to bootstrap a really small Pharo image (hetzel) and load code into it without needing a compiler.
Sounds interesting! Can I learn about "hetzel" somewhere? -- Yoshiki

Hi,

On Wed, May 25, 2011 at 10:28 AM, Mariano Martinez Peck <marianopeck@gmail.com> wrote:
Hi Philippe. I don't know why but your emails always go to spam in my account :(
The project actually started several months ago. Tristan Bourgois and I began working on it during an internship with RMoD, INRIA. A couple of months ago, Mariano Martinez Peck joined the team, and he is now the official mentor in the SummerTalk.
ESUG website for SummerTalk: http://www.esug.org/wiki/pier/Promotion/SummerTalk/SummerTalk2011
The website with all the necessary information is here: http://rmod.lille.inria.fr/web/pier/software/Fuel It even includes slides explaining the algorithm. In addition, a paper is in progress.
Could you make the slides available in some other format than Flash?
Ok, Martin will do it and upload them directly to the website.
Yes, sorry. I kept the embedded slides but I added links to the pdf "sources".
For the moment, Fuel already provides the following features:
- Fast pickle format. It is much faster to materialize than to serialize.
What has led to the conclusion that materialization is more important than serialization?
One of the most important uses we have in mind for Fuel (in the future) is Monticello (to replace mcz). In addition, the idea is to be able to bootstrap a really small Pharo image (hetzel) and load code into it without needing a compiler. In all those cases we assume that you serialize only once or a few times and materialize many more times. But of course, that does not apply to all cases, as you said about session replication.
Actually, although we have this hypothesis (deserialization is done much more often than serialization), I often think it is not really essential to the project. It was an early decision to base Fuel on the Parcels algorithm, which has this property, but I believe we can eventually implement an alternative strategy that favors serialization performance over deserialization. I think other, more essential features (or goals!) are:
- any object can be serialized.
- binary class loading, without compilation.
- fast and focused on just one dialect, without worrying about an inter-platform format.
- flexibility in selecting the graph to serialize.
I can imagine scenarios where there is a one-to-one relationship and scenarios where serialization is more important (e.g. session replication).
- Correctly support class reshape (when the class of serialized objects has changed).
So what do you do when an instance variable was added? Set it to nil and hope that everything will continue to work?
In that example, yes. But what we meant is that right now the instance variable names are also being encoded, so at materialization time we will be able to deal with them. Not all scenarios are developed right now, but the base is there. At some point, the only solution has to come from the user. For example, you should implement #updateFrom: aVersion to: anotherVersion object: anObject or something like that. So we meant that we store the instVar names and we support some types of changes. In fact, we have reified FLInstanceVariablesMapping, so maybe we can do something with that in the future.
I want to let the user configure an initialization block.
- Serialize ANY kind of object. For the moment there is no object to our knowledge that we cannot serialize and materialize.
Really? You serialize Socket, Process, FileStream and something meaningful happens?
Of course there are classes whose instances don't make sense once you load them back into the image. Our comment was meant literally: we can (ok, we should be able to) serialize and materialize them without problems. Now, whether those instances are correct or still valid and meaningful in the current image is another problem ;) One of the future features will be the ability to implement #postMaterializationAction, or something like that, to be executed after materialization. Sockets could try to get a new socket from the OS; Processes could be restarted and rescheduled? I have no idea. If you have one, please let us know.
Yes... maybe I was too optimistic with this "ANY". I think you can serialize and deserialize instances of these classes, but they probably need something more to be "meaningful".
Thanks for the excellent questions :)
Thanks! Martin
-- Mariano http://marianopeck.wordpress.com
participants (5): Mariano Martinez Peck, Martin Dias, Philippe Marschall, Steve Cline, Yoshiki Ohshima