About UDMF again...

Post by **Logan MTM** » Mon Jan 25, 2010 14:11

This is the map I'm making (5% done) for the "20 Years of Doom" project:

Now, here is the problem:

If the TEXTMAP is already 1.28 MB at about 5% completion, it will reach roughly 28 to 30 MB at 100%!

Now the math:

A megawad with 32 maps x 30 MB per map = 960 MB of TEXTMAPs!?

Is that right?
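As a rough check on the arithmetic above, here is a minimal sketch in C. It assumes that TEXTMAP size grows linearly with map completion, which is an assumption and not something the thread establishes; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: linear extrapolation of TEXTMAP size.
   Assumes size grows in proportion to how much of the map is finished. */
double estimate_full_size_mb(double current_mb, double fraction_done)
{
    assert(fraction_done > 0.0);
    return current_mb / fraction_done;
}

/* Total size of a megawad with map_count maps of the same size. */
double megawad_size_mb(int map_count, double per_map_mb)
{
    return map_count * per_map_mb;
}

/* Prints both estimates; call it from your own main(). */
void print_estimates(void)
{
    printf("one map at 100%%: %.1f MB\n", estimate_full_size_mb(1.28, 0.05));
    printf("32 maps at 30 MB: %.0f MB\n", megawad_size_mb(32, 30.0));
}
```

Under that linear assumption a single map comes out at about 25.6 MB, a little below the 28 to 30 MB guessed above. With the 30 MB figure, 32 maps do add up to 960 MB of uncompressed text.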

Post by **Gez** » Mon Jan 25, 2010 14:30

Yes. The textmap format takes a lot more space than the binary format.

However, the bright side is that it compresses pretty well. If you don't make a megawad but a "megapack" (pk3 or, better yet, pk7), where each map wad is compressed, it shouldn't take that much more space than with Hexen format.

Post by **Graf Zahl** » Mon Jan 25, 2010 17:01

UDMF compressed is ca 20-25% larger than binary compressed. There's really no need to release anything with UDMF in WAD format as it's ZDoom only.

Post by **milasudril** » Mon Feb 01, 2010 20:03

I agree that text format Doom maps are a bad idea. A text format is for a human-computer interface where the human wants to use a text-based interface. But who wants to use a text editor to "draw" a Doom map? It is quite obvious that this is done much better with a map editor where you can see what you are doing. In this case a binary format is much better. It is easier to read (in some cases you can just read an entire object from a file without any conversion) and to write. And let us not even speak about what a wave file (or probably even worse, a high resolution picture) would look like if it used XML:

Code:

<frame time="123454">
    <sample channel="left" value="123" />
    <sample channel="right" value="123" />
</frame>

Is there any reason why UDMF is a text-based format?

Post by **Graf Zahl** » Mon Feb 01, 2010 20:11

The reason is extensibility. In a text format you can add an unlimited amount of new properties. A binary format will forever be locked to what it was designed to do at the time it was created.

And now take one guess why definition formats like XML have become so popular that even word processors are using it as the main format to store their data. Yes, both MSOffice and OpenOffice have ditched their binary formats in favor of a text file representation, too.

Binary formats are a dead end everywhere where feature sets are evolving and need adjustment in the stored data. (No, make that: Binary formats are a dead end. Period.)

Post by **milasudril** » Tue Feb 02, 2010 9:49

Graf Zahl wrote:The reason is extensibility. In a text format you can add an unlimited amount of new properties. A binary format will forever be locked to what it was designed to do at the time it was created.

You can use a class ID and a size for all classes. These two fields are added at the beginning of every record type.

Code:

struct Foo
{
    unsigned int classID;  // identifies the record type
    unsigned int size;     // record size; use unsigned long long int if really large objects are needed
    // fields of Foo follow
};

The only problem left now is to standardize the class IDs, but that is a problem with any file format.

Binary formats are not dead. Many of the commonly used media file formats are binary: JPEG, RIFF/WAVE, MP3.
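A minimal sketch in C of how a reader could use such a header to step over record types it does not know. It assumes that `size` counts the whole record including the header and that the file uses the reader's byte order; `RecordHeader`, `CLASS_VERTEX` and `count_known_records` are hypothetical names, not part of any existing map format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header: every record starts with these two fields.
   `size` is assumed to count the whole record, header included. */
typedef struct {
    uint32_t class_id;
    uint32_t size;
} RecordHeader;

enum { CLASS_VERTEX = 1 };  /* made-up class ID for illustration */

/* Walks a buffer of records and counts those with a known class ID.
   Records with unknown IDs are skipped using their size field.
   Returns -1 if a record is truncated or its size is invalid. */
int count_known_records(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    int known = 0;

    assert(buf != NULL);
    while (pos < len) {
        RecordHeader hdr;
        if (len - pos < sizeof hdr)
            return -1;                       /* truncated header */
        memcpy(&hdr, buf + pos, sizeof hdr); /* avoids unaligned access */
        if (hdr.size < sizeof hdr || hdr.size > len - pos)
            return -1;                       /* bad size field */
        if (hdr.class_id == CLASS_VERTEX)
            known++;
        pos += hdr.size;                     /* next record, known or not */
    }
    return known;
}
```

An old reader can skip whole records this way, but it still cannot tell what an unknown record means; that part depends on the agreed class IDs mentioned above.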

A text format probably works well for describing text, because much of the stored data is text. Therefore it works well for describing formatted documents too. But for a large set of numbers it is a very bad idea. It is important to note that if program A writes a floating point number in decimal and program B then reads that number, there is a risk of precision loss.
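The precision point can be tried out with a small C sketch. It assumes IEEE 754 doubles and a C library whose `printf` and `strtod` round correctly; `decimal_round_trip` is a made-up helper that plays the role of both the writing and the reading program.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes `value` as decimal text with `digits` significant digits,
   then parses the text back, as a writer and a reader program would. */
double decimal_round_trip(double value, int digits)
{
    char text[64];

    assert(digits >= 1 && digits <= 17);
    snprintf(text, sizeof text, "%.*g", digits, value);
    return strtod(text, NULL);
}
```

Under those assumptions, 17 significant digits are enough to get the same double back, while a short format such as 6 digits turns 1234.56789 into 1234.57. Whether precision is lost therefore depends on how many digits the writer emits.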

XML stores the name of each field for each record, which is the worst thing about this file format. If I want to use a text format to store a larger data set, I can use CSV instead, which does not repeat this information.

Post by **Gez** » Tue Feb 02, 2010 10:30

milasudril wrote: XML stores the name of each field for each record, which is the worst thing about this file format. If I want to use a text format to store a larger data set, I can use CSV instead, which does not repeat this information.

And then, what if you need to enhance a field with an additional, optional parameter, how does CSV fare?

Repeated text is not a real problem as soon as you bring in compression.

Sure, they could have used the same binary structure as the Hexen format, just with 64-bit words instead of 8-bit or 16-bit words. Maybe with a few additional fields as well, such as lineid. Woo. Limits are gone, cool. How would it help? What if a port introduces specials with 12 parameters, instead of just 5? The format has to be redefined for this port. Or you've got to go through some cumbersome workaround, like Hexen's Line_SetID or Eternity's ExtraData mechanism.

With UDMF, it's not a problem. You just list the additional params, the same format is used. Any UDMF-capable editor supports your added values without having to update its code and rebuild it.
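To illustrate the point, here is a minimal sketch in C of a reader that takes the fields it knows from a UDMF-style `key = value;` block and steps over the rest. The `Linedef` struct, `parse_linedef_body` and the `arg11` key in the usage note are made up for illustration, and values are restricted to integers to keep it short; this is not how any actual port or editor parses TEXTMAP.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Only the fields this hypothetical reader knows about. */
typedef struct {
    int v1, v2, special;
    int unknown;   /* number of keys the reader did not recognize */
} Linedef;

/* Parses "key = value;" pairs from the body of a block.
   Simplification: integer values only; real UDMF also has floats,
   booleans and quoted strings. Returns 0 on success, -1 on bad input. */
int parse_linedef_body(const char *body, Linedef *out)
{
    char key[32];
    int value, used;

    assert(body != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    for (;;) {
        used = -1;  /* stays -1 unless the closing ';' was matched */
        int n = sscanf(body, " %31[A-Za-z0-9_] = %d ;%n", key, &value, &used);
        if (n != 2 || used < 0)
            break;
        if (strcmp(key, "v1") == 0)
            out->v1 = value;
        else if (strcmp(key, "v2") == 0)
            out->v2 = value;
        else if (strcmp(key, "special") == 0)
            out->special = value;
        else
            out->unknown++;  /* new or port-specific key: skip it */
        body += used;
    }
    /* Success only if nothing but whitespace is left over. */
    body += strspn(body, " \t\r\n");
    return *body == '\0' ? 0 : -1;
}
```

Feeding it `v1 = 0; v2 = 1; special = 80; arg0 = 1; arg11 = 7;` fills the three known fields and counts two unknown keys, with no change to the reader.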

About precision loss: it's not greater than that from fixed point.

Post by **Graf Zahl** » Tue Feb 02, 2010 11:08

Gez wrote: Sure, they could have used the same binary structure as the Hexen format, just with 64-bit words instead of 8-bit or 16-bit words. Maybe with a few additional fields as well, such as lineid. Woo. Limits are gone, cool. How would it help?

For the record, before UDMF there were such attempts. I strongly opposed all of them for the sole reason of the mess involved in them. Some cooked up some horrendously convoluted schemes of metadata and optional fields that just made my head hurt.

Fortunately, in the end all of these attempts died quickly without ever being heard from again. The sad thing is that without a certain person (Deep, a.k.a. randomlag) we would have had something like UDMF much earlier, but he persistently sabotaged the discussion so that it went nowhere. The probable motivation was that it would have meant a lot of work for him on DeepSea...

Back to the text vs. binary approach:

@milasudril: You clearly have no idea what you are talking about. If you design a new format for *anything* that might need future expansion, you simply cannot afford to think in such minimalistic terms. It is of paramount importance that such a format is open-ended should the need for something new arise. There has to be some means to add this new stuff.

Any binary format is by definition out of the picture here because by its very nature it's not expandable.
CSV is also useless because there's no direct association between a property and its value. This is clearly saving space at the wrong place.

And in the end: What does it matter? Yes, UDMF maps can easily become 20 MB of raw text, but we are talking about monstrously huge and complex maps here that you shouldn't even bother starting on a machine with less than 1 GB of RAM. The memory footprint may sound enormous, but let's not forget that maps are normally loaded at a time when no other resources are in memory. So right after the map data gets deleted, the engine starts to load textures and other stuff which normally requires 3-4 times as much. As an example, KDiZD's Z1M10 is approx. 10 MB of UDMF map size, but fully initialized with all textures you need 50 MB of RAM to load all the data that's needed to play this map. It wouldn't even work on systems that have problems loading the 10 MB textmap lump.

As for distribution of such maps, any port which supports UDMF also supports loading Zips/PK3s as WAD replacements, so the only real issue in the end is compressed file size, and there UDMF is 1.3 - 1.5 times the size of the binary format. To me that's a non-issue compared to the gain in flexibility.

Post by **milasudril** » Tue Feb 02, 2010 15:32

Gez wrote:How would it help? What if a port introduces specials with 12 parameters, instead of just 5?

That is why all structs should include classID and size. For example, the first version introduces these fields:

Code:

int foo;
int bar;

Now the size of this structure is 8 bytes. If a new version wants more information, the size field is simply increased. Perhaps it now looks like this:

Code:

int foo;
int bar;
int baz;

and so on. BTW, this technique is used frequently in the Windows API.
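A minimal sketch in C of that size-field technique, under the assumption that new fields are only ever appended. `FooV1`, `FooV2` and `read_foo` are hypothetical, and unlike the fragments above the structs here carry the size field themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Version 1 of a hypothetical record. The writer sets `size` to the
   size of the struct version it knows about. */
typedef struct {
    unsigned int size;
    int foo;
    int bar;
} FooV1;

/* Version 2 appends a field; existing fields keep their offsets. */
typedef struct {
    unsigned int size;
    int foo;
    int bar;
    int baz;
} FooV2;

/* Reads a record written by either version into the newest layout.
   Fields the writer did not provide stay zero. Returns 0 on success. */
int read_foo(const void *data, size_t len, FooV2 *out)
{
    unsigned int size;
    size_t n;

    assert(data != NULL && out != NULL);
    if (len < sizeof size)
        return -1;
    memcpy(&size, data, sizeof size);
    if (size < sizeof(FooV1) || size > len)
        return -1;                   /* too small or truncated */
    memset(out, 0, sizeof *out);     /* default for missing fields */
    n = size < sizeof *out ? size : sizeof *out;
    memcpy(out, data, n);            /* copy only what both sides know */
    return 0;
}
```

This handles fields appended at the end. It does not by itself let two ports add different fields independently, which is the case the replies in this thread are concerned with.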

Post by **milasudril** » Tue Feb 02, 2010 15:37

Graf Zahl wrote: Any binary format is by definition out of the picture here because by its very nature it's not expandable.
CSV is also useless because there's no direct association between a property and its value. This is clearly saving space at the wrong place.

Use a header that tells what kind of data comes and in what order.
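One way to read that suggestion, sketched in C: the first line of the file names the columns, and the reader looks up each column by name instead of relying on a fixed position. `csv_column_index` is a hypothetical helper, and the sketch assumes plain comma-separated names without quoting.

```c
#include <assert.h>
#include <string.h>

/* Returns the zero-based column of `name` in a comma-separated header
   line, or -1 if the header has no such column. */
int csv_column_index(const char *header, const char *name)
{
    size_t name_len;
    int index = 0;

    assert(header != NULL && name != NULL);
    name_len = strlen(name);
    for (;;) {
        size_t field_len = strcspn(header, ",\n");
        if (field_len == name_len && strncmp(header, name, name_len) == 0)
            return index;
        if (header[field_len] != ',')
            return -1;               /* end of header line */
        header += field_len + 1;     /* step past the comma */
        index++;
    }
}
```

With a header of `x,y,type`, looking up `type` gives column 2 and looking up `angle` gives -1, so the reader has to choose a default for any column a file does not carry.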

Post by **Graf Zahl** » Tue Feb 02, 2010 16:38

Thank god I don't have to work with you. You absolutely don't get it, do you? I wouldn't want to work with such a messy format - ever!

Now take one guess why formats like XML, JSON or any other comparable format do not do such nonsense as you suggest.

Just a reminder: We no longer live in an age where file size is the most important factor when defining a data format.

Post by **Gez** » Tue Feb 02, 2010 17:20

milasudril wrote:
Gez wrote: How would it help? What if a port introduces specials with 12 parameters, instead of just 5?
That is why all structs should include classID and size. For example, the first version introduces these fields:
Code:
int foo;
int bar;
Now the size of this structure is 8 bytes. If a new version wants more information, the size field is simply increased. Perhaps it now looks like this:
Code:
int foo;
int bar;
int baz;
and so on. BTW, this technique is used frequently in the Windows API.

So, basically, each field needs to be introduced by a text field that tells its name and size. Great! You just introduced the multiple redundancy you wanted to avoid.

milasudril wrote: Use a header that tells what kind of data comes and in what order.

It wouldn't help at all with additional, optional parameters. You add a field to something, and you have to entirely reformat all your CSV files before they can use this new field...

Anyway, the ship has sailed. The UDMF specs are defined and approved. There's a level editor and several ports that support it, plus various miscellaneous tools, and more coming. If you don't like it, tough luck for you.

Post by **milasudril** » Tue Feb 02, 2010 17:53

Graf Zahl wrote:Just a reminder: We no longer live in an age where file size is the most important factor when defining a data format.

Just a reminder: everything that can be optimized should be optimized... The storage format may have a significant impact on load time. Decompression, dynamic string allocation, parsing... ugh.

Post by **milasudril** » Tue Feb 02, 2010 18:10

Gez wrote: So, basically, each field needs to be introduced by a text field that tells its name and size. Great! You just introduced the multiple redundancy you wanted to avoid.

No, the description is NOT stored in the file. It is probably found in some .h file. For an example of a file format using this technique, look at the bitmap file header. Version 5 is such an extension of version 4, which in turn is such an extension of version 3. The OS/2 version differs, yes.

Gez wrote:It wouldn't help any for additional, optional parameters. You add a field to something, and you have to reformat entirely all your CSV files before they can use this new field...

No, the old files use the old header. The newer file format uses the new header. The header solves the problem. A mathematical analogy:

0=(x-a)(x-b)=...

If I expand these parentheses, I clearly waste time solving this equation.

Gez wrote: Anyway, the ship has sailed. The UDMF specs are defined and approved. There's a level editor and several ports that support it, plus various miscellaneous tools, and more coming. If you don't like it, tough luck for you.

If there were more than one way to represent the same kind of data, I would have used file format plug-ins. And perhaps I will sink that ship...

Post by **Graf Zahl** » Tue Feb 02, 2010 18:14

Sorry but LOL!

We should be long past the times where everything should be optimized for performance. I'd rather optimize for maximum usability than use some bastard format that's neither binary nor flexible.

As for load times: insignificant. I did some tests with a UDMF'd version of KDiZD's Z1M10, which to this date is the largest map available, and the load time for the textmap is a) far less than a second even if uncompressed and b) insignificant in relation to all the things that need to be done to start the map (meaning: setting up internal data structures, spawning actors, loading textures, caching sounds and whatever else is needed).

Gez wrote:Anyway, the ship has sailed. The UDMF specs are defined and approved. There's a level editor and several ports that support it, plus various miscellaneous tools, and more coming. If you don't like it, tough luck for you.

And that, of course. Just a reminder: The people who designed this format were the ones who have been, or are going to be, working most closely with the new format and all its implications, so you should at least assume that they knew what they were doing. Where was your input when it happened? (Not that it'd matter, because you'd probably have been laughed at or ridiculed for your ideas.)
