[aspect-devel] [deal.II] Duplicate Vertices

Eric Heien emheien at ucdavis.edu
Tue Jan 15 10:35:03 PST 2013


On Jan 14, 2013, at 7:34 PM, Wolfgang Bangerth wrote:

> 
> Eric,
> 
>> - Create a class named "DataOutFilter" to hold vertices, cells, and
>> data values associated with an output file.  This will have flags
>> regarding whether to remove or average duplicate values on the same
>> vertex, merge vertices within certain distances, etc.
>> 
>> - Other output classes (VtkStream, HDF5MemStream, etc.) can be
>> rewritten to utilize this for internal storage, but for now I'll just
>> alter HDF5MemStream since that's what I'm using for large data sets.
>> I imagine some output formats aren't commonly used for large results
>> (e.g. gnuplot) so there's little reason to change them.
> 
> How would the output writers use this DataOutFilter?

Initially, I was planning for each writer to put the data points passed to it into the filter and then write all output at once when flush() is called.  If we can put this functionality in DataOutBase::write_* instead, that will simplify things.
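
For concreteness, the filter I have in mind would look roughly like the following (just a sketch -- the names and signatures are placeholders and will likely change):

    #include <deal.II/base/point.h>

    // Rough sketch only; not an existing deal.II interface.
    template <int dim>
    class DataOutFilter
    {
    public:
      // Record a vertex.  Depending on the filter's flags, vertices
      // within a given distance are merged, and the index of the
      // merged vertex is returned.
      unsigned int write_point (const dealii::Point<dim> &p);

      // Record a data value for a merged vertex.  If several original
      // points map to the same vertex, duplicate values can be kept,
      // dropped, or averaged.
      void write_data (const unsigned int vertex_index,
                       const unsigned int data_set,
                       const double       value);

      // Called by the output class once all points and values have
      // been pushed in; only then are the compacted arrays written.
      void flush ();
    };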

> Here's a different idea: I don't think it's important to be backward compatible if we can come up with something more compact, as long as the default keeps the same logical information (i.e., it's OK to compress the data by throwing away vertices that are co-located and have the same data).
> 
> What I would suggest is the following:
> 1/ In DataOutBase, create a function or class that, when given a std::vector<Patch>, evaluates which data points are at identical locations and have identical data.
> 2/ The function would have an interface along the following lines:
> 
>      /**
>       * Compute which data points described by the vector of Patches
>       * are truly unique and which are in fact duplicates of another
>       * data point.
>       *
>       * The returned vector has the following meaning: For each data
>       * point in each patch, consecutively enumerated, the entry
>       * is either a unique number, counted from zero, that indicates
>       * the how-many-th unique (i.e., prototypical) data point this
>       * is, or numbers::invalid_unsigned_int if the point is
>       * identical to another one (i.e., it is a duplicate of a
>       * prototype).
>       */
>      std::vector<unsigned int>
>      compute_data_point_identities (const std::vector<Patch> &);
> 
> The various write_*() functions can then call this function at the top and make use of its result by only writing the prototype data points. In essence, this will require:
> - Simply skipping over non-prototypical data points with index
>  numbers::invalid_unsigned_int
> - Outputting the prototypical data points, renumbering them as
>  returned by the function above.
> The modifications to the individual write_*() functions shouldn't be too difficult to implement, and we can tackle them one by one, as desired.
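> 
> To make this concrete, the core loop of a write_*() function could
> then look roughly as follows (just a sketch -- points_per_patch()
> and write_single_point() stand in for whatever the individual
> format needs; they are not existing functions):
> 
>      const std::vector<unsigned int> identities
>        = compute_data_point_identities (patches);
> 
>      unsigned int global_index = 0;
>      for (unsigned int p = 0; p < patches.size(); ++p)
>        for (unsigned int i = 0; i < points_per_patch (patches[p]);
>             ++i, ++global_index)
>          {
>            // Non-prototypical data points are skipped entirely...
>            if (identities[global_index] ==
>                numbers::invalid_unsigned_int)
>              continue;
> 
>            // ...while prototypes are written under their new,
>            // compacted index.
>            write_single_point (identities[global_index],
>                                patches[p], i);
>          }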

Maybe I don't fully understand this, but it seems like it would have problems renumbering the cell nodes and handling multiple data values at a single point.  As an example, suppose we have a mesh with cells A & B, and points 0, 1, 2, 3:

+--A--+--B--+
0    1,2    3

Then the vector returned by this function would be [0, 1, invalid, 2].  However, when you output cell B, it will consist of nodes "invalid" and 2, with no way of knowing what "invalid" corresponds to.  Also, if we wanted to average multiple values on a point like Guido suggested, we would need to know which points are remapped to which other points.

The DataOutFilter class would maintain structures to hold this remapping information, which would also mean we could avoid recomputing it for each of the DataOutBase::write_* functions.  It is also possible to hold these structures in DataOutBase, but that seems to violate the class abstraction, and you get into trouble if write_nodes() is called for multiple meshes with a given DataOutBase.
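
To illustrate the kind of remapping I mean, something along these lines (rough sketch, made-up names):

    #include <map>
    #include <vector>

    // The filter would keep a map from the original, consecutively
    // enumerated point index to the merged vertex index that actually
    // gets written.  Given that map, cell connectivity can be
    // rewritten before output (the sketch assumes every original
    // index is present in the map).
    std::vector<unsigned int>
    renumber_cell (const std::vector<unsigned int>            &original_nodes,
                   const std::map<unsigned int, unsigned int> &filtered_points)
    {
      std::vector<unsigned int> merged_nodes;
      for (unsigned int i = 0; i < original_nodes.size(); ++i)
        merged_nodes.push_back (filtered_points.find (original_nodes[i])->second);
      return merged_nodes;
    }

In the example above the map would be 0->0, 1->1, 2->1, 3->2, so cell B's nodes (2, 3) become (1, 2), and since original points 1 and 2 both land on merged vertex 1, their data values could be averaged there if requested.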

>> In this way, deal.II can still allow many different output formats
>> and each format shouldn't require too much modification to handle
>> duplicate vertices.  I realize the idea of storing and processing
>> values before output isn't in line with the "*Stream" name given to
>> output classes.  Is there a particular reason why these are named
>> streams, or is it just a convention to go along with the C concept of
>> streams?
> 
> The idea is to put objects in one-by-one and at the end have a result. This is really not all that different from the way std::ostream works (which is also buffered -- albeit it's a byte buffer whereas the streams in data_out_base.cc buffer more complicated objects at times).

My impression of a stream is that whatever goes in comes out in the same order, possibly with some formatting changes.  I wasn't sure whether maintaining this sort of ordering was important for the deal.II output, but it sounds like the proposed changes in ordering and redundancy should be OK.

-Eric


