
Permitting new physical units?



Joergen Albertsson and I have been discussing the "units problem"
of data items such as _refine_diff_density_* and _refln_F_* etc.
Joergen is here for a couple of months while I am in the faculty
office... and these discussions have been very useful in providing
a non-cif-expert sounding board on the clarity of definitions etc.

I sympathise completely with the concerns of Brian and David about 
"generic" definitions, because I am involved in developments 
where linkages between definitions will complicate searches...
not significantly, I might add, if there are DDL attributes that
specify what the link is (more on that later). However, I am just
as concerned about the proliferation of data items that will follow 
if we define a new item for each quantity which differs only in the
units. AND, as an application person I have to say that I see just 
as serious problems with having to search for N names rather than 1 
to access a given data quantity, even when the units of that item
have to be deduced from another place.

So we seem to be "between a rock and a hard place" with these choices.
Joergen prefers the generic approach because he considers it to be more 
intuitive and, as he says, it will keep the number of items that 
have to be listed in the Notes for Authors to an absolute minimum! I 
believe that we all have to sympathise with this human(e) view!


I would like to explore both possibilities briefly, and it will
quickly become obvious which direction I prefer.



(1) The multiple definition approach
------------------------------------

Here are the current definitions with units involving electrons.

_refine_diff_density_max
_refine_diff_density_min
_refine_diff_density_rms
_refln_F_meas
_refln_F_calc
_refln_F_sigma
_refln_F_squared_calc
_refln_F_squared_meas
_refln_F_squared_sigma
_refln_A_calc
_refln_A_meas
_refln_B_calc
_refln_B_meas
_atom_type_scat_dispersion_imag
_atom_type_scat_dispersion_real
_exptl_crystal_F_000


It's not complicated to add....

_refine_xd_diff_density_max
_refine_xd_diff_density_min
_refine_xd_diff_density_rms
_refln_xd_F_meas
_refln_xd_F_calc
...

or 

_refine_diff_density_xd_max
_refine_diff_density_xd_min
_refine_diff_density_xd_rms
_refln_F_xd_meas
_refln_F_xd_calc
...

and

_refine_nd_diff_density_max
_refine_nd_diff_density_min
_refine_nd_diff_density_rms
_refln_nd_F_meas
_refln_nd_F_calc
...

and 

_refine_ed_diff_density_max
_refine_ed_diff_density_min
_refine_ed_diff_density_rms
_refln_ed_F_meas
_refln_ed_F_calc
...

but do we really honestly want to add 48 new items!!!!!?
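
To make the application-side cost concrete, here is a minimal Python sketch of
what reading one quantity looks like under the multiple-definition scheme. It
is not taken from any existing CIF software: the data block is modelled as a
plain dict, `find_first` is a hypothetical helper, and the value 0.42 is
invented purely for illustration. The data names are the ones listed above.

```python
# Hypothetical sketch: a parsed CIF data block is modelled as a plain dict.
BASE_ITEMS = 16                 # electron-dependent items listed above
VARIANTS = ("xd", "nd", "ed")   # the three new name families proposed above

# Every name a program must try in order to read ONE quantity
# (using the first of the two naming schemes shown above).
DENSITY_MAX_NAMES = [
    "_refine_diff_density_max",
    "_refine_xd_diff_density_max",
    "_refine_nd_diff_density_max",
    "_refine_ed_diff_density_max",
]


def find_first(data_block, names):
    """Return (name, value) for the first candidate present, else None."""
    for name in names:
        if name in data_block:
            return name, data_block[name]
    return None


# 16 existing items, each needing three new variants.
new_item_count = BASE_ITEMS * len(VARIANTS)

# Illustrative data block; the value is made up.
data_block = {"_refine_nd_diff_density_max": "0.42"}
found = find_first(data_block, DENSITY_MAX_NAMES)
```

Every electron-dependent quantity would need the same kind of candidate list,
and each list would grow again whenever another set of units is admitted.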



(2) The generic definition approach
-----------------------------------

Here is the current definition of the _refine_diff_density_* items:

data_refine_diff_density_
    loop_ _name                '_refine_diff_density_max'
                               '_refine_diff_density_min'
                               '_refine_diff_density_rms'
    _category                    refine
    _type                        numb
    _type_conditions             esd
    _units                       e_A^-3^
    _units_detail              'electrons per cubic angstrom'
    _definition
;              The largest, smallest and root-mean-square-deviation, in
               electrons per angstrom cubed, of the electron density in the
               final difference Fourier map. The *_rms value is measured with
               respect to the arithmetic mean density, and is derived from
               summations over each grid point in the asymmetric unit of
               the cell. This quantity is useful for assessing the
               significance of *_min and *_max values, and also for
               defining suitable contour levels.
;


These items can easily be redefined as follows... and the same pattern
applies to all of the electron-dependent items listed above.


data_refine_diff_density_
    loop_ _name                '_refine_diff_density_max'
                               '_refine_diff_density_min'
                               '_refine_diff_density_rms'
    _category                    refine
    _type                        numb
    _type_conditions             esd
    _units_construct            (_diffrn_radiation_scat_units)_A^-3^   
    _units_default               e_A^-3^
    _definition
;              The largest, smallest and root-mean-square-deviation, of the
               density in the final difference Fourier map. The *_rms value 
               is measured with respect to the arithmetic mean density, and 
               is derived from summations over each grid point in the 
               asymmetric unit of the cell. This quantity is useful for 
               assessing the significance of *_min and *_max values, and also 
               for defining suitable contour levels. The units of density are
               defined in _diffrn_radiation_scat_units per angstrom cubed.
;


OK, this introduces two new DDL1 attributes... a bit scary, some might say,
but I foreshadow that it's only the start of many more such cross linkages 
which will be introduced in future versions of the DDLs, as Nick, Ian, John
and I work on new "methods" paradigms. Such considerations are going on now.

Everyone in this learned group appreciates, and indeed has reminded me on 
occasion, that some data items are defined in a decidedly clumsy way, 
especially for electronic parsing. So please... before launching any missiles 
in this direction for supposedly moving the definition goal posts, understand 
that adding attributes to the definition language helps preserve the current 
naming of data items and PREVENT the proliferation of new items for nonsense 
reasons... such as in this case.

Indeed, the problems with the data items we are discussing here are a classic
example of what happens when the description language is not rich enough. And
as we get to understand how to improve this richness intelligently, the actual
number of data items we will need should decrease rather than increase.



I have missed an important part of this proposal. What about the single
new data item _diffrn_radiation_scat_units and the definition of the 
two new DDL items _units_construct and _units_default?

In the cif core dictionary we will need...


data_diffrn_radiation_scat_units
    _name                      '_diffrn_radiation_scat_units'
    _category                    diffrn_radiation
    _type                        char
    _list                        both
    _list_reference            '_diffrn_radiation_wavelength_id'
    loop_ _enumeration           
          _enumeration_detail    e      electrons
                                 fm     femtometres
                                 V      volts        
    _definition
;             The scattering units associated with the radiation type
              as defined in _diffrn_radiation_probe.
;


and in the DDL1 dictionary...


data_units_construct
    _definition
;              String of characters specifying the construction of the units
               of the defined data item. The construction is composed of two 
               entities:
                  (1) data names
                  (2) construction characters
               The rules of construction conform to the regular expression
               (REGEX) specifications detailed in the IEEE document P1003.2
               Draft 11.2 Sept 1991 (ftp file '/doc/POSIX/1003.2/p121-140').
;
    _name                      '_units_construct'
    _category                    units_construct
    _type                        char
    _example                   (_cell_length_unit)^3^ 
    _example_detail            'the units for _cell_volume'


data_units_default
    _definition
;              A unique code which identifies the default units of the defined 
               data item. This should be used if the value cannot be specified 
               from the value of _units_construct.
;
    _name                      '_units_default'
    _category                    units
    _type                        char
    _example                     e_A^-3^
    _example_detail             'electrons per angstrom cubed'
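
As a rough illustration of the extra parsing involved, the Python sketch below
resolves the units of an item from _units_construct and falls back to
_units_default. It assumes the simplest reading of the construct, namely that
each parenthesised data name is replaced by that item's value in the data
block; the full construction rules are not settled here. The function
`resolve_units` and the dictionaries are hypothetical, not part of any
existing CIF software.

```python
import re

# Assumption: a construct embeds data names in parentheses, e.g.
# "(_diffrn_radiation_scat_units)_A^-3^".
_NAME_IN_PARENS = re.compile(r"\((_[A-Za-z0-9_]+)\)")


def resolve_units(definition, data_block):
    """Return the units code for an item.

    definition: dict of DDL attributes from the dictionary entry.
    data_block: dict of data names and values from the CIF being read.
    """
    construct = definition.get("_units_construct")
    default = definition.get("_units_default")
    if construct is None:
        return default
    try:
        # Replace each "(_name)" with the value stored under _name.
        # A KeyError means the data block does not supply that item.
        return _NAME_IN_PARENS.sub(
            lambda m: data_block[m.group(1)], construct
        )
    except KeyError:
        return default


# Attributes copied from the redefined _refine_diff_density_* entry.
density_def = {
    "_units_construct": "(_diffrn_radiation_scat_units)_A^-3^",
    "_units_default": "e_A^-3^",
}

block_fm = {"_diffrn_radiation_scat_units": "fm"}
block_legacy = {}   # an existing CIF that does not contain the new item

units_fm = resolve_units(density_def, block_fm)          # "fm_A^-3^"
units_legacy = resolve_units(density_def, block_legacy)  # "e_A^-3^"
```

Under these assumptions a CIF without _diffrn_radiation_scat_units simply
resolves to the default e_A^-3^, so existing files keep their present meaning.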
    




OK, the above does require more intelligent parsing but this had to come.
Joergen considers it much better than the proliferation alternative... and
so do I (:->). Seriously though, we really do need to address the immediate
and long term consequences of such proliferation. I want the existing data
names to remain active, AS DO ALL OF THE OTHER PRESENT SOFTWARE DEVELOPERS!
And since most don't use the DDL in the definitions actively, the proposed
new attributes won't inconvenience them. It's the next generation of cif
developers that we need to look out for, and that's who a richer definition
language should be aimed at. Herbert Bernstein and John Westbrook may not be 
too chuffed by this proposal at first sight, but knowing them they will turn 
it into an opportunity for even better cif tools.

Whatever happened to the day of rest?

Cheers, Syd.




Copyright © International Union of Crystallography