docstrip : catcode 128 to 255 set to 12 in new version 2.5g (2018-05-03)
GL
2018-11-07 23:27:10 UTC
Hello,

A change was made to docstrip in version 2.5g (2018-05-03):

the macro \readsource now runs a loop that sets the catcode of every
character from 128 to 255 to 12 (= 'other').

I used to set some of those catcodes to 9 (= 'ignore') so that I could
highlight the .dtx source in my editor, such characters being
ignored when \read is performed inside the \readsource macro.

For example, I used the character ¤ (with \catcode 164 = 9) as a
highlighting marker in the .dtx source. With that catcode the character
is ignored by \read and does not appear in the docstrip output.

The documented source says:
"To avoid any UTF-8 handling of characters
we set code points 128–255 to other."


It might be worth filtering such assignments, as in:

\ifnum 9=\catcode\@tempcnta
\else \catcode\@tempcnta 12\relax \fi
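
As a minimal sketch of how such a filter might sit inside a loop over
128 to 255: this is plain TeX for illustration only, not the actual
\readsource code, and the counter name \charcode is made up for the
example.

```tex
% Plain TeX sketch (NOT the actual docstrip source).
\newcount\charcode            % hypothetical scratch counter
\catcode164=9                 % user setting: byte 164 is "ignored"
\charcode=128
\loop
  \ifnum\catcode\charcode=9   % already "ignored": leave it alone
  \else
    \catcode\charcode=12      % otherwise make it "other"
  \fi
  \ifnum\charcode<255
    \advance\charcode by 1
\repeat
% Report two catcodes so the effect of the filter can be checked.
\message{catcode 164 = \the\catcode164, catcode 165 = \the\catcode165}
\bye
```

The inner \ifnum ... \fi is balanced before the loop test, so it does
not interfere with \loop ... \repeat.
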

A discussion could follow about which catcodes should be exempted
(trivially, characters with category code 9 = 'ignore' should be left
alone and not set to 12; my current opinion is that category 9 should
be the only exemption...)

But such a discussion is beyond the scope of my post.

Thanks for any help or answers.

GL
d***@gmail.com
2018-11-08 09:02:02 UTC
This was cross-posted to
https://github.com/latex3/latex2e/issues/87
and as I commented there, if you assume files are in UTF-8 then there isn't really a possibility to change catcodes for characters above 128, as they are not single-byte characters.
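
To illustrate the point, here is a small plain TeX sketch for an 8-bit engine (tex or pdftex). It is not docstrip code, and the macro names \asother and \asignored are made up for the example. In UTF-8 the character ¤ (U+00A4) is the two bytes "C2 "A4, written below in ^^ notation so the example does not depend on how the file itself is saved.

```tex
% Plain TeX sketch for an 8-bit engine; NOT docstrip code.
\catcode"C2=12 \catcode"A4=12
\def\asother{^^c2^^a4}     % both bytes of the UTF-8 sequence are kept
\catcode"A4=9              % "ignore" byte 164 only
\def\asignored{^^c2^^a4}   % second byte is dropped, lead byte survives
% Show what each macro actually contains.
\message{\meaning\asother}
\message{\meaning\asignored}
\bye
```

Setting byte 164 to catcode 9 removes only the continuation byte, so the lead byte "C2 is left on its own, and every other UTF-8 character that contains the byte "A4 is affected as well.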

As docstrip is "just" writing verbatim it can avoid knowing the file encoding if it just passes all bytes through (if it is not discarding the whole line) but otherwise would need some input and output encoding declaration mechanism.

It is still safe to do this for characters that are single byte in all relevant encodings; docstrip.dtx itself makes the catcode of Z = 9 in some cases.

David
