Discussion:
Vietnamese with unicode?
unknown
2006-05-22 19:33:16 UTC
I'm paying someone to do some Vietnamese translation work for me, and
he's not familiar with tex, so I'm trying to do the technical homework.
I would prefer to use unicode for the whole project, since it's more
likely to age gracefully and play nicely with other technologies besides
tex. However, the vntex package only seems to support legacy encodings
such as viscii and tcvn. If I try the obvious thing,

\documentclass{book}
\usepackage{vietnam}
\usepackage{ucs}
\usepackage[utf8]{inputenc}
\begin{document}
Trăm năm trong cõi người ta,\\
Chữ tài chữ mệnh khéo là ghét nhau.\\
Trải qua một cuộc bể dâu,\\
Những điều trông thấy mà đau đớn lòng.\\
Lạ gì bỉ sắc tư phong,\\
Trời xanh quen thói má hồng đánh ghen.
\end{document}

I get this:

! LaTeX Error: Option clash for package inputenc.

I understand that unicode support is improved in latex 2003/12/01.
Would this work if I was to upgrade from my current 2001/06/01 to the
later version?

The alternatives would seem to be:
(1) I ask my translator to use viscii or tcvn, and I leave it in that
encoding forever.
(2) I ask my translator to use unicode, and then I convert from
unicode to tcvn with a script every time I compile the document.

#1 seems lame, since unicode is the wave of the future, so I'd probably
have to convert to unicode some day in the future anyway. #2 seems to
be a problem, because I'm not having much luck locating any open-source
Unix software to convert *from* unicode *to* a legacy encoding. One of
the nice things about doing it with 100% unicode would be that I'd never
have to do any conversions, and doing the conversions scares me, because
I don't read Vietnamese, and therefore wouldn't be able to tell if a
conversion had bugs in it. I'm also planning to produce html output,
probably using tex4ht, so I'd have to get it into unicode at that point
anyway.

TIA for any suggestions!
Ralf Stubner
2006-05-22 20:09:19 UTC
Post by unknown
I'm paying someone to do some Vietnamese translation work for me, and
he's not familiar with tex, so I'm trying to do the technical homework.
I would prefer to use unicode for the whole project, since it's more
likely to age gracefully and play nicely with other technologies besides
tex. However, the vntex package only seems to support legacy encodings
such as viscii and tcvn. If I try the obvious thing,
How do you come to this conclusion? I have here vietnam, 2000/01/27 v1.0
from teTeX 3.0, and that version does support UTF-8 input.
Post by unknown
\documentclass{book}
\usepackage{vietnam}
^ add [utf8] here
Post by unknown
\usepackage{ucs}
\usepackage[utf8]{inputenc}
^^^^ with recent LaTeX and ucs versions it is better to
use utf8x here
Post by unknown
\begin{document}
Trăm năm trong cõi người ta,\\
Chữ tài chữ mệnh khéo là ghét nhau.\\
Trải qua một cuộc bể dâu,\\
Những điều trông thấy mà đau đớn lòng.\\
Lạ gì bỉ sắc tư phong,\\
Trời xanh quen thói má hồng đánh ghen.
\end{document}
! LaTeX Error: Option clash for package inputenc.
I understand that unicode support is improved in latex 2003/12/01.
Would this work if I was to upgrade from my current 2001/06/01 to the
later version?
I don't think so. The UTF-8 support in the LaTeX kernel does not seem to
cover Vietnamese, which I find strange since I thought all the T*
encodings were covered by the kernel. Anyway, the vietnam.sty I have
here explicitly uses functionality from ucs.sty ...
Post by unknown
#2 seems to
be a problem, because I'm not having much luck locating any open-source
Unix software to convert *from* unicode *to* a legacy encoding.
recode should be able to do that. But staying with Unicode is the better
idea.

cheerio
ralf
unknown
2006-05-22 20:39:44 UTC
Thanks, Ralf, for your helpful response!
Post by Ralf Stubner
Post by unknown
However, the vntex package only seems to support legacy encodings
such as viscii and tcvn.
How do you come to this conclusion? I have here vietnam, 2000/01/27 v1.0
from teTeX 3.0, and that version does support UTF-8 input.
Maybe the online information I found was out of date, but it didn't
mention unicode as a possible input encoding, and it didn't seem to
work when I tried it. However, I'm very happy to hear that I was
wrong!

Hmm...the problem is that it doesn't seem to work when I do what you
suggested. Here's my new input file:

\documentclass{book}

\usepackage[utf8]{vietnam}

\usepackage{ucs}
\usepackage[utf8x]{inputenc}

\begin{document}
Trăm năm trong cõi người ta,\\
Chữ tài chữ mệnh khéo là ghét nhau.\\
Trải qua một cuộc bể dâu,\\
Những điều trông thấy mà đau đớn lòng.\\
Lạ gì bỉ sắc tư phong,\\
Trời xanh quen thói má hồng đánh ghen.
\end{document}

The result is:

! LaTeX Error: Unknown option `utf8' for package `vietnam'.

And yet I seem to have the same version of vietnam.sty that you do!? ---


$ head /usr/share/texmf/tex/generic/vietnam/vietnam.sty
% This file is part of vntex. License: GPL.
%
% This is the file vietnam.sty which provides Vietnamese captions for the
% standard classes.
%
% written by Werner Lemberg <***@gnu.org> and
% Han The Thanh <***@fi.muni.cz>
\ProvidesPackage{vietnam}[2000/01/27 v1.0 Vietnamese captions]

Do you think you could test whether my input file compiles on your
system, or tell me if there's something obviously wrong with my input
file?

Sorry if I'm just being dense, and thanks for the help!
Ralf Stubner
2006-05-22 21:42:54 UTC
Post by unknown
\documentclass{book}
\usepackage[utf8]{vietnam}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
Actually, it might be better to stick to utf8 here. At least on my system,
vietnam.sty already loads inputenc, which produces an option clash.
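
For what it's worth, the clash appears because LaTeX refuses a second load of an already-loaded package when the new option list contains something the first load did not have. A minimal sketch of the usual workaround, assuming your vietnam.sty loads inputenc itself. I have not checked which options your copy passes along, so whether utf8 ends up as the active encoding is an assumption, and the sketch only isolates the clash:

```latex
\documentclass{book}
% Queue the option before anything loads inputenc, so that the load
% inside vietnam.sty (assumed) already sees utf8.
\PassOptionsToPackage{utf8}{inputenc}
\usepackage{vietnam}
% Repeating the load with an option that was already given is harmless;
% only an option NOT given the first time triggers "Option clash".
\usepackage[utf8]{inputenc}
\begin{document}
Option clash test.
\end{document}
```

If this compiles without the clash, the Vietnamese text can be put back in to see whether the encoding really took effect.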

[...]
Post by unknown
! LaTeX Error: Unknown option `utf8' for package `vietnam'.
Strange.
Post by unknown
And yet I seem to have the same version of vietnam.sty that you do!? ---
Not exactly:

head -n 20 /usr/share/texmf-tetex/tex/latex/vietnam/vietnam.sty
% This is the file vietnam.sty which provides Vietnamese captions for the
% standard classes.
%
% written by Werner Lemberg <***@gnu.org> and
% Han The Thanh <***@fi.muni.cz>

\ProvidesPackage{vietnam}[2000/01/27 v1.0 Vietnamese captions]

\RequirePackage{ifthen}
\newboolean{optenc}
\newboolean{dblaccnt}
\newboolean{noinputenc}
\newboolean{nocaptions}
\newboolean{vnutf8}

\DeclareOption{viscii}{\PassOptionsToPackage{viscii}{inputenc}\setboolean{optenc}{true}}
\DeclareOption{tcvn}{\PassOptionsToPackage{tcvn}{inputenc}\setboolean{optenc}{true}}
\DeclareOption{utf8}{\PassOptionsToPackage{utf8}{inputenc}\setboolean{optenc}{true}\setboolean{vnutf8}{true}}
\DeclareOption{vps}{\PassOptionsToPackage{vps}{inputenc}\setboolean{optenc}{true}}
\DeclareOption{mviscii}{\PassOptionsToPackage{mviscii}{inputenc}\setboolean{optenc}{true}}


Here one can see where the option utf8 is defined. If you don't have
this option defined, you should complain to the vntex people for
releasing changed versions without a changed date/version number.
Post by unknown
Do you think you could test whether my input file compiles on your
system, or tell me if there's something obviously wrong with my input
file?
Here it works with 'utf8' instead of 'utf8x'. (There is a warning from
ucs.sty, though, about utf8 now being the option for the kernel.) Does
your vntex come with babel support (vietnam.ldf)? If so, you could try
using babel with option vietnam instead of vietnam.sty. The following
header works here:

\documentclass{book}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\usepackage[vietnam]{babel}
\begin{document}
[...]

ucs.sty does not seem to be strictly necessary, and I am not sure whether
utf8x will work on your system. If not, try 'utf8'. Here vietnam.sty
also knows the 'noinputenc' option. Hence the following works, too:

\usepackage[noinputenc]{vietnam}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}

Again change utf8x to utf8 if necessary. HTH.
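
Spelled out as a complete test file, the second variant would look roughly like this. It is only a sketch assembled from the fragments above: it assumes a vietnam.sty that knows 'noinputenc' and a ucs version that provides utf8x, and the sample text is shortened to two lines.

```latex
\documentclass{book}
% Stop vietnam.sty from loading inputenc itself, so that the explicit
% load below cannot clash with it.
\usepackage[noinputenc]{vietnam}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}  % change utf8x to utf8 if utf8x is unknown
\begin{document}
Trăm năm trong cõi người ta,\\
Chữ tài chữ mệnh khéo là ghét nhau.
\end{document}
```

The file itself has to be saved as UTF-8 for this to work.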

Maybe it is time to update to a more recent TeX distribution, though.
What are you using?

cheerio
ralf
Frank Mittelbach
2006-05-24 07:44:56 UTC
Post by Ralf Stubner
Post by unknown
I understand that unicode support is improved in latex 2003/12/01.
Would this work if I was to upgrade from my current 2001/06/01 to the
later version?
I don't think so. The UTF-8 support in the LaTeX kernel does not seem to
cover Vietnamese, which I find strange since I thought all the T*
encodings were covered by the kernel.
T5 is a contributed encoding, and I haven't gotten around to writing the mapping
table needed to support it. Volunteers welcome. This isn't difficult; it
just takes some time and a little care (and a Unicode book or table).
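
To give an idea of the work involved: each entry in such a table is a single \DeclareUnicodeCharacter line that maps a code point to an existing encoding-independent command. A rough sketch with two entries, placed directly in a document preamble rather than in a .dfu file. It assumes that the T5 fonts from vntex are installed and that T5 provides \dj, as T1 does:

```latex
\documentclass{article}
\usepackage[T5]{fontenc}     % assumes the vntex T5 fonts are installed
\usepackage[utf8]{inputenc}  % kernel UTF-8 support
% One line per character: hexadecimal code point, then the command
% that typesets it in the current font encoding.
\DeclareUnicodeCharacter{1EA1}{\d{a}}  % U+1EA1, a with dot below
\DeclareUnicodeCharacter{0111}{\dj}    % U+0111, d with stroke
\begin{document}
ạ đ
\end{document}
```

A full table for Vietnamese needs an entry like this for every accented letter in both cases, which is where the time and care go.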

frank
Werner LEMBERG
2006-05-27 23:40:50 UTC
Post by Frank Mittelbach
T5 is a contributed encoding, and I haven't gotten around to writing the
mapping table needed to support it. Volunteers welcome. This isn't
difficult; it just takes some time and a little care (and a Unicode
book or table).
t5enc.dfu is already part of vntex.
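
With that file installed, the kernel's own utf8 option should be enough and ucs can be left out. A minimal sketch, assuming a vntex recent enough to ship t5enc.dfu together with the T5 fonts; fontenc is loaded first so that inputenc already knows about T5 when it reads the mapping files:

```latex
\documentclass{book}
\usepackage[T5]{fontenc}     % Vietnamese font encoding from vntex
\usepackage[utf8]{inputenc}  % kernel UTF-8; picks up t5enc.dfu if present
\begin{document}
Trăm năm trong cõi người ta,\\
Chữ tài chữ mệnh khéo là ghét nhau.
\end{document}
```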


Werner
Ralf Stubner
2006-05-28 11:17:49 UTC
Post by Werner LEMBERG
Post by Frank Mittelbach
T5 is a contributed encoding, and I haven't gotten around to writing the
mapping table needed to support it. Volunteers welcome. This isn't
difficult; it just takes some time and a little care (and a Unicode
book or table).
t5enc.dfu is already part of vntex.
Thanks for the info. I guess the vntex in teTeX 3.0 is just too old then.

This raises the question whether contributed encodings should come with
an appropriate *enc.dfu, or if all of this should be merged into
utf8ienc.dtx such that utf8enc.dfu is more complete.

cheerio
ralf
Robin Fairbairns
2006-05-28 11:37:55 UTC
Post by Ralf Stubner
Post by Werner LEMBERG
Post by Frank Mittelbach
T5 is a contributed encoding, and I haven't gotten around to writing the
mapping table needed to support it. Volunteers welcome. This isn't
difficult; it just takes some time and a little care (and a Unicode
book or table).
t5enc.dfu is already part of vntex.
Thanks for the info. I guess the vntex in teTeX 3.0 is just too old then.
yup: vntex was updated around last christmas.
Post by Ralf Stubner
This raises the question whether contributed encodings should come with
an appropriate *enc.dfu, or if all of this should be merged into
utf8ienc.dtx such that utf8enc.dfu is more complete.
it's a bit silly to have the capability of decoding to a set of
commands for which you don't have the supporting macro package,
surely?

it is, however, a rather difficult issue. for example, i find the
situation with cyrillic rather odd: it's nominally a required latex
package, yet most distributions don't include it by default.
nevertheless, cyrillic is included in utf8ienc.dtx.
--
Robin Fairbairns, Cambridge
Ralf Stubner
2006-05-28 15:30:23 UTC
Post by Robin Fairbairns
Post by Ralf Stubner
This raises the question whether contributed encodings should come with
an appropriate *enc.dfu, or if all of this should be merged into
utf8ienc.dtx such that utf8enc.dfu is more complete.
it's a bit silly to have the capability of decoding to a set of
commands for which you don't have the supporting macro package,
surely?
Agreed. It does make sense to keep *enc.def and *enc.dfu together. But
then Frank's comment sounded as if he is interested in integrating further
encodings into utf8ienc.dtx. And what is the point of utf8enc.dfu if
this is not done?
Post by Robin Fairbairns
it is, however, a rather difficult issue. for example, i find the
situation with cyrillic rather odd: it's nominally a required latex
package, yet most distributions don't include it by default.
nevertheless, cyrillic is included in utf8ienc.dtx.
It's odd, but here one can simply blame the distributors. The cyrillic
bundle is required after all. Although I do not understand why, eg,
cyrillic support is required while greek or vietnamese or ... is not.

cheerio
ralf
Robin Fairbairns
2006-05-28 16:59:48 UTC
Post by Ralf Stubner
Post by Robin Fairbairns
it is, however, a rather difficult issue. for example, i find the
situation with cyrillic rather odd: it's nominally a required latex
package, yet most distributions don't include it by default.
nevertheless, cyrillic is included in utf8ienc.dtx.
It's odd, but here one can simply blame the distributors. The cyrillic
bundle is required after all. Although I do not understand why, eg,
cyrillic support is required while greek or vietnamese or ... is not.
in neither case is there a conforming latex encoding. one can argue
that vietnamese doesn't use hyphenation, so its non-standard status
doesn't matter; but for greek, there's no such excuse. (there was a
proposal that iso 8859-7 should be used as a font encoding, but that
doesn't work either...quite apart from only having monotonic accents,
which excites some greeks, though not me -- i've only ever learned
monotonic ;-)
--
Robin Fairbairns, Cambridge