Call for volunteers to test new glossaries-related application (bib2gls)
Nicola Talbot
2017-02-05 12:53:19 UTC
Hi,

I'm developing a new command line application called bib2gls
(https://github.com/nlct/bib2gls) which works with the glossaries-extra
package (http://ctan.org/pkg/glossaries-extra), and I'm looking for
volunteers to try out the experimental version before uploading it to CTAN.

For those of you who are familiar with the glossaries package, you may
be aware that you can create a .tex file containing all your glossary
definitions which can then be input into the document using either
\input or \loadglsentries.

For example, the file "myentries.tex" might contain:

\newglossaryentry{matrix}{name={matrix},
plural={matrices},
description={rectangular array of values}
}

\newglossaryentry{pi}{name={\ensuremath{\pi}},
description={the ratio of the length of the circumference
of a circle to its diameter}
}

The document might then look something like:

\documentclass{article}

\usepackage{glossaries}

\makeglossaries
\loadglsentries{myentries}% or just \input{myentries}

\begin{document}
A \gls{matrix}. Lots of \glspl{matrix}.

\[C = 2 \gls{pi} r \]

\printglossaries
\end{document}

Assuming the document is called "myDoc.tex", the build process using the
makeglossaries Perl script is:

latex myDoc
makeglossaries myDoc
latex myDoc

(replace 'latex' with pdflatex, xelatex, lualatex as appropriate).

Alternatively using the light-weight Lua makeglossaries-lite script:

latex myDoc
makeglossaries-lite myDoc
latex myDoc

(or makeindex can be called directly with all the required options set).

xindy may be used instead by adding the 'xindy' package option:

\usepackage[xindy]{glossaries}

However, the 'sort' key must now be added to the 'pi' entry:

\newglossaryentry{pi}{name={\ensuremath{\pi}},
sort={pi},
description={the ratio of the length of the circumference
of a circle to its diameter}
}

(otherwise xindy will fail).

Alternatively, you can get TeX to sort and collate the entries:

\documentclass{article}

\usepackage{glossaries}

\makenoidxglossaries % <--- changed
\loadglsentries{myentries}% or just \input{myentries}

\begin{document}
A \gls{matrix}. Lots of \glspl{matrix}.

\[C = 2 \gls{pi} r \]

\printnoidxglossaries % <--- changed
\end{document}

This also requires the 'sort' key for the 'pi' entry.

The new bib2gls application allows an alternative approach, but it
requires commands provided with the extension package glossaries-extra
and the entries are stored in a .bib format instead.

The above "myentries.tex" file can be rewritten as "myentries.bib":

@entry{matrix,
name={matrix},
plural={matrices},
description={rectangular array of values}
}

@symbol{pi,
name={\ensuremath{\pi}},
description={the ratio of the length of the circumference
of a circle to its diameter}
}

This can now be maintained in a bibliographic management system such as
JabRef as long as that application can be configured to recognise the
glossary fields.

The document now looks like:

\documentclass{article}

\usepackage[record]{glossaries-extra}% <--- changed

\GlsXtrLoadResources[% <--- changed
src={myentries}% data in 'myentries.bib'
]

\begin{document}
A \gls{matrix}. Lots of \glspl{matrix}.

\[C = 2 \gls{pi} r \]

\printunsrtglossaries%<--- changed
\end{document}

The build process is:

latex myDoc
bib2gls myDoc
latex myDoc

bib2gls works in a similar way to bibtex. By default it only selects
those entries that have been referenced in the document and the entries
they depend on. (Use the option selection=all to select all entries in
the bib file. Note that \glsaddall doesn't work in this context: it can
only iterate over entries that have already been defined, and here the
entries aren't defined until bib2gls has selected them.)
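For instance, a resource command that selects every entry in myentries.bib, referenced or not, might look like this (a sketch, using the selection value described above):

\GlsXtrLoadResources[
src={myentries},% data in myentries.bib
selection={all}% select all entries, not just referenced ones and their dependencies
]

Entries that are never referenced have no location records, so they simply appear without a location list.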

bib2gls also sorts and collates, so the xindy/makeindex step can be
skipped. (You can use bib2gls with xindy/makeindex if required, for
example, if you need a custom xindy rule. In that case, use the
'record=alsoindex' package option, use \makeglossaries and
\printglossaries as usual, and add sort=none to the \GlsXtrLoadResources
options list.)
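A minimal sketch of that arrangement, reusing myentries.bib from above. The build order is an assumption: presumably latex, bib2gls, latex, then makeglossaries (or makeindex/xindy called directly), then a final latex run, since the indexing files can only be written once the entries have been defined from bib2gls's output.

\documentclass{article}

\usepackage[record=alsoindex]{glossaries-extra}

\makeglossaries

\GlsXtrLoadResources[
src={myentries},% data in myentries.bib
sort={none}% leave the sorting to makeindex/xindy
]

\begin{document}
A \gls{matrix}. Lots of \glspl{matrix}.

\printglossaries
\end{document}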

You can have multiple bib files. For example:

\GlsXtrLoadResources[src={myterms,mysymbols}]

or:

\GlsXtrLoadResources[
src={myterms}, % data in myterms.bib
type=main,% put these entries in the 'main' glossary
sort={de-CH-1996}% sort according to Swiss German new orthography
]

\newglossary*{symbols}{Symbols}

\GlsXtrLoadResources[
src={mysymbols},% data in mysymbols.bib
type=symbols, % put these entries in the 'symbols' glossary
sort={letter-nocase}% case-insensitive letter sort
]
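Assuming the two resource sets above, the two glossaries can then be displayed separately in the document body (or both at once with \printunsrtglossaries):

\printunsrtglossary[type=main]% entries from myterms.bib
\printunsrtglossary[type=symbols]% entries from mysymbols.bib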

You can select a different field to sort by. For example, if the file
'constants.bib' contains:

@entry{pi,
name={\ensuremath{\pi}},
description={the ratio of the length of the circumference
of a circle to its diameter},
user1={3.14159}
}

@entry{eulercons,
name={\ensuremath{\gamma}},
description={Euler's constant},
user1={0.57721}
}

@entry{root2,
name={\ensuremath{\surd2}},
description={Pythagoras' constant},
user1={1.41421}
}

Then these can be sorted numerically according to the 'user1' field:

\GlsXtrLoadResources[src=constants,% constants.bib
sort={double},% decimal sort
sort-field={user1}
]
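A minimal complete document for this case might be as follows. The selection={all} line is an addition for this sketch, needed because nothing is referenced in the body:

\documentclass{article}

\usepackage[record]{glossaries-extra}

\GlsXtrLoadResources[src=constants,% constants.bib
sort={double},% decimal sort
sort-field={user1},% compare the 'user1' values
selection={all}% select every entry, since none are referenced
]

\begin{document}
\printunsrtglossaries
\end{document}

Since 0.57721 < 1.41421 < 3.14159, the entries should appear in the order eulercons, root2, pi (γ, √2, π), whereas a plain sort by label would give eulercons, pi, root2.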

Requirements: Java 7, glossaries-extra v1.12 and dependent packages
(including glossaries v4.19+).

Installation instructions for the experimental version are at:

https://github.com/nlct/bib2gls#testing-the-experimental-version

A draft copy of the manual (bib2gls.pdf) can be found in:

https://github.com/nlct/bib2gls/tree/master/unstable

or to download it:
https://github.com/nlct/bib2gls/raw/master/unstable/bib2gls.pdf

Regards
Nicola Talbot
--
Home: http://www.dickimaw-books.com/
Creating a LaTeX Minimal Example:
http://www.dickimaw-books.com/latex/minexample/
Jeffrey Goldberg
2017-02-05 17:17:34 UTC
Post by Nicola Talbot
> I'm developing a new command line application called bib2gls
> (https://github.com/nlct/bib2gls) which works with the glossaries-extra
> package (http://ctan.org/pkg/glossaries-extra), and I'm looking for
> volunteers to try out the experimental version before uploading it to CTAN.
I can't promise to play with this, but I really want to.

The document for which I am extensively using glossaries is complicated
enough as it is and not something I can really "break", but I might see
if I can create a new git branch for trying this out.
Post by Nicola Talbot
> For those of you who are familiar with the glossaries package, you may
> be aware that you can create a .tex file containing all your glossary
> definitions which can then be input into the document using either
> \input or \loadglsentries.
I'm already using a separate glossary.tex file, which I'm loading with
\loadglsentries.
Post by Nicola Talbot
> The document build process (assuming the document is called "myDoc.tex")
> latex myDoc
> makeglossaries myDoc
> latex myDoc
> (replace 'latex' with pdflatex, xelatex, lualatex as appropriate).
> latex myDoc
> makeglossaries-lite myDoc
> latex myDoc
> (or makeindex can be called directly with all the required options set).
> \usepackage[xindy]{glossaries}
And latexmk does a pretty good job of handling all of this.
Post by Nicola Talbot
> The new bib2gls application allows an alternative approach, but it
> requires commands provided with the extension package glossaries-extra
> and the entries are stored in a .bib format instead.
> @entry{matrix,
> name={matrix},
> plural={matrices},
> description={rectangular array of values}
> }
> @symbol{pi,
> name={\ensuremath{\pi}},
> description={the ratio of the length of the circumference
> of a circle to its diameter}
> }
> This can now be maintained in a bibliographic management system such as
> JabRef as long as that application can be configured to recognise the
> glossary fields.
Is that the primary advantage of the new way of doing things, or is
there something else you are going for?
Post by Nicola Talbot
> \documentclass{article}
> \usepackage[record]{glossaries-extra}% <--- changed
> \GlsXtrLoadResources[% <--- changed
> src={myentries}% data in 'myentries.bib'
> ]
> \begin{document}
> A \gls{matrix}. Lots of \glspl{matrix}.
> \[C = 2 \gls{pi} r \]
> \printunsrtglossaries%<--- changed
> \end{document}
> latex myDoc
> bib2gls myDoc
> latex myDoc
> bib2gls works in a similar way to bibtex. By default it only selects
> those entries that have been referenced in the document and the entries
> they depend on. (Use the option selection=all to select all entries in
> the bib file
Ah. That is cool. Now I see what you are going for. That is a very nice
idea.
Post by Nicola Talbot
> Requirements: Java 7,
I probably will not be experimenting with this in my current project
(which is the only thing I've ever used glossaries for). It has enough
dependencies as it is and I'm trying (and failing) to keep the build
process simple enough so that people unfamiliar with LaTeX can build the
thing (if I get run over by a bus). I might set up a private branch of
this project to play with all of this, but only if I have time or find
that doing so is a fun way to procrastinate from what I should be doing.
Post by Nicola Talbot
> https://github.com/nlct/bib2gls#testing-the-experimental-version
> https://github.com/nlct/bib2gls/tree/master/unstable
> https://github.com/nlct/bib2gls/raw/master/unstable/bib2gls.pdf
Thanks! And thanks for developing and maintaining glossaries(-extra).

Cheers,

-j
--
Jeffrey Goldberg http://goldmark.org/jeff/
I rarely read HTML or poorly quoting posts
Reply-To address is valid
Nicola Talbot
2017-02-05 22:49:57 UTC
Post by Jeffrey Goldberg
> I can't promise to play with this, but I really want to.
> The document for which I am extensively using glossaries is complicated
> enough as it is and not something I can really "break", but I might see
> if I can create a new git branch for trying this out.
Yes, I wouldn't try it out on an important large project :-) Although it
doesn't require much change to the actual document code, it's best to
wait for it to stabilize.
Post by Jeffrey Goldberg
> Post by Nicola Talbot
>> This can now be maintained in a bibliographic management system such as
>> JabRef as long as that application can be configured to recognise the
>> glossary fields.
> Is that the primary advantage of the new way of doing things, or is
> there something else you are going for?
That was the starting point, following on from the question 'Is there a
program for managing glossary tags?' on TeX on StackExchange
(http://tex.stackexchange.com/questions/342544), but I decided that the
application may as well sort and collate at the same time to trim down
the build process. Since Java 8 can access the Unicode Common Locale
Data Repository (CLDR), it has good locale support for word-order
comparisons. Java 7 can't access the CLDR, but it still has a fair amount
of locale support in the JRE.
Post by Jeffrey Goldberg
> Post by Nicola Talbot
>> bib2gls works in a similar way to bibtex. By default it only selects
>> those entries that have been referenced in the document and the entries
>> they depend on. (Use the option selection=all to select all entries in
>> the bib file
> Ah. That is cool. Now I see what you are going for. That is a very nice
> idea.
There are other selection options as well. You can even reference the
same .bib file in different \GlsXtrLoadResources instances, for example,
if you want a 'sorted by use' list and an alphabetical list of the same data:

\newglossary*{byuse}{Glossary (in Order of Appearance)}
\newglossary*{byname}{Glossary (Alphabetical)}

\GlsXtrLoadResources[
src={myentries},% myentries.bib
sort={use},% order by use
type={byuse}% put in the 'byuse' glossary
]

\GlsXtrLoadResources[
src={myentries},% myentries.bib
sort={en-GB},% sort by en-GB locale
label-prefix={byname.},% add a prefix to the labels to prevent a clash
type={byname}% put in the 'byname' glossary
]
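A complete sketch using these two resource sets with the myentries.bib file from the first post. The selection={all} on the second set is an assumption of this sketch: the text only uses the unprefixed labels, so without it the 'byname.' set would presumably select nothing.

\documentclass{article}

\usepackage[record]{glossaries-extra}

\newglossary*{byuse}{Glossary (in Order of Appearance)}
\newglossary*{byname}{Glossary (Alphabetical)}

\GlsXtrLoadResources[
src={myentries},% myentries.bib
sort={use},% order by use
type={byuse}% put in the 'byuse' glossary
]

\GlsXtrLoadResources[
src={myentries},% myentries.bib
sort={en-GB},% sort by en-GB locale
label-prefix={byname.},% labels become byname.matrix and byname.pi
type={byname},% put in the 'byname' glossary
selection={all}% assumption: only unprefixed labels are referenced below
]

\begin{document}
\[C = 2 \gls{pi} r \]
A \gls{matrix}.

\printunsrtglossary[type=byuse]% pi first, then matrix (order of first use)
\printunsrtglossary[type=byname]% alphabetical
\end{document}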

It has more flexibility than the standard glossary build options. For
example, if an entry has a cross-reference using the 'see' field, you
can choose whether to put the cross-reference at the start or end of the
location list, or omit it (but still consider the cross-referenced term a
dependency).
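As a sketch, suppose a hypothetical 'array' entry with a cross-reference to 'matrix' is added to myentries.bib. The option name and value below (see={before}, with 'after' and 'omit' as the alternatives) are assumed to match the released bib2gls manual and may differ in this experimental version:

@entry{array,
name={array},
description={ordered collection of values},
see={matrix}
}

\GlsXtrLoadResources[
src={myentries},% myentries.bib
see={before}% assumed option: put the cross-reference before the locations
]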

There's no restriction on the way the locations are formatted. (For
example, makeindex will only accept Arabic numerals, Roman numerals or
the letters a-z, A-Z, and xindy needs special rules.) bib2gls will try to
determine a sequence in order to form ranges, but if it can't it will
simply list the locations individually. It uses \p{javaDigit} in the
regular expression, which matches not only 0-9 but also digits in other
scripts. (I'm not a linguist, so it's not something I can thoroughly
test, but the few tests I tried seemed to work as far as I could tell.)

bib2gls also has a primitive TeX parser. For example, it
can work out that $\vec{v}$ indicates the character 'v' (0x76) followed
by the combining right arrow above (0x20D7), so it will sort according
to those characters. It can parse the contents of @preamble as well. For
example:

@preamble{"\providecommand*{\card}[1]{|#1|}
\providecommand*{\set}[1]{\mathcal{#1}}"}

@entry{cardS,
name={\ensuremath{\card{\set{S}}}},
description={cardinality of set $\set{S}$}
}

It can work out from the @preamble code that the sort value should be
|S| (0x7C 0x53 0x7C). It has limited knowledge of a few packages, which
it can detect from the log file, although it won't be able to pick up
any (re-)definitions provided in the actual document. The parser will
strip unknown commands. If the sort value ends up empty, bib2gls will
fall back on the label instead. (You can actually tell bib2gls to sort
by the label instead using sort-field=id in the options. I think I've
forgotten to put that in the manual. I'll have to check.)
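A sketch of that label-based sort, applied to the constants.bib example from the first post (it relies on sort-field accepting 'id' as described above):

\GlsXtrLoadResources[src=constants,% constants.bib
sort-field={id}% sort by label: eulercons, pi, root2
]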

It also uses the parser to check through fields for instances of \gls
etc to determine dependencies (in addition to checking the 'parent',
'see' and 'alias' fields).
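For example, with a hypothetical 'vector' entry like the one below added to myentries.bib, referencing only \gls{vector} in the document should still cause 'matrix' to be selected, because of the \gls command in the description:

@entry{vector,
name={vector},
description={a \gls{matrix} with a single column}
}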
Post by Jeffrey Goldberg
> I probably will not be experimenting with this in my current project
> (which is the only thing I've ever used glossaries for). It has enough
> dependencies as it is and I'm trying (and failing) to keep the build
> process simple enough so that people unfamiliar with LaTeX can build the
> thing (if I get run over by a bus). I might set up a private branch of
> this project to play with all of this, but only if I have time or find
> that doing so is a fun way to procrastinate from what I should be doing.
Writing it was procrastination on my part :-)
Post by Jeffrey Goldberg
> Thanks! And thanks for developing and maintaining glossaries(-extra).
Thank you for the feedback.

Regards
Nicola
Ulrike Fischer
2017-02-06 10:10:23 UTC
Post by Nicola Talbot
> latex myDoc
> bib2gls myDoc
> latex myDoc
I wonder why you don't use biber for the build process?
--
Ulrike Fischer
http://www.troubleshooting-tex.de/
Nicola Talbot
2017-02-06 11:30:04 UTC
Post by Ulrike Fischer
> Post by Nicola Talbot
>> latex myDoc
>> bib2gls myDoc
>> latex myDoc
> I wonder why you don't use biber for the build process?
bib2gls is specifically customised to integrate with glossaries-extra.sty.

With bib2gls you can have something like:

@dualentry{cat,
name={cat},
description={chat}
}

with

\newglossary*{english}{English to French Dictionary}
\newglossary*{french}{French to English Dictionary}

\GlsXtrLoadResources[
src={entries},% data in entries.bib
label-prefix={en.},
dual-label-prefix={fr.},
type={english},
dual-type={french}
]

This is equivalent to:

\newglossaryentry{en.cat}{name={cat},type={english},description={chat}}
\newglossaryentry{fr.cat}{name={chat},type={french},description={cat}}
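A complete document using that resource set might be as follows (a sketch, assuming entries.bib contains the @dualentry shown above):

\documentclass{article}

\usepackage[record]{glossaries-extra}

\newglossary*{english}{English to French Dictionary}
\newglossary*{french}{French to English Dictionary}

\GlsXtrLoadResources[
src={entries},% data in entries.bib
label-prefix={en.},
dual-label-prefix={fr.},
type={english},
dual-type={french}
]

\begin{document}
English: \gls{en.cat}. French: \gls{fr.cat}.% displays "cat" and "chat"

\printunsrtglossaries
\end{document}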

bib2gls can also parse the field contents to determine dependencies. For
example:

@symbol{S,
name={$\mathcal{S}$},
description={a set}
}

@entry{set,
name={set},
description={collection of values, denoted \glshyperlink{S}}
}

bib2gls can pick up \glshyperlink{S} in the description and knows that
if 'set' is referenced in the document then 'S' must also be included
even if it hasn't been referenced.
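A sketch of a document that only references 'set', assuming (hypothetically) that both entries are stored in a file called sets.bib:

\documentclass{article}

\usepackage[record]{glossaries-extra}

\GlsXtrLoadResources[src={sets}]% hypothetical sets.bib

\begin{document}
A \gls{set}.% 'S' is never referenced directly

\printunsrtglossaries% should list both 'set' and its dependency 'S'
\end{document}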

bib2gls can also interpret some known symbols so it can sort $\alpha$,
$\beta$, $\gamma$ etc according to the nearest suitable Unicode equivalent.
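As an illustration (greek.bib is a hypothetical file name, and this assumes the parser recognises \xi in the same way as \alpha, \beta and \gamma): sorting on the command names or labels would put pi before xi, but the Unicode characters ξ (U+03BE) and π (U+03C0) put xi first, which is the order a letter sort on the parsed names should give.

@entry{pi,
name={\ensuremath{\pi}},
description={the ratio of the length of the circumference
of a circle to its diameter}
}

@entry{xi,
name={\ensuremath{\xi}},
description={the Greek letter xi}
}

\GlsXtrLoadResources[
src={greek},% hypothetical greek.bib
sort={letter-nocase}% case-insensitive letter sort on the parsed names
]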

Regards
Nicola Talbot
Ulrike Fischer
2017-02-06 14:17:08 UTC
Post by Ulrike Fischer
> I wonder why you don't use biber for the build process?
Yes. I got this. But you can get it with biber too. It can sort,
knows Unicode and the bib format, can handle a number of
(La)TeX commands, you can customize the output, etc.

What do you gain by an additional application?
Nicola Talbot
2017-02-06 17:37:08 UTC
Post by Ulrike Fischer
> Post by Ulrike Fischer
>> I wonder why you don't use biber for the build process?
> Yes. I got this. But you can get it with biber too. It can sort,
> knows Unicode and the bib format, can handle a number of
> (La)TeX commands, you can customize the output, etc.
That's interesting. Can you provide an example of how to instruct biber
to pick up the following lines from an aux file

\***@record{gls.sample}{}{page}{glsnumberformat}{1}
\***@record{gls.bird}{}{page}{glsnumberformat}{1}
\***@record{gls.sample}{}{page}{glsnumberformat}{2}
\***@record{gls.bird}{}{page}{glsnumberformat}{2}
\***@record{gls.sample}{}{page}{glsnumberformat}{3}
\***@record{gls.sample}{}{page}{glsnumberformat}{3}
\***@record{gls.bird}{}{page}{glsnumberformat}{4}
\***@record{gls.bird}{}{page}{hyperbf}{4}

and determine that:

- the 'gls.sample' location list can be concatenated into the range 1--3
(or more precisely \setentrycounter[]{page}\glsnumberformat{1}\delimR
\setentrycounter[]{page}\glsnumberformat{3})

- the 'gls.bird' location list is 1, 2, \hyperbf{4} (skipping the
generic glsnumberformat location).

and how it can be instructed to recognise specific commands (such as
\gls or \glshyperlink) within the fields to determine dependencies?
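For context, records like those above would come from a document along these lines. This is a sketch: sample.bib and its 'sample' and 'bird' entries are hypothetical, and the 'gls.' prefix is assumed to come from label-prefix.

\documentclass{article}

\usepackage[record]{glossaries-extra}

\GlsXtrLoadResources[
src={sample},% hypothetical sample.bib with entries 'sample' and 'bird'
label-prefix={gls.}% assumed source of the 'gls.' prefix
]

\begin{document}
\gls{gls.sample} and \gls{gls.bird}.% page 1
\newpage
\gls{gls.sample} and \gls{gls.bird}.% page 2
\newpage
\gls{gls.sample} and \gls{gls.sample}.% page 3 (duplicate record)
\newpage
\gls{gls.bird} and \gls[format=hyperbf]{gls.bird}.% page 4

\printunsrtglossaries
\end{document}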

Regards
Nicola Talbot
Ulrike Fischer
2017-02-07 15:45:12 UTC
Post by Nicola Talbot
> That's interesting. Can you provide an example of how to instruct biber
> to pick up the following lines from an aux file
biber doesn't read the aux-file but a driver file (bcf in the case
of biblatex).
Post by Nicola Talbot
> the 'gls.sample' location list can be concatenated into the range 1--3
I'm not quite sure if this should/can be delegated to biber.

Also I don't know enough about the internals to be able to decide if
biber can really do all the things your bib2gls does. I only had the
impression from your first post that bib2gls mostly converts a
bib-file to a format that glossaries likes -- and this sounds like
something biber could do, so I wondered if you had ever considered
using biber.
Nicola Talbot
2017-02-07 18:35:20 UTC
Post by Ulrike Fischer
> Post by Nicola Talbot
>> That's interesting. Can you provide an example of how to instruct biber
>> to pick up the following lines from an aux file
> biber doesn't read the aux-file but a driver file (bcf in the case
> of biblatex).
> Post by Nicola Talbot
>> the 'gls.sample' location list can be concatenated into the range 1--3
> I'm not quite sure if this should/can be delegated to biber.
In that case, I don't think biber is a solution for glossaries.
Post by Ulrike Fischer
> Also I don't know enough about the internals to be able to decide if
> biber can really do all the things your bib2gls does. I only had the
> impression from your first post that bib2gls mostly converts a
> bib-file to a format that glossaries likes
No, it doesn't simply convert the .bib file; it's also an indexing
application that collates the location lists (which is why
makeglossaries/makeindex/xindy isn't in the document build process), and
aliased entries can have their locations dropped or merged with their
target's location list.
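A sketch of an aliased entry. The 'table' entry is hypothetical, and the alias-loc option and its values are assumed from the released bib2gls manual, so the experimental version may differ:

@entry{table,
name={table},
description={informal name for a matrix},
alias={matrix}
}

\GlsXtrLoadResources[
src={myentries},% myentries.bib
alias-loc={transfer}% assumed option: merge the alias's locations into 'matrix' (alternatives: keep, omit)
]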

The indexing process also needs to be able to do hierarchical sorting,
which bib2gls can do (but perhaps that's something that biber can do as
well).
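For example, in a hypothetical animals.bib like the one below, 'bird' is a child of 'animal' through the 'parent' field, so it's sorted among the children of 'animal', and selecting 'bird' also selects its parent as a dependency:

@entry{animal,
name={animal},
description={living organism that feeds on organic matter}
}

@entry{bird,
name={bird},
description={feathered animal},
parent={animal}
}

\GlsXtrLoadResources[src={animals}]% hypothetical animals.bib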
Post by Ulrike Fischer
> -- and this sounds like
> something biber could do, so I wondered if you had ever considered
> using biber.
I had another look at the biber manual, and it seems the control file
(.bcf) format is pretty much a private format used by biber+biblatex. It's
not documented and changes so much that you have to make sure to match the
appropriate version of biblatex with biber according to the compatibility
matrix listed in the manual. It doesn't seem to be intended for use outside
of biblatex, as far as I can tell.

Regards
Nicola Talbot
Una
2017-03-13 00:58:54 UTC
I was on board to test this until I got to the Java 7 requirement. That's
not happening. Sorry.

Una
Nicola Talbot
2017-03-13 10:53:17 UTC
Post by Una
> I was on board to test this until I got to the Java 7 requirement. That's
> not happening. Sorry.
Unfortunately the code uses libraries that were only introduced in Java 7,
so it can't be compiled for earlier versions. Thank you for taking the time
to look at it.

Best regards
Nicola Talbot