PGN Specification Revision

Date: 06 Sep 2006 00:28:43
From: Adam Blinkinsop
Subject: PGN Specification Revision

The 1994 PGN specification is immense, filled with information
unrelated to the implementation of the standard. As a step towards
writing my own program to import and export PGN data, I took some time
to strip it down to the essentials. In case anyone wants to comment on
it (or, better yet, use it), the PDF is available on my site. TeX
sources are also available to those who ask.

http://research.strangeabacus.com/sources/pgnspec.pdf

Date: 29 Sep 2006 07:36:32
From: Adam Blinkinsop
Subject: Re: PGN Specification Revision

David Richerby wrote:
> Gnaaaargh. The concept of `token' is central to parsing. Please,
> please, don't try to rewrite the PGN spec if you feel that
> tokenization is unnecessary.

Did I say that? I meant that when he talks about tokens he doesn't
define them at all. He assumes that you know what he means (a
whitespace delimited token). How are you supposed to parse something
without creating tokens with your lexer to send to your parser? He
just uses "token" as if it only applies to his whitespace delimited
words.
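
To make the point concrete, here is a minimal Python sketch of a lexer
for a small slice of PGN movetext. The group names and the `tokenize`
helper are illustrative assumptions, not taken from the 1994 spec or
from the revised PDF. It only shows that a token need not be a
whitespace-delimited word.

```python
import re

# Hypothetical token classes; the names are illustrative only.
TOKEN_RE = re.compile(r"""
    (?P<comment>\{[^}]*\})            # brace comment, no nesting
  | (?P<string>"(?:\\.|[^"\\])*")     # quoted string
  | (?P<result>1-0|0-1|1/2-1/2|\*)    # game termination marker
  | (?P<nag>\$\d+)                    # numeric annotation glyph
  | (?P<symbol>[A-Za-z0-9][A-Za-z0-9_+\#=:-]*)
  | (?P<punct>[\[\]().<>])            # self-terminating characters
""", re.VERBOSE)


def tokenize(text):
    """Yield (kind, lexeme) pairs, skipping whitespace between tokens."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        yield m.lastgroup, m.group()
        pos = m.end()


print("1.e4 e5".split())          # two whitespace-delimited words
print(list(tokenize("1.e4 e5")))  # four tokens
```

Splitting on whitespace gives two words for "1.e4 e5", while the lexer
yields four tokens: the move number, the period, and the two moves.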

Date: 29 Sep 2006 16:16:47
From: David Richerby
Subject: Re: PGN Specification Revision

In article <[email protected] >,
Adam Blinkinsop <[email protected] > wrote:
>David Richerby wrote:
>> Gnaaaargh. The concept of `token' is central to parsing. Please,
>> please, don't try to rewrite the PGN spec if you feel that
>> tokenization is unnecessary.
>
> Did I say that?

I thought what you said was equivalent to that but I'm glad to read
your clarification that I'd misunderstood you!

Dave.

--
David Richerby Gigantic Swiss Atlas (TM): it's like
www.chiark.greenend.org.uk/~davidr/ a map of the world but it's made in
Switzerland and huge!

Date: 07 Sep 2006 07:24:17
From: Adam Blinkinsop
Subject: Re: PGN Specification Revision

Anders Thulin wrote:
> There have been some attempts for a PGN2, usually mentioned in this newsgroup,
> and quite often on the web as well (Google for "PGN XML"). The problem
> I see with them is that they all concentrate on the XML format, and not
> at all on requirements and design. PGN at least formulates six goals
> that it can be evaluated against: and they're good goals, I think. None
> of the new proposals seems to have considered those.

I noticed that. The thing about XML as a format is that it's a pain to
write (as a human) and a pain to read (as a human), two things that PGN
really does well on. An XML-based format practically *requires* a
program to edit it. These days, I'm not sure if that's too much of a
problem, but it's something that needs to be taken into consideration.

The other problem with proposed XML formats is that people seem to want
to get rid of algebraic notation within them -- invent their own
notation just for the format. Chess has some good standard written
notations (SAN and FEN) that it's a shame to waste. The best format
would work with them.

PGN *is* a great format, definitely. But there are problems with it
that people have been pointing out over the last 12 years. Perhaps
they're enough to extend the format for?

Date: 06 Sep 2006 12:25:16
From: Adam Blinkinsop
Subject: Re: PGN Specification Revision

Anders Thulin wrote:
> Adam Blinkinsop wrote:
> > However, the problem you raise is a moot
> > one: export data is required to place the move immediately after the
> > period (no spaces). Unnecessary, if you ask me.
>
> From an engineering point of view, it is. But from a standards point
> of view, well ... . If you're trying to restate PGN more clearly
> and less ambiguously, you shouldn't change it. (This is where
> I thought you were). But if you do change it, you should probably not
> use 'PGN' as name of the format, unless you also make it clear that you
> have made changes, and made it very clear what those changes are,
> and possibly even explain why those changes do not lead to
> incompatibilities.
>
> I don't see any problems with a 'moot points in PGN and my
> interpretation of them': that's what every PGN utility should have.
> That could very well include deliberate departures ... but then such
> a document won't seem to be a format specification.

Hmm. Perhaps I didn't state my purpose correctly. The reason for this
revision is to help a programmer write an import/export program that
would be compatible with PGN files currently in use. Any changes to
the standard made it looser for importing (with no loss of
functionality) and tighter for exporting (with no loss of
functionality). A subclass of the standard, if you will.

I guess in that sense, I'm not "restating" PGN, I'm "refactoring" it.
Does that make more sense?

> No matter -- I'll save my hostile specification reading mode for later.

I took no offense -- just a misunderstanding. Would a complete (and
precise) restatement of the standard be useful to someone?

Date: 07 Sep 2006 06:14:45
From: Anders Thulin
Subject: Re: PGN Specification Revision

Adam Blinkinsop wrote:

> Hmm. Perhaps I didn't state my purpose correctly. The reason for this
> revision is to help a programmer write an import/export program that
> would be compatible with PGN files currently in use. Any changes to
> the standard made it looser for importing (with no loss of
> functionality) and tighter for exporting (with no loss of
> functionality). A subclass of the standard, if you will.

I see ... I think you could elaborate a bit on that in the Background
chapter. The abstract does say something along these lines, but as
I couldn't decide where that information came from, I ignored it.

> I guess in that sense, I'm not "restating" PGN, I'm "refactoring" it.
> Does that make more sense?

I understand better -- and I think that makes it even more important
to state clearly that you are not describing PGN, but a modified form of it.
(The last sentence in chapter two is ambiguous: does 'this document'
refer to the original or your document? I read it as referring to
your document, and interpreted that as a different description only.
I understand that that was not what you intended.)

Anything that cleans up PGN is welcome -- though I suspect that
most PGN utilities that can be written already have been written:
a PGN restatement probably won't change those.

> I took no offense -- just a misunderstanding. Would a complete (and
> precise) restatement of the standard be useful to someone?

Not impossible. But PGN is old. It works, but it doesn't work too well
(though partly because the original document is imprecise in places).

There have been some attempts for a PGN2, usually mentioned in this newsgroup,
and quite often on the web as well (Google for "PGN XML"). The problem
I see with them is that they all concentrate on the XML format, and not
at all on requirements and design. PGN at least formulates six goals
that it can be evaluated against: and they're good goals, I think. None
of the new proposals seems to have considered those.

--
Anders Thulin ath*algonet.se http://www.algonet.se/~ath

Date: 06 Sep 2006 10:29:30
From: Adam Blinkinsop
Subject: Re: PGN Specification Revision

Ari Makela wrote:
> IMO one of the biggest problems with the PGN is the limited character
> set. It works quite well if one is from Northern America, Northern Europe
> or the Western Europe like you and me but it does bother me that
> PGN cannot be written in any language.

Absolutely. Ideally, a revised spec will pave the way for a _new_
spec, one that takes over 12 years of experience with the old one into
account. I have a running list of problems people have noted with the
current PGN format, so I'll add yours to it.

> Then again, it's very nice to see some development!

Thanks!

Date: 06 Sep 2006 09:36:05
From: Adam Blinkinsop
Subject: Re: PGN Specification Revision

Anders Thulin wrote:
> (But I see now that you do mention token without defining it ... hm.)

Where do I mention it? Most likely text copied straight from the old
spec... I tried to stay away from the entire "token" idea, because
it's generally unnecessary. The original spec sounds like he just
found lex and yacc and wanted to do something cool with them :)

> I've still not figured out if the game termination markers are
> tokens. They have to be, as only tokens, white space separators,
> and comments are allowed in movetext. But '1/2-1/2' and '*' contain
> characters that are not legal in tokens, so they can't be ...

That's one of the internal inconsistencies in the standard, and one
major reason why I hesitate to define a token.
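
The inconsistency can be checked mechanically. The sketch below assumes
the symbol-token rule as I read it in the 1994 text (a letter or digit
followed by letters, digits, "_", "+", "#", "=", ":" or "-"). The
`is_symbol` helper is a made-up name for illustration.

```python
import re

# Symbol token under the stated assumption about the 1994 wording.
SYMBOL_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_+#=:-]*")

TERMINATION_MARKERS = ("1-0", "0-1", "1/2-1/2", "*")


def is_symbol(text):
    """Return True if the whole string is a single symbol token."""
    return SYMBOL_RE.fullmatch(text) is not None


for marker in TERMINATION_MARKERS:
    print(f"{marker!r:10} symbol token: {is_symbol(marker)}")
```

Under that reading, "1-0" and "0-1" pass as symbols, while "1/2-1/2"
and "*" do not, so the four markers cannot all belong to one existing
token class.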

> Here's another: what's a 'printing character' (needed to decide
> if a line exceeds recommended length)? This may be solved
> if you switch to 8-bit ASCII for character set, but with Latin-1
> it is a bit of a poser: is SHY (10/13) a printing character or not?
> Is it always one or the other, or does it depend on the context?

I used the RFC's definitions of printing characters, an attempt to
avoid conflict by bowing to another standard. Do you think the set
should be defined differently?

> At one time I used these conundrums instead of counting sheep ...

Doesn't help for me -- keeps me awake :-P (at midnight, when I wrote
the first version of this)

> [Maximize the number of movetext tokens is] an interesting requirement.
> It says that '1. e4' is illegal,
> if '1.e4' allows one more token on the line. It would probably
> be a disaster if any program seriously checked for that kind of
> problems.

Absolutely. The thing is, any parser that works to spec will be
working with the "import" format, which doesn't care how many
tokens-per-line there are. However, the problem you raise is a moot
one: export data is required to place the move immediately after the
period (no spaces). Unnecessary, if you ask me.

> I also think your annotation production says too much: it includes
> ! and ?, but I don't think they're allowed -- those things are
> done as NAGs. (Or is this one of those incompatible changes?)

The import spec allows it, the export spec forbids it (see section
8.2.3.8 of the original). As I combined them into one, I attempted to
treat the import spec as the MUSTs and the export spec as the SHOULDs,
which is less complex to understand. All my changes (so far) are
completely compatible with any current program that runs to spec.
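
A minimal sketch of that import/export split for move suffixes,
assuming the usual correspondence between the six suffix annotations
and NAGs $1 to $6. The `export_move` helper is hypothetical and does
not validate the move itself.

```python
# Suffix annotation -> Numeric Annotation Glyph (assumed mapping).
SUFFIX_TO_NAG = {
    "!": "$1", "?": "$2", "!!": "$3",
    "??": "$4", "!?": "$5", "?!": "$6",
}


def export_move(move):
    """Rewrite an import-style move such as 'Nf3!?' as 'Nf3 $5'."""
    core = move.rstrip("!?")          # move text without the suffix
    suffix = move[len(core):]         # trailing run of '!' and '?'
    if suffix in SUFFIX_TO_NAG:
        return f"{core} {SUFFIX_TO_NAG[suffix]}"
    return move                       # no suffix, or not one of the six


print(export_move("Nf3!?"))
print(export_move("e4"))
```

An importer written this way accepts both spellings, while an exporter
only ever emits the NAG form.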

Date: 20 Sep 2006 12:04:34
From: David Richerby
Subject: Re: PGN Specification Revision

Adam Blinkinsop <[email protected] > wrote:
> Anders Thulin wrote:
>> (But I see now that you do mention token without defining it ... hm.)
>
> Where do I mention it? Most likely text copied straight from the old
> spec... I tried to stay away from the entire "token" idea, because
> it's generally unnecessary. The original spec sounds like he just
> found lex and yacc and wanted to do something cool with them :)

Gnaaaargh. The concept of `token' is central to parsing. Please,
please, don't try to rewrite the PGN spec if you feel that
tokenization is unnecessary.

Dave.

--
David Richerby Portable Painting (TM): it's like a
www.chiark.greenend.org.uk/~davidr/ Renaissance masterpiece but you can
take it anywhere!

Date: 06 Sep 2006 18:43:14
From: Anders Thulin
Subject: Re: PGN Specification Revision

Adam Blinkinsop wrote:

> I used the RFC's definitions of printing characters, an attempt to
> avoid conflict by bowing to another standard. Do you think the set
> should be defined differently?

I think the term should follow the character set standard you refer to.
If Latin-1, then the term should probably be 'graphic character', and
so on. This is a nasty area, since you essentially have to have a copy
of the official standards text. (I tend to go for the ECMA standards
when I can get away with it: some ISO standards are relabelled ECMA
standards, and many ECMA standards can be downloaded for free. Google
for "ECMA 94", for instance. Terminology tends to remain unaltered over
this relabelling, but section and page numbering does change.)

> Absolutely. The thing is, any parser that works to spec will be
> working with the "import" format, which doesn't care how many
> tokens-per-line there are.

I interpret the document differently: import format is intended to
cover data that may have been created by hand, and is for that reason
less formal. A PGN reader should be able to handle import format, but
there is nothing that says that it must parse all input files as if they
were import format. Export format files should be checked to export format
standards on input: if I import a PGN file expected to be of archive standard
into a database without verifying that it indeed follows archive format,
I may import a file of lower quality than intended.

If I wrote a PGN reader, I'd probably make this user-definable.

But most PGN utilities do what you say: accept import format as input,
and do little more than a token acknowledgement at archive format on
output. I have PGN files with nested {}-comments -- should be
impossible to produce except by hand, and should definitely be possible
to detect as an error on input.
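
Detecting such files is cheap. The following is a rough sketch, not a
full PGN reader: it ignores quoted strings and ";" comments, and the
`find_nested_braces` name is made up for illustration.

```python
def find_nested_braces(text):
    """Return offsets of '{' characters that occur inside a brace comment."""
    offsets = []
    in_comment = False
    for i, ch in enumerate(text):
        if ch == "{":
            if in_comment:
                offsets.append(i)     # a second '{' before any '}'
            in_comment = True
        elif ch == "}":
            in_comment = False        # first '}' closes the comment
    return offsets


print(find_nested_braces("1. e4 {good {very} move} e5"))
print(find_nested_braces("1. e4 {good} e5 {fine}"))
```

A strict reader could reject any file for which the list is non-empty,
while a lenient one could merely warn.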

(This is one of those areas where I wish there had been some
explicit statement about format: '%PGN 1.0 ARCHIVE' as first line
could flag an archive file. Without it, it's treated as an import
file.)
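
As a sketch of how a reader might act on such a flag: the marker string
is the hypothetical one suggested above and is not part of PGN, and
`declared_format` is an invented helper name.

```python
ARCHIVE_FLAG = "%PGN 1.0 ARCHIVE"  # hypothetical marker, not in the spec


def declared_format(text):
    """Return 'archive' if the first line is the flag, else 'import'."""
    lines = text.splitlines()
    if lines and lines[0].rstrip() == ARCHIVE_FLAG:
        return "archive"
    return "import"


print(declared_format('%PGN 1.0 ARCHIVE\n[Event "Example"]\n'))
print(declared_format('[Event "Example"]\n'))
```

A reader could then apply the strict checks only to files that claim to
be archive quality.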

> However, the problem you raise is a moot
> one: export data is required to place the move immediately after the
> period (no spaces). Unnecessary, if you ask me.

From an engineering point of view, it is. But from a standards point
of view, well ... . If you're trying to restate PGN more clearly
and less ambiguously, you shouldn't change it. (This is where
I thought you were). But if you do change it, you should probably not
use 'PGN' as name of the format, unless you also make it clear that you
have made changes, and made it very clear what those changes are,
and possibly even explain why those changes do not lead to
incompatibilities.

I don't see any problems with a 'moot points in PGN and my
interpretation of them': that's what every PGN utility should have.
That could very well include deliberate departures ... but then such
a document won't seem to be a format specification.

No matter -- I'll save my hostile specification reading mode for later.

--
Anders Thulin ath*algonet.se http://www.algonet.se/~ath

Date: 20 Sep 2006 12:09:11
From: David Richerby
Subject: Re: PGN Specification Revision

Anders Thulin <[email protected] > wrote:
> I interpret the document differently: import format is intended to
> cover data that may have been created by hand, and is for that
> reason less formal. A PGN reader should be able to handle import
> format, but there is nothing that says that it must parse all input
> files as if they were import format. Export format files should be
> checked to export format standards on input

This import/export/archive format thing is another reason the
designers of PGN should be tenth or eleventh against the wall when the
revolution comes. Why, why, why?

Dave.

--
David Richerby Radioactive Strange Priest (TM):
www.chiark.greenend.org.uk/~davidr/ it's like a man of the cloth but it's
totally weird and it'll make you glow
in the dark!

Date: 06 Sep 2006 08:56:33
From: Adam Blinkinsop
Subject: Re: PGN Specification Revision

Simon Krahnke wrote:
> There is no such thing as 8-bit ASCII.

http://en.wikipedia.org/wiki/Extended_ASCII

I'll be more descriptive -- it is technically the Latin 1 form of
extended (I always call it eight-bit, but I guess that's not standard)
ASCII. That change has been made.

Date: 06 Sep 2006 08:09:11
From: Adam Blinkinsop
Subject: Re: PGN Specification Revision

Alright, changes made. Updated version available at the same URL:

http://research.strangeabacus.com/sources/pgnspec.pdf

Date: 06 Sep 2006 07:18:55
From: Adam Blinkinsop
Subject: Re: PGN Specification Revision

Thanks for bringing these things to my attention! I'll make sure to
put them into the document. For expediency's sake, my reasoning
follows.

Anders Thulin wrote:
> For instance: the byte equivalence requirement (3.2.1) seems to be gone,

That's right. I took it out because there is no way for a parser to
know whether a given file follows that part of the spec, and it seemed
redundant anyway (unless an exporter includes some entropy internally,
it will _always_ be byte equivalent to itself, which is the letter of
the requirement). I'll document it anyway, though.

> comment lines (the ; comment) seems to be gone,

This is actually still in the document, in the formal syntax spec: see
page 4, the second section, under "rest-of-line-comment." Should I
explain the comments somewhere else to make sure they aren't missed?

> the requirement that
> tags outside the STR appear in ASCII order,

Absolutely -- I thought I had written it in there, but I can't find it
now. I'll make sure to put it in today.
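
For reference, the ordering rule is short to state in code. This sketch
assumes the Seven Tag Roster is Event, Site, Date, Round, White, Black,
Result, in that order; `export_order` is an illustrative name.

```python
# Seven Tag Roster, in the order assumed for export.
STR = ("Event", "Site", "Date", "Round", "White", "Black", "Result")


def export_order(tags):
    """Return (name, value) pairs: roster tags first, the rest in ASCII order."""
    roster = [(name, tags[name]) for name in STR if name in tags]
    # Tag names are ASCII, so Python's code point ordering is ASCII order.
    extras = sorted((n, v) for n, v in tags.items() if n not in STR)
    return roster + extras


tags = {"WhiteElo": "2700", "Event": "Example", "ECO": "C60", "White": "A"}
for name, value in export_order(tags):
    print(f'[{name} "{value}"]')
```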

> and that as many movetext
> tokens as possible must appear on the same line.

Hmm. That's one of the archival requirements as well. Now that you
mention it, I notice that I also didn't emphasize the requirement that
there be no empty lines until the movetext is over. One more thing to change.

> Oh, and ISO 8859-1 does not define any control characters, no matter
> how much PGN insists it does. Any references to carriage return and
> line feed are meaningless in the context of Latin-1 only.

I figured as much. I'll change it to 8-bit ASCII (which is what they
were talking about anyway) to make it consistent with itself.

Thanks for the proofread! Expect v2 to be up later today (around 9
PST).

Date: 06 Sep 2006 16:13:30
From: Anders Thulin
Subject: Re: PGN Specification Revision

Adam Blinkinsop wrote:

>> comment lines (the ; comment) seems to be gone,
>
> This is actually still in the document, in the formal syntax spec:

Oops, right ... I didn't double-check carefully enough.
I remember ... I was looking for that other weirdness that involves
integer and symbol tokens, but I suspect you may have sidestepped it
as I didn't find any 'token' production. As far as I can make out,
it's impossible to decide if you have an integer token or a symbol token
consisting only of digits.

(But I see now that you do mention token without defining it ... hm.)

I've still not figured out if the game termination markers are
tokens. They have to be, as only tokens, white space separators,
and comments are allowed in movetext. But '1/2-1/2' and '*' contain
characters that are not legal in tokens, so they can't be ...

Here's another: what's a 'printing character' (needed to decide
if a line exceeds recommended length)? This may be solved
if you switch to 8-bit ASCII for character set, but with Latin-1
it is a bit of a poser: is SHY (10/13) a printing character or not?
Is it always one or the other, or does it depend on the context?
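
One way to see why SHY is awkward is to ask an existing classification.
The sketch below uses Python's Unicode database, which is only one
possible authority and is not what PGN or Latin-1 prescribes; the
`describe` helper is a made-up name.

```python
import unicodedata


def describe(byte_value):
    """Decode one Latin-1 byte and report how Unicode classifies it."""
    ch = bytes([byte_value]).decode("latin-1")
    return (unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
            ch.isprintable())


print(describe(0xAD))  # SHY, position 10/13
print(describe(0x41))  # capital A, for comparison
```

Here SHY comes back as ('SOFT HYPHEN', 'Cf', False): a format
character that Python does not count as printable, which is one
defensible answer but not the only one.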

At one time I used these conundrums instead of counting sheep ...

>> and that as many movetext
>> tokens as possible must appear on the same line.
>
> Hmm. That's one of the archival requirements as well.

It's an interesting requirement. It says that '1. e4' is illegal,
if '1.e4' allows one more token on the line. It would probably
be a disaster if any program seriously checked for that kind of
problems.
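
A small sketch of the effect, assuming greedy filling with single
spaces between chunks. The `fill_lines` helper is illustrative, and the
width of 7 is chosen only to keep the example short.

```python
def fill_lines(chunks, width=79):
    """Greedily pack chunks, separated by single spaces, into lines."""
    lines = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + " " + chunk
        if len(candidate) <= width or not current:
            current = candidate       # still fits (or line was empty)
        else:
            lines.append(current)     # start a new line with this chunk
            current = chunk
    if current:
        lines.append(current)
    return lines


print(fill_lines(["1.", "e4", "e5"], width=7))   # spaced move number
print(fill_lines(["1.e4", "e5"], width=7))       # tight move number
```

With the space after the period, "e5" is pushed to a second line;
without it, everything fits on one. A checker that enforced the
"as many as possible" rule would have to flag the first layout.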

I also think your annotation production says too much: it includes
! and ?, but I don't think they're allowed -- those things are
done as NAGs. (Or is this one of those incompatible changes?)

--
Anders Thulin ath*algonet.se http://www.algonet.se/~ath

Date: 06 Sep 2006 17:10:37
From: Ari Makela
Subject: Re: PGN Specification Revision

On 2006-09-06, Anders Thulin <[email protected] > wrote:

> Here's another: what's a 'printing character' (needed to decide
> if a line exceeds recommended length)? This may be solved
> if you switch to 8-bit ASCII for character set, but with Latin-1
> it is a bit of a poser: is SHY (10/13) a printing character or not?
> Is it always one or the other, or does it depend on the context?

IMO one of the biggest problems with the PGN is the limited character
set. It works quite well if one is from Northern America, Northern Europe
or the Western Europe like you and me but it does bother me that
PGN cannot be written in any language.

Then again, it's very nice to see some development!

--
Ari Makela late autumn -
[email protected] a single chair waiting
http://arska.org/hauva/ for someone yet to come
-- Arima Akito

Date: 06 Sep 2006 11:31:16
From: Anders Thulin
Subject: Re: PGN Specification Revision

Adam Blinkinsop wrote:

> to strip it down to the essentials. In case anyone wants to comment on
> it (or, better yet, use it), the PDF is available on my site.

What changes did you introduce? (Best: document them in the document.)

For instance: the byte equivalence requirement (3.2.1) seems to be gone,
comment lines (the ; comment) seems to be gone, the requirement that
tags outside the STR appear in ASCII order, and that as many movetext
tokens as possible must appear on the same line.

They're rarely implemented, but they are requirements, and should not
be dropped without at least mention that they have been dropped and the
reason why, I think.

Oh, and ISO 8859-1 does not define any control characters, no matter
how much PGN insists it does. Any references to carriage return and
line feed are meaningless in the context of Latin-1 only.

--
Anders Thulin ath*algonet.se http://www.algonet.se/~ath

Date: 20 Sep 2006 12:01:54
From: David Richerby
Subject: Re: PGN Specification Revision

Anders Thulin <[email protected] > wrote:
> [...] the requirement that tags outside the STR appear in ASCII
> order, and that as many movetext tokens as possible must appear on
> the same line.

My only question is why such complete garbage ever entered the spec in
the first place. The PGN spec is, alas, full of this kind of crap.

(I agree that a write-up of what's in the spec must include all of
the spec, even the barking mad bits, of course.)

Dave.

--
David Richerby Carnivorous Dictator (TM): it's like a
www.chiark.greenend.org.uk/~davidr/ totalitarian leader but it eats flesh!

Date: 06 Sep 2006 17:05:21
From: Simon Krahnke
Subject: Re: PGN Specification Revision

* Adam Blinkinsop <[email protected] > (16:18) wrote:

> I figured as much. I'll change it to 8-bit ASCII (which is what they
> were talking about anyway) to make it consistent with itself.

There is no such thing as 8-bit ASCII.

Regards, simon