Using Regulus

Manny Rayner, Beth Ann Hockey, Pierrette Bouillon

mrayner@riacs.edu, bahockey@mail.arc.nasa.gov, pierrette.bouillon@issco.unige.ch

Regulus is a compiler, written in SICStus Prolog, that turns unification grammars written in a Prolog-based feature-value notation into Nuance GSL grammars. It is open source software, so you can use, modify and extend it in any way you like subject to the restrictions of the LGPL license.

Regulus 1, the first version, was developed by Netdecisions Ltd and Fluency Voice Technology Ltd at the Netdecisions Technology Centre, Cambridge, England. Regulus 2, the version you have here, is currently being developed by an open source consortium whose main partners are NASA Ames and Geneva University.

Contents of this document:

� Frequently asked questions

� How to install and run Regulus

o Installing Regulus

o Setting up the Regulus development environment

o Using the Regulus development environment

� Parsing utterances in the Regulus development environment

� Parsing non-top constituents in the Regulus development environment

� Using the grammar stepper

� Running the development environment with speech input

o Using Regulus in batch mode

o The config file

o Compiling Regulus to Nuance from the command line

o Calling the Regulus parser from Prolog

� Some simple Regulus grammars

o Toy0 - a minimal Regulus grammar

o Toy1 - a slightly larger Regulus grammar

� Building recognisers using grammar specialisation

o The general English grammar

o Building on top of the general English grammar

� Lexicon macros

o Invoking the grammar specialiser

o Writing operationality definitions

o Defining multiple top-level specialised grammars

� Ignoring subdomains to speed up compilation

o Handling ambiguity when constructing the treebank

� Default parse preferences

o Making Regulus to Nuance compilation more efficient by ignoring features

o Including lexicon entries directly into the specialised grammar

� Overriding include_lex declarations

� Conditional include_lex declarations

o Creating class N-gram grammars from specialised grammars

� Surface processing

o Files used for surface processing

o Defining surface patterns

� Using Regulus for generation

o Specialised generation grammars

o Generation preferences

� Using Regulus for translation

o Files used for translation

o The translation top-level

o Running translation in batch mode

o Calling translation from Prolog

o Translation rule files

� Resolving conflicts between transfer rules

� Conditional transfer rules

� Transfer variables

� Bidirectional transfer rules and transfer lexicon entries

� Translation tracing

o Interlingua and interlingua declarations

o Interlingua structure grammars

o Using macros in translation rule files

o Ellipsis processing

o Using generation in translation

o Collocation rules

o Orthography rules

� Using Regulus for dialogue applications

o Examples of dialogue processing applications

o Running dialogue applications from the Regulus top-level

o Dialogue processing commands

o Regression testing for dialogue applications

o Using LF patterns for dialogue applications

� Adding intelligent help to Regulus applications

� Formal description of Regulus grammar notation

o Comments

o Labels

o Macros

o Include statements

o Declarations

� ignore_item

� macro

� feature_value_space

� feature

� category

� top_level_category

� feature_instantiation_schedule

� specialises

� ignore_feature

� ignore_specialises

� feature_value_space_substitution

� external_grammar

o Rules

� RHS

� Lexical item

� Sequence

� Alternate

� Optional

� Category

� Feature value list

� Feature value pair

� Semantic value

� Atomic value

� Semantic variable

� Semantic feature value list

� List

� Unary GSL function expression

� Binary GSL function expression

� Syntactic feature value

� Atomic syntactic feature value

� Variable syntactic feature value

� Disjunctive syntactic feature value

� Conjunctive syntactic feature value

� Negated syntactic feature value

� Error and warning messages

� Using the RegServer

o Installing the RegServer

o Running the RegServer

o Interfacing the RegServer to a Prolog program

o Interfacing the RegServer to a Java program

o Sample RegServer applications

Frequently asked questions

What is Regulus good for?

The main point of Regulus is to make it possible to write large, complex GSL grammars, which would be difficult to code by hand. You are most likely to find it useful if you want to build a user-initiative or mixed-initiative speech application on top of the Nuance platform.

So what does it do exactly?

You write your language model in unification grammar format. Regulus will then compile it down to a normal GSL grammar.

What's the difference between unification grammar and GSL?

You can think of a unification grammar as parameterised GSL. Try looking at some of the example grammars to get the flavor of it...

Why is unification grammar better than ordinary GSL?

Parameterised GSL is better than straight GSL for the usual reasons; it's more compact and general, and it's easier to write maintainable and modular grammars.

Can Regulus produce any other kind of grammar formats?

The current version of Regulus can only produce Nuance GSL. Future versions may be able to produce other formats, for example ScanSoft or GrXML.

Do I need to know Prolog?

You need to know Prolog syntax and how to run Prolog programs. No knowledge of Prolog application programming is required.

Do I need to know linguistics?

You will probably find it easier to use Regulus if you have some basic knowledge of feature grammars and how they are used by linguists, but this is not strictly necessary if you are intending to build your grammars from scratch. Try looking at some of the example grammars and see if they make sense to you.
If you want to use the grammar specialisation tools, some understanding of linguistics and feature grammars is recommended.

How do I use Regulus as part of developing a speech application?

The most straightforward way to use Regulus is to build a static GSL grammar. This grammar can then be used as part of any normal Nuance application.
If you are building a Prolog-based application, you may find it convenient to use the RegServer, a simple utility which lets a Prolog program use a Regulus-based Nuance recognition package as though it were a Prolog predicate.

What can you do in the development environment?

The real point of Regulus is to be able to compile unification grammars into Nuance GSL grammars. Most people however find it easier to debug the unification grammars in text mode as unification grammars. There is a compiler which allows you to do this, by converting the Regulus grammar into a set of left-corner parser tables. These tables can be loaded and run inside the Regulus development environment .

What's this stuff about grammar specialisation?
You may simply want to use Regulus to compile your own unification grammars into Nuance GSL. Experience shows, however, that complex natural language grammars tend to have a lot of common structure, since they ultimately have to model general linguistic facts about English and other natural languages. There are consequently good reasons for wanting to save effort by implementing a SINGLE domain-independent core grammar, and producing domain-dependent versions out of it using some kind of specialisation process.
Regulus includes an experimental system which attempts to deliver this functionality. There is a general unification grammar for English , containing about 180 rules, and an accompanying core lexicon. For a given domain, you will need to supplement these with a domain-specific lexicon that you will write yourself. You will then be able to use the grammar specialisation tools to transform a small training corpus into a specialised version of the grammar.

How to install and run Regulus

Installing Regulus

1. Unpack the file Regulus.zip to an appropriate place. Set the environment variable $REGULUS to this place.

2. Make sure SICStus Prolog is installed on your system. Make sure that sicstus.exe is visible along your path. (I have C:\Program Files\SICStus Prolog\bin included in my path).

3. If you want to be able to give speech input to Regulus, do the following:

1. Make sure that /usr/bin (UNIX) or c:/cygwin/bin (Windows/Cygwin) are in your path.

2. Create a file called $REGULUS/scripts/run_license.bat, whose contents are a single line invoking the Nuance License Manager. This will require obtaining a license manager code from Nuance. A typical line would be something like the following (the license code is not genuine):

nlm C:/Nuance/Vocalizer4.0/license.txt ntk12-1234-a-1234-a1bc12de1234

Setting up the Regulus development environment

If you want to build and compile a Regulus grammar, the first step is to write a config file . The config file specifies the various files and parameters associated with your grammar. You can then start up the development environment as follows:

1. Start SICStus Prolog.

2. Load the Regulus system code by typing

:- ['$REGULUS/Prolog/load'].

at Prolog top-level.

3. Start the Regulus top-loop with the specified config file <Config> by typing

:- regulus('<Config>').

at Prolog top-level.

Note It is often convenient to specify the pathnames in the config file using Prolog file_search_path declarations, and this has in particular been done for the examples provided with this release. If you are using file_search_path declarations, you must load these before loading the config file. For example, the PSA application uses a set of file_search_path declarations kept in the file $REGULUS/Examples/PSA/scripts/library_declarations.pl. To start the Regulus development environment with the PSA example, you thus need to carry out the following specific series of commands:

1. Start SICStus Prolog.

2. :- ['$REGULUS/Prolog/load'].

3. :- ['$REGULUS/Examples/PSA/scripts/library_declarations'].

4. :- regulus('$REGULUS/Examples/PSA/scripts/psa.cfg').

Similar declaration files exist for the other example applications.

Using the Regulus development environment

Once you are in the development environment, you can get a listing of all the top-level Regulus commands by typing HELP:

>> HELP
(Print this message)
BATCH_DIALOGUE (Process dialogue corpus)
BATCH_DIALOGUE <Arg> (Process dialogue corpus with specified ID)
BATCH_DIALOGUE_SPEECH (Process dialogue speech corpus)
BATCH_DIALOGUE_SPEECH <Arg> (Process dialogue speech corpus with specified ID)
BATCH_DIALOGUE_SPEECH_AGAIN (Process dialogue speech corpus, using recognition results from previous run)
BATCH_DIALOGUE_SPEECH_AGAIN <Arg> (Process dialogue speech corpus with specified ID, using recognition results from previous run)
CHECK_ALTERF_PATTERNS (Check the consistency of the current Alterf patterns file)
COMPILE_ELLIPSIS_PATTERNS (Compile patterns used for ellipsis processing)
DIALOGUE (Do dialogue-style processing on input sentences)
DCG (Use DCG parser)
EBL (Do all EBL processing: equivalent to LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS, EBL_NUANCE)
EBL_ANALYSIS (Do all EBL processing, except for creation of Nuance grammar: equivalent to LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS)
EBL_GEMINI (Compile current specialised Regulus grammar into Gemini form)
EBL_GENERATION (Do main generation EBL processing: equivalent to LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS, EBL_LOAD_GENERATION)
EBL_GRAMMAR_PROBS (Create Nuance grammar probs training set from current EBL training set)
EBL_LOAD (Load current specialised Regulus grammar in DCG and left-corner form)
EBL_LOAD_GENERATION (Compile and load current specialised Regulus grammar for generation)
EBL_LOAD_GENERATION <Arg> (Compile and load designated version of current specialised Regulus grammar for generation)
EBL_NUANCE (Compile current specialised Regulus grammar into Nuance GSL form)
EBL_POSTPROCESS (Postprocess results of EBL training into specialised Regulus grammar)
EBL_TREEBANK (Parse all sentences in current EBL training set into treebank form)
EBL_TRAIN (Do EBL training on current treebank)
ECHO_ON (Echo input sentences (normally useful only in batch mode))
ECHO_OFF (Don't echo input sentences (default))
GEMINI (Compile current Regulus grammar into Gemini form)
GENERATION (Generate from parsed input sentences)
HELP (Print this message)
INIT_DIALOGUE (Initialise the dialogue state)
INTERLINGUA (Perform translation through interlingua)
LC (Use left-corner parser)
LINE_INFO_ON (Print line and file info for rules and lex entries in parse trees (default))
LINE_INFO_OFF (Don't print line and file info for rules and lex entries in parse trees)
LOAD (Load current Regulus grammar in DCG and left-corner form)
LOAD_TRANSLATE (Load translation-related files)
LOAD_GENERATION (Compile and load current generator grammar)
LOAD_GENERATION <Arg> (Compile and load current generator grammar, and store as designated subdomain grammar)
LOAD_SURFACE_PATTERNS (Load current surface patterns and associated files)
LOAD_DIALOGUE (Load dialogue-related files)
NO_INTERLINGUA (Perform translation directly, i.e. not through interlingua)
NORMAL_PROCESSING (Do normal processing on input sentences)
NOTRACE (Switch off tracing for DCG grammar)
NUANCE (Compile current Regulus grammar into Nuance GSL form)
NUANCE_COMPILE (Compile Nuance grammar into recogniser package)
SPLIT_SPEECH_CORPUS <GrammarName> <InCoverageId> <OutOfCoverageId> (Split speech corpus into in-coverage and out-of-coverage pieces with respect to the specified grammar)
STEPPER (Start grammar stepper)
SURFACE (Use surface pattern-matching parser)
TRACE (Switch on tracing for DCG grammar)
TRANSLATE (Do translation-style processing on input sentences)
TRANSLATE_TRACE_ON (Switch on translation tracing)
TRANSLATE_TRACE_OFF (Switch off translation tracing (default))
TRANSLATE_CORPUS (Process text translation corpus)
TRANSLATE_CORPUS <Arg> (Process text translation corpus with specified ID)
TRANSLATE_SPEECH_CORPUS (Process speech translation corpus)
TRANSLATE_SPEECH_CORPUS <Arg> (Process speech translation corpus with specified ID)
TRANSLATE_SPEECH_CORPUS_AGAIN (Process speech translation corpus, using recognition results from previous run)
TRANSLATE_SPEECH_CORPUS_AGAIN <Arg> (Process speech translation corpus with specified ID, using recognition results from previous run)
UPDATE_DIALOGUE_JUDGEMENTS (Update dialogue judgements file from annotated dialogue corpus output)
UPDATE_DIALOGUE_JUDGEMENTS <Arg> (Update dialogue judgements file with specified ID from annotated dialogue corpus output)
UPDATE_DIALOGUE_JUDGEMENTS_SPEECH (Update dialogue judgements file from annotated speech dialogue corpus output)
UPDATE_DIALOGUE_JUDGEMENTS_SPEECH <Arg> (Update dialogue judgements file with specified ID from annotated speech dialogue corpus
UPDATE_TRANSLATION_JUDGEMENTS (Update translation judgements from annotated translation corpus output)
UPDATE_TRANSLATION_JUDGEMENTS <Arg> (Update translation judgements file from annotated translation corpus output with specified ID)
UPDATE_TRANSLATION_JUDGEMENTS_SPEECH (Update translation judgements file from annotated speech translation corpus output)
UPDATE_TRANSLATION_JUDGEMENTS_SPEECH <Arg> (Update translation judgements file from annotated speech translation corpus output with specified ID)
UPDATE_RECOGNITION_JUDGEMENTS (Update recognition judgements file from temporary translation corpus recognition judgements)
UPDATE_RECOGNITION_JUDGEMENTS <Arg> (Update recognition judgements file from temporary translation corpus recognition judgements with specified ID)

The meanings of these commands are defined below.

HELP

Print the help message

LOAD

Load the current Regulus grammar. You need to do this first to be able to do parsing or training.
The current Regulus grammar is defined by the regulus_grammar config file entry.

DCG

Use DCG parser. The grammar can be parsed using either the left-corner parser (the default) or the DCG parser. The left-corner parser is faster, but the DCG parser can be useful for debugging. In particular, it can be used to parse non-top constituent ; the left-corner parser lacks this capability.

LC

Use left-corner parser. You can use this command to restore normal parsing after using the DCG parser for debugging.

STEPPER

Invoke the grammar stepper.

NORMAL_PROCESSING

Turn on normal processing, i.e. not translation mode processing or generation mode processing.

NUANCE

Compile current Regulus grammar into Nuance GSL form. You won't be able to use this command in conjunction with the large general grammar, since it currently runs out of memory during compilation - this why we need EBL. The NUANCE command is useful for smaller Regulus grammars, e.g. the original Medical SLT and House grammars.
The current Regulus grammar is defined by the regulus_grammar config file entry.
The location of the generated Nuance grammar is defined by the nuance_grammar config file entry.

GEMINI

Compile current Regulus grammar into Gemini form.
The current Regulus grammar is defined by the regulus_grammar config file entry.
The base name of the Gemini grammar <Gemini> is defined by the gemini_grammar config file entry.
Four files are created, called respectively <Gemini>.syn, <Gemini>.sem, <Gemini>.features and <Gemini>.lex. Regulus semantics is translated into Gemini semantics in a straightforward way, so that Nuance functions simply become Prolog functors.

TRACE

Switch on Prolog tracing for the predicates in the DCG grammar representing categories. Occasionally useful.

NOTRACE

Switch off Prolog tracing for DCG grammar

TRANSLATE

Do translation-style processing on input sentences. In this mode, the sentence is parsed using the current parser. If any parses are found, the first one is processed through translation and generation. Translation is performed using interlingual rules if the INTERLINGUA command has been applied, otherwise using direct transfer.

DIALOGUE

Do dialogue-style processing on input sentences. In this mode, the sentence is parsed using the current parser. If any parses are found, the first one is processed through the code defined by the dialogue_files config file entry.

INTERLINGUA

Make translation processing go through interlingua. This applies both to interactive processing when the TRANSLATE command is in effect, and to batch processing using the commands TRANSLATE_CORPUS, TRANSLATE_SPEECH_CORPUS and TRANSLATE_SPEECH_CORPUS_AGAIN.

NO_INTERLINGUA

Make translation processing use direct transfer. This applies both to interactive processing when the TRANSLATE command is in effect, and to batch processing using the commands TRANSLATE_CORPUS, TRANSLATE_SPEECH_CORPUS and TRANSLATE_SPEECH_CORPUS_AGAIN.

LOAD_TRANSLATE

Load all translation-related files defined in the currently valid config file. These consist of a subset of the following:

� One or more transfer rules files (optional) defined by the transfer_rules config file entry.

� An interlingua declarations file (optional) defined by the interlingua_declarations config file entry.

� One or more to_interlingua rules files (optional) defined by the to_interlingua_rules config file entry.

� One or more from_interlingua rules files (optional) defined by the from_interlingua_rules config file entry.

� An ellipsis classes file (optional) defined by the ellipsis_classes config file entry. If this is defined, you need to compile it first using the COMPILE_ELLIPSIS_PATTERNS command.

� A generation grammar file (required) defined by the generation_rules config file entry. This should be the compiled form of a Regulus grammar for the target language. The compiled generation grammar must first be created using the LOAD_GENERATION command.

� A generation preferences file (optional) defined by the generation_preferences config file entry.

� A collocations file (optional) defined by the collocation_rules config file entry.

� An orthography rules file (optional) defined by the orthography_rules config file entry.

If the config file entries wavfile_directory and wavfile_recording_script are defined, implying that output speech will be produced using recorded wavfiles, this command also produces a new version of the file defined by wavfile_recording_script.

TRANSLATE_TRACE_ON

Switch on translation tracing.

TRANSLATE_TRACE_OFF

Switch off translation tracing.

COMPILE_ELLIPSIS_PATTERNS

Compile the patterns used for ellipsis processing, which are defined by the ellipsis_classes config file entry. The compiled patterns will be loaded next time you invoke LOAD_TRANSLATE.

TRANSLATE_CORPUS

Process the default text mode translation corpus, defined by the translation_corpus config file entry. The output file, defined by the translation_corpus_results config file entry, contains question marks for translations that have not yet been judged. If these are replaced by valid judgements, currently 'good', 'ok' or 'bad', the new judgements can be incorporated into the translation judgements file (defined by the translation_corpus_judgements config file entry) using the command UPDATE_TRANSLATION_JUDGEMENTS.

TRANSLATE_CORPUS <Arg>

Parameterised version of TRANSLATE_CORPUS. Process the text mode translation corpus with ID <Arg>, defined by the parameterised config file entry translation_corpus(<Arg>). The output file, defined by the parameterised config file entry translation_corpus_results(<Arg>), contains question marks for translations that have not yet been judged. If these are replaced by valid judgements, currently 'good', 'ok' or 'bad', the new judgements can be incorporated into the translation judgements file (defined by the translation_corpus_judgements config file entry) using the parameterised command UPDATE_TRANSLATION_JUDGEMENTS <Arg>.

UPDATE_TRANSLATION_JUDGEMENTS

Update the translation judgements file, defined by the translation_corpus_judgements config file entry, from the output of the default text translation corpus output file, defined by the translation_corpus_results config file entry. This command should be used after editing the output file produced by the TRANSLATE_CORPUS command. Editing should replace question marks by valid judgements, currently 'good', 'ok' or 'bad'.

UPDATE_TRANSLATION_JUDGEMENTS <Arg>

Parameterised version of UPDATE_TRANSLATION_JUDGEMENTS. Update the translation judgements file, defined by the translation_corpus_judgements config file entry, from the output of the text translation corpus output file with ID <Arg>, defined by the parameterised config file entry translation_corpus_results(<Arg>). This command should be used after editing the output file produced by the parameterised command TRANSLATE_CORPUS <Arg>. Editing should replace question marks by valid judgements, currently 'good', 'ok' or 'bad'.

TRANSLATE_SPEECH_CORPUS

Process speech mode translation corpus, defined by the translation_speech_corpus config file entry. The output file, defined by the translation_speech_corpus_results config file entry, contains question marks for translations that have not yet been judged. If these are replaced by valid judgements, currently 'good', 'ok' or 'bad', the new judgements can be incorporated into the stored translation judgements file using the command UPDATE_TRANSLATION_JUDGEMENTS_SPEECH. A second output file, defined by the translation_corpus_tmp_recognition_judgements config file entry, contains "blank" recognition judgements: here, the question marks should be replaced with either 'y' (acceptable recognition), or 'n' (unacceptable recognition). Recognition judgements can be updated using the UPDATE_RECOGNITION_JUDGEMENTS command.

TRANSLATE_SPEECH_CORPUS <Arg>

Parameterised version of TRANSLATE_SPEECH_CORPUS. Process speech mode translation corpus, defined by the translation_speech_corpus(<Arg>) config file entry. The output file, defined by the translation_speech_corpus_results(<Arg) config file entry, contains question marks for translations that have not yet been judged. If these are replaced by valid judgements, currently 'good', 'ok' or 'bad', the new judgements can be incorporated into the stored translation judgements file using the command UPDATE_TRANSLATION_JUDGEMENTS_SPEECH <Arg>. A second output file, defined by the translation_corpus_tmp_recognition_judgements(<Arg>) config file entry, contains "blank" recognition judgements: here, the question marks should be replaced with either 'y' (acceptable recognition), or 'n' (unacceptable recognition). Recognition judgements can be updated using the UPDATE_RECOGNITION_JUDGEMENTS <Arg> command.

TRANSLATE_SPEECH_CORPUS_AGAIN

Process speech mode translation corpus, starting from the results saved from the most recent invocation of the TRANSLATE_SPEECH_CORPUS command. This is useful if you are testing speech translation performance, but have only changed the translation or generation files. The output files are the same as for the TRANSLATE_SPEECH_CORPUS command.

TRANSLATE_SPEECH_CORPUS_AGAIN <Arg>

Parameterised version of TRANSLATE_SPEECH_CORPUS_AGAIN. Process speech mode translation corpus, starting from the results saved from the most recent invocation of the TRANSLATE_SPEECH_CORPUS <Arg> command. This is useful if you are testing speech translation performance, but have only changed the translation or generation files. The output files are the same as for the TRANSLATE_SPEECH_CORPUS <Arg> command.

UPDATE_TRANSLATION_JUDGEMENTS_SPEECH

Update the translation judgements file, defined by the translation_corpus_judgements config file entry, from the output of the speech translation corpus output file, defined by the translation_speech_corpus_results config file entry. This command should be used after editing the output file produced by the TRANSLATE_SPEECH_CORPUS or TRANSLATE_SPEECH_CORPUS_AGAIN command. Editing should replace question marks by valid judgements, currently 'good', 'ok' or 'bad'.

UPDATE_TRANSLATION_JUDGEMENTS_SPEECH <Arg>

Parameterised version of UPDATE_TRANSLATION_JUDGEMENTS_SPEECH. Update the translation judgements file, defined by the translation_corpus_judgements config file entry, from the output of the speech translation corpus output file, defined by the translation_speech_corpus_results(<Arg>) config file entry. This command should be used after editing the output file produced by the TRANSLATE_SPEECH_CORPUS <Arg> or TRANSLATE_SPEECH_CORPUS_AGAIN <Arg> command. Editing should replace question marks by valid judgements, currently 'good', 'ok' or 'bad'.

UPDATE_RECOGNITION_JUDGEMENTS

Update recognition judgements file, defined by the translation_corpus_recognition_judgements config file entry, from the temporary translation corpus recognition judgements file, defined by the translation_corpus_tmp_recognition_judgements config file entry and produced by the TRANSLATE_SPEECH_CORPUS or TRANSLATE_SPEECH_CORPUS_AGAIN commands. This command should be used after editing the temporary translation corpus recognition judgements file. Editing should replace question marks by valid judgements, currently 'y' or 'n'.

UPDATE_RECOGNITION_JUDGEMENTS <Arg>

Parameterised version of UPDATE_RECOGNITION_JUDGEMENTS. Update recognition judgements file, defined by the translation_corpus_recognition_judgements config file entry, from the temporary translation corpus recognition judgements file, defined by the translation_corpus_tmp_recognition_judgements(<Arg>) config file entry and produced by the TRANSLATE_SPEECH_CORPUS <Arg> or TRANSLATE_SPEECH_CORPUS_AGAIN <Arg> commands. This command should be used after editing the temporary translation corpus recognition judgements file. Editing should replace question marks by valid judgements, currently 'y' or 'n'.

SPLIT_SPEECH_CORPUS <GrammarName> <InCoverageId> <OutOfCoverageId>

Splits the speech translation corpus output file, defined by the translation_speech_corpus config file entry, into an in-coverage part defined by a translation_speech_corpus(<InCoverageId>) config file entry, and an out-of-coverage part defined by a translation_speech_corpus(<OutOfCoverageId>) config file entry. Coverage is with respect to the top-level grammar <GrammarName>, which must be loaded.

Typical call:

SPLIT_SPEECH_CORPUS .MAIN in_coverage out_of_coverage

LOAD_GENERATION

Compile and load the current generation grammar, defined by the regulus_grammar or generation_regulus_grammar config file entry. The resulting compiled generation grammar is placed in the file defined by the generation_grammar config file entry.

LOAD_GENERATION <Arg>

Compile and load the current generation grammar, defined by the generation_grammar config file entry. The resulting compiled generation grammar is placed in the file defined by the generation_grammar(<Arg>) config file entry. This can be useful if you are normally using grammar specialisation to build the generation grammar.

GENERATION

Run the system in "generation mode". Each input sentence is analysed. If any parses are found, the first one is generated back using the currently loaded generation grammar, showing all possible generated strings. This is normally used for debugging the generation grammar.

ECHO_ON

Echo utterances at top-level. This is often useful when running the system in batch mode.

ECHO_OFF

Don't echo utterances at top-level (default).

LINE_INFO_ON

Print line and file info for rules and lex entries in parse trees (default). A typical parse tree will look like this:

.MAIN [TOY1_RULES:1-5]
   utterance [TOY1_RULES:6-10]
      command [TOY1_RULES:11-15]
      / verb lex(switch) [TOY1_LEXICON:7-9]
      | onoff null lex(on) [TOY1_LEXICON:23-24]
      | np [TOY1_RULES:26-30]
      | / lex(the)
      | | noun lex(light) [TOY1_LEXICON:15-16]
      | | location_pp [TOY1_RULES:31-34]
      | | / lex(in)
      | | | np [TOY1_RULES:26-30]
      | | | / lex(the)
      | | | | noun lex(kitchen) [TOY1_LEXICON:20-21]
      \ \ \ \ null

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES:   c:/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

LINE_INFO_OFF

Don't print line and file info for rules and lex entries in parse trees. A typical parse tree will look like this:

.MAIN
   utterance
      command
      / verb lex(switch)
      | onoff null lex(on)
      | np
      | / lex(the)
      | | noun lex(light)
      | | location_pp
      | | / lex(in)
      | | | np
      | | | / lex(the)
      | | | | noun lex(kitchen)
      \ \ \ \ null

LOAD_SURFACE_PATTERNS

Compile and load the current surface pattern and associated files. You can then parse in surface mode using the SURFACE command. The following config file entries must be defined:

� surface_patterns

� tagging_grammar

� target_model

� discriminants

� surface_postprocessing

SURFACE

Parse utterances using the surface parser. This assumes that surface pattern files have been loaded, using the LOAD_SURFACE_PATTERNS command.

LOAD_DIALOGUE

Compile the files defined by the dialogue_files config file entry.

INIT_DIALOGUE

Initialise the dialogue state when running in dialogue mode.

EBL_TREEBANK

Parse all sentences in current EBL training set, defined by the ebl_corpus config file entry, into treebank form. Sentences that fail to parse are printed out with warning messages, and a summary statistic is produced at the end of the run. This is very useful for checking where you are with coverage.

EBL_TRAIN

Do EBL training on current treebank. You need to build the treebank first using the EBL_TREEBANK command.

EBL_POSTPROCESS

Postprocess results of EBL training into a specialised Regulus grammar. You need to create these results first using the EBL_TRAIN command.

EBL_LOAD

Load current specialised Regulus grammar in DCG and left-corner form. Same as the LOAD command, but for the specialised grammar. The specialised grammar needs to be created using the EBL_TREEBANK, EBL_TRAIN and EBL_POSTPROCESS commands.

EBL_LOAD_GENERATION

Compile and load the current specialised generation grammar. This will be the file <prefix>_specialised_no_binarise_default.regulus, where <prefix> is the value of the config file entry working_file_prefix. The resulting compiled generation grammar is placed in the file defined by the generation_rules config file entry. Note that EBL_LOAD_GENERATION places the compiled generation grammar in the same place as LOAD_GENERATION.

EBL_LOAD_GENERATION <SubdomainTag>

Parameterised version of EBL_LOAD_GENERATION. Compile and load the specialised generation grammar for the subdomain tag <SubdomainTag>. This will be the file <prefix>_specialised_no_binarise_<SubdomainTag>.regulus, where <prefix> is the value of the config file entry working_file_prefix. The resulting compiled generation grammar is placed in the file defined by the generation_grammar(<SubdomainTag>) config file entry. Note that EBL_LOAD_GENERATION <SubdomainTag> places the compiled generation grammar in the same place as LOAD_GENERATION <SubdomainTag>.

EBL_NUANCE

Compile current specialised Regulus grammar into Nuance GSL form. Same as the NUANCE command, but for the specialised grammar. The input is the file created by the EBL_POSTPROCESS command; the output Nuance GSL grammar is placed in the file defined by the ebl_nuance_grammar config file entry.

EBL_GEMINI

Compile current specialised Regulus grammar into Gemini form. Same as the GEMINI command, but for the specialised grammar. The base name of the Gemini files produced is defined by the ebl_gemini_grammar config file entry.

EBL_ANALYSIS

Do all EBL processing, except for creation of Nuance grammar: equivalent to the sequence LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS

EBL

Do all EBL processing: equivalent to the sequence LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS, EBL_NUANCE

EBL_GENERATION

Do all EBL processing for generation: equivalent to the sequence LOAD, EBL_TREEBANK, EBL_TRAIN, EBL_POSTPROCESS, EBL_LOAD_GENERATION

EBL_GRAMMAR_PROBS

Convert the current EBL training set, defined by the by the ebl_corpus config file entry, into a form that can be used as training data by the Nuance compute-grammar-probs utility. The output training data is placed in the file defined by the ebl_grammar_probs config file entry.

CHECK_ALTERF_PATTERNS

Check the consistency of the current Alterf patterns file, defined by the alterf_patterns_file config file entry.

NUANCE_COMPILE

Compile the generated Nuance grammar, defined by the nuance_grammar config file entry, into a recognition package with the same name. This will be done using the Nuance language pack defined by the nuance_language_pack config file entry and the extra parameters defined by the nuance_compile_params config file entry. Typical values for these parameters are as follows:

regulus_config(nuance_language_pack, 'English.America').
regulus_config(nuance_compile_params, ['-auto_pron', '-dont_flatten']).

BATCH_DIALOGUE

Process the default dialogue mode development corpus, defined by the dialogue_corpus config file entry. The output file, defined by the dialogue_corpus_results config file entry, contains question marks for dialogue processing steps that have not yet been judged. If these are replaced by valid judgements, currently 'good', or 'bad', the new judgements can be incorporated into the dialogue judgements file (defined by the dialogue_corpus_judgements config file entry) using the command UPDATE_DIALOGUE_JUDGEMENTS.

BATCH_DIALOGUE <Arg>

Parameterised version of BATCH_DIALOGUE. Process the default dialogue mode development corpus, defined by the dialogue_corpus(<Arg>) config file entry. The output file, defined by the dialogue_corpus_results(<Arg>) config file entry, contains question marks for dialogue processing steps that have not yet been judged. If these are replaced by valid judgements, currently 'good', or 'bad', the new judgements can be incorporated into the dialogue judgements file (defined by the dialogue_corpus_judgements config file entry) using the command UPDATE_DIALOGUE_JUDGEMENTS <Arg>.

BATCH_DIALOGUE_SPEECH

Speech mode version of BATCH_DIALOGUE. Process the default dialogue mode speech corpus, defined by the dialogue_speech_corpus config file entry. The output file, defined by the dialogue_speechcorpus_results config file entry, contains question marks for dialogue processing steps that have not yet been judged. If these are replaced by valid judgements, currently 'good', or 'bad', the new judgements can be incorporated into the dialogue judgements file (defined by the dialogue_corpus_judgements config file entry) using the command UPDATE_DIALOGUE_JUDGEMENTS_SPEECH.

BATCH_DIALOGUE_SPEECH <Arg>

Parameterised speech mode version of BATCH_DIALOGUE. Process the default dialogue mode speech corpus, defined by the dialogue_corpus(<Arg>) config file entry. The output file, defined by the dialogue_speech_corpus_results(<Arg>) config file entry, contains question marks for dialogue processing steps that have not yet been judged. If these are replaced by valid judgements, currently 'good', or 'bad', the new judgements can be incorporated into the dialogue judgements file (defined by the dialogue_corpus_judgements config file entry) using the command UPDATE_DIALOGUE_JUDGEMENTS_SPEECH <Arg>.

BATCH_DIALOGUE_SPEECH_AGAIN

Version of BATCH_DIALOGUE_SPEECH that skips the speech recognition stage, and instead uses stored results from the previous run.

BATCH_DIALOGUE_SPEECH_AGAIN <Arg>

Version of BATCH_DIALOGUE_SPEECH <Arg> that skips the speech recognition stage, and instead uses stored results from the previous run.

UPDATE_DIALOGUE_JUDGEMENTS

Update the dialogue judgements file, defined by the dialogue_corpus_judgements config file entry, from the output of the default text dialogue corpus output file, defined by the dialogue_corpus_results config file entry. This command should be used after editing the output file produced by the BATCH_DIALOGUE command. Editing should replace question marks by valid judgements, currently 'good', or 'bad'.

UPDATE_DIALOGUE_JUDGEMENTS <Arg>

Parameterised version of UPDATE_DIALOGUE_JUDGEMENTS. Update the dialogue judgements file, defined by the dialogue_corpus_judgements config file entry, from the output of the dialogue corpus output file with ID <Arg>, defined by the parameterised config file entry dialogue_corpus_results(<Arg>). This command should be used after editing the output file produced by the parameterised command BATCH_DIALOGUE <Arg>. Editing should replace question marks by valid judgements, currently 'good' or 'bad'.

UPDATE_DIALOGUE_JUDGEMENTS_SPEECH

Update the dialogue judgements file, defined by the dialogue_corpus_judgements config file entry, from the output of the default speech dialogue corpus output file, defined by the dialogue_speech_corpus_results config file entry. This command should be used after editing the output file produced by the BATCH_DIALOGUES_SPEECH command. Editing should replace question marks by valid judgements, currently 'good', or 'bad'.

UPDATE_DIALOGUE_JUDGEMENTS_SPEECH <Arg>

Parameterised version of UPDATE_DIALOGUE_JUDGEMENTS_SPEECH. Update the dialogue judgements file, defined by the dialogue_corpus_judgements config file entry, from the output of the dialogue corpus output file with ID <Arg>, defined by the parameterised config file entry dialogue_corpus_results(<Arg>). This command should be used after editing the output file produced by the parameterised command BATCH_DIALOGUES_SPEECH <Arg>. Editing should replace question marks by valid judgements, currently 'good' or 'bad'.

Parsing utterances in the Regulus development environment

You can parse full utterances by typing them in at Regulus top-level. If parsing is successful, Regulus returns the output logical form and the parse tree, e.g.

>> switch on the light in the kitchen
(Parsing with left-corner parser)

Analysis time: 0.00 seconds

Return value: [[utterance_type,command],[action,switch],[onoff,on],[device,light],[location,kitchen]]

Global value: []

Syn features: []

Parse tree:

.MAIN [TOY1_RULES:1-5]
   utterance [TOY1_RULES:6-10]
      command [TOY1_RULES:11-15]
      / verb lex(switch) [TOY1_LEXICON:7-9]
      | onoff null lex(on) [TOY1_LEXICON:23-24]
      | np [TOY1_RULES:26-30]
      | / lex(the)
      | | noun lex(light) [TOY1_LEXICON:15-16]
      | | location_pp [TOY1_RULES:31-34]
      | | / lex(in)
      | | | np [TOY1_RULES:26-30]
      | | | / lex(the)
      | | | | noun lex(kitchen) [TOY1_LEXICON:20-21]
      \ \ \ \ null

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES:   c:/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

The formatting of the parse tree can be controlled using the LINE_INFO_ON and LINE_INFO_OFF commands.

Parsing non-top constituents

For small grammars, you can run in DCG mode, using the DCG command.
When the DCG parser is being used, you can also parse non-top constituents using the command syntax

>> <NameOfConstituent> : <Words>

so for example

>> DCG
(Use DCG parser)

>> np: the light in the kitchen
(Parsing with DCG parser)

Analysis time: 0.00 seconds

Return value: [[device,light],[location,kitchen]]

Global value: []

Syn features: [sem_np_type=switchable\/dimmable,singplur=sing]

Parse tree:

np [TOY1_RULES:26-30]
/ lex(the)
| noun lex(light) [TOY1_LEXICON:15-16]
| location_pp [TOY1_RULES:31-34]
| / lex(in)
| | np [TOY1_RULES:26-30]
| | / lex(the)
| | | noun lex(kitchen) [TOY1_LEXICON:20-21]
\ \ \ null

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus
TOY1_RULES: c:/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

Using the grammar stepper

The DCG mode is not suitable for large grammars, and is often inconvenient even for small ones. A better way to debug grammars is to use the grammar stepper, which is invoked using the STEPPER command. This enters a special version of the top loop, with its own set of commands. The basic functionality of the stepper is to support manipulation of parse trees. Trees can be created, examined, cut and pasted. It is usually quite easy to find feature bugs by carrying out a short sequence of these operations.

The stepper commands are as follows:

HELP��

Print help message describing commands

LEX WordOrWords

Add item for WordOrWords, e.g. 'LEX fan' or 'LEX living room'

GAP

Add item for gap expression�

��

PARSE <WordOrWords>

Add item formed by parsing <WordOrWords>, e.g. 'PARSE switch on the light'

COMBINE <IDOrIDs>

Combine items into a new item, e.g. 'COMBINE 1' or 'COMBINE 1 3'

CUT <ID> <Node>

Cut item <ID> at <Node>, e.g. 'CUT 2 3'

JOIN <ID1> <Node> <ID2>

Attach item <ID2> under <Node> of <ID1>, e.g. 'JOIN 1 15 4'

JOIN <ID1> <ID2>

Attach item <ID2> under <ID1>, e.g. 'JOIN 1 4'��

��

SHOW <ID>

Show item <ID>, e.g. 'SHOW 1'

SHOW <ID> <Node>

Show material under <Node> of item <ID>, e.g. 'SHOW 1 15'

RULE <ID> <Node>

Show rule at <Node> of item <ID>, e.g. 'RULE 1 15'

DELETE <IDOrIDs>

Delete item <IDorIDs>, e.g. 'DELETE 1' or 'DELETE 1 2'

DELETE_ALL

Delete all items

��

SUMMARY

Print summary for each item

EXIT

Leave stepper

The following annotated session using the Toy1 grammar illustrates use of the stepper.

Load Regulus environment

| ?- ['$REGULUS/Prolog/load'].

<snip>

Start Regulus with Toy1 grammar

| ?- regulus('$REGULUS/Examples/Toy1/scripts/toy1.cfg').

Loading settings from Regulus config file c:/cygwin/home/speech/regulus/examples/toy1/scripts/toy1.cfg

Loading settings from Regulus config file c:/cygwin/home/speech/regulus/examples/toy1/scripts/file_search_paths.cfg

>> LOAD

<snip>

Enter stepper loop. Note that the prompt changes.

>> STEPPER

(Start grammar stepper)

Print help message.

STEPPER>> HELP

Available stepper commands:

HELP�� - print this message

LEX WordOrWords�� - add item for WordOrWords, e.g. 'LEX pain' or 'LEX bright light'

GAP�� - add item for gap expression��

PARSE WordOrWords�� - add item formed by parsing WordOrWords, e.g. 'PARSE where is the pain'

COMBINE IDOrIDs�� - combine items into a new item, e.g. 'COMBINE 1' or 'COMBINE 1 3'

CUT ID Node�� - cut item ID at Node, e.g. 'CUT 2 3'

JOIN ID1 Node ID2�� - attach item ID2 under Node of ID1, e.g. 'JOIN 1 15 4'

JOIN ID1 ID2�� - attach item ID2 under ID1, e.g. 'JOIN 1 4'��

SHOW ID�� - show item ID, e.g. 'SHOW 1'

SHOW ID Node�� - show material under Node of item ID, e.g. 'SHOW 1 15'

DELETE IDOrIDs�� - delete item ID or IDs, e.g. 'DELETE 1' or 'DELETE 1 2'

DELETE_ALL� ��- delete all items��

SUMMARY�� - print summary for each item

EXIT�� - leave stepper

Parse a sentence.

STEPPER>> PARSE switch on the light�

Added item 1: .MAIN-->switch,on,the,light

Look at the resulting item.

STEPPER>> SHOW 1

Form:� .MAIN-->switch,on,the,light

Sem:�� concat([[utterance_type,command]],concat([[action,switch]],concat([[onoff,on]],[[device,light]])))

Feats: []

Tree:

.MAIN (node 1) [TOY1_RULES:1-5]

�� utterance (node 2) [TOY1_RULES:6-10]

� ��command (node 3) [TOY1_RULES:11-15]

�� /� verb lex(switch) (node 4) [TOY1_LEXICON:7-9]

�� |� onoff lex(on) (node 5) [TOY1_LEXICON:23-24]

�� |� np (node 6) [TOY1_RULES:26-30]

�� |� /� lex(the)

�� \� \� noun lex(light) (node 7) [TOY1_LEXICON:15-16]

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus

TOY1_RULES:�� c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

We can look at constituents inside the item. Here, we examine node 6:

STEPPER>> SHOW 1 6

Form:� np-->the,light

Sem:�� [[device,light]]

Feats: [sem_np_type=switchable\/dimmable,singplur=sing]

Tree:

np (node 1) [TOY1_RULES:26-30]

/� lex(the)

\� noun lex(light) (node 2) [TOY1_LEXICON:15-16]

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus

TOY1_RULES:�� c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

We can also print the rule at this node:

STEPPER>> RULE 1 6

np:[sem=concat(Noun, Loc), singplur=N, sem_np_type=SemType] -->

�� the,

�� noun:[sem=Noun, singplur=N, sem_np_type=SemType],

�� ?location_pp:[sem=Loc].

We want to find out why the sentence "switch on the kitchen" doesn't parse. We try to cut and paste a tree for it. First, we create a tree containing the NP "the kitchen":

STEPPER>> PARSE switch on the light in the kitchen

Added item 2: .MAIN-->switch,on,the,light,in,the,kitchen

STEPPER>> SHOW 2

Form:� .MAIN-->switch,on,the,light,in,the,kitchen

Sem:�� concat([[utterance_type,command]],concat([[action,switch]],concat([[onoff,on]],concat([[device,light]],

[[location,kitchen]]))))

Feats: []

Tree:

.MAIN (node 1) [TOY1_RULES:1-5]

�� utterance (node 2) [TOY1_RULES:6-10]

�� command (node 3) [TOY1_RULES:11-15]

�� /� verb lex(switch) (node 4) [TOY1_LEXICON:7-9]

�� |� onoff lex(on) (node 5) [TOY1_LEXICON:23-24]

�� |� np (node 6) [TOY1_RULES:26-30]

�� | �/� lex(the)

�� |� |� noun lex(light) (node 7) [TOY1_LEXICON:15-16]

�� |� |� location_pp (node 8) [TOY1_RULES:31-34]

�� |� |� /� lex(in)

�� |� |� |� np (node 9) [TOY1_RULES:26-30]

�� |� |� |� /� lex(the)

�� \� \� \� \� noun lex(kitchen) (node 10) [TOY1_LEXICON:20-21]

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus

TOY1_RULES:�� c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

We cut at node 9, to create two pieces.

STEPPER>> CUT 2 9

Added item 3: .MAIN-->switch,on,the,light,in,np

Added item 4: np-->the,kitchen

Item 3 has the missing node marked as "cut".

STEPPER>> SHOW 3

Form:� .MAIN-->switch,on,the,light,in,np

Sem:�� concat([[utterance_type,command]],concat([[action,switch]],concat([[onoff,on]],concat([[device,light]],(Sem for node 9)))))

Feats: []

Tree:

.MAIN (node 1) [TOY1_RULES:1-5]

�� utterance (node 2) [TOY1_RULES:6-10]

�� command (node 3) [TOY1_RULES:11-15]

�� /� verb lex(switch) (node 4) [TOY1_LEXICON:7-9]

�� |� onoff lex(on) (node 5) [TOY1_LEXICON:23-24]

�� |� np (node 6) [TOY1_RULES:26-30]

�� |� /� lex(the)

�� |� |� noun lex(light) (node 7) [TOY1_LEXICON:15-16]

�� |� |� location_pp (node 8) [TOY1_RULES:31-34]

�� |� |� /� lex(in)

�� \� \� \� np (node 9) *cut*

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus

TOY1_RULES:�� c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

Item 4 is the one we want.

STEPPER>> SHOW 4

Form:� np-->the,kitchen

Sem:�� [[location,kitchen]]

Feats: [sem_np_type=location,singplur=sing]

Tree:

np (node 1) [TOY1_RULES:26-30]

/� lex(the)

\� noun lex(kitchen) (node 2) [TOY1_LEXICON:20-21]

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus

TOY1_RULES:�� c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

Use the SUMMARY command to see what we have available.

STEPPER>> SUMMARY

1: .MAIN-->switch,on,the,light

2: .MAIN-->switch,on,the,light,in,the,kitchen

3: .MAIN-->switch,on,the,light,in,np

4: np-->the,kitchen

Take another look at item 1

STEPPER>> SHOW 1

Form:� .MAIN-->switch,on,the,light

Sem:�� concat([[utterance_type,command]],concat([[action,switch]],concat([[onoff,on]],[[device,light]])))

Feats: []

Tree:

.MAIN (node 1) [TOY1_RULES:1-5]

�� utterance (node 2) [TOY1_RULES:6-10]

�� command (node 3) [TOY1_RULES:11-15]

�� /� verb lex(switch) (node 4) [TOY1_LEXICON:7-9]

�� |� onoff lex(on) (node 5) [TOY1_LEXICON:23-24]

�� |� np (node 6) [TOY1_RULES:26-30]

�� |� /� lex(the)

�� \� \� noun lex(light) (node 7) [TOY1_LEXICON:15-16]

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus

TOY1_RULES:�� c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

Cut out the NP.

STEPPER>> CUT 1 6

Added item 5: .MAIN-->switch,on,np

Added item 6: np-->the,light

We are going to try and paste together item 5 and item 4. Take a look at them:

STEPPER>> SHOW 5

Form:� .MAIN-->switch,on,np

Sem:�� concat([[utterance_type,command]],concat([[action,switch]],concat([[onoff,on]],(Sem for node 6))))

Feats: []

Tree:

.MAIN (node 1) [TOY1_RULES:1-5]

�� utterance (node 2) [TOY1_RULES:6-10]

�� command (node 3) [TOY1_RULES:11-15]

�� /� verb lex(switch) (node 4) [TOY1_LEXICON:7-9]

�� |� onoff lex(on) (node 5) [TOY1_LEXICON:23-24]

�� \� np (node 6) *cut*

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus

TOY1_RULES:�� c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

STEPPER>> SHOW 4

Form:� np-->the,kitchen

Sem:�� [[location,kitchen]]

Feats: [sem_np_type=location,singplur=sing]

Tree:

np (node 1) [TOY1_RULES:26-30]

/� lex(the)

\� noun lex(kitchen) (node 2) [TOY1_LEXICON:20-21]

------------------------------- FILES -------------------------------

TOY1_LEXICON: c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_lexicon.regulus

TOY1_RULES:�� c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

Now try and join them together. Item 4 is supposed to fit into the *cut* node in item 5.

STEPPER>> JOIN 5 4

Incompatible syntactic feats in categories:

np:[sem_np_type=switchable,singplur=A]

np:[sem_np_type=location,singplur=sing]

Feature clash: sem_np_type=switchable, sem_np_type=location

*** Error processing stepper command: "JOIN 5 4"

It didn't work, and we can see why: the sem_np_type features don't match.

We can also build items bottom-up, out of lexical entries. This is usually less efficient, but can be necessary if there is no way to cut and paste.

Make an item for the lexical entry "light":

STEPPER>> LEX light

Added item 7: noun-->light

Find a rule that can dominate item 7, and apply it. If there are several such rules, the stepper will present a menu.

STEPPER>> COMBINE 7

Using rule between lines 26 and 30 in c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

Added item 8: np-->the,light

Same for "the living room".

STEPPER>> LEX living room

Added item 9: noun-->living,room

Make it into an NP.

STEPPER>> COMBINE 9

Using rule between lines 26 and 30 in c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

Added item 10: np-->the,living,room

Make that into a location_pp.

STEPPER>> COMBINE 10

Using rule between lines 31 and 34 in c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

Added item 11: location_pp-->in,the,living,room

We can combine this with item 7 to make the NP "the light in the living room".

STEPPER>> COMBINE 7 11

Using rule between lines 26 and 30 in c:/cygwin/home/speech/regulus/examples/toy1/regulus/toy1_rules.regulus

Added item 12: np-->the,light,in,the,living,room

� and if we want, we can successfully paste it into the cut in item 5, to make "switch on the light in the living room".

STEPPER>> JOIN 5 12

Added item 13: .MAIN-->switch,on,the,light,in,the,living,room

The config file

The config file specifies the various files and parameters referred to by a Regulus application. Each config item is defined by a declaration of the form

regulus_config(<ConfigItem>, <Value>).

You can include Prolog file_search_path declarations directly in the config file, using the syntax

file_search_path(<Name>, <Value>).

You can also allow one config file to load information from another one, using the syntax

include(<Pathname>).

Recursive includes are permitted.

The full set of possible config items is listed immediately below. For most applications, you will not need to specify more than a small fraction of the items in this set. For example, the config file for the Toy0 application is as follows:

regulus_config(regulus_grammar, [domain_specific_regulus_grammars(toy0)]).
regulus_config(top_level_cat, '.MAIN').
regulus_config(nuance_grammar, toy0_runtime(recogniser)).

regulus_config(working_file_prefix, toy0_runtime(toy0)).

This config file says that the Regulus grammar consists of the single file domain_specific_regulus_grammars(toy0), that its top-level category is .MAIN, that the generated Nuance grammar is to be placed in the location toy0_runtime(recogniser), and that working files are to be placed in the location toy0_runtime(toy0). Note that pathnames are specified here using Prolog file_search_path declarations.

� alterf_patterns_file

� collocation_rules

� compiled_ellipsis_classes

� dialogue_files

� discriminants

� ebl_context_use_threshold

� ebl_corpus

� ebl_include_lex

� ebl_regulus_component_grammar

� ellipsis_classes

� from_interlingua_rules

� gemini_grammar

� generation_grammar

� generation_grammar(<Arg>)

� generation_incremental_deepening_parameters

� generation_module_name

� generation_preferences

� generation_regulus_grammar

� generation_rules

� global_context

� ignore_subdomain

� interlingua_declarations

� lf_postproc_pred

� nuance_grammar

� nuance_compile_params

� nuance_language_pack

� orthography_rules

� parse_preferences

� regulus_grammar

� regulus_no_sem_decls

� surface_constituent_rules

� surface_patterns

� surface_postprocessing

� tagging_grammar

� target_model

� to_interlingua_rules

� to_source_discourse_rules

� top_level_cat

� top_level_generation_cat

� top_level_generation_feat

� top_level_generation_pred

� transfer_rules

� translation_corpus

� translation_corpus(<Arg>)

� translation_corpus_judgements

� translation_corpus_recognition_judgements

� translation_corpus_results

� translation_corpus_results(<Arg>)

� translation_corpus_tmp_recognition_judgements

� translation_corpus_tmp_recognition_judgements(<Arg>)

� translation_rec_params

� translation_speech_corpus

� translation_speech_corpus(<Arg>)

� translation_speech_corpus_results

� translation_speech_corpus_results(<Arg>)

� wavfile_directory

� wavfile_recording_script

� working_directory

alterf_patterns_file
Relevant if you are doing Alterf processing with LF patterns. Points to a file containing Alterf LF patterns, that can be tested using the CHECK_ALTERF_PATTERNS command.

collocation_rules
Relevant to translation applications. Points to a file containing rules for post-transfer collocation processing.

compiled_ellipsis_classes
Relevant to translation applications. Points to a file containing the compiled form of the ellipsis processing rules.

dialogue_files
Relevant to dialogue applications. Points to a list of files defining dialogue processing behaviour.

discriminants
Relevant to applications using surface processing. Points to a file of Alterf discriminants.

ebl_context_use_threshold
Relevant to applications using grammar specialisation. Defines the minimum number of examples of a rule that must be present if the system is to use rule context anti-unification to further constrain that rule.

ebl_corpus
Points to the file of training examples used as input to the EBL_TREEBANK operation. Intended originally for use for grammar specialisation, but can also be used simply to parse a set of examples to get information about coverage. The format is sent(Atom), so for example a typical line would be

sent('switch off the light').

(note the closing period).

If the application compiles multiple top-level specialised grammars, the grammars relevant to each example are defined in an optional second argument. For example, if a home control domain had separate grammars for each room, a typical line in the training file might be

sent('switch off the light', [bedroom, kitchen, living_room]).

ebl_gemini_grammar
Relevant to applications using grammar specialisation. Specifies the base name of the Gemini files generated by the EBL_GEMINI command.

ebl_grammar_probs

Relevant to applications using grammar specialisation. Specifies the file where the EBL_GRAMMAR_PROBS command places its output.

ebl_ignore_feats
Relevant to applications using grammar specialisation. The value should be a list of unification grammar features: these features will be ignored in the specialised grammar. A suitable choice of value can greatly speed up Regulus to Nuance compilation for the specialised grammar. This is documented further here.

ebl_include_lex

Relevant to applications using grammar specialisation. Specifies a file or list of files containing EBL include lex declarations.

ebl_nuance_grammar
Relevant to applications using grammar specialisation. Points to the specialised Nuance GSL grammar file produced by the EBL_NUANCE operation.

ebl_operationality
Relevant to applications using grammar specialisation. Specifies the operationality criterion, which must be defined in the file $REGULUS/Prolog/ebl_operational.pl

ebl_regulus_component_grammar
Relevant to applications using grammar specialisation that define multiple top-level specialised grammars. Identifies which specialised Regulus grammar will be loaded by the EBL_LOAD command.

ellipsis_classes
Relevant to translation applications. Points to a file defining classes of intersubstitutable phrases that can be used in ellipsis processing.

from_interlingua_rules
Relevant to translation applications. Points to a file, or list of files, containing rules that transfer source language representations into interlingual representations.

gemini_grammar
Specifies the base name of the Gemini files generated by the GEMINI command.

generation_grammar
Relevant to applications that use generation (typically translation applications). Points to the file containing the compiled generation grammar.

generation_grammar(<Arg>)
Relevant to applications that use generation (typically translation applications) and also grammar specialisation. Points to the file containing the compiled specialised generation grammar for the subdomain tag <Arg>.

generation_incremental_deepening_parameters
Relevant to applications that use generation (typically translation applications). Value should be a list of three positive numbers [<Start>, <Increment>, <Max>], such that both <Start> and <Increment> are less than or equal to <Max>. Generation uses an iterative deepening algorithm, which initially sets a maximum derivation length of <Start>, and increases it in increments of <Increment> until it exceeds <Max>.

Default value is [5, 5, 50].

generation_module_name
Relevant to applications that use generation (typically translation applications). Specifies the module name in the compiled generation grammar file. Default is generator.

generation_preferences
Relevant to applications that use generation (typically translation applications). Points to the file containing the generation preference declarations.

generation_regulus_grammar
Relevant to applications that use generation (typically translation applications). If there is no regulus_grammar entry, points to the Regulus file, or list of Regulus files, that are to be compiled into the generation file.

generation_rules
Relevant to translation applications. Points to the file containing the generation grammar. Normally this will be a Regulus grammar compiled for generation. The translation code currently assumes that this file will define the module generator, and that the top-level predicate will be of the form

generator:generate(Representation, Tree, Words)

global_context
Relevant to translation applications. Defines a value that can be accessed by conditional-dependent transfer rules, if transfer rules are to be shared across several applications defined by multiple config files.

ignore_subdomain
Relevant to applications using grammar specialisation. Sometimes, you will have defined multiple subdomains, but you will only be carrying out development in one of them. In this case, you can speed up EBL training by temporarily adding ignore_subdomain declarations in the config file. An ignore_subdomain declaration has the form

regulus_config(ignore_subdomain, <Tag>).

The effect is to remove all references to <Tag> when performing training, and not build any specialised grammar for <Tag>. You may include any number of ignore_subdomain declarations.

interlingua_declarations
Relevant to translation applications. Points to the file containing the interlingua declarations, which define the constants that may be used at the interlingual representation level.

lf_patterns
Relevant to dialogue applications. Points to a files of LF patterns.

lf_patterns_modules
Relevant to dialogue applications. Value should be a list of modules referenced by the compiled LF patterns.

lf_postproc_pred
Defines a post-processing predicate that is applied after Regulus analysis. If you are using the riacs_sem semantic macros, you must set this parameter to the value riacs_postproc_lf.

nuance_grammar
Points to the Nuance GSL grammar produced by the NUANCE command.

nuance_compile_params
Specifies a list of extra compilation parameters to be passed to Nuance compilation by the NUANCE_COMPILE command. A typical value is

['-auto_pron', '-dont_flatten']

nuance_language_pack
Specifies the Nuance language pack to be used by� Nuance compilation in the NUANCE_COMPILE command.

orthography_rules
Relevant to translation applications. Points to a file containing rules for post-transfer orthography processing.

parse_preferences
Can be used to define default analysis preferences.

regulus_grammar
Points to the Regulus file, or list of Regulus files, that constitute the main grammar.

regulus_no_sem_decls
Points to a file which removes the sem feature from the main grammar.

surface_constituent_rules
Relevant to applications using surface processing. Points to the surface constituent rules file.

surface_patterns
Relevant to applications using surface processing. Points to the surface patterns file.

surface_postprocessing
Relevant to applications using surface processing. Points to a file that defines a post-processing predicate that can be applied to the results of surface processing. The file should define a predicate surface_postprocess(Representation, PostProcessedRepresentation).

tagging_grammar
Relevant to applications using surface processing. Points to a file that defines a tagging grammar, in DCG form. The top-level rule should be of the form

tagging_grammar(Item) --> <Body>.

target_model
Relevant to applications using surface processing. Points to a file defining a target model. The file should define the predicates target_atom/1 and target_atom_excludes/2.

to_interlingua_rules
Relevant to translation applications. Points to a file, or list of files, containing rules that transfer interlingual representations into target language representations.

to_source_discourse_rules
Relevant to translation applications. Points to a file, or list of files, containing rules that transfer source representations into source discourse representations.

top_level_cat
Defines the top-level category of the grammar.

top_level_generation_cat
Relevant to applications that use generation (typically translation applications). Defines the top-level category of the generation grammar. Default is .MAIN.

top_level_generation_feat
Relevant to applications that use generation (typically translation applications). Defines the semantic feature in the top-level rule which holds the semantic value. Normally, the rule will be of the form

'.MAIN':[gsem=[value=Sem]] --> Body

and the value of this parameter will be value (default if not specified).

top_level_generation_pred
Relevant to applications that use generation (typically translation applications). Defines the top-level category of the generation grammar. For translation applications, the value should be generate (default if not specified).

transfer_rules
Relevant to translation applications. Points to a file, or list of files, containing rules that transfer source language representations into target language representations.

translation_corpus
Relevant to translation applications. Points to a file of examples used as input to the TRANSLATE_CORPUS command. The format is sent(Atom), so for example a typical line would be

sent('switch off the light').

(note the closing period).

translation_corpus(<Arg>)
Relevant to translation applications. Points to a file of examples used as input to the parameterised command TRANSLATE_CORPUS <Arg>. The format is the same as for the translation_corpus file.

translation_corpus_judgements
Relevant to translation applications. Points to a file of translation judgements. You should not normally edit this file directly, but update it using the commands UPDATE_TRANSLATION_JUDGEMENTS and UPDATE_TRANSLATION_JUDGEMENTS_SPEECH.

translation_corpus_recognition_judgements
Relevant to translation applications. Points to a file of recognition judgements. You should not normally edit this file directly, but update it using the command UPDATE_RECOGNITION_JUDGEMENTS.

translation_corpus_results
Relevant to translation applications. Points to the file containing the result of running the TRANSLATE_CORPUS command. You can then edit this file to update judgements, and incorporate them into the translation_corpus_judgements file by using the command UPDATE_TRANSLATION_JUDGEMENTS.

translation_corpus_results(<Arg>)
Relevant to translation applications. Points to the file containing the result of running the parameterised command TRANSLATE_CORPUS <Arg>. You can then edit this file to update judgements, and incorporate them into the translation_corpus_judgements file by using the parameterised command UPDATE_TRANSLATION_JUDGEMENTS <Arg>.

translation_corpus_tmp_recognition_judgements
Relevant to translation applications. Points to the file of new recognition results generated by running the TRANSLATE_SPEECH_CORPUS command. You can then edit this file to update the judgements, and incorporate them into the translation_corpus_recognition_judgements file using the command UPDATE_RECOGNITION_JUDGEMENTS.

translation_corpus_tmp_recognition_judgements(<Arg>)
Relevant to translation applications. Points to the file of new recognition results generated by running the TRANSLATE_SPEECH_CORPUS <Arg> command. You can then edit this file to update the judgements, and incorporate them into the translation_corpus_recognition_judgements file using the command UPDATE_RECOGNITION_JUDGEMENTS <Arg>.

translation_rec_params
Relevant to translation applications. Specifies the list of Nuance parameters that will be used when carrying out recognition for the TRANSLATE_SPEECH_CORPUS command. These parameters must at a minimum specify the recognition package and the top-level Nuance grammar, for example

[package=med_runtime(recogniser), grammar='.MAIN']

translation_speech_corpus
Relevant to translation applications. Points to a file of examples used as input to the TRANSLATE_SPEECH_CORPUS command. The format is <Wavfile> <Words>, so for example a typical line would be

C:/Regulus/data/utt03.wav switch off the light

translation_speech_corpus(<Arg>)
Relevant to translation applications. Points to a file of examples used as input to the TRANSLATE_SPEECH_CORPUS <Arg> command. Format is as for the translation_speech_corpus parameter.

translation_speech_corpus_results
Relevant to translation applications. Points to the file containing the result of running the TRANSLATE_SPEECH_CORPUS command. You can then edit this file to update judgements, and incorporate them into the translation_corpus_judgements file by using the command UPDATE_TRANSLATION_JUDGEMENTS_SPEECH.

translation_speech_corpus_results(<Arg>)
Relevant to translation applications. Points to the file containing the result of running the TRANSLATE_SPEECH_CORPUS <Arg> command. You can then edit this file to update judgements, and incorporate them into the translation_corpus_judgements file by using the command UPDATE_TRANSLATION_JUDGEMENTS_SPEECH <Arg>.

wavfile_directory
Relevant to translation applications. If output speech is to be produced using recorded wavfiles, points to the directory that holds these files.

wavfile_recording_script
Relevant to translation applications. If output speech is to be produced using recorded wavfiles, points to an automatically created file that holds a script which can be used to create the missing wavfiles. This script is produced by finding all the lexical items in the file referenced by generation_rules, and creating an entry for every item not already in wavfile_directory. The file is created as part of the processing carried out by the LOAD_TRANSLATE command.

Due to limitations of some operating systems the script contains some latin-1 characters translated to character sequences shown in the table below.

Original Character	Translates to
�	a1
�	a2
�	a3
�	a4
�	a5
�	c1
�	e1
�	e2
�	e3
�	e4
�	e6
�	n1
�	o1
�	o2
�	o3
�	o4
�	u1
�	u2
�	u3
�	u4

working_directory
Working files will have names starting with this prefix.

Using Regulus in batch mode

You can invoke Regulus in batch mode using the predicate regulus_batch/2 , defined in $REGULUS/Prolog/regulus_top.pl. A call to regulus_batch/2 is of the form

regulus_batch(ConfigFile, Commands)

where ConfigFile is the name of a Regulus config file, and Commands is a list of Regulus commands, written as strings.

There is also a three-argument version of this predicate, where the call is of the form

regulus_batch(ConfigFile, Commands, Errors)

Here, Errors will be instantiated to a list consisting of all the error messages that may have been printed out while the Commands were executed.

Compiling Regulus to Nuance from the command line

regulus_batch/2 can be used to write scripts that invoke Regulus to compile grammars. For example, loading the following file into Prolog invokes Regulus to compile a recogniser for the Toy1 grammar:

% Load following file to define library directories etc
:- ['$REGULUS/Examples/Toy1/prolog/library_declarations'].

% Compile the main Regulus code
:- compile('$REGULUS/Prolog/load').

% Do Regulus to Nuance compilation
:- regulus_batch('$REGULUS/Examples/Toy1/scripts/toy1.cfg', ["NUANCE"]).

:- halt.

Similarly, loading the following file into Prolog invokes Regulus to compile a recogniser for the PSA grammar, using grammar specialisation.

% Define library directories etc
:- ['$REGULUS/Examples/PSA/prolog/library_declarations'].

% Compile the main Regulus code
:- compile('$REGULUS/Prolog/load').

% Load, do EBL training and post-processing, and compile specialised grammar to Nuance.
:- regulus_batch('$REGULUS/Examples/PSA/scripts/psa.cfg',
["LOAD", "EBL_TREEBANK", "EBL_TRAIN", "EBL_POSTPROCESS", "EBL_NUANCE"]).

:- halt.

You can run these Prolog files from the command line by calling SICStus with the -l flag. For example, if the first of the two files above is in the file $REGULUS/Examples/Toy1/scripts/compile_to_nuance.pl , then we can perform the compilation from the command line with the invocation

sicstus -l $REGULUS/Examples/Toy1/scripts/compile_to_nuance.pl

Running the development environment with speech input

It is possible to give spoken instead of written input in the Regulus top loop. This first requires performing the following steps:

1. Make sure that /usr/bin (UNIX) or c:/cygwin/bin (Windows/Cygwin) are in your path.

Add a line to your config file which defines one of the parameters translation_rec_params or dialogue_rec_params. This needs at a minimum to specify where to find the compiled Nuance recognition package, and what the top-level grammar is. A typical value might be

regulus_config(translation_rec_params, [package=toy1_runtime(recogniser), grammar='.MAIN']).

�

It should now be possible to load the recognition package using the command LOAD_RECOGNITION. This should start processes for the Nuance license manager, the Nuance recserver, and the Regulus Speech Server (regserver), and will normally take about a minute.

Once the speech resources have been loaded, the RECOGNISE command will take spoken input from the microphone, performing recognition using the loaded package.

Calling the Regulus parser from Prolog

It is possible to call the Regulus parser from Prolog using the predicate

atom_to_parse_using_current_parser(+Sent, +Grammar, -Parse)

Here, Sent is the utterance to be parsed and Grammar is the top-level grammar to use, both represented as Prolog atoms; Parse is the resulting parse.

The predicate assumes that some grammar is currently loaded. Most often, this has been done using an invocation of regulus_batch/2; Grammar is usually the atom '.MAIN'. A typical invocation sequence is thus something like the following:

regulus_batch('$REGULUS/Examples/Toy1/scripts/toy1.cfg', ["LOAD"])

(�intervening code�)

atom_to_parse_using_current_parser('switch on the light', '.MAIN', Parse)

Regulus grammar examples

This section presents some illustrative Regulus grammars and their translations into GSL. In a later section we describe Regulus syntax more formally. Comments in the grammars are in italics and preceded Prolog-style by a percent sign.

We start with a minimal toy grammar, which covers a few phrases like "a dog" or "two cats". The only point of interest is that we want to block phrases like *"a dogs" or *"two cat", which combine a singular specifier and a plural noun, or vice versa.

Toy0 (Regulus version)

% "num_value" is a feature value space with possible values "sing" and "plur"
feature_value_space(num_value, [[sing, plur]]).

% "num" is a feature taking values in "num_value"
feature(num, num_value).

% ".MAIN" is a category which allows global slot-filling and has no syntactic features
category('.MAIN', [gsem]).

% "np", "spec" and "n" are all categories which allow a semantic return value and have one syntactic feature, "num"
category(np, [sem, num]).
category(spec, [sem, num]).
category(n, [sem, num]).

% ".MAIN" is a top-level grammar
top_level_category('.MAIN').

% ".MAIN" can be rewritten to "np"
'.MAIN':[gsem=[value=S]] --> np:[sem=S].

% "np" can be rewritten to "spec" followed by "n". The "spec" and "num" have to agree on the value of the "num" feature.
np:[sem=[spec=S, num=N], num=Num] -->
spec:[sem=S, num=Num], n:[sem=N, num=Num].

% Lexicon entries

% "a" is a singular "spec"
spec:[sem=a, num=sing] --> a.

% "two" is a plural "spec"
spec:[sem=2, num=plur] --> two.

% "the" is a "spec" that can be either singular or plural
spec:[sem=the, num=(sing\/plur)] --> the.

% "cat" and "dog" are singular "n"s
n:[sem=cat, num=sing] --> cat.
n:[sem=dog, num=sing] --> dog.

% "cat" and "dog" are plural "n"s
n:[sem=cat, num=plur] --> cats.
n:[sem=dog, num=plur] --> dogs.

This grammar compiles to the following GSL grammar:

Toy0 (GSL version)

.MAIN

[ ( NP_ANY:v_0 ) { < value $v_0 > } ]

NP_ANY
[
( NP_PLUR:v_0 ) {return( $v_0 )}
( NP_SING:v_0 ) {return( $v_0 )}
]

NP_PLUR
[ ( SPEC_PLUR:v_0 N_PLUR:v_1 ) {return( [ < spec $v_0 > < num $v_1 > ] )}]

NP_SING
[ ( SPEC_SING:v_0 N_SING:v_1 ) {return( [ < spec $v_0 > < num $v_1 > ] )}]

SPEC_SING
[
( a ) {return( a )}
( the ) {return( the )}
]

SPEC_PLUR
[
( two ) {return( 2 )}
( the ) {return( the )}
]

N_SING
[
( cat ) {return( cat )}
( dog ) {return( dog )}
]

N_PLUR
[
( cats ) {return( cat )}
( dogs ) {return( dog )}
]

Our second example is a little more realistic, and shows how complex structures can be built up using the GSL "concat" operator.

Toy1 (Regulus version)

% Declarations

feature_value_space(number_value, [[sing, plur]]).
feature_value_space(vform_value, [[imperative, finite]]).
feature_value_space(vtype_value, [[transitive, switch, be]]).
feature_value_space(sem_np_type_value, [[n, location, switchable, dimmable]]).

feature(number, number_value).
feature(vform, vform_value).
feature(vtype, vtype_value).
feature(sem_np_type, sem_np_type_value).
feature(obj_sem_np_type, sem_np_type_value).

top_level_category('.MAIN').

category('.MAIN', [gsem]).
category(utterance, [sem]).
category(command, [sem]).
category(yn_question, [sem]).
category(np, [sem, number, sem_np_type]).
category(location_pp, [sem]).
category(noun, [sem, number, sem_np_type]).
category(spec, [sem, number]).
category(verb, [sem, number, vform, vtype, obj_sem_np_type]).
category(onoff, [sem]).

% Grammar

'.MAIN':[gsem=[value=S]] -->
utterance:[sem=S].

utterance:[sem=S] -->
( command:[sem=S] ;
yn_question:[sem=S]
).

command:[sem=concat([[type, command]], concat(Op, concat(OnOff, Np)))] -->
verb:[sem=Op, vform=imperative, vtype=switch, obj_sem_np_type=ObjType],
onoff:[sem=OnOff],
np:[sem=Np, sem_np_type=ObjType].

command:[sem=concat([[type, command]], concat(Op, Np))] -->
verb:[sem=Op, vform=imperative, vtype=transitive, obj_sem_np_type=ObjType],
np:[sem=Np, sem_np_type=ObjType].

yn_question:[sem=concat([[type, query]], concat(Verb, concat(OnOff, Np)))] -->
verb:[sem=Verb, vform=finite, vtype=be, number=N, obj_sem_np_type=n],
np:[sem=Np, number=N, sem_np_type=switchable],
onoff:[sem=OnOff].

% Discard semantic contribution of spec...
np:[sem=concat(Noun, Loc), number=N, sem_np_type=SemType] -->
spec:[sem=Spec, number=N],
noun:[sem=Noun, number=N, sem_np_type=SemType],
?location_pp:[sem=Loc].

location_pp:[sem=Loc] -->
in,
np:[sem=Loc, sem_np_type=location].

% Lexicon

verb:[sem=[[state, be]], vform=finite, vtype=be, number=sing,
obj_sem_np_type=n] --> is.
verb:[sem=[[state, be]], vform=finite, vtype=be, number=plur,
obj_sem_np_type=n] --> are.

verb:[sem=[[action, switch]], vform=imperative, vtype=switch, number=sing,
obj_sem_np_type=switchable] --> switch.
verb:[sem=[[action, switch]], vform=imperative, vtype=switch, number=sing,
obj_sem_np_type=switchable] --> turn.

verb:[sem=[[action, dim]], vform=imperative, vtype=transitive, number=sing,
obj_sem_np_type=dimmable] --> dim.

noun:[sem=[[device, light]], sem_np_type=switchable\/dimmable, number=sing] --> light.
noun:[sem=[[device, light]], sem_np_type=switchable\/dimmable, number=plur] --> lights.
noun:[sem=[[device, fan]], sem_np_type=switchable, number=sing] --> fan.
noun:[sem=[[device, fan]], sem_np_type=switchable, number=plur] --> fans.

noun:[sem=[[location, kitchen]], sem_np_type=location, number=sing] --> kitchen.
noun:[sem=[[location, living_room]], sem_np_type=location, number=sing] --> living, room.

spec:[sem=the, number=sing] --> the.
spec:[sem=all, number=plur] --> the.
spec:[sem=all, number=plur] --> all, ?((?of, the)).

onoff:[sem=[[onoff=on]]] --> ?switched, on.
onoff:[sem=[[onoff=off]]] --> ?switched, off.

Toy1 (GSL version)

.MAIN
[( UTTERANCE:v_0 ) { < value $v_0 > }]

UTTERANCE
[
( COMMAND:v_0 ) {return( $v_0 )}
( YN_QUESTION:v_0 ) {return( $v_0 )}
]

COMMAND
[
( VERB_ANY_DIMMABLE_IMPERATIVE_TRANSITIVE:v_0 NP_ANY_DIMMABLE:v_1 )
{return( concat( ( ( type command ) ) concat( $v_0 $v_1 ) ) )}
( VERB_ANY_SWITCHABLE_IMPERATIVE_SWITCH:v_0 ONOFF:v_1 NP_ANY_SWITCHABLE:v_2 )
{return( concat( ( ( type command ) ) concat( $v_0 concat( $v_1 $v_2 ) ) ) )}
]

NP_ANY_DIMMABLE
[
( NP_PLUR_DIMMABLE:v_0 ) {return( $v_0 )}
( NP_SING_DIMMABLE:v_0 ) {return( $v_0 )}
]

NP_ANY_LOCATION
[( NP_SING_LOCATION:v_0 ){return( $v_0 )}]

NP_ANY_SWITCHABLE
[
( NP_PLUR_SWITCHABLE:v_0 ) {return( $v_0 )}
( NP_SING_SWITCHABLE:v_0 ) {return( $v_0 )}
]

VERB_ANY_SWITCHABLE_IMPERATIVE_SWITCH
[( VERB_SING_SWITCHABLE_IMPERATIVE_SWITCH:v_0 ) {return( $v_0 )}]

VERB_ANY_DIMMABLE_IMPERATIVE_TRANSITIVE
[( VERB_SING_DIMMABLE_IMPERATIVE_TRANSITIVE:v_0 ) {return( $v_0 )}]

YN_QUESTION
[
( VERB_PLUR_N_FINITE_BE:v_0 NP_PLUR_SWITCHABLE:v_2 ONOFF:v_1 )
{return( concat( ( ( type query ) ) concat( $v_0 concat( $v_1 $v_2 ) ) ) )}
( VERB_SING_N_FINITE_BE:v_0 NP_SING_SWITCHABLE:v_2 ONOFF:v_1 )
{return( concat( ( ( type query ) ) concat( $v_0 concat( $v_1 $v_2 ) ) ) )}
]

NP_PLUR_DIMMABLE
[( SPEC_PLUR:v_2 NOUN_PLUR_DIMMABLE:v_0 ?(LOCATION_PP:v_1) ) {return( concat( $v_0 $v_1 ) )}]

NP_PLUR_SWITCHABLE
[( SPEC_PLUR:v_2 NOUN_PLUR_SWITCHABLE:v_0 ?(LOCATION_PP:v_1) ) {return( concat( $v_0 $v_1 ))}]

NP_SING_DIMMABLE
[( SPEC_SING:v_2 NOUN_SING_DIMMABLE:v_0 ?(LOCATION_PP:v_1) ) {return( concat( $v_0 $v_1 ) )}]

NP_SING_LOCATION
[( SPEC_SING:v_2 NOUN_SING_LOCATION:v_0 ?(LOCATION_PP:v_1) ) {return( concat( $v_0 $v_1 ) )}]

NP_SING_SWITCHABLE
[( SPEC_SING:v_2 NOUN_SING_SWITCHABLE:v_0 ?(LOCATION_PP:v_1) ){return( concat( $v_0 $v_1 ) )}]

LOCATION_PP
[( in NP_ANY_LOCATION:v_0 ) {return( $v_0 )}]

VERB_SING_N_FINITE_BE
[( is ) {return( ( ( state be ) ) )}]

VERB_PLUR_N_FINITE_BE
[( are ) {return( ( ( state be ) ) )}]

VERB_SING_SWITCHABLE_IMPERATIVE_SWITCH
[
( switch ) {return( ( ( action switch ) ) )}
( turn ) {return( ( ( action switch ) ) )}
]

VERB_SING_DIMMABLE_IMPERATIVE_TRANSITIVE
[( dim ) {return( ( ( action dim ) ) )}]

NOUN_SING_DIMMABLE
[( light ) {return( ( ( device light ) ) )}]

NOUN_SING_SWITCHABLE
[
( fan ) {return( ( ( device fan ) ) )}
( light ) {return( ( ( device light ) ) )}
]

NOUN_PLUR_DIMMABLE
[( lights ) {return( ( ( device light ) ) )}]

NOUN_PLUR_SWITCHABLE
[
( fans ) {return( ( ( device fan ) ) )}
( lights ) {return( ( ( device light ) ) )}
]

NOUN_SING_LOCATION
[
( kitchen ) {return( ( ( location kitchen ) ) )}
( living room ) {return( ( ( location living_room ) ) )}
]

SPEC_SING
[( the ) {return( the )}]

SPEC_PLUR
[
( the ) {return( all )}
( all ?(?(of) the) ) {return( all )}
]

ONOFF
[
( ?(switched) off ) {return( ( [ < onoff off > ] ) )}
( ?(switched) on ) {return( ( [ < onoff on > ] ) )}
]

Building recognisers using grammar specialisation

You may simply want to use Regulus to compile your own unification grammars into Nuance GSL. Experience shows, however, that complex natural language grammars tend to have a lot of common structure, since they ultimately have to model general linguistic facts about English and other natural languages. There are consequently good reasons for wanting to save effort by implementing a SINGLE domain-independent core grammar, and producing domain-independent versions out of it using some kind of specialisation process.
Regulus includes an experimental system which attempts to deliver this functionality. There is a general unification grammar for English , containing about 145 rules, and an accompanying core lexicon. For a given domain, you will need to extend the general grammar in some way. In particular, you will need to supplement the core lexicon with a domain-specific lexicon that you will write yourself. You will then be able to use the grammar specialisation tools to transform a small training corpus into a specialised version of the grammar.

The general English grammar

The general English grammar is in the directory $REGULUS/Grammar. It contains the following files:

� general_eng.regulus. The grammar rules and declarations.

� gen_eng_lex.regulus. Core function-word lexicon and some macro definitions useful for writing lexicon entries.

� riacs_sem.regulus. Definitions for semantics macros that produce a QLF-like semantics based on those used in the RIACS PSA system.

� nested_sem.regulus. Definitions for semantics macros that produce a minimal list-based recursive semantics.

� linear_sem.regulus. Definitions for semantics macros that produce a minimal list-based non-recursive semantics.

The multiple semantics files reflect the fact that the semantics in general_eng.regulus and gen_eng_lex.regulus are all defined in terms of macros.

Building on top of the general English grammar

A grammar based on the general English grammar should include general_eng.regulus, gen_eng_lex.regulus and exactly one of the semantics macros files. It will typically also contain a domain-specific lexicon file. These files are declared in the config file for the application.

If riacs_sem.regulus is used, the config file must contain the line

regulus_config(lf_postproc_pred, riacs_postproc_lf).

This specifies that the initial results from Nuance recognition are to be postprocessed using the predicate riacs_postproc_lf .

Lexicon macros

You are strongly advised to define domain-specific lexical entries using the macros from $REGULUS/Grammar/gen_eng_lex_entries.regulus. This file includes documentation and examples, using the format illustrated by the following example for the macro v_intransitive:

% Intransitive
% e.g. "John sleeps"
%
% @v_intransitive([sleep, sleeps, slept, slept, sleeping],
%                 [action, sleep], [agent], [takes_time_pp=y, takes_frequency_pp=y, takes_duration_pp=y]).

macro(v_intransitive(SurfaceForms, [SemType, SemConstant], [SubjSortalType], OtherFeats),
      @verb(SurfaceForms,
        [ @verb_sem(SemType, SemConstant)],
        [subcat=nx0v,
         inv=n,
         subj_sem_n_type=SubjSortalType |
        OtherFeats])).

Simple examples of how to use lexicon macros are in the domain-specific lexicon file for the Toy1Specialised application, $REGULUS/Examples/Toy1Specialised/Regulus/toy1_lex.regulus.

A wider range of examples can be found in the English lexicon in the Open Source MedSLT project : look at the file MedSLT2/Eng/Regulus/med_lex.regulus.

Invoking the grammar specialiser

A grammar built on top of the general grammar is transformed into a specialised Nuance grammar in the following processing stages:

1. The EBL training corpus (defined by the config file parameter ebl_corpus ) is parsed into a "treebank" of parsed representations. This is done using the Regulus command EBL_TREEBANK .

2. The treebank is used to produce a "raw" specialised Regulus grammar, using the EBL algorithm. This is done using the Regulus command EBL_TRAIN . The granularity of the learned rules is defined by the config file parameter ebl_operationality. This parameter should have the value file(<File>), where <File> identifies a file containing operationality definitions.

3. The "raw" specialised Regulus grammar is post-processed into the final specialised grammar. This is done using the Regulus command EBL_POSTPROCESS . The post-processing stage consists of three steps:

o Duplicate rules are merged, keeping only the different training examples as documentation.

o Specialised rules are sorted by number of training examples.

o If there are enough training examples for a rule, it is further constrained to unify with the the least common generalisation of all the contexts in which it has occurred. The threshold which determines when this happens is defined by the config file parameter ebl_context_use_threshold.

4. The final specialised Regulus grammar is compiled into the Nuance grammar. This is done using the Regulus command EBL_NUANCE .

Writing operationality definitions

The operationality definitions file contains declarations which specify how example trees from the "treebank" file are to be cut up into smaller trees, which are then flattened into specialised rules. The basic idea is to traverse the tree in a downward direction, starting with the root node: throughout the traversal process, the system maintains a value called the "context", which by default remains the same when moving downwards over a node. Each node is labelled with the associated category. Operationality declarations can do one of two things. Usually they will cut the tree, starting a new rule and simultaneously changing the value of the context. Occasionally, they will just change the value of the context without cutting the tree.

Operationality rules of both types syntactically have the form of Prolog rules. They are respectively of the forms

change_rule_and_context(<OldContext>, <NewContext>) :- <Conditions>
change_context(<OldContext>, <NewContext>) :- <Conditions>

where <OldContext>, <NewContext> are respectively the old and new values of the context, and <Conditions> is a set of conditions on the node.

The <Conditions> have the form of the body of a Prolog rule, and can include the usual Prolog logical connectors: conjunction (","), disjunction (";") and negation ("\+"). Four different primitives are currently available for constraining the node:

� cat(<CatName>). The category symbol for the node is <CatName>.

� dominates(<CatName>). The node dominates (directly or indirectly) another node whose category symbol is <CatName>.

� immediately_dominates(<CatName>). The node immediately dominates another node whose category symbol is <CatName>.

� lexical. The node is a lexical node, i.e. dominates only terminal symbols.

� gap. The node has null yield, i.e. dominates no terminal symbols.

A simple example of an operationality definitions file can be found in $REGULUS/Examples/Toy1Specialised/Prolog/operationality.pl. Derivation trees are cut up and flattened to produce a simple grammar with rules for UTTERANCE (the top category), NP, POST_MODS and lexical items. The definitions are as follows:

% Start new rule at UTTERANCE
change_rule_and_context(_Context, utterance) :-
    cat(utterance),
    \+ gap.

% Start new rule at NP or POST_MODS if under UTTERANCE
change_rule_and_context(utterance, np) :-
    cat(np),
    \+ gap.
change_rule_and_context(utterance, post_mods) :-
    cat(post_mods),
    \+ gap.

% Start new rule at NP if under POST_MODS
change_rule_and_context(post_mods, np) :-
    cat(np),
    \+ gap.

% Start new rule at POST_MODS if under NP
change_rule_and_context(np, post_mods) :-
    cat(post_mods),
    \+ gap.

% Always start new rule at lexical node
change_rule_and_context(_Context, lexical) :-
    lexical.

Defining multiple top-level specialised grammars

It is possible to use the grammar specialisation mechanism to produce multiple top-level specialised grammars. If you want to do this, you must first define a set of tags, which will label the different grammars. Each example in the EBL training corpus (defined by the config file parameter ebl_corpus ) must then be labelled with some subset of the grammar tags, to indicate which grammar or grammars it applies to. For example, if a home control domain had separate grammars for each room, the tags would be the names of the rooms (bedroom, kitchen, living_room and so on), and typical lines in the training file might be

sent('switch off the light', [bathroom, bedroom, kitchen, living_room, wc]).
sent('turn on the tap', [bathroom, kitchen, wc]).
sent('flush the toilet', [wc]).

The specialised Nuance grammar file produced by the EBL_NUANCE command will contain one top-level grammar for each tag. The name of the top-level Nuance grammar corresponding to the tag <Tag> will be .MAIN__<Tag> (note the double underscore), so for example the grammar for kitchen will be .MAIN__kitchen. The tag default is treated specially, and produces the top-level grammar .MAIN.

Ignoring subdomains to speed up compilation

Sometimes, you will have defined multiple subdomains, but you will only be carrying out development in one of them. In this case, you can speed up EBL training by temporarily adding ignore_subdomain declarations in the config file. An ignore_subdomain declaration has the form

regulus_config(ignore_subdomain, <Tag>).

The effect is to remove all references to <Tag> when performing training, and not build any specialised grammar for <Tag>. You may include any number of ignore_subdomain declarations.

Handling ambiguity when constructing the treebank

In general, the corpus utterances used as input to the treebank may be ambiguous. In most cases, the first analysis produced will the intended one. When the first analysis is not the intended one, it is possible to annotate the training corpus so as to choose a different analysis, by using the optional third argument of the sent record. This third argument should be a list of constraints on the logical form. Constraints may currently be of the following forms:

� lf_includes_structure=<Structure>. The logical form must contain a subterm that unifies with <Structure>

� lf_doesnt_include_structure=<Structure>. The logical form may not contain any subterm that unifies with <Structure>

� tree_includes_structure=<Structure>. The parse tree must contain a subterm that unifies with <Structure>

� tree_doesnt_include_structure=<Structure>. The parse tree may not contain any subterm that unifies with <Structure>

For example, suppose that the training utterance is 'i read the meter', which is ambiguous since 'read' can be either present or past tense. If we want to choose the present tense interpretation and reject the past tense interpretation, we can write the training corpus example in either of the following ways:

� sent('i read the meter', [default], [lf_includes_structure=[tense, present]]).

� sent('i read the meter', [default], [lf_doesnt_include_structure=[tense, past]]).

Tree-based preferences assume an encoding of the parse tree illustrated by the following example. Suppose that we have the training example "what is your temperature". With the general English grammar, this can be analysed in at least two ways, depending on whether the word-order is inverted or uninverted. Suppose that we want to block the inverted word-order. This analysis will have the parse-tree

.MAIN [GENERAL_ENG:504-509]
�� top [GENERAL_ENG:515-521]
�� /� utterance_intro null [GENERAL_ENG:529-531]
�� |� utterance [GENERAL_ENG:578-583]
�� |�� s [GENERAL_ENG:658-663]
�� |�� s [GENERAL_ENG:689-700]
�� |�� /� np [GENERAL_ENG:1852-1859]
�� |�� |�� d lex(what) [GEN_ENG_LEX:364-364]
�� |�� |� s [GENERAL_ENG:774-783]
�� |�� |�� vp [GENERAL_ENG:1252-1269]
�� |�� |�� /� vp [GENERAL_ENG:999-1008]
�� |�� |�� |� /� vbar [GENERAL_ENG:833-855]
�� |�� |�� |� |� /� v lex(is) [MED_LEX:314-322]
�� |�� |�� |� |� |� np [GENERAL_ENG:1942-1957]
�� |�� |�� |� |� |� /� np [GENERAL_ENG:1818-1826]
�� |�� |�� |� |� |� |� /� possessive lex(your) [GEN_ENG_LEX:377-377]
�� |�� |�� |� |� |� |� |� nbar [GENERAL_ENG:1982-1992]
�� |� ��|�� |� |� |� |� \�� n lex(temperature) [MED_LEX:501-501]
�� |�� |�� |� |� \� \� post_mods null [GENERAL_ENG:1383-1389]
�� |�� |�� |� |� np [GENERAL_ENG:1942-1957]
�� |�� |�� |� |� /� np null [GENERAL_ENG:2180-2191]
�� |�� |�� |� \� \� post_mods null [GENERAL_ENG:1383-1389]
�� |�� \�� \� post_mods null [GENERAL_ENG:1383-1389]
�� \� utterance_coda null [GENERAL_ENG:560-562]

which for the purposes of tree-based preferences will be represented as the Prolog term

(.MAIN <
�[(top <
�� [utterance_intro<null,
�� (utterance <
�� [(s <
�� [(s <
�� [np<[d<lex(what)],
�� (s <
�� [(vp <
�� [(vp <
�� [(vbar <
�� [v<lex(is),
�� np<[np<[possessive<lex(your),nbar<[n<lex(temperature)]],post_mods<null]]),
�� np<[np<null,post_mods<null]]),
�� post_mods<null])])])])]),
�� utterance_coda<null])])

We can choose not to allow this analysis by using a negative tree-based constraint matching a suitable substructure. For example, the constraint

tree_doesnt_include_structure=(s < [(np < _), (s < _)])

will block trees containing a fronted NP, and

tree_doesnt_include_structure=(np < null)

will block trees containing an NP gap.

Default parse preferences

It will often be the case that the same type of constraint will be required for many similar items in the treebank. For example, in Spanish the determiner "un" can be either an indefinite article ("a") or a number ("one"). In general, we will prefer the indefinite article reading, but some words like time-units will make it more likely that the number reading is to be preferred.

The best way to handle situation like these is to define default parse preferences, which are declared in the parse_preferences file. A record in the parse_preferences file is of the form

parse_preference_score(<Pattern>, <Score>).

where <Pattern> is an LF pattern and <Score> is a numerical score. Patterns can be either constraints of the type shown above, or Boolean combinations of these constraints formed using the usual Prolog operators "," (conjunction), ";" (disjunction) and "\+" (negation). The Spanish determiner example can be handled as follows using LF-based preferences:

% By default, disprefer readings where "un/una" is interpreted as a number
parse_preference_score(lf_includes_structure=[number,1], -1).

% But prefer to interpret "un/una" as number if it occurs with a timeunit
parse_preference_score((lf_includes_structure=[number,1],� lf_includes_structure=[timeunit,_]), 5).

Making Regulus to Nuance compilation more efficient by ignoring features

For any given specialised grammar, there will probably be several features deriving from the general grammar which have no appreciable positive effect in terms of constraining the language model. Although these features are essentially useless, they can still slow down Regulus to Nuance compilation very substantially, or even cause it to exceed resource limits.

It is possible to force the compiler to ignore features, by using the ebl_ignore_feats config file parameter. For example, the declaration

regulus_config(ebl_ignore_feats, [syn_type, subj_syn_type, obj_syn_type, indobj_syn_type]).

says that all the features in the "syn_type" group are to be ignored.

If you want to try to optimise performance by ignoring features, we recommend that you start by looking at the following groups:

1. "syn_type" features: syn_type, subj_syn_type, obj_syn_type, indobj_syn_type.

These features are only useful if you are using the ebl_context_use_threshold parameter, and can otherwise be safely ignored.

2. "def" features: def, subj_def, obj_def, indobj_def

These features can be used to constrain NPs with respect to definiteness, but we have found them to be of very limited value. They can probably also be ignored in most applications.

Including lexicon entries directly into the specialised grammar

One way to add lexicon entries to the specialised grammar is just to add suitable training examples in the corpus. You can also include lexicon entries directly from the general grammar. To do this, you need to write one or more files of include_lex entries, and add an ebl_include_lex entry to the config file, which points to them. So for example if your include_lex entries are in the file $MY_APP/my_lex_includes.pl, your config file needs the line

regulus_config(ebl_include_lex, '$MY_APP/my_lex_includes.pl').

The format of an include_lex entry is

include_lex(<Cat>:[words=<Words>, sem=<Sem>], <Tags>).

This says to include all lexicon entries of category <Cat>, whose surface form is <Words> and whose logical form contains a subterm matching <Sem>, in the specialised grammars whose tags are in the list <Tags>. Thus for example the declaration

include_lex(v:[words=start, sem=start_happening], [gram1, gram2]).

says to include in gram1 and gram2 the v entries whose surface form is start, and whose logical forms contain the atom start_happening. Note that <Sem> can be partially instantiated: for example

include_lex(v:[sem=[event, _]], [gram1]).

says to include in gram1 all the v entries whose logical forms contain a term matching [event, _]. You can use this feature to include all entries of a particular semantic class.

<Words>, <Sem> and <Tags> are optional, and in practice you will usually omit some or all of them. So for example

include_lex(v:[words=start, sem=start_happening]).

says to include the v entries whose surface form is start, and whose logical forms contain the atom start_happening in the single default specialised grammar;

include_lex(v:[sem=start_happening]).

says to include all v entries whose logical forms contain the atom start_happening in the single default specialised grammar; and

include_lex(v:[]).

says to include all v entries in the single default specialised grammar.

Overriding include_lex declarations

In practice, you often want to include nearly all of the entries matching some pattern, omitting just a few problem cases. You can do this with dont_include_lex entries. The format of a dont_include_lex entry is the same as that of an include_lex entry, i.e.

dont_include_lex(<Cat>:[words=<Words>, sem=<Sem>], <Tags>).

dont_include_lex entries take precedence over include_lex entries. Thus for example, the following entries say to include all English verb entries except those for the reduced forms of "is", "are", "am", "has", "had" and have":

% Add all entries for verbs
include_lex(v:[]).

% ... except a few auxiliaries that cause problems
dont_include_lex(v:[words='\'s']).
dont_include_lex(v:[words='\'re']).
dont_include_lex(v:[words='\'m']).
dont_include_lex(v:[words='\'d']).
dont_include_lex(v:[words='\'ve']).

Conditional include_lex declarations

It is also possible to write conditional include_lex declarations. These are intended to be used to cover the case where you want in effect to say "include all the inflected forms of any entry matching this pattern, if you see any instance of it".

A conditional include_lex declaration has the specific form

include_lex(<Cat>:[sem=<Pattern1>], Tags) :-
rule_exists(<Cat>:[sem=<Pattern2>], Tags).

where normally <Pattern1> and <Pattern2> will share variables. So for example

include_lex(v:[sem=[Type, Word]], Tags) :-
rule_exists(v:[sem=[[tense, Tense], [Type, Word]]], Tags).

says "include any v entry with a sem value including the subterm [Type, Word], if you have learned a rule for a v whose sem value exactly matches [[tense, Tense], [Type, Word]]".

Creating class N-gram grammars from specialised grammars

Nuance provides tools for creating class N-gram grammars, using the SayAnything package. For details of how to use SayAnything, see the Nuance documentation: the least trivial step, however, is usually writing the "tagging grammar', which defines the backoff classes. Sometimes, you may be in the situation of having already constructed a specialised Regulus grammar, and wanting to build a class N-gram grammar with similar coverage. Regulus provides a tool that allows you to define the classes by example: each class is specified by naming two or more lexical items, and consists of all the lexical items that match the common generalisation of the examples. The tool is packaged as the Prolog predicate

specialised_regulus2nuance_tagging(+RegulusGrammarFile, +SpecFile, +TaggingGrammarFile, +Debug, +TopGrammar)

defined in the file $REGULUS/Prolog/specialised_regulus2nuance_tagging.pl. The arguments are as follows:

� RegulusGrammarFile is a "no_binarise" specialised Regulus grammar file

� SpecFile is a file of items of the form

tagging_class(<ClassId>, <Examples>)

where

o <ClassId> is an atom that can be used as a Nuance grammar name

o <Examples> is a list of at least two lexical items, either atoms or comma-lists

� TaggingGrammarFile is an output Nuance GSL tagging grammar file

� Debug is one of {debug, nodebug}

� TopGrammar is the name of the top-level generated grammar

TaggingGrammarFile is created from RegulusGrammarFile and SpecFile by constructing one tagging grammar for each tagging_class declaration in SpecFile. The tagging grammar for tagging_class(<Grammar>, <Examples>) is constructed as follows:

1. Go through RegulusGrammarFile finding the lexicon entries matching <Examples>

2. Construct the anti-unification of the LHS categories in all these lexicon entries, to create a category Pattern.

3. Find all Words such that there is a lexicon entry matching Pattern --> Words

4. The generated GSL grammar <Grammar> is

<Grammar>
[
Words_1
Words_2
...
Words_n
]

The top-level GSL grammar is

.MAIN
[
<Grammar_1>
<Grammar_2>
...
<Grammar_n>
]

Here is an example of using the tool, taken from the Japanese version of the MedSLT system:

:- use_module('$REGULUS/Prolog/specialised_regulus2nuance_tagging').

:- specialised_regulus2nuance_tagging(
       '$MED_SLT2/Jap/GeneratedFiles/japanese_recognition_specialised_no_binarise_default.regulus',
       '$MED_SLT2/Jap/SLM/scripts/headache_tagging_grammar_spec.pl',
       '$MED_SLT2/Jap/SLM/med_generated_tagging_headache.grammar',
       debug,
       '.MAIN_tagging_headache').

:- halt.

Surface processing

Regulus permits a surface parsing method that can be used as an alternative to grammar-based parsing. Semantic representations are lists of elements produced by simple surface pattern matching. The surface parsing mode is switched on using the SURFACE command, and interacts cleanly with translation mode.

Files used for surface processing

In order to use surface parsing, you need to define the following config file entries:

� surface_patterns

� tagging_grammar

� target_model

� discriminants

� surface_postprocessing

The actual patterns are in the surface_patterns file, which is the only one described here; we currently recommend that the other files be filled with the placeholder values defined in the directory $MED_SLT2/Eng/Alterf. A later version of this documentation may describe how to develop non-trivial versions of these files.

If you want to use the surface processing rules to produce nested structures, you must also define a value for the config file entry surface_constitutent_rules. This is described further below.

Defining surface patterns

The surface_patterns file contains a set of declarations of the form

alterf_surface_pattern(<Pattern>, <Element>, <Doc>).

where the semantics are that if <Pattern> is matched in the surface string then <Element> is added to the semantic representation. The <Doc> field should be set to null or contain an example. The pattern language is illustrated by the following examples:

alterf_surface_pattern([pressing],[adj,pressing],null).

The word "pressing" produces the semantic element [adj, pressing].

alterf_surface_pattern([in,'...',morning/afternoon/evening],[prep,in_time],null).

The word "in", followed by a gap or zero or more words and one of the words "morning", "afternoon" or "evening", produces the semantic element [prep, in_time].

alterf_surface_pattern([not_word(least/than),once],[frequency,once],null).

The word "once", preceded by a word that is not "least" or "than", produces the semantic element [frequency, once].

alterf_surface_pattern(['is'/are/was, not(['...',increasing/decreasing/becoming])],[verb,be],null).

The words "is", "are" or "was", not followed by a gap of zero or more words and one of the words "increasing", "decreasing" or "becoming", produces the semantic element [verb, be].

alterf_surface_pattern(['*start*',when],[time,when],null).

The word "when", occurring at the start of the utterance, produces the semantic element [time, when].

Using surface patterns to produce nested structures

It is possible to use the surface pattern rules to produce nested constituents. To do this, you need to define a value for the config file entry surface_constitutent_rules. The file this entry points to contains surface_constituent_boundary rules, that define when nested constituents start and end.

A surface_constituent_boundary rule is of the form

surface_constituent_boundary(<BeforePattern>, <AfterPattern>, <StartOrEnd>, <Tag>).

where

� <BeforePattern> and <AfterPattern> are surface patterns of the type defined in the preceding section.

� <StartOrEnd> is either start or end

� <Tag> is the tag attached to the nested constituent. Currently this must have the value clause.

If the surface parser reaches a point in the string where the immediately preceding words match <BeforePattern>, the immediately following words match <AfterPattern>, and <StartOrEnd> is start, then it will open a nested constituentof type <Tag>. The nested constituent is closed off either by reaching the end of the string, or by reaching a point where there is an end rule whose before- and after-patterns match.

For example, the rule

surface_constituent_boundary([when], [not_word(do/does/have/has/can)], start, clause).

says that a nested constitutent of type clause is started at a point where the preceding word is when, and the following word is not one of do, does, have, has or can.

Using Regulus grammars for generation

It is possible to compile a Regulus grammar into a generation grammar, using the LOAD_GENERATION command. The files and parameters involved are specified in the config file, as follows:

� The Regulus grammar to be compiled is specified using the regulus_grammar config file item. For historical reasons, you can also use the generation_regulus_grammar config item.

� The compiled version of the generator is placed in the Prolog file specified by the generation_grammar config item.

� The top-level rule in the Regulus grammar must be of the form

<TopLevelCat>:[gsem=[<TopLevelFeature>=Sem]] --> <Body>

The value of <TopLevelCat> is specified using the top_level_generation_cat config item. Default is .MAIN.

The value of <TopLevelFeature> is specified using the top_level_generation_feat config item. Default is value.

� The Prolog module for the compiled generator file is specified by the generation_module_name config item. Default is generator.

� The top-level Prolog predicate in the compiled generator file is specified by the top_level_generation_pred config item. Default is generate.

� Generation uses an iterative deepening algorithm, which initially sets a maximum derivation length of <Start>, and increases it in increments of <Increment> until it exceeds <Max>. These parameters are specified using the generation_incremental_deepening_parameters config item. The default value is [5, 5, 50].

A typical config file for compiling a generator looks like this:

% Use same grammar for analysis and generation (analysis just for development)
regulus_config(regulus_grammar, [french_generation_grammars(french_generation)]).
regulus_config(top_level_cat, '.MAIN').

% Where to put the compiled generation grammar
regulus_config(generation_grammar, fre_runtime('generator.pl')).

% Trivial settings for iterative deepening - perform one iteration, and allow anything of depth =< 50
regulus_config(generation_incremental_deepening_parameters, [0, 50, 50]).

regulus_config(working_file_prefix, fre_runtime(french_generation)).

If you are developing a generation grammar, you will often find it useful to run the top-level in generation mode, using the GENERATION command. When in generation mode, the system attempts to parse each utterance, and then generates back from the result using the generation grammar. If it is possible to generate several different strings, all of them will be displayed. In order for this to work, you have to first load the analysis grammar using the LOAD or EBL_LOAD commands, and also load the generation grammar using the LOAD_GENERATION command.

Note that if you are building a translation application, it will usually be convenient to have a separate config file for the target language generation grammar, which you will just use for developing this grammar.

Specialised generation grammars

You can compile grammars that have been created using grammar specialisation into generation form, using the command EBL_LOAD_GENERATION <SubdomainTag>. If you omit the argument, it is assumed to have the value 'default'. The specialised grammar with tag <SubdomainTag> is compiled into generation form, and the compiled version is stored in the location referenced by the config file entry generation_grammar(<SubdomainTag>).

Generation preferences

If the generation grammar is ambiguous, in the sense that several surface strings can be generated from one logical form, the order in which the strings are generated is in general not defined. You can induce a specific ordering by adding a generation preferences file. The format of entries in this file is

generation_preference(<WordList>, <PreferenceScore>).

where <WordList> is a Prolog list of surface words, and <PrefenceScore> is a positive or negative number. The effect of the generation preferences is to define a score on each generated string, calculated by summing the preference scores for all substrings. So for example, if the grammar generates both "on the evening" and "in the evening" as semantically indisinguishable expressions, we could prefer "in the evening" by adding one or both of the following declarations:

% Prefer "in the evening"
generation_preference([in, the, evening], 1).
% Disprefer "on the evening"
generation_preference([on, the, evening], -1).

Using Regulus for translation

Regulus possesses an extensive infrastructure allowing it to be used for building speech translation systems. Most of this part of the system has been developed under the Open Source MedSLT project, which contains further documentation and numerous examples. To use the translation mechanisms, you need to declare the necessary translation-related files in the config file. You can then run the top-level Regulus development environment in translation mode. Translation can be run both interactively and in batch. It is also possible to call the translation routines from Prolog, so as to incorporate them into a speech translation application.

Translation can be performed using either a transfer-based or an interlingual framework. In a transfer-based framework, source-language representations are transferred directly into target-language representations. In an interlingual framework, translation goes through the following levels of representation:

� Source level. The representation produced by the source language grammar.

� Source discourse level. This is intended to be a slightly regularised version of the source representation, suitable for carrying out ellipsis processing. Ellipsis processing makes it possible to translate elliptical phrases in the context of the preceding dialogue.

� Interlingual level. This is intended to act as a neutral representation suitable for reducing the differences between source and target language representation.

� Target level. The level from which the target language grammar generates surface form.

All kinds of transformations (source to target in the transfer based framework; source to source discourse, source discourse to interlingua and interlingua to target in the interlingual one) are implemented using the same transfer rule formalism.

When the target language representation has been produced, it needs to be converted into a surface string using a generation grammar. If the generation grammar is ambiguous (i.e. one representation can produce multiple surface string), it is possible to define generation preferences. The output of the generation grammar can be further post-processed using collocation rules and orthography rules.

Files used for translation

In order to build a translation application, you need to declare at least some of the following files.

� A transfer rules file (optional) defined by the transfer_rules config file entry.

� An interlingua_declarations file (optional) defined by the interlingua_declarations config file entry.

� A to_source_discourse_rules file (optional) defined by the to_source_discourse_rules config file entry.

� A to_interlingua rules file (optional) defined by the to_interlingua_rules config file entry.

� A from_interlingua rules file (optional) defined by the from_interlingua_rules config file entry.

� An ellipsis classes file (optional) defined by the ellipsis_classes config file entry. If this is defined, you need to compile it first using the COMPILE_ELLIPSIS_PATTERNS command.

� A collocations file (optional) defined by the collocation_rules config file entry.

� An orthography rules file (optional) defined by the orthography_rules config file entry

You must define EITHER a transfer_rules file OR both a to_interlingua rules file and a from_interlingua rules file.

The translation top-level

You can put the Regulus top-level into translation mode using the TRANSLATE command. You can then go back into normal mode using the NO_TRANSLATE command. You can switch between the transfer-based framework and the interlingual framework using the INTERLINGUA and NO_INTERLINGUA commands. The default in translation mode is to assume the transfer-based framework. Translation mode is compatible with surface parsing, so if you invoke the SURFACE command while in translation mode, you will perform translation using surface processing. This of course assumes that you have defined and loaded the files required for surface parsing.

The following example, which uses the English to French language version of the MedSLT system, illustrates interaction in translation mode. Note that the second sentence is translated in the context set up by the first one, with the ellipsis resolution being carried out at source_discourse level.

>> do you have headaches in the evening

Source: do you have headaches in the evening+*no_preceding_utterance*
Target: avez-vous vos maux de t�te le soir
Other info:
   n_parses=1
   source_representation=[[prep,in_time],[pronoun,you],[spec,the_sing],[state,have_symptom],[symptom,headache],[tense,present],[time,evening],[utterance_type,ynq],[voice,active]]
   source_discourse=[[pronoun,you],[state,have_symptom],[symptom,headache],[tense,present],[time,evening],[utterance_type,ynq],[voice,active]]
   resolved_source_discourse=[[pronoun,you],[state,have_symptom],[symptom,headache],[tense,present],[time,evening],[utterance_type,ynq],[voice,active]]
   resolution_processing=trivial
   interlingua=[[pronoun,you],[state,have_symptom],[symptom,headache],[tense,present],[time,evening],[utterance_type,ynq],[voice,active]]
   target_representation=[[pronoun,vous],[path_proc,avoir],[symptom,mal_de_t�te],[tense,present],[temporal,soir],[utterance_type,sentence],[voice,active]]
   n_generations=1
   other_translations=[]

>> in the morning

Source: in the morning+avez-vous vos maux de t�te le soir
Target: avez-vous vos maux de t�te le matin
Other info:
   n_parses=1
   source_representation=[[prep,in_time],[spec,the_sing],[time,morning],[utterance_type,phrase]]
   source_discourse=[[time,morning],[utterance_type,phrase]]
   resolved_source_discourse=[[pronoun,you],[state,have_symptom],[symptom,headache],[tense,present],[time,morning],[utterance_type,ynq],[voice,active]]
   resolution_processing=ellipsis_substitution(when_pain_appears)
   interlingua=[[pronoun,you],[state,have_symptom],[symptom,headache],[tense,present],[time,morning],[utterance_type,ynq],[voice,active]]
   target_representation=[[pronoun,vous],[path_proc,avoir],[symptom,mal_de_t�te],[tense,present],[temporal,matin],[utterance_type,sentence],[voice,active]]
   n_generations=1
   other_translations=[]

Running translation in batch mode

It is possible to perform batch translation, using input data in both text and speech form. The results can then be judged, and the judgements stored to use for future regression testing.

Batch translation of text data
Input text data for batch translation should be placed in the file defined by the translation_corpus config file entry. The format is sent(Atom), so for example a typical line would be

sent('switch off the light').

(note the closing period). Translation is performed using the TRANSLATE_CORPUS command. The output file is defined by the translation_corpus_results config file entry.

Batch translation of speech data
Input speech data for batch translation is in the form of recorded wavfiles. The names of these wavfiles, together with accompanying transcriptions, should be placed in the file defined by the translation_speech_corpus config file entry. The format is <Wavfile> <Words>, so for example a typical line would be

C:/Regulus/data/utt03.wav switch off the light

Translation is performed using the TRANSLATE_SPEECH_CORPUS command. The output file is defined by the translation_speech_corpus_results config file entry. A second output file, defined by the translation_corpus_tmp_recognition_judgements config file entry, contains "blank" recognition judgements.

Since it usually takes much longer to perform speech recognition on a batch file than to translate the resulting text, the TRANSLATE_SPEECH_CORPUS_AGAIN command makes it possible to re-run translation on the saved recognition results. This is useful if you are testing speech translation performance, but have only changed the translation or generation files.

Judging translations
To judge the results of translation, manually edit the output file. This should contains question marks for translations that have not yet been judged. If these are replaced by valid judgements, currently 'good', 'ok' or 'bad', the new judgements can be incorporated into the translation judgements file (defined by the translation_corpus_judgements config file entry) using the commands UPDATE_TRANSLATION_JUDGEMENTS (results of text translation) and UPDATE_TRANSLATION_JUDGEMENTS_SPEECH (results of speech translation).

You can revise judgements by simply changing them in the output file. If you then use UPDATE_TRANSLATION_JUDGEMENTS or a similar command, the translation judgements file will be updated appropriately.

Judging recognition
In speech mode, the second output file, defined by the translation_corpus_tmp_recognition_judgements config file entry, contains recognition judgements. This file should also be manually edited, and the question marks replaced with either 'y' (acceptable recognition) or 'n' (unacceptable recognition). The recognition judgements file can be updated using the UPDATE_RECOGNITION_JUDGEMENTS command. Sentences judged as unacceptably recognised are explicitly categorised as such, and not, for example, as translation errors.

Calling translation from Prolog

[Not yet written. How to incorporate translation into an application. What to load, in terms of both code and rules. Pointer to MedSLT.]

Translation rule files

Translation files can be used for both transfer and interlingual processing. In the case of transfer, there should be one file, specified by the transfer_rules config file entry. In the case of interlingua, there should be two files, specified by the from_interlingua_rules and to_interlingua_rules config file entries. The format of the files is the same in both cases. The intent is to define a mapping from a source representation to a target representation. Both representations are lists of items.

Translation files can contain two kinds of entries, respectively for transfer_lexicon items and transfer_rule items. A transfer_lexicon item is of the form

transfer_lexicon(SourceItem, TargetItem).

where the assumption is that SourceItem can appear as an ELEMENT of the list that makes up the source representation. The semantics is that the SourceItem is to be replaced in the target-language representation by the TargetItem. Here are some examples of English to French transfer lexicon items from the MedSLT system:

transfer_lexicon([adj, deep], [body_part, profond]).
transfer_lexicon([body_part, face], [body_part, visage]).
transfer_lexicon([event, relieve], [event, soulager]).
transfer_lexicon([freq, often], [frequency, souvent]).

A transfer_rule item is of the form

transfer_rule(SourceItemList, TargetItemList).

where the assumption is that SourceItemList is a LIST OF ONE OR MORE ELEMENTS in the list that makes up the source representation, and TargetItemList is a list of zero or more elements . The semantics are that the SourceItemList is to be replaced in the target-language representation by the TargetItemList. Here are some examples of English to French transfer rule items from the MedSLT system:

transfer_rule([[tense,present], [aspect,perfect]], [[tense, pass�_compos�]]).
transfer_rule([[event, make_adj], [adj, worse]], [[event, aggraver]]).
transfer_rule([[state, spread], [prep, to_loc]], [[path_proc, irradier]]).
transfer_rule([[spec, the_sing]], []).

Resolving conflicts between transfer rules

The order of the elements in the left- and right-hand sides of a transfer rule DOES NOT matter.
The ordering of transfer rules in the file DOES matter. If more than one rule can be applied, the choice is made as follows:

� If it is possible to apply a transfer rule and a transfer lexicon entry, then the transfer rule is chosen.

� If more than one transfer rule can be applied, then the rule with the longer left-hand side is chosen.

� If the criterion of choosing the longest left-hand side still results in a tie, then the rule appearing first in the file is chosen.

Conditional transfer rules

Transfer rules can optionally include conditions, which most often will be conditions on the context in which the rule is called. A transfer rule with conditions is of the form

transfer_rule(SourceItemList, TargetItemList) :- Conditions.

A simple condition is of one of the forms

� number(<Term>) where <Term> is any term.

� context(<Element>)where <Element> is a single representation element.

� context_above(<Element>) where <Element> is a single representation element.

� global_context(<Term>) where <Term> is any term.

The first case is number(<Term>). This constrains <Term> to be a number. For example, the rule

transfer_rule([[spec, N]], [[spec, N]]) :- number(N).

says that [spec, N] should be translated into [spec, N] in the case that N is a number.

The context element refers to the context of the current clause. For example, the following rule says that [tense,present] should be translated into into [tense, pass�_compos�] in a context which includes the element [frequency, ever]:

transfer_rule([[tense,present]], [[tense, pass�_compos�]]) :- context([frequency, ever]).

context_above is similar, but is used for rules that will be invoked inside representations of subordinate clauses. In the case, the context referred to is that of the main clause in which the subordinate clause appears. In contrast, the next rule says that [tense,present] should be translated into into [tense, past] in a subordinate clause which appears inside a main clause that includes the element [tense, past]:

transfer_rule([[tense,present]], [[tense, past]]) :- context_above([tense, past]).

Context elements may be partially uninstantiated, for example:

transfer_rule([[event, relieve]], [[event, soulager]]) :- context([cause, _]).

It is also possible to define rules dependent on global context. Global context is defined in the config file, using the global_context declaration. It can be accessed using a context condition of the form global_context(<Element>) where <Element> is a term potentially defined by a global_context declaration. For example, the following rule says that "it" should be translated into "headache" if there is a global context declaration of the form subdomain(headache):

transfer_rule([[pronoun, it]], [[symptom, headache]]) :- global_context(subdomain(headache)).

You can combine context elements using conjunction, disjunction and negation, as in usual Prolog syntax:

"Translate [event, relieve] into [event, soulager] if there is BOTH something matching [cause, _] AND something matching [tense, present] in the context"
transfer_rule([[event, relieve]], [[event, soulager]]) :- context([cause, _]), context([tense, present]).

"Translate [event, relieve] into [event, soulager] if there is� EITHER something matching [cause, _] OR something matching [symptom, _] in the context"
transfer_rule([[event, relieve]], [[event, soulager]]) :- context([cause, _]) ; context([symptom, _]).

"Translate [tense, present] into [tense, present] if there is� NOT anything matching [frequency, ever] in the context"
transfer_rule([[tense,present]], [[tense, present]]) :- \+ context([frequency, ever]).

Transfer variables

Transfer rules may also contain constrained transfer variables. The syntax of a constrained transfer variable is different depending on whether it occurs on the left hand or the right hand side of the rule.

On the left hand side, the variable is written as tr(<Id>, <Constraint>), where <Id> is an atomic identifier, and <Constraint> is a pattern matching a single LHS element. Thus the following are valid LHS transfer variables:

tr(device, [device, _])
tr(tense_var, [tense, _])

On the right hand side, the variable is written as simply tr(<Id>), where <Id> is an atomic identifier. Thus the following are valid RHS transfer variables:

tr(device)
tr(tense_var)

Each LHS variable must be associated with at least one RHS variable (normally, it will be associated with exactly one). Conversely, each RHS variable must be associated with exactly one LHS variable. The semantics are that if an element on the LHS which matches the constraint in the transfer variable tr(<Id>, <Constraint>) can be translated using other transfer rules, the translation will appear in the RHS in the place marked by the associated RHS variable(s) tr(<Id>). Thus the following example

transfer_rule([[action, switch], [onoff, off], tr(device, [device, _])],
[[action, switch_off], tr(device)]).

can be read: "Match the the elements [action, switch], [onoff, off] and [device, _] in the LHS, and replace them with the element [action, switch_off] together with the translation of the element matching [device, _]".

Bidirectional transfer rules and transfer lexicon entries

Transfer lexicon entries and unconditional transfer rules can be bidirectional. A bidirectional lexicon entry has the form

bidirectional_transfer_lexicon(LHSItem, RHSItem).

where the two arguments are of the same forms as those in a normal transfer lexicon entry. Similarly, a bidirectional transfer rule is of the form

bidirectional_transfer_rule(LHSItem, RHSItem).

where the arguments again have the same form as in a normal transfer rule.

By default, a bidirectional lexicon entry or rule is compiled in the same way as its normal counterpart. However, if the list of transfer rules contains the dummy element
�
transfer_direction(backward)

it is compiled as though the arguments had appeared in the reverse order, i.e. with RHSItem first and LHSItem second.

Translation tracing

The commands TRANSLATION_TRACE_ON and TRANSLATION_TRACE_OFF can be used to toggle translation tracing, which is off by default. When translation tracing is on, translation prints out a trace associating the source and target atoms used by each rule. Here is a typical example:

>> where is the pain

Transfer trace, "to_source_discourse"

[[loc,where]] --> [[loc,where]]�� [ENG_TO_NEW_SOURCE_DISCOURSE:417-417]
[[secondary_symptom,pain]] --> [[symptom,pain]]�� [ENG_TO_NEW_SOURCE_DISCOURSE:1-3]
[[tense,present]] --> [[tense,present]]�� [ENG_TO_NEW_SOURCE_DISCOURSE:474-474]
[[utterance_type,whq]] --> [[utterance_type,whq]]�� [ENG_TO_NEW_SOURCE_DISCOURSE:502-502]
[[verb,be]] --> [[verb,be]]�� [ENG_TO_NEW_SOURCE_DISCOURSE:505-505]
[[voice,active]] --> [[voice,active]]�� [ENG_TO_NEW_SOURCE_DISCOURSE:836-837]

------------------------------- FILES -------------------------------

ENG_TO_NEW_SOURCE_DISCOURSE: c:/cygwin/home/speech/speechtranslation/medslt2/eng/prolog/eng_to_new_source_discourse.pl

Transfer trace, "to_interlingua"

[[loc,where]] --> [[loc,where]]�� [ENG_TO_NEW_INTERLINGUA:634-634]
[[symptom,pain]] --> [[symptom,pain]]�� [ENG_TO_NEW_INTERLINGUA:103-104]
[[tense,present]] --> [[tense,present]]�� [ENG_TO_NEW_INTERLINGUA:689-689]
[[utterance_type,whq]] --> [[utterance_type,whq]]�� [ENG_TO_NEW_INTERLINGUA:702-702]
[[verb,be]] --> [[verb,be]]�� [ENG_TO_NEW_INTERLINGUA:705-705]
[[voice,active]] --> [[voice,active]]�� [ENG_TO_NEW_INTERLINGUA:1021-1022]

------------------------------- FILES -------------------------------

ENG_TO_NEW_INTERLINGUA: c:/cygwin/home/speech/speechtranslation/medslt2/eng/prolog/eng_to_new_interlingua.pl

Transfer trace, "from_interlingua"
[[loc,where],[symptom,pain],[verb,be]] --> [[locative,o�],[path_proc,avoir],[symptom,mal],[pronoun,vous]]�� [INTERLINGUA_FRE_BIDIRECTIONAL_MAIN:293-294]
[[tense,present]] --> [[tense,present]]�� [INTERLINGUA_FRE_BIDIRECTIONAL_MAIN:96-96]
[[utterance_type,whq]] --> [[utterance_type,wh]]�� [INTERLINGUA_FRE_BIDIRECTIONAL_MAIN:15-15]
[[voice,active]] --> [[voice,active]]�� [INTERLINGUA_FRE_BIDIRECTIONAL_MAIN:311-312]

------------------------------- FILES -------------------------------

INTERLINGUA_FRE_BIDIRECTIONAL_MAIN: c:/cygwin/home/speech/speechtranslation/medslt2/fre/prolog/interlingua_fre_bidirectional_main.pl

Source: where is the pain+o� avez-vous mal
Target: o� avez-vous mal
Other info:

�� n_parses=1
�� source_representation=[[loc,where],[secondary_symptom,pain],[tense,present],[utterance_type,whq],[verb,be],[voice,active]]
�� source_discourse=[[loc,where],[symptom,pain],[tense,present],[utterance_type,whq],[verb,be],[voice,active]]
�� resolved_source_discourse=[[loc,where],[symptom,pain],[tense,present],[utterance_type,whq],[verb,be],[voice,active]]
�� resolution_processing=trivial
�� interlingua=[[loc,where],[symptom,pain],[tense,present],[utterance_type,whq],[verb,be],[voice,active]]
�� target_representation=[[locative,o�],[path_proc,avoir],[pronoun,vous],[symptom,mal],[tense,present],[utterance_type,wh],[voice,active]]
�� n_generations=1
�� other_translations=[]

Interlingua and interlingua declarations

If you are using interlingual translation, all the constants that refer to the interlingual level must be declared in the interlingua_declarations file. This file contains a set of Prolog clauses of one of the forms

interlingua_constant([<Key>, <Value>]).
interlingua_constant([<Key>, <Value>]) :- <Body>.

Here, <Key> and <Value> can be any Prolog terms. In the second case, <Body> can be any expression that can legitimately appear in the body of a Prolog rule.

For example, if you want to write the English-to-interlingua rule

transfer_rule([[event, make_adj], [adj, worse]],
[[event, make_worse]]).

you need to add the declaration

interlingua_constant([event, make_worse]).

Similarly, if you want to write the interlingua-to-French rule

transfer_rule([[spec, [more_than, N]]],
[[comparative, plus_de], [number, N]]).

you need to add the declaration

interlingua_constant([spec, [more_than, N]]) :- ( number(N) ; var(N) ).

The body of the declaration needs to be written this way because the interlingua declarations may be invoked either at rule-compilation time or at runtime. At rule-compilation time, the N in the left hand side of the interlingua-to-French rule will be an unbound variable. At runtime, it will be instantiated to a number.

In general, you should not be adding many items to the interlingua declarations file; the whole point is to limit the set of constants that appear in a multi-lingual application, and encourage developers working on different languages to use the same constants.

Interlingua structure grammars

It is possible to specify the interlingua more tightly by defining an interlingua structure grammar. This is a Regulus grammar whose purpose is explicitly to define the range of semantic forms which constitute valid interlingua. The interlingua structure grammar is used as follows.

1. The interlingua structure grammar itself is compiled in generation mode. The config file used to specify it must contain the following lines:

regulus_config(generation_module_name, check_interlingua).
regulus_config(top_level_generation_pred, check_interlingua).

It must also include a line of the form

regulus_config(generation_grammar, <CompiledFile>).

where <CompileFile> is the name of the file used to hold the generation-compiled version of the interlingua structure grammar.

2. The translation application must contain a declaration of the form

regulus_config(interlingua_structure, <CompiledFile>).

where <CompileFile> is the name of the generation-compiled interlingua structure grammar.

If an interlingua structure grammar is being used in a translation application, an extra line is produced in the translation trace, giving the surface form generated by the interlingua structure grammar. If the interlingua structure grammar fails to generate anything (i.e. the interlingua form is outside its coverage), this line will contain the value WARNING: INTERLINGUA REPRESENTATION FAILED STRUCTURE CHECK.

It is possible to get informative feedback about generation failures in the interlingua structure grammar using the top-level command INTERLINGUA_DEBUGGING_ON. When this mode is enabled, generation failures are followed by attempts to generate from variants of the interlingua formed by inserting, deleting or substituting elements. This will often enable the developer to identify the reason why an interlingua form fails to respect the constraints imposed by the interlingua structure grammar. Interlingua debugging mode is disabled using the command INTERLINGUA_DEBUGGING_OFF.

Using macros in translation rule files

You can define and use macros in translation rule files. These macros have the same syntax and semantics as the macros used in grammar files. To take a simple example, supposing that a file of English to French transfer rules contains the macro definition

macro(device(Eng, Fre),
      transfer_lexicon([device, Eng], [device, Fre])).

The macro call

@device(light, lampe).

would then be equivalent to the transfer lexicon entry

transfer_lexicon([device, light], [device, lampe])).

Just as with grammar macros, transfer macros may be non-deterministic. The following more complex example illustrates this. Suppose that we have included the following macro definitions in a set of English to Interlingua transfer rules:

macro(onoff(on), [[onoff, on]]).
macro(onoff(off), [[onoff, off]]).

macro(switch_onoff(on), [action, switch_on]).
macro(switch_onoff(off), [action, switch_off]).

The macro call

transfer_rule([[action, switch], @onoff(OnOff)],
          [@switch_onoff(OnOff)]).

would then be equivalent to the two transfer rules

transfer_rule([[action, switch], [onoff, on]],
          [[action, switch_on]]).

transfer_rule([[action, switch], [onoff, off]],
          [[action, switch_off]]).

Macro definitions DO NOT need to precede associated macro calls, but macro definitions MUST be included in the set of transfer files where the calls appear.

Ellipsis processing

If you are using the interlingual framework, it is possible to perform simple context-dependent translation by defining an ellipsis_classes file. This file contains ellipsis class definitions. Each definition is of the form

ellipsis_class(<Id>, <Examples>).

where <Id> is an arbitrary identifier, and <Examples> is a list of source-language phrases having the property that one phrase in the list could reasonably be substituted for another if it appeared as an elliptical phrase. Thus the definition

ellipsis_class(since_when,
               ['for months',
                'for more than a day',
                'for several days']).

specifies that phrases of the forms 'for months', 'for more than a day' and 'for several days' are all intersubstitutable as elliptical phrases. So for example if the preceding utterance was "have you had headaches for months", the phrase "for several days" would be translated as though it were "have you had headaches for several days".

The ellipsis-processing mechanism assumes that source discourse level semantic representations will be lists of elements of the form [Type, Value]. Except in the case of WH-ellipsis, examples are generalised by keeping only the Key part of the representation. For example, in the MedSLT system the interlingual representation produced by the phrase 'for several days' is

[[prep,duration_time],[spec,several],[timeunit,day]]

and this is generalised to

[[prep,_],[spec,_],[timeunit,_]]

When more than one ellipsis processing pattern can be applied, the interpreter chooses the one that matches THE LONGEST POSSIBLE SUBLIST of the current context.

The ellipsis_classes file is compiled using the command COMPILE_ELLIPSIS_PATTERNS. The file used to hold the compiled version of the ellipsis classes can be specified explicitly using the compiled_ellipsis_classes config file entry. This is normally only useful if you for some reason want to maintain two different sets of compiled ellipsis rules for the same domain.

WH-ellipsis

It is also possible to perform ellipsis resolution on WH-questions. For example, if the preceding question is "how often does the headache occur", then "several times a day" can reasonably be interpreted as meaning "does the headache occur several times a day".

The ellipsis declarations mechanism contains an extension that can be used to support ellipsis on WH-questions. An example in a class can be tagged as a WH-element, e.g.

ellipsis_class(frequency,
�� ['several times a day',
�� 'once a day',
�� 'at least once a day',
�� 'every day',
�� wh-'how often'
�� ]).

Here, the example 'how often' is tagged as a WH-element. WH-ellipsis elements are not generalized, and must be matched exactly.

WH-elements can also be made dependent on a context. For example, if the preceding question was "what is your temperature", then "over ninety nine degrees" can reasonably be interpreted as meaning "is your temperature over ninety nine degrees". Here, "over ninety nine degrees" substitutes "what", but we don't always want it to substitute "what"; we only want this to happen in a context where containing a phrase like "your temperature". We specify this using the declaration

ellipsis_class(temperature,
�� ['over one hundred degrees',
�� wh_with_context-['what', 'your temperature']]).

Here, 'what' is the substituted element, and 'your temperature' is the context. Context elements are not generalized.

Using generation in translation

A generation grammar file must be defined, using the generation_rules config file entry. This should normally be the compiled form of a Regulus grammar for the target language. When compiling the generation grammar, the following config file entries are required:

regulus_config(generation_module_name, generator).
regulus_config(top_level_generation_pred, generate).

Generation preferences

If the generation grammar is ambiguous, in the sense that one representation can produce multiple surface strings, it may be desirable to define generation preferences, to force some strings to be chosen ahead of others. The generation preference file is defined using the generation_preferences config file entry, and should contain entries of the form

generation_preference(<ListOfWords>, <Score>).

where <ListOfWords> is a list of one or more surface words, and <Score> is an associated numerical score. For example, if the intent is to prefer the collocation "in the night" and disprefer the collocation "at the night", it might be appropriate to add the two declarations

generation_preference([in, the, night], 1).
generation_preference([at, the, night], -1).

If multiple generation results are produced, each result S is assigned a score calculated by adding the generation preferences for all substrings of S, and the result with the highest total score is preferred.

Collocation rules

The surface form produced by generation can optionally be post-processed by a set of collocation rules, defined using the collocation rules config file entry. A collocation rule file consists of a list of entries of the form

better_collocation(<LHSWords>, <RHSWords>).

where the semantics are that sequences of words matching <LHSWords> will be replaced by <RHSWords>. Here are some examples from the English to French version of MedSLT:

better_collocation("de un", "d'un").
better_collocation("de une", "d'une").
better_collocation("dans le c�t�", "sur le c�t�").

Orthography rules

A second optional post-processing stage can be defined using the orthography rules config file entry. An orthography rule file consists of a list of entries of the form

orthography_rewrite(<LHSString>, <RHSString>).

where the semantics are that sequences of characters matching <LHSWords> will be replaced by <RHSWords>. If more than one rule matches, the first one is used. Here are some examples from the English to French version of MedSLT:

orthography_rewrite("t -- il", "t-il").
orthography_rewrite(" -- il", "-t-il").
orthography_rewrite(" -- ", "-").

You can define simple context-sensitive orthography rules by adding one or more letter_class declarations, and then using variables that range over letters of a specified type. For example, we can capture the English rule that "a" becomes "an" before a word starting with a vowel as follows:

letter_class('V', "aeiou").

orthography_rewrite(" a V1", " an V1").

Here, the letter_class declaration defines 'V' to be the class of vowels, and the occurrences of V1 in the rule mean "a variable which can be any letter in the class 'V'". Note the spaces before the "a" on the left-hand side and the "an" on the right-hand side: these are necessary in order to match only the word "a", as opposed to any word ending in an "a".

The syntax of the letter_class declaration is

letter_class(<ClassID>, <Letters>).

where <ClassID> is a one-letter Prolog atom, and <Letters> is a Prolog string. Variables in rules are written as a letter-class letter followed by a single-digit number.

Using Regulus for dialogue applications

Regulus provides support for building Prolog dialogue applications, using a version of the "update semantics" model. The basic requirement is that all dialogue state must be represented as a Prolog term: there are no constraints on the form of this term. Dialogue processing code is supplied in the list of files which the dialogue_files config entry points to. These files need to define the following four predicates:

1. lf_to_dialogue_move(+LF, -DialogueMove)

Converts a logical form produced by a Regulus grammar into a "dialogue move", the internal representation used by the dialogue manager. If you do not wish to distinguish between LFs and dialogue moves, this predicate can be trivial.

2. initial_dialogue_state(?DialogueState)

Defines the initial value of the dialogue state object used by the DM.

3. update_dialogue_state(+DialogueMove, +InState, -AbstractAction, -OutState)

Takes as input a dialogue move and the previous dialogue state; returns an "abstract action" and the new dialogue state.

4. abstract_action_to_action(+AbstractAction, -ConcreteAction)

Converts abstract actions into concrete actions. If you do not want to distinguish between abstract actions and concrete actions, this predicate can be trivial.

It is optionally possible to define two more top-level dialogue processing predicates:

1. resolve_lf(+LF, +InState, -ResolvedLF, -Substitutions)

Performs context-dependent resolution on the logical form produced by the Regulus grammar, with respect to the dialogue state. It return a term ResolvedLF, and a possibly empty list Substitutions of items of the form Term1 � Term2, detailing which substitutions have been carried out to effect the transformation. This predicate will typically be used to implement some kind of ellipsis resolution. It is called first in the processing sequence, immediately before lf_to_dialogue_move/4.

2. resolve_dialogue_move(+DialogueMove, +InState, -ResolvedDialogueMove)

Performs context-dependent resolution on the dialogue move produced by lf_to_dialogue_move/4, and is called immediately after it. This predicate will typically be used to implement some kind of reference resolution.

Examples of dialogue processing applications

Examples of simple dialogue processing applications are provided in the directories $REGULUS/Examples/Toy1 and $REGULUS/Examples/Toy1Specialised. In each case, the file Prolog/input_manager.pl defines the predicate lf_to_dialogue_move/2; Prolog/dialogue_manager.pl defines the predicates initial_dialogue_state/1 and update_dialogue_state/4; and Prolog/output_manager.pl defines the predicate abstract_action_to_action/2.

A more complex example of dialogue processing can be found in $REGULUS/Examples/Calendar. As before, Prolog/input_manager.pl defines the predicate lf_to_dialogue_move/2, Prolog/dialogue_manager.pl defines the predicates initial_dialogue_state/1 and update_dialogue_state/4, and Prolog/output_manager.pl defines the predicate abstract_action_to_action/2. There are however several more files: Prolog/resolve_lf.pl defines the predicate resolve_lf/4, and Prolog/resolve_dialogue_move.pl defines resolve_dialogue_move/3. Note also that the main body of the input manager is specified using the LF-patterns mechanism. The patterns themselves can be found in the file Prolog/lf_patterns.pl.

Running dialogue applications from the Regulus top-level

Dialogue processing files can be compiled directly from the Regulus top-level using the LOAD_DIALOGUE command, and the DIALOGUE command puts the Regulus top-level into a dialogue processing mode. In this mode, text utterances are parsed and the output is passed to the dialogue processing code. The following edited session shows an example of how to use these commands to run the dialogue application defined in $REGULUS/Examples/Toy1:

| ?- ['$REGULUS/Prolog/load'].

(... loads Regulus code files ...)

| ?- regulus('$REGULUS/Examples/Toy1/scripts/toy1_dialogue.cfg').
Loading settings from Regulus config file c:/home/speech/regulus/examples/toy1/scripts/toy1_dialogue.cfg

>> LOAD

(... loads Toy1 grammar ...)

>> LOAD_DIALOGUE

(... compiles dialogue processing files ...)

>> DIALOGUE
(Do dialogue-style processing on input sentences)

>> switch on the light in the kitchen

      Old state: [device(light,kitchen,off,0),device(light,living_room,off,0),device(fan,kitchen,off,0)]
             LF: [[type,command],[action,switch],[onoff,on],[device,light],[location,kitchen]]
Dialogue move: [command,device(light,kitchen,on,100)].
Abstract action: say(device(light,kitchen,on,100))
Concrete action: say_string("the light in the kitchen is on")
      New state: [device(light,kitchen,on,100),device(light,living_room,off,0),device(fan,kitchen,off,0)]

Dialogue processing time: 0.00 seconds

>> is the light switched on

      Old state: [device(light,kitchen,on,100),device(light,living_room,off,0),device(fan,kitchen,off,0)]
             LF: [[type,query],[state,be],[onoff,on],[device,light]]
Dialogue move: [query,device(light,_,on,_)].
Abstract action: say(device(light,kitchen,on,100))
Concrete action: say_string("the light in the kitchen is on")
      New state: [device(light,kitchen,on,100),device(light,living_room,off,0),device(fan,kitchen,off,0)]

Dialogue processing time: 0.01 seconds

>> switch off the light

      Old state: [device(light,kitchen,on,100),device(light,living_room,off,0),device(fan,kitchen,off,0)]
             LF: [[type,command],[action,switch],[onoff,off],[device,light]]
Dialogue move: [command,device(light,_,off,0)].
Abstract action: say(device(light,kitchen,off,0))
Concrete action: say_string("the light in the kitchen is off")
      New state: [device(light,kitchen,off,0),device(light,living_room,off,0),device(fan,kitchen,off,0)]

Dialogue processing time: 0.00 seconds

Dialogue processing commands

There are dialogue processing commands to perform most obvious kinds of corpus-based regression testing, in both text and speech mode. These commands are extremely similar to the corresponding ones for translation.

Regression testing for dialogue applications

Regression testing files for dialogue applications may contain the following types of items:

sent(<SentenceAtom>). A normal text sentence, e.g.
sent('switch off the light')
sent('LF: <PrologLogicalForm>'). A non-speech LF, e.g.
sent('LF: light_status(kitchen, 50)').
wavfile(<WavfileName>). Perform recognition on the designated wavfile, followed by dialogue processing on the result. This requires having first loaded speech resources using the LOAD_RECOGNITION command, as described here.
init_dialogue. Reset the dialogue state to the initial one.

Using LF patterns for dialogue applications

The logical forms produced by a Regulus grammar typically contain a lot of useful structure, but none the less are not suitable for direct use in a dialogue application. In most cases, the input manager is responsible for performing some kind of non-trivial transformation that turns the logical form into a dialogue move.

It is often possible for some or all of the structure of a dialogue move to be a list of feature-value pairs, whose values are determined by searching for combinations of patterns in the LF. The lf_pattern mechanism is designed to help build applications of this type. The implementor needs to supply the following pieces of information:

The config file must include an lf_patterns declaration, which points to the file containing the lf_pattern elements. For example, the Calendar application contains the following declaration:

regulus_config(lf_patterns, calendar_prolog('lf_patterns.pl')).
The config file will in most cases also include an lf_patterns_modules declaration. The value of this declaration should be a list of modules referenced by the lf_patterns. The Calendar application contains the following declaration:

regulus_config(lf_patterns_modules,
�� [library(lists),
�� library(system),
�� '$REGULUS/PrologLib/utilities',
�� '$REGULUS/Examples/Calendar/Prolog/calendar_utils']).
The lf_patterns file can contain three kinds of elements: lf_pattern, lf_boundary, or macro. Macros are just the same as any other Regulus macros.

An lf_pattern declaration is of the following basic form:

lf_pattern(<MainPattern>,
�� <ConditionPattern>,
�� <Feature>=<Value>) :-
�� <Body>.

Here, <MainPattern> is a Prolog term that is to match some part of the LF, <ConditionPattern> is a Boolean combination of Prolog terms, <Feature> is an atom, <Value> is an arbitrary Prolog term, and <Body> is arbitrary Prolog code. The semantics are "If the LF contains an occurrence of <MainPattern> at a given place P, the combination of patterns <ConditionPattern> matches anywhere in the LF, and <Body> evaluates to true, then the assignment <Feature>=<Value> is added at place P. It is possible to omit either the <ConditionPattern> or the <Body> or both. Boolean combinations are produced using the operators ',' (conjunction), 'or' (disjunction) and 'not' (negation).

For example, the Calendar lf_pattern declaration

lf_pattern([around_time, time(H, M, PartOfDay)],

�� in_interval=referent(approximate_time(H, M, PartOfDay))) :-

�� number(H),

�� number(M).

says that if the piece of logical form [around_time, time(H, M, PartOfDay)] is matched at some point in the LF, and H and M are both numbers, then this gives rise to the feature-value assignment in_interval=referent(approximate_time(H, M, PartOfDay)).

It is also possible to create nested representations using the lf_boundary declaration, which is of the form

lf_boundary(<Pattern>,

�� X^<Form>) :-

�� <Body>.

where <Form> is a Prolog term which contains an occurrence of the variable X. This says that a piece of logical form matching <Pattern> should give rise to an occurrence of <Form> in the output dialogue move; also, if any other patterns at matched on terms inside <Pattern>, then they are included in a list which becomes the value of the variable X. The effect is to say that <Form> "wraps around" all the feature-value pairs corresponding to structure inside <Pattern>. For example, the declaration

lf_boundary(term([ordinal, the_sing, N], meeting, _Body),

�� X^aggregate(nth_meeting(N), X)) :-

�� number(N).

says that a piece of logical form matching term([ordinal, the_sing, N], meeting, _Body), where N is a number, gives rise to a piece of dialogue move aggregate(nth_meeting(N), X), where X is a list containing all the feature-value pairs corresponding to structure inside term([ordinal, the_sing, N], meeting, _Body).

Many examples of lf_pattern and lf_boundary declarations can be found in the file $REGULUS/Examples/Calendar/Prolog/lf_patterns.pl

Adding intelligent help to Regulus applications

It is generally desirable to add some kind of help component to a grammar-based application, to give users feedback about the system's supported coverage. Experience shows that systems lacking a help component are generally very hard to use, while addition of even simple help functionality tends to create a dramatic improvement in usability. The Regulus platform contains tools that make it easy to add help functionality to a speech translation or spoken dialogue system based on a Regulus grammar.

The basic model assumed is as follows. The user provides two resources:

One or more files of regression testing output, from which the system extracts corpus of help examples.
A file of declarations, which is used to define equivalence classes of related words. This is declared in the config file entry targeted_help_classes_file.

At runtime, the system carries out recognition using both the main Regulus-based recognizer, and also a backup statistical recognizer. The output from the statistical recognizer is matched against the help corpus, also taking account of the equivalence classes. The system returns the N examples which match most closely.

Defining the help resources

The config file needs to contain the following three entries:

targeted_help_source_files. This should point to a list of one or more files of regression testing output, either from offline translation (for a translation application) or from offline dialogue testing (for a dialogue application).
targeted_help_classes_file. This should point to a file of help class declarations.
targeted_help_backed_off_corpus_file. This points to the file which will hold the compiled help corpus.

Help source data

The source data from which the help examples are extracted can be of several possible kinds:

For a dialogue application, source files must be output produced by offline dialogue processing commands like BATCH_DIALOGUE. All examples which are not explicitly judged as errors or bad dialogue processing are included as help sentences.
For a translation application, the simplest alternative is to use source files produced by offline translation commands like TRANSLATE_CORPUS. All examples which are not explicitly judged as errors or bad translations are included as help sentences.
For a multilingual interlingua-based translation application, it is also possible to take input from a merged interlingua corpus. Records in the corpus need to be of the form

interlingua_item(<SurfaceInterlingua>, <Interlingua>, <FromToList>)

where the first two items are not used here, and the third is a list of items of either "from-items", of the form

(from(<FromLang>)-[<FromSent>, <FromLF>])

or "to-items", of the form

(to(<FromLang>)-[<ToSent>, <FromLF>])

If a merged interlingua corpus is used, the value of the targeted_help_source_files must be of the special form

[use_combined_interlingua_corpus(<FromLang>, <ToLang>), <MergedInterlinguaCorpus>]

In this case, the items included in the help corpus are all the examples of <FromSent>s tagged with <FromLang> for which there is a <ToSent> in the same list tagged with <ToLang>.

Help classes

The help class declarations can be of one of two possible types. Simple help class declarations are of the form

help_class_member(<SurfaceForm>, <ClassId>).

for example

help_class_member(show, list_verb).
help_class_member((what, people), who_expression).

Here, <SurfaceForm> is either a single atom, or a list of atoms enclosed in parentheses; <ClassId> is an arbitrary Prolog atom. The effect is to declare that <SurfaceForm> belongs to the equivalence class <ClassId>.

Complex class declarations make use of the Regulus lexicon's feature system, and have the syntax

help_class_member(<V>, <ClassId>) :-

�� lex_entry((<Cat>:<Feats> --> <V>)).

where <V> is an arbitrary Prolog variable, <ClassId> is an arbitrary Prolog atom, <Cat> is a Regulus category symbol, and <Feats> is a possibly empty list of Regulus feature-value assignments valid for <Cat>. For example, the following complex declarations are used in the Calendar application:

% All words of category 'p' belong to the class 'preposition'.

help_class_member(Surface, preposition) :-

�� lex_entry((p:[] --> Surface)).

% All words of category 'd' such that article=n and det_type=ordinal belong to the class 'ordinal_det'.

help_class_member(Surface, ordinal_det) :-

�� lex_entry((d:[article=n, det_type=ordinal] --> Surface)).

% All words of category 'name' with sem_n_type=agent belong to the class 'person_name'

help_class_member(Surface, person_name) :-

�� lex_entry((name:[sem_n_type=agent] --> Surface)).

Macros

Help class files may also contain macros, and as usual it may be easier to structure declarations by careful use of macros. For example, the following declarations from the English MedSLT help file define all nouns with semantics of the form [[timeunit, _]] to belong to the class time_period:

macro(sem_help_class_member(Category, Sem, Class),

� ��( help_class_member(Surface, Class) :-

�� lex_entry((Category:[sem=Sem] --> Surface))

�� )).

% Time periods: seconds, minutes, etc.

@sem_help_class_member(n, [[timeunit, _]], time_period).

Stop words

The class stop_word has a special meaning: all words which are backed off to this class are ignored. So for example the English help class declaration

help_class_member(Surface, stop_word) :-
�� lex_entry((d:[article=y] --> Surface)).

says that all words in category d such that article=y should be ignored.

Regulus commands relevant to help processing

COMPILE_HELP. Compile the help resources.

LOAD_HELP. Load compiled help resources.

HELP_RESPONSE_ON. Switch on help responses in the Regulus top loop. In this mode, the help module is called for each input sentence, and the top 5 help matches are printed.

HELP_RESPONSE_OFF. Switch off help responses in the Regulus top loop.

LIST_MISSING_HELP_DECLARATIONS Write out a list of lexical items that are not listed in targeted help declarations.

Including help in an application

The predicate get_help_matches/3, defined in $REGULUS/Prolog/help.pl, can be used to retrieve a help response. This assumes that the help resources are loaded. A call is of the form

get_help_matches(Sent, N, Matches)

where Sent is an atom representing an utterances, N is the required number of help responses. Matches will be instantiated to a list of the N most closely matching help responses.

Regulus grammar notation

Regulus syntax is Prolog-based - that is, every well-formed Regulus expression is a Prolog term. There are two basic kinds of Regulus expressions: declarations and rules, both of which can be enclosed inside an optional label. Note that there is no formal distinction between a grammar rule and a lexical entry. It is also possible to use comments , macros and include statements .

COMMENTS

Comments are Prolog-style; anything between a percent-sign (%) and the end of a line is a comment, or alternately comments can be enclosed between an initial /* and a closing */.

Examples

% This is a comment

yesno:[sem=no] --> no, fucking, way. % This line needs a comment...

% This is a
% multi-line
% comment

/* And this is a
multi-line comment
too */

LABELS

A rule or declaration can optionally be given a (possibly non-unique) identifier. This makes it possible to add ignore_item declarations to ignore specified labelled rules. Note that Prolog syntax requires an extra pair of parentheses around a rule if it is enclosed in a label.

Syntax

labelled_item(<Label>, <RuleOrDeclaration>).

where <Label> is an Prolog atom and <RuleOrDeclaration> is any rule or declaration.

Examples

labelled_item(foo_macro,
macro(foo(X), [f1=X, f2=@bar])
).

labelled_item(foo_rule,
(foo:[] --> bar)
).

MACROS

Regulus permits definition of macros . These macros may be used in rules and category declarations. (It is possible that they may also be used in other declarations in later versions of Regulus). The syntax of a macro invocation is

@<Term>

where <Term> is a Prolog term that unifies with the head of a macro definition. Macro definitions can be of two forms: macros , and default_macros , with the semantics that macros take precedence over default_macros.

The semantics of macro invocation are as follows.

1. All terms of the form @<Term> which unify with the head of a macro invocation are non-deterministically replaced with the bodies of the matching macro definitions.

2. If no definition matches, an error is signalled.

3. If the macro invocation appears in the context of a feature/value list, and the macro body expands to a list, then the body is appended to the rest of the feature/value list.

4. If the macro body itself contains macro invocations, it is recursively expanded until the result contains no macro invocations.

5. If macro invocation results in a cycle, an error is signalled.

6. If there are any matching definitions of the form "macro(<Term>, <Body>)", then the "macro" definitions are used.

7. If there are no matching definitions of the form "macro(<Term>, <Body>)", but there are definitions of the form "default_macro(<Term>, <Body>)" then the "default_macro" definitions are used.

Examples

Suppose we have the following macro and default_macro definitions:

macro(foo(X), [f1=X, f2=@bar]).
macro(bar, c).
macro(bar, d).

default_macro(bar, e).
default_macro(frob, z).

Then the rule

cat1:[f3= @bar] --> word1.

expands to the two rules

cat1:[f3=c] --> word1.
cat1:[f3=d] --> word1.

Note that the default_macro definition for bar is not used, since there are normal macro rules available. Similarly, the rule

cat2:[f3=a, f4= @frob, @foo(b)] --> word2.

expands to

cat1:[f3=a, f4=z, f1=b, f2=c] --> word2.
cat1:[f3=a, f4=z, f1=b, f2=d] --> word2.

Note here that the default_macro definition for frob has been used, since there is no macro definition available.

INCLUDE STATEMENTS

It is possible for Regulus files to include other Regulus files. The syntax is

include(<Pathname>).

where <Pathname> is an Prolog-syntax pathname. If no extension is given, it is assumed to be ".regulus". Included files may themselves include file, to any depth of nesting. If the pathname is not relative, it is assumed relative to the directory of the including file.

Examples

include(foo).
include('foo.regulus').
include('$REGULUS_GRAMMAR/foo').
include(regulus_grammar(foo)).
include('more_grammars/foo').
include('../foo').
include('../other_grammars/foo').

DECLARATIONS

The following types of declarations are permitted:

� ignore_item

� macro

� default_macro

� feature_value_space

� feature

� category

� top_level_category

� feature_instantiation_schedule

� specialises

� ignore_feature

� ignore_specialises

� feature_value_space_substitution

� external_grammar

IGNORE_ITEM

Syntax

ignore_item(<Label>).

Conditions

<Label> is an atom.

Effect

All rules and/or declarations with label <Label> are ignored.

Examples

ignore_item(foo_macro).
ignore_item(foo_rule).

MACRO DECLARATION

Syntax

macro(<MacroHead>, <MacroBody>).

Conditions

<MacroHead> is a non-variable term.

<MacroBody> is an arbitrary term

Effect

<MacroHead> is defined as a macro pattern. This means that any term in a rule which unifies with @<MacroHead> is expanded to <MacroBody> . If there are common variables in <MacroHead> and <MacroBody>, then the variables in <MacroBody> are instantiated from those in <MacroHead> . Macros are described in more detail here.

Examples

macro(foo(X), [f1=X, f2=@bar]).
macro(bar, c).
macro(bar, d).

DEFAULT MACRO DECLARATION

Syntax

default_macro(<MacroHead>, <MacroBody>).

Conditions

<MacroHead> is a non-variable term.

<MacroBody> is an arbitrary term

Effect

<MacroHead> is defined as a default macro pattern. This means that any term in a rule which unifies with @<MacroHead> is expanded to <MacroBody> , as long as there are no macro declarations that match. If there are common variables in <MacroHead><MacroBody>, then the variables in <MacroBody> are instantiated from those in <MacroHead> . Macros are described in more detail here.

Examples

macro(foo(X), [f1=X, f2=@bar]).
macro(bar, c).
macro(bar, d).

FEATURE_VALUE_SPACE DECLARATION

Syntax

feature_value_space(<ValueSpaceId>, <ValueSpace>).

Conditions

<ValueSpaceId> is an atom.

<ValueSpace> is a list of lists of atoms

Effect

<ValueSpaceId> is defined as the name of the feature value space <ValueSpace> . The lists in <ValueSpace> represent the range of possible values along each dimension (one per list) of the value space. Usually, <ValueSpace> will be a singleton list, i.e. the space will be one-dimensional.

It is possible to have multiple feature_value_space declarations for the same value_space_id, as long as the forms of the declarations are compatible in terms of number of dimensions. In this case, the lists of possibilities along each dimension are unioned.

Examples

feature_value_space(sem_np_value, [[n, device, location]]).
feature_value_space(number_value, [[sing, plur]]).

feature_value_space(agr_value, [[sing, plur], [1, 2, 3]]).

FEATURE DECLARATION

Syntax

feature(<FeatName>, <ValueSpaceID>).

Conditions

<FeatName> and <ValueSpaceID> are both atoms. <ValueSpaceID> must be declared as a feature_value_space .

Effect

<FeatName> is defined as a feature taking values in <ValueSpaceID> .

Examples

feature(number, number_value).

feature(sem_np_type, sem_np_value).

feature(obj_np_type, sem_np_value).

CATEGORY DECLARATION

Syntax

category(<CategoryName>, <FeatsList>).

Conditions

<CategoryName> is an atom. <FeatsList> is a list of atoms, all of which must be declared as features , except for the pre-defined features 'sem' and 'gsem'.

Effect

<CategoryName> is declared as a category with features <FeatsList> .

Examples

category('.MAIN', [gsem]).
category(noun, [sem, number, sem_np_type]).
category(verb, [sem, number, vform, vtype, obj_sem_np_type]).

TOP_LEVEL_CATEGORY DECLARATION

Syntax

top_level_category(<CategoryName>).

Conditions

<CategoryName> is an atom that has been declared as a category .

Effect

Declares that <CategoryName> is a top-level category, i.e. a start symbol in the grammar. In the GSL translation, rules for <CategoryName> will use the symbol <CategoryName> exactly as it is specified in the Regulus grammar, e.g. without changing capitalisation. This may mean that category names specified by top_level_category may need to start with a period.

Example

top_level_category('.MAIN').

FEATURE_INSTANTIATION_SCHEDULE DECLARATION

Syntax

feature_instantiation_schedule(<Schedule>).

Conditions

<Schedule> is a list of lists of atoms, all of which must be declared as features . Every declared feature must appear in one and only one of the lists.

Effect

This declaration can be used to control the way in which feature expansion is carried out. Feature expansion is initially invoked only on the first group of features, after which the rule space is filtered to remove irrelevant rules. Then expansion and filtering is performed using the second group of features, and so on until all the features have been expanded.

If no feature_instantiation_schedule declaration is supplied, the compiler performs expansion and filtering on the whole set of features at once.

Example

feature_instantiation_schedule([[number, vform, vtype], [sem_np_type, obj_sem_np_type]]).

SPECIALISES DECLARATION

Syntax

specialises(<FeatVal1>, <FeatVal2>, <ValueSpaceId>).

Conditions

<FeatVal1>, <FeatVal2> and <ValueSpaceId> are all atoms. <FeatVal1> and <FeatVal2> must be declared as possible values of the feature value space <ValueSpaceId>.

Effect

Declares that <FeatVal1> is a specialisation of <FeatVal2> . At compile-time, <FeatVal2> will be replaced by the disjunction of all the values that specialise it.

Example

specialises(switchable, device, sem_np_type_value).

IGNORE_FEATURE DECLARATION

Syntax

ignore_feature((<FeatName>).

Conditions

<FeatName> is an atom that has been declared as a feature .

Effect

All occurrences of <FeatName> in rules and lexical entries are ignored.

Example

ignore_feature(number).

IGNORE_SPECIALISES DECLARATION

Syntax

ignore_specialises(<FeatVal1>, <FeatVal2>, <ValueSpaceId>).

Conditions

<FeatVal1>, <FeatVal2> and <ValueSpaceId> are all atoms. <FeatVal1> and <FeatVal2> are declared as possible values of the feature value space <ValueSpaceId>.

Effect

Cancels the effect of the specialises declaration specialises(<FeatVal1>, <FeatVal2>, <ValueSpaceId>), if there is one.

Example

ignore_specialises(switchable, device, sem_np_type_value).

FEATURE_VALUE_SPACE_SUBSTITUTION DECLARATION

Syntax

feature_value_space_substitution(<FeatVal1>, <FeatVal2>, <ValueSpaceId>).

Conditions

<FeatVal1>, <FeatVal2> and <ValueSpaceId> are all atoms. <FeatVal1> and <FeatVal2> are both possible values in the feature value space <ValueSpaceId>.

Effect

<FeatVal1> is substituted by <FeatVal2> wherever it appears as part of the value of a feature taking values in <ValueSpaceId> .

Example

feature_value_space_substitution(switchable, device, sem_np_type_value).

EXTERNAL_GRAMMAR DECLARATION

Syntax

external_grammar(<TopLevelGrammar>, <GSLGrammar>).

Conditions

<TopLevelGrammar> is an atom. <GSLGrammar> is an atom representing a full GSL grammar. <TopLevelGrammar> should NOT be defined as a category.

Effect

The GSL rule

<TopLevelGrammar> <GSLGrammar>

is added to the output grammar. This is mainly useful for including SLM grammars in Regulus-generated grammars.

Example

external_grammar('.EXTERNAL', '[foo bar]').

RULES

Syntax

<Category> --> <RHS>

Conditions

<Category> is a category . <RHS> is an RHS .

Semantics

Declares that <Category> can be rewritten to <RHS> .

Examples

utterance:[sem=S] --> command:[sem=S].
np:[sem=[spec=S, noun=N]] --> spec:[sem=S, number=Num], noun:[sem=N, number=Num].
noun:[sem=light, number=sing] --> light.

RHS

An RHS is of one of the following forms:

� category

� lexical item

� sequence

� alternate

� optional

LEXICAL ITEM

Syntax

Prolog atom

Conditions

Print name of atom starts with a lower-case letter.

Semantics

Specific word

Examples

light
the
switch

SEQUENCE

Syntax

( <RHS1>, <RHS2> )

Conditions

<RHS1> and <RHS2> are RHSs .

Semantics

Sequence consisting of <RHS1> followed by <RHS2>.

Examples

( all, of, the )
( spec:[sem=S, number=N], device_noun:[sem=D, number=N] )
( at, least, number:[sem=S] )

ALTERNATE

Syntax

( <RHS1> ; <RHS2> )

Conditions

<RHS1> and <RHS2> are RHSs .

Semantics

Either <RHS1> or <RHS2>.

Examples

( under ; over ; ( at, least ) )
( adj:[sem=S] ; np:[sem=S] )
( a ; an ; number:[sem=S] )

OPTIONAL

Syntax

?<RHS>

Conditions

<RHS> is an RHS .

Semantics

Either <RHS> or nothing.

Examples

?the
?pp:[sem=S, type=loc]
?( (of, the) )

FEATURE VALUE LIST

A feature value list is a (possibly empty) list of feature value pairs .

FEATURE VALUE PAIR

A feature value pair is either a semantic feature value pair or a syntactic feature value pair .

SEMANTIC FEATURE VALUE PAIR

Syntax

<SemFeat> = <SemVal>

Conditions

<SemFeat> is either sem or gsem . <SemVal> is a semantic value .

Semantics

sem translates into a GSL return value. gsem translates into GSL slot-filling.

Examples

gsem=[operation=Op, spec=S, device=D, onoff=O, location=L]
sem=[and, N1, N2]

SYNTACTIC FEATURE VALUE PAIR

Syntax

<SynFeat> = <SynVal>

Conditions

<SynFeat> is an atom declared as a feature for the category in which the feature value pair occurs. <SynVal> is a syntactic feature value .

Semantics

<SynFeat> has a value compatible with <SynVal>.

Examples

number=plur
gender=Gen,
vp_modifiers_type=(location\/n)

SEMANTIC VALUE

A semantic value is of one of the following:

� Atomic value

� Semantic variable

� Semantic feature value list

� List

� Unary GSL function expression

� Binary GSL function expression

ATOMIC VALUE

Syntax

Prolog atom

Conditions

Can only be used in expressions occurring in LHS category.

Semantics

Translates into atomic GSL value.

Examples

light

device

SEMANTIC VARIABLE

Syntax

Prolog variable

Conditions

If in LHS category, same variable must also occur in RHS.

Semantics

Value from RHS is passed up into LHS.

Examples

X
Sem

SEMANTIC FEATURE VALUE LIST

Syntax

[<Atom1> = <SemVal1>, <Atom2> = <SemVal2>, ...]

Conditions

<Atom1>, <Atom2> etc are Prolog atoms.

<SemVal1>, <SemVal2> etc are semantic values .

Semantics

If in LHS category, becomes a GSL feature value list.
If in RHS, giving <SemVal1> variable values allows access to components of feature value lists in RHS.

Examples

[spec=S, device=D]
[op=command, device=D]

LIST

Syntax

[<SemVal1>, <SemVal2>, ...]

Conditions

Can only be used in LHS category.

<SemVal1>, <SemVal2> etc are semantic values .

Semantics

Creates a GSL list.

Examples

[device, light]
[[operation, command], [device, D]]

UNARY GSL FUNCTION EXPRESSION

Syntax

<UnaryGSLFunction>(<SemVal>)

Conditions

Can only be used in LHS category.
<SemVal> is a semantic value .
<UnaryGSLFunction> is one of the following unary GSL functions: neg , first, last , rest .

Semantics

Translates into corresponding GSL function expression.

Examples

neg(X)
first(List)

BINARY GSL FUNCTION EXPRESSION

Syntax

<BinaryGSLFunction>(<SemVal1>, <SemVal2>)

Conditions

Can only be used in LHS category.
<SemVal1> and <SemVal2> are semantic values .
<BinaryGSLFunction> is one of the following binary GSL functions: add , sub, mul , div , strcat , insert_begin , insert_end , concat

Semantics

Translates into corresponding GSL function expression.

Examples

add(X, Y)
strcat(A, B)
concat(L1, L2)

SYNTACTIC FEATURE VALUE

A syntactic feature value is of one of the following forms:

� Atomic syntactic feature value

� Variable syntactic feature value

� Disjunctive syntactic feature value

� Conjunctive syntactic feature value

� Negated syntactic feature value

ATOMIC SYNTACTIC FEATURE VALUE

Syntax

Prolog atom <Atom>

Conditions

<Atom> must be declared as a member of a dimension of the feature value space for the appropriate feature .

Semantics

The value of the feature is restricted to be consistent with <Atom> . If the feature value space is one-dimensional (the usual case) then the value must be equal to <Atom>. If the space is multi-dimensional, then the value must be of the form <Atom>/\<OtherValues> where <OtherValues> is a conjunction of values along the remaining dimensions of the feature value space.

Examples

sing
device
no

VARIABLE SYNTACTIC FEATURE VALUE

Syntax

Prolog variable <Var>

Conditions

<Var> can only occur as a value of other features if they have the same feature value space .

Semantics

The value of the feature will be the same as that of any other feature in the same rule whose value is <Var> .

Examples

V
Number

DISJUNCTIVE SYNTACTIC FEATURE VALUE

Syntax

<SynVal1> \/ <SynVal2>

Conditions

<SynVal1> and <SynVal2> are syntactic feature values belonging to the same feature value space .

Semantics

The value of the feature is constrained to be compatible with either <SynVal1> or <SynVal2> .

Examples

( yes \/ no )
( switchable \/ dimmable \/ null )

CONJUNCTIVE SYNTACTIC FEATURE VALUE

Syntax

<SynVal1> /\ <SynVal2>

Conditions

<SynVal1> and <SynVal2> are syntactic feature values belonging to different dimensions of the same feature value space . If the space is one-dimensional (the usual case), note that this makes no sense.

Semantics

The value of the feature is constrained to be compatible with both <SynVal1> or <SynVal2> .

Examples

( sing /\ 3 )
( plur /\ 1 )

NEGATED SYNTACTIC FEATURE VALUE

Syntax

( \ ( <SynVal> ) )

Conditions

<SynVal> is a syntactic feature value .

Semantics

The value of the feature is constrained to be NOT compatible with <SynVal> .

Examples

( \ ( device ) )
( \ ( sing /\ 3 ) )

Compiler error messages

The compiler can produce the following error messages:

� Arg in feature_instantiation_schedule declaration not a list of lists of atoms
See feature instantiation schedule .

� Arg in ignore_feature declaration not an atom
See ignore_feature .

� Arg in ignore_feature declaration not declared as feature
See ignore_feature .

� Bad category <cat>
Something unspecific is wrong with this category.

� Bad subterm <term> in rule
Something unspecific is wrong with this subterm

� Cannot have both sem and gsem as features
The compiler only allows a rule to use one out of 'sem' and 'gsem'.
The rule can have a semantic return value (sem), or do global slot-filling (gsem), but not both at once.

� Circular chain of specialisations
specialisation declarations must form a hierarchy.

� First arg in category declaration not an atom
See category declaration .

� First arg in external_grammar declaration not an atom
See external grammar declaration .

� First arg in feature declaration not an atom
See feature declaration .

� First arg in feature_value_space declaration not an atom
See feature value space declaration

� First arg in top_level_category declaration not an atom
See top level category declaration .

� First arg in top_level_category declaration not declared as category
See top level category declaration .

� Following atoms in feature_instantiation_schedule declaration not declared as features...
See feature instantiation schedule declaration .

� Following features not listed in feature_instantiation_schedule declaration...
See feature instantiation schedule declaration .

� Gsem feature is meaningless except in head of rule
Since the gsem feature corresponds to global slot-filling, it needs to be in the head.

� List in body of rule not allowed yet
If you want to extract elements from a list produced by an RHS category, use the unary GSL functions first, last and rest.

� Meaningless for top-level category to use sem feature. Probably gsem intended?
The sem feature corresponds to a return value, but a top-level category can only pass out information using global slot-filling (gsem).

� More than one feature_instantiation_schedule declaration
It is meaningless to have more than one feature_instantiation_schedule declaration .

� No top-level categories left at start of top-down filtering phase.
During the compilation process, the compiler performs "bottom-up filtering", and removes all categories which are irrelevant in terms of not being expandable to lexical entries. If all of the top-level categories, the grammar becomes equivalent to the null language. In practice, this usually means that you are have not added enough lexical entries yet.

� No top-level category defined.
There must be at least one top level category definition .

� Not meaningful to use GSL function <func> in body of rule
Unary and binary GSL functions can only be used in the LHS of a rule.

� Second arg in category declaration not a list
See category declaration .

� Second arg in external_grammar declaration not an atom
See external grammar declaration .

� Second arg in feature declaration not an atom
See feature declaration .

� Second arg in feature declaration not declared as feature value space
See feature declaration .

� Second arg in feature_value_space declaration not a list of lists of category values
See feature value space declaration .

� Semantic variable assigned value in body of rule
GSL provides no mechanism to check a semantic value, so it is meaningless to give semantic variables specific values on the RHS.

� Semantic variable in rule head doesn't appear in body
The point of having a semantic variable in the rule head is that it should get its value from another occurrence of the same variable on the RHS. Note that the occurrence in the body must be a semantic value - using the variable as a syntactic value is not permitted.

� Semantic variable occurs twice in body of rule
Since there is no way to check the value of a semantic variable, it makes no sense to include one twice on the RHS.

� "specialises" declaration must be of form specialises(Val1, Val2, Space)
See specialises declaration.

� Third arg in "feature_value_space_substitution" declaration not an atom
See feature value space substitution declaration .

� Third arg in "ignore_specialises" declaration not an atom
See ignore specialises declaration .

� Third arg in "ignore_specialises" declaration not declared as feature value space
See ignore specialises declaration .

� Third arg in "specialises" declaration not an atom
See specialises declaration .

� Top-level category may not have syntactic features
Since a top-level category has to have external significance to Nuance, it may not have syntactic features.

� Unable to combine feature_value_space declarations...
See feature value space declaration .

� Unable to internalise regulus declaration
Something unspecific is wrong with this declaration.

� Unable to interpret <file1> (included in <file2>) as the name of a readable file with .regulus extension
See include statements .

� Unable to interpret <file> as the name of a readable file with .regulus extension
See compiling Regulus grammars into GSL .

� Undeclared feature(s) <feats> in category <cat>
All features must be declared using a feature declaration .

� Undeclared features in category declaration: <decl>
All features must be declared using a feature declaration .

� Value of gsem feature can only be a feature/value list.
Since gsem translates to global slot-filling, its value must be a feature/value list.

� Variable used as file name
File names must be instantiated.

Using the RegServer

The RegServer is a C++ application that provides a simple Prolog-compatible interface to a Regulus grammar. In effect, it lets the developer access Nuance speech functionality from a Prolog program as though the low-level speech input and output calls were Prolog predicates. Communication between the Prolog program and the C++ app is through a simple socket-based interface.

The rest of the section is organised as follows:

� Installing the RegServer

� Running the RegServer

� Interfacing the RegServer to a Prolog application

� Interfacing the RegServer to a Java application

� Sample RegServer applications

Installing the RegServer

All files are in the directory $REGULUS/RegulusSpeechServer.

If you are running under Windows, you do not need to do anything more once you have unpacked the Regulus directory; simply invoke the RegServer exe file, as described below.

If you are running in some other environment, or you want to recompile the SpeechServer for some reason, the C++ source code is in the subdirectory C_src. If you are using Visual C++, the SpeechServer .dsp and .dsw files are in the directory VC++.

Running the RegServer

The executable for the RegServer app is in the file runtime/regserver.exe. Usage is as follows:

c:\home\speech\Regulus\RegulusSpeechServer\runtime\regserver.exe \
        -package <package dir> \
        [nuance parameters] \
        [-port <tcp port the server listens for connection - default is 1974>] \
        [-v] \
        [-f <log file>]

In order to run the RegServer on a Regulus grammar, you need to do the following:

� Compile the Regulus grammar to a GSL grammar.

� Compile the GSL grammar into a recognition package <package>, as described in the Nuance documentation.

� Start a Nuance license manager, as described in the Nuance documentation.

� Start a recserver on <package>, as described in the Nuance documentation.

� Finally, invoke the RegServer executable, specifying <package> as the package.

An invocation of the RegServer from the command-line needs to specify at least the following parameters:

� A port

� A Nuance recognition package derived from a Regulus grammar

� audio.Provider and other Nuance parameter, if any: please consult the Nuance documentation.

Usually, you will also want to supply a parameter which specifies a port for a TTS engine. A typical invocation looks like this:

%REGULUS%\RegulusSpeechServer\runtime\regserver.exe -port 1975 -package ..\GeneratedFiles\recogniser client.TTSAddresses=localhost:32323 -f C:/tmp/regserver_log.txt

Interfacing the RegServer to a Prolog application

A Prolog program can communicate with the SpeechServer using the following predicates, all defined in the file Prolog/regulus_sockettalk.pl:

� regulus_sockettalk_init/1

� regulus_sockettalk_exit_client/0

� regulus_sockettalk_exit_server/0

� regulus_sockettalk_say_file/1

� regulus_sockettalk_say_tts/1

� regulus_sockettalk_say_list/1

� regulus_sockettalk_set_output_volume/1

� regulus_sockettalk_set_parameter/2

� regulus_sockettalk_get_parameter/2

� regulus_sockettalk_recognise/2

� regulus_sockettalk_recognise_file/3

� regulus_sockettalk_interpret/3

regulus_sockettalk_init(+Port)

Conditions

Port is a port number.

Effect

Initialise socket connection; call before invoking any of the other calls.

Example

regulus_sockettalk_init(1975)

regulus_sockettalk_exit_client

Conditions

None

Effect

Closes connection to regserver.

Example

regulus_sockettalk_exit_client

regulus_sockettalk_exit_server

Conditions

None

Effect

Exits regserver.

Example

regulus_sockettalk_exit_server

regulus_sockettalk_say_file(+File)

Conditions

File is an atom whose print name is the name of a .wav file in the current RegServer prompt directory

Effect

The wavfile File is appended to the prompt queue

Example

regulus_sockettalk_say_file(hello)

regulus_sockettalk_say_tts(+String)

Conditions

String is a Prolog string

Effect

A request to say String using TTS is appended to the prompt queue

Example

regulus_sockettalk_say_string("hello world")

regulus_sockettalk_say_list(+ItemList)

Conditions

ItemList is a list of items of form either

� file(FileAtom) where FileAtom is an atom representing a wavfile.

� tts(StringAtom) where StringAtom is an atom representing a string.

Effect

An ordered list of output requests to play wavfile and/or perform TTS is appended to the prompt queue.

Example

regulus_sockettalk_say_list([file('hello.wav'), file('world.wav'), tts('OK, did that')])

regulus_sockettalk_set_output_volume(+Number)

Conditions

Number is an integer

Effect

A request to set the output volume to Number is sent to the server

Example

regulus_sockettalk_set_output_volume(255)

regulus_sockettalk_set_parameter(+ParamName, +Value)

Conditions

ParamName is an atom
Value is an atom or number

Effect

A request to set Param to Value is sent to the server

Example

regulus_sockettalk_set_param('audio.OutputVolume', 255)

regulus_sockettalk_get_parameter(+ParamName, -Value)

Conditions

ParamName is an atom

Effect

A request to get Param is sent to the server. Value is unified with the result, which can be either an integer, a float or an atom.

Example

regulus_sockettalk_get_param('audio.OutputVolume', Volume)

regulus_sockettalk_recognise(+Grammar, -Response)

regulus_sockettalk_recognise_file(+Wavfile, +GrammarName, -Result)

regulus_sockettalk_interpret(+StringAtom, +GrammarName, -Result)

Conditions

Grammar is a Prolog atom representing a top-level grammar present in the current recognition package.

Wavfile is an atom representing a wavfile

StringAtom is an atom representing a text string

Effect

The process sends a recognition request to the server, using the top-level grammar Grammar.
For regulus_sockettalk_recognise_file/3, recognition is performed on the designated wavfile.
For regulus_sockettalk_interpret, the designated string is parsed using the specified grammar.

The response can be one of the following:

� recognition_succeeded(Confidence, Words, Result) where

o Confidence is the Nuance confidence score ;

o Words are the recognised words, expressed as a Prolog atom ;

o Result is a Regulus semantic expression, expressed as a Prolog term.

� recognition_failed(FailureType) where FailureType is a Prolog term.

Example

regulus_sockettalk_recognise('.MAIN', Result)
regulus_sockettalk_recognise_file('C:/tmp/utt1.wav', '.MAIN', Result)
regulus_sockettalk_interpret("switch on the light', '.MAIN', Result)

Interfacing the RegServer to a Java application

There is a Java client library which provides functionality similar to that of the Prolog library. You can use this to construct Java-based applications that use the RegServer. Documentation is available in the file regServer.html, in this directory.

Sample RegServer applications

Sample Prolog-based dialogue application

There is a sample Prolog-based RegServer dialogue application in $REGULUS/Examples/Toy1/Prolog/toy1_app.pl. You can run this application as follows:

1. Compile the Toy1 recognition package by executing a 'make' in the directory $REGULUS/Examples/Toy1/scripts.

2. Start a Nuance license manager.

3. Start a Nuance recserver by invoking the script $REGULUS/Examples/Toy1/scripts/run_recserver.bat.

4. Start an English Nuance Vocalizer by invoking the script $REGULUS/Examples/Toy1/scripts/run_vocalizer3.bat.

5. Start a RegServer by invoking the script $REGULUS/Examples/Toy1/scripts/run_regserver.bat.

6. Start the top-level app by invoking the script $REGULUS/Examples/Toy1/scripts/run_app.bat.

Sample Prolog-based speech translation application

There is a sample Prolog-based RegServer dialogue application in $REGULUS/Examples/Toy1/Prolog/toy1_slt_app.pl. You can run this application as follows:

1. Compile the Toy1 recognition package by executing a 'make' in the directory $REGULUS/Examples/Toy1/scripts.

2. Start a Nuance license manager.

3. Start a Nuance recserver by invoking the script $REGULUS/Examples/Toy1/scripts/run_recserver.bat.

4. Start a French Nuance Vocalizer by invoking the script $REGULUS/Examples/Toy1/scripts/run_vocalizer3_fre.bat.

5. Start a RegServer by invoking the script $REGULUS/Examples/Toy1/scripts/run_regserver.bat.

6. Start the top-level app by invoking the script $REGULUS/Examples/Toy1/scripts/run_slt_app.bat.

Sample Java-based dialogue application

[Not yet documented]

Using Regulus

Frequently asked questions

How to install and run Regulus

Installing Regulus

Setting up the Regulus development environment

Using the Regulus development environment

HELP

LOAD

DCG

LC

STEPPER

Invoke the grammar stepper.

NORMAL_PROCESSING

NUANCE

GEMINI

TRACE

NOTRACE

TRANSLATE

DIALOGUE

INTERLINGUA

NO_INTERLINGUA

LOAD_TRANSLATE

TRANSLATE_TRACE_ON

Switch on translation tracing.

TRANSLATE_TRACE_OFF

Switch off translation tracing.

COMPILE_ELLIPSIS_PATTERNS

TRANSLATE_CORPUS

TRANSLATE_CORPUS <Arg>

UPDATE_TRANSLATION_JUDGEMENTS

UPDATE_TRANSLATION_JUDGEMENTS <Arg>

TRANSLATE_SPEECH_CORPUS

TRANSLATE_SPEECH_CORPUS <Arg>

TRANSLATE_SPEECH_CORPUS_AGAIN

TRANSLATE_SPEECH_CORPUS_AGAIN <Arg>

UPDATE_TRANSLATION_JUDGEMENTS_SPEECH

UPDATE_TRANSLATION_JUDGEMENTS_SPEECH <Arg>

UPDATE_RECOGNITION_JUDGEMENTS

UPDATE_RECOGNITION_JUDGEMENTS <Arg>

SPLIT_SPEECH_CORPUS <GrammarName> <InCoverageId> <OutOfCoverageId>

LOAD_GENERATION

LOAD_GENERATION <Arg>

GENERATION

LOAD_SURFACE_PATTERNS

SURFACE

LOAD_DIALOGUE

Compile the files defined by the dialogue_files config file entry. INIT_DIALOGUE

EBL_TREEBANK

EBL_TRAIN

EBL_POSTPROCESS

EBL_LOAD

EBL_LOAD_GENERATION

EBL_LOAD_GENERATION <SubdomainTag>

EBL_NUANCE

EBL_GEMINI

EBL_ANALYSIS

EBL

EBL_GENERATION

UPDATE_DIALOGUE_JUDGEMENTS

UPDATE_DIALOGUE_JUDGEMENTS <Arg>

UPDATE_DIALOGUE_JUDGEMENTS_SPEECH

UPDATE_DIALOGUE_JUDGEMENTS_SPEECH <Arg>

Parsing utterances in the Regulus development environment

Parsing non-top constituents

Using the grammar stepper

The config file

Using Regulus in batch mode

Compiling Regulus to Nuance from the command line

Running the development environment with speech input

Calling the Regulus parser from Prolog

Regulus grammar examples

Toy0 (Regulus version)

Toy0 (GSL version)

Toy1 (Regulus version)

Toy1 (GSL version)

Building recognisers using grammar specialisation

The general English grammar

Building on top of the general English grammar

Lexicon macros

Invoking the grammar specialiser

Compile the files defined by the dialogue_files config file entry.

INIT_DIALOGUE

FEATURE DECLARATION

Syntax