Specification

From OpenFormula

OpenFormula Format for Office Applications (OpenFormula)
Rough Draft 2006-02-21

Location: http://www.openformula.org

Abstract: This is the specification of the OpenFormula Format, an open format for exchanging recalculated formulas in office documents, particularly for spreadsheets. This is intended for use as a supporting document to the Open Document Format for Office Applications (OpenDocument) format.

Status: This document is a draft, and will be updated periodically.

License: See the License Terms section for licensing, and Rules for Contributors that explains the requirements placed on contributors. We hope that a future version of this specification will used as a base document by a formal standards group."

Note: You are viewing the annotated version of this specification, which includes notes, rationales, discussion items, and TBD markers. The annotations are not normative, but may be very helpful to specification users.

Contents

Scope

OpenFormula is an open format for exchanging recalculated formulas between office application implementations, particularly for spreadsheets. OpenFormula defines the types, syntax, and semantics for calculated formulas, including many predefined functions and operations, so that formulas can be exchanged between applications and produce substantively equal outputs when recalculated with equal inputs. Both closed and open source software can implement OpenFormula.

OpenFormula is intended to be a supporting document to the Open Document Format for Office Applications (OpenDocument) format, particularly for defining its attributes table:formula and text:formula. It can be used in other circumstances where a simple, easy-to-read infix notation is desired for exchanging recalculated formulas.

OpenFormula does not define:

  • the user interface. User interfaces may use different syntaxes, different function names, and/or different parameter orders.
  • the display format or controls over it.
  • internal representations.
  • a general notation for mathematics.
  • a full-fledged programming language. OpenFormula formulas calculate a single result and return it. By design, most operations and functions are free of side-effects, and it is not possible to "loop forever".

Note: This specification is derived from actual practice in industry. It was especially influenced by the OpenOffice.org syntax and by the semantics of Microsoft Excel, but many other spreadsheet implementations were considered including (alphabetically) Corel's WordPerfect suite, Document To Go's Sheet to Go, GNOME Gnumeric, IBM/Lotus 1-2-3, and KOffice's KSpread.

Implementations use OpenFormula as a method to exchange formulas. Once an implementation reads a formula in this format, it may choose to represent the formula internally in an arbitrary way (such as a bytecode sequence, compiled machine code, or a tree of nodes). Formulas are executed whenever they are needed to compute a specific result.

OpenFormula is not a full-fledged programming language. It is specifically designed to describe how to calculate a single result, which it returns as its answer. OpenFormula is, in general, side-effect-free; it does not have built-in operations to "assign" a value, and unless otherwise noted its functions do not have side-effects. Built-in functions, when passed the same values, generally produce the same result (with the notable exception of random number functions like RAND()). It is designed so that it is not possible to describe a calculation in a single function that can "loop forever." By design, all formula calculations that only reference built-in functions must eventually end (though this may take an extremely long time for a few computationally-intense functions). However, endless loops can be created indirectly -- formulas may reference values (such as table cells) which in turn depend on the value being computed by the formula. Implementations typically handle such cycles specially, either forbidding them or requiring a special setting before allowing them. For example, implementations of OpenDocument normally forbid circular references unless the table:iteration element is set, and if set, iterations are allowed up to a specified maximum number of iterations; for more information see OpenDocument section 8.5.2.

Normative References


TBD: IEEE floating-point, though use the ISO reference.

Note: This will be short. We do not plan to have a "Terms, definitions, and symbols" section following it for now, though that could change if helpful.

Document Notation and Conventions

Within this specification, the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are to be interpreted as described in IETF RFC 2119 if they appear in uppercase letters.

Information within a Rationale:, Note:, or Discussion: section is not normative:

  • Rationale explains why a part of the specification is defined that way.
  • Notes provide other information, such as tips for implementation or information on what various existing implementations do.
  • Discussion describes the pros and cons for a decision yet to be made.

It is expected that there will eventually be an unannotated version of this document, without this information, and an annotated version with this information.

Rationale: Throughout the document rationales are included to prevent questions from being asked repeatedly during development and to aid implementors in avoiding common mistakes. Much discussion about existing implementations is included; where practical, existing implementations should help guide any standard, and these additional notes should help ensure that this is true.

This specification defines OpenFormula formulas in terms of a canonical text representation used for exchange. Typically OpenFormula formulas are included inside XML documents; in such cases, certain characters must be escaped as defined in the XML specification (e.g., & must be represented as & or similar). These XML documents are often compressed (e.g., using zip). These are outside the scope of OpenFormula.

In English the plural of formula can be "formulas" or "formulae"; this document uses the more common plural in English, "formulas".

Rationale: Both the New York Times "Everyday Dictionary" (1982) and Merriam-Webster's "Webster's Ninth New Collegiate Dictionary" (1983) declare that both "formulas" and "formulae" are acceptable. A Google search on 2005-12-30 found 24,800,000 pages with the term "formulas" as opposed to 6,810,000 for "formulae". Please change all uses of the term "formulas" to "formulae" when translating this specification into Latin.

Conformance

Levels

A "level" describes the amount of functionality provided by an OpenFormula implementation or required by a given file containing formulas. There are four levels, where each level is a strict superset of lower levels:

  • Level 1: Minimum implementation. This level provides the minimal capabilities (including functions, types, and their meanings) that are very widely implemented with spreadsheet applications, even in resource-contrained environments. It includes around 100 functions.
  • Level 2: Minimum desktop implementation. This level provides the minimal capabilities expected from typical desktop spreadsheets. Many current desktop implementations meet or nearly meet this level.
  • Level 3: Basic implementation. This level provides all the capabilities necessary for typical desktop spreadsheet use. Many current implementations meet or nearly meet this level. This adds a distinguished logical type and support for complex numbers.
  • Level 4: Full implementation. This level provides some additional, less-commonly used features intended for advanced users.

A conforming application MUST report the highest level it conforms to, if it claims conformance to this specification. Unless stated otherwise, all requirements of this specification apply to any application that claims to meet the OpenFormula specification. In particular, all conforming applications MUST meet the level 1 requirements to claim conformance to this specification. Applications MAY accept extensions as long as they do not interfere with processing data in the format defined here. In particular, applications MAY support additional formulas and operations, additional optional parameters for functions, or make certain function parameters optional when they are required by this specification. It is RECOMMENDED that applications clearly document their extensions in all user documentation, both online and paper, in a manner that users would be very likely to be aware when they are using a non-standard extension. Applications MAY partly implement a higher level of the specification.

This specification's text is written as a description of the requirements of an implementing application. However, documents can also comply (or fail to comply) with this specification. A conforming document MUST be correctly recalculable given only the specification given here. To claim conformance, a conforming document MUST report the lowest level it conforms to. A document with no formulas is notionally given level 0. Note that some applications MAY be able to read and write some documents of a higher level than their compliance level, because applications may partially implement higher levels.

Rationale: Here are some of the drivers for the various levels:
  • Level 1 is driven by PDAs, cell phones, etc. Level 1 represents the capabilities and functions implemented by such diverse applications as Sheet To Go (which runs on PalmOS PDAs), OpenOffice.org 2 Calc, and Microsoft Excel 2003. In general, if any implementation did not implement a particular function or semantic, that requirement was moved to a higher level.
  • Level 2 is a "majority intersection" of the capabilities in common desktop spreadsheet programs (e.g., most spreadsheet implementations for the desktop provide those capabilities), considering implementations such as Microsoft Excel 2003, Corel Word Perfect Office 12, IBM/Lotus 1-2-3 version 9.8, OpenOffice.org Calc, and Gnumeric.
  • Level 3 is driven by what is needed to be compatible with market-leading spreadsheet applications, particularly Microsoft Excel 2003. Here additional semantics are required.
  • Level 4 is almost the union of capabilities of common desktop spreadsheet programs.

Having levels of compliance lets implementations quickly achieve some level of compliance that is sufficient for many user needs, yet be able to distinguish themselves if they provide an especially rich set of functionality. In particular, it is expected that many implementations will meet level 1 out of the box (once they support the file exchange and syntax). Level 2 gives implementations a relatively easy goal (if they don't already implement it). Level 4 is a steeper goal, but even if spreadsheet implementations that do not implement everything in level 4 can implement a subset with the same semantics -- and thus be able to exchange that information with other programs that implement the same subset.

Here are some of the known weaknesses of current spreadsheet implementations as compared to level 3:

  • Microsoft Excel's AND and OR functions do not work correctly in array formulas (this is a known bug). It is expected that if Microsoft changed this, it will not affect any actual spreadsheets, because the current semantic of Excel is essentially unusable in this case. In particular, current books recommend using * and + to try to work around this inside array formulas.
  • KSpread's MOD function handles negative numbers differently than other spreadsheets, and KSpread 1.4.2 incorrectly computes VALUE (e.g., VALUE("6") is 0 instead of 6).

Note that at level 4, some built-in operations will be required to correctly support complex numbers. Excel 2002 cannot correctly use "*" to multiple complex numbers, so a concession to allow this behavior at lower levels will probably be needed, but at the upper levels handling "*" (etc.) correctly should be required.

For more about the differences between Excel and Lotus 1-2-3, see Calculation differences between Microsoft Excel and Lotus 1-2-3 formulas. That document has many errors in it, however: Excel does not ALWAYS treat text as an error in a Number context, in fact, if given A4+A5, and A4 is the text "100", Excel 2002 will silently convert the text to a number. The note does report that "Excel 2000 and later versions contain functions for compatibility with Lotus 1-2-3 Release 4.0 and later. The "A" functions— AVERAGEA, MAXA, MINA, STDEVA, STDEVPA, VARA, and VARPA— calculate results by using all of the cells in a range (including blank cells), cells that contain text, and cells that contain the logical values TRUE or FALSE."

TBD: How should we address complex numbers? Excel can handle them, but only when the Analysis Toolpak is installed (by going to Tools/Add-ins). Gnumeric can handle them, OOo2 cannot. They presumably should NOT be required for level 1 since not everything implements them, and they should probably be part of level 3. But what about level 2? One serious problem with complex numbers is that Excel's approach to Complex numbers is incredibly convoluted. Instead of creating a new type (Complex) and allowing functions to handle the new type, you have to use a whole new set of functions, even for operators that are normally expressed with infix notation. Thus, to compute e^PI*i, instead of saying EXP(PI*i), you have to say IMEXP(IMPRODUCT(PI(),COMPLEX(0,1))). Not all functions have an IMxyz() equivalent, making complex number use even more complicated. Do we really want to cast this extremely awkward notation in stone? One option is to note that implementations have to support IMPRODUCT(), etc., but recommend implementations also support "*" etc. for products?

Semantic Variances

Historically, spreadsheets have been influenced by three market leaders: VisiCorp's VisiCalc, Lotus 1-2-3, and Microsoft Excel. Excel's default semantics have some minor "semantic" differences with Lotus 1-2-3, and different implementations tend to follow one or the other approaches.

Level 1 defines the common set of semantics between various implementations, some of whose semantics are based on Lotus 1-2-3's while others are like Excel's. Level 3 requires a specific semantic (the Excel default semantics). In practice, for nearly all spreadsheets the differences are irrelevant. The primary variances permitted at lower levels are as follows:

  • Some conversions between types are not automatic. In particular, text in a cell may need to be converted by an explicit function before being used as a number (e.g., by VALUE) or date (e.g., by DATEVALUE). Creators of spreadsheet files MUST use the VALUE or DATEVALUE functions if they to convert from text to a number and have maximum portability. At level 3 or higher, such conversions are always automatic.
  • There may not a distinguishable Logical type; instead, logical values may be represented by the values 1 (True) and 0 (False). At level 3 or higher, there is always a distinct logical type.
  • The "=" operator between text strings may or may not be case-sensitive. At level 3 and higher, it is case-insensitive.
  • Some Lotus-like applications have string index positions starting from 0; in this specification, string index positions start from 1. Users of applications with string index positions starting from 0 may have 1 added or subtracted in functions that use index positions, and when they export, the 1 will be added or subtracted as appropriate.
  • Database criteria for simple text values may be treated differently. Some (Lotus 1-2-3 like) may require a match of the entire string; others (Excel like) will match any cell that begins with the text. At level 3 or higher, it will match any cell that begins with the text. Also, some Criteria match patterns (such as the pattern matching language for text) may differ.
  • Some functions may have additional parameters that are required or forbidden at lower levels (see the function definitions).

For more details, see the rest of the specification.

Test Cases

To reduce the risk of misunderstanding a requirement, and to increase the likelihood of compliance, this specification includes a large number of test cases that are NORMATIVE (that is, they are part of the specification). In particular, every function has at least one test case, and typically many test cases. An implementation MUST pass all of the test cases at or below a level to claim that level (unless otherwise specifically noted in the text). No set of test cases can be complete, so implementations MUST meet all the requirements of this specification of a given level, even if there is no specific test for some aspect of it.

A very few test cases depend on a specific locale. In these cases, the presumed locale is "en_US" (American English). This does not imply that any locale is "better" than another; this is simply done to reduce the costs of checking compliance, and to be sure that at least one locale works correctly. Implementations SHOULD support a large range of locales. In general, test cases have been defined to be locale-independent where practical to do so.

Test cases are defined by a table. For each test case, the following are defined:

  • Expression: The expression being computed for that test case, in OpenFormula OpenDocument format. In most cases, an effort has been made to write expressions so that the results have a finite representation in both base 2 and base 10 (e.g., integers or fractions of powers of 2) to simplify comparisons (with the exception of test cases specifically to test representations). For example, many test expressions involving trigonometric functions involve PI(), so that the results are easier to compare. The expression does not display any necessary XML encoding when it is stored in an XML document; thus an expression will show "<", not "&lt;".
  • Result: The required result of the expression in a compliant implementation of OpenFormula. This is one of the following:
    • A Logical Value: The result can be True or False.
    • A Text Value: The result begins with a double-quote, contains the correct text, and ends with a double-quote.
    • A Number: A number, which begins with a digit or a minus sign.
    • An Error: The result can be Error (any error value) or more specifically NA. This is the same as surrounding the expression with the ISERROR or ISNA function (respectively) and comparing that result with True.
  • Level: OpenFormula functional level at which this test result is required. 1 means all implementations MUST produce this result.

Note: Test cases are designed to be automatic and only test one capability at a time, where possible. For example:
  • For example, the DATE() function is used to generate most date values, since it's portable everywhere. Lotus 1-2-3 has a DATEVALUE(), but its DATEVALUE() does not accept ISO 8601 format dates.
  • Many applications interpret all text values as 0 when in a number context, including Lotus 1-2-3, Quattro Pro, and KSpread. Tests for conversions are in the conversion sections; the rest of the test cases are designed to not require conversion of text to a non-number.
  • Floating-point values (including time values) are inexact in most implementations. Where this can be an issue, test cases subtract the expected value and use ABS() to ensure that the result is correct within a reasonable range.

Test Case Data Set

In the test cases, the following spreadsheet values are presumed:

  B C
3 ="7"
4 =2 4
5 =3 5
6 =1=1 7
7 ="Hello" 2005-01-31
8 2006-01-31
9 =1/0 02:00:00
10 =0 23:00:00
11 3 5
12 4 6
13 2005-01-31T01:00:00 8

Note that B6 has the logical value TRUE(), B8 is a blank cell, and B9 is an error. C7 and C8 are dates, C9 and C10 are times, and B13 is a datetime value.

The following is a trial test database; this range is assigned the name TESTDB:

  A B C D E F G H I
18 TestID Constellation Bright Stars Northern Abbrev Decl Next South Date Rev
19 1 Cancer 0 TRUE Cnc 20 =B20 12 Mar 2005 13
20 =[.A19]*2 Canis Major 5 FALSE Cma 5 =B27 3 Feb 2002 12
21 =[.A20]*2 Canis Minor 2 TRUE Cmi -20 = B24 8 Mar 2005 11
22 =[.A21]*2 Carina 5 FALSE Car -60 27 Mar 1991 10
23 =[.A22]*2 Draco 3 TRUE Dra 75 =B31 5 Jul 1967 9
24 =[.A23]*2 Eridanus 4 FALSE Eri -29 =B29 23 Dec 1912 8
25 =[.A24]*2 Gemini 4 TRUE Gem 20 =B19 6 Feb 1992 7
26 =[.A25]*2 Hercules 0 TRUE Her 30 =B25 4 Jul 1934 6
27 =[.A26]*2 Orion 8 TRUE Ori 5 =B21 8 Jan 1909 5
28 =[.A27]*2 Phoenix 1 FALSE Phe -50 =B22 28 Nov 1989 4
29 =[.A28]*2 Scorpio 9 FALSE Sco -40 =B28 22 Feb 2000 3
30 =[.A29]*2 Ursa Major 6 TRUE Uma 55.38 =B26 29 Mar 2004 2
31 =[.A30]*2 Ursa Minor 2 TRUE Umi 70 =B30 13 Jul 1946 1

Notes:

  • The TRUE and FALSE values are actually =1=1 and =1=0 respectively (for maximum compatibility).
  • Some of the dates are in the future. No dates are before 1900, for maximum compatibility.
  • The last column is there solely to test reverse searches.
  • The "Next south" field is for testing references. Notice that one of the references is blank.

The following are test criteria, for extracting data from the database:

  B C D E F G H I
36 Bright Stars Northern Constellation Decl Rev Bright Stars Date
37 4 TRUE Ursa Major < 0 <= 7 < 4 >1950-01-01
38 <2 TRUE Constellation >= 8 CONSTELLATION
39 >1 FALSE Ursa URSA MAJOR

Basic Concepts


Namespaces

In XML representations, the use of OpenFormula (and the version of OpenFormula) is specified using XML namespaces.

Implementations using OpenFormula MAY begin formulas with a "formula specification selector" that identifies which formula specification to use. This marking may be as a separate attribute, or a prefix before the actual formula. As described in the OpenDocument profile, in the OpenDocument attributes table:formula and text:formula formulas MUST be preceded with a namespace prefix (a series of one or more letters) followed by ":", or by a single "=". OpenDocument implementations SHOULD treat such formulas beginning with a "=" as though they used the selector "formula:". The formula specification selector is NOT considered part of the formula.

The selector is then defined to be the namespace of a version of the OpenFormula specification. Implementations MAY accept formula syntaxes other than OpenFormula, and they MAY accept various compatible extensions to the default OpenFormula syntax. However, all implementations that claim to accept OpenFormula formulas MUST accept the minimum formula syntax and semantics as described in this specification, when processing formulas whose selectors reference a namespace of the OpenFormula specification.

For this version of the OpenFormula specification, using ODF formula syntax, the namespace is:

http://www.openformula.org/spec/1.0/odf

Rationale: This provides additional flexibility so that other languages can be used when desired. The selector is not considered part of OpenFormula itself, so that specifications that use OpenFormula can choose how to best include this selector.

TBD: Need to Rewrite this section - Do we need a formula: and oooc: namespace? Everyone needs to read oooc: and formula:, if nothing said, default is formula:. Don't imply that oooc: and formula: are equal where unspecified, and we may have to identify differences (remains to be seen).

Note: OpenOffice.org typically outputs spreadsheet formulas using oooc: as the prefix. OOo2 cannot read files with the OpenFormula namespace specification, so early test sheets used the OOo2 Calc namespace instead.

Formula Processing

Implementations of OpenFormula recalculate formulas when its information is needed. Typical implementations will note what values a formula depends on, and when those dependent values are changed, it will re-execute the formulas that depend on them to produce the new results (choosing the formulas in the right order based on their dependencies). Implementations may recalculate when a value changes (this is termed automatic recalculation) or on user command (this is termed manual recalculation).

Some functions' dependencies are difficult to determine and/or should be recalculated more frequently. These include functions that return today's date or time, random number generator functions (such as RAND), or ones that indirectly determine the cells to act on. Many implementations always recalculate formulas including such functions whenever a recalculation occurs. Functions that are always recalculated whenever a recalculation occurs are termed volatile functions. Functions that are often volatile functions include AREAS, CELL, COLUMNS, INDEX, INDIRECT, NOW, OFFSET, RAND, ROWS, and TODAY.

Note: For more on volatile functions, see Walkenbach's "Microsoft Excel 2003 Formulas" page 109, or the equivalent section in "Microsoft Excel 2000 Formulas", page 108.

This specification does not require either finite or infinite representations of numbers. Most implementations use finite representations (typically 64-bit floating point representations, with intermediate values represented using 80 bits). However, some effort has been made so that implementations may use alternative representations for numbers, such as arbitrary-length strings and/or fractions.

When a formula is computed, it is notionally provided a "context" as input. The context may include formula variables (including named ranges, document variables, fields, and so on), and/or additional function definitions that the formula can call. A formula may also be provided as input an ordered list of zero or more parameters (though the syntax for parameters is not given in this version). A formula may include calls to functions, which are normally provided the same context but with their own set of ordered parameters. Any formula computes a single result (though that single result may actually be a set of values).

Note that many formula creators are not sophisticated in their understanding of how computers determine their results. Most, for example, do not understand computer floating point arithmetic models, and have no idea that many implementations use a base other than 10 (or what that would mean). Most implementations use finite representations, yet many users lack an understanding of how that impacts them. This specification includes some requirements (especially at higher levels) so that unsophisticated users are more likely to get the results they intended. For example, at level 2 the "=" operator MUST ignore case differences, because many unsophisticated users do not notice case differences. At level 3, text placed in a number context MUST be converted to a number where possible, because many users have a poor understanding of data types (and will be unable to perceive a difference between the text value 3 and the number value 3). However, spreadsheets can be created by more sophisticated users, and this specification includes capabilities to support these more sophisticated users.

TBD: Originally the text said this: As a result, some efforts have been expended to try to make formulas produce the "expected answer" for unsophisticated users. For example, the equal-to operator for numbers matches imprecisely, because many users do not understand that (1/3)*3 on most implementations will produce a value close to one, but not precisely equal to one. Originally there was a test to ensure that (1/3)*3 was equal to 1. The Gnumeric developers objected, on the grounds that requiring that equality be "sloppy" made it very difficult for sophisticated users to use spreadsheets to their full capabilities.

It is RECOMMENDED that implementations avoid imposing arbitrary limits. In any case, applications MUST support formulas up to at least 1024 characters long (as measured when in ODF interchange format) and at least 30 parameters per function when the function prototype permits a list of parameters. This implies that portable spreadsheet files MUST stay within these limits to be truly portable.

Rationale: Excel 2003 supports 1,024 characters per formula, per Walkenbach "Excel 2003 Formulas" page 33. This measures a slightly different value than shown here, since the representation is similar but not identical. Excel supports 30 parameters, according to OpenOffice.org's documentation on the .xls file format].


Test Cases:

Expression Result Level Comment
=SUM([.B4];[.B5];[.B4];[.B5];[.B4];[.B5]; [.B4];[.B5];[.B4];[.B5];[.B4];[.B5]; [.B4];[.B5];[.B4];[.B5];[.B4];[.B5]; [.B4];[.B5];[.B4];[.B5];[.B4];[.B5]; [.B4];[.B5];[.B4];[.B5];[.B4];[.B5]) 75 1 Functions must be able to take 30 parameters.
=[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+ [.B5]+[.B4]+[.B5]+[.B4]+[.B5]+[.B4]+[.B5]+ [.B4]+[.B5]+[.B4]+[.B5] +[.B4]+[.B5] 420 1 Formulas can be up to 1024 characters long.


Types

OpenFormula expressions, including formulas, always produce a value. Any table cell that can be referenced by an expression may be either empty or have a value. If a table cell has a constant value, it is exchanged using the format used to exchange such values. For example, see the OpenDocument standard for information on how to exchange constant table cell values (including constant date and time values).

Any value has a basic type. In addition, many functions require a type or a set of types with special properties (for example, a "Database" requires headers that are the field names); these specialized types are called pseudotypes.

Basic Types

In OpenFormula a value MAY have one of the following basic types: Text (the string type), Number, Logical (the boolean type), Error, Reference, or Array. At lower levels the Logical type MAY be implemented as a subtype of Number, as described below. At higher levels complex numbers MUST be supported as described below. An implementation MAY provide other types, and it MAY have many specialized subtypes of these types. However, a document MUST only use constructs in this specification (including only these types) to conform to it.

TBD: Should "empty" also be a possible value result? This is needed at level 3 for empty parameters; it can also occur at level 2 with an intersection of non-overlapping references. For the moment, presume that it's a kind of reference (a reference with no cells as content). But are they distinguishable?

Text

A text value (also called a string value) is a sequence of zero or more characters. A cell with a label has this type, as do constants surrounded by double-quotes. Implementations SHOULD support Unicode strings, but MUST at least support strings of ASCII characters.

Some functions, such as FIND, refer to index positions inside a text value. The first position of a text value is numbered 1, not 0. In implementations that support Unicode, strings MAY be represented internally using a variable-width representation such as UTF-8 or UTF-16, but unless otherwise noted, any reference to a particular index inside a string MUST be by character not byte position. Implementations MUST maintain the illusion of fixed-width characters, even if they use variable-length encodings internally.

Note: Excel, Gnumeric, SheetToGo, OpenOffice.org, and KSpread all use 1-based positioning. Lotus1-2-3v9.8 and Quattro Pro 12 use 0-based positioning. It is impractical to stay silent on whether 0 or 1 is the first position -- too many functions (such as FIND) depend on this. So, the more common usage is chosen here. Lotus1-2-3 and Quattro Pro both have translation mechanisms that account for this difference already.

Note: We discuss type Text first, because complex numbers can be Text, Number, or their own different type at level 3.

TBD: Discuss handling special characters, like double-quote, in a string in formulas. Define the minimum length of text that MUST be supported - more than 2^16 characters?

Number

A number is simply a numeric value such as 0, -4.5, or $1000. Numbers MUST be able to represent fractional values (they MUST NOT be limited to only integers). The "number" type MAY be displayed in many different formats, including date, time, percentage, and currency.

Note: Typical implementations store numbers as 64-bit IEEE floating point internally and use the CPU's floating-point instructions where available (so intermediate values may actually have more than 64 bits to represent them). Implementations MAY choose to use fixed-point, exact (fractional) representations, arbitrary-length numbers, and so on to implement numbers. A cell with a constant numeric value has the number type. Typical implementations try to heuristically determine the "right" format for a cell when a formula is first created, based on the operations in the formula. However, users can override this format and the heuristic varies between implementations; thus this heuristic is outside the scope of this specification.

Implementations typically support many subtypes of Number, including date, time, datetime, percentages, fixed-point arithmetic, and arithmetic supporting arbitrarily long integers, and determine the display format from this. All such Number subtypes MUST yield True for the ISNUMBER() function. This specification does not require that specific subtypes be distinguishable from each other, or that the subtype be tracked, but in practice most implementations do such tracking because requiring users to manually format every cell appropriately quickly becomes tedious. This is especially important when generating OpenDocument format, since some subtypes (such as date, time, and currency) are stored in a different manner. Thus, this specification identifies some common subtypes and identifies those subtypes where relevant in function definition, as an aid to implementing good user interfaces. Many applications vary in the subtype produced when combining subtypes (e.g., what is the result when percentages are multiplied together), so unless otherwise noted these are unspecified.

Time

A time is represented as a fraction of a day. For example, 2 hours is considered to be 2/24. This can be considered a duration (e.g., 2:00 which is 2 hours) or a time of day (e.g., 2:00 which is 2AM). This specification does not distinguish between duration and time of day; a time of day is simply the duration of time since the beginning (midnight) of the day.

Date and DateTime

A Date is a subtype of number; the number is the number of days from a particular date called the epoch. Thus, a date when presented as a general-purpose number is also called a serial number. This specification does not specify the exact value of the epoch, but implementations MUST support all dates from January 1, 1904, through December 31, 9999 (inclusive). Portable spreadsheet files MUST NOT assume any particular epoch values. Since dates are simply numbers, they can be added, subtracted, and so on like other Numbers. Subtracting one date from another produces the number of days between the dates.

A Datetime is also a subtype of number, and for purposes of formulas it is simply the date plus the time of day.

Note: Excel for Windows usually uses 1/1/1900 as serial number 1, while Excel for Windows uses 1/1/1904 as serial number 1. "Excel 2000 in a Nutshell" page 330 discusses time storage in Excel, including this, and noting December 31, 9999 as a date both support.

Excel 2003 copies a bug from an old version of Lotus 1-2-3; both act as though 1900 was a leap year. Thus 1900-02-29 has the serial number 60, and all date calculations on or before that date are wrong by one day. This specification does not require copying this bug. See "Excel 2003 Formulas" page 143.

Excel 2003 is unable to deal with dates before January 1, 1900; again, there's no requirement that other implementations have this limitation. Implementations that wish to support a broader range of dates, yet also the same numbers for most dates, could do so by using negative numbers as dates before the epoch (be careful, because a time inside the day adds to the beginning of the date).

In OpenDocument Format a date, datetime, or time value in a cell is stored in a special locale-independent format based on ISO 8601; see the OpenDocument specification for more information. Implementations may choose to store dates in a special type that is distinguishable from other numbers. However, from the point of view of a formula, a date, datetime, or time value is simply a subtype of Number, and must follow the rules of this specification. Most countries use the Gregorian calendar and ISO 8601, but not all. Note that applications must be able to convert text, in a variety of formats, into date values.

TBD: In earlier times dates were dependent on the location of the event, which is not necessarily the current locale. In particular, different countries switched from Julian to Gregorian on different dates. This creates a challenge if it is desired to represent dates in formulas significantly before 1900. One solution is to use the "proleptic Gregorian" calendar, which is simply the current Gregorian calendar indefinitely extended in both directions of time. Python 2.4's date types use proleptic Gregorian, and points to Dershowitz and Reingold's book "Calendrical Calculations" for various means to convert that to other calendar systems. The advantage of proleptic Gregorian is that it is locale-independent, works well with ISO 8601, and there are defined ways to convert between it and other calendars. If the goal is just to store dates, and not compute differences, then it can easily represent arbitrary dates without complexity in the basic spreadsheet implementation. If conversions are needed, they can be embedded in spreadsheet formulas -- which is the right place to put them, because the current locale is often not the locale of the event, and only the person entering the data will know the correct locale.

Percentage

A percentage is a subtype of number which is typically displayed by multiplying the number by 100 and adding the sign "%". Thus, the value 0.50 if considered a percentage would typically be displayed as 50%.

Currency

A currency is a subtype of number which is typically displayed with a currency symbol.

Complex Number

A complex number (sometimes also called an imaginary number) is a pair of real numbers including a real part and an imaginary part. In mathematics, complex numbers are often written as x + iy, where x (the real part) and y (the imaginary part) are real numbers and i is \sqrt{-1}. A complex number can also be written as reiθ = rcos(θ) + irsin(θ), where r is the modulus of the complex number (a real number) and θ is the argument or phase (a real number representing an angle in radians). In many engineering fields the letter j is used instead of i; j is easier to distinguish from the digit 1. Complex numbers are widely used in mathematics and engineering, and applications of higher levels must support them.

Applications of level 3 and higher MUST support complex numbers, though this support MAY require user installation of optional plug-ins. Applications below that level MAY support complex numbers, again possibly requiring additional installation of components. A complex number MAY be a subtype of Number, a subtype of Text, or a different distinguishable type or subtype altogether. Thus, portable spreadsheet files MAY NOT assume any particular result for ISNUMBER or ISTEXT when applied to a complex number.

Many functions are designed to work with complex (imaginary) numbers, and these functions generally have prefix IM (e.g., IMSQRT, and so on). The function COMPLEX(real;imaginary) creates a single complex number from a given real and imaginary part. Note that functions and operators that accept complex numbers must accept Text values as complex numbers (see the section on convert to complex), as well as Numbers that are not complex numbers, so IMSUM("3i";4) will produce the same result as COMPLEX(4;3).

Spreadsheet files at level 3 that use complex numbers MUST use only functions that are explicitly noted as accepting or producing complex numbers (as appropriate). In particular, operators are NOT REQUIRED to support complex numbers at level 3 or below (including "+", "-", "*", "/", "^", "="), and implementations do not include such support. Instead, at level 3 the results of using an operator on a complex number are undefined (and often produce an Error). Thus, COMPLEX(2;3)+COMPLEX(4;5) is not defined at level 3. Level 3 spreadsheet files MUST use the complex number functions instead of operators to work portably. For example, the portable expression IMSUM(COMPLEX(2;3); COMPLEX(4;5)) adds the numbers 2+3i to 4+5i (resulting in 6+8i). Thus, level 3 spreadsheet files that use complex numbers MUST use the functions instead when a value may be a complex number:

  • IMSUM (not infix "+")
  • IMSUB (not infix "-")
  • IMPRODUCT (not infix "*")
  • IMDIV (not infix "/")
  • IMPOWER (not "^", but see its definition for limitations)

For prefix "-", IMSUB with the first value 0 may be used.

Query functions supported on complex numbers at level 3 are IMREAL (returns real part), IMAGINARY (returns imaginary part), IMABS (returns magnitude), IMARGUMENT (returns the angle expressed by the complex number). At level 3, the following additional functions are supported for complex numbers (sorted alphabetically): IMCONJUGATE, IMCOS, IMEXP, IMLN, IMLOG10, IMLOG2, IMSIN, and IMSQRT.

Note that since applications MAY use a Text type to represent complex numbers, the "=" operator MAY not work correctly when comparing directly to constant inline text. E.G., "3+2i" would not compare equal to "3+2.0i", since they are textual comparisons. A portable way to test equality is to use IMSUB to compute the difference, use IMABS to find the absolute difference, and then ensure the absolute difference is smaller than some value.

TBD: Is there a need for an IMEQUAL and IMNEQUAL, at least at level 4? This is a very awkward way to determine equality. Level 4 can't really solve this if at level 4 a text type MAY be used to represent complex numbers.

Applications at level 4 and higher MUST support complex numbers the infix operators "+", "-", "*", "/", and "^", the comparitive infix operators = (equal to) and <> (not equal to). All of these operators MUST support complex numbers on either or both sides. At level 4, the prefix operator "-" MUST also support complex numbers as well. Thus, applications at level 4 MUST accept expressions such as COMPLEX(2;3)+COMPLEX(4;5) and calculate them correctly. Since text is a permitted representation, this means that "3+4i"+"5+6i" MUST also be accepted at level 4 and higher (it would produce a value equal to COMPLEX(8;10)). At level 4, COS, EXP, LN, LOG10, LOG2, and SIN must also accept complex numbers (not requiring their IM version). Level 4 applications MUST also support the level 3 operations (such as IMSUM), so that level 3 spreadsheet files can be widely portable. At level 4, functions related to powers must still be handled specially. At level 4, SQRT of a negative number is still an error; users who want the square root of a negative number to produce a complex number MUST use IMSQRT instead. Similarly, the infix operator "^" (power) is limited in what it can produce; use IMPOWER instead.

TBD: Requiring level 4 to support infix "+", etc. is controversial, and may be removed soon. However, it would probably be wise to write the specification so that implementations MAY do so; once done, future versions of the specification could require it. An alternative would be to not mandate IM* function support at level 3, and require infix "+" etc. support at level 4.

Test Cases:

Expression Result Level Comment
=IMREAL(IMSUM(COMPLEX(2;3); COMPLEX(4;5))) 6 3 Must use IMSUM to add at level 3.
=IMAGINARY(IMSUM(COMPLEX(2;3); COMPLEX(4;5))) 8 3 Must use IMSUM to add at level 3.
=IMABS(IMSUB(IMSUM(4;"3i");"2+i");COMPLEX(6;2)) 0 3 Can use Text and ordinary Numbers with IM functions.
=IMREAL(COMPLEX(2;3)+COMPLEX(4;5)) 6 4 Can use "+" at level 4.
=IMAGINARY(COMPLEX(2;3)+COMPLEX(4;5)) 8 4 Can use "+" at level 4.
=("2+3i"+"3+5i")=COMPLEX(5;8) True 4 Can use equal-to at level 4, and can use text too.

TBD: Currently the only exchange format is COMPLEX(real;imaginary) or Text formats as in "3+2i". Should there be a special syntax for complex numbers, e.g., 2+3i, -7i, and i? If modifying the number syntax is not desired, another possibility is defining "i" as sqrt(-1), and then 2+3*i would be possible (less ideal, but workable). Note that strings for numbers are locale-dependent, e.g., 2,4+3,2i might not work in a U.S. locale.

Note: Handling complex numbers is itself complex. Excel can handle them, but only when the Analysis Toolpak is installed (by going to Tools/Add-ins). Gnumeric can handle them, OOo2 cannot. One serious problem with complex numbers is that Excel's approach to Complex numbers is incredibly convoluted. Instead of creating a new type (Complex) as a subtype of Number, and allowing functions to handle the new type, Excel treats Complex Numbers as a Text value (ISTEXT is true, ISNUMBER is false). This, in Wheeler's view, is a major design flaw. Because of this, Excel users who attempt to use complex numbers have to use a whole new set of functions, even for operators that are normally expressed with infix notation. Thus, to compute e^(PI*i), instead of saying EXP(PI*i), you have to say IMEXP(IMPRODUCT(PI(),COMPLEX(0,1))). Even addition and subtraction cannot use the infix notation, you must instead use IMSUM() and IMSUB(). Not all functions have an IMxyz() equivalent (e.g., there's no IMTAN, etc.), making complex number use even more complicated. In some cases having a different function makes sense anyway (e.g., SQRT should error on a negative number, and IMSQRT should not), but in most cases functions should seamlessly handle complex numbers. Complex number use in Excel is relatively rare, in part due to its incredibly awkward and non-intuitive support.

The solution adopted here is to specify a subset at level 3 a set of capabilities that does work on existing spreadsheets that support complex numbers (such as Excel and Gnumeric), but not to require that complex numbers be stored as text, and not to require that operators like "+" and '*" fail to work on them. Then, at level 4, require that the normal operators work correctly with complex numbers. This means that careful users can create spreadsheets that port everywhere on today's spreadsheets, but that applications can gradually improve their support so that traditional infix operators (like "+") work correctly.

Logical

A Logical value (also called a boolean value) is a value with one of two values: TRUE() and FALSE(). In this document these values will often be referred to as TRUE and FALSE respectively.

In level 3 or higher implementations, logical values MUST be implemented as a distinct, distinguishable type from numbers, so that ISNUMBER(TRUE()) evaluates to FALSE(). It is RECOMMENDED that lower level implementations do so as well, but they implementations MAY implement Logical as a number type, where TRUE() simply returns 1 and FALSE() simply returns 0.

Rationale: This is called "Logical" in this specification, not the usual Computer Science term Boolean, because that's the usual term in spreadsheet implementations and it's consistent with the function name ISLOGICAL(). Excel, Gnumeric, PalmToGo, and many other spreadsheet implementations have distinct Number and Logical types. This means for example, that ISNUMBER(TRUE()) is FALSE, and because logical values are distinguished, a SUM() over a range covering Boolean values skips the boolean values. However, Lotus 1-2-3 (Walkenbach 2004, pg. 712) and OpenOffice.org 2, as well as the original VisiCalc, do not have a distinguished boolean type, so SUMs over a range with boolean values will produce a different result. Higher levels are required to implement a distinct logical type because of their widespread use. As noted in Python PEP 285, "Most languages eventually grow a Boolean type". E.G., C added one as part of C99, Python added one in version 2.3.

Test Cases:

Expression Result Level Comment
=ISNUMBER(TRUE()) FALSE() 3 Distinct logical type required at level 3.
=ISTEXT(TRUE()) FALSE() 3 Distinct logical type required at level 3.
=ISLOGICAL(1) FALSE() 3 Distinct logical type required at level 3.

Error

An error is one of a set of possible error values. Implementations may have many different error types, but one in particular is distinct: NA() (written as #N/A in Excel). Users may choose to enter some data values as NA(), so that this error value propogates to any other formula that uses it, and may test for this using the function ISNA().

Note: Excel 2003 has the following error values (with ERROR.TYPE values), according to Walkenbach 2003, page 49 and Simon's "Excel 2000 in a Nutshell" page 527:
  • #DIV/0! (2) - Attempt to divide by zero, including division by an empty cell.
  • #NAME? (5) - Unrecognized/deleted name.
  • #N/A (7) - NA. Lookup functions which failed and NA() return this value.
  • #NULL! (1) - Intersection of ranges produced zero cells.
  • #NUM! (6) - Failed to meet domain constraints (e.g., input was too large or too small)
  • #REF! (4) - Reference to invalid cell.
  • #VALUE! (3) - Parameter is wrong type.

TBD: Should we list a standard set of error values? Currently, we are keeping maximum flexibility by simply asserting that there are error values without saying what they are other than NA.

Reference

A reference contains 0 or more areas, each of which to an ordered collection of non-reference values. It may be subdivided further into ranges, each of which have one or more rows and one or more columns. A special case of set is a matrix, a rectangular connected range of cells with rows and columns. Another special case is a single cell reference, a reference with a single area containing a reference to a single cell.

At levels 1 and 2, implementations MAY only implement references with 0 or 1 areas (i.e., all references must be at most a rectangular range of cells).

TBD: Need more details here about internal structures of what references point to. This one needs fixing. Esp. look at multi-range concatenations like A1:B3:D2. See AREAS(), for example. What about references to empty ranges of cells? Can you distinguish A1:A1 from A1? How do you handle 1:3? Do array functions change anything here?

TBD: Arrays

Pseudotypes

Many functions require a type or a set of types with special properties (for example, a "Database" requires headers that are the field names); these specialized types are called pseudotypes.

Scalar

A Scalar value is a value has a single value. A reference to more than one cell is not a scalar (by itself), and must be converted to one. Similarly, an array with more than one element is not a scalar. The types Number (including a complex number), Logical, and Text are scalars.

Database

A database is an organized set of data similar to a table in relational database systems. Any one database has a set of one or more fields that determine the structure of the database. A database has a set of zero or more records with data, and each record contains data for every field (though that data may be empty).

This specification requires that applications support the use of ranges as a database. In this case, the first row of the range is interpreted as the set of field names. In any compliant spreadsheet database, field names MUST be of type Text and MUST be unique ignoring upper/lower case distinctions. All rows after the first row are data records; each data record contains the corresponding field values.

Note that a single cell containing text can be used as a database; if it is, it is a database with a single field and no data records.

It is considered good practice to define a named range for a database, but this is not required by this specification. Formulas MAY use different portions of a database by using different ranges, but since this often indicates a defective spreadsheet, implementations MAY warn when this occurs. Applications MAY support access to external databases as well as cell ranges.

Field

A field is a value that selects a field in a database; it is also called a field selector. If the field is text, it selects the field with the same name. Implementations SHOULD match the field in a case-insensitive way, and compliant spreadsheets SHOULD specify the field name exactly (including by case) to be completely portable.

If the field is a Number, it is considered a positive integer and used to select the fields. A field of 1 selects the first (leftmost) field, 2 selects the second field, and so on.

A field that selects a non-existent field is an error.

Criteria

A criteria is a set of selectors that select records from a database. A criteria is a rectangular set of values, with at least one column and at least two rows. The first row lists the fields that are to be considered (see the field section for the requirements on fields). The rest of the rows list criteriasets.

For a record to be selected from a database, all of the requirements of at least one row of criteriasets must match. Inside each criteriasets are zero or more criteria.

Criteria can be one of the following:

  • A Number or Logical value. The matching field must equal the given number.
  • A text value beginning with a comparitor (<, <=, =, >, >=, <>). The matching field must compare appropriately with the given value after it.
  • Other text value. The matching field must equal at least the field for the number of characters in the criteria; an implementation MAY require an exact match (no additional characters permitted).

Test Cases:

Expression Result Level Comment
=DSUM(TESTDB; "TestID"; [.B36:.B37]) 96 1 Trivial criteria, checking for equal to a number. We use SUM on the TestID to make sure that EXACTLY the right records were selected.
=DSUM(TESTDB; "TestID"; [.G36:.G37]) 4757 1 Check for less than a number.
=DSUM(TESTDB; "TestID"; [.B36:.C37]) 64 1 Two criteria side-by-side are an AND (must meet ALL criteria)
=DSUM(TESTDB; "TestID"; [.B36:.B38]) 737 1 Two criteria on top of each other are an OR (must meet ANY of the rows of criteria)
=DSUM(TESTDB; "TestID"; [.B36:.C38]) 193 1 Can have multiple criteria sets.
=DSUM(TESTDB; "TestID"; [.B36:.D38]) 0 1 Can have multiple criteria sets.
=DSUM(TESTDB; "TestID"; [.D36:.D37]) 2048 1 Simple text match.
=DSUM(TESTDB; "TestID"; [.H36:.H37]) 3679 1 Date comparison
=DSUM(TESTDB; "TestID"; [.E36:.E37]) 1580 1 Comparison less than zero.
=DSUM(TESTDB; "TestID"; [.F36:.F37]) 8128 1 Less than or equal to.
=DSUM(TESTDB; "TestID"; [.G36:.G38]) 6037 1 Pair of comparisons, and check on greater than or equal to.
=DSUM(TESTDB; "TestID"; [.H38:.H39]) 2048 3 Matches of field names and text should ignore case
=DSUM(TESTDB; "TestID"; [.D38:.D39]) 6144 3 If initial text matches, SHOULD return it (do not require exact match at higher levels)


TBD: There are many other criteria. How do we handle the variance? Various implementations have different pattern languages, and that's a problem.

Note:

Expression Syntax

In OpenFormula, a formula must be an expression, which in turn must meet a specific syntax. Any OpenFormula implementation MUST support this syntax when reading and writing formulas to support data interchange. This syntax is defined using the BNF notation of XML version 1.1, as described at http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-notation.

Discussion: Should we Change formula definition so that namespace prefix is included? Should we discuss empty expressions here?

TBD: Need to review array formulas - is there a syntax change? We also need to identify the semantics for array formulas, throughout.

Rationale: This section includes tests for syntax; tests for operations should be in later sections. That way, if multiple syntaxes are allowed in later versions of the specification, tests do not need to be duplicated.

Any reading implementation must be able to read at least the syntax defined in this section (it may accept various extensions), and any writing implementation must generate this format when a user creates data that can comply with it.

The required OpenFormula syntax for functional level 1 and 2 is as follows:

formula ::= expression
expression ::=  NUMBER | STRING | formula_variable |
                expression binary_op expression |
                unary_op expression |
                expression postfix_op |
                "(" expression ")" |
                function_call |
                cell_specifier
formula_variable ::= IDENTIFIER
unary_op ::= '+' | '-'
binary_op ::= comparison_op | '+' | '-' | '&' | '*' | '/' | '^' |
              intersection_op
comparison_op ::= '<' | '>' | '<=' | '>=' | '=' | '<>'
postfix_op ::= '%'
function_call ::= IDENTIFIER "(" expression_list ")"
parameter ::= expression
nonempty_expr_list ::= parameter | nonempty_expr_list parm_separator parameter
expression_list ::=  /* empty */ | nonempty_expr_list

intersection_op ::= '!'   /* intersection operation */
parm_separator ::= ';'
cell_specifier ::= "[" source_location? cellRangeAddress "]"
cellRangeAddress ::= cellAddress ( ":" cellAddress )?
cellAddress ::= "$"? sheet_name "." in_sheet_address
in_sheet_address ::= "$"? [A-Z]+ "$"? [0-9]+
sheet_name ::= /* empty */ | "$"? ( [A-Za-z0-9]+ | "'" [^']+ "'" )
source_location ::= "'" URI "'" "#"      

The URI in the source_location MUST support absolute URIs (URLs are URIs). Implementations SHOULD support relative URLs, which can be distinguished because they do not begin with [A-Za-z]+ ":". If a relative URL looks like an absolute URL, it MUST save relative URLs using a "./" prefix, e.g., './mystuff.xls'. Implementations MUST support they file:/// prefix; they MAY support others, but beware of the security ramifications (an attacker could use this ability to track someone when they download a file to extract data from it, or use this to send malicious data). Implementations SHOULD ask the user before loading data from any external data sources.

Note that OpenFormula formulas are typically embedded inside an XML document. When this occurs, various characters (such as "<", ">", '"', and "&") must be escaped, as described in the XML specification.

TBD: Drop ALL support for source_location for level 1?

Rationale: cellRangeAddress and cellAddress are shown differently (with underscores) because they directly connect to the OpenDocument values of the same name.

The following subsections describe the leaf nodes.

Numbers

Numbers are written and read in this format using the "C" locale (which uses the "." as the decimal separator and never uses a thousands separator), using POSIX setlocale(LC_NUMERIC, "C") or equivalent. Note that leading - and + signs are allowed as unary operators, described below. Writers MUST write numbers that match the pattern (note that it must begin with a digit); these are interpreted as base 10 numbers:

 [0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)? 

Readers MUST be able to read these numbers, as well as accept numbers that begin with a leading ".", so they must be able to read numbers in the form:

 ((\.[0-9]+)|([0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?))

Rationale: A trailing "%" is not included, because "%" is allowed as a general-purpose operator; including it here as well would create an unnecessarily ambiguous grammar. For the same reason, leading + and - are not included in the grammar for a number. Implementations typically identify leading "-" for a number, precompute that, and store that instead of computing the negation each time.

Test Cases:

Expression Result Level Comment
=1.5 1.5 1 Trivial fractions supported.
=1e4 10000 1 Lower case "e" for exponential supported.
=1E4 10000 1 Upper case "E" for exponential supported.

Constant Strings

Constant strings are surrounded by double-quotes; to embed a double-quote, the double-quote character is used twice. When formulas are contained in XML, the rules for XML are used when exchanging string values (UTF-8 is commonly used). Note that since when formulas are contained in an XML's attribute, all double-quotes in the formula are actually quoted (e.g., stored as &quot; in the XML). Constant strings match the pattern:

 \"([^"]|\"\")*\"

Test Cases:

Expression Result Level Comment
=LEN("Hi") 2 1 Simple string.
=LEN("Hi""t") 4 2 Repeat double-quote to get a single double-quote character.
=LEN("\n") 2 2 C-format strings have no effect inside character strings.

Formula variables

Formula variables are named values which can be retrieved from the formula's context. Examples include named ranges in a spreadsheet, and both variables and fields in a word processing document. Note that some formula variables may be a single value (e.g., a named range defined as a single cell) or a set of values (e.g., a named range of a selected set of cells). Names are not case-sensitive, so "a" and "A" refer to the same formula variable.

Implementations MUST accept at least named ranges whose names match the following regular expression:

 [A-Za-z][A-Za-z0-9_]*  

Note: The exchange format is unambiguous -- any cell specifier is surrounded by "[".."]", while formula variables are not. However, most implementations display and allow users to enter formula variables without the surrounding square brackets, which means that the range of legal formula variable names cannot have the name as a cell address in such cases. Excel 2000 in a Nutshell, page 120, gives the following rules for its names. A name must:
  • begin with a letter or underscore
  • be followed with zero or more letters, digits, underscore (_), period (.), question mark (?), or backslash. FIXME: Note that it permits periods, which complicates entry (since "." is used as the separator between sheetnames and cell addresses in OpenFormula). Note that spaces are not permitted.
  • cannot exceed 255 (and really can't exceed 253, since then selecting the name box is a problem).
  • cannot match cell's address. A cell address is the one or two letter column label A through IV (uppercase, lowercase, or combination) followed by a number between 1 and 65535. It also cannot match the obsolete RrCc row-column name format (where r is a number from 1 to 255 and c is a number from 1 to 65535).

Operators

Ordinary infix and prefix operators are accepted. These have the following associativity and precedence (from lowest to highest priority):

AssociativityOperator(s)Comments
left

<, =, >, <=, >=, <>, !=, ==

Less than, equal to, greater than, less than or equal to, greater than or equal to, not equal to, not equal to (alternative), equal to (alternative).

left&

Binary operation string concatenation. Note that unary (prefix) + and "#147; has a different priority. Note that "&" must be escaped when included in an XML document, typically as "&".

left+,-

Binary operations add, subtract. Note that unary (prefix) + and "#147; have a different priority.

left

*,/

Multiply, divide. Division does not truncate, so 1/2 is equal to 0.5.

right

^

Power (2^3 is 8).

left

%

Postfix unary operator % (divide by 100). Note that this is legal with expressions (e.g., [.B1]%) and it can be duplicated (1), and it does not change the meaning of other operations such as "+".

right

+,-

Prefix unary operators, e.g., -5 or -[.A1]. Note that these have a difference precedence than add and subtract.

left

!

Cell address intersection ([.A1:.C4]![.B1:.B5] is [.B1:.B4]).

Precedence can be overridden by using parentheses, so "2+3*4" computes 14 while "(2+3)*4" computes 20. Implementations' user interfaces may display these operators differently or with a different precedence, but when exchanging formulas they MUST use the precedence rules here. Implementations SHOULD retain and regenerate "unnecessary" parentheses and white space, since these are added by people to improve readability.

Level 1 implementations MAY allow unary prefix "-" to have a higher precedence than "^", but level 2 and higher implementations MUST implement the precedence as stated above.

Rationale: These precedences were primarily derived through testing real spreadsheet implementations.

The precedence of unary prefix "-" compared to "^" has initiated more than a little research. In this specification prefix "-" has higher precedence than "^", because "-2^2" is "4" in Microsoft Excel, OpenOffice.org, Gnumeric, and probably many others. This is not universal; prefix "-" has a lower precedence on Lotus 1-2-3, Quattro Pro, and Excel's own Visual Basic (Walkenbach, 2004, pg. 579), so these products will need to insert and remove parentheses when reading/writing expressions in OpenFormula where this matters. However, the vast majority of spreadsheet implementations used today treat "-" as higher precedence, so as usual we identified and documented common conventions. Using a different precedence would create a high likelihood of error and additional work in implementations, for no good reason. Ideally OpenFormula could also be used for attribute draw:formula as defined in OpenDocument 1.0 section 9.5.5 and attribute anim:formula as defined in 13.3.2; neither of these attributes includes an exponentiation operator "^", so this issue of precedence causes no problem. (Note that they require "," as the function separator, and have other requirements, as discussed later in this document).

Prefix "-" and "+" are right-associative, not non-associative, because "=--[.B3]" is legal (and it converts B3 to a number, so it can have an effect). Cell address intersection has an even higher precedence, so that a unary minus in front an intersection will work correctly.

Rationale: Walkenbach gives Microsoft Excel 2003's precedence levels as (lowest to highest, note that the book gives the reverse order) comparison (such as "="), "&", "+" and "-", "*" and "/", "^", "%", and unary "-" (negation). (Walkenbach, 2004, pg 38). By having the precedence be the same as Excel, it simplifies transition.

Note: this format is intentionally similar to traditional presentations of spreadsheet formulas, which reduces the likelihood of error or misunderstanding. Although the user interface is not specified here, the representation is intentionally chosen so that formulas can "round trip" to this format and back without loss using typical formula representations. It would be possible to replace some or all uses of "%" with "/100", and to replace "^" with POWER(), but this would cause the display format to change once it was saved and reloaded.

TBD: Does everyone have string concat as LOWER priority than binary +/-?

Functions

A function call has a function name matching the pattern [A-za-z][A-Za-z0-9_]* followed by an opening parenthesis, zero or more parameters, and a closing parenthesis. Parameters are separated by a semicolon (not a comma), though readers MAY optionally accept function calls using commas as separators as well. Function names are case-insensitive, so "sum", "Sum", and "SUM" are the same function. However, function names SHOULD be saved as uppercase-only values.

Rationale: Function names are often displayed in all upper case by both implementations, showed this way in documentation, and saved in uppercase, owing to the influence of the first spreadsheet program (VisiCalc). By saying implementations SHOULD save in all uppercase, this increases the likelihood that implementations can trivially write back formulas and, if the formulas are unchanged, produce an identical result. This can be helpful to systems that depend on detecting differences, by minimizing the number of unnecessary differences.

Level 1 and 2 do not need to support empty parameters, but MUST support functions with 0 parameters. Thus at levels 1 and 2 X(;) is not a legal function call but RAND() is.

Some functions always produce the same value, and are thus constants; PI() and TRUE() are examples.

Note: Microsoft Excel accepts empty parameters in any position; OpenOffice 1.1.3 does not, so the current syntax above doesn't either. See below (level 3) for how this rule can be relaxed. Typical implementations will have many built-in functions, and most implementations also support one or more ways to create user-defined functions.

Rationale: Excel uses the "," as the function parameter separator, but OpenFormula uses ";" instead. Many locales use "," as the decimal separator; using the semicolon as the parameter separator eliminates confusion and the risk of incorrect implementation.

Cell Specifiers

In OpenFormula cell specifiers are surrounded by "["..."]" and contain a cellRangeAddress, which may simply be a simple cellAddress. Both cellRangeAddress and cellAddress are as they are defined in OpenDocument 1.0 section 8.3.1. The "$" markings note that the following component is absolute, otherwise it is relative; these markings have no effect on calculation (they are intended for use when copying or moving formulas).

Rationale: Cell addresses in OpenFormula begin with "[" and end with a "]"; this makes parsing simpler, faster, and more reliable. Cell addresses are specified in A1 notation, not an R1C1 notation, as required by OpenDocument 1.0 section 8.1.3 subsection "Formula". Not using R1C1 notation can in some cases cause an increase in compressed file size (because copied formulas are shown differently and thus do not compress as well). However, A1 notation is much easier for humans to understand, so using A1 format is likely to increase reliability (because it is more likely to be correctly generated and interpreted). If this is a serious problem, Array formulas can usually be used to achieve the same results. Both R1C1 and A1 could be allowed, but then different spreadsheets are likely to generate different characters for the same cell, and thus create "differences" that do not exist even in simple spreadsheets. Only uppercase characters are allowed, for the same reason.

Note: Typical spreadsheet displays will often not display or require input of the square brackets. OpenDocument 1.0 section 8.3.1 also defines a cellRangeAddressList (space-separated ranges) which is used by several OpenDocument constructs. OpenFormula does not use cellRangeAddressList; cell concatenation can do the same thing when this is necessary.

If future versions of OpenFormula are to handle OpenDocument's anim:formula and draw:formula, they will need to add variable syntax such as $0, $a, and $. In addition, the issue of using "," instead of ";" as a function parameter separator will need to be resolved to accept those formulas. Future versions may also accept constant array specifiers, e.g., "{" list "}" - and in that list, there must be a way to separate different values inside a row as well as a way to separato different rows (Excel uses "," to separate entries within a row, and ";" to separate entries in different rows).

OpenOffice.org 2.0 always generates absolute Sheetnames if it's not the current sheet (i.e., '$' always prefixes a Sheetname). However, the OpenDocument 1.0 specification 8.3.1 clearly makes the "$" optional in front of a sheetname.

Discussion: ??? Named values are actually named expressions in Excel, and can have references ("the cell row-2 and column+1 from here") and be reused in many different cells. Thus, there appears to be a real need to support an R[]C[]-style reference format, possibly at level 3. Probably should also support an argument format, so that they can be used as user-defined formulas, using the sytax compatible with the other formula-like constructs in OpenDocument (e.g., $1). However, once you start changing the fundamentals of how cell references are made, should we abandon the OOo storage format? After all, the current syntax already doesn't well-support arbitrary range operations. The Gnumeric folks argue for using their syntax instead; details of this proposal have not yet been seen. See mailing list.

TBD: Need to specify how to save relative sheet names, and how to handle ' embedded in sheet names.

TBD: Need to specify how to save "sheet deleted" ("#REF") sheet references.

Whitespace

Whitespace (space, tab, newline, and carriage return) is ignored in the default formulas syntax, except in the contents of string constants. A line break is normally represented by a single ASCII character 10 (\n); when embedded in an XML document this is typically represented as "&#10;". Implementations SHOULD retain whitespace entered by the original formula creator, and SHOULD NOT add additional whitespace unless directed to do so during the process of editing a formula.

Note: In Microsoft Excel's display format, the space is used as the intersection operator. This is very confusing, so the OpenFormula exchange format uses "!" instead for intersection.

Level 3 Extensions

The preceding syntax is sufficient for a vast number of actual spreadsheets, but some users may want more capabilities. This section defines the syntax for the additional capabilities a functional level 3 implementation MUST support. These are shown as extensions to the syntax above.

Extension: Empty parameters

The level 2 syntax (above) requires that a function call have 0 parameters or that it have 1 or more parameters each of which are non-empty. Level 3 implementations MUST also support empty parameters. When there are no parameters provided, this MUST be considered to be a call with 0 parameters (instead of 1 parameter that is empty).

This modifies the syntax as follows:

   nonempty_expr_list ::= /* Not used */
   parameter ::= /* empty */ | expression
   expression_list ::=  parameter ( ";" parameter )*    

Rationale: OpenOffice.org 1.1.3 and 2.0 do not accept empty parameters, so this was moved to level 3. However, Microsoft Excel accepts empty parameters, so it was believed important to be able to express them. This creates a minor ambiguity - expressions such as "PI()" can be read as either having 0 parameters or as having 1 empty parameter.

TBD: There is currently no way to call a function giving it one parameter, the empty parameter; should X(()) be suggested?

Extension: Cell concatenation and Range Extension

Level 3 implementations MUST support additional binary operators for cell concatenation and range extension:

   bin_op ::= bin_op.level2 | '~' | ':'    

The cell concatenation operation "~" MUST be left-associative with a precedence just greater than cell intersection. The cell range extension operation ":" MUST be left-associative with a precedence just greater than cell concatenation.

An implementation MUST write a cell range address ":" separator, and MUST NOT write the cell range extension operator, where it is possible to do so. That is, an implementation MUST write the expression "[.A1:.A3]" instead of "[.A1]:[.A3]" even though they are semantically identical. Where there is a choice of which cells to join inside a cell range address, an implmentation MUST choose the leftmost one.

Rationale: This specification does not mandate a cell concatenation operator at level 2, since OpenOffice.org 1.1.3 and 2.0 do not include one. However, Microsoft Excel includes this operation. In Excel this is represented using the comma (",") character, the same symbol used as the parameter separator for function calls. This is a very poor choice with a number of unfortunate ramifications. One problem is that concatentating cells in a function parameter requires surrounding the cells with additional parentheses in Excel display syntax. For example, AREAS(A1:A3,B2:B4) is a function call with two parameters, while AREAS((A1:A3,B2:B4)) is a function call with one parameter. Another problem is that the comma interferes with the use of "," as a decimal separator (as it is used in many locales) when using traditional entry formats (which do not mark cell addresses with "[".."]"). Gnumeric uses "+" as the cell concatenation operator in its display, but this has its own problems: it interferes with the use of "+" as a matrix addition operator. There are many alternatives, e.g., other characters (such as "~", "|", and "\"), or requiring a function syntax for this purpose. The character "_" would be a poor choice because formula variables can also include this character in their name (complicating parsing when "[..]" are not used "#147; is B3_B2 a formula variable, or are B2 and B3 concatenated?). Because of these issues, this document proposes using "~" as the cell concatenation symbol.

A special range-extension operator is needed to deal with cases such as more than one range extension, where one or more sides is a name, or where one or more sides is calculated as the result of an expression. But where possible it shouldn't be used, for maximum portability, since level 2 systems don't necessarily implement it. The leftmost rule is noted so that implementations are likely to generate the same text for the same formula, eliminating the appearance of change when none occurs.

Extension: Whole row/column selectors

Cell range addresses MUST be able to identify one or more whole rows, or whole columns, to meet functional level 3.

   cellRangeAddress ::= cellRangeAddress.level2 |                      
                        "[A-Z]+" ":" "[A-Z]+" | /* whole column(s) */
                        "[0-9]+" ":" "[0-9]+"   /* whole row(s) */    

Rationale: Excel 2003 supports this. OpenOffice 2.0 does not directly support it, but it fakes it very well by translating a "whole row" or "whole column" entry into a specific range. As long as spreadsheets only go from A1 to IV65536, there is no difference, and OpenOffice.org 2.0's approach is probably fine for many people. However, spreadsheet implementations of the future are likely to support larger limits, at which point these spreadsheets will quietly fail. It would be wiser to directly support these selectors so that large spreadsheets of the future would not silently fail.

Extension: Subtables

Level 3 implmentations MUST support addressing subtables (tables in tables), using this syntax:

 cellAddress ::= "$"? sheet_name ( "." in_sheet_address )+  

Rationale: OpenDocument 1.0 section 8.3.1 discusses subtables, and requires this syntax when addressing subtables. However, it doesn't require that subtables actually be supported in its syntax for cellAddress.

Extension: R1C1 Notation

Level 3 implementations MUST support R1C1 notation.

 in_sheet_address ::= ( "$"? [A-Z]+ "$"? [0-9]+ ) |                         
                      ( [-+$] [0-9]+ [-+$] [0-9]+ )

The symbols + and - indicate a relative reference; "$" indicates an absolute reference. The initial "r" MUST be in lowercase letters.

Rationale: R1C1 notation tends to produce much better compression results, it supports large spreadsheets (larger than IV65536) more easily, and better supports textual XML diffs. In earlier versions this had a lowercase 'r' and 'c', but this was unnecessary.

TBD: It's not clear this particular syntax is a good idea; discussion welcome.

Extension: Source locations for named identifiers

Formula variables are changed slightly to support source locations as cell references do:

   formula_variable ::= source_location? IDENTIFIER    

Rationale: OpenOffice.org 2.0 supports source locations for Cell addresses like A1, but not for named values, so this was moved to level 3.

Extension: In-line arrays

The syntax above does not include a specification for arrays specified in-line. Microsoft Excel includes a way to write arrays of constants without having to create a cell range for them. OpenOffice.org 1.1.3 does not support this. Excel's display format uses a comma as the separator between values in a row, and a semicolon for the separator between rows. Here semicolon is used as the separator between values in a row (again, so different locales will have a simpler time entering data when entering data), and the pipe symbol "|" is proposed as the symbol separating rows (with absolutely no precedent).

   array_spec ::= "{" matrix ( '_' matrix )* "}"
   matrix ::= ( matrix_row ( column_separator matrix_row )* ) ?
   matrix_row ::= expression ( ";" expression )*
   column_separator ::= "|"    

TBD: This is a proposed syntax, which in addition permits non-constant values in the array (something Excel does not permit) and concatenation (which Excel may not permits either).

Gnumeric is adding the ability to handle matrixes and NOT forcing them into a fixed cell array; these are called dynamic matrices. Does this require syntactic extensions? Need to discuss in types and functions.

Need to re-investigate in-line arrays and other array function syntax issues.

Should we add arguments (e.g., "$1") so that anim:formula, etc. are easier to support AND so that named expressions are much more effective (they'd then allow the easy definition of new functions)?

Standard Operators and Functions

This section defines the predefined operators and functions. For the purpose of this specification, operators are simply functions that use a different syntax: they may be prefix, infix, or postfix operators, and that is noted in their definition. Implementations MAY (and typically do) define additional predefined functions, beyond what is defined in this specification, as long as those additions do not conflict with the requirements here. In addition, implementations MAY add additional optional parameters, or accept a wider domain of inputs, in these predefined functions, as long as they continue to meet the specification.

Implementations MAY, and typically do, display different function names than listed here in their user interface. For example, implementations MAY change the displayed function name depending on the current locale's language and/or a user interface "skin." An implementation MAY display a different function name because the implementation has traditionally used a different function name as well.

Function names ignore case, so "sum" and "SUM" refer to the same function. However, it is RECOMMENDED that implementations write function names in all uppercase letters.

Rationale: By writing function names in all upper case, the text is more likely to stay the same when a file is read and later rewritten, making textual comparison simpler. Also, function names are traditionally written in all uppercase for spreadsheet formulas. This may be due to the influence of the first spreadsheet, VisiCalc; VisiCalc first ran on the Apple ][ microcomputer, which did not display lowercase letters unless it was modified.

Unless otherwise noted, if any value being provided is an error, the result is an error; if more than one error is provided, one of them is returned. It is RECOMMENDED that the first error value be returned, if there is more than one possible error value return possible. Some "functions", such as PI() and TRUE(), always return the same constant value, and are considered constants.

TBD: Need to define functions when they are in array formulas. Their semantics are often different, though we hope they can be abstracted away in many cases. Help wanted.

This section begins with a subsection describing the common template used to describe all functions and operators. This is followed by the subsection on implicit conversion operators (the operators that are implicitly called when a type is requested). The next subsection describes the standard operators such as "+" (add numbers) and "&" (concatenate text). This is followed by a number of different subsections defining functions; these subsections group similar functions together so that similar functions are near each other.

Common Template for Functions and Operators

For every function or operator, the following are defined in this specification:

  • Name: The function/operator name, e.g., "SUM". The name is given in a specification header. Operators are named first with its location (Prefix, Infix, or Postfix), following by the word Operator and the actual name surrounded by double-quotes. Thus, Infix Operator "+" is easily distinguished from Prefix Operator "+".
  • Summary: A one-sentence briefly describing the function or operator.
  • Syntax: A prototype showing its syntax. In particular, it shows parameter names (in order), with each parameter prefixed by the type required of that parameter. Here { ... } indicates a list of zero or more parameters, separated by the function parameter separator character. Components surrounded by [ ... ] are optional. Parameters are separated with a semicolon (";"), as per the OpenDocument Format syntax; note that some implementations may display parameter separators with commas or some other alternative. Note that when a function is given a value of a different type, the parameters are first converted using the conversion rules before the function operates on its parameters.
  • Returns: Return type (e.g., Number, Text, Logical, Reference).
  • Level: The level at which this MUST be supported. "1" means all implementations must support it. If there special cases, the lowest level is given, and the semantics section discusses the additional constraints at different levels.
  • Constraints: A description of constraints, and where possible, what happens when the constraints are not met. This is in addition to the constraints imposed by the parameter types. If there are no constraints beyond those imposed by the parameter types, this is "None". If a constraint is not met, the function/operator returns an Error.
  • Semantics: This text describes what the function/operator does. In some cases they will be defined by mathematical formulas or by an OpenFormula formula.
  • Test Cases: A set of one or more test cases. Test cases are NORMATIVE, not notional or optional; they are considered part of the definition. See the Conformance section for more about test cases.
  • Comment: Explanatory comment.
  • See also: A list of related operators and functions.

The implicit conversion operators omit many of these items, e.g., the syntax (since there is none).

Rationale: The template for function information is similar to

Proposal - use OOo's structure. The test cases are normative. Experience shows that test cases are valuable for helping explain the text, it is often easier to explain or clarify with a few test cases, and creating test cases later is often the task everyone agrees needs doing -- but no one actually does.

Implicit Conversion Operators

Any given function or operand takes 0 or more parameters, and each of those parameters has an expected type. The expected type can be one of the base types, identified above. It can also be of some conversion type that controls conversion, e.g., Any means that no conversion is done (it can be of any type); NumberSequence causes a conversion to an ordered sequence of zero or more numbers. If the passed-in type does not match the expected type, an attempt is made to automatically convert the value to the expected type. An error is returned if the type cannot be converted (this can never happen if the expected type is Any). Unless otherwise noted, any conversion operation applied to a value of type Error returns the same value.

Conversion to Scalar

To convert to a scalar, if the value is of type:

  • Number, Logical, or Text, return the value.
  • reference to a single cell: obtain the value of the referenced cell, and return that value.
  • reference to more than one cell: do an implied intersection (described below) to determine which cell to use and then handle as a single cell.

Implied intersection

In some cases a reference to a single cell is needed, but a reference to multiple cells is provided. In this case an "implied intersection" is performed. To perform an implied intersection:

  • Compute the union of cells contained in the current row and current column of the formula being computed.
  • Intersect this with the provided reference to multiple cells
  • If a single cell is referenced; return it; otherwise, return an Error.

Note: Implied intersection is basically ((my_row union my_column) intersection (given reference)).

Tests with Excel 2003 and OpenOffice.org 2 seem to confirm this.

Conversion to Number

If the expected type is Number, then if value is of type:

  • Number, return it.
  • Logical, return 0 if FALSE, 1 if TRUE.
  • Text: If less than level 3, an application MAY always return 0 or return the result of applying VALUE() to the text. Level 3 and above MUST use VALUE() to convert the text to a number or error value.
  • Reference: If the reference covers more than one cell, do an implied intersection to determine which cell to use. Then obtain the value of the single cell and perform the rules as above. Applications less than level 3 MAY choose to use a different option for references to text than they do for embedded text.

Note that VALUE() can convert not only ordinary numbers, but it can also handle dates, times, and so on. Thus, on level 3 applications, DAY("2005-01-01") is REQUIRED to be computed as DAY(VALUE("2005-01-01")). Spreadsheet developers who want their spreadsheets to be portable between applications less than level 3 MUST insert the VALUE() function to convert a text value into a number based on the text, since it is not guaranteed to be invoked by lower-level applications.

Note: Semantics vary in current applications:
  • Excel, Gnumeric, and SheetToGo use "Excel" semantics: If they encounter Text or a reference to Text, they always convert the text value to a number (using VALUE()) and return the number or error. (Walkenbach, 2004) Under these semantics, COS("hi") first computes VALUE("hi"), producing an Error, and COS() applied to an error value produces an error value.
  • Lotus 1-2-3v9, Quattro Pro, KSpread use "Lotus" semantics: If they encounter Text or a reference to Text, they always return the number 0. Under these semantics, COS("hi") computes COS(0), producing 1.
  • OpenOffice.org 2 splits the difference: Inline text is converted to a number (like Excel), but references to text are always considered 0 (even if they could be converted to a different number, and would be converted to a different number if in-line). Thus, in OOo 2, if B3 has the string value "7", B3+1 is 1, but "7"+1 and (B3&"")+1 are both 8.

TBD: Should we allow ALWAYS producing an error on implied text to number conversions? It'd give a better warning of spreadsheet problems, and incorrectly created spreadsheets can be very costly. Allowing this would require fancier tests, such as ISERR(IF(OR( ("7"+0)=7; ("7"+0)=0); 1/0; 0)) would have to be True. No current implementations do this, however. It also interferes with using text versions of complex numbers where ordinary numbers are accepted (e.g., with "+"), which would be an obvious way to implement complex numbers.

TBD: Handle complex numbers. The conversion for VALUE isn't right for level 4 if it's to support "+", etc., with possibly text values.

Expression Result Level Comment
=(1=1)+2 3 1 Inline logical True is converted to 1.
=[.B5]+[.B6] 4 1 Adding forces conversion of TRUE to 1, even if by reference
=OR( ("7"+0)=7; ("7"+0)=0) True 1 "7" should convert to 0 or 7.
=OR( ([.B3]+0)=7; ([.B3]+0)=0) True 1 B3 is "7"; it should convert to 0 or 7. This may be different than "7"+0.
=ISERR(IF(COS("hi")=1;1/0;0)) True 1 At level 1, functions expecting a number but get non-numeric text convert the number to 0 or an Error.
="7"+[.B4] 9 3 Adding forces conversion of constant text value "7" on LHS.
=[.B4]+"7" 9 3 Adding forces conversion of constant text value "7" on RHS.
=("4" & "5")+2 47 3 Adding forces conversion of computed text value "45" on LHS.
=2+("4" & "5") 47 3 Adding forces conversion of computed text value "45" on RHS.
=[.B3]+[.B4] 9 3 Adding forces conversion of referenced text value "7".
=[.B3]+[.B6] 8 3 Neither side needs to be numeric; plus itself forces the conversion from text.
=("2006-05-21"+0)=DATE(2006;5;21) True 3 Dates are converted too at level 3 and higher.

TBD: What happens on a reference to zero cells?

TBD: Need to define implied intersection function. If the implicit intersection is null, I believe it just takes the "first off the list" but again need to verify. See Walkenbach's "Microsoft Excel 2000 Formulas", pg 68.

Conversion to NumberSequence

If the expected type is NumberSequence, then if value is of type:

  • Number, Text, or Logical, handle as convert to Number (creating a sequence of length 1).
  • reference, create a sequence of numbers from the values of the referenced cells that only includes the values of type number or error. Text that could be converted into a value is not included in a number sequence. Level 3 or higher implementations MUST NOT include referenced values that are of type Logical, they MUST be skipped. Note: Level 1 and 2 systems may not have a distinguishable logical type; such systems use Number instead in those cases, and thus those values would be included in a NumberSequence.
Expression Result Level Comment
=SUM([.B3:.B6]) 5 3 In level 3 and higher, conversion to NumberSequence ignores strings (in B3) and logical values (a TRUE() in B6).
=SUM([.B3:.B10]) Error 1 If a sequence includes Error, the result has Error.


Note: Although this conversion is described as "creating" a sequence of numbers, in practice implementations typically implement this using mechanisms that do not create intermediate forms. For example, they may embed searching for the numbers as part of each function. This makes handling NumberSequences very fast.

Note: SUM expects a list of zero or more NumberSequences. In Excel, SUM() skips non-numbers and does NOT convert boolean (logical) or text types. This means that A1+A2 is not the same as SUM(A1:A2): if A1 or A2 are not a number, the "+" operator attempts to convert the value to a number, while SUM() simply ignores the values.

Note: On November 17, 2005, Richard Kernick posted to the OpenFormula mailing list some results from Pocket Excel, showing that Pocket Excel had subtly different semantics than Excel 2003. In particular, in Pocket Excel references to a range are treated differently than a reference to a single cell: explicit ranges work as usual for a conversion to a NumberSequence, EVEN when the range contains only one cell, but non-explicit ranges containing only one cell force a conversion as with the rules for Number. We have not identified any other implementation where this is true. Thus, Pocket Excel does not quite meet the rules stated above. Here was Kernick's demonstration. First, he set the following:
A1: =1=1 (boolean, TRUE)
A2: '2 (text, "2")
A3: 3 (numeric)

He then computed:

=A1+A2+A3      =6 (as expected)
=SUM(A1:A3)    =3 (as expected)
=SUM(A1,A2,A3) =6 (different, and not meeting the rules above. Excel returns 3)
=SUM(A1:A1,A2:A2,A3:A3) =3 (same as Excel 2003, but different from above)

This demonstrates that references to a range is treated differently to a reference to a single cell in Pocket Excel; this is a semantic variance with other implementations.

Rationale: Converting to a NumberSequence is different than converting to a Number. Obviously, it produces a list. NumberSequences ignore non-numbers, and an advantage of this rule is speed - implementations can quickly determine what values to use. This means that if users should convert text to numbers if they want them used in a NumberSequence. Since OOo2 has no separate Logical type, it has no way to implement the "skip Logicals" semantic.


Conversion to Complex Number

As noted earlier, an application may implement complex numbers as Text, Number, or a different distinguishable type. This specification does not state which, but it does define the properties of that implementation.

If the value is:

  • Number that is not complex, use the Number with 0 as the imaginary part.
  • Text, attempt to convert to number using VALUE(). If it is a number that is not complex, use it. If the text matches one of these patterns, use it:
 ([+-]?Number [+-])?Number[ij]
 [+-]?Number[ij]
  • Logical, convert to Number and then handle as Number.
  • reference: Convert to Scalar, then use the rules above.

Conversion to Logical

If the expected type is Logical, then if value is of type:

  • Number, return TRUE() for nonzero and FALSE() for 0.
  • Text, level 3 and higher applications examine the text; if the text matches the text "TRUE" ignoring case, then return TRUE, if the text matches the text "FALSE" ignoring case, then return FALSE, otherwise return Error. Applications less than level 3 MAY instead always consider text false (e.g., as with Numbers, return 0 which is always considered False).
  • Logical, return it.
  • reference to a single cell: first, obtain the value of the single cell. If that cell's value is of type Logical, return it. If it is not of type Logical, perform the conversion to Logical listed above (recurse).
  • reference to more than one cell: do an implied intersection to determine which cell to use and then handle as a single cell.


Expression Result Level Comment
=IF(5;TRUE();FALSE()) True 1 Nonzero considered True.
=IF(0;TRUE();FALSE()) False 1 Zero considered False.
=IF("FALSE";TRUE();FALSE()) False 1 Text matching "false" (ignoring case) is false.
=IF("False";TRUE();FALSE()) False 1 Text matching "false" (ignoring case) is false.
=IF("false";TRUE();FALSE()) False 1 Text matching "false" (ignoring case) is false.
=IF("TRUE";TRUE();FALSE()) True 3 Text matching "true" (ignoring case) is true.
=IF("True";TRUE();FALSE()) True 3 Text matching "true" (ignoring case) is true.
=IF("true";TRUE();FALSE()) True 3 Text matching "true" (ignoring case) is true.

Note: In Excel-like semantics (Excel, Gnumeric, SheetToGo), the text "True" and "False" (ignoring case) are automatically converted to True and False, else an error. On others, such as Lotus 1-2-3 Quattro Pro, and OpenOffice.org 2, this is handled like text to number. So on Lotus 1-2-3 and Quattro Pro, text is always considered 0, so all text is considered False. On OpenOffice.org, inline text is converted following Excel-like rules, but referenced text is always considered 0 and thus always False.

Conversion to Text

If the expected type is Text, then if value is of type:

  • Number, transform into text (no whitespace).
  • Text, return it.
  • Logical, return "TRUE" if it is TRUE and "FALSE" if it is false.
  • reference to a single cell: first, obtain the value of the single cell. If that cell's value is of type Text, return it. If it is not of type Text, perform the conversion to Text listed above (recurse).
  • reference to more than one cell: do an implied intersection to determine which cell to use and then handle as a single cell.

TBD: What happens precisely to number?

Standard Operators

The standard operators are simply functions with special syntax. The binary operations +, -, *, /, and ^, when provided two numbers, compute (respectively) the addition, subtraction, multiplication, division, and exponentiation of those numbers. The unary operation "-" produces the negative of this number. Unary + simply passes on its operand's value, unchanged, with exactly the same type. The string concatenation operator "&" converts any non-string values into strings before concatenating them.

Infix Operator "+"

Summary: Add two numbers.

Syntax: Number Left + Number Right

Returns: Number

Level: 1

Constraints: None

Semantics: Adds numbers together. Due to the way conversion works, text and logical values are converted to numbers.

Test Cases:

Expression Result Level Comment
=1+2 3 1 Simple addition.
=[.B4]+[.B5] 5 1 2+3 is 5.

See also: Infix Operator "-", Prefix Operator "+"

Infix Operator "-"

Summary: Subtract the second number from the first.

Syntax: Number Left - Number Right

Returns: Number

Level: 1

Constraints: None

Semantics: Due to the way conversion works, text and logical values are converted to numbers.

Test Cases:

Expression Result Level Comment
=3-1 2 1 Simple subtraction.
=[.B5]-[.B4] 1 1 3-2 is 1.
=5--2 7 1 Subtraction can be combined with unary minus.
=[.C8]-[.C7] 365 1 Difference of two dates is the number of days between them.

See also: Infix Operator "+", Prefix Operator "-"

Infix Operator "*"

Summary: Multiply two numbers.

Syntax: Number Left * Number Right

Returns: Number

Level: 1

Constraints: None

Semantics: Multiplies numbers together. Due to the way conversion works, text and logical values are converted to numbers.

Test Cases:

Expression Result Level Comment
=3*4 12 1 Simple multiplication.
=[.B4]*[.B5] 6 1 2*3 is 6.
=2+3*4 14 1 Multiplication has a higher precedence than addition.

See also: Infix Operator "+", Infix Operator "/"


TBD: Excel 2002 cannot use "*" to multiply two complex numbers together; instead, you have to use IMPRODUCT. This is, of course, rediculous. At one level note that implementations MAY do this (as a concession), but at a higher level require "*" and so on to do the Right Thing.

Infix Operator "/"

Summary: Divide the second number into the first.

Syntax: Number Left / Number Right

Returns: Number

Level: 1

Constraints: None

Semantics: Divides numbers. Due to the way conversion works, text and logical values are converted to numbers. Dividing by zero returns an error.

Test Cases:

Expression Result Level Comment
=6/3 2 1 Simple division.
=5/2 2.5 1 Simple division; fractional values are possible.
=1/0 Error 1 Dividing by zero is not allowed.

See also: Infix Operator "-", Infix Operator "*"

Infix Operator "^"

Summary: Exponentiation (Power).

Syntax: Number Left ^ Number Right

Returns: Number

Level: 1

Constraints: (Left != 0) OR (Right != 0)

Semantics: Returns POWER(Left, Right). Due to the way conversion works, text and logical values are converted to numbers. Implementations MAY determine 0^0 as 0, 1, or an Error. Note that level 2 and higher systems MUST implement unary minus with a higher precedence than exponentiation, but level 1 MAY reverse this order. Implementations are NOT REQUIRED to add parentheses to formulas to force this particular precedence rule; implementations MAY require that those who create formulas add parentheses as necessary to ensure that equations are always computed correctly on level 1 implementations.

Test Cases:

Expression Result Level Comment
=2^3 8 1 Simple exponentiation.
=9^0.5 3 1 Raising to the 0.5 power is the same as a square root.
=(-5)^3 -125 1 Must be able to accept Left < 0.
=4^-1 0.25 1 Must be able to accept Right < 0.
=5^0 1 1 Raising nonzero to the zeroth power results in 1.
=0^5 0 1 Raising zero to nonzero power results in 0.
=2+3*4^2 50 1 Precedence: ^ is higher than *, which is higher than +.
=-2^2 4 2 Unary "-" has a higher priority than "^".

TBD: Should we discuss 0^0? Excel returns #NUM! in such cases.

Rationale: We have both ^ and POWER(), even though they are SEMANTICALLY the same thing. Most spreadsheet implementations supply both, allowing users to use whichever representation they prefer and supporting roundtripping their preferred representations. Some users intentionally use ^ in some contexts, and POWER() in others, and without both representations it would be difficult to reconstruct a user's preferences.

Rationale: There has been a lot of discussion about the precedence of binary "^" compared to unary "-". It was felt that we could not be silent about precedence, and what's more, today's spreadsheets are nearly unanimous in implementing the precedence as stated, e.g., Excel and OOo2 both compute -2^2 as 4. Mandating anything else would greatly increase confusion and create an unnecessary difference with actual practice. The impact of this is expected to be rare; few financial applications use ^ at all, and those who are genuinely concerned about precedence are likely to parenthesize anyway.

See also: Infix Operator "*", POWER

Infix Operator "="

Summary: Report if two values are equal

Syntax: Scalar Left = Scalar Right

Returns: Logical

Level: 1

Constraints: None

Semantics: Returns TRUE if two values are equal. If the values differ in type, return FALSE. If the values are both Number, return TRUE if they are nearly equal near the limits of the accuracy of representation, else return FALSE. If they are both Text, return TRUE if the two values match, else return FALSE. If they are both Logicals, return TRUE if they are identical, else return FALSE. Note that, like most other functions, errors are propogated; error values cannot be compared to a constant error value to determine if that is the same error value.

For level 2 and higher, text comparisons MUST be case-insensitive; at level 1, case-sensitive matches for "=" are allowed.

Test Cases:

Expression Result Level Comment
=1=1 TRUE 1 Trivial comparison.
=[.B4]=2 TRUE 1 References are converted into numbers, and then compared.
=1=0 FALSE 1 Trivial comparison.
=3=3.0001 FALSE 1 Grossly wrong equality results are not acceptable. Spreadsheets cannot "pass" automated tests by simply making "=" always return TRUE when it's even slightly close.
="Hi"="HI" TRUE 2 Trivial text comparison - note that this is case-insensitive for level 2 or higher.
="Hi"="Bye" FALSE 1 Trivial text comparison - no match.
=FALSE()=FALSE() TRUE 1 Can compare logical values.
=TRUE()=FALSE() FALSE 1 Can compare logical values.
="5"=5 FALSE 1 Different types are not equal.
=TRUE()=1 FALSE 2 Logical types not the same as numbers.
=ISNA(NA()=NA()) True 1 If there's an error on either side, the result is an error -- even if you're comparing the "same" error on both sides.
=ERROR.TYPE(NA())=7 TRUE 3 Use ERROR.TYPE to determine which specific error has resulted.

See also: Infix Operator "<>"

Note: By default OpenOffice.org's equal operator performs case sensitive or case-insensitive matching, depending on the setting of Tools.Options.Calc.Calculate "Case sensitive". By default, OOo2 is case-sensitive, while most others are case-insensitive.

TBD: It'd be nice if a more accurate definition of "=" for numbers could be devised; it is not clear that implementations agree on one.

TBD: Check to make sure that "5" is never equal to 5, in all implementations, for the operator. EQUALS may be different.

Infix Operator "<>"

Summary: Report if two values are not equal

Syntax: Any Left <> Any Right

Returns: Logical

Level: 1

Constraints: None

Semantics: Returns NOT(Left = Right)

Test Cases:

Expression Result Level Comment
=1<>1 FALSE 1 1 really is 1.
=1<>2 TRUE 1 1 is not 2.
=1<>"1" TRUE 1 Text and Number have different types.
="Hi"<>"HI" FALSE 2 Text comparision ignores case distinctions.

See also: Infix Operator "="

Note: We don't need many test cases here, because it's defined in terms of another function that is tested.

Infix Operator Ordered Comparison ("<", "<=", ">", ">=")

Summary: Report if two values are equal

Syntax: Scalar Left op Scalar Right where op is one of: "<", "<=", ">", ">="

Returns: Logical

Level: 1

Constraints: None

Semantics: Returns TRUE if the two values are less than, less than or equal, greater than, or greater than or equal (respectively). If both Left and Right are Numbers, compare them as numbers. If both Left and Right are Text, compare them as text, case-insensitive. If the values are both Logical, convert both to Number and then compare as Number.

TBD: What if types differ? Is the case-insensitive comparison as though they were both upper case, both lower case, or something else? What about complex numbers?

Test Cases:

Expression Result Level Comment
=5<6 True 1 Trivial comparison.
=5<=6 True 1 Trivial comparison.
=5>6 False 1 Trivial comparison.
=5>=6 False 1 Trivial comparison.
="A"<"B" True 1 Trivial comparison.
="a"<"B" True 1 Text comparison is case-insensitive.
="AA">"A" True 1 Longer text is "larger" than shorter text, if they match in case-insensitive way through to the end of the shorter text.

See also: Infix Operator "<>", Infix Operator "="

Infix Operator Range (":")

Summary: Computes an inclusive range given two references

Syntax: Reference Left : Reference Right

Returns: Reference

Level: 2

Constraints: None

Semantics: Takes two references and computes the range, that is, a reference to cells including both Left and Right.

TBD: What if Left or Right are not a reference to a single cell? We have an "internal" range operator in ODF, plus this more general range operator - how can we describe this?

Test Cases:

Expression Result Level Comment
=SUM([.B4:.B5]) 5 1 This isn't a range operation in ODF; it's just a cell specifier, with a range embedded
=SUM([.B4]:[.B5]) 5 2 Simple range creation. Note the range OUTSIDE the [..] markers

See also:

Infix Operator "&"

Summary: Concatenate two strings.

Syntax: Text Left & Text Right

Returns: Text

Level: 1

Constraints: None

Semantics: Concatenates two text (string) values. Due to the way conversion works, numbers are converted to strings. Note that this is equivalent to CONCATENATE(Left,Right).

Test Cases:

Expression Result Level Comment
="Hi " & "there" ="Hi there" 1 Simple concatenation.
="H" & "" ="H" 1 Concatenating an empty string produces no change.

See also: Infix Operator "+", CONCATENATE

Rationale: As with exponentiation, both infix operator "&" and the CONCATENATE function are required so that it is easy to preserve user preferences for representation.

Infix Operator Intersection ("!" or " ")

Summary: Compute the intersection of two references

Syntax: Reference Left ! Reference Right

Returns: Reference

Level: 1

Constraints: None

Semantics: Takes two references and computes the intersection - a reference to the intersection of cells in both Left and Right. If there are no cells in common, returns an Error. Note that this is notated as "!" in OpenDocument format, but as a space in MOOX.

TBD: What if the references are not simple, e.g., multiple areas? What if Left or Right are not a reference?

Test Cases:

Expression Result Level Comment
=SUM([.B3:.B5]![.B5:.B6]) 3 1 Simple intersection; reference result is [.B5], and SUM simply sums that one number (3) and returns it.

See also:


Infix Operator Union ("~", ",")

Summary: Compute the union of two references

Syntax: Reference Left ~ Reference Right

Returns: Reference

Level: 3

Constraints: None

Semantics: Takes two references and computes the "cell union", which is simply a concatenation of the reference Left followed by the reference Right. This is not the same as a set union; duplicate references to cells are not removed. Note that this is proposed as being notated as "~" in OpenDocument format, but as a comma in MOOX.

TBD: What if Left or Right are not a reference?

Test Cases:

Expression Result Level Comment
=SUM([.B4:.B5]~[.B4:.B5]) 10 3 Simple union. Note that duplicate values NOT removed, so we have 2 + 3 + 2 + 3

See also:

Postfix Operator "%"

Summary: Divide the operatand by 100

Syntax: Number Left %

Returns: Number

Level: 1

Constraints: None

Semantics: Computes Left / 100.

Test Cases:

Expression Result Level Comment
=50% 0.5 1 Simple percent value.
=20+50% 20.5 1 Percent does not change the meaning of other operations; this is not 30.

Note: Implementations typically change the display format for values that are percentages. Many have heuristics that treat percentages as a subtype of Number, and track which subtype results from a computation, to automatically set the initial format of a number. These heuristics vary by implementation (and by version of implementation), so they are not specified here.

Rationale: The % operator makes it easy to enter and display percentages. Spreadsheet implementations use % as general operator (like ! for factorial in traditional mathematics), and do not limit its use to being after a constant number. This standard defines it in its full generality, to accurately reflect general consensus by spreadsheet implementors.

Although this could be represented as simply /100, doing so would make it unnecessarily difficult to round-trip formulas while accurately reflecting user input.

See also: Prefix Operator "-"

Prefix Operator "+"

Summary: No operation; simply returns its one argument.

Syntax: + Any Right

Returns: Any

Level: 1

Constraints: None

Semantics: Returns the value given to it. Note that this does not convert a value to the Number type. In fact, it does no conversion at all of a Number, Logical, or Text value - it returns the same Number, Logical, or Text value (respectively). The "+" applied to a reference may return the reference, or an Error.

Test Cases:

Expression Result Level Comment
=+5 5 1 Numbers don't change
=+"Hello" "Hello" 1 Does not convert a string to a number.

See also: Infix Operator "+"

Note: In OOo2, SUM(+[.B4:.B5]) produces 5, because the "+" operator causes no change and just returns the range reference. In Excel 2003, it produces a value error; prefix "+" is not permitted on references.


Prefix Operator "-"

Summary: Negate its one argument.

Syntax: - Number Right

Returns: Number

Level: 1

Constraints: None

Semantics: Computes 0 - Right.

Test Cases:

Expression Result Level Comment
=-[.B4] -2 1 Negated 2 is -2.
=-2=(0-2) True 3 Negative numbers are fine.

See also: Infix Operator "-"

TBD: Need to add comparison operators. The comparison operators compare two different values. If the values are both numbers, they are compared numerically. The equal operator when comparing strings does a case-insensitive match; for case-sensitive comparing use EQUAL() (Walkenbach, 2004). Need to discuss what happens with various types in comparisons, in particular, what if types are mixed -- which one "wins"? May need to add a pseudotype so that there's no need to repeat the rules in each comparison operator.

List of Functions

The following is a list of all functions predefined by this specification, grouped by level.

Level 1

The following 109 functions MUST be supported by all implementing applications, and are thus assigned to level 1:

ABS ACOS AND ASIN ATAN ATAN2 AVERAGE CHOOSE COLUMNS COS COUNT COUNTA COUNTBLANK COUNTIF DATE DAVERAGE DAY DCOUNT DCOUNTA DDB DEGREES DGET DMAX DMIN DPRODUCT DSTDEV DSTDEVP DSUM DVAR DVARP EVEN EXACT EXP FACT FALSE FIND FV HLOOKUP HOUR IF INDEX INT IRR ISBLANK ISERR ISERROR ISLOGICAL ISNA ISNONTEXT ISNUMBER ISTEXT LEFT LEN LN LOG LOG10 LOWER MATCH MAX MID MIN MINUTE MOD MONTH N NA NOT NOW NPER NPV ODD OR PI PMT POWER PRODUCT PROPER PV RADIANS RATE REPLACE REPT RIGHT ROUND ROWS SECOND SIN SLN SQRT STDEV STDEVP SUBSTITUTE SUM SUMIF SYD T TAN TIME TODAY TRIM TRUE TRUNC UPPER VALUE VAR VARP VLOOKUP WEEKDAY YEAR

Rationale: The list of level 1 functions is based on Richard Kernick's email of 2005-10-20 on the OpenFormula mailing list, identifying the list of functions supported by Documents to Go for the Palm, version 6. He proposed that this list would be an excellent subset, since it demonstrated (through an actual implementation) what functions would best support a significant number of spreadsheet users in small environments. In general, we want to emphasize what industry is doing, instead of pretending, so unless suggested otherwise we used actual implementations as our guide.

Changes were then made based on examination of various implementations. ERROR.TYPE is named ERRORTYPE in OOo2 with completely different results, and so has been moved to a level higher than 1. CONCATENATE was not in OOo2 (and people normally use & anyway), so it was moved to a higher level as well.

TBD: A few of these functions are not in Quattro Pro or Lotus 1-2-3.

In particular, there is an ISERR/ISERROR issue. Both Quattro Pro and Lotus 1-2-3v9 have a function named ISERR; neither has a function named ISERROR. Even more confusingly, in Lotus 1-2-3v9, ISERR has the functionality of ISERROR instead of ISERR. This is because in Lotus 1-2-3v9, @ISERR(@NA) is 1 (True). In Quattro Pro, Excel, OpenOffice.org, SheetToGo, etc., ISERR(NA()) is 0 (False); this is written as @ISERR(@NA) in Quattro Pro.

The function ISERROR could be moved to a higher level, leaving only ISERR at level 1; but this would not handle Lotus 1-2-3v9.8, since it would map its ISERR to this specification's ISERROR. Removing ISERROR would require rewriting of many level 1 tests as well as the testsuite generator. In some sense this means that "NA" isn't considered an "error" by systems that only have ISERR, since the only True/False detection system skips NA. But by defining ISERR with the semantics as has been done, the model defined in this specification is still correct. Conversely, ISERR could be moved to a higher level, which would mean that Quattro Pro does not meet level 1. There is no perfect solution; it is impractical to have a specification lacking both ISERR and ISERROR, and although the similarity in names is unfortunate, this really is common practice.

Other functions not in Quattro Pro are: COLUMNS, COUNTA, DCOUNTA, DSTDEV, LOG10, PRODUCT.

In some cases the "nonmatches" are simply different names. E.G., Quattro Pro's ISSTRING maps to ISTEXT, LENGTH to LEN, REPEAT to REPT, and so on.

Lotus 1-2-3v9.8.1 represents AND, OR, and NOT as operators but has them. It has no distinguishable POWER function, and must use the infix operator. Functions not in Lotus 1-2-3v9.8.1 are: COLUMNS, COUNTA, DCOUNTA, DEGREES, DPRODUCT, ISLOGICAL, RADIANS, SUBSTITUTE. Many functions are just renamed, e.g., ISEMPTY becomes ISBLANK.


Note: John Cowan reported on 2005-10-19 that he had examined the following implementations for their time/date functions:
Excel 2003              internal help
Gnumeric                list on web site
KSpread                 posted to this list
Quantrix Modeler 2.0    manual on web site
OpenOffice 2.0          manual draft on web site

Cowan reported that "Quantrix Modeler is a spreadsheet-like application that descends conceptually from Lotus Improv, so it is an offshoot of the main line of tradition and therefore valuable as a cross-check."

All 5 of them implemented the following, and therefore were recommended for level 1:

DATE, DATEVALUE, DAY, DAYS360, HOUR, MINUTE, MONTH, NOW,
SECOND, TIME, TIMEVALUE, TODAY, WEEKDAY, YEAR.

He proposed the following functions for level 2. They are not present in Quantrix, and are available in OpenOffice only when the Analysis AddIn is present:

EDATE, EOMONTH, NETWORKDAYS, WORKDAY, YEARFRAC.

Level 2

The following are available in applications which meet at least level 2 (this includes, of course, all of the ones of the previous level):

ABS ACCRINT ACCRINTM ACOS ACOSH ADDRESS AND ASIN ASINH ATAN ATAN2 ATANH AVEDEV AVERAGE BESSELI BESSELJ BESSELK BESSELY BETADIST BETAINV BINOMDIST CEILING CELL CHAR CHIDIST CHIINV CHITEST CHOOSE CLEAN CODE COLUMN COMBIN CONCATENATE CONFIDENCE CONVERT CORREL COS COSH COUNT COUNTBLANK COUNTIF COUPDAYBS COUPDAYS COUPDAYSNC COUPNCD COUPNUM COUPPCD COVAR CRITBINOM CUMIPMT CUMPRINC DATE DATEDIF DATEVALUE DAVERAGE DAY DAYS360 DB DCOUNT DDB DEGREES DEVSQ DGET DISC DMAX DMIN DOLLAR DOLLARDE DOLLARFR DPRODUCT DSTDEV DSUM DURATION DVAR DVARP EOMONTH ERF ERFC EVEN EXACT EXP EXPONDIST FACT FALSE FDIST FIND FINV FISHER FISHERINV FIXED FLOOR FORECAST FTEST FV GAMMADIST GAMMAINV GAMMALN GCD GEOMEAN HARMEAN HLOOKUP HOUR HYPGEOMDIST IF INDEX INT INTERCEPT INTRATE IRR ISBLANK ISERR ISEVEN ISLOGICAL ISNA ISNONTEXT ISNUMBER ISODD ISTEXT KURT LARGE LCM LEFT LEN LINEST LN LOG LOGEST LOGINV LOGNORMDIST LOOKUP LOWER MATCH MAX MDURATION MEDIAN MID MIN MINUTE MINVERSE MIRR MMULT MOD MODE MONTH MROUND MULTINOMIAL N NA NEGBINOMDIST NETWORKDAYS NOMINAL NORMDIST NORMINV NORMSDIST NORMSINV NOT NOW NPER NPV ODD ODDFPRICE ODDFYIELD ODDLPRICE ODDLYIELD OFFSET OR PEARSON PERCENTILE PERCENTRANK PERMUT PI PMT POISSON POWER PRICE PRICEMAT PROB PRODUCT PROPER PV QUARTILE QUOTIENT RADIANS RAND RANDBETWEEN RANK RATE RECEIVED REPLACE REPT RIGHT ROMAN ROUND ROUNDDOWN ROUNDUP ROW ROWS RSQ SECOND SERIESSUM SIGN SIN SINH SKEW SLN SLOPE SMALL SQRT SQRTPI STANDARDIZE STDEV STDEVP STDEVPA STEYX SUBSTITUTE SUBTOTAL SUM SUMIF SUMPRODUCT SUMSQ SUMX2MY2 SUMX2PY2 SUMXMY2 SYD T TAN TANH TBILLEQ TBILLPRICE TBILLYIELD TDIST TIME TIMEVALUE TINV TODAY TRANSPOSE TREND TRIM TRIMMEAN TRUE TRUNC TTEST TYPE UPPER VALUE VAR VARA VARP VDB VLOOKUP WEEKDAY WEEKNUM WEIBULL WORKDAY XIRR XNPV YEAR YEARFRAC YIELD YIELDDISC YIELDMAT ZTEST


Rationale: This is the set of functions that are "widely implemented" by desktop spreadsheet applications. This was originally computed by starting with level 1, and adding functions that were implemented by at least 4 of the following applications: Excel 2003, Gnumeric, Lotus 1-2-3 v9, OpenOffice.org 2, and QuattroPro 12. Some effort was made to identify translations (e.g., where the same function has a different name). Changes can be made, of course.

Level 3

The following functions are available at level 3 (this includes, of course, all of the functions in the previous level):

ABS ACCRINT ACCRINTM ACOS ACOSH ADDRESS AMORDEGRC AMORLINC AND AREAS ASC ASIN ASINH ATAN ATAN2 ATANH AVEDEV AVERAGE AVERAGEA BAHTTEXT BESSELI BESSELJ BESSELK BESSELY BETADIST BETAINV BIN2DEC BIN2HEX BIN2OCT BINOMDIST CEILING CELL CHAR CHIDIST CHIINV CHITEST CHOOSE CLEAN CODE COLUMN COLUMNS COMBIN COMPLEX CONCATENATE CONFIDENCE CONVERT CORREL COS COSH COUNT COUNTA COUNTBLANK COUNTIF COUPDAYBS COUPDAYS COUPDAYSNC COUPNCD COUPNUM COUPPCD COVAR CRITBINOM CUMIPMT CUMPRINC DATE DATEDIF DATESTRING DATEVALUE DAVERAGE DAY DAYS360 DB DBSC DCOUNT DCOUNTA DDB DEC2BIN DEC2HEX DEC2OCT DEGREES DELTA DEVSQ DGET DISC DMAX DMIN DOLLAR DOLLARDE DOLLARFR DPRODUCT DSTDEV DSTDEVP DSUM DURATION DVAR DVARP EDATE EFFECT EOMONTH ERF ERFC ERROR.TYPE EVEN EXACT EXP EXPONDIST FACT FACTDOUBLE FALSE FDIST FIND FINDB FINV FISHER FISHERINV FIXED FLOOR FORECAST FREQUENCY FTEST FV FVSCHEDULE GAMMADIST GAMMAINV GAMMALN GCD GEOMEAN GESTEP GETPIVOTDATA GROWTH HARMEAN HEX2BIN HEX2DEC HEX2OCT HLOOKUP HOUR HYPERLINK HYPGEOMDIST HYPGEOMVERT IF IMABS IMAGINARY IMARGUMENT IMCONJUGATE IMCOS IMDIV IMEXP IMLN IMLOG10 IMLOG2 IMPOWER IMREAL IMSIN IMSQRT IMSUB INDEX INDIRECT INFO INT INTERCEPT INTRATE IPMT IRR ISBLANK ISERR ISERROR ISEVEN ISLOGICAL ISNA ISNONTEXT ISNUMBER ISODD ISPMT ISREF ISTEXT KURT LARGE LCM LEFT LEFTB LEN LENB LINEST LN LOG LOG10 LOGEST LOGINV LOGNORMDIST LOOKUP LOWER MATCH MAX MAXA MDETERM MDURATION MEDIAN MID MIDB MIN MINA MINUTE MINVERSE MIRR MMULT MNORMSINV MOD MODE MONTH MROUND MULTINOMIAL N NA NEGBINOMDIST NETWORKDAYS NOMINAL NORMDIST NORMINV NORMSDIST NORMSINV NOT NOW NPER NPV NUMBERSTRING OCT2BIN OCT2DEC OCT2HEX ODD ODDFPRICE ODDFYIELD ODDLPRICE ODDLYIELD OFFSET OR PEARSON PERCENTILE PERCENTRANK PERMUT PHONETIC PI PMT POISSON POWER PPMT PRICE PRICEMAT PROB PRODUCT PROPER PV QUARTILE QUOTIENT RADIANS RAND RANDBETWEEN RANK RATE RECEIVED REPLACE REPLACEB REPT RIGHT RIGHTB ROMAN ROUND ROUNDDOWN ROUNDUP ROW ROWS RSQ RTD SEARCH SEARCHB SECOND SERIESSUM SIGN SIN SINH SKEW SLN SLOPE SMALL SQRT SQRTPI STANDARDIZE STDEV STDEVA STDEVP STDEVPA STEYX SUBSTITUTE SUBTOTAL SUM SUMIF SUMPRODUCT SUMSQ SUMX2MY2 SUMX2PY2 SUMXMY2 SYD T TAN TANH TBILLEQ TBILLPRICE TBILLYIELD TDIST TEXT TIME TIMEVALUE TINV TODAY TRANSPOSE TREND TRIM TRIMMEAN TRUE TRUNC TTEST TYPE UPPER USDOLLAR VALUE VAR VARA VARP VARPA VDB VLOOKUP WEEKDAY WEEKNUM WEIBULL WORKDAY XIRR XNPV YEAR YEARFRAC YIELD YIELDDISC YIELDMAT ZTEST

Rationale: This is the list supported by Microsoft Excel. Many people want to make sure that their spreadsheets can be sent to Excel, and/or that they can read spreadsheets from Excel, and the way to ensure that is to ensure that the same functions are supported. This list is presented as a complete list, instead of a separate "these functions have been added" list, to allow people to look at a single list if they want to see "is this function supported".

Level 4

The following functions are added at level 4, in addition to the functions above:

ABDAYS ACCRINTXL ACCRUED ACCRUED2 ACDAYS ACOT ACOTH ACSC ACSCH ADDB ADDBO ADDH ADDHO AMAINT AMINT AMNTHS AMPMT AMPMTI AMPRN AMRES AMRPRN AMTERM ANDB ANDH ARABIC ARRAY ASCII ASCIITOCHAR ASCTOHEX ASEC ASECH ATL_LAST B BASE BDAYS BERNOULLI BETA BETAI BETALN BINO BINOMIAL BINTOHEX BINTOHEX64 BINTONUM BINTONUM64 BINTOOCT BINTOOCT64 BITAND BITLSHIFT BITOR BITRB BITRH BITRSHIFT BITSB BITSH BITTB BITTH BITXOR BLOCKNAME BLOCKNAME2 BLOCKNAMES BLOCKNAMES2 BOOL2INT BOOL2STRING BUSDAY CALL CARX CARY CATB CATH CATNB CATNH CAUCHY CDAYS CEIL CELLINDEX CELLPOINTER CHARTOASCII CHR CNT COLS COLUMNNUMBER COMB COMBINA COMMAND COMPARE COMPOUND CONTINUOUS COORD COT COTH COV CRITBINOMIAL CRONBACH CSC CSCH CUMIPMT_ADD CUMPRINC_ADD CUM_BIV_NORM_DIST CUR CURRENT CURRENTDATE CURRENTDATETIME CURRENTTIME CURVALUE DATALINK DATE2UNIX DATECONVERT DATEINFO DAYNAME DAYOFYEAR DAYS DAYSINMONTH DAYSINYEAR DCNT DDE DDELINK DEC2FRAC DECILE DECIMAL DECSEX DEGTORAD DEVSQA DFRAC DIMCIRC DIV DOLLARTEXT DPURECOUNT DSTD DSTDS DURAT DURATION_ADD DVARS EASTERSUNDAY EDIGIT EFFECTIVE EFFECT_ADD EMNTH EPS ERFD ERR ERROR ERRORTYPE EURO EUROCONVERT EXP2 EXPM1 EXPPOWDIST EXPRESSION FACTLN FBDAY FEETBL FIB FIELD FILEEXISTS FILENAME FIRSTBLANKPAGE FIRSTINGROUP FORMULA FRAC2DEC FRACD FRACTION FREQDIST FULLP FUTV FV2 FVAL FVAMOUNT FV_ANNUITY GAMMA GAMMAI GAMMAP GAMMAQ GAUSS GCD_ADD GEOMDIST GEOSUM GETENV GETGROUP GETREGISTRYKEY GRANDTOTAL GRANDTOTAL123 G_DURATION G_PRODUCT HALFP HEX HEXTOASC HEXTOBIN HEXTOBIN64 HEXTONUM HEXTONUM64 HEXTOOCT HEXTOOCT64 HOLS HOURS HYPOT IMARCCOS IMARCCOSH IMARCCOT IMARCCOTH IMARCCSC IMARCCSCH IMARCSEC IMARCSECH IMARCSIN IMARCSINH IMARCTAN IMARCTANH IMCOSH IMCOT IMCOTH IMCSC IMCSCH IMINV IMNEG IMPRODUCT IMSEC IMSECH IMSINH IMSUM IMTAN IMTANH INDEXTOLETTER INT2BOOL INTXL INV INVB INVBINO INVH INVSUMINV IPAYMT IRATE ISAAF ISAPP ISBDAY ISBETWEEN ISBLOCK ISDATE ISEMPTY ISEVEN_ADD ISFILE ISFORMULA ISLEAPYEAR ISLEGALPAGENAME ISMACRO ISNOTTEXT ISNUM ISODD_ADD ISOWEEKNUM ISOYEAR ISPRIME ISTIME ITHPRIME KANSUUJI KPRODUCT KURTOSIS KURTP LANDAU LAPLACE LARGEST LASTBLANKPAGE LASTCELLVALUE LASTINGROUP LBDAY LCM_ADD LENGTHB LETTERTOINDEX LEVEL_COUPON LINTERP LLDEC LN1P LOG2 LOGBASE LOGCONV LOGFIT LOGISTIC LOGN LOGREG LWKDAY MAXLOOKUP MDAYS MDET MEMAVAIL MEMEMSAVAIL MINLOOKUP MINUTES MNTHS MODULO MONTHNAME MONTHS MTGACC MULT MULTIPLEOPERATIONS MULTIPLY MUNIT NAND NBDAY NENGO NETPV NEXTMONTH NOMINAL_ADD NOR NORMAL NSUM NT_D NT_MU NT_PHI NT_PI NT_SIGMA NUM2STRING NUMTOBIN NUMTOBIN64 NUMTOHEX NUMTOHEX64 NUMTOOCT NUMTOOCT64 NWKDAY OCTTOBIN OCTTOHEX OCTTONUM OFFCAP OFFTRAF OPT_2_ASSET_CORRELATION OPT_AMER_EXCHANGE OPT_BAW_AMER OPT_BINOMIAL OPT_BJER_STENS OPT_BS OPT_BS_CARRYCOST OPT_BS_DELTA OPT_BS_GAMMA OPT_BS_RHO OPT_BS_THETA OPT_BS_VEGA OPT_COMPLEX_CHOOSER OPT_EURO_EXCHANGE OPT_EXEC OPT_EXTENDIBLE_WRITER OPT_FIXED_STRK_LKBK OPT_FLOAT_STRK_LKBK OPT_FORWARD_START OPT_FRENCH OPT_GARMAN_KOHLHAGEN OPT_JUMP_DIFF OPT_MILTERSEN_SCHWARTZ OPT_ON_OPTIONS OPT_RGW OPT_SIMPLE_CHOOSER OPT_SPREAD_APPROX OPT_TIME_SWITCH ORB ORH PAGEINDEX PAGEINDEX2 PAGENAME PAGENAME2 PAGENAMES PAGENAMES2 PARETO PAYMT PBDAY PERMUTATIONA PFACTOR PHI PIRATE PMT2 PMTC PMTI POLA POLR POW PPAYMT PRANK PRICE2 PRICEDISC PROBBLOCK PROPERTY PUREAVG PURECOUNT PUREMAX PUREMEDIAN PUREMIN PURESTD PURESTDS PUREVAR PUREVARS PV2 PVAL PVAMOUNT PV_ANNUITY R.DBETA R.DBINOM R.DCAUCHY R.DCHISQ R.DEXP R.DF R.DGAMMA R.DGEOM R.DHYPER R.DLNORM R.DNBINOM R.DNORM R.DPOIS R.DT R.DWEIBULL R.PBETA R.PBINOM R.PCAUCHY R.PCHISQ R.PEXP R.PF R.PGAMMA R.PGEOM R.PHYPER R.PLNORM R.PNBINOM R.PNORM R.PPOIS R.PT R.PWEIBULL R.QBETA R.QBINOM R.QCAUCHY R.QCHISQ R.QEXP R.QF R.QGAMMA R.QGEOM R.QHYPER R.QLNORM R.QNBINOM R.QNORM R.QPOIS R.QT R.QWEIBULL RADIANS RADIX RADTODEG RAND RANDBERNOULLI RANDBETA RANDBETWEEN RANDBINOM RANDCAUCHY RANDCHISQ RANDDISCRETE RANDEXP RANDEXPPOW RANDFDIST RANDGAMMA RANDGEOM RANDGUMBEL RANDHYPERG RANDLANDAU RANDLAPLACE RANDLEVY RANDLOG RANDLOGISTIC RANDLOGNORM RANDNEGBINOM RANDNORM RANDNORMTAIL RANDPARETO RANDPOISSON RANDRAYLEIGH RANDRAYLEIGHTAIL RANDTDIST RANDUNIFORM RANDWEIBULL RANGENAME RANK RATE RAYLEIGH RAYLEIGHTAIL RECEIVED REFCONVERT REGEXP REGEXPRE REGRESSION REPEAT REPLACE REPLACEB REPT RIGHT RIGHTB ROMAN ROOTN ROT ROUND ROUNDDOWN ROUNDDOWNXL ROUNDM ROUNDUP ROUNDUPXL ROW ROWS RRI SCENARIOINFO SCENARIOLAST SCMARG SEC SECH SECONDS SEMEAN SETSTRING SEXDEC SHEET SHEETS SHLB SHLBO SHLH SHLHO SHRB SHRBO SHRH SHRHO SIMTABLE SKEWNESS SKEWP SLEEK SMALLEST SOY SPI SPLINE SSMEDIAN STD STDS STEC STKOPT STRCMPNORM STREQ STRING STYLE SUBB SUBBO SUBH SUBHO SUBTOTAL123 SUBTOTAL2 SUBTOTALX SUM2XMY SUMA SUMNEGATIVE SUMPOSITIVE SUMXPY2 SUMXY SUMXY2 SUUJI TABLELINK TDATESTRING TDIGIT TDOW TERM TERM2 TFIND TLDATESTRING TLEFT TLENGTH TMID TNUMBERSTRING TOGGLE TOTAL TREPLACE TRIGHT UNICHAR UNICODE UNIX2DATE USESPLINE V VARIANCE VARS VERSION VERSIONCURRENT VERSIONDATA VERSIONINFO VHLOOKUP WEEKNUM_ADD WEEKS WEEKSINYEAR WEIGHTAVG WKDAY XCOUNT XINDEX XOR XORB XORH YDAYS YDIV YEARS YIELD2 YIELDPER YLD2YLD ZERO_COUPON


Rationale: This is a starting point for identifying additional functions, based on their existence in at least one spreadsheet application. This list is presented as an additional list, instead of a separate "this is the complete list of functions" list, because many people want to see exactly what was added at level 4. We expect the actual list to be far smaller, but starting with the complete list seemed like the simplest way to start.

TBD: This list is far too long; many of these functions will be culled away. The list above is only a temporary measure, as a place to start, so that we do not forget any. Some functions, like ARABIC, BASE, and ISPRIME, are almost certain to be included. Some, like EASTERSUNDAY, may be of value or inspire another function to be added. Others, like GNUMERIC_VERSION, should definitely not be included. See http://www.openformula.org/Research_Data for more information.

Array Functions

Array functions operate on arrays, often in special ways.

Database Functions

Database functions operate on a database, generally with a criteria and a selected field. See the earlier discussion on types for definitions of these terms.


DAVERAGE

Summary: Finds the average of values in a given field from the records (rows) in a database that match a search criteria.

Syntax: DAVERAGE(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Perform MAX on only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=DAVERAGE(TESTDB; "TestID"; [.B36:.B37]) 48 1 Simple criteria.

See also: AVERAGE COUNT DSUM DCOUNT SUM

DCOUNT

Summary: Counts the number of records (rows) in a database that match a search criteria and contain numerical values.

Syntax: DCOUNT(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Perform COUNT on only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=DCOUNT(TESTDB; "Bright Stars"; [.B36:.B37]) 2 1 Simple criteria.

See also: COUNT COUNTA DCOUNTA DSUM

DCOUNTA

Summary: Counts the number of records (rows) in a database that match a search criteria and contain values (as COUNTA).

Syntax: DCOUNTA(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Perform COUNTA on only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=DCOUNTA(TESTDB; "Bright Stars"; [.B36:.B37]) 2 1 Simple criteria.

See also: COUNT COUNTA DCOUNT DSUM

Note: DCOUNTA counts Text or Number. Is Logical is a distinct type, as it is in level 2 or higher, such rows will not be counted. Criteria that have Logical values should still work.

TBD: KSpread 1.4.2 DCOUNTA does not work if the Criteria are Logical values. E.G., you can't say "Correct?" with a boolean value below, and expect it to match boolean values in the records. If you change them all to Numbers (e.g., 0 or 1), so that the Criteria to match is a number and the True/False computed values become 1 or 0, KSpread works fine.

Also, if KSpread's DCOUNTA has as a Criteria that it must match "1", for some odd reason it also matches 1/0 (Infinity). Note that in KSpread, 1/0 is not the same as MOD(3;0); the former is "Infinity", while the latter is #DIV/0! (other spreadsheets consider them the same error).

DGET

Summary: Get the single value in the field from the single record (row) in a database that matches a search criteria.

Syntax: DGET(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Extracts the value in field F of the one data record in database D that matches criteria C. If no records match, or more than one matches, it returns an error.

Test Cases:

Expression Result Level Comment
=DGET(TESTDB; "TestID"; [.D36:.D37]) 2048 1 Simple criteria, single row answer.
=DGET(TESTDB; "TestID"; [.B36:.B37]) Error 1 Simple criteria, but multiple row answers so an error is returned.

See also: DMAX DMIN DSUM

DMAX

Summary: Finds the maximum value in a given field from the records (rows) in a database that match a search criteria.

Syntax: DMAX(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Perform MAX on only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=DMAX(TESTDB; "TestID"; [.B36:.B37]) 64 1 Simple criteria.

See also: MAX DMIN MIN

DMIN

Summary: Finds the minimum value in a given field from the records (rows) in a database that match a search criteria.

Syntax: DMIN(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Perform MIN on only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=DMIN(TESTDB; "TestID"; [.B36:.B37]) 32 1 Simple criteria.

See also: MIN DMAX MAX

DPRODUCT

Summary: Finds the product of values in a given field from the records (rows) in a database that match a search criteria.

Syntax: DPRODUCT(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Multiply together only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=DPRODUCT(TESTDB; "TestID"; [.B36:.B37]) 2048 1 Simple criteria.

See also: SUM DSUM

DSTDEV

Summary: Finds the sample standard deviation in a given field from the records (rows) in a database that match a search criteria.

Syntax: DSTDEV(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Perform STDEV on only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=ABS( DSTDEV(TESTDB; "TestID"; [.B36:.B37]) - 22.6274169979695) < 0.0000001 TRUE 1 Simple criteria.

See also: STDEV DSTDEVP

DSTDEVP

Summary: Finds the population standard deviation in a given field from the records (rows) in a database that match a search criteria.

Syntax: DSTDEVP(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Perform STDEVP on only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=DSTDEVP(TESTDB; "TestID"; [.B36:.B37]) 16 1 Simple criteria.

See also: STDEVP DSTDEV

DSUM

Summary: Finds the sum of values in a given field from the records (rows) in a database that match a search criteria.

Syntax: DSUM(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Perform SUM on only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=DSUM(TESTDB; "TestID"; [.B36:.B37]) 96 1 Simple criteria.

See also: SUM DMIN DMAX

DVAR

Summary: Finds the sample variance in a given field from the records (rows) in a database that match a search criteria.

Syntax: DVAR(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Perform VAR on only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=DVAR(TESTDB; "TestID"; [.B36:.B37]) 512 1 Simple criteria.

See also: VAR DVARP

DVARP

Summary: Finds the population variance in a given field from the records (rows) in a database that match a search criteria.

Syntax: DVARP(Database D; Field F; Criteria C)

Returns: Number

Level: 1

Constraints: None

Semantics: Perform VARP on only the data records in database D field F that match criteria C.

Test Cases:

Expression Result Level Comment
=DVARP(TESTDB; "TestID"; [.B36:.B37]) 256 1 Simple criteria.

See also: VARP DVAR

Date and Time Functions

The following are date and time functions. Note that VALUE and DATEVALUE may be needed to convert text into dates.

DATE

Summary: Construct date from year, month, and day of month.

Syntax: DATE( Number Year ; Number Month ; Number Day )

Returns: Number

Level: 1

Constraints: None

Semantics: This computes the date (i.e., its serial number) given Year, Month, and Day. Fractional values are truncated. The actual number DATE produces is not mandated by this specification (it depends on the current epoch), but it must compute values so that date differences are computed correctly. Note that some applications may not handle correctly dates before 1904; in particular, many spreadsheets incorrectly determine that 1900 is a leap year (it was not; there was no 2/29/1900). At level 2 or greater, numbers out of range wrap around (level 1 systems may report an error instead).

TBD: If days are negative, are they truncated like INT() towards negative infinity, or something else?

TBD: It's unclear if everyone truncates Year and Month; state that it's unspecified?

Test Cases:

Expression Result Level Comments
=DATE(2005;1;31)=[.C7] True 1 Simple date value.
=DATE(2005;12;31)-DATE(1904;1;1) 37255 1 Date differences are computed correctly.
=DATE(2004;2;29)=DATE(2004;2;28)+1 True 1 2004 was a leap year.
=DATE(2000;2;29)=DATE(2000;2;28)+1 True 1 2000 was a leap year.
=DATE(2005;3;1)=DATE(2005;2;28)+1 True 1 2005 was not a leap year.
=DATE(2017.5; 1; 2)=DATE(2017; 1; 2) True 1 Fractional values for year are truncated
=DATE(2006; 2.5; 3)=DATE(2006; 2; 3) True 1 Fractional values for month are truncated
=DATE(2006; 1; 3.5)=DATE(2006; 1; 3) True 1 Fractional values for day are truncated
=DATE(2006; 13; 3)=DATE(2007; 1; 3) True 2 Months > 12 roll over to year 2
=DATE(2006; 1; 32)=DATE(2006; 2; 1) True 2 Days greater than month limit roll over to month2
=DATE(2006; 25; 34)=DATE(2008;2;3) True 2 Days and months roll over transitively 2
=DATE(2006;-1; 1)=DATE(2005;11;1) True 2 Negative months roll year backward 2
=DATE(2006;4;-1)=DATE(2006;3;30) True 2 Negative days roll month backward 2
=DATE(2006;-4;-1)=DATE(2005;7;30) True 2 Negative days and months roll backward transitively2
=DATE(2003;2;29)=DATE(2003;3;1) True 2 Non-leap year rolls forward 2

See also: TIME, DATEVALUE

Note:
  1. Zero year goes to 2000 in OOo default; interpretation of two-digit years is user-settable
  2. KSpread gives an error rather than wrapping out-of-range numbers.
  3. OOo gives an error for dates less than 15 October 1582, the first day of the Gregorian calendar.
  4. OOo and KSpread display 1/1/1899 rather than adding 1900.

Implementations generally do not allow negative year numbers. Earlier versions of this required that years < 1900 have 1900 added to them, but this would be a harmful restriction. It is likely, though, that extant spreadsheets require that years < 110 have 1900 added to them; should that be required?

DATEVALUE

Summary: Return date serial number from given text.

Syntax: DATEVALUE( Text Date )

Returns: Number

Level: 2

Constraints: None

Semantics: This computes the serial number of the text string Date, using the current locale. It is semantically equal VALUE(Date) if Date has a date format, since text matching a date format is automatically converted to a serial number when used as a Number. If the text of Date does not have a date format, an implementation may return an error. See VALUE for more information on date formats.

Discussion: If text is tied to locale how is the value deterministic as it passes across locale boundaries? In order to promote interoperability should we not specify a format to the text. Implementations can reformat to locale and back for UI. I Suggest YYYY/MM/DD.

Having thought about this a bit, I see that it wouldn't work to specify a format as the function accepts references, and the structure of the text in there cannot be changed. Therefore we will have to specify rules for parsing, e.g. first by locale, then by switching DD and MM in the locale. Could also add an optional param that specifies the locale, this goes beyond any spreadsheet I know of now. -- Richard Kernick

Test Cases:

Expression Result Level Comments
=DATEVALUE("2004-12-25")=DATE(2004;12;25) True 2 DATEVALUE.

Note: Excel 2000 doesn't allow the input to be a reference, but Excel 2002 does (it allows normal operations). DATEVALUE is widespread, but is not available in SheetToGo. Note that text in a date format is converted to a serial number anyway. Other than the test for "date-formattedness", which you can't count on, this is basically the same as VALUE().

TBD: Make sure there are no level 1 tests that use DATEVALUE.

See also: TIME, DATE

DAY

Summary: Extract the day from a date.

Syntax: DAY( Number Date )

Returns: Number

Level: 1

Constraints: None

Semantics: Returns the day portion of the date.

Test Cases:

Expression Result Level Comments
=DAY(DATE(2006;5;21)) 21 1 Basic extraction.

See also: MONTH YEAR

HOUR

Summary: Extract the hour (0 through 23) from a time.

Syntax: HOUR( Number T )

Returns: Number

Level: 1

Constraints: None

Semantics: Extract from T the hour value, 0 through 23, as per a 24-hour clock. This is equal to:

DayFraction=(T-INT(T))
Hour=INT(DayFraction*24)

Test Cases:

Expression Result Level Comments
=HOUR(5/24) 5 1 5/24ths of a day is 5 hours, aka 5AM.
=HOUR(5/24-1/(24*60*60)) 4 1 A second before 5AM, it's 4AM.

See also: MONTH DAY MINUTE SECOND

MINUTE

Summary: Extract the minute (0 through 59) from a time.

Syntax: MINUTE( Number T )

Returns: Number

Level: 1

Constraints: None

Semantics: Extract from T the minute value, 0 through 59, as per a clock. This is equal to:

DayFraction=(T-INT(T))
HourFraction=(DayFraction*24-INT(DayFraction*24))
Minute=INT(HourFraction*60)

Test Cases:

Expression Result Level Comments
=MINUTE(1/(24*60)) 1 1 1 minute is 1/(24*60) of a day.
=MINUTE(TODAY()+1/(24*60)) 1 1 If you start with today, and add a minute, you get a minute.
=MINUTE(1/24) 0 1 At the beginning of the hour, we have 0 minutes.

See also: MONTH DAY HOUR SECOND

MONTH

Summary: Extract the month from a date.

Syntax: MONTH( Number Date )

Returns: Number

Level: 1

Constraints: None

Semantics: Takes a date and returns the month portion.

Test Cases:

Expression Result Level Comments
=MONTH([.C7]) 1 1 Month extraction from date in cell.
=MONTH(DATE(2006;5;21)) 5 1 Month extraction from DATE() value.

See also: YEAR DAY

NOW

Summary: Return the serial number of the current date and time.

Syntax: NOW()

Returns: DateTime

Level: 1

Constraints: None

Semantics: This returns the current day and time serial number, using the current locale. If you want only the serial number of the curren day, TODAY.

Test Cases:

Expression Result Level Comments
=NOW()>DATE(2006;1;3) True 1 NOW constantly changes, but we know it's beyond this date.
=INT(NOW())=TODAY() True 1 NOW() is part of TODAY(). WARNING: this test is allowed to fail if the locale transitions through midnight while computing this test; this failure is incredibly unlikely to occur in practice.

See also: TIME, TODAY

SECOND

Summary: Extract the second (the integer 0 through 59) from a time. This function presumes that leap seconds never exist.

Syntax: SECOND( Number T )

Returns: Number

Level: 1

Constraints: None

Semantics: Extract from T the second value, 0 through 59, as per a clock. Note that this returns an integer, without a fractional part. Note also that this rounds to the nearest second, instead of returning the integer part of the seconds. This is equal to:

DayFraction=(T-INT(T))
HourFraction=(DayFraction*24-INT(DayFraction*24))
MinuteFraction=(HourFraction*60-INT(HourFraction*60))
Second=ROUND(MinuteFraction*60)

Test Cases:

Expression Result Level Comments
=SECOND(1/(24*60*60)) 1 1 This is one second into today.
=SECOND(1/(24*60*60*2)) 1 1 Rounds.
=SECOND(1/(24*60*60*4)) 0 1 Rounds.

See also: MONTH DAY HOUR MINUTE

Rationale: Excel 2002, at least, rounds -- which is possibly counter-intuitive.

TIME

Summary: Construct time from hours, minutes, and seconds.

Syntax: TIME( Number hours ; Number minutes ; Number seconds )

Returns: Number representing time (as a fraction of a day); display usually formatted in current locale

Level: 1

Constraints: None

Semantics: Returns the fraction of the day consumed by the given time, i.e.:

((hours*60*60)+(minutes*60)+seconds)/(24*60*60)

Time is considered a number, where a time value of 1 = 1 day = 24 hours. Note that time is considered a fraction between 0 and 1, so typical implementations will only be able to compute approximations of the correct time value. Level 2 and higher systems MUST accept arbitrary integers for hours, minutes, and seconds; level 1 systems MAY limit time to integers where hours are limited to 0 through 23 (inclusive), hours from 0 through 59 (inclusive), and seconds from 0 through 59 (inclusive).

Note that in typical implementations, if the display is forced to be "time", the integer is discarded and then time is computed; for computational purposes, though, the entire value is retained.

Test Cases:

Expression Result Level Comments
=TIME(0;0;0) 0 1 All zero arguments becomes midnight, 12:00:00 AM.
=ABS(TIME(23;59;59)*60*60*24-86399)<1e-6 True 1 This is 11:59:59 PM.
=ABS(TIME(11;125;144)*60*60*24-47244)<1e-6 True 2 Seconds and minutes roll over transitively; this is 1:07:24 PM.
=ABS(TIME(11;0; -117)*60*60*24-39483)<1e-6 True 2 Negative seconds roll minutes backwards, 10:58:03 AM
=ABS(TIME(11;-117;0)*60*60*24-32580)<1e-6 True 2 Negative minutes roll hours backwards, 9:03:00 AM
=ABS(TIME(11;-125;-144)*60*60*24-31956)<1e-6 True 2 Negative seconds and minutes roll backwards transitively, 8:52:36 AM


See also: DATE

TBD: For the moment, tests involving fractions and negative values have been removed. There seems to be no consensus on them.

Note: Lotus1-2-3v9 limits TIME values to "reasonable" values.


TODAY

Summary: Return the serial number of today.

Syntax: TODAY()

Returns: Date

Level: 1

Constraints: None

Semantics: This returns the current day's serial number, using current locale. This only returns the date, not the datetime value; if you need the specific time of day as well, use NOW.

Test Cases:

Expression Result Level Comments
=TODAY()>DATE(2006;1;3) True 1 Every date TODAY() changes, but we know it's beyond this date.
=INT(TODAY())=TODAY() True 1 TODAY() returns an integer. WARNING: this test is allowed to fail if the locale transitions through midnight while computing this test; it's incredibly unlikely to occur in practice.

See also: TIME, NOW

WEEKDAY

Summary: Extract the day of the week from a date; if text, uses current locale to convert to a date.

Syntax: WEEKDAY( Number Date [ ; Number Type ] )

Returns: Number

Level: 1

Constraints: None

Semantics: Returns the day of the week from a date, as a number from 0 through 7. The exact meaning depends on the value of Type:

  1. When Type is 1, Sunday is the first day of the week, with value 1; Saturday has value 7.
  2. When Type is 2, Monday is the first day of the week, with value 1; Sunday has value 7.
  3. When Type is 3, Monday is the first day of the week, with value 0; Sunday has value 6.


Day of Week Type=1 Result Type=2 Result Type=3 Result
Sunday 1 7 6
Monday 2 1 0
Tuesday 3 2 1
Wednesday 4 3 2
Thursday 5 4 3
Friday 6 5 4
Saturday 7 6 5


Test Cases:

Expression Result Level Comments
=WEEKDAY(DATE(2006;5;21)) 1 1 Year-month-date format
=WEEKDAY(DATE(2005;1;1)) 7 1 Saturday.
=WEEKDAY(DATE(2005;1;1);1) 7 1 Saturday.
=WEEKDAY(DATE(2005;1;1);2) 6 1 Saturday.
=WEEKDAY(DATE(2005;1;1);3) 5 1 Saturday.

See also: DAY MONTH YEAR

YEAR

Summary: Extract the year from a date given in the current locale of the application.

Syntax: YEAR( Number D )

Returns: Number

Level: 1

Constraints: None

Semantics: Parses a date-formatted string in the current locale's format and returns the year portion.

Note: Should the following go on the "discussion" page?

If a year is given as a two-digit number, as in "05-21-15", then the year returned is either 1915 or 2015, depending upon a break point either wired into the application or given by the application's user.

Three-digit year numbers precede adoption of the Gregorian calendar, and may return either an error or the year number. Four-digit year numbers preceding 1582 (inception of the Gregorian Calendar) may return either an error or the year number. Four-digit year numbers following 1582 should return the year number (but do not do so in applications that set an artificial starting point of 1900 for the "epoch.")


Test Cases:

Expression Result Level Comments
=YEAR(DATE(1904;1;1)) 1904 1 Extracts year from a given date.

See also: MONTH DAY VALUE

Financial Functions

The following functions are specifically for various financial calculations.

DDB

Summary: Compute the amount of depreciation at a given period of time (the default method is double-declining balance).

Syntax: DDB(Number Cost; Number Salvage; Number LifeTime; Number Period [ ; Number Method ])

Returns: Money

Level: 1

Constraints: None.

Semantics: Compute the amount of depreciation of an asset at a given period of time (the default method is double-declining balance). The parameters are:

  • Cost: the total amount paid for the asset.
  • Salvage: the salvage value at the end of the LifeTime (often 0)
  • LifeTime: the number of periods that the depreciation will occur over. Must be a positive integer.
  • Period: the period for which the depreciation value is desired.
  • Method: the method of calculating depreciation. Defaults to 2. If 2, double-declining balance is used.


Test Cases:

Expression Result Level Comment
=DDB(4000;500;4;2) 1000 1 A trivial example of DDB.
=DDB(4000;500;4;2;2) 1000 1 Default method is 2 (double declining balance).

See also: SLN


TBD: Need to define other methods.

FV

Summary: Compute the future value (FV) of an investment.

Syntax: FV(Number Rate; Number Nper; Number Payment; [ Number Pv [ ; Number PayType ] ])

Returns: Money

Level: 1

Constraints: None.

Semantics: Computes the present value of an investment. The parameters are:

  • Rate: the interest rate per period.
  • Nper: the total number of payment periods.
  • Payment: the payment made in each period. If left blank, then Fv must be provided.
  • Pv: the present value; default is 0.
  • PayType: the type of payment, defaults to 0. It is 0 if payments are due at the end of the period; 1 if they are due at the beginning of the period.

See PV for the equation this must solve.


Note that Money is a subtype of Number; it is shown as Money since many implementations track this subtype and will set displays accordingly.

Test Cases:

Expression Result Level Comment
=ABS(FV(10%;12;-100;100)-1824.59)<0.01 True 1 A trivial example of FV.

TBD: Need lots more examples.

See also: PV NPER PMT RATE

IRR

Summary: Compute the internal rate of return for a series of cash flows.

Syntax: IRR(NumberSequence Values [; Number Guess] )

Returns: Percentage

Level: 1

Constraints: None.

Semantics: Compute the internal rate of return for a series of cash flows. If provided, Guess is an estimate of the interest rate to start the iterative computation.

The result of IRR is the rate at which the NPV() function will return zero with the given values. Typical implementations determine this iteratively.

Test Cases:

Expression Result Level Comment
=ABS(IRR([.F24:.F26])-0.418787000165341)<0.00001 True 1 A trivial example of IRR.

See also: NPV RATE

NPER

Summary: Compute the number of payment periods for an investment.

Syntax: NPER(Number Rate; Number Payment; Number Pv; [ Number Fv [ ; Number PayType ] ])

Returns: Number

Level: 1

Constraints: Rate must zero or greater.

Semantics: Computes the number of payment periods for an investment. The parameters are:

  • Rate: the constant interest rate.
  • Payment: the payment made in each period.
  • Pv: the present value of the investment.
  • Fv: the future value; default is 0.
  • PayType: the type of payment, defaults to 0. It is 0 if payments are due at the end of the period; 1 if they are due at the beginning of the period.

If rate is 0, then NPER solves this equation:

PV = - Fv - (Payment * Nper)

If rate is non-zero, then NPER solves this equation:

Pv*(1+Rate)^Nper + Payment * (1 + Rate*PaymentType) *
  ( (1+Rate)^Nper -1)/Rate + Fv = 0

Pv * (1 + Rate)Nper + Payment * (1 + Rate * PaymentType) * ((1 + Rate)Nper − 1) / Rate + Fv = 0

Applications at level 2 and higher MUST support negative rates; level 1 applications need not.

Test Cases:

Expression Result Level Comment
=ABS(NPER(5%;-100;1000)-14.2067)<0.005 True 1 A trivial example of NPER.
=ABS(NPER(5%;-100;1000;100)-15.2067)<0.005 True 1 A trivial example of NPER with non-zero FV.
=ABS(NPER(5%;-100;1000;100;1)-14.2067)<0.005 True 1 A trivial example of NPER with non-zero FV and PayType.
=NPER(0;-100;1000) 10 1 Rate can be zero.
=ABS(NPER(-1%;-100;1000)-9.483283066)<1e-2 True 1 Rate can be negative.

Note: Gnumeric gives an error for negative rates. Excel and OOo2 do not. For NPER(-1%;-100;1000), OOo2 gives 9.48, Excel produces 9.483283066, Gnumeric gives a #DIV/0 error. This appears to be a bug in Gnumeric.

See also: FV RATE PMT PV

NPV

Summary: Compute the net present value (NPV) for a series of periodic cash flows.

Syntax: NPV(Number Rate; { NumberSequence Value } )

Returns: Money

Level: 1

Constraints: None.

Semantics: Computes the net present value for a series of periodic cash flows with the discount rate Rate. Values should be positive if they are received as income, and negative if the amounts are paid as outgo.

If n is the number of values in the NumberSequence Value, the formula for NPV is:

NPV = sum (from i=1 to n) ( values[i] / (1+Rate)^i)

Test Cases:

Expression Result Level Comment
=NPV(100%;4;5;7) 4.125 1 A trivial example of NPV
=NPV(100%;[.C4:.C6]) 4.125 1 Note that each number in a range is considered separately.
=ABS(NPV(10%;100;200)-256.198347107438)<0.01 True 1 A more interesting value.

See also: FV IRR NPER PMT PV

PMT

Summary: Compute the payment made each period for an investment.

Syntax: PMT(Number Rate; Number Nper; Number Pv; [ Number Fv [ ; Number PayType ] ])

Returns: Money

Level: 1

Constraints: None.

Semantics: Computes the payment made each period for an investment. The parameters are:

  • Rate: the interest rate per period.
  • Nper: the total number of payment periods.
  • Pv: the present value of the investment.
  • Fv: the future value of the investment; default is 0.
  • PayType: the type of payment, defaults to 0. It is 0 if payments are due at the end of the period; 1 if they are due at the beginning of the period.

If rate is 0, the following equation is solved:

PV = - Fv - (Payment * Nper)

If rate is nonzero, then PMT solves this equation:

Pv*(1+Rate)^Nper + Payment * (1 + Rate*PaymentType) *
  ( (1+Rate)^nper -1)/Rate + Fv = 0

Note that Money is a subtype of Number; it is shown as Money since many implementations track this subtype and will set displays accordingly.

Test Cases:

Expression Result Level Comment
=ABS(PMT(5%;12;1000)+112.82541)<0.005 True 1 A trivial example of PMT.
=ABS(PMT(5%;12;1000;100)+119.10795)<0.005 True 1 A trivial example of PMT with non-zero FV.
=ABS(PMT(5%;12;1000;100;1)+113.43614)<0.005 True 1 A trivial example of PMT with non-zero FV and PayType.
=PMT(0;10;1000) -100 1 Rate can be zero.

See also: FV NPER PV RATE

PV

Summary: Compute the present value (PV) of an investment.

Syntax: PV(Number Rate; Number Nper; Number Payment; [ Number Fv [ ; Number PayType ] ])

Returns: Money

Level: 1

Constraints: None.

Semantics: Computes the present value of an investment. The parameters are:

  • Rate: the interest rate per period.
  • Nper: the total number of payment periods.
  • Payment: the payment made in each period. If left blank, then Fv must be provided.
  • Fv: the future value; default is 0.
  • PayType: the type of payment, defaults to 0. It is 0 if payments are due at the end of the period; 1 if they are due at the beginning of the period.

If rate is 0, then:

PV = - Fv - (Payment * Nper)

If rate is nonzero, then PV solves this equation:

Pv*(1+Rate)^Nper + Payment * (1 + Rate*PaymentType) *
  ( (1+Rate)^nper -1)/Rate + Fv = 0

Note that Money is a subtype of Number; it is shown as Money since many implementations track this subtype and will set displays accordingly.

Test Cases:

Expression Result Level Comment
=ABS(PV(10%;12;-100;100)-649.51)<0.01 True 1 A trivial example of PV.

TBD: Need lots more examples.

See also: FV NPER PMT RATE

RATE

Summary: Compute the interest rate per period of an investment.

Syntax: RATE(Number Nper; Number Payment; Number Pv; [ Number Fv [ ; Number PayType [; Number Guess] ] ])

Returns: Percentage

Level: 1

Constraints: If Nper is 0 or less than 0, the result is an error.

Semantics: Computes the interest rate of an investment. The parameters are:

  • Nper: the total number of payment periods.
  • Payment: the payment made in each period. If left blank, then Fv must be provided.
  • Pv: the present value of the investment.
  • Fv: the future value; default is 0.
  • PayType: the type of payment, defaults to 0. It is 0 if payments are due at the end of the period; 1 if they are due at the beginning of the period.
  • Guess: An estimate of the interest rate to start the iterative computation.

RATE solves this equation:

Pv*(1+Rate)^Nper + Payment * (1 + Rate*PaymentType) *
  ( (1+Rate)^Nper -1)/Rate + Fv = 0

Pv * (1 + Rate)Nper + Payment * (1 + Rate * PaymentType) * ((1 + Rate)Nper − 1) / Rate + Fv = 0

Test Cases:

Expression Result Level Comment
=ABS(RATE(12;-100;1000)-2.92285%)<0.005 True 1 A trivial example of RATE.
=ABS(RATE(12;-100;1000;100)-1.623133%)<0.005 True 1 A trivial example of RATE with non-zero FV.
=ABS(RATE(12;-100;1000;100;1)-1.996455%)<0.005 True 1 A trivial example of RATE with non-zero FV and PayType.
=ABS(RATE(12;-100;1000;100;1;1%)-1.996455%)<0.005 True 1 A trivial example of RATE with a guess.
=RATE(0;-100;1000) Error 1 Nper must be greater than 0.

TBD: In Gnumeric, Nper seems to be truncated if it is not an integer. In OOo2, Nper seems not to be truncated if it is not an integer and the fractional part is included in the calculation. What does excel do? We should include a test for this, but what is the right thing to do? For example, in Gnumeric, RATE(12.9,-100,1000) is 2.92% but in OOo2 RATE(12.9,-100,1000) is 3.88%.

See also: FV NPER PMT PV

SLN

Summary: Compute the amount of depreciation at a given period of time using the straight-line depreciation method.

Syntax: DDB(Number Cost; Number Salvage; Number LifeTime)

Returns: Money

Level: 1

Constraints: None.

Semantics: Compute the amount of depreciation of an asset at a given period of time using straight-line depreciation. The parameters are:

  • Cost: the total amount paid for the asset.
  • Salvage: the salvage value at the end of the LifeTime (often 0)
  • LifeTime: the number of periods that the depreciation will occur over. Must be a positive integer.

For alternative methods to compute depreciation, see DDB.

Test Cases:

Expression Result Level Comment
=SLN(4000;500;4) 875 1 A trivial example of SLN.

See also: DDB

SYD

Summary: Compute the amount of depreciation at a given period of time using the sum-of-years' digits method.

Syntax: SYD(Number Cost; Number Salvage; Number LifeTime; Number Period)

Returns: Money

Level: 1

Constraints: None.

Semantics: Compute the amount of depreciation of an asset at a given period of time using the sum-of-years' digits method. The parameters are:

  • Cost: the total amount paid for the asset.
  • Salvage: the salvage value at the end of the LifeTime (often 0)
  • LifeTime: the number of periods that the depreciation will occur over. Must be a positive integer.
  • Period: the period for which the depreciation value is desired.

For other methods of computing depreciation, see DDB.


Test Cases:

Expression Result Level Comment
=SYD(4000;500;4;2) 1050 1 A trivial example of SYD. Note that DDB would have calculated 1000 instead.

See also: SLN DDB

Information Functions

Information functions provide information about the spreadsheet or underlying environment, including special functions for converting between data types.

AREAS

CELL

COLUMNS

Summary: Returns the number of columns in a given range

Syntax: COLUMNS( Range R )

Returns: Number

Level: 1

Constraints: None

Semantics: Returns the number of columns in the range specified. The result is not dependent on the cell content in the range.

Test Cases:

Expression Result Level Comment
=COLUMNS([.C1]) 1 1 Single cell range contains one column.
=COLUMNS([.C1:.C4]) 1 1 Range with only one column.
=COLUMNS([.A4:.D100]) 4 1 Number of columns in range.

See also: ROWS

Note: Need to determine that null range and sheet traversal works in other implementations. Only checked OO. Made Level 2 for now on assumption that not all will support it. The null range gives an error in Gnumeric and Excel, so removed test.

Note: Excel supports inline Arrays. OpenOffice.org 2 does not support inline arrays.

TBD: Need to recheck type given here. It's really "Reference or Array"; we probably need to define a type that means that.


COUNT

Summary: Count the number of Numbers provided

Syntax: COUNT( { NumberSequence N } )

Returns: Number

Level: 1

Constraints: None.

Semantics: Counts the numbers in the list of NumberSequences provided. Only numbers in references are counted; all other types are ignored. Errors are not propogated. For level 2 and higher, if no parameters are passed the result is 0.

Test Cases:

Expression Result Level Comment
=COUNT(1;2;3) 3 1 Simple count.
=COUNT(FALSE();2;3) 3 2 Logical is treated as a number if inline; it is converted to a number, and thus counted (when inline).
=COUNT([.B6];2;3) 2 2 Logical types are ignored in reference parameters.
=COUNT([.B4:.B5]) 2 1 Two numbers in the range.
=COUNT([.B4:.B5];[.B4:.B5]) 4 1 Duplicates are not removed.
=COUNT([.B4:.B9]) 2 2 Errors in referenced cells or ranges are ignored.
=COUNT([.B4:.B8];1/0) 2 2 Errors in direct parameters are still ignored.
=COUNT([.B3:.B6]) 2 3 In level 3 and higher, conversion to NumberSequence ignores strings (in B3) and logical values (a TRUE() in B6).
=COUNT() 0 3 No parameters are allowed, result is 0.

See also: COUNTA

Note: Errors are propogated and logicals are treated as numbers in OOo2, hence these items at level 2. Note the result for COUNT(FALSE();2;3).

COUNTA

Summary: Count the number of non-empty values

Syntax: COUNTA( { AnySequence A } )

Returns: Number

Level: 1

Constraints: None.

Semantics: Counts the number of non-blank values in the list of AnySequences provided. A value is non-blank if it contains any content of any type, including an error. In a reference, every cell that is not blank is included in the count. An empty string value ("") is not considered blank. Errors contained in a range are considered a value for purposes of the count; errors do not propagate. Constant expressions or formulas are allowed; these are evaluated and if they produce an error value the error value is counted as one.

Test Cases:

Expression Result Level Comment
=COUNTA("1";2;TRUE()) 3 1 Simple count of 3 constant values.
=COUNTA([.B3:.B5]) 3 1 Three non-empty cells in the range.
=COUNTA([.B3:.B5];[.B3:.B5]) 6 1 Duplicates are not removed
=COUNTA([.B3:.B9]) 6 1 Where B9 is "=1/0", i.e. an error, counts the error as non-empty; errors contained in a reference do not propogate the error into the result.
=COUNTA("1";2;1/0) 3 2 An error in the list of values is just counted; errors in a constant parameter do not propagate.
=COUNTA("1";2;SUM([.B3:.B9])) 3 2 Errors in an evaluated formula do not propagate; they are just counted.
=COUNTA("1";2;[.B3:.B9]) 8 2 Errors in an evaluated formula do not propagate; they are just counted.

See also: COUNT, ISBLANK

COUNTBLANK

Summary: Count the number of blank values

Syntax: COUNTBLANK( Reference R )

Returns: Number

Level: 1

Constraints: Implementations are only REQUIRED to support references as a parameter.

Semantics: Counts the number of blank cells in the Reference provided. A cell is blank if it cell is empty. A cell with numeric value zero ('0') is not blank. At Level 2 a cell returning the empty string ("") is considered blank.

Test Cases:

Expression Result Level Comment
=COUNTBLANK([.B3:.B10]) 1 1 Only B8 is blank. Zero ('0') in B10 is not considered blank.

See also: COUNT, COUNTA, COUNTIF, ISBLANK

Rationale: Excel treats "" as blank, OO as non-blank, therefore only Level 2 conformant implementations are required to treat "" as blank.

TBD: If some cell has the formula IF(TRUE();"";"False"), will ISBLANK consider the cell blank? Should we forbid COUNTBLANK("")?

COUNTIF

Summary: Count the number of cells in a range that meet a criteria.

Syntax: COUNTIF( Range R, Criteria C )

Returns: Number

Level: 1

Constraints: Does not accept constant values as the range parameter.

Semantics: Counts the number of cells in the Range provided which meet the criteria C. The criteria may be a number, an expression, or a text string.

Test Cases:

Expression Result Level Comment
=COUNTIF([.B4:.B5];">2.5") 1 1 B4 is 2 and B5 is 3, so there is one cell in the range with a value greater than 2.5.
=COUNTIF([.B3:.B5];[.B4]) 1 1 Test if a cell equals the value in [.B4].
=COUNTIF("";[.B4]) Error 1 Constant values are not allowed for the range.
=COUNTIF([.B3:.B10];"7") 1 2 [.B3] is the string "7".
=COUNTIF([.B3:.B10];1+1) 1 1 The criteria can be an expression.

Note: In OOo2, the criteria optionally supports regular expressions, if this option is turned on.

TBD: COUNTIF([.B3:.B10];"7") produces 1, not 0, in Excel 2002. OOo2 works the same as excel, but Gnumeric returns a 0. The test is currently set to level 2 because of the difference. Are the strings ("7") being converted to numbers before comparison in excel and OOo2? Or are they doing a string comparison?

See also: COUNT, COUNTA, COUNTBLANK, SUMIF

ERROR.TYPE

Summary: Returns Number representing the specific error type.

Syntax: ERROR.TYPE(Error E)

Returns: Number

Level: 2

Constraints: None.

Semantics: Returns a number representing exactly what kind of error has occurred. Note that unlike most functions, this function does not propogate error values. Receiving a non-error value returns an error.

Test Cases:

Expression Result Level Comment
=ERROR.TYPE(NA()) 7 2 By convention, the ERROR.TYPE of NA() is 7.
=ERROR.TYPE(0) Error 2 Non-errors produce an error.

See also: NA


Note: Excel 2003 has the following error values (with ERROR.TYPE values), according to Walkenbach 2003, page 49 and Simon's "Excel 2000 in a Nutshell" page 527:
  • #DIV/0! (2) - Attempt to divide by zero, including division by an empty cell.
  • #NAME? (5) - Unrecognized/deleted name.
  • #N/A (7) - NA. Lookup functions which failed and NA() return this value.
  • #NULL! (1) - Intersection of ranges produced zero cells.
  • #NUM! (6) - Failed to meet domain constraints (e.g., input was too large or too small)
  • #REF! (4) - Reference to invalid cell.
  • #VALUE! (3) - Parameter is wrong type.

TBD: Should we list a standard set of error values? Currently, we are keeping maximum flexibility by simply asserting that there are error values without saying what they are other than NA.

TBD: Should this function be internally named "ERRORTYPE" instead of "ERROR.TYPE"? Note that OOo2 uses ERRORTYPE, not ERROR.TYPE, and implementations are free to display ERROR.TYPE if they would like to. Function names generally don't have "." in them, and the "." is also used to separate sheetnames from cellreferences. This dual use of "." isn't a problem in the ODF exchange syntax (because "." before a cell reference can only occur inside square brackets), but it creates a minor parsing complication when accepting human input if (1) the human doesn't enter the square brackets (a typical use), and (2) the number of columns greatly exceeds current common maximums (a likely future possibility). A column named "ERROR" is obviously far beyond today's sheets (which typically stop at "IV"), but future sheets will presumably have fewer limits, and in any case coding around special cases is not desirable. If they must code around anything, they might code in "ERROR.TYPE" as a special case.

TBD: OpenOffice.org 2's ERRORTYPE() returns _DIFFERENT_ values, e.g., ERRORTYPE(NA()) returns 32767, not 7. We need to raise this function beyond 1; what should it be raised to?

ISBLANK

Summary: Return TRUE if the referenced cell is blank, else return FALSE

Syntax: ISBLANK( Any X )

Returns: Logical

Level: 1

Constraints: None

Semantics: If X is of type Number, Text, or Logical, return FALSE. If X is a reference to a cell, examine the cell; if it is blank (has no value), return TRUE, but if it has a value, return FALSE. A cell with the empty string is not considered blank.

Test Cases:

Expression Result Level Comment
=ISBLANK(1) False 1 Numbers return false.
=ISBLANK("") False 1 Text, even empty string, returns false.
=ISBLANK([.B8]) True 1 Blank cell is true.
=ISBLANK([.B7]) False 1 Non-blank cell is false.

See also: ISNUMBER, ISTEXT

ISERR

Summary: Return True if the parameter has type Error and is not NA, else return False.

Syntax: ISERR( Scalar X )

Returns: Logical

Level: 1

Constraints: None

Semantics: If X is of type Error, and ISNA(X) is not true, returns TRUE. Otherwise it returns FALSE. Note that this function returns False if given NA(); if this is not desired, use ISERROR. Note that this function does not propogate error values.

ISERR(X) is the same as:

IF(ISNA(X),FALSE(),ISERROR(X))

Test Cases:

Expression Result Level Comment
=ISERR(1/0) True 1 Error values other than NA() return true.
=ISERR(NA()) False 1 NA() does NOT return True.
=ISERR("#N/A") False 1 Text is not an error.
=ISERR(1) False 1 Numbers are not an error.

See also: ERROR.TYPE, ISERROR, ISNUMBER, ISTEXT, NA

Note: In Lotus 1-2-3v9, the function named ISERR actually maps to ISERROR, not to ISERR. This is because in Lotus 1-2-3v9, @ISERR(@NA) is 1 (True). In Quattro Pro, Excel, OpenOffice.org, SheetToGo, etc., ISERR(NA()) is 0 / False. The function named ISERR is given here with the semantics of the majority.

ISERROR

Summary: Return TRUE if the parameter has type Error, else return FALSE

Syntax: ISERROR( Scalar X )

Returns: Logical

Level: 1

Constraints: None

Semantics: If X is of type Error, returns TRUE, else returns FALSE. Note that this function returns True if given NA(); if this is not desired, use ISERR. Note that this function does not propogate error values.

Test Cases:

Expression Result Level Comment
=ISERROR(1/0) True 1 Error values return true.
=ISERROR(NA()) True 1 Even NA().
=ISERROR("#N/A") False 1 Text is not an error.
=ISERROR(1) False 1 Numbers are not an error.

See also: ERROR.TYPE, ISERR, ISNA, ISNUMBER, ISTEXT, NA

Note: KSpread 1.4.2 lacks ISERROR, ISERR, and ISNA. However, KSpread developer Tomas Mecir announced on January 9, 2006, that he had added implementations of these functions due to an early draft of the OpenFormula specification, and a future release of KSpread will include these functions.

OOo2's ISERROR() function lets some errors slip through, instead of being captured. E.G., ISERROR(CHOOSE(0;"Apple";"Orange";"Grape";"Perry")) produces an error, rather than TRUE.

TBD: Lotus 1-2-3 and Quattro Pro lack ISERROR; does this need to move up to a higher level? They work as though NA were not an error value, but something else.. but it's believed NA can still be modelled as an Error value even in those cases, because it has the same effect.

ISLOGICAL

Summary: Return TRUE if the parameter has type Logical, else return FALSE

Syntax: ISLOGICAL( Scalar X )

Returns: Logical

Level: 1

Constraints: None

Semantics: If X is of type Logical, returns TRUE, else FALSE. Level 1 implementations may not have a distinguished logical type.

Test Cases:

Expression Result Level Comment
=ISLOGICAL(TRUE()) TRUE 1 Logical values return true.
=ISLOGICAL(FALSE()) TRUE 1 Logical values return true.
=ISLOGICAL("TRUE") FALSE 1 Text values are not logicals, even if they can be converted.

See also: ISTEXT, ISNUMBER

ISNONTEXT

Summary: Return TRUE if the parameter does not have type Text, else return FALSE

Syntax: ISNONTEXT( Any X )

Returns: Logical

Level: 1

Constraints: None

Semantics: If X is of type Text, returns TRUE, else FALSE. References to blank cells are NOT considered text, so a reference to a blank cell will return TRUE.

ISNONTEXT(X) is the same as:

NOT(ISTEXT(X))

Test Cases:

Expression Result Level Comment
=ISNONTEXT(1) TRUE 1 Numbers are not text
=ISNONTEXT(TRUE()) TRUE 1 Logical values are not text.
=ISNONTEXT("1") FALSE 1 Text values are text, even if they can be converted into a number.

See also: ISNUMBER, ISLOGICAL, ISTEXT

ISLOGICAL

Summary: Return TRUE if the parameter has type Logical, else return FALSE

Syntax: ISNA( Any X )

Returns: Logical

Level: 1

Constraints: None

Semantics: If X is an error type with the NA value, returns TRUE, else FALSE. This function does not propogate error values; if given an error type, it simply returns FALSE.

Test Cases:

Expression Result Level Comment
=ISNA(NA()) TRUE 1 The whole point of ISNA().
=ISNA(1/0) FALSE 1 Division by zero is a different error type.
=ISNA(1) FALSE 1 Numbers are not NA.

See also: ISTEXT, ISNUMBER, ISLOGICAL

ISNUMBER

Summary: Return TRUE if the parameter has type Number, else return FALSE

Syntax: ISNUMBER( Any X )

Returns: Logical

Level: 1

Constraints: None

Semantics: If X is of type Number, returns TRUE, else FALSE. Level 1 implementations may not have a distinguished logical type; in such implementations, ISNUMBER(TRUE()) is TRUE.

Test Cases:

Expression Result Level Comment
=ISNUMBER(1) =TRUE() 1 Numbers are numbers
=ISNUMBER("1") =FALSE() 1 Text values are not numbers, even if they can be converted into a number.

See also: ISTEXT, ISLOGICAL

ISTEXT

Summary: Return TRUE if the parameter has type Text, else return FALSE

Syntax: ISTEXT( Any X )

Returns: Logical

Level: 1

Constraints: None

Semantics: If X is of type Text, returns TRUE, else FALSE. References to blank cells are NOT considered text.

Test Cases:

Expression Result Level Comment
=ISTEXT(1) =FALSE() 1 Numbers are not text
=ISTEXT("1") =TRUE() 1 Text values are text, even if they can be converted into a number.

See also: ISNONTEXT, ISNUMBER, ISLOGICAL

NA

Summary: Return the constant error value, NA().

Syntax: NA()

Returns: Error

Level: 1

Constraints: Must have 0 parameters

Semantics: This function takes no arguments and returns the error NA.

Test Cases:

Expression Result Level Comment
=ISERROR(NA()) True 1 NA is an error.
=ISNA(NA()) True 1 Obviously, if this doesn't work, NA() or ISNA() is broken.
=ISNA(5+NA()) True 1 NA propogates through various functions and operators, just like any other error type.

See also: ERROR.TYPE, ISERROR, PI

N

Summary: Return the number

Syntax: N( Any X )

Returns: Text

Level: 1

Constraints: None

Semantics: If X is a reference, it is first dereferenced to a scalar. Then its type is examined. If it is of type Number, it is returned. If it is of type Logical, 1 is returned if TRUE else 0 is returned. It is unspecified what happens if it is provided a Text value (some implementations always return 0 if given a value of type Text; others attempt to convert the value to Number and provide the result). This is not intended to be a type-conversion function from Text to Number; for that, see VALUE. Many implementations heuristically determine the correct format of numbers; such implementations SHOULD consider the result to be a general number, even if its input is a date or some other specially-formatted number.

Test Cases:

Expression Result Level Comment
=N(6) 6 1 N does not change numbers.
=N(TRUE()) 1 1 Does convert logicals.
=N(FALSE()) 0 1 Does convert logicals.

See also: T, VALUE

TBD: May need to limit this function's definition to references. Lotus1-2-3v9 only accepts references. But be sure to allow implementations to accept other types.

Note: Many public documents inadequately describe this function; in particular, few seem to describe exactly what happens when text is provided. N("4") produces 0 in Excel, but 4 in OOo2.

ROWS

Summary: Returns the number of rows in a given range

Syntax: ROWS( Range R )

Returns: Number

Level: 1

Constraints: None

Semantics: Returns the number of rows in the range specified. The result is not dependent on the cell content in the range.

Test Cases:

Expression Result Level Comment
=ROWS([.C1]) 1 1 Single cell range contains one row.
=ROWS([.C1:.C4]) 4 1 Range with four rows.
=ROWS([.A4:.D100]) 97 1 Number of rows in range.

See also: COLUMNS

Note: Removed tests for null range and sheet traversal works in other implementations. Only checked OO. Made Level 2 for now on assumption that not all will support it. The null range gives an error in Gnumeric and in Excel.

Note: Excel supports inline Arrays. OpenOffice.org 2 does not support inline arrays.

TBD: Need to recheck type given here. It's really "Reference or Array"; we probably need to define a type that means that.



VALUE

Summary: Convert text to number

Syntax: VALUE( Text X )

Returns: Number

Level: 1

Constraints: None

Semantics: Converts given text value X into Number. If X is a reference, it is first dereferenced. VALUE is only specified if it is given a Text value or a reference to a single cell containing a Text value; it is unspecified what happens if VALUE is given is neither a Text value nor a reference to a Text value. If the Text has a date format, it is converted into a serial number. Level 2 and higher systems MUST be able to convert textual dates in ISO 8601 format (YYYY-MM-DD). If the supplied text cannot be converted into a number, an error is returned.

An implementation MUST accept numbers matching this regular expression (if it ends in %, it MUST divide the number by 100):

\$?((\.[0-9]+)|([0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?))%?

An implementation MUST accept also accept fractional values matching this regular expression (note the space between the integer and the fractional portion; values between 0 and 1 can be represented by using 0 for the integer part):

[0-9]+ [0-9]+/[1-9][0-9]?

Implementations MUST support time values in at least the HH:MM and HH:MM:SS formats, where HH is a 1-2 digit value from 0 to 23, MM is a 1-2 digit value from 0 to 59, and SS is a 1-2 digit value from 0 to 59. The hour may be one or two digits when it is less than 10. VALUE converts time values into Numbers ranging from 0 to 1, which is percentage of day that has elapsed by that time. Thus, VALUE("2:00") is the same as 2/24.

Implementations MUST accept dates in VALUE() and convert them to serial numbers. Applications MUST when running in the en_US locale accept the format MM/DD/YYYY (this is necessary to permit uniform testing). At level 2 or higher, implementations MUST accept ISO 8601 format dates (YYYY-MM-DD). Implementations typically accept many locale-specific formats (such as 31-Dec-96, Dec-96, 31-Dec, 12/31/96, and 12/31). At level 3, in locale en_US, applications MUST support the following formats (where YYYY is a 4-digit year, YY a 2-digit year, MM a numerical month, DD a numerical day, mmm a 3-character abbreviated alphabetical name, and mmmmm a full name):

Format Example Comment
MM/DD/YYYY 5/21/2006 LOCALE-DEPENDENT; Long year format with slashes
MM/DD/YY 5/21/06 LOCALE-DEPENDENT; Short year format with slashes
MM-DD-YYYY 5/21/2006 LOCALE-DEPENDENT; Long year format with dashes (short year MAY be supported, but it may also be used for years less than 100 so it's not required)
mmm DD, YYYY Oct 29, 2006 LOCALE-DEPENDENT; Short alphabetic month day, year. Note: mmm depends on the locale's language.
DD mmm YYYY 29 Oct 2006 LOCALE-DEPENDENT; Short alphabetic day month year
mmmmm DD, YYYY October 29, 2006 LOCALE-DEPENDENT; Long alphabetic month day, year
DD mmmmm YYYY 29 October 2006 LOCALE-DEPENDENT; Long alphabetic day month year

Implementations SHOULD support many other locales. Note that many conversions will vary by locale, including the decimal point (comma or period), names of months, date formats (MM/DD vs. DD/MM), and so on. Dates in particular vary by locale. This specification only mandates en_US so that there is a single locale which all applications can test in; it is not presumed that any locale is "better" than another.

Implementations are free to determine other formats that will convert to numbers. Where no conversion is determined, an Error is returned.

Even low-level implementations which do not automatically convert references to a Text value into a Number will convert Text to a Number with this function. Therefore, this is a highly portable function for converting text in a cell into a number elsewhere, and should be used when a conversion from Text to a Number is needed.

Test Cases:

Expression Result Level Comment
=VALUE("6") 6 1 VALUE converts text to numbers (unlike N).
=VALUE("1E5") 100000 1 Works with exponential notation.
=VALUE("200%") 2 1 Optional %.
=VALUE("1.5") 1.5 1 Accepts fractional values. LOCALE-DEPENDENT: in en_US, this is true.
=VALUE("7 1/4") 7.25 1 Fractional part.
=VALUE("0 1/2") 0.5 1 Fractional part.
=VALUE([.B3]) 7 1 VALUE converts references to text to numbers.
=VALUE("00:00") 0 1 VALUE converts time values to numbers between 0 and 1.
=ABS(VALUE("02:00")-2/24)<1e-6 True 1 VALUE converts time values to numbers between 0 and 1.
=ABS(VALUE("2:03")-2/24-3/(24*60))<1e-6 True 1 Time value with hours and minutes.
=ABS(VALUE("2:03:05") -2/24-3/(24*60) -5/(24*60*60)) <1e-6 True 1 Hours, minutes, and seconds.
=VALUE("3/32/2006") Error 1 Invalid date yields an error
=VALUE("2/29/2006") Error 1 "False leap year" yields an error (for 1901 and beyond)
=VALUE("1/2/2005")=DATE(2005;1;2) True 1 LOCALE-DEPENDENT; in en_US, this is true.
=VALUE("2005-01-02")=DATE(2005;1;2) True 2 VALUE converts dates into serial numbers. At level 2, ISO 8601 formats for dates must be supported
=VALUE("5/21/06")=DATE(2006;5;21) True 3 LOCALE-DEPENDENT; Short year format with slashes
=VALUE("Oct 29, 2006")=DATE(2006;10;29) True 3 LOCALE-DEPENDENT; Short alphabetic month day, year
=VALUE("29 Oct 2006")=DATE(2006;10;29) True 3 LOCALE-DEPENDENT; Short alphabetic day month year
=VALUE("October 29, 2006")=DATE(2006;10;29) True 3 LOCALE-DEPENDENT; Long alphabetic month day, year
=VALUE("29 October 2006")=DATE(2006;10;29) True 3 LOCALE-DEPENDENT; Long alphabetic day month year

See also: N, T

Note: In applications where a reference to a text type is considered 0 (and not converted), such as OOo2 and Lotus 1-2-3, VALUE() is how text in a reference is converted into a number, while on systems which automatically do the conversion, VALUE() is a no-op; thus VALUE() is a way of portably converting text into a number on any system. KSpread 1.4.2 incorrectly computes VALUE("6") as 0; it should be 6. Tomas Mecir reports that as of January 9, 2006, the development version of KSpread (in CVS) computes VALUE("6") correctly instead.

Note: Short year formats with dashes, like 5-21-06, aren't required here, because if an implementation supports years before 100 they're not always distinguishable.

TBD: Need more detail on what text formats are converted, and how they are converted. What about time? Datetime? Fractional values like "7 3/7"? This is especially complicated with dates, since in some cases they depend on local settings, and is especially weird with 2-digit years. For more on date conversion, see "Excel 2003 Formulas" page 144-145. For the moment, most date formats are simply not required since they're locale-dependent. Ideally all tests would use ISO 8601 YYYY-MM-DD format, and we'd require support for that at level 1, but Lotus 1-2-3 and Quattro Pro don't support it.

TBD: Can we require "." be accepted in ALL locales as the decimal point?


Lookup Functions

These functions look up information. Note that IF() can be considered a trivial lookup function, but it is listed as a logical function instead.

CHOOSE

Summary: Uses an index to return a value from a list of values.

Syntax: CHOOSE( Number Index ; { Any Value } )

Returns: Any

Level: 1

Constraints: Returns an error if Index < 1 or if there is no corresponding value in the list of Values.

Semantics: Uses Index to determine which value, from a list of values, to return. If Index is 1, CHOOSE returns the first Value; if Index is 2, CHOOSE returns the second value, and so on. Note that the Values may be formulas.

TBD: Should we mandate a specific response for fractional Indexes? Excel does an INT() on Index first. Index less than one?

Test Cases:

Expression Result Level Comment
=CHOOSE(3;"Apple";"Orange";"Grape";"Perry") "Grape" 1 Simple selection.
=CHOOSE(0;"Apple";"Orange";"Grape";"Perry") Error 2 Index has to be at least 1.
=CHOOSE(5;"Apple";"Orange";"Grape";"Perry") Error 2 Index can't refer to non-existent entry.
=CHOOSE(2;SUM([.B4:.B5]);SUM([.B5])) 3 1 Simple selection, using a set of formulas.
=SUM(CHOOSE(2;[.B4:.B5];[.B5])) 3 1 CHOOSE can pass references

See also: IF

Note: OOo2 produces errors if the index is too small or large, but the index cannot be captured by ISERROR. This is a defect in ISERROR. Currently this ISERROR requirement assigned to level 2, to allow automated tests to work correctly.

HLOOKUP

Summary: Look for a matching value in the first row of the given table, and return the value of the indicated row.

Syntax: HLOOKUP( Any Lookup ; Reference DataSource ; Number Row [; Number RangeLookup ] )

Returns: Any

Level: 1

Constraints: Row >= 1.

Semantics: Examine each value in the first row of DataSource in order (starting at the left) until its value matches Lookup. If there is a match, return the value row Row, relative to the DataSource, where the topmost row in DataSource is 1. If there is no match, at level 4 return an error; below level 4, the result is undefined.

TBD: Should we mandate a specific response for fractional Rows and Columns? Excel does an INT() on Index first. What about Areas? Need to describe RangeLookup.

Note: Excel 2003 returns "garbage" data when it can't find a match.

Test Cases:

Expression Result Level Comment
=HLOOKUP("Rev"; [.B18:.I31];2) 13 1 First Rev data entry.
=HLOOKUP("NOSUCHNAME"; [.B18:.I31];2) Error 4 No such value.

See also: CHOOSE INDEX VLOOKUP

INDIRECT

INDEX

Summary: Select a value using a row and column index value (and optionally an area index).

Syntax: INDEX( Reference DataSource ; Number Row ; Number Column [; Number AreaNumber ] )

Returns: Any

Level: 1

Constraints: Row >= 1, Column >= 1, AreaNumber >= 1.


Semantics: Given a datasource, selects the value at the given row and column (starting numbering at 1, relative to the top left of the datasource) of the given area AreaNumber. If AreaNumber is not given, it defaults to 1 (the first and possibly only area). This function is essentially a two-dimensional version of CHOOSE.

TBD: Should we mandate a specific response for fractional Rows and Columns? Excel does an INT() on Index first. What about Areas?

Test Cases:

Expression Result Level Comment
=INDEX([.B3:.C5];2;1) 2 1 Simple index.

See also: CHOOSE

MATCH

Summary: Finds a Search item in a sequence, and returns its position (starting from 1).

Syntax: MATCH(Scalar Search ; AnySequence SearchRegion [; Number MatchType])

Returns: Any

Level: 1

Constraints: -1 <= MatchType < 1

Semantics: Searches inside SearchRegion for the value Search; if it's found, it returns the relative position (starting from 1). Matching is determined using "=", so for text the comparison is case-insensitive. MatchType determines the type of search; if MatchType is 0, the SearchRegion must be considered unsorted, and the first match is returned. If MatchType is 1, the SearchRegion may be assumed to be sorted in ascending order, with smaller numbers before larger ones, smaller text values before larger ones (e.g., "A" before "B", and "B" before "BA"), and False before True. If the types are mixed, Numbers are sorted before Text, and Text before Logicals; implementations without a separate logical type MAY include a Logical as a Number. If MatchType is -1, then SearchRegion may be assumed to be sorted in descending order (the opposite of the above). If MatchType is 1 or -1, implementations may use binary search or other techniques so that they do not need to examine every value in linear order. MatchType defaults to 1.

Test Cases:

Expression Result Level Comment
=MATCH("HELLO";[.B3:.B7];0) 5 1 Simple MATCH().

See also: FIND, SEARCH

TBD: Need more test cases.

OFFSET

VLOOKUP

Summary: Look for a matching value in the first column of the given table, and return the value of the indicated column.

Syntax: VLOOKUP( Any Lookup ; Reference DataSource ; Number Column [; Number RangeLookup ] )

Returns: Any

Level: 1

Constraints: Column >= 1.

Semantics: Examine each value in the first column of DataSource in order (starting at the top) until its value matches Lookup. If there is a match, return the value column Column, relative to the DataSource, where the leftmost column in DataSource is 1. If there is no match, the result is undefined at levels less than 4; at level 4 it is an error.

TBD: Should we mandate a specific response for fractional Rows and Columns? Excel does an INT() on Index first. What about Areas? Need to describe RangeLookup.

Test Cases:

Expression Result Level Comment
=VLOOKUP("Ursa Major"; [.B19:.I31];2) 6 1 Table states Ursa Major has 6 bright stars.
=VLOOKUP("Ursa Major"; [.B19:.I31];4) "Uma" 1 Table states Ursa Major has abbreviation Uma.
=VLOOKUP(2; [.C19:.I31];3) "Cmi" 1 Can match numbers as well as text. Canis Minor is first in table (starting from top).
=VLOOKUP("NoSuchConstellation"; [.B19:.I31];4) Error 4 Error returned if not found, and we're at level 4.

See also: CHOOSE HLOOKUP INDEX

Note: If there's no matching record, OpenOffice.org 2 just chooses a record (apparantly at random) to match.

Logical Functions

The logical functions are the constants TRUE() and FALSE(), the functions that compute logical values NOT(), AND(), and OR(), and the conditional function IF(). The OpenDocument specification mentions "logical operators"; these are simply another name for the logical functions.

Note that because of Error values, any logical function that accepts parameters can actually produce TRUE, FALSE, or an Error value, instead of simply TRUE or FALSE.

TBD: Should minimum/maximum number of parameters be specified?

TBD: Should we specify that "If an error value is computed for an expression, then the first error is the result of the logical operation."? Is this widely done ("first error is the result")? Or is there sometimes a pecking order for errors, or is sometimes "last error is the result" the rule?

AND

Summary: Compute logical AND of all parameters.

Syntax: AND( { Logical L } )

Returns: Logical

Level: 1

Constraints: Must have 1 or more parameters

Semantics: Computes the logical AND of the parameters. If all parameters are True, returns True; if any are False, returns False.

Test Cases:

Expression Result Level Comment
=AND(FALSE();FALSE()) False 1 Simple AND.
=AND(FALSE();TRUE()) False 1 Simple AND.
=AND(TRUE();FALSE()) False 1 Simple AND.
=AND(TRUE();TRUE()) True 1 Simple AND.
=AND(TRUE();NA()) NA 1 Returns an error if given one.
=AND(1;TRUE()) True 1 Nonzero considered TRUE.
=AND(0;TRUE()) False 1 Zero considered FALSE.
=AND(TRUE();TRUE();TRUE()) True 1 More than two parameters okay.
=AND(TRUE()) True 2 One parameter okay - simply returns it.

TBD: Implementations of AND and OR do NOT short-circuit on True and False, because they look for error conditions and propogate them. For example, on Excel: AND(FALSE(), #NA) is #NA. Should the "do not short-circuit" rule be REQUIRED? If it is required, should short-circuiting versions of AND() and OR() be defined, e.g., ANDSC() and ORSC()?

TBD: Microsoft Excel 2003's AND and OR have a serious bug - they do not work correctly in array formulas. See Walkenbach's "Excel 2003 Formulas", page 409-410, and other references. To work around this, in Excel you must use * and + in odd ways. Should this be noted?

See also: OR, IF

FALSE

Summary: Returns constant FALSE

Syntax: FALSE()

Returns: Logical

Level: 1

Constraints: Must have 0 parameters

Semantics: Returns logical constant FALSE.

Test Cases:

Expression Result Level Comment
=FALSE() =FALSE() 1 Constant, so nothing else to test!

See also: TRUE, IF

IF

Summary: Return one of two values, depending on a condition

Syntax: IF( Logical Condition ; Any IfTrue [ ; Any IfFalse ] )

Returns: Any

Level: 1

Constraints: For level 1, must have 3 non-null parameters. For level 2 and higher, must have 2 or more parameters. For level 3 or higher, the second or third parameter may be null.

Semantics: Computes Condition. If it is TRUE, it returns IfTrue, else it returns IfFalse. If there are only 2 parameters, IfFalse is considered to be FALSE. If there are 3 parameters but the third parameter is null, IfFalse is considered to be 0.

Test Cases:

Expression Result Level Comment
=IF(FALSE();7;8) 8 1 Simple if.
=IF(TRUE();7;8) 7 1 Simple if.
=IF(TRUE();"HI";8) ="HI" 1 Can return strings, and the two sides need not have equal types
=IF(1;7;8) 7 1 A non-zero is considered true.
=IF(5;7;8) 7 1 A non-zero is considered true.
=IF(0;7;8) 8 1 A zero is considered false.
=IF(TRUE();[.B4];8) 2 1 The result can be a reference.
=IF(TRUE();[.B4]+5;8) 7 1 The result can be a formula.
=IF("x";7;8) Error 2 Condition has be be convertable to Logical.
=IF("1";7;8) Error 2 Condition has be be convertable to Logical.
=IF("";7;8) Error 2 Condition has be be convertable to Logical.
=IF(FALSE();7) =FALSE() 2 Default IfFalse is false.
=IF(FALSE();7;) 0 3 Empty parameter is considered 0

See also: AND, OR

TBD: Implementations of AND and OR do NOT short-circuit on TRUE and FALSE, because they look for error conditions and propogate them. For example, on Excel: AND(FALSE, #NA) is #NA. Should the "do not short-circuit" rule be REQUIRED? If it is required, should short-circuiting versions of AND() and OR() be defined, e.g., ANDSC() and ORSC()?

Note: Microsoft Excel documentation says that early versions of Lotus 1-2-3 required that IfTrue and IfFalse had to be constant numbers or strings, and could not be formulas or references. This is not an acceptable limitation even for level 1, and there are tests above to make sure that applications do not have such a limitation.

NOT

Summary: Compute logical NOT

Syntax: NOT( Logical L )

Returns: Logical

Level: 1

Constraints: Must have 1 parameter

Semantics: Computes the logical NOT. If given TRUE, returns FALSE; if given FALSE, returns TRUE.

Test Cases:

Expression Result Level Comment
=NOT(FALSE()) =TRUE() 1 Simple NOT, given FALSE.
=NOT(TRUE()) =FALSE() 1 Simple NOT, given TRUE.


See also: AND, IF

OR

Summary: Compute logical OR of all parameters.

Syntax: OR( { Logical L } )

Returns: Logical

Level: 1

Constraints: Must have 1 or more parameters

Semantics: Computes the logical OR of the parameters. If all parameters are False, returns False; if any are True, returns True.

Test Cases:

Expression Result Level Comment
=OR(FALSE();FALSE()) False 1 Simple OR.
=OR(FALSE();TRUE()) True 1 Simple OR.
=OR(TRUE();FALSE()) True 1 Simple OR.
=OR(TRUE();TRUE()) True 1 Simple OR.
=OR(FALSE();NA()) NA 1 Returns an error if given one.
=OR(FALSE();FALSE();TRUE()) True 1 More than two parameters okay.
=OR(TRUE()) True 2 One parameter okay - simply returns it

TBD: Implementations of AND and OR do NOT short-circuit on True and False, because they look for error conditions and propogate them. For example, on Excel: AND(FALSE(), #NA) is #NA. Should the "do not short-circuit" rule be REQUIRED? If it is required, should short-circuiting versions of AND() and OR() be defined, e.g., ANDSC() and ORSC()?

TBD: Microsoft Excel 2003's AND and OR have a serious bug - they do not work correctly in array formulas. To work around this, in Excel you must use * and + in odd ways. Should this be noted?


See also: AND, IF

TRUE

Summary: Returns constant TRUE

Syntax: TRUE()

Returns: Logical

Level: 1

Constraints: Must have 0 parameters

Semantics: Returns logical constant TRUE.

Test Cases:

Expression Result Level Comment
=TRUE() =TRUE() 1 Constant, so nothing else to test!

See also: FALSE, IF

Mathematical Functions

This section describes functions for various mathematical functions, including trigonometic functions like SIN).

Note: It's debatable if SUM and AVERAGE etc. should be listed under mathematical functions or as statistical functions. COUNT could belong here, under statistics, or under information functions. It doesn't really matter; these divisions are arbitrary, simply to make it easier for users to find them.

ABS

Summary: Return the absolute (nonnegative) value.

Syntax: ABS( Number N )

Returns: Number

Level: 1

Constraints: None

Semantics: If N < 0, returns -N, otherwise returns N.

TBD: What if it's not a number? Say anything?

Test Cases:

Expression Result Level Comment
=ABS(-4) 4 1 If less than zero, return negation
=ABS(4) 4 1 Positive values return unchanged.

See also:

ACOS

Summary: Return the arc cosine of a number. The angle is returned in radians.

Syntax: ACOS( Number N )

Returns: Number

Level: 1

Constraints: N must be between -1.0 and 1.0. The result must be between 0.0 and PI().

Semantics: Computes the arc cosine of a number, in radians.

Test Cases:

Expression Result Level Comment
=ACOS(SQRT(2)/2)*4/PI() 1 1 arc cosine of SQRT(2)/2 is PI()/4 radians.
=ACOS(TRUE()) 0.0 1 TRUE() is 1 if inline.
=ACOS(-1.0)/PI() 1 1 The result must be between 0.0 and PI().
=ACOS(2.0) Error 1 The argument must be between -1.0 and 1.0.
=ACOS([.B7]) Error 3 The arc cosine of a non-number gives an error, even if it's acquired via reference.

See also: COS, RADIANS, DEGREES

ASIN

Summary: Return the arc sine of a number. The angle is returned in radians.

Syntax: ASIN( Number N )

Returns: Number

Level: 1

Constraints: N must be between -1.0 and 1.0. The result must be between -PI()/2 and PI()/2.

Semantics: Computes the arc sine of a number, in radians.

Test Cases:

Expression Result Level Comment
=ASIN(SQRT(2)/2)*4/PI() 1 1 arc sine of SQRT(2)/2 is PI()/4 radians.
=ASIN(TRUE())*2/PI() 1 1 TRUE() is 1 if inline.
=ASIN(-1)*2/PI() -1 1 The result must be between -PI()/2 and PI()/2.
=ASIN(2) Error 1 The argument must be between -1.0 and 1.0.

See also: SIN, RADIANS, DEGREES

ATAN

Summary: Return the arc tangent of a number. The angle is returned in radians.

Syntax: ATAN( Number N )

Returns: Number

Level: 1

Constraints: The result must be between -PI()/2.0 and PI()/2.0.

Semantics: Computes the arc tangent of a number, in radians.

Test Cases:

Expression Result Level Comment
=ATAN(1)*4/PI() 1 1 arc tangent of 1 is PI()/4 radians.
=ATAN(TRUE())*4/PI() 1 1 TRUE() is 1 if inline.
=ABS(ATAN(-1.0e16)+1.570796) <= .0001 True 1 Check if ATAN gives reasonably accurate results.

See also: ATAN2, TAN, RADIANS, DEGREES

ATAN2

Summary: Return the arc tangent given a coordinate of two numbers. The angle is returned in radians.

Syntax: ATAN2( Number x ; Number y )

Returns: Number

Level: 1

Constraints: The result must be between -PI() and PI().

Semantics: Computes the arc tangent of two numbers (the x and y coordinates of a point), in radians. Similar to ATAN(y/x), but the signs of the two numbers are taken into account so that the result covers the full range from -PI() to +PI(). It is unspecified what ATAN2(0;0) returns.

Test Cases:

Expression Result Level Comment
=ATAN2(1;1)*4/PI() 1 1 arc tangent of 1.0/1.0 is PI()/4 radians.
=ATAN2(1;-1)*4/PI() -1 1 Location of sign makes a difference.
=ATAN2(-1;1)*4/PI() 3 1 Location of sign makes a difference.
=ATAN2(-1;-1)*4/PI() -3 1 Location of sign makes a difference.
=SIGN(ATAN2(-1.0;0.001)) 1 1 The result must be between -PI() and PI().
=SIGN(ATAN2(-1.0;-0.001)) -1 1 The result must be between -PI() and PI().
=ATAN2(-1.0;0)/PI() 1 1 By definition ATAN2(-1,0) should give PI() rather than -PI().

TBD: Should ATAN2(-1;0) result to PI() or -PI()? Gnumeric and Excel 2002 gives +PI().

TBD: Should ATAN2(0;0) result in 0, Error, or something else? OOo2 and Gnumeric return 0. Excel 2002 produces #DIV/0! For the moment, left unspecified.

TBD: KSpread 1.4.2 does not include a working version of ATAN2, though it's documented. Should ATAN2 be moved to level 2? Or do later versions include it?

See also: ATAN, TAN, RADIANS, DEGREES


AVERAGE

Summary: Average the set of numbers

Syntax: AVERAGE( { NumberSequence N } )

Returns: Number

Level: 1

Constraints: At least one number included. Returns an error if no numbers provided.

Semantics: Computes SUM(List) / COUNT(List).

Test Cases:

Expression Result Level Comment
=AVERAGE(2;4) 3 1 Simple average.

See also: SUM, COUNT

COS

Summary: Return the cosine of an angle specified in radians.

Syntax: COS( Number N )

Returns: Number

Level: 1

Constraints: None

Semantics: Computes the cosine of an angle specified in radians.

Test Cases:

Expression Result Level Comment
=COS(PI()/4)*2/SQRT(2) 1 1 cosine of PI()/4 radians is SQRT(2)/2.
=ABS(COS(PI()/2))<1e-15 True 1 cosine of PI()/2 radians is 0; the test here gives a little "wiggle room" for computational inaccuracies.
=COS([.B7]) Error 2 The cosine of a non-number gives an error.

See also: ACOS, RADIANS, DEGREES

DEGREES

Summary: Convert radians to degrees.

Syntax: DEGREES( Number N )

Returns: Number

Level: 1

Constraints: None

Semantics: Converts a number in radians into a number in degrees. DEGREES(N) is equal to N*180/PI().

Test Cases:

Expression Result Level Comment
=DEGREES(PI()) 180 1 PI() radians is 180 degrees.

See also: RADIANS, PI

EVEN

Summary: Rounds a number up to the nearest even integer. Rounding is away from zero.

Syntax: EVEN( Number N )

Returns: Number

Level: 1

Constraints: None

Semantics: Returns the even integer whose sign is the same as N's and whose absolute value is greater than or equal to the absolute value of N. That is, if rounding is required, it is rounded away from zero.

TBD: What if it's not a number? Say anything?

Test Cases:

Expression Result Level Comment
=EVEN(6) 6 1 Positive even integers remain unchanged.
=EVEN(-4) -4 1 Negative even integers remain unchanged.
=EVEN(1) 2 1 Non-even positive integers round up.
=EVEN(0.3) 2 1 Positive floating values round up.
=EVEN(-1) -2 1 Non-even negative integers round down.
=EVEN(-0.3) -2 1 Negative floating values round down.
=EVEN(0) 0 1 Since zero is even, EVEN(0) returns zero.

See also: ODD

EXP

Summary: Return the exponential of a number.

Syntax: EXP( Number N)

Returns: Number

Level: 1

Constraints: None

Semantics: Computes the exponential of a number (e raised to the power of the number).

TBD: Is there a requirement for the precision on e (i.e. exp(1))?

Test Cases:

Expression Result Level Comment
=EXP(0) 1 1 Anything raised to the 0 power is 1.
=EXP(LN(2)) 2 1 The EXP function is the inverse of the LN function.
=ABS(EXP(1)-2.71828182845904)<=0.000001 True 1 The value of e.

See also: LOG, LN

FACT

Summary: Return factorial (!).

Syntax: FACT(Number F)

Returns: Number

Level: 1

Constraints: F >= -1. Implementations less than level 4 are not required to accept -1 < F < 0.

Semantics: Return F!. This function is not required to handle fractional factorials; it may first compute INT(F). Values of F between 0 and -1 are unspecified. Thus, this function cannot portably substitute for the gamma function.

Test Cases:

Expression Result Level Comment
=FACT(0) 1 1 Factorial of 0 is 1
=FACT(1) 1 1 Factorial of 1 is 1
=FACT(2) 2 1 Factorial of 2 is 2
=FACT(3) 6 1 Factorial of 3 is 6
=FACT(-1) Error 1 Requires F >= -1

See also: Infix Operator "*"

TBD: Should we mandate that it compute INT() first? Excel 2002 computes the INT() of the parameter first, and then computes the factorial. It'd be more flexible if we allowed fractional values. This has been reworded to note that it's only REQUIRED to support integer parameters, and then permitting gamma() for fractional values.

INT

Summary: Rounds a number down to the nearest integer.

Syntax: INT( Number N )

Returns: Number

Level: 1

Constraints: None

Semantics: Returns the nearest integer whose value is less than or equal to N. Rounding is towards negative infinity.

Test Cases:

Expression Result Level Comment
=INT(2) 2 1 Positive integers remain unchanged
=INT(-3) -3 1 Negative integers remain unchanged
=INT(1.2) 1 1 Positive floating values are truncated
=INT(1.7) 1 1 It doesn’t matter if the fractional part is > 0.5
=INT(-1.2) -2 1 Negative floating values round towards negative infinity
=INT((1/3)*3) 1 1 Naive users expect INT to "correctly" make integers even if there are limits on precision.

Specific error mesages for non-numeric entries may vary among implementations.

See also: ROUND, TRUNC

LN

Summary: Return the natural logarithm of a number.

Syntax: LN( Number N)

Returns: Number

Level: 1

Constraints: N must be greater than zero.

Semantics: Computes the natural logarithm (base e) of a number.

Test Cases:

Expression Result Level Comment
=LN(1) 0 1 The logarithm of 1 (in any base) is 0.
=LN(EXP(1)) 1 1 The natural logarithm of e is 1.
=LN(0) Error 1 The argument must be greater than zero.
=LN([.B7]) Error 1 The natural logarithm of a non-number gives an error.

See also: LOG, LOG10, POWER, EXP

LOG

Summary: Return the logarithm of a number in a specified base.

Syntax: LOG( Number N [ ; Number Base ] )

Returns: Number

Level: 1

Constraints: N must be greater than zero.

Semantics: Computes the logarithm of a number in the specified base. If the base is not specified, the logarithm base 10 is returned.

Note: Level 1 systems MAY require the base, instead of allowing it to be optional.

Test Cases:

Expression Result Level Comment
=LOG(1) 0 2 The logarithm of 1 (in any base) is 0.
=LOG(1;EXP(1)) 0 1 The natural logarithm of 1 is 0.
=LOG(10;10) 1 1 The base 10 logarithm of 10 is 1.
=LOG(10) 1 2 If the base is not specified, base 10 is assumed.
=LOG(8*8*8;8) 3 1 Log base 8 of 8^3 should return 3.
=LOG(0;10) Error 1 The argument must be greater than zero.

See also: LOG10, LN, POWER, EXP

Note: OOo2 requires the base for log, instead of allowing it to be optional. Excel, etc. consider them optional.

LOG10

Summary: Return the base 10 logarithm of a number.

Syntax: LOG10( Number N)

Returns: Number

Level: 1

Constraints: N must be greater than zero.

Semantics: Computes the base 10 logarithm of a number.

Test Cases:

Expression Result Level Comment
=LOG10(1) 0 1 The logarithm of 1 (in any base) is 0.
=LOG10(10) 1 1 The base 10 logarithm of 10 is 1.
=LOG10(100) 2 1 The base 10 logarithm of 100 is 2.
=LOG10(0) Error 1 The argument must be greater than zero.
=LOG10("H") Error 1 The logarithm of a non-number gives an error.

See also: LOG, LN, POWER, EXP

MAX

Summary: Return the maximum from a set of numbers.

Syntax: MAX( { NumberSequence N } )

Returns: Number

Level: 1

Constraints: Returns zero if no numbers are provided in the list. Non-numbers in ranges are ignored.

Semantics: Returns the value of the maximum number in the list passed in.

Test Cases:

Expression Result Level Comment
=MAX(2;4;1;-8) 4 1 Negative numbers are smaller than positive numbers.
=MAX([.B4:.B5]) 3 1 The maximum of (2,3) is 3.
=ISNA(MAX(NA())) True 1 Inline errors are propagated.
=MAX([.B3:.B5]) 3 1 Strings are not converted to numbers and are ignored.
=MAX([.B3:.B7]) 3 1 Strings are not converted to numbers and are ignored.
=MAX([.B3:.B9]) Error 1 Errors inside ranges are NOT ignored.

TBD: Should we mandate support for zero parameters? Excel 2002 doesn't support it. Neither does OOo, but gnumeric seems to.

TBD: Excel 2002 acts inconsistently here; errors in ranges are NOT ignored. Help needed to pin this down.


See also: MIN

MIN

Summary: Return the minimum from a set of numbers.

Syntax: MIN( { NumberSequence N } )

Returns: Number

Level: 1

Constraints: None.

Semantics: Returns the value of the minimum number in the list passed in. Returns zero if no numbers are provided in the list. Implementations MAY require at least one parameter; if they permit zero parameters, MIN() with no parameters should return 0.

Test Cases:

Expression Result Level Comment
=MIN(2;4;1;-8) -8 1 Negative numbers are smaller than positive numbers.
=MIN([.B4:.B5]) 2 1 The minimum of (2,3) is 2.
=MIN([.B3]) 0 2 If no numbers are provided, MIN returns 0
=MIN("a") Error 1 Non-numbers are NOT ignored.
=MIN([.B3:.B5]) 2 1 Strings are not converted to numbers and are ignored.

TBD: How should we handle zero-parameter MIN and MAX? Excel 2002 doesn't support it. Neither does OOo, but gnumeric seems to.

See also: MAX

MOD

Summary: Return the remainder when one number is divided by another number.

Syntax: MOD( Number a, Number b)

Returns: Number

Level: 1

Constraints: None

Semantics: Computes the remainder of a/b.

Test Cases:

Expression Result Level Comment
=MOD(10;3) 1 1 10/3 has remainder 1.
=MOD(2;8) 2 1 2/8 is 0 remainder 2.
=MOD(5.5;2.5) 0.5 1 The numbers need not be integers.
=MOD(-2;3) 1 2 The location of the sign matters.
=MOD(2;-3) -1 2 The location of the sign matters.
=MOD(-2;-3) -2 2 The location of the sign matters.
=MOD(10;0) Error 1 Division by zero is not allowed

TBD: MOD with negative numbers is tricky. The tests for MOD(-2;3), MOD(2;-3), and MOD(-2;-3) work with Gnumeric, OOo2, and Excel 2002 as shown. KSpread 1.4.2 also computes MOD(-2;3) as 1. However, KSpread 1.4.2 (on Fedora Core 4) computes MOD(-2;-3) as -5 (not -1), and MOD(2;-3) computes as 2 (not -1); because of this variance in results, these tests for negative values of MOD were raised to level 2. On January 9, 2006, KSpread developer Tomas Mecir announced that KSpread's development version had been changed, and the next release of KSpread would produce the same results with these other negative values.

See also: Infix Operator "/"

ODD

Summary: Rounds a number up to the nearest odd integer, where "up" means "away from 0".

Syntax: ODD( Number N )

Returns: Number

Level: 1

Constraints: None

Semantics: Returns the odd integer whose sign is the same as N's and whose absolute value is greater than or equal to the absolute value of N. In other words, any "rounding" is away from zero. By definition, ODD(0) is 1.

TBD: What if it's not a number? Say anything?

Test Cases:

Expression Result Level Comment
=ODD(5) 5 1 Positive odd integers remain unchanged.
=ODD(-5) -5 1 Negative odd integers remain unchanged.
=ODD(2) 3 1 Non-odd positive integers round up.
=ODD(0.3) 1 1 Positive floating values round up.
=ODD(-2) -3 1 Non-odd negative integers round down.
=ODD(-0.3) -1 1 Negative floating values round down.
=ODD(0) 1 1 By definition, ODD(0) is 1.

See also: EVEN

PI

Summary: Return the approximate value of Pi.

Syntax: PI()

Returns: Number

Level: 1

Constraints: Must have 0 parameters

Semantics: This function takes no arguments and returns the (approximate) value of pi. Pi is an irrational number, and cannot be represented by a finite number of digits. Implementations are not required to represent pi exactly (obviously); they should simply make an attempt to give a reasonable result given their numerical representation method. Implementations which use a finite numerical representation using 64 or fewer bits SHOULD use the closest possible numerical representation.

TBD: Is there a requirement for the precision of the returned value?

Test Cases:

Expression Result Level Comment
=ABS(PI()-3.14159265358979323846264338327950)<1.e-6 TRUE() 1 The approximate value of pi. Lots of digits given here, in case the implementation can actually handle that many, but implementations are not required to exactly store this many digits. PI() should return the closest possible numeric representation.
=ABS(PI()-3.14159265358979323846264338327950)<1.e-12 TRUE() 2 A more stringent requirement at level 2

See also: SIN, COS

POWER

Summary: Return the value of one number raised to the power of another number.

Syntax: POWER( Number a ; Number b)

Returns: Number

Level: 1

Constraints: None

Semantics: Computes a raised to the power b.

Test Cases:

Expression Result Level Comment
=POWER(10;0) 1 1 Anything raised to the 0 power is 1.
=POWER(2;8) 256 1 2^8 is 256.

See also: LOG, LOG10, LN, EXP

PRODUCT

Summary: Multiply the set of numbers, including all numbers in ranges

Syntax: PRODUCT( { NumberSequence N } )

Returns: Number

Level: 1

Constraints: None

Semantics: Returns the product of the numbers (and only numbers). At level 2 and higher, return 1 if there are no parameters or the list of numbers is empty (at level 1 the result may be an error in these cases).

Test Cases:

Expression Result Level Comment
=PRODUCT(2;3;4) 24 1 Multiply all the numbers.
=PRODUCT(TRUE();2;3) 6 1 TRUE() is 1 if inline.
=PRODUCT([.B4:.B5]) 6 1 2*3 is 6.
=PRODUCT([.B3:.B6]) 6 2 In level 2 and higher, conversion to NumberSequence ignores strings (in B3) and logical values (a TRUE() in B6).
=PRODUCT() 0 2 Product with no parameters returns 0

TBD: OOo 2.0 gives an error when a non-number is included in the list, e.g. product(2;3;"2"). Gnumeric seems to ignore non-numbers, even if they can be converted to a number, but does not give an error, e.g. product(2,3,"2") gives 6. Excel 2002 converts these to numbers, so product(2,3,"2") produces 12.

Note: Lotus1-2-3v9 gives an error for PRODUCT().

See also: SUM

RADIANS

Summary: Convert degrees to radians.

Syntax: RADIANS( Number N )

Returns: Number

Level: 1

Constraints: None

Semantics: Converts a number in degrees into a number in radians. RADIANS(N) is equal to N*PI()/180.

Test Cases:

Expression Result Level Comment
=RADIANS(180)/PI() 1 1 180 degrees is PI() radians.


See also: DEGREES, PI

RAND

Summary: Return a random number between 0 and 1.

Syntax: RAND()

Returns: Number

Level: 2

Constraints: Must have 0 parameters and return a number between 0 and 1.

Semantics: This function takes no arguments and returns a random number between 0 and 1.

TBD: Some random number generators are better than others. Is there a requirement for the randomness of the result?

Test Cases:

Expression Result Level Comment
=RAND()>=0 TRUE() 2 The random number must be between 0 and 1.
=RAND()<=1 TRUE() 2 The random number must be between 0 and 1.

See also:

ROUND

Summary: Round a number to a specified precision.

Syntax: ROUND( Number A [; Number B])

Returns: Number

Level: 1

Constraints: None

Semantics: Round number A to the precision specified by B. The number A is rounded to the nearest power of 10 given by 10B. If B is zero, or absent, round to the nearest decimal integer. If B is positive, round to the specified number of decimal places. If B is negative, round to the left of the decimal point. If A is positive, and A is halfway between the two nearest values, the result must round up. In level 2 or higher systems, if A is halfway between the two nearest values, round away from zero. Note that if A is a Number, and B <= 0, the results will always be an integer (without a fractional component).

Test Cases:

Expression Result Level Comment
=ROUND(10.1) 10 1 If b is not specified, round to the nearest integer.
=ROUND(0.5) 1 1 0.5 rounds up.
=ROUND(1/3;0) 0 1 Round to the nearest integer.
=ROUND(1/3;1) 0.3 1 Round to one decimal place.
=ROUND(1/3;2) 0.33 1 Round to two decimal places.
=ROUND(1/3;2.9) 0.33 1 If b is not an integer, it is truncated.
=ROUND(5555;-1) 5560 1 Round to the nearest 10.
=ROUND(-1.1) -1 1 Negative number rounded to the nearest integer
=ROUND(-1.5) -2 2 Negative number rounded to the nearest integer

See also: TRUNC,INT

TBD: Should ROUND(-1.5) be at level 1 or level 2? Currently assigned to level 2.

SIGN

Summary: Return the sign of a number

Syntax: SIGN( Number N )

Returns: Number

Level: 2

Constraints: None

Semantics: If N < 0, returns -1; if N > 0, returns +1; if N == 0, returns 0.

Test Cases:

Expression Result Level Comment
=SIGN(-4) -1 2 N < 0 returns -1
=SIGN(4) 1 2 N > 0 returns +1
=SIGN(0) 0 2 N == 0 returns 0

See also: ABS

Note: This is placed at level 2, not level 1, because Sheet To Go version 6 doesn't include this function. It can be trivially implemented using IF for level 1.

SIN

Summary: Return the sine of an angle specified in radians

Syntax: SIN( Number N )

Returns: Number

Level: 1

Constraints: None

Semantics: Computes the sine of an angle specified in radians.

Test Cases:

Expression Result Level Comment
=SIN(PI()/4.0)*2/SQRT(2) 1 1 sine of PI()/4 radians is SQRT(2)/2.
=SIN(PI()/2.0) 1 1 sine of PI()/2 radians is 1.

See also: ASIN, RADIANS, DEGREES

SQRT

Summary: Return the square root of a number

Syntax: SQRT( Number N )

Returns: Number

Level: 1

Constraints: N must be non-negative.

Semantics: Returns the square root of a non-negative number.

Test Cases:

Expression Result Level Comment
=SQRT(4) 2 1 The square root of 4 is 2.
=SQRT(-4) Error 1 The argument must be non-negative

See also: Power

STDEV

Summary: Compute the sample standard deviation of a set of numbers.

Syntax: STDEV( { NumberSequence N } )

Returns: Number

Level: 1

Constraints: At least two numbers must be included. Returns an error if less than two numbers are provided.

Semantics: Computes the sample standard deviation s, where s^2 = {{\sum_{i=1}^{n}(x_i-\bar x)^2}\over{n-1}}.

Note: s is not the same as the standard deviation of the set, σ, which uses n rather than n − 1.

Note: A naive implementation of this function can easily run into trouble and end up trying to take the square root of a negative number. This is due to various effects of floating point arithmetic, in particular, the problem of using subtraction between two numbers that may be close together in value. For a better computer algorithm, see "The Art of Computer Programming" by Donald E. Knuth, Volume 2, Third Edition, page 232.

Test Cases:

Expression Result Level Comment
=STDEV(2;4)/SQRT(2) 1 1 The sample standard deviation of (2;4) is SQRT(2).
=STDEV([.B4:.B5])*SQRT(2) 1 1 The sample standard deviation of (2;3) is 1/SQRT(2).
=STDEV([.B3:.B5])*SQRT(2) 1 1 Strings are not converted to numbers and are ignored.
=STDEV(1) Error 1 At least two numbers must be included

See also: STDEVP, AVERAGE

STDEVP

Summary: Compute the standard deviation of the set for a set of numbers.

Syntax: STDEVP( { NumberSequence N } )

Returns: Number

Level: 1

Constraints: None.

Semantics: Computes the standard deviation of the set σ, where \sigma^2 = {{\sum_{i=1}^{n}(x_i-\bar x)^2}\over{n}}.

Note: σ is not the same as the sample standard deviation, s, which uses n − 1 rather than n.

Note: Naive implementations can run into trouble; see STDEV for more information.

Test Cases:

Expression Result Level Comment
=STDEVP(2;4) 1 1 The standard deviation of the set for (2;4) is 1.
=STDEVP([.B4:.B5])*2 1 1 The standard deviation of the set for (2;3) is 0.5.
=STDEVP([.B3:.B5])*2 1 1 Strings are not converted to numbers and are ignored.
=STDEVP(1) 0 1 STDEVP(1) is 0.

See also: STDEV, AVERAGE

SUM

Summary: Sum (add) the set of numbers, including all numbers in ranges

Syntax: SUM( { NumberSequence N } )

Returns: Number

Level: 1

Constraints: None

Semantics: Adds numbers (and only numbers) together. See the text on conversions; Logicals and Text are ignored at higher levels. At level 2 and higher, given zero paramters returns 0.

Test Cases:

Expression Result Level Comment
=SUM(1;2;3) 6 1 Simple sum.
=SUM(TRUE();2;3) 6 1 TRUE() is 1 if inline.
=SUM([.B4:.B5]) 5 1 2+3 is 5.
=SUM() 0 2 Sum with no parameters returns 0

See also: AVERAGE

Note: The exact semantics of SUM() in Excel are defined in (Walkenbach, 2004, pg 705). Lotus 1-2-3v9.8.1 does not accept SUM with no parameters.

SUMIF

Summary: Sum the values of cells in a range that meet a criteria.

Syntax: SUMIF( Range R, Criteria C [, Range S] )

Returns: Number

Level: 1

Constraints: Does not accept constant values as the range parameter.

Semantics: Sums the values in the range R which meet the criteria C. Adds numbers (and only numbers) together. The criteria may be a number, an expression, or a text string. If the optional range S is included, then the values in S are summed if the corresponding value in R meets the criteria.

Test Cases:

Expression Result Level Comment
=SUMIF([.B4:.B5];">2.5") 3 1 B4 is 2 and B5 is 3, so only B5 has a value greater than 2.5.
=SUMIF([.B3:.B5];[.B4]) 2 1 Test if a cell equals the value in [.B4].
=SUMIF("";[.B4]) Error 1 Constant values are not allowed for the range.
=SUMIF([.B3:.B4];"7") 0 1 [.B3] is the string "7", but only numbers are summed.
=SUMIF([.B3:.B4];"7";[.B4:.B5]) 2 2 [.B3] is the string "7", but its match is mapped to [.B4] for the summation.
=SUMIF([.B3:.B10];1+1) 2 1 The criteria can be an expression.

Note: In OOo2, the criteria optionally supports regular expressions, if this option is turned on.

TBD: SUMIF([.B3:.B10];"7";[.B4:.B5]) produces 2 in Excel 2002 and OOo2, but Gnumeric returns a 0. The test is currently set to level 2 because of the difference. Are the strings ("7") being converted to numbers before comparison in excel and OOo2? Or are they doing a string comparison? This difference is also encountered in COUNTIF.

See also: COUNTIF, SUM

TAN

Summary: Return the tangent of an angle specified in radians

Syntax: TAN( Number N )

Returns: Number

Level: 1

Constraints: None

Semantics: Computes the tangent of an angle specified in radians.

TAN(x) = SIN(x) / COS(x)

TBD: What happens for TAN(PI()/2.0), TAN(-PI()/2.0), etc.? Error? Infinity?

Test Cases:

Expression Result Level Comment
=ABS(TAN(PI()/4.0)-1)<1e-6 True 1 tangent of PI()/4.0 radians.

See also: ATAN, ATAN2, RADIANS, DEGREES, SIN, COS

TRUNC

Summary: Truncate a number to a specified number of digits.

Syntax: TRUNC( Number a ; Number b)

Returns: Number

Level: 1

Constraints: None

Semantics: Truncate number a to the number of digits specified by b. If b is zero, or absent, truncate to a decimal integer. If b is positive, truncate to the specified number of decimal places. If b is negative, truncate to the left of the decimal point. If b is not an integer, it is truncated.

Test Cases:

Expression Result Level Comment
=TRUNC(10.1) 10 1 If b is not specified, truncate to the nearest integer.
=TRUNC(0.5) 0 1 Truncate rather than rounding.
=TRUNC(1/3;0) 0 1 Truncate to an integer.
=TRUNC(1/3;1) 0.3 1 Truncate to one decimal place.
=TRUNC(1/3;2) 0.33 1 Truncate to two decimal places.
=TRUNC(1/3;2.9) 0.33 1 If b is not an integer, it is truncated.
=TRUNC(5555;-1) 5550 1 Truncate to the nearest 10.
=TRUNC(-1.1) -1 1 Negative number truncated to an integer
=TRUNC(-1.5) -1 1 Negative number truncated to an integer

See also: ROUND,INT

VAR

Summary: Compute the sample variance of a set of numbers.

Syntax: VAR( { NumberSequence N } )

Returns: Number

Level: 1

Constraints: At least two numbers must be included. Returns an error if less than two numbers are provided.

Semantics: Computes the sample vaiance s2, where s^2 = {{\sum_{i=1}^{n}(x_i-\bar x)^2}\over{n-1}}.

Note: s2 is not the same as the variance of the set, σ2, which uses n rather than n − 1.

Test Cases:

Expression Result Level Comment
=VAR(2;4) 2 1 The sample variance of (2;4) is 2.
=VAR([.B4:.B5])*2 1 1 The sample variance of (2;3) is 0.5.
=VAR([.B3:.B5])*2 1 1 Strings are not converted to numbers and are ignored.
=VAR(1) Error 1 At least two numbers must be included

See also: VARP, STDEV, AVERAGE

TBD: Should we specify what happens with non-numbers? In Excel 2002, it produces an error.


VARP

Summary: Compute the variance of the set for a set of numbers.

Syntax: VARP( { NumberSequence N } )

Returns: Number

Level: 1

Constraints: At least two numbers must be included; it is unspecified what is returned with only one number. Returns an error if less than two numbers are provided.

Semantics: Computes the vaiance of the set σ2, where \sigma^2 = {{\sum_{i=1}^{n}(x_i-\bar x)^2}\over{n}}.

Note: σ2 is not the same as the sample variance, s2, which uses n − 1 rather than n.

Test Cases:

Expression Result Level Comment
=VARP(2;4) 1 1 The variance of the set for (2;4) is 1.
=VARP([.B4:.B5])*4 1 1 The variance of the set for (2;3) is 0.25.
=VARP([.B3:.B5])*4 1 1 Strings are not converted to numbers and are ignored.

TBD: VARP(1) shouldn't be required to be an error. Excel, at least, accepts it.

See also: VAR, STDEVP, AVERAGE


Statistical Functions


Text Functions

CHAR

Summary: Return character represented by the given numeric value

Syntax: CHAR( Number N )

Returns: Text

Level: 2

Constraints: N >= 0.

Semantics:

Test Cases: Returns character represented by the given numeric value. This is only portable when 0 <= N <= 127; in these cases the ASCII character is returned in text. Beyond 127, some implementations return a character from a system-specific code page, while others return the Unicode character. For level 3, CHAR(10) represents a new line. Many implementations do not allow values greater than 255.

Expression Result Level Comment
=CHAR(65) "A" 2 ASCII character 65 is "A".
=LEN(CHAR(10)) 1 3 New line. May need to turn on "word wrap" formatting to see the result.

Note: Excel supports CHAR(10), though it only shows as a newline if the cell is formatted to word wrap (else the character shows as a box character). OOo2 does not support CHAR(10) as a newline character.

See also: CODE

CONCATENATE

Summary: Concatenate the text strings

Syntax: CONCATENATE( { Text T } )

Returns: Text

Level: 1

Constraints: None

Semantics: Concatenate each text value, in order, into a single text result.

Test Cases:

Expression Result Level Comment
=CONCATENATE("Hi "; "there") "Hi there" 2 Simple concatenation.
=CONCATENATE("A"; "B"; "C") "ABC" 2 More than two parameters okay.
=CONCATENATE([.B4]; [.B5]) "23" 2 Numbers converted to strings automatically.

See also: Infix Operator "&"

Rationale: Raised beyond level 1 because OOo2 doesn't support CONCATENATE.

EXACT

Summary: Report if two values are "exactly" equal (primarily used with text).

Syntax: Scalar Left = Scalar Right

Returns: Logical

Level: 1

Constraints: None

Semantics: Returns TRUE if two values are "exactly" equal. If the values differ in type, return FALSE. At level 1, this function MUST handle two text values. If they are both Text, return TRUE if the two values match in a case-sensitive way, else return FALSE.

Beyone level 1, this function MUST support Number and Logical as well. If the values are both Number, return TRUE if they are nearly equal according to "=", else return FALSE. If they are both Logicals, return TRUE if they are identical, else return FALSE.

Test Cases:

Expression Result Level Comment
=EXACT("A";"A") True 1 Trivial comparison.
=EXACT("A";"a") False 1 EXACT, unlike "=", considers different cases different.
=EXACT(1;1) True 2 EXACT does work with numbers.
=EXACT((1/3)*3;1) True 2 Numerical comparisons ignore "trivial" differences that depend only on numeric precision of finite numbers.
=EXACT(TRUE();TRUE()) True 2 Works with Logical values.
=EXACT("1";2) False 2 Different types with different values are different.
=EXACT("h";1) False 2 If text and number, and text can't be conveted to a number, they are different and NOT an error.
=EXACT("1";1) True 2 If text and number, see if numbers equal.


See also: Infix Operator "<>"

Rationale: EXACT on many implementations can take any scalar, not just text. However, it's typically considered a Text function, because it doesn't really add any new capabilities for Number and Logical ("=" is more convenient). For text, however, EXACT is not the same as "=".

FIND

Summary: Return the starting position of a given text.

Syntax: FIND(Text Search ; Text T [; Number Start])

Returns: Number

Level: 1

Constraints: Start >= 0

Semantics: Returns the character position where Search is first found in T, when the search is started from character position Start. The match is case-sensitive, and no wildcards or other instructions are considered in Search. Start is 1 if omitted. Returns an Error if text not found.

Test Cases:

Expression Result Level Comment
=FIND("b";"abcabc") 2 1 Simple FIND().
=FIND("b";"abcabcabc"; 3) 5 1 Start changes the start of the search.
=FIND("b";"ABC";1) Error 1 Matching is case-sensitive.

See also: EXACT, SEARCH

Note: SEARCH is not part of level 1; SEARCH allows more complex matching than simple case-sensitive matching.

TBD: Lotus 1-2-3v9's start-number is not optional; should start be non-optional for level 1? Also, it starts counting from 0, not 1, for string positions.

LEFT

Summary: Return a selected number of text characters from the left.

Syntax: LEFT(Text T [; Number Length])

Returns: Text

Level: 1

Constraints: Length >= 0

Semantics: Returns the Length number of characters of text T, starting from the left. If Length is omitted, it defaults to 1. If T has fewer than Length characters, it returns T. This means that if T is an empty string (which has length 0) or the parameter Length is 0, LEFT() will always return an empty string.

Test Cases:

Expression Result Level Comment
=LEFT("Hello";2) "He" 1 Simple LEFT().
=LEFT("Hello") "H" 1 Length defaults to 1.
=LEFT("Hello";20) "Hello" 1 If Length is longer than T, returns T.
=LEFT("Hello";0) "" 2 If Length 0, returns empty string.
=LEFT("";4) "" 1 Given an empty string, always returns empty string.
=LEN(LEFT("xxx";-0.5)) Error 1 It makes no sense to request a negative number of characters.
=LEN(LEFT("xxx";2^32)) 3 2 Needs to return empty string at level > 1.

See also: RIGHT


Note: LEFT("xxx";2^32) should produce "xxx", but it's not an unusual bug to do otherwise.

Future spreadsheets may implement Text strings even larger, and we don't want to prevent that.

LEN

Summary: Return the length, in characters, of given text

Syntax: LEN( Text T)

Returns: Number

Level: 1

Constraints: None.

Semantics: Computes number of characters (not the number of bytes) in T. Implementations that support ISO 10646 / Unicode MUST consider any character in the Basic Multilingual Plane (BMP) basic plane as one character, even if they occupy multiple bytes. (The BMP are the characters numbered 0 through 65535 inclusive). Implementations SHOULD consider any character not in the BMP as one character as well.

Test Cases:

Expression Result Level Comment
=LEN("Hi There") 8 1 Space is a character.
=LEN("") 0 1 Empty string has zero characters.
=LEN(55) 2 1 Numbers are automatically converted.

See also: TEXT, ISTEXT, LEFT, RIGHT

Note: Many implementations use UTF-8 internally for text representations. This is fine, but they must be careful to avoid mistakenly using byte length as the length.

Rationale: The emphasis is placed on characters, not bytes, because different implementations will use a different number of characters to represent a given text value. In contrast, counting characters is more portable, and it's closer to the user's expectations. Some implementations use a 16-bit representation for characters. Implementations using 16 bits per character should consider any character outside the BMP as a single character, but as a concession to buggy implementations, this is merely specified as a 'should'.

LOWER

Summary: Return input string, but with all uppercase letters converted to lowercase letters.

Syntax: LOWER(Text T )

Returns: Text

Level: 1

Constraints: None

Semantics: Return input string, but with all uppercase letters converted to lowercase letters. As with most functions, it is side-effect free (it does not modify the source values). All implementations convert A-Z to a-z.

Test Cases:

Expression Result Level Comment
=LOWER("HELLObc7") "hellobc7" 1 Uppercase converted to lowercase; other characters just copied to result.

See also: UPPER, PROPER

Note: Ideally, implementations could handle any ISO 10646 characters. However, it is not clear that they actually handle all international characters. Many character sets do not really have an "uppercase" and "lowercase" set, in which case this function just copies the character to the Result.

MID

Summary: Returns extracted text, given an original text, starting position, and length.

Syntax: MID(Text T ; Number Start ; Number Len)

Returns: Text

Level: 1

Constraints: Start >= 1, Len > 0.

Semantics: Returns the characters from T, starting at character position Start, for up to Len characters. If there are less than Len characters starting at start, it returns as many characters as it can beginning with Start. In particular, if Start > LEN(T), it returns "". MID(T;1;Len) produces the same results as LEFT(T;Len).

Test Cases:

Expression Result Level Comment
=MID("123456789";5;3) "567" 1 Simple use of MID.
=MID("123456789";20;3) "" 1 If Start is beyond string, return empty string.

See also: LEFT, RIGHT, REPLACE, SUBSTITUTE

PROPER

Summary: Return the input string with the first letter of each word converted to an uppercase letter and the rest of the letters in the word converted to lowercase.

Syntax: PROPER(Text T)

Returns: Text

Level: 1

Constraints: None

Semantics: Return input string with the first letter of each word as an uppercase letter and the remaining letters in each word as lowercase letters. As with most functions, it is side-effect free (it does not modify the source values).

Test Cases:

Expression Result Level Comment
=PROPER("hello there") "Hello There" 1 The first letter is uppercase and the following letter are lowercase.
=PROPER("HELLO THERE") "Hello There" 1 The first letter is uppercase and the following letter are lowercase.
=PROPER("HELLO.THERE") "Hello.There" 1 Words are separated by spaces, punctuation, etc.

TBD: What exactly can separate words? In Gnumeric it seems to be any non-letter.

Rationale: The syntax requires exactly one parameter. An early version of this specification had curly braces around the parameter, which would say that it took a list, but OOo2, Gnumeric, and KSpread at least do not allow this (and Excel probably doesn't either), so presumably this was just an accident.

See also: LOWER, UPPER

Note: Ideally, implementations could handle any ISO 10646 character. However, it is not clear that they actually handle all international characters. Many character sets do not really have an "uppercase" and "lowercase" set, in which case this function just copies the character to the Result.

REPLACE

Summary: Returns text where an old text is substituted with a new text.

Syntax: REPLACE(Text T ; Number Start ; Number Len ; Text New)

Returns: Text

Level: 1

Constraints: Start >= 1.

Semantics: Returns text T, but remove the characters starting at character position Start for Len characters, and instead replace them with New. Character positions defined by Start begin at 1 (for the leftmost character). If Level is more than 1, if Len=0, the text New is inserted before character position Start, and all the text before and after Start is retained.

REPLACE(T;Start;Len;New) is the same as:

LEFT(T;Start-1) & New & MID(T; Start+Len; LEN(T)))


Test Cases:

Expression Result Level Comment
=REPLACE("123456789";5;3;"Q") "1234Q89" 1 Replacement text may have different length.
=REPLACE("123456789";5;0;"Q") "1234Q56789" 2 If Len=0, 0 characters removed.

See also: LEFT, RIGHT, MID, SUBSTITUTE

REPT

Summary: Return text repeated Count times.

Syntax: T( Text T ; Number Count )

Returns: Text

Level: 1

Constraints: None

Semantics: Returns text T repeated INT(Count) number of times; if INT(Count) is zero, an empty string is returned. If INT(Count) < 0, the result is either Error or an empty string (unspecified which).

Test Cases:

Expression Result Level Comment
=REPT("X";3) "XXX" 1 Simple REPT.
=REPT("XY";2) "XYXY" 1 Repeated text can have length > 0.
=REPT("X";2.9) "XX" 1 INT(Count) used if count is a fraction.
=REPT("X";0) "" 1 If Count is zero, empty string.

See also:

TBD: What if Count <0? KSpread 1.4.2 returns empty string. Gnumeric and OOo2 return Error. Returning empty string probably makes things easier, but should we require that?

Note: This function is sometimes used as a trivial kind of text graphic.

RIGHT

Summary: Return a selected number of text characters from the right.

Syntax: RIGHT(Text T [; Number Length])

Returns: Text

Level: 1

Constraints: Length >= 0

Semantics: Returns the Length number of characters of text T, starting from the right. If Length is omitted, it defaults to 1. If T has fewer than Length characters, it returns T. This means that if T is an empty string (which has length 0) or the parameter Length is 0, RIGHT() will always return an empty string.

Test Cases:

Expression Result Level Comment
=RIGHT("Hello";2) "lo" 1 Simple RIGHT().
=RIGHT("Hello") "o" 1 Length defaults to 1.
=RIGHT("Hello";20) "Hello" 1 If Length is longer than T, returns T.
=RIGHT("Hello";0) "" 2 If Length 0, returns empty string.
=RIGHT("";4) "" 1 Given an empty string, always returns empty string.

See also: LEFT

SUBSTITUTE

Summary: Returns text where an old text is substituted with a new text.

Syntax: SUBSTITUTE(Text T ; Text Old ; Text New [; Number Which ])

Returns: Text

Level: 1

Constraints: Which >= 1

Semantics: Returns text T, but with text Old replaced by text New (when searching from the left). If Which is omitted, every occurrence of Old is replaced with New; if Which is provided, only that occurrance of Old is replaced by New (starting the count from 1). If there is no match, or if Old has length 0, T is returned unchanged. Note that Old and New may have different lengths. If Which < 1, returns Error.

Test Cases:

Expression Result Level Comment
=SUBSTITUTE("121212";"2";"ab") "1ab1ab1ab" 1 Without Which, all replaced.
=SUBSTITUTE("121212";"2";"ab";2) "121ab12" 1 Which starts counting from 1.
=SUBSTITUTE("Hello";"x";"ab") "Hello" 1 If not found, returns unchanged.
=SUBSTITUTE("";"";"ab") "" 2 Returns T if Old is Length 0.

See also: LEFT, RIGHT, MID, REPLACE

T

Summary: Return the text (if text), else return 0-length Text value

Syntax: T( Any X )

Returns: Text

Level: 1

Constraints: None

Semantics: If X is a reference, it is first dereferenced to a scalar. Then its type is examined; if it is of type Text, it is returned, else a null string (Text value of zero length) is returned. This is not a type-conversion function; T(5) produces an empty string, not "5".

Test Cases:

Expression Result Level Comment
=T("HI") "HI" 1 T does not change text.
=T([.B3]) "7" 1 References transformed into what they reference.
=T(5) "" 1 Non-text converted into null string.

See also: N

Note: Many public documents inadequately describe this function; in particular, few seem to describe exactly what happens when something other than type text is provided to this function. It is known that Excel and OOo2 both produce an empty string for T(5). In some implementations this function is named S, for string.

TRIM

Summary: Remove leading and trailing spaces, and replace all internal multiple spaces with a single space.

Syntax: TRIM(Text T)

Returns: Text

Level: 1

Constraints: None.

Semantics: Takes T and removes all leading and trailing space. Any other sequence of 2 or more spaces is replaced with a single space.

Test Cases:

Expression Result Level Comment
=TRIM(" HI ") "HI" 1 Simple TRIM().
=LEN(TRIM("H" & " " & " " & "I")) 3 1 Multiple spaces become 1 space internally.

See also: LEFT, RIGHT

Note: The following implementations are known to turn multiple internal spaces into a single space: Microsoft Excel 2002, OOo2, Gnumeric, KSpread. If there are implementations which don't, this could be moved to a higher level, but currently there's no reason to believe that this is necessary.


TBD: Is only character 32 meant? What about other "blank" characters, like linefeed, formfeed, etc.?

UPPER

Summary: Return input string, but with all lowercase letters converted to uppercase letters.

Syntax: UPPER(Text T )

Returns: Text

Level: 1

Constraints: None

Semantics: Return input string, but with all lowercase letters converted to uppercase letters. As with most functions, it is side-effect free (it does not modify the source values). All implementations convert a-z to A-Z.

Test Cases:

Expression Result Level Comment
=UPPER("Habc7") "HABC7" 1 Lowercase converted to upper case; other characters just copied to result.

See also: LOWER, PROPER

Note: Ideally, implementations could handle any ISO 10646 character and convert lowercase letters into uppercase letters. However, it is not clear that they actually handle all international characters. Many character sets do not really have an "uppercase" and "lowercase" set, in which case this function just copies the character to the Result.

Relationship with OpenDocument

OpenFormula is designed to smoothly fit with OpenDocument; the following provides specific information on what is required of implementations that implement, and documents that use, both OpenDocument and OpenFormula.

An OpenDocument document implementing the strict schema and this specification MAY NOT use any extensions in formulas (including additional functions) beyond the semantics and syntax described in this specification, since there is no guarantee that other receiving systems would be able to process it.

The OpenDocument attributes that use formulas at the time of this writing are text:formula, text:condition, table:formula, draw:formula, and anim:formula. Note in particular that:

  • In text:formula and text:condition, the formula context MUST include all variables and fields' current values, which can be referenced by name as a formula variable.
  • In table:formula (which is used for spreadsheets and word processing tables) it is always legal to include a table cell reference, since there is at least one table. Not all syntactically legal table cell references are actually legal, e.g., a named table might not exist. The formula context MUST include all named ranges that apply to that table.
  • In draw:formula a "modifier reference" (using prefixed "$") MUST be legal. (FIXME: should this be supported at all?)
  • In anim:formula the context MUST always include the following: e, x, y, width, height, as described in the OpenDocument specification. (FIXME: should this be supported at all?)

OpenDocument attributes that use formulas MUST begin with a formula specification selector that identifies which formula specification to use. This selector MUST be either a namespace (one or more letters) followed by a colon, or the equal sign ("="). The prefixed "=" is equivalent to "formula:". OpenDocument documents SHOULD define the formula namespace; if they do not, the default value for the formula namespace is version 1.0 of OpenFormula (this specification). The "formula:" namespace MUST be some version of this specification.

The formula specification selector is NOT considered part of the formula, but users will typically see the formula specification selector as the first character(s) of the formula. Implementations MAY accept multiple formula syntaxes, and they MAY accept various extensions to the default formula syntax. However, all implementations that claim to accept OpenFormula formulas MUST accept the formula syntax and semantics as described in this specification.

Rationale: This provides additional flexibility so that other languages can be used when desired. The selector is not considered part of OpenFormula itself, so that specifications that use OpenFormula can choose how to best include this selector.

  • It is expected that OpenFormula formulas will be preceded by a "namespace selector."

    For example, OpenDocument implementations' formulas can begin with a namespace prefix to identify the formula specification language to use, or they can begin with an "=" which indicates the OpenFormula specification's namespace. Allowing namespace specification in formulas permits great flexibility and permits upgrading to better formats if OpenFormula is found to be inadequate for that use. However, if there is no "default" formula specification format implemented by all mplementations, no two implementations will necessarily be able to exchange formulas. Since this default is expected to be widely used, a simple "=" prefix makes it easy to use the common case, while permitting great flexibility.
  • OpenFormula is defined in a separate document from the OpenDocument standard.

    This makes it possible to define upgrade the formula language separately from the OpenDocument specification, and allows other implementations to use OpenFormula independently of OpenDocument if they wish to do so.
  • OpenFormula's format does not use XML, but instead uses the traditional mathematical infix notation.

    Formulas used for calculation have not traditionally been written using XML. Instead, computer-calculatable formulas are traditionally written using a modified mathematical infix notation such as "value*5+2". This non-XML format is much easier for humans to understand (because it is the form people have used for decades), is shorter than XML formats, and there are many tools specifically designed to handle these formats. Note that formulas used primarily for display are often written using a nested notation, such as MathML. However, formulas not used for calculation are not the primary subject of this specification.
  • OpenFormula's semantics are intentionally based on previous spreadsheet programs, such as VisiCalc, Lotus 1-2-3, and Microsoft Excel.

    This was done to simplify transition of spreadsheet documents to this format.
  • OpenFormula unifies the various formula formats defined in OpenDocument.

    There are many disadvantages to having several similar yet subtly different formula syntaxes, in particular, they tend to increase implementation size (since multiple calculation engines and user interfaces must be developed), and they tend to increase user confusion. As compared with "Open Document Format for Office Applications (OpenDocument) 1.0 Committee Draft 2, 21 Dec 2004", the latest version available at the time of writing, here are the OpenDocument fields that use formulas:
    • Attribute text:formula defined in 6.7.6 (Formula), and noted in 6.3.2, 6.3.5, 6.3.9, 6.3.10. Note that text:formula is deprecated when inside a table cell (use table:formula instead), though it's allowed; see 6.6.11. Note that variables and most variable fields have a current value, as described in 6.7.1.
    • Attribute text:condition is defined as being the same as text:formula in 6.3.5.
    • Attribute table:formula is defined in 8.1.3 (Table Cell) - this is used for spreadsheets.
    • Attribute table:condition is described in 8.5.3 (Table cell content validations); this format allows certain conditions, many of which can end up calling functions. For example, "cell-content() > [.A5]" would be valid table:condition.
    • Named expressions as described in section 8.5.5 define cell ranges that may be used as formula variables, and can also include a table:expression that is an OpenFormula formula (in fact, since they cannot begin with "=", they are always in the "formula:" namespace of the document).
    • This may also be used for attribute draw:formula as defined in 9.5.5 (Enhanced Geometry "#147; Equation). It's mentioned in 9.5.3 (Enhanced Geometry - Path Attributes) for enhanced path, text areas, and glue points, and in 9.5.6 too. This is defined in some detail; note that it includes requirements for a number of functions (like sqrt()), and note also that "," is used as the function parameter separator instead of ";". It also permits a "modifier reference" using prefixed "$".
    • This may also be used for attribute anim:formula as defined in 13.3.2., in a way similar to (but not equal to) the draw formulas. Section 9.8.2 (Document Dependent SMIL Animation Attribute Values) subsection "formula" defines additional variables for use in anim:formula: e, x, y, width, height. Again, "," is used as the function parameter separator instead of ";".
    • Note that attribute draw:equation is also defined in 9.5.5; this is outside the scope of this document.
    • Section 8.9 (Consolidation) and 8.6.10 (Subtotal Field) notes the use of table:function for consolidating tables and creating subtotals, and gives a fixed list of allowed functions: auto, average, count,countnums, max, min, product, stdev, stdevp, sum, var, and varp. This is outside the scope of this document, though such functions might be implemented using the same code that implements OpenFormula, and implementations MAY allow other OpenFormula functions to be used in table:function.
    • Section 8.7 describes table filters; these are similar to formulas but are outside the scope of this document (perhaps in future versions they will be brought in-scope, by simply defining them as an OpenFormula formula with an implied "cell-contents()" prefix and permitting cell-contents()).
    • Note that section 6.3.5 gives an example of how to use a namespace to provide a formula that is not in the OpenFormula namespace:
 text:formula='ooo-w:[addressbook file.address.FIRSTNAME] == "Julie"'

TBD: This specification doesn't define how to create new functions, though it'd be trivial to do -- just define a way to retrieve parameters, and then use named formulas. Should this be added? If so, should it be added in a way that makes it easy to do the above, or should a function like PARAMETER(n) be added?

Miscellaneous

Identifying Errors

Spreadsheets can contain errors. If spreadsheets are used for extremely critical decisions, it is wise to use tools that try to detect such errors (some are built into some spreadsheet implementations; other tools are external). More information about errors in spreadsheets can be found in locations such as the article "The Risky Business of Spreadsheet Errors" by Ivars Peterson, the European Spreadsheet Risks Interest Group, and Ray Panko's Spreadsheet Research (SSR) page. This specification does not directly implement a defect detection tool, but it is key to enabling and improving such tools. By publicly defining a common format for spreadsheet formulas, and by publicly defining their semantics, this specification makes it much easier to build defect detection tools for spreadsheets. The specification even points out potential issues (e.g., precedence of ^ vs. unary -), making it easier to build tools. Thus, this specification can be viewed as a key step in countering defects in spreadsheets.

Lotus 1-2-3 Function Names

Some Lotus 1-2-3 functions are given different names in this specification. In general, OpenFormula uses Excel-like names instead. Note that no "@" symbol precedes function names, and the Lotus 1-2-3 "@@" becomes INDIRECT. Logical operators are functions (AND, OR, and NOT). In addition, the following Lotus 1-2-3 names have the following OpenFormula names (which generally follows the "Excel branch" naming conventions):

Lotus 1-2-3 OpenFormula
@ INDIRECT
AVG AVERAGEA
CELLPOINTER CELL
COUNT COUNTA
CTERM NPER, but different parameters
DAVG DAVERAGE
D360 DAYS360
DCOUNT DCOUNTA
DSTD DSTDP
DSTDS DSTDEV
DVAR DVARP
DVARS DVAR
INT TRUNC
ISNUMBER ISNONTEXT
ISRANGE ISREF
ISSTRING ISTEXT
LENGTH LEN
MAX MAXA
MIN MINA
PMT PMT - different parameter order
PUREAVG AVERAGE
PURECOUNT COUNT
PUREMAX MAX
PUREMIN MIN
PURESTD STDEVP
PURESTDS STDEV
PUREVAR VARP
PUREVARS VAR
PV PV - different parameter order
RAND RAND - different each recalculation
RATE RATE - different parameter order
REPEAT REPT
S T
STD STDEVPA
STDDS STDEVA
STRING FIXED
TERM NPER - different parameter order
VAR VARPA
VARS VARA

Guidelines for Users (Non-normative)

This section notes issues that users should be aware of.

TBD: This section needs a major cleanup, but it has useful information I don't want to lose.

David A. Wheeler noted in a February 25, 2005 posting to the OpenDocument mailing list that OpenOffice.org 1.1.3's semantics are DIFFERENT than Microsoft Excel's. This means that syntactically converted spreadsheets can produce different answers in OOo, compared to Excel. It's not a matter of "right" or "wrong"; what OOo is doing is reasonable. But it's different than Excel, producing different results. From the point-of-view of compatibility, they are compatibility bugs, IF you believe that Excel semantics need to be matched to enable easy transition and interoperation. The test Excel spreadsheet at http://www.dwheeler.com/openformula/testss.xls shows some of the differences. For example:

  • In Excel, Logical values (TRUE, FALSE) are not considered numbers; they aren't counted by COUNT(). They ARE in OOo.
  • In Excel, SUM(...) is NOT the same as num + num + num ...; SUM _only_ looks at values of type "numeric", ignoring strings and logical values. In OOo, SUM(...) also ignores strings/text, but OOo converts any logical values to 0 and 1, so TRUE() adds one in OOo but not in Excel.
  • In Excel, ISNUMBER(TRUE()) is false. In OOo, ISNUMBER(TRUE()) is true.
  • ROMAN() works slightly differently when given TRUE() and FALSE() as format parameters.
  • OOo does not support constant arrays, or cell concatenation.

If Excel's are the "right" semantics, then I can document that "+" requires certain operations, etc., etc. But if that's not the case, then I don't know what to document. For functions, it's easy to define one function name as doing one thing, another function name that does the other, and then let Excel converters match the names up... but that requires implementations of those functions, and agreement on the names.

Appendix: User Input-Output (Non-normative)

Recommendations to implementors

User interfaces of implementations MAY choose to accept and display formulas differently from how they are exchanged in this data format. It is quite common for implementations to accept and display cell addresses without square brackets, to accept and display numbers using the format of the current locale, and to accept commas instead of semicolons for parameter separators. Implementations may also display function names differently (e.g., translated), use parameters in a different order, or use a completely different display method.

It is RECOMMENDED that implementations allow users to directly enter and view formulas in the OpenFormula format as part of the application. Indeed, the OpenFormula specification has been designed to permit direct human entry of these formulas. It is RECOMMENDED that implementations accept numbers that meet the "C" locale requirements (as well as the current locale's), accept bracketed cell addresses for the current table, and accept the semicolon as a function parameter separator.

It is also RECOMMENDED that implementation user interfaces correct likely mistakes with a dialogue. In particular, if an implementation normally uses semicolons as a parameter separators, it is RECOMMENDED that implementations attempt to detect if a user is using commas as function parameter separators instead of semicolons, and recommend a plausible correction.

In the exchange format, "!" is the cell intersection operation; in Excel this is entered and displayed as a space.

Note that it is RECOMMENDED that spreadsheet implementations accept user input of a cell address without the "[..]" markings, and automatically determine if the user intended to reference a named range or a cell address. Excel simply forbids creating a rangename that could also be a cell address; in many ways this is the simplest solution. An implementation could simply search for a named range, and if there is no such range and the entry could be a cell address, the implementation automatically promotes it to a cell address. This means that if a table has a range named "Q1" (meaning "first quarter"), any future formula where "Q1" is entered will refer to the named range; referring to cell address Q1 will require being surrounded by square brackets.

Rationale: It's normal for spreadsheet users to simply type in cell addresses and expect them to work, while the [] format is rarely used directly (even OpenOffice.org 1.1.1, which stores formulas using [..], doesn't accept user input using []). However, there should be a way for users to disambiguate between named ranges and cell addresses. By allowing direct user entry and display of the OpenFormula format, users are more likely to be able understand their formulas, users can more easily move between applications, and developers can more easily identify and debug any interoperability problems. Note that the OpenFormula format is highly accessible, aiding those with disabilities.

Excel and OpenOffice.org surround formulas that are written to a range of cells with "{" and "}" when they are displayed. These characters are not included in the data interchange format.

Examples

Here are some examples of OpenFormula formulas -- what is exchanged, as well as user displays for two common spreadsheet programs. Note that in the exchange format, cell references are surrounded by "[..]" (which simplifies parsing and disambiguates them from formula variables), but typical implementations do not display or require the square brackets to be entered. Function parameters are separated with ";". For simplicity, the leading "=" is displayed in all three formats, though technically that's just a selector for OpenFormula:

OpenFormula Exchange Formula Microsoft Excel Display OpenOffice.org 1.1.3 Display (en_US Locale)

Comments

=1+2*3^4 =1+2*3^4 =1+2*3^4

Arithmetic precedence; the answer must be 163

=salary*2 =salary*2 =salary*2

Variables permitted

=[.A2]*3.5 =A2*3.5 =A2*3.5

Exchange format surrounds cell references with "[".."]", and row/column address is preceded by ".". C locale used for numbers

=sum([.A1:.A5]) =sum(A1:A5) =sum(A1:A5)

":" creates a range

=sum([.A1]; [.A2]; [.A3]) =sum(A1,A2,A3) =sum(A1;A2;A3)

Parameters are ";" separated (Excel uses ",")

=MMULT(Matrix1; Matrix2) {=MMULT(Matrix1, Matrix2)} {=MMULT(Matrix1; Matrix2)}

Array results not surrounded by "{..}"; variable names

=[$Sheet1.B2] =Sheet1!B2 =$Sheet1.B2

Sheet references (in same workbook)

=['[file:///C: file:///C:] /dwheeler/ misc/ testssold.xls' #$Sheet1.B3] ='C:\dwheeler\ misc\ [testssold.xls] Sheet1'!B3 ='[file:///C: file:///C:] /dwheeler/ misc/ testssold.xls' #$Sheet1.B3

Referencing another file

=SUM([.B2:.B3]! [.B1:.B2]) =SUM(B2:B3 B1:B2) =SUM(B2:B3! B1:B2)

"!" is cell intersection in OpenFormula (Excel uses space character)

=SUM(B1:B3~ B2:B4) =SUM((B1:B3, B2:B4)) =SUM(B1:B3~ B2:B4)

(Optional) Cell concatenation. This is NOT union - B2 will be added twice. ~ proposed.

=AREAS(B1:B3~ B2:B4) =AREAS((B1:B3, B2:B4)) =AREAS(B1:B3~ B2:B4)

(Optional) Cell concatenation again.

=MDETERM( {2;3|4;5}) =MDETERM( {2,3;4,5}) =MDETERM( {2;3|4;5}) (Optional) Constant arrays, proposed syntax

Appendix: Development of this Specification (Non-normative)

History

  • 2004-11-01: David A. Wheeler recommended that the Open Document's specification of formulas be strengthened to allow exchange of formulas (particularly for spreadsheets), including a grammar to eliminate ambiguities. On 2005-01-04 he posted an updated version of his proposal.
  • 2005-01-14: This proposal was formally considered and rejected by the OpenDocument technical committee. The rejection was explained as follows: "A comment was submitted concerning the [inclusion] of a grammar for spreadsheet formulas which conforming implementations should support. While we think that having interoperability on that level would be of great benefit to users, we do not belive that this is in the scope of the current specification. Especially since it is not specifically related to the actual XML format the specification describes. The TC will work on a solution concerning the documentation of interoperabilty standards that go beyond what is defined in the specification. The submitted grammar should be published on the TC web pages." David A. Wheeler noted that without such a specification of the formulas, implementations would not have a standard way to exchange spreadsheet formulas, rendering the entire specification incomplete with respect to spreadsheets.
  • 2005-02-04: James Clark reported on his independent discovery of this shortcoming in the Open Document format. He said he searched "for the section where the Calc formula syntax is specified, and I search, and I search...and I find things like [string]... I really hope I'm missing something, because, frankly, I'm speechless. You cannot be serious. You have virtually zero interoperability for spreadsheet documents. To put this spec out as is would be a bit like putting out the XSLT spec without doing the XPath spec. How useful would that be? It is essential that in all contexts that allow expressions the spec precisely define the syntax and semantics of the allowed expressions. Using a namespace prefix is a nice idea, but it needs to be required (and enforced by the schema), and then you need to define a namespace that contains at least the basic functionality needed by a spreadsheet, and preferably everything supported by OOo 2.0, and precisely define the syntax and semantics of expressions associated with that namespace. OpenDocument has the potential to be [extraordinarily] valuable and important standard. I urge you not to throw away a huge part of that potential by leaving such a gaping hole in your specification."
  • 2005-02-06: Claus Agerskov posted that "it has come to my knowledge that OpenDocument doesn't specify the [formulas] used in spreadsheets so every spreadsheet vendor can implement formulars in their own way without being an open standard. This way a vendor can create lock-in to their spreadsheets. Please [delay] announcing OpenDocument 1.0 until the format for spreadsheet [formulas is] specified."
  • 2005-02-26: David A. Wheeler released a draft of this OpenFormula specification to allay these concerns.
  • 2005-09-20: Marco Fioretti's article "OpenDocument office suites lack formula compatibility" was published on NewsForge.
  • 2005-10-15: Wheeler started up an informal group to work on completing this OpenFormula specification.
  • 2005-12-18: Jean Hollis Weber's article "What is OpenFormula and why should you care?" published by O'Reilly.

Rationale for General Approach

In general, OpenFormula strives to follow prior art and practice. Winer's Law of the Internet (by Dave Winer) claims that "Productive open work will only result in standards as long as the parties involved strive to follow prior art in every way possible. Gratuitous innovation is when the standardization process ends, and usually that happens quickly." In practice, that means that for syntax inside OpenDocument, existing spreadsheets that use OpenDocument (such as OpenOffice, StarOffice, and KOffice) tend to be the model. For semantics, Microsoft Office and other implementations (including the spreadsheets previously noted) are considered.

Style Guide

This section includes recommendations to contributors that don't belong in "Document Notation and Conventions."

Use lots of section headings and subheadings. It makes editing and identifying pieces easier. Indented paragraphs are easily lost in translation between file formats, and are often misunderstood by readers.

Use MediaWiki templates for the following:

  • {{Rationale|...text here...}} : Rationale explaining why something is required the way it is.
  • {{Note|...text here...}} : Note explaining what current implementations do, suggestions on how to implement this, etc.
  • {{Discussion|...text here...}} : Discussion on various alternatives where a decision has not yet been made. It's best to identify major points here, so that people who print out the whole document can read them. However, these aren't good places for lengthy discussion texts. For lengthy discussions, place the more detailed material in the corresponding Wiki "Discussion" pages (or mailing list if you believe it needs wide discussion), and then briefly note the existence, topic, and location of discussion in {{Discussion|...}} template text. That way, people who just read the document will know about the discussion, without having to print the details of every discussion when they print the document.
  • {{TBD|...text here...}} : To be done; this identifies issues that are not yet resolved. This was previously marked with "???", but "???" is harder to see, and harder to remove automatically.

In MediaWiki templates the "=" cannot be used in argument text. Thus, an "=" sign in the main text of a template must be represented as "&#61;" unless it's a URL; in a URL, use %3D to represent an equal sign. The text can be many paragraphs; the ending curly braces do not need to be on the same line.

Rationale: Templates are used for this material so that it will be easier to print versions of the document that omit this text.

Useful Information Sources


Information used to develop this specification included the following:

Tangentially related:

Contributors

Rules for Contributors

In short, contributors to the OpenFormula specification must agree to:

  1. support the fundamental goal (an open standard anyone can implement in perpetuity, both open source and proprietary)
  2. work as reasonably possible to help meet the secondary goals
  3. strive to respectfully reach a rough consensus during the specification's development, and
  4. act ethically and legally.

All contributors automatically grant David A. Wheeler a non-exclusive copyright to their contributions, who will negotiate with groups such as standards bodies to implement the above. All contributors also retain a copyright to their own work, which they can use however they wish. The specification is released under the Creative Commons Attribution-ShareAlike, GPL and GFDL; see License Terms for more information.

Details below. From here on in this section, "you" means any contributor.

Fundamental goal

The fundamental goal of this OpenFormula Development Group is to "develop a specification for exchanging formulas (at least spreadsheet formulas for OpenDocument) that is implementable by ALL proprietary and open source software projects, using ANY and ALL of the common licenses, business models, and development models for them; this must be true in perpetuity for itself and any formally approved derivatives." Specifically, the specification MUST be implementable using licenses such as Microsoft's Office 2003 license, the Corel Word Perfect Office 12 Home Edition license, the GNU GPL, LGPL, MIT, BSD-new, and the MPL. Terms that are incompatible with the GNU GPL version 2 are not acceptable; terms that require the GNU GPL version 2 are also not acceptable.

People who cannot accept the fundamental goal, stated above, are not allowed to contribute.

This means you may not contribute anything encumbered by a potentially enforceable patent, to the best of your knowledge and ability. If you know of a patent (including third parties') and you believe it's possibly enforceable, you must let us know so we can avoid it. If your organization has a patent on something, they need to allow royalty-free implementation with all terms necessary to meet the fundamental goal. We understand that individuals not representing their company can only speak for themselves; that's fine. Frankly, formulas are so old it's hard to imagine anyone having or getting a patent on this kind of material, but we need to avoid the problem (and we want any later standards body to avoid patented components in later revisions).

Here are other implications:

  1. Implementations must be able to have transferable and sub-licensable licenses, and they must be allowed to implement a subset (there may be a requirement that partial implementations can't claim they implement OpenFormula in such cases, like Adobe's license for PDF). This is because OSS development models effectively require these conditions.
  2. It must be possible for implementations to not release their source code, e.g., there cannot be a "header" file that is licensed under only the GPL. This is because typical proprietary development models require this.
  3. Other legal tricks that might "lock out" an implementor are also unacceptable.
  4. Standards bodies often create test suites. These can be valuable, but they must not prevent implementations from using the test suites, or claiming conformance when they do conform. There is, in fact, a test suite built into this specification itself. We intend to require that any standards body that is granted the rights to control this specification must allow anyone to freely test against the test suite defined in the specification. What's more, any open source software implementation, or proprietary program that is made free to end-users, must be allowed to test against any test suite and to publicly claim conformance to the test suite (if true), without fee or other legal restriction. Such implementation developers may not have the economic means to pay for conformance testing, and the goal is maximum and widespread conformance, not maximum profit for standards bodies.

We're aware that it's possible for someone to be in the process of getting a patent and no one else would know it. We're also aware that the patent office routinely grants absurd patents (such as swinging on a swing) and patents for established prior art (that's why it says "potentially enforceable" above).

Rationale: Users want to be able select any implementation; unnecessary constraints are not acceptable. An Economic Basis for Open Standards by Rishab Aiyer Ghosh discusses the need for open standards further. Groklaw summarizes that paper.

Secondary Goals

You must strive to try to meet the secondary goals:

  1. The specification should be publicly accessible, without fee or registration requirements, by electronic means (e.g., a web address). The IETF, W3C, OASIS, Free Standards Group, and others do this; free availability has been shown to be very effective at encouraging use of a specification. That's especially important in this case, because we expect that users may need access to this specification. It's okay if a public standards body has a for-fee reformatted specification, as long as the equivalent technical contents are also freely available (examples include Ada and the Common Criteria).
  2. The specification should be VERY widely vetted, by many implementors.
  3. Eventually maintenance of this specification should be transferred to some vendor-neutral standards body/bodies such as OASIS, W3C, Free Standards Group, IETF, ECMA, or ISO.

We can give on these secondary goals where necessary, but we think it's very undesirable to do so.

Respectfully Reach a Rough Consensus

You agree to respectfully work together with other contributors to reach a rough consensus, following the IETF mantra "rough consensus and running code". Where there are disagreements, you agree to try to identify the root issues, find compromises, and so on. You agree to try to avoid attacks on other contributors, and instead on concentrate on technical merit. You agree to consider any implementors' concerns very seriously, since unimplemented specifications are worthless.

Act Ethically and Legally

You must act ethically and legally. You must work to ensure this specification helps all, not just yourself, and you must reveal any conflicts of interest. You must respect the copyrights and licenses of other works. You must not work to find loopholes in these rules, but instead work to maintain their spirit.

Consequences

If you are significantly disruptive to the development process or egregiously fail to meet the requirements above, you may be removed by the group leader. This step will not be taken lightly, and typically you will be given a warning first. We are delighted to have so many knowledgeable volunteers, and we cannot allow someone to disrupt them.

Trademark

OpenFormula (TM), OpenFormula Specification (TM), OpenFormula Standard (TM), and OpenFormula Project (TM) are trademarks of David A. Wheeler. Mr. Wheeler will grant use of this trademark to any vendor-independent standards body who agrees to completely implement the fundamental and secondary rules above (in perpetuity); see Mr. Wheeler for details.

License Terms

Anyone can implement this specification, without restriction. You do not need to pay any fee to implement it. Your implementation can be Free-Libre / Open Source Software (FLOSS) or proprietary, for example, you MAY release your implementation under the GNU General Public License (GPL) but you DO NOT have to. So, if you are an implementor, please use this specification with our blessing.

Anyone can redistribute unmodified or modified copies of this specification. Releasing unmodified versions requires no special privilege (other than including information about its license terms). Recipients may re-release modified versions of this document under one or more of the following licenses:

  • Creative Commons Attribution-ShareAlike version 2.5 (or greater)
  • GNU General Public License (GPL) version 2 (or greater)
  • GNU Free Documentation License (GFDL) version 1.2 (or greater)

Modifiers also grant a non-exclusive copyright to their modifications of the specification to David A. Wheeler (so that the combined work can be provided to a standards body sometime in the future with different privileges). Unless otherwise stated, any modification is also released under all three of those liceses, but at least one must be selected.

You cannot, however, claim that your version is an official version of OpenFormula, or make representations that make it appear that way. A trademark has been asserted on OpenFormula to prevent this.

We anticipate that standards bodies will want additional terms. If you are involved in such a body, please contact David A. Wheeler. We want to transition this to a formal standards body, but in a way that ensures that the specification's goals are met.

Also see the Rules for Contributors.

Bibliography (Non-normative)

  • Blattner, Patrick, Laurie, Ulrich, Ken Cook, and Timothy Dyck (1999). Special Edition Using Microsoft Excel. Que. Indianapolis, Indiana. ISBN 0-7897-1729-8.
  • Simon, Jinjer (2000). Excel 2000 in a Nutshell. O'Reilly.
  • Walden, Jeff (1986). File Formats for Popular PC Software: A Programmer's Reference. Wiley Press Book, John Wiley & Sons, Inc. NY, NY.
  • Walkenbach, John (1999). Excel 2000 Formulas. M&T Books, an imprint of IDG Books Worldwide, Inc. ISBN 0-7645-4609-0.
  • Walkenbach, John (2004). Excel 2003 Formulas. Wiley Publishing, Inc. Indianapolis, Indiana. ISBN 0-7645-4073-4.