11 changes: 10 additions & 1 deletion index.rst
@@ -60,6 +60,15 @@ In EditorConfig:
settings based on the key-value pairs.
- "Editors" permit editing files, and use plugins to update settings for
files being edited.
- The words "tab" and "hard tab" are interchangeable and represent the
character defined by the Unicode HT/TAB symbol (U+0009).
- The word "space" is the Unicode character defined by the Unicode Space/SP symbol (U+0020).
Member

What about other encodings?

Member Author

That is a good concern. However, we support the following (and we say that in the spec) encodings:

  • UTF-8, including the variant with a byte order mark
  • UTF-16, both endiannesses
  • Latin-1

All of those map onto Unicode code points; even UTF-16, which is not ASCII-compatible, still encodes Unicode code points. I think we can agree that practically any encoding you can imagine that appeared in the last 30-35 years implements Unicode.

The alternative is that we will not be able to reliably specify what exactly we mean by the words space and tab.
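
As a rough illustration of the point above, the Python sketch below encodes U+0020 and U+0009 in each of the listed encodings and checks that every byte sequence decodes back to the same code point. The codec names are Python's standard codec names and `encoded_forms` is a hypothetical helper; neither is defined by the EditorConfig spec.

```python
# Illustrative only: these are Python codec names, chosen to mirror the
# encodings listed in the comment above.
CODECS = ["utf-8", "utf-8-sig", "utf-16-le", "utf-16-be", "latin-1"]


def encoded_forms(char):
    """Return {codec: bytes} for a single character (hypothetical helper)."""
    return {codec: char.encode(codec) for codec in CODECS}


space_forms = encoded_forms("\u0020")
tab_forms = encoded_forms("\u0009")

for codec in CODECS:
    # The bytes differ per encoding, but each decodes to the same code point.
    assert space_forms[codec].decode(codec) == " "
    assert tab_forms[codec].decode(codec) == "\t"
    print(codec, space_forms[codec], tab_forms[codec])
```

The byte sequences differ between encodings, while the code point they represent stays the same, which is what a definition in terms of U+0020 and U+0009 relies on.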

Member

@xuhdev, May 5, 2025

How about:

Suggested change
- The word "space" is the Unicode character defined by the Unicode Space/SP symbol (U+0020).
- The word "space" refers to the character corresponding to the Unicode Space/SP symbol (U+0020) in any encoding.

> ... we support the following (and we say that in the spec) encodings: ...

A user can choose to not specify an encoding, in which case EditorConfig disregards any encoding settings.

Member

Another way is to not define it and leave it to mean what it ordinarily means. One example is from Python spec, which does not require a particular source code encoding: https://docs.python.org/3/reference/lexical_analysis.html#blank-lines

> A logical line that contains only spaces, tabs, formfeeds and possibly a comment, is ignored...

The whole text doesn't define space.

This approach may be better here because our "space" is meant to fit in the general context of text, free from any encoding requirement.

Member

I think the general point is that we want to define some words because we want them to have a specific meaning in our context. But we don't need to define every common word if its meaning does not differ from other technical contexts and it does not introduce ambiguity that hinders implementation. A similar general principle comes from law, and I believe it is also a good principle for specs to follow:

> Statutory construction begins with looking at the plain language of the statute to determine its original intent. To determine a statute's original intent, courts first look to the words of the statute and apply their usual and ordinary meanings. https://www.law.cornell.edu/wex/statutory_construction

If we replace "statute" with "spec", and "court" with "implementation", that's exactly how I read specs 😉

Member Author

> Another way is to not define it and leave it to mean what it ordinarily means. One example is from Python spec, which does not require a particular source code encoding: https://docs.python.org/3/reference/lexical_analysis.html#blank-lines

That is not entirely true, actually.

The documentation section you mentioned does not define "space" for the following reason:

> Python reads program text as Unicode code points; the encoding of a source file can be given by an encoding declaration and defaults to UTF-8, see PEP 3120 for details. If the source file cannot be decoded, a SyntaxError is raised.

The Python interpreter assumes UTF-8 as the default encoding when none is specified, so I disagree.

Member

The point is Python does not require UTF-8. If the source is not UTF-8, what does space mean in its spec? The meaning of space doesn't suddenly become ambiguous simply because Python doesn't enumerate what space means in various encodings.

The most important point is that people understand what space and tab mean in this context, and I can hardly imagine any ambiguities. If you really feel the need to define space, I believe my edit suggested above is more appropriate (the original text confuses readers because it seems to suggest that only UTF-8 is supported).

- "Column" is an abstract atomic unit of indentation. A single space, as defined above, is expected
to contribute exactly one column to the indentation of a given line. The number of columns
contributed by a hard tab depends on the configuration pairs defined in the :ref:`supported-pairs` section.
- The term "soft tab" represents a sequence of one or more (1..N) consecutive space characters which,
when considered together, form an indentation level whose length equals the length of a single
hard tab. The length of both a soft tab and a hard tab is measured in columns.

A conforming core or plugin must pass the tests in the
`core-tests repository`_ or `plugin-tests repository`_, respectively.
@@ -332,7 +341,7 @@ section to specify their behavior. Consider the following code snippet:

The ``indent_size`` setting for this code snippet equals 4, because ``indent_size`` means how many columns are required
to indent the next line in relation to previous (if indentation, of course, is applicable for this line). Then the next question
-is *how* this indentation of 4 columns is achieved. It may be 4 consequent spaces/soft tabs,
+is *how* this indentation of 4 columns is achieved. It may be 4 consequent spaces,
a single tab with width equal to 4, or two tabs with width equal to 2.
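
To make the arithmetic in this paragraph concrete, here is a minimal Python sketch that counts the columns contributed by a line's leading whitespace. It is not part of the specification: `indentation_columns` and its `tab_width` parameter are hypothetical names, and the sketch assumes that a space contributes one column and a hard tab contributes exactly `tab_width` columns, without modelling tab-stop alignment.

```python
def indentation_columns(line, tab_width):
    """Count columns of leading whitespace (hypothetical helper).

    Assumption: a space is one column and a hard tab is exactly
    `tab_width` columns; tab-stop alignment is not modelled.
    """
    columns = 0
    for char in line:
        if char == " ":
            columns += 1
        elif char == "\t":
            columns += tab_width
        else:
            # First non-indentation character ends the count.
            break
    return columns


four_spaces = indentation_columns("    x = 1", tab_width=4)
one_tab = indentation_columns("\tx = 1", tab_width=4)
two_tabs = indentation_columns("\t\tx = 1", tab_width=2)
print(four_spaces, one_tab, two_tabs)
```

Under these assumptions all three lines measure 4 columns, matching the three ways of reaching an indentation of 4 described in the paragraph.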

This is when ``indent_style`` comes into picture. It specifies what character should be used **whenever possible** in order to