Dotneteers.net
All for .net, .net for all!

LearnVSXNow! #38 - VS 2010 Editor - Text Coloring Sample Deep Dive

In the previous part of the LearnVSXNow! series I shared my first experiences with the new Visual Studio 2010 SDK CTP. To help you understand new features in VS 2010 editor extensibility I decided to write a deep dive about a few examples. I selected the TextColoringSample application shipped with the CTP as the first example.

At the time of writing this post I can rely only on the information I have found in the help files provided with the VS SDK CTP, so my deep dive may contain wrong definitions or assumptions about certain types. I’d be very happy if you found any such issues and shared them with me.

Text Coloring Sample

This sample program demonstrates how to provide custom formatting for words in a text document. The sample customizes only the words “this” and “body”; their text color is set to red.

The editor extensibility concept behind this kind of coloring is called “classification”. Classification means that we can logically classify the content of an editor (in this case the text behind the editor) and highlight the elements of the content that match our classification.

This sample defines its own logical classification called “word”. When the sample’s classifier is applied to the text behind the editor, it recognizes the words “this” and “body”, and says “these words match me and I want to mark each of them as ‘word’”. A classification can be assigned a format that visually highlights the text spans matching it.

Classification architecture

The sample operates with four key types in this scenario to implement the classification pattern. Here is the blueprint of the solution:

// --- Provides the classification implemented by the Colorer class
[Export(typeof(IClassifierProvider))]
[ContentType("text")]
internal sealed class MyClassifierProvider : IClassifierProvider
{
  // ...
}

// --- Implements the algorithm for the logical classification named “word”
// --- Uses the concept of “word provider” to set words matching with this
// --- classification
internal sealed class Colorer : IClassifier
{
  // ...
}

// --- Defines a logical classification in order to be registered by Visual Studio
// --- text editor
[Export(typeof(ClassificationTypeDefinition))]
internal sealed class WordClassificationType : ClassificationTypeDefinition
{
  // ...
}

// --- Defines a visual format used for the “word” classification
[Export(typeof(ClassificationFormatDefinition))]
[ClassificationType(ClassificationTypeNames = "word")]
[Name("WordClassificationFormat")]
[Order]
internal sealed class WordClassificationFormatDefinition :
  ClassificationFormatDefinition
{
  // ...
}

These types are loosely coupled through the Managed Extensibility Framework (MEF) working behind the scenes. MEF is responsible for composing the relationships between objects so that these types can cooperate with each other. Without going into details about MEF, imagine that it provides a way to couple related elements at run time using attributes that declare contract information.
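To make the idea of attribute-driven coupling more tangible, here is a minimal sketch that discovers implementations by contract using plain reflection. It is not MEF and does not use any MEF API: SketchExportAttribute, SketchComposer, IWordSource and the two source classes are hypothetical names invented for this illustration, and real MEF does considerably more than this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for an export attribute: names the contract a class fulfils.
[AttributeUsage(AttributeTargets.Class)]
sealed class SketchExportAttribute : Attribute
{
    public Type Contract { get; private set; }
    public SketchExportAttribute(Type contract) { Contract = contract; }
}

// Hypothetical contract used only in this sketch.
interface IWordSource { string Word { get; } }

[SketchExport(typeof(IWordSource))]
sealed class ThisSource : IWordSource { public string Word { get { return "this"; } } }

[SketchExport(typeof(IWordSource))]
sealed class BodySource : IWordSource { public string Word { get { return "body"; } } }

static class SketchComposer
{
    // Instantiates every class in the assembly that is exported under contract T.
    // Results are sorted by class name so the order is predictable.
    public static List<T> GetExports<T>(Assembly assembly)
    {
        return assembly.GetTypes()
            .Where(t => t.GetCustomAttributes(typeof(SketchExportAttribute), false)
                .Cast<SketchExportAttribute>()
                .Any(a => a.Contract == typeof(T)))
            .OrderBy(t => t.Name, StringComparer.Ordinal)
            .Select(t => (T)Activator.CreateInstance(t))
            .ToList();
    }
}

static class Program
{
    static void Main()
    {
        // The consumer knows only the contract, never the concrete classes.
        foreach (IWordSource source in
                 SketchComposer.GetExports<IWordSource>(typeof(Program).Assembly))
            Console.WriteLine(source.Word);
    }
}
```

The point of the sketch is that the consumer never names ThisSource or BodySource; adding a third exported class would extend the result without touching the consumer.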

The editor recognizes the MyClassifierProvider class as one implementing the IClassifierProvider contract (the Export attribute marks this fact). The ContentType metadata attribute tells the editor that MyClassifierProvider intends to be a provider for the “text” content type. Strictly speaking, “the editor recognizes” means that MEF binds an instance of MyClassifierProvider to the editor. MyClassifierProvider instantiates a Colorer object implementing the IClassifier interface, and the editor uses it to carry out the classification of the text buffer.

A classifier’s main task is to recognize elements of the underlying text buffer and mark those that match a classification type. Although a classifier can recognize one or more classification types, the Colorer implementation recognizes only the “word” classification type. For Colorer to associate the text spans it finds with the logical classification named “word”, the editor must know this classification type.

This is where WordClassificationType comes into the picture. It implements the ClassificationTypeDefinition contract (marked by the Export attribute).

There is one more piece of information the editor needs to finish its task: it has to know how to format text spans matching a classification type. This information is also supplied in a loosely coupled way. The WordClassificationFormatDefinition class implements the ClassificationFormatDefinition contract, and with its ClassificationType metadata attribute it declares that this format should be used for the “word” classification type. By now I probably do not need to point out that the Export attribute names the contract.

The role of the Order attribute is to set the order in which format definitions are applied when there is more than one for the same classification type. Here the Order attribute seems unnecessary; however (whether this is a bug or a feature), omitting it prevents the format definition from working.

As you can see, the responsibilities of the classes are well defined, and the loosely coupled implementation provides a lot of flexibility. For example, we can ship a classifier as one extension and the classification format definition as another. As long as Visual Studio is not extended with the format definition, no visual cues are shown for the classification. As soon as we plug in a format definition, the classifications are visualized. If we do not like that visualization, we can use another format extension.
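The separation between classifying text and formatting it can be sketched without any editor types. In the illustration below the classification spans are plain tuples and a “format” is just a tag name looked up by classification name; FormatSketch and Render are hypothetical names, and the angle-bracket markup merely stands in for real visual formatting.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class FormatSketch
{
    // Wraps each classified span in the tag registered for its classification.
    // Spans whose classification has no registered format are copied unchanged.
    // Assumes the spans are ordered by start position and do not overlap.
    public static string Render(
        string text,
        IEnumerable<(int Start, int Length, string Type)> spans,
        IDictionary<string, string> formats)
    {
        var result = new StringBuilder();
        int position = 0;
        foreach (var span in spans)
        {
            result.Append(text, position, span.Start - position);
            string word = text.Substring(span.Start, span.Length);
            string tag;
            if (formats.TryGetValue(span.Type, out tag))
                result.Append('<').Append(tag).Append('>').Append(word)
                      .Append("</").Append(tag).Append('>');
            else
                result.Append(word);
            position = span.Start + span.Length;
        }
        result.Append(text, position, text.Length - position);
        return result.ToString();
    }
}

static class Program
{
    static void Main()
    {
        var spans = new List<(int Start, int Length, string Type)>
        {
            (0, 4, "word"), (5, 4, "word")
        };
        // No format plugged in: the classification exists but nothing is shown.
        Console.WriteLine(FormatSketch.Render("this body", spans,
            new Dictionary<string, string>()));
        // Format plugged in: the same classification is now visualized.
        Console.WriteLine(FormatSketch.Render("this body", spans,
            new Dictionary<string, string> { { "word", "red" } }));
    }
}
```

Swapping the dictionary entry for another tag changes the visualization without touching the spans, which mirrors the idea of replacing one format extension with another.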

The Classifier Provider

In the Text Coloring Sample the MyClassifierProvider class is responsible for offering an instance that classifies the editor content. This factory pattern provides more flexibility to influence how the classifier should work (or even which classifier to use) depending on the current environment. The implementation of MyClassifierProvider is the following:

[Export(typeof(IClassifierProvider))]
[ContentType("text")]
internal sealed class MyClassifierProvider : IClassifierProvider
{
  [Import]
  internal IClassificationTypeRegistryService classificationTypeRegistry
    { get; set; }

  [Import]
  internal ImportInfoCollection<IWordListProvider> wordListProviders
    { get; set; }

  public IClassifier GetClassifier(ITextBuffer buffer, IEnvironment context)
  {
    return new Colorer(buffer, classificationTypeRegistry, wordListProviders);
  }
}

After MEF has finished the composition, the editor has a ready-to-use MyClassifierProvider instance. Because of the Import attributes, MEF sets the values of the following properties during the composition:

-  The classificationTypeRegistry property will contain a reference to an object implementing the IClassificationTypeRegistryService contract

-  The wordListProviders container will hold a collection of objects implementing the IWordListProvider contract

The editor uses the MyClassifierProvider instance simply to call the GetClassifier method, which returns the IClassifier instance responsible for carrying out the classification of the underlying text buffer. Note the context parameter of type IEnvironment: our classifier could use information coming from the environment to adapt its classification algorithm.

Here we simply create a new Colorer instance passing the text buffer, the service handling the classification type registry and the list of word providers.
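The sample’s provider always returns the same kind of classifier, but the factory shape leaves room for a choice. The following sketch shows a provider that picks a classifier based on its environment. All types here (ISketchClassifier, ExactClassifier, IgnoreCaseClassifier, SketchEnvironment, SketchClassifierProvider) are hypothetical stand-ins invented for this illustration; they are not the Visual Studio SDK interfaces, and the IgnoreCase setting is an assumption made up for the example.

```csharp
using System;

// Hypothetical stand-ins for illustration; not the Visual Studio SDK types.
interface ISketchClassifier
{
    bool IsHighlighted(string word);
}

sealed class ExactClassifier : ISketchClassifier
{
    public bool IsHighlighted(string word)
    {
        return word == "this" || word == "body";
    }
}

sealed class IgnoreCaseClassifier : ISketchClassifier
{
    public bool IsHighlighted(string word)
    {
        return string.Equals(word, "this", StringComparison.OrdinalIgnoreCase)
            || string.Equals(word, "body", StringComparison.OrdinalIgnoreCase);
    }
}

// Hypothetical environment carrying a single setting.
sealed class SketchEnvironment
{
    public bool IgnoreCase { get; set; }
}

sealed class SketchClassifierProvider
{
    // The factory method chooses the classifier that fits the environment.
    public ISketchClassifier GetClassifier(SketchEnvironment context)
    {
        if (context.IgnoreCase) return new IgnoreCaseClassifier();
        return new ExactClassifier();
    }
}

static class Program
{
    static void Main()
    {
        var provider = new SketchClassifierProvider();
        var strict = provider.GetClassifier(new SketchEnvironment { IgnoreCase = false });
        var relaxed = provider.GetClassifier(new SketchEnvironment { IgnoreCase = true });
        Console.WriteLine(strict.IsHighlighted("Body"));  // False
        Console.WriteLine(relaxed.IsHighlighted("Body")); // True
    }
}
```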

The IWordListProvider concept is defined in this sample; it is not a part of the Visual Studio architecture. This interface is very simple: it provides a way to return the enumeration of words to be recognized by our classifier:

public interface IWordListProvider
{
  IEnumerable<string> GetWords();
}

The sample provides a lightweight implementation:

[Export(typeof(IWordListProvider))]
internal sealed class MyWordListProvider : IWordListProvider
{
  public IEnumerable<string> GetWords()
  {
    return new List<string>(new string[] { "this", "body" });
  }
}

The Export attribute is very important here. It marks the type as an implementer of the IWordListProvider contract, and that is how a MyWordListProvider instance is pumped into the wordListProviders container property of the MyClassifierProvider instance.
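Because the provider holds a collection, several word list providers can contribute words side by side. The sketch below merges the words of two providers by hand, without MEF. ExtraWordListProvider, WordListSketch and CollectWords are hypothetical names added for illustration; only the IWordListProvider shape and the “this”/“body” list come from the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IWordListProvider
{
    IEnumerable<string> GetWords();
}

sealed class MyWordListProvider : IWordListProvider
{
    public IEnumerable<string> GetWords() { return new[] { "this", "body" }; }
}

// Hypothetical second provider, as another extension might contribute it.
sealed class ExtraWordListProvider : IWordListProvider
{
    public IEnumerable<string> GetWords() { return new[] { "body", "return" }; }
}

static class WordListSketch
{
    // Merges the words of all providers, dropping empty entries and duplicates.
    public static HashSet<string> CollectWords(IEnumerable<IWordListProvider> providers)
    {
        return new HashSet<string>(
            providers.SelectMany(p => p.GetWords())
                     .Where(w => !string.IsNullOrEmpty(w)),
            StringComparer.Ordinal);
    }
}

static class Program
{
    static void Main()
    {
        var providers = new IWordListProvider[]
        {
            new MyWordListProvider(), new ExtraWordListProvider()
        };
        HashSet<string> words = WordListSketch.CollectWords(providers);
        Console.WriteLine(words.Count); // 3 ("body" is contributed twice but kept once)
    }
}
```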

The Classifier

The lion’s share of the work is done by the Colorer classifier class.

internal sealed class Colorer : IClassifier
{
  private ITextBuffer buffer;
  private IClassificationTypeRegistryService _classificationTypeRegistry;
  private ImportInfoCollection<IWordListProvider> _wordListProviders;

  internal Colorer(ITextBuffer bufferToClassify,
    IClassificationTypeRegistryService classificationTypeRegistry,
    ImportInfoCollection<IWordListProvider> wordListProviders)
  {
    buffer = bufferToClassify;
    _classificationTypeRegistry = classificationTypeRegistry;
    _wordListProviders = wordListProviders;
  }

The class implements the IClassifier interface that defines the behavior of a classifier object. The class constructor simply stores the input parameters for later use.

  public event EventHandler<ClassificationChangedEventArgs> ClassificationChanged;

IClassifier defines an event that is raised when the classification of a span of text has changed. Consumers such as the editor can use this event to re-evaluate the classifications they obtained earlier.

For example, the words “select” and “from” are not keywords in C#; they can be used as identifiers. Inside a LINQ query, however, they become special keywords (and can therefore have a different classification).

In the sample this event is not used.
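Although the sample leaves the event unused, the pattern is easy to picture. Here is a minimal sketch in which a classifier raises a change notification whenever its word list is replaced. SketchClassifier, SketchChangedEventArgs and SetWords are hypothetical names for this illustration, and reporting the whole text as changed is a simplifying assumption, not something taken from the SDK.

```csharp
using System;

// Hypothetical event payload: the character range whose classification may have changed.
sealed class SketchChangedEventArgs : EventArgs
{
    public int Start { get; private set; }
    public int Length { get; private set; }
    public SketchChangedEventArgs(int start, int length) { Start = start; Length = length; }
}

sealed class SketchClassifier
{
    private string[] _words = { "this", "body" };

    public event EventHandler<SketchChangedEventArgs> ClassificationChanged;

    public bool IsMatch(string word)
    {
        return Array.IndexOf(_words, word) >= 0;
    }

    // Replacing the word list can change the classification of any part of the
    // text, so this sketch reports the whole range [0, textLength) as changed.
    public void SetWords(string[] words, int textLength)
    {
        _words = words;
        var handler = ClassificationChanged;
        if (handler != null)
            handler(this, new SketchChangedEventArgs(0, textLength));
    }
}

static class Program
{
    static void Main()
    {
        var classifier = new SketchClassifier();
        classifier.ClassificationChanged += (sender, e) =>
            Console.WriteLine("Re-classify " + e.Start + " to " + (e.Start + e.Length));
        classifier.SetWords(new[] { "that" }, 9); // prints "Re-classify 0 to 9"
    }
}
```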

The most complex method in this class is GetClassificationSpans, which collects the text spans matching the classifications this classifier is responsible for. The first part of the method prepares the variables used by the search algorithm:

  public IList<ClassificationSpan> GetClassificationSpans(SnapshotSpan span)
  {
    IClassificationType wordClassificationType =
      _classificationTypeRegistry.GetClassificationType("word");
    Span simpleSpan = span.Span;
    string text = buffer.CurrentSnapshot.GetText(simpleSpan);
    List<ClassificationSpan> classifications = new List<ClassificationSpan>();

The input parameter of this method is a SnapshotSpan instance. Snapshots are new objects in the editor model; their role is to provide a context in which the underlying editor content is immutable (a snapshot of the content at a point in time). Snapshots spare developers from coping with asynchronous changes to the content of the editor. The SnapshotSpan here represents the immutable part of the text to classify.

When Colorer recognizes a word belonging to its classification, it must add the word’s span to the result set and name its classification type. The IClassificationType instance stored in the wordClassificationType variable serves this purpose; it is obtained from the classification type registry. We store the part of the editor buffer to classify in the text variable.

The next part of the algorithm searches the text for the words defined by the coupled IWordListProvider instances:
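The snapshot idea can be illustrated with a deliberately simplified model: every edit publishes a new snapshot object, so code holding an older snapshot keeps seeing the text as it was. SketchSnapshot and SketchBuffer are hypothetical types written for this illustration; the editor’s real snapshot types are richer and are not reproduced here.

```csharp
using System;

// Hypothetical, simplified model; not the editor's real snapshot types.
sealed class SketchSnapshot
{
    public int Version { get; private set; }
    public string Text { get; private set; }
    public SketchSnapshot(int version, string text) { Version = version; Text = text; }
}

sealed class SketchBuffer
{
    public SketchSnapshot CurrentSnapshot { get; private set; }

    public SketchBuffer(string text)
    {
        CurrentSnapshot = new SketchSnapshot(0, text);
    }

    // An edit never modifies an existing snapshot; it publishes a new one.
    public void Replace(string newText)
    {
        CurrentSnapshot = new SketchSnapshot(CurrentSnapshot.Version + 1, newText);
    }
}

static class Program
{
    static void Main()
    {
        var buffer = new SketchBuffer("this body");
        SketchSnapshot before = buffer.CurrentSnapshot;

        buffer.Replace("that body"); // an edit arrives while 'before' is still in use

        Console.WriteLine(before.Version + ": " + before.Text); // 0: this body
        Console.WriteLine(buffer.CurrentSnapshot.Version + ": "
            + buffer.CurrentSnapshot.Text);                      // 1: that body
    }
}
```

In this model, offsets computed against one snapshot are only meaningful for that same snapshot, which is why a classifier would want to read text from the snapshot it was asked about.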

    int searchOffset = 0;
    do
    {
      int nextStart = -1;
      string nextWord = null;
      foreach (ImportInfo<IWordListProvider> wordListInfo in _wordListProviders)
      {
        foreach (string word in wordListInfo.GetBoundValue().GetWords())
        {
          int wordStart = text.IndexOf(word, searchOffset);
          Boolean foundMatch = wordStart != -1;
          if (foundMatch && (nextStart == -1 || wordStart < nextStart))
          {
            nextStart = wordStart;
            nextWord = word;
          }
        }
      }
      if (nextWord == null) break;

The algorithm searches the text for every word in turn, starting from a specific search position. In each cycle, after searching for all the words, it picks the match closest to the search position and uses it to calculate the starting position for the next cycle; it quits the loop when none of the words is found. The algorithm works this way because the text spans returned must be ordered by their starting position and must not overlap in order to be displayed by the editor.

      int wordLength = nextWord.Length;
      classifications.Add(new ClassificationSpan(new SnapshotSpan(span.Snapshot,
        new Span(nextStart + simpleSpan.Start, wordLength)),
        wordClassificationType));

When a new match is found, we add it to the result set as a new ClassificationSpan instance that also specifies the type of the classification. This approach allows a classifier to recognize one or more classification types.

      searchOffset = nextStart + wordLength;
    } while (true);
    return classifications;
  }
}

In the last part of the method the next starting search position is calculated. When we quit the search loop, the collected classification spans are returned.
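The search loop can be tried out in isolation on plain strings. The sketch below follows the same earliest-match strategy but returns simple (start, length) pairs instead of ClassificationSpan objects. WordSearchSketch and FindWords are hypothetical names; the ordinal comparison and the guard that skips empty words are my own additions (an empty word would match at every position and the loop would never advance), not part of the sample.

```csharp
using System;
using System.Collections.Generic;

static class WordSearchSketch
{
    // Returns (start, length) pairs for every occurrence of the given words,
    // ordered by start position and non-overlapping.
    public static List<(int Start, int Length)> FindWords(
        string text, IEnumerable<string> words)
    {
        var result = new List<(int Start, int Length)>();
        int searchOffset = 0;
        while (true)
        {
            int nextStart = -1;
            string nextWord = null;
            foreach (string word in words)
            {
                // Added guard: skip empty words so the loop always advances.
                if (string.IsNullOrEmpty(word)) continue;
                int wordStart = text.IndexOf(word, searchOffset, StringComparison.Ordinal);
                if (wordStart != -1 && (nextStart == -1 || wordStart < nextStart))
                {
                    nextStart = wordStart; // earliest match seen in this cycle
                    nextWord = word;
                }
            }
            if (nextWord == null) break;   // no word occurs after searchOffset
            result.Add((nextStart, nextWord.Length));
            searchOffset = nextStart + nextWord.Length;
        }
        return result;
    }
}

static class Program
{
    static void Main()
    {
        var matches = WordSearchSketch.FindWords(
            "this is the body of this text", new[] { "this", "body" });
        foreach (var match in matches)
            Console.WriteLine(match.Start + "," + match.Length);
        // prints 0,4 then 12,4 then 20,4
    }
}
```

Note that a substring search like this one also matches inside longer words: with the same word list, “thistle” would yield a match at position 0. A classifier that should honor word boundaries would need an extra check.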

Defining the Classification Type

For the editor to handle our logical classification, we must define it. This is the role of the WordClassificationType class:

[Export(typeof(ClassificationTypeDefinition))]
internal sealed class WordClassificationType : ClassificationTypeDefinition
{
  public WordClassificationType()
  {
    Name = "word";
  }
}

This type implements the ClassificationTypeDefinition contract simply by setting the Name property of the classification type to “word”. Classification types can form a hierarchy: a classification can be derived from one or more other classifications! If we want to set them, the base classifications should be added to the protected BaseDerivesFrom list.

Here we do not use the inheritance feature.
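To show what a hierarchy with multiple bases could look like, here is a small self-contained model. SketchClassificationType, its BaseTypes list and the IsOfType method are hypothetical constructs for this illustration (they do not reproduce the CTP’s BaseDerivesFrom mechanism), and the classification names used are made up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a classification type with any number of base types.
sealed class SketchClassificationType
{
    public string Name { get; private set; }
    public IList<SketchClassificationType> BaseTypes { get; private set; }

    public SketchClassificationType(string name, params SketchClassificationType[] baseTypes)
    {
        Name = name;
        BaseTypes = new List<SketchClassificationType>(baseTypes);
    }

    // True if this type, or any type it derives from (directly or indirectly),
    // carries the given name.
    public bool IsOfType(string name)
    {
        if (Name == name) return true;
        foreach (SketchClassificationType baseType in BaseTypes)
            if (baseType.IsOfType(name)) return true;
        return false;
    }
}

static class Program
{
    static void Main()
    {
        var keyword = new SketchClassificationType("keyword");
        var contextual = new SketchClassificationType("contextual");
        var linqKeyword = new SketchClassificationType("linq keyword", keyword, contextual);

        Console.WriteLine(linqKeyword.IsOfType("keyword"));    // True
        Console.WriteLine(linqKeyword.IsOfType("contextual")); // True
        Console.WriteLine(keyword.IsOfType("linq keyword"));   // False
    }
}
```

With such a model, a format registered for “keyword” could also apply to “linq keyword” unless a more specific format is present; whether the editor resolves formats this way is an assumption, not something the sample demonstrates.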

Defining the Classification Format Definition

When we have a text span matching with a classification type we probably want to define a format for it to visualize its classification.

This is where the ClassificationFormatDefinition type comes into the picture. Through its metadata attributes, the format definition is assigned to the “word” classification type and given the name “WordClassificationFormat”.

[Export(typeof(ClassificationFormatDefinition))]
[ClassificationType(ClassificationTypeNames = "word")]
[Name("WordClassificationFormat")]
[Order]
internal sealed class WordClassificationFormatDefinition :
  ClassificationFormatDefinition
{
  public WordClassificationFormatDefinition()
  {
    ForegroundBrush = Brushes.Red;
  }
}

ClassificationFormatDefinition contains a few properties like BackgroundBrush, ForegroundBrush, FontTypeFace, TextDecorationsXaml, etc. In the constructor we can set these properties to visualize our format definition according to our needs. Here we simply set the text color to red.

Summary

The new Visual Studio editor provides an easy-to-use extension mechanism based on the Managed Extensibility Framework. The Text Coloring Sample demonstrates this well: the components of the solution are loosely coupled. The sample uses so-called “classification types” to select and colorize words.


Posted Nov 04 2008, 04:56 PM by inovak

Comments

Bill wrote re: LearnVSXNow! #38 - VS 2010 Editor - Text Coloring Sample Deep Dive
on Tue, Nov 25 2008 21:55

Great article... question though.  Say you were writing a classifier for a language like C# where a /* could be above the line that is being classified by IClassifier.GetClassificationSpans.  How would you know that?  In your sample, you are just iterating words assuming that the start of a line is a start of the default lexical state.  Can you give any hints on how an advanced classifier would be done that processes things like being in multi-line comments, etc.?

inovak wrote re: LearnVSXNow! #38 - VS 2010 Editor - Text Coloring Sample Deep Dive
on Wed, Nov 26 2008 6:28

Hi Bill,

The question you mentioned is one of the few I am really interested in. Right now I am trying to get closer to the Microsoft guys who work on the new editor. They have code samples answering your questions, but those are not public yet. As a VSX insider I hope they will open them up and allow me to blog about the whole classification process.

One thing I know is that classification is the mechanism they use for syntax coloring in the new VS. With classification they can handle all aspects of coloring, including comments and context-sensitive keywords (like select, from, yield, etc.).

As soon as I get some news I will try to publish it.

Noah Richards wrote re: LearnVSXNow! #38 - VS 2010 Editor - Text Coloring Sample Deep Dive
on Sun, Feb 8 2009 11:10

Hey Bill,

I'm an editor dev, so I'll see if I can help a bit.

Except in the simplest circumstances, language classification probably requires you to build some type of model over the language.  If the only state you need to remember is, say, the lines that start in comments or multi-line strings or something similar, that's information that you'll want to create and store in your classifier (and update whenever text changes).  More complicated scenarios will generally require building a more complicated model on top of the text, which is what most languages end up doing.

To the end of making this analysis simpler, buffers use a snapshot model, where you can get a stable version of the text in a buffer at any point in time (hence why the argument to GetClassifications is a SnapshotSpan), along with methods for translating SnapshotPoints and SnapshotSpans from one version of the buffer to another.

Unfortunately, the TextColoringSample is an overly-complicated example of writing a classifier, since it adds in the aspect of creating and consuming a new extension (the IWordListProvider).  Most classifiers will only have to write the classifier provider and classifier (and either create or re-use some of the existing classification types and format definitions).  The intent is that implementing the classifier interfaces is a small burden on top of whatever language analysis you'll need to write, and so you can concentrate on the complicated stuff and not classifiers themselves.

If you (or Inovak) have any questions, drop me a line; noah.richards @ ms.

-Noah

GA30 wrote re: LearnVSXNow! #38 - VS 2010 Editor - Text Coloring Sample Deep Dive
on Sun, May 17 2009 6:31

Hello,

Noah, Inovak, or anyone. Can anyone give any insight into the degree in which developing custom editors (rather than extensions to existing ones) has changed from VS2008 to VS2010. For example, have things changed drastically from the custom editor process described in LVSXN parts 15 - 17?

Supernova wrote re: LearnVSXNow! #38 - VS 2010 Editor - Text Coloring Sample Deep Dive
on Fri, Nov 13 2009 18:18

Why don't you support us with a frakking solution file for download?

Tomáš Pastorek wrote Boo syntax highlighting for Visual Studio 2010
on Fri, Mar 26 2010 18:52

In one of my projects I use the Boo scripting language, which I wrote about not long ago in my series

Jin Feng wrote re: LearnVSXNow! #38 - VS 2010 Editor - Text Coloring Sample Deep Dive
on Tue, Mar 30 2010 3:44

Hi Noah and inovak,

I got the same question that Bill asked in this thread. Do you have any update or examples to illustrate the solution?

Thanks
