Working with Text

This topic concerns identifying and manipulating text in a document, with particular focus on the body text property of documents, sections, and pages.

Here is the dictionary listing from the iWork Text Suite for the basic text elements (characters, words, and paragraphs), along with the entry for the rich text class that contains their formatting properties:

The iWork Text Suite

character [inh. rich text] : One of some text’s characters.
    elements
        contained by rich text, paragraphs, words.

paragraph [inh. rich text] : One of some text’s paragraphs.
    elements
        contained by rich text.

word [inh. rich text] : One of some text’s words.
    elements
        contained by rich text, paragraphs.

rich text, pl rich text : This provides the base rich text class for all iWork applications.
    elements
        contains characters, paragraphs, words.
    properties
        color (RGB color | text) : The color of the font. Expressed as an RGB value consisting of a list of three color values from 0 to 65535. ex: Blue = {0, 0, 65535}. In addition, the lowercase names of the following standard colors can be used in place of RGB color value lists: "black", "blue", "brown", "cyan", "green", "magenta", "orange", "purple", "red", "yellow", "gray", and "white".
        font (text) : The name of the font. Can be the PostScript name, such as “TimesNewRomanPS-ItalicMT”, or the display name, such as “Times New Roman Italic”. TIP: Use the Font Book application to get information about a typeface.
        size (integer) : The size of the font (in points).

These objects and properties will be incorporated into the discussion and examples about how to locate, extract, and manipulate text.

Identifying Text

DO THIS ► For use with this tutorial, download the example Alice in Wonderland Pages document.

The first requirement for accessing or editing text is to identify, to the script, the text you want it to examine. This is done through references. You can identify (reference) text using any of three methods:

  • By position value
  • By property value
  • By contents

Identifying by Position

Position referencing is a fairly straightforward concept:

the first character
the first word
the first paragraph
the last character of the first paragraph
the first character of the second word of the third paragraph

You can reference by a contiguous range of elements as well:

characters 1 thru 5 of the first paragraph
words 3 thru -1 of the second paragraph -- -1 is the last item
paragraphs 3 thru 7 of section 2
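
As a minimal sketch, the position references above could be used in a script like the following. It assumes the example document is open as the front document in Pages and has at least two paragraphs; the variable names firstFive and trailingWords are illustrative only.

```applescript
tell application "Pages"
    tell the front document
        tell the body text
            -- characters 1 through 5 of the first paragraph
            set firstFive to characters 1 thru 5 of the first paragraph
            -- from the third word through the last word (-1) of the second paragraph
            set trailingWords to words 3 thru -1 of the second paragraph
        end tell
    end tell
end tell
```

Negative indexes count back from the end of the container, so the same range reference works regardless of how many words the paragraph holds.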

Identifying by Property

Referencing by property value requires a bit more detail in that the script must contain the target property name and the target property value:

every word of every paragraph where its font is "Impact"
every paragraph where its size is 11
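
A minimal sketch of the same property references inside a script follows. It assumes the front document contains some text set in the Impact typeface and some paragraphs sized at 11 points; the variable names impactWords and smallParagraphs are illustrative only.

```applescript
tell application "Pages"
    tell the front document
        tell the body text
            -- match on the font property of each word
            set impactWords to every word of every paragraph where its font is "Impact"
            -- match on the size property of each paragraph
            set smallParagraphs to every paragraph where its size is 11
        end tell
    end tell
end tell
```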

Identifying by Contents

For content matching, the script indicates the class of the object it wants to locate and the target text of the class:

first word of every paragraph where it is "Alice"

IMPORTANT: Avoid searching the document body text directly, such as: every word of body text where it is "Alice". Doing so can hang the application. Instead, search all paragraphs of the body text: every word of every paragraph of the body text where it is "Alice"
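
A minimal sketch of a content search that follows the advice above, going through the paragraphs rather than querying the body text directly. It assumes the example document is open as the front document; the variable name aliceWords is illustrative only.

```applescript
tell application "Pages"
    tell the front document
        tell the body text
            -- search the words of each paragraph, not the words of the body text itself
            set aliceWords to every word of every paragraph where it is "Alice"
        end tell
    end tell
end tell
```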

Getting the Contents of Text Objects

This examination of working with text begins with getting the contents of text objects. Simply precede a text object reference with the get verb (command) and the result of the script statement will be the text string contents of the text object:

Get the Contents of a Text Object
  
tell application "Pages"
    tell the front document
        tell the body text
            get first word of the third paragraph
            --> "Alice"
        end tell
    end tell
end tell

IMPORTANT: In the AppleScript language, use of the get command to retrieve an object’s value is optional, as its use is implied by default. So in the iWork applications, a text object reference without a preceding verb returns the text object’s contents.

The Implied Get Verb (command)
  
tell application "Pages"
    tell the front document
        tell the body text
            -- BOTH STATEMENTS RETURN THE SAME RESULT
            -- NOTE: by default in the iWork applications, a text object reference not preceded by a verb (command) returns its contents

            -- explicit use of the get verb:
            get first word of the third paragraph
            --> "Alice"

            -- implicit use of the get verb:
            the first word of the third paragraph
            --> "Alice"
        end tell
    end tell
end tell

The implicit use of the get verb (command) also occurs when a script statement uses the set verb to set the value of variables to the result of a text object reference:

Variables and the Get Verb
  
tell application "Pages"
    tell the front document
        tell the body text
            -- implicit use of the get verb:
            set thisWord to the first word of the third paragraph
            say thisWord
            --> "Alice"
            -- explicit use of the get verb:
            set thisWord to get the first word of the third paragraph
            say thisWord
            --> "Alice"
        end tell
    end tell
end tell

To store a text object reference in a variable, instead of the contents of the text object, use the AppleScript construct a reference to:

Store a Text Object Reference
  
tell application "Pages"
    tell the front document
        tell the body text
            set thisWord to a reference to the first word of the third paragraph
            return thisWord
            --> word 1 of paragraph 3 of body text of document 1 of application "Pages"
        end tell
    end tell
end tell

Changing the Values of Text Object Properties

To change the values of the properties of a text object, such as its typeface, type size, or color, use a tell statement or tell block targeting the text object. Inside the statement or block, the set verb changes the value of each property.

Tell statements are useful for changing the value of a single property; tell blocks are used to change the values of multiple properties:

Changing Properties of Text Object
  
tell application "Pages"
    activate
    tell the front document
        tell body text

            -- tell statement (single sentence):
            tell the first character to set its color to "blue"

            -- tell block (begin tell and end tell)
            tell the first character
                set the color of it to "blue"
                -- size is an integer (points), so no quotes
                set the size to 24
                set the font to "Times New Roman Italic"
            end tell

        end tell
    end tell
end tell

NOTE: In the Pages dictionary, the term “color” is both a class and a property. To avoid ambiguity and possible errors in scripts using this term, explicitly indicate color’s use as a property by including possessive terms such as its color or color of it to indicate that color is being used as a property of the targeted text object (it).

The use of possessive indicators is optional for the other text object properties.

Here are two short script examples of changing the value of text object properties:

Large 1st Character
  
tell application "Pages"
    activate
    tell the front document
        set thisCharacter to ¬
            a reference to the 1st character of the 1st paragraph of the body text
        tell thisCharacter
            set size to ((its size) * 3)
            set font to "Zapfino"
            set color of it to "blue"
        end tell
    end tell
end tell

Turn Alice Blue
  
tell application "Pages"
    activate
    tell the front document
        tell body text
            set the color of ¬
                (every word of every paragraph where it is "Alice") to "blue"
        end tell
    end tell
end tell

The following script uses the provided example Pages file to demonstrate how to format the section titles with multiple text styles:

Formatting Chapter Titles
  
tell application "Pages"
    activate
    tell the front document

        get words 1 thru 2 of the first paragraph of the body text of every section
        --> {{"CHAPTER", "I"}, {"CHAPTER", "II"}, {"CHAPTER", "III"}, {"CHAPTER", "IV"}, {"CHAPTER", "V"}, {"CHAPTER", "VI"}, {"CHAPTER", "VII"}, {"CHAPTER", "VIII"}, {"CHAPTER", "IX"}, {"CHAPTER", "X"}, {"CHAPTER", "XI"}, {"CHAPTER", "XII"}}

        -- format the first part of the section title
        tell words 1 thru 2 of the first paragraph of the body text of every section
            set font to "Zapfino"
            set size to 8
            set the color of it to {26214, 26214, 26214}
        end tell

        -- format the second part of the section title
        tell words 3 thru -1 of the first paragraph of the body text of every section
            set font to "Times New Roman Italic"
            set size to 18
            set the color of it to {0, 0, 0}
        end tell

        -- format any ending punctuation
        tell the last character of the first paragraph of the body text of every section
            set font to "Times New Roman Italic"
            set size to 18
            set the color of it to {0, 0, 0}
        end tell

    end tell
end tell

Some of the section titles formatted by the script:

[Figure: Examples of the section chapter titles with new formatting]

Changing the Content of a Text Object

To change the content of a text object, use the set verb followed by a delineated reference to the text object, followed by the new text for the text object:

set (the first character of the body text) to "K"

Note that the text object reference is placed within parentheses ( ) to ensure that the script resolves the object reference first, before it attempts to change the text object’s contents.

Scripts can replace the content of all of the Pages text objects: characters, words, paragraphs, and even the document’s body text:

Change the Content of Text Objects
  
tell application "Pages"
    activate

    set thisDocument to ¬
        make new document with properties {document template:template "Blank"}

    tell thisDocument
        -- replace body text
        set body text to "Curabitur blandit tempus porttitor. Donec sed odio dui. Donec id elit non mi porta gravida at eget metus. Etiam porta sem malesuada magna mollis euismod.

Donec ullamcorper nulla non metus auctor fringilla. Maecenas faucibus mollis interdum. Nulla vitae elit libero, a pharetra augue. Praesent commodo cursus magna, vel scelerisque nisl consectetur et. Integer posuere erat a ante venenatis dapibus posuere velit aliquet.

Donec sed odio dui. Vestibulum id ligula porta felis euismod semper. Praesent commodo cursus magna, vel scelerisque nisl consectetur et. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla vitae elit libero, a pharetra augue. Donec justo odio, dapibus ac facilisis in, egestas eget quam. Maecenas sed diam eget risus varius blandit sit amet non magna.

COPYRIGHT NOTICE"

        tell body text
            -- set body text properties
            set font to "Verdana"
            set size to 18
            set color of it to "gray"

            -- replace character
            set (the first character) to "T"

            -- replace word
            set (every word of every paragraph where it is "Donec") to "Salnec"

            -- replace paragraph
            set (every paragraph where it begins with "COPYRIGHT NOTICE") to ¬
                "© ACME WIDGETS, INC. ALL RIGHTS RESERVED."
        end tell
    end tell
end tell

Read the next page for details on how to perform find and replace on documents with long body text.
