book_cataloguing module reference

Note

You may notice that, in the current layout of the book_cataloguing package, all of these functions are defined in the file contents.py. This layout is subject to change in future versions of the package, so please import functions from book_cataloguing itself rather than from book_cataloguing.contents.

Unicode support?

book_cataloguing has some support for non-ASCII characters:

>>> from book_cataloguing import capitalize_title
>>> print(capitalize_title("l'île noire"))
L'Île Noire

However, this support is experimental and subject to change, so please do not rely on it. The package does not actually support any language other than English, and it will probably not do a good job of capitalizing non-English book titles that are more complicated than the one above.

Changing internal lists

In the module book_cataloguing.contents, four lists of strings are created for later use by the functions defined therein. The names of these lists are subject to change, but currently they are:

  • LOWERCASE_TITLE_WORDS,

  • LOWERCASE_AUTHOR_WORDS,

  • MAC_SURNAMES,

  • and AUTHOR_TITLES.

(More information on each list is available below in the corresponding function.)

Their starting values are (in the developer’s opinion) quite suitable for general use, but a function is provided to change each one, if you choose. In each case, the new strings to put in the list must be in an external file, whose name is passed to the function.

book_cataloguing.set_lowercase_title_words(filename: str | None = None) → None

Get a new list of lowercase words in book titles from a file.

The file should contain words like “the”, “a”, and “of” that should not be capitalized when they appear in the title of a book (unless they are at the beginning or end of a title or subtitle).

The referenced file should have one word on each line. The case of the words does not matter, and they need not be sorted in any particular order.

If filename is None, the default file for this list (book_cataloguing/lowercase_title_words.txt) will be used.
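
As an illustration, the sketch below writes a word list in the one-word-per-line layout described above. The helper write_word_list, the file name my_title_words.txt, the sample words, and the UTF-8 encoding are assumptions made for this example and are not part of the package. The calls to set_lowercase_title_words() are shown as comments so that the block runs without the package installed.

```python
from pathlib import Path
import tempfile


def write_word_list(path, words):
    """Write one word per line (hypothetical helper, not part of the package)."""
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")


# Hypothetical custom list; case and order do not matter.
custom_words = ["the", "a", "an", "of", "and", "versus"]
word_file = Path(tempfile.mkdtemp()) / "my_title_words.txt"
write_word_list(word_file, custom_words)

# With the package installed, the file could then be loaded like this:
#     from book_cataloguing import set_lowercase_title_words
#     set_lowercase_title_words(str(word_file))
#     set_lowercase_title_words()  # go back to the default file
```

The same layout applies to the files read by the other three set_* functions documented below.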

book_cataloguing.set_lowercase_author_words(filename: str | None = None) → None

Get a new list of lowercase words in author names from a file.

The file should contain words like “le”, “von”, and “of” that should not be capitalized when they are part of an author’s name, and that might be part of a multi-word surname (such as “von Neumann”).

The referenced file should have one word on each line. The case of the words does not matter, and they need not be sorted in any particular order.

If filename is None, the default file for this list (book_cataloguing/lowercase_author_words.txt) will be used.

book_cataloguing.set_mac_surnames(filename: str | None = None) → None

Get a new list of surnames starting with “Mac” from a file.

The file should contain names like “MacDonald”, where the fourth letter (the letter following the “Mac”) should be capitalized.

The referenced file should have one word on each line. The case of the words does not matter, and they need not be sorted in any particular order.

If filename is None, the default file for this list (book_cataloguing/mac_surnames.txt) will be used.

book_cataloguing.set_author_titles(filename: str | None = None) → None

Get a new list of author titles from a file.

The file should contain words like “lord”, “mrs”, and “president” that, when they appear in an author’s name, are likely titles rather than part of the name itself.

The referenced file should have one word on each line. The case of the words does not matter, and they need not be sorted in any particular order.

If filename is None, the default file for this list (book_cataloguing/author_titles.txt) will be used.

Main Functions

book_cataloguing.capitalize_title(title: str, handle_mc_prefix: bool = True) → str

Capitalize a book title, preserving all non-alphanumeric characters.

This function considers all non-alphanumeric characters except apostrophes to separate words, and it converts all words recognized as Roman numerals to uppercase. It also capitalizes the second letter of words starting with any letter followed by an apostrophe (e.g. O’Brien). See Examples below.
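
The word-separation rule above can be pictured with a short sketch. This is not the package's implementation: split_keeping_separators is a hypothetical helper, and using str.isalnum() as the test for an alphanumeric character (and handling only the ASCII apostrophe) is an assumption.

```python
from itertools import groupby


def is_word_char(ch):
    """Alphanumeric characters and ASCII apostrophes count as part of a word."""
    return ch.isalnum() or ch == "'"


def split_keeping_separators(text):
    """Split text into alternating runs of word and separator characters."""
    return ["".join(run) for _, run in groupby(text, key=is_word_char)]


pieces = split_keeping_separators(" THE*LORD =of tHE RIngs]")
# pieces == [" ", "THE", "*", "LORD", " =", "of", " ", "tHE", " ", "RIngs", "]"]
```

Because every separator run is kept as its own piece, joining the pieces back together reproduces the input exactly, which is how non-alphanumeric characters can survive capitalization unchanged.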

Parameters:
  • title (str) – Title to capitalize.

  • handle_mc_prefix (bool) – Whether or not to treat words starting with “mc” or “mac” differently. When True, capitalize the third letter of all words starting with “mc” (e.g. convert “mcdonald” to “McDonald”), and the fourth letter of all words starting with “mac” if they are in the list of Mac surnames. (You can change this list with the function set_mac_surnames().) These prefixes are detected case-insensitively. When False, capitalize only the first letter of such names.
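
The prefix rule for handle_mc_prefix might be sketched as follows. The function name capitalize_name_word and the two-entry MAC_SURNAMES set are hypothetical stand-ins chosen for this example, and the sketch ignores every other capitalization rule the package applies.

```python
# Hypothetical sample; the package loads its real list from a file.
MAC_SURNAMES = {"macdonald", "macarthur"}


def capitalize_name_word(word, handle_mc_prefix=True):
    """Capitalize one word, optionally applying the "mc"/"mac" prefix rule."""
    lowered = word.lower()  # prefixes are detected case-insensitively
    if handle_mc_prefix:
        if lowered.startswith("mc") and len(lowered) > 2:
            # Capitalize the third letter of every "mc" word.
            return "Mc" + lowered[2:].capitalize()
        if lowered in MAC_SURNAMES:
            # Capitalize the fourth letter only for listed "mac" surnames.
            return "Mac" + lowered[3:].capitalize()
    return lowered.capitalize()
```

In this sketch "MCCARTHY" becomes "McCarthy" and "macdonald" becomes "MacDonald", while an unlisted word such as "machine" is left as "Machine". The list lookup is what keeps ordinary words beginning with "mac" from being altered.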

Returns:

Capitalized version of title.

Return type:

str

Examples

>>> capitalize_title("the hobbit: or, there and back again")
'The Hobbit: Or, There and Back Again'
>>> capitalize_title(" THE*LORD =of tHE RIngs]")
' The*Lord =of the Rings]'
>>> capitalize_title("the thirteen-gun salute")
'The Thirteen-Gun Salute'
>>> capitalize_title("a midsummer night's dream")
"A Midsummer Night's Dream"

Handling of Roman numerals:

>>> capitalize_title("henry vi, part ii")
'Henry VI, Part II'
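
How the package decides that a word is a Roman numeral is not specified here. One common approach, shown below as an assumption rather than as the package's actual rule, is a strict pattern for the numerals 1 to 3999. The name is_roman_numeral is hypothetical.

```python
import re

# Thousands, hundreds, tens, and units, each in standard subtractive notation.
ROMAN_NUMERAL = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})",
    re.IGNORECASE,
)


def is_roman_numeral(word):
    """Return True if word is a well-formed Roman numeral (1 to 3999)."""
    # The pattern also matches the empty string, so reject that explicitly.
    return bool(word) and ROMAN_NUMERAL.fullmatch(word) is not None
```

Any rule of this kind is ambiguous for English words that happen to be valid numerals: with this pattern, "mix" (1009) is accepted just as "vi" and "xxiii" are.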

Handling of name prefixes:

>>> capitalize_title("A BIOGRAPHY OF GEORGE MACDONALD")
'A Biography of George MacDonald'
>>> capitalize_title("a biography of george macdonald", False)
'A Biography of George Macdonald'
>>> capitalize_title("a biography of patrick o'brien")
"A Biography of Patrick O'Brien"

book_cataloguing.capitalize_author(author: str, handle_mc_prefix: bool = True) → str

Capitalize the name of an author, preserving non-alphanumeric characters.

This function considers all non-alphanumeric characters except apostrophes to separate words, and it converts all words recognized as Roman numerals to uppercase. It also capitalizes the second letter of words starting with any letter followed by an apostrophe (e.g. O’Brien). See Examples below.
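
The apostrophe rule can be illustrated with a small sketch. The helper capitalize_word is hypothetical, it handles only the ASCII apostrophe, and it leaves out the Roman numeral and name-prefix rules.

```python
def capitalize_word(word):
    """Capitalize a word, including the letter after a leading "X'" prefix."""
    lowered = word.lower()
    # A single letter followed by an apostrophe, as in "o'brien".
    if len(lowered) > 2 and lowered[0].isalpha() and lowered[1] == "'":
        return lowered[0].upper() + "'" + lowered[2:].capitalize()
    return lowered.capitalize()
```

Here "o'brien" becomes "O'Brien", while "night's" becomes "Night's" because its apostrophe does not directly follow the first letter.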

Parameters:
  • author (str) – Author name to capitalize.

  • handle_mc_prefix (bool) – Whether or not to treat words starting with “mc” or “mac” differently. When True, capitalize the third letter of all words starting with “mc” (e.g. convert “mcdonald” to “McDonald”), and the fourth letter of all words starting with “mac” if they are in the list of Mac surnames. (You can change this list with the function set_mac_surnames().) These prefixes are detected case-insensitively. When False, capitalize only the first letter of such names.

Returns:

Capitalized version of author name.

Return type:

str

Examples

>>> capitalize_author("ludwig van beethoven")
'Ludwig van Beethoven'
>>> capitalize_author(" .LEO*TOLstoY =")
' .Leo*Tolstoy ='

Handling of Roman numerals:

>>> capitalize_author("pope john xxiii")
'Pope John XXIII'

Handling of name prefixes:

>>> capitalize_author("CORMAC MCCARTHY")
'Cormac McCarthy'
>>> capitalize_author("cormac mccarthy", False)
'Cormac Mccarthy'
>>> capitalize_author("patrick.o'brien")
"Patrick.O'Brien"

book_cataloguing.get_sortable_title(title: str, handle_mc_prefix: bool = True, correct_case: bool = True, smart_numbers: bool = True) → str

Return a representation of the title that is usable for sorting.

This involves removing the first word of the title if it is “a”, “an”, or “the”, and removing non-alphanumeric characters as well.

From this function’s point of view, a word separator is any combination of non-alphanumeric characters that contains a space. See Examples below.
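
A rough sketch of the separator rule and the article removal follows. It is an illustration under stated assumptions, not the package's code: normalize_separators and drop_leading_article are hypothetical names, and the sketch does not attempt capitalization, number handling, or any other rule the function applies.

```python
import re

LEADING_ARTICLES = {"a", "an", "the"}


def normalize_separators(title):
    """Collapse separator runs containing a space; delete all other runs."""

    def replace(match):
        return " " if " " in match.group() else ""

    # [\W_]+ matches a run of characters that are neither letters nor digits.
    return re.sub(r"[\W_]+", replace, title).strip()


def drop_leading_article(title):
    """Remove a leading "a", "an", or "the" if other words follow it."""
    first, _, rest = title.partition(" ")
    if first.lower() in LEADING_ARTICLES and rest:
        return rest
    return title
```

Under this rule the period in "Hob.bit" joins the two halves into one word, because the run "." contains no space, whereas the run " +" before it separates words.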

Parameters:
  • title (str) – Title to return sortable version of.

  • handle_mc_prefix (bool) – If correct_case is True (see below), this parameter is passed on as the keyword argument of the same name in the call to capitalize_title(). Default True.

  • correct_case (bool) – If True, capitalize the title with the function capitalize_title() before returning it. If False, return the title in all lowercase. Default True.

  • smart_numbers (bool) – If True, convert all Arabic numerals in the title to their written-out equivalents. See Number Handling below. Default True.

Returns:

Sortable version of title, with no leading “a”, “an”, or “the”.

Return type:

str

Number Handling

When the parameter smart_numbers is True (the default), all words in the title made entirely of ASCII numerals will be converted to their written-out equivalents. Comma-separated numbers will also be converted as if the commas were not present (e.g. “30,000” to “thirty thousand”, “1,2” to “twelve”). If a word begins with a numeral but contains letters as well, the entire word will be replaced with the ordinal form of the number which begins it. Thus “1st” will be replaced with “first”, and “21st”, “21nd”, and “21st0” will all be replaced with “twenty-first”.
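
The classification step described above (cardinal, ordinal, or not a number) can be sketched on its own, leaving out the spelling of the number in words. The helper classify_number_word and its return format are assumptions made for this example.

```python
import re


def classify_number_word(word):
    """Return ("cardinal", n), ("ordinal", n), or None for a single word."""
    without_commas = word.replace(",", "")  # commas are treated as absent
    if without_commas.isascii() and without_commas.isdigit():
        return ("cardinal", int(without_commas))
    leading_digits = re.match(r"[0-9]+", without_commas)
    if leading_digits and any(ch.isalpha() for ch in without_commas):
        # Everything after the leading digits is ignored.
        return ("ordinal", int(leading_digits.group()))
    return None
```

With this sketch, "30,000" is the cardinal 30000 and "1,2" is the cardinal 12, while "21st", "21nd", and "21st0" are all the ordinal 21.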

Examples

>>> get_sortable_title("an episode of sparrows")
'Episode of Sparrows'
>>> get_sortable_title(" `the +Hob.bit")
'Hobbit'
>>> get_sortable_title("MOSTLY  H-ARMLESS)")
'Mostly Harmless'

When correct_case is False:

>>> get_sortable_title("an episode of sparrows", correct_case=False)
'episode of sparrows'
>>> get_sortable_title(" `the +Hob.bit", correct_case=False)
'hobbit'
>>> get_sortable_title("MOSTLY  H-ARMLESS)", correct_case=False)
'mostly harmless'

With numbers in the title:

>>> get_sortable_title("20,000 leagues under the sea")
'Twenty Thousand Leagues Under the Sea'
>>> get_sortable_title("Around the World in 8,0 Days", correct_case=False)
'around the world in eighty days'
>>> get_sortable_title("the 1st 2 lives of lukas-kasha")
'First Two Lives of Lukas-Kasha'
>>> # Commas within numbers will be removed even if smart_numbers == False,
>>> # as they are non-alphanumeric
>>> get_sortable_title("20,000 leagues under the sea", smart_numbers=False)
'20000 Leagues Under the Sea'

book_cataloguing.get_sortable_author(author: str, handle_mc_prefix: bool = True, correct_case: bool = True) → str

Return author’s name in the format “last, first”.

This function considers all non-alphanumeric characters except apostrophes to separate words. It also places periods after one-letter words (assuming them to be initials), and it removes all non-alphanumeric characters in the result except for:

  • These periods,

  • All hyphens and apostrophes,

  • The comma separating the first and last names, and

  • The period after “jr” or “sr”, if applicable.

By default, the author’s surname is assumed to be one word long. However, if the last part of the name is a Roman numeral, “jr”, or “sr”, it is assumed to be part of the surname. Also, if the surname is prefixed with a word in the list of lowercase author words (such as “le” or “von”), that word is assumed to be part of the surname. (You may change this list with the function set_lowercase_author_words().) See Examples below.
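
The surname rule above might be sketched like this. The function split_given_and_surname is hypothetical, both word sets are small samples invented for the example, and a fixed set of suffixes stands in for a real Roman numeral check.

```python
# Hypothetical sample; the package reads its real list from a file.
LOWERCASE_AUTHOR_WORDS = {"le", "von", "van", "der", "the", "of"}
# Simplification: a fixed set instead of detecting any Roman numeral.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def split_given_and_surname(name):
    """Return (given_words, surname_words) for a whitespace-separated name."""
    words = name.lower().split()
    if not words:
        return [], []
    start = len(words) - 1  # by default the surname is the last word
    if start > 0 and words[start] in SUFFIXES:
        start -= 1  # a trailing suffix belongs to the surname
    while start > 0 and words[start - 1] in LOWERCASE_AUTHOR_WORDS:
        start -= 1  # so do lowercase words directly before the surname
    return words[:start], words[start:]
```

For "johannes van der doe" the surname grows leftward over "der" and "van" and stops at "johannes", which is not in the list.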

According to the Anglo-American Cataloguing Rules, authors whose names begin with “mc” should be alphabetized as if their names start with “mac”. This function replaces the prefix “mc” in this way to make that rule easier to follow; again, please see Examples.
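
The effect of this replacement on sorting can be seen in a minimal sketch. The helper mc_to_mac is a hypothetical name, and the sample surnames are chosen only for illustration.

```python
def mc_to_mac(surname):
    """Return a lowercase sort key with a leading "mc" replaced by "mac"."""
    lowered = surname.lower()
    if lowered.startswith("mc"):
        return "mac" + lowered[2:]
    return lowered


surnames = ["Madison", "McCarthy", "MacDonald", "Mabry"]
ordered = sorted(surnames, key=mc_to_mac)
# ordered == ["Mabry", "McCarthy", "MacDonald", "Madison"]
```

Without the replacement, "McCarthy" would sort after "Madison"; with it, the name files among the "Mac" surnames.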

Lastly, this function removes from the given name words such as “lord” and “mr” that are in the list of author titles. (You may change this list with the function set_author_titles().)

Parameters:
  • author (str) – Author name to return in “last, first” format.

  • handle_mc_prefix (bool) – If correct_case is True (see below), this parameter is passed on as the keyword argument of the same name in the call to capitalize_author(). Default True. Please note that this parameter does not change whether or not the “mc” prefix is replaced with “mac” as mentioned above; this behavior cannot be disabled. It only controls the capitalization of such prefixes.

  • correct_case (bool) – If True, capitalize the author’s name with the function capitalize_author() before returning it. If False, return the name in all lowercase. Default True.

Returns:

Author’s name in “last, first” format.

Return type:

str

Examples

>>> get_sortable_author("charles dickens")
'Dickens, Charles'
>>> get_sortable_author(" /Douglas#ADAMS. ")
'Adams, Douglas'
>>> get_sortable_author("GENE STRATTON-PORTER")
'Stratton-Porter, Gene'

With name suffixes:

>>> get_sortable_author("richard henry dana jr")
'Dana Jr., Richard Henry'
>>> get_sortable_author("john doe iii")
'Doe III, John'

With multi-word surnames:

>>> get_sortable_author("alexander the great")
'the Great, Alexander'
>>> get_sortable_author("johannes van der doe")
'van der Doe, Johannes'

With titles in the name:

>>> get_sortable_author("Alfred, Lord Tennyson")
'Tennyson, Alfred'
>>> get_sortable_author("president george herbert walker bush")
'Bush, George Herbert Walker'

Handling of “Mc” prefixes:

>>> get_sortable_author("cormac mccarthy")
'MacCarthy, Cormac'
>>> get_sortable_author("cormac mccarthy", correct_case=False)
'maccarthy, cormac'
>>> get_sortable_author("cormac mccarthy", handle_mc_prefix=False)
'Maccarthy, Cormac'

book_cataloguing.title_sort(iterable: Iterator[Any], /, *, key: Callable[[Any], str] | None = None, reverse: bool = False, smart_numbers: bool = True) → list[Any]

Sort the given objects as if they are book titles.

Parameters:
  • iterable (Iterator[Any]) – Iterator of objects to sort.

  • key (Optional[Callable[[Any], str]]) – Function with which to extract a comparison key from each item from the iterable. Default is None (items are compared directly).

  • reverse (bool) – Whether or not to reverse the sorted order, making it descending instead of ascending. Default False.

  • smart_numbers (bool) – This parameter is supplied as a keyword argument with the same name in the calls to get_sortable_title(). Default True.

Returns:

Sorted list of given objects.

Return type:

list[Any]

The given titles are not compared as they are; instead, each object is sorted by the return value of a call to get_sortable_title() for it. Thus, please see the documentation for that function for more details on the sorting. The calls have the correct_case argument set to False, so comparisons are case-insensitive.
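
The pattern described above, sorting objects by a value derived from each one, can be sketched with the built-in sorted(). Both functions below are hypothetical: sortable_stand_in is only a rough substitute for get_sortable_title(..., correct_case=False) so that the block runs without the package, and sort_as_titles is not the package's title_sort().

```python
def sortable_stand_in(title):
    """Rough stand-in: lowercase the title and drop a leading article."""
    words = title.lower().split()
    if len(words) > 1 and words[0] in {"a", "an", "the"}:
        words = words[1:]
    return " ".join(words)


def sort_as_titles(items, key=None, reverse=False):
    """Sort items by the sortable form of the title extracted by key."""

    def comparison_key(item):
        text = item if key is None else key(item)
        return sortable_stand_in(text)

    return sorted(items, key=comparison_key, reverse=reverse)


books = [
    {"title": "The Hobbit", "year": 1937},
    {"title": "an episode of sparrows", "year": 1955},
    {"title": "Mostly Harmless", "year": 1992},
]
ordered = sort_as_titles(books, key=lambda book: book["title"])
```

The objects themselves are returned unchanged; only the comparison uses the derived form, so "The Hobbit" is filed under H while keeping its article.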

book_cataloguing.author_sort(iterable: Iterator[Any], /, *, key: Callable[[Any], str] | None = None, reverse: bool = False) → list[Any]

Sort the given objects as if they are the authors of books.

Parameters:
  • iterable (Iterator[Any]) – Iterator of objects to sort.

  • key (Optional[Callable[[Any], str]]) – Function with which to extract a comparison key from each item from the iterable. Default is None (items are compared directly).

  • reverse (bool) – Whether or not to reverse the sorted order, making it descending instead of ascending. Default False.

Returns:

Sorted list of given objects.

Return type:

list[Any]

The given authors are sorted case-insensitively: first by last name, and then by first name. The last and first names used by this function correspond exactly to those determined by get_sortable_author(), and put before and after the comma by that function. Thus, please see its documentation for details on the sorting.
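
Sorting first by last name and then by first name can be expressed with a tuple key, as in the sketch below. It assumes names that are already in “last, first” form, and author_key is a hypothetical helper; how the package performs the comparison internally is not documented here.

```python
def author_key(sortable_name):
    """Split "last, first" into a lowercase (last, first) tuple."""
    last, _, first = sortable_name.lower().partition(", ")
    return (last, first)


names = ["Doe III, John", "Dickens, Charles", "Doe, John", "Adams, Douglas"]
by_tuple = sorted(names, key=author_key)
# by_tuple == ["Adams, Douglas", "Dickens, Charles", "Doe, John", "Doe III, John"]
by_string = sorted(names, key=str.lower)
# by_string == ["Adams, Douglas", "Dickens, Charles", "Doe III, John", "Doe, John"]
```

The two orderings differ because, in a plain string comparison, the space in "doe iii" sorts before the comma in "doe,". Comparing the last name as a separate tuple element avoids letting the punctuation influence the order.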