Error measures for DR chords

This is an expert page. It is written to allow experienced readers to learn more about the advanced elements of the topic.

This article will describe several least-squares error measures for delta-rational chords. They have the advantage of not fixing a particular interval in the chord when constructing the chord of best fit. However, like any other numerical measure of concordance or error, you should take them with a grain of salt.

Conventions and introduction

The idea motivating least-squares error measures on a chord as an approximation to a given delta signature is the following (for simplicity, let’s talk about the fully DR case first):

We want the error of a chord 1:r₁:r₂:...:r_n (in increasing order), with n > 1, in the linear domain as an approximation to a fully delta-rational chord with signature +δ₁ +δ₂ ... +δ_n, i.e. a chord

[math]\displaystyle{ x : x + \delta_1 : \cdots : x + \sum_{l=1}^n \delta_l }[/math]

with root real-valued harmonic x. Let [math]\displaystyle{ D_0 = 0, D_i = \sum_{k=1}^i \delta_k }[/math] be the delta signature +δ₁ +δ₂ ... +δ_n written cumulatively.

We want to measure the error without having to fix any dyad (as one might naively fix a dyad and measure errors in the other deltas in relation to the fixed dyad). To do this we solve a least-squares error problem: use a root-sum-square error function and optimize x (and any free deltas) to minimize that function.

Domain and error model

We have two choices:

to measure either the linear (frequency ratio) error or the logarithmic (cents) one (called the domain);
the collection of intervals to sum over (which we call the error model):
- Rooted: Only intervals from the root real-valued harmonic x are chosen.
- Pairwise: All intervals in the chord are compared.
- All-steps: Only intervals between adjacent notes are compared.

We arrive at the following general formula: Let [math]\displaystyle{ [n] = \{0, 1, 2, ..., n\}, }[/math] let [math]\displaystyle{ I \subseteq {[n] \choose 2} }[/math] be the error model, and let [math]\displaystyle{ f_D }[/math] represent the domain function (identity for linear, or [math]\displaystyle{ \log }[/math]). Then the error function to be minimized by optimizing [math]\displaystyle{ x }[/math] and any free deltas is:

[math]\displaystyle{ \sqrt{\sum_{i<j, \{i,j\} \in I} \Bigg( f_D \Bigg( \frac{x + D_j}{x + D_i} \Bigg) - f_D\Bigg( \frac{r_j}{r_i} \Bigg) \Bigg)^2}. }[/math]

Error function for various modes and error models
Domain	Error model	Error function
Linear	Rooted	[math]\displaystyle{ \sqrt{\sum_{i=1}^n \Bigg( \frac{x + D_i}{x} - r_i \Bigg)^2 } = \sqrt{\sum_{i=1}^n \Bigg( 1 + \frac{D_i}{x} - r_i \Bigg)^2 } }[/math]
	Pairwise	[math]\displaystyle{ \sqrt{\sum_{0\leq i<j\leq n} \Bigg( \frac{x + D_j}{x + D_i} - \frac{r_j}{r_i} \Bigg)^2 } }[/math]
	All-steps	[math]\displaystyle{ \sqrt{\sum_{0\leq i<n} \Bigg( \frac{x + D_{i+1}}{x + D_i} - \frac{r_{i+1}}{r_i} \Bigg)^2 } }[/math]
Logarithmic (nepers)	Rooted	[math]\displaystyle{ \sqrt{\sum_{i=1}^n \Bigg(\log \frac{x + D_i}{r_i x} \Bigg)^2} }[/math]
	Pairwise	[math]\displaystyle{ \sqrt{\sum_{0\leq i<j\leq n} \Bigg(\log \frac{x + D_j}{x + D_i} - \log \frac{r_j}{r_i} \Bigg)^2} }[/math]
	All-steps	[math]\displaystyle{ \sqrt{\sum_{0\leq i<n} \Bigg(\log \frac{x + D_{i+1}}{x + D_i} - \log \frac{r_{i+1}}{r_i} \Bigg)^2} }[/math]

To convert nepers to cents, multiply by [math]\displaystyle{ \frac{1200}{\log 2}. }[/math]

Solution methods

This section gets into the depths of mathematical optimization methods used to minimize DR error. (Optimization is a whole field and there are many different methods; the reason these methods exist is to find solutions when finding them symbolically is infeasible.)

Analytic

Fully DR, rooted linear

Setting the derivative to 0 gives us the closed-form solution

[math]\displaystyle{ x = \frac{\sum_{i=1}^n D_i }{-n + \sum_{i=1}^n r_i}, }[/math]

which can be plugged back into

[math]\displaystyle{ \sqrt{\sum_{1=1}^n \Bigg( 1 + \frac{D_i}{x} - r_i \Bigg)^2 } }[/math]

to obtain the least-squares linear error.

Partially DR (one related delta set, one free variable), rooted linear

Suppose we wish to approximate a target delta signature of the form [math]\displaystyle{ +\delta_1 +? +\delta_3 }[/math] with the chord [math]\displaystyle{ 1:r_1:r_2:r_3 }[/math] (where the +? is free to vary). The least-squares problem is to minimize by varying x and y the following:

[math]\displaystyle{ \displaystyle{\sqrt{\bigg(\frac{x + \delta_1}{x} - r_1 \bigg)^2 + \bigg(\frac{x+\delta_1 + y}{x} - r_2 \bigg)^2 + \bigg(\frac{x+\delta_1 + y + \delta_3}{x} - r_3 \bigg)^2 }}, }[/math]

where y represents the free delta +?.

We can set the partial derivatives with respect to x and y of the inner expression equal to zero (since the derivative of sqrt() is never 0) and use SymPy to solve the system symbolically:

import sympy
x = sympy.Symbol("x", real=True)
y = sympy.Symbol("y", real=True)
d1 = sympy.Symbol("\\delta_{1}", real=True)
d2 = sympy.Symbol("\\delta_{2}", real=True)
d3 = sympy.Symbol("\\delta_{3}", real=True)
r1 = sympy.Symbol("r_1", real=True)
r2 = sympy.Symbol("r_2", real=True)
r3 = sympy.Symbol("r_3", real=True)
err_squared = ((x + d1) / x - r1) ** 2 + ((x + d1 + y) / x - r2) ** 2 + ((x + d1 + y + d3) / x - r3) ** 2
err_squared.expand()
err_squared_x = sympy.diff(err_squared, x)
err_squared_y = sympy.diff(err_squared, y)
sympy.nonlinsolve([err_squared_x, err_squared_y], [x, y])

The unique solution with x > 0 is

[math]\displaystyle{ x = \frac{2 \delta_1 + \delta_3 + 2y}{r_2 + r_3 - 2}, }[/math]

[math]\displaystyle{ y = \frac{- 2 \delta_1^2 r_1 + \delta_1^2 r_2 + \delta_1^2 r_3 - \delta_1 \delta_3 r_1 + \delta_1 \delta_3 r_2 - \delta_1 \delta_3 r_3 + \delta_1 \delta_3 + \delta_3^2 r_2 - \delta_3^2}{2 \delta_1 r_1 - 2 \delta_1 - \delta_3 r_2 + \delta_3 r_3}. }[/math]

Grid method (FDR case)

<syntaxhighlight lang="python"> class GridSolution:

   def __init__(self, x, fx):
       self.x = x
       self.fx = fx
   
   def __repr__(self):
       return f"GridSolution(x={self.x:.5f}, fx={self.fx:.5f})"

def grid_method(f, window_size=100, coarse_steps=1000, fine_steps=1000):

   best_error = float("inf")
   best_x = 0
   
   coarse_step = window_size / coarse_steps
   # Fine step partitions one coarse step into smaller pieces
   fine_step = coarse_step / fine_steps 
   
   # --- Phase 1: Coarse Grid Search ---
   # Search in the window (0, window_size]
   for i in range(1, coarse_steps + 1):
       x = i * coarse_step
       fx = f(x)
       if fx < best_error:
           best_error = fx
           best_x = x

   # --- Phase 2: Fine Grid Search ---
   # Center the new search window around the best_x found in Phase 1
   # We search from (best_x - coarse_step/2) to (best_x + coarse_step/2)
   
   fine_window_lower = best_x - (coarse_step / 2)
   
   for j in range(1, fine_steps + 1):
       x = fine_window_lower + (j * fine_step)
       fx = f(x)
       if fx < best_error:
           best_error = fx
           best_x = x
           
   return GridSolution(best_x, best_error)

BFGS-B (one related delta set, arbitrary free deltas)

We let x₁ = x and include additional free variables x₂, ..., x_n, one for every additional +?, after coalescing strings of consecutive +?'s into one +? and after trimming leading and trailing free delta segments.

BFGS-B is a quasi-Newton optimization method (based on BFGS) particularly suited for this problem:

The error function is smooth, allowing use of gradients
Fast convergence, requiring at worst 20 iterations for accuracy
Naturally deals with the x > 0 constraint using a log barrier and minimizing the transformed function using the unconstrained BFGS method
Acceptable memory usage given a realistic number of parameters for practical DR chords (up to 3 interior free delta segments, thus 4 parameters).

It is a quasi-Newton method because it uses an approximation of the Hessian (matrix of mixed second partial derivatives) of the error function at each step.

In the Python implementation below, x represents the vector [math]\displaystyle{ (x_1, x_2, ..., x_n), }[/math] x0 is the initial guess for the solution, and f is the error function.

<syntaxhighlight lang="python"> import numpy as np import math

class BFGSSolution:

   def __init__(self, x, fx, iterations, success):
       self.x = x
       self.fx = fx
       self.iterations = iterations
       self.success = success

def numerical_gradient(f, x, eps=1e-8):

   grad = np.zeros_like(x)
   for i in range(len(x)):
       x_plus = x.copy()
       x_minus = x.copy()
       x_plus[i] += eps
       x_minus[i] -= eps
       grad[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
   return grad

def bfgs(f, grad, x0, max_iterations=100, tolerance=1e-5):

   x = np.array(x0, dtype=float)
   n = len(x)
   fx = f(x)
   g = grad(x)
   
   # Approximate inverse Hessian
   # Initial approximation is the identity matrix
   H = np.eye(n)
   
   for i in range(max_iterations):
       # 0: Check convergence
       grad_norm = np.linalg.norm(g)
       if grad_norm < tolerance:
           return BFGSSolution(x, fx, i, True)
       
       # 1: Set search direction p (negative gradient direction transformed by H)
       # p = -H * g
       p = -H @ g
       
       # 2: Get alpha satisfying Wolfe conditions (Armijo rule and curvature condition)
       c1 = 1e-4
       c2 = 0.9
       max_line_search = 20
       rho_ls = 0.5
       
       gp = np.dot(g, p)
       alpha = 1.0
       
       for _ in range(max_line_search):
           x_next_guess = x + alpha * p
           
           # Check Armijo rule
           # f(x + alpha*p) <= f(x) + c1 * alpha * p^T * g
           if f(x_next_guess) <= fx + c1 * alpha * gp:
               # Check Curvature condition
               # -p^T * grad(x_next) <= -c2 * p^T * g
               g_next_guess = grad(x_next_guess)
               if -np.dot(p, g_next_guess) <= -c2 * gp:
                   break
           
           alpha *= rho_ls
       
       # 3: Set s = alpha * p and x_next = x + s
       s = alpha * p
       x_next = x + s
       
       # 4: Set y = grad(x_next) - grad(x)
       # We re-calculate grad(x_next) here to match the strict logic flow, 
       # though optimization could reuse g_next_guess from the successful line search.
       g_next = grad(x_next)
       y = g_next - g
       
       # 5: BFGS Update
       # Update H += U + V
       sy = np.dot(s, y)
       
       # Prevent division by zero if step size was extremely small
       if sy == 0: 
           # In a robust implementation, you might reset H to Identity here
           break 
           
       Hy = H @ y
       
       # Calculate scalar for the first term: (s^T y + y^T H y) / (s^T y)^2
       scalar1 = (sy + np.dot(y, Hy)) / (sy * sy)
       U = scalar1 * np.outer(s, s)
       
       # Calculate the second term matrices
       # W = (H y) s^T + s (y^T H)
       # Note: Since H is symmetric, y^T H is equivalent to (H y)^T
       W = np.outer(Hy, s) + np.outer(s, Hy)
       V = (-1 / sy) * W
       
       H = H + U + V
       
       # Update x, fx, and g for next iteration
       x = x_next
       fx = f(x_next)
       g = g_next

   return BFGSSolution(x, fx, max_iterations, False)

def bfgs_barrier(f, bounds, x0, history_size=10, max_iterations=100, tolerance=1e-5, barrier_weight=1e-4):

   """
   Solves bounded optimization using a Log-Barrier method wrapped around BFGS.
   Note: x0 must be strictly feasible (inside bounds).
   """
   def transformed_f(x):
       penalty = 0
       for i, val in enumerate(x):
           lower, upper = bounds[i]
           
           # Hard cutoff prevents log domain errors during line search exploration
           if (lower is not None and val <= lower) or (upper is not None and val >= upper):
               return float("inf")
           
           # Log barrier penalties
           if lower is not None:
               penalty -= barrier_weight * math.log(val - lower)
           if upper is not None:
               penalty -= barrier_weight * math.log(upper - val)
       
       return f(x) + penalty
   
   # Use the transformed function for gradients as well
   grad = lambda x: numerical_gradient(transformed_f, x)
   
   result = bfgs(transformed_f, grad, x0, history_size, max_iterations, tolerance)
   
   # Restore actual function value
   result.fx = f(result.x)
   return result

L-BFGS-B (one related delta set, arbitrary free deltas)

L-BFGS-B is an approximation to BFGS-B limiting memory usage, particularly suited for high-dimensional problems but nevertheless agreeing very well with BFGS-B for typical DR test cases. Python code for the L-BFGS method is provided below.

<syntaxhighlight lang="python"> import numpy as np import math

class LBFGSSolution:

   def __init__(self, x, fx, iterations, success):
       self.x = x
       self.fx = fx
       self.iterations = iterations
       self.success = success

def numerical_gradient(f, x, eps=1e-8):

   grad = np.zeros_like(x)
   for i in range(len(x)):
       x_plus = x.copy()
       x_minus = x.copy()
       x_plus[i] += eps
       x_minus[i] -= eps
       grad[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
   return grad

def l_bfgs(f, grad, x0, history_size=10, max_iterations=100, tolerance=1e-5):

   x = np.array(x0, dtype=float) # Ensure float array
   fx = f(x)
   g = grad(x)
   
   s_history = []
   y_history = []
   rho_history = []
   
   for iteration in range(max_iterations):
       grad_norm = np.linalg.norm(g)
       if grad_norm < tolerance:
           return LBFGSSolution(x, fx, iteration, True)
       
       # --- Two-loop recursion ---
       q = g.copy()
       
       # We need to store alpha to use it in the second loop
       m = len(s_history)
       alpha = [0.0] * m 
       
       # First loop (Backward)
       for i in range(m - 1, -1, -1):
           alpha[i] = rho_history[i] * np.dot(s_history[i], q)
           q = q - alpha[i] * y_history[i]
       
       # Initial Hessian approximation scaling
       gamma = 1.0
       if m > 0:
           gamma = np.dot(s_history[-1], y_history[-1]) / np.dot(y_history[-1], y_history[-1])
       
       z = q * gamma

       # Second loop (Forward)
       for i in range(m):
           beta = rho_history[i] * np.dot(y_history[i], z)
           z = z + s_history[i] * (alpha[i] - beta)
       
       p = -z # Search direction

       # --- Line search (Backtracking) ---
       step_size = 1.0
       c1 = 1e-4
       rho_ls = 0.5
       max_line_search = 20
       
       # Check for sufficient decrease (Armijo rule)
       # f(x + alpha*p) <= f(x) + c1 * alpha * p^T * g
       
       g_dot_p = np.dot(g, p)
       
       for _ in range(max_line_search):
           x_new = x + step_size * p
           fx_new = f(x_new)
           
           if fx_new <= fx + c1 * step_size * g_dot_p:
               break
           step_size *= rho_ls
       else:
           # If line search fails to find a better point, stop or accept best attempt
           # (Here we just accept the last reduced step_size)
           pass 
       
       # --- Update History ---
       s = x_new - x
       g_new = grad(x_new)
       y = g_new - g
       
       sy = np.dot(s, y)
       
       if sy > 1e-10: # Ensure curvature condition
           s_history.append(s)
           y_history.append(y)
           rho_history.append(1.0 / sy)
           
           # Maintain history size
           if len(s_history) > history_size:
               s_history.pop(0)
               y_history.pop(0)
               rho_history.pop(0)
           
       x = x_new
       fx = fx_new
       g = g_new
   
   return LBFGSSolution(x, fx, max_iterations, False)

External links

Inthar's DR chord explorer (includes least-squares error calculation in both domains and multiple error models)