Error measures for DR chords

Revision as of 15:22, 15 December 2025 by Inthar (talk | contribs)
This is an expert page. It is written to allow experienced readers to learn more about the advanced elements of the topic.

This article will describe several least-squares error measures for delta-rational chords. They have the advantage of not fixing a particular interval in the chord when constructing the chord of best fit. However, like any other numerical measure of concordance or error, you should take them with a grain of salt.

Conventions and introduction

The idea motivating least-squares error measures on a chord as an approximation to a given delta signature is the following (for simplicity, let’s talk about the fully DR case first):

We want the error of a chord 1:r1:r2:...:rn (in increasing order), with n > 1, in the linear domain as an approximation to a fully delta-rational chord with signature +δ12 ... +δn, i.e. a chord

[math]\displaystyle{ x : x + \delta_1 : \cdots : x + \sum_{l=1}^n \delta_l }[/math]

with root real-valued harmonic x. Let [math]\displaystyle{ D_0 = 0, D_i = \sum_{k=1}^i \delta_k }[/math] be the delta signature +δ12 ... +δn written cumulatively.

We want to measure the error without having to fix any dyad (as one might naively fix a dyad and measure errors in the other deltas in relation to the fixed dyad). To do this we solve a least-squares error problem: use a root-sum-square error function and optimize x (and any free deltas) to minimize that function.

Domain and error model

We have two choices:

  • to measure either the linear (frequency ratio) error or the logarithmic (cents) one (called the domain);
  • the collection of intervals to sum over (which we call the error model):
    • Rooted: Only intervals from the root real-valued harmonic x are chosen.
    • Pairwise: All intervals in the chord are compared.
    • All-steps: Only intervals between adjacent notes are compared.

We arrive at the following general formula: Let [math]\displaystyle{ [n] = \{0, 1, 2, ..., n\}, }[/math] let [math]\displaystyle{ I \subseteq {[n] \choose 2} }[/math] be the error model, and let [math]\displaystyle{ f_D }[/math] represent the domain function (identity for linear, or [math]\displaystyle{ \log }[/math]). Then the error function to be minimized by optimizing [math]\displaystyle{ x }[/math] and any free deltas is:

[math]\displaystyle{ \sqrt{\sum_{i<j, \{i,j\} \in I} \Bigg( f_D \Bigg( \frac{x + D_j}{x + D_i} \Bigg) - f_D\Bigg( \frac{r_j}{r_i} \Bigg) \Bigg)^2}. }[/math]

Error function for various modes and error models
Domain Error model Error function
Linear Rooted [math]\displaystyle{ \sqrt{\sum_{i=1}^n \Bigg( \frac{x + D_i}{x} - r_i \Bigg)^2 } = \sqrt{\sum_{i=1}^n \Bigg( 1 + \frac{D_i}{x} - r_i \Bigg)^2 } }[/math]
Pairwise [math]\displaystyle{ \sqrt{\sum_{0\leq i<j\leq n} \Bigg( \frac{x + D_j}{x + D_i} - \frac{r_j}{r_i} \Bigg)^2 } }[/math]
All-steps [math]\displaystyle{ \sqrt{\sum_{0\leq i<n} \Bigg( \frac{x + D_{i+1}}{x + D_i} - \frac{r_{i+1}}{r_i} \Bigg)^2 } }[/math]
Logarithmic
(nepers)
Rooted [math]\displaystyle{ \sqrt{\sum_{i=1}^n \Bigg(\log \frac{x + D_i}{r_i x} \Bigg)^2} }[/math]
Pairwise [math]\displaystyle{ \sqrt{\sum_{0\leq i<j\leq n} \Bigg(\log \frac{x + D_j}{x + D_i} - \log \frac{r_j}{r_i} \Bigg)^2} }[/math]
All-steps [math]\displaystyle{ \sqrt{\sum_{0\leq i<n} \Bigg(\log \frac{x + D_{i+1}}{x + D_i} - \log \frac{r_{i+1}}{r_i} \Bigg)^2} }[/math]

To convert nepers to cents, multiply by [math]\displaystyle{ \frac{1200}{\log 2}. }[/math]

Solution methods

This section gets into the depths of mathematical optimization methods used to minimize DR error. (Optimization is a whole field and there are many different methods; the reason these methods exist is to find solutions when finding them symbolically is infeasible.)

Analytic

Fully DR, rooted linear

Setting the derivative to 0 gives us the closed-form solution

[math]\displaystyle{ x = \frac{\sum_{i=1}^n D_i }{-n + \sum_{i=1}^n r_i}, }[/math]

which can be plugged back into

[math]\displaystyle{ \sqrt{\sum_{1=1}^n \Bigg( 1 + \frac{D_i}{x} - r_i \Bigg)^2 } }[/math]

to obtain the least-squares linear error.

Partially DR (one related delta set, one free variable), rooted linear

Suppose we wish to approximate a target delta signature of the form [math]\displaystyle{ +\delta_1 +? +\delta_3 }[/math] with the chord [math]\displaystyle{ 1:r_1:r_2:r_3 }[/math] (where the +? is free to vary). The least-squares problem is to minimize by varying x and y the following:

[math]\displaystyle{ \displaystyle{\sqrt{\bigg(\frac{x + \delta_1}{x} - r_1 \bigg)^2 + \bigg(\frac{x+\delta_1 + y}{x} - r_2 \bigg)^2 + \bigg(\frac{x+\delta_1 + y + \delta_3}{x} - r_3 \bigg)^2 }}, }[/math]

where y represents the free delta +?.

We can set the partial derivatives with respect to x and y of the inner expression equal to zero (since the derivative of sqrt() is never 0) and use SymPy to solve the system symbolically:

import sympy
x = sympy.Symbol("x", real=True)
y = sympy.Symbol("y", real=True)
d1 = sympy.Symbol("\\delta_{1}", real=True)
d2 = sympy.Symbol("\\delta_{2}", real=True)
d3 = sympy.Symbol("\\delta_{3}", real=True)
r1 = sympy.Symbol("r_1", real=True)
r2 = sympy.Symbol("r_2", real=True)
r3 = sympy.Symbol("r_3", real=True)
err_squared = ((x + d1) / x - r1) ** 2 + ((x + d1 + y) / x - r2) ** 2 + ((x + d1 + y + d3) / x - r3) ** 2
err_squared.expand()
err_squared_x = sympy.diff(err_squared, x)
err_squared_y = sympy.diff(err_squared, y)
sympy.nonlinsolve([err_squared_x, err_squared_y], [x, y])

The unique solution with x > 0 is

[math]\displaystyle{ x = \frac{2 \delta_1 + \delta_3 + 2y}{r_2 + r_3 - 2}, }[/math]

[math]\displaystyle{ y = \frac{- 2 \delta_1^2 r_1 + \delta_1^2 r_2 + \delta_1^2 r_3 - \delta_1 \delta_3 r_1 + \delta_1 \delta_3 r_2 - \delta_1 \delta_3 r_3 + \delta_1 \delta_3 + \delta_3^2 r_2 - \delta_3^2}{2 \delta_1 r_1 - 2 \delta_1 - \delta_3 r_2 + \delta_3 r_3}. }[/math]

Grid method (FDR case)

<syntaxhighlight lang="python"> class GridSolution:

   def __init__(self, x, fx):
       self.x = x
       self.fx = fx
   
   def __repr__(self):
       return f"GridSolution(x={self.x:.5f}, fx={self.fx:.5f})"

def grid_method(f, window_size=100, coarse_steps=1000, fine_steps=1000):

   best_error = float("inf")
   best_x = 0
   
   coarse_step = window_size / coarse_steps
   # Fine step partitions one coarse step into smaller pieces
   fine_step = coarse_step / fine_steps 
   
   # --- Phase 1: Coarse Grid Search ---
   # Search in the window (0, window_size]
   for i in range(1, coarse_steps + 1):
       x = i * coarse_step
       fx = f(x)
       if fx < best_error:
           best_error = fx
           best_x = x
   # --- Phase 2: Fine Grid Search ---
   # Center the new search window around the best_x found in Phase 1
   # We search from (best_x - coarse_step/2) to (best_x + coarse_step/2)
   
   fine_window_lower = best_x - (coarse_step / 2)
   
   for j in range(1, fine_steps + 1):
       x = fine_window_lower + (j * fine_step)
       fx = f(x)
       if fx < best_error:
           best_error = fx
           best_x = x
           
   return GridSolution(best_x, best_error)

<syntaxhighlight>

BFGS-B (one related delta set, arbitrary free deltas)

We let x1 = x and include additional free variables x2, ..., xn, one for every additional +?, after coalescing strings of consecutive +?'s into one +? and after trimming leading and trailing free delta segments.

BFGS-B is a quasi-Newton optimization method (based on BFGS) particularly suited for this problem:

  • The error function is smooth, allowing use of gradients
  • Fast convergence, requiring at worst 20 iterations for accuracy
  • Naturally deals with the x > 0 constraint using a log barrier and minimizing the transformed function using the unconstrained BFGS method
  • Acceptable memory usage given a realistic number of parameters for practical DR chords (up to 3 interior free delta segments, thus 4 parameters).

It is a quasi-Newton method because it uses an approximation of the Hessian (matrix of mixed second partial derivatives) of the error function at each step.

In the Python implementation below, x represents the vector [math]\displaystyle{ (x_1, x_2, ..., x_n), }[/math] x0 is the initial guess for the solution, and f is the error function.

<syntaxhighlight lang="python"> import numpy as np import math

class BFGSSolution:

   def __init__(self, x, fx, iterations, success):
       self.x = x
       self.fx = fx
       self.iterations = iterations
       self.success = success

def numerical_gradient(f, x, eps=1e-8):

   grad = np.zeros_like(x)
   for i in range(len(x)):
       x_plus = x.copy()
       x_minus = x.copy()
       x_plus[i] += eps
       x_minus[i] -= eps
       grad[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
   return grad

def bfgs(f, grad, x0, max_iterations=100, tolerance=1e-5):

   x = np.array(x0, dtype=float)
   n = len(x)
   fx = f(x)
   g = grad(x)
   
   # Approximate inverse Hessian
   # Initial approximation is the identity matrix
   H = np.eye(n)
   
   for i in range(max_iterations):
       # 0: Check convergence
       grad_norm = np.linalg.norm(g)
       if grad_norm < tolerance:
           return BFGSSolution(x, fx, i, True)
       
       # 1: Set search direction p (negative gradient direction transformed by H)
       # p = -H * g
       p = -H @ g
       
       # 2: Get alpha satisfying Wolfe conditions (Armijo rule and curvature condition)
       c1 = 1e-4
       c2 = 0.9
       max_line_search = 20
       rho_ls = 0.5
       
       gp = np.dot(g, p)
       alpha = 1.0
       
       for _ in range(max_line_search):
           x_next_guess = x + alpha * p
           
           # Check Armijo rule
           # f(x + alpha*p) <= f(x) + c1 * alpha * p^T * g
           if f(x_next_guess) <= fx + c1 * alpha * gp:
               # Check Curvature condition
               # -p^T * grad(x_next) <= -c2 * p^T * g
               g_next_guess = grad(x_next_guess)
               if -np.dot(p, g_next_guess) <= -c2 * gp:
                   break
           
           alpha *= rho_ls
       
       # 3: Set s = alpha * p and x_next = x + s
       s = alpha * p
       x_next = x + s
       
       # 4: Set y = grad(x_next) - grad(x)
       # We re-calculate grad(x_next) here to match the strict logic flow, 
       # though optimization could reuse g_next_guess from the successful line search.
       g_next = grad(x_next)
       y = g_next - g
       
       # 5: BFGS Update
       # Update H += U + V
       sy = np.dot(s, y)
       
       # Prevent division by zero if step size was extremely small
       if sy == 0: 
           # In a robust implementation, you might reset H to Identity here
           break 
           
       Hy = H @ y
       
       # Calculate scalar for the first term: (s^T y + y^T H y) / (s^T y)^2
       scalar1 = (sy + np.dot(y, Hy)) / (sy * sy)
       U = scalar1 * np.outer(s, s)
       
       # Calculate the second term matrices
       # W = (H y) s^T + s (y^T H)
       # Note: Since H is symmetric, y^T H is equivalent to (H y)^T
       W = np.outer(Hy, s) + np.outer(s, Hy)
       V = (-1 / sy) * W
       
       H = H + U + V
       
       # Update x, fx, and g for next iteration
       x = x_next
       fx = f(x_next)
       g = g_next
   return BFGSSolution(x, fx, max_iterations, False)

def bfgs_barrier(f, bounds, x0, history_size=10, max_iterations=100, tolerance=1e-5, barrier_weight=1e-4):

   """
   Solves bounded optimization using a Log-Barrier method wrapped around BFGS.
   Note: x0 must be strictly feasible (inside bounds).
   """
   def transformed_f(x):
       penalty = 0
       for i, val in enumerate(x):
           lower, upper = bounds[i]
           
           # Hard cutoff prevents log domain errors during line search exploration
           if (lower is not None and val <= lower) or (upper is not None and val >= upper):
               return float("inf")
           
           # Log barrier penalties
           if lower is not None:
               penalty -= barrier_weight * math.log(val - lower)
           if upper is not None:
               penalty -= barrier_weight * math.log(upper - val)
       
       return f(x) + penalty
   
   # Use the transformed function for gradients as well
   grad = lambda x: numerical_gradient(transformed_f, x)
   
   result = bfgs(transformed_f, grad, x0, history_size, max_iterations, tolerance)
   
   # Restore actual function value
   result.fx = f(result.x)
   return result

<syntaxhighlight>

L-BFGS-B (one related delta set, arbitrary free deltas)

L-BFGS-B is an approximation to BFGS-B limiting memory usage, particularly suited for high-dimensional problems but nevertheless agreeing very well with BFGS-B for typical DR test cases. Python code for the L-BFGS method is provided below.

<syntaxhighlight lang="python"> import numpy as np import math

class LBFGSSolution:

   def __init__(self, x, fx, iterations, success):
       self.x = x
       self.fx = fx
       self.iterations = iterations
       self.success = success

def numerical_gradient(f, x, eps=1e-8):

   grad = np.zeros_like(x)
   for i in range(len(x)):
       x_plus = x.copy()
       x_minus = x.copy()
       x_plus[i] += eps
       x_minus[i] -= eps
       grad[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
   return grad

def l_bfgs(f, grad, x0, history_size=10, max_iterations=100, tolerance=1e-5):

   x = np.array(x0, dtype=float) # Ensure float array
   fx = f(x)
   g = grad(x)
   
   s_history = []
   y_history = []
   rho_history = []
   
   for iteration in range(max_iterations):
       grad_norm = np.linalg.norm(g)
       if grad_norm < tolerance:
           return LBFGSSolution(x, fx, iteration, True)
       
       # --- Two-loop recursion ---
       q = g.copy()
       
       # We need to store alpha to use it in the second loop
       m = len(s_history)
       alpha = [0.0] * m 
       
       # First loop (Backward)
       for i in range(m - 1, -1, -1):
           alpha[i] = rho_history[i] * np.dot(s_history[i], q)
           q = q - alpha[i] * y_history[i]
       
       # Initial Hessian approximation scaling
       gamma = 1.0
       if m > 0:
           gamma = np.dot(s_history[-1], y_history[-1]) / np.dot(y_history[-1], y_history[-1])
       
       z = q * gamma
       # Second loop (Forward)
       for i in range(m):
           beta = rho_history[i] * np.dot(y_history[i], z)
           z = z + s_history[i] * (alpha[i] - beta)
       
       p = -z # Search direction
       # --- Line search (Backtracking) ---
       step_size = 1.0
       c1 = 1e-4
       rho_ls = 0.5
       max_line_search = 20
       
       # Check for sufficient decrease (Armijo rule)
       # f(x + alpha*p) <= f(x) + c1 * alpha * p^T * g
       
       g_dot_p = np.dot(g, p)
       
       for _ in range(max_line_search):
           x_new = x + step_size * p
           fx_new = f(x_new)
           
           if fx_new <= fx + c1 * step_size * g_dot_p:
               break
           step_size *= rho_ls
       else:
           # If line search fails to find a better point, stop or accept best attempt
           # (Here we just accept the last reduced step_size)
           pass 
       
       # --- Update History ---
       s = x_new - x
       g_new = grad(x_new)
       y = g_new - g
       
       sy = np.dot(s, y)
       
       if sy > 1e-10: # Ensure curvature condition
           s_history.append(s)
           y_history.append(y)
           rho_history.append(1.0 / sy)
           
           # Maintain history size
           if len(s_history) > history_size:
               s_history.pop(0)
               y_history.pop(0)
               rho_history.pop(0)
           
       x = x_new
       fx = fx_new
       g = g_new
   
   return LBFGSSolution(x, fx, max_iterations, False)

<syntaxhighlight>

External links