khoa181101
I've been working on **LTF-B4-17** (LOGOS Transformation Format, Base-4, N=17), a
new fixed-width Unicode encoding I've submitted as a formal UTC proposal.
---
## What is LTF-B4-17?
A 34-bit encoding that extends UTF-32 while keeping 100% backward compatibility:
| Encoding | Bits/char | Capacity | Efficiency |
|---|---|---|---|
| UTF-8 | 8–32 | 1,114,112 | Variable |
| UTF-16 | 16–32 | 1,114,112 | Variable |
| UTF-32 | 32 | 4,294,967,296 raw values (1,114,112 valid code points) | 100% |
| **LTF-B4-17 ★** | **34** | **17,179,869,184** | **100% (zero waste)** |
---
## Key properties
- **4× the raw UTF-32 value space**: 17,179,869,184 code points (4^17 = 2^34)
- **Identity encoding**: `encode(c) == c`, no lookup table needed
- **100% bit efficiency**: base 4 = 2², so each digit maps to exactly 2 bits
- **Full UTF-32 backward compatibility**: for values below 2^32 the top two bits are zero and the lower 32 bits are identical to UTF-32
- **+12.88B extension slots**: code points 4,294,967,296–17,179,869,183
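
The figures in the list above can be checked with plain integer arithmetic. This is only a sanity check, not code from the `logos_b4n17` package; every name in it (`N_DIGITS`, `UTF32_RAW`, `fits_in_utf32`, and so on) is made up for illustration.

```python
# Sanity check of the capacity figures; all names here are illustrative.
N_DIGITS = 17                      # base-4 digits per code unit
CAPACITY = 4 ** N_DIGITS           # total number of 34-bit values
UTF32_RAW = 2 ** 32                # raw 32-bit values
UNICODE_CODESPACE = 0x10FFFF + 1   # code points Unicode currently defines

assert CAPACITY == 2 ** 34 == 17_179_869_184
assert CAPACITY == 4 * UTF32_RAW
EXTENSION_SLOTS = CAPACITY - UTF32_RAW


def fits_in_utf32(value: int) -> bool:
    """True when the top two of the 34 bits are zero, so the low 32 bits carry the whole value."""
    return 0 <= value < UTF32_RAW


print(EXTENSION_SLOTS)                                      # 12884901888
print(fits_in_utf32(0x4E2D), fits_in_utf32(UTF32_RAW))      # True False
```

Note that the extension range works out to about 12.88 billion values, and that the Unicode codespace itself stops at 0x10FFFF, well below the 32-bit limit.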
---
## Formula (verifiable in 3 lines)
```python
# quaternary(c, 17): divide c by 4 seventeen times
# digit map: 0→00 1→01 2→10 3→11
# total bits: 17 × 2 = 34
```
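
The three comment lines above can be turned into something runnable. The sketch below is my reading of the formula, not the package's implementation; `quaternary` and `to_bits` are hypothetical helper names.

```python
def quaternary(c: int, n: int = 17) -> list[int]:
    """Return the n base-4 digits of c, most significant digit first."""
    if not 0 <= c < 4 ** n:
        raise ValueError(f"value out of range for {n} base-4 digits: {c}")
    digits = []
    for _ in range(n):
        c, d = divmod(c, 4)   # peel off the least significant digit
        digits.append(d)
    return digits[::-1]


def to_bits(digits: list[int]) -> str:
    """Map each digit to two bits (0→00, 1→01, 2→10, 3→11) and concatenate."""
    return "".join(format(d, "02b") for d in digits)


bits = to_bits(quaternary(65))
print(len(bits))       # 34
print(int(bits, 2))    # 65
```

Because each base-4 digit is exactly two binary digits, the resulting bit string is just the 34-bit binary form of `c`, which is why the encoding is an identity.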
---
## Python package (pip-installable)
```bash
pip install logos_b4n17-2.1.0-py3-none-any.whl
```
```python
from logos_b4n17 import CAPACITY, encode, decode, encode_text, decode_text, zone_of

# Identity encoding: encode(65) == 65
print(encode(65))  # 65
print(decode(65))  # 65

# Text round-trip
blob = encode_text("Hello LTF-B4-17!")
print(decode_text(blob))  # Hello LTF-B4-17!

# Zone info
z = zone_of(0x4E2D)  # '中'
print(z)  # UNICODE-BMP

# Capacity
print(CAPACITY)  # 17179869184
```
---
## Performance (x86-64, gcc -O3)
| Operation | LTF-B4-17 | UTF-32 |
|---|---|---|
| Encode single | 67,114 M/s | ~67,000 M/s |
| Decode single | 22,600 M/s | ~22,000 M/s |
| Stream encode | 36,400 M/s | N/A |
| Stream decode | 144,600 M/s | N/A |

Stream overhead vs UTF-32: **+6.25%** (2 extra bits per character, i.e. 34/32 − 1).
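
To make the 6.25% figure concrete, here is a rough sketch of one way a bit-packed stream could look. The post does not specify the stream layout, so the big-endian, back-to-back packing below is purely an assumption, and `pack34` / `unpack34` are hypothetical names rather than package functions.

```python
def pack34(values):
    """Pack 34-bit integers back to back, most significant bit first (assumed layout)."""
    acc = 0
    for v in values:
        if not 0 <= v < 1 << 34:
            raise ValueError(f"not a 34-bit value: {v}")
        acc = (acc << 34) | v
    total_bits = 34 * len(values)
    pad = -total_bits % 8                  # zero bits needed to reach a byte boundary
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")


def unpack34(blob, count):
    """Inverse of pack34; count is required because trailing padding is ambiguous."""
    total_bits = 34 * count
    acc = int.from_bytes(blob, "big") >> (len(blob) * 8 - total_bits)
    mask = (1 << 34) - 1
    return [(acc >> (34 * (count - 1 - i))) & mask for i in range(count)]


text = "Hello LTF-B4-17!"
packed = pack34([ord(ch) for ch in text])
utf32 = text.encode("utf-32-be")           # 4 bytes per character, no BOM
print(len(packed), len(utf32))             # 68 64
print(len(packed) / len(utf32) - 1)        # 0.0625
```

One thing this layout shows is that 34-bit units only land on a byte boundary every 4 characters (136 bits = 17 bytes), so a decoder needs either a character count or a padding rule.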
---
## Zone map (5 of 12 zones shown)
| Zone | Range | Size |
|---|---|---|
| ASCII | 0–127 | 128 |
| UNICODE-BMP | 243–65,535 | 65,293 |
| UNICODE-SMP | 65,536–131,071 | 65,536 |
| UTF-32 full | 0–4,294,967,295 | 4.29B |
| **LTF-B4-17-EXT** | **4,294,967,296–17,179,869,183** | **12.88B new** |
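
A range lookup over the rows above might look like the following. This is a sketch built only from the rows listed in the table, not the package's `zone_of`; the name `zone_lookup` and the "first match wins" rule are assumptions, chosen because the "UTF-32 full" row overlaps the narrower zones.

```python
# Rows copied from the zone table; the remaining zones are not listed there.
ZONES = [
    ("ASCII", 0, 127),
    ("UNICODE-BMP", 243, 65_535),
    ("UNICODE-SMP", 65_536, 131_071),
    ("UTF-32 full", 0, 4_294_967_295),
    ("LTF-B4-17-EXT", 4_294_967_296, 17_179_869_183),
]


def zone_lookup(value: int) -> str:
    """Return the first listed zone whose inclusive range contains value."""
    for name, low, high in ZONES:
        if low <= value <= high:
            return name
    raise ValueError(f"value outside the 34-bit range: {value}")


# Zone sizes follow from the inclusive ranges.
sizes = {name: high - low + 1 for name, low, high in ZONES}

print(zone_lookup(0x4E2D))        # UNICODE-BMP
print(zone_lookup(2 ** 32))       # LTF-B4-17-EXT
print(sizes["LTF-B4-17-EXT"])     # 12884901888
```

With only these rows, values 128–242 fall through to "UTF-32 full"; presumably one of the unlisted zones covers them.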
---
UTC_Proposal_LTF_B4_17: https://docs.google.com/document/d/...ouid=109480604438012146765&rtpof=true&sd=true
logos_b4n17-2.1.0-py3-none-any: https://drive.google.com/file/d/1Jxgnp8IO9_xLoTJWeH61uZ3GGXxKRks2/view?usp=drive_link
## UTC Submission
A formal proposal has been submitted to the Unicode Technical Committee
(UTC) requesting evaluation of LTF-B4-17 as a new Unicode Transformation Format.

- **Document**: UTC Proposal LTF-B4-17, submitted 2026-05-24
- **Author**: HUA VAN ANH KHOA (TAO HUA)
- **Copyright**: © 2026 AXIOM CODE 010. All Rights Reserved.

Feedback welcome — especially on stream format design and the backward-compatibility guarantee.