TRON block production: the SR block production mechanism

SRs (Super Representatives) run on DPoS consensus: the producing nodes take turns producing blocks in time order.

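DPoS consensus in brief

DPoS stands for Delegated Proof of Stake, an improvement on the PoS mechanism.
Compared with PoS it is more centralized. In plain terms there are two main roles:

Stakeholders (token holders) vote to elect delegates.
Delegates produce blocks and share the rewards with their voters.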

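Under DPoS, the algorithm requires the system to do three things:

Randomly assign the order in which producers take their turns;
Produce blocks strictly in that order; a block produced out of order is invalid;
Reshuffle once per cycle to scramble the previous order.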
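
The three rules above can be sketched in a few lines. This is a generic illustration of DPoS scheduling, not TRON's implementation: the class `DposRoundSketch`, its method names, and the use of a per-round seed as the shared source of randomness are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of generic DPoS scheduling; not taken from java-tron.
final class DposRoundSketch {

  // Returns a shuffled copy of the producers for one round. The seed stands in for
  // whatever shared randomness a real chain uses so that every node derives the same order.
  static List<String> scheduleForRound(List<String> producers, long roundSeed) {
    List<String> order = new ArrayList<>(producers);
    Collections.shuffle(order, new Random(roundSeed));
    return order;
  }

  // A block is accepted only if it comes from the producer scheduled for that position.
  static boolean isValidProducer(List<String> order, int positionInRound, String producer) {
    return order.get(positionInRound % order.size()).equals(producer);
  }
}
```

Calling `scheduleForRound` with a new seed for each cycle gives the periodic reshuffle, and `isValidProducer` is the check that rejects out-of-order blocks.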

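The delegates' main responsibilities are:

Keep their node running properly;
Collect transactions from the network;
Verify transactions and pack them into blocks;
Broadcast blocks; other nodes verify each block and then add it to their own database;
Lead and promote the development of the blockchain project.

That is the general idea. Next, the SR block production principle is analyzed.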

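Block production mechanism

Note that TRON has adjusted the DPoS block production mechanism, so it does not follow the scheme above exactly. As for why... those who know, know.

The overall block production flow

A producing node runs a scheduled task at intervals of no more than 3 seconds and checks whether it is its turn to produce a block.
If it is, the node rolls back its current transaction state and packs the transactions from the transaction pool.
After packing succeeds, it broadcasts the block to the other nodes.
It then processes the block it just produced; this step exists so that the block goes through the solidification logic.
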
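Key points to watch in the block production mechanism:

How the 27 nodes take turns producing blocks
How a node knows it is its turn to produce
What happens after a block is produced
How abnormal cases are handled
- What if block production fails
- What if a block is produced but never gets broadcast
- What if the block from the previous node is not received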

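How the 27 nodes take turns producing blocks

There are 27 nodes in a distributed environment, with no central node to schedule them. This is a typical Byzantine generals problem.
The nodes are coordinated through a strict time wheel.
What does that mean?

Entry point of the block production logic: DposTask.init()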

public void init() {

  if (!dposService.isEnable() || StringUtils.isEmpty(dposService.getMiners())) {
    return;
  }

  Runnable runnable = () -> {
    while (isRunning) {
      try {
        if (dposService.isNeedSyncCheck()) {
          Thread.sleep(1000);
          dposService.setNeedSyncCheck(dposSlot.getTime(1) < System.currentTimeMillis());
        } else {
          // Block production interval: 3 s
          // The modulo below aligns the wake-up time to a whole interval
          long time =
              BLOCK_PRODUCED_INTERVAL - System.currentTimeMillis() % BLOCK_PRODUCED_INTERVAL;
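          // Sleep for `time` ms. The value is derived from the current system time,
          // so it is usually not a whole number of seconds.
          // Suppose System.currentTimeMillis() = 1647161596195:
          // 3000 - 1647161596195 % 3000 gives time = 1805.
          // Why do this? It keeps execution on a strict 3-second grid: taking the current
          // timestamp modulo 3000 tells us how many milliseconds remain until the next multiple of 3000.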
          Thread.sleep(time);
          // Block production logic
          State state = produceBlock();
          if (!State.OK.equals(state)) {
            logger.info("Produce block failed: {}", state);
          }
        }
      } catch (InterruptedException e) {
        logger.warn("Produce block task interrupted.");
        Thread.currentThread().interrupt();
      } catch (Throwable throwable) {
        logger.error("Produce block error.", throwable);
      }
    }
  };
  produceThread = new Thread(runnable, "DPosMiner");
  produceThread.start();
  logger.info("DPoS task started.");
}
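
To make the sleep-time arithmetic easier to check, here is a minimal sketch that isolates it. The class `SlotAlignment` and the method `delayToNextBoundary` are hypothetical names introduced for this example; only the constant value 3000 and the formula come from the code above.

```java
// Hypothetical helper, not part of java-tron: isolates the sleep-time arithmetic.
final class SlotAlignment {
  static final long BLOCK_PRODUCED_INTERVAL = 3000L; // ms, same value as in the source

  // Milliseconds to sleep so that the thread wakes on the next multiple of the interval.
  static long delayToNextBoundary(long nowMillis) {
    return BLOCK_PRODUCED_INTERVAL - nowMillis % BLOCK_PRODUCED_INTERVAL;
  }

  public static void main(String[] args) {
    long now = 1647161596195L;                                    // timestamp from the comment above
    long delay = delayToNextBoundary(now);
    System.out.println(delay);                                    // 1805
    System.out.println((now + delay) % BLOCK_PRODUCED_INTERVAL);  // 0, i.e. exactly on the grid
  }
}
```

One detail follows from the formula: when the current time is already on a boundary, the result is a full 3000 ms rather than 0, so the loop never spins without sleeping.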

Core logic: produceBlock()

This code embodies the time wheel mechanism in the block production logic.

private State produceBlock() {

  State state = stateManager.getState();
  if (!State.OK.equals(state)) {
    return state;
  }

  synchronized (dposService.getBlockHandle().getLock()) {
    // Get a slot; the details are explained further down
    long slot = dposSlot.getSlot(System.currentTimeMillis() + 50);
    if (slot == 0) {
      return State.NOT_TIME_YET;
    }
    // Based on the current slot, decide whether it is this node's turn
    // Look up the witness scheduled for this slot
    ByteString pWitness = dposSlot.getScheduledWitness(slot);
    
    Miner miner = dposService.getMiners().get(pWitness);
    if (miner == null) {
      return State.NOT_MY_TURN;
    }
    // Get the timestamp of this slot; it is used as the block time
    long pTime = dposSlot.getTime(slot);
    // int BLOCK_PRODUCE_TIMEOUT_PERCENT = 50; // 50%
    // 3000 / 2 * 50 / 100 = 750 (ms)
    // So the block production window is only 750 ms
    long timeout =
        pTime + BLOCK_PRODUCED_INTERVAL / 2 * dposService.getBlockProduceTimeoutPercent() / 100;
    BlockCapsule blockCapsule = dposService.getBlockHandle().produce(miner, pTime, timeout);
    if (blockCapsule == null) {
      return State.PRODUCE_BLOCK_FAILED;
    }

    BlockHeader.raw raw = blockCapsule.getInstance().getBlockHeader().getRawData();
    logger.info("Produce block successfully, num: {}, time: {}, witness: {}, ID:{}, parentID:{}",
        raw.getNumber(),
        new DateTime(raw.getTimestamp()),
        ByteArray.toHexString(raw.getWitnessAddress().toByteArray()),
        new Sha256Hash(raw.getNumber(), Sha256Hash.of(CommonParameter
            .getInstance().isECKeyCryptoEngine(), raw.toByteArray())),
        ByteArray.toHexString(raw.getParentHash().toByteArray()));
  }

  return State.OK;
}
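
The deadline arithmetic in produceBlock() can be checked in isolation. The sketch below is an illustration only: `ProduceDeadline` and `deadline` are hypothetical names, and the percent value 50 is the one quoted in the comment above rather than something read from a live configuration.

```java
// Hypothetical helper, not part of java-tron: reproduces the timeout formula from produceBlock().
final class ProduceDeadline {
  static final long BLOCK_PRODUCED_INTERVAL = 3000L; // ms

  // Absolute time (ms) after which packing must stop for the slot that starts at slotTime.
  static long deadline(long slotTime, int timeoutPercent) {
    // Integer arithmetic, evaluated left to right: 3000 / 2 = 1500, then * percent, then / 100.
    return slotTime + BLOCK_PRODUCED_INTERVAL / 2 * timeoutPercent / 100;
  }

  public static void main(String[] args) {
    System.out.println(deadline(16003000L, 50) - 16003000L); // 750
  }
}
```

With a percent of 50 the producer gets 750 ms out of the 3000 ms slot for packing; the rest of the slot is left for broadcasting and for the other nodes to process the block.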

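The time slot mechanism: getSlot

This method looks simple but is actually quite interesting: it is the implementation of time slots. EOS uses the same mechanism, and many DPoS projects are slot-based.
Put simply, the slot mechanism slices time into fixed units, one slot every 3 seconds. That may sound familiar: in cache sharding, the hash-ring scheme also has a slot concept.
One slices time, the other slices space. How TRON implements it is shown in the code below.

The following code obtains a slot; one slot is 3000 ms.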

public long getSlot(long time) {
  long firstSlotTime = getTime(1);
  if (time < firstSlotTime) {
    return 0;
  }
  return (time - firstSlotTime) / BLOCK_PRODUCED_INTERVAL + 1;
}

public long getTime(long slot) {
  // The call above passes 1, so slot cannot be 0 on that path
  if (slot == 0) {
    return System.currentTimeMillis();
  }
  // BLOCK_PRODUCED_INTERVAL = 3000; this constant is used in many places, so keep it in mind
  long interval = BLOCK_PRODUCED_INTERVAL;
  // Right after startup getLatestBlockHeaderNumber() == 0; the value is updated once new blocks are received
  if (consensusDelegate.getLatestBlockHeaderNumber() == 0) {
    return dposService.getGenesisBlockTime() + slot * interval;
  }
  if (consensusDelegate.lastHeadBlockIsMaintenance()) {
    slot += consensusDelegate.getMaintenanceSkipSlots();
  }
  // Note: this is the header timestamp of the latest (highest) block
  long time = consensusDelegate.getLatestBlockHeaderTimestamp();
  // GenesisBlockTime = 0; this value is set in the config file config.conf
  time = time - ((time - dposService.getGenesisBlockTime()) % interval);
  // Return the interval-aligned head block timestamp + 3000 * slot
  return time + interval * slot;
}
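
A simplified model of the two methods makes the slot arithmetic easier to follow. This is a sketch under stated assumptions: `SlotMath`, `slotTime`, and `slotAt` are hypothetical names, the head block timestamp and genesis time are passed in as plain parameters, and the maintenance skip and the "no block yet" branch are left out.

```java
// Hypothetical, simplified model of getTime/getSlot; not the java-tron classes themselves.
final class SlotMath {
  static final long INTERVAL = 3000L; // ms per slot

  // Start time of the given slot (slot >= 1), counted from the head block.
  static long slotTime(long genesisTime, long headTime, long slot) {
    long aligned = headTime - (headTime - genesisTime) % INTERVAL; // snap head time onto the slot grid
    return aligned + INTERVAL * slot;
  }

  // Slot that contains `time`, or 0 if the first slot after the head block has not started yet.
  static long slotAt(long genesisTime, long headTime, long time) {
    long first = slotTime(genesisTime, headTime, 1);
    if (time < first) {
      return 0;
    }
    return (time - first) / INTERVAL + 1;
  }

  public static void main(String[] args) {
    // Head block at 9100 ms (slightly off the grid), genesis at 0.
    System.out.println(slotTime(0L, 9100L, 1L));   // 12000
    System.out.println(slotAt(0L, 9100L, 11999L)); // 0: not time yet
    System.out.println(slotAt(0L, 9100L, 12000L)); // 1: next producer's slot
    System.out.println(slotAt(0L, 9100L, 18050L)); // 3: two producers missed their slots
  }
}
```

The last line shows why the slot is relative to the head block: if earlier producers miss their turns, the head does not advance and the slot number simply keeps growing with time.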

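Once the slot for the next time point is known, the node can tell whether it is its own turn to produce a block.
How it works: the absolute slot number (derived from the head block timestamp, not from the block height) is taken modulo the size of the active SR list times SINGLE_REPEAT and then divided by SINGLE_REPEAT, which gives an index into that list. The list of 27 SRs is loaded at startup.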

public ByteString getScheduledWitness(long slot) {
  final long currentSlot = getAbSlot(consensusDelegate.getLatestBlockHeaderTimestamp()) + slot;
  if (currentSlot < 0) {
    throw new RuntimeException("current slot should be positive.");
  }
  int size = consensusDelegate.getActiveWitnesses().size();
  if (size <= 0) {
    throw new RuntimeException("active witnesses is null.");
  }
  int witnessIndex = (int) currentSlot % (size * SINGLE_REPEAT);
  witnessIndex /= SINGLE_REPEAT;
  return consensusDelegate.getActiveWitnesses().get(witnessIndex);
}
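
The index calculation can be tried out on its own. In the sketch below, `WitnessSchedule` and `witnessIndex` are hypothetical names, the absolute slot is passed in directly because getAbSlot is not shown in this article, and the repeat factor is a plain parameter rather than the real SINGLE_REPEAT constant. Unlike the source, the sketch applies the modulo to the long value before narrowing to int.

```java
// Hypothetical helper, not part of java-tron: the round-robin index arithmetic only.
final class WitnessSchedule {

  // Index of the witness scheduled for the given absolute slot.
  // Each witness keeps the turn for `singleRepeat` consecutive slots.
  static int witnessIndex(long absoluteSlot, int witnessCount, int singleRepeat) {
    int positionInRound = (int) (absoluteSlot % ((long) witnessCount * singleRepeat));
    return positionInRound / singleRepeat;
  }

  public static void main(String[] args) {
    System.out.println(witnessIndex(26L, 27, 1)); // 26: last witness of the round
    System.out.println(witnessIndex(27L, 27, 1)); // 0: wraps around to the first witness
    System.out.println(witnessIndex(5L, 3, 2));   // 2: slots 4 and 5 both belong to witness 2
  }
}
```

Because every node computes the same absolute slot from the same head block, all nodes agree on whose turn it is without any coordinator.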

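Block production logic

Finally, the core part. The details are in the code comments; a few points are worth calling out:

Block production is time-limited: no more than 750 ms
Block size is limited: no more than 2 MB (ChainConstant.BLOCK_SIZE = 2_000_000 bytes)
If there are no transactions, an empty block is still produced
After producing, the block is processed immediately, and the pending queue is cleared in PendingManager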

/**
 * Generate a block.
 */
public synchronized BlockCapsule generateBlock(Miner miner, long blockTime, long timeout) {

  long postponedTrxCount = 0;

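  // Build an empty BlockCapsule from four arguments:
  // chainBaseManager.getHeadBlockNum() + 1: height of the previous block plus one
  // chainBaseManager.getHeadBlockId(): hash of the previous block
  // blockTime and the witness address of this node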
  BlockCapsule blockCapsule = new BlockCapsule(chainBaseManager.getHeadBlockNum() + 1,
      chainBaseManager.getHeadBlockId(),
      blockTime, miner.getWitnessAddress());
  blockCapsule.generatedByMyself = true;
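  // Note: reset rolls the current data snapshot back!
  // This is a subtle and complex operation: think of it as undoing every database operation made between the previous block and this line.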
  session.reset();
  session.setValue(revokingStore.buildSession());

  accountStateCallBack.preExecute(blockCapsule);
  // Check whether multi-signature is enabled
  if (getDynamicPropertiesStore().getAllowMultiSign() == 1) {
    byte[] privateKeyAddress = miner.getPrivateKeyAddress().toByteArray();
    AccountCapsule witnessAccount = getAccountStore()
        .get(miner.getWitnessAddress().toByteArray());
    if (!Arrays.equals(privateKeyAddress, witnessAccount.getWitnessPermissionAddress())) {
      logger.warn("Witness permission is wrong");
      return null;
    }
  }

  TransactionRetCapsule transactionRetCapsule = new TransactionRetCapsule(blockCapsule);

  Set<String> accountSet = new HashSet<>();
  AtomicInteger shieldedTransCounts = new AtomicInteger(0);
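  // pendingTransactions: the transaction pool
  // rePushTransactions: transactions from pendingTransactions that were not executed in the previous or current packing round are moved here
  // So one round does not necessarily pack everything in pendingTransactions; there are only 750 ms to pack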
  while (pendingTransactions.size() > 0 || rePushTransactions.size() > 0) {
    boolean fromPending = false;
    TransactionCapsule trx;
    if (pendingTransactions.size() > 0) {
      // Note: peek, not poll, so the transaction is not lost if this round fails
      trx = pendingTransactions.peek();
      // Transaction sorting, disabled by default
      if (Args.getInstance().isOpenTransactionSort()) {
        TransactionCapsule trxRepush = rePushTransactions.peek();
        if (trxRepush == null || trx.getOrder() >= trxRepush.getOrder()) {
          fromPending = true;
        } else {
          trx = rePushTransactions.poll();
        }
      } else {
        fromPending = true;
      }
    } else {
      trx = rePushTransactions.poll();
    }
    // Has the 750 ms deadline passed?
    if (System.currentTimeMillis() > timeout) {
      logger.warn("Processing transaction time exceeds the producing time.");
      break;
    }

    // check the block size
    // ChainConstant.BLOCK_SIZE = 2_000_000
    // Check whether the block would exceed 2 MB; in other words, a block never exceeds 2 MB
    if ((blockCapsule.getInstance().getSerializedSize() + trx.getSerializedSize() + 3)
        > ChainConstant.BLOCK_SIZE) {
      postponedTrxCount++;
      continue;
    }
    //shielded transaction
    // Limit the number of shielded (anonymous) transactions per block
    if (isShieldedTransaction(trx.getInstance())
        && shieldedTransCounts.incrementAndGet() > SHIELDED_TRANS_IN_BLOCK_COUNTS) {
      continue;
    }
    //multi sign transaction
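    // Multi-sign handling: if this owner already has a multi-sign transaction in this block, skip this transaction
    // Otherwise, if this is a multi-sign transaction, record its owner in accountSet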
    Contract contract = trx.getInstance().getRawData().getContract(0);
    byte[] owner = TransactionCapsule.getOwner(contract);
    String ownerAddress = ByteArray.toHexString(owner);
    if (accountSet.contains(ownerAddress)) {
      continue;
    } else {
      if (isMultiSignTransaction(trx.getInstance())) {
        accountSet.add(ownerAddress);
      }
    }
    if (ownerAddressSet.contains(ownerAddress)) {
      trx.setVerified(false);
    }
    // apply transaction
    // Build an in-memory snapshot so that all state changes can be rolled back if execution fails
    try (ISession tmpSession = revokingStore.buildSession()) {
      accountStateCallBack.preExeTrans();
      // The transaction is executed again here; it was already executed once when it was received
      TransactionInfo result = processTransaction(trx, blockCapsule);
      accountStateCallBack.exeTransFinish();
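      // Merge the current snapshot state. No need to dwell on it here; the snapshot feature will be covered separately.
      // It is a classic design that is useful outside of blockchains as well.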
      tmpSession.merge();
      // Add the transaction to the block, so the block now contains it
      blockCapsule.addTransaction(trx);
      if (Objects.nonNull(result)) {
        transactionRetCapsule.addTransactionInfo(result);
      }
      if (fromPending) {
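        // The transaction is already in the block, so pop it now
        // Polling earlier would lose the transaction if execution timed out, hence poll only here
        // But if the node crashes right at this point, is the transaction not lost anyway?
        // It is lost on this node, but the other 26 nodes still hold the full data.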
        pendingTransactions.poll();
      }
    } catch (Exception e) {
      logger.error("Process trx {} failed when generating block: {}", trx.getTransactionId(),
          e.getMessage());
    }
  }

  // Build the state root
  accountStateCallBack.executeGenerateFinish();

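  // Roll back the snapshot. This is easy to misread: packing is done, so why roll back again?
  // Would that not undo everything, e.g. if A transfers 10 to B, is it as if the transfer never happened?
  // The rollback returns the database to its original state so that the node can then process its own block in the following step. The reasons behind this classic design are covered separately.
  // Personally I find this costly: reset traverses many layers and takes time.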
  session.reset();

  logger.info("Generate block {} success, pendingCount: {}, rePushCount: {}, postponedCount: {}",
      blockCapsule.getNum(),
      pendingTransactions.size(), rePushTransactions.size(), postponedTrxCount);

  // Set the Merkle root
  blockCapsule.setMerkleRoot();
  // Sign the whole block
  blockCapsule.sign(miner.getPrivateKey());

  BlockCapsule capsule = new BlockCapsule(blockCapsule.getInstance());
  capsule.generatedByMyself = true;
  return capsule;
}
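
Stripped of signatures, snapshots, and multi-sign checks, the packing loop reduces to three rules: stop at the deadline, respect the size cap, and remove a transaction from the queue only after it is in the block. The sketch below illustrates just that. `PackingSketch` and `pack` are hypothetical names, transactions are reduced to their serialized sizes, the clock is injected so the example is deterministic, and stopping at the first transaction that does not fit is a simplification of the `continue` in the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.LongSupplier;

// Hypothetical sketch, not java-tron code: a transaction is represented by its serialized size.
final class PackingSketch {
  static final int BLOCK_SIZE = 2_000_000; // bytes, same value as ChainConstant.BLOCK_SIZE in the source

  static List<Integer> pack(Queue<Integer> pending, long deadlineMillis, LongSupplier clock) {
    List<Integer> block = new ArrayList<>();
    int usedBytes = 0;
    while (!pending.isEmpty()) {
      int trxSize = pending.peek();              // look at the head, but do not remove it yet
      if (clock.getAsLong() > deadlineMillis) {
        break;                                   // out of time: the rest stays queued
      }
      if (usedBytes + trxSize + 3 > BLOCK_SIZE) {
        break;                                   // simplification: stop at the first one that does not fit
      }
      block.add(trxSize);                        // stands in for executing and adding the transaction
      usedBytes += trxSize;
      pending.poll();                            // remove only once it is safely in the block
    }
    return block;
  }
}
```

Anything still in the queue when the loop exits is simply picked up in a later round, which is why a timeout or a full block does not lose transactions.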

What happens after a block is produced

There are mainly two things:

Broadcast the block
Process the block

public BlockCapsule produce(Miner miner, long blockTime, long timeout) {
  // 1. Produce the block
  BlockCapsule blockCapsule = manager.generateBlock(miner, blockTime, timeout);
  if (blockCapsule == null) {
    return null;
  }
  try {
    consensus.receiveBlock(blockCapsule);
    // 2. Build the broadcast message
    BlockMessage blockMessage = new BlockMessage(blockCapsule);
    // 3. Broadcast the block
    tronNetService.broadcast(blockMessage);
    // 4. Process the block: a self-produced block is not persisted during production; the regular block-processing method applies and stores it
    manager.pushBlock(blockCapsule);
  } catch (Exception e) {
    logger.error("Handle block {} failed.", blockCapsule.getBlockId().getString(), e);
    return null;
  }
  return blockCapsule;
}

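How block production anomalies are handled

To reproduce the scenario, assume there are only three nodes, producing blocks at the following assumed times:
A produces at 16000000
B produces at 16003000
C produces at 16006000

At 16000000, A produces a block at height 10000 and broadcasts it to B and C.
At 16003000, B produces a block at height 10001 and tries to broadcast it to A and C, but a network problem keeps the block from going out.
Here is the special case: C never receives B's block and only has A's block, so:
At 16006000, C produces a block at height 10001 and broadcasts it to A and B.
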
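At this point A's chain is 10000(A)-->10001(C)
At this point B's chain is 10000(A)-->10001(B)
At this point C's chain is 10000(A)-->10001(C)

Then B's network recovers and B broadcasts its block 10001(B) to A and C. Both A and C receive B's block, and at this point the chain forks.
B also receives the block that C broadcast.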

A's chain is now

10000(A)-->10001(C)
        \->10001'(B)

B's chain is now

10000(A)-->10001(B)
        \->10001'(C)

C's chain is now

10000(A)-->10001'(B)
        \->10001(C)

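What a mess. How is it sorted out?
This brings in another classic blockchain problem: forks and chain switching.
The solution first: switch chains.
Chain switching follows the longest-chain rule. A fork is not a problem: the node keeps accepting blocks on the forked branch, and in the end it switches to whichever chain is longer.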
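
A toy model of the longest-chain rule shows how node A in the scenario would resolve the fork. Everything here is an assumption made for illustration: `ForkChoiceSketch` is a hypothetical class, blocks are identified by plain strings, and keeping the current head on a tie is a choice of this sketch, not a statement about how java-tron breaks ties.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a longest-chain fork choice; not java-tron code.
final class ForkChoiceSketch {
  private final Map<String, Long> heightById = new HashMap<>();
  private String head;

  ForkChoiceSketch(String startId, long startHeight) {
    heightById.put(startId, startHeight);
    head = startId;
  }

  // Stores a block whose parent is already known. The head moves only when the
  // new block makes its branch strictly longer than the current one.
  boolean accept(String id, String parentId) {
    Long parentHeight = heightById.get(parentId);
    if (parentHeight == null) {
      return false; // unknown parent: cannot be attached to any branch
    }
    long height = parentHeight + 1;
    heightById.put(id, height);
    if (height > heightById.get(head)) {
      head = id; // switch chains
    }
    return true;
  }

  String head() {
    return head;
  }

  public static void main(String[] args) {
    ForkChoiceSketch nodeA = new ForkChoiceSketch("10000(A)", 10000L);
    nodeA.accept("10001(C)", "10000(A)");
    nodeA.accept("10001'(B)", "10000(A)");   // fork at the same height: head stays on C's block
    System.out.println(nodeA.head());        // 10001(C)
    nodeA.accept("10002", "10001'(B)");      // B's branch grows first
    System.out.println(nodeA.head());        // 10002
  }
}
```

Whichever branch receives the next block first becomes the longer one, and every node that sees it ends up on the same chain.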

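Processing the block after it is produced

The block has been produced; what happens next? The handling is done in Manager.pushBlock.
That process is long, so only the consensus-related part after production is covered here:

Manager.pushBlock()
  \-processBlock() // the consensus part of this method

The consensus-handling part of processBlock()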

...
   if (!consensus.applyBlock(block)) {
     throw new BadBlockException("consensus apply block failed");
   }
...

This leads into DposService.applyBlock()

@Override
public boolean applyBlock(BlockCapsule blockCapsule) {
  statisticManager.applyBlock(blockCapsule);
  // Compute the maintenance period
  maintenanceManager.applyBlock(blockCapsule);
  // Update the solidified block height
  updateSolidBlock();
  return true;
}

The logic for updating the solidified block height is in updateSolidBlock()

private void updateSolidBlock() {
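  // Collect the latest block number recorded for each active SR
  // A new block is broadcast to the other 26 SRs; an SR that processes it successfully updates its latest block number, otherwise the block is discarded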
  List<Long> numbers = consensusDelegate.getActiveWitnesses().stream()
      .map(address -> consensusDelegate.getWitness(address.toByteArray()).getLatestBlockNum())
      // Note the ascending sort here; it matters because it determines which height is picked below
      .sorted()
      .collect(Collectors.toList());
  long size = consensusDelegate.getActiveWitnesses().size();
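  // position is the 30% mark. Because the list is sorted ascending, the value at that mark is a height that at least 70% of the SRs have reached
  // If this is unclear, build a List yourself and print it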
  int position = (int) (size * (1 - SOLIDIFIED_THRESHOLD * 1.0 / 100));
  long newSolidNum = numbers.get(position);
  long oldSolidNum = consensusDelegate.getLatestSolidifiedBlockNum();
  if (newSolidNum < oldSolidNum) {
    logger.warn("Update solid block number failed, new: {} < old: {}", newSolidNum, oldSolidNum);
    return;
  }
  CommonParameter.getInstance()
      .setOldSolidityBlockNum(consensusDelegate.getLatestSolidifiedBlockNum());
  consensusDelegate.saveLatestSolidifiedBlockNum(newSolidNum);
  logger.info("Update solid block number to {}", newSolidNum);
}
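
The suggestion in the comment, building a list and printing it, can be done with the short sketch below. `SolidifySketch` and `solidNum` are hypothetical names, the per-SR block numbers are invented sample data, and the threshold of 70 is inferred from the "30% / 70%" comment rather than read from the real SOLIDIFIED_THRESHOLD constant.

```java
import java.util.Arrays;

// Hypothetical helper, not part of java-tron: the position arithmetic from updateSolidBlock().
final class SolidifySketch {

  // Picks the block number that at least `thresholdPercent` of the SRs have reached.
  static long solidNum(long[] latestBlockNums, int thresholdPercent) {
    long[] sorted = latestBlockNums.clone();
    Arrays.sort(sorted);                                                    // ascending
    int position = (int) (sorted.length * (1 - thresholdPercent * 1.0 / 100));
    return sorted[position];
  }

  public static void main(String[] args) {
    long[] nums = new long[27];
    for (int i = 0; i < nums.length; i++) {
      nums[i] = 101 + i;                       // sample data: SRs are at heights 101..127
    }
    System.out.println(solidNum(nums, 70));    // 109
  }
}
```

With 27 SRs the position is 8, so the solidified height is the 9th smallest value: 19 of the 27 SRs (just over 70%) are at that height or above.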

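Summary

TRON's chain is built on DPoS. The advantages of this mechanism are efficient block production and low power consumption, since there are only 27 producing nodes. The drawback is just as clear: whoever controls the 27 nodes controls the whole chain, while most blockchain communities want a chain to be more transparent and open.
Overall it is widely used among Chinese-built chains, and its transaction fees are very low, so it is worth a try.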