libcrypto升级导致syslog-ng单向认证由于'unknown ca'错误无法解析报文

问题描述

PME版本升级, 更新了libcrypto版本, 导致syslog-ng单向认证无法解析报文, 提示unknown ca
报错显示如下

Sep 3 14:09:33 syslog4 syslog-ng[45709]: Syslog connection accepted; fd=’5’, client=’AF_INET(71.41.5.127:38057)’, local=’AF_INET(0.0.0.0:2460)’
Sep 3 14:09:33 syslog4 syslog-ng[45709]: SSL error while reading stream; tls_error=’SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca’, location=’/etc/syslog-ng/syslog-ng.conf:254:9’
Sep 3 14:09:33 syslog4 syslog-ng[45709]: I/O error occurred while reading; fd=’5’, error=’Connection reset by peer (104)’
Sep 3 14:09:33 syslog4 syslog-ng[45709]: Syslog connection closed; fd=’5’, client=’AF_INET(71.41.5.127:38057)’, local=’AF_INET(0.0.0.0:2460)’

定位思路

  1. 先尝试复现问题
  2. 二分定位确定问题代码
    1. 通过git bisect -> 编译libcrypto -> 使用此so,替换syslog服务器使用文件 -> 验证问题是否存在 -> git bisect
  3. 对变更代码加日志, 查看二者流程差异
  4. 客户端gdb调试syslog-ng, 查看报错点, 以及是如何调用openssl的libctypto的
  5. 客户端上tcpdump抓包, 对比正误情况下的报文

复现问题

syslog服务器71.47.142.13

配置生效重启syslog服务killall /sbin/syslog-ng; /sbin/syslog-ng -f /etc/syslog-ng/syslog-ng.conf

客户端 71.41.5.127

killall /usr/sbin/syslog-ng;/usr/sbin/syslog-ng -f /etc/syslog-ng/syslog-ng.conf -p /var/run/syslogd.pid -F -R /opt/pme/pram/syslog-ng.persist

复现步骤

  1. 配置好syslog和客户端
  2. 在客户端上使用旧版本libcrypto, 通过单向认证发送syslog报文
  3. 在syslog服务器上tail -f /var/log/RemoteSyslog/messages 以及 tail -f /var/log/RemoteSyslog/syslog.log查看产生的日志
  4. syslog.log收到如下日志, 表示配置正确, 发送成功

    Sep 3 10:51:55 huawei127 <134>Sep 3 10:54:40 huawei127 sensor_alarm: server 2 test: The receipt of this message confirms that Syslog is configured correctly
    Sep 3 10:51:55 huawei127 <174>Sep 3 10:54:40 huawei127 sensor_alarm: 2021-09-03 10:54:40 WEB,Administrator@75.255.30.59,sensor_alarm,Set remote syslog dest2 test successfully

  5. 在客户端替换新版本libcrypto, 重启syslog服务killall /usr/sbin/syslog-ng;/usr/sbin/syslog-ng -f /etc/syslog-ng/syslog-ng.conf -p /var/run/syslogd.pid -F -R /opt/pme/pram/syslog-ng.persist, 发送单向认证报文, 有如下报错

    Sep 3 14:09:33 syslog4 syslog-ng[45709]: Syslog connection accepted; fd=’5’, client=’AF_INET(71.41.5.127:38057)’, local=’AF_INET(0.0.0.0:2460)’
    Sep 3 14:09:33 syslog4 syslog-ng[45709]: SSL error while reading stream; tls_error=’SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca’, location=’/etc/syslog-ng/syslog-ng.conf:254:9’
    Sep 3 14:09:33 syslog4 syslog-ng[45709]: I/O error occurred while reading; fd=’5’, error=’Connection reset by peer (104)’
    Sep 3 14:09:33 syslog4 syslog-ng[45709]: Syslog connection closed; fd=’5’, client=’AF_INET(71.41.5.127:38057)’, local=’AF_INET(0.0.0.0:2460)’

确定问题代码

libcrypto版本更新代码不多,
通过二分定位git bisect -> 编译libcrypto -> 使用此so,替换syslog服务器使用文件 -> 验证问题是否存在 -> git bisect,确定问题结点来自openssle2590c3a162eb118c36b09c2168164283aa099b4

commit e2590c3a162eb118c36b09c2168164283aa099b4 (syslog)
Author: Dr. David von Oheimb David.von.Oheimb@siemens.com
Date: Tue Dec 24 11:25:15 2019 +0100

Fix issue 1418 by moving check of KU_KEY_CERT_SIGN and weakening check_issued()

Move check that cert signing is allowed from x509v3_cache_extensions() to
where it belongs: internal_verify(), generalize it for proxy cert signing.
Correct and simplify check_issued(), now checking self-issued (not: self-signed).
Add test case to 25-test_verify.t that demonstrates successful fix.

As prerequisites, this adds the static function check_sig_alg_match()
and the internal functions x509_likely_issued() and x509_signing_allowed().

This is a backport of the core of PR #10587.
Fixes #1418

Reviewed-by: Tomas Mraz tmraz@fedoraproject.org
(Merged from https://github.com/openssl/openssl/pull/12357)

该结点修改代码不多.

使用gdb调试

断点挂上了,但没有停住, syslog-ng没有调试信息, 此尝试失败

tcpdump抓包

./tcpdump_arm32 host 71.47.142.13 -w /data/tcp_suc

对比报文, 发现

成功

1 0.0 71.41.5.127 71.47.142.13 TLSv1.3 Client Hello
2 0.0 71.47.142.13 71.41.5.127 TCP 2460 → 46157 [ACK] Seq=1 Ack=294 Win=501 Len=0
3 0.0 71.47.142.13 71.41.5.127 TLSv1.3 Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data
4 0.0 71.41.5.127 71.47.142.13 TCP 46157 → 2460 [ACK] Seq=294 Ack=1461 Win=502 Len=0
5 0.0 71.47.142.13 71.41.5.127 TLSv1.3 Application Data
6 0.0 71.41.5.127 71.47.142.13 TCP 46157 → 2460 [ACK] Seq=294 Ack=1480 Win=502 Len=0
7 0.3 71.41.5.127 71.47.142.13 TLSv1.3 Change Cipher Spec, Application Data

失败

4 0.0 71.41.5.127 71.47.142.13 TLSv1.3 Client Hello
5 0.0 71.47.142.13 71.41.5.127 TCP 2460 → 37895 [ACK] Seq=1 Ack=294 Win=64128 Len=0
6 0.0 71.47.142.13 71.41.5.127 TLSv1.3 Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data
7 0.0 71.41.5.127 71.47.142.13 TCP 37895 → 2460 [ACK] Seq=294 Ack=1461 Win=32128 Len=0
8 0.0 71.47.142.13 71.41.5.127 TLSv1.3 Application Data
9 0.0 71.41.5.127 71.47.142.13 TCP 37895 → 2460 [ACK] Seq=294 Ack=1480 Win=32128 Len=0
10 0.0 71.41.5.127 71.47.142.13 TLSv1.3 Alert (Level: Fatal, Description: Unknown CA)

也即, 客户端对服务器的证书校验没有通过.

变更代码打印调试

  1. 二分确认问题点: 变更代码较少, 可以很快确定
  2. 编译
  3. 对比成败问题点数据

对变更代码进行控制

1
2
3
4
5
6
7
8
9
10
// 先把新旧代码放在一起
#ifdef my_test_xxx
// new version code
#else
// old version code
#endif
// 然后逐步放开代码, 找到错误代码
#if 1
#else
#endif

添加打印

不知道printf和log被重定向到哪里去了, 直接使用echo打印到文件

1
2
3
4
char cmd[1000] = {0};
snprintf(cmd, sizeof(cmd) - 1, "echo '(!ku_reject(subject, KU_KEY_CERT_SIGN)) != (ret == X509_V_OK)' >> /tmp/log");
system(cmd);
memset(cmd, 0, sizeof(cmd));

对比数据

旧版本日志

begin:—————–

!ku_reject(x, KU_KEY_CERT_SIGN)=0
(!ku_reject(subject, KU_KEY_CERT_SIGN)) != (ret == X509_V_OK)
ret=0
!ku_reject(subject, KU_KEY_CERT_SIGN)=0
check_sig_alg_match(X509_get0_pubkey(x), x) == X509_V_OK = 1

X509_check_akid(x, x->akid) =0

x->sha1_hash=043E21AC3520878795F3AA60CCAF2D58D72A31EF

else flag

end:—————–

begin:—————–

!ku_reject(x, KU_KEY_CERT_SIGN)=1
check_sig_alg_match(X509_get0_pubkey(x), x) == X509_V_OK = 1

X509_check_akid(x, x->akid) =0

x->sha1_hash=AF35EF3490C5965D3FF4179A9BDA53F4F10B12DF

suc flag

end:—————–

新版本日志

begin:—————–

!ku_reject(x, KU_KEY_CERT_SIGN)=0
(!ku_reject(subject, KU_KEY_CERT_SIGN)) != (ret == X509_V_OK)
ret=0
!ku_reject(subject, KU_KEY_CERT_SIGN)=0
check_sig_alg_match(X509_get0_pubkey(x), x) == X509_V_OK = 1

X509_check_akid(x, x->akid) =0

x->sha1_hash=043E21AC3520878795F3AA60CCAF2D58D72A31EF

(!ku_reject(subject, KU_KEY_CERT_SIGN)) != (ret == X509_V_OK)
ret=0
!ku_reject(subject, KU_KEY_CERT_SIGN)=0
fail flag

end:—————–

begin:—————–

!ku_reject(x, KU_KEY_CERT_SIGN)=1
check_sig_alg_match(X509_get0_pubkey(x), x) == X509_V_OK = 1

X509_check_akid(x, x->akid) =0

x->sha1_hash=AF35EF3490C5965D3FF4179A9BDA53F4F10B12DF

fail flag

end:—————–

通过调试数据确定问题出现在crypto/x509v3/v3_purp.c:586条件判断处

新版本判断条件由!ku_reject(x, KU_KEY_CERT_SIGN) 变为 check_sig_alg_match(X509_get0_pubkey(x), x) == X509_V_OK),

出错时, 根据打印的证书sha值, 也即win下打开证书查看的指纹

可以看到, server.csr在新版本走if分支, 而旧版本走else分支, 而ca.csr则分支一致, 说明问题出在server.csr的校验上

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
 snprintf(cmd, sizeof(cmd) - 1, "echo 'x->sha1_hash=%s\n' >> /tmp/log", consumer_number);
system(cmd);
memset(cmd, 0, sizeof(cmd));
#if 1
if (X509_check_akid(x, x->akid) == X509_V_OK /* SKID matches AKID */
/* .. and the signature alg matches the PUBKEY alg: */
&& check_sig_alg_match(X509_get0_pubkey(x), x) == X509_V_OK) // 新版本代码
{
memset(cmd, 0, sizeof(cmd));
snprintf(cmd, sizeof(cmd) - 1, "echo 'fail flag\n' >> /tmp/log");
system(cmd);
x->ex_flags |= EXFLAG_SS; /* indicate self-signed */
}
#else
if (X509_check_akid(x, x->akid) == X509_V_OK &&
!ku_reject(x, KU_KEY_CERT_SIGN))
{
snprintf(cmd, sizeof(cmd) - 1, "echo 'suc flag\n' >> /tmp/log");
system(cmd);
memset(cmd, 0, sizeof(cmd));
x->ex_flags |= EXFLAG_SS;
}
#endif
else {
memset(cmd, 0, sizeof(cmd));
snprintf(cmd, sizeof(cmd) - 1, "echo 'else flag\n' >> /tmp/log");
system(cmd);
}

其中ku_reject定义如下

1
2
3
4
#define ku_reject(x,usage) (((x)->ex_flags & EXFLAG_KUSAGE) && !((x)->ex_kusage & (usage)))
//Expands to:

(((x)->ex_flags & EXFLAG_KUSAGE) && !((x)->ex_kusage & (0x0004)))

代码分析

解决问题后回头看, 这里的讨论或许能解释为何要更改这个判断条件, 但当时既然已经有很有限的代码, 就直接顺着代码分析下去, 没有仔细看这个帖子

Improve key usage checks in internal_verify() and backport fix of issue 1418 to v1.1.1 by DDvO · Pull Request #12357 · openssl/openssl

syslog-ng支持使用libcrypto进行单向和双向认证, 单向认证时, 客户端使用本地的ca.csr校验服务器发送过来的server.csr.

这里出错时, 将server.csr本不该加上的EXFLAG_SS加上了, 导致失败

这个标志, 表明证书是self-signed.

从代码上看,self-signed证书的条件在于

  • 首先得是self-issued, 也即满足!X509_NAME_cmp(X509_get_subject_name(x), X509_get_issuer_name(x))

    • 这里的subject即使用者, issuer即签发者
  • 而后,旧版本, 满足!ku_reject(x, KU_KEY_CERT_SIGN)

    • 也即通过key_usage判断;

      1
      2
      3
      4
      #define ku_reject(x,usage) (((x)->ex_flags & EXFLAG_KUSAGE) && !((x)->ex_kusage & (usage)))
      //Expands to:

      (((x)->ex_flags & EXFLAG_KUSAGE) && !((x)->ex_kusage & (0x0004)))
    • 这里的依据, 可能是ssl - OpenSSL CA keyUsage extension - Super User所讨论的Self-signed CA的keyUsage的差异

  • 新版本, 判定条件改为check_sig_alg_match(X509_get0_pubkey(x), x) == X509_V_OK)

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    /* Check that issuer public key algorithm matches subject signature algorithm */
    static int check_sig_alg_match(const EVP_PKEY *pkey, const X509 *subject)
    {
    int pkey_nid;

    if (pkey == NULL)
    return X509_V_ERR_NO_ISSUER_PUBLIC_KEY;
    if (OBJ_find_sigid_algs(OBJ_obj2nid(subject->cert_info.signature.algorithm),
    NULL, &pkey_nid) == 0)
    return X509_V_ERR_UNSUPPORTED_SIGNATURE_ALGORITHM;
    if (EVP_PKEY_type(pkey_nid) != EVP_PKEY_base_id(pkey))
    return X509_V_ERR_SIGNATURE_ALGORITHM_MISMATCH;
    return X509_V_OK;
    }
    • 可以看出, 和之前毫无关系.
  • 不论如何, 要构造出可用的server.csr, 需要满足以下条件

    • self-issued, 即证书颁发者字段和使用者字段一致
    • SKID(subject key identifier)和AKID(authority key identifier)不一致, 或者签名算法和公钥算法不一致
  • 这里可以很容易构造出SKID和AKID不一致的场景

问题解决

修改证书生成脚本,重新生成证书, 其中ca.csr不变, server.csr中使SKID和AKID不一致.

至此, 问题得到解决

附件

  • 重启syslog /usr/sbin/syslog-ng -f /etc/syslog-ng/syslog-ng.conf -p /var/run/syslogd.pid -F -R /opt/pme/pram/syslog-ng.persist -dev
  • 报错日志
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

[2021-10-12T11:22:38.956853] Syslog connection failed; fd='19', server='AF_INET(71.47.142.13:2440)', error='Connection refused (111)', time_reopen='10'
[2021-10-12T11:22:45.778034] Syslog connection accepted; fd='13', client='AF_UNIX(anonymous)', local='AF_UNIX(/dev/log)'
[2021-10-12T11:22:45.805019] Incoming log entry; line='<174>Oct 12 11:22:45 user: 2021-10-12 11:22:45 WEB,Administrator@75.255.28.83,User,Administrator(75.255.28.83) login successfully over the WebUI'
[2021-10-12T11:22:45.818988] Incoming log entry; line='<86>Oct 12 11:22:45 user: Administrator(75.255.28.83) login successfully over the WebUI'
[2021-10-12T11:22:45.824206] Initializing destination file writer; template='/var/log/pme/security_log', filename='/var/log/pme/security_log'
[2021-10-12T11:22:45.826221] Outgoing message; message='2021-10-12T11:22:45+08:00 cxtkVeyDTL user: Administrator(75.255.28.83) login successfully over the WebUI\x0a'
[2021-10-12T11:22:45.846269] Outgoing message; message='<86> iBMC test: 2021-10-12T11:22:45+08:00 cxtkVeyDTL user: Administrator(75.255.28.83) login successfully over the WebUI\x0a'
[2021-10-12T11:22:45.870464] Outgoing message; message='<174> iBMC test: 2021-10-12 11:22:45 WEB,Administrator@75.255.28.83,User,Administrator(75.255.28.83) login successfully over the WebUI\x0a'
[2021-10-12T11:22:45.896590] Certificate validation failed; subject='CN=syslog4.it.bmctest.com, OU=it, O=huawei, L=hz, ST=zj, C=cn', issuer='CN=syslog4.it.bmctest.com, OU=it, O=huawei, L=hz, ST=zj, C=cn', error='self signed certificate', depth='0'
[2021-10-12T11:22:45.914936] SSL error while writing stream; tls_error='SSL routines:tls_process_server_certificate:certificate verify failed', location='/etc/syslog-ng/remotelog.conf:20:119'
[2021-10-12T11:22:45.915711] I/O error occurred while writing; fd='16', error='Broken pipe (32)'
[2021-10-12T11:22:45.926733] Syslog connection broken; fd='16', server='AF_INET(71.47.142.13:2460)', time_reopen='10'
[2021-10-12T11:22:45.935473] Certificate validation failed; subject='CN=syslog4.it.bmctest.com, OU=it, O=huawei, L=hz, ST=zj, C=cn', issuer='CN=syslog4.it.bmctest.com, OU=it, O=huawei, L=hz, ST=zj, C=cn', error='self signed certificate', depth='0'
[2021-10-12T11:22:45.941132] SSL error while writing stream; tls_error='SSL routines:tls_process_server_certificate:certificate verify failed', location='/etc/syslog-ng/remotelog.conf:19:115'
[2021-10-12T11:22:45.944786] I/O error occurred while writing; fd='15', error='Broken pipe (32)'
[2021-10-12T11:22:45.949132] Syslog connection broken; fd='15', server='AF_INET(71.47.142.13:2460)', time_reopen='10'
[2021-10-12T11:22:48.965869] Syslog connection failed; fd='15', server='AF_INET(71.47.142.13:2440)', error='Connection refused (111)', time_reopen='10'
[2021-10-12T11:22:55.935707] Syslog connection established; fd='15', server='AF_INET(71.47.142.13:2460)', local='AF_INET(0.0.0.0:0)'
[2021-10-12T11:22:55.956495] Syslog connection established; fd='16', server='AF_INET(71.47.142.13:2460)', local='AF_INET(0.0.0.0:0)'
[2021-10-12T11:22:58.974612] Syslog connection failed; fd='19', server='AF_INET(71.47.142.13:2440)', error='Connection refused (111)', time_reopen='10'
  • gdb查看调用栈
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#0  tls_session_verify (self=0x101e348, ok=0, ctx=0xb5b489a8) at lib/tlscontext.c:186
#1 0xb6f271ec in tls_session_verify_callback (ok=0, ctx=0xb5b489a8) at lib/tlscontext.c:264
#2 0xb6c6ee6c in ?? () from /usr/lib/libcrypto.so.1.1
#3 0xb6c6fed8 in X509_verify_cert () from /usr/lib/libcrypto.so.1.1
#4 0xb6d1eb44 in ssl_verify_cert_chain () from /usr/lib/libssl.so.1.1
#5 0xb6d4176c in tls_process_server_certificate () from /usr/lib/libssl.so.1.1
#6 0xb6d43ccc in ossl_statem_client_process_message () from /usr/lib/libssl.so.1.1
#7 0xb6d3d900 in state_machine () from /usr/lib/libssl.so.1.1
#8 0xb6d138ac in ssl3_write_bytes () from /usr/lib/libssl.so.1.1
#9 0xb6d1b220 in ssl3_write () from /usr/lib/libssl.so.1.1
#10 0xb6d26bc8 in ssl_write_internal () from /usr/lib/libssl.so.1.1
#11 0xb6d26d90 in SSL_write () from /usr/lib/libssl.so.1.1
#12 0xb6f27e48 in log_transport_tls_write_method (s=0x101bfc8, buf=<optimized out>, buflen=<optimized out>) at lib/transport/transport-tls.c:178
#13 0xb6f39744 in log_transport_write (count=106, buf=<optimized out>, self=<optimized out>) at ./lib/transport/logtransport.h:47
#14 log_proto_text_client_flush (s=0x101fed0) at lib/logproto/logproto-text-client.c:90
#15 0xb6f19ac0 in log_proto_client_process_in (s=<optimized out>) at lib/logproto/logproto-client.h:163
#16 log_writer_process_in (self=0xfdb2c0) at lib/logwriter.c:1318
#17 log_writer_work_perform (s=0xfdb2c0, cond=<optimized out>) at lib/logwriter.c:204
#18 0xb6f1c070 in _work (self=<optimized out>) at lib/mainloop-io-worker.c:69
#19 0xb6f65b44 in iv_work_thread_got_event (_thr=0x101dc90) at iv_work.c:106
#20 0xb6f64404 in __iv_event_run_pending_events (_st=0xb5b23860) at iv_event.c:56
#21 0xb6f69824 in iv_fd_epoll_timerfd_poll (st=0xb5b23860, active=0xb64fd3d4, abs=<optimized out>) at iv_fd_epoll.c:484
#22 0xb6f667dc in iv_fd_poll_and_run (st=st@entry=0xb5b23860, abs=<optimized out>) at iv_fd.c:206
#23 0xb6f67884 in iv_main () at iv_main_posix.c:112
#24 0xb6f6594c in iv_work_thread (_thr=0x101dc90) at iv_work.c:185
#25 0xb6f68998 in iv_thread_handler (_thr=0xfe0940) at iv_thread_posix.c:112
#26 0xb6a53374 in ?? () from /lib/libpthread.so.0
#27 0xb69da9e0 in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
  • 服务器日志
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# server message 日志
$ ssh root@71.47.142.13
Password:
Last login: Tue Oct 12 14:17:40 2021 from 75.255.28.83
Have a lot of fun...
localhost:~ # tail -f /var/log/RemoteSyslog/messages


Oct 12 15:19:29 localhost syslog-ng[2835]: Syslog connection accepted; fd='26', client='AF_INET(71.56.25.9:35867)', local='AF_INET(0.0.0.0:2460)'
Oct 12 15:19:29 localhost syslog-ng[2835]: Syslog connection accepted; fd='28', client='AF_INET(71.56.25.9:34934)', local='AF_INET(0.0.0.0:2460)'
Oct 12 15:19:29 localhost syslog-ng[2835]: SSL error while reading stream; tls_error='SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca', location='/etc/syslog-ng/syslog-ng.conf:254:9'
Oct 12 15:19:29 localhost syslog-ng[2835]: I/O error occurred while reading; fd='26', error='Connection reset by peer (104)'
Oct 12 15:19:29 localhost syslog-ng[2835]: Syslog connection closed; fd='26', client='AF_INET(71.56.25.9:35867)', local='AF_INET(0.0.0.0:2460)'
Oct 12 15:19:29 localhost syslog-ng[2835]: SSL error while reading stream; tls_error='SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca', location='/etc/syslog-ng/syslog-ng.conf:254:9'
Oct 12 15:19:29 localhost syslog-ng[2835]: I/O error occurred while reading; fd='28', error='Connection reset by peer (104)'
Oct 12 15:19:29 localhost syslog-ng[2835]: Syslog connection closed; fd='28', client='AF_INET(71.56.25.9:34934)', local='AF_INET(0.0.0.0:2460)'



Reference

Key usage extensions and extended key usage

Improve key usage checks in internal_verify() and backport fix of issue 1418 to v1.1.1 by DDvO · Pull Request #12357 · openssl/openssl

java - Find if a certificate is self signed or CA signed - Stack Overflow