mongod does not start after enabling encryption

Hi ,

Using Percona for mongodb 4.2 latest release on a Centos7 VM.

Tried multiple cases as below.

Storage engine encryption :
[LIST]
[*]Method 1:: Local key management :-
[/LIST]

Test 1 : placed the key with OpenSSL random 32 bit key and made the necessary config changes.

Instance does not start: I am on Centos7 This seems to be a bug similar to below JIRA. Workaround to change SELinux audit rules doesn’t help either.

[URL]https://jira.mongodb.org/browse/SERVER-24101[/URL] and [url]https://jira.mongodb.org/browse/SERVER-39648[/url]

Error message:

Mar 12 00:43:21 mongo01 sshd[10539]: Received disconnect from 157.230.123.253 port 57740:11: Normal Shutdown, Thank you for playing [preauth]
Mar 12 00:43:21 mongo01 sshd[10539]: Disconnected from 157.230.123.253 port 57740 [preauth]
Mar 12 00:43:33 mongo01 sshd[10541]: Received disconnect from 157.230.123.253 port 37478:11: Normal Shutdown, Thank you for playing [preauth]
Mar 12 00:43:33 mongo01 sshd[10541]: Disconnected from 157.230.123.253 port 37478 [preauth]
Mar 12 00:43:38 mongo01 polkitd[378]: Registered Authentication Agent for unix-process:10544:2365227 (system bus name :1.81 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)
Mar 12 00:43:40 mongo01 polkitd[378]: Unregistered Authentication Agent for unix-process:10544:2365227 (system bus name :1.81, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)

Test 2: Single instance without authentication.

Mar 13 00:22:48 mongo00 polkitd[363]: Registered Authentication Agent for unix-process:17706:10888067 (system bus name :1.198 [/usr/bin/pkttyagent --notify-fd 5 --fallback],
Mar 13 00:22:48 mongo00 systemd[1]: Starting High-performance, schema-free document-oriented database…
– Subject: Unit mongod.service has begun start-up
– Defined-By: systemd
– Support: [url]http://lists.freedesktop.org/mailman/listinfo/systemd-devel[/url]

– Unit mongod.service has begun starting up.
Mar 13 00:22:48 mongo00 systemd[1]: mongod.service: control process exited, code=exited status=2
Mar 13 00:22:48 mongo00 systemd[1]: Failed to start High-performance, schema-free document-oriented database.
– Subject: Unit mongod.service has failed
– Defined-By: systemd
– Support: [url]http://lists.freedesktop.org/mailman/listinfo/systemd-devel[/url]

– Unit mongod.service has failed.

– The result is failed.
Mar 13 00:22:48 mongo00 systemd[1]: Unit mongod.service entered failed state.
Mar 13 00:22:48 mongo00 systemd[1]: mongod.service failed.
Mar 13 00:22:48 mongo00 polkitd[363]: Unregistered Authentication Agent for unix-process:17706:10888067 (system bus name :1.198, object path /org/freedesktop/PolicyKit1/Authe

Test 3: Single instance without authentication.

– Unit mongod.service has begun starting up.
Mar 13 00:40:53 mongo00 systemd[1]: mongod.service: control process exited, code=exited status=2
Mar 13 00:40:53 mongo00 systemd[1]: Failed to start High-performance, schema-free document-oriented database.
– Subject: Unit mongod.service has failed
– Defined-By: systemd
– Support: [url]http://lists.freedesktop.org/mailman/listinfo/systemd-devel[/url]

– Unit mongod.service has failed.

– The result is failed.
Mar 13 00:40:53 mongo00 systemd[1]: Unit mongod.service entered failed state.
Mar 13 00:40:53 mongo00 systemd[1]: mongod.service failed.
Mar 13 00:40:53 mongo00 polkitd[363]: Unregistered Authentication Agent for unix-process:19039:10996553 (system bus name :1.203, object path /org/freedesktop/PolicyKit1/Authe

I am interested to incorporate both Encryption at REST and TRANSPORT. Can you please let me know which is the best environment it works without issues or what is the solution to this?

Thanks,
Amritha

Hello,

Can anyone please help here. Sharing a snippet of the mongod log.From the logs I do see that the encryption DB is getting initialized ; however the mongo service is not coming up. It gets timed out and terminates the process.

/var/run/mongod.pid" }, security: { authorization: “enabled”, enableEncryption: true, encryptionKeyFile: “/data/db/mongodb-keyfile” }, storage: { dbPath: “/data/db”, journal: { enabled: true } }, systemLog: { destination: “file”, logAppend: true, path: “/var/log/mongo/mongod.log” } }
2020-03-15T19:40:44.239+0000 I STORAGE [initandlisten] Detected data files in /data/db created by the ‘wiredTiger’ storage engine, so setting the active storage engine to ‘wiredTiger’.
2020-03-15T19:40:44.240+0000 I STORAGE [initandlisten] Initializing KeyDB with wiredtiger_open config: create,config_base=false,extensions=[local=(entry=percona_encryption_extension_init,early_load=true,config=(cipher=AES256-CBC,rotation=false))],encryption=(name=percona,keyid=“”),log=(enabled,file_max=5MB),transaction_sync=(enabled=true,method=fsync),
2020-03-15T19:40:44.975+0000 I STORAGE [initandlisten] Encryption keys DB is initialized successfully
2020-03-15T19:40:44.975+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=256M,cache_overflow=(file_max=0M),session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress],encryption=(name=percona,keyid=“/default”),extensions=[local=(entry=percona_encryption_extension_init,early_load=true,config=(cipher=AES256-CBC)),],
2020-03-15T19:40:45.048+0000 I STORAGE [initandlisten] WiredTiger message [1584301245:48292][16218:0x7fd6040cec40], txn-recover: Recovering log 11 through 12
2020-03-15T19:40:45.270+0000 I STORAGE [initandlisten] WiredTiger message [1584301245:270189][16218:0x7fd6040cec40], txn-recover: Recovering log 12 through 12
2020-03-15T19:40:45.584+0000 I STORAGE [initandlisten] WiredTiger message [1584301245:575689][16218:0x7fd6040cec40], txn-recover: Main recovery loop: starting at 11/6528 to 12/256
2020-03-15T19:40:45.717+0000 I STORAGE [initandlisten] WiredTiger message [1584301245:717937][16218:0x7fd6040cec40], txn-recover: Recovering log 11 through 12
2020-03-15T19:40:45.788+0000 I STORAGE [initandlisten] WiredTiger message [1584301245:788432][16218:0x7fd6040cec40], txn-recover: Recovering log 12 through 12
2020-03-15T19:40:45.857+0000 I STORAGE [initandlisten] WiredTiger message [1584301245:857846][16218:0x7fd6040cec40], txn-recover: Set global recovery timestamp: (0, 0)
2020-03-15T19:40:45.917+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2020-03-15T19:40:46.012+0000 I STORAGE [initandlisten] Timestamp monitor starting
2020-03-15T19:40:46.033+0000 I SHARDING [initandlisten] Marking collection local.system.replset as collection version:
2020-03-15T19:40:46.038+0000 I STORAGE [initandlisten] Flow Control is enabled on this deployment.
2020-03-15T19:40:46.040+0000 I SHARDING [initandlisten] Marking collection admin.system.roles as collection version:
2020-03-15T19:40:46.040+0000 I SHARDING [initandlisten] Marking collection admin.system.version as collection version:
2020-03-15T19:40:46.059+0000 I SHARDING [initandlisten] Marking collection local.startup_log as collection version:
2020-03-15T19:40:46.059+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory ‘/data/db/diagnostic.data’
2020-03-15T19:40:46.066+0000 I SHARDING [LogicalSessionCacheRefresh] Marking collection config.system.sessions as collection version:
2020-03-15T19:40:46.067+0000 I SHARDING [LogicalSessionCacheReap] Marking collection config.transactions as collection version:
2020-03-15T19:40:46.067+0000 I NETWORK [listener] Listening on /tmp/mongodb-27017.sock
2020-03-15T19:40:46.067+0000 I NETWORK [listener] Listening on 127.0.0.1
2020-03-15T19:40:46.067+0000 I NETWORK [listener] waiting for connections on port 27017
2020-03-15T19:40:47.006+0000 I SHARDING [ftdc] Marking collection local.oplog.rs as collection version: . <— gets stuck here …

2020-03-15T19:42:13.759+0000 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2020-03-15T19:42:13.763+0000 I NETWORK [signalProcessingThread] shutdown: going to close listening sockets…
2020-03-15T19:42:13.877+0000 I NETWORK [listener] removing socket file: /tmp/mongodb-27017.sock
2020-03-15T19:42:13.877+0000 I - [signalProcessingThread] Stopping further Flow Control ticket acquisitions.
2020-03-15T19:42:13.878+0000 I CONTROL [signalProcessingThread] Shutting down free monitoring
2020-03-15T19:42:13.878+0000 I FTDC [signalProcessingThread] Shutting down full-time diagnostic data capture
2020-03-15T19:42:13.880+0000 I STORAGE [signalProcessingThread] Deregistering all the collections
2020-03-15T19:42:13.880+0000 I STORAGE [signalProcessingThread] Timestamp monitor shutting down
2020-03-15T19:42:13.881+0000 I STORAGE [signalProcessingThread] WiredTigerKVEngine shutting down
2020-03-15T19:42:13.881+0000 I STORAGE [signalProcessingThread] Shutting down session sweeper thread
2020-03-15T19:42:13.881+0000 I STORAGE [signalProcessingThread] Finished shutting down session sweeper thread
2020-03-15T19:42:13.881+0000 I STORAGE [signalProcessingThread] Shutting down journal flusher thread
2020-03-15T19:42:13.977+0000 I STORAGE [signalProcessingThread] Finished shutting down journal flusher thread
2020-03-15T19:42:13.977+0000 I STORAGE [signalProcessingThread] Shutting down checkpoint thread
2020-03-15T19:42:13.978+0000 I STORAGE [signalProcessingThread] Finished shutting down checkpoint thread
2020-03-15T19:42:14.003+0000 I STORAGE [signalProcessingThread] shutdown: removing fs lock…
2020-03-15T19:42:14.003+0000 I CONTROL [signalProcessingThread] now exiting
2020-03-15T19:42:14.003+0000 I CONTROL [signalProcessingThread] shutting down with code:0

Thanks,
Amritha.

Hello ashenoy9 could you update with a bit more context e.g. is this a development or test environment you are working in? Or is it production?
Did you get it working elsewhere or is this your first try?

While you get that info, I will see if I can get anyone from the team to help with this anyway, and to answer your question “I am interested to incorporate both Encryption at REST and TRANSPORT. Can you please let me know which is the best environment it works without issues or what is the solution to this?”

Hi Amritha.

I suspect it’s something in SELinux that’s taking things out. Looking around for similar issues I found one person who found the mongod’s *.sock file in the /tmp/ directory that was stuck. But then I found another idea, and another, beyond those two JIRA tickets you shared. (They do look related too. The second one’s advice to check [URL]https://docs.mongodb.com/manual/tutorial/install-mongodb-on-red-hat/index.html#configure-selinux[/URL] is something I would do, but only after differential diagnosic test to show that it works when SELinux is competely off.) So there are more ways to get stuck with SELinux that I feel I can diagnose effectively back from the mongod’s symptoms.

If I were to start CentOS 7 image in say AWS I am confident of being able to start the mongod (4.2 or any supported version) with SELinux off. That includes enabling both network encryption (the net.tls.* config options (formerly net.ssl, but renamed to “tls” since 4.2) and the data-at-rest (security.enableEncryption config option), plus the taken-for-granted authentication options. Pease note that one of the authentication-related options is called “keyfile” too, which is confusing. (I have a few notes in [URL]https://www.percona.com/blog/2019/07/12/mongodb-security-vs-five-bad-guys/[/URL] which note a few uinituitive things like that).

Anyhow I wanted to ask if I were to make a CentOS 7 VM in AWS, with SELinux off, and I was to set up a normal mongod process with data-at-rest and network encryption enabled, what should I then do to enable SELinux as it is on your server?

Regards,

Akira

Hi Lorraine,

This is the first time I am setting this up . The need is to test both encryption at rest and transport .If it works well, we are planning to take it forward to production in one of our use cases. Thank you for getting back to my post.

Best,
Amritha.

Hi Akira,

I am trying to set up a simple 3 node replica set (1 primary and 2 secondaries)

Current SELinux config:

sestatus
SELinux status: disabled

This file controls the state of SELinux on the system.

SELINUX= can take one of these three values:

enforcing - SELinux security policy is enforced.

permissive - SELinux prints warnings instead of enforcing.

disabled - No SELinux policy is loaded.

SELINUX=disabled

SELINUXTYPE= can take one of three values:

targeted - Targeted processes are protected,

minimum - Modification of targeted policy. Only selected processes are protected.

mls - Multi Level Security protection.

SELINUXTYPE=targeted

uname -a
Linux mongo00 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

mongod --version
db version v4.2.3-4
git version: 595a91c2903eade8162806a86008bc14895a0cbf
OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
allocator: tcmalloc
modules: none
build environment:
distarch: x86_64
target_arch: x86_64

Thanks,
Amritha.

Hi Akira,

I rebuilt the cluster with changes in the SELINUX config and the service is up . However, I am getting the below alert and unable to connect to the mongo shell .Also the inter node communication is not happening.

Connection string:

mongo --host hostname -i --tls --tlsCAFile /etc/mongodb/ssl/mongoCA1.crt --tlsCertificateKeyFile /etc/mongodb/ssl/mongo1.pem -u mongoroot -p --authenticationDatabase admin

Percona Server for MongoDB shell version v4.2.3-4
Enter password:
connecting to: mongodb://:27017/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb
2020-03-17T23:28:04.439+0000 E NETWORK [js] SSL peer certificate validation failed: self signed certificate
2020-03-17T23:28:04.441+0000 E QUERY [js] Error: couldn’t connect to server :27017, connection attempt failed: SSLHandshakeFailed: SSL peer certificate validation failed: self signed certificate :

I have referred to the below article to set up TLS config:

[url]https://www.percona.com/blog/2018/05/31/mongodb-deploy-replica-set-with-transport-encryption-part-3/[/url]

###############################

Errors in logs:

2020-03-17T23:23:45.210+0000 E NETWORK [ReplicaSetMonitor-TaskExecutor] SSL peer certificate validation failed: self signed certificate
2020-03-17T23:28:04.424+0000 I NETWORK [listener] connection accepted from :42620 #681 (1 connection now open)
2020-03-17T23:28:04.439+0000 E NETWORK [conn681] SSL peer certificate validation failed: self signed certificate
2020-03-17T23:28:04.439+0000 I NETWORK [conn681] Error receiving request from client: SSLHandshakeFailed: SSL peer certificate validation failed: self signed certificate. Ending connection from :42620 (connection id: 681)
2020-03-17T23:28:04.439+0000 I NETWORK [conn681] end connection :42620 (0 connections now open)

###############################

Below is my config parameters in mongod.conf:

network interfaces

net:
port: 27017
bindIp: 127.0.0.1,
tls:
mode: requireTLS
certificateKeyFile: /etc/mongodb/ssl/mongo1.pem
CAFile: /etc/mongodb/ssl/mongoCA1.crt
clusterFile: /etc/mongodb/ssl/mongo1.pem
security:
authorization: enabled
clusterAuthMode: x509
enableEncryption: true
encryptionKeyFile: /var/lib/mongo/mongodb-keyfile

#operationProfiling:

replication:
replSetName: rs1

Please help as I am stuck and need to overcome this to proceed further.

Thank you,
Amritha.

Hi Amritha.

I notice the error “SSL peer certificate validation failed: self signed certificate.” most in the symptoms shared above.

Self-signed certificates are something that pretty much all of us do (myself included) when testing TLS configuration, but they’re not totally secure. If I recall correctly the mongod and mongos nodes will, at default settings, reject client authentication attempts when they’re being used.

If I recall correctly the root cause of the issue is that the hostnames information with self-signed certificates aren’t trusted. So as a troubleshooting step set net.tls.allowInvalidHostnames to true and try again. I believe it will suppress the error. After that the next level of deployment progress is to get certificate authority-issued certificates and disabled (or omit) the net.tls.allowInvalidHostnames setting from the config again.

Not that changing this setting as differential diagnostic proves or disprove only that the self-signed certificate is the issue. It could be other things, but I think this is a very common gotcha. Anyhow please do also see [URL]https://www.percona.com/blog/2019/07/30/network-transport-encryption-for-mongodb/[/URL], including some links to other testing/installing/troubleshooting tutorials.

Cheers,

Akira

Thank you Akira, that article was really good. I also succeeded in bringing up a working replica set with tls and data at rest encryption enabled. I will get back if I have any doubts concerning key management vaults/ key rotation.

Thank you for your timely help.

Best,
Amritha.

Nice to hear! Enjoy.

Akira