Any suggestions for choosing a MongoDB JDBC driver?

Hi, my team is evaluating MongoDB as the database for many existing Spark programs, which currently do ELT from files, JSON, DB2, Oracle, and MS SQL into MS SQL Server 2019. These PySpark programs use JDBC drivers with SQL statements. As you can imagine, the challenge is that we either rewrite these SQL statements in native MongoDB syntax or find a JDBC driver that can convert them automatically. Any suggestions would be very much appreciated. So far we have found Unity and hope to find more options to evaluate.

If there were one common way to port relational tables to MongoDB collections, I am sure an auto-translation wrapper would have been bundled with MongoDB by Year 2 at the latest.

But the schema you’ll choose once you have the convenience of a document format instead of a table-row format varies too much. ‘Child’ rows may or may not become nested array items, and if they are nested, do you keep the foreign-key id values that linked them, or throw them away as redundant? Is there a unique key to use for the “_id” primary key that MongoDB collections require? If not, you’ll have to insert one, or else you’ll get an auto-generated ObjectId. Do you nest array values [0, 1, … n], or nested objects with key names for convenience {‘foo’: xx, ‘bar’: yy, … }? And so on.
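To make those choices concrete, here is a minimal sketch in plain Python (no MongoDB connection needed). The orders/order_lines tables, their column names, and the nest_children helper are all made up for illustration; they only show one possible way parent and child rows could be reshaped into documents.

```python
from collections import defaultdict

# Hypothetical relational rows, as they might come back from a SQL query.
orders = [
    {"order_id": 1, "customer": "acme"},
    {"order_id": 2, "customer": "globex"},
]
order_lines = [
    {"line_id": 10, "order_id": 1, "sku": "A-1", "qty": 2},
    {"line_id": 11, "order_id": 1, "sku": "B-7", "qty": 1},
    {"line_id": 12, "order_id": 2, "sku": "A-1", "qty": 5},
]


def nest_children(parents, children, keep_foreign_key=False):
    """Embed child rows as an array inside each parent document."""
    by_parent = defaultdict(list)
    for row in children:
        child = dict(row)  # copy so the input rows are left untouched
        if not keep_foreign_key:
            # Nesting already implies the link back to the parent.
            child.pop("order_id")
        by_parent[row["order_id"]].append(child)

    docs = []
    for row in parents:
        doc = dict(row)
        # Reuse the existing unique key as _id. If _id were left out,
        # MongoDB would generate an ObjectId on insert instead.
        doc["_id"] = doc.pop("order_id")
        doc["lines"] = by_parent.get(doc["_id"], [])
        docs.append(doc)
    return docs


embedded = nest_children(orders, order_lines)
with_fk = nest_children(orders, order_lines, keep_foreign_key=True)
```

Both results are legitimate document shapes, and neither is the "right" one in general, which is the reason a generic translator has nothing fixed to aim at.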

So that sort of facility doesn’t exist.

A different user requirement from yours: there are services that put a relational-table facade in front of a MongoDB instance. This has value for BI software suites that are stuck with RDBMS and ODBC assumptions. But they need you to write some very complex and ugly mapping files to get them to work as you expect. It is no better going the other way. I thought I would share that before a Google search leads you there and you spend many hours reading about those tools before working out that they are not for your use case.

MongoDB drivers provide a simpler API than ODBC+SQL for working with database records as objects in your code. Treating this as a SQL translation project is complex, and since you’re trying to do this as a migration shortcut, it is natural for your gut to tell you that if the ‘shortcut’ is this hard, then rewriting the code for the ‘native’ MongoDB driver API must be even harder. But it is the opposite.
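As a rough illustration of what the native driver API looks like from Python, here is a sketch of one SQL statement rewritten as a find() call in the pymongo style. The orders collection, its fields, and the two helper functions are assumptions for the example, not anything from your codebase; the function accepts any object exposing pymongo's find/sort/limit methods.

```python
# Assumed SQL being replaced (hypothetical table and columns):
#   SELECT customer, total FROM orders
#   WHERE status = 'shipped' AND total > 100
#   ORDER BY total DESC LIMIT 10


def shipped_orders_query():
    """Return the filter and projection for the query above as plain dicts."""
    query_filter = {"status": "shipped", "total": {"$gt": 100}}
    projection = {"_id": 0, "customer": 1, "total": 1}
    return query_filter, projection


def find_shipped_orders(collection):
    """Run the query against a pymongo-style collection object."""
    query_filter, projection = shipped_orders_query()
    # -1 means descending order, matching ORDER BY total DESC.
    cursor = collection.find(query_filter, projection).sort("total", -1).limit(10)
    return list(cursor)


# With pymongo installed and a server running, usage might look like this
# (the URI, database name, and collection name are placeholders):
#   from pymongo import MongoClient
#   orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]
#   rows = find_shipped_orders(orders)
```

The query is ordinary Python data, so it can be built, inspected, and unit-tested without any string concatenation of SQL.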