So… you suppose all of your information in a specific area are a string sort, however whenever you attempt to run your question, you get some errors. Doing extra investigation, it seems like you may have some int and undefined sorts as properly. Bummer…
Despair not! We are able to truly work round this (with out information prep 😉). To recap, in our first weblog, we created an integration with MongoDB on Rockset, so Rockset can learn and [update] the info coming in MongoDB. As soon as the info is in Rockset, we are able to run SQL on schemaless and unstructured information.
The info:
Embedded content material: https://gist.github.com/nfarah86/ef1cc9da88e56226c4c46fd0e3c8e16e
We have an interest within the release_date
area: "release_date": "1991-06-07".
The question:
Rockset has a perform referred to as DATE_PARSE(), which lets you flip a string formatted date right into a date object. If you wish to order motion pictures by simply the yr, you need to use EXTRACT().
Basically, when you flip your string formatted date right into a date object, you’ll be able to then extract the yr.
At first look, this appears fairly straightforward to resolve— in case you needed to order all of the film titles by the discharge yr, you’ll be able to write one thing like this:
SELECT
t.title, t.release_date
FROM
commons.TwtichMovies t
ORDER BY
EXTRACT(
YEAR
FROM
DATE_PARSE(t.release_date, '%Y-%m-%d')
) DESC
;
When working this question, we get a timestamp parsing error:
Error [Query]
Timestamp parse error:
This might imply you’re working with different information sorts that aren’t strings. To examine, you’ll be able to write one thing like this:
SELECT
t.title, TYPEOF(t.release_date)
FROM
commons.TwtichMovies t
WHERE
TYPEOF(t.release_date) != 'string'
;
That is what we get again:
Now, that we all know what’s inflicting the error, we are able to re-write the question to discard something that’s not a string sort— proper 🤗?
SELECT
t.title, t.release_date
FROM
commons.TwtichMovies t
WHERE
TYPEOF(t.release_date) = 'string';
ORDER BY
EXTRACT(
YEAR
FROM
DATE_PARSE(t.release_date, '%Y-%m-%d')
)DESC
;
WRONG 🥺! This truly returns a timestamp parsing error as properly:
Error [Query]
Timestamp parse error
You are most likely saying to your self, “what the heck.” One case we didn’t consider earlier is that there might be empty strings 🤯- If we run the next question:
SELECT DATE_PARSE('', '%Y-%m-%d');
We get the identical timestamp parsing error again:
Error [Query]
Timestamp parse error
Aha.
How can we truly write this question to keep away from the timestamp parsing errors? Right here, we are able to truly examine the LENGTH() of the string and filter out every part that doesn’t meet the size requirement— so one thing like this:
WHERE LENGTH(t.release_date) = 10
We are able to additionally TRY_CAST() t.release_date
to a string. If the sector worth can’t be was a string, a null
worth is returned (i.e. it gained’t error out). Placing this all collectively, we are able to technically write one thing like this:
SELECT
t.title,
t.release_date
FROM
commons.TwtichMovies t
WHERE
TRY_CAST(t.release_date AS string) shouldn't be null
AND LENGTH(TRY_CAST(t.release_date AS string)) = 10
ORDER BY
EXTRACT(
YEAR
FROM
DATE_PARSE(t.release_date, '%Y-%m-%d')
)
;
Voila! it really works!
In the course of the stream, I truly wrote a extra difficult model of this question. The above question and the question within the stream are equal. We additionally wrote queries that combination! You may catch the total breakdown of the session under:
Embedded content material: https://youtu.be/PGpEsg7Qw7A
TLDR: you could find all of the assets it’s essential get began on Rockset within the developer nook.