-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Query
.fetch_raw
#182
base: main
Are you sure you want to change the base?
WIP Query
.fetch_raw
#182
Conversation
examples/raw_stream_json_each_row.rs
Outdated
.query( | ||
" | ||
SELECT number, hex(randomPrintableASCII(20)) AS hex_str | ||
FROM system.numbers | ||
LIMIT 10 | ||
FORMAT JSONEachRow | ||
", | ||
) | ||
// By default, ClickHouse quotes (U)Int64 in JSON* family formats; | ||
// disable it to simplify this example. | ||
.with_option("output_format_json_quote_64bit_integers", "0") | ||
.fetch_raw()? | ||
.newline(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe like this?
.query( | |
" | |
SELECT number, hex(randomPrintableASCII(20)) AS hex_str | |
FROM system.numbers | |
LIMIT 10 | |
FORMAT JSONEachRow | |
", | |
) | |
// By default, ClickHouse quotes (U)Int64 in JSON* family formats; | |
// disable it to simplify this example. | |
.with_option("output_format_json_quote_64bit_integers", "0") | |
.fetch_raw()? | |
.newline(); | |
.query( | |
" | |
SELECT number, hex(randomPrintableASCII(20)) AS hex_str | |
FROM system.numbers | |
LIMIT 10 | |
", | |
) | |
// By default, ClickHouse quotes (U)Int64 in JSON* family formats; | |
// disable it to simplify this example. | |
.with_option("output_format_json_quote_64bit_integers", "0") | |
.with_format("JSONEachRow") | |
.fetch_raw()? | |
.newline(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because specifying FORMAT xxx
as part of the SQL feels not convenient and a bit more error prone.
Ideally, I would prefer with_format(clickhouse::Format::JsonEachRow)
if the maintenance burden of mapping all formats to an enum isn't too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possible after a few tweaks to the SQLBuilder. See this commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave the raw API as-is. The user will quickly find an invalid FORMAT
with the first query.
Same thread #182 (comment)
Open questions so far:
Fails on the CI for some reason.
|
", | ||
) | ||
// By default, ClickHouse quotes (U)Int64 in JSON* family formats; | ||
// disable it to simplify this example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to simplify this example
because serde_json
supports u64 natively, there is no reason to use strings here
|
||
// NB: there is no `Row` derive here | ||
#[derive(Debug, Deserialize)] | ||
struct MyRowInJSONEachRowFormat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest avoiding any use cases where using non-raw API is preferred.
Here we can use https://docs.rs/serde_json/latest/serde_json/enum.Value.html
@@ -13,7 +14,7 @@ use crate::{ | |||
|
|||
// === RawCursor === | |||
|
|||
struct RawCursor(RawCursorInner); | |||
pub struct RawCursor(RawCursorInner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs (preparing for warn(missing_docs)
lint).
use std::fmt; | ||
|
||
#[derive(Debug, Clone)] | ||
pub enum OutputFormat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably better to leave this to the user. This format seems useless for us and will require extending this enum in the future.
Btw: all public enums should be annotated with #[non_exhaustive]
(for further compatible extensions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, we can use this info to better understand errors (like sent inside payload in JSON case), but probably there more seamless way
use clickhouse::format::OutputFormat; | ||
|
||
#[tokio::test] | ||
async fn fetch_raw_with_error() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about testing a successful path?
@@ -11,7 +11,9 @@ use std::{collections::HashMap, fmt::Display, sync::Arc}; | |||
pub use self::{compression::Compression, row::Row}; | |||
pub use clickhouse_derive::Row; | |||
|
|||
pub mod cursor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be implementation details; let's reexport from mod query
where this is actually used.
@@ -27,11 +28,18 @@ struct RawCursorLoading { | |||
} | |||
|
|||
impl RawCursor { | |||
fn new(response: Response) -> Self { | |||
pub(crate) fn new(response: Response) -> Self { | |||
Self(RawCursorInner::Waiting(response.into_future())) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, we should also add into_futures03(self) -> RawStream03
(implementing https://docs.rs/futures/latest/futures/prelude/trait.Stream.html) under a new futures03
feature (to support future v1 and AsyncIterator
from std when they will released).
This is de facto the standard for streams (with built-in combinators, e.g. concat) and should be used for owned types (like Bytes
).
Alternatively, we can directly implement Stream
for RawCursor
(also under the feature).
@@ -0,0 +1,54 @@ | |||
use std::str::from_utf8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another real use case is to collect everything into one Bytes
(or Vec<u8>
). It's nice to have such an example, too!
Self(RawCursorInner::Waiting(response.into_future())) | ||
} | ||
|
||
async fn next(&mut self) -> Result<Option<Bytes>> { | ||
pub fn newline(self) -> RawCursorNewline { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The futures
crate already provides the AsyncBufReadExt::lines method.
It seems we can implement the AsyncBufRead
(under the futures03
feature) trait for RawCursor
covering more use cases, for instance the use of io::copy. This function can be also used in the example of streaming to a file.
|
||
let mut file = File::create(filename).await.unwrap(); | ||
|
||
loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #182 (comment) for how to make this example more canonical.
Summary
Adds a method to fetch raw bytes from a query, exposes
RawCursor
, and adds an additional cursor type (RawCursorNewline
) to work with arbitrary formats where a newline character separates rows. See the related examples regarding streaming into a file, as well as streaming separate*EachRow
records.Checklist