Skip to content

Test all examples from library-user-guide & user-guide docs#14544

Merged
alamb merged 77 commits into
apache:mainfrom
ugoa:run-test-from-library-user-guide-docs
Feb 10, 2025
Merged

Test all examples from library-user-guide & user-guide docs#14544
alamb merged 77 commits into
apache:mainfrom
ugoa:run-test-from-library-user-guide-docs

Conversation

@ugoa

@ugoa ugoa commented Feb 7, 2025

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Library User Guide:

  • docs/source/library-user-guide/adding-udfs.md
  • docs/source/library-user-guide/api-health.md
  • docs/source/library-user-guide/building-logical-plans.md
  • docs/source/library-user-guide/catalogs.md
  • docs/source/library-user-guide/custom-table-providers.md
  • docs/source/library-user-guide/extending-operators.md
  • docs/source/library-user-guide/extensions.md
  • docs/source/library-user-guide/index.md
  • docs/source/library-user-guide/profiling.md
  • docs/source/library-user-guide/query-optimizer.md
  • docs/source/library-user-guide/using-the-dataframe-api.md
  • docs/source/library-user-guide/using-the-sql-api.md
  • docs/source/library-user-guide/working-with-exprs.md

User Guide:

  • docs/source/user-guide/example-usage.md
  • docs/source/user-guide/faq.md
  • docs/source/user-guide/introduction.md
  • docs/source/user-guide/cli/index.rst
  • docs/source/user-guide/cli/overview.md
  • docs/source/user-guide/cli/datasources.md
  • docs/source/user-guide/cli/usage.md
  • docs/source/user-guide/cli/installation.md
  • docs/source/user-guide/configs.md
  • docs/source/user-guide/concepts-readings-events.md
  • docs/source/user-guide/dataframe.md
  • docs/source/user-guide/explain-usage.md
  • docs/source/user-guide/expressions.md
  • docs/source/user-guide/crate-configuration.md
  • docs/source/user-guide/sql/sql_status.md
  • docs/source/user-guide/sql/index.rst
  • docs/source/user-guide/sql/scalar_functions.md
  • docs/source/user-guide/sql/write_options.md
  • docs/source/user-guide/sql/explain.md
  • docs/source/user-guide/sql/information_schema.md
  • docs/source/user-guide/sql/select.md
  • docs/source/user-guide/sql/special_functions.md
  • docs/source/user-guide/sql/dml.md
  • docs/source/user-guide/sql/operators.md
  • docs/source/user-guide/sql/ddl.md
  • docs/source/user-guide/sql/aggregate_functions.md
  • docs/source/user-guide/sql/data_types.md
  • docs/source/user-guide/sql/subqueries.md
  • docs/source/user-guide/sql/window_functions.md

Are these changes tested?

Yes

Are there any user-facing changes?

No

Comment thread datafusion/core/Cargo.toml Outdated
test-utils = { path = "../../test-utils" }
tokio = { workspace = true, features = ["rt-multi-thread", "parking_lot", "fs"] }
dashmap = "6.1.0"
snmalloc-rs = "0.3"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need snmalloc?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the comment below I don't think we can use snmalloc in the dev dependencies without messing up the other builds -- can you please remove that (and maybe add a note to the example that tried to use it?)

@alamb

alamb commented Feb 10, 2025

Copy link
Copy Markdown
Contributor

Ah, I see the snmalloc comes from running

docs/source/user-guide/crate-configuration.md

I think the issue there is that snmalloc takes over the global allocator,

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @ugoa -- this is looking really cool

Let's try and get this PR ready for merge -- it is already an amazing improvement. I'll try and fix up the tests and get it in later today. Then maybe we can make additional improvements as follow ons

Comment thread datafusion/core/Cargo.toml Outdated
test-utils = { path = "../../test-utils" }
tokio = { workspace = true, features = ["rt-multi-thread", "parking_lot", "fs"] }
dashmap = "6.1.0"
snmalloc-rs = "0.3"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the comment below I don't think we can use snmalloc in the dev dependencies without messing up the other builds -- can you please remove that (and maybe add a note to the example that tried to use it?)

Comment thread docs/rustdoc_trim.py
# specific language governing permissions and limitations
# under the License.

import re

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to DeepSeek generating this code snippet haha, too powerful

@alamb

alamb commented Feb 10, 2025

Copy link
Copy Markdown
Contributor

I also double checked the "hide lines starting with # and it looks great!"

Screenshot 2025-02-10 at 8 14 32 AM

@alamb

alamb commented Feb 10, 2025

Copy link
Copy Markdown
Contributor

I will push a few commits to fix this

@alamb

alamb commented Feb 10, 2025

Copy link
Copy Markdown
Contributor

Making the examples in the doc rendred with sql rather than ``` was a great idea. However, since those files are automatically generated from the source code we need to update the generation script, such as

And add the sql there (there are equivalents for the other types of functions)

Perhaps you can make a follow on PR?

@alamb

alamb commented Feb 10, 2025

Copy link
Copy Markdown
Contributor

I plan to merge this PR in once the tests pass

@ugoa

ugoa commented Feb 10, 2025

Copy link
Copy Markdown
Contributor Author

Making the examples in the doc rendred with sql rather than ``` was a great idea. However, since those files are automatically generated from the source code we need to update the generation script

Ah I didn't know that, so I can generate the doc by run cargo run --bin print_functions_docs -- scalar right?

And add the sql there (there are equivalents for the other types of functions)
Perhaps you can make a follow on PR?

Sure I can open another PR to address this, cheers.

@alamb

alamb commented Feb 10, 2025

Copy link
Copy Markdown
Contributor

Making the examples in the doc rendred with sql rather than ``` was a great idea. However, since those files are automatically generated from the source code we need to update the generation script

Ah I didn't know that, so I can generate the doc by run cargo run --bin print_functions_docs -- scalar right?

The way I do it is

./dev/update_function_docs.sh

And add the sql there (there are equivalents for the other types of functions)
Perhaps you can make a follow on PR?

Sure I can open another PR to address this, cheers.

🙏 thank you so much

@alamb

alamb commented Feb 10, 2025

Copy link
Copy Markdown
Contributor

Ah, I see now the issue is that we'll try and test all examples: https://github.com/apache/datafusion/actions/runs/13241873169/job/36958947894?pr=14544

So they need to be marked as SQL to avoid CI failures. I'll look into it

@ugoa

ugoa commented Feb 10, 2025

Copy link
Copy Markdown
Contributor Author

So they need to be marked as SQL to avoid CI failures. I'll look into it

Yeah, with just ``` it will be treated as rust code and be run and tested

@alamb

alamb commented Feb 10, 2025

Copy link
Copy Markdown
Contributor

Yeah, with just ``` it will be treated as rust code and be run and tested

I pushed a fix in e0b360c

Let's see if we can get a clean run

@alamb

alamb commented Feb 10, 2025

Copy link
Copy Markdown
Contributor

So close...

@alamb

alamb commented Feb 10, 2025

Copy link
Copy Markdown
Contributor

🚀

Thanks again @ugoa -- this is great work

FYI @tshauck who I think started this documentation many months ago

@alamb alamb merged commit 3f900ac into apache:main Feb 10, 2025
@ugoa

ugoa commented Feb 10, 2025

Copy link
Copy Markdown
Contributor Author

My pleasure! I learned a lot during the time as well.

@ugoa ugoa deleted the run-test-from-library-user-guide-docs branch February 10, 2025 15:42
@ugoa

ugoa commented Feb 11, 2025

Copy link
Copy Markdown
Contributor Author

Hey @alamb , btw we probably need to remove the installation of cmake in the .github/actions/setup-builder/action.yaml since it was added for snmalloc-rs = "0.3" which we don't need anymore

@alamb

alamb commented Feb 11, 2025

Copy link
Copy Markdown
Contributor

Hey @alamb , btw we probably need to remove the installation of cmake in the .github/actions/setup-builder/action.yaml since it was added for snmalloc-rs = "0.3" which we don't need anymore

Makes sense -- thank you @ugoa

Can you make a PR to do so?

@ugoa

ugoa commented Feb 11, 2025

Copy link
Copy Markdown
Contributor Author

@alamb Sure, here it is #14606, please help review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate development-process Related to development process of DataFusion documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hide boilerplate in documentation examples Run / Test all examples in Documentation

2 participants